
perf: eager poll async ops in Isolate #3046

Merged: 30 commits from greedy_poll into denoland:master on Oct 14, 2019

Conversation

@bartlomieju (Member) commented Oct 2, 2019

Revamp of #2779 with new dispatcher infra.

I'm getting ridiculous results on my machine...
This PR (debug):

third_party/wrk/mac/wrk -d 10s --latency http://127.0.0.1:4544
Running 10s test @ http://127.0.0.1:4544
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    52.80ms  185.40ms   1.28s    92.56%
    Req/Sec    16.56k     4.16k   26.77k    74.26%
  Latency Distribution
     50%  283.00us
     75%  291.00us
     90%   73.22ms
     99%  951.90ms
  332831 requests in 10.10s, 16.19MB read
Requests/sec:  32954.40
Transfer/sec:      1.60MB

master (debug):

third_party/wrk/mac/wrk -d 10s --latency http://127.0.0.1:4544
Running 10s test @ http://127.0.0.1:4544
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   584.31us  191.76us   8.23ms   96.67%
    Req/Sec     8.67k   777.07     8.99k    94.06%
  Latency Distribution
     50%  557.00us
     75%  561.00us
     90%  579.00us
     99%    0.99ms
  174285 requests in 10.10s, 8.48MB read
Requests/sec:  17255.87
Transfer/sec:    859.42KB

This PR (release):

third_party/wrk/mac/wrk -d 10s --latency http://127.0.0.1:4544
Running 10s test @ http://127.0.0.1:4544
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   104.40us   62.98us   2.97ms   86.85%
    Req/Sec    47.33k     2.91k   50.13k    86.50%
  Latency Distribution
     50%   97.00us
     75%  123.00us
     90%  165.00us
     99%  299.00us
  941792 requests in 10.00s, 45.81MB read
Requests/sec:  94179.51
Transfer/sec:      4.58MB

master (release):

third_party/wrk/mac/wrk -d 10s --latency http://127.0.0.1:4544
Running 10s test @ http://127.0.0.1:4544
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   122.79us   32.44us   1.95ms   96.50%
    Req/Sec    39.58k     1.49k   40.66k    97.03%
  Latency Distribution
     50%  119.00us
     75%  122.00us
     90%  125.00us
     99%  264.00us
  795753 requests in 10.10s, 38.70MB read
Requests/sec:  78785.96
Transfer/sec:      3.83MB

That's roughly a 90% throughput increase for debug and ~20% for release. You can see that latency (especially at the tail) got worse, but these are promising results.

CC @ry
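
For context, here is a minimal sketch of the eager-poll idea, assuming the futures 0.1 API in use at the time; Op, Buf and eager_dispatch are illustrative names, not the actual core/isolate.rs types. The point is simply that a freshly dispatched op future is polled once right away, and if it is already ready the result is returned as if the op were synchronous, skipping an event-loop round trip.

// Illustrative sketch only (futures = "0.1"); not the real core/isolate.rs code.
extern crate futures;

use futures::{Async, Future};

type Buf = Box<[u8]>;
type OpAsyncFuture = Box<dyn Future<Item = Buf, Error = ()> + Send>;

enum Op {
  Sync(Buf),
  Async(OpAsyncFuture),
}

/// Poll a freshly dispatched async op exactly once. If it is already ready
/// (e.g. the kernel had data buffered), respond synchronously; otherwise
/// hand the future back to be queued with the other pending ops.
fn eager_dispatch(op: Op) -> Op {
  match op {
    Op::Sync(buf) => Op::Sync(buf),
    Op::Async(mut fut) => match fut.poll() {
      Ok(Async::Ready(buf)) => Op::Sync(buf), // short-circuit: answer immediately
      Ok(Async::NotReady) => Op::Async(fut),  // defer to the pending-ops queue
      Err(()) => Op::Sync(Vec::new().into_boxed_slice()), // error encoding elided
    },
  }
}

fn main() {
  // An op whose future resolves immediately is answered eagerly.
  let op = Op::Async(Box::new(futures::future::ok::<Buf, ()>(
    vec![1, 2, 3].into_boxed_slice(),
  )));
  match eager_dispatch(op) {
    Op::Sync(buf) => println!("answered eagerly: {} bytes", buf.len()),
    Op::Async(_) => println!("queued for the event loop"),
  }
}

In the real code this single poll has to happen inside a task context (see the tokio_util discussion further down), since a future that returns NotReady may register wakeups for the current task.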

bartlomieju changed the title from "perf: Greedy poll in deno_core_http_bench" to "perf: Greedy poll async ops in Isolate" on Oct 3, 2019
@bartlomieju (Member, Author) commented Oct 7, 2019

@ry I'm very close on this PR, but I'm still having some trouble with deno bundle; help would be appreciated.

REDACTED

EDIT: Got it figured out - the worker needs to be polled so the task context is set up properly.

@bartlomieju (Member, Author) commented Oct 8, 2019

Latest benchmarks from my machine:

Threadpool runtime
This PR

$ ./target/release/deno -A tools/deno_tcp.ts 127.0.0.1:4544
$ third_party/wrk/mac/wrk -d 10s --latency http://127.0.0.1:4544
Running 10s test @ http://127.0.0.1:4544
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   319.99us    1.26ms  27.33ms   96.39%
    Req/Sec    41.34k     8.53k   56.36k    68.81%
  Latency Distribution
     50%   91.00us
     75%  132.00us
     90%  255.00us
     99%    6.11ms
  830448 requests in 10.10s, 40.39MB read
Requests/sec:  82228.13
Transfer/sec:      4.00MB

Deno v0.20.0

$ deno -A tools/deno_tcp.ts 127.0.0.1:4544
$ third_party/wrk/mac/wrk -d 10s --latency http://127.0.0.1:4544
Running 10s test @ http://127.0.0.1:4544
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   190.30us  386.52us   4.41ms   95.75%
    Req/Sec    38.94k     2.39k   43.57k    83.66%
  Latency Distribution
     50%  113.00us
     75%  117.00us
     90%  143.00us
     99%    2.58ms
  782699 requests in 10.10s, 38.07MB read
Requests/sec:  77478.20
Transfer/sec:      3.77MB

Current thread runtime
This PR

$ ./target/release/deno -A --current-thread tools/deno_tcp.ts 127.0.0.1:4544
$ third_party/wrk/mac/wrk -d 10s --latency http://127.0.0.1:4544
Running 10s test @ http://127.0.0.1:4544
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   703.41us    2.76ms  47.41ms   94.45%
    Req/Sec    48.80k     4.39k   69.20k    78.11%
  Latency Distribution
     50%   85.00us
     75%   94.00us
     90%  372.00us
     99%   14.68ms
  975423 requests in 10.10s, 47.44MB read
Requests/sec:  96584.13
Transfer/sec:      4.70MB

Deno v0.20.0

$ deno -A --current-thread tools/deno_tcp.ts 127.0.0.1:4544
$ third_party/wrk/mac/wrk -d 10s --latency http://127.0.0.1:4544
Running 10s test @ http://127.0.0.1:4544
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   207.94us  419.47us   8.57ms   96.38%
    Req/Sec    33.70k     3.10k   38.36k    83.50%
  Latency Distribution
     50%  128.00us
     75%  135.00us
     90%  182.00us
     99%    2.70ms
  670688 requests in 10.00s, 32.62MB read
Requests/sec:  67063.98
Transfer/sec:      3.26MB

@bartlomieju (Member, Author) commented:

Update: still having some trouble with fmt_test.py as well as the JS unit tests for --allow-net. TTB - I'll try to fix it tomorrow.

bartlomieju force-pushed the greedy_poll branch 2 times, most recently from 417aeee to 8ab0b52 on October 10, 2019
@bartlomieju (Member, Author) commented:

Two things left to do:

  • Isolate tests for greedy - immediate response for async ops
  • fix track_task in Resource

Getting close 💪

@bartlomieju (Member, Author) commented:

@ry ready for review 🎉

also CC @piscisaureus

});
tokio_util::run(bundle_future);
});
tokio_util::run(main_future);
Member (reviewer) commented on the diff:

I don't really understand this change... it seems like it should be functionally equivalent.

@bartlomieju (Member, Author) replied:

Yes, it looks like it, but if we don't poll the isolate, the task context is not set up properly and ts_compiler panics.
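
A small self-contained illustration of that failure mode, assuming futures 0.1 / tokio 0.1 (the stack in use here) and using tokio::run as a stand-in for deno's tokio_util::run: task-notification handles only exist while an executor is polling a task, so anything that (directly or indirectly) calls futures::task::current() has to be reached from inside the spawned future rather than before it runs.

// Illustrative only; `tokio::run` stands in for `tokio_util::run`.
extern crate futures; // futures = "0.1"
extern crate tokio;   // tokio = "0.1"

use futures::future;

fn main() {
  // Calling futures::task::current() out here, before any executor is
  // polling us, would panic because no task is currently running.

  tokio::run(future::lazy(|| {
    // Inside the executor a task context exists, so anything the isolate
    // registers (timers, sockets) can later wake this task.
    let _task = futures::task::current();
    println!("task context is set up; safe to poll the worker/isolate here");
    future::ok::<(), ()>(())
  }));
}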

ry requested a review from piscisaureus on October 11, 2019
@ry (Member) commented Oct 11, 2019

Looks good - I'm in favor of landing this. Nice work! I look forward to seeing the benchmarks page updated.

I want to give @piscisaureus a chance to review since this is very sensitive code.

piscisaureus changed the title from "perf: Greedy poll async ops in Isolate" to "perf: eager poll async ops in Isolate" on Oct 14, 2019
@piscisaureus (Member) left a review:

Approving because I want to see the benchmark results.
If it sticks, I hope we can make the integration a bit more streamlined and bring back some of the fairness of the old scheduler (e.g. after 20 eager poll short-circuits, revert back to deferred polling).
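
A hypothetical sketch of that fairness idea, just to make the suggestion concrete: EagerBudget and the per-turn reset are invented for illustration (the limit of 20 comes from the comment above), and nothing like this exists in deno_core.

struct EagerBudget {
  remaining: u32,
  limit: u32,
}

impl EagerBudget {
  fn new(limit: u32) -> Self {
    EagerBudget { remaining: limit, limit }
  }

  /// Returns true if the next op may be polled eagerly; once the budget is
  /// spent, ops fall back to the deferred (queued) path for this turn.
  fn allow_eager(&mut self) -> bool {
    if self.remaining == 0 {
      false
    } else {
      self.remaining -= 1;
      true
    }
  }

  /// Reset at the start of each event-loop turn to restore fairness.
  fn reset(&mut self) {
    self.remaining = self.limit;
  }
}

fn main() {
  let mut budget = EagerBudget::new(20);
  let eager = (0..25).filter(|_| budget.allow_eager()).count();
  println!("{} of 25 ops answered eagerly this turn", eager); // prints 20
  budget.reset();
}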

ry merged commit 4221b90 into denoland:master on Oct 14, 2019
bartlomieju deleted the greedy_poll branch on October 14, 2019
@ry (Member) commented Oct 16, 2019

Very nice work @bartlomieju! This PR has shown significant perf gains in the HTTP server benchmark and the throughput benchmark:

[Screenshot: HTTP server benchmark chart]

[Screenshot: throughput benchmark chart]

However, it has increased tail latency. @bartlomieju's subsequent #3128 has mitigated it to reasonable levels:

[Screenshot: latency benchmark chart]

It's worth noting that our benchmarks page is generated inside GitHub Actions runners, which are perhaps not the best servers for CPU-heavy measurements. On the benchmarks page, deno_tcp is shown as performing better than node_tcp, but on my own machine, when I benchmark them, node_tcp is still significantly better:

deno_tcp:

> wrk http://127.0.0.1:4500
Running 10s test @ http://127.0.0.1:4500
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     0.89ms    6.85ms 114.82ms   98.91%
    Req/Sec    19.33k     2.81k   22.69k    87.00%
  384687 requests in 10.00s, 18.71MB read
Requests/sec:  38461.11
Transfer/sec:      1.87MB

node_tcp:

> wrk http://127.0.0.1:4544
Running 10s test @ http://127.0.0.1:4544
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   219.12us  343.96us  10.19ms   97.11%
    Req/Sec    24.81k     4.84k   38.29k    80.00%
  493490 requests in 10.00s, 24.00MB read
Requests/sec:  49333.27
Transfer/sec:      2.40MB

So there is still work to do. We will be able to upgrade to Tokio's new scheduler in the coming weeks, which should shed some light on how much of this overhead is due to our dependencies.
