Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

core: Isolate should abstract shared buffer #1933

Merged
merged 2 commits into from Mar 15, 2019

Conversation

2 participants
@ry
Copy link
Collaborator

ry commented Mar 14, 2019

We always pass around Box<[u8]>, and adding this generic is an
unnecessary complication.

@ry ry requested a review from piscisaureus Mar 14, 2019

@ry ry force-pushed the ry:core_non_generic branch 2 times, most recently from c7842b5 to da80f96 Mar 15, 2019

@ry

This comment has been minimized.

Copy link
Collaborator Author

ry commented Mar 15, 2019

This branch has about a 20% slow down in deno_core_http_bench

master:

~/src/deno ./third_party/wrk/mac/wrk -c 10 -d 10 http://127.0.0.1:4544/
Running 10s test @ http://127.0.0.1:4544/
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   700.79us    2.72ms  50.11ms   95.29%
    Req/Sec    27.68k     2.31k   32.47k    74.26%
  556208 requests in 10.10s, 27.05MB read
Requests/sec:  55069.14
Transfer/sec:      2.68MB

this branch:

~/src/deno ./third_party/wrk/mac/wrk -c 10 -d 10 http://127.0.0.1:4544/
Running 10s test @ http://127.0.0.1:4544/
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.89ms    8.82ms 176.84ms   95.67%
    Req/Sec    22.34k     2.50k   30.35k    72.28%
  448968 requests in 10.10s, 21.84MB read
Requests/sec:  44454.64
Transfer/sec:      2.16MB
@ry

This comment has been minimized.

Copy link
Collaborator Author

ry commented Mar 15, 2019

slowdown reduced to 9%

master branch:

~/src/deno ./third_party/wrk/mac/wrk -c 10 -d 10 http://127.0.0.1:4544/
Running 10s test @ http://127.0.0.1:4544/
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   496.83us    1.81ms  33.27ms   95.89%
    Req/Sec    26.97k     2.28k   35.55k    74.63%
  539459 requests in 10.10s, 26.24MB read
Requests/sec:  53414.27
Transfer/sec:      2.60MB

this branch:

~/src/deno ./third_party/wrk/mac/wrk -c 10 -d 10 http://127.0.0.1:4544/
Running 10s test @ http://127.0.0.1:4544/
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   416.97us    1.29ms  26.02ms   95.85%
    Req/Sec    24.18k     2.40k   30.03k    71.78%
  486153 requests in 10.10s, 23.65MB read
Requests/sec:  48135.23
Transfer/sec:      2.34MB

@ry ry changed the title core: Behavior shouldn't be generic core: Isolate should abstract shared buffer Mar 15, 2019

@ry ry force-pushed the ry:core_non_generic branch from 07f0740 to b1586b7 Mar 15, 2019

Show resolved Hide resolved core/http_bench.js Outdated
}

/// Clears the shared buffer.
pub fn reset(&mut self) {

This comment has been minimized.

@piscisaureus

piscisaureus Mar 15, 2019

Collaborator

TODO: remove fn reset() :)

bytes: Vec<u8>,
}

impl SharedQueue {

This comment has been minimized.

@piscisaureus

piscisaureus Mar 15, 2019

Collaborator

This is a great step in the right direction. Code is very readable.
I still feel reset() should go away; I think the implementation could simply do that by automatically resetting when all messages have been shifted off and the buffer is empty.

@piscisaureus

This comment has been minimized.

Copy link
Collaborator

piscisaureus commented Mar 15, 2019

slowdown reduced to 9%

So this shared queue is slower than the current one ?

Show resolved Hide resolved core/http_bench.rs Outdated
@ry

This comment has been minimized.

Copy link
Collaborator Author

ry commented Mar 15, 2019

So this shared queue is slower than the current one ?

It's 9% slower than master branch's core/http_bench.js
https://github.com/denoland/deno/blob/45fad1b7cfd44376d8be35288b7213636bc93450/core/http_bench.js
It's doing slightly more work - it appears very sensitive in this range - I guess it's just that we're using slightly more JS - with a few more TypedArray constructors. I think we have to take the hit here in order to support this more general API.
Hopefully we can make up for it elsewhere.

@piscisaureus

This comment has been minimized.

Copy link
Collaborator

piscisaureus commented Mar 15, 2019

Rather than converting to a Box<[u8]>, wouldn't it be possible to return a borrowed subslice?
I think the slowdown could potentially originate from excessive memory allocations.

return result;
function recordFromBuf(buf) {
assert(buf.byteLength === 16);
const buf32 = new Int32Array(buf.buffer, buf.byteOffset, buf.byteLength / 4);

This comment has been minimized.

@piscisaureus

piscisaureus Mar 15, 2019

Collaborator

When optimizing further, it's worth a try to create a global static Int32Array rather than making and disposing one every time. In my experience the difference is not huge but noticeable.

@ry

This comment has been minimized.

Copy link
Collaborator Author

ry commented Mar 15, 2019

@piscisaureus libmalloc is showing up at about 20% in the profile - but i'm not sure how that compares to normal v8 loads
Screen Shot 2019-03-15 at 3 30 36 AM

these are tiny 16 byte messages... i find it hard to imagine they're causing that much perf drop

(libmalloc also shows up similarly in the master branch - so it's not a smoking gun)

Show resolved Hide resolved core/shared_queue.rs Outdated
@piscisaureus

This comment has been minimized.

Copy link
Collaborator

piscisaureus commented Mar 15, 2019

It's 9% slower than master branch's core/http_bench.js

I don't understand how this can be slower whereas core_improvements2 is twice as fast as master.
What are we missing here?

let off = getOffset(i);
let end = getEnd(i);
shared32[INDEX_NUM_SHIFTED_OFF] += 1;
return sharedBytes.subarray(off, end);

This comment has been minimized.

@piscisaureus

piscisaureus Mar 15, 2019

Collaborator

Here's another performance sensitive spot where a TypedArray gets constructed.

@ry

This comment has been minimized.

Copy link
Collaborator Author

ry commented Mar 15, 2019

I don't understand how this can be slower whereas core_improvements2 is twice as fast as master.
What are we missing here?

We've landed all the //core changes from core_improvements2 in #1904 - so as far as //core/http_bench.js - it's the same code pretty much.

@piscisaureus

This comment has been minimized.

Copy link
Collaborator

piscisaureus commented Mar 15, 2019

I don't understand how this can be slower whereas core_improvements2 is twice as fast as master.
What are we missing here?

We've landed all the //core changes from core_improvements2 in #1904 - so as far as //core/http_bench.js - it's the same code pretty much.

Even more surprising then right?
Am I being dumb or ... ?

@piscisaureus

This comment has been minimized.

Copy link
Collaborator

piscisaureus commented Mar 15, 2019

[02:54, 3/15/2019] piscisaureus: I'm curious how the two branches compare
[02:54, 3/15/2019] piscisaureus: Did we make any progress at all
[02:54, 3/15/2019] Ryan Dahl: vs master?
[02:54, 3/15/2019] piscisaureus: Yeah
[02:55, 3/15/2019] Ryan Dahl: using v0.3.3 .. yes it's about 2x better
[02:55, 3/15/2019] piscisaureus: Oh ok!
[02:55, 3/15/2019] piscisaureus: That's encouraging
[02:56, 3/15/2019] Ryan Dahl: 17k vs 31k

Maybe I was misreading this, and you meant v0.3.3 already includes the 2x performance increase.

@ry

This comment has been minimized.

Copy link
Collaborator Author

ry commented Mar 15, 2019

Maybe I was misreading this, and you meant v0.3.3 already includes the 2x performance increase.

That was regarding "deno tests/http_bench.ts" ... this is about "deno_core_http_bench" (aka //core/http_bench.js)

in v0.3.3 vs core_integration2, we are 2x faster. This branch does not include integrations into src - it's only changing the interface of //core/

The naming is super confusing. We need to rename soon.

@piscisaureus
Copy link
Collaborator

piscisaureus left a comment

Good to land as-is IMO, so here's the approval.
Please ping me if you end up making big changes, I'd like to look again at the final version then.

@ry

This comment has been minimized.

Copy link
Collaborator Author

ry commented Mar 15, 2019

Getting this failure on arm64

$CARGO_TARGET_DIR/aarch64-unknown-linux-gnu/release/deno tests/002_hello.ts
qemu: Unsupported syscall: 278
qemu: Unsupported syscall: 278

cc @afinch7 (any idea?)

@ry

This comment has been minimized.

Copy link
Collaborator Author

ry commented Mar 15, 2019

@piscisaureus I'm getting this CI failure on windows - any idea?

FAILED: deno_core_http_bench_test.exe deno_core_http_bench_test.exe.pdb 
ninja -t msvc -e environment.x64 -- ../../third_party/llvm-build/Release+Asserts/bin/lld-link.exe /nologo /OUT:./deno_core_http_bench_test.exe /PDB:./deno_core_http_bench_test.exe.pdb @./deno_core_http_bench_test.exe.rsp
lld-link: error: <root>: undefined symbol: mainCRTStartup
lld-link: error: undefined symbol: __CxxFrameHandler3
>>> referenced by rust_crates/deno_core_http_bench_test_bin.o:(.xdata)
lld-link: error: undefined symbol: __CxxFrameHandler3
>>> referenced by libtest-70b4d6fe9ba4c17b.rlib(test-70b4d6fe9ba4c17b.test.7h7r2drh-cgu.0.rcgu.o):(.xdata)
>>> referenced by libtest-70b4d6fe9ba4c17b.rlib(test-70b4d6fe9ba4c17b.test.7h7r2drh-cgu.0.rcgu.o):(.xdata)
>>> referenced by libtest-70b4d6fe9ba4c17b.rlib(test-70b4d6fe9ba4c17b.test.7h7r2drh-cgu.0.rcgu.o):(.xdata)
>>> referenced by libtest-70b4d6fe9ba4c17b.rlib(test-70b4d6fe9ba4c17b.test.7h7r2drh-cgu.0.rcgu.o):(.xdata)
>>> referenced by libtest-70b4d6fe9ba4c17b.rlib(test-70b4d6fe9ba4c17b.test.7h7r2drh-cgu.0.rcgu.o):(.xdata)
>>> referenced by libtest-70b4d6fe9ba4c17b.rlib(test-70b4d6fe9ba4c17b.test.7h7r2drh-cgu.0.rcgu.o):(.xdata)
@piscisaureus

This comment has been minimized.

Copy link
Collaborator

piscisaureus commented Mar 15, 2019

@piscisaureus I'm getting this CI failure on windows - any idea?

hmm

@ry ry force-pushed the ry:core_non_generic branch 3 times, most recently from f3e8cf1 to ae56081 Mar 15, 2019

@piscisaureus

This comment has been minimized.

Copy link
Collaborator

piscisaureus commented Mar 15, 2019

Obviously it is missing msvcrt symbols. But why isn't it linking the crt... :/

@ry ry force-pushed the ry:core_non_generic branch from ae56081 to 8b4b686 Mar 15, 2019

@piscisaureus

This comment has been minimized.

Copy link
Collaborator

piscisaureus commented Mar 15, 2019

@ry Pushed a fix, please don't force push over it.

ry added some commits Mar 14, 2019

core: Behavior shouldn't be generic
We always pass around Box<[u8]>, and adding this generic is an
unnecessary complication.

Add deno_core_http_bench_test to test.py

sharedQueue works on deno_core_http_bench

@ry ry force-pushed the ry:core_non_generic branch from 3ca1d87 to c61e6ab Mar 15, 2019

@ry

This comment has been minimized.

Copy link
Collaborator Author

ry commented Mar 15, 2019

@piscisaureus squashing up, the changes are included

@ry ry merged commit 1e3509d into denoland:master Mar 15, 2019

2 checks passed

continuous-integration/appveyor/pr AppVeyor build succeeded
Details
license/cla Contributor License Agreement is signed.
Details
@ry

This comment has been minimized.

Copy link
Collaborator Author

ry commented Mar 15, 2019

Here's what the 9% slowdown looks like on the benchmarks page now that it has landed:

Screen Shot 2019-03-15 at 11 41 05 AM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.