
Refactor zero-copy buffers for performance and to prevent memory leaks #2238

Merged: 2 commits merged into denoland:master from piscisaureus:aba_refcount on May 1, 2019

Conversation (2 participants)

@piscisaureus (Collaborator) commented Apr 28, 2019:

  • In order to prevent ArrayBuffers from getting garbage collected by V8,
    we used to store a v8::Persistent<ArrayBuffer> in a map. This patch
    introduces a custom ArrayBuffer allocator which doesn't use Persistent
    handles, but instead stores a pointer to the actual ArrayBuffer data
    along with a reference count. Since creating Persistent handles has
    quite a bit of overhead, this change significantly increases
    performance. Various HTTP server benchmarks report about 5-10% more
    requests per second than before.

  • Previously the Persistent handle that prevented garbage collection had
    to be released manually, and this wasn't always done, which caused
    memory leaks. This has been resolved by introducing a new PinnedBuf
    type in both Rust and C++ that automatically re-enables garbage
    collection when it goes out of scope (see the sketch after this list).

  • Zero-copy buffers are now correctly wrapped in an Option if there is a
    possibility that they're not present. This clears up a correctness
    issue where we were creating zero-length slices from a null pointer,
    which is against the rules.
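
The sketch below is a rough Rust illustration of that ownership pattern, not the actual deno_core implementation; the type name and the allocator hooks are made up for the example. A pinned buffer bumps the allocator's reference count when created and releases it automatically when dropped, so garbage collection resumes without any manual bookkeeping.

use std::ops::Deref;
use std::slice;

// Stand-ins for the allocator hooks (hypothetical, for illustration only).
fn allocator_ref(_data: *mut u8) { /* would increment the allocator's ref count */ }
fn allocator_unref(_data: *mut u8) { /* would decrement it; GC resumes at zero */ }

/// Keeps the backing ArrayBuffer bytes alive for as long as this value lives.
pub struct PinnedBufSketch {
    data_ptr: *mut u8,
    data_len: usize,
}

impl PinnedBufSketch {
    /// Safety: `ptr` must point to `len` valid bytes owned by the allocator.
    pub unsafe fn new(ptr: *mut u8, len: usize) -> Self {
        allocator_ref(ptr);
        PinnedBufSketch { data_ptr: ptr, data_len: len }
    }
}

impl Deref for PinnedBufSketch {
    type Target = [u8];
    fn deref(&self) -> &[u8] {
        unsafe { slice::from_raw_parts(self.data_ptr, self.data_len) }
    }
}

impl Drop for PinnedBufSketch {
    // Going out of scope releases the pin, re-enabling garbage collection.
    fn drop(&mut self) {
        allocator_unref(self.data_ptr);
    }
}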


@piscisaureus force-pushed the piscisaureus:aba_refcount branch from 993933c to d5b7520 on Apr 28, 2019

@piscisaureus changed the title from "TRY: core: use ArrayBuffer allocator lock zero-copy buffers in memory" to "TRY: core: use ArrayBuffer allocator to lock zero-copy buffers in memory" on Apr 28, 2019

@piscisaureus force-pushed the piscisaureus:aba_refcount branch 4 times, most recently from 182f806 to 32b987e on Apr 28, 2019

@piscisaureus changed the title from "TRY: core: use ArrayBuffer allocator to lock zero-copy buffers in memory" to "core: use ArrayBuffer allocator to lock zero-copy buffers in memory" on Apr 28, 2019

@piscisaureus requested a review from ry on Apr 28, 2019

@@ -25,6 +25,56 @@ struct ModuleInfo {
}
};

class DenoArrayBufferAllocator : public v8::ArrayBuffer::Allocator {

@ry (Collaborator) commented Apr 28, 2019:

Would be nice to put this in a separate file... everything is kinda slopped together into binding.cc and internal.h... I've been slowly trying to break it up as I add stuff

@ry (Collaborator) commented Apr 29, 2019:

s/DenoArrayBufferAllocator/ArrayBufferAllocator/g

@piscisaureus force-pushed the piscisaureus:aba_refcount branch 4 times, most recently from f616697 to 90645b4 on Apr 29, 2019

.pending_ops
.iter_mut()
.filter(|op| !op.zero_copy_alloc_ptr.is_null())
.map(|op| replace(&mut op.zero_copy_alloc_ptr, null_mut()))

@ry (Collaborator) commented Apr 29, 2019:

I don't understand what's happening here - is this setting op.zero_copy_alloc_ptr to null? (Consider adding a comment.)

.collect::<Vec<_>>();
zero_copy_bufs
.into_iter()
.for_each(|i| self.zero_copy_release(i));

@ry (Collaborator) commented Apr 29, 2019:

Is i a pointer? I think ptr would be a better name.
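
For readers puzzling over the chain above: std::mem::replace swaps null_mut() into each op's zero_copy_alloc_ptr and hands back the previous value, so every non-null pointer is collected exactly once for release and the op no longer references it. A self-contained sketch of that pattern (PendingOp and the function name are illustrative, not the actual deno_core types):

use std::mem::replace;
use std::ptr::null_mut;

struct PendingOp {
    // Null when no zero-copy buffer is attached to the op.
    zero_copy_alloc_ptr: *mut u8,
}

// Null out every op's pointer and return the old, non-null pointers for release.
fn take_zero_copy_ptrs(pending_ops: &mut [PendingOp]) -> Vec<*mut u8> {
    pending_ops
        .iter_mut()
        .filter(|op| !op.zero_copy_alloc_ptr.is_null())
        .map(|op| replace(&mut op.zero_copy_alloc_ptr, null_mut()))
        .collect()
}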

@@ -233,16 +244,22 @@ impl Isolate {
// return value.
// TODO(ry) check that if JSError thrown during respond(), that it will be
// picked up.
if !zero_copy_alloc_ptr.is_null() {
isolate.zero_copy_release(zero_copy_alloc_ptr);
}

@ry (Collaborator) commented Apr 29, 2019:

This seems to be fixing a fairly serious memory leak. I'm trying to think of a way we could have tested for this. Any suggestions? (I was thinking we could have a counter for every time a zero copy buf is dropped, and then check the metric in the unit tests below.)

@piscisaureus (Author, Collaborator) replied Apr 29, 2019:

So although I did not add a "named test" for it, it definitely gets tested: the CHECK(ref_count_map_.empty()) in ~ArrayBufferAllocator() checks that, and it runs on every invocation of deno.
Adding another contrived test for it is totally pointless.

What we are missing is a way to "test the tests", it'd be nice if I could write a program that intentionally leaks a buffer and verify if that actually crashes. Any ideas where that would go?

@ry (Collaborator) replied Apr 29, 2019:

What we are missing is a way to "test the tests", it'd be nice if I could write a program that intentionally leaks a buffer and verify if that actually crashes. Any ideas where that would go?

In Rust you can do #[should_panic] above a test...
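
For reference, a minimal generic example of that attribute (unrelated to this PR's allocator check, which runs in C++):

#[cfg(test)]
mod tests {
    // Hypothetical invariant check standing in for the allocator's CHECK.
    fn release(already_released: bool) {
        if already_released {
            panic!("buffer released twice");
        }
    }

    #[test]
    #[should_panic(expected = "released twice")]
    fn double_release_panics() {
        release(true);
    }
}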

@piscisaureus (Author, Collaborator) replied Apr 29, 2019:

In Rust you can do #[should_panic] above a test...

Unfortunately the check runs in the C++ destructor of a static variable, so I don't think it would lead to a Rust panic. Rust will have already returned from panic_unwind() (and is thus no longer able to catch panics) by the time that happens.

@@ -3,6 +3,7 @@
#define INTERNAL_H_

#include <map>
#include <mutex> // NOLINT

@ry (Collaborator) commented Apr 29, 2019:

Why NOLINT?

@@ -25,6 +26,60 @@ struct ModuleInfo {
}
};

class ArrayBufferAllocator : public v8::ArrayBuffer::Allocator {

@ry (Collaborator) commented Apr 29, 2019:

Consider moving this to core/libdeno/allocator.h

@ry (Collaborator) approved these changes Apr 29, 2019 and left a comment:

LGTM - very nice!

Mention that this is done for perf in the commit message when you land.

@piscisaureus force-pushed the piscisaureus:aba_refcount branch 2 times, most recently from 4558ff9 to f607ba3 on Apr 30, 2019

@piscisaureus (Author, Collaborator) commented Apr 30, 2019:

@ry I tucked another huge refactor onto the end, so if you care, give it another look.
There's a lot more deno_buf-related code that can be removed now, but I want to do that in a follow-up PR; the diff is pretty big already.

std::mutex ref_count_map_mutex_;
};

class PinnedBufPin {

@ry (Collaborator) commented May 1, 2019:

This could use some explanatory comments. It's not immediately clear to me what this does.

@@ -80,11 +80,11 @@ fn empty_buf() -> Buf {
pub fn dispatch_all(
state: &ThreadSafeState,
control: &[u8],
zero_copy: deno_buf,
zero_copy: Option<PinnedBuf>,

@ry (Collaborator) commented May 1, 2019:

Nice to see the Option here
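
A rough sketch of what the Option buys the dispatch path, with a placeholder PinnedBuf type standing in for the real one: the no-buffer case is an explicit None instead of a zero-length slice built from a null pointer.

// Placeholder for deno's PinnedBuf; only the behavior needed below is sketched.
struct PinnedBuf(Vec<u8>);

impl PinnedBuf {
    fn as_slice(&self) -> &[u8] {
        &self.0
    }
}

fn dispatch_sketch(control: &[u8], zero_copy: Option<PinnedBuf>) {
    match zero_copy {
        Some(buf) => {
            // The buffer stays pinned until `buf` is dropped at the end of this arm.
            println!("{} control bytes, {} zero-copy bytes", control.len(), buf.as_slice().len());
        }
        None => {
            // No zero-copy buffer for this op; nothing to pin or release.
            println!("{} control bytes, no zero-copy buffer", control.len());
        }
    }
}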

friend class PinnedBufPin;

void Ref(void* data) {
std::lock_guard<std::mutex> lock(ref_count_map_mutex_);

@ry (Collaborator) commented May 1, 2019:

Consider adding a TODO which discusses the ref count in the allocation that we talked about.
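
For context, the allocator keeps a mutex-guarded map from buffer pointer to pin count (the ref_count_map_ mentioned elsewhere in this thread). A conceptual Rust sketch of that bookkeeping follows; the real code is C++ in core/libdeno, and the names here are illustrative:

use std::collections::HashMap;
use std::sync::Mutex;

// Conceptual stand-in for the C++ ref_count_map_ guarded by ref_count_map_mutex_.
// Keys are buffer addresses; values are pin counts.
struct RefCountMap {
    counts: Mutex<HashMap<usize, usize>>,
}

impl RefCountMap {
    fn new() -> Self {
        RefCountMap { counts: Mutex::new(HashMap::new()) }
    }

    // Called when a buffer is pinned (a PinnedBufPin is created).
    fn incr(&self, data: *mut u8) {
        let mut counts = self.counts.lock().unwrap();
        *counts.entry(data as usize).or_insert(0) += 1;
    }

    // Called when a pin is dropped. Returns true once the last pin is gone,
    // i.e. when garbage collection of the buffer may be re-enabled.
    fn decr(&self, data: *mut u8) -> bool {
        let mut counts = self.counts.lock().unwrap();
        match counts.get_mut(&(data as usize)) {
            Some(n) if *n > 1 => {
                *n -= 1;
                false
            }
            Some(_) => {
                counts.remove(&(data as usize));
                true
            }
            None => false,
        }
    }
}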


@ry referenced this pull request May 1, 2019: core: express op as enum #2255 (merged)

Raw AsRaw() { return BitMove<Raw>(this); }
};

class PinnedBuf {

@ry (Collaborator) commented May 1, 2019:

I guess this is based off some Rust design? If so, can you mention that in the comments, with a link where people can read what inspired it?

@ry (Collaborator) commented May 1, 2019:

this branch deno_core_http_bench

~/src/deno> ./third_party/wrk/mac/wrk -d 20s http://127.0.0.1:4544/
Running 20s test @ http://127.0.0.1:4544/
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   195.79us  358.70us  19.82ms   99.78%
    Req/Sec    25.46k     1.91k   27.83k    83.08%
  1018097 requests in 20.10s, 49.52MB read
Requests/sec:  50652.61
Transfer/sec:      2.46MB

master (c36b5d) deno_core_http_bench

~/src/deno> ./third_party/wrk/mac/wrk -d 20s http://127.0.0.1:4544/
Running 20s test @ http://127.0.0.1:4544/
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   205.82us   60.99us   4.50ms   93.11%
    Req/Sec    23.21k     1.84k   26.35k    72.89%
  928136 requests in 20.10s, 45.14MB read
Requests/sec:  46176.63
Transfer/sec:      2.25MB

node v12.1.0 using tools/node_tcp.js

~/src/deno> ./third_party/wrk/mac/wrk -d 20s http://127.0.0.1:4544/
Running 20s test @ http://127.0.0.1:4544/
  2 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   235.46us  155.12us   9.81ms   91.76%
    Req/Sec    20.78k     1.89k   27.41k    87.06%
  831459 requests in 20.10s, 40.44MB read
Requests/sec:  41365.67
Transfer/sec:      2.01MB

nice work!

let buf = vec.into_boxed_slice();
(false, Box::new(futures::future::ok(buf)))
}
config.dispatch(move |control: &[u8], _zero_copy_buf| -> (bool, Box<Op>) {

@ry (Collaborator) commented May 1, 2019:

Either remove the type from control or add it to _zero_copy_buf

fn drop(&mut self) {
unsafe {
let raw = &mut *(self as *mut PinnedBufPin as *mut PinnedBufPinRaw);
deno_zero_copy_release(raw);

@ry (Collaborator) commented May 1, 2019:

Maybe s/deno_zero_copy_release/deno_pinned_buf_release/

explicit PinnedBufPin(void* ptr) : ptr_(ptr) {
ArrayBufferAllocator::global().Ref(ptr);
}
explicit PinnedBufPin(Raw raw) { *this = BitMove<PinnedBufPin>(&raw); }

@ry (Collaborator) commented May 1, 2019:

Add a comment saying this copies the memory in raw

I see. It's not copying the memory of raw, it's just copying the pointer.

This code would be better if it was

explicit PinnedBufPin(Raw raw) { this->ptr_ = raw.ptr; }

... even after staring at this for a few minutes it's very unclear to me what is happening. Is there some way to simplify this? Preferably by being explicit and with less generics, or if BitCopy and AutoDeref actually serve a functional purpose, then with comments.

ArrayBufferAllocator::global().Ref(ptr);
}
explicit PinnedBufPin(Raw raw) { *this = BitMove<PinnedBufPin>(&raw); }
Raw AsRaw() { return BitMove<Raw>(this); }

@ry (Collaborator) commented May 1, 2019:

Add a comment saying this copies the memory in raw


@piscisaureus force-pushed the piscisaureus:aba_refcount branch from f607ba3 to 6e31604 on May 1, 2019

@ry (Collaborator) commented May 1, 2019:

LGTM

@piscisaureus force-pushed the piscisaureus:aba_refcount branch 2 times, most recently from 9073b6a to c0d5c3c on May 1, 2019

Refactor zero-copy buffers for performance and to prevent memory leaks (commit message; body identical to the PR description above)

@piscisaureus force-pushed the piscisaureus:aba_refcount branch from c0d5c3c to 41c7e96 on May 1, 2019

@piscisaureus changed the title from "core: use ArrayBuffer allocator to lock zero-copy buffers in memory" to "@piscisaureus Refactor zero-copy buffers for performance and to prevent memory leaks" on May 1, 2019

@piscisaureus changed the title from "@piscisaureus Refactor zero-copy buffers for performance and to prevent memory leaks" to "Refactor zero-copy buffers for performance and to prevent memory leaks" on May 1, 2019

@piscisaureus merged commit 41c7e96 into denoland:master on May 1, 2019

4 checks passed:

Travis CI - Branch: Build Passed
Travis CI - Pull Request: Build Passed
continuous-integration/appveyor/pr: AppVeyor build succeeded
license/cla: Contributor License Agreement is signed

@piscisaureus deleted the piscisaureus:aba_refcount branch May 1, 2019

@ry (Collaborator) commented on 41c7e96 May 2, 2019:

The 5-10% perf improvement can be seen here:
[screenshot: requests-per-second benchmark chart, May 2, 2019]
