High memory usage #3

Closed
nickdesaulniers opened this Issue Nov 2, 2015 · 54 comments

nickdesaulniers Nov 2, 2015

Hi there,
I'm currently benchmarking brotli to write a blog post about support for brotli in Firefox 44+. While running wrk -c 100 -t 6 -d 30s -H 'Accept-Encoding: br' https://localhost:3000 against a custom Node.js server:

var accepts = require('accepts');
var fs = require('fs');
var https = require('https');
var brotli = require('iltorb').compressStream;
var lzma = require('lzma-native').createStream.bind(null, 'aloneEncoder');
var gzip = require('zlib').createGzip;

var filename = 'lorem_ipsum.txt';

function onRequest (req, res) {
  res.setHeader('Content-Type', 'text/html');

  var encodings = new Set(accepts(req).encodings());

  if (encodings.has('br')) {
    res.setHeader('Content-Encoding', 'br');
    fs.createReadStream(filename).pipe(brotli()).pipe(res);
  } else if (encodings.has('lzma')) {
    res.setHeader('Content-Encoding', 'lzma');
    fs.createReadStream(filename).pipe(lzma()).pipe(res);
  } else if (encodings.has('gzip')) {
    res.setHeader('Content-Encoding', 'gzip');
    fs.createReadStream(filename).pipe(gzip()).pipe(res);
  } else {
    fs.createReadStream(filename).pipe(res);
  }
}

var certs = {
  key: fs.readFileSync('./https-key.pem'),
  cert: fs.readFileSync('./https-cert.pem'),
};

https.createServer(certs, onRequest).listen(3000);

and measuring the peak resident set size, I saw about 9.8 GB of memory usage. Compare that to 330 MB for LZMA, 204 MB for GZIP, and 125 MB for uncompressed.

I assumed that objects were being created that used a lot of memory, but I would expect them to get GC'd once finished.

Using top -pid <node process id> after wrk finished, I observed the memory usage of the process stabilize around 8.2 GB. It looks to me like this library has a memory leak.

MayhemYDG Nov 3, 2015

Owner

Thanks for the report.

First of all, I suggest decreasing the lgwin value, that should help with peak memory usage.

As for the memory leak, I've tried this, which helps, but I believe it doesn't fix the problem entirely. Could you try it? It's on the mem branch.
npm doesn't seem to like submodules, so you'll have to clone the repo itself and build it with node-gyp build.

I'm a C++ newbie so any help in that department is welcome.


MayhemYDG added the bug label Nov 3, 2015

nickdesaulniers Nov 3, 2015

> I'm a C++ newbie so any help in that department is welcome.

I think any takeaway would greatly benefit the community, a war story that would help other developers. I'll take a look tomorrow and post back with the results.

nickdesaulniers Nov 3, 2015

Branch mem cuts down peak memory (RSS) from 9.8 GB to 3.4 GB. After wrk finishes and GC happens, memory is down to 671 MB. Before running wrk, the server is only using 15 MB, so it looks like there still is a leak. Will do more digging.

nickdesaulniers Nov 3, 2015

So I found this with asan's leak checker:

Direct leak of 13330 byte(s) in 10 object(s) allocated from:
    #0 0x7fb7a5a6d827 in __interceptor_malloc (/usr/lib/gcc/x86_64-linux-gnu/4.9/libasan.so+0x57827)
    #1 0x7fb77d1b7715 in StreamEncodeWorker::Execute() ../src/enc/stream_encode_worker.cc:14
    #2 0x7fb77d1b17a1 in Nan::AsyncExecute(uv_work_s*) ../node_modules/nan/nan.h:1693
    #3 0xe0f0d8 in worker ../deps/uv/src/threadpool.c:91
    #4 0xe1c990 in uv__thread_start ../deps/uv/src/unix/thread.c:49
    #5 0x7fb7a4bc66a9 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76a9)

Which is https://github.com/MayhemYDG/iltorb/blob/master/src/enc/stream_encode_worker.cc#L14
So it looks like StreamEncodeWorker should have a destructor that calls free on output_buffer if it's not null: https://github.com/MayhemYDG/iltorb/blob/master/src/enc/stream_encode_worker.h#L19

I had to change the following lines in my binding.gyp:

      "cflags" : ["-Wno-sign-compare", "-O0", "-g", "-fsanitize=address", "-fno-omit-frame-pointer"],
      "ldflags": ["-fsanitize=address"],

then build with node-gyp rebuild --verbose, then run with ASAN_OPTIONS=alloc_dealloc_mismatch=0:detect_leaks=1 LD_PRELOAD=/usr/lib/gcc/x86_64-linux-gnu/4.9/libasan.so node custom.js. I had to modify my server to:

  if (encodings.has('br')) {
    res.setHeader('Content-Encoding', 'br');
    fs.createReadStream(filename).pipe(brotli()).pipe(res).on('finish', function () {
      if (++counter === 10) {
        process.exit(1);
      }
    });
  }

otherwise asan's leak checker does not run since the process does not exit.


MayhemYDG Nov 3, 2015

Owner

> StreamEncodeWorker should have a destructor that calls free on output_buffer

I don't believe so.

I call Nan::NewBuffer using the output_buffer:
https://github.com/MayhemYDG/iltorb/blob/master/src/enc/stream_encode_worker.cc#L32

According to the NAN docs:

> it is assumed that the ownership of the pointer is being transferred to the new Buffer for management.
> [...]
> You must not free the memory space manually once you have created a Buffer in this way.
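
For reference, a minimal sketch of what that ownership transfer looks like (the wrapper function is illustrative, not iltorb's actual code):

#include <nan.h>

// Nan::NewBuffer takes ownership of a malloc'd pointer: the resulting
// Buffer frees the memory when it is GC'd, so calling free(output)
// ourselves afterwards would be a double free.
v8::Local<v8::Object> WrapOutput(char* output, uint32_t output_length) {
  return Nan::NewBuffer(output, output_length).ToLocalChecked();
}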


nickdesaulniers Nov 3, 2015

oops, yep, just tried this, got a double free...

addaleax Nov 3, 2015

Contributor

@nickdesaulniers Btw, by passing --expose-gc at the node.js command line you can enable the global gc function; you can then just call gc(); (e.g. before process.exit) to force V8 to perform a garbage collection… maybe that’s helpful? It has been quite handy for me when looking for memory leaks.


nickdesaulniers Nov 3, 2015

OK, added the patch from the mem branch, added gc() before process.exit, and increased the counter limit to 1000. The top two offenders in the asan leak-check report look like:

Direct leak of 117040 byte(s) in 70 object(s) allocated from:
    #0 0x7f6cdebee1af in operator new(unsigned long) (/usr/lib/gcc/x86_64-linux-gnu/4.9/libasan.so+0x581af)
    #1 0xcb7791 in node::Parser::New(v8::FunctionCallbackInfo<v8::Value> const&) (/home/nick/.nvm/versions/node/v0.12.2/bin/node+0xcb7791)
    #2 0x82a6db in v8::internal::FunctionCallbackArguments::Call(void (*)(v8::FunctionCallbackInfo<v8::Value> const&)) (/home/nick/.nvm/versions/node/v0.12.2/bin/node+0x82a6db)
    #3 0x84cf50 in v8::internal::Builtin_HandleApiCallConstruct(int, v8::internal::Object**, v8::internal::Isolate*) (/home/nick/.nvm/versions/node/v0.12.2/bin/node+0x84cf50)
    #4 0x3e8118c060a1 (+0xa1)
    #5 0x3e8118c1ef51 (+0x18f51)
    #6 0x3e8118d4062a (+0x4162a)
    #7 0x3e8118c24c65 (+0x1ec65)
    #8 0x3e8118d40506 (+0x41506)
    #9 0x3e8118d3f8b4 (+0x408b4)
    #10 0x3e8118d077dd (+0x87dd)
    #11 0x3e8118c1e9f4 (+0x189f4)
    #12 0x3e8118d3ee91 (+0x3fe91)
    #13 0x3e8118d07789 (+0x8789)
    #14 0x3e8118d3e3e4 (+0x3f3e4)
    #15 0x3e8118d16c43 (+0x17c43)
    #16 0x3e8118c1f05f (+0x1905f)
    #17 0x3e8118c1dfaf (+0x17faf)
    #18 0x916222 in v8::internal::Invoke(bool, v8::internal::Handle<v8::internal::JSFunction>, v8::internal::Handle<v8::internal::Object>, int, v8::internal::Handle<v8::internal::Object>*) (/home/nick/.nvm/versions/node/v0.12.2/bin/node+0x916222)
    #19 0x9172c7 in v8::internal::Execution::Call(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, int, v8::internal::Handle<v8::internal::Object>*, bool) (/home/nick/.nvm/versions/node/v0.12.2/bin/node+0x9172c7)
    #20 0xb64329 in v8::internal::Runtime_Apply(int, v8::internal::Object**, v8::internal::Isolate*) (/home/nick/.nvm/versions/node/v0.12.2/bin/node+0xb64329)
    #21 0x3e8118c060a1 (+0xa1)
    #22 0x3e8118d1acc0 (+0x1bcc0)
    #23 0x3e8118c1f05f (+0x1905f)
    #24 0x3e8118c1dfaf (+0x17faf)
    #25 0x916222 in v8::internal::Invoke(bool, v8::internal::Handle<v8::internal::JSFunction>, v8::internal::Handle<v8::internal::Object>, int, v8::internal::Handle<v8::internal::Object>*) (/home/nick/.nvm/versions/node/v0.12.2/bin/node+0x916222)
    #26 0x9172c7 in v8::internal::Execution::Call(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, int, v8::internal::Handle<v8::internal::Object>*, bool) (/home/nick/.nvm/versions/node/v0.12.2/bin/node+0x9172c7)
    #27 0x814879 in v8::Function::Call(v8::Handle<v8::Value>, int, v8::Handle<v8::Value>*) (/home/nick/.nvm/versions/node/v0.12.2/bin/node+0x814879)
    #28 0xc91d5d in node::AsyncWrap::MakeCallback(v8::Handle<v8::Function>, int, v8::Handle<v8::Value>*) (/home/nick/.nvm/versions/node/v0.12.2/bin/node+0xc91d5d)
    #29 0xcef69e in node::TLSCallbacks::SSLInfoCallback(ssl_st const*, int, int) (/home/nick/.nvm/versions/node/v0.12.2/bin/node+0xcef69e)

Direct leak of 43472 byte(s) in 26 object(s) allocated from:
    #0 0x7f6cdebee1af in operator new(unsigned long) (/usr/lib/gcc/x86_64-linux-gnu/4.9/libasan.so+0x581af)
    #1 0xcb7791 in node::Parser::New(v8::FunctionCallbackInfo<v8::Value> const&) (/home/nick/.nvm/versions/node/v0.12.2/bin/node+0xcb7791)
    #2 0x82a6db in v8::internal::FunctionCallbackArguments::Call(void (*)(v8::FunctionCallbackInfo<v8::Value> const&)) (/home/nick/.nvm/versions/node/v0.12.2/bin/node+0x82a6db)
    #3 0x84cf50 in v8::internal::Builtin_HandleApiCallConstruct(int, v8::internal::Object**, v8::internal::Isolate*) (/home/nick/.nvm/versions/node/v0.12.2/bin/node+0x84cf50)
    #4 0x3e8118c060a1 (+0xa1)
    #5 0x3e8118c1ef51 (+0x18f51)
    #6 0x3e8118d4062a (+0x4162a)
    #7 0x3e8118c24c65 (+0x1ec65)
    #8 0x3e8118d40506 (+0x41506)
    #9 0x3e8118d3f8b4 (+0x408b4)
    #10 0x3e8118d077dd (+0x87dd)
    #11 0x3e8118c1e9f4 (+0x189f4)
    #12 0x3e8118d3ee91 (+0x3fe91)
    #13 0x3e8118d07789 (+0x8789)
    #14 0x3e8118dc4f72 (+0xc5f72)
    #15 0x3e8118c1f05f (+0x1905f)
    #16 0x3e8118c1dfaf (+0x17faf)
    #17 0x916222 in v8::internal::Invoke(bool, v8::internal::Handle<v8::internal::JSFunction>, v8::internal::Handle<v8::internal::Object>, int, v8::internal::Handle<v8::internal::Object>*) (/home/nick/.nvm/versions/node/v0.12.2/bin/node+0x916222)
    #18 0x9172c7 in v8::internal::Execution::Call(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, int, v8::internal::Handle<v8::internal::Object>*, bool) (/home/nick/.nvm/versions/node/v0.12.2/bin/node+0x9172c7)
    #19 0xb64329 in v8::internal::Runtime_Apply(int, v8::internal::Object**, v8::internal::Isolate*) (/home/nick/.nvm/versions/node/v0.12.2/bin/node+0xb64329)
    #20 0x3e8118c060a1 (+0xa1)
    #21 0x3e8118d1acc0 (+0x1bcc0)
    #22 0x3e8118c1f05f (+0x1905f)
    #23 0x3e8118c1dfaf (+0x17faf)
    #24 0x916222 in v8::internal::Invoke(bool, v8::internal::Handle<v8::internal::JSFunction>, v8::internal::Handle<v8::internal::Object>, int, v8::internal::Handle<v8::internal::Object>*) (/home/nick/.nvm/versions/node/v0.12.2/bin/node+0x916222)
    #25 0x9172c7 in v8::internal::Execution::Call(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, int, v8::internal::Handle<v8::internal::Object>*, bool) (/home/nick/.nvm/versions/node/v0.12.2/bin/node+0x9172c7)
    #26 0x814879 in v8::Function::Call(v8::Handle<v8::Value>, int, v8::Handle<v8::Value>*) (/home/nick/.nvm/versions/node/v0.12.2/bin/node+0x814879)
    #27 0xc91d5d in node::AsyncWrap::MakeCallback(v8::Handle<v8::Function>, int, v8::Handle<v8::Value>*) (/home/nick/.nvm/versions/node/v0.12.2/bin/node+0xc91d5d)
    #28 0xcef69e in node::TLSCallbacks::SSLInfoCallback(ssl_st const*, int, int) (/home/nick/.nvm/versions/node/v0.12.2/bin/node+0xcef69e)
    #29 0x6d23e4 in ssl3_accept (/home/nick/.nvm/versions/node/v0.12.2/bin/node+0x6d23e4)


MayhemYDG Nov 3, 2015

Owner

> 117040 byte(s)
> 43472 byte(s)

That's less than 1 MB total; this doesn't look like it.


MayhemYDG Nov 3, 2015

Owner

Try reading from a larger file perhaps?


nickdesaulniers Nov 4, 2015

Even then, I don't see anything crazy high. Maybe this isn't a C/C++ memory leak, but one in JS. I could have sworn that the heap snapshot I took didn't indicate that, though.
[screenshot: heap snapshot, 2015-11-03]

I mean, I'm seeing 51% of system memory usage by the node.js process. Someone somewhere is retaining too much memory.

I'll try graphing memory over time; maybe I'm just seeing a sawtooth from the GC waiting longer and longer to run.

MayhemYDG Nov 4, 2015

Owner

From my own testing, the culprit does not reside in JS, as the heapTotal/heapUsed remained low throughout the test.
https://nodejs.org/api/process.html#process_process_memoryusage


nickdesaulniers Nov 4, 2015

@MayhemYDG but you did see a high RSS?

MayhemYDG Nov 4, 2015

Owner

Yes definitely.

Here's the test I wrote quickly:

'use strict';

var fs = require('fs');
var brotli = require('.');

var i = 100;

function compress() {
  fs.createReadStream('./test/fixtures/large.txt')
    .pipe(brotli.compressStream({lgwin: 1}));

  if (i-- > 0) {
    compress();
  }
}

compress();

function p(n) {
  return Math.round( n / (1024 * 1024) * 100 ) / 100;
}

function report() {
  var mem = process.memoryUsage();
  console.log(p(mem.rss), p(mem.heapTotal), p(mem.heapUsed));

  setTimeout(report, 500);
}

report();

You'll have to quit the process manually, but it lets you observe the memory over time.


nickdesaulniers Nov 4, 2015

In the meantime, can you merge the mem branch?

MayhemYDG Nov 4, 2015

Owner

Done.
@nickdesaulniers want me to release 1.0.6 right now?


nickdesaulniers Nov 4, 2015

yes please 🎉

MayhemYDG Nov 4, 2015

Owner

And done.
Still need to fix the remaining leaks, and investigate sync/async compression as well as the decompression methods.


nickdesaulniers Nov 4, 2015

Here's a heap profile from running valgrind --tool=massif node app.js. You can view it with ms_print massif.out.15609 | less.

MayhemYDG Nov 4, 2015

Owner

I've pushed two new commits to master, but they don't seem to make much of a difference. 202f156 1dd8e7e

@nickdesaulniers
Not sure what to make of this. Does the profile show the memory that remained allocated? Did you try @addaleax's suggestion?

@kkoopa
Could you clarify about resetting persistent handles as you mentioned here?
nan isn't obvious to JS devs that are new to C++.


kkoopa Nov 4, 2015

That really has nothing to do with NAN. Persistent handles are a V8 thing; NAN only wraps them to provide a unified API and semantics. Read the V8 embedder's guide, especially the section on Handles. Essentially, what you should do is define destructors for StreamDecode and StreamEncode wherein you call constructor.Reset();.
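
A minimal sketch of that suggestion (the class shape is assumed, not copied from iltorb; note that kkoopa retracts this advice further down, since the handle turns out to be static):

#include <nan.h>

// Hypothetical sketch: release the persistent handle to the constructor
// in the wrapped object's destructor. (Retracted below: the handle is
// static and shared by all instances, so a per-instance Reset() is not
// actually the fix.)
class StreamEncode : public Nan::ObjectWrap {
 public:
  ~StreamEncode() {
    constructor.Reset();  // drop the Nan::Persistent<v8::Function>
  }
 private:
  static Nan::Persistent<v8::Function> constructor;
};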


Owner

MayhemYDG commented Nov 4, 2015

@kkoopa
So if I get this right, I should revert 1dd8e7e and add constructor.Reset(); in the relevant destructors?
I can't find any example like that in the docs:
https://github.com/nodejs/nan/blob/master/doc/object_wrappers.md
https://nodejs.org/api/addons.html#addons_wrapping_c_objects
https://github.com/nodejs/node-addon-examples/blob/master/6_object_wrap/nan/myobject.cc

kkoopa Nov 4, 2015

Yes, that is what you should do.


MayhemYDG Nov 4, 2015

Owner

I think I figured out where the memory leak comes from.
The callbacks sent to the C++ code hold a reference to the wrapped objects, but somehow these callbacks are not destroyed, so the wrapped objects are never GC'd.


MayhemYDG Nov 4, 2015

Owner

Actually, the destructor is being called; I'm confused.


MayhemYDG Nov 4, 2015

Owner

If I don't explicitly call gc(), the wrapped objects won't get destructed unless I set a low --max_old_space_size.


MayhemYDG Nov 4, 2015

Owner

@nickdesaulniers
Could you try to profile 1.0.7?


nickdesaulniers Nov 4, 2015

Will do. Another idea: we can use things like https://github.com/andreasgal/finalize.js or https://www.npmjs.com/package/weak to verify whether things aren't getting GC'd. Tracking down why is more fun, but we could at least verify that that's the case.

nickdesaulniers Nov 4, 2015

iltorb 1.0.7 peak memory 3.4 GB, post wrk 675 MB.

wrk -c 100 -t 6 -d 30s -H 'Accept-Encoding: br' https://localhost:3000
# peak memory
/usr/bin/time -l node custom.js
# post wrk memory
top -pid `pgrep -n node`

MayhemYDG Nov 4, 2015

Owner

With compressStream({lgwin: 10}) you should get much lower memory usage.


kkoopa Nov 4, 2015

@MayhemYDG I was wrong regarding the Persistent leaks. I had missed that they were declared as static, so nothing was wrong with them.


MayhemYDG Nov 4, 2015

Owner

@kkoopa
... so I just have to remove the constructor.Reset(); in destructors now.


kkoopa Nov 4, 2015

Yes, that should do it. Sorry for the confusion.


nickdesaulniers Nov 4, 2015

So here's something interesting. I've modified the server to use the weak module, which lets you invoke a callback when an object is GC'd. I also manually trigger a GC after the wrk stress test.

var accepts = require('accepts');
var fs = require('fs');
var https = require('https');
var brotli = require('iltorb').compressStream;
var lzma = require('lzma-native').createStream.bind(null, 'aloneEncoder');
var gzip = require('zlib').createGzip;
var weak = require('weak');

var filename = 'lorem_ipsum.txt';
//var filename = 'reddit.com.10.01.15.html';

var outstanding = 0;
function onRequest (req, res) {
  res.setHeader('Content-Type', 'text/html');

  var encodings = new Set(accepts(req).encodings());

  if (encodings.has('br')) {
    res.setHeader('Content-Encoding', 'br');
    var br = brotli();
    ++outstanding;
    weak(br, function x () {
      --outstanding;
      console.log('a br was gc-d, outstanding: ', outstanding);
      x = null;
    });
    fs.createReadStream(filename).pipe(br).pipe(res);
  } else if (encodings.has('lzma')) {
    res.setHeader('Content-Encoding', 'lzma');
    fs.createReadStream(filename).pipe(lzma()).pipe(res);
  } else if (encodings.has('gzip')) {
    res.setHeader('Content-Encoding', 'gzip');
    fs.createReadStream(filename).pipe(gzip()).pipe(res);
  } else {
    fs.createReadStream(filename).pipe(res);
  }
}

var certs = {
  key: fs.readFileSync('./https-key.pem'),
  cert: fs.readFileSync('./https-cert.pem'),
};

https.createServer(certs, onRequest).listen(3000);

setTimeout(function () {
  console.log('gc!');
  gc();
}, 60000);

run with: node --expose-gc custom.js
stress tested with wrk -c 100 -t 6 -d 30s -H 'Accept-Encoding: br' https://localhost:3000
memory observed with:

top -pid `pgrep -n node`

stdout:

node --expose-gc custom.js
...
a br was gc-d, outstanding:  90
gc!
a br was gc-d, outstanding:  90
...
a br was gc-d, outstanding:  0

There's roughly a 30s pause before gc! is logged/triggered. During the pause, memory is stable at 2531 MB; after the gc it drops to 106 MB.

The key takeaway is: these objects are not getting GC'd. They are not necessarily leaking; it's just that the GC is not running. The million-dollar question is: why?

I suppose the GC probably has a heuristic along the lines of "well, you've been allocating GBs on the heap already, so it's not worth it for me to run a GC until the current RSS approaches previously seen levels."


MayhemYDG Nov 4, 2015

Owner

The JS side of iltorb was never the problem; the v8 heap always remained low (less than 10 MB) while the RSS can reach infinity and beyond.

The memory remains stable because the GC is lazy.
nodejs/node#3370 (comment)
https://github.com/nodejs/node/wiki/Frequently-Asked-Questions

I still think there's a leak in the C++ somewhere, maybe even in brotli itself.
lgblock dictates how much data brotli needs before encoding. If the lgblock is small, brotli encodes more often but uses less memory; the opposite goes for a big lgblock.
If you feed multiple iltorb.compressStream calls a large file and an lgblock of 24, while forcing gc(), you'll still have a few hundred megabytes in the RSS. With a small lgblock I get less than 100 MB RSS at the end of my test.
https://github.com/google/brotli/blob/master/enc/encode.h#L81-L82
(Dunno why I previously confused lgwin with lgblock, but lgwin does control it somewhat anyway.)
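For reference, a rough sketch of where those knobs live on the 0.x encoder (field names as I recall them from enc/encode.h of that era; treat as approximate):

// brotli::BrotliParams (enc/encode.h, 0.x era) carries both knobs.
brotli::BrotliParams params;
params.lgwin = 10;    // sliding window of 2^10 bytes: less memory, worse ratio
params.lgblock = 16;  // input block of 2^16 bytes: encode in smaller chunks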


simonlindholm Nov 7, 2015

I'd guess that v8 only counts memory it allocates by itself, and therefore doesn't know that it ought to GC. Have you looked into AdjustAmountOfExternalAllocatedMemory?
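
For context, a minimal sketch of how that V8 API is typically called from an addon (the helper function is illustrative):

#include <v8.h>

// Tell V8 about memory it can't see, so its GC heuristics account for it.
// Pass a positive delta after allocating and a negative one after freeing.
void ReportNativeAllocation(v8::Isolate* isolate, int64_t delta_bytes) {
  isolate->AdjustAmountOfExternalAllocatedMemory(delta_bytes);
}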

nickdesaulniers Nov 10, 2015

Interesting to look at how libxmljs does it.
Also lzma-stream, particularly this.


addaleax Nov 11, 2015

Contributor

Yes, the possibility to provide custom memory allocation functions to a library is great and often very helpful, esp. for bindings and/or memory-intensive applications (like compression 😃). zlib-based interfaces (zlib, libbzip2, liblzma, probably many more) provide these, but unfortunately, the brotli codebase was not written with this feature in mind.

I contemplated writing a PR for their code that adds this in order to help with this issue, but it looks like a rather large job: they freely mix C and C++ allocation primitives, many changes to function signatures would be necessary, and I don’t have the time on my hands to code this anytime soon. Maybe it’s a good idea to create an issue there to make them aware of this?


nickdesaulniers Nov 11, 2015

I think @addaleax is uniquely qualified to eloquently state to the developers what would need to be added to brotli.

addaleax referenced this issue in google/brotli Nov 11, 2015

Closed

Custom memory allocation #263

addaleax Nov 11, 2015

Contributor

Mh, having an overview of the memory used by brotli is not really an I/O thing. The above links to other bindings (libxmljs and my lzma-native) lead to code that works in the following way:

  • The underlying library has some support for supplying a custom malloc/realloc/free replacements.
  • The bindings write wrapper functions, e.g. a malloc replacement xmlMemMallocWrap which calls the “real” malloc and notes how many bytes were allocated, then passes this information to Nan::AdjustExternalMemory. The free replacement does everything the other way around.
  • These wrapper functions are then passed to the library on creation (see the call to xmlMemSetup for example). The library then uses these functions for all memory management.

Note that Nan::AdjustExternalMemory should be called only from the main thread (iirc, otherwise with current V8 versions it silently does nothing), so what I did in lzma-native is store the information in some counter (nonAdjustedExternalMemory) and pass it to Nan::AdjustExternalMemory when the work was done, which turned out to work quite well.
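
A rough sketch of that pattern (TrackedMalloc/TrackedFree and the size-prefix trick are illustrative, not lzma-native's or libxmljs's actual code):

#include <nan.h>
#include <atomic>
#include <cstdlib>

// Bytes allocated off the main thread and not yet reported to V8.
static std::atomic<int64_t> non_adjusted_memory{0};

// malloc replacement: prefix each block with its size so the free
// replacement knows how much to subtract later.
void* TrackedMalloc(size_t size) {
  size_t* p = static_cast<size_t*>(std::malloc(sizeof(size_t) + size));
  if (p == nullptr) return nullptr;
  *p = size;
  non_adjusted_memory += static_cast<int64_t>(size);
  return p + 1;
}

void TrackedFree(void* ptr) {
  if (ptr == nullptr) return;
  size_t* p = static_cast<size_t*>(ptr) - 1;
  non_adjusted_memory -= static_cast<int64_t>(*p);
  std::free(p);
}

// Called on the main thread (e.g. when an async worker completes) to
// hand the accumulated delta to V8's GC heuristics.
void FlushExternalMemoryDelta() {
  Nan::AdjustExternalMemory(static_cast<int>(non_adjusted_memory.exchange(0)));
}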


addaleax Nov 12, 2015

Contributor

Btw, the paragraph from the Python docs which I copied to google/brotli#263 explains pretty well how this works from the garbage collector’s point of view, maybe that’s a bit clearer than what I wrote :)


MayhemYDG Nov 16, 2015

Owner

So if I understand this correctly, I cannot right now use Nan::AdjustExternalMemory in any meaningful way?


addaleax Nov 16, 2015

Contributor

I’m afraid so, yes; you can use it for everything allocated directly by this library, but I’m afraid that’s not much in comparison to brotli’s internal structures. I don’t exactly know what the brotli maintainers mean by “soon”, but it’s probably easiest to just wait until they have implemented custom memory allocation over there…


simonlindholm Nov 16, 2015

Well, I mean, you could also approximate the memory usage by the size of the compressed data plus/times some empirically gathered constants. There's no need for an exact value since it's only used for GC heuristics. But the added reliability gained by hooking allocators would certainly be nice, of course.
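
For instance, something along these lines (constants entirely made up for illustration; they would have to be measured):

// Report an estimate instead of exact usage: a fixed per-stream overhead
// plus a multiple of the input size. Hypothetical constants.
constexpr int64_t kPerStreamOverhead = 4 << 20;  // guess: ~4 MB per encoder
constexpr int64_t kBytesPerInputByte = 3;        // guess: 3x the input size

int64_t EstimateBrotliMemory(int64_t input_bytes) {
  return kPerStreamOverhead + kBytesPerInputByte * input_bytes;
}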

addaleax Nov 16, 2015

Contributor

@simonlindholm You’re absolutely right, that’s a possibility, and one can probably find some way to estimate the memory usage.


MayhemYDG Jan 31, 2016

Owner

I'm trying to update iltorb to use non-deprecated functions in brotli version 0.3.0, and make use of custom allocators somehow.

I started looking into BrotliDecompressBuffer at first, but it seems to require the output buffer to be already allocated? I can get the decoded size with BrotliDecompressedSize before decompressing, but it fails for the test with the largest file: test/fixtures/large.

I then tried looking into custom allocators but this is beyond my knowledge of C.

Any pointers are welcome. Pull requests are welcome as well.
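
For what it's worth, a sketch of the two-step pattern described above, as I understand the 0.3.x decode API (signatures paraphrased from that era's dec/decode.h; verify against the actual header):

#include <vector>
#include "dec/decode.h"

// Ask brotli for the decoded size, allocate, then decompress into it.
// BrotliDecompressedSize only works when the size is recoverable from
// the stream header, which is reportedly why it fails on the large
// multi-metablock fixture.
bool DecompressToVector(const uint8_t* in, size_t in_size,
                        std::vector<uint8_t>* out) {
  size_t decoded_size = 0;
  if (!BrotliDecompressedSize(in_size, in, &decoded_size)) return false;
  out->resize(decoded_size);
  return BrotliDecompressBuffer(in_size, in, &decoded_size, out->data()) ==
         BROTLI_RESULT_SUCCESS;
}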


addaleax added a commit to addaleax/iltorb that referenced this issue Jun 16, 2016

[squash] encoder: switch to new brotli streaming API
This parallels the changes previously done in the decoder.

Fixes: MayhemYDG#3

addaleax referenced this issue Jun 16, 2016

Merged

Brotli v0.5 #11

addaleax added a commit to addaleax/iltorb that referenced this issue Jun 17, 2016

[squash] encoder: switch to new brotli streaming API
This parallels the changes previously done in the decoder.

Fixes: MayhemYDG#3


MayhemYDG closed this in f6852bd Jun 17, 2016

nickdesaulniers Jun 19, 2016

🍻 🍻 🍻 🎆

MayhemYDG Jun 19, 2016

Owner

@nickdesaulniers Have you tried benchmarking again? I wonder how much of an improvement the changes made.


nickdesaulniers Jun 19, 2016

I no longer possess the machine I originally reported from. I can follow along from my post, but I didn't list the version of Node I used, so the old numbers are worthless. My current machine is relatively underpowered.

iltorb v1.0.12:
193MB peak RSS
post wrk mem: 168MB
352.21 req/s
266.29ms avg latency + 19.24ms stddev
444.52ms max latency

If I downgrade iltorb to v1.0.7 as in this comment:
1787MB peak RSS
post wrk mem: 1119MB
31.44 req/s
660.17ms avg latency + 206.29ms stddev
1.37s max latency

So way less memory usage, 10x req/s, half the latency, 1/10th the stddev for latency, and 1/3rd worst case latency. Nice!


MayhemYDG Jun 19, 2016

Owner

Absolutely terrific! 🎉


addaleax Jun 19, 2016

Contributor

Huh, didn’t expect the impact to be this big. But yep, awesome to hear it helps!


MayhemYDG Jun 19, 2016

Owner

I assume part of it is due to improvements in brotli itself.

