Memory leak(?) when working with Workers #18414

Open
tmcw opened this issue Mar 24, 2023 · 13 comments
Labels
needs investigation: requires further investigation before determining if it is an issue or not

Comments

tmcw commented Mar 24, 2023

Maybe this is similar to #18369 and should be folded in? I know that memory leak-like issues are pretty hard to really diagnose.

Example:

index.ts

const href = new URL("./worker.ts", import.meta.url).href

const startingMemory = Deno.memoryUsage().rss;

for (let i = 0; i < 10000; i++) {
  const w = new Worker(href, {
    type: "module",
  });

  w.terminate();

  if (i % 1000 === 0) {
    const newMemory = Deno.memoryUsage().rss;
    console.log({
      newMemory,
      memoryPerWorker: (newMemory - startingMemory) / i
    })
  }
}

worker.ts

self.close()

I'd expect the GC to kick in at some point and collect old worker instances, but it appears that this code just has ever-increasing memory requirements, with memory per worker decreasing, but not by much:

➜  worker-leak deno run -A index.ts
{ newMemory: 48693248, memoryPerWorker: Infinity }
{ newMemory: 98631680, memoryPerWorker: 60162.048 }
{ newMemory: 125681664, memoryPerWorker: 43606.016 }
{ newMemory: 152059904, memoryPerWorker: 37863.424 }
{ newMemory: 178847744, memoryPerWorker: 35094.528 }
{ newMemory: 205078528, memoryPerWorker: 33321.7792 }
{ newMemory: 230440960, memoryPerWorker: 31995.221333333335 }
{ newMemory: 256196608, memoryPerWorker: 31103.853714285713 }
{ newMemory: 282492928, memoryPerWorker: 30502.912 }
{ newMemory: 307576832, memoryPerWorker: 29900.8 }

This is with:

deno --version
deno 1.31.3 (release, aarch64-apple-darwin)
v8 11.0.226.19
typescript 4.9.4

Results with Deno 1.32.1 are the same:

deno run -A index.ts
{ newMemory: 43794432, memoryPerWorker: Infinity }
{ newMemory: 95682560, memoryPerWorker: 61521.92 }
{ newMemory: 127369216, memoryPerWorker: 46604.288 }
{ newMemory: 158236672, memoryPerWorker: 41358.67733333333 }
{ newMemory: 188874752, memoryPerWorker: 38678.528 }
{ newMemory: 219594752, memoryPerWorker: 37086.8224 }
{ newMemory: 250249216, memoryPerWorker: 36014.76266666667 }
{ newMemory: 280625152, memoryPerWorker: 35209.216 }
{ newMemory: 310968320, memoryPerWorker: 34600.96 }
{ newMemory: 340738048, memoryPerWorker: 34064.156444444445 }

Trying other versions, it does look like there's a jump in memory per worker at some point; 1.29.0 has much lower numbers:

deno run -A index.ts
{ newMemory: 3948544, memoryPerWorker: NaN }
{ newMemory: 5685248, memoryPerWorker: 1736.704 }
{ newMemory: 6455296, memoryPerWorker: 1253.376 }
{ newMemory: 8290304, memoryPerWorker: 1447.2533333333333 }
{ newMemory: 9617408, memoryPerWorker: 1417.216 }
{ newMemory: 11173888, memoryPerWorker: 1445.0688 }
{ newMemory: 12238848, memoryPerWorker: 1381.7173333333333 }
{ newMemory: 13778944, memoryPerWorker: 1404.3428571428572 }
{ newMemory: 14303232, memoryPerWorker: 1294.336 }
{ newMemory: 16138240, memoryPerWorker: 1354.4106666666667 }
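
One way to check whether this is just deferred GC rather than an actual leak might be to force collections between samples. Below is a sketch, not verified on this setup; it assumes that deno run --v8-flags=--expose-gc exposes a global gc(), as --expose-gc does in V8, and that yielding to the event loop gives any deferred cleanup a chance to run.

index-gc.ts

// Hypothetical variant of index.ts above; run with:
//   deno run -A --v8-flags=--expose-gc index-gc.ts
// Assumes --expose-gc makes a global gc() available.
const href = new URL("./worker.ts", import.meta.url).href;

const startingMemory = Deno.memoryUsage().rss;

for (let i = 1; i <= 10000; i++) {
  const w = new Worker(href, { type: "module" });
  w.terminate();

  if (i % 1000 === 0) {
    // Yield to the event loop so any deferred cleanup can run,
    // then force a full collection before sampling RSS.
    await new Promise((resolve) => setTimeout(resolve, 0));
    (globalThis as { gc?: () => void }).gc?.();

    const newMemory = Deno.memoryUsage().rss;
    console.log({ i, newMemory, memoryPerWorker: (newMemory - startingMemory) / i });
  }
}

If RSS keeps climbing even with forced collections, that would point away from "GC just hasn't run yet" as the explanation.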
@bartlomieju (Member)

Possibly related to #18369, but needs a proper investigation since it involves web workers.

@bartlomieju added the "needs investigation" label on Mar 28, 2023
tmcw (Author) commented Apr 29, 2023

We've updated to 1.33.0, which seems like a great improvement for Node APIs and other perf concerns 🚀, but, unfortunately, memory growth when booting up and terminating workers seems the same.

@svix-onelson

@tmcw I think the proposed fix for #18369 landed in 1.33.1. Do you see the same memory profile on that version?

tmcw (Author) commented May 5, 2023

Unfortunately, no - still seeing the same memory creep in production and in this isolated test case:

➜  deno-worker ls
index.ts  worker.ts
➜  deno-worker deno run -A index.ts
{ newMemory: 50429952, memoryPerWorker: Infinity }
{ newMemory: 179699712, memoryPerWorker: 141000.704 }
{ newMemory: 222576640, memoryPerWorker: 91938.816 }
{ newMemory: 262307840, memoryPerWorker: 74536.27733333333 }
{ newMemory: 302809088, memoryPerWorker: 66027.52 }
{ newMemory: 347553792, memoryPerWorker: 61770.9568 }
{ newMemory: 383778816, memoryPerWorker: 57513.30133333334 }
{ newMemory: 418480128, memoryPerWorker: 54254.44571428571 }
{ newMemory: 457146368, memoryPerWorker: 52305.92 }
{ newMemory: 494387200, memoryPerWorker: 50632.02133333333 }
➜  deno-worker deno --version
deno 1.33.2 (release, aarch64-apple-darwin)
v8 11.4.183.1
typescript 5.0.3

onelson commented May 6, 2023

We depend on deno_core directly, rather than deno. We are also seeing linear growth with deno_core 0.185.0, which was yesterday's release, IIRC.

I don't think we're using workers (but I'm not certain). Still not sure what it might be related to. Useful to hear that your results are consistent.

@bartlomieju (Member)

@onelson could you provide a reproduction for your case? If it's limited to deno_core it might be easier to debug.

onelson commented May 6, 2023

I'll dig into this and see if I can reduce it down 👍

svix-onelson commented May 11, 2023

So, I had cause to work on an entirely different project, which called for some Deno, and it appears to leak with deno_core@0.185.0. I was able to profile it a bit with bytehound, but I won't get a moment to simplify it into a nice repro for a while.

In this case we're using MainWorker from deno_runtime@0.111.0.

Here's the RSS:

Graph of resident memory over time

Allocations over time for the biggest leaking backtrace:
Graph of allocations over time

I ran a steady load through this process for around 37 minutes. It does seem like the growth could actually be bounded: it started to taper off towards the end of the run. I might repeat this overnight at some point just to see if it does plateau.

The backtrace related to this particular graph:

Total: 14494
Leaked: 14494
Backtrace:
#00 [libc.so.6] 7ff3f33cb9ff
#01 [libc.so.6] 7ff3f3339b42
#02 [svix-webhook-bridge] std::sys::unix::thread::Thread::new::thread_start [thread.rs:108]
#03 [svix-webhook-bridge] <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once [boxed.rs:1987]
#04 [svix-webhook-bridge] <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once [boxed.rs:1987]
#05 [svix-webhook-bridge] core::ops::function::FnOnce::call_once{{vtable.shim}}
#07 [svix-webhook-bridge] <F as threadpool::FnBox>::call_box
#08 [svix-webhook-bridge] deno_runtime::worker::MainWorker::bootstrap_from_options
#09 [svix-webhook-bridge] deno_runtime::worker::MainWorker::from_options
#10 [svix-webhook-bridge] deno_core::runtime::JsRuntime::new
#11 [svix-webhook-bridge] deno_core::bindings::external_references
#12 [libbytehound.so] malloc [api.rs:294]

pointing back to https://github.com/denoland/deno/blob/v1.33.2/core/bindings.rs#L22

The other "big" items in the profile are listed below 👇 👀:

Total: 14494
Leaked: 14494
Backtrace:
#00 [libc.so.6] 7ff3f33cb9ff
#01 [libc.so.6] 7ff3f3339b42
#02 [svix-webhook-bridge] std::sys::unix::thread::Thread::new::thread_start [thread.rs:108]
#03 [svix-webhook-bridge] <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once [boxed.rs:1987]
#04 [svix-webhook-bridge] <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once [boxed.rs:1987]
#05 [svix-webhook-bridge] core::ops::function::FnOnce::call_once{{vtable.shim}}
#07 [svix-webhook-bridge] <F as threadpool::FnBox>::call_box
#08 [svix-webhook-bridge] deno_runtime::worker::MainWorker::bootstrap_from_options
#09 [svix-webhook-bridge] deno_runtime::worker::MainWorker::from_options
#10 [svix-webhook-bridge] deno_core::runtime::JsRuntime::new
#11 [svix-webhook-bridge] deno_core::bindings::external_references
#12 [svix-webhook-bridge] <alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter
#13 [libbytehound.so] malloc [api.rs:294]

Total: 1376930
Leaked: 1376930
Backtrace:
#00 [libc.so.6] 7ff3f33cb9ff
#01 [libc.so.6] 7ff3f3339b42
#02 [svix-webhook-bridge] std::sys::unix::thread::Thread::new::thread_start [thread.rs:108]
#03 [svix-webhook-bridge] <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once [boxed.rs:1987]
#04 [svix-webhook-bridge] <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once [boxed.rs:1987]
#05 [svix-webhook-bridge] core::ops::function::FnOnce::call_once{{vtable.shim}}
#07 [svix-webhook-bridge] <F as threadpool::FnBox>::call_box
#08 [svix-webhook-bridge] deno_runtime::worker::MainWorker::bootstrap_from_options
#09 [svix-webhook-bridge] deno_runtime::worker::MainWorker::from_options
#10 [svix-webhook-bridge] deno_core::runtime::JsRuntime::new
#11 [svix-webhook-bridge] alloc::vec::in_place_collect::<impl alloc::vec::spec_from_iter::SpecFromIter<T,I> for alloc::vec::Vec<T>>::from_iter
#12 [svix-webhook-bridge] <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
#13 [svix-webhook-bridge] deno_core::ops::OpCtx::new
#14 [svix-webhook-bridge] v8__CFunctionInfo__New [binding.cc:1926]
#15 [svix-webhook-bridge] operator new(unsigned long) [new.cpp:67]
#16 [libbytehound.so] malloc [api.rs:294]

Total: 1376930
Leaked: 1376930
Backtrace:
#00 [libc.so.6] 7ff3f33cb9ff
#01 [libc.so.6] 7ff3f3339b42
#02 [svix-webhook-bridge] std::sys::unix::thread::Thread::new::thread_start [thread.rs:108]
#03 [svix-webhook-bridge] <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once [boxed.rs:1987]
#04 [svix-webhook-bridge] <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once [boxed.rs:1987]
#05 [svix-webhook-bridge] core::ops::function::FnOnce::call_once{{vtable.shim}}
#07 [svix-webhook-bridge] <F as threadpool::FnBox>::call_box
#08 [svix-webhook-bridge] deno_runtime::worker::MainWorker::bootstrap_from_options
#09 [svix-webhook-bridge] deno_runtime::worker::MainWorker::from_options
#10 [svix-webhook-bridge] deno_core::runtime::JsRuntime::new
#11 [svix-webhook-bridge] alloc::vec::in_place_collect::<impl alloc::vec::spec_from_iter::SpecFromIter<T,I> for alloc::vec::Vec<T>>::from_iter
#12 [svix-webhook-bridge] <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
#13 [svix-webhook-bridge] deno_core::ops::OpCtx::new
#14 [svix-webhook-bridge] v8::fast_api::CTypeInfo::new_from_slice
#15 [svix-webhook-bridge] v8__CTypeInfo__New__From__Slice [binding.cc:1916]
#16 [libbytehound.so] malloc [api.rs:294]

It wouldn't surprise me if we're doing something to promote this growth just from our usage. I do plan to downgrade the version of deno_core a bit to see if the profile changes or if this new code is just behaving poorly because it's bad 😅.

In the earlier case, an entirely separate project, the leak only showed up when we upgraded from an older version.

svix-onelson commented May 12, 2023

I rewrote my stuff in terms of deno_core only. Mostly this is cobbled together from various examples. It's still lacking as a nice repro, but it's close. I'll have more time to work on this soon 😓

// Note: JsReturn, JsObject, and the Result alias are project-specific types, not shown here.
fn run_script_inner(input: &serde_json::Value, script: String) -> Result<JsReturn> {
    let mut runtime = JsRuntime::new(Default::default());
    let res = runtime.execute_script(
        "<anon>",
        format!("{script}\nhandler({})", serde_json::to_string(input)?).into(),
    );
    match res {
        Ok(global) => {
            let scope = &mut runtime.handle_scope();
            let local = v8::Local::new(scope, global);
            match serde_v8::from_v8::<JsObject>(scope, local) {
                Ok(v) => Ok(JsReturn::Object(v)),
                Err(serde_v8::Error::ExpectedObject(_)) => Ok(JsReturn::Invalid),
                Err(e) => Err(e)?,
            }
        }
        Err(err) => Err(anyhow::format_err!("Evaling error: {:?}", err)),
    }
}

While running this in a threadpool, once per request, in my service (where script contains a function definition named handler and I crudely splice some value in as an argument to an invocation of it), I see steady growth in the most recent Deno versions.

Actually, I spot-checked the highest patch level for each minor Deno tag, and the most recent version I found that didn't leak was 1.23.3 (aka deno_core@0.142.0). Seems like what I'm seeing has been around for maybe 10 months.

Honorable mention to the minor version to follow, 1.24.3 (aka deno_core@0.147.0), which leaked differently (not quite linear) with the biggest leaking backtrace pointing to something in the v8 garbage collector code.

After 1.24.3, the leaks look pretty consistent.

Perhaps the bisection is less helpful. I just found it interesting.

Here are the top couple of leaky backtraces from deno_core@0.185.0. They look similar to my last profile via deno_runtime.

Total: 8814
Leaked: 8814
Backtrace:
#00 [libc.so.6] 7fc5bc7f99ff
#01 [libc.so.6] 7fc5bc767b42
#02 [svix-webhook-bridge] std::sys::unix::thread::Thread::new::thread_start [thread.rs:108]
#03 [svix-webhook-bridge] <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once [boxed.rs:1987]
#04 [svix-webhook-bridge] <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once [boxed.rs:1987]
#05 [svix-webhook-bridge] core::ops::function::FnOnce::call_once{{vtable.shim}}
#07 [svix-webhook-bridge] <F as threadpool::FnBox>::call_box
#08 [svix-webhook-bridge] deno_core::runtime::JsRuntime::new
#09 [svix-webhook-bridge] deno_core::bindings::external_references
#10 [libbytehound.so] malloc [api.rs:294]
Total: 8814
Leaked: 8814
Backtrace:
#00 [libc.so.6] 7fc5bc7f99ff
#01 [libc.so.6] 7fc5bc767b42
#02 [svix-webhook-bridge] std::sys::unix::thread::Thread::new::thread_start [thread.rs:108]
#03 [svix-webhook-bridge] <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once [boxed.rs:1987]
#04 [svix-webhook-bridge] <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once [boxed.rs:1987]
#05 [svix-webhook-bridge] core::ops::function::FnOnce::call_once{{vtable.shim}}
#07 [svix-webhook-bridge] <F as threadpool::FnBox>::call_box
#08 [svix-webhook-bridge] deno_core::runtime::JsRuntime::new
#09 [svix-webhook-bridge] deno_core::bindings::external_references
#10 [svix-webhook-bridge] <alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter
#11 [libbytehound.so] malloc [api.rs:294]

In both cases, the leaks seem to be coming from deno_core::bindings::external_references.

After these two, the next few are very low allocation counts and stemming from things like tracing subscribers and so on, which always seem to leak a little bit, as a treat.

My expectation is that I should be able to create a new JsRuntime, use it to run some script, and drop it without seeing a leak, but that's not what I'm seeing.

svix-onelson commented May 18, 2023

Here's a modest repro for what I've been seeing. Still unsure if @tmcw and I are seeing the same leak.

Using the default iteration count of 10,000, I see this program run in just over 1 minute and steadily grow the RSS in that time.

Cargo.toml

[package]
name = "deno-repro"
version = "0.1.0"
edition = "2021"

[dependencies]
deno_core = "0.186.0"

main.rs

use deno_core::{serde_v8, v8, JsRuntime};

fn run_script() -> i64 {
    let mut runtime = JsRuntime::new(Default::default());
    let global = runtime.execute_script_static("<anon>", "1 + 1").unwrap();
    let scope = &mut runtime.handle_scope();
    let local = v8::Local::new(scope, global);
    let out = serde_v8::from_v8::<i64>(scope, local).unwrap();
    assert_eq!(out, 2);
    out
}

fn main() {
    let iterations: usize = std::env::args()
        .nth(1)
        .as_deref()
        .unwrap_or("10000")
        .parse()
        .unwrap();
    for _i in 0..iterations {
        let _x = run_script();
    }
}

The cold-start memory was around 26 MB and grew to over 68 MB.

Graph of resident memory growing from the mid-20 MB range to the mid-60 MB range over the span of a minute

Per bytehound, the top leaking points in the code appear to be:

Total: 10000
Leaked: 10000
Backtrace:
#00 [deno-repro] _start
#01 [libc.so.6] __libc_start_main
#02 [libc.so.6] 7ff4e4374d8f
#03 [deno-repro] main
#15 [deno-repro] deno_repro::main
#16 [deno-repro] deno_core::runtime::JsRuntime::new
#17 [deno-repro] deno_core::bindings::external_references
#18 [libbytehound.so] malloc [api.rs:294]
Total: 10000
Leaked: 10000
Backtrace:
#00 [deno-repro] _start
#01 [libc.so.6] __libc_start_main
#02 [libc.so.6] 7ff4e4374d8f
#03 [deno-repro] main
#15 [deno-repro] deno_repro::main
#16 [deno-repro] deno_core::runtime::JsRuntime::new
#17 [deno-repro] deno_core::bindings::external_references
#18 [deno-repro] <alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter
#19 [libbytehound.so] malloc [api.rs:294]
Total: 60000
Leaked: 60000
Backtrace:
#00 [deno-repro] _start
#01 [libc.so.6] __libc_start_main
#02 [libc.so.6] 7ff4e4374d8f
#03 [deno-repro] main
#15 [deno-repro] deno_repro::main
#16 [deno-repro] deno_core::runtime::JsRuntime::new
#17 [deno-repro] alloc::vec::in_place_collect::<impl alloc::vec::spec_from_iter::SpecFromIter<T,I> for alloc::vec::Vec<T>>::from_iter
#18 [deno-repro] <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
#19 [deno-repro] deno_core::ops::OpCtx::new
#20 [deno-repro] v8__CFunctionInfo__New [binding.cc:1964]
#21 [deno-repro] operator new(unsigned long) [new.cpp:67]
#22 [libbytehound.so] malloc [api.rs:294]
Total: 60000
Leaked: 60000
Backtrace:
#00 [deno-repro] _start
#01 [libc.so.6] __libc_start_main
#02 [libc.so.6] 7ff4e4374d8f
#03 [deno-repro] main
#15 [deno-repro] deno_repro::main
#16 [deno-repro] deno_core::runtime::JsRuntime::new
#17 [deno-repro] alloc::vec::in_place_collect::<impl alloc::vec::spec_from_iter::SpecFromIter<T,I> for alloc::vec::Vec<T>>::from_iter
#18 [deno-repro] <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
#19 [deno-repro] deno_core::ops::OpCtx::new
#20 [deno-repro] v8::fast_api::CTypeInfo::new_from_slice
#21 [deno-repro] v8__CTypeInfo__New__From__Slice [binding.cc:1954]
#22 [libbytehound.so] malloc [api.rs:294]

@svix-onelson

Tried to run deno with @tmcw's repro under bytehound, but it segfaults. Might need to try a different tool like DHAT or something. It would be nice to be able to prove or disprove that these are the same leak 😩.

@sthamman2024

Here is a minimal repro. You can see that memory usage grows to 200+ MB.

Cargo.toml

[dependencies]
deno_core = "0.195.0"

main.rs

use deno_core::JsRuntime;
use deno_core::RuntimeOptions;

fn run_js() {
    JsRuntime::new(RuntimeOptions::default());
}

fn main() {
    for i in 0..50000 {
        run_js();
        if i % 1000 == 0 {
            println!("{}", i);
        }
    }
}

tmcw (Author) commented Aug 8, 2023

Screenshot (2023-08-08): memory usage over time

Our approach so far is to restart intermittently, because creating and destroying workers never seems to free memory; we go from 450 MB to 10 GB every hour.
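
For anyone stuck on the same behaviour, one possible mitigation is to keep a small pool of long-lived workers and dispatch jobs over postMessage instead of creating and terminating a worker per task. This is a sketch only; it assumes the growth scales with the number of Worker constructions rather than with the work done inside them, and task-worker.ts plus its message format are made up for illustration.

pool.ts

// Hypothetical worker pool; task-worker.ts and its message protocol are assumptions.
const POOL_SIZE = 4;
const workerUrl = new URL("./task-worker.ts", import.meta.url).href;

const pool = Array.from(
  { length: POOL_SIZE },
  () => new Worker(workerUrl, { type: "module" }),
);
let next = 0;

// Round-robin dispatch; resolves with the first message the chosen worker sends back.
// Note: this simple version does not correlate concurrent responses, so it is only
// safe when each worker handles one task at a time.
function runTask(payload: unknown): Promise<unknown> {
  const worker = pool[next];
  next = (next + 1) % POOL_SIZE;
  return new Promise((resolve) => {
    worker.addEventListener(
      "message",
      (e) => resolve((e as MessageEvent).data),
      { once: true },
    );
    worker.postMessage(payload);
  });
}

console.log(await runTask({ kind: "ping" }));

Terminating the pool only on shutdown (or on a periodic restart, as described above) keeps the number of Worker constructions bounded, which may bound the growth if it is tied to worker creation.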
