Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

eachAsync with async function causes memory leak #5949

Closed
weeco opened this issue Dec 31, 2017 · 15 comments
Closed

eachAsync with async function causes memory leak #5949

weeco opened this issue Dec 31, 2017 · 15 comments

Comments

@weeco
Copy link

weeco commented Dec 31, 2017

Do you want to request a feature or report a bug?
Bug

What is the current behavior?
When iterating on a query cursor with eachAsync and an async function as callback the memory quickly grows until the node app crashes.

If the current behavior is a bug, please provide the steps to reproduce.
No memory leak:

const playerCursor: QueryCursor<IPlayerProfileModel> = PlayerProfile.find({}, projection).lean().cursor();
    await playerCursor.eachAsync(
      (profile: IPlayerProfileModel) => {
        return;
      },
      { parallel: 50 }
    );

Memory leak:

const playerCursor: QueryCursor<IPlayerProfileModel> = PlayerProfile.find({}, projection).lean().cursor();
    await playerCursor.eachAsync(
      async (profile: IPlayerProfileModel) => {
        return;
      },
      { parallel: 50 }
    );

What is the expected behavior?

Iteration on cursor, with max 50 async functions running concurrently without having a memory leak. The same issue appears with a concurrency of 1 (the memory just grows way slower then).

Please mention your node.js, mongoose and MongoDB version.
Node 8.2.1, Mongoose 4.13.5 MongoDb 3.4

@vkarpov15 vkarpov15 added this to the 4.13.9 milestone Dec 31, 2017
@vkarpov15
Copy link
Collaborator

Definitely shouldn't leak memory, will investigate ASAP 👍

@vkarpov15
Copy link
Collaborator

So the below script run against a mongodb collection with 1M documents doesn't seem to leak memory, heapUsed fluctuates up and down as expected:

const mongoose = require('mongoose');

const ObjectId = mongoose.Schema.Types.ObjectId;
const Schema = mongoose.Schema;

run().catch(error => console.error(error.stack));

async function run() {
  await mongoose.connect('mongodb://localhost/test', { useMongoClient: true });
  
  const schema = new mongoose.Schema({
    x: Number
  });

  const M = mongoose.model('Test', schema, 'Test');

  setInterval(() => console.log(process.memoryUsage()), 250);
  
  await M.find().lean().cursor().eachAsync(() => { return; }, { parallel: 50 });

  process.exit(0);
}

However, there is a memory leak if I remove the attempt to connect:

const mongoose = require('mongoose');

const ObjectId = mongoose.Schema.Types.ObjectId;
const Schema = mongoose.Schema;

run().catch(error => console.error(error.stack));

async function run() {
  // If you don't connect to mongodb, `heapUsed` will grow monotonically
  //await mongoose.connect('mongodb://localhost/test', { useMongoClient: true });
  
  const schema = new mongoose.Schema({
    x: Number
  });

  const M = mongoose.model('Test', schema, 'Test');

  setInterval(() => console.log(process.memoryUsage()), 250);
  
  await M.find().lean().cursor().eachAsync(() => { return; }, { parallel: 50 });

  process.exit(0);
}

Are you using eachAsync() when you're not connected to mongodb?

@weeco
Copy link
Author

weeco commented Jan 1, 2018

Take a look at the difference of my posted snippets above. I agree that this does not lead to a memory leak, but once you change this line:

await M.find().lean().cursor().eachAsync(() => { return; }, { parallel: 50 });

to

await M.find().lean().cursor().eachAsync(async () => { return; }, { parallel: 50 });

it should leak memory. Take note of the async callback

@vkarpov15
Copy link
Collaborator

Confirmed, adding async makes heapUsed grow monotonically. Good catch, thanks for your patience. Will investigate more later today 👍

@vkarpov15 vkarpov15 added the confirmed-bug We've confirmed this is a bug in Mongoose and will fix it. label Jan 2, 2018
@vkarpov15
Copy link
Collaborator

Upon closer inspection I was wrong with the above, heapUsed only grew when I ran it yesterday because I still had the connection logic commented out. If I'm successfully connected, the below script (with 1M docs) gives me the below heapUsed graph:

const mongoose = require('mongoose');

const ObjectId = mongoose.Schema.Types.ObjectId;
const Schema = mongoose.Schema;

run().catch(error => console.error(error.stack));

async function run() {
  await mongoose.connect('mongodb://localhost/test', { useMongoClient: true });

  const schema = new mongoose.Schema({
    x: Number
  });

  const M = mongoose.model('Test', schema, 'Test');

  setInterval(() => console.log(process.memoryUsage().heapUsed), 250);

  await M.find().lean().cursor().eachAsync(async () => { return; }, { parallel: 50 });

  process.exit(0);
}

image

Can you try modifying the above script to get a growing heapUsed?

@vkarpov15 vkarpov15 added can't reproduce Mongoose devs have been unable to reproduce this issue. Close after 14 days of inactivity. and removed confirmed-bug We've confirmed this is a bug in Mongoose and will fix it. labels Jan 3, 2018
@vkarpov15 vkarpov15 removed this from the 4.13.9 milestone Jan 3, 2018
@weeco
Copy link
Author

weeco commented Jan 3, 2018

@vkarpov15 Apparently you have to add a filter (empty object) and a projection to reproduce this. I connected it to my own model instead of your test model and I was able to reproduce the behaviour was the latest npm package (5.0.0-rc1).

So what I did is:

  const projection = { tag: 1, clan: 1 };
  await M.find({}, projection).lean().cursor().eachAsync(async (profile) => { return; }, { parallel: 50 });

I also noticed that the printed heapUsed numbers doesn't grow that much, but the node process eventually still leaks memory. I am not familiar with tracking down memory leaks, sp I hope you can enlighten me what that means / what the actual issue is.

@vkarpov15
Copy link
Collaborator

How many documents do you have and what does your schema look like? Also, what is the actual error you get and how are you determining that the node process is leaking memory? I usually just use process.memoryUsage() to check whether we're leaking memory or heapdump if I'm really stuck.

@weeco
Copy link
Author

weeco commented Jan 4, 2018

On my dev environment I have ~4m documents, in production roughly 500m. I can send you the actual schmea but does that matter since I use projection anyways? The projected properties have the following structure:

tag: string;
clan: {
    tag: string;
    name: string;
    badgeId: number;
}

I look at windows process manager where the node process grows in memory. Right now I am doing nothing but return; inside of the async callback with your given snippet and still the process consumes 1-2gb ram (even though it drops down to 400-500mb every 30-60s, probably the GC?)

When I print the process.memoryUsage() every 250ms I get something like this:
https://i.imgur.com/3GUcr8K.png

This is how it usually crashes (node process crashes at 4gb memory usage - which takes just a few second until it grew up to 4gb):
https://i.imgur.com/ufY9xay.png

Did you test if you experience similiar memory usage if you add a filter + projection to your given script?

Also: When I run the exact same callback function without a prefixed async I am able to iterate ~40x faster. I understand that promises are heavy, but I didn't expect a 40x difference?

@vkarpov15
Copy link
Collaborator

Yeah that's strange, rss, the resident set size, is growing monotonically. Same thing on my machine, that's strange that the heap is growing and shrinking but the resident set size isn't. I need to investigate this more.

Re: async, I'm not certain about the perf implications but at the very least, without async we go on to the next doc immediately, whereas with async mongoose waits until at least the next tick of the event loop. I don't have a good answer for why its that much slower but that fact may have something to do with it.

@vkarpov15 vkarpov15 added performance help wanted and removed can't reproduce Mongoose devs have been unable to reproduce this issue. Close after 14 days of inactivity. labels Feb 9, 2018
@vkarpov15 vkarpov15 added this to the 5.x Unprioritized milestone Feb 9, 2018
@simllll
Copy link
Contributor

simllll commented Sep 3, 2018

Just ran into this issue today, refactored a "eachAsync" call and added "async" to it. And suddenly I saw out of memory exceptions on this thread. Very strange indeed.
Does anyone have any new insights on this? Didn't experience anything similar with other async fcts.

@simllll
Copy link
Contributor

simllll commented Sep 5, 2018

TL;DR: This wasn't actually the real cause of this problem.

I'm using elastic APM with custom transactinos for my background processes. Unfortuantely the default setting in APM is that it collects unlimited "spans" (a span contains information about a specific code path, it's instrumented automatically for a lot of things). Anyway, this was causing the memory problem. Solved this by setting a transactionMaxSpans.

The second thing (async is somehow way slower than non-async) is still valid, but I guess this is related to node itself. Every kind of loop, where the function is defined with async is way slower than without. Probably this has something to do with the next tick that @vkarpov15 was mentioning. For node 8 this is at least something you should care of (not tested with node 10, could be "solved" there): Do not use eachAsync() or any other function that gets called for any amount of entries with "async" prefixed functions.

Regards
Simon

@vkarpov15
Copy link
Collaborator

@simllll glad you figured out the issue. Async functions are necessarily slower because every time you call an async function you create a new promise. That should get faster as v8 continues to optimize native promises. In order to minimize this overhead, you can do things like make sure you don't create unnecessary promises in your async functions, like doing return Promise.resolve(foo) in an async function.

@danielspoke
Copy link

Is this still an issue?

@vkarpov15
Copy link
Collaborator

@danielspoke it may be, we haven't been able to reproduce this issue yet. Are you experiencing this issue?

@vkarpov15 vkarpov15 modified the milestones: 6.x Unprioritized, 6.3.6 May 7, 2022
@vkarpov15
Copy link
Collaborator

We found and fixed a memory leak in eachAsync() with populate() in #11641 . Given that we haven't been able to repro this particular issue, we'll chalk this up to #11641 and close this. If you're running into a similar problem, please open a new issue and follow the issue template.

@Automattic Automattic locked and limited conversation to collaborators Jun 4, 2022
@vkarpov15 vkarpov15 removed this from the 6.3.6 milestone Jun 4, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

4 participants