queue.jobs memory leak #54

Closed · jbielick opened this issue Dec 1, 2016 · 6 comments · Milestone: 1.0.0

jbielick commented Dec 1, 2016

I believe this line leaks memory.

I'm using this in an infinitely long-running application that continuously pushes jobs to redis to be processed by other threads. The app tails a MySQL replication log, so it's never-ending.

I noticed a memory leak because my node process was consuming as much memory as redis itself (where all the queued messages are stored). Redis and the node process's memory usage were essentially 1:1.

I noticed this line, which stores the jobs on the queue object indexed by job id. Over time, wouldn't this leak memory on the queue object? Those references are never garbage collected. Over a period of 20 minutes or so, this can grow to 400 MB.
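
To illustrate the shape of what I mean, here's a rough sketch of the pattern (illustrative only, not the actual bee-queue source). If the cleanup path is never reached, the map grows without bound:

```js
// Rough sketch of the pattern in question (not the actual bee-queue source):
// every created job is stored on the queue keyed by id, and the reference is
// only dropped when a success/failure confirmation for that id comes back.
class SketchQueue {
  constructor() {
    this.jobs = {}; // jobId -> job object, grows with every createJob call
  }

  createJob(data) {
    const job = { id: `${Date.now()}-${Math.random()}`, data };
    this.jobs[job.id] = job; // the line in question: reference held indefinitely
    return job;
  }

  onJobFinished(id) {
    delete this.jobs[id]; // only runs if a finish confirmation is ever received
  }
}
```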

The memory leak appeared to come from this library for two reasons:
1. When I commented out the queue.createJob call, the process only used ~30 MB of memory continuously.
2. Using memwatch and heapdump, I exported a heap snapshot after a memory leak was detected (memory usage increased consistently over 5 garbage collection events); viewing the results, queue.jobs accounted for 92% of heap usage.

[Screenshot: heap snapshot from 2016-11-30 showing queue.jobs dominating the retained heap]

I'm not super familiar with this lib. Is this line crucial or can it be removed? I'll probably just clear this map manually, as I'm not using progress, events, or callbacks. Just need to fire and forget.

LewisJEllis (Member) commented Dec 1, 2016

Heh, I have admittedly neglected this project for a while, but I noticed this issue in my feed and as soon as I saw the title I knew the line you'd be linking to. I remember thinking when I made this, "better be careful not to leak memory here," but I also recall convincing myself it would only happen in an unusual scenario.

We do drop that reference from the jobs table here upon receiving confirmation of the job's success/failure, and I assumed any long-running process that enqueues lots of jobs would also be receiving those success/failure confirmations. I now see that's not necessarily the case, but it was in all the use cases I envisioned. Can you give more details on whether these jobs are being completed? Are they waiting in the queue for a while, are they slow to run, or is the job-producing process otherwise not going to receive a success/failure confirmation in a timely manner?

I think regardless, it should at least be possible to add a config option to not keep that reference around for the fire-and-forget use case (where we don't care that none of the lifecycle events will be emitted from the created job object). For a quick workaround for your use case, deleting that line (or periodically wiping the queue.jobs map) should be fine as long as you're not relying on any of the job lifecycle events.
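
For example, something like this in the producer would work as a stopgap (illustrative only; it assumes queue.jobs stays a plain object and that you never read job lifecycle events):

```js
const Queue = require('bee-queue');
const queue = new Queue('my-queue');

// Stopgap: periodically drop the queue's local references to created jobs.
// Only safe if this process doesn't rely on 'succeeded'/'failed'/'progress'
// events emitted from the job objects it created.
setInterval(() => {
  queue.jobs = {};
}, 60 * 1000);
```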

jbielick (Author) commented Dec 1, 2016

That's really interesting, thanks for the background and explanation. Without having dug into what uses this map later, I assumed it was related to the callbacks or job status; thanks for pointing to where the reference gets dropped.

In my case, there were no workers consuming the queue when I profiled this. I'm working with a binlog reader which queues a job for every row in the changefeed from the binlog. The consumers are responsible for an ETL-like job whose throughput depends on the speed of external services. Needless to say, the job queue acts as a buffer that will more often than not overflow until the workers are scaled horizontally. The RDS instance can produce thousands of events per second, but the consumption of those jobs likely won't keep up 1:1 most of the time (and doesn't need to). The performance of this framework has been absolutely top-notch; I tried two others, and enqueueing and consuming alone at this rate was choppy at best in both.

If I run a pool of workers and the messages are consumed, there's no memory leak at all, just a little buffer space occupied, roughly the size of the backlog of unprocessed jobs.

Having worked with other job frameworks like Resque, Sidekiq, and RabbitMQ in various languages, the retention of enqueued jobs for status tracking was a bit surprising to me, but I can see how it's related to the job events feature. This isn't really a show-stopper, but I also wouldn't want to run bee-queue without something like queue.jobs = {}; in my code.

It'd be interesting if the job objects themselves held a reference to the Queue, where the queue acted as a proxy for the job events and each job just listened to the queue for the specific events related to itself. I suppose that would still leak event handlers, but if there was a chainable method like .evented() to make it opt-in (or opt-out, since opt-in would be a breaking change), the impact could be much smaller. Not saying I'm about to work on that, just curious what your thoughts are.

Thanks for the response and explanation!

Should this issue be closed? Would a mention of this behavior do well in the docs? I'd be happy to contribute.

LewisJEllis (Member) commented Dec 1, 2016

The performance of this framework has been absolutely top-notch

Really glad to hear that! Performance was my main motivation for building this, so it's nice to see that be useful.

If I run a pool of workers and the messages are consumed, there's no memory leak at all, just a little buffer space occupied, roughly the size of the backlog of unprocessed jobs.

Yep, as expected, and the hope is that buffer won't ever be too big. That's why I was curious about if/how/when the jobs were being completed.

Having worked with other job frameworks like Resque, Sidekiq, and RabbitMQ in various languages, the retention of enqueued jobs for status tracking was a bit surprising to me, but I can see how it's related to the job events feature

Yea, I'd say it's sort of an artifact of my original use case's focus on short real-time jobs and the ability to get job results easily. I wrote a bit on that here, if you're interested. I haven't put much thought into long-lived/scheduled jobs like (most) other job queues, so they probably do that a bit differently.

It'd be interesting if the job objects themselves held a reference to the Queue, where the queue acted as a proxy for the job events and each job just listened to the queue for the specific events related to itself.

The queue already sort of does act as a proxy to the job events - or, really, a dispatcher. All messages/status updates go through Queue.onMessage and then are emitted from the relevant job here if possible. In the flipped around arrangement you're describing, each job would have to listen to this queue-wide event stream and ask "is this event for me or for some other job?", and most of the time it will be for some other job.

If there are, say, 1000 open jobs, then we'll have 1000 listeners (!) and every time we get a message we'll have 1000 function calls and 999 of them will just be pointless "oh not for me" checks. This only really gains us the nice property that the job object only sticks around in memory as long as the user keeps it around, but we can also pull that off with an option in the current scheme to bypass the insertion to the queue's jobs table.
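
A rough sketch of the contrast, with made-up names rather than the real internals:

```js
const EventEmitter = require('events');

// Made-up names; this sketches the routing difference, not bee-queue's internals.
const jobs = new Map();              // jobId -> per-job EventEmitter
const queueBus = new EventEmitter(); // queue-wide message stream

// Current scheme (dispatcher): one map lookup per incoming message.
queueBus.on('message', (msg) => {
  const job = jobs.get(msg.jobId);
  if (job) job.emit(msg.event, msg.data);
});

// Flipped scheme (listener per open job): every job filters every message.
function listenPerJob(jobId, job) {
  queueBus.on('message', (msg) => {
    if (msg.jobId !== jobId) return; // the "oh, not for me" check
    job.emit(msg.event, msg.data);
  });
}
```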

I suppose that would still leak event handlers, but if there was a chainable method like .evented() to make it opt-in (or opt-out, since opt-in would be a breaking change), the impact could be much smaller. Not saying I'm about to work on that, just curious what your thoughts are.

Yea, I think an opt-out from inserting into the queue's jobs table is worthwhile. It could look something like this:

myQueue.createJob({ data }).disableEvents().save(...);

and it should only take about five lines to implement. I'd be happy to guide/review a PR if you want to contribute, or I can put it together if you're not in a rush.
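
Hypothetically, the implementation could be along these lines (a sketch only; the names beyond disableEvents are made up and the real Job/Queue classes obviously do more):

```js
// Hypothetical sketch of the proposed opt-out, not actual bee-queue code.
class Job {
  constructor(queue, data) {
    this.queue = queue;
    this.data = data;
    this.storeLocally = true; // default: keep today's behavior
  }

  disableEvents() {
    this.storeLocally = false;
    return this; // chainable, so createJob(...).disableEvents().save(...) works
  }

  save(cb) {
    this.id = String(this.queue.nextId++);
    if (this.storeLocally) {
      this.queue.jobs[this.id] = this; // only retain a reference when events are wanted
    }
    // ...push the job to redis here...
    if (cb) cb(null, this);
    return this;
  }
}

// Usage with a stand-in queue object:
const queue = { nextId: 1, jobs: {} };
new Job(queue, { x: 1 }).disableEvents().save();
console.log(Object.keys(queue.jobs).length); // 0 -- no local reference retained
```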

jbielick (Author) commented Dec 3, 2016

If there are, say, 1000 open jobs, then we'll have 1000 listeners (!) and every time we get a message we'll have 1000 function calls and 999 of them will just be pointless "oh not for me" checks. This only really gains us the nice property that the job object only sticks around in memory as long as the user keeps it around, but we can also pull that off with an option in the current scheme to bypass the insertion to the queue's jobs table.

If you have a thousand open jobs, you have a thousand job objects dangling from the queue in memory. On those dangling objects you have the event listeners for each job (job.on('succeeded', ...)), so you already have 1000 listeners/closures! A single event name on the pub/sub bus is the most pessimistic way to view it; event names like {event_name}:{job_id} would be an obvious win. I don't know why the "oh, not for me" check was assumed.
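
For example (illustrative only, not bee-queue's API), with namespaced event names the emitter's own listener table does the routing, so a message for one job never runs another job's handlers:

```js
const EventEmitter = require('events');

const queueBus = new EventEmitter();

// Register handlers under event names namespaced by job id.
function subscribeJob(jobId, handlers) {
  for (const [event, handler] of Object.entries(handlers)) {
    queueBus.on(`${event}:${jobId}`, handler);
  }
}

subscribeJob('123', {
  succeeded: (result) => console.log('job 123 succeeded:', result),
});

// A status update for job 123 only ever invokes job 123's listeners:
queueBus.emit('succeeded:123', 'done');
```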

Again, if events were opt-in, that setup wouldn't be all that crazy :) Sorry to suggest.

skeggse (Member) commented Jul 19, 2017

We're adding several new configuration options to Bee in #64, including getEvents, sendEvents, and storeJobs. Setting storeJobs to false will disable storing jobs on the queue by their job id, so the local job storage will be opt-out; the defaults will still produce the reported behavior.
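
For a fire-and-forget producer, the queue could then be constructed roughly like this (a sketch based on the option names above; check the released docs for exact names, semantics, and defaults):

```js
const Queue = require('bee-queue');

// Producer-side queue tuned for fire-and-forget use (option names per #64;
// verify against the 1.0.0 documentation).
const queue = new Queue('binlog-events', {
  getEvents: false,  // don't subscribe to per-job events on this queue
  sendEvents: false, // don't publish job events (relevant on the worker side)
  storeJobs: false,  // don't keep created jobs in the queue's local jobs table
});
```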

EDIT: sorry for closing this - will close when #64 is actually merged.

skeggse closed this as completed Jul 19, 2017
skeggse reopened this Jul 19, 2017
skeggse added this to the 1.0.0 milestone Jul 19, 2017
LewisJEllis mentioned this issue Aug 7, 2017

skeggse (Member) commented Aug 9, 2017

Per my previous comment, I believe this issue is fixed as of 1.0.0. Feel free to reopen if this is not the case.

skeggse closed this as completed Aug 9, 2017