queue.jobs memory leak #54
Heh, I have admittedly neglected this project for a while, but I noticed this issue in my feed, and as soon as I saw the title I knew the line you'd be linking to. I remember thinking when I wrote it, "better be careful not to leak memory here," but I also recall convincing myself it would only happen in an unusual scenario. We do drop that reference from the jobs table upon receiving confirmation of the job's success or failure, and I assumed any long-running process that enqueues lots of jobs would also be receiving those confirmations. I suppose now that that's not necessarily the case, though it was in all the use cases I envisioned.

Can you give more details on whether these jobs are being completed? Are they waiting in the queue for a while, are they slow to run, or is the job-producing process otherwise not going to receive a success/failure confirmation in a timely manner?

Regardless, it should at least be possible to add a config option to not keep that reference around for the fire-and-forget use case (where we won't be concerned that none of the lifecycle events are emitted from the created job object). As a quick workaround for your use, you could delete that line, or periodically wipe the jobs table.
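The store-then-drop lifecycle described above can be sketched roughly as follows. This is a hypothetical illustration, not bee-queue's actual source; the class and method names here are made up for the sketch, and only the shape of the logic (store the job by id on save, delete the reference when a success/failure confirmation arrives) reflects the behavior discussed in this thread.

```javascript
// Hypothetical sketch of the lifecycle described above (illustrative
// names, not bee-queue's real internals).
class Queue {
  constructor() {
    this.jobs = {}; // job id -> job object; grows until confirmations arrive
  }

  saveJob(job) {
    this.jobs[job.id] = job; // the line that retains the reference
  }

  // Called when a success/failure confirmation arrives from the worker side.
  onJobMessage(id, event) {
    const job = this.jobs[id];
    if (!job) return;
    delete this.jobs[id]; // reference dropped -> job becomes collectable
    job.emit(event);
  }
}

const queue = new Queue();
queue.saveJob({ id: '1', emit() {} });
queue.saveJob({ id: '2', emit() {} });
queue.onJobMessage('1', 'succeeded');
// Job '1' is now collectable. If no confirmation ever arrives for job '2'
// (e.g. no workers are running), it stays retained indefinitely, which is
// exactly the leak scenario reported in this issue.
```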
That's really interesting; thanks for the background and explanation. Without grokking the code for what used this map later, I assumed it was related to the callbacks or job status, so thanks for pointing to the dereference. In my case, there were no workers consuming the queue while I was profiling. I'm working with a binlog reader that queues a job for every row in the changefeed from the binlog. The consumers are responsible for an ETL-like job whose throughput depends on external services. Needless to say, the job queue acts as a buffer that will more often than not overflow until the workers are scaled horizontally. The RDS instance can produce thousands of events per second, but the consumption of those jobs likely won't be 1:1 most of the time (and doesn't need to be).

The performance of this framework has been absolutely top-notch. I tried two others, and enqueueing and consuming alone at this rate was choppy at best in them. If I run a pool of workers and the messages are consumed, there's no memory leak at all, just a little buffer space occupied, proportional to the queue of unprocessed jobs.

Having worked with other job frameworks like Resque, Sidekiq, and RabbitMQ in various languages, the job status / retention of enqueued jobs was a bit surprising to me, but I can see how it's related to the job events feature. This isn't really a show-stopper, but I also wouldn't want to run bee-queue without some way to opt out of this retention. It'd be interesting if the job objects themselves held a reference to the queue and listened for their own events instead.

Thanks for the response and explanation! Should this issue be closed? Would a mention of this behavior do well in the docs? I'd be happy to contribute.
Really glad to hear that! Performance was my main motivation for building this, so it's nice to see that be useful.
Yep, as expected, and the hope is that buffer won't ever be too big. That's why I was curious about if/how/when the jobs were being completed.
Yea, I'd say it's sort of an artifact of my original use case's focus on short real-time jobs and the ability to get job results easily. I wrote a bit on that here, if you're interested. I haven't put much thought into long-lived/scheduled jobs like (most) other job queues, so they probably do that a bit differently.
The queue already sort of does act as a proxy for the job events, or really a dispatcher: all messages/status updates go through the queue itself. If instead each job listened for its own events and there are, say, 1000 open jobs, then we'd have 1000 listeners (!), and every time we get a message we'd have 1000 function calls, 999 of which are just pointless "oh, not for me" checks. That only really gains us the nice property that the job object sticks around in memory only as long as the user keeps it around, but we can also pull that off with an option in the current scheme to bypass the insertion into the queue's jobs table.
Yea, I think an opt-out from inserting into the queue's jobs table is worthwhile. It could look something like this: `myQueue.createJob({ data }).disableEvents().save(...);` and it should only take about five lines to implement. I'd be happy to guide/review a PR if you want to contribute, or I can put it together if you're not in a rush.
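One way the suggested opt-out could look in code. This is a hypothetical sketch, not the implementation that later landed: the `disableEvents` flag, the `storeEvents` field, and the stand-in `save()` logic are all invented here just to show the "skip the jobs-table insertion" idea.

```javascript
// Hypothetical sketch of the proposed opt-out (invented names, not the
// actual bee-queue API): a flag that makes save() skip the jobs table.
class Job {
  constructor(queue, data) {
    this.queue = queue;
    this.data = data;
    this.storeEvents = true;
  }

  disableEvents() {
    this.storeEvents = false; // fire-and-forget: no lifecycle events
    return this; // keep the chainable style from the example above
  }

  save() {
    this.id = String(++this.queue.nextId); // stand-in for the redis round trip
    if (this.storeEvents) {
      this.queue.jobs[this.id] = this; // only retained when events are wanted
    }
    return this;
  }
}

const queue = { jobs: {}, nextId: 0 }; // stand-in for a Queue instance
const tracked = new Job(queue, { a: 1 }).save();
const untracked = new Job(queue, { b: 2 }).disableEvents().save();
// `tracked` sits in queue.jobs; `untracked` can be garbage collected as
// soon as the caller drops it.
```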
If you have a thousand open jobs, you have a thousand job objects dangling from the queue in memory, and on each of those dangling objects you have its event listeners. Again, if events were opt-in, that setup wouldn't be all that crazy :) Sorry to suggest.
We're adding several new configuration options to Bee in #64, including one to opt out of storing jobs on the queue.

EDIT: sorry for closing this; will close when #64 is actually merged.
Per my previous comment, I believe this issue is fixed now that #64 has been merged.
I believe this line leaks memory.
I'm using this in an indefinitely long-running application that continuously pushes jobs to Redis to be processed by other threads. The app tails a MySQL replication log, so it's never-ending.

I noticed a memory leak because my Node process was consuming as much memory as Redis itself (where all the queued messages are stored). Redis's and the Node process's memory usage were essentially 1:1.

I noticed this line, which stores the jobs on the queue object indexed by job id. Over time, would this not leak memory on the queue object? These jobs are never garbage collected. Over a period of 20 minutes or so, this can grow to 400 MB.
The memory leak appeared to come from this library for two reasons:

1. When I commented out the `queue.createJob` call, the process only used ~30 MB of memory continuously.
2. Using `memwatch` and `heapdump`, I exported a heap snapshot after a memory leak was detected (memory usage increased consistently after 5 garbage collection events), and upon viewing the results, `queue.jobs` accounted for 92% of the heap usage.

I'm not super familiar with this lib. Is this line crucial, or can it be removed? I'll probably just clear this array manually, as I'm not using progress, events, or callbacks. Just need to fire and forget.
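The manual clearing mentioned above could look something like this for the fire-and-forget case. This is a sketch under the assumption (from this thread) that the queue keeps an internal `jobs` table keyed by job id; the helper name is made up, and wiping the table means giving up the per-job lifecycle events entirely.

```javascript
// Sketch of the manual fire-and-forget workaround: periodically drop all
// retained job references so completed (or never-confirmed) job objects
// can be garbage collected. `queue.jobs` is the internal table discussed
// in this thread; `wipeJobsTable` is a hypothetical helper name.
function wipeJobsTable(queue) {
  queue.jobs = {}; // drop every retained job reference at once
}

// In a long-running producer you might run this on a timer, e.g.:
//   setInterval(() => wipeJobsTable(queue), 60 * 1000).unref();

const queue = { jobs: { a: {}, b: {} } }; // stand-in for a real queue
wipeJobsTable(queue);
// queue.jobs is now empty; the old job objects are collectable.
```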