Properly debugging job stalled more than allowable limit
#412
Hi!
I have a job that simply spins up an Amazon Lambda function and awaits the response. I thought stalled jobs only happened when there was too much CPU work on the main thread, so I am confused as to why my job would be stalling.
Would you mind explaining the different ways a job can become stalled? I think I am missing something about how job stalling works.
Version: 1.1.3
Redis Version: 3.2.1
Error: job stalled more than allowable limit
at /app/node_modules/bull/lib/queue.js:569:50
Does it happen for all jobs? Or only some?
A job can only get stalled if Bull isn't able to renew the job's lock (which it does internally on a timer). That only happens if the entire Node event loop gets behind due to high CPU, so the setInterval callback that renews the lock doesn't run.
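To make that concrete, here is a minimal sketch of the failure mode outside of Bull; the renewLock name and the intervals are made up for illustration and are not Bull's actual internals:

```js
// Illustrative only: shows how CPU-bound synchronous work starves setInterval.
// "renewLock" stands in for the renewal Bull performs internally; the
// intervals below are made up for this demo, not Bull's real defaults.
let lastRenewal = Date.now();

const renewLock = setInterval(() => {
  const late = Date.now() - lastRenewal - 10000;
  console.log(`lock renewed (${late} ms late)`);
  lastRenewal = Date.now();
}, 10000);

function cpuHeavyProcessor() {
  // While this loop runs, no timers can fire, so the "lock" is not
  // renewed and the job would look stalled to other workers.
  const end = Date.now() + 45000;
  while (Date.now() < end) {
    Math.sqrt(Math.random()); // busy work
  }
}

cpuHeavyProcessor();
// The next "lock renewed" log arrives roughly 35 seconds late.
setTimeout(() => clearInterval(renewLock), 60000);
```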
Only for this one job type at the moment. This job used to do a lot more work, but all of that has been moved off the server to Amazon Lambda, so I find it odd that it stalls now that it does no work on the server running Bull. I do have multiple workers running Bull, if that matters at all. I use Trace and did detect event loop lag during that time. I will look into it further on my end; thanks for the explanation!
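For context, a processor that only awaits a Lambda invocation should leave the event loop nearly idle. A rough sketch of that setup, assuming aws-sdk v2 and a URL-style Bull constructor (the exact constructor signature varies across Bull versions; queue and function names here are hypothetical):

```js
const Queue = require('bull');
const AWS = require('aws-sdk');

// Hypothetical names throughout; aws-sdk v2 style.
const lambda = new AWS.Lambda({ region: 'us-east-1' });
const queue = new Queue('lambda-jobs', 'redis://127.0.0.1:6379');

queue.process(function (job) {
  // invoke(...).promise() keeps the call fully asynchronous, so the
  // event loop stays free and Bull can keep renewing the job's lock.
  return lambda
    .invoke({
      FunctionName: 'my-worker-function',
      Payload: JSON.stringify(job.data),
    })
    .promise()
    .then(function (res) {
      return JSON.parse(res.Payload);
    });
});
```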
Any update on this?
Started getting the same error too.
@n3trino does it happen for all job types, or only some? Are you seeing high CPU while the job is running (which might cause it to fail to renew the lock timer)?
Is it possible to increase the allowable limit for specific queues?
Could somebody explain what this limit is? Is it a time limit on a worker/job? If a job takes too long, will this be triggered?
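Neither question gets a direct answer in the thread. Later Bull releases (3.x) do expose per-queue settings that govern stall detection; whether anything comparable exists in the 1.1.3 release discussed here is unclear, so treat this as a sketch of the newer API only:

```js
const Queue = require('bull');

// Bull 3.x-style advanced settings (values shown are the documented
// defaults); not available as such in the 1.1.3 release in this thread.
const queue = new Queue('heavy-jobs', 'redis://127.0.0.1:6379', {
  settings: {
    lockDuration: 30000,    // how long a job's lock is valid
    lockRenewTime: 15000,   // how often the lock is renewed while processing
    stalledInterval: 30000, // how often stalled jobs are checked for
    maxStalledCount: 1,     // stalls allowed before the job is failed outright
  },
});
```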
Are the jobs properly resolved? E.g. did you call jobDone(), or return a resolved Promise at the end of the process function?
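For reference, Bull accepts either completion style; a brief sketch of both, with doWork and doWorkAsync standing in for real processing code:

```js
const Queue = require('bull');

// doWork / doWorkAsync are hypothetical stand-ins for real processing code.
const callbackQueue = new Queue('callback-style', 'redis://127.0.0.1:6379');
const promiseQueue = new Queue('promise-style', 'redis://127.0.0.1:6379');

// Callback style: call done() (or done(err)) when the work finishes.
callbackQueue.process(function (job, done) {
  doWork(job.data, function (err, result) {
    if (err) return done(err);
    done(null, result);
  });
});

// Promise style: return a promise; resolving completes the job,
// rejecting fails it. Forgetting the `return` leaves the job hanging.
promiseQueue.process(function (job) {
  return doWorkAsync(job.data);
});
```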
I've started to experience this as well, but the job doesn't truly fail. What I mean is that the work I wanted done finishes in its entirety. Because of the error (I think because of the error, anyway), the job is automatically retried, and the second run is guaranteed to error again.
@bradvogel my jobs do typically peg my machine's CPU, but I don't see failures until I run jobs that take roughly 45s or more to complete. @zhaohanweng I'm returning a promise from the process function.
Can you remove parts of your job processing function until you can get it to run successfully? I bet some part (probably near the end of the processing function) is stalling the JavaScript event loop and causing the setInterval() call Bull uses to renew the lock to lag.
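One way to test that hypothesis without Bull in the picture is to run a small lag monitor alongside the processor and watch how far the timer drifts while a job runs (the 1s tick and 100 ms threshold are arbitrary):

```js
// Crude event-loop lag monitor. If the gap between ticks grows far beyond
// the 1s schedule, something synchronous is blocking the loop, and the
// same blockage delays the timer Bull uses to renew the job lock.
let last = Date.now();
setInterval(() => {
  const lag = Date.now() - last - 1000;
  if (lag > 100) {
    console.warn(`event loop lagged by ${lag} ms`);
  }
  last = Date.now();
}, 1000);
```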
@bradvogel yep, I reorganized the work into two separate jobs that run sequentially and things are cranking along smoothly now. The CPU is much less taxed, so it seems you were exactly right about the timer latency. Thanks!
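A sketch of one way such a split might look, with hypothetical queue names and doFirstHalf / doSecondHalf standing in for the real work: the first processor finishes its half and enqueues the remainder as a separate job, so neither run holds the CPU long enough to miss lock renewal.

```js
const Queue = require('bull');

// Hypothetical two-stage split: doFirstHalf / doSecondHalf stand in for
// the real work that used to live in a single processor.
const stageOne = new Queue('stage-one', 'redis://127.0.0.1:6379');
const stageTwo = new Queue('stage-two', 'redis://127.0.0.1:6379');

stageOne.process(function (job) {
  return doFirstHalf(job.data).then(function (intermediate) {
    // Hand the rest off as a fresh job instead of continuing here.
    return stageTwo.add({ source: job.data, intermediate: intermediate });
  });
});

stageTwo.process(function (job) {
  return doSecondHalf(job.data.intermediate);
});
```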
Since this issue keeps coming up, I have added a new feature request that will hopefully solve this problem once and for all: #488
What's the feature request that replaced this bug? |
I'd love to know as well |
any update on this? |