Stuck & Orphaned jobs #688
Comments
I am not really sure what the stuck state means. I guess this is using bull-ui, right? Jobs should always be in one of the following states: wait, active, delayed, completed or failed. So the first thing I would like to know is which state these jobs are in?
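For illustration, the invariant being described is that a job ID lives in exactly one per-state Redis structure at a time. A simplified, hypothetical model of that invariant (not Bull's actual code, and `getJobState` is an invented helper name):

```javascript
// Simplified model: Bull keeps each state in its own Redis structure
// (keys like "bull:<queue>:wait"). A job ID should appear in exactly one.
function getJobState(structures, jobId) {
  const states = ['wait', 'active', 'delayed', 'completed', 'failed'];
  const found = states.filter((s) => (structures[s] || []).includes(jobId));
  if (found.length === 1) return found[0];
  if (found.length === 0) return 'stuck'; // the anomaly discussed in this issue
  throw new Error(`job ${jobId} is in multiple states: ${found}`);
}

// Example: job "42" is waiting; job "7" appears in no structure at all.
const structures = { wait: ['42'], active: [], delayed: [], completed: [], failed: [] };
console.log(getJobState(structures, '42')); // "wait"
console.log(getJobState(structures, '7'));  // "stuck"
```

Under this model, a "stuck" job is simply one whose key still exists while its ID is in none of the state structures.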
Oh well, that's exactly the problem. To me, it seems none. Look: (I'm using Arena.) Also, I got the job using queue.getJob:
That is very strange, actually. It should not be possible for a job not to be in one of the states; jobs are always moved atomically from one state to another.
Some more clues: while the UI shows the number of items in the priority queue as 0, I'm getting this in Redis:
They are not getting processed, as the active queue is completely empty and the user interface shows 0 items in the queue. Also, I can confirm that this specific job
Do you have any way to check whether the job has actually been processed correctly?
Yes, it seems as if the jobs have not been processed at all. What priority values are you using?
The priority for this job was 7, I think.
Can you verify the jobId is really not in the wait list?
I can confirm that it's not in the 'bull:upload photo:wait' list. What I did was this:
It has 3K items in it. None of them are this job. (I have a load of jobs now, which will be processed in a few minutes.)
According to the logic, if a job leaves the wait status (which is the first status a job gets when added to the queue), it gets the property,
Oh wait, could it be that my jobs had a TTL, and it was reached before I processed them?
You mean the timeout option? That is only considered while the job is already being processed. So if it had timed out, the job would be in the failed status, not in the priority set.
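The point above is that a timeout can only fire while the processor is running, roughly like racing the handler against a timer, so a timed-out job lands in "failed" rather than lingering in the priority set. A simplified sketch of that mechanism (illustrative only, not Bull's implementation):

```javascript
// Race a job handler's promise against a timer. If the timer wins,
// the job is treated as failed -- it never returns to a waiting state.
function withTimeout(promise, ms) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error('job timed out')), ms);
  });
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// A handler that takes 50 ms against a 10 ms timeout ends up failed, not stuck:
const slowHandler = new Promise((resolve) => setTimeout(resolve, 50));
withTimeout(slowHandler, 10).catch((err) => console.log(err.message)); // "job timed out"
```

This is why a timeout cannot explain a job that was never picked up in the first place: the race only starts once processing begins.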
From which version did you upgrade? We would need to find out from which version you started getting this issue.
I'm on bull 3.0.0 according to. I was using 2.x, but when I upgraded last week I flushed the DB. If you want, I think I can arrange access for you so you can take a deeper look if it's worrying. Feel free to contact me directly.
Are you able to reproduce the issue on a development machine?
I don't think I can reproduce it easily, as I've no idea how it happened.
Here is another clue I got. Of course, the job keys are still there. While the workers are shutting down, we're not pausing the queues. This might be related to this issue, or it might be irrelevant. I'll try to find out.
Maybe you are calling clean in your shutdown or start-up process? In any case, you do not need to call resume when starting up; the workers will keep processing the queue as soon as they start.
I have the same issue: thousands of jobs are "stuck" in the queue. In fact they are processed, even though I'm using
@Nelrohd it would be good if you could upgrade to 3.2.0.
@emilsedgh any news regarding this issue?
No, no news. This keeps happening, though. I haven't looked into it in more detail; probably within the next few weeks I will. Does bull have any debug output I can enable via env vars that would help?
According to the docs, it seems like
This issue sounds troubling to hit in a sensitive production environment. How probable is this issue? Has anyone else run into it?
@nitzanav Unfortunately the poster is not able to reproduce it consistently, so I am going to close this. For all we know, it could be an issue in the poster's process function.
It's fair to close this. I was not able to reproduce it consistently, and our load has decreased a lot since, so we're not seeing it anymore.
I'm experiencing the same issues; jobs are created with If the reason is that those IDs have already been taken, what would be the right approach to clean up the state? For context, this queue uses custom job IDs, and the job has to be
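On the custom-ID point: as far as I understand, Bull treats adding a job with an already-existing jobId as a duplicate and skips it, so stale keys left over from a previous run can silently swallow new jobs. A simplified model of that behavior (illustrative sketch, `addJob` and the `Map`-backed store are invented for the example, not Bull's code):

```javascript
// Simplified dedupe model: adding a job whose custom jobId already
// exists in the store is a no-op. Stale keys from a previous run can
// therefore make newly added jobs appear "stuck" (they were never added).
function addJob(store, jobId, data) {
  if (store.has(jobId)) return null; // duplicate ID: silently skipped
  store.set(jobId, data);
  return jobId;
}

const store = new Map();
console.log(addJob(store, 'photo-1', { attempt: 1 })); // "photo-1"
console.log(addJob(store, 'photo-1', { attempt: 2 })); // null (ID already taken)
```

If this is what is happening, cleaning up the leftover job keys (or letting removeOnComplete delete them) would free the IDs for reuse.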
Two years later and I'm still seeing the same behavior of jobs getting the "stuck" status, even when explicitly calling done() and returning. Any updates?
@W4G1 this is a closed issue. If you have more information and/or a case that reproduces it, you are more than welcome to create a new issue and we will look into it.
Hi there.
It seems 3.0 is very stable. Thank you.
I process tens of thousands of jobs every day. I always have removeOnComplete set to true on them, so ideally, if the queue is empty, Redis should be empty. Today I looked and there are thousands (~30K) of jobs that are just there. Stuck.
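The expectation described (removeOnComplete plus an empty queue should mean an empty Redis) can be phrased as a small triage helper. The field names (`opts`, `attemptsMade`, `finishedOn`) follow Bull 3's job shape, but `classifyLeftoverJob` is an invented helper for illustration, not part of Bull:

```javascript
// Triage a job key found lingering in Redis. With removeOnComplete: true,
// a finished job's key should already be gone; a surviving key with zero
// attempts made points at a job that was never processed at all ("stuck").
function classifyLeftoverJob(job) {
  if (job.opts.removeOnComplete && job.finishedOn) {
    return 'should-have-been-removed';
  }
  if (job.attemptsMade === 0) return 'never-processed'; // the "stuck" case
  return 'in-progress-or-failed';
}

console.log(classifyLeftoverJob({ opts: { removeOnComplete: true }, attemptsMade: 0 }));
// "never-processed"
```

The ~30K leftover keys with 0 attempts would fall into the "never-processed" bucket under this classification.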
Some of them have had 0 attempts made.
This is an example:
First I thought maybe I'm not doing a graceful shutdown or something like that. But this job is ~12 hours old, and I haven't restarted the queue worker for the past 48 hours or so.
Any ideas on how I can get more clues to find the problem?