Possible infinite loop if BRPOPLPUSH fails? #1304
Comments
@balexandre actually reaching the Redis limit is not the real problem here. There could be multiple reasons causing your Redis db to be full, and there are a few tools that show you which keys are taking the most memory, so you can either clean up a bit or upgrade your plan. The problem is that the Bull library enters an infinite loop when this occurs, generating a huge volume of logs and ignoring the error, so we have no way of being warned that it is happening.
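For anyone who wants to check how full their instance is before deciding between cleanup and an upgrade, `redis-cli --bigkeys` and `INFO memory` are the usual starting points. Below is a minimal sketch of the same check from Node using ioredis (the client Bull already depends on); the `REDIS_URL` environment variable is just an assumption about how your connection string is exposed.

```js
// Minimal sketch: dump the Redis memory stats (used_memory, maxmemory, ...)
// so you can see how close the instance is to its quota.
const Redis = require('ioredis');

async function inspectMemory(redisUrl) {
  const client = new Redis(redisUrl);
  // INFO with the "memory" section returns the memory-related stats only.
  const info = await client.info('memory');
  console.log(info);
  await client.quit();
}

inspectMemory(process.env.REDIS_URL);
```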
I have the same issue... Is there some way to catch this error? I am not even sure where it's thrown from.
Seems to have been introduced in this commit: lines 893 to 896 in f8ff368.
Perhaps the fix is simply to not swallow any errors here? Wouldn't this fire an error event on the queue if not swallowed?
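If the error were re-emitted instead of swallowed, you would pick it up through the queue's `error` event, which Bull already exposes. A minimal sketch (the queue name and `REDIS_URL` are placeholders):

```js
// Minimal sketch: attach an error listener so low-level Redis failures
// reach your alerting instead of only the process logs.
const Queue = require('bull');

const videoQueue = new Queue('video transcoding', process.env.REDIS_URL);

videoQueue.on('error', (err) => {
  // If getNextJob stopped swallowing the BRPOPLPUSH failure, it would
  // surface here (e.g. an OOM error once Redis hits its maxmemory limit).
  console.error('Queue error:', err.message);
});
```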
@mmelvin0 Did you receive a response or have any updates on this? Thanks!
Actually I think it would be as easy as only swallowing the "Connection is closed" error, since that error happens when we force-close the connection to Redis. Should be an easy fix.
This blew up our logging pipeline too and filled up our Papertrail daily limit in two minutes. I love Bull and would like to move on this quickly. @manast how can I or my team help fix this? We actually have an open ticket about this on our backlog, because it can potentially break our production at https://checklyhq.com when this happens and we don't intervene manually.
If you provide a PR I will merge it quickly and make a release. As mentioned, it is as simple as only swallowing the error when the connection is closed, and maybe adding a small 5-second delay before trying again.
@manast I will make a PR |
Would love feedback or insights on the PR I mentioned above. This is still a significant risk of overwhelming any logging pipeline when things go south.
This is high prio, will try to merge today. |
@manast-apsis thanks, let me know if I can help anywhere. |
@manast @manast-apsis Let me know what I need to do to get this merged. Thanks |
@manast @manast-apsis hope all is well, let me know if I can assist with merging the PR for this? |
The thing is that the current PR does not do anything valuable. I am not sure which solution is best here. Basically I think we just need to delay a few seconds in the case of an error, and maybe make that delay configurable. But sometimes the connection will fail with a "connection closed" error because we are manually closing the queue; in that case we do not want to delay, so we need to check whether the closing promise is set and skip the delay if it is.
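To make the shape of that fix concrete, here is a rough sketch of the retry behaviour being described. This is not Bull's actual internals: the `queue.closing` flag, the call to `queue.getNextJob()`, the `emit('error', ...)` forwarding and the `RETRY_DELAY_MS` value are all assumptions taken from the discussion above.

```js
// Rough sketch, not real Bull code: swallow the "Connection is closed"
// error only when the queue is shutting down deliberately; otherwise
// surface the error and back off before retrying.
const RETRY_DELAY_MS = 5000; // could be made configurable

const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function getNextJobLoop(queue) {
  while (!queue.closing) {
    try {
      await queue.getNextJob();
    } catch (err) {
      if (err.message === 'Connection is closed' && queue.closing) {
        // We are closing the queue on purpose: stop quietly, no delay.
        return;
      }
      // Any other error (e.g. Redis OOM) is emitted instead of swallowed...
      queue.emit('error', err);
      // ...and we wait before retrying so the loop cannot flood the logs.
      await delay(RETRY_DELAY_MS);
    }
  }
}
```

The delay does not make the underlying error go away, but it bounds the log volume and gives the `error` listener a chance to alert before the next attempt.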
@manast what is your suggestion for moving this ticket forward? I have the feeling we are going in circles a bit. I would be happy to remove/close my PR in favour of a better solution. The reason I'm pushing this is that it is still a risk for many folks using logging solutions with hard limits. As mentioned, I'm ready to help or have my team also look at it. |
Same problem here, only in production, using Docker.
@manast what do you mean when you say "we need to just delay some seconds"?
I think I'm just really confused about the "delay" part. This will not solve spamming the logging channel; it will just spam with a delay...
I posted a possible solution to this problem in #1746 |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
@manast I was not aware of this. Thanks for mentioning and fixing this! I will remove my thumbs down. |
Hello, I am currently running into this. I'm finding that if the Redis memory is full, it seems to fail here: https://github.com/OptimalBits/bull/blob/develop/lib/queue.js#L1202 and the catch block doesn't catch anything; no error is thrown. We get stuck in an infinite loop.
If Redis is full, then the best option is to delete some data by hand or upgrade the instance to one with more memory; most hosting providers offer seamless upgrade procedures.
Description
Hello, today we faced a problem that appears to be an infinite loop in the Bull library.
Our service is hosted on Heroku with the Redis addon and today we reached the memory quota of the Redis DB.
What happened is that we had an enormous log stack saying this:
The log stack grew to about 1 GB in a few minutes, until we fixed the quota issue.
By looking a little bit at the code in lib/queue.js, it seems that the error on BRPOPLPUSH is ignored in Queue.prototype.getNextJob. So I guess what happened is that the loop searching for new jobs to process kept hitting the error indefinitely. I don't have enough knowledge about how Bull works internally to propose a fix, but I think this is something that should be handled, maybe by detecting when there are several errors on BRPOPLPUSH in a row and adding a waiting duration when this happens too frequently.
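To illustrate the kind of safeguard meant here, the snippet below is purely a sketch: `tryPopNextJob`, the counter and the backoff cap are all made-up names and values, and this is not how Bull is structured internally. The idea is simply to count consecutive BRPOPLPUSH failures and wait longer the more often it fails in a row.

```js
// Hypothetical sketch of the suggestion above: back off when the blocking
// pop keeps failing, so a full Redis does not turn into a tight error loop.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

let consecutiveErrors = 0;

async function popNextJobWithBackoff(tryPopNextJob) {
  try {
    const job = await tryPopNextJob(); // e.g. the BRPOPLPUSH call
    consecutiveErrors = 0;             // success resets the counter
    return job;
  } catch (err) {
    consecutiveErrors += 1;
    // Exponential backoff capped at 30s keeps the log volume bounded.
    const waitMs = Math.min(1000 * 2 ** consecutiveErrors, 30000);
    console.error(`BRPOPLPUSH failed ${consecutiveErrors} time(s):`, err.message);
    await sleep(waitMs);
    return null;
  }
}
```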
Bull version
v3.7.0
(I just saw that 3.8 is out; reading the changelog, it doesn't seem this would be fixed in that version.)