Locking queues #7

Open

jpallen opened this issue Mar 19, 2013 · 7 comments

Comments

jpallen commented Mar 19, 2013

We've been using Fairy in production for a while now, and one of the problems we keep running into is that we end up with a bunch of FAIRY:QUEUE:* keys in our Redis database that aren't being processed by any worker. Because of the way Fairy locks the queues, once these queues contain something they never get processed again and are effectively blocked. I don't know why they get stuck in the first place; it could be that they aren't cleared on a restart, it could be the result of a crash, or it could be a Fairy bug (wherever it comes from, I would definitely consider this behaviour a bug).

I think Fairy could be improved by a more sophisticated and dedicated locking scheme. The Redis documentation comes with a thorough description of how to use SETNX to create a locking mechanism with a timeout. Each queue would have its own lock that would be acquired by a worker when processing the queue. This would then be released when the queue was empty again. This would replace the test of whether the queue already contains entries in the _poll method. Since the lock has a timeout (which could be set by the user depending on the expected length of tasks), even if the queue did get stuck as before, the lock would eventually expire and the queue could be processed again without manual intervention.
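
For concreteness, here is a minimal sketch of that SETNX-style lock with a timeout, written in TypeScript with ioredis. The `FAIRY:LOCK:*` key name and the `lockQueue`/`unlockQueue` helpers are illustrative only, not part of Fairy's current API, and the TTL would be the user-configurable task timeout mentioned above:

```ts
import Redis from "ioredis";

const redis = new Redis();

// Try to take the per-queue lock. Returns a token on success, or null if
// another worker already holds the lock.
async function lockQueue(queue: string, ttlMs: number): Promise<string | null> {
  const token = `${process.pid}:${Date.now()}:${Math.random()}`;
  // SET key value PX <ttl> NX: set only if the key does not exist, with an
  // expiry, so a crashed worker's lock disappears on its own after ttlMs.
  const ok = await redis.set(`FAIRY:LOCK:${queue}`, token, "PX", ttlMs, "NX");
  return ok === "OK" ? token : null;
}

// Release the lock only if we still own it (compare-and-delete in a Lua
// script, so we never delete a lock that expired and was re-acquired by
// another worker in the meantime).
const RELEASE = `
if redis.call("get", KEYS[1]) == ARGV[1] then
  return redis.call("del", KEYS[1])
end
return 0`;

async function unlockQueue(queue: string, token: string): Promise<void> {
  await redis.eval(RELEASE, 1, `FAIRY:LOCK:${queue}`, token);
}
```

In this scheme a worker would call something like `lockQueue` in place of the existing check in `_poll`, process tasks while it holds the lock, and call `unlockQueue` once the queue is empty.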

henryoswald (Contributor) commented

Hey, we are probably going to start working on this soon, as it's becoming a problem for us. Would you be OK with a different locking mechanism?

baoshan (Owner) commented Jun 6, 2013

Sorry! I missed the issue for 3 months! Does your project still need a workaround? Please let me know.

baoshan (Owner) commented Jun 6, 2013

I think the locking mechanism is promising, but the blocking behavior is by design, and there's a reschedule method to restart the blocked tasks. Does that suit your needs?

jpallen (Author) commented Jun 10, 2013

We found that we had problems with the queues getting stuck, as I mentioned. We haven't been able to track down how they got into this inconsistent state, where the FAIRY:QUEUE:* lists have elements in them but no Fairy worker is processing them. A locking mechanism would be at least a partial fix, since it would be more robust against stuck queues.

Yes, we managed to get around it by calling reschedule regularly when we need to, but it's still a workaround rather than a solution to the underlying problem.

We're also seeing the workers regularly getting stuck for some reason. We still haven't figured out why, but after a while, and at apparently random intervals, all 5 of the workers we have running stop processing anything. Restarting our worker process gets everything going again, but it's not clear what the underlying problem is. We'll keep you updated as we find out more.

baoshan (Owner) commented Jun 10, 2013

Are there any FAIRY:PROCESSING:* keys when the workers are stuck? That may be because the workers haven't called back for some reason. Could you please post a KEYS * result when the problem happens?
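
(For anyone gathering that data, here is a small sketch with ioredis that dumps the Fairy-related keys using SCAN, which is safer than KEYS * on a busy database; `dumpFairyKeys` is just an illustrative name.)

```ts
import Redis from "ioredis";

const redis = new Redis();

// Print every FAIRY:* key without blocking the server the way `KEYS *` can.
function dumpFairyKeys(): void {
  const stream = redis.scanStream({ match: "FAIRY:*" });
  stream.on("data", (keys: string[]) => {
    for (const key of keys) console.log(key);
  });
  stream.on("end", () => redis.quit());
}

dumpFairyKeys();
```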

Pub/sub rather than polling is definitely the right way to improve, and that will be done in the next version. But I'm afraid a lock with a timeout is not always better: tasks can vary enormously in how much processing time they demand.
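
As a rough illustration of the pub/sub idea (a sketch with ioredis; the `fairy:enqueued` channel and the `enqueue`/`startWorker` helpers are invented for this example and are not part of Fairy): the producer publishes a message after pushing a task, and workers wake on that message instead of polling.

```ts
import Redis from "ioredis";

// Redis needs separate connections for publishing and subscribing.
const pub = new Redis();
const sub = new Redis();

// Producer side: push the task, then announce which queue received it.
async function enqueue(queue: string, task: object): Promise<void> {
  await pub.lpush(`FAIRY:QUEUE:${queue}`, JSON.stringify(task));
  await pub.publish("fairy:enqueued", queue);
}

// Worker side: wake up when a task arrives instead of polling on a timer.
async function startWorker(): Promise<void> {
  await sub.subscribe("fairy:enqueued");
  sub.on("message", (_channel, queue) => {
    // A real worker would try to acquire the queue's lock here and drain it.
    console.log(`task enqueued on ${queue}`);
  });
}
```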

Thanks.

jpallen (Author) commented Jun 10, 2013

I don't think so - I think the only keys set were the FAIRY:QUEUE:* keys. It's been a while since we last investigated this, though, because we've been running with the workaround above for some time. I'll try to get you more debugging data next time we look into it.

Thanks a lot.

ducdigital commented

@baoshan @jpallen Have you ever implemented the lock?

This might help: https://github.com/mike-marcacci/node-redlock
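
For reference, usage would look roughly like this, assuming node-redlock's current `acquire`/`release` API and an ioredis client; the lock resource name is illustrative only:

```ts
import Redis from "ioredis";
import Redlock from "redlock";

const redis = new Redis();
const redlock = new Redlock([redis], { retryCount: 3 });

async function processQueue(queue: string): Promise<void> {
  // The lock auto-expires after 30s, so a crashed worker can't block the
  // queue forever.
  const lock = await redlock.acquire([`locks:FAIRY:QUEUE:${queue}`], 30_000);
  try {
    // ... pop and process tasks from FAIRY:QUEUE:<queue> here ...
  } finally {
    await lock.release();
  }
}
```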
