Locking queues #7

Open

jpallen opened this issue Mar 19, 2013 · 7 comments

Comments

jpallen commented Mar 19, 2013

We've been using Fairy in production for a while now, and one of the problems we keep running into is that we end up with a bunch of FAIRY:QUEUE:* keys in our Redis database that aren't being processed by any worker. Because of the way Fairy locks the queues, once these queues contain something they never get processed again and are effectively blocked. I don't know why they get stuck in the first place; it could be that they aren't cleared on a restart, it could be the result of a crash, or it could be a Fairy bug (wherever it comes from, I would definitely consider this behaviour a bug).

I think Fairy could be improved by a more sophisticated and dedicated locking scheme. The Redis documentation comes with a thorough description of how to use SETNX to create a locking mechanism with a timeout. Each queue would have its own lock that would be acquired by a worker when processing the queue. This would then be released when the queue was empty again. This would replace the test of whether the queue already contains entries in the _poll method. Since the lock has a timeout (which could be set by the user depending on the expected length of tasks), even if the queue did get stuck as before, the lock would eventually expire and the queue could be processed again without manual intervention.
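
For concreteness, here is a minimal sketch of that SETNX-style lock with a timeout, written in TypeScript with ioredis. The `FAIRY:LOCK:*` key name and the `lockQueue`/`unlockQueue` helpers are illustrative only, not part of Fairy's current API, and the TTL would be the user-configurable task timeout mentioned above:

```ts
import Redis from "ioredis";

const redis = new Redis();

// Try to take the per-queue lock. Returns a token on success, or null if
// another worker already holds the lock.
async function lockQueue(queue: string, ttlMs: number): Promise<string | null> {
  const token = `${process.pid}:${Date.now()}:${Math.random()}`;
  // SET key value PX <ttl> NX: set only if the key does not exist, with an
  // expiry, so a crashed worker's lock disappears on its own after ttlMs.
  const ok = await redis.set(`FAIRY:LOCK:${queue}`, token, "PX", ttlMs, "NX");
  return ok === "OK" ? token : null;
}

// Release the lock only if we still own it (compare-and-delete in a Lua
// script, so we never delete a lock that expired and was re-acquired by
// another worker in the meantime).
const RELEASE = `
if redis.call("get", KEYS[1]) == ARGV[1] then
  return redis.call("del", KEYS[1])
end
return 0`;

async function unlockQueue(queue: string, token: string): Promise<void> {
  await redis.eval(RELEASE, 1, `FAIRY:LOCK:${queue}`, token);
}
```

In this scheme a worker would call something like `lockQueue` in place of the existing check in `_poll`, process tasks while it holds the lock, and call `unlockQueue` once the queue is empty.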

henryoswald (Contributor) commented

Hey, we are probably going to start working on this soon, as it's becoming a problem for us. Would you be OK with a different locking mechanism?

baoshan (Owner) commented Jun 6, 2013

Sorry! I missed the issue for 3 months! Does your project still need a workaround? Please let me know.

baoshan (Owner) commented Jun 6, 2013

I think the locking mechanism is promising, but the blocking behavior is by design, and there's a reschedule method to restart the blocked tasks. Does that suit your needs?

jpallen (Author) commented Jun 10, 2013

We found that we had problems with the queues getting stuck, as I mentioned. We haven't been able to track down how they got into this inconsistent state, where the FAIRY:QUEUE:* lists have elements in them but no Fairy worker is processing them. A locking mechanism would be at least a partial fix, since it would be more robust against stuck queues.

Yes, we managed to get around it by calling reschedule regularly when we need to, but it's still a workaround rather than a solution to the underlying problem.

We're also seeing the workers regularly getting stuck for some reason. We still haven't figured out why, but after a while, and at apparently random intervals, all 5 of the workers we have running stop processing anything. Restarting our worker process gets everything going again, but it's not clear what the underlying problem is. We'll keep you updated as we find out more.

baoshan (Owner) commented Jun 10, 2013

Are there any FAIRY:PROCESSING:* keys when the workers are stuck? That may be because the workers haven't called back for some reason. Could you please post a KEYS * result when the problem happens?
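
(For anyone gathering that data, here is a small sketch with ioredis that dumps the Fairy-related keys using SCAN, which is safer than KEYS * on a busy database; `dumpFairyKeys` is just an illustrative name.)

```ts
import Redis from "ioredis";

const redis = new Redis();

// Print every FAIRY:* key without blocking the server the way `KEYS *` can.
function dumpFairyKeys(): void {
  const stream = redis.scanStream({ match: "FAIRY:*" });
  stream.on("data", (keys: string[]) => {
    for (const key of keys) console.log(key);
  });
  stream.on("end", () => redis.quit());
}

dumpFairyKeys();
```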

Pub/sub rather than polling is definitely the right way to improve, and that will be done in the next version. But I'm afraid a lock with a timeout is not always better: tasks can vary enormously in how much processing time they demand.
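
As a rough illustration of the pub/sub idea (a sketch with ioredis; the `fairy:enqueued` channel and the `enqueue`/`startWorker` helpers are invented for this example and are not part of Fairy): the producer publishes a message after pushing a task, and workers wake on that message instead of polling.

```ts
import Redis from "ioredis";

// Redis needs separate connections for publishing and subscribing.
const pub = new Redis();
const sub = new Redis();

// Producer side: push the task, then announce which queue received it.
async function enqueue(queue: string, task: object): Promise<void> {
  await pub.lpush(`FAIRY:QUEUE:${queue}`, JSON.stringify(task));
  await pub.publish("fairy:enqueued", queue);
}

// Worker side: wake up when a task arrives instead of polling on a timer.
async function startWorker(): Promise<void> {
  await sub.subscribe("fairy:enqueued");
  sub.on("message", (_channel, queue) => {
    // A real worker would try to acquire the queue's lock here and drain it.
    console.log(`task enqueued on ${queue}`);
  });
}
```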

Thanks.

jpallen (Author) commented Jun 10, 2013

I don't think so - I think the only keys set were the FAIRY:QUEUE:* keys. It's been a while since we last investigated this, though, because we've been running with the workaround above for some time. I'll try to get you more debugging data next time we look into it.

Thanks a lot.

ducdigital commented

@baoshan @jpallen Have you ever implemented the lock?

This might help: https://github.com/mike-marcacci/node-redlock
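
For reference, usage would look roughly like this, assuming node-redlock's current `acquire`/`release` API and an ioredis client; the lock resource name is illustrative only:

```ts
import Redis from "ioredis";
import Redlock from "redlock";

const redis = new Redis();
const redlock = new Redlock([redis], { retryCount: 3 });

async function processQueue(queue: string): Promise<void> {
  // The lock auto-expires after 30s, so a crashed worker can't block the
  // queue forever.
  const lock = await redlock.acquire([`locks:FAIRY:QUEUE:${queue}`], 30_000);
  try {
    // ... pop and process tasks from FAIRY:QUEUE:<queue> here ...
  } finally {
    await lock.release();
  }
}
```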
