multi-hop / dependent jobs? #25
It feels a bit strange to me, but it should be possible. I've never had or tried a use case like this myself, and it's not a use case I've tried to design for, but the necessary pieces and functionality are there to pull it off. I've focused mostly on small jobs that wouldn't be large enough to worry about breaking into pieces, so I'm curious how much processing time one of these jobs takes. To clarify the sequence of events: producer1 enqueues a job, consumer1 processes it and enqueues a second job, consumer2 processes that one, and the first job only succeeds once the whole chain does.
If you end up doing this and find that some small changes could make the use case easier or cleaner, please let me know.
Yes, that's the gist of it. There's some design reason why it is the way it is, mostly related to: some jobs are useful standalone and also as steps of other jobs, depending on the context, so they are split up; and some producers should only ever be run from a single node, while we may have any number of consumers. I'll let you know if I run into any particular issue. Thanks! Feel free to close.
This is working without issue, FWIW.
Great to hear! I was somewhat concerned about the failure model of such a scheme, but after giving it some thought I think it should be reasonably robust. Since bee-queue generally follows at-least-once delivery, retries and stalled-job resets should ensure that everything gets done in these "chained" jobs, just with a bit more risk of some things happening more than once.
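To make the "more than once" risk concrete: the usual mitigation under at-least-once delivery is to make the side effect itself idempotent. A minimal sketch of that idea, not bee-queue API — `sendOnce` and the key format are hypothetical, and an in-memory `Set` stands in for something like a Redis `SETNX` check:

```js
// Idempotency guard: even if a chained job runs twice under
// at-least-once delivery, the side effect (e.g. sending an email)
// happens at most once per logical key.
const sentKeys = new Set(); // stand-in for a shared Redis SETNX key space

function sendOnce(key, sendFn) {
  if (sentKeys.has(key)) {
    return false; // duplicate delivery; skip the side effect
  }
  sentKeys.add(key);
  sendFn();
  return true;
}

// Simulate the same job being processed twice:
let emailsSent = 0;
sendOnce('report:sub123:2016-05-01', function () { emailsSent++; });
sendOnce('report:sub123:2016-05-01', function () { emailsSent++; });
console.log(emailsSent); // 1
```

In a real deployment the guard would need to live in Redis (shared by all workers), not in process memory.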
Yea, I'm not quite out of the woods yet as it relates to failures, timeouts, and retries; I think I still have some tweaking to do.

NOTE: apologies in advance for this novel-size comment. I'm trying to write it out clearly mostly for myself, to reason it out, and hopefully you may have a recommendation or two after understanding the problem frame better.

Here's what these jobs do. We have a reporting site for restaurant POS data. It's heavy on charts and tables. To print, we use phantomjs. From the webpage, the user can click the print button, which triggers a websocket message to the server, which puts a job in the Redis queue. A server running phantom picks it up, creates a pdf of the page, and uploads it to s3; the user can then open the pdf, download it, or email it. So that's one producer/consumer job.

But some people don't want to log into the website every day and run the report. Some reports they just want mailed to them, so we have the concept of subscriptions.

Here's how I've defined the queues / producers / consumers:

```js
deliverSubQueue = new Queue(name, {
  redis: config.redis,
  isWorker: false, // not a worker, only a producer of jobs
  removeOnSuccess: true
});

// jobs are created like this; phantomjs sometimes fails, so I do want retries
// also, we need at least 60s, some reports can take a while for various reasons
var job = deliverSubQueue.createJob({sub: sub});
job.timeout(1000 * 60).retries(4).save(function (err, job) {
  if (err) {
    return error('Error creating subscription job in Redis', err);
  }
  log('job.%s[%d] created for subscription %s', job.queue.name, job.id, sub._id);
});
```

There's a separate node that runs the consumer code for this job. The consumer code generates a second job to create a pdf (it re-uses the same queue/job definition that the user hits when printing directly from the page). I want zero retries on this, because if it fails I'd prefer the original (parent) job to retry and just consider the child job a failure. The timeout needs to be somewhat less than the subscription job's timeout.

```js
deliverSubQueue = new Queue(name, {
  redis: config.redis,
  isWorker: true,
  removeOnSuccess: true
});

// worker code
deliverSubQueue.process(function (job, done) {
  // the processor for this job creates the second job:
  var pdfjob = createPdfQueue.createJob({ data: data });
  pdfjob.timeout(1000 * 50).retries(0).save(function (err, subjob) {
    if (err) { return done(err); }
  });
  pdfjob.on('succeeded', function (result) {
    // etc.
  });
});
```

The `createPdfQueue` producer:

```js
createPdfQueue = new Queue(name, {
  redis: config.redis,
  isWorker: false,
  removeOnSuccess: true
});
```

I'll spare the queue / worker setup for generating the pdfs, as I don't think it is so relevant.

So, here's the problem I'm facing. I'm trying to test and instrument various failure scenarios. Under normal conditions, this setup works just fine. I have a concurrency setting of 2, and I've chugged through 50 or so jobs repeatedly without failure. It's the retries and timeouts in the face of failure that I need to resolve. So, if I bootstrap all this and then kill the pdf worker, the parent job will continue, and I end up generating only a single pdf job. I'm trying to figure out a configuration that will retry the parent job.
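To illustrate why the retry configuration matters here, a toy simulation of the failure mode just described — plain Node, no bee-queue or Redis, and every name is made up: the pdf worker is down, each parent attempt enqueues another child job before timing out, and the child jobs pile up.

```js
// Toy simulation of the chained-job failure mode: the pdf consumer is
// down, so child jobs accumulate while the parent keeps retrying.
const pdfQueue = [];      // child jobs waiting for the (dead) pdf worker
const parentRetries = 4;  // matches .retries(4) on the subscription job

function runParentAttempt() {
  pdfQueue.push({ data: 'report' }); // each attempt enqueues a child job
  return false;                      // child never completes -> parent times out
}

let attempts = 0;
while (attempts <= parentRetries && !runParentAttempt()) {
  attempts++;
}

// 1 original attempt + 4 retries => 5 queued pdf jobs, each of which
// would produce a pdf (and an email) once the worker comes back.
console.log(pdfQueue.length); // 5
```

This is the "email after email" pile-up discussed below; the retries don't fail fast because nothing ever tells the parent that the child queue has no live consumer.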
Here's an example of the state of the keys:
So, it works as expected if I propagate the subjob failure back to the parent, i.e.:

```js
pdfjob.on('failed', function (err) {
  done(err); // calls parent job's done with error
});
```

This causes the parent to kick in and retry. The only problem is when the pdf consumer is down entirely.
Good news: this all sounds like expected behavior to me, and it also sounds like you have a strong handle on everything that's going on as far as the library and its behavior is concerned. Bad news: this risk of sending multiple emails is somewhat inherent in the at-least-once-delivery model. Good news: I think it's possible to move some parts around and end up with something with a bit less risk of sending multiple emails. It should at the very least be possible to avoid queueing up email after email if the pdf consumer is down. I'm not immediately sure what the best structure is exactly, though; I'll have to give it some thought.
Oh, missed your most recent comment; yes, that seems like it would work out better. I think it might be possible to make it work without propagating all the way back to the parent, but I suppose that's not such a bad thing to be doing.
Yea, I'd welcome any ideas on what to do in the scenario where the pdf consumer is offline (that has happened before, although there are other ways to mitigate against it). I'm not certain I have a clear solution. By the very nature of pub/sub, the publisher isn't supposed to care who the subscriber is or whether they're available; the broker just queues up messages in a mailbox for them. If there's a way I can query the queue to figure out whether a pdfjob has already been submitted and is pending, though, that would do the trick? An alternative solution might be to move the parent …
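On querying for an already-pending pdfjob: one way to get that effect is to give the child job a deterministic id and skip enqueueing when one with that id is already pending. (Newer bee-queue releases expose `job.setId()` for idempotent job creation; worth checking whether the version in use supports it.) A minimal in-memory sketch of the idea, with a `Map` standing in for the queue and all names hypothetical:

```js
// Deduplicate child jobs by deterministic id: a second enqueue with the
// same id is a no-op while the first is still pending.
const queuedById = new Map(); // stand-in for the queue's job-id index

function enqueueOnce(id, payload) {
  if (queuedById.has(id)) {
    return queuedById.get(id); // already pending; don't enqueue again
  }
  const job = { id: id, payload: payload };
  queuedById.set(id, job);
  return job;
}

enqueueOnce('pdf:sub123:2016-05-01', { sub: 'sub123' });
enqueueOnce('pdf:sub123:2016-05-01', { sub: 'sub123' }); // duplicate ignored
console.log(queuedById.size); // 1
```

With an id derived from the subscription and report date, parent retries would reuse the pending pdf job instead of stacking up new ones.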
Hey, do you see any potential issue with jobs that have multiple hops / dependencies? For example:

```
producer1
  --> consumer1
        which then creates producer2
          --> consumer2
```

The `producer1` job isn't successful until the whole chain returns successfully. I also want to make use of retries and timeouts for the upstream job, but not the downstream one. I'm looking to get rid of kue; I've had nothing but problems with it, and I'm looking at giving yours a try. I only have one particular job with a multi-hop structure like that, but I thought I'd inquire before I spent the time to migrate the code and try it out.