Is there an advised way to handle jobs that get stuck in an active state? #391

Closed
dbousamra opened this issue Aug 15, 2014 · 17 comments

Comments

@dbousamra

I have a situation where a worker picks a job up, starts processing it, and then in the middle of processing, I get a fatal process error (due to bad OpenCV errors I can't seem to catch). PM2 restarts the app, and continues processing.

However, that job has been marked as active, and is now stuck.

Is there a way to clear out active jobs, pushing them back to failed if they are in an active state too long?

@behrad
Collaborator

behrad commented Aug 15, 2014

Solutions at the worker code level:

  1. Use a Node.js domain to capture async errors and call done(err).
  2. Bind to process.on('uncaughtException'), as in the graceful shutdown example in the docs.

Solutions built into Kue (not implemented yet):

  1. Use an active-job TTL, after which jobs are automatically marked failed.
  2. Wrap the client's process callback in a domain so that Kue does the above for them; however, there would need to be a way to pass the error back to the client.
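
A minimal sketch of the worker-level idea, for illustration only. `guardWorker` is a hypothetical helper, not part of Kue, and it uses plain try/catch plus promise rejection handling rather than a domain. It therefore only covers synchronous throws and rejected promises returned by the handler; it cannot catch errors thrown later inside unrelated async callbacks, nor native crashes that kill the process.

```javascript
// Hypothetical helper (not a Kue API): wraps a worker handler so that
// synchronous throws and rejected promises are reported through done(err).
function guardWorker(handler) {
  return function guarded(job, done) {
    let called = false;

    // Make sure done() is invoked at most once, whichever path finishes first.
    const once = (err, result) => {
      if (called) return;
      called = true;
      done(err, result);
    };

    try {
      const maybePromise = handler(job, once);
      // If the handler returns a promise, route its rejection to done(err).
      if (maybePromise && typeof maybePromise.then === 'function') {
        maybePromise.then(undefined, (err) => {
          once(err || new Error('worker handler rejected'));
        });
      }
    } catch (err) {
      // A synchronous throw fails the job instead of crashing the process.
      once(err);
    }
  };
}
```

Assuming a Kue queue object named `queue`, the wrapper would be passed where the handler normally goes, for example `queue.process('email', guardWorker(myHandler))`, with `myHandler` standing in for your own function.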

@licq

licq commented Aug 18, 2014

Is there any plan on the built-in mechanism?

@behrad
Collaborator

behrad commented Aug 18, 2014

No milestones for those two yet.

  1. About the TTL: it doesn't guarantee that your code won't hit unhandled exceptions, and active jobs will stay stuck until their optionally set TTL expires!

  2. As I said, the problem is that I don't know how Kue should pass the error back to the client code.

lchenay added a commit to lchenay/kue that referenced this issue Sep 5, 2014
…failed.

Uncaught exceptions make Kue stop and leave the job stuck in an unknown state.

Related to Automattic#391
@jowy

jowy commented Sep 6, 2014

How do you suggest pushing these broken jobs back on to the queue safely?

@behrad
Collaborator

behrad commented Sep 6, 2014

  1. Via the web UI: click on the job's state and, in the combo box that appears, change it to inactive.
  2. Programmatically: query the active jobs, filter the broken ones by id or creation time, and call .inactive() on them.
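
The programmatic route might look roughly like the sketch below. `requeueStuckJobs` is a hypothetical helper; it assumes `queue.active(cb)` yields the ids of active jobs, `Job.get(id, cb)` loads a job, and `job.created_at` holds a millisecond timestamp. Creation time is only a rough proxy for "stuck", so your application should apply its own criteria before re-queuing anything.

```javascript
// Hypothetical helper: moves jobs created more than maxAgeMs ago that are
// still active back to the inactive state so a worker can pick them up again.
// `queue` is a Kue queue and `Job` is kue.Job; both are passed in explicitly.
function requeueStuckJobs(queue, Job, maxAgeMs, now = Date.now()) {
  queue.active((err, ids) => {
    if (err) {
      console.error('could not list active jobs:', err);
      return;
    }
    ids.forEach((id) => {
      Job.get(id, (getErr, job) => {
        if (getErr) {
          console.error(`could not load job ${id}:`, getErr);
          return;
        }
        // Assumption: created_at is a millisecond timestamp (possibly a string).
        const age = now - Number(job.created_at);
        if (age > maxAgeMs) {
          job.inactive(); // push the job back onto the queue
        }
      });
    });
  });
}

// Possible usage, assuming kue is installed and Redis is reachable:
//   const kue = require('kue');
//   const queue = kue.createQueue();
//   requeueStuckJobs(queue, kue.Job, 10 * 60 * 1000);
```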

@jowy

jowy commented Sep 6, 2014

Excellent, thank you!

@behrad
Collaborator

behrad commented Sep 6, 2014

@jowy You're welcome. However, you'd better guard your worker code properly so that it doesn't crash on uncaught errors. Please read #403

@jowy

jowy commented Sep 6, 2014

Yep that's the idea, I wasn't aware of some silent exceptions not bubbling up in my promises until today. Luckily it was only my dev env. Thanks for the hard work and advice, I will check it out!!

@behrad behrad added this to the 0.9.0 milestone Sep 15, 2014
@kithokit

Hi, how do I query active jobs programmatically? I can't find it in the Kue docs. Thanks

@behrad
Collaborator

behrad commented Sep 24, 2014

#418 (comment)

@GeoffreyPlitt

I'm having this problem right now too. If an error happens you can catch it with domains/uncaughtException etc, but what if you just forgot to call done()?

All other message queues I know of (RabbitMQ, Iron, AWS SQS) have the notion of timeouts where a job is retried if it runs past the TTL. Really hoping Kue gets this!

@behrad
Collaborator

behrad commented Dec 12, 2014

> but what if you just forgot to call done()?

Then you shouldn't blame Kue for your stuck active jobs ;) And as I said above, a TTL would only be a workaround: you lose queue concurrency until the TTL expires for each stuck job.

> All other message queues I know of (RabbitMQ, Iron, AWS SQS) have the notion of timeouts where a job is retried if it runs past the TTL. Really hoping Kue gets this!

I'm eager to implement a TTL in Kue; however, I think the default action should be marking the job as failed. It will then be retried if it has remaining attempts. And bear in mind that Kue is a job queue, not a message queue :)
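
Until something like this exists in Kue itself, the "fail after a TTL" behaviour can be approximated inside the worker. The sketch below is only an illustration: `withTtl` and its `timers` parameter are hypothetical names, not Kue APIs. It fails the job through done(err) if the handler has not finished within `ttlMs`, which covers a forgotten done() or a hung library call. It cannot stop the hung handler itself, and it does not help when the whole process crashes.

```javascript
// Hypothetical helper (not a Kue API): fails the job if the handler has not
// called done() within ttlMs. `timers` is injectable to make testing easy.
function withTtl(handler, ttlMs, timers) {
  const t = timers || {
    set: (fn, ms) => setTimeout(fn, ms),
    clear: (id) => clearTimeout(id),
  };

  return function timed(job, done) {
    let settled = false;
    let timer;

    // Whichever happens first (handler finishing or TTL expiring) wins;
    // the later call is ignored.
    const finish = (err, result) => {
      if (settled) return;
      settled = true;
      t.clear(timer);
      done(err, result);
    };

    timer = t.set(() => {
      finish(new Error(`job exceeded TTL of ${ttlMs} ms`));
    }, ttlMs);

    handler(job, finish);
  };
}
```

Because the timeout is reported as an ordinary failure, the job would then follow the normal failure path, including a retry if it has remaining attempts, as described above.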

@GeoffreyPlitt

You say "however I think default action should be marking job as failed"-- how else do you do TTL's? I think that is what's being proposed.

Forgetting to call done isn't always a programming flaw, functions can hang for lots of reasons unanticipated by the primary programmer, such as library flaws.

Given the amount of functionality Kue does have, it just seems strange that it doesn't provide TTL functionality. Most of the popular alternatives I've used (SQS, IronMQ, RabbitMQ) seem to have it.

Also, from my research, Job Queue's are a subset of Message Queues, so all Job Queues are Message Queues, which means Kue is a Message Queue. What do you mean it's not a Message Queue? What specific aspect of Message Queues doesn't apply to Kue?

@behrad
Collaborator

behrad commented Dec 12, 2014

There's no argument: Kue should and will have a TTL implementation.

> Also, from my research, Job Queue's are a subset of Message Queues,

Job queues are more granular, focused abstractions, usually built on top of message queues, and are more concerned with batch processing, task distribution, workload management, ...

> so all Job Queues are Message Queues, which means Kue is a Message Queue.

You can't say Resque, Celery, Kue, ... are Redis, RabbitMQ, ActiveMQ, ...! Can you? Message queues sit under the hood of job queues and offer a wider set of MOM and middleware features, such as durability, reliability, queue types, routing, pub/sub, selectable producer/consumer patterns, ...
Job queues provide only a subset of these, and they work at the level of your business task rather than a single fine-grained message.

> What do you mean it's not a Message Queue? What specific aspect of Message Queues doesn't apply to Kue?

Kue has no routing, wildcard consumers, message-level configurability, ...

@GeoffreyPlitt

Really helpful explanation, thanks!


@behrad
Collaborator

behrad commented Mar 21, 2015

#544

@behrad
Collaborator

behrad commented Mar 21, 2015

To do: document using domains (#403) and why Kue does not use them built-in.

6 participants