
Delayed jobs repeat #449

Closed
brianconnoly opened this issue Oct 20, 2014 · 13 comments

@brianconnoly commented Oct 20, 2014

Hi there!
Our company uses kue to perform delayed posting to social networks.
We love everything about kue except one thing: sometimes jobs run twice, or even more times.
Our server configuration has a lot of forked workers that execute the jobs.
Maybe the cause is my misunderstanding of the jobs.promote mechanism.
Should I call it in every worker or only in the master process?

@behrad (Collaborator) commented Oct 20, 2014

promote should be called once, in only ONE (= master) process.
See #413.

Are you passing parameters to it? What values?
Read #443.
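
For illustration, a minimal sketch of that setup using node's cluster module — the `post` job type and worker count here are assumptions, not from this thread:

```js
var cluster = require('cluster');
var kue = require('kue');

var jobs = kue.createQueue();

if (cluster.isMaster) {
  // Promote delayed jobs from exactly one process. Calling promote()
  // in every worker lets several processes move the same delayed job
  // into the active list, which is one way duplicates can appear.
  jobs.promote();

  for (var i = 0; i < 4; i++) {
    cluster.fork();
  }
} else {
  // Workers only process jobs; they never promote.
  jobs.process('post', function (job, done) {
    // ... the actual posting work goes here ...
    done();
  });
}
```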

@brianconnoly (Author) commented Oct 21, 2014

I tried different values and finally settled on this (CoffeeScript):

    every = 5 * SEC
    check = 10 * 1000
    jobs.promote every, check

I'll try to describe my situation in more detail.
I have one redis server and a lot of job types; some of them are delayed.
The first problem was that NOT delayed jobs got stuck in the queue: they are "pushed along" by new ones, but the last one often gets stuck. I added promote to the worker that processed that kind of job. The problem went away, but some delayed jobs started to run twice.

Then I divided it into two separate contexts (via prefixes). I left one jobs.promote in the first context (NOT delayed jobs) and another one in the delayed context, as in the sketch below.
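
To sketch what I mean (the prefix names and file names here are just examples):

```js
// plain-worker.js — handles the NOT delayed job types.
var kue = require('kue');
var plainJobs = kue.createQueue({ prefix: 'plain' });
plainJobs.promote(); // called once, in this process only

// delayed-worker.js — a separate process, since kue's createQueue
// returns a per-process singleton queue.
var kue = require('kue');
var delayedJobs = kue.createQueue({ prefix: 'delayed' });
delayedJobs.promote(); // the second promote lives here
```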

When I tried to put jobs.promote in every worker, jobs started to run many times, so I reverted that change.

Maybe my project needs something "bigger" than kue? That would be sad news for me, because I love everything about kue except these situations.

@behrad (Collaborator) commented Oct 21, 2014

> The first problem was that NOT delayed jobs got stuck in the queue: they are "pushed along" by new ones, but the last one often gets stuck. I added promote to the worker that processed that kind of job. The problem went away, but some delayed jobs started to run twice.

Jobs stuck in the inactive (queued) state are not related to kue; you can read #130 about this. So first try node.js domains, or an uncaughtException hook with kue's graceful shutdown, to protect your node.js process from hard crashes; then check your redis logs for any crashes or restarts as the source. You should also remove the promote() call where you are not using delayed jobs.
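
Something like this minimal sketch — the 5-second timeout is just an example, and you should check your kue version's shutdown signature:

```js
var kue = require('kue');
var queue = kue.createQueue();

// Last-resort guard: log the error, let kue release active jobs,
// then exit so a supervisor can restart the process.
process.once('uncaughtException', function (err) {
  console.error('Uncaught exception, shutting down kue:', err);
  queue.shutdown(5000, function (shutdownErr) {
    process.exit(1);
  });
});
```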

> When I tried to put jobs.promote in every worker, jobs started to run many times, so I reverted that change.

Again: do not do this!

> Maybe my project needs something "bigger" than kue? That would be sad news for me, because I love everything about kue except these situations.

How big is your big? What do you mean by BIG?

I don't doubt Kue's scalability or the simplicity it offers at that scale. When it comes to criticality, you will see a small percentage of inconsistencies (stuck jobs) when you cannot configure your redis server robustly (or when it is an unstable remotely hosted instance, or one with poor connectivity).

You can help by focusing on the potential duplication-on-delayed-promotion bug, if there is one.

@brianconnoly (Author) commented Oct 22, 2014

My redis server is in the same datacenter as the application.
I have my own nodejs module, "Sequence", that binds async methods into one chain. Every step is wrapped in try/catch, so when a sequence crashes, it quits gracefully and done() is called.

We also have our own monitoring system for process instances and forks, and there are no crashes. When I logged job start/end, I noticed that a stuck job was not even started.

About project size: we have 5-10 thousand posting jobs a day. Every posting job consists of two delayed kue jobs and one non-delayed one. We also have many jobs unrelated to posting (converting uploaded pictures, saving converted paths to the db, importing from social networks, and so on).

I read #130. When will 1.0.0 be available?

@behrad (Collaborator) commented Oct 22, 2014

> We also have our own monitoring system for process instances and forks, and there are no crashes. When I logged job start/end, I noticed that a stuck job was not even started.

Does "started" mean not queued, or not pushed to the active state? We should talk in kue terminology so I can understand your problem clearly and give you the right advice.

> I have my own nodejs module, "Sequence", that binds async methods into one chain. Every step is wrapped in try/catch, so when a sequence crashes, it quits gracefully and done() is called.

try/catch only guards against synchronous errors (which are very rare in node.js code). You surely know about proper error handling in node.js (promises, domains, ...).
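
For example, a rough sketch of wrapping a kue handler in a domain — the `post` type and doThePosting are placeholders:

```js
var domain = require('domain');
var kue = require('kue');

var jobs = kue.createQueue();

jobs.process('post', function (job, done) {
  var d = domain.create();
  d.on('error', function (err) {
    // Asynchronous throws land here instead of crashing the process,
    // so the job can be failed cleanly and retried by kue.
    done(err);
  });
  d.run(function () {
    doThePosting(job.data, done); // placeholder for the real async work
  });
});
```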

> About project size: we have 5-10 thousand posting jobs a day. Every posting job consists of two delayed kue jobs and one non-delayed one. We also have many jobs unrelated to posting (converting uploaded pictures, saving converted paths to the db, importing from social networks, and so on).

That's not BIG. We are processing 1.5 million jobs/day and have not seen a stuck job in the last 200,000,000 processed jobs since our last redis tune-up 4 months ago. We are not even using the watchStuckJobs call.

> I read #130. When will 1.0.0 be available?

Maybe early 2015.
Until then, add watchStuckJobs in one of your processes, properly guard your redis-client node.js processes against crashes as I said, then sit down and enjoy Kueing :)
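
For example (the 1000 ms interval is just an illustration):

```js
var kue = require('kue');
var queue = kue.createQueue();

// Periodically watches for jobs stuck in the inactive state and
// restores them to the queue. Run this in only one of your processes.
queue.watchStuckJobs(1000);
```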

@behrad (Collaborator) commented Oct 22, 2014

And any news about the topic's main issue (duplicates on delayed jobs)?

@brianconnoly (Author) commented Oct 22, 2014

  1. "Started" means jobs.process..... fired, but the job remains queued. (I'll try to be more concrete. English is not my native language, sorry for the blurry description.)
  2. nodejs domains are new to me; I've started reading the documentation on them.
  3. That's good. Now I know that I will stay with kue :)
  4. Am I right that watchStuckJobs should be added for each kue prefix?
  5. About duplicated delayed jobs: I put in a lot of logging code and am waiting for users to report duplicated posts, to see what's happening.

Thank you very much for your answers!
Kue = THE BEST! ;)

@behrad (Collaborator) commented Oct 22, 2014

> I'll try to be more concrete. English is not my native language, sorry for the blurry description.

Not that bad; just read the kue documentation carefully.

> nodejs domains are new to me; I've started reading the documentation on them.

I'll push some docs on proper error handling to the documentation in a few weeks.

@brianconnoly (Author) commented Oct 22, 2014

I edited the message above.

@behrad (Collaborator) commented Oct 22, 2014

> Am I right that watchStuckJobs should be added for each kue prefix?

Yes, but unfortunately I've hard-coded it for the default q prefix since its initial implementation. I'm opening an issue to fix this in the next patch version.

> About duplicated delayed jobs: I put in a lot of logging code and am waiting for users to report duplicated posts, to see what's happening.

Nice, we can have a better kue this way.

@behrad (Collaborator) commented Dec 15, 2014

@jamiter commented Jan 16, 2015

Any update on this, @brianconnoly? I'm having the same issue, and I'm only running one process with a single jobs.promote() without parameters.

Something I'm doing is changing the max attempts during job execution, which might be the cause. I'm doing this because the error generated by the job should determine whether it should run again or not. (The job uses the backoff mechanism, which triggers the delay.) Maybe there's a better pattern for this.

Edit:
Now that I think of it, maybe I should listen for failed-attempt events and, if the error shouldn't trigger a retry, mark the job as failed.
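
Roughly what I have in mind — the `post` type and isFatal() are placeholders, and forcing the failed state is just one possible approach, not an official kue pattern:

```js
var kue = require('kue');
var queue = kue.createQueue();

var job = queue.create('post', { title: 'hello' })
  .attempts(5)                                   // retry up to 5 times
  .backoff({ delay: 60 * 1000, type: 'fixed' })  // delay between retries
  .save();

job.on('failed attempt', function (errorMessage, doneAttempts) {
  if (isFatal(errorMessage)) { // hypothetical error classifier
    // Give up on permanent errors instead of mutating max attempts
    // mid-run: force the job into the failed state.
    job.state('failed').save();
  }
});
```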

@behrad (Collaborator) commented Mar 21, 2015
