Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Seeing NOT_FOUND after job times out and should be buried #82

Open
boncey opened this issue Jan 11, 2015 · 7 comments
Open

Seeing NOT_FOUND after job times out and should be buried #82

boncey opened this issue Jan 11, 2015 · 7 comments

Comments

@boncey
Copy link
Contributor

boncey commented Jan 11, 2015

I am seeing strange behaviour when a job times out.

If a job times out then it doesn't get buried, instead backburner gets a NOT_FOUND error when it tries to bury it.
This is an excerpt of the logs when this happens.

   .../ruby/2.1.0/gems/backburner-0.4.5/lib/backburner/job.rb:81:in `rescue in timeout_job_after'
   .../ruby/2.1.0/gems/backburner-0.4.5/lib/backburner/job.rb:78:in `timeout_job_after'
   .../ruby/2.1.0/gems/backburner-0.4.5/lib/backburner/job.rb:49:in `block in process'
   .../ruby/2.1.0/gems/backburner-0.4.5/lib/backburner/hooks.rb:28:in `call'
   .../ruby/2.1.0/gems/backburner-0.4.5/lib/backburner/hooks.rb:28:in `block in around_hook_events'
   .../ruby/2.1.0/gems/backburner-0.4.5/lib/backburner/hooks.rb:37:in `call'
   .../ruby/2.1.0/gems/backburner-0.4.5/lib/backburner/hooks.rb:37:in `around_hook_events'
   .../ruby/2.1.0/gems/backburner-0.4.5/lib/backburner/job.rb:48:in `process'
   .../ruby/2.1.0/gems/backburner-0.4.5/lib/backburner/worker.rb:143:in `work_one_job'
   .../ruby/2.1.0/gems/backburner-0.4.5/lib/backburner/workers/simple.rb:25:in `block in start'
   .../ruby/2.1.0/gems/backburner-0.4.5/lib/backburner/workers/simple.rb:25:in `loop'
   .../ruby/2.1.0/gems/backburner-0.4.5/lib/backburner/workers/simple.rb:25:in `start'
   .../ruby/2.1.0/gems/backburner-0.4.5/lib/backburner/worker.rb:48:in `start'
   .../ruby/2.1.0/gems/backburner-0.4.5/lib/backburner.rb:39:in `work'
   .../ruby/2.1.0/gems/backburner-0.4.5/lib/backburner/cli.rb:21:in `block in start'
   .../ruby/2.1.0/gems/dante-0.1.5/lib/dante/runner.rb:112:in `call'
   .../ruby/2.1.0/gems/dante-0.1.5/lib/dante/runner.rb:112:in `start'
   .../ruby/2.1.0/gems/dante-0.1.5/lib/dante/runner.rb:72:in `execute'
   .../ruby/2.1.0/gems/backburner-0.4.5/lib/backburner/cli.rb:17:in `start'
   .../ruby/2.1.0/gems/backburner-0.4.5/bin/backburner:7:in `<top (required)>'
   .../ruby/2.1.0/bin/backburner:23:in `load'
   .../ruby/2.1.0/bin/backburner:23:in `<main>'
.../ruby/2.1.0/gems/beaneater-0.3.2/lib/beaneater/connection.rb:116:in `parse_response': Response failed with: NOT_FOUND (Beaneater::NotFoundError)
    from .../ruby/2.1.0/gems/beaneater-0.3.2/lib/beaneater/connection.rb:55:in `block in transmit'
    from .../ruby/2.1.0/gems/beaneater-0.3.2/lib/beaneater/connection.rb:49:in `synchronize'
    from .../ruby/2.1.0/gems/beaneater-0.3.2/lib/beaneater/connection.rb:49:in `transmit'
    from .../ruby/2.1.0/gems/beaneater-0.3.2/lib/beaneater/job/record.rb:193:in `transmit'
    from .../ruby/2.1.0/gems/beaneater-0.3.2/lib/beaneater/job/record.rb:208:in `with_reserved'
    from .../ruby/2.1.0/gems/beaneater-0.3.2/lib/beaneater/job/record.rb:38:in `bury'
    from .../ruby/2.1.0/gems/backburner-0.4.5/lib/backburner/worker.rb:156:in `rescue in work_one_job'
    from .../ruby/2.1.0/gems/backburner-0.4.5/lib/backburner/worker.rb:140:in `work_one_job'
    from .../ruby/2.1.0/gems/backburner-0.4.5/lib/backburner/workers/simple.rb:25:in `block in start'
    from .../ruby/2.1.0/gems/backburner-0.4.5/lib/backburner/workers/simple.rb:25:in `loop'
    from .../ruby/2.1.0/gems/backburner-0.4.5/lib/backburner/workers/simple.rb:25:in `start'
    from .../ruby/2.1.0/gems/backburner-0.4.5/lib/backburner/worker.rb:48:in `start'
    from .../ruby/2.1.0/gems/backburner-0.4.5/lib/backburner.rb:39:in `work'
    from .../ruby/2.1.0/gems/backburner-0.4.5/lib/backburner/cli.rb:21:in `block in start'
    from .../ruby/2.1.0/gems/dante-0.1.5/lib/dante/runner.rb:112:in `call'
    from .../ruby/2.1.0/gems/dante-0.1.5/lib/dante/runner.rb:112:in `start'
    from .../ruby/2.1.0/gems/dante-0.1.5/lib/dante/runner.rb:72:in `execute'
    from .../ruby/2.1.0/gems/backburner-0.4.5/lib/backburner/cli.rb:17:in `start'
    from .../ruby/2.1.0/gems/backburner-0.4.5/bin/backburner:7:in `<top (required)>'
    from .../ruby/2.1.0/bin/backburner:23:in `load'
    from .../ruby/2.1.0/bin/backburner:23:in `<main>'

my-queue-name: process with pid 12641 started.
Starting backburner service...

At that point the worker process dies and is restarted (via Ruby Daemons) at which point the timed out job then runs successfully to completion.

In my backburner configuration I don't override retries so it's set to the default of 1.
Also, I only have a single worker processing jobs on the queue.
Queue size is never more than a 100 jobs at a time.

I'm using backburner 0.4.5.

@nesquena
Copy link
Owner

Hmm, this definitely doesn't sound great. Like retry functionality may be improperly executing now in certain cases. Unfortunately I don't have the bandwidth to investigate this at the moment. I could definitely use additional help with maintainers for this library if anyone is using it and interested in helping fix issues.

@silentshade
Copy link
Collaborator

This might be the same issue as i explained here #86

@contentfree
Copy link
Collaborator

Ran into this today myself (or at least uncovered it… sigh). I'm on Backburner 1.1.0 with beaneater ~> 1.0.

I'm baffled by this. The only way that beaneater would say NOT_FOUND is if the job was deleted, right? Though I suppose it could be that it has lost its connection (maybe?) and hasn't reconfigured itself correctly (though that should be happening in beaneater's _with_retry)… maybe?

What do you think @nesquena @silentshade?

@contentfree
Copy link
Collaborator

@boncey Do you have a repro case?

@boncey
Copy link
Contributor Author

boncey commented Oct 1, 2015

@contentfree Sorry, I'm afraid not.
I've just looked in my logs and it's still happening although my system copes fine with it so it's not actually bothering me.

I've switched from Ruby Daemons to God and I have a lot more jobs and workers now than when I reported this but I still see it happening.

@contentfree
Copy link
Collaborator

From beanstalkd's protocol.txt it states:

A client can delete jobs that it has reserved, ready jobs, delayed jobs, and jobs that are buried.
Unfortunately beanstalkd isn't verbose enough with its -VVV flag (it won't show the actual command submitted) so it's not easy to track what's going on.

I'll run a packet capture. I can recreate it pretty consistently (on Windows at least) by suspending one of the child threads and waiting for the timeout. Fails every time.

@contentfree
Copy link
Collaborator

Working on the packet capture, but here's a stacktrace from my failure (using 1.1.0 and beaneater 1.0) – And yes, this be Windows... don't ask!

E, [2015-10-01T21:17:09.196290 #2832] ERROR -- : Exception Backburner::Job::JobTimeout -> Renderer::AfterEffects hit 180s timeout.
backtrace: [
  "C:/Ruby200/lib/ruby/gems/2.0.0/gems/nio4r-1.1.1/lib/nio/selector.rb:61:in `select'", 
  "C:/Ruby200/lib/ruby/gems/2.0.0/gems/nio4r-1.1.1/lib/nio/selector.rb:61:in `block in select'", 
  "C:/Ruby200/lib/ruby/gems/2.0.0/gems/nio4r-1.1.1/lib/nio/selector.rb:51:in `synchronize'", 
  "C:/Ruby200/lib/ruby/gems/2.0.0/gems/nio4r-1.1.1/lib/nio/selector.rb:51:in `select'", 
  "C:/var/www/vpe-public-api/current/lib/better_timeout.rb:56:in `block in run_with_timeout'", 
  "C:/Ruby200/lib/ruby/2.0.0/open3.rb:217:in `popen_run'", 
  "C:/Ruby200/lib/ruby/2.0.0/open3.rb:206:in `popen2e'", 
  "C:/var/www/vpe-public-api/current/lib/better_timeout.rb:26:in `run_with_timeout'", 
  "C:/var/www/vpe-public-api/current/lib/renderers/after_effects.rb:69:in `generate_aep'", 
  "C:/Ruby200/lib/ruby/gems/2.0.0/gems/backburner-1.1.0/lib/backburner/performable.rb:36:in `perform'", 
  "C:/Ruby200/lib/ruby/gems/2.0.0/gems/backburner-1.1.0/lib/backburner/job.rb:49:in `block (2 levels) in process'", 
  "C:/Ruby200/lib/ruby/gems/2.0.0/gems/backburner-1.1.0/lib/backburner/job.rb:89:in `block in timeout_job_after'", 
  "C:/Ruby200/lib/ruby/gems/2.0.0/gems/backburner-1.1.0/lib/backburner/job.rb:89:in `timeout_job_after'", 
  "C:/Ruby200/lib/ruby/gems/2.0.0/gems/backburner-1.1.0/lib/backburner/job.rb:49:in `block in process'", 
  "C:/Ruby200/lib/ruby/gems/2.0.0/gems/backburner-1.1.0/lib/backburner/hooks.rb:28:in `call'", 
  "C:/Ruby200/lib/ruby/gems/2.0.0/gems/backburner-1.1.0/lib/backburner/hooks.rb:28:in `block in around_hook_events'", 
  "C:/Ruby200/lib/ruby/gems/2.0.0/gems/backburner-1.1.0/lib/backburner/hooks.rb:37:in `call'", 
  "C:/Ruby200/lib/ruby/gems/2.0.0/gems/backburner-1.1.0/lib/backburner/hooks.rb:37:in `around_hook_events'", 
  "C:/Ruby200/lib/ruby/gems/2.0.0/gems/backburner-1.1.0/lib/backburner/job.rb:48:in `process'", 
  "C:/Ruby200/lib/ruby/gems/2.0.0/gems/backburner-1.1.0/lib/backburner/worker.rb:134:in `work_one_job'", 
  "C:/Ruby200/lib/ruby/gems/2.0.0/gems/backburner-1.1.0/lib/backburner/workers/simple.rb:25:in `block in start'", 
  "C:/Ruby200/lib/ruby/gems/2.0.0/gems/backburner-1.1.0/lib/backburner/workers/simple.rb:25:in `loop'", 
  "C:/Ruby200/lib/ruby/gems/2.0.0/gems/backburner-1.1.0/lib/backburner/workers/simple.rb:25:in `start'", 
  "C:/Ruby200/lib/ruby/gems/2.0.0/gems/backburner-1.1.0/lib/backburner/worker.rb:49:in `start'", 
  "C:/Ruby200/lib/ruby/gems/2.0.0/gems/backburner-1.1.0/lib/backburner.rb:39:in `work'", 
  "C:/Ruby200/lib/ruby/gems/2.0.0/gems/backburner-1.1.0/lib/backburner/cli.rb:23:in `block in start'", 
  "C:/Ruby200/lib/ruby/gems/2.0.0/gems/dante-0.2.0/lib/dante/runner.rb:123:in `call'", 
  "C:/Ruby200/lib/ruby/gems/2.0.0/gems/dante-0.2.0/lib/dante/runner.rb:123:in `start'", 
  "C:/Ruby200/lib/ruby/gems/2.0.0/gems/dante-0.2.0/lib/dante/runner.rb:74:in `execute'", 
  "C:/Ruby200/lib/ruby/gems/2.0.0/gems/backburner-1.1.0/lib/backburner/cli.rb:20:in `start'", 
  "C:/Ruby200/lib/ruby/gems/2.0.0/gems/backburner-1.1.0/bin/backburner:7:in `<top (required)>'", 
  "C:/Ruby200/lib/ruby/gems/2.0.0/bin/backburner:23:in `load'", 
  "C:/Ruby200/lib/ruby/gems/2.0.0/bin/backburner:23:in `<main>'"
]
   C:/Ruby200/lib/ruby/gems/2.0.0/gems/backburner-1.1.0/lib/backburner/job.rb:91:in `rescue in timeout_job_after'
   C:/Ruby200/lib/ruby/gems/2.0.0/gems/backburner-1.1.0/lib/backburner/job.rb:88:in `timeout_job_after'
   C:/Ruby200/lib/ruby/gems/2.0.0/gems/backburner-1.1.0/lib/backburner/job.rb:49:in `block in process'
   C:/Ruby200/lib/ruby/gems/2.0.0/gems/backburner-1.1.0/lib/backburner/hooks.rb:28:in `call'
   C:/Ruby200/lib/ruby/gems/2.0.0/gems/backburner-1.1.0/lib/backburner/hooks.rb:28:in `block in around_hook_events'
   C:/Ruby200/lib/ruby/gems/2.0.0/gems/backburner-1.1.0/lib/backburner/hooks.rb:37:in `call'
   C:/Ruby200/lib/ruby/gems/2.0.0/gems/backburner-1.1.0/lib/backburner/hooks.rb:37:in `around_hook_events'
   C:/Ruby200/lib/ruby/gems/2.0.0/gems/backburner-1.1.0/lib/backburner/job.rb:48:in `process'
   C:/Ruby200/lib/ruby/gems/2.0.0/gems/backburner-1.1.0/lib/backburner/worker.rb:134:in `work_one_job'
   C:/Ruby200/lib/ruby/gems/2.0.0/gems/backburner-1.1.0/lib/backburner/workers/simple.rb:25:in `block in start'
   C:/Ruby200/lib/ruby/gems/2.0.0/gems/backburner-1.1.0/lib/backburner/workers/simple.rb:25:in `loop'
   C:/Ruby200/lib/ruby/gems/2.0.0/gems/backburner-1.1.0/lib/backburner/workers/simple.rb:25:in `start'
   C:/Ruby200/lib/ruby/gems/2.0.0/gems/backburner-1.1.0/lib/backburner/worker.rb:49:in `start'
   C:/Ruby200/lib/ruby/gems/2.0.0/gems/backburner-1.1.0/lib/backburner.rb:39:in `work'
   C:/Ruby200/lib/ruby/gems/2.0.0/gems/backburner-1.1.0/lib/backburner/cli.rb:23:in `block in start'
   C:/Ruby200/lib/ruby/gems/2.0.0/gems/dante-0.2.0/lib/dante/runner.rb:123:in `call'
   C:/Ruby200/lib/ruby/gems/2.0.0/gems/dante-0.2.0/lib/dante/runner.rb:123:in `start'
   C:/Ruby200/lib/ruby/gems/2.0.0/gems/dante-0.2.0/lib/dante/runner.rb:74:in `execute'
   C:/Ruby200/lib/ruby/gems/2.0.0/gems/backburner-1.1.0/lib/backburner/cli.rb:20:in `start'
   C:/Ruby200/lib/ruby/gems/2.0.0/gems/backburner-1.1.0/bin/backburner:7:in `<top (required)>'
   C:/Ruby200/lib/ruby/gems/2.0.0/bin/backburner:23:in `load'
   C:/Ruby200/lib/ruby/gems/2.0.0/bin/backburner:23:in `<main>'

My problem might might be slightly different, but when my job times out it results in the threads I open in my job to become zombies. I probably need to do something different on my side to handle the timeout better…

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants