Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Problems if worker doesn't have an error listener #47

Closed
nathanbowser opened this issue Aug 26, 2014 · 10 comments
Closed

Problems if worker doesn't have an error listener #47

nathanbowser opened this issue Aug 26, 2014 · 10 comments

Comments

@nathanbowser
Copy link
Contributor

If a worker doesn't have an error event listener attached and a worker has an error, things get into a weird state because doneWorking is never called. You can see why by looking into node's eventemitter: https://github.com/joyent/node/blob/master/lib/events.js#L74-89

nathanbowser added a commit to SpiderStrategies/node-resque that referenced this issue Aug 26, 2014
@evantahler
Copy link
Member

Can you clarify the behavior you are seeing? In general, I would expect the process/project using node-resque to crash outright if the error event is not handled.

@nathanbowser
Copy link
Contributor Author

Resque will keep that worker around as running and it will never be freed.

@evantahler
Copy link
Member

Ahh! I think you are referring to the fact that the process will die, but the worker.untrack() (and doneWorking()) won't be called (because the process is dead). This means the worker will be listed in redis as still 'working'. In #48 (comment) we are discussing how I don't think that this package should handle process signal listeners, and I think that this is a similar kind of thing. Due to the way node assumes that uncaught error emits should bubble up as errors, I think we would keep the same pattern, and not try to do more logic on top of it.

There are some hacks that can be done, IE:

  • place your invocation of node resque within a domain (to catch any un-caught error events)
  • use process.on('uncaughtException') to catch that error event
  • ... or just catch that error event :D

@evantahler
Copy link
Member

Perhaps we should add a note about this to the readme.

@nathanbowser
Copy link
Contributor Author

The process doesn't die because resque is already using a domain. Here's an example:

var NR = require(__dirname + '/../index.js')

var connectionDetails = {
  package:   'redis',
  host:      '127.0.0.1',
  password:  '',
  port:      6379,
  database:  1,
  namespace: 'resquetest'
}

var jobs = {
  add: {
    perform: function (a, b, next) {
      console.log('performing...')
      next(new Error('Blue smoke'))
    }
  }
}

var worker = new NR.worker({connection: connectionDetails, queues: ['math']}, jobs, function () {
  worker.workerCleanup()
  worker.start()
})

var queue = new NR.queue({connection: connectionDetails}, jobs, function () {
  queue.enqueue('math', 'add', [1,2])
})

worker.start()

setTimeout(function () {
  queue.enqueue('math', 'add', [1,2])
}, 10000)

I had to put in a temporary solution for catching the error event for now. If that's your ideal permanent solution (which I personally don't feel is very clean), then we should at least document that.

@nathanbowser
Copy link
Contributor Author

Oh, you published the readme note before I hit send. :)

@nathanbowser
Copy link
Contributor Author

I guess another thing you might want to think about... some perform errors might not be errors in the sense that the process will be all out of whack. It could just be an error from not being able to talk to some API. If we were always dealing with errors that put node in a horrible state, then yea I think the process should be killed, but that's not always the case.

@evantahler
Copy link
Member

Ahh! That last example was helpful. Yes, I agree this is broken in some way. I'll look into it.

@evantahler evantahler reopened this Aug 27, 2014
@evantahler
Copy link
Member

The weirdness actually happens in worker.fail. we get here from the domain on('error') in worker.perform, so we are emitting an error within an uncaught error callback...

@evantahler
Copy link
Member

Closing in favor of #49
Thanks for pointing this out!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants