
Handling error emails going forward? #3588

Closed
jdotjdot opened this issue Nov 13, 2016 · 11 comments
@jdotjdot

The changelog says that all email-related functionality has been removed. We made heavy use of the error emails feature in Celery.

Since this is removed, what is now the recommended way to handle errors in Celery tasks?

If we were to reimplement the error email sender internally, where would be the right place to hook that?

Thanks!

@fuhrysteve
Contributor

@jdotjdot take a look at https://sentry.io

@jdotjdot
Author

I currently use Opbeat but I want to actually send them from Celery--also
without using an external service. Is there no hook to do this?


@fuhrysteve
Contributor

You can implement your own base task like this to handle errors however you would like:

http://docs.celeryproject.org/en/latest/userguide/tasks.html#task-inheritance
http://docs.celeryproject.org/en/latest/userguide/tasks.html#on_failure

@jdotjdot
Author

Thanks. And on_failure will properly handle tasks even if they're hard-killed with SIGKILL? That was always my concern about on_failure. For (bad) architectural reasons, we end up with tasks SIGKILL-ed a lot, and that's when I need to handle this the most.

@fuhrysteve
Contributor

fuhrysteve commented Nov 13, 2016

No, when SIGKILL is delivered the kernel does not allow any activity by the process.

Try using SIGTERM instead of SIGKILL if you are able. This will enable a warm shutdown.

The process manager supervisor, for instance, uses SIGTERM as the default kill signal sent by supervisorctl stop $JOB. It then waits a configurable amount of time (the stopwaitsecs config value) before sending SIGKILL if the process is still alive.

I'd suggest mimicking that pattern if you are able.

  • send SIGTERM
  • wait n seconds (whatever you decide is reasonable for your situation)
  • if process is still alive, send SIGKILL or SIGQUIT

http://celery.readthedocs.io/en/latest/userguide/workers.html#process-signals
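
The steps above can be sketched with the stdlib (using `subprocess` to stand in for a process manager; the demo child deliberately ignores SIGTERM so the SIGKILL path is exercised):

```python
import signal
import subprocess
import sys
import time

def stop_process(proc, grace_seconds=5.0):
    """Supervisor-style stop: SIGTERM first, SIGKILL if it won't die."""
    proc.send_signal(signal.SIGTERM)      # ask for a warm shutdown
    try:
        proc.wait(timeout=grace_seconds)  # give it time to clean up
    except subprocess.TimeoutExpired:
        proc.kill()                       # SIGKILL: the kernel ends it, no cleanup runs
        proc.wait()

# Demo: a child that ignores SIGTERM, forcing the SIGKILL path.
child = subprocess.Popen([
    sys.executable, "-c",
    "import signal, time; signal.signal(signal.SIGTERM, signal.SIG_IGN); time.sleep(60)",
])
time.sleep(1.0)  # let the child install its SIGTERM handler
stop_process(child, grace_seconds=1.0)
print(child.returncode)  # negative signal number on POSIX; -9 means killed by SIGKILL
```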

@jdotjdot
Author

I am aware of this--unfortunately I have no control over which signals are sent or when; our Celery workers run on Heroku, and Heroku follows this shutdown process whenever dynos restart for any reason.

I was always surprised that I still received an error email when Celery tasks were shut down by SIGKILL--I would have expected the SIGKILL to prevent it, yet the emails arrived.

Example: [screenshot of an error email]

Because there appears to be no clean way in tasks to listen for SIGTERM, we've come to rely on these error emails to deal with prematurely shut down tasks.

If on_failure will not do this, are you saying we have no options?
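
For what it's worth, at the plain-Python level a SIGTERM handler can be installed with the stdlib signal module. This is a sketch of the mechanism only--whether it composes cleanly with the handlers Celery's worker and prefork pool install themselves is an assumption you would need to verify in your setup:

```python
import os
import signal

shutting_down = False

def _on_sigterm(signum, frame):
    # Set a flag that long-running work can poll to clean up and exit early.
    global shutting_down
    shutting_down = True

signal.signal(signal.SIGTERM, _on_sigterm)

# Simulate the platform (e.g. Heroku) sending SIGTERM to this process.
os.kill(os.getpid(), signal.SIGTERM)
print(shutting_down)  # True: the handler ran instead of the default termination
```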

@jdotjdot
Author

The best I've got, since I know RabbitMQ well, is to make all tasks idempotent and use the new reject_on_worker_lost (which, by the way, I'm really excited about). But it doesn't feel like the best solution to me, and not all tasks can be idempotent--some will require cleanup on kill anyway.

@fuhrysteve
Contributor

Interesting, I'm surprised you got the error email after SIGKILL was received.

That being the case, I'd suspect that on_failure is your best bet. If that doesn't work, you'll have to dig around and see where mail_admins was being called.

@jdotjdot
Author

Is it possible that I'm actually wrong, and what's happening is that the celery worker receives the SIGTERM and forcibly shuts down the child processes (resulting in the email being sent)--all before the SIGKILL arrives?

@ask
Contributor

ask commented Dec 1, 2016

All errors in Celery are logged, so make sure you set up Sentry or other monitoring tools to listen for all types of Celery error logs, not just the on_failure handler.

It used to be necessary to configure special support for Celery in Sentry, this was done to register
the variables in the stack frame, but this should not be needed for Celery 4.

@jdotjdot
Author

jdotjdot commented Dec 1, 2016

Thanks. We use Django, so I could simply configure the celery logger to use mail_admins and send to Sentry? What is the logger named?
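
Celery's loggers live under the "celery" namespace ("celery.task" for task-level logs), so a Django LOGGING sketch along these lines should route worker errors to mail_admins--worth verifying that the handler fires at your chosen level in your deployment:

```python
# Django settings sketch: route Celery error logs to Django's AdminEmailHandler.
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        "mail_admins": {
            "level": "ERROR",
            "class": "django.utils.log.AdminEmailHandler",
        },
    },
    "loggers": {
        # Catches "celery" and children such as "celery.task" via propagation.
        "celery": {
            "handlers": ["mail_admins"],
            "level": "ERROR",
            "propagate": True,
        },
    },
}
```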
