
Handling error emails going forward? #3588

Closed
jdotjdot opened this issue Nov 13, 2016 · 11 comments
@jdotjdot

The changelog says that all email-related functionality has been removed. We made heavy use of the error emails feature in Celery.

Since this is removed, what is now the recommended way to handle errors in Celery tasks?

If we were to reimplement the error email sender internally, where would be the right place to hook that?

Thanks!

@fuhrysteve
Contributor

@jdotjdot take a look at https://sentry.io

@jdotjdot
Author

I currently use Opbeat but I want to actually send them from Celery--also
without using an external service. Is there no hook to do this?


@fuhrysteve
Contributor

You can implement your own base task like this to handle errors however you would like:

http://docs.celeryproject.org/en/latest/userguide/tasks.html#task-inheritance
http://docs.celeryproject.org/en/latest/userguide/tasks.html#on_failure

@jdotjdot
Author

Thanks. And on_failure will properly handle tasks even if they're hard-killed with SIGKILL? That was always my concern about on_failure. For (bad) architectural reasons, we end up with tasks SIGKILL-ed a lot, and that's when I need to handle this the most.

@fuhrysteve
Contributor

fuhrysteve commented Nov 13, 2016

No, when SIGKILL is delivered the kernel does not allow any activity by the process.

Try using SIGTERM instead of SIGKILL if you are able. This will enable a warm shutdown.

The process manager supervisor, for instance, uses SIGTERM as the default kill signal sent by supervisorctl stop $JOB. It then waits a configurable amount of time (the stopwaitsecs config value) before sending SIGKILL if the process is still alive.

I'd suggest mimicking that pattern if you are able.

  • send SIGTERM
  • wait n seconds (whatever you decide is reasonable for your situation)
  • if process is still alive, send SIGKILL or SIGQUIT

http://celery.readthedocs.io/en/latest/userguide/workers.html#process-signals
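
The steps above can be sketched with the stdlib (using `subprocess` to stand in for a process manager; the demo child deliberately ignores SIGTERM so the SIGKILL path is exercised):

```python
import signal
import subprocess
import sys
import time

def stop_process(proc, grace_seconds=5.0):
    """Supervisor-style stop: SIGTERM first, SIGKILL if it won't die."""
    proc.send_signal(signal.SIGTERM)      # ask for a warm shutdown
    try:
        proc.wait(timeout=grace_seconds)  # give it time to clean up
    except subprocess.TimeoutExpired:
        proc.kill()                       # SIGKILL: the kernel ends it, no cleanup runs
        proc.wait()

# Demo: a child that ignores SIGTERM, forcing the SIGKILL path.
child = subprocess.Popen([
    sys.executable, "-c",
    "import signal, time; signal.signal(signal.SIGTERM, signal.SIG_IGN); time.sleep(60)",
])
time.sleep(1.0)  # let the child install its SIGTERM handler
stop_process(child, grace_seconds=1.0)
print(child.returncode)  # negative signal number on POSIX; -9 means killed by SIGKILL
```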

@jdotjdot
Author

I am aware of this--unfortunately I have no control over which signals are sent or when; our Celery workers run on Heroku, and Heroku follows this shutdown process whenever dynos restart for any reason.

I was always surprised that I still received an error email when Celery tasks were shut down by SIGKILL--I would have expected the SIGKILL to prevent it, yet the emails arrived.

Example: [screenshot of an error email]

Because there appears to be no clean way in tasks to listen for SIGTERM, we've come to rely on these error emails to deal with prematurely shut down tasks.

If on_failure will not do this, are you saying we have no options?
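
For what it's worth, at the plain-Python level a SIGTERM handler can be installed with the stdlib signal module. This is a sketch of the mechanism only--whether it composes cleanly with the handlers Celery's worker and prefork pool install themselves is an assumption you would need to verify in your setup:

```python
import os
import signal

shutting_down = False

def _on_sigterm(signum, frame):
    # Set a flag that long-running work can poll to clean up and exit early.
    global shutting_down
    shutting_down = True

signal.signal(signal.SIGTERM, _on_sigterm)

# Simulate the platform (e.g. Heroku) sending SIGTERM to this process.
os.kill(os.getpid(), signal.SIGTERM)
print(shutting_down)  # True: the handler ran instead of the default termination
```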

@jdotjdot
Author

The best I've got, since I know RabbitMQ well, is to make all tasks idempotent and use the new reject_on_worker_lost (which, by the way, I'm really excited about). But it doesn't feel like the best solution to me, and not all tasks can be idempotent--some will require cleanup on kill anyway.

@fuhrysteve
Contributor

Interesting, I'm surprised you got the error email after SIGKILL was received.

That being the case, I'd suspect that on_failure is your best bet. If that doesn't work, you'll have to dig around and see where mail_admins was being called.

@jdotjdot
Author

Is it possible that I'm actually wrong, and what's happening is that the celery worker receives the SIGTERM and forcibly shuts down the child processes (resulting in the email being sent)--all before the SIGKILL arrives?

@ask
Contributor

ask commented Dec 1, 2016

All errors in Celery are logged, so make sure you set up Sentry or other monitoring tools to listen for all types of Celery error logs, not just the on_failure handler.

It used to be necessary to configure special support for Celery in Sentry, this was done to register
the variables in the stack frame, but this should not be needed for Celery 4.

@jdotjdot
Author

jdotjdot commented Dec 1, 2016

Thanks. We use Django, so I could simply configure the celery logger to use mail_admins and send to Sentry? What is the logger named?
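
Celery's loggers live under the "celery" namespace ("celery.task" for task-level logs), so a Django LOGGING sketch along these lines should route worker errors to mail_admins--worth verifying that the handler fires at your chosen level in your deployment:

```python
# Django settings sketch: route Celery error logs to Django's AdminEmailHandler.
LOGGING = {
    "version": 1,
    "disable_existing_loggers": False,
    "handlers": {
        "mail_admins": {
            "level": "ERROR",
            "class": "django.utils.log.AdminEmailHandler",
        },
    },
    "loggers": {
        # Catches "celery" and children such as "celery.task" via propagation.
        "celery": {
            "handlers": ["mail_admins"],
            "level": "ERROR",
            "propagate": True,
        },
    },
}
```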
