Handle and log exceptions raised during task callback #17347
Please also add tests for on_failure_callback and on_retry_callback. This is to avoid regression.
5acd814 to 2120671
The PR most likely needs to run the full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly, please rebase it onto the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.
Can you rebase, @SamWheating?
2120671 to 0bd7b3b
Closing and reopening to trigger tests.
@ashb please take a look.
@SamWheating please rebase again.
0bd7b3b to 5dd6848
Add missing exception handling in success/retry/failure callbacks (cherry picked from commit faf9f73)
Currently, an exception raised in a task-level callback goes unhandled, so potentially important parts of the callback (for example, sending a Slack notification after a job failure) may be skipped. This can make it really difficult to identify and fix errors in callbacks.
To demonstrate, here's a simple DAG with an error in the callback function:
The task execution logs will cut off where the exception is thrown:
And the unhandled exception will show up in the executor logs:
By handling this exception and logging it, we can make callback failures much easier to identify and fix. After this change, the task logs look like this:
This removes the stacktrace from the executor logs. If we want to avoid changing this behaviour, we could also re-raise the exception after logging. Thoughts?
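For discussion, here is a minimal standalone sketch of the two options (log and swallow, or log and re-raise). It uses only the standard `logging` module and is not the code in this PR: `run_callback_safely`, its `reraise` flag, the logger name, and `broken_callback` are all hypothetical names made up for illustration.

```python
import logging

# Hypothetical logger name; in a real task this would be the task's own logger.
log = logging.getLogger("task_callback_sketch")


def run_callback_safely(callback, context, reraise=False):
    """Run a task-level callback and log any exception it raises.

    Returns True if the callback completed and False if it raised.
    With reraise=True the exception is logged and then propagated,
    so it would still reach whatever is calling the task.
    """
    try:
        callback(context)
    except Exception:
        # log.exception records the message together with the full traceback.
        name = getattr(callback, "__name__", repr(callback))
        log.exception("Error when executing %s callback", name)
        if reraise:
            raise
        return False
    return True


def broken_callback(context):
    # Deliberate bug standing in for an error inside a user's callback.
    return 1 / 0
```

With `reraise=False` the traceback appears only where the logger writes it; with `reraise=True` it is logged and the caller also sees the exception, which would keep the existing executor-side behaviour.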