Description
Any time an Airflow component changes the state of a task instance, it should record that change in an audit-log-like table, so users can easily see what happened to their tasks. For example:
| dag_id | task_id | run_id | map_index | state | time_changed | component_type | component_id |
|---|---|---|---|---|---|---|---|
| example_dag | config_file_sensor | scheduled_2022... | -1 | queued | 2022-07-25T12:01:01 | scheduler | |
| example_dag | config_file_sensor | scheduled_2022... | -1 | running | 2022-07-25T12:24:01 | worker | |
Since `task_instance` is already one of the biggest tables, this table could grow very large. I think it should be off by default, with a config flag for turning it on, and it should probably only be used in conjunction with regular runs of `airflow db clean`.
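To make the proposal concrete, here is a rough sketch of what the table and an opt-in write path could look like. It uses the standard-library `sqlite3` module rather than Airflow's own models, and the table name `task_instance_state_change`, the `record_state_change` helper, and the `enabled` flag are all hypothetical stand-ins, not existing Airflow APIs or config options.

```python
import sqlite3

# Hypothetical table name; columns mirror the example table above.
SCHEMA = """
CREATE TABLE IF NOT EXISTS task_instance_state_change (
    dag_id         TEXT    NOT NULL,
    task_id        TEXT    NOT NULL,
    run_id         TEXT    NOT NULL,
    map_index      INTEGER NOT NULL DEFAULT -1,
    state          TEXT,
    time_changed   TEXT    NOT NULL,  -- ISO 8601 timestamp
    component_type TEXT    NOT NULL,  -- e.g. scheduler, worker
    component_id   TEXT
)
"""


def record_state_change(conn, enabled, dag_id, task_id, run_id, map_index,
                        state, time_changed, component_type, component_id=None):
    """Append one audit row. Does nothing when the (hypothetical) flag is off.

    Returns True if a row was written, False otherwise.
    """
    if not enabled:
        return False
    conn.execute(
        "INSERT INTO task_instance_state_change VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (dag_id, task_id, run_id, map_index, state,
         time_changed, component_type, component_id),
    )
    return True


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

# With the flag on, the scheduler's transition is recorded.
written = record_state_change(
    conn, True, "example_dag", "config_file_sensor", "scheduled_2022...",
    -1, "queued", "2022-07-25T12:01:01", "scheduler",
)

# With the flag off (the proposed default), nothing is written.
skipped = record_state_change(
    conn, False, "example_dag", "config_file_sensor", "scheduled_2022...",
    -1, "running", "2022-07-25T12:24:01", "worker",
)

row_count = conn.execute(
    "SELECT COUNT(*) FROM task_instance_state_change"
).fetchone()[0]
```

The table is append-only on purpose: rows are never updated, so cleanup can be left entirely to a periodic purge of old rows.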
Use case/motivation
Tracing the lifecycle of a task instance across Airflow component logs is quite tedious and involves effectively building the described table in your head or on a notepad. Many times when I'm trying to understand what happened to a task, such investigation is necessary. It would also help answer questions like "which task instances were in [state] at this particular time in the past".
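As an illustration of the point-in-time question, the sketch below finds every task instance whose most recent recorded change at or before a given time left it in a given state. It runs against an in-memory SQLite table; the table name `task_instance_state_change` and the `instances_in_state_at` helper are hypothetical, and it assumes ISO 8601 timestamps (which sort correctly as strings) with no two changes to the same task instance sharing a timestamp.

```python
import sqlite3


def instances_in_state_at(conn, state, at_time):
    """Return (dag_id, task_id, run_id, map_index) for task instances whose
    latest change at or before `at_time` put them in `state`."""
    query = """
    SELECT c.dag_id, c.task_id, c.run_id, c.map_index
    FROM task_instance_state_change AS c
    WHERE c.state = ?
      AND c.time_changed = (
          SELECT MAX(p.time_changed)
          FROM task_instance_state_change AS p
          WHERE p.dag_id = c.dag_id
            AND p.task_id = c.task_id
            AND p.run_id = c.run_id
            AND p.map_index = c.map_index
            AND p.time_changed <= ?
      )
    ORDER BY c.dag_id, c.task_id, c.run_id, c.map_index
    """
    return conn.execute(query, (state, at_time)).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE task_instance_state_change (
        dag_id TEXT, task_id TEXT, run_id TEXT, map_index INTEGER,
        state TEXT, time_changed TEXT, component_type TEXT, component_id TEXT
    )
    """
)
# The two rows from the example table in the description.
conn.executemany(
    "INSERT INTO task_instance_state_change VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("example_dag", "config_file_sensor", "scheduled_2022...", -1,
         "queued", "2022-07-25T12:01:01", "scheduler", None),
        ("example_dag", "config_file_sensor", "scheduled_2022...", -1,
         "running", "2022-07-25T12:24:01", "worker", None),
    ],
)

queued_at_1210 = instances_in_state_at(conn, "queued", "2022-07-25T12:10:00")
queued_at_1230 = instances_in_state_at(conn, "queued", "2022-07-25T12:30:00")
running_at_1230 = instances_in_state_at(conn, "running", "2022-07-25T12:30:00")
```

With the example rows, the task instance shows up as queued at 12:10 and as running at 12:30, which is the kind of answer that currently has to be pieced together from component logs.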
Related issues
Are you willing to submit a PR?
- Yes I am willing to submit a PR!
Code of Conduct
- I agree to follow this project's Code of Conduct