Add Task Instance Lifecycle History table #25254

@collinmcnulty

Description

Whenever an Airflow component changes the state of a task instance, it should record that change in an audit-log-style table, so that users can easily see what happened to their tasks.

| dag_id | task_id | run_id | map_index | state | time_changed | component_type | component_id |
| --- | --- | --- | --- | --- | --- | --- | --- |
| example_dag | config_file_sensor | scheduled_2022... | -1 | queued | 2022-07-25T12:01:01 | scheduler | |
| example_dag | config_file_sensor | scheduled_2022... | -1 | running | 2022-07-25T12:24:01 | worker | |
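
To make the proposed shape concrete, here is a minimal sketch using Python's built-in `sqlite3` module. Only the column names come from the example rows above; the table name `task_instance_history`, the column types, and the helper `record_state_change` are assumptions made for illustration, not Airflow's actual schema or API.

```python
import sqlite3

# Hypothetical table name and column types; only the column names
# are taken from the proposal.
SCHEMA = """
CREATE TABLE task_instance_history (
    dag_id         TEXT    NOT NULL,
    task_id        TEXT    NOT NULL,
    run_id         TEXT    NOT NULL,
    map_index      INTEGER NOT NULL DEFAULT -1,
    state          TEXT,
    time_changed   TEXT    NOT NULL,  -- ISO 8601 timestamp
    component_type TEXT,
    component_id   TEXT               -- left empty in the example rows
)
"""


def record_state_change(conn, dag_id, task_id, run_id, map_index,
                        state, time_changed, component_type,
                        component_id=None):
    """Append one row per state change (hypothetical helper)."""
    conn.execute(
        "INSERT INTO task_instance_history VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (dag_id, task_id, run_id, map_index, state,
         time_changed, component_type, component_id),
    )


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

# The two example rows; the run_id is truncated in the issue and kept as-is.
record_state_change(conn, "example_dag", "config_file_sensor",
                    "scheduled_2022...", -1, "queued",
                    "2022-07-25T12:01:01", "scheduler")
record_state_change(conn, "example_dag", "config_file_sensor",
                    "scheduled_2022...", -1, "running",
                    "2022-07-25T12:24:01", "worker")

rows = conn.execute(
    "SELECT state, component_type FROM task_instance_history "
    "ORDER BY time_changed"
).fetchall()
```

The table is append-only in this sketch: rows are never updated, so the full sequence of transitions for a task instance can be read back in time order.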

Since task_instance is already one of the biggest tables, this table has the potential to become very large. I think it should be off by default, with a config flag to turn it on, and it should probably only be used in conjunction with regular runs of airflow db clean.
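
The off-by-default flag could be sketched roughly as follows. This uses the standard-library `configparser` rather than Airflow's own configuration module, and the section name `task_instance_history` and option name `enabled` are hypothetical placeholders, not existing Airflow settings.

```python
import configparser

# Hypothetical section/option names; the default keeps the feature off.
DEFAULTS = {"task_instance_history": {"enabled": "False"}}


def history_enabled(cfg_text: str = "") -> bool:
    """Return True only if the user's config explicitly turns the feature on."""
    parser = configparser.ConfigParser()
    parser.read_dict(DEFAULTS)     # built-in default: disabled
    parser.read_string(cfg_text)   # user overrides, if any
    return parser.getboolean("task_instance_history", "enabled")


def record_if_enabled(log: list, row: tuple, cfg_text: str = "") -> None:
    """Append the state change to `log` only when the flag is on."""
    if history_enabled(cfg_text):
        log.append(row)
```

With no user config, `record_if_enabled` is a no-op, so deployments that never opt in would pay no storage cost.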

Use case/motivation

Tracing the lifecycle of a task instance across Airflow component logs is quite tedious and effectively involves building the described table in your head or on a notepad. This kind of investigation is often necessary when I'm trying to understand what happened to a task. The table would also help answer questions like "which task instances were in [state] at this particular time in the past?"
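
As a rough illustration of the point-in-time question, the sketch below finds, for each task instance, its most recent state change at or before a given time and keeps those whose state matches. It assumes a hypothetical `task_instance_history` table with the columns proposed above, stored in SQLite with ISO 8601 timestamps in a single consistent format (so that string comparison matches chronological order); the function name is also made up for this example.

```python
import sqlite3


def task_instances_in_state_at(conn, state, at_time):
    """Task instances whose latest change at or before `at_time` was `state`."""
    query = """
        SELECT h.dag_id, h.task_id, h.run_id, h.map_index
        FROM task_instance_history AS h
        WHERE h.state = ?
          AND h.time_changed = (
              -- latest change for this task instance up to the cutoff
              SELECT MAX(h2.time_changed)
              FROM task_instance_history AS h2
              WHERE h2.dag_id = h.dag_id
                AND h2.task_id = h.task_id
                AND h2.run_id = h.run_id
                AND h2.map_index = h.map_index
                AND h2.time_changed <= ?
          )
    """
    return conn.execute(query, (state, at_time)).fetchall()


# Demo data: the two example rows from the description.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task_instance_history ("
    "dag_id TEXT, task_id TEXT, run_id TEXT, map_index INTEGER, "
    "state TEXT, time_changed TEXT, component_type TEXT, component_id TEXT)"
)
conn.executemany(
    "INSERT INTO task_instance_history VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("example_dag", "config_file_sensor", "scheduled_2022...", -1,
         "queued", "2022-07-25T12:01:01", "scheduler", None),
        ("example_dag", "config_file_sensor", "scheduled_2022...", -1,
         "running", "2022-07-25T12:24:01", "worker", None),
    ],
)

queued_at_1210 = task_instances_in_state_at(conn, "queued", "2022-07-25T12:10:00")
queued_at_1230 = task_instances_in_state_at(conn, "queued", "2022-07-25T12:30:00")
```

Here `queued_at_1210` contains the sensor task (it was queued at 12:01 and did not start running until 12:24), while `queued_at_1230` is empty. On a large table, a query like this would likely need an index covering the task instance key and `time_changed`.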

Related issues

#25252

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!
