Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add triggered_by field to DAG Run model to distinguish the source of a trigger #39165

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

molcay
Copy link
Contributor

@molcay molcay commented Apr 22, 2024

Add the triggered_by field to the DAG Run model to distinguish the source of a trigger.
This is for the discussion in #37087.

Currently this PR contains only the backend changes. No changes done on the frontend. We can discuss and add some UI elements (different border types, colors, or icons etc.) to have a more distinct look on the UI as well.


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@boring-cyborg boring-cyborg bot added area:API Airflow's REST/HTTP API area:CLI area:core-operators Operators, Sensors and hooks within Core Airflow area:dev-tools area:Scheduler Scheduler or dag parsing Issues area:serialization area:webserver Webserver related Issues labels Apr 22, 2024
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

airflow/api/client/* files exists for mainly to support legacy experimental API, I don't think we need to extend those

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since they currently exist, I thought that I need to extend those files also. I can revert those changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reverted changes for the airflow/api/client folder. Since some of the clients are using other methods, I only touched on fixing the related places

@Taragolis Taragolis added this to the Airflow 2.10.0 milestone Apr 22, 2024
@molcay molcay force-pushed the feat/add-manual-triggerer-type branch 2 times, most recently from f20f695 to 3a1a6d2 Compare April 25, 2024 09:35
class DagRunTriggeredByType(str, enum.Enum):
"""Class with TriggeredBy types for DagRun."""

CLI = "cli" # for the trigger subcommand for dag command in cli: airflow dags trigger
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does not read like English

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the comment wording

REST_API = "rest_api" # for triggering the DAG via RESTful API
UI = "ui" # for clicking the `Trigger DAG` button
TEST = "test" # for dag.test()
SCHEDULER = "scheduler" # for scheduler
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be say TIME or TIMETABLE or something to distinguish from DATASET. Maybe something like TIME_TRIGGERD and EVENT_TRIGGERED would work too.

Copy link
Contributor Author

@molcay molcay May 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I was thinking that the SCHEDULER value is for timetable related triggering and DATASET value is for Dataset related triggering.
I saw this image on the documentation and it states "Triggered by Datasets" so I was thinking the DATASET is a good value for this.

@molcay molcay force-pushed the feat/add-manual-triggerer-type branch 3 times, most recently from e404910 to f04baeb Compare May 9, 2024 17:13
@molcay molcay force-pushed the feat/add-manual-triggerer-type branch 7 times, most recently from 0700bbe to 0d09778 Compare May 22, 2024 08:31
@ephraimbuddy
Copy link
Contributor

Do we still need run_type? I think it should be removed in a follow-up PR and this triggered_by is used in places where run_type is used

@molcay molcay force-pushed the feat/add-manual-triggerer-type branch 4 times, most recently from c56b94c to bb2aa72 Compare May 27, 2024 08:51
@uranusjr
Copy link
Member

I don’t disagree with using triggered_by, but am not sure about removing run_type due to compatibility issues. And logically the two might not line up exactly…? (In the sense that many runs triggered by the same component may have different types, and vice versa.)

@molcay
Copy link
Contributor Author

molcay commented May 31, 2024

Side note: this PR and the triggered_by field is a result of an attempt to add a new run_type in #37087. When we think about the backward compatibility I think it makes sense to keep the run_type field.

@molcay molcay force-pushed the feat/add-manual-triggerer-type branch 2 times, most recently from bce180c to c2627b9 Compare May 31, 2024 13:45
Copy link
Contributor

@jscheffl jscheffl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was taking a look to this PR and see that the field will add valuable information. But not completely as besides adding the marker as type we still would not know "which CLI", which "User" in the UI or which DAG run/Operator triggered a DAG run. So it adds parts of traceability with a lot of changes in the code (1000+LoC code changed).

Have you considered adding the details for your needs as traceability to the LOG / audit log table? This would just be another type in the audit logs and would prevent the need to extend the DB scheme for the DAG run table.

@molcay molcay force-pushed the feat/add-manual-triggerer-type branch 2 times, most recently from e1190e5 to 4da529c Compare June 3, 2024 13:29
@molcay
Copy link
Contributor Author

molcay commented Jun 3, 2024

Hi @jscheffl,

Thank you for your message and your point of view. The thing is that at the beginning we were trying to address a very specific use case to distinguish operator-triggered runs and scheduled runs. While trying to find a solution, I implemented #37087 but we had a discussion on that PR and as a conclusion we decided to add the triggered_by field to distinguish the source for the dag run. It might be handy to know which component triggered the DAG Run.

Since we can trigger a DAG run with different sources, it was making sense to store this information in the table.
To be able to distinguish who started the run from the UI, or which operator/task triggered the run in the code a bit more excessive, at least at this point. Maybe adding a record to LOG/audit log table can be useful to know which specific task triggered the run or who started the run from the UI.

@molcay molcay force-pushed the feat/add-manual-triggerer-type branch from 4da529c to cf273d1 Compare June 10, 2024 16:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:API Airflow's REST/HTTP API area:CLI area:core-operators Operators, Sensors and hooks within Core Airflow area:dev-tools area:Scheduler Scheduler or dag parsing Issues area:serialization area:webserver Webserver related Issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants