Add triggered_by field to DAG Run model to distinguish the source of a trigger #39165
base: main
airflow/api/client/api_client.py
Outdated
The airflow/api/client/* files exist mainly to support the legacy experimental API. I don't think we need to extend those.
Since they currently exist, I thought I needed to extend those files as well. I can revert those changes.
I reverted the changes to the airflow/api/client folder. Since some of the clients use other methods, I only fixed the related call sites.
f20f695 to 3a1a6d2
3a1a6d2 to 1cd84f6
airflow/utils/types.py
Outdated
class DagRunTriggeredByType(str, enum.Enum):
    """Class with TriggeredBy types for DagRun."""

    CLI = "cli"  # for the trigger subcommand for dag command in cli: airflow dags trigger
This does not read like English
I changed the comment wording
    REST_API = "rest_api"  # for triggering the DAG via RESTful API
    UI = "ui"  # for clicking the `Trigger DAG` button
    TEST = "test"  # for dag.test()
    SCHEDULER = "scheduler"  # for scheduler
This should probably be, say, TIME or TIMETABLE, or something else that distinguishes it from DATASET. Maybe something like TIME_TRIGGERED and EVENT_TRIGGERED would work too.
Actually, my thinking was that the SCHEDULER value is for timetable-related triggering and the DATASET value is for Dataset-related triggering.
I saw an image in the documentation that states "Triggered by Datasets", so I thought DATASET was a good value for this.
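For reference, the enum under discussion can be sketched as a self-contained snippet. The first five members are copied from the diff above. DATASET is only mentioned in this thread, so its string value "dataset" is an assumption, and `is_time_based` is a hypothetical helper added purely to illustrate the SCHEDULER versus DATASET distinction. The member names that finally land in the PR may differ.

```python
import enum


class DagRunTriggeredByType(str, enum.Enum):
    """Class with TriggeredBy types for DagRun."""

    CLI = "cli"  # `airflow dags trigger`
    REST_API = "rest_api"  # triggered via the REST API
    UI = "ui"  # `Trigger DAG` button
    TEST = "test"  # dag.test()
    SCHEDULER = "scheduler"  # timetable-related triggering, per the author
    DATASET = "dataset"  # assumption: value not shown in the diff above


def is_time_based(triggered_by: DagRunTriggeredByType) -> bool:
    """Hypothetical helper: True only for timetable-driven runs."""
    return triggered_by is DagRunTriggeredByType.SCHEDULER


# Because the enum mixes in `str`, members compare equal to their raw values,
# which keeps round-tripping through a string column straightforward.
stored_value = DagRunTriggeredByType.DATASET.value
restored = DagRunTriggeredByType(stored_value)
```

Whichever names are chosen, the review point is the same: a reader should be able to tell a timetable-driven run from a dataset-driven one from the member name alone.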
e404910 to f04baeb
0700bbe to 0d09778
Do we still need |
c56b94c to bb2aa72
I don’t disagree with using |
Side note: this PR and the |
bce180c to c2627b9
I took a look at this PR and see that the field will add valuable information, but not the full picture: besides adding the marker as a type, we still would not know which CLI, which user in the UI, or which DAG run/operator triggered a DAG run. So it adds partial traceability at the cost of a lot of code changes (1000+ LoC changed).
Have you considered adding the details you need for traceability to the log / audit log table? This would just be another type in the audit logs and would avoid the need to extend the DB schema for the DAG run table.
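To make the trade-off in this comment concrete, here is a minimal sketch of the two options using plain dataclasses rather than Airflow's real models. Every name in it (`RunRecord`, `AuditEntry`, the "trigger" event string, the `extra` keys, and both helper functions) is hypothetical and is not an Airflow API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class RunRecord:
    """Option 1 (this PR): the source lives on the run itself."""

    run_id: str
    triggered_by: Optional[str] = None  # hypothetical column, e.g. "cli"


@dataclass
class AuditEntry:
    """Option 2 (audit log): the source lives in a separate log row."""

    event: str
    run_id: str
    extra: Dict[str, str] = field(default_factory=dict)


def source_from_column(run: RunRecord) -> Optional[str]:
    # Direct attribute read; no second table involved.
    return run.triggered_by


def source_from_audit_log(run_id: str, log: List[AuditEntry]) -> Optional[str]:
    # Requires scanning (or joining against) the log, but the entry can
    # carry extra detail such as the user without a schema change.
    for entry in log:
        if entry.event == "trigger" and entry.run_id == run_id:
            return entry.extra.get("source")
    return None


run = RunRecord(run_id="r1", triggered_by="cli")
log = [AuditEntry("trigger", "r1", {"source": "cli", "user": "alice"})]
```

In this sketch the column is cheaper to read and filter on, while the log entry can hold richer detail; which of the two matters more is the question the comment raises.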
e1190e5 to 4da529c
Hi @jscheffl, thank you for your message and your point of view. At the beginning, we were trying to address a very specific use case: distinguishing operator-triggered runs from scheduled runs. While trying to find a solution, I implemented #37087, but we had a discussion on that PR and concluded that we should add the triggered_by field. Since a DAG run can be triggered from different sources, it made sense to store this information in the table.
4da529c to cf273d1
Add the triggered_by field to the DAG Run model to distinguish the source of a trigger. This is for the discussion in #37087.