DAG regex flag in backfill command #23870
A few more tests will fail if you change "dag" into "dags". I think replacing it this way is a compatibility issue: if anyone used that command externally, this is an unexpected change. I think a better approach would be to add a separate "dags" field rather than replace "dag", and to fail the command if both parameters are specified. It would also be great to set a deterministic order for processing the DAGs. Right now it is somewhat random (it depends on the order of the hashmap values in the dagbag, which depends on the order of the DAGs returned by the SQL query, which is not deterministic). Sorting by dag_id would likely be more "stable".
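To illustrate the suggestion above, here is a minimal sketch, not the code from this PR. The helper name `resolve_dag_ids` and its `dag`/`dags` parameters are hypothetical; it only shows rejecting both parameters at once and sorting the ids so the processing order no longer depends on dict or query ordering.

```python
from typing import Iterable, List, Optional


def resolve_dag_ids(dag: Optional[str], dags: Optional[Iterable[str]]) -> List[str]:
    """Hypothetical helper: accept a single ``dag`` or a ``dags`` collection, never both."""
    if dag is not None and dags is not None:
        raise ValueError("Specify either 'dag' or 'dags', not both.")
    if dag is not None:
        return [dag]
    # Sort so the order is stable regardless of how the ids were collected.
    return sorted(dags or [])
```

Sorting by id is only one possible stable order; any total order that does not depend on the database would serve the same purpose.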
Thank you for the reply.
LGTM
But another approval would be great, as this is part of the core.
The PR most likely needs to run the full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly, please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.
Rebased to check if Helm tests are fixed.
Docs failing.
Hello @potiuk, after running build-docs locally I'm getting: Incorrect spelling in |
Yes, you need to fix the incorrect spelling.
@potiuk, thank you. All checks have passed now.
I think we should check if we have more than one dag returned and ask for confirmation (optionally with a flag that should assume you are ok with it).
I also think the --dag-regex flag has the wrong name. It is surprising that this is a boolean flag rather than a string argument (as opposed to --task-regex). This is very inconsistent and confusing.
I think it should maybe be --treat-dag-as-regex. Yes, it is a longer name, but far less confusing. Maybe you will come up with a better name that indicates its boolean nature more clearly.
Or maybe it should be --dag-regex REGEX, in which case the dag id should not be specified (and becomes optional).
Both solutions are better than the current one.
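A rough sketch of the first option together with the confirmation step, under stated assumptions: the parser below is a standalone `argparse` example, not Airflow's CLI definition, and the `--yes` flag name and the `should_proceed` helper are hypothetical.

```python
import argparse
from typing import Callable, List


def build_parser() -> argparse.ArgumentParser:
    """Standalone sketch of the flags discussed above (not Airflow's real parser)."""
    parser = argparse.ArgumentParser(prog="backfill-sketch")
    parser.add_argument("dag_id", help="DAG id, or a regex when --treat-dag-as-regex is set")
    # Boolean flag: the name states that it changes how dag_id is interpreted.
    parser.add_argument("--treat-dag-as-regex", action="store_true")
    # Hypothetical flag that skips the confirmation prompt.
    parser.add_argument("--yes", action="store_true")
    return parser


def should_proceed(
    dag_ids: List[str], assume_yes: bool, ask: Callable[[str], str] = input
) -> bool:
    """Ask for confirmation only when more than one DAG matched."""
    if len(dag_ids) <= 1 or assume_yes:
        return True
    answer = ask(f"Backfill {len(dag_ids)} DAGs ({', '.join(dag_ids)})? [y/N] ")
    return answer.strip().lower() in {"y", "yes"}
```

Passing `ask` as a parameter keeps the prompt testable without reading from a terminal.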
I wonder if it’s even viable to automatically detect whether the passed value is regex. DAG IDs can only contain a very limited set of characters, and the only character that has special meanings is |
@potiuk @uranusjr Thank you for the feedback.
LGTM. WDYT @uranusjr? I think "Explicit is better than implicit", so even if we could detect it automatically, I think we should not.
If we are just doing this sequentially (rather than all dags in a single BackfillJob) I'm not even all that sure we need this feature. Isn't this very very similar to a loop in bash:
for dag_id in dag1 dag2 dag3; do
    airflow dags backfill "$dag_id"
done
(Slight difference of explicit list of IDs vs a regex, yes)
(This point is open for discussion; the comment below is a must-fix.)
I think you need to rebase and resolve conflicts @domagojrazum. There were some fixes in main that need to be taken into account.
@potiuk, rebased and pushed.
The failing test is fixed in main already (was broken by me :( )
Awesome work, congrats on your first merged pull request!
When this parameter is set, the dag_id string will be used as a regular expression to match the available DAGs.
This enables multiple DAG backfills to be consecutively executed under the same backfill session.
This is often used in the DAG-factory approach, where several DAGs share the same task names (derived from some ID), so it is commonly combined with task-regex.
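The matching described above can be sketched as follows. This is an illustration only: `match_dag_ids` is a hypothetical helper, the DAG ids are made up, and using `re.search` (a match anywhere in the id) is an assumption, since the description does not say whether the pattern is anchored.

```python
import re
from typing import Iterable, List


def match_dag_ids(pattern: str, available: Iterable[str]) -> List[str]:
    """Return the available DAG ids matching ``pattern``, in sorted order."""
    regex = re.compile(pattern)
    # re.search matches anywhere in the id; anchor with ^ and $ for a full match.
    return sorted(dag_id for dag_id in available if regex.search(dag_id))


# Made-up ids in the style of a DAG factory.
available = ["factory_202", "reporting", "factory_101"]
selected = match_dag_ids(r"^factory_\d+$", available)
```

Each selected id would then be backfilled one after another within the same session.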