Add subdir parameter to dags reserialize command #26170
@@ -503,6 +503,5 @@ def dag_reserialize(args, session: Session = NEW_SESSION):
    session.query(SerializedDagModel).delete(synchronize_session=False)

    if not args.clear_only:
        dagbag = DagBag()
        dagbag.collect_dags(only_if_updated=False, safe_mode=False)
Does this line need to be kept?
No, and I described in the PR description why it is not needed.
But the arguments here are different from the call in `__init__`.
Since `DagBag().file_last_changed` will be empty, I think it won't make a difference if `only_if_updated` is set to `True` or `False`.
`only_if_updated=False` means the DAG files will be read again even if they have not changed. In our case, we don't need to load the same files twice, so this parameter doesn't apply here.
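The point made in this thread can be illustrated with a toy model. This is only a sketch: `ToyDagBag`, its `parsed` list, and the explicit `mtime` argument are hypothetical stand-ins and not Airflow's actual `DagBag` implementation. It assumes the skip check depends on a per-file cache like `file_last_changed`, as the comments above suggest.

```python
from typing import Dict, List


class ToyDagBag:
    """Simplified stand-in for a DAG container; NOT Airflow's real DagBag."""

    def __init__(self) -> None:
        # Maps file path -> modification time seen at the last parse.
        self.file_last_changed: Dict[str, float] = {}
        self.parsed: List[str] = []

    def process_file(self, path: str, mtime: float, only_if_updated: bool = True) -> bool:
        """Parse `path` unless it is already known and unchanged. Returns True if parsed."""
        if (
            only_if_updated
            and path in self.file_last_changed
            and self.file_last_changed[path] == mtime
        ):
            return False  # skipped: already parsed and unchanged
        self.parsed.append(path)
        self.file_last_changed[path] = mtime
        return True


# On a fresh instance the cache is empty, so the flag makes no difference.
fresh_true = ToyDagBag().process_file("a.py", mtime=1.0, only_if_updated=True)
fresh_false = ToyDagBag().process_file("a.py", mtime=1.0, only_if_updated=False)

# On a second pass over the same instance the flag does matter.
bag = ToyDagBag()
bag.process_file("a.py", mtime=1.0)
second_true = bag.process_file("a.py", mtime=1.0, only_if_updated=True)
second_false = bag.process_file("a.py", mtime=1.0, only_if_updated=False)
```

In this model the flag only changes behavior once the cache has been populated by an earlier pass, which is the situation a single collection pass avoids.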
Test failure @mik-laj :
Hi.
Today I looked at this command and noticed a few problems:
- The `DagBag.collect_dags` method is called two times: the first time through `DagBag.__init__`, and the second time explicitly in the command code. This is not needed and causes performance degradation.
- `safe_mode` has been overridden to `false` in the command code for no reason, which means more files are processed than needed.
- `subdir` is not supported, which is inconsistent with the rest of the commands that create a DagBag instance. Whenever a DagBag is created by any other command, it is possible to set the `dag_folder` using the `subdir` parameter, including `airflow dags list`, `airflow scheduler`, and others.

I had a problem with choosing a PR title that would look good in a changelog, because we have 3 very related problems here, but I think adding a new CLI parameter is the most user-facing. The rest of the problems would (probably) not be noticed by anyone without looking at the code.
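As a rough illustration of the first and third problems, here is a self-contained toy model. Everything in it is hypothetical (`ToyDagBag`, `DEFAULT_DAGS_FOLDER`, `reserialize_before`, `reserialize_after`); it is not the Airflow code, only a sketch that assumes the constructor already performs one collection pass.

```python
from typing import List, Optional

DEFAULT_DAGS_FOLDER = "/dags"  # hypothetical default, for illustration only


class ToyDagBag:
    """Simplified stand-in; NOT Airflow's real DagBag."""

    def __init__(self, dag_folder: Optional[str] = None) -> None:
        self.dag_folder = dag_folder or DEFAULT_DAGS_FOLDER
        self.collect_calls: List[str] = []
        # The constructor already scans the folder once.
        self.collect_dags()

    def collect_dags(self) -> None:
        # Record each scan so redundant passes are visible.
        self.collect_calls.append(self.dag_folder)


def reserialize_before() -> ToyDagBag:
    """Old shape: implicit scan in the constructor plus an explicit second scan."""
    bag = ToyDagBag()
    bag.collect_dags()  # redundant second pass over the same folder
    return bag


def reserialize_after(subdir: Optional[str] = None) -> ToyDagBag:
    """New shape: a single scan, optionally limited to `subdir`."""
    return ToyDagBag(subdir)


before = reserialize_before()
after = reserialize_after("/dags/team_a")
```

In this sketch `before.collect_calls` has two entries for the same folder, while `after.collect_calls` has a single entry for the requested subdirectory.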
CC: @collinmcnulty, @potiuk, @uranusjr, @sfc-gh-mkmak
Best regards,
Kamil Breguła
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named `{pr_number}.significant.rst` or `{issue_number}.significant.rst`, in newsfragments.