Avoid duplicate QA check tasks in FlowETL #6496
Codecov Report
All modified and coverable lines are covered by tests ✅
Additional details and impacted files
@@           Coverage Diff           @@
##           master    #6496   +/-   ##
=======================================
  Coverage   92.30%   92.31%
=======================================
  Files         268      268
  Lines       10583    10586    +3
  Branches      855      855
=======================================
+ Hits         9769     9772    +3
  Misses        676      676
  Partials      138      138
I've changed to a different approach from the one I started off using. The underlying problem here was that Airflow assumes the DAG file (i.e. the file that needs to be imported to define the DAG) is the file in which the DAG constructor is called, but when using FlowETL's To get around this, I've updated |
I find this solution distressing, but the best available option as it stands.
Closes #6494
I have:
Description
- Changes `get_qa_checks()` to only add `settings.DAGS_FOLDER` to the template searchpath if `dag.folder` does not already point to the correct DAG folder (it points elsewhere when the DAG is created by the `create_dag()` function).
- Changes the `default_path` in `get_qa_checks()` to always be `Path(__file__).parent`, rather than being either `Path(__file__).parent` or `Path(__file__).parent / "qa_checks"` depending on whether or not the `create_dag()` function is being used. An upshot of this is that the default QA check files no longer need the additional level of nesting that was required to ensure the template paths contained the string "qa_checks".
- `flowetl.util.create_dag` so that the returned DAG has the correct `dag.folder`
- `flowetl.util.get_qa_checks` that was necessary due to the `dag.folder` sometimes being incorrect

Overall, I think we still want to re-work the QA check discovery more substantially. The template-path-contains-"qa_checks" criterion and the silent addition of a path within the FlowETL module to the template searchpath both leave us open to unexpected and hard-to-debug bugs, and it would be useful to have more explicit control over which QA checks are added to a DAG. But this PR fixes the immediate issue, to enable the use of custom QA checks again.
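As a rough illustration of the "only add `settings.DAGS_FOLDER` if `dag.folder` does not already point there" rule described above, here is a minimal sketch. It is not the FlowETL implementation: `build_extra_searchpath` and its parameters are hypothetical names, plain strings stand in for `dag.folder` and `settings.DAGS_FOLDER`, and it assumes (as I understand Airflow's behaviour) that the DAG's own folder is already part of the template searchpath, which is why adding it a second time would produce duplicate QA check tasks.

```python
from pathlib import PurePosixPath
from typing import List, Optional


def build_extra_searchpath(
    dag_folder: str,
    dags_folder: str,
    additional_paths: Optional[List[str]] = None,
) -> List[str]:
    """Hypothetical helper: return the extra template search paths for a DAG.

    dag_folder stands in for dag.folder and dags_folder for
    settings.DAGS_FOLDER. The DAGs folder is appended only when the DAG's
    own folder does not already point to it.
    """
    # Normalise the user-supplied paths (drops trailing slashes).
    paths = [str(PurePosixPath(p)) for p in (additional_paths or [])]
    if PurePosixPath(dag_folder) != PurePosixPath(dags_folder):
        # dag.folder points somewhere else (e.g. inside an installed
        # package), so the real DAGs folder has to be added explicitly.
        paths.append(str(PurePosixPath(dags_folder)))
    return paths


# DAG defined directly in the DAGs folder: nothing extra is needed.
same_folder = build_extra_searchpath("/opt/airflow/dags", "/opt/airflow/dags/")
# DAG whose folder points elsewhere: the DAGs folder is added once.
other_folder = build_extra_searchpath("/usr/lib/flowetl", "/opt/airflow/dags")
```

The paths used here are made up for the example. Comparing normalised path objects rather than raw strings avoids treating `/opt/airflow/dags` and `/opt/airflow/dags/` as different folders.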
Note: it would still be possible to get duplicates if the user defining a DAG specifies a `template_searchpath` or `additional_qa_check_paths` containing either a parent or child of the DAG folder. This PR does not prevent errors in that situation.
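To make the remaining failure mode concrete, the sketch below flags search paths that are a parent or child of the DAG folder. It is purely illustrative and not part of this PR: `overlapping_paths` is a hypothetical helper, and the paths are invented for the example. A check along these lines could be one way to warn users before duplicate QA checks are discovered.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def overlapping_paths(dag_folder: str, search_paths: Iterable[str]) -> List[str]:
    """Hypothetical check: return the search paths that are a strict
    parent or a strict child of the DAG folder."""
    dag = PurePosixPath(dag_folder)
    overlaps = []
    for raw in search_paths:
        candidate = PurePosixPath(raw)
        is_parent = candidate in dag.parents   # candidate contains the DAG folder
        is_child = dag in candidate.parents    # candidate lies inside the DAG folder
        if is_parent or is_child:
            overlaps.append(raw)
    return overlaps


flagged = overlapping_paths(
    "/opt/airflow/dags",
    ["/opt/airflow", "/opt/airflow/dags/custom_checks", "/srv/qa_checks"],
)
```

With these inputs, `flagged` contains the first two paths, since one is an ancestor of the DAG folder and the other sits inside it, while `/srv/qa_checks` is unrelated.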