[COMMON] feat: Add support for SCRIPT type jobs in BigQuery #2564
Codecov Report
Attention: Patch coverage is
@@ Coverage Diff @@
## main #2564 +/- ##
==========================================
- Coverage 84.47% 84.00% -0.48%
==========================================
Files 59 54 -5
Lines 3356 3225 -131
==========================================
- Hits 2835 2709 -126
+ Misses 521 516 -5
I love that you've made it backwards compatible, with a deprecation notice 🚀
Maybe it's time we really got rid of this dependency and kept the logic only in the Airflow integration? The same goes for RedshiftDataDatasetsProvider. It's of course not the subject of this PR, just raising it. cc @mobuchowski
Would it also be possible to modify the existing BQ integration test?
Signed-off-by: Kacper Muda <mudakacper@gmail.com>
Signed-off-by: Kacper Muda <mudakacper@gmail.com> Signed-off-by: Fabio Manganiello <fabio@manganiello.tech>
Problem
When a BigQuery job is of type SCRIPT, no lineage is extracted, because the SCRIPT job itself carries no lineage information; it only spawns child jobs that do.
Solution
Extract lineage information from the child jobs when dealing with a SCRIPT type job.
I removed the query string from BigQueryJobRunFacet: it can increase the event size a lot, and it's already included in SqlJobFacet, so it's not necessary here.
I also added deduplication of input and output datasets to avoid duplicates in case the script job writes to / reads from a table multiple times.
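For reviewers who want a concrete picture of the approach, here is a minimal sketch of collecting lineage from a SCRIPT job's children and deduplicating the result. It is not the code in this PR. It assumes the child jobs are already available as plain dicts shaped like the BigQuery REST job resource (statistics.query.referencedTables and configuration.query.destinationTable); the function names table_name, dedupe and collect_script_lineage are hypothetical. It also does not filter out temporary destination tables, which a real implementation would likely need to handle.

```python
from typing import Dict, Iterable, List, Tuple


def table_name(ref: Dict[str, str]) -> str:
    """Format a BigQuery table reference as project.dataset.table."""
    return f"{ref['projectId']}.{ref['datasetId']}.{ref['tableId']}"


def dedupe(names: Iterable[str]) -> List[str]:
    """Drop repeated names while keeping first-seen order."""
    return list(dict.fromkeys(names))


def collect_script_lineage(child_jobs: Iterable[dict]) -> Tuple[List[str], List[str]]:
    """Gather input and output tables from the child jobs of a SCRIPT job.

    Hypothetical helper: each element of child_jobs is assumed to be a dict
    in the shape of the BigQuery REST job resource.
    """
    inputs: List[str] = []
    outputs: List[str] = []
    for job in child_jobs:
        # Tables read by this child job.
        query_stats = job.get("statistics", {}).get("query", {})
        for ref in query_stats.get("referencedTables", []):
            inputs.append(table_name(ref))
        # Table written by this child job, if any.
        dest = job.get("configuration", {}).get("query", {}).get("destinationTable")
        if dest:
            outputs.append(table_name(dest))
    # A script may touch the same table in several statements.
    return dedupe(inputs), dedupe(outputs)


def _ref(table: str) -> Dict[str, str]:
    """Build a sample table reference (illustrative data only)."""
    return {"projectId": "p", "datasetId": "d", "tableId": table}


sample_children = [
    {
        "statistics": {"query": {"referencedTables": [_ref("src_a"), _ref("src_b")]}},
        "configuration": {"query": {"destinationTable": _ref("out")}},
    },
    {
        "statistics": {"query": {"referencedTables": [_ref("src_a")]}},
        "configuration": {"query": {"destinationTable": _ref("out")}},
    },
]

inputs, outputs = collect_script_lineage(sample_children)
print(inputs)   # ['p.d.src_a', 'p.d.src_b']
print(outputs)  # ['p.d.out']
```

With the google-cloud-bigquery client, the children of a script can be listed with Client.list_jobs(parent_job=...); the sketch leaves that call out so it runs without credentials.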
SPDX-License-Identifier: Apache-2.0
Copyright 2018-2023 contributors to the OpenLineage project