[COMMON] feat: Add support for SCRIPT type jobs in BigQuery #2564

Merged
JDarDagran merged 1 commit into main from common-fix-bq-script on Apr 4, 2024

@kacpermuda kacpermuda commented Apr 3, 2024

Problem

When using SCRIPT type jobs in BigQuery, no lineage is extracted, because a SCRIPT job itself carries no lineage information - it only spawns child jobs that do.

Solution

Extract lineage information from the child jobs when dealing with a SCRIPT type job.
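
The approach can be illustrated with a minimal sketch. This is not the code from this PR: jobs are modeled as plain dictionaries shaped loosely like BigQuery job resources (`statistics.query.statementType`, `statistics.query.referencedTables`, `configuration.query.destinationTable`), and `fetch_child_jobs` is a hypothetical callable standing in for whatever lists the jobs spawned by a script. With the `google-cloud-bigquery` client that would probably be `Client.list_jobs(parent_job=...)`, but that is an assumption here.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple

Job = Dict[str, Any]  # hypothetical: a job resource held as a plain dict


def table_name(ref: Dict[str, str]) -> str:
    """Format a table reference as project.dataset.table."""
    return f"{ref['projectId']}.{ref['datasetId']}.{ref['tableId']}"


def job_lineage(job: Job) -> Tuple[List[str], List[str]]:
    """Return (inputs, outputs) recorded on a single, non-script job."""
    query_stats = job.get("statistics", {}).get("query", {})
    inputs = [table_name(t) for t in query_stats.get("referencedTables", [])]
    dest = job.get("configuration", {}).get("query", {}).get("destinationTable")
    outputs = [table_name(dest)] if dest else []
    return inputs, outputs


def extract_lineage(
    job: Job, fetch_child_jobs: Callable[[str], Iterable[Job]]
) -> Tuple[List[str], List[str]]:
    """Use the job's own lineage, or its children's if it is a SCRIPT job."""
    statement_type = job.get("statistics", {}).get("query", {}).get("statementType")
    if statement_type != "SCRIPT":
        return job_lineage(job)

    # A SCRIPT job has no lineage of its own, so collect it from each child.
    inputs: List[str] = []
    outputs: List[str] = []
    for child in fetch_child_jobs(job["id"]):
        child_inputs, child_outputs = job_lineage(child)
        inputs.extend(child_inputs)
        outputs.extend(child_outputs)
    return inputs, outputs
```

Passing the child-job lookup in as a callable keeps the sketch testable without a BigQuery connection. The combined lists may still contain repeated tables when several child jobs touch the same one.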

I removed the query string from BigQueryJobRunFacet - it can increase event size a lot and it's already included in SqlJobFacet, so it's not needed here.

I also added deduplication of input and output datasets to avoid duplicates in case the script job writes to or reads from a table multiple times.
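
As an illustration of the deduplication step, here is a minimal order-preserving sketch. The `Dataset` tuple and the choice of `(namespace, name)` as the identity key are assumptions made for the example, not necessarily what the PR implements.

```python
from typing import Iterable, List, NamedTuple


class Dataset(NamedTuple):
    """Hypothetical stand-in for an input or output dataset."""
    namespace: str
    name: str


def deduplicate(datasets: Iterable[Dataset]) -> List[Dataset]:
    """Drop repeated datasets, keeping the first occurrence of each."""
    seen = set()
    unique: List[Dataset] = []
    for dataset in datasets:
        key = (dataset.namespace, dataset.name)  # assumed identity of a dataset
        if key not in seen:
            seen.add(key)
            unique.append(dataset)
    return unique
```

Keeping the first occurrence preserves the order in which the child jobs touched the tables, which makes the emitted events easier to compare in tests.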

One-line summary:

Checklist

  • You've signed-off your work
  • Your pull request title follows our guidelines
  • Your changes are accompanied by tests (if relevant)
  • Your change contains a small diff and is self-contained
  • You've updated any relevant documentation (if relevant)
  • Your comment includes a one-liner for the changelog about the specific purpose of the change (if necessary)
  • You've versioned the core OpenLineage model or facets according to SchemaVer (if relevant)
  • You've added a header to source files (if relevant)

SPDX-License-Identifier: Apache-2.0
Copyright 2018-2023 contributors to the OpenLineage project

codecov-commenter commented Apr 3, 2024

Codecov Report

Attention: Patch coverage is 97.05882%, with 2 lines in your changes missing coverage. Please review.

Project coverage is 84.00%. Comparing base (03484a8) to head (0327c74).

| Files | Patch % | Lines |
|---|---|---|
| ...ion/common/openlineage/common/provider/bigquery.py | 97.01% | 2 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2564      +/-   ##
==========================================
- Coverage   84.47%   84.00%   -0.48%     
==========================================
  Files          59       54       -5     
  Lines        3356     3225     -131     
==========================================
- Hits         2835     2709     -126     
+ Misses        521      516       -5     

@JDarDagran JDarDagran left a comment

I love that you've made it backwards compatible, with a deprecation notice 🚀
Maybe it's time we really got rid of this dependency and kept the logic only in the Airflow integration? The same goes for RedshiftDataDatasetsProvider. It's of course not the subject of this PR, I'm just raising it. cc @mobuchowski

Would it also be possible to modify the existing BQ integration test?

Signed-off-by: Kacper Muda <mudakacper@gmail.com>
@JDarDagran JDarDagran merged commit 7ea0cd5 into main Apr 4, 2024
29 checks passed
@JDarDagran JDarDagran deleted the common-fix-bq-script branch April 4, 2024 15:30
blacklight pushed a commit to blacklight/OpenLineage that referenced this pull request Apr 4, 2024
Signed-off-by: Kacper Muda <mudakacper@gmail.com>
Signed-off-by: Fabio Manganiello <fabio@manganiello.tech>
Labels
area:integration/airflow (openlineage-airflow), area:integration/common (openlineage-integration-common), tool:bigquery (Google BigQuery)