python: use v2 Python facets #2693
Nice 🚀 Leaving some comments and questions
integration/common/openlineage/common/provider/dbt/processor.py
9d62354 to 0ba663a
Codecov Report
Attention: Patch coverage is
@@            Coverage Diff             @@
##             main    #2693      +/-   ##
==========================================
- Coverage   84.59%   83.81%   -0.78%
==========================================
  Files          59       54       -5
  Lines        3446     3281     -165
==========================================
- Hits         2915     2750     -165
  Misses        531      531
@kacpermuda @mobuchowski I've added
Signed-off-by: Jakub Dardzinski <kuba0221@gmail.com>

Use v2 facets in common.
Signed-off-by: Jakub Dardzinski <kuba0221@gmail.com>

Adjust bigquery URI to be valid URI.
Signed-off-by: Jakub Dardzinski <kuba0221@gmail.com>
Fix three problems observed while running the tests in the CI recently:

1. Static check

Fix spelling issue captured by static check (#1000)

2. Integration test that's no longer needed

The issue this test was trying to capture no longer happens in the latest version of the Apache Airflow provider `apache-airflow-providers-postgres==5.11.1rc1`: apache/airflow#39842

3. Skip a buggy version of OL

There was a breaking change between `openlineage-integration-common==1.14.0` and `openlineage-integration-common==1.15.0`. It may have been an unintended side-effect of OpenLineage/OpenLineage#2693.

This is an example of how Cosmos was using `DbtLocalArtifactProcessor` - something that had been agreed upon in the past:

```
openlineage_processor = DbtLocalArtifactProcessor(
    producer=OPENLINEAGE_PRODUCER,
    job_namespace=LINEAGE_NAMESPACE,
    project_dir=project_dir,
    profile_name=self.profile_config.profile_name,
    target=self.profile_config.target_name,
)
events = openlineage_processor.parse()
for completed in events.completes:
    for output in getattr(completed, source):
        dataset_uri = output.namespace + "/" + output.name
        uris.append(dataset_uri)
```

In `openlineage-integration-common==1.14.0` and earlier versions, this would create URIs in the format:

```
postgres://0.0.0.0:5432/postgres.public.stg_customers
```

Since `openlineage-integration-common==1.15.0`, this leads to URIs being created in the format:

```
postgres.public.stg_customers/postgres://0.0.0.0:5432
```

This was fixed in OpenLineage/OpenLineage#2735 and released as part of OL 1.16: https://github.com/OpenLineage/OpenLineage/releases/tag/1.16.0
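The two URI formats quoted in the commit message above can be reproduced with a small self-contained sketch. The `Dataset` dataclass and the `looks_swapped` heuristic are hypothetical stand-ins, not part of `openlineage-integration-common`. The sketch also assumes that the 1.15.0 output is explained by namespace and name changing places, which matches the two formats shown but is not stated in the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """Hypothetical stand-in for an OpenLineage dataset (namespace + name)."""

    namespace: str
    name: str


def dataset_uri(dataset: Dataset) -> str:
    """Join namespace and name the same way the Cosmos snippet does."""
    return dataset.namespace + "/" + dataset.name


def looks_swapped(dataset: Dataset) -> bool:
    """Heuristic (an assumption): a scheme in the name but not in the
    namespace suggests the two fields changed places."""
    return "://" in dataset.name and "://" not in dataset.namespace


# Values taken from the commit message; the field assignment is assumed.
expected = Dataset(
    namespace="postgres://0.0.0.0:5432",
    name="postgres.public.stg_customers",
)
swapped = Dataset(
    namespace="postgres.public.stg_customers",
    name="postgres://0.0.0.0:5432",
)

print(dataset_uri(expected))
print(dataset_uri(swapped))
print(looks_swapped(expected), looks_swapped(swapped))
```

A consumer that builds URIs by concatenation could run a check like `looks_swapped` in its tests to catch this kind of regression early, instead of pinning versions only after CI failures.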
* Use v2 facets in Airflow.

Signed-off-by: Jakub Dardzinski <kuba0221@gmail.com>

Use v2 facets in common.

Signed-off-by: Jakub Dardzinski <kuba0221@gmail.com>

* Python URI validator should expect scheme only.

Adjust bigquery URI to be valid URI.

Signed-off-by: Jakub Dardzinski <kuba0221@gmail.com>

---------

Signed-off-by: Jakub Dardzinski <kuba0221@gmail.com>
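The commit message above says the Python URI validator should expect a scheme only. The validator itself is not shown here, so the following is only a minimal sketch of what a scheme-only check could look like with the standard library. The function name `has_scheme` and the sample strings are illustrative assumptions, not the project's code or its actual BigQuery values.

```python
from urllib.parse import urlparse


def has_scheme(uri: str) -> bool:
    """Return True when the string parses with a non-empty URI scheme.

    Hypothetical helper: it requires a scheme only and deliberately
    does not require a host, port, or path.
    """
    return bool(urlparse(uri).scheme)


# Illustrative inputs (assumptions, not values from the PR).
print(has_scheme("postgres://0.0.0.0:5432"))        # scheme present
print(has_scheme("postgres.public.stg_customers"))  # no scheme
print(has_scheme("bigquery"))                       # bare word, no scheme
```

Under this kind of check, a bare identifier without a scheme is rejected, which is consistent with the accompanying note that the bigquery URI had to be adjusted to be a valid URI.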
Problem
#2520 introduced v2 facets in the Python client. The rest of the Python code needs to be migrated to use them.
Solution
Remove usage of v1 facets. apache/airflow#39614 removes the `openlineage.common` dependency for BigQuery in the Google provider, so there should be no discrepancy between versions.
One-line summary:
Migrate integrations to use v2 Python facets.
Checklist
SPDX-License-Identifier: Apache-2.0
Copyright 2018-2023 contributors to the OpenLineage project