Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

python: use v2 Python facets #2693

Merged
merged 2 commits into from
May 22, 2024
Merged

python: use v2 Python facets #2693

merged 2 commits into from
May 22, 2024

Conversation

JDarDagran
Copy link
Contributor

Problem

#2520 introduced v2 facets in Python client. Rest of Python code requires migration to use them.

Solution

Remove usage of v1 facets. In apache/airflow#39614 there is removed openlineage.common dependency on BigQuery in Google provider so there should be no discrepancy between versions.

One-line summary:

Migrate integrations to use v2 python facets.

Checklist

  • You've signed-off your work
  • Your pull request title follows our guidelines
  • Your changes are accompanied by tests (if relevant)
  • Your change contains a small diff and is self-contained
  • You've updated any relevant documentation (if relevant)
  • Your comment includes a one-liner for the changelog about the specific purpose of the change (not required for changes to tests, docs, or CI config)
  • You've versioned the core OpenLineage model or facets according to SchemaVer (if relevant)
  • You've added a header to source files (if relevant)

SPDX-License-Identifier: Apache-2.0
Copyright 2018-2023 contributors to the OpenLineage project

Copy link
Contributor

@kacpermuda kacpermuda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice 🚀 Leaving some comments and questions

integration/airflow/openlineage/airflow/utils.py Outdated Show resolved Hide resolved
integration/airflow/setup.cfg Outdated Show resolved Hide resolved
integration/airflow/tests/test_macros.py Outdated Show resolved Hide resolved
@codecov-commenter
Copy link

codecov-commenter commented May 16, 2024

Codecov Report

Attention: Patch coverage is 92.78351% with 7 lines in your changes are missing coverage. Please review.

Project coverage is 83.81%. Comparing base (0f68c74) to head (ef7c699).

Files Patch % Lines
integration/common/openlineage/common/dataset.py 71.42% 2 Missing ⚠️
...neage/common/provider/great_expectations/action.py 75.00% 2 Missing ⚠️
...nlineage/airflow/extractors/dbt_cloud_extractor.py 50.00% 1 Missing ⚠️
...ow/openlineage/airflow/extractors/sql_extractor.py 75.00% 1 Missing ⚠️
...ommon/openlineage/common/provider/dbt/processor.py 96.15% 1 Missing ⚠️

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2693      +/-   ##
==========================================
- Coverage   84.59%   83.81%   -0.78%     
==========================================
  Files          59       54       -5     
  Lines        3446     3281     -165     
==========================================
- Hits         2915     2750     -165     
  Misses        531      531              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@JDarDagran
Copy link
Contributor Author

@kacpermuda @mobuchowski I've added facets section in dbt integration with TODO comment to remove it next release.

Signed-off-by: Jakub Dardzinski <kuba0221@gmail.com>

Use v2 facets in common.

Signed-off-by: Jakub Dardzinski <kuba0221@gmail.com>
Adjust bigquery URI to be valid URI.

Signed-off-by: Jakub Dardzinski <kuba0221@gmail.com>
@boring-cyborg boring-cyborg bot added area:tests Testing code language:python Uses Python programming language labels May 22, 2024
@mobuchowski mobuchowski merged commit ec5b0f3 into main May 22, 2024
29 checks passed
@mobuchowski mobuchowski deleted the use-v2-facets branch May 22, 2024 08:55
tatiana added a commit to astronomer/astronomer-cosmos that referenced this pull request May 29, 2024
Fix three problems observed while running the tests in the CI recently:

1. Static check

Fix spelling issue captured by static check (#1000)

2. Integration test that's no longer needed

The issue we were trying to capture for no longer happens in the latest
version of the Apache Airflow provider
`apache-airflow-providers-postgres==5.11.1rc1`:
apache/airflow#39842

3. Skip a buggy version of OL

There was a breaking change between
`openlineage-integration-common==1.14.0` and
`openlineage-integration-common==1.15.0` . It may have been an
unintended side-effect of
OpenLineage/OpenLineage#2693.

This is an example of how Cosmos was using `DbtLocalArtifactProcessor` -
something that had been agreed upon in the past:

```
      openlineage_processor = DbtLocalArtifactProcessor(
            producer=OPENLINEAGE_PRODUCER,
            job_namespace=LINEAGE_NAMESPACE,
            project_dir=project_dir,
            profile_name=self.profile_config.profile_name,
            target=self.profile_config.target_name,
        )
        events = openlineage_processor.parse()
        for completed in events.completes:
            for output in getattr(completed, source):
                dataset_uri = output.namespace + "/" + output.name
                uris.append(dataset_uri)
```

In `openlineage-integration-common==1.14.0` and earlier versions, this
would create URIs in the format:
```
postgres://0.0.0.0:5432/postgres.public.stg_customers
```
Since openlineage-integration-common==1.15.0 , this leads to URIs being
created in the format:
```
postgres.public.stg_customers/postgres://0.0.0.0:5432
```

This was fixed in OpenLineage/OpenLineage#2735
and released as part of OL 1.16:
https://github.com/OpenLineage/OpenLineage/releases/tag/1.16.0
tatiana added a commit to astronomer/astronomer-cosmos that referenced this pull request Jun 6, 2024
Fix three problems observed while running the tests in the CI recently:

1. Static check

Fix spelling issue captured by static check (#1000)

2. Integration test that's no longer needed

The issue we were trying to capture for no longer happens in the latest
version of the Apache Airflow provider
`apache-airflow-providers-postgres==5.11.1rc1`:
apache/airflow#39842

3. Skip a buggy version of OL

There was a breaking change between
`openlineage-integration-common==1.14.0` and
`openlineage-integration-common==1.15.0` . It may have been an
unintended side-effect of
OpenLineage/OpenLineage#2693.

This is an example of how Cosmos was using `DbtLocalArtifactProcessor` -
something that had been agreed upon in the past:

```
      openlineage_processor = DbtLocalArtifactProcessor(
            producer=OPENLINEAGE_PRODUCER,
            job_namespace=LINEAGE_NAMESPACE,
            project_dir=project_dir,
            profile_name=self.profile_config.profile_name,
            target=self.profile_config.target_name,
        )
        events = openlineage_processor.parse()
        for completed in events.completes:
            for output in getattr(completed, source):
                dataset_uri = output.namespace + "/" + output.name
                uris.append(dataset_uri)
```

In `openlineage-integration-common==1.14.0` and earlier versions, this
would create URIs in the format:
```
postgres://0.0.0.0:5432/postgres.public.stg_customers
```
Since openlineage-integration-common==1.15.0 , this leads to URIs being
created in the format:
```
postgres.public.stg_customers/postgres://0.0.0.0:5432
```

This was fixed in OpenLineage/OpenLineage#2735
and released as part of OL 1.16:
https://github.com/OpenLineage/OpenLineage/releases/tag/1.16.0
fafnirZ pushed a commit to fafnirZ/OpenLineage that referenced this pull request Jul 3, 2024
* Use v2 facets in Airflow.

Signed-off-by: Jakub Dardzinski <kuba0221@gmail.com>

Use v2 facets in common.

Signed-off-by: Jakub Dardzinski <kuba0221@gmail.com>

* Python URI validator should expect scheme only.

Adjust bigquery URI to be valid URI.

Signed-off-by: Jakub Dardzinski <kuba0221@gmail.com>

---------

Signed-off-by: Jakub Dardzinski <kuba0221@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:integration/airflow openlineage-airflow area:integration/common openlineage-integration-common area:integration/dbt openlineage-dbt area:tests Testing code language:python Uses Python programming language tool:bigquery Google BigQuery
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants