Fix Dataset URI by using openlineage-integration-common as lib #485

Merged: 22 commits into main from issue-305-dataset-uri on Sep 5, 2023

Conversation

tatiana (Collaborator) commented Aug 22, 2023

Use openlineage-integration-common to create outlet Dataset URIs from within Cosmos.

Closes: #305
Closes: #497
Closes: #433 (only emits outlet events from the model - the same behaviour as openlineage-dbt)
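
For illustration, an OpenLineage dataset is identified by a namespace and a name, while an Airflow Dataset needs a single URI string. The sketch below shows one way the two could be combined. The `LineageDataset` class and the `to_dataset_uri` helper are hypothetical names made up for this example, not the Cosmos or openlineage-integration-common API, and the join rule is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineageDataset:
    """Hypothetical stand-in for an OpenLineage dataset (namespace + name)."""

    namespace: str  # e.g. "postgres://localhost:5432"
    name: str  # e.g. "postgres.public.orders"


def to_dataset_uri(dataset: LineageDataset) -> str:
    """Join namespace and name with a single slash (assumed convention)."""
    return dataset.namespace.rstrip("/") + "/" + dataset.name.lstrip("/")


# Example values are invented for illustration only.
orders = LineageDataset("postgres://localhost:5432", "postgres.public.orders")
print(to_dataset_uri(orders))
```

Deriving the URI from the namespace and name reported by the lineage library, rather than building it by hand, is what keeps the Airflow Dataset identifiers consistent with what a lineage backend such as Marquez displays.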

Validation
This change was tested by running Marquez locally and triggering the basic_cosmos_dag DAG using Airflow 2.6.1 and Python 3.10.10.

The output generated by this version of Cosmos can be seen in the following screenshots:

[Screenshots taken 2023-09-04 at 22:08:32, 22:09:01 and 22:09:31]

Tasks

  • Fix pre-commit checks
  • Add test
  • Inlets support
  • Emit open lineage events
  • Support Docker/K8s (deferred to issue: Add support for openlineage when running from Kubernetes/Docker #496)
  • Create a PR on openlineage-dbt to remove the dbt dependency: not needed, since the code depends on openlineage-integration-common and not openlineage-dbt (more info)
  • Understand which dataset is being emitted from test tasks (only inlets, no outlets)
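
As a rough sketch of the inlets/outlets split referred to in the task list: the inputs of a completed run map to inlets and its outputs to outlets, so a test task that only reads a model ends up with inlets and no outlets. `CompletedRun` and `split_lineage` are hypothetical names, and the event shape is a simplified assumption rather than the actual OpenLineage event model.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CompletedRun:
    """Hypothetical, simplified view of a completed lineage event."""

    inputs: list[str] = field(default_factory=list)  # dataset URIs read
    outputs: list[str] = field(default_factory=list)  # dataset URIs written


def split_lineage(runs: list[CompletedRun]) -> tuple[list[str], list[str]]:
    """Collect de-duplicated inlets and outlets, keeping first-seen order."""
    inlets: list[str] = []
    outlets: list[str] = []
    for run in runs:
        for uri in run.inputs:
            if uri not in inlets:
                inlets.append(uri)
        for uri in run.outputs:
            if uri not in outlets:
                outlets.append(uri)
    return inlets, outlets


# A model run reads a source table and writes the model's table.
model_run = CompletedRun(
    inputs=["postgres://localhost:5432/postgres.public.raw_orders"],
    outputs=["postgres://localhost:5432/postgres.public.orders"],
)
# A test run only reads the model it checks, so it has no outputs.
test_run = CompletedRun(inputs=["postgres://localhost:5432/postgres.public.orders"])

print(split_lineage([model_run]))
print(split_lineage([test_run]))
```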

@tatiana tatiana added this to the 1.1.0 milestone Aug 23, 2023
@tatiana changed the title from "WIP: Fix Dataset URI by using openlineage-dbt" to "Fix Dataset URI by using openlineage-dbt" on Aug 29, 2023

codecov bot commented Aug 29, 2023

Codecov Report

Patch coverage: 92.77% and project coverage change: -0.04% ⚠️

Comparison is base (13087d5) 91.51% compared to head (b38fa75) 91.48%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #485      +/-   ##
==========================================
- Coverage   91.51%   91.48%   -0.04%     
==========================================
  Files          50       49       -1     
  Lines        1874     1914      +40     
==========================================
+ Hits         1715     1751      +36     
- Misses        159      163       +4     
| Files Changed | Coverage | Patch coverage | Δ |
|---|---|---|---|
| cosmos/__init__.py | 100.00% | ø | ø |
| cosmos/airflow/graph.py | 100.00% | ø | ø |
| cosmos/converter.py | 95.58% | ø | -0.19% |
| cosmos/operators/local.py | 82.13% | 92.10% | +3.30% |
| cosmos/config.py | 92.85% | 100.00% | ø |
| cosmos/constants.py | 100.00% | 100.00% | ø |
| cosmos/dbt/executable.py | 100.00% | 100.00% | ø |
| cosmos/operators/base.py | 100.00% | 100.00% | +1.09% |


@tatiana tatiana mentioned this pull request Aug 29, 2023
@tatiana tatiana marked this pull request as ready for review August 30, 2023 14:44
@tatiana tatiana requested review from a team as code owners August 30, 2023 14:44

tatiana commented Sep 5, 2023

Thanks a lot for the review and feedback, @harels, @mobuchowski and @JDarDagran! I've addressed most of the feedback. As @harels mentioned, one point still missing is a better automated testing strategy to make sure Cosmos emits the OpenLineage events correctly. I've logged a follow-up ticket for us to improve this: #509

@tatiana tatiana merged commit 033a4b3 into main Sep 5, 2023
40 of 41 checks passed
@tatiana tatiana deleted the issue-305-dataset-uri branch September 5, 2023 08:50
tatiana added a commit that referenced this pull request Sep 5, 2023
As observed in:
#482

The following exception was raised:
build of https://github.com/pre-commit/mirrors-mypy:types-PyYAML,types-attrs,attrs,types-requests,types-python-dateutil,apache-airflow@v1.5.1 for python@python3 exceeds tier max size 250MiB: 262.2MiB

So we disabled MyPy checks as part of #485
@tatiana tatiana mentioned this pull request Sep 6, 2023
tatiana added a commit that referenced this pull request Sep 6, 2023
**Features**

* Support dbt global flags (via dbt_cmd_global_flags in operator_args)
by @tatiana in #469
* Support parsing DAGs when there are no connections by @jlaneve in #489

**Enhancements**

* Hide sensitive field when using BigQuery keyfile_dict profile mapping
by @jbandoro in #471
* Consistent Airflow Dataset URIs, inlets and outlets with `Openlineage
package <https://pypi.org/project/openlineage-integration-common/>`_ by
@tatiana in #485. `Read more
<https://astronomer.github.io/astronomer-cosmos/configuration/lineage.html>`_.
* Refactor ``LoadMethod.DBT_LS`` to run from a temporary directory with
symbolic links by @tatiana in #488
* Run ``dbt deps`` when using ``LoadMethod.DBT_LS`` by @DanMawdsleyBA in
#481
* Update Cosmos log color to purple by @harels in #494
* Change operators to log ``dbt`` commands output as opposed to
recording to XCom by @tatiana in #513

**Bug fixes**

* Fix bug on select node add exclude selector subset ids logic by
@jensenity in #463
* Refactor dbt ls to run from a temporary directory, to avoid Read-only
file system errors during DAG parsing, by @tatiana in #414
* Fix profile_config arg in DbtKubernetesBaseOperator by @david-mag in
#505
* Fix SnowflakePrivateKeyPemProfileMapping private_key reference by
@nacpacheco in #501
* Fix incorrect temporary directory creation in VirtualenvOperator init
by @tatiana in #500
* Fix log propagation issue by @tatiana in #498
* Fix PostgresUserPasswordProfileMapping to retrieve port from
connection by @jlaneve in #511

**Others**

* Docs: Fix RenderConfig load argument by @jbandoro in #466
* Enable CI integration tests from external forks by @tatiana in #458
* Improve CI tests runtime by @tatiana in #457
* Change CI to run coverage after tests pass by @tatiana in #461
* Fix forks code revision in code coverage by @tatiana in #472
* [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in #467
* Drop support to Python 3.7 in the CI test matrix by @harels in #490
* Add Airflow 2.7 to the CI test matrix by @tatiana in #487
* Add MyPy type checks to CI since we exceeded pre-commit disk quota
usage by @tatiana in #510