Add support for openlineage to AFS and common.io #36410

Merged
bolkedebruin merged 9 commits into apache:main from ostorage_lineage on Jan 4, 2024

Conversation

bolkedebruin
Contributor

This adds low-level support for OpenLineage to ObjectStorage and integrates it into common.io.



@bolkedebruin
Contributor Author

Should I just add openlineage to the list to 'always' test, @potiuk?

Member

@hussein-awala hussein-awala left a comment


Looks good. I just added a comment about the change to key extraction.

@uranusjr
Member

uranusjr commented Dec 25, 2023

Pulling @mobuchowski to the conversation since he mentioned he wants to investigate adding this some time ago.

@bolkedebruin
Contributor Author

@mobuchowski if there is a way to emit lineage events that OpenLineage can reconcile, we could add those events at the lower level, so lineage becomes available for TaskFlow as well without manual intervention. I'm not sure whether that's possible, though.

@mobuchowski
Contributor

mobuchowski commented Dec 27, 2023

@bolkedebruin working on it 🙂 f15a1e0
The main use would be instrumenting hooks, but it would work with any other relevant code, such as object storage.

@bolkedebruin
Contributor Author

PTAL @hussein-awala @potiuk @uranusjr. The doc caching update has been split out of this PR. I've kept key resolution relative, per the examples and best practices. Integrating the 'emitting' of OpenLineage events requires work on the OpenLineage side, per @mobuchowski.
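
For illustration only (this is not code from the PR): a minimal sketch of what a relative key could look like when an object storage URI is mapped to a lineage dataset. `LineageDataset` and `to_lineage_dataset` are hypothetical names, and the namespace/name split (scheme plus bucket as the namespace, the key without a leading slash as the name) is an assumption about the naming convention, not the behavior of `airflow/io/path.py`.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class LineageDataset:
    """Hypothetical stand-in for a lineage dataset: a namespace plus a name."""

    namespace: str
    name: str


def to_lineage_dataset(uri: str) -> LineageDataset:
    """Split a storage URI into a namespace and a key relative to the bucket.

    Assumption: the namespace is "<scheme>://<bucket>" and the name is the
    object key with its leading slash removed.
    """
    parts = urlsplit(uri)
    namespace = f"{parts.scheme}://{parts.netloc}"
    key = parts.path.lstrip("/")  # keep the key relative
    return LineageDataset(namespace=namespace, name=key)


dataset = to_lineage_dataset("s3://my-bucket/raw/events.csv")
print(dataset.namespace, dataset.name)
```

With a relative key, the same name can be joined onto a different bucket or store without first stripping a leading slash.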

Member

@hussein-awala hussein-awala left a comment


Looks good

@bolkedebruin bolkedebruin merged commit 33996a4 into apache:main Jan 4, 2024
74 checks passed
@bolkedebruin bolkedebruin deleted the ostorage_lineage branch January 4, 2024 17:45
@ephraimbuddy ephraimbuddy added this to the Airflow 2.9.0 milestone Jan 10, 2024
@ephraimbuddy ephraimbuddy added the type:new-feature Changelog: New Features label Jan 10, 2024
potiuk pushed a commit that referenced this pull request Jan 13, 2024
This adds low level support for open lineage to ObjectStorage
and integrates it into common.io.

(cherry picked from commit 33996a4)
abhishekbhakat pushed a commit to abhishekbhakat/my_airflow that referenced this pull request Mar 5, 2024
This adds low level support for open lineage to ObjectStorage
and integrates it into common.io.

6 participants