Openlineage support - Add Extractor for DataframeOperator #1183

Merged
pankajastro merged 17 commits into main from ol_dataframeop
Nov 4, 2022

Conversation

phanikumv
Collaborator

@phanikumv phanikumv commented Nov 2, 2022

Description

What is the current behavior?

OpenLineage information cannot currently be extracted from the DataframeOperator.

closes: #904

What is the new behavior?

Does this introduce a breaking change?

No

Checklist

  • Post a screenshot of how it looks in the OpenLineage/Marquez UI
  • All checks and tests in the CI should pass
  • Unit tests (90% code coverage or more, Enable test coverage #191)
  • Integration tests (if the feature relates to a new database or external service)
  • Docstrings in reStructuredText for each of methods, classes, functions and module-level attributes (including Example DAG on how it should be used)
  • Exception handling in case of errors
  • Logging (are we exposing useful information to the user? e.g. source and destination)
  • Improve the documentation (README, Sphinx, and any other relevant)

@codecov

codecov bot commented Nov 2, 2022

Codecov Report

Base: 94.20% // Head: 94.43% // Increases project coverage by +0.22% 🎉

Coverage data is based on head (42ce755) compared to base (2b4ff04).
Patch has no changes to coverable lines.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1183      +/-   ##
==========================================
+ Coverage   94.20%   94.43%   +0.22%     
==========================================
  Files          68       16      -52     
  Lines        3140      521    -2619     
  Branches      358       51     -307     
==========================================
- Hits         2958      492    -2466     
+ Misses        120       20     -100     
+ Partials       62        9      -53     
Impacted Files Coverage Δ
python-sdk/src/astro/lineage/extractor.py
python-sdk/src/astro/sql/operators/dataframe.py
...k/src/astro/custom_backend/astro_custom_backend.py
python-sdk/src/astro/utils/path.py
python-sdk/src/astro/sql/operators/load_file.py
python-sdk/src/astro/files/types/base.py
python-sdk/src/astro/sql/operators/append.py
python-sdk/src/astro/files/locations/amazon/s3.py
python-sdk/src/astro/databases/google/bigquery.py
python-sdk/src/astro/files/locations/http.py
... and 42 more


@phanikumv phanikumv force-pushed the ol_dataframeop branch 2 times, most recently from 99626df to 023456e Compare November 2, 2022 13:08
@phanikumv
Collaborator Author

Marquez UI screenshot:

image

@rajaths010494
Contributor

LGTM.

@phanikumv phanikumv force-pushed the ol_dataframeop branch 4 times, most recently from 30e45ce to 3626491 Compare November 4, 2022 06:10
@pankajastro pankajastro merged commit 4785827 into main Nov 4, 2022
@pankajastro pankajastro deleted the ol_dataframeop branch November 4, 2022 16:31
utkarsharma2 pushed a commit that referenced this pull request Nov 4, 2022
# Description
## What is the current behavior?
OpenLineage information cannot currently be extracted from the DataframeOperator.


closes: #904



## What is the new behavior?

- Added a `get_openlineage_facets` method to the operator (based on [this
doc](https://docs.google.com/document/d/1vPsvHejQ24xTbzpz_LYSf0_ixk9oUuBiEUHVEaF9J2U/edit?usp=sharing))
- The built-in `PythonSDKExtractor` (#898) should work with
`DataframeOperator.get_openlineage_facets`. This was tested to confirm
that it works.
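
A rough, self-contained sketch of the pattern described above, in which the operator exposes a `get_openlineage_facets` method and an extractor delegates to it. This is not the code from this PR: `DatasetSketch`, `LineageSketch`, `DataframeOperatorSketch`, `extract_lineage`, the `example-namespace` value and the method signature are all hypothetical stand-ins chosen for illustration, and no real OpenLineage or Astro SDK classes are used.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class DatasetSketch:
    """Hypothetical stand-in for a lineage dataset (namespace + name)."""

    namespace: str
    name: str


@dataclass
class LineageSketch:
    """Hypothetical container for the lineage an operator reports."""

    inputs: List[DatasetSketch] = field(default_factory=list)
    outputs: List[DatasetSketch] = field(default_factory=list)
    run_facets: Dict[str, Any] = field(default_factory=dict)
    job_facets: Dict[str, Any] = field(default_factory=dict)


class DataframeOperatorSketch:
    """Toy operator for illustration only; not the real DataframeOperator."""

    def __init__(
        self,
        task_id: str,
        input_tables: List[str],
        output_table: Optional[str] = None,
    ) -> None:
        self.task_id = task_id
        self.input_tables = input_tables
        self.output_table = output_table

    def get_openlineage_facets(self, task_instance: Any = None) -> LineageSketch:
        # Assumed signature: the real method may take different arguments.
        namespace = "example-namespace"  # hypothetical value
        inputs = [DatasetSketch(namespace, name) for name in self.input_tables]
        outputs = (
            [DatasetSketch(namespace, self.output_table)]
            if self.output_table
            else []
        )
        return LineageSketch(inputs=inputs, outputs=outputs)


def extract_lineage(operator: Any) -> Optional[LineageSketch]:
    """Hypothetical extractor step: ask the operator for its own lineage."""
    method = getattr(operator, "get_openlineage_facets", None)
    if not callable(method):
        # Operators without the method simply report no lineage.
        return None
    return method(None)


if __name__ == "__main__":
    op = DataframeOperatorSketch("aggregate", ["orders"], "order_summary")
    print(extract_lineage(op))
```

Under these assumptions the extractor needs no operator-specific logic. Any operator that implements the method can report its own inputs and outputs, which is presumably why the built-in extractor can work with `DataframeOperator` once the method exists.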


## Does this introduce a breaking change?
No

### Checklist

- [x] Post a screenshot of how it looks in the OpenLineage/Marquez UI
- [x] All checks and tests in the CI should pass
- [x] Unit tests (90% code coverage or more,
#191)
- [ ] Integration tests (if the feature relates to a new database or
external service)
- [ ] Docstrings in
[reStructuredText](https://peps.python.org/pep-0287/) for each of
methods, classes, functions and module-level attributes (including
Example DAG on how it should be used)
- [ ]  Exception handling in case of errors
- [ ] Logging (are we exposing useful information to the user? e.g.
source and destination)
- [ ] Improve the documentation (README, Sphinx, and any other relevant)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Pankaj Singh <98807258+pankajastro@users.noreply.github.com>
Co-authored-by: Pankaj <pankaj.singh@astronomer.io>
Development

Successfully merging this pull request may close these issues.

Openlineage support - Add Extractor for DataframeOperator
4 participants