Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add source code facet for operators #1537

Merged
merged 7 commits into from
Jan 9, 2023
Merged

Add source code facet for operators #1537

merged 7 commits into from
Jan 9, 2023

Conversation

rajaths010494
Copy link
Contributor

@rajaths010494 rajaths010494 commented Jan 6, 2023

Please describe the feature you'd like to see
closes: #1467

The python transform should also expose its content with the sourceCode facet, similar to the sql facet. Example: link [Action item: Create an issue on astro-sdk]
Include the transformation python code that the transformations were running in the OpenLineage events so that they showed up in the Info tab
For demo purposes I hard-coded both of these in a custom openlineage-airflow fork like:

code = inspect.getsource(task.python_callable)
job_facet = {"sql": SqlJobFacet(query=code), "sourceCodeLocation": SourceCodeLocationJobFacet("git", "https://github.com/astronomer/astro-days-chicago/blob/9cca4e166d73106e903f3d9f32af334d7b5560a3/dags/airflow_ecosystem.py")}

^ the code would probably be cleaner as {"sourceCode": SourceCodeJobFacet("python", code)} I stuffed it in sql only due to the lack of: https://github.com/astronomer/astro/issues/2150 which is a temporary limitation

Describe the solution you'd like

  • Add source code facet for operators for open lineage integrations

Currently source code facet is done for base decorator operators.

@rajaths010494 rajaths010494 changed the title Add python_callable for decorators Add source code facet for operators Jan 6, 2023
@codecov
Copy link

codecov bot commented Jan 6, 2023

Codecov Report

Base: 94.01% // Head: 94.00% // Decreases project coverage by -0.01% ⚠️

Coverage data is based on head (123c37f) compared to base (38809fa).
Patch coverage: 92.30% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1537      +/-   ##
==========================================
- Coverage   94.01%   94.00%   -0.02%     
==========================================
  Files          89       89              
  Lines        4347     4372      +25     
  Branches      428      432       +4     
==========================================
+ Hits         4087     4110      +23     
  Misses        178      178              
- Partials       82       84       +2     
Impacted Files Coverage Δ
python-sdk/src/astro/lineage/__init__.py 80.00% <ø> (ø)
python-sdk/src/astro/sql/operators/dataframe.py 99.14% <91.66%> (-0.86%) ⬇️
...thon-sdk/src/astro/sql/operators/base_decorator.py 94.69% <92.30%> (-0.31%) ⬇️
python-sdk/src/astro/settings.py 100.00% <100.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

Copy link
Contributor

@pankajastro pankajastro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add tests for changes and deepsource also failing

Copy link
Contributor

@sunank200 sunank200 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the comments

python-sdk/src/astro/sql/operators/base_decorator.py Outdated Show resolved Hide resolved
python-sdk/src/astro/sql/operators/dataframe.py Outdated Show resolved Hide resolved
Copy link
Contributor

@sunank200 sunank200 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need the following before merging this PR:

  1. Screenshot of DAG with lineage values on Marquez populating for operators for databases SQLite, google bigquery and Snowflake.
  2. The documentation changes for details around OPENLINEAGE_AIRFLOW_DISABLE_SOURCE_CODE.
  3. Issue link on Openlineage if sourceCodeFacet is not visible on Marquez UI.
  4. Screenshot of DAG with lineage values on astro-cloud populating for operators for databases SQLite, google bigquery and Snowflake.

@rajaths010494
Copy link
Contributor Author

Snowflake transform and dataframe
transform_snowflake
dataframe_snwoflake
operator.

@rajaths010494
Copy link
Contributor Author

BigQuery
bigquery_transform
bigquery_dataframe

@rajaths010494
Copy link
Contributor Author

Sqlite
transform_with_sqlite
dataframe_with_sqlie

@rajaths010494
Copy link
Contributor Author

rajaths010494 commented Jan 9, 2023

Opened an issue in. Openlineage for SourceCodeJobFacet not visible on Marquez UI MarquezProject/marquez#2351

Copy link
Contributor

@pankajastro pankajastro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link

@JDarDagran JDarDagran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_source_code could be extracted to some common module to avoid code duplication. Other than that - LGTM 👍

@pankajastro
Copy link
Contributor

Merging it. please let we have any further improvement suggestion we will take care of that in a separate PR

@pankajastro pankajastro merged commit a150cbf into main Jan 9, 2023
@pankajastro pankajastro deleted the source_code_facet branch January 9, 2023 14:19
utkarsharma2 pushed a commit that referenced this pull request Jan 17, 2023
**Please describe the feature you'd like to see**
closes: #1467

The python transform should also expose its content with the sourceCode
facet, similar to the sql facet. Example:
[link](https://github.com/OpenLineage/OpenLineage/blob/3090ced24604c95716dacd667c2cff52bf438aba/integration/airflow/openlineage/airflow/extractors/python_extractor.py#L31)
[Action item: Create an issue on astro-sdk]
Include the transformation python code that the transformations were
running in the OpenLineage events so that they showed up in the Info tab
For demo purposes I hard-coded both of these in a custom
openlineage-airflow fork like:
```
code = inspect.getsource(task.python_callable)
job_facet = {"sql": SqlJobFacet(query=code), "sourceCodeLocation": SourceCodeLocationJobFacet("git", "https://github.com/astronomer/astro-days-chicago/blob/9cca4e166d73106e903f3d9f32af334d7b5560a3/dags/airflow_ecosystem.py")}
```
^ the code would probably be cleaner as {"sourceCode":
SourceCodeJobFacet("python", code)} I stuffed it in sql only due to the
lack of: astronomer/astro#2150 which is a
temporary limitation

**Describe the solution you'd like**
- Add source code facet for operators for open lineage integrations

Currently source code facet is done for base decorator operators.

Co-authored-by: Pankaj <pankaj.singh@astronomer.io>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add sourceCode facet for open lineage
5 participants