Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OpenLineage: Rename the name and namespace along with URI #1306

Merged
merged 13 commits into from
Nov 29, 2022

Conversation

rajaths010494
Copy link
Contributor

@rajaths010494 rajaths010494 commented Nov 24, 2022

Describe the bug

  • There have been new conventions introduced: Naming.md#local-file-system
    I believe sqlite counts as local file, therefore you could adjust naming convention in astro as well.

A word of explanation: both IP and port are mandatory. It means that if port is unknown (e. g. if file exists in Airflow's local file system only) you should default to some value. In one of OL extractors it is defaulted to paramiko.config.SSH_PORT, in most cases it's equal to 22.

Reference: #1142 (comment)

  • Add openlineage_dataset_uri() method to create URI as per each database.

closes #1281

Copy link

@JDarDagran JDarDagran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

python-sdk/src/astro/databases/sqlite.py Outdated Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Nov 24, 2022

Codecov Report

Base: 95.00% // Head: 94.99% // Decreases project coverage by -0.01% ⚠️

Coverage data is based on head (b200223) compared to base (0c64f48).
Patch coverage: 94.44% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1306      +/-   ##
==========================================
- Coverage   95.00%   94.99%   -0.02%     
==========================================
  Files          71       71              
  Lines        3304     3315      +11     
  Branches      381      381              
==========================================
+ Hits         3139     3149      +10     
- Misses        101      102       +1     
  Partials       64       64              
Impacted Files Coverage Δ
python-sdk/src/astro/sql/operators/append.py 95.65% <ø> (-0.19%) ⬇️
...thon-sdk/src/astro/sql/operators/base_decorator.py 95.00% <ø> (ø)
python-sdk/src/astro/sql/operators/dataframe.py 100.00% <ø> (ø)
python-sdk/src/astro/sql/operators/export_file.py 93.87% <ø> (-0.13%) ⬇️
python-sdk/src/astro/sql/operators/load_file.py 96.62% <ø> (-0.04%) ⬇️
python-sdk/src/astro/sql/operators/merge.py 95.83% <ø> (-0.17%) ⬇️
python-sdk/src/astro/databases/base.py 95.71% <50.00%> (-0.44%) ⬇️
python-sdk/src/astro/databases/aws/redshift.py 94.02% <100.00%> (+0.09%) ⬆️
python-sdk/src/astro/databases/google/bigquery.py 94.62% <100.00%> (+0.05%) ⬆️
python-sdk/src/astro/databases/postgres.py 96.62% <100.00%> (+0.07%) ⬆️
... and 3 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@rajaths010494
Copy link
Contributor Author

Attaching a screenshot of the URI for reference on Marquez UI.
image

Copy link
Contributor

@sunank200 sunank200 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@sunank200
Copy link
Contributor

@rajaths010494 @JDarDagran is the dataSource uri in the correct format as aspected?

@JDarDagran
Copy link

JDarDagran commented Nov 25, 2022

Actually why do you need this format:

f"{self.input_data.openlineage_dataset_namespace()}"
f"://{self.input_data.openlineage_dataset_name()}"

with two slashes before dataset name? The path should be absolute so I'd rather see URI as file://172.26.0.2:22/tmp/sqlite.db.top_animation.

This spec, 3.3. Path explains more.

Copy link
Contributor

@sunank200 sunank200 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually why do you need this format:

f"{self.input_data.openlineage_dataset_namespace()}"
f"://{self.input_data.openlineage_dataset_name()}"

with two slashes before dataset name? The path should be absolute so I'd rather see URI as file://172.26.0.2:22/tmp/sqlite.db.top_animation.

This spec, 3.3. Path explains more.

@rajaths010494 we should do this change across.

@rajaths010494
Copy link
Contributor Author

Actually why do you need this format:

f"{self.input_data.openlineage_dataset_namespace()}"
f"://{self.input_data.openlineage_dataset_name()}"

with two slashes before dataset name? The path should be absolute so I'd rather see URI as file://172.26.0.2:22/tmp/sqlite.db.top_animation.
This spec, 3.3. Path explains more.

@rajaths010494 we should do this change across.

Yeah saw i will make the necessary changes

@rajaths010494
Copy link
Contributor Author

Actually why do you need this format:

f"{self.input_data.openlineage_dataset_namespace()}"
f"://{self.input_data.openlineage_dataset_name()}"

with two slashes before dataset name? The path should be absolute so I'd rather see URI as file://172.26.0.2:22/tmp/sqlite.db.top_animation.
This spec, 3.3. Path explains more.

@rajaths010494 we should do this change across.

Yeah saw i will make the necessary changes

image

Does this look fine. @JDarDagran @sunank200 . If yes i will make the necessary changes.

@JDarDagran
Copy link

Actually why do you need this format:

f"{self.input_data.openlineage_dataset_namespace()}"
f"://{self.input_data.openlineage_dataset_name()}"

with two slashes before dataset name? The path should be absolute so I'd rather see URI as file://172.26.0.2:22/tmp/sqlite.db.top_animation.
This spec, 3.3. Path explains more.

@rajaths010494 we should do this change across.

Yeah saw i will make the necessary changes

image

Does this look fine. @JDarDagran @sunank200 . If yes i will make the necessary changes.

👍

Copy link
Contributor

@sunank200 sunank200 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rajaths010494 Change looks good to me.But lets add the test for this change.

@kaxil kaxil changed the title Rename the name and namespace for sqlite as per new convention introduced OpenLineage: Rename the name and namespace for sqlite Nov 28, 2022
@sunank200
Copy link
Contributor

We should create a method for URI for each database and use that. I will do that change and push it

@sunank200 sunank200 changed the title OpenLineage: Rename the name and namespace for sqlite OpenLineage: Rename the name and namespace along with URI Nov 29, 2022
@sunank200
Copy link
Contributor

Ran and tested the example DAGs for operators on Marquez.
image
image (1)

@sunank200
Copy link
Contributor

Screenshot 2022-11-29 at 5 03 09 PM

Screenshot 2022-11-29 at 5 04 39 PM

"https://user-images.githubusercontent.com/8670962/204521134-a276f0ff-7e3d-434f-a4c9-ccf11aac53d7.png">

@rajaths010494
Copy link
Contributor Author

LGTM

@sunank200 sunank200 merged commit b725430 into main Nov 29, 2022
@sunank200 sunank200 deleted the rename-localfilesystem-namespace branch November 29, 2022 21:50
sunank200 pushed a commit that referenced this pull request Dec 1, 2022
Describe the bug
- There have been new conventions introduced:
[Naming.md#local-file-system](https://github.com/OpenLineage/OpenLineage/blob/main/spec/Naming.md#local-file-system)
I believe sqlite counts as local file, therefore you could adjust naming
convention in astro as well.

A word of explanation: both IP and port are mandatory. It means that if
port is unknown (e. g. if file exists in Airflow's local file system
only) you should default to some value. In one of OL extractors it is
defaulted to paramiko.config.SSH_PORT, in most cases it's equal to 22.

Reference:
#1142 (comment)
- Add openlineage_dataset_uri() method to create URI as per each
database.

closes #1281

Co-authored-by: Ankit Chaurasia <8670962+sunank200@users.noreply.github.com>
(cherry picked from commit b725430)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Rename the name and namespace for sqlite as per new convention introduced
4 participants