Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(lineage source): add fine grained lineage support #7904

Merged
merged 15 commits into from
May 26, 2023

Conversation

anshbansal
Copy link
Collaborator

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@github-actions github-actions bot added the ingestion PR or Issue related to the ingestion of metadata label Apr 25, 2023
Copy link
Collaborator

@hsheth2 hsheth2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approach looks pretty reasonable

…ge.py

Co-authored-by: Harshal Sheth <hsheth2@gmail.com>
@anshbansal anshbansal marked this pull request as ready for review May 9, 2023 14:22
Copy link
Collaborator

@hsheth2 hsheth2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nits around the defaults but otherwise lgtm

metadata-ingestion/examples/mce_files/bootstrap_mce.json Outdated Show resolved Hide resolved
@@ -49,9 +53,44 @@ def type_must_be_supported(cls, v: str) -> str:
return v


class FineGrainedLineageConfig(ConfigModel):
upstreamType: str = "NONE"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably want a default of FIELD_SET

class FineGrainedLineageConfig(ConfigModel):
upstreamType: str = "NONE"
upstreams: Optional[List[str]]
downstreamType: str
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should default this to FIELD

downstreamType: str
downstreams: Optional[List[str]]
transformOperation: Optional[str]
confidenceScore: Optional[float]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

default to 1.0

@anshbansal
Copy link
Collaborator Author

Waiting on #8135 for docs to get fixed

@anshbansal
Copy link
Collaborator Author

Doc build will be fixed in the linked PR.

@anshbansal anshbansal merged commit 96f3648 into master May 26, 2023
42 of 43 checks passed
@anshbansal anshbansal deleted the aseem-update-file-lineage-source branch May 26, 2023 11:39
@githendrik
Copy link
Contributor

please excuse me hijacking this PR - I'm just wondering if there's a roadmap entry to allow querying the transformOperation via GraphQL and / or visualising it in the UI? We have a case for this specific feature where our users would like to see the actual operation in the UI to better understand the lineage.

Could someone maybe point me in the right direction if this is planned already? Otherwise happy to try and contribute back.

@hsheth2
Copy link
Collaborator

hsheth2 commented Jul 10, 2023

@githendrik we don't have any concrete plans for it, but I think it probably does make sense do to that.

Surfacing it via GraphQL feels like a no-brainer, and we'd definitely accept a PR there.

I'm less sure where we'd want to surface it in the UI - might be worth thinking about more / mocking. The reason is that it's an "edge annotation", but there's a bunch of other edge annotations that we might also want to support in the future e.g. the actual SQL logic, confidence scores on our lineage generation, etc

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ingestion PR or Issue related to the ingestion of metadata
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants