Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(ui) Fix displaying column level lineage for sibling nodes #7955

Conversation

chriscollins3456
Copy link
Collaborator

We had an issue where we sometimes were not displaying column level lineage properly for nodes that have siblings. This was happening because we had one "main" node that we display and if that node didn't have CLL, then we wouldn't show it. But one of its siblings may have CLL and we should be showing that if so.

This fixes that by creating a mapping of urns to inputs so we can create this fineGrainedMap (that's used to draw the lines between columns) and we check through each entity's siblings against this new mapping and update this fineGrainedMap to account for siblings and show CLL on the "visible" sibling node.

This ultimately this combines CLL between siblings and shows it as if it were one node.

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@github-actions github-actions bot added the product PR or Issue related to the DataHub UI/UX label May 3, 2023
Copy link
Contributor

@aditya-radhakrishnan aditya-radhakrishnan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Do you have any before/after screenshots?

@@ -146,3 +146,20 @@ export const handleBatchError = (urns, e, defaultMessage) => {
}
return defaultMessage;
};

// put all of the fineGrainedLineages for a given entity and its siblings into one array so we have all of it in one place
export function getFineGrainedLineageWithSiblings(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome!


// upstreamEntityUrn could belong to a sibling we don't "render", so store its inputs to updateFineGrainedMap
// and update the fine grained map later when we see the entity with these siblings
// eslint-disable-next-line no-param-reassign
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a bit challenging for me to grasp...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

definitely. so when we get fineGrainedLineages for a single entity, we get a combo of an upstream schema field urn and a downstream schema field urn. The downstream schema field always belongs to the current entity at hand. The upstream schema field belongs to some other entity upstream of this one on the graph.

The thing is that this schema field may belong to a sibling of an upstream entity and that sibling doesn't get "rendered" since we collapse sibling nodes. and since all we have is an urn here, we don't know if that upstream urn has siblings or is a sibling itself or what, so I don't know what entity urn I need to pass into updateFineGrainedMap - I need to pass in the urn of the sibling being rendered but I don't know that yet

Also - since fineGrainedLineages belong on downstream entities, the upstream entity doesn't know about it and I can't draw these edges based on its knowledge.

So I have to basically save this upstream urn and its edge so that I can check siblings for every subsequent entity to see if this upstream urn is a sibling, and if so, construct the edge with the correct urn (the urn of the node being rendered, not the sibling urn).

So this solves the case where we have CLL edges for entities we haven't seen yet in the graph. Below, where I loop over fetchedEntities I'm solving for the case where I have already seen the entity (same problem, just one solves for things I haven't seen yet, the other solves for things I have seen already)

I'm sorry this is tough to explain - happy to hop on a call! but also the complexity of (1) this code and (2) our fineGrainedLineage and upstreamLineage makes mapping things real tough. as we're talking about offline, i'm sure there's a better way to fetch and map this data. but that may be smart to solve after this fix solves customer issues.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay got it. Yeah makes sense - seems like siblings + graph is a really tricky combo. As long as this doesn't introduce any regression I'm happy! :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

haha for sure! things are looking good locally for me but i'll make sure to give it a once more around before merging.

@@ -161,7 +166,8 @@ export default class EntityRegistry {
(genericEntityProperties?.upstream?.filtered || 0),
status: genericEntityProperties?.status,
siblingPlatforms: genericEntityProperties?.siblingPlatforms,
fineGrainedLineages: genericEntityProperties?.fineGrainedLineages,
fineGrainedLineages,
siblings: genericEntityProperties?.siblings,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where is this being used?

@@ -51,6 +52,7 @@ export type FetchedEntity = {
status?: Maybe<Status>;
siblingPlatforms?: Maybe<DataPlatform[]>;
fineGrainedLineages?: [FineGrainedLineage];
siblings?: Maybe<SiblingProperties>;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this being used on line 83 in constructtree?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup it is indeed - we need to the siblings objects so we can get their fineGrainedLineages

@chriscollins3456 chriscollins3456 merged commit 3f8a532 into datahub-project:master May 4, 2023
36 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
product PR or Issue related to the DataHub UI/UX
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants