fix(dataset): improve glossary term load performance for datasets #6396

Merged
2 commits merged into datahub-project:master on Nov 15, 2022

Conversation

Reilman79
Contributor

Improves dataset load performance by reducing the amount of information fetched from the graph database. The query was fetching data that was never used, which could result in hundreds of calls to the graph database. The problem is explained in more detail in issue #6395.

The affected portion of the GraphQL query relies heavily on fragments, so the problematic code (in the glossaryNode fragment) cannot be changed directly: other queries that use this fragment, namely getGlossaryNode(), need the additional information. Additionally, the glossaryNode fragment is four levels of abstraction away from the primary fragment of the getDataset() query (nonSiblingDatasetFields -> glossaryTerms -> glossaryTerm -> parentNodesFields -> glossaryNode). Instead of creating four new fragments for one change at the fourth layer, I combined them into a single new fragment that can be inserted as a whole into the getDataset() query. If this approach is not preferred, or if the fragment should be named something else, I can make those changes.
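
The trade-off described above can be illustrated with a small sketch. It models fragments as a map from each name to the fragments it spreads, then asks which queries are touched by editing a given fragment. This is only a toy model under stated assumptions: the chain of fragment names comes from the description, but the direct link from getGlossaryNode to glossaryNode is a simplification, `datasetGlossaryTermsTrimmed` is a made-up name for the combined fragment, and any other spreads the real fragments contain are left out.

```typescript
// Toy model: fragment/query name -> names of the fragments it spreads.
type SpreadMap = Record<string, string[]>;

// Collect every name reachable from `start` by following spreads.
function reachable(spreads: SpreadMap, start: string): string[] {
  const seen: string[] = [];
  const stack: string[] = [start];
  while (stack.length > 0) {
    const current = stack.pop() as string;
    if (seen.indexOf(current) !== -1) continue;
    seen.push(current);
    for (const next of spreads[current] ?? []) stack.push(next);
  }
  return seen;
}

// Queries whose selection would change if `fragment` were edited.
function queriesAffectedBy(spreads: SpreadMap, queries: string[], fragment: string): string[] {
  return queries.filter((q) => reachable(spreads, q).indexOf(fragment) !== -1);
}

const queryNames = ["getDataset", "getGlossaryNode"];

// Chain as described in the PR (other spreads omitted).
const spreadsBefore: SpreadMap = {
  getDataset: ["nonSiblingDatasetFields"],
  getGlossaryNode: ["glossaryNode"],
  nonSiblingDatasetFields: ["glossaryTerms"],
  glossaryTerms: ["glossaryTerm"],
  glossaryTerm: ["parentNodesFields"],
  parentNodesFields: ["glossaryNode"],
  glossaryNode: [],
};

// After the PR: the dataset query uses one combined fragment (hypothetical name).
const spreadsAfter: SpreadMap = {
  getDataset: ["nonSiblingDatasetFields"],
  getGlossaryNode: ["glossaryNode"],
  nonSiblingDatasetFields: ["datasetGlossaryTermsTrimmed"],
  datasetGlossaryTermsTrimmed: [],
  glossaryNode: [],
};

const affectedByNodeBefore = queriesAffectedBy(spreadsBefore, queryNames, "glossaryNode");
const affectedByNodeAfter = queriesAffectedBy(spreadsAfter, queryNames, "glossaryNode");
const affectedByCombined = queriesAffectedBy(spreadsAfter, queryNames, "datasetGlossaryTermsTrimmed");

console.log(affectedByNodeBefore, affectedByNodeAfter, affectedByCombined);
```

In this model, editing glossaryNode before the change touches both queries, while the combined fragment is reachable only from getDataset, which is the isolation the description is aiming for.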

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@github-actions github-actions bot added the product PR or Issue related to the DataHub UI/UX label Nov 9, 2022
@maggiehays maggiehays added the community-contribution PR or Issue raised by member(s) of DataHub Community label Nov 14, 2022
@jjoyce0510
Collaborator

Hey, we are reviewing this. Will get back to you shortly.

Collaborator

@chriscollins3456 chriscollins3456 left a comment

This looks great! I'm actually going to ask you to pivot and place your change elsewhere, as I think this can benefit performance everywhere that we load glossary terms. Thanks for digging into this!

Comment on lines +23 to +30
    nodes {
        urn
        type
        properties {
            name
        }
    }
}
Collaborator

The change here is amazing (it means we don't also fetch the children of nodes in parentNodesFields when we don't actually need them). In fact, it's so good that I think we should apply it everywhere!

In order to do that, I think you can actually drop this change and simply change the fragment parentNodesFields from:

fragment parentNodesFields on ParentNodesResult {
    count
    nodes {
        ...glossaryNode
    }
}

to:

fragment parentNodesFields on ParentNodesResult {
    count
    nodes {
        urn
        type
        properties {
            name
        }
    }
}

Collaborator

Then this performance change will benefit all entities, wherever we fetch parentNodes (on existing nodes as well).
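
To make the effect of the trimmed selection concrete, here is a toy model of why also fetching the children of each parent node multiplies graph lookups. Everything in it is hypothetical: the `CountingGraph` class, the `GlossaryNode` interface, and the sample urns are illustrative stand-ins, not DataHub's resolver code or real identifiers.

```typescript
// Toy stand-in for a glossary node stored in the graph (not DataHub's type).
interface GlossaryNode {
  urn: string;
  name: string;
  childUrns: string[];
}

// Hypothetical in-memory "graph database" that counts every lookup.
class CountingGraph {
  calls = 0;
  private readonly nodes: Record<string, GlossaryNode>;

  constructor(nodes: Record<string, GlossaryNode>) {
    this.nodes = nodes;
  }

  getNode(urn: string): GlossaryNode {
    this.calls += 1;
    const node = this.nodes[urn];
    if (node === undefined) throw new Error(`unknown urn: ${urn}`);
    return node;
  }
}

// Resolve a term's parent nodes. With includeChildren set, each parent's
// children are looked up too, as a selection spreading a full node fragment would.
function resolveParentNodes(graph: CountingGraph, parentUrns: string[], includeChildren: boolean): string[] {
  const names: string[] = [];
  for (const urn of parentUrns) {
    const parentNode = graph.getNode(urn);
    names.push(parentNode.name);
    if (includeChildren) {
      for (const childUrn of parentNode.childUrns) {
        graph.getNode(childUrn); // fetched, but never used by the caller
      }
    }
  }
  return names;
}

function buildSampleGraph(): Record<string, GlossaryNode> {
  const nodes: Record<string, GlossaryNode> = {};
  const add = (urn: string, name: string, childUrns: string[] = []): void => {
    nodes[urn] = { urn, name, childUrns };
  };
  add("urn:node:root", "Root", ["urn:node:finance", "urn:node:sales"]);
  add("urn:node:finance", "Finance", ["urn:term:revenue", "urn:term:cost", "urn:term:margin"]);
  add("urn:node:sales", "Sales");
  add("urn:term:revenue", "Revenue");
  add("urn:term:cost", "Cost");
  add("urn:term:margin", "Margin");
  return nodes;
}

// Parent chain of the "Revenue" term in the sample data.
const parentsOfRevenue = ["urn:node:finance", "urn:node:root"];

const fullGraph = new CountingGraph(buildSampleGraph());
resolveParentNodes(fullGraph, parentsOfRevenue, true);

const trimmedGraph = new CountingGraph(buildSampleGraph());
const parentNames = resolveParentNodes(trimmedGraph, parentsOfRevenue, false);

console.log(fullGraph.calls, trimmedGraph.calls, parentNames);
```

In this sample the full selection costs 7 lookups for one term (2 parents plus their 5 children) while the trimmed one costs 2, and the gap grows with the number of terms on a dataset and the size of each glossary node.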

Contributor

This is awesome!

Contributor Author

This is a great idea! I’m out of town this week so I don’t have access to my computer to make the change, but I can do so this weekend.

Collaborator

Okay! I'm going to merge this PR once CI passes and then make the additional change right after it gets in, since it'll be a nice, simple fix.

Thanks again for putting this up!

@github-actions

Unit Test Results (build & test)

613 tests  ±0   609 ✔️ ±0   11m 53s ⏱️ -8s
151 suites ±0       4 💤 ±0 
151 files   ±0       0 ±0 

Results for commit 8c2dc02. ± Comparison against base commit ef5c712.

@chriscollins3456 chriscollins3456 merged commit 6e415ca into datahub-project:master Nov 15, 2022
cccs-Dustin pushed a commit to CybercentreCanada/datahub that referenced this pull request Feb 1, 2023
5 participants