
feat(systemMetadata): Adding a lastRunId field system metadata #8672

Conversation

jjoyce0510 (Collaborator) commented Aug 18, 2023

Summary

Currently, the system metadata object captures the runId from when the aspect was first observed. This is a problem when resolving which ingestion source was responsible for ingesting a specific aspect. We use the runId to trace back to an ingestion source, but we select that runId by taking the max of the lastObserved timestamps. lastObserved is refreshed on every run, but runId is not, so the selected runId can belong to a much older run.

If an entity changes infrequently, the ingestion executionRequest for the runId stored in its system metadata may no longer exist.

To address this problem, we now ALSO save a lastRunId field in system metadata. It always tracks the most recent runId that touched a given aspect, even if the aspect did not change. This is a much more reliable way to find the ingestion source that ingested an urn.
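A minimal sketch of the update rule described above. The `SystemMetadata` record, `on_ingest`, and `resolve_run_id` are hypothetical illustrations, not DataHub's actual classes or API. Only the field names runId, lastObserved, and lastRunId come from this PR. The sketch assumes runId is reset whenever the aspect's value changes and kept otherwise. It also assumes records written before this change have no lastRunId, so resolution falls back to runId.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class SystemMetadata:
    # Hypothetical, simplified record; field names follow the PR description.
    runId: str
    lastObserved: int  # epoch millis of the most recent run that saw the aspect
    lastRunId: Optional[str] = None  # assumed absent on records predating this change


def on_ingest(existing: Optional[SystemMetadata], value_changed: bool,
              run_id: str, now_ms: int) -> SystemMetadata:
    """Hypothetical merge rule applied each time a run emits the aspect."""
    if existing is None or value_changed:
        # New or changed aspect: runId and lastRunId both point at this run.
        return SystemMetadata(runId=run_id, lastObserved=now_ms, lastRunId=run_id)
    # Unchanged aspect: runId stays at the older run, while lastObserved
    # and lastRunId advance to the current run.
    return replace(existing, lastObserved=now_ms, lastRunId=run_id)


def resolve_run_id(meta: SystemMetadata) -> str:
    # Prefer lastRunId; fall back to runId for older records without it.
    return meta.lastRunId or meta.runId


if __name__ == "__main__":
    m = on_ingest(None, True, "run-1", 1000)
    m = on_ingest(m, False, "run-2", 2000)
    m = on_ingest(m, False, "run-3", 3000)
    print(m.runId, m.lastRunId)  # run-1 run-3
    print(resolve_run_id(m))     # run-3
```

After two no-op runs, runId still points at `run-1`, whose executionRequest may already be gone. lastRunId points at `run-3`, the run that most recently touched the aspect.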

Status

Ready for review

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

github-actions bot added the product and devops labels Aug 18, 2023
chriscollins3456 (Collaborator) left a comment


neato

hsheth2 added a commit to hsheth2/datahub that referenced this pull request Aug 21, 2023
hsheth2 added the release-0.10.6 and merge-pending-ci labels Aug 22, 2023
iprentic merged commit 94fa514 into datahub-project:master Sep 6, 2023
62 of 63 checks passed
spadhi7 added a commit to spadhi7/datahub that referenced this pull request Oct 4, 2023
* tag 'v0.11.0': (188 commits)
  fix(spark-test): upgrade gradle and fix spark smoke test (datahub-project#8777)
  fix(gms): Fixed Recently Viewed section for users with '@' in the URN. (datahub-project#8754)
  feat: add feedback widget (datahub-project#8732)
  fix(custom-search): fix custom search to be able to use unquoted query (datahub-project#8805)
  docs(db-retention): update with default setting (datahub-project#8797)
  feat(openapi): entity endpoints & analytics raw (datahub-project#8537)
  feat(search): Also de-duplicate the field queries based on field names (datahub-project#8788)
  fix(ingest): drop `wrap_aspect_as_workunit` method (datahub-project#8766)
  feat(ingest): drop sql_metadata parser (datahub-project#8765)
  docs: minor fix on versioning navbar and dropdown (datahub-project#8790)
  chore(ingest): upgrade sqlglot fork (datahub-project#8775)
  docs: add datahub source to integrations page (datahub-project#8787)
  fix(ingest/bigquery): fix partition and median queries for profiling (datahub-project#8778)
  fix(ingest/tableau): fix tableau native CLL for snowflake, add type annotations (datahub-project#8779)
  refactor(ingest): Add support for group-owners in dataflow entities (datahub-project#8154)
  feat(systemMetadata): Adding a lastRunId field system metadata  (datahub-project#8672)
  feat(airflow-plugin): add package type information (datahub-project#8795)
  fix(ingest/datahub): Support postgres; build(postgres): Modernize postgres docker setup (datahub-project#8762)
  docs(session): add documentation for session token duration and fix default (datahub-project#8791)
  chore(analytics): bump version (datahub-project#8786)
  ...
Labels
  • devops: PR or Issue related to DataHub backend & deployment
  • merge-pending-ci: A PR that has passed review and should be merged once CI is green.
  • product: PR or Issue related to the DataHub UI/UX
Projects
None yet
Development


None yet

5 participants