fix(ingest/oracle) add database name to oracle urn name #7016

Merged
13 commits merged into datahub-project:master on Feb 21, 2023

Conversation

jaegwonseo
Contributor

@jaegwonseo jaegwonseo commented Jan 12, 2023

Summary

Change the URN name produced by the Oracle ingestion source to follow the same convention as PostgreSQL:

schema.table -> database.schema.table
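As a rough illustration of what this rename means for the resulting dataset URN, here is a minimal sketch. The helper `make_dataset_urn` and the sample names (`orcl`, `hr`, `employees`) are hypothetical and written only for this example; they are not taken from the PR.

```python
def make_dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Build a DataHub-style dataset URN string from a platform and a dataset name."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


# Hypothetical sample values, used only for illustration.
database, schema, table = "orcl", "hr", "employees"

# Name before the change: schema.table
old_urn = make_dataset_urn("oracle", f"{schema}.{table}")

# Name after the change: database.schema.table
new_urn = make_dataset_urn("oracle", f"{database}.{schema}.{table}")

print(old_urn)
print(new_urn)
```

The two strings differ, so DataHub would treat them as two distinct entities even though they describe the same table.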

Issue

#6977

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

urn format : schema.table -> database.schema.table
@github-actions github-actions bot added the ingestion PR or Issue related to the ingestion of metadata label Jan 12, 2023
Collaborator

@jjoyce0510 jjoyce0510 left a comment

Hi there! This PR is somewhat concerning because it will change the URN structure for existing users of the Oracle source.

This means that if users are NOT using stateful ingestion, they will end up with duplicate entities in DataHub representing the same tables.

I'm going to enlist @mayurinehate and @hsheth2 to evaluate the changes to understand why this was not included as part of the URN in the first place.

If this indeed is the correct approach (e.g. fixing a bug), we'll need to add specific instructions and notes inside of updating-datahub.md so that existing users of this source can have an easier time upgrading.

Cheers
John

@anshbansal anshbansal added the community-contribution PR or Issue raised by member(s) of DataHub Community label Jan 13, 2023
@jjoyce0510
Collaborator

Ping on this one. Do you mind adding a note for this change in updating-datahub.md?

@jaegwonseo
Contributor Author

jaegwonseo commented Jan 23, 2023

Ping on this one. Do you mind adding a note for this change in updating-datahub.md?
@jjoyce0510
Sorry for the late reply.
If this approach is correct, I'll update updating-datahub.md.

@jjoyce0510
Collaborator

Yes, I think it looks okay. Cheers!

        regular = f"{schema}.{table}"
        if self.database_alias:
            return f"{self.database_alias.lower()}.{regular}"
        if self.database:
Collaborator

Question: why do we need to lowercase the database name here? I'd prefer to retain casing where possible and lowercase in one global place.

Contributor Author

@jjoyce0510
Collaborator

(One comment)

Collaborator

@anshbansal anshbansal left a comment

We need a flag so that this behaviour is not the default. Orgs might have a really large number of datasets. We cannot suddenly make such a breaking change the default.

get_identifier uses the add_database_name_to_urn parameter from OracleConfig
@jaegwonseo
Contributor Author

jaegwonseo commented Jan 25, 2023

We need a flag so that this behaviour is not the default. Orgs might have a really large number of datasets. We cannot suddenly make such a breaking change the default.

@anshbansal
I added a flag (add_database_name_to_urn) to OracleConfig.
Please check it.

@treff7es
Contributor

@jaegwonseo, we had some concerns about this change. Could you please make it disabled by default?
If this gets merged with the default on, it can potentially break existing Oracle ingestions: if stateful ingestion is enabled, the URN change would soft-delete the old datasets, which could mean that attached aspects (glossary terms, lineage links, etc.) are removed as well.
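The concern above can be sketched in a few lines. This is a simplified model of stale-entity detection, not DataHub's actual implementation: it assumes the previous run's URNs are compared with the current run's, and anything no longer emitted is treated as stale. The helper names and sample tables are hypothetical.

```python
from typing import Set


def dataset_urn(name: str) -> str:
    """Build a DataHub-style Oracle dataset URN string (illustrative helper)."""
    return f"urn:li:dataset:(urn:li:dataPlatform:oracle,{name},PROD)"


def find_stale_urns(previous_run: Set[str], current_run: Set[str]) -> Set[str]:
    """Return URNs seen in the previous run that the current run did not emit."""
    return previous_run - current_run


# Previous run used schema.table; current run uses database.schema.table.
previous_run = {dataset_urn("hr.employees"), dataset_urn("hr.departments")}
current_run = {dataset_urn("orcl.hr.employees"), dataset_urn("orcl.hr.departments")}

# Every old URN looks stale, so every old dataset would be a soft-delete candidate.
stale = find_stale_urns(previous_run, current_run)
print(sorted(stale))
```

In this model every previously ingested dataset is flagged, which is why a default-on change would affect all existing Oracle users at once.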

@jjoyce0510
Collaborator

Checking in here @jaegwonseo - are you able to separate this into a flag?

@jaegwonseo
Contributor Author

@jjoyce0510
I already added the flag (add_database_name_to_urn).

@treff7es
add_database_name_to_urn has been added, and since it is false by default, it will basically work the same as before.

Sorry for the late reply (COVID-19).
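A minimal sketch of how an opt-in flag like this might gate the identifier is shown below. Only the flag name `add_database_name_to_urn`, its `False` default, and the `get_identifier` name come from this thread. The class `OracleConfigSketch`, the method signature, and the choice to keep the database name's original casing are assumptions made for illustration, and the alias handling from the diff is left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OracleConfigSketch:
    """Hypothetical stand-in for the real OracleConfig; not the actual class."""

    database: Optional[str] = None
    # Off by default so existing URNs stay unchanged unless the user opts in.
    add_database_name_to_urn: bool = False

    def get_identifier(self, schema: str, table: str) -> str:
        regular = f"{schema}.{table}"
        # Assumption: prefix the database name only when the flag is on
        # and a database name is actually configured.
        if self.add_database_name_to_urn and self.database:
            return f"{self.database}.{regular}"
        return regular


default_name = OracleConfigSketch(database="orcl").get_identifier("hr", "employees")
opted_in_name = OracleConfigSketch(
    database="orcl", add_database_name_to_urn=True
).get_identifier("hr", "employees")

print(default_name)
print(opted_in_name)
```

With the flag left at its default, the identifier stays `schema.table`, so existing ingestions keep producing the same URNs.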

Collaborator

@jjoyce0510 jjoyce0510 left a comment

@jaegwonseo Thank you for the hard work.

The change looks good to me. Will plan to merge once CI is green!

@jaegwonseo
Contributor Author

@jjoyce0510
Is this test failure related to this PR?

@anshbansal anshbansal dismissed their stale review February 20, 2023 11:38

john approved

@chriscollins3456
Collaborator

Is this test failure related to this PR?

re-running as it looked like an issue with the process (a setup job was still running) and not an actual failed test

@chriscollins3456 chriscollins3456 merged commit 3068e7f into datahub-project:master Feb 21, 2023
oleg-ruban pushed a commit to RChygir/datahub that referenced this pull request Feb 28, 2023
Labels
community-contribution PR or Issue raised by member(s) of DataHub Community ingestion PR or Issue related to the ingestion of metadata

5 participants