-
Notifications
You must be signed in to change notification settings - Fork 2.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(ingest/oracle) add database name to oracle urn name #7016
fix(ingest/oracle) add database name to oracle urn name #7016
Conversation
urn format : schema.table -> database.schema.table
normalize oracle database name
apply lint fix
change test resource
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi there! This PR is somewhat concerning because it will change the URN structure for existing users of the Oracle source.
This means that if users are NOT using stateful ingestion, then they will end up with duplicates entities in DataHub representing the same tables.
I'm going to enlist @mayurinehate and @hsheth2 to evaluate the changes to understand why this was not included as part of the URN in the first place.
If this indeed is the correct approach (e.g. fixing a bug), we'll need to add specific instructions and notes inside of updating-datahub.md so that existing users of this source can have an easier time upgrading.
Cheers
John
Ping on this one. Do you mind adding a note for this change in |
|
Yes i think it looks okay. Cheers! |
regular = f"{schema}.{table}" | ||
if self.database_alias: | ||
return f"{self.database_alias.lower()}.{regular}" | ||
if self.database: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question - why do we need to lower the database here? I'd prefer to retain casing where possible, and lower at a global place
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
schema, table is already normalized from below
https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/source/sql/oracle.py#L100
https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/source/sql/oracle.py#L132
so I tried to normalize it inside this function.
but i'think, it's better to normalize at global place
i remove lower
(One comment) |
remove lower from oracle get_identifier method
update updating-datahub.md
change test resource
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need a flag to not have this behaviour by default. Orgs might have really large number of datasets. We cannot suddenly do such breaking change as the default.
get_identifier use add_database_name_to_urn parameter from OracleConfig
@anshbansal |
@jaegwonseo, we had some concerns about this change, and please, can you make it disabled by default? |
Checking in here @jaegwonseo - are you able to separate this into a flag? |
@jjoyce0510 @treff7es sorry for the late reply(covid 19) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jaegwonseo Thank you for the hard work.
The change looks good to me. Will plan to merge once CI is green!
@jjoyce0510 |
re-running as it looked like an issue with the process (a setup job was still running) and not an actual failed test |
Summary
change urn name from oracle ingest like postgresql
Issue
#6977
Checklist