fix(ingest/oracle) add database name to oracle urn name #7016

jaegwonseo · 2023-01-12T07:54:08Z

Summary

Change the URN name produced by the Oracle ingestion source so that it includes the database name, as the PostgreSQL source does.

schema.table -> database.schema.table
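
For illustration, the effect on a dataset URN might look like the sketch below. It assumes DataHub's usual dataset URN layout; the helper `dataset_urn` and the example names (`orcl`, `hr`, `employees`) are hypothetical and not taken from this PR.

```python
def dataset_urn(name: str, platform: str = "oracle", env: str = "PROD") -> str:
    """Build a dataset URN string (hypothetical helper, for illustration only)."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


# Before this change: the dataset name is schema.table
old_urn = dataset_urn("hr.employees")
# After this change: the dataset name is database.schema.table
new_urn = dataset_urn("orcl.hr.employees")

print(old_urn)
print(new_urn)
```

The two strings differ only in the name component, which is enough for DataHub to treat them as two distinct datasets.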

Issue

#6977

Checklist

The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
Links to related issues (if applicable)
Tests for the changes have been added/updated (if applicable)
Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

urn format : schema.table -> database.schema.table

normalize oracle database name

apply lint fix

change test resource

jjoyce0510

Hi there! This PR is somewhat concerning because it will change the URN structure for existing users of the Oracle source.

This means that if users are NOT using stateful ingestion, they will end up with duplicate entities in DataHub representing the same tables.

I'm going to enlist @mayurinehate and @hsheth2 to evaluate the changes to understand why this was not included as part of the URN in the first place.

If this indeed is the correct approach (e.g. fixing a bug), we'll need to add specific instructions and notes inside of updating-datahub.md so that existing users of this source can have an easier time upgrading.

Cheers
John
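
The duplication concern can be sketched with a toy model. This is not DataHub code: the catalog is modeled as a plain set of dataset names, and `ingest` and the example names are hypothetical.

```python
def ingest(catalog: set, emitted: set) -> set:
    """Toy model of a run without stateful ingestion: entities are only added, never removed."""
    return catalog | emitted


# The table was ingested before the upgrade under its schema.table name.
catalog = {"hr.employees"}

# After the upgrade the same table is emitted under database.schema.table.
after_upgrade = ingest(catalog, {"orcl.hr.employees"})

print(sorted(after_upgrade))
```

Both names survive the second run, so the same physical table would show up twice.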

metadata-ingestion/src/datahub/ingestion/source/sql/oracle.py

jjoyce0510 · 2023-01-18T17:33:26Z

Ping on this one. Do you mind adding a note for this change in updating-datahub.md?

jaegwonseo · 2023-01-23T13:38:05Z

Ping on this one. Do you mind adding a note for this change in updating-datahub.md?
@jjoyce0510
I'm sorry for the late reply.
If this approach is correct, I'll update updating-datahub.md.

jjoyce0510 · 2023-01-24T02:48:42Z

Yes, I think it looks okay. Cheers!

jjoyce0510 · 2023-01-24T02:49:29Z

metadata-ingestion/src/datahub/ingestion/source/sql/oracle.py

+        regular = f"{schema}.{table}"
+        if self.database_alias:
+            return f"{self.database_alias.lower()}.{regular}"
+        if self.database:


Question - why do we need to lowercase the database name here? I'd prefer to retain casing where possible and lowercase in one global place.

The schema and table names are already normalized at the lines below:
https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/source/sql/oracle.py#L100
https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/source/sql/oracle.py#L132

So I tried to normalize the database name inside this function.

But I think it's better to normalize in a global place.

I removed the lower() call.
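
A minimal sketch of the idea discussed here, composing the identifier without touching its casing and deciding on lowercasing in one place. The function names and the `lowercase` option are hypothetical and do not reflect the actual DataHub code.

```python
from typing import Optional


def get_identifier(schema: str, table: str, database: Optional[str] = None) -> str:
    """Compose the dataset name without changing the casing of any part."""
    regular = f"{schema}.{table}"
    return f"{database}.{regular}" if database else regular


def normalize(identifier: str, lowercase: bool) -> str:
    """Single place where casing is decided (hypothetical option)."""
    return identifier.lower() if lowercase else identifier


name = get_identifier("hr", "employees", database="ORCL")
print(normalize(name, lowercase=False))
print(normalize(name, lowercase=True))
```

Keeping the casing decision out of `get_identifier` means the original database name is still available to any caller that needs it.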

jjoyce0510 · 2023-01-24T02:49:41Z

(One comment)

remove lower from oracle get_identifier method

update updating-datahub.md

change test resource

anshbansal

We need a flag so that this behaviour is not the default. Orgs might have a really large number of datasets. We cannot suddenly make such a breaking change the default.

get_identifier use add_database_name_to_urn parameter from OracleConfig

jaegwonseo · 2023-01-25T11:55:18Z

We need a flag so that this behaviour is not the default. Orgs might have a really large number of datasets. We cannot suddenly make such a breaking change the default.

@anshbansal
I added a flag (add_database_name_to_urn) to OracleConfig.
Please check it.

apply lint

treff7es · 2023-01-25T18:00:59Z

@jaegwonseo, we had some concerns about this change. Could you please make it disabled by default?
If this gets merged with the default on, it can potentially break existing Oracle ingestions: if stateful ingestion is enabled, the URN change would soft delete the old datasets, which can mean that attached aspects such as glossary terms and lineage links are removed as well.
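
The soft-delete risk can be sketched as a set difference between two runs. This is a simplified model of stale-entity removal, not DataHub's actual implementation; `stale_entities` and the example names are hypothetical.

```python
def stale_entities(previous_run: set, current_run: set) -> set:
    """Names seen in the previous run but missing from the current one (soft-delete candidates)."""
    return previous_run - current_run


# Names emitted before the change (schema.table).
previous = {"hr.employees", "hr.departments"}
# Names emitted after the change (database.schema.table).
current = {"orcl.hr.employees", "orcl.hr.departments"}

to_soft_delete = stale_entities(previous, current)
print(sorted(to_soft_delete))
```

Because no old name matches a new one, every previously ingested dataset ends up in the soft-delete set.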

jjoyce0510 · 2023-01-30T19:45:55Z

Checking in here @jaegwonseo - are you able to separate this into a flag?

jaegwonseo · 2023-02-10T03:48:51Z

@jjoyce0510
I already added the flag (add_database_name_to_urn).

@treff7es
add_database_name_to_urn has been added, and since it is false by default, ingestion will behave the same as before unless the flag is enabled.

Sorry for the late reply (COVID-19).
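
How such a flag might gate the identifier can be sketched as follows. `OracleConfigSketch` is a hypothetical stand-in rather than the real OracleConfig; only the field names `database`, `database_alias` and `add_database_name_to_urn` come from this thread, and the alias-first order follows the diff hunk quoted in the review.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class OracleConfigSketch:
    """Hypothetical stand-in for OracleConfig with only the fields discussed here."""

    database: Optional[str] = None
    database_alias: Optional[str] = None
    add_database_name_to_urn: bool = False  # off by default, as stated in the thread

    def get_identifier(self, schema: str, table: str) -> str:
        regular = f"{schema}.{table}"
        if self.add_database_name_to_urn:
            # The alias is checked before the database name.
            if self.database_alias:
                return f"{self.database_alias}.{regular}"
            if self.database:
                return f"{self.database}.{regular}"
        # Flag off, or no database information: keep the old schema.table form.
        return regular


default_cfg = OracleConfigSketch(database="orcl")
opted_in_cfg = OracleConfigSketch(database="orcl", add_database_name_to_urn=True)

print(default_cfg.get_identifier("hr", "employees"))
print(opted_in_cfg.get_identifier("hr", "employees"))
```

With the default configuration the name stays `schema.table`, so existing URNs are unchanged until a user opts in.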

jjoyce0510

@jaegwonseo Thank you for the hard work.

The change looks good to me. Will plan to merge once CI is green!

jaegwonseo · 2023-02-20T02:01:31Z

@jjoyce0510
Is this test failure related to this PR?

john approved

chriscollins3456 · 2023-02-21T16:25:22Z

Is this test failure related to this PR?

re-running as it looked like an issue with the process (a setup job was still running) and not an actual failed test


jaegwonseo added 2 commits January 12, 2023 16:21

fix(ingest) add database name to oracle urn name

0d31dde

urn format : schema.table -> database.schema.table

fix(ingest) add database name to oracle urn name

2ff8a97

normalize oracle database name

github-actions bot added the ingestion label Jan 12, 2023

jaegwonseo mentioned this pull request Jan 12, 2023

dataset from oracle ingest doesn't match to kafka-connect lineage #6977

Closed

jaegwonseo added 2 commits January 12, 2023 17:32

fix(ingest) add database name to oracle urn name

6b332f8

apply lint fix

fix(ingest) add database name to oracle urn name

2d29c40

change test resource

jjoyce0510 reviewed Jan 12, 2023


anshbansal added the community-contribution label Jan 13, 2023

Merge branch 'master' into oracle_urn

2c58e2b

jjoyce0510 reviewed Jan 18, 2023


jjoyce0510 reviewed Jan 24, 2023


jaegwonseo and others added 4 commits January 24, 2023 13:17

Merge branch 'master' into oracle_urn

dfda83d

fix(ingest) add database name to oracle urn name

adf7ae1

remove lower from oracle get_identifier method

fix(ingest) add database name to oracle urn name

1c3dde7

update updating-datahub.md

fix(ingest) add database name to oracle urn name

8cc2f5f

change test resource

anshbansal previously requested changes Jan 25, 2023


fix(ingest) add database name to oracle urn name

eaed5f7

get_identifier use add_database_name_to_urn parameter from OracleConfig

fix(ingest) add database name to oracle urn name

fb52edc

apply lint

jjoyce0510 added 2 commits February 17, 2023 13:32

Merge branch 'master' into oracle_urn

0a7e806

Update updating-datahub.md

10a6fd9

jjoyce0510 approved these changes Feb 17, 2023


chriscollins3456 merged commit 3068e7f into datahub-project:master Feb 21, 2023

oleg-ruban pushed a commit to RChygir/datahub that referenced this pull request Feb 28, 2023

fix(ingest/oracle) add database name to oracle urn name (datahub-proj…

a5856da

…ect#7016)

yoonhyejin pushed a commit that referenced this pull request Mar 3, 2023

fix(ingest/oracle) add database name to oracle urn name (#7016)

66be521
