Inconsistent datasets name case in UI #9276

Closed
remisalmon opened this issue Nov 20, 2023 · 8 comments · Fixed by #9613
Labels
bug Bug report

Comments

@remisalmon
Contributor

remisalmon commented Nov 20, 2023

Describe the bug

The DataHub UI shows a (random?) mix of uppercase and lowercase dataset names for the same database/schema.

See the attached screenshot where Snowflake has only 1 SNOWPIPE.PUBLIC database/schema but DataHub shows 2 of those. The tables in this schema are split between those two.

To Reproduce
Steps to reproduce the behavior:

  1. Ingest a Snowflake source (tested with convert_urns_to_lowercase: true)
  2. Snowflake datasets are split between uppercase and lowercase in the UI

Expected behavior

The DataHub UI should not split datasets between uppercase and lowercase folders if their Snowflake identifiers are not explicitly uppercase or lowercase.

Snowflake query:

select table_schema, count(*)
from snowpipe.information_schema.tables
where table_schema in ('PUBLIC', 'public')
group by table_schema;

returns

TABLE_SCHEMA	COUNT(*)
PUBLIC	21

(21 = 13+8 in the screenshot...)
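
To illustrate the symptom, here is a minimal Python sketch of how grouping on exact folder names yields two folders, while case-insensitive grouping yields one. It assumes the split comes from case-sensitive string comparison, which this thread has not confirmed. The table names, and which folder holds 13 versus 8 tables, are hypothetical; only the counts come from the report above.

```python
from collections import Counter

# Hypothetical browse paths: the table names and the assignment of 13 vs. 8
# tables to each folder are placeholders, not data from the real deployment.
paths = [("SNOWPIPE", "PUBLIC", f"table_{i}") for i in range(13)]
paths += [("snowpipe", "public", f"table_{i}") for i in range(13, 21)]

# Grouping on the exact strings is case-sensitive, so two folders appear.
exact = Counter((db, schema) for db, schema, _ in paths)

# Grouping on lowercased strings merges them into a single folder.
folded = Counter((db.lower(), schema.lower()) for db, schema, _ in paths)

print(dict(exact))
print(dict(folded))
```

The folded count matches the 21 rows that the Snowflake query returns for the single PUBLIC schema.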

Screenshots

image

Desktop (please complete the following information):

  • OS: MacOS
  • Browser: Firefox
  • Version: DataHub v0.12.0
@remisalmon remisalmon added the bug Bug report label Nov 20, 2023
@remisalmon
Contributor Author

I'm seeing the same issue with Postgres sources. It is even more confusing there: the instance, database and schema are all lowercase, but they still show up in separate folders:
image

@hsheth2
Collaborator

hsheth2 commented Nov 29, 2023

@remisalmon I have a hypothesis for this, related to #9227

If you re-run the snowflake/postgres ingestion, does this problem appear to go away?

@remisalmon
Contributor Author

> @remisalmon I have a hypothesis for this, related to #9227
>
> If you re-run the snowflake/postgres ingestion, does this problem appear to go away?

Hi @hsheth2, thanks for taking a look! Yes, this does go away when I re-run the ingestion (even with stateful ingestion enabled):
image

This is with:

> cat .env
DATAHUB_VERSION="v0.11.0"
> datahub version
DataHub CLI version: 0.11.0.5
Python version: 3.11.6 (main, Oct  2 2023, 22:00:51) [Clang 14.0.3 (clang-1403.0.22.14.1)]

@glasalvia

Hi! We have the same issue!

DataHub CLI version: 0.12.0.3
Python version: 3.9.16 (main, Sep 8 2023, 00:00:00)

But when we re-run the ingestion, the issue is not resolved. Any suggestions? Thanks!

github-actions bot commented Jan 1, 2024

This issue is stale because it has been open for 30 days with no activity. If you believe this is still an issue on the latest DataHub release please leave a comment with the version that you tested it with. If this is a question/discussion please head to https://slack.datahubproject.io. For feature requests please use https://feature-requests.datahubproject.io

@github-actions github-actions bot added the stale label Jan 1, 2024
@JoakimNil

We are also seeing the same issue on Snowflake, which doesn't disappear when rerunning the ingestion.

Datahub version: 0.12.1.3

@hsheth2
Collaborator

hsheth2 commented Jan 9, 2024

@JoakimNil that's surprising! Can we get some more details about your setup - are you using quickstart or a helm deployment? Have you tweaked any settings e.g. GMS replicas, async ingest, standalone mae/mce consumers?

Edit: also, it would help if you could use a file sink and send us the resulting JSON (it can be shared in a private DM in Slack if needed). I mainly want to look at the browsePathsV2 aspects within that file.
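
For anyone who wants to inspect such a file locally, the following is a rough Python sketch that looks for browse paths differing only by letter case. It is not an official DataHub tool. It assumes the file sink output is a JSON array of records with `aspectName`, `entityUrn`, and a path list under `aspect.json.path`, and that the aspect is named `browsePathsV2`; the helper names and the sample records are hypothetical.

```python
import json
from collections import defaultdict


def load_records(filename):
    # Assumption: the file sink wrote a single JSON array of records.
    with open(filename) as f:
        return json.load(f)


def find_case_conflicts(records, aspect_name="browsePathsV2"):
    """Return browse paths that differ only by letter case.

    Assumes each record stores its path as a list of {"id": ...} entries
    under aspect["json"]["path"]; adjust if the real layout differs.
    """
    variants = defaultdict(set)
    for rec in records:
        if rec.get("aspectName") != aspect_name:
            continue
        entries = (rec.get("aspect") or {}).get("json", {}).get("path", [])
        path = tuple(entry.get("id", "") for entry in entries)
        variants[tuple(part.lower() for part in path)].add(path)
    # Keep only lowercased keys that map to more than one spelling.
    return {key: sorted(found) for key, found in variants.items() if len(found) > 1}


# Hypothetical records for demonstration only.
sample = [
    {"entityUrn": "urn:li:dataset:a", "aspectName": "browsePathsV2",
     "aspect": {"json": {"path": [{"id": "SNOWPIPE"}, {"id": "PUBLIC"}]}}},
    {"entityUrn": "urn:li:dataset:b", "aspectName": "browsePathsV2",
     "aspect": {"json": {"path": [{"id": "snowpipe"}, {"id": "public"}]}}},
    {"entityUrn": "urn:li:dataset:a", "aspectName": "status",
     "aspect": {"json": {"removed": False}}},
]
conflicts = find_case_conflicts(sample)
print(conflicts)
```

On a real export, `find_case_conflicts(load_records("output.json"))` would be the entry point, where `output.json` stands for whatever filename the sink was configured with.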

@github-actions github-actions bot removed the stale label Jan 10, 2024
@JoakimNil

@hsheth2 We're using quickstart, with all default settings. The only thing that's changed is adding the Snowflake source, with this config:

source:
    type: snowflake
    config:
        account_id: [REMOVED]
        warehouse: COMPUTE_WH
        username: datahub
        password: '${SNOWFLAKE_DATAHUB_PASSWORD}'
        incremental_lineage: true
        profiling:
            enabled: false
        stateful_ingestion:
            enabled: true

I will send you the resulting JSON in a DM in slack.
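
As a sketch of how the requested JSON could be produced, the helper below swaps a recipe's sink for a file sink. The recipe is represented as a Python dict for illustration; the helper name, the trimmed-down source section, and the output filename are hypothetical. Whether other settings (for example stateful ingestion) need adjusting when writing to a file is not covered here.

```python
import copy


def with_file_sink(recipe, filename):
    """Return a copy of a recipe dict whose sink writes to a local file."""
    updated = copy.deepcopy(recipe)
    updated["sink"] = {"type": "file", "config": {"filename": filename}}
    return updated


# Trimmed-down, hypothetical version of the source config shown above.
recipe = {
    "source": {
        "type": "snowflake",
        "config": {"warehouse": "COMPUTE_WH", "username": "datahub"},
    }
}
debug_recipe = with_file_sink(recipe, "./snowflake_output.json")
print(debug_recipe["sink"])
```

The original recipe dict is left untouched, so the same source section can still be used with the regular sink afterwards.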
