fix(oracle): denormalize column names where applicable #24471

villebro · 2023-06-21T13:52:56Z

SUMMARY

Currently all databases store physical datasets with normalized column names, and virtual datasets with denormalized column names. This is problematic for Oracle-like databases where the normalized name is typically all lowercase, and the denormalized one is ALL UPPERCASE. This causes issues when using native filters on dashboards, as the column references will be different, depending on whether or not the chart is built on a physical or virtual dataset.

To fix this, this PR proposes calling dialect.denormalize_name() on dialects that have set the requires_name_normalize property to True. This ensures that databases like Oracle and Snowflake will have the same case for both physical and virtual datasets.

We add a unit test to verify that Oracle changes the case correctly, and that MSSQL doesn't do any conversion. We don't add a Snowflake test, as our current testing requirements don't include the necessary Snowflake drivers, and I don't want to bloat our testing dependencies.

Note that there's no migration for migrating existing physical datasets, as that would have introduced significant risks. Therefore only datasets created after this PR will be affected. Consequently anyone who wants to migrate old datasets to the new format needs to manyally sync the columns on the dataset modal.

AFTER

After the change, Snowflake will denormalize column names, typically resulting in ALL UPPERCASE:

BEFORE

Before physical datasets would usually store the column names as all lowercase on Snowflake:

TESTING INSTRUCTIONS

Create a physical dataset on Snowflake
Create a select * from ... virtual dataset referencing the same table
Notice how one has all lowercase, and the other ALL UPPERCASE.

ADDITIONAL INFORMATION

Has associated issue: Fixes Snowflake: Inconsistent column name case #18085
Required feature flags:
Changes UI
Includes DB Migration (follow approval process in SIP-59)
- Migration is atomic, supports rollback & is backwards-compatible
- Confirm DB migration upgrade and downgrade tested
- Runtime estimates and downtime expectations provided
Introduces new feature or API
Removes existing feature or API

villebro · 2023-06-21T13:53:36Z

Ping @rumbin @agusfigueroa-htg

villebro · 2023-06-21T13:58:10Z

superset/connectors/sqla/utils.py

+                        "name": name,
+                        "column_name": name,


A recently merged PR changed "name" to "column_name", but left "name": #24248 To be consistent with this, I'm setting both.

@villebro we were worried about backwards compatibility, since some clients/users are only using the API. Looking to deprecate name in the future

Thanks for the context @hughhhh - makes sense 👍 Let's try to remove these when we do the next round of breaking changes.

codecov · 2023-06-21T13:59:58Z

Codecov Report

Merging #24471 (e11b8c1) into master (d5f88c1) will decrease coverage by 0.08%.
The diff coverage is 100.00%.

❗ Current head e11b8c1 differs from pull request most recent head 35a4b05. Consider uploading reports for the commit 35a4b05 to get more accurate results

@@            Coverage Diff             @@
##           master   #24471      +/-   ##
==========================================
- Coverage   69.03%   68.95%   -0.08%     
==========================================
  Files        1901     1901              
  Lines       74002    74008       +6     
  Branches     8116     8116              
==========================================
- Hits        51086    51034      -52     
- Misses      20805    20863      +58     
  Partials     2111     2111

Flag	Coverage Δ
hive	`53.93% <83.33%> (+<0.01%)`	⬆️
mysql	`79.42% <83.33%> (+<0.01%)`	⬆️
postgres	`?`
presto	`53.83% <33.33%> (-0.01%)`	⬇️
python	`83.33% <100.00%> (-0.17%)`	⬇️
sqlite	`?`
unit	`54.68% <83.33%> (+<0.01%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
superset/connectors/sqla/utils.py	`88.04% <100.00%> (-2.07%)`	⬇️
superset/db_engine_specs/base.py	`89.23% <100.00%> (-1.66%)`	⬇️

... and 16 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

rumbin · 2023-06-21T14:08:50Z

@villebro this is great!

rusackas · 2023-06-21T19:13:43Z

Loving this! @villebro and I have been talking about this for a while now, and I'm happy to see that this solution tackles the intricate issue in a clear and straightforward way. The PR description of the fix, the tests, and the code itself all look fantastic to me.

If anyone reading this thread has access to Oracle/Snowflake test DBs, or knows their intricacies, please consider adding your name/handle to the Database Rolodex. Unfortunately we have no additional folks to ping for Oracle/Snowflake... yet.

agusfigueroa-htg · 2023-06-22T07:53:51Z

Late to the party here, thank you very much @villebro !

Also looking forward to seeing how name/column_name ends up looking like after the next round of breaking changes (we have quite a few API flows in place).

…e#24471)" This reverts commit ea89949.

eschutho · 2023-08-01T00:29:47Z

HI @villebro I believe we found a breaking change with this PR when a chart has been created with lower case columns that are later synced and changed to uppercase. The chart can no longer find those columns. Have you heard of any other reports? Do you think we could fast-follow with a fix? (This is currently reverted from our local branch but we'd like to pull it back in!)

…e#24471)" This reverts commit ea89949.

michael-s-molina · 2023-08-09T16:28:45Z

@villebro Should we revert this on 3.0 or are we planning on a follow-up?

villebro · 2023-08-09T16:30:51Z

@villebro Should we revert this on 3.0 or are we planning on a follow-up?

Hi @michael-s-molina, I'm working on a fix, and I should have it ready by the end of this week.

fix(oracle): denormalize column names where applicable

35a4b05

pull-request-size bot added the size/M label Jun 21, 2023

villebro mentioned this pull request Jun 21, 2023

Snowflake: Inconsistent column name case #18085

Closed

3 tasks

villebro commented Jun 21, 2023

View reviewed changes

rusackas requested review from rusackas and betodealmeida June 21, 2023 16:57

rusackas approved these changes Jun 21, 2023

View reviewed changes

villebro merged commit ea89949 into apache:master Jun 21, 2023
32 checks passed

villebro deleted the villebro/snowflake-case branch June 22, 2023 06:31

sadpandajoe added a commit to preset-io/superset that referenced this pull request Jul 17, 2023

Revert "fix(oracle): denormalize column names where applicable (apach…

d391d49

…e#24471)" This reverts commit ea89949.

jinghua-qa added a commit to preset-io/superset that referenced this pull request Jul 21, 2023

Revert "fix(oracle): denormalize column names where applicable (apach…

03cd6ca

…e#24471)" This reverts commit ea89949.

sadpandajoe added a commit to preset-io/superset that referenced this pull request Aug 2, 2023

Revert "fix(oracle): denormalize column names where applicable (apach…

4b41082

…e#24471)" This reverts commit ea89949.

eschutho added the risk:breaking-change Issues or PRs that will introduce breaking changes label Aug 7, 2023

villebro mentioned this pull request Aug 14, 2023

fix(snowflake): opt-in denormalization of column names #24982

Merged

9 tasks

villebro mentioned this pull request Nov 25, 2023

[SIP-84] Case-insensitive handling of datasets' column names #19773

Closed

Vitor-Avila mentioned this pull request Dec 6, 2023

Chart filter options are not populated for datasets with normalize column enabled #26198

Closed

3 tasks

mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 3.0.0 labels Mar 8, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(oracle): denormalize column names where applicable #24471

fix(oracle): denormalize column names where applicable #24471

villebro commented Jun 21, 2023 •

edited

Loading

villebro commented Jun 21, 2023

villebro Jun 21, 2023

hughhhh Jun 22, 2023

villebro Jun 22, 2023

codecov bot commented Jun 21, 2023 •

edited

Loading

rumbin commented Jun 21, 2023

rusackas commented Jun 21, 2023

agusfigueroa-htg commented Jun 22, 2023

eschutho commented Aug 1, 2023

michael-s-molina commented Aug 9, 2023 •

edited

Loading

villebro commented Aug 9, 2023

fix(oracle): denormalize column names where applicable #24471

fix(oracle): denormalize column names where applicable #24471

Conversation

villebro commented Jun 21, 2023 • edited Loading

SUMMARY

AFTER

BEFORE

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

villebro commented Jun 21, 2023

villebro Jun 21, 2023

Choose a reason for hiding this comment

hughhhh Jun 22, 2023

Choose a reason for hiding this comment

villebro Jun 22, 2023

Choose a reason for hiding this comment

codecov bot commented Jun 21, 2023 • edited Loading

Codecov Report

rumbin commented Jun 21, 2023

rusackas commented Jun 21, 2023

agusfigueroa-htg commented Jun 22, 2023

eschutho commented Aug 1, 2023

michael-s-molina commented Aug 9, 2023 • edited Loading

villebro commented Aug 9, 2023

villebro commented Jun 21, 2023 •

edited

Loading

codecov bot commented Jun 21, 2023 •

edited

Loading

michael-s-molina commented Aug 9, 2023 •

edited

Loading