Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(oracle): denormalize column names where applicable #24471

Merged
merged 1 commit into from
Jun 21, 2023

Conversation

villebro
Copy link
Member

@villebro villebro commented Jun 21, 2023

SUMMARY

Currently all databases store physical datasets with normalized column names, and virtual datasets with denormalized column names. This is problematic for Oracle-like databases where the normalized name is typically all lowercase, and the denormalized one is ALL UPPERCASE. This causes issues when using native filters on dashboards, as the column references will be different, depending on whether or not the chart is built on a physical or virtual dataset.

To fix this, this PR proposes calling dialect.denormalize_name() on dialects that have set the requires_name_normalize property to True. This ensures that databases like Oracle and Snowflake will have the same case for both physical and virtual datasets.

We add a unit test to verify that Oracle changes the case correctly, and that MSSQL doesn't do any conversion. We don't add a Snowflake test, as our current testing requirements don't include the necessary Snowflake drivers, and I don't want to bloat our testing dependencies.

Note that there's no migration for migrating existing physical datasets, as that would have introduced significant risks. Therefore only datasets created after this PR will be affected. Consequently anyone who wants to migrate old datasets to the new format needs to manyally sync the columns on the dataset modal.

AFTER

After the change, Snowflake will denormalize column names, typically resulting in ALL UPPERCASE:
image

BEFORE

Before physical datasets would usually store the column names as all lowercase on Snowflake:
image

TESTING INSTRUCTIONS

  1. Create a physical dataset on Snowflake
  2. Create a select * from ... virtual dataset referencing the same table
  3. Notice how one has all lowercase, and the other ALL UPPERCASE.

ADDITIONAL INFORMATION

  • Has associated issue: Fixes Snowflake: Inconsistent column name case #18085
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@villebro
Copy link
Member Author

Ping @rumbin @agusfigueroa-htg

Comment on lines +79 to +80
"name": name,
"column_name": name,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A recently merged PR changed "name" to "column_name", but left "name": #24248 To be consistent with this, I'm setting both.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@villebro we were worried about backwards compatibility, since some clients/users are only using the API. Looking to deprecate name in the future

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the context @hughhhh - makes sense 👍 Let's try to remove these when we do the next round of breaking changes.

@codecov
Copy link

codecov bot commented Jun 21, 2023

Codecov Report

Merging #24471 (e11b8c1) into master (d5f88c1) will decrease coverage by 0.08%.
The diff coverage is 100.00%.

❗ Current head e11b8c1 differs from pull request most recent head 35a4b05. Consider uploading reports for the commit 35a4b05 to get more accurate results

@@            Coverage Diff             @@
##           master   #24471      +/-   ##
==========================================
- Coverage   69.03%   68.95%   -0.08%     
==========================================
  Files        1901     1901              
  Lines       74002    74008       +6     
  Branches     8116     8116              
==========================================
- Hits        51086    51034      -52     
- Misses      20805    20863      +58     
  Partials     2111     2111              
Flag Coverage Δ
hive 53.93% <83.33%> (+<0.01%) ⬆️
mysql 79.42% <83.33%> (+<0.01%) ⬆️
postgres ?
presto 53.83% <33.33%> (-0.01%) ⬇️
python 83.33% <100.00%> (-0.17%) ⬇️
sqlite ?
unit 54.68% <83.33%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
superset/connectors/sqla/utils.py 88.04% <100.00%> (-2.07%) ⬇️
superset/db_engine_specs/base.py 89.23% <100.00%> (-1.66%) ⬇️

... and 16 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@rumbin
Copy link
Contributor

rumbin commented Jun 21, 2023

@villebro this is great!

@rusackas
Copy link
Member

Loving this! @villebro and I have been talking about this for a while now, and I'm happy to see that this solution tackles the intricate issue in a clear and straightforward way. The PR description of the fix, the tests, and the code itself all look fantastic to me.

If anyone reading this thread has access to Oracle/Snowflake test DBs, or knows their intricacies, please consider adding your name/handle to the Database Rolodex. Unfortunately we have no additional folks to ping for Oracle/Snowflake... yet.

@villebro villebro merged commit ea89949 into apache:master Jun 21, 2023
32 checks passed
@villebro villebro deleted the villebro/snowflake-case branch June 22, 2023 06:31
@agusfigueroa-htg
Copy link

Late to the party here, thank you very much @villebro !

Also looking forward to seeing how name/column_name ends up looking like after the next round of breaking changes (we have quite a few API flows in place).

sadpandajoe added a commit to preset-io/superset that referenced this pull request Jul 17, 2023
jinghua-qa added a commit to preset-io/superset that referenced this pull request Jul 21, 2023
@eschutho
Copy link
Member

eschutho commented Aug 1, 2023

HI @villebro I believe we found a breaking change with this PR when a chart has been created with lower case columns that are later synced and changed to uppercase. The chart can no longer find those columns. Have you heard of any other reports? Do you think we could fast-follow with a fix? (This is currently reverted from our local branch but we'd like to pull it back in!)

sadpandajoe added a commit to preset-io/superset that referenced this pull request Aug 2, 2023
@eschutho eschutho added the risk:breaking-change Issues or PRs that will introduce breaking changes label Aug 7, 2023
@michael-s-molina
Copy link
Member

michael-s-molina commented Aug 9, 2023

@villebro Should we revert this on 3.0 or are we planning on a follow-up?

@villebro
Copy link
Member Author

villebro commented Aug 9, 2023

@villebro Should we revert this on 3.0 or are we planning on a follow-up?

Hi @michael-s-molina, I'm working on a fix, and I should have it ready by the end of this week.

@mistercrunch mistercrunch added 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 3.0.0 labels Mar 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels risk:breaking-change Issues or PRs that will introduce breaking changes size/M 🚢 3.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Snowflake: Inconsistent column name case
8 participants