fix(chart-filter): Avoid column denormalization if not enabled #26199

Merged (2 commits) on Dec 8, 2023

Vitor-Avila
Contributor

SUMMARY

Avoid de-normalizing column names when the engine supports it but column normalization is enabled at the dataset level. Normalization is enabled by default for datasets created before this feature was introduced, so that syncing columns wouldn't break existing charts, etc.
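
The decision described above can be illustrated with a minimal sketch. This is not the code from this PR: `Dataset`, `to_engine_case`, and `column_name_for_query` are hypothetical names, and the upper-casing rule only imitates an engine that stores unquoted identifiers in uppercase. Only the `denormalize_column` flag name comes from the review discussion.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical stand-in for a dataset with a normalization setting."""

    normalize_columns: bool


def to_engine_case(name: str) -> str:
    """Hypothetical engine hook: upper-case names that are fully lowercase."""
    return name.upper() if name.islower() else name


def column_name_for_query(
    dataset: Dataset,
    column_name: str,
    engine_supports_denormalization: bool = True,
) -> str:
    """Return the column name to use when querying for filter values."""
    # Assumption: de-normalize only when the engine supports it AND
    # normalization is not enabled on the dataset.
    denormalize_column = (
        engine_supports_denormalization and not dataset.normalize_columns
    )
    return to_engine_case(column_name) if denormalize_column else column_name
```

With normalization enabled, the synced lowercase name is passed through untouched, which is the behavior the fix aims for.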

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

Before

Chart.Filter.issue.-.new.mov

After

Chart.Filter.fixed.-.new.mov

TESTING INSTRUCTIONS

  1. Create a dataset powered by an engine that supports column de-normalization (such as Snowflake). Note that:
    a. All columns are uppercase.
    b. Column normalization is disabled (under the SETTINGS tab).
  2. Modify the dataset, and enable column normalization.
  3. Save changes.
  4. Modify the dataset again, and sync columns. Note that all columns are now lowercase.
  5. Save changes.
  6. Create a new chart using this dataset, and drop any column in the FILTERS section.
  7. Validate the filter is showing available options in the dropdown.

ADDITIONAL INFORMATION

codecov bot commented Dec 6, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (4d4b19e) 69.18% compared to head (7953e45) 69.18%.
Report is 2 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #26199   +/-   ##
=======================================
  Coverage   69.18%   69.18%           
=======================================
  Files        1944     1944           
  Lines       75925    75928    +3     
  Branches     8451     8451           
=======================================
+ Hits        52531    52534    +3     
  Misses      21209    21209           
  Partials     2185     2185           
Flag       Coverage Δ
hive       53.68% <25.00%> (-0.01%) ⬇️
mysql      78.10% <100.00%> (+<0.01%) ⬆️
postgres   78.19% <100.00%> (+<0.01%) ⬆️
presto     53.64% <25.00%> (-0.01%) ⬇️
python     82.88% <100.00%> (+<0.01%) ⬆️
sqlite     76.85% <100.00%> (+<0.01%) ⬆️
unit       55.81% <25.00%> (-0.01%) ⬇️

Member

@betodealmeida left a comment

This makes sense. The original naming could be better and definitely needs a cleanup; it seems like we're using "denormalize" to mean "don't normalize". Ideally we'd have a boolean feature called something like needs_normalization, and we'd normalize or denormalize as needed; otherwise we wouldn't touch column names.
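
One possible shape for that suggestion, as a rough sketch only: the classes and methods below are hypothetical and are not part of Superset's engine specs. Only the `needs_normalization` name is taken from the comment above, and the case rules are assumptions about an engine that stores unquoted identifiers in uppercase.

```python
class BaseEngine:
    """Hypothetical engine spec: by default column names are never touched."""

    needs_normalization = False

    @classmethod
    def normalize(cls, name: str) -> str:
        """Engine-native name -> name shown to users."""
        if cls.needs_normalization and name.isupper():
            return name.lower()
        return name

    @classmethod
    def denormalize(cls, name: str) -> str:
        """Name shown to users -> engine-native name."""
        if cls.needs_normalization and name.islower():
            return name.upper()
        return name


class UppercaseEngine(BaseEngine):
    """Hypothetical engine that stores unquoted identifiers in uppercase."""

    needs_normalization = True
```

With a single flag, callers no longer need to decide between "denormalize" and "don't normalize": an engine that leaves the flag off returns every name unchanged in both directions.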

@Vitor-Avila
Contributor Author

Vitor-Avila commented Dec 7, 2023

thanks @betodealmeida. @hughhhh @villebro I know you've worked recently with this feature. Do you have any concerns with this PR? thank you!

@@ -1340,14 +1340,19 @@ def get_time_filter( # pylint: disable=too-many-arguments
         )
         return and_(*l)

-    def values_for_column(self, column_name: str, limit: int = 10000) -> list[Any]:
-        # always denormalize column name before querying for values
+    def values_for_column(
Contributor

Do our unit tests already cover whether the correct values are returned for both true and false values of denormalize_column? If not, I would add tests to confirm that the correct values are returned when the flag is true, since we're defaulting to false.

Member

This is a good idea. However, this part of the codebase is very tricky to add tests for, so it may be difficult. If it's easy to add the test I suggest doing it, otherwise LGTM

Contributor Author

@sadpandajoe @villebro I just added some basic tests in #26220 (since this one got merged). They don't test the logic implemented at the DB engine level to denormalize a column (I believe this is DB-specific and would require a more complex setup), but they should at least test the business logic.
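
A test of the business logic alone, without any database, could look roughly like the sketch below. It does not reproduce the tests in #26220: `column_for_values_query` is a hypothetical stand-in for the flag handling, and the stubbed `to_engine_case` method is an assumed name, not Superset's engine API. The engine is replaced with `unittest.mock.Mock` so that only the branching on `denormalize_column` is exercised.

```python
from unittest.mock import Mock


def column_for_values_query(engine_spec, column_name, denormalize_column=False):
    """Hypothetical function under test: apply the engine hook only if asked."""
    if denormalize_column:
        return engine_spec.to_engine_case(column_name)
    return column_name


def test_flag_false_leaves_name_untouched():
    spec = Mock()
    assert column_for_values_query(spec, "country") == "country"
    # The engine hook must not be consulted when the flag is off.
    spec.to_engine_case.assert_not_called()


def test_flag_true_delegates_to_engine():
    spec = Mock()
    spec.to_engine_case.return_value = "COUNTRY"
    result = column_for_values_query(spec, "country", denormalize_column=True)
    assert result == "COUNTRY"
    spec.to_engine_case.assert_called_once_with("country")
```

Stubbing the engine keeps the DB-specific case rules out of scope, which matches the limitation described in the comment above.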

Member

@villebro left a comment

LGTM, thanks for the fix. My apologies for contributing to the confusing naming here; it seems I hadn't fully thought it through (is it really denormalizing, or just not normalizing, etc.).

@michael-s-molina
Member

@Vitor-Avila Tip: If you include the text "Fixes: #26198" in the PR description, when the PR is merged, the issue is automatically closed. This only works for the description, not comments.

@michael-s-molina
Member

I don't know why, but when it's part of a checkbox (Has associated issue in the template), it does not work. To see if the link worked, you can check the original issue for a message saying that the issue will be closed by the PR.

@eschutho merged commit 05d7060 into apache:master Dec 8, 2023
33 checks passed
@Vitor-Avila
Contributor Author

@michael-s-molina thanks for the tip! I remember a previous PR I created did automatically close the bug, but I never understood why. I'll make sure to include that outside of the checkbox next time 🙌

@michael-s-molina added the v3.0 and v3.1 labels Dec 8, 2023
michael-s-molina pushed a commit that referenced this pull request Dec 8, 2023
michael-s-molina pushed a commit that referenced this pull request Dec 8, 2023
jinghua-qa pushed a commit to preset-io/superset that referenced this pull request Dec 8, 2023
sadpandajoe pushed a commit to preset-io/superset that referenced this pull request Dec 11, 2023
josedev-union pushed a commit to Ortege-xyz/studio that referenced this pull request Jan 22, 2024
@mistercrunch added the 🍒 3.0.3, 🍒 3.0.4, 🍒 3.1.0, 🍒 3.1.1, and 🏷️ bot labels Mar 8, 2024
sfirke pushed a commit to sfirke/superset that referenced this pull request Mar 22, 2024
Labels
🏷️ bot, preset:2023.49, size/S, v3.0, v3.1, 🍒 3.0.3, 🍒 3.0.4, 🍒 3.1.0, 🍒 3.1.1

8 participants