fix(dataset-import): support empty strings for extra fields #24663

Vitor-Avila · 2023-07-11T13:39:02Z

SUMMARY

During the import process, the extra value for datasets is loaded as a dictionary. This operation fails in case the extra key has an empty string (extra: "").

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

Edit an existing dataset.
Add any information to the extra value (in the UI).
Save changes.
Modify the dataset again.
Remove your text changes.
Save changes. The dataset now has extra: "".
Export the dataset.
Import it back.

ADDITIONAL INFORMATION

Has associated issue: Fixes Import dataset fails in case extra value was added and then removed through the UI #24662
Required feature flags:
Changes UI
Includes DB Migration (follow approval process in SIP-59)
- Migration is atomic, supports rollback & is backwards-compatible
- Confirm DB migration upgrade and downgrade tested
- Runtime estimates and downtime expectations provided
Introduces new feature or API
Removes existing feature or API

* fix(dataset-import):support empty strings for extra fields * Adding unit test * black update

github-actions

Congrats on making your first PR and thank you for contributing to Superset! 🎉 ❤️

We hope to see you in our Slack community too! Not signed up? Use our Slack App to self-register.

codecov · 2023-07-11T17:45:01Z

Codecov Report

Merging #24663 (719a633) into master (4881328) will decrease coverage by 0.08%.
The diff coverage is 63.67%.

❗ Current head 719a633 differs from pull request most recent head 02a2626. Consider uploading reports for the commit 02a2626 to get more accurate results

@@            Coverage Diff             @@
##           master   #24663      +/-   ##
==========================================
- Coverage   69.05%   68.97%   -0.08%     
==========================================
  Files        1907     1902       -5     
  Lines       74151    74011     -140     
  Branches     8182     8186       +4     
==========================================
- Hits        51204    51049     -155     
- Misses      20824    20841      +17     
+ Partials     2123     2121       -2

Flag	Coverage Δ
hive	`54.12% <23.95%> (-0.02%)`	⬇️
mysql	`79.35% <77.24%> (-0.13%)`	⬇️
postgres	`79.44% <77.24%> (-0.13%)`	⬇️
presto	`54.02% <23.95%> (-0.02%)`	⬇️
python	`83.45% <79.64%> (-0.04%)`	⬇️
sqlite	`78.03% <68.86%> (?)`
unit	`54.84% <29.34%> (+<0.01%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
...frontend/src/SqlLab/components/SqlEditor/index.jsx	`52.73% <0.00%> (-7.16%)`	⬇️
superset-frontend/src/components/Icons/Icon.tsx	`0.00% <ø> (ø)`
...Filters/FilterBar/FilterControls/FilterControl.tsx	`27.69% <0.00%> (-0.44%)`	⬇️
...veFilters/FilterBar/FilterControls/FilterValue.tsx	`5.76% <ø> (+0.10%)`	⬆️
...tersConfigModal/FiltersConfigForm/DefaultValue.tsx	`9.09% <0.00%> (ø)`
...mponents/nativeFilters/FiltersConfigModal/utils.ts	`46.96% <0.00%> (ø)`
...nd/src/dashboard/components/nativeFilters/state.ts	`64.28% <ø> (ø)`
.../src/explore/components/ControlPanelsContainer.tsx	`69.76% <0.00%> (ø)`
...features/annotationLayers/AnnotationLayerModal.tsx	`58.97% <ø> (ø)`
...ntend/src/features/annotations/AnnotationModal.tsx	`50.98% <ø> (ø)`
... and 55 more

... and 7 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

betodealmeida

Nice!

superset/datasets/schemas.py

eschutho · 2023-07-12T18:38:31Z

superset/datasets/schemas.py

+                if extra.strip():
+                    data["extra"] = json.loads(extra)
+                else:
+                    data["extra"] = {}


You may want to try the EAFP approach here.. it could simplify this logic. Something like:

Suggested change

data["extra"] = {}

if isinstance(data.get("extra"), str):

try:

data["extra"] = json.loads(data["extra"])

except JSONDecodeError:

data["extra"] = {}

@john-bodley had also suggested that we could try to fix the underlying code in order to make the data structures consistent. Unless you're feeling ambitious in this PR we could also fix the default in the import to be the correct format in a subsequent PR. I assume it should be an empty dictionary?

Of course @betodealmeida's solution with the ternary is even shorter. :)

I ended up using try/except, because if we use the walrus operator and the extra field is an empty string the if block wouldn't be executed and the import would still fail. Also defaulted the extra value to None since the schema actually accepts it.

I 100% agree this PR only fixes the "symptom" but doesn't address the cause. I can try to work on that in a future PR. Some thoughts to consider:

Currently the dataset modification modal doesn't validate the text added to the extra field, so users can add strings.

The string would be successfully exported, but then the import would fail until this PR. Once this gets merged, if the extra is set to a string in the YAML, its value would be discarded and extra would be set to None during import.

We could either:

Apply validation in the dataset modification form so that it only accepts json/dict data in the extra field: I think it's a stable solution and prevents future data type variations. However, users that currently have a string set could struggle to update the datasets in case the error message is not clear.

Modify the dataset schema so that the extra field accepts either a dictionary or a string: not really helpful in ensuring a data type, but should allow users with strings to import their existing datasets and reflect the information.

@betodealmeida @eschutho @john-bodley any thoughts? Also if you could approve the tests, that would be awesome! Thanks!

* fix(dataset-import):support empty strings for extra fields * Adding unit tests * black update * Simplifying logic * Updating tests

eschutho · 2023-07-13T23:18:55Z

This is great, thanks @Vitor-Avila!

(cherry picked from commit 65fb8e1)

fix(dataset-import): support empty strings for extra fields

daa89ef

* fix(dataset-import):support empty strings for extra fields * Adding unit test * black update

pull-request-size bot added the size/M label Jul 11, 2023

github-actions bot reviewed Jul 11, 2023

View reviewed changes

betodealmeida approved these changes Jul 12, 2023

View reviewed changes

superset/datasets/schemas.py Outdated Show resolved Hide resolved

eschutho reviewed Jul 12, 2023

View reviewed changes

Import fix (#2)

02a2626

* fix(dataset-import):support empty strings for extra fields * Adding unit tests * black update * Simplifying logic * Updating tests

rusackas approved these changes Jul 13, 2023

View reviewed changes

eschutho merged commit 65fb8e1 into apache:master Jul 13, 2023

michael-s-molina added the v3.0 Label added by the release manager to track PRs to be included in the 3.0 branch label Jul 19, 2023

michael-s-molina pushed a commit that referenced this pull request Jul 26, 2023

fix(dataset-import): support empty strings for extra fields (#24663)

c94bee4

(cherry picked from commit 65fb8e1)

mistercrunch added 🍒 3.0.0 🍒 3.0.1 🍒 3.0.2 🍒 3.0.3 🍒 3.0.4 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 3.1.0 labels Mar 8, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(dataset-import): support empty strings for extra fields #24663

fix(dataset-import): support empty strings for extra fields #24663

Vitor-Avila commented Jul 11, 2023

github-actions bot left a comment

codecov bot commented Jul 11, 2023 •

edited

Loading

betodealmeida left a comment

eschutho Jul 12, 2023

eschutho Jul 12, 2023 •

edited

Loading

eschutho Jul 12, 2023

Vitor-Avila Jul 13, 2023

eschutho commented Jul 13, 2023

-                    data["extra"] = {}
+            if isinstance(data.get("extra"), str):
+                    try:
+                            data["extra"] = json.loads(data["extra"])
+                     except JSONDecodeError:
+                             data["extra"] = {}

fix(dataset-import): support empty strings for extra fields #24663

fix(dataset-import): support empty strings for extra fields #24663

Conversation

Vitor-Avila commented Jul 11, 2023

SUMMARY

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

github-actions bot left a comment

Choose a reason for hiding this comment

codecov bot commented Jul 11, 2023 • edited Loading

Codecov Report

betodealmeida left a comment

Choose a reason for hiding this comment

eschutho Jul 12, 2023

Choose a reason for hiding this comment

eschutho Jul 12, 2023 • edited Loading

Choose a reason for hiding this comment

eschutho Jul 12, 2023

Choose a reason for hiding this comment

Vitor-Avila Jul 13, 2023

Choose a reason for hiding this comment

eschutho commented Jul 13, 2023

codecov bot commented Jul 11, 2023 •

edited

Loading

eschutho Jul 12, 2023 •

edited

Loading