Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(dataset-import): support empty strings for extra fields #24663

Merged
merged 2 commits into from
Jul 13, 2023
Merged

fix(dataset-import): support empty strings for extra fields #24663

merged 2 commits into from
Jul 13, 2023

Conversation

Vitor-Avila
Copy link
Contributor

SUMMARY

During the import process, the extra value for datasets is loaded as a dictionary. This operation fails in case the extra key has an empty string (extra: "").

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

TESTING INSTRUCTIONS

  1. Edit an existing dataset.
  2. Add any information to the extra value (in the UI).
  3. Save changes.
  4. Modify the dataset again.
  5. Remove your text changes.
  6. Save changes. The dataset now has extra: "".
  7. Export the dataset.
  8. Import it back.

ADDITIONAL INFORMATION

* fix(dataset-import):support empty strings for extra fields
* Adding unit test
* black update
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Congrats on making your first PR and thank you for contributing to Superset! 🎉 ❤️

We hope to see you in our Slack community too! Not signed up? Use our Slack App to self-register.

@codecov
Copy link

codecov bot commented Jul 11, 2023

Codecov Report

Merging #24663 (719a633) into master (4881328) will decrease coverage by 0.08%.
The diff coverage is 63.67%.

❗ Current head 719a633 differs from pull request most recent head 02a2626. Consider uploading reports for the commit 02a2626 to get more accurate results

@@            Coverage Diff             @@
##           master   #24663      +/-   ##
==========================================
- Coverage   69.05%   68.97%   -0.08%     
==========================================
  Files        1907     1902       -5     
  Lines       74151    74011     -140     
  Branches     8182     8186       +4     
==========================================
- Hits        51204    51049     -155     
- Misses      20824    20841      +17     
+ Partials     2123     2121       -2     
Flag Coverage Δ
hive 54.12% <23.95%> (-0.02%) ⬇️
mysql 79.35% <77.24%> (-0.13%) ⬇️
postgres 79.44% <77.24%> (-0.13%) ⬇️
presto 54.02% <23.95%> (-0.02%) ⬇️
python 83.45% <79.64%> (-0.04%) ⬇️
sqlite 78.03% <68.86%> (?)
unit 54.84% <29.34%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...frontend/src/SqlLab/components/SqlEditor/index.jsx 52.73% <0.00%> (-7.16%) ⬇️
superset-frontend/src/components/Icons/Icon.tsx 0.00% <ø> (ø)
...Filters/FilterBar/FilterControls/FilterControl.tsx 27.69% <0.00%> (-0.44%) ⬇️
...veFilters/FilterBar/FilterControls/FilterValue.tsx 5.76% <ø> (+0.10%) ⬆️
...tersConfigModal/FiltersConfigForm/DefaultValue.tsx 9.09% <0.00%> (ø)
...mponents/nativeFilters/FiltersConfigModal/utils.ts 46.96% <0.00%> (ø)
...nd/src/dashboard/components/nativeFilters/state.ts 64.28% <ø> (ø)
.../src/explore/components/ControlPanelsContainer.tsx 69.76% <0.00%> (ø)
...features/annotationLayers/AnnotationLayerModal.tsx 58.97% <ø> (ø)
...ntend/src/features/annotations/AnnotationModal.tsx 50.98% <ø> (ø)
... and 55 more

... and 7 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

Copy link
Member

@betodealmeida betodealmeida left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

superset/datasets/schemas.py Outdated Show resolved Hide resolved
if extra.strip():
data["extra"] = json.loads(extra)
else:
data["extra"] = {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may want to try the EAFP approach here.. it could simplify this logic. Something like:

Suggested change
data["extra"] = {}
if isinstance(data.get("extra"), str):
try:
data["extra"] = json.loads(data["extra"])
except JSONDecodeError:
data["extra"] = {}

Copy link
Member

@eschutho eschutho Jul 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@john-bodley had also suggested that we could try to fix the underlying code in order to make the data structures consistent. Unless you're feeling ambitious in this PR we could also fix the default in the import to be the correct format in a subsequent PR. I assume it should be an empty dictionary?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course @betodealmeida's solution with the ternary is even shorter. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up using try/except, because if we use the walrus operator and the extra field is an empty string the if block wouldn't be executed and the import would still fail. Also defaulted the extra value to None since the schema actually accepts it.

I 100% agree this PR only fixes the "symptom" but doesn't address the cause. I can try to work on that in a future PR. Some thoughts to consider:

  • Currently the dataset modification modal doesn't validate the text added to the extra field, so users can add strings.
  • The string would be successfully exported, but then the import would fail until this PR. Once this gets merged, if the extra is set to a string in the YAML, its value would be discarded and extra would be set to None during import.

We could either:

  • Apply validation in the dataset modification form so that it only accepts json/dict data in the extra field: I think it's a stable solution and prevents future data type variations. However, users that currently have a string set could struggle to update the datasets in case the error message is not clear.
  • Modify the dataset schema so that the extra field accepts either a dictionary or a string: not really helpful in ensuring a data type, but should allow users with strings to import their existing datasets and reflect the information.

@betodealmeida @eschutho @john-bodley any thoughts? Also if you could approve the tests, that would be awesome! Thanks!

* fix(dataset-import):support empty strings for extra fields
* Adding unit tests
* black update
* Simplifying logic
* Updating tests
@eschutho
Copy link
Member

This is great, thanks @Vitor-Avila!

@eschutho eschutho merged commit 65fb8e1 into apache:master Jul 13, 2023
29 checks passed
@michael-s-molina michael-s-molina added the v3.0 Label added by the release manager to track PRs to be included in the 3.0 branch label Jul 19, 2023
michael-s-molina pushed a commit that referenced this pull request Jul 26, 2023
@mistercrunch mistercrunch added 🍒 3.0.0 🍒 3.0.1 🍒 3.0.2 🍒 3.0.3 🍒 3.0.4 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels 🚢 3.1.0 labels Mar 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/M v3.0 Label added by the release manager to track PRs to be included in the 3.0 branch 🍒 3.0.0 🍒 3.0.1 🍒 3.0.2 🍒 3.0.3 🍒 3.0.4 🚢 3.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Import dataset fails in case extra value was added and then removed through the UI
6 participants