fix(result_set): preserve unicode characters in stringified nested va…#39712
fix(result_set): preserve unicode characters in stringified nested va…#39712shojiiii wants to merge 2 commits into
Conversation
…lues Pass `ensure_ascii=False` through `json.dumps` so that non-ASCII characters (CJK, accented Latin, emoji, etc.) in nested column values are serialised as-is instead of being escaped to `\uXXXX` sequences.
Code Review Agent Run #414dcbActionable Suggestions - 0Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR updates Superset’s JSON serialization used by SupersetResultSet.stringify() so nested values (e.g., ARRAY<STRING>) preserve Unicode characters instead of escaping them as \uXXXX.
Changes:
- Adds an
ensure_asciiparameter tosuperset.utils.json.dumps(defaulting toTruefor backward compatibility). - Updates
result_set.stringify()to call JSON dumps withensure_ascii=Falseto preserve Unicode output. - Adds a parametrized unit test covering ASCII, CJK, accented Latin, and emoji nested values.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
superset/utils/json.py |
Threads ensure_ascii through the JSON wrapper so callers can control Unicode escaping. |
superset/result_set.py |
Uses ensure_ascii=False when stringifying nested values to keep Unicode readable. |
tests/unit_tests/result_set_test.py |
Adds coverage ensuring nested stringification preserves Unicode characters. |
| obj: Any, | ||
| default: Optional[Callable[[Any], Any]] = json_iso_dttm_ser, | ||
| allow_nan: bool = False, | ||
| ignore_nan: bool = True, | ||
| sort_keys: bool = False, | ||
| indent: Union[str, int, None] = None, | ||
| separators: Union[tuple[str, str], None] = None, | ||
| cls: Union[type[simplejson.JSONEncoder], None] = None, | ||
| encoding: Optional[str] = "utf-8", | ||
| ensure_ascii: bool = True, | ||
| ) -> str: |
There was a problem hiding this comment.
Added docstring entries for both \encoding and ensure_ascii in superset.utils.json.dumps() so the new behavior is documented explicitly in ff5ae81.
| @pytest.mark.parametrize( | ||
| ("nested_value", "expected"), | ||
| [ | ||
| pytest.param( | ||
| ["ASCII", "plain text"], | ||
| '["ASCII", "plain text"]', | ||
| id="ascii", | ||
| ), | ||
| pytest.param( | ||
| ["日本語", "ひらがな"], | ||
| '["日本語", "ひらがな"]', | ||
| id="japanese", | ||
| ), | ||
| pytest.param( | ||
| ["móre", "áccent"], | ||
| '["móre", "áccent"]', | ||
| id="accented-latin", | ||
| ), | ||
| pytest.param( | ||
| ["emoji", "😁"], | ||
| '["emoji", "😁"]', | ||
| id="emoji", | ||
| ), | ||
| ], | ||
| ) |
There was a problem hiding this comment.
Updated the test to avoid asserting exact JSON formatting in ff5ae81.
It now checks semantic equality via json.loads(), verifies that the serialized value does not contain \u escapes, and confirms that the Unicode characters a represent in the raw string.
| result_set = SupersetResultSet(data, description, BaseEngineSpec) | ||
| df = result_set.to_pandas_df() | ||
|
|
||
| assert df["tags"].iloc[0] == expected |
There was a problem hiding this comment.
Updated the test to avoid asserting exact JSON formatting in ff5ae81.
It now checks semantic equality via json.loads(), verifies that the serialized value does not contain \u escapes, and confirms that the Unicode characters a represent in the raw string.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #39712 +/- ##
==========================================
- Coverage 64.48% 64.46% -0.02%
==========================================
Files 2566 2566
Lines 133926 133969 +43
Branches 31096 31104 +8
==========================================
+ Hits 86357 86363 +6
- Misses 46074 46111 +37
Partials 1495 1495
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
✅ Deploy Preview for superset-docs-preview ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
Code Review Agent Run #5ea529Actionable Suggestions - 0Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
SUMMARY
When nested column values (e.g.
ARRAY<STRING>) are serialised viastringify(), the JSON encoder was called withensure_ascii=True(the default). This caused all non-ASCII characters — CJK, accented
Latin, emoji, etc. — to be escaped to
\uXXXXsequences, making thevalues unreadable in query results.
The fix threads an
ensure_asciiparameter throughsuperset.utils.json.dumps(defaulting to
Truefor backwards compatibility) and passesensure_ascii=Falsefromresult_set.stringify()so that Unicodecharacters are preserved as-is.
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
N/A — backend-only change, no UI impact.
TESTING INSTRUCTIONS
New parametrized test
test_stringify_nested_values_preserves_unicodecovers ASCII, Japanese (CJK), accented-Latin, and emoji values.
ADDITIONAL INFORMATION