
fix(result_set): preserve unicode characters in stringified nested va…#39712

Open
shojiiii wants to merge 2 commits into
apache:master from
shojiiii:fix/result-set-preserve-unicode

@shojiiii

SUMMARY

When nested column values (e.g. ARRAY<STRING>) are serialised via
stringify(), the JSON encoder was called with ensure_ascii=True
(the default). This caused all non-ASCII characters — CJK, accented
Latin, emoji, etc. — to be escaped to \uXXXX sequences, making the
values unreadable in query results.

The fix threads an ensure_ascii parameter through superset.utils.json.dumps
(defaulting to True for backwards compatibility) and passes
ensure_ascii=False from result_set.stringify() so that Unicode
characters are preserved as-is.
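
For illustration, the escaping behaviour described above can be reproduced with Python's standard-library `json` module. This is a minimal sketch only: it assumes the stdlib encoder rather than the simplejson-based wrapper in superset.utils.json, and the sample `tags` list is made up for the example.

```python
import json

# Hypothetical nested column value, e.g. one ARRAY<STRING> cell.
tags = ["日本語", "móre", "😁"]

# Default behaviour: every non-ASCII character becomes a \uXXXX escape.
escaped = json.dumps(tags)

# With ensure_ascii=False the characters are written as-is.
preserved = json.dumps(tags, ensure_ascii=False)

print(escaped)
print(preserved)

# Both strings decode to the same Python value; only readability differs.
assert json.loads(escaped) == json.loads(preserved) == tags
```

Both forms are valid JSON, so the change affects how the value reads in query results, not what it decodes to.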

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

N/A — backend-only change, no UI impact.

TESTING INSTRUCTIONS

pytest tests/unit_tests/result_set_test.py -v

New parametrized test test_stringify_nested_values_preserves_unicode
covers ASCII, Japanese (CJK), accented-Latin, and emoji values.

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration
  • Introduces new feature or API
  • Removes existing feature or API

…lues

Pass `ensure_ascii=False` through `json.dumps` so that non-ASCII
characters (CJK, accented Latin, emoji, etc.) in nested column values
are serialised as-is instead of being escaped to `\uXXXX` sequences.
@bito-code-review
Contributor

bito-code-review Bot commented Apr 28, 2026

Code Review Agent Run #414dcb

Actionable Suggestions - 0
Review Details
  • Files reviewed - 3 · Commit Range: 758fa01..758fa01
    • superset/result_set.py
    • superset/utils/json.py
    • tests/unit_tests/result_set_test.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Superset. You can customize the agent settings or contact your Bito workspace admin at evan@preset.io.


@dosubot dosubot Bot added the change:backend Requires changing the backend label Apr 28, 2026
Contributor

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR updates Superset’s JSON serialization used by SupersetResultSet.stringify() so nested values (e.g., ARRAY<STRING>) preserve Unicode characters instead of escaping them as \uXXXX.

Changes:

  • Adds an ensure_ascii parameter to superset.utils.json.dumps (defaulting to True for backward compatibility).
  • Updates result_set.stringify() to call JSON dumps with ensure_ascii=False to preserve Unicode output.
  • Adds a parametrized unit test covering ASCII, CJK, accented Latin, and emoji nested values.
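
A minimal sketch of the pattern in the first two bullets, assuming a thin wrapper over the standard-library `json` module. The names `dumps_sketch` and `stringify_sketch` are hypothetical and are not the functions in superset/utils/json.py or superset/result_set.py, which wrap simplejson and accept more parameters.

```python
import json
from typing import Any


def dumps_sketch(obj: Any, ensure_ascii: bool = True, **kwargs: Any) -> str:
    """Hypothetical wrapper: forwards ensure_ascii to the encoder.

    Defaulting to True keeps existing callers' output unchanged.
    """
    return json.dumps(obj, ensure_ascii=ensure_ascii, **kwargs)


def stringify_sketch(value: Any) -> str:
    """Hypothetical caller that opts in to readable Unicode output."""
    return dumps_sketch(value, ensure_ascii=False)


# Existing callers still get escaped output; the opted-in caller does not.
print(dumps_sketch(["é"]))
print(stringify_sketch(["é"]))
```

Keeping the default at True confines the behaviour change to the one call site that opts in.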

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File | Description
superset/utils/json.py | Threads ensure_ascii through the JSON wrapper so callers can control Unicode escaping.
superset/result_set.py | Uses ensure_ascii=False when stringifying nested values to keep Unicode readable.
tests/unit_tests/result_set_test.py | Adds coverage ensuring nested stringification preserves Unicode characters.

Comment thread superset/utils/json.py
Comment on lines 188 to 198
obj: Any,
default: Optional[Callable[[Any], Any]] = json_iso_dttm_ser,
allow_nan: bool = False,
ignore_nan: bool = True,
sort_keys: bool = False,
indent: Union[str, int, None] = None,
separators: Union[tuple[str, str], None] = None,
cls: Union[type[simplejson.JSONEncoder], None] = None,
encoding: Optional[str] = "utf-8",
ensure_ascii: bool = True,
) -> str:
Author

Added docstring entries for both encoding and ensure_ascii in superset.utils.json.dumps() in ff5ae81, so the new behavior is documented explicitly.

Comment on lines +150 to +174
@pytest.mark.parametrize(
    ("nested_value", "expected"),
    [
        pytest.param(
            ["ASCII", "plain text"],
            '["ASCII", "plain text"]',
            id="ascii",
        ),
        pytest.param(
            ["日本語", "ひらがな"],
            '["日本語", "ひらがな"]',
            id="japanese",
        ),
        pytest.param(
            ["móre", "áccent"],
            '["móre", "áccent"]',
            id="accented-latin",
        ),
        pytest.param(
            ["emoji", "😁"],
            '["emoji", "😁"]',
            id="emoji",
        ),
    ],
)
Author

Updated the test to avoid asserting exact JSON formatting in ff5ae81.

It now checks semantic equality via json.loads(), verifies that the serialized value does not contain \u escapes, and confirms that the Unicode characters are present in the raw string.
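
The three checks described in this reply could look roughly like the sketch below. It is not the test committed in ff5ae81: the helper name `assert_unicode_preserved` is hypothetical, and it assumes the expected strings contain no quotes or backslashes, which JSON would escape.

```python
import json


def assert_unicode_preserved(serialized: str, expected: list[str]) -> None:
    """Hypothetical helper mirroring the three checks described above."""
    # 1. Semantic equality, independent of separators or whitespace.
    assert json.loads(serialized) == expected
    # 2. No \uXXXX escape sequences in the raw serialized string.
    assert "\\u" not in serialized
    # 3. Each original string appears verbatim in the output.
    for item in expected:
        assert item in serialized


# Passes for output produced with ensure_ascii=False.
values = ["日本語", "😁"]
assert_unicode_preserved(json.dumps(values, ensure_ascii=False), values)
```

Comparing parsed values rather than exact strings keeps the test stable if the encoder's separators or spacing change.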

Comment thread tests/unit_tests/result_set_test.py Outdated
result_set = SupersetResultSet(data, description, BaseEngineSpec)
df = result_set.to_pandas_df()

assert df["tags"].iloc[0] == expected
Author

Updated the test to avoid asserting exact JSON formatting in ff5ae81.

It now checks semantic equality via json.loads(), verifies that the serialized value does not contain \u escapes, and confirms that the Unicode characters are present in the raw string.

@codecov

codecov Bot commented May 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 64.46%. Comparing base (b791f4c) to head (758fa01).
⚠️ Report is 236 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #39712      +/-   ##
==========================================
- Coverage   64.48%   64.46%   -0.02%     
==========================================
  Files        2566     2566              
  Lines      133926   133969      +43     
  Branches    31096    31104       +8     
==========================================
+ Hits        86357    86363       +6     
- Misses      46074    46111      +37     
  Partials     1495     1495              
Flag Coverage Δ
hive 39.74% <100.00%> (-0.03%) ⬇️
mysql 60.18% <100.00%> (-0.04%) ⬇️
postgres 60.26% <100.00%> (-0.04%) ⬇️
presto 41.51% <100.00%> (-0.03%) ⬇️
python 61.82% <100.00%> (-0.04%) ⬇️
sqlite 59.89% <100.00%> (-0.04%) ⬇️
unit 100.00% <ø> (ø)


@netlify

netlify Bot commented May 14, 2026

Deploy Preview for superset-docs-preview ready!

Name Link
🔨 Latest commit ff5ae81
🔍 Latest deploy log https://app.netlify.com/projects/superset-docs-preview/deploys/6a05135ccb69140009dbc0a3
😎 Deploy Preview https://deploy-preview-39712--superset-docs-preview.netlify.app

@bito-code-review
Contributor

bito-code-review Bot commented May 15, 2026

Code Review Agent Run #5ea529

Actionable Suggestions - 0
Review Details
  • Files reviewed - 3 · Commit Range: 758fa01..ff5ae81
    • superset/utils/json.py
    • tests/unit_tests/result_set_test.py
    • tests/unit_tests/utils/json_tests.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Labels

change:backend Requires changing the backend size/M

2 participants