Skip to content

fix(csv): apply CSV_EXPORT encoding explicitly, Werkzeug 3 removed Response.charset#40801

Open
Bungic wants to merge 2 commits into
apache:masterfrom
Bungic:fix/csv-export-encoding-werkzeug3
Open

fix(csv): apply CSV_EXPORT encoding explicitly, Werkzeug 3 removed Response.charset#40801
Bungic wants to merge 2 commits into
apache:masterfrom
Bungic:fix/csv-export-encoding-werkzeug3

Conversation

@Bungic
Copy link
Copy Markdown

@Bungic Bungic commented Jun 5, 2026

SUMMARY

CSV_EXPORT["encoding"] currently has no effect on CSV downloads. CsvResponse sets it through the Response.charset class attribute, but Werkzeug deprecated charset in 2.3 and removed it in 3.0 (changelog). Since Superset pins werkzeug==3.1.6, the attribute is dead code and str bodies are always encoded as plain utf-8, whatever the config says.

The painful part: the default config is CSV_EXPORT = {"encoding": "utf-8-sig"}, which promises a BOM that never arrives. Without it, Excel decodes exports with the local Windows codepage and garbles every non-ASCII character. #36374 (Chinese) and #29410 (Arabic) are this bug; we hit it with Turkish data in emailed report CSVs on a production 4.1.4 install, where it took a while to track down because the multi-query ZIP export path encodes manually (query_data.encode(encoding) in charts/data/api.py) and was never broken. Only single-query exports are affected.

This PR encodes str bodies in CsvResponse.__init__, reading the encoding at request time. Bytes bodies pass through unchanged, so the ZIP path keeps working as before. #29506 tried to solve this by pinning Werkzeug back to 2.x and was closed; fixing the response class keeps Werkzeug 3.

First bytes of /api/v1/chart/<id>/data/?format=csv with utf-8-sig configured, before:

b',Sipari\xc5\x9f Ta'             -> Excel renders "SipariÅŸ"

after:

b'\xef\xbb\xbf,Sipari\xc5\x9f'    -> Excel renders "Sipariş"

Fixes #36374

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

N/A, byte-level change (hex dumps above).

TESTING INSTRUCTIONS

  1. Keep CSV_EXPORT at its default.
  2. Export any chart with non-ASCII data as CSV, or download SQL Lab results as CSV.
  3. The file now starts with EF BB BF (xxd file.csv | head -1) and Excel opens it with correct characters.
  4. pytest tests/unit_tests/views/test_base.py -k csv_response

ADDITIONAL INFORMATION

  • Has associated issue: Fixes chinese characters appear garbled when export to CSV #36374
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

…sponse.charset

CsvResponse set the export encoding through the Response.charset class
attribute, which Werkzeug deprecated in 2.3 and removed in 3.0. With
werkzeug>=3 the override is ignored: str bodies are always plain utf-8
and the default utf-8-sig never emits its BOM, so Excel decodes exports
with the local codepage and garbles non-ASCII characters (apache#36374,
apache#29224, apache#29410).

Encode str bodies in CsvResponse.__init__ instead, reading the encoding
at request time. Bytes pass through unchanged, same as the multi-query
ZIP branch in charts/data/api.py which already encodes manually.

Fixes apache#36374
@dosubot dosubot Bot added the data:csv Related to import/export of CSVs label Jun 5, 2026
@bito-code-review
Copy link
Copy Markdown
Contributor

bito-code-review Bot commented Jun 5, 2026

Code Review Agent Run #c317f6

Actionable Suggestions - 0
Review Details
  • Files reviewed - 2 · Commit Range: 1d29552..1d29552
    • superset/views/base.py
    • tests/unit_tests/views/test_base.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.

Documentation & Help

AI Code Review powered by Bito Logo

Comment thread superset/views/base.py Outdated
Comment on lines +740 to +742
encoding = app.config["CSV_EXPORT"].get("encoding", "utf-8")
if isinstance(response, str):
response = response.encode(encoding)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: This now unconditionally reads CSV_EXPORT from config before checking the payload type. If CSV_EXPORT is absent in a minimal/custom app config, CsvResponse raises KeyError even for byte payloads that do not need re-encoding. Read the config defensively (with .get) and only resolve the encoding inside the str branch so byte responses remain pass-through. [incorrect condition logic]

Severity Level: Major ⚠️
- ❌ Chart CSV exports crash if CSV_EXPORT missing.
- ❌ SQL Lab CSV exports crash under misconfigured CSV_EXPORT.
- ⚠️ Reusing CsvResponse in minimal Flask apps unsafe.
- ⚠️ Byte payload responses can raise unexpected KeyError.
Steps of Reproduction ✅
1. Create a minimal Flask app modeled after `tests/unit_tests/sqllab/api_test.py:10-17`,
but omit the configuration line `app.config["CSV_EXPORT"] = {"encoding": "utf-8"}` so the
Flask app has no `CSV_EXPORT` key in its config.

2. Within this app (inside `app.app_context()`), import `CsvResponse` from
`superset/views/base.py` (class defined at `superset/views/base.py:728-744`) and construct
a response with a bytes payload, e.g. `CsvResponse(b"test")`, similar to how `CsvResponse`
is used in `tests/unit_tests/views/test_base.py:22-26`.

3. When `CsvResponse.__init__` runs at `superset/views/base.py:739-743`, it executes
`encoding = app.config["CSV_EXPORT"].get("encoding", "utf-8")` (line 740 in the PR hunk),
which attempts to index `app.config["CSV_EXPORT"]` before checking `isinstance(response,
str)` at line 741.

4. Because `app.config` has no `"CSV_EXPORT"` key in this minimal/custom setup, Python
raises a `KeyError` at `superset/views/base.py:740`, causing the request/test to fail even
though the payload is already bytes and does not need re-encoding; this demonstrates that
unconditional config access can crash CSV responses for byte payloads when `CSV_EXPORT` is
absent.

Fix in Cursor | Fix in VSCode Claude

(Use Cmd/Ctrl + Click for best experience)

Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** superset/views/base.py
**Line:** 740:742
**Comment:**
	*Incorrect Condition Logic: This now unconditionally reads `CSV_EXPORT` from config before checking the payload type. If `CSV_EXPORT` is absent in a minimal/custom app config, `CsvResponse` raises `KeyError` even for byte payloads that do not need re-encoding. Read the config defensively (with `.get`) and only resolve the encoding inside the `str` branch so byte responses remain pass-through.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fix
👍 | 👎

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CSV_EXPORT cannot be absent in a Superset app: it is defined unconditionally in superset/config.py, and every other read in the codebase subscripts it directly (charts/data/api.py, viz.py, common/query_context_processor.py, commands/sql_lab/export.py). Switching to .get("CSV_EXPORT", {}) here would imply the key is optional when it is not, and CsvResponse is internal to Superset rather than a generic Flask utility, so I would rather keep the access consistent and fail loudly on a broken config.

The bytes point is fair though: moved the config read inside the str branch in 7cb5d14, so byte payloads no longer touch config at all.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Customized review instruction saved!

Instruction:

Do not suggest making CSV_EXPORT optional in CsvResponse; it is required Superset config, so keep direct access and fail loudly on broken config.

Applied to:

  • superset/views/base.py

💡 To manage or update this instruction, visit: CodeAnt AI Settings

@bito-code-review
Copy link
Copy Markdown
Contributor

The flagged issue is correct. Accessing app.config["CSV_EXPORT"] directly will raise a KeyError if the key is missing in the application configuration. To resolve this, you should use .get() on the app.config object to safely retrieve the dictionary, and then access the encoding key from that result.

Here is the corrected implementation for superset/views/base.py:

    def __init__(self, response: Any = None, *args: Any, **kwargs: Any) -> None:
        csv_export = app.config.get("CSV_EXPORT", {})
        encoding = csv_export.get("encoding", "utf-8")
        if isinstance(response, str):
            response = response.encode(encoding)
        super().__init__(response, *args, **kwargs)

There are no other review comments in this pull request. Would you like me to proceed with any other changes?

superset/views/base.py

def __init__(self, response: Any = None, *args: Any, **kwargs: Any) -> None:
        csv_export = app.config.get("CSV_EXPORT", {})
        encoding = csv_export.get("encoding", "utf-8")
        if isinstance(response, str):
            response = response.encode(encoding)
        super().__init__(response, *args, **kwargs)

@Bungic
Copy link
Copy Markdown
Author

Bungic commented Jun 5, 2026

CSV_EXPORT is defined unconditionally in superset/config.py and direct subscript is the existing pattern everywhere else it is read (charts/data/api.py, viz.py, commands/sql_lab/export.py), so a missing key is not a reachable state in a Superset app and I would rather stay consistent with that. What I did take from the review: the config read now happens only inside the str branch (7cb5d14), so byte payloads pass through without touching config.

@bito-code-review
Copy link
Copy Markdown
Contributor

bito-code-review Bot commented Jun 5, 2026

Code Review Agent Run #50cf1a

Actionable Suggestions - 0
Review Details
  • Files reviewed - 1 · Commit Range: 1d29552..7cb5d14
    • superset/views/base.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.

Documentation & Help

AI Code Review powered by Bito Logo

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes CSV download encoding regression under Werkzeug 3 by explicitly applying CSV_EXPORT["encoding"] when CsvResponse is constructed with a str body, restoring the expected BOM behavior for the default utf-8-sig configuration (notably for Excel compatibility).

Changes:

  • Encode str payloads inside CsvResponse.__init__ using the configured CSV_EXPORT["encoding"], while leaving bytes payloads unchanged.
  • Add unit tests to verify configured encoding is applied for str bodies and that bytes bodies pass through untouched.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
superset/views/base.py Updates CsvResponse to encode str bodies using CSV_EXPORT["encoding"] (Werkzeug 3 compatible).
tests/unit_tests/views/test_base.py Adds regression tests covering encoding application for str payloads and passthrough for bytes.

@codecov
Copy link
Copy Markdown

codecov Bot commented Jun 5, 2026

Codecov Report

❌ Patch coverage is 40.00000% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.07%. Comparing base (45a6164) to head (7cb5d14).
⚠️ Report is 8 commits behind head on master.

Files with missing lines Patch % Lines
superset/views/base.py 40.00% 2 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #40801      +/-   ##
==========================================
- Coverage   64.07%   64.07%   -0.01%     
==========================================
  Files        2664     2664              
  Lines      143831   143835       +4     
  Branches    33084    33085       +1     
==========================================
+ Hits        92160    92161       +1     
- Misses      50062    50064       +2     
- Partials     1609     1610       +1     
Flag Coverage Δ
hive 39.74% <20.00%> (-0.01%) ⬇️
mysql 58.42% <40.00%> (-0.01%) ⬇️
postgres 58.49% <40.00%> (-0.01%) ⬇️
presto 41.34% <20.00%> (-0.01%) ⬇️
python 59.96% <40.00%> (-0.01%) ⬇️
sqlite 58.12% <40.00%> (-0.01%) ⬇️
unit 100.00% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

data:csv Related to import/export of CSVs size/M

Projects

None yet

Development

Successfully merging this pull request may close these issues.

chinese characters appear garbled when export to CSV

2 participants