fix(csv): apply CSV_EXPORT encoding explicitly, Werkzeug 3 removed Response.charset#40801
fix(csv): apply CSV_EXPORT encoding explicitly, Werkzeug 3 removed Response.charset#40801Bungic wants to merge 2 commits into
Conversation
…sponse.charset CsvResponse set the export encoding through the Response.charset class attribute, which Werkzeug deprecated in 2.3 and removed in 3.0. With werkzeug>=3 the override is ignored: str bodies are always plain utf-8 and the default utf-8-sig never emits its BOM, so Excel decodes exports with the local codepage and garbles non-ASCII characters (apache#36374, apache#29224, apache#29410). Encode str bodies in CsvResponse.__init__ instead, reading the encoding at request time. Bytes pass through unchanged, same as the multi-query ZIP branch in charts/data/api.py which already encodes manually. Fixes apache#36374
Code Review Agent Run #c317f6Actionable Suggestions - 0Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
| encoding = app.config["CSV_EXPORT"].get("encoding", "utf-8") | ||
| if isinstance(response, str): | ||
| response = response.encode(encoding) |
There was a problem hiding this comment.
Suggestion: This now unconditionally reads CSV_EXPORT from config before checking the payload type. If CSV_EXPORT is absent in a minimal/custom app config, CsvResponse raises KeyError even for byte payloads that do not need re-encoding. Read the config defensively (with .get) and only resolve the encoding inside the str branch so byte responses remain pass-through. [incorrect condition logic]
Severity Level: Major ⚠️
- ❌ Chart CSV exports crash if CSV_EXPORT missing.
- ❌ SQL Lab CSV exports crash under misconfigured CSV_EXPORT.
- ⚠️ Reusing CsvResponse in minimal Flask apps unsafe.
- ⚠️ Byte payload responses can raise unexpected KeyError.Steps of Reproduction ✅
1. Create a minimal Flask app modeled after `tests/unit_tests/sqllab/api_test.py:10-17`,
but omit the configuration line `app.config["CSV_EXPORT"] = {"encoding": "utf-8"}` so the
Flask app has no `CSV_EXPORT` key in its config.
2. Within this app (inside `app.app_context()`), import `CsvResponse` from
`superset/views/base.py` (class defined at `superset/views/base.py:728-744`) and construct
a response with a bytes payload, e.g. `CsvResponse(b"test")`, similar to how `CsvResponse`
is used in `tests/unit_tests/views/test_base.py:22-26`.
3. When `CsvResponse.__init__` runs at `superset/views/base.py:739-743`, it executes
`encoding = app.config["CSV_EXPORT"].get("encoding", "utf-8")` (line 740 in the PR hunk),
which attempts to index `app.config["CSV_EXPORT"]` before checking `isinstance(response,
str)` at line 741.
4. Because `app.config` has no `"CSV_EXPORT"` key in this minimal/custom setup, Python
raises a `KeyError` at `superset/views/base.py:740`, causing the request/test to fail even
though the payload is already bytes and does not need re-encoding; this demonstrates that
unconditional config access can crash CSV responses for byte payloads when `CSV_EXPORT` is
absent.Fix in Cursor | Fix in VSCode Claude
(Use Cmd/Ctrl + Click for best experience)
Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** superset/views/base.py
**Line:** 740:742
**Comment:**
*Incorrect Condition Logic: This now unconditionally reads `CSV_EXPORT` from config before checking the payload type. If `CSV_EXPORT` is absent in a minimal/custom app config, `CsvResponse` raises `KeyError` even for byte payloads that do not need re-encoding. Read the config defensively (with `.get`) and only resolve the encoding inside the `str` branch so byte responses remain pass-through.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
Once fix is implemented, also check other comments on the same PR, and ask user if the user wants to fix the rest of the comments as well. if said yes, then fetch all the comments validate the correctness and implement a minimal fixThere was a problem hiding this comment.
CSV_EXPORT cannot be absent in a Superset app: it is defined unconditionally in superset/config.py, and every other read in the codebase subscripts it directly (charts/data/api.py, viz.py, common/query_context_processor.py, commands/sql_lab/export.py). Switching to .get("CSV_EXPORT", {}) here would imply the key is optional when it is not, and CsvResponse is internal to Superset rather than a generic Flask utility, so I would rather keep the access consistent and fail loudly on a broken config.
The bytes point is fair though: moved the config read inside the str branch in 7cb5d14, so byte payloads no longer touch config at all.
There was a problem hiding this comment.
✅ Customized review instruction saved!
Instruction:
Do not suggest making CSV_EXPORT optional in CsvResponse; it is required Superset config, so keep direct access and fail loudly on broken config.
Applied to:
superset/views/base.py
💡 To manage or update this instruction, visit: CodeAnt AI Settings
|
The flagged issue is correct. Accessing Here is the corrected implementation for def __init__(self, response: Any = None, *args: Any, **kwargs: Any) -> None:
csv_export = app.config.get("CSV_EXPORT", {})
encoding = csv_export.get("encoding", "utf-8")
if isinstance(response, str):
response = response.encode(encoding)
super().__init__(response, *args, **kwargs)There are no other review comments in this pull request. Would you like me to proceed with any other changes? superset/views/base.py |
|
CSV_EXPORT is defined unconditionally in superset/config.py and direct subscript is the existing pattern everywhere else it is read (charts/data/api.py, viz.py, commands/sql_lab/export.py), so a missing key is not a reachable state in a Superset app and I would rather stay consistent with that. What I did take from the review: the config read now happens only inside the str branch (7cb5d14), so byte payloads pass through without touching config. |
Code Review Agent Run #50cf1aActionable Suggestions - 0Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
There was a problem hiding this comment.
Pull request overview
Fixes CSV download encoding regression under Werkzeug 3 by explicitly applying CSV_EXPORT["encoding"] when CsvResponse is constructed with a str body, restoring the expected BOM behavior for the default utf-8-sig configuration (notably for Excel compatibility).
Changes:
- Encode
strpayloads insideCsvResponse.__init__using the configuredCSV_EXPORT["encoding"], while leavingbytespayloads unchanged. - Add unit tests to verify configured encoding is applied for
strbodies and thatbytesbodies pass through untouched.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
superset/views/base.py |
Updates CsvResponse to encode str bodies using CSV_EXPORT["encoding"] (Werkzeug 3 compatible). |
tests/unit_tests/views/test_base.py |
Adds regression tests covering encoding application for str payloads and passthrough for bytes. |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #40801 +/- ##
==========================================
- Coverage 64.07% 64.07% -0.01%
==========================================
Files 2664 2664
Lines 143831 143835 +4
Branches 33084 33085 +1
==========================================
+ Hits 92160 92161 +1
- Misses 50062 50064 +2
- Partials 1609 1610 +1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
SUMMARY
CSV_EXPORT["encoding"]currently has no effect on CSV downloads.CsvResponsesets it through theResponse.charsetclass attribute, but Werkzeug deprecatedcharsetin 2.3 and removed it in 3.0 (changelog). Since Superset pinswerkzeug==3.1.6, the attribute is dead code and str bodies are always encoded as plain utf-8, whatever the config says.The painful part: the default config is
CSV_EXPORT = {"encoding": "utf-8-sig"}, which promises a BOM that never arrives. Without it, Excel decodes exports with the local Windows codepage and garbles every non-ASCII character. #36374 (Chinese) and #29410 (Arabic) are this bug; we hit it with Turkish data in emailed report CSVs on a production 4.1.4 install, where it took a while to track down because the multi-query ZIP export path encodes manually (query_data.encode(encoding)incharts/data/api.py) and was never broken. Only single-query exports are affected.This PR encodes str bodies in
CsvResponse.__init__, reading the encoding at request time. Bytes bodies pass through unchanged, so the ZIP path keeps working as before. #29506 tried to solve this by pinning Werkzeug back to 2.x and was closed; fixing the response class keeps Werkzeug 3.First bytes of
/api/v1/chart/<id>/data/?format=csvwithutf-8-sigconfigured, before:after:
Fixes #36374
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
N/A, byte-level change (hex dumps above).
TESTING INSTRUCTIONS
CSV_EXPORTat its default.EF BB BF(xxd file.csv | head -1) and Excel opens it with correct characters.pytest tests/unit_tests/views/test_base.py -k csv_responseADDITIONAL INFORMATION