Bug description
In Superset 6.1.0, the new streaming CSV export pipeline introduced by #35478 ("feat(streaming): Streaming CSV uploads for over 100k records for constant memory usage") bypasses Superset's standard query-preparation pipeline. This produces two distinct regressions, both reproducible against Trino.
Bug 1 — CSV exports crash on Trino with __STREAM_ERROR__
The streaming path in superset/commands/streaming_export/base.py::_execute_query_and_stream sends raw chart SQL directly to engine.execute(text(sql)) without running it through database.mutate_sql_based_on_config() first. The SQL Superset generates for a chart ends with a LIMIT N; line — and Trino's HTTP statement endpoint rejects trailing semicolons as mismatched input ';'. Expecting: <EOF>.
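To make the failure mode concrete, here is a minimal sketch of the kind of normalization the raw SQL is missing before it reaches Trino. The helper name `strip_trailing_semicolons` is hypothetical and is not Superset code; routing the streaming path through the standard query-preparation pipeline is still the right fix, and this only illustrates why the trailing `;` matters.

```python
import re


def strip_trailing_semicolons(sql: str) -> str:
    """Drop trailing semicolons and whitespace from a single SQL statement.

    Hypothetical helper for illustration only; it is not part of Superset.
    """
    return re.sub(r"[;\s]+\Z", "", sql)


# Shape of the statement the streaming path sends (trailing semicolon included).
chart_sql = "SELECT category\nFROM orders\nLIMIT 500000;"
cleaned = strip_trailing_semicolons(chart_sql)
```

With the semicolon removed, the last line matches what the non-streaming path sends (`LIMIT 500000`).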
Because the streaming response has already flushed headers by the time the exception fires, Flask cannot change the status code. The generator instead writes the sentinel string __STREAM_ERROR__: Export failed. Please try again in some time. (63 bytes) into the response body and closes the stream. The user receives an HTTP 200 with that text inside what should have been their CSV file. The frontend has no way to distinguish this from a successful download.
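The behaviour described above can be reproduced with a toy generator. This is a sketch under stated assumptions, not Superset's implementation: `simulated_csv_stream` and `is_failed_export` are hypothetical names, and only the sentinel text is taken from the actual download.

```python
from typing import Iterable, Iterator

STREAM_ERROR_PREFIX = b"__STREAM_ERROR__"
STREAM_ERROR_BODY = b"__STREAM_ERROR__: Export failed. Please try again in some time."


def simulated_csv_stream(rows: Iterable[str], fail: bool) -> Iterator[bytes]:
    """Toy stand-in for a streaming CSV generator (not Superset code)."""
    try:
        if fail:
            # Stands in for the database error raised when the query executes.
            raise RuntimeError("mismatched input ';'. Expecting: <EOF>")
        for row in rows:
            yield (row + "\n").encode("utf-8")
    except Exception:
        # The 200 status and headers are already sent at this point, so the
        # body is the only channel left for reporting the error.
        yield STREAM_ERROR_BODY


def is_failed_export(body: bytes) -> bool:
    """Return True when a downloaded 'CSV' starts with the error sentinel."""
    return body.lstrip().startswith(STREAM_ERROR_PREFIX)


failed = b"".join(simulated_csv_stream(["a,b", "1,2"], fail=True))
ok = b"".join(simulated_csv_stream(["a,b", "1,2"], fail=False))
```

A check like `is_failed_export` is a stopgap for scripts that consume these downloads. It only covers the case reported here, where the sentinel is the entire body.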
Bug 2 — User impersonation is bypassed
On databases configured with impersonate_user: true (Trino, Presto, etc.), every other Superset execution site acquires the engine via database.get_sqla_engine_with_context(user_name=…) so the end user's identity is forwarded as the X-Trino-User header. The streaming export path acquires its engine without this context and runs every query as the service principal.
Consequences:
- Audit trail broken: every CSV export, from every user, shows up in the Trino query log as the service account.
- Resource-group routing broken: exports no longer route to the user's configured Trino resource group.
- Possible authorization bypass: engines that key per-user authz off X-Trino-User (Ranger, OPA, file-based ACLs, row/column-level security via session-aware views) will see the service account on the streaming path. A Superset user may be able to export data via "Download CSV" that they are not permitted to read via SQL Lab.
Bug 1 is the visible crash. Bug 2 is independently reproducible — even with bug 1 patched, every CSV in the Trino query log is misattributed.
The non-streaming export paths (Excel export, SQL Lab, /api/v1/chart/data JSON renders) are unaffected because they go through the proper pipeline.
How to reproduce the bug
- Connect Superset 6.1.0 to a Trino cluster with impersonate_user: true.
- Create a dashboard tile or standalone chart backed by a Trino dataset.
- As any logged-in OAuth user (not the service principal), click … → Download → Export to CSV.
- Open the downloaded file.
- Open the Trino UI / query history and locate the corresponding query.
Expected
- The CSV contains the chart's data.
- The Trino query record shows User: <logged-in user>, the user's normal resource group, and the database's default schema.
Actual
- The downloaded file is 63 bytes and contains only:
__STREAM_ERROR__: Export failed. Please try again in some time.
- The Trino query record shows:
Error Type: USER_ERROR
Error Code: SYNTAX_ERROR (1)
Message: line N:13: mismatched input ';'. Expecting: <EOF>
User: <service principal> (not the end user)
Resource Group: n/a
Schema: <empty>
Performing the same action with Export to Excel instead of Export to CSV works correctly and shows the end user, the right resource group, the default schema, and a sqlglot-reformatted SQL body.
Side-by-side evidence
Same chart, same user, two consecutive export attempts seconds apart.
Failing CSV export — streaming path
User: superset
Principal: superset
Source: Apache Superset
Catalog: my_catalog
Schema: (empty)
Resource Group: n/a
Status: USER_ERROR / SYNTAX_ERROR
SQL (last line): LIMIT 500000;
SQL form: raw, lowercase keywords, DATE '2026-05-20'
Succeeding Excel export — non-streaming path
User: analyst@example.com <-- end user via X-Trino-User
Principal: superset
Source: Apache Superset
Catalog: my_catalog
Schema: my_schema
Resource Group: analysts
Status: FINISHED
SQL (last line): LIMIT 500000
SQL form: uppercased keywords, CAST('2026-05-20' AS DATE)
Both SQL strings are derived from the same chart definition. The differences (trailing ;, missing sqlglot reformat, missing schema context, missing user impersonation) are all consequences of the streaming path skipping mutate_sql_based_on_config() and get_sqla_engine_with_context(user_name=…).
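For anyone who wants to compare their own query records the same way, a small sketch follows. The dictionaries are hand-transcribed from the two records above, and the dict layout and the `differing_fields` helper are assumptions for illustration, not a Trino or Superset API.

```python
from typing import Dict, Tuple

# Hand-transcribed from the Trino UI records above; the dict shape is an assumption.
csv_export = {
    "User": "superset",
    "Principal": "superset",
    "Source": "Apache Superset",
    "Catalog": "my_catalog",
    "Schema": "",
    "Resource Group": "n/a",
    "Status": "USER_ERROR / SYNTAX_ERROR",
}
excel_export = {
    "User": "analyst@example.com",
    "Principal": "superset",
    "Source": "Apache Superset",
    "Catalog": "my_catalog",
    "Schema": "my_schema",
    "Resource Group": "analysts",
    "Status": "FINISHED",
}


def differing_fields(a: Dict[str, str], b: Dict[str, str]) -> Dict[str, Tuple[str, str]]:
    """Return the fields whose values differ between two query records."""
    return {key: (a[key], b.get(key, "")) for key in a if a[key] != b.get(key)}


diff = differing_fields(csv_export, excel_export)

# When impersonation is skipped, the query user equals the service principal.
impersonation_skipped = csv_export["User"] == csv_export["Principal"]
```

Here `diff` covers User, Schema, Resource Group and Status, while Principal, Source and Catalog are identical across both paths.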
Minimal SQL illustrating the difference
What the streaming CSV path sends to Trino (fails):
SELECT category AS category, region AS region, sum(amount) AS "SUM(amount)"
FROM (select date, order_id, region, amount, category
from my_catalog.my_schema.orders) AS virtual_table
WHERE date >= DATE '2026-05-20' AND date < DATE '2026-05-27'
AND amount > 100 AND region IS NOT NULL
GROUP BY category, region
ORDER BY "SUM(amount)" DESC
LIMIT 500000;
What the non-streaming Excel path sends to Trino (works):
SELECT
category AS category,
region AS region,
SUM(amount) AS "SUM(amount)"
FROM (
SELECT date, order_id, region, amount, category
FROM my_catalog.my_schema.orders
) AS virtual_table
WHERE
date >= CAST('2026-05-20' AS DATE)
AND date < CAST('2026-05-27' AS DATE)
AND amount > 100
AND NOT region IS NULL
GROUP BY category, region
ORDER BY "SUM(amount)" DESC
LIMIT 500000
Stack trace
ERROR:superset.commands.streaming_export.base:Traceback: Traceback (most recent call last):
File ".../sqlalchemy/engine/base.py", line 1910, in _execute_context
self.dialect.do_execute(
File ".../trino/sqlalchemy/dialect.py", line 442, in do_execute
cursor.execute(statement, parameters)
File ".../trino/dbapi.py", line 640, in execute
self._iterator = iter(self._query.execute())
File ".../trino/client.py", line 938, in execute
self._result.rows += self.fetch()
File ".../trino/client.py", line 958, in fetch
status = self._request.process(response)
File ".../trino/client.py", line 727, in process
raise self._process_error(response["error"], response.get("id"))
trino.exceptions.TrinoUserError: TrinoUserError(type=USER_ERROR, name=SYNTAX_ERROR,
message="line 24:13: mismatched input ';'. Expecting: <EOF>", query_id=...)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/app/superset/commands/streaming_export/base.py", line 225, in csv_generator
yield from self._execute_query_and_stream(sql, database, limit)
File "/app/superset/commands/streaming_export/base.py", line 168, in _execute_query_and_stream
).execute(text(sql))
...
sqlalchemy.exc.ProgrammingError: (trino.exceptions.TrinoUserError) TrinoUserError(
type=USER_ERROR, name=SYNTAX_ERROR,
message="line 24:13: mismatched input ';'. Expecting: <EOF>", query_id=...)
Trino-side parser stack (from the corresponding query in the Trino UI):
io.trino.sql.parser.ParsingException: line 24:13: mismatched input ';'. Expecting: <EOF>
at io.trino.sql.parser.ErrorHandler.syntaxError(ErrorHandler.java:108)
...
at io.trino.dispatcher.DispatchManager.createQueryInternal(DispatchManager.java:225)
Environment
- Superset version: 6.1.0
- Database engine: Trino 480 (trino-python-client via SQLAlchemy)
- DB connection setting: impersonate_user: true
- Python: 3.10
- Deployment: Helm chart on Kubernetes
- Auth: OAuth2
Severity
I'd argue this is release-blocker class for two reasons:
- Functional: every dashboard/chart CSV export against Trino or Presto in 6.1.0 is broken, with no in-UI signal of failure (HTTP 200 + sentinel text inside the file).
- Security: missing impersonation may silently bypass per-user authorization on deployments that key Trino authz off X-Trino-User. Any deployment using Ranger / OPA / file-based ACLs / RLS views with Superset + Trino should validate before upgrading.
Screenshots/recordings
No response
Superset version
master / latest-dev
Python version
3.10
Node version
I don't know
Browser
Chrome
Additional context
No response
Checklist