Skip to content

fix(sql): omit connection URL from read_sql error messages and __repr__#6933

Merged
madvart merged 5 commits into
mainfrom
varun/redact-creds-special-chars
May 13, 2026
Merged

fix(sql): omit connection URL from read_sql error messages and __repr__#6933
madvart merged 5 commits into
mainfrom
varun/redact-creds-special-chars

Conversation

@madvart
Copy link
Copy Markdown
Contributor

@madvart madvart commented May 13, 2026

Why

A user reported credentials still leaking in daft.read_sql errors on v0.7.11 even after the original redactor in #6902. The URL shape from their stack trace:

trino://<USER>@<HOST>:<PORT>?auth=jwt&access_token=<SECRET>&http_scheme=https

There's no user:password@host here — the secret lives in access_token= as a query parameter. The previous fix only redacted userinfo passwords, so the JWT was emitted in plaintext in the Failed to execute sql: ... from connection: ... error.

Separately, urllib.parse.urlparse silently mis-parses passwords containing # / / / ? (it pushes them into the fragment field), so parsed.password came back None and the old _redact_url returned the URL unchanged for inputs like trino://alice:p#ss@host/db.

I initially tried to plug both holes inside _redact_url (sqlalchemy make_url as the parser, plus query-parameter redaction by sensitive-name substring). Reviewers (rchowell) pushed back: "secrets appear in many places around the connection string and are not limited to query params — we probably shouldn't log it". Agreed. Trying to enumerate every shape a credential can take in a connection URL is a losing game.

What this PR actually does

Stop echoing the connection URL in error messages and in SQLConnection.__repr__. That's it.

  • RuntimeError(f\"Failed to execute sql: {sql} from connection: {self.conn}, error: {e}\")RuntimeError(f\"Failed to execute sql: {sql}, error: {e}\"). The caller knows which connection they passed in; the URL is redundant in the message.
  • SQLConnection.__repr__ now returns SQLConnection(dialect='trino', driver='') instead of the URL.

Removed alongside: _redact_url, _SENSITIVE_PARAM_KEYWORDS, _is_sensitive_param, the opportunistic SQLAlchemy make_url import, and the urllib.parse fallback with its @-in-authority heuristic — none of it is needed once the URL itself is not echoed.

After the fix

Failed to execute sql: SELECT * FROM (SELECT * FROM iceberg.namespace.\"table\") AS subquery LIMIT 10,
  error: (trino.exceptions.TrinoConnectionError) failed to execute: HTTPSConnectionPool(...): ...
[SQL: SELECT * FROM (SELECT * FROM iceberg.namespace.\"table\") AS subquery LIMIT 10]

No URL, no userinfo, no query parameters.

Verified end-to-end

Tested locally against the customer's exact URL shape plus two regressions:

Shape Secret value Present in error?
Trino JWT (?access_token=<JWT>&auth=jwt) the JWT ✅ No
Userinfo password (alice:hunter2@...) hunter2 ✅ No
Password with # (alice:p#ss@...) p#ss ✅ No

Tests

tests/io/test_sql.py:

  • test_execute_sql_error_does_not_leak_credentials — parametrized across the three leak shapes; asserts the secret is not in the raised RuntimeError.
  • test_repr_does_not_leak_url — asserts repr(conn) contains neither the secret, the host, nor the username.

Related

Follow-up to #6902 / issue #6903. Customer-reported continued leak after v0.7.11.

🤖 Generated with Claude Code

…sswords

A customer hit two cases the original redaction missed:

1. Unescaped URL-special characters in a password (e.g. `#`, `/`, `?`).
   `urllib.parse.urlparse("trino://alice:p#ss@host:443/db")` puts the rest
   of the URL into `fragment`, so `parsed.password` is None and the
   redactor returned the URL unchanged, leaking `p#ss`.

2. Credentials passed as query parameters rather than userinfo (Trino JWT
   `?auth=jwt&access_token=...`, Snowflake `?private_key_passphrase=...`,
   Databricks PAT, generic API keys). The old code only redacted
   `user:password@host` and left query-param secrets in plaintext.

Switch the primary parser to SQLAlchemy's `make_url`, which handles
case 1 robustly (it accepts unescaped URL-special chars in passwords),
and additionally redact known sensitive query parameter values
(name contains `password`, `pwd`, `secret`, `token`, `passphrase`,
`apikey`/`api_key`, or `credential`, case-insensitive) before rendering.
Keep the urllib.parse path as a fallback for environments without
SQLAlchemy (connectorx-only); the fallback also rewrites sensitive query
params and over-redacts to `<redacted>` when an `@` appears in the
authority but urlparse can't extract a password.

Regression tests cover all reported leak paths: passwords with `#`/`/`/
`?`, all common sensitive query-param names, the customer's exact JWT
URL shape, and case-insensitive matching. Verified end-to-end with a
real `daft.read_sql(...)` call: the customer's `access_token` value no
longer appears in the raised `RuntimeError`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added the fix label May 13, 2026
@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 13, 2026

Rust Dependency Diff

Head: 41ff844dac8b65abc472e3611ba17750b454e3ab vs Base: f7f8c12dcd4428cf598fa64e8db31e7c3efe816c.

OK: Within budget.

  • New Crates: 0
  • Removed Crates: 0

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 13, 2026

Greptile Summary

This PR closes credential-leak vectors in daft.read_sql errors that the original fix (#6902) missed. Rather than attempting smart field-by-field redaction, the implementation takes the more robust approach of dropping the connection URL entirely from error messages and __repr__, since secrets can appear in userinfo, query parameters, or driver-specific extras.

  • _redact_url is removed; __repr__ now shows only dialect and driver; both _execute_sql_query_with_connectorx and _execute_sql_query_with_sqlalchemy emit Failed to execute sql: {sql}, error: {e} with no URL.
  • Tests are restructured around the new contract: test_execute_sql_error_does_not_leak_credentials (parametrized over password, JWT token, and special-char password cases) and test_repr_does_not_leak_url replace the old redaction unit tests.

Confidence Score: 5/5

Safe to merge; the fix correctly eliminates all explicit URL echoing from error messages and repr, and no existing behavior is regressed.

The core change is removing the connection URL from error messages and repr entirely, which is the simplest and most robust mitigation. Both execution paths are covered, and no sensitive data is now added explicitly to any error string.

tests/io/test_sql.py — the Trino JWT parametrize case shares a psycopg2 skip guard it does not actually require

Important Files Changed

Filename Overview
daft/sql/sql_connection.py Removes _redact_url entirely and drops the connection URL from all error messages and repr; simpler and more robust than field-specific redaction
tests/io/test_sql.py Replaces old redaction unit tests with connection-error leak checks; the Trino JWT parametrize case shares a psycopg2 skip guard it doesn't need

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[execute_sql_query called] --> B{_should_use_connectorx?}
    B -- "Yes" --> C[_execute_sql_query_with_connectorx]
    B -- "No" --> D[_execute_sql_query_with_sqlalchemy]
    C -- success --> E[Return pa.Table]
    C -- exception --> F["RuntimeError: Failed to execute sql, error — URL omitted"]
    D -- success --> E
    D -- exception --> F
    style F fill:#fdd,stroke:#900
Loading

Reviews (4): Last reviewed commit: "fix(sql): also drop URL from SQLConnecti..." | Re-trigger Greptile

Comment thread daft/sql/sql_connection.py Outdated
Comment thread daft/sql/sql_connection.py Outdated
@madvart
Copy link
Copy Markdown
Contributor Author

madvart commented May 13, 2026

@greptile-apps re-review

…e load

Addresses Greptile review on #6933:

1. Fallback over-redaction was firing on legitimate no-password URLs like
   `mysql://user@host/db` when SQLAlchemy was unavailable, diverging from
   the primary SA path. Gate the heuristic on `parsed.username is None` —
   the actual signal that the URL's userinfo wasn't parseable (e.g.
   unescaped special chars push the `@` into the fragment so urlparse
   can't extract the username). Test
   `test_redact_url_fallback_when_sqlalchemy_unavailable` pins both
   branches.

2. Move `from sqlalchemy.engine import make_url` to a module-level
   try/except. `_redact_url` runs on every error message and uses SA
   opportunistically; module-level avoids per-call import lookup. The
   other inline SA imports in this file stay inline — they sit in
   methods that conditionally take the SA code path, so they pay nothing
   in connectorx-only environments.

3. Strip `%2A%2A%2A` percent-encoding from the fallback path's
   query-rebuild output, matching the primary path so the placeholder
   stays as readable `***` in error messages.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codecov
Copy link
Copy Markdown

codecov Bot commented May 13, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.05%. Comparing base (f7f8c12) to head (c28ca58).

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #6933      +/-   ##
==========================================
+ Coverage   75.70%   76.05%   +0.34%     
==========================================
  Files        1138     1138              
  Lines      161543   161511      -32     
==========================================
+ Hits       122304   122835     +531     
+ Misses      39239    38676     -563     
Files with missing lines Coverage Δ
daft/sql/sql_connection.py 80.00% <100.00%> (-2.46%) ⬇️

... and 36 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Comment thread daft/sql/sql_connection.py Outdated
…ckstop

Secrets can appear anywhere in a SQL connection URL — userinfo
password, query parameters (Trino JWT, Snowflake key auth, Databricks
PAT), JDBC `;property=value` strings, driver extras. Per-form
redaction is fragile because every new shape is a new leak waiting
to happen. Drop the URL from `Failed to execute sql: ...` error
messages entirely; the caller knows which connection they passed in,
so the URL is redundant.

Also add a Spark-style text-level redaction backstop, applied to the
full constructed error message before raising. Catches credentials
that might appear in the wrapped underlying-exception text (e.g. a
driver echoing back a connection string with `password=...` in its
own error). Regex matches `name<sep>value` where the name contains
any of: password, pwd, secret, token, passphrase, apikey/api_key,
credential, signature, access_key. Mirrors Apache Spark's
`spark.redaction.regex` design.

`_redact_url` is kept (it still gates `SQLConnection.__repr__`); the
URL-shape-specific redaction tests stay valid for that. The
error-path tests now assert that *no* form of the URL appears in the
raised RuntimeError.

End-to-end verified against the customer's exact URL shape
(`trino://alice@host?auth=jwt&access_token=<JWT>&...`): the
`access_token` value no longer appears in the raised RuntimeError.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@madvart
Copy link
Copy Markdown
Contributor Author

madvart commented May 13, 2026

@greptile-apps re-review

Varun Madan and others added 2 commits May 13, 2026 16:11
Following review on #6933: the primary leak vector was our own error
message echoing the connection URL. Dropping the URL from the message
closes that case completely; the additional `_redact_text` Spark-style
regex was defense-in-depth for hypothetical driver-side leaks that
we haven't actually observed, and adds ~50 lines plus its own
false-positive surface. Simpler is better here.

Also drop the now-unused `# type: ignore[assignment]` on the optional
SQLAlchemy import — mypy in CI flagged it as unused.

`_redact_url` is unchanged (still gates `SQLConnection.__repr__`)
and its tests (special chars in passwords, sensitive query params,
SQLAlchemy-unavailable fallback) remain.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Following the same logic as the error-message change: if echoing the
connection URL is unsafe because secrets can appear anywhere in it,
the same applies to `__repr__` — any caller that does `f"{conn}"` or
logs `SQLConnection` would leak. Show only dialect/driver instead.

This removes the only remaining consumer of `_redact_url`, so the
helper, the `_SENSITIVE_PARAM_KEYWORDS` list, `_is_sensitive_param`,
the SQLAlchemy `make_url` import, the urllib.parse fallback, and
their tests all go away. Net diff is strongly negative.

Tests now assert the actual contract: the raised RuntimeError and
`repr(conn)` must not contain any secret value — userinfo password,
URL-special-char password, or `access_token` query parameter
(customer-reported case).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@madvart
Copy link
Copy Markdown
Contributor Author

madvart commented May 13, 2026

@greptile-apps re-review

Comment on lines +152 to +153
# mitigation. The caller knows which connection they passed in,
# so the URL is redundant here.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep this way my thinking, "The caller knows which connection they passed in" -- if they want to debug further then the responsible handling is on them. We aren't going to let the cat out of the bag on their behalf. Nice catch on the repr too

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 13, 2026

Want your agent to iterate on Greptile's feedback? Try greploops.

@madvart madvart changed the title fix(sql): redact sensitive query params and special-char passwords fix(sql): omit connection URL from read_sql error messages and __repr__ May 13, 2026
@madvart madvart merged commit c09136a into main May 13, 2026
40 checks passed
@madvart madvart deleted the varun/redact-creds-special-chars branch May 13, 2026 23:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants