fix(sql): omit connection URL from read_sql error messages and __repr__#6933
Conversation
…sswords
A customer hit two cases the original redaction missed:
1. Unescaped URL-special characters in a password (e.g. `#`, `/`, `?`).
`urllib.parse.urlparse("trino://alice:p#ss@host:443/db")` puts the rest
of the URL into `fragment`, so `parsed.password` is None and the
redactor returned the URL unchanged, leaking `p#ss`.
2. Credentials passed as query parameters rather than userinfo (Trino JWT
`?auth=jwt&access_token=...`, Snowflake `?private_key_passphrase=...`,
Databricks PAT, generic API keys). The old code only redacted
`user:password@host` and left query-param secrets in plaintext.
Switch the primary parser to SQLAlchemy's `make_url`, which handles
case 1 robustly (it accepts unescaped URL-special chars in passwords),
and additionally redact known sensitive query parameter values
(name contains `password`, `pwd`, `secret`, `token`, `passphrase`,
`apikey`/`api_key`, or `credential`, case-insensitive) before rendering.
Keep the urllib.parse path as a fallback for environments without
SQLAlchemy (connectorx-only); the fallback also rewrites sensitive query
params and over-redacts to `<redacted>` when an `@` appears in the
authority but urlparse can't extract a password.
Regression tests cover all reported leak paths: passwords with `#`/`/`/
`?`, all common sensitive query-param names, the customer's exact JWT
URL shape, and case-insensitive matching. Verified end-to-end with a
real `daft.read_sql(...)` call: the customer's `access_token` value no
longer appears in the raised `RuntimeError`.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Rust Dependency DiffHead: ✅ OK: Within budget.
|
Greptile SummaryThis PR closes credential-leak vectors in
Confidence Score: 5/5Safe to merge; the fix correctly eliminates all explicit URL echoing from error messages and repr, and no existing behavior is regressed. The core change is removing the connection URL from error messages and repr entirely, which is the simplest and most robust mitigation. Both execution paths are covered, and no sensitive data is now added explicitly to any error string. tests/io/test_sql.py — the Trino JWT parametrize case shares a psycopg2 skip guard it does not actually require Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[execute_sql_query called] --> B{_should_use_connectorx?}
B -- "Yes" --> C[_execute_sql_query_with_connectorx]
B -- "No" --> D[_execute_sql_query_with_sqlalchemy]
C -- success --> E[Return pa.Table]
C -- exception --> F["RuntimeError: Failed to execute sql, error — URL omitted"]
D -- success --> E
D -- exception --> F
style F fill:#fdd,stroke:#900
Reviews (4): Last reviewed commit: "fix(sql): also drop URL from SQLConnecti..." | Re-trigger Greptile |
|
@greptile-apps re-review |
…e load Addresses Greptile review on #6933: 1. Fallback over-redaction was firing on legitimate no-password URLs like `mysql://user@host/db` when SQLAlchemy was unavailable, diverging from the primary SA path. Gate the heuristic on `parsed.username is None` — the actual signal that the URL's userinfo wasn't parseable (e.g. unescaped special chars push the `@` into the fragment so urlparse can't extract the username). Test `test_redact_url_fallback_when_sqlalchemy_unavailable` pins both branches. 2. Move `from sqlalchemy.engine import make_url` to a module-level try/except. `_redact_url` runs on every error message and uses SA opportunistically; module-level avoids per-call import lookup. The other inline SA imports in this file stay inline — they sit in methods that conditionally take the SA code path, so they pay nothing in connectorx-only environments. 3. Strip `%2A%2A%2A` percent-encoding from the fallback path's query-rebuild output, matching the primary path so the placeholder stays as readable `***` in error messages. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #6933 +/- ##
==========================================
+ Coverage 75.70% 76.05% +0.34%
==========================================
Files 1138 1138
Lines 161543 161511 -32
==========================================
+ Hits 122304 122835 +531
+ Misses 39239 38676 -563
🚀 New features to boost your workflow:
|
…ckstop Secrets can appear anywhere in a SQL connection URL — userinfo password, query parameters (Trino JWT, Snowflake key auth, Databricks PAT), JDBC `;property=value` strings, driver extras. Per-form redaction is fragile because every new shape is a new leak waiting to happen. Drop the URL from `Failed to execute sql: ...` error messages entirely; the caller knows which connection they passed in, so the URL is redundant. Also add a Spark-style text-level redaction backstop, applied to the full constructed error message before raising. Catches credentials that might appear in the wrapped underlying-exception text (e.g. a driver echoing back a connection string with `password=...` in its own error). Regex matches `name<sep>value` where the name contains any of: password, pwd, secret, token, passphrase, apikey/api_key, credential, signature, access_key. Mirrors Apache Spark's `spark.redaction.regex` design. `_redact_url` is kept (it still gates `SQLConnection.__repr__`); the URL-shape-specific redaction tests stay valid for that. The error-path tests now assert that *no* form of the URL appears in the raised RuntimeError. End-to-end verified against the customer's exact URL shape (`trino://alice@host?auth=jwt&access_token=<JWT>&...`): the `access_token` value no longer appears in the raised RuntimeError. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@greptile-apps re-review |
Following review on #6933: the primary leak vector was our own error message echoing the connection URL. Dropping the URL from the message closes that case completely; the additional `_redact_text` Spark-style regex was defense-in-depth for hypothetical driver-side leaks that we haven't actually observed, and adds ~50 lines plus its own false-positive surface. Simpler is better here. Also drop the now-unused `# type: ignore[assignment]` on the optional SQLAlchemy import — mypy in CI flagged it as unused. `_redact_url` is unchanged (still gates `SQLConnection.__repr__`) and its tests (special chars in passwords, sensitive query params, SQLAlchemy-unavailable fallback) remain. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Following the same logic as the error-message change: if echoing the
connection URL is unsafe because secrets can appear anywhere in it,
the same applies to `__repr__` — any caller that does `f"{conn}"` or
logs `SQLConnection` would leak. Show only dialect/driver instead.
This removes the only remaining consumer of `_redact_url`, so the
helper, the `_SENSITIVE_PARAM_KEYWORDS` list, `_is_sensitive_param`,
the SQLAlchemy `make_url` import, the urllib.parse fallback, and
their tests all go away. Net diff is strongly negative.
Tests now assert the actual contract: the raised RuntimeError and
`repr(conn)` must not contain any secret value — userinfo password,
URL-special-char password, or `access_token` query parameter
(customer-reported case).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@greptile-apps re-review |
| # mitigation. The caller knows which connection they passed in, | ||
| # so the URL is redundant here. |
There was a problem hiding this comment.
Yep this way my thinking, "The caller knows which connection they passed in" -- if they want to debug further then the responsible handling is on them. We aren't going to let the cat out of the bag on their behalf. Nice catch on the repr too
|
Want your agent to iterate on Greptile's feedback? Try greploops. |
Why
A user reported credentials still leaking in
daft.read_sqlerrors on v0.7.11 even after the original redactor in #6902. The URL shape from their stack trace:There's no
user:password@hosthere — the secret lives inaccess_token=as a query parameter. The previous fix only redacted userinfo passwords, so the JWT was emitted in plaintext in theFailed to execute sql: ... from connection: ...error.Separately,
urllib.parse.urlparsesilently mis-parses passwords containing#///?(it pushes them into the fragment field), soparsed.passwordcame backNoneand the old_redact_urlreturned the URL unchanged for inputs liketrino://alice:p#ss@host/db.I initially tried to plug both holes inside
_redact_url(sqlalchemymake_urlas the parser, plus query-parameter redaction by sensitive-name substring). Reviewers (rchowell) pushed back: "secrets appear in many places around the connection string and are not limited to query params — we probably shouldn't log it". Agreed. Trying to enumerate every shape a credential can take in a connection URL is a losing game.What this PR actually does
Stop echoing the connection URL in error messages and in
SQLConnection.__repr__. That's it.RuntimeError(f\"Failed to execute sql: {sql} from connection: {self.conn}, error: {e}\")→RuntimeError(f\"Failed to execute sql: {sql}, error: {e}\"). The caller knows which connection they passed in; the URL is redundant in the message.SQLConnection.__repr__now returnsSQLConnection(dialect='trino', driver='')instead of the URL.Removed alongside:
_redact_url,_SENSITIVE_PARAM_KEYWORDS,_is_sensitive_param, the opportunistic SQLAlchemymake_urlimport, and the urllib.parse fallback with its@-in-authority heuristic — none of it is needed once the URL itself is not echoed.After the fix
No URL, no userinfo, no query parameters.
Verified end-to-end
Tested locally against the customer's exact URL shape plus two regressions:
?access_token=<JWT>&auth=jwt)alice:hunter2@...)hunter2#(alice:p#ss@...)p#ssTests
tests/io/test_sql.py:test_execute_sql_error_does_not_leak_credentials— parametrized across the three leak shapes; asserts the secret is not in the raisedRuntimeError.test_repr_does_not_leak_url— assertsrepr(conn)contains neither the secret, the host, nor the username.Related
Follow-up to #6902 / issue #6903. Customer-reported continued leak after v0.7.11.
🤖 Generated with Claude Code