fix(data-warehouse): redact API key credentials from request logs#60599
Merged
Conversation
This was referenced May 29, 2026
Member
Author
This stack of pull requests is managed by Graphite. Learn more about stacking. |
2 tasks
ebbe3a6 to
a2799e9
Compare
Contributor
Prompt To Fix All With AIFix the following 2 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 2
posthog/temporal/data_imports/sources/common/http/transport.py:86
The `getattr` fallback here is superfluous — `_redact_values` is always assigned in `__init__`, so the default `()` can never be reached. Direct attribute access is clearer and avoids the implication that the attribute might be absent.
```suggestion
redact_values=self._redact_values,
```
### Issue 2 of 2
posthog/temporal/data_imports/sources/common/rest_source/tests/test_rest_client.py:114-128
These two tests cover the same code path with different inputs and assertions. The project prefers parametrized tests — collapsing them removes the repeated `@patch` boilerplate and makes it easy to add further auth types in future without writing another method.
```suggestion
@pytest.mark.parametrize(
"auth,expected_redact_values",
[
(APIKeyAuth(api_key="sk_live_x", name="key", location="query"), ("sk_live_x",)),
(None, ()),
],
)
@patch("posthog.temporal.data_imports.sources.common.rest_source.rest_client.make_tracked_session")
def test_builds_session_with_credentials_for_redaction(self, MockSession, auth, expected_redact_values) -> None:
MockSession.return_value.headers = {}
RESTClient(base_url="https://api.example.com", auth=auth)
assert MockSession.call_args.kwargs["redact_values"] == expected_redact_values
```
Reviews (1): Last reviewed commit: "fix(data-warehouse): redact API key cred..." | Re-trigger Greptile |
a2799e9 to
0c1f767
Compare
Gilbert09
approved these changes
May 29, 2026
733ce31 to
d09aaeb
Compare
0c1f767 to
8ffcc29
Compare
8ffcc29 to
9657f73
Compare
d09aaeb to
25e5440
Compare
9657f73 to
fc6fb7e
Compare
25e5440 to
e10173f
Compare
fc6fb7e to
e3dcf03
Compare
MarconLP
added a commit
that referenced
this pull request
Jun 1, 2026
#60609) ## Problem The Custom REST source stores auth credentials as write-only secrets (redacted from API responses, preserved across edits) while the `manifest_json` is fully user-authored and editable. Because a manifest can target any host — via `client.base_url` or an absolute resource `endpoint.path` — an editor who **cannot read** the stored credential could repoint the source at a server they control and the platform would send the preserved secret there on the next probe/sync. That's a confused-deputy credential-exfiltration path. This mirrors **VERIA-311** (the SSH-tunnel retarget guard already present for DB sources) and was raised in review of the custom source stack. ## Changes Applies the existing DB-source pattern (`ExternalDataSourceSerializers.update()` → "changing the connection host requires re-entering your credentials") to the custom source, whose connection target lives inside the manifest rather than a top-level `host` field. - `manifest_request_hosts(manifest_json)` — returns the set of hostnames a manifest will contact (`base_url` + any absolute resource-path hosts). - On update, if the incoming manifest **introduces a host the source wasn't already contacting** (`new_hosts - existing_hosts` non-empty), the update is folded into the existing `missing_credentials` re-entry gate: the editor must re-supply the secret. Removing a host or editing same-host paths does **not** require re-entry. Cross-host resource URLs remain supported (consistent with Airbyte's low-code requester, which allows absolute/cross-host paths); the retarget guard + the existing SSRF host-safety check are what keep that safe, rather than forbidding it. ## How did you test this code? I'm an agent; no manual testing. Added and ran automated tests (all passing, against a local Postgres + ClickHouse): - `manifest_request_hosts` unit tests — base host, absolute resource hosts added, lowercasing, unparseable/`None`/array → empty set. - Serializer integration tests on `update()`: - manifest host change **without** re-supplied credentials → `400`, "re-entering your credentials", manifest unchanged. - manifest host change **with** credentials re-supplied → `200`. - same-host manifest edit (path change) → `200`, credential preserved, no re-entry forced. ## 🤖 Agent context Authored by Claude Code (claude-opus-4-8) while hardening the custom source from review feedback. Tools: Read/Edit/Bash/Grep, plus Explore subagents comparing PostHog's DB-source `update()` guard and Airbyte's low-code requester behavior. Decision trail: the reviewer flagged that POST + stored credentials lets an editor drive state-changing/retargeted requests with a secret they can't read. Airbyte allows GET/POST and cross-host absolute URLs with no host restriction, so we kept POST and cross-host paths (matching the de-facto standard) and instead closed the *exfiltration* half — the more serious one — by reusing PostHog's own DB-source "re-enter credentials when the connection host changes" pattern (VERIA-311). The same-host POST-mutation residual is accepted for the pilot. Considered pulling `base_url` into a separate top-level field to reuse the existing `host`-keyed check verbatim, but kept it in the manifest to preserve the "paste a RESTAPIConfig" model and to cover absolute resource-path hosts too. Stacked on #60599.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.

Problem
The Custom REST source lets the manifest configure API-key auth injected into a query parameter (or a custom header/cookie) under an arbitrary, user-chosen name. The tracked HTTP transport only redacts credentials from logs and captured request samples by matching against a fixed denylist of parameter/header names (
url_utils._REDACT_PARAM_NAMES,sampling._AUTH_HEADER_NAMES). A manifest naming its key something off the denylist — e.g.subscription-key— would write the upstream API key in plaintext intodata_imports.http.requestlog lines and into sampled requests in object storage.Flagged in review of #59612.
Changes
Adds value-based redaction on top of the existing name-based scrubbers, mirroring how Airbyte masks
airbyte_secretconfig values: we know the exact credential string, so we mask it wherever it appears regardless of injection location (query, header, cookie, body) or parameter name.url_utils.redact_literal_values()— replaces a known secret string and its URL-encoded forms withREDACTED. Skips empty/very short values so unrelated log text isn't mangled.auth.auth_secret_values()— extracts the credential strings (token/api_key/password) carried by an auth object.make_tracked_session→TrackedHTTPAdapter→record_request, where they're applied to the logged URL and forwarded into the captured sample.RESTClientderives the values from itsauthobject — so this protects every REST-based source, not just the custom one.validate_credentialsprobe registers the same values.This is purely additive —
redact_valuesdefaults to empty, so behaviour is unchanged for callers that don't pass credentials.How did you test this code?
I'm an agent; I did not perform manual testing. Added and ran automated tests (all passing):
redact_literal_values— raw value, percent-encoded form, header/payload string, multiple secrets, short/empty skip, no-op when empty.auth_secret_values— extraction across bearer / api_key / http_basic, and empty when no credential set.redact_valuestorecord_request(and defaults to empty).RESTClient— builds its tracked session with credentials derived fromauth.subscription-key) is masked in the logged URL while unrelated params survive.🤖 Agent context
Authored by Claude Code (claude-opus-4-8) while working through review feedback on #59612. Tools: Read/Edit/Bash/Grep, plus two parallel Explore subagents to study how Airbyte handles credential masking and descending-order incremental resume.
Key decision: the reviewer offered "constrain the param name to the denylist" vs "thread the configured name into the scrubber." The Airbyte exploration showed the robust industry approach is value-based masking (mask the known secret value everywhere) rather than name-based, so we chose that — it covers any injection location and any name without restricting what users can name their auth param. Deriving the values from the
authobject insideRESTClient(rather than threading them per-source) means the protection is automatic for all sources.Stacked on #59612.