feat(compact): file compression tracking + secret masker (ROADMAP 7.6)#128
emal-avala merged 4 commits into main
Self-review summary

Ran a deep review per the request. Found and fixed one critical bug before merge.

🔴 Critical: credential regex corrupted session JSON (fixed in 9b48356)

The credential pattern used a single optional `["']?` at both ends, so it could consume a trailing quote that belonged to the surrounding string literal. Reproducer added as regression tests in session.rs.

Fix: split the credential pattern into two variants: quoted (requires both open and close delimiter) and unquoted (value char class excludes `"` and `'`). Added two regression tests that round-trip masked session JSON through serde_json::from_str.

✅ Other checks that came back clean

Test coverage

Verdict: safe to merge after 9b48356 lands.
Adds the core primitives for advanced history compression:

- `services::secret_masker`: shared regex set for AWS/GitHub/OpenAI keys, PEM private keys, and generic credential assignments. Applied at every persistence boundary: compaction LLM prompt, session.rs disk writes, output_store.rs large tool results.
- `FileCompressionRecord`, `CompressionLevel`, `FileCompressionState` in services::compact. Tracks per-file fidelity (Full / Partial / Summary / Excluded), 12-byte SHA256 content hash for change detection, protected turn window so recently-read files are locked at Full, and save/load to ~/.cache/agent-code/sessions/<id>.compression.json.
- hash_content() SHA256 helper; sha2 = "0.10" added to crates/lib.

Not yet wired: protected-window check into the auto-compact path, query-loop integration that populates FileCompressionState on every tool read, and the /compression slash command. These require touching the query loop and are left as follow-ups per ROADMAP 7.6 checklist.

Tests: 16 new unit tests (10 secret_masker, 6 compact tracking). All services tests pass.
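The tracking rules described above (changed content resets to Full, unchanged re-reads keep their level, recently-read files cannot be demoted) can be sketched roughly as follows. This is an illustration only, not the code in this PR: the field names, the `record_read` and `level` methods, the one-step demotion order, and the exact window comparison are assumptions, and `DefaultHasher` stands in for the SHA256 helper so the sketch needs no external crate.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Value taken from the PR description.
const PROTECTED_TURN_WINDOW: u64 = 2;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum CompressionLevel {
    Full,
    Partial,
    Summary,
    Excluded,
}

/// Hypothetical record layout; the real struct may differ.
#[derive(Debug)]
pub struct FileCompressionRecord {
    pub level: CompressionLevel,
    pub content_hash: u64,
    pub last_read_turn: u64,
}

#[derive(Debug, Default)]
pub struct FileCompressionState {
    records: HashMap<String, FileCompressionRecord>,
}

/// Stand-in for the PR's SHA256-based `hash_content()`; std-only for this sketch.
fn hash_content(content: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    content.hash(&mut hasher);
    hasher.finish()
}

impl FileCompressionState {
    /// Records a read: changed content resets to Full, unchanged content keeps its level.
    pub fn record_read(&mut self, path: &str, content: &str, turn: u64) {
        let hash = hash_content(content);
        let rec = self
            .records
            .entry(path.to_string())
            .or_insert(FileCompressionRecord {
                level: CompressionLevel::Full,
                content_hash: hash,
                last_read_turn: turn,
            });
        if rec.content_hash != hash {
            rec.level = CompressionLevel::Full;
            rec.content_hash = hash;
        }
        rec.last_read_turn = turn;
    }

    /// Demotes one step; returns false for unknown or still-protected files.
    pub fn demote(&mut self, path: &str, current_turn: u64) -> bool {
        let rec = match self.records.get_mut(path) {
            Some(rec) => rec,
            None => return false,
        };
        // Assumption: a file is protected while fewer than
        // PROTECTED_TURN_WINDOW turns have passed since its last read.
        if current_turn.saturating_sub(rec.last_read_turn) < PROTECTED_TURN_WINDOW {
            return false;
        }
        rec.level = match rec.level {
            CompressionLevel::Full => CompressionLevel::Partial,
            CompressionLevel::Partial => CompressionLevel::Summary,
            _ => CompressionLevel::Excluded,
        };
        true
    }

    pub fn level(&self, path: &str) -> Option<CompressionLevel> {
        self.records.get(path).map(|r| r.level)
    }
}
```

Under these assumptions, a file read on turn 1 cannot be demoted on turn 2 but can be on turn 3, and a later read with different content returns it to `Full`.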
…ries

Adds 8 integration tests that verify the masker is actually invoked at each persistence site (not just that the masker module works in isolation):

- compact.rs: build_compact_summary_prompt masks AWS keys and GitHub PATs from user AND assistant messages before the summary prompt is built.
- session.rs: serialize_masked (new pub(crate) helper extracted from save_session_full) redacts secrets from the serialized JSON, preserves innocuous code, and keeps non-secret metadata intact.
- output_store.rs: persist_if_large_in (new pub(crate) variant that takes a store directory) writes masked content to disk while keeping the in-memory preview unmasked for agent use. Covers both the specific AWS-key regex and the generic credential rule.

Also formats secret_masker.rs with cargo fmt (rustfmt 1.8.0 prefers a different wrapping for the inline replace_all call).
The credential regex used a single `["']?` at both ends to optionally
match quoted values. This was greedy and could consume a stray
trailing quote that belonged to the surrounding string literal —
catastrophically breaking session JSON when a message contained an
unquoted inner secret like `api_key=hunter2hunter2`.
Reproducer:
Input (JSON fragment): "text": "api_key=hunter2hunter2"
Buggy output: "text": "api_key=[REDACTED:credential]
^ closing quote eaten
Result: serde_json::from_str fails with "control character
(\u0000-\u001F) found while parsing a string" — every `/resume`
of a session saved with such a message would fail.
Fix: split the credential pattern into two — quoted (requires both
open and close delimiter) and unquoted (value char class excludes
`"` and `'`, so matching naturally stops at any surrounding string
delimiter). Neither variant can eat a stray closing quote.
Tests:
- Two new regression tests in session.rs that assert the masked
session JSON round-trips through serde_json::from_str for a
variety of secret shapes (unquoted assignments, mixed quoting,
URL-embedded credentials).
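The invariant behind the unquoted variant (the value scan stops at `"`, `'`, or whitespace, so a surrounding delimiter is never consumed) can be illustrated without the regex crate. This is a hand-rolled sketch, not the pattern used in secret_masker.rs: the `mask_unquoted` function and the `KEYS` list are hypothetical, and only the `[REDACTED:credential]` placeholder is taken from the reproducer above.

```rust
/// Hypothetical key names; the real masker's list may differ.
const KEYS: [&str; 3] = ["api_key", "password", "auth_token"];

/// Masks `key=value` where the value is unquoted. The value scan stops at
/// whitespace, `"` or `'`, so an enclosing string delimiter is left in place.
pub fn mask_unquoted(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    'outer: while !rest.is_empty() {
        for key in KEYS {
            if let Some(after_key) = rest.strip_prefix(key) {
                if let Some(value) = after_key.strip_prefix('=') {
                    // Length of the value: up to the first delimiter or whitespace.
                    let end = value
                        .find(|c: char| c.is_whitespace() || c == '"' || c == '\'')
                        .unwrap_or(value.len());
                    if end > 0 {
                        out.push_str(key);
                        out.push_str("=[REDACTED:credential]");
                        rest = &value[end..];
                        continue 'outer;
                    }
                }
            }
        }
        // No key matched at this position: copy one char and advance.
        let ch = rest.chars().next().unwrap();
        out.push(ch);
        rest = &rest[ch.len_utf8()..];
    }
    out
}
```

Applied to the reproducer's JSON fragment, the closing quote after the secret survives, and masking the already-masked text again leaves it unchanged.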
9b48356 to
62bd27a
…-escape support
Deep review surfaced two more gaps; both are fixed and covered by
new regression tests.
### Gaps found
1. **Quoted credential pattern missed JSON-escaped forms.**
Tool output often contains config-file fragments like
`api_key = "xxx"`. When a message carrying that text is serialized
into the session JSON, the inner `"` become `\"`. The quoted
pattern required literal `"` on both sides, so these escaped pairs
were not masked. Fix: allow an optional leading `\` before each
quote (`\\?"..."` / `\\?'...'`). Still never consumes a stray
surrounding JSON delimiter — the bound anchors are paired.
2. **URL-embedded credentials leaked entirely.**
`postgres://user:hunter2hunter2@host/db`, `redis://:pw@host`, etc.
never matched any pattern. Added a dedicated `url_credential`
rule covering postgres/postgresql/mysql/mariadb/redis/rediss/
mongodb(+srv)/amqp/amqps/mqtt/mqtts/smtp/smtps/sftp/ssh/ldap/
ldaps/http/https. Password char class excludes `@`, whitespace,
`"`, `'`, and `\` so it stops at the URL boundary and string
delimiters. Scheme and username are preserved for debugging.
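The shape of the `url_credential` rule can be sketched with plain string operations. This is an assumption-laden illustration rather than the regex in this commit: `mask_url_password` and the `[REDACTED]` placeholder are hypothetical, the sketch accepts any scheme instead of the listed ones, and it additionally stops at `/` as a simplification.

```rust
/// Masks the password in `scheme://user:password@host` forms, keeping the
/// scheme and username. Hypothetical helper, std-only.
pub fn mask_url_password(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(pos) = rest.find("://") {
        let (head, tail) = rest.split_at(pos + 3);
        out.push_str(head); // everything up to and including "://"
        // The userinfo part ends at '@'. Stop early at characters that
        // cannot belong to it, so string delimiters are never consumed.
        let stop = tail.find(|c: char| {
            c == '@' || c == '/' || c.is_whitespace() || c == '"' || c == '\'' || c == '\\'
        });
        match stop {
            Some(at) if tail[at..].starts_with('@') => {
                let userinfo = &tail[..at];
                match userinfo.split_once(':') {
                    // `user:password` or `:password`: redact the password only.
                    Some((user, pw)) if !pw.is_empty() => {
                        out.push_str(user);
                        out.push_str(":[REDACTED]");
                    }
                    // No password present: copy the userinfo unchanged.
                    _ => out.push_str(userinfo),
                }
                rest = &tail[at..];
            }
            // Not a credential-bearing URL: continue scanning after "://".
            _ => rest = tail,
        }
    }
    out.push_str(rest);
    out
}
```

With this sketch, `postgres://user:hunter2hunter2@host/db` keeps `postgres://user:` and `@host/db`, while a URL without a password passes through untouched.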
### New tests (19 added, all green)
- secret_masker (+11):
- `masks_single_quoted_credential`
- `masks_mixed_quoted_and_unquoted_in_one_input`
- `does_not_consume_surrounding_json_quote` (direct regression)
- `does_not_mask_json_key_form` (structural invariant)
- `empty_input_does_not_panic`
- `strengthened_idempotency_across_split_pattern`
- `masks_uppercase_env_var_style`
- `masks_url_embedded_password_postgres`
- `masks_url_embedded_password_redis_without_user`
- `masks_url_embedded_password_inside_json_escape`
- `does_not_mask_url_without_password`
- session (+3):
- `serialize_masked_redacts_secret_in_tool_result_block` — covers
ContentBlock::ToolResult which isn't reached by the compact
summary prompt path.
- `serialize_masked_handles_many_messages_with_mixed_secrets` —
4-message stress with AWS key, URL password, unquoted token,
quoted api_key. Verifies all are redacted and the JSON still
round-trips.
- `serialize_masked_is_idempotent_save_load_save` — re-saving a
loaded session produces byte-identical JSON.
- output_store (+2):
- `persist_if_large_at_exact_threshold_passes_through` — guards
the `<=` boundary against a future refactor to `<`.
- `persist_if_large_at_threshold_plus_one_writes_to_disk`
- compact (+3):
- `compression_state_empty_roundtrip`
- `compression_state_handles_unicode_paths`
- `compression_state_demote_after_protection_window_expires`
Totals: 598 agent-code-lib unit tests pass, clippy clean, fmt clean.
(The single pre-existing `sandbox::tests::auto_detect_off_macos_is_noop`
failure is unrelated and dependent on local `bwrap` presence.)
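The `<=` boundary that the two output_store threshold tests guard can be pictured with a small sketch. The names `LARGE_OUTPUT_THRESHOLD`, `Disposition`, and `classify` are hypothetical, and the threshold value is a placeholder, since the real constant is not stated here.

```rust
/// Placeholder value; the real threshold is not given in this PR text.
const LARGE_OUTPUT_THRESHOLD: usize = 1024;

#[derive(Debug, PartialEq)]
pub enum Disposition {
    /// At or below the threshold: returned inline, nothing written.
    PassThrough,
    /// Above the threshold: written to the store.
    WriteToDisk,
}

/// Classifies a tool output by byte length. Using `<=` means an output of
/// exactly LARGE_OUTPUT_THRESHOLD bytes still passes through.
pub fn classify(output: &str) -> Disposition {
    if output.len() <= LARGE_OUTPUT_THRESHOLD {
        Disposition::PassThrough
    } else {
        Disposition::WriteToDisk
    }
}
```

A refactor from `<=` to `<` would flip the result for an input of exactly the threshold length, which is the case the first test pins down.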
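Second deep review pass: 2 more issues found and fixed (5f02ba5)

Took another careful pass specifically looking for edge cases the first review missed. Found two additional security gaps, fixed both, added regression tests.

🟠 Gap 1: JSON-escaped quotes defeated the quoted credential pattern

Scenario: tool output contains a config fragment like `api_key = "xxx"`. Once that text is serialized into the session JSON, the inner `"` become `\"`, which the quoted pattern did not match.

Caught by: new `serialize_masked_handles_many_messages_with_mixed_secrets` stress test.

Fix: extended the quoted pattern to allow an optional leading `\` before each quote.

🟠 Gap 2: URL-embedded credentials leaked entirely

Scenario: `postgres://user:hunter2hunter2@host/db` and similar URLs matched no pattern.

Caught by: same stress test above.

Fix: added a dedicated `url_credential` rule.

New tests this pass: 19 total

Final verification

Verdict: safe to merge after CI reruns on 5f02ba5.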
Summary
Lands the core primitives for ROADMAP 7.6 (Advanced History Compression). Self-contained, tested, no query-loop changes.
- `services::secret_masker`: shared regex set for AWS keys, GitHub PATs, OpenAI-style `sk-` keys, PEM private keys, and generic `api_key=` / `password:` / `auth_token=` assignments. Idempotent `mask()`.
  - `compact::build_compact_summary_prompt`: masks message text before the summarizer LLM sees it
  - `session::save_session_full`: masks serialized JSON before disk write
  - `output_store::persist_if_large`: masks large tool results before disk write (in-memory preview stays unmasked for agent use)
- `FileCompressionRecord`, `CompressionLevel`, `FileCompressionState` in `compact.rs`:
  - Levels: `Full` / `Partial` / `Summary` / `Excluded`
  - Reads of changed content reset the level to `Full`, unchanged re-reads preserve existing level
  - `PROTECTED_TURN_WINDOW = 2`: recently-read files are locked at `Full` and `demote()` refuses to compress them
  - `save()` / `load()` to `~/.cache/agent-code/sessions/<id>.compression.json`
- `sha2 = "0.10"`

Deferred (ROADMAP 7.6 follow-ups)

Left for a follow-up PR that touches the query loop:

- Populating `FileCompressionState` on every tool read (needs query-loop hook)
- `/compression` slash command (UX)

Test plan

- `secret_masker` unit tests (each regex, idempotency, non-secret passthrough)
- `compact` unit tests (hash change detection, protected window, read-resets-on-change, read-preserves-on-unchanged, demote-refuses-protected, state roundtrip)
- `cargo test -p agent-code-lib --lib services::secret_masker`: 10/10 pass
- `cargo test -p agent-code-lib --lib services::compact`: 28/28 pass
- `cargo check -p agent-code-lib` clean