feat(audit): unified envelope + log_event adapter (Phase 1 Step 1)#3
Merged
feat(audit): unified envelope + log_event adapter (Phase 1 Step 1)#3
Conversation
…audit.py
Phase 1 Step 1 (a) per docs/PHASE1-STEP1-DESIGN.md.
Changes to codec_audit.py:
- audit() now writes the unified envelope (schema, event, source on every
entry) per §1.1. `event=` is REQUIRED (kwarg-only, no default) per §7-Q4 —
schema regressions can no longer be silent. Existing positional `tool` arg
preserved.
- New log_event(event_type, source, message, extra, *, level, outcome, tool,
transport, duration_ms, error_type, error, client_id, correlation_id) per
§2.1. Adapter over audit(); never raises.
- _transport_for(source) lookup table per §2.2 (codec-heartbeat→heartbeat,
codec-mcp-http→http, etc., default 'local').
- _PREVIEW_MAX = 200 + _truncate(s, max_len) helper per §7-Q2. Used for any
*_preview field; no call site can bypass.
- _cmd_hash(code) sha1@8 helper per §7-Q1. Pairing key for command_flagged /
approved / denied triplet — not a security primitive.
- correlation_id passed through extra.correlation_id per §1.4.
- Reserved-fields guard prevents callers from masquerading top-level fields
via extra={}.
Existing tool/extra contract preserved enough that codec_dashboard's
read_events / get_stats imports remain functionally broken in the same way
they were pre-merge (those are stale references unrelated to this design).
Behaviour change for downstream callers: extras now nest under
record["extra"]["..."] instead of being flat-merged onto the top level.
Steps (b)-(g) update each direct audit() caller to pass `event=` explicitly.
Until step (c) lands, codec_mcp.py / codec_autopilot.py audit() invocations
will TypeError at runtime — by design, surface-now-not-later.
pytest tests/ --ignore=tests/test_smoke.py: 20 failed, 522 passed, 73 skipped
(identical to pre-change baseline; all 20 failures pre-existing on main —
test_smoke.py also has a pre-existing collection-time AttributeError).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…g (b)
Phase 1 Step 1 (b) per docs/PHASE1-STEP1-DESIGN.md §2.5 + §1.4.
- Removed the duplicate _AUDIT_LOG_PATH writer (formerly at line 64). Crew /
agent events now flow through codec_audit.audit() — one writer, one lock,
one rotation. The unified envelope (schema, event, source, outcome) is
added on every entry.
- _audit() rewritten as a thin shim that translates the historic crew
kwargs (`elapsed`, free-form keys) into the audit() signature. `tool`,
`agent`, `outcome`, `error_type`, `error`, `duration_ms` lift to the
proper top-level fields; everything else lands under extra.
- Correlation-id threading via contextvars.ContextVar. Crew.run sets the
ID at entry and resets in finally; Agent.run only sets if no ID is in
context (so it inherits when called from inside a crew, generates its
own when called solo via run_custom_agent). The ContextVar is propagated
automatically through asyncio.create_task and run_in_executor — no need
to thread it manually through every nested helper.
- New audit emissions Crew.run now produces:
crew_start, agent_start, agent_finish (per-agent),
crew_complete (with duration_ms), crew_error (caught & re-raised).
- _shell_execute's existing `shell_blocked` emit picks up the cid
automatically via the contextvar — no signature change needed.
pytest tests/ --ignore=tests/test_smoke.py: 20 failed, 522 passed,
73 skipped (matches main baseline; same 20 pre-existing failures).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…cked stub (c)
Phase 1 Step 1 (c) per docs/PHASE1-STEP1-DESIGN.md §2.3 + §1.4 + §1.2.
codec_mcp.py:
- Every _audit() call site now passes `event=` explicitly per Q4 (no default
for the kwarg; missing it would TypeError at runtime). 5 call sites in
tool_fn:
* _validate_mcp_input failure → event="validation"
* concurrent.futures.TimeoutError → event="timeout"
* load_failed / errmsg branches → event="tool_result", outcome="error"
* happy path → event="tool_result", outcome="ok"
4 call sites in memory tools (search_memory, get_recent_memory) follow the
same pattern.
- correlation_id (secrets.token_hex(6)) is generated once at tool_fn entry
and threaded into every _audit() emit for that invocation. Pairs the
validation/timeout/result entries that come from the same MCP call.
- New `_register_blocked_stub()` registers a stub tool for each
HTTP-blocked skill (when CODEC_MCP_TRANSPORT == "http"). The stub
emits `event="mcp_http_blocked"` with outcome="denied" on every call
attempt, then returns a denial message to the client. Without this stub,
blocked-tool calls returned "method not found" with no audit trail.
codec_mcp_http.py: untouched — _HTTP_BLOCKED enforcement happens via
codec_config.py + codec_mcp.py at registration time, and the new stub
registration in codec_mcp.py covers call-time auditing.
codec_autopilot.py: 3 audit() call sites updated with explicit event=
("autopilot_fire" per §1.2), source="codec-autopilot", transport="scheduler",
correlation_id, and trigger_name in extra.
pytest tests/ --ignore=tests/test_smoke.py: 20 failed, 522 passed,
73 skipped (matches main baseline; no new regressions).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 1 Step 1 (d) per docs/PHASE1-STEP1-DESIGN.md §1.4 + §1.2. codec_voice.py: - New module-level _voice_correlation_id_var (contextvars.ContextVar) so any future audit emit fired from inside a voice session — including nested tool_call / tool_result events from skill dispatch — picks up the same cid automatically. - VoicePipeline.run() now generates secrets.token_hex(6) at entry, sets the contextvar, and emits voice_session_start with the session_id and resume_id. The cid is also stored on self._cid for explicit access. - VoicePipeline.run() finally block emits voice_session_end with duration_ms, turns (count of user messages), disconnect_reason, and outcome="error" + error_type + error if the gather raised. Token is reset on exit. - Both emits are guarded by try/except so audit failures cannot break the pipeline. Internal voice_interrupt emits at the existing barge-in points are deferred to a follow-up — they're outside the scope of (d) per the design contract (only correlation_id wiring is required here, and the two paired session-lifecycle emits give us the structural grouping). pytest tests/ --ignore=tests/test_smoke.py: 20 failed, 522 passed, 73 skipped (matches main baseline; no regressions). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 1 Step 1 (e) per docs/PHASE1-STEP1-DESIGN.md §1.2. codec_oauth_provider.py: - token_issued emit at the end of exchange_authorization_code, with client_id, access_token_id (last 8 chars only — never the full token), refresh_token_id (last 8), expires_in_sec, scope, and a fresh correlation_id for the issue→refresh chain. - token_refreshed emit at the end of exchange_refresh_token, with access_token_id (new), previous_id (last 8 of the just-revoked access token, looked up from _refresh_to_access_map BEFORE _revoke_internal clears it), expires_in_sec, scope. - New emit_token_expired(access_token_id, client_id, age_seconds) helper for the TTL-check path. Wires `outcome="denied"`, `level="warning"`. Callers in the validate path will invoke this when a token expires — no inline TTL check exists yet, so this is the documented seam. - New emit_state_invalidated(reason, tokens_cleared) for admin clear / corruption / manual_delete paths. `outcome="warning"`, `level="warning"`. Callers in ops scripts and the recovery path call this when ~/.codec/oauth_state.json is wiped. - _token_id(value) helper returns the last 8 chars — never the full opaque token. Tokens are never written in clear to the audit log. - _oauth_log_event has a defensive ImportError fallback so a broken audit module doesn't take down OAuth. pytest tests/ --ignore=tests/test_smoke.py: 20 failed, 522 passed, 73 skipped (matches main baseline; no regressions). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 1 Step 1 (f) per docs/PHASE1-STEP1-DESIGN.md §1.2.
codec_self_improve.py:
- One correlation_id (secrets.token_hex(6)) per nightly run, generated
at the top of run_once. Every skill_proposal_staged emit in that run
shares it — analyzer can bucket "all proposals from the 2026-04-30 run".
- skill_proposal_staged audit fires per written proposal with:
extra.proposal_path (absolute path of the .md sidecar)
extra.skill_name (canonical name as written to disk)
extra.signal_type (per design: unknown_tool / failing_tool /
timeout_tool, mapped from gap['kind'] via
_GAP_KIND_TO_SIGNAL).
extra.validation_passed (matches the .md's "Status:" line)
extra.validation_reason (truncated to 200 chars)
extra.target_date (YYYY-MM-DD of the audit window analysed)
- outcome flips to "warning" + level="warning" when the proposal failed
validation, so the analyzer's error_rate count stays meaningful.
- _si_log_event has the same defensive ImportError fallback pattern used
in codec_oauth_provider.
Note: design §1.2 mentions extra.danger_score; not emitted here because
codec_config.is_dangerous_skill_code returns (bool, reason) without a
numeric score. validation_reason captures the substantive content. A
future enhancement could compute a score from the matched-pattern count.
pytest tests/ --ignore=tests/test_smoke.py: 20 failed, 522 passed,
73 skipped (matches main baseline; no regressions).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…es (g)
Phase 1 Step 1 (g) per docs/PHASE1-STEP1-DESIGN.md §2.4 + §6.
Removed `def log_event(*a, **kw): pass` no-op fallbacks (the import had
been silently failing since log_event didn't exist in codec_audit.py — every
log_event call across these modules was a no-op for the entire history of
this codebase). Now: real import, real adapter, real audit envelope written
to ~/.codec/audit.log on every emit.
Modules updated:
codec_session.py — command_flagged / command_approved / command_denied
triplet shares a sha1@8 cmd_hash (per §7-Q1) for
analyzer pairing. Dropped the redundant `action`
field that mirrored the event type. Outcome="denied"
on flagged/denied; level="warning".
codec_scheduler.py — schedule_fire + schedule_done share a correlation_id
for the firing. duration_ms on done. outcome flips
to "error" if the crew run failed.
codec_dispatch.py — wake_dispatch + wake_skill_error share a cid across
the fall-through retry across matched skills.
error_type/error captured properly on exception.
codec_heartbeat.py — service_down (outcome=error, level=error,
extra.service+url) and heartbeat_tick (extra.tasks_run).
codec_dashboard.py — chat_command (task_preview only — never the full body),
chat_skill (tool=skill_name, extra.result_len),
chat_llm (extra.model+answer_len),
chat_llm_error (outcome=error, error_type+error,
defensive NameError guard around model var),
chat_vision (extra.prompt_preview),
service_restart (extra.service).
codec.py — wake_dispatch / tts_speak / wake_word_detected
(canonical event names + extra fields where
available). Privacy: task_preview only.
routes/auth.py — auth_success / auth_reject (per §1.2). PIN lockout
carries lockout_level + lockout_sec + client_ip in
extra; outcome="denied", level="warning".
pytest tests/ --ignore=tests/test_smoke.py: 20 failed, 522 passed,
73 skipped (matches main baseline; no regressions).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 1 Step 1 — test layer + a small fix-up to the (b) commit's
contextvar claim.
New test files (~510 LOC total):
tests/test_audit_envelope.py — schema validation, required-
field guarantees, event-required-kwarg, log_event adapter, transport
lookup, truncation, privacy (chat_command), correlation_id contract,
reserved-field guard, _cmd_hash. 18 tests.
tests/test_log_event_callsites.py — per-module assertions for the
7 modules that emit log_event (heartbeat, scheduler, dispatch,
session, dashboard, auth). Verifies event names, transport mapping,
cmd_hash pairing, privacy (no full-task field), error_type capture.
13 tests.
tests/test_audit_analyzer_compat.py — codec_audit_analyzer.analyze
accepts unified, legacy, and mixed records and produces the expected
totals/error counts/unique_clients regardless. 4 tests.
tests/test_audit_perf.py — single-thread (<0.5ms/call)
+ 10×1000 concurrent stress (<2.5ms/call, no JSON corruption, no
dropped writes, all correlation_ids accounted for). CI multiplier
keeps the test honest without flaking on slow runners. 3 tests.
tests/test_correlation_id_propagation.py — crew run, voice session,
MCP tool call, schedule run all preserve their cid end-to-end.
Includes orphan-cid detection and uniqueness checks. 8 tests.
Fix in codec_agents.py:
Agent.run now wraps `loop.run_in_executor(None, tool.run, tool_input)`
with `contextvars.copy_context().run()`. asyncio's run_in_executor does
NOT auto-propagate contextvars (verified empirically), so without this
wrap any _audit emit fired from inside a Tool (e.g. _shell_execute →
shell_blocked) was losing the agent/crew correlation_id. The test in
tests/test_correlation_id_propagation.py uses the same wrap pattern
to lock the contract in.
pytest tests/ --ignore=tests/test_smoke.py: 20 failed, 568 passed,
73 skipped (568 = 522 baseline + 46 new tests; same 20 pre-existing
failures, no regressions).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Marks the audit envelope as UNIFIED (schema:1) on the phase1-step1-audit-unification branch. - Documents the full envelope (top-level + extra namespace, all reserved field names) in jsonc. - Notes the event= required-kwarg contract (Q4) and the correlation_id required contract (§1.4 — 12-char hex from secrets.token_hex(6), required for any op with ≥2 audit emits). - Lists the 7 modules whose log_event is now wired through (was previously 5; codec.py + routes/auth.py also had the same dead-import pattern). - Records implementation HEAD: 05f9b80. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Captured 2026-04-29 23:30 GMT+2 from ~/.codec/audit.log (live entries, all-time available window since this machine doesn't yet have audit.db for the 30-min trailing slice from design §5.4). Numbers (live audit.log, n=292, with_duration=203): avg = 987.96 ms p95 = 1907.78 ms Hard-revert thresholds derived (per §5.4): p95 > 3815.56 ms (2× baseline) → revert avg > 1975.92 ms (2× baseline) → revert p95 between 2484.11 - 3815.56 ms (1.3-2×) → investigate, don't revert test_audit_concurrent_no_corruption fails on live load → revert Sampling cadence: T+0/+4h/+8h/+12h/+16h/+20h post-merge, each sample appended to ~/.codec/perf-postmerge.txt. Sign-off after 24h if all six samples within 1.3× baseline. Recompute command + audit.db SQL form documented inline. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ng fails
Per PR review request: catalog every pre-existing test failure, classify
each as untouched / touched-unchanged / touched-needs-investigation, and
prove that nothing reaches category 3.
Methodology: cloned main to /tmp/main-baseline, ran pytest both sides,
diffed failure lists (identical 20-for-20). For each fail in a touched
file, walked the diff to show the test target was not edited.
Counts:
untouched 9 (skills/codec.py, codec_skill_registry.py,
README.md, codec_agent.py — none in this PR)
touched-unchanged 11 (codec_dashboard / codec_voice / codec_session /
codec_scheduler / codec.py / routes-auth: diffs
walked, no overlap with test targets)
touched-needs-investigation 0
total 20
73 skipped tests are all environmental (Dashboard not running on :8090,
CODEC_TEST_TOKEN unset, native deps missing, auth_pin_hash absent in
worktree config). Verified `git diff main...HEAD -- tests/ | grep
'pytest.skip|skipif'` empty — this PR adds zero new skip markers.
Sign-off: branch clean for merge by contract.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Owner
Author
|
Pre-merge test audit added — 20-for-20 classification of pre-existing failures:
73 skipped: all environmental (dashboard PM2 not running, no Branch clean for merge by the design contract. |
AVADSA25
pushed a commit
that referenced
this pull request
Apr 30, 2026
…sues Phase 1 Step 1 — post-merge cadence kickoff after PR #3 (commit 45d4aa7). docs/PHASE1-STEP1-POSTMERGE-SAMPLES.md T+0 captured 2026-04-30T09:23 GMT+2 (T+0h06m post-merge): total=31, schema1=28 (90%), legacy=3, errors=7, with_duration=0. avg_ms / p95_ms: N/A (no MCP traffic in trailing 30-min window). Status: ok (with caveats documented inline). The 7 errors are all `service_down` emits from codec-heartbeat flagging Whisper/Kokoro/Vision intermittents. These are previously-hidden lifecycle events now visible per the design's stated intent (§0). Not a regression — the errors did not start appearing, visibility into them did. Pending blocks reserved for T+4h, T+8h, T+12h, T+16h, T+20h. scripts/capture_audit_sample.py Re-runs the same trailing-30m analysis on demand, replaces the matching pending block in the samples doc with captured numbers, and exits 0 (ok) / 1 (investigate) / 2 (revert per §5.4). Apple Reminders set for the user at 13:17 / 17:17 / 21:17 / 01:17 / 05:17 with the exact invocation command. docs/known-issues.md New file. Documents the pre-existing _safe_task naming-convention bug in codec.py (line 255 post-merge) that fails test_security.py::test_osascript_inputs_sanitized — escape pattern is correct, only the variable name disagrees with the test's `safe_` prefix expectation. Deferred-not-fixed; revisit on next legitimate edit to _dispatch_inner. Reserved space for the §5.4 sign-off entry once T+20h sample lands. Live-load verification: pytest tests/test_audit_perf.py::test_audit_concurrent_no_corruption → PASSED (10×1000 writes, no JSON corruption, no dropped cids, under contention on the merged main). PM2 services confirmed healthy after restart sequence (30s gaps, checked logs for boot errors): codec-dashboard /api/health → 200, codec-mcp-http /health → 200 with all 4 dependency checks green. Boot tracebacks in logs are SIGTERM/KeyboardInterrupt from the previous instances during graceful shutdown — not from new code. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Phase 1 Step 1 — implements the unified audit envelope (schema:1) + real
log_eventadapter, perdocs/PHASE1-STEP1-DESIGN.md(v2 at commit805ead1, approved 2026-04-30).tool+ crew-styleevent) with one unified envelope includingschema,event(REQUIRED kwarg, no default — Q4),source, plus all top-level fields and anextranamespace.log_eventfrom a silent no-op fallback into a real adapter overaudit()across 7 modules (the design listed 5; codec.py and routes/auth.py also had the same dead-import pattern).correlation_id(12-char hex fromsecrets.token_hex(6)) across multi-emit operations: MCP tool calls, crew runs, voice sessions, schedule runs, OAuth handshakes, skill-proposal nightly runs.loop.run_in_executordoes NOT auto-propagate contextvars in asyncio (verified empirically). Fixed incodec_agents.Agent.runwithcontextvars.copy_context().run().Commits (10)
3fcca25audit(),log_event(),_transport_for,_truncate,_cmd_hash,_PREVIEW_MAX, correlation_id pass-through4d791e6_auditshim, kill duplicate_AUDIT_LOG_PATH, contextvar threading through Crew.run / Agent.run, newagent_start/agent_finish/crew_erroremitsba83742event=per emit, cid at tool_fn entry,_register_blocked_stubfor HTTP-blocked skills8c999efvoice_session_start/voice_session_endwith cid contextvar675c746token_issued/token_refreshed+emit_token_expired/emit_state_invalidatedhelpers537e6b5skill_proposal_stagedper proposal sharing one nightly cid7f837bbextra.actiondropped,cmd_hashtriplet pairing05f9b801248d77027beafTest plan
main; identical list pre- and post-change. None caused by this branch. (test_smoke.py also has a pre-existing collection-time AttributeError; ignored throughout per the same reasoning.)audit()< 0.5 ms/call, 10×1000 concurrent stress < 2.5 ms/call with no JSON corruption and no dropped writes.Pre-merge baseline + revert thresholds
See
docs/PHASE1-STEP1-BASELINE.md:tests/test_audit_perf.py::test_audit_concurrent_no_corruptionfails on live load.Rollback runbook (from design §5.4)
~/.codec/audit.logitself is left intact — pre-merge entries still parse viacodec_audit_analyzer(already tolerant of legacy shape).Estimated rollback time: ~30 seconds.
What was NOT done (intentionally)
~/.codec/audit.logormemory.db._HTTP_BLOCKED(codec_config.py) — the new_register_blocked_stubonly adds an HTTP-side audit emit.Design + implementation links
docs/PHASE1-STEP1-DESIGN.md(v2, all 7 reviewer questions resolved)1248d77027beaf🤖 Generated with Claude Code