fix: prevent duplicate sessions, audits, and stuck audit logs #40

Merged
George-iam merged 2 commits into main from feat/audit-dedup-20260407
Apr 7, 2026

@George-iam
Contributor

Summary

Three fixes for related audit reliability issues.

Fix 1: Session creation lock (sessions.ts)

  • O_EXCL filesystem lock prevents parallel hooks from creating multiple AXME sessions per Claude session
  • Spin-waits 500 ms on contention, cleans up stale locks (older than 5 s), and degrades gracefully
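
As an illustration of the locking pattern described above, here is a minimal sketch, not the code from sessions.ts. It assumes Node's `fs.openSync` with the `"wx"` flag (which opens with O_CREAT | O_EXCL) and uses hypothetical names throughout: `defaultLockPath`, `tryAcquireLock`, `releaseLock`, `clearStaleLock`, and `ensureSession`. What "graceful degradation" means in the PR is not spelled out; the sketch assumes it means proceeding without the lock once the wait budget is spent.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

const STALE_MS = 5_000; // locks older than this are treated as abandoned
const WAIT_MS = 500;    // total spin-wait budget on contention

/** Hypothetical lock location; the real path layout is not shown in the PR. */
export function defaultLockPath(claudeSessionId: string): string {
  return path.join(os.tmpdir(), `axme-${claudeSessionId}.lock`);
}

/** Returns true only for the single caller that creates the lock file. */
export function tryAcquireLock(lockPath: string): boolean {
  try {
    // "wx" fails with EEXIST if the file already exists (O_EXCL semantics).
    const fd = fs.openSync(lockPath, "wx");
    fs.writeSync(fd, String(process.pid));
    fs.closeSync(fd);
    return true;
  } catch (err) {
    if ((err as NodeJS.ErrnoException).code === "EEXIST") return false;
    throw err;
  }
}

export function releaseLock(lockPath: string): void {
  try {
    fs.unlinkSync(lockPath);
  } catch {
    // Already removed; nothing to do.
  }
}

/** Removes the lock if its mtime is older than STALE_MS; returns true if removed. */
export function clearStaleLock(lockPath: string, now: number = Date.now()): boolean {
  try {
    if (now - fs.statSync(lockPath).mtimeMs > STALE_MS) {
      fs.unlinkSync(lockPath);
      return true;
    }
  } catch {
    // The lock vanished between stat and unlink.
  }
  return false;
}

/** The lock winner creates the session; everyone else re-reads the mapping. */
export function ensureSession(
  lockPath: string,
  readMapping: () => string | undefined,
  createSession: () => string,
): string {
  const deadline = Date.now() + WAIT_MS;
  while (Date.now() < deadline) {
    if (tryAcquireLock(lockPath)) {
      try {
        // Re-check inside the lock: an earlier winner may have written the mapping.
        return readMapping() ?? createSession();
      } finally {
        releaseLock(lockPath);
      }
    }
    clearStaleLock(lockPath);
    const existing = readMapping();
    if (existing !== undefined) return existing;
  }
  // Assumed degradation path: continue without the lock rather than fail the hook.
  return readMapping() ?? createSession();
}
```

The loop is a plain busy-wait to stay close to the "spin-wait" wording; a real implementation might pause between attempts.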

Fix 2: Audit spawn dedup (server.ts + session-cleanup.ts)

  • cleanupAndExit groups AXME sessions by Claude session ID, spawns one audit per Claude session
  • Cross-session concurrent-audit check: if another AXME session with the same Claude session ID is pending, skip
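
The dedup step can be pictured roughly as follows. This is a sketch under stated assumptions rather than the logic in server.ts or session-cleanup.ts: the `AxmeSession` shape, the `auditStatus` values, and the `selectAuditTargets` helper are hypothetical, and for brevity it folds the grouping and the pending check into one function, whereas the PR places them in `cleanupAndExit` and `runSessionCleanup` respectively.

```typescript
type AuditStatus = "pending" | "done" | "failed";

// Hypothetical session record; the real fields are not shown in the PR.
interface AxmeSession {
  id: string;
  claudeSessionId: string;
  auditStatus?: AuditStatus;
}

/**
 * Picks at most one AXME session per Claude session ID to audit.
 * A Claude session is skipped entirely if any of its AXME sessions
 * already has an audit pending.
 */
export function selectAuditTargets(sessions: AxmeSession[]): AxmeSession[] {
  // Group AXME sessions by the Claude session they belong to.
  const groups = new Map<string, AxmeSession[]>();
  for (const session of sessions) {
    const group = groups.get(session.claudeSessionId);
    if (group) group.push(session);
    else groups.set(session.claudeSessionId, [session]);
  }

  const targets: AxmeSession[] = [];
  groups.forEach((group) => {
    // Cross-session check: a sibling is already being audited.
    if (group.some((s) => s.auditStatus === "pending")) return;
    const first = group[0];
    if (first) targets.push(first);
  });
  return targets;
}
```

With five AXME sessions spread over three Claude sessions, one of which already has a pending audit, this selects two audit targets instead of five.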

Fix 3: Stuck audit cleanup (session-cleanup.ts + cli.ts)

  • A finally block sets the audit log to phase=failed if neither the success path nor the catch handler ran
  • SIGTERM/SIGINT handlers in audit-session CLI set auditStatus=failed before exit
  • SIGKILL covered by existing 15-min stale timeout
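
To make the finalization guarantee concrete, here is a minimal sketch of the pattern, not the code in session-cleanup.ts or cli.ts. The `AuditLog` shape, `runAuditStep`, and `installSignalHandlers` are hypothetical, the audit step is synchronous for brevity (the real LLM audit is presumably asynchronous), and the exit codes 143 and 130 follow the common 128 + signal number convention as an assumption.

```typescript
type Phase = "started" | "done" | "failed";

// Hypothetical log record; only the phase field is named in the PR.
interface AuditLog {
  phase: Phase;
  error?: string;
}

/** Runs one audit step and guarantees the log never stays at phase=started. */
export function runAuditStep(log: AuditLog, audit: () => boolean): void {
  let finalized = false;
  try {
    const proceed = audit();
    if (!proceed) return; // early exit: neither the success path nor catch finalizes
    log.phase = "done";
    finalized = true;
  } catch (err) {
    log.phase = "failed";
    log.error = err instanceof Error ? err.message : String(err);
    finalized = true;
  } finally {
    // Safety net for any path that skipped both branches above.
    if (!finalized) log.phase = "failed";
  }
}

/**
 * Marks the audit as failed on SIGTERM/SIGINT before exiting.
 * Returns a function that removes the handlers again.
 */
export function installSignalHandlers(
  markFailed: () => void,
  exit: (code: number) => void = (code) => process.exit(code),
): () => void {
  const onTerm = (): void => { markFailed(); exit(143); }; // 128 + 15
  const onInt = (): void => { markFailed(); exit(130); };  // 128 + 2
  process.once("SIGTERM", onTerm);
  process.once("SIGINT", onInt);
  return () => {
    process.removeListener("SIGTERM", onTerm);
    process.removeListener("SIGINT", onInt);
  };
}
```

SIGKILL cannot be caught by a handler, which is why that case is left to the stale timeout.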

Evidence (from this session)

  • 156 sessions created, many duplicates for same Claude session
  • Paired audits reading identical transcript offsets (wasted LLM cost)
  • Two audits stuck at phase=started indefinitely (b211b61d, 7971ee25)

Test plan

  • 103 existing tests green
  • Build + type check clean
  • E2E: reload VS Code, verify session count +1 (not +5)
  • E2E: verify single audit worker spawned per reload
  • E2E: verify no new stuck audits (phase=started)

Fix 1: Filesystem lock (O_EXCL) in ensureAxmeSessionForClaude prevents
parallel hooks from creating multiple AXME sessions per Claude session.
Lock winner creates session, others re-read the mapping.

Fix 2: cleanupAndExit deduplicates by Claude session ID before spawning
audit workers (one per Claude session, not per AXME session). Plus
cross-session concurrent-audit check in runSessionCleanup as defense.

Fix 3: finally block in LLM audit section ensures audit log is finalized
even on unexpected termination. SIGTERM/SIGINT handlers in audit-session
CLI set auditStatus=failed before exit. SIGKILL handled by existing
15-minute stale timeout.
- 11 new tests: lock mechanism (7), sequential calls (3), parallel E2E (1)
- E2E test: 5 parallel processes -> exactly 1 session (verified)
- Fix: re-check inside lock trusts mapping unconditionally (winner's session
  may have dead pid if worker exited, but it's fresh by definition)
- Export lock functions for testing
@George-iam George-iam merged commit 363e43a into main Apr 7, 2026
@George-iam George-iam deleted the feat/audit-dedup-20260407 branch April 7, 2026 15:26