Problem
`SessionStats.flush()` in `packages/claude-code-plugin/hooks/lib/stats.py` performs a non-atomic read-modify-write against the on-disk stats file. The file lock is released between the read and the write, so two `PostToolUse` hook processes that fire concurrently can each read the same baseline, each apply their delta, and the second writer overwrites the first writer's update. The session loses one or more tool calls.
Affected code
File: `packages/claude-code-plugin/hooks/lib/stats.py`
Method: `SessionStats.flush()`
```python
def flush(self) -> None:
    """Flush accumulated in-memory stats to disk."""
    if self._pending_count == 0:
        return
    data = self._locked_read()  # <-- LOCK_SH acquired and released
    data["tool_count"] = data.get("tool_count", 0) + self._mem_tool_count
    data["error_count"] = data.get("error_count", 0) + self._mem_error_count
    tool_names = data.get("tool_names", {})
    for name, count in self._mem_tool_names.items():
        tool_names[name] = tool_names.get(name, 0) + count
    data["tool_names"] = tool_names
    # Merge hook timings
    hook_timings = data.get("hook_timings", {})
    for name, times in self._mem_hook_timings.items():
        if name not in hook_timings:
            hook_timings[name] = []
        hook_timings[name].extend(times)
    data["hook_timings"] = hook_timings
    self._locked_write(data)  # <-- new LOCK_EX acquired here
    # Reset in-memory accumulators
    ...
```
The helpers `_locked_read()` and `_locked_write()` each open the file inside their own `with` block and release the lock when the block exits. Between the two calls there is a window in which another process can grab `LOCK_EX`.
Race scenario
Two `PostToolUse` hook processes A and B fire for two parallel tool calls:
1. Process A: `_locked_read()` returns `{tool_count: 5}`, releases lock
2. Process B: `_locked_read()` returns `{tool_count: 5}`, releases lock
3. Process A: applies delta, `_locked_write({tool_count: 6})`
4. Process B: applies delta, `_locked_write({tool_count: 6})` (should be 7)

One tool call has been silently lost.
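The interleaving above can be replayed deterministically. The sketch below is only an illustration: `disk`, `locked_read`, and `locked_write` are hypothetical in-memory stand-ins for the stats file and the two helpers, not the plugin's code, and no real locking takes place.

```python
import json

# Hypothetical in-memory stand-in for the on-disk stats file.
disk = {"contents": json.dumps({"tool_count": 5})}


def locked_read() -> dict:
    """Stand-in for _locked_read(): the lock is already gone when this returns."""
    return json.loads(disk["contents"])


def locked_write(data: dict) -> None:
    """Stand-in for _locked_write(): replaces the whole file contents."""
    disk["contents"] = json.dumps(data)


# Both reads happen before either write, as in the scenario.
a = locked_read()        # process A sees tool_count == 5
b = locked_read()        # process B sees tool_count == 5
a["tool_count"] += 1
locked_write(a)          # file now holds 6
b["tool_count"] += 1
locked_write(b)          # file still holds 6: A's update is overwritten

final = json.loads(disk["contents"])["tool_count"]
print(final)  # 6, although two tool calls were recorded
```

Holding a lock around each individual call does not help here, because the stale value is carried across the gap between them.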
Why this matters
- Claude Code can dispatch tools in parallel (e.g. multiple `Agent` invocations, or several `Read`/`Grep` in one assistant turn). Each spawns its own short-lived `PostToolUse` process, so the race window is realistic, not hypothetical.
- Users will see slightly under-reported `[CB] Xm | N tools | ...` numbers in the Stop hook summary on busy turns.
Reproduction
```bash
cd packages/claude-code-plugin
python3 - <<'PY'
import multiprocessing as mp
import os, sys, tempfile, json
sys.path.insert(0, 'hooks/lib')
from stats import SessionStats

def worker(data_dir, session_id, n):
    s = SessionStats(session_id=session_id, data_dir=data_dir, flush_interval=10)
    for _ in range(n):
        s.record_tool_call("Bash")
        s.flush()

with tempfile.TemporaryDirectory() as tmp:
    procs = [mp.Process(target=worker, args=(tmp, "race", 100)) for _ in range(8)]
    for p in procs: p.start()
    for p in procs: p.join()
    s = SessionStats(session_id="race", data_dir=tmp)
    on_disk = s._locked_read()
    print("expected", 8 * 100, "got", on_disk["tool_count"])
PY
```
Expected output: `expected 800 got 800`. Today you will typically see a number well under 800 (varies per run).
Fix direction
Replace `_locked_read()` + `_locked_write()` inside `flush()` with a single critical section:
1. Open the stats file in `r+` mode
2. Acquire `fcntl.flock(LOCK_EX)`
3. `json.load()` from the file handle
4. Apply in-memory deltas to the loaded dict
5. `f.seek(0)` + `f.truncate()` + `json.dump()`
6. Release the lock by closing the file
A clean way to do this is to add a private `_locked_modify(self, mutator)` helper on `SessionStats` that takes a callable `(data: dict) -> dict` and runs it inside one `LOCK_EX` window. `flush()` then becomes a thin caller of `_locked_modify`.
Note: `_locked_read()` and `_locked_write()` may still be useful for callers that only read or only write, so leave them in place.
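The steps above could look roughly like the following. This is a minimal sketch, not the plugin's implementation: it is written as a free function `locked_modify(path, mutator)` rather than a method, the `add_tool_calls` mutator and the demo file name are made up for illustration, and it assumes a POSIX platform where `fcntl` is available. It opens the file with `os.open(..., O_RDWR | O_CREAT)` rather than plain `r+` so that a missing file does not raise; whether the real code needs that is an assumption.

```python
import fcntl
import json
import os
import tempfile
from typing import Callable


def locked_modify(path: str, mutator: Callable[[dict], dict]) -> dict:
    """Read, mutate and rewrite a JSON file inside one LOCK_EX window."""
    # O_CREAT creates the file if it is missing, without truncating an existing one.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    with os.fdopen(fd, "r+", encoding="utf-8") as f:
        fcntl.flock(f, fcntl.LOCK_EX)      # held until the file is closed
        raw = f.read()
        data = json.loads(raw) if raw.strip() else {}
        data = mutator(data)
        f.seek(0)
        f.truncate()
        json.dump(data, f)
        f.flush()                          # data is written before the lock is dropped
    return data


def add_tool_calls(delta: int) -> Callable[[dict], dict]:
    """Hypothetical mutator: add `delta` to tool_count, as flush() would."""
    def mutator(data: dict) -> dict:
        data["tool_count"] = data.get("tool_count", 0) + delta
        return data
    return mutator


# Demo: two successive flushes against the same file accumulate.
with tempfile.TemporaryDirectory() as tmp:
    stats_path = os.path.join(tmp, "stats.json")
    locked_modify(stats_path, add_tool_calls(2))
    locked_modify(stats_path, add_tool_calls(3))
    with open(stats_path, encoding="utf-8") as fh:
        final = json.load(fh)

print(final)
```

Passing the mutation as a callable keeps the read, the update, and the write in one place, so a caller cannot accidentally reintroduce a gap between them.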
Acceptance criteria
- `SessionStats.flush()` performs read-modify-write inside a single `fcntl.flock(LOCK_EX)` window.
- New regression test in `packages/claude-code-plugin/tests/test_stats.py` (suggested class `TestConcurrentFlush`):
  - Spawns 8 `multiprocessing.Process` workers, each calling `record_tool_call()` + `flush()` 100 times against the same session/data_dir.
  - Asserts the final on-disk `tool_count` equals `8 * 100` exactly.
  - Asserts the final `tool_names["Bash"]` equals `8 * 100`.
- Existing `test_concurrent_writes_dont_corrupt` (single-process) still passes.
- All other tests in `test_stats.py` still pass.
- Verify behavior on macOS (`fcntl` available) and document the fallback when `HAS_FCNTL` is `False`. Currently the code silently skips locking, so that fallback also needs a comment about the lost-update risk.
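The shape of the regression test might resemble the sketch below. It is self-contained and therefore does not import `SessionStats`: `bump()` is a hypothetical stand-in for `record_tool_call()` + `flush()` that performs one locked read-modify-write, and `run_concurrent()` is an invented helper name. The explicit `fork` start method is an assumption made so the sketch behaves the same regardless of the platform default; the real test would use the plugin's own class and fixtures.

```python
import fcntl
import json
import multiprocessing as mp
import os
import tempfile


def bump(path: str) -> None:
    """Stand-in for record_tool_call() + flush(): one locked read-modify-write."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    with os.fdopen(fd, "r+", encoding="utf-8") as f:
        fcntl.flock(f, fcntl.LOCK_EX)
        raw = f.read()
        data = json.loads(raw) if raw.strip() else {}
        data["tool_count"] = data.get("tool_count", 0) + 1
        f.seek(0)
        f.truncate()
        json.dump(data, f)
        f.flush()


def worker(path: str, n: int) -> None:
    for _ in range(n):
        bump(path)


def run_concurrent(workers: int = 8, calls: int = 100) -> int:
    """Run `workers` processes against one file and return the final count."""
    ctx = mp.get_context("fork")  # assumption: POSIX, fork available
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "stats.json")
        procs = [ctx.Process(target=worker, args=(path, calls)) for _ in range(workers)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        assert all(p.exitcode == 0 for p in procs)
        with open(path, encoding="utf-8") as f:
            return json.load(f)["tool_count"]


def test_concurrent_flush_loses_no_updates() -> None:
    assert run_concurrent(workers=8, calls=100) == 8 * 100
```

Because a lost update is probabilistic, a test like this only proves the fix if it fails reliably against the current `flush()` first.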
Out of scope
- `record_hook_timing` is never called by any hook; tracked separately.
- `HookTimer` (`hooks/lib/hook_timer.py`) is also dead code; tracked in the same separate issue.