fix: reject heartbeats on lock timeout instead of proceeding unsafely #159
…fely

When the heartbeat lock couldn't be acquired within 1s, the code would proceed without the lock (causing concurrent SQLite access and "database is locked" errors) and then try to release a lock it didn't own. It now returns 503 so the client can retry cleanly. The timeout is also increased to 10s to reduce spurious rejections.
Greptile Summary
This PR fixes two related bugs in

Confidence Score: 5/5
Safe to merge: fixes a real concurrency bug with no new risk introduced.

Both bugs (per-instance lock and fall-through on timeout) are correctly addressed. The class-level lock provides the intended mutual exclusion, the 503 early return prevents the unsafe code path, and the increased timeout reduces spurious rejections. No P0/P1 issues remain.

No files require special attention.

Important Files Changed

Reviews (2): Last reviewed commit: "fix: make heartbeat lock a class variabl..."
The lock was created in __init__ as self.lock = Lock(), but Flask-RESTX instantiates Resource per-request, so each request got its own lock providing zero mutual exclusion. Moving to a class variable ensures all heartbeat requests are properly serialized.
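The difference between the two lock placements can be sketched without Flask-RESTX at all. The classes and the helper below are hypothetical stand-ins, not code from this PR; creating two instances stands in for the framework building a fresh Resource for each request.

```python
import threading


class PerInstanceLockResource:
    """Hypothetical stand-in for the old behaviour: one lock per instance."""

    def __init__(self):
        self.lock = threading.Lock()


class ClassLevelLockResource:
    """Hypothetical stand-in for the fix: one lock shared by all instances."""

    lock = threading.Lock()


def second_request_blocked(resource_cls):
    """Return True if a second instance is blocked while the first holds its lock."""
    first = resource_cls()   # "request 1"
    second = resource_cls()  # "request 2"
    first.lock.acquire()
    try:
        # Non-blocking attempt: succeeds only if the two locks are different objects.
        got_it = second.lock.acquire(blocking=False)
        if got_it:
            second.lock.release()
        return not got_it
    finally:
        first.lock.release()
```

With per-instance locks the second request is never blocked, so the lock serializes nothing. With the class-level lock the second request has to wait for the first.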
@greptileai review
Summary
Problem
The heartbeat lock handling had a bug: when self.lock.acquire(timeout=1) timed out, the code would:

1. Call heartbeat() without holding the lock, causing concurrent SQLite access
2. Call self.lock.release() in finally, releasing a lock this thread doesn't own

This led to frequent "database is locked" errors and 500 responses, which caused watchers to retry, amplifying the problem.

Fix
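The pattern this PR describes (check the result of acquire, return 503 on timeout, and release only a lock that was actually acquired) could look roughly like the sketch below. It is not the PR's diff: the HeartbeatHandler class, its _heartbeat helper, the response shape, and the timeout parameter on post are all assumptions made for illustration. Only the 10s timeout and the 503 status come from the description.

```python
import threading

LOCK_TIMEOUT_S = 10  # value taken from the PR description


class HeartbeatHandler:
    """Hypothetical handler; not the actual aw-server Resource class."""

    lock = threading.Lock()  # class-level, shared by every instance

    def _heartbeat(self, data):
        # Placeholder for the real database write.
        return {"ok": True, "data": data}

    def post(self, data, timeout=LOCK_TIMEOUT_S):
        # acquire() returns False on timeout; never fall through in that case.
        if not self.lock.acquire(timeout=timeout):
            return {"message": "heartbeat lock busy, retry later"}, 503
        try:
            return self._heartbeat(data), 200
        finally:
            # Reached only when this thread really holds the lock.
            self.lock.release()
```

Putting the acquire call before the try block is the key design choice: a timed-out request returns early and never reaches the finally clause, so it cannot release a lock held by another thread.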
Evidence from production logs
~4000 such errors in a 21-day session, with 9000 queued retry databases in aw-client.
Test plan
See also: ActivityWatch/aw-core PR for enabling WAL mode (the complementary fix)