Skip to content

fix: reject heartbeats on lock timeout instead of proceeding unsafely#159

Merged
ErikBjare merged 2 commits intomasterfrom
fix/heartbeat-lock-handling
Apr 13, 2026
Merged

fix: reject heartbeats on lock timeout instead of proceeding unsafely#159
ErikBjare merged 2 commits intomasterfrom
fix/heartbeat-lock-handling

Conversation

@ErikBjare
Copy link
Copy Markdown
Member

Summary

  • Return 503 when heartbeat lock can't be acquired, instead of proceeding without thread safety
  • Increase lock timeout from 1s to 10s to reduce spurious rejections

Problem

The heartbeat lock handling had a bug: when self.lock.acquire(timeout=1) timed out, the code would:

  1. Proceed to execute heartbeat() without holding the lock — causing concurrent SQLite access
  2. Call self.lock.release() in finally — releasing a lock this thread doesn't own

This led to frequent database is locked errors and 500 responses, which caused watchers to retry, amplifying the problem.

Fix

  • If the lock can't be acquired within 10s, return HTTP 503 ("Server busy") so the client retries cleanly
  • The longer timeout reduces spurious rejections under normal load

Evidence from production logs

peewee.OperationalError: database is locked
500 (127.0.0.1): POST /api/0/buckets/aw-watcher-window_.../heartbeat

~4000 such errors in a 21-day session, with 9000 queued retry databases in aw-client.

Test plan

  • Existing tests pass
  • Under concurrent heartbeat load, no "database is locked" errors
  • Lock timeout returns 503 instead of crashing

See also: ActivityWatch/aw-core PR for enabling WAL mode (the complementary fix)

…fely

When the heartbeat lock couldn't be acquired within 1s, the code would
proceed without the lock (causing concurrent SQLite access and "database
is locked" errors) then try to release a lock it didn't own.

Now returns 503 so the client can retry cleanly. Also increased timeout
to 10s to reduce spurious rejections.
@greptile-apps
Copy link
Copy Markdown

greptile-apps bot commented Apr 13, 2026

Greptile Summary

This PR fixes two related bugs in HeartbeatResource: the lock was created per-request instance (providing no mutual exclusion), and a timeout failure fell through into heartbeat() without holding the lock while the finally block released it erroneously. Both are fixed by moving the lock to class level and returning 503 on timeout.

Confidence Score: 5/5

Safe to merge — fixes a real concurrency bug with no new risk introduced.

Both bugs (per-instance lock and fall-through on timeout) are correctly addressed. The class-level lock provides the intended mutual exclusion, the 503 early return prevents the unsafe code path, and the increased timeout reduces spurious rejections. No P0/P1 issues remain.

No files require special attention.

Important Files Changed

Filename Overview
aw_server/rest.py Class-level lock fixes mutual exclusion; 503 return on timeout prevents unsafe lock-free execution and erroneous release; timeout bumped to 10s.

Reviews (2): Last reviewed commit: "fix: make heartbeat lock a class variabl..." | Re-trigger Greptile

The lock was created in __init__ as self.lock = Lock(), but Flask-RESTX
instantiates Resource per-request, so each request got its own lock
providing zero mutual exclusion. Moving to a class variable ensures all
heartbeat requests are properly serialized.
@ErikBjare
Copy link
Copy Markdown
Member Author

@greptileai review

@ErikBjare ErikBjare merged commit aa848ce into master Apr 13, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant