Commit aa848ce

fix: reject heartbeats on lock timeout instead of proceeding unsafely (#159)
* fix: reject heartbeats when lock times out instead of proceeding unsafely

  When the heartbeat lock couldn't be acquired within 1s, the code would proceed without the lock (causing concurrent SQLite access and "database is locked" errors) then try to release a lock it didn't own. Now returns 503 so the client can retry cleanly. Also increased timeout to 10s to reduce spurious rejections.

* fix: make heartbeat lock a class variable for actual thread safety

  The lock was created in __init__ as self.lock = Lock(), but Flask-RESTX instantiates Resource per-request, so each request got its own lock providing zero mutual exclusion. Moving to a class variable ensures all heartbeat requests are properly serialized.
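
The second fix rests on the difference between a lock created in `__init__` and one stored as a class attribute. A minimal sketch of that difference, using only `threading.Lock`, might look like the following. The class names `PerInstanceLockResource` and `SharedLockResource` and the helper `second_request_is_blocked` are hypothetical stand-ins for illustration and are not part of aw_server or Flask-RESTX.

```python
from threading import Lock


class PerInstanceLockResource:
    """Hypothetical stand-in for a resource that is re-created per request."""

    def __init__(self):
        # Each instance builds its own lock, so two instances never contend.
        self.lock = Lock()


class SharedLockResource:
    """Hypothetical stand-in that keeps the lock as a class attribute."""

    # Created once at class definition; every instance sees the same object.
    lock = Lock()


def second_request_is_blocked(resource_cls) -> bool:
    """Return True if a second instance cannot take the lock held by the first."""
    first, second = resource_cls(), resource_cls()
    first.lock.acquire()
    try:
        # Non-blocking attempt: it succeeds only if the two locks are
        # different objects, i.e. there is no mutual exclusion.
        got_it = second.lock.acquire(blocking=False)
        if got_it:
            second.lock.release()
        return not got_it
    finally:
        first.lock.release()


if __name__ == "__main__":
    print("per-instance lock blocks second request:",
          second_request_is_blocked(PerInstanceLockResource))
    print("class-level lock blocks second request:",
          second_request_is_blocked(SharedLockResource))
```

With the per-instance variant the second acquire succeeds immediately, which is the "zero mutual exclusion" situation the commit message describes; with the class attribute it fails until the first holder releases.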
1 parent 153e7fd commit aa848ce

1 file changed

Lines changed: 9 additions & 4 deletions

aw_server/rest.py

@@ -271,8 +271,12 @@ def delete(self, bucket_id: str, event_id: int):

 @api.route("/0/buckets/<string:bucket_id>/heartbeat")
 class HeartbeatResource(Resource):
+    # Class-level lock shared across all instances.
+    # Flask-RESTX creates a new Resource instance per request, so an
+    # instance-level lock would provide no mutual exclusion.
+    lock = Lock()
+
     def __init__(self, *args, **kwargs):
-        self.lock = Lock()
         super().__init__(*args, **kwargs)

     @api.expect(event, validate=True)
@@ -291,11 +295,12 @@ def post(self, bucket_id):
         # This lock is meant to ensure that only one heartbeat is processed at a time,
         # as the heartbeat function is not thread-safe.
         # This should maybe be moved into the api.py file instead (but would be very messy).
-        aquired = self.lock.acquire(timeout=1)
-        if not aquired:
+        acquired = self.lock.acquire(timeout=10)
+        if not acquired:
             logger.warning(
-                "Heartbeat lock could not be aquired within a reasonable time, this likely indicates a bug."
+                "Heartbeat lock could not be acquired within timeout, rejecting request."
             )
+            return {"message": "Server busy, try again later"}, 503
         try:
             event = current_app.api.heartbeat(bucket_id, heartbeat, pulsetime)
         finally:
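
The control flow of the second hunk can be sketched outside Flask as follows. This is an illustration under stated assumptions, not the code in aw_server/rest.py: the helper name `handle_with_lock`, its `work` callback, and the 200 status on success are hypothetical, and the body of the `finally` block is not shown in the diff, so the sketch assumes it releases the lock.

```python
from threading import Lock


def handle_with_lock(lock, work, timeout=10.0):
    """Run work() under lock, or return a 503-style tuple if the lock is busy.

    Hypothetical helper: mirrors the acquire / early-return / try-finally
    shape of the patched post() method.
    """
    acquired = lock.acquire(timeout=timeout)
    if not acquired:
        # Return before entering the try block, so the finally clause
        # never releases a lock this caller does not own.
        return {"message": "Server busy, try again later"}, 503
    try:
        return work(), 200  # 200 is an assumption made for this sketch
    finally:
        # Assumed behaviour of the finally block elided in the diff.
        lock.release()


if __name__ == "__main__":
    lock = Lock()
    print(handle_with_lock(lock, lambda: "processed"))

    lock.acquire()  # simulate another request holding the lock
    print(handle_with_lock(lock, lambda: "processed", timeout=0.05))
    lock.release()
```

Returning early matters because `Lock.release()` on a lock held by someone else would unlock it under the real owner, which is the unsafe behaviour the first fix removes.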
