## Summary

There's a critical bug causing duplicate events and failed heartbeat merges. The root cause is a mismatch between two SQL queries that should target the same event but use different selection criteria.
## Root Cause

`get_events` (datastore.rs:736):

```sql
ORDER BY starttime DESC LIMIT 1
```

`replace_last_event` (datastore.rs:563):

```sql
WHERE endtime = (SELECT max(endtime) FROM events WHERE bucketrow = ?1)
```

When events have the same starttime but different durations (common during AFK status transitions), these queries can target different events.
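To make the mismatch concrete, here is a minimal stdlib-only Rust sketch (the `Event` struct and helper functions are hypothetical, not the real aw-server-rust types) that models the two selection criteria over an in-memory event list. SQLite gives no ordering guarantee among tied rows; the sketch simulates that by having the starttime-based query return the first tied row.

```rust
// Hypothetical minimal event type, for illustration only.
#[derive(Debug, Clone)]
struct Event {
    id: i64,
    starttime: i64, // epoch millis
    endtime: i64,
    data: &'static str,
}

// Two heartbeats created at the same instant: same starttime, different durations.
fn sample_events() -> Vec<Event> {
    vec![
        Event { id: 1, starttime: 1000, endtime: 1000, data: "not-afk" }, // duration 0
        Event { id: 2, starttime: 1000, endtime: 1500, data: "afk" },     // duration 500
    ]
}

// Models `ORDER BY starttime DESC LIMIT 1`: for tied starttimes SQLite may
// return any of the rows; we simulate it returning the first one.
fn get_events_pick(events: &[Event]) -> i64 {
    let max_start = events.iter().map(|e| e.starttime).max().unwrap();
    events.iter().find(|e| e.starttime == max_start).unwrap().id
}

// Models `WHERE endtime = (SELECT max(endtime) ...)`.
fn replace_last_event_pick(events: &[Event]) -> i64 {
    let max_end = events.iter().map(|e| e.endtime).max().unwrap();
    events.iter().find(|e| e.endtime == max_end).unwrap().id
}

fn main() {
    let events = sample_events();
    // The two criteria disagree: one query picks id 1, the other id 2.
    println!("get_events picked id {}", get_events_pick(&events));
    println!("replace_last_event picked id {}", replace_last_event_pick(&events));
}
```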
## Impact

In my database:

- 284,995 AFK events reduced to 10,816 after deduplication + merge (96% were duplicates/unmerged)
- 92% of events had zero duration (heartbeats were never merged)
- 14,013 events on a single day (should be ~50 status transitions)
## Reproduction

1. The AFK watcher sends two rapid heartbeats on a status change (e.g., becoming idle):

   ```
   ping(not-afk, T, duration=0)
   ping(afk, T, duration=X)
   ```

2. Both events have the same starttime T but different data/duration.

3. The next heartbeat arrives:
   - `get_events(LIMIT 1)` with `ORDER BY starttime DESC` returns one of the two events (the order is undefined for identical starttimes).
   - If the data doesn't match, the merge fails → a new event is INSERTed.
   - If the merge succeeds, `replace_last_event` targets the event with `max(endtime)` — potentially a different event.

4. The cache and the database diverge, and duplicates accumulate.
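The accumulation in step 4 can be simulated with a small, stdlib-only sketch (hypothetical names, heavily simplified, not the real aw-server-rust code): when the starttime-based lookup keeps returning the wrong member of a tied pair, later heartbeats never match its data, and every one of them INSERTs a fresh zero-duration duplicate.

```rust
// Hypothetical in-memory stand-in for the events table.
#[derive(Debug, Clone)]
struct Ev {
    start: i64,
    end: i64,
    data: &'static str,
}

fn heartbeat(db: &mut Vec<Ev>, data: &'static str, t: i64) {
    // "get_events": ORDER BY starttime DESC LIMIT 1; tie order is undefined,
    // simulated here as the first row with the maximal starttime.
    let last = db
        .iter()
        .map(|e| e.start)
        .max()
        .and_then(|m| db.iter().position(|e| e.start == m));
    match last {
        // Data matches: merge by extending the endtime.
        Some(i) if db[i].data == data => db[i].end = db[i].end.max(t),
        // Data mismatch (or empty bucket): INSERT a new zero-duration event.
        _ => db.push(Ev { start: t, end: t, data }),
    }
}

fn simulate() -> Vec<Ev> {
    let mut db = Vec::new();
    heartbeat(&mut db, "not-afk", 1000); // status change: first rapid ping
    for _ in 0..3 {
        // Every later "afk" ping at the tied timestamp sees the "not-afk"
        // row first, so the merge fails and a duplicate is inserted.
        heartbeat(&mut db, "afk", 1000);
    }
    db
}

fn main() {
    let db = simulate();
    println!(
        "{} events, all zero-duration: {}",
        db.len(),
        db.iter().all(|e| e.end == e.start)
    );
}
```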
## Additional Issue

`replace_last_event` doesn't check the number of rows affected (datastore.rs:584-590):

```rust
match stmt.execute([...]) {
    Ok(_) => self.update_endtime(&mut bucket, event), // Ignores rows_affected!
    Err(err) => return Err(...)
}
```

If the UPDATE matches 0 rows, it still returns Ok(()), corrupting the cache.
## Proposed Fixes

### Fix 1: Consistent query selection (Required)

Either:

- Pass the event ID to `replace_last_event` and use `WHERE id = ?`
- Or change `get_events` to use `ORDER BY endtime DESC`
### Fix 2: Verify row count (Required)

```rust
match stmt.execute([...]) {
    Ok(0) => return Err(DatastoreError::InternalError(
        "replace_last_event matched 0 rows".into()
    )),
    Ok(_) => self.update_endtime(&mut bucket, event),
    Err(err) => return Err(...)
}
```

### Fix 3: Add a tiebreaker to ORDER BY (Recommended)

```sql
ORDER BY starttime DESC, id DESC LIMIT 1
```

This ensures deterministic ordering when starttimes are identical.
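The effect of the tiebreaker can be sketched in plain Rust (hypothetical helper, `(id, starttime)` pairs standing in for rows): with a compound `(starttime, id)` key, the "last event" is the same no matter which order the tied rows come back in.

```rust
// Mirrors `ORDER BY starttime DESC, id DESC LIMIT 1`: the highest starttime
// wins, and the highest id breaks ties, so the result is deterministic.
fn last_event_id(events: &[(i64, i64)]) -> Option<i64> {
    // events are (id, starttime) pairs.
    events
        .iter()
        .max_by_key(|&&(id, start)| (start, id))
        .map(|&(id, _)| id)
}

fn main() {
    // Two events with the same starttime: id 2 is chosen regardless of
    // the order the rows are stored in.
    println!("{:?}", last_event_id(&[(1, 1000), (2, 1000)]));
    println!("{:?}", last_event_id(&[(2, 1000), (1, 1000)]));
}
```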
## Environment

- aw-server-rust: latest (cloned today)
- awatcher client
- ~8 months of data
## Workaround

The data can be recovered by:

- Deduplicating (keeping the longest duration per starttime+data)
- Merging consecutive same-status events
- Reconstructing durations from the next event's starttime
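The three recovery steps above can be sketched as a single stdlib-only Rust pass (the `Ev` struct and field names are illustrative, not the real schema or export format):

```rust
use std::collections::BTreeMap;

// Hypothetical exported event, for illustration only.
#[derive(Debug, Clone, PartialEq)]
struct Ev {
    start: i64,
    end: i64,
    data: String,
}

fn recover(events: Vec<Ev>) -> Vec<Ev> {
    // Step 1: deduplicate, keeping the longest duration per (starttime, data).
    let mut best: BTreeMap<(i64, String), Ev> = BTreeMap::new();
    for e in events {
        let key = (e.start, e.data.clone());
        match best.get(&key) {
            Some(kept) if kept.end - kept.start >= e.end - e.start => {}
            _ => {
                best.insert(key, e);
            }
        }
    }
    let mut deduped: Vec<Ev> = best.into_values().collect();
    deduped.sort_by_key(|e| e.start);

    // Steps 2 and 3: merge consecutive same-status events and reconstruct
    // each event's duration from the next event's starttime.
    let mut out: Vec<Ev> = Vec::new();
    for e in deduped {
        if let Some(last) = out.last_mut() {
            if last.data == e.data {
                last.end = last.end.max(e.end).max(e.start);
                continue;
            }
            // Status changed: the previous event ran until this one started.
            last.end = last.end.max(e.start);
        }
        out.push(e);
    }
    out
}

fn main() {
    let raw = vec![
        Ev { start: 0, end: 0, data: "afk".into() },   // zero-duration duplicate
        Ev { start: 0, end: 50, data: "afk".into() },  // longest copy survives
        Ev { start: 50, end: 50, data: "afk".into() }, // merges into the run
        Ev { start: 100, end: 100, data: "not-afk".into() },
    ];
    for e in recover(raw) {
        println!("{:?}", e);
    }
}
```

On the sample above this collapses four raw rows into two: one `afk` run spanning 0..100 and the `not-afk` event that follows it.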
Happy to provide more details or test patches.