Observed
On rhencke/confusio PR #192, reviewer comment 3102997241 got two near-duplicate fido replies 23s apart (plus a third fix-announcement reply), while comment 3103000287 got one clean reply:
| id |
created_at |
in_reply_to |
body preview |
| 3102997241 |
20:10:37Z |
(root, rhencke) |
"Yes and no. Redbean is fork()-driven..." |
| 3103000287 |
20:11:12Z |
(root, rhencke) |
"We really need integration tests beyond Gitea..." |
| 3103003525 |
20:11:49Z |
3103000287 |
fido: "Totally agree — Gitea-only coverage..." ✅ |
| 3103017940 |
20:14:14Z |
3102997241 |
fido: "Good catch! You're right — 'single-threaded'..." |
| 3103019970 |
20:14:37Z |
3102997241 |
fido: "Good catch! You're right — 'single-threaded'..." ← duplicate |
| 3103020955 |
20:14:49Z |
3102997241 |
fido: "Fixed in e319c94..." |
What actually happened
Two reviewer comments arrived ~35s apart and both triggered legitimate webhook handlers simultaneously at 20:11:31 (tids 954503360 and 937717952). Nothing wrong with that — one handler per comment.
The duplicate is on a single comment: 3102997241 received both 3103017940 and 3103019970 with essentially the same generated body. One came from the webhook's reply path, the other from the worker running a thread task for the same comment. Same class as #566 → #662 ("duplicate replies to review-thread comments"), but the _replied_comments / RepliedComments claim filter missed it here.
Possible holes in the claim system:
- Race window: webhook handler for comment A posts its reply after the worker has already picked the thread task for A, so the worker's "filter threads by webhook-claimed ids" check ran before the claim existed.
- Cross-handler races: one of the two simultaneous handlers (for different comments) could have been serializing in a way that let the worker interleave before the claim landed.
- Claim-key mismatch: the webhook-side claim key doesn't match what the worker's
handle_threads filter queries (e.g. review-comment vs. review-thread id, in_reply_to_id vs id). Worth auditing the exact ids stored/compared.
Proposed fix
Treat the claim as a pre-reply barrier, not a post-reply annotation:
- Webhook:
RepliedComments.claim(comment_id) acquires before the Opus reply generation begins and is visible to the worker before the reply is posted, not after.
- Worker:
handle_threads already filters by the claim set; add a tighter check right before the reply HTTP POST that re-reads the claim and aborts if a claim was added while Opus was generating.
- Log when the worker-side dedupe fires (
skipping thread N — claimed by webhook) so we can measure in production.
Also add an integration-shape test that runs webhook-then-worker on the same comment id with Opus mocked, asserting exactly one reply posted.
Related
Sibling bug #671 — same class on the issue-comment reply path (different endpoint, different claim?). Fix should cover both paths, or we'll keep playing whack-a-mole.
Sub-issue of
Bugs work order: #41 (at 100-cap; stands alone under v1 if link fails).
Observed
On rhencke/confusio PR #192, reviewer comment 3102997241 got two near-duplicate fido replies 23s apart (plus a third fix-announcement reply), while comment 3103000287 got one clean reply:
What actually happened
Two reviewer comments arrived ~35s apart and both triggered legitimate webhook handlers simultaneously at 20:11:31 (tids
954503360and937717952). Nothing wrong with that — one handler per comment.The duplicate is on a single comment:
3102997241received both3103017940and3103019970with essentially the same generated body. One came from the webhook's reply path, the other from the worker running a thread task for the same comment. Same class as #566 → #662 ("duplicate replies to review-thread comments"), but the_replied_comments/RepliedCommentsclaim filter missed it here.Possible holes in the claim system:
handle_threadsfilter queries (e.g. review-comment vs. review-thread id,in_reply_to_idvsid). Worth auditing the exact ids stored/compared.Proposed fix
Treat the claim as a pre-reply barrier, not a post-reply annotation:
RepliedComments.claim(comment_id)acquires before the Opus reply generation begins and is visible to the worker before the reply is posted, not after.handle_threadsalready filters by the claim set; add a tighter check right before the reply HTTP POST that re-reads the claim and aborts if a claim was added while Opus was generating.skipping thread N — claimed by webhook) so we can measure in production.Also add an integration-shape test that runs webhook-then-worker on the same comment id with Opus mocked, asserting exactly one reply posted.
Related
Sibling bug #671 — same class on the issue-comment reply path (different endpoint, different claim?). Fix should cover both paths, or we'll keep playing whack-a-mole.
Sub-issue of
Bugs work order: #41 (at 100-cap; stands alone under v1 if link fails).