Skip to content

Duplicate review-thread reply to same comment: webhook + worker both post (claim system missed) #672

@FidoCanCode

Description

@FidoCanCode

Observed

On rhencke/confusio PR #192, reviewer comment 3102997241 got two near-duplicate fido replies 23s apart (plus a third fix-announcement reply), while comment 3103000287 got one clean reply:

id created_at in_reply_to body preview
3102997241 20:10:37Z (root, rhencke) "Yes and no. Redbean is fork()-driven..."
3103000287 20:11:12Z (root, rhencke) "We really need integration tests beyond Gitea..."
3103003525 20:11:49Z 3103000287 fido: "Totally agree — Gitea-only coverage..." ✅
3103017940 20:14:14Z 3102997241 fido: "Good catch! You're right — 'single-threaded'..."
3103019970 20:14:37Z 3102997241 fido: "Good catch! You're right — 'single-threaded'..." ← duplicate
3103020955 20:14:49Z 3102997241 fido: "Fixed in e319c94..."

What actually happened

Two reviewer comments arrived ~35s apart and both triggered legitimate webhook handlers simultaneously at 20:11:31 (tids 954503360 and 937717952). Nothing wrong with that — one handler per comment.

The duplicate is on a single comment: 3102997241 received both 3103017940 and 3103019970 with essentially the same generated body. One came from the webhook's reply path, the other from the worker running a thread task for the same comment. Same class as #566#662 ("duplicate replies to review-thread comments"), but the _replied_comments / RepliedComments claim filter missed it here.

Possible holes in the claim system:

  1. Race window: webhook handler for comment A posts its reply after the worker has already picked the thread task for A, so the worker's "filter threads by webhook-claimed ids" check ran before the claim existed.
  2. Cross-handler races: one of the two simultaneous handlers (for different comments) could have been serializing in a way that let the worker interleave before the claim landed.
  3. Claim-key mismatch: the webhook-side claim key doesn't match what the worker's handle_threads filter queries (e.g. review-comment vs. review-thread id, in_reply_to_id vs id). Worth auditing the exact ids stored/compared.

Proposed fix

Treat the claim as a pre-reply barrier, not a post-reply annotation:

  • Webhook: RepliedComments.claim(comment_id) acquires before the Opus reply generation begins and is visible to the worker before the reply is posted, not after.
  • Worker: handle_threads already filters by the claim set; add a tighter check right before the reply HTTP POST that re-reads the claim and aborts if a claim was added while Opus was generating.
  • Log when the worker-side dedupe fires (skipping thread N — claimed by webhook) so we can measure in production.

Also add an integration-shape test that runs webhook-then-worker on the same comment id with Opus mocked, asserting exactly one reply posted.

Related

Sibling bug #671 — same class on the issue-comment reply path (different endpoint, different claim?). Fix should cover both paths, or we'll keep playing whack-a-mole.

Sub-issue of

Bugs work order: #41 (at 100-cap; stands alone under v1 if link fails).

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions