Skip to content

Thread auto-resolve races with late-arriving webhook task creation (fido resolves before follow-up task is queued) #520

@FidoCanCode

Description

@FidoCanCode

Symptom

Fido resolves a review thread as soon as its pending tasks are all complete — but late-arriving webhook triage can create a new thread task moments later from a reviewer comment that was already in the thread. Result: resolved thread + pending task. Human sees fido saying "Task complete" and closing the loop, then fido starts working on the same thread again as if it were a new item.

Concrete repro — rhencke/orly PR #3

Thread on discussion_r3083222125:

```
00:12:28 rhencke: "Missing caching"
00:13:09 fido: (clarification question)
00:13:26 fido: (clarification question — double-reply, tracked in #518)
00:13:53 rhencke: "Ocaml and Coq both. Look at setup-cache and the OCaml installing action."
00:13:27 [orly] task: "What should have caching in this PR?" ← task from first comment
00:14:14 [orly] claude> Great! The reviewer clarified: cache both OCaml and Coq…
00:14:18 [orly] claude> Now I'll add caching to the CI workflow.
00:14:41 [orly] claude> Let me post a reply to the thread
00:14:58 [orly] task completed (id=…0374): "What should have caching in this PR?"
00:15:02 [orly] resolved thread PRRT_kwDOSC2Wec56-nhd ← PREMATURE
00:15:13 [orly] (new thread task created: "Add caching for CI workflow dependencies")
00:15:22 [orly] task: Add caching for CI workflow dependencies
```

Fido:

  • Resolved the thread at 00:15:02, satisfied that the task he was working on was done.
  • 11 seconds later, the rescope/triage for rhencke's clarification comment finally queued a new task — for the same thread.
  • Worker picks up the new task at 00:15:22 and starts redoing the caching work on a now-resolved thread.

Root cause

Thread auto-resolve runs on the worker side as soon as its pending-task count for the thread hits zero. That check is purely against `tasks.json`. It doesn't know whether:

  • any webhook handler is currently in the middle of creating a new task for this thread, or
  • there's an unacknowledged reviewer comment after fido's last reply in the thread.

Either signal would prevent premature resolve.

Fix direction

Two complementary rules before resolving a thread:

  1. Freshness check against GitHub: fetch the thread's comments (we already do this for edit-last-reply — see Thread task titles pull from reply body instead of root review comment #470). If the latest comment author is the reviewer and its ID is later than fido's most recent reply, don't resolve — there's something unacknowledged.

  2. Debounce after recent webhook activity: if kennel received a `pull_request_review_comment` for this thread in the last N seconds (say 30s), defer the resolve. Catches the in-flight triage case.

#1 is cheaper and more principled — the thread state on GitHub is the source of truth for "is there an unacknowledged reviewer comment?" #2 is a belt-and-suspenders for cases where the comment came but triage hasn't even started yet.

Start with #1.

Related

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions