Summary
Tasks with needs_review_on_completion=True and judge=False transition correctly to pending_completion_review on executor completion, but are then invisible to fleet-wide judge agents because /v2/reviews/pending/ (PendingReviewsView) filters on judge=True (src/reviews/views.py:99). If no project-scoped reviewer polls, the task is stuck.
This is not strictly a bug — judge=True is the explicit opt-in for fleet-wide visibility — but the two fields' relationship is undocumented and the failure mode is silent.
Reproducer (observed 2026-05-23 on vafi-dev)
A Pass-2 evaluation-loop task (PGieeQr_s9XkcdLsbIwYn) was created with the canonical Pass-2 recipe (carried in the vafi workspace cheat-sheet for weeks):
```python
Task.objects.create(
    ...,  # remaining fields elided in the original recipe
    required_tags=['claude'],
    needs_review_on_completion=True,
    isolation='sequential',
    status='draft',
)
# judge defaults to False
```
Executor delivered cleanly, task entered pending_completion_review at ~06:03 UTC. The judge agent (tags=['judge']) polled /v2/reviews/pending/ every ~30s and got 200 OK + items=[] consistently. The judge agent logs were healthy; the task was in the right state; no review was ever picked up.
After ~22 min stuck, manual diagnosis pointed at the judge flag: t.judge = True; t.save() → judge picked up the task within one poll cycle → verdict landed.
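Nothing flags a task that sits in pending_completion_review with judge=False, which is why the stall was silent. Below is a minimal sketch of a sweep that surfaces such tasks. It is written in plain Python over an in-memory list so it runs standalone. The TaskRow type and the find_judge_invisible helper are hypothetical. In the Django app the equivalent query would presumably be Task.objects.filter(status='pending_completion_review', judge=False).

```python
from dataclasses import dataclass


@dataclass
class TaskRow:
    """Hypothetical stand-in for the Task fields that matter here."""
    id: str
    status: str
    needs_review_on_completion: bool
    judge: bool


def find_judge_invisible(tasks):
    """Return IDs of tasks awaiting review that /v2/reviews/pending/ will not list.

    Assumes the endpoint's only relevant filter is judge=True
    (src/reviews/views.py:99), so these tasks depend entirely on a
    project-scoped reviewer polling.
    """
    return [
        t.id
        for t in tasks
        if t.status == "pending_completion_review" and not t.judge
    ]


tasks = [
    TaskRow("PGieeQr_s9XkcdLsbIwYn", "pending_completion_review", True, False),
    TaskRow("example-opted-in", "pending_completion_review", True, True),  # hypothetical
    TaskRow("example-done", "done", False, False),  # hypothetical
]
print(find_judge_invisible(tasks))
```

Running a check like this on a schedule, or alerting when it returns a non-empty result, would likely have surfaced the ~22 min stall instead of leaving it silent. The check cannot know whether a project-scoped reviewer will eventually poll, so it reports candidates, not confirmed stalls.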
Why this matters now
The substrate fix in vafi#36 (kb gotcha XsPemtnm — controller note-400 resilience) unblocked the Pass-2 Phase-1 first-real-run, which is what surfaced this. Without judge=True, the substrate fix works but the loop is still blocked at the next stage.
Field semantics today (from reading the model + endpoint)
- needs_review_on_completion: BooleanField(default=True) — controls whether doing → pending_completion_review or doing → done on executor completion (src/tasks/state_machine.py:248).
- judge: BooleanField(default=False) — controls whether the task is visible to fleet-wide judge agents via /v2/reviews/pending/ (src/reviews/views.py:99).
These are orthogonal in the schema but coupled in practice: a task that needs a review and is meant to be reviewed by a fleet-wide judge needs BOTH set. A task meant for review by a project member can keep judge=False and be picked up through membership-scoped listing via /v1/tasks/?status=pending_completion_review. That membership scoping is the limitation vtaskforge#6 fixed for judge-role agents, so membership-scoped review remains only as the fallback for non-judge reviewers.
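The coupling is easiest to see by enumerating all four flag combinations. The sketch below models only what the two cited lines do as described above: the completion branch (src/tasks/state_machine.py:248) and the fleet-judge filter (src/reviews/views.py:99). The function names are hypothetical. The sketch also assumes the endpoint lists only tasks in pending_completion_review, and the real code paths presumably carry more conditions.

```python
from itertools import product


def status_after_executor_completion(needs_review_on_completion: bool) -> str:
    # Mirrors the doing -> pending_completion_review / doing -> done branch.
    return "pending_completion_review" if needs_review_on_completion else "done"


def visible_to_fleet_judges(status: str, judge: bool) -> bool:
    # Mirrors the judge=True filter on /v2/reviews/pending/.
    return status == "pending_completion_review" and judge


def outcome(needs_review_on_completion: bool, judge: bool) -> str:
    status = status_after_executor_completion(needs_review_on_completion)
    if status == "done":
        return "done (no review)"
    if visible_to_fleet_judges(status, judge):
        return "fleet judge picks up"
    return "waits for a project-scoped reviewer"


for needs_review, judge in product([True, False], repeat=2):
    print(f"needs_review_on_completion={needs_review!s:<5} judge={judge!s:<5} "
          f"-> {outcome(needs_review, judge)}")
```

Only the (True, False) row is a trap. The task does not finish, and no fleet judge will ever see it. The (False, True) combination is harmless because the task goes straight to done.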
Proposed direction (three options, prefer 1)
1. Document the relationship + add a soft guard. Add a clear docstring on both fields (mention the other in each); optionally add a clean() method on Task that emits a warning (or GuardViolation on todo entry) when needs_review_on_completion=True, judge=False, AND no project-scoped reviewer member exists. Cheapest, most conservative; keeps the schema orthogonal.
2. Make the endpoint UNION-aware. PendingReviewsView.get_queryset() filters on Q(judge=True) | Q(needs_review_on_completion=True, ...). Risk: judges then see tasks not intended for them (e.g. an executor-team task pending project-member review). Could be filtered by another label, but that adds policy where there was none.
3. Promote needs_review_on_completion semantics. When needs_review_on_completion=True AND the project has registered fleet-judge agents, auto-set judge=True. Magic and fragile; breaks the principle that flags don't mutate themselves.
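The soft guard in option 1 can be sketched as a plain function so it runs outside Django. The names review_routing_warning and check_task and the has_project_reviewer input are hypothetical. In the real model this logic would sit in Task.clean() or the todo-entry guard. How to detect a project-scoped reviewer member is left open here, as it is in the proposal.

```python
import warnings
from typing import Optional


def review_routing_warning(
    needs_review_on_completion: bool,
    judge: bool,
    has_project_reviewer: bool,
) -> Optional[str]:
    """Return a message if no reviewer will ever pick the task up, else None."""
    if needs_review_on_completion and not judge and not has_project_reviewer:
        return (
            "needs_review_on_completion=True but judge=False and no project-scoped "
            "reviewer exists: the task will stall in pending_completion_review. "
            "Set judge=True for fleet-wide judge pickup."
        )
    return None


def check_task(needs_review_on_completion, judge, has_project_reviewer):
    # Soft guard: warn rather than raise, so the schema stays orthogonal.
    msg = review_routing_warning(needs_review_on_completion, judge, has_project_reviewer)
    if msg is not None:
        warnings.warn(msg, stacklevel=2)
    return msg
```

A hard variant would raise instead, for example Django's ValidationError from clean() or the project's GuardViolation on todo entry. Warning is what keeps option 1 cheap and non-breaking for existing recipes.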
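An in-memory version of the filter makes the risk in option 2 easier to see. The sketch approximates the proposed Q(judge=True) | Q(needs_review_on_completion=True, ...) as a Python predicate. It does not fill in the elided extra conditions. It assumes every task shown is already in pending_completion_review. The PendingTask type and filter names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PendingTask:
    """Hypothetical stand-in for a Task already in pending_completion_review."""
    id: str
    judge: bool
    needs_review_on_completion: bool


def current_filter(t: PendingTask) -> bool:
    # Today: fleet-wide judges only see tasks that opted in with judge=True.
    return t.judge


def union_filter(t: PendingTask) -> bool:
    # Option 2: judge=True OR needs_review_on_completion=True
    # (the extra "..." conditions from the proposal are omitted).
    return t.judge or t.needs_review_on_completion


pending = [
    PendingTask("pass2-eval", judge=False, needs_review_on_completion=True),
    PendingTask("member-review", judge=False, needs_review_on_completion=True),
    PendingTask("judge-opt-in", judge=True, needs_review_on_completion=True),
]

print([t.id for t in pending if current_filter(t)])
print([t.id for t in pending if union_filter(t)])
```

The "pass2-eval" and "member-review" rows have identical flags, so no filter on these fields can tell them apart. If completion is the only way into pending_completion_review, every pending task has needs_review_on_completion=True. In that case the second disjunct alone matches everything, and the union drops judge as a filter unless the elided conditions narrow it.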
Workaround in use today
Vafi workspace cheat-sheet (handoff 2026-05-23) corrected to set judge=True explicitly in the task-creation recipe. Three Pass-2 evaluation-loop tasks have since been created with this recipe and all transitioned cleanly. kb gotcha uhUSfjkp recorded.
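The corrected recipe differs from the reproducer only by the explicit judge=True. The sketch keeps the fields as a kwargs dict so they can be checked without a database. PASS2_TASK_KWARGS is a hypothetical name, and the original recipe's elided fields (the ...) are not reproduced.

```python
# Hypothetical name; fields copied from the Pass-2 recipe above, plus judge=True.
PASS2_TASK_KWARGS = dict(
    required_tags=["claude"],
    needs_review_on_completion=True,
    judge=True,  # opt in to fleet-wide pickup via /v2/reviews/pending/
    isolation="sequential",
    status="draft",
)

# Both flags must be set for a fleet-wide judge to review the task.
assert PASS2_TASK_KWARGS["needs_review_on_completion"] and PASS2_TASK_KWARGS["judge"]

# In a Django shell this would be used as (other fields elided, as in the original):
# Task.objects.create(..., **PASS2_TASK_KWARGS)
```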
References
- src/reviews/views.py:74-99 (PendingReviewsView)
- src/tasks/models.py:91 (judge = BooleanField(default=False))
- src/tasks/models.py:57 (needs_review_on_completion = BooleanField(default=True))
- vafi#36 (substrate fix that unblocked this scenario)
- vtaskforge#6 (the prior silent-fail incident that motivated /v2/reviews/pending/)
- vafi kb gotcha uhUSfjkp (2026-05-23)