Judge routing: needs_review_on_completion + judge field relationship undocumented; silent footgun #16

@vilosource

Summary

Tasks with needs_review_on_completion=True and judge=False transition correctly to pending_completion_review on executor completion, but are then invisible to fleet-wide judge agents because /v2/reviews/pending/ (PendingReviewsView) filters on judge=True (src/reviews/views.py:99). If no project-scoped reviewer polls, the task stays in pending_completion_review indefinitely, with no error raised anywhere.

This is not strictly a bug — judge=True is the explicit opt-in for fleet-wide visibility — but the two fields' relationship is undocumented and the failure mode is silent.

Reproducer (observed 2026-05-23 on vafi-dev)

A Pass-2 evaluation-loop task (PGieeQr_s9XkcdLsbIwYn) was created with the canonical Pass-2 recipe (carried in the vafi workspace cheat-sheet for weeks):

Task.objects.create(
    ...,
    required_tags=['claude'],
    needs_review_on_completion=True,
    isolation='sequential',
    status='draft',
)
# judge defaults to False

Executor delivered cleanly, task entered pending_completion_review at ~06:03 UTC. The judge agent (tags=['judge']) polled /v2/reviews/pending/ every ~30s and got 200 OK + items=[] consistently. The judge agent logs were healthy; the task was in the right state; no review was ever picked up.

After the task had been stuck for ~22 min, it was diagnosed manually: running t.judge = True; t.save() let the judge pick up the task within one poll cycle, and the verdict landed.
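The symptom above (right state, healthy judge logs, empty poll results) can be checked for mechanically. The following is a minimal sketch, not code from the repository: TaskSnapshot, its entered_review_at field, find_stuck_reviews, and the 5-minute threshold are all hypothetical names and values chosen for illustration. It only flags candidates, since a judge=False task may legitimately be waiting for a project-scoped reviewer.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

PENDING_REVIEW = "pending_completion_review"


@dataclass
class TaskSnapshot:
    """Hypothetical read-only view of a task; not the real Task model."""
    id: str
    status: str
    judge: bool
    entered_review_at: datetime  # assumed field: when the task entered review


def find_stuck_reviews(tasks, now, max_wait=timedelta(minutes=5)):
    """Return ids of tasks awaiting review that fleet-wide judges cannot see.

    A task is a candidate when it is in pending_completion_review, has
    judge=False, and has waited longer than max_wait.
    """
    return [
        t.id
        for t in tasks
        if t.status == PENDING_REVIEW
        and not t.judge
        and now - t.entered_review_at > max_wait
    ]
```

Run periodically against the task table, a check like this would have surfaced the stuck task well before the ~22 minute mark.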

Why this matters now

The substrate fix in vafi#36 (kb gotcha XsPemtnm — controller note-400 resilience) unblocked the Pass-2 Phase-1 first-real-run, which is what surfaced this. Without judge=True, the substrate fix works but the loop is still blocked at the next stage.

Field semantics today (from reading the model + endpoint)

  • needs_review_on_completion: BooleanField(default=True) — controls whether doing → pending_completion_review or doing → done on executor completion (src/tasks/state_machine.py:248).
  • judge: BooleanField(default=False) — controls whether the task is visible to fleet-wide judge agents via /v2/reviews/pending/ (src/reviews/views.py:99).

These are orthogonal in the schema, but coupled in practice: a task that needs a review and is intended to be reviewed by a fleet-wide judge needs BOTH set. A task that needs a review by a project member can keep judge=False, because membership-scoped pickup via /v1/tasks/?status=pending_completion_review works for members. That membership scoping is exactly what vtaskforge#6 fixed for judge-role agents, which leaves membership-scoped review as the fallback for non-judge reviewers.
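To make the coupling concrete, here is a small stand-alone model of the two flags. It is a sketch under stated assumptions rather than the actual implementation: TaskFlags, status_after_completion, visible_to_fleet_judge, and routing_table are hypothetical names, and the model assumes the pending-reviews endpoint only considers tasks that are already in pending_completion_review.

```python
from dataclasses import dataclass

PENDING_REVIEW = "pending_completion_review"
DONE = "done"


@dataclass
class TaskFlags:
    """Hypothetical stand-in for the two Task fields discussed above."""
    needs_review_on_completion: bool = True  # model default per this issue
    judge: bool = False                      # model default per this issue


def status_after_completion(t: TaskFlags) -> str:
    """State the task moves to from 'doing' when the executor completes."""
    return PENDING_REVIEW if t.needs_review_on_completion else DONE


def visible_to_fleet_judge(t: TaskFlags) -> bool:
    """Whether a fleet-wide judge would see the task (assumed filter)."""
    return status_after_completion(t) == PENDING_REVIEW and t.judge


def routing_table():
    """Enumerate all four flag combinations and their outcomes."""
    rows = []
    for needs_review in (True, False):
        for judge in (True, False):
            t = TaskFlags(needs_review, judge)
            rows.append(
                (needs_review, judge,
                 status_after_completion(t), visible_to_fleet_judge(t))
            )
    return rows
```

The row for needs_review_on_completion=True, judge=False is the problem case: the task reaches pending_completion_review but no fleet-wide judge can see it. With the defaults, TaskFlags() lands in exactly that row.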

Proposed direction (three options, prefer 1)

  1. Document the relationship + add a soft guard. Add a clear docstring on both fields (mention the other in each); optionally add a clean() method on Task that emits a warning (or raises GuardViolation on todo entry) when needs_review_on_completion=True, judge=False, and no project-scoped reviewer member exists. Cheapest, most conservative; keeps the schema orthogonal.

  2. Make the endpoint UNION-aware. PendingReviewsView.get_queryset() would filter on Q(judge=True) | Q(needs_review_on_completion=True, ...). Risk: judges then see tasks not intended for them (for example, an executor-team task pending project-member review). This could be filtered by another label, but that adds policy where there was none.

  3. Promote needs_review_on_completion semantics. When needs_review_on_completion=True AND the project has registered fleet-judge agents, auto-set judge=True. Magic, fragile, breaks the principle that flags don't mutate themselves.
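The soft guard from option 1 could look roughly like the sketch below. It is framework-free so it runs on its own; in the real codebase the check would presumably live in Task.clean() or the todo-entry guard. ReviewRoutingWarning, ReviewFlags, check_review_routing, and the has_project_reviewer argument are hypothetical names, and how "a project-scoped reviewer member exists" is actually determined is left to the caller.

```python
import warnings
from dataclasses import dataclass


class ReviewRoutingWarning(UserWarning):
    """Hypothetical warning category for unroutable reviews."""


@dataclass
class ReviewFlags:
    """Hypothetical stand-in for the two Task fields."""
    needs_review_on_completion: bool = True
    judge: bool = False


def check_review_routing(task: ReviewFlags, has_project_reviewer: bool) -> bool:
    """Return True if some reviewer can pick the task up; warn otherwise.

    Nothing is mutated: the flags stay orthogonal and the caller decides
    whether to set judge=True or add a project-scoped reviewer.
    """
    if not task.needs_review_on_completion:
        return True  # task goes straight to done; no review to route
    if task.judge or has_project_reviewer:
        return True  # a fleet-wide judge or a project member can see it
    warnings.warn(
        "needs_review_on_completion=True but judge=False and no "
        "project-scoped reviewer exists; the review may never be picked up",
        ReviewRoutingWarning,
        stacklevel=2,
    )
    return False
```

Emitting a warning rather than raising keeps existing task-creation recipes working while making the silent failure mode visible; upgrading it to a hard guard on todo entry would be a one-line change at the call site.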

Workaround in use today

The vafi workspace cheat-sheet (handoff 2026-05-23) has been corrected to set judge=True explicitly in the task-creation recipe. Three Pass-2 evaluation-loop tasks have since been created with this recipe and all transitioned cleanly. kb gotcha uhUSfjkp has been recorded.

References

  • src/reviews/views.py:74-99 (PendingReviewsView)
  • src/tasks/models.py:91 (judge = BooleanField(default=False))
  • src/tasks/models.py:57 (needs_review_on_completion = BooleanField(default=True))
  • vafi#36 (substrate fix that unblocked this scenario)
  • vtaskforge#6 (the prior silent-fail incident that motivated /v2/reviews/pending/)
  • vafi kb gotcha uhUSfjkp (2026-05-23)
