fix(annealing): repair promotion source assignment + autonomous approval #393

Merged: aaronsb merged 1 commit into main from fix/annealing-proposal-golden-path on May 20, 2026

@aaronsb (Owner) commented May 20, 2026

Summary

Goal: get annealing proposals to behave as advertised, in their most basic golden-path mode. An investigation of the annealing agent surfaced three issues — two are real defects (fixed here), one is correct behaviour as-is.

Issue 1 — promotions created empty ontologies 🔴 fixed

get_first_order_source_ids() ran a Cypher query that combined a bound variable with an aggregate in a single WITH:

WITH neighbor_sources + collect(DISTINCT s2.source_id) as all_sources

Legal in Neo4j; rejected by Apache AGE: "neighbor_sources" must be either part of an explicitly listed key or used inside an aggregate function. The try/except swallowed the error, the function returned [], and every execute_promotion() reassigned zero sources, creating Ontology nodes with no concepts.

Runtime logs confirmed it fired for all four recent promotions:

ERROR get_first_order_source_ids ... Failed ... "neighbor_sources" must be ...

And the graph proved the effect: all 7 Source nodes still SCOPED_BY primordial, the 4 promoted ontologies empty.

Fix: split into two fixed-length queries (the anchor's own sources + its neighbors' sources), unioned in Python. Verified against the live graph — a known anchor now returns 4 primordial-scoped sources where the old query threw.
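The two-query shape can be sketched roughly as below. This is an illustration under stated assumptions, not the code in ontology_edges.py: the `run_query` callable, the function name `first_order_source_ids_sketch`, and both Cypher strings (including the `concept_id` property and the traversal patterns) are hypothetical placeholders. The point is only that each query returns plain rows, with no bound variable mixed into an aggregate, and the union happens in Python.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical signature: takes Cypher text and parameters, yields row dicts.
QueryRunner = Callable[[str, Dict[str, str]], Iterable[Dict[str, str]]]

# Placeholder Cypher: the real schema and traversal may differ.
ANCHOR_SOURCES_QUERY = (
    "MATCH (c {concept_id: $anchor_id})-->(s:Source) "
    "RETURN DISTINCT s.source_id AS source_id"
)
NEIGHBOR_SOURCES_QUERY = (
    "MATCH (c {concept_id: $anchor_id})--(n)-->(s:Source) "
    "RETURN DISTINCT s.source_id AS source_id"
)


def first_order_source_ids_sketch(run_query: QueryRunner, anchor_id: str) -> List[str]:
    """Union the anchor's own sources with its neighbors' sources."""
    params = {"anchor_id": anchor_id}
    own = [row["source_id"] for row in run_query(ANCHOR_SOURCES_QUERY, params)]
    neighbors = [row["source_id"] for row in run_query(NEIGHBOR_SOURCES_QUERY, params)]
    # dict.fromkeys de-duplicates while preserving first-seen order.
    return list(dict.fromkeys(own + neighbors))
```

In this sketch a query failure propagates to the caller instead of being converted into an empty list, so a rejected query cannot silently look like "no sources".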

Issue 2 — autonomous mode still had a raceable "pending" window 🟠 fixed

Auto-approval ran as a post-cycle batch ~30s after proposals were written (each proposal needs an LLM evaluation). During that window proposals sat pending in the UI and a human could approve one first — observed in the logs: one proposal executed triggered_by=admin, the rest triggered_by=annealing_worker, plus a Proposal N not pending — skipping auto-approve warning.

Fix: in autonomous mode the manager now stores proposals already approved (reviewed_by=annealing_worker), so there is no pending window to race. The worker's post-cycle step is dispatch-only (_dispatch_approved_proposals), guarded on status='approved' so a re-run can't double-dispatch. The proposal still passes through the same approved → proposal_execution job lifecycle the manual review endpoint uses. hitl mode is unchanged: proposals are born pending for human review.
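A minimal in-memory sketch of the born-approved idea and the dispatch guard follows. It is not the annealing_manager.py or annealing_worker.py implementation: the `Proposal` dataclass, the dict-backed store, the function names, and the `"executing"` status are assumptions for illustration. Only `automation_level`, the `pending`/`approved` statuses, and `reviewed_by=annealing_worker` come from the description above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Proposal:
    proposal_id: int
    status: str = "pending"
    reviewed_by: Optional[str] = None


def store_proposal(store: Dict[int, Proposal], proposal_id: int,
                   automation_level: str) -> Proposal:
    """Persist a proposal; autonomous mode writes it already approved."""
    proposal = Proposal(proposal_id)
    if automation_level == "autonomous":
        # No pending window: the record is never visible as 'pending'.
        proposal.status = "approved"
        proposal.reviewed_by = "annealing_worker"
    store[proposal_id] = proposal
    return proposal


def dispatch_approved_proposals(store: Dict[int, Proposal]) -> List[int]:
    """Dispatch only proposals currently in 'approved'; return their ids."""
    dispatched = []
    for proposal in store.values():
        if proposal.status != "approved":
            continue  # pending (hitl) or already dispatched
        proposal.status = "executing"  # hypothetical post-dispatch status
        dispatched.append(proposal.proposal_id)
    return dispatched
```

Because the guard checks the status and then moves the proposal out of `approved`, calling the dispatch step a second time finds nothing left to dispatch.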

Issue 3 — failed demotions ("Ontology no longer exists") 🟡 no change, by design

When a proposal targets an ontology that was already removed (by a sibling proposal or manual edits), execute_demotion() fails gracefully with a clear error and status=failed. That is correct — a real reviewer would not approve a proposal against a missing target, and with autonomous approval working there is no drift to reconcile. No state-reconciliation layer added.
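The graceful-failure behaviour can be pictured with a small sketch. This is an assumption-laden illustration, not execute_demotion() itself: the `ExecutionResult` type, the set of ontology names, the function name `demote_sketch`, and the `"completed"` status are hypothetical; only `status=failed` and the "Ontology no longer exists" error come from the description above.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class ExecutionResult:
    status: str
    error: Optional[str] = None


def demote_sketch(existing_ontologies: Set[str], ontology_name: str) -> ExecutionResult:
    """Demote an ontology, or fail cleanly if the target is already gone."""
    if ontology_name not in existing_ontologies:
        # Target removed by a sibling proposal or a manual edit: report, don't raise.
        return ExecutionResult(
            status="failed",
            error=f"Ontology no longer exists: {ontology_name}",
        )
    existing_ontologies.discard(ontology_name)
    return ExecutionResult(status="completed")  # hypothetical success status
```

Two sibling proposals against the same ontology would then yield one success followed by one `failed` result with a readable error, with no reconciliation step in between.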

Changes

  • api/app/lib/age_client/ontology_edges.py — AGE-safe two-query rewrite of get_first_order_source_ids
  • api/app/services/annealing_manager.py — automation_level param; _store_proposal born-approved in autonomous mode
  • api/app/workers/annealing_worker.py — pass automation_level to the manager; _auto_approve_and_dispatch → dispatch-only _dispatch_approved_proposals
  • tests/unit/services/test_annealing_manager.py — born-approved / born-pending tests

Verification

  • 465 unit tests pass (incl. 2 new).
  • Issue 1's fixed query run directly against the live AGE graph — returns sources where the old form errored.
  • Backend-only — the web admin tab reflects worker/DB state, no UI change needed.

End-to-end (trigger a real annealing cycle and watch a promotion populate an ontology) is worth doing post-merge — it mutates the graph, so left for an explicit run.

Two defects kept annealing proposals from behaving as advertised in
their basic golden path. Investigation traced both to confirmed root
causes with runtime evidence.

1. Promotions created empty ontologies.
   get_first_order_source_ids() ran a Cypher query that combined a
   bound variable with an aggregate in one WITH
   (`neighbor_sources + collect(...)`) — legal in Neo4j, rejected by
   Apache AGE ("must be either part of an explicitly listed key or
   used inside an aggregate function"). The exception was swallowed,
   the function returned [], and every execute_promotion() reassigned
   zero sources — minting Ontology nodes with no concepts. Split into
   two fixed-length queries (the anchor's own sources + its neighbors'
   sources) unioned in Python. Verified against the live graph: a
   known anchor now returns 4 primordial-scoped sources where the old
   query threw.

2. Autonomous mode still had a human-raceable 'pending' window.
   Auto-approval ran as a post-cycle batch ~30s after proposals were
   written, so an operator could click Approve first (observed:
   one proposal executed triggered_by=admin, the rest by the worker).
   In autonomous mode the manager now stores proposals already
   'approved' (reviewed_by=annealing_worker) — no pending window
   exists. The worker's post-cycle step is dispatch-only
   (_dispatch_approved_proposals), guarded on status='approved' so a
   re-run cannot double-dispatch. hitl mode is unchanged: proposals
   are born 'pending' for human review.

Not changed (by design): a proposal approved against a now-missing
ontology still fails gracefully with a clear error. That is correct
behaviour — a real reviewer would not approve a missing target — so
no state-reconciliation layer is warranted.

Files:
- api/app/lib/age_client/ontology_edges.py — AGE-safe source query
- api/app/services/annealing_manager.py — automation_level, born-approved
- api/app/workers/annealing_worker.py — dispatch-only post-cycle step
- tests/unit/services/test_annealing_manager.py — born-approved/pending tests

Tests: 465 unit passed (incl. 2 new); promotion query verified live.
@aaronsb aaronsb added the ontology and bug labels May 20, 2026
@aaronsb aaronsb merged commit d4390d2 into main May 20, 2026
3 checks passed
@aaronsb aaronsb deleted the fix/annealing-proposal-golden-path branch May 20, 2026 03:37