[api] AIN-300 W1: write-path linkage + 429 retry + 0031 CHECK + 0032 init-plan #87
… constraint + RLS init-plan W1/9 SHIP-NOW. Kills the AIN-300 orphan bug + adds 429 backoff/failover + a new CHECK constraint to guard against future regressions + clears 16 perf WARNs from 0029.

routing.py:
- _chat_with_429_retry helper (3 attempts, 0.5/2/8s, 429-only)
- dispatch_inference accepts optional inference_id kwarg

routing_brain.py:
- Pre-allocate candidate_inference_id per failover attempt
- Track last_inference_id; link it in the 4xx/5xx-exhausted terminal branches
- 429 (after in-adapter retry exhaustion) → failover like 5xx
- Cap/Funds/Inactive use decision_rule_override='failed_pre_dispatch'

routing_outcomes.py:
- complete_decision gains decision_rule_override kwarg

alembic 0031: outcome_requires_inference CHECK constraint
alembic 0032: init-plan optimization + ENABLE RLS on _repair_ table

tests/unit/test_routing_429_retry.py: 6 tests, all pass

Validation:
- pre-commit (ruff + ruff format + mypy --strict + pytest -x): passed
- offline upgrade 0030→0032: 10,868 bytes
- offline downgrade 0032→0030: 9,833 bytes

Refs: AIN-300 · AIN-295 · AIN-298 · Disc #12 preserved on scoring/candidate-set

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
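A minimal sketch of the 429-only retry behaviour listed for routing.py. The exception classes, the function name `chat_with_429_retry`, and the injectable `sleep` parameter are hypothetical (this is not the real `_chat_with_429_retry` signature); the default `delays` follow the Bugbot overview's description of three attempts separated by 0.5 s and 2 s waits.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class RateLimitedError(Exception):
    """Hypothetical stand-in for a provider HTTP 429 response."""


class ProviderServerError(Exception):
    """Hypothetical stand-in for a provider 5xx; never retried here."""


def chat_with_429_retry(
    call: Callable[[], T],
    delays: Sequence[float] = (0.5, 2.0),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call`, retrying only on RateLimitedError.

    Makes len(delays) + 1 attempts in total (3 with the defaults). The last
    RateLimitedError is re-raised so the caller can fail over, as it would on
    a 5xx. Any other exception propagates on the first attempt.
    """
    for delay in delays:
        try:
            return call()
        except RateLimitedError:
            sleep(delay)  # back off before the next attempt
    return call()  # final attempt: errors propagate unchanged
```

A `RateLimitedError` escaping the helper is the "exhausted 429" case that the failover loop treats like a 5xx; a `ProviderServerError` is surfaced immediately so failover is not delayed by backoff.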
AIN-300 🔴 [DB/Gateway] routing_outcomes orphaned (NULL inference_id) on provider error — write-path linkage + Mistral 429 failover
🔴 Live data-integrity bug: routing_outcomes written without inference_id linkage
Found 2026-05-28 during founder-directed DB save-error investigation on prod (…
Symptom (DATA repaired; root cause still open)
Root cause (hypothesis; confirm in code)
The gateway write path appears to be: (1) create/persist the …
Two defects, one cause:
Required fix (api repo —
…edicate
PG CHECK constraints can't be DEFERRABLE/INITIALLY DEFERRED (only
FK/UNIQUE/PK/EXCLUDE constraints can). The two-phase write (insert_decision
creates the row with decision_rule='cheapest_clearing_floor' +
inference_id=NULL; complete_decision links inference_id after dispatch)
passes through a transient state that a per-statement check would reject.
Predicate now allows outcome_status IS NULL as the third escape clause:
CHECK (
outcome_status IS NULL
OR decision_rule <> 'cheapest_clearing_floor'
OR inference_id IS NOT NULL
)
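The two-phase write against this predicate can be reproduced with Python's built-in `sqlite3`, used here only as a stand-in for PostgreSQL: like PG, SQLite checks CHECK constraints per statement and treats a NULL result as passing. The minimal schema and the `inf-123` id are illustrative assumptions, and SQLite has no NOT VALID/VALIDATE step.

```python
import sqlite3

# Assumed minimal schema: only the columns the predicate reads.
DDL = """
CREATE TABLE routing_outcomes (
    id INTEGER PRIMARY KEY,
    decision_rule TEXT NOT NULL,
    outcome_status TEXT,
    inference_id TEXT,
    CONSTRAINT outcome_requires_inference_when_model_chosen CHECK (
        outcome_status IS NULL
        OR decision_rule <> 'cheapest_clearing_floor'
        OR inference_id IS NOT NULL
    )
)
"""


def two_phase_write_demo() -> list[str]:
    conn = sqlite3.connect(":memory:")
    conn.execute(DDL)
    log: list[str] = []

    # Phase 1 (insert_decision): in-flight row, no inference yet -> allowed.
    conn.execute(
        "INSERT INTO routing_outcomes (id, decision_rule) "
        "VALUES (1, 'cheapest_clearing_floor')"
    )
    log.append("in-flight insert accepted")

    # Phase 2 without linkage (the AIN-300 orphan shape) -> rejected.
    try:
        conn.execute(
            "UPDATE routing_outcomes "
            "SET outcome_status = 'failed_provider_error' WHERE id = 1"
        )
        log.append("orphan update accepted")
    except sqlite3.IntegrityError:
        log.append("orphan update rejected")

    # Phase 2 with linkage in the same statement -> allowed.
    conn.execute(
        "UPDATE routing_outcomes "
        "SET outcome_status = 'failed_provider_error', inference_id = 'inf-123' "
        "WHERE id = 1"
    )
    log.append("linked update accepted")
    conn.close()
    return log


print(two_phase_write_demo())
```

The escape clause is what keeps phase 1 legal; once `outcome_status` becomes non-NULL, the same row must carry an `inference_id` (or a rewritten rule).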
Once complete_decision sets outcome_status (always non-NULL on every
terminal branch: succeeded/failed_other/failed_provider_error/rejected*),
the constraint REQUIRES either decision_rule rewritten via
decision_rule_override OR inference_id linked. That IS the AIN-300 W1
invariant.
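A pure-Python model of that completion step, assuming the two behaviours this PR describes for `complete_decision`: `inference_id` is written only when it is not None, and `decision_rule` is rewritten only when `decision_rule_override` is passed. The dict-based row, the function names, and the `failed_other` status used for the pre-dispatch case are illustrative, not the service's real signatures or values.

```python
from typing import Optional

FLOOR_RULE = "cheapest_clearing_floor"


def apply_completion(
    row: dict,
    *,
    outcome_status: str,
    inference_id: Optional[str] = None,
    decision_rule_override: Optional[str] = None,
) -> dict:
    """Return a copy of `row` as a complete_decision-style update would leave it."""
    updated = dict(row)
    updated["outcome_status"] = outcome_status  # always set on terminal branches
    if inference_id is not None:  # None is skipped, never written
        updated["inference_id"] = inference_id
    if decision_rule_override is not None:
        updated["decision_rule"] = decision_rule_override
    return updated


def satisfies_w1_invariant(row: dict) -> bool:
    """Python mirror of the 0031 CHECK predicate (None stands in for SQL NULL)."""
    return (
        row.get("outcome_status") is None
        or row.get("decision_rule") != FLOOR_RULE
        or row.get("inference_id") is not None
    )


in_flight = {"decision_rule": FLOOR_RULE, "outcome_status": None, "inference_id": None}
pre_dispatch = apply_completion(
    in_flight, outcome_status="failed_other", decision_rule_override="failed_pre_dispatch"
)
orphan = apply_completion(in_flight, outcome_status="failed_provider_error", inference_id=None)
print(satisfies_w1_invariant(pre_dispatch), satisfies_w1_invariant(orphan))  # True False
```

Because a None `inference_id` is silently skipped, any terminal branch that reaches the completion step with nothing to link must pass an override, or the row ends up in the `orphan` shape.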
Integration tests now pass (the failing tests were inserting via the
two-phase pattern and hitting the per-statement check).
Refs: AIN-300
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is ON. A cloud agent has been kicked off to fix the reported issue.
Reviewed by Cursor Bugbot for commit d7b41b9.
    db,
    outcome_id=outcome_id,
    outcome_status="failed_provider_error",
    inference_id=last_inference_id,
All-ModelUnavailable exhaustion violates new CHECK constraint
High Severity
When every candidate in the fallback loop fails with ModelUnavailableError, last_inference_id remains None (it's only set in the ProviderError handler). The all-exhausted path (section 7) then calls complete_decision with inference_id=None and no decision_rule_override. Since complete_decision skips setting inference_id when it's None, the outcome row ends up with outcome_status='failed_provider_error', decision_rule='cheapest_clearing_floor', and inference_id=NULL — violating the new outcome_requires_inference_when_model_chosen CHECK constraint from migration 0031. This causes a database IntegrityError instead of a clean AllCandidatesFailedError.
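A simplified sketch of the fallback loop that reproduces the reported path and one possible fix. Every name here (`run_fallback`, `Completion`, the exception classes, the `dispatch` callback) is hypothetical, and reusing `failed_pre_dispatch` as the override for the all-ModelUnavailable case is an assumption: the PR does not say which fix the autofix agent chose.

```python
import uuid
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


class ModelUnavailableError(Exception):
    """Candidate skipped before anything is dispatched or persisted."""


class ProviderError(Exception):
    """Provider 5xx (or exhausted 429) after an inference_id was allocated."""


class AllCandidatesFailedError(Exception):
    """Clean terminal error once every candidate has failed."""


@dataclass
class Completion:
    """The arguments a complete_decision-style call would receive."""

    outcome_status: str
    inference_id: Optional[str] = None
    decision_rule_override: Optional[str] = None


def run_fallback(
    candidates: Sequence[str],
    dispatch: Callable[[str, str], str],
    completions: List[Completion],
) -> str:
    last_inference_id: Optional[str] = None
    for model in candidates:
        candidate_inference_id = str(uuid.uuid4())  # pre-allocated per attempt
        try:
            result = dispatch(model, candidate_inference_id)
        except ModelUnavailableError:
            continue  # last_inference_id deliberately left untouched
        except ProviderError:
            last_inference_id = candidate_inference_id
            continue
        completions.append(Completion("succeeded", inference_id=candidate_inference_id))
        return result

    if last_inference_id is not None:
        # At least one dispatch happened: link the last inference row.
        completions.append(
            Completion("failed_provider_error", inference_id=last_inference_id)
        )
    else:
        # Every candidate was ModelUnavailable: nothing to link, so
        # (assumption) rewrite the rule instead of writing an orphan row.
        completions.append(
            Completion(
                "failed_provider_error",
                decision_rule_override="failed_pre_dispatch",
            )
        )
    raise AllCandidatesFailedError(f"all {len(candidates)} candidates failed")
```

Without the `else` branch, the all-ModelUnavailable path would complete with `inference_id=None` and no override, which is exactly the row shape the 0031 constraint rejects.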


Summary (W1/9 — ship-now)
Kills the AIN-300 orphan write-path bug + adds the recurrence-guard CHECK constraint + 429 backoff/failover + closes the 16 perf WARNs the 0029 RLS rollout introduced.
What lands
Code
ainfera_api/services/routing.py: _chat_with_429_retry helper (3-try, 0.5/2/8s, 429-only); dispatch_inference accepts optional inference_id
ainfera_api/services/routing_brain.py: candidate_inference_id per failover; track last_inference_id; link in 4xx/5xx-exhausted; 429 → failover like 5xx; decision_rule_override='failed_pre_dispatch' for Cap/Funds/Inactive
ainfera_api/services/routing_outcomes.py: complete_decision accepts decision_rule_override
Migrations
alembic/versions/20260528_0031_outcome_requires_inference_check.py: …decision_rule <> 'cheapest_clearing_floor' OR inference_id IS NOT NULL) NOT VALID + VALIDATE
alembic/versions/20260528_0032_rls_initplan_optimization.py: (SELECT auth.jwt() ...) wrapping. ENABLE RLS on _repair_20260528_save_error (clears the prod ERROR; table not dropped)
Tests
6 unit tests in tests/unit/test_routing_429_retry.py (all pass in 0.36s).
Disc #12 invariants
Validation
Deploy plan (after merge)
Refs
AIN-300 · AIN-295 · AIN-298 · Disc #12
🤖 Generated with Claude Code
Note
High Risk
Touches core inference dispatch, ledger-adjacent routing outcomes, and production migrations that must deploy after the write-path fix; mis-ordering or constraint validation failure can block deploys or break routed inference completion.
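One way to read the "NOT VALID + VALIDATE" pattern listed for migration 0031, sketched as the raw statements an Alembic `upgrade()` could pass to `op.execute()`. In PostgreSQL, NOT VALID adds the constraint without scanning existing rows but enforces it on every new INSERT/UPDATE; VALIDATE CONSTRAINT then scans the table under a SHARE UPDATE EXCLUSIVE lock, which does not block concurrent writes. That split is why ordering matters: run before the write-path fix, the old code's unlinked error writes would start failing at once, and any remaining orphan rows would make VALIDATE fail. The helper functions are hypothetical; the table and constraint names come from this PR.

```python
TABLE = "routing_outcomes"
CONSTRAINT = "outcome_requires_inference_when_model_chosen"
PREDICATE = (
    "outcome_status IS NULL"
    " OR decision_rule <> 'cheapest_clearing_floor'"
    " OR inference_id IS NOT NULL"
)


def upgrade_statements() -> list[str]:
    """DDL an upgrade() could run, in order, via op.execute()."""
    return [
        # No table scan, but enforced for every new INSERT/UPDATE.
        f"ALTER TABLE {TABLE} ADD CONSTRAINT {CONSTRAINT} "
        f"CHECK ({PREDICATE}) NOT VALID",
        # Scans existing rows under SHARE UPDATE EXCLUSIVE, which does not
        # block concurrent reads or writes.
        f"ALTER TABLE {TABLE} VALIDATE CONSTRAINT {CONSTRAINT}",
    ]


def downgrade_statements() -> list[str]:
    """DDL a matching downgrade() could run."""
    return [f"ALTER TABLE {TABLE} DROP CONSTRAINT IF EXISTS {CONSTRAINT}"]


for stmt in upgrade_statements():
    print(stmt)
```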
Overview
AIN-300 W1 fixes §16 routing_outcomes↔inferences linkage and hardens provider dispatch resilience without changing routing scores or candidate-set logic (Disc #12).
Dispatch & failover: dispatch_inference can take a caller-supplied inference_id, and provider calls go through _chat_with_429_retry (up to 3 attempts on 429 only, with 0.5s / 2s backoff). dispatch_with_brain pre-allocates an inference_id per fallback candidate and links it on success, on terminal 4xx (non-429), and when all candidates fail after 5xx or exhausted 429; exhausted 429 fails over like 5xx. Pre-dispatch terminal errors (caps, funds, inactive agent) set decision_rule_override='failed_pre_dispatch' via complete_decision.
Database: Migration 0031 adds outcome_requires_inference_when_model_chosen (allows in-flight rows via outcome_status IS NULL, then requires inference_id when decision_rule = 'cheapest_clearing_floor'). 0032 recreates Supabase RLS policies with init-plan (SELECT auth.jwt() …) and enables RLS on _repair_20260528_save_error.
Tests: Six unit tests cover the 429 retry helper.
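The init-plan rewrite in 0032 amounts to replacing bare per-row calls such as `auth.jwt()` or `auth.uid()` in a policy expression with a scalar subquery, `(select auth.jwt())`, which Postgres can evaluate once per statement as an InitPlan instead of once per row. A hypothetical regex helper that applies that rewrite to a policy expression string; the PR recreates the policies in SQL, so this only illustrates the transformation, and the list of `auth.*` functions matched is an assumption.

```python
import re

# A bare auth.uid()/auth.jwt()/auth.role() call that is not already the body
# of a "(select ...)" subquery; SELECT matches in either case.
_BARE_AUTH_CALL = re.compile(
    r"(?<!select )\bauth\.(?:uid|jwt|role)\(\)", re.IGNORECASE
)


def wrap_auth_calls(policy_expr: str) -> str:
    """Wrap each bare auth.*() call in (select ...); idempotent on wrapped input."""
    return _BARE_AUTH_CALL.sub(lambda m: f"(select {m.group(0)})", policy_expr)


print(wrap_auth_calls("(auth.jwt() ->> 'org_id') = org_id"))
# ((select auth.jwt()) ->> 'org_id') = org_id
```

The negative lookbehind keeps the helper idempotent, so running it over policies that were already wrapped leaves them unchanged.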
Reviewed by Cursor Bugbot for commit d7b41b9. Bugbot is set up for automated code reviews on this repo.