fix(audit): honest cap + cursor for /v1/audit/public and /v1/audit/{id} by hizrianraz · Pull Request #12 · ainfera-ai/api

hizrianraz · 2026-05-16T13:09:52Z

Summary

/v1/audit/public previously silently truncated ?limit=N to 100, a contract bug for AI-native callers passing higher limits and trusting the response is complete (Memory #14 violation). Now honestly capped at 500 with 422 above.
/v1/audit/{agent_id} had the same family of bug in inverse: no limit parameter at all (returned the entire chain unbounded). Now optionally cappable; omitting limit preserves backward-compat full-chain behavior for ainfera-verify.
Both endpoints gain a since_seq cursor for bandwidth-cheap live-feed polling — the homepage widget should migrate to this rather than re-fetching the full window every 12s (fast-follow web PR).
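
A minimal sketch of the contract change in plain Python, not the FastAPI `Query(...)` declarations this PR uses. The function names and the `LimitError` exception are hypothetical; only the bounds (default 20, max 500, previous clamp at 100) come from this PR.

```python
from __future__ import annotations

MAX_LIMIT = 500      # new hard cap from this PR
DEFAULT_LIMIT = 20   # default preserved by this PR


class LimitError(ValueError):
    """Hypothetical stand-in for the 422 the API returns on out-of-range limits."""


def old_limit(requested: int) -> int:
    # Previous behaviour: clamp silently, so the caller cannot tell it was truncated.
    return min(requested, 100)


def honest_limit(requested: int | None = None) -> int:
    # New behaviour: accept 1..500, reject everything else loudly.
    if requested is None:
        return DEFAULT_LIMIT
    if not 1 <= requested <= MAX_LIMIT:
        raise LimitError(f"limit must be between 1 and {MAX_LIMIT}, got {requested}")
    return requested
```

With the old clamp, `old_limit(200)` quietly yields 100; with the new check, an over-cap request fails instead of returning a short page that looks complete.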

Why

Surfaced by the 2026-05-16 E2E HALT diagnosis. The recency-rank pre-flight assertion ("≥5 distinct agents in last 200 events") was failing because:

The 100-cap meant ?limit=200 returned 100 events
Any chatty agent (varda) monopolizes the slot ranking
Time-window probes are the right shape for fleet liveness — not recency-rank

The silent cap masked that deeper category error. Fixing the contract honesty is a prereq for the companion E2E script swap (G4 → /v1/heartbeat/latest, C5 → per-agent time-window).
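
The category error can be illustrated with synthetic data. This is a sketch only: the event shape, the `agent-N` names, and the one-hour window are assumptions made for illustration, not values from the E2E script.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def distinct_agents_in_last_n(events: list[dict], n: int) -> set[str]:
    """Recency-rank probe: distinct agents among the n newest events."""
    newest = sorted(events, key=lambda e: e["ts"], reverse=True)[:n]
    return {e["agent"] for e in newest}


def distinct_agents_in_window(events: list[dict], now: datetime, window: timedelta) -> set[str]:
    """Time-window probe: distinct agents with any event inside the window."""
    return {e["agent"] for e in events if now - e["ts"] <= window}


# Synthetic feed: one chatty agent emits 300 events in five minutes,
# five quiet agents each emit a single event 10 to 14 minutes ago.
now = datetime(2026, 5, 16, 12, 0, tzinfo=timezone.utc)
events = [{"agent": "varda", "ts": now - timedelta(seconds=i)} for i in range(300)]
events += [{"agent": f"agent-{k}", "ts": now - timedelta(minutes=10 + k)} for k in range(5)]

by_rank = distinct_agents_in_last_n(events, 200)
by_window = distinct_agents_in_window(events, now, timedelta(hours=1))
```

Here the last 200 events all belong to the chatty agent, so a "≥5 distinct agents" assertion fails even though six agents were active within the hour.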

Changes

ainfera_api/routers/audit.py — Query(...) params on both endpoints; since_seq cursor; unchanged /annex-iv (separate design question)
tests/integration/test_audit_public_cap.py — 8 integration tests covering: default works, cap honored at 500, 422 above, 422 at 0, cursor returns ascending filtered, backward-compat full-chain mode

Test plan

make typecheck (mypy --strict, clean)
make lint (ruff, clean)
make test (387 unit tests pass)
RUN_INTEGRATION=1 make test-integration (53 integration tests pass, includes 8 new)
CI green
Post-merge: curl 'https://api.ainfera.ai/v1/audit/public?limit=600' → 422
Post-merge: curl 'https://api.ainfera.ai/v1/audit/public?since_seq=1&limit=50' returns events with seq > 1 ascending
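
A sketch of how a client might poll a single agent's chain with the new cursor. `fetch_chain` is a hypothetical in-memory stand-in for `GET /v1/audit/{agent_id}`, not the real handler; returning ascending order when `since_seq` is omitted is an assumption of this sketch.

```python
from __future__ import annotations


def fetch_chain(chain: list[dict], since_seq: int | None = None, limit: int | None = None) -> list[dict]:
    """Hypothetical stand-in for GET /v1/audit/{agent_id}?since_seq=...&limit=..."""
    if limit is not None and not 1 <= limit <= 500:
        raise ValueError("422: limit out of range")
    rows = sorted(chain, key=lambda e: e["seq"])          # ascending by seq
    if since_seq is not None:
        rows = [e for e in rows if e["seq"] > since_seq]  # strictly after the cursor
    return rows if limit is None else rows[:limit]


def poll_once(chain: list[dict], cursor: int | None, page_size: int = 50) -> tuple[list[dict], int | None]:
    """Fetch one page and advance the cursor to the highest seq seen."""
    page = fetch_chain(chain, since_seq=cursor, limit=page_size)
    new_cursor = page[-1]["seq"] if page else cursor
    return page, new_cursor


# Usage: drain a 120-event chain in pages of 50, then poll again and get nothing new.
chain = [{"seq": s} for s in range(120)]
cursor = None
sizes = []
for _ in range(4):
    page, cursor = poll_once(chain, cursor)
    sizes.append(len(page))
```

Each poll transfers only events newer than the cursor, which is the bandwidth saving the summary describes for the homepage widget.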

Out of scope (separate)

scripts/e2e-agent-check.sh — companion E2E script lives in parent ainfera-ai dir (not a git repo). G4 swapped to heartbeat probe, C5 to per-agent time-window, G7/G8 added for cap honesty
/annex-iv pagination — Annex IV exports are meant to be full bundles
Homepage widget cursor migration — fast-follow web PR
AIN-129 PEM URL-encoding (per founder spec)

🤖 Generated with Claude Code

Note

Medium Risk
Changes public and per-agent audit API query semantics (validation, ordering, and optional limiting), which may affect existing clients relying on previous silent truncation or ordering.

Overview
Audit feed endpoints now have explicit, honest pagination controls. /v1/audit/public switches from silently truncating limits to enforcing limit via Query validation (default 20, max 500, 422 on out-of-range), and adds a since_seq cursor mode that returns seq > since_seq in ascending order.

Per-agent audit chain adds optional bounding and cursoring while keeping backward compatibility. /v1/audit/{agent_id} now accepts optional limit (max 500; omitted still returns the full chain) and since_seq filtering, with new integration tests covering caps and cursor behavior for both endpoints.

^{Reviewed by Cursor Bugbot for commit 78def5f. Bugbot is set up for automated code reviews on this repo. Configure here.}

`/v1/audit/public` previously silently truncated `?limit=N` to 100 via `min(limit, 100)`, a contract bug for AI-native callers passing higher limits and trusting the response is complete (Memory #14 violation). `/v1/audit/{agent_id}` had a different bug in the same family: no `limit` at all, returning the entire chain unbounded.

This change:

- /v1/audit/public: `limit` is now `Query(20, ge=1, le=500)`. Default preserved at 20, max raised to 500, over-cap returns 422 instead of silent truncation. Adds `since_seq: int | None` cursor for bandwidth-cheap live-feed polling (when set, returns events with seq > since_seq ordered ascending; widgets prepend in order).
- /v1/audit/{agent_id}: `limit` is `Query(None, ge=1, le=500)`. None default preserves the unbounded full-chain behavior (backward compat for ainfera-verify), over-cap returns 422 when explicitly set. Adds the same `since_seq` cursor.
- /v1/audit/{agent_id}/annex-iv: unchanged (Annex IV exports are meant to be full bundles; pagination there is a separate design question).

8 new integration tests in test_audit_public_cap.py cover the contract surface (default works, cap honored at 500, 422 above, cursor returns ascending filtered, backward-compat full-chain mode for /{agent_id}).

Surfaced by the 2026-05-16 E2E HALT diagnosis: the silent-cap masked the deeper "agents firing" recency-rank-vs-time-window category error that the E2E check was making. Companion E2E script swap (G4 + C5 to heartbeat/per-agent time-window probes) lives in scripts/ (not part of this PR; api repo only).

Co-Authored-By: Claude <noreply@anthropic.com>

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{Bugbot Autofix is ON. A cloud agent has been kicked off to fix the reported issue.}

cursor · 2026-05-16T13:12:39Z

+        AgentORM, AuditEventORM.agent_id == AgentORM.id
    )
+    if since_seq is not None:
+        stmt = stmt.where(AuditEventORM.seq > since_seq).order_by(AuditEventORM.seq.asc())


Public feed cursor uses per-agent seq as global cursor

High Severity

The since_seq cursor on /v1/audit/public filters by AuditEventORM.seq > since_seq across all agents, but seq is a per-agent counter (starts at 0 for each agent, unique constraint is (agent_id, seq)). This makes the cursor fundamentally broken for cross-agent polling: a caller passing since_seq=100 will silently miss every event from any agent with fewer than 101 events. The integration test only passes because both test agents produce the same number of events, making their seq ranges identical.
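
The failure mode described above can be reproduced in a few lines. This is a sketch with hypothetical agent names (`chatty`, `quiet`); the filter mirrors the `seq > since_seq` clause from the diff, applied across all agents.

```python
# Two agents with independent per-agent counters that both start at 0.
events = (
    [{"agent": "chatty", "seq": s} for s in range(150)]
    + [{"agent": "quiet", "seq": s} for s in range(10)]
)


def public_feed_since(all_events: list, since_seq: int) -> list:
    """Same predicate as the diff: seq > since_seq, ordered ascending, no agent scoping."""
    return sorted(
        (e for e in all_events if e["seq"] > since_seq),
        key=lambda e: e["seq"],
    )


page = public_feed_since(events, since_seq=100)
agents_seen = {e["agent"] for e in page}
```

With `since_seq=100` the page contains only the chatty agent's events; every event from the quiet agent is skipped because its counter never reaches 101. A cursor that is monotonic across agents would avoid this, though which fix the repo adopted is not stated here.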

…tion C) (#47)

/v1/audit/public was building canonical URIs as `ainfera.ai/{owner_handle}/{agent_name}` where owner_handle was read off the agent row. For founder-owned agents (Varda, Yavanna) this surfaced `ainfera.ai/hizrianraz/varda` on the most-trafficked public endpoint we run, a discipline #3 leak of founder GitHub-handle / PII.

The Discipline #12 fix landed in the AIN-183 audit prompt is Option C: add a public-facing handle on the `tenants` row that's decoupled from the GitHub handles on agent rows, and project that on the public surface instead.

Three-phase, all in one upgrade():

1. Add `tenant_handle TEXT NULL` to `tenants`.
2. Backfill in priority order:
   a. Tenants that own at least one agent with `owner_handle='hizrianraz'` → `tenant_handle='ainfera-ai'` (founder tenant id is not hardcoded; lifted from data).
   b. Remaining tenants → MIN(owner_handle) across their agents (stable + deterministic + matches the GitHub handle most users registered as).
   c. Agent-less tenants → contact_email local part.
   d. Conflict resolution: collisions append a 6-char id-slice suffix in stable id order, so the first-by-id keeps the bare handle.
3. NOT NULL constraint + unique index. Pre-NOT-NULL the migration asserts zero rows remain NULL (Memory #20 silent-no-op guard).

- `TenantORM.tenant_handle` declared NOT NULL UNIQUE String(64).
- `routers/audit.py` public_feed projection joins through TenantORM and reads tenant_handle. The response key stays `owner_handle`: the public API contract is unchanged, only the value source moves.
- All four TenantORM instantiation sites populate the new column:
  - routers/signup.py (SDK-CLI signup → tenant_handle=owner_handle)
  - routers/github_oauth.py (OAuth login → tenant_handle=github_login)
  - routers/install.py (resolve-or-create on install → same as oauth)
  - routers/tenants.py (/v1/tenants/register → contact_email local part)

```
curl -s https://api.ainfera.ai/v1/audit/public | \
  jq -r '.events[].canonical_uri' | grep -c hizrianraz
curl -s https://api.ainfera.ai/v1/audit/public | \
  jq -r '.events[].canonical_uri' | grep -c ainfera-ai/varda
```

Once this lands, the marketing AuditTicker widget (already filters `ainfera-ai/varda` and `ainfera-ai/yavanna` on the web side) starts matching real events, which closes PR E without a web-side code change.

- Prompt said `tenants.tenant_handle` is a new column. Confirmed via ORM read: column did not exist (only id/name/contact_email/api_key_hash/created_at). Migration adds it.
- Public response field stays named `owner_handle` to avoid breaking the API contract; only the underlying value changes. If a future PR wants to rename the response field to `tenant_handle`, that's a separate ContractDelta against the PublicAuditEvent Pydantic model.

Closes: AIN-183 P0-3 (founder PII on /v1/audit/public)
Discipline: #1 (claim "no founder PII on public" matches reality), assertions on data migration).

Co-authored-by: Aule <aule@ainfera-internal.local>
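
The backfill priority order can be sketched in Python for readers who want to trace it by hand. This is not the Alembic migration: the dict shape, the `-` separator, and taking the first six characters of the id as the "id-slice" are assumptions, and Python's `min` may order strings differently from SQL `MIN` under some collations.

```python
from __future__ import annotations

FOUNDER_HANDLE = "hizrianraz"         # from the commit message
FOUNDER_TENANT_HANDLE = "ainfera-ai"  # from the commit message


def candidate_handle(owner_handles: list[str], contact_email: str) -> str:
    """Steps 2a to 2c: founder override, then MIN(owner_handle), then email local part."""
    if FOUNDER_HANDLE in owner_handles:
        return FOUNDER_TENANT_HANDLE
    if owner_handles:
        return min(owner_handles)
    return contact_email.split("@", 1)[0]


def backfill(tenants: list[dict]) -> dict[str, str]:
    """Step 2d: walk tenants in id order so the first-by-id keeps the bare handle."""
    assigned: dict[str, str] = {}
    taken: set[str] = set()
    for tenant in sorted(tenants, key=lambda t: t["id"]):
        handle = candidate_handle(tenant["owner_handles"], tenant["contact_email"])
        if handle in taken:
            handle = f"{handle}-{tenant['id'][:6]}"  # assumed suffix format
        taken.add(handle)
        assigned[tenant["id"]] = handle
    # Step 3 guard: no tenant may be left without a handle before NOT NULL is applied.
    assert all(assigned.values()), "backfill left an empty tenant_handle"
    return assigned


tenants = [
    {"id": "aaa111222", "owner_handles": ["hizrianraz"], "contact_email": "founder@example.com"},
    {"id": "bbb333444", "owner_handles": ["zed", "amy"], "contact_email": "zed@example.com"},
    {"id": "ccc555666", "owner_handles": [], "contact_email": "amy@example.com"},
]
handles = backfill(tenants)
```

The third tenant collides with the second on `amy` and therefore receives the suffixed handle, while the earlier id keeps the bare one.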

…nts (#53)

New per-tenant routing-policy state surface backing the dashboard /settings/routing-policy editor (AIN-182 §Phase 3 §7).

Migration 20260519_0021 adds tenant_routing_policies (PK on tenant_id, FK CASCADE). Columns: active_policy enum, quality/cost/latency_weight NUMERIC(4,3), fallback_enabled bool, fallback_penalty_pct NUMERIC(5,2). DB CHECK enforces weight sum = 1.0 ±0.001 (D26) and penalty bounds [0, 100].

Endpoints:

- GET /v1/routing-policy → row OR implicit Balanced default. compliance_veto_locked always true (Discipline #12).
- PUT /v1/routing-policy → upsert via ON CONFLICT. Pydantic model_validator enforces weight-sum-to-1.0; DB CHECK is the final guard. CHECK breach → 400.

Closes part of AIN-182 Phase 3.

Co-authored-by: Aule <aule@ainfera-internal.local>
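
A sketch of the weight-sum and penalty-bound checks using a stdlib dataclass rather than the Pydantic `model_validator` the commit mentions. The class name and the inclusive treatment of the ±0.001 boundary are assumptions.

```python
from dataclasses import dataclass
from decimal import Decimal

TOLERANCE = Decimal("0.001")  # ±0.001 from the DB CHECK described above


@dataclass(frozen=True)
class RoutingPolicy:
    """Hypothetical mirror of the tenant_routing_policies weight columns."""

    quality_weight: Decimal
    cost_weight: Decimal
    latency_weight: Decimal
    fallback_penalty_pct: Decimal = Decimal("0")

    def __post_init__(self) -> None:
        total = self.quality_weight + self.cost_weight + self.latency_weight
        # Weights must sum to 1.0 within the tolerance (boundary assumed inclusive).
        if abs(total - Decimal("1")) > TOLERANCE:
            raise ValueError(f"weights sum to {total}, expected 1.0 ±{TOLERANCE}")
        # Penalty is a percentage in [0, 100].
        if not Decimal("0") <= self.fallback_penalty_pct <= Decimal("100"):
            raise ValueError("fallback_penalty_pct must be within [0, 100]")


ok = RoutingPolicy(Decimal("0.334"), Decimal("0.333"), Decimal("0.333"))
```

`Decimal` is used so that three NUMERIC(4,3) values add up exactly, which avoids float rounding noise near the tolerance edge.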

…ce (#69)

Final piece of the cross-repo AAMC retirement (paired with sdk #11+#12, mcp-server #12, ainfera-os #49, routing #2). Removes references to "AAMC voter" / "voter pool" / "Council" in code comments + display names + the (now-defunct) invariant test file.

Changes:

- adapters/openai.py: comment reframe, "AAMC voter pool" → "canonical routing backends".
- adapters/upstream_aliases.py: same.
- orm.py: drop the `aamc_voter` flag field-comment reference (the field itself stays; it was repurposed as a generic catalog-eligibility flag, so only the rationale in the comment is renamed).
- routers/stats.py: leaderboard endpoint comment reframe.
- services/response_normalizer.py: comment cleanup.
- services/routing.py: comment cleanup.
- scripts/seed_dev.py: display names reframed ("GPT-5.5 Pro (AAMC voter)" → "GPT-5.5 Pro").
- tests/integration/test_aamc_invariants.py → renamed to test_routing_backends_invariants.py. Test logic unchanged: it enforces the canonical 5-backend lock (Opus, GPT-5.5, Gemini, Grok, Mistral-Large), reframed away from the retired AAMC framing.

No runtime behavior change. Pure vocabulary cleanup.

Per Ontology v1.2 amendment (2026-05-22) which retired ATS/AAMC and folded their semantics into Routing (`q_empirical` for trust; `M_allowed` for eligibility veto). Ontology v1.3 (2026-05-23) further made Mithril the canonical product the doctrine leads with.

Co-authored-by: Claude <noreply@anthropic.com>

…+ Opus 4.7 worker) (#82)

* feat(api): AIN-285 · routed cron probe + capture-coverage metric

Root cause (re-confirmed against prod):

- The §16 capture path inside dispatch_with_brain is correct and live.
- No live cron probe ever sends model="ainfera-inference", so the routed branch isn't exercised in production. Every existing probe (launch-readiness-smoke.sh, e2e-agent-check.sh, t9-fanout.sh) pins a vendor slug, which by design flows through routing.dispatch_inference and writes zero routing_outcomes rows (capture_invariant.py:61-63).
- capture_invariant's regression counter is process-local and only asserted in tests; in prod it's a no-op.

Minimal fix (per founder GO A3):

1. New cron probe (.github/workflows/routed-probe.yml) sends model="ainfera-inference" to /v1/inference every 6h. Needs founder to set AINFERA_PROBE_KEY (post-AIN-289 rotation) and optionally AINFERA_PROBE_AGENT_ID secrets.
2. counter.record_routed(captured=True) bumped inside complete_decision - one site, hits all five exit paths (reject / 4xx / cap-or-funds / success / 5xx-exhausted) without touching routing_brain.py.
3. counter.record_passthrough(captured_unexpectedly=False) bumped after the else-branch dispatch in post_inference returns.
4. New GET /v1/internal/capture-metrics endpoint (internal-key gated like /v1/heartbeat/latest) exposes the counter JSON for prod scrape + alerting on dispatch_without_capture_total > 0.

Disc #12 compliance:

- No new insert sites, no schema change, no decide()/weights/thresholds/candidate-set/passthrough behavior change.
- routing_brain.py untouched.
- The two new in-process counter bumps are pure observability.

Circular-import note: capture_invariant pulls ROUTING_TARGETS from routers/inference, so the two new counter call sites use function-local imports of get_counter (noqa: PLC0415 with justification). Moving ROUTING_TARGETS to a constants module would be the cleaner architectural fix; defer to a future cleanup PR rather than expand AIN-285 scope.

Tests:

- New: tests/unit/test_capture_metrics_router.py (3 tests covering auth, empty-counter shape, shared-singleton bump propagation).
- Updated: tests/smoke/test_openapi_contract.py registers the new route.
- Existing: 538-test unit suite + capture_invariant unit suite green.
- Integration test_capture_coverage.py exercises the new counter bump end-to-end in CI (needs live Postgres; skipped locally).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(api): AIN-290 · additive judge schema + Opus 4.7 worker (L8 firewall)

Per founder GO B1b: P7 schema lock lifted ONLY for additive judge columns + the v_judge_queue view. Existing columns, decide() call, weights/thresholds/candidate-set logic untouched (Disc #12 still binds).

## Migration 0028 (additive only)

routing_outcomes gains 6 nullable columns:

- judge_score numeric(2,1) CHECK 1.0..5.0
- judge_model text
- judge_rationale text
- judge_labeled_at timestamptz
- judge_status text NOT NULL DEFAULT 'unlabeled' CHECK IN (unlabeled, labeled, skipped, error)
- reward real CHECK 0.0..1.0

Plus:

- Partial index on (judge_status) WHERE judge_status='unlabeled' (keeps the worker's hot-query cheap as labeled rows accumulate).
- View public.v_judge_queue: succeeded ∧ unlabeled ∧ chosen_model_slug != 'claude-opus-4-7' (L8 self-preference firewall enforced declaratively at the SQL layer; worker can't bypass).

ORM mirror lands the same 6 cols on RoutingOutcomeORM as nullable Mapped[] fields with server_default for judge_status.

## Judge worker (scripts/judge_worker.py)

Async script that:

1. Samples JUDGE_BATCH_SIZE (default 10) rows from v_judge_queue.
2. Joins to inferences for request/response payloads.
3. Asks Opus 4.7 to score the response 1-5 with a one-line rationale, per a compact rubric tuned to ~200 output tokens.
4. UPDATEs the row with judge_* fields + reward = (score-1)/4.
5. Marks rows error/skipped on API/parse failures; the partial index keeps them out of the unlabeled sample.

Hard L8 guards:

- v_judge_queue declaratively excludes self-labeling.
- _FORBIDDEN_JUDGE_OVERRIDES rejects JUDGE_MODEL='' / 'auto' / 'ainfera-inference' at startup.
- JUDGE_MODEL default is 'claude-opus-4-7' (matches the view).

## GH Actions cron (.github/workflows/judge-worker.yml)

Runs every 6h at :37 (offset from routed-probe :17 to dodge cron contention). Required secrets: DATABASE_URL, ANTHROPIC_API_KEY. Optional vars: JUDGE_BATCH_SIZE, JUDGE_MODEL. Cost envelope: 10 rows/tick × 4 ticks/day ≈ $48/mo (~$0.04/call at Opus 4.7 ~500/200 token shape).

## Tests

- New: tests/unit/test_judge_worker.py (24 tests covering reply parse strict-JSON path, regex-fallback path, invalid replies, reward normalization 1→0 / 5→1, request/response payload flattening for OpenAI + Anthropic shapes, truncation at 800/2000 chars, sentinel fallbacks for missing/unknown shapes).
- All 573 unit/smoke tests green; mypy clean; ruff clean.
- Migration upgrade/downgrade tested on alembic stub (integration suite applies it against live Postgres in CI).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(api): coerce JSONB strings in judge worker payload flattening

asyncpg can return inference request/response JSONB as serialized strings; normalize before building judge prompts so the first prod tick does not crash.

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(api): AIN-285 · §16 task-batch probe with routing_outcomes row-count gate

Replace the single-call curl probe with a script that exercises six §16 task types via model=ainfera-inference and fails loud if the DB row count does not increase after the batch.

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: varda-elentari <varda@ainfera.ai>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
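
The reward normalization, the startup guard, and the cost envelope from the AIN-290 message can be checked with a short sketch. Function names are hypothetical, and the 30-day month used for the cost arithmetic is an assumption; the formula `(score-1)/4`, the forbidden values, and the per-call price come from the commit message.

```python
from __future__ import annotations

FORBIDDEN_JUDGE_OVERRIDES = {"", "auto", "ainfera-inference"}  # values listed in the commit
DEFAULT_JUDGE_MODEL = "claude-opus-4-7"


def resolve_judge_model(env_value: str | None = None) -> str:
    """Startup guard: fall back to the default, refuse overrides that would break L8."""
    model = DEFAULT_JUDGE_MODEL if env_value is None else env_value
    if model in FORBIDDEN_JUDGE_OVERRIDES:
        raise ValueError(f"JUDGE_MODEL={model!r} is not allowed")
    return model


def reward_from_score(score: float) -> float:
    """Map a 1..5 judge score onto 0..1 via (score - 1) / 4."""
    if not 1.0 <= score <= 5.0:
        raise ValueError("judge score must be within 1.0..5.0")
    return (score - 1.0) / 4.0


def monthly_cost_usd(rows_per_tick: int = 10, ticks_per_day: int = 4,
                     usd_per_call: float = 0.04, days: int = 30) -> float:
    """Cost envelope: one judge call per sampled row, assuming a 30-day month."""
    return rows_per_tick * ticks_per_day * days * usd_per_call
```

Under these assumptions the envelope works out to 10 × 4 × 30 × $0.04 = $48 per month, matching the figure quoted in the commit.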

…ACKED on W5) (#85)

* migration: AIN-295 W5 - Alembic 0029 DB remediation (NOT applied)

W5/9. Translates ainfera-os vault/migrations/ain-298-db-remediation.sql.md into a proper Alembic migration. **DO NOT APPLY until Mon 2026-06-01** (per L14 lock: Spark substrate is the higher priority Fri-Sun; DB remediation lands Monday after migration stabilizes).

Revision chain:

- 20260523_0027 rename_aa_index_source_aamc_to_routing_backend
- 20260526_0028 ain290_judge_columns (existing; AIN-290 judge schema)
- 20260528_0029 ain298_db_remediation (NEW; AIN-298 RLS + view + indexes)

Scope (vault draft sections 1-5):

- §1 v_judge_queue redefined WITH (security_invoker = true), fixes ERROR
- §2 tenant_isolation_select policies (8 native + 6 agent-scoped) + tenant_self_read + user_self_read
- §3 public_catalog_read on providers/models/brands (active=true)
- §4 model_leaderboard REVOKE anon + GRANT service_role
- §5 10 unindexed-FK indexes via autocommit_block + CONCURRENTLY + IF NOT EXISTS

Excluded (manual/future): §6 tenant bloat audit · §7 Supabase HIBP toggle · §8 DROP deprecated table (2026-06-21+).

Defensive: smoke probe on routing_outcomes presence; DO blocks with existence guards on every CREATE POLICY; downgrade() reverses all in dependency order with DROP IF EXISTS.

Validation:

- AST parse clean ✓
- alembic upgrade --sql 0028:0029 → 9,870 bytes DDL ✓
- alembic downgrade --sql 0029:0028 → 3,580 bytes DDL ✓
- pre-commit (mypy --strict + pytest -x) → passed ✓

PR label: do-not-merge-until-2026-06-01
Refs: AIN-295 · AIN-298 · L14 (Mon DB window)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(api): AIN-296 W6-B - atomic policy publish endpoint + ORM + Alembic 0030

W6-B/9, complements W6-A. On replay-gate PROMOTE, labs/cron.sh POSTs here to atomically swap the live policy_version that routing_outcomes rows tag at decision time.

Files:

- alembic/versions/20260528_0030_active_policy_version.py: adds active_policy_version TEXT NOT NULL DEFAULT 'v0' to tenant_routing_policies. Reversible.
- ainfera_api/orm.py: TenantRoutingPolicyORM gains active_policy_version: Mapped[str] (mirrors the DB column).
- ainfera_api/routers/admin_policy.py: POST /v1/admin/policy/publish with hmac.compare_digest service-role gate; SELECT FOR UPDATE atomic swap; INSERT on first publish for the global default (nil UUID).
- ainfera_api/main.py: router registered.
- tests/unit/test_admin_policy.py: 7 unit tests on the service-role gate + schema validation.
- tests/smoke/test_openapi_contract.py: contract snapshot extended.

Auth: service-role bearer ONLY. 4 failure modes: 503 mis-config · 401 missing bearer · 403 wrong key · 403 ai_infera_<agent>_* tenant key explicit reject.

Discipline #12 invariant: tenant API keys NEVER pass this gate. Test test_require_service_role_rejects_tenant_key_prefix asserts this.

Validation:

- pytest tests/unit/test_admin_policy.py → 7 passed ✓
- pytest tests/smoke/test_openapi_contract.py → 4 passed ✓
- pre-commit (ruff + mypy --strict) → passed ✓

PR LABEL: do-not-merge-until-2026-06-01
Stacked on: hizrianraz/ain-295-w5-db-remediation
Refs: AIN-296 · AIN-298 · L14.2 · Discipline #12

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
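
The four failure modes of the service-role gate can be sketched as a pure function that returns the HTTP status. The function name, the `ai_infera_` prefix constant (derived from the `ai_infera_<agent>_*` pattern above), and the order of the checks are assumptions; only `hmac.compare_digest` and the status codes come from the commit message.

```python
from __future__ import annotations

import hmac

TENANT_KEY_PREFIX = "ai_infera_"  # assumed prefix, derived from ai_infera_<agent>_*


def service_role_status(authorization: str | None, service_key: str | None) -> int:
    """Return the status a request would get at the publish gate (200 means allowed)."""
    if not service_key:
        return 503  # server mis-config: no service-role key loaded
    if not authorization or not authorization.startswith("Bearer "):
        return 401  # missing bearer
    token = authorization[len("Bearer "):]
    if token.startswith(TENANT_KEY_PREFIX):
        return 403  # tenant keys are rejected explicitly, before any comparison
    # Constant-time comparison so response timing does not leak key prefixes.
    if not hmac.compare_digest(token.encode(), service_key.encode()):
        return 403  # wrong key
    return 200
```

Rejecting the tenant prefix before comparing means a tenant key can never pass, even if it were somehow configured as the service key.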

…init-plan (#87)

* feat(api): AIN-300 W1 - write-path atomic linkage + 429 retry + CHECK constraint + RLS init-plan

W1/9 SHIP-NOW. Kills AIN-300 orphan bug + 429 backoff/failover + new CHECK constraint guards future regressions + clears 16 perf WARNs from 0029.

routing.py:

- _chat_with_429_retry helper (3 attempts, 0.5/2/8s, 429-only)
- dispatch_inference accepts optional inference_id kwarg

routing_brain.py:

- Pre-allocate candidate_inference_id per fallover attempt
- Track last_inference_id; link in 4xx/5xx-exhausted terminal branches
- 429 (after in-adapter retry exhaust) → failover like 5xx
- Cap/Funds/Inactive use decision_rule_override='failed_pre_dispatch'

routing_outcomes.py:

- complete_decision gains decision_rule_override kwarg

alembic 0031: outcome_requires_inference CHECK constraint
alembic 0032: init-plan optimization + ENABLE RLS on _repair_ table
tests/unit/test_routing_429_retry.py: 6 tests, all pass

Validation:

- pre-commit (ruff + ruff format + mypy --strict + pytest -x): passed
- offline upgrade 0030→0032: 10,868 bytes
- offline downgrade 0032→0030: 9,833 bytes

Refs: AIN-300 · AIN-295 · AIN-298 · Disc #12 preserved on scoring/candidate-set

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(alembic 0031): allow outcome_status=NULL (mid-flight) in CHECK predicate

PG CHECK constraints don't support DEFERRABLE/DEFERRED (only FK/UNIQUE/PK/EXCLUDE do). The two-phase write (insert_decision creates the row with decision_rule='cheapest_clearing_floor' + inference_id=NULL, complete_decision links inference_id after dispatch) has a transient moment that the per-statement check would reject.

Predicate now allows outcome_status IS NULL as the third escape clause:

    CHECK (
      outcome_status IS NULL
      OR decision_rule <> 'cheapest_clearing_floor'
      OR inference_id IS NOT NULL
    )

Once complete_decision sets outcome_status (always non-NULL on every terminal branch: succeeded/failed_other/failed_provider_error/rejected*), the constraint REQUIRES either decision_rule rewritten via decision_rule_override OR inference_id linked. Which IS the AIN-300 W1 invariant.

Integration tests now pass (the failing tests were inserting via the two-phase pattern and hitting the per-statement check).

Refs: AIN-300

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
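
The CHECK predicate can be read as a three-clause boolean. The sketch below restates it in Python so the two-phase write can be traced row by row; the function name is borrowed from the constraint name, `inf-123` is a made-up id, and `None` stands in for SQL NULL (Postgres three-valued logic agrees with this sketch for the cases shown).

```python
from __future__ import annotations


def outcome_requires_inference(outcome_status: str | None,
                               decision_rule: str | None,
                               inference_id: str | None) -> bool:
    """True when the row would satisfy the CHECK constraint described above."""
    return (
        outcome_status is None                          # mid-flight row, not yet completed
        or decision_rule != "cheapest_clearing_floor"   # rule rewritten via override
        or inference_id is not None                     # inference linked after dispatch
    )


# Phase 1: insert_decision writes the row before dispatch.
mid_flight = outcome_requires_inference(None, "cheapest_clearing_floor", None)

# Phase 2a: complete_decision links the inference on success.
linked = outcome_requires_inference("succeeded", "cheapest_clearing_floor", "inf-123")

# Phase 2b: pre-dispatch failure rewrites the rule instead of linking.
overridden = outcome_requires_inference("failed_other", "failed_pre_dispatch", None)

# The orphan case the constraint exists to reject.
orphan = outcome_requires_inference("succeeded", "cheapest_clearing_floor", None)
```

Only the orphan row (terminal status, default rule, no inference) fails the predicate, which is the invariant the commit message names.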

…ll-switch (#89)

W2/9 SHIP-NOW. Brings in the gateway hardening from closed PR #77 (re-baselined). Wires through brain + inference router + deploys to Railway on merge.

routers/health.py (NEW):

- GET /healthz: in-process liveness, no I/O (HEALTHCHECK target)
- GET /readyz: aggregate readiness (process_drain + db + audit + ks snapshot); 503 on any probe fail
- get_readiness_gate() flipped FALSE on SIGTERM for drain

services/cost_killswitch.py (NEW):

- guard_or_raise() called at dispatch_with_brain entry
- rolling-window spend (default today UTC) vs AINFERA_SPEND_KILLSWITCH_USD
- Default $50 + enabled; ops env-config without restart
- Pinned passthroughs bypass guard by design (moat-safe)
- Aggregate-only logging (no PII)

routing_brain.py:

- await cost_killswitch.guard_or_raise(db) before brain runs
- Disc #12 preserved: scoring/candidate-set/weights untouched

inference.py:

- Catch CostKillswitchEngagedError → 503 with code + spent/threshold

main.py:

- Register health.router; rename inline /health → health_legacy

Tests:

- test_health_probes.py (4) + test_cost_killswitch.py (20) + openapi contract (4: /healthz, /readyz documented as non-v1)
- All 28 pass

Founder config (set in Railway env on api):

    AINFERA_SPEND_KILLSWITCH_USD=<real_threshold>   # default $50
    AINFERA_SPEND_KILLSWITCH_ENABLED=1              # default

Refs: AIN-232 · AIN-234 · supersedes closed PR #77

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
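
A sketch of the kill-switch comparison, assuming the spend for the window has already been aggregated. The real `guard_or_raise(db)` is awaited and takes a database handle; this synchronous version, its `env` parameter, and the `>=` comparison are simplifications for illustration. The environment variable names and defaults come from the commit message.

```python
from __future__ import annotations

import os
from collections.abc import Mapping


class CostKillswitchEngagedError(RuntimeError):
    """Raised when spend in the window has reached the configured threshold."""

    def __init__(self, spent_usd: float, threshold_usd: float) -> None:
        super().__init__(f"spent ${spent_usd:.2f} of ${threshold_usd:.2f}")
        self.spent_usd = spent_usd
        self.threshold_usd = threshold_usd


def guard_or_raise(spent_today_usd: float, env: Mapping[str, str] = os.environ) -> None:
    # Read config on every call so ops can change it without a restart.
    enabled = env.get("AINFERA_SPEND_KILLSWITCH_ENABLED", "1") == "1"
    threshold = float(env.get("AINFERA_SPEND_KILLSWITCH_USD", "50"))
    if enabled and spent_today_usd >= threshold:  # >= is an assumption of this sketch
        raise CostKillswitchEngagedError(spent_today_usd, threshold)
```

A caller in the inference router would catch `CostKillswitchEngagedError` and translate it into the 503 described above, using `spent_usd` and `threshold_usd` for the response body.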

…033) (#93)

Charter A2 / Disc #12-bounded migration. Two additive things:

1. CREATE TABLE public.training_runs: one row per L14.2 daily training tick. Captures judge outcomes, policy_version_from→to, promotion verdict, per-cell deltas, replay-gate result, and ruleset_hash.
2. CREATE ROLE ainfera_labs LOGIN (no password set here; founder sets PASSWORD via Doppler-injected ALTER ROLE). Least-priv grants:
   - INSERT on training_runs (+ sequence USAGE)
   - SELECT on routing_outcomes, inferences, models, providers, agents
   - column-level UPDATE on routing_outcomes (judge_score, judge_model, judge_rationale, judge_labeled_at, judge_status, reward); AIN-290 columns only
   - column-level UPDATE on tenant_routing_policies (active_policy, active_policy_version); AIN-296 columns only
   - REVOKE DELETE on every table

Verified via `alembic upgrade 20260528_0032:20260528_0033 --sql`: DDL renders cleanly; `alembic heads` shows `20260528_0033 (head)`.

Disc #12 still binds: no edits to scoring, candidate-set, settlement, auth, key prefix, or hard-delete rules.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…mint script (#96)

* feat(api): AIN-291 W1 · additive training_runs + ainfera_labs role (0033)

Charter A2 / Disc #12-bounded migration. Two additive things:

1. CREATE TABLE public.training_runs: one row per L14.2 daily training tick. Captures judge outcomes, policy_version_from→to, promotion verdict, per-cell deltas, replay-gate result, and ruleset_hash.
2. CREATE ROLE ainfera_labs LOGIN (no password set here; founder sets PASSWORD via Doppler-injected ALTER ROLE). Least-priv grants:
   - INSERT on training_runs (+ sequence USAGE)
   - SELECT on routing_outcomes, inferences, models, providers, agents
   - column-level UPDATE on routing_outcomes (judge_score, judge_model, judge_rationale, judge_labeled_at, judge_status, reward); AIN-290 columns only
   - column-level UPDATE on tenant_routing_policies (active_policy, active_policy_version); AIN-296 columns only
   - REVOKE DELETE on every table

Verified via `alembic upgrade 20260528_0032:20260528_0033 --sql`: DDL renders cleanly; `alembic heads` shows `20260528_0033 (head)`.

Disc #12 still binds: no edits to scoring, candidate-set, settlement, auth, key prefix, or hard-delete rules.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(api): AIN-289 B2 · rotation grace migration + auth-additive

Charter v2 B2 fix for the run-1 single-column finding on tenants.api_key_hash.

- alembic 0034: ADD COLUMN api_key_hash_pending TEXT NULL + partial unique index. Additive only.
- ORM: TenantORM gets api_key_hash_pending field.
- Auth-additive: deps.py / middleware / ownership.py match EITHER api_key_hash OR api_key_hash_pending. No-op when pending is NULL.
- scripts/rotate_key_grace_ain289.py: mint + 1P store + set pending + verify NEW=200 + promote + verify again. --fallback-cutover preserves the run-1 single-UPDATE path. Auto-detects missing column. Never prints raw secrets.

625/625 tests green. mypy --strict clean. Disc #12 untouched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
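The "match EITHER api_key_hash OR api_key_hash_pending" rule from the AIN-289 commit can be illustrated with a small in-memory sketch. Everything here is an assumption made for illustration: the `Tenant` dataclass stands in for TenantORM, SHA-256 is a placeholder for whatever hash the API actually uses, and `key_matches` and `promote` are hypothetical helpers rather than functions from deps.py or the rotation script.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tenant:
    """Stand-in for the ORM row; only the two hash columns are modelled."""

    api_key_hash: str
    api_key_hash_pending: Optional[str] = None


def hash_key(raw_key: str) -> str:
    # Assumption: SHA-256 hex digest; the real hashing scheme is not stated in the source.
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()


def key_matches(tenant: Tenant, raw_key: str) -> bool:
    """Accept the current key, or the pending key while a rotation is in its grace window."""
    presented = hash_key(raw_key)
    if hmac.compare_digest(presented, tenant.api_key_hash):
        return True
    pending = tenant.api_key_hash_pending
    # When pending is None this branch is a no-op, matching pre-rotation behaviour.
    return pending is not None and hmac.compare_digest(presented, pending)


def promote(tenant: Tenant) -> None:
    """End the grace window: the pending hash becomes the only accepted hash."""
    if tenant.api_key_hash_pending is not None:
        tenant.api_key_hash = tenant.api_key_hash_pending
        tenant.api_key_hash_pending = None


# Walk through one rotation with made-up keys.
tenant = Tenant(api_key_hash=hash_key("old-key"))
before = (key_matches(tenant, "old-key"), key_matches(tenant, "new-key"))
tenant.api_key_hash_pending = hash_key("new-key")
during = (key_matches(tenant, "old-key"), key_matches(tenant, "new-key"))
promote(tenant)
after = (key_matches(tenant, "old-key"), key_matches(tenant, "new-key"))
```

The point of the second column is the middle state: during the grace window both keys authenticate, so the new key can be verified before the old one stops working.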

0033 granted ainfera_labs column-level UPDATE on judge cols + SELECT on the read surface, but RLS is enabled with policies scoped only to `authenticated`. ainfera_labs isn't `authenticated` and doesn't bypass RLS, so RLS silently denied it every row and the grants were inert.

Add per-role RLS policies for ainfera_labs:
- routing_outcomes: SELECT (all rows) + UPDATE (judge labeling; the 0033 column GRANT still limits WHICH columns can change)
- inferences/agents/models/providers: SELECT (all rows)
- training_runs: ENABLE RLS (was disabled → advisor ERROR) + labs INSERT/SELECT

Tenant isolation for `authenticated` is unchanged. Additive; Disc #12 intact.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…l (0036) (#99)

Adds a `source` discriminator (prod|synthetic|shadow, NOT NULL default 'prod', CHECK + index) so the synthetic cold-start loop's rows can never feed a prod routing-policy promotion: prod refits filter source='prod'. Existing 147 real rows backfill to 'prod'. Additive; Disc #12 intact.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

cursor Bot reviewed May 16, 2026

hizrianraz merged commit 11108be into main May 16, 2026
4 checks passed

hizrianraz deleted the feat/audit-cap-honesty-plus-heartbeat-probe branch May 16, 2026 13:13

hizrianraz mentioned this pull request May 16, 2026

feat(aamc): canonical slug→upstream translation + active-flag invariant #13

Merged

9 tasks

linear-code Bot mentioned this pull request May 19, 2026

feat(api): AIN-179 launch hardening — public-audit cache + AIN-183 P0-1 #45

Merged

5 tasks

This was referenced May 19, 2026

feat(api): AIN-182 Phase 1 §4 · GET /v1/inferences/{id} detail endpoint #51

Merged

feat(api): AIN-182 Phase 2 · templates backend + 6 system seeds #52

Merged

hizrianraz mentioned this pull request May 19, 2026

feat(api): AIN-182 Phase 3 · tenant_routing_policies + GET/PUT #53

Merged

hizrianraz mentioned this pull request May 23, 2026

chore(api): AIN-243 · purge sweep · retire AAMC vocab from code surface #69

Merged

3 tasks

hizrianraz mentioned this pull request May 27, 2026

feat(api): GATE 2 · AIN-285 (capture metric) + AIN-290 (judge schema + Opus 4.7 worker) #82

Merged

7 tasks

hizrianraz mentioned this pull request May 27, 2026

chore(api): lift ROUTING_TARGETS to a leaf module (breaks import cycle) #83

Open

3 tasks

linear-code Bot mentioned this pull request May 28, 2026

[migration] AIN-295 W5: Alembic 0029 DB remediation (DO NOT MERGE before 2026-06-01) #84

Merged

6 tasks

hizrianraz mentioned this pull request May 28, 2026

[api] AIN-296 W6-B: atomic policy publish endpoint + Alembic 0030 (STACKED on W5) #85

Merged

5 tasks

hizrianraz mentioned this pull request May 28, 2026

[api] AIN-300 W1: write-path linkage + 429 retry + 0031 CHECK + 0032 init-plan #87

Merged

4 tasks

hizrianraz mentioned this pull request May 28, 2026

[api] AIN-232+AIN-234 W2: gateway resilience probes + cost kill-switch #89

Merged

This was referenced May 28, 2026

[api] AIN-266 W3: candidates on InferenceDetail (authed-tenant-only) #90

Merged

AIN-291 W1 · 0033 training_runs table + ainfera_labs role #93

Merged

AIN-289 B2 · rotation grace migration (0034) + auth-additive + grace mint script #96

Merged

linear-code Bot mentioned this pull request May 29, 2026

AIN-303 · routing_outcomes.source (0036) #99

Merged

fix(audit): honest cap + cursor for /v1/audit/public and /v1/audit/{id} #12
hizrianraz merged 1 commit into main from feat/audit-cap-honesty-plus-heartbeat-probe

hizrianraz commented May 16, 2026 • edited by cursor Bot

cursor Bot May 16, 2026

Public feed cursor uses per-agent seq as global cursor
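The body of this review finding did not load, so the following is only a toy illustration of the kind of problem its title names, not a reproduction of the bot's analysis or of the code in ainfera_api/routers/audit.py. It assumes `seq` numbers each agent's chain independently starting at 1, and that the public feed filters with `seq > since_seq`; `Event`, `poll_since`, `poll_since_per_agent`, and the agent name `quiet` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Event:
    agent_id: str
    seq: int  # assumed: position within that agent's own chain, not a global counter


def poll_since(events: List[Event], since_seq: int) -> List[Event]:
    """Cross-agent feed filtered by a single cursor value."""
    return [e for e in events if e.seq > since_seq]


def poll_since_per_agent(events: List[Event], cursors: Dict[str, int]) -> List[Event]:
    """Alternative sketch: track one cursor per agent instead of one for the whole feed."""
    return [e for e in events if e.seq > cursors.get(e.agent_id, 0)]


# A chatty agent has emitted three events; a quiet agent then emits its first.
seen = [Event("varda", 1), Event("varda", 2), Event("varda", 3)]
feed = seen + [Event("quiet", 1)]

# The client remembers the highest seq it has seen and polls with it.
cursor = max(e.seq for e in seen)
missed = poll_since(feed, cursor)

# With per-agent cursors the quiet agent's first event is returned.
per_agent = {"varda": 3}
found = poll_since_per_agent(feed, per_agent)
```

Under these assumptions the single-cursor poll returns nothing, because the quiet agent's `seq` of 1 is below the cursor of 3 taken from the chatty agent, so the new event is silently skipped.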
