
fix(audit): honest cap + cursor for /v1/audit/public and /v1/audit/{id} #12

Merged
hizrianraz merged 1 commit into main from feat/audit-cap-honesty-plus-heartbeat-probe
May 16, 2026

Conversation


@hizrianraz hizrianraz commented May 16, 2026

Summary

  • /v1/audit/public previously silently truncated ?limit=N to 100, a contract bug for AI-native callers that pass higher limits and trust the response to be complete (Memory #14 violation). The limit is now honestly capped at 500, and values above that return 422.
  • /v1/audit/{agent_id} had the same family of bug in inverse: no limit parameter at all (returned the entire chain unbounded). Now optionally cappable; omitting limit preserves backward-compat full-chain behavior for ainfera-verify.
  • Both endpoints gain a since_seq cursor for bandwidth-cheap live-feed polling — the homepage widget should migrate to this rather than re-fetching the full window every 12s (fast-follow web PR).
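A minimal sketch of how a client might poll with the cursor, assuming a single agent's chain where the feed returns events with seq > since_seq in ascending order. `poll_once`, `fake_fetch`, and the in-memory `_CHAIN` are hypothetical stand-ins for an HTTP call and are not part of this PR.

```python
from typing import Callable, Optional

# Hypothetical fetch signature: (since_seq, limit) -> list of event dicts.
Fetch = Callable[[Optional[int], int], list]


def poll_once(fetch: Fetch, cursor: Optional[int], limit: int = 50):
    """Fetch only events newer than `cursor`; return (batch, updated cursor)."""
    batch = fetch(cursor, limit)
    if batch:
        # Ascending order means the last event carries the highest seq.
        cursor = batch[-1]["seq"]
    return batch, cursor


# In-memory stand-in for one agent's audit chain (assumption, not the real API).
_CHAIN = [{"seq": i, "action": f"event-{i}"} for i in range(7)]


def fake_fetch(since_seq: Optional[int], limit: int) -> list:
    rows = [e for e in _CHAIN if since_seq is None or e["seq"] > since_seq]
    return rows[:limit]


first, cursor = poll_once(fake_fetch, 1, limit=3)
second, cursor = poll_once(fake_fetch, cursor, limit=3)
third, cursor = poll_once(fake_fetch, cursor, limit=3)
```

Each poll transfers only the events the client has not yet seen, and an empty batch leaves the cursor unchanged.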

Why

Surfaced by the 2026-05-16 E2E HALT diagnosis. The recency-rank pre-flight assertion ("≥5 distinct agents in last 200 events") was failing because:

  1. The 100-cap meant ?limit=200 returned 100 events
  2. Any chatty agent (varda) monopolizes the slot ranking
  3. Time-window probes are the right shape for fleet liveness — not recency-rank

The silent cap masked that deeper category error. Fixing the contract honesty is a prereq for the companion E2E script swap (G4 → /v1/heartbeat/latest, C5 → per-agent time-window).
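The difference between the two probe shapes can be sketched with synthetic data. The event layout, the function names, and the one-hour window below are illustrative assumptions; only the agent name varda and the "last 200 events" assertion come from the text.

```python
from datetime import datetime, timedelta, timezone


def distinct_agents_recency_rank(events, last_n):
    """Count distinct agents among the newest `last_n` events."""
    newest = sorted(events, key=lambda e: e["ts"], reverse=True)[:last_n]
    return len({e["agent"] for e in newest})


def distinct_agents_time_window(events, now, window):
    """Count distinct agents with at least one event inside `window`."""
    cutoff = now - window
    return len({e["agent"] for e in events if e["ts"] >= cutoff})


now = datetime(2026, 5, 16, 12, 0, tzinfo=timezone.utc)

# One chatty agent emits 200 events in the last 200 seconds...
events = [{"agent": "varda", "ts": now - timedelta(seconds=i)} for i in range(200)]
# ...while five quieter agents each fired once, roughly half an hour ago.
events += [
    {"agent": f"agent-{k}", "ts": now - timedelta(minutes=30 + k)} for k in range(5)
]

rank_count = distinct_agents_recency_rank(events, last_n=200)
window_count = distinct_agents_time_window(events, now, timedelta(hours=1))
```

Here the recency-rank probe sees only the chatty agent, while the time-window probe sees all six agents that were alive within the hour.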

Changes

Test plan

  • make typecheck (mypy --strict, clean)
  • make lint (ruff, clean)
  • make test (387 unit tests pass)
  • RUN_INTEGRATION=1 make test-integration (53 integration tests pass, includes 8 new)
  • CI green
  • Post-merge: curl 'https://api.ainfera.ai/v1/audit/public?limit=600' → 422
  • Post-merge: curl 'https://api.ainfera.ai/v1/audit/public?since_seq=1&limit=50' returns events with seq > 1 ascending

Out of scope (separate)

  • scripts/e2e-agent-check.sh — companion E2E script lives in parent ainfera-ai dir (not a git repo). G4 swapped to heartbeat probe, C5 to per-agent time-window, G7/G8 added for cap honesty
  • /annex-iv pagination — Annex IV exports are meant to be full bundles
  • Homepage widget cursor migration — fast-follow web PR
  • AIN-129 PEM URL-encoding (per founder spec)

🤖 Generated with Claude Code


Note

Medium Risk
Changes public and per-agent audit API query semantics (validation, ordering, and optional limiting), which may affect existing clients relying on previous silent truncation or ordering.

Overview
Audit feed endpoints now have explicit, honest pagination controls. /v1/audit/public switches from silently truncating limits to enforcing limit via Query validation (default 20, max 500, 422 on out-of-range), and adds a since_seq cursor mode that returns seq > since_seq in ascending order.

Per-agent audit chain adds optional bounding and cursoring while keeping backward compatibility. /v1/audit/{agent_id} now accepts optional limit (max 500; omitted still returns the full chain) and since_seq filtering, with new integration tests covering caps and cursor behavior for both endpoints.

Reviewed by Cursor Bugbot for commit 78def5f. Bugbot is set up for automated code reviews on this repo. Configure here.

`/v1/audit/public` previously silently truncated `?limit=N` to 100 via
`min(limit, 100)` — a contract bug for AI-native callers passing higher
limits and trusting the response is complete (Memory #14 violation).

`/v1/audit/{agent_id}` had a different bug in the same family: no `limit`
at all, returning the entire chain unbounded.

This change:
- /v1/audit/public: `limit` is now `Query(20, ge=1, le=500)` — default
  preserved at 20, max raised to 500, over-cap returns 422 instead of
  silent truncation. Adds `since_seq: int | None` cursor for
  bandwidth-cheap live-feed polling (when set, returns events with
  seq > since_seq ordered ascending — widgets prepend in order).
- /v1/audit/{agent_id}: `limit` is `Query(None, ge=1, le=500)` — None
  default preserves the unbounded full-chain behavior (backward compat
  for ainfera-verify), over-cap returns 422 when explicitly set. Adds
  the same `since_seq` cursor.
- /v1/audit/{agent_id}/annex-iv: unchanged (Annex IV exports are meant
  to be full bundles; pagination there is a separate design question).
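The contract change can be sketched in plain Python. The real endpoints use FastAPI `Query` validation as described above; `silent_limit`, `honest_limit`, and `LimitOutOfRange` are hypothetical names used only to contrast the two behaviors.

```python
from typing import Optional


class LimitOutOfRange(ValueError):
    """Stands in for the HTTP 422 the endpoint returns (illustrative only)."""


def silent_limit(limit: int) -> int:
    """Old behavior: quietly clamp, so the caller cannot tell it was truncated."""
    return min(limit, 100)


def honest_limit(
    limit: Optional[int], *, default: Optional[int], maximum: int = 500
) -> Optional[int]:
    """New behavior: accept 1..maximum, reject everything else loudly.

    default=20 models /v1/audit/public; default=None models /v1/audit/{agent_id},
    where an omitted limit still means "return the full chain".
    """
    if limit is None:
        return default
    if not 1 <= limit <= maximum:
        raise LimitOutOfRange(f"limit must be between 1 and {maximum}, got {limit}")
    return limit
```

With the old clamp, `silent_limit(200)` returns 100 and the caller never learns about it. With the honest version, a request for 600 fails fast, while 200 and 500 are both honored.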

8 new integration tests in test_audit_public_cap.py cover the contract
surface (default works, cap honored at 500, 422 above, cursor returns
ascending filtered, backward-compat full-chain mode for /{agent_id}).

Surfaced by the 2026-05-16 E2E HALT diagnosis — the silent-cap masked
the deeper "agents firing" recency-rank-vs-time-window category error
that the E2E check was making. Companion E2E script swap (G4 + C5 to
heartbeat/per-agent time-window probes) lives in scripts/ (not part of
this PR; api repo only).

Co-Authored-By: Claude <noreply@anthropic.com>

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is ON. A cloud agent has been kicked off to fix the reported issue.

    AgentORM, AuditEventORM.agent_id == AgentORM.id
)
if since_seq is not None:
    stmt = stmt.where(AuditEventORM.seq > since_seq).order_by(AuditEventORM.seq.asc())

Public feed cursor uses per-agent seq as global cursor

High Severity

The since_seq cursor on /v1/audit/public filters by AuditEventORM.seq > since_seq across all agents, but seq is a per-agent counter (starts at 0 for each agent, unique constraint is (agent_id, seq)). This makes the cursor fundamentally broken for cross-agent polling: a caller passing since_seq=100 will silently miss every event from any agent with fewer than 101 events. The integration test only passes because both test agents produce the same number of events, making their seq ranges identical.
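The failure mode is easy to reproduce with synthetic data. In the sketch below, the agent names and event counts are made up, and `per_agent_cursor` is only one possible alternative shape, not necessarily the fix that was adopted.

```python
# Each event is (agent_id, seq); seq restarts at 0 for every agent.
events = [("chatty", seq) for seq in range(150)]
events += [("quiet", seq) for seq in range(3)]


def global_cursor(events, since_seq):
    """The flagged behavior: one seq threshold applied across all agents."""
    return [e for e in events if e[1] > since_seq]


def per_agent_cursor(events, cursors):
    """Hypothetical alternative: track the last seen seq per agent."""
    return [e for e in events if e[1] > cursors.get(e[0], -1)]


seen_global = global_cursor(events, since_seq=100)
missed_agents = {agent for agent, _ in events} - {agent for agent, _ in seen_global}

seen_per_agent = per_agent_cursor(events, {"chatty": 100})
```

With a global threshold of 100, the quiet agent's three events never appear, because none of its per-agent seq values exceed the cursor.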

@hizrianraz hizrianraz merged commit 11108be into main May 16, 2026
4 checks passed
@hizrianraz hizrianraz deleted the feat/audit-cap-honesty-plus-heartbeat-probe branch May 16, 2026 13:13
hizrianraz pushed a commit that referenced this pull request May 19, 2026
…tion C)

/v1/audit/public was building canonical URIs as
`ainfera.ai/{owner_handle}/{agent_name}` where owner_handle was read off
the agent row. For founder-owned agents (Varda, Yavanna) this surfaced
`ainfera.ai/hizrianraz/varda` on the most-trafficked public endpoint we
run — a discipline #3 leak of founder GitHub-handle / PII.

The Discipline #12 fix that landed in the AIN-183 audit prompt is Option C: add
a public-facing handle on the `tenants` row that's decoupled from the
GitHub handles on agent rows, and project that on the public surface
instead.

Three-phase, all in one upgrade():

1. Add `tenant_handle TEXT NULL` to `tenants`.
2. Backfill in priority order:
   a. Tenants that own at least one agent with `owner_handle='hizrianraz'`
      → `tenant_handle='ainfera-ai'` (founder tenant id is not hardcoded;
      lifted from data).
   b. Remaining tenants → MIN(owner_handle) across their agents (stable +
      deterministic + matches the GitHub handle most users registered as).
   c. Agent-less tenants → contact_email local part.
   d. Conflict resolution: collisions append a 6-char id-slice suffix in
      stable id order, so the first-by-id keeps the bare handle.
3. NOT NULL constraint + unique index. Pre-NOT-NULL the migration asserts
   zero rows remain NULL (Memory #20 silent-no-op guard).
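Step 2d can be sketched as follows. The `-` separator, the choice of the first six characters of the id, and the function name are assumptions made for illustration; the migration itself does this in SQL, and its exact suffix format is not shown here.

```python
def resolve_collisions(candidates: dict) -> dict:
    """Map tenant id -> unique handle, given tenant id -> proposed handle.

    Tenants are visited in sorted id order, so the first tenant by id keeps
    the bare handle and later ones get an id-slice suffix.
    """
    taken = set()
    resolved = {}
    for tenant_id in sorted(candidates):
        handle = candidates[tenant_id]
        if handle in taken:
            # Hypothetical suffix format: "-" plus the first 6 chars of the id.
            handle = f"{handle}-{tenant_id[:6]}"
        taken.add(handle)
        resolved[tenant_id] = handle
    return resolved


proposed = {"aaa111-x": "alice", "bbb222-y": "alice", "ccc333-z": "bob"}
handles = resolve_collisions(proposed)
```

Because the order is derived from the ids alone, rerunning the backfill over the same data produces the same handles.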

- `TenantORM.tenant_handle` declared NOT NULL UNIQUE String(64).
- `routers/audit.py` public_feed projection joins through TenantORM and
  reads tenant_handle. The response key stays `owner_handle` — the public
  API contract is unchanged, only the value source moves.
- All four TenantORM instantiation sites populate the new column:
    - routers/signup.py (SDK-CLI signup → tenant_handle=owner_handle)
    - routers/github_oauth.py (OAuth login → tenant_handle=github_login)
    - routers/install.py (resolve-or-create on install → same as oauth)
    - routers/tenants.py (/v1/tenants/register → contact_email local part)

```
curl -s https://api.ainfera.ai/v1/audit/public | \
  jq -r '.events[].canonical_uri' | grep -c hizrianraz

curl -s https://api.ainfera.ai/v1/audit/public | \
  jq -r '.events[].canonical_uri' | grep -c ainfera-ai/varda
```

Once this lands, the marketing AuditTicker widget (already filters
`ainfera-ai/varda` and `ainfera-ai/yavanna` on the web side) starts
matching real events — closes PR E without a web-side code change.

- Prompt said `tenants.tenant_handle` is a new column. Confirmed via ORM
  read — column did not exist (only id/name/contact_email/api_key_hash/
  created_at). Migration adds it.
- Public response field stays named `owner_handle` to avoid breaking the
  API contract; only the underlying value changes. If a future PR wants
  to rename the response field to `tenant_handle`, that's a separate
  ContractDelta against the PublicAuditEvent Pydantic model.

Closes: AIN-183 P0-3 (founder PII on /v1/audit/public)
Discipline: #1 (claim "no founder PII on public" matches reality),
assertions on data migration).
hizrianraz added a commit that referenced this pull request May 19, 2026
…tion C) (#47)

Co-authored-by: Aule <aule@ainfera-internal.local>
hizrianraz pushed a commit that referenced this pull request May 19, 2026
New per-tenant routing-policy state surface backing the dashboard
/settings/routing-policy editor (AIN-182 §Phase 3 §7).

Migration 20260519_0021 adds tenant_routing_policies (PK on
tenant_id, FK CASCADE). Columns: active_policy enum, quality/cost/
latency_weight NUMERIC(4,3), fallback_enabled bool,
fallback_penalty_pct NUMERIC(5,2). DB CHECK enforces weight sum
= 1.0 ±0.001 (D26) and penalty bounds [0, 100].

Endpoints:
- GET /v1/routing-policy → row OR implicit Balanced default.
  compliance_veto_locked always true (Discipline #12).
- PUT /v1/routing-policy → upsert via ON CONFLICT. Pydantic
  model_validator enforces weight-sum-to-1.0; DB CHECK is the
  final guard. CHECK breach → 400.
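The weight-sum rule can be sketched with a standard-library dataclass standing in for the Pydantic model. The class name and the inclusive treatment of the ±0.001 boundary are assumptions.

```python
from dataclasses import dataclass
from decimal import Decimal

TOLERANCE = Decimal("0.001")


@dataclass(frozen=True)
class RoutingWeights:
    """Hypothetical stand-in for the request model's weight fields."""

    quality: Decimal
    cost: Decimal
    latency: Decimal

    def __post_init__(self) -> None:
        total = self.quality + self.cost + self.latency
        # Mirrors the described rule: the three weights must sum to 1.0 ±0.001.
        if abs(total - Decimal("1")) > TOLERANCE:
            raise ValueError(f"weights sum to {total}, expected 1.0 within {TOLERANCE}")


balanced = RoutingWeights(Decimal("0.334"), Decimal("0.333"), Decimal("0.333"))
```

Using `Decimal` avoids binary floating-point drift in the sum, which matters when the tolerance is as tight as 0.001 and the column type is NUMERIC.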

Closes part of AIN-182 Phase 3.
hizrianraz added a commit that referenced this pull request May 19, 2026
…nts (#53)

Co-authored-by: Aule <aule@ainfera-internal.local>
hizrianraz added a commit that referenced this pull request May 23, 2026
…ce (#69)

Final piece of the cross-repo AAMC retirement (paired with sdk #11+#12,
mcp-server #12, ainfera-os #49, routing #2). Removes references to
"AAMC voter" / "voter pool" / "Council" in code comments + display
names + the (now-defunct) invariant test file.

Changes:

- adapters/openai.py: comment reframe — "AAMC voter pool" →
  "canonical routing backends".
- adapters/upstream_aliases.py: same.
- orm.py: drop the `aamc_voter` flag field-comment reference (the
  field itself stays — was repurposed as a generic catalog-eligibility
  flag; just rename the rationale in the comment).
- routers/stats.py: leaderboard endpoint comment reframe.
- services/response_normalizer.py: comment cleanup.
- services/routing.py: comment cleanup.
- scripts/seed_dev.py: display names reframed
  ("GPT-5.5 Pro (AAMC voter)" → "GPT-5.5 Pro").
- tests/integration/test_aamc_invariants.py → renamed to
  test_routing_backends_invariants.py. Test logic unchanged — it
  enforces the canonical 5-backend lock (Opus, GPT-5.5, Gemini, Grok,
  Mistral-Large), reframed away from the retired AAMC framing.

No runtime behavior change. Pure vocabulary cleanup.

Per Ontology v1.2 amendment (2026-05-22) which retired ATS/AAMC and
folded their semantics into Routing (`q_empirical` for trust;
`M_allowed` for eligibility veto). Ontology v1.3 (2026-05-23) further
made Mithril the canonical product the doctrine leads with.

Co-authored-by: Claude <noreply@anthropic.com>
hizrianraz pushed a commit that referenced this pull request May 27, 2026
…wall)

Per founder GO B1b: P7 schema lock lifted ONLY for additive judge
columns + the v_judge_queue view. Existing columns, decide() call,
weights/thresholds/candidate-set logic untouched (Disc #12 still binds).

## Migration 0028 (additive only)

routing_outcomes gains 6 nullable columns:
- judge_score numeric(2,1)          CHECK 1.0..5.0
- judge_model text
- judge_rationale text
- judge_labeled_at timestamptz
- judge_status text NOT NULL DEFAULT 'unlabeled'
  CHECK IN (unlabeled, labeled, skipped, error)
- reward real                       CHECK 0.0..1.0

Plus:
- Partial index on (judge_status) WHERE judge_status='unlabeled'
  (keeps the worker's hot-query cheap as labeled rows accumulate).
- View public.v_judge_queue: succeeded ∧ unlabeled ∧
  chosen_model_slug != 'claude-opus-4-7' (L8 self-preference firewall
  enforced declaratively at the SQL layer — worker can't bypass).

ORM mirror lands the same 6 cols on RoutingOutcomeORM as nullable
Mapped[] fields with server_default for judge_status.

## Judge worker (scripts/judge_worker.py)

Async script that:
1. Samples JUDGE_BATCH_SIZE (default 10) rows from v_judge_queue.
2. Joins to inferences for request/response payloads.
3. Asks Opus 4.7 to score the response 1-5 with a one-line rationale,
   per a compact rubric tuned to ~200 output tokens.
4. UPDATEs the row with judge_* fields + reward = (score-1)/4.
5. Marks rows error/skipped on API/parse failures; the partial index
   keeps them out of the unlabeled sample.

Hard L8 guards:
- v_judge_queue declaratively excludes self-labeling.
- _FORBIDDEN_JUDGE_OVERRIDES rejects JUDGE_MODEL='' / 'auto' /
  'ainfera-inference' at startup.
- JUDGE_MODEL default is 'claude-opus-4-7' (matches the view).
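A rough sketch of the parsing, reward normalization, and startup guard described above. The forbidden values and the reward formula come from the text; the JSON key `score`, the fallback regex, and the function names are assumptions, since the worker's actual code is not shown.

```python
import json
import re
from typing import Optional

# Values named in the commit message; the check function itself is hypothetical.
_FORBIDDEN_JUDGE_OVERRIDES = {"", "auto", "ainfera-inference"}

# Assumed fallback pattern: the word "score" followed by a 1-5 value.
_FALLBACK = re.compile(r"score\D*([1-5](?:\.\d)?)(?!\d)", re.IGNORECASE)


def check_judge_model(name: str) -> str:
    """Refuse judge models that would route through the system being judged."""
    if name in _FORBIDDEN_JUDGE_OVERRIDES:
        raise ValueError(f"JUDGE_MODEL={name!r} is not allowed")
    return name


def parse_score(reply: str) -> Optional[float]:
    """Try strict JSON first, then a regex fallback; None means unparseable."""
    score = None
    try:
        score = float(json.loads(reply)["score"])
    except (ValueError, KeyError, TypeError):
        match = _FALLBACK.search(reply)
        if match:
            score = float(match.group(1))
    if score is None or not 1.0 <= score <= 5.0:
        return None
    return score


def reward_from_score(score: float) -> float:
    """Normalize a 1-5 score onto 0-1 as reward = (score - 1) / 4."""
    return (score - 1.0) / 4.0
```

A reply that fails both paths yields `None`, which a worker could map to the error or skipped status instead of writing a reward.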

## GH Actions cron (.github/workflows/judge-worker.yml)

Runs every 6h at :37 (offset from routed-probe :17 to dodge cron
contention). Required secrets: DATABASE_URL, ANTHROPIC_API_KEY.
Optional vars: JUDGE_BATCH_SIZE, JUDGE_MODEL.

Cost envelope: 10 rows/tick × 4 ticks/day ≈ $48/mo (~$0.04/call at
Opus 4.7 ~500/200 token shape).
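The cost envelope works out as follows, assuming a 30-day month (the source gives only the rounded monthly figure). The constant names are illustrative.

```python
ROWS_PER_TICK = 10            # JUDGE_BATCH_SIZE default
TICKS_PER_DAY = 24 // 6       # cron fires every 6 hours
DAYS_PER_MONTH = 30           # assumption: the text states only the monthly total
COST_PER_CALL_USD = 0.04      # approximate per-call cost quoted above

calls_per_month = ROWS_PER_TICK * TICKS_PER_DAY * DAYS_PER_MONTH
monthly_cost_usd = calls_per_month * COST_PER_CALL_USD
```

That is 1,200 judge calls per month, which at roughly $0.04 each lands on the quoted figure of about $48.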

## Tests

- New: tests/unit/test_judge_worker.py (24 tests covering reply parse
  strict-JSON path, regex-fallback path, invalid replies, reward
  normalization 1→0 / 5→1, request/response payload flattening for
  OpenAI + Anthropic shapes, truncation at 800/2000 chars, sentinel
  fallbacks for missing/unknown shapes).
- All 573 unit/smoke tests green; mypy clean; ruff clean.
- Migration upgrade/downgrade tested on alembic stub (integration suite
  applies it against live Postgres in CI).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
hizrianraz added a commit that referenced this pull request May 27, 2026
…+ Opus 4.7 worker) (#82)

* feat(api): AIN-285 · routed cron probe + capture-coverage metric

Root cause (re-confirmed against prod):
- The §16 capture path inside dispatch_with_brain is correct and live.
- No live cron probe ever sends model="ainfera-inference", so the
  routed branch isn't exercised in production. Every existing probe
  (launch-readiness-smoke.sh, e2e-agent-check.sh, t9-fanout.sh) pins a
  vendor slug, which by design flows through routing.dispatch_inference
  and writes zero routing_outcomes rows (capture_invariant.py:61-63).
- capture_invariant's regression counter is process-local and only
  asserted in tests; in prod it's a no-op.

Minimal fix (per founder GO A3):
1. New cron probe (.github/workflows/routed-probe.yml) sends
   model="ainfera-inference" to /v1/inference every 6h. Needs founder
   to set AINFERA_PROBE_KEY (post-AIN-289 rotation) and optionally
   AINFERA_PROBE_AGENT_ID secrets.
2. counter.record_routed(captured=True) bumped inside complete_decision
   - one site, hits all five exit paths (reject / 4xx / cap-or-funds /
   success / 5xx-exhausted) without touching routing_brain.py.
3. counter.record_passthrough(captured_unexpectedly=False) bumped after
   the else-branch dispatch in post_inference returns.
4. New GET /v1/internal/capture-metrics endpoint (internal-key gated
   like /v1/heartbeat/latest) exposes the counter JSON for prod scrape
   + alerting on dispatch_without_capture_total > 0.
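A minimal sketch of what such a process-local counter could look like. The method names and `dispatch_without_capture_total` come from the text; the class name, the other field names, and the snapshot shape are assumptions, and thread-safety is left out for brevity.

```python
class CaptureCounter:
    """Hypothetical in-process counter for routed vs. passthrough dispatches."""

    def __init__(self) -> None:
        self.routed_total = 0
        self.passthrough_total = 0
        self.dispatch_without_capture_total = 0
        self.passthrough_captured_unexpectedly_total = 0

    def record_routed(self, *, captured: bool) -> None:
        """A routed dispatch should always leave a routing_outcomes row."""
        self.routed_total += 1
        if not captured:
            self.dispatch_without_capture_total += 1

    def record_passthrough(self, *, captured_unexpectedly: bool) -> None:
        """A vendor-pinned dispatch should never leave a routing_outcomes row."""
        self.passthrough_total += 1
        if captured_unexpectedly:
            self.passthrough_captured_unexpectedly_total += 1

    def snapshot(self) -> dict:
        """Plain dict suitable for returning as JSON from a metrics endpoint."""
        return dict(vars(self))


counter = CaptureCounter()
counter.record_routed(captured=True)
counter.record_routed(captured=False)
counter.record_passthrough(captured_unexpectedly=False)
metrics = counter.snapshot()
```

An alert on `dispatch_without_capture_total > 0` would fire on the second call above, which is the regression the endpoint is meant to surface.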

Disc #12 compliance:
- No new insert sites, no schema change, no decide()/weights/thresholds
  /candidate-set/passthrough behavior change.
- routing_brain.py untouched.
- The two new in-process counter bumps are pure observability.

Circular-import note: capture_invariant pulls ROUTING_TARGETS from
routers/inference, so the two new counter call sites use function-local
imports of get_counter (noqa: PLC0415 with justification). Moving
ROUTING_TARGETS to a constants module would be the cleaner architectural
fix; defer to a future cleanup PR rather than expand AIN-285 scope.

Tests:
- New: tests/unit/test_capture_metrics_router.py (3 tests covering auth,
  empty-counter shape, shared-singleton bump propagation).
- Updated: tests/smoke/test_openapi_contract.py registers the new route.
- Existing: 538-test unit suite + capture_invariant unit suite green.
- Integration test_capture_coverage.py exercises the new counter bump
  end-to-end in CI (needs live Postgres; skipped locally).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(api): AIN-290 · additive judge schema + Opus 4.7 worker (L8 firewall)

* fix(api): coerce JSONB strings in judge worker payload flattening

asyncpg can return inference request/response JSONB as serialized strings;
normalize before building judge prompts so the first prod tick does not crash.
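The normalization might look roughly like this. `coerce_jsonb` is a hypothetical name, and leaving undecodable strings untouched is one possible policy rather than the worker's confirmed behavior.

```python
import json
from typing import Any


def coerce_jsonb(value: Any) -> Any:
    """Return parsed JSON when a driver hands back JSONB as a string."""
    if isinstance(value, str):
        try:
            return json.loads(value)
        except json.JSONDecodeError:
            # Not valid JSON: pass the raw string through unchanged.
            return value
    # Already decoded (dict, list, None, ...): nothing to do.
    return value
```

Calling this on both the request and response payloads before flattening means downstream code can rely on dict or list shapes regardless of how the driver's JSONB codec is configured.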

Co-authored-by: Cursor <cursoragent@cursor.com>

* feat(api): AIN-285 · §16 task-batch probe with routing_outcomes row-count gate

Replace the single-call curl probe with a script that exercises six §16
task types via model=ainfera-inference and fails loud if the DB row count
does not increase after the batch.

Co-authored-by: Cursor <cursoragent@cursor.com>

---------

Co-authored-by: varda-elentari <varda@ainfera.ai>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
hizrianraz added a commit that referenced this pull request May 28, 2026
…bic 0030

W6-B/9 — complements W6-A. On replay-gate PROMOTE, labs/cron.sh POSTs
here to atomically swap the live policy_version that routing_outcomes
rows tag at decision time.

Files:
- alembic/versions/20260528_0030_active_policy_version.py
  Adds active_policy_version TEXT NOT NULL DEFAULT 'v0' to
  tenant_routing_policies. Reversible.
- ainfera_api/orm.py — TenantRoutingPolicyORM gains
  active_policy_version: Mapped[str] (mirrors the DB column).
- ainfera_api/routers/admin_policy.py — POST /v1/admin/policy/publish
  with hmac.compare_digest service-role gate; SELECT FOR UPDATE atomic
  swap; INSERT on first publish for the global default (nil UUID).
- ainfera_api/main.py — router registered.
- tests/unit/test_admin_policy.py — 7 unit tests on the service-role
  gate + schema validation.
- tests/smoke/test_openapi_contract.py — contract snapshot extended.

Auth: service-role bearer ONLY. 4 failure modes:
  503 mis-config · 401 missing bearer · 403 wrong key ·
  403 ai_infera_<agent>_* tenant key explicit reject.

Discipline #12 invariant: tenant API keys NEVER pass this gate. Test
test_require_service_role_rejects_tenant_key_prefix asserts this.
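The four failure modes can be sketched as a pure function that returns the status code. The order of the checks, the `Bearer ` parsing, and the function name are assumptions; the `ai_infera_` prefix is taken from the failure-mode list above, and only the use of `hmac.compare_digest` is confirmed by the file list.

```python
import hmac
from typing import Optional

TENANT_KEY_PREFIX = "ai_infera_"  # prefix as quoted in the failure-mode list


def service_role_status(authorization: Optional[str], configured_key: Optional[str]) -> int:
    """Return the HTTP status the gate would produce (200 means allowed)."""
    if not configured_key:
        return 503  # server mis-configured: no service-role key set
    if not authorization or not authorization.startswith("Bearer "):
        return 401  # missing bearer token
    presented = authorization[len("Bearer "):]
    if presented.startswith(TENANT_KEY_PREFIX):
        return 403  # tenant keys are rejected explicitly, before any comparison
    # Constant-time comparison avoids leaking key contents through timing.
    if not hmac.compare_digest(presented.encode(), configured_key.encode()):
        return 403  # wrong key
    return 200
```

Rejecting the tenant-key prefix before comparing means a tenant key can never pass, even if the configured service-role key were mistakenly set to a tenant key.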

Validation:
- pytest tests/unit/test_admin_policy.py → 7 passed ✓
- pytest tests/smoke/test_openapi_contract.py → 4 passed ✓
- pre-commit (ruff + mypy --strict) → passed ✓

PR LABEL: do-not-merge-until-2026-06-01

Stacked on: hizrianraz/ain-295-w5-db-remediation

Refs: AIN-296 · AIN-298 · L14.2 · Discipline #12

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
hizrianraz added a commit that referenced this pull request May 28, 2026
…ACKED on W5) (#85)

* migration: AIN-295 W5 — Alembic 0029 DB remediation (NOT applied)

W5/9. Translates ainfera-os vault/migrations/ain-298-db-remediation.sql.md
into a proper Alembic migration. **DO NOT APPLY until Mon 2026-06-01**
(per L14 lock — Spark substrate is the higher priority Fri-Sun;
DB remediation lands Monday after migration stabilizes).

Revision chain:
  20260523_0027  rename_aa_index_source_aamc_to_routing_backend
  20260526_0028  ain290_judge_columns  (existing; AIN-290 judge schema)
  20260528_0029  ain298_db_remediation  (NEW; AIN-298 RLS + view + indexes)

Scope (vault draft sections 1-5):
- §1 v_judge_queue redefined WITH (security_invoker = true) — fixes ERROR
- §2 tenant_isolation_select policies (8 native + 6 agent-scoped) +
     tenant_self_read + user_self_read
- §3 public_catalog_read on providers/models/brands (active=true)
- §4 model_leaderboard REVOKE anon + GRANT service_role
- §5 10 unindexed-FK indexes via autocommit_block + CONCURRENTLY + IF NOT EXISTS

Excluded (manual/future): §6 tenant bloat audit · §7 Supabase HIBP toggle ·
§8 DROP deprecated table (2026-06-21+).

Defensive: smoke probe on routing_outcomes presence; DO blocks with
existence guards on every CREATE POLICY; downgrade() reverses all in
dependency order with DROP IF EXISTS.

Validation:
- AST parse clean ✓
- alembic upgrade --sql 0028:0029 → 9,870 bytes DDL ✓
- alembic downgrade --sql 0029:0028 → 3,580 bytes DDL ✓
- pre-commit (mypy --strict + pytest -x) → passed ✓

PR label: do-not-merge-until-2026-06-01

Refs: AIN-295 · AIN-298 · L14 (Mon DB window)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* feat(api): AIN-296 W6-B — atomic policy publish endpoint + ORM + Alembic 0030

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
hizrianraz added a commit that referenced this pull request May 28, 2026
… constraint + RLS init-plan

W1/9 SHIP-NOW. Kills AIN-300 orphan bug + 429 backoff/failover + new
CHECK constraint guards future regressions + clears 16 perf WARNs from
0029.

routing.py:
- _chat_with_429_retry helper (3 attempts, 0.5/2/8s, 429-only)
- dispatch_inference accepts optional inference_id kwarg
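A sketch of a 429-only retry loop with the quoted backoff schedule. It reads the note as one initial call plus up to three retries after waits of 0.5, 2, and 8 seconds; the real `_chat_with_429_retry` may count attempts differently. `UpstreamError` and the injectable `sleep` parameter are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable, Sequence


class UpstreamError(Exception):
    """Hypothetical error type carrying the upstream HTTP status."""

    def __init__(self, status: int) -> None:
        super().__init__(f"upstream returned {status}")
        self.status = status


async def call_with_429_retry(
    call: Callable[[], Awaitable[str]],
    *,
    delays: Sequence[float] = (0.5, 2.0, 8.0),
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> str:
    """Retry `call` on HTTP 429 only; any other error propagates immediately."""
    for delay in delays:
        try:
            return await call()
        except UpstreamError as exc:
            if exc.status != 429:
                raise
        await sleep(delay)
    # Final attempt: a 429 here propagates so the caller can fail over.
    return await call()


# Demo with a fake upstream that rate-limits twice, then succeeds.
attempts = {"n": 0}
slept = []


async def flaky() -> str:
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise UpstreamError(429)
    return "ok"


async def fake_sleep(delay: float) -> None:
    slept.append(delay)


result = asyncio.run(call_with_429_retry(flaky, sleep=fake_sleep))
```

Letting the last 429 propagate matches the routing_brain.py note that an exhausted in-adapter retry is treated like a 5xx and triggers failover.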

routing_brain.py:
- Pre-allocate candidate_inference_id per failover attempt
- Track last_inference_id; link in 4xx/5xx-exhausted terminal branches
- 429 (after in-adapter retry exhaust) → failover like 5xx
- Cap/Funds/Inactive use decision_rule_override='failed_pre_dispatch'

routing_outcomes.py:
- complete_decision gains decision_rule_override kwarg

alembic 0031: outcome_requires_inference CHECK constraint
alembic 0032: init-plan optimization + ENABLE RLS on _repair_ table

tests/unit/test_routing_429_retry.py: 6 tests, all pass

Validation:
- pre-commit (ruff + ruff format + mypy --strict + pytest -x): passed
- offline upgrade 0030→0032: 10,868 bytes
- offline downgrade 0032→0030: 9,833 bytes

Refs: AIN-300 · AIN-295 · AIN-298 · Disc #12 preserved on scoring/candidate-set

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
hizrianraz added a commit that referenced this pull request May 28, 2026
…init-plan (#87)

* feat(api): AIN-300 W1 — write-path atomic linkage + 429 retry + CHECK constraint + RLS init-plan

* fix(alembic 0031): allow outcome_status=NULL (mid-flight) in CHECK predicate

PG CHECK constraints don't support DEFERRABLE/DEFERRED (only FK/UNIQUE
/PK/EXCLUDE do). The two-phase write (insert_decision creates the row
with decision_rule='cheapest_clearing_floor' + inference_id=NULL,
complete_decision links inference_id after dispatch) has a transient
moment that the per-statement check would reject.

Predicate now allows outcome_status IS NULL as the third escape clause:

  CHECK (
    outcome_status IS NULL
    OR decision_rule <> 'cheapest_clearing_floor'
    OR inference_id IS NOT NULL
  )

Once complete_decision sets outcome_status (always non-NULL on every
terminal branch — succeeded/failed_other/failed_provider_error/rejected*),
the constraint REQUIRES either decision_rule rewritten via
decision_rule_override OR inference_id linked. Which IS the AIN-300 W1
invariant.
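
To make the predicate concrete, here is a small Python mirror of the CHECK expression. The function name is hypothetical; the three clauses follow the SQL above, with `None` standing in for NULL.

```python
from typing import Optional


def outcome_requires_inference_ok(
    outcome_status: Optional[str],
    decision_rule: Optional[str],
    inference_id: Optional[str],
) -> bool:
    """Return True when a row would satisfy the CHECK predicate."""
    return (
        outcome_status is None  # mid-flight row, not yet completed
        or decision_rule != "cheapest_clearing_floor"  # rule was overridden
        or inference_id is not None  # inference has been linked
    )


# Phase 1 (insert_decision): allowed because outcome_status is still NULL.
mid_flight = outcome_requires_inference_ok(None, "cheapest_clearing_floor", None)

# Terminal row with neither an override nor a link: rejected.
orphan = outcome_requires_inference_ok("succeeded", "cheapest_clearing_floor", None)
```

Under these definitions `mid_flight` is `True` and `orphan` is `False`, which is the orphan case the constraint is meant to block.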

Integration tests now pass (the failing tests were inserting via the
two-phase pattern and hitting the per-statement check).

Refs: AIN-300

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
hizrianraz added a commit that referenced this pull request May 28, 2026
…ll-switch (#89)

W2/9 SHIP-NOW. Brings in the gateway hardening from closed PR #77
(re-baselined). Wires through brain + inference router + deploys to
Railway on merge.

routers/health.py (NEW):
- GET /healthz: in-process liveness, no I/O (HEALTHCHECK target)
- GET /readyz: aggregate readiness (process_drain + db + audit + ks
  snapshot); 503 on any probe fail
- get_readiness_gate() flipped FALSE on SIGTERM for drain
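
A framework-free sketch of the liveness/readiness split described above. `ReadinessGate`, `healthz`, and `readyz` are hypothetical stand-ins (the real handlers live in a router and are likely async); the sketch only illustrates "no I/O for liveness" and "503 on any probe fail".

```python
from typing import Callable, Dict, Mapping, Tuple


class ReadinessGate:
    """Hypothetical drain flag: starts ready, flipped off on SIGTERM."""

    def __init__(self) -> None:
        self.ready = True

    def begin_drain(self) -> None:
        self.ready = False


def healthz() -> Tuple[int, Dict[str, str]]:
    # Liveness does no I/O, so it is cheap enough for a HEALTHCHECK.
    return 200, {"status": "ok"}


def readyz(
    probes: Mapping[str, Callable[[], bool]],
) -> Tuple[int, Dict[str, bool]]:
    results: Dict[str, bool] = {}
    for name, probe in probes.items():
        try:
            results[name] = bool(probe())
        except Exception:
            # A probe that raises counts as a failed probe.
            results[name] = False
    # Any single failing probe makes the whole service not ready.
    status = 200 if all(results.values()) else 503
    return status, results
```

Wiring the gate in as the `process_drain` probe means a SIGTERM handler only has to call `begin_drain()` for `/readyz` to start returning 503.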

services/cost_killswitch.py (NEW):
- guard_or_raise() called at dispatch_with_brain entry
- rolling-window spend (default today UTC) vs AINFERA_SPEND_KILLSWITCH_USD
- Default $50 + enabled; ops env-config without restart
- Pinned passthroughs bypass guard by design (moat-safe)
- Aggregate-only logging (no PII)
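
A simplified, synchronous sketch of the guard. The environment variable names and `CostKillswitchEngagedError` come from this commit message; taking the already-aggregated spend as an argument, treating "1" as the enabled value, and tripping at `>=` rather than `>` are assumptions (the real `guard_or_raise` is awaited with a `db` handle).

```python
import os
from decimal import Decimal
from typing import Mapping


class CostKillswitchEngagedError(Exception):
    def __init__(self, spent: Decimal, threshold: Decimal) -> None:
        super().__init__(f"spend {spent} reached threshold {threshold}")
        self.spent = spent
        self.threshold = threshold


def guard_or_raise(
    spent_usd: Decimal,
    env: Mapping[str, str] = os.environ,
) -> None:
    # Config is read at call time rather than cached at import time.
    if env.get("AINFERA_SPEND_KILLSWITCH_ENABLED", "1") != "1":
        return
    threshold = Decimal(env.get("AINFERA_SPEND_KILLSWITCH_USD", "50"))
    if spent_usd >= threshold:
        raise CostKillswitchEngagedError(spent_usd, threshold)
```

A caller at the HTTP layer would catch the error and translate it into a 503 that carries the `spent` and `threshold` values.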

routing_brain.py:
- await cost_killswitch.guard_or_raise(db) before brain runs
- Disc #12 preserved: scoring/candidate-set/weights untouched

inference.py:
- Catch CostKillswitchEngagedError → 503 with code + spent/threshold

main.py:
- Register health.router; rename inline /health → health_legacy

Tests:
- test_health_probes.py (4) + test_cost_killswitch.py (20) + openapi
  contract (4 — /healthz, /readyz documented as non-v1)
- All 28 pass

Founder config (set in Railway env on api):
  AINFERA_SPEND_KILLSWITCH_USD=<real_threshold>  # default $50
  AINFERA_SPEND_KILLSWITCH_ENABLED=1             # default

Refs: AIN-232 · AIN-234 · supersedes closed PR #77

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
hizrianraz added a commit that referenced this pull request May 28, 2026
…033) (#93)

Charter A2 / Disc #12-bounded migration. Two additive things:

1. CREATE TABLE public.training_runs — one row per L14.2 daily training
   tick. Captures judge outcomes, policy_version_from→to, promotion
   verdict, per-cell deltas, replay-gate result, and ruleset_hash.

2. CREATE ROLE ainfera_labs LOGIN (no password set here; founder sets
   PASSWORD via Doppler-injected ALTER ROLE). Least-priv grants:
   - INSERT on training_runs (+ sequence USAGE)
   - SELECT on routing_outcomes, inferences, models, providers, agents
   - column-level UPDATE on routing_outcomes (judge_score, judge_model,
     judge_rationale, judge_labeled_at, judge_status, reward) — AIN-290
     columns only
   - column-level UPDATE on tenant_routing_policies (active_policy,
     active_policy_version) — AIN-296 columns only
   - REVOKE DELETE on every table

Verified via `alembic upgrade 20260528_0032:20260528_0033 --sql`:
DDL renders cleanly; `alembic heads` shows `20260528_0033 (head)`.

Disc #12 still binds: no edits to scoring, candidate-set, settlement,
auth, key prefix, or hard-delete rules.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
hizrianraz added a commit that referenced this pull request May 28, 2026
…mint script (#96)

* feat(api): AIN-291 W1 · additive training_runs + ainfera_labs role (0033)

* feat(api): AIN-289 B2 · rotation grace migration + auth-additive

Charter v2 B2 fix for the run-1 single-column finding on tenants.api_key_hash.

* alembic 0034: ADD COLUMN api_key_hash_pending TEXT NULL + partial
  unique index. Additive only.
* ORM: TenantORM gets api_key_hash_pending field.
* Auth-additive: deps.py / middleware / ownership.py match EITHER
  api_key_hash OR api_key_hash_pending. No-op when pending is NULL.
* scripts/rotate_key_grace_ain289.py: mint + 1P store + set pending +
  verify NEW=200 + promote + verify again. --fallback-cutover preserves
  the run-1 single-UPDATE path. Auto-detects missing column. Never
  prints raw secrets.
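
The "match EITHER hash" rule during a rotation grace window can be illustrated with an in-memory check. `hash_key` and `key_matches` are hypothetical, and SHA-256 is an assumption since the source does not name the hash; the real code presumably matches in the database query.

```python
import hashlib
import hmac
from typing import Optional


def hash_key(raw_key: str) -> str:
    # Assumption: SHA-256 hex digest; the actual scheme is not stated.
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()


def key_matches(
    raw_key: str,
    api_key_hash: str,
    api_key_hash_pending: Optional[str],
) -> bool:
    presented = hash_key(raw_key)
    # Constant-time comparison against the active hash first.
    if hmac.compare_digest(presented, api_key_hash):
        return True
    # No rotation in progress: behaves exactly as before the migration.
    if api_key_hash_pending is None:
        return False
    return hmac.compare_digest(presented, api_key_hash_pending)
```

While `api_key_hash_pending` is set, both the old and the new key authenticate; promoting the pending hash and clearing the column ends the grace window.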

625/625 tests green. mypy --strict clean. Disc #12 untouched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
hizrianraz added a commit that referenced this pull request May 29, 2026
0033 granted ainfera_labs column-level UPDATE on judge cols + SELECT on the
read surface, but RLS is enabled with policies scoped only to `authenticated`.
ainfera_labs isn't `authenticated` and doesn't bypass RLS, so RLS silently
denied it every row — the grants were inert.

Add per-role RLS policies for ainfera_labs:
- routing_outcomes: SELECT (all rows) + UPDATE (judge labeling; the 0033
  column GRANT still limits WHICH columns can change)
- inferences/agents/models/providers: SELECT (all rows)
- training_runs: ENABLE RLS (was disabled → advisor ERROR) + labs INSERT/SELECT

Tenant isolation for `authenticated` is unchanged. Additive; Disc #12 intact.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
hizrianraz added a commit that referenced this pull request May 29, 2026
…l (0036) (#99)

Adds a `source` discriminator (prod|synthetic|shadow, NOT NULL default 'prod',
CHECK + index) so the synthetic cold-start loop's rows can never feed a prod
routing-policy promotion — prod refits filter source='prod'. Existing 147 real
rows backfill to 'prod'. Additive; Disc #12 intact.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>