fix(api): AIN-173 · soften code-capability inference + empty-route diagnostics by hizrianraz · Pull Request #32 · ainfera-ai/api

hizrianraz · 2026-05-18T13:51:09Z

Summary

Role-weighted code detection: code capability only inferred from user/assistant content, not system role. Addresses Manwe's 2026-05-18 over-rejection (2KB SOUL → 400 no_models_match_constraints even though user just said "hi").
Empty-route diagnostic logging: structured WARNING with per-filter elimination counts so incident triage can read off "capability=8, quality=2, cost=2" without per-call debug logging.

Hypothesis C fix only (Hypothesis A/B deferred)

Per ticket §"Recommended: Combined fix (defense in depth)" — this PR ships #3 (soften capability inference) only.

Why not A (catalog audit): Per Memory #5 verify-the-plan, verifying code capability flags on top 5 frontier models against provider docs needs founder lock. SQL UPDATE without that lock is too risky. Filing as founder-action recommended for next session.

Why not B (cost recalibration): Needs 30-day rolling actual-cost data which lives in AIN-154 Phase E (ATS scoring) pipeline. Deferred to that phase.

Why not #4 (downgrade header): Changing 400 → 200 + x-ainfera-capability-downgraded is a behavioral contract change deserving wider review.

Tests added (4 regressions)

test_code_not_detected_when_only_in_system_prompt — exact Manwe reproducer pattern
test_code_detected_when_user_turn_has_code_even_with_neutral_system — legitimate code request guard
test_code_detected_when_assistant_turn_has_code — multi-turn (only system is descriptive-only)
test_long_context_still_role_agnostic — scope guard (fix is code only)

Test plan

4 new assertions pass
Pre-commit hooks (ruff / mypy --strict / pytest -x) green
CI green
Manwe replay: same 2KB SOUL system + "hi" user → 200 (post-merge)
Tulkas probe (AIN-178): 10 capability-trigger prompts → 0 unexpected 400s

Refs

AIN-173 (parent · BUG 1 from Manwe production dogfood)
AIN-154 (router hardening epic)
AIN-178 (Tulkas activation — verification path)

🤖 Generated with Claude Code

…e diagnostics Per Manwe v0.14.0 production dogfood 2026-05-18 BUG 1. Hypothesis C fix only — addresses over-aggressive inference where descriptive system prompts triggered the `code` capability requirement that filtered out all candidate models. ## Changes ### 1. Role-weighted code detection `detect_capabilities` now extracts text into (system_text, other_text) buckets and only triggers `code` when the syntax markers appear in user/assistant content. System-only mentions of code keywords are treated as descriptive context, not capability requests. Rationale: SOUL system prompts (Manwe's 2KB pattern) often describe "this agent can review code / write scripts / etc." with example snippets. The user's actual turn might be "hi" — no need to filter to code-capable models for a greeting. `_CODE_SYNTAX_MARKERS` unchanged; only matching substrate moves from "full_text" to "other_text". ### 2. Empty-route diagnostic logging `auto_route` now emits a structured WARNING with per-filter elimination counts when returning empty: auto_route_empty agent=<id> required_caps=[code, text] quality_floor=good per_call_cap=2.00 rows_inspected=12 dropped_by={capability=8, quality=2, cost=2} Manwe's bug surfaced as a generic 400 with no per-filter breakdown — this fixes the visibility gap. ## Deferred with explicit notes - Hypothesis A catalog audit (model capability flag verification against provider docs) — founder lock required per Memory #5 - Hypothesis B cost-estimate recalibration — needs 30d rolling data via AIN-154 Phase E - Recommendation #4 downgrade header (200 + x-ainfera-capability- downgraded) — behavioral contract change deserves wider review ## Tests added 4 regression cases in tests/unit/test_t9_auto_routing.py: - code NOT inferred from system-only mention (Manwe pattern) - code IS inferred when user turn has code (regression guard) - code IS inferred when assistant turn has code (multi-turn) - long_context still role-agnostic (scope guard) Pre-commit hooks green. ## Refs - AIN-173 (parent · BUG 1) - AIN-154 (router hardening epic) - AIN-178 (Tulkas activation — verify 10 capability-trigger prompts) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

cursor · 2026-05-18T13:51:13Z

You have used all Bugbot PR reviews included in your free trial for your GitHub account on this workspace.

To continue using Bugbot reviews, enable Bugbot for your team in the Cursor dashboard.

linear-code · 2026-05-18T13:51:13Z

AIN-173 🔴 BUG 1: /v1/inference no_models_match_constraints over-rejects code-capable requests (hard-fails turn)

Severity: URGENT 🔴

Filed from Manwe (hermes-agent v0.14.0) production dogfood 2026-05-18. Customer #1 evidence: hard-fails agent turns, framework retries 3× then aborts with empty response. Opaque error from consumer perspective.

Symptom

POST /v1/inference returns HTTP 400 with:

{
  "detail": {
    "code": "no_models_match_constraints",
    "message": "no active model satisfies the requested capabilities, quality_floor, and per_call_cap_usd",
    "context": {
      "capabilities_required": ["code", "text"],
      "quality_floor": "good",
      "per_call_cap_usd": "2"
    }
  }
}

Reproducer

# Yesterday's request worked
curl -X POST https://api.ainfera.ai/v1/inference \
  -H "Authorization: Bearer $MANWE_KEY" \
  -d '{
    "model": "ainfera-auto",
    "messages": [
      {"role": "system", "content": "<512-byte SOUL without code keywords>"},
      {"role": "user", "content": "hi"}
    ]
  }'
# → 200 OK with claude-opus-4-7

# Today's request fails
curl -X POST https://api.ainfera.ai/v1/inference \
  -H "Authorization: Bearer $MANWE_KEY" \
  -d '{
    "model": "ainfera-auto",
    "messages": [
      {"role": "system", "content": "<2KB SOUL mentioning coding/system admin/code execution>"},
      {"role": "user", "content": "hi"}
    ]
  }'
# → 400 no_models_match_constraints

Root cause hypothesis

Auto-router infers capabilities_required=["code", "text"] from prompt keywords. Catalog state changed when AIN-141 migration ran (2026-05-18 14:55-14:58 UTC). One of:

A. claude-opus-4-7 registration in models table is missing the code capability flag (catalog drift from AIN-165 case study — same pattern: SKILL.md vs daemon config drift, this time model_catalog vs router_capability_inference)

B. claude-opus-4-7 per-call cost estimate exceeds the $2 per_call_cap_usd. Measured per-call cost ~$0.035, but estimate may overshoot. Catalog cost_per_call_estimate_usd column may be miscalibrated.

C. Routing layer's capability inference is too aggressive — promotes "coding" → code capability when the prompt is descriptive ("my agent is good at coding") rather than directive ("write me code"). Should distinguish discussion-of-code from request-for-code.

Cross-framework impact

Agent	Hit probability	Reason
Manwe (hermes)	🔴 HITS NOW	Confirmed via this bug report
Varda (Claude SDK + OpenClaw)	🔴 HITS	System prompts mention code review, PR triage, audit — all keyword triggers
Aule (Claude SDK + Opus)	🔴 HITS	`git_pr_create`, `bash_execute_sandboxed` tools — entire purpose is code
Yavanna (LangGraph)	🟡 LOW-MED	Tweet drafts mention competitors who do "code" (LangChain, etc.) → may trigger
Namo (Letta + Gemini)	🟡 LOW	Research synthesis may discuss code-related papers → keyword trigger
Tulkas (Garak + Mistral)	🟡 LOW	Single-prompt probes; system prompts mention "drain proof" not code

Net: 3 of 6 fleet agents at production risk. Manwe confirmed broken. Varda + Aule pending validation.

Fix recommendation (pick one OR combine)

Recommended: Combined fix (defense in depth)

Audit models catalog — verify every model has correct capability flags:
```
SELECT provider, slug, capabilities, cost_per_call_estimate_usd, active 
FROM models 
WHERE active = true 
ORDER BY provider, slug;
```
Cross-reference with provider API docs. Fix any code-capable models marked otherwise. PRIORITY: claude-opus-4-7, claude-sonnet-4-6, gpt-5-5, gpt-5-5-mini, gemini-3-1-pro, grok-4.
Audit cost_per_call_estimate_usd — measured values vs catalog estimates. If overshoot >2×, recalibrate from last-30-days p95 actual cost.
Soften capability inference — distinguish:
- "code" capability inferred from EXPLICIT directives ("write code", "implement", "fix this bug", code blocks in user message) → strict
- "code" capability inferred from DESCRIPTIVE prompts ("my agent does code review", "I work on a codebase") → soft (informational, doesn't gate routing)
Header/extension fallback hint — when constraint over-rejects but text-only routing would succeed, return 200 with x-ainfera-capability-downgraded: true header instead of 400. Consumer can opt out via X-Strict-Capabilities: true request header.

Acceptance gates

Migration: catalog audit + fix for top 5 frontier models (claude-opus-4-7 first)
Test: replay Manwe's failing request → 200 OK
Test: code-explicit user message → still routes to code-capable model
Test: descriptive system prompt with "code" keyword + text-only user message → 200 OK (downgrade header set)
Tulkas probe (AIN-178): replay 10 capability-triggering prompts post-fix → 0 unexpected 400s
Audit event added: inference.capability_downgraded when fallback fires
Per When-stuck fix(api): AIN-141 strip retired AAMC model literals (code-side only; DB migration deferred) #19: STAGING canary before prod
PR opens off fix/ain-173-code-capability-rejection branch (Aule author override)

Connection to existing tickets

AIN-154 (router hardening) — this is a router-layer bug, parents here
AIN-165 (SKILL.md vs daemon config drift) — same Discipline fix(orm): use postgresql.ARRAY so capabilities.contains() emits @> #11 corollary pattern: catalog vs router_inference drift
AIN-141 (catalog migration that triggered state change) — root cause analysis lesson
AIN-171 (auto-route asymmetric exception coverage) — same auto-route path; verify Choice B exception coverage handles no_models_match_constraints correctly

Workaround (Manwe already using)

Pin model via consumer env var: MANWE_AINFERA_DEFAULT_MODEL=claude-opus-4-7 bypasses auto-router entirely. This is the AIN-172 SPOF mitigation pattern — proves the env-override pattern is load-bearing for incident response.

Founder authorization

Per "Fix this error for and check from all frameworks. Tulkas need to start working now." (2026-05-18 PM)

Review in Linear

hizrianraz merged commit 6809731 into main May 18, 2026
3 checks passed

hizrianraz deleted the fix/ain-173-soft-capability-inference branch May 18, 2026 14:52

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(api): AIN-173 · soften code-capability inference + empty-route diagnostics#32

fix(api): AIN-173 · soften code-capability inference + empty-route diagnostics#32
hizrianraz merged 1 commit into
mainfrom
fix/ain-173-soft-capability-inference

hizrianraz commented May 18, 2026

Uh oh!

cursor Bot commented May 18, 2026

Uh oh!

linear-code Bot commented May 18, 2026 •

edited

Loading

Severity: URGENT 🔴

Symptom

Reproducer

Root cause hypothesis

Cross-framework impact

Fix recommendation (pick one OR combine)

Recommended: Combined fix (defense in depth)

Acceptance gates

Connection to existing tickets

Workaround (Manwe already using)

Founder authorization

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

hizrianraz commented May 18, 2026

Summary

Hypothesis C fix only (Hypothesis A/B deferred)

Tests added (4 regressions)

Test plan

Refs

Uh oh!

cursor Bot commented May 18, 2026

Uh oh!

linear-code Bot commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Severity: URGENT 🔴

Symptom

Reproducer

Root cause hypothesis

Cross-framework impact

Fix recommendation (pick one OR combine)

Recommended: Combined fix (defense in depth)

Acceptance gates

Connection to existing tickets

Workaround (Manwe already using)

Founder authorization

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

linear-code Bot commented May 18, 2026 •

edited

Loading