fix(api): AIN-174 stop-gap · stream=true returns 501 (not silent JSON billing burn) #43
Eliminates the silent-failure billing burn pattern from INC-2026-05-18-004 Bug 5.

Stop-gap:
- Adds `stream: bool = False` to `InferenceRequest` (was silently dropped)
- Returns 501 with `code=streaming_not_supported` when `stream=true`
- Eliminates the silent retry pattern: clients fail fast

Per v6.2 Discipline #15 + When-stuck #21. Full SSE = AIN-174 Phase B.

Co-Authored-By: Claude <noreply@anthropic.com>
AIN-174 🔴 BUG 5: /v1/inference no SSE streaming — silent retries burn $0.32+/day per agent (hermes/Claude SDK/LangGraph/Letta all default stream=true)
Severity: URGENT 🔴 (billing burn + silent failures)

Filed from Manwe (hermes-agent v0.14.0) production dogfood 2026-05-18. Consumer sees zero output despite paying for 3 silent retries.

Symptom

Hermes-agent (and most modern OpenAI-compat clients) send
Cost impact during the dogfood loop today: ~$0.054 burned per user-message (3 × $0.018 silent retries) before the proxy added SSE wrapping.

Reproducer

curl -X POST https://api.ainfera.ai/v1/inference \
-H "Authorization: Bearer $KEY" \
-d '{
"model": "claude-opus-4-7",
"messages": [{"role": "user", "content": "say hello"}],
"stream": true
}'
# Current behavior:
# HTTP/2 200
# content-type: application/json
#
# {"id":"...","content":"hello","usage":{...}}
#
# Expected (OpenAI-compat clients):
# HTTP/2 200
# content-type: text/event-stream
#
# data: {"id":"...","choices":[{"delta":{"content":"hello"}}]}
#
# data: [DONE]

The
Cross-framework impact
Net: 5 of 6 fleet agents at risk: all of Aratar except Tulkas. Confirmed burn on Manwe; pending validation on others (Aule and Varda may have been hitting this all along but absorbing it as "framework noise").

Fix recommendation

Recommended: Native SSE on
Summary
Eliminates the silent-failure billing burn pattern from INC-2026-05-18-004 Bug 5.
Pattern: SSE-expecting client sends `stream=true` → Ainfera silently dropped the field (Pydantic v2 default `extra='ignore'`) → returned single-shot JSON → client treated JSON as empty → retried 3x → each retry billed → ~$0.32+/day silent burn per agent.

Stop-gap fix:
- Adds `stream: bool = False` to `InferenceRequest` (was silently dropped before)
- Returns `501 Not Implemented` with `code=streaming_not_supported` when `stream=true`, including remediation hint to set `stream=false`

Per v6.2 Discipline #15 (surface normalization Ainfera-side, never customer-side) + When-stuck #21 (silent-failure billing path requires explicit handling).
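
To illustrate why a single-shot JSON body reads as "empty" to an SSE-expecting client, here is a rough, framework-free sketch. The parser `extract_sse_text` is hypothetical and much simpler than what hermes-agent or any real SDK does; the two sample bodies mirror the shapes in the AIN-174 reproducer, and the per-retry cost is the figure quoted in the issue.

```python
import json


def extract_sse_text(body: str) -> str:
    """Concatenate delta content from an OpenAI-style SSE body (simplified, hypothetical)."""
    parts = []
    for line in body.splitlines():
        if not line.startswith("data:"):
            # A bare JSON document has no "data:" lines, so it is skipped entirely.
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            parts.append(choice.get("delta", {}).get("content") or "")
    return "".join(parts)


# Body shapes taken from the reproducer in AIN-174.
sse_body = 'data: {"id":"x","choices":[{"delta":{"content":"hello"}}]}\n\ndata: [DONE]\n'
json_body = '{"id":"x","content":"hello","usage":{}}'

streamed_text = extract_sse_text(sse_body)   # the client sees output
silent_text = extract_sse_text(json_body)    # the client sees nothing and retries

# Cost of one user message under the old behavior, using the issue's numbers.
RETRIES = 3
COST_PER_CALL_USD = 0.018
burn_per_message = round(RETRIES * COST_PER_CALL_USD, 3)
```

Under these assumptions the JSON response is valid and billed, yet the client extracts no text from it, which is what makes the failure silent.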
Full SSE streaming = AIN-174 Phase B (multi-day, separate sprint). This PR is the stop-gap that closes the billing-burn vector without waiting for full SSE.
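
A minimal sketch of the stop-gap decision logic, assuming nothing about the real handler beyond what this description states. It uses a stdlib dataclass as a stand-in for the Pydantic `InferenceRequest` model; the function name `handle_inference`, the error body layout, the message wording, and the stub success response are all hypothetical. Only the `stream: bool = False` field, the 501 status, and `code=streaming_not_supported` come from this PR.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class InferenceRequest:
    """Stand-in for the real Pydantic model; only `stream` is taken from the PR."""
    model: str
    messages: list[dict[str, Any]]
    stream: bool = False  # previously absent, so the field was silently dropped


def handle_inference(payload: dict[str, Any]) -> tuple[int, dict[str, Any]]:
    """Return (status_code, body). Hypothetical handler shape, not the actual route."""
    request = InferenceRequest(
        model=payload["model"],
        messages=payload["messages"],
        stream=bool(payload.get("stream", False)),
    )
    if request.stream:
        # Fail fast before any upstream call, so nothing is billed.
        return 501, {
            "error": {
                "code": "streaming_not_supported",
                # Assumed wording for the remediation hint.
                "message": "Streaming is not supported yet; retry with stream=false.",
            }
        }
    # Stub for the existing single-shot JSON path.
    return 200, {"id": "stub", "content": "", "usage": {}}


base = {"model": "claude-opus-4-7", "messages": [{"role": "user", "content": "say hello"}]}
status_stream, body_stream = handle_inference({**base, "stream": True})
status_plain, body_plain = handle_inference(base)
```

The point of the design is that the rejection happens before any upstream inference call, so a client that defaults to `stream=true` gets an explicit, unbilled error instead of a billed response it cannot read.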
Test plan
- `test_stream_true_returns_501_not_silent_json` verifies 501 + remediation hint
- `curl -d '{"stream":true,...}' /v1/inference` returns 501 with `code=streaming_not_supported`

🤖 Generated with Claude Code