Fail loudly on missing throughput budgets #1063
Merged
This was referenced May 8, 2026
joelteply pushed a commit that referenced this pull request on May 11, 2026
Codifies the fairness bar that the Mac and Windows smoke runs surfaced after #1057-1060: the storm IS fixed (CPU stays flat), BUT first-claim-wins coordination is too sticky (only 1 of N personas replies). This test makes that failure mode explicit so the eventual fix has an executable green-vs-red signal.

Five typed loud-fail buckets, following the #1063 / #1067 pattern:
- probe_not_persisted: chat/send returned ok, but the DB dropped the message
- no_personas_replied: total silence (storm-fix overcorrection)
- first_response_budget_exceeded: the first reply exceeded the 10s budget from #1062
- all_response_budget_exceeded: the full reply set exceeded the 30s budget from #1062
- fairness_violated: only K of N personas replied, where K < min

Standing-rule alignment (#1070 / #1072):
- Single attempt, no retry on failure
- Loud fail with a typed bucket: the operator greps the result instead of digging through logs
- No silent fallback: the test reports what the user-facing surface actually shows

Uses the ./jtag CLI via execFile to stay decoupled from drift in the in-process JTAGClient TS surface; this matches the chat-probe pattern operators already use.
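The five buckets can be pictured as a pure classifier over a single probe run. The following is a minimal TypeScript sketch, not the test's actual code: the observation shape, the names `classifyProbe` and `bucketOf`, the order of the checks, and the `minRepliers` value are all assumptions. Only the bucket names and the 10s/30s budgets come from the commit message, and collecting the latencies through the ./jtag CLI is deliberately left out.

```typescript
// Sketch only: ProbeObservation, ProbeBudgets, classifyProbe and bucketOf are
// hypothetical names, not the real test's API.

type FailureBucket =
  | "probe_not_persisted"
  | "no_personas_replied"
  | "first_response_budget_exceeded"
  | "all_response_budget_exceeded"
  | "fairness_violated";

interface ProbeObservation {
  persisted: boolean;         // did the probe message actually land in the DB?
  replyLatenciesMs: number[]; // one entry per persona reply, ms since send
  personaCount: number;       // N: personas present in the room
}

interface ProbeBudgets {
  firstResponseMs: number; // 10s first-reply budget per #1062
  allResponsesMs: number;  // 30s full-reply-set budget per #1062
  minRepliers: number;     // minimum K; the concrete value is an assumption
}

type ProbeResult =
  | { ok: true }
  | { ok: false; bucket: FailureBucket; detail: string };

function fail(bucket: FailureBucket, detail: string): ProbeResult {
  return { ok: false, bucket, detail };
}

// Single pass over one observation: no retries, the first failing check wins.
function classifyProbe(obs: ProbeObservation, budgets: ProbeBudgets): ProbeResult {
  if (!obs.persisted) {
    return fail("probe_not_persisted", "chat/send returned ok but the message is not in the DB");
  }
  const replies = obs.replyLatenciesMs.length;
  if (replies === 0) {
    return fail("no_personas_replied", `0 of ${obs.personaCount} personas replied`);
  }
  const first = Math.min(...obs.replyLatenciesMs);
  if (first > budgets.firstResponseMs) {
    return fail("first_response_budget_exceeded", `first reply after ${first}ms`);
  }
  const last = Math.max(...obs.replyLatenciesMs);
  if (last > budgets.allResponsesMs) {
    return fail("all_response_budget_exceeded", `last reply after ${last}ms`);
  }
  if (replies < budgets.minRepliers) {
    return fail(
      "fairness_violated",
      `only ${replies} of ${obs.personaCount} replied (min ${budgets.minRepliers})`,
    );
  }
  return { ok: true };
}

// Collapses a result to one greppable token: "ok" or the bucket name.
function bucketOf(result: ProbeResult): string {
  return result.ok ? "ok" : result.bucket;
}

const budgets: ProbeBudgets = { firstResponseMs: 10_000, allResponsesMs: 30_000, minRepliers: 2 };
// The "too sticky" case: one persona answers quickly and the other three stay silent.
console.log(bucketOf(classifyProbe({ persisted: true, replyLatenciesMs: [4_200], personaCount: 4 }, budgets)));
```

Keeping the classifier free of I/O means the single-attempt rule lives entirely in the caller. The caller runs the probe once and reports whichever bucket comes back, with no retry loop.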
joelteply added a commit that referenced this pull request on May 11, 2026
* test(sensory): add Position 2 alpha-contract WebRTC sensory smoke

Per the #1072 sensory persona alpha contract, codifies the live sensory loop a STANDARD PERSONA must satisfy. Resolves a multimodal model via cognition/resolve-model (Position 1 dependency), spawns LiveKitAgent, publishes a test audio question plus a known image as a video frame, and asserts that the persona's TTS response and its transcription mention the image content.

Six typed loud-fail buckets, following the #1063 / #1067 pattern: no_qualified_model, persona_failed_to_join, no_audio_published, no_transcription, vision_blind, budget_exceeded.

The test fails loudly today and passes once Position 1 (resolver + RequirementProfile::StandardPersona IPC) and Position 3 (Qwen multimodal GPU kernels) land. The bar is the test, not the implementation. No silent CPU fallback, no degraded text-only pass, no retry on failure (per the #1070 / #1072 standing rules).

* test(persona): multi-persona response timing regression smoke

---------

Co-authored-by: Test <test@test.com>
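The six sensory buckets follow the same shape. Below is a minimal TypeScript sketch under stated assumptions: `SensoryObservation`, `classifySensoryRun`, the check order, and the `budgetMs` value are hypothetical. "Mentions the image content" is approximated here by a case-insensitive keyword match, which is a stand-in and not necessarily how the real test judges vision.

```typescript
// Sketch only: the observation fields and the keyword-match vision check are
// assumptions for illustration, not the real smoke test's implementation.

type SensoryBucket =
  | "no_qualified_model"
  | "persona_failed_to_join"
  | "no_audio_published"
  | "no_transcription"
  | "vision_blind"
  | "budget_exceeded";

interface SensoryObservation {
  resolvedModel: string | null; // result of cognition/resolve-model; null if nothing qualified
  joined: boolean;              // did the persona's LiveKitAgent join the room?
  audioPublished: boolean;      // did the persona publish TTS audio?
  transcript: string | null;    // transcription of the persona's spoken reply
  elapsedMs: number;            // wall time from question to transcribed answer
}

interface SensoryExpectation {
  imageTerms: string[]; // words that show the persona saw the known image (assumed)
  budgetMs: number;     // end-to-end budget; the value is an assumption
}

type SensoryResult = { ok: true } | { ok: false; bucket: SensoryBucket };

// Checks run once, in the order the buckets are listed; no CPU or text-only fallback.
function classifySensoryRun(obs: SensoryObservation, exp: SensoryExpectation): SensoryResult {
  const failWith = (bucket: SensoryBucket): SensoryResult => ({ ok: false, bucket });
  if (obs.resolvedModel === null) return failWith("no_qualified_model");
  if (!obs.joined) return failWith("persona_failed_to_join");
  if (!obs.audioPublished) return failWith("no_audio_published");
  const transcript = obs.transcript;
  if (transcript === null || transcript.trim() === "") return failWith("no_transcription");
  const spoken = transcript.toLowerCase();
  const sawImage = exp.imageTerms.some((term) => spoken.includes(term.toLowerCase()));
  if (!sawImage) return failWith("vision_blind");
  if (obs.elapsedMs > exp.budgetMs) return failWith("budget_exceeded");
  return { ok: true };
}

function sensoryBucketOf(result: SensoryResult): string {
  return result.ok ? "ok" : result.bucket;
}

const expectation: SensoryExpectation = { imageTerms: ["bicycle"], budgetMs: 45_000 };
// A run in which no multimodal model qualifies fails loudly at the first check.
console.log(sensoryBucketOf(classifySensoryRun(
  { resolvedModel: null, joined: false, audioPublished: false, transcript: null, elapsedMs: 0 },
  expectation,
)));
```

Because the checks short-circuit in order, a run that never resolves a model reports `no_qualified_model` rather than a misleading downstream bucket such as `vision_blind`.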
Summary
Validation
Notes
Native-arch Docker push was skipped by the existing post-hook dirty-state guard; CI/Windows RTX image validation should cover the arch slice before merge.