Add live API drift detection + fix missing refusal field #33
Merged
19a1825 to 169156f
OpenAI now returns a `refusal` field (null for non-refusal responses) on all Chat Completions messages. Both the SDK types and real API include it, but llmock was omitting it — causing shape mismatches for consumers that validate response structure.
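To illustrate why the missing key matters, here is a minimal sketch of a strict structural check. The `AssistantMessage` type is a trimmed, hypothetical stand-in for the real SDK type, and `hasExpectedKeys` is an illustrative helper, not part of llmock or the OpenAI SDK.

```typescript
// Hypothetical, trimmed shape of an assistant message (not the full SDK type).
type AssistantMessage = {
  role: "assistant";
  content: string | null;
  refusal: string | null; // null when the model did not refuse
};

// Illustrative strict check: a key must be present even if its value is null.
function hasExpectedKeys(message: object): boolean {
  const required = ["role", "content", "refusal"];
  return required.every((key) => key in message);
}

// Message as llmock produced it before the fix (assumed shape).
const beforeFix = { role: "assistant", content: "Hello" };
// Message after the fix: the key exists, with a null value.
const afterFix: AssistantMessage = { role: "assistant", content: "Hello", refusal: null };

console.log(hasExpectedKeys(beforeFix)); // false
console.log(hasExpectedKeys(afterFix)); // true
```

A consumer that only reads `content` would not notice the difference; one that validates keys would reject the pre-fix message.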
169156f to 7a626fc
Three-layer triangulation between SDK types, real API responses, and llmock output to detect response shape drift across OpenAI (Chat + Responses), Anthropic Claude, and Google Gemini.

- schema.ts: shape extraction, three-way comparison, severity classification
- sdk-shapes.ts: expected shapes from SDK types
- providers.ts: raw fetch clients, SSE parsing, model listing
- helpers.ts: shared test fixtures and server lifecycle
- 4 provider drift test files (16 tests) + model deprecation checks (3 tests)
- vitest.config.drift.ts: separate config with 30s timeout
- Weekly CI workflow (.github/workflows/test-drift.yml)
- DRIFT.md: full documentation
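For readers unfamiliar with the approach, the three-way comparison could be sketched roughly as follows. This is not the code in schema.ts: the names `shapePaths` and `compareShapes`, the `warning` level, the severity rules, and the `new_field` key are all assumptions made for illustration. The sketch also compares only which paths are present, not their types.

```typescript
type Severity = "critical" | "warning" | "info";
interface Finding { path: string; severity: Severity; note: string; }

// Flatten a JSON value into a map of dotted path -> type name.
function shapePaths(value: unknown, prefix = "", out: Record<string, string> = {}): Record<string, string> {
  if (value === null) {
    out[prefix] = "null";
  } else if (Array.isArray(value)) {
    out[prefix] = "array";
    if (value.length > 0) shapePaths(value[0], prefix + "[]", out); // sample the first element only
  } else if (typeof value === "object") {
    if (prefix !== "") out[prefix] = "object";
    const record = value as Record<string, unknown>;
    for (const key of Object.keys(record)) {
      shapePaths(record[key], prefix === "" ? key : prefix + "." + key, out);
    }
  } else {
    out[prefix] = typeof value;
  }
  return out;
}

const has = (paths: Record<string, string>, path: string): boolean =>
  Object.prototype.hasOwnProperty.call(paths, path);

// Assumed severity rules: a path the SDK and the real API agree on but the mock
// lacks is critical; a path only the real API has is a warning; a mock-only path is info.
function compareShapes(sdk: unknown, real: unknown, mock: unknown): Finding[] {
  const sdkPaths = shapePaths(sdk);
  const realPaths = shapePaths(real);
  const mockPaths = shapePaths(mock);
  const findings: Finding[] = [];
  for (const path of Object.keys(realPaths)) {
    if (has(mockPaths, path)) continue;
    if (has(sdkPaths, path)) {
      findings.push({ path, severity: "critical", note: "in SDK and real API, missing from mock" });
    } else {
      findings.push({ path, severity: "warning", note: "in real API only" });
    }
  }
  for (const path of Object.keys(mockPaths)) {
    if (!has(realPaths, path)) findings.push({ path, severity: "info", note: "mock-only field" });
  }
  return findings;
}

const findings = compareShapes(
  { message: { role: "assistant", content: "text", refusal: null } },                   // SDK-derived shape
  { message: { role: "assistant", content: "Hi", refusal: null }, new_field: "x" },     // real API response
  { message: { role: "assistant", content: "Hi" } },                                    // mock output
);
for (const f of findings) console.log(f.severity + ": " + f.path);
// critical: message.refusal
// warning: new_field
```

Under these assumed rules, the missing `refusal` field is exactly the kind of mismatch that would be flagged as critical.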
7a626fc to 7a961f8
jpr5 added a commit that referenced this pull request Apr 3, 2026
## Summary

- **Bug fix**: OpenAI Chat Completions responses now include `refusal: null`, a field both the SDK and real API return that llmock was omitting. Conformance and unit tests updated to assert the field.
- **New feature**: Three-layer drift detection test suite that triangulates between SDK types, real API responses, and llmock output to catch response shape drift across all 4 providers (OpenAI Chat, OpenAI Responses, Anthropic Claude, Google Gemini)
- **CI**: Weekly GitHub Actions workflow for automated drift checks + manual trigger

## Details

19 drift tests across 5 files:

- 16 shape comparison tests (4 per provider × 4 scenarios: non-streaming text/tool, streaming text/tool)
- 3 model deprecation checks (one per provider)

Key robustness features:

- All provider functions fail fast on non-2xx responses with status code + body in the error message
- All streaming tests assert events were actually received (no silent pass on zero events)
- SSE parsers handle `\r\n` line endings and continuation lines (Gemini sends wrapped JSON)
- Retry with exponential backoff on 429/500/502/503
- `ping` and other transport-level SSE events classified as `info`, not `critical`
- Known intentional differences (usage fields, system_fingerprint, etc.) allowlisted

The refusal bug was discovered by running the drift tests against real APIs, which is exactly the value prop.

See [DRIFT.md](DRIFT.md) for full documentation.

## Test plan

- [x] `pnpm test`: 540/540 existing tests pass (including new refusal assertions)
- [x] `pnpm test:drift` with all 3 API keys: 19/19 pass
- [x] `pnpm test:drift` without keys: 19 tests skip gracefully
- [x] Prettier + ESLint clean
- [x] 4 rounds of code review (code-reviewer, silent-failure-hunter, code-simplifier, comment-analyzer, pr-test-analyzer, type-design-analyzer): all clean

🤖 Generated with [Claude Code](https://claude.com/claude-code)
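As a rough illustration of the SSE handling described under the robustness features, a parser along these lines would tolerate `\r\n` line endings and JSON split across several `data:` lines. This is a sketch, not the parser in providers.ts: `parseSse` and `SseEvent` are hypothetical names, and it assumes "continuation lines" means multiple `data:` lines within one event, which the SSE format joins with a newline.

```typescript
interface SseEvent { event: string | null; data: string; }

// Minimal SSE parser sketch over a complete response body.
function parseSse(raw: string): SseEvent[] {
  const events: SseEvent[] = [];
  let eventName: string | null = null;
  let dataLines: string[] = [];

  // Emit the buffered event (if it carried data) and reset the buffers.
  const flush = (): void => {
    if (dataLines.length > 0) events.push({ event: eventName, data: dataLines.join("\n") });
    eventName = null;
    dataLines = [];
  };

  // Accept \r\n, \n, or \r as line terminators.
  for (const line of raw.split(/\r\n|\n|\r/)) {
    if (line === "") { flush(); continue; } // a blank line ends the event
    if (line.charAt(0) === ":") continue;   // comment line
    const colon = line.indexOf(":");
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? "" : line.slice(colon + 1);
    if (value.charAt(0) === " ") value = value.slice(1); // strip one leading space
    if (field === "event") eventName = value;
    else if (field === "data") dataLines.push(value);    // several data lines accumulate
  }
  flush(); // lenient choice: also emit a trailing event that lacks a final blank line
  return events;
}

const events = parseSse('event: ping\r\ndata: {}\r\n\r\ndata: {"text":\r\ndata: "hi"}\r\n\r\n');
console.log(events.length);                  // 2
console.log(events[0].event);                // ping
console.log(JSON.parse(events[1].data).text); // hi
```

A drift test built on this could then assert `events.length > 0` before comparing shapes, which matches the "no silent pass on zero events" point above.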