Add live API drift detection + fix missing refusal field by jpr5 · Pull Request #33 · CopilotKit/aimock

jpr5 · 2026-03-15T04:56:02Z

Summary

Bug fix: OpenAI Chat Completions responses now include refusal: null — a field both the SDK and real API return that llmock was omitting. Conformance and unit tests updated to assert the field.
New feature: Three-layer drift detection test suite that triangulates between SDK types, real API responses, and llmock output to catch response shape drift across all 4 provider APIs (OpenAI Chat, OpenAI Responses, Anthropic Claude, Google Gemini).
CI: Weekly GitHub Actions workflow for automated drift checks + manual trigger
Docs: Added concrete Gemini base URL setup instructions to README (was previously just a comment with no actionable env var)
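
The three-layer comparison mentioned in the summary can be sketched roughly as follows. This is an illustrative sketch only: `shapeOf`, `triangulate`, `Finding`, and the severity rule are hypothetical names and assumptions, not the PR's actual `schema.ts` code. It only reports fields the real API returns that the mock omits, and it treats a field as `critical` when the SDK types also declare it and it is not allowlisted.

```typescript
// Hypothetical sketch of three-way shape triangulation (not the PR's schema.ts).
type Severity = "critical" | "info";

interface Finding {
  path: string;
  severity: Severity;
  message: string;
}

// Flatten a JSON value into "path -> type name" entries, ignoring concrete values.
// Arrays are represented by their first element under the "[]" path segment.
function shapeOf(
  value: unknown,
  prefix = "",
  out: Map<string, string> = new Map(),
): Map<string, string> {
  if (value === null) {
    out.set(prefix, "null");
  } else if (Array.isArray(value)) {
    out.set(prefix, "array");
    if (value.length > 0) shapeOf(value[0], `${prefix}[]`, out);
  } else if (typeof value === "object") {
    out.set(prefix, "object");
    for (const [key, child] of Object.entries(value as Record<string, unknown>)) {
      shapeOf(child, prefix ? `${prefix}.${key}` : key, out);
    }
  } else {
    out.set(prefix, typeof value);
  }
  return out;
}

// Compare the real API response against the mock output, using the SDK's
// declared paths to decide how serious each missing field is (assumed rule).
function triangulate(
  sdkPaths: Set<string>,
  real: unknown,
  mock: unknown,
  allowlist: Set<string> = new Set(),
): Finding[] {
  const realShape = shapeOf(real);
  const mockShape = shapeOf(mock);
  const findings: Finding[] = [];
  for (const path of realShape.keys()) {
    if (mockShape.has(path)) continue;
    const severity: Severity =
      sdkPaths.has(path) && !allowlist.has(path) ? "critical" : "info";
    findings.push({
      path,
      severity,
      message: `real API returns "${path}" but the mock omits it`,
    });
  }
  return findings;
}
```

Under these assumptions, a mock message without `refusal` would be reported as `critical` when the SDK path set contains `choices[].message.refusal`, while an allowlisted field such as `system_fingerprint` would only be reported as `info`.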

Details

19 drift tests across 5 files:

16 shape comparison tests (4 provider APIs × 4 scenarios: non-streaming text, non-streaming tool call, streaming text, streaming tool call)
3 model deprecation checks (one per provider account: OpenAI, Anthropic, Google)

Key robustness features:

All provider functions fail fast on non-2xx responses with status code + body in the error message
All streaming tests assert events were actually received (no silent pass on zero events)
SSE parsers handle \r\n line endings and continuation lines (Gemini sends wrapped JSON)
Retry with exponential backoff on 429/500/502/503
ping and other transport-level SSE events classified as info, not critical
Known intentional differences (usage fields, system_fingerprint, etc.) allowlisted
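
The SSE handling listed above might look something like the sketch below. It is not the parser from the PR's `providers.ts`; `parseSse`, `SseEvent`, and `requireEvents` are hypothetical names. It assumes that "continuation lines" means consecutive `data:` lines belonging to one event, which it joins with newlines as the SSE format specifies, and it accepts `\r\n`, `\n`, or `\r` line endings.

```typescript
// Hypothetical sketch of a tolerant SSE parser (not the PR's implementation).
interface SseEvent {
  event: string; // event name, "message" when the stream gives none
  data: string; // all data lines of the event joined with "\n"
}

function parseSse(raw: string): SseEvent[] {
  const events: SseEvent[] = [];
  let eventName = "message";
  let dataLines: string[] = [];

  // Emit the buffered event (if it carried any data) and reset the buffers.
  const flush = (): void => {
    if (dataLines.length > 0) {
      events.push({ event: eventName, data: dataLines.join("\n") });
    }
    eventName = "message";
    dataLines = [];
  };

  for (const line of raw.split(/\r\n|\n|\r/)) {
    if (line === "") {
      flush(); // a blank line terminates the current event
      continue;
    }
    if (line.startsWith(":")) continue; // comment line
    const colon = line.indexOf(":");
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? "" : line.slice(colon + 1);
    if (value.startsWith(" ")) value = value.slice(1); // strip one leading space
    if (field === "event") eventName = value;
    else if (field === "data") dataLines.push(value);
  }
  flush(); // tolerate a stream that ends without a trailing blank line
  return events;
}

// Guard against a streaming test passing silently on zero events.
function requireEvents(events: SseEvent[]): SseEvent[] {
  if (events.length === 0) {
    throw new Error("stream produced no SSE events");
  }
  return events;
}
```

A test would typically call `requireEvents(parseSse(body))` before comparing shapes, so an empty stream fails loudly instead of comparing nothing.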

The refusal bug was discovered by running the drift tests against the real APIs, which is exactly the kind of mismatch the suite is meant to catch.

See DRIFT.md for full documentation.

Test plan

pnpm test — 540/540 existing tests pass (including new refusal assertions)
pnpm test:drift with all 3 API keys — 19/19 pass
pnpm test:drift without keys — 19 tests skip gracefully
Prettier + ESLint clean
4 rounds of code review (code-reviewer, silent-failure-hunter, code-simplifier, comment-analyzer, pr-test-analyzer, type-design-analyzer) — all clean

🤖 Generated with Claude Code

pkg-pr-new · 2026-03-15T04:56:29Z

npm i https://pkg.pr.new/CopilotKit/llmock/@copilotkit/llmock@33

commit: 7a961f8

OpenAI now returns a `refusal` field (null for non-refusal responses) on all Chat Completions messages. Both the SDK types and real API include it, but llmock was omitting it — causing shape mismatches for consumers that validate response structure.
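
As a rough illustration of why the missing key matters, the sketch below shows an assistant message with `refusal: null` next to a structural check of the kind a consumer might run. `AssistantMessage`, `buildAssistantMessage`, and `hasRefusalKey` are hypothetical names for this example and are not taken from llmock's source.

```typescript
// Hypothetical sketch: the message shape with the refusal field present.
interface AssistantMessage {
  role: "assistant";
  content: string | null;
  refusal: string | null; // null when the model did not refuse
}

// Build a non-refusal message that always carries the refusal key.
function buildAssistantMessage(content: string): AssistantMessage {
  return { role: "assistant", content, refusal: null };
}

// A consumer-side structural check: the key must exist, even if its value is null.
function hasRefusalKey(message: object): boolean {
  return "refusal" in message;
}
```

A message built without the key would fail `hasRefusalKey`, even though its `content` is identical, which is the kind of mismatch a strict response validator would flag.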

Three-layer triangulation between SDK types, real API responses, and llmock output to detect response shape drift across OpenAI (Chat + Responses), Anthropic Claude, and Google Gemini.

- schema.ts: shape extraction, three-way comparison, severity classification
- sdk-shapes.ts: expected shapes from SDK types
- providers.ts: raw fetch clients, SSE parsing, model listing
- helpers.ts: shared test fixtures and server lifecycle
- 4 provider drift test files (16 tests) + model deprecation checks (3 tests)
- vitest.config.drift.ts: separate config with 30s timeout
- Weekly CI workflow (.github/workflows/test-drift.yml)
- DRIFT.md: full documentation
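
For the raw fetch clients, the PR description mentions failing fast on non-2xx responses and retrying with exponential backoff on 429/500/502/503. A minimal sketch of that combination is shown below. The names `fetchWithRetry`, `Fetcher`, and `HttpResponse`, the default attempt count, and the base delay are all assumptions for illustration and do not come from `providers.ts`. The fetcher and sleep function are injected so the logic can be exercised without network access or real delays.

```typescript
// Hypothetical sketch of retry + fail-fast behavior (not the PR's providers.ts).
interface HttpResponse {
  ok: boolean;
  status: number;
  text(): Promise<string>;
}

type Fetcher = (url: string) => Promise<HttpResponse>;

// Status codes treated as transient, as listed in the PR description.
const RETRYABLE_STATUSES = new Set([429, 500, 502, 503]);

async function fetchWithRetry(
  url: string,
  fetcher: Fetcher,
  maxAttempts = 4, // assumed default
  baseDelayMs = 500, // assumed default
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<string> {
  for (let attempt = 1; ; attempt++) {
    const response = await fetcher(url);
    const body = await response.text();
    if (response.ok) return body;

    // Fail fast: non-retryable status, or retries exhausted.
    if (!RETRYABLE_STATUSES.has(response.status) || attempt >= maxAttempts) {
      throw new Error(
        `Request to ${url} failed with status ${response.status}: ${body}`,
      );
    }

    // Exponential backoff: baseDelayMs, 2x, 4x, ...
    await sleep(baseDelayMs * 2 ** (attempt - 1));
  }
}
```

In a runtime with a global `fetch`, something like `fetchWithRetry(url, (u) => fetch(u))` should fit the `Fetcher` type, since the standard `Response` exposes `ok`, `status`, and `text()`.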

jpr5 force-pushed the feat/drift-detection branch 4 times, most recently from 19a1825 to 169156f, on March 15, 2026 05:32

jpr5 force-pushed the feat/drift-detection branch from 169156f to 7a626fc on March 15, 2026 05:33

jpr5 added 2 commits March 14, 2026 22:35

chore: bump version to 1.3.2

7a961f8

jpr5 force-pushed the feat/drift-detection branch from 7a626fc to 7a961f8 on March 15, 2026 05:35

jpr5 merged commit e75918a into main Mar 15, 2026
9 checks passed

jpr5 deleted the feat/drift-detection branch March 15, 2026 05:46

jpr5 merged 3 commits into main from feat/drift-detection
