diff --git a/.claude-plugin/marketplace.json b/.claude-plugin/marketplace.json index 44ade6f1..6616b66d 100644 --- a/.claude-plugin/marketplace.json +++ b/.claude-plugin/marketplace.json @@ -9,9 +9,9 @@ "source": { "source": "npm", "package": "@copilotkit/llmock", - "version": "^1.3.1" + "version": "^1.5.0" }, - "description": "Fixture authoring skill for @copilotkit/llmock — match fields, response types, agent loop patterns, gotchas, and debugging" + "description": "Fixture authoring skill for @copilotkit/llmock — match fields, response types, embeddings, structured output, sequential responses, streaming physics, agent loop patterns, gotchas, and debugging" } ] } diff --git a/.claude-plugin/plugin.json b/.claude-plugin/plugin.json index 7edff576..cd8e5ae4 100644 --- a/.claude-plugin/plugin.json +++ b/.claude-plugin/plugin.json @@ -1,6 +1,6 @@ { "name": "llmock", - "version": "1.3.1", + "version": "1.5.0", "description": "Fixture authoring guidance for @copilotkit/llmock", "author": { "name": "CopilotKit" diff --git a/.claude/commands/write-fixtures.md b/.claude/commands/write-fixtures.md deleted file mode 100644 index 1d1ee8c8..00000000 --- a/.claude/commands/write-fixtures.md +++ /dev/null @@ -1,238 +0,0 @@ ---- -name: write-fixtures -description: Use when writing test fixtures for @copilotkit/llmock — mock LLM responses, tool call sequences, error injection, multi-turn agent loops, or debugging fixture mismatches ---- - -# Writing llmock Test Fixtures - -## What llmock Is - -Zero-dependency mock LLM server. Fixture-driven. Multi-provider (OpenAI, Anthropic, Gemini). Runs a real HTTP server on a real port — works across processes, unlike MSW-style interceptors. WebSocket support for OpenAI Responses/Realtime and Gemini Live APIs. - -## Core Mental Model - -- **Fixtures** = match criteria + response -- **First-match-wins** — order matters -- All providers share one fixture pool (provider adapters normalize to `ChatCompletionRequest`) -- Fixtures are stateless — no built-in multi-turn sequencing -- Fixtures are live — mutations after `start()` take effect immediately - -## Match Field Reference - -| Field | Type | Matches Against | -| ------------- | ----------------------------------------- | ------------------------------------------------------------------------- | -| `userMessage` | `string` | Substring of last `role: "user"` message text | -| `userMessage` | `RegExp` | Pattern test on last `role: "user"` message text | -| `toolName` | `string` | Exact match on any tool in request's `tools[]` array (by `function.name`) | -| `toolCallId` | `string` | Exact match on `tool_call_id` of last `role: "tool"` message | -| `model` | `string` | Exact match on `req.model` | -| `model` | `RegExp` | Pattern test on `req.model` | -| `predicate` | `(req: ChatCompletionRequest) => boolean` | Custom function — full access to request | - -**AND logic**: all specified fields must match. Empty match `{}` = catch-all. - -Multi-part content (e.g., `[{type: "text", text: "hello"}]`) is automatically extracted — `userMessage` matching works regardless of content format. - -## Response Types - -### Text - -```typescript -{ - content: "Hello!"; -} -``` - -### Tool Calls - -```typescript -{ - toolCalls: [{ name: "get_weather", arguments: '{"city":"SF"}' }]; -} -``` - -**`arguments` MUST be a JSON string**, not an object. This is the #1 mistake. 
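A reliable pattern is to build the arguments as an object and serialize with `JSON.stringify` rather than hand-escaping JSON (a minimal sketch using the `onMessage` shorthand shown below):

```typescript
// Correct: `arguments` is a JSON *string*, as the OpenAI wire format requires
mock.onMessage("weather", {
  toolCalls: [{ name: "get_weather", arguments: JSON.stringify({ city: "SF" }) }],
});

// Wrong: a plain object here will break clients that expect a string
// toolCalls: [{ name: "get_weather", arguments: { city: "SF" } }]
```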
- -### Error - -```typescript -{ error: { message: "Rate limited", type: "rate_limit_error" }, status: 429 } -``` - -## Common Patterns - -### Basic text fixture - -```typescript -mock.onMessage("hello", { content: "Hi there!" }); -``` - -### Tool call → tool result → final response (3-step agent loop) - -The most common pattern. Fixture 1 triggers the tool call, fixture 2 handles the tool result. - -```typescript -// Step 1: User asks about weather → LLM calls tool -mock.onMessage("weather", { - toolCalls: [{ name: "get_weather", arguments: '{"city":"SF"}' }], -}); - -// Step 2: Tool result comes back → LLM responds with text -mock.addFixture({ - match: { predicate: (req) => req.messages.at(-1)?.role === "tool" }, - response: { content: "It's 72°F in San Francisco." }, -}); -``` - -**Why predicate, not userMessage?** After a tool call, the client replays the same conversation with the tool result appended. The user message hasn't changed — `userMessage: "weather"` would match the SAME fixture again, creating an infinite loop. - -### Predicate-based routing (same user message, different context) - -Common in supervisor/orchestrator patterns where the system prompt changes: - -```typescript -mock.addFixture({ - match: { - predicate: (req) => { - const sys = req.messages.find((m) => m.role === "system")?.content ?? ""; - return typeof sys === "string" && sys.includes("Flights found: false"); - }, - }, - response: { toolCalls: [{ name: "search_flights", arguments: "{}" }] }, -}); -``` - -### Catch-all (always add one) - -Prevents unmatched requests from returning 404 and crashing the test: - -```typescript -mock.addFixture({ - match: { predicate: () => true }, - response: { content: "I understand. How can I help?" }, -}); -``` - -### Tool result catch-all with prependFixture - -Must go at the front so it matches before substring-based fixtures: - -```typescript -mock.prependFixture({ - match: { predicate: (req) => req.messages.at(-1)?.role === "tool" }, - response: { content: "Done!" }, -}); -``` - -### Stream interruption simulation (v1.3.0+) - -```typescript -mock.onMessage( - "long response", - { content: "This will be cut short..." }, - { - truncateAfterChunks: 3, // Stop after 3 SSE chunks - disconnectAfterMs: 500, // Or disconnect after 500ms - }, -); -``` - -### Error injection (one-shot) - -```typescript -mock.nextRequestError(429, { message: "Rate limited", type: "rate_limit_error" }); -// Next request gets 429, then fixture auto-removes itself -``` - -### JSON fixture files - -```json -{ - "fixtures": [ - { - "match": { "userMessage": "hello" }, - "response": { "content": "Hi!" } - } - ] -} -``` - -JSON files cannot use `RegExp` or `predicate` — those are code-only features. - -Load with `mock.loadFixtureFile("./fixtures/greetings.json")` or `mock.loadFixtureDir("./fixtures/")`. - -## Critical Gotchas - -1. **Order matters** — first match wins. Specific fixtures before general ones. Use `prependFixture()` to force priority. - -2. **`arguments` must be a JSON string** — `"arguments": "{\"key\":\"value\"}"` not `"arguments": {"key":"value"}`. The type system enforces this but JSON fixtures can get it wrong silently. - -3. **Latency is per-chunk, not total** — `latency: 100` means 100ms between each SSE chunk, not 100ms total response time. Similarly, `truncateAfterChunks` and `disconnectAfterMs` are for simulating stream interruptions (added in v1.3.0). - -4. 
**Tool result messages don't change the user message** — after a tool call, the client sends the same conversation + tool result. Matching on `userMessage` will hit the SAME fixture again → infinite loop. Always use `predicate` checking `role === "tool"` for tool results. - -5. **`clearFixtures()` preserves the array reference** — uses `.length = 0`, not reassignment. The running server reads the same array object. - -6. **Journal records everything** — including 404 "no match" responses. Use `mock.getLastRequest()` to debug mismatches. - -7. **All providers share fixtures** — a fixture matching "hello" works whether the request comes via `/v1/chat/completions` (OpenAI), `/v1/messages` (Anthropic), or Gemini endpoints. - -8. **WebSocket uses the same fixture pool** — no special setup needed for WebSocket-based APIs (OpenAI Responses WS, Realtime, Gemini Live). - -## Debugging Fixture Mismatches - -When a fixture doesn't match: - -1. **Inspect what the server received**: `mock.getLastRequest()` → check `body.messages` array -2. **Check fixture order**: `mock.getFixtures()` returns fixtures in registration order -3. **For `userMessage`**: match is against the LAST `role: "user"` message only, substring match (not exact) -4. **Check the journal**: `mock.getRequests()` shows all requests including which fixture matched (or `null` for 404) - -## E2E Test Setup Pattern - -```typescript -import { LLMock } from "@copilotkit/llmock"; - -// Setup — port: 0 picks a random available port -const mock = new LLMock({ port: 0 }); -mock.loadFixtureDir("./fixtures"); -await mock.start(); -process.env.OPENAI_BASE_URL = `${mock.url}/v1`; - -// Per-test cleanup -afterEach(() => mock.reset()); // clears fixtures AND journal - -// Teardown -afterAll(async () => await mock.stop()); -``` - -### Static factory shorthand - -```typescript -const mock = await LLMock.create({ port: 0 }); // creates + starts in one call -``` - -## API Quick Reference - -| Method | Purpose | -| ------------------------------------- | ---------------------------------- | -| `addFixture(f)` | Append fixture (last priority) | -| `addFixtures(f[])` | Append multiple | -| `prependFixture(f)` | Insert at front (highest priority) | -| `clearFixtures()` | Remove all fixtures | -| `getFixtures()` | Read current fixture list | -| `on(match, response, opts?)` | Shorthand for `addFixture` | -| `onMessage(pattern, response, opts?)` | Match by user message | -| `onToolCall(name, response, opts?)` | Match by tool name in `tools[]` | -| `onToolResult(id, response, opts?)` | Match by `tool_call_id` | -| `nextRequestError(status, body?)` | One-shot error, auto-removes | -| `loadFixtureFile(path)` | Load JSON fixture file | -| `loadFixtureDir(path)` | Load all JSON files in directory | -| `start()` | Start server, returns URL | -| `stop()` | Stop server | -| `reset()` | Clear fixtures + journal | -| `getRequests()` | All journal entries | -| `getLastRequest()` | Most recent journal entry | -| `clearRequests()` | Clear journal only | -| `url` / `baseUrl` | Server URL (throws if not started) | -| `port` | Server port number | diff --git a/.claude/commands/write-fixtures.md b/.claude/commands/write-fixtures.md new file mode 120000 index 00000000..3d887c6a --- /dev/null +++ b/.claude/commands/write-fixtures.md @@ -0,0 +1 @@ +../../skills/write-fixtures/SKILL.md \ No newline at end of file diff --git a/CHANGELOG.md b/CHANGELOG.md index c3b5a8db..69708a1c 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,5 +1,38 @@ # @copilotkit/llmock +## 1.5.0 + +### Minor 
Changes
+
+- Embeddings API: `POST /v1/embeddings` endpoint, `onEmbedding()` convenience method, `inputText` match field, `EmbeddingResponse` type, deterministic fallback embeddings from input hash, Azure embedding routing
+- Structured output / JSON mode: `responseFormat` match field, `onJsonOutput()` convenience method
+- Sequential responses: `sequenceIndex` match field for stateful multi-turn fixtures, per-fixture-group match counting, `resetMatchCounts()` method
+- Streaming physics: `StreamingProfile` type with `ttft`, `tps`, `jitter` fields for realistic timing simulation
+- AWS Bedrock: `POST /model/{modelId}/invoke` endpoint, Anthropic Messages format translation
+- Azure OpenAI: provider routing for `/openai/deployments/{id}/chat/completions` and `/openai/deployments/{id}/embeddings`
+- Health & models endpoints: `GET /health`, `GET /ready`, `GET /v1/models` (auto-populated from fixtures)
+- Docker & Helm: Dockerfile, Helm chart for Kubernetes deployment
+- Documentation website: full docs site at llmock.com with feature pages and competitive comparison matrix
+- Automated drift remediation: `scripts/drift-report-collector.ts` and `scripts/fix-drift.ts` for CI-driven drift fixes
+- CI automation: competitive matrix update workflow, drift fix workflow
+- `FixtureOpts` and `EmbeddingFixtureOpts` type aliases exported for external consumers
+
+### Patch Changes
+
+- Fix Gemini Live handler crash on malformed `clientContent.turns` and `toolResponse.functionResponses`
+- Add `isClosed` guard before WebSocket finalization events (prevents writes to closed connections)
+- Default to non-streaming for Claude Messages API and Responses API (matching real API defaults)
+- Fix `streamingProfile` missing from convenience method opts types (`on`, `onMessage`, etc.)
+- Fix skills/ symlink direction so npm pack includes the write-fixtures skill
+- Remove `.claude` from the package.json `files` array (dead weight: the symlink doesn't ship)
+- Add `.worktrees/` to eslint ignores
+- Remove dead `@keyframes sseLine` CSS from docs site
+- Fix watcher cleanup on error (clear debounce timer, null guard)
+- Fix empty-reload guard (keep previous fixtures when reload produces 0)
+- README rewritten as concise overview with links to docs site
+- Write-fixtures skill updated for all v1.5.0 features
+- Docs site: Get Started links to docs, comparison above reliability, npm version badge
+
 ## 1.4.0

### Minor Changes
diff --git a/README.md b/README.md
index ebad0ca5..562cde38 100644
--- a/README.md
+++ b/README.md
@@ -1,15 +1,28 @@
-# @copilotkit/llmock [![Unit Tests](https://github.com/CopilotKit/llmock/actions/workflows/test-unit.yml/badge.svg)](https://github.com/CopilotKit/llmock/actions/workflows/test-unit.yml) [![Drift Tests](https://github.com/CopilotKit/llmock/actions/workflows/test-drift.yml/badge.svg)](https://github.com/CopilotKit/llmock/actions/workflows/test-drift.yml)
+# @copilotkit/llmock [![Unit Tests](https://github.com/CopilotKit/llmock/actions/workflows/test-unit.yml/badge.svg)](https://github.com/CopilotKit/llmock/actions/workflows/test-unit.yml) [![Drift Tests](https://github.com/CopilotKit/llmock/actions/workflows/test-drift.yml/badge.svg)](https://github.com/CopilotKit/llmock/actions/workflows/test-drift.yml) [![npm version](https://img.shields.io/npm/v/@copilotkit/llmock)](https://www.npmjs.com/package/@copilotkit/llmock)

-Deterministic multi-provider mock LLM server for testing. Streams SSE responses in real OpenAI, Claude, and Gemini API formats, driven entirely by fixtures. 
Zero runtime dependencies — built on Node.js builtins only. +Deterministic mock LLM server for testing. A real HTTP server on a real port — not an in-process interceptor — so every process in your stack (Playwright, Next.js, agent workers, microservices) can point at it via `OPENAI_BASE_URL` / `ANTHROPIC_BASE_URL` and get reproducible, instant responses. Streams SSE in real OpenAI, Claude, Gemini, Bedrock, and Azure API formats, driven entirely by fixtures. Zero runtime dependencies. -Supports both streaming (SSE) and non-streaming JSON responses across OpenAI (Chat Completions + Responses), Anthropic Claude (Messages), and Google Gemini (GenerateContent) APIs. Text completions, tool calls, and error injection. Point any process at it via `OPENAI_BASE_URL`, `ANTHROPIC_BASE_URL`, or Gemini base URL and get reproducible, instant responses. - -## Install +## Quick Start ```bash npm install @copilotkit/llmock ``` +```typescript +import { LLMock } from "@copilotkit/llmock"; + +const mock = new LLMock({ port: 5555 }); + +mock.onMessage("hello", { content: "Hi there!" }); + +const url = await mock.start(); +// Point your OpenAI client at `url` instead of https://api.openai.com + +// ... run your tests ... + +await mock.stop(); +``` + ## When to Use This vs MSW [MSW (Mock Service Worker)](https://mswjs.io/) is a popular API mocking library, but it solves a different problem. @@ -57,534 +70,23 @@ MSW can't intercept any of those calls. llmock can — it's a real server on a r | CLI for standalone use | **Yes** | **No** | | Zero dependencies | **Yes** | **No** (~300KB) | -## Quick Start - -```typescript -import { LLMock } from "@copilotkit/llmock"; - -const mock = new LLMock({ port: 5555 }); - -mock.onMessage("hello", { content: "Hi there!" }); - -const url = await mock.start(); -// Point your OpenAI client at `url` instead of https://api.openai.com - -// ... run your tests ... - -await mock.stop(); -``` - -## E2E Test Patterns - -Real-world patterns from using llmock in Playwright E2E tests with CopilotKit, Mastra, LangGraph, and Agno agent frameworks. - -### Global Setup/Teardown - -Start the mock server once for the entire test suite. All child processes (Next.js, agent workers) inherit the URL via environment variable. 
```typescript
// e2e/llmock-setup.ts
import { LLMock } from "@copilotkit/llmock";
import * as path from "node:path";

let mockServer: LLMock | null = null;

export async function setupLLMock(): Promise<void> {
  mockServer = new LLMock({ port: 5555 });

  // Load JSON fixtures from a directory
  mockServer.loadFixtureDir(path.join(__dirname, "fixtures", "openai"));

  const url = await mockServer.start();

  // Child processes use this to find the mock
  process.env.LLMOCK_URL = `${url}/v1`;
}

export async function teardownLLMock(): Promise<void> {
  if (mockServer) {
    await mockServer.stop();
    mockServer = null;
  }
}
```

The Next.js app (or any other service) just needs:

```env
OPENAI_BASE_URL=http://localhost:5555/v1
OPENAI_API_KEY=mock-key

# Or for Anthropic Claude:
ANTHROPIC_BASE_URL=http://localhost:5555/v1
ANTHROPIC_API_KEY=mock-key

# Or for Google Gemini (set baseUrl in code — see below):
GOOGLE_API_KEY=mock-key
```

For Google Gemini, the SDK doesn't support a base URL env var — pass it in code:

```typescript
// @google/genai (v1.x)
import { GoogleGenAI } from "@google/genai";
const ai = new GoogleGenAI({
  apiKey: process.env.GOOGLE_API_KEY,
  httpOptions: { baseUrl: "http://localhost:5555" },
});

// @google/generative-ai (v0.x)
import { GoogleGenerativeAI } from "@google/generative-ai";
const genAI = new GoogleGenerativeAI(process.env.GOOGLE_API_KEY!);
const model = genAI.getGenerativeModel(
  { model: "gemini-2.0-flash" },
  { baseUrl: "http://localhost:5555" },
);
```

### JSON Fixture Files

Define fixtures as JSON — one file per feature, loaded with `loadFixtureFile` or `loadFixtureDir`.

**Text responses** — match on a substring of the last user message:

```json
{
  "fixtures": [
    {
      "match": { "userMessage": "stock price of AAPL" },
      "response": { "content": "The current stock price of Apple Inc. (AAPL) is $150.25." }
    },
    {
      "match": { "userMessage": "capital of France" },
      "response": { "content": "The capital of France is Paris." }
    }
  ]
}
```

**Tool call responses** — the agent framework receives these as tool calls and executes them:

```json
{
  "fixtures": [
    {
      "match": { "userMessage": "one step with eggs" },
      "response": {
        "toolCalls": [
          {
            "name": "generate_task_steps",
            "arguments": "{\"steps\":[{\"description\":\"Crack eggs into bowl\",\"status\":\"enabled\"},{\"description\":\"Preheat oven to 350F\",\"status\":\"enabled\"}]}"
          }
        ]
      }
    },
    {
      "match": { "userMessage": "background color to blue" },
      "response": {
        "toolCalls": [
          {
            "name": "change_background",
            "arguments": "{\"background\":\"blue\"}"
          }
        ]
      }
    }
  ]
}
```

### Fixture Load Order Matters

Fixtures are evaluated first-match-wins. 
When two fixtures could match the same message, load the more specific one first:

```typescript
// Load HITL fixtures first — "one step with eggs" is more specific than
// "plan to make brownies" which also appears in the HITL user message
mockServer.loadFixtureFile(path.join(FIXTURES_DIR, "human-in-the-loop.json"));

// Then load everything else — earlier matches take priority
mockServer.loadFixtureDir(FIXTURES_DIR);
```

### Predicate-Based Routing

When substring matching isn't enough — for example, when the last user message is the same across multiple requests but the system prompt differs — use predicates:

```typescript
// Supervisor agent: same user message every time, but system prompt
// contains state flags like "Flights found: false"
mockServer.addFixture({
  match: {
    predicate: (req) => {
      const sysMsg = req.messages.find((m) => m.role === "system");
      return sysMsg?.content?.includes("Flights found: false") ?? false;
    },
  },
  response: {
    toolCalls: [
      {
        name: "supervisor_response",
        arguments: '{"answer":"Let me find flights for you!","next_agent":"flights_agent"}',
      },
    ],
  },
});

mockServer.addFixture({
  match: {
    predicate: (req) => {
      const sys = req.messages.find((m) => m.role === "system")?.content ?? "";
      return sys.includes("Flights found: true") && sys.includes("Hotels found: false");
    },
  },
  response: {
    toolCalls: [
      {
        name: "supervisor_response",
        arguments: '{"answer":"Now let me find hotels.","next_agent":"hotels_agent"}',
      },
    ],
  },
});
```

### Tool Result Catch-All

After a tool executes, the next request contains a `role: "tool"` message with the result. Add a catch-all for these so the conversation can continue:

```typescript
const toolResultFixture = {
  match: {
    predicate: (req) => {
      const last = req.messages[req.messages.length - 1];
      return last?.role === "tool";
    },
  },
  response: { content: "Done! I've completed that for you." },
};

// Prepend so it matches before substring-based fixtures
// (the last user message hasn't changed, so substring fixtures
// would match the same fixture again otherwise)
mockServer.prependFixture(toolResultFixture);
```

### Universal Catch-All

Append a catch-all last to handle any request that doesn't match a specific fixture, preventing 404s from crashing the test:

```typescript
mockServer.addFixture({
  match: { predicate: () => true },
  response: { content: "I understand. How can I help you with that?" },
});
```

## Programmatic API

### `new LLMock(options?)`

Create a new mock server instance.

| Option | Type | Default | Description |
| ----------- | -------- | ------------- | ----------------------------------- |
| `port` | `number` | `0` (random) | Port to listen on |
| `host` | `string` | `"127.0.0.1"` | Host to bind to |
| `latency` | `number` | `0` | Default ms delay between SSE chunks |
| `chunkSize` | `number` | `20` | Default characters per SSE chunk |

### `LLMock.create(options?)`

Static factory — creates an instance and starts it in one call. Returns `Promise<LLMock>`. 
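A minimal usage sketch (the `"ping"` fixture and random port are illustrative):

```typescript
import { LLMock } from "@copilotkit/llmock";

// create() constructs and starts the server in one call, so no separate start()
const mock = await LLMock.create({ port: 0 });
mock.onMessage("ping", { content: "pong" });

// ... point a client at mock.url and run requests ...

await mock.stop();
```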
### Server Lifecycle

| Method | Returns | Description |
| --------- | ----------------- | -------------------------------------- |
| `start()` | `Promise<string>` | Start the server, returns the base URL |
| `stop()` | `Promise<void>` | Stop the server |
| `url` | `string` | Base URL (throws if not started) |
| `baseUrl` | `string` | Alias for `url` |
| `port` | `number` | Listening port (throws if not started) |

### Fixture Registration

All registration methods return `this` for chaining.

#### `on(match, response, opts?)`

Register a fixture with full control over match criteria.

```typescript
mock.on({ userMessage: /weather/i, model: "gpt-4" }, { content: "It's sunny!" }, { latency: 50 });
```

#### `onMessage(pattern, response, opts?)`

Shorthand — matches on the last user message.

```typescript
mock.onMessage("hello", { content: "Hi!" });
mock.onMessage(/greet/i, { content: "Hey there!" });
```

#### `onToolCall(name, response, opts?)`

Shorthand — matches when the request contains a tool with the given name.

```typescript
mock.onToolCall("get_weather", {
  toolCalls: [{ name: "get_weather", arguments: '{"location":"SF"}' }],
});
```

#### `onToolResult(id, response, opts?)`

Shorthand — matches when a tool result message has the given `tool_call_id`.

```typescript
mock.onToolResult("call_abc123", { content: "Temperature is 72F" });
```

#### `addFixture(fixture)` / `addFixtures(fixtures)`

Add raw `Fixture` objects directly (appended to the end of the list).

#### `prependFixture(fixture)`

Insert a fixture at the **front** of the list so it matches before all existing fixtures.
Useful for catch-all predicates that must fire before substring-based fixtures.

```typescript
mock.prependFixture({
  match: { predicate: (req) => req.messages.at(-1)?.role === "tool" },
  response: { content: "Done!" },
});
```

#### `getFixtures()`

Returns a `readonly Fixture[]` view of all registered fixtures. Useful for
debugging and logging fixture statistics without accessing private internals.

```typescript
const fixtures = mock.getFixtures();
console.log(`${fixtures.length} fixtures loaded`);
```

#### `loadFixtureFile(path)` / `loadFixtureDir(path)`

Load fixtures from JSON files on disk. See [Fixture Files](#json-fixture-files) above.

#### `clearFixtures()`

Remove all registered fixtures.

### Error Injection

#### `nextRequestError(status, errorBody?)`

Queue a one-shot error for the very next request. The error fires once, then auto-removes itself.

```typescript
mock.nextRequestError(429, {
  message: "Rate limited",
  type: "rate_limit_error",
});

// Next request → 429 error
// Subsequent requests → normal fixture matching
```

### Request Journal

Every request to all API endpoints (`/v1/chat/completions`, `/v1/responses`, `/v1/messages`, and Gemini endpoints) is recorded in a journal.

#### Programmatic Access

| Method | Returns | Description |
| ------------------ | ---------------------- | ------------------------------------- |
| `getRequests()` | `JournalEntry[]` | All recorded requests |
| `getLastRequest()` | `JournalEntry \| null` | Most recent request |
| `clearRequests()` | `void` | Clear the journal |
| `journal` | `Journal` | Direct access to the journal instance |

```typescript
await fetch(mock.url + "/v1/chat/completions", { ... 
}); - -const last = mock.getLastRequest(); -expect(last?.body.messages).toContainEqual({ - role: "user", - content: "hello", -}); -``` - -#### HTTP Endpoints - -The server also exposes journal data over HTTP (useful in CLI mode): - -- `GET /v1/_requests` — returns all journal entries as JSON. Supports `?limit=N`. -- `DELETE /v1/_requests` — clears the journal. Returns 204. - -### Reset - -#### `reset()` - -Clear all fixtures **and** the journal in one call. Works before or after the server is started. - -```typescript -afterEach(() => { - mock.reset(); -}); -``` - -## Fixture Matching - -Fixtures are evaluated in registration order (first match wins). A fixture matches when **all** specified fields match the incoming request (AND logic). - -| Field | Type | Matches on | -| ------------- | ------------------ | --------------------------------------------- | -| `userMessage` | `string \| RegExp` | Content of the last `role: "user"` message | -| `toolName` | `string` | Name of a tool in the request's `tools` array | -| `toolCallId` | `string` | `tool_call_id` on a `role: "tool"` message | -| `model` | `string \| RegExp` | The `model` field in the request | -| `predicate` | `(req) => boolean` | Arbitrary matching function | - -## Fixture Responses - -### Text - -```typescript -{ - content: "Hello world"; -} -``` - -Streams as SSE chunks, splitting `content` by `chunkSize`. With `stream: false`, returns a standard `chat.completion` JSON object. - -### Tool Calls - -```typescript -{ - toolCalls: [{ name: "get_weather", arguments: '{"location":"SF"}' }]; -} -``` - -### Errors - -```typescript -{ - error: { message: "Rate limited", type: "rate_limit_error" }, - status: 429 -} -``` - -## API Endpoints - -The server handles: - -- **POST `/v1/chat/completions`** — OpenAI Chat Completions API (streaming and non-streaming) -- **POST `/v1/responses`** — OpenAI Responses API (streaming and non-streaming) -- **POST `/v1/messages`** — Anthropic Claude Messages API (streaming and non-streaming) -- **POST `/v1beta/models/{model}:generateContent`** — Google Gemini (non-streaming) -- **POST `/v1beta/models/{model}:streamGenerateContent`** — Google Gemini (streaming) - -WebSocket endpoints: - -- **WS `/v1/responses`** — OpenAI Responses API over WebSocket -- **WS `/v1/realtime`** — OpenAI Realtime API (text + tool calls) -- **WS `/ws/google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent`** — Gemini Live ([unverified](#gemini-live-bidigeneratecontent)) - -All endpoints share the same fixture pool — the same fixtures work across all providers. Requests are translated to a common format internally for fixture matching. 
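As a sketch of that cross-provider behavior (the model names here are illustrative), one fixture answers both an OpenAI-style and a Claude-style request:

```typescript
mock.onMessage("hello", { content: "Hi!" });

// OpenAI Chat Completions endpoint
await fetch(`${mock.url}/v1/chat/completions`, {
  method: "POST",
  headers: { "content-type": "application/json" },
  body: JSON.stringify({ model: "gpt-4o", messages: [{ role: "user", content: "hello" }] }),
});

// Anthropic Messages endpoint resolves against the same fixture
await fetch(`${mock.url}/v1/messages`, {
  method: "POST",
  headers: { "content-type": "application/json" },
  body: JSON.stringify({
    model: "claude-sonnet",
    max_tokens: 64,
    messages: [{ role: "user", content: "hello" }],
  }),
});
```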
+## Features -## WebSocket APIs +- **[Multi-provider support](https://llmock.com/compatible-providers.html)** — [OpenAI Chat Completions](https://llmock.com/chat-completions.html), [OpenAI Responses](https://llmock.com/responses-api.html), [Anthropic Claude](https://llmock.com/claude-messages.html), [Google Gemini](https://llmock.com/gemini.html), [AWS Bedrock](https://llmock.com/aws-bedrock.html), [Azure OpenAI](https://llmock.com/azure-openai.html) +- **[Embeddings API](https://llmock.com/embeddings.html)** — OpenAI-compatible embedding responses with configurable dimensions +- **[Structured output / JSON mode](https://llmock.com/structured-output.html)** — `response_format`, `json_schema`, and function calling +- **[Sequential responses](https://llmock.com/sequential-responses.html)** — Stateful multi-turn fixtures that return different responses on each call +- **[Streaming physics](https://llmock.com/streaming-physics.html)** — Configurable `ttft`, `tps`, and `jitter` for realistic timing +- **[WebSocket APIs](https://llmock.com/websocket.html)** — OpenAI Responses WS, Realtime API, and Gemini Live +- **[Error injection](https://llmock.com/error-injection.html)** — One-shot errors, rate limiting, and provider-specific error formats +- **[Request journal](https://llmock.com/docs.html)** — Record, inspect, and assert on every request +- **[Fixture validation](https://llmock.com/fixtures.html)** — Schema validation at load time with `--validate-on-load` +- **CLI with hot-reload** — Standalone server with `--watch` for live fixture editing +- **[Docker + Helm](https://llmock.com/docker.html)** — Container image and Helm chart for CI/CD pipelines +- **[Drift detection](https://llmock.com/drift-detection.html)** — Daily CI runs against real APIs to catch response format changes +- **Claude Code integration** — `/write-fixtures` skill teaches your AI assistant how to write fixtures correctly -The same fixtures that drive HTTP responses also work over WebSocket transport. llmock implements RFC 6455 WebSocket framing with zero external dependencies — connect, send events, and receive streaming responses in real provider formats. - -Only text and tool call paths are supported over WebSocket. Audio, video, and binary frames are not implemented. - -### OpenAI Responses API (WebSocket) - -Connect to `ws://localhost:5555/v1/responses` and send a `response.create` event. The server streams back the same events as OpenAI's real WebSocket Responses API: - -```jsonc -// → Client sends: -{ - "type": "response.create", - "model": "gpt-4o", - "instructions": "You are a helpful assistant.", - "input": [ - { "type": "message", "role": "user", "content": [{ "type": "input_text", "text": "Hello" }] }, - ], -} - -// ← Server streams: -// {"type": "response.created", ...} -// {"type": "response.output_item.added", ...} -// {"type": "response.content_part.added", ...} -// {"type": "response.output_item.done", ...} -// {"type": "response.done", ...} -``` - -### OpenAI Realtime API - -Connect to `ws://localhost:5555/v1/realtime`. The Realtime API uses a session-based protocol — configure the session, add conversation items, then request a response: - -```jsonc -// → Configure session: -{ "type": "session.update", "session": { "modalities": ["text"], "model": "gpt-4o-realtime" } } - -// → Add a user message: -{ - "type": "conversation.item.create", - "item": { - "type": "message", - "role": "user", - "content": [{ "type": "input_text", "text": "What is the capital of France?" 
}] - } -} - -// → Request a response: -{ "type": "response.create" } - -// ← Server streams: -// {"type": "response.created", ...} -// {"type": "response.text.delta", "delta": "The"} -// {"type": "response.text.delta", "delta": " capital"} -// ... -// {"type": "response.text.done", ...} -// {"type": "response.done", ...} -``` - -### Gemini Live (BidiGenerateContent) - -Connect to `ws://localhost:5555/ws/google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent`. Gemini Live uses a setup/content/response flow. - -> **⚠️ Unverified**: As of March 2026, Google's only `bidiGenerateContent`-capable models are audio-only — no text-capable model exists for this endpoint. llmock implements the text-based protocol as documented in Google's [Live API reference](https://ai.google.dev/api/live), but the response shapes have not been verified against real API output. Code you write against this mock may need adjustment when Google ships a text-capable Live model. See [DRIFT.md](DRIFT.md#gemini-live-unverified) for details and the automated canary that tracks model availability. - -```jsonc -// → Setup message (must be first): -{ "setup": { "model": "models/gemini-2.5-flash", "generationConfig": { "responseModalities": ["TEXT"] } } } - -// → Send user content: -{ "clientContent": { "turns": [{ "role": "user", "parts": [{ "text": "Hello" }] }], "turnComplete": true } } - -// ← Server streams: -// {"setupComplete": {}} -// {"serverContent": {"modelTurn": {"parts": [{"text": "Hello"}]}, "turnComplete": false}} -// {"serverContent": {"modelTurn": {"parts": [{"text": "!"}]}, "turnComplete": true}} -``` - -## CLI - -The package includes a standalone server binary: +## CLI Quick Reference ```bash llmock [options] @@ -613,98 +115,15 @@ llmock -p 8080 -f ./my-fixtures llmock --latency 100 --chunk-size 5 ``` -## Advanced Usage - -### Low-level Server - -If you need the raw HTTP server without the `LLMock` wrapper: - -```typescript -import { createServer } from "@copilotkit/llmock"; - -const fixtures = [{ match: { userMessage: "hi" }, response: { content: "Hello!" } }]; - -const { server, journal, url } = await createServer(fixtures, { port: 0 }); -// ... use it ... -server.close(); -``` - -### Per-Fixture Timing - -```typescript -mock.on({ userMessage: "slow" }, { content: "Finally..." }, { latency: 200, chunkSize: 5 }); -``` - -## Claude Code Integration +## Documentation -llmock ships with a [Claude Code](https://docs.anthropic.com/en/docs/claude-code) skill that teaches your AI assistant how to write fixtures correctly — match fields, response types, agent loop patterns, gotchas, and debugging techniques. Available as the `/write-fixtures` slash command. +Full API reference, fixture format, E2E patterns, and provider-specific guides: -### Option 1: Plugin install (recommended) - -```bash -# Add the marketplace (one time) -/plugin marketplace add CopilotKit/llmock - -# Install the plugin -/plugin install llmock@copilotkit-tools -``` - -The skill appears as `/llmock:write-fixtures`. - -### Option 2: Local plugin from node_modules - -```bash -claude --plugin-dir ./node_modules/@copilotkit/llmock -``` - -Same result, no marketplace needed. Good for trying it out. - -### Option 3: Add directory - -```bash -claude --add-dir ./node_modules/@copilotkit/llmock -``` - -The skill appears as `/write-fixtures` for the session. 
- -### Option 4: Copy to your project - -```bash -mkdir -p .claude/commands -cp node_modules/@copilotkit/llmock/.claude/commands/write-fixtures.md .claude/commands/ -``` - -Permanently available as `/write-fixtures` in your project. Commit to share with your team. - -## Future Direction - -Areas where llmock could grow, and explicit non-goals for the current scope. - -### WebSocket APIs - -- **Audio and multimodal**: OpenAI Realtime API audio buffers, voice activity detection, and audio transcription are not implemented. Gemini Live audio/video input and output are similarly out of scope. Only text and tool call paths are supported over WebSocket. -- **Binary WebSocket frames**: Only text frames are processed; binary frames are silently ignored. -- **WebSocket compression**: `permessage-deflate` is not supported. -- **Session persistence**: Realtime and Gemini Live sessions exist only for the lifetime of a single WebSocket connection. There is no cross-connection session resumption. - -### Fixtures - -- **Request metadata in predicates**: Predicate functions receive only the `ChatCompletionRequest`, not HTTP headers, method, or URL. -- **Multi-turn conversation state**: Fixtures are stateless — there is no built-in way to sequence responses across multiple requests in a conversation. -- **Validation on load**: Schema validation is available via `--validate-on-load` (CLI) and `validateFixtures()` (programmatic API), but it is opt-in and not enabled by default. -- **Inheritance and aliasing**: No `$ref` or `extends` mechanism for fixture reuse across files. - -### Testing - -- **Live API drift detection**: The `drift` test suite runs against real OpenAI, Anthropic, and Gemini APIs to catch response format drift. See [DRIFT.md](DRIFT.md) for details on the three-layer triangulation approach, how to run tests, and how to fix detected drift. Runs daily in CI; requires API keys. -- **Token counts**: Usage fields are always zero across all providers. -- **Vision/image content**: Image content parts are not handled by any provider. +**[llmock.com/docs.html](https://llmock.com/docs.html)** ## Real-World Usage -[CopilotKit](https://github.com/CopilotKit/CopilotKit) uses llmock across its test suite to verify AI agent behavior across multiple LLM providers without hitting real APIs. The tests cover streaming text, tool calls, and multi-turn conversations across both v1 and v2 runtimes. - -See the [CopilotKit test suite](https://github.com/CopilotKit/CopilotKit/search?q=llmock&type=code) for real-world examples of llmock in action. +[CopilotKit](https://github.com/CopilotKit/CopilotKit) uses llmock across its test suite to verify AI agent behavior across multiple LLM providers without hitting real APIs. ## License diff --git a/docs/index.html b/docs/index.html index c8c3dbd0..ca177036 100644 --- a/docs/index.html +++ b/docs/index.html @@ -945,16 +945,6 @@ transform: translateY(0); } } - @keyframes sseLine { - from { - opacity: 0; - transform: translateX(-8px); - } - to { - opacity: 1; - transform: translateX(0); - } - } @keyframes blink { 50% { opacity: 0; @@ -1046,8 +1036,8 @@