refactor(core): migrate LLM provider layer from Vercel AI SDK to pi-ai #1206
Conversation
Capture the actual call graph before any provider port: graders consume provider.asLanguageModel() (Vercel LanguageModel) directly, not provider.invoke(), so the migration needs either a Vercel LanguageModelV2 shim over pi-ai (Path A) or a richer Provider API that drops asLanguageModel (Path B). Document the trade-offs so the spike implementation path is decided before code lands.

Refs #1205
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Pre-push lint was failing on a Biome organizeImports rule for targets-validator.ts (introduced in #1203). Reorder the imports so the lint passes — unblocks pushing from this branch.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Deploying agentv with Cloudflare Pages

| | |
| --- | --- |
| Latest commit: | 9f28c3f |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://b43d7512.agentv.pages.dev |
| Branch Preview URL: | https://refactor-1205-pi-ai-spike.agentv.pages.dev |
Drop asLanguageModel() from the Provider interface; enrich Provider.invoke() with optional `tools` + `maxSteps` and `steps` in the response so it covers the hardest consumer (llm-grader built-in agent mode). Tools use JSON Schema on the wire (provider-library-neutral). Document consumer migration order (simplest first), provider port order, and open questions (Anthropic thinking budget mapping, retry placement, token-usage shape).

Refs #1205
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…p 1) First consumer + first provider on Path B of #1205:

- OpenAIProvider.invoke() now calls @mariozechner/pi-ai's complete() instead of Vercel AI SDK's generateText. asLanguageModel() still returns the Vercel model so llm-grader, composite, and agentv-provider keep working until later steps migrate them.
- rubric-generator.ts switches from provider.asLanguageModel() + generateText() to provider.invoke(). It is the simplest consumer (single-shot, no tools) and validates the new shape end-to-end.
- pi-ai is loaded via dynamic import + `any` casts, mirroring the pattern in pi-coding-agent.ts:250 — pi-ai's published d.ts files do not statically resolve named exports under NodeNext or Bundler module resolution.
- @mariozechner/pi-ai added as a regular dependency (was transitive via the pi-coding-agent peer dep).
- chatPromptToPiContext only handles system + user roles; assistant / tool / function paths throw with a pointer to #1205. YAGNI for step 1 — later consumers (llm-grader multi-turn, tools) will add what they need.
- targets.test.ts: the openai test now mocks pi-ai's complete/getModel and asserts those are called instead of ai-sdk's generateText.

Refs #1205
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…adapter Address three review findings on the pi-ai adapter (#1205 step 1):

1. chatPromptToPiContext now passes assistant messages through and folds tool/function roles into prefixed assistant text, mirroring the Vercel path's toModelMessages. Previously, turn 2+ of any multi-turn eval against an openai target threw on the prior turn's assistant message.
2. resolvePiModel falls back to https://api.openai.com/v1 for the openai provider when getModel misses and no baseUrl is configured, and throws a clear error otherwise. An empty baseUrl was forwarded into pi-ai's OpenAI client and failed opaquely.
3. mapPiResponse omits costUsd when pi-ai reports 0 (typically the fallback model descriptor with no pricing) instead of surfacing 0 as "free". Matches the Vercel path, which never sets costUsd.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
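A minimal sketch of the role folding in finding 1, assuming simplified message shapes — the type names, the exact prefix format, and the separate systemPrompt slot are illustrative, not pi-ai's real API:

```typescript
// Illustrative message shapes; real ones live in pi-ai and our types.ts.
type ChatMessage = {
  role: 'system' | 'user' | 'assistant' | 'tool' | 'function';
  content: string;
};
type PiMessage = { role: 'user' | 'assistant'; content: string };

function chatPromptToPiContext(prompt: ChatMessage[]): {
  systemPrompt?: string;
  messages: PiMessage[];
} {
  let systemPrompt: string | undefined;
  const messages: PiMessage[] = [];
  for (const msg of prompt) {
    switch (msg.role) {
      case 'system':
        // System text travels separately from the message list.
        systemPrompt = systemPrompt ? `${systemPrompt}\n${msg.content}` : msg.content;
        break;
      case 'user':
        messages.push({ role: 'user', content: msg.content });
        break;
      case 'assistant':
        // Pass assistant turns through so multi-turn evals keep their history.
        messages.push({ role: 'assistant', content: msg.content });
        break;
      case 'tool':
      case 'function':
        // Fold tool/function results into prefixed assistant text, mirroring
        // the Vercel path's toModelMessages.
        messages.push({ role: 'assistant', content: `[${msg.role} result] ${msg.content}` });
        break;
    }
  }
  return { systemPrompt, messages };
}
```

The key property is that no role throws anymore: every incoming turn maps to something pi-ai can accept.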
Step 1 UAT — green

Ran a live end-to-end against OpenRouter (OpenAI-compatible endpoint) at this commit:

```ts
const provider = resolveAndCreateProvider({
  name: 'openai-via-openrouter',
  provider: 'openai', // routes through new pi-ai-backed OpenAIProvider
  endpoint: 'https://openrouter.ai/api/v1',
  api_key: '${{ OPENROUTER_API_KEY }}',
  model: 'openai/gpt-4o-mini',
}, env);

const rubrics = await generateRubrics({
  criteria: 'A function that returns the sum of two numbers, handling integer overflow correctly.',
  provider,
});
```

Output confirms the new invoke() path works end-to-end.

"Red" evidence is the test the unit-test mock change captures — the previous assertion that ai-sdk's generateText was called.
Make pi-ai a first-class static dependency, like ai-sdk:

- Add @sinclair/typebox as a direct dep so pi-ai's transitive types resolve.
- Add packages/core/src/evaluation/providers/pi-ai-shim.d.ts that augments '@mariozechner/pi-ai' with the subset we use. Pi-ai's published d.ts has cross-module re-exports that don't surface at the package root under NodeNext (and Bundler) — only direct primary declarations leak through. Re-declaring just what we call gives us static imports + real types.
- ai-sdk.ts: replace `let piAiSdk: any | null` + lazy `loadPiAi()` + `as any` casts with plain top-level imports of `complete`, `getModel`, `registerBuiltInApiProviders`, and the Model/Message/AssistantMessage types. registerBuiltInApiProviders() runs once at module load.

The previous dynamic-import + any-cast pattern was inherited from pi-coding-agent.ts, where pi-ai is an optional peer dep. Now that pi-ai is a real dep, that workaround was earning nothing and costing readability — this PR drops it across the new code path. (pi-coding-agent.ts itself keeps the lazy-load because the pi-coding-agent peer dep can be uninstalled.)

Refs #1205
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Lean into pi-ai's design rather than papering over it. Pi-ai treats Model as plain data and apiKey as a per-call StreamOptions field — model and credentials are orthogonal. Reflect that in the adapter:

- Add a `private readonly piModel: PiModel` field, resolved once in the constructor via resolvePiModel().
- invoke() passes the prebuilt model + apiKey to invokePiAi(); no per-call registry lookup or field merge.
- InvokePiAiOptions shrinks from 7 fields to 5 — model is data; the call needs the model + auth + the request.

The previous shape rebuilt the model on every invoke from raw config strings, conflating "what model" with "construction details" at the call site. The new shape is both more efficient (resolve once) and more faithful to pi-ai's API: a Model object you carry around, an apiKey you pass when you actually call.

Refs #1205
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
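The resolve-once shape can be sketched as follows — the inlined descriptor resolution and the invokePiAi stand-in are illustrative; the real resolvePiModel() consults pi-ai's registry:

```typescript
// Illustrative model descriptor; pi-ai's real Model type carries more fields.
type PiModel = { provider: string; id: string; baseUrl: string };

// Stand-in for the shared adapter entry point described in the commit.
function invokePiAi(opts: { model: PiModel; apiKey: string; request: { prompt: string } }) {
  return { model: opts.model.id, text: `echo: ${opts.request.prompt}` };
}

class OpenAIProvider {
  private readonly piModel: PiModel;

  constructor(private readonly config: { model: string; baseUrl?: string; apiKey: string }) {
    // Resolved exactly once at construction — no per-call registry lookup.
    this.piModel = {
      provider: 'openai',
      id: config.model,
      baseUrl: config.baseUrl ?? 'https://api.openai.com/v1',
    };
  }

  invoke(request: { prompt: string }) {
    // Model is data carried on the instance; auth is passed at call time.
    return invokePiAi({ model: this.piModel, apiKey: this.config.apiKey, request });
  }
}
```

The design choice mirrors pi-ai itself: the Model object is plain data you hold, and credentials travel with each call.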
The CLI bundles @agentv/core (noExternal), and core now imports pi-ai directly. tsup keeps pi-ai external in the bundle (correct — it has dynamic requires), so the published CLI needs pi-ai resolvable at runtime. apps/cli/package.json wasn't listing it, which surfaced as "Cannot find module '@mariozechner/pi-ai'" in CI's Validate Evals job. Reproduces locally with `bun apps/cli/dist/cli.js validate ...`; passes after adding the dep.

Refs #1205
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
End-to-end UAT — orchestrator path through OpenAI provider

Ran the real eval pipeline:

```shell
OPENAI_ENDPOINT=https://openrouter.ai/api/v1 \
OPENAI_API_KEY=$OPENROUTER_API_KEY \
OPENAI_MODEL=openai/gpt-4o-mini \
LLM_TARGET=openai AGENT_TARGET=openai GRADER_TARGET=openai \
bun apps/cli/src/cli.ts eval examples/features/rubric/evals/dataset.eval.yaml \
  --output /tmp/spike-uat
```

Eval ran clean — 5/5 test cases, no orchestrator errors, mean score 78%, one expected grader-score baseline (regression gate per AGENTS.md).

All three rubric-grader scores from the new pi-ai-backed OpenAI provider land in the ranges established before the migration. No drift.

JSONL output shape — no regression. Sampled all 5 result rows: no new fields appearing where there weren't any; no fields disappearing that should be present.

Evidence checklist (AGENTS.md "Completing Work — E2E Checklist")

Now genuinely ready for review.
Extend the Provider interface so invoke() can replace asLanguageModel() across
every grader call site. The new fields are additive — single-shot consumers
keep their current shape.
types.ts:
- Add ProviderTool: { name, description, parameters: JsonObject (JSON Schema),
execute(input): unknown }
- ProviderRequest: optional tools, maxSteps
- ProviderResponse: optional steps: { count, toolCallCount }
ai-sdk.ts (invokePiAi):
- Run the agent loop when tools are provided: model turn → execute tool calls
→ next model turn, until the model stops requesting tools or maxSteps hits.
- Aggregate token usage and cost across all turns; surface step + tool counts
on the response.
- Tool parameters flow as JSON Schema — pi-ai's openai-completions converter
passes them through to the wire format unchanged.
pi-ai-shim.d.ts:
- Declare Tool, Context.tools so the loop typechecks.
- Declare ToolCall.thoughtSignature (set by some providers, optional).
No consumer changes yet; next commit migrates llm-grader / composite /
agentv-provider / rubric-generator off asLanguageModel onto invoke().
Refs #1205
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
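The additive fields above might look roughly like this sketch — the field names follow the commit message, while the `JsonObject` alias, the `prompt`/`text` fields, and the exact optionality are assumptions:

```typescript
type JsonObject = Record<string, unknown>;

// Tools carry JSON Schema on the wire — provider-library-neutral, no zod.
interface ProviderTool {
  name: string;
  description: string;
  parameters: JsonObject; // JSON Schema for the tool's input
  execute(input: JsonObject): unknown;
}

interface ProviderRequest {
  prompt: string;
  tools?: ProviderTool[]; // additive: single-shot consumers omit these
  maxSteps?: number;      // bound on the agent loop
}

interface ProviderResponse {
  text: string;
  steps?: { count: number; toolCallCount: number }; // surfaced by agent mode
}
```

Because every new field is optional, existing single-shot callers of invoke() typecheck unchanged.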
Every grader call site now goes through Provider.invoke(). The Vercel
LanguageModel branches are gone; provider.invoke() is the single API.
composite.ts:
- Drop the asLanguageModel + generateText branch; rely on provider.invoke()
(which used to be the fallback path).
llm-grader.ts:
- LLM-judge mode (generateStructuredResponse): single invoke() call. Image
inputs flow as ProviderRequest.images instead of ai-sdk image parts.
- Built-in agent mode (evaluateBuiltIn): replace generateText({tools, stopWhen})
with provider.invoke({tools, maxSteps}); read step + tool counts off
ProviderResponse.steps.
- Filesystem tools (createFilesystemTools) now return ProviderTool[] with
JSON Schema parameters — no zod, no ai-sdk tool() helper.
- Drop ai-sdk imports (generateText, stepCountIs, tool); drop toAiSdkImageParts.
agentv-provider.ts:
- Was: throws on invoke(), exposes Vercel asLanguageModel().
- Now: parses provider:model into pi-ai (providerName, apiId), resolves the
PiModel in the constructor, and routes invoke() through invokePiAi(). API
keys come from pi-ai's env-var fallback (OPENAI_API_KEY, ANTHROPIC_API_KEY,
GOOGLE_GENERATIVE_AI_API_KEY, ...).
ai-sdk.ts:
- Export resolvePiModel, invokePiAi, ProviderDefaults so other providers can
be ported without copying the adapter.
- InvokePiAiOptions.apiKey is now optional (agentv provider relies on env
fallback).
- invokePiAi handles the agent loop: tool calls → execute → next model turn,
bounded by maxSteps. Aggregates token usage and cost across turns.
types.ts:
- ProviderRequest.images: optional ContentImage[] for multimodal grader inputs.
Tests:
- agentv-provider.test.ts: rewritten — mocks pi-ai, asserts the new
provider:model → (providerName, modelId) routing and that invoke() calls
pi-ai's complete().
- llm-grader-multimodal.test.ts: rewritten — verifies images flow through
ProviderRequest.images instead of ai-sdk message parts.
Refs #1205
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
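The agent loop invokePiAi now runs can be sketched as a synchronous stand-in — here a plain function plays the role pi-ai's complete() plays in the real adapter, and the Turn/Tool shapes are illustrative:

```typescript
type ToolCall = { name: string; input: Record<string, unknown> };
type Turn = { text: string; toolCalls: ToolCall[]; tokens: number };
type Tool = { name: string; execute(input: Record<string, unknown>): unknown };

// Model turn → execute requested tools → next model turn, bounded by maxSteps.
// The real loop awaits pi-ai's complete() at each turn; this sketch is sync.
function runAgentLoop(
  model: (history: string[]) => Turn,
  tools: Tool[],
  maxSteps: number,
) {
  const history: string[] = [];
  let tokens = 0;
  let toolCallCount = 0;
  let count = 0;
  let text = '';
  while (count < maxSteps) {
    count++;
    const turn = model(history);
    tokens += turn.tokens; // aggregate usage across all turns
    text = turn.text;
    if (turn.toolCalls.length === 0) break; // model stopped requesting tools
    for (const call of turn.toolCalls) {
      toolCallCount++;
      const tool = tools.find((t) => t.name === call.name);
      // Feed tool results back into the next model turn.
      history.push(`[${call.name}] ${String(tool?.execute(call.input))}`);
    }
  }
  return { text, usage: { tokens }, steps: { count, toolCallCount } };
}
```

The step and tool-call counters are what surface on ProviderResponse.steps for the built-in agent grader.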
Complete the #1205 migration. ai-sdk.ts no longer imports from @ai-sdk/* or 'ai'; all five direct-API providers (OpenAI, Azure, OpenRouter, Anthropic, Gemini) route through the same invokePiAi() adapter.

Provider classes (ai-sdk.ts):
- All five resolve a pi-ai PiModel in their constructor and delegate invoke() to invokePiAi.
- The Vercel `this.model` field, createOpenAI()/createAzure()/etc., and asLanguageModel() are gone.
- AnthropicProvider passes thinkingBudget through pi-ai's Anthropic-specific options as { thinkingEnabled, thinkingBudgetTokens } — no lossy bucket mapping for older models. Newer models (Opus/Sonnet 4.6) ignore it in favour of adaptive thinking, same as before.
- AzureProvider routes through pi-ai's azure-openai-responses for both apiFormat values. Behavior change: the legacy Vercel path used /chat/completions for apiFormat='chat' (the default); pi-ai uses /responses for everything. Functionally equivalent for grader use cases. Users who hit a deployment that only exposes /chat/completions can route through `provider: openai` with a deployment-scoped baseURL instead.

Provider interface (types.ts):
- Drop asLanguageModel?(); the Vercel LanguageModel reference is gone.

invokePiAi:
- Now accepts providerOptions: Record<string, unknown> for provider-specific knobs (Anthropic thinking, Azure URL config). Pi-ai's ProviderStreamOptions = StreamOptions & Record<string, unknown> forwards these to the underlying provider impl.

Tests:
- targets.test.ts: dropped the @ai-sdk/* / ai / @openrouter/ai-sdk-provider module mocks. createProvider tests now assert pi-ai routing (providerName + apiId + baseUrl + provider-specific options).

Dependencies removed:
- packages/core: @ai-sdk/anthropic, @ai-sdk/azure, @ai-sdk/google, @ai-sdk/openai, ai
- apps/cli: @ai-sdk/openai
- root: @openrouter/ai-sdk-provider

Verification:
- Build / typecheck / lint / 1741 unit tests all green.
- Live eval: examples/features/rubric/evals/dataset.eval.yaml run with target=openai routed via OpenRouter. All 3 grader-score baselines pass:
  - ✓ code-quality-multi-eval / rubrics: 0.5 ∈ [0.3, 1]
  - ✓ code-explanation-simple / rubrics: 1.0 ∈ [0.8, 1]
  - ✓ technical-writing-detailed / rubrics: 1.0 ∈ [0.8, 1]

Refs #1205
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… check Two cleanups closing out the #1205 migration:

1. Rename providers/ai-sdk.ts → providers/llm-providers.ts. The file is no longer the Vercel AI SDK adapter; it owns the five direct-API LLM provider classes (OpenAI, OpenRouter, Anthropic, Gemini, Azure) and delegates to pi-ai. Keeping the old name was misleading. `llm-providers.ts` also distinguishes it from the agent providers (claude.ts, codex.ts, etc.) in the same directory. Updated callers in agentv-provider.ts and providers/index.ts.
2. Add scripts/check-pi-ai-shim.ts + a pre-push prek hook + a bun script alias. The shim re-declares pi-ai's public surface so our static imports resolve under NodeNext (pi-ai's cross-module re-exports don't bubble up through `export * from`). If pi-ai ships a breaking change — a renamed field, a removed function — TypeScript stays happy against the shim while the runtime drifts. The check parses both d.ts files (regex + brace counting) and confirms every interface name + field name in our shim exists upstream, and likewise for exported function names. Field types are not compared — too much surface for too little value; type-level breakage would surface in llm-providers.ts compilation, and runtime presence is exercised by the unit-test suite.

Wired into .pre-commit-config.yaml as `check-pi-ai-shim` (pre-push) and exposed as `bun run check:pi-ai-shim` for manual runs. Verified the failure path by injecting a fake field into the shim — the script exits non-zero with a clear "interface X declares field Y not in upstream" message.

Refs #1205
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The pi-ai-shim.d.ts wasn't working around a pi-ai bug — it was working around a stale `declare module '@mariozechner/pi-ai'` block in our own src/types/pi-sdk.d.ts that declared just `getModel(...): unknown`. That stub was added when pi-ai was an optional peer dep accessed via dynamic import in pi-coding-agent.ts. When pi-ai became a direct dep with its own published types, the stub started colliding: TypeScript merged our `declare module` block with the real one and shadowed/dropped most of pi-ai's exports (complete, Model, AssistantMessage, ...) — but only when the full src/ tree was compiled, which is why it didn't reproduce in a minimal project. Confirmed the diagnosis by removing the stub block and watching pi-ai's imports resolve cleanly with no other changes. The pi-ai-shim.d.ts and the @sinclair/typebox direct dep we added were both unnecessary workarounds for this self-inflicted issue.

Changes:
- src/types/pi-sdk.d.ts: drop the `declare module '@mariozechner/pi-ai'` block entirely. Keep the pi-coding-agent block (still a real optional peer-dep stub). The header comment now warns against re-adding a pi-ai block.
- src/evaluation/providers/pi-ai-shim.d.ts: deleted.
- src/evaluation/providers/llm-providers.ts: import pi-ai's real types. Add boundary casts where pi-ai's typed registry meets our runtime strings (PiKnownProvider for getModel's provider arg, `as never` for modelId, `as unknown as PiTool[]` for our JSON-Schema tools fed into pi-ai's TypeBox-typed parameters slot — pi-ai's openai-completions converter passes parameters through as JSON Schema unchanged).
- packages/core/package.json: drop the @sinclair/typebox direct dep.
- scripts/check-pi-ai-shim.ts: deleted (no shim to validate).
- .pre-commit-config.yaml: drop the check-pi-ai-shim hook.
- package.json: drop the check:pi-ai-shim script.

Verified: typecheck / lint / 1741 unit tests / live UAT through OpenRouter all green with no shim and pi-ai's real types in use.

Refs #1205
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… docs
Three small follow-ups on the pi-ai migration:
1. llm-grader.ts: comments at line 208/474/478 still referenced "AI SDK
generateText" / "Vercel AI SDK generateText()". Updated to describe the
actual code path: provider.invoke() with filesystem tools, agent loop
driven by pi-ai through the agentv provider.
2. llm-providers.ts: `resolvePiModel`'s synthesized fallback Model used a
single hardcoded `contextWindow: 128000 / maxTokens: 16384` for every
unknown (provider, modelId). These fields are metadata only — pi-ai
uses them for cost telemetry, not to cap the API call (the real
request size comes from StreamOptions.maxTokens, which we omit unless
the caller set request.maxOutputTokens). Replaced with per-provider
defaults via `defaultModelMetadata()`:
- openai / azure-openai-responses: 400K / 128K (gpt-5 family)
- anthropic: 200K / 32K (claude 4.x)
- google: 1M / 64K (gemini 2.5)
- openrouter: 200K / 32K
- default: 128K / 16K
Bump these if a custom gateway routes to bigger windows.
3. llm-providers.ts: tightened the two boundary casts with one-line "why
safe" explanations citing the upstream proof:
- `as unknown as PiTool[]` — pi-ai/dist/providers/openai-completions.js
convertTools() forwards `parameters` unchanged ("TypeBox already
generates JSON Schema").
- `piGetModel(... as PiKnownProvider, ... as never)` —
pi-ai/dist/models.js getModel() is a plain Map lookup that accepts
any string and returns undefined on miss; the casts satisfy the
generic constraint without changing runtime behavior. Also fixed
the comment's "throws otherwise" → returns undefined, and made the
cast `PiModel | undefined` to match.
Verified: typecheck / lint / 1741 unit tests / live UAT through OpenRouter
all green.
Refs #1205
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
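The cast rationale for the registry lookup can be illustrated with a self-contained stand-in, assuming the commit's reading of getModel() as a plain Map lookup that accepts any string and returns undefined on a miss (the registry contents here are made up for the example):

```typescript
// Hypothetical registry shaped like a plain Map, mirroring the "Map lookup,
// undefined on miss" behavior the comment cites for pi-ai's getModel().
const registry = new Map<string, { provider: string; id: string }>([
  ['openai/gpt-4o-mini', { provider: 'openai', id: 'gpt-4o-mini' }],
]);

function lookupModel(provider: string, modelId: string) {
  // No throw on miss — callers must handle undefined and synthesize a
  // fallback descriptor, which is what resolvePiModel does.
  return registry.get(`${provider}/${modelId}`);
}
```

Since the runtime accepts arbitrary strings, narrowing casts on the arguments only satisfy the generic constraint; they cannot change what the lookup returns.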
The per-provider defaultModelMetadata table was over-engineered. On the
complete()/streamOpenAICompletions code path we use, pi-ai only sets
max_tokens when the caller passes StreamOptions.maxTokens — model.maxTokens
is not consulted. Pi-ai's *simple* options builder
(simple-options.js:buildBaseOptions) does fall back to
Math.min(model.maxTokens, 32000) for the completeSimple/streamSimple path,
but we don't currently call that path.
Replace the switch statement with a universal { contextWindow: 128000,
maxTokens: 16384 } matching pi-coding-agent's ModelRegistry choice for
custom models — same numbers across both shims keeps behavior consistent
when callers eventually mix the two SDKs.
Comment now honestly describes pi-ai's actual maxTokens consumption: not
"metadata only", but "metadata on our path; would be a fallback ceiling
on the *Simple path we don't use".
Refs #1205
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
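The universal fallback after dropping the per-provider table is a one-liner; this sketch uses the numbers from the commit, while the surrounding function name and descriptor shape are illustrative:

```typescript
// Synthesized descriptor for any unknown (provider, modelId) pair. The
// numbers match pi-coding-agent's ModelRegistry choice for custom models.
function fallbackModelMetadata(provider: string, modelId: string) {
  // Metadata on our complete() path: pi-ai only sets max_tokens from
  // StreamOptions.maxTokens, not from these fields.
  return { provider, id: modelId, contextWindow: 128000, maxTokens: 16384 };
}
```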
…r custom providers
Pinned in both packages/core/package.json and apps/cli/package.json (the two places that consume pi-ai's runtime). 10 minor versions of upstream fixes and additions; no breaking changes for our adapter — the index.d.ts shape is unchanged on the named exports we use (complete, getModel, registerBuiltInApiProviders), and the Model / Tool / Message / AssistantMessage types still match our cast assumptions in llm-providers.ts.

Verified:
- typecheck / lint / 1741 unit tests all green
- live UAT: generateRubrics through OpenAIProvider routed at OpenRouter returns 6 valid rubrics

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…1214)

* fix(core): drop api_format from Azure targets; surface pi-ai errors

Two regressions from the Vercel AI SDK → pi-ai migration (#1206) broke Azure evals end-to-end whenever a target had api_format=chat (or unset on a config still defaulting to it):

1. The migration kept the api_format field, but pi-ai's azure-openai-responses provider always hits /openai/v1/responses. With chat-style defaults still in place, AgentV sent ?api-version=2024-12-01-preview to the Responses path, which Azure rejects with 400 "API version not supported." Every eval call failed.
2. invokePiAi never inspected pi-ai's stopReason. When pi-ai surfaced the 400 as { stopReason: 'error', errorMessage: '...', content: [] }, the adapter happily returned an empty assistant message, which downstream graders then reported as "Unexpected EOF" JSON parse failures — completely hiding the underlying HTTP error.

Fix:
- Remove api_format from Azure targets entirely. Pi-ai's adapter only exposes the Responses path, so a chat/responses switch on this provider has no effect. Default the Azure api version to v1 (matching the /openai/v1/responses path). Reject api_format on Azure targets at both validation and resolution time with a migration message pointing at `provider: openai` for chat-completions-only deployments. api_format remains supported on `provider: openai`.
- Throw from invokePiAi when pi-ai returns stopReason 'error' so failures reach the surface and withRetry can apply its status-based retry policy. The thrown message includes the parsed HTTP status so isRetryableError can decide correctly.

Also drops api_format from the repo's .agentv/targets.yaml and updates the Azure provider docs to document the migration path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* chore: biome formatter pass on self-update files

Pure whitespace fix from `biome format --write` on apps/cli/src/self-update.ts and apps/cli/test/self-update.test.ts. Pre-existing issue from #1213; the pre-push hook fails on these files independently of any other change, so fixing them here unblocks the bug-fix branch's push.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
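The stopReason check described in the fix can be sketched like this — the result shape and the status-parsing regex are assumptions; only the behavior (throw on stopReason 'error', include a parsed HTTP status) comes from the commit:

```typescript
// Illustrative subset of pi-ai's result shape.
type PiResult = { stopReason: string; errorMessage?: string; content: string[] };

function assertNotErrored(result: PiResult): PiResult {
  if (result.stopReason === 'error') {
    const msg = result.errorMessage ?? 'unknown pi-ai error';
    // Surface any parseable HTTP status so a status-based retry policy
    // (withRetry / isRetryableError in the real code) can decide.
    const status = /\b([45]\d{2})\b/.exec(msg)?.[1];
    throw new Error(
      status ? `pi-ai request failed (HTTP ${status}): ${msg}` : `pi-ai request failed: ${msg}`,
    );
  }
  return result;
}
```

The point is that an errored turn never flows downstream as an empty assistant message, which is what produced the misleading "Unexpected EOF" grader failures.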
Refs #1205. Migrates AgentV's LLM provider layer from Vercel AI SDK to @mariozechner/pi-ai.
Summary
Replaces all five Vercel AI SDK provider classes (OpenAI, Azure, OpenRouter, Anthropic, Google) with pi-ai-backed implementations. Removes the `asLanguageModel()` method from the Provider interface — all providers now go through a single `invoke()` path.

Key changes

- `ai-sdk.ts` (old provider file), all `@ai-sdk/*` and `ai` dependencies removed
- `llm-providers.ts` — five provider classes routing through `invokePiAi()`
- `asLanguageModel()` dropped; added `tools`, `maxSteps`, `images` to `ProviderRequest`
- `invokePiAi()` — model turn → tool execution → model turn, up to `maxSteps`
- `resolvePiModel()` falls back to 128K/16K (same as pi-coding-agent) for unknown models
- `AgentvProvider`: parses `provider:model` strings, delegates to shared `invokePiAi()`
- `max_output_tokens` for direct providers

Type safety

- `Model<TApi>` used correctly as `PiModelBase<PiApi>`
- No `: any`, no `@ts-ignore`

Test plan

Notes

- `docs/plans/1205-pi-ai-spike.md` to be cleaned up before merge (move relevant content to "refactor(core): migrate LLM providers from Vercel AI SDK to @mariozechner/pi-ai" #1205)

🤖 Generated with Claude Code