Objective

Replace the Vercel AI SDK (@ai-sdk/*, ai, @openrouter/ai-sdk-provider) with @mariozechner/pi-ai as the LLM provider layer for grader/rubric/agentv-provider call sites in packages/core. AgentV already depends on @mariozechner/pi-coding-agent (which sits on top of pi-ai), so this consolidates onto a single LLM stack and removes ~6 SDK packages from the dependency graph.
Background
packages/core/src/evaluation/providers/ai-sdk.ts (559 lines) wraps Vercel AI SDK to expose 5 providers: OpenAIProvider, AzureProvider, OpenRouterProvider, AnthropicProvider, GeminiProvider. All converge on a single generateText() call per invoke() (stateless RPC shape).
pi-ai covers the same provider surface natively:

- OpenAI, Azure OpenAI (Responses), Anthropic, Google, OpenRouter, plus Vertex/Bedrock/Mistral/Groq/xAI/Cerebras/etc.
- Dedicated provider files in pi-ai/dist/providers/, including azure-openai-responses.js.
- Unified complete(model, context) and stream(model, context) APIs.
- A reasoning: 'minimal' | 'low' | 'medium' | 'high' | 'xhigh' thinking control.
- OpenRouter is a first-class supported provider with its own routing config (openRouterRouting).

Call sites in scope

- packages/core/src/evaluation/providers/ai-sdk.ts — 5 provider classes
- packages/core/src/evaluation/providers/agentv-provider.ts — built-in grader provider
- packages/core/src/evaluation/graders/llm-grader.ts — generateText() + filesystem tool() definitions
- packages/core/src/evaluation/graders/composite.ts — generateText()
- packages/core/src/evaluation/generators/rubric-generator.ts — generateText()
- packages/core/src/evaluation/providers/index.ts — registry wiring

Design latitude

- Keep the existing Provider.invoke(request) -> response contract. Implement provider classes as thin adapters over pi-ai's complete(). Don't refactor call sites to a session-based shape — that's a much larger change, and pi-coding-agent's session model is heavier than graders need.
- Tool definitions move from Zod (ai-sdk's tool()) to TypeBox (pi-ai's Type.Object()). This is a mechanical port for the small set of filesystem tools in llm-grader.
- Anthropic thinking budget: today the config takes a numeric budgetTokens; pi-ai exposes a 5-bucket reasoning enum. Map numeric budgets to the closest bucket and document the change.
- Retry/backoff: ai-sdk.ts lines 520–559 contain a custom exponential-backoff loop. Either preserve it as a wrapper around complete() or accept pi-ai's defaults.
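The numeric-budget-to-bucket mapping could look like the following sketch. The bucket names mirror pi-ai's reasoning enum; the token thresholds are illustrative assumptions, not values taken from either library.

```typescript
// Map a legacy numeric Anthropic thinking budget (budgetTokens) onto
// pi-ai's 5-bucket reasoning enum. The threshold values below are
// assumptions chosen for illustration; tune them against real grader runs.
type ReasoningEffort = 'minimal' | 'low' | 'medium' | 'high' | 'xhigh';

function budgetToReasoning(budgetTokens: number): ReasoningEffort {
  if (budgetTokens <= 1_024) return 'minimal';
  if (budgetTokens <= 4_096) return 'low';
  if (budgetTokens <= 16_384) return 'medium';
  if (budgetTokens <= 32_768) return 'high';
  return 'xhigh';
}
```

Because the mapping is lossy, logging the original budgetTokens alongside the chosen bucket would make the behavior change auditable in eval output.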
Spike scope (first PR)
Single-provider PoC to de-risk the migration before scoping the full port:
- Port OpenAIProvider only to a pi-ai adapter behind the existing Provider interface.
- Leave the other 4 providers and all consumers unchanged.
- Run the existing grader-score baselines (scripts/check-grader-scores.ts) against an OpenAI-targeted eval and confirm scores stay within range.
- Capture findings on: token-usage shape mapping, retry-loop placement, tool-definition port complexity, and any Azure/Anthropic-specific gotchas observed while reading pi-ai source.
The spike PR is not intended to remove @ai-sdk/openai from package.json — both libraries co-exist for the duration of the spike.
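For the spike, the adapter can stay a thin shim: invoke() builds a pi-ai context, calls complete(), and maps the result back to the existing response shape. A minimal sketch, assuming the complete(model, context) signature described above — the ProviderRequest/ProviderResponse and completion shapes here are illustrative stand-ins, not the real interfaces in packages/core. The complete function is injected so the adapter can be unit-tested without network access.

```typescript
// Illustrative stand-ins for AgentV's Provider contract; the real
// ProviderRequest/ProviderResponse types in packages/core have more fields.
interface ProviderRequest { systemPrompt?: string; prompt: string; }
interface ProviderResponse {
  text: string;
  usage: { inputTokens: number; outputTokens: number };
}

// Assumed minimal shape of a pi-ai completion; the issue notes pi-ai
// reports usage as { input, output, cost }.
interface PiCompletion {
  content: string;
  usage: { input: number; output: number; cost?: number };
}
type CompleteFn = (
  model: string,
  context: { systemPrompt?: string; messages: { role: 'user'; content: string }[] },
) => Promise<PiCompletion>;

class PiOpenAIProvider {
  constructor(private model: string, private complete: CompleteFn) {}

  // Keep the existing invoke(request) -> response contract: one stateless
  // completion per call, no session state.
  async invoke(request: ProviderRequest): Promise<ProviderResponse> {
    const result = await this.complete(this.model, {
      systemPrompt: request.systemPrompt,
      messages: [{ role: 'user', content: request.prompt }],
    });
    return {
      text: result.content,
      // Map pi-ai's { input, output } usage onto the ai-sdk-style field
      // names current consumers read; cached/reasoning token fields are dropped.
      usage: { inputTokens: result.usage.input, outputTokens: result.usage.output },
    };
  }
}
```

A fake complete makes the adapter exercisable in tests before the real pi-ai call is wired in, which also de-risks the usage-shape mapping called out under risks.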
Acceptance signals (full migration)
- All 5 provider classes in ai-sdk.ts reimplemented over pi-ai; llm-grader, composite, rubric-generator, and agentv-provider updated.
- @ai-sdk/anthropic, @ai-sdk/azure, @ai-sdk/google, @ai-sdk/openai, ai, @openrouter/ai-sdk-provider removed from all package.json files.
- All grader-score baselines under examples/**/*.grader-scores.yaml pass.
- At least one live eval per provider (OpenAI, Azure, Anthropic, Google, OpenRouter) produces the correct scores[].type, scores in the expected range, and non-zero token usage.
- Anthropic thinking-budget config: the numeric → bucket mapping documented in skill files and the code header.
- No regressions in bun run test or bun run validate:examples.
Risks / unknowns
- Token-usage object shape. pi-ai returns {input, output, cost}; ai-sdk surfaces {inputTokens, outputTokens, cachedInputTokens, reasoningTokens}. JSONL output and Studio aggregation may need adjustment if any consumer relies on the cached/reasoning fields.
- Azure Responses API parity. The useDeploymentBasedUrls + apiFormat: 'responses' switching needs verification against a real deployment.
- Anthropic thinking. Going from a numeric budget to a 5-bucket enum is a lossy API change for anyone setting fine-grained budgets — call it out as a behavior change in the PR.
- Retry semantics. ai-sdk.ts has bespoke backoff; pi-ai's behavior differs. Decide: wrap or replace.
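If the bespoke backoff is kept, it can wrap the pi-ai call generically rather than living inside each provider. A sketch of an exponential-backoff wrapper, independent of either SDK; the attempt count and delay values are illustrative, not the ones in ai-sdk.ts lines 520–559.

```typescript
// Generic exponential backoff around any async call (e.g. pi-ai's complete()).
// maxAttempts and baseDelayMs are illustrative defaults, not values taken
// from the existing retry loop in ai-sdk.ts.
async function withBackoff<T>(
  fn: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt === maxAttempts - 1) break;
      // Delays double each attempt: 500ms, 1000ms, 2000ms, ...
      const delay = baseDelayMs * 2 ** attempt;
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}
```

Usage would be a one-line wrap at each call site, e.g. `withBackoff(() => complete(model, context))`, which keeps the retry decision orthogonal to the adapter classes.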
Non-goals
- No streaming. Current call sites are non-streaming; don't add streaming as part of this migration.
- No move to pi-coding-agent's session model — keep grader calls stateless.
- No change to the public Provider interface or the ProviderRequest/ProviderResponse shapes consumed elsewhere in core.
- No new providers exposed by pi-ai (Bedrock, Vertex, Mistral, etc.) in this issue — separate work.