feat(sdk): support local OpenAI-compatible providers via providerOptions (closes #678) #693
Open
vraj00222 wants to merge 8 commits into
…e endpoints Adds two optional fields to AgentDefinition.providerOptions (and the backend OpenRouterProviderRoutingOptions + Zod schema) so an agent can direct its LLM calls at a custom OpenAI-compatible base URL (Ollama, LM Studio, self-hosted). The dispatch logic that consumes these fields lands in a follow-up commit. Also adds CODEBUFF_BASE_URL / CODEBUFF_PROVIDER_API_KEY env var constants and SDK getters for them. Part of issue CodebuffAI#678. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
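A minimal, dependency-free sketch of what the new optional fields and their validation might look like. The `ProviderOptions` interface and `validateProviderOptions` helper are hypothetical, and where the PR uses a Zod schema this sketch swaps in the WHATWG `URL` constructor to check `baseUrl`.

```typescript
// Hypothetical sketch: baseUrl/apiKey are the field names this PR adds;
// `order` stands in for the pre-existing OpenRouter routing config.
interface ProviderOptions {
  order?: string[]
  baseUrl?: string
  apiKey?: string
}

// Returns an error message, or null if the options are acceptable.
// The PR validates with a Zod schema; this sketch uses the URL constructor instead.
function validateProviderOptions(opts: ProviderOptions): string | null {
  // Both new fields are optional, so routing-only configs remain valid.
  if (opts.baseUrl === undefined) return null
  try {
    new URL(opts.baseUrl) // throws a TypeError on malformed input
    return null
  } catch {
    return `providerOptions.baseUrl is not a valid URL: ${opts.baseUrl}`
  }
}
```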
…rRequest When ModelRequestParams.customProvider.baseUrl is set, return an OpenAICompatibleChatLanguageModel pointed at that endpoint and flag the result with isCustomProvider: true. Bypasses both the Codebuff backend and the ChatGPT OAuth direct path. No metadataExtractor — direct calls don't flow through Codebuff cost accounting. Mirrors the existing ChatGPT-OAuth-direct branch pattern. Trailing slashes on baseUrl are trimmed. apiKey defaults to "codebuff" when absent (most local runtimes ignore it). Adds 5 unit tests covering the new branch, regression-tested against existing model-provider-free-mode tests. Part of issue CodebuffAI#678. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
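A sketch of the branch ordering this commit describes, assuming a simplified request shape. The real `getModelForRequest` returns an `OpenAICompatibleChatLanguageModel`; the hypothetical `selectModelRoute` below returns a plain descriptor instead so the logic runs without the AI SDK, and `chatGptOAuthEligible` stands in for the existing OAuth check. It uses the `"unused"` key placeholder adopted in the later cleanup commit.

```typescript
interface CustomProviderConfig {
  baseUrl?: string
  apiKey?: string
}

// Hypothetical, simplified stand-in for ModelRequestParams.
interface ModelRequestParamsSketch {
  model: string
  customProvider?: CustomProviderConfig
  chatGptOAuthEligible?: boolean // stand-in for the existing OAuth eligibility check
}

type ModelRoute =
  | { kind: 'custom'; baseURL: string; apiKey: string; modelId: string; isCustomProvider: true }
  | { kind: 'chatgpt-oauth'; modelId: string; isCustomProvider: false }
  | { kind: 'codebuff-backend'; modelId: string; isCustomProvider: false }

// Placeholder key: local runtimes ignore the Authorization header, and the
// user's real Codebuff key is never sent on the direct path.
const UNUSED_API_KEY = 'unused'

function selectModelRoute(params: ModelRequestParamsSketch): ModelRoute {
  const baseUrl = params.customProvider?.baseUrl
  // 1. Custom provider: checked first, so it bypasses both other paths.
  if (baseUrl) {
    return {
      kind: 'custom',
      baseURL: baseUrl.replace(/\/+$/, ''), // trim trailing slashes
      apiKey: params.customProvider?.apiKey ?? UNUSED_API_KEY,
      modelId: params.model,
      isCustomProvider: true,
    }
  }
  // 2. Existing ChatGPT-OAuth-direct path.
  if (params.chatGptOAuthEligible) {
    return { kind: 'chatgpt-oauth', modelId: params.model, isCustomProvider: false }
  }
  // 3. Default: the Codebuff backend.
  return { kind: 'codebuff-backend', modelId: params.model, isCustomProvider: false }
}
```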
…option
In promptAiSdkStream, resolve baseUrl/apiKey across three layers (agent
providerOptions > clientCustomProvider > env vars) and forward to
getModelForRequest. When the custom-provider path is active:
• maxRetries: 1 (one retry handles brief model-load stalls; no further
fallback — would violate user intent re: privacy / cost)
• Skip codebuff_metadata and OpenRouter routing keys in the request
body (same as the existing ChatGPT-OAuth-direct branch)
• Wrap connection failures and 404s in friendly messages pointing at
the configured URL and model
Plumbs CodebuffClient.{providerBaseUrl, providerApiKey} through
runOnce → getAgentRuntimeImpl, which wraps promptAiSdkStream with a
closure that injects clientCustomProvider on every call.
Adds an integration test documenting the precedence contract.
Resolves issue CodebuffAI#678 (implementation; smoke verification follows).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
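The three-layer precedence can be sketched with the sources-array-plus-`.find()` shape that the later cleanup commit describes. `resolveCustomProvider` and its parameter types are hypothetical; the env var names are the ones this PR adds.

```typescript
interface ProviderSource {
  baseUrl?: string
  apiKey?: string
}
type Env = Record<string, string | undefined>

// Hypothetical helper: agent providerOptions > client-level config > env vars.
function resolveCustomProvider(
  agentProviderOptions: ProviderSource | undefined,
  clientCustomProvider: ProviderSource | undefined,
  env: Env,
): ProviderSource | undefined {
  const sources: (ProviderSource | undefined)[] = [
    agentProviderOptions,
    clientCustomProvider,
    { baseUrl: env.CODEBUFF_BASE_URL, apiKey: env.CODEBUFF_PROVIDER_API_KEY },
  ]
  // The first layer that sets a baseUrl wins, and its apiKey comes with it,
  // so a key from one layer is never paired with a URL from another.
  const winner = sources.find((s) => s?.baseUrl)
  return winner ? { baseUrl: winner.baseUrl, apiKey: winner.apiKey } : undefined
}
```

Pairing the key with the winning layer means an agent-level `baseUrl` without an `apiKey` does not pick up `CODEBUFF_PROVIDER_API_KEY`; that the real code behaves this way in the edge case is an assumption drawn from the "paired with whichever URL wins" wording.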
Bun's fetch surfaces ECONNREFUSED as code='ConnectionRefused' with message "Unable to connect. Is the computer able to access the url?". Neither matched the original error-wrap regex. Now check both the raw message and the error.code property across Bun/Node patterns. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
UX layer on top of the providerOptions.baseUrl plumbing: lets a user toggle between local and cloud inference without restarting codebuff or editing agent files.
Subcommands:
• /local: show current status
• /local on: enable with the default Ollama URL (localhost:11434/v1)
• /local on <url>: enable with a specific URL
• /local set <url>: alias for `/local on <url>`
• /local off: disable and return to the Codebuff backend
• /local status: same as `/local`
Implementation mutates process.env.CODEBUFF_BASE_URL at runtime. The SDK reads this env var lazily on every promptAiSdkStream call, so changes take effect on the next request without rebuilding the CodebuffClient.
Agent-level providerOptions.baseUrl still wins: /local only affects agents that don't set their own baseUrl. The enable message says this so users aren't surprised.
29 unit tests cover parse/apply separation, all subcommands and aliases, URL validation, idempotent disable, the end-to-end toggle cycle, and verification that mutations are visible to the SDK env getter. All 2354 existing CLI tests still pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
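A minimal sketch of the apply half of the parse/apply split, assuming the environment is passed in as a plain record (in the CLI it would be `process.env`). `applyLocalAction`, the `LocalAction` type, and the message strings are hypothetical; `getCustomProviderBaseUrlFromEnv` reuses the SDK getter's name from the PR description, but its body here is only a stand-in showing the read-on-every-call behavior.

```typescript
// Hypothetical action shape produced by the /local parser.
type LocalAction = { kind: 'status' } | { kind: 'on'; url?: string } | { kind: 'off' }
type Env = Record<string, string | undefined>

const DEFAULT_OLLAMA_URL = 'http://localhost:11434/v1'

// Apply step: mutates the env record in place and returns a user-facing message.
function applyLocalAction(action: LocalAction, env: Env): string {
  switch (action.kind) {
    case 'on': {
      const url = action.url ?? DEFAULT_OLLAMA_URL
      env.CODEBUFF_BASE_URL = url
      return `Local inference on: ${url} (agents with their own providerOptions.baseUrl are unaffected)`
    }
    case 'off':
      // Deleting an absent key is a no-op, so disabling twice is harmless.
      delete env.CODEBUFF_BASE_URL
      return 'Local inference off: using the Codebuff backend'
    case 'status': {
      const url = env.CODEBUFF_BASE_URL
      return url ? `Local inference on: ${url}` : 'Local inference off'
    }
  }
}

// Stand-in for the SDK getter: reads the env on every call instead of caching,
// so a toggle is visible to the very next request.
function getCustomProviderBaseUrlFromEnv(env: Env): string | undefined {
  return env.CODEBUFF_BASE_URL || undefined
}
```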
…l with "model not found"
Without this, `/local on` redirected the URL to Ollama but the agent still sent its declared cloud model id (e.g. anthropic/claude-opus-4-7), which Ollama doesn't have, so every prompt failed with "model not found".
Adds a CODEBUFF_LOCAL_MODEL env var that overrides the agent's model when a custom provider is active. The SDK reads it lazily in promptAiSdkStream, mirroring how the URL is resolved. The override only applies to agents WITHOUT their own providerOptions.baseUrl; explicit per-agent config is left alone.
/local subcommands grew:
• /local on <model>: enable with the default URL + model
• /local on <url> <model>: enable with both
• /local model <name>: set the model after enabling
• /local model clear: drop the override
• /local list: probe Ollama's /api/tags for models
• /local <model-with-colon>: shortcut, same as `on <model>`
The status message also nudges users toward setting a model: without an override, it warns that cloud model ids will be sent to the local provider and fail.
40 unit tests cover parser shapes (URL only, model only, both, aliases, bare URL, bare model-with-colon shortcut), URL/model validation, set-model rejection when local is off, idempotent disable, and end-to-end env mutation via the SDK getter. All 3245 existing tests across CLI + SDK + agent-runtime still pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
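A sketch of a parser for the full `/local` grammar from the two commits above. `parseLocalCommand`, `parseOnArgs`, and the `LocalCommand` shape are hypothetical, and treating only `http://`/`https://` prefixes as URLs is an assumption about how `looksLikeUrl` decides; under that rule, `localhost:11434` without a scheme would be read as a model id.

```typescript
// Hypothetical result shape for the /local parser.
type LocalCommand =
  | { kind: 'status' }
  | { kind: 'on'; url?: string; model?: string }
  | { kind: 'off' }
  | { kind: 'model'; model: string }
  | { kind: 'model-clear' }
  | { kind: 'list' }
  | { kind: 'error'; message: string }

// Assumption: only an explicit http(s) scheme marks an argument as a URL.
function looksLikeUrl(arg: string): boolean {
  return /^https?:\/\//i.test(arg)
}

// Handles the arguments after `on` (or `set`, which requires a URL).
function parseOnArgs(args: string[], urlRequired: boolean): LocalCommand {
  if (args.length === 0) {
    return urlRequired ? { kind: 'error', message: 'usage: /local set <url>' } : { kind: 'on' }
  }
  const [first, second] = args
  if (args.length === 1) {
    if (looksLikeUrl(first)) return { kind: 'on', url: first }
    if (urlRequired) return { kind: 'error', message: `not a URL: ${first}` }
    return { kind: 'on', model: first }
  }
  if (args.length === 2 && !urlRequired && looksLikeUrl(first)) {
    return { kind: 'on', url: first, model: second }
  }
  return { kind: 'error', message: 'usage: /local on [url] [model]' }
}

function parseLocalCommand(input: string): LocalCommand {
  // Drop the leading "/local" token and split the rest on whitespace.
  const args = input.trim().split(/\s+/).slice(1)
  const [sub, ...rest] = args
  if (args.length === 0 || sub === 'status') return { kind: 'status' }
  switch (sub) {
    case 'off':
      return { kind: 'off' }
    case 'list':
      return { kind: 'list' }
    case 'on':
      return parseOnArgs(rest, false)
    case 'set':
      return parseOnArgs(rest, true)
    case 'model':
      if (rest.length !== 1) return { kind: 'error', message: 'usage: /local model <name>|clear' }
      return rest[0] === 'clear' ? { kind: 'model-clear' } : { kind: 'model', model: rest[0] }
  }
  // Shortcuts: a bare URL, or a bare model id containing a colon (e.g. llama3.1:8b), means `on`.
  if (looksLikeUrl(sub) || sub.includes(':')) return parseOnArgs(args, false)
  return { kind: 'error', message: `unknown subcommand: ${sub}` }
}
```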
When CODEBUFF_LOCAL_MODEL is set and overrides params.model, log it at INFO so users can grep their cli.jsonl and confirm the substitution is happening on outbound requests. Verified end-to-end against live Ollama: params.model='anthropic/claude-opus-4.7' + env CODEBUFF_LOCAL_MODEL='llama3.1:8b' → request reached Ollama, llama3.1:8b responded. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
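A sketch of the override-and-log step, written against the final env var name (`CODEBUFF_PROVIDER_MODEL`, renamed from `CODEBUFF_LOCAL_MODEL` in the cleanup commit). `resolveOutboundModel` and the pino-style `info(fields, msg)` logger interface are assumptions for illustration.

```typescript
type Env = Record<string, string | undefined>

// Hypothetical pino-style logger interface.
interface Logger {
  info(fields: Record<string, unknown>, msg: string): void
}

interface OutboundModelInput {
  declaredModel: string // model id from the agent definition
  customProviderActive: boolean
  agentHasOwnBaseUrl: boolean // explicit per-agent config is left alone
}

// Returns the model id to put in the outbound request body.
function resolveOutboundModel(input: OutboundModelInput, env: Env, logger: Logger): string {
  const override = env.CODEBUFF_PROVIDER_MODEL
  if (!input.customProviderActive || input.agentHasOwnBaseUrl || !override) {
    return input.declaredModel
  }
  // Logged at INFO so users can grep their logs and confirm the substitution.
  logger.info({ declaredModel: input.declaredModel, override }, 'Custom provider model override')
  return override
}
```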
354fdb1 to a74e57b
• Remove unused isUrl() helper in /local parser (replaced by
looksLikeUrl during earlier fix; never deleted).
• Rename env var CODEBUFF_LOCAL_MODEL → CODEBUFF_PROVIDER_MODEL
so all three custom-provider env vars share the CODEBUFF_(PROVIDER_)?
prefix consistently. Clarify in JSDoc that the override is skipped
when an agent declares its own providerOptions.baseUrl.
• Default apiKey placeholder "codebuff" → "unused" in
createCustomProviderModel. The literal string "codebuff" invited the
wrong mental model (could read as "send my Codebuff key"); "unused"
plus a comment makes the intent obvious. Local runtimes ignore the
Authorization header entirely; we never send the user's real key on
the direct path.
• Extract maxRetries: 1 into CUSTOM_PROVIDER_MAX_RETRIES with a
JSDoc explaining the choice (one retry for cold-start; more wouldn't
help with deterministic local failures).
• Simplify the precedence ladder in promptAiSdkStream — replace the
nested ternary that paired apiKey-with-winning-baseUrl with a small
sources array + .find(). Same behavior, easier to read at a glance.
Tests updated for the env var rename. All 3245 tests across CLI, SDK,
and agent-runtime still pass.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
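A sketch of how the direct-route request options might differ from the backend path. `CUSTOM_PROVIDER_MAX_RETRIES` is the constant named above; `buildStreamRequestOptions`, its parameters, and the assumption that OpenRouter routing preferences travel in a `provider` body field are hypothetical.

```typescript
/** One retry covers a cold model load; more would not help with deterministic local failures. */
const CUSTOM_PROVIDER_MAX_RETRIES = 1

interface StreamRequestOptions {
  maxRetries: number
  extraBody: Record<string, unknown> // extra fields merged into the request body
}

// Hypothetical input shape for this sketch.
interface StreamRequestInput {
  isCustomProvider: boolean
  defaultMaxRetries: number
  codebuffMetadata: Record<string, unknown>
  openRouterRouting?: Record<string, unknown>
}

function buildStreamRequestOptions(input: StreamRequestInput): StreamRequestOptions {
  if (input.isCustomProvider) {
    // Direct route: send neither Codebuff metadata nor OpenRouter routing keys,
    // and allow a single retry with no fallback to the backend.
    return { maxRetries: CUSTOM_PROVIDER_MAX_RETRIES, extraBody: {} }
  }
  const extraBody: Record<string, unknown> = { codebuff_metadata: input.codebuffMetadata }
  // Assumption: OpenRouter routing preferences travel in a `provider` body field.
  if (input.openRouterRouting) extraBody.provider = input.openRouterRouting
  return { maxRetries: input.defaultMaxRetries, extraBody }
}
```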
Adds the ability to route an agent's LLM calls directly to a local or self-hosted OpenAI-compatible endpoint (Ollama, LM Studio, etc.) instead of the Codebuff backend. Resolves #678.
Followed @jahooma's review guidance from the issue thread:
• `providerOptions` field (no new `localProvider`)
• `CODEBUFF_BASE_URL` (dropped "LOCAL": `baseUrl`/`apiKey` are also useful for non-local cloud endpoints)
• `CodebuffClient` constructor accepts `providerBaseUrl`/`providerApiKey`
Zero new runtime dependencies. Fully backward-compatible.
How a user uses it
Simplest — env var (every agent goes local)
```sh
export CODEBUFF_BASE_URL=http://localhost:11434/v1
codebuff
```
Inside the CLI: new `/local` slash command
Per-agent (the cost-split pattern; see "Recommended usage" below)
SDK consumers
Precedence
Per LLM call: agent's `providerOptions.baseUrl` > `CodebuffClient.providerBaseUrl` > `CODEBUFF_BASE_URL` env var > Codebuff backend (unchanged default). `apiKey` resolves on the same ladder, paired with whichever URL wins.
Recommended usage (today)
Cost-split pattern — exactly what #678 called out:
Keep the default orchestrator agents on Codebuff cloud (best at planning, tool routing, and complex reasoning). Set `providerOptions.baseUrl` per agent for cheap sub-agents like `file-picker` and `code-searcher`. The orchestrator stays smart; the cheap tasks run locally at no inference cost. This PR enables exactly that.
What's NOT recommended yet: replacing the default orchestrator with a small local model end-to-end. Codebuff's default agents have ~5000-token system prompts (with extensive tool-calling instructions, sub-agent definitions, mode hints, etc.) designed for cloud-class models. Small local models (≤8B) struggle to follow that scaffolding cleanly and often produce confusing output ("Please provide the task you would like me to perform") or empty turns. This is not a bug in the plumbing: the request reaches the local model correctly; the model just can't drive the orchestrator role.
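The cost-split pattern can be sketched as two agent configs, using a trimmed-down stand-in for `AgentDefinition` (the `AgentDefinitionSketch` type, the agent ids, and the model ids here are illustrative, not taken from the repo). Only the sub-agent sets `providerOptions.baseUrl`, so only its calls go to the local endpoint.

```typescript
// Hypothetical, trimmed-down stand-in for AgentDefinition.
interface AgentDefinitionSketch {
  id: string
  model: string
  providerOptions?: { order?: string[]; baseUrl?: string; apiKey?: string }
}

// Orchestrator: no baseUrl, so its calls keep going to the Codebuff backend.
const orchestrator: AgentDefinitionSketch = {
  id: 'my-orchestrator',
  model: 'anthropic/claude-opus-4-7',
}

// Cheap sub-agent routed to a local Ollama endpoint. The model id must be
// one the local runtime has already pulled.
const filePicker: AgentDefinitionSketch = {
  id: 'my-file-picker',
  model: 'llama3.1:8b',
  providerOptions: { baseUrl: 'http://localhost:11434/v1' },
}

// Considers only the agent layer of the precedence ladder.
function routesLocally(agent: AgentDefinitionSketch): boolean {
  return Boolean(agent.providerOptions?.baseUrl)
}
```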
A follow-up PR will add a `compactPrompts` flag on `AgentDefinition` plus `/mode <custom-agent>` so small-model end-to-end workflows work cleanly.
Demo
When it fails (Ollama down, wrong URL, model not pulled)
Users get an actionable error, not a raw `fetch failed`:
Error patterns cover both Node-style (`ECONNREFUSED`, `ENOTFOUND`, `ETIMEDOUT`) and Bun-style (`ConnectionRefused`, "Unable to connect...") variants, caught during local testing against Bun's fetch implementation.
No silent fallback to the Codebuff backend if the local endpoint is unreachable. Falling back would leak prompts the user wanted to keep local and incur billing they expected to avoid, which violates user intent.
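A sketch of the error classification described above, checking both the error's `code` property and its message. `isConnectionFailure`, `wrapProviderError`, and the wording of the wrapped messages are hypothetical; also checking `cause.code` (where Node's fetch nests the socket error) and reading a numeric `statusCode` for the 404 case (as the AI SDK's `APICallError` exposes) are assumptions beyond what the PR text states.

```typescript
const CONNECTION_CODES = new Set(['ECONNREFUSED', 'ENOTFOUND', 'ETIMEDOUT', 'ConnectionRefused'])
const CONNECTION_MESSAGE = /ECONNREFUSED|ENOTFOUND|ETIMEDOUT|fetch failed|Unable to connect/i

// Reads a string `code` property from any object, if present.
function errorCode(value: unknown): string | undefined {
  if (typeof value === 'object' && value !== null && 'code' in value) {
    const code = (value as { code: unknown }).code
    return typeof code === 'string' ? code : undefined
  }
  return undefined
}

function isConnectionFailure(err: unknown): boolean {
  if (!(err instanceof Error)) return false
  const e = err as Error & { cause?: unknown }
  // Bun sets `code` on the error itself; Node's fetch nests it under `cause` (assumption).
  const codes = [errorCode(e), errorCode(e.cause)]
  if (codes.some((c) => c !== undefined && CONNECTION_CODES.has(c))) return true
  return CONNECTION_MESSAGE.test(e.message)
}

// Turns low-level failures into messages that name the configured URL and model.
function wrapProviderError(err: unknown, baseUrl: string, model: string): Error {
  if (isConnectionFailure(err)) {
    return new Error(`Cannot reach the custom provider at ${baseUrl}. Is the server running?`)
  }
  // Assumption: HTTP errors expose a numeric statusCode.
  const status = typeof err === 'object' && err !== null ? (err as { statusCode?: unknown }).statusCode : undefined
  if (status === 404) {
    return new Error(`Custom provider at ${baseUrl} returned 404 for model "${model}". Is the model pulled?`)
  }
  return err instanceof Error ? err : new Error(String(err))
}
```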
Implementation notes
Mirrors the existing ChatGPT-OAuth-direct pattern in `sdk/src/impl/model-provider.ts`: adds a third branch to `getModelForRequest()` parallel to the ChatGPT-OAuth and Codebuff-backend defaults. Same `OpenAICompatibleChatLanguageModel` primitive, zero new dependencies.
Key code paths:
• `AgentDefinition` files (`.agents/`, `agents/`, `common/src/templates/initial-agents-dir/`) get optional `baseUrl`/`apiKey` on `providerOptions`. Backend `OpenRouterProviderRoutingOptions` extended likewise. Zod schema validates `baseUrl` as a URL.
• `getModelForRequest()` returns the new branch when `customProvider.baseUrl` is set.
• `promptAiSdkStream()` resolves precedence and skips Codebuff metadata / OpenRouter routing keys on direct-route requests (same as the ChatGPT-OAuth path).
• `CODEBUFF_PROVIDER_MODEL` env var (named `CODEBUFF_LOCAL_MODEL` before the final cleanup commit; set by `/local on <model>`) substitutes the agent's declared cloud model id with a local one, so the request body sent to Ollama contains the right model name. The override is skipped for agents that declare their own `providerOptions.baseUrl` and is logged at INFO for transparency.
• `maxRetries: 1` on direct routes (`CUSTOM_PROVIDER_MAX_RETRIES`): one retry handles brief model-load stalls without long waits when the provider is genuinely down.
• `getCustomProviderBaseUrlFromEnv()` reads `process.env` on every request, so `/local on` / `/local off` mid-session takes effect immediately without rebuilding the cached `CodebuffClient`.
System notes discovered during integration
A few things worth documenting for future contributors:
• The agent runtime auto-injects scaffolding around every agent's prompts (`run-agent-step.ts:232-253`): `STEP_PROMPT` user messages every iteration, an `INSTRUCTIONS_PROMPT` user message once, and additional system prompts (file tree, mode hints, Codebuff meta-info). Even minimal custom agents inherit this scaffolding, which is fine for cloud models but is the root cause of small-model confusion. Addressed in the follow-up PR.
• The orchestrator runs many LLM calls per turn. A single user prompt in DEFAULT mode triggers the orchestrator plus sub-agents (file pickers, thinkers, editors, code-reviewers), each making its own LLM call. On cloud this is fast; on local 8B models it takes 30-90 seconds per turn.
• Even with local inference, the agent runtime still hits the Codebuff backend for run setup (`/api/v1/agent-runs`) and token counting (`/api/v1/token-count`). Only inference goes local. True air-gapped operation is a larger effort and out of scope for this PR (documented as a known limitation).
• `OpenAICompatibleChatLanguageModel` already exists in the codebase: it's the same primitive used by the ChatGPT-OAuth-direct path, so no new SDK was needed.
Verification
Automated
• `sdk/src/impl/__tests__/model-provider-custom.test.ts` covering the custom-provider branch, trailing-slash tolerance, precedence over ChatGPT-OAuth eligibility, and missing-apiKey defaulting.
• `cli/src/commands/__tests__/local-provider.test.ts` covering `/local` parser shapes, URL/model validation, idempotent disable, and the full toggle cycle.
Live (against Ollama 0.20.5)
Verified end-to-end during development:
• `customProvider` routes to local; `llama3.1:8b` streams a real response.
• `baseUrl` tolerated.
• `CODEBUFF_BASE_URL` env-var fallback works.
• `CODEBUFF_LOCAL_MODEL` override substitutes the cloud model id in the outbound request body (verified by sending `'anthropic/claude-opus-4-7'` in params plus the env override → Ollama responded with `'llama3.1:8b'` content).
Backward compatibility
Fully backward-compatible. All new fields and the env vars are optional. Existing agents with `providerOptions: { order: [...] }` (OpenRouter routing config) continue to work unchanged. The Codebuff-backend default path is untouched when no custom provider is configured. No breaking changes to the SDK API.
Known limitations (intentional; addressed in follow-ups)
These were considered and explicitly left out to keep this PR focused on the plumbing the maintainer signed off on:
• `compactPrompts` flag on `AgentDefinition`: for opting an agent out of Codebuff's auto-injected scaffolding (STEP_PROMPT, INSTRUCTIONS_PROMPT, additional system prompts). Makes small-model agents actually usable. (Follow-up PR)
• `/mode <custom-agent>`: letting users set a custom agent as the active orchestrator. (Follow-up PR)
• `fallbackToCodebuff` opt-in: for users who explicitly want fallback on local failure.
Files changed
17 files, +1221 / −8.