feat(sdk): support local OpenAI-compatible providers via providerOptions (closes #678) by vraj00222 · Pull Request #693 · CodebuffAI/codebuff

vraj00222 · 2026-05-16T06:55:12Z

Adds the ability to route an agent's LLM calls directly to a local or self-hosted OpenAI-compatible endpoint (Ollama, LM Studio, etc.) instead of the Codebuff backend. Resolves #678.

Followed @jahooma's review guidance from the issue thread:

✅ Reuses the existing providerOptions field (no new localProvider)
✅ Env var renamed: CODEBUFF_BASE_URL (dropped "LOCAL" — baseUrl / apiKey are also useful for non-local cloud endpoints)
✅ CodebuffClient constructor accepts providerBaseUrl / providerApiKey

Zero new runtime dependencies. Fully backward-compatible.

How a user uses it

Simplest — env var (every agent goes local)

export CODEBUFF_BASE_URL=http://localhost:11434/v1
codebuff

Inside the CLI — new `/local` slash command

/local on llama3.1:8b              # enable; sets URL default + model override
/local on http://x:1234/v1 llama-7b   # custom URL + model
/local list                        # query Ollama for available models
/local model qwen2.5-coder:7b      # swap models mid-session
/local status                      # show current config
/local off                         # disable, back to Codebuff cloud
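
The subcommand shapes above could be parsed roughly as follows. This is a minimal sketch, not the PR's actual parser: `parseLocalCommand`, the `LocalCommand` type, and the `DEFAULT_URL` constant are hypothetical names, the input is assumed to be the text after `/local`, and aliases such as `set` or the bare-model shortcut are left out.

```typescript
// Hypothetical result type for the sketch; the real parser may differ.
type LocalCommand =
  | { kind: 'status' }
  | { kind: 'off' }
  | { kind: 'list' }
  | { kind: 'on'; url: string; model: string | undefined }
  | { kind: 'model'; model: string }
  | { kind: 'error'; message: string }

// Default Ollama endpoint, as used in the examples above.
const DEFAULT_URL = 'http://localhost:11434/v1'

// Anything starting with http:// or https:// is treated as a URL.
const looksLikeUrl = (arg: string): boolean => /^https?:\/\//i.test(arg)

function parseLocalCommand(input: string): LocalCommand {
  const tokens = input.trim().split(/\s+/).filter(Boolean)
  const sub: string | undefined = tokens[0]
  const rest = tokens.slice(1)

  switch (sub) {
    case undefined:
    case 'status':
      return { kind: 'status' }
    case 'off':
      return { kind: 'off' }
    case 'list':
      return { kind: 'list' }
    case 'model': {
      const model = rest[0]
      return model
        ? { kind: 'model', model }
        : { kind: 'error', message: 'usage: /local model <name>' }
    }
    case 'on': {
      // URL and model may each be omitted; the URL falls back to the default.
      const url = rest.find(looksLikeUrl) ?? DEFAULT_URL
      const model = rest.find((arg) => !looksLikeUrl(arg))
      return { kind: 'on', url, model }
    }
    default:
      return { kind: 'error', message: `unknown subcommand: ${sub}` }
  }
}
```

Keeping parsing separate from applying the result (mutating the environment) makes each shape easy to unit test in isolation.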

Per-agent (the cost-split pattern — see "Recommended usage" below)

// .agents/my-local-file-picker.ts
const definition: AgentDefinition = {
  id: 'my-local-file-picker',
  model: 'llama3.1:8b',
  providerOptions: {
    baseUrl: 'http://localhost:11434/v1',
    apiKey: 'ollama',
  },
  // ...
}

SDK consumers

new CodebuffClient({
  apiKey: 'cb-...',
  providerBaseUrl: 'http://localhost:11434/v1',
})

Precedence

Per LLM call: agent's providerOptions.baseUrl > CodebuffClient.providerBaseUrl > CODEBUFF_BASE_URL env var > Codebuff backend (unchanged default). apiKey resolves on the same ladder, paired with whichever URL wins.
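
The ladder can be sketched as an ordered list of sources plus a `find`. This is an illustration under assumptions: `ProviderSource`, `ResolvedProvider`, and `resolveCustomProvider` are hypothetical names, not the SDK's actual types.

```typescript
// Hypothetical shape for one configuration layer; names are illustrative.
type ProviderSource = {
  label: string
  baseUrl?: string
  apiKey?: string
}

type ResolvedProvider = {
  label: string
  baseUrl: string
  apiKey: string | undefined
}

// Returns the first layer that defines a baseUrl, or undefined to signal
// "use the Codebuff backend". The apiKey is taken from that same layer and
// is never mixed in from a different one.
function resolveCustomProvider(
  sources: ProviderSource[],
): ResolvedProvider | undefined {
  const winner = sources.find((source) => Boolean(source.baseUrl))
  if (!winner || !winner.baseUrl) return undefined
  return { label: winner.label, baseUrl: winner.baseUrl, apiKey: winner.apiKey }
}

// Highest priority first: agent > client > env.
const resolved = resolveCustomProvider([
  { label: 'agent', apiKey: 'agent-key' }, // no baseUrl, so this layer is skipped
  { label: 'client', baseUrl: 'http://localhost:1234/v1' },
  { label: 'env', baseUrl: 'http://localhost:11434/v1', apiKey: 'ollama' },
])
```

In this example the client layer wins, and its (absent) apiKey is used rather than the agent's or the env var's, which is what "paired with whichever URL wins" implies.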

Recommended usage (today)

Cost-split pattern — exactly what #678 called out:

"cheap tasks (file picking) locally, complex reasoning on cloud"

Keep the default orchestrator agents on Codebuff cloud (best at planning, tool routing, complex reasoning). Set providerOptions.baseUrl per-agent for cheap sub-agents like file-picker and code-searcher. The orchestrator stays smart; the cheap tasks become free. The PR enables this exactly.

What's NOT recommended yet: replacing the default orchestrator with a small local model end-to-end. Codebuff's default agents have ~5000-token system prompts (with extensive tool-calling instructions, sub-agent definitions, mode hints, etc.) designed for cloud-class models. Small local models (≤8B) struggle to follow that scaffolding cleanly and often produce confusing output ("Please provide the task you would like me to perform") or empty turns. This is not a bug in the plumbing — the request reaches the local model correctly; the model just can't drive the orchestrator role.

A follow-up PR will add a compactPrompts flag on AgentDefinition plus /mode <custom-agent> so small-model end-to-end workflows work cleanly.

When it fails (Ollama down, wrong URL, model not pulled)

Users get an actionable error instead of a raw `fetch failed`:

Cannot reach LLM provider at http://localhost:11434/v1.

Check:
  • Is the provider running? (e.g. `ollama serve` or LM Studio's Local Server)
  • Is the URL correct? Currently configured: http://localhost:11434/v1
  • Is the model 'llama3.1:8b' loaded? (e.g. `ollama list`)

Error patterns cover both Node-style (ECONNREFUSED, ENOTFOUND, ETIMEDOUT) and Bun-style (ConnectionRefused, "Unable to connect...") variants — caught during local testing against Bun's fetch implementation.
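
A check along these lines would cover both runtimes. It is a sketch under assumptions: `isConnectionError` and the two constants are hypothetical names, and the exact pattern list in the SDK may differ from the codes quoted above.

```typescript
// Error codes named in this PR description; 'ConnectionRefused' is Bun's variant.
const CONNECTION_ERROR_CODES = new Set([
  'ECONNREFUSED',
  'ENOTFOUND',
  'ETIMEDOUT',
  'ConnectionRefused',
])

// Fallback for errors that only carry the information in their message.
const CONNECTION_ERROR_MESSAGE =
  /ECONNREFUSED|ENOTFOUND|ETIMEDOUT|ConnectionRefused|Unable to connect/i

// Checks error.code, then error.message, then walks error.cause, because
// Node's fetch reports "fetch failed" and puts the socket error in `cause`.
function isConnectionError(error: unknown, depth = 0): boolean {
  if (depth > 5 || typeof error !== 'object' || error === null) return false
  const { code, message, cause } = error as {
    code?: unknown
    message?: unknown
    cause?: unknown
  }
  if (typeof code === 'string' && CONNECTION_ERROR_CODES.has(code)) return true
  if (typeof message === 'string' && CONNECTION_ERROR_MESSAGE.test(message)) {
    return true
  }
  return isConnectionError(cause, depth + 1)
}
```

Checking `code` before `message` keeps the match robust if a runtime rewords its message text; the depth limit guards against a cyclic `cause` chain.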

There is no silent fallback to the Codebuff backend if the local endpoint is unreachable. Falling back would leak prompts the user wanted to keep local and incur billing they expected to avoid, both of which violate user intent.

Implementation notes

Mirrors the existing ChatGPT-OAuth-direct pattern in sdk/src/impl/model-provider.ts — adds a third branch to getModelForRequest() parallel to ChatGPT-OAuth and Codebuff-backend defaults. Same OpenAICompatibleChatLanguageModel primitive, zero new dependencies.

Key code paths:

Type surface: three mirrored AgentDefinition files (.agents/, agents/, common/src/templates/initial-agents-dir/) get optional baseUrl / apiKey on providerOptions. The backend OpenRouterProviderRoutingOptions is extended likewise, and the Zod schema validates baseUrl as a URL.
Dispatch: getModelForRequest() returns the new branch when customProvider.baseUrl is set. promptAiSdkStream() resolves precedence and skips Codebuff metadata / OpenRouter routing keys on direct-route requests (same as the ChatGPT-OAuth path).
Model override: the CODEBUFF_LOCAL_MODEL env var (set by /local on <model>; renamed to CODEBUFF_PROVIDER_MODEL in the final cleanup commit) replaces the agent's declared cloud model id with a local one, so the request body sent to Ollama contains the right model name. The override is logged at INFO for transparency.
maxRetries: 1 on direct routes, because one retry handles brief model-load stalls without long waits when the provider is genuinely down.
Lazy env reading: getCustomProviderBaseUrlFromEnv() reads process.env on every request, so `/local on` / `/local off` mid-session takes effect immediately without rebuilding the cached CodebuffClient.
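
Two of the smaller behaviors described in this PR, trailing-slash trimming with a placeholder apiKey and reading the env var at call time, might look like the sketch below. The function names are hypothetical, and the env object is passed in as a parameter purely to keep the example self-contained; the SDK getter reads process.env directly.

```typescript
// Placeholder key for runtimes that ignore the Authorization header. The
// final commit in this PR describes the value "unused".
const PLACEHOLDER_API_KEY = 'unused'

type CustomProvider = { baseUrl: string; apiKey: string }

// Trim trailing slashes so "http://host/v1/" and "http://host/v1" behave
// alike, and fall back to the placeholder when no key is configured.
function normalizeCustomProvider(input: {
  baseUrl: string
  apiKey?: string
}): CustomProvider {
  return {
    baseUrl: input.baseUrl.replace(/\/+$/, ''),
    apiKey: input.apiKey ?? PLACEHOLDER_API_KEY,
  }
}

// Read the variable on every call instead of caching it at startup, so a
// mid-session change is visible to the very next request.
function readBaseUrl(
  env: Record<string, string | undefined>,
): string | undefined {
  const value = env['CODEBUFF_BASE_URL']?.trim()
  return value ? value : undefined
}
```

Because nothing is cached, toggling the variable between two requests changes the route without rebuilding any client object.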

System notes discovered during integration

A few things worth documenting for future contributors:

The agent runtime auto-injects scaffolding around every agent's prompts (run-agent-step.ts:232-253): STEP_PROMPT user messages every iteration, INSTRUCTIONS_PROMPT user message once, additional system prompts (file tree, mode hints, codebuff meta-info). Even minimal custom agents inherit this scaffolding — which is fine for cloud models but is the root cause of small-model confusion. Addressed in the follow-up PR.
Orchestrator runs many LLM calls per turn. A single user prompt in DEFAULT mode triggers the orchestrator + sub-agents (file pickers, thinkers, editors, code-reviewers). Each makes its own LLM call. On cloud this is fast; on local 8B models it's 30-90 seconds per turn.
Even with local inference, the agent runtime still hits the Codebuff backend for run setup (/api/v1/agent-runs) and token counting (/api/v1/token-count). Only inference goes local. True air-gapped operation is a larger change and is out of scope for this PR (documented as a known limitation).
OpenAICompatibleChatLanguageModel already exists in the codebase — it's the same primitive used by the ChatGPT-OAuth-direct path. No new SDK was needed.

Verification

Automated

6 unit tests in sdk/src/impl/__tests__/model-provider-custom.test.ts covering the custom-provider branch, trailing-slash tolerance, precedence over ChatGPT-OAuth eligibility, and missing-apiKey defaulting.
40 unit tests in cli/src/commands/__tests__/local-provider.test.ts covering /local parser shapes, URL/model validation, idempotent disable, and the full toggle cycle.
All existing tests pass: SDK 454/454, agent-runtime 426/426, CLI 2354/2354. All workspace typechecks clean.

Live (against Ollama 0.20.5)

Verified end-to-end during development:

✅ customProvider routes to local; llama3.1:8b streams a real response.
✅ Trailing slashes on baseUrl tolerated.
✅ Unreachable endpoint produces the wrapped error with hostname + model.
✅ CODEBUFF_BASE_URL env-var fallback works.
✅ CODEBUFF_LOCAL_MODEL override substitutes the cloud model id in the outbound request body (verified by sending 'anthropic/claude-opus-4-7' in params + env override → Ollama responded with 'llama3.1:8b' content).
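
The model-override rule verified in the last item can be sketched as a small pure function. The name `resolveOutboundModel` and its parameter object are hypothetical; the rule itself (override only when a custom provider is active and the agent has not declared its own baseUrl) is taken from the commit notes in this PR.

```typescript
// Hypothetical inputs for the sketch; the SDK resolves these internally.
type ModelInputs = {
  declaredModel: string // model id from the agent definition
  agentBaseUrl?: string // agent's own providerOptions.baseUrl, if any
  activeBaseUrl?: string // base URL that won the precedence ladder, if any
  envModel?: string // value of the model-override env var, if set
}

function resolveOutboundModel(inputs: ModelInputs): string {
  // No custom provider active: the request goes to the Codebuff backend as-is.
  if (!inputs.activeBaseUrl) return inputs.declaredModel
  // Explicit per-agent configuration is left alone.
  if (inputs.agentBaseUrl) return inputs.declaredModel
  // Otherwise substitute the override when one is set and non-blank.
  const override = inputs.envModel?.trim()
  return override ? override : inputs.declaredModel
}
```

With a cloud model id declared and the override set to 'llama3.1:8b', the outbound model becomes 'llama3.1:8b' only on the env-var or client route, which matches the behavior checked above.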

Backward compatibility

Fully backward-compatible. All new fields and the env vars are optional. Existing agents with providerOptions: { order: [...] } (OpenRouter routing config) continue to work unchanged. The Codebuff-backend default path is untouched when no custom provider is configured. No breaking changes to the SDK API.

Known limitations (intentional — addressed in follow-ups)

These were considered and explicitly left out to keep this PR focused on the plumbing the maintainer signed off on:

compactPrompts flag on AgentDefinition — for opting an agent out of Codebuff's auto-injected scaffolding (STEP_PROMPT, INSTRUCTIONS_PROMPT, additional system prompts). Makes small-model agents actually usable. (Follow-up PR)
/mode <custom-agent> — letting users set a custom agent as the active orchestrator. (Follow-up PR)
Auto-detection of local providers on startup (probing :11434, :1234). Adds UI surface and security considerations — separate concern.
True air-gapped mode — agent runtime still hits Codebuff backend for run setup and token counting. Bigger scope.
Model capability awareness — refuse / warn when pairing tiny models with heavy agents.
fallbackToCodebuff opt-in — for users who explicitly want fallback on local failure.

Files changed

17 files, +1221 / −8.

TYPES (mirrored agent-facing × 3)
  .agents/types/agent-definition.ts
  agents/types/agent-definition.ts
  common/src/templates/initial-agents-dir/types/agent-definition.ts

TYPES (backend + contract + schema)
  common/src/types/agent-template.ts
  common/src/types/contracts/llm.ts
  common/src/types/dynamic-agent-template.ts

CONSTANTS
  common/src/constants/custom-provider.ts             (new)

SDK
  sdk/src/env.ts
  sdk/src/run.ts
  sdk/src/impl/llm.ts
  sdk/src/impl/model-provider.ts
  sdk/src/impl/agent-runtime.ts
  sdk/src/impl/__tests__/model-provider-custom.test.ts   (new)

CLI (the /local command)
  cli/src/commands/local-provider.ts                  (new)
  cli/src/commands/command-registry.ts
  cli/src/data/slash-commands.ts
  cli/src/commands/__tests__/local-provider.test.ts   (new)

…e endpoints

Adds two optional fields to AgentDefinition.providerOptions (and the backend OpenRouterProviderRoutingOptions + Zod schema) so an agent can direct its LLM calls at a custom OpenAI-compatible base URL (Ollama, LM Studio, self-hosted). The dispatch logic that consumes these fields lands in a follow-up commit. Also adds CODEBUFF_BASE_URL / CODEBUFF_PROVIDER_API_KEY env var constants and SDK getters for them. Part of issue CodebuffAI#678.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…rRequest

When ModelRequestParams.customProvider.baseUrl is set, return an OpenAICompatibleChatLanguageModel pointed at that endpoint and flag the result with isCustomProvider: true. Bypasses both the Codebuff backend and the ChatGPT OAuth direct path. No metadataExtractor: direct calls don't flow through Codebuff cost accounting. Mirrors the existing ChatGPT-OAuth-direct branch pattern. Trailing slashes on baseUrl are trimmed. apiKey defaults to "codebuff" when absent (most local runtimes ignore it). Adds 5 unit tests covering the new branch, regression-tested against existing model-provider-free-mode tests. Part of issue CodebuffAI#678.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…option

In promptAiSdkStream, resolve baseUrl/apiKey across three layers (agent providerOptions > clientCustomProvider > env vars) and forward to getModelForRequest. When the custom-provider path is active:

  • maxRetries: 1 (one retry handles brief model-load stalls; no further fallback, which would violate user intent re: privacy / cost)
  • Skip codebuff_metadata and OpenRouter routing keys in the request body (same as the existing ChatGPT-OAuth-direct branch)
  • Wrap connection failures and 404s in friendly messages pointing at the configured URL and model

Plumbs CodebuffClient.{providerBaseUrl, providerApiKey} through runOnce → getAgentRuntimeImpl, which wraps promptAiSdkStream with a closure that injects clientCustomProvider on every call. Adds an integration test documenting the precedence contract. Resolves issue CodebuffAI#678 (implementation; smoke verification follows).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Bun's fetch surfaces ECONNREFUSED as code='ConnectionRefused' with message "Unable to connect. Is the computer able to access the url?". Neither matched the original error-wrap regex. Now check both the raw message and the error.code property across Bun/Node patterns.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

UX layer on top of the providerOptions.baseUrl plumbing: lets a user toggle between local and cloud inference without restarting codebuff or editing agent files.

Subcommands:

/local                # show current status
/local on             # enable with default Ollama URL (localhost:11434/v1)
/local on <url>       # enable with a specific URL
/local set <url>      # alias for `/local on <url>`
/local off            # disable, return to Codebuff backend
/local status         # same as `/local`

Implementation mutates process.env.CODEBUFF_BASE_URL at runtime. The SDK reads this env var lazily on every promptAiSdkStream call, so changes take effect immediately for the next request without needing to rebuild the CodebuffClient. Agent-level providerOptions.baseUrl still wins: /local only affects agents that don't set their own baseUrl. This is communicated in the enable message so users aren't surprised.

29 unit tests covering: parse/apply separation, all subcommands and aliases, URL validation, idempotent disable, end-to-end toggle cycle, and verification that mutations are visible to the SDK env getter. All 2354 existing CLI tests still pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…l with "model not found"

Without this, `/local on` redirected the URL to Ollama but the agent still sent its declared cloud model id (e.g. anthropic/claude-opus-4-7) which Ollama doesn't have, so every prompt failed with "model not found". Adds a CODEBUFF_LOCAL_MODEL env var that overrides the agent's model when a custom provider is active. The SDK reads it lazily in promptAiSdkStream, mirroring how the URL is resolved. Override only applies to agents WITHOUT their own providerOptions.baseUrl; explicit per-agent config is left alone.

/local subcommands grew:

/local on <model>           # enable with default URL + model
/local on <url> <model>     # enable with both
/local model <name>         # set model after enable
/local model clear          # drop the override
/local list                 # probe Ollama's /api/tags for models
/local <model-with-colon>   # shortcut: same as `on <model>`

The status message also nudges users toward setting a model: without an override, it warns that cloud model ids will go to the local provider and fail.

40 unit tests cover: parser shapes (URL only, model only, both, aliases, bare URL, bare model with-colon shortcut), URL/model validation, set-model rejection when local is off, idempotent disable, end-to-end env mutation via the SDK getter. All 3245 existing tests across CLI + SDK + agent-runtime still pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

When CODEBUFF_LOCAL_MODEL is set and overrides params.model, log it at INFO so users can grep their cli.jsonl and confirm the substitution is happening on outbound requests. Verified end-to-end against live Ollama: params.model='anthropic/claude-opus-4.7' + env CODEBUFF_LOCAL_MODEL='llama3.1:8b' → request reached Ollama, llama3.1:8b responded.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

  • Remove unused isUrl() helper in /local parser (replaced by looksLikeUrl during earlier fix; never deleted).
  • Rename env var CODEBUFF_LOCAL_MODEL → CODEBUFF_PROVIDER_MODEL so all three custom-provider env vars share the CODEBUFF_(PROVIDER_)? prefix consistently. Clarify in JSDoc that the override is skipped when an agent declares its own providerOptions.baseUrl.
  • Default apiKey placeholder "codebuff" → "unused" in createCustomProviderModel. The literal string "codebuff" invited the wrong mental model (could read as "send my Codebuff key"); "unused" plus a comment makes the intent obvious. Local runtimes ignore the Authorization header entirely; we never send the user's real key on the direct path.
  • Extract maxRetries: 1 into CUSTOM_PROVIDER_MAX_RETRIES with a JSDoc explaining the choice (one retry for cold-start; more wouldn't help with deterministic local failures).
  • Simplify the precedence ladder in promptAiSdkStream: replace the nested ternary that paired apiKey-with-winning-baseUrl with a small sources array + .find(). Same behavior, easier to read at a glance.

Tests updated for the env var rename. All 3245 tests across CLI, SDK, and agent-runtime still pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

vraj00222 requested review from brandonkachen, charleslien and jahooma as code owners May 16, 2026 06:55

vraj00222 and others added 7 commits May 16, 2026 04:22

vraj00222 force-pushed the feat/local-openai-compatible-provider branch from 354fdb1 to a74e57b Compare May 16, 2026 11:38

vraj00222 wants to merge 8 commits into CodebuffAI:main from vraj00222:feat/local-openai-compatible-provider
