feat(sdk): support local OpenAI-compatible providers via providerOptions (closes #678) #693

Open
vraj00222 wants to merge 8 commits into CodebuffAI:main from vraj00222:feat/local-openai-compatible-provider
vraj00222 commented May 16, 2026

Adds the ability to route an agent's LLM calls directly to a local or self-hosted OpenAI-compatible endpoint (Ollama, LM Studio, etc.) instead of the Codebuff backend. Resolves #678.

Followed @jahooma's review guidance from the issue thread:

  • ✅ Reuses the existing providerOptions field (no new localProvider)
  • ✅ Env var renamed: CODEBUFF_BASE_URL (dropped "LOCAL" — baseUrl / apiKey are also useful for non-local cloud endpoints)
  • ✅ CodebuffClient constructor accepts providerBaseUrl / providerApiKey

Zero new runtime dependencies. Fully backward-compatible.


How a user uses it

Simplest — env var (every agent goes local)

export CODEBUFF_BASE_URL=http://localhost:11434/v1
codebuff

Inside the CLI — new /local slash command

/local on llama3.1:8b              # enable; sets URL default + model override
/local on http://x:1234/v1 llama-7b   # custom URL + model
/local list                        # query Ollama for available models
/local model qwen2.5-coder:7b      # swap models mid-session
/local status                      # show current config
/local off                         # disable, back to Codebuff cloud

Per-agent (the cost-split pattern — see "Recommended usage" below)

// .agents/my-local-file-picker.ts
const definition: AgentDefinition = {
  id: 'my-local-file-picker',
  model: 'llama3.1:8b',
  providerOptions: {
    baseUrl: 'http://localhost:11434/v1',
    apiKey: 'ollama',
  },
  // ...
}

SDK consumers

new CodebuffClient({
  apiKey: 'cb-...',
  providerBaseUrl: 'http://localhost:11434/v1',
})

Precedence

Per LLM call: agent's providerOptions.baseUrl > CodebuffClient.providerBaseUrl > CODEBUFF_BASE_URL env var > Codebuff backend (unchanged default). apiKey resolves on the same ladder, paired with whichever URL wins.
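The ladder above can be sketched as a small sources array plus `.find()`. This is a minimal illustration, not the code in the PR: the `ProviderSource` and `ResolvedProvider` types and the `resolveCustomProvider` name are hypothetical, and the env object is passed in explicitly so the example stays self-contained.

```typescript
// Hypothetical shape for one layer of provider config. The field names mirror
// the PR description; the type itself is an assumption made for this sketch.
interface ProviderSource {
  baseUrl?: string | undefined
  apiKey?: string | undefined
}

interface ResolvedProvider {
  baseUrl: string
  apiKey: string | undefined
}

/**
 * Walk the layers in precedence order and return the first one that sets a
 * baseUrl. The apiKey comes from that same layer and is never mixed across
 * layers. Returns undefined when no layer sets a baseUrl, which corresponds
 * to the unchanged Codebuff-backend default.
 */
function resolveCustomProvider(
  agent: ProviderSource | undefined,
  client: ProviderSource | undefined,
  env: Record<string, string | undefined>,
): ResolvedProvider | undefined {
  const sources: Array<ProviderSource | undefined> = [
    agent, // agent's providerOptions
    client, // CodebuffClient providerBaseUrl / providerApiKey
    {
      baseUrl: env['CODEBUFF_BASE_URL'],
      apiKey: env['CODEBUFF_PROVIDER_API_KEY'],
    },
  ]
  const winner = sources.find((source) => Boolean(source?.baseUrl))
  if (!winner?.baseUrl) return undefined
  return { baseUrl: winner.baseUrl, apiKey: winner.apiKey }
}
```

Pairing the key with the winning URL means an agent-level baseUrl with no apiKey does not silently pick up a key that was configured for a different endpoint.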


Recommended usage (today)

Cost-split pattern — exactly what #678 called out:

"cheap tasks (file picking) locally, complex reasoning on cloud"

Keep the default orchestrator agents on Codebuff cloud (best at planning, tool routing, complex reasoning). Set providerOptions.baseUrl per-agent for cheap sub-agents like file-picker and code-searcher. The orchestrator stays smart; the cheap tasks become free. The PR enables this exactly.

What's NOT recommended yet: replacing the default orchestrator with a small local model end-to-end. Codebuff's default agents have ~5000-token system prompts (with extensive tool-calling instructions, sub-agent definitions, mode hints, etc.) designed for cloud-class models. Small local models (≤8B) struggle to follow that scaffolding cleanly and often produce confusing output ("Please provide the task you would like me to perform") or empty turns. This is not a bug in the plumbing — the request reaches the local model correctly; the model just can't drive the orchestrator role.

A follow-up PR will add a compactPrompts flag on AgentDefinition plus /mode <custom-agent> so small-model end-to-end workflows work cleanly.


Demo

[Image: Screenshot 2026-05-16 at 4 22 20 AM]

When it fails (Ollama down, wrong URL, model not pulled)

Users get an actionable error, not raw fetch failed:

Cannot reach LLM provider at http://localhost:11434/v1.

Check:
  • Is the provider running? (e.g. `ollama serve` or LM Studio's Local Server)
  • Is the URL correct? Currently configured: http://localhost:11434/v1
  • Is the model 'llama3.1:8b' loaded? (e.g. `ollama list`)

Error patterns cover both Node-style (ECONNREFUSED, ENOTFOUND, ETIMEDOUT) and Bun-style (ConnectionRefused, "Unable to connect...") variants — caught during local testing against Bun's fetch implementation.
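A rough sketch of how that detection and wrapping could look, under stated assumptions: the helper names `isConnectionError` and `formatUnreachableMessage` are hypothetical, and the exact pattern list is illustrative rather than the regex used in the PR. It checks both the `code` property and the raw message, and also follows `cause`, since Node's fetch wraps the underlying socket error there.

```typescript
// Error codes named in the PR description (Node-style and Bun-style).
const CONNECTION_ERROR_CODES = new Set([
  'ECONNREFUSED',
  'ENOTFOUND',
  'ETIMEDOUT',
  'ConnectionRefused',
])

// Message fragments for runtimes that only expose the failure as text.
const CONNECTION_ERROR_MESSAGE =
  /ECONNREFUSED|ENOTFOUND|ETIMEDOUT|ConnectionRefused|Unable to connect/i

/** True when the error (or its wrapped cause) looks like a connection failure. */
function isConnectionError(error: unknown): boolean {
  if (typeof error !== 'object' || error === null) return false
  const { code, message, cause } = error as {
    code?: unknown
    message?: unknown
    cause?: unknown
  }
  if (typeof code === 'string' && CONNECTION_ERROR_CODES.has(code)) return true
  if (typeof message === 'string' && CONNECTION_ERROR_MESSAGE.test(message)) {
    return true
  }
  // Node's fetch throws TypeError('fetch failed') with the real error in `cause`.
  return cause !== undefined && cause !== error && isConnectionError(cause)
}

/** Build the actionable message shown above for a given URL and model. */
function formatUnreachableMessage(baseUrl: string, model: string): string {
  return [
    `Cannot reach LLM provider at ${baseUrl}.`,
    '',
    'Check:',
    "  • Is the provider running? (e.g. `ollama serve` or LM Studio's Local Server)",
    `  • Is the URL correct? Currently configured: ${baseUrl}`,
    `  • Is the model '${model}' loaded? (e.g. \`ollama list\`)`,
  ].join('\n')
}
```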

No silent fallback to the Codebuff backend if the local endpoint is unreachable. Falling back would leak prompts the user wanted to keep local and incur billing they expected to avoid — violates user intent.


Implementation notes

Mirrors the existing ChatGPT-OAuth-direct pattern in sdk/src/impl/model-provider.ts — adds a third branch to getModelForRequest() parallel to ChatGPT-OAuth and Codebuff-backend defaults. Same OpenAICompatibleChatLanguageModel primitive, zero new dependencies.

Key code paths:

  • Type surface: three mirrored AgentDefinition files (.agents/, agents/, common/src/templates/initial-agents-dir/) get optional baseUrl / apiKey on providerOptions. The backend OpenRouterProviderRoutingOptions is extended likewise. The Zod schema validates baseUrl as a URL.
  • Dispatch: getModelForRequest() returns the new branch when customProvider.baseUrl is set. promptAiSdkStream() resolves precedence and skips Codebuff metadata / OpenRouter routing keys on direct-route requests (same as the ChatGPT-OAuth path).
  • Model override: the CODEBUFF_LOCAL_MODEL env var (set by /local on <model>; renamed to CODEBUFF_PROVIDER_MODEL in the final commit of this PR) substitutes the agent's declared cloud model id with a local one, so the request body sent to Ollama contains the right model name. The override is logged at INFO for transparency.
  • maxRetries: 1 on direct routes. One retry handles brief model-load stalls without long waits when the provider is genuinely down.
  • Lazy env reading: getCustomProviderBaseUrlFromEnv() reads process.env on every request, so /local on and /local off mid-session take effect immediately without rebuilding the cached CodebuffClient.
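The lazy-read and model-override behavior in the last few bullets can be illustrated as follows. This is a sketch under assumptions, not the SDK source: the getter here takes the env object as a parameter (the real one reads process.env), `resolveModel` and its options object are hypothetical names, and the override variable is spelled as in this description (the final commit renames it to CODEBUFF_PROVIDER_MODEL).

```typescript
type Env = Record<string, string | undefined>

// Name as used in this PR description; renamed in the final commit.
const MODEL_OVERRIDE_ENV = 'CODEBUFF_LOCAL_MODEL'

/**
 * Stand-in for the SDK getter: nothing is cached, so a mutation of the env
 * between two calls is visible to the very next request.
 */
function getCustomProviderBaseUrlFromEnv(env: Env): string | undefined {
  const raw = env['CODEBUFF_BASE_URL']?.trim()
  if (!raw) return undefined
  return raw.replace(/\/+$/, '') // tolerate trailing slashes
}

/**
 * Pick the model id for the outbound request. The override applies only when
 * a custom provider is active and the agent has not declared its own baseUrl.
 */
function resolveModel(
  declaredModel: string,
  options: { agentHasOwnBaseUrl: boolean; customProviderActive: boolean },
  env: Env,
): string {
  const override = env[MODEL_OVERRIDE_ENV]?.trim()
  if (options.customProviderActive && !options.agentHasOwnBaseUrl && override) {
    return override
  }
  return declaredModel
}
```

Because the getter re-reads its input each time, toggling the variable mid-session changes the result of the next call without any client rebuild.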

System notes discovered during integration

A few things worth documenting for future contributors:

  1. The agent runtime auto-injects scaffolding around every agent's prompts (run-agent-step.ts:232-253): STEP_PROMPT user messages every iteration, INSTRUCTIONS_PROMPT user message once, additional system prompts (file tree, mode hints, codebuff meta-info). Even minimal custom agents inherit this scaffolding — which is fine for cloud models but is the root cause of small-model confusion. Addressed in the follow-up PR.

  2. Orchestrator runs many LLM calls per turn. A single user prompt in DEFAULT mode triggers the orchestrator + sub-agents (file pickers, thinkers, editors, code-reviewers). Each makes its own LLM call. On cloud this is fast; on local 8B models it's 30-90 seconds per turn.

  3. Even with local inference, the agent runtime still hits the Codebuff backend for run setup (/api/v1/agent-runs) and token counting (/api/v1/token-count). Only inference goes local. True air-gapped operation is a larger effort and is out of scope for this PR (documented as a known limitation).

  4. OpenAICompatibleChatLanguageModel already exists in the codebase — it's the same primitive used by the ChatGPT-OAuth-direct path. No new SDK was needed.


Verification

Automated

  • 6 unit tests in sdk/src/impl/__tests__/model-provider-custom.test.ts covering the custom-provider branch, trailing-slash tolerance, precedence over ChatGPT-OAuth eligibility, missing-apiKey defaulting.
  • 40 unit tests in cli/src/commands/__tests__/local-provider.test.ts covering /local parser shapes, URL/model validation, idempotent disable, full toggle cycle.
  • All existing tests pass: SDK 454/454, agent-runtime 426/426, CLI 2354/2354. All workspace typechecks clean.
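For readers unfamiliar with the /local grammar, the parser shapes those CLI tests exercise could look roughly like the sketch below. It is an illustration under assumptions: the `LocalCommand` type, `parseLocalCommand`, `parseOnArgs`, and the error strings are hypothetical, and only `looksLikeUrl` is a name mentioned in the commit log. Parsing is kept separate from applying, so the function has no side effects.

```typescript
type LocalCommand =
  | { kind: 'status' }
  | { kind: 'off' }
  | { kind: 'list' }
  | { kind: 'on'; url: string | undefined; model: string | undefined }
  | { kind: 'model'; name: string | null } // null means "clear the override"
  | { kind: 'error'; message: string }

const looksLikeUrl = (value: string): boolean => /^https?:\/\//i.test(value)

function isValidUrl(value: string): boolean {
  try {
    new URL(value)
    return true
  } catch {
    return false
  }
}

// Arguments after `on` / `set`: a URL, a model id, or both in either order.
function parseOnArgs(args: string[]): LocalCommand {
  let url: string | undefined
  let model: string | undefined
  for (const arg of args) {
    if (looksLikeUrl(arg)) {
      if (!isValidUrl(arg)) return { kind: 'error', message: `Invalid URL: ${arg}` }
      url = arg
    } else {
      model = arg
    }
  }
  return { kind: 'on', url, model }
}

/** Parse the text after `/local` into a command, without touching any state. */
function parseLocalCommand(input: string): LocalCommand {
  const [sub, ...rest] = input.trim().split(/\s+/).filter(Boolean)
  if (sub === undefined || sub === 'status') return { kind: 'status' }
  if (sub === 'off') return { kind: 'off' }
  if (sub === 'list') return { kind: 'list' }
  if (sub === 'model') {
    const name = rest[0]
    if (name === undefined) return { kind: 'error', message: 'Usage: /local model <name>' }
    return { kind: 'model', name: name === 'clear' ? null : name }
  }
  if (sub === 'on' || sub === 'set') return parseOnArgs(rest)
  // Shortcuts: a bare URL, or a bare model id containing a colon.
  if (looksLikeUrl(sub) || sub.includes(':')) return parseOnArgs([sub, ...rest])
  return { kind: 'error', message: `Unknown /local subcommand: ${sub}` }
}
```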

Live (against Ollama 0.20.5)

Verified end-to-end during development:

  • ✅ customProvider routes to local; llama3.1:8b streams a real response.
  • ✅ Trailing slashes on baseUrl tolerated.
  • ✅ Unreachable endpoint produces the wrapped error with hostname + model.
  • ✅ CODEBUFF_BASE_URL env-var fallback works.
  • ✅ CODEBUFF_LOCAL_MODEL override substitutes the cloud model id in the outbound request body (verified by sending 'anthropic/claude-opus-4-7' in params + env override → Ollama responded with 'llama3.1:8b' content).

Backward compatibility

Fully backward-compatible. All new fields and the env vars are optional. Existing agents with providerOptions: { order: [...] } (OpenRouter routing config) continue to work unchanged. The Codebuff-backend default path is untouched when no custom provider is configured. No breaking changes to the SDK API.


Known limitations (intentional — addressed in follow-ups)

These were considered and explicitly left out to keep this PR focused on the plumbing the maintainer signed off on:

  • compactPrompts flag on AgentDefinition — for opting an agent out of Codebuff's auto-injected scaffolding (STEP_PROMPT, INSTRUCTIONS_PROMPT, additional system prompts). Makes small-model agents actually usable. (Follow-up PR)
  • /mode <custom-agent> — letting users set a custom agent as the active orchestrator. (Follow-up PR)
  • Auto-detection of local providers on startup (probing :11434, :1234). Adds UI surface and security considerations — separate concern.
  • True air-gapped mode — agent runtime still hits Codebuff backend for run setup and token counting. Bigger scope.
  • Model capability awareness — refuse / warn when pairing tiny models with heavy agents.
  • fallbackToCodebuff opt-in — for users who explicitly want fallback on local failure.

Files changed

17 files, +1221 / −8.

TYPES (mirrored agent-facing × 3)
  .agents/types/agent-definition.ts
  agents/types/agent-definition.ts
  common/src/templates/initial-agents-dir/types/agent-definition.ts

TYPES (backend + contract + schema)
  common/src/types/agent-template.ts
  common/src/types/contracts/llm.ts
  common/src/types/dynamic-agent-template.ts

CONSTANTS
  common/src/constants/custom-provider.ts             (new)

SDK
  sdk/src/env.ts
  sdk/src/run.ts
  sdk/src/impl/llm.ts
  sdk/src/impl/model-provider.ts
  sdk/src/impl/agent-runtime.ts
  sdk/src/impl/__tests__/model-provider-custom.test.ts   (new)

CLI (the /local command)
  cli/src/commands/local-provider.ts                  (new)
  cli/src/commands/command-registry.ts
  cli/src/data/slash-commands.ts
  cli/src/commands/__tests__/local-provider.test.ts   (new)

vraj00222 and others added 7 commits May 16, 2026 04:22
…e endpoints

Adds two optional fields to AgentDefinition.providerOptions (and the
backend OpenRouterProviderRoutingOptions + Zod schema) so an agent can
direct its LLM calls at a custom OpenAI-compatible base URL (Ollama,
LM Studio, self-hosted). The dispatch logic that consumes these fields
lands in a follow-up commit.

Also adds CODEBUFF_BASE_URL / CODEBUFF_PROVIDER_API_KEY env var
constants and SDK getters for them.

Part of issue CodebuffAI#678.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…rRequest

When ModelRequestParams.customProvider.baseUrl is set, return an
OpenAICompatibleChatLanguageModel pointed at that endpoint and flag
the result with isCustomProvider: true. Bypasses both the Codebuff
backend and the ChatGPT OAuth direct path. No metadataExtractor —
direct calls don't flow through Codebuff cost accounting.

Mirrors the existing ChatGPT-OAuth-direct branch pattern. Trailing
slashes on baseUrl are trimmed. apiKey defaults to "codebuff" when
absent (most local runtimes ignore it).

Adds 5 unit tests covering the new branch, regression-tested against
existing model-provider-free-mode tests.

Part of issue CodebuffAI#678.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…option

In promptAiSdkStream, resolve baseUrl/apiKey across three layers (agent
providerOptions > clientCustomProvider > env vars) and forward to
getModelForRequest. When the custom-provider path is active:

  • maxRetries: 1 (one retry handles brief model-load stalls; no further
    fallback — would violate user intent re: privacy / cost)
  • Skip codebuff_metadata and OpenRouter routing keys in the request
    body (same as the existing ChatGPT-OAuth-direct branch)
  • Wrap connection failures and 404s in friendly messages pointing at
    the configured URL and model

Plumbs CodebuffClient.{providerBaseUrl, providerApiKey} through
runOnce → getAgentRuntimeImpl, which wraps promptAiSdkStream with a
closure that injects clientCustomProvider on every call.

Adds an integration test documenting the precedence contract.

Resolves issue CodebuffAI#678 (implementation; smoke verification follows).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Bun's fetch surfaces ECONNREFUSED as code='ConnectionRefused' with
message "Unable to connect. Is the computer able to access the url?".
Neither matched the original error-wrap regex. Now check both the raw
message and the error.code property across Bun/Node patterns.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

UX layer on top of the providerOptions.baseUrl plumbing: lets a user
toggle between local and cloud inference without restarting codebuff
or editing agent files.

Subcommands:
  /local                   — show current status
  /local on                — enable with default Ollama URL (localhost:11434/v1)
  /local on <url>          — enable with a specific URL
  /local set <url>         — alias for `/local on <url>`
  /local off               — disable, return to Codebuff backend
  /local status            — same as `/local`

Implementation mutates process.env.CODEBUFF_BASE_URL at runtime. The
SDK reads this env var lazily on every promptAiSdkStream call, so
changes take effect immediately for the next request without needing
to rebuild the CodebuffClient.

Agent-level providerOptions.baseUrl still wins — /local only affects
agents that don't set their own baseUrl. Communicated in the enable
message so users aren't surprised.

29 unit tests covering: parse/apply separation, all subcommands and
aliases, URL validation, idempotent disable, end-to-end toggle cycle,
and verification that mutations are visible to the SDK env getter.

All 2354 existing CLI tests still pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…l with "model not found"

Without this, `/local on` redirected the URL to Ollama but the agent
still sent its declared cloud model id (e.g. anthropic/claude-opus-4-7)
which Ollama doesn't have — every prompt failed with "model not found".

Adds a CODEBUFF_LOCAL_MODEL env var that overrides the agent's model
when a custom provider is active. The SDK reads it lazily in
promptAiSdkStream, mirroring how the URL is resolved. Override only
applies to agents WITHOUT their own providerOptions.baseUrl — explicit
per-agent config is left alone.

/local subcommands grew:
  /local on <model>            — enable with default URL + model
  /local on <url> <model>      — enable with both
  /local model <name>          — set model after enable
  /local model clear           — drop the override
  /local list                  — probe Ollama's /api/tags for models
  /local <model-with-colon>    — shortcut: same as `on <model>`

The status message also nudges users toward setting a model: without
an override, it warns that cloud model ids will go to the local
provider and fail.

40 unit tests cover: parser shapes (URL only, model only, both,
aliases, bare URL, bare model with-colon shortcut), URL/model
validation, set-model rejection when local is off, idempotent
disable, end-to-end env mutation via the SDK getter.

All 3245 existing tests across CLI + SDK + agent-runtime still pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

When CODEBUFF_LOCAL_MODEL is set and overrides params.model, log it at
INFO so users can grep their cli.jsonl and confirm the substitution
is happening on outbound requests.

Verified end-to-end against live Ollama:
  params.model='anthropic/claude-opus-4.7' + env CODEBUFF_LOCAL_MODEL='llama3.1:8b'
  → request reached Ollama, llama3.1:8b responded.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

vraj00222 force-pushed the feat/local-openai-compatible-provider branch from 354fdb1 to a74e57b on May 16, 2026 11:38

  • Remove unused isUrl() helper in /local parser (replaced by
    looksLikeUrl during earlier fix; never deleted).

  • Rename env var CODEBUFF_LOCAL_MODEL → CODEBUFF_PROVIDER_MODEL
    so all three custom-provider env vars share the CODEBUFF_(PROVIDER_)?
    prefix consistently. Clarify in JSDoc that the override is skipped
    when an agent declares its own providerOptions.baseUrl.

  • Default apiKey placeholder "codebuff" → "unused" in
    createCustomProviderModel. The literal string "codebuff" invited the
    wrong mental model (could read as "send my Codebuff key"); "unused"
    plus a comment makes the intent obvious. Local runtimes ignore the
    Authorization header entirely; we never send the user's real key on
    the direct path.

  • Extract maxRetries: 1 into CUSTOM_PROVIDER_MAX_RETRIES with a
    JSDoc explaining the choice (one retry for cold-start; more wouldn't
    help with deterministic local failures).

  • Simplify the precedence ladder in promptAiSdkStream — replace the
    nested ternary that paired apiKey-with-winning-baseUrl with a small
    sources array + .find(). Same behavior, easier to read at a glance.

Tests updated for the env var rename. All 3245 tests across CLI, SDK,
and agent-runtime still pass.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Development

Successfully merging this pull request may close these issues.

Feature: Support local OpenAI-compatible providers (Ollama, LM Studio) via localProvider in AgentDefinition