feat(models): Phase 3 — openai backend + toolMode: 'return' by heskew · Pull Request #698 · HarperFast/harper

heskew · 2026-05-21T19:48:16Z

Summary

Phase 3 of #510 — second real ModelBackend, against OpenAI's HTTP API
(or any OpenAI-compatible endpoint via baseUrl override). Validates the
Phase 1 (#628) interface against a real provider with native tool-call
support and a different streaming wire format from Ollama.

Also lands toolMode: 'return' end-to-end: caller passes
tools: ToolDef[], OpenAI returns tool-call requests in the
GenerateResult or GenerateChunk.deltaToolCalls, caller resolves
externally. The harder half (toolMode: 'auto' orchestration) is
split to #612.

Closes #630
Tracking: #510

Note

Opened as Draft until CI clears, per the engineering-guidelines
step 16 (harper-engineering-guidelines/SKILL.md). Specific reviewer
assignment from the expertise matrix after Draft → Ready.

What ships

utility/expandEnvVar.ts — new reusable env-var expansion utility
(expandEnvVar, expandEnvVarsDeep, isUnresolvedEnvVarPlaceholder).
Whole-string ${VAR_NAME} placeholders → process.env[VAR_NAME].
Matches the convention from @harperfast/oauth's config loader.
Object.create(null) accumulator for prototype-pollution defense.
components/openai/index.ts — OpenAIBackend implements ModelBackend
over native fetch (no SDK dep). Embed via /embeddings, generate
via /chat/completions, stream via SSE. Tool-call support: forwards
GenerateInput.tools as OpenAI's tools parameter; index-keyed
delta accumulator yields each tool call once when complete.
resources/models/bootstrap.ts — adds openai to the factory map.
Runs expandEnvVarsDeep(entry) before factory dispatch. Enumerates
known backends in the "unknown backend, skipping" error log so
operators can spot value-name typos (backend: 'openi'). Warns when
an entry has a literal apiKey (not wrapped in ${VAR}) —
surfaces the secret-on-disk anti-pattern at boot.
validation/configValidator.ts — switches modelEntrySchema from
flat to discriminated Joi.alternatives().conditional('.backend', ...).
ollama and openai get tight schemas with .unknown(false) so
cross-backend field typos block boot. Unknown backends fall through
to a permissive schema so future backends pass validation; bootstrap
logs+skips at runtime.

Why native fetch, not the `openai` SDK

The official SDK is ~5MB with transitive dependencies that ship with
every Harper install regardless of whether anyone uses model APIs.
Native fetch is ~300 lines of clean translation code, the OpenAI
wire surface we touch has been stable for 2+ years, and we don't
need the SDK's auto-retry / per-request key rotation / telemetry
features. Sets the precedent for #633 anthropic (also bearer auth).
Bedrock will need an SDK for SigV4 signing — that's a Phase 6 call.

Secrets handling (sets precedent for #633)

models:
  generative:
    default:
      backend: openai
      model: gpt-4o-mini
      apiKey: ${OPENAI_API_KEY}     # whole-string ${VAR} placeholder

bootstrapModels runs expandEnvVarsDeep on the entry before passing
it to the factory. Backend constructors detect unresolved placeholders
(env var unset) and throw a pointed "set the matching env var" error
rather than waiting for a 401 from the first request.

A literal apiKey: 'sk-...' (not wrapped in ${VAR}) is also accepted
but logs a warn at boot: secrets in harperdb-config.yaml land on
disk, in backups, and depending on deployment in replicated config —
the ${VAR} indirection is the documented pattern.

Pre-PR verifier cadence

Per the agreed cadence for this series:

deep-review single-pass × 3 parallel agents (api / concurrency / config).
Surfaced 8 findings total: 0 blockers, all Tier 4–5.
- 5 fixed in this PR: tool-call arguments cap, log on malformed
  tool-call drop, MAX_SSE_* rename, OpenAI error.message
  extraction with cap, Object.create(null) accumulator.
- 3 also fixed: enumerate known backends in unknown-backend log,
  align whitespace check between expandEnvVar and
  isUnresolvedEnvVarPlaceholder, warn on literal-credential at boot.
External cross-model pass: deferred to PR-time GitHub workflow
(claude + claude-bot inline). Per Phase 2 lessons (agy auth /
print-mode + Gemini CLI both hang in non-interactive Claude
sessions), the deep-review skill's domain agents are the productive
substrate.

Files

Path	Change
`utility/expandEnvVar.ts`	new — reusable env-var expansion
`unitTests/utility/expandEnvVar.test.js`	new — 18 tests
`components/openai/index.ts`	new — `OpenAIBackend` + `registerOpenAIBackend`
`unitTests/components/openai/index.test.js`	new — ~50 unit tests with mocked `fetch`
`resources/models/bootstrap.ts`	+ openai factory + `expandEnvVarsDeep` + enum-on-unknown log + literal-credential warn
`unitTests/resources/models/bootstrap.test.js`	+ 5 openai-specific tests
`validation/configValidator.ts`	flat → discriminated schema (ollama / openai / unknown)
`unitTests/validation/configValidator.test.js`	+ 7 openai-schema tests
`integrationTests/server/openai-backend.test.ts`	new — gated on `OPENAI_API_KEY`

Acceptance criteria

Test plan

npm run build clean
npm run lint:required clean
npx prettier --check . clean
209 unit tests pass locally (Node 24)
Phase 1 + 2 model tests still pass (no regression)
CI green across full matrix
Manual smoke against real OpenAI with OPENAI_API_KEY set

Out of scope (per #630)

toolMode: 'auto' orchestration — split to Add agent-loop orchestration / toolMode: 'auto' to scope.models #612
Anthropic / Bedrock backends — [Models] Phase 6 — anthropic + bedrock backends #633
OpenAI batch API / Assistants / Files / Audio / Images
Prompt caching (Anthropic-style)
Client-side retry / backoff (caller decides)

🤖 Generated with Claude Code

Adds `utility/expandEnvVar.ts` with three exports: - `expandEnvVar(value)` — whole-string `${VAR_NAME}` placeholder resolution against `process.env`. Falls back to the original placeholder string when the env var is unset, so downstream "missing required field" checks catch operator misconfiguration. - `expandEnvVarsDeep(value)` — recursive walker over arrays and plain objects, expanding every string leaf. Uses `Object.create(null)` for the accumulator so `__proto__` / `constructor` keys in input cannot pollute `Object.prototype`. Defensive measure for a util documented as reusable. - `isUnresolvedEnvVarPlaceholder(value)` — predicate used by callers that want a louder error than "field is missing" when an operator's `${VAR}` placeholder didn't resolve. Matches the convention established by `@harperfast/oauth`'s `src/lib/config.ts` so operator documentation and config snippets carry over verbatim. Both `expandEnvVar` and `isUnresolvedEnvVarPlaceholder` share a single internal `looksLikePlaceholder` predicate so the two helpers agree on edge cases (notably: a placeholder with internal whitespace is NOT a placeholder — env-var names can't contain whitespace, so treating `'${MY KEY}'` as a literal value is the right call). Tracking: #630, #510 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Second real `ModelBackend` for #510, against OpenAI's HTTP API or any OpenAI-compatible endpoint via `baseUrl` override (Azure OpenAI, Together AI, OpenRouter, vLLM's OpenAI shim). Validates the Phase 1 (#628) interface against a real provider with native tool-call support and a different streaming wire format from Ollama. Native `fetch` directly — no SDK dependency. The OpenAI wire surface we touch (`POST /embeddings`, `POST /chat/completions` with SSE streaming + tool calls) has been stable for 2+ years on the fields we read; the mapping is mechanical and ~350 lines. Endpoints + capabilities: - `embed` → `POST {baseUrl}/embeddings` - `generate` → `POST {baseUrl}/chat/completions` (chat shape always) - `generateStream` → same with `stream: true`; SSE-framed responses - `capabilities()` advertises `tools: true` (first backend with native tool-call support), `adapters: false` (OpenAI doesn't surface LoRA) Tool calls (`toolMode: 'return'` end-to-end): - Caller passes `tools: ToolDef[]` via `GenerateInput`'s object variant (`{ messages, tools, system }`) — the path Phase 1 established. - Non-streaming: `result.toolCalls` mirrors OpenAI's `choices[0].message.tool_calls` after parsing each `function.arguments` string into an object per Phase 1's `ToolCall.arguments: object` contract. - Streaming: backend internally accumulates index-keyed tool-call deltas (`Map<number, { id, name, argumentsBuf }>`), yields each call ONCE when its arguments parse cleanly on the terminal `finish_reason: 'tool_calls'` event. Same contract — caller never sees partial-argument strings. Robustness: - Sanitized errors. HTTP failures include OpenAI's `error.message` from the well-defined error envelope (model/service-side text, not user- input echo), capped at 500 chars for misbehaving compat shims. Malformed-NDJSON-equivalent (invalid SSE event) error is static so upstream-derived content doesn't leak through the thrown message. - Defensive caps: SSE accumulator at 1 MiB (matches Phase 2's NDJSON cap); per-tool-call argument-buffer at 1 MiB to prevent unbounded accumulation across many sub-MiB events. - Token-count hygiene: same `Number.isFinite && >= 0 && Number.isInteger` guard as Phase 2, so NaN/Infinity/negatives from upstream don't poison `SUM(prompt_tokens)`-style aggregates over `hdb_model_calls`. - Dropped tool calls (malformed JSON arguments from upstream) log at warn so the silent skip is auditable. - Unresolved `${VAR}` apiKey detection: constructor throws with a pointed "set the matching env var" error rather than waiting for a 401 from the first request. `AbortSignal` composition (`opts.signal` + `requestTimeoutMs` via `AbortSignal.any`) mirrors Phase 2's `OllamaBackend`. Component shape matches the pattern in `components/mcp/index.ts` (PR #649) and `components/ollama/index.ts` (PR #651): core imports a register helper and calls it during boot; not a `handleApplication(scope)` self-loader. Tracking: #630, #510 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

`validation/configValidator.ts` switches `modelEntrySchema` from a flat `Joi.object` to `Joi.alternatives().conditional('.backend', { switch: [...] })`. Each known backend (`ollama`, `openai`) gets a tight schema with `.unknown(false)` so field-name typos block boot. Unknown backends fall through to a permissive `Joi.object({ backend: required }).unknown(true)` so future backends or third-party components can register their own without core schema edits — `bootstrapModels` logs+skips at runtime. `resources/models/bootstrap.ts`: - Adds `openai: registerOpenAIBackend` to the factory map alongside `ollama`. - Runs `expandEnvVarsDeep(entry)` on each entry BEFORE dispatching to the factory. `apiKey: ${OPENAI_API_KEY}` in YAML becomes the resolved `process.env` value at the backend; unresolved placeholders pass through unchanged and the backend's required-field check (with `isUnresolvedEnvVarPlaceholder` detection) throws a pointed error. - Enumerates known backends in the "unknown backend, skipping" error message so operators can spot value-name typos like `backend: 'openi'` (instead of `'openai'`) — the discriminated schema catches field typos cleanly but value typos fall through to the permissive path, where this log line is now the only signal. - Warns at boot when an entry has a literal credential field (e.g., `apiKey: 'sk-...'` not wrapped in `${VAR}`). `harperdb-config.yaml` on disk in plaintext is an anti-pattern; the `${VAR}` indirection from `@harperfast/oauth` is documented but not enforced — the warn surfaces the issue at boot where it's still cheap to fix. Tracking: #630, #510 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Unit tests (mocked `fetch`, ~480 lines): - Shape: name, capabilities, baseUrl normalization (default / explicit scheme / trailing slash), Authorization Bearer header, optional OpenAI-Organization header, apiKey-required, unresolved-${VAR} detection. - `embed`: /embeddings POST shape, Float32Array output, batch input, out-of-order index sort, model override, missing-model rejection, vector-count mismatch, non-finite vector values, non-finite token counts dropped, non-JSON response wrap, HTTP-error wrap, upstream `error.message` extraction with length cap, fallback when envelope absent. - `generate`: string input → single user message, messages-array input, system-prepend on object input, tool forwarding to OpenAI's tools shape, tool_calls parsing with arguments deserialization, malformed- arguments drop, responseFormat mapping (text / json / json_schema), temperature/maxTokens mapping, finishReason mapping for length / content_filter, non-integer / non-finite token counts dropped, missing-choices rejection, non-string content rejection. - `generateStream`: SSE per-event content delta, stream:true on body, split-across-chunks SSE, [DONE] terminator (events after [DONE] ignored), comment-line skip, invalid-JSON event sanitization (static message; no upstream content in thrown error), SSE buffer cap rename. Tool-call delta accumulation: assembles index-keyed deltas, parallel tool calls via distinct indices, drops malformed arguments, maps finish_reason='tool_calls', flushes buffer when stream ends without finish_reason, caps cumulative tool-call arguments at 1 MiB across events. - AbortSignal: passthrough when no timeout configured, composed via AbortSignal.any with per-call timeout. - `registerOpenAIBackend`: embedding/generative kinds register under logical names in Phase 2's registry. `bootstrap.test.js` extends Phase 2 coverage with openai-specific cases: single openai entry, ${ENV_VAR} resolution at bootstrap, unresolved placeholder logs+skips, missing apiKey logs+skips, ollama and openai entries side by side. `configValidator.test.js` extends with discriminated-schema coverage: openai entry with apiKey + model, ${ENV_VAR} placeholder as apiKey value (validates clean; bootstrap resolves), baseUrl + organization acceptance, missing-apiKey rejection, cross-backend field rejection (host on openai, apiKey on ollama), side-by-side mixed entries, unknown-backend pass-through to permissive schema. Integration test (`integrationTests/server/openai-backend.test.ts`): gated on OPENAI_API_KEY env; exercises the real OpenAI HTTP API to validate the mocked wire format matches reality. Covers embed (single + batch), generate (string + messages + tools), generateStream (content + tool-call assembly), AbortSignal cancellation. Dynamic import of `OpenAIBackend` inside before() to dodge the pre-existing harper_logger↔common_utils CJS require cycle (same workaround as Phase 2's ollama integration test). Tracking: #630, #510 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

claude · 2026-05-21T19:51:28Z

Reviewed; no blockers found.

kriszyp

Looks pretty similar to ollama. And looks good!

agy auth / print-mode + Gemini CLI both hang in non-interactive Claude sessions

I wonder why this doesn't work 🤔

heskew · 2026-05-21T20:40:43Z

agy

I had agy doing work in other sessions at the time. Might be gemini rate limit woes. 🤷 I had agy do a one-off review from my box to verify the pr after seeing that.

heskew and others added 4 commits May 21, 2026 12:45

heskew marked this pull request as ready for review May 21, 2026 20:29

heskew requested a review from a team as a code owner May 21, 2026 20:29

heskew requested a review from kriszyp May 21, 2026 20:29

kriszyp approved these changes May 21, 2026

View reviewed changes

heskew merged commit 6303d7e into main May 21, 2026
59 of 62 checks passed

heskew deleted the feat/models-openai branch May 21, 2026 20:57

heskew mentioned this pull request May 21, 2026

feat(models): Phase 6 — anthropic + bedrock backends #707

Merged

21 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(models): Phase 3 — openai backend + toolMode: 'return'#698

feat(models): Phase 3 — openai backend + toolMode: 'return'#698
heskew merged 4 commits into
mainfrom
feat/models-openai

heskew commented May 21, 2026

Uh oh!

claude Bot commented May 21, 2026

Uh oh!

kriszyp left a comment

Uh oh!

heskew commented May 21, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

heskew commented May 21, 2026

Summary

What ships

Why native fetch, not the openai SDK

Secrets handling (sets precedent for #633)

Pre-PR verifier cadence

Files

Acceptance criteria

Test plan

Out of scope (per #630)

Uh oh!

claude Bot commented May 21, 2026

Uh oh!

kriszyp left a comment

Choose a reason for hiding this comment

Uh oh!

heskew commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Why native fetch, not the `openai` SDK

heskew commented May 21, 2026 •

edited

Loading