Skip to content

feat(models): Phase 3 — openai backend + toolMode: 'return'#698

Merged
heskew merged 4 commits into
mainfrom
feat/models-openai
May 21, 2026
Merged

feat(models): Phase 3 — openai backend + toolMode: 'return'#698
heskew merged 4 commits into
mainfrom
feat/models-openai

Conversation

@heskew
Copy link
Copy Markdown
Member

@heskew heskew commented May 21, 2026

Summary

Phase 3 of #510 — second real ModelBackend, against OpenAI's HTTP API
(or any OpenAI-compatible endpoint via baseUrl override). Validates the
Phase 1 (#628) interface against a real provider with native tool-call
support and a different streaming wire format from Ollama.

Also lands toolMode: 'return' end-to-end: caller passes
tools: ToolDef[], OpenAI returns tool-call requests in the
GenerateResult or GenerateChunk.deltaToolCalls, caller resolves
externally. The harder half (toolMode: 'auto' orchestration) is
split to #612.

Closes #630
Tracking: #510

Note

Opened as Draft until CI clears, per the engineering-guidelines
step 16 (harper-engineering-guidelines/SKILL.md). Specific reviewer
assignment from the expertise matrix after Draft → Ready.

What ships

  • utility/expandEnvVar.ts — new reusable env-var expansion utility
    (expandEnvVar, expandEnvVarsDeep, isUnresolvedEnvVarPlaceholder).
    Whole-string ${VAR_NAME} placeholders → process.env[VAR_NAME].
    Matches the convention from @harperfast/oauth's config loader.
    Object.create(null) accumulator for prototype-pollution defense.
  • components/openai/index.tsOpenAIBackend implements ModelBackend
    over native fetch (no SDK dep). Embed via /embeddings, generate
    via /chat/completions, stream via SSE. Tool-call support: forwards
    GenerateInput.tools as OpenAI's tools parameter; index-keyed
    delta accumulator yields each tool call once when complete.
  • resources/models/bootstrap.ts — adds openai to the factory map.
    Runs expandEnvVarsDeep(entry) before factory dispatch. Enumerates
    known backends in the "unknown backend, skipping" error log so
    operators can spot value-name typos (backend: 'openi'). Warns when
    an entry has a literal apiKey (not wrapped in ${VAR}) —
    surfaces the secret-on-disk anti-pattern at boot.
  • validation/configValidator.ts — switches modelEntrySchema from
    flat to discriminated Joi.alternatives().conditional('.backend', ...).
    ollama and openai get tight schemas with .unknown(false) so
    cross-backend field typos block boot. Unknown backends fall through
    to a permissive schema so future backends pass validation; bootstrap
    logs+skips at runtime.

Why native fetch, not the openai SDK

The official SDK is ~5MB with transitive dependencies that ship with
every Harper install regardless of whether anyone uses model APIs.
Native fetch is ~300 lines of clean translation code, the OpenAI
wire surface we touch has been stable for 2+ years, and we don't
need the SDK's auto-retry / per-request key rotation / telemetry
features. Sets the precedent for #633 anthropic (also bearer auth).
Bedrock will need an SDK for SigV4 signing — that's a Phase 6 call.

Secrets handling (sets precedent for #633)

models:
  generative:
    default:
      backend: openai
      model: gpt-4o-mini
      apiKey: ${OPENAI_API_KEY}     # whole-string ${VAR} placeholder

bootstrapModels runs expandEnvVarsDeep on the entry before passing
it to the factory. Backend constructors detect unresolved placeholders
(env var unset) and throw a pointed "set the matching env var" error
rather than waiting for a 401 from the first request.

A literal apiKey: 'sk-...' (not wrapped in ${VAR}) is also accepted
but logs a warn at boot: secrets in harperdb-config.yaml land on
disk, in backups, and depending on deployment in replicated config —
the ${VAR} indirection is the documented pattern.

Pre-PR verifier cadence

Per the agreed cadence for this series:

  • deep-review single-pass × 3 parallel agents (api / concurrency / config).
    Surfaced 8 findings total: 0 blockers, all Tier 4–5.
    • 5 fixed in this PR: tool-call arguments cap, log on malformed
      tool-call drop, MAX_SSE_* rename, OpenAI error.message
      extraction with cap, Object.create(null) accumulator.
    • 3 also fixed: enumerate known backends in unknown-backend log,
      align whitespace check between expandEnvVar and
      isUnresolvedEnvVarPlaceholder, warn on literal-credential at boot.
  • External cross-model pass: deferred to PR-time GitHub workflow
    (claude + claude-bot inline). Per Phase 2 lessons (agy auth /
    print-mode + Gemini CLI both hang in non-interactive Claude
    sessions), the deep-review skill's domain agents are the productive
    substrate.

Files

Path Change
utility/expandEnvVar.ts new — reusable env-var expansion
unitTests/utility/expandEnvVar.test.js new — 18 tests
components/openai/index.ts new — OpenAIBackend + registerOpenAIBackend
unitTests/components/openai/index.test.js new — ~50 unit tests with mocked fetch
resources/models/bootstrap.ts + openai factory + expandEnvVarsDeep + enum-on-unknown log + literal-credential warn
unitTests/resources/models/bootstrap.test.js + 5 openai-specific tests
validation/configValidator.ts flat → discriminated schema (ollama / openai / unknown)
unitTests/validation/configValidator.test.js + 7 openai-schema tests
integrationTests/server/openai-backend.test.ts new — gated on OPENAI_API_KEY

Acceptance criteria

  • OpenAIBackend implements ModelBackend per Phase 1's interface
  • harper.models.embed() with backend: openai produces a vector
  • harper.models.generate() produces a completion
  • harper.models.generateStream() yields GenerateChunk per Phase 1's contract
  • toolMode: 'return' end-to-end (caller→OpenAI tools mapping; OpenAI tool-calls→GenerateResult.toolCalls / GenerateChunk.deltaToolCalls)
  • responseFormat: 'text' | 'json' | { schema } maps to OpenAI's response_format
  • Per-call accounting: backend: 'openai', model, token counts, latency
  • AbortSignal cancels in-flight requests
  • baseUrl override works (verified via mock for Azure/Together/OpenRouter/vLLM shape)
  • Unit tests cover translation logic; integration test runs against real OpenAI behind env gate
  • CI green across full matrix (Draft → wait for CI → Ready)

Test plan

  • npm run build clean
  • npm run lint:required clean
  • npx prettier --check . clean
  • 209 unit tests pass locally (Node 24)
  • Phase 1 + 2 model tests still pass (no regression)
  • CI green across full matrix
  • Manual smoke against real OpenAI with OPENAI_API_KEY set

Out of scope (per #630)


🤖 Generated with Claude Code

heskew and others added 4 commits May 21, 2026 12:45
Adds `utility/expandEnvVar.ts` with three exports:

- `expandEnvVar(value)` — whole-string `${VAR_NAME}` placeholder resolution
  against `process.env`. Falls back to the original placeholder string
  when the env var is unset, so downstream "missing required field"
  checks catch operator misconfiguration.
- `expandEnvVarsDeep(value)` — recursive walker over arrays and plain
  objects, expanding every string leaf. Uses `Object.create(null)` for
  the accumulator so `__proto__` / `constructor` keys in input cannot
  pollute `Object.prototype`. Defensive measure for a util documented
  as reusable.
- `isUnresolvedEnvVarPlaceholder(value)` — predicate used by callers
  that want a louder error than "field is missing" when an operator's
  `${VAR}` placeholder didn't resolve.

Matches the convention established by `@harperfast/oauth`'s
`src/lib/config.ts` so operator documentation and config snippets
carry over verbatim.

Both `expandEnvVar` and `isUnresolvedEnvVarPlaceholder` share a single
internal `looksLikePlaceholder` predicate so the two helpers agree on
edge cases (notably: a placeholder with internal whitespace is NOT a
placeholder — env-var names can't contain whitespace, so treating
`'${MY KEY}'` as a literal value is the right call).

Tracking: #630, #510

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Second real `ModelBackend` for #510, against OpenAI's HTTP API or any
OpenAI-compatible endpoint via `baseUrl` override (Azure OpenAI,
Together AI, OpenRouter, vLLM's OpenAI shim). Validates the Phase 1
(#628) interface against a real provider with native tool-call support
and a different streaming wire format from Ollama.

Native `fetch` directly — no SDK dependency. The OpenAI wire surface
we touch (`POST /embeddings`, `POST /chat/completions` with SSE
streaming + tool calls) has been stable for 2+ years on the fields
we read; the mapping is mechanical and ~350 lines.

Endpoints + capabilities:

- `embed` → `POST {baseUrl}/embeddings`
- `generate` → `POST {baseUrl}/chat/completions` (chat shape always)
- `generateStream` → same with `stream: true`; SSE-framed responses
- `capabilities()` advertises `tools: true` (first backend with native
  tool-call support), `adapters: false` (OpenAI doesn't surface LoRA)

Tool calls (`toolMode: 'return'` end-to-end):

- Caller passes `tools: ToolDef[]` via `GenerateInput`'s object variant
  (`{ messages, tools, system }`) — the path Phase 1 established.
- Non-streaming: `result.toolCalls` mirrors OpenAI's
  `choices[0].message.tool_calls` after parsing each `function.arguments`
  string into an object per Phase 1's `ToolCall.arguments: object`
  contract.
- Streaming: backend internally accumulates index-keyed tool-call deltas
  (`Map<number, { id, name, argumentsBuf }>`), yields each call ONCE
  when its arguments parse cleanly on the terminal `finish_reason: 'tool_calls'`
  event. Same contract — caller never sees partial-argument strings.

Robustness:

- Sanitized errors. HTTP failures include OpenAI's `error.message` from
  the well-defined error envelope (model/service-side text, not user-
  input echo), capped at 500 chars for misbehaving compat shims.
  Malformed-NDJSON-equivalent (invalid SSE event) error is static so
  upstream-derived content doesn't leak through the thrown message.
- Defensive caps: SSE accumulator at 1 MiB (matches Phase 2's NDJSON
  cap); per-tool-call argument-buffer at 1 MiB to prevent unbounded
  accumulation across many sub-MiB events.
- Token-count hygiene: same `Number.isFinite && >= 0 && Number.isInteger`
  guard as Phase 2, so NaN/Infinity/negatives from upstream don't
  poison `SUM(prompt_tokens)`-style aggregates over `hdb_model_calls`.
- Dropped tool calls (malformed JSON arguments from upstream) log at
  warn so the silent skip is auditable.
- Unresolved `${VAR}` apiKey detection: constructor throws with a
  pointed "set the matching env var" error rather than waiting for
  a 401 from the first request.

`AbortSignal` composition (`opts.signal` + `requestTimeoutMs` via
`AbortSignal.any`) mirrors Phase 2's `OllamaBackend`.

Component shape matches the pattern in `components/mcp/index.ts`
(PR #649) and `components/ollama/index.ts` (PR #651): core imports a
register helper and calls it during boot; not a
`handleApplication(scope)` self-loader.

Tracking: #630, #510

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`validation/configValidator.ts` switches `modelEntrySchema` from a flat
`Joi.object` to `Joi.alternatives().conditional('.backend', { switch:
[...] })`. Each known backend (`ollama`, `openai`) gets a tight schema
with `.unknown(false)` so field-name typos block boot. Unknown backends
fall through to a permissive `Joi.object({ backend: required }).unknown(true)`
so future backends or third-party components can register their own
without core schema edits — `bootstrapModels` logs+skips at runtime.

`resources/models/bootstrap.ts`:

- Adds `openai: registerOpenAIBackend` to the factory map alongside
  `ollama`.
- Runs `expandEnvVarsDeep(entry)` on each entry BEFORE dispatching to
  the factory. `apiKey: ${OPENAI_API_KEY}` in YAML becomes the resolved
  `process.env` value at the backend; unresolved placeholders pass
  through unchanged and the backend's required-field check (with
  `isUnresolvedEnvVarPlaceholder` detection) throws a pointed error.
- Enumerates known backends in the "unknown backend, skipping" error
  message so operators can spot value-name typos like
  `backend: 'openi'` (instead of `'openai'`) — the discriminated
  schema catches field typos cleanly but value typos fall through to
  the permissive path, where this log line is now the only signal.
- Warns at boot when an entry has a literal credential field (e.g.,
  `apiKey: 'sk-...'` not wrapped in `${VAR}`). `harperdb-config.yaml`
  on disk in plaintext is an anti-pattern; the `${VAR}` indirection
  from `@harperfast/oauth` is documented but not enforced — the warn
  surfaces the issue at boot where it's still cheap to fix.

Tracking: #630, #510

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Unit tests (mocked `fetch`, ~480 lines):

- Shape: name, capabilities, baseUrl normalization (default / explicit
  scheme / trailing slash), Authorization Bearer header, optional
  OpenAI-Organization header, apiKey-required, unresolved-${VAR}
  detection.
- `embed`: /embeddings POST shape, Float32Array output, batch input,
  out-of-order index sort, model override, missing-model rejection,
  vector-count mismatch, non-finite vector values, non-finite token
  counts dropped, non-JSON response wrap, HTTP-error wrap, upstream
  `error.message` extraction with length cap, fallback when envelope
  absent.
- `generate`: string input → single user message, messages-array input,
  system-prepend on object input, tool forwarding to OpenAI's tools
  shape, tool_calls parsing with arguments deserialization, malformed-
  arguments drop, responseFormat mapping (text / json / json_schema),
  temperature/maxTokens mapping, finishReason mapping for
  length / content_filter, non-integer / non-finite token counts
  dropped, missing-choices rejection, non-string content rejection.
- `generateStream`: SSE per-event content delta, stream:true on body,
  split-across-chunks SSE, [DONE] terminator (events after [DONE]
  ignored), comment-line skip, invalid-JSON event sanitization
  (static message; no upstream content in thrown error), SSE buffer
  cap rename. Tool-call delta accumulation: assembles index-keyed
  deltas, parallel tool calls via distinct indices, drops malformed
  arguments, maps finish_reason='tool_calls', flushes buffer when
  stream ends without finish_reason, caps cumulative tool-call
  arguments at 1 MiB across events.
- AbortSignal: passthrough when no timeout configured, composed via
  AbortSignal.any with per-call timeout.
- `registerOpenAIBackend`: embedding/generative kinds register under
  logical names in Phase 2's registry.

`bootstrap.test.js` extends Phase 2 coverage with openai-specific cases:
single openai entry, ${ENV_VAR} resolution at bootstrap, unresolved
placeholder logs+skips, missing apiKey logs+skips, ollama and openai
entries side by side.

`configValidator.test.js` extends with discriminated-schema coverage:
openai entry with apiKey + model, ${ENV_VAR} placeholder as apiKey
value (validates clean; bootstrap resolves), baseUrl + organization
acceptance, missing-apiKey rejection, cross-backend field rejection
(host on openai, apiKey on ollama), side-by-side mixed entries,
unknown-backend pass-through to permissive schema.

Integration test (`integrationTests/server/openai-backend.test.ts`):
gated on OPENAI_API_KEY env; exercises the real OpenAI HTTP API to
validate the mocked wire format matches reality. Covers embed (single
+ batch), generate (string + messages + tools), generateStream
(content + tool-call assembly), AbortSignal cancellation. Dynamic
import of `OpenAIBackend` inside before() to dodge the pre-existing
harper_logger↔common_utils CJS require cycle (same workaround as
Phase 2's ollama integration test).

Tracking: #630, #510

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@claude
Copy link
Copy Markdown
Contributor

claude Bot commented May 21, 2026

Reviewed; no blockers found.

@heskew heskew marked this pull request as ready for review May 21, 2026 20:29
@heskew heskew requested a review from a team as a code owner May 21, 2026 20:29
@heskew heskew requested a review from kriszyp May 21, 2026 20:29
Copy link
Copy Markdown
Member

@kriszyp kriszyp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty similar to ollama. And looks good!

agy auth / print-mode + Gemini CLI both hang in non-interactive Claude sessions

I wonder why this doesn't work 🤔

@heskew
Copy link
Copy Markdown
Member Author

heskew commented May 21, 2026

  • agy

I had agy doing work in other sessions at the time. Might be gemini rate limit woes. 🤷 I had agy do a one-off review from my box to verify the pr after seeing that.

@heskew heskew merged commit 6303d7e into main May 21, 2026
59 of 62 checks passed
@heskew heskew deleted the feat/models-openai branch May 21, 2026 20:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Models] Phase 3 — openai backend (+ toolMode: 'return')

2 participants