feat(models): Phase 3 — openai backend + toolMode: 'return'#698
Merged
Conversation
Adds `utility/expandEnvVar.ts` with three exports:
- `expandEnvVar(value)` — whole-string `${VAR_NAME}` placeholder resolution
against `process.env`. Falls back to the original placeholder string
when the env var is unset, so downstream "missing required field"
checks catch operator misconfiguration.
- `expandEnvVarsDeep(value)` — recursive walker over arrays and plain
objects, expanding every string leaf. Uses `Object.create(null)` for
the accumulator so `__proto__` / `constructor` keys in input cannot
pollute `Object.prototype`. Defensive measure for a util documented
as reusable.
- `isUnresolvedEnvVarPlaceholder(value)` — predicate used by callers
that want a louder error than "field is missing" when an operator's
`${VAR}` placeholder didn't resolve.
Matches the convention established by `@harperfast/oauth`'s
`src/lib/config.ts` so operator documentation and config snippets
carry over verbatim.
Both `expandEnvVar` and `isUnresolvedEnvVarPlaceholder` share a single
internal `looksLikePlaceholder` predicate so the two helpers agree on
edge cases (notably: a placeholder with internal whitespace is NOT a
placeholder — env-var names can't contain whitespace, so treating
`'${MY KEY}'` as a literal value is the right call).
Tracking: #630, #510
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Second real `ModelBackend` for #510, against OpenAI's HTTP API or any OpenAI-compatible endpoint via `baseUrl` override (Azure OpenAI, Together AI, OpenRouter, vLLM's OpenAI shim). Validates the Phase 1 (#628) interface against a real provider with native tool-call support and a different streaming wire format from Ollama. Native `fetch` directly — no SDK dependency. The OpenAI wire surface we touch (`POST /embeddings`, `POST /chat/completions` with SSE streaming + tool calls) has been stable for 2+ years on the fields we read; the mapping is mechanical and ~350 lines. Endpoints + capabilities: - `embed` → `POST {baseUrl}/embeddings` - `generate` → `POST {baseUrl}/chat/completions` (chat shape always) - `generateStream` → same with `stream: true`; SSE-framed responses - `capabilities()` advertises `tools: true` (first backend with native tool-call support), `adapters: false` (OpenAI doesn't surface LoRA) Tool calls (`toolMode: 'return'` end-to-end): - Caller passes `tools: ToolDef[]` via `GenerateInput`'s object variant (`{ messages, tools, system }`) — the path Phase 1 established. - Non-streaming: `result.toolCalls` mirrors OpenAI's `choices[0].message.tool_calls` after parsing each `function.arguments` string into an object per Phase 1's `ToolCall.arguments: object` contract. - Streaming: backend internally accumulates index-keyed tool-call deltas (`Map<number, { id, name, argumentsBuf }>`), yields each call ONCE when its arguments parse cleanly on the terminal `finish_reason: 'tool_calls'` event. Same contract — caller never sees partial-argument strings. Robustness: - Sanitized errors. HTTP failures include OpenAI's `error.message` from the well-defined error envelope (model/service-side text, not user- input echo), capped at 500 chars for misbehaving compat shims. Malformed-NDJSON-equivalent (invalid SSE event) error is static so upstream-derived content doesn't leak through the thrown message. - Defensive caps: SSE accumulator at 1 MiB (matches Phase 2's NDJSON cap); per-tool-call argument-buffer at 1 MiB to prevent unbounded accumulation across many sub-MiB events. - Token-count hygiene: same `Number.isFinite && >= 0 && Number.isInteger` guard as Phase 2, so NaN/Infinity/negatives from upstream don't poison `SUM(prompt_tokens)`-style aggregates over `hdb_model_calls`. - Dropped tool calls (malformed JSON arguments from upstream) log at warn so the silent skip is auditable. - Unresolved `${VAR}` apiKey detection: constructor throws with a pointed "set the matching env var" error rather than waiting for a 401 from the first request. `AbortSignal` composition (`opts.signal` + `requestTimeoutMs` via `AbortSignal.any`) mirrors Phase 2's `OllamaBackend`. Component shape matches the pattern in `components/mcp/index.ts` (PR #649) and `components/ollama/index.ts` (PR #651): core imports a register helper and calls it during boot; not a `handleApplication(scope)` self-loader. Tracking: #630, #510 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`validation/configValidator.ts` switches `modelEntrySchema` from a flat
`Joi.object` to `Joi.alternatives().conditional('.backend', { switch:
[...] })`. Each known backend (`ollama`, `openai`) gets a tight schema
with `.unknown(false)` so field-name typos block boot. Unknown backends
fall through to a permissive `Joi.object({ backend: required }).unknown(true)`
so future backends or third-party components can register their own
without core schema edits — `bootstrapModels` logs+skips at runtime.
`resources/models/bootstrap.ts`:
- Adds `openai: registerOpenAIBackend` to the factory map alongside
`ollama`.
- Runs `expandEnvVarsDeep(entry)` on each entry BEFORE dispatching to
the factory. `apiKey: ${OPENAI_API_KEY}` in YAML becomes the resolved
`process.env` value at the backend; unresolved placeholders pass
through unchanged and the backend's required-field check (with
`isUnresolvedEnvVarPlaceholder` detection) throws a pointed error.
- Enumerates known backends in the "unknown backend, skipping" error
message so operators can spot value-name typos like
`backend: 'openi'` (instead of `'openai'`) — the discriminated
schema catches field typos cleanly but value typos fall through to
the permissive path, where this log line is now the only signal.
- Warns at boot when an entry has a literal credential field (e.g.,
`apiKey: 'sk-...'` not wrapped in `${VAR}`). `harperdb-config.yaml`
on disk in plaintext is an anti-pattern; the `${VAR}` indirection
from `@harperfast/oauth` is documented but not enforced — the warn
surfaces the issue at boot where it's still cheap to fix.
Tracking: #630, #510
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Unit tests (mocked `fetch`, ~480 lines):
- Shape: name, capabilities, baseUrl normalization (default / explicit
scheme / trailing slash), Authorization Bearer header, optional
OpenAI-Organization header, apiKey-required, unresolved-${VAR}
detection.
- `embed`: /embeddings POST shape, Float32Array output, batch input,
out-of-order index sort, model override, missing-model rejection,
vector-count mismatch, non-finite vector values, non-finite token
counts dropped, non-JSON response wrap, HTTP-error wrap, upstream
`error.message` extraction with length cap, fallback when envelope
absent.
- `generate`: string input → single user message, messages-array input,
system-prepend on object input, tool forwarding to OpenAI's tools
shape, tool_calls parsing with arguments deserialization, malformed-
arguments drop, responseFormat mapping (text / json / json_schema),
temperature/maxTokens mapping, finishReason mapping for
length / content_filter, non-integer / non-finite token counts
dropped, missing-choices rejection, non-string content rejection.
- `generateStream`: SSE per-event content delta, stream:true on body,
split-across-chunks SSE, [DONE] terminator (events after [DONE]
ignored), comment-line skip, invalid-JSON event sanitization
(static message; no upstream content in thrown error), SSE buffer
cap rename. Tool-call delta accumulation: assembles index-keyed
deltas, parallel tool calls via distinct indices, drops malformed
arguments, maps finish_reason='tool_calls', flushes buffer when
stream ends without finish_reason, caps cumulative tool-call
arguments at 1 MiB across events.
- AbortSignal: passthrough when no timeout configured, composed via
AbortSignal.any with per-call timeout.
- `registerOpenAIBackend`: embedding/generative kinds register under
logical names in Phase 2's registry.
`bootstrap.test.js` extends Phase 2 coverage with openai-specific cases:
single openai entry, ${ENV_VAR} resolution at bootstrap, unresolved
placeholder logs+skips, missing apiKey logs+skips, ollama and openai
entries side by side.
`configValidator.test.js` extends with discriminated-schema coverage:
openai entry with apiKey + model, ${ENV_VAR} placeholder as apiKey
value (validates clean; bootstrap resolves), baseUrl + organization
acceptance, missing-apiKey rejection, cross-backend field rejection
(host on openai, apiKey on ollama), side-by-side mixed entries,
unknown-backend pass-through to permissive schema.
Integration test (`integrationTests/server/openai-backend.test.ts`):
gated on OPENAI_API_KEY env; exercises the real OpenAI HTTP API to
validate the mocked wire format matches reality. Covers embed (single
+ batch), generate (string + messages + tools), generateStream
(content + tool-call assembly), AbortSignal cancellation. Dynamic
import of `OpenAIBackend` inside before() to dodge the pre-existing
harper_logger↔common_utils CJS require cycle (same workaround as
Phase 2's ollama integration test).
Tracking: #630, #510
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor
|
Reviewed; no blockers found. |
kriszyp
approved these changes
May 21, 2026
Member
kriszyp
left a comment
There was a problem hiding this comment.
Looks pretty similar to ollama. And looks good!
agy auth / print-mode + Gemini CLI both hang in non-interactive Claude sessions
I wonder why this doesn't work 🤔
Member
Author
I had agy doing work in other sessions at the time. Might be gemini rate limit woes. 🤷 I had agy do a one-off review from my box to verify the pr after seeing that. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Phase 3 of #510 — second real
ModelBackend, against OpenAI's HTTP API(or any OpenAI-compatible endpoint via
baseUrloverride). Validates thePhase 1 (#628) interface against a real provider with native tool-call
support and a different streaming wire format from Ollama.
Also lands
toolMode: 'return'end-to-end: caller passestools: ToolDef[], OpenAI returns tool-call requests in theGenerateResultorGenerateChunk.deltaToolCalls, caller resolvesexternally. The harder half (
toolMode: 'auto'orchestration) issplit to #612.
Closes #630
Tracking: #510
Note
Opened as Draft until CI clears, per the engineering-guidelines
step 16 (
harper-engineering-guidelines/SKILL.md). Specific reviewerassignment from the expertise matrix after Draft → Ready.
What ships
utility/expandEnvVar.ts— new reusable env-var expansion utility(
expandEnvVar,expandEnvVarsDeep,isUnresolvedEnvVarPlaceholder).Whole-string
${VAR_NAME}placeholders →process.env[VAR_NAME].Matches the convention from
@harperfast/oauth's config loader.Object.create(null)accumulator for prototype-pollution defense.components/openai/index.ts—OpenAIBackend implements ModelBackendover native
fetch(no SDK dep). Embed via/embeddings, generatevia
/chat/completions, stream via SSE. Tool-call support: forwardsGenerateInput.toolsas OpenAI'stoolsparameter; index-keyeddelta accumulator yields each tool call once when complete.
resources/models/bootstrap.ts— addsopenaito the factory map.Runs
expandEnvVarsDeep(entry)before factory dispatch. Enumeratesknown backends in the "unknown backend, skipping" error log so
operators can spot value-name typos (
backend: 'openi'). Warns whenan entry has a literal
apiKey(not wrapped in${VAR}) —surfaces the secret-on-disk anti-pattern at boot.
validation/configValidator.ts— switchesmodelEntrySchemafromflat to discriminated
Joi.alternatives().conditional('.backend', ...).ollamaandopenaiget tight schemas with.unknown(false)socross-backend field typos block boot. Unknown backends fall through
to a permissive schema so future backends pass validation; bootstrap
logs+skips at runtime.
Why native fetch, not the
openaiSDKThe official SDK is ~5MB with transitive dependencies that ship with
every Harper install regardless of whether anyone uses model APIs.
Native
fetchis ~300 lines of clean translation code, the OpenAIwire surface we touch has been stable for 2+ years, and we don't
need the SDK's auto-retry / per-request key rotation / telemetry
features. Sets the precedent for #633 anthropic (also bearer auth).
Bedrock will need an SDK for SigV4 signing — that's a Phase 6 call.
Secrets handling (sets precedent for #633)
bootstrapModelsrunsexpandEnvVarsDeepon the entry before passingit to the factory. Backend constructors detect unresolved placeholders
(env var unset) and throw a pointed "set the matching env var" error
rather than waiting for a 401 from the first request.
A literal
apiKey: 'sk-...'(not wrapped in${VAR}) is also acceptedbut logs a
warnat boot: secrets inharperdb-config.yamlland ondisk, in backups, and depending on deployment in replicated config —
the
${VAR}indirection is the documented pattern.Pre-PR verifier cadence
Per the agreed cadence for this series:
deep-reviewsingle-pass × 3 parallel agents (api / concurrency / config).Surfaced 8 findings total: 0 blockers, all Tier 4–5.
tool-call drop,
MAX_SSE_*rename, OpenAIerror.messageextraction with cap,
Object.create(null)accumulator.align whitespace check between
expandEnvVarandisUnresolvedEnvVarPlaceholder, warn on literal-credential at boot.(
claude+ claude-bot inline). Per Phase 2 lessons (agy auth /print-mode + Gemini CLI both hang in non-interactive Claude
sessions), the deep-review skill's domain agents are the productive
substrate.
Files
utility/expandEnvVar.tsunitTests/utility/expandEnvVar.test.jscomponents/openai/index.tsOpenAIBackend+registerOpenAIBackendunitTests/components/openai/index.test.jsfetchresources/models/bootstrap.tsexpandEnvVarsDeep+ enum-on-unknown log + literal-credential warnunitTests/resources/models/bootstrap.test.jsvalidation/configValidator.tsunitTests/validation/configValidator.test.jsintegrationTests/server/openai-backend.test.tsOPENAI_API_KEYAcceptance criteria
OpenAIBackendimplementsModelBackendper Phase 1's interfaceharper.models.embed()withbackend: openaiproduces a vectorharper.models.generate()produces a completionharper.models.generateStream()yieldsGenerateChunkper Phase 1's contracttoolMode: 'return'end-to-end (caller→OpenAI tools mapping; OpenAI tool-calls→GenerateResult.toolCalls/GenerateChunk.deltaToolCalls)responseFormat: 'text' | 'json' | { schema }maps to OpenAI'sresponse_formatbackend: 'openai', model, token counts, latencyAbortSignalcancels in-flight requestsbaseUrloverride works (verified via mock for Azure/Together/OpenRouter/vLLM shape)Test plan
npm run buildcleannpm run lint:requiredcleannpx prettier --check .cleanOPENAI_API_KEYsetOut of scope (per #630)
toolMode: 'auto'orchestration — split to Add agent-loop orchestration /toolMode: 'auto'toscope.models#612🤖 Generated with Claude Code