[Models] Phase 3 — openai backend (+ toolMode: 'return') #630

@heskew

Scope

Ship the second ModelBackend implementation in core: an openai backend that talks to the OpenAI API (or any OpenAI-compatible endpoint via a baseUrl override). This validates the Phase 1 interface against a real provider that has native tool-call support and a different streaming shape from Ollama, so design flaws surface before more backends land.

Also lands toolMode: 'return' end-to-end: caller passes tools: ToolDef[], model returns tool-call requests in the GenerateResult or GenerateChunk, caller resolves externally. This is the trivial half of the tool-call story; toolMode: 'auto' orchestration is split out to #612.

What ships:

  • openai backend class implementing ModelBackend with embed, generate, generateStream.
  • Native tool-call support: GenerateOpts.tools passed to the model; tool-call deltas surface in GenerateChunk.deltaToolCalls and finalize in GenerateResult.toolCalls.
  • responseFormat mapping: text / JSON / JSON Schema modes.
  • Streaming via the OpenAI SDK's chunked completion API → AsyncIterable<GenerateChunk> per Phase 1's contract.
  • AbortSignal propagation into SDK calls.
  • Per-call accounting (including prompt_tokens, completion_tokens, latency) written through hdb_model_calls.
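
The responseFormat mapping listed above could be sketched roughly as follows. The `ResponseFormat` type is inferred from the acceptance criteria (`'text' | 'json' | { schema }`); the optional `name` field and its `'response'` default are assumptions added here because OpenAI's `json_schema` mode requires a name. The Phase 1 types may differ.

```typescript
// Hypothetical Harper-side type, inferred from the acceptance criteria.
// The optional `name` field is an assumption, not part of the spec.
type ResponseFormat = 'text' | 'json' | { schema: Record<string, unknown>; name?: string };

// Shapes accepted by the Chat Completions `response_format` parameter.
type OpenAIResponseFormat =
  | { type: 'text' }
  | { type: 'json_object' }
  | { type: 'json_schema'; json_schema: { name: string; schema: Record<string, unknown> } };

function toOpenAIResponseFormat(format: ResponseFormat | undefined): OpenAIResponseFormat | undefined {
  // No format requested: omit the parameter so the API default applies.
  if (format === undefined) return undefined;
  if (format === 'text') return { type: 'text' };
  if (format === 'json') return { type: 'json_object' };
  // JSON Schema mode needs a name; fall back to a placeholder (assumption).
  return {
    type: 'json_schema',
    json_schema: { name: format.name ?? 'response', schema: format.schema },
  };
}
```

Returning `undefined` for a missing format keeps the request body free of a `response_format` key, which tends to be safer against OpenAI-compatible endpoints that do not implement it.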

API surface

Implements the interface from Phase 1 (#628). Capability negotiation:

capabilities(): ModelCapabilities {
  return {
    embed: true,
    generate: true,
    stream: true,
    tools: true,           // first backend with native tool-call support
    adapters: false,       // OpenAI does not surface LoRA adapters externally
  };
}
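
As an illustration of how a caller or the dispatch layer might use these flags, here is a minimal sketch of a pre-flight check. `RequestedFeatures` and `missingCapabilities` are hypothetical names, not part of the Phase 1 interface; only the `ModelCapabilities` field names come from the snippet above.

```typescript
// Field names mirror the capabilities() return value shown above.
interface ModelCapabilities {
  embed: boolean;
  generate: boolean;
  stream: boolean;
  tools: boolean;
  adapters: boolean;
}

// Hypothetical request options; only the fields relevant to the check.
interface RequestedFeatures {
  tools?: unknown[];
  stream?: boolean;
}

// Returns the capabilities a request needs but the backend lacks.
function missingCapabilities(
  caps: ModelCapabilities,
  req: RequestedFeatures,
): (keyof ModelCapabilities)[] {
  const missing: (keyof ModelCapabilities)[] = [];
  if (req.tools && req.tools.length > 0 && !caps.tools) missing.push('tools');
  if (req.stream && !caps.stream) missing.push('stream');
  return missing;
}
```

A non-empty result could be turned into an early error before any network call is made.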

Configuration:

models:
  embedding:
    high-quality:
      backend: openai
      model: <your-embedding-model>     # e.g. text-embedding-3-large
      apiKey: ${OPENAI_API_KEY}          # env-var expansion via existing config pattern
  generative:
    default:
      backend: openai
      model: <your-generative-model>     # e.g. gpt-4o or gpt-4-turbo
      apiKey: ${OPENAI_API_KEY}
      baseUrl: https://api.openai.com/v1  # optional; supports OpenAI-compatible endpoints (Azure OpenAI, Together AI, OpenRouter, vLLM, etc.)
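
One detail worth noting when wiring this config into the SDK: the config key above is `baseUrl`, while the official openai package's client option is spelled `baseURL`. A minimal sketch of the translation, assuming the config has already been through env-var expansion; `OpenAIBackendConfig`, `ClientOptions`, and `toClientOptions` are hypothetical names used for illustration only.

```typescript
// Hypothetical shape of one resolved config entry from the YAML above.
interface OpenAIBackendConfig {
  backend: 'openai';
  model: string;
  apiKey: string;
  baseUrl?: string;
}

// Subset of the options the openai SDK client constructor accepts.
interface ClientOptions {
  apiKey: string;
  baseURL?: string;
}

function toClientOptions(cfg: OpenAIBackendConfig): ClientOptions {
  // Fail fast if env-var expansion did not run (assumed sanity check).
  if (!cfg.apiKey || cfg.apiKey.startsWith('${')) {
    throw new Error('openai backend: apiKey is missing or was not expanded');
  }
  const options: ClientOptions = { apiKey: cfg.apiKey };
  // Only set baseURL when overridden, so the SDK default applies otherwise.
  if (cfg.baseUrl) options.baseURL = cfg.baseUrl;
  return options;
}
```

The result would be passed to the SDK as `new OpenAI(toClientOptions(cfg))`.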

toolMode: 'return' semantics

Caller passes tools: ToolDef[] and toolMode: 'return' (default). The backend:

  1. Translates ToolDef[] to OpenAI's tools parameter on the API call.
  2. Receives the model's response which may include tool_calls.
  3. Returns those on GenerateResult.toolCalls (non-streaming) or yields them via GenerateChunk.deltaToolCalls (streaming).
  4. Does not invoke any tool — the caller decides what to do with the tool-call requests.
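
Steps 1 and 3 above amount to two small translations. The sketch below assumes `ToolDef` has the `name` / `description` / `parameters` shape used in the smoke test, and that the Harper-side `ToolCall` is a flat `{ id, name, arguments }` record; that second shape is a guess, and the Phase 1 interface is authoritative. The OpenAI shapes follow the Chat Completions API.

```typescript
// Assumed from the smoke test's tools array.
interface ToolDef {
  name: string;
  description: string;
  parameters: Record<string, unknown>; // JSON Schema
}

// Hypothetical Harper-side shape for a tool-call request.
interface ToolCall {
  id: string;
  name: string;
  arguments: string; // raw JSON text as produced by the model
}

// Chat Completions request and response shapes for function tools.
interface OpenAITool {
  type: 'function';
  function: { name: string; description: string; parameters: Record<string, unknown> };
}
interface OpenAIToolCall {
  id: string;
  type: 'function';
  function: { name: string; arguments: string };
}

// Step 1: ToolDef[] -> the `tools` request parameter.
function toOpenAITools(tools: ToolDef[]): OpenAITool[] {
  return tools.map((t): OpenAITool => ({
    type: 'function',
    function: { name: t.name, description: t.description, parameters: t.parameters },
  }));
}

// Step 3: message.tool_calls -> GenerateResult.toolCalls.
function fromOpenAIToolCalls(calls: OpenAIToolCall[] | undefined): ToolCall[] {
  return (calls ?? []).map((c) => ({
    id: c.id,
    name: c.function.name,
    arguments: c.function.arguments,
  }));
}
```

The arguments are left as a string rather than parsed here, since the model can emit invalid JSON and in 'return' mode the caller is the one who decides how to handle that.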

Caller code looks like:

const result = await scope.models.generate(messages, { tools, toolMode: 'return' });
if (result.toolCalls?.length) {
  // caller executes the tool, builds a tool-response message, calls generate again
}

toolMode: 'auto' (orchestrator-driven loop) is #612's scope; this phase only ships the type-level field acceptance and the 'return' path.
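
For illustration, the caller-side comment in the snippet above (execute the tool, build a tool-response message, call generate again) might expand into a loop like this. Everything here is a sketch under assumptions: the `Message` shape (in particular the `role: 'tool'` entry with `toolCallId`), the `ToolHandler` type, and the `maxRounds` cap are hypothetical and not taken from the Phase 1 interface.

```typescript
interface ToolCall { id: string; name: string; arguments: string }
interface GenerateResult { content: string; finishReason: string; toolCalls?: ToolCall[] }

// Hypothetical message shapes; the real ones come from Phase 1.
type Message =
  | { role: 'system' | 'user'; content: string }
  | { role: 'assistant'; content: string; toolCalls?: ToolCall[] }
  | { role: 'tool'; toolCallId: string; content: string };

type Generate = (messages: Message[]) => Promise<GenerateResult>;
type ToolHandler = (args: unknown) => unknown | Promise<unknown>;

async function runWithTools(
  generate: Generate,
  messages: Message[],
  handlers: Record<string, ToolHandler>,
  maxRounds = 4,
): Promise<GenerateResult> {
  const history: Message[] = [...messages];
  for (let round = 0; round < maxRounds; round++) {
    const result = await generate(history);
    const calls = result.toolCalls ?? [];
    if (calls.length === 0) return result; // plain answer, nothing to resolve
    // Echo the assistant turn so the tool responses have something to answer.
    history.push({ role: 'assistant', content: result.content, toolCalls: calls });
    for (const call of calls) {
      let output: unknown;
      try {
        const handler = handlers[call.name];
        if (!handler) throw new Error(`unknown tool: ${call.name}`);
        output = await handler(JSON.parse(call.arguments));
      } catch (err) {
        // Report failures back to the model instead of aborting the loop.
        output = { error: err instanceof Error ? err.message : String(err) };
      }
      history.push({ role: 'tool', toolCallId: call.id, content: JSON.stringify(output ?? null) });
    }
  }
  throw new Error(`no final answer after ${maxRounds} tool rounds`);
}
```

In a Resource method, `generate` would presumably be something like `(m) => scope.models.generate(m, { tools, toolMode: 'return' })`. This loop lives entirely in caller code; the backend itself never invokes a tool.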

Implementation notes

  • SDK: the official openai npm package.
  • SDK pinning: lock to a specific minor version, bump on a deliberate cadence per the existing Harper third-party trust model (a few days after upstream release).
  • Streaming: SDK's stream: true path yields delta events; backend translates them to GenerateChunk.
  • Token-count fields from result.usage (prompt_tokens, completion_tokens, total_tokens) map directly to TokenUsage.
  • gpu_ms is not reported by OpenAI's API; left undefined in the accounting record.
  • inputType: 'document' | 'query' on EmbedOpts: OpenAI's embedding models don't currently distinguish; field is ignored when present.
  • baseUrl override allows OpenAI-compatible third parties (Azure OpenAI, Together, OpenRouter, vLLM's own OpenAI shim, etc.) — sets the foundation for the FAB-503 fabric backend to also speak OpenAI-shape if it wants to.
  • signal: AbortSignal passed into the SDK's signal option; client disconnect → SDK aborts the underlying request.
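
The streaming note above hides one wrinkle: with `stream: true`, the API delivers each tool call in fragments keyed by `index`, and the `arguments` JSON arrives as string pieces that have to be concatenated. A minimal sketch of how the backend might assemble the final `GenerateResult.toolCalls` from those fragments; `ToolCallAccumulator` and the flat `ToolCall` shape are hypothetical names, while `ToolCallDelta` follows the `choices[0].delta.tool_calls` entries of the Chat Completions stream.

```typescript
// One entry of `choices[0].delta.tool_calls` on a streamed chunk.
interface ToolCallDelta {
  index: number;
  id?: string;
  type?: 'function';
  function?: { name?: string; arguments?: string };
}

// Hypothetical Harper-side shape for a finished tool call.
interface ToolCall { id: string; name: string; arguments: string }

class ToolCallAccumulator {
  private readonly partial = new Map<number, ToolCall>();

  // Merge the fragments from one chunk into the per-index partial calls.
  add(deltas: ToolCallDelta[] | undefined): void {
    for (const d of deltas ?? []) {
      const entry = this.partial.get(d.index) ?? { id: '', name: '', arguments: '' };
      if (d.id) entry.id = d.id; // sent on the first fragment
      if (d.function?.name) entry.name = d.function.name; // likewise
      if (d.function?.arguments) entry.arguments += d.function.arguments; // streamed in pieces
      this.partial.set(d.index, entry);
    }
  }

  // Completed calls in index order, for GenerateResult.toolCalls.
  finalize(): ToolCall[] {
    return Array.from(this.partial.entries())
      .sort(([a], [b]) => a - b)
      .map(([, call]) => call);
  }
}
```

The raw fragments can still be forwarded as `GenerateChunk.deltaToolCalls` while the accumulator runs alongside, so streaming callers see deltas and the final result carries whole calls.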

Files

| Path | Change |
| --- | --- |
| resources/models/backends/openai.ts | new: OpenAIBackend class |
| resources/models/backends/index.ts | extended: register openai factory |
| package.json | new dep: openai (pinned version) |
| test/models/openai.test.ts | new: integration tests (mocked HTTP in CI; live test against real API behind an env-gated flag) |

Acceptance criteria

  • OpenAIBackend implements ModelBackend per Phase 1's interface.
  • scope.models.embed() with backend: openai produces a vector from OpenAI's embedding endpoint.
  • scope.models.generate() with backend: openai produces a completion.
  • scope.models.generateStream() yields content deltas; tool-call deltas surface via GenerateChunk.deltaToolCalls when the model uses tools.
  • toolMode: 'return' end-to-end: caller passes tools, backend translates to OpenAI's tools param, model's tool-call requests reach the caller via GenerateResult.toolCalls / GenerateChunk.deltaToolCalls.
  • responseFormat: 'text' | 'json' | { schema } correctly maps to OpenAI's response_format parameter.
  • Per-call accounting records backend: 'openai', model, token counts, latency, success.
  • AbortSignal from BackendOpts cancels in-flight SDK calls.
  • baseUrl override works against an OpenAI-compatible third party (test fixture or mock).
  • SDK version pinned and documented; bump cadence noted in PR description.
  • Unit tests cover translation logic (ToolDef[] ↔ OpenAI tools, responseFormat mapping, streaming delta shape).
  • Integration test against real OpenAI behind OPENAI_API_KEY env gate.
  • CI green (unit + integration + 3 Node versions).
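
The accounting criterion above (backend, model, token counts, latency, success) could be met with a small wrapper around each SDK call. This is only a sketch: the record field names are taken from the smoke test's expected hdb_model_calls row, but `withAccounting`, the `write` callback, and the injectable `now` clock are hypothetical, and how records actually reach hdb_model_calls is defined by Phase 1.

```typescript
// `usage` block as returned by the OpenAI API.
interface OpenAIUsage { prompt_tokens: number; completion_tokens: number; total_tokens: number }

// Assumed record shape, based on the fields named in the smoke test.
interface ModelCallRecord {
  backend: 'openai';
  method: 'embed' | 'generate' | 'generateStream';
  model: string;
  prompt_tokens?: number | undefined;
  completion_tokens?: number | undefined;
  latency_ms: number;
  success: boolean;
  gpu_ms?: number | undefined; // never set: OpenAI does not report it
}

async function withAccounting<T extends { usage?: OpenAIUsage | undefined }>(
  meta: { method: ModelCallRecord['method']; model: string },
  call: () => Promise<T>,
  write: (record: ModelCallRecord) => void,
  now: () => number = Date.now,
): Promise<T> {
  const started = now();
  try {
    const result = await call();
    write({
      backend: 'openai',
      ...meta,
      prompt_tokens: result.usage?.prompt_tokens,
      completion_tokens: result.usage?.completion_tokens,
      latency_ms: now() - started,
      success: true,
    });
    return result;
  } catch (err) {
    // Failed and aborted calls are recorded too, without token counts.
    write({ backend: 'openai', ...meta, latency_ms: now() - started, success: false });
    throw err;
  }
}
```

Passing the clock in as a parameter keeps the latency field deterministic in unit tests.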

Out of scope

Stacks on

Independently shippable after Phase 1; can proceed in parallel with Phase 2.

Hard prerequisites

Branch & PR conventions

Smoke test

# Prerequisites:
# - OPENAI_API_KEY set in env
# - Harper config has models.generative.default = { backend: openai, model: gpt-4o-mini, apiKey: ${OPENAI_API_KEY} }

# In a Resource method:
class ChatTest extends Resource {
  async post(_target, body, _request) {
    return await scope.models.generate([{ role: 'user', content: body.q }]);
  }
}

curl -X POST http://localhost:9926/ChatTest/ \
  -H 'Content-Type: application/json' \
  -d '{"q": "what is 2+2?"}'

# Expected: { content: "4" (or similar), finishReason: "stop", ... }
# Verify: SELECT * FROM system.hdb_model_calls WHERE backend = 'openai' ORDER BY $createdtime DESC LIMIT 1
#   shows method='generate', model='gpt-4o-mini', prompt_tokens, completion_tokens, latency_ms, success=true.

# Tool-call smoke test (return mode):
const tools = [{ name: 'get_weather', description: '...', parameters: {/* JSON Schema */} }];
const result = await scope.models.generate(messages, { tools, toolMode: 'return' });
// Expected: if the model decides to call get_weather, result.toolCalls has one entry; result.content may be empty.

Tracking

Part of #510. Sub-issue 3 of 6.


🤖 Generated with Claude Code
