[Models] Phase 3 — openai backend (+ toolMode: 'return')


## Scope

Ship the second `ModelBackend` implementation in core: an `openai` backend that talks to the OpenAI API (or any OpenAI-compatible endpoint via `baseUrl` override). Validates the Phase 1 interface against a real provider with native tool-call support and a different streaming shape from Ollama, catching design flaws before more backends land.

Also lands `toolMode: 'return'` end-to-end: caller passes `tools: ToolDef[]`, model returns tool-call requests in the `GenerateResult` or `GenerateChunk`, caller resolves externally. This is the trivial half of the tool-call story; `toolMode: 'auto'` orchestration is split out to #612.

What ships:

- `openai` backend class implementing `ModelBackend` with `embed`, `generate`, `generateStream`.
- Native tool-call support: `GenerateOpts.tools` passed to the model; tool-call deltas surface in `GenerateChunk.deltaToolCalls` and finalize in `GenerateResult.toolCalls`.
- `responseFormat` mapping: text / JSON / JSON Schema modes.
- Streaming via the OpenAI SDK's chunked completion API → `AsyncIterable<GenerateChunk>` per Phase 1's contract.
- `AbortSignal` propagation into SDK calls.
- Per-call accounting (including prompt_tokens, completion_tokens, latency) written through `hdb_model_calls`.

## API surface

Implements the interface from Phase 1 (#628). Capability negotiation:

```typescript
capabilities(): ModelCapabilities {
  return {
    embed: true,
    generate: true,
    stream: true,
    tools: true,           // first backend with native tool-call support
    adapters: false,       // OpenAI does not surface LoRA adapters externally
  };
}
```

Configuration:

```yaml
models:
  embedding:
    high-quality:
      backend: openai
      model: <your-embedding-model>     # e.g. text-embedding-3-large
      apiKey: ${OPENAI_API_KEY}          # env-var expansion via existing config pattern
  generative:
    default:
      backend: openai
      model: <your-generative-model>     # e.g. gpt-4o or gpt-4-turbo
      apiKey: ${OPENAI_API_KEY}
      baseUrl: https://api.openai.com/v1  # optional; supports OpenAI-compatible endpoints (Azure OpenAI, Together AI, OpenRouter, vLLM, etc.)
```

## `toolMode: 'return'` semantics

Caller passes `tools: ToolDef[]` and `toolMode: 'return'` (default). The backend:

1. Translates `ToolDef[]` to OpenAI's `tools` parameter on the API call.
2. Receives the model's response which may include `tool_calls`.
3. Returns those on `GenerateResult.toolCalls` (non-streaming) or yields them via `GenerateChunk.deltaToolCalls` (streaming).
4. Does **not** invoke any tool — the caller decides what to do with the tool-call requests.

Caller code looks like:

```typescript
const result = await scope.models.generate(messages, { tools, toolMode: 'return' });
if (result.toolCalls?.length) {
  // caller executes the tool, builds a tool-response message, calls generate again
}
```

`toolMode: 'auto'` (orchestrator-driven loop) is **#612**'s scope; this phase only ships the type-level field acceptance and the `'return'` path.

## Implementation notes

- SDK: the official `openai` npm package.
- SDK pinning: lock to a specific minor version, bump on a deliberate cadence per the existing Harper third-party trust model (a few days after upstream release).
- Streaming: SDK's `stream: true` path yields delta events; backend translates them to `GenerateChunk`.
- Token-count fields from `result.usage` (`prompt_tokens`, `completion_tokens`, `total_tokens`) map directly to `TokenUsage`.
- `gpu_ms` is not reported by OpenAI's API; left undefined in the accounting record.
- `inputType: 'document' | 'query'` on `EmbedOpts`: OpenAI's embedding models don't currently distinguish; field is ignored when present.
- `baseUrl` override allows OpenAI-compatible third parties (Azure OpenAI, Together, OpenRouter, vLLM's own OpenAI shim, etc.) — sets the foundation for the FAB-503 `fabric` backend to also speak OpenAI-shape if it wants to.
- `signal: AbortSignal` passed into the SDK's `signal` option; client disconnect → SDK aborts the underlying request.

## Files

| Path | Change |
|---|---|
| `resources/models/backends/openai.ts` | new — `OpenAIBackend` class |
| `resources/models/backends/index.ts` | extended — register `openai` factory |
| `package.json` | new dep — `openai` (pinned version) |
| `test/models/openai.test.ts` | new — integration tests (mocked HTTP in CI; live test against real API behind an env-gated flag) |

## Acceptance criteria

- [ ] `OpenAIBackend` implements `ModelBackend` per Phase 1's interface.
- [ ] `scope.models.embed()` with `backend: openai` produces a vector from OpenAI's embedding endpoint.
- [ ] `scope.models.generate()` with `backend: openai` produces a completion.
- [ ] `scope.models.generateStream()` yields content deltas; tool-call deltas surface via `GenerateChunk.deltaToolCalls` when the model uses tools.
- [ ] `toolMode: 'return'` end-to-end: caller passes `tools`, backend translates to OpenAI's `tools` param, model's tool-call requests reach the caller via `GenerateResult.toolCalls` / `GenerateChunk.deltaToolCalls`.
- [ ] `responseFormat: 'text' | 'json' | { schema }` correctly maps to OpenAI's `response_format` parameter.
- [ ] Per-call accounting records `backend: 'openai'`, model, token counts, latency, success.
- [ ] `AbortSignal` from `BackendOpts` cancels in-flight SDK calls.
- [ ] `baseUrl` override works against an OpenAI-compatible third party (test fixture or mock).
- [ ] SDK version pinned and documented; bump cadence noted in PR description.
- [ ] Unit tests cover translation logic (`ToolDef[]` ↔ OpenAI `tools`, `responseFormat` mapping, streaming delta shape).
- [ ] Integration test against real OpenAI behind `OPENAI_API_KEY` env gate.
- [ ] CI green (unit + integration + 3 Node versions).

## Out of scope

- `toolMode: 'auto'` orchestration — split to **#612** (agent-loop orchestration).
- Anthropic / Bedrock backends — Phase 6 (#633).
- OpenAI batch API / long-running operations — `ModelCallResult.pending` is reserved in the interface but no backend emits it in v1.
- OpenAI Assistants API, Files API, image / audio endpoints — not in #510's scope.
- Prompt caching (Anthropic-style) — not exposed by the OpenAI provider in a stable shape; revisit when provider supports it.

## Stacks on

- **Phase 1** (#628) — needs the `ModelBackend` interface, backend registry, and `hdb_model_calls` writer.

Independently shippable after Phase 1; parallel-able with Phase 2.

## Hard prerequisites

- **#513** (`request.signal` exposed on Resource methods) — `Models` facade reads `ctx.signal` from ALS; this phase consumes it through `BackendOpts.signal` into the SDK's `signal` option.

## Branch & PR conventions

- Branch: `feat/models-openai-backend`
- PR base: `main` (after Phase 1 merges).
- Closes this issue via `Closes #<self>`; references #510 via `Tracking: #510`.

## Smoke test

```bash
# Prerequisites:
# - OPENAI_API_KEY set in env
# - Harper config has models.generative.default = { backend: openai, model: gpt-4o-mini, apiKey: ${OPENAI_API_KEY} }

# In a Resource method:
class ChatTest extends Resource {
  async post(_target, body, _request) {
    return await scope.models.generate([{ role: 'user', content: body.q }]);
  }
}

curl -X POST http://localhost:9926/ChatTest/ \
  -H 'Content-Type: application/json' \
  -d '{"q": "what is 2+2?"}'

# Expected: { content: "4" (or similar), finishReason: "stop", ... }
# Verify: SELECT * FROM system.hdb_model_calls WHERE backend = 'openai' ORDER BY $createdtime DESC LIMIT 1
#   shows method='generate', model='gpt-4o-mini', prompt_tokens, completion_tokens, latency_ms, success=true.

# Tool-call smoke test (return mode):
const tools = [{ name: 'get_weather', description: '...', parameters: {/* JSON Schema */} }];
const result = await scope.models.generate(messages, { tools, toolMode: 'return' });
// Expected: if the model decides to call get_weather, result.toolCalls has one entry; result.content may be empty.
```

## Tracking

Part of #510. Sub-issue 3 of 6.

---

🤖 Generated with [Claude Code](https://claude.com/claude-code)



Path	Change
`resources/models/backends/openai.ts`	new — `OpenAIBackend` class
`resources/models/backends/index.ts`	extended — register `openai` factory
`package.json`	new dep — `openai` (pinned version)
`test/models/openai.test.ts`	new — integration tests (mocked HTTP in CI; live test against real API behind an env-gated flag)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Models] Phase 3 — openai backend (+ toolMode: 'return') #630

Scope

API surface

`toolMode: 'return'` semantics

Implementation notes

Files

Acceptance criteria

Out of scope

Stacks on

Hard prerequisites

Branch & PR conventions

Smoke test

Tracking

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

[Models] Phase 3 — openai backend (+ toolMode: 'return') #630

Description

Scope

API surface

toolMode: 'return' semantics

Implementation notes

Files

Acceptance criteria

Out of scope

Stacks on

Hard prerequisites

Branch & PR conventions

Smoke test

Tracking

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

`toolMode: 'return'` semantics