[Models] Phase 4 — OpenAI-compatible /v1/* gateway as built-in Resources


## Scope

Expose `POST /v1/embeddings` and `POST /v1/chat/completions` on the REST port so unmodified LangChain.js, Vercel AI SDK, OpenAI SDK, MCP-sampling clients, etc. can hit Harper for inference. The endpoints are thin protocol-translation wrappers over `scope.models`.

What ships:

- Two built-in Resources registered via `scope.resources.set(path, ResourceClass)`, dispatched through the REST port (`HTTP_PORT`, 9926) like any other Resource.
- Request-body translation (OpenAI shape ↔ internal `EmbedOpts` / `GenerateOpts` shape).
- SSE streaming on `/v1/chat/completions` for `stream: true`, using `openaiStream()` from **#514** to format the OpenAI chunk-delta envelope (including the terminal `[DONE]` sentinel).
- OpenAI-shape error responses (`{ error: { message, type, code, param } }`) for genuine drop-in compatibility.
- `GET /v1/models` — minimal endpoint listing registered backends' advertised model names so clients that probe for available models work.
- Tenant scoping resolved from `request.user` via the same accounting path Phase 1 set up.

## Why this is in core

The issue's Acceptance section asserts "Unmodified LangChain.js / OpenAI SDK client successfully completes a chat against Harper." That requires `/v1/*` to actually exist; this is where it lives.

This is the single highest-leverage external-adoption piece in #510 — LangChain, Vercel AI SDK, OpenAI SDK, MCP sampling, and many internal tools speak the OpenAI shape natively.

## Registration pattern

Built-in Resource registration matches the existing precedent at `resources/login.ts:4`:

```typescript
// resources/models/v1/index.ts
export function handleApplication(scope: Scope) {
  scope.resources.set('v1/embeddings', V1Embeddings);
  scope.resources.set('v1/chat/completions', V1ChatCompletions);
  scope.resources.set('v1/models', V1Models);
}

class V1Embeddings extends Resource {
  static async post(_target, body, request) {
    // Translate OpenAI shape → scope.models.embed()
    // Return OpenAI-shape response
  }
}

class V1ChatCompletions extends Resource {
  static async post(_target, body, request) {
    // Translate OpenAI shape → scope.models.generate() / generateStream()
    // For stream: true, yield via openaiStream() from #514
  }
}

class V1Models extends Resource {
  static async get(_target, _request) {
    // Enumerate registered backends' advertised model names
  }
}
```

Resources are dispatched through Harper's existing Resource pipeline — auth chain, ALS context, transaction handling, audit, `analytics.model_call` accounting all inherit automatically. No parallel infrastructure.

Earlier framing of this work pointed at `server/operationsServer.ts` route registration; that was the operations port (9925), not the REST port (9926). `/v1/*` lives on REST per the issue's "same port as REST" requirement, via the Resources registry — confirmed precedent at `resources/login.ts:4`.

## Translation surface

| OpenAI request field | Internal mapping |
|---|---|
| `model` | `EmbedOpts.model` / `GenerateOpts.model` |
| `messages[]` | `GenerateInput` (Message[] variant) |
| `tools[]` | `GenerateOpts.tools` |
| `tool_choice` | `GenerateOpts.toolMode` ('return' for explicit choice; 'auto' deferred to #612) |
| `temperature`, `max_tokens` (or `max_completion_tokens`) | `GenerateOpts.temperature`, `GenerateOpts.maxTokens` |
| `response_format` | `GenerateOpts.responseFormat` |
| `stream: true` | dispatches to `scope.models.generateStream()` instead of `generate()` |
| `Authorization: Bearer <token>` | resolved by Harper's existing auth chain (same as `POST /` operations route) |

| OpenAI response field | Internal source |
|---|---|
| `choices[0].message.content` | `GenerateResult.content` |
| `choices[0].message.tool_calls` | `GenerateResult.toolCalls` |
| `choices[0].finish_reason` | `GenerateResult.finishReason` |
| `usage.prompt_tokens` / `usage.completion_tokens` | `TokenUsage.promptTokens` / `completionTokens` |
| `id`, `created`, `object` | generated per-call |

For streaming: `openaiStream()` from #514 wraps the internal `AsyncIterable<GenerateChunk>` and produces the OpenAI-shape SSE delta envelope plus the terminal `[DONE]` sentinel. Harper's existing SSE serializer at `server/serverHelpers/contentTypes.ts:104-138` does the wire framing — no new streaming infrastructure.

## Error response shape

OpenAI clients consume errors as `{ error: { message, type, code, param } }`. For genuine drop-in compatibility, Phase 4 emits this shape on errors (auth failures, validation errors, backend failures, etc.). A small translation layer at the Resource boundary maps Harper's internal errors to the OpenAI shape.

## `GET /v1/models`

Some clients (notably LangChain.js's discovery flow) call `GET /v1/models` to enumerate available models. Returns:

```json
{
  "object": "list",
  "data": [
    { "id": "<model-name>", "object": "model", "owned_by": "<backend-name>" },
    ...
  ]
}
```

Sourced from registered backends' advertised model names via the registry.

## Tenant scoping

`request.user` populated by Harper's existing auth chain → tenant resolved per Phase 1's `extractTenantId()` stub (free-form string until the tenant-concept question resolves; see comment thread on #510). Tenant flows into `hdb_model_calls` per the standard accounting path.

## Files

| Path | Change |
|---|---|
| `resources/models/v1/index.ts` | new — `handleApplication` registers the three Resources |
| `resources/models/v1/embeddings.ts` | new — `V1Embeddings` Resource |
| `resources/models/v1/chatCompletions.ts` | new — `V1ChatCompletions` Resource |
| `resources/models/v1/models.ts` | new — `V1Models` Resource |
| `resources/models/v1/translation.ts` | new — OpenAI ↔ internal shape mappers |
| `resources/models/v1/errors.ts` | new — OpenAI-shape error translation |

## Acceptance criteria

- [ ] `POST /v1/embeddings` accepts OpenAI-shape requests, routes to `scope.models.embed()`, returns OpenAI-shape responses.
- [ ] `POST /v1/chat/completions` accepts OpenAI-shape requests, routes to `scope.models.generate()` / `generateStream()`, returns OpenAI-shape responses.
- [ ] `stream: true` on chat completions returns SSE with the OpenAI delta envelope and `[DONE]` sentinel.
- [ ] `tool_choice: 'auto'` requests with `tools[]` set work in `'return'` mode (caller-resolved) — model's tool-call requests reach the OpenAI client in `choices[0].message.tool_calls`.
- [ ] `GET /v1/models` enumerates registered backends' models.
- [ ] OpenAI-shape error responses on all error paths.
- [ ] Auth chain enforced — `Authorization: Bearer` header validated against Harper's user table; unauthenticated requests return OpenAI-shape 401.
- [ ] Per-call accounting flows through `hdb_model_calls` (because the Resources call `scope.models.*` under the hood; Phase 1's writer handles it).
- [ ] `AbortSignal` propagation: an OpenAI SSE client that disconnects mid-stream cancels the upstream call. (Uses `request.signal` from #513 → ALS → backend.)
- [ ] **End-to-end test**: an unmodified LangChain.js or OpenAI SDK client successfully completes a chat against a Harper instance with an OpenAI backend configured.
- [ ] CI green (unit + integration + 3 Node versions).

## Out of scope

- `tool_choice: 'auto'` with orchestrated tool execution — that's the `toolMode: 'auto'` orchestration in **#612**; this phase only ships the `'return'` path.
- Rate-limit response headers (`x-ratelimit-remaining-tokens`, etc.) — reserve the header-mutation pathway in the Resource; a future component / phase can populate them once rate limiting is built.
- OpenAI Assistants / Files / Images / Audio endpoints — only `/v1/embeddings` and `/v1/chat/completions` in v1.
- `Last-Event-ID` replay for SSE — that's #515; pairs with #511 conversation feeds, not this gateway.

## Stacks on

- **Phase 1** (#628) — for `scope.models` facade.
- **Phase 3** (#630) — for the `openai` backend that validates the OpenAI-shape translation end-to-end (since the gateway converts requests for backends, the test bar is "an OpenAI-shape backend can be driven via the gateway").

## Hard prerequisites

- **#513** (`request.signal`) — streaming cancellation depends on it.
- **#514** (`openaiStream()` formatter helper) — used to format the SSE delta envelope.

## Branch & PR conventions

- Branch: `feat/models-v1-gateway`
- PR base: `main` (after Phases 1 and 3 merge).
- Closes this issue via `Closes #<self>`; references #510 via `Tracking: #510`.

## Smoke test

```bash
# Prerequisites:
# - Phase 1 + Phase 3 (openai backend) merged and Harper running on REST port 9926
# - An OpenAI backend configured as default generative model
# - A valid Harper auth token

# Test with the OpenAI Python SDK pointed at Harper:
python3 -c '
import openai
client = openai.OpenAI(api_key="<harper-auth-token>", base_url="http://localhost:9926/v1")
r = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "say hello in three words"}],
)
print(r.choices[0].message.content)
'

# Expected: a short response from the configured model.
# Verify: SELECT * FROM system.hdb_model_calls WHERE backend = 'openai' ORDER BY $createdtime DESC LIMIT 1
#   shows the call attributed to the authenticated tenant, method='generate'.

# Streaming smoke:
python3 -c '
import openai
client = openai.OpenAI(api_key="<harper-auth-token>", base_url="http://localhost:9926/v1")
stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "count to five"}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")
print()
'

# Expected: streaming text output, terminated cleanly.
```

## Tracking

Part of #510. Sub-issue 4 of 6.

---

🤖 Generated with [Claude Code](https://claude.com/claude-code)


OpenAI request field	Internal mapping
`model`	`EmbedOpts.model` / `GenerateOpts.model`
`messages[]`	`GenerateInput` (Message[] variant)
`tools[]`	`GenerateOpts.tools`
`tool_choice`	`GenerateOpts.toolMode` ('return' for explicit choice; 'auto' deferred to #612)
`temperature`, `max_tokens` (or `max_completion_tokens`)	`GenerateOpts.temperature`, `GenerateOpts.maxTokens`
`response_format`	`GenerateOpts.responseFormat`
`stream: true`	dispatches to `scope.models.generateStream()` instead of `generate()`
`Authorization: Bearer <token>`	resolved by Harper's existing auth chain (same as `POST /` operations route)

OpenAI response field	Internal source
`choices[0].message.content`	`GenerateResult.content`
`choices[0].message.tool_calls`	`GenerateResult.toolCalls`
`choices[0].finish_reason`	`GenerateResult.finishReason`
`usage.prompt_tokens` / `usage.completion_tokens`	`TokenUsage.promptTokens` / `completionTokens`
`id`, `created`, `object`	generated per-call

Path	Change
`resources/models/v1/index.ts`	new — `handleApplication` registers the three Resources
`resources/models/v1/embeddings.ts`	new — `V1Embeddings` Resource
`resources/models/v1/chatCompletions.ts`	new — `V1ChatCompletions` Resource
`resources/models/v1/models.ts`	new — `V1Models` Resource
`resources/models/v1/translation.ts`	new — OpenAI ↔ internal shape mappers
`resources/models/v1/errors.ts`	new — OpenAI-shape error translation

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Models] Phase 4 — OpenAI-compatible /v1/* gateway as built-in Resources #631

Scope

Why this is in core

Registration pattern

Translation surface

Error response shape

`GET /v1/models`

Tenant scoping

Files

Acceptance criteria

Out of scope

Stacks on

Hard prerequisites

Branch & PR conventions

Smoke test

Tracking

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

[Models] Phase 4 — OpenAI-compatible /v1/* gateway as built-in Resources #631

Description

Scope

Why this is in core

Registration pattern

Translation surface

Error response shape

GET /v1/models

Tenant scoping

Files

Acceptance criteria

Out of scope

Stacks on

Hard prerequisites

Branch & PR conventions

Smoke test

Tracking

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

`GET /v1/models`