
[Models] Phase 4 — OpenAI-compatible /v1/* gateway as built-in Resources #631

@heskew

Description


Scope

Expose POST /v1/embeddings and POST /v1/chat/completions (plus a minimal GET /v1/models) on the REST port so that unmodified LangChain.js, Vercel AI SDK, OpenAI SDK, and MCP-sampling clients, among others, can call Harper for inference. The endpoints are thin protocol-translation wrappers over scope.models.

What ships:

  • Three built-in Resources (embeddings, chat completions, models) registered via scope.resources.set(path, ResourceClass) and dispatched through the REST port (HTTP_PORT, 9926) like any other Resource.
  • Request-body translation (OpenAI shape ↔ internal EmbedOpts / GenerateOpts shape).
  • SSE streaming on /v1/chat/completions when stream: true, using the openaiStream() helper from #514 (Add openaiStream() formatter helper for OpenAI-compatible SSE streaming) to produce the OpenAI chunk-delta envelope, including the terminal [DONE] sentinel.
  • OpenAI-shape error responses ({ error: { message, type, code, param } }) for genuine drop-in compatibility.
  • GET /v1/models — minimal endpoint listing registered backends' advertised model names so clients that probe for available models work.
  • Tenant scoping resolved from request.user via the same accounting path Phase 1 set up.
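The request-body translation for /v1/embeddings is small enough to sketch in full. The OpenAI request and response fields follow OpenAI's public Embeddings API; the internal EmbedOpts and result shape (one vector per input plus a promptTokens count) are assumptions, and toEmbedCall / toEmbeddingsResponse are hypothetical helper names, not existing Harper functions.

```typescript
// OpenAI /v1/embeddings request body (subset: token-array inputs are not handled here).
interface OpenAIEmbeddingsRequest {
  model: string;
  input: string | string[];
}

// Hypothetical internal shapes; the real EmbedOpts / embed() result may differ.
interface EmbedOpts {
  model?: string;
}
interface EmbedResult {
  embeddings: number[][]; // one vector per input string, in input order
  promptTokens: number;
}

// Hypothetical helper: normalize the OpenAI body into (texts, opts) for scope.models.embed().
export function toEmbedCall(body: OpenAIEmbeddingsRequest): { texts: string[]; opts: EmbedOpts } {
  const texts = typeof body.input === 'string' ? [body.input] : body.input;
  // Runtime guard: request bodies are untrusted JSON, so re-check what the types promise.
  if (!Array.isArray(texts) || texts.some((t) => typeof t !== 'string')) {
    throw new TypeError("'input' must be a string or an array of strings");
  }
  return { texts, opts: { model: body.model } };
}

// Hypothetical helper: wrap internal vectors in the OpenAI list envelope.
export function toEmbeddingsResponse(model: string, result: EmbedResult) {
  return {
    object: 'list' as const,
    data: result.embeddings.map((embedding, index) => ({
      object: 'embedding' as const,
      index,
      embedding,
    })),
    model,
    usage: { prompt_tokens: result.promptTokens, total_tokens: result.promptTokens },
  };
}
```

Keeping these as pure functions means the V1Embeddings Resource only has to call them around scope.models.embed(), and the mapping can be unit-tested without a running backend.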

Why this is in core

The issue's Acceptance section asserts "Unmodified LangChain.js / OpenAI SDK client successfully completes a chat against Harper." That requires /v1/* to actually exist; this is where it lives.

This is the single highest-leverage external-adoption piece in #510 — LangChain, Vercel AI SDK, OpenAI SDK, MCP sampling, and many internal tools speak the OpenAI shape natively.

Registration pattern

Built-in Resource registration matches the existing precedent at resources/login.ts:4:

```typescript
// resources/models/v1/index.ts
export function handleApplication(scope: Scope) {
  scope.resources.set('v1/embeddings', V1Embeddings);
  scope.resources.set('v1/chat/completions', V1ChatCompletions);
  scope.resources.set('v1/models', V1Models);
}

class V1Embeddings extends Resource {
  static async post(_target, body, request) {
    // Translate OpenAI shape → scope.models.embed()
    // Return OpenAI-shape response
  }
}

class V1ChatCompletions extends Resource {
  static async post(_target, body, request) {
    // Translate OpenAI shape → scope.models.generate() / generateStream()
    // For stream: true, yield via openaiStream() from #514
  }
}

class V1Models extends Resource {
  static async get(_target, _request) {
    // Enumerate registered backends' advertised model names
  }
}
```

Resources are dispatched through Harper's existing Resource pipeline, so the auth chain, ALS context, transaction handling, audit, and analytics.model_call accounting are all inherited automatically. No parallel infrastructure.

An earlier framing of this work pointed at route registration in server/operationsServer.ts, but that targets the operations port (9925), not the REST port (9926). Per the issue's "same port as REST" requirement, /v1/* lives on REST via the Resources registry, following the confirmed precedent at resources/login.ts:4.

Translation surface

| OpenAI request field | Internal mapping |
|---|---|
| model | EmbedOpts.model / GenerateOpts.model |
| messages[] | GenerateInput (Message[] variant) |
| tools[] | GenerateOpts.tools |
| tool_choice | GenerateOpts.toolMode ('return' for explicit choice; 'auto' deferred to #612) |
| temperature, max_tokens (or max_completion_tokens) | GenerateOpts.temperature, GenerateOpts.maxTokens |
| response_format | GenerateOpts.responseFormat |
| stream: true | Dispatches to scope.models.generateStream() instead of generate() |
| Authorization: Bearer `<token>` | Resolved by Harper's existing auth chain (same as the POST / operations route) |

| OpenAI response field | Internal source |
|---|---|
| choices[0].message.content | GenerateResult.content |
| choices[0].message.tool_calls | GenerateResult.toolCalls |
| choices[0].finish_reason | GenerateResult.finishReason |
| usage.prompt_tokens / usage.completion_tokens | TokenUsage.promptTokens / TokenUsage.completionTokens |
| id, created, object | Generated per call |
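The request and response mappings above can be read as a pair of pure functions. The sketch below assumes internal GenerateOpts / GenerateResult shapes modeled on the field names in the tables (their exact types are not specified in this issue); toGenerateOpts and toChatResponse are hypothetical names for what might live in translation.ts, and messages[] is assumed to pass through as the Message[] variant of GenerateInput, so it is not shown.

```typescript
import { randomUUID } from 'node:crypto';

// OpenAI chat-completions request body (subset covered by the mapping table).
interface OpenAIChatRequest {
  model: string;
  messages: { role: string; content: string | null }[];
  tools?: unknown[];
  tool_choice?: unknown;
  temperature?: number;
  max_tokens?: number;
  max_completion_tokens?: number;
  response_format?: unknown;
  stream?: boolean;
}

// Hypothetical internal shapes, named after the table; the real types may differ.
interface GenerateOpts {
  model?: string;
  tools?: unknown[];
  toolMode?: 'return';
  temperature?: number;
  maxTokens?: number;
  responseFormat?: unknown;
}
interface ToolCall {
  id: string;
  name: string;
  arguments: string; // JSON-encoded arguments, as OpenAI clients expect
}
interface GenerateResult {
  content: string | null;
  toolCalls?: ToolCall[];
  // Assumed to already use OpenAI's finish_reason vocabulary.
  finishReason: 'stop' | 'length' | 'tool_calls';
  usage: { promptTokens: number; completionTokens: number };
}

export function toGenerateOpts(req: OpenAIChatRequest): GenerateOpts {
  return {
    model: req.model,
    tools: req.tools,
    // With tools[] present, run in 'return' mode so tool calls go back to the client;
    // real 'auto' semantics are deferred to #612.
    toolMode: req.tools?.length ? 'return' : undefined,
    temperature: req.temperature,
    // Prefer the newer max_completion_tokens, falling back to max_tokens.
    maxTokens: req.max_completion_tokens ?? req.max_tokens,
    responseFormat: req.response_format,
  };
}

export function toChatResponse(model: string, result: GenerateResult, nowMs: number = Date.now()) {
  const toolCalls = result.toolCalls?.map((tc) => ({
    id: tc.id,
    type: 'function' as const,
    function: { name: tc.name, arguments: tc.arguments },
  }));
  return {
    id: `chatcmpl-${randomUUID()}`, // generated per call
    object: 'chat.completion' as const,
    created: Math.floor(nowMs / 1000), // Unix seconds
    model,
    choices: [
      {
        index: 0,
        message: {
          role: 'assistant' as const,
          content: result.content,
          // Only include tool_calls when the model actually requested tools.
          ...(toolCalls?.length ? { tool_calls: toolCalls } : {}),
        },
        finish_reason: result.finishReason,
      },
    ],
    usage: {
      prompt_tokens: result.usage.promptTokens,
      completion_tokens: result.usage.completionTokens,
      total_tokens: result.usage.promptTokens + result.usage.completionTokens,
    },
  };
}
```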

For streaming: openaiStream() from #514 wraps the internal AsyncIterable<GenerateChunk> and produces the OpenAI-shape SSE delta envelope plus the terminal [DONE] sentinel. Harper's existing SSE serializer at server/serverHelpers/contentTypes.ts:104-138 does the wire framing — no new streaming infrastructure.
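To make the streaming envelope concrete, here is a hedged stand-in for what openaiStream() does; it is not #514's actual implementation. The GenerateChunk shape (a text delta plus an optional finishReason) is an assumption, and toOpenAIChunks is a hypothetical name. It yields chunk payloads and then the literal '[DONE]' sentinel, leaving the "data: ...\n\n" wire framing to the SSE serializer.

```typescript
import { randomUUID } from 'node:crypto';

// Hypothetical internal chunk shape; the real GenerateChunk may carry more (e.g. tool-call deltas).
interface GenerateChunk {
  delta?: string;
  finishReason?: 'stop' | 'length' | 'tool_calls';
}

type ChunkPayload = {
  id: string;
  object: 'chat.completion.chunk';
  created: number;
  model: string;
  choices: {
    index: number;
    delta: { role?: 'assistant'; content?: string };
    finish_reason: string | null;
  }[];
};

// Hypothetical stand-in for openaiStream(): yields OpenAI chunk payloads, then '[DONE]'.
export async function* toOpenAIChunks(
  model: string,
  chunks: AsyncIterable<GenerateChunk>,
): AsyncGenerator<ChunkPayload | '[DONE]'> {
  const id = `chatcmpl-${randomUUID()}`; // every chunk in one stream shares the id
  const created = Math.floor(Date.now() / 1000);
  const envelope = (
    delta: ChunkPayload['choices'][number]['delta'],
    finish_reason: string | null,
  ): ChunkPayload => ({
    id,
    object: 'chat.completion.chunk',
    created,
    model,
    choices: [{ index: 0, delta, finish_reason }],
  });

  // OpenAI streams typically open with a delta that carries the assistant role.
  yield envelope({ role: 'assistant', content: '' }, null);
  for await (const chunk of chunks) {
    if (chunk.delta) yield envelope({ content: chunk.delta }, null);
    if (chunk.finishReason) yield envelope({}, chunk.finishReason);
  }
  yield '[DONE]';
}
```

Emitting the finish_reason in its own empty-delta chunk mirrors how OpenAI clients expect the stream to end, so SDK iterators see a final chunk before the sentinel.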

Error response shape

OpenAI clients consume errors as { error: { message, type, code, param } }. For genuine drop-in compatibility, Phase 4 emits this shape on errors (auth failures, validation errors, backend failures, etc.). A small translation layer at the Resource boundary maps Harper's internal errors to the OpenAI shape.
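A minimal sketch of that translation layer, assuming internal errors expose an optional HTTP statusCode (plus optional code / param). The InternalError shape, the toOpenAIError name, and the status-to-type table are illustrative assumptions, not OpenAI's authoritative list. The HTTP status matters most for drop-in behavior, since the official OpenAI SDKs choose their exception class (for example an authentication error on 401, a rate-limit error on 429) primarily from it.

```typescript
// Hypothetical view of an internal Harper error: only these optional fields are assumed.
interface InternalError extends Error {
  statusCode?: number;
  code?: string;
  param?: string;
}

interface OpenAIErrorBody {
  error: { message: string; type: string; code: string | null; param: string | null };
}

// Illustrative status -> type mapping (assumed strings, not a spec).
const TYPE_BY_STATUS: Record<number, string> = {
  400: 'invalid_request_error',
  401: 'authentication_error',
  403: 'permission_error',
  404: 'not_found_error',
  429: 'rate_limit_error',
};

export function toOpenAIError(err: unknown): { status: number; body: OpenAIErrorBody } {
  // Normalize thrown non-Error values (strings, objects) into an Error.
  const e: InternalError = err instanceof Error ? (err as InternalError) : new Error(String(err));
  // Only trust 4xx/5xx statuses; anything else is reported as a 500.
  const status =
    typeof e.statusCode === 'number' && e.statusCode >= 400 && e.statusCode < 600 ? e.statusCode : 500;
  return {
    status,
    body: {
      error: {
        message: e.message || 'Internal error',
        type: TYPE_BY_STATUS[status] ?? (status >= 500 ? 'server_error' : 'invalid_request_error'),
        code: e.code ?? null,
        param: e.param ?? null,
      },
    },
  };
}
```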

GET /v1/models

Some clients (notably LangChain.js's discovery flow) call GET /v1/models to enumerate available models. The endpoint returns:

```json
{
  "object": "list",
  "data": [
    { "id": "<model-name>", "object": "model", "owned_by": "<backend-name>" },
    ...
  ]
}
```

Sourced from registered backends' advertised model names via the registry.
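Building that body from the registry is a straightforward flatten. The BackendInfo shape is a hypothetical view of the registry (backend name plus advertised model names), toModelList is a hypothetical name, and deduplicating repeated model IDs by first registration is an assumption, since the issue does not say how collisions are handled.

```typescript
// Hypothetical view of the backend registry: each backend advertises model names.
interface BackendInfo {
  name: string;
  models: readonly string[];
}

interface OpenAIModel {
  id: string;
  object: 'model';
  owned_by: string;
}

// Flatten backends into the GET /v1/models body. If two backends advertise the
// same model name, the first registered backend wins (an assumed policy).
export function toModelList(backends: readonly BackendInfo[]): { object: 'list'; data: OpenAIModel[] } {
  const seen = new Set<string>();
  const data: OpenAIModel[] = [];
  for (const backend of backends) {
    for (const id of backend.models) {
      if (seen.has(id)) continue;
      seen.add(id);
      data.push({ id, object: 'model', owned_by: backend.name });
    }
  }
  return { object: 'list', data };
}
```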

Tenant scoping

Harper's existing auth chain populates request.user; the tenant is then resolved by Phase 1's extractTenantId() stub (a free-form string until the tenant-concept question is resolved; see the comment thread on #510). The tenant flows into hdb_model_calls through the standard accounting path.

Files

| Path | Change |
|---|---|
| resources/models/v1/index.ts | New: handleApplication registers the three Resources |
| resources/models/v1/embeddings.ts | New: V1Embeddings Resource |
| resources/models/v1/chatCompletions.ts | New: V1ChatCompletions Resource |
| resources/models/v1/models.ts | New: V1Models Resource |
| resources/models/v1/translation.ts | New: OpenAI ↔ internal shape mappers |
| resources/models/v1/errors.ts | New: OpenAI-shape error translation |

Acceptance criteria

  • POST /v1/embeddings accepts OpenAI-shape requests, routes to scope.models.embed(), returns OpenAI-shape responses.
  • POST /v1/chat/completions accepts OpenAI-shape requests, routes to scope.models.generate() / generateStream(), returns OpenAI-shape responses.
  • stream: true on chat completions returns SSE with the OpenAI delta envelope and [DONE] sentinel.
  • tool_choice: 'auto' requests with tools[] set work in 'return' mode (caller-resolved): the model's tool-call requests reach the OpenAI client in choices[0].message.tool_calls.
  • GET /v1/models enumerates registered backends' models.
  • OpenAI-shape error responses on all error paths.
  • Auth chain enforced — Authorization: Bearer header validated against Harper's user table; unauthenticated requests return OpenAI-shape 401.
  • Per-call accounting flows through hdb_model_calls (because the Resources call scope.models.* under the hood; Phase 1's writer handles it).
  • AbortSignal propagation: an OpenAI SSE client that disconnects mid-stream cancels the upstream call, using request.signal from #513 (Expose request.signal (AbortSignal) for Resource methods), propagated request.signal → ALS → backend.
  • End-to-end test: an unmodified LangChain.js or OpenAI SDK client successfully completes a chat against a Harper instance with an OpenAI backend configured.
  • CI green (unit + integration + 3 Node versions).
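The AbortSignal criterion can be sketched with Node's AbortController. backendTokens is a hypothetical stand-in for a provider call that honors a signal (a real backend would typically pass it to fetch()); simulateDisconnect plays the client, calling abort() after a given number of tokens where the real path would have request.signal (#513) fire via ALS.

```typescript
import { setTimeout as sleep } from 'node:timers/promises';

// Hypothetical backend stream that honors an AbortSignal between tokens.
export async function* backendTokens(
  tokens: string[],
  signal: AbortSignal,
  delayMs = 1,
): AsyncGenerator<string> {
  for (const t of tokens) {
    // timers/promises rejects with an AbortError once the signal has fired.
    await sleep(delayMs, undefined, { signal });
    yield t;
  }
}

// Consume the stream like an SSE writer would, "disconnecting" after N tokens.
export async function simulateDisconnect(
  tokens: string[],
  disconnectAfter: number,
): Promise<{ received: string[]; aborted: boolean }> {
  const controller = new AbortController(); // stands in for the source of request.signal
  const received: string[] = [];
  try {
    for await (const t of backendTokens(tokens, controller.signal)) {
      received.push(t);
      if (received.length === disconnectAfter) controller.abort(); // client went away
    }
    return { received, aborted: false };
  } catch (err) {
    // The backend stopped producing tokens because the signal fired.
    if ((err as Error).name === 'AbortError') return { received, aborted: true };
    throw err;
  }
}
```

The design point is that cancellation is pulled by the backend (it checks the signal before doing more work) rather than pushed by the Resource, which is why the signal has to be threaded all the way down.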

Out of scope

Stacks on

Hard prerequisites

Branch & PR conventions

Smoke test

```bash
# Prerequisites:
# - Phase 1 + Phase 3 (openai backend) merged and Harper running on REST port 9926
# - An OpenAI backend configured as the default generative model
# - A valid Harper auth token

# Test with the OpenAI Python SDK pointed at Harper:
python3 -c '
import openai
client = openai.OpenAI(api_key="<harper-auth-token>", base_url="http://localhost:9926/v1")
r = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "say hello in three words"}],
)
print(r.choices[0].message.content)
'

# Expected: a short response from the configured model.
# Verify: SELECT * FROM system.hdb_model_calls WHERE backend = 'openai' ORDER BY $createdtime DESC LIMIT 1
#   shows the call attributed to the authenticated tenant, method='generate'.

# Streaming smoke:
python3 -c '
import openai
client = openai.OpenAI(api_key="<harper-auth-token>", base_url="http://localhost:9926/v1")
stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "count to five"}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")
print()
'

# Expected: streaming text output, terminated cleanly.
```
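To check the stream without any SDK, a tiny SSE reader is enough. collectSseContent is a hypothetical test helper, not part of the proposal; it handles only single-line "data:" events (the subset these endpoints are expected to emit) and ignores other SSE fields such as event: and id:.

```typescript
// Collect assistant text from a raw OpenAI-style SSE body, stopping at [DONE].
export function collectSseContent(body: string): { text: string; done: boolean } {
  let text = '';
  for (const line of body.split(/\r?\n/)) {
    if (!line.startsWith('data:')) continue; // skip blank separators and other fields
    const payload = line.slice('data:'.length).trim();
    if (payload === '[DONE]') return { text, done: true };
    const chunk = JSON.parse(payload);
    // Role-only and finish chunks have no content; treat them as empty.
    text += chunk.choices?.[0]?.delta?.content ?? '';
  }
  // Reaching the end without [DONE] means the stream was cut short.
  return { text, done: false };
}
```

With Node 18+ the input can come from `await res.text()` on a fetch() POST to http://localhost:9926/v1/chat/completions with stream: true and an Authorization: Bearer header; done === true confirms the sentinel arrived.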

Tracking

Part of #510. Sub-issue 4 of 6.


🤖 Generated with Claude Code

Metadata


Assignees: none
Labels: enhancement (New feature or request)
Type: none
Projects: none
Milestone: none
Relationships: none yet
Development: no branches or pull requests
