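Scope
Expose POST /v1/embeddings and POST /v1/chat/completions on the REST port so unmodified LangChain.js, Vercel AI SDK, OpenAI SDK, MCP-sampling clients, etc. can hit Harper for inference. The endpoints are thin protocol-translation wrappers over scope.models.
What ships:
Two built-in Resources registered via scope.resources.set(path, ResourceClass), dispatched through the REST port (HTTP_PORT, 9926) like any other Resource.
Request translation (OpenAI shape to the internal EmbedOpts / GenerateOpts shape).
Streaming on /v1/chat/completions for stream: true, using openaiStream() from #514 to format the OpenAI chunk-delta envelope (including the terminal [DONE] sentinel).
OpenAI-shape error responses ({ error: { message, type, code, param } }) for genuine drop-in compatibility.
GET /v1/models: a minimal endpoint listing registered backends' advertised model names, so clients that probe for available models work.
Tenant scoping resolved from request.user via the same accounting path Phase 1 set up.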
Why this is in core
The issue's Acceptance section asserts "Unmodified LangChain.js / OpenAI SDK client successfully completes a chat against Harper." That requires /v1/* to actually exist; this is where it lives.
This is the single highest-leverage external-adoption piece in #510 — LangChain, Vercel AI SDK, OpenAI SDK, MCP sampling, and many internal tools speak the OpenAI shape natively.
Registration pattern
Built-in Resource registration matches the existing precedent at resources/login.ts:4:
```ts
// resources/models/v1/index.ts
export function handleApplication(scope: Scope) {
  scope.resources.set('v1/embeddings', V1Embeddings);
  scope.resources.set('v1/chat/completions', V1ChatCompletions);
  scope.resources.set('v1/models', V1Models);
}

class V1Embeddings extends Resource {
  static async post(_target, body, request) {
    // Translate OpenAI shape → scope.models.embed()
    // Return OpenAI-shape response
  }
}

class V1ChatCompletions extends Resource {
  static async post(_target, body, request) {
    // Translate OpenAI shape → scope.models.generate() / generateStream()
    // For stream: true, yield via openaiStream() from #514
  }
}

class V1Models extends Resource {
  static async get(_target, _request) {
    // Enumerate registered backends' advertised model names
  }
}
```
Resources are dispatched through Harper's existing Resource pipeline — auth chain, ALS context, transaction handling, audit, analytics.model_call accounting all inherit automatically. No parallel infrastructure.
Earlier framing of this work pointed at server/operationsServer.ts route registration; that was the operations port (9925), not the REST port (9926). /v1/* lives on REST per the issue's "same port as REST" requirement, via the Resources registry — confirmed precedent at resources/login.ts:4.
Translation surface
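| OpenAI request field | Internal mapping |
| --- | --- |
| model | EmbedOpts.model / GenerateOpts.model |
| messages[] | GenerateInput (Message[] variant) |
| tools[] | GenerateOpts.tools |
| tool_choice | GenerateOpts.toolMode ('return' for explicit choice; 'auto' deferred to #612) |
| temperature, max_tokens (or max_completion_tokens) | GenerateOpts.temperature, GenerateOpts.maxTokens |
| response_format | GenerateOpts.responseFormat |
| stream: true | dispatches to scope.models.generateStream() instead of generate() |
| Authorization: Bearer <token> | resolved by Harper's existing auth chain (same as POST / operations route) |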
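As a rough illustration of the request mapping, the sketch below translates an OpenAI-shape chat request into a call description for scope.models. The type definitions, the toGenerateCall name, the choice to set toolMode: 'return' whenever tools are present, and the preference for max_completion_tokens over max_tokens are all assumptions made for this example; the real EmbedOpts / GenerateOpts types may differ.

```typescript
// Hypothetical shapes for illustration; the real types live behind scope.models.
type Message = { role: string; content: string };
type ToolDef = { type: "function"; function: { name: string; parameters?: unknown } };

interface OpenAIChatRequest {
  model: string;
  messages: Message[];
  tools?: ToolDef[];
  tool_choice?: unknown;
  temperature?: number;
  max_tokens?: number;
  max_completion_tokens?: number;
  response_format?: unknown;
  stream?: boolean;
}

interface GenerateOpts {
  model: string;
  tools?: ToolDef[];
  toolMode?: "return";
  temperature?: number;
  maxTokens?: number;
  responseFormat?: unknown;
}

interface GenerateCall {
  input: Message[];
  opts: GenerateOpts;
  stream: boolean; // true: generateStream(), false: generate()
}

function toGenerateCall(body: OpenAIChatRequest): GenerateCall {
  const opts: GenerateOpts = { model: body.model };
  if (body.tools && body.tools.length > 0) {
    opts.tools = body.tools;
    // Assumption: this phase only ships the caller-resolved 'return' path.
    opts.toolMode = "return";
  }
  if (body.temperature !== undefined) opts.temperature = body.temperature;
  // Assumption: prefer the newer field name when both are supplied.
  const maxTokens = body.max_completion_tokens ?? body.max_tokens;
  if (maxTokens !== undefined) opts.maxTokens = maxTokens;
  if (body.response_format !== undefined) opts.responseFormat = body.response_format;
  return { input: body.messages, opts, stream: body.stream === true };
}
```

Returning a plain description rather than calling scope.models directly keeps the mapper a pure function, which makes it straightforward to unit-test separately from the Resource.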
| OpenAI response field | Internal source |
| --- | --- |
| choices[0].message.content | GenerateResult.content |
| choices[0].message.tool_calls | GenerateResult.toolCalls |
| choices[0].finish_reason | GenerateResult.finishReason |
| usage.prompt_tokens / usage.completion_tokens | TokenUsage.promptTokens / completionTokens |
| id, created, object | generated per-call |
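The response direction can be sketched the same way. The GenerateResult, ToolCall, and TokenUsage shapes below are hypothetical stand-ins (including the assumption that usage hangs off the result as result.usage), and toChatCompletion is an illustrative name; only the OpenAI-side field names follow the public chat-completion response format.

```typescript
// Hypothetical internal shapes; the real GenerateResult may differ.
interface ToolCall { id: string; name: string; arguments: string }
interface TokenUsage { promptTokens: number; completionTokens: number }
interface GenerateResult {
  content: string | null;
  toolCalls?: ToolCall[];
  finishReason: string;
  usage: TokenUsage;
}

interface ChatMessage {
  role: "assistant";
  content: string | null;
  tool_calls?: { id: string; type: "function"; function: { name: string; arguments: string } }[];
}

function toChatCompletion(result: GenerateResult, model: string, nowMs: number = Date.now()) {
  const message: ChatMessage = { role: "assistant", content: result.content };
  if (result.toolCalls && result.toolCalls.length > 0) {
    // In 'return' mode the tool calls are handed back to the client unresolved.
    message.tool_calls = result.toolCalls.map((tc) => ({
      id: tc.id,
      type: "function" as const,
      function: { name: tc.name, arguments: tc.arguments },
    }));
  }
  const { promptTokens, completionTokens } = result.usage;
  return {
    // id, created, and object are generated per call.
    id: `chatcmpl-${Math.random().toString(36).slice(2)}`,
    object: "chat.completion",
    created: Math.floor(nowMs / 1000), // OpenAI uses Unix seconds
    model,
    choices: [{ index: 0, message, finish_reason: result.finishReason }],
    usage: {
      prompt_tokens: promptTokens,
      completion_tokens: completionTokens,
      total_tokens: promptTokens + completionTokens,
    },
  };
}
```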
For streaming: openaiStream() from #514 wraps the internal AsyncIterable<GenerateChunk> and produces the OpenAI-shape SSE delta envelope plus the terminal [DONE] sentinel. Harper's existing SSE serializer at server/serverHelpers/contentTypes.ts:104-138 does the wire framing — no new streaming infrastructure.
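To make the streaming envelope concrete, here is a minimal stand-in for what a formatter of this kind might do. It is not the openaiStream() helper from #514, whose signature is not shown here: sketchOpenAIStream, the GenerateChunk fields, and the default id are invented for illustration. The sketch also writes the `data:` framing itself so the output is visible, whereas in Harper the existing SSE serializer is said to handle wire framing.

```typescript
// Hypothetical chunk shape; the real GenerateChunk may carry more fields.
interface GenerateChunk { content?: string; finishReason?: string }

async function* sketchOpenAIStream(
  chunks: AsyncIterable<GenerateChunk>,
  model: string,
  id: string = "chatcmpl-sketch",
): AsyncGenerator<string> {
  const created = Math.floor(Date.now() / 1000);
  for await (const chunk of chunks) {
    const envelope = {
      id,
      object: "chat.completion.chunk",
      created,
      model,
      choices: [
        {
          index: 0,
          // Each event carries only the incremental delta, not the full message.
          delta: chunk.content !== undefined ? { content: chunk.content } : {},
          finish_reason: chunk.finishReason ?? null,
        },
      ],
    };
    yield `data: ${JSON.stringify(envelope)}\n\n`;
  }
  // OpenAI clients treat this literal sentinel as end-of-stream.
  yield "data: [DONE]\n\n";
}
```

A client reading this stream concatenates choices[0].delta.content across events and stops at the [DONE] line.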
Error response shape
OpenAI clients consume errors as { error: { message, type, code, param } }. For genuine drop-in compatibility, Phase 4 emits this shape on errors (auth failures, validation errors, backend failures, etc.). A small translation layer at the Resource boundary maps Harper's internal errors to the OpenAI shape.
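A translation layer of this kind could look roughly like the sketch below. The InternalError shape (a statusCode plus message) and the toOpenAIError name are assumptions, and the type and code strings are illustrative choices rather than a confirmed mapping.

```typescript
interface OpenAIErrorBody {
  error: { message: string; type: string; code: string | null; param: string | null };
}

// Hypothetical internal error shape; Harper's actual error classes may differ.
interface InternalError {
  statusCode?: number;
  message: string;
  code?: string;
  param?: string;
}

function toOpenAIError(err: InternalError): { status: number; body: OpenAIErrorBody } {
  const status = err.statusCode ?? 500;
  // Illustrative mapping: client-side problems vs. server/backend failures.
  const type = status >= 400 && status < 500 ? "invalid_request_error" : "server_error";
  // Assumption: fall back to a fixed code for auth failures when none is supplied.
  const code = err.code ?? (status === 401 ? "invalid_api_key" : null);
  return {
    status,
    body: { error: { message: err.message, type, code, param: err.param ?? null } },
  };
}
```

Keeping the HTTP status alongside the body lets the Resource set both from a single call at the boundary.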
GET /v1/models
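Some clients (notably LangChain.js's discovery flow) call GET /v1/models to enumerate available models. Returns:
{ "object": "list", "data": [ { "id": "<model-name>", "object": "model", "owned_by": "<backend-name>" }, ... ] }
Sourced from registered backends' advertised model names via the registry.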
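A sketch of assembling that list from a registry follows. BackendInfo and listModels are hypothetical names for this example; how the real registry exposes advertised model names is not specified here.

```typescript
// Hypothetical registry entry: a backend name plus the model names it advertises.
interface BackendInfo { name: string; models: string[] }

interface ModelEntry { id: string; object: "model"; owned_by: string }

function listModels(backends: BackendInfo[]): { object: "list"; data: ModelEntry[] } {
  const data: ModelEntry[] = [];
  for (const backend of backends) {
    for (const id of backend.models) {
      // owned_by carries the backend name so clients can see where a model is served from.
      data.push({ id, object: "model", owned_by: backend.name });
    }
  }
  return { object: "list", data };
}
```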
Tenant scoping
request.user populated by Harper's existing auth chain → tenant resolved per Phase 1's extractTenantId() stub (free-form string until the tenant-concept question resolves; see comment thread on #510). Tenant flows into hdb_model_calls per the standard accounting path.
Files
| Path | Change |
| --- | --- |
| resources/models/v1/index.ts | new: handleApplication registers the three Resources |
| resources/models/v1/embeddings.ts | new: V1Embeddings Resource |
| resources/models/v1/chatCompletions.ts | new: V1ChatCompletions Resource |
| resources/models/v1/models.ts | new: V1Models Resource |
| resources/models/v1/translation.ts | new: OpenAI ↔ internal shape mappers |
| resources/models/v1/errors.ts | new: OpenAI-shape error translation |
Acceptance criteria
POST /v1/embeddings accepts OpenAI-shape requests, routes to scope.models.embed(), returns OpenAI-shape responses.
POST /v1/chat/completions accepts OpenAI-shape requests, routes to scope.models.generate() / generateStream(), returns OpenAI-shape responses.
stream: true on chat completions returns SSE with the OpenAI delta envelope and [DONE] sentinel.
tool_choice: 'auto' requests with tools[] set work in 'return' mode (caller-resolved) — model's tool-call requests reach the OpenAI client in choices[0].message.tool_calls.
GET /v1/models enumerates registered backends' models.
OpenAI-shape error responses on all error paths.
Auth chain enforced — Authorization: Bearer header validated against Harper's user table; unauthenticated requests return OpenAI-shape 401.
Per-call accounting flows through hdb_model_calls (because the Resources call scope.models.* under the hood; Phase 1's writer handles it).
End-to-end test: an unmodified LangChain.js or OpenAI SDK client successfully completes a chat against a Harper instance with an OpenAI backend configured.
AbortSignal propagation: an OpenAI SSE client that disconnects mid-stream cancels the upstream call. (Uses request.signal from #513 → ALS → backend.)
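Out of scope
tool_choice: 'auto' with orchestrated tool execution: that is the toolMode: 'auto' orchestration in #612 (Add agent-loop orchestration / toolMode: 'auto' to scope.models); this phase only ships the 'return' path.
Rate-limit response headers (x-ratelimit-remaining-tokens, etc.): reserve the header-mutation pathway in the Resource; a future component / phase can populate them once rate limiting is built.
OpenAI Assistants / Files / Images / Audio endpoints: only /v1/embeddings and /v1/chat/completions in v1.
Last-Event-ID replay for SSE: that is #515 (Forward Last-Event-ID into Resource methods for replay-capable SSE streams); it pairs with #511 (Add ConversationResource for agent memory and conversation state) conversation feeds, not this gateway.
Stacks on
Phase 1: for the scope.models facade.
Phase 3 ([Models] Phase 3 — openai backend (+ toolMode: 'return') #630): for the openai backend that validates the OpenAI-shape translation end-to-end (since the gateway converts requests for backends, the test bar is "an OpenAI-shape backend can be driven via the gateway").
Hard prerequisites
#513 (request.signal): streaming cancellation depends on it.
#514 (openaiStream() formatter helper): used to format the SSE delta envelope.
Branch & PR conventions
Branch: feat/models-v1-gateway
Base: main (after Phases 1 and 3 merge).
PR: Closes #<self>; references #510 (Add unified model-access API (scope.models)) via Tracking: #510.
Smoke test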
```shell
# Prerequisites:
# - Phase 1 + Phase 3 (openai backend) merged and Harper running on REST port 9926
# - An OpenAI backend configured as default generative model
# - A valid Harper auth token

# Test with the OpenAI Python SDK pointed at Harper:
python3 -c '
import openai
client = openai.OpenAI(api_key="<harper-auth-token>", base_url="http://localhost:9926/v1")
r = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "say hello in three words"}],
)
print(r.choices[0].message.content)
'
# Expected: a short response from the configured model.
# Verify: SELECT * FROM system.hdb_model_calls WHERE backend = 'openai' ORDER BY $createdtime DESC LIMIT 1
# shows the call attributed to the authenticated tenant, method='generate'.

# Streaming smoke:
python3 -c '
import openai
client = openai.OpenAI(api_key="<harper-auth-token>", base_url="http://localhost:9926/v1")
stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "count to five"}],
    stream=True,
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")
print()
'
# Expected: streaming text output, terminated cleanly.
```
Tracking
Part of #510. Sub-issue 4 of 6.
🤖 Generated with Claude Code