feat(models): Phase 2 — ollama backend#651
Conversation
|
Reviewed; no blockers found. |
Two findings from claude-bot's inline PR review on #651: - Integration `'AbortSignal cancels an in-flight stream'`: replace `ok(rejected || true)` (a tautology — asserts nothing) with a real termination check. Race the iterator drain against a 5 s deadline so the actual failure mode being guarded (hung stream after abort) fails the test instead of timing the suite out. - Unit `'throws OllamaBackendError when a stream line exceeds the byte cap'`: `1 << 20 + 1` evaluated to `1 << 21` (2 MiB) due to operator precedence — `+` binds tighter than `<<`. Prettier's autofix parenthesized the wrong side (`1 << (20 + 1)`, same value). Re-parenthesize to `(1 << 20) + 1` (1 MiB + 1 byte) — exactly one byte past the cap, the comment now matches the allocation, and the test memory footprint halves. Tracking: #629, #510 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
First real `ModelBackend` against the Ollama HTTP API: `embed` via `/api/embed`, `generate` via `/api/generate` or `/api/chat`, and `generateStream` via NDJSON over chunked HTTP. Backend lives under `components/<name>/` matching the pattern from the MCP foundation (PR #649) — core imports `registerOllamaBackend` and calls it during boot; the file is not a `handleApplication(scope)` self-loader. Capabilities advertise `tools: false` and `adapters: false`. Ollama tool support exists on some models but is uneven across the catalog; the v1 portability guarantee keeps it off here. Validates the Phase 1 (#628 / PR #638) `ModelBackend` interface against a non-trivial real provider without external dependencies in CI. Tracking: #629, #510 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a `models:` block to `harperdb-config.yaml` (presence-gated, matches
the `replication:` and `mcp:` conventions), validates it via Joi with
`.unknown(false)` so field typos block boot instead of silently skipping,
and dispatches per-entry registration through a factory map in
`resources/models/bootstrap.ts`.
Boot site: `components/componentLoader.ts` calls `bootstrapModels(config)`
once the root config is loaded and before per-component iteration, so
`scope.models.embed(...)` works from `handleApplication(scope)` as well
as from Resource methods.
The factory map (`{ ollama: registerOllamaBackend }`) is hardcoded for
v1. Unknown backends are logged at error level (not warn) and skipped —
silently registering nothing on an opt-in feature is a footgun. Schema
validation catches field-name typos before the factory runs.
`requestTimeoutMs: 0` is rejected by the schema (`min(1)`): omit the
field for "no timeout" so the meaning is unambiguous at the YAML layer.
Tracking: #629, #510
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds `models` to `getHarperExports` so user code can call `harper.models.embed(...)` / `.generate(...)` / `.generateStream(...)` from anywhere a component runs — `handleApplication(scope)` init code, Resource methods, internal jobs. Uses a module-singleton `new Models()` rather than reaching into the per-`Scope` instance Phase 1 wires in `components/Scope.ts`. The `Models` facade has no per-Scope state — the backend registry is module-scope and the analytics writer is a process-singleton — so the two are equivalent in behavior. The singleton sidesteps wiring `Scope` references through `getHarperExports` (which only sees `ApplicationScope`) without touching Phase 1's existing wiring while #638 is still in review. Tracking: #629, #510 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Unit tests (mocked `fetch`): - `OllamaBackend` capability shape, host normalization, wire format for `/api/embed`, `/api/generate`, `/api/chat`, NDJSON streaming including split-line and oversize-line handling. - AbortSignal propagation: caller-only, composed via `AbortSignal.any` with a per-call timeout, and abort-while-pending. - Robust response handling: vector-count mismatch, non-finite vector values, non-finite / non-integer / negative token counts, non-string `content` fields, non-JSON response bodies — all surface as `OllamaBackendError`. - NDJSON error messages are static so upstream-derived content cannot leak through the thrown error. `bootstrap.ts` factory dispatch: ollama embedding + generative registration under arbitrary logical names, unknown-backend skip, bad entry shapes skipped without throwing. `configValidator` `models:` block coverage: missing `backend`, bad `requestTimeoutMs` (non-numeric, negative, `0`), typo'd field names (`.unknown(false)` rejection), multi-logical-name acceptance. Integration test (`integrationTests/server/ollama-backend.test.ts`): exercises `OllamaBackend` end-to-end against a real local Ollama, gated on reachability + presence of `OLLAMA_EMBED_MODEL` and `OLLAMA_GENERATE_MODEL` in the local `/api/tags`. Skips silently when unmet so CI without an Ollama provisioned passes. Validates that the mocked wire format used in unit tests matches what Ollama actually produces. Tracking: #629, #510 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Run prettier on the four ollama-related files (CI Format Check missed these locally before push). - Defer the integration test's `OllamaBackend` import to a dynamic `await import(...)` inside `before()`. Statically importing from `components/ollama/` triggers a pre-existing CJS require cycle (`utility/common_utils.ts` ↔ `utility/logging/harper_logger.ts`) when loaded by `node --test`, which is fatal on Node 22+ (`ERR_REQUIRE_CYCLE_MODULE`). Other integration tests don't hit it because they only import from `@harperfast/integration-testing` and spawn Harper as a subprocess. Tracking: #629, #510 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two findings from claude-bot's inline PR review on #651: - Integration `'AbortSignal cancels an in-flight stream'`: replace `ok(rejected || true)` (a tautology — asserts nothing) with a real termination check. Race the iterator drain against a 5 s deadline so the actual failure mode being guarded (hung stream after abort) fails the test instead of timing the suite out. - Unit `'throws OllamaBackendError when a stream line exceeds the byte cap'`: `1 << 20 + 1` evaluated to `1 << 21` (2 MiB) due to operator precedence — `+` binds tighter than `<<`. Prettier's autofix parenthesized the wrong side (`1 << (20 + 1)`, same value). Re-parenthesize to `(1 << 20) + 1` (1 MiB + 1 byte) — exactly one byte past the cap, the comment now matches the allocation, and the test memory footprint halves. Tracking: #629, #510 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
fa8ae69 to
ddb8c19
Compare
heskew
left a comment
There was a problem hiding this comment.
Deep Review of PR #651: feat(models): Phase 2 — ollama backend
This PR is an outstanding implementation of the ModelBackend interface. It is exceptionally clean, robust, and sets a perfect, highly-modular precedent for the upcoming model backend integrations (OpenAI, Anthropic, Bedrock, etc.).
Below is a detailed breakdown of the review findings, highlighting several outstanding defensive coding practices and verified implementation details.
Key Architectural Strengths
-
Modular Component-Based File Layout:
- Pivoting from the initial plan (
resources/models/backends/ollama.ts) to a component-based layout (components/ollama/index.ts) is a brilliant architectural decision. It aligns perfectly with the pattern established in the MCP component (components/mcp/) and prevents the core codebase from bloating with provider-specific transport code.
- Pivoting from the initial plan (
-
Lazy-Loaded Module Singleton Facade:
- The way
harper.modelsis exposed viagetHarperExportsinsecurity/jsLoader.tsusing a lazily-instantiated singletonModelsfacade is extremely elegant. This sidesteps complex scope-ref-wiring issues without introducing any global state concerns, as the backend registry and analytics writer are already process-singletons.
- The way
-
Boot-Time Precedence:
- Running
bootstrapModels(config)incomponents/componentLoader.tsat the root-level config load ensures that models are registered before individual plugin modules are processed. This makesscope.modelssafely available from any app initialization hook (handleApplication(scope)).
- Running
Exceptional Defensive Coding Practices
This PR stands out for its high level of defensive engineering:
- Poison-Resistant Aggregates (
assignFiniteTokenCount):- Guarding the token count insertion by checking
Number.isFinite,value >= 0, andNumber.isIntegeris a fantastic precaution. Since community-pulled models run on Ollama can return erratic evaluation values, this prevents poisoning of downstream database metrics (e.g.SUM(prompt_tokens)) that could crash analytics pipelines.
- Guarding the token count insertion by checking
- JSON Parse Error Sanitization:
- Wrapping native JSON parse failures inside a standardized
OllamaBackendError(rather than letting the rawSyntaxErrorbubble up) is an excellent security measure. Raw syntax error messages often echo the offending payload bytes, which could leak upstream or private user prompt content into server logs.
- Wrapping native JSON parse failures inside a standardized
- Stream Memory-Growth Protection (
MAX_NDJSON_LINE_BYTES):- Enforcing a 1 MiB line-byte cap inside the NDJSON streaming parser (
readNdjson) protects the server from runaway memory growth or heap exhaustion in case of a pathological or malicious stream output from a model.
- Enforcing a 1 MiB line-byte cap inside the NDJSON streaming parser (
- Ambiguity-Free Timeout Boundaries:
- Validating
requestTimeoutMs: number.min(1)in the Joi schema (and rejecting0) is highly pragmatic. Since0would otherwise validate but get treated loosely as "no timeout" in downstream signal composition, this prevents misleading caller intent.
- Validating
Verification of Core Logic
- NDJSON Parsing and Flushing:
- Walked through the stream decoder loop inside
readNdjson. The chunk buffering, index slicing, and the final flush call (buf += decoder.decode()) are mathematically complete and handle trailing chunks correctly without leaving unparsed bytes on the wire.
- Walked through the stream decoder loop inside
- Nomic-Embed Prefix Application:
- The conditional embedding prefix logic for Nomic-style models is very clean. It respects model-specific requirements while safely leaving non-prefix models untouched.
- Capability Integrity:
- Setting
tools: falseandadapters: falseis a highly pragmatic and honest choice for V1 portability, given the uneven tool-use capabilities across Ollama's local model catalog.
- Setting
Conclusion
The PR has been fully reviewed and is pristine and production-ready. It has superb test coverage (72 new/extended unit and integration tests) and serves as an exceptional blueprint for the rest of the Phase 2 backends.
🤖 Posted by Antigravity on Nathan's behalf
Summary
Phase 2 of #510 — the first real
ModelBackendimplementation, talkingto Ollama's HTTP API. Validates the Phase 1 (#628 / PR #638) interface
against a non-trivial real provider without external CI dependencies.
Closes #629
Tracking: #510
Note
Stacked on
feat/models-foundation(PR #638). Merge target moves tomainonce Phase 1 lands.What ships
components/ollama/index.ts—OllamaBackend implements ModelBackend(
embed,generate,generateStream) over nativefetch, plusregisterOllamaBackend(...)for the boot bridge to call.resources/models/bootstrap.ts—bootstrapModels(rootConfig)readsmodels.{embedding,generative}.<name>entries and dispatches througha
{ ollama: registerOllamaBackend }factory map. Unknown backendslog at error level and skip.
validation/configValidator.ts— Joi schema for the top-levelmodels:block..unknown(false)on the entry schema so typos blockboot instead of silently dropping into the registry as no-ops.
components/componentLoader.ts—bootstrapModels(config)runs atroot-config load time, before per-component iteration, so
scope.models.embed(...)works fromhandleApplication(scope).security/jsLoader.ts— addsmodelstogetHarperExportsso usercode can call
harper.models.embed(...)from anywhere a componentruns.
static/defaultConfig.yaml— intentionally unchanged. Backendsare opt-in via block presence (matches
replication:and themcp:block from PR [MCP] foundation: component scaffold + config schema + boot gating #649).
File-layout pivot from the issue body
Issue #629 proposed
resources/models/backends/ollama.ts(in-coredirectory). This PR ships the backend under
components/ollama/tomatch the convention Kyle established in #649 (
components/mcp/) —core imports a register function from the component path. This sets
the same precedent for #630–#633 (openai, gateway, anthropic, bedrock).
API decisions
tools: false,adapters: false. Ollama tool-call supportexists on some models but is uneven across the catalog; v1
portability stays honest by leaving the surface off.
/api/embed(not legacy/api/embeddings). Ollama deprecatedthe legacy endpoint; the new one accepts array input.
/api/generatefor string prompts,/api/chatfor messages.systemis prepended as the first message (Ollama chat has notop-level system field).
AbortSignalpropagation.opts.signalpasses through to fetch;when
requestTimeoutMsis configured, the two are combined viaAbortSignal.anyso the upstream HTTP closes on either caller abortor timeout.
inputType: 'document' | 'query'is honored for embedding modelsthat use prefixes (
nomic-embed-textv1.5+ takessearch_document:/
search_query:). Non-prefix models are untouched.is admin-trusted but the model it's running is community-pulled;
a bad eval-count poisoning
SUM(prompt_tokens)overhdb_model_callsis worth guarding against.
harper.models exposure (Phase 1 deferred decision A)
Picks
harper.modelsviagetHarperExports, the canonical Harperexport shape (
harper.Resource,harper.tables, etc.). Same paththe openai backend (#630), the
/v1/*gateway (#631), and the@embeddirective (#632) will plug into.A module-singleton
Modelsinstance is used injsLoader.tsratherthan reaching into Phase 1's per-Scope instance. The Models facade
has no per-Scope state (registry is module-scope, analytics writer is
a process-singleton), so the two are equivalent in behavior. The
singleton sidesteps wiring
Scopereferences throughgetHarperExports(which only seesApplicationScope) withouttouching Phase 1's existing per-Scope
new Models()incomponents/Scope.tswhile #638 is still in review.YAML→registry boot bridge (Phase 1 deferred decision B)
models:is a top-level config block.bootstrapModels(rootConfig)runs in
componentLoader.tsoncegetConfigObj()returns the mergedroot config, before per-component iteration. Dispatch via a hardcoded
factory map (
{ ollama: registerOllamaBackend }) — explicit, easy toread, idempotent. Joi defaults under
models.*don't propagate toenv.get(same finding Kyle hit onmcp.*in #649), so the bootstrapreads
getConfigObj()directly rather than the flat env map.The factory map will grow with #630 (openai), #633 (anthropic /
bedrock). A directive registry / self-registering shape is plausible
later; the explicit map is fine for v1.
Pre-PR verification cadence
Per the verifier cadence agreed for this series:
deep-reviewsingle-pass (three parallel agents —api,concurrency,config). Surfaced 11 findings; 7 applied asblockers in this PR, 1 deferred as Tier 5 (
AbortSignal.anyaccumulation under sustained load — speculative, defer until
profiling shows growth), 1 deferred forward-looking for [Models] Phase 3 — openai backend (+ toolMode: 'return') #630
(secrets-in-plaintext design).
in the local environment; deferring to the GitHub workflow's
cross-model review pass.
Files
components/ollama/index.tsOllamaBackend+registerOllamaBackendresources/models/bootstrap.tsbootstrapModels(rootConfig)validation/configValidator.tsmodels:Joi schema (with.unknown(false)typo guard)components/componentLoader.tsbootstrapModels(config)call at root-config loadsecurity/jsLoader.tsmodelsingetHarperExportsunitTests/components/ollama/index.test.jsfetchunitTests/resources/models/bootstrap.test.jsunitTests/validation/configValidator.test.jsmodels:blockintegrationTests/server/ollama-backend.test.tsAcceptance criteria
OllamaBackendimplementsModelBackendper Phase 1's interfacescope.models.embed()configured withbackend: ollamaproducesa vector (validated via integration test against local Ollama)
scope.models.generate()produces a completionscope.models.generateStream()yieldsGenerateChunkper thePhase 1 contract
backend: 'ollama', model, tokencounts, latency, success (via the Phase 1 writer — flows through
unchanged)
AbortSignalfromBackendOptscancels in-flight requeststools: false,adapters: falseinputType: 'document' | 'query'differentiates vectors onnomic-style models
auto-skip when unreachable — CI provisioning still TBD)
Test plan
npm run buildcleannpm run lint:requiredcleannomic-embed-text+llama3.2pulled —OLLAMA_HOST/OLLAMA_EMBED_MODEL/OLLAMA_GENERATE_MODELenvs honoredOut of scope
tools: falseper the issue body)adapters: false— FAB-503 territory via vLLM)AbortSignalpropagation (no explicitcancel endpoint on Ollama)
fetchfor v1🤖 Generated with Claude Code