feat(cognition,#1276): migrate VisionInferenceProvider to Rust cognition/vision-describe#1292
Per Joel 2026-05-15 ("mission to eliminate slop and slowly oxidize this
project") and the #1248 oxidizer umbrella, move TS-side vision
inference orchestration to Rust. TS becomes a thin shim.
Outlier-validation pair with codex's #1284 (AIDecisionService.evaluateGating
→ cognition/should-respond, structured-decision shape); this card is
the freeform-shape outlier. Same Rust+thin-TS-shim pattern as
recall-engrams (#1265).
## Rust side (new)
`workers/continuum-core/src/cognition/vision_describe.rs` — 337 LOC.
Owns:
1. Vision-capable model selection (filter `model_registry` by
`Capability::Vision`, prefer local providers). Single source of
truth — no more `process.env.*_API_KEY` checks scattered in TS.
2. Prompt construction from option flags (detectObjects/Colors/Text,
maxLength). Pure function; unit-tested.
3. Multimodal request assembly (text + base64 image content parts).
4. Inference dispatch via `runtime::execute_command_json("ai/generate",
...)` so the existing Rust adapters (Anthropic / OpenAI / LlamaCpp)
shape the multimodal payload per their own native API contracts.
5. Response parsing into `VisionDescription`. Pure function; unit-tested.
ts-rs auto-emits `VisionDescribeRequest`, `VisionDescribeOptions`,
`VisionDescription` to `shared/generated/cognition/`.
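Item 2 (prompt construction) can be sketched as a pure function over the option flags. This is a hypothetical illustration, not the actual `vision_describe.rs` code — the field names and prompt wording are assumptions:

```rust
// Hypothetical stand-in for the generated VisionDescribeOptions type;
// field names and defaults are assumptions for illustration.
#[derive(Default)]
struct VisionDescribeOptions {
    detect_objects: bool,
    detect_colors: bool,
    detect_text: bool,
    max_length: Option<u32>,
}

// Pure prompt builder: no IPC, no model state, trivially unit-testable.
fn build_prompt(opts: &VisionDescribeOptions) -> String {
    let mut prompt = String::from("Describe this image.");
    if opts.detect_objects {
        prompt.push_str(" List the main objects.");
    }
    if opts.detect_colors {
        prompt.push_str(" Note the dominant colors.");
    }
    if opts.detect_text {
        prompt.push_str(" Transcribe any visible text.");
    }
    if let Some(max) = opts.max_length {
        prompt.push_str(&format!(" Keep the answer under {max} words."));
    }
    prompt
}
```

Keeping this side-effect-free is what makes the "pure function; unit-tested" claim cheap to uphold: each flag maps to one sentence, so tests are string assertions.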
## IPC wiring
`modules/cognition.rs` — adds the `cognition/vision-describe` handler
that parses params into `VisionDescribeRequest` and calls
`describe_image`. `bindings/modules/cognition.ts` adds the
`cognitionVisionDescribe` mixin entry on the RustCoreIPC client.
## TS side (collapsed)
`system/vision/VisionInferenceProvider.ts` — 176 LOC → 86 LOC. Every
method is now a single `Commands.execute('cognition/vision-describe',
...)` call. The four pieces of logic (selectModel, buildPrompt,
generateText dispatch, parseResponse) are gone TS-side.
`commands/cognition/vision-describe/` — generated via
`scripts/cli.ts command generator/specs/cognition-vision-describe.json
--force`, then the server command is refactored to extend
`RustBackedCommand` (same shape as `recall-engrams`). All scaffolding
(README, package.json, browser/server/shared/test) lands together so
the command is discoverable + ./jtag-callable + persona-tool-callable.
## Verified
- `cargo check --features metal` — clean (0 errors)
- `cargo test cognition::vision_describe --lib --features metal` — 7
passed (4 unit tests + 3 ts-rs export tests)
- `npx tsc --noEmit -p tsconfig.json` — vision-describe surface clean
## Phase 2 (deferred)
Delete the orphaned `availableModels()` method on the TS shim once
all callers move to a dedicated `ai/providers/list` Rust IPC with
capability filter. Today the shim returns `[]` (legacy diagnostics
surface only).
Mission: Joel 2026-05-15 — "eliminate slop and slowly oxidize this project"
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
VisionInferenceProvider.ts collapse from 176 LOC to 86 LOC + new thin imports cleared 3 ESLint errors. Mac local: 5452→5449. Linux baseline mirrored at +1 (the standard platform skew). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Linux CI counted 5449 errors while baseline was 5450. Mac local was already at 5449 (matching reality). Sync linux to lock the win.
joelteply left a comment:
Review from the parallel oxidizer lane (rate_proposals stack #1290/#1291/#1293) — same shape, comparing the choices.
Strong points 👍
- Module shape (`cognition/vision_describe.rs`) and docstrings (migration history + outlier-validation framing) match the pattern. Pure-function `build_prompt` + `parse_response` are the right slice for unit-testability.
- ts-rs exports (`VisionDescribeRequest`/`Options`/`Description`) carry the wire types — no hand-written TS duplication.
- TS shim collapse 176 → 86 LOC is the right kind of deletion-win.
- `RustBackedCommand` extraction in the server command keeps the JTAG entry point thin without reimplementing inference.
Concerns (block-merge vs nit flagged)
🟡 BLOCK: availableModels() returns [] (silent regression)
VisionInferenceProvider.availableModels() now returns empty array unconditionally. Comment says "callers used this for human-readable model lists; that surface is being moved to a dedicated Rust IPC." But until that follow-up lands, anything querying this gets WRONG data (empty list vs actual list).
This is exactly the silent-fallback anti-pattern called out in #1262. Either:
- (a) Wire to the existing `ai/providers/list` IPC now (small additional change), OR
- (b) `throw new Error('Not yet ported — see #XXXX')` so callers fail loud rather than silently degrade.
The isAvailable() always-true change is fine because it self-documents (one extra IPC + null result on first call); availableModels() is not — empty arrays are normal-looking poison.
🟡 BLOCK: No test for select_vision_model
The priority logic has 4 branches (preferred_model → preferred_provider → local-prefer → first-available) and is the highest-leverage thing to break. Mismatched preferences silently fall through. Suggest at minimum:
- Empty registry → `None`
- `preferred_model` match returns it (even when local is available)
- `preferred_provider` match (no `preferred_model`) returns first vision-capable model under that provider
- Neither preference set + mixed local/cloud → local wins
- Neither preference set + cloud-only → first-available
Without these, a future registry refactor that changes filter ordering silently breaks model selection.
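A minimal sketch of the four-branch priority chain those tests would pin down. The names (`VisionCandidate`, `pick_vision_candidate`, the field layout) are assumptions for illustration, not the PR's actual code:

```rust
#[derive(Debug, PartialEq)]
struct VisionCandidate {
    model: &'static str,
    provider: &'static str,
    is_local: bool,
}

// Priority: preferred_model → preferred_provider → local-prefer → first-available.
fn pick_vision_candidate<'a>(
    candidates: &'a [VisionCandidate],
    preferred_model: Option<&str>,
    preferred_provider: Option<&str>,
) -> Option<&'a VisionCandidate> {
    if let Some(m) = preferred_model {
        if let Some(c) = candidates.iter().find(|c| c.model == m) {
            return Some(c); // priority 1: exact model match
        }
        // no match: falls through (the silent substitution flagged below)
    }
    if let Some(p) = preferred_provider {
        if let Some(c) = candidates.iter().find(|c| c.provider == p) {
            return Some(c); // priority 2: provider match
        }
    }
    // priority 3: prefer local; priority 4: first available
    candidates.iter().find(|c| c.is_local).or_else(|| candidates.first())
}
```

Each bullet above maps to one assertion against a small fixture slice, which is why the missing coverage is cheap to add.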
🟡 NIT: `finish_reason == "error"` is a stringly-typed sentinel
```rust
if finish_reason == "error" || response_text.is_empty() {
    return Ok(None);
}
```

If the upstream `ai/generate` handler ever changes this to `"failed"` or `"error_timeout"`, vision describe silently returns `None`. The `runtime::execute_command_json` boundary is where the typed wire format dies — could the response shape carry a typed enum instead? If not refactorable in this PR, at least log when `finish_reason` is non-empty AND non-`"success"` so the failure mode is greppable.
🟢 NIT: `(len + 3) / 4` ceiling
Rust 1.73+ has `len.div_ceil(4)` — clearer intent. Same arithmetic; just self-documenting.
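For concreteness, both spellings compute the same ceiling (the ~4-chars-per-token divisor is what the PR's arithmetic implies; the function name here is made up):

```rust
// Rough chars→tokens ceiling, two equivalent spellings.
fn approx_tokens_old(len: usize) -> usize {
    (len + 3) / 4
}

fn approx_tokens_new(len: usize) -> usize {
    len.div_ceil(4) // Rust 1.73+; identical value, self-documenting
}
```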
🟢 NIT: silent fallback when preferred_model is invalid
If `opts.preferred_model = Some("nonexistent")` is set but the model isn't in the registry, current behavior falls through to preferred_provider → local → first-available. The caller asked for a specific model and got something else. Suggest: log a warning so the substitution is visible in logs (one line).
Comparing to the rate_proposals stack pattern
For reference, what we landed on the parallel lane (#1289 → #1290 + #1291 + #1293):
- PR-1 pure-functions only (types + prompt + parser, no IPC, no AI dispatch)
- PR-2 IPC handler wiring AIProviderRegistry + the prompt+parser
- PR-3 deletion of dead TS adapter (was meant to be a shim, ended up being deletion because zero production callers)
#1292 bundles all three steps in one PR. Larger blast radius for review (+1424/−149 across 20 files vs ~268 for PR-2 alone) and harder to revert one slice independently. Not a block but worth knowing for the next oxidizer claimant choosing PR shape.
LGTM once availableModels() decides which side of fail-loud-vs-actually-port it lands on, and select_vision_model gets unit coverage. Nice work on the docstrings and the outlier-pair framing — that's exactly the architectural narrative future readers need.
Review fixes:

Block-merge (1) — `availableModels()` returned `[]` silently:
- Delete `VisionInferenceProvider.availableModels()` + the matching `VisionDescriptionService.getAvailableModels()` accessor (no production callers; deletion is per the no-silent-fallback rule).
- For the human-readable "what vision models do we have?" surface, the upcoming `ai/providers/list` IPC with capability filter is the right home.

Block-merge (2) — no test coverage on the `select_vision_model` 4-branch priority:
- Factor priority logic into pure helper `pick_vision_candidate(&[VisionCandidate], &VisionDescribeOptions)` + add 7 unit tests covering: empty input, priority 1 (preferred_model), priority 2 (preferred_provider), priority 3 (local), priority 4 (first), unknown preferred_model fallthrough, unknown preferred_provider fallthrough.

Nits:
- finish_reason: deserialize the wire string back into the typed `crate::ai::types::FinishReason` enum + pattern-match. Catches any future variant rename at compile time on both sides.
- max_tokens: switch to `len.div_ceil(4)` (was `(len + 3) / 4` — same value, clearer intent).
- describe_image: log substitution when preferred_model wasn't honored, so the call site can audit which provider actually ran.

Preserved (with explicit doc):
- `VisionInferenceProvider.isAvailable()` and `VisionDescriptionService.isAvailable()` — three production callers use them as `if (!isAvailable()) skip-this-work` guards (MediaPrewarmServerCommand, LiveRoomSnapshotService, MediaArtifactSource). Migration shim returns true synchronously with explicit doc that "true is best-effort; describe() returning null is the real signal." Future card replaces with async `ai/providers/list`-backed check.

Linux baseline:
- `eslint-baseline.linux.txt` ratcheted -1 to 5449 to match the Mac baseline + the actual count post-#1276 deletions.

Verified:
- `cargo test cognition::vision_describe --lib --features metal`: 14 passed (was 7 — added 7 priority-logic tests)
- `npx tsc --noEmit -p tsconfig.json`: clean
joelteply left a comment:
LGTM (would approve but can't self-approve via shared author). The 60e6511 fix commit addresses every concern with the right framing:
- `availableModels()` deletion — chose the fail-by-removal path (option a) rather than fail-loud. Better choice: zero callers means deletion is correct + cheaper than a thrown shim. The follow-up note about `ai/providers/list` capability filter as the right home is the right architectural pointer.
- `select_vision_model` priority logic — factored into pure `pick_vision_candidate(&[VisionCandidate], &VisionDescribeOptions)` + 7 tests covering all 4 priority branches plus 2 fallthrough cases. Exactly what I wanted.
- Typed `FinishReason` — replaces the stringly-typed sentinel.
- `len.div_ceil(4)` — fixed.
- `preferred_model` substitution log — added.
- `isAvailable()` preserved with honest doc — and you went further than I did: identified the 3 production callers (MediaPrewarmServerCommand, LiveRoomSnapshotService, MediaArtifactSource) using it as fast-fail guards. That's the kind of grounding that makes the "best-effort true" choice defensible. Future card to swap for async `ai/providers/list` is the right deferral.
14/14 tests pass (was 7, +7 priority-logic). CI is mid-run on the new commit but the diffs do what they say.
Ship it once CI clears. Nice tight loop on the review iteration.
Mission context
Joel 2026-05-15: "mission to eliminate slop and slowly oxidize this project (turn to rust)". Today's slop-elimination total: ~5000 LOC dead Rust gone (#1262 → #1280 series). Today's oxidization total: VisionInferenceProvider (this PR) + codex's #1284 should-respond + codex's #1289/#1290/#1291 rate_proposals — three concurrent oxidizer slices proving the Rust+thin-TS-shim pattern.
🤖 Generated with Claude Code