feat(cognition,#1276): migrate VisionInferenceProvider to Rust cognition/vision-describe#1292
Per Joel 2026-05-15 ("mission to eliminate slop and slowly oxidize this
project") and the #1248 oxidizer umbrella, move TS-side vision
inference orchestration to Rust. TS becomes a thin shim.
Outlier-validation pair with codex's #1284 (AIDecisionService.evaluateGating
→ cognition/should-respond, structured-decision shape); this card is
the freeform-shape outlier. Same Rust+thin-TS-shim pattern as
recall-engrams (#1265).
## Rust side (new)
`workers/continuum-core/src/cognition/vision_describe.rs` — 337 LOC.
Owns:
1. Vision-capable model selection (filter `model_registry` by
`Capability::Vision`, prefer local providers). Single source of
truth — no more `process.env.*_API_KEY` checks scattered in TS.
2. Prompt construction from option flags (detectObjects/Colors/Text,
maxLength). Pure function; unit-tested.
3. Multimodal request assembly (text + base64 image content parts).
4. Inference dispatch via `runtime::execute_command_json("ai/generate",
...)` so the existing Rust adapters (Anthropic / OpenAI / LlamaCpp)
shape the multimodal payload per their own native API contracts.
5. Response parsing into `VisionDescription`. Pure function; unit-tested.
ts-rs auto-emits `VisionDescribeRequest`, `VisionDescribeOptions`,
`VisionDescription` to `shared/generated/cognition/`.
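Item 2 (prompt construction) can be sketched as a pure function over the option flags. This is a hypothetical illustration, not the actual `vision_describe.rs` code — the field names and prompt wording are assumptions:

```rust
// Hypothetical stand-in for the generated VisionDescribeOptions type;
// field names and defaults are assumptions for illustration.
#[derive(Default)]
struct VisionDescribeOptions {
    detect_objects: bool,
    detect_colors: bool,
    detect_text: bool,
    max_length: Option<u32>,
}

// Pure prompt builder: no IPC, no model state, trivially unit-testable.
fn build_prompt(opts: &VisionDescribeOptions) -> String {
    let mut prompt = String::from("Describe this image.");
    if opts.detect_objects {
        prompt.push_str(" List the main objects.");
    }
    if opts.detect_colors {
        prompt.push_str(" Note the dominant colors.");
    }
    if opts.detect_text {
        prompt.push_str(" Transcribe any visible text.");
    }
    if let Some(max) = opts.max_length {
        prompt.push_str(&format!(" Keep the answer under {max} words."));
    }
    prompt
}
```

Keeping this side-effect-free is what makes the "pure function; unit-tested" claim cheap to uphold: each flag maps to one sentence, so tests are string assertions.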
## IPC wiring
`modules/cognition.rs` — adds the `cognition/vision-describe` handler
that parses params into `VisionDescribeRequest` and calls
`describe_image`. `bindings/modules/cognition.ts` adds the
`cognitionVisionDescribe` mixin entry on the RustCoreIPC client.
## TS side (collapsed)
`system/vision/VisionInferenceProvider.ts` — 176 LOC → 86 LOC. Every
method is now a single `Commands.execute('cognition/vision-describe',
...)` call. The four pieces of logic (selectModel, buildPrompt,
generateText dispatch, parseResponse) are gone TS-side.
`commands/cognition/vision-describe/` — generated via
`scripts/cli.ts command generator/specs/cognition-vision-describe.json
--force`, then the server command is refactored to extend
`RustBackedCommand` (same shape as `recall-engrams`). All scaffolding
(README, package.json, browser/server/shared/test) lands together so
the command is discoverable + ./jtag-callable + persona-tool-callable.
## Verified
- `cargo check --features metal` — clean (0 errors)
- `cargo test cognition::vision_describe --lib --features metal` — 7
passed (4 unit tests + 3 ts-rs export tests)
- `npx tsc --noEmit -p tsconfig.json` — vision-describe surface clean
## Phase 2 (deferred)
Delete the orphaned `availableModels()` method on the TS shim once
all callers move to a dedicated `ai/providers/list` Rust IPC with
capability filter. Today the shim returns `[]` (legacy diagnostics
surface only).
Mission: Joel 2026-05-15 — "eliminate slop and slowly oxidize this project"
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
VisionInferenceProvider.ts collapse from 176 LOC to 86 LOC + new thin imports cleared 3 ESLint errors. Mac local: 5452→5449. Linux baseline mirrored at +1 (the standard platform skew). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Linux CI counted 5449 errors while baseline was 5450. Mac local was already at 5449 (matching reality). Sync linux to lock the win.
joelteply left a comment:
Review from the parallel oxidizer lane (rate_proposals stack #1290/#1291/#1293) — same shape, comparing the choices.
Strong points 👍
- Module shape (`cognition/vision_describe.rs`) and docstrings (migration history + outlier-validation framing) match the pattern. Pure-function `build_prompt` + `parse_response` are the right slice for unit-testability.
- ts-rs exports (`VisionDescribeRequest`/`Options`/`Description`) carry the wire types — no hand-written TS duplication.
- TS shim collapse 176 → 86 LOC is the right kind of deletion-win.
- `RustBackedCommand` extraction in the server command keeps the JTAG entry point thin without reimplementing inference.
Concerns (block-merge vs nit flagged)
🟡 BLOCK: availableModels() returns [] (silent regression)
VisionInferenceProvider.availableModels() now returns empty array unconditionally. Comment says "callers used this for human-readable model lists; that surface is being moved to a dedicated Rust IPC." But until that follow-up lands, anything querying this gets WRONG data (empty list vs actual list).
This is exactly the silent-fallback anti-pattern called out in #1262. Either:
- (a) Wire to the existing `ai/providers/list` IPC now (small additional change), OR
- (b) `throw new Error('Not yet ported — see #XXXX')` so callers fail loud rather than silently degrade.
The isAvailable() always-true change is fine because it self-documents (one extra IPC + null result on first call); availableModels() is not — empty arrays are normal-looking poison.
🟡 BLOCK: No test for select_vision_model
The priority logic has 4 branches (preferred_model → preferred_provider → local-prefer → first-available) and is the highest-leverage thing to break. Mismatched preferences silently fall through. Suggest at minimum:
- Empty registry → `None`
- `preferred_model` match returns it (even when local is available)
- `preferred_provider` match (no `preferred_model`) returns first vision-capable model under that provider
- Neither preference set + mixed local/cloud → local wins
- Neither preference set + cloud-only → first-available
Without these, a future registry refactor that changes filter ordering silently breaks model selection.
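A minimal sketch of the four-branch priority chain those tests would pin down. The names (`VisionCandidate`, `pick_vision_candidate`, the field layout) are assumptions for illustration, not the PR's actual code:

```rust
#[derive(Debug, PartialEq)]
struct VisionCandidate {
    model: &'static str,
    provider: &'static str,
    is_local: bool,
}

// Priority: preferred_model → preferred_provider → local-prefer → first-available.
fn pick_vision_candidate<'a>(
    candidates: &'a [VisionCandidate],
    preferred_model: Option<&str>,
    preferred_provider: Option<&str>,
) -> Option<&'a VisionCandidate> {
    if let Some(m) = preferred_model {
        if let Some(c) = candidates.iter().find(|c| c.model == m) {
            return Some(c); // priority 1: exact model match
        }
        // no match: falls through (the silent substitution flagged below)
    }
    if let Some(p) = preferred_provider {
        if let Some(c) = candidates.iter().find(|c| c.provider == p) {
            return Some(c); // priority 2: provider match
        }
    }
    // priority 3: prefer local; priority 4: first available
    candidates.iter().find(|c| c.is_local).or_else(|| candidates.first())
}
```

Each bullet above maps to one assertion against a small fixture slice, which is why the missing coverage is cheap to add.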
🟡 NIT: `finish_reason == "error"` is a stringly-typed sentinel
```rust
if finish_reason == "error" || response_text.is_empty() {
    return Ok(None);
}
```

If the upstream `ai/generate` handler ever changes this to `"failed"` or `"error_timeout"`, vision describe silently returns `None`. The `runtime::execute_command_json` boundary is where the typed wire format dies — could the response shape carry a typed enum instead? If not refactorable in this PR, at least log when `finish_reason` is non-empty AND non-`"success"` so the failure mode is greppable.
🟢 NIT: `(len + 3) / 4` ceiling
Rust 1.73+ has `len.div_ceil(4)` — clearer intent. Same arithmetic; just self-documenting.
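For concreteness, both spellings compute the same ceiling (the ~4-chars-per-token divisor is what the PR's arithmetic implies; the function name here is made up):

```rust
// Rough chars→tokens ceiling, two equivalent spellings.
fn approx_tokens_old(len: usize) -> usize {
    (len + 3) / 4
}

fn approx_tokens_new(len: usize) -> usize {
    len.div_ceil(4) // Rust 1.73+; identical value, self-documenting
}
```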
🟢 NIT: silent fallback when preferred_model is invalid
If `opts.preferred_model = Some("nonexistent")` is set but the model isn't in the registry, current behavior falls through to preferred_provider → local → first-available. The caller asked for a specific model and got something else. Suggest: log a warning so the substitution is visible in logs (one line).
Comparing to the rate_proposals stack pattern
For reference, what we landed on the parallel lane (#1289 → #1290 + #1291 + #1293):
- PR-1 pure-functions only (types + prompt + parser, no IPC, no AI dispatch)
- PR-2 IPC handler wiring AIProviderRegistry + the prompt+parser
- PR-3 deletion of dead TS adapter (was meant to be a shim, ended up being deletion because zero production callers)
#1292 bundles all three steps in one PR. Larger blast radius for review (+1424/−149 across 20 files vs ~268 for PR-2 alone) and harder to revert one slice independently. Not a block but worth knowing for the next oxidizer claimant choosing PR shape.
LGTM once availableModels() decides which side of fail-loud-vs-actually-port it lands on, and select_vision_model gets unit coverage. Nice work on the docstrings and the outlier-pair framing — that's exactly the architectural narrative future readers need.
Review fixes:

Block-merge (1) — `availableModels()` returned `[]` silently:
- Delete `VisionInferenceProvider.availableModels()` + the matching `VisionDescriptionService.getAvailableModels()` accessor (no production callers; deletion is per the no-silent-fallback rule).
- For the human-readable "what vision models do we have?" surface, the upcoming `ai/providers/list` IPC with capability filter is the right home.

Block-merge (2) — no test coverage on the `select_vision_model` 4-branch priority:
- Factor priority logic into pure helper `pick_vision_candidate(&[VisionCandidate], &VisionDescribeOptions)` + add 7 unit tests covering: empty input, priority 1 (preferred_model), priority 2 (preferred_provider), priority 3 (local), priority 4 (first), unknown preferred_model fallthrough, unknown preferred_provider fallthrough.

Nits:
- finish_reason: deserialize the wire string back into the typed `crate::ai::types::FinishReason` enum + pattern-match. Catches any future variant rename at compile time on both sides.
- max_tokens: switch to `len.div_ceil(4)` (was `(len + 3) / 4` — same value, clearer intent).
- describe_image: log substitution when preferred_model wasn't honored, so the call site can audit which provider actually ran.

Preserved (with explicit doc):
- `VisionInferenceProvider.isAvailable()` and `VisionDescriptionService.isAvailable()` — three production callers use them as `if (!isAvailable()) skip-this-work` guards (MediaPrewarmServerCommand, LiveRoomSnapshotService, MediaArtifactSource). Migration shim returns true synchronously with explicit doc that "true is best-effort; describe() returning null is the real signal." Future card replaces with async `ai/providers/list`-backed check.

Linux baseline:
- `eslint-baseline.linux.txt` ratcheted -1 to 5449 to match the Mac baseline + the actual count post-#1276 deletions.

Verified:
- `cargo test cognition::vision_describe --lib --features metal`: 14 passed (was 7 — added 7 priority-logic tests)
- `npx tsc --noEmit -p tsconfig.json`: clean
joelteply left a comment:
LGTM (would approve but can't self-approve via shared author). The 60e6511 fix commit addresses every concern with the right framing:
- `availableModels()` deletion — chose the fail-by-removal path (option a) rather than fail-loud. Better choice: zero callers means deletion is correct + cheaper than a thrown shim. The follow-up note about `ai/providers/list` capability filter as the right home is the right architectural pointer.
- `select_vision_model` priority logic — factored into pure `pick_vision_candidate(&[VisionCandidate], &VisionDescribeOptions)` + 7 tests covering all 4 priority branches plus 2 fallthrough cases. Exactly what I wanted.
- Typed `FinishReason` — replaces the stringly-typed sentinel.
- `len.div_ceil(4)` — fixed.
- `preferred_model` substitution log — added.
- `isAvailable()` preserved with honest doc — and you went further than I did: identified the 3 production callers (MediaPrewarmServerCommand, LiveRoomSnapshotService, MediaArtifactSource) using it as fast-fail guards. That's the kind of grounding that makes the "best-effort true" choice defensible. Future card to swap for async `ai/providers/list` is the right deferral.
14/14 tests pass (was 7, +7 priority-logic). CI is mid-run on the new commit but the diffs do what they say.
Ship it once CI clears. Nice tight loop on the review iteration.
Mission context
Joel 2026-05-15: "mission to eliminate slop and slowly oxidize this project (turn to rust)". Today's slop-elimination total: ~5000 LOC dead Rust gone (#1262 → #1280 series). Today's oxidization total: VisionInferenceProvider (this PR) + codex's #1284 should-respond + codex's #1289/#1290/#1291 rate_proposals — three concurrent oxidizer slices proving the Rust+thin-TS-shim pattern.
🤖 Generated with Claude Code