
feat(cognition,#1276): migrate VisionInferenceProvider to Rust cognition/vision-describe#1292

Merged
joelteply merged 5 commits into canary from oxidizer/vision-describe-rust-1276 on May 16, 2026

Conversation

@joelteply
Contributor

Summary

Per Joel 2026-05-15 ("mission to eliminate slop and slowly oxidize this project") and the #1248 oxidizer umbrella, move TS-side vision inference orchestration to Rust. TS becomes a thin shim.

Outlier-validation pair with codex's #1284 (AIDecisionService.evaluateGating → cognition/should-respond, structured-decision shape); this card is the freeform-shape outlier. Same Rust+thin-TS-shim pattern as recall-engrams (#1265).

Rust side (new)

`workers/continuum-core/src/cognition/vision_describe.rs` — 337 LOC. Owns all the logic the TS file used to do:

  1. Vision-capable model selection — filter `model_registry` by `Capability::Vision`, prefer local providers. Single source of truth — no more `process.env.*_API_KEY` checks scattered in TS.
  2. Prompt construction from option flags (detectObjects/Colors/Text, maxLength). Pure function; unit-tested.
  3. Multimodal request assembly (text + base64 image content parts).
  4. Inference dispatch via `runtime::execute_command_json("ai/generate", ...)` so the existing Rust adapters (Anthropic / OpenAI / LlamaCpp) shape the multimodal payload per their own native API contracts.
  5. Response parsing into `VisionDescription`. Pure function; unit-tested.

ts-rs auto-emits `VisionDescribeRequest`, `VisionDescribeOptions`, `VisionDescription` to `shared/generated/cognition/`.
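
A rough sketch of the shape this implies is below; the struct fields, prompt wording, and export path are illustrative guesses from this description, not the actual `vision_describe.rs` contents.

```rust
// Illustrative only: field names, prompt text, and the export path are
// assumptions inferred from this PR description, not the real module.
use serde::{Deserialize, Serialize};
use ts_rs::TS;

#[derive(Debug, Default, Serialize, Deserialize, TS)]
#[ts(export, export_to = "cognition/")]
pub struct VisionDescribeOptions {
    pub detect_objects: bool,
    pub detect_colors: bool,
    pub detect_text: bool,
    pub max_length: Option<u32>,
    pub preferred_model: Option<String>,
    pub preferred_provider: Option<String>,
}

/// Pure function: build the instruction prompt from option flags (unit-testable).
pub fn build_prompt(opts: &VisionDescribeOptions) -> String {
    let mut parts = vec!["Describe this image.".to_string()];
    if opts.detect_objects {
        parts.push("List the main objects.".to_string());
    }
    if opts.detect_colors {
        parts.push("Note the dominant colors.".to_string());
    }
    if opts.detect_text {
        parts.push("Transcribe any visible text.".to_string());
    }
    if let Some(max) = opts.max_length {
        parts.push(format!("Keep the description under {max} characters."));
    }
    parts.join(" ")
}
```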

IPC wiring

  • `modules/cognition.rs` — adds the `cognition/vision-describe` handler that parses params into `VisionDescribeRequest` and calls `describe_image`.
  • `bindings/modules/cognition.ts` — adds the `cognitionVisionDescribe` mixin entry on the RustCoreIPC client.
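
For orientation, a hypothetical sketch of that handler flow follows; the real dispatch signature in `modules/cognition.rs` is project-specific, and only the parse-params-then-call-`describe_image` shape is taken from this description.

```rust
// Hypothetical handler shape; only the "parse params into
// VisionDescribeRequest, call describe_image" flow comes from the PR text.
pub async fn handle_vision_describe(
    params: serde_json::Value,
) -> anyhow::Result<serde_json::Value> {
    let request: VisionDescribeRequest = serde_json::from_value(params)?;
    let description: Option<VisionDescription> = describe_image(request).await?;
    Ok(serde_json::to_value(description)?)
}
```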

TS side (collapsed)

  • `system/vision/VisionInferenceProvider.ts` — 176 LOC → 86 LOC. Every method is now a single `Commands.execute('cognition/vision-describe', ...)` call. The TS-side logic (selectModel, buildPrompt, generateText dispatch, parseResponse) is gone.
  • `commands/cognition/vision-describe/` — generated via `scripts/cli.ts command generator/specs/cognition-vision-describe.json --force`, then the server command was refactored to extend `RustBackedCommand` (same shape as `recall-engrams`). All scaffolding (README, package.json, browser/server/shared/test) lands together so the command is discoverable + ./jtag-callable + persona-tool-callable.

Verified

  • `cargo check --features metal` — clean (0 errors)
  • `cargo test cognition::vision_describe --lib --features metal` — 7 passed (4 unit tests + 3 ts-rs export tests)
  • `npx tsc --noEmit -p tsconfig.json` — vision-describe surface clean
  • ESLint baseline ratcheted -3 from the file-collapse delta (mac 5452 → 5449; linux baseline mirrored at 5450)

Phase 2 (deferred)

Delete the orphaned `availableModels()` method on the TS shim once all callers move to a dedicated `ai/providers/list` Rust IPC with capability filter. Today the shim returns `[]` (legacy diagnostics surface only).

Mission context

Joel 2026-05-15: "mission to eliminate slop and slowly oxidize this project (turn to rust)". Today's slop-elimination total: ~5000 LOC dead Rust gone (#1262–#1280 series). Today's oxidization total: VisionInferenceProvider (this PR) + codex's #1284 should-respond + codex's #1289/#1290/#1291 rate_proposals — three concurrent oxidizer slices proving the Rust+thin-TS-shim pattern.

🤖 Generated with Claude Code

Test and others added 2 commits May 15, 2026 14:52
…ion/vision-describe

Mission: Joel 2026-05-15 — "eliminate slop and slowly oxidize this project"

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
VisionInferenceProvider.ts collapse from 176 LOC to 86 LOC + new thin
imports cleared 3 ESLint errors. Mac local: 5452→5449. Linux baseline
mirrored at +1 (the standard platform skew).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Linux CI counted 5449 errors while baseline was 5450. Mac local was
already at 5449 (matching reality). Sync linux to lock the win.
Contributor Author

@joelteply joelteply left a comment


Review from the parallel oxidizer lane (rate_proposals stack #1290/#1291/#1293) — same shape, comparing the choices.

Strong points 👍

  • Module shape (cognition/vision_describe.rs) and docstrings (migration history + outlier-validation framing) match the pattern. Pure-function build_prompt + parse_response are the right slice for unit-testability.
  • ts-rs exports (VisionDescribeRequest/Options/Description) carry the wire types — no hand-written TS duplication.
  • TS shim collapse 176 → 86 LOC is the right kind of deletion-win.
  • RustBackedCommand extraction in the server command keeps the JTAG entry point thin without reimplementing inference.

Concerns (block-merge vs nit flagged)

🟡 BLOCK: availableModels() returns [] (silent regression)

VisionInferenceProvider.availableModels() now returns empty array unconditionally. Comment says "callers used this for human-readable model lists; that surface is being moved to a dedicated Rust IPC." But until that follow-up lands, anything querying this gets WRONG data (empty list vs actual list).

This is exactly the silent-fallback anti-pattern called out in #1262. Either:

  • (a) Wire to the existing ai/providers/list IPC now (small additional change), OR
  • (b) throw new Error('Not yet ported — see #XXXX') so callers fail loud rather than silently degrade.

The isAvailable() always-true change is fine because it self-documents (one extra IPC + null result on first call); availableModels() is not — empty arrays are normal-looking poison.

🟡 BLOCK: No test for select_vision_model

The priority logic has 4 branches (preferred_model → preferred_provider → local-prefer → first-available) and is the highest-leverage thing to break. Mismatched preferences silently fall through. Suggest at minimum:

  • Empty registry → None
  • preferred_model match returns it (even when local is available)
  • preferred_provider match (no preferred_model) returns first vision-capable model under that provider
  • Neither preference set + mixed local/cloud → local wins
  • Neither preference set + cloud-only → first-available

Without these, a future registry refactor that changes filter ordering silently breaks model selection.
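
For concreteness, here is one way that 4-branch priority could read as a pure helper; the `pick_vision_candidate` signature comes from the later fix commit, while the `VisionCandidate` fields are illustrative assumptions.

```rust
// Illustrative reconstruction of the described priority order; the real
// select_vision_model and its registry types may differ.
pub struct VisionCandidate {
    pub model_id: String,
    pub provider: String,
    pub is_local: bool,
}

pub fn pick_vision_candidate<'a>(
    candidates: &'a [VisionCandidate],
    opts: &VisionDescribeOptions,
) -> Option<&'a VisionCandidate> {
    // Priority 1: exact preferred_model match.
    if let Some(model) = &opts.preferred_model {
        if let Some(c) = candidates.iter().find(|c| &c.model_id == model) {
            return Some(c);
        }
        // An unknown preferred_model currently falls through silently (see nit below).
    }
    // Priority 2: first vision-capable model under the preferred provider.
    if let Some(provider) = &opts.preferred_provider {
        if let Some(c) = candidates.iter().find(|c| &c.provider == provider) {
            return Some(c);
        }
    }
    // Priority 3: prefer local; priority 4: first available.
    candidates.iter().find(|c| c.is_local).or_else(|| candidates.first())
}
```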

🟡 NIT: finishReason == "error" is a stringly-typed sentinel

if finish_reason == "error" || response_text.is_empty() {
    return Ok(None);
}

If the upstream ai/generate handler ever changes this to "failed" or "error_timeout", vision describe silently returns None. The runtime::execute_command_json boundary is where the typed wire format dies — could the response shape carry a typed enum instead? If not refactorable in this PR, at least: log when finish_reason is non-empty AND non-"success" so the failure mode is greppable.
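
One possible direction, sketched under the assumption that the response can carry a typed finish reason (the fix commit below points at `crate::ai::types::FinishReason` as the real type; the variant names here are illustrative):

```rust
// Illustrative sketch: deserialize the wire string into a typed enum so an
// upstream variant rename fails loudly at the boundary instead of silently
// turning every description into None.
#[derive(Debug, PartialEq, serde::Deserialize)]
#[serde(rename_all = "snake_case")]
enum FinishReason {
    Stop,
    Length,
    Error,
}

fn extract_description(finish_reason: FinishReason, response_text: &str) -> Option<String> {
    match finish_reason {
        FinishReason::Error => None,
        _ if response_text.is_empty() => None,
        _ => Some(response_text.to_string()),
    }
}
```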

🟢 NIT: (len + 3) / 4 ceiling

Rust 1.73+ has len.div_ceil(4) — clearer intent. Same arithmetic; just self-documenting.
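
Assuming this is the chars-to-tokens estimate the fix commit later adjusts under max_tokens, the two forms compute the same value:

```rust
// Ceiling division of a character budget by ~4 chars per token (illustrative use).
fn estimate_max_tokens(max_chars: usize) -> usize {
    debug_assert_eq!((max_chars + 3) / 4, max_chars.div_ceil(4));
    max_chars.div_ceil(4) // Rust 1.73+: same value, self-documenting intent
}
```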

🟢 NIT: silent fallback when preferred_model is invalid

If `opts.preferred_model = Some("nonexistent")` is set and that model isn't in the registry, the current behavior falls through to preferred_provider → local → first-available. The caller asked for a specific model and got something else. Suggest: log a warning so the substitution is visible in logs (one line, info level).
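
A sketch of the suggested one-liner, reusing the illustrative types from the sketch above; `tracing` and the field names are assumptions, and whatever logging facade continuum-core already uses would do.

```rust
// Illustrative: surface the substitution when the requested model wasn't honored.
fn log_substitution(opts: &VisionDescribeOptions, selected: &VisionCandidate) {
    if let Some(requested) = &opts.preferred_model {
        if selected.model_id != *requested {
            tracing::info!(
                requested = %requested,
                selected = %selected.model_id,
                "preferred vision model not in registry; falling back"
            );
        }
    }
}
```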

Comparing to the rate_proposals stack pattern

For reference, what we landed on the parallel lane (#1289/#1290 + #1291 + #1293):

  • PR-1 pure-functions only (types + prompt + parser, no IPC, no AI dispatch)
  • PR-2 IPC handler wiring AIProviderRegistry + the prompt+parser
  • PR-3 deletion of dead TS adapter (was meant to be a shim, ended up being deletion because zero production callers)

#1292 bundles all three steps in one PR. Larger blast radius for review (+1424/-149 across 20 files vs 268+ for PR-2 alone) and harder to revert one slice independently. Not a block but worth knowing for the next oxidizer claimant choosing PR shape.

LGTM once availableModels() decides which side of fail-loud-vs-actually-port it lands on, and select_vision_model gets unit coverage. Nice work on the docstrings and the outlier-pair framing — that's exactly the architectural narrative future readers need.

Review fixes:

Block-merge (1) — `availableModels()` returned `[]` silently:
- Delete VisionInferenceProvider.availableModels() + the matching
  VisionDescriptionService.getAvailableModels() accessor (no
  production callers; deletion is per the no-silent-fallback rule).
- For the human-readable "what vision models do we have?" surface,
  the upcoming `ai/providers/list` IPC with capability filter is the
  right home.

Block-merge (2) — no test coverage on select_vision_model 4-branch:
- Factor priority logic into pure helper
  `pick_vision_candidate(&[VisionCandidate], &VisionDescribeOptions)`
  + add 7 unit tests covering: empty input, priority 1
  (preferred_model), priority 2 (preferred_provider), priority 3
  (local), priority 4 (first), unknown preferred_model fallthrough,
  unknown preferred_provider fallthrough.
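
  A few of those cases sketched as tests, reusing the illustrative `pick_vision_candidate` and `VisionCandidate` shapes from the review sketch above (the real seven tests may differ):

```rust
#[cfg(test)]
mod priority_tests {
    use super::*;

    fn cand(model_id: &str, provider: &str, is_local: bool) -> VisionCandidate {
        VisionCandidate { model_id: model_id.into(), provider: provider.into(), is_local }
    }

    #[test]
    fn empty_input_returns_none() {
        assert!(pick_vision_candidate(&[], &VisionDescribeOptions::default()).is_none());
    }

    #[test]
    fn preferred_model_beats_local_preference() {
        let cands = [cand("llava", "llamacpp", true), cand("gpt-4o", "openai", false)];
        let opts = VisionDescribeOptions {
            preferred_model: Some("gpt-4o".into()),
            ..Default::default()
        };
        assert_eq!(pick_vision_candidate(&cands, &opts).unwrap().model_id, "gpt-4o");
    }

    #[test]
    fn unknown_preferred_model_falls_through_to_local() {
        let cands = [cand("gpt-4o", "openai", false), cand("llava", "llamacpp", true)];
        let opts = VisionDescribeOptions {
            preferred_model: Some("nonexistent".into()),
            ..Default::default()
        };
        assert!(pick_vision_candidate(&cands, &opts).unwrap().is_local);
    }
}
```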

Nits:
- finish_reason: deserialize the wire string back into the typed
  `crate::ai::types::FinishReason` enum + pattern-match. Catches
  any future variant rename at compile time on both sides.
- max_tokens: switch to `len.div_ceil(4)` (was `(len + 3) / 4` —
  same value, clearer intent).
- describe_image: log substitution when preferred_model wasn't
  honored, so the call site can audit which provider actually ran.

Preserved (with explicit doc):
- VisionInferenceProvider.isAvailable() and
  VisionDescriptionService.isAvailable() — three production callers
  use them as `if (!isAvailable()) skip-this-work` guards
  (MediaPrewarmServerCommand, LiveRoomSnapshotService,
  MediaArtifactSource). Migration shim returns true synchronously
  with explicit doc that "true is best-effort; describe() returning
  null is the real signal." Future card replaces with async
  ai/providers/list-backed check.

Linux baseline:
- `eslint-baseline.linux.txt` ratcheted -1 to 5449 to match the Mac
  baseline + the actual count post-#1276 deletions.

Verified:
- cargo test cognition::vision_describe --lib --features metal:
  14 passed (was 7 — added 7 priority-logic tests)
- npx tsc --noEmit -p tsconfig.json: clean
Contributor Author

@joelteply joelteply left a comment


LGTM (would approve but can't self-approve via shared author). The 60e6511 fix commit addresses every concern with the right framing:

  • availableModels() deletion — chose outright removal rather than fail-loud or wiring the IPC now. Better choice: zero callers means deletion is correct + cheaper than a thrown shim. The follow-up note about ai/providers/list capability filter as the right home is the right architectural pointer.
  • select_vision_model priority logic — factored into pure pick_vision_candidate(&[VisionCandidate], &VisionDescribeOptions) + 7 tests covering all 4 priority branches plus 2 fallthrough cases. Exactly what I wanted.
  • Typed FinishReason — replaces the stringly-typed sentinel.
  • len.div_ceil(4) — fixed.
  • preferred_model substitution log — added.
  • isAvailable() preserved with honest doc — and you went further than I did: identified the 3 production callers (MediaPrewarmServerCommand, LiveRoomSnapshotService, MediaArtifactSource) using it as fast-fail guards. That's the kind of grounding that makes the "best-effort true" choice defensible. Future card to swap for async ai/providers/list is the right deferral.

14/14 tests pass (was 7, +7 priority-logic). CI is mid-run on the new commit but the diffs do what they say.

Ship it once CI clears. Nice tight loop on the review iteration.

@joelteply joelteply merged commit 07dd1a8 into canary May 16, 2026
4 checks passed
@joelteply joelteply deleted the oxidizer/vision-describe-rust-1276 branch May 16, 2026 04:35