
Integrate Qwen3.5-0.8B as local VisionDescriptionService — real-time scene understanding at 1GB #480

@joelteply


Discovery

Qwen3.5-0.8B does real-time video captioning on Mac: <1s per frame, ~1GB model, streaming descriptions as video plays. It understands scenes rather than just detecting objects.

Source: https://x.com/HuggingModels — running on Mac Studio M2 Ultra via MLX.

Impact on Continuum

This replaces our current VisionDescriptionService pipeline (YOLO + cloud vision) with a FULLY LOCAL solution:

  • 1GB model — fits on MacBook Air alongside everything else
  • <1s per frame — real-time, not batch
  • Scene understanding — "a hand is positioned in the foreground, palm facing the viewer" not just "hand detected"
  • Streaming — describes as video plays, not after
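At roughly one second per description, a describer cannot caption every frame of a typical video feed, so "real-time" in practice means sampling the newest frame whenever the model is free. The sketch below illustrates that idea only; the function name, the millisecond timestamps, and the fixed-latency assumption are hypothetical and not taken from the Continuum codebase.

```python
from typing import List, Sequence


def frames_to_describe(timestamps_ms: Sequence[int], latency_ms: int) -> List[int]:
    """Pick indices of frames to describe so the describer never falls behind.

    Assumes (hypothetically) that every description takes exactly
    `latency_ms`. A frame is described only if the describer is idle
    when the frame arrives; frames that arrive while it is busy are dropped.
    """
    chosen: List[int] = []
    free_at_ms = None  # time at which the describer becomes idle again
    for index, arrival_ms in enumerate(timestamps_ms):
        if free_at_ms is None or arrival_ms >= free_at_ms:
            chosen.append(index)
            free_at_ms = arrival_ms + latency_ms
    return chosen
```

For example, with frames arriving every 250 ms and a 900 ms description time, `frames_to_describe([0, 250, 500, 750, 1000, 1250], 900)` selects frames 0 and 4, so descriptions stay in step with playback rather than queueing up.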

Use Cases

  1. Live call vision: Persona watches WebRTC video feed, describes to text-only personas in real-time
  2. Screenshot verification (#453, "Coding agent visual + runtime verification: screenshot + console errors + simulator/emulator testing"): Describe rendered page in <1s for coding agent QA
  3. Game testing: Watch gameplay, describe frame-by-frame for automated evaluation
  4. Avatar QA: Detect vertex corruption automatically ("geometry appears shredded" vs "clean anime face")
  5. UI testing: "The button is in the top-right corner, the text says Submit" — automated visual assertions
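The "automated visual assertions" in use case 5 could be as simple as checking a description for expected phrases. This is a minimal sketch under that assumption: `missing_phrases` and `assert_visual` are hypothetical helper names, and plain substring matching is a stand-in that would likely be too brittle for free-form captions without fuzzier matching.

```python
from typing import Iterable, List


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so matching ignores formatting."""
    return " ".join(text.lower().split())


def missing_phrases(description: str, expected: Iterable[str]) -> List[str]:
    """Return the expected phrases that do not occur in the description."""
    haystack = _normalize(description)
    return [phrase for phrase in expected if _normalize(phrase) not in haystack]


def assert_visual(description: str, expected: Iterable[str]) -> None:
    """Raise AssertionError listing every expected phrase that is absent."""
    missing = missing_phrases(description, expected)
    if missing:
        raise AssertionError(f"description is missing: {missing}")
```

Usage, with the example description from the list above: `assert_visual("The button is in the top-right corner, the text says Submit", ["top-right corner", "submit"])` passes, while expecting `"Cancel"` would raise with the missing phrase named in the message.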

Integration

Replace VisionDescriptionService's cloud path with local Qwen3.5-0.8B:

  • Load via Candle (GGUF) or MLX (on Mac)
  • Content-addressed cache still applies (don't re-describe identical frames)
  • Falls back to cloud vision if the local model is unavailable
  • 0.8B is small enough to be ALWAYS loaded — no paging needed
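The cache and fallback behaviour in the list above can be sketched as follows. This is not the existing VisionDescriptionService API: `CachedVisionDescriber` and the `Describer` callable type are hypothetical, the local and cloud backends are passed in as plain functions rather than real Candle or MLX bindings, and SHA-256 over the raw frame bytes is an assumed choice of content address.

```python
import hashlib
from typing import Callable, Dict, Optional

# Hypothetical type: a describer maps raw frame bytes to a text description.
Describer = Callable[[bytes], str]


class CachedVisionDescriber:
    """Local-first describer with a content-addressed cache and cloud fallback."""

    def __init__(self, local: Optional[Describer], cloud: Describer) -> None:
        self.local = local  # None models "local model unavailable"
        self.cloud = cloud
        self.cache: Dict[str, str] = {}

    def describe(self, frame: bytes) -> str:
        # Content address: identical frame bytes always map to the same key,
        # so an identical frame is never described twice.
        key = hashlib.sha256(frame).hexdigest()
        if key in self.cache:
            return self.cache[key]
        description = self._describe_uncached(frame)
        self.cache[key] = description
        return description

    def _describe_uncached(self, frame: bytes) -> str:
        if self.local is not None:
            try:
                return self.local(frame)
            except Exception:
                # Local inference failed; fall through to the cloud path.
                pass
        return self.cloud(frame)
```

One design point worth deciding: hashing raw bytes only deduplicates byte-identical frames, so consecutive video frames that differ by a few pixels would still each trigger inference. A perceptual hash would be needed to catch those, which is outside this sketch.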

The Bigger Picture

With this model:

  • Every persona can SEE at zero cost
  • Visual QA becomes instant and free
  • The sensory pipeline is 100% local
  • MacBook Air becomes a fully-sighted system

This is what "local AI is getting unreasonably capable" means for us.

Related
