Low-latency sensory pipeline — sub-100ms vision + real-time audio for personas #652

@joelteply

Summary

Personas need to see, hear, and speak at human-conversational speed. This is NOT about adding modalities (see #649, #650) — it's about making them FAST enough for real-time interaction.

Latency Budgets

| Sense | Target | Current | Gap |
|---|---|---|---|
| Vision (scene understanding) | <100ms | Bridge via VisionDescriptionService (slow) | Need native or distilled model |
| Audio input (hearing) | <50ms | STT bridge (200-500ms) | Need streaming encoder |
| Audio output (speech) | <150ms first-byte | TTS bridge (500ms+) | Need streaming vocoder |
| Touch/interaction events | <16ms | Already fast (DOM events) | OK |

Architecture Requirements

  • ALL processing off main thread (AudioWorklet, Web Workers, Rust workers)
  • Streaming — don't wait for complete input. Process chunks as they arrive
  • Transferable buffers — zero-copy between threads
  • Adaptive quality — degrade gracefully (lower resolution, skip frames) rather than block
  • Local inference only: a network round trip to a remote API would consume most or all of these budgets
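
The adaptive-quality requirement could be sketched as a small controller that steps down a quality ladder when a frame runs over budget and steps back up only once there is headroom. This is a minimal sketch, not the project's implementation: the ladder values, `RECOVER_FRACTION`, and the function names are all hypothetical.

```typescript
// Hypothetical quality ladder, best first. Sizes and strides are illustrative only.
interface QualityLevel {
  inputSize: number;   // square model input edge, in pixels
  frameStride: number; // process every Nth frame
}

const LADDER: QualityLevel[] = [
  { inputSize: 448, frameStride: 1 },
  { inputSize: 336, frameStride: 1 },
  { inputSize: 224, frameStride: 2 }, // lowest level also skips every other frame
];

// Assumed hysteresis: recover only when the last frame used under 70% of the budget.
const RECOVER_FRACTION = 0.7;

// Pick the next ladder index from the last measured processing time.
function nextLevel(level: number, lastMs: number, budgetMs: number): number {
  if (lastMs > budgetMs) return Math.min(level + 1, LADDER.length - 1); // degrade
  if (lastMs < budgetMs * RECOVER_FRACTION) return Math.max(level - 1, 0); // recover
  return level; // inside the hysteresis band: hold
}

// Frame skipping: at stride N only frames 0, N, 2N, ... are processed.
function shouldProcessFrame(frameIndex: number, level: number): boolean {
  return frameIndex % LADDER[level].frameStride === 0;
}
```

The gap between the degrade and recover thresholds is there to stop the controller from oscillating between two levels when processing time hovers near the budget.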

Vision Pipeline (target: <100ms)

  1. Frame capture (requestAnimationFrame, with Intersection Observer used only to gate capture on visibility) → <1ms
  2. Resize/crop to model input size → <5ms (Web Worker)
  3. Run distilled vision model (Qwen3.5-0.8B or MobileCLIP) → <80ms (Rust worker)
  4. Inject description into persona context → <5ms
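
The four stage budgets above sum to 91ms, which leaves roughly 9ms of slack under the 100ms target. A per-frame check along these lines could flag which stage ate the slack. The stage names, `checkFrame`, and `FrameReport` are hypothetical; only the millisecond figures come from the list above.

```typescript
interface Stage {
  name: string;
  budgetMs: number;
}

// Budgets copied from the vision pipeline steps; names are illustrative.
const VISION_STAGES: Stage[] = [
  { name: "capture", budgetMs: 1 },
  { name: "resize", budgetMs: 5 },
  { name: "inference", budgetMs: 80 },
  { name: "inject", budgetMs: 5 },
];
const VISION_TARGET_MS = 100;

// Sum of the per-stage budgets.
function totalBudgetMs(stages: Stage[]): number {
  return stages.reduce((sum, s) => sum + s.budgetMs, 0);
}

interface FrameReport {
  totalMs: number;      // measured end-to-end time for this frame
  slackMs: number;      // target minus total; negative means the frame missed
  overBudget: string[]; // stages that exceeded their own budget
}

// Compare one frame's measured stage timings against the budgets.
function checkFrame(stages: Stage[], measuredMs: number[], targetMs: number): FrameReport {
  if (measuredMs.length !== stages.length) {
    throw new Error("expected one measurement per stage");
  }
  const totalMs = measuredMs.reduce((sum, ms) => sum + ms, 0);
  const overBudget = stages
    .filter((s, i) => measuredMs[i] > s.budgetMs)
    .map((s) => s.name);
  return { totalMs, slackMs: targetMs - totalMs, overBudget };
}
```

For example, `checkFrame(VISION_STAGES, [0.5, 4, 95, 3], VISION_TARGET_MS)` reports a total of 102.5ms with `inference` as the only stage over its budget.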

Audio Pipeline (target: <50ms input, <150ms output)

Input:

  1. AudioWorklet captures PCM chunks → ~0ms added (runs on audio thread)
  2. Hand off to Rust worker via SharedArrayBuffer (shared, not copied) → <1ms
  3. Streaming Whisper encoder → <40ms per chunk
  4. Text injected to persona → <5ms
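
One way to do the SharedArrayBuffer handoff in step 2 is a single-producer, single-consumer ring buffer, sketched below with the audio thread as producer and the worker as consumer. The class name `PcmRing` and the memory layout (two Int32 indices followed by Float32 samples) are assumptions for illustration, not the project's actual format. The sketch is written in TypeScript for readability, even though the consumer in this design would be a Rust worker.

```typescript
// Single-producer / single-consumer ring buffer over a SharedArrayBuffer.
// Assumed layout: bytes 0-7 hold two Int32 indices, samples start at byte 8.
class PcmRing {
  capacity: number;
  indices: Int32Array;   // [0] = write index, [1] = read index
  samples: Float32Array;

  constructor(sab: SharedArrayBuffer, capacity: number) {
    this.capacity = capacity;
    this.indices = new Int32Array(sab, 0, 2);
    this.samples = new Float32Array(sab, 8, capacity);
  }

  // Allocate a buffer big enough for the indices plus `capacity` samples.
  static allocate(capacity: number): SharedArrayBuffer {
    return new SharedArrayBuffer(8 + capacity * 4);
  }

  // Producer side. Returns false instead of blocking when there is no room.
  push(chunk: Float32Array): boolean {
    const w = Atomics.load(this.indices, 0);
    const r = Atomics.load(this.indices, 1);
    const used = (w - r + this.capacity) % this.capacity;
    const free = this.capacity - 1 - used; // one slot stays empty to tell full from empty
    if (chunk.length > free) return false;
    for (let i = 0; i < chunk.length; i++) {
      this.samples[(w + i) % this.capacity] = chunk[i];
    }
    // Publish the new write index only after the samples are in place.
    Atomics.store(this.indices, 0, (w + chunk.length) % this.capacity);
    return true;
  }

  // Consumer side. Copies up to out.length samples and returns the count.
  pop(out: Float32Array): number {
    const w = Atomics.load(this.indices, 0);
    const r = Atomics.load(this.indices, 1);
    const available = (w - r + this.capacity) % this.capacity;
    const n = Math.min(available, out.length);
    for (let i = 0; i < n; i++) {
      out[i] = this.samples[(r + i) % this.capacity];
    }
    Atomics.store(this.indices, 1, (r + n) % this.capacity);
    return n;
  }
}
```

Returning `false` from `push` rather than waiting keeps the audio thread from ever blocking; what to do with a dropped chunk is a policy decision left out of the sketch.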

Output:

  1. LLM generates speech tokens → streaming
  2. Vocoder decodes to PCM → <100ms first chunk
  3. AudioWorklet plays back → <1ms queue
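
On the output side, the listed budgets (100ms to the first vocoder chunk plus 1ms of queueing) leave about 49ms of the 150ms first-byte target for the LLM to emit its first speech tokens. The playback step could be sketched as a queue that re-slices variable-size vocoder chunks into the fixed-size blocks the audio callback asks for, padding with silence on underrun instead of waiting. `PlaybackQueue` and its methods are hypothetical names; a production version would likely use a preallocated ring buffer to avoid allocation on the audio thread.

```typescript
// Re-slices variable-size PCM chunks into fixed-size output blocks.
class PlaybackQueue {
  chunks: Float32Array[] = [];
  offset = 0;        // read position inside chunks[0]
  queuedSamples = 0; // samples waiting to be played

  // Called as decoded PCM arrives from the vocoder.
  enqueue(chunk: Float32Array): void {
    if (chunk.length === 0) return;
    this.chunks.push(chunk);
    this.queuedSamples += chunk.length;
  }

  // Fill one output block. Returns the number of real samples written;
  // the rest of the block is silence.
  render(out: Float32Array): number {
    let written = 0;
    while (written < out.length && this.chunks.length > 0) {
      const head = this.chunks[0];
      const n = Math.min(head.length - this.offset, out.length - written);
      out.set(head.subarray(this.offset, this.offset + n), written);
      written += n;
      this.offset += n;
      if (this.offset === head.length) {
        this.chunks.shift();
        this.offset = 0;
      }
    }
    out.fill(0, written); // underrun: pad with silence rather than block
    this.queuedSamples -= written;
    return written;
  }

  // Audio currently buffered, in milliseconds at the given sample rate.
  queuedMs(sampleRate: number): number {
    return (this.queuedSamples / sampleRate) * 1000;
  }
}
```

Inside an `AudioWorkletProcessor`, `render` would be called once per `process()` invocation with the output channel array, and `queuedMs` gives a direct reading of how much latency the queue itself is adding.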

Key Principle

Treat this as a hard real-time system. The render loop is sacred. Miss a frame budget and the experience breaks. This is where Rust workers earn their keep: JS cannot meet these latency targets.
