Skip to content

Make whisper an optional extra with faster-whisper by default #1877

Merged
paul-nechifor merged 13 commits into dev from sam/move-whisper
May 6, 2026

Conversation

@Dreamsorcerer
Collaborator

@Dreamsorcerer Dreamsorcerer commented Apr 17, 2026

Problem

Whisper requires downloading a 150MB model and depends on torch (with GBs of CUDA downloads).

Solution

Provide faster-whisper by default (2MB) and use it as the fallback when whisper is not installed.
This avoids the 150MB download, and means we are one step closer to not depending on torch for a base install.

Breaking Changes

Users now need to install dimos[whisper] to get the full Whisper backend.

Test

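python -c "
from dimos.stream.audio.pipelines import stt
from dimos.stream.audio.utils import keepalive

# Build the STT pipeline and print each transcribed utterance.
# Single quotes inside the f-string keep the shell's double quotes intact.
node = stt()
node.emit_text().subscribe(on_next=lambda t: print(f'USER: {t}'))
keepalive()
"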

@Dreamsorcerer Dreamsorcerer marked this pull request as ready for review April 17, 2026 15:54
@Dreamsorcerer
Collaborator Author

STT seems to work pretty well with faster-whisper anyway.

@greptile-apps
Contributor

greptile-apps Bot commented Apr 17, 2026

Greptile Summary

This PR makes openai-whisper optional by introducing faster-whisper as the default backend (bundled in the agents extra), while moving openai-whisper to the dev extra under a new dimos[whisper] install path.

  • node_whisper.py now has a two-stage import try/except at module level: prefers openai-whisper if present, falls back to faster-whisper, and raises a clear ImportError if neither is found.
  • pyproject.toml replaces openai-whisper with faster-whisper>=1.0.0 in the agents extra and adds openai-whisper to dev; faster_whisper is also added to the mypy ignore list.
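
The two-stage import described above can be sketched roughly as follows. This is a minimal illustration, not the code from node_whisper.py: the function name `select_backend`, the injectable `import_module` parameter, and the error text are all assumptions made here so the fallback order can be exercised without either package installed.

```python
import importlib
from typing import Callable


def select_backend(
    import_module: Callable[[str], object] = importlib.import_module,
) -> str:
    """Return the name of the first importable speech-to-text backend.

    Hypothetical helper: prefers openai-whisper (module `whisper`), then
    falls back to faster-whisper (module `faster_whisper`).
    """
    try:
        import_module("whisper")
        return "whisper"
    except ImportError:
        pass  # openai-whisper is absent; try the lighter backend next

    try:
        import_module("faster_whisper")
        return "faster_whisper"
    except ImportError as exc:
        # Assumed wording; the real message in the PR is not shown here.
        raise ImportError(
            "No whisper backend found; install faster-whisper or openai-whisper."
        ) from exc
```

Passing the importer as a parameter is only a convenience for testing the ordering; a module-level try/except, as the summary describes, makes the same decision once at import time.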

Confidence Score: 4/5

Safe to merge after fixing the misleading install hint in the ImportError message.

The ImportError raised when neither backend is found tells users to run pip install dimos[whisper], but that extra does not exist in pyproject.toml; openai-whisper lives only in dev. Any user who hits this error and follows the hint will get a pip failure instead of a working install. The rest of the change (backend selection logic, compute_type mapping, segment joining, and dependency pinning) looks correct.

dimos/stream/audio/stt/node_whisper.py — specifically the ImportError message at the bottom of the fallback try/except block.

Important Files Changed

| Filename | Overview |
| --- | --- |
| dimos/stream/audio/stt/node_whisper.py | Added module-level try/except for backend selection; transcription path split per backend. Contains a misleading install hint in the ImportError message and a deprecated logger.warn call. |
| pyproject.toml | Swaps openai-whisper for faster-whisper>=1.0.0 in the agents extra and adds openai-whisper to dev; no whisper extra is defined for end users who want the full backend. |
| uv.lock | Lock file updated to include faster-whisper 1.2.1 and its ctranslate2 dependency; openai-whisper moved to dev group. Generated file, no logic to review. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Module import] --> B{import whisper}
    B -- success --> C[_USE_FASTER_WHISPER = False]
    B -- ImportError --> D{import faster_whisper.WhisperModel}
    D -- success --> E[_USE_FASTER_WHISPER = True\nlog warning]
    D -- ImportError --> F[raise ImportError\n'No whisper backend found']

    C --> G[WhisperNode.__init__]
    E --> G

    G --> H{_USE_FASTER_WHISPER?}
    H -- True --> I[WhisperModel\ndevice=auto\ncompute_type from fp16]
    H -- False --> J[whisper.load_model]

    I --> K[emit_text → transcribe\njoin segments]
    J --> L[emit_text → transcribe\nresult text strip]
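
The faster-whisper branch of the flowchart (pick a compute_type from the fp16 flag, then join the transcribed segments) might look roughly like the sketch below. It is an assumption-laden illustration: `compute_type_for`, `join_segments`, and the stand-in `Segment` class are hypothetical names, and the float16/float32 mapping is a guess at what "compute_type from fp16" means, since the PR's actual values are not shown on this page. The only library fact relied on is that faster-whisper's `WhisperModel.transcribe` yields segment objects with a `.text` attribute.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Segment:
    """Stand-in for a transcription segment; only `.text` is used here."""

    text: str


def compute_type_for(fp16: bool) -> str:
    # Assumed mapping from the openai-whisper style `fp16` flag to a
    # faster-whisper compute_type string.
    return "float16" if fp16 else "float32"


def join_segments(segments: Iterable[Segment]) -> str:
    # Strip each piece and skip empty ones so the joined text has single spaces.
    pieces = (segment.text.strip() for segment in segments)
    return " ".join(piece for piece in pieces if piece)
```

With the real library, the segments would come from something like `segments, info = model.transcribe(audio)`; joining them yields one string comparable to the `result["text"].strip()` value on the openai-whisper path.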

Comments Outside Diff (1)

  1. dimos/stream/audio/stt/node_whisper.py, line 43-47 (link)

    P1 The error message tells users to run pip install dimos[whisper], but no whisper extra exists in pyproject.toml; openai-whisper was moved to dev only. Running that command will fail with "invalid extra". The hint should direct users to install openai-whisper directly or via dimos[dev].

Reviews (2): Last reviewed commit: "Merge branch 'dev' into sam/move-whisper"

Comment thread dimos/stream/audio/stt/node_whisper.py Outdated
Comment thread dimos/stream/audio/stt/node_whisper.py
Comment thread pyproject.toml Outdated
Comment thread pyproject.toml Outdated
Comment thread pyproject.toml Outdated
Comment thread dimos/stream/audio/stt/node_whisper.py Outdated
Comment thread dimos/stream/audio/stt/node_whisper.py Outdated
@paul-nechifor paul-nechifor enabled auto-merge (squash) May 6, 2026 19:16
@paul-nechifor paul-nechifor merged commit 3b2622a into dev May 6, 2026
4 checks passed
@paul-nechifor paul-nechifor deleted the sam/move-whisper branch May 6, 2026 19:16
paul-nechifor added a commit that referenced this pull request May 9, 2026
<div align="center"><img width="1000" alt="banner_bordered_trimmed" src="https://github.com/user-attachments/assets/64f13b39-da06-4f58-add0-cfc44f04db4e" /></div>

<div align="center">The Agentive Operating System for Physical Space</div>

<div align="center"><a href="https://discord.gg/dimos">https://discord.gg/dimos</a></div>

# Release v0.0.12: memory2 streaming engine, async modules, OpenArm + G1 low-level, MuJoCo manipulation, slimmer install

## TL;DR

memory2 lands as a typed streaming observation engine. Modules can now be `async def`. Three new robots: OpenArm bimanual, G1 whole-body low-level, Go2 over Unitree SDK2. Manipulation now runs in MuJoCo. The base install drops several hundred MB: perception, sim, and whisper are now opt-in extras.

## Highlights

108 commits, 11 contributors, 972 files changed.

## ⚠️ Breaking Changes

Most breakers are import / config renames. The install shrinkage requires opting into extras you used to get for free.

- Perception (and torch, bitsandbytes) removed from base install — `pip install 'dimos[perception]'` to opt in. ([#1888](#1888))
- Sim removed from base install (~550 MB) — `pip install 'dimos[sim]'`. ([#1878](#1878))
- `faster-whisper` is the default STT — `pip install 'dimos[whisper]'` for full Whisper. ([#1877](#1877))
- `Blueprint.build` removed — use `ModuleCoordinator.build(blueprint)` and import from `dimos.core.coordination`. ([#1744](#1744))
- Module `Config` must be a `pydantic.BaseModel`; module `__init__` signature standardized. ([#1510](#1510))
- `__init__.py` re-exports removed — import directly from defining modules. ([#1545](#1545))
- Blueprint aliases removed — use `MyModule.blueprint()` instead of `my_module()`. ([#1606](#1606))
- Old `Agent` class removed — now agent communication is just through MCP. ([#1657](#1657))
- Teleop blueprints regrouped under `teleop_*`; `VisualizingTeleopModule` removed. ([#1602](#1602))
- Manipulation: joint names use `/` (`arm/joint1`); `WorldStateMonitor` → `RobotStateMonitor`; `_hardware.py` removed in favor of `RobotConfig`. ([#1728](#1728), [#1725](#1725))
- `use_mesh_obstacles` default flipped to `False`; `ObjectDB` no longer matches by name. ([#1656](#1656))

## 🚀 Upgrade

```bash
pip install -U 'dimos[perception,sim,whisper]'
```

Drop extras you don't use.

## ✨ New Features

### memory2

Typed streaming observation engine. memory2 replaces per-blueprint perception buffers with first-class streams that record, query, and visualize together.

- Stream/Observation primitives, SQLite + R*Tree + vec0 backends, codecs, live channels. ([#1536](#1536)) by [@leshy](https://github.com/leshy)
- Recorder/Query modules, semantic search, 3D detection projections, ~10× smaller replay files. ([#1769](#1769)) by [@leshy](https://github.com/leshy)
- StreamModules + Go2 auto-recorder + visualization scaffolding. ([#1682](#1682), [#1925](#1925), [#1637](#1637)) by [@leshy](https://github.com/leshy)

### Async modules & agent API

Modules can now be `async def`. New surface area for scripting and tool-calling agents.

- Async modules: `async def` handlers/RPCs, async `@rpc`, `self.spawn`, latest-only dispatch. ([#1920](#1920)) by [@paul-nechifor](https://github.com/paul-nechifor)
- Porcelain Python API: `connect()` to script against a running DimOS. ([#1779](#1779)) by [@paul-nechifor](https://github.com/paul-nechifor)
- `app.peek_stream(name, timeout)` for one-shot stream samples. ([#1909](#1909)) by [@paul-nechifor](https://github.com/paul-nechifor)
- MCP tool streams: tools push progress back to agents during a single tool call. ([#1713](#1713)) by [@paul-nechifor](https://github.com/paul-nechifor)
- Modules can be restarted at runtime; multiple blueprints can start after startup. ([#1755](#1755), [#1744](#1744)) by [@paul-nechifor](https://github.com/paul-nechifor)
- New patrolling module + `unitree-go2-security` agentic patrol blueprint; patrol rewritten as an async module. ([#1488](#1488), [#1619](#1619), [#1939](#1939)) by [@paul-nechifor](https://github.com/paul-nechifor)
- Blueprint config via CLI `-o key=value`, `__` env vars, and `--config=foo.json`; `--help` lists options. ([#1543](#1543)) by [@Dreamsorcerer](https://github.com/Dreamsorcerer)
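
As a rough illustration of what "latest-only dispatch" can mean for an async handler, here is a generic asyncio sketch. It does not use or describe the DimOS API: `LatestOnly`, `publish`, `run`, and `demo` are hypothetical names, and the behavior shown (a slow handler only ever sees the newest pending value, with stale ones dropped) is an assumption about the term.

```python
import asyncio
import contextlib
from typing import Any, Awaitable, Callable, List


class LatestOnly:
    """Run an async handler on the newest value only, dropping stale ones."""

    def __init__(self, handler: Callable[[Any], Awaitable[None]]) -> None:
        self._handler = handler
        self._pending: Any = None
        self._wake = asyncio.Event()

    def publish(self, value: Any) -> None:
        # Overwrite whatever was waiting; older unprocessed values are dropped.
        self._pending = value
        self._wake.set()

    async def run(self) -> None:
        while True:
            await self._wake.wait()
            self._wake.clear()
            await self._handler(self._pending)


async def demo() -> List[int]:
    handled: List[int] = []

    async def slow_handler(value: int) -> None:
        await asyncio.sleep(0.01)  # simulate slow work
        handled.append(value)

    dispatcher = LatestOnly(slow_handler)  # created inside the running loop
    worker = asyncio.create_task(dispatcher.run())
    for i in range(5):  # the whole burst arrives before the worker gets a turn
        dispatcher.publish(i)
    await asyncio.sleep(0.1)
    dispatcher.publish(99)
    await asyncio.sleep(0.1)
    worker.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await worker
    return handled
```

Running `asyncio.run(demo())` handles only the last value of the burst and then the later one, so the handler never falls behind a fast producer.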

### Robot support

- OpenArm bimanual: from-scratch CAN driver, adapter, blueprints, mock + real planner. ([#1897](#1897)) by [@mustafab0](https://github.com/mustafab0)
- G1 humanoid: 500 Hz whole-body low-level coordinator + `unitree-g1-coordinator` blueprint. ([#1954](#1954)) by [@mustafab0](https://github.com/mustafab0)
- Go2 over Unitree SDK2 (`dimos[unitree-dds]`) with Nix-based cyclonedds setup. ([#1885](#1885)) by [@ruthwikdasyam](https://github.com/ruthwikdasyam)
- Go2 rage mode (~2.5 m/s) over WebRTC + dedicated keyboard-teleop blueprint. ([#1903](#1903)) by [@ruthwikdasyam](https://github.com/ruthwikdasyam)
- Go2 WebRTC TwistAdapter integrated with ControlCoordinator. ([#1362](#1362)) by [@mustafab0](https://github.com/mustafab0)
- `dimos go2tool discover` / `connect-wifi`: find Go2s on LAN or over Bluetooth and configure Wi-Fi without the vendor app. ([#1990](#1990)) by [@leshy](https://github.com/leshy)
- New Hong Kong office Go2 replay datasets. ([#1991](#1991)) by [@leshy](https://github.com/leshy)

### Manipulation in MuJoCo

Any manipulation blueprint can now run in sim with `--simulation`.

```bash
dimos --simulation run coordinator-xarm7
```

- Manipulation in MuJoCo: `--simulation` spins up a sim arm; coordinator and teleop blueprints (xArm6/xArm7/Piper) honor the flag, replacing per-arm sim blueprints. ([#1639](#1639), [#2027](#2027)) by [@ruthwikdasyam](https://github.com/ruthwikdasyam)
- Quest teleop in MuJoCo for xArm6 + Piper, with eye-in-hand sim cameras. ([#1958](#1958)) by [@ruthwikdasyam](https://github.com/ruthwikdasyam)
- Sim assets for xArm6 and Piper + `MujocoCamera` (drop-in RealSense replacement in sim). ([#1642](#1642), [#1694](#1694)) by [@ruthwikdasyam](https://github.com/ruthwikdasyam)
- Unified `RobotConfig`: parse joints/DOF/limits from URDF/MJCF instead of hand-wiring. ([#1699](#1699)) by [@mustafab0](https://github.com/mustafab0)
- Drake loader supports MJCF; configured home pose + EE orientation honored when planning. ([#1722](#1722)) by [@mustafab0](https://github.com/mustafab0)
- Manipulation demo: `look`/`drop_on` skills, distance-adaptive grasps, structured agent prompt. ([#1656](#1656)) by [@mustafab0](https://github.com/mustafab0)
- Control blueprints split into a package; hardcoded IPs replaced with env vars (`XARM7_IP`, …). ([#1601](#1601)) by [@ruthwikdasyam](https://github.com/ruthwikdasyam)
- Unity simulator as a DimOS module (`dimos run unity-sim`); auto-downloads on Linux x86. ([#1539](#1539)) by [@jeff-hykin](https://github.com/jeff-hykin)

### Core & visualizer

- Rust native modules: write performance-critical modules in Rust (LCM transport, `NativeModule` API). ([#1794](#1794)) by [@aclauer](https://github.com/aclauer)
- Watchdog kills all DimOS child processes (and grandchildren) when the parent exits. ([#1886](#1886)) by [@paul-nechifor](https://github.com/paul-nechifor)
- DockerModules restored: parallel deploy, image pull, build args, rebuild on Dockerfile change. ([#1431](#1431)) by [@jeff-hykin](https://github.com/jeff-hykin)
- Voxel maps render as Rerun Points3D spheres (~10× faster at high point counts). ([#1793](#1793)) by [@ruthwikdasyam](https://github.com/ruthwikdasyam)
- Rerun pipeline latency: PointCloud2 ~350 ms → ~5 ms; costmap ~40 ms → ~5 ms. ([#1747](#1747)) by [@ruthwikdasyam](https://github.com/ruthwikdasyam)
- Blueprint module dependency graph auto-rendered as a Graph tab in Rerun. ([#1705](#1705)) by [@ruthwikdasyam](https://github.com/ruthwikdasyam)
- Viewer remote `--connect` works again — clicks/teleop flow over websockets. ([#1784](#1784)) by [@jeff-hykin](https://github.com/jeff-hykin)
- `dtop` shows native child-process CPU; `dtop --log` records metrics to a memory2 SQLite store for offline plotting. ([#1880](#1880), [#2004](#2004)) by [@aclauer](https://github.com/aclauer)

## 🐛 Bug Fixes

- macOS UDP receive buffer probed up to 32 MiB. ([#1789](#1789)) by [@Dreamsorcerer](https://github.com/Dreamsorcerer)
- `--disable` works again; module-by-name lookup fixed. ([#1707](#1707), [#1689](#1689)) by [@paul-nechifor](https://github.com/paul-nechifor)
- `CameraModule.stop()` reachable via RPC; doc imports use canonical paths. ([#1773](#1773)) by [@jeff-hykin](https://github.com/jeff-hykin)
- Python 3.10 compat (`typing_extensions.Self`); skip `pyrealsense2` on macOS; fix `nix develop` LCM build on macOS. ([#1621](#1621), [#1556](#1556), [#1610](#1610)) by [@jeff-hykin](https://github.com/jeff-hykin)
- Rerun grid raised above the costmap; viewer bumped to 0.30.0a6. ([#1714](#1714), [#1690](#1690), [#1785](#1785)) by [@ruthwikdasyam](https://github.com/ruthwikdasyam)
- Go2 lidar timestamps repaired: non-monotonic frames clamped to the expected period; older firmware that never updates timestamps falls back to system time after a calibration window. ([#1992](#1992), [#2021](#2021)) by [@leshy](https://github.com/leshy), [@aclauer](https://github.com/aclauer)
- Replay memory leak fixed. ([#2025](#2025)) by [@leshy](https://github.com/leshy)
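
One possible reading of "non-monotonic frames clamped to the expected period" is sketched below. This is an assumption, not the logic from the referenced PRs: `repair_timestamps` is a hypothetical helper that replaces any timestamp that fails to advance with the previous output plus one expected period. Integer milliseconds are used to keep the arithmetic exact.

```python
from typing import Iterable, List


def repair_timestamps(stamps: Iterable[int], period: int) -> List[int]:
    """Return a strictly increasing copy of `stamps`.

    Any stamp that does not advance past the previous output is replaced
    by the previous output plus `period` (assumed repair rule).
    """
    repaired: List[int] = []
    for ts in stamps:
        if repaired and ts <= repaired[-1]:
            ts = repaired[-1] + period
        repaired.append(ts)
    return repaired
```

For example, with a 100 ms period the sequence `[0, 100, 50, 300]` would have its third stamp moved to 200, while the in-order stamps are left untouched.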

## ⚡ Performance

- `dimos --help` ~5 s → ~2 s by trimming heavy imports and deferring `o3dpickle`. ([#1571](#1571), [#1721](#1721)) by [@jeff-hykin](https://github.com/jeff-hykin), [@Dreamsorcerer](https://github.com/Dreamsorcerer)

## 🔒 Security

- MCP/Foxglove/GStreamer bind to localhost by default; set `MCP_HOST=0.0.0.0` to expose. ([#1698](#1698)) by [@vrinek](https://github.com/vrinek)
- Update vulnerable dependencies flagged by Dependabot. ([#1989](#1989)) by [@paul-nechifor](https://github.com/paul-nechifor)

## 👥 New Contributors

- [@aclauer](https://github.com/aclauer) made his first contribution in [#1794](#1794)
- [@vrinek](https://github.com/vrinek) made his first contribution in [#1698](#1698)

**Full Changelog**: v0.0.11...v0.0.12