Make whisper an optional extra with faster-whisper by default #1877
paul-nechifor merged 13 commits into dev
STT seems to work pretty well with faster-whisper anyway.
Greptile Summary
This PR makes
Confidence Score: 4/5
Safe to merge after fixing the misleading install hint in the ImportError message.
The ImportError raised when neither backend is found tells users to run [...]
dimos/stream/audio/stt/node_whisper.py: specifically the ImportError message at the bottom of the fallback try/except block.
Important Files Changed
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Module import] --> B{import whisper}
B -- success --> C[_USE_FASTER_WHISPER = False]
B -- ImportError --> D{import faster_whisper.WhisperModel}
D -- success --> E[_USE_FASTER_WHISPER = True\nlog warning]
D -- ImportError --> F[raise ImportError\n'No whisper backend found']
C --> G[WhisperNode.__init__]
E --> G
G --> H{_USE_FASTER_WHISPER?}
H -- True --> I[WhisperModel\ndevice=auto\ncompute_type from fp16]
H -- False --> J[whisper.load_model]
I --> K[emit_text → transcribe\njoin segments]
J --> L[emit_text → transcribe\nresult text strip]
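The import-time branch in the flowchart can be sketched as follows. This is a minimal illustration, not the code in `node_whisper.py`: the helper names `select_backend` and `compute_type_for` are hypothetical, the importer is injectable only so the logic can be exercised without either package installed, and the `fp16` to `compute_type` mapping is an assumption (the flowchart only says the value is derived from `fp16`).

```python
import importlib
from typing import Callable


def select_backend(
    import_module: Callable[[str], object] = importlib.import_module,
) -> bool:
    """Return True to use faster-whisper, False to use openai-whisper.

    Mirrors the flowchart: try `whisper` first, fall back to
    `faster_whisper`, and raise ImportError if neither is importable.
    """
    try:
        import_module("whisper")
        return False
    except ImportError:
        pass  # full Whisper is not installed; try the lightweight backend

    try:
        import_module("faster_whisper")
        return True
    except ImportError:
        # Message text taken from the flowchart; the real install hint is
        # not shown in the source, so none is invented here.
        raise ImportError("No whisper backend found") from None


def compute_type_for(fp16: bool) -> str:
    """Assumed mapping from an fp16 flag to a faster-whisper compute_type."""
    return "float16" if fp16 else "float32"
```

With the default importer, `select_backend()` reports which package is actually available in the current environment; a module would typically call it once at import time and store the result in a flag such as `_USE_FASTER_WHISPER`.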
<div align="center"><img width="1000" alt="banner_bordered_trimmed" src="https://github.com/user-attachments/assets/64f13b39-da06-4f58-add0-cfc44f04db4e" /></div>
<div align="center">The Agentive Operating System for Physical Space</div>
<div align="center"><a href="https://discord.gg/dimos">https://discord.gg/dimos</a></div>

# Release v0.0.12: memory2 streaming engine, async modules, OpenArm + G1 low-level, MuJoCo manipulation, slimmer install

## TL;DR

memory2 lands as a typed streaming observation engine. Modules can now be `async def`. Three new robots: OpenArm bimanual, G1 whole-body low-level, Go2 over Unitree SDK2. Manipulation now runs in MuJoCo. The base install drops several hundred MB: perception, sim, and whisper are now opt-in extras.

## Highlights

108 commits, 11 contributors, 972 files changed.

## ⚠️ Breaking Changes

Most breakers are import / config renames. The install shrinkage requires opting into extras you used to get for free.

- Perception (and torch, bitsandbytes) removed from base install; `pip install 'dimos[perception]'` to opt in. ([#1888](#1888))
- Sim removed from base install (~550 MB); `pip install 'dimos[sim]'`. ([#1878](#1878))
- `faster-whisper` is the default STT; `pip install 'dimos[whisper]'` for full Whisper. ([#1877](#1877))
- `Blueprint.build` removed; use `ModuleCoordinator.build(blueprint)` and import from `dimos.core.coordination`. ([#1744](#1744))
- Module `Config` must be a `pydantic.BaseModel`; module `__init__` signature standardized. ([#1510](#1510))
- `__init__.py` re-exports removed; import directly from defining modules. ([#1545](#1545))
- Blueprint aliases removed; use `MyModule.blueprint()` instead of `my_module()`. ([#1606](#1606))
- Old `Agent` class removed; now agent communication is just through MCP. ([#1657](#1657))
- Teleop blueprints regrouped under `teleop_*`; `VisualizingTeleopModule` removed. ([#1602](#1602))
- Manipulation: joint names use `/` (`arm/joint1`); `WorldStateMonitor` → `RobotStateMonitor`; `_hardware.py` removed in favor of `RobotConfig`. ([#1728](#1728), [#1725](#1725))
- `use_mesh_obstacles` default flipped to `False`; `ObjectDB` no longer matches by name. ([#1656](#1656))

## 🚀 Upgrade

```bash
pip install -U 'dimos[perception,sim,whisper]'
```

Drop extras you don't use.

## ✨ New Features

### memory2

Typed streaming observation engine. memory2 replaces per-blueprint perception buffers with first-class streams that record, query, and visualize together.

- Stream/Observation primitives, SQLite + R*Tree + vec0 backends, codecs, live channels. ([#1536](#1536)) by [@leshy](https://github.com/leshy)
- Recorder/Query modules, semantic search, 3D detection projections, ~10× smaller replay files. ([#1769](#1769)) by [@leshy](https://github.com/leshy)
- StreamModules + Go2 auto-recorder + visualization scaffolding. ([#1682](#1682), [#1925](#1925), [#1637](#1637)) by [@leshy](https://github.com/leshy)

### Async modules & agent API

Modules can now be `async def`. New surface area for scripting and tool-calling agents.

- Async modules: `async def` handlers/RPCs, async `@rpc`, `self.spawn`, latest-only dispatch. ([#1920](#1920)) by [@paul-nechifor](https://github.com/paul-nechifor)
- Porcelain Python API: `connect()` to script against a running DimOS. ([#1779](#1779)) by [@paul-nechifor](https://github.com/paul-nechifor)
- `app.peek_stream(name, timeout)` for one-shot stream samples. ([#1909](#1909)) by [@paul-nechifor](https://github.com/paul-nechifor)
- MCP tool streams: tools push progress back to agents during a single tool call. ([#1713](#1713)) by [@paul-nechifor](https://github.com/paul-nechifor)
- Modules can be restarted at runtime; multiple blueprints can start after startup. ([#1755](#1755), [#1744](#1744)) by [@paul-nechifor](https://github.com/paul-nechifor)
- New patrolling module + `unitree-go2-security` agentic patrol blueprint; patrol rewritten as an async module. ([#1488](#1488), [#1619](#1619), [#1939](#1939)) by [@paul-nechifor](https://github.com/paul-nechifor)
- Blueprint config via CLI `-o key=value`, `__` env vars, and `--config=foo.json`; `--help` lists options. ([#1543](#1543)) by [@Dreamsorcerer](https://github.com/Dreamsorcerer)

### Robot support

- OpenArm bimanual: from-scratch CAN driver, adapter, blueprints, mock + real planner. ([#1897](#1897)) by [@mustafab0](https://github.com/mustafab0)
- G1 humanoid: 500 Hz whole-body low-level coordinator + `unitree-g1-coordinator` blueprint. ([#1954](#1954)) by [@mustafab0](https://github.com/mustafab0)
- Go2 over Unitree SDK2 (`dimos[unitree-dds]`) with Nix-based cyclonedds setup. ([#1885](#1885)) by [@ruthwikdasyam](https://github.com/ruthwikdasyam)
- Go2 rage mode (~2.5 m/s) over WebRTC + dedicated keyboard-teleop blueprint. ([#1903](#1903)) by [@ruthwikdasyam](https://github.com/ruthwikdasyam)
- Go2 WebRTC TwistAdapter integrated with ControlCoordinator. ([#1362](#1362)) by [@mustafab0](https://github.com/mustafab0)
- `dimos go2tool discover` / `connect-wifi`: find Go2s on LAN or over Bluetooth and configure Wi-Fi without the vendor app. ([#1990](#1990)) by [@leshy](https://github.com/leshy)
- New Hong Kong office Go2 replay datasets. ([#1991](#1991)) by [@leshy](https://github.com/leshy)

### Manipulation in MuJoCo

Any manipulation blueprint can now run in sim with `--simulation`.

```bash
dimos --simulation run coordinator-xarm7
```

- Manipulation in MuJoCo: `--simulation` spins up a sim arm; coordinator and teleop blueprints (xArm6/xArm7/Piper) honor the flag, replacing per-arm sim blueprints. ([#1639](#1639), [#2027](#2027)) by [@ruthwikdasyam](https://github.com/ruthwikdasyam)
- Quest teleop in MuJoCo for xArm6 + Piper, with eye-in-hand sim cameras. ([#1958](#1958)) by [@ruthwikdasyam](https://github.com/ruthwikdasyam)
- Sim assets for xArm6 and Piper + `MujocoCamera` (drop-in RealSense replacement in sim). ([#1642](#1642), [#1694](#1694)) by [@ruthwikdasyam](https://github.com/ruthwikdasyam)
- Unified `RobotConfig`: parse joints/DOF/limits from URDF/MJCF instead of hand-wiring. ([#1699](#1699)) by [@mustafab0](https://github.com/mustafab0)
- Drake loader supports MJCF; configured home pose + EE orientation honored when planning. ([#1722](#1722)) by [@mustafab0](https://github.com/mustafab0)
- Manipulation demo: `look`/`drop_on` skills, distance-adaptive grasps, structured agent prompt. ([#1656](#1656)) by [@mustafab0](https://github.com/mustafab0)
- Control blueprints split into a package; hardcoded IPs replaced with env vars (`XARM7_IP`, …). ([#1601](#1601)) by [@ruthwikdasyam](https://github.com/ruthwikdasyam)
- Unity simulator as a DimOS module (`dimos run unity-sim`); auto-downloads on Linux x86. ([#1539](#1539)) by [@jeff-hykin](https://github.com/jeff-hykin)

### Core & visualizer

- Rust native modules: write performance-critical modules in Rust (LCM transport, `NativeModule` API). ([#1794](#1794)) by [@aclauer](https://github.com/aclauer)
- Watchdog kills all DimOS child processes (and grandchildren) when the parent exits. ([#1886](#1886)) by [@paul-nechifor](https://github.com/paul-nechifor)
- DockerModules restored: parallel deploy, image pull, build args, rebuild on Dockerfile change. ([#1431](#1431)) by [@jeff-hykin](https://github.com/jeff-hykin)
- Voxel maps render as Rerun Points3D spheres (~10× faster at high point counts). ([#1793](#1793)) by [@ruthwikdasyam](https://github.com/ruthwikdasyam)
- Rerun pipeline latency: PointCloud2 ~350 ms → ~5 ms; costmap ~40 ms → ~5 ms. ([#1747](#1747)) by [@ruthwikdasyam](https://github.com/ruthwikdasyam)
- Blueprint module dependency graph auto-rendered as a Graph tab in Rerun. ([#1705](#1705)) by [@ruthwikdasyam](https://github.com/ruthwikdasyam)
- Viewer remote `--connect` works again: clicks/teleop flow over websockets. ([#1784](#1784)) by [@jeff-hykin](https://github.com/jeff-hykin)
- `dtop` shows native child-process CPU; `dtop --log` records metrics to a memory2 SQLite store for offline plotting. ([#1880](#1880), [#2004](#2004)) by [@aclauer](https://github.com/aclauer)

## 🐛 Bug Fixes

- macOS UDP receive buffer probed up to 32 MiB. ([#1789](#1789)) by [@Dreamsorcerer](https://github.com/Dreamsorcerer)
- `--disable` works again; module-by-name lookup fixed. ([#1707](#1707), [#1689](#1689)) by [@paul-nechifor](https://github.com/paul-nechifor)
- `CameraModule.stop()` reachable via RPC; doc imports use canonical paths. ([#1773](#1773)) by [@jeff-hykin](https://github.com/jeff-hykin)
- Python 3.10 compat (`typing_extensions.Self`); skip `pyrealsense2` on macOS; fix `nix develop` LCM build on macOS. ([#1621](#1621), [#1556](#1556), [#1610](#1610)) by [@jeff-hykin](https://github.com/jeff-hykin)
- Rerun grid raised above the costmap; viewer bumped to 0.30.0a6. ([#1714](#1714), [#1690](#1690), [#1785](#1785)) by [@ruthwikdasyam](https://github.com/ruthwikdasyam)
- Go2 lidar timestamps repaired: non-monotonic frames clamped to the expected period; older firmware that never updates timestamps falls back to system time after a calibration window. ([#1992](#1992), [#2021](#2021)) by [@leshy](https://github.com/leshy), [@aclauer](https://github.com/aclauer)
- Replay memory leak fixed. ([#2025](#2025)) by [@leshy](https://github.com/leshy)

## ⚡ Performance

- `dimos --help` ~5 s → ~2 s by trimming heavy imports and deferring `o3dpickle`. ([#1571](#1571), [#1721](#1721)) by [@jeff-hykin](https://github.com/jeff-hykin), [@Dreamsorcerer](https://github.com/Dreamsorcerer)

## 🔒 Security

- MCP/Foxglove/GStreamer bind to localhost by default; set `MCP_HOST=0.0.0.0` to expose. ([#1698](#1698)) by [@vrinek](https://github.com/vrinek)
- Update vulnerable dependencies flagged by Dependabot. ([#1989](#1989)) by [@paul-nechifor](https://github.com/paul-nechifor)

## 👥 New Contributors

- [@aclauer](https://github.com/aclauer) made his first contribution in [#1794](#1794)
- [@vrinek](https://github.com/vrinek) made his first contribution in [#1698](#1698)

**Full Changelog**: v0.0.11...v0.0.12
Problem
Whisper requires downloading a 150MB model and depends on torch (with GBs of CUDA downloads).
Solution
Provide faster-whisper by default (2MB) and use it as a fallback when whisper is not available.
This avoids the 150MB download, and means we are one step closer to not depending on torch for a base install.
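Falling back between two backends means their transcription results have to be reduced to the same plain string. A rough sketch of that normalization, assuming the usual return shapes (openai-whisper's `transcribe` returns a dict with a `"text"` key; faster-whisper's `transcribe` returns a lazy iterable of segments, each with a `.text` attribute, plus an info object). The function names are hypothetical, and joining segments with an empty string is an assumption about how the PR combines them.

```python
from types import SimpleNamespace
from typing import Any, Iterable, Mapping


def text_from_whisper(result: Mapping[str, Any]) -> str:
    """openai-whisper path: the transcript is under the "text" key."""
    return result["text"].strip()


def text_from_faster_whisper(segments: Iterable[Any]) -> str:
    """faster-whisper path: consume the segment iterable and join the texts.

    Iterating the segments is what actually drives decoding, so the
    iterable is consumed exactly once here.
    """
    return "".join(segment.text for segment in segments).strip()


# Stand-in segment objects, used here instead of a real model run.
example_segments = [
    SimpleNamespace(text=" Hello"),
    SimpleNamespace(text=" world."),
]
print(text_from_faster_whisper(example_segments))  # prints "Hello world."
```

Keeping the two paths as separate small functions leaves the caller with a single string regardless of which backend was selected at import time.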
Breaking Changes
Users now need to request `dimos[whisper]` for the full whisper feature.
Test