Voice-driven wrapper around Claude CLI sessions. Speak into your microphone, get responses read back to you.
Microphone → cpal capture → VAD → Whisper STT → Claude CLI (stream-json) → espeak-ng TTS → Speakers
- Nix package manager
- Claude CLI installed and authenticated
- A working microphone and speakers/headphones
# Enter the development shell (pulls all dependencies)
nix-shell
# Download the Whisper speech-to-text model (~148MB, one-time)
./setup.sh
# Build and run
cargo run --release

Jarvis will initialize four components in sequence:
- Whisper model — loads `models/ggml-base.en.bin` into memory
- Claude session — spawns a persistent `claude -p --stream-json` process
- Microphone — opens your default input device via ALSA
- Voice detection — starts listening for speech
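For orientation, that start-up sequence looks roughly like the sketch below. This is not a copy of src/main.rs: the whisper-rs call and the claude invocation are assumptions based on the components listed above, and error handling is minimal.

```rust
use cpal::traits::HostTrait;
use std::process::{Command, Stdio};
use whisper_rs::{WhisperContext, WhisperContextParameters};

fn init() -> Result<(), Box<dyn std::error::Error>> {
    // 1. Whisper model: load models/ggml-base.en.bin into memory.
    //    (API as in recent whisper-rs releases; older versions use WhisperContext::new.)
    let _whisper = WhisperContext::new_with_params(
        "models/ggml-base.en.bin",
        WhisperContextParameters::default(),
    )?;

    // 2. Claude session: spawn a persistent `claude -p --stream-json` process with
    //    piped stdin/stdout so prompts can be streamed in and JSON read back.
    let _claude = Command::new("claude")
        .args(["-p", "--stream-json"])
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;

    // 3. Microphone: open the default input device (ALSA on Linux) via cpal.
    let _mic = cpal::default_host()
        .default_input_device()
        .ok_or("no default input device")?;

    // 4. Voice detection: construct the VAD with the thresholds shown further down.

    Ok(())
}
```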
Once you see `Ready! Speak into your microphone.`, just talk. Jarvis will:
- Detect when you start and stop speaking (energy-based VAD)
- Transcribe your speech with Whisper
- Send the text to Claude
- Speak Claude's response through your speakers with espeak-ng
Press Ctrl+C to quit.
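Conceptually the whole loop is four calls per utterance. The sketch below shows that control flow only; the closures stand in for the real functions in src/audio.rs, src/stt.rs, src/claude.rs, and src/tts.rs, whose actual signatures may differ.

```rust
type Res<T> = Result<T, Box<dyn std::error::Error>>;

/// Illustrative control flow; the four closures are stand-ins for the real
/// audio-capture, Whisper, Claude, and espeak-ng modules under src/.
fn run(
    mut record_utterance: impl FnMut() -> Res<Vec<f32>>,
    mut transcribe: impl FnMut(&[f32]) -> Res<String>,
    mut ask_claude: impl FnMut(&str) -> Res<String>,
    mut speak: impl FnMut(&str) -> Res<()>,
) -> Res<()> {
    loop {
        // Block until the VAD reports a complete utterance (speech, then silence).
        let samples = record_utterance()?;

        // Transcribe locally with Whisper; skip empty or noise-only results.
        let text = transcribe(&samples)?;
        if text.trim().is_empty() {
            continue;
        }

        // Send the transcript to the persistent Claude session and wait for the reply.
        let reply = ask_claude(&text)?;

        // Read the reply aloud through the speakers.
        speak(&reply)?;
    }
}
```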
The VAD thresholds are set in src/main.rs:
vad::Vad::new(
0.015, // RMS energy threshold (lower = more sensitive)
700, // Silence duration (ms) to end an utterance
300, // Minimum speech duration (ms) to count as valid
);

If Jarvis isn't picking up your voice, lower the threshold. If it triggers on background noise, raise it.
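For a rough mental model of what those three numbers control, an energy-based VAD boils down to a per-frame RMS check plus two timers. A simplified sketch (the real src/vad.rs may be organized differently):

```rust
/// Simplified energy-based VAD showing how the three tuning values are used;
/// the real src/vad.rs may be structured differently.
struct Vad {
    threshold: f32,     // RMS energy above which a frame counts as speech
    silence_ms: u32,    // trailing silence that ends an utterance
    min_speech_ms: u32, // utterances shorter than this are discarded as noise
}

impl Vad {
    fn new(threshold: f32, silence_ms: u32, min_speech_ms: u32) -> Self {
        Self { threshold, silence_ms, min_speech_ms }
    }

    /// Root-mean-square energy of one frame of f32 samples.
    fn rms(frame: &[f32]) -> f32 {
        let sum_sq: f32 = frame.iter().map(|s| s * s).sum();
        (sum_sq / frame.len().max(1) as f32).sqrt()
    }

    /// A frame is treated as speech if its energy clears the threshold.
    fn is_speech(&self, frame: &[f32]) -> bool {
        Self::rms(frame) > self.threshold
    }
}
```

An utterance ends once `silence_ms` of consecutive quiet frames follow speech, and it is only forwarded to Whisper if the speech portion lasted at least `min_speech_ms`.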
├── shell.nix # Nix dev environment
├── Cargo.toml # Rust dependencies
├── setup.sh # Whisper model downloader
├── src/
│ ├── main.rs # Event loop
│ ├── audio.rs # Microphone capture (cpal/ALSA)
│ ├── vad.rs # Voice activity detection
│ ├── stt.rs # Speech-to-text (whisper-rs)
│ ├── tts.rs # Text-to-speech (espeak-ng)
│ └── claude.rs # Claude CLI integration (stream-json)
├── examples/
│ ├── test_claude.rs # Standalone Claude integration test
│ └── test_pipeline.rs # Full pipeline test without microphone
└── tests/
└── integration_test.rs # Whisper transcription accuracy test
nix-shell
# Unit/integration tests (requires the Whisper model)
cargo test
# Test just the Claude integration
cargo run --release --example test_claude
# Test the full pipeline (espeak-ng → Whisper → Claude → espeak-ng)
cargo run --release --example test_pipeline

Managed by shell.nix — no manual installation needed:
| Component | Crate / Tool | Purpose |
|---|---|---|
| Audio capture | cpal | Cross-platform mic input via ALSA |
| Speech-to-text | whisper-rs | Local Whisper model (whisper.cpp bindings) |
| Text-to-speech | espeak-ng | Offline speech synthesis |
| Claude interface | claude CLI | Multi-turn conversations via stream-json |
| Build tooling | cmake, gcc, libclang | Compiling whisper.cpp and bindgen |
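The stream-json mode means the Claude interface is just line-oriented JSON read from the child process's stdout. A sketch of that parsing with serde_json is below; the field name it inspects is an assumption, since the exact event schema depends on the CLI version.

```rust
use std::io::{BufRead, BufReader};
use std::process::ChildStdout;

/// Read newline-delimited JSON events from the Claude CLI's stdout.
/// The "result" field checked here is an assumption about the stream-json
/// schema; adapt it to whatever events your CLI version actually emits.
fn read_stream_json(stdout: ChildStdout) -> std::io::Result<()> {
    for line in BufReader::new(stdout).lines() {
        let line = line?;
        let event: serde_json::Value = match serde_json::from_str(&line) {
            Ok(v) => v,
            Err(_) => continue, // ignore anything that is not valid JSON
        };
        if let Some(text) = event.get("result").and_then(|v| v.as_str()) {
            println!("Claude: {text}");
        }
    }
    Ok(())
}
```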
You are a team of senior software engineers who use Rust to build things.
Your goal is to build a system that wraps claude console sessions and allows the human to communicate with the session through their microphone. You use voice processing to convert the sound into text. You then speak back to the human through the speakers.
Research this as much as you need, and attempt to build a functional project with no input from the user.
You may use nix-shell to get access to tools and libraries if they are not available. Ideally, write a shell.nix file to load and build the project.
Work until you have tested the product and have a high level of confidence it functions.
If you can, create a wrapper around interactive mode, similar to happy.engineering.