A Claude Voice Wrapper it Wrote Itself

boj/jarvis

Jarvis

Voice-driven wrapper around Claude CLI sessions. Speak into your microphone, get responses read back to you.

Microphone → cpal capture → VAD → Whisper STT → Claude CLI (stream-json) → espeak-ng TTS → Speakers

Prerequisites

Quick Start

# Enter the development shell (pulls all dependencies)
nix-shell

# Download the Whisper speech-to-text model (~148MB, one-time)
./setup.sh

# Build and run
cargo run --release

Jarvis will initialize four components in sequence:

  1. Whisper model — loads models/ggml-base.en.bin into memory
  2. Claude session — spawns a persistent claude -p --stream-json process
  3. Microphone — opens your default input device via ALSA
  4. Voice detection — starts listening for speech

Once you see Ready! Speak into your microphone., just talk. Jarvis will:

  • Detect when you start and stop speaking (energy-based VAD)
  • Transcribe your speech with Whisper
  • Send the text to Claude
  • Speak Claude's response through your speakers with espeak-ng

Press Ctrl+C to quit.
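The final step of the loop above, speaking a response with espeak-ng, could be sketched as a subprocess call. This is a minimal sketch and not the contents of src/tts.rs: it assumes TTS is done by launching the espeak-ng binary, and the names espeak_command and speak are hypothetical. The flags used (-s for words per minute, -v for the voice, --stdin to read text from standard input) are standard espeak-ng options.

```rust
use std::io::{self, Write};
use std::process::{Command, Stdio};

/// Build (but do not run) an `espeak-ng` invocation that reads its text
/// from stdin. Hypothetical helper, not taken from the project.
pub fn espeak_command(words_per_minute: u32) -> Command {
    let mut cmd = Command::new("espeak-ng");
    cmd.arg("-s")
        .arg(words_per_minute.to_string()) // speaking rate
        .arg("-v")
        .arg("en") // voice name
        .arg("--stdin") // read the text to speak from standard input
        .stdin(Stdio::piped());
    cmd
}

/// Speak `text` and block until espeak-ng has finished playing it.
pub fn speak(text: &str) -> io::Result<()> {
    let mut child = espeak_command(175).spawn()?;
    {
        // Scope the handle so the pipe is closed before we wait;
        // espeak-ng only starts speaking once it sees end of input.
        let mut stdin = child.stdin.take().expect("stdin was configured as piped");
        stdin.write_all(text.as_bytes())?;
    }
    child.wait()?;
    Ok(())
}
```

Passing the text over stdin rather than as a command-line argument avoids a response that begins with a dash being parsed as an option. Calling speak("Hello from Jarvis") requires espeak-ng on the PATH, which the nix-shell environment is described as providing.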

Configuration

The VAD thresholds are set in src/main.rs:

vad::Vad::new(
    0.015, // RMS energy threshold (lower = more sensitive)
    700,   // Silence duration (ms) to end an utterance
    300,   // Minimum speech duration (ms) to count as valid
);

If Jarvis isn't picking up your voice, lower the threshold. If it triggers on background noise, raise it.
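To make the three parameters concrete, here is a minimal sketch of how an energy-based detector might use them. It is not the project's src/vad.rs: the EnergyVad type, the VadEvent enum, the push_frame method, and the explicit sample-rate argument are all assumptions made for illustration. Frames are assumed to be mono f32 samples in the range -1.0 to 1.0.

```rust
/// What the detector reports after each frame (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum VadEvent {
    /// Nothing to report yet.
    None,
    /// A finished utterance: the speech plus its trailing silence.
    Utterance(Vec<f32>),
}

/// Root-mean-square energy of one frame of samples.
pub fn rms(frame: &[f32]) -> f32 {
    if frame.is_empty() {
        return 0.0;
    }
    let sum_sq: f32 = frame.iter().map(|s| s * s).sum();
    (sum_sq / frame.len() as f32).sqrt()
}

/// Hypothetical energy-based detector; not the project's `vad::Vad`.
pub struct EnergyVad {
    threshold: f32,     // RMS level at or above which a frame counts as speech
    silence_ms: u32,    // trailing silence that ends an utterance
    min_speech_ms: u32, // utterances with less speech than this are dropped
    sample_rate: u32,   // samples per second (assumed parameter)
    buffer: Vec<f32>,
    speech_ms: u32,
    trailing_silence_ms: u32,
    in_speech: bool,
}

impl EnergyVad {
    pub fn new(threshold: f32, silence_ms: u32, min_speech_ms: u32, sample_rate: u32) -> Self {
        Self {
            threshold,
            silence_ms,
            min_speech_ms,
            sample_rate,
            buffer: Vec::new(),
            speech_ms: 0,
            trailing_silence_ms: 0,
            in_speech: false,
        }
    }

    /// Feed one frame of audio and learn whether an utterance just ended.
    pub fn push_frame(&mut self, frame: &[f32]) -> VadEvent {
        let frame_ms = (frame.len() as u64 * 1000 / self.sample_rate as u64) as u32;
        if rms(frame) >= self.threshold {
            // Loud frame: start or continue an utterance.
            self.in_speech = true;
            self.trailing_silence_ms = 0;
            self.speech_ms += frame_ms;
            self.buffer.extend_from_slice(frame);
            return VadEvent::None;
        }
        if !self.in_speech {
            return VadEvent::None; // idle silence, nothing buffered
        }
        // Quiet frame inside an utterance: keep it, since it may be a pause.
        self.buffer.extend_from_slice(frame);
        self.trailing_silence_ms += frame_ms;
        if self.trailing_silence_ms < self.silence_ms {
            return VadEvent::None;
        }
        // Enough silence: close the utterance and reset all state.
        let audio = std::mem::take(&mut self.buffer);
        let long_enough = self.speech_ms >= self.min_speech_ms;
        self.speech_ms = 0;
        self.trailing_silence_ms = 0;
        self.in_speech = false;
        if long_enough { VadEvent::Utterance(audio) } else { VadEvent::None }
    }
}
```

With EnergyVad::new(0.015, 700, 300, 16_000) and 100 ms frames, four loud frames followed by seven silent ones produce one Utterance, while two loud frames followed by the same silence are discarded as too short. Lowering the first argument makes quieter frames count as speech, which is the sensitivity trade-off described above.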

Project Structure

├── shell.nix                 # Nix dev environment
├── Cargo.toml                # Rust dependencies
├── setup.sh                  # Whisper model downloader
├── src/
│   ├── main.rs               # Event loop
│   ├── audio.rs              # Microphone capture (cpal/ALSA)
│   ├── vad.rs                # Voice activity detection
│   ├── stt.rs                # Speech-to-text (whisper-rs)
│   ├── tts.rs                # Text-to-speech (espeak-ng)
│   └── claude.rs             # Claude CLI integration (stream-json)
├── examples/
│   ├── test_claude.rs        # Standalone Claude integration test
│   └── test_pipeline.rs      # Full pipeline test without microphone
└── tests/
    └── integration_test.rs   # Whisper transcription accuracy test

Running Tests

nix-shell

# Unit/integration tests (requires the Whisper model)
cargo test

# Test just the Claude integration
cargo run --release --example test_claude

# Test the full pipeline (espeak-ng → Whisper → Claude → espeak-ng)
cargo run --release --example test_pipeline

Dependencies

Managed by shell.nix — no manual installation needed:

| Component        | Crate / Tool         | Purpose                                    |
|------------------|----------------------|--------------------------------------------|
| Audio capture    | cpal                 | Cross-platform mic input via ALSA          |
| Speech-to-text   | whisper-rs           | Local Whisper model (whisper.cpp bindings) |
| Text-to-speech   | espeak-ng            | Offline speech synthesis                   |
| Claude interface | claude CLI           | Multi-turn conversations via stream-json   |
| Build tooling    | cmake, gcc, libclang | Compiling whisper.cpp and bindgen          |

Original Prompt

You are a team of senior software engineers who use Rust to build things.

Your goal is to build a system that wraps claude console sessions and allows the human to communicate with the session through their microphone. You use voice processing to convert the sound into text. You then speak back to the human through the speakers.

Research this as much as you need, and attempt to build a functional project with no input from the user.

You may use nix-shell to get access to tools and libraries if they are not available. Ideally, write a shell.nix file to load and build the project.

Work until you have tested the product and have a high level of confidence it functions.

If you can, create a wrapper around interactive mode, similar to happy.engineering
