ASR Benchmark Suite comparing NVIDIA Nemotron against Whisper and Deepgram in real-world voice pipelines.
Open in Google Colab - Run ASR benchmarks directly in your browser!
| Model | Latency | vs Whisper | Type | Hardware |
|---|---|---|---|---|
| Nemotron 600M | 43ms | 21x faster | Local GPU | L40S |
| Deepgram Nova-2 | 272ms | 3.4x faster | Cloud API | - |
| Whisper medium | 916ms | baseline | Local | M-series |
- Nemotron is 21x faster than Whisper on GPU hardware
- Nemotron is 6x faster than Deepgram cloud API (and runs locally)
- Cold start adds ~380ms (first inference), warm runs are ~43ms
- L40S GPU ($0.86/hr on Brev) is sufficient for Nemotron 600M
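The cold-vs-warm distinction above matters when timing any engine: the first call pays for model load and kernel warm-up. A minimal sketch of separating the two (the `transcribe` function here is an illustrative stand-in, not the project's API):

```python
import time

def transcribe(audio: bytes) -> str:
    """Stand-in for an ASR engine call; real engines load weights on first use."""
    time.sleep(0.001)  # simulate inference work
    return "hello world"

def time_runs(audio: bytes, n_warm: int = 5) -> tuple[float, float]:
    # First call includes one-time setup cost: the "cold start".
    t0 = time.perf_counter()
    transcribe(audio)
    cold_ms = (time.perf_counter() - t0) * 1000

    # Subsequent calls measure steady-state latency.
    warm = []
    for _ in range(n_warm):
        t0 = time.perf_counter()
        transcribe(audio)
        warm.append((time.perf_counter() - t0) * 1000)
    return cold_ms, sum(warm) / len(warm)

cold, warm = time_runs(b"fake-audio")
```

Reporting warm latency (as the table does) reflects steady-state serving; cold start only hits the first request after deploy.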
```
┌─────────────────────────────────────────────────────────────────────────────┐
│                           RealtimeVoice Pipeline                            │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌──────────┐    ┌──────────────┐    ┌──────────┐    ┌──────────┐           │
│  │          │    │              │    │          │    │          │           │
│  │  INPUT   │───▶│  ASR ENGINE  │───▶│   LLM    │───▶│   TTS    │           │
│  │  Audio   │    │  (Pluggable) │    │  (NIM)   │    │  Output  │           │
│  │          │    │              │    │          │    │          │           │
│  └──────────┘    └──────────────┘    └──────────┘    └──────────┘           │
│                          │                                                  │
│          ┌───────────────┼───────────────┐                                  │
│          ▼               ▼               ▼                                  │
│   ┌────────────┐  ┌────────────┐  ┌────────────┐                            │
│   │  Nemotron  │  │  Whisper   │  │  Deepgram  │                            │
│   │   43ms     │  │   916ms    │  │   272ms    │                            │
│   │   (GPU)    │  │  (Local)   │  │  (Cloud)   │                            │
│   └────────────┘  └────────────┘  └────────────┘                            │
│                                                                             │
├─────────────────────────────────────────────────────────────────────────────┤
│                             BENCHMARK HARNESS                               │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │  Latency (ms)  │  RTF (Real-Time Factor)  │  WER (Word Error Rate)  │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────────────────────┘
```
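The diagram above boils down to a linear composition of three stages. A minimal sketch of that wiring, with illustrative stand-ins (names and signatures are assumptions, not the project's actual API):

```python
from dataclasses import dataclass
from typing import Callable

# Illustrative stage types; the real project wires these to
# Nemotron/Whisper/Deepgram, NVIDIA NIM, and Piper respectively.
ASR = Callable[[bytes], str]   # audio -> transcript
LLM = Callable[[str], str]     # transcript -> reply text
TTS = Callable[[str], bytes]   # reply text -> audio

@dataclass
class VoicePipeline:
    asr: ASR
    llm: LLM
    tts: TTS

    def run(self, audio: bytes) -> bytes:
        transcript = self.asr(audio)
        reply = self.llm(transcript)
        return self.tts(reply)

# Toy usage: each stage is a lambda so the flow is visible end to end.
pipe = VoicePipeline(
    asr=lambda a: "hello",
    llm=lambda t: f"you said: {t}",
    tts=lambda t: t.encode(),
)
out = pipe.run(b"\x00\x01")
```

Because the ASR stage is just a callable, swapping Nemotron for Whisper or Deepgram never touches the LLM or TTS code.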
```bash
git clone https://github.com/QbitLoop/RealtimeVoice.git
cd RealtimeVoice
python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt

# Configure API keys
cp .env.example .env
# Edit .env with your NVIDIA_API_KEY and DEEPGRAM_API_KEY

# Run comparison
python scripts/quick_compare.py
```

See docs/BREV_QUICKSTART.md for detailed instructions.
```bash
# Install Brev CLI
brew install brevdev/homebrew-brev/brev
brev login

# Create GPU instance
brev create nemotron-asr --gpu L40S

# SSH and setup
brev shell nemotron-asr
```

| Model | Implementation | Requirements | Latency |
|---|---|---|---|
| Nemotron 600M | src/asr/nemotron.py | NVIDIA GPU + NeMo | 43ms |
| Whisper medium | src/asr/whisper.py | mlx-whisper or faster-whisper | 916ms |
| Deepgram Nova-2 | src/asr/deepgram.py | API key ($200 free credit) | 272ms |
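All three engines implement one common interface, which is what makes them swappable. A minimal sketch of that pattern (the class and method names here are assumptions for illustration, not the actual contents of src/asr/base.py):

```python
from abc import ABC, abstractmethod

class ASREngine(ABC):
    """Hypothetical pluggable ASR interface; each backend subclasses it."""

    @abstractmethod
    def transcribe(self, audio: bytes, sample_rate: int = 16000) -> str:
        """Return a transcript for raw PCM audio."""

class EchoASR(ASREngine):
    # Trivial backend used only to show the swap-in pattern.
    def transcribe(self, audio: bytes, sample_rate: int = 16000) -> str:
        return f"{len(audio)} bytes at {sample_rate} Hz"

# Callers depend only on the abstract type, never on a concrete engine.
engine: ASREngine = EchoASR()
text = engine.transcribe(b"\x00" * 4)
```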
```
RealtimeVoice/
├── src/
│   ├── main.py              # Voice agent entry point
│   ├── config.py            # Configuration
│   ├── asr/                 # ASR implementations
│   │   ├── base.py          # Abstract interface
│   │   ├── nemotron.py      # NVIDIA Nemotron
│   │   ├── whisper.py       # OpenAI Whisper
│   │   └── deepgram.py      # Deepgram API
│   ├── llm/
│   │   └── nim.py           # NVIDIA NIM integration
│   ├── tts/
│   │   └── piper.py         # Piper TTS
│   └── benchmark/
│       ├── harness.py       # Benchmark runner
│       └── metrics.py       # Latency, WER, RTF
├── scripts/
│   ├── quick_compare.py     # Fast ASR comparison
│   ├── run_benchmark.py     # Full benchmark suite
│   └── brev_setup.sh        # GPU instance setup
├── docs/
│   └── BREV_QUICKSTART.md   # Step-by-step GPU guide
├── tests/
│   └── audio_samples/       # Test audio files
└── results/                 # Benchmark outputs
```
```bash
# Interactive voice agent
python src/main.py --asr whisper --tts simple

# Process audio file
python src/main.py --file tests/audio_samples/test1.wav

# Quick ASR comparison
python scripts/quick_compare.py

# Full benchmark suite
python scripts/run_benchmark.py --models whisper,deepgram,nemotron
```

| Service | URL | Purpose |
|---|---|---|
| NVIDIA NIM | build.nvidia.com | LLM inference |
| Deepgram | console.deepgram.com | Cloud ASR |
| HuggingFace | huggingface.co | Model downloads |
| Metric | Description |
|---|---|
| Latency (ms) | Time from audio input to transcript output |
| RTF | Real-Time Factor: processing time / audio duration |
| WER | Word Error Rate: transcription accuracy |
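Both derived metrics are simple to compute. A minimal sketch, using a plain word-level Levenshtein distance for WER (libraries like jiwer offer the same calculation with normalization built in):

```python
def rtf(processing_s: float, audio_s: float) -> float:
    # Real-Time Factor: below 1.0 means faster than real time.
    return processing_s / audio_s

def wer(reference: str, hypothesis: str) -> float:
    # Word Error Rate: word-level edit distance / reference length.
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# Example: 43 ms of processing on a 5 s clip is deep into real-time territory.
nemotron_rtf = rtf(0.043, 5.0)
```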
| Project | Description |
|---|---|
| NemotronVoiceRAG | Voice-enabled RAG with pluggable backends |
| nvidia-nim-rag-demo | Production RAG with NVIDIA NIM |
| ai-infra-advisor | TCO Calculator for GPU infrastructure |
MIT License
Built by QbitLoop | Brand-WHFT Design System