ASR Benchmark Suite comparing NVIDIA Nemotron against Whisper and Deepgram in real-world voice pipelines.
Open in Google Colab - Run ASR benchmarks directly in your browser!
| Model | Latency | vs Whisper | Type | Hardware |
|---|---|---|---|---|
| Nemotron 600M | 43ms | 21x faster | Local GPU | L40S |
| Deepgram Nova-2 | 272ms | 3.4x faster | Cloud API | - |
| Whisper medium | 916ms | baseline | Local | M-series |
- Nemotron is 21x faster than Whisper on GPU hardware
- Nemotron is 6x faster than Deepgram cloud API (and runs locally)
- Cold start adds ~380ms (first inference), warm runs are ~43ms
- L40S GPU ($0.86/hr on Brev) is sufficient for Nemotron 600M
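The cold-vs-warm distinction above matters when timing any engine: the first call pays for model load and kernel warm-up. A minimal sketch of separating the two (the `transcribe` function here is an illustrative stand-in, not the project's API):

```python
import time

def transcribe(audio: bytes) -> str:
    """Stand-in for an ASR engine call; real engines load weights on first use."""
    time.sleep(0.001)  # simulate inference work
    return "hello world"

def time_runs(audio: bytes, n_warm: int = 5) -> tuple[float, float]:
    # First call includes one-time setup cost: the "cold start".
    t0 = time.perf_counter()
    transcribe(audio)
    cold_ms = (time.perf_counter() - t0) * 1000

    # Subsequent calls measure steady-state latency.
    warm = []
    for _ in range(n_warm):
        t0 = time.perf_counter()
        transcribe(audio)
        warm.append((time.perf_counter() - t0) * 1000)
    return cold_ms, sum(warm) / len(warm)

cold, warm = time_runs(b"fake-audio")
```

Reporting warm latency (as the table does) reflects steady-state serving; cold start only hits the first request after deploy.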
```
┌─────────────────────────────────────────────────────────────────────────────┐
│                           RealtimeVoice Pipeline                            │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌──────────┐    ┌──────────────┐    ┌──────────┐    ┌──────────┐           │
│  │          │    │              │    │          │    │          │           │
│  │  INPUT   │───▶│  ASR ENGINE  │───▶│   LLM    │───▶│   TTS    │           │
│  │  Audio   │    │  (Pluggable) │    │  (NIM)   │    │  Output  │           │
│  │          │    │              │    │          │    │          │           │
│  └──────────┘    └──────────────┘    └──────────┘    └──────────┘           │
│                          │                                                  │
│          ┌───────────────┼───────────────┐                                  │
│          ▼               ▼               ▼                                  │
│   ┌────────────┐  ┌────────────┐  ┌────────────┐                            │
│   │  Nemotron  │  │  Whisper   │  │  Deepgram  │                            │
│   │   43ms     │  │   916ms    │  │   272ms    │                            │
│   │   (GPU)    │  │  (Local)   │  │  (Cloud)   │                            │
│   └────────────┘  └────────────┘  └────────────┘                            │
│                                                                             │
├─────────────────────────────────────────────────────────────────────────────┤
│                             BENCHMARK HARNESS                               │
│  ┌─────────────────────────────────────────────────────────────────────┐    │
│  │  Latency (ms)  │  RTF (Real-Time Factor)  │  WER (Word Error Rate)  │    │
│  └─────────────────────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────────────────────┘
```
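The diagram above boils down to a linear composition of three stages. A minimal sketch of that wiring, with illustrative stand-ins (names and signatures are assumptions, not the project's actual API):

```python
from dataclasses import dataclass
from typing import Callable

# Illustrative stage types; the real project wires these to
# Nemotron/Whisper/Deepgram, NVIDIA NIM, and Piper respectively.
ASR = Callable[[bytes], str]   # audio -> transcript
LLM = Callable[[str], str]     # transcript -> reply text
TTS = Callable[[str], bytes]   # reply text -> audio

@dataclass
class VoicePipeline:
    asr: ASR
    llm: LLM
    tts: TTS

    def run(self, audio: bytes) -> bytes:
        transcript = self.asr(audio)
        reply = self.llm(transcript)
        return self.tts(reply)

# Toy usage: each stage is a lambda so the flow is visible end to end.
pipe = VoicePipeline(
    asr=lambda a: "hello",
    llm=lambda t: f"you said: {t}",
    tts=lambda t: t.encode(),
)
out = pipe.run(b"\x00\x01")
```

Because the ASR stage is just a callable, swapping Nemotron for Whisper or Deepgram never touches the LLM or TTS code.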
```bash
git clone https://github.com/QbitLoop/RealtimeVoice.git
cd RealtimeVoice
python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt

# Configure API keys
cp .env.example .env
# Edit .env with your NVIDIA_API_KEY and DEEPGRAM_API_KEY

# Run comparison
python scripts/quick_compare.py
```

See docs/BREV_QUICKSTART.md for detailed instructions.
```bash
# Install Brev CLI
brew install brevdev/homebrew-brev/brev
brev login

# Create GPU instance
brev create nemotron-asr --gpu L40S

# SSH and setup
brev shell nemotron-asr
```

| Model | Implementation | Requirements | Latency |
|---|---|---|---|
| Nemotron 600M | src/asr/nemotron.py | NVIDIA GPU + NeMo | 43ms |
| Whisper medium | src/asr/whisper.py | mlx-whisper or faster-whisper | 916ms |
| Deepgram Nova-2 | src/asr/deepgram.py | API key ($200 free credit) | 272ms |
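All three engines implement one common interface, which is what makes them swappable. A minimal sketch of that pattern (the class and method names here are assumptions for illustration, not the actual contents of src/asr/base.py):

```python
from abc import ABC, abstractmethod

class ASREngine(ABC):
    """Hypothetical pluggable ASR interface; each backend subclasses it."""

    @abstractmethod
    def transcribe(self, audio: bytes, sample_rate: int = 16000) -> str:
        """Return a transcript for raw PCM audio."""

class EchoASR(ASREngine):
    # Trivial backend used only to show the swap-in pattern.
    def transcribe(self, audio: bytes, sample_rate: int = 16000) -> str:
        return f"{len(audio)} bytes at {sample_rate} Hz"

# Callers depend only on the abstract type, never on a concrete engine.
engine: ASREngine = EchoASR()
text = engine.transcribe(b"\x00" * 4)
```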
```
RealtimeVoice/
├── src/
│   ├── main.py              # Voice agent entry point
│   ├── config.py            # Configuration
│   ├── asr/                 # ASR implementations
│   │   ├── base.py          # Abstract interface
│   │   ├── nemotron.py      # NVIDIA Nemotron
│   │   ├── whisper.py       # OpenAI Whisper
│   │   └── deepgram.py      # Deepgram API
│   ├── llm/
│   │   └── nim.py           # NVIDIA NIM integration
│   ├── tts/
│   │   └── piper.py         # Piper TTS
│   └── benchmark/
│       ├── harness.py       # Benchmark runner
│       └── metrics.py       # Latency, WER, RTF
├── scripts/
│   ├── quick_compare.py     # Fast ASR comparison
│   ├── run_benchmark.py     # Full benchmark suite
│   └── brev_setup.sh        # GPU instance setup
├── docs/
│   └── BREV_QUICKSTART.md   # Step-by-step GPU guide
├── tests/
│   └── audio_samples/       # Test audio files
└── results/                 # Benchmark outputs
```
```bash
# Interactive voice agent
python src/main.py --asr whisper --tts simple

# Process audio file
python src/main.py --file tests/audio_samples/test1.wav

# Quick ASR comparison
python scripts/quick_compare.py

# Full benchmark suite
python scripts/run_benchmark.py --models whisper,deepgram,nemotron
```

| Service | URL | Purpose |
|---|---|---|
| NVIDIA NIM | build.nvidia.com | LLM inference |
| Deepgram | console.deepgram.com | Cloud ASR |
| HuggingFace | huggingface.co | Model downloads |
| Metric | Description |
|---|---|
| Latency (ms) | Time from audio input to transcript output |
| RTF | Real-Time Factor: processing time / audio duration |
| WER | Word Error Rate: transcription accuracy |
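Both derived metrics are simple to compute. A minimal sketch, using a plain word-level Levenshtein distance for WER (libraries like jiwer offer the same calculation with normalization built in):

```python
def rtf(processing_s: float, audio_s: float) -> float:
    # Real-Time Factor: below 1.0 means faster than real time.
    return processing_s / audio_s

def wer(reference: str, hypothesis: str) -> float:
    # Word Error Rate: word-level edit distance / reference length.
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# Example: 43 ms of processing on a 5 s clip is deep into real-time territory.
nemotron_rtf = rtf(0.043, 5.0)
```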
| Project | Description |
|---|---|
| NemotronVoiceRAG | Voice-enabled RAG with pluggable backends |
| nvidia-nim-rag-demo | Production RAG with NVIDIA NIM |
| ai-infra-advisor | TCO Calculator for GPU infrastructure |
MIT License
Built by QbitLoop | Brand-WHFT Design System