
RealtimeVoice


An ASR benchmark suite comparing NVIDIA Nemotron against OpenAI Whisper and Deepgram in real-world voice pipelines.

Try It Now

Open in Google Colab - Run ASR benchmarks directly in your browser!


Benchmark Results

| Model | Latency | vs Whisper | Type | Hardware |
|---|---|---|---|---|
| Nemotron 600M | 43 ms | 21x faster | Local GPU | L40S |
| Deepgram Nova-2 | 272 ms | 3.4x faster | Cloud API | - |
| Whisper medium | 916 ms | baseline | Local | M-series |

Key Findings

  • Nemotron is 21x faster than Whisper on GPU hardware
  • Nemotron is 6x faster than Deepgram cloud API (and runs locally)
  • Cold start adds ~380ms on the first inference; warm runs take ~43ms (see the timing sketch below)
  • L40S GPU ($0.86/hr on Brev) is sufficient for Nemotron 600M
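
The cold/warm split matters when reading these numbers: the first inference pays for model load and warm-up, while steady-state requests do not. A minimal sketch of how a harness can separate the two, assuming a hypothetical `transcribe(audio_path)` callable for whichever backend is under test:

```python
import time

def time_transcription(transcribe, audio_path, warm_runs=5):
    """Separate cold-start latency from warm, steady-state latency.

    `transcribe` is a hypothetical callable (audio_path -> transcript);
    plug in whichever ASR backend is being measured.
    """
    # Cold start: the first call includes model load / graph warm-up.
    start = time.perf_counter()
    transcribe(audio_path)
    cold_ms = (time.perf_counter() - start) * 1000

    # Warm runs: average repeated calls once the model is resident.
    warm = []
    for _ in range(warm_runs):
        start = time.perf_counter()
        transcribe(audio_path)
        warm.append((time.perf_counter() - start) * 1000)

    return cold_ms, sum(warm) / len(warm)
```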

Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                         RealtimeVoice Pipeline                              │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌──────────┐    ┌──────────────┐    ┌──────────┐    ┌──────────┐          │
│  │          │    │              │    │          │    │          │          │
│  │  INPUT   │───▶│  ASR ENGINE  │───▶│   LLM    │───▶│   TTS    │          │
│  │  Audio   │    │  (Pluggable) │    │  (NIM)   │    │  Output  │          │
│  │          │    │              │    │          │    │          │          │
│  └──────────┘    └──────────────┘    └──────────┘    └──────────┘          │
│                         │                                                   │
│         ┌───────────────┼───────────────┐                                  │
│         ▼               ▼               ▼                                  │
│  ┌────────────┐  ┌────────────┐  ┌────────────┐                           │
│  │  Nemotron  │  │  Whisper   │  │  Deepgram  │                           │
│  │   43ms     │  │   916ms    │  │   272ms    │                           │
│  │  (GPU)     │  │  (Local)   │  │  (Cloud)   │                           │
│  └────────────┘  └────────────┘  └────────────┘                           │
│                                                                             │
├─────────────────────────────────────────────────────────────────────────────┤
│  BENCHMARK HARNESS                                                          │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │  Latency (ms)  │  RTF (Real-Time Factor)  │  WER (Word Error Rate) │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────────────────┘
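
The "Pluggable" ASR stage implies a common interface behind the three backends so they can be swapped without touching the rest of the pipeline. A minimal sketch of what `src/asr/base.py` might expose (illustrative only; the actual abstract class may differ):

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class TranscriptionResult:
    text: str          # final transcript
    latency_ms: float  # wall-clock time for this request

class ASREngine(ABC):
    """Shared interface so Nemotron, Whisper, and Deepgram are interchangeable."""

    @abstractmethod
    def transcribe(self, audio_path: str) -> TranscriptionResult:
        """Transcribe a single audio file and report latency."""
        ...
```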

Quick Start

Local Setup (Mac/Linux - No GPU)

```bash
git clone https://github.com/QbitLoop/RealtimeVoice.git
cd RealtimeVoice
python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt

# Configure API keys
cp .env.example .env
# Edit .env with your NVIDIA_API_KEY and DEEPGRAM_API_KEY

# Run comparison
python scripts/quick_compare.py
```

GPU Setup (Brev Cloud)

See docs/BREV_QUICKSTART.md for detailed instructions.

```bash
# Install Brev CLI
brew install brevdev/homebrew-brev/brev
brev login

# Create GPU instance
brev create nemotron-asr --gpu L40S

# SSH and setup
brev shell nemotron-asr
```
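
Once inside the instance, a quick sanity check (not part of the repo) confirms the L40S is visible before running the Nemotron benchmark; this assumes PyTorch is installed, which NeMo pulls in:

```python
import torch

# Expect True and a device name containing "L40S" on a correctly provisioned instance.
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```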

ASR Models

| Model | Implementation | Requirements | Latency |
|---|---|---|---|
| Nemotron 600M | `src/asr/nemotron.py` | NVIDIA GPU + NeMo | 43 ms |
| Whisper medium | `src/asr/whisper.py` | mlx-whisper or faster-whisper | 916 ms |
| Deepgram Nova-2 | `src/asr/deepgram.py` | API key ($200 free credit) | 272 ms |
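
As an illustration of the local Whisper path, a minimal transcription with faster-whisper (one of the backends listed above); the repo's `src/asr/whisper.py` presumably wraps something similar, but its exact API is not shown here:

```python
from faster_whisper import WhisperModel

# Load the medium model on CPU; on Apple Silicon the repo also lists mlx-whisper as an option.
model = WhisperModel("medium", device="cpu", compute_type="int8")

# transcribe() returns a generator of segments plus metadata about the audio.
segments, info = model.transcribe("tests/audio_samples/test1.wav")
text = " ".join(segment.text.strip() for segment in segments)

print(f"Detected language: {info.language}")
print(text)
```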

Project Structure

RealtimeVoice/
├── src/
│   ├── main.py              # Voice agent entry point
│   ├── config.py            # Configuration
│   ├── asr/                 # ASR implementations
│   │   ├── base.py          # Abstract interface
│   │   ├── nemotron.py      # NVIDIA Nemotron
│   │   ├── whisper.py       # OpenAI Whisper
│   │   └── deepgram.py      # Deepgram API
│   ├── llm/
│   │   └── nim.py           # NVIDIA NIM integration
│   ├── tts/
│   │   └── piper.py         # Piper TTS
│   └── benchmark/
│       ├── harness.py       # Benchmark runner
│       └── metrics.py       # Latency, WER, RTF
├── scripts/
│   ├── quick_compare.py     # Fast ASR comparison
│   ├── run_benchmark.py     # Full benchmark suite
│   └── brev_setup.sh        # GPU instance setup
├── docs/
│   └── BREV_QUICKSTART.md   # Step-by-step GPU guide
├── tests/
│   └── audio_samples/       # Test audio files
└── results/                 # Benchmark outputs

Commands

```bash
# Interactive voice agent
python src/main.py --asr whisper --tts simple

# Process audio file
python src/main.py --file tests/audio_samples/test1.wav

# Quick ASR comparison
python scripts/quick_compare.py

# Full benchmark suite
python scripts/run_benchmark.py --models whisper,deepgram,nemotron
```

API Keys

| Service | URL | Purpose |
|---|---|---|
| NVIDIA NIM | build.nvidia.com | LLM inference |
| Deepgram | console.deepgram.com | Cloud ASR |
| HuggingFace | huggingface.co | Model downloads |

Metrics

| Metric | Description |
|---|---|
| Latency (ms) | Time from audio input to transcript output |
| RTF | Real-Time Factor: processing time / audio duration (values below 1.0 are faster than real time) |
| WER | Word Error Rate: fraction of reference words substituted, deleted, or inserted (lower is better) |
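
A minimal sketch of how these three metrics can be computed for a single utterance, assuming the `jiwer` package for WER (the repo's `src/benchmark/metrics.py` may implement this differently):

```python
import jiwer

def compute_metrics(processing_s, audio_duration_s, reference, hypothesis):
    """Derive latency, RTF, and WER for one transcribed utterance."""
    return {
        "latency_ms": processing_s * 1000,
        # RTF below 1.0 means the model transcribes faster than real time.
        "rtf": processing_s / audio_duration_s,
        # WER = (substitutions + deletions + insertions) / reference word count.
        "wer": jiwer.wer(reference, hypothesis),
    }

print(compute_metrics(0.043, 5.0, "turn on the lights", "turn on the light"))
```

For example, 43 ms of processing on a hypothetical 5-second clip gives an RTF of roughly 0.009.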

Related Projects

| Project | Description |
|---|---|
| NemotronVoiceRAG | Voice-enabled RAG with pluggable backends |
| nvidia-nim-rag-demo | Production RAG with NVIDIA NIM |
| ai-infra-advisor | TCO calculator for GPU infrastructure |

License

MIT


Built by QbitLoop | Brand-WHFT Design System
