modelping

⚡ Benchmark LLM, STT, TTS, and full voice pipeline latency across every major AI provider.

One tool. Every provider. The metrics that actually matter.

What It Measures

Category	Metrics
LLM	Time to First Token (TTFT) P50/P95/P99, tokens/sec, cost
STT	Transcription latency, time to first partial transcript
TTS	Time to First Audio Byte (TTFB), realtime factor
Pipeline	Full STT→LLM→TTS end-to-end latency (the hero metric)

All measurements use real streaming requests — latency is captured at the byte level.

Installation

pip install modelping

Or install from source:

git clone https://github.com/LatencyGrid/modelping
cd modelping
pip install -e .

Quick Start

# Copy and configure API keys
cp .env.example .env
# Edit .env with your keys

# Run the full voice pipeline benchmark (hero feature)
modelping pipeline --stt groq-whisper-large-v3 --llm llama-3.3-70b-versatile --tts cartesia-sonic-2

# Benchmark LLMs head-to-head
modelping run gpt-4o claude-3-5-sonnet-20241022 gemini-2.5-flash

# Benchmark STT providers
modelping stt

# Benchmark TTS providers
modelping tts

Sample Output

Pipeline (STT→LLM→TTS)

$ modelping pipeline --stt groq-whisper-large-v3 --llm llama-3.3-70b-versatile --tts cartesia-sonic-2

╭──────────────────────────────────────────────────────────────────────────────╮
│  modelping pipeline  •  3 runs                                               │
╰──────────────────────────────────────────────────────────────────────────────╯

 STT                     LLM                      TTS              STT    LLM    TTS   Total
 ─────────────────────────────────────────────────────────────────────────────────────────────
 groq/whisper-large-v3   groq/llama-3.3-70b        cartesia/sonic-2  182ms  44ms   91ms   317ms

✓ Pipeline tested  •  fastest total: 317ms

LLM

$ modelping run gpt-4o claude-3-5-sonnet-20241022 gemini-2.5-flash llama-3.3-70b-versatile

╭─────────────────────────────────────────────────────────────────────────────────╮
│  modelping  •  5 runs  •  prompt: 64 tokens                                     │
╰─────────────────────────────────────────────────────────────────────────────────╯

 Model                          Provider     TTFT P50   TTFT P95   Tok/s   Cost/1M
 ─────────────────────────────────────────────────────────────────────────────────
 llama-3.3-70b-versatile        groq           42ms       67ms    312.4     $0.79
 gemini-2.5-flash               google        936ms     1100ms     45.2     $0.60
 claude-3-5-sonnet-20241022     anthropic     198ms      234ms     71.1    $15.00
 gpt-4o                         openai        312ms      489ms     82.3    $10.00

✓ 4 models tested  •  12.3s total

Colors: 🟢 green = fastest, 🟡 yellow = mid, 🔴 red = slowest (relative to the tested set).

STT

$ modelping stt --runs 3

 Model                    Provider     Latency P50   Latency P95   Words
 ────────────────────────────────────────────────────────────────────────
 whisper-large-v3-turbo   groq            180ms         210ms        9
 nova-2                   deepgram        240ms         290ms        9
 whisper-1                openai          890ms        1100ms        9

✓ 3 providers tested

TTS

$ modelping tts --runs 3

 Model                    Provider      TTFB P50   TTFB P95   Realtime
 ──────────────────────────────────────────────────────────────────────
 sonic-2                  cartesia         89ms      112ms      14.2x
 eleven_flash_v2_5        elevenlabs      210ms      267ms       8.1x
 aura-asteria-en          deepgram        198ms      245ms       9.3x
 tts-1                    openai          312ms      398ms       6.4x

✓ 4 providers tested

Full CLI Reference

# LLM benchmarks
modelping run gpt-4o claude-3-5-sonnet-20241022 gemini-2.5-flash
modelping run --all
modelping run --provider groq
modelping run gpt-4o --runs 10
modelping run gpt-4o --prompt "custom prompt"
modelping run gpt-4o --json
modelping run gpt-4o --csv
modelping run gpt-4o --fail-above-ttft 500

# STT benchmarks
modelping stt
modelping stt groq-whisper-large-v3 deepgram-nova-2
modelping stt --runs 5

# TTS benchmarks
modelping tts
modelping tts cartesia-sonic-2 elevenlabs-flash
modelping tts --runs 5
modelping tts --text "Custom text to synthesize"

# Pipeline benchmark (full STT→LLM→TTS)
modelping pipeline
modelping pipeline --stt groq-whisper-large-v3 --llm gpt-4o-mini --tts cartesia-sonic-2
modelping pipeline --stt all --llm all --tts all
modelping pipeline --runs 3

# List available models
modelping models
modelping models --provider anthropic

Supported Providers

LLM

Model	Provider	Input $/1M	Output $/1M
gpt-4o	openai	$2.50	$10.00
gpt-4o-mini	openai	$0.15	$0.60
o3-mini	openai	$1.10	$4.40
claude-3-5-sonnet-20241022	anthropic	$3.00	$15.00
claude-3-haiku-20240307	anthropic	$0.25	$1.25
gemini-2.5-flash	google	$0.15	$0.60
llama-3.3-70b-versatile	groq	$0.59	$0.79
mixtral-8x7b-32768	groq	$0.24	$0.24
accounts/fireworks/models/llama-v3p1-70b-instruct	fireworks	$0.90	$0.90
meta-llama/Llama-3.3-70B-Instruct-Turbo	together	$0.88	$0.88
mistral-large-latest	mistral	$2.00	$6.00
mistral-small-latest	mistral	$0.10	$0.30
command-r-plus	cohere	$2.50	$10.00
command-r	cohere	$0.15	$0.60

STT

Model	Provider
whisper-large-v3	groq
whisper-large-v3-turbo	groq
distil-whisper-large-v3-en	groq
whisper-1	openai
gpt-4o-transcribe	openai
nova-2	deepgram
nova-3	deepgram
best	assemblyai
nano	assemblyai
(default)	gladia

TTS

Model	Provider
eleven_flash_v2_5	elevenlabs
eleven_multilingual_v2	elevenlabs
sonic-2	cartesia
sonic-english	cartesia
tts-1	openai
tts-1-hd	openai
(streaming)	fish-audio
PlayDialog	playht
Play3.0-mini	playht
aura-asteria-en	deepgram
aura-luna-en	deepgram
blizzard	lmnt
aurora	lmnt

Configuration

Set API keys in a .env file in your working directory:

# LLM providers
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_API_KEY=AIza...
GROQ_API_KEY=gsk_...
FIREWORKS_API_KEY=fw_...
TOGETHER_API_KEY=...
MISTRAL_API_KEY=...
COHERE_API_KEY=...

# STT providers
DEEPGRAM_API_KEY=...
ASSEMBLYAI_API_KEY=...
GLADIA_API_KEY=...

# TTS providers
ELEVENLABS_API_KEY=...
CARTESIA_API_KEY=...
FISH_AUDIO_API_KEY=...
PLAYHT_API_KEY=...
PLAYHT_USER_ID=...
LMNT_API_KEY=...

modelping auto-detects configured providers and skips (with a warning) any without keys set.

CI/CD Example

GitHub Actions

name: AI Latency Check

on:
  schedule:
    - cron: '0 */6 * * *'   # every 6 hours
  workflow_dispatch:

jobs:
  latency-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install modelping
        run: pip install modelping

      - name: Run LLM latency benchmark
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
          GROQ_API_KEY: ${{ secrets.GROQ_API_KEY }}
        run: |
          modelping run gpt-4o claude-3-5-sonnet-20241022 --runs 3 --fail-above-ttft 1000

      - name: Check voice pipeline latency
        env:
          GROQ_API_KEY: ${{ secrets.GROQ_API_KEY }}
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
          CARTESIA_API_KEY: ${{ secrets.CARTESIA_API_KEY }}
        run: modelping pipeline --stt groq-whisper-large-v3 --llm gpt-4o-mini --tts cartesia-sonic-2 --fail-above-ttft 500

      - name: Export JSON results
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          modelping run --provider openai --json > results.json

      - name: Upload results
        uses: actions/upload-artifact@v4
        with:
          name: latency-results
          path: results.json

Roadmap

LLM benchmarking (TTFT, throughput, cost)
STT benchmarking (transcription latency)
TTS benchmarking (time to first audio byte)
Full STT→LLM→TTS pipeline benchmark
Community leaderboard — submit anonymous results, see global rankings
Web UI — run benchmarks from your browser (bring your own keys)
Self-hosted / open source model endpoints
Historical tracking and latency alerts

Contributing

See CONTRIBUTING.md for instructions on adding a new provider.

PRs welcome for:

New providers
New models / updated pricing
Output improvements
Bug fixes

git clone https://github.com/LatencyGrid/modelping
cd modelping
pip install -e .

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
.github		.github
modelping		modelping
tests		tests
.env.example		.env.example
.gitignore		.gitignore
CONTRIBUTING.md		CONTRIBUTING.md
README.md		README.md
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

modelping

What It Measures

Installation

Quick Start

Sample Output

Pipeline (STT→LLM→TTS)

LLM

STT

TTS

Full CLI Reference

Supported Providers

LLM

STT

TTS

Configuration

CI/CD Example

GitHub Actions

Roadmap

Contributing

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

modelping

What It Measures

Installation

Quick Start

Sample Output

Pipeline (STT→LLM→TTS)

LLM

STT

TTS

Full CLI Reference

Supported Providers

LLM

STT

TTS

Configuration

CI/CD Example

GitHub Actions

Roadmap

Contributing

License

About

Topics

Resources

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages