llm-bench

Benchmarks for local LLM and NVIDIA NIM inference with Claude Code.

Getting Started (what you need to change)

NVIDIA API key — copy .env.example to .env and add your key (free at build.nvidia.com):
```
NVIDIA_API_KEY=your-nvidia-api-key
```
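
As a rough illustration of how a script could pick up this key, the sketch below parses a `.env` file with the standard library only and rejects the placeholder value from `.env.example`. The helper names `read_env_file` and `require_api_key` are hypothetical and not part of this repo; the bench scripts most likely load the key through `python-dotenv`, which is in the dependency list.

```python
from pathlib import Path

PLACEHOLDER = "your-nvidia-api-key"  # value shipped in .env.example


def read_env_file(path):
    """Parse simple KEY=VALUE lines; skip blank lines and # comments."""
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def require_api_key(values, name="NVIDIA_API_KEY"):
    """Return the key, or raise if it is missing or still the placeholder."""
    key = values.get(name, "")
    if not key or key == PLACEHOLDER:
        raise ValueError(f"{name} is not set; edit .env and add your key")
    return key
```

Calling `require_api_key(read_env_file(".env"))` before a run fails fast when the key was never filled in, which is cheaper than discovering it through a rejected API request.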

Python dependencies — install into a venv:

```
python -m venv .venv
.venv/Scripts/pip install httpx openai evalplus python-dotenv
```

Local bench: requires llama-server running on port 8081 and llama-cpp-local present as a sibling directory (needed for the fixture paths used by tune_ncpu.py and verify_vram_256k.py).
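
The two local-bench prerequisites can be checked before a run with a short preflight script. This is a minimal sketch using only the standard library; the function names are hypothetical, and the host `127.0.0.1` is an assumption (the README only gives the port). It tests that something accepts TCP connections on the port, not that the listener is actually llama-server.

```python
import socket
from pathlib import Path


def port_is_open(host="127.0.0.1", port=8081, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


def sibling_dir_exists(repo_root, name="llama-cpp-local"):
    """Return True if `name` sits next to repo_root (same parent directory)."""
    return (Path(repo_root).resolve().parent / name).is_dir()
```

Run from the repo root, `port_is_open()` and `sibling_dir_exists(".")` should both return `True` before starting the local suite.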

Structure

```
llm-bench/
├── shared/     # scripts used by both local and NVIDIA benches
├── local/      # local llama.cpp model benchmarks + results
└── nvidia/     # NVIDIA NIM cloud inference benchmarks + results
```

Shared scripts

| Script | Purpose |
| --- | --- |
| `bench_llm_speed.py` | TTFT, prefill tok/s, gen tok/s (real Claude Code fixture) |
| `niah_test.py` | Needle in a Haystack: recall at different context depths |
| `coding_test.py` | 10-problem coding benchmark (easy/medium/hard) |
| `evalplus_test.py` | HumanEval+ 164-problem benchmark (pass@1) |
| `test_real_tools.py` | Tool-calling with real 86K Claude Code fixture (242 tools) |
| `test_attribution_header.py` | KV cache impact of Claude Code billing header |
| `test_batch_compact.py` | Compact-scale cold prefill benchmark at different batch sizes |
| `test_context_accuracy.py` | NIAH + reasoning quality at 128K vs 256K context |
| `fixture_real_request.json` | Real 86K Claude Code request (242 tools, full system prompt) |
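
For readers unfamiliar with the three speed metrics, the sketch below shows one common way to derive them from timestamps taken around a streamed response. It is not taken from `bench_llm_speed.py`, which may define them differently; the function and field names are hypothetical. Note that prefill tok/s computed this way folds any network or queueing delay into the prefill time.

```python
from dataclasses import dataclass


@dataclass
class SpeedMetrics:
    ttft_s: float         # time to first token, in seconds
    prefill_tok_s: float  # prompt tokens processed per second
    gen_tok_s: float      # generated tokens per second


def speed_metrics(t_request, t_first_token, t_last_token,
                  prompt_tokens, completion_tokens):
    """Derive TTFT, prefill tok/s, and gen tok/s from three timestamps."""
    ttft = t_first_token - t_request
    gen_time = t_last_token - t_first_token
    prefill = prompt_tokens / ttft if ttft > 0 else float("nan")
    # The first token arrives at t_first_token, so only the remaining
    # completion_tokens - 1 tokens are produced during gen_time.
    gen = (completion_tokens - 1) / gen_time if gen_time > 0 else float("nan")
    return SpeedMetrics(ttft, prefill, gen)


# Illustrative numbers only, not measurements from this repo.
m = speed_metrics(t_request=0.0, t_first_token=2.0, t_last_token=12.0,
                  prompt_tokens=8000, completion_tokens=501)
print(m.ttft_s, m.prefill_tok_s, m.gen_tok_s)  # 2.0 4000.0 50.0
```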

Local results summary

Model: Qwen3.6-35B-A3B IQ3_XXS via llama.cpp b9143 on RTX 3080 10GB

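| Metric | 128K context | 256K context |
| --- | --- | --- |
| Gen tok/s | 55.5 | 47.5 |
| Warm TTFT | 0.1s | 0.1s |
| Cold TTFT | 12.6s | 15.5s |
| Tool calling | PASS | - |
| NIAH | 100% | 100% |
| Coding | 90% (9/10) | - |
| EvalPlus | 92.7% (152/164) | - |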

See local/results/ for raw CSVs.
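
The percentages in the summary can be rechecked from the counts and timings it reports. The sketch below does that arithmetic; the helper names are hypothetical, and the only inputs are numbers already listed above.

```python
def pass_rate(passed, total):
    """Percentage of problems passed, rounded to one decimal place."""
    return round(100.0 * passed / total, 1)


def relative_change(old, new):
    """Percent change from old to new, rounded to one decimal place."""
    return round(100.0 * (new - old) / old, 1)


print(pass_rate(9, 10))             # Coding: 90.0
print(pass_rate(152, 164))          # EvalPlus: 92.7
print(relative_change(55.5, 47.5))  # Gen tok/s, 128K -> 256K: -14.4
print(relative_change(12.6, 15.5))  # Cold TTFT, 128K -> 256K: 23.0
```

Put differently, doubling the context from 128K to 256K costs roughly 14% of generation throughput and adds about 23% to cold TTFT on this setup.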

NVIDIA NIM results summary

Models were tested on the NVIDIA NIM free tier through the clawgate proxy.

See nvidia/results/ for raw CSVs.

Running benchmarks

Local

```
cd local
.\run_all_models.ps1   # full suite (auto-tunes n-cpu-moe, runs all tests)
.\run_bench.ps1        # quick bench against already-running server
```

NVIDIA NIM

```
cd nvidia
.\run_bench.ps1
.\run_clawgate_test.ps1
```
