
Throughput Bench

Jump to: Quick start | Models | Methodology | Pixels → km² | Contributing results | Citation

A small CLI for measuring inference throughput (img/s) of vision backbones on a single GPU, with a focus on geospatial workloads. Covers 33 timm architectures (ResNet, EfficientNet, ConvNeXt, MobileNet, RegNet, ViT including L/8 + H/14 + G/14, DeiT3, Swin, BEiT, DinoV3, CoAtNet) plus 12 geospatial foundation-model encoders (DOFA, CROMA, SenPaMAE, Galileo, OlmoEarth) under fp32 / fp16 / bf16 / AMP and torch.compile. Results are appended to a per-GPU CSV (NVIDIA H100 NVL · Tesla V100-SXM2 32 GB so far); an interactive Globe Race webapp turns those numbers into "how fast can each backbone map the world?"

Figure 1. Globe Race webapp — pick two backbones and a GSD; the dot grid fills in proportional to land area each model has processed at its measured throughput.

Quick start

git clone https://github.com/calebrob6/throughput-bench.git
cd throughput-bench
make setup          # conda env, or: pip install -r requirements.txt
make benchmark      # appends to results/<gpu_slug>.csv (auto-detected from nvidia-smi)

Pass extra flags through ARGS=:

make benchmark GPU_ID=2
make benchmark ARGS="--models resnet18 olmoearth_nano --timed-seconds 10"
make benchmark ARGS="--compile-modes default max-autotune"
make benchmark ARGS="--input-channels 4 --input-size 128"
make benchmark ARGS="--geo-compare"     # geo FMs + timm baselines at matching input shapes

Re-running on the same GPU is a free no-op for already-completed configs: the script enumerates every (model, precision, compile, channels, size) combo up front, prints a one-line skip summary, and runs only what's missing.
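The resume logic could look something like the following sketch — enumerate every combo up front, subtract what the CSV already contains, and run the rest. Column names and the helper itself are illustrative, not the script's actual schema:

```python
import csv
import itertools
from pathlib import Path

def pending_configs(csv_path, models, precisions, compile_modes, channels, sizes):
    """Return the (model, precision, compile, channels, size) combos not yet
    present in the per-GPU results CSV. Hypothetical column names."""
    done = set()
    path = Path(csv_path)
    if path.exists():
        with path.open() as f:
            for row in csv.DictReader(f):
                done.add((row["model"], row["precision"], row["compile"],
                          int(row["channels"]), int(row["size"])))
    combos = itertools.product(models, precisions, compile_modes, channels, sizes)
    return [c for c in combos if c not in done]
```

Because membership is checked against the full config tuple, changing any axis (e.g. `--input-size 128`) produces fresh work rather than false skips.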

Geospatial foundation models (optional)

The geo wrappers in geo_models.py need extra dependencies (Python ≥ 3.11, < 3.14). make setup pulls them in via environment.yml; for manual install:

pip install omegaconf
pip install git+https://github.com/allenai/olmoearth_pretrain_minimal.git@main
pip install git+https://github.com/geobreeze/geobreeze.git

The git pin on olmoearth_pretrain_minimal is intentional — the PyPI release (≤ 0.0.3) misses the dtype-safe CompositeEncodings fix from PR #10, without which OlmoEarth crashes under .half() / .bfloat16(). Per-model precision support lives in geo_models.GEO_MODEL_REGISTRY[name]['supported_precisions'] (DOFA / CROMA / Galileo are fp32 + AMP only because of upstream dtype issues; OlmoEarth + SenPaMAE support all four).
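Gating requested precisions against that registry field can be sketched in a few lines. The registry contents below are illustrative examples of the layout named above, not the real entries in geo_models.py:

```python
# Example entries only; the real GEO_MODEL_REGISTRY lives in geo_models.py.
GEO_MODEL_REGISTRY = {
    "olmoearth_nano": {"supported_precisions": {"fp32", "fp16", "bf16", "amp"}},
    "croma_optical":  {"supported_precisions": {"fp32", "amp"}},
}

def runnable_precisions(name, requested):
    """Filter a requested precision list down to what the model supports,
    preserving the requested order."""
    supported = GEO_MODEL_REGISTRY[name]["supported_precisions"]
    return [p for p in requested if p in supported]
```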

torch.compile system requirements

Triton shells out to the system gcc, which needs the GNU assembler (as) and the glibc development headers. Slim Linux images often lack them and fail with errors like cannot execute 'as' or cannot find /usr/lib/libc_nonshared.a. Install:

sudo apt-get install -y binutils libc6-dev      # Debian / Ubuntu
sudo tdnf install -y binutils glibc-devel       # Azure Linux / RHEL family

environment.yml also pulls binutils into the conda env as a fallback.

Models

| Family | Sizes | Type |
| --- | --- | --- |
| ResNet | 18 / 50 / 101 / 152 | CNN |
| EfficientNet | B0 / B4 / B7 | CNN |
| ConvNeXt | T / S / B / L | CNN |
| MobileNetV3 | Small / Large | CNN |
| RegNetY | 400MF / 4GF | CNN |
| ViT | Ti/16 / S/16 / B/16 / L/16 / L/8 / H/14 / G/14 | ViT |
| DeiT3 | S/16 / B/16 | ViT |
| Swin | T / S / B / L | ViT |
| BEiT | B/16 / L/16 | ViT |
| DinoV3 | H+/16 | ViT |
| CoAtNet | CoAtNet-0 / CoAtNet-2 | Hybrid |
| DOFA | B/16 / L/16 | Geo ViT |
| CROMA | Optical / SAR | Geo ViT |
| SenPaMAE | B/16 | Geo ViT |
| Galileo | Nano/8 / Base/8 / Large/8 | Geo ViT |
| OlmoEarth | Nano/8 / Tiny/8 / Base/8 / Large/8 | Geo ViT |

Add new architectures by extending MODEL_REGISTRY in models.py (timm-compatible names) or GEO_MODEL_REGISTRY in geo_models.py (custom wrappers).

Methodology

  • GPU isolation — aborts if other processes are using the target GPU; override with --force.
  • Precision — fp32 enables TF32 matmuls on Ampere+ (the CSV's tf32_enabled flag disambiguates); fp16 / bf16 cast the model with .half() / .bfloat16(); amp keeps the model in fp32 and wraps the forward pass in torch.autocast. bf16 is auto-skipped on pre-Ampere GPUs.
  • Compile — runs both none and default torch.compile modes by default; max-autotune available via --compile-modes.
  • Batch size — starts at the requested size (default 512) and halves on OOM until it fits or hits 1. Pass --batch-sizes 1 8 32 64 to sweep.
  • Timing — 20 warmup iters, then ≥ 30 s of timed iters, wall-clock with torch.cuda.synchronize() at boundaries. Reports throughput, mean / p50 / p95 / p99 latency, peak GPU memory.
  • Data — pre-allocated GPU batch by default (peak compute throughput); pass --dataloader for the realistic end-to-end path that includes DataLoader IPC + host→device transfer. On a V100 the IPC overhead alone (~140 ms / batch at bs=512) roughly halves ResNet-18 throughput vs the pre-allocated path.
  • Reproducibility — every run also writes results/<gpu_slug>_hardware.json with driver / clocks / power-cap / git SHA.
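The halve-on-OOM batch search can be sketched without CUDA. This hypothetical version catches MemoryError where the real script would catch torch.cuda.OutOfMemoryError:

```python
def find_batch_size(try_run, start=512):
    """Halve the batch size on OOM until a run fits, or re-raise at bs=1.
    try_run(bs) should attempt one forward pass at that batch size."""
    bs = start
    while True:
        try:
            try_run(bs)
            return bs
        except MemoryError:
            if bs == 1:
                raise  # even a single sample does not fit
            bs //= 2
```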

benchmark_sanity_check.py is a minimal standalone ResNet-18 timer for cross-checking against the main script.
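A device-agnostic skeleton of the warmup-then-timed loop looks like this. It is a simplified sketch of the methodology above, with no CUDA synchronization and made-up helper names:

```python
import time
import statistics

def time_fn(fn, warmup=20, timed_seconds=30.0):
    """Run fn() for `warmup` untimed iters, then time iters for at least
    `timed_seconds` of wall clock, returning latency summary stats."""
    for _ in range(warmup):
        fn()
    latencies = []
    start = time.perf_counter()
    while time.perf_counter() - start < timed_seconds:
        t0 = time.perf_counter()
        fn()
        latencies.append(time.perf_counter() - t0)
    latencies.sort()

    def pct(p):  # nearest-rank percentile on the sorted latencies
        return latencies[min(len(latencies) - 1, int(p / 100 * len(latencies)))]

    return {
        "iters": len(latencies),
        "mean_s": statistics.mean(latencies),
        "p50_s": pct(50), "p95_s": pct(95), "p99_s": pct(99),
    }
```

Throughput is then `batch_size * result["iters"] / timed_seconds`; the real script additionally calls torch.cuda.synchronize() at the timing boundaries so GPU work is actually finished when the clock stops.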

Pixels/sec → Square Kilometers

Throughput becomes a coverage rate once you fix the Ground Sample Distance (the physical size of one pixel):

area_per_patch = (224 × GSD)² / 10⁶  km²
coverage_rate  = throughput × area_per_patch  km²/s
| Sensor | GSD | Area / 224² patch | @ 1,000 img/s | @ 5,000 img/s |
| --- | --- | --- | --- | --- |
| High-res commercial | 0.3 m | 0.0045 km² | 4.5 km²/s | 22.6 km²/s |
| NAIP / aerial | 1 m | 0.050 km² | 50 km²/s | 251 km²/s |
| Sentinel-2 (10 m) | 10 m | 5.02 km² | 5,017 km²/s | 25,088 km²/s |
| Sentinel-2 (20 m) | 20 m | 20.07 km² | 20,070 km²/s | 100,352 km²/s |
| Landsat (30 m) | 30 m | 45.16 km² | 45,158 km²/s | 225,792 km²/s |

Numbers assume non-overlapping patches on a single GPU; sliding windows in production typically overlap 50%+.
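The two formulas above transcribe directly to code, which is an easy way to reproduce any row of the table for a new sensor or throughput:

```python
def area_per_patch_km2(gsd_m, patch_px=224):
    """Ground area covered by one patch_px × patch_px patch, in km²."""
    return (patch_px * gsd_m) ** 2 / 1e6

def coverage_rate_km2_s(throughput_img_s, gsd_m, patch_px=224):
    """Mapping rate in km²/s at a given throughput and GSD."""
    return throughput_img_s * area_per_patch_km2(gsd_m, patch_px)
```

For example, coverage_rate_km2_s(5000, 30) reproduces the Landsat row's 225,792 km²/s.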

Contributing results

Got an A100, H100, MI300, or anything else? PRs welcome.

make benchmark            # writes results/<gpu_slug>.csv + _hardware.json
# Add an entry for the new GPU to results/index.json (csv / hardware / label)
git checkout -b results/<your-gpu>
git add results/
git commit -m "Add results for <your GPU>"
git push origin results/<your-gpu>

PR checklist:

  • GPU was idle during the run (the script enforces this unless you pass --force).
  • results/index.json updated so the webapp picks the new GPU up.

Citation

@software{throughput-bench2026,
  title={Throughput Bench: Geospatial Model Throughput Benchmark},
  author={Robinson, Caleb},
  year={2026},
  url={https://github.com/calebrob6/throughput-bench},
  license={MIT}
}

License

MIT
