Metal-accelerated C/C++ inference for ESM Cambrian (ESM-C), built on llama.cpp / ggml.
- Source: github.com/AnanyaP-WDW/esmc.cpp
- Pre-converted GGUF models: AnanyaPathak/esmc-300m-gguf on Hugging Face (model card with benchmarks, quick start, and usage)
git submodule update --init --recursive
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j8
./build/esmc-embed --help

Or: make -C build esmc-embed
python3 -m venv .venv && .venv/bin/pip install -r tools/requirements.txt
hf download biohub/ESMC-300M --local-dir ./esmc-300m
.venv/bin/python tools/inspect_esmc_weights.py ./esmc-300m

The biohub checkpoint uses the native esmc.* layout (fused QKV, fused SwiGLU fc1_weight).
The HuggingFace model.layers.* layout in plan §2.1 is also supported when present.
.venv/bin/pip install -e ./ggml/gguf-py -r tools/requirements.txt
.venv/bin/python tools/convert_esmc_to_gguf.py ./esmc-300m ./models/esmc-300m-f16.gguf
# Verify metadata + tensors
.venv/bin/python tools/verify_gguf.py ./models/esmc-300m-f16.gguf
.venv/bin/python ./ggml/gguf-py/gguf/scripts/gguf_dump.py ./models/esmc-300m-f16.gguf --no-tensors

.venv/bin/python tests/test_tokenizer.py
# ACDEF -> [0, 5, 23, 13, 9, 18, 2] (matches HF tokenizer.json)

.venv/bin/python tests/check_layer0_qk.py
./build/esmc-embed -m ./models/esmc-300m-f16.gguf -s ACDEF --layers 8 --no-metal

cmake --build build -j8
./build/esmc-embed -m ./models/esmc-300m-f16.gguf --verify-load --no-metal

# Per-residue embeddings (.npy: [n_tokens, n_embd])
./build/esmc-embed -m ./models/esmc-300m-Q4_K_M.gguf \
-s "MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGY" \
--pool none --output embedding.npy
# Mean-pooled sequence embedding (strips CLS/EOS)
./build/esmc-embed -m ./models/esmc-300m-Q4_K_M.gguf \
-s "MKTVRQ..." --pool mean --output embedding.npy

The commands below reproduce every number in the paper, in order, from a clean clone. Run them on an Apple Silicon Mac (16 GB recommended) for the full CPU + Metal matrix; non-Apple/CPU-only hosts can run everything except the Metal rows. Budget ~3 GB of downloads (checkpoint + ProteinGym archive) and a few hours for the complete benchmark matrix.
Prerequisites: macOS with the Xcode command-line tools, CMake ≥ 3.14,
Python ≥ 3.10, and a HuggingFace account/token for the checkpoint download. The
PyTorch reference and the PyTorch baselines require torch; the Metal backend
requires an Apple GPU.
git clone --recursive https://github.com/AnanyaP-WDW/esmc.cpp.git
cd esmc.cpp
git submodule update --init --recursive # only if you cloned without --recursive
cmake -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build -j8 # builds esmc-embed, esmc-bench, esmc-quantize

python3 -m venv .venv
.venv/bin/pip install -r tools/requirements.txt
.venv/bin/pip install -e ./ggml/gguf-py
.venv/bin/pip install torch # PyTorch reference + baselines

hf download EvolutionaryScale/esmc-300m-2024-12 --local-dir ./esmc-300m

.venv/bin/python tools/convert_esmc_to_gguf.py ./esmc-300m ./models/esmc-300m-f16.gguf --dtype f16
.venv/bin/python tools/convert_esmc_to_gguf.py ./esmc-300m ./models/esmc-300m-f32.gguf --dtype f32
.venv/bin/python tools/verify_gguf.py ./models/esmc-300m-f16.gguf

cmake --build build --target esmc-quantize
./build/esmc-quantize models/esmc-300m-f32.gguf models/esmc-300m-Q8_0.gguf Q8_0
./build/esmc-quantize models/esmc-300m-f32.gguf models/esmc-300m-Q4_K_M.gguf Q4_K_M
./build/esmc-quantize models/esmc-300m-f32.gguf models/esmc-300m-Q4_K_S.gguf Q4_K_S

.venv/bin/python tests/test_tokenizer.py # M4: token IDs
.venv/bin/python tests/check_layer0_qk.py # M5: layer-0 Q/K probe
.venv/bin/python tests/validate.py --no-metal # M6: full forward (CPU)
.venv/bin/python tests/validate.py --metal --compare-cpu-metal # CPU/Metal parity

The correctness harness compares against a PyTorch reference that is not checked into git (~107 MB). Regenerate it once from the checkpoint:
.venv/bin/python tests/generate_reference.py \
--fasta benchmarks/sequences_correctness.fasta \
--output tests/reference_embeddings.npz

# Numerical correctness (paper Table 2): all precisions vs PyTorch, 100 Swiss-Prot seqs
.venv/bin/python benchmarks/correctness.py
# Throughput (paper Table 3, Fig 1): esmc.cpp CPU/Metal + PyTorch CPU/MPS
cmake --build build --target esmc-bench
.venv/bin/python benchmarks/throughput.py --config benchmarks/config_throughput_300m.json
# Memory footprint (paper Fig 2): peak RSS across all 36 configurations
.venv/bin/python benchmarks/memory.py --config benchmarks/config_memory_300m.json
# Downstream ProteinGym variant-effect (paper Table 4, Fig 3): 10 assays x 1000 variants
.venv/bin/python benchmarks/fetch_proteingym_subset.py --download \
--selection-mode multi-assay --max-assays 10 --max-rows 1000 --min-rows 1000 \
--max-length 512 --output-dir benchmarks/proteingym_subset_10k \
--manifest benchmarks/datasets/proteingym_subset_10k_manifest.json
.venv/bin/python benchmarks/downstream.py --config benchmarks/config_downstream_300m_10k.json

.venv/bin/python benchmarks/paper_artifacts.py # figure SVGs + summary CSVs
.venv/bin/python benchmarks/make_reproduction_bundle.py # self-contained results bundle

Host: Apple M1 (arm64), 16 GB unified memory, macOS 26.5.
Reference: official PyTorch ESM-C 300M (tests/ref_forward.py on native safetensors).
Datasets: 100 reviewed human UniProt sequences (correctness); length-bucketed FASTA (throughput/memory); ProteinGym 10-assay × 1000-variant subset (downstream).
Raw CSV/JSON artifacts live under results/; paper-ready Markdown/LaTeX tables are regenerated by benchmarks/make_reproduction_bundle.py → results/reproduction_bundle/tables/. Full experiment log: lab_manual.md.
Regenerate with benchmarks/paper_artifacts.py (writes SVG; PNG if matplotlib is installed).
Per-residue cosine similarity on 100 Swiss-Prot sequences (short / medium / long buckets). Pass = per-sequence mean cosine > 0.999 (F16, Q8_0) or > 0.995 (Q4_K_*).
| Precision | Seqs | Aggregate mean cos | Worst mean cos | Worst min cos | Mean-pool L2 (max) | Pass rate |
|---|---|---|---|---|---|---|
| F16 | 100 | 0.99999 | 0.99997 | 0.99971 | 0.0030 | 100/100 |
| Q8_0 | 100 | 0.99971 | 0.99938 | 0.99427 | 0.0164 | 100/100 |
| Q4_K_M | 100 | 0.99597 | 0.99245 | 0.94013 | 0.0656 | 91/100 |
| Q4_K_S | 100 | 0.99523 | 0.98982 | 0.92806 | 0.0709 | 75/100 |
GGUF on-disk size (300M): F16 634 MiB, Q8_0 337 MiB, Q4_K_M 237 MiB, Q4_K_S 228 MiB.
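The pass criterion above can be sketched in a few lines of NumPy. This is a minimal illustration, not the logic of benchmarks/correctness.py: the function names and the threshold dictionary are hypothetical, and the demo uses synthetic arrays in place of real esmc-embed and PyTorch outputs.

```python
import numpy as np

# Thresholds quoted in the text; the dictionary layout is illustrative only.
THRESHOLDS = {"F16": 0.999, "Q8_0": 0.999, "Q4_K_M": 0.995, "Q4_K_S": 0.995}


def per_residue_cosine(test: np.ndarray, ref: np.ndarray) -> np.ndarray:
    """Cosine similarity of each row pair; both arrays are [n_tokens, n_embd]."""
    if test.shape != ref.shape:
        raise ValueError(f"shape mismatch: {test.shape} vs {ref.shape}")
    dots = np.sum(test * ref, axis=1)
    norms = np.linalg.norm(test, axis=1) * np.linalg.norm(ref, axis=1)
    return dots / np.maximum(norms, 1e-12)  # guard against zero-norm rows


def sequence_passes(test: np.ndarray, ref: np.ndarray, precision: str):
    """Return (passed, mean cosine, min cosine) for one sequence."""
    cos = per_residue_cosine(test, ref)
    mean_cos = float(cos.mean())
    return mean_cos > THRESHOLDS[precision], mean_cos, float(cos.min())


if __name__ == "__main__":
    # Synthetic stand-ins: a reference and a slightly perturbed copy.
    rng = np.random.default_rng(0)
    ref = rng.standard_normal((47, 64))
    test = ref + 1e-3 * rng.standard_normal(ref.shape)
    print(sequence_passes(test, ref, "F16"))
```

The mean and minimum returned per sequence correspond to the quantities the table summarises as "mean cos" and "min cos"; how the harness aggregates them across the 100 sequences is not shown here.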
Single-sequence throughput by length bucket; each backend runs in a fresh process. Best esmc.cpp = highest seq/s for that bucket among CPU/Metal × F16/Q8_0/Q4_K_*.
| Bucket | Tokens | Best esmc.cpp | seq/s | PyTorch CPU | PyTorch MPS | vs CPU | vs MPS |
|---|---|---|---|---|---|---|---|
| short | 47 | metal/q4_k_s | 14.54 | 10.31 | 29.29 | 1.41× | 0.50× |
| medium | 235 | metal/q4_k_m | 5.62 | 4.56 | 10.11 | 1.23× | 0.56× |
| long | 850 | metal/q8_0 | 1.33 | 1.74 | 2.83 | 0.76× | 0.47× |
Metal 4-bit esmc.cpp beats PyTorch CPU on short and medium sequences at ~520 MiB peak RAM; PyTorch MPS remains fastest on this hardware.
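The "vs CPU" and "vs MPS" columns are plain throughput ratios. As a quick check, the sketch below recomputes them from the seq/s values in the table; the variable and function names are illustrative.

```python
# (bucket, best esmc.cpp seq/s, PyTorch CPU seq/s, PyTorch MPS seq/s),
# copied from the throughput table above.
ROWS = [
    ("short", 14.54, 10.31, 29.29),
    ("medium", 5.62, 4.56, 10.11),
    ("long", 1.33, 1.74, 2.83),
]


def speedups(rows):
    """Map bucket -> (ratio vs PyTorch CPU, ratio vs PyTorch MPS), 2 decimals."""
    return {
        bucket: (round(best / cpu, 2), round(best / mps, 2))
        for bucket, best, cpu, mps in rows
    }


if __name__ == "__main__":
    for bucket, (vs_cpu, vs_mps) in speedups(ROWS).items():
        print(f"{bucket}: {vs_cpu:.2f}x vs CPU, {vs_mps:.2f}x vs MPS")
```

A ratio above 1 means esmc.cpp is faster than that baseline; below 1 means it is slower.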
Peak resident set size (RSS) measured with /usr/bin/time -l in fresh processes. All 12 configurations below fit within the 16 GiB machine budget.
| Configuration | Peak RSS (MiB) | Model file (MiB) | ≤ 16 GiB |
|---|---|---|---|
| esmc.cpp / cpu / f16 | 7426 | 634 | yes |
| esmc.cpp / cpu / q8_0 | 6831 | 337 | yes |
| esmc.cpp / cpu / q4_k_m | 6632 | 237 | yes |
| esmc.cpp / cpu / q4_k_s | 6613 | 228 | yes |
| esmc.cpp / cpu / f32 | 3989 | 1266 | yes |
| esmc.cpp / metal / f32 | 2570 | 1266 | yes |
| pytorch / cpu / f32 | 1588 | 1270 | yes |
| esmc.cpp / metal / f16 | 1323 | 634 | yes |
| esmc.cpp / metal / q8_0 | 736 | 337 | yes |
| esmc.cpp / metal / q4_k_m | 531 | 237 | yes |
| esmc.cpp / metal / q4_k_s | 519 | 228 | yes |
| pytorch / mps / f32 | 282 | 1270 | yes |
Deployment sweet spot: Metal Q4_K_M or Q4_K_S — ~520 MiB peak RSS, ~230 MiB on disk, 1.2–1.4× PyTorch CPU throughput on short/medium sequences.
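For scripted checks, peak RSS can also be read from Python's standard library instead of parsing /usr/bin/time -l output. The sketch below is an alternative under stated assumptions, not how benchmarks/memory.py works: peak_rss_mib is a hypothetical helper, and because RUSAGE_CHILDREN reports the maximum over every child waited for so far, it should be called once per Python process, mirroring the fresh-process rule above.

```python
import resource
import subprocess
import sys


def peak_rss_mib(cmd: list[str]) -> float:
    """Run cmd to completion and return the peak RSS of child processes in MiB."""
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    # ru_maxrss is reported in bytes on macOS and in kibibytes on Linux.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return usage.ru_maxrss / divisor


if __name__ == "__main__":
    # Example: measure a trivial child process; substitute an esmc-embed
    # command line to measure a real configuration.
    rss = peak_rss_mib([sys.executable, "-c", "pass"])
    print(f"peak RSS: {rss:.1f} MiB, within 16 GiB: {rss <= 16 * 1024}")
```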
10 short stability assays, 1000 variants each. Each variant is scored by the cosine similarity between the mean-pooled mutant and wild-type embeddings, and the resulting metrics are compared against the PyTorch reference. Pass = |Δ| ≤ 0.01 per assay per metric (Spearman, Pearson, Kendall τ_b, top/bottom-decile overlap).
| Precision | Assays | Mean abs Spearman Δ | Max abs Spearman Δ | All-metric pass |
|---|---|---|---|---|
| F16 | 10 | 0.0006 | 0.0014 | 50/50 |
| Q8_0 | 10 | 0.0031 | 0.0092 | 45/50 |
| Q4_K_M | 10 | 0.0068 | 0.0231 | 38/50 |
| Q4_K_S | 10 | 0.0110 | 0.0258 | 32/50 |
F16 and Q8_0 preserve rank metrics well; 4-bit schemes are less stable under the strict 0.01 Spearman tolerance (see paper for caveats).
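The scoring and tolerance check described above can be sketched with NumPy alone. Two assumptions are made here and are not confirmed by the source: Δ is taken as the difference between the esmc.cpp metric and the PyTorch metric, each computed against the measured assay values, and mean pooling drops the first and last token to mirror the CLS/EOS stripping of --pool mean. All function names are hypothetical; benchmarks/downstream.py may differ.

```python
import numpy as np


def mean_pool(per_residue: np.ndarray) -> np.ndarray:
    """Average residue rows, dropping the first (CLS) and last (EOS) token."""
    return per_residue[1:-1].mean(axis=0)


def variant_score(mutant: np.ndarray, wild_type: np.ndarray) -> float:
    """Cosine similarity between mean-pooled mutant and wild-type embeddings."""
    a, b = mean_pool(mutant), mean_pool(wild_type)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def average_ranks(x: np.ndarray) -> np.ndarray:
    """1-based ranks; tied values share their average rank."""
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(1, len(x) + 1)
    for value in np.unique(x):
        tied = x == value
        ranks[tied] = ranks[tied].mean()
    return ranks


def spearman(x, y) -> float:
    """Spearman correlation as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(np.asarray(x)), average_ranks(np.asarray(y))
    return float(np.corrcoef(rx, ry)[0, 1])


def spearman_delta(test_scores, ref_scores, measured) -> float:
    """Absolute change in Spearman when swapping reference scores for test scores."""
    return abs(spearman(test_scores, measured) - spearman(ref_scores, measured))


def assay_passes(test_scores, ref_scores, measured, tol: float = 0.01) -> bool:
    return spearman_delta(test_scores, ref_scores, measured) <= tol
```

Only the Spearman metric is shown; the Pearson, Kendall τ_b, and decile-overlap checks would follow the same compare-against-reference pattern.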
Pre-converted GGUF files (F32, F16, Q8_0, Q4_K_M, Q4_K_S) are published at
AnanyaPathak/esmc-300m-gguf.
The model card documents which
file to pick, how to build esmc-embed, benchmark numbers, and license terms.
huggingface-cli download AnanyaPathak/esmc-300m-gguf esmc-300m-Q8_0.gguf --local-dir ./models
shasum -a 256 models/*.gguf # verify against results/reproduction_bundle/models/MODELS.md

To refresh the model card and re-upload after local changes:
.venv/bin/python benchmarks/make_reproduction_bundle.py --hf-repo AnanyaPathak/esmc-300m-gguf
.venv/bin/python tools/upload_to_hf.py \
--repo-id AnanyaPathak/esmc-300m-gguf \
--models-dir ./models \
--model-card results/reproduction_bundle/model_card/README.md \
--create

Assemble a single self-describing directory with GGUF files (or checksum manifest), all benchmark CSV/JSON, plots, and paper-ready Markdown + LaTeX tables:
.venv/bin/python benchmarks/make_reproduction_bundle.py
# -> results/reproduction_bundle/{models,benchmarks,plots,tables,model_card,MANIFEST.json}

--gguf-mode link (default) symlinks the GGUF files; use copy to duplicate bytes
or manifest for a checksum-only reference. MANIFEST.json records the git hash,
host info, and a checksummed inventory of every file in the bundle.
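A checksummed inventory can be verified with a short script. The sketch below assumes the expected digests are already available as a mapping from relative path to SHA-256 hex string; the actual MANIFEST.json schema is not described here, so the mapping and both function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_file(path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large GGUF files are not loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(root, expected: dict[str, str]) -> list[str]:
    """Return relative paths that are missing or whose digest differs."""
    bad = []
    for rel_path, want in expected.items():
        target = Path(root) / rel_path
        if not target.is_file() or sha256_file(target) != want.lower():
            bad.append(rel_path)
    return bad
```

An empty list from find_mismatches means every listed file is present and matches its recorded digest. With the default link mode, the digest is computed over the symlink target's bytes, since open() follows symlinks.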
See plan.md for the full engineering specification and lab_manual.md for the experiment log.
Built with ESM.
- ESM-C 300M weights and the GGUF derivatives produced here are governed by the EvolutionaryScale Cambrian Open License Agreement, subject to the Acceptable Use Policy. The ESMC 300M Model is licensed under the EvolutionaryScale Cambrian Open License Agreement.
- esmc.cpp's own source code (runtime, converter, harnesses, docs) is MIT.
- ggml / llama.cpp (the ggml/ submodule) remains MIT (see ggml/LICENSE).
See LICENSE and NOTICE for the full terms and required attributions.