UMA-Inverse is a LigandMPNN-style inverse-folding project that mirrors the UMA-Fold repository layout but swaps sparse message passing for a dense invariant PairMixer encoder.
- Comparable data protocol: uses LigandMPNN parsing and split JSON files.
- Dense geometry reasoning: replaces KNN-only neighborhood updates with triangle multiplication over pair features.
- Single-GPU friendly path: bf16 mixed precision, gradient checkpointing, and curriculum neighborhood cropping controls.
- Invariant-first design: uses distances and angle-derived local features, avoiding SE(3)-heavy kernels for v1.
- Parse protein-ligand structure with LigandMPNN
parse_PDBandfeaturize. - Build node features:
- Protein residues: backbone dihedral sin/cos + chain id embedding
- Ligand atoms: element one-hot embedding
- Build dense pair tensor
Z_ijfrom:- node_i projection
- node_j projection
- RBF(distance(i, j))
- Run stacked PairMixer blocks:
- triangle multiplication outgoing
- triangle multiplication incoming
- transition MLP
- Decode residue logits (20 AA + X) with ligand-aware context.
UMA-Inverse/
├── configs/
├── data/
├── logs/
├── scripts/
├── src/
└── tests/
Please refer to ORDER_OF_OPERATIONS.md for the exact run sequence.
cd UMA-Inverse
# 1. Fetch & Preprocess
sbatch scripts/SLURM/01a_fetch_data.sh
sbatch scripts/SLURM/01b_preprocess.sh
# 2. Pilot / Overfit Test
sbatch scripts/SLURM/02_pilot_run.sh
# 3. Curriculum Training
sbatch scripts/SLURM/03_train_model.sh
# 4. Inference (design sequences)
uv run uma-inverse design --pdb my.pdb --ckpt checkpoints/last.ckpt --num-samples 10The uma-inverse CLI exposes three subcommands. See
docs/inference.md and docs/benchmarks.md
for the full references.
# Design new sequences for a PDB
uv run uma-inverse design --pdb my.pdb --ckpt checkpoints/last.ckpt \
--num-samples 10 --temperature 0.1 --top-p 0.95 --seed 42 \
--fix "A1 A2 A3" --bias "W:3.0"
# Batch mode with a JSON spec and crash recovery
uv run uma-inverse design --pdb-list spec.json --ckpt ckpt.ckpt \
--out-dir outputs/screen --num-samples 5 --resume
# Score native or user sequence
uv run uma-inverse score --pdb my.pdb --ckpt checkpoints/last.ckpt \
--mode autoregressive --num-batches 10
# Full benchmark suite for a trained checkpoint (writes paper-ready tables + figures)
uv run uma-inverse benchmark --ckpt checkpoints/uma-inverse-best.ckpt \
--val-json LigandMPNN/training/valid.json --pdb-dir data/raw/pdb_archive \
--n-pdbs 500precision: bf16-mixedbatch_size: 1max_total_nodes: 384gradient_checkpointing: true
- This is an educational implementation for architecture experimentation.
- v1 focuses on sequence prediction quality and training stability.
- Sidechain coordinate prediction can be added as a v2 auxiliary head.