Add GPU node exclusion, runtime cap, and unified memory for inference by DimaMolod · Pull Request #42 · KosinskiLab/AlphaPulldownSnakemake

DimaMolod · 2026-05-20T13:57:19Z

Summary

Makes structure_inference robust to two real cluster failure modes seen in
production: jobs landing on GPUs the prediction container can't use, and large
complexes exhausting GPU VRAM.

Three new (backward-compatible) config options, all read by the existing
structure_inference rule:

- slurm_exclude_nodes: comma-separated node list passed straight to
  sbatch --exclude (e.g. "gpu50,gpu51,gpu52,gpu53"). Use it to skip nodes
  whose GPU the container can't compile for, e.g. a CUDA compute capability
  newer than the bundled ptxas, which fails with ptxas too old /
  UNIMPLEMENTED. --constraint/--gres are managed by the Slurm executor
  plugin (and forbidden inside slurm_extra), so --exclude is the supported
  way to drop a few incompatible nodes while keeping the rest of the partition.
  It is a Slurm resource, not rule code, so it does not trigger reruns of
  already-finished predictions.
- structure_inference_max_runtime (default 10080 = 7 days): caps the
  per-job wall time. Wall time scales with the retry attempt (1440 * attempt
  minutes); without a cap, enough retries request more time than the partition
  MaxTime and SLURM rejects the job with Requested time limit is invalid.
- structure_inference_unified_memory (default true) +
  structure_inference_xla_mem_fraction (default 3.2): export the
  DeepMind-recommended JAX/XLA unified-memory env so inference spills GPU VRAM
  into host RAM instead of OOM-ing (RESOURCE_EXHAUSTED / bfc_allocator ran out
  of memory):
```
export TF_FORCE_UNIFIED_MEMORY=true
export XLA_PYTHON_CLIENT_PREALLOCATE=false
export XLA_CLIENT_MEM_FRACTION=3.2
```
Set structure_inference_unified_memory: false to fail fast instead. When
disabled the env string is empty, so the rule's shell is byte-identical to
before.
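
The runtime cap described above comes down to a single clamp. The following is a minimal sketch of that arithmetic, not the Snakefile's actual code: the function name `capped_runtime_minutes` and its parameter names are hypothetical, and only the 1440-minute step and the 10080-minute default come from this PR.

```python
def capped_runtime_minutes(attempt: int, max_runtime: int = 10080, base: int = 1440) -> int:
    """Wall time in minutes for a retry attempt, clamped to max_runtime.

    base=1440 and max_runtime=10080 mirror the values quoted in the PR;
    the function itself is an illustrative stand-in for the rule's logic.
    """
    if attempt < 1:
        raise ValueError("attempt numbering starts at 1")
    # Uncapped request grows linearly with the attempt number.
    requested = base * attempt
    # Never ask SLURM for more than the configured ceiling.
    return min(requested, max_runtime)


# Attempts 1..7 scale linearly; attempt 8 would request 11520 minutes
# uncapped, which is where the default cap starts to matter.
for attempt in (1, 7, 8):
    print(attempt, capped_runtime_minutes(attempt))
```

With the default cap, attempt 8 is the first one that gets clamped. On a partition whose MaxTime is below 7 days, the cap would presumably need to be lowered to match.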

Notes

- Pair slurm_exclude_nodes with structure_inference_gpu_model to both
  restrict to a model and exclude bad nodes.
- Unified memory spills into host RAM (and slows down when it actually
  spills), so give the job enough host RAM via structure_inference_ram_bytes.
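
For readers of the Snakefile, this sketch shows one way the node list could be turned into the `slurm_extra=--exclude=<nodes>` value mentioned under Testing. The helper name `exclude_to_slurm_extra` is hypothetical, and the whitespace trimming is an assumption rather than something the PR states.

```python
def exclude_to_slurm_extra(config: dict) -> str:
    """Build a slurm_extra fragment from the slurm_exclude_nodes config key.

    Returns an empty string when the option is unset, so existing
    configs keep producing the same submission arguments.
    """
    raw = str(config.get("slurm_exclude_nodes", "") or "")
    # Assumption: tolerate stray spaces and empty entries in the list.
    nodes = [name.strip() for name in raw.split(",") if name.strip()]
    if not nodes:
        return ""
    return "--exclude=" + ",".join(nodes)


print(exclude_to_slurm_extra({"slurm_exclude_nodes": "gpu50, gpu51,gpu52"}))
print(repr(exclude_to_slurm_extra({})))
```

Keeping the value in a resource string rather than in the rule's shell is what lets the node list change without invalidating finished predictions, as the Summary notes.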

Testing

- snakemake --list and --dry-run parse cleanly with the modified Snakefile.
- --dry-run -p confirms the unified-memory exports appear in the
  structure_inference shell, and that structure_inference_unified_memory: false removes them.
- Verified slurm_exclude_nodes produces slurm_extra=--exclude=<nodes> on the
  inference jobs and SLURM normalizes it to the expected ExcNodeList.
- Unified-memory env confirmed against AlphaFold 3's own docs/performance.md.

🤖 Generated with Claude Code

structure_inference jobs can now avoid unsuitable GPUs and survive large complexes that exceed VRAM:

- slurm_exclude_nodes: comma-separated nodes passed to sbatch --exclude, to skip GPUs the prediction container cannot use (e.g. a CUDA compute capability newer than the bundled ptxas, which fails "ptxas too old" / UNIMPLEMENTED). It is a Slurm resource, not rule code, so it does not trigger reruns of finished predictions. --constraint/--gres are managed by the plugin (and forbidden in slurm_extra), so --exclude is the supported way to drop specific nodes.
- structure_inference_max_runtime: cap wall time so retry scaling (1440 * attempt minutes) cannot exceed the partition MaxTime and produce "Requested time limit is invalid" sbatch failures. Default 10080 (7 days).
- structure_inference_unified_memory (default true): export the DeepMind-recommended JAX/XLA unified-memory env (TF_FORCE_UNIFIED_MEMORY, XLA_PYTHON_CLIENT_PREALLOCATE=false, XLA_CLIENT_MEM_FRACTION) so inference spills GPU VRAM into host RAM instead of OOM-ing. Toggle off to fail fast; tune via structure_inference_xla_mem_fraction.

Documented in config/config.yaml and README. The unified-memory env is empty when disabled, so the rule code is unchanged in that case.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…ble details

- config.yaml: note that *_ram_bytes values are in MB (used as SLURM --mem), not bytes; 64000 = ~64 GB.
- README: keep the SLURM section minimal; move the GPU-exclude/runtime-cap and unified-memory explanations into <details> blocks so non-expert users are not overwhelmed.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

DimaMolod · 2026-05-20T15:01:01Z

Empirical validation: unified memory resolves a real OOM

Tested end-to-end on a pair that genuinely OOM'd before: O00194+Q9ULV0
(2066 tokens, ~25 GiB) had failed with RESOURCE_EXHAUSTED on 24 GB RTX 3090s.
Re-ran it forced back onto a 3090 (structure_inference_gpu_model: "3090")
with structure_inference_unified_memory: true, exercising this branch's
Snakefile shell (not an external env):

- Node: gpu35 (RTX 3090, 24 GB), jobid 54337038
- Result: completed successfully; model.cif, ranking_scores.csv and
  completed_fold.txt were written, with no RESOURCE_EXHAUSTED.
- Timing: inference 16:38→16:58 (~20 min vs the usual few minutes). This is
  the expected host-paging slowdown when actually spilling, i.e. the
  documented speed/robustness trade-off.

Toggling structure_inference_unified_memory: false removes the exports (verified
via --dry-run -p), so the behaviour is opt-out.
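
The opt-out behaviour can be pictured as a helper that returns either the export prefix or an empty string. This is a hedged sketch only: `unified_memory_env` is a hypothetical name, and the exact separators used in the real rule's shell are an assumption. The three variable names and values are the ones listed in the Summary.

```python
def unified_memory_env(enabled: bool, mem_fraction: str = "3.2") -> str:
    """Return a shell prefix exporting the unified-memory variables.

    An empty string is returned when disabled, so prepending it to a
    command leaves that command unchanged.
    """
    if not enabled:
        return ""
    exports = [
        "export TF_FORCE_UNIFIED_MEMORY=true",
        "export XLA_PYTHON_CLIENT_PREALLOCATE=false",
        f"export XLA_CLIENT_MEM_FRACTION={mem_fraction}",
    ]
    # Assumption: statements are joined with '; ' ahead of the command.
    return "; ".join(exports) + "; "


# 'run_prediction' is a placeholder command, not the real entry point.
print(unified_memory_env(True) + "run_prediction")
print(unified_memory_env(False) + "run_prediction")
```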

…U VRAM)

structure_inference_xla_mem_fraction now defaults to "auto": instead of a fixed 3.2, the per-job ceiling is computed in the inference shell as (allocated host RAM, the SLURM --mem value) / (physical GPU VRAM read via nvidia-smi once the job lands on a node). This keeps XLA's unified-memory ceiling within the SLURM allocation so it cannot oversubscribe host RAM past what the job requested and get OOM-killed -- the EMBL run_AF_multimer.sh convention.

The GPU VRAM is only known at run time and the SLURM executor exposes no per-job env hook (it passes the submit env through --export=ALL, which is the same for every job), so the value must be computed in the job shell; doing it inside the container also avoids apptainer env-crossing. Falls back to 3.2 if nvidia-smi is unavailable.

XLA_PYTHON_CLIENT_PREALLOCATE=false is kept (without it XLA grabs a large VRAM slice up front, defeating on-demand spill). Pin a number to override. Also exports XLA_PYTHON_CLIENT_MEM_FRACTION alongside XLA_CLIENT_MEM_FRACTION so the JAX/AF2 path honors the same ceiling.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

DimaMolod · 2026-05-21T08:01:21Z

Follow-up commit: structure_inference_xla_mem_fraction now defaults to auto (was a fixed 3.2).

When auto, the fraction is computed per job in the inference shell as (allocated host RAM, the SLURM --mem value) / (physical GPU VRAM read via nvidia-smi once the job lands). This keeps XLA's unified-memory ceiling within the SLURM allocation so it can't oversubscribe host RAM past what the job requested and get OOM-killed — the EMBL run_AF_multimer.sh convention. Falls back to 3.2 if nvidia-smi is unavailable; pin a number to override.

Notes:

- Computed in the job shell because the value depends on which GPU the job lands on (only known at run time) and the SLURM executor exposes no per-job env hook (it passes the submit env through --export=ALL, same for every job). Doing it inside the container also avoids apptainer env-crossing.
- XLA_PYTHON_CLIENT_PREALLOCATE=false is kept (without it XLA grabs a large VRAM slice up front, defeating on-demand spill). Also now exports XLA_PYTHON_CLIENT_MEM_FRACTION alongside XLA_CLIENT_MEM_FRACTION so the JAX/AF2 path honors the same ceiling.
- Verified via snakemake -n -p: the rendered shell shows auto → nvidia-smi + awk division, a pinned number → direct export, and disabled → no exports.
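
The PR resolves the fraction with nvidia-smi and awk inside the job shell. The sketch below re-expresses that division in Python purely to illustrate the logic; it is not the code that was merged. `resolve_mem_fraction` and `query_gpu_vram_mib` are hypothetical names, the rounding to two decimals is an assumption, and host RAM (MB) and GPU VRAM (MiB as reported by nvidia-smi) are treated as the same unit for simplicity.

```python
import subprocess
from typing import Optional

FALLBACK_FRACTION = 3.2  # value the PR falls back to without nvidia-smi


def query_gpu_vram_mib() -> Optional[float]:
    """Total memory of the first visible GPU in MiB, or None on any failure."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=memory.total",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True, timeout=30,
        ).stdout
        return float(out.splitlines()[0].strip())
    except (OSError, subprocess.SubprocessError, ValueError, IndexError):
        return None


def resolve_mem_fraction(setting, host_ram_mb: float,
                         gpu_vram_mib: Optional[float] = None) -> float:
    """Return the XLA memory fraction for this job.

    A pinned number is used as-is; "auto" divides the host RAM allocation
    by the GPU's VRAM so the unified-memory ceiling stays within --mem.
    """
    if str(setting).lower() != "auto":
        return float(setting)
    vram = gpu_vram_mib if gpu_vram_mib is not None else query_gpu_vram_mib()
    if not vram or vram <= 0:
        return FALLBACK_FRACTION
    return round(host_ram_mb / vram, 2)


# Illustrative numbers only: a 64000 MB allocation on a 24576 MiB GPU.
print(resolve_mem_fraction("auto", 64000, 24576))
```

Passing `gpu_vram_mib` explicitly keeps the division testable on machines without a GPU; leaving it out makes the sketch query nvidia-smi, as the job shell does.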

Keep only the code-reader rationale (why the fraction is resolved at run time, why --exclude doesn't trigger reruns); the user-facing "what each option does" already lives in config.yaml and README. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

DimaMolod and others added 2 commits May 20, 2026 15:57

DimaMolod merged commit bd962ae into main May 21, 2026

DimaMolod deleted the feature/gpu-exclude-unified-memory branch May 21, 2026 08:21

DimaMolod merged 4 commits into main from feature/gpu-exclude-unified-memory
