
Fix GPU memory leak from missing .detach() in model wrappers#491

Merged
CompRhys merged 4 commits into TorchSim:main from reillyosadchey:fix/detach-model-outputs-memory-leak
Mar 4, 2026

Conversation

@reillyosadchey
Contributor

Summary

  • Several model wrappers return tensors still attached to the autograd computation graph, causing the entire forward-pass graph to be retained in GPU memory across simulation steps
  • Added .detach() calls to model outputs in fairchem, orb, metatomic, fairchem_legacy, and graphpes_framework
  • The fairchem_legacy wrapper also used .clone() without .detach() on inputs, retaining graph references via self.data_object

Details

| Model | Issue | Severity |
| --- | --- | --- |
| fairchem.py | energy/forces/stress not detached | High |
| orb.py | predictions + conservative forces/stress not detached | High |
| metatomic.py | forces/stress not detached (energy was correct) | Medium |
| fairchem_legacy.py | `.clone()` without `.detach()` on inputs | Medium |
| graphpes_framework.py | external library predictions not detached | Low |

Without .detach(), .to(dtype=...) and .view() create new graph nodes that still reference the full forward pass. Integrators store these into state.energy/state.forces/state.stress, keeping the graph alive until the next step overwrites them — and longer if trajectory loggers or MC routines .clone() the state.
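A minimal sketch of this mechanism (hypothetical, not TorchSim code): differentiable ops such as `.to(dtype=...)` extend the autograd graph, while `.detach()` severs it.

```python
import torch

# Toy "forward pass": energy is attached to the autograd graph.
x = torch.randn(4, requires_grad=True)
energy = (x ** 2).sum()

# .to() creates a *new graph node*, not a graph-free copy,
# so the whole forward pass stays reachable through it.
still_attached = energy.to(dtype=torch.float64)
assert still_attached.grad_fn is not None

# Detaching first yields a tensor with no link back to the graph.
safe = energy.detach().to(dtype=torch.float64)
assert safe.grad_fn is None and not safe.requires_grad
```

Storing `still_attached` into long-lived state keeps every intermediate of the forward pass alive; storing `safe` does not.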

Test plan

  • Existing tests pass (no functional change — detach only affects graph retention)
  • Verify GPU memory stays flat over long simulations with FairChem/Orb models
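Graph retention can also be checked without a GPU. A hypothetical CPU-only sketch (not part of the test suite, names invented): hold a `weakref` to a forward-pass intermediate and see whether storing the output keeps it alive.

```python
import gc
import weakref

import torch


def step(detach: bool):
    """One toy 'simulation step': forward pass, then store the output."""
    x = torch.randn(100, requires_grad=True)
    hidden = x * 2                      # forward-pass intermediate
    ref = weakref.ref(hidden)
    # (hidden * hidden) saves `hidden` for backward, so the graph
    # holds a reference to it until the graph itself is freed.
    forces = (hidden * hidden).sum()
    state = forces.detach() if detach else forces  # what an integrator stores
    del x, hidden, forces
    gc.collect()
    return state, ref


state, ref = step(detach=False)
assert ref() is not None   # stored output keeps the graph (and intermediates) alive

state, ref = step(detach=True)
assert ref() is None       # detached output lets the graph be freed
```

This relies on PyTorch keeping the Python tensor object alive while autograd's saved-tensor machinery references it (the case on recent PyTorch versions).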

Several model wrappers were returning tensors still attached to the
computation graph, causing the entire forward-pass graph to be retained
in memory across simulation steps.

- fairchem: detach energy, forces, stress predictions
- orb: detach prediction outputs and conservative forces/stress
- metatomic: detach forces and stress (energy was already detached)
- fairchem_legacy: use detach().clone() on inputs to prevent graph
  retention via self.data_object
- graphpes_framework: detach predictions from external library
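Taken together, the per-wrapper bullets reduce to one pattern. A hypothetical minimal wrapper (names invented, not the actual TorchSim classes) showing both fixes:

```python
import torch


class WrapperSketch:
    """Illustrative model wrapper, not TorchSim code."""

    def __init__(self, model):
        self.model = model
        self.data_object = None

    def forward(self, positions: torch.Tensor) -> dict[str, torch.Tensor]:
        # fairchem_legacy-style fix: cache inputs with detach().clone()
        # so self.data_object never pins the autograd graph.
        self.data_object = positions.detach().clone()

        # Model returns e.g. {"energy": ..., "forces": ...} attached to the graph.
        preds = self.model(positions)

        # Sever every output from the graph in one place at the return.
        return {k: v.detach() for k, v in preds.items()}
```

Detaching once at the return (rather than inline at each assignment) is the pattern the review below converges on.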
@CompRhys
Member

CompRhys commented Mar 4, 2026

Thank you for the PR; I will merge it once tests and lint pass.

It is worth noting that we are attempting to move all the models you fixed to an external posture, so these fixes may also need to be applied upstream.

In the pair-potential backend refactor I am doing, I also included a retain-graph argument to allow for a differentiable-simulation tutorial. I think that's a very niche use, so I wouldn't add the flag here, but just an FYI.

```diff
  atomic_graph = state_to_atomic_graph(state, cutoff)
- return self._gp_model.predict(atomic_graph, self._properties)  # type: ignore[return-value]
+ preds = self._gp_model.predict(atomic_graph, self._properties)  # ty: ignore[call-non-callable]
+ return {k: v.detach() for k, v in preds.items()}
```
Member

It feels cleaner to me to detach everything in one line at the end like this. Could you update all the models to follow this pattern?

Member

If it's already detached then the op is idempotent, so no harm.

Contributor Author

Good call


Move .detach() calls to a single return statement in each model's
forward method instead of detaching inline at each assignment.
@CompRhys CompRhys enabled auto-merge (squash) March 4, 2026 22:24
@CompRhys
Member

CompRhys commented Mar 4, 2026

Thanks!

@CompRhys CompRhys disabled auto-merge March 4, 2026 22:59
@CompRhys CompRhys merged commit 7f85ec4 into TorchSim:main Mar 4, 2026
64 of 66 checks passed