Fix CUDA IPC cache leaks during weight updates#1731

Merged
zhuzilin merged 1 commit into main from memory_opt
Mar 17, 2026
Conversation

@zhuzilin
Contributor


**Root cause:** `ForkingPickler` calls `storage._share_cuda_()` on GPU tensors, creating permanent entries in the CUDA IPC cache that hold strong references to GPU memory. These entries are only released when `torch.cuda.ipc_collect()` detects that the consumer has closed its IPC handle.
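The leak mechanism above can be reproduced in isolation. This is a hedged sketch that requires a CUDA-capable machine; it relies on PyTorch's multiprocessing reducers (registered when `torch.multiprocessing` is imported), which route CUDA storages through `storage._share_cuda_()` during pickling.

```python
import io

import torch
import torch.multiprocessing  # importing this registers CUDA tensor reducers
from multiprocessing.reduction import ForkingPickler

# Allocate a GPU tensor (~4 MiB).
t = torch.zeros(1024, 1024, device="cuda")

# Pickling with ForkingPickler invokes storage._share_cuda_() under the hood,
# which registers the storage in the CUDA IPC cache.
buf = io.BytesIO()
ForkingPickler(buf).dump(t)

# Dropping our Python references is NOT enough: the IPC cache still holds a
# strong reference to the GPU allocation.
del t, buf
torch.cuda.synchronize()

# Only once the consumer has closed its IPC handle does this call actually
# release the cached entry and free the GPU memory.
torch.cuda.ipc_collect()
```

Without the final `ipc_collect()`, repeated weight updates accumulate these cache entries and the GPU memory appears to leak even though no user code holds a reference.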

**Fix (in `update_weight_from_tensor.py`):**
1. `del hf_named_tensors` added alongside `del long_lived_tensors`, so tensor references from one chunk do not overlap with the next
2. `torch.cuda.ipc_collect()` after each chunk's `ray.get()` + `del`, releasing IPC cache entries for completed chunks
3. `torch.cuda.ipc_collect()` after the post-loop barrier, releasing the last chunk's IPC entries on non-source ranks (which don't wait on `ray.get()`)
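The three steps above can be sketched as a chunked update loop. This is a hedged illustration, not the actual `update_weight_from_tensor.py` code: the helpers `materialize_chunk`, `broadcast_via_ray`, and `barrier` are hypothetical placeholders for the real chunk loading, Ray transfer, and synchronization logic.

```python
import ray
import torch


def update_weights_in_chunks(chunks, is_source_rank, barrier):
    for chunk in chunks:
        # Hypothetical helper: builds the GPU tensors for this chunk.
        hf_named_tensors = materialize_chunk(chunk)

        # Hypothetical helper: ships tensors to consumers via Ray. Pickling
        # CUDA tensors here triggers storage._share_cuda_(), populating the
        # CUDA IPC cache.
        refs = broadcast_via_ray(hf_named_tensors)

        if is_source_rank:
            ray.get(refs)  # wait until consumers have received this chunk

        # (1) Drop our references so the IPC cache is the only remaining
        #     holder of the chunk's GPU memory.
        del hf_named_tensors

        # (2) Reclaim IPC cache entries for chunks whose consumers have
        #     already closed their handles.
        torch.cuda.ipc_collect()

    # All ranks synchronize after the last chunk.
    barrier()

    # (3) Non-source ranks never call ray.get(), so collect once more after
    #     the barrier to release the final chunk's IPC entries.
    torch.cuda.ipc_collect()
```

The post-barrier collect matters because `ipc_collect()` only frees entries whose consumers have already closed their handles; on non-source ranks there is no `ray.get()` to guarantee that before the loop ends, so a final collect after the barrier is the earliest safe point.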
@zhuzilin zhuzilin merged commit 183e525 into main Mar 17, 2026
1 check passed
@zhuzilin zhuzilin deleted the memory_opt branch March 17, 2026 02:37
