[cleanup] Remove NCCL_CUMEM_ENABLE=0 from `prepare_runtime_environment` by erictang000 · Pull Request #1600 · NovaSky-AI/SkyRL

erictang000 · 2026-04-30T20:39:23Z

We previously had the following snippet in `prepare_runtime_environment`:

    # NOTE (charlie): See https://github.com/vllm-project/vllm/blob/c6b0a7d3ba03ca414be1174e9bd86a97191b7090/vllm/worker/worker_base.py#L445
    # and https://docs.vllm.ai/en/v0.9.2/usage/troubleshooting.html?h=nccl_cumem_enable#known-issues
    if cfg.generator.inference_engine.weight_sync_backend == "nccl":
        env_vars["NCCL_CUMEM_ENABLE"] = "0"
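
For context, here is a minimal, self-contained sketch of what that branch did. The function name `build_env_vars` and the `legacy_override` flag are hypothetical stand-ins for illustration (the real logic lives in `prepare_runtime_environment` and reads `cfg.generator.inference_engine.weight_sync_backend`); only the env var name and value come from the snippet above.

```python
from typing import Dict


def build_env_vars(weight_sync_backend: str, legacy_override: bool) -> Dict[str, str]:
    """Hypothetical stand-in for the env-var portion of prepare_runtime_environment.

    legacy_override=True mimics the old behavior (force NCCL_CUMEM_ENABLE=0 for
    the NCCL weight-sync backend); legacy_override=False mimics this PR, where
    the variable is left unset and NCCL falls back to its own default.
    """
    env_vars: Dict[str, str] = {}
    if legacy_override and weight_sync_backend == "nccl":
        env_vars["NCCL_CUMEM_ENABLE"] = "0"
    return env_vars
```

With `legacy_override=False`, nothing is exported for this variable, so the behavior is whatever the bundled NCCL version does by default.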

The NCCL bug that required this was resolved in NCCL 2.22.3, and this override was removed from vLLM:

NCCL: NVIDIA/nccl#1234
vLLM: vllm-project/vllm#24141

Since PyTorch ships its own NCCL build, and we are pinned to PyTorch 2.10.0 (NCCL 2.26.2), which already includes the fix, it seems safe to remove this env var. It was only needed as a workaround for older NCCL versions.
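
The version reasoning above can be sanity-checked with a small sketch like the one below. The helper names are hypothetical and not part of SkyRL; the threshold `(2, 22, 3)` is the fix version quoted in this description. It assumes `torch.cuda.nccl.version()` returns a version tuple, as it does in recent PyTorch releases, and returns `None` when that is not the case (for example on CPU-only builds).

```python
from typing import Optional, Tuple

# NCCL version in which, per this PR description, the bug was resolved.
FIXED_NCCL_VERSION = (2, 22, 3)


def nccl_includes_fix(version: Tuple[int, ...]) -> bool:
    """Return True if the given NCCL version is at or above the fix version."""
    return tuple(version) >= FIXED_NCCL_VERSION


def bundled_nccl_version() -> Optional[Tuple[int, ...]]:
    """Best-effort lookup of the NCCL version bundled with PyTorch.

    Returns None if PyTorch is missing, was built without NCCL, or reports
    the version in a non-tuple format.
    """
    try:
        import torch

        return tuple(torch.cuda.nccl.version())
    except Exception:
        return None


if __name__ == "__main__":
    version = bundled_nccl_version()
    if version is None:
        print("Could not determine the bundled NCCL version")
    else:
        print(version, "includes fix:", nccl_includes_fix(version))
```

Tuple comparison is used rather than string comparison so that, for example, 2.9.x does not sort above 2.22.x.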

In fact, Nemo-RL sets this env var to 1 (link).

Verified that GSM8K still works with this flag removed, in both colocated and non-colocated setups, and with vLLM tp=2.

We see that max GPU memory utilization is also slightly lower with this env var removed, since removing it enables the memory optimizations in newer NCCL versions (as mentioned in vllm-project/vllm#24141).

devin-ai-integration

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 1 additional finding.

gemini-code-assist

Code Review

This pull request removes the logic that sets the NCCL_CUMEM_ENABLE environment variable to '0' when the NCCL weight synchronization backend is used during runtime environment preparation. I have no feedback to provide as there are no review comments.

7d28afa

devin-ai-integration Bot reviewed Apr 30, 2026

SumanthRH approved these changes Apr 30, 2026

SumanthRH merged commit f6a61a8 into NovaSky-AI:main Apr 30, 2026
5 of 6 checks passed

gemini-code-assist Bot reviewed Apr 30, 2026

erictang000 mentioned this pull request May 7, 2026

[vllm] Investigate whether NCCL_CUMEM_ENABLE should be set to 0 #1630

Open

SumanthRH merged 1 commit into NovaSky-AI:main from erictang000:remove_nccl_cumem_enable

2 participants