set NCCL_SHM_DISABLE=1 for test_parallel_executor_profilery.py #28484

Merged: 1 commit, Nov 9, 2020
@@ -19,6 +19,14 @@
import paddle.fluid as fluid
import paddle.fluid.core as core
from paddle.fluid.tests.unittests.test_profiler import TestProfiler
import os

# NCCL 2.7 uses the shared memory (SHM) transport where NCCL 2.6 did not, which causes this error:
# include/shm.h:28 NCCL WARN Call to posix_fallocate failed: No space left on device
#
# Set the environment variable NCCL_SHM_DISABLE=1 to disable the SHM transport
# and force NCCL to use P2P, the default transport in NCCL 2.6.
os.environ['NCCL_SHM_DISABLE'] = '1'


class TestPEProfiler(TestProfiler):
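The change above disables SHM unconditionally for this test. If the failure only shows up on machines with little free shared memory (for example, containers with a small /dev/shm), one could instead disable SHM conditionally. The sketch below is hypothetical: `maybe_disable_nccl_shm`, the `/dev/shm` path, and the 1 GiB threshold are illustrative assumptions, not part of this PR, Paddle, or NCCL. It assumes NCCL backs its SHM segments with files under /dev/shm, which is consistent with the `posix_fallocate` failure quoted in the comment.

```python
import os
import shutil


def maybe_disable_nccl_shm(env=None, shm_dir="/dev/shm", min_free_bytes=1 << 30):
    """Set NCCL_SHM_DISABLE=1 when shm_dir has less than min_free_bytes free.

    Hypothetical helper: the /dev/shm location and the 1 GiB default
    threshold are assumptions for illustration only.
    Returns True if SHM ends up disabled, False otherwise.
    """
    env = os.environ if env is None else env
    if "NCCL_SHM_DISABLE" in env:
        # Respect a value the user already set explicitly.
        return env["NCCL_SHM_DISABLE"] == "1"
    try:
        free = shutil.disk_usage(shm_dir).free
    except OSError:
        # shm_dir is missing or unreadable: fall back to P2P to be safe.
        free = 0
    if free < min_free_bytes:
        env["NCCL_SHM_DISABLE"] = "1"
        return True
    return False
```

Like the line in the diff, such a call would need to run before NCCL communicators are initialized (for example, at module import time), since NCCL reads its environment variables during initialization.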