Description
Hi, I’m running into compilation issues when trying to compile some of the release branches from source and build a wheel from them (a reproducible example is attached at the end).
The issues seem to be due to using clang++ as the nvcc host compiler. Release v2.7 is the last version that I am able to compile successfully with clang++. Releases v2.8 and v2.9 fail with the generic error ninja: build stopped: subcommand failed. I also got a few segmentation faults along the way.
Using GNU 13.3.0 as the nvcc host compiler works fine; however, in my build environment I should ideally use clang. I’m using Python 3.10, PyTorch 2.7.1 with CUDA 12.8.1, and clang-18.
Any ideas on what could be the issue, or how I can get more detailed logs on what is causing the ninja failures or segmentation faults? To my understanding, the -Wunused-command-line-argument warnings are just noise that is fine to ignore?
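One possible way to get more detail, sketched here under assumptions rather than as a confirmed procedure: save the --progress=plain build output to a file, pull out the command that ninja prints right after each FAILED: line, and re-run that command by hand with nvcc's --keep and --verbose options so that the intermediate files and the individual sub-commands become visible. The helper names (failing_commands, with_debug_flags) and the assumed log prefix format ("#<step> <seconds> ") are illustrative only and are not part of Transformer Engine, ninja, or Docker.

```python
import re

# Assumption: each log line carries the prefix that
# `docker build --progress=plain` adds, e.g. "#12 340.5 ".
_BUILDKIT_PREFIX = re.compile(r"^#\d+\s+[\d.]+\s?")


def failing_commands(log_text):
    """Return the command line that follows each ninja 'FAILED:' line."""
    lines = [_BUILDKIT_PREFIX.sub("", line) for line in log_text.splitlines()]
    commands = []
    for i, line in enumerate(lines[:-1]):
        if line.startswith("FAILED:"):
            # ninja prints the failing command on the line after 'FAILED:'.
            commands.append(lines[i + 1].strip())
    return commands


def with_debug_flags(command):
    """Append nvcc options that keep intermediates and echo sub-commands."""
    return f"{command} --keep --verbose"


if __name__ == "__main__":
    # Hypothetical, shortened log excerpt used only to demonstrate the helpers.
    sample = (
        "#12 340.5 FAILED: [code=139] CMakeFiles/te.dir/util/cast.cu.o\n"
        "#12 340.5 /usr/local/cuda/bin/nvcc -ccbin=/usr/bin/clang++ -c cast.cu -o cast.cu.o\n"
        "#12 340.5 Segmentation fault (core dumped)\n"
    )
    for cmd in failing_commands(sample):
        print(with_debug_flags(cmd))
```

Running the printed command inside the build container should show the sub-commands nvcc launches (host preprocessing through clang++, cicc, ptxas, and so on), which may help identify the stage that crashes.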
Here is part of the build output when setting MAX_JOBS=1:
#12 1.996 Re-run cmake no build system arguments
#12 3.186 -- The CUDA compiler identification is NVIDIA 12.8.93 with host compiler Clang 18.1.8
#12 3.344 -- The CXX compiler identification is Clang 18.1.8
#12 3.353 -- Detecting CUDA compiler ABI info
#12 5.849 -- Detecting CUDA compiler ABI info - done
#12 5.918 -- Check for working CUDA compiler: /usr/local/cuda/bin/nvcc - skipped
#12 5.922 -- Detecting CUDA compile features
#12 5.922 -- Detecting CUDA compile features - done
#12 5.941 -- Detecting CXX compiler ABI info
#12 6.059 -- Detecting CXX compiler ABI info - done
#12 6.073 -- Check for working CXX compiler: /usr/bin/clang++ - skipped
#12 6.074 -- Detecting CXX compile features
#12 6.075 -- Detecting CXX compile features - done
#12 6.089 -- Found CUDAToolkit: /usr/local/cuda/targets/x86_64-linux/include (found version "12.8.93")
#12 6.090 -- Performing Test CMAKE_HAVE_LIBC_PTHREAD
#12 6.212 -- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
#12 6.213 -- Found Threads: TRUE
#12 6.312 -- cudnn found at /usr/lib/x86_64-linux-gnu/libcudnn.so.
#12 6.312 -- Found LIBRARY: /usr/local/lib/python3.10/dist-packages/nvidia/cudnn/include
#12 6.312 -- cuDNN: /usr/lib/x86_64-linux-gnu/libcudnn.so
#12 6.312 -- cuDNN: /usr/local/lib/python3.10/dist-packages/nvidia/cudnn/include
#12 6.314 -- cudnn_graph found at /usr/lib/x86_64-linux-gnu/libcudnn_graph.so.
#12 6.315 -- cudnn_engines_runtime_compiled found at /usr/lib/x86_64-linux-gnu/libcudnn_engines_runtime_compiled.so.
#12 6.317 -- cudnn_ops found at /usr/lib/x86_64-linux-gnu/libcudnn_ops.so.
#12 6.318 -- cudnn_cnn found at /usr/lib/x86_64-linux-gnu/libcudnn_cnn.so.
#12 6.319 -- cudnn_adv found at /usr/lib/x86_64-linux-gnu/libcudnn_adv.so.
#12 6.321 -- cudnn_engines_precompiled found at /usr/lib/x86_64-linux-gnu/libcudnn_engines_precompiled.so.
#12 6.322 -- cudnn_heuristic found at /usr/lib/x86_64-linux-gnu/libcudnn_heuristic.so.
#12 6.419 -- Found Python: /usr/bin/python3.10 (found version "3.10.19") found components: Interpreter Development.Module
#12 6.691 -- Parallel build jobs: max
#12 6.691 -- Threads per parallel build job: 1
#12 6.691 -- Configuring done (4.7s)
#12 6.711 CMake Warning:
#12 6.711 Manually-specified variables were not used by the project:
#12 6.711
#12 6.711 pybind11_DIR
#12 6.711
#12 6.711
#12 6.711 -- Generating done (0.0s)
#12 6.712 -- Build files have been written to: /build/transformer_engine/build/cmake
#12 6.721 Change Dir: '/build/transformer_engine/build/cmake'
#12 6.721
#12 6.721 Run Build Command(s): /usr/local/bin/ninja -v -j 1
#12 146.1 [1/67] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/usr/bin/clang++ -DNV_CUDNN_FRONTEND_USE_DYNAMIC_LOADING -Dtransformer_engine_EXPORTS -I/build/transformer_engine/transformer_engine/common/.. -I/build/transformer_engine/transformer_engine/common/include -I/usr/local/lib/python3.10/dist-packages/nvidia/mathdx/include -I/build/transformer_engine/transformer_engine/common/../../3rdparty/cudnn-frontend/include -I/build/transformer_engine/transformer_engine/common/../../3rdparty/cutlass/include -I/build/transformer_engine/transformer_engine/common/../../3rdparty/cutlass/tools/util/include -I/build/transformer_engine/build/cmake/string_headers -isystem /usr/local/cuda/targets/x86_64-linux/include -isystem /usr/local/cuda/targets/x86_64-linux/include/cccl -isystem /usr/local/lib/python3.10/dist-packages/nvidia/cudnn/include -Wl,--version-script=/build/transformer_engine/transformer_engine/common/libtransformer_engine.version --expt-relaxed-constexpr -O3 --threads 1 -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_70,code=[compute_70,sm_70]" "--generate-code=arch=compute_80,code=[compute_80,sm_80]" "--generate-code=arch=compute_89,code=[compute_89,sm_89]" "--generate-code=arch=compute_90,code=[compute_90,sm_90]" -Xcompiler=-fPIC --generate-code=arch=compute_100a,code=sm_100a --generate-code=arch=compute_120a,code=sm_120a --generate-code=arch=compute_90a,code=sm_90a -g0 -MD -MT CMakeFiles/transformer_engine.dir/gemm/cutlass_grouped_gemm.cu.o -MF CMakeFiles/transformer_engine.dir/gemm/cutlass_grouped_gemm.cu.o.d -x cu -c /build/transformer_engine/transformer_engine/common/gemm/cutlass_grouped_gemm.cu -o CMakeFiles/transformer_engine.dir/gemm/cutlass_grouped_gemm.cu.o
#12 146.1 nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
#12 146.1 clang++: warning: -Wl,--version-script=/build/transformer_engine/transformer_engine/common/libtransformer_engine.version: 'linker' input unused [-Wunused-command-line-argument]
#12 146.1 clang++: warning: -Wl,--version-script=/build/transformer_engine/transformer_engine/common/libtransformer_engine.version: 'linker' input unused [-Wunused-command-line-argument]
#12 146.1 clang++: warning: -Wl,--version-script=/build/transformer_engine/transformer_engine/common/libtransformer_engine.version: 'linker' input unused [-Wunused-command-line-argument]
#12 146.1 clang++: warning: -Wl,--version-script=/build/transformer_engine/transformer_engine/common/libtransformer_engine.version: 'linker' input unused [-Wunused-command-line-argument]
#12 146.1 clang++: warning: -Wl,--version-script=/build/transformer_engine/transformer_engine/common/libtransformer_engine.version: 'linker' input unused [-Wunused-command-line-argument]
#12 146.1 clang++: warning: -Wl,--version-script=/build/transformer_engine/transformer_engine/common/libtransformer_engine.version: 'linker' input unused [-Wunused-command-line-argument]
#12 146.1 clang++: warning: -Wl,--version-script=/build/transformer_engine/transformer_engine/common/libtransformer_engine.version: 'linker' input unused [-Wunused-command-line-argument]
#12 146.1 clang++: warning: -Wl,--version-script=/build/transformer_engine/transformer_engine/common/libtransformer_engine.version: 'linker' input unused [-Wunused-command-line-argument]
#12 146.1 ptxas /tmp/tmpxft_00000090_00000000-7_cutlass_grouped_gemm.compute_120a.ptx, line 4286; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_120a' as this feature is expected to have substantially reduced performance on some future architectures
#12 146.1 ptxas /tmp/tmpxft_00000090_00000000-7_cutlass_grouped_gemm.compute_120a.ptx, line 4308; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_120a' as this feature is expected to have substantially reduced performance on some future architectures
#12 146.1 ptxas /tmp/tmpxft_00000090_00000000-7_cutlass_grouped_gemm.compute_120a.ptx, line 11668; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_120a' as this feature is expected to have substantially reduced performance on some future architectures
#12 146.1 ptxas /tmp/tmpxft_00000090_00000000-7_cutlass_grouped_gemm.compute_120a.ptx, line 11689; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_120a' as this feature is expected to have substantially reduced performance on some future architectures
#12 146.1 ptxas /tmp/tmpxft_00000090_00000000-7_cutlass_grouped_gemm.compute_120a.ptx, line 19047; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_120a' as this feature is expected to have substantially reduced performance on some future architectures
#12 146.1 ptxas /tmp/tmpxft_00000090_00000000-7_cutlass_grouped_gemm.compute_120a.ptx, line 19069; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_120a' as this feature is expected to have substantially reduced performance on some future architectures
#12 146.1 ptxas /tmp/tmpxft_00000090_00000000-7_cutlass_grouped_gemm.compute_120a.ptx, line 26428; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_120a' as this feature is expected to have substantially reduced performance on some future architectures
#12 146.1 ptxas /tmp/tmpxft_00000090_00000000-7_cutlass_grouped_gemm.compute_120a.ptx, line 26450; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_120a' as this feature is expected to have substantially reduced performance on some future architectures
#12 146.1 ptxas /tmp/tmpxft_00000090_00000000-7_cutlass_grouped_gemm.compute_120a.ptx, line 33319; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_120a' as this feature is expected to have substantially reduced performance on some future architectures
#12 146.1 ptxas /tmp/tmpxft_00000090_00000000-7_cutlass_grouped_gemm.compute_120a.ptx, line 33340; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_120a' as this feature is expected to have substantially reduced performance on some future architectures
#12 146.1 ptxas /tmp/tmpxft_00000090_00000000-7_cutlass_grouped_gemm.compute_120a.ptx, line 40207; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_120a' as this feature is expected to have substantially reduced performance on some future architectures
#12 146.1 ptxas /tmp/tmpxft_00000090_00000000-7_cutlass_grouped_gemm.compute_120a.ptx, line 40229; warning : Advisory: '.multicast::cluster' modifier on instruction 'cp.async.bulk{.tensor}' should be used on .target 'sm_90a/sm_100a/sm_101a' instead of .target 'sm_120a' as this feature is expected to have substantially reduced performance on some future architectures
#12 146.1 ptxas info : (C7506) Potential Performance Loss: 'setmaxnreg' ignored to maintain compatibility into 'extern' call.
#12 146.1 ptxas info : (C7506) Potential Performance Loss: 'setmaxnreg' ignored to maintain compatibility into 'extern' call.
#12 146.1 ptxas info : (C7506) Potential Performance Loss: 'setmaxnreg' ignored to maintain compatibility into 'extern' call.
#12 146.1 ptxas info : (C7506) Potential Performance Loss: 'setmaxnreg' ignored to maintain compatibility into 'extern' call.
#12 146.1 ptxas info : (C7506) Potential Performance Loss: 'setmaxnreg' ignored to maintain compatibility into 'extern' call.
#12 146.1 ptxas info : (C7506) Potential Performance Loss: 'setmaxnreg' ignored to maintain compatibility into 'extern' call.
#12 146.1 clang++: warning: -Wl,--version-script=/build/transformer_engine/transformer_engine/common/libtransformer_engine.version: 'linker' input unused [-Wunused-command-line-argument]
#12 340.5 [2/67] /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/usr/bin/clang++ -DNV_CUDNN_FRONTEND_USE_DYNAMIC_LOADING -Dtransformer_engine_EXPORTS -I/build/transformer_engine/transformer_engine/common/.. -I/build/transformer_engine/transformer_engine/common/include -I/usr/local/lib/python3.10/dist-packages/nvidia/mathdx/include -I/build/transformer_engine/transformer_engine/common/../../3rdparty/cudnn-frontend/include -I/build/transformer_engine/transformer_engine/common/../../3rdparty/cutlass/include -I/build/transformer_engine/transformer_engine/common/../../3rdparty/cutlass/tools/util/include -I/build/transformer_engine/build/cmake/string_headers -isystem /usr/local/cuda/targets/x86_64-linux/include -isystem /usr/local/cuda/targets/x86_64-linux/include/cccl -isystem /usr/local/lib/python3.10/dist-packages/nvidia/cudnn/include -Wl,--version-script=/build/transformer_engine/transformer_engine/common/libtransformer_engine.version --expt-relaxed-constexpr -O3 --threads 1 -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_70,code=[compute_70,sm_70]" "--generate-code=arch=compute_80,code=[compute_80,sm_80]" "--generate-code=arch=compute_89,code=[compute_89,sm_89]" "--generate-code=arch=compute_90,code=[compute_90,sm_90]" -Xcompiler=-fPIC --generate-code=arch=compute_100a,code=sm_100a --generate-code=arch=compute_120a,code=sm_120a -MD -MT CMakeFiles/transformer_engine.dir/util/cast.cu.o -MF CMakeFiles/transformer_engine.dir/util/cast.cu.o.d -x cu -c /build/transformer_engine/transformer_engine/common/util/cast.cu -o CMakeFiles/transformer_engine.dir/util/cast.cu.o
#12 340.5 FAILED: [code=139] CMakeFiles/transformer_engine.dir/util/cast.cu.o
#12 340.5 /usr/local/cuda/bin/nvcc -forward-unknown-to-host-compiler -ccbin=/usr/bin/clang++ -DNV_CUDNN_FRONTEND_USE_DYNAMIC_LOADING -Dtransformer_engine_EXPORTS -I/build/transformer_engine/transformer_engine/common/.. -I/build/transformer_engine/transformer_engine/common/include -I/usr/local/lib/python3.10/dist-packages/nvidia/mathdx/include -I/build/transformer_engine/transformer_engine/common/../../3rdparty/cudnn-frontend/include -I/build/transformer_engine/transformer_engine/common/../../3rdparty/cutlass/include -I/build/transformer_engine/transformer_engine/common/../../3rdparty/cutlass/tools/util/include -I/build/transformer_engine/build/cmake/string_headers -isystem /usr/local/cuda/targets/x86_64-linux/include -isystem /usr/local/cuda/targets/x86_64-linux/include/cccl -isystem /usr/local/lib/python3.10/dist-packages/nvidia/cudnn/include -Wl,--version-script=/build/transformer_engine/transformer_engine/common/libtransformer_engine.version --expt-relaxed-constexpr -O3 --threads 1 -O3 -DNDEBUG -std=c++17 "--generate-code=arch=compute_70,code=[compute_70,sm_70]" "--generate-code=arch=compute_80,code=[compute_80,sm_80]" "--generate-code=arch=compute_89,code=[compute_89,sm_89]" "--generate-code=arch=compute_90,code=[compute_90,sm_90]" -Xcompiler=-fPIC --generate-code=arch=compute_100a,code=sm_100a --generate-code=arch=compute_120a,code=sm_120a -MD -MT CMakeFiles/transformer_engine.dir/util/cast.cu.o -MF CMakeFiles/transformer_engine.dir/util/cast.cu.o.d -x cu -c /build/transformer_engine/transformer_engine/common/util/cast.cu -o CMakeFiles/transformer_engine.dir/util/cast.cu.o
#12 340.5 nvcc warning : Support for offline compilation for architectures prior to '<compute/sm/lto>_75' will be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning).
#12 340.5 clang++: warning: -Wl,--version-script=/build/transformer_engine/transformer_engine/common/libtransformer_engine.version: 'linker' input unused [-Wunused-command-line-argument]
#12 340.5 clang++: warning: -Wl,--version-script=/build/transformer_engine/transformer_engine/common/libtransformer_engine.version: 'linker' input unused [-Wunused-command-line-argument]
#12 340.5 clang++: warning: -Wl,--version-script=/build/transformer_engine/transformer_engine/common/libtransformer_engine.version: 'linker' input unused [-Wunused-command-line-argument]
#12 340.5 clang++: warning: -Wl,--version-script=/build/transformer_engine/transformer_engine/common/libtransformer_engine.version: 'linker' input unused [-Wunused-command-line-argument]
#12 340.5 clang++: warning: -Wl,--version-script=/build/transformer_engine/transformer_engine/common/libtransformer_engine.version: 'linker' input unused [-Wunused-command-line-argument]
#12 340.5 clang++: warning: -Wl,--version-script=/build/transformer_engine/transformer_engine/common/libtransformer_engine.version: 'linker' input unused [-Wunused-command-line-argument]
#12 340.5 clang++: warning: -Wl,--version-script=/build/transformer_engine/transformer_engine/common/libtransformer_engine.version: 'linker' input unused [-Wunused-command-line-argument]
#12 340.5 Segmentation fault (core dumped)
#12 340.5 ninja: build stopped: subcommand failed.
#12 340.5
#12 340.5 Traceback (most recent call last):
#12 340.5 File "/build/transformer_engine/build_tools/build_ext.py", line 89, in _build_cmake
#12 340.5 Building CMake extension transformer_engine
#12 340.5 Running command /usr/local/lib/python3.10/dist-packages/cmake/data/bin/cmake -S /build/transformer_engine/transformer_engine/common -B /build/transformer_engine/build/cmake -DPython_EXECUTABLE=/usr/bin/python3.10 -DPython_INCLUDE_DIR=/usr/include/python3.10 -DPython_SITEARCH=/usr/local/lib/python3.10/dist-packages -DCMAKE_BUILD_TYPE=Release -DCMAKE_INSTALL_PREFIX=/build/transformer_engine/build/lib.linux-x86_64-cpython-310 -DCMAKE_CUDA_ARCHITECTURES=70;80;89;90;100;120 -Dpybind11_DIR=/usr/local/lib/python3.10/dist-packages/pybind11/share/cmake/pybind11 -GNinja
#12 340.5 Running command /usr/local/lib/python3.10/dist-packages/cmake/data/bin/cmake --build /build/transformer_engine/build/cmake --verbose --parallel 1
#12 340.5 subprocess.run(command, cwd=build_dir, check=True)
#12 340.5 File "/usr/lib/python3.10/subprocess.py", line 526, in run
#12 340.5 raise CalledProcessError(retcode, process.args,
#12 340.5 subprocess.CalledProcessError: Command '['/usr/local/lib/python3.10/dist-packages/cmake/data/bin/cmake', '--build', '/build/transformer_engine/build/cmake', '--verbose', '--parallel', '1']' returned non-zero exit status 139.
#12 340.5
#12 340.5 During handling of the above exception, another exception occurred:
#12 340.5
#12 340.5 Traceback (most recent call last):
#12 340.5 File "/build/transformer_engine/setup.py", line 181, in <module>
#12 340.5 setuptools.setup(
#12 340.5 File "/usr/local/lib/python3.10/dist-packages/setuptools/__init__.py", line 115, in setup
#12 340.5 return distutils.core.setup(**attrs)
#12 340.5 File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/core.py", line 186, in setup
#12 340.5 return run_commands(dist)
#12 340.5 File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/core.py", line 202, in run_commands
#12 340.5 dist.run_commands()
#12 340.5 File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/dist.py", line 1002, in run_commands
#12 340.5 self.run_command(cmd)
#12 340.5 File "/usr/local/lib/python3.10/dist-packages/setuptools/dist.py", line 1102, in run_command
#12 340.5 super().run_command(command)
#12 340.5 File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/dist.py", line 1021, in run_command
#12 340.5 cmd_obj.run()
#12 340.5 File "/build/transformer_engine/setup.py", line 49, in run
#12 340.5 super().run()
#12 340.5 File "/usr/local/lib/python3.10/dist-packages/setuptools/command/bdist_wheel.py", line 370, in run
#12 340.5 self.run_command("build")
#12 340.5 File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/cmd.py", line 357, in run_command
#12 340.5 self.distribution.run_command(command)
#12 340.5 File "/usr/local/lib/python3.10/dist-packages/setuptools/dist.py", line 1102, in run_command
#12 340.5 super().run_command(command)
#12 340.5 File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/dist.py", line 1021, in run_command
#12 340.5 cmd_obj.run()
#12 340.5 File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/command/build.py", line 135, in run
#12 340.5 self.run_command(cmd_name)
#12 340.5 File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/cmd.py", line 357, in run_command
#12 340.5 self.distribution.run_command(command)
#12 340.5 File "/usr/local/lib/python3.10/dist-packages/setuptools/dist.py", line 1102, in run_command
#12 340.5 super().run_command(command)
#12 340.5 File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/dist.py", line 1021, in run_command
#12 340.5 cmd_obj.run()
#12 340.5 File "/build/transformer_engine/build_tools/build_ext.py", line 121, in run
#12 340.5 ext._build_cmake(
#12 340.5 File "/build/transformer_engine/build_tools/build_ext.py", line 91, in _build_cmake
#12 340.5 raise RuntimeError(f"Error when running CMake: {e}")
#12 340.5 RuntimeError: Error when running CMake: Command '['/usr/local/lib/python3.10/dist-packages/cmake/data/bin/cmake', '--build', '/build/transformer_engine/build/cmake', '--verbose', '--parallel', '1']' returned non-zero exit status 139.
#12 ERROR: process "/bin/sh -c MAX_JOBS=1 python3.10 setup.py bdist_wheel" did not complete successfully: exit code: 1
------
> [9/9] RUN MAX_JOBS=1 python3.10 setup.py bdist_wheel:
340.5 self.distribution.run_command(command)
340.5 File "/usr/local/lib/python3.10/dist-packages/setuptools/dist.py", line 1102, in run_command
340.5 super().run_command(command)
340.5 File "/usr/local/lib/python3.10/dist-packages/setuptools/_distutils/dist.py", line 1021, in run_command
340.5 cmd_obj.run()
340.5 File "/build/transformer_engine/build_tools/build_ext.py", line 121, in run
340.5 ext._build_cmake(
340.5 File "/build/transformer_engine/build_tools/build_ext.py", line 91, in _build_cmake
340.5 raise RuntimeError(f"Error when running CMake: {e}")
340.5 RuntimeError: Error when running CMake: Command '['/usr/local/lib/python3.10/dist-packages/cmake/data/bin/cmake', '--build', '/build/transformer_engine/build/cmake', '--verbose', '--parallel', '1']' returned non-zero exit status 139.
------
Dockerfile:73
--------------------
71 | ENV NVCC_VERBOSE=1
72 |
73 | >>> RUN MAX_JOBS=1 python3.10 setup.py bdist_wheel
74 |
--------------------
ERROR: failed to solve: process "/bin/sh -c MAX_JOBS=1 python3.10 setup.py bdist_wheel" did not complete successfully: exit code: 1
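For reading the log above: the exit status 139 matches the common shell convention of 128 plus the signal number, and signal 11 is SIGSEGV on Linux, which agrees with the "Segmentation fault (core dumped)" line. The snippet below is only a small sketch of that arithmetic; describe_exit_status is a hypothetical helper name, and the 128 + N convention is an assumption about how the status was propagated through nvcc, ninja, and cmake.

```python
import signal


def describe_exit_status(status):
    """Interpret a shell-style exit status; values above 128 suggest a signal."""
    if status > 128:
        try:
            # 128 + N is the conventional status for "terminated by signal N".
            return signal.Signals(status - 128).name
        except ValueError:
            return f"unknown signal {status - 128}"
    return f"exit code {status}"


print(describe_exit_status(139))  # SIGSEGV on Linux
```

Under that reading, the crash happens in one of the processes nvcc launches for cast.cu rather than being an ordinary compile error, which would explain why no compiler diagnostic precedes it.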
Below is a reproducible example. I’m building the Docker image with:
DOCKER_BUILDKIT=1 docker build --progress=plain -t te29:pt271-cu128-clang18 -f Dockerfile .

Dockerfile:
FROM nvidia/cuda:12.8.1-devel-ubuntu24.04
RUN apt-get update && apt-get install -y \
libcudnn9-dev-cuda-12=9.8.0.87-1 \
libcudnn9-cuda-12=9.8.0.87-1 \
ninja-build \
build-essential \
software-properties-common \
curl \
git \
&& apt-get autoremove \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
# --- Install Python 3.10 from apt (Deadsnakes) ---
RUN add-apt-repository -y ppa:deadsnakes/ppa \
&& apt-get update \
&& apt-get install -y \
python3.10 \
python3.10-venv \
python3.10-distutils \
python3.10-dev \
&& python3.10 -m ensurepip --upgrade \
&& python3.10 -m pip install --upgrade pip setuptools wheel \
&& rm -rf /var/lib/apt/lists/*
ARG LLVM_VERSION=18
RUN curl -sLO https://apt.llvm.org/llvm-snapshot.gpg.key \
&& apt-key add llvm-snapshot.gpg.key \
# for UBUNTU_CODENAME
&& . /etc/os-release \
&& echo "deb http://apt.llvm.org/$UBUNTU_CODENAME/ llvm-toolchain-$UBUNTU_CODENAME-$LLVM_VERSION main" > /etc/apt/sources.list.d/clang-$LLVM_VERSION.list \
&& apt-get update -q \
&& apt-get install -y clang-$LLVM_VERSION libomp-$LLVM_VERSION-dev \
&& update-alternatives --install /usr/bin/clang clang /usr/bin/clang-$LLVM_VERSION 100 \
&& update-alternatives --install /usr/bin/clang++ clang++ /usr/bin/clang++-$LLVM_VERSION 100 \
&& update-alternatives --set clang /usr/bin/clang-$LLVM_VERSION \
&& update-alternatives --set clang++ /usr/bin/clang++-$LLVM_VERSION \
# Clean up
&& apt-get autoremove \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*
ENV CC="/usr/bin/clang"
ENV CXX="/usr/bin/clang++"
# using g++ as host compiler here works fine
ENV CUDAHOSTCXX="/usr/bin/clang++"
RUN mkdir -p /build/transformer_engine && cd /build/transformer_engine \
&& git clone --recursive https://github.com/NVIDIA/TransformerEngine.git -b v2.9 . \
&& git submodule update --init --recursive
WORKDIR /build/transformer_engine
RUN python3.10 -m pip install cmake pybind11[global] ninja setuptools wheel nvidia-mathdx==25.1.1
RUN python3.10 -m pip install torch==2.7.1 --index-url https://download.pytorch.org/whl/cu128
ENV NVTE_RELEASE_BUILD=1
ENV NVTE_FRAMEWORK=pytorch
ENV PATH="/usr/local/cuda/bin:${PATH}"
ENV LD_LIBRARY_PATH="/usr/local/cuda/lib64:${LD_LIBRARY_PATH}"
ENV CUDA_HOME=/usr/local/cuda
ENV CUDA_ROOT=/usr/local/cuda
ENV CUDA_PATH=/usr/local/cuda
ENV CUDADIR=/usr/local/cuda
ENV VERBOSE=1
ENV CMAKE_VERBOSE_MAKEFILE=ON
ENV NVCC_VERBOSE=1
RUN MAX_JOBS=1 python3.10 setup.py bdist_wheel