Describe the bug
pip install transformer_engine[pytorch] fails with the following error when building a Docker image (a header is reported missing in multiple build steps):
Building wheels for collected packages: transformer_engine_torch
Building wheel for transformer_engine_torch (pyproject.toml) ... error
error: subprocess-exited-with-error
× Building wheel for transformer_engine_torch (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [279 lines of output]
/tmp/pip-build-env-i3pp4n6r/overlay/lib/python3.12/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
import pynvml # type: ignore[import]
/tmp/pip-build-env-i3pp4n6r/overlay/lib/python3.12/site-packages/torch/_subclasses/functional_tensor.py:279: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:81.)
cpu = _conversion_method_template(device=torch.device("cpu"))
/tmp/pip-build-env-i3pp4n6r/overlay/lib/python3.12/site-packages/setuptools/_distutils/dist.py:289: UserWarning: Unknown distribution option: 'tests_require'
warnings.warn(msg)
running bdist_wheel
running build
running build_ext
building 'transformer_engine_torch' extension
creating /tmp/pip-install-3ry5vkew/transformer-engine-torch_bc4182522c604a088997468859cd48fd/build/temp.linux-aarch64-cpython-312/csrc
creating /tmp/pip-install-3ry5vkew/transformer-engine-torch_bc4182522c604a088997468859cd48fd/build/temp.linux-aarch64-cpython-312/csrc/extensions
creating /tmp/pip-install-3ry5vkew/transformer-engine-torch_bc4182522c604a088997468859cd48fd/build/temp.linux-aarch64-cpython-312/csrc/extensions/multi_tensor
[1/27] c++ -MMD -MF /tmp/pip-install-3ry5vkew/transformer-engine-torch_bc4182522c604a088997468859cd48fd/build/temp.linux-aarch64-cpython-312/csrc/extensions/apply_rope.o.d -fno-strict-overflow -Wsign-compare -DNDEBUG -g -O2 -Wall -fPIC -I/usr/local/cuda/include -I/tmp/pip-install-3ry5vkew/transformer-engine-torch_bc4182522c604a088997468859cd48fd/common_headers -I/tmp/pip-install-3ry5vkew/transformer-engine-torch_bc4182522c604a088997468859cd48fd/common_headers/common -I/tmp/pip-install-3ry5vkew/transformer-engine-torch_bc4182522c604a088997468859cd48fd/common_headers/common/include -I/tmp/pip-install-3ry5vkew/transformer-engine-torch_bc4182522c604a088997468859cd48fd/csrc -I/tmp/pip-build-env-i3pp4n6r/overlay/lib/python3.12/site-packages/torch/include -I/tmp/pip-build-env-i3pp4n6r/overlay/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/opt/venv/include -I/usr/include/python3.12 -c -c /tmp/pip-install-3ry5vkew/transformer-engine-torch_bc4182522c604a088997468859cd48fd/csrc/extensions/apply_rope.cpp -o /tmp/pip-install-3ry5vkew/transformer-engine-torch_bc4182522c604a088997468859cd48fd/build/temp.linux-aarch64-cpython-312/csrc/extensions/apply_rope.o -O3 -fvisibility=hidden -g0 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1018"' -DTORCH_EXTENSION_NAME=transformer_engine_torch -std=c++17
FAILED: [code=1] /tmp/pip-install-3ry5vkew/transformer-engine-torch_bc4182522c604a088997468859cd48fd/build/temp.linux-aarch64-cpython-312/csrc/extensions/apply_rope.o
c++ -MMD -MF /tmp/pip-install-3ry5vkew/transformer-engine-torch_bc4182522c604a088997468859cd48fd/build/temp.linux-aarch64-cpython-312/csrc/extensions/apply_rope.o.d -fno-strict-overflow -Wsign-compare -DNDEBUG -g -O2 -Wall -fPIC -I/usr/local/cuda/include -I/tmp/pip-install-3ry5vkew/transformer-engine-torch_bc4182522c604a088997468859cd48fd/common_headers -I/tmp/pip-install-3ry5vkew/transformer-engine-torch_bc4182522c604a088997468859cd48fd/common_headers/common -I/tmp/pip-install-3ry5vkew/transformer-engine-torch_bc4182522c604a088997468859cd48fd/common_headers/common/include -I/tmp/pip-install-3ry5vkew/transformer-engine-torch_bc4182522c604a088997468859cd48fd/csrc -I/tmp/pip-build-env-i3pp4n6r/overlay/lib/python3.12/site-packages/torch/include -I/tmp/pip-build-env-i3pp4n6r/overlay/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/opt/venv/include -I/usr/include/python3.12 -c -c /tmp/pip-install-3ry5vkew/transformer-engine-torch_bc4182522c604a088997468859cd48fd/csrc/extensions/apply_rope.cpp -o /tmp/pip-install-3ry5vkew/transformer-engine-torch_bc4182522c604a088997468859cd48fd/build/temp.linux-aarch64-cpython-312/csrc/extensions/apply_rope.o -O3 -fvisibility=hidden -g0 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1018"' -DTORCH_EXTENSION_NAME=transformer_engine_torch -std=c++17
In file included from /tmp/pip-build-env-i3pp4n6r/overlay/lib/python3.12/site-packages/torch/include/c10/cuda/CUDADeviceAssertionHost.h:3,
from /tmp/pip-build-env-i3pp4n6r/overlay/lib/python3.12/site-packages/torch/include/c10/cuda/CUDAException.h:3,
from /tmp/pip-build-env-i3pp4n6r/overlay/lib/python3.12/site-packages/torch/include/c10/cuda/CUDAFunctions.h:12,
from /tmp/pip-build-env-i3pp4n6r/overlay/lib/python3.12/site-packages/torch/include/ATen/cuda/CUDAContextLight.h:28,
from /tmp/pip-build-env-i3pp4n6r/overlay/lib/python3.12/site-packages/torch/include/ATen/cuda/CUDAContext.h:3,
from /tmp/pip-install-3ry5vkew/transformer-engine-torch_bc4182522c604a088997468859cd48fd/csrc/extensions/../common.h:12,
from /tmp/pip-install-3ry5vkew/transformer-engine-torch_bc4182522c604a088997468859cd48fd/csrc/extensions/../extensions.h:12,
from /tmp/pip-install-3ry5vkew/transformer-engine-torch_bc4182522c604a088997468859cd48fd/csrc/extensions/apply_rope.cpp:7:
/tmp/pip-build-env-i3pp4n6r/overlay/lib/python3.12/site-packages/torch/include/c10/cuda/CUDAMacros.h:8:10: fatal error: c10/cuda/impl/cuda_cmake_macros.h: No such file or directory
8 | #include <c10/cuda/impl/cuda_cmake_macros.h>
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
Steps/Code to reproduce bug
Run:
pip install transformer_engine[pytorch]
Expected behavior
Installation succeeds.
Environment overview (please complete the following information)
- Environment location: Docker
- Method of Transformer Engine install: pip install
- Base image: nvidia/cuda:12.9.1-cudnn-devel-ubuntu24.04
- CUDA 12.9.1
- Python 3.12
- Torch 2.8
Environment details
If an NVIDIA Docker image is used, you don't need to specify these.
Using nvidia/cuda:12.9.1-cudnn-devel-ubuntu24.04
Device details
- N/A: the failure occurs at the image build stage.
Additional context
The header files seem to exist, but the compiler fails to find them (see the last print):
root@cb2a83ae75c0:/opt/ml/code# python
Python 3.12.3 (main, Aug 14 2025, 17:47:21) [GCC 13.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch, pathlib, sys
>>> print("python:", sys.version)
python: 3.12.3 (main, Aug 14 2025, 17:47:21) [GCC 13.3.0]
>>> print("torch:", torch.__version__, "cuda:", torch.version.cuda)
torch: 2.8.0+cu129 cuda: 12.9
>>> print("cuda_built:", torch.backends.cuda.is_built())
cuda_built: True
>>> inc = pathlib.Path(torch.__file__).parent/'include'
>>> print("torch include exists:", inc.exists())
torch include exists: True
>>> print("has c10/cuda dir:", (inc/'c10'/'cuda').exists())
has c10/cuda dir: True
>>> print("impl dir:", list((inc/'c10'/'cuda'/'impl').glob('*'))[:5] if (inc/'c10'/'cuda'/'impl').exists() else "missing")
impl dir: [PosixPath('/opt/venv/lib/python3.12/site-packages/torch/include/c10/cuda/impl/CUDAGuardImpl.h'), PosixPath('/opt/venv/lib/python3.12/site-packages/torch/include/c10/cuda/impl/cuda_cmake_macros.h'), PosixPath('/opt/venv/lib/python3.12/site-packages/torch/include/c10/cuda/impl/CUDATest.h')]
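The session above finds cuda_cmake_macros.h under /opt/venv/lib/python3.12/site-packages/torch/include. The failing compiler command in the log, however, only passes -I flags for the torch copy under /tmp/pip-build-env-i3pp4n6r/overlay (plus /opt/venv/include, which is not the torch include directory). The helper below is a minimal sketch for checking that mismatch directly. The names include_dirs and locate_header are hypothetical and not part of pip, PyTorch, or Transformer Engine. It pulls the -I directories out of a compile command and reports which of them actually contain the header. It only handles -I (not -isystem or similar flags).

```python
from __future__ import annotations

import shlex
from pathlib import Path

# Header the compiler reported as missing in the build log.
HEADER = "c10/cuda/impl/cuda_cmake_macros.h"


def include_dirs(compile_cmd: str) -> list[str]:
    """Return the -I include directories of a compiler command line, in order.

    Handles both the joined form (-I/path) and the split form (-I /path).
    Other flags such as -isystem are deliberately ignored in this sketch.
    """
    tokens = shlex.split(compile_cmd)
    dirs: list[str] = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "-I" and i + 1 < len(tokens):
            # Split form: the directory is the next token.
            dirs.append(tokens[i + 1])
            i += 2
            continue
        if tok.startswith("-I") and len(tok) > 2:
            # Joined form: strip the leading "-I".
            dirs.append(tok[2:])
        i += 1
    return dirs


def locate_header(compile_cmd: str, header: str = HEADER) -> dict[str, bool]:
    """Map each -I directory to whether it contains `header` as a regular file."""
    return {d: (Path(d) / header).is_file() for d in include_dirs(compile_cmd)}
```

For example, passing the failing c++ command from the log to locate_header and comparing the result with the /opt/venv torch include directory would show which torch installation the extension is actually compiled against. Note that pip typically cleans up the /tmp/pip-build-env-* and /tmp/pip-install-* directories once it exits, so the paths from the log are likely gone after the failed build.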