
transformer_engine_torch build fails when building a Docker image #2175

@astan-iq

Describe the bug

pip install transformer_engine[pytorch] fails with the following error when building a Docker image (the same header is reported missing in multiple compilation steps):

Building wheels for collected packages: transformer_engine_torch
  Building wheel for transformer_engine_torch (pyproject.toml) ... error
  error: subprocess-exited-with-error
  
  × Building wheel for transformer_engine_torch (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [279 lines of output]
      /tmp/pip-build-env-i3pp4n6r/overlay/lib/python3.12/site-packages/torch/cuda/__init__.py:63: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
        import pynvml  # type: ignore[import]
      /tmp/pip-build-env-i3pp4n6r/overlay/lib/python3.12/site-packages/torch/_subclasses/functional_tensor.py:279: UserWarning: Failed to initialize NumPy: No module named 'numpy' (Triggered internally at /pytorch/torch/csrc/utils/tensor_numpy.cpp:81.)
        cpu = _conversion_method_template(device=torch.device("cpu"))
      /tmp/pip-build-env-i3pp4n6r/overlay/lib/python3.12/site-packages/setuptools/_distutils/dist.py:289: UserWarning: Unknown distribution option: 'tests_require'
        warnings.warn(msg)
      running bdist_wheel
      running build
      running build_ext
      building 'transformer_engine_torch' extension
      creating /tmp/pip-install-3ry5vkew/transformer-engine-torch_bc4182522c604a088997468859cd48fd/build/temp.linux-aarch64-cpython-312/csrc
      creating /tmp/pip-install-3ry5vkew/transformer-engine-torch_bc4182522c604a088997468859cd48fd/build/temp.linux-aarch64-cpython-312/csrc/extensions
      creating /tmp/pip-install-3ry5vkew/transformer-engine-torch_bc4182522c604a088997468859cd48fd/build/temp.linux-aarch64-cpython-312/csrc/extensions/multi_tensor
      [1/27] c++ -MMD -MF /tmp/pip-install-3ry5vkew/transformer-engine-torch_bc4182522c604a088997468859cd48fd/build/temp.linux-aarch64-cpython-312/csrc/extensions/apply_rope.o.d -fno-strict-overflow -Wsign-compare -DNDEBUG -g -O2 -Wall -fPIC -I/usr/local/cuda/include -I/tmp/pip-install-3ry5vkew/transformer-engine-torch_bc4182522c604a088997468859cd48fd/common_headers -I/tmp/pip-install-3ry5vkew/transformer-engine-torch_bc4182522c604a088997468859cd48fd/common_headers/common -I/tmp/pip-install-3ry5vkew/transformer-engine-torch_bc4182522c604a088997468859cd48fd/common_headers/common/include -I/tmp/pip-install-3ry5vkew/transformer-engine-torch_bc4182522c604a088997468859cd48fd/csrc -I/tmp/pip-build-env-i3pp4n6r/overlay/lib/python3.12/site-packages/torch/include -I/tmp/pip-build-env-i3pp4n6r/overlay/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/opt/venv/include -I/usr/include/python3.12 -c -c /tmp/pip-install-3ry5vkew/transformer-engine-torch_bc4182522c604a088997468859cd48fd/csrc/extensions/apply_rope.cpp -o /tmp/pip-install-3ry5vkew/transformer-engine-torch_bc4182522c604a088997468859cd48fd/build/temp.linux-aarch64-cpython-312/csrc/extensions/apply_rope.o -O3 -fvisibility=hidden -g0 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1018"' -DTORCH_EXTENSION_NAME=transformer_engine_torch -std=c++17
      FAILED: [code=1] /tmp/pip-install-3ry5vkew/transformer-engine-torch_bc4182522c604a088997468859cd48fd/build/temp.linux-aarch64-cpython-312/csrc/extensions/apply_rope.o
      c++ -MMD -MF /tmp/pip-install-3ry5vkew/transformer-engine-torch_bc4182522c604a088997468859cd48fd/build/temp.linux-aarch64-cpython-312/csrc/extensions/apply_rope.o.d -fno-strict-overflow -Wsign-compare -DNDEBUG -g -O2 -Wall -fPIC -I/usr/local/cuda/include -I/tmp/pip-install-3ry5vkew/transformer-engine-torch_bc4182522c604a088997468859cd48fd/common_headers -I/tmp/pip-install-3ry5vkew/transformer-engine-torch_bc4182522c604a088997468859cd48fd/common_headers/common -I/tmp/pip-install-3ry5vkew/transformer-engine-torch_bc4182522c604a088997468859cd48fd/common_headers/common/include -I/tmp/pip-install-3ry5vkew/transformer-engine-torch_bc4182522c604a088997468859cd48fd/csrc -I/tmp/pip-build-env-i3pp4n6r/overlay/lib/python3.12/site-packages/torch/include -I/tmp/pip-build-env-i3pp4n6r/overlay/lib/python3.12/site-packages/torch/include/torch/csrc/api/include -I/opt/venv/include -I/usr/include/python3.12 -c -c /tmp/pip-install-3ry5vkew/transformer-engine-torch_bc4182522c604a088997468859cd48fd/csrc/extensions/apply_rope.cpp -o /tmp/pip-install-3ry5vkew/transformer-engine-torch_bc4182522c604a088997468859cd48fd/build/temp.linux-aarch64-cpython-312/csrc/extensions/apply_rope.o -O3 -fvisibility=hidden -g0 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1018"' -DTORCH_EXTENSION_NAME=transformer_engine_torch -std=c++17
      In file included from /tmp/pip-build-env-i3pp4n6r/overlay/lib/python3.12/site-packages/torch/include/c10/cuda/CUDADeviceAssertionHost.h:3,
                       from /tmp/pip-build-env-i3pp4n6r/overlay/lib/python3.12/site-packages/torch/include/c10/cuda/CUDAException.h:3,
                       from /tmp/pip-build-env-i3pp4n6r/overlay/lib/python3.12/site-packages/torch/include/c10/cuda/CUDAFunctions.h:12,
                       from /tmp/pip-build-env-i3pp4n6r/overlay/lib/python3.12/site-packages/torch/include/ATen/cuda/CUDAContextLight.h:28,
                       from /tmp/pip-build-env-i3pp4n6r/overlay/lib/python3.12/site-packages/torch/include/ATen/cuda/CUDAContext.h:3,
                       from /tmp/pip-install-3ry5vkew/transformer-engine-torch_bc4182522c604a088997468859cd48fd/csrc/extensions/../common.h:12,
                       from /tmp/pip-install-3ry5vkew/transformer-engine-torch_bc4182522c604a088997468859cd48fd/csrc/extensions/../extensions.h:12,
                       from /tmp/pip-install-3ry5vkew/transformer-engine-torch_bc4182522c604a088997468859cd48fd/csrc/extensions/apply_rope.cpp:7:
      /tmp/pip-build-env-i3pp4n6r/overlay/lib/python3.12/site-packages/torch/include/c10/cuda/CUDAMacros.h:8:10: fatal error: c10/cuda/impl/cuda_cmake_macros.h: No such file or directory
          8 | #include <c10/cuda/impl/cuda_cmake_macros.h>
            |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      compilation terminated.

Steps/Code to reproduce bug

Run:
pip install transformer_engine[pytorch]

Expected behavior

Installation succeeds.

Environment overview

  • Environment location: Docker
  • Method of Transformer Engine install: pip install
  • Base image: nvidia/cuda:12.9.1-cudnn-devel-ubuntu24.04
  • CUDA 12.9.1
  • Python 3.12
  • Torch 2.8 (2.8.0+cu129)
  • Architecture: aarch64 (linux-aarch64 in the build paths above)

Environment details

Using nvidia/cuda:12.9.1-cudnn-devel-ubuntu24.04

Device details

  • N/A (the failure occurs at the image build stage).

Additional context

The header files appear to exist, but the compiler fails to find them (see the last printed line):

root@cb2a83ae75c0:/opt/ml/code# python
Python 3.12.3 (main, Aug 14 2025, 17:47:21) [GCC 13.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch, pathlib, sys
>>> print("python:", sys.version)
python: 3.12.3 (main, Aug 14 2025, 17:47:21) [GCC 13.3.0]
>>> print("torch:", torch.__version__, "cuda:", torch.version.cuda)
torch: 2.8.0+cu129 cuda: 12.9
>>> print("cuda_built:", torch.backends.cuda.is_built())
cuda_built: True
>>> inc = pathlib.Path(torch.__file__).parent/'include'
>>> print("torch include exists:", inc.exists())
torch include exists: True
>>> print("has c10/cuda dir:", (inc/'c10'/'cuda').exists())
has c10/cuda dir: True
>>> print("impl dir:", list((inc/'c10'/'cuda'/'impl').glob('*'))[:5] if (inc/'c10'/'cuda'/'impl').exists() else "missing")
impl dir: [PosixPath('/opt/venv/lib/python3.12/site-packages/torch/include/c10/cuda/impl/CUDAGuardImpl.h'), PosixPath('/opt/venv/lib/python3.12/site-packages/torch/include/c10/cuda/impl/cuda_cmake_macros.h'), PosixPath('/opt/venv/lib/python3.12/site-packages/torch/include/c10/cuda/impl/CUDATest.h')]
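Note that the check above inspects /opt/venv/lib/python3.12/site-packages/torch/include, while the -I flags in the failing compile command point at /tmp/pip-build-env-i3pp4n6r/overlay/lib/python3.12/site-packages/torch/include, which is a different directory. A small helper like the one sketched below (hypothetical, not part of Transformer Engine or pip) takes the compile line copied from the build log and reports which of its -I directories actually contain the missing header. It is only meaningful while the temporary build-env directory still exists, since pip typically cleans it up when it exits.

```python
import shlex
from pathlib import Path

# Header the compiler reported as missing in the build log.
MISSING_HEADER = "c10/cuda/impl/cuda_cmake_macros.h"


def include_dirs(compile_cmd: str) -> list[str]:
    """Collect directories passed via -I (both '-Idir' and '-I dir' forms)."""
    tokens = shlex.split(compile_cmd)  # respects the quoted -D flags in the log
    dirs = []
    for i, tok in enumerate(tokens):
        if tok == "-I" and i + 1 < len(tokens):
            dirs.append(tokens[i + 1])
        elif tok.startswith("-I") and len(tok) > 2:
            dirs.append(tok[2:])
    return dirs


def dirs_with_header(compile_cmd: str, header: str = MISSING_HEADER) -> dict[str, bool]:
    """Map each -I directory to whether it contains `header` as a regular file."""
    return {d: (Path(d) / header).is_file() for d in include_dirs(compile_cmd)}
```

For example, save the failing c++ line to a file (the name compile_cmd.txt is arbitrary) and call dirs_with_header(Path("compile_cmd.txt").read_text()). A False for the build-env torch/include entry would suggest that the torch pip installed into its isolated build environment lacks the header, even though the /opt/venv copy has it.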

Labels: bug (Something isn't working)