PyTorch 2.9+cu128 wheel fails to import on systems with CUDA Toolkit <12.5 due to libcudart load-order #1330

@andrewmusselman

Description

Summary

On Linux systems where the installed CUDA Toolkit is older than 12.5 (e.g., Ubuntu 24.04's nvidia-cuda-toolkit package, which ships CUDA 12.4), import torch fails after a fresh uv sync --group dev --extra qdp of the RC2 environment:

ImportError: /…/site-packages/torch/lib/libc10_cuda.so: undefined symbol:
    cudaGetDriverEntryPointByVersion, version libcudart.so.12

cudaGetDriverEntryPointByVersion was added in CUDA 12.5. The PyTorch 2.9.0+cu128 wheel currently resolved by the project's lockfile requires this symbol, so any libcudart.so.12 older than 12.5 cannot satisfy it.
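
A quick way to confirm whether a given libcudart exports the symbol is to probe it with ctypes. The following is a minimal sketch, not part of the project: the helper name library_exports_symbol is hypothetical, and it assumes the system libcudart is discoverable by the dynamic loader. Note that the probe loads the library into the current process, so it is best run in a throwaway interpreter.

```python
import ctypes
import ctypes.util


def library_exports_symbol(library: str, symbol: str) -> bool:
    """Return True if `library` can be loaded and exports `symbol`.

    Hypothetical helper for illustration; `library` is a soname or a path.
    """
    try:
        handle = ctypes.CDLL(library)
    except OSError:
        # Library is missing or cannot be loaded.
        return False
    # ctypes looks the name up in the loaded library and raises
    # AttributeError when it is absent, which hasattr() turns into False.
    return hasattr(handle, symbol)


if __name__ == "__main__":
    # find_library returns None when no libcudart is visible to the loader.
    system_cudart = ctypes.util.find_library("cudart")
    if system_cudart is None:
        print("no system libcudart found")
    else:
        present = library_exports_symbol(
            system_cudart, "cudaGetDriverEntryPointByVersion"
        )
        print(f"{system_cudart}: symbol {'present' if present else 'missing'}")
```

Passing the full path of the bundled library under site-packages instead of the system one allows the two copies to be compared.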

Root cause

The PyTorch wheel bundles a compatible libcudart.so.12 (one that exports the symbol) at nvidia/cuda_runtime/lib/. However, the RUNPATH of libc10_cuda.so searches several other nvidia-* wheel directories first. LD_DEBUG=libs shows the loader finding the correct libcudart.so.12, calling its init, and then immediately calling its fini: the refcount drops to zero before libc10_cuda.so resolves the symbol. This is a fragility in PyTorch wheel packaging, not a Mahout code defect, but it blocks Mahout testing.

Reproducer

On Ubuntu 24.04 with nvidia-cuda-toolkit (12.4) installed:

git checkout mahout-qumat-0.6.0-RC2
uv sync --group dev --extra qdp
uv run python -c "import torch"   # fails

Workaround

Force-load the bundled cudart via LD_PRELOAD:

export LD_PRELOAD=$VIRTUAL_ENV/lib/python3.12/site-packages/nvidia/cuda_runtime/lib/libcudart.so.12
uv run python -c "import torch; print(torch.__version__)"   # works
uv run pytest -v                                            # works
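
The same workaround can be sketched in Python, which avoids hardcoding the python3.12 segment of the path. This is an illustrative sketch only: the function names bundled_cudart, env_with_preload, and run_with_preload are hypothetical, and the layout under site-packages is taken from the export line above.

```python
import os
import re
import subprocess
import sysconfig
from pathlib import Path


def bundled_cudart(site_packages: Path) -> Path:
    """Path of the libcudart shipped by the nvidia cuda_runtime wheel."""
    return site_packages / "nvidia" / "cuda_runtime" / "lib" / "libcudart.so.12"


def env_with_preload(env: dict, library: Path) -> dict:
    """Return a copy of `env` with `library` first in LD_PRELOAD."""
    new_env = dict(env)
    # ld.so accepts both spaces and colons as LD_PRELOAD separators.
    existing = re.split(r"[ :]+", new_env.get("LD_PRELOAD", ""))
    entries = [str(library)]
    entries += [e for e in existing if e and e != str(library)]
    new_env["LD_PRELOAD"] = ":".join(entries)
    return new_env


def run_with_preload(argv: list) -> int:
    """Run `argv` with the bundled libcudart preloaded; return its exit code."""
    site_packages = Path(sysconfig.get_paths()["purelib"])
    library = bundled_cudart(site_packages)
    if not library.exists():
        raise FileNotFoundError(f"bundled libcudart not found at {library}")
    return subprocess.run(argv, env=env_with_preload(os.environ, library)).returncode
```

Run from inside the project environment, a call such as run_with_preload([sys.executable, "-c", "import torch"]) (after import sys) would then mirror the shell commands above.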

Suggested fixes (any one)

  • Document the workaround in qdp/DEVELOPMENT.md for users with CUDA Toolkit <12.5.
  • Document a minimum CUDA Toolkit version (12.5 or 12.6) in the project requirements.
  • Pin PyTorch to a wheel built against an older CUDA in qdp/qdp-python/pyproject.toml until the upstream wheel issue is resolved.
  • Wrap test entry points (Makefile targets, docs commands) to set LD_PRELOAD automatically when the system's libcudart lacks the symbol.
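
For the minimum-version option, a wrapper or setup script could compare the installed toolkit against 12.5 before running tests. A rough sketch follows, assuming nvcc is on PATH and prints its usual "release X.Y" line; the names MIN_TOOLKIT, parse_nvcc_release, toolkit_too_old, and installed_toolkit_version are hypothetical, and the 12.5 threshold comes from the symbol's introduction noted in the Summary.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple

# Assumed minimum, based on when cudaGetDriverEntryPointByVersion appeared.
MIN_TOOLKIT = (12, 5)


def parse_nvcc_release(output: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from `nvcc --version` output, or None."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def toolkit_too_old(version: Tuple[int, int],
                    minimum: Tuple[int, int] = MIN_TOOLKIT) -> bool:
    """True when `version` is strictly older than `minimum`."""
    return version < minimum


def installed_toolkit_version() -> Optional[Tuple[int, int]]:
    """Query nvcc for the toolkit version; None if nvcc is unavailable."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)
```

A caller would warn, or apply the LD_PRELOAD workaround, when installed_toolkit_version() returns a tuple for which toolkit_too_old is true.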

Environment

  • OS: Ubuntu 24.04
  • CUDA Toolkit: 12.4 (apt nvidia-cuda-toolkit)
  • GPU: NVIDIA GeForce GTX 1060 with Max-Q Design (sm_61); irrelevant to this issue, since the import fails before any GPU work
  • Python: 3.12.12 (uv-managed)
  • PyTorch: 2.9.0+cu128 (resolved by RC2 lockfile)
