Summary
On Linux systems where the installed CUDA Toolkit is older than 12.5 (e.g., Ubuntu 24.04's nvidia-cuda-toolkit package, which ships CUDA 12.4), import torch fails after a fresh uv sync --group dev --extra qdp of the RC2 environment:
ImportError: /…/site-packages/torch/lib/libc10_cuda.so: undefined symbol: cudaGetDriverEntryPointByVersion, version libcudart.so.12
cudaGetDriverEntryPointByVersion was added in CUDA 12.5, and the PyTorch 2.9.0+cu128 wheel that the project's lockfile currently resolves requires it.
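To tell quickly whether a machine is in the affected range, the toolkit version can be compared against the 12.5 cutoff. This is a minimal sketch, assuming nvcc is on PATH and prints its usual "release X.Y" banner line, and that GNU sort -V is available; the function names are hypothetical helpers, not part of Mahout.

```shell
#!/bin/sh
# Hypothetical helper: is the CUDA Toolkit new enough (>= 12.5, per this issue)
# to provide cudaGetDriverEntryPointByVersion? Reads `nvcc --version` text on stdin.

# Print the "X.Y" release number from nvcc's banner,
# e.g. "Cuda compilation tools, release 12.4, V12.4.131" -> 12.4
nvcc_release() {
  sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Succeed if version $1 >= version $2 (relies on GNU sort -V ordering).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Succeed if the nvcc output on stdin reports release 12.5 or newer.
cuda_toolkit_at_least_12_5() {
  local rel
  rel="$(nvcc_release)"
  [ -n "$rel" ] && version_ge "$rel" 12.5
}
```

For example, nvcc --version | cuda_toolkit_at_least_12_5 || echo "CUDA Toolkit < 12.5: likely affected". This checks the toolkit release only; it says nothing about which libcudart the loader actually binds.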
Root cause
The PyTorch install ships a compatible libcudart.so.12 (which does export the symbol) at nvidia/cuda_runtime/lib/. However, the RUNPATH of libc10_cuda.so lists several other nvidia-* wheel directories before that one. LD_DEBUG=libs shows the loader finding the correct libcudart.so.12 and calling its init, then immediately calling its fini: its reference count drops to zero before libc10_cuda resolves the symbol, so the lookup presumably ends up in the system's CUDA 12.4 libcudart.so.12, which does not export it. This is a PyTorch wheel packaging fragility, not a Mahout code defect, but it blocks Mahout testing.
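The LD_DEBUG=libs trace is long; narrowing it to libcudart makes the init/fini sequence described above easy to spot. A minimal sketch, assuming glibc's usual "PID: find library=…", "trying file=…" and "calling init/fini: …" line formats; cudart_events is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical filter: narrow a glibc LD_DEBUG=libs trace (read on stdin) to
# libcudart events: the library search, files tried, and init/fini calls.
cudart_events() {
  # Drop the "     PID:" prefix glibc puts on every debug line,
  # then keep only libcudart lines of the interesting kinds.
  sed 's/^[[:space:]]*[0-9][0-9]*:[[:space:]]*//' \
    | grep -E 'libcudart\.so' \
    | grep -E '^(find library|trying file|calling (init|fini))'
}
```

Usage: LD_DEBUG=libs uv run python -c "import torch" 2>&1 | cudart_events. The loader writes its trace to stderr, hence 2>&1; uv's own startup is traced too but does not load libcudart.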
Reproducer
On Ubuntu 24.04 with nvidia-cuda-toolkit (12.4) installed:
git checkout mahout-qumat-0.6.0-RC2
uv sync --group dev --extra qdp
uv run python -c "import torch" # fails
Workaround
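Force-load the bundled cudart via LD_PRELOAD. VIRTUAL_ENV is only set in a shell where the environment is activated, so this falls back to uv's default .venv in the repository root:
export LD_PRELOAD="${VIRTUAL_ENV:-$PWD/.venv}/lib/python3.12/site-packages/nvidia/cuda_runtime/lib/libcudart.so.12"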
uv run python -c "import torch; print(torch.__version__)" # works
uv run pytest -v # works
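The export above hard-codes the python3.12 directory. A hypothetical helper, find_bundled_cudart, could locate the library instead; this sketch assumes the nvidia/cuda_runtime/lib layout reported in this issue and uv's default .venv location when no argument or VIRTUAL_ENV is given.

```shell
#!/bin/sh
# Hypothetical helper: print the absolute path of the libcudart.so.12 found under
# nvidia/cuda_runtime/lib in a virtual environment.
# Argument: venv directory; defaults to $VIRTUAL_ENV, else .venv (uv's default).
find_bundled_cudart() {
  local venv lib
  venv="${1:-${VIRTUAL_ENV:-.venv}}"
  # Use an absolute path so the preload still works if a process changes directory.
  case "$venv" in
    /*) ;;
    *) venv="$PWD/$venv" ;;
  esac
  # Glob over python3* instead of hard-coding python3.12.
  for lib in "$venv"/lib/python3*/site-packages/nvidia/cuda_runtime/lib/libcudart.so.12; do
    if [ -e "$lib" ]; then
      printf '%s\n' "$lib"
      return 0
    fi
  done
  echo "libcudart.so.12 not found under $venv" >&2
  return 1
}
```

Usage: lib="$(find_bundled_cudart)" && export LD_PRELOAD="$lib", then run uv run pytest -v as before.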
Suggested fixes (any one)
- Document the workaround in qdp/DEVELOPMENT.md for users with CUDA Toolkit < 12.5.
- Document a minimum CUDA Toolkit version (12.5 or 12.6) in the project requirements.
- Pin PyTorch to a wheel built against an older CUDA in qdp/qdp-python/pyproject.toml until the upstream wheel issue is resolved.
- Wrap test entry points (Makefile targets, docs commands) to set LD_PRELOAD automatically when the system's libcudart lacks the symbol.
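The last suggestion could be sketched as a small POSIX shell wrapper. nm_defines and run_with_cudart_fix are hypothetical names, the check relies on binutils' nm -D, and both library paths are passed in because they vary by distribution and environment.

```shell
#!/bin/sh
# Hypothetical wrapper for test entry points: preload the venv's libcudart only
# when the system libcudart does not define the symbol PyTorch needs.
CUDART_SYMBOL=cudaGetDriverEntryPointByVersion

# Succeed if `nm -D --defined-only` output on stdin defines symbol $1.
# Newer binutils print a version suffix (name@@libcudart.so.12); strip it first.
nm_defines() {
  awk -v sym="$1" '{ name = $NF; sub(/@.*/, "", name); if (name == sym) found = 1 }
                   END { exit !found }'
}

# Usage: run_with_cudart_fix SYSTEM_LIB BUNDLED_LIB command [args...]
run_with_cudart_fix() {
  local syslib="$1" bundled="$2"
  shift 2
  # Assumption: with no system libcudart there is nothing to shadow the bundled
  # one, so run unchanged. If nm is missing or fails, nothing counts as defined
  # and the bundled library is preloaded (the conservative choice).
  if [ -e "$syslib" ] && ! nm -D --defined-only "$syslib" 2>/dev/null | nm_defines "$CUDART_SYMBOL"; then
    LD_PRELOAD="$bundled${LD_PRELOAD:+:$LD_PRELOAD}" "$@"
  else
    "$@"
  fi
}
```

A Makefile target could then call, for example, run_with_cudart_fix /usr/lib/x86_64-linux-gnu/libcudart.so.12 "$PWD/.venv/lib/python3.12/site-packages/nvidia/cuda_runtime/lib/libcudart.so.12" uv run pytest -v, where the system path is an assumed Ubuntu location for the apt-installed runtime.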
Environment
- OS: Ubuntu 24.04
- CUDA Toolkit: 12.4 (apt nvidia-cuda-toolkit)
- GPU: NVIDIA GeForce GTX 1060 with Max-Q Design (sm_61); irrelevant to this issue, since the failure occurs before any GPU work
- Python: 3.12.12 (uv-managed)
- PyTorch: 2.9.0+cu128 (resolved by RC2 lockfile)