PyTorch 2.9+cu128 wheel fails to import on systems with CUDA Toolkit <12.5 due to libcudart load-order #1330

### Summary
 
On Linux systems where the installed CUDA Toolkit is older than 12.5 (e.g., Ubuntu 24.04's `nvidia-cuda-toolkit` package, which ships CUDA 12.4), `import torch` fails after a fresh `uv sync --group dev --extra qdp` of the RC2 environment:
 
```
ImportError: /…/site-packages/torch/lib/libc10_cuda.so: undefined symbol:
    cudaGetDriverEntryPointByVersion, version libcudart.so.12
```
 
`cudaGetDriverEntryPointByVersion` was added in CUDA 12.5, and the PyTorch 2.9.0+cu128 wheel currently resolved by the project's lockfile requires it.
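
A quick way to confirm which copies are affected is to ask each `libcudart.so.12` directly whether it exports the symbol. Below is a minimal Python sketch, assuming Linux/glibc (where `ctypes.CDLL` wraps `dlopen()`). `probe_cudart` is a hypothetical helper, not part of Mahout or PyTorch.

```python
"""Sketch: check whether a given libcudart.so.12 exports the symbol that
libc10_cuda.so needs. Assumes Linux/glibc, where ctypes.CDLL wraps dlopen()."""
import ctypes
import sys

NEEDED_SYMBOL = "cudaGetDriverEntryPointByVersion"  # added in CUDA 12.5


def probe_cudart(path):
    """Return (runtime_version, has_symbol) for the library at `path`,
    or None if dlopen() fails. runtime_version is None when the library
    does not export cudaRuntimeGetVersion (i.e. it is not a CUDA runtime)."""
    try:
        lib = ctypes.CDLL(path)
    except OSError:
        return None
    version = None
    if hasattr(lib, "cudaRuntimeGetVersion"):
        out = ctypes.c_int(0)
        # cudaRuntimeGetVersion(int *) returns 0 (cudaSuccess) on success and
        # encodes the version as 1000*major + 10*minor (12.4 -> 12040).
        if lib.cudaRuntimeGetVersion(ctypes.byref(out)) == 0:
            version = out.value
    # hasattr() is False when dlsym() cannot find the name.
    return version, hasattr(lib, NEEDED_SYMBOL)


if __name__ == "__main__":
    # Pass one path per run: once a copy is loaded under the libcudart.so.12
    # soname, a later dlopen("libcudart.so.12") by bare name can return it.
    for path in sys.argv[1:]:
        print(path, probe_cudart(path))
```

Example (hypothetical script name; the system path is typical for amd64 Ubuntu but may differ): `uv run python probe_cudart.py /usr/lib/x86_64-linux-gnu/libcudart.so.12`. A 12.4 runtime should report `12040` and `False`, while the wheel-provided copy should report `True`.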
 
### Root cause
 
The PyTorch wheel's CUDA runtime dependency already installs a compatible `libcudart.so.12` (one that exports the symbol) under `nvidia/cuda_runtime/lib/`. However, the RUNPATH of `libc10_cuda.so` walks several other `nvidia-*` wheel directories first. `LD_DEBUG=libs` shows the loader finding the correct `libcudart.so.12` and calling its `init`, then immediately calling its `fini`: its reference count drops to zero before `libc10_cuda` resolves the symbol, so the lookup presumably falls through to the system's 12.4 copy, which lacks it. This is a PyTorch wheel packaging fragility rather than a Mahout code defect, but it blocks Mahout testing.
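
The `LD_DEBUG=libs` observation can be reproduced from a script, which makes it easy to attach only the relevant trace lines to a report. A sketch, assuming glibc's `ld.so` (the loader that honours `LD_DEBUG`); `trace_library_events` is a hypothetical helper that just filters the loader's stderr output:

```python
"""Sketch: capture glibc loader events for libcudart while importing torch.
Assumes glibc's dynamic loader, which writes LD_DEBUG output to stderr."""
import os
import subprocess
import sys


def trace_library_events(command, needle="libcudart"):
    """Run `command` with LD_DEBUG=libs and return (exit_code, lines), keeping
    only stderr lines that mention `needle`."""
    env = dict(os.environ, LD_DEBUG="libs")
    proc = subprocess.run(
        command, env=env, capture_output=True, text=True, errors="replace"
    )
    events = [line.strip() for line in proc.stderr.splitlines() if needle in line]
    return proc.returncode, events


if __name__ == "__main__":
    code, events = trace_library_events([sys.executable, "-c", "import torch"])
    for event in events:
        # Look for "find library=", "trying file=", "calling init:" and
        # "calling fini:" entries for each libcudart.so.12 candidate.
        print(event)
    print(f"import torch exited with status {code}")
```

Run it inside the project environment (for example with `uv run python`) so that `sys.executable` is the venv interpreter.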
 
### Reproducer
 
On Ubuntu 24.04 with `nvidia-cuda-toolkit` (12.4) installed:
 
```bash
git checkout mahout-qumat-0.6.0-RC2
uv sync --group dev --extra qdp
uv run python -c "import torch"   # fails
```
 
### Workaround
 
Force-load the wheel-provided `libcudart.so.12` with `LD_PRELOAD` so it is bound before any other copy:
 
```bash
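# $VIRTUAL_ENV is only set in an activated venv; otherwise fall back to uv's default ./.venv (run from the project root)
export LD_PRELOAD="${VIRTUAL_ENV:-$PWD/.venv}/lib/python3.12/site-packages/nvidia/cuda_runtime/lib/libcudart.so.12"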
uv run python -c "import torch; print(torch.__version__)"   # works
uv run pytest -v                                            # works
```
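
The preload path above hardcodes `python3.12`. A small sketch that asks the interpreter for its own site-packages directory instead, assuming the `nvidia/cuda_runtime/lib` layout shown above; `find_cudart.py` and `bundled_cudart` are hypothetical names:

```python
"""Sketch: locate the wheel-provided libcudart.so.12 without hardcoding the
Python version. Assumes the nvidia/cuda_runtime/lib layout of the workaround."""
import sys
import sysconfig
from pathlib import Path

RELATIVE = Path("nvidia") / "cuda_runtime" / "lib" / "libcudart.so.12"


def bundled_cudart(site_packages=None):
    """Return the libcudart path under site_packages (default: this
    interpreter's purelib directory), or None if it is not installed."""
    if site_packages:
        base = Path(site_packages)
    else:
        base = Path(sysconfig.get_paths()["purelib"])
    candidate = base / RELATIVE
    return candidate if candidate.is_file() else None


if __name__ == "__main__":
    path = bundled_cudart()
    if path is not None:
        print(path)
    else:
        # Print nothing on stdout so a shell substitution yields an empty value.
        print("bundled libcudart.so.12 not found", file=sys.stderr)
```

Usage: `export LD_PRELOAD="$(uv run python find_cudart.py)"`. The script never imports torch, so it starts normally even on affected systems; if the library is missing, `LD_PRELOAD` ends up empty, which the loader ignores.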
 
### Suggested fixes (any one)
 
- **Document the workaround** in `qdp/DEVELOPMENT.md` for users with CUDA Toolkit <12.5.
- **Document a minimum CUDA Toolkit version** (12.5 or 12.6) in the project requirements.
- **Pin PyTorch** to a wheel built against an older CUDA in `qdp/qdp-python/pyproject.toml` until the upstream wheel issue is resolved.
- **Wrap test entry points** (Makefile targets, docs commands) to set `LD_PRELOAD` automatically when the system's libcudart lacks the symbol.
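
For the last option, one possible shape is a small launcher that Makefile targets could call instead of `python` directly. This is a minimal sketch under stated assumptions (Linux/glibc, the `nvidia/cuda_runtime/lib` layout above); `run_with_cudart.py` and all function names are hypothetical, not existing Mahout files:

```python
"""Hypothetical launcher: re-exec Python with the wheel's libcudart preloaded,
but only when the system libcudart exists and lacks the symbol.
Assumes Linux/glibc and the nvidia/cuda_runtime/lib wheel layout."""
import ctypes
import ctypes.util
import os
import sys
import sysconfig
from pathlib import Path

SYMBOL = "cudaGetDriverEntryPointByVersion"
BUNDLED = Path("nvidia") / "cuda_runtime" / "lib" / "libcudart.so.12"


def system_cudart_has_symbol():
    """True/False for the libcudart found by the normal library search,
    or None when there is no system libcudart at all."""
    name = ctypes.util.find_library("cudart")
    if name is None:
        return None
    try:
        return hasattr(ctypes.CDLL(name), SYMBOL)
    except OSError:
        return None


def needs_preload(system_has_symbol):
    # Only a system copy that exists but lacks the symbol triggers the bug.
    return system_has_symbol is False


def preload_env(env, lib):
    """Copy of env with lib prepended to LD_PRELOAD (ld.so accepts ':' or
    whitespace separators); existing entries are kept, duplicates skipped."""
    entries = env.get("LD_PRELOAD", "").replace(":", " ").split()
    if lib not in entries:
        entries.insert(0, lib)
    new_env = dict(env)
    new_env["LD_PRELOAD"] = ":".join(entries)
    return new_env


def main(args):
    env = dict(os.environ)
    bundled = Path(sysconfig.get_paths()["purelib"]) / BUNDLED
    if needs_preload(system_cudart_has_symbol()) and bundled.is_file():
        env = preload_env(env, str(bundled))
    # execve replaces this process, so the child's exit status reaches
    # make, CI, or the calling shell unchanged.
    os.execve(sys.executable, [sys.executable, *args], env)


if __name__ == "__main__":
    if len(sys.argv) > 1:
        main(sys.argv[1:])
    else:
        print("usage: python run_with_cudart.py -m pytest -v", file=sys.stderr)
```

A target would then run `uv run python run_with_cudart.py -m pytest -v`. Checking the system copy first keeps the preload off machines that already have CUDA 12.5 or newer.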

### Environment
 
- OS: Ubuntu 24.04
- CUDA Toolkit: 12.4 (apt `nvidia-cuda-toolkit`)
- GPU: NVIDIA GeForce GTX 1060 with Max-Q Design (sm_61); not relevant here, since the import fails before any GPU work
- Python: 3.12.12 (uv-managed)
- PyTorch: 2.9.0+cu128 (resolved by RC2 lockfile)

