Description
Now that #1242 is merged, we have a nice test matrix for different Windows configurations. However, every entry is currently pinned to the same driver version, latest:
cuda-python/ci/test-matrix.json
Lines 37 to 48 in c4079dd
| { "ARCH": "amd64", "PY_VER": "3.10", "CUDA_VER": "12.9.1", "LOCAL_CTK": "0", "GPU": "rtx2080", "DRIVER": "latest", "DRIVER_MODE": "WDDM" }, | |
| { "ARCH": "amd64", "PY_VER": "3.10", "CUDA_VER": "13.0.2", "LOCAL_CTK": "1", "GPU": "rtxpro6000", "DRIVER": "latest", "DRIVER_MODE": "TCC" }, | |
| { "ARCH": "amd64", "PY_VER": "3.11", "CUDA_VER": "12.9.1", "LOCAL_CTK": "1", "GPU": "v100", "DRIVER": "latest", "DRIVER_MODE": "MCDM" }, | |
| { "ARCH": "amd64", "PY_VER": "3.11", "CUDA_VER": "13.0.2", "LOCAL_CTK": "0", "GPU": "rtx4090", "DRIVER": "latest", "DRIVER_MODE": "WDDM" }, | |
| { "ARCH": "amd64", "PY_VER": "3.12", "CUDA_VER": "12.9.1", "LOCAL_CTK": "0", "GPU": "l4", "DRIVER": "latest", "DRIVER_MODE": "MCDM" }, | |
| { "ARCH": "amd64", "PY_VER": "3.12", "CUDA_VER": "13.0.2", "LOCAL_CTK": "1", "GPU": "a100", "DRIVER": "latest", "DRIVER_MODE": "TCC" }, | |
| { "ARCH": "amd64", "PY_VER": "3.13", "CUDA_VER": "12.9.1", "LOCAL_CTK": "1", "GPU": "l4", "DRIVER": "latest", "DRIVER_MODE": "TCC" }, | |
| { "ARCH": "amd64", "PY_VER": "3.13", "CUDA_VER": "13.0.2", "LOCAL_CTK": "0", "GPU": "rtxpro6000", "DRIVER": "latest", "DRIVER_MODE": "MCDM" }, | |
| { "ARCH": "amd64", "PY_VER": "3.14", "CUDA_VER": "12.9.1", "LOCAL_CTK": "0", "GPU": "v100", "DRIVER": "latest", "DRIVER_MODE": "TCC" }, | |
| { "ARCH": "amd64", "PY_VER": "3.14", "CUDA_VER": "13.0.2", "LOCAL_CTK": "1", "GPU": "l4", "DRIVER": "latest", "DRIVER_MODE": "MCDM" }, | |
| { "ARCH": "amd64", "PY_VER": "3.14t", "CUDA_VER": "12.9.1", "LOCAL_CTK": "1", "GPU": "l4", "DRIVER": "latest", "DRIVER_MODE": "TCC" }, | |
| { "ARCH": "amd64", "PY_VER": "3.14t", "CUDA_VER": "13.0.2", "LOCAL_CTK": "0", "GPU": "a100", "DRIVER": "latest", "DRIVER_MODE": "MCDM" } |
On closer inspection, however, the DRIVER label is not used on Windows at all. Instead, the version is hard-wired in the installer script:
cuda-python/ci/tools/install_gpu_driver.ps1
Lines 8 to 10 in c4079dd
| # Set the correct URL, filename, and arguments to the installer | |
| # This driver is picked to support Windows 11 & CUDA 13.0 | |
| $version = '581.15' |
On Linux we need the DRIVER label because the driver is pre-installed in the runner VMs (and maintained by the runner team), so the label is used to compute the runner name. On Windows, due to technical challenges, we have to install the driver ourselves as part of the CI jobs.
However, this gives us a unique opportunity to do something we cannot do on Linux runners today: select the driver versions that we intend to cover.
I think the DRIVER label on Windows could be repurposed to specify the UMD version, with the test matrix expanded:
| CTK major | CTK version | UMD version | Purpose |
|---|---|---|---|
| prev major | 12.x | 12.0 | test CUDA minor version compatibility |
| prev major | 12.x | 13.x | test CUDA backward compatibility |
| curr major | 13.0 | 13.0 | |
| curr major | 13.0 | 13.x | test CUDA backward compatibility |
| curr major | 13.x | 13.0 | test CUDA minor version compatibility |
| curr major | 13.x | 13.x | |
We would also need a way to map the UMD (user-mode driver) version (e.g. 13.0) to the KMD (kernel-mode driver) version (e.g. 581.15). Currently there is no public way to do this mapping, so we would need to hard-code a small table.
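To make the idea concrete, here is a rough Python sketch of what the hard-coded table and the matrix classification could look like. Everything named here (`UMD_TO_KMD`, `resolve_kmd_version`, `coverage_purpose`) is hypothetical and not part of the current CI scripts. The only mapping entry taken from this issue is 13.0 → 581.15; any other entries would have to be filled in by hand.

```python
# Hypothetical lookup table: CUDA UMD version -> Windows driver (KMD) version.
# Only the 13.0 entry comes from ci/tools/install_gpu_driver.ps1; additional
# entries are deliberately left out because there is no public mapping.
UMD_TO_KMD = {
    "13.0": "581.15",
}


def resolve_kmd_version(driver_label: str) -> str:
    """Translate a DRIVER label (interpreted as a UMD version) to a KMD version."""
    try:
        return UMD_TO_KMD[driver_label]
    except KeyError:
        known = ", ".join(sorted(UMD_TO_KMD))
        raise ValueError(
            f"no KMD version recorded for UMD {driver_label!r}; known: {known}"
        ) from None


def parse_major_minor(version: str) -> tuple[int, int]:
    """Reduce '12.9.1' or '13.0' to a (major, minor) tuple."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def coverage_purpose(ctk_ver: str, umd_ver: str) -> str:
    """Classify a (CTK, UMD) pair the same way as the table above."""
    ctk = parse_major_minor(ctk_ver)
    umd = parse_major_minor(umd_ver)
    if umd == ctk:
        return "exact match"
    if umd > ctk:
        # Driver is newer than the toolkit.
        return "backward compatibility"
    if umd[0] == ctk[0]:
        # Driver is older than the toolkit, but within the same major release.
        return "minor version compatibility"
    # A driver from an older major release is not a combination we want to test.
    raise ValueError(f"UMD {umd_ver} is too old for CTK {ctk_ver}")
```

For example, `coverage_purpose("12.9.1", "13.0")` returns `"backward compatibility"`, and `resolve_kmd_version("13.0")` returns `"581.15"`. A label such as `latest` is intentionally not handled in this sketch and would raise `ValueError`.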
Perhaps this can be added to the nightly runs? #294