I often have a hard time debugging GPU issues when running containerised sorting. There are already a lot of useful checks in SI, and performing checks is not always straightforward (e.g. #1398), as we can't always rely on commands like nvidia-smi being accessible. However, I think there is room for a couple more useful checks, which I've divided into 'easier checks' and 'harder checks'.
Easier Checks
It would be good to check that docker or singularity is installed on the system at all (e.g. by checking whether docker --version returns a nonzero exit code). At the moment, if it is not installed, the error is unclear.
If you are running singularity, spython is a dependency, and if you are running docker, then docker (the Python package downloaded from PyPI) is. However, these can't easily be installed by default as they don't all work cross-platform. You could do something like the below in the pyproject.toml:
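A minimal sketch of what such a check could look like, using only the standard library. The function name container_engine_available and its which/run parameters are hypothetical (they are not part of SI) and exist only so the check can be exercised without the engine actually being installed:

```python
import shutil
import subprocess


def container_engine_available(engine, which=shutil.which, run=subprocess.run):
    """Return True if `engine` is on PATH and `engine --version` exits with code 0.

    `engine` is the executable name, e.g. "docker" or "singularity".
    `which` and `run` are injectable for testing (hypothetical design choice).
    """
    # First check that the executable can be found at all.
    if which(engine) is None:
        return False
    try:
        result = run([engine, "--version"], capture_output=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        # The executable exists but could not be started, or it hung.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    for name in ("docker", "singularity"):
        status = "found" if container_engine_available(name) else "not found"
        print(f"{name}: {status}")
```

A caller could raise a clear error message when this returns False, instead of letting the failure surface later.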
"docker; platform_system=='Windows'",
"docker; platform_system=='Darwin'",
"spython; platform_system=='Linux'", # I think missing from SI?
"cuda-python; platform_system != 'Darwin'",
Alternatively, it would be nice to raise an error if trying to run with docker when the docker Python package is not installed, or with singularity when spython is not installed.
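As a rough illustration of the error-raising alternative, the sketch below looks up the Python package for the chosen container mode and raises if it cannot be imported. The names REQUIRED_PACKAGE and check_container_client are hypothetical and not taken from SI; the mode-to-package mapping follows the description above:

```python
import importlib.util

# Container mode -> Python package it needs (as described in this issue).
REQUIRED_PACKAGE = {"docker": "docker", "singularity": "spython"}


def check_container_client(mode, find_spec=importlib.util.find_spec):
    """Raise ImportError if the Python package needed for `mode` is missing.

    `find_spec` is injectable for testing (hypothetical design choice).
    """
    if mode not in REQUIRED_PACKAGE:
        raise ValueError(f"Unknown container mode: {mode!r}")
    package = REQUIRED_PACKAGE[mode]
    # find_spec returns None when a top-level package cannot be found.
    if find_spec(package) is None:
        raise ImportError(
            f"Running with {mode} requires the '{package}' Python package. "
            f"Install it with `pip install {package}`."
        )
```

Calling check_container_client("singularity") before launching the container would then give an explicit message rather than a failure deep inside the run.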
Harder Checks
Today we had an issue that was quite hard to debug, where nvidia-docker was required but not installed. It would be great to check, when running with docker, that nvidia-docker is installed (some details here). However, it sounds like this is a bit of a nightmare: AFAIK it's not cross-platform, and nvidia-docker is now superseded by nvidia-container-toolkit. Nonetheless, it might be worth checking, when trying to use docker on Linux, that at least one of nvidia-docker, nvidia-docker2 or nvidia-container-toolkit is installed (some more detailed explanation of their differences here). The best would be to check this through docker directly, but I don't think it's possible.
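One possible approximation, sketched below, is to look for the related executables on PATH when on Linux. The executable names listed are assumptions: which ones are actually present depends on the distribution and on the package version, so an empty result could only ever be a hint, not proof that GPU containers won't work. The function name find_nvidia_container_support is hypothetical:

```python
import platform
import shutil

# ASSUMPTION: executable names that the nvidia-docker / nvidia-container-toolkit
# packages may install. This list is illustrative and may be incomplete.
NVIDIA_CONTAINER_EXECUTABLES = (
    "nvidia-docker",
    "nvidia-container-toolkit",
    "nvidia-container-runtime",
    "nvidia-container-cli",
)


def find_nvidia_container_support(which=shutil.which, system=None):
    """Return the assumed NVIDIA container executables found on PATH.

    Returns None when not on Linux, where this check does not apply.
    `which` and `system` are injectable for testing (hypothetical design choice).
    """
    system = platform.system() if system is None else system
    if system != "Linux":
        return None
    return [name for name in NVIDIA_CONTAINER_EXECUTABLES if which(name) is not None]


if __name__ == "__main__":
    found = find_nvidia_container_support()
    if found is None:
        print("Not on Linux; check skipped.")
    elif not found:
        print("No NVIDIA container executables found on PATH.")
    else:
        print("Found:", ", ".join(found))
```

Because this only inspects PATH, it says nothing about whether the docker daemon is configured to use what it finds, so a warning seems more appropriate than a hard error.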