Additional checks on GPU when running containerised sorting #2855

@JoeZiminski

I often have a hard time debugging GPU issues when running containerised sorting. There are already a lot of useful checks in SI, and performing checks is not always straightforward (e.g. #1398), as you can't always rely on commands like nvidia-smi being accessible. However, I think there is room for a couple more useful checks. I've divided these into 'easier checks' and 'harder checks'.

Easier Checks

It would be good to check that docker or singularity is installed on the system at all (e.g. that docker --version runs and returns a zero exit code). At the moment, if it is not installed, the error is unclear.
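A minimal sketch of what such a check could look like, assuming the container CLI is expected on PATH and that `--version` is a cheap, side-effect-free call for both docker and singularity. The function names `container_cli_available` and `check_container_cli` are hypothetical and not part of SI.

```python
import shutil
import subprocess


def container_cli_available(name, timeout=10.0):
    """Return True if `<name> --version` can be run and exits with code 0."""
    # shutil.which avoids spawning a process when the executable is absent.
    if shutil.which(name) is None:
        return False
    try:
        result = subprocess.run(
            [name, "--version"], capture_output=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def check_container_cli(mode):
    """Raise a readable error if the CLI for `mode` is missing (hypothetical helper)."""
    if mode not in ("docker", "singularity"):
        raise ValueError(f"Unknown container mode: {mode!r}")
    if not container_cli_available(mode):
        raise RuntimeError(
            f"'{mode}' was requested but `{mode} --version` could not be run. "
            f"Is {mode} installed and on PATH?"
        )
```

Calling `check_container_cli("docker")` before launching the container would turn the current unclear failure into an explicit message.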

If you are running singularity, spython is a dependency, and if you are running docker, then the docker Python package (from PyPI) is. However, these can't easily be installed by default, as they don't all work cross-platform. You could do something like the below in the pyproject.toml:

"docker; platform_system=='Windows'",
"docker; platform_system=='Darwin'",
"spython; platform_system=='Linux'",  # I think missing from SI?
"cuda-python; platform_system != 'Darwin'",

Alternatively, it would be nice to raise an error when trying to run with docker and the docker package is not installed, or when running with singularity and spython is not installed.
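As a rough illustration of the "raise an error" option, the check below looks up the Python package for the requested mode without importing it. The mapping and the function names are hypothetical; the only assumption about the packages themselves is that they import as `docker` and `spython`.

```python
import importlib.util

# Assumed mapping from container mode to the Python package it needs.
REQUIRED_PACKAGE = {"docker": "docker", "singularity": "spython"}


def package_installed(name):
    """Return True if a top-level package called `name` can be found."""
    return importlib.util.find_spec(name) is not None


def check_container_python_package(mode):
    """Raise a readable error if the Python client for `mode` is missing."""
    if mode not in REQUIRED_PACKAGE:
        raise ValueError(f"Unknown container mode: {mode!r}")
    package = REQUIRED_PACKAGE[mode]
    if not package_installed(package):
        raise ModuleNotFoundError(
            f"Running with {mode} requires the '{package}' Python package. "
            f"Try: pip install {package}"
        )
    return package
```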

Harder Checks

Today we had an issue that was quite hard to debug, where nvidia-docker was required but not installed. It would be great to check, when running on docker, that nvidia-docker is installed (some details here). However, it sounds like this is a bit of a nightmare: AFAIK it's not cross-platform, and nvidia-docker is now superseded by nvidia-container-toolkit. Nonetheless, it might be worth checking, when trying to use docker on Linux, that at least one of nvidia-docker, nvidia-docker2 or nvidia-container-toolkit is installed (some more detailed explanation of their differences here). The best would be to check this through docker directly, but I don't think it's possible.
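One possible approximation, sketched below, is to look for executables that these packages usually put on PATH. The list of executable names is an assumption about typical installs, not a guarantee, and the helper names are hypothetical. A missing executable would therefore be better treated as a hint than as proof that GPU support is unavailable.

```python
import platform
import shutil

# Assumed executable names shipped by nvidia-docker / nvidia-docker2 /
# nvidia-container-toolkit; a given install may only provide some of them.
NVIDIA_CONTAINER_EXECUTABLES = (
    "nvidia-container-cli",
    "nvidia-container-runtime",
    "nvidia-ctk",
    "nvidia-docker",
)


def find_nvidia_container_tools(which=shutil.which):
    """Return the assumed NVIDIA container executables found on PATH."""
    return [name for name in NVIDIA_CONTAINER_EXECUTABLES if which(name) is not None]


def check_nvidia_container_support(system=None, which=shutil.which):
    """On Linux, raise if none of the assumed executables are found.

    Returns the list of tools found, or an empty list on other platforms,
    where the check is skipped.
    """
    system = platform.system() if system is None else system
    if system != "Linux":
        return []
    found = find_nvidia_container_tools(which=which)
    if not found:
        raise RuntimeError(
            "Docker with GPU was requested, but none of "
            f"{', '.join(NVIDIA_CONTAINER_EXECUTABLES)} were found on PATH. "
            "Is nvidia-container-toolkit (or nvidia-docker) installed?"
        )
    return found
```

The `which` and `system` parameters exist only so the check can be exercised without the real tools installed.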

Labels: container (issues related to container (docker/singularity) versions of sorters), performance (performance issues/improvements)