
Issues installing torch with the GPU on top of synerbi/sirf:service-gpu #213

Open
Imraj-Singh opened this issue Mar 6, 2024 · 13 comments

@Imraj-Singh
Contributor

Imraj-Singh commented Mar 6, 2024

I ran into some issues installing torch within the synerbi/sirf:service-gpu container.

I cloned the SIRF-Exercises locally, then opened the folder in VSCode, changing the .devcontainer base image from devel-service to service-gpu. I then chose "Reopen in devcontainer", which takes a while to pull the image and update the environment.

In the new container the command nvidia-smi is not found, but nvcc is. I tried installing torch, but can't seem to get the GPU working: torch.cuda.is_available() returns False.

It gave a more sinister error when I tried a couple of weeks ago (something about not finding some CUDA libraries), but I can't seem to recreate it... I'm fairly sure mamba is just choosing the CPU version of torch because it can't find the drivers. Perhaps all that's needed is an apt update, but I run into permission issues...
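For reference, a quick diagnostic sketch to tell a CPU-only torch wheel apart from a missing NVIDIA driver (nothing SIRF-specific is assumed here, just standard tooling inside the container):

```python
# Diagnostic sketch: distinguish "CPU-only torch wheel installed"
# from "no NVIDIA driver visible in the container".
import shutil


def gpu_diagnostics() -> dict:
    report = {}
    # nvidia-smi comes from the host driver via the container runtime, so its
    # absence usually means the container was started without GPU access.
    report["nvidia-smi"] = shutil.which("nvidia-smi") is not None
    # nvcc is part of the CUDA toolkit and can be present even with no driver.
    report["nvcc"] = shutil.which("nvcc") is not None
    try:
        import torch
        # torch.version.cuda is None for CPU-only wheels.
        report["torch_cuda_build"] = torch.version.cuda
        report["cuda_available"] = torch.cuda.is_available()
    except ImportError:
        report["torch_cuda_build"] = None
        report["cuda_available"] = False
    return report


if __name__ == "__main__":
    print(gpu_diagnostics())
```

If `torch_cuda_build` is None, mamba/pip picked a CPU-only build; if it is set but `nvidia-smi` is missing, the container itself has no GPU access.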

@casperdcl apologies if this isn't the right place to put this issue!

@casperdcl
Member

casperdcl commented Apr 12, 2024

changing the .devcontainer base image from devel-service to service-gpu

but the current config uses ghcr.io/synerbi/sirf:latest, not :devel-service

"image": "ghcr.io/synerbi/sirf:latest",

Also it doesn't have any GPU-specific options like --gpus all etc., and I don't know whether VSCode locally works with the underlying docker-stacks image (SyneRBI/SIRF-SuperBuild#865)

Does this not work for you?

$ docker run --rm --gpus all -it ghcr.io/synerbi/sirf:latest-gpu /bin/bash
sirf$ pip install torch
sirf$ python -c 'import torch; print(torch.cuda.is_available())'

@Imraj-Singh
Contributor Author

I must've used the older config.

I've just tried with the new image and it still does not recognise nvidia-smi, and when I run sirf$ python -c 'import torch; print(torch.cuda.is_available())' it returns False. I pulled the image, then ran pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118 to install PyTorch.

@casperdcl
Member

casperdcl commented Apr 15, 2024

hmm... on my machine I have:

$ docker run --rm --gpus all -it ghcr.io/synerbi/sirf:latest-gpu nvidia-smi
Mon Apr 15 09:40:26 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
...

(So cuda version 12.4)

@Imraj-Singh
Contributor Author

It works! I just needed to launch the devcontainer with slight updates:

"image": "ghcr.io/synerbi/sirf:latest-gpu",
"runArgs": ["--gpus=all"]

This is in line with what you wrote here; my apologies for not seeing it earlier. I guess it could be worth adding a note in the DocForParticipants, although this may be too niche, and you do link to the docker README there.
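For anyone landing here later, the full .devcontainer/devcontainer.json would then look roughly like this (a sketch: only image and runArgs come from this thread; the name field is a placeholder):

```json
{
  "name": "SIRF-Exercises (GPU)",
  "image": "ghcr.io/synerbi/sirf:latest-gpu",
  "runArgs": ["--gpus=all"]
}
```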

@KrisThielemans
Member

Great. Is there a way to have two devcontainer specs?

@casperdcl
Member

casperdcl commented Apr 16, 2024

Apart from templates (see SyneRBI/SIRF-SuperBuild#865), not really, AFAIK

@casperdcl
Member

casperdcl commented May 13, 2024

@paskino just made this for @gschramm

SIRF-SuperBuild/docker/compose.sh -bg -- \
  --build-arg EXTRA_BUILD_FLAGS="-DSIRF_TAG=2c66faff3bc0f12c864cfc2a2931eba5ade60ba0"

then, with this Dockerfile:

FROM synerbi/sirf:latest-gpu
RUN pip install torch

docker build -t ghcr.io/synerbi/sirf:training2024_0.1 .

and pushed it to ghcr.io/synerbi/sirf:training2024_0.1

Though I think it should be:

FROM synerbi/sirf:latest-gpu
RUN mamba install jupytext parallelproj \
  && pip install --no-cache-dir torch \
  && mamba clean -a -y -f && fix-permissions "${CONDA_DIR}" /home/${NB_USER}

@KrisThielemans
Member

Not 100% sure about adding parallelproj here. We already build one ourselves. Could lead to interesting conflicts (as our build is independent of pip). We could avoid that by following the instructions at https://github.com/SyneRBI/SIRF/wiki/Building-SIRF-and-CIL-with-conda, but it's too late for that now.

So... do we need parallelproj Python for this image?

(Obviously, all the fix-permissions stuff is a bit ugly, but as long as you manage this...)

@gschramm
Contributor

parallelproj (and also jupytext) are indeed not needed. Sorry for the confusion (I asked for parallelproj a while ago). The only extra package I need for the DL exercises is PyTorch

@KrisThielemans
Member

@samdporter @NicoleJurjew do you need jupytext?

If so, it should be added to the environment.yml

@NicoleJurjew
Contributor

NicoleJurjew commented May 14, 2024 via email

@samdporter
Contributor

I'm not using it either (although having a read about it, it looks like I probably should?)

@casperdcl
Member

casperdcl commented May 14, 2024

Ok. --no-cache-dir would also make the images 2GB smaller (!).

I'll just add jupytext and remove caches in the current image.
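To see what --no-cache-dir saves, here is a small sketch that measures the size of the local pip cache, i.e. the data that `pip install --no-cache-dir` keeps out of an image layer (assumption: pip >= 20.1, which provides the `pip cache dir` subcommand):

```python
# Sketch: measure how much disk the pip wheel/HTTP cache occupies.
import os
import subprocess
import sys


def dir_size_bytes(path: str) -> int:
    """Total size of all regular files under `path`, in bytes."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                pass  # file vanished or unreadable; skip it
    return total


if __name__ == "__main__":
    # `pip cache dir` (pip >= 20.1) prints the cache location.
    proc = subprocess.run(
        [sys.executable, "-m", "pip", "cache", "dir"],
        capture_output=True, text=True,
    )
    cache = proc.stdout.strip()
    if proc.returncode == 0 and os.path.isdir(cache):
        print(f"{cache}: {dir_size_bytes(cache) / 1e6:.1f} MB")
    else:
        print("pip cache not found")
```

Run inside the image before and after adding --no-cache-dir to compare layer sizes.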

6 participants