Dockerize the "Remote GPU" service #224

Merged
merged 11 commits into from Jul 10, 2020

Conversation

mikaelhg
Contributor

@mikaelhg mikaelhg commented Jun 16, 2020

This PR needs some testing before merging, so I'm creating it as a draft.

Based on my previous contribution to AliaksandrSiarohin/first-order-model#55.

Building

docker build -t avatarify .

Running

docker run -it --rm --gpus=all -p 5557:5557 -p 5558:5558 avatarify

@mikaelhg mikaelhg mentioned this pull request Jun 16, 2020
@mintmaker
Contributor

Works, but at 15-20 fps instead of ~30 fps (1080 Ti, Ubuntu 20.04). Probably because of network speed, as nvidia-docker is just a bit behind native installations of TensorFlow (and probably PyTorch, too).
A way to forward the video devices directly would be awesome.

But for a remote GPU server it's great.

@mikaelhg
Contributor Author

@mintmaker, would you mind testing with docker run parameters --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 please?
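
I.e., combined with the run command from the description, something like (a sketch, untested):

docker run -it --rm --gpus=all -p 5557:5557 -p 5558:5558 \
    --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 \
    avatarify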

@mintmaker
Contributor

It runs, but the performance does not improve. With the first avatar it runs at ~23 fps, but after switching it goes down to around ~16 fps and climbs back up after that.

If possible (I'm not an expert in Docker), having everything dockerized would be super cool. With the current solution the worker has to install Miniconda/Anaconda and some modules, which is okay in the conventional case, but not optimal for a local setup.

@mikaelhg
Contributor Author

mikaelhg commented Jun 16, 2020

I'm currently experimenting with running the client side in Docker as well, using the commands below.

Client side:

docker run -it --rm --privileged \
    --env="DISPLAY" --env="QT_X11_NO_MITSHM=1" \
    --volume="/tmp/.X11-unix:/tmp/.X11-unix:rw" \
      avatarify python3 afy/cam_fomm.py \
          --config fomm/config/vox-adv-256.yaml \
          --checkpoint vox-adv-cpk.pth.tar --virt-cam 9 \
          --relative --is-client \
          --in-addr tcp://192.168.1.45:5557 --out-addr tcp://192.168.1.45:5558

In order for the desktop program running inside the Docker container to be able to talk to your X server:

xhost +local:root

Server side:

docker run -it --rm --gpus=all --privileged \
    -p 5557:5557 -p 5558:5558 \
    -v /tmp/models:/root/.torch/models \
    avatarify

Standalone:

PYTHONPATH=$PYTHONPATH:$(pwd):$(pwd)/fomm python3 afy/cam_fomm.py \
    --config fomm/config/vox-adv-256.yaml \
    --checkpoint vox-adv-cpk.pth.tar --virt-cam 9 \
    --relative --is-client \
    --in-addr tcp://192.168.1.45:5557 --out-addr tcp://192.168.1.45:5558

Getting 33 fps while streaming from my laptop to my desktop with an RTX 2070 and back.

@mintmaker
Contributor

Nice that you corrected the thing with -p; I forgot to mention that I needed to change it.

I changed the IP you wrote to mine, and the client and server were able to connect (the server downloaded files; it only does that when a client is connected). But /dev/video9 does not exist.
I tried to create it with sudo modprobe v4l2loopback exclusive_caps=1 video_nr="9" card_label="avatarify", but it does not generate a new device. (The first command I ran was sudo modprobe v4l2loopback devices=1 and it worked, but the following commands did not.) I will try again after a reboot.

@mikaelhg
Contributor Author

mikaelhg commented Jun 17, 2020

You don't have to reboot; you can just rmmod the module and modprobe it back with new options.
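
For example (a sketch, reusing the module options from your earlier command):

sudo rmmod v4l2loopback
sudo modprobe v4l2loopback exclusive_caps=1 video_nr=9 card_label="avatarify"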

Instead of the -p options, you could try with --network=host to see if the Docker proxies are slowing things down.
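
On the server side that would look roughly like this (a sketch; with host networking the -p mappings shouldn't be needed):

docker run -it --rm --gpus=all --network=host \
    -v /tmp/models:/root/.torch/models \
    avatarify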

@mintmaker
Contributor

I thought removing the module would require a reinstall, so thanks.

Yeah, now it works with /dev/video9, but still at 25 fps. --network=host makes it fast at first (40 fps, which is weird; I don't even get that with my native install), but after zooming in and recalibrating it goes back down to 25 fps.

Testing with:
docker run -it --rm --privileged --gpus=all \
    --env="DISPLAY" --env="QT_X11_NO_MITSHM=1" \
    --volume="/tmp/.X11-unix:/tmp/.X11-unix:rw" \
    avatarify python3 afy/cam_fomm.py \
        --config fomm/config/vox-adv-256.yaml \
        --checkpoint vox-adv-cpk.pth.tar --virt-cam 9 \
        --relative

is successful. It runs at 33 fps (same as the native installation).

Thank you!

And by the way, what about a parameter for downloading into a directory that is mounted into Docker? It always takes around half a minute to download the weights.

@mikaelhg
Contributor Author

mikaelhg commented Jun 17, 2020

See -v /tmp/models:/root/.torch/models in the previous comment.

And thanks for testing!

@mikaelhg
Contributor Author

@alievk, based on testing, we'd need to add some documentation on how to use the Docker image; where should that go?

@mintmaker
Contributor

mintmaker commented Jun 17, 2020

Wow, thank you very much, it works!

@mintmaker
Contributor

@mikaelhg What about using Docker when --docker is specified in run.sh? And the same for installing.
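
Roughly something like this in run.sh (a hypothetical sketch of the idea, not an actual implementation):

# hypothetical sketch: run inside Docker when --docker is passed
if [[ " $* " == *" --docker "* ]]; then
    docker run -it --rm --gpus=all -p 5557:5557 -p 5558:5558 avatarify
else
    : # existing native (conda) code path continues here
fi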

@alievk
Owner

alievk commented Jun 19, 2020

@alievk, based on testing, we'd need to add some documentation on how to use the Docker image; where should that go?

You can add a Docker subsection to the Install section in the README.

@mintmaker
Contributor

@mikaelhg I am currently updating the README and run.sh.

@mikaelhg
Contributor Author

mikaelhg commented Jun 19, 2020

@alievk, are you planning to add release tags to git? I'm currently pointing to specific commits in the Dockerfile as build args, which can be changed on the docker build command line.
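
E.g. (the build-arg name here is only illustrative, not necessarily the one defined in the Dockerfile):

docker build -t avatarify --build-arg FOMM_COMMIT=<commit-or-tag> .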

@mintmaker
Contributor

mintmaker commented Jun 19, 2020

I have updated README.md and run.sh for Docker. Tests are successful.

@mintmaker
Contributor

I am now working on support for remote GPU.
I have a few questions, @alievk: Should I add the new subsections to the table of contents? Also, the wiki needs to be updated then. How should we then merge my and @mikaelhg's changes (maybe merge this PR and then merge my PR with README and run.sh)?

@mikaelhg mikaelhg marked this pull request as ready for review June 19, 2020 13:15
@mikaelhg
Contributor Author

mikaelhg commented Jun 19, 2020

Marked the PR ready for review; let me know if you want some changes or documentation in this PR in addition to @mintmaker's work.

@mintmaker
Contributor

I updated run.sh for remote GPU and Docker. Tests are successful, once again.

@mintmaker
Contributor

@alievk @mikaelhg I am opening the PR with the updated README and run.sh now.

@mikaelhg
Contributor Author

@mintmaker, I added you as a collaborator to my fork, so you can push changes to this branch directly, and they'll be added to this PR, if you'd rather do that.

@mintmaker
Contributor

Or should I merge your fork first into mine?

@mintmaker
Contributor

@mikaelhg Okay, thanks!

@bjarthur

I wonder if the performance would be better with Singularity instead of Docker.

@bjarthur

Do you plan on pushing the image to the cloud? That'd be great.

@bjarthur

https://hub.docker.com/r/bijanmmarkes/avatarify

@alievk
Owner

alievk commented Jun 20, 2020

I am now working on support for remote GPU.
I have a few questions, @alievk: Should I add the new subsections to the table of contents? Also, the wiki needs to be updated then. How should we then merge my and @mikaelhg's changes (maybe merge this PR and then merge my PR with README and run.sh)?

I can't give access to just the wiki, so you could create a page in your fork and I'll copy-paste it into the original repo.

@alievk
Owner

alievk commented Jun 30, 2020

@mintmaker is there any reason not to use the --gpus flag?

@mintmaker
Contributor

@alievk Not sure what you mean. The --gpus flag is only used with Docker, so it's not important when running natively. If Docker is used, the --gpus all flag is added to the docker run [...] call, which assumes nvidia-docker is installed (it should be installed only when one owns a GPU), and adding it without nvidia-docker installed could cause some issues. I can't verify that because I already have nvidia-docker installed. Can you or @mikaelhg run it without nvidia-docker installed but with the --gpus flag? If there are no issues, I can remove the flag.

@mikaelhg
Contributor Author

Can you or @mikaelhg run it without nvidia-docker installed but with the --gpus flag? If there are no issues, I can remove the flag.

$ docker run -it --rm --gpus=all ubuntu:20.04 bash
docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].

@alievk
Owner

alievk commented Jul 4, 2020

I have this:

$ ./run.sh  --is-worker --no-vcam --docker
xhost:  unable to open display ""
docker: Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.40/containers/create: dial unix /var/run/docker.sock: connect: permission denied.
See 'docker run --help'.
xhost:  unable to open display ""

Why is it trying to access a display? Is that necessary?

@mintmaker
Contributor

In order to display the preview windows, it needs to access the X screen. As a worker this is not needed, and it causes the error you just posted. I will fix this.
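
Roughly like this in run.sh (a sketch; the worker check shown is illustrative, not the final code):

# only touch the X server access control list when not running as a worker
if [[ "$IS_WORKER" != "1" ]]; then
    xhost +local:root
fi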

@alievk
Owner

alievk commented Jul 5, 2020

@mintmaker works perfectly!

One last thing I wanted to mention is that Avatarify is not meant to run on the CPU, so we have to run on the GPU by default and stress that in the installation section (merge steps 1 and 2). If you want to keep the --gpus option for some reason, I would change it to --no-gpus, because the default behaviour is to run on a GPU. If we keep everything as is, people will post issues complaining about slow performance ("nobody reads the manual").
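
I.e. roughly (just a sketch of the suggested default; the flag handling is illustrative):

# GPU by default; --no-gpus opts out
DOCKER_GPUS="--gpus=all"
if [[ " $* " == *" --no-gpus "* ]]; then
    DOCKER_GPUS=""
fi
docker run -it --rm $DOCKER_GPUS -p 5557:5557 -p 5558:5558 avatarify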

P.S. Sorry for the late responses, very busy with other stuff!

@adelin-b

adelin-b commented Jul 7, 2020

I found a bug in the branch: when using Docker, adding a new face, reloading with L, and then changing the face with D or A, it crashes. Here is the error:

./run.sh --docker --gpus
[1594162784.345596] Images reloaded
Traceback (most recent call last):
  File "afy/cam_fomm.py", line 315, in <module>
    change_avatar(predictor, avatars[cur_ava])
  File "afy/cam_fomm.py", line 85, in change_avatar
    avatar_kp = predictor.get_frame_kp(new_avatar)
  File "/app/avatarify/afy/predictor_local.py", line 92, in get_frame_kp
    kp_landmarks = self.fa.get_landmarks(image)
  File "/usr/local/lib/python3.6/dist-packages/face_alignment/api.py", line 107, in get_landmarks
    return self.get_landmarks_from_image(image_or_path, detected_faces)
  File "/usr/local/lib/python3.6/dist-packages/torch/autograd/grad_mode.py", line 43, in decorate_no_grad
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/face_alignment/api.py", line 134, in get_landmarks_from_image
    if image.ndim == 2:
AttributeError: 'list' object has no attribute 'ndim'
non-network local connections being removed from access control list

@alievk
Owner

alievk commented Jul 8, 2020

Hi @adelin-b, this is not relevant to the Docker branch. Could you please report the bug in the issues?

Please also explain how to reproduce this bug and attach the avatar image if possible.

@mintmaker
Contributor

@adelin-b I fixed this issue a few commits ago (#221), but the Dockerfile clones the git repository at a commit before my fix. This should be updated before merging, for sure.

@mikaelhg
Contributor Author

mikaelhg commented Jul 8, 2020

@alievk mentioned the possibility of tagging known good commits as release versions; that would be the way to go.

I think we've pretty much exhausted the scope of this PR, so after that change, it would probably be time for a GO/NO-GO decision.

@mintmaker
Contributor

@mikaelhg I'll write the wiki page, and I think the Docker part is done then.
@alievk I'd like to recommend the Docker-based method in the wiki, as it is easier and cleaner to set up; is that okay?

@alievk
Owner

alievk commented Jul 8, 2020

@mikaelhg I'll write the wiki page, and I think the Docker part is done then.
@alievk I'd like to recommend the Docker-based method in the wiki, as it is easier and cleaner to set up; is that okay?

that's ok

@mintmaker
Contributor

mintmaker commented Jul 9, 2020

@alievk I forgot to add the installation of v4l2loopback to the README, so I'll make another script, install_docker.sh. In your install.sh you clone v4l2loopback from your fork of the original repo; is there a reason for that?

@alievk
Owner

alievk commented Jul 9, 2020

@mintmaker I had a small fix for the upstream repository, changing the minimum Linux kernel version (alievk/v4l2loopback@9dc1079), but it looks like the maintainer has already fixed that in the master branch: https://github.com/umlaeute/v4l2loopback/blob/master/v4l2loopback.c#L721

Before the fix there was an error when creating a virtual camera on earlier Linux kernel versions.

@alievk
Owner

alievk commented Jul 9, 2020

I think you can stick to the upstream repository.
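
E.g. (a sketch of the usual build-from-source steps for that module):

git clone https://github.com/umlaeute/v4l2loopback
cd v4l2loopback
make && sudo make install
sudo depmod -a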

@mintmaker
Contributor

My updated wiki is here (only Remote-GPU.md is changed): https://github.com/mintmaker/avatarify_docker_wiki

@alievk
Owner

alievk commented Jul 10, 2020

@mintmaker excellent!

I think we are ready to merge. I can make a tag called docker; then we have to replace the commit ID with the tag in the Dockerfile. Is that OK?
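
I.e. roughly (a sketch; the tag would then replace the pinned commit in the Dockerfile build args):

git tag docker <known-good-commit>
git push origin docker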

@mikaelhg
Contributor Author

Let's go!

@alievk alievk merged commit 5638984 into alievk:master Jul 10, 2020
@alievk
Owner

alievk commented Jul 10, 2020

@mintmaker @mikaelhg thank you very much!

Next I'll make a tag, fix the Dockerfile, update the wiki and mention you in the readme.

@josharmour
Contributor

josharmour commented Jul 20, 2020

I'm having issues with this and am willing to help troubleshoot.

joshu@server:~/avatarify$ ./run.sh --is-worker --no-vcam --docker
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: mount error: file creation failed: /var/lib/docker/overlay2/7d7d67718330862032ed32abcd83445444647cb88d459df597c431f30f2ebb5a/merged/dev/nvidia-uvm: input/output error\\\\n\\\"\"": unknown.
