
WIP: Forward device requests #7124

Closed · wants to merge 4 commits into from

Conversation

yoanisgil

Add support for Nvidia GPUs (solves #6691)

Signed-off-by: Yoanis Gil <gil.yoanis@gmail.com>
@yoanisgil (Author)

This PR is pretty much WIP, as I have tested only one use case; it also depends on docker/docker-py#2471.
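For context, here is a minimal sketch of what that docker-py dependency provides, assuming the DeviceRequest type proposed in docker/docker-py#2471 (it later shipped in docker-py 4.3); the image and command below are just examples:

import docker
from docker.types import DeviceRequest

client = docker.from_env()

# Roughly `docker run --gpus=all nvidia/cuda:9.0-base nvidia-smi`;
# count=-1 requests all available GPUs.
output = client.containers.run(
    "nvidia/cuda:9.0-base",
    "nvidia-smi",
    device_requests=[DeviceRequest(count=-1, capabilities=[["gpu"]])],
)
print(output.decode())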

@yoanisgil (Author)

@rumpl @ndeloof @ulyssessouza I'm happy to make this PR more compliant (tests, documentation, etc.) as long as you think it is heading in the right direction.

@@ -597,6 +599,21 @@
}
}
}
},
"device_requests": {
@ndeloof (Contributor) commented Jan 7, 2020

The leading schema definition is https://github.com/docker/cli/tree/master/cli/compose/schema/data (see #7047). I'm currently working on getting this schema duplication issue resolved.

@yoanisgil (Author)

That actually raises a very good point. Maybe the GPU allocation request should be based on generic_resources? Just saying because of this: https://github.com/docker/cli/blob/master/cli/compose/loader/full-example.yml

But then that would require a translation to a DeviceRequest.
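To make that translation concrete, here is a hypothetical sketch (the helper name and the input shape are mine, not from the PR) of mapping a parsed generic_resources entry onto docker-py's DeviceRequest:

from docker.types import DeviceRequest

def generic_resource_to_device_request(resource):
    # Hypothetical input: a parsed discrete_resource_spec mapping,
    # e.g. {"kind": "gpu", "value": 2} as in full-example.yml.
    if resource.get("kind") != "gpu":
        raise ValueError("only gpu resources map to a DeviceRequest")
    return DeviceRequest(
        driver="nvidia",
        count=int(resource.get("value", -1)),
        capabilities=[["gpu"]],
    )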

@@ -151,6 +151,8 @@

"external_links": {"type": "array", "items": {"type": "string"}, "uniqueItems": true},
"extra_hosts": {"$ref": "#/definitions/list_or_dict"},
"device_requests": {"$ref": "#/definitions/device_requests"},
@ndeloof (Contributor)

Within schema v3, runtime constraints have been moved into deploy.resources. That sounds like a better place for such a device allocation request.

@yoanisgil (Author)

I can move it, but it seems to me it's better to wait until you figure out how to reconcile the schema definitions between docker and compose before making any other changes ...

@yoanisgil left a comment

Feedback

sileht added commits to sileht/compose that referenced this pull request Jan 30, 2020
@lshamis commented Mar 5, 2020

I thought a docker-compose file tries to match the docker command line.

If I want to set ipc, pid, privileged, ..., the docker CLI would look like:

docker run --ipc=host --pid=host --privileged=true ...

and the docker-compose looks like:

services:
    myservice:
        ipc: "host"
        pid: "host"
        privileged: true
        ...

With newer versions of Docker, the command line to use the NVIDIA runtime is:

docker run --gpus=all --rm nvidia/cuda nvidia-smi

so I would assume the docker-compose equivalent would look like:

services:
    myservice:
        gpus: "all"
        ...

but with this PR (according to the comments in #6691), GPU usage would look like:

services:
    gpu:
        image: 'nvidia/cuda:9.0-base'
        command: 'nvidia-smi'
        device_requests:
            - capabilities:
               - "gpu"

which seems much more complicated to me.

@hadim commented Jul 1, 2020

Is anyone still working on this PR?

@xkortex commented Jul 6, 2020

@lshamis I believe the motivation behind the device_requests syntax is the ongoing long-term goal of making swarm mode as generic as possible. docker run --gpus=all makes sense in the scope of docker run, but not so much in docker swarm. However, I agree that the requests notation is more confusing. As awesome as the aspirations for swarm are, there is still a user base relying on compose for single-machine orchestration.

network_mode: host is a compose v3 field that is incompatible with swarm. Is it plausible to make this syntax local-machine-only and throw an error if it is used in swarm mode, in a similar manner to network_mode?

services:
    myservice:
        gpus: "all"
        ...

I think it violates the law of least surprise more to lack this syntax when docker run supports it.

Nearly my entire department is still using the 2.x syntax for runtime: nvidia, because we use GPU-enabled containers day in and day out and cannot always set a default runtime in daemon.json.

To give you an idea, here's the side quest I currently have to embark on to test my ML API, versus what used to be simply docker-compose up:

  • docker-compose up my application stack. The ML container dies because there is no GPU, so the worker times out.
  • docker run --gpus=all --name=ml_server --network=myapp_default -v all_the_args_normally_in_compose my_ml_image:latest to start the CUDA container.
  • Restart the worker container, which fortunately still accepts the container name for DNS lookup, but I have to remember to set the --name flag when I start the CUDA container, or else the worker can't talk to it.

@yoanisgil (Author)

Is anyone still working on this PR?

I'm not (mostly because of the lack of feedback and/or interest).

@hadim commented Aug 7, 2020

FYI: docker/docker-py#2471 has been merged.

@visheratin

Can someone tell us what is stopping this PR from being merged?

@edurenye commented Sep 2, 2020

There are conflicting files; it needs to be rebased, I guess.

@PhotoTeeborChoka commented Sep 7, 2020

Guys (@rumpl @ndeloof @ulyssessouza), would it be possible to give the necessary feedback to @yoanisgil so that he can finalize this PR, or so that somebody can take over and finalize it, so that it can be merged? ML applications, as well as other GPU-based ones, cannot be used very well with docker-compose until compatibility with Docker 19.03's --gpus flag is complete.

The largest hindrance, the lack of docker-py support, is now gone, so let's make this work!

@ysyyork commented Sep 21, 2020

Any progress on merging this into master?

@alexandrecharpy

Agree with @PhotoTeeborChoka. If this is just a rebase/conflict issue, would it speed things up if I rebased a fork from @yoanisgil (with his consent)?

@yoanisgil (Author)

@alexandrecharpy please go ahead. At the time I submitted the PR I had the time and the will to make this happen, but unfortunately that's no longer the case.

@glours (Contributor) commented Jul 27, 2022

Thanks for taking the time to create this issue/pull request!

Unfortunately, Docker Compose V1 has reached end-of-life and we are not accepting any more changes (except for security issues). Please try to reproduce your issue with Compose V2, or rewrite your pull request against the v2 branch, and create a new issue or PR with the relevant Compose V2 information.

@glours closed this Jul 27, 2022