Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support CDI devices in --device flag #4084

Merged
merged 2 commits into from
Jul 6, 2023
Merged

Conversation

elezar
Copy link
Contributor

@elezar elezar commented Mar 10, 2023

- What I did

Added support for specifying CDI devices in the --device flag. See #3864

Note that support for specifying CDI devices in the --device flag has been merged into Podman, Singularity (OCI mode), and Containerd's ctr command.

All of these rely on the fact that fully-qualified CDI devices names provide a unique mechanism to identify these devices names which does not conflict with devices nodes on Linux systems or devices identifiers on Windows systems.

- How I did it

Before performing the platform-specific validation and parsing of the --device arguments, the specified device is checked to see whether it is a fully-qualified device name of the form: vendor.com/class=name. If this is the case, it is considered a CDI device.

A device request (similar to the --gpus flag) is constructed from the specified CDI device names and added to the HostConfig to be passed to the API. This requires moby/moby#45134 to be regognized and processed properly by the daemon.

- How to verify it

Prerequisites:

  1. The Daemon includes the changes from Add support for CDI devices under Linux moby/moby#45134
  2. The Daemon is configured in experimental mode
  3. The CLI includes the changes from this PR

Create a CDI specification for testing:

cat > vendor.yaml << EOF
cdiVersion: 0.3.0
kind: vendor1.com/device
devices:
- name: test
  containerEdits:
    env:
    - FOO="injected by cdi"
EOF

Copy the generated spec to /var/run/cdi:

sudo mkdir -p /var/run/cdi
sudo cp vendor.yaml /var/run/cdi

Run the docker CLI requesting the device:

docker run --rm -ti --device vendor1.com/device=test  busybox  env
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
HOSTNAME=e9c8e63f6ce2
TERM=xterm
FOO="injected by cdi"
HOME=/root

Running with an undefined cdi device name shows:

docker run --rm -ti --device vendor1.com/device=invalid  busybox  env
docker: Error response from daemon: CDI device injection failed: unresolvable CDI devices vendor1.com/device=invalid.

If the daemon does not support CDI (e.g. is in experimental mode), the following will be shown:

docker run --rm -ti --device vendor1.com/device=test  busybox  env
docker: Error response from daemon: could not select device driver "cdi" with capabilities: [[cdi]].

- Description for the changelog

Add support for requesting CDI devices using the --device flag will show an error indicating a missing cdi driver.

- A picture of a cute animal (not mandatory but encouraged)

Copy link
Contributor

@corhere corhere left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pulling in nearly 40,000 lines of vendored code just to use cdi.IsQualifiedName is unlikely to be acceptable. Go is fairly good at only vendoring the parts of the package graph which are reachable from the main module, so splitting the name parsing logic from github.com/container-orchestrated-devices/container-device-interface/pkg/cdi into a separate package (within the same module) might be sufficient to cut out the unnecessary vendoring.

@elezar
Copy link
Contributor Author

elezar commented Mar 21, 2023

Pulling in nearly 40,000 lines of vendored code just to use cdi.IsQualifiedName is unlikely to be acceptable. Go is fairly good at only vendoring the parts of the package graph which are reachable from the main module, so splitting the name parsing logic from github.com/container-orchestrated-devices/container-device-interface/pkg/cdi into a separate package (within the same module) might be sufficient to cut out the unnecessary vendoring.

Yes, there have been discussions in the COD working group regarding better handling CDI dependencies. I will chat to @klihub as well to see whether we can get something done for this.

Update: I have created cncf-tags/container-device-interface#130 to move the device name validation to a separate package.

@elezar
Copy link
Contributor Author

elezar commented Mar 29, 2023

@corhere thanks for the suggestion with regards to reworking the package upstream. I have reorganised it and the dependencies that are now pulled in are more limited.

cli/command/container/opts_test.go Outdated Show resolved Hide resolved
@klueska
Copy link

klueska commented Jun 1, 2023

The PR to add CDI support to moby has been merged: moby/moby#45134
@corhere is there something still needed to move this PR along to ensure that it makes the train for Docker 25?

Copy link
Collaborator

@cpuguy83 cpuguy83 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little apprehensive to shove this into the existing --device flag due to past issues with the-v flag....
But if it works it works and I don't want to hold it up since even technically -v works with all the things we've overloaded it with... so LGTM.

@thaJeztah
Copy link
Member

We'll have a look at this; want to have a quick check with some other people on the UX 👍

@elezar
Copy link
Contributor Author

elezar commented Jun 2, 2023

Thanks @thaJeztah and others for the review. If there are UX changes required, please let us know. Note that we currently have the same support in Podman, Singularity, and CTR and it would be good to have the same UX across all CLI clients so that we don't end up in the situation we have now with the --gpu flag. If we do choose some other flag or representation, these other clients would have to be updated to also support the new flag.

With these changes the --device flag handles the following cases:

  • Linux device nodes following the pattern: /abs/path/on/host[:/abs/path/in/container][:permissions]
  • Windows devices following the pattern: class/{{UUID}}
  • CDI devices following the pattern: {{VENDOR_ID}}/{{DEVICE_CLASS}}={{DEVICE_ID}} (where none of the elements can be blank or contain slashes).

From my perspective, the risk for ambiguity across these three cases is low and is outweighed by the simplicity of the UX in reasoning about devices. That is to say, users are used to using the --device flag to request access to devices.

@elezar
Copy link
Contributor Author

elezar commented Jun 19, 2023

We'll have a look at this; want to have a quick check with some other people on the UX 👍

@thaJeztah has there been any feedback on possible UX issues?

@elezar elezar requested a review from cpuguy83 June 27, 2023 13:58
Signed-off-by: Evan Lezar <elezar@nvidia.com>
Signed-off-by: Evan Lezar <elezar@nvidia.com>
@elezar
Copy link
Contributor Author

elezar commented Jul 4, 2023

@corhere @cpuguy83 @neersighted @thaJeztah

I've updated the vendoring again. PTAL.

@codecov-commenter
Copy link

codecov-commenter commented Jul 6, 2023

Codecov Report

Merging #4084 (dbd9d5d) into master (dc2eb3b) will increase coverage by 0.01%.
The diff coverage is 100.00%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4084      +/-   ##
==========================================
+ Coverage   59.38%   59.39%   +0.01%     
==========================================
  Files         288      288              
  Lines       24749    24761      +12     
==========================================
+ Hits        14696    14708      +12     
  Misses       9169     9169              
  Partials      884      884              

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, sorry for the delay!

@thaJeztah thaJeztah merged commit b8f51d9 into docker:master Jul 6, 2023
74 checks passed
@elezar elezar deleted the add-cdi-support branch July 7, 2023 09:54
@elezar elezar mentioned this pull request Aug 25, 2023
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants