-
Notifications
You must be signed in to change notification settings - Fork 441
Add basic integration tests for docker runtime #860
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
test/e2e/e2e_test.go
Outdated
|
|
||
| // If there's an error, include stderr in the error message | ||
| if err != nil { | ||
| return "", errors.New(stderr.String()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should deffinitely log the output and return a wrapped err here instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would we not still want to return the stdout?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I introduced an error on purpose to see how it would look like
[FAILED] in [It] - /localhome/local-eduardoa/test/e2e/nvidia-container-toolkit_test.go:45 @ 01/14/25 16:30:38.847
• [FAILED] [1.528 seconds]
docker when running nvidia-smi -L [It] it should show the same output from outside a container
/localhome/local-eduardoa/test/e2e/nvidia-container-toolkit_test.go:38
[FAILED] Unexpected error:
<*errors.errorString | 0xc000128030>:
Unable to find image 'bla:latest' locally
docker: Error response from daemon: pull access denied for bla, repository does not exist or may require 'docker login': denied: requested access to the resource is denied.
See 'docker run --help'.
{
s: "Unable to find image 'bla:latest' locally\ndocker: Error response from daemon: pull access denied for bla, repository does not exist or may require 'docker login': denied: requested access to the resource is denied.\nSee 'docker run --help'.\n",
}
occurred
In [It] at: /localhome/local-eduardoa/test/e2e/nvidia-container-toolkit_test.go:45 @ 01/14/25 16:30:38.847There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My question was more on whether this is idiomatic. What about:
if err := cmd.Run(); err != nil {
GinkgoLogr.Error(err, "Failed to run script:\nSTDERR: %v\nSTDOUT: %v", stderr.String(), stdout.String())
return "", err
}
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and error now reads
docker when running nvidia-smi -L should support NVIDIA_VISIBLE_DEVICES
/localhome/local-eduardoa/test/e2e/nvidia-container-toolkit_test.go:43
[FAILED] in [It] - /localhome/local-eduardoa/test/e2e/nvidia-container-toolkit_test.go:45 @ 01/15/25 11:31:24.787
• [FAILED] [1.621 seconds]
docker when running nvidia-smi -L [It] should support NVIDIA_VISIBLE_DEVICES
/localhome/local-eduardoa/test/e2e/nvidia-container-toolkit_test.go:43
[FAILED] Unexpected error:
<*errors.errorString | 0xc00020c4c0>:
script execution failed: exit status 125
STDOUT:
STDERR: Unable to find image 'bla:latest' locally
docker: Error response from daemon: pull access denied for bla, repository does not exist or may require 'docker login': denied: requested access to the resource is denied.
See 'docker run --help'.
{
s: "script execution failed: exit status 125\nSTDOUT: \nSTDERR: Unable to find image 'bla:latest' locally\ndocker: Error response from daemon: pull access denied for bla, repository do
es not exist or may require 'docker login': denied: requested access to the resource is denied.\nSee 'docker run --help'.\n",
}
occurred
In [It] at: /localhome/local-eduardoa/test/e2e/nvidia-container-toolkit_test.go:45 @ 01/15/25 11:31:24.787There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Not going to block on this.
| // container shows the same output inside the container as outside the | ||
| // container. This means that the following commands must all produce | ||
| // the same output | ||
| When("Running nvidia-smi -L", Ordered, func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is Ordered required here? Is there a way that the different mechanisms to select devices are treated as separate "contexts"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from documentation: Ordered is a decorator that allows you to mark a container as ordered. Specs in the container will always run in the order they appear. They will never be randomized and they will never run in parallel with one another, though they may run in parallel with other specs.
So is not required, we can remove it if we want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are no specs in this container though, or am I misunderstanding what a "Spec" is? Can we rewrite the separate tests as Specs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you mean an It() for each run?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have updated the PR description to reflect new changes related to this thread
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's also update the title to make it more specific.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
both PR title and description are updated now
43b82e4 to
097f649
Compare
894da55 to
255892b
Compare
| Expect(err).ToNot(HaveOccurred()) | ||
| }) | ||
|
|
||
| Describe("comparing outputs of nvidia-smi -L inside and outside a container", func() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need the Describe spec? What about:
When("running nvidia-smi -L")
It("should support NVIDIA_VISIBLE_DEVICES")
It("should support automatic CDI spec generation")
It("should support the --gpus flag using the nvidia-container-runtime-hook")
It("should support the --gpus flag using the nvidia-container-runtime")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is not needed, let me check how it looks without it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, I have removed the Describe , I decided to keep the strings as is for now to reflect the test being run as described in the design doc, we can edit the strings as a follow up PR once we add further complexity like installing and uninstalling things during testing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we want to match the strings to the doc? They should signify the intent, and if that's not clear from the doc then the doc should be updated.
To sumarize: We want these tests to validate that one is able to run nvidia-smi in a container and get the expected result regardless of the mechanism used to inject the devices into the container.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack, will edit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should I also re-write the vectorAdd and deviceQuery tests?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ready for review
255892b to
b22dfd8
Compare
b22dfd8 to
ede9606
Compare
| _, err := runScript("docker pull ubuntu") | ||
| Expect(err).ToNot(HaveOccurred()) | ||
|
|
||
| rawOut, err = runScript("nvidia-smi -L") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
| rawOut, err = runScript("nvidia-smi -L") | |
| hostOutput, err = runScript("nvidia-smi -L") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| }) | ||
|
|
||
| It("should support NVIDIA_VISIBLE_DEVICES", func(ctx context.Context) { | ||
| outContainer, err := runScript("docker run --rm -i --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all ubuntu nvidia-smi -L") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
| outContainer, err := runScript("docker run --rm -i --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all ubuntu nvidia-smi -L") | |
| containerOutput, err := runScript("docker run --rm -i --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all ubuntu nvidia-smi -L") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| out, err = runScript("docker run --rm -i --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0") | ||
| Expect(err).ToNot(HaveOccurred()) | ||
|
|
||
| // look for string "Test PASSED" in the output, if not found, fail the test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this comment says more than what is already readable through the expect statement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
|
|
||
| It("should support NVIDIA_VISIBLE_DEVICES", func(ctx context.Context) { | ||
| var err error | ||
| out, err = runScript("docker run --rm -i --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
| out, err = runScript("docker run --rm -i --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0") | |
| containerOutput, err = runScript("docker run --rm -i --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all nvcr.io/nvidia/k8s/cuda-sample:vectoradd-cuda12.5.0") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| Expect(err).ToNot(HaveOccurred()) | ||
| }) | ||
|
|
||
| var out string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
alternatively call this referenceOutput
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
| @@ -0,0 +1,20 @@ | |||
| module github.com/NVIDIA/nvidia-container-toolkit/test | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not critical to this PR, but we should ensure that we update the dependabot configs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was it done here?
|
Thanks @ArangoGutierrez. This looks really good now. I have some minor comments around variable naming. |
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
ede9606 to
8cc672b
Compare
elezar
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ArangoGutierrez. Let's get this in.
Add a set of basic integration tests for using the NVIDIA Container Toolkit with Docker.
This adds the following tests:
nvidia-smitestsvectorAddtestsdeviceQuerytestsThese tests assume that the NVIDIA Container Toolkit is installed and that Docker is configured with the
nvidiaruntime.Example output