-
Notifications
You must be signed in to change notification settings - Fork 423
Fix leaking of tmpfs mount in CDI mode #1168
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Pull Request Test Coverage Report for Build 16051212578Details
💛 - Coveralls |
38c25d5
to
1c79350
Compare
return unix.Mount("tmpfs", target, "tmpfs", 0, fmt.Sprintf("size=%d", size)) | ||
} | ||
|
||
func createFileInRoot(containerRootDirPath string, destinationPath string, mode os.FileMode) (string, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TODO: This function also exists in internal/ldconfig
we should move this to a separate pacakge.
_, _, err = runner.Run("mkdir -p /tmp/empty") | ||
Expect(err).ToNot(HaveOccurred()) | ||
|
||
_, _, err = runner.Run("mount | sort > /tmp/mounts.before") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about capturing the output of mount | sort
as variables and then comparing these using the Gomega matchers?
Expect(output).To(Equal("ModifyDeviceFiles: 0\n")) | ||
}) | ||
|
||
//sudo docker run --runtime=nvidia --rm -ti -e NVIDIA_VISIBLE_DEVICES=all -e NVIDIA_DRIVER_CAPABILITIES=all --mount type=bind,source=/tmp/empty,target=/empty,bind-propagation=shared ubuntu true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the purpose of this comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, sorry it was me during devel, to not forget that bit
}) | ||
|
||
//sudo docker run --runtime=nvidia --rm -ti -e NVIDIA_VISIBLE_DEVICES=all -e NVIDIA_DRIVER_CAPABILITIES=all --mount type=bind,source=/tmp/empty,target=/empty,bind-propagation=shared ubuntu true | ||
It("should work with nvidia-container-runtime", func(ctx context.Context) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's also add a case for --runtime=runc
for the "legacy" code path. As a general question, could we add a tag to tests to indicate that it's targeting the legacy code path?
BeforeAll(func(ctx context.Context) { | ||
_, _, err := runner.Run("docker pull ubuntu") | ||
Expect(err).ToNot(HaveOccurred()) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Let's remove this from the diff.
50ac666
to
91f1a73
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR refactors how the NVIDIA container runtime hook creates and mounts its params file—switching to procfd-based APIs and secure file creation to prevent mount leaks—and adds end-to-end tests to ensure no host mounts remain after running a container.
- Switch
createParamsFileInContainer
to useutils.WithProcfd
and acreateFileInRoot
helper for safer tmpfs and bind mounts. - Introduce a secure, mknodat-based file creation function (
createFileInRoot
). - Add E2E tests that record host mounts before/after running containers in both legacy and nvidia runtimes to catch mount leaks.
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
File | Description |
---|---|
tests/e2e/nvidia-container-toolkit_test.go | New “Disabling device node creation” suite: captures host mounts, runs containers, and asserts no new mounts |
cmd/nvidia-cdi-hook/disable-device-node-modification/params_linux.go | Refactored createParamsFileInContainer to use procfd mounts and secure file creation, replacing temp dir logic |
Comments suppressed due to low confidence (1)
cmd/nvidia-cdi-hook/disable-device-node-modification/params_linux.go:44
- [nitpick] The error message could include the target path (e.g.,
hookScratchDirPath
) to make debugging mount failures clearer.
return fmt.Errorf("failed to create tmpfs mount for params file: %w", err)
91f1a73
to
70cb383
Compare
This change adds e2e tests to ensure that running a container with shared mount propagaion does not result in leaked mounts. Note that as soon as a single bind mount is requested with shared propagation, the rootfs is also mounted with shared propagation. Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com> Co-authored-by: Evan Lezar <elezar@nvidia.com>
This change ensures that the tmpfs mount created for the modified NVIDIA params file does not leak to the host. Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com> Co-authored-by: Evan Lezar <elezar@nvidia.com>
bc9ebdd
to
6d68eb7
Compare
This patch fixes the handling of temporary files and directories in the NVIDIA container runtime hook, ensuring we don't leak tmpfs mounts when handling the params file in the container when in CDI mode. This is done by ensuring that the tmpfs mount that we create for the modified params file is created in the container's mount namespace instead of on the host.
We also add end2end tests to check for leaking of mounts.