elezar
Member

@elezar elezar commented Sep 19, 2025

@elezar
Member Author

elezar commented Sep 19, 2025

Please note: this is a very quick first draft.

@elezar elezar force-pushed the drop-in-file-support branch 2 times, most recently from fff1388 to 456ff8f Compare September 19, 2025 14:37
topLevelConfigFile = value
}
// TODO: This should be a sane default.
dropInConfigFile := "/run/toolkit/config/99-nvidia.toml"
Contributor

Question: what about /run/nvidia/toolkit/config/99-nvidia.toml? The toolkit container already creates the toolkit.pid file at /run/nvidia/toolkit/toolkit.pid.

Contributor

I have updated this to /run/nvidia/toolkit/config/99-nvidia.toml.

Contributor

@tariq1890 and I discussed this offline. I have updated the host path to /etc/containerd/conf.d/99-nvidia.toml.

case gpuv1.Containerd.String():
runtimeConfigFile = DefaultContainerdConfigFile
topLevelConfigFile := DefaultContainerdConfigFile
// TODO: We should also read RUNTIME_CONFIG here
Contributor

Ack. Should RUNTIME_CONFIG, if defined, take precedence over CONTAINERD_CONFIG/CRIO_CONFIG?

Contributor

I think the latter should have higher precedence as it is more specific than RUNTIME_CONFIG. If you're in agreement with this, can we resolve the TODO in this PR?

Contributor

Sure. Let me update this...

Contributor

@cdesiniotis cdesiniotis Sep 24, 2025

Updated so we now read RUNTIME_CONFIG as well, with the more specific envvars, e.g. CONTAINERD_CONFIG / CRIO_CONFIG, taking precedence. I also added additional unit test cases for this.
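
For readers following along, the precedence described above could be sketched roughly as follows. This is not the code from the PR: `resolveRuntimeConfig` is a hypothetical helper, the lookup function stands in for however the operator actually reads the toolkit's environment, and `/etc/containerd/config.toml` is only an assumed placeholder for `DefaultContainerdConfigFile`.

```go
package main

import (
	"fmt"
	"os"
)

// resolveRuntimeConfig is a hypothetical helper (not taken from the PR).
// It returns the first non-empty value among:
//  1. the runtime-specific variable (e.g. CONTAINERD_CONFIG or CRIO_CONFIG),
//  2. the generic RUNTIME_CONFIG variable,
//  3. the supplied default path.
func resolveRuntimeConfig(getenv func(string) string, specificVar, defaultPath string) string {
	if v := getenv(specificVar); v != "" {
		return v
	}
	if v := getenv("RUNTIME_CONFIG"); v != "" {
		return v
	}
	return defaultPath
}

func main() {
	// The default path here is an assumed placeholder value.
	fmt.Println(resolveRuntimeConfig(os.Getenv, "CONTAINERD_CONFIG", "/etc/containerd/config.toml"))
}
```

Passing the lookup as a function keeps the precedence logic easy to unit test without touching the process environment.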

Contributor

Thanks!

@cdesiniotis cdesiniotis marked this pull request as ready for review September 23, 2025 22:48
@cdesiniotis cdesiniotis force-pushed the drop-in-file-support branch 2 times, most recently from aad82d3 to c5dc4f4 Compare September 24, 2025 16:13
@tariq1890
Contributor

Why can't we mount the parent dir of the top-level config and the drop-in config dir?

It seems like a lot of the container-toolkit implementation details are spilling into this PR? Any way to minimise?

@cdesiniotis
Contributor

> Why can't we mount the parent dir of the top-level config and the drop-in config dir?
>
> It seems like a lot of the container-toolkit implementation details are spilling into this PR? Any way to minimise?

We are mounting the parent dir for both config files.
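
As a rough illustration of what "mounting the parent dir for both config files" amounts to, here is a minimal sketch. `configMountDirs` is a hypothetical helper, not a function from this PR, and the top-level path `/etc/containerd/config.toml` is an assumed example value. Only the drop-in path comes from the discussion above.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// configMountDirs is a hypothetical helper (not taken from the PR).
// It returns the parent directory of each config file, in order and
// without duplicates, i.e. the directories one would mount so that
// both files are reachable.
func configMountDirs(topLevelConfigFile, dropInConfigFile string) []string {
	seen := map[string]bool{}
	var dirs []string
	for _, f := range []string{topLevelConfigFile, dropInConfigFile} {
		dir := filepath.Dir(f)
		if !seen[dir] {
			seen[dir] = true
			dirs = append(dirs, dir)
		}
	}
	return dirs
}

func main() {
	// The top-level path is an assumed example; the drop-in path is the
	// host path mentioned earlier in this review.
	fmt.Println(configMountDirs(
		"/etc/containerd/config.toml",
		"/etc/containerd/conf.d/99-nvidia.toml",
	))
}
```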

This change adds explicit support for drop-in configs as supported
by containerd and cri-o.

Signed-off-by: Evan Lezar <elezar@nvidia.com>
Co-authored-by: Christopher Desiniotis <cdesiniotis@nvidia.com>
@tariq1890
Contributor

Coverage increased (+0.3%) to 21.388%

🚀 🚀 🚀

@cdesiniotis cdesiniotis merged commit 1513b72 into NVIDIA:main Sep 24, 2025
16 checks passed
4 participants