Closed
3 changes: 3 additions & 0 deletions cmd/nvidia-ctk-installer/container/container.go
@@ -49,6 +49,9 @@ type Options struct {
SetAsDefault bool
RestartMode string
HostRootMount string
// NvidiaConfig specifies the path to the NVIDIA-specific config file to use instead of
// modifying the main configuration file.
NvidiaConfig string
}

// Configure applies the options to the specified config
@@ -180,5 +180,6 @@ func getRuntimeConfig(o *container.Options, co *Options) (engine.Interface, erro
containerd.WithRuntimeType(co.runtimeType),
containerd.WithUseLegacyConfig(co.useLegacyConfig),
containerd.WithContainerAnnotations(co.containerAnnotationsFromCDIPrefixes()...),
containerd.WithNvidiaConfig(o.NvidiaConfig),
)
}
1 change: 1 addition & 0 deletions cmd/nvidia-ctk-installer/container/runtime/crio/crio.go
@@ -206,5 +206,6 @@ func getRuntimeConfig(o *container.Options) (engine.Interface, error) {
toml.FromFile(o.Config),
),
),
crio.WithNvidiaConfig(o.NvidiaConfig),
)
}
9 changes: 9 additions & 0 deletions cmd/nvidia-ctk-installer/container/runtime/runtime.go
@@ -36,6 +36,8 @@ const (
defaultHostRootMount = "/host"

runtimeSpecificDefault = "RUNTIME_SPECIFIC_DEFAULT"

defaultNVIDIARuntimeConfigFilePath = "/etc/nvidia-container-runtime/config.d/99-nvidia.conf"
)

type Options struct {
@@ -54,6 +56,13 @@ func Flags(opts *Options) []cli.Flag {
Destination: &opts.Config,
Sources: cli.EnvVars("RUNTIME_CONFIG", "CONTAINERD_CONFIG", "DOCKER_CONFIG"),
},
&cli.StringFlag{
Name: "drop-in-config",
Usage: "Path to the NVIDIA-specific config file to create. When specified, runtime configurations are saved to this file instead of modifying the main config file",
Destination: &opts.NvidiaConfig,
Value: defaultNVIDIARuntimeConfigFilePath,
Member

Question: How does this work for crio? Do we also have the ability to update the imports there? Is this something that we want to do?

Collaborator Author

It is possible to point to a different location than the default one (see https://www.ibm.com/docs/en/zoscp/1.1.0?topic=options-crio-commands-flags#crio-commands__title__4):
--config-dir | Path to the configuration drop-in directory.

So we could offer the same drop-in-config flag on the crio side.
Let me see what that would look like.

Member

To be clear, I'm OK with runtime-specific defaults here. We already do this for other settings here:

// Apply the runtime-specific config changes.
switch runtime {
case containerd.Name:
if opts.Config == runtimeSpecificDefault {
opts.Config = containerd.DefaultConfig
}
if opts.Socket == runtimeSpecificDefault {
opts.Socket = containerd.DefaultSocket
}
if opts.RestartMode == runtimeSpecificDefault {
opts.RestartMode = containerd.DefaultRestartMode
}
case crio.Name:
if opts.Config == runtimeSpecificDefault {
opts.Config = crio.DefaultConfig
}
if opts.Socket == runtimeSpecificDefault {
opts.Socket = crio.DefaultSocket
}
if opts.RestartMode == runtimeSpecificDefault {
opts.RestartMode = crio.DefaultRestartMode
}
case docker.Name:
if opts.Config == runtimeSpecificDefault {
opts.Config = docker.DefaultConfig
}
if opts.Socket == runtimeSpecificDefault {
opts.Socket = docker.DefaultSocket
}
if opts.RestartMode == runtimeSpecificDefault {
opts.RestartMode = docker.DefaultRestartMode
}
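The sentinel pattern in the snippet above can also be expressed with a lookup table, which may make it easier to add a per-runtime default for the drop-in path later. This is only a sketch: the `defaults` and `options` types, the `resolve` helper, and the example paths are hypothetical and are not taken from the toolkit.

```go
package main

import "fmt"

// runtimeSpecificDefault mirrors the sentinel value used in the snippet above.
const runtimeSpecificDefault = "RUNTIME_SPECIFIC_DEFAULT"

// defaults groups per-runtime default values. The values used in main
// are placeholders for illustration, not the project's real defaults.
type defaults struct {
	Config      string
	Socket      string
	RestartMode string
}

// options is a hypothetical stand-in for the installer's Options struct.
type options struct {
	Config      string
	Socket      string
	RestartMode string
}

// resolve replaces any field that still holds the sentinel with the
// default for the selected runtime and leaves explicit values untouched.
func resolve(o *options, d defaults) {
	if o.Config == runtimeSpecificDefault {
		o.Config = d.Config
	}
	if o.Socket == runtimeSpecificDefault {
		o.Socket = d.Socket
	}
	if o.RestartMode == runtimeSpecificDefault {
		o.RestartMode = d.RestartMode
	}
}

func main() {
	table := map[string]defaults{
		"containerd": {Config: "/example/containerd.toml", Socket: "/example/containerd.sock", RestartMode: "signal"},
		"crio":       {Config: "/example/crio.conf", Socket: "/example/crio.sock", RestartMode: "systemd"},
	}

	// Socket was set explicitly, so only Config and RestartMode are resolved.
	o := options{Config: runtimeSpecificDefault, Socket: "/custom.sock", RestartMode: runtimeSpecificDefault}
	resolve(&o, table["crio"])
	fmt.Printf("%+v\n", o)
}
```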

Collaborator Author

OK, I think it is OK to go with the defaults for now.

Sources: cli.EnvVars("RUNTIME_DROP_IN_CONFIG"),
},
&cli.StringFlag{
Name: "executable-path",
Usage: "The path to the runtime executable. This is used to extract the current config",
11 changes: 11 additions & 0 deletions cmd/nvidia-ctk/runtime/configure/configure.go
Member

What about the following:

nvidia-ctk runtime configure \
    --runtime=containerd \
    --config=/etc/containerd/config.toml \
    --nvidia-config=/etc/containerd/conf.d/99-nvidia.toml

and

nvidia-ctk runtime configure \
    --runtime=crio \
    --config=/etc/crio/crio.conf \
    --nvidia-config=/etc/crio/conf.d/99-nvidia.toml

The drop-in dirs can be inferred as the PARENT dir of the nvidia-config file.

Note that IFF --nvidia-config="" we could trigger the same behaviour as --dry-run, in which case no config modifications are made to the specified config files.

Does it make sense to set up tests for the various combinations here and then use that to drive development?
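One way the proposal above could be driven by tests is to enumerate the flag combinations in a table. The sketch below assumes the behaviour described in the comment (drop-in directory is the parent of the NVIDIA config file, empty value behaves like a dry run); the `plan` type and `planFor` function are hypothetical names used only for illustration.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// plan describes what a configure run would do (hypothetical type).
type plan struct {
	DropInDir  string // directory the main config would import from
	DropInFile string // NVIDIA-specific file that would be written
	DryRun     bool   // true when no files would be modified
}

// planFor derives the drop-in directory as the parent of nvidiaConfig.
// An empty nvidiaConfig is treated like --dry-run, per the proposal above.
func planFor(nvidiaConfig string) plan {
	if nvidiaConfig == "" {
		return plan{DryRun: true}
	}
	return plan{DropInDir: filepath.Dir(nvidiaConfig), DropInFile: nvidiaConfig}
}

func main() {
	// The paths are the ones used in the review comment.
	cases := []struct{ runtime, nvidiaConfig string }{
		{"containerd", "/etc/containerd/conf.d/99-nvidia.toml"},
		{"crio", "/etc/crio/conf.d/99-nvidia.toml"},
		{"containerd", ""},
	}
	for _, tc := range cases {
		fmt.Printf("%s %q -> %+v\n", tc.runtime, tc.nvidiaConfig, planFor(tc.nvidiaConfig))
	}
}
```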

Collaborator Author

I have implemented this approach. PTAL

@@ -49,6 +49,8 @@ const (
defaultConfigSource = configSourceFile
configSourceCommand = "command"
configSourceFile = "file"

defaultNVIDIARuntimeConfigFilePath = "/etc/nvidia-container-runtime/config.d/99-nvidia.conf"
)

type command struct {
@@ -73,6 +75,7 @@ type config struct {
configSource string
mode string
hookFilePath string
nvidiaConfig string

nvidiaRuntime struct {
name string
@@ -118,6 +121,12 @@ func (m command) build() *cli.Command {
Usage: "path to the config file for the target runtime",
Destination: &config.configFilePath,
},
&cli.StringFlag{
Name: "drop-in-config",
Usage: "path to the NVIDIA-specific config file to create. When specified, runtime configurations are saved to this file instead of modifying the main config file",
Destination: &config.nvidiaConfig,
Value: defaultNVIDIARuntimeConfigFilePath,
},
&cli.StringFlag{
Name: "executable-path",
Usage: "The path to the runtime executable. This is used to extract the current config",
@@ -268,12 +277,14 @@ func (m command) configureConfigFile(config *config) error {
containerd.WithLogger(m.logger),
containerd.WithPath(config.configFilePath),
containerd.WithConfigSource(configSource),
containerd.WithNvidiaConfig(config.nvidiaConfig),
)
case "crio":
cfg, err = crio.New(
crio.WithLogger(m.logger),
crio.WithPath(config.configFilePath),
crio.WithConfigSource(configSource),
crio.WithNvidiaConfig(config.nvidiaConfig),
)
case "docker":
cfg, err = docker.New(
161 changes: 160 additions & 1 deletion pkg/config/engine/containerd/config.go
@@ -18,6 +18,8 @@ package containerd

import (
"fmt"
"os"
"path/filepath"

"github.com/NVIDIA/nvidia-container-toolkit/pkg/config/engine"
"github.com/NVIDIA/nvidia-container-toolkit/pkg/config/toml"
@@ -123,12 +125,38 @@ func (c *Config) EnableCDI() {
*c.Tree = config
}

- // RemoveRuntime removes a runtime from the docker config
+ // RemoveRuntime removes a runtime from the containerd config
func (c *Config) RemoveRuntime(name string) error {
if c == nil || c.Tree == nil {
return nil
}

// If using NVIDIA-specific configuration, handle file cleanup
if c.nvidiaConfig != "" {
// Check if all NVIDIA runtimes are being removed
remainingNvidiaRuntimes := 0
if runtimes := c.GetPath([]string{"plugins", c.CRIRuntimePluginName, "containerd", "runtimes"}); runtimes != nil {
if runtimesTree, ok := runtimes.(*toml.Tree); ok {
for _, runtimeName := range runtimesTree.Keys() {
if c.isNvidiaRuntime(runtimeName) && runtimeName != name {
remainingNvidiaRuntimes++
}
}
}
}
Comment on lines +134 to +146
Copilot AI Sep 9, 2025

This runtime counting logic is duplicated between CRI-O and containerd implementations. Consider extracting this into a shared helper function to reduce code duplication.
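A shared helper along the lines suggested here might look like the following sketch. The function name `countRemaining` and its signature are hypothetical; the predicate is passed in so that the containerd and CRI-O packages could each supply their own check.

```go
package main

import "fmt"

// countRemaining returns how many names satisfy isNvidia, not counting
// the runtime that is about to be removed. Hypothetical shared helper.
func countRemaining(names []string, removing string, isNvidia func(string) bool) int {
	n := 0
	for _, name := range names {
		if name != removing && isNvidia(name) {
			n++
		}
	}
	return n
}

// isNvidiaRuntime uses the same three names as the diff.
func isNvidiaRuntime(name string) bool {
	return name == "nvidia" || name == "nvidia-cdi" || name == "nvidia-legacy"
}

func main() {
	names := []string{"runc", "nvidia", "nvidia-cdi"}
	// Removing "nvidia" still leaves "nvidia-cdi" configured.
	fmt.Println(countRemaining(names, "nvidia", isNvidiaRuntime))
}
```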

// If this is the last NVIDIA runtime, remove the NVIDIA config file
if remainingNvidiaRuntimes == 0 {
if err := os.Remove(c.nvidiaConfig); err != nil && !os.IsNotExist(err) {
c.Logger.Warningf("Failed to remove NVIDIA config file %s: %v", c.nvidiaConfig, err)
} else {
c.Logger.Infof("Removed NVIDIA config file: %s", c.nvidiaConfig)
}
// Don't modify the in-memory tree when using NVIDIA-specific configuration
return nil
}
}

config := *c.Tree

config.DeletePath([]string{"plugins", c.CRIRuntimePluginName, "containerd", "runtimes", name})
@@ -154,3 +182,134 @@ func (c *Config) RemoveRuntime(name string) error {
*c.Tree = config
return nil
}

// Save writes the config to the specified path or NVIDIA-specific config file
func (c *Config) Save(path string) (int64, error) {
if c.nvidiaConfig == "" {
// Backward compatibility: save to main config
return c.Tree.Save(path)
}

// Ensure directory for NVIDIA config file exists
dir := filepath.Dir(c.nvidiaConfig)
if err := os.MkdirAll(dir, 0755); err != nil {
return 0, fmt.Errorf("failed to create directory for NVIDIA config: %w", err)
}

// Save runtime configs to NVIDIA config file
nvidiaConfig := c.extractRuntimeConfig()
n, err := nvidiaConfig.Save(c.nvidiaConfig)
if err != nil {
return n, fmt.Errorf("failed to save NVIDIA config: %w", err)
}

// Update main config with imports directive
if err := c.updateMainConfigImports(path); err != nil {
// Try to clean up the NVIDIA config file on error
os.Remove(c.nvidiaConfig)
Copilot AI Sep 9, 2025

The error from os.Remove is silently ignored during cleanup. Consider logging the cleanup failure or handling it appropriately, as this could indicate permission issues or other problems.

Suggested change
os.Remove(c.nvidiaConfig)
if removeErr := os.Remove(c.nvidiaConfig); removeErr != nil {
c.Logger.Errorf("failed to remove NVIDIA config file during cleanup: %v", removeErr)
}

return n, fmt.Errorf("failed to update main config imports: %w", err)
}

c.Logger.Infof("Wrote NVIDIA runtime configuration to: %s", c.nvidiaConfig)
return n, nil
}

// extractRuntimeConfig creates a new config tree with only runtime configurations
func (c *Config) extractRuntimeConfig() *toml.Tree {
config, _ := toml.TreeFromMap(map[string]interface{}{
"version": c.Version,
})

// Extract runtime configurations for NVIDIA runtimes
if runtimes := c.GetPath([]string{"plugins", c.CRIRuntimePluginName, "containerd", "runtimes"}); runtimes != nil {
if runtimesTree, ok := runtimes.(*toml.Tree); ok {
nvidiaRuntimes, _ := toml.TreeFromMap(map[string]interface{}{})
for _, name := range runtimesTree.Keys() {
if c.isNvidiaRuntime(name) {
if runtime := runtimesTree.Get(name); runtime != nil {
nvidiaRuntimes.Set(name, runtime)
}
}
}
if len(nvidiaRuntimes.Keys()) > 0 {
config.SetPath([]string{"plugins", c.CRIRuntimePluginName, "containerd", "runtimes"}, nvidiaRuntimes)
}
}
}

// Extract default runtime name if it's one of ours
if defaultRuntime, ok := c.GetPath([]string{"plugins", c.CRIRuntimePluginName, "containerd", "default_runtime_name"}).(string); ok {
if c.isNvidiaRuntime(defaultRuntime) {
config.SetPath([]string{"plugins", c.CRIRuntimePluginName, "containerd", "default_runtime_name"}, defaultRuntime)
}
}

// Extract CDI enablement
if cdiEnabled, ok := c.GetPath([]string{"plugins", c.CRIRuntimePluginName, "enable_cdi"}).(bool); ok && cdiEnabled {
config.SetPath([]string{"plugins", c.CRIRuntimePluginName, "enable_cdi"}, true)
}

return config
}

// updateMainConfigImports ensures the main config includes an imports directive
func (c *Config) updateMainConfigImports(path string) error {
// Load the main config file
mainConfig, err := toml.FromFile(path).Load()
if err != nil {
// If the file doesn't exist, create a minimal config with imports
if os.IsNotExist(err) {
mainConfig, _ = toml.TreeFromMap(map[string]interface{}{
"version": c.Version,
})
} else {
return fmt.Errorf("failed to load main config: %w", err)
}
}

// Add imports directive if not present
importPattern := c.nvidiaConfig
imports := mainConfig.Get("imports")
if imports == nil {
mainConfig.Set("imports", []string{importPattern})
} else if importsList, ok := imports.([]interface{}); ok {
Comment on lines +273 to +275
Copilot AI Sep 9, 2025

The imports handling logic has complex type checking with multiple nested conditions. Consider refactoring this into separate helper functions to improve readability and maintainability.
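A possible shape for such a helper is sketched below. It assumes the same three cases the diff handles (no `imports` key, a `[]interface{}`, or a `[]string`); the name `ensureImport` and its return values are hypothetical.

```go
package main

import "fmt"

// ensureImport returns the imports value with pattern appended if it is
// missing, plus a flag reporting whether anything changed. It accepts
// the shapes handled in the diff: nil, []string and []interface{}.
func ensureImport(imports interface{}, pattern string) (interface{}, bool, error) {
	switch list := imports.(type) {
	case nil:
		// No imports key yet: start a new list.
		return []string{pattern}, true, nil
	case []string:
		for _, imp := range list {
			if imp == pattern {
				return list, false, nil
			}
		}
		return append(list, pattern), true, nil
	case []interface{}:
		for _, imp := range list {
			if s, ok := imp.(string); ok && s == pattern {
				return list, false, nil
			}
		}
		return append(list, pattern), true, nil
	default:
		return nil, false, fmt.Errorf("unexpected imports type: %T", imports)
	}
}

func main() {
	existing := []interface{}{"/etc/containerd/conf.d/other.toml"}
	updated, changed, err := ensureImport(existing, "/etc/containerd/conf.d/99-nvidia.toml")
	fmt.Println(updated, changed, err)
}
```

The caller would then write the returned value back to the `imports` key only when the flag reports a change.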

// Check if the import pattern already exists
found := false
for _, imp := range importsList {
if impStr, ok := imp.(string); ok && impStr == importPattern {
found = true
break
}
}
if !found {
// Add our import pattern
importsList = append(importsList, importPattern)
mainConfig.Set("imports", importsList)
}
} else if importsStrList, ok := imports.([]string); ok {
// Check if the import pattern already exists
found := false
for _, imp := range importsStrList {
if imp == importPattern {
found = true
break
}
}
if !found {
// Add our import pattern
importsStrList = append(importsStrList, importPattern)
mainConfig.Set("imports", importsStrList)
}
} else {
return fmt.Errorf("unexpected imports type: %T", imports)
}

// Save the updated main config
_, err = mainConfig.Save(path)
return err
}

// isNvidiaRuntime checks if the runtime name is an NVIDIA runtime
func (c *Config) isNvidiaRuntime(name string) bool {
return name == "nvidia" || name == "nvidia-cdi" || name == "nvidia-legacy"
Comment on lines +312 to +314
Copilot AI Sep 18, 2025

The hardcoded runtime names are duplicated between containerd and crio implementations. Consider extracting these to a shared constant or configuration to avoid inconsistencies.

Suggested change
// isNvidiaRuntime checks if the runtime name is an NVIDIA runtime
func (c *Config) isNvidiaRuntime(name string) bool {
return name == "nvidia" || name == "nvidia-cdi" || name == "nvidia-legacy"
// nvidiaRuntimeNames contains the recognized NVIDIA runtime names.
var nvidiaRuntimeNames = []string{"nvidia", "nvidia-cdi", "nvidia-legacy"}
// isNvidiaRuntime checks if the runtime name is an NVIDIA runtime
func (c *Config) isNvidiaRuntime(name string) bool {
for _, n := range nvidiaRuntimeNames {
if name == n {
return true
}
}
return false

}
2 changes: 1 addition & 1 deletion pkg/config/engine/containerd/config_v1.go
@@ -122,7 +122,7 @@ func (c *ConfigV1) RemoveRuntime(name string) error {

// Save writes the config to a file
func (c ConfigV1) Save(path string) (int64, error) {
- return (Config)(c).Save(path)
+ return (*Config)(&c).Save(path)
Comment on lines 124 to +125
Copilot AI Sep 18, 2025

This type conversion creates a pointer to a copy of the struct instead of using the existing pointer. It should be return ((*Config)(c)).Save(path) to avoid creating an unnecessary copy.
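For context on the Go semantics involved: with a value receiver, `c` is already a copy of the caller's struct and is not a pointer, so `(*Config)(c)` only compiles if the receiver is changed to `*ConfigV1`. The sketch below uses simplified stand-in types (not the toolkit's real `Config` and `ConfigV1`) to show the difference between the two receiver forms.

```go
package main

import "fmt"

// Config and ConfigV1 are simplified stand-ins for the types in the diff.
type Config struct{ saved bool }

// ConfigV1 shares Config's underlying type, so conversions are allowed.
type ConfigV1 Config

// Save has a pointer receiver, like the new Config.Save in this change.
func (c *Config) Save() { c.saved = true }

// SaveByValue has a value receiver: c is a copy of the caller's struct,
// so (*Config)(&c) points at that copy and the caller's struct is
// untouched. (*Config)(c) would not compile here since c is not a pointer.
func (c ConfigV1) SaveByValue() { (*Config)(&c).Save() }

// SaveByPointer has a pointer receiver, so the conversion reuses the
// caller's struct and no copy is made.
func (c *ConfigV1) SaveByPointer() { (*Config)(c).Save() }

func main() {
	a := ConfigV1{}
	a.SaveByValue()

	b := ConfigV1{}
	b.SaveByPointer()

	fmt.Println(a.saved, b.saved) // prints "false true"
}
```

In the real code the copy made by the value receiver mainly matters if `Save` is expected to mutate the config; writing the file works either way.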

}

func (c *ConfigV1) GetRuntimeConfig(name string) (engine.RuntimeConfig, error) {
4 changes: 4 additions & 0 deletions pkg/config/engine/containerd/containerd.go
@@ -46,6 +46,9 @@ type Config struct {
// for the CRI runtime service. The name of this plugin was changed in v3 of the
// containerd configuration file.
CRIRuntimePluginName string
// nvidiaConfig specifies the path to the NVIDIA-specific configuration file.
// If set, runtime configurations will be saved to this file instead of the main config.
nvidiaConfig string
}

var _ engine.Interface = (*Config)(nil)
@@ -108,6 +111,7 @@ func New(opts ...Option) (engine.Interface, error) {
RuntimeType: b.runtimeType,
UseLegacyConfig: b.useLegacyConfig,
ContainerAnnotations: b.containerAnnotations,
nvidiaConfig: b.nvidiaConfig,
}

switch configVersion {
10 changes: 10 additions & 0 deletions pkg/config/engine/containerd/option.go
@@ -29,6 +29,7 @@ type builder struct {
path string
runtimeType string
containerAnnotations []string
nvidiaConfig string
}

// Option defines a function that can be used to configure the config builder
@@ -82,3 +83,12 @@ func WithContainerAnnotations(containerAnnotations ...string) Option {
b.containerAnnotations = containerAnnotations
}
}

// WithNvidiaConfig sets the NVIDIA-specific config file path for the config builder.
// When set, configurations will be saved to this file instead of modifying
// the main config file directly.
func WithNvidiaConfig(path string) Option {
return func(b *builder) {
b.nvidiaConfig = path
}
}