
feat: add ExtraVolumes and ExtraVolumeMounts to all daemonset specs #2507

Open
75asu wants to merge 1 commit into NVIDIA:main from 75asu:feat/add-extra-volumes-and-mounts-to-daemonsets

Conversation


75asu commented May 30, 2026

Resolves #1532

cc @guptaNswati -- you added the help wanted + feature labels and invited a PR. Here it is.

Description

Adds ExtraVolumes ([]corev1.Volume) and ExtraVolumeMounts ([]corev1.VolumeMount) fields to every daemonset-managing component spec, so ClusterPolicy users can mount arbitrary host paths or other volume sources into the operator-managed daemonsets. The motivating case (issue #1532) is GKE, where NVIDIA driver libraries live at a non-standard path (/home/kubernetes/bin/nvidia/) and have to be exposed to dcgm-exporter and gpu-feature-discovery pods that don't request a GPU resource.

For consistency, the PR is a full sweep across all 16 daemonset specs in ClusterPolicySpec, not just the two mentioned in the issue body:

DriverSpec, ToolkitSpec, DevicePluginSpec, SandboxDevicePluginSpec, KataDevicePluginSpec, DCGMExporterSpec, DCGMSpec, NodeStatusExporterSpec, GPUFeatureDiscoverySpec, MIGManagerSpec, KataManagerSpec, CCManagerSpec, VFIOManagerSpec, VGPUManagerSpec, VGPUDeviceManagerSpec, ValidatorSpec.

How it works

A single helper applyExtraVolumes(obj, vols, mounts) is added near the existing addExtraAnnotations helper. Each Transform* daemonset function gets one new line at the end, after applyHostNetworkConfig, calling the helper with the matching config.<Component>.ExtraVolumes / ExtraVolumeMounts. Mounts are appended to every container in the pod spec; init containers are intentionally left untouched (they are operator-owned setup steps, not user-facing application containers).
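A minimal sketch of what such a helper might look like, based only on the description above. The Volume, VolumeMount, Container, and PodSpec types are simplified stand-ins for the corev1 types so the snippet compiles on its own, and the sketch takes a *PodSpec instead of the obj argument the PR uses. The actual code in controllers/object_controls.go may differ.

```go
package main

// Simplified stand-ins for the corev1 types. Assumption: the real helper
// works on corev1.Volume and corev1.VolumeMount inside a DaemonSet object.
type Volume struct {
	Name     string
	HostPath string
}

type VolumeMount struct {
	Name      string
	MountPath string
}

type Container struct {
	Name         string
	VolumeMounts []VolumeMount
}

type PodSpec struct {
	InitContainers []Container
	Containers     []Container
	Volumes        []Volume
}

// applyExtraVolumes appends vols to the pod spec and appends mounts to every
// regular container. Init containers are deliberately left untouched.
func applyExtraVolumes(spec *PodSpec, vols []Volume, mounts []VolumeMount) {
	spec.Volumes = append(spec.Volumes, vols...)
	// Index into the slice so the append lands on the stored container,
	// not on a per-iteration copy.
	for i := range spec.Containers {
		spec.Containers[i].VolumeMounts = append(spec.Containers[i].VolumeMounts, mounts...)
	}
}
```

Iterating by index matters in a helper of this shape: ranging over the containers by value would append to a copy and silently drop the mounts.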

Two transforms share a parent spec by design:

  • TransformMPSControlDaemon reads from DevicePluginSpec (same as TransformDevicePlugin)
  • TransformSandboxValidator reads from ValidatorSpec (same as TransformValidator)

Both are commented inline to make the sharing explicit.

Example usage after this change

apiVersion: nvidia.com/v1
kind: ClusterPolicy
spec:
  dcgmExporter:
    extraVolumes:
      - name: nvidia-install-dir-host
        hostPath: { path: /home/kubernetes/bin/nvidia }
    extraVolumeMounts:
      - name: nvidia-install-dir-host
        mountPath: /usr/local/nvidia
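
For reviewers who want to see how the example maps onto the new fields, here is a rough sketch of the dcgmExporter portion decoded into Go. The field names extraVolumes and extraVolumeMounts come from the example above; the omitempty tags, the trimmed-down Volume and VolumeMount stand-ins, and the parseDCGMExporterSpec function are assumptions made for illustration, not the declarations in api/nvidia/v1/clusterpolicy_types.go.

```go
package main

import "encoding/json"

// Trimmed-down stand-ins for corev1.HostPathVolumeSource, corev1.Volume and
// corev1.VolumeMount (assumption: only the fields used in the example).
type HostPathSource struct {
	Path string `json:"path"`
}

type Volume struct {
	Name     string          `json:"name"`
	HostPath *HostPathSource `json:"hostPath,omitempty"`
}

type VolumeMount struct {
	Name      string `json:"name"`
	MountPath string `json:"mountPath"`
}

// DCGMExporterSpec here shows only the two new fields; the omitempty tags
// are an assumption.
type DCGMExporterSpec struct {
	ExtraVolumes      []Volume      `json:"extraVolumes,omitempty"`
	ExtraVolumeMounts []VolumeMount `json:"extraVolumeMounts,omitempty"`
}

// exampleJSON is the JSON equivalent of the dcgmExporter block in the
// ClusterPolicy example.
const exampleJSON = `{
  "extraVolumes": [
    {"name": "nvidia-install-dir-host", "hostPath": {"path": "/home/kubernetes/bin/nvidia"}}
  ],
  "extraVolumeMounts": [
    {"name": "nvidia-install-dir-host", "mountPath": "/usr/local/nvidia"}
  ]
}`

// parseDCGMExporterSpec is a hypothetical helper that decodes the fragment.
func parseDCGMExporterSpec(data []byte) (DCGMExporterSpec, error) {
	var spec DCGMExporterSpec
	err := json.Unmarshal(data, &spec)
	return spec, err
}
```

The volume and the mount are tied together only by the shared name, so a typo in either name would leave the mount pointing at a volume that does not exist.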

Note on diff size

The CRD YAML files (config/crd/bases/, bundle/manifests/, deployments/gpu-operator/crds/) gain ~36k lines each. That is the expected expansion: corev1.Volume is a tagged union over ~30 volume source types, and the OpenAPI schema for it inlines all of them. Multiplied across 16 specs and mirrored to three CRD locations, the line count grows fast. All three YAML files are generated by make manifests + make sync-crds, so only the following four Go files matter for review (the last of which is also generated):

  • api/nvidia/v1/clusterpolicy_types.go -- the 32 field declarations
  • controllers/object_controls.go -- the helper + 18 call sites
  • controllers/transforms_test.go -- new test case for TransformDCGMExporter
  • api/nvidia/v1/zz_generated.deepcopy.go -- autogen from controller-gen object

Checklist

  • No secrets, sensitive information, or unrelated changes
  • Lint checks passing (make lint) -- 0 issues
  • Generated assets in-sync (make validate-generated-assets)
  • Go mod artifacts in-sync (make validate-modules) -- all modules verified
  • Test cases are added for new code paths

Testing

  • New sub-case transform_dcgm_exporter_with_extra_volumes_and_volume_mounts added to TestTransformDCGMExporter exercising both ExtraVolumes (single hostPath volume) and ExtraVolumeMounts (single mount). Verifies the volume is appended to the pod spec and the mount is appended to every container.
  • Full controllers test suite passes locally.
  • The other 15 daemonsets follow the identical pattern, so the single test exercises the helper that every other transform calls. Happy to add per-component tests in a follow-up if maintainers prefer; wanted to keep this PR scoped.

Signed-off-by: Asutosh Panda <asutosh.pda@gmail.com>

copy-pr-bot Bot commented May 30, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.



Development

Successfully merging this pull request may close these issues.

Ability to add extra volumes and extra volume mounts to daemonsets
