
How to mount a containerPath to a hostPath to discover NVIDIA libraries w/o CDI spec  #632

@Dragoncell

Description

Hello,

During E2E testing of the GPU Operator changes to support COS (https://gitlab.com/nvidia/kubernetes/gpu-operator/-/merge_requests/1061), I found that discovering the NVIDIA libraries requires setting a specific PATH/LD_LIBRARY_PATH in the pod spec.

After the pods are running:

$ kubectl get pods -n gpu-operator
NAME                                                       READY   STATUS      RESTARTS   AGE
gpu-feature-discovery-rr2x2                                1/1     Running     0          4h16m
gpu-operator-66575c8958-sslch                              1/1     Running     0          4h16m
noperator-node-feature-discovery-gc-6968c7c64-g7w7r        1/1     Running     0          4h16m
noperator-node-feature-discovery-master-749679f664-dvs48   1/1     Running     0          4h16m
noperator-node-feature-discovery-worker-glhxw              1/1     Running     0          4h16m
nvidia-container-toolkit-daemonset-wvpvx                   1/1     Running     0          4h16m
nvidia-cuda-validator-z84ks                                0/1     Completed   0          4h15m
nvidia-dcgm-exporter-9r87v                                 1/1     Running     0          4h16m
nvidia-device-plugin-daemonset-fp7hm                       1/1     Running     0          4h16m
nvidia-operator-validator-hstkb                            1/1     Running     0          4h16m

deploy the GPU workload:

apiVersion: v1
kind: Pod
metadata:
  name: my-gpu-pod
spec:
  containers:
  - name: my-gpu-container
    image: nvidia/cuda:11.0.3-base-ubuntu20.04
    command: ["bash", "-c"]
    args: 
    - |-
      # nvidia-smi is only found when these two exports are applied:
      # export PATH="$PATH:/home/kubernetes/bin/nvidia/bin";
      # export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/kubernetes/bin/nvidia/lib64;
      nvidia-smi;
    resources:
      limits: 
        nvidia.com/gpu: "1"
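
For completeness, manually mounting the host directory with a hostPath volume should also work around this on a per-pod basis. The sketch below assumes the image resolves binaries and libraries from /usr/local/nvidia (which is on the image's default PATH, shown next) and bind-mounts the COS host directory there:

apiVersion: v1
kind: Pod
metadata:
  name: my-gpu-pod
spec:
  containers:
  - name: my-gpu-container
    image: nvidia/cuda:11.0.3-base-ubuntu20.04
    command: ["bash", "-c"]
    args:
    - nvidia-smi;
    resources:
      limits:
        nvidia.com/gpu: "1"
    volumeMounts:
    - name: nvidia-host              # host NVIDIA dir, mounted read-only
      mountPath: /usr/local/nvidia
      readOnly: true
  volumes:
  - name: nvidia-host
    hostPath:
      path: /home/kubernetes/bin/nvidia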

I looked at the OCI spec of the container; the PATH is:

PATH=/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

In the case of GKE's device plugin, the NVIDIA binaries are expected under /usr/local (https://github.com/GoogleCloudPlatform/container-engine-accelerators/blob/145797868c0f6bd6a0f37c0295f06dfe5fa94265/cmd/nvidia_gpu/nvidia_gpu.go#L42).

Is there something similar we can configure in the k8s device plugin so that the container path /usr/local is mounted from the NVIDIA bin directory on the host, which is /home/kubernetes/bin/nvidia? Thanks
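
For reference, the GKE plugin linked above injects that mount through its Allocate response. Below is a minimal sketch of the same idea against the kubelet device plugin v1beta1 API (types from k8s.io/kubelet/pkg/apis/deviceplugin/v1beta1); the plugin type and the hard-coded paths are illustrative, not an existing option of this plugin:

package plugin

import (
	"context"

	pluginapi "k8s.io/kubelet/pkg/apis/deviceplugin/v1beta1"
)

// gkeStylePlugin is an illustrative device plugin; only Allocate is sketched.
type gkeStylePlugin struct{}

// Allocate answers the kubelet with a bind mount of the host's NVIDIA
// directory onto a container path that is already on the image's PATH.
func (p *gkeStylePlugin) Allocate(ctx context.Context, req *pluginapi.AllocateRequest) (*pluginapi.AllocateResponse, error) {
	resp := &pluginapi.AllocateResponse{}
	for range req.ContainerRequests {
		resp.ContainerResponses = append(resp.ContainerResponses, &pluginapi.ContainerAllocateResponse{
			Mounts: []*pluginapi.Mount{{
				ContainerPath: "/usr/local/nvidia",           // where the image expects the binaries/libs
				HostPath:      "/home/kubernetes/bin/nvidia", // NVIDIA dir on the COS host
				ReadOnly:      true,
			}},
		})
	}
	return resp, nil
}

(Device allocation itself, i.e. the Devices field of ContainerAllocateResponse, is omitted here for brevity.)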

Metadata


Labels

lifecycle/stale: Denotes an issue or PR that has remained open with no activity and has become stale.
