GPU Operator with KubeVirt - Node Configuration #550

@doronkg

Hi! The following excerpt is taken from the official GPU Operator with KubeVirt documentation:

The GPU Operator can now be configured to deploy different software components on worker nodes depending on what GPU workload is configured to run on those nodes. Consider the following example.

Node A is configured to run containers.
Node B is configured to run virtual machines with Passthrough GPU.
Node C is configured to run virtual machines with vGPU.
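
For reference, here is a minimal sketch of how that per-node split is usually expressed. It assumes the `nvidia.com/gpu.workload.config` node label and the values `container`, `vm-passthrough` and `vm-vgpu` as I understand them from the same documentation. The node names are hypothetical, and the script only prints the `kubectl` commands rather than running them.

```python
# Label key and values are assumed from the GPU Operator KubeVirt docs;
# the node names below are hypothetical placeholders for Node A/B/C.
WORKLOAD_LABEL = "nvidia.com/gpu.workload.config"

NODE_WORKLOADS = {
    "node-a": "container",       # runs containers
    "node-b": "vm-passthrough",  # runs VMs with passthrough GPU
    "node-c": "vm-vgpu",         # runs VMs with vGPU
}


def label_commands(node_workloads):
    """Return one kubectl command per node that sets its workload label."""
    return [
        f"kubectl label node {node} --overwrite {WORKLOAD_LABEL}={workload}"
        for node, workload in node_workloads.items()
    ]


if __name__ == "__main__":
    # Print the commands instead of executing them, so nothing is changed.
    for cmd in label_commands(NODE_WORKLOADS):
        print(cmd)
```

Since a Kubernetes label key holds a single value per node, this scheme selects exactly one workload type per node, which is part of what prompts the questions below.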

We provide GPU capabilities through OpenShift, utilizing the NVIDIA GPU Operator. We aim to enable both GPUs for containers and vGPUs for VMs on the same cluster, ideally on the same nodes.

However, according to the latest NVIDIA GPU Operator release notes, this does not seem to be feasible at all:

The NVIDIA GPU Operator can only be used to deploy a single NVIDIA GPU Driver type and version. The NVIDIA vGPU and Data Center GPU Driver cannot be used within the same cluster.

This discrepancy in the documentation raises several questions:

  1. Is it possible to configure GPU access for both containers and VMs on the same cluster, or even on the same node?
    • If not, is this limitation inherent to the software, or could it be overcome through operator configuration workarounds?
    • If not, are there any plans to introduce dual configuration support for the same cluster/node in the future?
      • If so, is this under open discussion and development, and are contributions accepted?
  2. Can vGPUs be consumed by containers, so that a single configuration could serve both use cases?
