Merged
3 changes: 2 additions & 1 deletion gpu-operator/getting-started.rst
@@ -347,7 +347,8 @@ with the NVIDIA GPU Operator.
Refer to the :ref:`GPU Operator Component Matrix` on the platform support page.

When using RHEL8 with Kubernetes, SELinux must be enabled either in permissive or enforcing mode for use with the GPU Operator.
Additionally, network restricted environments are not supported.
Additionally, when using RHEL8 with containerd as the runtime and SELinux enabled (in either permissive or enforcing mode) at the host level, containerd must also be configured for SELinux by setting the ``enable_selinux=true`` configuration option.
Note that network restricted environments are not supported.
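
For reference, a minimal sketch of that containerd setting, assuming the default CRI plugin section in ``/etc/containerd/config.toml`` (the path and surrounding keys may differ on your host):

.. code-block:: toml

   # /etc/containerd/config.toml -- CRI plugin section (containerd 1.x)
   [plugins."io.containerd.grpc.v1.cri"]
     # Allow containerd to apply SELinux labels to containers; needed when the
     # host runs SELinux in permissive or enforcing mode.
     enable_selinux = true

Restart containerd afterwards (for example, ``systemctl restart containerd``) so the setting takes effect.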


Pre-Installed NVIDIA GPU Drivers
2 changes: 1 addition & 1 deletion gpu-operator/release-notes.rst
@@ -86,7 +86,7 @@ New Features

* Added support for the NVIDIA Data Center GPU Driver version 570.124.06.

* Added support for KubeVirt and OpenShift Virtualization with vGPU v18 for A30, A100, and H100 GPUs.
* Added support for KubeVirt and OpenShift Virtualization with vGPU v18 on H200NVL.
Collaborator (author): @cdesiniotis just wanted to double-check this change. We talked about it quickly in a standup recently, but I'm not sure I wrote the details down right.

Contributor: This is correct.


* Added support for NVIDIA Network Operator v25.1.0.
Refer to :ref:`Support for GPUDirect RDMA` and :ref:`Support for GPUDirect Storage`.
16 changes: 8 additions & 8 deletions openshift/openshift-virtualization.rst
@@ -129,6 +129,8 @@ Procedure
version: 3.2.0
kernelArguments:
- intel_iommu=on
# If you are using an AMD CPU, include the following argument:
# - amd_iommu=on

#. Create the new ``MachineConfig`` object:

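The creation command itself is collapsed in this hunk; a minimal sketch, assuming the manifest above was saved as ``100-worker-iommu.yaml`` (file name hypothetical):

.. code-block:: console

   $ oc create -f 100-worker-iommu.yaml
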
@@ -196,7 +198,7 @@ Use the following steps to build the vGPU Manager container and push it to a private registry

.. code-block:: console

$ cd vgpu-manager/rhel
$ cd vgpu-manager/rhel8

#. Copy the NVIDIA vGPU Manager from your extracted zip file:

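The copy command is collapsed in this hunk; a minimal sketch, assuming a hypothetical download location and ``.run`` file name, copied into the current build directory:

.. code-block:: console

   $ cp ~/Downloads/NVIDIA-Linux-x86_64-510.73.06-vgpu-kvm.run .
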
@@ -210,24 +212,22 @@ Use the following steps to build the vGPU Manager container and push it to a private registry
* ``VERSION`` - The NVIDIA vGPU Manager version downloaded from the NVIDIA Software Portal.
* ``OS_TAG`` - This must match the Guest OS version.
For Red Hat OpenShift, specify ``rhcos4.x``, where *x* is the supported minor OCP version.
* ``CUDA_VERSION`` - CUDA base image version to build the driver image with.

.. code-block:: console

$ export PRIVATE_REGISTRY=my/private/registry VERSION=510.73.06 OS_TAG=rhcos4.11 CUDA_VERSION=11.7.1
$ export PRIVATE_REGISTRY=my/private/registry VERSION=510.73.06 OS_TAG=rhcos4.11

.. note::
.. note::

The recommended registry to use is the Integrated OpenShift Container Platform registry.
For more information about the registry, see `Accessing the registry <https://docs.openshift.com/container-platform/latest/registry/accessing-the-registry.html>`_.
The recommended registry to use is the Integrated OpenShift Container Platform registry.
For more information about the registry, see `Accessing the registry <https://docs.openshift.com/container-platform/latest/registry/accessing-the-registry.html>`_.

#. Build the NVIDIA vGPU Manager image:

.. code-block:: console

$ docker build \
--build-arg DRIVER_VERSION=${VERSION} \
--build-arg CUDA_VERSION=${CUDA_VERSION} \
-t ${PRIVATE_REGISTRY}/vgpu-manager:${VERSION}-${OS_TAG} .

#. Push the NVIDIA vGPU Manager image to your private registry:
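
   The push command sits outside this hunk; a minimal sketch, reusing the image tag built in the previous step:

   .. code-block:: console

      $ docker push ${PRIVATE_REGISTRY}/vgpu-manager:${VERSION}-${OS_TAG}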
@@ -242,7 +242,7 @@ Installing the NVIDIA GPU Operator using the CLI

Install the NVIDIA GPU Operator using the guidance at :ref:`Installing the NVIDIA GPU Operator <install-nvidiagpu>`.

.. note:: When prompted to create a cluster policy, follow the guidance at :ref:`Creating a ClusterPolicy for the GPU Operator<install-cluster-policy-vGPU>`.
.. note:: When prompted to create a cluster policy, follow the guidance at :ref:`Creating a ClusterPolicy for the GPU Operator<install-cluster-policy-vGPU>`.

Create the secret
=================
59 changes: 30 additions & 29 deletions partner-validated/mirantis-mke.rst
@@ -44,6 +44,36 @@ Validated Configuration Matrix
- NVIDIA GPU
- Hardware Model

* - k0s v1.31.5+k0s / k0rdent 0.1.0
- v24.9.2
- | Ubuntu 22.04
- containerd v1.7.24 with the NVIDIA Container Toolkit v1.17.4
- 1.31.5
- Helm v3
- | 2x NVIDIA RTX 4000 SFF Ada 20GB GDDR6 (ECC)
- | Supermicro SuperServer 6028U-E1CNR4T+

| 1000W Supermicro PWS-1K02A-1R

| 2x Intel Xeon E5-2630v4, 10C/20T 2.2/3.1 GHz LGA 2011-3 25MB 85W

| 32GB DDR4-2666 RDIMM, M393A4K40BB2-CTD6Q

| NVMe 960GB PM983 NVMe M.2, MZ1LB960HAJQ-00007

| 2 x NVIDIA RTX 4000 SFF Ada 20GB GDDR6 (ECC), 70W, PCIe 4.0x16, 4x

| 4x Mini DisplayPort 1.4a

* - MKE 3.8
- v24.9.2
- | Ubuntu 22.04
- Mirantis Container Runtime (MCR) 25.0.1
- 1.31.5
- Helm v3
- | NVIDIA T4 Tensor Core
- | AWS EC2 g4dn.2xlarge (8vcpus/32GB)

* - MKE 3.6.2+ and 3.5.7+
- v23.3.1
- | RHEL 8.7
@@ -71,35 +101,6 @@ Validated Configuration Matrix
| 1x RAID Controller PERC H710

| 1x Network card FM487
* - MKE 3.8
- v24.9.2
- | Ubuntu 22.04
- Mirantis Container Runtime (MCR) 25.0.1
- 1.31.5
- Helm v3
- | NVIDIA T4 Tensor Core
- | AWS EC2 g4dn.2xlarge (8vcpus/32GB)
* - k0s v1.31.5+k0s / k0rdent 0.1.0
- v24.9.2
- | Ubuntu 22.04
- containerd v1.7.24 with the NVIDIA Container Toolkit v1.17.4
- 1.31.5
- Helm v3
- | 2x NVIDIA RTX 4000 SFF Ada 20GB GDDR6 (ECC)
- | Supermicro SuperServer 6028U-E1CNR4T+

| 1000W Supermicro PWS-1K02A-1R

| 2x Intel Xeon E5-2630v4, 10C/20T 2.2/3.1 GHz LGA 2011-3 25MB 85W

| 32GB DDR4-2666 RDIMM, M393A4K40BB2-CTD6Q

| NVMe 960GB PM983 NVMe M.2, MZ1LB960HAJQ-00007

| 2 x NVIDIA RTX 4000 SFF Ada 20GB GDDR6 (ECC), 70W, PCIe 4.0x16, 4x

| 4x Mini DisplayPort 1.4a


*************
Prerequisites