2 changes: 1 addition & 1 deletion docs/advanced/advanced.rst
Original file line number Diff line number Diff line change
@@ -22,6 +22,6 @@ Advanced Configurations

.. toctree::
Proxy & Air-gapped <proxy-airgapped.rst>
DOCA Driver Container <doca-drivers.rst>
DOCA-OFED Driver Container <doca-drivers.rst>
Other Advanced Configurations <advanced-configurations.rst>
Container Images Digests <images-sha256.rst>
34 changes: 17 additions & 17 deletions docs/advanced/doca-drivers.rst
@@ -1,20 +1,20 @@
.. headings # #, * *, =, -, ^, ", ~
.. include:: ../common/vars.rst

****************************
NVIDIA DOCA Driver Container
****************************
*********************************
NVIDIA DOCA-OFED Driver Container
*********************************

.. contents:: On this page
:depth: 2
:local:
:backlinks: none

==================================================
NVIDIA DOCA Driver Container Environment Variables
==================================================
=======================================================
NVIDIA DOCA-OFED Driver Container Environment Variables
=======================================================

The following are special environment variables supported by the NVIDIA DOCA Driver container to configure its behavior:
The following are special environment variables supported by the NVIDIA DOCA-OFED Driver container to configure its behavior:

.. list-table::
:header-rows: 1
@@ -28,7 +28,7 @@ The following are special environment variables supported by the NVIDIA DOCA Dri
- Create a udev rule to preserve "old-style" path-based netdev names, e.g. enp3s0f0
* - UNLOAD_STORAGE_MODULES
- "false"
- | Unload host storage modules prior to loading NVIDIA DOCA Driver modules:
- | Unload host storage modules prior to loading NVIDIA DOCA-OFED Driver modules:
| * ib_isert
| * nvme_rdma
| * nvmet_rdma
@@ -37,19 +37,19 @@ The following are special environment variables supported by the NVIDIA DOCA Dri
| * ib_srpt
* - ENABLE_NFSRDMA
- "false"
- Enable loading of NFS & NVME related storage modules from a NVIDIA DOCA Driver container
- Enable loading of NFS & NVME related storage modules from a NVIDIA DOCA-OFED Driver container
* - RESTORE_DRIVER_ON_POD_TERMINATION
- "false"
- Restore host drivers when a container

In addition, it is possible to specify any environment variables to be exposed to the NVIDIA DOCA Driver container, such as the standard "HTTP_PROXY", "HTTPS_PROXY", "NO_PROXY".
In addition, it is possible to specify any environment variables to be exposed to the NVIDIA DOCA-OFED Driver container, such as the standard "HTTP_PROXY", "HTTPS_PROXY", "NO_PROXY".

.. warning::
CREATE_IFNAMES_UDEV is set automatically by the Network Operator, depending on the Operating System of the worker nodes in the cluster (the cluster is assumed to be homogenous).

.. warning::
When ENABLE_NFSRDMA is set to `true`, NVMe-related storage modules cannot be loaded from the NVIDIA DOCA Driver container while they are in use by the system
(e.g. the system has NVMe SSD drives in use). Users should ensure the modules are not in use and blacklist them before using the NVIDIA DOCA Driver container.
When ENABLE_NFSRDMA is set to `true`, NVMe-related storage modules cannot be loaded from the NVIDIA DOCA-OFED Driver container while they are in use by the system
(e.g. the system has NVMe SSD drives in use). Users should ensure the modules are not in use and blacklist them before using the NVIDIA DOCA-OFED Driver container.

These variables can be set in the NicClusterPolicy. For example:

@@ -71,9 +71,9 @@ These variables can be set in the NicClusterPolicy. For example:

.. _advanced-configurations-precompiled:

=========================================================================
Precompiled Container Build Instructions for NVIDIA DOCA Driver Container
=========================================================================
==============================================================================
Precompiled Container Build Instructions for NVIDIA DOCA-OFED Driver Container
==============================================================================

-------------
Prerequisites
@@ -84,7 +84,7 @@ Before you begin, ensure that you have the following prerequisites:
- Docker (Ubuntu) / Podman (RH) installed on your build system.
- Web access to NVIDIA NIC drivers sources. Latest NIC drivers are published at `NVIDIA DOCA Downloads <https://developer.nvidia.com/doca-downloads>`_, for example: `https://linux.mellanox.com/public/repo/doca/2.10.0/SOURCES/MLNX_OFED/MLNX_OFED_SRC-debian-25.01-0.6.0.0.tgz <https://linux.mellanox.com/public/repo/doca/2.10.0/SOURCES/MLNX_OFED/MLNX_OFED_SRC-debian-25.01-0.6.0.0.tgz>`_

**NOTE:** NVIDIA NIC driver sources are bundled as part of NVIDIA DOCA package. Both the DOCA package version and its corresponding NIC driver (DOCA Driver) version need to be specified to fetch the correct driver sources when building the driver container.
**NOTE:** NVIDIA NIC driver sources are bundled as part of NVIDIA DOCA package. Both the DOCA package version and its corresponding NIC driver (DOCA-OFED Driver) version need to be specified to fetch the correct driver sources when building the driver container.
For example, given a DOCA package version (e.g. `2.10.0`), you can find the corresponding MLNX_OFED version at `<https://linux.mellanox.com/public/repo/doca/2.10.0/SOURCES/MLNX_OFED/>`_, which is `25.01-0.6.0.0`.
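
As a rough sketch of how the two versions combine into a download location, the helper below rebuilds the example URL given in the prerequisites from a DOCA package version and an MLNX_OFED version. The function name `driver_source_url` and the `distro` parameter are hypothetical, and the URL layout is inferred only from that single example, so check the repository listing before relying on it for other releases.

```python
# Hypothetical helper: the URL layout is inferred from the single example in
# this section and may not hold for other releases or distributions.
BASE_URL = "https://linux.mellanox.com/public/repo/doca"


def driver_source_url(doca_version: str, ofed_version: str, distro: str = "debian") -> str:
    """Build the driver source tarball URL for a DOCA / MLNX_OFED version pair."""
    if not doca_version or not ofed_version:
        raise ValueError("both the DOCA and the MLNX_OFED version must be given")
    return (
        f"{BASE_URL}/{doca_version}/SOURCES/MLNX_OFED/"
        f"MLNX_OFED_SRC-{distro}-{ofed_version}.tgz"
    )


# Reproduces the example link from the prerequisites list.
print(driver_source_url("2.10.0", "25.01-0.6.0.0"))
```

Keeping both versions as explicit arguments mirrors the note above: neither version can be derived from the other, so both have to be supplied.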

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -114,7 +114,7 @@ The Dockerfile consists of the following stages:

1. **Base Image Update**: The base image is updated and common requirements are installed. This stage sets up the basic environment for the subsequent stages.

2. **Download Driver Sources**: This stage downloads the NVIDIA DOCA Driver sources to the specified path. It prepares the necessary files for the driver build process.
2. **Download Driver Sources**: This stage downloads the NVIDIA DOCA-OFED Driver sources to the specified path. It prepares the necessary files for the driver build process.

3. **Build Driver**: The driver is built using the downloaded sources and installed on the container. This stage ensures that the driver is compiled and configured correctly for the target system.

8 changes: 4 additions & 4 deletions docs/advanced/images-sha256.rst
@@ -125,9 +125,9 @@ NVIDIA Network Operator Container Images
- v0.0.3
- sha256:d6a2546a8a65e1034d08ab7d85819f062769842dc96513b4fec44f75d3077316

============================
DOCA Driver Container Images
============================
=================================
DOCA-OFED Driver Container Images
=================================


.. list-table::
@@ -141,7 +141,7 @@ DOCA Driver Container Images
- 25.04-0.6.1.0-2


The following tags are available for the above DOCA Driver container version:
The following tags are available for the above DOCA-OFED Driver container version:

------
Ubuntu
20 changes: 10 additions & 10 deletions docs/advanced/proxy-airgapped.rst
@@ -72,7 +72,7 @@ This section describes how to successfully deploy the Network Operator in cluste
By default, the Network Operator requires internet access for the following reasons:

- The container images must be pulled during the Network Operator installation.
- The DOCA Driver container must download several OS packages prior to the driver installation.
- The DOCA-OFED Driver container must download several OS packages prior to the driver installation.

To address these requirements, it may be necessary to create a local image registry and/or a local package repository, so that the necessary images and packages will be available for your cluster.
Subsequent sections of this document detail how to configure the Network Operator to use local image registries and local package repositories.
@@ -91,25 +91,25 @@ Pulling and Pushing Container Images to a Local Registry

To pull the correct images from the NVIDIA registry, you can leverage the fields ``repository``, ``image`` and ``version`` specified in the ``values.yaml`` file or in the :ref:`container_images_digest` section.

NicClusterPolicy supports use of image container digest in the `version` field, except for DOCA driver.
NicClusterPolicy supports use of image container digest in the `version` field, except for DOCA-OFED driver.

There is one caveat with regard to the DOCA driver image. The version field must be suffixed with the OS name and architecture running on the worker node.
There is one caveat with regard to the DOCA-OFED driver image. The version field must be suffixed with the OS name and architecture running on the worker node.

For example for DOCA driver version |doca-driver-version|, the tag for Ubuntu 24.04 with X86 architecture is "|doca-driver-version|-ubuntu24.04-amd64".
Available DOCA driver image tags can be found at `NGC <https://catalog.ngc.nvidia.com/orgs/nvidia/teams/mellanox/containers/doca-driver/tags>`_.
For example for DOCA-OFED driver version |doca-driver-version|, the tag for Ubuntu 24.04 with X86 architecture is "|doca-driver-version|-ubuntu24.04-amd64".
Available DOCA-OFED driver image tags can be found at `NGC <https://catalog.ngc.nvidia.com/orgs/nvidia/teams/mellanox/containers/doca-driver/tags>`_.
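
The tag layout described above can be sketched as a small helper. The function name `driver_image_tag` is hypothetical, the layout is taken only from the Ubuntu 24.04 example in this section, and the version string used in the call is the one from the container images digest table, used purely for illustration; always confirm the resulting tag against the NGC tag list.

```python
# Hypothetical helper: the tag layout follows the Ubuntu 24.04 / amd64 example
# in this section; confirm real tags against the registry before mirroring.
def driver_image_tag(driver_version: str, os_name: str, arch: str) -> str:
    """Return '<driver_version>-<os_name>-<arch>' for the driver image."""
    parts = [driver_version, os_name, arch]
    if any(not part for part in parts):
        raise ValueError("driver version, OS name and architecture are all required")
    return "-".join(parts)


# Illustrative version string; substitute the driver version you deploy.
print(driver_image_tag("25.04-0.6.1.0-2", "ubuntu24.04", "amd64"))
```

Building the tag in one place makes it easier to pull and push the same image name consistently when mirroring to a local registry.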

If the local registry requires authentication, make sure to create a pull secret and configure it in the NicClusterPolicy accordingly.

.. note::
NVIDIA Network Operator communicates with the Image Registry configured for the DOCA Driver in the NICClusterPolicy to list the available tags.
Specifying pull secret is required in the NicClusterPolicy DOCA Driver section, even if global container access credentials are configured on nodes.
NVIDIA Network Operator communicates with the Image Registry configured for the DOCA-OFED Driver in the NICClusterPolicy to list the available tags.
Specifying pull secret is required in the NicClusterPolicy DOCA-OFED Driver section, even if global container access credentials are configured on nodes.

-----------------------------------
Configuring Local Registry TLS Cert
-----------------------------------

NVIDIA Network Operator communicates with the Image Registry configured for the DOCA Driver in the NICClusterPolicy to list the available tags.
This is required to verify the availability of precompiled DOCA Driver container images.
NVIDIA Network Operator communicates with the Image Registry configured for the DOCA-OFED Driver in the NICClusterPolicy to list the available tags.
This is required to verify the availability of precompiled DOCA-OFED Driver container images.

If the Image Registry uses a TLS certificate that is not issued by a well-known Certificate Authority (CA), it is required to configure the NVIDIA Network Operator with the Certificate.

@@ -184,7 +184,7 @@ Local Package Repository
.. warning::
The instructions below are provided as reference examples to set up a local package repository for NVIDIA Network Operator.

The DOCA Driver container deployed as part of the Network Operator requires certain packages to be available for the driver installation. In restricted internet access or air-gapped installations, users are required to create a local mirror repository for their OS distribution, and make the following packages available:
The DOCA-OFED Driver container deployed as part of the Network Operator requires certain packages to be available for the driver installation. In restricted internet access or air-gapped installations, users are required to create a local mirror repository for their OS distribution, and make the following packages available:

.. code-block::

32 changes: 17 additions & 15 deletions docs/getting-started-kubernetes.rst
@@ -180,7 +180,7 @@ First install the Network Operator with NFD enabled:
enabled: true

Once the Network Operator is installed create a NicClusterPolicy with
* DOCA driver
* DOCA-OFED driver
* RDMA Shared device plugin configured to a netdev with name ens1f0.


@@ -261,7 +261,7 @@ First install the Network Operator with NFD enabled:
enabled: true

Once the Network Operator is installed create a NicClusterPolicy with:
* DOCA driver
* DOCA-OFED driver
* RDMA Shared Device plugin with two RDMA resources: the first mapped to ens1f0 and ens1f1, and the second mapped to ens2f0 and ens2f1.

Note: You may need to change the interface names in the NicClusterPolicy to those used by your target nodes.
@@ -464,7 +464,7 @@ Network Operator Deployment with a Host Device Network

In this mode, the Network Operator could be deployed on virtualized deployments as well. It supports both Ethernet and InfiniBand modes. From the Network Operator perspective, there is no difference between the deployment procedures. To work on a VM (virtual machine), the PCI passthrough must be configured for SR-IOV devices. The Network Operator works both with VF (Virtual Function) and PF (Physical Function) inside the VMs.

.. warning:: If the Host Device Network is used without the DOCA Driver, the following packages should be installed:
.. warning:: If the Host Device Network is used without the DOCA-OFED Driver, the following packages should be installed:

* the linux-generic package on Ubuntu hosts
* the kernel-modules-extra package on the RedHat-based hosts
@@ -726,7 +726,7 @@ First install the Network Operator with NFD enabled:
enabled: true

Once the Network Operator is installed create a NicClusterPolicy with:
* DOCA driver
* DOCA-OFED driver
* RDMA shared device plugin
* Secondary network
* Multus CNI
@@ -897,7 +897,7 @@ Network Operator Deployment for GPUDirect Workloads

GPUDirect requires the following:

* NVIDIA DOCA Driver v5.5-1.0.3.2 or newer
* NVIDIA DOCA-OFED Driver v5.5-1.0.3.2 or newer
* GPU Operator v1.9.0 or newer
* NVIDIA GPU and driver supporting GPUDirect e.g Quadro RTX 6000/8000 or NVIDIA T4/NVIDIA V100/NVIDIA A100

@@ -910,7 +910,7 @@ First install the Network Operator with NFD enabled:
enabled: true

Once the Network Operator is installed create a NicClusterPolicy with:
* DOCA driver
* DOCA-OFED driver
* SR-IOV Device Plugin
* Secondary network
* Multus CNI
@@ -1090,7 +1090,7 @@ First install the Network Operator with NFD and SRIOV Network Operator enabled:
enabled: true

Once the Network Operator is installed create a NicClusterPolicy with:
* DOCA driver
* DOCA-OFED driver
* Secondary network
* Multus CNI
* IPoIB CNI
@@ -1352,7 +1352,7 @@ Network Operator Deployment with an SR-IOV InfiniBand Network

Network Operator deployment with InfiniBand network requires the following:

* NVIDIA DOCA Driver and OpenSM running. OpenSM runs on top of the NVIDIA DOCA Driver stack, so both the driver and the subnet manager should come from the same installation. Note that partitions that are configured by OpenSM should specify defmember=full to enable the SR-IOV functionality over InfiniBand. For more details, please refer to this `article <https://docs.mellanox.com/display/MLNXOFEDv51258060/OpenSM>`_.
* NVIDIA DOCA-OFED Driver and OpenSM running. OpenSM runs on top of the NVIDIA DOCA-OFED Driver stack, so both the driver and the subnet manager should come from the same installation. Note that partitions that are configured by OpenSM should specify defmember=full to enable the SR-IOV functionality over InfiniBand. For more details, please refer to this `article <https://docs.mellanox.com/display/MLNXOFEDv51258060/OpenSM>`_.
* InfiniBand device – Both the host device and switch ports must be enabled in InfiniBand mode.
* rdma-core package should be installed when an inbox driver is used.

@@ -1367,7 +1367,7 @@ First install the Network Operator with NFD and SR-IOV Network Operator enabled:
enabled: true

Once the Network Operator is installed create a NicClusterPolicy with:
* DOCA driver
* DOCA-OFED driver
* Secondary network
* Multus CNI
* Container Networking Plugins
@@ -1512,7 +1512,7 @@ Network Operator Deployment with an SR-IOV InfiniBand Network with PKey Manageme

Network Operator deployment with InfiniBand network requires the following:

* NVIDIA DOCA Driver and OpenSM running. OpenSM runs on top of the NVIDIA DOCA Driver stack, so both the driver and the subnet manager should come from the same installation. Note that partitions that are configured by OpenSM should specify defmember=full to enable the SR-IOV functionality over InfiniBand. For more details, please refer to `this article`_.
* NVIDIA DOCA-OFED Driver and OpenSM running. OpenSM runs on top of the NVIDIA DOCA-OFED Driver stack, so both the driver and the subnet manager should come from the same installation. Note that partitions that are configured by OpenSM should specify defmember=full to enable the SR-IOV functionality over InfiniBand. For more details, please refer to `this article`_.
* NVIDIA UFM running on top of OpenSM. For more details, please refer to `the project documentation`_.
* InfiniBand device – Both the host device and the switch ports must be enabled in InfiniBand mode.
* rdma-core package should be installed when an inbox driver is used.
@@ -1559,7 +1559,7 @@ First install the Network Operator with NFD enabled:
resourcePrefix: "nvidia.com"

Once the Network Operator is installed create a NicClusterPolicy with:
* DOCA driver
* DOCA-OFED driver
* ibKubernetes
* Secondary network
* Multus CNI
@@ -1645,7 +1645,7 @@ Create IPPool object for nv-ipam
- key: node-role.kubernetes.io/worker
operator: Exists

Wait for NVIDIA DOCA Driver to install and apply the following CRs:
Wait for NVIDIA DOCA-OFED Driver to install and apply the following CRs:

``sriov-ib-network-node-policy.yaml``

@@ -1759,7 +1759,7 @@ Network Operator Deployment for DPDK Workloads with NicClusterPolicy

.. _HUGEPAGE: http://manpages.ubuntu.com/manpages/focal/man8/hugeadm.8.html

This deployment mode supports DPDK applications. In order to run DPDK applications, HUGEPAGE_ should be configured on the required K8s Worker Nodes. By default, the inbox operating system driver is used. For support of cases with specific requirements, DOCA Driver container should be deployed.
This deployment mode supports DPDK applications. In order to run DPDK applications, HUGEPAGE_ should be configured on the required K8s Worker Nodes. By default, the inbox operating system driver is used. For support of cases with specific requirements, DOCA-OFED Driver container should be deployed.

Network Operator deployment with:

@@ -1878,6 +1878,8 @@ Network Operator Deployment and OpenvSwitch offload - managed OpenvSwitch

.. warning:: This feature is supported only for Vanilla Kubernetes deployments with SR-IOV Network Operator.

.. warning:: To use DOCA-OFED Driver container with this mode of operation, set the `RESTORE_DRIVER_ON_POD_TERMINATION` environment variable to `false` in the driver configuration section in the NicClusterPolicy. Restoration to the inbox driver is not supported for this feature.

.. warning:: Tech Preview feature.


@@ -2196,7 +2198,7 @@ Please see the following DOCA documentation for OVS hardware offload verificatio
Network Operator Deployment and OpenvSwitch offload - externally managed OpenvSwitch with VF lag
------------------------------------------------------------------------------------------------

.. warning:: This feature is not compatible with the DOCA Driver container.
.. warning:: This feature is not compatible with the DOCA-OFED Driver container.

.. warning:: This feature is supported only for Vanilla Kubernetes deployments with SR-IOV Network Operator.

@@ -2938,7 +2940,7 @@ NIC Configuration Operator updates status conditions of the NicDevice CR to set
message: Device firmware '20.42.1000' matches to recommended version '20.42.1000'
lastTransitionTime: "2024-09-21T08:43:10Z"

`FirmwareConfigMatch` condition status is set to `Unknown` if the DOCA Driver is not installed; otherwise, it indicates whether the current NIC firmware is recommended by the DOCA Driver. E.g.:
`FirmwareConfigMatch` condition status is set to `Unknown` if the DOCA-OFED Driver is not installed; otherwise, it indicates whether the current NIC firmware is recommended by the DOCA-OFED Driver. E.g.:

.. code-block:: bash
