diff --git a/docs/common/vars.rst b/docs/common/vars.rst index e91440ee..c4068490 100644 --- a/docs/common/vars.rst +++ b/docs/common/vars.rst @@ -45,3 +45,7 @@ .. |rdma-cni-repository| replace:: nvcr.io/nvstaging/mellanox .. |spectrumxop-version| replace:: network-operator-v25.10.0-beta.3 .. |spectrumxop-repository| replace:: nvcr.io/nvstaging/mellanox +.. |k8s-launch-kit-version| replace:: v25.10.0 +.. |k8s-launch-kit-repository| replace:: nvcr.io/nvidia/cloud-native +.. |k8s-launch-kit-network-operator-repository| replace:: nvcr.io/nvidia/cloud-native +.. |k8s-launch-kit-component-version| replace:: network-operator-v25.10.0 diff --git a/docs/index.rst b/docs/index.rst index c54e12bf..996688b0 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -26,6 +26,7 @@ Getting Started with Kubernetes Getting Started with Red Hat OpenShift NIC Configuration Operator + [TECH PREVIEW] Configuration Assistance with Kubernetes Launch Kit Customization Options and CRDs Life Cycle Management Advanced Configurations diff --git a/docs/k8s-launch-kit.rst b/docs/k8s-launch-kit.rst new file mode 100644 index 00000000..f998e9c7 --- /dev/null +++ b/docs/k8s-launch-kit.rst @@ -0,0 +1,277 @@ +.. license-header + SPDX-FileCopyrightText: Copyright (c) 2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved. + SPDX-License-Identifier: Apache-2.0 + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. + +.. headings # #, * *, =, -, ^, " +.. 
include:: ./common/vars.rst + + + ****************************************************************** + [TECH PREVIEW] Configuration Assistance with Kubernetes Launch Kit + ****************************************************************** + + .. contents:: On this page + :depth: 3 + :local: + :backlinks: none + + Kubernetes Launch Kit (l8k) is a CLI tool for deploying and managing NVIDIA cloud-native solutions on Kubernetes. It provides flexible deployment workflows that configure SR-IOV, RDMA, and other networking technologies for optimal network performance. + + ------------- + Prerequisites + ------------- + + For prerequisites, refer to the :doc:`NVIDIA Network Operator Deployment Guide with Kubernetes ` page. + + You need a Kubernetes cluster with the NVIDIA Network Operator Helm chart installed. + + ---------------- + Operation Phases + ---------------- + + ============================== + Discover Cluster Configuration + ============================== + + Deploy a minimal Network Operator profile to automatically discover your cluster's network capabilities and hardware configuration. You can skip this phase by providing your own configuration file with ``--user-config``. + + ============================== + Select the Deployment Profile + ============================== + + Specify the desired deployment profile either with CLI flags (``--fabric``, ``--deployment-type``, and ``--multirail``) or with a natural language prompt for LLM-assisted profile generation (``--prompt``). + + ========================= + Generate Deployment Files + ========================= + + Based on the discovered or provided configuration, generate a complete set of YAML deployment files tailored to the selected network profile. + + ------------------- + Supported Use Cases + ------------------- + + Kubernetes Launch Kit supports the following use cases: + + - SR-IOV Network with RDMA + - Host Device Network with RDMA + - IP over InfiniBand with RDMA Shared Device + - MacVLAN Network with RDMA Shared Device + - SR-IOV InfiniBand Network with RDMA + + For more details, refer to the :doc:`quick-start/quick-start-k8s` page.
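The following Bash sketch shows how the supported use cases might map to the ``--fabric`` and ``--deployment-type`` values that ``l8k`` accepts. The mapping is an assumption inferred from the flag values in the ``l8k`` help output and from the ``sriov``, ``hostdev``, ``ipoib``, and ``macvlan`` sections of the configuration file; confirm it against the quick-start guide before relying on it. The ``l8k_profile_flags`` function and its use-case keys are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the l8k profile flags for a supported use case.
# ASSUMPTION: the use-case-to-flag mapping below is inferred, not documented.
set -euo pipefail

l8k_profile_flags() {
  case "$1" in
    sriov-rdma)          printf '%s\n' "--fabric ethernet --deployment-type sriov" ;;
    hostdev-rdma)        printf '%s\n' "--fabric ethernet --deployment-type host_device" ;;
    ipoib-rdma-shared)   printf '%s\n' "--fabric infiniband --deployment-type rdma_shared" ;;
    macvlan-rdma-shared) printf '%s\n' "--fabric ethernet --deployment-type rdma_shared" ;;
    sriov-ib-rdma)       printf '%s\n' "--fabric infiniband --deployment-type sriov" ;;
    *)
      printf 'unknown use case: %s\n' "$1" >&2
      return 1
      ;;
  esac
}

# Example: print a full l8k command for IP over InfiniBand with an RDMA shared device.
printf '%s\n' "l8k --user-config ./config.yaml $(l8k_profile_flags ipoib-rdma-shared) --save-deployment-files ./deployments"
```

Because the function only prints flag values, its output can be combined with the other flags used in this guide, such as ``--multirail`` or ``--deploy``.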
+ +----- +Usage +----- + +Kubernetes Launch Kit is available as a docker container: + +.. code-block:: bash + :substitutions: + + mkdir ~/cluster-configuration + cp /etc/kubernetes/admin.conf ~/cluster-configuration/kubeconfig + docker run -v ~/cluster-configuration:/cluster-configuration --net=host |k8s-launch-kit-repository|/k8s-launch-kit:|k8s-launch-kit-version| --discover-cluster-config --kubeconfig /cluster-configuration/kubeconfig --save-cluster-config /cluster-configuration/config.yaml --log-level debug --save-deployment-files /cluster-configuration/deployments --fabric infiniband --deployment-type rdma_shared --multirail + +Don't forget to enable --net=host and mount the necessary directories for input and output files with -v. + +.. code-block:: text + + K8s Launch Kit (l8k) is a CLI tool for deploying and managing NVIDIA cloud-native solutions on Kubernetes. The tool helps provide flexible deployment workflows for optimal network performance with SR-IOV, RDMA, and other networking technologies. + + ### Discover Cluster Configuration + Deploy a minimal Network Operator profile to automatically discover your cluster's + network capabilities and hardware configuration by using --discover-cluster-config. + This phase can be skipped if you provide your own configuration file by using --user-config. + This phase requires --kubeconfig to be specified. + + ### Generate Deployment Files + Based on the discovered or provided configuration, + generate a complete set of YAML deployment files for the selected network profile. + Files can be saved to disk using --save-deployment-files. + The profile can be defined manually with --fabric, --deployment-type and --multirail flags, + OR generated by an LLM-assisted profile generator with --prompt (requires --llm-api-key and --llm-vendor). + + ### Deploy to Cluster + Apply the generated deployment files to your Kubernetes cluster by using --deploy. 
This phase requires --kubeconfig and can be skipped if --deploy is not specified. + + Usage: + l8k [flags] + l8k [command] + + Available Commands: + completion Generate the autocompletion script for the specified shell + help Help about any command + version Print the version number + + Flags: + --ai Enable AI deployment + --deploy Deploy the generated files to the Kubernetes cluster + --deployment-type string Select the deployment type (sriov, rdma_shared, host_device) + --discover-cluster-config Deploy a thin Network Operator profile to discover cluster capabilities + --enabled-plugins string Comma-separated list of plugins to enable (default "network-operator") + --fabric string Select the fabric type to deploy (infiniband, ethernet) + -h, --help help for l8k + --kubeconfig string Path to kubeconfig file for cluster deployment (required when using --deploy) + --llm-api-key string API key for the LLM API (required when using --prompt) + --llm-api-url string API URL for the LLM API (required when using --prompt) + --llm-vendor string Vendor of the LLM API (required when using --prompt) (default "openai-azure") + --log-level string Log level (debug, info, warn, error) (default "info") + --multirail Enable multirail deployment + --prompt string Path to file with a prompt to use for LLM-assisted profile generation + --save-cluster-config string Save discovered cluster configuration to the specified path (default "/opt/nvidia/k8s-launch-kit/cluster-config.yaml") + --save-deployment-files string Save generated deployment files to the specified directory (default "/opt/nvidia/k8s-launch-kit/deployment") + --spectrum-x Enable Spectrum X deployment + --user-config string Use provided cluster configuration file instead of auto-discovery (skips cluster discovery) + + Use "l8k [command] --help" for more information about a command. 
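For repeated runs, the container invocation from the Usage example can be kept in a script. The following Bash sketch collects the ``docker run`` arguments in an array so that each ``l8k`` flag stays a separate argument, and only prints the resulting command unless ``RUN=1`` is set. The ``WORKDIR`` and ``RUN`` variables are hypothetical, and the image reference is assumed to be what the ``k8s-launch-kit-repository`` and ``k8s-launch-kit-version`` substitutions in ``docs/common/vars.rst`` expand to.

```shell
#!/usr/bin/env bash
# Sketch: assemble the l8k container invocation shown in the Usage section.
# ASSUMPTIONS: WORKDIR and RUN are hypothetical variables; the image reference
# mirrors the k8s-launch-kit substitutions in docs/common/vars.rst.
set -euo pipefail

WORKDIR="${WORKDIR:-$HOME/cluster-configuration}"
IMAGE="nvcr.io/nvidia/cloud-native/k8s-launch-kit:v25.10.0"

args=(
  docker run
  --net=host                                # host networking, as in the Usage example
  -v "${WORKDIR}:/cluster-configuration"    # kubeconfig in, config and manifests out
  "${IMAGE}"
  --discover-cluster-config
  --kubeconfig /cluster-configuration/kubeconfig   # discovery requires --kubeconfig
  --save-cluster-config /cluster-configuration/config.yaml
  --save-deployment-files /cluster-configuration/deployments
  --fabric infiniband --deployment-type rdma_shared --multirail
)

if [[ "${RUN:-0}" == "1" ]]; then
  "${args[@]}"                # actually start the container
else
  printf '%q ' "${args[@]}"   # dry run: print the shell-quoted command
  printf '\n'
fi
```

Keeping the arguments in an array, rather than in one string, avoids word-splitting problems when ``WORKDIR`` contains spaces.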
+ + -------------- + Usage Examples + -------------- + + ================= + Complete Workflow + ================= + + Discover the cluster configuration, generate the deployment files, and deploy them to the cluster: + + .. code-block:: bash + + l8k --discover-cluster-config --save-cluster-config ./cluster-config.yaml \ + --fabric ethernet --deployment-type sriov --multirail \ + --save-deployment-files ./deployments \ + --deploy --kubeconfig ~/.kube/config + + + ================================ + Discover Cluster Configuration + ================================ + + Discover the cluster configuration and save it to a file. Discovery requires ``--kubeconfig``: + + .. code-block:: bash + + l8k --discover-cluster-config --kubeconfig ~/.kube/config \ + --save-cluster-config ./my-cluster-config.yaml + + + ========================== + Use Existing Configuration + ========================== + + Generate deployment files from an existing configuration file and deploy them to the cluster: + + .. code-block:: bash + + l8k --user-config ./existing-config.yaml \ + --fabric ethernet --deployment-type sriov --multirail \ + --deploy --kubeconfig ~/.kube/config + + ========================= + Generate Deployment Files + ========================= + + Generate deployment files from a configuration file and save them to disk without deploying them: + + .. code-block:: bash + + l8k --user-config ./config.yaml \ + --fabric ethernet --deployment-type sriov --multirail \ + --save-deployment-files ./deployments + + ======================================================= + Generate Deployment Files Using Natural Language Prompt + ======================================================= + + Kubernetes Launch Kit supports LLM-assisted profile generation. Write your requirements as a natural language prompt in a file and pass the file with ``--prompt``; the tool then generates a deployment profile for you. + To configure the LLM, provide an API key for the OpenAI Azure backend (``openai-azure``, the default ``--llm-vendor``) with ``--llm-api-key``. The flag reference above also lists ``--llm-api-url`` as required when using ``--prompt``. + + .. 
code-block:: bash + + echo "I want to enable multirail networking in my AI cluster" > requirements.txt + l8k --user-config ./config.yaml \ + --prompt requirements.txt --llm-vendor openai-azure --llm-api-key \ + --save-deployment-files ./deployments + + -------------------------- + Configuration File Format + -------------------------- + + After discovering the cluster configuration, the tool saves it to the file specified with ``--save-cluster-config`` (by default, ``/opt/nvidia/k8s-launch-kit/cluster-config.yaml``). + You can use this file as a starting point for your own configuration, and then pass your configuration file to the tool with the ``--user-config`` flag. The following is an example configuration file: + + .. code-block:: yaml + :substitutions: + + networkOperator: + version: |k8s-launch-kit-version| + componentVersion: |k8s-launch-kit-component-version| + repository: |k8s-launch-kit-network-operator-repository| + namespace: nvidia-network-operator + nvIpam: + poolName: nv-ipam-pool + subnets: + - subnet: 192.168.2.0/24 + gateway: 192.168.2.1 + - subnet: 192.168.3.0/24 + gateway: 192.168.3.1 + - subnet: 192.168.4.0/24 + gateway: 192.168.4.1 + - subnet: 192.168.5.0/24 + gateway: 192.168.5.1 + - subnet: 192.168.6.0/24 + gateway: 192.168.6.1 + - subnet: 192.168.7.0/24 + gateway: 192.168.7.1 + - subnet: 192.168.8.0/24 + gateway: 192.168.8.1 + - subnet: 192.168.9.0/24 + gateway: 192.168.9.1 + - subnet: 192.168.10.0/24 + gateway: 192.168.10.1 + sriov: + mtu: 9000 + numVfs: 8 + priority: 90 + resourceName: sriov_resource + networkName: sriov_network + hostdev: + resourceName: hostdev-resource + networkName: hostdev-network + rdmaShared: + resourceName: rdma_shared_resource + hcaMax: 63 + ipoib: + networkName: ipoib-network + macvlan: + networkName: macvlan-network + clusterConfig: + capabilities: + nodes: + sriov: true + rdma: true + ib: true + pfs: + - rdmaDevice: mlx5_0 + pciAddress: "0000:03:00.0" + networkInterface: enp3s0f0np0 + traffic: east-west + - rdmaDevice: mlx5_1 + pciAddress: "0000:03:00.1" + networkInterface: enp3s0f1np1 + traffic: east-west + - rdmaDevice: mlx5_2 + pciAddress: 
"0000:81:00.0" + networkInterface: enp129s0np0 + traffic: east-west + workerNodes: + - cloud-dev-41 + - cloud-dev-40 \ No newline at end of file
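To check which physical functions a saved configuration file lists, you can extract the ``networkInterface`` values from the ``pfs`` entries. The following Bash sketch does this with ``awk`` as a plain text match rather than a YAML parse, so it assumes each ``networkInterface`` key and its value sit on one line, as in the example configuration. The ``list_pf_interfaces`` function is hypothetical, and the demo runs it on a trimmed fragment of the example file.

```shell
#!/usr/bin/env bash
# Sketch: list the PF network interfaces recorded in an l8k configuration file.
# ASSUMPTION: one "networkInterface: <name>" pair per line; this is a text
# match, not a YAML parser, so unusual formatting can defeat it.
set -euo pipefail

list_pf_interfaces() {
  # Print the word that follows every "networkInterface:" key, in file order.
  awk '{ for (i = 1; i < NF; i++) if ($i == "networkInterface:") print $(i + 1) }' "$1"
}

# Demo: a trimmed fragment of the example configuration (pfs entries only).
cfg="$(mktemp)"
trap 'rm -f "$cfg"' EXIT
cat > "$cfg" <<'EOF'
pfs:
  - rdmaDevice: mlx5_0
    pciAddress: "0000:03:00.0"
    networkInterface: enp3s0f0np0
    traffic: east-west
  - rdmaDevice: mlx5_1
    pciAddress: "0000:03:00.1"
    networkInterface: enp3s0f1np1
    traffic: east-west
  - rdmaDevice: mlx5_2
    pciAddress: "0000:81:00.0"
    networkInterface: enp129s0np0
    traffic: east-west
EOF

list_pf_interfaces "$cfg"
```

For hand-edited files, a YAML-aware tool is more robust than this line-based match.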