diff --git a/assets/images/oneks/dark/delete_k8s_cluster.png b/assets/images/oneks/dark/delete_k8s_cluster.png new file mode 100644 index 00000000..a3561fb8 Binary files /dev/null and b/assets/images/oneks/dark/delete_k8s_cluster.png differ diff --git a/assets/images/oneks/light/delete_k8s_cluster.png b/assets/images/oneks/light/delete_k8s_cluster.png new file mode 100644 index 00000000..d3812421 Binary files /dev/null and b/assets/images/oneks/light/delete_k8s_cluster.png differ diff --git a/content/platform_services/application_appliances/overview.md b/content/platform_services/application_appliances/overview.md index ef87b8f0..9f6ee64b 100644 --- a/content/platform_services/application_appliances/overview.md +++ b/content/platform_services/application_appliances/overview.md @@ -5,11 +5,11 @@ weight: 1 type: docs --- -OpenNebula Marketplace Appliances offer a streamlined way to launch specialized, pre-configured software solutions on your OpenNebula cloud deployment for specific use cases. This includes everything from basic Virtual Machines with specific operating systems through to advance cloud storage solutions and methods for deploying HPC or AI workloads. +OpenNebula Marketplace Appliances offer a streamlined way to launch specialized, pre-configured software solutions on your OpenNebula cloud deployment for specific use cases. This includes everything from basic Virtual Machines pre-installed with specific operating systems through to advanced cloud storage solutions and methods for deploying HPC or AI workloads. You can browse available Marketplace Appliances in the dedicated [OpenNebula Marketplace](https://marketplace.opennebula.io/) site. Below, you will find links to the documentation for several Marketplace Appliances that may be of interest for common use cases: -* [**Harbor Container Registry**](https://github.com/OpenNebula/one-apps/wiki/harbor_intro): Deploy a secure, self managed image repository. +* [**Harbor Container Registry**](https://github.com/OpenNebula/one-apps/wiki/harbor_intro): Deploy a secure, self-managed image repository. * [**MinIO**](https://github.com/OpenNebula/one-apps/wiki/minio_intro): Object storage solution with an AWS S3-compatible API. * [**vLLM AI**](https://github.com/OpenNebula/one-apps/wiki/vllm_intro): Deploy high-performance LLM inferencing. * [**Slurm**](https://github.com/OpenNebula/one-apps/wiki/slurm_intro): Fault-tolerant, highly scalable Cluster management and job scheduling system for HPC. diff --git a/content/platform_services/oneks/_index.md b/content/platform_services/oneks/_index.md index 3a173b33..df935336 100644 --- a/content/platform_services/oneks/_index.md +++ b/content/platform_services/oneks/_index.md @@ -1,5 +1,5 @@ --- -title: "OneKS: Elastic Kubernetes as a Service on OpenNebula (EE)" +title: "Elastic Kubernetes as a Service (EE)" linkTitle: "Elastic Kubernetes (EE)" date: "2026-05-12" description: "OneKS provides Elastic Kubernetes as a Service on OpenNebula. It offers a structured way to create, access, operate, upgrade, recover, and deprovision Kubernetes Clusters." diff --git a/content/platform_services/oneks/getting_started/overview.md b/content/platform_services/oneks/getting_started/overview.md index 04d0f334..0f9f5635 100644 --- a/content/platform_services/oneks/getting_started/overview.md +++ b/content/platform_services/oneks/getting_started/overview.md @@ -1,5 +1,5 @@ --- -title: "OneKS Overview" +title: "Overview" linkTitle: "Overview" date: "2026-05-12" categories: @@ -13,7 +13,7 @@ OneKS provides Elastic Kubernetes as a Service on OpenNebula. It offers a struct OneKS is designed for teams that need a simple and repeatable way to consume Kubernetes inside OpenNebula. Typical use cases include development and test environments, self-service Kubernetes delivery in private cloud environments, and standardized Cluster offerings for different sizes and topologies. -OneKS builds on CAPONE to expose a K8s Cluster-centric lifecycle model for users and users. Users interact mainly with OneKS K8s Clusters and node groups, while lower-level OpenNebula, Cluster API, and dependency-management details are handled underneath. +OneKS builds on CAPONE to expose a K8s Cluster-centric lifecycle model for users. Users interact mainly with OneKS K8s Clusters and node groups, while lower-level OpenNebula, Cluster API, and dependency-management details are handled underneath. ## How Should I Read this Chapter diff --git a/content/platform_services/oneks/getting_started/quick_start.md b/content/platform_services/oneks/getting_started/quick_start.md index ebc1a21c..2ed80b4c 100644 --- a/content/platform_services/oneks/getting_started/quick_start.md +++ b/content/platform_services/oneks/getting_started/quick_start.md @@ -1,5 +1,5 @@ --- -title: "OneKS Quick Start" +title: "Quick Start" linkTitle: "Quick Start" date: "2026-05-12" description: diff --git a/content/platform_services/oneks/management/configuration.md b/content/platform_services/oneks/management/configuration.md index 4ff92bfd..f14c85ea 100644 --- a/content/platform_services/oneks/management/configuration.md +++ b/content/platform_services/oneks/management/configuration.md @@ -1,5 +1,5 @@ --- -title: "OneKS K8s Cluster Configuration" +title: "Kubernetes Cluster Configuration" linkTitle: "Configuration" date: "2026-05-12" description: diff --git a/content/platform_services/oneks/management/k8s_cluster_lifecycle_management.md b/content/platform_services/oneks/management/k8s_cluster_lifecycle_management.md index bf30123a..de834ea3 100644 --- a/content/platform_services/oneks/management/k8s_cluster_lifecycle_management.md +++ b/content/platform_services/oneks/management/k8s_cluster_lifecycle_management.md @@ -1,6 +1,6 @@ --- -title: "OneKS K8s Cluster Lifecycle Management" -linkTitle: "K8s Cluster Lifecycle Management" +title: "Kubernetes Cluster Lifecycle Management" +linkTitle: "Kubernetes Cluster Lifecycle Management" date: "2026-05-12" description: categories: @@ -488,6 +488,10 @@ curl -u "$(cat /var/lib/one/.one/one_auth)" \ In the **K8S Clusters** view, select the K8s Cluster you want to delete. Click the red **Delete** button next to the **Create** button. +{{< image path="/images/oneks/light/delete_k8s_cluster.png" + pathDark="/images/oneks/dark/delete_k8s_cluster.png" +alt="OneKS create Cluster step 1" align="center" width="90%" mb="20px" >}} + The deletion operation deprovisions the OneKS K8s Cluster and its managed resources, including the control plane and managed node groups. Referenced infrastructure, such as the public and private Virtual Networks selected during K8s Cluster creation, is not normally deleted by OneKS. After deletion, verify that the K8s Cluster no longer appears in OneKS: diff --git a/content/platform_services/oneks/management/monitoring_and_troubleshooting.md b/content/platform_services/oneks/management/monitoring_and_troubleshooting.md index b0229048..e82c1fe4 100644 --- a/content/platform_services/oneks/management/monitoring_and_troubleshooting.md +++ b/content/platform_services/oneks/management/monitoring_and_troubleshooting.md @@ -1,5 +1,5 @@ --- -title: "OneKS K8s Cluster Monitoring and Troubleshooting" +title: "Kubernetes Cluster Monitoring and Troubleshooting" linkTitle: "Monitoring and Troubleshooting" date: "2026-05-12" description: diff --git a/content/platform_services/oneks/references/oneks_api.md b/content/platform_services/oneks/references/oneks_api.md index 164e2fdb..7c570077 100644 --- a/content/platform_services/oneks/references/oneks_api.md +++ b/content/platform_services/oneks/references/oneks_api.md @@ -1,5 +1,5 @@ --- -title: "OneKS REST API Reference" +title: "REST API Reference" linkTitle: "API" date: "2026-05-12" description: diff --git a/content/platform_services/oneks/references/oneks_cli.md b/content/platform_services/oneks/references/oneks_cli.md index 0fcbfb2a..16131e8e 100644 --- a/content/platform_services/oneks/references/oneks_cli.md +++ b/content/platform_services/oneks/references/oneks_cli.md @@ -1,5 +1,5 @@ --- -title: "OneKS CLI Reference" +title: "CLI Reference" linkTitle: "CLI" date: "2026-05-12" description: diff --git a/content/product/cluster_configuration/networking_system/spectrumx.md b/content/product/cluster_configuration/networking_system/spectrumx.md new file mode 100644 index 00000000..7c8aea2b --- /dev/null +++ b/content/product/cluster_configuration/networking_system/spectrumx.md @@ -0,0 +1,65 @@ +--- +title: "NVIDIA Spectrum-X Integration" +linkTitle: "NVIDIA Spectrum-X" +date: "2025-12-17" +categories: ["networking"] +pageintoc: "64" +tags: ["networking", "hpc", "ai", "nvidia", "spectrum-x", "evpn"] +weight: "8" +--- + + + +This guide provides a high-level overview of the OpenNebula integration with the NVIDIA Spectrum-X™ Ethernet networking platform. This integration allows OpenNebula to act as a single pane of glass for managing an entire AI factory, from compute and storage to the high-performance network fabric. + +The integration works by mapping OpenNebula's logical resource constructs (like Users and Virtual Networks) directly to the Spectrum-X fabric's tenant segments, which are based on a routed L3EVPN architecture to deliver isolated, high-bandwidth East-West (E/W) traffic for demanding AI and HPC workloads. + +## The Spectrum-X Platform + +NVIDIA Spectrum-X is the first Ethernet fabric built from the ground up to accelerate AI workloads. It delivers advanced performance, scalability, and network intelligence, ensuring consistent, predictable results in a multi-tenant AI cloud. + +The platform is built on two key components: +* **NVIDIA Spectrum-4 Switches**: High-bandwidth, low-latency switches that provide RoCE-optimized routing and advanced congestion control. +* **NVIDIA BlueField-3 SuperNICs**: A new class of network adapter that accelerates and secures the network, moving networking and security tasks from the CPU to the DPU. + +The fabric uses a routed L3EVPN architecture to create isolated tenant environments. Each tenant is assigned a separate Virtual Routing and Forwarding (VRF) instance on the leaf switches, ensuring traffic from one tenant is logically separated from another. + +## OpenNebula Integration Concepts + +The integration between OpenNebula and Spectrum-X is achieved by creating a clear mapping between OpenNebula's resource management constructs and the physical network's tenant architecture. + +### Resource Mapping + +* **Tenant Mapping**: An AI Factory tenant is directly mapped to a **User** in OpenNebula. This user is then granted access to a specific set of isolated resources (N/S vNet, BlueField-3 PCI Device and GPU PCI Device). + +* **Network Mapping**: The integration distinguishes between two traffic patterns: + * **North-South (N/S) Network**: This is the standard management and external access network for a VM. It is implemented in OpenNebula as a regular **Virtual Network (vNet)**. + * **East-West (E/W) Network**: This is the high-performance Spectrum-X fabric used for GPU-to-GPU communication. + The link between these two networks is established by storing the tenant E/W **VXLAN Network Identifier (VNI)** as a custom attribute, `SPX_VNI`, within the N/S Virtual Network template in OpenNebula. A tenant can attach a VM to its own E/W segment by attaching a specific BlueField-3 PCI device to the VM. + +* **Hardware Access**: + * NVIDIA GPUs and BlueField-3 SuperNICs are represented in OpenNebula as **PCI Devices**. + * Access is granted to tenants by assigning ownership or group access to these PCI devices. + * To enable dynamic E/W fabric configuration, the PCI device template for each SuperNIC must store critical networking information as custom attributes: + * `SPX_NIC_IP`: The static IP address of the SuperNIC's interface. This IP address must remain static due to the routed L3EVPN nature of the E/W fabric. + * `SPX_LEAF_IP`: The IP address of the leaf switch the SuperNIC is connected to. + * `SPX_LEAF_PORT`: The physical port name on the leaf switch where the SuperNIC is connected. + +### Dynamic Fabric Configuration + +OpenNebula orchestrates the Spectrum-X fabric dynamically using network hooks. When a user deploys a VM, these hooks execute scripts on the hypervisor that configure the Spectrum-X leaf switches. + +The high-level workflow is as follows: +1. A tenant instantiates a VM Template containing both a standard N/S network interface and one or more E/W PCI passthrough devices (the BlueField-3 SuperNICs). +2. The VM's context contains all the necessary attributes: `SPX_VNI` (from the N/S vNet) and the `SPX_*` attributes (from the PCI devices). +3. Upon deployment, an OpenNebula network hook runs on the target hypervisor. This hook establishes an SSH connection to the corresponding leaf switches. +4. The hook uses `NVUE` commands on the switch to build the tenant E/W datapath, allowing fully tenant-isolated GPU-to-GPU connectivity. + +## Current Status and Considerations + +{{< alert title="Important" color="info" >}} +This is a high-level overview of the integration. Customers interested in a detailed technical discussion and production deployment should contact OpenNebula Systems. +{{< /alert >}} + +* **Availability**: This integration is part of the OpenNebula Enterprise Edition and is available as a reference implementation. +* **Validation Environment**: The integration has been fully developed and validated in the **NVIDIA Air** cloud simulation platform, which provides a faithful, large-scale simulation of a Spectrum-X hardware environment. \ No newline at end of file