Kubernetes on NVIDIA GPUs
Kubernetes is an open source system for managing containerized applications across multiple hosts, providing basic mechanisms for deployment, maintenance, and scaling of applications.
Kubernetes builds upon a decade and a half of experience at Google running production workloads at scale using a system called Borg, combined with best-of-breed ideas and practices from the community.
Kubernetes on NVIDIA GPUs includes support for GPUs and enhancements to Kubernetes, so users can easily configure and use GPU resources for accelerating deep learning workloads.
To start using Kubernetes
Get started with Kubernetes on NVIDIA GPUs by reviewing the installation guide.
The general Kubernetes documentation is available at kubernetes.io.
For general Kubernetes issues, start with the troubleshooting guide.
This release of Kubernetes is supported on the following platforms:
- NVIDIA GPU Cloud virtual machine images available on Amazon EC2 and Google Cloud Platform
This release includes the following features:
- Support for NVIDIA GPUs in Kubernetes using the NVIDIA device plugin
- Support for GPU attributes such as GPU type and memory requirements via the Kubernetes PodSpec
- Visualize and monitor GPU metrics and health with an integrated GPU monitoring stack consisting of NVIDIA DCGM, Prometheus, and Grafana
- Support for Docker and CRI-O using the NVIDIA Container Runtime
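As an illustration of the device plugin and PodSpec support listed above, the following is a minimal sketch of a Pod that requests one GPU through the `nvidia.com/gpu` resource exposed by the NVIDIA device plugin. The container image and the `accelerator` node label used to select a GPU type are assumptions for this example; the exact label or attribute keys for GPU type and memory may differ by release.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  containers:
  - name: cuda-container
    image: nvidia/cuda:9.0-base     # assumption: any CUDA-enabled image works here
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1           # request one GPU via the NVIDIA device plugin
  nodeSelector:
    accelerator: nvidia-tesla-v100  # hypothetical node label selecting a GPU type
```

Once the Pod is scheduled onto a node with a free GPU, the NVIDIA Container Runtime makes the device visible inside the container, so `nvidia-smi` reports the allocated GPU.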