Skip to content
GPU Sharing Scheduler for Kubernetes Cluster
Go Shell Dockerfile Smarty
Branch: master
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
.circleci add circleci (#23) Apr 10, 2019
cmd first commit for opensource Feb 18, 2019
config
deployer
docs
pkg Remove redundant node get (#17) Mar 18, 2019
samples Use GiB as default metric instead of MiB (#1) Mar 1, 2019
vendor first commit for opensource Feb 18, 2019
.travis.yml add travis support Feb 18, 2019
Dockerfile first commit for opensource Feb 18, 2019
Gopkg.lock first commit for opensource Feb 18, 2019
Gopkg.toml first commit for opensource Feb 18, 2019
LICENSE Initial commit Feb 18, 2019
README.md
demo1.jpg first commit for opensource Feb 18, 2019
demo2.jpg

README.md

GPU Sharing Scheduler Extender in Kubernetes

CircleCI Build Status Go Report Card

Overview

More and more data scientists run their Nvidia GPU based inference tasks on Kubernetes. Some of these tasks can be run on the same Nvidia GPU device to increase GPU utilization. So one important challenge is how to share GPUs between the pods. The community is also very interested in this topic.

Now there is a GPU sharing solution on native Kubernetes: it is based on scheduler extenders and device plugin mechanism, so you can reuse this solution easily in your own Kubernetes.

Prerequisites

  • Kubernetes 1.11+
  • golang 1.10+
  • NVIDIA drivers ~= 361.93
  • Nvidia-docker version > 2.0 (see how to install and it's prerequisites)
  • Docker configured with Nvidia as the default runtime.

Design

For more details about the design of this project, please read this Design document.

Setup

You can follow this Installation Guide. If you are using Alibaba Cloud Kubernetes, please follow this doc to install with Helm Charts.

User Guide

You can check this User Guide.

Developing

Scheduler Extender

git clone https://github.com/AliyunContainerService/gpushare-scheduler-extender.git && cd gpushare-scheduler-extender
docker build -t cheyang/gpushare-scheduler-extender .

Device Plugin

git clone https://github.com/AliyunContainerService/gpushare-device-plugin.git && cd gpushare-device-plugin
docker build -t cheyang/gpushare-device-plugin .

Kubectl Extension

  • golang > 1.10
mkdir -p $GOPATH/src/github.com/AliyunContainerService
cd $GOPATH/src/github.com/AliyunContainerService
git clone https://github.com/AliyunContainerService/gpushare-device-plugin.git
cd gpushare-device-plugin
go build -o $GOPATH/bin/kubectl-inspect-gpushare-v2 cmd/inspect/*.go

Demo

- Demo 1: Deploy multiple GPU Shared Pods and schedule them on the same GPU device in binpack way

- Demo 2: Avoid GPU memory requests that fit at the node level, but not at the GPU device level

Related Project

Roadmap

  • Integrate Nvidia MPS as the option for isolation
  • Automated Deployment for the Kubernetes cluster which is deployed by kubeadm
  • Scheduler Extener High Availablity
  • Generic Solution for GPU, RDMA and other devices

Acknowledgments

  • GPU sharing solution is based on Nvidia Docker2, and their gpu sharing design is our reference. The Nvidia Community is very supportive and We are very grateful.
You can’t perform that action at this time.