Add the vGPUScheduler to support Alnair Virtual GPUs (#104)
* add our vGPU scheduler to the main

* clean against code
YHDING23 committed Feb 21, 2022
1 parent d99989a commit d66cd91
Showing 15 changed files with 2,023 additions and 0 deletions.
7 changes: 7 additions & 0 deletions autonomous-scheduler/vGPUScheduler/Dockerfile
@@ -0,0 +1,7 @@
FROM debian:stretch-slim

WORKDIR /

COPY kube-scheduler /usr/local/bin

CMD ["kube-scheduler"]
18 changes: 18 additions & 0 deletions autonomous-scheduler/vGPUScheduler/Makefile
@@ -0,0 +1,18 @@
all: local

local:
	GOOS=linux GOARCH=amd64 go build -o=kube-scheduler ./cmd/scheduler

build:

	sudo docker build --no-cache . -t centaurusinfra/vgpu-scheduler:0.3.0

push:
	sudo docker push centaurusinfra/vgpu-scheduler:0.3.0

# Run go fmt against code
fmt:
	sudo gofmt -l -w .

clean: fmt
	sudo rm -f kube-scheduler
42 changes: 42 additions & 0 deletions autonomous-scheduler/vGPUScheduler/README.md
@@ -0,0 +1,42 @@
# vGPUScheduler: A customized scheduler for virtual GPUs

vGPUScheduler is a customized Kubernetes scheduler based on the [scheduling-framework](https://github.com/kubernetes/enhancements/blob/master/keps/sig-scheduling/20180409-scheduling-framework.md). The scheduling framework's APIs allow most scheduling features to be implemented as plugins while keeping the scheduling core maintainable. As shown in the diagram, the framework defines several extension points in both the scheduling cycle and the binding cycle. Our plugin is registered and invoked at the Filter and Score extension points, where it influences the scheduling decisions.

#### Diagram of K8S scheduling-framework and our design
![framework](./img/framework.png)
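
To illustrate how the Filter and Score extension points work together, here is a minimal, self-contained Go sketch. It does not use the real scheduling-framework API: the types `gpu` and `node`, the functions `filter`, `score`, and `pick`, and the best-fit scoring rule are all assumptions made for illustration, not the plugin's actual implementation.

```go
package main

import "fmt"

// gpu is a hypothetical view of one physical GPU and its free memory.
type gpu struct {
	ID        int
	FreeMemMB int64
}

// node is a hypothetical, simplified view of a cluster node.
type node struct {
	Name string
	GPUs []gpu
}

// filter mimics the Filter extension point: a node is feasible only if
// at least one of its GPUs has enough free memory for the request.
func filter(n node, requestMB int64) bool {
	for _, g := range n.GPUs {
		if g.FreeMemMB >= requestMB {
			return true
		}
	}
	return false
}

// score mimics the Score extension point and returns a value in 0..100.
// Assumption: best fit, so the GPU left with the least spare memory wins.
func score(n node, requestMB int64) int64 {
	best := int64(0)
	for _, g := range n.GPUs {
		if g.FreeMemMB == 0 || g.FreeMemMB < requestMB {
			continue
		}
		if s := requestMB * 100 / g.FreeMemMB; s > best {
			best = s
		}
	}
	return best
}

// pick filters the nodes, scores the survivors, and returns the
// highest-scoring node name. ok is false when no node is feasible.
func pick(nodes []node, requestMB int64) (name string, ok bool) {
	bestScore := int64(-1)
	for _, n := range nodes {
		if !filter(n, requestMB) {
			continue
		}
		if s := score(n, requestMB); s > bestScore {
			name, bestScore, ok = n.Name, s, true
		}
	}
	return name, ok
}

func main() {
	// Made-up cluster state for the example.
	nodes := []node{
		{Name: "node-a", GPUs: []gpu{{ID: 0, FreeMemMB: 16384}}},
		{Name: "node-b", GPUs: []gpu{{ID: 0, FreeMemMB: 8192}}},
	}
	for _, req := range []int64{4096, 10240} {
		name, ok := pick(nodes, req)
		fmt.Printf("request=%dMB node=%q feasible=%v\n", req, name, ok)
	}
}
```

With these made-up numbers, the 4 GB request lands on `node-b` (the tighter fit), while the 10 GB request lands on `node-a`, the only node with a GPU large enough to pass the filter.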

#### Get Started
- Make sure your Kubernetes cluster is version 1.17 or later; earlier versions may not fully support the K8S scheduling-framework.
- Clone the Alnair repository:
```shell
git clone git@github.com:CentaurusInfra/alnair.git
```
- In the vGPUScheduler folder, compile:
```shell
make local
```
- Build the docker image, and push to your docker hub:
```shell
make build
make push
make clean
```
- Back up your `kube-scheduler.yaml`, usually located in `/etc/kubernetes/manifests/`. Then copy our `manifests/kube-scheduler.yaml` and `manifests/vGPUScheduler-config.yaml` to `/etc/kubernetes/manifests/`. Change the image link in `kube-scheduler.yaml` accordingly.
- Check if the new scheduler is running in your cluster:
```shell
kubectl get pod -n kube-system | grep "scheduler"
```
#### Deploy a Pod using vGPUScheduler
- Create a pod that needs 4 GB of GPU memory:
```shell
kubectl create -f pod.yaml
```
- Then create a pod that needs 10 GB of GPU memory:
```shell
kubectl create -f pod2.yaml
```
- Check the status of the pods and see which node and GPU they were bound to via the pod annotations:
```shell
kubectl get pod gpu-pod -o yaml | grep -A 8 annotations
```
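If you would rather inspect the annotations programmatically than with `grep`, the following Go sketch reads the JSON form of a pod from standard input and prints its annotations. It assumes only the standard `metadata.annotations` field of a Pod object; the helper name `annotationsFromJSON` is made up for this example, and the annotation keys written by the scheduler are not assumed here.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"os"
	"sort"
)

// podMeta mirrors only the part of a Pod object that this sketch reads.
type podMeta struct {
	Metadata struct {
		Annotations map[string]string `json:"annotations"`
	} `json:"metadata"`
}

// annotationsFromJSON (hypothetical helper) extracts metadata.annotations
// from the output of `kubectl get pod <name> -o json`.
func annotationsFromJSON(raw []byte) (map[string]string, error) {
	var p podMeta
	if err := json.Unmarshal(raw, &p); err != nil {
		return nil, fmt.Errorf("decode pod: %w", err)
	}
	return p.Metadata.Annotations, nil
}

func main() {
	raw, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if len(bytes.TrimSpace(raw)) == 0 {
		fmt.Fprintln(os.Stderr, "no input; pipe in `kubectl get pod gpu-pod -o json`")
		return
	}
	ann, err := annotationsFromJSON(raw)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Print in sorted key order so the output is stable.
	keys := make([]string, 0, len(ann))
	for k := range ann {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s=%s\n", k, ann[k])
	}
}
```

One way to use it, assuming the file is saved as `annotations.go`: `kubectl get pod gpu-pod -o json | go run annotations.go`.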

29 changes: 29 additions & 0 deletions autonomous-scheduler/vGPUScheduler/cmd/scheduler/main.go
@@ -0,0 +1,29 @@
package main

import (
	"fmt"
	"math/rand"
	"os"
	"time"

	"k8s.io/component-base/logs"
	"k8s.io/kubernetes/cmd/kube-scheduler/app"
	"vGPUScheduler/pkg/vGPUScheduler"

	_ "sigs.k8s.io/scheduler-plugins/pkg/apis/config/scheme"
)

func main() {
	rand.Seed(time.Now().UTC().UnixNano())
	logs.InitLogs()
	defer logs.FlushLogs()

	cmd := app.NewSchedulerCommand(
		app.WithPlugin(vGPUScheduler.Name, vGPUScheduler.New),
	)

	if err := cmd.Execute(); err != nil {
		_, _ = fmt.Fprintf(os.Stderr, "%v\n", err)
		os.Exit(1)
	}
}
139 changes: 139 additions & 0 deletions autonomous-scheduler/vGPUScheduler/go.mod
@@ -0,0 +1,139 @@
module vGPUScheduler

go 1.17

require (
k8s.io/api v0.22.6
k8s.io/apimachinery v0.22.6
k8s.io/client-go v0.22.6
k8s.io/component-base v0.22.6
k8s.io/klog/v2 v2.9.0
k8s.io/kubernetes v1.22.6
sigs.k8s.io/scheduler-plugins v0.22.6
)

require (
github.com/Azure/go-ansiterm v0.0.0-20210617225240-d185dfc1b5a1 // indirect
github.com/NYTimes/gziphandler v1.1.1 // indirect
github.com/PuerkitoBio/purell v1.1.1 // indirect
github.com/PuerkitoBio/urlesc v0.0.0-20170810143723-de5bf2ad4578 // indirect
github.com/beorn7/perks v1.0.1 // indirect
github.com/bits-and-blooms/bitset v1.2.0 // indirect
github.com/blang/semver v3.5.1+incompatible // indirect
github.com/cespare/xxhash/v2 v2.1.1 // indirect
github.com/coreos/go-semver v0.3.0 // indirect
github.com/coreos/go-systemd/v22 v22.3.2 // indirect
github.com/cyphar/filepath-securejoin v0.2.2 // indirect
github.com/davecgh/go-spew v1.1.1 // indirect
github.com/docker/distribution v2.7.1+incompatible // indirect
github.com/emicklei/go-restful v2.9.5+incompatible // indirect
github.com/evanphx/json-patch v4.11.0+incompatible // indirect
github.com/felixge/httpsnoop v1.0.1 // indirect
github.com/go-logr/logr v0.4.0 // indirect
github.com/go-openapi/jsonpointer v0.19.5 // indirect
github.com/go-openapi/jsonreference v0.19.5 // indirect
github.com/go-openapi/swag v0.19.14 // indirect
github.com/gogo/protobuf v1.3.2 // indirect
github.com/golang/groupcache v0.0.0-20210331224755-41bb18bfe9da // indirect
github.com/golang/protobuf v1.5.2 // indirect
github.com/google/go-cmp v0.5.5 // indirect
github.com/google/gofuzz v1.1.0 // indirect
github.com/google/uuid v1.1.2 // indirect
github.com/googleapis/gnostic v0.5.5 // indirect
github.com/grpc-ecosystem/go-grpc-prometheus v1.2.0 // indirect
github.com/grpc-ecosystem/grpc-gateway v1.16.0 // indirect
github.com/imdario/mergo v0.3.5 // indirect
github.com/inconshreveable/mousetrap v1.0.0 // indirect
github.com/josharian/intern v1.0.0 // indirect
github.com/json-iterator/go v1.1.11 // indirect
github.com/mailru/easyjson v0.7.6 // indirect
github.com/matttproud/golang_protobuf_extensions v1.0.2-0.20181231171920-c182affec369 // indirect
github.com/moby/term v0.0.0-20210610120745-9d4ed1856297 // indirect
github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd // indirect
github.com/modern-go/reflect2 v1.0.1 // indirect
github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822 // indirect
github.com/opencontainers/go-digest v1.0.0 // indirect
github.com/opencontainers/runc v1.0.2 // indirect
github.com/opencontainers/selinux v1.8.2 // indirect
github.com/pkg/errors v0.9.1 // indirect
github.com/prometheus/client_golang v1.11.0 // indirect
github.com/prometheus/client_model v0.2.0 // indirect
github.com/prometheus/common v0.26.0 // indirect
github.com/prometheus/procfs v0.6.0 // indirect
github.com/spf13/cobra v1.1.3 // indirect
github.com/spf13/pflag v1.0.5 // indirect
go.etcd.io/etcd/api/v3 v3.5.0 // indirect
go.etcd.io/etcd/client/pkg/v3 v3.5.0 // indirect
go.etcd.io/etcd/client/v3 v3.5.0 // indirect
go.opentelemetry.io/contrib v0.20.0 // indirect
go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc v0.20.0 // indirect
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.20.0 // indirect
go.opentelemetry.io/otel v0.20.0 // indirect
go.opentelemetry.io/otel/exporters/otlp v0.20.0 // indirect
go.opentelemetry.io/otel/metric v0.20.0 // indirect
go.opentelemetry.io/otel/sdk v0.20.0 // indirect
go.opentelemetry.io/otel/sdk/export/metric v0.20.0 // indirect
go.opentelemetry.io/otel/sdk/metric v0.20.0 // indirect
go.opentelemetry.io/otel/trace v0.20.0 // indirect
go.opentelemetry.io/proto/otlp v0.7.0 // indirect
go.uber.org/atomic v1.7.0 // indirect
go.uber.org/multierr v1.6.0 // indirect
go.uber.org/zap v1.17.0 // indirect
golang.org/x/crypto v0.0.0-20210220033148-5ea612d1eb83 // indirect
golang.org/x/net v0.0.0-20211209124913-491a49abca63 // indirect
golang.org/x/oauth2 v0.0.0-20200107190931-bf48bf16ab8d // indirect
golang.org/x/sync v0.0.0-20210220032951-036812b2e83c // indirect
golang.org/x/sys v0.0.0-20210616094352-59db8d763f22 // indirect
golang.org/x/term v0.0.0-20210220032956-6a3ed077a48d // indirect
golang.org/x/text v0.3.6 // indirect
golang.org/x/time v0.0.0-20210723032227-1f47c861a9ac // indirect
golang.org/x/xerrors v0.0.0-20200804184101-5ec99f83aff1 // indirect
google.golang.org/appengine v1.6.5 // indirect
google.golang.org/genproto v0.0.0-20210602131652-f16073e35f0c // indirect
google.golang.org/grpc v1.38.0 // indirect
google.golang.org/protobuf v1.26.0 // indirect
gopkg.in/inf.v0 v0.9.1 // indirect
gopkg.in/natefinch/lumberjack.v2 v2.0.0 // indirect
gopkg.in/yaml.v2 v2.4.0 // indirect
gopkg.in/yaml.v3 v3.0.0-20210107192922-496545a6307b // indirect
k8s.io/apiserver v0.22.6 // indirect
k8s.io/cloud-provider v0.22.6 // indirect
k8s.io/component-helpers v0.22.6 // indirect
k8s.io/csi-translation-lib v0.22.6 // indirect
k8s.io/kube-openapi v0.0.0-20211109043538-20434351676c // indirect
k8s.io/kube-scheduler v0.22.6 // indirect
k8s.io/mount-utils v0.22.6 // indirect
k8s.io/utils v0.0.0-20210819203725-bdf08cb9a70a // indirect
sigs.k8s.io/apiserver-network-proxy/konnectivity-client v0.0.27 // indirect
sigs.k8s.io/structured-merge-diff/v4 v4.2.1 // indirect
sigs.k8s.io/yaml v1.2.0 // indirect
)

replace (
k8s.io/api => k8s.io/api v0.22.6
k8s.io/apiextensions-apiserver => k8s.io/apiextensions-apiserver v0.22.6
k8s.io/apimachinery => k8s.io/apimachinery v0.22.6
k8s.io/apiserver => k8s.io/apiserver v0.22.6
k8s.io/cli-runtime => k8s.io/cli-runtime v0.22.6
k8s.io/client-go => k8s.io/client-go v0.22.6
k8s.io/cloud-provider => k8s.io/cloud-provider v0.22.6
k8s.io/cluster-bootstrap => k8s.io/cluster-bootstrap v0.22.6
k8s.io/code-generator => k8s.io/code-generator v0.22.6
k8s.io/component-base => k8s.io/component-base v0.22.6
k8s.io/component-helpers => k8s.io/component-helpers v0.22.6
k8s.io/controller-manager => k8s.io/controller-manager v0.22.6
k8s.io/cri-api => k8s.io/cri-api v0.22.6
k8s.io/csi-translation-lib => k8s.io/csi-translation-lib v0.22.6
k8s.io/kube-aggregator => k8s.io/kube-aggregator v0.22.6
k8s.io/kube-controller-manager => k8s.io/kube-controller-manager v0.22.6
k8s.io/kube-proxy => k8s.io/kube-proxy v0.22.6
k8s.io/kube-scheduler => k8s.io/kube-scheduler v0.22.6
k8s.io/kubectl => k8s.io/kubectl v0.22.6
k8s.io/kubelet => k8s.io/kubelet v0.22.6
k8s.io/kubernetes => k8s.io/kubernetes v1.22.6
k8s.io/legacy-cloud-providers => k8s.io/legacy-cloud-providers v0.22.6
k8s.io/metrics => k8s.io/metrics v0.22.6
k8s.io/mount-utils => k8s.io/mount-utils v0.22.6
k8s.io/pod-security-admission => k8s.io/pod-security-admission v0.22.6
k8s.io/sample-apiserver => k8s.io/sample-apiserver v0.22.6
)