Add v1beta1 API for Karpenter
Signed-off-by: Heba Elayoty <hebaelayoty@gmail.com>
helayoty committed Apr 26, 2024
1 parent 8b7941a commit 2f6ef1d
Showing 10 changed files with 687 additions and 219 deletions.
2 changes: 1 addition & 1 deletion Makefile
@@ -11,7 +11,7 @@ BIN_DIR := $(abspath $(ROOT_DIR)/bin)
TOOLS_DIR := hack/tools
TOOLS_BIN_DIR := $(abspath $(TOOLS_DIR)/bin)

GOLANGCI_LINT_VER := v1.54.1
GOLANGCI_LINT_VER := v1.57.2
GOLANGCI_LINT_BIN := golangci-lint
GOLANGCI_LINT := $(abspath $(TOOLS_BIN_DIR)/$(GOLANGCI_LINT_BIN)-$(GOLANGCI_LINT_VER))

35 changes: 23 additions & 12 deletions README.md
@@ -7,12 +7,13 @@

| ![notification](docs/img/bell.svg) What is NEW! |
|-------------------------------------------------|
| Latest Release: March 28th, 2024. Kaito v0.2.2. |
| Latest Release: March 28th, 2024. Kaito v0.2.2. |
| First Release: Nov 15th, 2023. Kaito v0.1.0. |

Kaito is an operator that automates the AI/ML inference model deployment in a Kubernetes cluster.
The target models are popular large open-sourced inference models such as [falcon](https://huggingface.co/tiiuae) and [llama2](https://github.com/facebookresearch/llama).
Kaito has the following key differentiations compared to most of the mainstream model deployment methodologies built on top of virtual machine infrastructures:

- Manage large model files using container images. A http server is provided to perform inference calls using the model library.
- Avoid tuning deployment parameters to fit GPU hardware by providing preset configurations.
- Auto-provision GPU nodes based on model requirements.
@@ -29,18 +30,20 @@ Kaito follows the classic Kubernetes Custom Resource Definition(CRD)/controller
</div>

The above figure presents the Kaito architecture overview. Its major components consist of:

- **Workspace controller**: It reconciles the `workspace` custom resource, creates `machine` (explained below) custom resources to trigger node auto provisioning, and creates the inference workload (`deployment` or `statefulset`) based on the model preset configurations.
- **Node provisioner controller**: The controller's name is *gpu-provisioner* in [Kaito helm chart](charts/kaito/gpu-provisioner). It uses the `machine` CRD originated from [Karpenter](https://github.com/aws/karpenter-core) to interact with the workspace controller. It integrates with Azure Kubernetes Service(AKS) APIs to add new GPU nodes to the AKS cluster.
- **Node provisioner controller**: The controller's name is *gpu-provisioner* in [Kaito helm chart](charts/kaito/gpu-provisioner). It uses the `machine` CRD originated from [Karpenter](https://sigs.k8s.io/karpenter) to interact with the workspace controller. It integrates with Azure Kubernetes Service(AKS) APIs to add new GPU nodes to the AKS cluster.
Note that the *gpu-provisioner* is an open sourced component maintained in [this](https://github.com/Azure/gpu-provisioner) repository. It can be replaced by other controllers if they support Karpenter-core APIs.


## Installation
## Installation

Please check the installation guidance [here](./docs/installation.md).

## Quick start

After installing Kaito, one can try following commands to start a falcon-7b inference service.
```

```sh
$ cat examples/kaito_workspace_falcon_7b.yaml
apiVersion: kaito.sh/v1alpha1
kind: Workspace
@@ -58,15 +61,17 @@ inference:
$ kubectl apply -f examples/kaito_workspace_falcon_7b.yaml
```

The workspace status can be tracked by running the following command. When the WORKSPACEREADY column becomes `True`, the model has been deployed successfully.
```
The workspace status can be tracked by running the following command. When the WORKSPACEREADY column becomes `True`, the model has been deployed successfully.

```sh
$ kubectl get workspace workspace-falcon-7b
NAME INSTANCE RESOURCEREADY INFERENCEREADY WORKSPACEREADY AGE
workspace-falcon-7b Standard_NC12s_v3 True True True 10m
```

Next, one can find the inference service's cluster ip and use a temporal `curl` pod to test the service endpoint in the cluster.
```

```sh
$ kubectl get svc workspace-falcon-7b
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
workspace-falcon-7b ClusterIP <CLUSTERIP> <none> 80/TCP,29500/TCP 10m
@@ -93,28 +98,33 @@ Kaito provides a limited capability to override preset configurations for models
To update parameters for a deployed model, perform `kubectl edit` against the workload, which could be either a `StatefulSet` or `Deployment`.
For example, to enable 4-bit quantization on a `falcon-7b-instruct` deployment, you would execute:

```
```sh
kubectl edit deployment workspace-falcon-7b-instruct
```

Within the deployment specification, locate and modify the command field.

#### Original
```

```sh
accelerate launch --num_processes 1 --num_machines 1 --machine_rank 0 --gpu_ids all inference_api.py --pipeline text-generation --torch_dtype bfloat16
```

#### Modify to enable 4-bit Quantization
```

```sh
accelerate launch --num_processes 1 --num_machines 1 --machine_rank 0 --gpu_ids all inference_api.py --pipeline text-generation --torch_dtype bfloat16 --load_in_4bit
```

Currently, we allow users to change the following paramenters manually:
Currently, we allow users to change the following paramenters manually:

- `pipeline`: For text-generation models this can be either `text-generation` or `conversational`.
- `load_in_4bit` or `load_in_8bit`: Model quantization resolution.

Should you need to customize other parameters, kindly file an issue for potential future inclusion.

### What is the difference between instruct and non-instruct models?

The main distinction lies in their intended use cases. Instruct models are fine-tuned versions optimized
for interactive chat applications. They are typically the preferred choice for most implementations due to their enhanced performance in
conversational contexts.
@@ -137,6 +147,7 @@ For more information see the [Code of Conduct FAQ](https://opensource.microsoft.
contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.

## Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft
trademarks or logos is subject to and must follow [Microsoft's Trademark & Brand Guidelines](https://www.microsoft.com/legal/intellectualproperty/trademarks/usage/general).
Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship.
12 changes: 9 additions & 3 deletions cmd/main.go
@@ -4,18 +4,21 @@ package main

import (
"flag"
"github.com/azure/kaito/pkg/k8sclient"
"os"
"strconv"
"time"

"github.com/azure/kaito/pkg/k8sclient"
"sigs.k8s.io/karpenter/pkg/apis/v1beta1"

"github.com/aws/karpenter-core/pkg/apis/v1alpha5"
"github.com/azure/kaito/pkg/controllers"
"github.com/azure/kaito/pkg/webhooks"
"k8s.io/klog/v2"
"knative.dev/pkg/injection/sharedmain"
"knative.dev/pkg/signals"
"knative.dev/pkg/webhook"
metricsserver "sigs.k8s.io/controller-runtime/pkg/metrics/server"

// Import all Kubernetes client auth plugins (e.g. Azure, GCP, OIDC, etc.)
// to ensure that exec-entrypoint and run can make use of them.
@@ -52,6 +55,7 @@ func init() {

utilruntime.Must(kaitov1alpha1.AddToScheme(scheme))
utilruntime.Must(v1alpha5.SchemeBuilder.AddToScheme(scheme))
utilruntime.Must(v1beta1.SchemeBuilder.AddToScheme(scheme))
//+kubebuilder:scaffold:scheme
klog.InitFlags(nil)
}
@@ -77,8 +81,10 @@ func main() {
ctrl.SetLogger(zap.New(zap.UseFlagOptions(&opts)))

mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
Scheme: scheme,
MetricsBindAddress: metricsAddr,
Scheme: scheme,
Metrics: metricsserver.Options{
BindAddress: ":" + metricsAddr,
},
HealthProbeBindAddress: probeAddr,
LeaderElection: enableLeaderElection,
LeaderElectionID: "ef60f9b0.io",
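
Note on the `Metrics` change above: the hunk does not show how `metricsAddr` is defined. If that value can already carry a leading colon (for example `:8080`), then `":" + metricsAddr` would yield `::8080`, which is not a valid `host:port` address. A minimal sketch of a guard, using only the Go standard library, might look like the following. The helper name `normalizeBindAddress` is hypothetical and not part of this commit.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// normalizeBindAddress is a hypothetical helper (not part of this commit).
// It accepts either a bare port ("8080") or a host:port form (":8080",
// "127.0.0.1:9090") and returns a string usable as a listener bind address.
func normalizeBindAddress(addr string) (string, error) {
	// A bare port number gets a leading colon, meaning "all interfaces".
	if port, err := strconv.Atoi(addr); err == nil {
		if port < 0 || port > 65535 {
			return "", fmt.Errorf("port %d out of range", port)
		}
		return ":" + strconv.Itoa(port), nil
	}
	// Anything else must already be a valid host:port pair (host may be empty).
	if _, _, err := net.SplitHostPort(addr); err != nil {
		return "", fmt.Errorf("invalid bind address %q: %w", addr, err)
	}
	return addr, nil
}

func main() {
	for _, in := range []string{"8080", ":8080", "127.0.0.1:9090", "::8080"} {
		out, err := normalizeBindAddress(in)
		fmt.Printf("%q -> %q (err: %v)\n", in, out, err)
	}
}
```

With a guard of this kind, the manager options could pass the normalized value to `BindAddress` regardless of which form the flag uses. Whether this is needed depends on the flag default, which is outside the lines shown here.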
155 changes: 80 additions & 75 deletions go.mod
@@ -4,21 +4,22 @@ go 1.22

require (
github.com/aws/karpenter-core v0.29.2
github.com/go-logr/logr v1.2.4
github.com/onsi/ginkgo/v2 v2.9.7
github.com/onsi/gomega v1.27.8
github.com/samber/lo v1.38.1
github.com/go-logr/logr v1.4.1
github.com/onsi/ginkgo/v2 v2.17.1
github.com/onsi/gomega v1.32.0
github.com/samber/lo v1.39.0
github.com/stretchr/testify v1.8.4
gopkg.in/yaml.v2 v2.4.0
gotest.tools v2.2.0+incompatible
k8s.io/api v0.27.7
k8s.io/apimachinery v0.27.7
k8s.io/client-go v0.27.7
k8s.io/klog/v2 v2.100.1
k8s.io/kubernetes v1.27.7
k8s.io/utils v0.0.0-20230406110748-d93618cff8a2
k8s.io/api v0.29.3
k8s.io/apimachinery v0.29.3
k8s.io/client-go v0.29.3
k8s.io/klog/v2 v2.120.1
k8s.io/kubernetes v1.29.3
k8s.io/utils v0.0.0-20240102154912-e7106e64919e
knative.dev/pkg v0.0.0-20230712131115-7051d301e7f4
sigs.k8s.io/controller-runtime v0.15.2
sigs.k8s.io/controller-runtime v0.17.2
sigs.k8s.io/karpenter v0.36.1
)

require (
@@ -30,106 +31,110 @@ require (
github.com/census-instrumentation/opencensus-proto v0.4.1 // indirect
github.com/cespare/xxhash/v2 v2.2.0 // indirect
github.com/davecgh/go-spew v1.1.1 // indirect
github.com/emicklei/go-restful/v3 v3.9.0 // indirect
github.com/emicklei/go-restful/v3 v3.11.0 // indirect
github.com/evanphx/json-patch v4.12.0+incompatible // indirect
github.com/evanphx/json-patch/v5 v5.6.0 // indirect
github.com/fsnotify/fsnotify v1.6.0 // indirect
github.com/evanphx/json-patch/v5 v5.8.0 // indirect
github.com/fsnotify/fsnotify v1.7.0 // indirect
github.com/go-kit/log v0.2.1 // indirect
github.com/go-logfmt/logfmt v0.5.1 // indirect
github.com/go-logr/zapr v1.2.4 // indirect
github.com/go-logr/zapr v1.3.0 // indirect
github.com/go-openapi/jsonpointer v0.19.6 // indirect
github.com/go-openapi/jsonreference v0.20.1 // indirect
github.com/go-openapi/jsonreference v0.20.2 // indirect
github.com/go-openapi/swag v0.22.3 // indirect
github.com/go-task/slim-sprig v0.0.0-20230315185526-52ccab3ef572 // indirect
github.com/gobuffalo/flect v0.2.4 // indirect
github.com/gogo/protobuf v1.3.2 // indirect
github.com/golang/groupcache v0.0.0-20210331224755-41bb18bfe9da // indirect
github.com/golang/protobuf v1.5.3 // indirect
github.com/google/gnostic v0.5.7-v3refs // indirect
github.com/google/go-cmp v0.5.9 // indirect
github.com/golang/protobuf v1.5.4 // indirect
github.com/google/gnostic-models v0.6.8 // indirect
github.com/google/go-cmp v0.6.0 // indirect
github.com/google/gofuzz v1.2.0 // indirect
github.com/google/pprof v0.0.0-20210720184732-4bb14d4b1be1 // indirect
github.com/google/uuid v1.3.0 // indirect
github.com/grpc-ecosystem/grpc-gateway/v2 v2.11.3 // indirect
github.com/grpc-ecosystem/grpc-gateway/v2 v2.16.0 // indirect
github.com/hashicorp/golang-lru v0.5.4 // indirect
github.com/imdario/mergo v0.3.16 // indirect
github.com/inconshreveable/mousetrap v1.0.1 // indirect
github.com/inconshreveable/mousetrap v1.1.0 // indirect
github.com/josharian/intern v1.0.0 // indirect
github.com/json-iterator/go v1.1.12 // indirect
github.com/kelseyhightower/envconfig v1.4.0 // indirect
github.com/mailru/easyjson v0.7.7 // indirect
github.com/matttproud/golang_protobuf_extensions v1.0.4 // indirect
github.com/mitchellh/hashstructure/v2 v2.0.2 // indirect
github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd // indirect
github.com/modern-go/reflect2 v1.0.2 // indirect
github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822 // indirect
github.com/pkg/errors v0.9.1 // indirect
github.com/pmezard/go-difflib v1.0.0 // indirect
github.com/prometheus/client_golang v1.15.1 // indirect
github.com/prometheus/client_model v0.4.0 // indirect
github.com/prometheus/common v0.42.0 // indirect
github.com/prometheus/procfs v0.9.0 // indirect
github.com/prometheus/client_golang v1.19.0 // indirect
github.com/prometheus/client_model v0.6.0 // indirect
github.com/prometheus/common v0.48.0 // indirect
github.com/prometheus/procfs v0.12.0 // indirect
github.com/prometheus/statsd_exporter v0.21.0 // indirect
github.com/spf13/cobra v1.6.0 // indirect
github.com/robfig/cron/v3 v3.0.1 // indirect
github.com/spf13/cobra v1.7.0 // indirect
github.com/spf13/pflag v1.0.5 // indirect
github.com/stretchr/objx v0.5.0 // indirect
go.opencensus.io v0.24.0 // indirect
go.uber.org/atomic v1.9.0 // indirect
go.uber.org/atomic v1.10.0 // indirect
go.uber.org/automaxprocs v1.4.0 // indirect
go.uber.org/multierr v1.11.0 // indirect
go.uber.org/zap v1.24.0 // indirect
golang.org/x/exp v0.0.0-20220303212507-bbda1eaf7a17 // indirect
golang.org/x/net v0.17.0 // indirect
golang.org/x/oauth2 v0.8.0 // indirect
golang.org/x/sync v0.2.0 // indirect
golang.org/x/sys v0.13.0 // indirect
golang.org/x/term v0.13.0 // indirect
golang.org/x/text v0.13.0 // indirect
golang.org/x/time v0.3.0 // indirect
golang.org/x/tools v0.9.1 // indirect
gomodules.xyz/jsonpatch/v2 v2.3.0 // indirect
google.golang.org/api v0.124.0 // indirect
go.uber.org/zap v1.27.0 // indirect
golang.org/x/exp v0.0.0-20220722155223-a9213eeb770e // indirect
golang.org/x/net v0.23.0 // indirect
golang.org/x/oauth2 v0.16.0 // indirect
golang.org/x/sync v0.6.0 // indirect
golang.org/x/sys v0.18.0 // indirect
golang.org/x/term v0.18.0 // indirect
golang.org/x/text v0.14.0 // indirect
golang.org/x/time v0.5.0 // indirect
golang.org/x/tools v0.17.0 // indirect
gomodules.xyz/jsonpatch/v2 v2.4.0 // indirect
google.golang.org/api v0.126.0 // indirect
google.golang.org/appengine v1.6.7 // indirect
google.golang.org/genproto v0.0.0-20230410155749-daa745c078e1 // indirect
google.golang.org/grpc v1.56.3 // indirect
google.golang.org/genproto/googleapis/api v0.0.0-20230726155614-23370e0ffb3e // indirect
google.golang.org/genproto/googleapis/rpc v0.0.0-20230822172742-b8732ec3820d // indirect
google.golang.org/grpc v1.58.3 // indirect
google.golang.org/protobuf v1.33.0 // indirect
gopkg.in/inf.v0 v0.9.1 // indirect
gopkg.in/yaml.v3 v3.0.1 // indirect
k8s.io/apiextensions-apiserver v0.27.2 // indirect
k8s.io/component-base v0.27.7 // indirect
k8s.io/kube-openapi v0.0.0-20230501164219-8b0f38b5fd1f // indirect
k8s.io/apiextensions-apiserver v0.29.3 // indirect
k8s.io/apiserver v0.29.3 // indirect
k8s.io/component-base v0.29.3 // indirect
k8s.io/kube-openapi v0.0.0-20231010175941-2dd684a91f00 // indirect
k8s.io/pod-security-admission v0.0.0 // indirect
sigs.k8s.io/json v0.0.0-20221116044647-bc3834ca7abd // indirect
sigs.k8s.io/structured-merge-diff/v4 v4.2.3 // indirect
sigs.k8s.io/yaml v1.3.0 // indirect
sigs.k8s.io/structured-merge-diff/v4 v4.4.1 // indirect
sigs.k8s.io/yaml v1.4.0 // indirect
)

replace (
k8s.io/api => k8s.io/api v0.27.7
k8s.io/apiextensions-apiserver => k8s.io/apiextensions-apiserver v0.27.7
k8s.io/apimachinery => k8s.io/apimachinery v0.27.7
k8s.io/cli-runtime => k8s.io/cli-runtime v0.27.7
k8s.io/client-go => k8s.io/client-go v0.27.7
k8s.io/cloud-provider => k8s.io/cloud-provider v0.27.7
k8s.io/cluster-bootstrap => k8s.io/cluster-bootstrap v0.27.7
k8s.io/code-generator => k8s.io/code-generator v0.27.7
k8s.io/component-base => k8s.io/component-base v0.27.7
k8s.io/component-helpers => k8s.io/component-helpers v0.27.7
k8s.io/controller-manager => k8s.io/controller-manager v0.27.7
k8s.io/cri-api => k8s.io/cri-api v0.27.7
k8s.io/csi-translation-lib => k8s.io/csi-translation-lib v0.27.7
k8s.io/dynamic-resource-allocation => k8s.io/dynamic-resource-allocation v0.27.7
k8s.io/kms => k8s.io/kms v0.27.7
k8s.io/kube-aggregator => k8s.io/kube-aggregator v0.27.7
k8s.io/kube-controller-manager => k8s.io/kube-controller-manager v0.27.7
k8s.io/kube-proxy => k8s.io/kube-proxy v0.27.7
k8s.io/kube-scheduler => k8s.io/kube-scheduler v0.27.7
k8s.io/kubectl => k8s.io/kubectl v0.27.7
k8s.io/kubelet => k8s.io/kubelet v0.27.7
k8s.io/legacy-cloud-providers => k8s.io/legacy-cloud-providers v0.27.7
k8s.io/metrics => k8s.io/metrics v0.27.7
k8s.io/mount-utils => k8s.io/mount-utils v0.27.7
k8s.io/pod-security-admission => k8s.io/pod-security-admission v0.27.7
k8s.io/sample-apiserver => k8s.io/sample-apiserver v0.27.7
k8s.io/sample-cli-plugin => k8s.io/sample-cli-plugin v0.27.7
k8s.io/sample-controller => k8s.io/sample-controller v0.27.7
k8s.io/api => k8s.io/api v0.29.3
k8s.io/apiextensions-apiserver => k8s.io/apiextensions-apiserver v0.29.3
k8s.io/apimachinery => k8s.io/apimachinery v0.29.3
k8s.io/cli-runtime => k8s.io/cli-runtime v0.29.3
k8s.io/client-go => k8s.io/client-go v0.29.3
k8s.io/cloud-provider => k8s.io/cloud-provider v0.29.3
k8s.io/cluster-bootstrap => k8s.io/cluster-bootstrap v0.29.3
k8s.io/code-generator => k8s.io/code-generator v0.29.3
k8s.io/component-base => k8s.io/component-base v0.29.3
k8s.io/component-helpers => k8s.io/component-helpers v0.29.3
k8s.io/controller-manager => k8s.io/controller-manager v0.29.3
k8s.io/cri-api => k8s.io/cri-api v0.29.3
k8s.io/csi-translation-lib => k8s.io/csi-translation-lib v0.29.3
k8s.io/dynamic-resource-allocation => k8s.io/dynamic-resource-allocation v0.29.3
k8s.io/endpointslice => k8s.io/endpointslice v0.29.3
k8s.io/kms => k8s.io/kms v0.29.3
k8s.io/kube-aggregator => k8s.io/kube-aggregator v0.29.3
k8s.io/kube-controller-manager => k8s.io/kube-controller-manager v0.29.3
k8s.io/kube-proxy => k8s.io/kube-proxy v0.29.3
k8s.io/kube-scheduler => k8s.io/kube-scheduler v0.29.3
k8s.io/kubectl => k8s.io/kubectl v0.29.3
k8s.io/kubelet => k8s.io/kubelet v0.29.3
k8s.io/legacy-cloud-providers => k8s.io/legacy-cloud-providers v0.29.3
k8s.io/metrics => k8s.io/metrics v0.29.3
k8s.io/mount-utils => k8s.io/mount-utils v0.29.3
k8s.io/pod-security-admission => k8s.io/pod-security-admission v0.29.3
k8s.io/sample-apiserver => k8s.io/sample-apiserver v0.29.3
k8s.io/sample-cli-plugin => k8s.io/sample-cli-plugin v0.29.3
k8s.io/sample-controller => k8s.io/sample-controller v0.29.3
)