Add Katib Bundle for Juju (kubeflow#1403)
* Add Katib Bundle for Juju

Adds Python operators for Katib, corresponding to the latest Katib manifests.

Adds an `operators` folder with an OWNERS file to hold the operators.

* Fixing code review items

* Update README.md

Update README.md

* Dedent OWNERS file

* Update README.md

Co-authored-by: Rui Vasconcelos <rui.vasconcelos.mail@gmail.com>
knkski and rui-vas committed Dec 3, 2020
1 parent b2e01e5 commit 96b81e8
Showing 22 changed files with 941 additions and 0 deletions.
3 changes: 3 additions & 0 deletions operators/OWNERS
@@ -0,0 +1,3 @@
approvers:
- knkski
- rfmvasconcelos
33 changes: 33 additions & 0 deletions operators/README.md
@@ -0,0 +1,33 @@
## Katib Operators

### Overview
This bundle contains the Kubernetes Python operators (a.k.a. charms) for Katib
(see [CharmHub](https://charmhub.io/?q=katib)).

The Katib operators are Python scripts that wrap the latest released [Katib manifests][manifests]
and provide lifecycle management for each application by handling events (install, upgrade,
integrate, remove).

[manifests]: https://github.com/kubeflow/katib/tree/master/manifests

## Install

### Install applications

To install Katib, run:

    juju deploy katib

You can also install each application individually, like this:

    juju deploy <application>

where `<application>` is one of `katib-controller`, `katib-ui`, or `katib-db-manager`.

**Note**: By default, when you `juju deploy` an application or the full Katib
bundle, you will deploy the latest pushed commit of Katib, even if unreleased updates are
already available in the Kubeflow manifests. If you would like to try the latest
available charm, run:

    juju deploy <application> --channel=edge
8 changes: 8 additions & 0 deletions operators/bundle.yaml
@@ -0,0 +1,8 @@
bundle: kubernetes
applications:
  katib-controller: { charm: katib-controller, scale: 1, annotations: { gui-x: '0', gui-y: '0' } }
  katib-db: { charm: cs:~charmed-osm/mariadb-k8s, scale: 1, annotations: { gui-x: '0', gui-y: '300' }, options: { database: katib } }
  katib-db-manager: { charm: katib-db-manager, scale: 1, annotations: { gui-x: '300', gui-y: '0' } }
  katib-ui: { charm: katib-ui, scale: 1, annotations: { gui-x: '300', gui-y: '300' } }
relations:
  - [katib-db-manager, katib-db]
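One property worth checking in a bundle like this is that every relation endpoint names an application declared under `applications`. The sketch below mirrors the bundle as a plain Python dict (an assumption made to avoid a YAML dependency); `undeclared_endpoints` is a hypothetical helper, not part of Juju.

```python
# Hypothetical mirror of bundle.yaml as a dict; annotations and options omitted.
BUNDLE = {
    "applications": {
        "katib-controller": {"charm": "katib-controller", "scale": 1},
        "katib-db": {"charm": "cs:~charmed-osm/mariadb-k8s", "scale": 1},
        "katib-db-manager": {"charm": "katib-db-manager", "scale": 1},
        "katib-ui": {"charm": "katib-ui", "scale": 1},
    },
    "relations": [["katib-db-manager", "katib-db"]],
}


def undeclared_endpoints(bundle):
    """Return relation endpoints that do not name a declared application."""
    declared = set(bundle["applications"])
    missing = []
    for relation in bundle["relations"]:
        for endpoint in relation:
            # Endpoints may be written as "app:interface"; keep only the app part.
            app = endpoint.split(":", 1)[0]
            if app not in declared:
                missing.append(endpoint)
    return missing


print(undeclared_endpoints(BUNDLE))
```

An empty list means every endpoint resolves to a declared application.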
9 changes: 9 additions & 0 deletions operators/katib-controller/config.yaml
@@ -0,0 +1,9 @@
options:
  webhook-port:
    type: int
    default: 443
    description: Webhook port
  metrics-port:
    type: int
    default: 8080
    description: Metrics port
95 changes: 95 additions & 0 deletions operators/katib-controller/files/crds.yaml
@@ -0,0 +1,95 @@
---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: experiments.kubeflow.org
spec:
  additionalPrinterColumns:
    - JSONPath: .status.conditions[-1:].type
      name: Type
      type: string
    - JSONPath: .status.conditions[-1:].status
      name: Status
      type: string
    - JSONPath: .metadata.creationTimestamp
      name: Age
      type: date
  group: kubeflow.org
  version: v1beta1
  scope: Namespaced
  subresources:
    status: {}
  names:
    kind: Experiment
    singular: experiment
    plural: experiments
    categories:
      - all
      - kubeflow
      - katib

---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: suggestions.kubeflow.org
spec:
  additionalPrinterColumns:
    - JSONPath: .status.conditions[-1:].type
      name: Type
      type: string
    - JSONPath: .status.conditions[-1:].status
      name: Status
      type: string
    - JSONPath: .spec.requests
      name: Requested
      type: string
    - JSONPath: .status.suggestionCount
      name: Assigned
      type: string
    - JSONPath: .metadata.creationTimestamp
      name: Age
      type: date
  group: kubeflow.org
  version: v1beta1
  scope: Namespaced
  subresources:
    status: {}
  names:
    kind: Suggestion
    singular: suggestion
    plural: suggestions
    categories:
      - all
      - kubeflow
      - katib

---
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: trials.kubeflow.org
spec:
  additionalPrinterColumns:
    - JSONPath: .status.conditions[-1:].type
      name: Type
      type: string
    - JSONPath: .status.conditions[-1:].status
      name: Status
      type: string
    - JSONPath: .metadata.creationTimestamp
      name: Age
      type: date
  group: kubeflow.org
  version: v1beta1
  scope: Namespaced
  subresources:
    status: {}
  names:
    kind: Trial
    singular: trial
    plural: trials
    categories:
      - all
      - kubeflow
      - katib
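The `Type` and `Status` printer columns in all three CRDs read `.status.conditions[-1:]`, that is, the last entry of the conditions list. As a rough illustration of what those columns display, the sketch below applies the same slice in Python. The `experiment` object and the `latest_condition` helper are made up for this example, and the condition names are assumptions rather than values taken from this commit.

```python
def latest_condition(resource):
    """Return (type, status) of the last entry in .status.conditions, or None."""
    conditions = resource.get("status", {}).get("conditions", [])
    last = conditions[-1:]  # same slice as the JSONPath: the last element, if any
    if not last:
        return None
    return last[0]["type"], last[0]["status"]


# Hypothetical Experiment object, trimmed to the fields the columns read.
experiment = {
    "metadata": {"name": "example"},
    "status": {
        "conditions": [
            {"type": "Created", "status": "True"},
            {"type": "Running", "status": "True"},
        ]
    },
}

print(latest_condition(experiment))
```

A resource with no conditions yet yields `None` here, which corresponds to an empty column.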
16 changes: 16 additions & 0 deletions operators/katib-controller/files/defaultTrialTemplate.yaml
@@ -0,0 +1,16 @@
apiVersion: batch/v1
kind: Job
spec:
  template:
    spec:
      containers:
        - name: training-container
          image: docker.io/kubeflowkatib/mxnet-mnist:v1beta1-e294a90
          command:
            - "python3"
            - "/opt/mxnet-mnist/mnist.py"
            - "--batch-size=64"
            - "--lr=${trialParameters.learningRate}"
            - "--num-layers=${trialParameters.numberLayers}"
            - "--optimizer=${trialParameters.optimizer}"
      restartPolicy: Never
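The `${trialParameters.<name>}` placeholders in this template are meant to be replaced with concrete values for each trial. The sketch below illustrates that kind of substitution on the `command` list. It is not Katib's implementation: `render_args` is a hypothetical helper and the parameter values are invented.

```python
import re

# Matches ${trialParameters.<name>} and captures <name>.
PLACEHOLDER = re.compile(r"\$\{trialParameters\.([A-Za-z0-9_]+)\}")


def render_args(args, trial_parameters):
    """Replace each ${trialParameters.<name>} in args with its value."""

    def substitute(match):
        name = match.group(1)
        if name not in trial_parameters:
            raise KeyError(f"no value for trial parameter {name!r}")
        return str(trial_parameters[name])

    return [PLACEHOLDER.sub(substitute, arg) for arg in args]


command = [
    "python3",
    "/opt/mxnet-mnist/mnist.py",
    "--batch-size=64",
    "--lr=${trialParameters.learningRate}",
    "--num-layers=${trialParameters.numberLayers}",
    "--optimizer=${trialParameters.optimizer}",
]

# Invented values for illustration only.
values = {"learningRate": 0.01, "numberLayers": 3, "optimizer": "sgd"}

print(render_args(command, values))
```

Raising on a missing parameter, rather than leaving the placeholder in place, makes a misnamed parameter fail before the training container starts.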
6 changes: 6 additions & 0 deletions operators/katib-controller/files/early-stopping.json
@@ -0,0 +1,6 @@
{
  "medianstop": {
    "image": "docker.io/kubeflowkatib/earlystopping-medianstop",
    "imagePullPolicy": "Always"
  }
}
16 changes: 16 additions & 0 deletions operators/katib-controller/files/enasCPUTemplate.yaml
@@ -0,0 +1,16 @@
apiVersion: batch/v1
kind: Job
spec:
  template:
    spec:
      containers:
        - name: training-container
          image: docker.io/kubeflowkatib/enas-cnn-cifar10-cpu:v1beta1-e294a90
          command:
            - python3
            - -u
            - RunTrial.py
            - --num_epochs=1
            - "--architecture=\"${trialParameters.neuralNetworkArchitecture}\""
            - "--nn_config=\"${trialParameters.neuralNetworkConfig}\""
      restartPolicy: Never
16 changes: 16 additions & 0 deletions operators/katib-controller/files/metrics-collector-sidecar.json
@@ -0,0 +1,16 @@
{
  "StdOut": {
    "image": "docker.io/kubeflowkatib/file-metrics-collector"
  },
  "File": {
    "image": "docker.io/kubeflowkatib/file-metrics-collector"
  },
  "TensorFlowEvent": {
    "image": "docker.io/kubeflowkatib/tfevent-metrics-collector",
    "resources": {
      "limits": {
        "memory": "1Gi"
      }
    }
  }
}
32 changes: 32 additions & 0 deletions operators/katib-controller/files/pytorchJobTemplate.yaml
@@ -0,0 +1,32 @@
apiVersion: "kubeflow.org/v1"
kind: PyTorchJob
spec:
  pytorchReplicaSpecs:
    Master:
      replicas: 1
      restartPolicy: OnFailure
      template:
        spec:
          containers:
            - name: pytorch
              image: gcr.io/kubeflow-ci/pytorch-dist-mnist-test:v1.0
              imagePullPolicy: Always
              command:
                - "python"
                - "/var/mnist.py"
                - "--lr=${trialParameters.learningRate}"
                - "--momentum=${trialParameters.momentum}"
    Worker:
      replicas: 2
      restartPolicy: OnFailure
      template:
        spec:
          containers:
            - name: pytorch
              image: gcr.io/kubeflow-ci/pytorch-dist-mnist-test:v1.0
              imagePullPolicy: Always
              command:
                - "python"
                - "/var/mnist.py"
                - "--lr=${trialParameters.learningRate}"
                - "--momentum=${trialParameters.momentum}"
32 changes: 32 additions & 0 deletions operators/katib-controller/files/suggestion.json
@@ -0,0 +1,32 @@
{
  "random": {
    "image": "docker.io/kubeflowkatib/suggestion-hyperopt"
  },
  "grid": {
    "image": "docker.io/kubeflowkatib/suggestion-chocolate"
  },
  "hyperband": {
    "image": "docker.io/kubeflowkatib/suggestion-hyperband"
  },
  "bayesianoptimization": {
    "image": "docker.io/kubeflowkatib/suggestion-skopt"
  },
  "tpe": {
    "image": "docker.io/kubeflowkatib/suggestion-hyperopt"
  },
  "enas": {
    "image": "docker.io/kubeflowkatib/suggestion-enas",
    "imagePullPolicy": "Always",
    "resources": {
      "limits": {
        "memory": "200Mi"
      }
    }
  },
  "cmaes": {
    "image": "docker.io/kubeflowkatib/suggestion-goptuna"
  },
  "darts": {
    "image": "docker.io/kubeflowkatib/suggestion-darts"
  }
}
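This file maps each algorithm name to the container image that serves it, and some algorithms share an image (`random` and `tpe` both point at `suggestion-hyperopt`). A minimal sketch of a lookup over that mapping follows. It embeds only a subset of the entries above, and `suggestion_image` is a hypothetical helper rather than anything the controller exposes.

```python
import json

# Subset of suggestion.json, embedded so the example is self-contained.
SUGGESTION_JSON = """
{
  "random": {"image": "docker.io/kubeflowkatib/suggestion-hyperopt"},
  "tpe": {"image": "docker.io/kubeflowkatib/suggestion-hyperopt"},
  "enas": {
    "image": "docker.io/kubeflowkatib/suggestion-enas",
    "imagePullPolicy": "Always",
    "resources": {"limits": {"memory": "200Mi"}}
  }
}
"""


def suggestion_image(config, algorithm):
    """Return the image configured for an algorithm, or raise ValueError."""
    try:
        return config[algorithm]["image"]
    except KeyError:
        raise ValueError(f"unsupported algorithm: {algorithm}") from None


config = json.loads(SUGGESTION_JSON)
print(suggestion_image(config, "tpe"))
print(config["enas"]["resources"]["limits"]["memory"])
```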
6 changes: 6 additions & 0 deletions operators/katib-controller/layer.yaml
@@ -0,0 +1,6 @@
repo: https://github.com/juju-solutions/bundle-kubeflow.git
includes:
- "layer:caas-base"
- "layer:status"
- "layer:docker-resource"
- "interface:http"
22 changes: 22 additions & 0 deletions operators/katib-controller/metadata.yaml
@@ -0,0 +1,22 @@
name: katib-controller
display-name: Katib Controller
summary: A Kubernetes-native project for automated machine learning (AutoML)
description: |
  Katib supports Hyperparameter Tuning, Early Stopping and Neural Architecture Search.
  Katib is the project which is agnostic to machine learning (ML) frameworks. It can tune
  hyperparameters of applications written in any language of the users’ choice and natively
  supports many ML frameworks, such as TensorFlow, MXNet, PyTorch, XGBoost, and others.
tags: [ai, bigdata, katib, kubeflow, machine-learning, hyperparameter]
maintainers: [Kenneth Koski <kenneth.koski@canonical.com>]
series: [kubernetes]
resources:
  oci-image:
    type: oci-image
    description: Backing OCI image
    auto-fetch: true
    upstream-source: docker.io/kubeflowkatib/katib-controller:v1beta1-a96ff59
provides:
  katib-controller:
    interface: http
min-juju-version: 2.8.6
