# Hyper-Parameter Tuning with Kubeflow

Hyperparameter optimization (HPO), or tuning, chooses a set of optimal hyperparameters for a learning algorithm. Hyperparameters are parameters that control the learning process itself rather than being learned from the data. The chosen set yields a model that minimizes a predefined loss function on given held-out data.

There are many approaches to HPO:
- grid search
- random search
- Bayesian optimization
- gradient-based optimization
- evolutionary optimization
- population-based training
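
To make the first two approaches concrete, here is a minimal sketch that runs grid search and random search against a toy objective. The `toy_loss` function, its optimum, and the candidate values are assumptions made up for illustration; a real objective would train and evaluate a model.

```python
import itertools
import random


def toy_loss(lr, num_layers):
    """Hypothetical validation loss with its minimum at lr=0.02, num_layers=3."""
    return (lr - 0.02) ** 2 * 1e4 + (num_layers - 3) ** 2


def grid_search(lr_values, layer_values):
    """Evaluate every combination and return the best (loss, lr, num_layers)."""
    candidates = itertools.product(lr_values, layer_values)
    return min((toy_loss(lr, n), lr, n) for lr, n in candidates)


def random_search(lr_range, layer_range, trials, seed=0):
    """Draw `trials` independent samples and return the best (loss, lr, num_layers)."""
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        lr = rng.uniform(*lr_range)      # continuous value inside the range
        n = rng.randint(*layer_range)    # integer, inclusive on both ends
        candidate = (toy_loss(lr, n), lr, n)
        if best is None or candidate < best:
            best = candidate
    return best


grid_best = grid_search([0.01, 0.02, 0.03], [2, 3, 4, 5])
random_best = random_search((0.01, 0.03), (2, 5), trials=12)
print("grid  :", grid_best)
print("random:", random_best)
```

Grid search only ever sees the values it was given, so its cost grows with the product of the list lengths. Random search spends a fixed trial budget and can land between grid points.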


# Katib

The [Katib](https://github.com/kubeflow/katib) project is inspired by the [Google Vizier Paper](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/46180.pdf). 

Katib is a scalable and flexible hyperparameter tuning framework and is tightly integrated with Kubernetes. It does not depend on any specific deep learning framework (such as TensorFlow, MXNet, or PyTorch).

Here are some notes on Katib:
* Optimizes a given objective metric such as validation accuracy 
* Supports Int, Double, Discrete, and Categorical parameter ranges
* Option for early stopping

# Hyper-Parameter Tuning Examples 

With Kubeflow Katib, we will run popular hyper-parameter tuning algorithms including `random search`, `grid search`, and `bayesian optimization`.

# Random Search

In [1]:
!pygmentize ./hyper-parameter-tuning/random-search-example.yaml

apiVersion: "kubeflow.org/v1alpha3"
kind: Experiment
metadata:
  labels:
    controller-tools.k8s.io: "1.0"
  name: random-example
spec:
  objective:
    type: maximize
    goal: 0.99
    objectiveMetricName: Validation-accuracy
    additionalMetricNames:
      - accuracy
  algorithm:
    algorithmName: random
  parallelTrialCount: 3
  maxTrialCount: 3
  maxFailedTrialCount: 3
  parameters:
    - name: --lr
      parameterType: double
      feasibleSpace:
        min: "0.01"
 

In [2]:
!kubectl create -f ./hyper-parameter-tuning/random-search-example.yaml

experiment.kubeflow.org/random-example created


If you check the manifest, you will see the following parameter definitions:


```yaml
parameters:
- name: --lr
  parameterType: double
  feasibleSpace:
    min: "0.01"
    max: "0.03"
- name: --num-layers
  parameterType: int
  feasibleSpace:
    min: "2"
    max: "5"
- name: --optimizer
  parameterType: categorical
  feasibleSpace:
    list:
    - sgd
    - adam
    - ftrl
```


This experiment tunes three hyperparameters. The type and range of each are listed below.

* `--lr` (learning rate) - type: double, range: 0.01 to 0.03
* `--num-layers` (number of neural network layers) - type: int, range: 2 to 5
* `--optimizer` (optimizer) - type: categorical, values: sgd, adam, ftrl
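
The parameter names start with `--` because they look like command-line flags. The sketch below shows how a training script might receive them. It assumes each trial passes the sampled values as flags named exactly as in the manifest, and that metrics are reported as `name=value` lines on standard output. Both are assumptions for illustration, and `parse_hyperparameters` and `format_metric` are hypothetical helper names.

```python
import argparse


def parse_hyperparameters(argv):
    """Parse the three tuned flags; types mirror the manifest's parameterType values."""
    parser = argparse.ArgumentParser(description="Hypothetical training entry point")
    parser.add_argument("--lr", type=float, required=True)        # double
    parser.add_argument("--num-layers", type=int, required=True)  # int
    parser.add_argument("--optimizer", required=True,
                        choices=["sgd", "adam", "ftrl"])          # categorical
    return parser.parse_args(argv)


def format_metric(name, value):
    """Render a metric as a `name=value` line (assumed collector format)."""
    return f"{name}={value:.6f}"


# Example invocation with hand-picked values inside the feasible space.
args = parse_hyperparameters(["--lr=0.02", "--num-layers=3", "--optimizer=adam"])
print(args.lr, args.num_layers, args.optimizer)
print(format_metric("Validation-accuracy", 0.9))
```

Note that `argparse` exposes `--num-layers` as `args.num_layers`, and that the `choices` list rejects any optimizer outside the categorical set.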

The demo starts an experiment and runs three trials with different parameter values. Run the following command to check the experiment status.

When the last entry under `Status.Conditions` in the output has type `Succeeded`, the experiment is finished.


In [3]:
!kubectl describe experiment random-example

Name:         random-example
Namespace:    anonymous
Labels:       controller-tools.k8s.io=1.0
Annotations:  <none>
API Version:  kubeflow.org/v1alpha3
Kind:         Experiment
Metadata:
  Creation Timestamp:  2020-09-26T22:57:39Z
  Finalizers:
    update-prometheus-metrics
  Generation:        2
  Resource Version:  37836
  Self Link:         /apis/kubeflow.org/v1alpha3/namespaces/anonymous/experiments/random-example
  UID:               7b73470e-abf9-41a6-9042-d87f4aaa3c0f
Spec:
  Algorithm:
    Algorithm Name:        random
    Algorithm Settings:    <nil>
  Max Failed Trial Count:  3
  Max Trial Count:         3
  Metrics Collector Spec:
    Collector:
      Kind:  StdOut
  Objective:
    Additional Metric Names:
      accuracy
    Goal:                   0.99
    Objective Metric Name:  Validation-accuracy
    Type:                   maximize
  Parallel Trial Count:     3
  Parameters:
    Feasible Space:
      Max:           0.03
      Min:       
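
The same status check can be scripted instead of read by eye. This is a minimal sketch that assumes `kubectl get experiment <name> -o json` returns an object with a `status.conditions` list, and that the final condition type of a finished experiment is `Succeeded`; verify both against your Katib version. The `sample` dictionary is hand-written for illustration and is not output from a real cluster.

```python
import json
import subprocess


def latest_condition(experiment):
    """Return the type of the most recent condition, or None if there is none yet."""
    conditions = experiment.get("status", {}).get("conditions", [])
    return conditions[-1]["type"] if conditions else None


def fetch_experiment(name, namespace):
    """Call kubectl and parse the experiment object; requires cluster access."""
    result = subprocess.run(
        ["kubectl", "get", "experiment", name, "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


# Hypothetical status block used in place of a live cluster call.
sample = {"status": {"conditions": [
    {"type": "Created", "status": "True"},
    {"type": "Running", "status": "False"},
    {"type": "Succeeded", "status": "True"},
]}}
print(latest_condition(sample))  # prints: Succeeded
```

Against a live cluster, `latest_condition(fetch_experiment("random-example", "anonymous"))` would perform the same check using the experiment name and namespace shown in the output above.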

# Navigate to Katib to Monitor the Hyper-Parameter Tuning Jobs

You can monitor your results in the Katib UI. If you installed Kubeflow using the deployment guide, you can access the Katib UI at `https://<your kubeflow endpoint>/katib/`

For `random-experiment`, go to `HP (HyperParameter)` -> `Monitor` -> `random-experiment`.

![katib-experiment-selection.png](./img/katib-experiment-selection.png)

### Pick the best parameters from the results

Once you click the experiment and open its detail page, you will see each combination of parameters together with the accuracy it achieved.


| trialName | Validation-accuracy | accuracy | --lr | --num-layers | --optimizer |
|----------------------------|----------|----------|----------------------|---|------|
| random-experiment-rfwwbnsd | 0.974920 | 0.984844 | 0.013831565266960293 | 4 | sgd  |
| random-experiment-vxgwlgqq | 0.113854 | 0.116646 | 0.024225789898529138 | 4 | ftrl |
| random-experiment-wclrwlcq | 0.979697 | 0.998437 | 0.021916171239020756 | 4 | sgd  |
| random-experiment-7lsc4pwb | 0.113854 | 0.115312 | 0.024163810384272653 | 5 | ftrl |
| random-experiment-86vv9vgv | 0.963475 | 0.971562 | 0.02943228249244735  | 3 | adam |
| random-experiment-jh884cxz | 0.981091 | 0.999219 | 0.022372025623908262 | 2 | sgd  |
| random-experiment-sgtwhrgz | 0.980693 | 0.997969 | 0.016641686851083654 | 4 | sgd  |
| random-experiment-c6vvz6dv | 0.980792 | 0.998906 | 0.0264125850165842   | 3 | sgd  |
| random-experiment-vqs2xmfj | 0.113854 | 0.105313 | 0.026629394628228185 | 4 | ftrl |
| random-experiment-bv8lsh2m | 0.980195 | 0.999375 | 0.021769570793012488 | 2 | sgd  |
| random-experiment-7vbnqc7z | 0.113854 | 0.102188 | 0.025079750575740783 | 4 | ftrl |
| random-experiment-kwj9drmg | 0.979498 | 0.995469 | 0.014985919312945063 | 4 | sgd  |
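
Picking the best trial from a table like this is a one-line `max` over the objective metric. The sketch below copies the rows from the table above into a list and selects the trial with the highest `Validation-accuracy`. The `Trial` tuple and its field names are hypothetical conveniences, not a Katib data structure.

```python
from collections import namedtuple

Trial = namedtuple("Trial", "name val_accuracy accuracy lr num_layers optimizer")

# Rows copied from the results table.
trials = [
    Trial("random-experiment-rfwwbnsd", 0.974920, 0.984844, 0.013831565266960293, 4, "sgd"),
    Trial("random-experiment-vxgwlgqq", 0.113854, 0.116646, 0.024225789898529138, 4, "ftrl"),
    Trial("random-experiment-wclrwlcq", 0.979697, 0.998437, 0.021916171239020756, 4, "sgd"),
    Trial("random-experiment-7lsc4pwb", 0.113854, 0.115312, 0.024163810384272653, 5, "ftrl"),
    Trial("random-experiment-86vv9vgv", 0.963475, 0.971562, 0.02943228249244735, 3, "adam"),
    Trial("random-experiment-jh884cxz", 0.981091, 0.999219, 0.022372025623908262, 2, "sgd"),
    Trial("random-experiment-sgtwhrgz", 0.980693, 0.997969, 0.016641686851083654, 4, "sgd"),
    Trial("random-experiment-c6vvz6dv", 0.980792, 0.998906, 0.0264125850165842, 3, "sgd"),
    Trial("random-experiment-vqs2xmfj", 0.113854, 0.105313, 0.026629394628228185, 4, "ftrl"),
    Trial("random-experiment-bv8lsh2m", 0.980195, 0.999375, 0.021769570793012488, 2, "sgd"),
    Trial("random-experiment-7vbnqc7z", 0.113854, 0.102188, 0.025079750575740783, 4, "ftrl"),
    Trial("random-experiment-kwj9drmg", 0.979498, 0.995469, 0.014985919312945063, 4, "sgd"),
]

# The objective type is "maximize", so the best trial has the highest value.
best = max(trials, key=lambda t: t.val_accuracy)

# Best validation accuracy seen for each optimizer.
best_by_optimizer = {}
for t in trials:
    current = best_by_optimizer.get(t.optimizer, float("-inf"))
    best_by_optimizer[t.optimizer] = max(current, t.val_accuracy)

print(best.name, best.val_accuracy, best.lr, best.num_layers, best.optimizer)
print(best_by_optimizer)
```

In this table the best trial uses `sgd` with two layers, and every `ftrl` trial stays at 0.113854, which suggests the optimizer choice matters more here than the exact learning rate.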


![katib-experiment-result.png](./img/katib-experiment-result.png)

You can also click a trial name to view that trial's data.

> Note: The remaining examples use different optimization algorithms.  
> You submit each job and check its lifecycle the same way as in the random search example above.

# Grid Search

In [4]:
!pygmentize ./hyper-parameter-tuning/grid-example.yaml

apiVersion: "kubeflow.org/v1alpha3"
kind: Experiment
metadata:
  labels:
    controller-tools.k8s.io: "1.0"
  name: grid-example
spec:
  objective:
    type: maximize
    goal: 0.99
    objectiveMetricName: Validation-accuracy
    additionalMetricNames:
      - accuracy
  algorithm:
    algorithmName: grid
  parallelTrialCount: 3
  maxTrialCount: 3
  maxFailedTrialCount: 3
  parameters:
    - name: --lr
      parameterType: double
      feasibleSpace:
        min: "0.001"
    

In [5]:
!kubectl create -f ./hyper-parameter-tuning/grid-example.yaml

experiment.kubeflow.org/grid-example created


In [6]:
!kubectl describe experiment grid-example

Name:         grid-example
Namespace:    anonymous
Labels:       controller-tools.k8s.io=1.0
Annotations:  <none>
API Version:  kubeflow.org/v1alpha3
Kind:         Experiment
Metadata:
  Creation Timestamp:  2020-09-26T22:57:42Z
  Finalizers:
    update-prometheus-metrics
  Generation:        2
  Resource Version:  37877
  Self Link:         /apis/kubeflow.org/v1alpha3/namespaces/anonymous/experiments/grid-example
  UID:               72b65926-e1a7-4491-bc69-2933eb94cb58
Spec:
  Algorithm:
    Algorithm Name:        grid
    Algorithm Settings:    <nil>
  Max Failed Trial Count:  3
  Max Trial Count:         3
  Metrics Collector Spec:
    Collector:
      Kind:  StdOut
  Objective:
    Additional Metric Names:
      accuracy
    Goal:                   0.99
    Objective Metric Name:  Validation-accuracy
    Type:                   maximize
  Parallel Trial Count:     3
  Parameters:
    Feasible Space:
      Max:           0.01
      Min:           0.

# Bayesian

BayesOpt: A toolbox for Bayesian optimization, experimental design, and stochastic bandits.

In [7]:
!pygmentize ./hyper-parameter-tuning/bayesopt-example.yaml

apiVersion: "kubeflow.org/v1alpha3"
kind: Experiment
metadata:
  labels:
    controller-tools.k8s.io: "1.0"
  name: bayesopt-example
spec:
  objective:
    type: maximize
    goal: 0.99
    objectiveMetricName: Validation-accuracy
    additionalMetricNames:
      - accuracy
  algorithm:
    algorithmName: bayesianoptimization
    algorithmSettings:
      - name: "random_state"
        value: "10"
  parallelTrialCount: 3
  maxTrialCount: 3
  maxFailedTrialCount: 3

In [8]:
!kubectl create -f ./hyper-parameter-tuning/bayesopt-example.yaml

experiment.kubeflow.org/bayesopt-example created


In [9]:
!kubectl describe experiment bayesopt-example

Name:         bayesopt-example
Namespace:    anonymous
Labels:       controller-tools.k8s.io=1.0
Annotations:  <none>
API Version:  kubeflow.org/v1alpha3
Kind:         Experiment
Metadata:
  Creation Timestamp:  2020-09-26T22:57:46Z
  Finalizers:
    update-prometheus-metrics
  Generation:        1
  Resource Version:  37925
  Self Link:         /apis/kubeflow.org/v1alpha3/namespaces/anonymous/experiments/bayesopt-example
  UID:               98d99397-b10d-4b49-8bb9-703f13d8d284
Spec:
  Algorithm:
    Algorithm Name:  bayesianoptimization
    Algorithm Settings:
      Name:                random_state
      Value:               10
  Max Failed Trial Count:  3
  Max Trial Count:         3
  Metrics Collector Spec:
    Collector:
      Kind:  StdOut
  Objective:
    Additional Metric Names:
      accuracy
    Goal:                   0.99
    Objective Metric Name:  Validation-accuracy
    Type:                   maximize
  Parallel Trial Count:     3
  Par

# Navigate to Katib to Monitor All Hyper-Parameter Tuning Jobs
![katib-experiment-selection.png](./img/katib-experiment-selection.png)