# Hyperparameter Tuning

Hyperparameter optimization (HPO), or tuning, chooses a set of optimal hyperparameters for a learning algorithm. Hyperparameters are the parameters that control the learning process itself. The chosen set yields a model that minimizes a predefined loss function on given held-out data.

There are many approaches to HPO:
- grid search
- random search
- Bayesian optimization
- gradient-based optimization
- evolutionary optimization
- population-based training
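
To make the first two approaches concrete, here is a minimal sketch of grid search and random search in plain Python. The search space and the `toy_loss` function are hypothetical stand-ins, invented only for illustration; in practice the loss would come from training a model and evaluating it on held-out data.

```python
import itertools
import random

# Hypothetical search space, chosen only for illustration.
SEARCH_SPACE = {
    "lr": [0.01, 0.02, 0.03],
    "optimizer": ["sgd", "adam"],
}


def grid_search(space):
    """Yield every combination of the listed values."""
    names = list(space)
    for values in itertools.product(*(space[name] for name in names)):
        yield dict(zip(names, values))


def random_search(space, n_trials, seed=0):
    """Yield n_trials combinations drawn uniformly from the listed values."""
    rng = random.Random(seed)
    for _ in range(n_trials):
        yield {name: rng.choice(values) for name, values in space.items()}


def toy_loss(params):
    # Stand-in for "train a model, then measure its loss on held-out data".
    penalty = 0.0 if params["optimizer"] == "adam" else 0.1
    return abs(params["lr"] - 0.02) + penalty


# Grid search evaluates all 6 combinations; random search evaluates only 4.
best_grid = min(grid_search(SEARCH_SPACE), key=toy_loss)
best_random = min(random_search(SEARCH_SPACE, n_trials=4), key=toy_loss)
print("grid search best:  ", best_grid)
print("random search best:", best_random)
```

Grid search is exhaustive, so its cost grows with the product of the value counts, while random search has a fixed trial budget and may miss the best combination.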


## Katib

The [Katib](https://github.com/kubeflow/katib) project is inspired by Google Vizier. Katib is a scalable and flexible hyperparameter tuning framework that is tightly integrated with Kubernetes. It does not depend on any specific deep learning framework (such as TensorFlow, MXNet, or PyTorch).


* Two API versions are available: v1alpha1 and v1alpha2.
* Across both versions, Katib supports random search, grid search, Hyperband, Bayesian optimization, and neural architecture search (NAS).
* One metric serves as the objective metric and can have a goal value for optimization. Multi-metric optimization is not supported.
* Parameter ranges can be int, double, discrete, or categorical. There is no option for a scaling type (unless that is what `step` means).
* Early stopping is available as an option, but it is not part of the StudyJob (HPO) job request parameters.


In the current Kubeflow version, Katib is installed by default.

### Examples 

We will run three Katib experiments, using a regular Kubernetes `Job`, a `TFJob`, and a `PyTorchJob`.

### Random algorithm

In [None]:
!kubectl create -f https://raw.githubusercontent.com/kubeflow/katib/master/examples/v1alpha2/random-example.yaml

If you download the file and inspect the manifest, you will see the following parameter definitions:


```yaml
parameters:
- name: --lr
  parameterType: double
  feasibleSpace:
    min: "0.01"
    max: "0.03"
- name: --num-layers
  parameterType: int
  feasibleSpace:
    min: "2"
    max: "5"
- name: --optimizer
  parameterType: categorical
  feasibleSpace:
    list:
    - sgd
    - adam
    - ftrl
```


This experiment tunes three hyperparameters. The manifest lists the type and range of each one:

* `--lr` (learning rate) - type: double
* `--num-layers` (number of neural network layers) - type: int
* `--optimizer` (optimizer) - type: categorical
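
As a rough illustration of what a random search over this feasible space does, the sketch below draws three parameter assignments from the ranges in the manifest. This is not Katib's implementation: the `sample_trial` helper is hypothetical, and treating the `int` upper bound as inclusive is an assumption.

```python
import random

# Mirrors the `parameters` section of the manifest shown above.
PARAMETERS = [
    {"name": "--lr", "parameterType": "double",
     "feasibleSpace": {"min": "0.01", "max": "0.03"}},
    {"name": "--num-layers", "parameterType": "int",
     "feasibleSpace": {"min": "2", "max": "5"}},
    {"name": "--optimizer", "parameterType": "categorical",
     "feasibleSpace": {"list": ["sgd", "adam", "ftrl"]}},
]


def sample_trial(parameters, rng):
    """Draw one value per parameter, uniformly from its feasible space."""
    assignment = {}
    for param in parameters:
        space = param["feasibleSpace"]
        kind = param["parameterType"]
        if kind == "double":
            value = rng.uniform(float(space["min"]), float(space["max"]))
        elif kind == "int":
            # Assumption: both bounds are inclusive.
            value = rng.randint(int(space["min"]), int(space["max"]))
        elif kind == "categorical":
            value = rng.choice(space["list"])
        else:
            raise ValueError(f"unsupported parameterType: {kind}")
        assignment[param["name"]] = value
    return assignment


rng = random.Random(42)  # fixed seed so the sketch is repeatable
trials = [sample_trial(PARAMETERS, rng) for _ in range(3)]
for trial in trials:
    # Each trial becomes a set of command-line arguments for the training job.
    print(" ".join(f"{name}={value}" for name, value in trial.items()))
```

Each printed line corresponds to the arguments one trial's training job would receive.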

The demo should start an experiment and run three jobs with different parameters. Run the following command to check the job status.

When the `spec.Status.Condition` changes to Completed, the experiment is finished.


In [None]:
!kubectl -n kubeflow describe experiment random-example
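
If you would rather check the status programmatically than read the `describe` output, a sketch along the following lines may help. It assumes that the experiment object exposes a `status.conditions` list whose entries carry `type` and `status` fields, ordered oldest first; verify this against the output of your own cluster. The helper names and the `SAMPLE` payload are hypothetical, and the terminal condition name `Completed` is taken from the text above, so your Katib version may report a different name.

```python
import json
import subprocess


def fetch_experiment(name, namespace="kubeflow"):
    """Return the experiment as a dict via kubectl (requires cluster access)."""
    result = subprocess.run(
        ["kubectl", "-n", namespace, "get", "experiment", name, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def latest_condition(experiment):
    """Return the type of the last condition whose status is "True", or None."""
    # Assumption: conditions live under status.conditions, oldest first.
    conditions = experiment.get("status", {}).get("conditions", [])
    active = [c["type"] for c in conditions if c.get("status") == "True"]
    return active[-1] if active else None


def is_finished(experiment, terminal_types):
    """True when the latest active condition is one of terminal_types."""
    return latest_condition(experiment) in terminal_types


# Hypothetical payload for illustration; real output contains more fields.
SAMPLE = {"status": {"conditions": [
    {"type": "Created", "status": "True"},
    {"type": "Running", "status": "False"},
    {"type": "Completed", "status": "True"},
]}}

TERMINAL = {"Completed"}  # name taken from the text; adjust for your version
print(latest_condition(SAMPLE), is_finished(SAMPLE, TERMINAL))
```

Against a live cluster, you would pass `fetch_experiment("random-example")` in place of `SAMPLE`.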

### TensorFlow Operator

You can monitor your results in the Katib UI. If you installed Kubeflow using the deployment guide, you can access the Katib UI at `https://<your kubeflow endpoint>/katib/`.

### PyTorch Operator

In [7]:
!kubectl create -f https://raw.githubusercontent.com/kubeflow/katib/master/examples/v1alpha2/pytorchjob-example.yaml

Error from server (Forbidden): error when creating "https://raw.githubusercontent.com/kubeflow/katib/master/examples/v1alpha2/pytorchjob-example.yaml": experiments.kubeflow.org is forbidden: User "system:serviceaccount:jiaxin:default-editor" cannot create resource "experiments" in API group "kubeflow.org" in the namespace "kubeflow"


In [None]:
!kubectl -n kubeflow describe experiment pytorchjob-example
