# Hyperparameter Tuning with Katib (CatBoost)

## Introduction

Hyperparameter tuning is the process of optimizing a model's hyperparameter values in order to maximize the predictive quality of the model.
Examples of such hyperparameters are the learning rate, neural architecture depth (layers) and width (nodes), epochs, batch size, dropout rate, and activation functions.
These are the parameters that are set prior to training; unlike the model parameters (weights and biases), these do not change during the process of training the model.


This notebook shows how you can create and configure an `Experiment` for a CatBoost model.
In terms of Kubernetes, such an experiment is a custom resource handled by the Katib operator.


## How to Specify Hyperparameters in Your Models
In order for Katib to be able to tweak hyperparameters it needs to know what these are called in the model.
Beyond that, the model must specify these hyperparameters either as regular (command line) parameters or as environment variables.
Since the model needs to be containerized, any command line parameters or environment variables must be passed to the container that holds your model.
By far the most common and also the recommended way is to use command line parameters that are captured with [`argparse`](https://docs.python.org/3/library/argparse.html) or similar; the trainer (function) then uses their values internally.
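
As an illustration, a trainer script might capture its hyperparameters roughly as follows. This is a minimal sketch, not the actual training script: the flag names mirror the ones used in the experiment in this notebook, while the function name `parse_hyperparameters`, the types, and the default values are assumptions made for the example.

```python
import argparse


def parse_hyperparameters(argv=None):
    """Parse hyperparameters passed as command line parameters."""
    parser = argparse.ArgumentParser(description="CatBoost trainer (sketch)")
    # Defaults are illustrative assumptions; Katib overrides them per trial.
    parser.add_argument("--n_estimators", type=int, default=100)
    parser.add_argument("--learning_rate", type=float, default=0.1)
    parser.add_argument("--max_depth", type=int, default=2)
    parser.add_argument("--reg_lambda", type=float, default=1.0)
    parser.add_argument("--od_wait", type=int, default=50)
    return parser.parse_args(argv)


# Simulate the arguments a single trial might receive.
args = parse_hyperparameters(
    ["--n_estimators=50", "--learning_rate=0.01", "--max_depth=2"]
)
print(args.n_estimators, args.learning_rate, args.max_depth)
```

Inside the container, calling `parse_hyperparameters()` without arguments reads from `sys.argv`, so the trainer function can use `args.learning_rate` and the other values directly.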

## How to Expose Model Metrics as Objective Functions
By default, Katib collects metrics from the standard output of a job container by using a sidecar container.
In order to make the metrics available to Katib, they must be logged to [stdout](https://www.kubeflow.org/docs/components/hyperparameter-tuning/experiment/#metrics-collector) in the `key=value` format.
The job output is redirected to the `/var/log/katib/metrics.log` file.
This means that the objective metric name configured for Katib (`objectiveMetricName` in the experiment) must match the metric's `key` in the model's output.
It's therefore possible to define custom model metrics for your use case.
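
A minimal sketch of how a training script could emit a metric in the `key=value` format is shown here. The helper names `format_metric` and `log_metric` and the four-decimal precision are assumptions for illustration; only the `key=value` shape on standard output comes from the description above.

```python
def format_metric(name: str, value: float) -> str:
    """Render a metric as a single `key=value` string."""
    return f"{name}={value:.4f}"


def log_metric(name: str, value: float) -> None:
    """Print the metric on its own line and flush so it appears promptly."""
    print(format_metric(name, value), flush=True)


# The key must equal the objective metric name used in the experiment.
log_metric("accuracy", 0.9312)
```

Running the script with `python3 -u`, or flushing explicitly as in this sketch, keeps buffered output from delaying the metric lines.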

## How to Create Experiments
Before we proceed, let's set up a few basic definitions that we can re-use.
Note that you typically use (YAML) resource definitions for Kubernetes from the command line, but we shall show you how to do everything from a notebook, so that you do not have to exit your favourite environment at all!
Of course, if you are more familiar or comfortable with `kubectl` and the command line, feel free to use a local CLI or the embedded terminals from the Jupyter Lab launch screen.

In [23]:
CB_EXPERIMENT_FILE = "katib-catboost-experiment.yaml"

In [24]:
import re

from IPython.utils.capture import CapturedIO


def get_resource(captured_io: CapturedIO) -> str:
    """
    Gets a resource name from `kubectl apply -f <configuration.yaml>`.

    :param CapturedIO captured_io: Output captured by using `%%capture` cell magic
    :return: Name of the Kubernetes resource
    :rtype: str
    :raises Exception: if the resource could not be created
    """
    out = captured_io.stdout
    matches = re.search(r"^(.+)\s+created", out)
    if matches is not None:
        return matches.group(1)
    else:
        raise Exception(f"Cannot get resource as its creation failed: {out}. It may already exist.")
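
To see what the regular expression in `get_resource` extracts, it can be tried on a sample string. The sketch below assumes that a successful `kubectl apply -f` prints a line of the form `<resource> created`; the sample text is made up for illustration and is not captured output.

```python
import re

# Assumed shape of the line printed by `kubectl apply -f` on success.
sample_stdout = "experiment.kubeflow.org/bank-churner3 created\n"

# Same pattern as in get_resource: everything before " created" is the name.
match = re.search(r"^(.+)\s+created", sample_stdout)
resource_name = match.group(1) if match else None
print(resource_name)
```

If the resource already exists, the output would not end in `created`, the pattern would not match, and `get_resource` would raise its exception.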

### CatBoost: Katib CatBoost Experiment

The following YAML file describes an `Experiment` object:

In [28]:
%%writefile $CB_EXPERIMENT_FILE
apiVersion: kubeflow.org/v1beta1
kind: Experiment
metadata:
  name: bank-churner3
  namespace: demo01
spec:
  parallelTrialCount: 3
  maxTrialCount: 12
  maxFailedTrialCount: 3
  objective:
    type: maximize
    goal: 0.99
    objectiveMetricName: accuracy
  algorithm:
    algorithmName: bayesianoptimization
  metricsCollectorSpec:
    collector:
      kind: StdOut
  parameters:
    - name: n_estimators
      parameterType: categorical
      feasibleSpace:
        list:
          - "50"
          - "100"
    - name: learning_rate
      parameterType: categorical
      feasibleSpace:
        list:
          - "0.01"
          - "0.1"
    - name: max_depth
      parameterType: int
      feasibleSpace:
        min: "1"
        max: "2"
    - name: reg_lambda
      parameterType: int
      feasibleSpace:
        min: "1"
        max: "2"
    - name: od_wait
      parameterType: categorical
      feasibleSpace:
        list:
          - "50"
          - "100"
  trialTemplate:
    primaryContainerName: training-container
    trialParameters:
      - name: nEstimators
        description: N estimators for the model
        reference: n_estimators
      - name: learningRate
        description: Learning rate for the model
        reference: learning_rate
      - name: maxDepth
        description: Max depth for the model
        reference: max_depth
      - name: regLambda
        description: Reg lambda for the model
        reference: reg_lambda
      - name: odWait
        description: OD wait for the model
        reference: od_wait
    trialSpec:
      apiVersion: batch/v1
      kind: Job
      spec:
        template:
          metadata:
            annotations:
              sidecar.istio.io/inject: "false"
          spec:
            containers:
              - name: training-container
                image: "docker.io/dampolo/bank-churner:newest3"
                command:
                  - python3
                  - -u
                  - /bank-churners.py
                args:
                  - "--n_estimators=${trialParameters.nEstimators}"
                  - "--learning_rate=${trialParameters.learningRate}"
                  - "--max_depth=${trialParameters.maxDepth}"
                  - "--reg_lambda=${trialParameters.regLambda}"
                  - "--od_wait=${trialParameters.odWait}"
            restartPolicy: Never

Overwriting katib-catboost-experiment.yaml
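
To make the link between `parameters`, `trialParameters`, and the container `args` concrete, the following sketch imitates the placeholder substitution for a single trial. It is only an illustration and not Katib's implementation: the `assignment` values are a hypothetical suggestion, and the `render` helper is invented for this example.

```python
import re

# Hypothetical values suggested for one trial, keyed by `parameters` names.
assignment = {
    "n_estimators": "50",
    "learning_rate": "0.01",
    "max_depth": "2",
    "reg_lambda": "1",
    "od_wait": "100",
}

# `trialParameters`: placeholder name -> referenced search-space parameter.
trial_parameters = {
    "nEstimators": "n_estimators",
    "learningRate": "learning_rate",
    "maxDepth": "max_depth",
    "regLambda": "reg_lambda",
    "odWait": "od_wait",
}

# Container arguments as written in the trial template.
arg_templates = [
    "--n_estimators=${trialParameters.nEstimators}",
    "--learning_rate=${trialParameters.learningRate}",
    "--max_depth=${trialParameters.maxDepth}",
    "--reg_lambda=${trialParameters.regLambda}",
    "--od_wait=${trialParameters.odWait}",
]

PLACEHOLDER = re.compile(r"\$\{trialParameters\.(\w+)\}")


def render(template: str) -> str:
    """Replace each placeholder with the value assigned to its reference."""
    return PLACEHOLDER.sub(
        lambda m: assignment[trial_parameters[m.group(1)]], template
    )


rendered_args = [render(template) for template in arg_templates]
print(rendered_args)
```

Each `reference` must name an entry under `parameters`; in this sketch a mismatch would surface as a `KeyError`.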
