# HyperParameter tuning using CMA-ES

In this example, you will deploy three Katib Experiments that use the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) algorithm, using a Jupyter Notebook and the Katib SDK. The Experiments differ in their resume policies.

The notebook shows how to create an Experiment, retrieve it, check its status, and delete it.

# Install required package

In [1]:
pip install kubeflow-katib==0.10.1

Defaulting to user installation because normal site-packages is not writeable
Collecting kubeflow-katib==0.10.1
  Downloading kubeflow_katib-0.10.1-py3-none-any.whl (113 kB)
     |████████████████████████████████| 113 kB 28.0 MB/s eta 0:00:01
Collecting table-logger>=0.3.5
  Downloading table_logger-0.3.6-py3-none-any.whl (14 kB)
Installing collected packages: table-logger, kubeflow-katib
Successfully installed kubeflow-katib-0.10.1 table-logger-0.3.6
You should consider upgrading via the '/usr/bin/python3 -m pip install --upgrade pip' command.
Note: you may need to restart the kernel to use updated packages.


## Restart the Notebook kernel to use the SDK package

In [None]:
from IPython.display import display_html
display_html("<script>Jupyter.notebook.kernel.restart()</script>",raw=True)

## Import required packages

In [1]:
import copy

from kubeflow.katib import KatibClient
from kubernetes.client import V1ObjectMeta
from kubeflow.katib import V1beta1Experiment
from kubeflow.katib import V1beta1AlgorithmSpec
from kubeflow.katib import V1beta1ObjectiveSpec
from kubeflow.katib import V1beta1FeasibleSpace
from kubeflow.katib import V1beta1ExperimentSpec
from kubeflow.katib import V1beta1ParameterSpec
from kubeflow.katib import V1beta1TrialTemplate
from kubeflow.katib import V1beta1TrialParameterSpec

## Define your Experiment

You have to create your Experiment object before deploying it. This Experiment is similar to the [cmaes-example.yaml](https://github.com/kubeflow/katib/blob/master/examples/v1beta1/cmaes-example.yaml) example.

In [2]:
# Experiment name and namespace.
namespace = "anonymous"
experiment_name = "cmaes-example"

metadata = V1ObjectMeta(
    name=experiment_name,
    namespace=namespace
)

# Algorithm specification.
algorithm_spec=V1beta1AlgorithmSpec(
    algorithm_name="cmaes"
)

# Objective specification.
objective_spec=V1beta1ObjectiveSpec(
    type="maximize",
    goal=0.99,
    objective_metric_name="Validation-accuracy",
    additional_metric_names=["Train-accuracy"]
)

# Experiment search space. In this example we tune the learning rate, number of layers and optimizer.
parameters=[
    V1beta1ParameterSpec(
        name="lr",
        parameter_type="double",
        feasible_space=V1beta1FeasibleSpace(
            min="0.01",
            max="0.06"
        ),
    ),
    V1beta1ParameterSpec(
        name="num-layers",
        parameter_type="int",
        feasible_space=V1beta1FeasibleSpace(
            min="2",
            max="5"
        ),
    ),
    V1beta1ParameterSpec(
        name="optimizer",
        parameter_type="categorical",
        feasible_space=V1beta1FeasibleSpace(
            list=["sgd", "adam", "ftrl"]
        ),
    ),
]



# JSON template specification for the Trial's Worker Kubernetes Job.
trial_spec={
    "apiVersion": "batch/v1",
    "kind": "Job",
    "spec": {
        "template": {
            "metadata": {
                "annotations": {
                    "sidecar.istio.io/inject": "false"
                }
            },
            "spec": {
                "containers": [
                    {
                        "name": "training-container",
                        "image": "docker.io/kubeflowkatib/mxnet-mnist:v1beta1-91e4996",
                        "command": [
                            "python3",
                            "/opt/mxnet-mnist/mnist.py",
                            "--batch-size=64",
                            "--lr=${trialParameters.learningRate}",
                            "--num-layers=${trialParameters.numberLayers}",
                            "--optimizer=${trialParameters.optimizer}"
                        ]
                    }
                ],
                "restartPolicy": "Never"
            }
        }
    }
}

# Configure parameters for the Trial template.
trial_template=V1beta1TrialTemplate(
    primary_container_name="training-container",
    trial_parameters=[
        V1beta1TrialParameterSpec(
            name="learningRate",
            description="Learning rate for the training model",
            reference="lr"
        ),
        V1beta1TrialParameterSpec(
            name="numberLayers",
            description="Number of training model layers",
            reference="num-layers"
        ),
        V1beta1TrialParameterSpec(
            name="optimizer",
            description="Training model optimizer (sgd, adam or ftrl)",
            reference="optimizer"
        ),
    ],
    trial_spec=trial_spec
)


# Experiment object.
experiment = V1beta1Experiment(
    api_version="kubeflow.org/v1beta1",
    kind="Experiment",
    metadata=metadata,
    spec=V1beta1ExperimentSpec(
        max_trial_count=7,
        parallel_trial_count=3,
        max_failed_trial_count=3,
        algorithm=algorithm_spec,
        objective=objective_spec,
        parameters=parameters,
        trial_template=trial_template,
    )
)
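The Trial template ties together three sets of names: the search-space parameters (`lr`, `num-layers`, `optimizer`), the `trial_parameters` entries that reference them, and the `${trialParameters.<name>}` placeholders in the container command. A mismatch between them is easy to introduce, so a quick local check before submission may help. The sketch below is not part of the Katib SDK: `check_trial_template` is a hypothetical helper that works on plain Python data mirroring the objects defined above.

```python
import re

# Matches placeholders such as ${trialParameters.learningRate}.
PLACEHOLDER = re.compile(r"\$\{trialParameters\.([A-Za-z0-9_-]+)\}")


def check_trial_template(search_space, trial_parameters, command):
    """Return a list of human-readable problems (empty if none found).

    search_space: names of the Experiment parameters, e.g. ["lr", ...]
    trial_parameters: mapping of trial parameter name -> referenced parameter
    command: list of command-line strings from the Trial spec
    """
    problems = []
    # Every trial parameter must reference a parameter in the search space.
    for name, reference in trial_parameters.items():
        if reference not in search_space:
            problems.append(f"{name!r} references unknown parameter {reference!r}")
    # Every placeholder in the command must be a declared trial parameter.
    used = set()
    for arg in command:
        used.update(PLACEHOLDER.findall(arg))
    for name in sorted(used - set(trial_parameters)):
        problems.append(f"placeholder {name!r} has no trial parameter")
    return problems


# Values copied from the Experiment defined above.
search_space = ["lr", "num-layers", "optimizer"]
trial_parameters = {
    "learningRate": "lr",
    "numberLayers": "num-layers",
    "optimizer": "optimizer",
}
command = [
    "python3",
    "/opt/mxnet-mnist/mnist.py",
    "--batch-size=64",
    "--lr=${trialParameters.learningRate}",
    "--num-layers=${trialParameters.numberLayers}",
    "--optimizer=${trialParameters.optimizer}",
]

problems = check_trial_template(search_space, trial_parameters, command)
print(problems or "Trial template looks consistent")
```

The check only covers naming; it does not validate the Kubernetes Job spec itself.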

# Define Experiments with resume policy

We will define two more Experiments, one with ResumePolicy = Never and one with ResumePolicy = FromVolume.

An Experiment with the _Never_ resume policy can't be resumed, and its Suggestion resources will be deleted.

An Experiment with the _FromVolume_ resume policy can be resumed because a volume is attached to the Suggestion. A PVC and PV should be created for the Suggestion.

In [3]:
experiment_never_resume_name = "never-resume-cmaes"
experiment_from_volume_resume_name = "from-volume-resume-cmaes"

# Create new Experiments from the previous Experiment info.
# Define Experiment with never resume.
experiment_never_resume = copy.deepcopy(experiment)
experiment_never_resume.metadata.name = experiment_never_resume_name
experiment_never_resume.spec.resume_policy = "Never"
experiment_never_resume.spec.max_trial_count = 4

# Define Experiment with from volume resume.
experiment_from_volume_resume = copy.deepcopy(experiment)
experiment_from_volume_resume.metadata.name = experiment_from_volume_resume_name
experiment_from_volume_resume.spec.resume_policy = "FromVolume"
experiment_from_volume_resume.spec.max_trial_count = 4
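The cell above uses `copy.deepcopy` rather than `copy.copy` because the Experiment holds a nested `spec` object. With a shallow copy, all copies would share that one nested object, so setting `resume_policy` or `max_trial_count` on one Experiment would silently change the others. The sketch below illustrates the difference with small stand-in classes; `Spec` and `Experiment` here are hypothetical and are not the Katib SDK models.

```python
import copy


class Spec:
    def __init__(self, max_trial_count, resume_policy=None):
        self.max_trial_count = max_trial_count
        self.resume_policy = resume_policy


class Experiment:
    def __init__(self, name, spec):
        self.name = name
        self.spec = spec


base = Experiment("cmaes-example", Spec(max_trial_count=7))

# Shallow copy: the new object shares the same nested Spec instance.
shallow = copy.copy(base)
shallow.spec.max_trial_count = 4
print(base.spec.max_trial_count)  # 4: the base Experiment was changed too

# Deep copy: the nested Spec is duplicated, so the base stays untouched.
base.spec.max_trial_count = 7
deep = copy.deepcopy(base)
deep.spec.max_trial_count = 4
deep.spec.resume_policy = "Never"
print(base.spec.max_trial_count, base.spec.resume_policy)  # 7 None
```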

You can print the Experiment's info to verify it before submission.

In [4]:
print(experiment.metadata.name)
print(experiment.spec.algorithm.algorithm_name)
print("-----------------")
print(experiment_never_resume.metadata.name)
print(experiment_never_resume.spec.resume_policy)
print("-----------------")
print(experiment_from_volume_resume.metadata.name)
print(experiment_from_volume_resume.spec.resume_policy)


cmaes-example
cmaes
-----------------
never-resume-cmaes
Never
-----------------
from-volume-resume-cmaes
FromVolume


# Create your Experiment

You have to create a Katib client to use the SDK.

In [5]:
# Create client.
kclient = KatibClient()

# Create your Experiment.
kclient.create_experiment(experiment,namespace=namespace)

{'apiVersion': 'kubeflow.org/v1beta1',
 'kind': 'Experiment',
 'metadata': {'creationTimestamp': '2020-11-30T19:02:20Z',
  'generation': 1,
  'name': 'cmaes-example',
  'namespace': 'anonymous',
  'resourceVersion': '170779217',
  'selfLink': '/apis/kubeflow.org/v1beta1/namespaces/anonymous/experiments/cmaes-example',
  'uid': '6d8b16d3-3778-4fc1-ba3d-d524ec487450'},
 'spec': {'algorithm': {'algorithmName': 'cmaes'},
  'maxFailedTrialCount': 3,
  'maxTrialCount': 7,
  'metricsCollectorSpec': {'collector': {'kind': 'StdOut'}},
  'objective': {'additionalMetricNames': ['Train-accuracy'],
   'goal': 0.99,
   'metricStrategies': [{'name': 'Validation-accuracy', 'value': 'max'},
    {'name': 'Train-accuracy', 'value': 'max'}],
   'objectiveMetricName': 'Validation-accuracy',
   'type': 'maximize'},
  'parallelTrialCount': 3,
  'parameters': [{'feasibleSpace': {'max': '0.06', 'min': '0.01'},
    'name': 'lr',
    'parameterType': 'double'},
   {'feasibleSpace': {'max': '5', 'min': '2'},
    

Create the other two Experiments.

In [6]:
# Create Experiment with never resume.
kclient.create_experiment(experiment_never_resume,namespace=namespace)
# Create Experiment with from volume resume.
kclient.create_experiment(experiment_from_volume_resume,namespace=namespace)

{'apiVersion': 'kubeflow.org/v1beta1',
 'kind': 'Experiment',
 'metadata': {'creationTimestamp': '2020-11-30T19:02:29Z',
  'generation': 1,
  'name': 'from-volume-resume-cmaes',
  'namespace': 'anonymous',
  'resourceVersion': '170779317',
  'selfLink': '/apis/kubeflow.org/v1beta1/namespaces/anonymous/experiments/from-volume-resume-cmaes',
  'uid': '20f3cee6-818d-48ff-ad1e-d2c85cee51f1'},
 'spec': {'algorithm': {'algorithmName': 'cmaes'},
  'maxFailedTrialCount': 3,
  'maxTrialCount': 4,
  'metricsCollectorSpec': {'collector': {'kind': 'StdOut'}},
  'objective': {'additionalMetricNames': ['Train-accuracy'],
   'goal': 0.99,
   'metricStrategies': [{'name': 'Validation-accuracy', 'value': 'max'},
    {'name': 'Train-accuracy', 'value': 'max'}],
   'objectiveMetricName': 'Validation-accuracy',
   'type': 'maximize'},
  'parallelTrialCount': 3,
  'parameters': [{'feasibleSpace': {'max': '0.06', 'min': '0.01'},
    'name': 'lr',
    'parameterType': 'double'},
   {'feasibleSpace': {'max': 

# Get your Experiment

You can get your Experiment by name and read the data you need from the returned object.

In [7]:
exp = kclient.get_experiment(name=experiment_name, namespace=namespace)
print(exp)
print("-----------------\n")

# Get the max trial count and latest status.
print(exp["spec"]["maxTrialCount"])
print(exp["status"]["conditions"][-1])

{'apiVersion': 'kubeflow.org/v1beta1', 'kind': 'Experiment', 'metadata': {'creationTimestamp': '2020-11-30T19:02:20Z', 'finalizers': ['update-prometheus-metrics'], 'generation': 1, 'name': 'cmaes-example', 'namespace': 'anonymous', 'resourceVersion': '170779219', 'selfLink': '/apis/kubeflow.org/v1beta1/namespaces/anonymous/experiments/cmaes-example', 'uid': '6d8b16d3-3778-4fc1-ba3d-d524ec487450'}, 'spec': {'algorithm': {'algorithmName': 'cmaes'}, 'maxFailedTrialCount': 3, 'maxTrialCount': 7, 'metricsCollectorSpec': {'collector': {'kind': 'StdOut'}}, 'objective': {'additionalMetricNames': ['Train-accuracy'], 'goal': 0.99, 'metricStrategies': [{'name': 'Validation-accuracy', 'value': 'max'}, {'name': 'Train-accuracy', 'value': 'max'}], 'objectiveMetricName': 'Validation-accuracy', 'type': 'maximize'}, 'parallelTrialCount': 3, 'parameters': [{'feasibleSpace': {'max': '0.06', 'min': '0.01'}, 'name': 'lr', 'parameterType': 'double'}, {'feasibleSpace': {'max': '5', 'min': '2'}, 'name': 'num-
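Indexing `exp["status"]["conditions"][-1]` directly raises an error if the `status` field or its conditions are not present, which may be the case shortly after creation. A more defensive lookup can be sketched as follows. `latest_condition` is a hypothetical helper, and the two dicts are trimmed, made-up stand-ins for what `get_experiment` returns.

```python
def latest_condition(experiment):
    """Return the last entry in status.conditions, or None if absent."""
    conditions = experiment.get("status", {}).get("conditions") or []
    return conditions[-1] if conditions else None


# Hypothetical, heavily trimmed Experiment dicts for illustration.
fresh = {"metadata": {"name": "cmaes-example"}, "spec": {"maxTrialCount": 7}}
running = {
    "metadata": {"name": "cmaes-example"},
    "spec": {"maxTrialCount": 7},
    "status": {
        "conditions": [
            {"type": "Created", "status": "True"},
            {"type": "Running", "status": "True"},
        ]
    },
}

print(latest_condition(fresh))            # None
print(latest_condition(running)["type"])  # Running
```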

# Get all Experiments

You can get a list of the current Experiments.

In [8]:
# Get the names of the current Experiments.
exp_list = kclient.get_experiment(namespace=namespace)

for exp in exp_list["items"]:
    print(exp["metadata"]["name"])

cmaes-example
from-volume-resume-cmaes
never-resume-cmaes


# Get the current Experiment status

You can check the current Experiment status.

In [9]:
kclient.get_experiment_status(name=experiment_name, namespace=namespace)

'Running'

You can check whether your Experiment has succeeded.

In [10]:
kclient.is_experiment_succeeded(name=experiment_name, namespace=namespace)

False
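Since an Experiment takes a while to finish, it can be convenient to poll the status until it reaches a final state. The sketch below shows one way to do that. `wait_for_experiment` is a hypothetical helper rather than an SDK method, and treating `Succeeded` and `Failed` as the only terminal statuses is an assumption. It takes any zero-argument callable, so a fake status source is used here to keep the example runnable without a cluster.

```python
import time


def wait_for_experiment(get_status, timeout_s=600.0, poll_s=10.0,
                        terminal=("Succeeded", "Failed")):
    """Poll get_status() until it returns a terminal status or time runs out.

    get_status: zero-argument callable returning the status string.
    Raises TimeoutError if no terminal status is seen within timeout_s.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status in terminal:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Experiment still {status!r} after {timeout_s}s")
        time.sleep(poll_s)


# Stand-in for the real status call so the sketch runs without a cluster.
fake_statuses = iter(["Created", "Running", "Running", "Succeeded"])
print(wait_for_experiment(lambda: next(fake_statuses), poll_s=0.01))  # Succeeded
```

With a real client, the callable could be `lambda: kclient.get_experiment_status(name=experiment_name, namespace=namespace)`.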

# List of the current Trials

You can get a list of the current Trials with their latest status.

In [11]:
# Trial list.
kclient.list_trials(name=experiment_name, namespace=namespace)

[{'name': 'cmaes-example-488pljjb', 'status': 'Running'},
 {'name': 'cmaes-example-bfsszl9p', 'status': 'Succeeded'},
 {'name': 'cmaes-example-cnr8grsw', 'status': 'Succeeded'},
 {'name': 'cmaes-example-tpvpv8wp', 'status': 'Running'},
 {'name': 'cmaes-example-xzvbcn4l', 'status': 'Running'}]
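The returned list is plain Python data, so it is easy to summarize. As a small sketch, the snippet below counts Trials per status using the output shown above, copied into a literal so that it runs without a cluster.

```python
from collections import Counter

# Output of list_trials copied from the cell above.
trials = [
    {"name": "cmaes-example-488pljjb", "status": "Running"},
    {"name": "cmaes-example-bfsszl9p", "status": "Succeeded"},
    {"name": "cmaes-example-cnr8grsw", "status": "Succeeded"},
    {"name": "cmaes-example-tpvpv8wp", "status": "Running"},
    {"name": "cmaes-example-xzvbcn4l", "status": "Running"},
]

status_counts = Counter(trial["status"] for trial in trials)
for status, count in sorted(status_counts.items()):
    print(f"{status}: {count}")
# Running: 3
# Succeeded: 2
```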

# Get the optimal HyperParameters

You can get the current optimal Trial from your Experiment. For each metric, you can see the max, min, and latest values.

In [12]:
# Optimal HPs.
kclient.get_optimal_hyperparameters(name=experiment_name, namespace=namespace)

{'currentOptimalTrial': {'bestTrialName': 'cmaes-example-cnr8grsw',
  'observation': {'metrics': [{'latest': '0.976015',
     'max': '0.978802',
     'min': '0.958798',
     'name': 'Validation-accuracy'},
    {'latest': '0.992820',
     'max': '0.992820',
     'min': '0.920359',
     'name': 'Train-accuracy'}]},
  'parameterAssignments': [{'name': 'lr', 'value': '0.04511033252270099'},
   {'name': 'num-layers', 'value': '3'},
   {'name': 'optimizer', 'value': 'sgd'}]}}
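All parameter and metric values in this result are strings, so they usually need converting before further use. The sketch below flattens the structure and compares the best `Validation-accuracy` with the Experiment goal of 0.99. `summarize_optimal` is a hypothetical helper, and the input is the output shown above copied into a literal.

```python
def summarize_optimal(result):
    """Flatten the optimal-Trial dict into (trial name, params, best metrics)."""
    optimal = result["currentOptimalTrial"]
    params = {p["name"]: p["value"] for p in optimal["parameterAssignments"]}
    # Metric values arrive as strings; convert the max of each to float.
    best = {m["name"]: float(m["max"]) for m in optimal["observation"]["metrics"]}
    return optimal["bestTrialName"], params, best


# Output of get_optimal_hyperparameters copied from the cell above.
result = {
    "currentOptimalTrial": {
        "bestTrialName": "cmaes-example-cnr8grsw",
        "observation": {
            "metrics": [
                {"latest": "0.976015", "max": "0.978802",
                 "min": "0.958798", "name": "Validation-accuracy"},
                {"latest": "0.992820", "max": "0.992820",
                 "min": "0.920359", "name": "Train-accuracy"},
            ]
        },
        "parameterAssignments": [
            {"name": "lr", "value": "0.04511033252270099"},
            {"name": "num-layers", "value": "3"},
            {"name": "optimizer", "value": "sgd"},
        ],
    }
}

name, params, best = summarize_optimal(result)
print(name)                                 # cmaes-example-cnr8grsw
print(float(params["lr"]), int(params["num-layers"]), params["optimizer"])
print(best["Validation-accuracy"] >= 0.99)  # False: the goal is not reached yet
```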

# Status for the Suggestion objects

You can check the status of the Suggestion object for more information about whether the Experiment can be resumed.

For the Experiment with the FromVolume resume policy, you should be able to see the created PVC and PV.

In [13]:
# Get the current Suggestion status for the never resume Experiment.
suggestion = kclient.get_suggestion(name=experiment_never_resume_name, namespace=namespace)

print(suggestion["status"]["conditions"][-1]["message"])
print("-----------------")

# Get the current Suggestion status for the from volume Experiment.
suggestion = kclient.get_suggestion(name=experiment_from_volume_resume_name, namespace=namespace)

print(suggestion["status"]["conditions"][-1]["message"])

Suggestion is succeeded, can't be restarted
-----------------
Suggestion is succeeded, suggestion volume is not deleted, can be restarted


# Delete your Experiments

You can delete your Experiments.

In [14]:
kclient.delete_experiment(name=experiment_name, namespace=namespace)
kclient.delete_experiment(name=experiment_never_resume_name, namespace=namespace)
kclient.delete_experiment(name=experiment_from_volume_resume_name, namespace=namespace)

{'apiVersion': 'kubeflow.org/v1beta1',
 'kind': 'Experiment',
 'metadata': {'creationTimestamp': '2020-11-30T19:02:29Z',
  'deletionGracePeriodSeconds': 0,
  'deletionTimestamp': '2020-11-30T19:20:53Z',
  'finalizers': ['update-prometheus-metrics'],
  'generation': 2,
  'name': 'from-volume-resume-cmaes',
  'namespace': 'anonymous',
  'resourceVersion': '170787823',
  'selfLink': '/apis/kubeflow.org/v1beta1/namespaces/anonymous/experiments/from-volume-resume-cmaes',
  'uid': '20f3cee6-818d-48ff-ad1e-d2c85cee51f1'},
 'spec': {'algorithm': {'algorithmName': 'cmaes'},
  'maxFailedTrialCount': 3,
  'maxTrialCount': 4,
  'metricsCollectorSpec': {'collector': {'kind': 'StdOut'}},
  'objective': {'additionalMetricNames': ['Train-accuracy'],
   'goal': 0.99,
   'metricStrategies': [{'name': 'Validation-accuracy', 'value': 'max'},
    {'name': 'Train-accuracy', 'value': 'max'}],
   'objectiveMetricName': 'Validation-accuracy',
   'type': 'maximize'},
  'parallelTrialCount': 3,
  'parameters': [