In [1]:
TF_EXPERIMENT_FILE = "katibhotel.yaml"

**Create Experiments**

In [2]:
import re

from IPython.utils.capture import CapturedIO


def get_resource(captured_io: CapturedIO) -> str:
    """
    Gets a resource name from `kubectl apply -f <configuration.yaml>`.

    :param CapturedIO captured_io: Output captured by using `%%capture` cell magic
    :return: Name of the Kubernetes resource
    :rtype: str
    :raises Exception: if the resource could not be created
    """
    out = captured_io.stdout
    # Match "<resource> created" on any line of the captured output.
    matches = re.search(r"^(\S+)\s+created", out, flags=re.MULTILINE)
    if matches is not None:
        return matches.group(1)
    else:
        raise Exception(f"Cannot get resource as its creation failed: {out}. It may already exist.")
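
`kubectl apply` reports `configured` or `unchanged` instead of `created` when the resource already exists, and the helper above raises in that case. Below is a minimal sketch of a more lenient variant. The name `get_resource_lenient` and the `SimpleNamespace` stand-in for the captured output are illustrative assumptions, not part of the original notebook.

```python
import re
from types import SimpleNamespace

# Status words that `kubectl apply` prints after the resource name.
APPLY_STATUSES = ("created", "configured", "unchanged")
STATUS_ALTERNATION = "|".join(APPLY_STATUSES)
APPLY_LINE = re.compile(rf"^(\S+)\s+(?:{STATUS_ALTERNATION})\s*$", flags=re.MULTILINE)


def get_resource_lenient(captured_io) -> str:
    """Return the resource name from captured `kubectl apply` output.

    Hypothetical variant of `get_resource` that also accepts resources
    that already existed (reported as `configured` or `unchanged`).
    """
    match = APPLY_LINE.search(captured_io.stdout)
    if match is None:
        raise RuntimeError(f"No applied resource found in: {captured_io.stdout!r}")
    return match.group(1)


# Stand-in for the object produced by `%%capture`; only `.stdout` is read.
fake_output = SimpleNamespace(stdout="experiment.kubeflow.org/hotelbook unchanged\n")
print(get_resource_lenient(fake_output))
```

Whether re-applying an existing experiment should be treated as success depends on the workflow; the stricter original is the safer choice when a fresh run is expected.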

**For this experiment, we tune three hyperparameters: the learning rate, the batch size, and the optimizer. The following YAML file describes the Experiment object:**

In [3]:
%%writefile $TF_EXPERIMENT_FILE
apiVersion: "kubeflow.org/v1beta1"
kind: Experiment
metadata:
  namespace: sooter
  name: hotelbook
spec:
  parallelTrialCount: 3
  maxTrialCount: 30
  maxFailedTrialCount: 3
  objective:
    type: maximize
    goal: 0.99
    objectiveMetricName: accuracy
  algorithm:
    algorithmName: bayesianoptimization
  metricsCollectorSpec:
    kind: StdOut
  parameters:
    - name: learning_rate
      parameterType: double
      feasibleSpace:
        min: "0.001"
        max: "0.01"
    - name: batch_size
      parameterType: int
      feasibleSpace:
        min: "80"
        max: "200"
    - name: optimizer
      parameterType: categorical
      feasibleSpace:
        list:
          - adam
          - sgd
  trialTemplate:
    primaryContainerName: tensorflow
    trialParameters:
      - name: learningRate
        description: Learning rate for the training model
        reference: learning_rate
      - name: batchSize
        description: Batch Size
        reference: batch_size
      - name: optimizer
        description: Training model optimizer (sgd, adam)
        reference: optimizer
    trialSpec:
      apiVersion: "kubeflow.org/v1"
      kind: TFJob
      spec:
        tfReplicaSpecs:
          Worker:
            replicas: 1
            restartPolicy: OnFailure
            template:
              metadata:
                annotations:
                  sidecar.istio.io/inject: "false"
              spec:
                containers:
                  - name: tensorflow
                    image: mavencodev/tf_hotel:v.0.2
                    command:
                      - "python"
                      - "/tfjob-hotel-demand.py"
                      - "--batch_size=${trialParameters.batchSize}"
                      - "--learning_rate=${trialParameters.learningRate}"
                      - "--optimizer=${trialParameters.optimizer}"

Writing katibhotel.yaml
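
For each trial, the `${trialParameters.<name>}` placeholders in the trial spec are replaced with the values suggested for that trial. The sketch below only mimics that idea in plain Python; it is not Katib's implementation. The `render_command` helper and the sampled values are hypothetical, chosen from within the feasible space declared above.

```python
import re

# Command template copied from the trialSpec above.
command_template = [
    "python",
    "/tfjob-hotel-demand.py",
    "--batch_size=${trialParameters.batchSize}",
    "--learning_rate=${trialParameters.learningRate}",
    "--optimizer=${trialParameters.optimizer}",
]

# Hypothetical suggestion for one trial (not real experiment output).
assignment = {"batchSize": "128", "learningRate": "0.005", "optimizer": "adam"}

PLACEHOLDER = re.compile(r"\$\{trialParameters\.(\w+)\}")


def render_command(args, values):
    """Replace every ${trialParameters.<name>} placeholder with its value."""
    return [PLACEHOLDER.sub(lambda m: values[m.group(1)], arg) for arg in args]


print(render_command(command_template, assignment))
```

The placeholder names (`batchSize`, `learningRate`, `optimizer`) come from `trialParameters`, and each one points back to a search-space entry through its `reference` field.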


**Run and Monitor Experiments**

To submit our experiment, we execute:

In [4]:
%%capture kubectl_output --no-stderr
! kubectl apply -f $TF_EXPERIMENT_FILE

**The `%%capture` cell magic captures the output of the `kubectl` command and stores it in an object named `kubectl_output`. We can then pass that object to the utility function defined earlier:**

In [5]:
EXPERIMENT = get_resource(kubectl_output)

**To see the status, we can then run:**

In [6]:
! kubectl describe $EXPERIMENT

Name:         hotelbook
Namespace:    sooter
Labels:       <none>
Annotations:  <none>
API Version:  kubeflow.org/v1beta1
Kind:         Experiment
Metadata:
  Creation Timestamp:  2021-07-19T20:20:44Z
  Finalizers:
    update-prometheus-metrics
  Generation:  1
  Managed Fields:
    API Version:  kubeflow.org/v1beta1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
      f:spec:
        .:
        f:algorithm:
          .:
          f:algorithmName:
        f:maxFailedTrialCount:
        f:maxTrialCount:
        f:metricsCollectorSpec:
          .:
          f:kind:
        f:objective:
          .:
          f:goal:
          f:objectiveMetricName:
          f:type:
        f:parallelTrialCount:
        f:parameters:
        f:trialTemplate:
          .:
          f:primaryContainerName:
          f:trialParameters:
          f:trialSpec:
            .:
            f:apiVersion:

**To get the list of created experiments, use the following command:**

In [7]:
! kubectl get experiments

NAME                  TYPE        STATUS   AGE
airline1              Succeeded   True     8h
airline2-end-to-end   Running     True     3h32m
airline3-end-to-end   Running     True     144m
hotelbook             Running     True     72s


**To get the list of created trials, use the following command:**

In [13]:
! kubectl get trials

NAME                           TYPE        STATUS   AGE
airline1-44c7fzhg              Succeeded   True     8h
airline1-5frqpdkw              Succeeded   True     7h46m
airline1-5xb6npd4              Succeeded   True     7h54m
airline1-6m5qnqrm              Succeeded   True     7h38m
airline1-7hm5hbn7              Succeeded   True     7h30m
airline1-7s9zjzz6              Succeeded   True     8h
airline1-7z2ncblx              Succeeded   True     7h54m
airline1-8d85jh8m              Succeeded   True     7h30m
airline1-8p8dlw5j              Succeeded   True     7h55m
airline1-8t4pzj4m              Succeeded   True     8h
airline1-9tbrwskt              Succeeded   True     7h38m
airline1-9v9prk2v              Succeeded   True     8h
airline1-b2xk89bs              Succeeded   True     7h46m
airline1-d6z989nh              Succeeded   True     8h
airline1-dhx22qg9              Succeeded   True     8h
airline1-h2sfsb7g              Succeeded   True     8h
airline1-hwm2p99f              Succee
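
With many trials the listing gets long, so a per-experiment summary can be easier to read. The sketch below counts trials by experiment and condition from the tabular text. It assumes trial names follow the `<experiment>-<suffix>` pattern seen above; `summarize_trials` is a hypothetical helper, and the sample rows are copied from the listing.

```python
from collections import Counter


def summarize_trials(table: str) -> Counter:
    """Count trials per (experiment, condition) from `kubectl get trials` text."""
    counts = Counter()
    for line in table.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) != 4:
            continue  # ignore truncated or malformed rows
        name, condition = fields[0], fields[1]
        experiment = name.rsplit("-", 1)[0]  # drop the random trial suffix
        counts[(experiment, condition)] += 1
    return counts


sample = """\
NAME                           TYPE        STATUS   AGE
airline1-44c7fzhg              Succeeded   True     8h
airline1-5frqpdkw              Succeeded   True     7h46m
airline1-5xb6npd4              Succeeded   True     7h54m
"""
print(summarize_trials(sample))
```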

**After the experiment completes, run `kubectl describe` again; the best trial is reported in the Status section of the output:**

In [14]:
! kubectl describe $EXPERIMENT

Name:         hotelbook
Namespace:    sooter
Labels:       <none>
Annotations:  <none>
API Version:  kubeflow.org/v1beta1
Kind:         Experiment
Metadata:
  Creation Timestamp:  2021-07-19T20:20:44Z
  Finalizers:
    update-prometheus-metrics
  Generation:  1
  Managed Fields:
    API Version:  kubeflow.org/v1beta1
    Fields Type:  FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:kubectl.kubernetes.io/last-applied-configuration:
      f:spec:
        .:
        f:algorithm:
          .:
          f:algorithmName:
        f:maxFailedTrialCount:
        f:maxTrialCount:
        f:metricsCollectorSpec:
          .:
          f:kind:
        f:objective:
          .:
          f:goal:
          f:objectiveMetricName:
          f:type:
        f:parallelTrialCount:
        f:parameters:
        f:trialTemplate:
          .:
          f:primaryContainerName:
          f:trialParameters:
          f:trialSpec:
            .:
            f:apiVersion:
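
The `describe` output is long, so it can help to read the best trial from the JSON form of the experiment instead (for example from `kubectl get experiment hotelbook -n sooter -o json`). The sketch below assumes the `v1beta1` status layout with a `currentOptimalTrial` entry. The `best_trial` helper is hypothetical, and the sample document contains made-up placeholder values, not results from this experiment.

```python
import json


def best_trial(experiment: dict) -> dict:
    """Extract the best trial's name, parameters, and metrics from an Experiment."""
    optimal = experiment.get("status", {}).get("currentOptimalTrial", {})
    assignments = optimal.get("parameterAssignments", [])
    metrics = optimal.get("observation", {}).get("metrics", [])
    return {
        "trial": optimal.get("bestTrialName"),
        "parameters": {p["name"]: p["value"] for p in assignments},
        "metrics": {m["name"]: m.get("latest") for m in metrics},
    }


# Placeholder document: every value here is invented for illustration.
sample_json = """
{
  "status": {
    "currentOptimalTrial": {
      "bestTrialName": "hotelbook-placeholder",
      "parameterAssignments": [
        {"name": "learning_rate", "value": "0.005"},
        {"name": "batch_size", "value": "128"},
        {"name": "optimizer", "value": "adam"}
      ],
      "observation": {
        "metrics": [{"name": "accuracy", "latest": "0.0"}]
      }
    }
  }
}
"""
print(best_trial(json.loads(sample_json)))
```

If the experiment has not produced an optimal trial yet, the helper returns `None` for the trial name and empty dictionaries rather than raising.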

**Delete Katib Job Runs to Free Up Resources**

In [15]:
! kubectl delete -f $TF_EXPERIMENT_FILE

experiment.kubeflow.org "hotelbook" deleted



Check whether the pod is still up and running.