## Agents on Kubeflow 🤓

In this tutorial we will be training a reinforcement learning agent from the [tensorflow/agents](https://github.com/tensorflow/agents) project on Kubernetes using [Kubeflow](https://github.com/google/kubeflow).

The agent will learn to operate a simulated KUKA robotic arm in the `KukaBulletEnv-v0` environment. This environment is built on the Bullet physics engine and exposed through the OpenAI Gym interface. When rendered, it looks like this:

In [6]:
import io
import base64
from IPython.display import HTML

mp4_path = 'render.mp4'

with io.open(mp4_path, 'rb') as f:  # read-only binary; the file is closed afterwards
    video = f.read()
encoded = base64.b64encode(video)
HTML(data='''<video alt="test" controls>
                <source src="data:video/mp4;base64,{0}" type="video/mp4" />
             </video>'''.format(encoded.decode('ascii')))
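The cell above embeds the video as a base64 `data:` URI inside an HTML `<video>` tag, so the notebook can display it without serving the file separately. The same logic can be factored into a small reusable helper. This is a sketch, and the name `video_data_uri_html` is our own:

```python
import base64


def video_data_uri_html(video_bytes, mime="video/mp4"):
    """Return an HTML <video> element that embeds the clip as a base64 data URI.

    The notebook then carries the video inline. The trade-off is that the
    notebook grows by roughly 4/3 of the file size (base64 overhead).
    """
    encoded = base64.b64encode(video_bytes).decode("ascii")
    return (
        '<video controls>'
        '<source src="data:{mime};base64,{data}" type="{mime}" />'
        '</video>'
    ).format(mime=mime, data=encoded)


# Usage in a notebook cell:
#   from IPython.display import HTML
#   with open("render.mp4", "rb") as f:
#       HTML(video_data_uri_html(f.read()))
```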

This tutorial consists of two phases:

1. Training: learning the model parameters needed to perform the grasping task.
2. Rendering: recording a video of the trained model performing the grasping task.

### Setup

We need a Google Cloud Storage (GCS) bucket to store job logs, plus a unique subdirectory of that bucket for this particular run's logs. The following cell first creates the bucket and then generates a log dir path to use in a later step.

In [143]:
%%bash
get_project_id() {
  # From
  # Find the project ID first by DEVSHELL_PROJECT_ID (in Cloud Shell)
  # and then by querying the gcloud default project.
  local project="${DEVSHELL_PROJECT_ID:-}"
  if [[ -z "$project" ]]; then
    project=$(gcloud config get-value project 2> /dev/null)
  fi
  if [[ -z "$project" ]]; then
    >&2 echo "No default project was found, and DEVSHELL_PROJECT_ID is not set."
    >&2 echo "Please use the Cloud Shell or set your default project by typing:"
    >&2 echo "gcloud config set project YOUR-PROJECT-NAME"
    return 1
  fi
  echo "$project"
}

# Stop early instead of trying to create a bucket named "gs://-k8s".
GCLOUD_PROJECT_ID=$(get_project_id) || exit 1

# Create the bucket. If it already exists, gsutil reports a harmless 409 error.
gsutil mb gs://${GCLOUD_PROJECT_ID}-k8s

# Unique per-run log dir: the first 8 hex characters of a SHA-256 hash of the current date.
LOG_DIR=gs://${GCLOUD_PROJECT_ID}-k8s/jobs/$(date | shasum -a 256 | cut -c1-8)

echo Use this log dir when parameterizing the TfJob: $LOG_DIR

Use this log dir when parameterizing the TfJob: gs://kubeflow-rl-k8s/jobs/7bafff5b


Creating gs://kubeflow-rl-k8s/...
ServiceException: 409 Bucket kubeflow-rl-k8s already exists.
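If you would rather drive the setup from the Python kernel, the project lookup and log-dir naming from the bash cell can be mirrored as follows. This is a sketch: the function names are our own, and `time.ctime()` stands in for the shell's `date`, so the 8-character suffix will not match the bash cell's byte for byte. It only needs to be unique per run.

```python
import hashlib
import os
import subprocess
import time


def get_project_id(environ=None):
    """Prefer DEVSHELL_PROJECT_ID (Cloud Shell), else ask gcloud for its default project."""
    environ = os.environ if environ is None else environ
    project = environ.get("DEVSHELL_PROJECT_ID", "")
    if not project:
        try:
            project = subprocess.run(
                ["gcloud", "config", "get-value", "project"],
                capture_output=True, text=True, check=False,
            ).stdout.strip()
        except FileNotFoundError:  # gcloud is not installed
            project = ""
    if not project:
        raise RuntimeError(
            "No default project found; run: gcloud config set project YOUR-PROJECT-NAME")
    return project


def make_log_dir(project_id, seed=None):
    """Build gs://<project>-k8s/jobs/<first 8 hex chars of sha256(seed)>."""
    seed = time.ctime() if seed is None else seed
    digest = hashlib.sha256(seed.encode("utf-8")).hexdigest()[:8]
    return "gs://{}-k8s/jobs/{}".format(project_id, digest)


# Usage:
#   log_dir = make_log_dir(get_project_id())
```

Passing a fixed `seed` makes the path reproducible, which is handy when you need to regenerate the same log dir later.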


This demo assumes your cluster already has a namespace named `rl`. If it does not, you can create it with `kubectl create namespace rl`.

### Using a custom job image

To run a job customized beyond the available parameters, you will need to build your trainer code into a Docker image. You can do this with the build script in the notebook directory, which builds the image and pushes it to the registry (see the output below):

In [260]:
%%bash
sh ./build.sh

Sending build context to Docker daemon  1.178MB
Step 1/4 : FROM gcr.io/kubeflow-rl/kubeflow-rl-agents:cpu-149e9f4f
 ---> 84a211dfa9c0
Step 2/4 : ADD trainer /app/trainer/
 ---> Using cache
 ---> 76f26832973f
Step 3/4 : WORKDIR /app/
 ---> Using cache
 ---> 09c927adffc0
Step 4/4 : ENTRYPOINT python -m trainer.task
 ---> Using cache
 ---> 024ed132cbdb
Successfully built 024ed132cbdb
Successfully tagged gcr.io/kubeflow-rl/agents-ppo:cpu-b781e9b0
The push refers to a repository [gcr.io/kubeflow-rl/agents-ppo]
5d4bdfa0af38: Preparing
3263c8e6ae8c: Preparing
935451fb4387: Preparing
9c57d9d10093: Preparing
69a8d1bac507: Preparing
cca7884663e6: Preparing
c9c04a5fd1a3: Preparing
5d4dbb0c7791: Preparing
6a19be88e574: Preparing
adcfc17fe4eb: Preparing
8f196722f8c6: Preparing
eac59d81aaf0: Preparing
a09947e71dc0: Preparing
9c42c2077cde: Preparing
625c7a2a783b: Preparing
25e0901a71b8: Preparing
8aa4fcad5eeb: Preparing
cca7884663e6: Waiting
c9c04a5fd1a3: Waiting
5d4dbb0c7791: Waiting
6a19be88e574:

### Training

The objective of the training phase is to learn model parameters that let the agent perform the task well. Here we'll launch a training job and monitor it.
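In standard reinforcement-learning notation (a textbook formulation, not one spelled out in this tutorial), learning these parameters means searching for policy parameters $\theta$ that maximize the expected discounted return over trajectories $\tau$ generated by the policy $\pi_\theta$:

$$
J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\left[\sum_{t=0}^{T-1} \gamma^{t} r_t\right],
\qquad
\theta^\star = \arg\max_{\theta} J(\theta)
$$

Here $r_t$ is the reward at step $t$, $\gamma \in [0, 1]$ is a discount factor, and $T$ is the episode length. The `agents-ppo` image name suggests the algorithm is PPO, which maximizes a clipped surrogate of this objective rather than $J(\theta)$ directly.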

#### Launching the TFJob

We'll use [ksonnet](https://ksonnet.io/) to parameterize a TFJob configuration and apply (i.e. submit) it. You can set the image to your custom job image, or keep the one provided here if you only want to change parameters:

In [261]:
%%bash
cd ../../rl-app

LOG_DIR=gs://kubeflow-rl-k8s/jobs/7bafff5b # Replace with your log dir from the setup step
IMAGE=gcr.io/kubeflow-rl/agents-ppo:cpu-b781e9b0

ks generate tf-job agents-ppo --namespace=rl \
    --num_gpus=0 --num_workers=1 --num_ps=0 \
    --image=$IMAGE --args=--logdir=${LOG_DIR}

# If you're interested, the YAML-format TFJob config can be displayed using
# ks show gke -c agents-ppo

Writing component at 'components/agents-ppo'
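When sweeping parameters such as `--num_workers` from Python, it can help to build the `ks generate tf-job` invocation above as an argument list instead of a hand-edited string. The sketch below uses only the flags that appear in the cell. The helper names are hypothetical, and it assumes `ks` is on your `PATH` when the command is actually run.

```python
import shlex


def ks_generate_tfjob_args(name, image, log_dir, namespace="rl",
                           num_gpus=0, num_workers=1, num_ps=0):
    """Assemble the `ks generate tf-job` command from the cell above as an argv list."""
    return [
        "ks", "generate", "tf-job", name,
        "--namespace=" + namespace,
        "--num_gpus={}".format(num_gpus),
        "--num_workers={}".format(num_workers),
        "--num_ps={}".format(num_ps),
        "--image=" + image,
        "--args=--logdir=" + log_dir,
    ]


def format_command(argv):
    """Render an argv list as a copy-pasteable shell command, quoting where needed."""
    return " ".join(shlex.quote(arg) for arg in argv)


# Usage (run from the ksonnet app directory):
#   import subprocess
#   args = ks_generate_tfjob_args("agents-ppo", IMAGE, LOG_DIR)
#   subprocess.run(args, cwd="../../rl-app", check=True)
```

Passing a list to `subprocess.run` avoids shell word-splitting surprises if a path ever contains spaces.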


The job can be created using the following:

In [264]:
%%bash

cd ../../rl-app

# Apply the job configuration (i.e. submit the job to run on the cluster).
# `ks apply gke -c agents-ppo` should do this, but it does not always create
# the TFJob, whereas rendering the config with `ks show` and piping it to
# `kubectl create -f -` does work.
#
# The Kubeflow tf-job template also does not yet have a field for launching
# TensorBoard. Appending one to the rendered YAML was tried and did not work,
# so that step is omitted:
# JOB_YAML=/tmp/`date | shasum -a 256 | cut -c1-8`-job.yaml
# ks show gke -c agents-ppo > ${JOB_YAML}
# echo "  tensorBoard:" >> ${JOB_YAML}
# echo "    logDir: ${LOG_DIR}" >> ${JOB_YAML}
ks show gke -c agents-ppo | kubectl create -f -

tfjob "agents-ppo" created
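The submission pattern above renders a manifest with one command and feeds it to another on stdin (`kubectl create -f -`). A minimal Python sketch of the same pipe follows. The helper name is hypothetical, and the output is buffered in memory rather than streamed, which is fine for small manifests.

```python
import subprocess


def pipe_commands(producer, consumer, cwd=None):
    """Run `producer | consumer` without a shell and return the consumer's stdout.

    check=True on the producer means a failing render (e.g. `ks show`) raises
    instead of silently handing empty input to the consumer, which a plain
    shell pipe without `set -o pipefail` would do.
    """
    rendered = subprocess.run(producer, cwd=cwd, stdout=subprocess.PIPE, check=True)
    result = subprocess.run(consumer, cwd=cwd, input=rendered.stdout,
                            stdout=subprocess.PIPE, check=True)
    return result.stdout.decode("utf-8")


# Usage:
#   pipe_commands(["ks", "show", "gke", "-c", "agents-ppo"],
#                 ["kubectl", "create", "-f", "-"], cwd="../../rl-app")
```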


Now we can list TFJobs and confirm that the job has been created.

In [279]:
%%bash
kubectl get tfjobs -n rl

NAME         AGE
agents-ppo   11m


#### Monitoring training

The IDs, status, and other metadata of pods involved in the training job can be displayed using the following:

In [266]:
%%bash
kubectl get pods -n rl --show-all

NAME                             READY     STATUS    RESTARTS   AGE
agents-ppo-master-pt46-0-c4r5w   1/1       Running   0          4s
agents-ppo-worker-pt46-0-rhbk8   1/1       Running   0          4s
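To pick a pod name programmatically (for example, the worker whose logs we read next), the column output above can be parsed into records. This sketch assumes no column value contains spaces, which holds for the output shown. It also assumes the `-worker-` / `-master-` naming pattern seen here, which is inferred from this output rather than guaranteed. For anything more robust, `kubectl get pods -o json` is the better source.

```python
def parse_kubectl_table(text):
    """Turn the column output of `kubectl get ...` into a list of dicts keyed by header."""
    lines = [line for line in text.strip().splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def pods_with_role(pods, role):
    """Names of pods whose generated name contains -<role>- (e.g. worker, master)."""
    return [p["NAME"] for p in pods if "-{}-".format(role) in p["NAME"]]


SAMPLE = """\
NAME                             READY     STATUS    RESTARTS   AGE
agents-ppo-master-pt46-0-c4r5w   1/1       Running   0          4s
agents-ppo-worker-pt46-0-rhbk8   1/1       Running   0          4s
"""

pods = parse_kubectl_table(SAMPLE)
worker = pods_with_role(pods, "worker")[0]
```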


Logs from a specific pod can be displayed with the following (add the `--follow` flag to stream them):

In [280]:
%%bash
kubectl logs agents-ppo-worker-pt46-0-rhbk8 -n rl

INFO:tensorflow:Tensorflow version: 1.3.0
INFO:tensorflow:Tensorflow git version: v1.3.0-rc2-20-g0787eee
INFO:tensorflow:=== using log dir: gs://kubeflow-rl-k8s/jobs/7bafff5b
INFO:tensorflow:Graph contains 867 trainable variables.
2018-01-09 23:52:58.446653: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2018-01-09 23:52:58.446722: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2018-01-09 23:52:58.446733: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2018-01-09 23:52:58.446743: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow libra
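The training output is interleaved with TensorFlow's C++ `cpu_feature_guard` warnings, which are harmless but noisy. Below is a sketch of a filter that drops C++ log lines from chosen source files. The regex is written against the line format shown above and is an assumption about that format. Alternatively, setting the environment variable `TF_CPP_MIN_LOG_LEVEL=2` in the container suppresses TensorFlow's C++ INFO and WARNING messages at the source.

```python
import re

# Matches TensorFlow's C++ log prefix, e.g.
# "2018-01-09 23:52:58.446653: W tensorflow/core/platform/cpu_feature_guard.cc:45] ..."
_CPP_LOG = re.compile(r"^\S+ \S+: ([IWEF]) (\S+)")


def filter_tf_logs(lines, drop_sources=("cpu_feature_guard.cc",)):
    """Yield log lines, skipping C++ messages emitted from the given source files."""
    for line in lines:
        match = _CPP_LOG.match(line)
        if match and any(src in match.group(2) for src in drop_sources):
            continue
        yield line


# Usage:
#   import subprocess
#   out = subprocess.run(["kubectl", "logs", worker, "-n", "rl"],
#                        capture_output=True, text=True).stdout
#   print("\n".join(filter_tf_logs(out.splitlines())))
```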

If the TFJob spec includes a `tensorBoard` field, a TensorBoard instance should also be deployed alongside the job. Deployments in the `rl` namespace can be listed with the following. In this run no TensorBoard deployment appeared, so the output is empty:

In [270]:
%%bash
kubectl get deployments -n rl
# TODO: Even with the tensorBoard field added to the job YAML, the TensorBoard deployment does not launch.

No resources found.


Once you have the ID of your TensorBoard deployment, start the Kubernetes proxy with `kubectl proxy` and then open TensorBoard in your browser with the following command (substituting your deployment ID):

In [12]:
%%bash
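# Substitute your TensorBoard deployment ID; the value must not contain spaces.
TENSORBOARD_DEPLOYMENT_ID="your-tensorboard-deployment-id"
# `open` is macOS-only; on Linux use xdg-open instead.
open http://127.0.0.1:8001/api/v1/proxy/namespaces/rl/services/${TENSORBOARD_DEPLOYMENT_ID}:80/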

This will open tensorboard in a new browser tab.
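The same URL can be built and opened from Python. Here the standard-library `webbrowser` module replaces the macOS-only `open`. This is a sketch: `service` is whatever name your TensorBoard service has (hypothetical here). The default path follows the `/api/v1/namespaces/<ns>/services/<name>:<port>/proxy/` form used in current Kubernetes documentation for reaching services through `kubectl proxy`. The older `/api/v1/proxy/...` form from the cell above is kept behind `legacy=True`. Which form your API server accepts depends on its version.

```python
import webbrowser


def proxy_url(service, namespace="rl", port=80,
              proxy="http://127.0.0.1:8001", legacy=False):
    """URL for reaching a cluster service through a running `kubectl proxy`."""
    if legacy:
        path = "/api/v1/proxy/namespaces/{ns}/services/{svc}:{port}/"
    else:
        path = "/api/v1/namespaces/{ns}/services/{svc}:{port}/proxy/"
    return proxy + path.format(ns=namespace, svc=service, port=port)


def open_tensorboard(service, namespace="rl"):
    """Open the proxied service in a new browser tab (works on macOS, Linux and Windows)."""
    webbrowser.open_new_tab(proxy_url(service, namespace))
```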

#### Deleting jobs

All TFJobs in the `rl` namespace can be deleted with the following:

In [263]:
%%bash
kubectl delete tfjobs --all -n rl

tfjob "agents-ppo" deleted


### Rendering the model

In this section we will render an mp4 video of our trained model performing the robotic manipulation task. First, we need to download a checkpoint of the model parameters from Google Cloud Storage.

In [275]:
%%bash
GCS_LOGS_PATH=gs://kubeflow-rl-k8s/jobs/7bafff5b # Replace with your log dir from the setup step

mkdir -p /tmp/kubeflow-agents-render
gsutil -m cp -r ${GCS_LOGS_PATH}/ /tmp/kubeflow-agents-render
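The simulation step below expects the checkpoint in a directory named after the run's last path component (`.../7bafff5b`). The helper below computes that local path from the GCS path so the two steps stay in sync. It is a hypothetical helper, and it assumes, as the paths in this notebook imply, that `gsutil cp -r` places the copied directory under the destination by its own name.

```python
import posixpath


def local_checkpoint_dir(gcs_path, dest="/tmp/kubeflow-agents-render"):
    """Local directory where `gsutil -m cp -r <gcs_path>/ <dest>` leaves the run."""
    run_id = posixpath.basename(gcs_path.rstrip("/"))
    return posixpath.join(dest, run_id)


# Usage:
#   log_dir = local_checkpoint_dir("gs://kubeflow-rl-k8s/jobs/7bafff5b")
```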

#### Simulating the model

Using the local copy of the model checkpoint we can simulate the model performing the task with the following:

In [278]:
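# pybullet_envs ships with the `pybullet` package (pip install pybullet);
# importing it registers KukaBulletEnv-v0 with Gym.
import pybullet_envs  # imported for its registration side effect
from agents.scripts import visualize

log_dir = "/tmp/kubeflow-agents-render/7bafff5b"

# Write mp4 videos of 5 episodes into outdir (here, the checkpoint directory itself).
visualize.visualize(
    logdir=log_dir, outdir=log_dir, num_agents=1, num_episodes=5,
    checkpoint=None, env_processes=True)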

ImportError: No module named pybullet_env
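An `ImportError` like the one above means a dependency is missing from the notebook's Python environment. A quick pre-flight check using only the standard library can report every missing module at once instead of failing on the first import. This is a sketch; it handles top-level module names only.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level modules from `names` that cannot be imported in this kernel."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Usage before running the simulation cell:
#   missing = missing_modules(["agents", "pybullet_envs"])
#   if missing:
#       print("Missing modules:", missing)  # pybullet_envs comes from `pip install pybullet`
```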

#### The result

The above generates several mp4 videos of the agent attempting the grasping task, written to the `outdir` passed to `visualize` above. Here's an example of what one looks like (TODO: generate an example from a well-trained model; this one doesn't pick anything up). To display one of your own renders here, change `mp4_path` to point at it.

In [2]:
import io
import base64
from IPython.display import HTML

# Replace with the path to one of your own renders
mp4_path = 'render.mp4'

with io.open(mp4_path, 'rb') as f:  # read-only binary; the file is closed afterwards
    video = f.read()
encoded = base64.b64encode(video)
HTML(data='''<video alt="test" controls>
                <source src="data:video/mp4;base64,{0}" type="video/mp4" />
             </video>'''.format(encoded.decode('ascii')))
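To choose a value for `mp4_path`, you can list the videos that the simulation step wrote. The sketch below searches an output directory recursively and returns the newest files first. The helper name is hypothetical, and the recursive search is there because the exact subdirectory layout of the renders is not documented here.

```python
import glob
import os


def find_renders(outdir):
    """Return the .mp4 files under outdir (searched recursively), newest first."""
    paths = glob.glob(os.path.join(outdir, "**", "*.mp4"), recursive=True)
    return sorted(paths, key=os.path.getmtime, reverse=True)


# Usage:
#   renders = find_renders("/tmp/kubeflow-agents-render/7bafff5b")
#   mp4_path = renders[0]  # the most recently written episode
```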

### Next actions

If this is your first time working with these technologies you might be interested in some suggestions of good next steps. Here are some ideas:
- Try training with some other learning environments and tweet your results!
- Take a shot at implementing your own Gym environment and repeat the above.