## Agents on Kubeflow 🤓

In this tutorial we will be training a reinforcement learning agent from the [tensorflow/agents](https://github.com/tensorflow/agents) project on Kubernetes using [Kubeflow](https://github.com/google/kubeflow).

The agent will learn to operate a Kuka robotic arm simulated in the OpenAI Gym Bullet Physics 'KukaBulletEnv-v0' environment. Feel free to [skip to the end](#Rendering-the-model) to see what this will look like!

### Setup

We need a Google Cloud Storage (GCS) bucket to store job logs, plus a unique subdirectory of that bucket for the logs of this particular run. The following cell creates the bucket; the path of the per-run log directory is generated in a later step, when the job is launched.

In [21]:
%%bash
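get_project_id() {
  # Print the active gcloud project ID; fail if none is configured.
  project=$(gcloud config get-value project 2> /dev/null)
  if [[ -z "$project" ]]; then
      >&2 echo "Couldn't load a gcloud project ID!"
      return 1
  fi
  echo "$project"
}

# Stop here rather than creating a bucket named "gs://-kf".
GCLOUD_PROJECT_ID=$(get_project_id) || exit 1
export GCLOUD_PROJECT_ID
gsutil mb "gs://${GCLOUD_PROJECT_ID}-kf"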

Creating gs://kubeflow-rl-kf/...
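
The bucket and per-run log directory follow a simple naming convention, `gs://<project>-kf/jobs/<job name>`, which the training cell relies on later. As a minimal sketch of that convention in Python (the helper names `bucket_for` and `log_dir_for` are hypothetical and not part of gsutil or Kubeflow):

```python
def bucket_for(project_id: str) -> str:
    """Return the bucket URL used in this demo: gs://<project>-kf."""
    if not project_id:
        raise ValueError("empty gcloud project ID")
    return f"gs://{project_id}-kf"


def log_dir_for(project_id: str, job_name: str) -> str:
    """Return the per-run log directory inside the demo bucket."""
    return f"{bucket_for(project_id)}/jobs/{job_name}"


print(log_dir_for("kubeflow-rl", "kuka-5eaa9f69"))
```

For the project and job shown in this notebook, this yields the same path that appears in the training logs further down.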


This demo assumes your cluster already has the namespace `rl`; if it does not, create it with `kubectl create namespace rl`.

### Training

The goal of the training phase is to learn model parameters that perform well on the given task. Here we'll launch a training job and monitor it.

#### Launching the TFJob

We'll use [ksonnet](https://ksonnet.io/) to parameterize and apply a TFJob configuration (i.e. run a job). Here you can change the image to be a custom job image, such as one built and deployed with build.sh, or use the one provided here if you only want to change parameters. Below we'll display the templated job YAML for reference.

In [6]:
%%bash

get_project_id() {
  # Print the active gcloud project ID; fail if none is configured.
  project=$(gcloud config get-value project 2> /dev/null)
  if [[ -z "$project" ]]; then
      >&2 echo "Couldn't load a gcloud project ID!"
      return 1
  fi
  echo "$project"
}

cd ../../rl-app

# HPARAM_ID=ml2k-ue30-kuka
# HPARAM_ID=loglocal-kuka
HPARAM_ID=kuka

JOB_SALT=$(date | shasum -a 256 | cut -c1-8)
JOB_NAME=$(echo ${HPARAM_ID}-${JOB_SALT} | tr '_' '-')
LOG_DIR=gs://$(get_project_id)-kf/jobs/${JOB_NAME}

# 1.4.1, agents-distributed
#IMAGE=gcr.io/kubeflow-rl/agents-ppo:cpu-7bd4bf03

# 1.4.1, agents proper, MTS
IMAGE=gcr.io/kubeflow-rl/agents-ppo:cpu-cdf59ece

ks param set agents-ppo env "KukaBulletEnv-v0"

ks param set agents-ppo run_mode train
ks param set agents-ppo gcp_project kubeflow-rl
ks param set agents-ppo num_cpu 31
ks param set agents-ppo num_agents 30
ks param set agents-ppo sync_replicas False
ks param set agents-ppo steps 4e7
ks param set agents-ppo update_every 30
ks param set agents-ppo max_length 1000
ks param set agents-ppo eval_episodes 25

# Trigger an async render job periodically (interval in seconds).
# For one render every 10 minutes use:
# ks param set agents-ppo render_secs 600
ks param set agents-ppo render_secs 10

ks param set agents-ppo algorithm "agents.ppo.PPOAlgorithm"
ks param set agents-ppo network "agents.scripts.networks.feed_forward_gaussian"

ks param set agents-ppo job_tag ${JOB_SALT}
ks param set agents-ppo logdir ${LOG_DIR}
ks param set agents-ppo name ${JOB_NAME}
ks param set agents-ppo image ${IMAGE}

ks apply gke -c agents-ppo

Parameter 'env' successfully set to '"KukaBulletEnv-v0"' for component 'agents-ppo'
Parameter 'run_mode' successfully set to '"train"' for component 'agents-ppo'
Parameter 'gcp_project' successfully set to '"kubeflow-rl"' for component 'agents-ppo'
Parameter 'num_cpu' successfully set to '31' for component 'agents-ppo'
Parameter 'num_agents' successfully set to '30' for component 'agents-ppo'
Parameter 'sync_replicas' successfully set to '"False"' for component 'agents-ppo'
Parameter 'steps' successfully set to '4e7' for component 'agents-ppo'
Parameter 'update_every' successfully set to '30' for component 'agents-ppo'
Parameter 'max_length' successfully set to '1000' for component 'agents-ppo'
Parameter 'eval_episodes' successfully set to '25' for component 'agents-ppo'
Parameter 'algorithm' successfully set to '"agents.ppo.PPOAlgorithm"' for component 'agents-ppo'
Parameter 'network' successfully set to '"agents.scripts.networks.feed_forward_gaussian"' for component 'agents-ppo'
Para
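
The job name above is built from the hyperparameter set ID plus a short salt: the first 8 hex characters of a SHA-256 hash, with underscores replaced by hyphens so the result is a valid Kubernetes resource name. A rough Python equivalent, offered as a sketch only: the function names are hypothetical, and because it hashes an ISO timestamp rather than the output of `date`, the salts will not match those produced by the bash cell.

```python
import hashlib
from datetime import datetime, timezone


def job_salt(seed: str) -> str:
    """First 8 hex characters of the SHA-256 digest of the seed string."""
    return hashlib.sha256(seed.encode("utf-8")).hexdigest()[:8]


def job_name(hparam_id: str, salt: str) -> str:
    """Join ID and salt; Kubernetes names may not contain underscores."""
    return f"{hparam_id}-{salt}".replace("_", "-")


salt = job_salt(datetime.now(timezone.utc).isoformat())
print(job_name("kuka", salt))
```

A timestamp seed only makes collisions unlikely, not impossible; two jobs launched with the same seed string would receive the same name.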

Now we can list tfjobs and see that a job has been created.

In [7]:
%%bash
kubectl get tfjobs -n rl

NAME            AGE
kuka-5eaa9f69   3s


#### Monitoring training

The IDs, status, and other metadata of pods involved in the training job can be displayed using the following:

In [10]:
%%bash
kubectl get pods -n rl

NAME                                READY     STATUS    RESTARTS   AGE
kuka-5eaa9f69-master-mzfd-0-xrq5m   1/1       Running   0          15s
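
If you want to act on this listing from a Python cell (for example, to pick out the running pods), the tabular output can be split into records. The sketch below assumes that no column value contains whitespace, which holds for the output shown here but is not guaranteed in general; `kubectl get pods -o json` is the more robust source for scripting. The function name `parse_kubectl_table` is hypothetical.

```python
from typing import Dict, List


def parse_kubectl_table(text: str) -> List[Dict[str, str]]:
    """Parse whitespace-separated kubectl table output into dicts keyed by header."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


sample = """\
NAME                                READY     STATUS    RESTARTS   AGE
kuka-5eaa9f69-master-mzfd-0-xrq5m   1/1       Running   0          15s
"""

pods = parse_kubectl_table(sample)
running = [pod["NAME"] for pod in pods if pod["STATUS"] == "Running"]
print(running)
```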


Logs from a specific pod can be displayed with the following (or streamed by adding the `--follow` flag):

In [18]:
%%bash
kubectl logs kuka-5eaa9f69-master-mzfd-0-xrq5m -n rl

INFO:tensorflow:Start a new run and write summaries and checkpoints to gs://kubeflow-rl-kf/jobs/kuka-5eaa9f69.
2018-01-22 20:03:45.566145: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
E0122 20:03:45.571601192       1 ev_epoll1_linux.c:1051]     grpc epoll fd: 3
2018-01-22 20:03:45.582069: I tensorflow/core/distributed_runtime/rpc/grpc_channel.cc:215] Initialize GrpcChannelCache for job master -> {0 -> localhost:2222}
2018-01-22 20:03:45.587284: I tensorflow/core/distributed_runtime/rpc/grpc_server_lib.cc:324] Started server with target: grpc://localhost:2222
{'algorithm': <class 'agents.ppo.algorithm.PPOAlgorithm'>,
 'debug': True,
 'discount': 0.995,
 'dump_dependency_versions': False,
 'env': 'KukaBulletEnv-v0',
 'env_processes': True,
 'eval_episodes': 25,
 'hparam_set_id': 'pybullet_kuka_ff',
 'init_logstd': -1,
 'init_mean_factor': 0.1,
 'kl_cutoff_coef': 1000,
 '
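
When pod names change from run to run, it can help to assemble the `kubectl logs` invocation programmatically. The following is a minimal sketch that only builds and prints the command; the helper name `kubectl_logs_cmd` is hypothetical, and the default namespace `rl` is the one assumed throughout this demo.

```python
import shlex


def kubectl_logs_cmd(pod, namespace="rl", follow=False):
    """Build the argument list for `kubectl logs`, optionally streaming."""
    cmd = ["kubectl", "logs", pod, "-n", namespace]
    if follow:
        cmd.append("--follow")
    return cmd


cmd = kubectl_logs_cmd("kuka-5eaa9f69-master-mzfd-0-xrq5m", follow=True)
print(" ".join(shlex.quote(part) for part in cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Passing the command as a list (rather than a single shell string) avoids quoting problems if it is later handed to `subprocess.run`.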

#### Deleting jobs

All tfjobs can be cleared from your cluster with the following:

In [5]:
%%bash
kubectl delete tfjobs --all -n rl

tfjob "kuka-b80905bc" deleted


### Rendering the model

#### Initiating a rendering job directly

Launching a rendering job is as simple as the following:

In [51]:
%%bash

cd ../../rl-app

JOB_SALT=$(date | shasum -a 256 | cut -c1-8)
JOB_NAME=$(echo render-${JOB_SALT} | tr '_' '-')
ks param set agents-ppo name ${JOB_NAME}

# To render from a specific log dir (not necessarily that of the last job run), set it explicitly:
ks param set agents-ppo logdir gs://kubeflow-rl-kf/jobs/kuka-8113eb39
ks param set agents-ppo num_cpu 1

ks param set agents-ppo run_mode render
ks apply gke -c agents-ppo

Parameter 'name' successfully set to '"render-c1c2fff3"' for component 'agents-ppo'
Parameter 'logdir' successfully set to '"gs://kubeflow-rl-kf/jobs/kuka-b903c640"' for component 'agents-ppo'
Parameter 'num_cpu' successfully set to '1' for component 'agents-ppo'
Parameter 'run_mode' successfully set to '"render"' for component 'agents-ppo'
Updating tfjobs rl.render-c1c2fff3
Creating non-existent tfjobs rl.render-c1c2fff3


In [72]:
%%bash
kubectl get pods -n rl --show-all

NAME                                READY     STATUS    RESTARTS   AGE
kuka-8113eb39-master-08aq-0-lngml   1/1       Running   0          3m


In [105]:
%%bash
kubectl logs render-c1c2fff3-master-ndo9-0-lrgkr -n rl

Error from server (NotFound): pods "render-c1c2fff3-master-ndo9-0-lrgkr" not found


#### Inspecting the result

When the job is complete, the log dir will contain a subdirectory named `render` holding a number of short videos of episodes of the agent performing the grasping task. Here's an example of what one of those looks like for a well-trained model.

In [9]:
import base64
from IPython.display import HTML

# Replace with the path to a rendered video
mp4_path = 'render.mp4'

# Read the video as bytes so it can be embedded as a base64 data URI
with open(mp4_path, 'rb') as f:
    video = f.read()
encoded = base64.b64encode(video)
HTML(data='''<video alt="test" controls>
                <source src="data:video/mp4;base64,{0}" type="video/mp4" />
             </video>'''.format(encoded.decode('ascii')))
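
The cell above expects a local `render.mp4`. One way to obtain it is to list the `render` subdirectory of the job's log dir and copy a video down with `gsutil`. The sketch below only assembles those commands; the helper names are hypothetical, and the names of the video files inside `render` are not specified here, so pick one from the listing.

```python
def render_dir(log_dir):
    """Return the 'render' subdirectory of a job's log dir."""
    return log_dir.rstrip("/") + "/render"


def gsutil_ls_cmd(log_dir):
    """Command that lists the rendered videos for a job."""
    return ["gsutil", "ls", render_dir(log_dir)]


def gsutil_cp_cmd(remote_video, local_path="render.mp4"):
    """Command that copies one remote video to a local file."""
    return ["gsutil", "cp", remote_video, local_path]


log_dir = "gs://kubeflow-rl-kf/jobs/kuka-8113eb39"
print(" ".join(gsutil_ls_cmd(log_dir)))
```

After choosing a file from the listing, `gsutil_cp_cmd(<that path>)` gives the matching copy command.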

### Great job! 🎉🎉🎉

If this is your first time working with these technologies, here are some ideas for next steps:
- Try training with some other learning environments (from the ID fields [here](https://github.com/bulletphysics/bullet3/blob/master/examples/pybullet/gym/pybullet_envs/__init__.py)) and tweet your results! E.g.
    - RacecarBulletEnv-v0
    - MinitaurBulletDuckEnv-v0
    - HalfCheetahBulletEnv-v0
- Take a shot at implementing your own gym learning environment and repeat the above.
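
Switching environments comes down to changing the `env` parameter of the `agents-ppo` component before applying the job. As a small sketch, the commands for the environments listed above could be generated like this (the helper `ks_param_cmd` is hypothetical; the other hyperparameters in this notebook were chosen for the Kuka task and may need retuning for other environments):

```python
ENVS = [
    "RacecarBulletEnv-v0",
    "MinitaurBulletDuckEnv-v0",
    "HalfCheetahBulletEnv-v0",
]


def ks_param_cmd(param, value, component="agents-ppo"):
    """Build a `ks param set` command for one parameter of a component."""
    return ["ks", "param", "set", component, param, str(value)]


for env in ENVS:
    print(" ".join(ks_param_cmd("env", env)))
```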