# Chapter 8: Play

This notebook trains an agent for the [OpenAI Gym `CarRacing-v0` environment](https://gym.openai.com/envs/CarRacing-v0/).

## World Model

This notebook uses the [World Model architecture](https://arxiv.org/abs/1803.10122) to train an agent for the `CarRacing-v0` environment using the model's own generated "dream" of the environment. The code is based on the implementation in [this repository](https://github.com/AppliedDataSciencePartners/WorldModels).

The model is broken up into 3 main components: a variational autoencoder (VAE), a recurrent neural network with a mixture density network (MDN-RNN), and finally a controller.

### The Variational Autoencoder

The VAE is trained first. It learns to encode observations of different game states into a normally distributed, lower-dimensional latent space.
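As a minimal NumPy sketch (not the notebook's Keras code), the two pieces of math the VAE relies on are sampling through the reparameterization trick and the closed-form KL divergence that pulls each encoding toward a standard normal distribution. The helper names and the `rng` argument are illustrative assumptions; the formulas mirror the `sampling` function and the `kl_divergence` term defined later in this notebook.

```python
import numpy as np

def sample_latent(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, I).

    sigma = exp(log_var / 2), the same expression the notebook's `sampling` uses.
    """
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(log_var / 2) * eps

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, sigma^2) || N(0, I)), summed over latent dimensions."""
    return -0.5 * np.sum(1 + log_var - mu ** 2 - np.exp(log_var), axis=-1)

rng = np.random.default_rng(0)
mu = np.zeros((4, 32))       # a batch of 4 encodings with a 32-dimensional latent space
log_var = np.zeros((4, 32))  # log_var = 0 means sigma = 1
z = sample_latent(mu, log_var, rng)
kl = kl_to_standard_normal(mu, log_var)  # zero: these encodings already match N(0, I)
```

Because the randomness enters only through `eps`, gradients can flow through `mu` and `log_var`, which is what lets the encoder be trained by backpropagation.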

### The MDN-RNN

The MDN-RNN is trained after the VAE. Its goal is to predict the distribution of the next state in the latent space, and the reward at that state, taking as input the VAE's encoding, the most recent action, and the current reward. It consists of an LSTM network followed by a mixture density network (MDN) output layer, which allows the next state to be sampled from a mixture of several normal distributions.
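The MDN output layer is easiest to understand at sampling time. Below is a minimal NumPy sketch under the assumption that each latent dimension has its own mixture of `N_MIX` Gaussians, parameterized by mixture logits, means, and log standard deviations. The names, shapes, and `N_MIX = 5` are illustrative choices for this sketch, not taken from the notebook's MDN-RNN code.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / np.sum(e, axis=axis, keepdims=True)

def sample_mdn(logits, mu, log_sigma, rng):
    """Sample one latent vector from a per-dimension mixture of Gaussians.

    logits, mu, log_sigma: arrays of shape (z_dim, n_mixtures).
    Returns an array of shape (z_dim,).
    """
    weights = softmax(logits, axis=-1)  # mixture weights for every dimension
    z_dim, n_mix = weights.shape
    # Pick one mixture component independently for every latent dimension.
    k = np.array([rng.choice(n_mix, p=w) for w in weights])
    rows = np.arange(z_dim)
    chosen_mu = mu[rows, k]
    chosen_sigma = np.exp(log_sigma[rows, k])
    # Draw from the chosen component's normal distribution.
    return chosen_mu + chosen_sigma * rng.standard_normal(z_dim)

rng = np.random.default_rng(0)
Z_DIM, N_MIX = 32, 5  # N_MIX is an assumed mixture count
logits = rng.standard_normal((Z_DIM, N_MIX))
mu = rng.standard_normal((Z_DIM, N_MIX))
log_sigma = np.full((Z_DIM, N_MIX), -1.0)
z_next = sample_mdn(logits, mu, log_sigma, rng)
```

A mixture lets the model represent several plausible next states (for example, a corner turning either way) instead of averaging them into one blurry prediction.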

### The Controller

The controller is a densely connected neural network whose input is the concatenation of the output of the VAE and the hidden state of the LSTM network. The network's 3 output neurons correspond to the 3 components of the agent's continuous action: steering, acceleration, and braking.
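A minimal NumPy sketch of such a controller, assuming a single dense layer (the World Models paper uses a linear controller) and an LSTM hidden size of 256 chosen only for illustration. The squashing keeps each output inside the range `CarRacing-v0` accepts (steering in [-1, 1], gas and brake in [0, 1]); that squashing scheme and the name `controller_action` are assumptions of this sketch, not the notebook's code.

```python
import numpy as np

Z_DIM = 32        # latent size used in this notebook
HIDDEN_DIM = 256  # assumed LSTM hidden size (hypothetical for this sketch)

def controller_action(z, h, W, b):
    """Map the concatenated [z, h] features to a CarRacing action.

    A single dense layer; the outputs are squashed so that steering lies
    in [-1, 1] and gas/brake lie in [0, 1].
    """
    raw = W @ np.concatenate([z, h]) + b  # shape (3,)
    steer = np.tanh(raw[0])
    gas = (np.tanh(raw[1]) + 1.0) / 2.0
    brake = (np.tanh(raw[2]) + 1.0) / 2.0
    return np.array([steer, gas, brake])

rng = np.random.default_rng(0)
W = rng.standard_normal((3, Z_DIM + HIDDEN_DIM)) * 0.1
b = np.zeros(3)
action = controller_action(rng.standard_normal(Z_DIM), np.zeros(HIDDEN_DIM), W, b)
```

Keeping the controller this small means it has few parameters, which makes it practical to optimize with gradient-free methods.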

## Setup

In [0]:
!pip3 install Box2D gym

In [0]:
!apt-get install -y xvfb python-opengl > /dev/null 2>&1

In [0]:
!pip3 install pyvirtualdisplay

In [0]:
import os
from google.colab import drive

drive.mount('/content/gdrive/')
base_dir = '/content/gdrive/My Drive/gdl_models/world/'
rollout_dir = os.path.join(base_dir, 'rollout/')
vae_weights_dir = os.path.join(base_dir, 'vae/')

In [0]:
from pyvirtualdisplay import Display

display = Display(visible=0, size=(300, 300))
display.start()

In [0]:
Z_DIM = 32

## Generating the Rollout Data

The code below generates the _rollout data_: observations of an agent acting randomly in the environment, together with the actions taken, the rewards received, and the done flags.

In [0]:
import gym
import numpy as np
import time

def scale_observation(obs):
  """Scale observation pixel values to [0, 1]."""
  return obs.astype('float32') / 255.0


def collect_rollout_data(total_episodes=1000, timesteps=300,
                         action_refresh_rate=20):
  """Collect the rollout data for training the VAE."""
  env = gym.make('CarRacing-v0')
  os.makedirs(rollout_dir, exist_ok=True)

  for s in range(total_episodes):
    print('Running episode:\t', s)
    # Append the episode index so that episodes finishing within the same
    # second do not overwrite each other's files.
    episode_id = '{}_{:04d}'.format(int(time.time()), s)
    filename = os.path.join(rollout_dir, episode_id + '.npz')
    obs = env.reset()
    env.render()

    observations = []
    actions = []
    rewards = []
    done_sequence = []

    reward = -0.1
    done = False

    for t in range(timesteps):
      # Sample a new random action every `action_refresh_rate` steps and
      # repeat it in between.
      if t % action_refresh_rate == 0:
        action = env.action_space.sample()
      observations.append(scale_observation(obs))
      actions.append(action)
      rewards.append(reward)
      done_sequence.append(done)

      obs, reward, done, info = env.step(action)
      env.render()

    np.savez_compressed(filename, obs=observations, action=actions,
                        reward=rewards, done=done_sequence)
  env.close()

In [0]:
collect_rollout_data()
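Each call to `np.savez_compressed` above writes one `.npz` file per episode containing four arrays, and the VAE data generator later assumes that every file holds exactly 300 observations of shape `(96, 96, 3)`. The sketch below round-trips a synthetic episode through the same format to show that layout; the zero-filled arrays and the temporary directory are stand-ins, not real rollout data.

```python
import os
import tempfile
import numpy as np

TIMESTEPS = 300  # matches the default `timesteps` in collect_rollout_data

# Synthetic stand-in for one episode (real files come from collect_rollout_data).
obs = np.zeros((TIMESTEPS, 96, 96, 3), dtype='float32')
actions = np.random.uniform([-1, 0, 0], [1, 1, 1], size=(TIMESTEPS, 3))
rewards = np.full(TIMESTEPS, -0.1)
dones = np.zeros(TIMESTEPS, dtype=bool)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'example_episode.npz')
    np.savez_compressed(path, obs=obs, action=actions, reward=rewards, done=dones)
    # Read every array into memory before the temporary file is deleted.
    with np.load(path) as episode:
        loaded = {key: episode[key] for key in episode.files}

for key, value in loaded.items():
    print(key, value.shape, value.dtype)
```

Storing scaled `float32` observations keeps loading simple but takes four times the space of raw `uint8` frames.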

## Implementing and Training the VAE

Below we implement the variational autoencoder (VAE) that the model uses to encode each game state as a normal distribution in a lower-dimensional latent space.

In [5]:
%tensorflow_version 1.x
from tensorflow.keras.layers import (Input, Conv2D, BatchNormalization,
                                     LeakyReLU, Dropout, Flatten, Dense,
                                     Lambda, Reshape, Conv2DTranspose,
                                     Activation)
import tensorflow.keras.backend as K
from tensorflow.keras.models import Model
import numpy as np
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import ModelCheckpoint, LearningRateScheduler
import matplotlib.pyplot as plt


def sampling(args):
  """Sample an encoding from the learned distribution."""
  mu, log_var = args
  return mu + K.exp(log_var / 2) * K.random_normal(shape=K.shape(mu))


def step_decay_schedule(initial_lr, decay_factor=0.5, step_size=1):
  """Create a LearningRateScheduler callback to decay the learning rate during training."""
  def schedule(epoch):
    return initial_lr * (decay_factor ** np.floor(epoch/step_size))
  return LearningRateScheduler(schedule)


class VAE():
  """Implements a variational autoencoder (VAE) using Keras."""

  def __init__(self, input_shape, encoder_conv_filters,
               encoder_conv_kernel_size, encoder_conv_strides,
               encoder_activations, decoder_conv_filters,
               decoder_conv_kernel_size, decoder_conv_strides,
               decoder_activations, z_dim, use_batch_normalization=False,
               use_dropout=False, dropout_rate=0.25):
    encoder_input = Input(shape=input_shape, name='encoder_input')
    x = encoder_input
    for i in range(len(encoder_conv_kernel_size)):
      x = Conv2D(filters=encoder_conv_filters[i],
                 kernel_size=encoder_conv_kernel_size[i],
                 strides=encoder_conv_strides[i], padding='same',
                 name='encoder_conv_{}'.format(i + 1))(x)
      if use_batch_normalization:
        x = BatchNormalization()(x)
      if encoder_activations[i] == 'lrelu':
        x = LeakyReLU()(x)
      else:
        x = Activation(encoder_activations[i])(x)
      if use_dropout:
        x = Dropout(rate=dropout_rate)(x)
    shape_before_flattening = K.int_shape(x)[1:]
    x = Flatten()(x)
    self.z_dim = z_dim
    self.mu = Dense(z_dim, name='mu')(x)
    self.log_var = Dense(z_dim, name='log_var')(x)
    self.encoder_mu_log_var = Model(encoder_input, (self.mu, self.log_var))
    encoder_output = Lambda(sampling,
                            name='encoder_output')([self.mu, self.log_var])
    self.encoder = Model(encoder_input, encoder_output)

    decoder_input = Input(shape=(z_dim,), name='decoder_input')
    x = Dense(np.prod(shape_before_flattening))(decoder_input)
    x = Reshape(shape_before_flattening)(x)
    for i in range(len(decoder_conv_kernel_size)):
      x = Conv2DTranspose(filters=decoder_conv_filters[i],
                          kernel_size=decoder_conv_kernel_size[i],
                          strides=decoder_conv_strides[i], padding='same',
                          name='decoder_conv_t_{}'.format(i + 1))(x)
      if i < len(decoder_conv_kernel_size) - 1:
        if use_batch_normalization:
          x = BatchNormalization()(x)
      if decoder_activations[i] == 'lrelu':
        x = LeakyReLU()(x)
      else:
        x = Activation(decoder_activations[i])(x)
      if use_dropout and i < len(decoder_conv_kernel_size) - 1:
        x = Dropout(rate=dropout_rate)(x)
    decoder_output = x
    self.decoder = Model(decoder_input, decoder_output)
    self.model = Model(encoder_input, self.decoder(encoder_output))
    self.compiled = False
    self.learning_rate = None

  def compile(self, learning_rate, r_loss_factor):
    """Compile the model."""
    self.learning_rate = learning_rate
    if self.compiled:
      return
    opt = Adam(lr=learning_rate)

    def mse(y_act, y_pred):
      return r_loss_factor * K.mean(K.square(y_act - y_pred), axis=(1, 2, 3))

    def kl_divergence(y_act, y_pred):
      return -0.5 * K.sum(
        1 + self.log_var - K.square(self.mu) - K.exp(self.log_var), axis=1)

    def loss(y_act, y_pred):
      return mse(y_act, y_pred) + kl_divergence(y_act, y_pred)
    
    self.model.compile(opt, loss=loss, metrics=[mse, kl_divergence],
                       experimental_run_tf_function=False)
    self.compiled = True

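  def fit_with_generator(self, data_flow, epochs, steps_per_epoch,
                         checkpoint_path=None, lr_decay=1, initial_epoch=0):
    """Train on batches from a generator, optionally resuming from a checkpoint."""
    if not self.compiled:
      raise Exception('Model not compiled')
    if initial_epoch > 0:
      # The class has no `load` method; restore the Keras model's weights.
      self.model.load_weights(
          checkpoint_path + 'weights_{:03d}.hdf5'.format(initial_epoch))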
    lr_sched = step_decay_schedule(initial_lr=self.learning_rate,
                                   decay_factor=lr_decay, step_size=1)
    callbacks = [lr_sched]
    if checkpoint_path:
      callbacks.append(ModelCheckpoint(
          filepath=checkpoint_path + 'weights.hdf5', verbose=1,
          save_weights_only=True))
      callbacks.append(ModelCheckpoint(
          filepath=checkpoint_path + 'weights_{epoch:03d}.hdf5', verbose=1,
          save_weights_only=True))
    self.model.fit_generator(data_flow, epochs=epochs, shuffle=True,
                             callbacks=callbacks, initial_epoch=initial_epoch,
                             steps_per_epoch=steps_per_epoch)

TensorFlow 1.x selected.


I will initialize the model with mostly the same hyperparameters as the [original code](https://github.com/AppliedDataSciencePartners/WorldModels/blob/master/vae/arch.py), but with some modifications to see how they impact the performance of the overall model.

In [0]:
LEARNING_RATE = 0.0001

vae = VAE(input_shape=(96, 96, 3),
          encoder_conv_filters=(32, 64, 64, 128),
          encoder_conv_kernel_size=(4, 4, 4, 4),
          encoder_conv_strides=(2, 2, 2, 2),
          encoder_activations=('relu', 'relu', 'relu', 'relu'),
          decoder_conv_filters=(64, 64, 32, 3),
          decoder_conv_kernel_size=(5, 5, 6, 6),
          decoder_conv_strides=(2, 2, 2, 2),
          decoder_activations=('relu', 'relu', 'relu', 'sigmoid'),
          z_dim=Z_DIM)
vae.compile(LEARNING_RATE, r_loss_factor=1000)
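Because every layer uses `padding='same'` with stride 2, the kernel sizes do not change the spatial dimensions: the encoder halves 96 down to 6 in four steps and the decoder doubles 6 back up to 96, which is why the decoder's differing kernel sizes (5, 5, 6, 6) still produce a `(96, 96, 3)` output. The plain-Python sketch below checks that arithmetic and reproduces the parameter counts that `vae.model.summary()` reports for the first two encoder layers (1568 and 32832). The helper functions are illustrative, not part of Keras.

```python
import math

def conv_same_out(size, stride):
    """Spatial size after a Conv2D layer with padding='same'."""
    return math.ceil(size / stride)

def conv_t_same_out(size, stride):
    """Spatial size after a Conv2DTranspose layer with padding='same'."""
    return size * stride

def conv_params(kernel, in_channels, out_channels):
    """Weights plus biases of a 2D convolution with a square kernel."""
    return kernel * kernel * in_channels * out_channels + out_channels

# Encoder: 96 -> 48 -> 24 -> 12 -> 6, using the hyperparameters above.
size, channels = 96, 3
encoder = []
for filters, kernel, stride in zip((32, 64, 64, 128), (4, 4, 4, 4), (2, 2, 2, 2)):
    size = conv_same_out(size, stride)
    encoder.append((size, filters, conv_params(kernel, channels, filters)))
    channels = filters

# Decoder starts from the reshaped (6, 6, 128) tensor.
decoder_sizes = []
for stride in (2, 2, 2, 2):
    size = conv_t_same_out(size, stride)
    decoder_sizes.append(size)

print(encoder)
print(decoder_sizes)
```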

In [7]:
vae.model.summary()

Model: "model_3"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
encoder_input (InputLayer)      [(None, 96, 96, 3)]  0                                            
__________________________________________________________________________________________________
encoder_conv_1 (Conv2D)         (None, 48, 48, 32)   1568        encoder_input[0][0]              
__________________________________________________________________________________________________
activation (Activation)         (None, 48, 48, 32)   0           encoder_conv_1[0][0]             
__________________________________________________________________________________________________
encoder_conv_2 (Conv2D)         (None, 24, 24, 64)   32832       activation[0][0]                 
____________________________________________________________________________________________

In [0]:
BATCH_SIZE = 100
EPOCHS = 10
N_IMGS = 300 * len(os.listdir(rollout_dir))  # each rollout file holds 300 frames
STEPS_PER_EPOCH = N_IMGS // BATCH_SIZE
N_LOADS_PER_BATCH = 300 // BATCH_SIZE  # number of batches taken from each file
IMAGE_SIZE = (96, 96)

def vae_training_data():
  """Yield (input, target) batches of observations for training the VAE."""
  fnames = sorted(os.listdir(rollout_dir))
  while True:
    for fname in fnames:
      new_data = np.load(os.path.join(rollout_dir, fname))['obs']
      for i in range(N_LOADS_PER_BATCH):
        # Yield a slice instead of refilling one shared buffer: Keras may
        # queue several batches ahead, and a reused array would be
        # overwritten before it is consumed.
        data = new_data[i * BATCH_SIZE:(i + 1) * BATCH_SIZE]
        yield data, data
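A training generator should not refill a single preallocated array between yields. Keras's `fit_generator` can pull several batches ahead of training into a queue (its `max_queue_size` parameter), and if those batches are all the same array object, earlier ones are overwritten before they are consumed. The toy sketch below uses a synthetic 300-step "episode" in place of real rollout files and collects every batch before reading them, an exaggerated version of queue prefetching, to show the effect. Both function names are hypothetical.

```python
import numpy as np

def batches_with_shared_buffer(episode, batch_size):
    """Fill and yield one preallocated buffer repeatedly (the pattern to avoid)."""
    buffer = np.zeros((batch_size,) + episode.shape[1:])
    for i in range(len(episode) // batch_size):
        buffer[:] = episode[i * batch_size:(i + 1) * batch_size]
        yield buffer

def batches_as_slices(episode, batch_size):
    """Yield an independent slice of the episode for every batch."""
    for i in range(len(episode) // batch_size):
        yield episode[i * batch_size:(i + 1) * batch_size]

episode = np.arange(300, dtype='float32').reshape(300, 1)  # frame index as the "pixel"
shared = list(batches_with_shared_buffer(episode, 100))
sliced = list(batches_as_slices(episode, 100))
# Every entry of `shared` is the same array, so all of them now hold frames 200-299.
print([float(b[0, 0]) for b in shared])
print([float(b[0, 0]) for b in sliced])
```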

In [0]:
X_train = vae_training_data()

In [0]:
vae.fit_with_generator(X_train, epochs=EPOCHS,
                       steps_per_epoch=STEPS_PER_EPOCH,
                       checkpoint_path=vae_weights_dir)
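The training call above does not pass `lr_decay`, so `fit_with_generator` uses its default of 1 and the learning rate stays at `LEARNING_RATE` for every epoch. The pure-Python sketch below mirrors the `schedule` closure inside `step_decay_schedule` (with `math.floor` in place of `np.floor`) to make the effect of the decay factor concrete; the name `step_decay` is illustrative.

```python
import math

def step_decay(epoch, initial_lr, decay_factor=0.5, step_size=1):
    """Pure-Python copy of the `schedule` closure in step_decay_schedule."""
    return initial_lr * (decay_factor ** math.floor(epoch / step_size))

# decay_factor=1 (the default lr_decay in fit_with_generator): the rate never changes.
constant = [step_decay(e, 1e-4, decay_factor=1) for e in range(5)]
# decay_factor=0.5: the rate halves after every epoch.
halving = [step_decay(e, 1e-4, decay_factor=0.5) for e in range(5)]
print(constant)
print(halving)
```

Passing, for example, `lr_decay=0.5` to `fit_with_generator` would switch training to the halving schedule.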

### Analyzing the VAE

First we will analyze how the VAE reconstructs images from the training set.

In [0]:
vae.model.load_weights(vae_weights_dir + 'weights.hdf5')

In [0]:
X_train = vae_training_data()
next(X_train)
next(X_train)
batch, _ = next(X_train)

In [0]:
x = batch[90]
plt.imshow(x)

In [0]:
y = vae.model.predict(x[np.newaxis])[0]  # add a batch dimension of size 1
plt.imshow(y)
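A visual comparison can be complemented with a number. The sketch below computes the per-image mean squared error; multiplied by `r_loss_factor` (1000 here), it is the same quantity as the `mse` metric Keras reports during training. Note that `vae.model` samples `z` on every call, so repeated predictions for the same image will differ slightly. The function name is illustrative, and synthetic arrays stand in for `x` and `y`.

```python
import numpy as np

def reconstruction_mse(original, reconstruction):
    """Mean squared pixel error between two images with values in [0, 1]."""
    original = np.asarray(original, dtype='float32')
    reconstruction = np.asarray(reconstruction, dtype='float32')
    return float(np.mean((original - reconstruction) ** 2))

# Synthetic example; in the notebook these would be `x` and `y` from the cells above.
a = np.zeros((96, 96, 3))
b = np.full((96, 96, 3), 0.5)
print(reconstruction_mse(a, b))
```

In the notebook, `reconstruction_mse(x, y) * 1000` gives the reconstruction term of the loss for that single image.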

Another way to test the performance of an autoencoder is to decode random vectors drawn from a standard normal distribution, the prior that the VAE's latent space is trained to match.

In [0]:
z = np.random.normal(0.0, 1.0, size=(1, Z_DIM))  # one random latent vector
plt.imshow(vae.decoder.predict(z)[0])
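A single sample says little about the decoder, and viewing several at once is more informative. The helper below is a hypothetical NumPy utility (not from the notebook) that tiles a batch of decoded images into one grid suitable for `plt.imshow`; random arrays stand in for decoder output.

```python
import numpy as np

def tile_images(images, n_cols):
    """Arrange an (N, H, W, C) batch into one (rows*H, n_cols*W, C) grid image.

    Unused grid cells at the end are left black (zeros).
    """
    n, h, w, c = images.shape
    n_rows = -(-n // n_cols)  # ceiling division
    grid = np.zeros((n_rows * h, n_cols * w, c), dtype=images.dtype)
    for idx, img in enumerate(images):
        r, col = divmod(idx, n_cols)
        grid[r * h:(r + 1) * h, col * w:(col + 1) * w] = img
    return grid

# Stand-in for decoder output; in the notebook this could be
# vae.decoder.predict(np.random.normal(0.0, 1.0, size=(8, Z_DIM))).
samples = np.random.rand(8, 96, 96, 3).astype('float32')
grid = tile_images(samples, n_cols=4)
print(grid.shape)
```

With the trained model, `plt.imshow(tile_images(vae.decoder.predict(np.random.normal(0.0, 1.0, size=(8, Z_DIM))), n_cols=4))` would show eight decoded samples side by side.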