# Chapter 8: Play

This notebook trains an agent for the [OpenAI Gym `CarRacing-v0` environment](https://gym.openai.com/envs/CarRacing-v0/).

## World Model

This notebook uses the [World Model architecture](https://arxiv.org/abs/1803.10122) to train an agent for the `CarRacing-v0` environment, using the model's own generated "dream" of the environment. The code is based on the implementation in [this repository](https://github.com/AppliedDataSciencePartners/WorldModels).

The model is broken up into 3 main components: a variational autoencoder (VAE), a recurrent neural network with a mixture density network (MDN-RNN), and finally a controller.

### The Variational Autoencoder

The VAE is trained first. It encodes observations of different game states into a normally distributed, lower-dimensional latent space.
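
As a rough illustration of what "normally distributed latent space" means in practice, the sketch below samples a latent vector from a mean and log-variance such as an encoder might produce. The function name `sample_latent` and the 32-dimensional latent size are assumptions made for this example, not values defined in this notebook.

```python
import numpy as np

def sample_latent(mu, log_var, rng):
    """Draw z = mu + sigma * epsilon, with epsilon ~ N(0, I)."""
    sigma = np.exp(0.5 * log_var)            # convert log-variance to std dev
    epsilon = rng.standard_normal(mu.shape)  # unit Gaussian noise
    return mu + sigma * epsilon

rng = np.random.default_rng(0)
mu = np.zeros(32)        # hypothetical latent mean for one observation
log_var = np.zeros(32)   # a log-variance of 0 corresponds to unit variance
z = sample_latent(mu, log_var, rng)
```

Writing the sample as `mu + sigma * epsilon` keeps the randomness in `epsilon`, which is what lets gradients flow through `mu` and `log_var` when the same computation is done inside a training framework.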

### The MDN-RNN

The MDN-RNN is trained after the VAE. Given the VAE's encoding of the current observation, the most recent action, and the current reward, it predicts a distribution over the next latent state along with the reward at that state. It consists of an LSTM network followed by a mixture density network (MDN) output layer, which allows the next state to be sampled from a mixture of several normal distributions.
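
To make the sampling step concrete, here is a minimal sketch of drawing a next latent state from mixture parameters. It assumes the MDN outputs, for every latent dimension, a set of mixture weights `pi`, means `mu`, and standard deviations `sigma`; the function name and the sizes (32 dimensions, 5 components) are hypothetical.

```python
import numpy as np

def sample_next_latent(pi, mu, sigma, rng):
    """Sample one value per latent dimension from a mixture of Gaussians.

    pi, mu, and sigma all have shape (n_dims, n_components), and every
    row of pi sums to 1.
    """
    n_dims, n_components = pi.shape
    # Choose one mixture component per dimension according to its weight.
    chosen = np.array([rng.choice(n_components, p=pi[d])
                       for d in range(n_dims)])
    rows = np.arange(n_dims)
    # Draw from the chosen normal distribution in each dimension.
    return mu[rows, chosen] + sigma[rows, chosen] * rng.standard_normal(n_dims)

rng = np.random.default_rng(0)
n_dims, n_components = 32, 5   # hypothetical sizes
pi = np.full((n_dims, n_components), 1.0 / n_components)
mu = rng.standard_normal((n_dims, n_components))
sigma = np.ones((n_dims, n_components))
z_next = sample_next_latent(pi, mu, sigma, rng)
```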

### The Controller

The controller is a densely connected neural network whose input is the concatenation of the VAE's latent encoding and the hidden state of the LSTM network. Its 3 output neurons correspond to the 3 continuous components of the agent's action (steer, accelerate, brake).
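
The sketch below shows one way such a controller could map its inputs to an action. It assumes a single dense layer, a 32-dimensional latent vector, and a 256-dimensional hidden state, and it squashes the outputs into the `CarRacing-v0` action ranges (steering between -1 and 1, acceleration and braking between 0 and 1). The function name, the sizes, and the choice of squashing functions are illustrative assumptions.

```python
import numpy as np

def controller_action(z, h, weights, bias):
    """Compute (steer, accelerate, brake) from one dense layer over [z, h]."""
    features = np.concatenate([z, h])
    raw = weights @ features + bias
    steer = np.tanh(raw[0])                  # squashed to between -1 and 1
    gas = 1.0 / (1.0 + np.exp(-raw[1]))      # sigmoid, between 0 and 1
    brake = 1.0 / (1.0 + np.exp(-raw[2]))    # sigmoid, between 0 and 1
    return np.array([steer, gas, brake])

z = np.zeros(32)     # hypothetical VAE latent vector
h = np.zeros(256)    # hypothetical LSTM hidden state
weights = np.zeros((3, z.size + h.size))
bias = np.zeros(3)
action = controller_action(z, h, weights, bias)
```

With 288 inputs and 3 outputs, this layer has only 867 parameters, which is small compared with the VAE and the MDN-RNN.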

## Setup

In [0]:
!pip3 install Box2D gym

In [0]:
!apt-get install -y xvfb python-opengl > /dev/null 2>&1

In [0]:
!pip3 install pyvirtualdisplay

In [0]:
import os
from google.colab import drive

drive.mount('/content/gdrive/')
base_dir = '/content/gdrive/My Drive/gdl_models/world/'
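rollout_dir = os.path.join(base_dir, 'rollout/')
# Create the rollout directory up front so saving episodes does not fail.
os.makedirs(rollout_dir, exist_ok=True)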

In [0]:
from pyvirtualdisplay import Display

display = Display(visible=0, size=(300, 300))
display.start()

## Generating the Rollout Data

The code below generates the _rollout data_: the observations, actions, rewards, and done flags recorded while an agent acts randomly in the environment.

In [0]:
import gym
import numpy as np
import time

def scale_observation(obs):
  """Scale observation pixel values to [0, 1]."""
  return obs.astype('float32') / 255.0


def collect_rollout_data(total_episodes=200, timesteps=300,
                         action_refresh_rate=20):
  """Collect the rollout data for training the VAE."""
  env = gym.make('CarRacing-v0')

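  for s in range(total_episodes):
    print('Running episode:\t', s)
    # Include the episode index so that two episodes finishing within the
    # same second cannot overwrite each other's file.
    episode_id = '{}_{}'.format(int(time.time()), s)
    filename = os.path.join(rollout_dir, episode_id + '.npz')
    obs = env.reset()
    env.render()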

    observations = []
    actions = []
    rewards = []
    done_sequence = []

    # Values recorded with the first observation, before any step is taken.
    reward = -0.1
    done = False

    for t in range(timesteps):
      # Sample a new random action every `action_refresh_rate` steps and
      # repeat it in between.
      if t % action_refresh_rate == 0:
        action = env.action_space.sample()
      observations.append(scale_observation(obs))
      actions.append(action)
      rewards.append(reward)
      done_sequence.append(done)

      obs, reward, done, info = env.step(action)
      env.render()

    np.savez_compressed(filename, obs=observations, action=actions,
                        reward=rewards, done=done_sequence)
  env.close()

In [0]:
collect_rollout_data()

## Implementing and Training the VAE

Below we implement the variational autoencoder (VAE) that the model uses to encode each game state as a normal distribution in a lower-dimensional latent space.
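
Before the VAE can be trained, the saved rollouts have to be read back from disk. The following is a minimal sketch of that step, assuming each `.npz` file was written by `collect_rollout_data` and therefore contains an `obs` array. The helper name `load_rollout_observations` is hypothetical.

```python
import glob
import os

import numpy as np

def load_rollout_observations(directory):
    """Stack the `obs` arrays from every .npz file in `directory`."""
    frames = []
    for path in sorted(glob.glob(os.path.join(directory, '*.npz'))):
        # Each file holds one episode; only the observations are needed here.
        with np.load(path) as data:
            frames.append(data['obs'])
    if not frames:
        raise FileNotFoundError('No .npz rollout files found in ' + directory)
    # Result has shape (total_frames, height, width, channels).
    return np.concatenate(frames, axis=0)
```

Calling `load_rollout_observations(rollout_dir)` would return every recorded frame in a single array. With the default settings above that is 200 episodes of 300 frames each, so loading only a subset of the files may be necessary if memory is limited.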

In [0]:
# TODO