<a href="https://colab.research.google.com/github/DeepLearningVision-2019/a6_dl_pong/blob/master/Pong_Deep_Q_Learning.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Playing Pong with Deep Reinforcement Learning

---

Read the paper [Playing Atari with Deep Reinforcement Learning](https://arxiv.org/pdf/1312.5602.pdf) (also available in the 'Papers' folder of the course materials) and implement a model that can play Atari games.

The goals of this project are the following:

- Read and understand the paper.
- Add a brief summary of the paper at the start of the notebook.
- Describe and implement the required preprocessing; you can add your own steps if needed.
- Load an Atari environment from OpenAI Gym; start with Pong, then try at least one more.
- Define the convolutional model needed for training.
- Apply Deep Q-learning with your model.
- Use the model to play a game and show the result.

**Rubric:**

1. Include a summary of the paper that covers what the paper does and why, as well as the preprocessing steps and the model it introduces.
2. Read images from the environment and apply the correct preprocessing steps.
3. Define an agent class with the needed functions.
4. Define the model within the agent class.
5. Train the model on the Pong environment, saving the weights after each episode.
6. Test the model by making it play Pong.
7. Train and test the agent on another Atari environment of your choosing.


## Add a summary of the paper in this cell

### Basic installs and imports for Colab

In [1]:
# Remove " > /dev/null 2>&1" to see what is going on under the hood
!pip install gym pyvirtualdisplay > /dev/null 2>&1
# Refresh the package index before installing system packages
!apt-get update > /dev/null 2>&1
!apt-get install -y xvfb python-opengl ffmpeg > /dev/null 2>&1
!apt-get install -y cmake > /dev/null 2>&1
!pip install --upgrade setuptools 2>&1
!pip install ez_setup > /dev/null 2>&1
!pip install gym[atari] > /dev/null 2>&1
!pip install gym[box2d] > /dev/null 2>&1

Requirement already up-to-date: setuptools in /usr/local/lib/python3.6/dist-packages (40.8.0)


In [2]:
import gym
from gym import logger as gymlogger
from gym.wrappers import Monitor

import matplotlib
import matplotlib.pyplot as plt

import cv2
import numpy as np
import random, math

from keras import models, layers, optimizers

from collections import deque

import glob, io, base64

from IPython.display import HTML
from IPython import display as ipythondisplay
from pyvirtualdisplay import Display

gymlogger.set_level(40) #error only
%matplotlib inline

Using TensorFlow backend.


### Functions that wrap and display a video in Colab

In [0]:
"""
Utility functions to enable video recording of a gym environment and to display it.
To enable video, just do "env = wrap_env(env)".
"""

def show_video():
  mp4list = glob.glob('video/*.mp4')
  if len(mp4list) > 0:
    mp4 = mp4list[0]
    video = io.open(mp4, 'rb').read()  # read-only binary mode is sufficient
    encoded = base64.b64encode(video)
    ipythondisplay.display(HTML(data='''<video alt="test" autoplay 
                loop controls style="height: 400px;">
                <source src="data:video/mp4;base64,{0}" type="video/mp4" />
             </video>'''.format(encoded.decode('ascii'))))
  else: 
    print("Could not find video")
    

def wrap_env(env):
  env = Monitor(env, './video', force=True)
  return env

In [4]:
display = Display(visible=0, size=(1400, 900))
display.start()

<Display cmd_param=['Xvfb', '-br', '-nolisten', 'tcp', '-screen', '0', '1400x900x24', ':1009'] cmd=['Xvfb', '-br', '-nolisten', 'tcp', '-screen', '0', '1400x900x24', ':1009'] oserror=None return_code=None stdout="None" stderr="None" timeout_happened=False>

In [6]:
# Load the Pong environment
env = wrap_env(gym.make('Pong-v0'))

# Note: shape[0] is only the frame height (210); a full observation is 210x160x3
state_size = env.observation_space.shape[0]
action_size = env.action_space.n

print(state_size, action_size)

batch_size = 32

n_episodes = 1001

210 6


In [7]:
observation = env.reset()

while True:
  
    env.render()
    
    #your agent goes here
    action = env.action_space.sample() 
         
    observation, reward, done, info = env.step(action) 

    if done:
        break
            
env.close()
show_video()

## Define the Deep Q-learning agent

In [0]:
class DQNAgent:
    
    def __init__(self, state_size, action_size):
      
        self.state_size = state_size
        self.action_size = action_size
        self.model = self._build_model()
        

    def _build_model(self):
        
        model = models.Sequential()
        model.add(layers.Dense(24, input_dim = self.state_size, activation='relu'))
        model.add(layers.Dense(self.action_size))
        
        model.compile(loss='mse', optimizer='adam')
        
        return model
    
    def remember(self, state, action, reward, next_state, done):
        '''
            state, action, reward at current time
            next_state is the state that occurs after the state-action
            done is if the episode ended
        '''
        pass
        
    def action(self, state):
        
        pass
        
    def train(self, batch_size):
        
        pass
    
           
    def load(self, name):
        self.model.load_weights(name)
        
    def save(self, name):
        self.model.save_weights(name)
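
The `remember`, `action`, and `train` methods above are left as stubs. The following is a minimal sketch of the three pieces they usually rely on in Deep Q-learning: an experience replay buffer, epsilon-greedy action selection, and the Q-learning target. The names `ReplayMemory`, `epsilon_greedy`, and `q_learning_targets`, as well as the default `gamma=0.99`, are hypothetical choices for illustration and are not part of the notebook. The sketch uses only NumPy so that it stays independent of the model.

```python
import random
from collections import deque

import numpy as np


class ReplayMemory:
    """Fixed-size buffer of (state, action, reward, next_state, done) tuples."""

    def __init__(self, capacity):
        # A deque with maxlen drops the oldest transition when full.
        self.buffer = deque(maxlen=capacity)

    def remember(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement, capped at the buffer size.
        k = min(batch_size, len(self.buffer))
        return random.sample(list(self.buffer), k)

    def __len__(self):
        return len(self.buffer)


def epsilon_greedy(q_values, epsilon, n_actions):
    """Pick a random action with probability epsilon, else the greedy one."""
    if random.random() < epsilon:
        return random.randrange(n_actions)
    return int(np.argmax(q_values))


def q_learning_targets(rewards, next_q_values, dones, gamma=0.99):
    """y = r for terminal transitions, else r + gamma * max_a' Q(s', a')."""
    rewards = np.asarray(rewards, dtype=np.float32)
    dones = np.asarray(dones, dtype=bool)
    max_next_q = np.max(np.asarray(next_q_values, dtype=np.float32), axis=1)
    # Multiplying by (~dones) zeroes the bootstrap term at episode ends.
    return rewards + gamma * max_next_q * (~dones)
```

In an agent, `train` would typically sample a minibatch, predict Q-values for the next states with the model, build targets with `q_learning_targets`, and fit the model on them for the actions that were actually taken.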

In [23]:
agent = DQNAgent(state_size, action_size)
agent.model.summary()

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_3 (Dense)              (None, 24)                5064      
_________________________________________________________________
dense_4 (Dense)              (None, 6)                 150       
Total params: 5,214
Trainable params: 5,214
Non-trainable params: 0
_________________________________________________________________
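
The summary above is for the placeholder dense model, which sees only 210 inputs. The paper instead describes a convolutional network over an 84×84×4 input: 16 filters of 8×8 with stride 4, then 32 filters of 4×4 with stride 2, then a fully connected layer of 256 rectifier units, and a linear output with one Q-value per action. Below is a hedged sketch of that architecture. The function names `conv_output_size` and `build_dqn_cnn` are hypothetical, Keras is imported lazily inside the builder, and the `mse`/`adam` compile settings are carried over from this notebook rather than from the paper (which trains with RMSProp).

```python
def conv_output_size(size, kernel, stride):
    """Spatial size after an unpadded ('valid') convolution."""
    return (size - kernel) // stride + 1


def build_dqn_cnn(n_actions, input_shape=(84, 84, 4)):
    """Sketch of the paper's network; assumes Keras is installed."""
    from keras import models, layers  # imported lazily, only when building

    model = models.Sequential()
    model.add(layers.Conv2D(16, (8, 8), strides=4, activation='relu',
                            input_shape=input_shape))
    model.add(layers.Conv2D(32, (4, 4), strides=2, activation='relu'))
    model.add(layers.Flatten())
    model.add(layers.Dense(256, activation='relu'))
    model.add(layers.Dense(n_actions))  # linear output: one Q-value per action
    # Compile settings follow the notebook's dense model, not the paper.
    model.compile(loss='mse', optimizer='adam')
    return model


# Spatial size after each convolution for an 84x84 input.
after_conv1 = conv_output_size(84, 8, 4)
after_conv2 = conv_output_size(after_conv1, 4, 2)
flattened = after_conv2 * after_conv2 * 32
```

For an 84×84 input the spatial size goes 84 → 20 → 9, so the flattened feature vector has 9 · 9 · 32 = 2592 entries before the 256-unit layer.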


### Needed preprocessing steps

In [0]:
def preprocessFrame(image):
  
  return image
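
The stub above returns the frame unchanged. The paper's preprocessing converts the 210×160 RGB frame to grayscale, downsamples it to 110×84, and crops an 84×84 region that roughly captures the playing area. A minimal NumPy-only sketch of those steps follows. The function name `preprocess_frame_sketch`, the nearest-neighbour downsampling, the luminance weights, and the `crop_top=18` offset are assumptions made for illustration; `cv2.cvtColor` and `cv2.resize` could be used instead.

```python
import numpy as np


def preprocess_frame_sketch(image, crop_top=18):
    """Turn a 210x160x3 RGB frame into an 84x84 uint8 grayscale frame."""
    # Luminance-weighted grayscale conversion (weights are an assumption).
    weights = np.array([0.299, 0.587, 0.114], dtype=np.float32)
    gray = image.astype(np.float32) @ weights

    # Nearest-neighbour downsample to 110x84 by picking evenly spaced
    # rows and columns of the original frame.
    rows = np.arange(110) * gray.shape[0] // 110
    cols = np.arange(84) * gray.shape[1] // 84
    small = gray[rows][:, cols]

    # Keep an 84x84 window; crop_top is a hypothetical offset meant to
    # skip the score area at the top of the screen.
    cropped = small[crop_top:crop_top + 84, :]
    return cropped.astype(np.uint8)
```

Storing frames as `uint8` keeps a replay memory small; scaling to floats can be deferred until a batch is fed to the model.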

## Training with the environment images

In [0]:
try:
    for e in range(n_episodes):
        
        state = env.reset()
        
        total_reward = 0
        done = False
        
        while not done:
            
            #env.render()
            
            # Takes a random action from the action space of the environment
            #action = agent.action(state)
            action = env.action_space.sample() 
            
            next_state, reward, done, info = env.step(action)
            
            # Define the reward for this problem
            reward = reward if not done else -10
            total_reward += reward
            
            state = next_state
        
        if e % 50 == 0:
            agent.save('{:04d}'.format(e) + 'hdf5')
                
        
finally:
    env.close()
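
The loop above still samples random actions and passes raw frames around. In the paper, the state given to the network is a stack of the last 4 preprocessed frames, and the exploration rate epsilon is annealed linearly from 1.0 to 0.1 over the first million frames and then held at 0.1. The sketch below shows both helpers under those assumptions; the names `FrameStack` and `linear_epsilon` are hypothetical and not part of the notebook.

```python
from collections import deque

import numpy as np


class FrameStack:
    """Keeps the last k preprocessed frames as one (H, W, k) state."""

    def __init__(self, k=4):
        self.frames = deque(maxlen=k)

    def reset(self, frame):
        # At episode start, fill the stack with copies of the first frame.
        for _ in range(self.frames.maxlen):
            self.frames.append(frame)
        return self.state()

    def push(self, frame):
        # Appending to a full deque discards the oldest frame.
        self.frames.append(frame)
        return self.state()

    def state(self):
        # Stack along a new last axis: oldest frame first, newest last.
        return np.stack(list(self.frames), axis=-1)


def linear_epsilon(step, start=1.0, end=0.1, anneal_steps=1_000_000):
    """Linearly anneal epsilon from start to end, then hold it at end."""
    fraction = min(step / anneal_steps, 1.0)
    return start + fraction * (end - start)
```

Inside the episode loop, one would call `reset` with the first preprocessed frame, `push` after every `env.step`, and use `linear_epsilon` with a global step counter to decide how often to act randomly.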

In [0]:
!ls

0000hdf5  0200hdf5  0400hdf5  0600hdf5	0800hdf5  1000hdf5
0050hdf5  0250hdf5  0450hdf5  0650hdf5	0850hdf5  sample_data
0100hdf5  0300hdf5  0500hdf5  0700hdf5	0900hdf5  video
0150hdf5  0350hdf5  0550hdf5  0750hdf5	0950hdf5


### Test your model

In [0]:

env = wrap_env(gym.make('Pong-v0'))
agent.load('0700hdf5')

try:
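      state = env.reset()
      # Note: a raw Pong frame holds 210*160*3 values, so this reshape only
      # works once state_size matches the size of the (preprocessed) state
      state = np.reshape(state, [1, state_size])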

      total_reward = 0
      done = False

      while not done:
      #for time in range(200):

          env.render()

          # Ask the trained agent for an action given the current state
          action = agent.action(state)

          next_state, reward, done, info = env.step(action)

          reward = reward if not done else -10
          total_reward += reward

          next_state = np.reshape(next_state, [1, state_size])
          state = next_state
        
finally:
    env.close()       
    show_video()

## Train and test your agent with another Atari environment