<a href="https://colab.research.google.com/github/Karthik982018/Karthik982018/blob/main/Deep_Q_Learning_SpaceInvaders.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Deep Reinforcement Learning For Space Invaders
**Please execute the steps sequentially.**

1. For training the model, all steps **except** **STEP 11** need to be executed sequentially.
2. For testing the model, please upload the necessary weights files and run all the steps **except** **STEP 9** and **STEP 10**. **STEP 14** will also fail in this mode, because it needs the training history.

**The ROM file should be placed in the files directory in Colab along with the model files.**

The link to download the model file can be found in the "Steps to run code" section of the report.



**STEP 1) Installing everything needed to run and play the video game.**

In [None]:
# libraries required for this project; please install all of them
!pip install pyvirtualdisplay
# refresh the package index first so the installs below use current package lists
!apt-get update
!apt-get install -y xvfb python-opengl ffmpeg
!apt-get install -y cmake
!pip install --upgrade setuptools 
!pip install ez_setup 
!pip install tensorflow gym keras-rl2 gym[atari]


Reading package lists... Done
Building dependency tree       
Reading state information... Done
python-opengl is already the newest version (3.1.0+dfsg-1).
ffmpeg is already the newest version (7:3.4.8-0ubuntu0.2).
xvfb is already the newest version (2:1.19.6-1ubuntu4.10).
0 upgraded, 0 newly installed, 0 to remove and 48 not upgraded.
Hit:1 http://security.ubuntu.com/ubuntu bionic-security InRelease
Hit:2 https://cloud.r-project.org/bin/linux/ubuntu bionic-cran40/ InRelease
Ign:3 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  InRelease
Ign:4 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  InRelease
Hit:5 http://archive.ubuntu.com/ubuntu bionic InRelease
Hit:6 http://ppa.launchpad.net/c2d4u.team/c2d4u4.0+/ubuntu bionic InRelease
Hit:7 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64  Release
Hit:8 https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64  Release


**STEP 2) Importing the modules needed to create the environment, build the model, create the deep Q-learning agent, and visualise the agent playing the game.**

In [None]:
# libraries used
# provides the environment for training games
import gym
# to generate random numbers
import random
# functionalities of multi dimensional arrays
import numpy as np 
# import the Sequential Keras model for building the deep learning network
from tensorflow.keras.models import Sequential 
# importing the layer types used in the network
from tensorflow.keras.layers import Dense, Flatten, Convolution2D 
# importing the adam optimizer
from tensorflow.keras.optimizers import Adam
# importing the DQNAgent
from rl.agents import DQNAgent 
# memory used by the DQNagent
from rl.memory import SequentialMemory 
# policies used for training the agent
from rl.policy import LinearAnnealedPolicy, EpsGreedyQPolicy 

from gym.wrappers import Monitor # to keep track of the training process
import glob # to deal with file paths
import io # to deal with various types of input/output 
import base64 # to encode the data
# the imports below are used to view the recorded game
from IPython.display import HTML 
from pyvirtualdisplay import Display
from IPython import display as ipythondisplay

**STEP 3) Utility functions used to record a video of the agent playing the video game and to render the video.**

In [None]:
# We visualise the game this way because it cannot be rendered directly in Google Colab.

# create a virtual display object with a resolution of 1400x900
display = Display(visible=0, size=(1400, 900))
# start the virtual display
display.start()


# function that displays the recorded video of the agent playing
def show_video():
  # collect the .mp4 files from the video folder
  mp4list = glob.glob('video/*.mp4')
  # check whether a video exists
  if len(mp4list) > 0:
    mp4 = mp4list[0]
    # read-only binary mode is enough; the file is never written here
    with io.open(mp4, 'rb') as f:
      video = f.read()
    encoded = base64.b64encode(video) # encoding the video
    # Will display the video inline in the python notebook output cell
    ipythondisplay.display(HTML(data='''<video alt="test" autoplay 
                loop controls style="height: 400px;">
                <source src="data:video/mp4;base64,{0}" type="video/mp4" />
             </video>'''.format(encoded.decode('ascii'))))
  else: 
    print("Could not find video")
    
# function that wraps the environment so that the agent's episodes are recorded
def wrap_env(env):
  # wrap the environment with Monitor, which writes the recordings to ./video
  env = Monitor(env, './video', force=True)
  return env
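
The embedding step inside `show_video` can be looked at in isolation. The sketch below uses a hypothetical helper name, `video_tag`, and placeholder bytes instead of a real recording; it only illustrates how raw bytes become a base64 data URI inside a `<video>` element.

```python
import base64


def video_tag(video_bytes: bytes, height_px: int = 400) -> str:
    """Return an HTML <video> element with the bytes embedded as a base64 data URI."""
    # base64 turns arbitrary bytes into ASCII text that is safe inside HTML
    encoded = base64.b64encode(video_bytes).decode("ascii")
    return (
        f'<video autoplay loop controls style="height: {height_px}px;">'
        f'<source src="data:video/mp4;base64,{encoded}" type="video/mp4" />'
        "</video>"
    )


# Placeholder bytes stand in for the contents of an .mp4 file.
tag = video_tag(b"not a real video")
print(tag[:60])
```

Because the whole file is embedded in the page, this approach suits short recordings such as a single episode.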

**STEP 4) Importing the rom files needed to run the video game.**

In [None]:
# importing the ROM files; these files need to be uploaded to Colab alongside this notebook
!python -m atari_py.import_roms .

copying space_invaders.bin from ./Space Invaders (1980) (Atari, Richard Maurer - Sears) (CX2632 - 49-75153) _.bin to /usr/local/lib/python3.7/dist-packages/atari_py/atari_roms/space_invaders.bin


**STEP 5) Creating the environment and exploring its action space.**

In [None]:
# defining the environment
env = gym.make('SpaceInvaders-v0')
# finding the shape of the observation space
height, width, depth = env.observation_space.shape 

# getting the number of actions available
actions = env.action_space.n 
# getting the meaning of each action
env.unwrapped.get_action_meanings() 

['NOOP', 'FIRE', 'RIGHT', 'LEFT', 'RIGHTFIRE', 'LEFTFIRE']
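
The agent works with integer action indices, so it can help to map them back to the names listed above, for example when reading the `mean action` values in the training log. A minimal sketch, with the list copied from the output above and `describe_action` as a hypothetical helper name:

```python
# Action names copied from the output above; the list index is the integer
# action the agent sends to the environment.
ACTION_MEANINGS = ['NOOP', 'FIRE', 'RIGHT', 'LEFT', 'RIGHTFIRE', 'LEFTFIRE']

# Reverse lookup: action name -> integer index
ACTION_IDS = {name: index for index, name in enumerate(ACTION_MEANINGS)}


def describe_action(index: int) -> str:
    """Return the human-readable name of an action index."""
    return ACTION_MEANINGS[index]


print(describe_action(4))   # RIGHTFIRE
print(ACTION_IDS['LEFT'])   # 3
```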

**STEP 6) Model function which creates and returns the CNN used to train the agent.**

In [None]:
# function that defines the CNN-based deep learning model and returns it
def model(h, w, d, actions):
  """
  Creates and returns the CNN-based deep learning model. We used ReLU as the activation function for our model, as it is known to perform well.
  Parameters:
  h: height
  w: width
  d: depth
  actions: number of actions obtained from the environment (used for the output layer size)
  Returns:
  model: Keras CNN model
  """

  # create sequential model
  model = Sequential()

  # add model layers; four convolutional layers extract the features.
  # the leading 3 in input_shape is the number of stacked frames (window_length in STEP 7)
  model.add(Convolution2D(32, (8,8), strides=(4,4), activation='relu', input_shape=(3,h, w, d)))
  model.add(Convolution2D(64, (4,4), strides=(2,2), activation='relu'))
  model.add(Convolution2D(64, (3,3), activation='relu'))
  model.add(Convolution2D(64, (2,2), activation = 'relu'))
  # flatten the n-dimensional array into a 1D array to pass as input to the dense layers
  model.add(Flatten())
  # the hidden layers shrink from 512 to 128 neurons; the output layer has one unit per action
  model.add(Dense(512, activation='relu'))
  model.add(Dense(256, activation='relu'))
  model.add(Dense(128, activation ='relu'))
  model.add(Dense(actions, activation='linear'))
  return model 
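
The output shapes reported by `model.summary()` in STEP 8 can be reproduced by hand. The sketch below assumes a 210x160 frame (the usual Atari frame size, not stated explicitly in this notebook) and unpadded ("valid") convolutions; `conv_out` is a hypothetical helper, not a Keras function.

```python
def conv_out(size: int, kernel: int, stride: int = 1) -> int:
    """Output length along one axis of an unpadded ('valid') convolution."""
    return (size - kernel) // stride + 1


# (kernel, stride) of the four Convolution2D layers in model()
CONV_LAYERS = [(8, 4), (4, 2), (3, 1), (2, 1)]

h, w = 210, 160  # assumed frame height and width
shapes = []
for kernel, stride in CONV_LAYERS:
    h, w = conv_out(h, kernel, stride), conv_out(w, kernel, stride)
    shapes.append((h, w))

window, filters = 3, 64              # stacked frames, filters of the last conv layer
flat = window * h * w * filters      # size after Flatten()
dense_params = flat * 512 + 512      # weights + biases of the first Dense layer

print(shapes)        # [(51, 39), (24, 18), (22, 16), (21, 15)]
print(flat)          # 60480
print(dense_params)  # 30966272
```

Almost all of the parameters sit in the first dense layer, which is a direct consequence of flattening three full-resolution feature maps.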

**STEP 7) Agent function creates and returns a deep q learning agent.**

In [None]:
# defining the agent with our DRL model and action size
def agent(model, actions):
    """
    Creates and returns the DQN agent.
    Parameters:
    model: the CNN model
    actions: number of actions obtained from the environment
    Returns:
    dqn: deep Q-learning agent

    """
    # first we define the policy: epsilon decays linearly from 1 to 0.1 over nb_steps steps
    policy = LinearAnnealedPolicy(EpsGreedyQPolicy(), attr='eps', value_max=1., value_min=.1, value_test=.2, nb_steps=10000) # defining the policy
    memory = SequentialMemory(limit=1000, window_length=3) # defining the memory
    # defining the agent with policy, model and memory settings.
    dqn = DQNAgent(model=model, memory=memory, policy=policy,enable_dueling_network=True, dueling_type='avg', nb_actions=actions, nb_steps_warmup=1000) 
    return dqn
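
The exploration schedule configured above can be sketched as a plain function. This is an illustration of linear annealing with the same parameter names as the `LinearAnnealedPolicy` call, not the library's own code; `annealed_eps` is a hypothetical helper.

```python
def annealed_eps(step: int, value_max: float = 1.0, value_min: float = 0.1,
                 nb_steps: int = 10000) -> float:
    """Linearly decay epsilon from value_max to value_min over nb_steps steps."""
    slope = -(value_max - value_min) / nb_steps
    # after nb_steps the value is clamped at value_min
    return max(value_min, slope * step + value_max)


for step in (0, 1231, 5000, 10000, 20000):
    print(step, round(annealed_eps(step), 3))
```

At step 1231, roughly the middle of the post-warm-up part of episode 2, this gives about 0.889, in line with the `mean_eps` value logged for that episode in STEP 9.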


**STEP 8) Building the model and using the model and action space to build the agent.**

In [None]:
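# create the model
# note: this rebinds the name `model` from the function to the network,
# so re-running this cell requires re-running STEP 6 first
model = model(height, width, depth, actions)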
# create DQN agent using model 
dqn = agent(model, actions)
# compile the agent using adam optimizer
dqn.compile(Adam(learning_rate=1e-4))
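
The agent is created with `enable_dueling_network=True` and `dueling_type='avg'`. As commonly described, a dueling head splits the estimate into a state value V and per-action advantages A, then recombines them as Q = V + A - mean(A). The sketch below shows that combination on plain Python lists; `dueling_avg` is a hypothetical helper, and the numbers are made up for illustration.

```python
def dueling_avg(value: float, advantages: list) -> list:
    """Combine a state value and per-action advantages into Q-values ('avg' variant)."""
    mean_advantage = sum(advantages) / len(advantages)
    # subtracting the mean keeps the average Q-value equal to the state value
    return [value + a - mean_advantage for a in advantages]


q_values = dueling_avg(2.0, [1.0, 0.0, -1.0])
print(q_values)  # [3.0, 2.0, 1.0]
```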


In [None]:
# view the summary of the model we built
model.summary()

Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 conv2d (Conv2D)             (None, 3, 51, 39, 32)     6176      
                                                                 
 conv2d_1 (Conv2D)           (None, 3, 24, 18, 64)     32832     
                                                                 
 conv2d_2 (Conv2D)           (None, 3, 22, 16, 64)     36928     
                                                                 
 conv2d_3 (Conv2D)           (None, 3, 21, 15, 64)     16448     
                                                                 
 flatten (Flatten)           (None, 60480)             0         
                                                                 
 dense (Dense)               (None, 512)               30966272  
                                                                 
 dense_1 (Dense)             (None, 256)               1

**STEP 9) Fit method is used to train the deep Q learning agent in the Environment.**

In [None]:
# Training our agent. This step takes more than 2 hours to complete. We set visualize=False because this is the training step.
history=dqn.fit(env, nb_steps=10000, visualize=False, verbose=2)

Training for 10000 steps ...


  593/10000: episode: 1, duration: 21.073s, episode steps: 593, steps per second:  28, episode reward: 60.000, mean reward:  0.101 [ 0.000, 25.000], mean action: 2.354 [0.000, 5.000],  loss: --, mean_q: --, mean_eps: --


 1462/10000: episode: 2, duration: 488.762s, episode steps: 869, steps per second:   2, episode reward: 410.000, mean reward:  0.472 [ 0.000, 200.000], mean action: 2.435 [0.000, 5.000],  loss: 16.510357, mean_q: 2.924405, mean_eps: 0.889210
 2294/10000: episode: 3, duration: 836.976s, episode steps: 832, steps per second:   1, episode reward: 225.000, mean reward:  0.270 [ 0.000, 30.000], mean action: 2.524 [0.000, 5.000],  loss: 3.193858, mean_q: 2.838968, mean_eps: 0.831025
 3467/10000: episode: 4, duration: 1179.515s, episode steps: 1173, steps per second:   1, episode reward: 455.000, mean reward:  0.388 [ 0.000, 200.000], mean action: 2.624 [0.000, 5.000],  loss: 2.567246, mean_q: 2.607145, mean_eps: 0.740800
 3833/10000: episode: 5, duration: 366.567s, episode steps: 366, steps per second:   1, episode reward: 65.000, mean reward:  0.178 [ 0.000, 20.000], mean action: 2.549 [0.000, 5.000],  loss: 5.328097, mean_q: 2.236587, mean_eps: 0.671545
 4597/10000: episode: 6, duration: 7
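
Each complete line of the training log above carries the global step, the episode number, the episode length and the episode reward. If the `history` object is not available, these values can be recovered from the text. The sketch below is one possible parser; the regular expression is written against the log lines shown here and is an assumption about their format, not something keras-rl provides.

```python
import re

# Matches the leading fields of one verbose=2 log line as printed above.
LOG_LINE = re.compile(
    r"\s*(?P<step>\d+)/(?P<total>\d+): episode: (?P<episode>\d+), "
    r"duration: (?P<duration>[\d.]+)s, episode steps: (?P<episode_steps>\d+), "
    r".*?episode reward: (?P<reward>-?[\d.]+),"
)


def parse_log_line(line: str):
    """Return the parsed fields of a log line, or None if the line is incomplete."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    return {
        "step": int(match["step"]),
        "episode": int(match["episode"]),
        "episode_steps": int(match["episode_steps"]),
        "reward": float(match["reward"]),
    }


sample = ("  593/10000: episode: 1, duration: 21.073s, episode steps: 593, "
          "steps per second:  28, episode reward: 60.000, mean reward:  0.101")
parsed = parse_log_line(sample)
print(parsed)
# mean reward per step, which the log reports as 0.101
print(round(parsed["reward"] / parsed["episode_steps"], 3))
```

Truncated lines, such as the last one above, simply yield `None`.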

**STEP 10) Once the model is trained, the trained weights are saved.**

In [None]:
# We save the weights, as it is expensive to train again and again.
dqn.save_weights('weights.h5f')

**STEP 11) The saved trained weights are loaded to run the agent.**

In [None]:
# step to reload already stored weights
dqn.load_weights('weights.h5f')
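
When running in testing mode, it can be useful to confirm that the uploaded weights files are present before calling `load_weights`. The sketch below assumes that, with TensorFlow 2.x, `weights.h5f` is stored as a TensorFlow checkpoint consisting of an index file and a data shard; those suffixes are an assumption and may differ in your setup. `missing_weight_files` is a hypothetical helper.

```python
import os
import tempfile

# Assumption: the weights were written as a TensorFlow checkpoint, i.e.
# '<prefix>.index' plus '<prefix>.data-00000-of-00001'. Adjust if needed.
ASSUMED_SUFFIXES = (".index", ".data-00000-of-00001")


def missing_weight_files(prefix="weights.h5f", directory=".",
                         suffixes=ASSUMED_SUFFIXES):
    """Return the expected weight file names that are not present in directory."""
    expected = [prefix + suffix for suffix in suffixes]
    return [name for name in expected
            if not os.path.exists(os.path.join(directory, name))]


# Demonstration in a temporary directory containing only the index file.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "weights.h5f.index"), "w").close()
    still_missing = missing_weight_files(directory=tmp)

print(still_missing)  # ['weights.h5f.data-00000-of-00001']
```

In the notebook, an empty result from `missing_weight_files()` would suggest that the upload is complete.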

**STEP 12) Running the agent in the environment using the information gained from training.**

In [None]:
# this step lets the agent play one episode in the Space Invaders environment, using what it learned during training
scores = dqn.test(wrap_env(env), nb_episodes=1, visualize=True)

Testing for 1 episodes ...


Episode 1: reward: 260.000, steps: 960


**STEP 13) Playing the video of the agent playing in the environment.**

In [None]:
# we visualize the episode played by the agent as recorded video
show_video()

**STEP 14) Plotting the average reward earned per episode in the training phase.**

Running this step when only testing the model will throw an error: during the testing phase the model is not trained again, so there will be no `history` object.

In [None]:
# import matplotlib for plotting graph
import matplotlib.pyplot as plt
# x axis -  number of episodes
plt.xlabel("episodes")
# y axis - running average of the episode rewards (cumulative sum divided by episode count)
plt.ylabel("average reward")
average_reward = np.cumsum(history.history['episode_reward']) / (np.arange(len(history.history['episode_reward'])) + 1)
plt.plot(average_reward)
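
The quantity plotted above is a running mean: the cumulative sum of episode rewards divided by the number of episodes so far. A small sketch without NumPy, using the first five episode rewards from the STEP 9 log as sample data (`running_mean` is a hypothetical helper):

```python
from itertools import accumulate


def running_mean(values):
    """Return the mean of the first k values, for k = 1..len(values)."""
    return [total / count
            for count, total in enumerate(accumulate(values), start=1)]


# Episode rewards of episodes 1 to 5, taken from the training log in STEP 9
rewards = [60.0, 410.0, 225.0, 455.0, 65.0]
averages = running_mean(rewards)
print([round(a, 2) for a in averages])  # [60.0, 235.0, 231.67, 287.5, 243.0]
```

A running mean smooths out the large swings between individual episodes, which makes a trend easier to see with so few episodes.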