# Training from Images with Dueling Double Deep Q Network (DDDQN)

By the end of this notebook you will know how to use images as states, preprocess the neural network input data, clip and normalize the reward function, and define the type and capacity of the experience memory.

The environment selected for this tutorial is the classic Atari game Space Invaders, provided by OpenAI Gym.

In [None]:
from RL_Problem import rl_problem
from RL_Agent import dddqn_agent
from RL_Agent.base.utils.Memory.deque_memory import Memory as deq_m
import numpy as np
import matplotlib.pyplot as plt
import gym
from RL_Agent.base.utils import agent_saver, history_utils
from RL_Agent.base.utils.networks import networks

## Preprocessing and Normalization
We preprocess the input images to reduce their dimensionality: we crop the edges, downsample, convert to grayscale, and normalize the pixel values to [0, 1]. The function below performs all of these steps.

In [None]:
def atari_preprocess(obs):
    # Crop to rows 20-199 and downsample by 2 in both dimensions
    obs = obs[20:200:2, ::2]

    # Convert the image to grayscale by averaging the color channels
    obs = obs.mean(axis=2)

    # Normalize pixel values to [0, 1]
    obs = obs / 255.

    # Add a channel axis: 2D array of shape (90, 80) -> 3D array of shape (90, 80, 1)
    obs = obs[:, :, np.newaxis]

    return obs
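
As a quick sanity check, the preprocessing can be run on a synthetic frame without loading the environment. This is a minimal sketch: the function is repeated so the snippet runs on its own, and `fake_frame` is a hypothetical stand-in that assumes a 210 x 160 RGB frame of type uint8.

```python
import numpy as np


def atari_preprocess(obs):
    # Same steps as the cell above, repeated so this check is self-contained.
    obs = obs[20:200:2, ::2]       # crop rows and downsample by 2
    obs = obs.mean(axis=2)         # average the color channels (grayscale)
    obs = obs / 255.               # scale pixel values to [0, 1]
    return obs[:, :, np.newaxis]   # add a trailing channel axis


# Hypothetical stand-in for a raw frame: 210 x 160 pixels, 3 color channels.
rng = np.random.default_rng(0)
fake_frame = rng.integers(0, 256, size=(210, 160, 3), dtype=np.uint8)

prep = atari_preprocess(fake_frame)
print(prep.shape, prep.dtype)
print(prep.min() >= 0.0, prep.max() <= 1.0)
```

The resulting shape is the value that has to be passed later as the agent's state_size.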


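We also clip and normalize the reward. The next function normalizes the reward as reward' = log(1 + reward) and clips the result to [-1, 1]. Note that log(1 + reward) is only defined for reward > -1, so this transform assumes non-negative rewards, as is the case for the score-based rewards of Space Invaders.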

In [None]:
def clip_norm_atari_reward(rew):
    return np.clip(np.log(1+rew), -1, 1)
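
To see what the transform does in practice, the short sketch below applies it to a few raw reward values. The sample values are chosen only for illustration, and the function is repeated so the snippet runs on its own.

```python
import numpy as np


def clip_norm_atari_reward(rew):
    # Same transform as the cell above: log(1 + r), then clip to [-1, 1].
    return np.clip(np.log(1 + rew), -1, 1)


# Illustrative raw rewards (not taken from an actual game run).
for raw in (0, 1, 5, 30, 200):
    print(raw, float(clip_norm_atari_reward(raw)))
```

Since log(1 + r) exceeds 1 as soon as r is larger than e - 1 (about 1.72), every raw reward of 2 or more is mapped to exactly 1 by the clipping.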

## Defining the Environment

We define the Gym environment.

In [None]:
environment = "SpaceInvaders-v0"
env = gym.make(environment)

Next, we visualize the original input and the preprocessed input side by side.

In [None]:
aux_obs = env.reset()
aux_prep_obs = atari_preprocess(aux_obs)
env.reset()

plt.figure()
plt.subplot(121)
plt.imshow(aux_obs)
plt.subplot(122)
# Drop the trailing channel axis so imshow receives a 2D array
plt.imshow(aux_prep_obs[:, :, 0], cmap='gray')
plt.show()

## Defining the Neural Network Architecture

We define the network architecture using the function "dueling_dqn_net" from "RL_Agent.base.utils.networks.networks.py", which returns a dictionary. DDDQN has a particular network architecture that we have split into three subnetworks. The first is the common network, which receives the input data. As you can see in the cell below, the common network uses convolutional layers to process the image input, followed by one dense layer. After that, the network splits in two: the advantage network and the value network. Both subnetworks receive the output of the common subnetwork and, as their names suggest, compute the "advantage" A(s, a) of taking an action in a given state and the "value" V(s) of being in a state.

In [None]:
net_architecture = networks.dueling_dqn_net(common_conv_layers=2,
                                            common_kernel_num=[32, 32],
                                            common_kernel_size=[3, 3],
                                            common_kernel_strides=[2, 2],
                                            common_conv_activation=['relu', 'relu'],
                                            common_dense_layers=1,
                                            common_n_neurons=[512],
                                            common_dense_activation=['relu'],
                                            
                                            advantage_dense_layers=2,
                                            advantage_n_neurons=[256, 128],
                                            advantage_dense_activation=['relu', 'relu'],

                                            value_dense_layers=2,
                                            value_n_neurons=[256, 128],
                                            value_dense_activation=['relu', 'relu'])
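
The outputs of the value and advantage subnetworks have to be merged into Q-values. The sketch below shows the mean-subtracted aggregation commonly used in dueling architectures, Q(s, a) = V(s) + A(s, a) - mean_a A(s, a). It is an assumption that the library combines the two streams this way; `dueling_q_values` is a hypothetical helper written for illustration.

```python
import numpy as np


def dueling_q_values(value, advantage):
    """Combine V(s) and A(s, a) into Q(s, a) using mean subtraction.

    value:     array of shape (batch, 1)
    advantage: array of shape (batch, n_actions)
    """
    # Subtracting the mean advantage makes V and A identifiable.
    return value + advantage - advantage.mean(axis=1, keepdims=True)


# Toy example: one state, three actions.
value = np.array([[2.0]])
advantage = np.array([[1.0, 0.0, -1.0]])
q = dueling_q_values(value, advantage)
print(q)
```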

## Defining the RL Agent

Here, we define the RL agent. In this case, we selected a DDDQN agent, which is a variant of DQN.
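
The "double" part of DDDQN refers to how the training target is computed: the online network selects the best next action and the target network evaluates it. The sketch below illustrates this rule with NumPy. It is not the library's implementation; the function name, the argument names and the discount factor `gamma` are assumptions made for the example.

```python
import numpy as np


def double_dqn_targets(rewards, dones, q_online_next, q_target_next, gamma=0.99):
    """Double DQN targets for a batch of transitions (illustrative helper).

    q_online_next: Q-values of the next states from the online network
    q_target_next: Q-values of the next states from the target network
    """
    # The online network chooses the action...
    best_actions = np.argmax(q_online_next, axis=1)
    # ...and the target network provides the value of that action.
    next_values = q_target_next[np.arange(len(best_actions)), best_actions]
    # Terminal transitions (done == 1) do not bootstrap from the next state.
    return rewards + gamma * next_values * (1.0 - dones)


rewards = np.array([1.0, 0.0])
dones = np.array([0.0, 1.0])
q_online_next = np.array([[0.2, 0.8], [0.5, 0.1]])
q_target_next = np.array([[1.0, 2.0], [3.0, 4.0]])
targets = double_dqn_targets(rewards, dones, q_online_next, q_target_next)
print(targets)
```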

The agent is defined by configuring a few parameters:

* learning_rate: learning rate for training the neural network.
* batch_size: size of the batches used for training the neural network.
* epsilon: determines the amount of exploration (float in [0, 1]). 0 -> full exploitation; 1 -> full exploration.
* epsilon_decay: decay factor of epsilon. In each iteration we calculate the new epsilon value as: epsilon' = epsilon * epsilon_decay.
* epsilon_min: minimum value epsilon can reach during the training procedure.
* net_architecture: net architecture defined before.
* n_stack: number of stacked timesteps to form the state.
* img_input: boolean. Set to True when the states are images in the form of 3D numpy arrays.
* state_size: tuple, size of the state.

Here, we have two new parameters:

1) img_input is a boolean value that needs to be set to True when the input data are images.

2) state_size is the size of the input states. When it is not defined, the library automatically uses the state size defined in the environment but, since we changed it in the preprocessing, we need to set this value explicitly.
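
To make the relation between state_size and n_stack concrete, here is a minimal sketch of frame stacking with a fixed-length deque. It assumes that the stacked frames are concatenated along the channel axis and that the stack is padded with the first frame; the library may do this differently, and `stacked_state` is a hypothetical helper.

```python
from collections import deque

import numpy as np

n_stack = 5
state_size = (90, 80, 1)

# Keep only the n_stack most recent preprocessed frames.
frames = deque(maxlen=n_stack)

# Assumption: at the start of an episode the stack is filled with the first frame.
first_frame = np.zeros(state_size)
for _ in range(n_stack):
    frames.append(first_frame)


def stacked_state(frames):
    # Concatenate along the channel axis: n_stack x (90, 80, 1) -> (90, 80, n_stack)
    return np.concatenate(list(frames), axis=-1)


# A new frame arrives; the oldest one is dropped automatically.
frames.append(np.ones(state_size))
state = stacked_state(frames)
print(state.shape)
```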

In [None]:
agent = dddqn_agent.Agent(learning_rate=1e-3,
                          batch_size=64,
                          epsilon=0.9,
                          epsilon_decay=0.999999,
                          epsilon_min=0.15,
                          net_architecture=net_architecture,
                          n_stack=5,
                          img_input=True,
                          state_size=(90, 80, 1)
                          )
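
With these values, epsilon decays very slowly. The sketch below estimates how many decay steps are needed to go from epsilon to epsilon_min under the rule epsilon' = epsilon * epsilon_decay. It assumes one decay step per iteration and a hard floor at epsilon_min; `decayed_epsilon` is a hypothetical helper.

```python
import math

epsilon, epsilon_decay, epsilon_min = 0.9, 0.999999, 0.15

# Smallest n such that epsilon * epsilon_decay**n <= epsilon_min.
n_updates = math.ceil(math.log(epsilon_min / epsilon) / math.log(epsilon_decay))
print(n_updates)


def decayed_epsilon(n):
    # Epsilon after n decay steps, never going below epsilon_min.
    return max(epsilon_min, epsilon * epsilon_decay ** n)


print(decayed_epsilon(100_000))
```

The result is on the order of 1.8 million decay steps, so in a short run like the one in this notebook the agent stays close to its initial exploration rate.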

## Building the RL Problem

Create an RL problem in which the communication between agent and environment is managed. In this case, we use the functionality from "RL_Problem.rl_problem.py", which makes the selection of the matching problem transparent to the user. The function "Problem" automatically selects the problem based on the agent used.

In [None]:
problem = rl_problem.Problem(env, agent)

After defining the problem, we set the functions for state preprocessing and for reward normalization and clipping.

In [None]:
problem.preprocess = atari_preprocess
problem.clip_norm_reward = clip_norm_atari_reward

This environment consumes a lot of memory when storing the states (images) for training the neural network. This is a good moment to introduce how to select the memory to use and its size.

DQN-based methods are compatible with all memories defined in RL_Agent.base.utils.Memory.py. Currently, you can find a deque memory, which is the standard memory for DQN methods, and a Prioritized Experience Replay (PER) memory.

By default, DDDQN uses the deque memory. In this specific case, we want to change the capacity of the experience memory so that it does not exceed the physical memory of the computer.

All DQN-based algorithms can use both types of memory. The A2C algorithms with experience memory (from RL_Agent.a2c_agent_discrete_queue.py and RL_Agent.a2c_agent_continuous_queue.py) can use the deque memory. All other algorithms use a buffer instead of an experience memory; its length is set through the "batch_size" property of the agent class, with the exception of the PPO algorithms, which include a "memory_size" property.

In [None]:
memory_max_len = 1000 
problem.agent.set_memory(deq_m, memory_max_len)
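
The effect of the capacity limit can be illustrated with a small stand-alone replay memory built on collections.deque. This is a sketch of the general idea only: `DequeMemory` is a hypothetical class, not the library's Memory class, and the stored tuples are placeholders.

```python
import random
from collections import deque


class DequeMemory:
    """Illustrative fixed-capacity experience memory (not the library's class)."""

    def __init__(self, maxlen):
        # When the deque is full, appending drops the oldest experience.
        self.buffer = deque(maxlen=maxlen)

    def append(self, experience):
        self.buffer.append(experience)

    def sample(self, batch_size):
        # Uniform random batch without replacement.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


memory = DequeMemory(maxlen=1000)
for step in range(1500):
    # Placeholder experience: (state, action, reward, next_state, done)
    memory.append((step, 0, 0.0, step + 1, False))

print(len(memory))
oldest_step = memory.buffer[0][0]
batch = memory.sample(64)
print(oldest_step, len(batch))
```

After 1500 insertions only the 1000 most recent experiences remain, which is what keeps the memory footprint bounded.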

## Solving the RL Problem

The next step is to solve the RL problem that we have defined. Here, we specify the number of episodes, the render boolean, the verbosity of the function, the skip_states parameter and, additionally, whether we want to render the environment after n iterations.

When render is set to False, we can specify the "render_after" parameter. The environment will start rendering once the specified number of iterations has been reached.


In [None]:
problem.solve(episodes=5, skip_states=3, render=False, render_after=3)

The next cell runs n iterations in fully exploitative mode to check the performance obtained by the agent. The test is rendered by default; here we pass render=False to disable rendering. The performance of the agent will be very poor at this point: reaching acceptable performance in this environment requires thousands of training iterations.

In [None]:
problem.test(n_iter=2, render=False)

In [None]:
hist = problem.get_histogram_metrics()
history_utils.plot_reward_hist(hist, 10)

Run this last cell if you want to save the agent to a file.

In [None]:
agent_saver.save(agent, 'agent_dddqn.json')

# Takeaways
- We learned how to use image data with the RL agents on Atari games.
- We learned how to preprocess the state data and how to clip and normalize the reward function.
- We learned how to change the length and type of the experience memory.
- We learned how to select the experience memory used by Deep Q Network based methods.