# Define a simple Neural Network with Double Deep Q Network (DDQN)

By the end of this notebook you will know how to use our interface for defining the agent's neural network architecture without knowing TensorFlow or Keras.

For this purpose, we use a Double Deep Q Network (DDQN) agent to tackle the CartPole problem again.

In [None]:
from RL_Problem import rl_problem
from RL_Agent import ddqn_agent
from RL_Problem.base.ValueBased import dqn_problem
from RL_Agent.base.utils import agent_saver, history_utils
from RL_Agent.base.utils.networks import networks

import gym

## Defining the Environment

The next cell shows how to define the CartPole environment, as we saw in tutorial 01.

In [None]:
environment = "CartPole-v1"
environment = gym.make(environment)

## Defining the Neural Network Architecture

CAPOIRL has a very simple dictionary-based interface for defining neural networks. It is aimed at people with little neural network experience, at those who have never used the deep learning libraries compatible with CAPOIRL (TensorFlow and Keras), and at fast prototyping.

In [None]:
net_architecture =  {"dense_lay": 2,
                    "n_neurons": [128, 128],
                    "dense_activation": ['relu', 'tanh']
                    }

We provide some functions to define these dictionaries without having to remember all the keys. This functionality can be imported from "RL_Agent.base.utils.networks.networks.py"; it is a collection of functions that create dictionaries compatible with each kind of RL agent.

The next cell redefines "net_architecture" using the specific function for DDQN that returns a dictionary equivalent to the one defined above.

In [None]:
net_architecture = networks.double_dqn_net(dense_layers=2,
                                           n_neurons=[128, 128],
                                           dense_activation=['relu', 'tanh'])
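To make the structure of the dictionary concrete, the following minimal sketch reads the dense part of an architecture dictionary as a list of (neurons, activation) pairs, one per layer. The helper `dense_layer_specs` is hypothetical and is not part of CAPOIRL; the rule that both lists must have as many entries as the number of dense layers is an assumption read from the example above.

```python
def dense_layer_specs(arch):
    """Hypothetical helper (not part of CAPOIRL): pair each dense layer's
    neuron count with its activation, checking that the lists are consistent."""
    n_layers = arch["dense_lay"]
    neurons = arch["n_neurons"]
    activations = arch["dense_activation"]
    # Assumption: one neuron count and one activation per dense layer.
    if not (len(neurons) == len(activations) == n_layers):
        raise ValueError("n_neurons and dense_activation need one entry per dense layer")
    return list(zip(neurons, activations))


example_architecture = {"dense_lay": 2,
                        "n_neurons": [128, 128],
                        "dense_activation": ['relu', 'tanh']}

for index, (units, activation) in enumerate(dense_layer_specs(example_architecture)):
    print("dense layer", index, "->", units, "neurons,", activation)
```

Read this way, the example describes a first hidden layer of 128 neurons with ReLU activation followed by a second hidden layer of 128 neurons with tanh activation.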

## Defining the RL Agent

Here we define the RL agent that we are going to use. In this case, we select a DDQN agent, which is a variation of DQN.
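As a reminder of what distinguishes DDQN from DQN, the sketch below computes training targets in the usual Double DQN way: the online network chooses the best next action and the target network evaluates it. It is a NumPy illustration under stated assumptions, not the library's implementation; the function name `ddqn_targets`, the discount factor and the toy Q-values are all made up for the example.

```python
import numpy as np


def ddqn_targets(rewards, dones, q_online_next, q_target_next, gamma=0.99):
    """Double DQN targets for a batch (illustrative, not CAPOIRL code).

    q_online_next and q_target_next have shape (batch, n_actions) and hold
    the Q-values of the next states under the online and target networks.
    """
    # The online network selects the action...
    best_actions = np.argmax(q_online_next, axis=1)
    # ...and the target network evaluates that action.
    next_values = q_target_next[np.arange(len(best_actions)), best_actions]
    # Terminal transitions (dones == 1) do not bootstrap from the next state.
    return rewards + gamma * (1.0 - dones) * next_values


rewards = np.array([1.0, 1.0])
dones = np.array([0.0, 1.0])
q_online_next = np.array([[0.2, 0.8], [0.5, 0.1]])
q_target_next = np.array([[1.0, 2.0], [3.0, 4.0]])

targets = ddqn_targets(rewards, dones, q_online_next, q_target_next, gamma=0.9)
print(targets)
```

In plain DQN the target network would both select and evaluate the next action; separating the two roles is intended to reduce the overestimation of Q-values.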

The agent is defined by configuring a few parameters:

* learning_rate: Learning rate for training the neural network.
* batch_size: Size of the batches used for training the neural network.
* epsilon: Determines the amount of exploration (float in [0, 1]). 0 -> full exploitation; 1 -> full exploration.
* epsilon_decay: Decay factor of epsilon. In each iteration we calculate the new epsilon value as: epsilon' = epsilon * epsilon_decay.
* epsilon_min: Minimum value epsilon can reach during the training procedure.
* n_stack: Number of stacked timesteps that form the state.
* net_architecture: Network architecture defined before.
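The interaction between epsilon, epsilon_decay and epsilon_min can be sketched in a few lines. This is only an illustration of the update rule stated in the list, using the values from this tutorial's agent definition; clipping with `max` and the helper name `epsilon_schedule` are assumptions, and the library may apply the decay at a different granularity.

```python
def epsilon_schedule(epsilon, epsilon_decay, epsilon_min, n_iterations):
    """Return the epsilon value used at each iteration (illustrative sketch)."""
    values = []
    for _ in range(n_iterations):
        values.append(epsilon)
        # epsilon' = epsilon * epsilon_decay, never going below epsilon_min
        # (the clipping strategy is an assumption).
        epsilon = max(epsilon_min, epsilon * epsilon_decay)
    return values


schedule = epsilon_schedule(epsilon=0.4, epsilon_decay=0.999,
                            epsilon_min=0.15, n_iterations=2000)
print("first:", schedule[0], "last:", schedule[-1])
```

With these values epsilon starts at 0.4 and reaches the floor of 0.15 after roughly a thousand updates, after which the amount of exploration stays constant.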

Here, we notice two new parameters:

"net_architecture" is used to set the network architecture that we defined before. In this example it is a dictionary, but in later tutorials we will see how to define more complex networks using Keras or TensorFlow.

"n_stack" is a parameter used to incorporate temporal information into the states. By default n_stack = 1, which means the state is formed only by the current state returned by the environment. When n_stack = n with n > 1, the state is formed by the last n states stacked: the current state returned by the environment, the state at timestep -1, at timestep -2, ..., up to timestep -(n-1). If n = 5, the state is formed by the last 5 states and has shape (5, state_size).
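The stacking described above can be sketched with a fixed-length queue. This is a minimal illustration, not the library's code: padding the stack with copies of the first observation and ordering the rows from oldest to newest are assumptions, and the observations here are dummy arrays with the 4 values of a CartPole observation.

```python
from collections import deque

import numpy as np

state_size = 4  # a CartPole observation has 4 values
n_stack = 5

# Assumption: at the start of an episode the stack is filled with copies
# of the first observation.
first_observation = np.zeros(state_size)
frames = deque([first_observation] * n_stack, maxlen=n_stack)

# Each new observation pushes out the oldest one.
for t in range(1, 4):
    new_observation = np.full(state_size, float(t))  # dummy data
    frames.append(new_observation)

stacked_state = np.array(frames)  # rows ordered from oldest to newest
print(stacked_state.shape)
```

The resulting array has one row per stacked timestep, so the network receives inputs of shape (n_stack, state_size) instead of (state_size,).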

In [None]:
agent = ddqn_agent.Agent(learning_rate=1e-3,
                         batch_size=128,
                         epsilon=0.4,
                         epsilon_decay=0.999,
                         epsilon_min=0.15,
                         n_stack=5,
                         net_architecture=net_architecture)

## Build an RL Problem

Build an RL problem, where the communication between agent and environment is managed.

In [None]:
problem = dqn_problem.DQNProblem(environment, agent)

## Solving the RL Problem

The next step is to solve the RL problem that we defined. Here, we specify the number of episodes, the render boolean, the verbosity of the function and finally the "skip_states" parameter.

The "skip_states" parameter has value 1 by default, which means the agent selects an action to execute in the environment at every timestep. When skip_states = n with n > 1, an action selected by the agent is executed for n timesteps before the agent selects another action. This speeds up the collection of experiences during training because the neural network does not have to be run at every timestep.

This state-skipping technique was introduced by Mnih, V., Kavukcuoglu, K., Silver, D. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015). https://doi.org/10.1038/nature14236
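The effect of skip_states can be illustrated with a toy loop in which the policy is queried once and its action is repeated for several environment steps. Everything here is hypothetical: `CountingEnv` is a stand-in environment that ends after a fixed number of steps, and `run_episode` only mimics the idea; how the library accumulates rewards or stores experiences while skipping is not shown in this tutorial.

```python
class CountingEnv:
    """Toy environment: reward 1 per step, episode ends after max_steps steps."""

    def __init__(self, max_steps=10):
        self.max_steps = max_steps
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= self.max_steps
        return self.t, 1.0, done


def run_episode(env, policy, skip_states=1):
    """Run one episode, repeating each selected action skip_states times."""
    obs = env.reset()
    done = False
    total_reward = 0.0
    n_decisions = 0
    while not done:
        action = policy(obs)  # the (expensive) policy call happens here
        n_decisions += 1
        for _ in range(skip_states):
            obs, reward, done = env.step(action)
            total_reward += reward
            if done:  # stop repeating if the episode ends mid-skip
                break
    return total_reward, n_decisions


total, decisions = run_episode(CountingEnv(max_steps=10), lambda obs: 0, skip_states=3)
print("reward:", total, "policy calls:", decisions)
```

In this toy run the episode still lasts 10 environment steps, but the policy is queried only 4 times instead of 10.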

In [None]:
problem.solve(episodes=100, render=True, skip_states=3)

The next cell runs n iterations in fully exploitative mode (no exploration) to check the performance obtained by the agent. It is rendered by default.

In [None]:
problem.test(n_iter=10)

Using the "get_histogram_metrics" and "plot_reward_hist" functions, the history of rewards obtained during training can be visualized. The n_moving_average parameter selects how many time steps are used to calculate a smoothed version of the data (blue line).

In [None]:
hist = problem.get_histogram_metrics()
history_utils.plot_reward_hist(hist, 10)
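For intuition about the smoothed curve, a moving average over a reward history can be computed as sketched below. This is one common way to do it with NumPy and is not necessarily how plot_reward_hist computes it internally; the function name `moving_average` and the reward values are made up for the example.

```python
import numpy as np


def moving_average(values, n_moving_average):
    """Average each run of n_moving_average consecutive values."""
    window = np.ones(n_moving_average) / n_moving_average
    # mode="valid" keeps only positions where the window fits entirely.
    return np.convolve(values, window, mode="valid")


rewards = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
smoothed = moving_average(rewards, 2)
print(smoothed)
```

A larger window gives a smoother curve but reacts more slowly to changes in the agent's performance.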

Run this last cell if you want to save the agent to a file.

In [None]:
agent_saver.save(agent, 'agent_ddqn_pole.json')

# Takeaways

- We learned how to use a DDQN agent.
- We learned how to define the neural network architecture using the interface intended for fast prototyping and for people with little TensorFlow experience.
- We learned how to stack temporal information within the states.
- We learned how to use the state skipping technique by Mnih et al.