In [None]:
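import tensorflow as tf

# Enable memory growth on the first GPU so TensorFlow allocates GPU memory on demand
physical_devices = tf.config.experimental.list_physical_devices('GPU')
assert len(physical_devices) > 0, "Not enough GPU hardware devices available"
# set_memory_growth returns None, so its result is not assigned to a variable
tf.config.experimental.set_memory_growth(physical_devices[0], True)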

# Custom networks with Keras and Deterministic Policy Gradient (DPG)

By the end of this notebook you will know how to use agents with advanced neural networks built with Keras. This allows you to create more complex neural architectures and to use any kind of layer from the Keras module.

We chose Deterministic Policy Gradient (DPG) for this tutorial. It is a policy-based agent, which means that it learns the policy itself. Instead of estimating action values Q(s, a) as DQN-based agents do, DPG proposes the actions directly, in the form of a probability for each action.
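To make the distinction concrete, here is a minimal NumPy sketch (not the library's code): a value-based network outputs one estimate per action and the agent acts greedily on those estimates, while a policy-based network outputs a probability distribution over actions. The numbers are made up, and the softmax is assumed here as the usual way to turn final-layer outputs into probabilities.

```python
import numpy as np


def softmax(logits):
    """Convert raw network outputs into a probability distribution."""
    shifted = logits - np.max(logits)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()


# Value-based (DQN-style): the network outputs one Q(s, a) estimate per action
q_values = np.array([1.2, 0.4, -0.3, 0.9])  # made-up numbers for 4 actions
greedy_action = int(np.argmax(q_values))    # act greedily on the estimates

# Policy-based: the network outputs pi(a | s), the policy itself
logits = np.array([2.0, 0.5, -1.0, 1.0])    # made-up final-layer outputs
action_probs = softmax(logits)              # one probability per action
```

The value-based agent still needs a rule (greedy plus some exploration) to turn estimates into actions, whereas the policy-based output can be used to choose actions directly.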

In [None]:
from RL_Problem import rl_problem
from RL_Agent import dpg_agent
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM
from RL_Agent.base.utils.networks import networks
from RL_Agent.base.utils import agent_saver, history_utils
import gym

## Defining the Neural Network Architecture

The module "RL_Agent.base.utils.networks.networks" provides functions that define dictionaries describing the neural network architectures. These dictionaries have a key called "use_custom_network". When this key is set to True, the agent receives a function that builds a Keras model. This function receives the input shape of the network, which corresponds to the shape of the state. Inside this function you can create your Keras network and return it either as a tensorflow.keras.models.Sequential or as a tensorflow.keras.models.Model.

The next cell shows an example of creating a Sequential Keras model.

In [None]:
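def lstm_custom_model(input_shape):
    # With n_stack > 1, input_shape is presumably (n_stack, state_size), the 2D
    # (timesteps, features) shape an LSTM layer expects
    actor_model = Sequential()
    # The LSTM processes the stacked timesteps and returns its last hidden state
    actor_model.add(LSTM(64, input_shape=input_shape, activation='tanh'))
    # Only the first layer needs input_shape; later layers infer their input size
    actor_model.add(Dense(256, activation='relu'))
    actor_model.add(Dense(256, activation='relu'))
    # No action-output layer is added here; presumably the agent appends one
    return actor_model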
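Since the builder function may also return a tensorflow.keras.models.Model, the same architecture can be written with the Keras functional API. This is a sketch, assuming the agent accepts any Keras model built for the given input shape; the name lstm_functional_model is hypothetical, and it would be passed as custom_network in place of lstm_custom_model.

```python
from tensorflow.keras.layers import Dense, Input, LSTM
from tensorflow.keras.models import Model


def lstm_functional_model(input_shape):
    """Hypothetical functional-API equivalent of lstm_custom_model."""
    inputs = Input(shape=input_shape)
    # Summarize the stacked timesteps into a single 64-dimensional vector
    x = LSTM(64, activation='tanh')(inputs)
    x = Dense(256, activation='relu')(x)
    outputs = Dense(256, activation='relu')(x)
    # As in the Sequential version, no action-output layer is added here
    return Model(inputs=inputs, outputs=outputs)
```

The functional API becomes useful once the network is no longer a simple stack, for example when it has several inputs or branches.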

Then, we need to define the dictionary that specifies the network architecture. As explained before, the "use_custom_network" parameter has to be set to True. The other parameter, "custom_network", receives the function that builds the model.

In this particular case we only have the "custom_network" parameter, but in other cases the network may be divided into subnetworks. For example, Dueling DDQN and Deep Deterministic Policy Gradient (DDPG) have specific network architectures divided into subnetworks, and the dictionary has a specific key for each subnetwork.

In [None]:
net_architecture = networks.dpg_net(use_custom_network=True,
                                    custom_network=lstm_custom_model)

## Define the RL Agent

Here, we define the RL agent using the following parameters:

* learning_rate: Learning rate for training the neural network.
* batch_size: Size of the batches used for training the neural network.
* net_architecture: Network architecture defined before.
* n_stack: Number of stacked timesteps that form the state.
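The n_stack parameter means the agent sees the last n_stack observations as a single state. The following NumPy sketch shows one common way to build such a stacked state with a fixed-size deque; it illustrates the idea and is not necessarily how the library implements it. The class name StateStacker and the zero-padding at the start of an episode are assumptions. With n_stack=5 and LunarLander's 8-dimensional observation, the result has shape (5, 8).

```python
from collections import deque

import numpy as np


class StateStacker:
    """Hypothetical helper: keeps the last n_stack observations as one state."""

    def __init__(self, n_stack, obs_size):
        self.n_stack = n_stack
        self.obs_size = obs_size
        self.frames = deque(maxlen=n_stack)  # the oldest frame drops out automatically

    def reset(self, first_obs):
        # Assumption: pad the start of an episode with zero observations
        self.frames.clear()
        for _ in range(self.n_stack - 1):
            self.frames.append(np.zeros(self.obs_size, dtype=np.float32))
        self.frames.append(np.asarray(first_obs, dtype=np.float32))
        return self.state()

    def step(self, obs):
        self.frames.append(np.asarray(obs, dtype=np.float32))
        return self.state()

    def state(self):
        # Shape (n_stack, obs_size): oldest observation first, newest last
        return np.stack(list(self.frames))


stacker = StateStacker(n_stack=5, obs_size=8)
state = stacker.reset(np.ones(8))
state = stacker.step(np.full(8, 2.0))
print(state.shape)  # (5, 8)
```

This (timesteps, features) layout is what lets the LSTM layer in the custom model read the stacked state as a short sequence.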

You may notice that we do not include parameters related to the exploration process, such as "epsilon". This is because, by default, this algorithm chooses actions randomly according to the probabilities calculated by the neural network (np.random.choice(n_actions, p=probability_predictions)). This gives DPG an inherent exploratory behavior and makes epsilon (the exploration rate) unnecessary.
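A small NumPy illustration of why sampling explores on its own: every action with non-zero probability gets tried, with a frequency close to its probability, whereas a greedy choice would always return the same action. The probability vector is made up for illustration and stands in for one network prediction.

```python
import numpy as np

np.random.seed(0)  # reproducible sampling

# Made-up policy output for one state with 4 actions (sums to 1)
probability_predictions = np.array([0.6, 0.2, 0.15, 0.05])
n_actions = len(probability_predictions)

# Stochastic choice, as in the agent: actions are drawn according to their probabilities
samples = np.random.choice(n_actions, size=10_000, p=probability_predictions)
frequencies = np.bincount(samples, minlength=n_actions) / len(samples)

# A purely greedy choice would pick the same action every time and never explore
greedy_action = int(np.argmax(probability_predictions))

print(frequencies)  # close to [0.6, 0.2, 0.15, 0.05]
```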

In [None]:
agent = dpg_agent.Agent(learning_rate=1e-3,
                        batch_size=64,
                        net_architecture=net_architecture,
                        n_stack=5)

## Define the Environment

We choose the LunarLander environment from OpenAI Gym.

In [None]:
environment = "LunarLander-v2"
environment = gym.make(environment)

## Build an RL Problem

Create an RL problem where the communication between the agent and the environment is managed. Here, we use the functionality from "RL_Problem.rl_problem.py", which selects the matching problem type transparently to the user: the function "Problem" automatically selects the problem based on the agent used.

In [None]:
problem = rl_problem.Problem(environment, agent)

## Solving the RL Problem

The next step is to solve the RL problem we have defined. Here, we specify the number of episodes, the skip_states parameter, the render flag and, additionally, after how many iterations we want to render the environment.

When render is set to False, we can specify the "render_after" parameter: the environment will be rendered once the specified number of iterations has been reached.

In [None]:
problem.solve(episodes=250, skip_states=3, render=False, render_after=200)

Run the agent in exploitation mode over the environment to see its final performance.

In [None]:
problem.test(n_iter=4, render=True)

In [None]:
hist = problem.get_histogram_metrics()
history_utils.plot_reward_hist(hist, 10)

Run this last cell if you want to save the agent to a file.

In [None]:
agent_saver.save(agent, 'agent_dpg_lunar.json')

# Takeaways
- We trained our first Policy-Based agent.
- We learned how to use Keras to create complex and flexible neural network architectures within the library.
- We learned to use a new parameter for rendering the training process after n iterations.

