In [None]:
import tensorflow as tf
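# List the GPUs visible to TensorFlow; this notebook expects at least one.
physical_devices = tf.config.experimental.list_physical_devices('GPU')
assert len(physical_devices) > 0, "Not enough GPU hardware devices available"
# Allocate GPU memory on demand instead of reserving it all up front.
# set_memory_growth returns None, so there is nothing to assign.
tf.config.experimental.set_memory_growth(physical_devices[0], True)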

# Continuous actions and Tensorboard with Deterministic Policy Gradient (DPG)

In this tutorial we address, for the first time, a problem with continuous actions. We chose the continuous version of the Lunar Lander environment. This environment has two actions, both floats in the range [-1, 1]. The first action (a_1) controls the main engine: when a_1 < 0 the engine is off and when a_1 > 0 the engine is on. The second action (a_2) controls the left and right engines: if a_2 is in [-1, -0.5) the left engine fires, if a_2 is in (0.5, 1] the right engine fires, and if a_2 is in [-0.5, 0.5] both side engines are off.
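
To make the action mapping concrete, here is a minimal sketch of a helper that translates an action pair into engine commands. The function name `describe_action` is hypothetical and is not part of the environment or of the library used in this tutorial. Since both actions lie in [-1, 1], the sketch treats a_1 > 0 as "main engine on", and it assumes that values exactly at a_1 = 0 or a_2 = ±0.5 leave the engines off.

```python
def describe_action(action):
    """Translate an (a_1, a_2) action pair into engine commands.

    Hypothetical helper for illustration only; both values are
    expected to lie in [-1, 1].
    """
    a_1, a_2 = action

    # Main engine: on only for strictly positive a_1 (assumption at a_1 == 0).
    main = "on" if a_1 > 0 else "off"

    # Side engines: dead zone for a_2 in [-0.5, 0.5] (assumption at the bounds).
    if a_2 < -0.5:
        side = "left"
    elif a_2 > 0.5:
        side = "right"
    else:
        side = "off"

    return main, side


# Example: main engine on, side engines idle.
example = describe_action((0.7, 0.0))
```

The agent never calls such a helper; it outputs the two floats directly and the environment interprets them.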

We also show how to save Tensorboard summaries of the training process. We use the Tensorboard functionality defined by default; customized Tensorboard summaries are introduced in later tutorials.

By the end of this tutorial you will know how to use agents in environments with continuous action spaces and how to record Tensorboard summaries to monitor the training process.

In [None]:
from RL_Problem import rl_problem
from RL_Agent import dpg_agent_continuous
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM
from RL_Agent.base.utils.networks import networks
from RL_Agent.base.utils import agent_saver, history_utils
import gym

## Defining the Neural Network Architecture
We define the network architecture using the function "dpg_net" from "RL_Agent.base.utils.networks.networks.py", which returns a dictionary.

In [None]:
net_architecture = networks.dpg_net(dense_layers=2,
                                    n_neurons=[128, 128],
                                    dense_activation=['relu', 'relu'])

## Define the RL Agent

We define the agent by setting the following parameters:

* learning_rate: learning rate for training the neural network.
* batch_size: Size of the batches used for training the neural network.
* net_architecture: net architecture defined before.
* n_stack: number of stacked timesteps to form the state.
* tensorboard_dir: path to the folder where Tensorboard summaries are stored.

If we specify the "tensorboard_dir" parameter, the agent records the default Tensorboard summaries. "tensorboard_dir" expects a directory path as a string.

In [None]:
agent = dpg_agent_continuous.Agent(learning_rate=1e-3,
                            batch_size=64,
                            n_stack=1,
                            net_architecture=net_architecture,
                            tensorboard_dir='tensorboard_logs')
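
The n_stack parameter sets how many consecutive timesteps are stacked to form one state. The sketch below illustrates the general idea with a hypothetical `StateStacker` class; it is not the library's implementation, and filling the buffer with copies of the first observation at episode start is an assumption made for this example.

```python
from collections import deque


class StateStacker:
    """Keep the last n_stack observations and expose them as one state.

    Hypothetical illustration of state stacking, not the library's code.
    """

    def __init__(self, n_stack):
        self.n_stack = n_stack
        self.frames = deque(maxlen=n_stack)

    def reset(self, first_obs):
        # Assumption: pad the buffer with copies of the first observation.
        self.frames.clear()
        for _ in range(self.n_stack):
            self.frames.append(list(first_obs))
        return self.state()

    def step(self, obs):
        # Appending to a full deque drops the oldest observation.
        self.frames.append(list(obs))
        return self.state()

    def state(self):
        # Nested list of shape (n_stack, obs_dim), oldest observation first.
        return [list(frame) for frame in self.frames]


stacker = StateStacker(n_stack=3)
s0 = stacker.reset([0.0, 1.0])
s1 = stacker.step([0.5, 0.9])
```

With n_stack=1, as used in this tutorial, the stacked state holds only the current observation.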


## Define the Environment

We chose the LunarLanderContinuous environment from OpenAI Gym.

In [None]:
environment = "LunarLanderContinuous-v2"
environment = gym.make(environment)

## Build an RL Problem

The RL problem is where the communication between agent and environment is managed. In this case, we use the functionality from "RL_Problem.rl_problem.py", which makes the selection of the matching problem transparent to the user. The function "Problem" automatically selects the problem based on the agent used.

In [None]:
problem = rl_problem.Problem(environment, agent)

## Solving the RL Problem

The next step is to solve the RL problem that we have defined. Here, we specify the number of episodes, the skip_states parameter and, additionally, after how many iterations we want to render the environment.

We do not specify the value of render because it is set to False by default when training.

In [None]:
problem.solve(episodes=400, skip_states=3, render_after=190)

Next, we run the agent in exploitation mode in the environment to see its final performance.

In [None]:
problem.test(n_iter=4, render=False)

Let's see the reward history as usual. To execute the next cell you will need to stop the execution of the cell above.

In [None]:
hist = problem.get_histogram_metrics()
history_utils.plot_reward_hist(hist, 10)

## Run Tensorboard to See the Recorded Summaries

Let's look at the Tensorboard logs. The next cell executes the command that starts the Tensorboard service. To see the results, open a browser tab at the URL that the command prints, usually http://localhost:6006/

In [None]:
!tensorboard --logdir=tensorboard_logs

Run this last cell if you want to save the agent to a file.

In [None]:
agent_saver.save(agent, 'agent_dpg_lunar.json')

# Takeaways
- We trained our agent in an environment with a continuous action space.
- We learned how to record the default Tensorboard summaries during the training process.
- We learned how to see the recorded summaries using the Tensorboard service.

