# Notebook Instructions

1. If you are new to Jupyter notebooks, please go through this introductory manual <a href='https://quantra.quantinsti.com/quantra-notebook' target="_blank">here</a>.
1. Any changes made in this notebook would be lost after you close the browser window. **You can download the notebook to save your work on your PC.**
1. Before running this notebook on your local PC:<br>
i.  You need to set up a Python environment and the relevant packages on your local PC. To do so, go through the section on "**Run Codes Locally on Your Machine**" in the course.<br>
ii. You need to **download the zip file available in the last unit** of this course. The zip file contains the data files and/or Python modules that might be required to run this notebook.

# Artificial Neural Networks Based Double Deep Q Learning Agents


In a reinforcement learning problem, the agent is the other cardinal part besides the environment. In deep reinforcement learning, the agent is modelled using Artificial Neural Networks (ANNs). In this notebook, you will look at the Keras definitions of the two identically structured ANNs used to create the two Q-tables of the Double Deep Q Learning architecture. Although the two networks share the same architecture, they are trained on different samples. In Double Q-learning, one estimator selects the best action and the other evaluates it, which reduces value overestimation. This helps stabilise the learning process and can make it faster.
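To see why value overestimation arises, the sketch below uses plain Python with made-up numbers rather than course data. Every action has a true value of zero, but each estimator only sees noisy estimates. Taking the maximum of one set of noisy estimates is biased upwards, whereas selecting the action with one estimator and reading its value from a second, independent estimator is not. The two estimators stand in, conceptually, for the two networks; this is an illustration under those assumptions, not the course's training code.

```python
import random
import statistics


def overestimation_demo(num_actions=3, noise=1.0, trials=20000, seed=0):
    """Compare single- and double-estimator values when every true Q-value is 0."""
    rng = random.Random(seed)
    single, double = [], []
    for _ in range(trials):
        # Two independent noisy estimates of the same true Q-values (all zero, made up)
        q_a = [rng.gauss(0.0, noise) for _ in range(num_actions)]
        q_b = [rng.gauss(0.0, noise) for _ in range(num_actions)]
        # Single estimator: the same estimates both select and evaluate the action
        single.append(max(q_a))
        # Double estimator: select the action with q_a, evaluate it with q_b
        best = max(range(num_actions), key=lambda a: q_a[a])
        double.append(q_b[best])
    return statistics.mean(single), statistics.mean(double)


single_mean, double_mean = overestimation_demo()
print(f"single estimator: {single_mean:.3f}")  # noticeably above the true value 0
print(f"double estimator: {double_mean:.3f}")  # close to the true value 0
```

With three actions and unit noise, the single-estimator mean settles around the expected maximum of three standard normal draws, while the double-estimator mean stays near zero.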

In this notebook, you will perform the following steps:

1. [Import Modules](#modules)
1. [Read OHLCV data](#read)
1. [ANN hyperparameters](#hyper)
1. [Define ANNs for DDQN](#net)

<a id='modules'></a> 
## Import modules

First, we import the modules. We import the `Sequential` model, the `Dense` layer and the stochastic gradient descent (`SGD`) optimizer from Keras (through `tensorflow.keras`). We also import the `Game` class and the `reward_exponential_pnl` function from the `quantra_reinforcement_learning` module.

You can find the `quantra_reinforcement_learning` module in the '**Python Codes and Data**' unit in the last section of this course.

In [1]:
# Import pandas
import pandas as pd

# Suppress TensorFlow's C++ log messages (such as GPU-related warnings); this must be set before TensorFlow is imported
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'

# Import Sequential model
from tensorflow.keras import Sequential

# Import dense layers
from tensorflow.keras.layers import Dense

# Import stochastic gradient descent optimizer
from tensorflow.keras.optimizers import SGD

# Add the parent directory to the module search path so that data_modules can be imported
import sys
sys.path.append("..")

from data_modules.quantra_reinforcement_learning import Game
from data_modules.quantra_reinforcement_learning import reward_exponential_pnl

<a id='read'></a> 
## Read price data

In [2]:
# The data is stored in the directory 'data_modules'
path = '../data_modules/'

# Read 5-minute price data
bars5m = pd.read_pickle(path + 'PriceData5m.bz2')

bars5m.head()

| Time | open | high | low | close | volume |
|---|---|---|---|---|---|
| 2010-01-04 09:35:00-05:00 | 91.711 | 91.809 | 91.703 | 91.76 | 4448908.0 |
| 2010-01-04 09:40:00-05:00 | 91.752 | 91.973 | 91.752 | 91.932 | 4380988.0 |
| 2010-01-04 09:45:00-05:00 | 91.94 | 92.022 | 91.928 | 92.005 | 2876633.0 |
| 2010-01-04 09:50:00-05:00 | 92.005 | 92.177 | 91.973 | 92.177 | 4357079.0 |
| 2010-01-04 09:55:00-05:00 | 92.168 | 92.177 | 92.038 | 92.079 | 2955068.0 |


<a id='hyper'></a> 
## ANN hyperparameters 

In the initialisations below, we define the various hyperparameters used by the ANNs.

In [3]:
# Create a dictionary to store the configuration
rl_config = {}

# LEARNING_RATE: The step size that scales each gradient update
# It controls how quickly (and how stably) the optimizer moves towards a minimum
rl_config['LEARNING_RATE'] = 0.05

# LOSS_FUNCTION: The function that quantifies how far the predictions are from the targets
rl_config['LOSS_FUNCTION'] = 'mse'

# ACTIVATION_FUN: The function that adds non-linearity so the network can fit complex curves
rl_config['ACTIVATION_FUN'] = 'relu'

# NUM_ACTIONS: Number of actions the agent can take: buy, sell and hold
rl_config['NUM_ACTIONS'] = 3

# BATCH_SIZE: The number of samples used for each training update
rl_config['BATCH_SIZE'] = 1

# HIDDEN_MULT: Size of the hidden layer as a multiple of the input size
rl_config['HIDDEN_MULT'] = 2
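As a rough illustration of what `LEARNING_RATE` controls, the sketch below applies plain gradient descent with the MSE loss to a hypothetical one-weight model `prediction = w * x`. This toy model is an assumption for illustration only, not the course's network; each update moves the weight by `LEARNING_RATE` times the gradient.

```python
def sgd_step(w, x, y, learning_rate):
    """One gradient-descent step on the squared error (w * x - y) ** 2."""
    prediction = w * x
    # Derivative of (w*x - y)^2 with respect to w is 2 * (w*x - y) * x
    gradient = 2.0 * (prediction - y) * x
    return w - learning_rate * gradient


LEARNING_RATE = 0.05  # same value as rl_config['LEARNING_RATE']
w = 0.0               # hypothetical starting weight
history = [w]
for _ in range(50):
    # Hypothetical single training sample: x = 1.0, target y = 1.0
    w = sgd_step(w, x=1.0, y=1.0, learning_rate=LEARNING_RATE)
    history.append(w)

print(history[1])   # first step: 0.05 * 2 = 0.1
print(round(w, 4))  # moves steadily towards the optimum w = 1.0
```

For this toy loss, each step shrinks the remaining error by a factor of `1 - 2 * LEARNING_RATE`, so a rate of 0.05 converges steadily, while a rate above 1.0 would overshoot and diverge.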

<a id='net'></a> 
## Define ANNs for DDQN

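In the code below, we define two multi-layer perceptrons (MLPs). An MLP is a feedforward network: in a forward pass, the input flows through the layers in one direction, from the input layer to the output layer, with no loops. During training, the error between the predictions and the targets is computed at the output and propagated backwards, from the last layer to the first, to obtain the gradient of the loss with respect to every weight. This backward propagation of the error is called backpropagation, and the optimizer (here SGD) uses the resulting gradients to update the weights.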
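The forward and backward passes can be traced by hand on a tiny network. The plain-Python sketch below uses a hypothetical network with two inputs, two ReLU hidden units and a single linear output trained with the squared error; the course's networks are larger, end in a softmax layer, and let Keras compute gradients automatically. The sketch computes the gradients of the first-layer weights with backpropagation and compares them with finite-difference estimates.

```python
def relu(z):
    return max(0.0, z)


def forward(params, x):
    """Forward pass: inputs -> ReLU hidden layer -> single linear output."""
    w1, b1, w2, b2 = params
    z = [sum(w1[j][i] * x[i] for i in range(len(x))) + b1[j] for j in range(len(b1))]
    h = [relu(zj) for zj in z]
    y_hat = sum(w2[j] * h[j] for j in range(len(h))) + b2
    return z, h, y_hat


def loss(params, x, y):
    """Squared error for a single sample."""
    return (forward(params, x)[2] - y) ** 2


def backward(params, x, y):
    """Backpropagation: send the error from the output layer back to the first layer."""
    w1, b1, w2, b2 = params
    z, h, y_hat = forward(params, x)
    d_y = 2.0 * (y_hat - y)                      # dL/dy_hat
    d_w2 = [d_y * hj for hj in h]                # output-layer weight gradients
    d_b2 = d_y
    d_h = [d_y * w2j for w2j in w2]              # error passed back to the hidden layer
    d_z = [d_h[j] * (1.0 if z[j] > 0 else 0.0) for j in range(len(z))]  # through the ReLU
    d_w1 = [[d_z[j] * x[i] for i in range(len(x))] for j in range(len(z))]
    d_b1 = d_z
    return d_w1, d_b1, d_w2, d_b2


def numeric_grad_w1(params, x, y, row, col, eps=1e-6):
    """Central finite-difference estimate of dL/dw1[row][col], used as a check."""
    w1, b1, w2, b2 = params
    plus = [r[:] for r in w1]
    minus = [r[:] for r in w1]
    plus[row][col] += eps
    minus[row][col] -= eps
    return (loss((plus, b1, w2, b2), x, y) - loss((minus, b1, w2, b2), x, y)) / (2 * eps)


# Hypothetical weights: the first hidden unit is active, the second is switched off by ReLU
params = ([[0.5, 0.3], [-0.4, -0.2]], [0.1, -0.2], [1.0, -0.5], 0.05)
x, y = [1.0, 2.0], 0.3
d_w1, d_b1, d_w2, d_b2 = backward(params, x, y)

for row in range(2):
    for col in range(2):
        # Backpropagated gradient next to its finite-difference estimate
        print(row, col, d_w1[row][col], numeric_grad_w1(params, x, y, row, col))
```

The inactive hidden unit receives a zero gradient because the ReLU derivative is zero for negative inputs, which is one reason the backward pass has to reuse values stored during the forward pass.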

The input dimension equals the length of the environment's state vector, and the output dimension equals the number of actions the agent can take, which here is three: buy, sell and hold. You create each model by stacking three `Dense()` layers inside a `Sequential()` model, as shown below.
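Because the output layer uses a softmax activation, each network returns three non-negative numbers that sum to one, one per action. The plain-Python sketch below applies softmax to made-up pre-activation values of the last layer and picks the action with the largest output. The mapping of output indices to buy, sell and hold is an assumption made only for this illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical action order, for illustration only
ACTIONS = ['buy', 'sell', 'hold']

logits = [2.0, 0.5, 1.0]  # made-up pre-activation outputs of the last Dense layer
probs = softmax(logits)
best = max(range(len(probs)), key=lambda i: probs[i])
print([round(p, 3) for p in probs], ACTIONS[best])
```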

Two models are defined because Double Deep Q Learning requires two Q-tables that learn simultaneously. As discussed in the previous video unit, using one table to select actions and the other to evaluate them helps avoid value overestimation, which in turn can lead to faster learning.
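The Double Q-learning target is commonly written as `target = r + gamma * Q_eval(s', argmax_a Q_select(s', a))`: one network chooses the next action and the other scores it. The plain-Python sketch below computes this target for one transition and compares it with the standard single-network target. The reward, the discount factor `gamma` and the Q-values are made-up numbers, and this notebook does not specify which of `modelR` and `modelQ` plays which role.

```python
def double_q_target(reward, q_select_next, q_eval_next, gamma=0.99, done=False):
    """Double Q-learning target for a single transition.

    q_select_next: next-state Q-values from the network that picks the action.
    q_eval_next:   next-state Q-values from the network that scores that action.
    """
    if done:
        # No bootstrapping from a terminal state
        return reward
    best_action = max(range(len(q_select_next)), key=lambda a: q_select_next[a])
    return reward + gamma * q_eval_next[best_action]


def single_q_target(reward, q_next, gamma=0.99, done=False):
    """Standard (single-estimator) Q-learning target, for comparison."""
    return reward if done else reward + gamma * max(q_next)


# Made-up next-state Q-values for three actions
q_select_next = [0.40, 0.90, 0.10]
q_eval_next = [0.50, 0.60, 0.20]

print(single_q_target(1.0, q_select_next, gamma=0.9))               # 1.0 + 0.9 * 0.90
print(double_q_target(1.0, q_select_next, q_eval_next, gamma=0.9))  # 1.0 + 0.9 * 0.60
```

The single target trusts the selecting network's own (possibly inflated) maximum, while the double target reads the chosen action's value from the second network, which is what damps overestimation.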

In [4]:
def init_net(env, rl_config):
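    """Create the two networks used by the Double Deep Q Learning agent.

    Args:
        env: an instance of the Game class which is used to create the environment the agent explores
        rl_config: a dictionary of ANN hyperparameters (LEARNING_RATE, LOSS_FUNCTION,
            ACTIVATION_FUN, NUM_ACTIONS, HIDDEN_MULT)
    Returns:
        modelR: the neural network for the R-value table
        modelQ: the neural network for the Q-value table
    """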

    hidden_size = len(env.state)*rl_config['HIDDEN_MULT']

# ----------------------------------------------------------------------------------

    # Define the sequential function which encapsulates the layers of the model
    modelR = Sequential()

    # Define a dense layer with input shape equal to the size of the state vector
    modelR.add(Dense(len(env.state), input_shape=(
        len(env.state),), activation=rl_config['ACTIVATION_FUN']))

    # Define a dense hidden layer with hidden_size units, using the ACTIVATION_FUN activation (relu)
    modelR.add(Dense(hidden_size, activation=rl_config['ACTIVATION_FUN']))

    # Define a dense output layer with one unit per action (NUM_ACTIONS),
    # i.e. the total number of possible actions. The activation used is softmax
    modelR.add(Dense(rl_config['NUM_ACTIONS'], activation='softmax'))

    # Compile the model
    # Use the stochastic gradient descent optimizer
    # (learning_rate replaces the deprecated lr argument, which newer Keras versions reject)
    modelR.compile(SGD(learning_rate=rl_config['LEARNING_RATE']),
                   loss=rl_config['LOSS_FUNCTION'])

# ---------------------------------------------------------------------------------------

    # Define a second network with the same architecture for the Q-value table
    modelQ = Sequential()
    modelQ.add(Dense(len(env.state), input_shape=(
        len(env.state),), activation=rl_config['ACTIVATION_FUN']))
    modelQ.add(Dense(hidden_size, activation=rl_config['ACTIVATION_FUN']))
    modelQ.add(Dense(rl_config['NUM_ACTIONS'], activation='softmax'))
    modelQ.compile(SGD(learning_rate=rl_config['LEARNING_RATE']),
                   loss=rl_config['LOSS_FUNCTION'])

    return modelR, modelQ

In [5]:
# START_IDX: The starting index for the main loop; it must leave enough history for the lookback period
START_IDX = 3000

# LKBK: The lookback period in bars, applied to each timeframe,
# e.g. a value of 10 means the last 10 five-minute bars, 10 hourly bars and 10 daily bars
LKBK = 10

ohlcv_dict = {
    'open': 'first',
    'high': 'max',
    'low': 'min',
    'close': 'last',
    'volume': 'sum'
}

# Resample data to 1 hour data
bars1h = bars5m.resample(
    '1h', label='left', closed='right').agg(ohlcv_dict).dropna()

# Resample data to daily data
bars1d = bars5m.resample(
    '1D', label='left', closed='right').agg(ohlcv_dict).dropna()

# Create the Game environment
env = Game(bars5m, bars1d, bars1h, reward_exponential_pnl,
           lkbk=LKBK, init_idx=START_IDX)

# Create the two models
modelR, modelQ = init_net(env, rl_config)
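To make the `label='left', closed='right'` arguments of the resampling step concrete, the sketch below resamples twelve synthetic 5-minute bars (made-up prices, not the course data) with the same `ohlcv_dict` aggregation and the same `label`/`closed` arguments. With `closed='right'`, a bar stamped 10:00 belongs to the bin (09:00, 10:00], and `label='left'` names that bin 09:00.

```python
import pandas as pd

# Twelve synthetic 5-minute bars from 09:35 to 10:30 (made-up prices)
idx = pd.date_range('2010-01-04 09:35', periods=12, freq='5min')
close = [float(i) for i in range(1, 13)]
bars = pd.DataFrame({
    'open': close,
    'high': [c + 1 for c in close],
    'low': [c - 1 for c in close],
    'close': close,
    'volume': [100.0] * 12,
}, index=idx)

ohlcv_dict = {'open': 'first', 'high': 'max', 'low': 'min', 'close': 'last', 'volume': 'sum'}

# Bars 09:35-10:00 fall into (09:00, 10:00], labelled 09:00;
# bars 10:05-10:30 fall into (10:00, 11:00], labelled 10:00
hourly = bars.resample('1h', label='left', closed='right').agg(ohlcv_dict).dropna()
print(hourly)
```

Each 5-minute bar stamped at the end of its interval therefore lands in the hour it actually belongs to, and each hourly row is labelled with the start of that hour.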

In [6]:
modelR.summary()

Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense (Dense)                (None, 138)               19182     
_________________________________________________________________
dense_1 (Dense)              (None, 276)               38364     
_________________________________________________________________
dense_2 (Dense)              (None, 3)                 831       
Total params: 58,377
Trainable params: 58,377
Non-trainable params: 0
_________________________________________________________________


In [7]:
modelQ.summary()

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_3 (Dense)              (None, 138)               19182     
_________________________________________________________________
dense_4 (Dense)              (None, 276)               38364     
_________________________________________________________________
dense_5 (Dense)              (None, 3)                 831       
Total params: 58,377
Trainable params: 58,377
Non-trainable params: 0
_________________________________________________________________
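The parameter counts in both summaries can be reproduced by hand: a `Dense` layer with `n_in` inputs and `n_out` units has `n_in * n_out` weights plus `n_out` biases. The sketch below assumes a state vector of length 138, inferred from the output shape of the first layer in the summaries above, together with `HIDDEN_MULT = 2` and three actions as set in `rl_config`.

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per unit (n_out)."""
    return n_in * n_out + n_out


state_size = 138              # inferred from the first layer's output shape in the summaries
hidden_size = state_size * 2  # HIDDEN_MULT = 2
num_actions = 3               # NUM_ACTIONS = 3

# (inputs, units) for the three Dense layers built by init_net
layer_sizes = [(state_size, state_size), (state_size, hidden_size), (hidden_size, num_actions)]
counts = [dense_params(n_in, n_out) for n_in, n_out in layer_sizes]
print(counts, sum(counts))    # [19182, 38364, 831] 58377
```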


You can modify the agent definition. As stated before, you can experiment with more complex layers such as LSTMs and 1D CNNs. In the coming units, you will learn how these ANNs are trained using a technique known as experience replay.  <br><br>