# Introduction

This notebook presents the main part of the project. It is divided into the following parts:
- Parameter setting
- Creation of the trading environment
- Set-up of the trading agent (simple_agent)
- Set-up of the portfolio vector memory (PVM)
- Agent training
- Agent evaluation
- Analysis

<u>Note:</u> This notebook has been cleaned up and run on a local machine. The results shown here are for illustration only and are not representative of the project results reported in the presentation.

# 1. Imports

In [1]:
import io, os, sys, types
from IPython import get_ipython
from nbformat import read
from IPython.core.interactiveshell import InteractiveShell
import ipynb.fs.full
import import_ipynb
from PVM import PVM
from Agents.DPG_NoCash import DPG
from MarketEnvironment import MarketEnvironment

importing Jupyter notebook from PVM.ipynb


In [2]:
import ffn
import random
import numpy as np
import pandas as pd
from tqdm import tqdm
from collections import deque

# Import the backend used to build neural networks and tensors
import tensorflow as tf

# Imports for plotting
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt

# 2. Load Data and Parameters

In [4]:
# Load market data tensor 
periods = '2012-01-01_2020-01-01'
path_data = './MarketData/np_data/input_' + periods + '_closeNorm.npy'
# path_data = './MarketData/np_data/input_2014-01-01_2020-06-23.npy'
# path_data = './MarketData/np_data/input_2008-01-01_2010-01-01.npy'
data = np.load(path_data)       # Load data
trading_period = data.shape[2]  # All the periods for which feature prices are recorded
num_features = data.shape[0]-1   # Number of feature prices
num_assets = data.shape[1]      # Number of market assets

m = num_assets
asset_list = ['SAN.MC', 'TEF.MC', 'GRF.MC', 'BIO.MC', 'FAE.MC', 'AIR.MC', 'IDR.MC', 'ITX.MC', 'SLR.MC']
# asset_list = [i for i in range(m)]

training_validating_testing = {'train_ratio': 0.6, 'validation_ratio': 0.2}
training_parameters = {'num_epochs': 10, 'num_batches': 30, 'batch_size': 20}
network_parameters = {'regularization': 1e-8, 'learning': 9e-4, 'window_size': 50}
financial_information = {'trading_cost': 0.25/100, 'interest_rate': 0.02/250}
investing_information = {'initial_portfolio_value': 10000, 'initial_weight_vector': np.array([1] + [0]*m)}  # cash weight first, then the m assets

# Trading steps/periods (days) for the training, validation and testing sets:
training_set_periods = int(training_validating_testing['train_ratio']*trading_period)
validating_set_periods = int(training_validating_testing['validation_ratio']*trading_period)
testing_set_periods = trading_period-training_set_periods - validating_set_periods

# Training parameters
num_epochs = training_parameters['num_epochs']
num_batches = training_parameters['num_batches']
batch_size = training_parameters['batch_size']

# Network optimization parameters:
regularization = network_parameters['regularization']  # The L2 regularization coefficient applied to network training
learning = network_parameters['learning']              # Parameter alpha (i.e. the step size) of the Adam optimization
optimizer = tf.train.AdamOptimizer(learning)
n = network_parameters['window_size']

# Financial parameters
trading_cost = financial_information['trading_cost']
interest_rate = financial_information['interest_rate']

# Investing_information
w_init = investing_information['initial_weight_vector']           # Before trading starts, the whole portfolio is held in cash
pf_value_init = investing_information['initial_portfolio_value']  # Amount of cash the agent is going to invest

# PVM Parameters
sample_bias = 5e-5  # Beta in the geometric distribution for mini-batch training sample batches

import datetime
# Trading dates, taken from one of the asset files
Dates = pd.read_csv('MarketData/Madrid_SE_' + periods + '/' + 'AIR.MC.csv').Date
Dates = Dates[:-1]
train_dates = list(Dates[n : training_set_periods])
val_dates = list(Dates[training_set_periods + n : training_set_periods + validating_set_periods])
test_dates = list(Dates[training_set_periods + validating_set_periods + n : ])

# 3. Train Agent

First, the untrained agent computes actions on the validation set and its performance is evaluated. Then the network updates its parameters after each training batch, and its performance on the validation set is evaluated again at the end of each epoch.

## 3.1 Evaluate performance of the agent

For each period $t$ of the validation set, the agent reads $X_t = X[:, :, t-n:t]$, the tensor containing the $n$ previous relative prices of all the assets and features, together with the action it took in the previous period, $W_{previous}$, and computes the new action with the current network parameters (already trained, except for the "Before Training" evaluation). Because the network is trained with mini-batches (explained in the training function), it expects rank-4 input tensors. The rank-3 tensor $X_t$, of shape $(\text{features}, m, n)$, is therefore reshaped into a rank-4 tensor of shape $(1, \text{features}, m, n)$, and the rank-1 vector $W_{previous} = (w_1, w_2, \dots, w_m)$, of shape $(m)$, is reshaped into a rank-2 tensor of shape $(1, m)$. The reshape only prepends a batch dimension of size 1, so the values are exactly the same. Before being fed into the network, the last feature of $X_t$ (the opening-price relative that is used as the daily return) is dropped.
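The reshape can be checked in isolation with NumPy. A minimal sketch, assuming random data with hypothetical sizes (4 feature rows, 9 assets, $n = 50$); `state_X` and `state_W` are illustrative stand-ins for `state[0]` and `state[1]` returned by the environment:

```python
import numpy as np

# Hypothetical sizes: 4 feature rows (the last one is the return row),
# m = 9 assets and a window of n = 50 periods.
num_rows, m, n = 4, 9, 50
rng = np.random.default_rng(0)

state_X = rng.random((num_rows, m, n))  # rank-3 X_t: [features, assets, window]
state_W = np.full(m, 1.0 / m)           # rank-1 W_previous: [assets]

# Prepend a batch dimension of size 1, as done in eval_performance
X_t = state_X.reshape([-1] + list(state_X.shape))         # -> (1, 4, 9, 50)
W_previous = state_W.reshape([-1] + list(state_W.shape))  # -> (1, 9)

# Drop the last feature row before feeding the network
X_in = X_t[:, :-1, :, :]                                  # -> (1, 3, 9, 50)

print(X_t.shape, W_previous.shape, X_in.shape)
```

`np.expand_dims(state_X, 0)` gives the same result; the `[-1] + list(shape)` form lets NumPy infer the batch size of 1.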


In [7]:
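import matplotlib.dates as mdates
def get_max_draw_down(xs):
    """Largest peak-to-trough drop of a value series (in the same units as xs)."""
    xs = np.array(xs)
    i = np.argmax(np.maximum.accumulate(xs) - xs)  # end of the period (trough)
    if i == 0:
        return 0.0                                 # the series never falls below a previous peak
    j = np.argmax(xs[:i])                          # start of the period (peak)
    return xs[j] - xs[i]

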
def eval_performance(epoch, agent, LogReturn = True):
    
    # Create empty lists so as to evaluate the performance of the actor 
    list_weight_end_val = list()
    list_pf_end_training = list()
    list_pf_min_training = list()
    list_pf_max_training = list()
    list_pf_mean_training = list()
    list_pf_dd_training = list()
    list_sharpe_ratios = list()
    
    # Create the trading environment and initialize it with the initial weights and portfolio value
    # n + t is where it starts reading the data
    env = MarketEnvironment(path = path_data, window_length = n,
                   initial_portfolio_value = pf_value_init, trading_cost = trading_cost,
                   interest_rate = interest_rate, train_size = training_validating_testing['train_ratio'], LogReturn = LogReturn)
    # The first period for which the net computes an action is training_set_periods + n,
    # because it needs the n previous periods to compute the action
    state, done = env.reset(w_init[1:], pf_value_init, t = training_set_periods)

    # First element of the portfolio value and the weight vector
    p_list = [pf_value_init]
    w_list = [w_init[1:]]

    # Reads from training_set_periods (first validating set index) + n 
    # until the last validating set index minus n because the last index of the market environment is
    # MarketEnvironment.index = training_set_periods + validating_set_periods + n - n
    # And X_last = data[:,:,index-n:index]
    for k in range(training_set_periods + 1, training_set_periods + validating_set_periods - n):
        # Reshape the tensors, adding a batch dimension so they can be fed into the NN
        X_t = state[0].reshape([-1] + list(state[0].shape))
        X_t = X_t[:, :-1, :, :]  # Drop the last feature row (used as the daily return)
        W_previous = state[1].reshape([-1] + list(state[1].shape))
        
        # Compute the action the agent takes (once the parameters of the net have been trained) 
        action = agent.compute_W(X_t, W_previous)
       
        # Forward step: compute new environment state
        state, reward, done = env.step(action)

        X_next = state[0]                  # X[:, :, t+1-n:t+1]
        W_t = state[1]                     # Action at the end of the trading period (contains price fluctuations)
        pf_value_t  = state[2]             # Portfolio value at the end of the trading period
        dailyReturn_t = X_next[-1, :, -1]  # X[opening/opening, :, t+1] = opening(t+1)/opening(t)
        
        p_list.append(pf_value_t)  # List of the portfolio values after each step
        w_list.append(W_t)         # List of the actions taken by the agent after each step
        
    # Record just the last element. Lists are created to find out the max, min and mean values of the portfolio
    list_weight_end_val.append(w_list[-1])
    list_pf_end_training.append(p_list[-1])
    list_pf_min_training.append(np.min(p_list))
    list_pf_max_training.append(np.max(p_list))
    list_pf_mean_training.append(np.mean(p_list))
    
    list_pf_dd_training.append(get_max_draw_down(p_list))
    dates = [datetime.datetime.strptime(d, "%Y-%m-%d").date() for d in val_dates]
    print('End of validation PF value:', round(p_list[-1]))
    print('Min of validation PF value:', round(np.min(p_list)))
    print('Max of validation PF value:', round(np.max(p_list)))
    print('Mean of validation PF value:', round(np.mean(p_list)))
    print('Max drawdown of validation PF value:', round(get_max_draw_down(p_list)))
    print('End of validation weights:', w_list[-1])
    
    fig, ax = plt.subplots()
    plt.title('Portfolio evolution (validation set) episode {}'.format(epoch))
    plt.xlabel('Periods')
    plt.ylabel('Portfolio Value')
    ax.plot(dates, p_list, label = 'Agent Portfolio Value')
    fig.autofmt_xdate()                              # Rotate and align the tick labels so they look better
    ax.fmt_xdata = mdates.DateFormatter('%Y-%m-%d')  # Format dates to string for the x axis locations 
    plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
    plt.show()
    
    plt.title('Portfolio weights (end of validation set) episode {}'.format(epoch))
    plt.ylabel('Weights')
    plt.bar(np.arange(m), list_weight_end_val[-1])
    plt.xticks(np.arange(m), asset_list, rotation=45)
    plt.show()
    
    # Evolution of each asset weight over the validation set
    names = asset_list
    w_list = np.array(w_list)
    for j in range(m):
        plt.plot(w_list[:, j], label = 'Weight Stock {}'.format(names[j]))
    plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.5)
    plt.show()

## 3.2 Train the NN using batch training

1. A batch starting at period $t_b \le t_0 - n_b$, where $t_0$ is the last period available for training and $n_b$ is the batch size (batch_size), is picked with a geometrically distributed probability (PVM class).
2. Prices inside a batch must be in time order: the slices of the $X$ tensor are selected as X[:,:,index:index+n], where the third dimension is the time dimension and index runs from $t_b$ to $t_b + n_b - 1$. Only the starting index $t_b$ of the batch is chosen at random, following the geometric distribution.
3. The for loop over bs appends the results of each sample in the batch to lists. The lists are then converted into arrays that are fed into the NN (and treated as rank-4 tensors):
- list_X_t = $(X[:,:,t_b:t_b+n],\ X[:,:,t_b+1:t_b+1+n],\ \dots,\ X[:,:,t_b+n_b-1:t_b+n_b-1+n])$
- list_W_previous = $(W_{t_b}, W_{t_b+1}, \dots, W_{t_b+n_b-1})$ $\Rightarrow$ each of these $W$ is computed at the end of its period (taking into account the evolution of the prices during the session)
4. train: calls the train method defined in the agent class on the whole batch.
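Steps 1 to 3 can be sketched with NumPy alone. This is a minimal illustration, not the PVM class: `sample_batch_start` and `build_batch` are hypothetical helpers, and the truncated geometric draw (via rejection) is one possible way to implement PVM.get_random_index, whose actual code is not shown here. The market tensor is replaced by random data.

```python
import numpy as np


def sample_batch_start(last_start, beta, rng):
    """Hypothetical stand-in for PVM.get_random_index.

    Returns t_b in [0, last_start] with P(t_b) proportional to
    (1 - beta) ** (last_start - t_b), so recent start periods are favoured.
    """
    while True:
        # Generator.geometric counts trials up to the first success, so it is >= 1
        offset = rng.geometric(beta) - 1
        if offset <= last_start:  # reject offsets that would start before period 0
            return last_start - offset


def build_batch(X, t_b, n, batch_size):
    """Stack batch_size time-ordered windows X[:, :, t_b + i : t_b + i + n]."""
    windows = [X[:, :, t_b + i : t_b + i + n] for i in range(batch_size)]
    return np.stack(windows)  # shape: (batch_size, features, assets, n)


# Usage with random data standing in for the market tensor
rng = np.random.default_rng(0)
X = rng.random((3, 9, 300))  # hypothetical [features, assets, periods]
n, batch_size = 50, 20
last_start = X.shape[2] - n - batch_size + 1  # the last window ends at the last period
t_b = sample_batch_start(last_start, beta=0.05, rng=rng)
batch = build_batch(X, t_b, n, batch_size)
print(t_b, batch.shape)
```

Consecutive windows overlap and are shifted by one period, which keeps the prices inside a batch in time order. With the notebook's sample_bias = 5e-5 the distribution is close to uniform over the training set; a larger beta concentrates the batches on the most recent periods.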


In [8]:
# Random action: random positive weights normalized so that they sum to 1
def get_random_action(m):
    random_vec = np.random.rand(m)
    return random_vec/np.sum(random_vec)


def train(agent, env, num_epochs, eval_performance):

    # Keep record of the values of the portfolio at the end of each batch
    list_final_pf_batch_values = list()
    
    # Each epoch samples num_batches random batches from the training set
    for e in range(num_epochs):
        # Evaluate the performance of the agent before it has been trained 
        # The action is computed just feeding the X_t and W_previous tensors of the validation set
        print('Start Episode', e)
        if e==0:
            eval_performance('Before Training', agent)
        print('Episode:', e)

        # Init the PVM with the training parameters (initialize it after each epoch)
        memory = PVM(m, sample_bias, total_steps = training_set_periods, batch_size = batch_size, w_init = w_init[1:])

        # Number of batches: For each batch (nb), the net sees a determined number of samples from training set
        for nb in range(num_batches):
            # Starting point of the batch: get a random index to initialize the batch
            t_start = memory.get_random_index()

            # Initialize the environment with the weight from PVM at the starting point and the initial portfolio value
            # Starts reading data from t_start + n (n is defined in the env which is passed to the function)
            state, done = env.reset(memory.get_W(t_start), pf_value_init, t=t_start)

            # Initialize lists to store the values at each step/sample of the batch
            list_X_t, list_W_previous, list_pf_value_previous, list_dailyReturn_t = [], [], [], []

            # Read the samples in the batch to create the data structures needed to train the NN
            for bs in range(batch_size):
                
                # Keep in mind that what it is called here t, in the MarketEnvironment is t_start+n
                # X_t [:,:,t-n:t] = [:,:,t_start:t_start+n]
                # Reshapes X_t [features, assets, periods] into [1, features, assets, periods] 
                # because the NN needs to receive a rank 4 tensor to compute the action for the next period
                X_t = state[0].reshape([-1] + list(state[0].shape))
                X_t = X_t[:, :-1,:, :]
                W_previous = state[1].reshape([-1] + list(state[1].shape))
                pf_value_previous = state[2]

                # Compute actions 
                # Exploration: with probability 0.4, take a random action instead of the agent's action
                if np.random.rand() < 0.6:
                    action = agent.compute_W(X_t, W_previous)
                else:
                    action = get_random_action(m)
                
                # Given the state and the action, call the environment to move one time step forward (compute the next state and reward).
                # The results of action w_t (X_next, and W_t and pf_value_t at the end of the period) are stored
                # in the following variables, so that in the next iteration of the loop action w_{t+1} is computed
                # from X_next = X[:, :, t+1-n:t+1] and w_t (at the end of the period)
                state, reward, done = env.step(action)
                
                # Get the action once it has evolved, and the portfolio value
                # The action is needed to pass it into the NN so as it can learn
                # The portfolio value is just for drawing the evolution
                W_t = state[1]         # Portfolio weight vector at the end of period t (considering evolution of prices)
                pf_value_t = state[2]  # Portfolio value at the end of period t (considering evolution of prices)

                # Get the daily return: X_next = X[:,:,t+1-n:t+1] where t = t_start + n
                # DailyReturn_t = X_next[-1,:,-1] = [open(t+1)/open(t)] price fluctuation during session t (~ close(t)/open(t))
                # DailyReturn_t-1 = X_t[-1,:,-1] = [open(t)/open(t-1)] 
                X_next = state[0]
                dailyReturn_t = X_next[-1, :, -1]
                
                # Each X_t tensor is a rank 3 tensor of [features, assets, window length]
                # Store each sample elements so as to use them to train the net as a batch
                list_X_t.append(X_t.reshape(state[0].shape[0]-1, state[0].shape[1], state[0].shape[2]))
                list_W_previous.append(W_previous.reshape(state[1].shape))
                list_pf_value_previous.append([pf_value_previous])
                list_dailyReturn_t.append(dailyReturn_t)

                # End of the batch: keep record of the portfolio values at the end of each batch
                if bs==batch_size-1:
                    list_final_pf_batch_values.append(pf_value_t)

                # Update this action (once it has evolved) into the PVM so as to use it to evaluate next action
                memory.update(t_start + bs, W_t)
            
            # Build the rank 4 tensors: batches composed of batch_size samples
            list_X_t = np.array(list_X_t)
            list_W_previous = np.array(list_W_previous)
            list_pf_value_previous = np.array(list_pf_value_previous)
            list_dailyReturn_t = np.array(list_dailyReturn_t)
           

            # For each batch, train the network to maximize the reward
            agent.train(list_X_t, list_W_previous,
                        list_pf_value_previous, list_dailyReturn_t)
            agent.save_model()
            print('Batch', nb)

            # Sanity checks on the intermediate outputs, using the last sample of the batch
            first_layer = agent.get_first_layer(X_t, W_previous)
            print('First layer shape:', first_layer.shape)
            growth_potential = agent.get_growth_potential(X_t, W_previous)
            print(growth_potential)
            if not np.any(growth_potential):
                print('WARNING: the network is not computing the growth potential of the assets!')
            print('Sharpe Ratio over batch: ', agent.get_sharpe_ratio(list_X_t, list_W_previous,
                        list_pf_value_previous, list_dailyReturn_t))
            print('Loss function over batch: ', agent.get_loss_function(list_X_t, list_W_previous,
                        list_pf_value_previous, list_dailyReturn_t))
            
        eval_performance(e, agent)

## 3.3 Create the environment and instantiate the agent

The environment computes the tensors $X_t$, $W_{previous}$ and $pf_{previous}$ that are fed into the NN. These tensors are grouped into batches before being fed into the NN for training, so the net only updates the parameters of its layers once batch_size samples have been collected. Training is complete once the net has been trained for the chosen number of epochs. Note that in this notebook an epoch consists of num_batches randomly sampled batches, rather than a full pass over the training set.
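To make the update schedule concrete, the number of training calls and sampled windows can be counted directly. A small worked example using this notebook's settings: num_batches and batch_size come from training_parameters, and the training cell below passes 20 epochs (overriding num_epochs = 10 in training_parameters). Whether each agent.train call performs exactly one optimizer step depends on the DPG class, which is not shown here.

```python
# Settings taken from this notebook
num_epochs = 20   # value passed to train(...) in the training cell
num_batches = 30  # training_parameters['num_batches']
batch_size = 20   # training_parameters['batch_size']

# One agent.train call per batch
train_calls = num_epochs * num_batches
# Number of X_t windows passed through the network during training
windows_seen = train_calls * batch_size

print(train_calls, windows_seen)  # 600 12000
```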


In [9]:
# Reset the TensorFlow graph, create the environment and instantiate the agent (optionally loading saved weights)
tf.reset_default_graph()
device = 'cpu'
path_to_save = 'ModelParams_LR_closeNorm_/'
model_name = 'Model2_n50_s9_9e-4_bs20_reg2_retrained_.ckpt'
env_log = MarketEnvironment(path = path_data, window_length = n,
               initial_portfolio_value = pf_value_init, trading_cost = trading_cost,
               interest_rate = interest_rate, train_size = training_validating_testing['train_ratio'], LogReturn = True)

load_weights = True
if device == "cpu":
    os.environ["CUDA_VISIBLE_DEVICES"] = ""
    with tf.device("/cpu:0"):
        log_agent = DPG(num_features, m, n, device, optimizer, trading_cost, interest_rate, 
                       path_to_save, model_name, LogReturn = True, load_weights = load_weights)
else:
    log_agent = DPG(num_features, m, n, device, optimizer, trading_cost, interest_rate, 
                   path_to_save, model_name, LogReturn = True, load_weights = load_weights)


Loading Model
Saved to:ModelParams_LR_closeNorm_/
model_checkpoint_path: "ModelParams_LR_closeNorm_/Model2_n50_s9_9e-4_bs20_reg2_retrained_.ckpt"
all_model_checkpoint_paths: "ModelParams_LR_closeNorm_/Model2_n50_s9_9e-4_bs20_reg2_retrained_.ckpt"
 ModelParams_LR_closeNorm_/Model2_n50_s9_9e-4_bs20_reg2_retrained_.ckpt
INFO:tensorflow:Restoring parameters from ModelParams_LR_closeNorm_/Model2_n50_s9_9e-4_bs20_reg2_retrained_.ckpt
Successfully loaded: ModelParams_LR_closeNorm_/Model2_n50_s9_9e-4_bs20_reg2_retrained_.ckpt


In [10]:
train(log_agent, env_log, 20, eval_performance)

Start Episode 0


KeyboardInterrupt: 