# 1. Trading Agent
Class implementing a policy gradient trading agent, which is in charge of finding a policy that maximizes the reward of the portfolio. The agent is a neural network (NN) whose output, the action, is the portfolio weight vector $\vec{w}$.


## 1.1 Portfolio features:
The portfolio features are the tensors that characterize the portfolio:

* Relative price tensor $X_t$: Composed of the relative price vectors of the 3 features: 

$\text{[closing price(0:t-1)/opening price(0:t-1), high price(0:t-1)/opening price(0:t-1), low price(0:t-1)/opening price(0:t-1)]}$

The shape is $[Batches, f, m, n]$, where $f$ is the number of features, $m$ the number of non-cash assets, and $n$ the window size (the number of periods fed to the NN).

* Relative price vector ($y_t$): Fluctuation of the asset prices during session $t$ ($open(t+1)/open(t)\approx close(t)/open(t)$). Shape $[Batches, 1+m]$. It is a rank 2 tensor that can be seen as one vector (rank 1 tensor) per sample/period $t$ in the batch.

* W_previous ($w'_{t-1}$): the portfolio weight vector at the end of the previous trading period. With $\odot$ denoting the element-wise product, it is given by:
$$w'_{t-1} = \frac{\vec{y}_{t-1}\odot\vec{w}_{t-1}}{\vec{y}_{t-1}\cdot \vec{w}_{t-1}};\; \mathrm{Shape}\; [Batches, 1+m]$$

The transaction costs arise from rearranging the assets in the portfolio (buying or selling them) to go from $w'_{t-1}$ to $w_t$, which is the action the agent takes for the current period.

<div style="text-align:center"><img src="images/Trading_Scheme_.PNG" /></div>

The computation of $w'_t$ is done by the environment class, so it is not implemented here.
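
As an illustration of how the weights drift while prices move, here is a minimal NumPy sketch of the $w'$ formula. It is not the environment class's code (which is not shown here); the function name `evolve_weights` and the sample numbers are assumptions made for the example.

```python
import numpy as np

def evolve_weights(w, y):
    """Weights after the price move: w' = (y * w) / (y . w), one row per sample.

    `evolve_weights` is a hypothetical helper, not part of the original code.
    """
    w = np.asarray(w, dtype=float)
    y = np.asarray(y, dtype=float)
    grown = y * w                                     # element-wise growth of each position
    return grown / grown.sum(axis=1, keepdims=True)   # renormalize so each row sums to 1

# One sample with cash + 2 assets: cash is flat, asset 1 gains 10%, asset 2 loses 10%.
w = np.array([[0.2, 0.4, 0.4]])
y = np.array([[1.0, 1.1, 0.9]])
w_prime = evolve_weights(w, y)
print(w_prime)
```

The winning asset ends the period with a larger share than it started with, which is why the agent pays transaction costs to move from $w'_{t-1}$ to its next action.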

* Change in the portfolio value during the session: $P'_t = P_{t}\cdot \vec{w}_{t}\cdot \vec{y}_t$
    - $P_t$ is the value of the portfolio at the beginning of period t.
    - $w_t$ is the portfolio weight vector at the beginning of period t.
    - $P'_t$ is the value of the portfolio at the end of period t.

* Rate of return ($r_t$) or immediate reward, where $\mu_t$ is the transaction remainder factor: 
    - Simple return: $\frac{P'_t-P_{t-1}}{P_{t-1}} = \mu_t y_t \cdot w_{t} - 1$
    - Continuously compounded return: $\log{\frac{P'_t}{P_{t-1}}}=\log{\mu_t y_t \cdot w_{t}}$. Shape $[Batches, 1]$


* Portfolio value vector (pv_vector): Portfolio value for each sample in the batch, considering the price evolution
    - $[Batch]$ rank 1 tensor (vector): there is one value per sample/period.
    - Portfolio value ($P_f$): the value of the portfolio after $\Delta t = t_f-bs$ periods (bs = batch size):
$$P_{t_f} = P_{t_f-bs} \exp \left( \sum _{t=t_f-bs} ^{t_f + 1} r_t \right) \Rightarrow \frac{P_{t_f}}{P_{t_f-bs}} = \prod _{t=t_f-bs} ^{t_f+1} \mu_t \vec{y}_t \cdot \vec{w}_{t}; \; \mathrm{Shape}\; []\; \mathrm{(it\; is\; a\; scalar)}$$ 
    - It can also be computed step by step: calculate the transaction costs of changing the portfolio from w_previous = $w'_{t-1}$ to action = $w_t$; obtain the vector of the asset values in the portfolio, V_trans = action*P_prev-(costs,0,...,0); multiply it by the price fluctuation during the session; sum the elements of V_trans for each sample (portfolio value after each period); and then sum the portfolio values after each period. 

* Objective function ($R$): the quantity to be maximized. 
    - Simple return: the average of the simple returns $r_t$ over the batch.
    - Log return: the average of the cumulated logarithmic return
$$R(s_1, a_1, \dots, s_{t_f}, a_{t_f}, s_{t_f+1}) = \frac{1}{t_f}\log \left(\frac{P_f}{P_0}\right) = \frac{1}{t_f}\sum _{t=1}^{t_f+1}\log (\mu_t\vec{y}_t\cdot \vec{w}_{t}) = \frac{1}{t_f}\sum_{t=1}^{t_f+1}r_t; \; \mathrm{Shape}\; [Batches,1]$$
    - Sharpe ratio: the average return earned in excess of the risk-free rate per unit of volatility or total risk.
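
The quantities above can be sketched with plain NumPy. This is only an illustrative sketch that mirrors the structure of `compute_mu` and `loss_function` in the class below; the helper name `batch_metrics` and the sample numbers are assumptions, and the Sharpe ratio here, as in the class, does not subtract a risk-free rate.

```python
import numpy as np

def batch_metrics(action, w_previous, y, trading_cost, log_return=True):
    """Return (mu, returns, profit, reward, sharpe) for one batch.

    `batch_metrics` is a hypothetical helper written for this example.
    All array inputs have shape [batch, assets].
    """
    action = np.asarray(action, dtype=float)
    w_previous = np.asarray(w_previous, dtype=float)
    y = np.asarray(y, dtype=float)

    # Transaction remainder factor: 1 - cost * sum_i |w_t,i - w'_{t-1},i|
    mu = 1.0 - trading_cost * np.abs(action - w_previous).sum(axis=1)
    # Per-period growth factor mu_t * (y_t . w_t)
    profit_vector = mu * (action * y).sum(axis=1)
    # Log return or simple return per period
    returns = np.log(profit_vector) if log_return else profit_vector - 1.0

    profit = profit_vector.prod()        # growth over the whole batch
    reward = returns.mean()              # average return over the batch
    sharpe = reward / returns.std()      # population std (ddof=0), no risk-free rate
    return mu, returns, profit, reward, sharpe

action = np.array([[0.5, 0.5], [0.5, 0.5]])
w_previous = np.array([[0.5, 0.5], [0.4, 0.6]])
y = np.array([[1.1, 0.9], [1.2, 1.0]])
mu, returns, profit, reward, sharpe = batch_metrics(action, w_previous, y, trading_cost=0.01)
print(mu, profit, reward)
```

With log returns, `np.exp(returns.sum())` equals `profit`, which is the identity between the sum of $r_t$ and the product of growth factors in the portfolio value formula.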


## 1.2 Introducing the cash as an asset

The agent introduces cash as an asset so that cash can receive more weight when the rest of the assets are losing value. Cash that is not invested has a very small, constant return (the interest on the deposit in which it is held), so the profit of not investing cash is 1+(interest rate of the deposit in percent/100). Since the NN receives as input the closing prices normalized by the highest closing price of the time series for each asset, the agent is able to analyze whether an asset is losing money in trading period $t$ with respect to other days and, if so, whether this loss is worse than the constant return the cash would earn in the deposit.

Here the cash was not introduced as an asset.
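
For reference, a minimal sketch of what treating cash as an asset looks like on the price-change vector: a constant column is prepended to the asset returns, as the class below does when it builds `y_t`. The helper name `add_cash_return` is an assumption for this example, and `interest_rate` is taken to be a fraction per period (for instance 0.005), not a percentage.

```python
import numpy as np

def add_cash_return(daily_return, interest_rate):
    """Prepend a constant cash column (1 + interest_rate) to the asset returns.

    `add_cash_return` is a hypothetical helper; input shape [batch, m],
    output shape [batch, 1 + m].
    """
    daily_return = np.asarray(daily_return, dtype=float)
    cash = np.full((daily_return.shape[0], 1), 1.0 + interest_rate)
    return np.concatenate([cash, daily_return], axis=1)

daily_return = np.array([[1.1, 0.9], [1.0, 1.05]])
y_t = add_cash_return(daily_return, interest_rate=0.005)
print(y_t.shape)
```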

## 1.3 Train the NN using batch training: 

1. A batch starting with period $t_b \le t_0 - n_b$ is picked with a geometrically distributed probability (PVM class).
2. It is important that prices inside a batch are in time order: the slices of the X_t tensor are selected as X_t[:,:,index:index+n], where the third dimension is the time dimension.
3. The for loop that runs through the batch indices appends the results from each sample in the batch to lists. The lists are converted into arrays that are fed into the NN (and treated as rank 4 tensors):
- list_X_t = `(X[:,:,index:index+n], X[:,:,index+1:index+1+n], ..., X[:,:,tb+batch_size:tb+batch_size+n])` where `index = n + tb`
- list_W_t = `(W_index, W_{index+1}, ..., W_{index+batch_size})` $\Rightarrow$ Each of these W is calculated at the end of its period (considering the evolution of the price during the session)
4. train: calls the train function defined in the agent class.
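
The batch selection can be sketched as follows. The PVM class is not shown here, so this is only an assumed illustration: the function names, the decay parameter `beta`, and the convention that `t0` is the number of valid window starts are all hypothetical. Steps 1 to 3 are covered; the training call itself is left out.

```python
import numpy as np

def sample_batch_start(t0, batch_size, beta, rng):
    """Pick tb <= t0 - batch_size, favoring recent starts with geometric decay."""
    latest = t0 - batch_size
    if latest < 0:
        raise ValueError("not enough periods for one batch")
    while True:
        k = rng.geometric(beta)      # k = 1, 2, ... with P(k) = beta * (1 - beta)**(k - 1)
        tb = latest - (k - 1)
        if tb >= 0:                  # resample if the draw falls before the first period
            return tb

def build_batch(X, tb, batch_size, n):
    """Stack time-ordered windows X[:, :, i:i+n] into shape [batch, f, m, n]."""
    windows = [X[:, :, i:i + n] for i in range(tb, tb + batch_size)]
    return np.stack(windows, axis=0)

rng = np.random.default_rng(0)
f, m, T, n, batch_size = 3, 4, 50, 10, 5
X = np.arange(f * m * T, dtype=float).reshape(f, m, T)   # toy data, time on the last axis
t0 = T - n + 1                                           # number of valid window starts
tb = sample_batch_start(t0, batch_size, beta=0.1, rng=rng)
batch = build_batch(X, tb, batch_size, n)
print(tb, batch.shape)
```

Consecutive samples in the batch are the same window shifted by one period, which keeps the prices inside the batch in time order.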


In [1]:
import tensorflow as tf
import tflearn
import numpy as np
import os


class DPG:
    
    def __init__(self, num_features, num_assets, window_size, device, optimizer, trading_cost, interest_rate,
                path_to_save, model_name, LogReturn, load_weights = False, layer_type = 'Conv'):

        # parameters
        self.trading_cost = trading_cost
        self.interest_rate = interest_rate
        self.m = num_assets
        self.n = window_size
        self.num_features = num_features
        self.LogReturn = LogReturn
        self.layer_type = layer_type
        
        # Network parameters
        self.X_t = tf.placeholder(tf.float32, [None, self.num_features, self.m, self.n]) 
        self.batch_size = tf.shape(self.X_t)[0]                         # Batch size
        self.W_previous = tf.placeholder(tf.float32, [None, self.m])  # w'_{t-1}
        self.action = self.build_net()  # Output of the NN when X_t and w_previous are fed as inputs
        
        # Portfolio parameters
        self.pf_value_previous = tf.placeholder(tf.float32, [None, 1])               # p'_{t-1} 
        self.portfolioValuePrevious = tf.squeeze(self.pf_value_previous)             # From [values, 1] to [values]
        self.dailyReturn_t = tf.placeholder(tf.float32, [None, self.m])              # y_t = Open(t+1)/Open(t)
        constant_return = tf.constant(1+self.interest_rate, shape=[1, 1])            # Interest rate given by cash
        self.cash_return = tf.tile(constant_return, tf.stack([self.batch_size, 1]))  # Interest rate is the same for all samples
        self.y_t = tf.concat([self.cash_return, self.dailyReturn_t], axis=1)         # Daily returns considering the cash return
                
    
        # Function to minimize: maximize reward over the batch (min(-r) = max(r))
        self.loss_function = self.loss_function()
        self.optimizer = optimizer
        self.train_op = optimizer.minimize(-self.loss_function)
#         self.global_step = tf.Variable(0, trainable=False)
#         self.optimize=tf.train.AdamOptimizer(self.learning_rate).minimize(self.loss,global_step=self.global_step)
#         self.sess = sess    
        tf_config = tf.ConfigProto()
        # Configure the GPU memory fraction before creating the session so the setting is applied
        if device == "cpu":
            tf_config.gpu_options.per_process_gpu_memory_fraction = 0
        else:
            tf_config.gpu_options.per_process_gpu_memory_fraction = 0.2
        self.sess = tf.Session(config=tf_config)
        
        self.model_name = model_name
        self.path_to_save = path_to_save
        self.saver = tf.train.Saver(max_to_keep=10)
        
        if load_weights:
            print("Loading Model")
            try:
                checkpoint = tf.train.get_checkpoint_state(path_to_save)
                print('Loading from: ' + path_to_save)
                print(checkpoint, checkpoint.model_checkpoint_path)
                if checkpoint and checkpoint.model_checkpoint_path:
                    tf.reset_default_graph()
                    self.saver.restore(self.sess, checkpoint.model_checkpoint_path)
                    print("Successfully loaded:", checkpoint.model_checkpoint_path)
                else:
                    print("Could not find old network weights")
                    self.sess.run(tf.global_variables_initializer())
            except Exception:  # e.g. checkpoint is None when nothing has been saved yet
                print("Could not find old network weights")
                self.sess.run(tf.global_variables_initializer())
        else:
            self.sess.run(tf.global_variables_initializer())
        
        
    def build_net(self):
        network = tf.transpose(self.X_t, [0, 2, 3, 1])  # Reshape [Batches, Assets, Periods, Features]
        network = tflearn.layers.conv_2d(network, 3,
                                         [1, 3],
                                         [1, 1, 1, 1],
                                         'valid',
                                         'relu')
        self.first_layer = network
        # Second layer 
        width = network.get_shape()[2]
        network = tflearn.layers.conv_2d(network, 48,
                                         [1, width],
                                         [1, 1],
                                         "valid",
                                         'relu',
                                         regularizer="L2",
                                         weight_decay=5e-9)
        self.second_layer = network
        # Third layer
        w_previous = self.W_previous[:,:]
        network=tf.concat([network,tf.reshape(w_previous, [-1, self.m, 1, 1])],axis=3)
        network = tflearn.layers.conv_2d(network, 1,
                                         [1, network.get_shape()[2]],
                                         [1, 1],
                                         "valid",
                                         'relu',
                                         regularizer="L2",
                                         weight_decay=5e-9)
        growth_potential = network[:, :, 0, 0]  # Squeeze dimensions: [Batches, Assets, 1, 1] -> [Batches, Assets]
        self.growth_potential = growth_potential
        network=tf.layers.flatten(network)
        w_init = tf.random_uniform_initializer(-0.005, 0.005)
        action = tf.layers.dense(network, self.m, activation=tf.nn.softmax, kernel_initializer=w_init)

        return action

    # Compute loss function
    def loss_function(self):
        if self.LogReturn:
            
            # PROFIT VECTOR: P_t/P_{t-1} = exp(r_t) = mu_t * sum over the assets (action*y_t) 
            # profit_vector = (y_t1 * w_t1,..., y_tn * w_tn) tn = t1 + batch_size = last period (sample in the batch)
            # return_vector = (r_t1, ..., r_tn)
            self.profit_vector = tf.reduce_sum(self.action * self.dailyReturn_t, axis=1) * self.compute_mu() 
            self.return_vector = tf.log(self.profit_vector) 

            # PROFIT: P(t)/P(t-bs)=exp(sum(_(t-bs)^t) r_t) = (prod(_(t-bs)^t)w_t*y_t) profit obtained after each batch
            self.profit = tf.reduce_prod(self.profit_vector)          # Multiplies all the elements of profit_vector
            self.mean = tf.reduce_mean(self.return_vector)            # Average daily return (through all the batches)
            self.reward = self.mean                                   # Cumulated return (eq 22)
            # Risk measure
            self.standard_deviation = tf.sqrt(tf.reduce_mean((self.return_vector - self.mean) ** 2))
            self.sharpe_ratio = (self.mean) / self.standard_deviation
    
            loss_function = self.set_loss_function()                  # Loss function to train the NN
            

        # Simple reward: r_t = (p_t-p_t-1)/p_t-1 = mu_t*y_t*w_t - 1 (w_t = action)
        else:   
            # Vector of the returns obtained for each period (r_t1, ..., r_tn) such that tn = t1+batch_size
            # Return: r_t = (p_t-p_t-1)/p_t-1 => Profit: p_t/p_t-1 = r_t + 1 => p_t = p_t-1(r_1 + 1)
            # Use dailyReturn_t ([Batches, m]) so the shape matches self.action; self.y_t has an extra cash column
            self.return_vector = tf.reduce_sum(self.action * self.dailyReturn_t, axis=1) * self.compute_mu() - 1
            self.profit_vector = 1 + self.return_vector
            
            self.profit = tf.reduce_prod((1 + self.return_vector))  # P_t/p_t-bs
            self.mean = tf.reduce_mean(self.return_vector)          # Average daily return (through all the batches)
            self.reward = self.mean                                 # Cumulated return (eq 22)
            # self.portfolioValues = self.profit_vector * self.portfolioValuePrevious

            # Risk measure
            self.standard_deviation = tf.sqrt(tf.reduce_mean((self.return_vector - self.mean) ** 2))
            self.sharpe_ratio = (self.mean) / self.standard_deviation
            
            loss_function =  self.set_loss_function()               # Loss function to train the NN
            

        return loss_function
                
    
    # Transaction remainder factor 
    def compute_mu(self):
        return 1 - tf.reduce_sum(tf.abs(self.action[:,:]-self.W_previous[:,:]), axis=1) * self.trading_cost # [Batches]
    
   
    # Define the objective that the agent optimizes (so as to maximize the reward)
    # Keep in mind that what is minimized is minus this function (see self.train_op)
    def set_loss_function(self):
        LAMBDA = 1e-4 
        
        # Minimizes the minus cumulated returns (maximizes the cumulated returns)
        def loss_function1():
            if self.LogReturn:
                return self.reward #* 1000000
            else: 
                return self.reward

        # Minimizes the minus sharpe ratio   
        def loss_function3():
            if self.LogReturn:
#                 print('Add regularization to avoid actions too big')
#                 return self.sharpe_ratio - 0.1*tf.reduce_max(self.action)
                return self.sharpe_ratio
#                 return self.sharpe_ratio - tf.reduce_mean(tf.reduce_sum(tf.abs(self.action[:, :] - self.W_previous[:,:])\
#                                             *self.trading_cost, reduction_indices=[1])) #- 0.1*tf.reduce_max(self.action)
            else: 
                return self.sharpe_ratio

        
        loss_function = loss_function1
        loss_tensor = loss_function()
        regularization_losses = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
        if regularization_losses:
            for regularization_loss in regularization_losses:
                loss_tensor += regularization_loss
        return loss_tensor

    # Compute the agent's action   
    def compute_W(self, X_t_, W_previous_):
        # Squeeze with NumPy after the run so that no new graph op is created on every call
        return np.squeeze(self.sess.run(self.action, feed_dict={self.X_t: X_t_, self.W_previous: W_previous_}))
    
    # Train the NN by maximizing the reward: the inputs are batches of the different values
    def train(self, X_t_, W_previous_, pf_value_previous_, dailyReturn_t_):
        self.sess.run(self.train_op, feed_dict={self.X_t: X_t_,                             
                                                self.W_previous: W_previous_,
                                                self.pf_value_previous: pf_value_previous_,
                                                self.dailyReturn_t: dailyReturn_t_})
        
    # Save model parameters
    def save_model(self):
        if not os.path.exists(self.path_to_save):
            os.makedirs(self.path_to_save)
        self.saver.save(self.sess, self.path_to_save + self.model_name)
        
    # Getters of interesting variables: 
    def get_sharpe_ratio(self,  X_t_, W_previous_, pf_value_previous_, dailyReturn_t_):
        return self.sess.run(self.sharpe_ratio, feed_dict={self.X_t: X_t_,                             
                                                self.W_previous: W_previous_,
                                                self.pf_value_previous: pf_value_previous_,
                                                self.dailyReturn_t: dailyReturn_t_})
    def get_average_daily_return(self,  X_t_, W_previous_, pf_value_previous_, dailyReturn_t_):
        return self.sess.run(self.mean, feed_dict={self.X_t: X_t_,                             
                                                self.W_previous: W_previous_,
                                                self.pf_value_previous: pf_value_previous_,
                                                self.dailyReturn_t: dailyReturn_t_})
    def get_profit(self,  X_t_, W_previous_, pf_value_previous_, dailyReturn_t_):
        return self.sess.run(self.profit, feed_dict={self.X_t: X_t_,                             
                                                self.W_previous: W_previous_,
                                                self.pf_value_previous: pf_value_previous_,
                                                self.dailyReturn_t: dailyReturn_t_})
    def get_loss_function(self,  X_t_, W_previous_, pf_value_previous_, dailyReturn_t_):
        return self.sess.run(self.loss_function, feed_dict={self.X_t: X_t_,                             
                                                self.W_previous: W_previous_,
                                                self.pf_value_previous: pf_value_previous_,
                                                self.dailyReturn_t: dailyReturn_t_})
    
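    # NOTE: self.w_evol is not defined anywhere in this class, so calling this method raises AttributeError
    def get_w_evol(self,  X_t_, W_previous_, dailyReturn_t_):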
        return self.sess.run(self.w_evol, feed_dict={self.X_t: X_t_,                             
                                                self.W_previous: W_previous_,
                                                self.dailyReturn_t: dailyReturn_t_})
    def get_first_layer(self, X_t_, W_previous_):
        return self.sess.run(self.first_layer, feed_dict={self.X_t: X_t_, self.W_previous: W_previous_})
    
    def get_second_layer(self, X_t_, W_previous_):
        return self.sess.run(self.second_layer, feed_dict={self.X_t: X_t_, self.W_previous: W_previous_})
    
    def get_growth_potential(self, X_t_, W_previous_):
        return self.sess.run(self.growth_potential, feed_dict={self.X_t: X_t_, self.W_previous: W_previous_})



In [40]:
# import tensorflow as tf
# import numpy as np
# tf.reset_default_graph()
# sess = tf.Session()
# optimizer = tf.train.AdamOptimizer(9e-2)
# # Initialize network agents
# simple_agent = DPG(4, 5, 10, sess, optimizer, 0.0025, 0.005, LogReturn = False)

