# Market environment 

Trading environment (render) of the project:
* The trading agent calls the class with an action at time t.
* The class (render) then returns the new portfolio at the next step (time t+1).

Parameters:
* window_length: Number of past time slots used to build the input tensor.
* portfolio_value: Initial value of the portfolio.
* trading_cost: Cost (in % of the traded stocks) the agent will pay to execute the action.
* interest_rate: Rate of interest (in % of the money the agent holds) that the agent will:
    - receive at each step if it holds a positive amount of money
    - pay if it holds a negative amount of money
* train_size: % of the data used to train the agent. Note that the split is made along the time axis (train -> | time T | -> test).
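
As a minimal sketch of how `train_size` and the window length could set the train/test boundary, the snippet below mirrors the `end_train` expression used in the environment's constructor. The tensor shape and parameter values are made up for illustration.

```python
import numpy as np

# Hypothetical data tensor laid out as [features, assets, periods]; values are random.
data = np.random.default_rng(0).random((3, 4, 100))

window_length = 10   # past periods used to build each input tensor
train_size = 0.8     # fraction of the remaining periods kept for training

# Periods left once one full window has been set aside.
remaining_periods = data.shape[2] - window_length
end_train = int(remaining_periods * train_size)

print(end_train)  # 72
```

Everything before `end_train` along the time axis is treated as training data in this sketch; the rest is left for testing.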
    
Calculations for each of the samples (not batches): 
The portfolio features are the tensors that characterize the portfolio:

* Relative price vector ($y_t$): Changes of the prices during the session or relative prices. There are 2 approaches:
    - $y_t = \frac{\text{closing}_t}{\text{opening}_{t}}$. Shape $[1, 1+m]$ vector of relative prices for each asset and 1 sample.
    - $y_t = \frac{\text{opening}_t}{\text{opening}_{t-1}}$

* Future_weight_vec ($w'_t$ or $w_{evol}$): the portfolio weight vector at the end of the trading period. It is given by (element-wise product in the numerator):
$$w'_t = \frac{\vec{y}_t\odot\vec{w}_{t-1}}{\sum_{i=0}^m y_{t,i}\cdot w_{t-1,i}};\; \mathrm{Shape}\; [Batches, 1+m]$$
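
The two portfolio features above can be sketched with NumPy as follows. This is an illustration only: the prices are made up, the helper names `relative_prices` and `evolve_weights` are hypothetical, and the cash entry is assumed to sit at position 0 with a growth factor of 1.

```python
import numpy as np

# Made-up opening prices for m = 2 assets over 3 periods: shape [periods, m].
opening = np.array([[10.0, 20.0],
                    [11.0, 19.0],
                    [12.1, 19.0]])

def relative_prices(opening, t, cash_factor=1.0):
    """y_t = opening_t / opening_{t-1}, with the cash factor prepended."""
    return np.concatenate(([cash_factor], opening[t] / opening[t - 1]))

def evolve_weights(y_t, w_prev):
    """w'_t = (y_t * w_{t-1}) / (y_t . w_{t-1}), element-wise product on top."""
    return (y_t * w_prev) / np.dot(y_t, w_prev)

y_1 = relative_prices(opening, t=1)   # approximately [1.0, 1.1, 0.95]
w_0 = np.array([0.2, 0.5, 0.3])       # cash + 2 assets, sums to 1
w_1_evolved = evolve_weights(y_1, w_0)

print(y_1)
print(w_1_evolved, w_1_evolved.sum())
```

Because the denominator is the sum of the numerator's entries, the evolved weights sum to 1 again.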


## Simple return:
Notation:
* $Pf = Pf_{previous}\cdot r_t = \sum_{i=0}^{m} v_{previous,i}$ : portfolio value
* $\vec{v}_{previous} = \vec{w}\cdot Pf_{previous}$ : vector of the values of the assets in the portfolio where $\vec{w}$ is the action just taken.
    
Given an action, $\vec{a} = \vec{w}$, the portfolio value is given by:
$$P_t = P_{t-1}\cdot \vec{y}_t\cdot \vec{a}_{t-1}$$
where $y_t =[(1+\text{int rate}), \frac{\text{opening}_t}{\text{opening}_{t-1}}]$ represents the relative price change. 

Each trade incurs a transaction cost, so rebalancing the portfolio costs $c$ in total. This is captured by a factor $\mu_t$ such that $pf_{evol} = \mu_t\cdot P_t$, with $pf_{previous} = P_{t-1}$. The reward of the portfolio is then given by:
$$\text{reward} = \frac{pf_{evol}-pf_{previous}}{pf_{previous}} = \frac{\mu_t P_t}{P_{t-1}}-1 = \mu_t \cdot y_t\cdot w_{t-1}-1$$ 
where $pf_{evol}$ is the total portfolio value at the end of the period (prices evolve according to $\vec{y}_t$) once transaction costs have been paid:

$pf_{evol} = \sum_{i=0}^{m} v_{evol,i} =  \sum_{i=0}^{m} v_{trans,i}\cdot y_{t,i}$ and $\vec{v}_{trans} = \vec{v}_{previous} - (costs, 0,0,\dots , 0)$


<!--- $pf_{evol} = (pf_{previous} - \text{costs})\cdot y_t = \mu_t \cdot w_{alloc} \cdot y_t = \mu_t \cdot w_{t-1} \cdot y_t $-->

Therefore, knowing $\vec{v}_{evol}$ and $pf_{evol}$, the weight vector at the end of the period, once transaction costs are accounted for, is:

$$w_{evol} = \frac{v_{evol}}{\sum _{i = 0}^{m}v_{evol,i}}$$
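
The simple-return calculation can be traced on toy numbers. The sketch below follows the same order of operations as the environment's `step` method (cost, allocation, cost paid from cash, price evolution), but the function name `simple_return_step` and all input values are made up for illustration.

```python
import numpy as np

def simple_return_step(action, w_previous, pf_previous, y_t, trading_cost):
    """One period of portfolio evolution with costs paid from cash (index 0)."""
    # Cost is proportional to the traded fraction of the non-cash assets.
    cost = pf_previous * np.abs(action[1:] - w_previous[1:]).sum() * trading_cost
    v_alloc = pf_previous * action      # value per asset for the new action
    v_trans = v_alloc.copy()
    v_trans[0] -= cost                  # pay the cost from the cash position
    v_evol = v_trans * y_t              # apply the relative price changes
    pf_evol = v_evol.sum()              # portfolio value at the end of the period
    w_evol = v_evol / pf_evol           # weights at the end of the period
    reward = (pf_evol - pf_previous) / pf_previous
    return pf_evol, w_evol, reward

w_previous = np.array([1.0, 0.0, 0.0])   # all cash before trading
action = np.array([0.2, 0.5, 0.3])       # target weights: cash + 2 assets
y_t = np.array([1.0, 1.1, 0.95])         # cash factor and relative prices

pf_evol, w_evol, reward = simple_return_step(action, w_previous, 1000.0, y_t, 0.01)
print(pf_evol, reward)   # about 1027.0 and 0.027
print(w_evol)
```

Here the cost is 8 (0.8 of the portfolio is traded at 1%), so the cash position drops from 200 to 192 before prices move.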

## Log return:

* Logarithmic rate of return ($r_t$) or immediate reward: $\log(\mu_t \vec{y}_t \cdot \vec{w}_{t-1})$. Shape $[1, 1]$

* Portfolio value vector (__pv_vector): Portfolio value for each batch (the value of the portfolio after computing the action with n_b samples)
    - $[Batch]$ rank 1 tensor (vector):  There is a value per batch.
    - Portfolio value ($P_f$): the value of the portfolio after $\Delta t = t_f-t_0$ periods:
$$P_{t_f} = P_0 \exp \left( \sum _{t=1} ^{t_f + 1} r_t \right) = P_0 \prod _{t=1} ^{t_f+1} \mu_t \vec{y}_t \cdot \vec{w}_{t-1}; \; \mathrm{Shape}\; []\; \mathrm{(scalar)}$$ 

* Cumulative reward function ($R$): the quantity to be maximized. It is given by the average of the cumulative logarithmic return:
$$R(s_1, a_1, \dots, s_{t_f}, a_{t_f}, s_{t_f+1}) = \frac{1}{t_f}\log \left(\frac{P_f}{P_0}\right) = \frac{1}{t_f}\sum _{t=1}^{t_f+1}\log (\mu_t\vec{y}_t\cdot \vec{w}_{t-1}) = \frac{1}{t_f}\sum_{t=1}^{t_f+1}r_t; \; \mathrm{Shape}\; [Batches,1]$$
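
The relation between log returns, the final portfolio value, and the average reward can be checked numerically. The per-period growth factors below are made up, and the average is taken over the number of periods actually summed, which is an assumption of this sketch.

```python
import numpy as np

# Hypothetical per-period growth factors mu_t * (y_t . w_{t-1}).
growth = np.array([1.02, 0.99, 1.01, 1.03])
p_0 = 1000.0

log_returns = np.log(growth)                     # r_t for each period
p_final = p_0 * np.exp(log_returns.sum())        # P_0 * exp(sum of r_t)
p_final_direct = p_0 * growth.prod()             # P_0 * product of the factors
avg_log_return = log_returns.mean()              # R: average log return
avg_from_values = np.log(p_final / p_0) / len(growth)

print(p_final, p_final_direct)
print(avg_log_return, avg_from_values)
```

Both ways of computing the final value agree, which is why summing log returns is equivalent to compounding the period-by-period growth factors.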


In [1]:
import math
import gym
from gym import spaces, logger
from gym.utils import seeding  # needed by MarketEnvironment.seed()
import numpy as np
from gym.envs.registration import register
import tensorflow as tf


class MarketEnvironment():

    def __init__(self, path, window_length, initial_portfolio_value, 
                 trading_cost, interest_rate, train_size, LogReturn):
        
        # Load data [features, assets, previous periods]
        self.path = path
        self.data = np.load(self.path)

        # Parameters of the trading agent
        self.portfolio_value = initial_portfolio_value  # How much cash used to create the portfolio
        self.window_length = window_length
        self.trading_cost = trading_cost
        self.interest_rate = interest_rate
        self.LogReturn = LogReturn

        # Number of stocks and features
        self.num_stocks = self.data.shape[1]
        self.num_features = self.data.shape[0]
        self.end_train = int((self.data.shape[2]-self.window_length)*train_size)
        
        # Init state and index
        self.index = None  # Represents the period t for which the computations are made
        self.state = None
        self.done = False

        #init seed
#         self.seed()
        
    # Return the value of the portfolio
    def return_pf(self):
        return self.portfolio_value
        
    # Tensor which will be the input for the NN [Batches, features, assets, previous periods]   
    def readTensor(self, X, index):
        # Reads index-n:index -> t + n - n: t + n 
        # index = t + n where t is the period given
        return X[ : , :, index-self.window_length:index]
    
    # Calculate the return of each stock for the day t 
    def getFluctuationVector(self, index):
        #print(index)
        # Prepends [1 + self.interest_rate] to data[-1, :, index] (last feature, all assets, period t + n = index)
        return np.array([1 + self.interest_rate] + self.data[-1,:,index].tolist())

    # Get random seed
    def seed(self, seed=None):
        self.np_random, seed = seeding.np_random(seed)
        return [seed]
    
    # Restarts the environment with given initial weights and given value of portfolio
    def reset(self, w_init, p_init, t = 0):
        
        self.index = self.window_length + t  # Period for which the computations are made
        self.state= (self.readTensor(self.data, self.index), w_init, p_init)
        self.done = False
        
        return self.state, self.done
    

    # Compute the new value of the portfolio
    def step(self, action):
        """
        At each step t, the trading agent passes in the action it wants to take: w_t
        The function computes the new value of the portfolio at step (t+1): P_{t+1}
        It also returns the reward associated with the action the agent took: r_t
        The reward is defined as the change in the value of the portfolio in %
        (or its logarithm when LogReturn is set).
        """
        index = self.index
        # Get tensor X (self.data), from index-window_size:index previous periods
        data = self.readTensor(self.data, index)
        done = self.done

        # Beginning of the day (t)
        state = self.state      # State of the market {X_t, w_{t-1}, pf_{t-1}}
        w_previous = state[1]   # Action taken in period t-1 evolved due to fluctuations of price during (t-1, t) session
        pf_previous = state[2]  # Value of the portfolio at the end of the previous period (t-1)
        
        # Fluctuations in price plus the interest rate for money (to evaluate cash bias): [1+int_rate, open(index+1)/open(index)]
        update_vector = self.getFluctuationVector(index)
            
        # Compute transaction cost
        cost = pf_previous * np.linalg.norm((action[1:] - w_previous[1:]), ord = 1) * self.trading_cost
        
        # Transaction remainder factor: Pevol = mu*Pprev = Pprev - costs; mu = 1-costs
        # So as to calculate the reward without the need of the portfolio previous value
#         mu = 1 - np.linalg.norm((action-w_previous), ord = 1)* self.trading_cost

        # Value vector of the assets: value of each of the assets in the portfolio for the new action
        v_alloc = pf_previous * action
        
        # Pay transaction costs (deducted from the cash position)
        v_trans = v_alloc - np.array([cost] + [0] * self.num_stocks)
        
        # Compute features of period t considering the price evolution during the session
        v_evol = v_trans * update_vector  # Value of each asset at the end of the t period
        pf_evol = np.sum(v_evol)        # Portfolio value at the end of period t
        w_evol = v_evol / pf_evol         # Weight vector at the end of period t
        
        # Compute reward:
        if not self.LogReturn:
            reward = (pf_evol - pf_previous)/pf_previous
#             reward2 = np.dot(action, update_vector)*mu - 1     # Same result
        else:
            reward = np.log(pf_evol/pf_previous)
#             reward2 = np.log(np.dot(action,update_vector)*mu)  # Same result
           
        # Update index to get the next state (state of the following sample)
        index = index + 1  
        state = (self.readTensor(self.data, index), w_evol, pf_evol)  # State = (X_t+1, w't, p't)
        
        if index >= self.end_train:
            done = True
    
        self.state = state
        self.index = index
        self.done = done

        return state, reward, done