# Notebook Instructions

1. If you are new to Jupyter notebooks, please go through this introductory manual <a href='https://quantra.quantinsti.com/quantra-notebook' target="_blank">here</a>.
1. Any changes made in this notebook would be lost after you close the browser window. **You can download the notebook to save your work on your PC.**
1. Before running this notebook on your local PC:<br>
i.  You need to set up a Python environment and the relevant packages on your local PC. To do so, go through the section on "**Run Codes Locally on Your Machine**" in the course.<br>
ii. You need to **download the zip file available in the last unit** of this course. The zip file contains the data files and/or Python modules that might be required to run this notebook.

# Game Class

In this notebook, you will learn to create a full `Game` class. In the previous notebooks, you learned to update the positions, design the reward system and assemble the states. Here, you will put these pieces together inside the `Game` class.

You will perform the following steps:

1. [Import libraries](#libraries)
2. [Read price data](#price)
3. [Design reward](#reward)
4. [Construct Game class](#game)

<a id='libraries'></a> 
## Import libraries

In [1]:
# Import necessary libraries
import pandas as pd
import numpy as np
import talib
# Only timedelta is used below; importing the datetime module afterwards would shadow the datetime class
from datetime import timedelta

# Ignore warnings
import warnings
warnings.filterwarnings("ignore")

<a id='price'></a> 
## Read price data

We will read the 5-minute price data from the compressed pickle file. You have already done these steps in the previous notebooks. You can find the data file in the last section of this course, **"Python Data and Codes"**.

In [10]:
# The data is stored in the directory '../data_modules/'
path = '../data_modules/'

bars5m = pd.read_pickle(path + 'PriceData5m.bz2')

bars5m.head()

| Time | open | high | low | close | volume |
|---|---|---|---|---|---|
| 2010-01-04 09:35:00-05:00 | 91.711 | 91.809 | 91.703 | 91.76 | 4448908.0 |
| 2010-01-04 09:40:00-05:00 | 91.752 | 91.973 | 91.752 | 91.932 | 4380988.0 |
| 2010-01-04 09:45:00-05:00 | 91.94 | 92.022 | 91.928 | 92.005 | 2876633.0 |
| 2010-01-04 09:50:00-05:00 | 92.005 | 92.177 | 91.973 | 92.177 | 4357079.0 |
| 2010-01-04 09:55:00-05:00 | 92.168 | 92.177 | 92.038 | 92.079 | 2955068.0 |


<a id='reward'></a> 
## Design reward 

You have already designed the reward system based on the pnl in the "Reward Design" notebook.

In [3]:
def get_pnl(entry, curr_price, position):
    # Transaction cost and commissions
    tc = 0.001
    return (curr_price*(1-tc) - entry*(1+tc))/entry*(1+tc)*position


def reward_pure_pnl(entry, curr_price, position):
    '''pure pnl'''
    return get_pnl(entry, curr_price, position)


def reward_positive_pnl(entry, curr_price, position):
    '''Positive pnl, zero otherwise'''
    pnl = get_pnl(entry, curr_price, position)

    if pnl >= 0:
        return pnl

    else:
        return 0
    
    
def reward_pos_log_pnl(entry, curr_price, position):
    '''Positive log pnl, zero otherwise'''
    pnl = get_pnl(entry, curr_price, position)

    if pnl >= 0:
        return np.ceil(np.log(pnl*100+1))
    else:
        return 0


def reward_categorical_pnl(entry, curr_price, position):
    '''Sign of pnl'''
    pnl = get_pnl(entry, curr_price, position)
    return np.sign(pnl)


def reward_positive_categorical_pnl(entry, curr_price, position):
    '''1 for win, 0 for loss'''
    pnl = get_pnl(entry, curr_price, position)
    if pnl >= 0:
        return 1
    else:
        return 0


def reward_exponential_pnl(entry, curr_price, position):
    '''Exponential percentage pnl'''
    pnl = get_pnl(entry, curr_price, position)
    return np.exp(pnl)
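To see how the reward functions differ, here is a small standalone check. It copies `get_pnl` and three of the reward functions above without changing their logic and evaluates them on made-up prices (entry at 100, current price at 101) for a long and a short position. The prices are illustrative, not taken from the price data.

```python
import numpy as np


def get_pnl(entry, curr_price, position):
    # Same formula as in the notebook, with a 0.1% transaction cost
    tc = 0.001
    return (curr_price*(1-tc) - entry*(1+tc))/entry*(1+tc)*position


def reward_positive_pnl(entry, curr_price, position):
    # Keep gains, clip losses to zero
    pnl = get_pnl(entry, curr_price, position)
    return pnl if pnl >= 0 else 0


def reward_categorical_pnl(entry, curr_price, position):
    # +1 for a gain, -1 for a loss
    return np.sign(get_pnl(entry, curr_price, position))


def reward_exponential_pnl(entry, curr_price, position):
    # exp(pnl) is always positive and close to 1 for small pnl
    return np.exp(get_pnl(entry, curr_price, position))


# Hypothetical trade: entry at 100, current price 101
for position in (1, -1):  # 1 = long, -1 = short
    print(position,
          round(get_pnl(100, 101, position), 6),
          reward_positive_pnl(100, 101, position),
          reward_categorical_pnl(100, 101, position),
          round(reward_exponential_pnl(100, 101, position), 6))
```

Because `position` multiplies the whole expression, including the cost term, the short pnl is the exact negative of the long pnl. As written, the transaction cost therefore lowers a long's gain but also shrinks a short's loss.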

<a id='game'></a> 
## Construct Game class

You have already implemented most of this class: initialising the Game class, updating the position, calculating the reward, creating the input features and assembling the state.

We now add two more methods to complete the Game class.

`get_state`: This method returns the state of the system, including candlesticks, technical indicators, day of the week, time of day and position.

`act`: This method interacts with the trading algorithm. It takes as a parameter the action suggested by the neural network and returns a flag indicating whether the game is over, together with the reward, which is calculated only once the game is over.
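The action encoding used by `_update_position` (0 = hold, 1 = sell, 2 = buy) can be summarised as a small transition function. The sketch below is a hypothetical, standalone restatement of that logic (the names `next_position`, `HOLD`, `SELL` and `BUY` are illustrative and not part of the `Game` class). It returns the new position and a game-over flag.

```python
HOLD, SELL, BUY = 0, 1, 2  # action codes used by the Game class


def next_position(position, action):
    """Return (new_position, is_over) for a position in {-1, 0, 1}."""
    if action == BUY:
        if position == -1:
            return position, True   # buying while short closes the trade
        return 1, False             # open a long, or stay long
    if action == SELL:
        if position == 1:
            return position, True   # selling while long closes the trade
        return -1, False            # open a short, or stay short
    return position, False          # hold (or any other code): nothing changes


# Replay the sequence used later in this notebook: sell, then buy
pos, over = next_position(0, SELL)
print(pos, over)
pos, over = next_position(pos, BUY)
print(pos, over)
```

As in the class, the position is left unchanged when the game ends, so the reward can still be computed with the sign of the trade that was just closed.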

In [4]:
class Game(object):

    def __init__(self, bars5m, bars1d, bars1h, reward_function, lkbk=20,  init_idx=None):

        # Initialise 5 mins frequency data
        self.bars5m = bars5m
        # Initialise lookback period for the calculation of technical indicators
        self.lkbk = lkbk
        # Initialise the length of each trade (number of 5-minute bars)
        self.trade_len = 0
        # Initialise 1 day frequency data
        self.bars1d = bars1d
        # Initialise 1 hour frequency data
        self.bars1h = bars1h
        # Flag indicating whether the game (a single trade) is over
        self.is_over = False
        # Initialise the reward value
        self.reward = 0
        # Accumulates the pnl across episodes
        self.pnl_sum = 0
        # Supply a starting index which indicates a position in our price dataframe
        # and denotes the point at which the game starts
        self.init_idx = init_idx
        # Store the reward function
        self.reward_function = reward_function
        # Initialise the state variables for a new game
        self.reset()

# ---------------------------------------------------------------------------------------------

    def _update_position(self, action):
        '''This is where we update our position'''

        # If the action is zero or hold, do nothing
        if action == 0:
            pass

        elif action == 2:
            """---Enter a long or exit a short position---"""

            # Current position (long) same as the action (buy), do nothing
            if self.position == 1:
                pass

            # No current position, and action is buy, update the position to indicate buy
            elif self.position == 0:
                self.position = 1
                self.entry = self.curr_price
                self.start_idx = self.curr_idx

            # Current position (short) is different from the action (buy), end the game
            elif self.position == -1:
                self.is_over = True

        elif action == 1:
            """---Enter a short or exit a long position---"""

            # Current position (short) same as the action (sell), do nothing
            if self.position == -1:
                pass

            # No current position, and action is sell, update the position to indicate sell
            elif self.position == 0:
                self.position = -1
                self.entry = self.curr_price
                self.start_idx = self.curr_idx

            # Current position (long) is different from the action (sell), end the game
            elif self.position == 1:
                self.is_over = True

# ---------------------------------------------------------------------------------------------

    def _get_reward(self):
        """Calculate the reward when the game is finished, using the reward
        function passed to the constructor (exponential pnl in this notebook).
        """
        if self.is_over:
            self.reward = self.reward_function(
                self.entry, self.curr_price, self.position)

# ---------------------------------------------------------------------------------------------

    def _get_last_N_timebars(self):
        '''This function gets the timebars for the 5 mins, 1 hour and 1 day resolution based on the lookback we've specified.'''

        # Look-back windows, in days, for the 5m, 1h and 1d bars
        self.wdw5m = 9
        self.wdw1h = np.ceil(self.lkbk*15/24.)
        self.wdw1d = np.ceil(self.lkbk*15)

        '''Creating the candlesticks based on windows'''
        self.last5m = self.bars5m[self.curr_time -
                                  timedelta(self.wdw5m):self.curr_time].iloc[-self.lkbk:]
        self.last1h = self.bars1h[self.curr_time -
                                  timedelta(self.wdw1h):self.curr_time].iloc[-self.lkbk:]
        self.last1d = self.bars1d[self.curr_time -
                                  timedelta(self.wdw1d):self.curr_time].iloc[-self.lkbk:]

# ---------------------------------------------------------------------------------------------

    def _assemble_state(self):
        self._get_last_N_timebars()

        """Adding State Variables"""
        self.state = np.array([])

        """Adding candlesticks"""
        def get_normalised_bars_array(bars):
            bars = bars.iloc[-10:].values.flatten()
            bars = (bars-np.mean(bars))/np.std(bars)
            return bars

        self.state = np.append(self.state, get_normalised_bars_array(
            self.last5m[['open', 'high', 'low', 'close']]))
        self.state = np.append(
            self.state, get_normalised_bars_array(self.last1h))
        self.state = np.append(
            self.state, get_normalised_bars_array(self.last1d))

        """Adding technical indicators"""
        def get_technical_indicators(bars):
            # Create an array to store the value of indicators
            tech_ind = np.array([])

            """Relative difference between two moving averages"""
            sma1 = talib.SMA(bars['close'], self.lkbk-1)[-1]
            sma2 = talib.SMA(bars['close'], self.lkbk-8)[-1]
            tech_ind = np.append(tech_ind, (sma1-sma2)/sma2)

            """Relative Strength Index"""
            tech_ind = np.append(tech_ind, talib.RSI(
                bars['close'], self.lkbk-1)[-1])

            """Momentum"""
            tech_ind = np.append(tech_ind, talib.MOM(
                bars['close'], self.lkbk-1)[-1])

            """Balance of Power"""
            tech_ind = np.append(tech_ind, talib.BOP(bars['open'],
                                                     bars['high'],
                                                     bars['low'],
                                                     bars['close'])[-1])

            """Aroon Oscillator"""
            tech_ind = np.append(tech_ind, talib.AROONOSC(bars['high'],
                                                          bars['low'],
                                                          self.lkbk-3)[-1])
            return tech_ind

        self.state = np.append(
            self.state, get_technical_indicators(self.last5m))
        self.state = np.append(
            self.state, get_technical_indicators(self.last1h))
        self.state = np.append(
            self.state, get_technical_indicators(self.last1d))

        """Time of the day and day of the week"""
        # Store on self so that the current values (not the zeros set in reset) enter the state
        self._time_of_day = (self.curr_time.hour*60 + self.curr_time.minute)/(24*60)
        self._day_of_week = self.curr_time.weekday()/6

        self.state = np.append(self.state, self._time_of_day)
        self.state = np.append(self.state, self._day_of_week)
        self.state = np.append(self.state, self.position)

# ---------------------------------------------------------------------------------------------

    def get_state(self):
        """This function returns the state of the system.
        Returns:
            self.state: the state including candlestick bars, indicators, time signatures and position.
        """
        # Assemble new state
        self._assemble_state()
        return np.array([self.state])

# ---------------------------------------------------------------------------------------------

    def act(self, action):
        """This is the point where the game interacts with the trading
        algorithm. It returns a tuple (is_over, reward); the reward is
        calculated only once the game is over.
        """

        self.curr_time = self.bars5m.index[self.curr_idx]
        # Positional access with iloc, since the index is a DatetimeIndex
        self.curr_price = self.bars5m['close'].iloc[self.curr_idx]

        self._update_position(action)

        # Unrealized or realized pnl. This is different from the pnl in the reward method, which is only realized pnl.
        # When no trade is open (entry == 0), the pnl is 0.
        self.pnl = (self.curr_price - self.entry)*self.position/self.entry if self.entry else 0

        self._get_reward()
        if self.is_over:
            self.trade_len = self.curr_idx - self.start_idx

        return self.is_over, self.reward

    def reset(self):
        """Reset the system for each new trading game.
        The 1-hour and 1-day bars are resampled once, outside this class.
        Ideally, we would resample them on every update, but that would take very long.
        """
        self.pnl = 0
        self.entry = 0
        # Clear the flags of any previous game
        self.is_over = False
        self.reward = 0
        self._time_of_day = 0
        self._day_of_week = 0
        self.curr_idx = self.init_idx
        self.t_in_secs = (
            self.bars5m.index[-1]-self.bars5m.index[0]).total_seconds()
        self.start_idx = self.curr_idx
        self.curr_time = self.bars5m.index[self.curr_idx]
        self._get_last_N_timebars()
        self.position = 0
        self.act(0)
        self.state = []
        self._assemble_state()
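Three building blocks of `_assemble_state` can be checked in isolation: the z-score normalisation of flattened bars, the relative difference between two moving averages, and the time-of-day and day-of-week encoding. The sketch below runs on made-up data; the helper names are hypothetical, and it uses pandas `rolling().mean()` as a stand-in for `talib.SMA` (both compute a simple moving average) so that it runs without TA-Lib installed.

```python
import numpy as np
import pandas as pd


def normalise_bars(bars, n=10):
    # Flatten the last n rows and z-score them together, as in get_normalised_bars_array
    values = bars.iloc[-n:].values.flatten()
    return (values - np.mean(values)) / np.std(values)


def sma_relative_difference(close, long_period, short_period):
    # (SMA_long - SMA_short) / SMA_short on the most recent bar
    sma1 = close.rolling(long_period).mean().iloc[-1]
    sma2 = close.rolling(short_period).mean().iloc[-1]
    return (sma1 - sma2) / sma2


def time_features(ts):
    # Minutes since midnight scaled to [0, 1); weekday scaled so Monday = 0 and Sunday = 1
    time_of_day = (ts.hour * 60 + ts.minute) / (24 * 60)
    day_of_week = ts.weekday() / 6
    return time_of_day, day_of_week


# Made-up 5-minute bars with a steadily rising close
idx = pd.date_range('2010-01-04 09:35', periods=20, freq='5min')
close = pd.Series(np.arange(1.0, 21.0), index=idx)
bars = pd.DataFrame({'open': close, 'high': close + 0.5,
                     'low': close - 0.5, 'close': close})

z = normalise_bars(bars)
print(z.shape, round(z.mean(), 6), round(z.std(), 6))
# With lkbk = 10, the notebook uses periods lkbk-1 = 9 and lkbk-8 = 2
print(sma_relative_difference(close, 9, 2))
print(time_features(idx[0]))
```

Since all the values in a window are normalised together, the relative sizes of open, high, low and close within that window are preserved, while the absolute price level is removed.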

In [5]:
ohlcv_dict = {
    'open': 'first',
    'high': 'max',
    'low': 'min',
    'close': 'last',
    'volume': 'sum'
}

# Resample 5 mins data to 1 hour data
bars1h = bars5m.resample(
    '1h', label='left', closed='right').agg(ohlcv_dict).dropna()

# Resample 5 mins data to daily data
bars1d = bars5m.resample(
    '1D', label='left', closed='right').agg(ohlcv_dict).dropna()

# Create Game class environment
env = Game(bars5m, bars1d, bars1h, reward_exponential_pnl,
           lkbk=10,  init_idx=3000)
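To see what `label='left', closed='right'` does, here is the same aggregation applied to twelve made-up 5-minute bars from 09:35 to 10:30. Each hourly bin includes its right edge, so the 10:00 bar belongs to the bin labelled 09:00. The data is synthetic, and the sketch uses the lowercase `'1h'` alias, which recent pandas versions prefer over `'1H'`.

```python
import numpy as np
import pandas as pd

ohlcv_dict = {'open': 'first', 'high': 'max', 'low': 'min',
              'close': 'last', 'volume': 'sum'}

# Twelve made-up 5-minute bars from 09:35 to 10:30
idx = pd.date_range('2010-01-04 09:35', periods=12, freq='5min')
close = 100.0 + np.arange(12)
bars = pd.DataFrame({'open': close, 'high': close + 0.5, 'low': close - 0.5,
                     'close': close, 'volume': np.arange(1, 13)}, index=idx)

# Bins are (09:00, 10:00] and (10:00, 11:00], labelled 09:00 and 10:00
hourly = bars.resample('1h', label='left', closed='right').agg(ohlcv_dict).dropna()
print(hourly)
```

The first hourly bar aggregates the six 5-minute bars from 09:35 to 10:00 inclusive, and the second aggregates those from 10:05 to 10:30.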

### Analyse output

In [6]:
env.act(1)

(False, 0)

We passed action = 1 (sell) to the `act()` method. It returns `False`, the flag indicating whether the game is over. Since the game is not over, the reward is 0.

In [7]:
env.act(2)

(True, 1.0020040053400068)

Now we pass action = 2 (buy), the opposite of the previous sell action. The short position is therefore closed, the game is over, and `act()` returns the reward.
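The value 1.0020040053400068 can be reproduced by hand. `act` does not advance `curr_idx`, so both calls read the same bar, and the short is opened and closed at the same price. With the entry equal to the current price, `get_pnl` reduces to ((1 - tc) - (1 + tc)) × (1 + tc) × position = -0.002 × 1.001 × (-1) = 0.002002, and the exponential reward is exp(0.002002). The price of 100 below is arbitrary; any price gives the same result up to rounding.

```python
import numpy as np

tc = 0.001      # transaction cost, as in get_pnl
price = 100.0   # arbitrary: entry and exit happen on the same bar
position = -1   # the first action (sell) opened a short

# Same expression as get_pnl with entry == curr_price == price
pnl = (price*(1-tc) - price*(1+tc))/price*(1+tc)*position
reward = np.exp(pnl)
print(pnl, reward)
```

The positive reward in this example therefore comes entirely from the transaction-cost term, not from any price movement.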

In practice, the agent (a neural network) suggests the actions. Here, for illustration purposes, we passed the actions manually and observed the output of the Game class.

You can tweak the code, for example by changing parameters of the Game class such as the lookback period, or by using a reward function other than the exponential pnl.

In the coming section, you will learn to model the agent, which is another essential part of reinforcement learning.
<br><br>