# Notebook Instructions

1. If you are new to Jupyter notebooks, please go through this introductory manual <a href='https://quantra.quantinsti.com/quantra-notebook' target="_blank">here</a>.
1. Any changes made in this notebook will be lost after you close the browser window. **You can download the notebook to save your work on your PC.**
1. Before running this notebook on your local PC:<br>
i.  You need to set up a Python environment and the relevant packages on your local PC. To do so, go through the section on "**Run Codes Locally on Your Machine**" in the course.<br>
ii. You need to **download the zip file available in the last unit** of this course. The zip file contains the data files and/or python modules that might be required to run this notebook.

# Creating the Target Variable Using Strategy Returns

In the previous unit, we created a list of trading strategies that will be used to create the target variable. In this notebook, we will calculate the 3-day returns of the call, put and underlying asset for each trading day. These values will then be used to calculate the 3-day returns of all strategies for each trading day. The target variable is the strategy that generated the maximum return on each trading day.

The notebook is structured as follows:
1. [Import the Data](#read)
2. [3-Day Returns of Contracts](#3returns)
3. [3-Day Returns of Strategies](#returns)
4. [Target Variable](#target)

## Import Libraries

In [1]:
# For Data manipulation
import pandas as pd
import numpy as np
import itertools

# Ignore warnings
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)
pd.options.mode.chained_assignment = None

<a id='read'></a>
## Import the Data

Import the file `strategies_combinations_mlo.csv` as `strategies` and the file `underlying_data_strategy_design_mlo.csv` as `underlying_data` using the `read_csv` method of `pandas`.
These CSV files are available in the zip file of the unit 'Python Codes and Data' in the 'Course Summary' section.

In [2]:
# Import strategies created in the previous notebook
strategies = pd.read_csv('../data_modules/strategies_combinations_mlo.csv')

# Import the underlying data, i.e. the S&P 500 index data
underlying_data = pd.read_csv(
    '../data_modules/underlying_data_strategy_design_mlo.csv')
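
Before running the return calculations, it can help to confirm that the imported data contains the columns used later in this notebook. The following is a minimal sketch on a made-up one-row dataframe. The helper `missing_columns`, the list `REQUIRED_UNDERLYING` and the sample values are hypothetical and only illustrate the check.

```python
import pandas as pd


def missing_columns(df, required):
    """Return the required column names that are absent from `df`, sorted."""
    return sorted(set(required) - set(df.columns))


# Columns of the underlying data that the later cells of this notebook use
REQUIRED_UNDERLYING = ['quote_date', 'close', 'atm_strike_price',
                       'call_last', 'put_last']

# Made-up one-row dataframe that deliberately lacks 'put_last'
toy_underlying = pd.DataFrame({
    'quote_date': ['2010-03-02'],
    'close': [1000.0],
    'atm_strike_price': [1000.0],
    'call_last': [20.0],
})

missing = missing_columns(toy_underlying, REQUIRED_UNDERLYING)
print(missing)
```

With the real data, the same call on `underlying_data` should return an empty list before the return calculations are run.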

<a id='3returns'></a>
## 3-Day Returns of Contracts

To calculate the 3-day returns of the call, put and the underlying asset, we first need to find the capital required to open a buy or sell position in each of them.


* For buying an option, the capital required is the product of the premium and the lot size.

 <img src="https://d2a032ejo53cab.cloudfront.net/Glossary/1qWgNIVS/margin-buy.png" width="400">
 
 
* For selling an option, the capital required is the margin decided by the broker and the exchange. The total margin required to sell an option is the sum of the SPAN and exposure margins. Option sellers have to maintain the total margin in their trading account to cover potential losses.

 <img src="https://d2a032ejo53cab.cloudfront.net/Glossary/aBZVHHAY/margin.png" width="400">
 
 
* We assume the SPAN margin to be 10% of the notional value (notional value = strike price * lot size).
* The exposure margin is 6% of the notional value.

* The SPAN margin depends on the volatility of the underlying and changes during the day as volatility changes. It increases substantially during highly volatile phases of the market. To account for this, we add 3% of the notional value to the total margin.

* The SPAN® (Standard Portfolio Analysis of Risk) margin is calculated by the exchanges using parameters such as interest rates, strike price, price changes and the volatility of the underlying. It estimates the potential risk of holding the position for one day.
* The exposure margin is charged by the broker to cover the risk of unusual movements in the market.
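
As a rough worked example of the margin arithmetic above, the sketch below computes the capital needed to buy and to sell one option lot. The function names `capital_to_buy` and `margin_to_sell` and the sample strike and premium are hypothetical and chosen only for illustration. The 10%, 6% and 3% rates are the assumptions stated above.

```python
# Margin rates assumed in this notebook (fractions of the notional value)
SPAN_RATE = 0.10          # assumed SPAN margin
EXPOSURE_RATE = 0.06      # exposure margin
VOLATILITY_BUFFER = 0.03  # extra buffer for highly volatile phases


def capital_to_buy(premium, lot_size):
    """Capital required to buy one option lot: premium * lot size."""
    return premium * lot_size


def margin_to_sell(strike_price, lot_size):
    """Total margin required to sell one option lot."""
    notional_value = strike_price * lot_size
    return notional_value * (SPAN_RATE + EXPOSURE_RATE + VOLATILITY_BUFFER)


# Hypothetical contract: strike 4000, premium 50, lot size 100
buy_capital = capital_to_buy(premium=50, lot_size=100)
sell_margin = margin_to_sell(strike_price=4000, lot_size=100)
print(f"Capital to buy: {buy_capital:.2f}")
print(f"Margin to sell: {sell_margin:.2f}")
```

For this made-up contract, the notional value is 400,000, so the total margin is 19% of that, which is far larger than the premium paid by the option buyer.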

In [3]:
# lot size of 100
lot_size = 100

# notional value calculation: notional value = strike price * lot size
notional_value = underlying_data['atm_strike_price']*lot_size

# Margin for options selling is a sum of span margin, exposure margin
# and an additional 3% of notional value to account for highly volatile phases of the market
# Store the margin required for options selling in the column 'margin_option_selling'
underlying_data.loc[:, 'margin_option_selling'] = \
    (notional_value*0.1) + (notional_value*0.06) + (notional_value*0.03)

# Premium received while shorting call
underlying_data.loc[:,
                    'short_call_prem_received'] = underlying_data['call_last']*lot_size

# Premium received while shorting put
underlying_data['short_put_prem_received'] = underlying_data['put_last']*lot_size

# Calculating returns of long position in the atm call held for 3 days
underlying_data['long_call_returns'] = \
    lot_size * 100 * (underlying_data['call_last'].shift(-3)-underlying_data['call_last'])\
    / (underlying_data['call_last']*lot_size)

# Calculating returns of short position in the atm call held for 3 days
underlying_data['short_call_returns'] = \
    lot_size * 100*(underlying_data['call_last']-underlying_data['call_last'].shift(-3))\
    / (underlying_data['margin_option_selling']-underlying_data['short_call_prem_received'])

# Calculating returns of long position in the atm put held for 3 days
underlying_data['long_put_returns'] = \
    lot_size * 100 * (underlying_data['put_last'].shift(-3)-underlying_data['put_last'])\
    / (underlying_data['put_last']*lot_size)

# Calculating returns of short position in the atm put held for 3 days
underlying_data['short_put_returns'] = \
    lot_size * 100*(underlying_data['put_last']-underlying_data['put_last'].shift(-3))\
    / (underlying_data['margin_option_selling']-underlying_data['short_put_prem_received'])

# Calculating returns of long position in the underlying asset held for 3 days
underlying_data['long_underlying_returns'] = \
    lot_size*100 * (underlying_data['close'].shift(-3)-underlying_data['close'])\
    / (underlying_data['close']*lot_size)

# Calculating returns of short position in the underlying asset held for 3 days
underlying_data['short_underlying_returns'] = - \
    1*(underlying_data['long_underlying_returns'])

# Returns will be 0 if positions were not taken on call or put or underlying
underlying_data['returns_no_position'] = 0

# Drop the rows with NaN values (this includes the last 3 rows,
# which have no 3-day forward return) and reset the index
underlying_data = underlying_data.dropna().reset_index(drop=True)
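
The forward-looking return logic used above can be sketched on a small made-up price series. The helper `forward_pct_return` and the sample prices are hypothetical. The point is only to show that `shift(-n)` aligns each row's price with the price `n` rows later, so the last `n` rows have no return and become NaN.

```python
import pandas as pd


def forward_pct_return(prices, holding_days=3):
    """Percentage return of a long position held for `holding_days` rows."""
    future_prices = prices.shift(-holding_days)
    return 100 * (future_prices - prices) / prices


# Made-up option premiums for six consecutive trading days
toy_prices = pd.Series([10.0, 11.0, 12.0, 12.5, 11.0, 9.0])
toy_returns = forward_pct_return(toy_prices, holding_days=3)
print(toy_returns)
```

Because the shift counts rows rather than calendar days, a gap in the data (for example, a missing trading day) stretches the actual holding period. Changing `holding_days` is one way to experiment with a different holding period.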

<a id='returns'></a>
## 3-Day Returns of Strategies
The 3-day returns of buy and sell positions of the contracts call, put and underlying are stored in the dataframe `underlying_data`. For example, the 3-day returns of long call are stored in the column `long_call_returns`.

Using these values, let's calculate the 3-day returns of the strategies. Let's see how the strategy returns are calculated for `strategy_1`.


In [4]:
strategies.iloc[0]

Unnamed: 0             0
call                  -1
put                   -1
underlying             0
strategy      strategy_1
Name: 0, dtype: object

`strategy_1` involves selling the call and the put, and taking no position in the underlying. So, to find the returns of `strategy_1`, take the sum of the columns `short_call_returns`, `short_put_returns` and `returns_no_position`.

In [5]:
underlying_data['short_call_returns'] + \
    underlying_data['short_put_returns']+underlying_data['returns_no_position']

0       3.222872
1       6.235584
2       5.218943
3       0.533198
4       2.435605
          ...   
2540    3.014764
2541    8.305032
2542    4.365169
2543    7.283039
2544    8.644605
Length: 2545, dtype: float64

A similar approach can be taken to calculate the 3-day returns of all strategies for each trading day. In the above example, we selected the columns that represent the 3-day returns of the contracts as per the positions. The following functions select the appropriate columns from the dataframe `underlying_data` to calculate the strategy returns.

In [6]:
# Functions to select the columns as per positions in the contracts
def call_returns(call):
    if call == -1:
        return 'short_call_returns'
    elif call == 1:
        return 'long_call_returns'
    else:
        return 'returns_no_position'


def put_returns(put):
    if put == -1:
        return 'short_put_returns'
    elif put == 1:
        return 'long_put_returns'
    else:
        return 'returns_no_position'


def underlying_returns(underlying):
    if underlying == -1:
        return 'short_underlying_returns'
    elif underlying == 1:
        return 'long_underlying_returns'
    else:
        return 'returns_no_position'
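
The same position-to-column selection can also be expressed as a dictionary lookup. The sketch below is an alternative illustration on made-up numbers, not the method used in this notebook. The names `RETURN_COLUMNS` and `strategy_return` and the toy return values are hypothetical.

```python
import pandas as pd

# Map (contract, position) to the column holding its 3-day returns
RETURN_COLUMNS = {
    ('call', 1): 'long_call_returns',
    ('call', -1): 'short_call_returns',
    ('put', 1): 'long_put_returns',
    ('put', -1): 'short_put_returns',
    ('underlying', 1): 'long_underlying_returns',
    ('underlying', -1): 'short_underlying_returns',
}


def strategy_return(returns_df, call, put, underlying):
    """Sum the leg returns of a strategy; a position of 0 adds nothing."""
    total = pd.Series(0.0, index=returns_df.index)
    positions = {'call': call, 'put': put, 'underlying': underlying}
    for contract, position in positions.items():
        column = RETURN_COLUMNS.get((contract, position))
        if column is not None:
            total = total + returns_df[column]
    return total


# Toy returns for two trading days (made-up numbers)
toy = pd.DataFrame({
    'long_call_returns': [10.0, -5.0],
    'short_call_returns': [-2.0, 1.0],
    'long_put_returns': [-8.0, 4.0],
    'short_put_returns': [1.5, -0.5],
    'long_underlying_returns': [0.5, -0.25],
    'short_underlying_returns': [-0.5, 0.25],
})

# Short call, short put, no underlying (the same legs as strategy_1)
toy_strategy = strategy_return(toy, call=-1, put=-1, underlying=0)
print(toy_strategy)
```

A lookup table keeps the mapping in one place, which may be easier to extend if more contracts are added. The three functions above achieve the same result.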

For each trading day, calculate the returns of all strategies available in the strategies list and store the returns in the dataframe `underlying_data` in the column with the name of the strategy.

In [7]:
# Calculate the returns of every strategy and store them in a column
# named after the strategy
for _, strategy in strategies.iterrows():
    print(strategy['strategy'])
    # Add the 3-day returns of the call, put and underlying legs
    # for all trading days at once (avoids chained assignment)
    underlying_data[strategy['strategy']] = (
        underlying_data[call_returns(strategy['call'])]
        + underlying_data[put_returns(strategy['put'])]
        + underlying_data[underlying_returns(strategy['underlying'])]
    )

strategy_1
strategy_3
strategy_4
strategy_5
strategy_6
strategy_7
strategy_8
strategy_9
strategy_10
strategy_11
strategy_15
strategy_16
strategy_17
strategy_18
strategy_19
strategy_20
strategy_21
strategy_22
strategy_23
strategy_25


In [8]:
underlying_data.index = underlying_data.quote_date
underlying_data.filter(like='stra').T

| quote_date | 2010-03-02 | 2010-03-03 | 2010-03-04 | 2010-03-10 | 2010-03-11 | 2010-03-12 | 2010-03-15 | 2010-03-17 | 2010-03-18 | 2010-03-19 | ... | 2022-09-13 | 2022-09-14 | 2022-09-15 | 2022-09-16 | 2022-09-19 | 2022-09-20 | 2022-09-21 | 2022-09-22 | 2022-09-23 | 2022-09-26 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| strategy_1 | 3.222872 | 6.235584 | 5.218943 | 0.533198 | 2.435605 | 3.915064 | 3.968763 | 2.758565 | 2.76718 | 1.586833 | ... | 3.619966 | 3.525634 | 2.27446 | -2.467098 | 3.065556 | 3.014764 | 8.305032 | 4.365169 | 7.283039 | 8.644605 |
| strategy_3 | -0.897655 | 0.380695 | 0.100957 | -0.549436 | -1.33967 | 0.157892 | 0.499949 | 1.451538 | 0.666562 | -1.9376 | ... | 3.398776 | 1.368453 | 2.120211 | -1.553609 | 3.95251 | 6.945016 | 9.630074 | 6.364791 | 1.05513 | 5.338666 |
| strategy_4 | 1.543522 | 3.191763 | 2.507079 | -0.121714 | 0.048733 | 1.535292 | 1.31611 | 1.417247 | 1.381939 | -1.263409 | ... | 1.88938 | 0.199675 | 0.955994 | -3.706798 | 0.313948 | 2.725542 | 6.070908 | 3.419069 | 1.753977 | 4.940037 |
| strategy_5 | 3.984699 | 6.002831 | 4.9132 | 0.306008 | 1.437136 | 2.912692 | 2.132271 | 1.382956 | 2.097317 | -0.589217 | ... | 0.379984 | -0.969104 | -0.208223 | -5.859988 | -3.324613 | -1.493931 | 2.511742 | 0.473347 | 2.452825 | 4.541407 |
| strategy_6 | -14.690758 | -24.619305 | -23.535406 | -7.049436 | -28.925877 | -28.413537 | -27.842297 | -20.770684 | -23.500105 | -45.967451 | ... | -9.856472 | -21.798391 | -9.192007 | -12.516175 | -21.497006 | 4.146182 | -12.291848 | -5.212203 | -51.629333 | -40.430565 |
| strategy_7 | -12.249581 | -21.808237 | -21.129285 | -6.621714 | -27.537474 | -27.036136 | -27.026136 | -20.804975 | -22.784727 | -45.29326 | ... | -11.365868 | -22.967169 | -10.356223 | -14.669365 | -25.135568 | -0.073291 | -15.851014 | -8.157925 | -50.930486 | -40.829194 |
| strategy_8 | -9.808404 | -18.997169 | -18.723164 | -6.193992 | -26.149071 | -25.658736 | -26.209975 | -20.839266 | -22.069349 | -44.619068 | ... | -12.875264 | -24.135948 | -11.52044 | -16.822555 | -28.774129 | -4.292765 | -19.41018 | -11.103648 | -50.231638 | -41.227824 |
| strategy_9 | -0.761828 | 0.232752 | 0.305743 | 0.22719 | 0.998469 | 1.002372 | 1.836492 | 1.375608 | 0.669862 | 2.17605 | ... | 3.239982 | 4.494738 | 2.482683 | 3.39289 | 6.390169 | 4.508696 | 5.79329 | 3.891822 | 4.830214 | 4.103198 |
| strategy_10 | 1.679349 | 3.043821 | 2.711864 | 0.654912 | 2.386872 | 2.379772 | 2.652653 | 1.341317 | 1.38524 | 2.850242 | ... | 1.730586 | 3.325959 | 1.318466 | 1.239701 | 2.751608 | 0.289222 | 2.234124 | 0.9461 | 5.529062 | 3.704568 |
| strategy_11 | 4.120526 | 5.854889 | 5.117986 | 1.082634 | 3.775276 | 3.757172 | 3.468814 | 1.307027 | 2.100618 | 3.524433 | ... | 0.22119 | 2.157181 | 0.154249 | -0.913489 | -0.886954 | -3.930252 | -1.325042 | -1.999623 | 6.22791 | 3.305939 |


<a id='target'></a>

## Target Variable
As you can see, the dataframe `underlying_data` has columns with strategy names that store the 3-day returns of each strategy for each trading day. The target variable is the name of the strategy that generated the maximum return.

Create the column `max_returns_strategy` to store the name of the strategy with the maximum return out of all strategies on each trading day.

In [9]:
# Calculating the strategy with maximum returns
underlying_data['max_returns_strategy'] = underlying_data.filter(
    like='strategy_').idxmax(axis=1)
underlying_data['max_returns_strategy']

quote_date
2010-03-02    strategy_11
2010-03-03     strategy_1
2010-03-04     strategy_1
2010-03-10    strategy_20
2010-03-11    strategy_11
                 ...     
2022-09-20     strategy_3
2022-09-21     strategy_3
2022-09-22     strategy_3
2022-09-23     strategy_1
2022-09-26     strategy_1
Name: max_returns_strategy, Length: 2545, dtype: object

The column `max_returns_strategy` is the target variable of the machine learning model that predicts the options strategies to deploy.
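
To see how `idxmax(axis=1)` produces the label, here is a minimal sketch on made-up strategy returns. The values are hypothetical. `value_counts` is included as one possible way to check how often each strategy becomes the label, which is worth knowing before training a classifier.

```python
import pandas as pd

# Made-up 3-day returns of three strategies on three trading days
toy_strategy_returns = pd.DataFrame(
    {
        'strategy_1': [3.2, -1.0, 0.5],
        'strategy_3': [-0.9, 2.0, 0.1],
        'strategy_4': [1.5, 0.3, 0.7],
    },
    index=['2010-03-02', '2010-03-03', '2010-03-04'],
)

# For each row, return the name of the column with the largest value
toy_target = toy_strategy_returns.idxmax(axis=1)
print(toy_target)

# Count how many times each strategy is the label
label_counts = toy_target.value_counts()
print(label_counts)
```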

## Conclusion

In this notebook, we used the trading strategies created in the previous unit and calculated their 3-day returns to find the target variable of the machine learning model that predicts the options strategies to deploy. The 3-day holding period is taken as an example and can be changed as per your requirements.

In the next unit, you will learn to create the input features for the ML model.
<br><br>