## Notebook Instructions

1. If you are new to Jupyter notebooks, please go through this introductory manual <a href='https://quantra.quantinsti.com/quantra-notebook' target="_blank">here</a>.
1. Any changes made in this notebook will be lost after you close the browser window. **You can download the notebook to save your work on your PC.**
1. Before running this notebook on your local PC:<br>
i.  You need to set up a Python environment and the relevant packages on your local PC. To do so, go through the section on "**Run Codes Locally on Your Machine**" in the course.<br>
ii. You need to **download the zip file available in the last unit** of this course. The zip file contains the data files and/or Python modules that might be required to run this notebook.

# Creating the Target Variable - Strategy Design

In the previous units, we discussed how the target variable is created for the machine learning model to predict the options strategy to deploy. In this and the next notebook, we will design a list of options trading strategies, create the options and underlying datasets to calculate the returns of the strategies, and finally create the target variable.

The notebook is structured as follows:
1. [Import the Data](#read)
2. [Strategy Design](#design)
3. [Conclusion](#conclusion)


## Import Libraries

In [1]:
# For Data manipulation
import pandas as pd
import numpy as np
import itertools

# Ignore warnings
import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)

<a id='read'></a>
## Import the Data

Import the file `spx_eom_expiry_options_2010_2022.bz2` as `options_data` using the `read_pickle` method of `pandas`, and the file `sp500_index_2010_2022.csv` as `underlying_data` using the `read_csv` method.
These data files are available in the zip file of the unit 'Python Codes and Data' in the 'Course Summary' section.

In [2]:
# Import EOM SPX options data from 2010-2022
options_data = pd.read_pickle(
    "../data_modules/spx_eom_expiry_options_2010_2022.bz2")

# Name the index of the data
options_data.index.name = 'index'

# Import the underlying data i.e. S&P 500 index data
underlying_data = pd.read_csv(
    '../data_modules/sp500_index_2010_2022.csv', index_col='Date')[['Open', 'High', 'Low', 'Close']]

# Convert index dtype to datetime
underlying_data.index = pd.to_datetime(underlying_data.index)
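
`read_pickle` is called without a compression argument because pandas infers the compression from the `.bz2` extension. The following is a minimal round-trip sketch with a throwaway dataframe; the file name `toy_options.bz2` and the values are made up for illustration and are not part of the course data.

```python
import os
import tempfile

import pandas as pd

# Hypothetical two-row frame standing in for the options data
toy_frame = pd.DataFrame(
    {' [STRIKE]': [1125.0, 1150.0]},
    index=pd.to_datetime(['2010-03-02', '2010-03-03']),
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'toy_options.bz2')
    # compression='infer' is the default, so the '.bz2' suffix selects bz2
    toy_frame.to_pickle(path)
    restored = pd.read_pickle(path)

# True when the restored frame matches the original
print(restored.equals(toy_frame))
```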

In `underlying_data`, keep only the days for which options data is available in the dataframe `options_data`, and create the `atm_strike_price` column to store the atm strike price.

The atm strike price is the strike price of the contract that is closest to the close price of the underlying. So, to find the `atm_strike_price` values, find the strike price of the contract which has the minimum `' [STRIKE_DISTANCE_PCT]'` value.

In [3]:
# Selecting underlying data for index range of options data
underlying_data = underlying_data[underlying_data.index.isin(
    options_data.index)].dropna()

# Create the 'atm_strike_price' column with NaN values
underlying_data['atm_strike_price'] = np.nan

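# Calculate atm strike price
atm_col = underlying_data.columns.get_loc('atm_strike_price')
for i in range(0, len(underlying_data)):
    trading_day_data = options_data.loc[underlying_data.index[i]]
    # Contract(s) with the smallest strike distance on this trading day
    min_distance = trading_day_data[' [STRIKE_DISTANCE_PCT]'].min()
    atm_contracts = trading_day_data[trading_day_data[' [STRIKE_DISTANCE_PCT]'] == min_distance]
    # Write by position to avoid chained assignment
    underlying_data.iloc[i, atm_col] = atm_contracts[' [STRIKE]'].iloc[0]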
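
The loop above visits one trading day at a time. As a minimal sketch of the same selection rule on toy data, a `groupby` combined with `idxmin` picks the row with the smallest strike distance for each day. The frame `toy_chain` and its values are made up for illustration; only the column names follow the dataset, and a unique integer index is assumed so that `idxmin` returns unambiguous row labels.

```python
import pandas as pd

# Hypothetical mini option chain: two trading days, three strikes each
toy_chain = pd.DataFrame({
    ' [QUOTE_DATE]': pd.to_datetime(['2010-03-02'] * 3 + ['2010-03-03'] * 3),
    ' [STRIKE]': [1100.0, 1125.0, 1150.0, 1100.0, 1125.0, 1150.0],
    ' [STRIKE_DISTANCE_PCT]': [0.016, 0.006, 0.028, 0.017, 0.006, 0.028],
})

# Row label of the smallest strike distance on each trading day
closest_rows = toy_chain.groupby(' [QUOTE_DATE]')[' [STRIKE_DISTANCE_PCT]'].idxmin()

# Look up the strike of those rows and index the result by trading day
toy_atm_strike = pd.Series(
    toy_chain.loc[closest_rows, ' [STRIKE]'].to_numpy(),
    index=closest_rows.index,
    name='atm_strike_price',
)
print(toy_atm_strike)
```

When two strikes are equally close, `idxmin` keeps the first one, which mirrors taking the first matching row in the loop.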

The `underlying_data` dataframe has the OHLC values of the S&P 500 index along with the atm strike price. The `options_data` dataframe has the option chain data of the call and put options of the S&P 500 index. Merge these two dataframes on the trading day and the atm strike price to create a master dataframe that has, for each trading day, the OHLC values of the underlying, the atm strike price and the option chain data of the atm call and put options. This dataframe is named `underlying_data`.

In [4]:
# Store the quote date in the column ' [QUOTE_DATE]' as 'datetime', taken from the index
options_data[' [QUOTE_DATE]'] = pd.to_datetime(options_data.index)

# Merge the dataframes 'underlying_data' and 'options_data'
underlying_data = pd.merge(underlying_data, options_data, left_on=[
                           'Date', 'atm_strike_price'], right_on=[' [QUOTE_DATE]', ' [STRIKE]'])

# Clean the column names: drop the brackets, strip spaces, change to lower case,
# and expand the 'c_' and 'p_' prefixes to 'call_' and 'put_'
underlying_data.columns = (underlying_data.columns
                           .str.replace('[', '', regex=False).str.replace(']', '', regex=False)
                           .str.strip().str.lower()
                           .str.replace('c_', 'call_', regex=False).str.replace('p_', 'put_', regex=False))

# Remove rows where either the call or the put last traded price is 0
underlying_data = underlying_data[(
    underlying_data.call_last != 0) & (underlying_data.put_last != 0)]
underlying_data.head()

| | open | high | low | close | atm_strike_price | strike | strike_distance_pct | call_last | underlying_last | put_last | ... | call_theta | call_rho | call_iv | put_delta | put_gamma | put_vega | put_theta | put_rho | put_iv | quote_date |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1117.01001 | 1123.459961 | 1116.51001 | 1118.310059 | 1125.0 | 1125.0 | 0.006 | 16.15 | 1117.98 | 23.2 | ... | -0.30847 | 0.37731 | 0.1561 | -0.5579 | 0.0078 | 1.26269 | -0.36309 | -0.53265 | 0.15685 | 2010-03-02 |
| 2 | 1119.359985 | 1125.640015 | 1116.579956 | 1118.790039 | 1125.0 | 1125.0 | 0.006 | 19.5 | 1118.49 | 23.2 | ... | -0.32176 | 0.37714 | 0.15232 | -0.54783 | 0.00797 | 1.24798 | -0.35476 | -0.50447 | 0.15785 | 2010-03-03 |
| 3 | 1119.119995 | 1123.72998 | 1116.660034 | 1122.969971 | 1125.0 | 1125.0 | 0.002 | 19.5 | 1122.68 | 22.0 | ... | -0.3308 | 0.38313 | 0.15806 | -0.52029 | 0.0081 | 1.23698 | -0.3769 | -0.46392 | 0.1585 | 2010-03-04 |
| 7 | 1140.219971 | 1148.26001 | 1140.089966 | 1145.609985 | 1150.0 | 1150.0 | 0.004 | 13.1 | 1145.36 | 20.0 | ... | -0.37266 | 0.29926 | 0.14951 | -0.54093 | 0.00929 | 1.11512 | -0.40092 | -0.38429 | 0.15113 | 2010-03-10 |
| 8 | 1143.959961 | 1150.23999 | 1138.98999 | 1150.23999 | 1150.0 | 1150.0 | 0.0 | 13.3 | 1149.96 | 17.4 | ... | -0.38001 | 0.32599 | 0.14328 | -0.4928 | 0.00942 | 1.09985 | -0.41159 | -0.32547 | 0.15508 | 2010-03-11 |
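
Because `pd.merge` performs an inner join by default, only the option rows whose date and strike both match the atm strike price of that day are kept. The sketch below shows this on two made-up frames; `toy_underlying`, `toy_options` and all their values are assumptions for illustration, not the course data.

```python
import pandas as pd

# Hypothetical underlying data, indexed by trading day
toy_underlying = pd.DataFrame(
    {'close': [1118.31, 1118.79], 'atm_strike_price': [1125.0, 1125.0]},
    index=pd.Index(pd.to_datetime(['2010-03-02', '2010-03-03']), name='Date'),
)

# Hypothetical option chain with two strikes per trading day
toy_options = pd.DataFrame({
    'quote_date': pd.to_datetime(['2010-03-02'] * 2 + ['2010-03-03'] * 2),
    'strike': [1100.0, 1125.0, 1125.0, 1150.0],
    'call_last': [30.0, 16.15, 19.5, 8.0],
})

# 'Date' is the name of the index of toy_underlying;
# merge accepts index level names as join keys
toy_merged = pd.merge(
    toy_underlying, toy_options,
    left_on=['Date', 'atm_strike_price'],
    right_on=['quote_date', 'strike'],
)
print(toy_merged[['close', 'atm_strike_price', 'strike', 'call_last']])
```

Only the two rows with strike 1125.0 remain, one per trading day.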


<a id='design'></a> 
## Strategy Design
The strategy list contains every combination of positions in the atm call, the atm put and the underlying asset. Position `1` indicates 'buying', position `-1` indicates 'selling' and `0` indicates 'no position' in the respective contract.

Using the `itertools` module of Python, create all possible combinations of the three positions `1`, `-1` and `0` that can be taken in the three contracts: call, put and the underlying asset. Store the combinations of the positions in the `strategies` dataframe in the columns `call`, `put` and `underlying`. Create the `strategy` column that stores the name of the strategy.

The name of the strategy is in the format 'strategy_number' where the number indicates the index of the combination.

In [5]:
# Creating combinations of positions 1, -1 and 0
positions = [-1, 0, 1]
comb = list(itertools.product(positions, repeat=3))

# Create the 'strategies' dataframe
strategies = pd.DataFrame(comb, columns=['call', 'put', 'underlying'])

# Create the 'strategy' column
strategies['strategy'] = 'strategy_' + strategies.index.astype(str)

strategies

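| | call | put | underlying | strategy |
|---|---|---|---|---|
| 0 | -1 | -1 | -1 | strategy_0 |
| 1 | -1 | -1 | 0 | strategy_1 |
| 2 | -1 | -1 | 1 | strategy_2 |
| 3 | -1 | 0 | -1 | strategy_3 |
| 4 | -1 | 0 | 0 | strategy_4 |
| 5 | -1 | 0 | 1 | strategy_5 |
| 6 | -1 | 1 | -1 | strategy_6 |
| 7 | -1 | 1 | 0 | strategy_7 |
| 8 | -1 | 1 | 1 | strategy_8 |
| 9 | 0 | -1 | -1 | strategy_9 |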
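
The position encoding can be easier to follow when a row is spelled out in words. The helper below is a small illustrative sketch; the names `describe_strategy` and `POSITION_LABELS` are hypothetical and not part of the course code.

```python
# 1 means buying and -1 means selling; 0 (no position) is skipped
POSITION_LABELS = {1: 'buy', -1: 'sell'}


def describe_strategy(call, put, underlying):
    """Return a readable description of a (call, put, underlying) position tuple."""
    legs = []
    for name, position in (('atm call', call), ('atm put', put), ('underlying', underlying)):
        if position != 0:
            legs.append(f'{POSITION_LABELS[position]} {name}')
    return ' + '.join(legs) if legs else 'no position'


print(describe_strategy(-1, -1, 0))  # sell atm call + sell atm put
print(describe_strategy(0, 1, 1))    # buy atm put + buy underlying
```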


There are 27 strategies created as per the combinations of three positions that can be taken in the three contracts call, put and the underlying asset. However, before we proceed, we must filter the strategies.

Since these are options strategies, there should at least be one position taken in the atm call or atm put contract. So, we can remove the strategy combinations where there are no positions in the atm call and atm put.

In [6]:
# Since we definitely want to take at least one position in call or put, remove rows when call and put = 0
strategies = strategies[(strategies.call != 0) | (strategies.put != 0)]

# Reset the index of the 'strategies' dataframe
strategies.index = range(0, len(strategies))

len(strategies)

24

The number of strategies decreased from 27 to 24.

Taking the same position in the call and the put is a way to trade volatility, whereas a position in the underlying asset is a way to trade direction. So, we remove the strategies that combine the same position in the call and put with an open position in the underlying.

In [7]:
# Remove strategies where the call and put positions are the same and there is an open position in the underlying
strategies = strategies[~(
    (strategies.call == strategies.put) & (strategies.underlying != 0))]
strategies.index = range(0, len(strategies))
len(strategies)

20
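
The counts 27, 24 and 20 can be cross-checked without pandas. The sketch below applies the same two rules to plain tuples; the helper name `is_valid_strategy` is hypothetical and used only for this check.

```python
import itertools

positions = [-1, 0, 1]
all_combinations = list(itertools.product(positions, repeat=3))


def is_valid_strategy(call, put, underlying):
    """Apply both filters to one (call, put, underlying) tuple."""
    # Rule 1: at least one option leg must be open
    if call == 0 and put == 0:
        return False
    # Rule 2: the same call and put position is not combined with an underlying position
    if call == put and underlying != 0:
        return False
    return True


# Combinations that pass rule 1 only, and combinations that pass both rules
has_option_leg = [c for c in all_combinations if c[0] != 0 or c[1] != 0]
valid = [c for c in all_combinations if is_valid_strategy(*c)]

print(len(all_combinations), len(has_option_leg), len(valid))  # 27 24 20
```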

<a id='conclusion'></a> 
## Conclusion
Now we are left with 20 strategies which can be used to create the target variable. The target variable is the name of the strategy that has the highest returns in the holding period. In this exercise, we are considering a 3-day holding period. 

In the next notebook, before we proceed to calculate the strategy returns, we will calculate the 3-day returns of the call, put and underlying asset for each trading day. These values will be used to calculate the 3-day returns of all strategies for each trading day.
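
As a sketch of how the target variable could be derived once the strategy returns exist, the snippet below picks the column with the highest return for each day using `idxmax`. The frame `toy_returns` and its numbers are invented for illustration; the actual returns are calculated in the next notebook, and the method used there may differ.

```python
import pandas as pd

# Hypothetical 3-day returns of three strategies on three trading days
toy_returns = pd.DataFrame(
    {
        'strategy_1': [0.012, -0.004, 0.001],
        'strategy_4': [0.003, 0.009, -0.002],
        'strategy_7': [-0.010, 0.002, 0.006],
    },
    index=pd.to_datetime(['2010-03-02', '2010-03-03', '2010-03-04']),
)

# Name of the strategy with the highest return on each day
toy_target = toy_returns.idxmax(axis=1)
print(toy_target)
```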