# Notebook Instructions

1. If you are new to Jupyter notebooks, please go through this introductory manual <a href='https://quantra.quantinsti.com/quantra-notebook' target="_blank">here</a>.
1. Any changes made in this notebook will be lost after you close the browser window. **You can download the notebook to save your work on your PC.**
1. Before running this notebook on your local PC:<br>
i.  You need to set up a Python environment and the relevant packages on your local PC. To do so, go through the section on "**Run Codes Locally on Your Machine**" in the course.<br>
ii. You need to **download the zip file available in the last unit** of this course. The zip file contains the data files and/or Python modules that might be required to run this notebook.

## Creating the Input Features

In the previous unit, you learnt how to select the input features for the machine learning model that predicts which options strategy to deploy. In this notebook, we will create these input features from the underlying asset and the options market.

The notebook is structured as follows:
1. [Import the Data](#read)
2. [Features Related to the Underlying Asset](#features_underlying)
3. [Features Related to the Options Greeks](#greeks)
4. [Features Related to the Options Contract](#contract)
5. [Target Variable](#target)

## Import Libraries


In [1]:
# For data manipulation
import pandas as pd
import numpy as np

# For technical indicators
import talib

<a id='read'></a>
## Import the Data

Import the file `underlying_data_options_target_variable_2010_2022.csv` as `underlying_data` using the `read_csv` method of `pandas`. This file contains the underlying asset data, option chains data along with the target variable.

In [2]:
# Import the underlying data with option chain data and the target variable
underlying_data = pd.read_csv(
    '../data_modules/underlying_data_options_target_variable_2010_2022.csv', index_col='quote_date')
underlying_data.head()

| quote_date | Unnamed: 0 | open | high | low | close | atm_strike_price | strike | strike_distance_pct | call_last | underlying_last | ... | strategy_16 | strategy_17 | strategy_18 | strategy_19 | strategy_20 | strategy_21 | strategy_22 | strategy_23 | strategy_25 | max_returns_strategy |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2010-03-02 | 1 | 1117.01001 | 1123.459961 | 1116.51001 | 1118.310059 | 1125.0 | 1125.0 | 0.006 | 16.15 | 1117.98 | ... | -13.793103 | -11.351927 | -19.647277 | -17.2061 | -14.764923 | -21.326626 | -18.885449 | -16.444272 | -32.678552 | strategy_11 |
| 2010-03-03 | 2 | 1119.359985 | 1125.640015 | 1116.579956 | 1118.790039 | 1125.0 | 1125.0 | 0.006 | 19.5 | 1118.49 | ... | -25.0 | -22.188932 | -31.562119 | -28.751051 | -25.939983 | -34.60594 | -31.794872 | -28.983804 | -56.794872 | strategy_1 |
| 2010-03-04 | 3 | 1119.119995 | 1123.72998 | 1116.660034 | 1122.969971 | 1125.0 | 1125.0 | 0.002 | 19.5 | 1122.68 | ... | -23.636364 | -21.230242 | -24.668616 | -22.262495 | -19.856373 | -27.38048 | -24.974359 | -22.568238 | -48.610723 | strategy_1 |
| 2010-03-10 | 7 | 1140.219971 | 1148.26001 | 1140.089966 | 1145.609985 | 1150.0 | 1150.0 | 0.004 | 13.1 | 1145.36 | ... | -6.5 | -6.072278 | 2.135587 | 2.563309 | 2.991031 | 1.480675 | 1.908397 | 2.336119 | -4.591603 | strategy_20 |
| 2010-03-11 | 8 | 1143.959961 | 1150.23999 | 1138.98999 | 1150.23999 | 1150.0 | 1150.0 | 0.0 | 13.3 | 1149.96 | ... | -27.586207 | -26.197804 | 0.246589 | 1.634993 | 3.023396 | -2.140283 | -0.75188 | 0.636524 | -28.338087 | strategy_11 |

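The output above contains an `Unnamed: 0` column, which usually appears when a DataFrame's integer index was saved into the CSV. Also, without `parse_dates`, `read_csv` keeps the `quote_date` index as plain strings. The following is a minimal sketch of one way to tidy both points. It uses a small in-memory CSV as a hypothetical stand-in for the real file (three rows copied from the output above), so the exact layout of the real file is an assumption.

```python
import io
import pandas as pd

# Hypothetical stand-in for the real CSV: three rows copied from the output above
csv_text = """,quote_date,open,high,low,close
1,2010-03-02,1117.01001,1123.459961,1116.51001,1118.310059
2,2010-03-03,1119.359985,1125.640015,1116.579956,1118.790039
3,2010-03-04,1119.119995,1123.72998,1116.660034,1122.969971
"""

sample = pd.read_csv(
    io.StringIO(csv_text),
    index_col='quote_date',
    parse_dates=True,  # convert the index to a DatetimeIndex
)

# Drop the leftover integer index column, if it is present
sample = sample.drop(columns='Unnamed: 0', errors='ignore')

print(type(sample.index).__name__)
print(sample.columns.tolist())
```

A `DatetimeIndex` is optional for the feature calculations in this notebook, but it makes date-based slicing and resampling available if you need them later.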

The input features for the ML model that predicts the options strategy to deploy are of two types:
1. Features related to the underlying asset.
2. Features related to the options market.

<a id='features_underlying'></a>
## Features Related to the Underlying Asset

Historical returns of the underlying asset over multiple periods, together with technical indicators of momentum and volatility, are used as input features.

1. **Historical returns:** 1-, 5-, 10-, 22-, 44- and 88-day returns of the underlying asset.
2. **Momentum:** Relative Strength Index (RSI) of the close price of the underlying asset.
3. **Volatility:** Normalised Average True Range (NATR) and the upper, middle and lower Bollinger Bands of the underlying asset, each band divided by the close price.

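To make the historical returns concrete, here is a small sketch of what `pct_change(t)` computes. The prices are hypothetical and not taken from the dataset. Note that `pct_change(t)` compares each row with the row `t` positions earlier, so if some dates are missing from the data, the look-back is `t` rows rather than `t` calendar or trading days.

```python
import pandas as pd

# Hypothetical close prices on consecutive rows (not taken from the dataset)
close = pd.Series([100.0, 102.0, 101.0, 104.0, 110.0])

ret_1 = close.pct_change(1)  # row-over-row return
ret_3 = close.pct_change(3)  # return over three rows

# pct_change(t) is equivalent to close / close.shift(t) - 1
manual_ret_3 = close / close.shift(3) - 1

# The first t rows of a t-period return are NaN because there is no earlier row
print(pd.DataFrame({'close': close, 'ret_1': ret_1, 'ret_3': ret_3}))
```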

In [3]:
# Historical returns
intervals = [1, 5, 10, 22, 44, 88]

# Past returns of multiple time periods
for t in intervals:
    underlying_data[f'f_ret_{t}'] = underlying_data.close.pct_change(t)

# RSI of the underlying
underlying_data['f_rsi'] = talib.RSI(underlying_data.close)

# Normalised ATR (NATR) of the underlying
underlying_data['f_natr'] = talib.NATR(
    underlying_data.high, underlying_data.low, underlying_data.close)

# Bollinger bands of the underlying, each divided by the close price
upper, middle, lower = talib.BBANDS(underlying_data.close)
underlying_data['f_norm_upper'] = upper/underlying_data.close
underlying_data['f_norm_lower'] = lower/underlying_data.close
underlying_data['f_norm_middle'] = middle/underlying_data.close

# Features related to the underlying asset
underlying_data.filter(like='f_').dropna().head()

| quote_date | f_ret_1 | f_ret_5 | f_ret_10 | f_ret_22 | f_ret_44 | f_ret_88 | f_rsi | f_natr | f_norm_upper | f_norm_lower | f_norm_middle |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2011-03-14 | -0.006049 | -0.010488 | 0.029061 | 0.045139 | 0.168393 | 0.15924 | 62.728368 | 1.205084 | 1.026167 | 0.99101 | 1.008588 |
| 2011-03-15 | -0.0112 | -0.030223 | 0.019072 | 0.033383 | 0.142588 | 0.145765 | 56.679002 | 1.328214 | 1.033313 | 0.994247 | 1.01378 |
| 2011-03-16 | -0.019495 | -0.047833 | -0.000604 | 0.012315 | 0.121113 | 0.119246 | 48.084267 | 1.44438 | 1.050383 | 0.997396 | 1.023889 |
| 2011-03-17 | 0.013398 | -0.016516 | -0.026535 | 0.03116 | 0.132125 | 0.111827 | 53.230963 | 1.446851 | 1.033278 | 0.980709 | 1.006994 |
| 2011-03-18 | 0.00431 | -0.019221 | -0.038889 | 0.029239 | 0.137419 | 0.112124 | 54.80402 | 1.422389 | 1.018753 | 0.978751 | 0.998752 |

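If `talib` is not available, the normalised Bollinger Bands can be approximated with `pandas` alone. The sketch below assumes TA-Lib's default settings for `BBANDS` (a 5-period simple moving average with bands 2 standard deviations away) and uses a population standard deviation (`ddof=0`). Treat it as an approximation for illustration rather than a drop-in replacement; the prices are hypothetical.

```python
import pandas as pd

# Hypothetical close prices (not taken from the dataset)
close = pd.Series([100.0, 101.0, 99.0, 102.0, 103.0, 101.0, 104.0])

window = 5   # assumed look-back, mirroring the default timeperiod of talib.BBANDS
num_std = 2  # assumed band width, mirroring the default nbdevup and nbdevdn

middle = close.rolling(window).mean()
std = close.rolling(window).std(ddof=0)  # population standard deviation
upper = middle + num_std * std
lower = middle - num_std * std

# Dividing by the close makes the bands comparable across price levels
bands = pd.DataFrame({
    'f_norm_upper': upper / close,
    'f_norm_middle': middle / close,
    'f_norm_lower': lower / close,
})

# The first window - 1 rows are NaN because the rolling window is not yet full
print(bands.dropna())
```

A value above 1 means the band lies above the current close, and a value below 1 means it lies below. The same warm-up effect explains why `dropna()` in the cell above removes the earliest rows: the longest look-back, the 88-row return, needs 88 prior rows before it produces a value.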

<a id='greeks'></a>
## Features Related to the Options Greeks

The options Greeks (delta, gamma, vega, theta and rho) measure how sensitive the price of an options contract is to different factors, such as the price of the underlying, volatility, time to expiry and interest rates. The Greeks of the call and the put are already present in the dataframe `underlying_data`.

In [4]:
# Features related to options greeks
underlying_data[['call_delta', 'call_gamma', 'call_vega', 'call_theta', 'call_rho',
                 'put_delta', 'put_gamma', 'put_vega', 'put_theta', 'put_rho', ]].head()

| quote_date | call_delta | call_gamma | call_vega | call_theta | call_rho | put_delta | put_gamma | put_vega | put_theta | put_rho |
|---|---|---|---|---|---|---|---|---|---|---|
| 2010-03-02 | 0.44153 | 0.00792 | 1.26223 | -0.30847 | 0.37731 | -0.5579 | 0.0078 | 1.26269 | -0.36309 | -0.53265 |
| 2010-03-03 | 0.44955 | 0.00824 | 1.24661 | -0.32176 | 0.37714 | -0.54783 | 0.00797 | 1.24798 | -0.35476 | -0.50447 |
| 2010-03-04 | 0.47999 | 0.00819 | 1.23668 | -0.3308 | 0.38313 | -0.52029 | 0.0081 | 1.23698 | -0.3769 | -0.46392 |
| 2010-03-10 | 0.45857 | 0.00943 | 1.1152 | -0.37266 | 0.29926 | -0.54093 | 0.00929 | 1.11512 | -0.40092 | -0.38429 |
| 2010-03-11 | 0.5064 | 0.01009 | 1.10025 | -0.38001 | 0.32599 | -0.4928 | 0.00942 | 1.09985 | -0.41159 | -0.32547 |

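A quick sanity check on the Greeks is possible with put-call parity: for European options with the same strike and expiry on an underlying that pays no dividends, call delta minus put delta equals 1 (slightly less than 1 when there is a dividend yield). The sketch below applies this check to the five rows displayed above. Whether the contracts in this dataset are European-style is an assumption, and the tolerance of 0.01 is an arbitrary choice for illustration.

```python
import pandas as pd

# Delta values copied from the five rows displayed above
greeks = pd.DataFrame(
    {
        'call_delta': [0.44153, 0.44955, 0.47999, 0.45857, 0.5064],
        'put_delta': [-0.5579, -0.54783, -0.52029, -0.54093, -0.4928],
    },
    index=['2010-03-02', '2010-03-03', '2010-03-04', '2010-03-10', '2010-03-11'],
)

# Call delta minus put delta should be close to 1 for the same strike and expiry
greeks['delta_gap'] = greeks['call_delta'] - greeks['put_delta']

tolerance = 0.01  # arbitrary threshold chosen for this illustration
greeks['within_tolerance'] = (greeks['delta_gap'] - 1).abs() < tolerance

print(greeks)
```

Rows that fall outside the tolerance would be worth inspecting before they are used as model inputs, since they may point to stale quotes or mismatched contracts.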

<a id='contract'></a>
## Features Related to the Options Contract
In addition to the options Greeks, other metrics of the options contracts are used as input features: days to expiration (`dte`), the last traded price of the at-the-money call and put, the implied volatility of the at-the-money call and put, the price of the underlying asset (the `close` column) and the strike price of the at-the-money contract.



In [5]:
# Print the features related to the options contract
underlying_data[['dte', 'call_last', 'put_last',
                 'close', 'call_iv', 'put_iv',  'atm_strike_price']].head()

| quote_date | dte | call_last | put_last | close | call_iv | put_iv | atm_strike_price |
|---|---|---|---|---|---|---|---|
| 2010-03-02 | 28.96 | 16.15 | 23.2 | 1118.310059 | 0.1561 | 0.15685 | 1125.0 |
| 2010-03-03 | 27.96 | 19.5 | 23.2 | 1118.790039 | 0.15232 | 0.15785 | 1125.0 |
| 2010-03-04 | 26.96 | 19.5 | 22.0 | 1122.969971 | 0.15806 | 0.1585 | 1125.0 |
| 2010-03-10 | 20.96 | 13.1 | 20.0 | 1145.609985 | 0.14951 | 0.15113 | 1150.0 |
| 2010-03-11 | 19.96 | 13.3 | 17.4 | 1150.23999 | 0.14328 | 0.15508 | 1150.0 |

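The at-the-money strike is already provided in the data, and this notebook does not state how it was chosen. One common convention is to pick the listed strike closest to the price of the underlying. The sketch below illustrates that convention with a hypothetical helper `nearest_strike` and a hypothetical strike ladder; only the underlying price (the `underlying_last` value for 2010-03-02 in the first output) comes from the data.

```python
import numpy as np


def nearest_strike(underlying_price, strikes):
    """Return the listed strike closest to the underlying price."""
    strikes = np.asarray(strikes, dtype=float)
    return float(strikes[np.abs(strikes - underlying_price).argmin()])


# Hypothetical strike ladder in steps of 25
strikes = [1075, 1100, 1125, 1150, 1175]

# `underlying_last` for 2010-03-02, as shown in the first output
underlying_price = 1117.98

atm_strike = nearest_strike(underlying_price, strikes)

# Relative distance between the chosen strike and the underlying price
distance_pct = abs(atm_strike - underlying_price) / underlying_price

print(atm_strike, round(distance_pct, 3))
```

Under these assumptions the result agrees with the `atm_strike_price` and `strike_distance_pct` values shown for that date, although the dataset may have been built with a different rule.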

Combine all three groups of features (underlying asset, options Greeks and options contract) into a single list named `features`.

In [6]:
# Final list of features
features = ['call_last', 'put_last', 'close', 'atm_strike_price',
            'dte', 'call_delta', 'call_gamma', 'call_vega',
            'call_theta', 'call_rho', 'call_iv', 'put_delta', 'put_gamma',
            'put_vega', 'put_theta', 'put_rho', 'put_iv', 'f_ret_1', 'f_ret_5',
            'f_ret_10', 'f_ret_22', 'f_ret_44', 'f_ret_88', 'f_natr', 'f_rsi',
            'f_norm_upper', 'f_norm_lower', 'f_norm_middle']

# Values of input features
underlying_data[features].dropna().head()

| quote_date | call_last | put_last | close | atm_strike_price | dte | call_delta | call_gamma | call_vega | call_theta | call_rho | ... | f_ret_5 | f_ret_10 | f_ret_22 | f_ret_44 | f_ret_88 | f_natr | f_rsi | f_norm_upper | f_norm_lower | f_norm_middle |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2011-03-14 | 19.1 | 26.5 | 1296.390015 | 1295.0 | 17.0 | 0.50888 | 0.00757 | 1.14638 | -0.56401 | 0.29972 | ... | -0.010488 | 0.029061 | 0.045139 | 0.168393 | 0.15924 | 1.205084 | 62.728368 | 1.026167 | 0.99101 | 1.008588 |
| 2011-03-15 | 25.6 | 22.3 | 1281.869995 | 1280.0 | 16.0 | 0.50798 | 0.00678 | 1.10044 | -0.65746 | 0.27696 | ... | -0.030223 | 0.019072 | 0.033383 | 0.142588 | 0.145765 | 1.328214 | 56.679002 | 1.033313 | 0.994247 | 1.01378 |
| 2011-03-16 | 32.35 | 25.5 | 1256.880005 | 1255.0 | 15.0 | 0.52145 | 0.00581 | 1.04815 | -0.86532 | 0.27448 | ... | -0.047833 | -0.000604 | 0.012315 | 0.121113 | 0.119246 | 1.44438 | 48.084267 | 1.050383 | 0.997396 | 1.023889 |
| 2011-03-17 | 23.5 | 24.8 | 1273.719971 | 1275.0 | 14.0 | 0.49193 | 0.00664 | 1.02815 | -0.78472 | 0.23935 | ... | -0.016516 | -0.026535 | 0.03116 | 0.132125 | 0.111827 | 1.446851 | 53.230963 | 1.033278 | 0.980709 | 1.006994 |
| 2011-03-18 | 22.9 | 21.9 | 1279.209961 | 1280.0 | 13.0 | 0.49904 | 0.00768 | 0.99892 | -0.74158 | 0.23038 | ... | -0.019221 | -0.038889 | 0.029239 | 0.137419 | 0.112124 | 1.422389 | 54.80402 | 1.018753 | 0.978751 | 0.998752 |

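Before passing the features to a model, it can help to confirm that every listed column exists and to drop incomplete rows while keeping the features and the target aligned. The following is a minimal sketch on a hypothetical miniature dataframe `toy`. It assumes that `max_returns_strategy` is the target column, which the first output suggests but this notebook does not state explicitly.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of `underlying_data` with a few feature columns
toy = pd.DataFrame({
    'close': [100.0, 101.0, 102.0, 103.0],
    'dte': [20.0, 19.0, 18.0, 17.0],
    'f_ret_1': [np.nan, 0.01, 0.0099, 0.0098],
    'max_returns_strategy': ['strategy_1', 'strategy_11',
                             'strategy_1', 'strategy_20'],
})

toy_features = ['close', 'dte', 'f_ret_1']
target = 'max_returns_strategy'  # assumed name of the target column

# Fail early if any requested feature column is absent
missing = [col for col in toy_features if col not in toy.columns]
if missing:
    raise KeyError(f'Missing feature columns: {missing}')

# Drop rows where any feature is NaN, keeping X and y on the same index
model_data = toy[toy_features + [target]].dropna(subset=toy_features)
X = model_data[toy_features]
y = model_data[target]

print(X.shape, y.shape)
```

Dropping rows from the combined frame, rather than from the features alone, avoids a mismatch between the rows in `X` and the labels in `y`.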

## Conclusion

In this notebook, you learnt how to create the input features for the ML model that predicts which options strategy to deploy. In the following units, you will learn to select the best-suited ML model for this prediction.