# Notebook Instructions

1. If you are new to Jupyter notebooks, please go through this introductory manual <a href='https://quantra.quantinsti.com/quantra-notebook' target="_blank">here</a>.
1. Any changes made in this notebook will be lost when you close the browser window. **You can download the notebook to save your work on your PC.**
1. Before running this notebook on your local PC:<br>
i.  You need to set up a Python environment and the relevant packages on your local PC. To do so, go through the section on "**Run Codes Locally on Your Machine**" in the course.<br>
ii. You need to **download the zip file available in the last unit** of this course. The zip file contains the data files and/or Python modules that might be required to run this notebook.

# Options Data Storing
In the previous video, you learned how to source options data. The data is downloaded as zip files, and each zip file contains comma-separated files (with a .txt extension) that hold the options data. To use this data, we need to extract the zip file for each month and then convert each file from .txt to .csv. In this notebook, you'll learn how to carry out these activities in just a few steps.

The steps are as follows:
1. [Import the Libraries](#import)
2. [Extract the Data](#ext)
3. [Conclusion](#conclusion)

<a id='import'></a>
## Import the Libraries
The first step is to import the necessary libraries. We import `py7zr`, which is used to extract (decompress) the zip files, and the `os` library, which is used to interact with the operating system. Additionally, the `pandas` library is used for data manipulation and the `warnings` library is used to suppress non-critical warnings.

<i> <span style="color:#FFFFFF; background:#00C001"> The following cells will not run in the browser. Download this notebook and convert the cells to "Code" type. You will also have to download the necessary data (as explained in the previous unit) and change the path/folder name.</span></i>
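
A minimal sketch of what the import cell could look like. The `try`/`except` guard around `py7zr` and the blanket `warnings.filterwarnings("ignore")` call are assumptions made for illustration, not necessarily the exact code used in the course.

```python
import os        # file and folder operations
import warnings  # control over warning messages

import pandas as pd  # data manipulation

try:
    import py7zr  # third-party package, installed with `pip install py7zr`
except ImportError:
    py7zr = None  # assumption: the extraction step needs this package installed

# Assumption: silence all warnings; a narrower filter may be preferable.
warnings.filterwarnings("ignore")
```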

<a id='ext'></a>
## Extract the Data
The files are saved in a folder named `spx_options_raw_data`. We will use the `listdir` function of `os` to list all the files present in that folder and then create an empty master dataframe named `options_data` for storing all the necessary data.
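
The listing step can be sketched as follows. To keep the example runnable anywhere, it builds a temporary stand-in for the `spx_options_raw_data` folder; the archive names and the `.7z` extension are hypothetical placeholders, not the actual file names from the data vendor.

```python
import os
import tempfile

import pandas as pd

# Temporary stand-in for the real `spx_options_raw_data` folder.
with tempfile.TemporaryDirectory() as tmp:
    raw_data_folder = os.path.join(tmp, "spx_options_raw_data")
    os.makedirs(raw_data_folder)

    # Hypothetical archive names, created as empty files for illustration only.
    for name in ("spx_eod_201203.7z", "spx_eod_201204.7z"):
        open(os.path.join(raw_data_folder, name), "w").close()

    # os.listdir() returns names in arbitrary order, so sort them
    # to process the months chronologically.
    file_names = sorted(os.listdir(raw_data_folder))

# Empty master dataframe that the monthly data is added to later.
options_data = pd.DataFrame()

print(file_names)
```

On your machine, you would point `raw_data_folder` at the actual folder instead of creating a temporary one.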

We will also define the list of columns that are relevant to the strategy used in this course, so that only those columns are read. This list is used later, when all the data is stored in a single dataframe.
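
One way to read only selected columns is the `usecols` parameter of `pd.read_csv`. The sketch below uses a tiny in-memory file; `[EXTRA_COLUMN]` is a hypothetical column added only to show a column being skipped, and the real list would contain all the columns you want to keep.

```python
import io

import pandas as pd

# Tiny in-memory stand-in for one comma-separated .txt file.
# [EXTRA_COLUMN] is hypothetical and exists only to be skipped.
raw_text = (
    "[QUOTE_DATE],[STRIKE],[UNDERLYING_LAST],[EXPIRE_DATE],[EXTRA_COLUMN]\n"
    "2012-03-01,700.0,1374.71,2012-03-30,1\n"
    "2012-03-01,800.0,1374.71,2012-03-30,2\n"
)

# Only a few of the retained columns are listed to keep the sketch short.
columns_to_read = ["[QUOTE_DATE]", "[STRIKE]", "[UNDERLYING_LAST]", "[EXPIRE_DATE]"]

# usecols tells read_csv to parse only the listed columns.
sample = pd.read_csv(io.StringIO(raw_text), usecols=columns_to_read)

print(sample.columns.tolist())
```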

Next, we will create a loop that extracts each zip file with the `extractall()` method provided by `py7zr`. The full dataset is large and slow to process, so we will store only the necessary data in a dataframe named `monthly_data`. We select only the end-of-month (EOM) expiry contracts. To do this, we keep the rows whose expiration date is the same as the quote date on the last row of the monthly data file. In other words, we locate the contracts that expire on the last quote date of that month.
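
The extraction and EOM selection described above can be sketched as two small helpers. The function names are hypothetical, the sketch assumes the downloads are 7z archives (the format `py7zr` reads), and the sample rows are illustrative rather than real quotes.

```python
import pandas as pd


def extract_archive(archive_path, output_folder):
    """Decompress one 7z archive into output_folder."""
    import py7zr  # imported here so the rest of the sketch runs without it

    with py7zr.SevenZipFile(archive_path, mode="r") as archive:
        archive.extractall(path=output_folder)


def select_eom_contracts(monthly_data):
    """Keep rows whose expiry date equals the quote date on the last row."""
    last_quote_date = monthly_data["[QUOTE_DATE]"].iloc[-1]
    return monthly_data[monthly_data["[EXPIRE_DATE]"] == last_quote_date]


# Illustrative rows only: two expiries quoted during March 2012.
monthly_data = pd.DataFrame(
    {
        "[QUOTE_DATE]": ["2012-03-01", "2012-03-01", "2012-03-30"],
        "[STRIKE]": [700.0, 700.0, 700.0],
        "[EXPIRE_DATE]": ["2012-03-30", "2012-03-16", "2012-03-30"],
    }
)

# Only the rows expiring on the month's last quote date are kept.
eom_data = select_eom_contracts(monthly_data)
print(eom_data)
```

The filter relies on the rows being sorted by quote date, so that the last row carries the final quote date of the month.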

Finally, we will append the selected data to the master dataframe named `options_data` and delete the extracted files using the `remove` function of `os`.
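
A minimal sketch of this last step, assuming `pd.concat` is used for the append. The single-row dataframe and the temporary file are stand-ins for one month's filtered data and one extracted .txt file.

```python
import os
import tempfile

import pandas as pd

options_data = pd.DataFrame()  # master dataframe

# Illustrative stand-in for the filtered data of one month.
monthly_data = pd.DataFrame(
    {
        "[QUOTE_DATE]": ["2012-03-30"],
        "[STRIKE]": [700.0],
        "[EXPIRE_DATE]": ["2012-03-30"],
    }
)

# Temporary file standing in for an extracted .txt file.
handle, extracted_file = tempfile.mkstemp(suffix=".txt")
os.close(handle)

# Append this month's rows to the master dataframe.
options_data = pd.concat([options_data, monthly_data])

# Delete the extracted file once its rows have been copied.
os.remove(extracted_file)

print(len(options_data), os.path.exists(extracted_file))
```

For many months of data, collecting the monthly dataframes in a list and calling `pd.concat` once at the end is usually faster than concatenating inside the loop.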

Let's take a look at what the master dataframe `options_data` looks like.

Output when you display the dataframe:

| Unnamed: 0 | [QUOTE_DATE] | [STRIKE] | [STRIKE_DISTANCE_PCT] | [C_LAST] | [UNDERLYING_LAST] | [P_LAST] | [EXPIRE_DATE] | [DTE] | [C_DELTA] | [C_GAMMA] | [C_VEGA] | [C_THETA] | [C_RHO] | [C_IV] | [P_DELTA] | [P_GAMMA] | [P_VEGA] | [P_THETA] | [P_RHO] | [P_IV] |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 238 | 2012-03-01 | 700.0 | 0.491 | 0.0 | 1374.71 | 0.05 | 2012-03-30 | 28.96 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 |  | 0.0 | 0.0 | 0.00518 | -0.00513 | -0.00059 | 0.713030 |
| 239 | 2012-03-01 | 800.0 | 0.418 | 0.0 | 1374.71 | 0.1 | 2012-03-30 | 28.96 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 |  | -0.00006 | 0.00006 | 0.00634 | -0.0051 | -0.00039 | 0.577930 |
| 240 | 2012-03-01 | 825.0 | 0.400 | 0.0 | 1374.71 | 0.1 | 2012-03-30 | 28.96 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 |  | -0.00009 | 0.0 | 0.00597 | -0.00565 | -0.00092 | 0.547670 |
| 241 | 2012-03-01 | 850.0 | 0.382 | 0.0 | 1374.71 | 0.1 | 2012-03-30 | 28.96 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 |  | -0.00054 | -0.00003 | 0.01088 | -0.00926 | -0.00071 | 0.545950 |
| 242 | 2012-03-01 | 875.0 | 0.363 | 0.0 | 1374.71 | 0.05 | 2012-03-30 | 28.96 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 |  | -0.00132 | -0.00002 | 0.01562 | -0.01293 | -0.00137 | 0.532740 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 37247 | 2013-12-31 | 1975.0 | 0.069 | 0.0 | 1847.76 | 0.0 | 2013-12-31 | 0.00 | 0.00159 | 0.00011 | 0.00614 | -0.02458 | 0.00017 | 0.440070 | -1.0 | 0.0 | 0.0 | 0.0 | 0.0 |  |
| 37248 | 2013-12-31 | 2000.0 | 0.082 | 0.0 | 1847.76 | 0.0 | 2013-12-31 | 0.00 | 0.00191 | 0.0001 | 0.00521 | -0.02554 | 0.00008 | 0.515870 | -0.91448 | 0.00116 | 0.1469 | -2.95973 | -0.0301 | 1.033880 |
| 37249 | 2013-12-31 | 2025.0 | 0.096 | 0.0 | 1847.76 | 0.0 | 2013-12-31 | 0.00 | 0.00203 | 0.00008 | 0.00512 | -0.02515 | -0.0002 | 0.588870 | -1.0 | 0.0 | 0.0 | 0.0 | 0.0 |  |
| 37250 | 2013-12-31 | 2050.0 | 0.109 | 0.0 | 1847.76 | 0.0 | 2013-12-31 | 0.00 | 0.00089 | 0.00005 | 0.00418 | -0.0244 | 0.0 | 0.658410 | -1.0 | 0.0 | 0.0 | 0.0 | 0.0 |  |


<a id='conclusion'></a>
## Conclusion
We've cut down the originally downloaded data by selecting just the EOM contracts, and we have also brought down the number of columns from 33 to 20 by selecting only the necessary columns. This data is now saved in a file named `spx_eom_expiry_options_2010_2022.csv`. However, this CSV file is still large, which may increase the processing time. To reduce the size, we will convert the CSV file into a compressed pickle file named `spx_eom_expiry_options_2010_2022.bz2`. You will learn how to do this in the upcoming units.
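
As a brief preview, and only as a sketch under the assumption that pandas' built-in pickle support is used, a dataframe can be written to a bz2-compressed pickle and read back as shown below. The two-row dataframe and the temporary folder are illustrative stand-ins.

```python
import os
import tempfile

import pandas as pd

# Illustrative stand-in for the master dataframe.
options_data = pd.DataFrame({"[STRIKE]": [700.0, 800.0], "[P_LAST]": [0.05, 0.1]})

with tempfile.TemporaryDirectory() as tmp:
    pickle_path = os.path.join(tmp, "spx_eom_expiry_options_2010_2022.bz2")

    # The default compression="infer" selects bz2 from the .bz2 extension.
    options_data.to_pickle(pickle_path)

    # read_pickle infers the compression from the extension in the same way.
    restored = pd.read_pickle(pickle_path)

print(restored.equals(options_data))
```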
<br><br>