# Options Data Storing
In the previous unit, you learned how to source options data. The data is downloaded as zip files, and each zip file contains comma-separated files (with a .txt extension) that hold the options data. To use this data, we need to extract the zip files for all the months and then convert each file from .txt to .csv. In this notebook, you'll learn how to carry out these activities in just a few steps.
The steps are as follows:
1. [Import the Libraries](#import)
2. [Extract the Data](#ext)

<a id='import'></a>
## Import the Libraries
The first step is to import the necessary libraries. We import `py7zr`, a library for 7z archives, to extract (decompress) the downloaded files. We also use the `os` library to interact with the operating system, the `pandas` library for data manipulation, and the `warnings` library to suppress non-critical warnings.

<i> <span style="color:#FFFFFF; background:#00C001"> The following cells will not run in the browser. Download this notebook and convert the cells to "Code" type. You will also have to download the necessary data (as explained in the previous unit) and change the path/folder name.</span></i>

In [1]:
# For extracting zip files
import py7zr

# For data manipulation
import pandas as pd

# For ignoring warnings
import warnings 
warnings.filterwarnings('ignore')

# For getting a list of all files
import os

<a id='ext'></a>
## Extract the Data
The files are saved in a folder named `spx_options_raw_data`. We will use the `listdir` function of `os` to list all the files in that folder, and then create an empty master dataframe named `options_data` to hold the combined data.

In [3]:
# Path to the main folder containing downloaded zip files
path = 'spx_options_raw_data/'

# List of files in the folder
dir_list = os.listdir(path)

# Create a master dataframe
options_data = pd.DataFrame()
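
`os.listdir` returns every entry in the folder, in arbitrary order, so a stray file (for example a hidden system file) would also end up in `dir_list`. A minimal sketch of a stricter listing follows. The helper name `list_archives`, the `.7z` suffix, and the demo file names are assumptions for illustration, so adjust the suffix to match the files you downloaded.

```python
import os
import tempfile


def list_archives(folder, suffix=".7z"):
    """Return the sorted names of files in `folder` that end with `suffix`."""
    names = []
    for name in os.listdir(folder):
        full_path = os.path.join(folder, name)
        # Keep only regular files with the expected extension
        if os.path.isfile(full_path) and name.lower().endswith(suffix):
            names.append(name)
    # Sorting makes the processing order reproducible
    return sorted(names)


# Demonstration on a throwaway folder with hypothetical file names
with tempfile.TemporaryDirectory() as demo_folder:
    for name in ["spx_201002.7z", "spx_201001.7z", "notes.md"]:
        open(os.path.join(demo_folder, name), "w").close()
    demo_listing = list_archives(demo_folder)

print(demo_listing)
```

With a helper like this, `dir_list = list_archives(path)` could stand in for the plain `os.listdir(path)` call.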

Next, we will loop over the files and extract each one with the `extractall()` method of `py7zr.SevenZipFile`. The raw data is large and can be slow to process, so we read one extracted file at a time into a dataframe named `monthly_data`.

Finally, we will append each `monthly_data` to the master dataframe `options_data` and delete the extracted file using the `remove` function of `os`.

In [5]:
# Loop over the downloaded files in a fixed (sorted) order
for zipfile_ in sorted(dir_list):
    # Extract the archive into the current working directory
    with py7zr.SevenZipFile(os.path.join(path, zipfile_), 'r') as archive:
        archive.extractall()
        all_files = archive.getnames()

    # Read each extracted file into a dataframe
    for file_name in all_files:
        monthly_data = pd.read_csv(file_name, sep=',')

        # Append the monthly data to the master dataframe
        options_data = pd.concat([options_data, monthly_data], ignore_index=True)

        # Delete the extracted file
        os.remove(file_name)
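
If the full dataset is slow to load, two refinements may help: reading only the columns you need with the `usecols` argument of `pd.read_csv`, and collecting the monthly dataframes in a list so that `pd.concat` runs once rather than on every iteration. The sketch below illustrates both on two tiny in-memory files. The rows are toy values, and the choice of "needed" columns is an assumption made purely for illustration.

```python
import io

import pandas as pd

# Two toy "monthly files" held in memory (values are illustrative only)
file_1 = io.StringIO(
    "[QUOTE_DATE],[UNDERLYING_LAST],[STRIKE_DISTANCE]\n"
    "2010-01-04,100.0,20.0\n"
    "2010-01-04,100.0,10.0\n"
)
file_2 = io.StringIO(
    "[QUOTE_DATE],[UNDERLYING_LAST],[STRIKE_DISTANCE]\n"
    "2010-02-01,105.0,15.0\n"
)

# Hypothetical selection of columns to keep
needed_columns = ["[QUOTE_DATE]", "[STRIKE_DISTANCE]"]

# Collect each monthly dataframe in a list
monthly_frames = []
for handle in (file_1, file_2):
    monthly_data = pd.read_csv(handle, sep=",", usecols=needed_columns)
    monthly_frames.append(monthly_data)

# Concatenate once, after the loop
options_subset = pd.concat(monthly_frames, ignore_index=True)
print(options_subset)
```

Concatenating inside the loop copies the growing master dataframe on every pass, which is why a single `pd.concat` at the end tends to scale better as the number of files grows.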

Let's take a look at what the master dataframe `options_data` looks like.

In [6]:
# Display the dataframe
options_data

| | [QUOTE_UNIXTIME] | [QUOTE_READTIME] | [QUOTE_DATE] | [QUOTE_TIME_HOURS] | [UNDERLYING_LAST] | [EXPIRE_DATE] | [EXPIRE_UNIX] | [DTE] | [C_DELTA] | [C_GAMMA] | ... | [P_LAST] | [P_DELTA] | [P_GAMMA] | [P_VEGA] | [P_THETA] | [P_RHO] | [P_IV] | [P_VOLUME] | [STRIKE_DISTANCE] | [STRIKE_DISTANCE_PCT] |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1262638800 | 2010-01-04 16:00 | 2010-01-04 | 16.0 | 1132.99 | 2010-01-07 | 1262898000 | 3.0 | 1.000000 | 0.000000 | ... | 0.050000 | -0.000770 | 0.000040 | 0.004390 | -0.021160 | -0.000490 | 0.640130 | 550.000000 | 208.0 | 0.184 |
| 1 | 1262638800 | 2010-01-04 16:00 | 2010-01-04 | 16.0 | 1132.99 | 2010-01-07 | 1262898000 | 3.0 | 1.000000 | 0.000000 | ... | 0.050000 | -0.002030 | 0.000060 | 0.009110 | -0.040560 | 0.000000 | 0.598590 | | 183.0 | 0.162 |
| 2 | 1262638800 | 2010-01-04 16:00 | 2010-01-04 | 16.0 | 1132.99 | 2010-01-07 | 1262898000 | 3.0 | 1.000000 | 0.000000 | ... | 0.100000 | -0.003690 | 0.000220 | 0.013020 | -0.058980 | -0.000710 | 0.538850 | | 158.0 | 0.139 |
| 3 | 1262638800 | 2010-01-04 16:00 | 2010-01-04 | 16.0 | 1132.99 | 2010-01-07 | 1262898000 | 3.0 | 1.000000 | 0.000000 | ... | 0.150000 | -0.008940 | 0.000380 | 0.027860 | -0.127580 | -0.000860 | 0.502730 | | 133.0 | 0.117 |
| 4 | 1262638800 | 2010-01-04 16:00 | 2010-01-04 | 16.0 | 1132.99 | 2010-01-07 | 1262898000 | 3.0 | 1.000000 | 0.000000 | ... | 0.200000 | -0.012510 | 0.000660 | 0.038570 | -0.156920 | -0.001470 | 0.427250 | 720.000000 | 108.0 | 0.095 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 481625 | 1325278800 | 2011-12-30 16:00 | 2011-12-30 | 16.0 | 1257.91 | 2014-12-18 | 1418936400 | 1084.0 | 0.034 | 0.00023 | ... | 0.0 | -0.94657 | 0.0 | 0.0 | -0.06022 | -62.36523 | -0.000090 | | 842.1 | 0.669 |
| 481626 | 1325278800 | 2011-12-30 16:00 | 2011-12-30 | 16.0 | 1257.91 | 2014-12-18 | 1418936400 | 1084.0 | 0.02074 | 0.00013 | ... | 0.0 | -0.94622 | 0.0 | 0.0 | -0.06021 | -65.33411 | -0.000190 | | 942.1 | 0.749 |
| 481627 | 1325278800 | 2011-12-30 16:00 | 2011-12-30 | 16.0 | 1257.91 | 2014-12-18 | 1418936400 | 1084.0 | 0.01611 | 0.00014 | ... | 0.0 | -0.94568 | 0.0 | 0.0 | -0.06011 | -66.81953 | -0.000400 | | 992.1 | 0.789 |
| 481628 | 1325278800 | 2011-12-30 16:00 | 2011-12-30 | 16.0 | 1257.91 | 2014-12-18 | 1418936400 | 1084.0 | 0.00616 | 0.00005 | ... | 1265.35 | -0.94652 | 0.0 | 0.0 | -0.06021 | -74.13313 | 0.000160 | 6.000000 | 1242.1 | 0.987 |
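
To reuse the combined data without repeating the extraction, the master dataframe can be written to a single .csv file with `to_csv`. The following is a minimal sketch under stated assumptions: the output file name `spx_options_data.csv` is hypothetical, and a small stand-in dataframe with toy values takes the place of `options_data` so that the example runs on its own.

```python
import os
import tempfile

import pandas as pd

# Stand-in for the master dataframe (toy values, two columns only)
options_data_demo = pd.DataFrame(
    {
        "[QUOTE_DATE]": ["2010-01-04", "2010-01-04"],
        "[STRIKE_DISTANCE]": [20.0, 10.0],
    }
)

with tempfile.TemporaryDirectory() as out_folder:
    # Hypothetical output file name
    out_file = os.path.join(out_folder, "spx_options_data.csv")

    # index=False avoids writing the row index as an extra unnamed column
    options_data_demo.to_csv(out_file, index=False)

    # Read the file back to confirm the round trip
    reloaded = pd.read_csv(out_file)

print(reloaded)
```

In the notebook itself, the same `to_csv` call would be made on `options_data`, with a path of your choosing in place of the temporary folder.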
