
## FINANCIAL DATA
MODULE 4 | LESSON 3


---


# **OPTION DATA AND ATTRIBUTES**

|  |  |
|:---|:---|
|**Reading Time** |  30-40 minutes |
|**Prior Knowledge** |Calls and Puts, Option Parameters, Option Payoffs, Option Prices, Option Strategies    |
|**Keywords** |Open Interest, Put-Call Ratios, Bid-Ask Spread    |


---

*In the previous lesson, we studied how options depend on the underlying parameters. In this lesson, we'll import option data: real-time pricing and attributes. We'll see the types of metadata available and how they relate to each other and option prices.*

In [None]:
import datetime

import matplotlib.patches as mpatches
import matplotlib.pyplot as plt
import pandas as pd
import yfinance as yf

## 1. Handling Option Complexity
As an asset, options are N times more complicated than stocks
(where N > 1, and it is left to you to weigh the effects of
non-linearity, leverage, kurtosis, hedging, model risk, and other issues).

One of the reasons options are more complicated than stocks is the abundant array of options to choose from.

Suppose you are bullish on Netflix (symbol = NFLX).
(You may like the original series, and the wide variety of films, television series, and documentaries.)
You can decide to buy the stock.
But what if you decide to buy the option?
Well, you will soon realize there are a multitude of options.

So, the question is better phrased as, what if you decide to buy an option? Or even a set of options because some strategies involve multiple options?

Remember, unless you are a volatility trader, when you trade options you need to get three things correct:

1. Direction: determined by your choice between calls and puts
2. Size: determined by your choice among strike levels
3. Timing: determined by your choice among different expirations

Let's count how many there are of each!
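
To see how quickly these choices multiply, here is a minimal sketch. The counts are hypothetical placeholders chosen for illustration, not actual NFLX data.

```python
# Hypothetical counts, for illustration only (not actual NFLX data)
num_types = 2          # direction: call or put
num_strikes = 100      # size: strike levels per expiration (assumed)
num_expirations = 15   # timing: listed expiration dates (assumed)

# Each (type, strike, expiration) combination is a distinct contract
num_contracts = num_types * num_strikes * num_expirations
print(num_contracts)
```

Even with modest assumptions, a single stock can have thousands of listed contracts.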

## 2. Importing Option Data

First, we'll refer to a function that helps not only to import option chain data but also to categorize and store it neatly.

The function uses the Python package `yfinance`.
It works with pandas DataFrames.
It uses for loops, basic subsetting, and even lambda functions.
Please do the required reading to ensure your knowledge of Python is complete and up-to-date.


In [None]:
# https://medium.com/@txlian13/webscrapping-options-data-with-python-and-yfinance-e4deb0124613


def options_chain(symbol):

    tk = yf.Ticker(symbol)
    # Expiration dates
    exps = tk.options

    # Get options for each expiration
    chains = []
    for e in exps:
        opt = tk.option_chain(e)
        # DataFrame.append was removed in pandas 2.0, so use pd.concat
        opt = pd.concat([opt.calls, opt.puts])
        opt["expirationDate"] = e
        chains.append(opt)
    options = pd.concat(chains, ignore_index=True)

    # Bizarre error in `yfinance` that gives the wrong expiration date
    # Add 1 day to get the correct expiration date
    options["expirationDate"] = pd.to_datetime(
        options["expirationDate"]
    ) + datetime.timedelta(days=1)
    options["dte"] = (
        options["expirationDate"] - datetime.datetime.today()
    ).dt.days / 365

    # Boolean column if the option is a CALL
    options["CALL"] = options["contractSymbol"].str[4:].apply(lambda x: "C" in x)

    # Make sure the numeric columns are stored as numbers
    # (yfinance names the implied volatility column "impliedVolatility")
    numeric_cols = ["bid", "ask", "strike", "volume", "impliedVolatility"]
    options[numeric_cols] = options[numeric_cols].apply(pd.to_numeric)
    options["mark"] = (
        options["bid"] + options["ask"]
    ) / 2  # Calculate the midpoint of the bid-ask

    # Drop unnecessary and meaningless columns
    options = options.drop(
        columns=[
            "contractSize",
            "currency",
            "change",
            "percentChange",
            "lastTradeDate",
            "lastPrice",
        ]
    )

    return options
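
The `CALL` flag in the function above is read from the contract symbol by skipping the first four characters, which can misclassify a put whose ticker is longer than four letters and contains a C after the fourth character. As a hedged alternative, the sketch below parses the symbol with a regular expression. It assumes the symbol follows the OCC-style layout: root ticker, expiration as YYMMDD, a C or P flag, and the strike multiplied by 1,000 in eight digits. The helper `parse_contract_symbol` and the sample symbol are hypothetical and are not part of `yfinance`.

```python
import datetime as dt
import re

# Assumed layout: ROOT + YYMMDD + C/P + strike * 1000 (8 digits)
OCC_PATTERN = re.compile(r"^([A-Z]+)(\d{6})([CP])(\d{8})$")


def parse_contract_symbol(symbol):
    """Split a contract symbol into its parts (hypothetical helper)."""
    match = OCC_PATTERN.match(symbol)
    if match is None:
        raise ValueError(f"Unrecognized contract symbol: {symbol}")
    root, yymmdd, flag, strike = match.groups()
    expiration = dt.date(
        2000 + int(yymmdd[:2]), int(yymmdd[2:4]), int(yymmdd[4:6])
    )
    return {
        "root": root,
        "expiration": expiration,
        "is_call": flag == "C",
        "strike": int(strike) / 1000,
    }


# Illustrative symbol, not live data
parsed = parse_contract_symbol("NFLX230120C00340000")
print(parsed)
```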

## 3. Options Come in a Variety of Expiration Dates

Now, let's count the number of different expiration dates.

In [None]:
from yahoo_fin import options

nflx_dates = options.get_expiration_dates("nflx")
len(nflx_dates)

When we ran this code, there were 15 different expiration dates.
(You may have a different number.)
Let's see how varied the dates are.

In [None]:
list(nflx_dates)

We notice about six expiration dates spaced at 1-week intervals: options expiring in 1, 2, 3, 4, 5, and 6 weeks. After that, the expirations appear to be monthly, typically falling on the third Friday of the month. Finally, there appear to be 1-year and perhaps even 2-year options. If you're bullish on Netflix, you will want to decide whether you expect the stock to rise within a few weeks, which suits short-dated options, or over a longer period, which suits longer-dated options.
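
The weekly and monthly patterns described above can be checked programmatically. The following is a minimal sketch using only the standard library. The dates are hypothetical examples, not the actual NFLX expirations, and `is_third_friday` is a helper introduced here for illustration.

```python
import datetime as dt

# Hypothetical expiration dates for illustration (not actual NFLX data)
expirations = [
    dt.date(2023, 1, 6),
    dt.date(2023, 1, 13),
    dt.date(2023, 1, 20),
    dt.date(2023, 2, 17),
    dt.date(2023, 3, 17),
    dt.date(2024, 1, 19),
]


def is_third_friday(d):
    """True if d is the third Friday of its month."""
    # weekday() == 4 is Friday; the third Friday always falls on days 15-21
    return d.weekday() == 4 and 15 <= d.day <= 21


# Days between consecutive expirations: 7 means weekly spacing
gaps = [
    (later - earlier).days
    for earlier, later in zip(expirations, expirations[1:])
]
monthly = [d for d in expirations if is_third_friday(d)]

print(gaps)
print(monthly)
```

Gaps of 7 days indicate weekly expirations, while larger gaps that land on a third Friday indicate the standard monthly cycle.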

Next, let's get all the calls.

In [None]:
try:
    callsNflx = options.get_calls("nflx")
except Exception:  # fall back to the saved CSV if the download fails
    callsNflx = pd.read_csv("nflx_calls.csv")

list(callsNflx.columns)

In [None]:
callsNflx.head()

In [None]:
callsNflx.dtypes

There is a lot of information available on options. Keep in mind that the function used to collect them dropped some other columns that are not shown in this list. The data we have on options includes:
1. The Contract Name. This is similar to a CUSIP, ISIN, SEDOL, ticker, or identifier.
2. The Last Trade Date. This is the date of the most recent activity. If an option is inactive, you may encounter a date from a long time ago.
3. Strike. The strike level of the option.
4. Last Price. Trades in the market occur at a specific price. This is the most recent one.
5. Bid. Market participants willing to buy agree to do so at the bid price. This is a quote rather than a trade.
6. Ask. Market participants willing to sell agree to do so at the ask price, also known as the offer price. Again, this is quote data rather than trade data. To clarify: Last Price refers to a trade that actually occurred, whereas Bid and Ask are the price levels at which market makers offer to buy and sell, respectively.
7. Change. This gives the change in price on the day.  Positive change means prices increased on the day; negative changes mean prices decreased.
8. % Chg. This gives the percent change. This is usually more helpful than the level change because it is scaled relative to the option's price.
9. Volume. This gives the number of contracts that traded today.
10. Open Interest. This gives the number of contracts outstanding for that particular option. It is sometimes confused with volume, which counts the contracts traded during the day rather than the contracts that remain open.
11. Implied Volatility. Of all the option's inputs, volatility is the most important. The other numeric inputs (stock price, strike level, risk-free rate, and dividend yield) are easily observed in the market, whereas volatility is not. If we are given the price of an option and agree on all the other parameters, we can back out the volatility that reproduces that price. The result of this calculation is the implied volatility.
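
To illustrate how a volatility can be implied from a price, here is a minimal sketch that inverts the Black-Scholes call formula by bisection. It assumes a European call with no dividends, which is a simplification: the data provider may use a different model, and the inputs below are hypothetical. The function names are introduced here for illustration.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def bs_call_price(spot, strike, t, rate, vol):
    """Black-Scholes price of a European call with no dividends."""
    d1 = (log(spot / strike) + (rate + 0.5 * vol**2) * t) / (vol * sqrt(t))
    d2 = d1 - vol * sqrt(t)
    return spot * norm_cdf(d1) - strike * exp(-rate * t) * norm_cdf(d2)


def implied_vol(price, spot, strike, t, rate, low=1e-4, high=5.0, tol=1e-8):
    """Find the volatility that reproduces the price, by bisection."""
    # The call price increases with volatility, so bisection converges
    while high - low > tol:
        mid = 0.5 * (low + high)
        if bs_call_price(spot, strike, t, rate, mid) < price:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


# Hypothetical inputs: price a call at 40% volatility, then recover it
target = bs_call_price(spot=340, strike=350, t=0.25, rate=0.03, vol=0.40)
recovered = implied_vol(target, spot=340, strike=350, t=0.25, rate=0.03)
print(round(recovered, 4))
```

The round trip recovers the volatility used to generate the price, which is the sense in which a quoted price "implies" a volatility.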

Let's start counting.

## 4. Options Come in a Variety of Strikes

How many different strike levels are there?

In [None]:
numStrikes = callsNflx["Strike"].count()
numStrikes

The sheer number of strikes can be bewildering. It can make selecting a specific strike a daunting process. Fortunately, if we investigate the strikes, we see that many of them are deep OTM or deep ITM.

In [None]:
callsNflx["Strike"]

At the time of writing, NFLX traded near 340. However, you can find strikes ranging from 10 to 1050. The overwhelming majority of these calls are OTM, meaning the strike is above the stock price.
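
As a small illustration of the ITM and OTM labels for calls, the sketch below classifies a few strikes relative to a spot price of 340. The strikes are hypothetical, and `call_moneyness` is a helper introduced here for illustration.

```python
# Spot price taken from the text above
spot = 340

# Hypothetical call strikes for illustration
strikes = [10, 200, 330, 340, 350, 500, 1050]


def call_moneyness(strike, spot):
    """Label a call as ITM, ATM, or OTM relative to the spot price."""
    if strike < spot:
        return "ITM"  # exercising would pay spot - strike
    if strike > spot:
        return "OTM"  # exercising would pay nothing
    return "ATM"


labels = {k: call_moneyness(k, spot) for k in strikes}
print(labels)
```

For puts, the labels flip: strikes above the spot price are ITM and strikes below are OTM.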

## 5. Options Have Different Amounts of Open Interest

An easy way to filter options is to examine their open interest.
The open interest refers to the number of contracts in existence.  
Unlike the number of shares of a stock, the open interest of options can change moment by moment.
To issue a new option, a buyer and seller simply come to terms.  
Effectively, there is as much supply as the sellers are willing to write.
(Practically, they will want to have access to the underlying so they can properly hedge their exposures).
Many of the strikes we examined have little or no open interest.
That means market makers are offering these securities, but no contracts have been written yet. Perhaps the combination of the strike being far from the current stock price and the expiration being near means that there is little interest in hedging or speculating with these options.

In [None]:
callStrikes = list(callsNflx["Strike"])
callOpenInt = list(callsNflx["Open Interest"])

In [None]:
plt.plot(callStrikes, callOpenInt)
plt.xlabel("Call Strikes")
plt.ylabel("Call Open Interest")

In [None]:
plt.xlim([300, 500])
plt.plot(callStrikes, callOpenInt)
plt.xlabel("Strike")
plt.ylabel("Open Interest")

A handful of options have the vast majority of open interest. As strikes move away from the current stock price, we see far less open interest.
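
One way to quantify this concentration is to compute the share of total open interest held by the most active strikes. The sketch below does this for hypothetical numbers, not actual NFLX data.

```python
# Hypothetical open interest by strike (not actual NFLX data)
open_interest = {
    300: 120,
    320: 900,
    340: 4200,
    350: 3100,
    360: 1500,
    400: 150,
    500: 30,
}

total = sum(open_interest.values())
# Rank strikes by open interest, largest first
ranked = sorted(open_interest.items(), key=lambda item: item[1], reverse=True)
top3_share = sum(oi for _, oi in ranked[:3]) / total
print(round(top3_share, 3))
```

In this made-up example, three of the seven strikes account for most of the open interest.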

Options with low open interest are not liquid. Recall that we studied liquidity in Financial Markets. Think of each option as a vendor at an outdoor market, and think of open interest as the number of customers who bought fruits and vegetables from that vendor.  
A vendor with no customers is like an option with insufficient open interest.
Few or no option contracts have been written.
Therefore, the market for that option is illiquid.  

Unlike our outdoor market, the options market allows participants to buy and sell. For illiquid securities, there is likely going to be a large bid-ask spread. Those options will be unfavorable to trade due to the illiquidity.

So we have lots of calls. Let's run through the same exercise for puts. The results should be similar: there are a lot of strikes.

In [None]:
try:
    putsNflx = options.get_puts("nflx")
except Exception:  # fall back to the saved CSV if the download fails
    putsNflx = pd.read_csv("nflx_puts.csv")

for index in putsNflx.index:
    if "-" == putsNflx["Volume"][index]:
        putsNflx.loc[index, "Volume"] = 0
numPutStrikes = putsNflx["Strike"].count()
numPutStrikes

In [None]:
putStrikes = list(putsNflx["Strike"])
putOpenInt = list(putsNflx["Open Interest"])
plt.plot(putStrikes, putOpenInt)
plt.xlabel("Put Strikes")
plt.ylabel("Put Open Interest")

Rather than plot all of the open interest amounts, let's just focus on the options near the current stock price.

In [None]:
plt.xlim([300, 500])
plt.plot(putStrikes, putOpenInt)
plt.xlabel("Strike")
plt.ylabel("Open Interest");

## 6. Cleaning Volume

There's a problem with volume: contracts with no volume are recorded as '-' instead of 0.
Cleaning this up is an important step.  
The following for loop replaces each '-' with 0.

In [None]:
for index in callsNflx.index:
    if "-" == callsNflx["Volume"][index]:
        callsNflx.loc[index, "Volume"] = 0

Now we convert the column to float.

In [None]:
callsNflx.Volume = callsNflx.Volume.astype(float)

Let's do the same for puts: replace dashes and then convert to float.

In [None]:
for index in putsNflx.index:
    if "-" == putsNflx["Volume"][index]:
        putsNflx.loc[index, "Volume"] = 0

putsNflx.Volume = putsNflx.Volume.astype(float)
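
The loops above work row by row. As a hedged alternative, the same cleaning can be sketched in a vectorized way with `pd.to_numeric`. Note one difference in behavior: `errors="coerce"` turns every unparseable entry into a missing value, not only '-', so it is slightly more permissive than the loop. The sample column is hypothetical.

```python
import pandas as pd

# Hypothetical volume column with "-" placeholders
raw = pd.DataFrame({"Volume": ["12", "-", "305", "-"]})

# Unparseable entries such as "-" become NaN, then NaN is replaced by 0
raw["Volume"] = pd.to_numeric(raw["Volume"], errors="coerce").fillna(0)
print(raw["Volume"].tolist())
```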

Let's create simple lists so we can add them to a data frame later.

In [None]:
callVolume = list(callsNflx["Volume"])
putVolume = list(putsNflx["Volume"])

## 7. Diving Deeper into Open Interest

Let's dive deeper into open interest.
We can start by graphing open interest for calls and puts across different strikes.

In [None]:
callDf = pd.DataFrame()
callDf["Strikes"] = callStrikes
callDf["CallOpenInt"] = callOpenInt
callDf["CallVolume"] = callVolume
putDf = pd.DataFrame()
putDf["Strikes"] = putStrikes
putDf["PutOpenInt"] = putOpenInt
putDf["PutVolume"] = putVolume

Now let's merge the two data frames for strikes near the current stock price.

In [None]:
df = callDf.merge(putDf)
df = df[(df["Strikes"] > 300) & (df["Strikes"] < 400)]
list(df.dtypes)
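
Called without arguments, `merge` joins on the columns the two data frames share (here, `Strikes`) and keeps only the rows present in both. The sketch below, using hypothetical strikes, contrasts that default with an outer join that keeps every strike.

```python
import pandas as pd

# Hypothetical strikes: 310 is listed only for calls, 330 only for puts
calls = pd.DataFrame({"Strikes": [300, 310, 320], "CallOpenInt": [5, 8, 13]})
puts = pd.DataFrame({"Strikes": [300, 320, 330], "PutOpenInt": [7, 2, 4]})

# Default: inner join on the shared "Strikes" column
inner = calls.merge(puts)
# Outer join keeps every strike and fills the gaps with NaN
outer = calls.merge(puts, on="Strikes", how="outer")

print(inner)
print(outer)
```

With the default inner join, a strike that exists for calls but not for puts silently drops out of the merged data frame.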

In [None]:
plt.xlim([300, 400])
plt.plot(list(df["Strikes"]), list(df["CallOpenInt"]))
plt.plot(list(df["Strikes"]), list(df["PutOpenInt"]))
plt.title("Open Interest vs. Strike Level")
plt.xlabel("Strike")
plt.ylabel("Open Interest")
patch1 = mpatches.Patch(color="blue", label="Call Open Interest")
patch2 = mpatches.Patch(color="orange", label="Put Open Interest")
plt.legend(handles=[patch1, patch2])

What is so interesting about open interest?
Meaningful ratios derive from these open interest numbers.
Let's define one.

Put-Call Ratio of Open Interest = Put Open Interest / Call Open Interest

See the required reading to learn how useful open interest can be in predicting the direction of a stock.
How can we define a function for the put-call ratio?

In [None]:
def PutCallRatioOpenInterest(df):
    pcroi = sum(df["PutOpenInt"]) / sum(df["CallOpenInt"])
    return round(pcroi, 4)


PutCallRatioOpenInterest(df)

In [None]:
def PutCallRatioVolume(df):
    pcv = sum(df["PutVolume"]) / sum(df["CallVolume"])
    return round(pcv, 4)


PutCallRatioVolume(df)
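
As a quick sanity check on the ratio, here is a worked example with hypothetical numbers, not actual NFLX data. It sums open interest across strikes before dividing, as the functions above do.

```python
# Hypothetical open interest per strike (not actual NFLX data)
put_open_int = [400, 600, 500]
call_open_int = [800, 700, 500]

# Ratio of totals: 1500 puts outstanding versus 2000 calls outstanding
ratio = round(sum(put_open_int) / sum(call_open_int), 4)
print(ratio)
```

A ratio below 1 means more call contracts than put contracts are outstanding, and a ratio above 1 means the opposite.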

## 8. Handling Python Data Structures

In [None]:
info = {}
for date in nflx_dates:
    info[date] = options.get_options_chain("nflx", date)
type(info)

Notice the familiar data structures. `info` is a dictionary keyed by the expiration date.

In [None]:
exp_dates = list(info.keys())
exp_dates

Let's look up the value stored under one key, the first expiration date.

In [None]:
z1 = info[exp_dates[0]]
type(z1)

In [None]:
z1.keys()

We still have a dictionary. Let's get the calls.

In [None]:
z2 = z1["calls"]
type(z2)

Now, we have a data frame.
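
The nesting we just walked through (a dictionary of dictionaries of data frames) can be reproduced in miniature. The sketch below builds a hypothetical structure of the same shape. The date string and the quotes are made up for illustration.

```python
import pandas as pd

# Hypothetical miniature: date -> {"calls", "puts"} -> DataFrame
chain = {
    "January 20, 2023": {
        "calls": pd.DataFrame({"Strike": [340, 350], "Bid": [12.0, 8.5]}),
        "puts": pd.DataFrame({"Strike": [340, 350], "Bid": [11.0, 16.0]}),
    }
}

first_date = list(chain.keys())[0]
first_calls = chain[first_date]["calls"]

# Print the type found at each level of the structure
print(type(chain).__name__)
print(type(chain[first_date]).__name__)
print(type(first_calls).__name__)
```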

In [None]:
print(z2["Strike"].count())
z2.columns

In [None]:
z3 = z2[(z2["Strike"] >= 330) & (z2["Strike"] <= 350)]
z3

Let's compute the bid-ask spread.

In [None]:
# Compute the bid-ask spread
z4 = z2[(z2["Strike"] >= 300) & (z2["Strike"] <= 400)]
plt.plot(list(z4["Strike"]), list(z4["Ask"] - z4["Bid"]))
plt.title("Bid-Ask Spread as a Function of Strike")
plt.xlabel("Strike")
plt.ylabel("Bid-Ask Spread")

For OTM strikes, the bid-ask spread is considerably wider.
OTM calls tend to have very low liquidity.
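
The plot above shows the spread in price units. A complementary view scales the spread by the midpoint of the bid and ask, which makes cheap and expensive options comparable. The sketch below uses hypothetical quotes, not actual NFLX data.

```python
# Hypothetical quotes as (strike, bid, ask), not actual NFLX data
quotes = [(300, 45.00, 45.30), (340, 12.00, 12.40), (400, 0.50, 1.10)]

rel_spreads = {}
for strike, bid, ask in quotes:
    mid = (bid + ask) / 2   # midpoint of the bid and ask
    spread = ask - bid      # spread in price units
    rel_spreads[strike] = spread / mid

for strike, rel in rel_spreads.items():
    print(strike, round(rel, 4))
```

In this made-up example, the far OTM call has a spread that is a large fraction of its price, which is one way illiquidity makes an option costly to trade.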


## 9. Conclusion
In this lesson, we introduced Python functions to import option data. Unlike stock data, option data is messier because there are many options for a single stock: calls and puts, different strikes, and different expiration times. In the next lesson, we'll look at option strategies.

---
Copyright © 2022 WorldQuant University. This
content is licensed solely for personal use. Redistribution or
publication of this material is strictly prohibited.
