# Yfinance, Option Chain, and Data

Focus: Gain familiarity with Yfinance and its option chain data, and learn how to manipulate that data for later computations of implied volatility surfaces and PCA.

Notes:
- The ability to calculate an implied volatility through BS (Black-Scholes), simple samples, and root-finding methods has already been implemented at this point

- The focus of this notebook is to extend these implementations to real market data

- There will be conceptual notes for both myself and others to follow along in this notebook

In [11]:
#Allow imports from the src directory
import sys
from pathlib import Path
project_root = Path().resolve().parents[0]
sys.path.append(str(project_root))

# Import self-made libraries to run checks
from src.black_scholes import black_scholes_price
from src.implied_vol import implied_volatility

# Standard imports
import yfinance as yf
import pandas as pd
import numpy as np

# Data Familiarity  
<br>
.option_chain()
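- Takes the expiration date as a string formatted "YYYY-MM-DD"; with no date passed, it returns the nearest expiration
- Pulls only one expiration per call, so multiple dates require a loop that stacks the results into one DataFrame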
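Because `.option_chain()` handles one expiration at a time, pulling several dates means looping and stacking the results. A minimal sketch, assuming each chain has already been downloaded into a dict keyed by its expiration string; the helper name `stack_chains` is hypothetical, and the small fake tables only stand in for real `.calls` DataFrames so the example runs offline.

```python
import pandas as pd


def stack_chains(chains: dict[str, pd.DataFrame]) -> pd.DataFrame:
    """Concatenate per-expiration option tables into one DataFrame.

    `chains` maps an expiration string ("YYYY-MM-DD") to the table pulled
    for that date (hypothetical helper). An `expiration` column is added
    so every row stays traceable to its date after stacking.
    """
    frames = []
    for exp, df in chains.items():
        out = df.copy()
        out["expiration"] = pd.to_datetime(exp, format="%Y-%m-%d")
        frames.append(out)
    return pd.concat(frames, ignore_index=True)


# Stand-in tables (made-up numbers); with yfinance these would be tk.option_chain(exp).calls
fake = {
    "2026-02-20": pd.DataFrame({"strike": [680.0, 685.0], "bid": [8.1, 5.7], "ask": [8.3, 5.9]}),
    "2026-03-20": pd.DataFrame({"strike": [680.0], "bid": [14.0], "ask": [14.4]}),
}
stacked = stack_chains(fake)
print(stacked)
```

With live data, the dict could be built as `{exp: tk.option_chain(exp).calls for exp in tk.options[:3]}`, which uses only the `tk.options` and `tk.option_chain()` calls shown in this notebook.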

Option chain data
- Includes IV, reported in decimal form rather than as a percent
- Ex. a Yfinance IV of 2.98 means 298%  
<br>
- Bid is the highest premium someone in the market is willing to pay for the option contract
- Ask is the lowest premium someone will accept to sell the contract
- Mid price = (bid + ask)/2, which we use as our fair market price estimate  
<br>
- Volume: How many contracts have traded today
- Open interest: How many contracts are still open (not yet closed out, exercised, or expired)
- Contract size: Shares per contract (REGULAR = 100 shares)
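The column conventions above can be checked on a toy table. A minimal sketch, assuming the yfinance column names `bid`, `ask`, and `impliedVolatility`; the numbers are made up, and the derived column names (`mid`, `spread_pct`, `iv_pct`) are hypothetical.

```python
import pandas as pd

# Toy rows using the same column names as a yfinance option chain (made-up values)
chain = pd.DataFrame({
    "strike": [680.0, 700.0],
    "bid": [8.10, 1.00],
    "ask": [8.30, 1.50],
    "impliedVolatility": [0.2012, 2.98],  # decimals, not percents
})

# Mid price: the fair-value estimate halfway between bid and ask
chain["mid"] = (chain["bid"] + chain["ask"]) / 2
# Relative bid/ask spread, measured against the mid
chain["spread_pct"] = (chain["ask"] - chain["bid"]) / chain["mid"]
# Convert IV to a percent for display only (2.98 -> 298%)
chain["iv_pct"] = chain["impliedVolatility"] * 100

print(chain)
```

Keeping IV in decimal form for all calculations and converting only for display avoids mixing the two scales when the values are later fed back into Black-Scholes.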



In [12]:
#Initialization 
ticker = "SPY"
tk = yf.Ticker(ticker)

#Example of not passing an expiration date (yfinance then uses the nearest expiration)
SPYdf = tk.option_chain()
print("SPYdf data type:", type(SPYdf), "<- not a DataFrame; a namedtuple holding the .calls and .puts DataFrames")

#Pulling available expiration dates
expirations = tk.options
print("SPY Expirations:", expirations)

#Pull option chain for a specific date
SPYdf = tk.option_chain('2026-02-20')
print("SPYdf data type:", type(SPYdf), "<- still a namedtuple, not a DataFrame")
print("View of data:\n")
print(SPYdf)

SPYdf data type: <class 'yfinance.ticker.Options'> <- not a DataFrame; a namedtuple holding the .calls and .puts DataFrames
SPY Expirations: ('2026-02-13', '2026-02-17', '2026-02-18', '2026-02-19', '2026-02-20', '2026-02-23', '2026-02-24', '2026-02-25', '2026-02-27', '2026-03-06', '2026-03-13', '2026-03-20', '2026-03-27', '2026-03-31', '2026-04-17', '2026-04-30', '2026-05-15', '2026-05-29', '2026-06-18', '2026-06-30', '2026-07-31', '2026-08-21', '2026-09-18', '2026-09-30', '2026-12-18', '2026-12-31', '2027-01-15', '2027-03-19', '2027-06-17', '2027-12-17', '2028-01-21', '2028-06-16', '2028-12-15')
SPYdf data type: <class 'yfinance.ticker.Options'> <- still a namedtuple, not a DataFrame
View of data:

Options(calls=         contractSymbol             lastTradeDate  strike  lastPrice     bid  \
0    SPY260220C00335000 2026-02-02 19:44:19+00:00   335.0     362.17  345.41   
1    SPY260220C00345000 2026-02-02 14:42:12+00:00   345.0     348.01  335.42   
2    SPY260220C00350000 2026-02-03 19:25:15

In [13]:
#Separating calls and puts into their own DataFrames & do small view
calls = SPYdf.calls
puts = SPYdf.puts
print("Now that the data is separated into calls and puts, each is a pandas DataFrame")
print("Calls DataFrame data type:", type(calls))
print("Puts DataFrame data type:", type(puts))

# Small view of calls and puts DataFrames
print("\nCalls DataFrame:\n", calls.head(5))
print("\nPuts DataFrame:\n", puts.head(5))

# View of columns in the calls DataFrame
print("\nCalls DataFrame columns:\n", calls.columns)
print(f"There are {len(calls)} call options for SPY with expiration date 2026-02-20\n")
print(f"There are {len(puts)} put options for SPY with expiration date 2026-02-20\n")


Now that the data is separated into calls and puts, each is a pandas DataFrame
Calls DataFrame data type: <class 'pandas.core.frame.DataFrame'>
Puts DataFrame data type: <class 'pandas.core.frame.DataFrame'>

Calls DataFrame:
        contractSymbol             lastTradeDate  strike  lastPrice     bid  \
0  SPY260220C00335000 2026-02-02 19:44:19+00:00   335.0     362.17  345.41   
1  SPY260220C00345000 2026-02-02 14:42:12+00:00   345.0     348.01  335.42   
2  SPY260220C00350000 2026-02-03 19:25:15+00:00   350.0     337.41  330.43   
3  SPY260220C00365000 2025-11-26 15:41:15+00:00   365.0     316.48  325.28   
4  SPY260220C00370000 2025-11-20 17:07:06+00:00   370.0     292.80  320.42   

      ask  change  percentChange  volume  openInterest  impliedVolatility  \
0  348.49     0.0            0.0     1.0             3           2.151860   
1  338.53     0.0            0.0     2.0             2           2.080083   
2  333.53     0.0            0.0     7.0            12           2.041997  

# Implied Volatility Check
Here I check that my implied volatility calculations match the values given by the Yfinance option chain  
- First, pull the necessary inputs: the risk-free rate, the stock price (SPY), and the current time to expiration
- Next, filter the option contracts to drop illiquid or far-from-the-money strikes
- Finally, calculate IV and compare it to the Yfinance IV
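The check depends on inverting Black-Scholes for σ. The notebook's own `black_scholes_price` and `implied_volatility` live in `src/` and are not shown here, so the following is an independent sketch rather than the author's implementation: a no-dividend Black-Scholes pricer plus plain bisection (one of several possible root-finding methods), using only the standard library. The function names `bs_price` and `bs_implied_vol` and the search bracket [1e-6, 5.0] are assumptions.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_price(S: float, K: float, T: float, r: float, sigma: float, option_type: str = "call") -> float:
    """Black-Scholes price of a European option on a non-dividend-paying stock."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    if option_type == "call":
        return S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)
    return K * math.exp(-r * T) * norm_cdf(-d2) - S * norm_cdf(-d1)


def bs_implied_vol(price: float, S: float, K: float, T: float, r: float,
                   option_type: str = "call", lo: float = 1e-6, hi: float = 5.0,
                   tol: float = 1e-8, max_iter: int = 200) -> float:
    """Solve bs_price(sigma) == price for sigma by bisection on [lo, hi].

    Bisection works because the Black-Scholes price increases with sigma.
    Returns NaN when the price falls outside [price(lo), price(hi)].
    """
    f_lo = bs_price(S, K, T, r, lo, option_type) - price
    f_hi = bs_price(S, K, T, r, hi, option_type) - price
    if f_lo * f_hi > 0:
        return float("nan")
    for _ in range(max_iter):
        sigma_mid = 0.5 * (lo + hi)
        f_mid = bs_price(S, K, T, r, sigma_mid, option_type) - price
        if abs(f_mid) < tol or (hi - lo) < tol:
            return sigma_mid
        if f_lo * f_mid < 0:
            hi = sigma_mid  # root lies in the lower half
        else:
            lo, f_lo = sigma_mid, f_mid  # root lies in the upper half
    return 0.5 * (lo + hi)


# Round trip (made-up inputs): price an option at a known vol, then recover that vol
S, K, T, r = 690.0, 685.0, 0.019, 0.04
p = bs_price(S, K, T, r, 0.20, "call")
print(round(bs_implied_vol(p, S, K, T, r, "call"), 6))
```

A round trip like this tests the solver in isolation; any remaining gap against Yfinance then comes from the inputs (price, T, r, dividends) rather than the root finder.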

In [17]:
#Imports
from src.risk_free_rate import calculate_risk_free_rate
from datetime import datetime, timezone

#Get current stock price
tk = yf.Ticker("SPY")
S = tk.info.get("regularMarketPrice", None)
if S is None:
    S = tk.history(period="1d")["Close"].iloc[-1]

#Calculate risk-free rate
r = calculate_risk_free_rate()

#Get current time in UTC
now = datetime.now(timezone.utc)
print(now)
print("now data type:", type(now))

#Calculate time to expiration
exp = "2026-02-20"
exp_dt = datetime.strptime(exp, "%Y-%m-%d").replace(tzinfo=timezone.utc)
print("exp_dt data type:", type(exp_dt), "\n")

T = ((exp_dt - now).total_seconds()) / ((365.0 * 24 * 60 * 60))  # Time to expiration in years
print("Time to expiration T (in years):", T, "\n")
print("T data type:", type(T), "\n")




2026-02-13 01:02:07.327057+00:00
now data type: <class 'datetime.datetime'>
exp_dt data type: <class 'datetime.datetime'> 

Time to expiration T (in years): 0.019059889426147895 

T data type: <class 'float'> 
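Setting `exp_dt` to midnight UTC on the expiry date places expiration before that day's trading session even opens, so T is shorter than the time the contract actually has left. A minimal sketch of an alternative, assuming expiration at a 4:00 p.m. New York close and a fixed UTC-5 offset (Eastern Standard Time, which fits a February expiry but not dates under daylight saving time); the helper `year_fraction_to_close` is hypothetical, and the exact cutoff for SPY contracts should be confirmed against exchange rules.

```python
from datetime import datetime, time, timedelta, timezone

# Assumption: Eastern Standard Time (UTC-5), which fits a February expiry;
# dates under daylight saving time would need UTC-4 instead.
EST = timezone(timedelta(hours=-5))
SECONDS_PER_YEAR = 365.0 * 24 * 60 * 60  # ACT/365, same convention as the cell above


def year_fraction_to_close(exp: str, now: datetime,
                           close: time = time(16, 0), tz: timezone = EST) -> float:
    """Years from `now` until the assumed close time on expiration date `exp` ("YYYY-MM-DD")."""
    exp_date = datetime.strptime(exp, "%Y-%m-%d").date()
    exp_dt = datetime.combine(exp_date, close, tzinfo=tz)
    return (exp_dt - now).total_seconds() / SECONDS_PER_YEAR


# Timestamp matching the run above (microseconds dropped)
now = datetime(2026, 2, 13, 1, 2, 7, tzinfo=timezone.utc)
T_midnight = (datetime(2026, 2, 20, tzinfo=timezone.utc) - now).total_seconds() / SECONDS_PER_YEAR
T_close = year_fraction_to_close("2026-02-20", now)
print(f"T to midnight UTC: {T_midnight:.6f}   T to assumed close: {T_close:.6f}")
print(f"Extra time: {(T_close - T_midnight) * 365 * 24:.1f} hours")
```

Under these assumptions the close-based T is 21 hours longer. For a fixed option price, a longer T implies a lower IV, so this choice alone shifts every calculated IV.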



Filter options using the default conditions of `filter_option_chain`:
- moneyness low: float = 0.80
- moneyness high: float = 1.20
- max spread pct: float = 0.35
- min mid: float = 0.05
- requires open interest or volume  
<br>
Different thresholds can be passed in, but the defaults suit current purposes, so I keep them
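`src.option_filters.filter_option_chain` is not shown in this notebook. The following is a hypothetical pandas re-creation of the conditions listed above, not the module's actual code: it assumes moneyness means K/S and that the spread percentage is measured against the mid, either of which may differ from the real implementation. The function name `filter_chain_sketch` is made up, and the sample rows are invented.

```python
import pandas as pd


def filter_chain_sketch(df: pd.DataFrame, S: float,
                        moneyness_low: float = 0.80, moneyness_high: float = 1.20,
                        max_spread_pct: float = 0.35, min_mid: float = 0.05) -> pd.DataFrame:
    """Keep liquid, near-the-money contracts (hypothetical version of the listed rules)."""
    out = df.copy()
    out["mid"] = 0.5 * (out["bid"] + out["ask"])
    moneyness = out["strike"] / S  # assumption: moneyness defined as K/S
    spread_pct = (out["ask"] - out["bid"]) / out["mid"]  # assumption: spread relative to mid
    # Require some trading activity: open interest or volume above zero (NaN counts as zero)
    has_activity = (out["openInterest"].fillna(0) > 0) | (out["volume"].fillna(0) > 0)
    mask = (
        moneyness.between(moneyness_low, moneyness_high)
        & (out["mid"] >= min_mid)
        & (spread_pct <= max_spread_pct)
        & has_activity
    )
    return out[mask].reset_index(drop=True)


# Made-up rows: deep ITM, two liquid near-ATM, no activity, wide spread, tiny mid
chain = pd.DataFrame({
    "strike":       [335.0, 680.0, 685.0, 690.0, 700.0, 720.0],
    "bid":          [345.4,   8.1,   5.7,   3.0,   1.0,  0.01],
    "ask":          [348.5,   8.3,   5.9,   3.2,   1.5,  0.03],
    "volume":       [1.0,   500.0, 300.0,   0.0,  20.0,  None],
    "openInterest": [3,      1200,   800,     0,   100,     5],
})
kept = filter_chain_sketch(chain, S=690.0)
print(kept["strike"].tolist())
```

Each sample row is built to fail exactly one rule (moneyness, activity, spread, or minimum mid), so only the two liquid near-the-money strikes survive.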

In [18]:
import src.option_filters as of
print("Before filtering, there are ", len(calls), " call options and ", len(puts), " put options.\n")
calls = of.filter_option_chain(calls, S)
puts = of.filter_option_chain(puts, S)
print("After filtering, there are ", len(calls), " call options and ", len(puts), " put options.\n")

Before filtering, there are  98  call options and  155  put options.

After filtering, there are  98  call options and  155  put options.



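Calculate IV & compare for calls and puts
- Compare against the Yfinance IV twice: once pricing each option at the bid/ask mid, and once at the last traded price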

In [None]:
first50_calls = calls.head(50)
first50_puts = puts.head(50)


def compare_iv(chain, option_type, use_mid):
    """Print Yahoo's IV next to my calculated IV for each contract in `chain`.

    use_mid=True prices each option at the bid/ask mid;
    use_mid=False uses the last traded price instead.
    """
    label = "mid" if use_mid else "price"
    for row in chain.itertuples(index=False):
        K = row.strike
        yahoo_iv = row.impliedVolatility  # already decimal
        price = 0.5 * (row.bid + row.ask) if use_mid else row.lastPrice
        calc_iv = implied_volatility(price=price, S=S, K=K, T=T, r=r, option_type=option_type)
        print(
            f"K={K:>7.2f} {label}={price:>8.3f} | YahooIV={yahoo_iv:>8.4f}  CalcIV={calc_iv:>8.4f}  Diff={calc_iv - yahoo_iv:+.4f}"
        )


#Compare Yfinance IV with my calculated IV - using mid price as the option price for IV calculation
print('\n\nComparing IV for 50 calls - using mid price \n\n')
compare_iv(first50_calls, "call", use_mid=True)
print('\n\nComparing IV for 50 puts - using mid price \n\n')
compare_iv(first50_puts, "put", use_mid=True)

#Same comparison - using last price as the option price for IV calculation
print('\n\nComparing IV for 50 calls - using last price \n\n')
compare_iv(first50_calls, "call", use_mid=False)
print('\n\nComparing IV for 50 puts - using last price \n\n')
compare_iv(first50_puts, "put", use_mid=False)



Comparing IV for 100 calls - using mid price 


K= 681.00 mid=   8.215 | YahooIV=  0.2012  CalcIV=  0.2091  Diff=+0.0079
K= 682.00 mid=   7.585 | YahooIV=  0.1977  CalcIV=  0.2056  Diff=+0.0079
K= 680.00 mid=   8.880 | YahooIV=  0.2055  CalcIV=  0.2130  Diff=+0.0076
K= 683.00 mid=   6.975 | YahooIV=  0.1942  CalcIV=  0.2021  Diff=+0.0079
K= 679.00 mid=   9.545 | YahooIV=  0.2092  CalcIV=  0.2164  Diff=+0.0072
K= 684.00 mid=   6.385 | YahooIV=  0.1907  CalcIV=  0.1985  Diff=+0.0078
K= 678.00 mid=  10.225 | YahooIV=  0.2124  CalcIV=  0.2198  Diff=+0.0074
K= 685.00 mid=   5.815 | YahooIV=  0.1871  CalcIV=  0.1949  Diff=+0.0078
K= 677.00 mid=  10.925 | YahooIV=  0.2163  CalcIV=  0.2232  Diff=+0.0068
K= 686.00 mid=   5.265 | YahooIV=  0.1834  CalcIV=  0.1912  Diff=+0.0077
K= 676.00 mid=  11.635 | YahooIV=  0.2227  CalcIV=  0.2265  Diff=+0.0037
K= 687.00 mid=   4.740 | YahooIV=  0.1796  CalcIV=  0.1874  Diff=+0.0078
K= 675.00 mid=  12.360 | YahooIV=  0.2260  CalcIV=  0.2297  Diff=+0.0038
K

Differences between my IV and the Yfinance IV likely come from several sources:  
- The current BS implementation assumes a non-dividend-paying stock, but SPY does pay dividends
- Yfinance may compute its IV from the mid price or from the last price
- Time to expiration may be calculated differently (e.g., trading days only, or measured to the market close instead of to midnight UTC as done here)
- Model differences in handling deep ITM or OTM options
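The dividend point can be made concrete with the continuous-dividend-yield form of Black-Scholes, in which the stock term S is discounted by e^(-qT). A minimal sketch, assuming a constant yield q; the 1.2% yield and the other inputs are illustrative placeholders, not SPY's actual figures, and `bs_call_q` is a hypothetical helper.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call_q(S: float, K: float, T: float, r: float, sigma: float, q: float = 0.0) -> float:
    """Black-Scholes call with continuous dividend yield q (q=0 is the no-dividend model)."""
    d1 = (math.log(S / K) + (r - q + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S * math.exp(-q * T) * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)


# Illustrative inputs only; q = 1.2% is a placeholder yield
S, K, r, sigma = 690.0, 685.0, 0.04, 0.20
for T in (0.02, 0.5):
    no_div = bs_call_q(S, K, T, r, sigma, q=0.0)
    with_div = bs_call_q(S, K, T, r, sigma, q=0.012)
    print(f"T={T:<5} no dividend={no_div:8.4f}  q=1.2%={with_div:8.4f}  diff={no_div - with_div:.4f}")
```

Ignoring the yield overprices calls, and the gap grows with T, so the dividend term matters more for the longer expirations on a surface than for a chain about a week from expiry.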
<br>
<br>
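Overall, these are close enough for now: the mid-price call differences shown above stay under 0.01 (one vol point), which suggests the current BS implementation and IV calculations are working. The calculated IV does sit consistently above Yahoo's, which hints at a systematic input difference rather than random noise.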
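"Close enough" can be quantified instead of eyeballed. A minimal sketch, assuming the per-contract results are collected into lists instead of only printed; the helper `summarize_iv_diffs` is hypothetical, and the sample values are the first five mid-price call rows from the output above, as printed (rounded to 4 decimals).

```python
import numpy as np


def summarize_iv_diffs(yahoo_iv, calc_iv) -> dict:
    """Summary statistics of calc_iv - yahoo_iv, skipping pairs where either IV is NaN."""
    yahoo = np.asarray(yahoo_iv, dtype=float)
    calc = np.asarray(calc_iv, dtype=float)
    diff = calc - yahoo
    diff = diff[~np.isnan(diff)]  # a NaN in either input propagates to diff
    return {
        "n": int(diff.size),
        "mean": float(diff.mean()),
        "mean_abs": float(np.abs(diff).mean()),
        "max_abs": float(np.abs(diff).max()),
        "rmse": float(np.sqrt(np.mean(diff**2))),
    }


# First five mid-price call rows from the output above (values as printed)
yahoo = [0.2012, 0.1977, 0.2055, 0.1942, 0.2092]
calc = [0.2091, 0.2056, 0.2130, 0.2021, 0.2164]
stats = summarize_iv_diffs(yahoo, calc)
print({k: round(v, 5) for k, v in stats.items()})
```

When the mean difference equals the mean absolute difference, as in this sample, every difference has the same sign. That pattern suggests a one-sided bias from a systematic input (T, q, or r) more than noise from the root finder.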

# Building an IV Surface