# Black-Scholes Model Empirical Analysis: Data Download

This notebook downloads one month of market data (risk-free rate, daily stock prices, current stock information, and option chains) for the Black-Scholes empirical analysis,
following the methodology of Salami (2024).

## Install Required Packages

```bash
pip install yfinance pandas numpy pandas-datareader matplotlib
```

In [6]:
# Uncomment to install packages
# !pip install yfinance pandas numpy pandas-datareader matplotlib

In [7]:
import yfinance as yf
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
import os
import warnings
warnings.filterwarnings('ignore')

# Create data directory
DATA_DIR = 'data'
os.makedirs(DATA_DIR, exist_ok=True)

print("=" * 60)
print("BLACK-SCHOLES EMPIRICAL ANALYSIS: DATA DOWNLOAD")
print("=" * 60)
print(f"Download Date: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print(f"Data Directory: {os.path.abspath(DATA_DIR)}")

BLACK-SCHOLES EMPIRICAL ANALYSIS: DATA DOWNLOAD
Download Date: 2026-01-27 16:13:13
Data Directory: /Users/dukpagnarith/Desktop/PDE/data


## 1. Define Study Parameters

Following Salami (2024), we select stocks from three sectors:
- **Financial**: BAC, JPM, WFC
- **Healthcare**: MRK, PFE, UNH
- **Technology**: AAPL, MSFT, NVDA

In [8]:
# Stock selection (same as Salami 2024)
TICKERS = {
    'BAC': {'name': 'Bank of America Corporation', 'sector': 'Financial'},
    'JPM': {'name': 'JPMorgan Chase & Co.', 'sector': 'Financial'},
    'WFC': {'name': 'Wells Fargo & Company', 'sector': 'Financial'},
    'MRK': {'name': 'Merck & Company Inc.', 'sector': 'Healthcare'},
    'PFE': {'name': 'Pfizer Inc.', 'sector': 'Healthcare'},
    'UNH': {'name': 'UnitedHealth Group Inc.', 'sector': 'Healthcare'},
    'AAPL': {'name': 'Apple Inc.', 'sector': 'Technology'},
    'MSFT': {'name': 'Microsoft Corporation', 'sector': 'Technology'},
    'NVDA': {'name': 'NVIDIA Corporation', 'sector': 'Technology'}
}

# Time period: 30 calendar days ending today
END_DATE = datetime.now()
START_DATE = END_DATE - timedelta(days=30)

print(f"\nStudy Period: {START_DATE.strftime('%Y-%m-%d')} to {END_DATE.strftime('%Y-%m-%d')}")
print(f"\nSelected Stocks ({len(TICKERS)}):")
for ticker, info in TICKERS.items():
    print(f"  {ticker}: {info['name']} ({info['sector']})")


Study Period: 2025-12-28 to 2026-01-27

Selected Stocks (9):
  BAC: Bank of America Corporation (Financial)
  JPM: JPMorgan Chase & Co. (Financial)
  WFC: Wells Fargo & Company (Financial)
  MRK: Merck & Company Inc. (Healthcare)
  PFE: Pfizer Inc. (Healthcare)
  UNH: UnitedHealth Group Inc. (Healthcare)
  AAPL: Apple Inc. (Technology)
  MSFT: Microsoft Corporation (Technology)
  NVDA: NVIDIA Corporation (Technology)


## 2. Download Risk-Free Rate

In [9]:
def download_risk_free_rate(start_date, end_date):
    """Download 3-month Treasury rate from FRED (as used in Salami 2024)."""
    try:
        import pandas_datareader as pdr
        
        # Download 3-month Treasury rate (DGS3MO) as in the paper
        treasury = pdr.get_data_fred('DGS3MO', start=start_date, end=end_date)
        treasury = treasury.dropna()
        
        # Convert to decimal
        treasury['rate'] = treasury['DGS3MO'] / 100
        treasury = treasury.reset_index()
        treasury.columns = ['date', 'rate_pct', 'rate']
        
        print(f"✓ Downloaded {len(treasury)} days of risk-free rate data")
        print(f"  Latest rate: {treasury['rate'].iloc[-1]:.4f} ({treasury['rate'].iloc[-1]*100:.2f}%)")
        
        return treasury
        
    except Exception as e:
        print(f"✗ Failed to download from FRED: {e}")
        print("  Creating fallback data with 4.5% rate")
        dates = pd.date_range(start=start_date, end=end_date, freq='B')
        return pd.DataFrame({
            'date': dates,
            'rate_pct': 4.5,
            'rate': 0.045
        })

risk_free_df = download_risk_free_rate(START_DATE, END_DATE)
risk_free_df.to_csv(f'{DATA_DIR}/risk_free_rate.csv', index=False)
print(f"  Saved to {DATA_DIR}/risk_free_rate.csv")

✗ Failed to download from FRED: No module named 'pandas_datareader'
  Creating fallback data with 4.5% rate
  Saved to data/risk_free_rate.csv


## 3. Download Stock Price History (1 Month)

In [10]:
def download_stock_history(tickers, start_date, end_date):
    """Download daily stock prices for all tickers."""
    all_data = []
    
    for ticker in tickers:
        try:
            stock = yf.Ticker(ticker)
            hist = stock.history(start=start_date, end=end_date)
            
            if len(hist) > 0:
                hist = hist.reset_index()
                hist['ticker'] = ticker
                hist = hist[['Date', 'ticker', 'Open', 'High', 'Low', 'Close', 'Volume']]
                hist.columns = ['date', 'ticker', 'open', 'high', 'low', 'close', 'volume']
                all_data.append(hist)
                print(f"  ✓ {ticker}: {len(hist)} trading days")
            else:
                print(f"  ✗ {ticker}: No data")
                
        except Exception as e:
            print(f"  ✗ {ticker}: Error - {e}")
    
    return pd.concat(all_data, ignore_index=True) if all_data else None

print("\nDownloading stock price history...")
stock_history = download_stock_history(TICKERS.keys(), START_DATE, END_DATE)

if stock_history is not None:
    stock_history.to_csv(f'{DATA_DIR}/stock_history.csv', index=False)
    print(f"\n✓ Saved {len(stock_history)} records to {DATA_DIR}/stock_history.csv")


Downloading stock price history...
  ✓ BAC: 19 trading days
  ✓ JPM: 19 trading days
  ✓ WFC: 19 trading days
  ✓ MRK: 19 trading days
  ✓ PFE: 19 trading days
  ✓ UNH: 19 trading days
  ✓ AAPL: 19 trading days
  ✓ MSFT: 19 trading days
  ✓ NVDA: 19 trading days

✓ Saved 171 records to data/stock_history.csv
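
The saved close prices are a natural input for a historical volatility estimate, which a Black-Scholes comparison typically needs. The following is a minimal sketch, not part of the original notebook: it assumes the `date`/`ticker`/`close` column layout written above and the common 252-trading-day annualization convention, and the helper name `annualized_volatility` is hypothetical.

```python
import numpy as np
import pandas as pd

TRADING_DAYS_PER_YEAR = 252  # assumed annualization convention

def annualized_volatility(history: pd.DataFrame) -> pd.Series:
    """Return annualized close-to-close log-return volatility per ticker."""
    history = history.sort_values(['ticker', 'date'])
    log_close = np.log(history['close'])
    # Daily log returns, computed within each ticker so series do not mix
    log_returns = log_close.groupby(history['ticker']).diff()
    # Sample standard deviation (ddof=1); the leading NaN per ticker is skipped
    daily_vol = log_returns.groupby(history['ticker']).std()
    return daily_vol * np.sqrt(TRADING_DAYS_PER_YEAR)

# Tiny synthetic example (made-up prices, not downloaded data)
demo = pd.DataFrame({
    'date': pd.to_datetime(['2026-01-05', '2026-01-06', '2026-01-07'] * 2),
    'ticker': ['AAA'] * 3 + ['BBB'] * 3,
    'close': [100.0, 101.0, 100.0, 50.0, 50.0, 50.0],
})
vol = annualized_volatility(demo)
print(vol)
```

With only about 19 trading days per stock, any such estimate is noisy, so a longer history may be preferable if the analysis depends on it.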


## 4. Download Current Stock Information

In [11]:
def download_stock_info(tickers_dict):
    """Download current stock information."""
    stock_info = []
    
    for ticker, meta in tickers_dict.items():
        try:
            stock = yf.Ticker(ticker)
            info = stock.info
            
            spot = info.get('currentPrice') or info.get('regularMarketPrice') or info.get('previousClose')
            div_yield = info.get('dividendYield') or 0
            
            stock_info.append({
                'ticker': ticker,
                'name': meta['name'],
                'sector': meta['sector'],
                'spot_price': spot,
                'dividend_yield': div_yield,
                'market_cap': info.get('marketCap', 0),
                'beta': info.get('beta', 1.0)
            })
            # dividendYield came back in percent in this run (e.g. 2.15 for BAC),
            # so print it unscaled; divide by 100 before using it as a decimal rate
            print(f"  ✓ {ticker}: ${spot:.2f}, Div: {div_yield:.2f}%")
            
        except Exception as e:
            print(f"  ✗ {ticker}: Error - {e}")
    
    return pd.DataFrame(stock_info)

print("\nDownloading current stock information...")
stock_info_df = download_stock_info(TICKERS)
stock_info_df.to_csv(f'{DATA_DIR}/stock_info.csv', index=False)
print(f"\n✓ Saved to {DATA_DIR}/stock_info.csv")
stock_info_df


Downloading current stock information...
  ✓ BAC: $52.02, Div: 2.15%
  ✓ JPM: $301.04, Div: 1.99%
  ✓ WFC: $88.05, Div: 2.04%
  ✓ MRK: $107.40, Div: 3.17%
  ✓ PFE: $25.88, Div: 6.65%
  ✓ UNH: $351.64, Div: 2.51%
  ✓ AAPL: $255.41, Div: 0.41%
  ✓ MSFT: $470.28, Div: 0.77%
  ✓ NVDA: $186.47, Div: 0.02%

✓ Saved to data/stock_info.csv


| | ticker | name | sector | spot_price | dividend_yield | market_cap | beta |
|---|---|---|---|---|---|---|---|
| 0 | BAC | Bank of America Corporation | Financial | 52.02 | 2.15 | 379875819520 | 1.295 |
| 1 | JPM | JPMorgan Chase & Co. | Financial | 301.04 | 1.99 | 819509854208 | 1.066 |
| 2 | WFC | Wells Fargo & Company | Financial | 88.05 | 2.04 | 276396408832 | 1.088 |
| 3 | MRK | Merck & Company Inc. | Healthcare | 107.4 | 3.17 | 268261933056 | 0.298 |
| 4 | PFE | Pfizer Inc. | Healthcare | 25.88 | 6.65 | 147146113024 | 0.433 |
| 5 | UNH | UnitedHealth Group Inc. | Healthcare | 351.64 | 2.51 | 318529110016 | 0.425 |
| 6 | AAPL | Apple Inc. | Technology | 255.41 | 0.41 | 3774028185600 | 1.093 |
| 7 | MSFT | Microsoft Corporation | Technology | 470.28 | 0.77 | 3495669530624 | 1.073 |
| 8 | NVDA | NVIDIA Corporation | Technology | 186.47 | 0.02 | 4539985428480 | 2.314 |
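
The `dividend_yield` values in this table are in percent (2.15 means 2.15%), whereas a continuous-dividend Black-Scholes formula expects a decimal rate. A minimal sketch of the conversion follows, assuming the percent convention seen in this run holds. The units of this field have reportedly differed between yfinance releases, so it is worth checking one known ticker first. The column name `dividend_yield_decimal` is hypothetical.

```python
import pandas as pd

# Three rows copied from the table above; dividend_yield is in percent
info = pd.DataFrame({
    'ticker': ['BAC', 'PFE', 'NVDA'],
    'dividend_yield': [2.15, 6.65, 0.02],
})

# Convert percent to a decimal rate for use as q in a pricing formula
info['dividend_yield_decimal'] = info['dividend_yield'] / 100.0
print(info)
```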


## 5. Download Option Chains

Download option chains for each stock, selecting the first expiration 25 to 60 days out and falling back to the nearest listed expiration if none falls in that window.

In [12]:
def download_all_option_chains(tickers, min_days=25, max_days=60):
    """Download option chains for all tickers."""
    all_calls = []
    all_puts = []
    summary = []
    
    today = datetime.now().date()
    
    for ticker in tickers:
        try:
            stock = yf.Ticker(ticker)
            expirations = stock.options
            
            if not expirations:
                print(f"  ✗ {ticker}: No options available")
                continue
            
            # Find suitable expiration
            selected_exp = None
            for exp in expirations:
                exp_date = datetime.strptime(exp, '%Y-%m-%d').date()
                days = (exp_date - today).days
                if min_days <= days <= max_days:
                    selected_exp = exp
                    break
            
            if not selected_exp:
                selected_exp = expirations[0]
            
            exp_date = datetime.strptime(selected_exp, '%Y-%m-%d').date()
            days_to_exp = (exp_date - today).days
            
            # Get option chain
            chain = stock.option_chain(selected_exp)
            
            # Process calls
            calls = chain.calls.copy()
            calls['ticker'] = ticker
            calls['expiration'] = selected_exp
            calls['days_to_expiry'] = days_to_exp
            calls['T'] = days_to_exp / 365
            all_calls.append(calls)
            
            # Process puts
            puts = chain.puts.copy()
            puts['ticker'] = ticker
            puts['expiration'] = selected_exp
            puts['days_to_expiry'] = days_to_exp
            puts['T'] = days_to_exp / 365
            all_puts.append(puts)
            
            summary.append({
                'ticker': ticker,
                'expiration': selected_exp,
                'days_to_expiry': days_to_exp,
                'T': days_to_exp / 365,
                'num_calls': len(calls),
                'num_puts': len(puts)
            })
            
            print(f"  ✓ {ticker}: Exp {selected_exp} ({days_to_exp}d), {len(calls)} calls, {len(puts)} puts")
            
        except Exception as e:
            print(f"  ✗ {ticker}: Error - {e}")
    
    calls_df = pd.concat(all_calls, ignore_index=True) if all_calls else None
    puts_df = pd.concat(all_puts, ignore_index=True) if all_puts else None
    summary_df = pd.DataFrame(summary)
    
    return calls_df, puts_df, summary_df

print("\nDownloading option chains...")
calls_df, puts_df, options_summary = download_all_option_chains(TICKERS.keys())

if calls_df is not None:
    calls_df.to_csv(f'{DATA_DIR}/options_calls.csv', index=False)
    puts_df.to_csv(f'{DATA_DIR}/options_puts.csv', index=False)
    options_summary.to_csv(f'{DATA_DIR}/options_summary.csv', index=False)
    print(f"\n✓ Saved {len(calls_df)} calls and {len(puts_df)} puts")

options_summary


Downloading option chains...
  ✓ BAC: Exp 2026-02-27 (31d), 25 calls, 18 puts
  ✓ JPM: Exp 2026-02-27 (31d), 24 calls, 24 puts
  ✓ WFC: Exp 2026-02-27 (31d), 27 calls, 21 puts
  ✓ MRK: Exp 2026-02-27 (31d), 24 calls, 18 puts
  ✓ PFE: Exp 2026-02-27 (31d), 20 calls, 14 puts
  ✓ UNH: Exp 2026-02-27 (31d), 36 calls, 33 puts
  ✓ AAPL: Exp 2026-02-27 (31d), 39 calls, 29 puts
  ✓ MSFT: Exp 2026-02-27 (31d), 60 calls, 49 puts
  ✓ NVDA: Exp 2026-02-27 (31d), 53 calls, 39 puts

✓ Saved 308 calls and 245 puts


| | ticker | expiration | days_to_expiry | T | num_calls | num_puts |
|---|---|---|---|---|---|---|
| 0 | BAC | 2026-02-27 | 31 | 0.084932 | 25 | 18 |
| 1 | JPM | 2026-02-27 | 31 | 0.084932 | 24 | 24 |
| 2 | WFC | 2026-02-27 | 31 | 0.084932 | 27 | 21 |
| 3 | MRK | 2026-02-27 | 31 | 0.084932 | 24 | 18 |
| 4 | PFE | 2026-02-27 | 31 | 0.084932 | 20 | 14 |
| 5 | UNH | 2026-02-27 | 31 | 0.084932 | 36 | 33 |
| 6 | AAPL | 2026-02-27 | 31 | 0.084932 | 39 | 29 |
| 7 | MSFT | 2026-02-27 | 31 | 0.084932 | 60 | 49 |
| 8 | NVDA | 2026-02-27 | 31 | 0.084932 | 53 | 39 |
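
The expiration selection used in this section can be isolated into a small pure function, which makes the window and fallback rules testable without network access. This is a sketch rather than the notebook's own code: `pick_expiration` is a hypothetical helper, and it assumes the expiration strings arrive sorted ascending in `YYYY-MM-DD` form.

```python
from datetime import date, datetime

def pick_expiration(expirations, today, min_days=25, max_days=60):
    """Return (expiration, days_to_expiry, T) for the first date in the window.

    Falls back to the first listed expiration when none qualifies.
    """
    if not expirations:
        raise ValueError("no expirations listed")
    chosen = expirations[0]
    for exp in expirations:
        days = (datetime.strptime(exp, '%Y-%m-%d').date() - today).days
        if min_days <= days <= max_days:
            chosen = exp
            break
    days = (datetime.strptime(chosen, '%Y-%m-%d').date() - today).days
    # Calendar days over 365, the same year fraction the notebook stores as T
    return chosen, days, days / 365

# Hypothetical expiration list; only 2026-02-27 appears in the run above
exps = ['2026-01-30', '2026-02-06', '2026-02-27', '2026-03-20']
exp, days, T = pick_expiration(exps, today=date(2026, 1, 27))
print(exp, days, round(T, 6))
```

With the download date of 2026-01-27, this reproduces the 31-day expiry and the `T` of 0.084932 shown in the summary table.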


## 6. Preview Downloaded Data

In [13]:
print("\n" + "=" * 60)
print("STOCK PRICE HISTORY PREVIEW")
print("=" * 60)

# Show last 3 days for each stock
for ticker in TICKERS.keys():
    subset = stock_history[stock_history['ticker'] == ticker].tail(3)
    if len(subset) > 0:
        print(f"\n{ticker}:")
        print(subset[['date', 'close']].to_string(index=False))


STOCK PRICE HISTORY PREVIEW

BAC:
                     date     close
2026-01-22 00:00:00-05:00 52.450001
2026-01-23 00:00:00-05:00 51.720001
2026-01-26 00:00:00-05:00 52.020000

JPM:
                     date      close
2026-01-22 00:00:00-05:00 303.630005
2026-01-23 00:00:00-05:00 297.720001
2026-01-26 00:00:00-05:00 301.040009

WFC:
                     date     close
2026-01-22 00:00:00-05:00 88.040001
2026-01-23 00:00:00-05:00 86.959999
2026-01-26 00:00:00-05:00 88.050003

MRK:
                     date      close
2026-01-22 00:00:00-05:00 109.180000
2026-01-23 00:00:00-05:00 108.180000
2026-01-26 00:00:00-05:00 107.400002

PFE:
                     date     close
2026-01-22 00:00:00-05:00 25.670000
2026-01-23 00:00:00-05:00 25.650000
2026-01-26 00:00:00-05:00 25.879999

UNH:
                     date      close
2026-01-22 00:00:00-05:00 354.470001
2026-01-23 00:00:00-05:00 356.260010
2026-01-26 00:00:00-05:00 351.640015

AAPL:
                     date      close
2026-01-22 00:0

In [14]:
print("\n" + "=" * 60)
print("OPTION CHAIN PREVIEW (AAPL)")
print("=" * 60)

# Show sample call options
aapl_calls = calls_df[calls_df['ticker'] == 'AAPL'][[
    'strike', 'lastPrice', 'bid', 'ask', 'volume', 'impliedVolatility'
]].head(10)
print("\nCall Options:")
print(aapl_calls.to_string(index=False))


OPTION CHAIN PREVIEW (AAPL)

Call Options:
 strike  lastPrice  bid  ask  volume  impliedVolatility
  170.0      76.30  0.0  0.0     NaN            0.00001
  185.0      63.24  0.0  0.0     1.0            0.00001
  190.0      60.50  0.0  0.0     2.0            0.00001
  195.0      65.87  0.0  0.0     NaN            0.00001
  200.0      51.86  0.0  0.0     2.0            0.00001
  205.0      42.30  0.0  0.0    10.0            0.00001
  210.0      42.92  0.0  0.0     3.0            0.00001
  215.0      35.00  0.0  0.0     3.0            0.00001
  220.0      36.85  0.0  0.0    44.0            0.00001
  225.0      32.10  0.0  0.0     3.0            0.00001
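
Every row in this preview has a zero bid and ask and an implied volatility of 0.00001. The notebook does not say why, but this pattern often appears when no live quote was available at download time. A sketch of one way to flag such rows before analysis, using the same column names, is shown here. The helper `usable_quotes`, the `min_iv` threshold, and the third demo row are assumptions for illustration.

```python
import pandas as pd

def usable_quotes(chain: pd.DataFrame, min_iv: float = 1e-4) -> pd.DataFrame:
    """Keep rows with a two-sided quote and a non-placeholder implied volatility."""
    mask = (
        (chain['bid'] > 0)
        & (chain['ask'] >= chain['bid'])
        & (chain['impliedVolatility'] > min_iv)
    )
    out = chain.loc[mask].copy()
    # Mid-quote as a simple market price proxy
    out['mid'] = (out['bid'] + out['ask']) / 2
    return out

demo = pd.DataFrame({
    'strike': [170.0, 185.0, 260.0],
    'bid': [0.0, 0.0, 4.10],    # first two rows mirror the preview above
    'ask': [0.0, 0.0, 4.30],    # third row is made up to show a live quote
    'impliedVolatility': [0.00001, 0.00001, 0.25],
})
clean = usable_quotes(demo)
print(clean)
```

Applied to the preview rows alone, this filter would drop all ten, which suggests re-downloading during market hours or falling back to `lastPrice` with care.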


## 7. Data Summary

In [15]:
print("\n" + "=" * 60)
print("DATA DOWNLOAD COMPLETE")
print("=" * 60)

print(f"\nStudy Period: {START_DATE.strftime('%Y-%m-%d')} to {END_DATE.strftime('%Y-%m-%d')}")
print(f"\nFiles Created in '{DATA_DIR}/' directory:")
for f in sorted(os.listdir(DATA_DIR)):
    size = os.path.getsize(os.path.join(DATA_DIR, f))
    print(f"  • {f} ({size:,} bytes)")

print(f"\nData Summary:")
print(f"  • Stocks: {len(TICKERS)}")
print(f"  • Trading Days: {stock_history['date'].nunique()}")
print(f"  • Call Options: {len(calls_df)}")
print(f"  • Put Options: {len(puts_df)}")
print(f"  • Risk-Free Rate (latest): {risk_free_df['rate'].iloc[-1]*100:.2f}%")

print("\n" + "=" * 60)
print("NEXT STEP: Run the QMD analysis file")
print("=" * 60)


DATA DOWNLOAD COMPLETE

Study Period: 2025-12-28 to 2026-01-27

Files Created in 'data/' directory:
  • options_calls.csv (46,766 bytes)
  • options_puts.csv (36,919 bytes)
  • options_summary.csv (453 bytes)
  • risk_free_rate.csv (833 bytes)
  • stock_history.csv (18,844 bytes)
  • stock_info.csv (654 bytes)

Data Summary:
  • Stocks: 9
  • Trading Days: 19
  • Call Options: 308
  • Put Options: 245
  • Risk-Free Rate (latest): 4.50%

NEXT STEP: Run the QMD analysis file
