# Earnings Regression — data preparation

This notebook combines event-study abnormal returns, analyst EPS surprises, and regime data into a single panel for regression analysis. The output has the columns `Ticker`, `Earnings Date`, `Surprise`, `CAR`, `Regime`, `VIX`, and `Δ10Y`, and is capped at 250 rows (the first 250 matched earnings announcements, ordered by date and ticker).

In [1]:
# Imports
import pandas as pd
import numpy as np


In [16]:
# Load datasets (adjust paths if needed)
# infer_datetime_format is deprecated in pandas 2.x, so it is omitted here.
av = pd.read_csv('Data/av_eps_quarterly.csv', dayfirst=True, parse_dates=['fiscalDateEnding','reportedDate'])
event = pd.read_csv('Data/event_study_abnormal_returns_panel123.csv', parse_dates=['date'])
# The regime dates are converted later with pd.to_datetime
regime = pd.read_csv('Data/regime_data.csv')
# Quick checks
print('av rows', len(av))
print('event rows', len(event))
print('regime rows', len(regime))

av rows 250
event rows 1714
regime rows 5504



In [18]:
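# Sum abnormal returns inside the [-5, +5] window per ticker and date (intended as CAR per announcement).
# Caveat: `date` is the trading date, so each group holds a single row (1714 rows in, 1714 rows out)
# and `CAR` here equals that day's abnormal return, not a multi-day cumulative sum.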
car = (event[event['event_day'].between(-5, 5)]
       .groupby(['ticker','date'], as_index=False)
       ['abnormal_return']
       .sum()
       .rename(columns={'ticker':'Ticker','date':'Earnings Date','abnormal_return':'CAR'})
       )
# Normalize types
car['Earnings Date'] = pd.to_datetime(car['Earnings Date'], errors='coerce')
print(car)

     Ticker Earnings Date       CAR
0      AAPL    2005-07-08 -0.019242
1      AAPL    2005-07-11 -0.025196
2      AAPL    2005-07-12 -0.003679
3      AAPL    2005-07-13  0.001773
4      AAPL    2005-07-14  0.052472
...     ...           ...       ...
1709   NVDA    2025-11-18 -0.011270
1710   NVDA    2025-11-19  0.019285
1711   NVDA    2025-11-20 -0.001195
1712   NVDA    2025-11-21 -0.022863
1713   NVDA    2025-11-24 -0.018861

[1714 rows x 3 columns]
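
The output above has one row per ticker and trading day, so the values are daily abnormal returns. A minimal sketch of summing them per announcement instead is shown here. It assumes that rows of one event window are contiguous in date order within a ticker, that windows do not overlap, and that each window contains an `event_day == 0` row. The helper `car_per_event`, the `event_id` column, and the toy data are hypothetical and not part of the notebook.

```python
import pandas as pd

def car_per_event(event: pd.DataFrame, window: int = 5) -> pd.DataFrame:
    """Sum abnormal returns over [-window, +window] for each event (sketch)."""
    ev = event.sort_values(['ticker', 'date']).copy()
    # Assumption: a new event window starts whenever event_day fails to
    # increase relative to the previous row of the same ticker.
    prev = ev.groupby('ticker')['event_day'].shift()
    ev['event_id'] = (prev.isna() | (ev['event_day'] <= prev)).cumsum()
    # The announcement date is the date of the event_day == 0 row.
    day0 = (ev.loc[ev['event_day'] == 0, ['event_id', 'date']]
              .rename(columns={'date': 'Earnings Date'}))
    in_window = ev[ev['event_day'].between(-window, window)]
    car = (in_window.groupby(['ticker', 'event_id'], as_index=False)['abnormal_return']
                    .sum()
                    .rename(columns={'ticker': 'Ticker', 'abnormal_return': 'CAR'}))
    # Events without a day-0 row are dropped by the inner join.
    return car.merge(day0, on='event_id', how='inner')[['Ticker', 'Earnings Date', 'CAR']]

# Hypothetical toy panel with two events for one ticker.
toy = pd.DataFrame({
    'ticker': ['AAA'] * 6,
    'date': pd.to_datetime(['2020-01-02', '2020-01-03', '2020-01-06',
                            '2020-04-01', '2020-04-02', '2020-04-03']),
    'event_day': [-1, 0, 1, -1, 0, 1],
    'abnormal_return': [0.01, 0.02, -0.005, 0.0, -0.03, 0.01],
})
toy_car = car_per_event(toy, window=1)
print(toy_car)
```

The result is keyed by `Ticker` and `Earnings Date`, so it could be merged with the surprise data in the same way as `car`. Whether the reset-based event detection is valid depends on how the event file was built, so it is worth checking against a few known announcements first.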


In [19]:
# Prepare analyst/earnings data: pick ticker, reported date and surprise
av_small = av.rename(columns={'symbol':'Ticker','reportedDate':'Earnings Date','surprise':'Surprise'})[[ 'Ticker','Earnings Date','Surprise' ]].copy()
av_small['Earnings Date'] = pd.to_datetime(av_small['Earnings Date'], dayfirst=True, errors='coerce')

In [20]:
# Merge earnings surprises with CAR on Ticker + Earnings Date (inner join to keep matched events)
merged = pd.merge(av_small, car, on=['Ticker','Earnings Date'], how='inner')
print('merged rows', len(merged))

merged rows 248
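
The surprise file has 250 rows and the inner join keeps 248, so two announcements found no matching `Ticker` and `Earnings Date` pair. One way to see which ones is a left join with `indicator=True`. The sketch below uses hypothetical toy frames (`surprises`, `returns`) in place of `av_small` and `car`.

```python
import pandas as pd

# Hypothetical stand-ins for av_small and car.
surprises = pd.DataFrame({
    'Ticker': ['AAA', 'AAA', 'BBB'],
    'Earnings Date': pd.to_datetime(['2020-01-03', '2020-04-02', '2020-01-10']),
    'Surprise': [0.01, -0.02, 0.00],
})
returns = pd.DataFrame({
    'Ticker': ['AAA', 'BBB'],
    'Earnings Date': pd.to_datetime(['2020-01-03', '2020-01-10']),
    'CAR': [0.025, -0.004],
})

# indicator=True adds a `_merge` column with the values
# 'both', 'left_only', or 'right_only'.
check = pd.merge(surprises, returns, on=['Ticker', 'Earnings Date'],
                 how='left', indicator=True)
unmatched = check.loc[check['_merge'] == 'left_only', ['Ticker', 'Earnings Date']]
print(unmatched)
```

Running the same pattern on `av_small` and `car` would list the two unmatched announcements, which helps tell a date-parsing problem apart from a genuine gap in the event file.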


In [21]:
# Prepare regime data: rename known columns and keep Regime, VIX and Δ10Y
print('regime columns:', regime.columns.tolist())
r = regime.copy()
# Explicit renames based on file columns
r = r.rename(columns={'date':'Earnings Date', 'regime':'Regime', 'VIXCLS':'VIX', 'DGS10_3m_pct_change':'Δ10Y'})
# Keep only the columns we need (if present)
keep = [c for c in ['Earnings Date','Regime','VIX','Δ10Y'] if c in r.columns]
r_small = r[keep].copy()
r_small['Earnings Date'] = pd.to_datetime(r_small['Earnings Date'], errors='coerce')

regime columns: ['date', 'DGS10', 'DGS10_3m_pct_change', 'VIXCLS', 'regime']
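
A join on `Earnings Date` alone multiplies panel rows if the regime table contains a date more than once. As a hedged sketch, `validate='many_to_one'` in `pd.merge` makes that assumption explicit by raising `MergeError` when the right-hand keys are not unique. The frames below are hypothetical toy data, and dropping duplicates is only one possible remedy, not the notebook's method.

```python
import pandas as pd

# Hypothetical toy frames; column names follow the notebook's renamed columns.
events = pd.DataFrame({
    'Ticker': ['AAA', 'BBB'],
    'Earnings Date': pd.to_datetime(['2020-01-03', '2020-01-03']),
})
regimes = pd.DataFrame({
    'Earnings Date': pd.to_datetime(['2020-01-03', '2020-01-03', '2020-01-06']),
    'Regime': [0, 0, 1],
    'VIX': [14.0, 14.0, 13.9],
})

# validate='many_to_one' raises MergeError when right-hand keys repeat.
try:
    pd.merge(events, regimes, on='Earnings Date', how='left', validate='many_to_one')
    duplicate_dates_detected = False
except pd.errors.MergeError:
    duplicate_dates_detected = True

# Possible remedy (an assumption): keep the first row per date, then join.
regimes_unique = regimes.drop_duplicates(subset='Earnings Date', keep='first')
joined = pd.merge(events, regimes_unique, on='Earnings Date',
                  how='left', validate='many_to_one')
print(duplicate_dates_detected, joined.shape)
```

With unique regime dates the left join leaves the row count of the left table unchanged, which is a quick check to run after the merge.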


In [26]:
# Merge regime data into the main merged table (left join on Earnings Date)
final = pd.merge(merged, r_small, on='Earnings Date', how='left')
cols = ['Ticker','Earnings Date','Surprise','CAR','Regime','VIX','Δ10Y']
available = [c for c in cols if c in final.columns]
panel = final[available].dropna(subset=['Ticker','Earnings Date'])
panel_250 = panel.sort_values(['Earnings Date','Ticker']).head(250).reset_index(drop=True)
print('final panel shape:', panel.shape)
print('panel_250 shape:', panel_250.shape)
# Save the panel, then display it ordered by ticker for inspection
panel_250.to_csv('Data/earnings_regression_panel.csv', index=False)
print('Saved Data/earnings_regression_panel.csv')
panel_250.sort_values(['Ticker', 'Earnings Date'])

final panel shape: (248, 7)
panel_250 shape: (248, 7)
Saved Data/earnings_regression_panel.csv


     Ticker  Earnings Date  Surprise        CAR  Regime    VIX       Δ10Y
2      AAPL     2005-07-13      0.00   0.001773       0  10.84  -0.047945
5      AAPL     2005-10-11      0.00   0.042294       0  15.63   0.057831
8      AAPL     2006-01-18      0.00  -0.015901       0  12.25  -0.029083
11     AAPL     2006-04-19      0.00  -0.017639       0  11.32   0.161290
14     AAPL     2006-07-19      0.00  -0.003753       0  15.55   0.003968
...     ...            ...       ...        ...     ...    ...        ...
235    NVDA     2024-11-20      0.06  -0.005471       0  17.16   0.163588
238    NVDA     2025-02-26      0.04   0.032805       0  19.10   0.000000
241    NVDA     2025-05-28      0.06   0.004363       0  19.31   0.051765
244    NVDA     2025-08-27      0.04  -0.005637       0  14.85  -0.051454


Notes:
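- `CAR` is meant to be the cumulative abnormal return over the event window [-5, +5]. As written, the groupby keys are `ticker` and the trading `date`, and the CAR table has as many rows as the event file (1714), so each `CAR` value is a single day's abnormal return. After the merge on `Earnings Date`, it is the abnormal return on the announcement date. To obtain a multi-day CAR, group by an event identifier rather than by trading date, and change the window there if you prefer a different definition.
- The regime cell renames `date`, `regime`, `VIXCLS`, and `DGS10_3m_pct_change` explicitly. If your `regime_data.csv` uses different column names, adjust the rename mapping in that cell. Columns that are not found are silently left out of the panel.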
- The notebook saves the prepared panel to `Data/earnings_regression_panel.csv`.
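
CSV does not store dtypes, so `Earnings Date` comes back as plain text unless it is parsed on load. The following minimal sketch round-trips a hypothetical one-row panel through an in-memory buffer instead of the real file to show the `parse_dates` argument a downstream regression notebook would likely need.

```python
import io
import pandas as pd

# Hypothetical one-row panel with the same columns as the saved file.
panel = pd.DataFrame({
    'Ticker': ['AAA'],
    'Earnings Date': pd.to_datetime(['2020-01-03']),
    'Surprise': [0.01],
    'CAR': [0.025],
    'Regime': [0],
    'VIX': [14.0],
    'Δ10Y': [-0.05],
})

# Write to an in-memory buffer instead of Data/earnings_regression_panel.csv.
buffer = io.StringIO()
panel.to_csv(buffer, index=False)
buffer.seek(0)

# parse_dates restores the datetime dtype that CSV does not preserve.
reloaded = pd.read_csv(buffer, parse_dates=['Earnings Date'])
print(reloaded.dtypes)
```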