# Neural-network variant (MLP): Earnings forecast replication

This notebook implements a simple neural-network (MLP) benchmark designed to be a **clean comparison**
to the paper's **random-forest** earnings forecasts.

Key design choice: **match the rolling-window training / testing protocol** used in `02_EarningsForecasts.ipynb`:
for each month `t`, train on the previous 12 months (subject to the same announcement-date filters) and predict the cross-section at `t`.
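The split for a single forecast month can be sketched as follows. This is a minimal illustration, not the code in `functions.nn_forecast`: the helper name `rolling_split`, the toy panel, the `firm_id` column, and the exact form of the announcement-date filter (keep only training rows whose target was announced on or before `t`) are assumptions made for the example.

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd


def rolling_split(panel, t, ann_col, window=12, month_col="YearMonth"):
    """Return (train, test) frames for forecast month t (hypothetical helper)."""
    t = pd.Timestamp(t) + MonthEnd(0)
    start = t - MonthEnd(window)  # first month-end inside the training window
    in_window = (panel[month_col] >= start) & (panel[month_col] < t)
    # Assumed filter: the realized target must already be announced by t,
    # otherwise training on it would leak future information.
    realized = panel[ann_col] <= t
    train = panel[in_window & realized]
    test = panel[panel[month_col] == t]
    return train, test


# Toy panel: two firms, 14 month-ends from 1985-01-31 to 1986-02-28.
months = [pd.Timestamp("1984-12-31") + MonthEnd(i) for i in range(1, 15)]
toy = pd.DataFrame({
    "firm_id": [1, 2] * len(months),
    "YearMonth": [m for m in months for _ in range(2)],
})
# Assume the one-quarter-ahead result is announced three month-ends later.
toy["ANNDATS_q1"] = [m + MonthEnd(3) for m in toy["YearMonth"]]

train, test = rolling_split(toy, "1986-01-31", ann_col="ANNDATS_q1")
print(train["YearMonth"].min().date(), train["YearMonth"].max().date(), len(test))
```

In this toy setup the last two months of the window drop out of the training set because their targets are not yet announced at `t`, which is the kind of effect the announcement-date filter is meant to have.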

Outputs:
- `../data/Results/NN_wo_lookahead_raw.parquet` for `mode = "woLAB"` (or `NN_with_lookahead_raw.parquet` for `mode = "wLAB"`)
- optional: `../data/Results/NN_vs_RF_MSE_{mode}.csv`, an MSE comparison against the RF baseline.

Run after:
1. `01_Preprocess.ipynb` (creates `df_train_new.parquet`)
2. `02_EarningsForecasts.ipynb` (creates RF benchmark parquets, if you want comparisons)


In [None]:
import pandas as pd
from pandas.tseries.offsets import MonthEnd

from functions.nn_forecast import MLPSpec, MLPParams, run_mlp_forecasts, summarize_mse_comparison

In [None]:
# Load the preprocessed panel
df = pd.read_parquet('../data/Results/df_train_new.parquet')
df['YearMonth'] = pd.to_datetime(df['YearMonth']) + MonthEnd(0)
df.shape

## Build feature specs (match RF notebook)

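Set `mode = "woLAB"` for the look-ahead-bias-free specification (recommended). In this mode the q2, q3, and y2 specs take their lagged realized EPS from `EPS_true_l1_q1` or `EPS_true_l1_y1` rather than from their own-horizon lag columns; `"wLAB"` keeps the own-horizon lags.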


In [None]:
mode = "woLAB"  # "wLAB" or "woLAB"

ratio_chars = ['CAPEI', 'bm', 'evm', 'pe_exi', 'pe_inc', 'ps', 'pcf', 'dpr', 'npm', 'opmbd', 'opmad', 'gpm', 'ptpm', 'cfm', 'roa', 'roe', 'roce', 'efftax', 'aftret_eq', 'aftret_invcapx', 'aftret_equity', 'pretret_noa', 'pretret_earnat', 'GProf', 'equity_invcap', 'debt_invcap', 'totdebt_invcap', 'capital_ratio', 'int_debt', 'int_totdebt', 'cash_lt', 'invt_act', 'rect_act', 'debt_at', 'debt_ebitda', 'short_debt', 'curr_debt', 'lt_debt', 'profit_lct', 'ocf_lct', 'cash_debt', 'fcf_ocf', 'lt_ppent', 'dltt_be', 'debt_assets', 'debt_capital', 'de_ratio', 'intcov', 'intcov_ratio', 'cash_ratio', 'quick_ratio', 'curr_ratio', 'cash_conversion', 'inv_turn', 'at_turn', 'rect_turn', 'pay_turn', 'sale_invcap', 'sale_equity', 'sale_nwc', 'rd_sale', 'adv_sale', 'staff_sale', 'accrual', 'ptb', 'PEG_trailing', 'divyield']
macro = ["RGDP","RCON","INDPROD","UNEMP"]

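def lag(col_wlab: str, col_wolab: str) -> str:
    # Pick the lagged-EPS column for the current `mode`:
    # the own-horizon lag under "wLAB", the q1/y1 lag under "woLAB".
    return col_wlab if mode == "wLAB" else col_wolab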

specs = [
    MLPSpec("q1","EPS_true_q1","EPS_ana_q1","ANNDATS_q1",
            ratio_chars + ["ret","prc","EPS_true_l1_q1","EPS_ana_q1"] + macro),
    MLPSpec("q2","EPS_true_q2","EPS_ana_q2","ANNDATS_q2",
            ratio_chars + ["ret","prc",lag("EPS_true_l1_q2","EPS_true_l1_q1"),"EPS_ana_q2"] + macro),
    MLPSpec("q3","EPS_true_q3","EPS_ana_q3","ANNDATS_q3",
            ratio_chars + ["ret","prc",lag("EPS_true_l1_q3","EPS_true_l1_q1"),"EPS_ana_q3"] + macro),
    MLPSpec("y1","EPS_true_y1","EPS_ana_y1","ANNDATS_y1",
            ratio_chars + ["ret","prc","EPS_true_l1_y1","EPS_ana_y1"] + macro),
    MLPSpec("y2","EPS_true_y2","EPS_ana_y2","ANNDATS_y2",
            ratio_chars + ["ret","prc",lag("EPS_true_l1_y2","EPS_true_l1_y1"),"EPS_ana_y2"] + macro),
]
len(specs), specs[0]

## Run rolling MLP forecasts

Hyperparameters are intentionally simple:
- three hidden layers with 64, 32, and 16 units
- weight decay via `alpha` (set to `1e-4`)
- early stopping on a held-out validation split inside each training window, with training capped at `max_iter=200`
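The field names in `MLPParams` mirror the arguments of scikit-learn's `MLPRegressor`. Assuming the helper wraps that estimator (the source of `functions.nn_forecast` is not shown in this notebook), the fit for one training window might look like the sketch below. The synthetic data, the `StandardScaler` step, and the explicit `validation_fraction` and `n_iter_no_change` values (scikit-learn's defaults) are assumptions added for illustration.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for one 12-month training window and one test cross-section.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 8))
y_train = X_train @ rng.normal(size=8) + 0.1 * rng.normal(size=500)
X_test = rng.normal(size=(50, 8))

model = make_pipeline(
    StandardScaler(),  # assumed preprocessing; MLPs are sensitive to feature scale
    MLPRegressor(
        hidden_layer_sizes=(64, 32, 16),
        alpha=1e-4,               # L2 penalty strength in scikit-learn
        max_iter=200,
        early_stopping=True,      # hold out part of the training window
        validation_fraction=0.1,  # scikit-learn default, written out explicitly
        n_iter_no_change=10,      # scikit-learn default patience
        random_state=0,
    ),
)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

mlp = model.named_steps["mlpregressor"]
print("epochs run:", mlp.n_iter_, "best validation R^2:", mlp.best_validation_score_)
```

With `early_stopping=True`, scikit-learn scores the held-out split after each epoch and stops once the score has failed to improve for `n_iter_no_change` consecutive epochs, so `max_iter` acts as an upper bound rather than a fixed epoch count.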


In [None]:
params = MLPParams(hidden_layer_sizes=(64, 32, 16), alpha=1e-4, max_iter=200,
                   early_stopping=True, random_state=0)

nn_forecast = run_mlp_forecasts(df=df, specs=specs, params=params,
                                train_window_months=12,
                                start_date="1986-01-31",
                                verbose_every=12)
nn_forecast.head()

In [None]:
# Save forecasts
out_path = '../data/Results/NN_with_lookahead_raw.parquet' if mode=='wLAB' else '../data/Results/NN_wo_lookahead_raw.parquet'
nn_forecast.to_parquet(out_path, index=False)
out_path

## Evaluate vs Random Forest benchmark (optional)

Requires that you have already run `02_EarningsForecasts.ipynb` to create the RF benchmark parquet.
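For orientation, a comparison of this kind typically aligns the two forecast sets on the same firm-month observations before computing squared errors. The sketch below shows one way to do that; it is not the implementation of `summarize_mse_comparison`, and the helper `mse_table` and the column names `firm_id`, `RF_q1`, and `NN_q1` are hypothetical.

```python
import pandas as pd


def mse_table(rf, nn, keys, truth, rf_col, nn_col):
    """MSE of both forecasts on their common observations (hypothetical helper)."""
    merged = rf[keys + [truth, rf_col]].merge(nn[keys + [nn_col]], on=keys, how="inner")
    merged = merged.dropna(subset=[truth, rf_col, nn_col])
    mse_rf = ((merged[rf_col] - merged[truth]) ** 2).mean()
    mse_nn = ((merged[nn_col] - merged[truth]) ** 2).mean()
    return pd.DataFrame({
        "n_obs": [len(merged)],
        "mse_rf": [mse_rf],
        "mse_nn": [mse_nn],
        "nn_over_rf": [mse_nn / mse_rf],  # below 1 means the MLP has lower MSE
    })


# Toy inputs: the RF frame covers three firms, the NN frame only two of them.
month = pd.Timestamp("1986-01-31")
rf_toy = pd.DataFrame({
    "firm_id": [1, 2, 3],
    "YearMonth": [month] * 3,
    "EPS_true_q1": [1.0, 2.0, 3.0],
    "RF_q1": [1.5, 2.0, 2.0],
})
nn_toy = pd.DataFrame({
    "firm_id": [1, 2],
    "YearMonth": [month] * 2,
    "NN_q1": [1.0, 3.0],
})

toy_summary = mse_table(rf_toy, nn_toy, ["firm_id", "YearMonth"],
                        "EPS_true_q1", "RF_q1", "NN_q1")
print(toy_summary)
```

The inner join restricts both models to the same observations, so a difference in MSE cannot come from one model simply covering an easier or harder subsample.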


In [None]:
rf_path = '../data/Results/RF_with_lookahead_raw_005.parquet' if mode=='wLAB' else '../data/Results/RF_wo_lookahead_raw_005.parquet'
rf = pd.read_parquet(rf_path)

summary = summarize_mse_comparison(rf_df=rf, nn_df=nn_forecast)
summary

In [None]:
# Save summary
out_csv = f'../data/Results/NN_vs_RF_MSE_{mode}.csv'
summary.to_csv(out_csv, index=False)
out_csv