# Week 2 · ROAS Modelling Pipeline

This notebook mirrors the production pipeline under `src/pipelines/week2_roas_modeling.py`.
It is intentionally concise so reviewers can inspect intermediate outputs without digging
into exploratory scratch work.

## Prerequisites

```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
python scripts/run_week1_pipeline.py  # ensures data/processed/integrated_data.csv exists
```

In [None]:
from pathlib import Path
import pandas as pd

from src.pipelines.week2_roas_modeling import (
    prepare_daily_features,
    build_feature_matrix,
    time_series_split_masks,
    train_residual_random_forest,
    evaluate_predictions,
)

integrated_path = Path('data/processed/integrated_data.csv')
integrated_path

In [None]:
integrated_df = pd.read_csv(integrated_path)
integrated_df.head()

In [None]:
feature_df = prepare_daily_features(integrated_path)
feature_df.head()

In [None]:
X, y_resid, roas_last = build_feature_matrix(feature_df)
train_mask, test_mask = time_series_split_masks(feature_df['date'])
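
The split above is chronological rather than random, so the test rows always come after the training rows. The sketch below shows one way such masks could be built. It is not the project's `time_series_split_masks`: the function name `chronological_split_masks`, the `test_fraction` parameter, and the choice to split on whole days are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def chronological_split_masks(dates, test_fraction=0.2):
    """Hypothetical stand-in for a time-ordered train/test split.

    Returns two boolean NumPy arrays (train_mask, test_mask) aligned
    positionally with `dates`. The most recent `test_fraction` of the
    distinct days goes to the test set.
    """
    dates = pd.to_datetime(pd.Series(dates)).reset_index(drop=True)
    unique_days = np.sort(dates.unique())
    # Keep at least one day in the test set.
    n_test = max(1, int(round(len(unique_days) * test_fraction)))
    cutoff = unique_days[-n_test]
    test_mask = (dates >= cutoff).to_numpy()
    train_mask = ~test_mask
    return train_mask, test_mask


# Ten consecutive days: the last two land in the test set.
toy_dates = pd.date_range("2024-01-01", periods=10, freq="D")
toy_train_mask, toy_test_mask = chronological_split_masks(toy_dates)
```

Splitting on a date cutoff keeps all rows from the same day on one side of the boundary, which avoids leaking same-day information into training. Boolean NumPy arrays are a convenient return type because `.iloc` accepts them directly.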

In [None]:
X_train, X_test = X.iloc[train_mask], X.iloc[test_mask]
y_train, y_test = y_resid.iloc[train_mask], y_resid.iloc[test_mask]
roas_test_actual = feature_df['roas'].iloc[test_mask]  # positional, like the other selections
roas_test_last = roas_last.iloc[test_mask]

In [None]:
model = train_residual_random_forest(X_train, y_train)
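
For readers who do not have the pipeline source open, here is a rough sketch of what fitting a random forest on a residual target can look like. It assumes scikit-learn's `RandomForestRegressor`; the helper name `fit_residual_forest`, the hyperparameters, and the toy feature names (`spend_lag1`, `clicks_lag1`) are hypothetical and are not taken from `train_residual_random_forest`.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor


def fit_residual_forest(X_train, y_resid_train, n_estimators=200, random_state=42):
    """Hypothetical helper: fit a forest to predict (roas - roas_last)."""
    model = RandomForestRegressor(n_estimators=n_estimators, random_state=random_state)
    model.fit(X_train, y_resid_train)
    return model


# Synthetic data, purely illustrative.
rng = np.random.default_rng(0)
toy_X = pd.DataFrame({
    "spend_lag1": rng.uniform(50, 150, 60),
    "clicks_lag1": rng.integers(100, 500, 60),
})
toy_roas_last = pd.Series(rng.uniform(1.5, 3.0, 60))
toy_roas = toy_roas_last + rng.normal(0, 0.1, 60)
toy_resid = toy_roas - toy_roas_last  # the residual target

# Train on the first 48 rows, predict the last 12 (chronological order).
toy_model = fit_residual_forest(toy_X.iloc[:48], toy_resid.iloc[:48], n_estimators=50)
toy_roas_pred = toy_roas_last.iloc[48:] + toy_model.predict(toy_X.iloc[48:])
```

Modelling the residual instead of raw ROAS means the forest only has to learn the correction to the previous value, so a model that predicts zero falls back to the baseline.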

In [None]:
test_pred_resid = model.predict(X_test)
roas_pred = roas_test_last + test_pred_resid
metrics = evaluate_predictions(roas_test_actual, roas_pred)
metrics
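
The cell above adds the predicted residual back onto the last observed ROAS before scoring. The sketch below illustrates that reconstruction and a side-by-side comparison with the baseline on toy numbers. It assumes the baseline is the unadjusted `roas_last` value and that the metrics of interest are MAE and RMSE; the helper `mae_rmse` is hypothetical and may differ from what `evaluate_predictions` returns.

```python
import numpy as np


def mae_rmse(actual, predicted):
    """Hypothetical helper: mean absolute error and root mean squared error."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    errors = predicted - actual
    return {
        "mae": float(np.mean(np.abs(errors))),
        "rmse": float(np.sqrt(np.mean(errors ** 2))),
    }


# Toy numbers, purely illustrative; not results from the pipeline.
toy_actual = np.array([2.0, 2.4, 2.2, 2.6])
toy_last = np.array([1.9, 2.0, 2.4, 2.2])         # baseline: carry the last value forward
toy_resid_pred = np.array([0.05, 0.3, -0.1, 0.3])  # model's predicted correction
toy_model_pred = toy_last + toy_resid_pred          # reconstructed ROAS forecast

baseline_metrics = mae_rmse(toy_actual, toy_last)
model_metrics = mae_rmse(toy_actual, toy_model_pred)
```

Scoring both forecasts with the same function on the same test rows is what makes the comparison meaningful; the residual model is only useful if both metrics drop below the baseline values.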

The residual model improves both MAE and RMSE relative to the rolling baseline.
For the full breakdown produced by the CLI pipeline, see
`output/reports/random_forest_roas_metrics.json`.