# Stage 09 — Homework Starter Notebook

In the lecture, we learned how to create engineered features. Now it’s your turn to apply those ideas to your own project data.

In [2]:
import pandas as pd
import numpy as np

# Load the project dataset (monthly VIG, CPI, unemployment, and yield-spread series)
df = pd.read_csv('../raw_site-project_table-data_20250828-153211.csv')
df

Unnamed: 0,date,VIG,CPI,UNEMPLOYMENT,SPREAD
0,2007-01-01,54.29500,202.416,4.6,-0.12
1,2007-02-01,55.10000,203.499,4.5,-0.13
2,2007-03-01,53.49500,205.352,4.4,-0.01
3,2007-04-01,55.25925,206.686,4.5,0.02
4,2007-05-01,56.77005,207.949,4.4,-0.02
...,...,...,...,...,...
218,2025-03-01,195.09000,319.799,4.2,0.31
219,2025-04-01,186.02000,320.795,4.2,0.50
220,2025-05-01,195.60000,321.465,4.2,0.50
221,2025-06-01,200.73500,322.561,4.1,0.49
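The `Unnamed: 0` column in the output above is what pandas typically produces when a CSV was written with its row index and then read back without `index_col`. Below is a minimal sketch of one way to load such a file, assuming the first column really is that saved index and that `date` should be parsed as a datetime. The inline sample copies the first rows shown above so the snippet runs on its own; `df_sample` is an illustrative name, not part of the starter code.

```python
import io
import pandas as pd

# First rows of the table above, inlined so the example is self-contained.
# With the real file, pass its path to read_csv instead of the StringIO object.
sample_csv = io.StringIO(
    ",date,VIG,CPI,UNEMPLOYMENT,SPREAD\n"
    "0,2007-01-01,54.295,202.416,4.6,-0.12\n"
    "1,2007-02-01,55.100,203.499,4.5,-0.13\n"
    "2,2007-03-01,53.495,205.352,4.4,-0.01\n"
)

# index_col=0 treats the unnamed first column as the row index;
# parse_dates converts the date strings to datetime64 values.
df_sample = pd.read_csv(sample_csv, index_col=0, parse_dates=["date"])

# Sorting by date makes later shift()/diff() calls refer to the previous month.
df_sample = df_sample.sort_values("date").reset_index(drop=True)

print(df_sample.dtypes)
print(list(df_sample.columns))
```

Loading the data this way keeps the stray index column out of any later feature matrix.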


## TODO: Implement at least 2 engineered features here

In [3]:
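# Example template:
# df['spend_income_ratio'] = df['monthly_spend'] / df['income']  # TODO: Your feature

# Previous month's yield spread; the first row has no predecessor and is filled with 0
df['spread_L1'] = df['SPREAD'].shift(1).fillna(0)
# Month-over-month fractional change in CPI (0.005 means 0.5%); first row filled with 0
df['CPI_change'] = df['CPI'].pct_change().fillna(0)
# Month-over-month change in the unemployment rate, in percentage points; first row filled with 0
df['UNEMP_change'] = df['UNEMPLOYMENT'].diff().fillna(0)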
df


Unnamed: 0,date,VIG,CPI,UNEMPLOYMENT,SPREAD,spread_L1,CPI_change,UNEMP_change
0,2007-01-01,54.29500,202.416,4.6,-0.12,0.00,0.000000,0.0
1,2007-02-01,55.10000,203.499,4.5,-0.13,-0.12,0.005350,-0.1
2,2007-03-01,53.49500,205.352,4.4,-0.01,-0.13,0.009106,-0.1
3,2007-04-01,55.25925,206.686,4.5,0.02,-0.01,0.006496,0.1
4,2007-05-01,56.77005,207.949,4.4,-0.02,0.02,0.006111,-0.1
...,...,...,...,...,...,...,...,...
218,2025-03-01,195.09000,319.799,4.2,0.31,0.24,0.002247,0.1
219,2025-04-01,186.02000,320.795,4.2,0.50,0.31,0.003114,0.0
220,2025-05-01,195.60000,321.465,4.2,0.50,0.50,0.002089,0.0
221,2025-06-01,200.73500,322.561,4.1,0.49,0.50,0.003409,-0.1
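Filling the first row with 0 makes it look as if the spread was exactly 0 and CPI and unemployment were unchanged in the month before the sample starts, which the data does not actually say. A possible alternative, sketched here on the first five rows shown above, is to leave the undefined first values as `NaN` and drop that row before modeling. The names `df_head` and `features` are illustrative, not part of the starter code.

```python
import pandas as pd

# First five rows of the dataset, copied from the output above.
df_head = pd.DataFrame({
    "date": pd.to_datetime(["2007-01-01", "2007-02-01", "2007-03-01",
                            "2007-04-01", "2007-05-01"]),
    "CPI": [202.416, 203.499, 205.352, 206.686, 207.949],
    "UNEMPLOYMENT": [4.6, 4.5, 4.4, 4.5, 4.4],
    "SPREAD": [-0.12, -0.13, -0.01, 0.02, -0.02],
})

features = df_head.copy()
# Same three features as in the cell above, but without fillna(0).
features["spread_L1"] = features["SPREAD"].shift(1)
features["CPI_change"] = features["CPI"].pct_change()
features["UNEMP_change"] = features["UNEMPLOYMENT"].diff()

# Only the first row contains NaN (no previous month exists), so drop it.
features = features.dropna().reset_index(drop=True)
print(features.round(6))
```

This costs one observation but avoids feeding the model a made-up starting value.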


### Feature Engineering Rationale

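- **Lagged Yield Spread (`spread_L1`)**  
  The slope of the yield curve (10Y–2Y spread) is widely viewed as a leading indicator of economic conditions and equity performance.  
  Using the **lagged value** (`shift(1)`, the previous month's spread) means this feature draws only on information available at time *t-1* when predicting VIG returns at time *t*, which avoids look-ahead bias. This relies on the rows being sorted by date. The first row has no prior month, so the code fills it with 0.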

- **CPI Change (`CPI_change`)**  
  Instead of the absolute CPI level, the **percentage change** (returned by `pct_change()` as a fraction, so 0.005 means 0.5%) captures inflation shocks, which feed into interest rates, discount factors, and equity valuations.  
  A sudden rise in CPI can signal tighter monetary policy ahead, which often pressures dividend-focused equity funds such as VIG.

- **Unemployment Change (`UNEMP_change`)**  
  For this purpose, the level of unemployment is likely less informative than its **month-to-month change** (in percentage points), which reflects momentum in the labor market.  
  Rising unemployment usually signals weakening economic conditions, while falling unemployment supports consumption and corporate earnings.  
  Including this difference highlights the *direction* of labor market stress as a potential driver of VIG performance.
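
The look-ahead argument above can be made concrete. In the feature cell, only `spread_L1` is lagged; `CPI_change` and `UNEMP_change` are measured in the same month as the VIG value they sit next to. Whether that amounts to look-ahead depends on when those figures are published relative to the prediction date, which this notebook does not state. Below is a minimal sketch, assuming the target is the month-over-month percentage change in VIG and that every predictor should come from the previous month. The names `model_df`, `VIG_ret`, `CPI_change_L1`, and `UNEMP_change_L1` are hypothetical, and the data is the first five rows shown earlier.

```python
import pandas as pd

# First five rows of the dataset, copied from the notebook output.
df_head = pd.DataFrame({
    "date": pd.to_datetime(["2007-01-01", "2007-02-01", "2007-03-01",
                            "2007-04-01", "2007-05-01"]),
    "VIG": [54.295, 55.100, 53.495, 55.25925, 56.77005],
    "CPI": [202.416, 203.499, 205.352, 206.686, 207.949],
    "UNEMPLOYMENT": [4.6, 4.5, 4.4, 4.5, 4.4],
    "SPREAD": [-0.12, -0.13, -0.01, 0.02, -0.02],
}).sort_values("date").reset_index(drop=True)

model_df = pd.DataFrame({"date": df_head["date"]})

# Assumed target: VIG percentage change from month t-1 to month t.
model_df["VIG_ret"] = df_head["VIG"].pct_change()

# Predictors shifted by one row so that row t holds values from month t-1.
model_df["spread_L1"] = df_head["SPREAD"].shift(1)
model_df["CPI_change_L1"] = df_head["CPI"].pct_change().shift(1)
model_df["UNEMP_change_L1"] = df_head["UNEMPLOYMENT"].diff().shift(1)

# The first two rows lack a full set of lagged values, so they are dropped.
model_df = model_df.dropna().reset_index(drop=True)
print(model_df.round(6))
```

Lagging the change features costs one extra row compared with lagging the spread alone, because a change needs two months of data before it can itself be shifted.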
