# Stage 09 — Homework Starter Notebook

In the lecture, we learned how to create engineered features. Now it’s your turn to apply those ideas to your own project data.

In [1]:
import pandas as pd
import numpy as np

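np.random.seed(0)  # fixed seed so the synthetic data is reproducible
n = 100            # number of synthetic customers

# Synthetic customer table: one row per customer
df = pd.DataFrame({
    'customer_id': range(1, n+1),
    # Age ~ Normal(40, 8), clipped to the range 22 to 70
    'age': np.random.normal(40, 8, size=n).clip(22, 70).round(1),
    # Lognormal income: right-skewed and strictly positive
    'income': np.random.lognormal(mean=10.6, sigma=0.3, size=n).round(2),
    # astype(int) truncates the fractional part
    'monthly_spend': np.random.normal(2000, 600, n).astype(int),
    'credit_score': np.random.normal(680, 50, n).astype(int)
})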
df.head()

|   | customer_id | age | income | monthly_spend | credit_score |
|---|---|---|---|---|---|
| 0 | 1 | 54.1 | 70611.28 | 1778 | 614 |
| 1 | 2 | 43.2 | 26787.01 | 1856 | 762 |
| 2 | 3 | 47.8 | 27415.24 | 2659 | 674 |
| 3 | 4 | 57.9 | 53681.25 | 2393 | 645 |
| 4 | 5 | 54.9 | 28227.81 | 2384 | 713 |


## TODO: Implement at least 2 engineered features here

### Rationale for Feature 1
Explain why this feature may help a model. Reference your EDA.

In [2]:
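# Feature 1: Spend-to-Income Ratio
# Income is lognormal in this synthetic data, so it is strictly positive
# and the division cannot hit zero here.
df['spend_income_ratio'] = df['monthly_spend'] / df['income']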
df[['income','monthly_spend','spend_income_ratio']].head()

|   | income | monthly_spend | spend_income_ratio |
|---|---|---|---|
| 0 | 70611.28 | 1778 | 0.02518 |
| 1 | 26787.01 | 1856 | 0.069287 |
| 2 | 27415.24 | 2659 | 0.09699 |
| 3 | 53681.25 | 2393 | 0.044578 |
| 4 | 28227.81 | 2384 | 0.084456 |


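This feature captures how much a person spends relative to their income.
- From EDA, we saw that income is heavily right-skewed while spend varies widely.
- Raw spend alone does not reflect affordability: the same monthly spend is a much larger burden at a low income than at a high one.
- Normalizing spend by income gives a proportional measure, which may be more predictive of financial stability than either raw column.
- The ratio divides a monthly figure by income. If income is annual, put both on the same period first (for example, multiply spend by 12) so the ratio is easier to interpret.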
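
Real project data may contain zero or missing incomes, which the synthetic data here does not. The following is a minimal sketch of a guarded ratio, not part of the original notebook: the helper name `safe_ratio` and the toy values are hypothetical, and it assumes income is annual, so spend is annualized before dividing.

```python
import numpy as np
import pandas as pd

def safe_ratio(numerator: pd.Series, denominator: pd.Series) -> pd.Series:
    """Divide two Series, returning NaN where the denominator is zero or missing."""
    # Replace zeros with NaN so the division yields NaN instead of inf.
    return numerator / denominator.replace(0, np.nan)

# Toy data (hypothetical values) with one zero income to show the guard.
toy = pd.DataFrame({
    'income': [60000.0, 0.0, 30000.0],
    'monthly_spend': [2000, 1500, 2500],
})

# Assumption: income is annual, so annualize spend to compare like with like.
toy['spend_income_ratio'] = safe_ratio(toy['monthly_spend'] * 12, toy['income'])
print(toy)
```

Rows with a zero income come out as NaN rather than infinity, so they can be imputed or dropped explicitly in a later step.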

### Rationale for Feature 2
Explain why this feature may help a model. Reference your EDA.

In [3]:
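# Feature 2: Rolling Spend Mean
# Mean of the current row and the two rows before it, in index order.
# min_periods=1 lets the first rows average fewer values instead of returning NaN.
df['rolling_spend_mean'] = df['monthly_spend'].rolling(window=3, min_periods=1).mean()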
df[['monthly_spend','rolling_spend_mean']].head(10)

|   | monthly_spend | rolling_spend_mean |
|---|---|---|
| 0 | 1778 | 1778.0 |
| 1 | 1856 | 1817.0 |
| 2 | 2659 | 2097.666667 |
| 3 | 2393 | 2302.666667 |
| 4 | 2384 | 2478.666667 |
| 5 | 1029 | 1935.333333 |
| 6 | 1985 | 1799.333333 |
| 7 | 1557 | 1523.666667 |
| 8 | 2167 | 1903.0 |
| 9 | 1941 | 1888.333333 |


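This feature is intended to capture short-term trends in spending behavior.
- Spending may fluctuate from month to month; a rolling average smooths that noise and highlights recent habits.
- This is useful if the target variable is linked to behavioral consistency (e.g., stable vs. volatile spenders).
- From EDA, we noticed occasional spikes; averaging over a 3-row window dampens the effect of a single outlier.
- Caveat: `rolling()` averages over consecutive rows. In the starter data each row is a different customer, so the window mixes different customers' spend. The feature reflects a trend only when the rows are time-ordered observations of the same customer.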
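
If your project data has several monthly observations per customer, the rolling mean needs to be computed within each customer rather than across all rows. A minimal sketch under that assumption follows; the long-format table `spend_history` and its `month` column are hypothetical and do not exist in the starter data.

```python
import pandas as pd

# Hypothetical long-format data: one row per customer per month.
spend_history = pd.DataFrame({
    'customer_id': [1, 1, 1, 1, 2, 2, 2],
    'month': pd.to_datetime([
        '2024-01-01', '2024-02-01', '2024-03-01', '2024-04-01',
        '2024-01-01', '2024-02-01', '2024-03-01',
    ]),
    'monthly_spend': [1000, 2000, 3000, 4000, 500, 700, 900],
})

# Sort so each customer's rows are in chronological order.
spend_history = spend_history.sort_values(['customer_id', 'month'])

# Rolling 3-month mean computed separately for each customer.
spend_history['rolling_spend_mean'] = (
    spend_history
    .groupby('customer_id')['monthly_spend']
    .transform(lambda s: s.rolling(window=3, min_periods=1).mean())
)
print(spend_history)
```

Grouping first means the window never crosses from one customer into the next, which a plain `rolling()` over the whole column cannot guarantee.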