# Stage 09 — Homework Starter Notebook

In the lecture, we learned how to create engineered features. Now it’s your turn to apply those ideas to your own project data.

In [1]:
import pandas as pd
import numpy as np

# Example synthetic data (replace with your project dataset)
np.random.seed(0)
n = 100
df = pd.DataFrame({
    'income': np.random.normal(60000, 15000, n).astype(int),
    'monthly_spend': np.random.normal(2000, 600, n).astype(int),
    'credit_score': np.random.normal(680, 50, n).astype(int)
})
df.head()

|   | income | monthly_spend | credit_score |
|---|--------|---------------|--------------|
| 0 | 86460 | 3129 | 661 |
| 1 | 66002 | 1191 | 668 |
| 2 | 74681 | 1237 | 734 |
| 3 | 93613 | 2581 | 712 |
| 4 | 88013 | 1296 | 712 |


## Done: Implement at least 2 engineered features here

In [2]:
# === Feature Engineering (minimal, student-style) ===

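# 1) Spending-to-Income ratio (how large spending is relative to income)
#    - note: income is yearly, so this is monthly spend over annual income
#    - guard against division by zero: non-positive income becomes NaN,
#      and the resulting NaN ratio is filled with 0.0
den = np.where(df['income'] > 0, df['income'], np.nan)
df['spend_income_ratio'] = (df['monthly_spend'] / den).fillna(0.0)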

# 2) Estimated monthly savings (income is yearly here, so divide by 12)
#    - positive => likely saving; negative => overspending
df['monthly_savings'] = (df['income'] / 12.0 - df['monthly_spend']).round(0)

# 3) Standardized credit score (z-score) for comparability across users
mu = df['credit_score'].mean()
sigma = df['credit_score'].std(ddof=0)
df['credit_zscore'] = (df['credit_score'] - mu) / (sigma if sigma != 0 else 1.0)

# quick peek
df.head()

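|   | income | monthly_spend | credit_score | spend_income_ratio | monthly_savings | credit_zscore |
|---|--------|---------------|--------------|--------------------|-----------------|---------------|
| 0 | 86460 | 3129 | 661 | 0.03619 | 4076.0 | -0.326824 |
| 1 | 66002 | 1191 | 668 | 0.018045 | 4309.0 | -0.179701 |
| 2 | 74681 | 1237 | 734 | 0.016564 | 4986.0 | 1.207464 |
| 3 | 93613 | 2581 | 712 | 0.027571 | 5220.0 | 0.745076 |
| 4 | 88013 | 1296 | 712 | 0.014725 | 6038.0 | 0.745076 |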
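
As a quick sanity check on the z-score step, a standardized column should have mean 0 and population standard deviation 1. The sketch below is only illustrative: it assumes just the five `credit_score` values shown in the output, so its z-scores differ from the `credit_zscore` column, which was computed over all 100 rows.

```python
import numpy as np
import pandas as pd

# Assumption: only the five credit scores visible in the output above
scores = pd.Series([661, 668, 734, 712, 712], name='credit_score')

# Same recipe as the notebook cell: population std (ddof=0),
# with a fallback of 1.0 if the column happens to be constant
mu = scores.mean()
sigma = scores.std(ddof=0)
z = (scores - mu) / (sigma if sigma != 0 else 1.0)

# After standardizing, mean is ~0 and population std is ~1
print(np.isclose(z.mean(), 0.0), np.isclose(z.std(ddof=0), 1.0))
```

Using `ddof=0` in both the transform and the check keeps them consistent; with the pandas default (`ddof=1`) the standard deviation of `z` would come out slightly above 1.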


### Rationale for Feature 1
Explain why this feature may help a model. Reference your EDA.

### Rationale for Feature 1: Spend-to-Income Ratio  

The *spend_income_ratio* divides monthly spending by income. Because `income` is yearly in this dataset,
the values are small (0.036 for the first row); multiplying by 12 gives the share of income spent each month.
From the EDA, we noticed that some users have relatively high spending compared to their income,
which could indicate higher financial stress and a higher chance of default.

This ratio normalizes spending by income, making users more comparable even if they earn
different salaries. It may help the model capture risk patterns more effectively than
using raw `monthly_spend` or `income` alone.
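
The feature code treats `income` as yearly, so the ratio mixes a monthly numerator with an annual denominator. A minimal sketch, assuming only the first five rows of the synthetic data shown earlier, puts both quantities on a monthly basis. The column name `spend_share_of_income` is hypothetical and not part of the original notebook.

```python
import pandas as pd

# Assumption: first five rows of the synthetic data from the output above
sample = pd.DataFrame({
    'income': [86460, 66002, 74681, 93613, 88013],        # yearly
    'monthly_spend': [3129, 1191, 1237, 2581, 1296],      # monthly
})

# Convert income to a monthly figure; non-positive income becomes NaN
# so that the division never hits zero
monthly_income = sample['income'].where(sample['income'] > 0) / 12.0

# Hypothetical column: fraction of monthly income that is spent
sample['spend_share_of_income'] = sample['monthly_spend'] / monthly_income

print(sample)
```

This is the same information as `spend_income_ratio` scaled by 12, so a model would rank users identically with either version; the monthly form is simply easier to read as a percentage of income.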


### Rationale for Feature 2
Explain why this feature may help a model. Reference your EDA.

### Rationale for Feature 2: Monthly Savings  

The *monthly_savings* feature is calculated as `income / 12 - monthly_spend`, because `income` is yearly.
It shows how much money a person has left each month after their spending.

From the EDA, we observed that users with lower or even negative savings  
are more likely to be at financial risk. By including this feature,  
the model can better capture differences in financial health that are  
not obvious when only looking at `income` or `monthly_spend` separately.  

This feature directly reflects financial stability and may improve  
the model’s ability to predict default risk.
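
One way to make the negative-savings signal explicit is a boolean flag. The sketch below is illustrative only: the first two rows come from the synthetic data shown earlier, while the third row and the `is_overspending` column are hypothetical additions used to show a negative case.

```python
import pandas as pd

sample = pd.DataFrame({
    # Rows 0 and 1 are from the output above; row 2 is a made-up overspender
    'income': [86460, 66002, 24000],        # yearly
    'monthly_spend': [3129, 1191, 2500],    # monthly
})

# Same formula as the notebook cell: yearly income / 12 minus monthly spend
sample['monthly_savings'] = (sample['income'] / 12.0 - sample['monthly_spend']).round(0)

# Hypothetical flag: True when spending exceeds monthly income
sample['is_overspending'] = sample['monthly_savings'] < 0

print(sample)
```

Whether such a flag adds anything beyond the continuous `monthly_savings` value depends on the model; tree-based models can usually find the zero threshold on their own.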
