# Stage 09 — Homework Starter Notebook

In the lecture, we learned how to create engineered features. Now it’s your turn to apply those ideas to your own project data.

In [1]:
import pandas as pd
import numpy as np

# Example synthetic data (replace with your project dataset)
np.random.seed(0)
n = 100
df = pd.DataFrame({
    'income': np.random.normal(60000, 15000, n).astype(int),
    'monthly_spend': np.random.normal(2000, 600, n).astype(int),
    'credit_score': np.random.normal(680, 50, n).astype(int)
})
df.head()

|   | income | monthly_spend | credit_score |
|---|--------|---------------|--------------|
| 0 | 86460 | 3129 | 661 |
| 1 | 66002 | 1191 | 668 |
| 2 | 74681 | 1237 | 734 |
| 3 | 93613 | 2581 | 712 |
| 4 | 88013 | 1296 | 712 |


## TODO: Implement at least 2 engineered features here

In [8]:
# Feature 1: Spend-to-income ratio (monthly spend divided by income)
df['spend_income_ratio'] = df['monthly_spend'] / df['income']
df.head()

|   | income | monthly_spend | credit_score | spend_income_ratio |
|---|--------|---------------|--------------|--------------------|
| 0 | 86460 | 3129 | 661 | 0.03619 |
| 1 | 66002 | 1191 | 668 | 0.018045 |
| 2 | 74681 | 1237 | 734 | 0.016564 |
| 3 | 93613 | 2581 | 712 | 0.027571 |
| 4 | 88013 | 1296 | 712 | 0.014725 |


### Rationale for Feature 1: Spend to Income Ratio
This feature is the ratio of monthly spend to income. Spending relative to income is a common indicator of financial behavior: a high ratio may signal financial stress or aggressive spending, while a very low ratio may signal frugality or a high savings rate. Note that if `income` is annual, as the synthetic values suggest (mean income of 60,000 against mean monthly spend of 2,000), the ratio understates the share of income actually spent by a factor of 12; the ordering of customers is unchanged. The ratio captures this relationship in a single metric that may be more predictive than either variable alone.
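
A plain division produces `inf` whenever income is zero. The sketch below shows one way to guard against that and to put both quantities on the same time scale. It uses a hypothetical toy frame (`toy`) and a hypothetical column name (`annual_spend_income_ratio`), and it assumes that income is an annual figure, which the notebook does not state.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; the zero-income row is included on purpose.
toy = pd.DataFrame({
    'income': [60000, 0, 90000],
    'monthly_spend': [2000, 1500, 3000],
})

# Treat non-positive income as missing so the ratio becomes NaN instead of inf.
safe_income = toy['income'].where(toy['income'] > 0)

# Assumption: income is annual, so monthly spend is scaled by 12 first.
toy['annual_spend_income_ratio'] = toy['monthly_spend'] * 12 / safe_income
print(toy)
```

Rows with missing ratios can then be imputed or dropped explicitly, rather than letting infinite values leak into a model.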

In [7]:
# Feature 2: Credit Score Category
def categorize_credit_score(score):
    if score < 580:
        return 'poor'
    elif score < 670:
        return 'fair'
    elif score < 740:
        return 'good'
    elif score < 800:
        return 'very_good'
    else:
        return 'excellent'

df['credit_score_category'] = df['credit_score'].apply(categorize_credit_score)
df.head()

|   | income | monthly_spend | credit_score | spend_income_ratio | credit_score_category |
|---|--------|---------------|--------------|--------------------|-----------------------|
| 0 | 86460 | 3129 | 661 | 0.03619 | fair |
| 1 | 66002 | 1191 | 668 | 0.018045 | fair |
| 2 | 74681 | 1237 | 734 | 0.016564 | good |
| 3 | 93613 | 2581 | 712 | 0.027571 | good |
| 4 | 88013 | 1296 | 712 | 0.014725 | good |


### Rationale for Feature 2: Credit Score Category
The raw credit score is valuable, but binning it lets a model capture non-linear relationships between credit quality and spending behavior. The effect of credit score on financial behavior may not be linear: there can be threshold effects, where, for example, the difference between 650 and 700 matters more than the difference between 700 and 750. The cut points used here (580, 670, 740, 800) follow commonly used credit-score brackets, so the categories line up with how scores are typically interpreted. The trade-off is that binning discards variation within each bracket, which is why the raw score stays in the DataFrame alongside the category.
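
The same bracketing can also be written without a row-by-row `apply`. The following is a minimal sketch, not a replacement for the cell above: it uses `pd.cut` with `right=False` so that each bin is closed on the left, matching the `<` comparisons in `categorize_credit_score`. The `scores` series is hypothetical test data chosen to sit on the bracket boundaries.

```python
import numpy as np
import pandas as pd

def categorize_credit_score(score):
    # Same thresholds as the notebook cell above.
    if score < 580:
        return 'poor'
    elif score < 670:
        return 'fair'
    elif score < 740:
        return 'good'
    elif score < 800:
        return 'very_good'
    return 'excellent'

# Hypothetical scores placed on each bracket boundary.
scores = pd.Series([579, 580, 669, 670, 739, 740, 799, 800])

labels = ['poor', 'fair', 'good', 'very_good', 'excellent']
# right=False gives left-closed bins: [-inf, 580), [580, 670), [670, 740), ...
binned = pd.cut(scores,
                bins=[-np.inf, 580, 670, 740, 800, np.inf],
                labels=labels,
                right=False)

# Both approaches agree on every boundary value.
assert (binned.astype(str) == scores.apply(categorize_credit_score)).all()
print(binned.tolist())
```

One difference worth noting: `pd.cut` returns an ordered categorical, so comparisons such as `binned >= 'good'` respect the bracket order, whereas the `apply` version yields plain strings.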


In [4]:
# Feature 3: Income Bracket (additional feature)
df['income_bracket'] = pd.cut(df['income'], 
                              bins=[0, 30000, 60000, 90000, 120000, np.inf], 
                              labels=['very_low', 'low', 'medium', 'high', 'very_high'])
df.head()

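|   | income | monthly_spend | credit_score | spend_income_ratio | credit_score_category | income_bracket |
|---|--------|---------------|--------------|--------------------|-----------------------|----------------|
| 0 | 86460 | 3129 | 661 | 0.03619 | fair | medium |
| 1 | 66002 | 1191 | 668 | 0.018045 | fair | medium |
| 2 | 74681 | 1237 | 734 | 0.016564 | good | medium |
| 3 | 93613 | 2581 | 712 | 0.027571 | good | high |
| 4 | 88013 | 1296 | 712 | 0.014725 | good | medium |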


### Rationale for Feature 3: Income Bracket
Real income distributions are often right-skewed with a Pareto-like tail, and the relationship between income and spending is not necessarily linear. Income brackets let the model capture segment-specific behavior. For example, very high-income individuals might spend differently from medium-income individuals even when their spend-to-income ratios are similar.
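
One edge case is worth checking when bins start at 0. With pandas' default right-closed intervals, the first bin is `(0, 30000]`, so an income of exactly 0 (or any negative value) falls outside every bin and becomes `NaN`. The sketch below illustrates this on hypothetical incomes and shows one possible adjustment, extending the lowest edge to `-np.inf`. Whether non-positive incomes should count as `very_low` or be treated as invalid is a project decision, not something the notebook specifies.

```python
import numpy as np
import pandas as pd

# Hypothetical incomes, including values at and below the lowest bin edge.
incomes = pd.Series([-500, 0, 30000, 30001, 150000])
labels = ['very_low', 'low', 'medium', 'high', 'very_high']

# Same bins as the cell above: intervals are (0, 30000], (30000, 60000], ...
brackets = pd.cut(incomes,
                  bins=[0, 30000, 60000, 90000, 120000, np.inf],
                  labels=labels)
print(brackets.isna().tolist())  # which rows fell outside every bin

# Possible adjustment: let the first bin absorb everything up to 30000.
fixed = pd.cut(incomes,
               bins=[-np.inf, 30000, 60000, 90000, 120000, np.inf],
               labels=labels)
print(fixed.tolist())
```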

Together, these features re-express the raw financial data in forms that are common in financial analysis (ratios and brackets). This may improve model performance by giving the model inputs that are both more interpretable and more predictive.
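
Most modeling libraries expect numeric inputs, so the two categorical features usually need encoding before they can be used. The following is a minimal sketch using `pd.get_dummies` on a hypothetical two-row frame with the same column names as above. One-hot encoding is only one option and is assumed here for illustration; an ordinal encoding would preserve the bracket order instead.

```python
import pandas as pd

# Hypothetical rows using the column names created in this notebook.
toy = pd.DataFrame({
    'credit_score': [661, 734],
    'credit_score_category': ['fair', 'good'],
    'income_bracket': ['medium', 'high'],
})

# One indicator column per observed category; dtype=int gives 0/1 values.
encoded = pd.get_dummies(toy,
                         columns=['credit_score_category', 'income_bracket'],
                         dtype=int)
print(encoded.columns.tolist())
```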