#### Age Bands

From the EDA churn increases with age.
Age bands capture non-linear effects better than raw age

In [54]:
import pandas as pd
import numpy as np

In [55]:
df = pd.read_csv('C:/customerchurnprediction/data/Bank Customer Churn Prediction.csv')

#Drop the customer_id column
df.drop(['customer_id'], axis=1, inplace=True)

df['age_group']=pd.cut(df['age'], bins=[0, 30, 40, 50, 60, 120],
    labels=['<30','30-39','40-49','50-59','60+'], include_lowest=True)

Engagement Features with HIGH IMPACT

1. Active Member vs Number of Products

In [56]:
df['inactive_single_product'] = (df['active_member']==0) & (df['products_number']==1).astype(int)

Products per Year of Tenure

In [57]:
df['products_per_tenure'] = (df['products_number'] / df['tenure'] + 1)

#### Financial Behavior Features

* Balance Flags

* Zero-balance customers are often dormant.
* High-balance churn has high financial risk.

In [58]:
df['zero_balance'] = (df['balance'] == 0).astype(int)
df['high_balance'] = (df['balance']> df['balance'].median()).astype(int)

Balance to Products Ratio

This identifies under-utilized high-value customers

In [59]:
df['balance_per_product'] = (df['balance'] / df['products_number'] +1)

### Credit Risk Signals

In [60]:
df['credit_score_band'] = pd.cut(df['credit_score'], bins=[300,580, 670, 740, 800, 900],
       labels=['Poor','Fair','Good','Very Good', 'Excellent'])

### Tenure-Based Risk

* From these early-stage churn is operationally preventable
* Having this information enables targeted onboarding interventions

In [61]:
df['early_customer'] = (df['tenure'] < 3).astype(int)

### Composite Risk Indicator

* The main objective for this is to encode multiple risk behaviors into one interpretable signal

In [62]:
df['churn_risk_score'] = (
    df['inactive_single_product']+
    df['zero_balance']+
    df['early_customer']
)

### Sanity Checks


In [63]:
df.isnull().sum()

credit_score               0
country                    0
gender                     0
age                        0
tenure                     0
balance                    0
products_number            0
credit_card                0
active_member              0
estimated_salary           0
churn                      0
age_group                  0
inactive_single_product    0
products_per_tenure        0
zero_balance               0
high_balance               0
balance_per_product        0
credit_score_band          0
early_customer             0
churn_risk_score           0
dtype: int64

In [64]:
df.head()

Unnamed: 0,credit_score,country,gender,age,tenure,balance,products_number,credit_card,active_member,estimated_salary,churn,age_group,inactive_single_product,products_per_tenure,zero_balance,high_balance,balance_per_product,credit_score_band,early_customer,churn_risk_score
0,619,France,Female,42,2,0.0,1,1,1,101348.88,1,40-49,False,1.5,1,0,1.0,Fair,1,2
1,608,Spain,Female,41,1,83807.86,1,0,1,112542.58,0,40-49,False,2.0,0,0,83808.86,Fair,1,1
2,502,France,Female,42,8,159660.8,3,1,0,113931.57,1,40-49,False,1.375,0,1,53221.266667,Poor,0,0
3,699,France,Female,39,1,0.0,2,0,0,93826.63,0,30-39,False,3.0,1,0,1.0,Good,1,2
4,850,Spain,Female,43,2,125510.82,1,1,1,79084.1,0,40-49,False,1.5,0,1,125511.82,Excellent,1,1


Replace infinities Safely

In [65]:
df.replace([np.inf, -np.inf], np.nan, inplace=True)

Save the feature engineered dataset

In [66]:
df.to_csv('C:\customerchurnprediction\data\engineered\engineeredbank_churn.csv', index=False)