### 1. Import Dependencies

#### How Does Binning Help?

**1. Non-linear relationship with target**
- After binning (and one-hot encoding), each range gets its own coefficient, so even a linear model can fit a non-linear, step-shaped relationship it would otherwise miss

**2. Skewed distribution**
- Binning can smooth out skew and reduce the effect of extreme values

**3. Interpretability is key**
- Easier for business users to understand "age 18-25" than "age = 23"

**4. Model is prone to overfitting**
- Binning reduces granularity → fewer candidate split points (e.g., in tree-based models) → less overfitting

**5. Need to reduce cardinality**
- Helps when a numeric column has too many unique values

**6. Sparse or noisy data**
- Grouping rare or noisy values into a shared bin gives each bin enough rows to produce a more stable signal
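As a rough illustration of points 2 and 5, the sketch below bins a synthetic, right-skewed feature two ways: equal-width bins with `pd.cut` and equal-frequency bins with `pd.qcut`. The data are randomly generated for illustration (an assumption, not the churn dataset), and the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Synthetic, right-skewed feature (lognormal); not taken from the churn dataset
rng = np.random.default_rng(0)
balance = pd.Series(rng.lognormal(mean=10, sigma=1, size=1_000), name="balance")

# Equal-width bins: the value range is split into 5 intervals of the same width;
# with right-skewed data most rows tend to land in the lowest interval
fixed = pd.cut(balance, bins=5)

# Equal-frequency bins: each bin holds roughly the same number of rows,
# so extreme values no longer stretch a single bin across a wide range
quantile = pd.qcut(balance, q=5, labels=["Q1", "Q2", "Q3", "Q4", "Q5"])

print("unique raw values:", balance.nunique())        # cardinality before binning
print("non-empty equal-width bins:", fixed.nunique())
print("equal-frequency bins:", quantile.nunique())
print(fixed.value_counts().sort_index())
print(quantile.value_counts().sort_index())
```

Comparing the two `value_counts` outputs shows the trade-off: `pd.cut` keeps interpretable, evenly spaced edges, while `pd.qcut` balances row counts at the cost of irregular edges.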

```python
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import os
```

### 2. Basic Processing

```python
df = pd.read_csv('data/processed/ChurnModelling_Outlier_Handled.csv')
df.head()
```

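| Unnamed: 0 | CreditScore | Geography | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 619 | France | Female | 42.0 | 2 | 0.0 | 1 | 1 | 1 | 101348.88 | 1 |
| 1 | 608 | Spain | Female | 41.0 | 1 | 83807.86 | 1 | 0 | 1 | 112542.58 | 0 |
| 2 | 502 | France | Female | 42.0 | 8 | 159660.8 | 3 | 1 | 0 | 113931.57 | 1 |
| 3 | 699 | France | Female | 38.91 | 1 | 0.0 | 2 | 0 | 0 | 93826.63 | 0 |
| 4 | 850 | Spain | Female | 43.0 | 2 | 125510.82 | 1 | 1 | 1 | 79084.1 | 0 |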


```python
df['Age'].min(), df['Age'].max()
```

(18.0, 92.0)

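To apply binning, you need to know the feature's range so that the bins cover every value. The check above is for `Age`; the bands below are for `CreditScore`, so run the same check on it (`df['CreditScore'].min(), df['CreditScore'].max()`) and confirm the values fall within 350-850. Each band includes its lower bound and excludes its upper bound, except the top band, which also includes 850: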

- [350, 580) => 'Poor'
- [580, 670) => 'Fair'
- [670, 740) => 'Good'
- [740, 800) => 'Very Good'
- [800, 850] => 'Excellent'
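Before binning, it can help to list any scores that fit no band. The sketch below assumes the 350-850 span from the list above; the helper name `out_of_range` and the sample scores are hypothetical.

```python
import pandas as pd

# Overall span covered by the credit-score bands listed above
LOWER, UPPER = 350, 850

def out_of_range(scores: pd.Series, lower: float = LOWER, upper: float = UPPER) -> pd.Series:
    """Return the scores outside [lower, upper], i.e. values that fit no band.

    NaN values are not reported here, because comparisons with NaN are False.
    """
    return scores[(scores < lower) | (scores > upper)]

# Hypothetical sample scores, for illustration only
sample = pd.Series([619, 608, 502, 699, 850, 340, 900])
print(sample.min(), sample.max())       # the same range check done for Age above
print(out_of_range(sample).tolist())
```

In the notebook this would be called as `out_of_range(df['CreditScore'])`; an empty result means every score maps to a band.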

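```python
def custom_binning(value):
    """Map a credit score to its band. Each band includes its lower bound and
    excludes its upper bound, except 'Excellent', which also includes 850."""
    if value < 580:
        # Scores below 580 (including any below 350) are labelled 'Poor'
        return 'Poor'
    elif value < 670:
        return 'Fair'
    elif value < 740:
        return 'Good'
    elif value < 800:
        return 'Very Good'
    elif value <= 850:
        return 'Excellent'
    else:
        # Scores above 850 (or NaN) reach this branch; raise instead of
        # silently returning None
        raise ValueError(f"Credit score {value} is out of range")
```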
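For larger frames, `pd.cut` is a vectorized alternative to `.apply(custom_binning)`. The sketch below assumes the same bands; the helper name `bin_credit_scores` is hypothetical. One behavioral difference: scores above 850 come back as `NaN` instead of raising an error.

```python
import numpy as np
import pandas as pd

labels = ['Poor', 'Fair', 'Good', 'Very Good', 'Excellent']
# right=False makes each interval [lower, upper). The first edge is -inf so scores
# below 580 map to 'Poor' (as in custom_binning), and the last edge is nudged just
# above 850 so that 850 itself lands in 'Excellent'.
edges = [-np.inf, 580, 670, 740, 800, np.nextafter(850, np.inf)]

def bin_credit_scores(scores: pd.Series) -> pd.Series:
    """Vectorized counterpart of custom_binning; scores above 850 become NaN."""
    return pd.cut(scores, bins=edges, labels=labels, right=False)

scores = pd.Series([619, 502, 699, 850, 760, 580, 800])
print(bin_credit_scores(scores).tolist())
```

In the notebook this would read `df['CreditScoreBins'] = bin_credit_scores(df['CreditScore'])`. The result is an ordered categorical, so sorting follows band order rather than alphabetical order.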

```python
df['CreditScoreBins'] = df['CreditScore'].apply(custom_binning)
del df['CreditScore']  # the raw score is replaced by its band
df
```

| Unnamed: 0 | Geography | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited | CreditScoreBins |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | France | Female | 42.00 | 2 | 0.00 | 1 | 1 | 1 | 101348.88 | 1 | Fair |
| 1 | Spain | Female | 41.00 | 1 | 83807.86 | 1 | 0 | 1 | 112542.58 | 0 | Fair |
| 2 | France | Female | 42.00 | 8 | 159660.80 | 3 | 1 | 0 | 113931.57 | 1 | Poor |
| 3 | France | Female | 38.91 | 1 | 0.00 | 2 | 0 | 0 | 93826.63 | 0 | Good |
| 4 | Spain | Female | 43.00 | 2 | 125510.82 | 1 | 1 | 1 | 79084.10 | 0 | Excellent |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9995 | France | Male | 39.00 | 5 | 0.00 | 2 | 1 | 0 | 96270.64 | 0 | Very Good |
| 9996 | France | Male | 35.00 | 10 | 57369.61 | 1 | 1 | 1 | 101699.77 | 0 | Poor |
| 9997 | France | Female | 36.00 | 7 | 0.00 | 1 | 0 | 1 | 42085.58 | 1 | Good |
| 9998 | Germany | Male | 42.00 | 3 | 75075.31 | 2 | 1 | 0 | 92888.52 | 1 | Very Good |
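Any row that `custom_binning` fails to map would end up missing (`None`/`NaN`) in `CreditScoreBins`. A quick check after binning, sketched below with a hypothetical `check_bins` helper and made-up labels, catches that case and lists the band counts in score order rather than by frequency.

```python
import pandas as pd

# Bands from lowest to highest score
BAND_ORDER = ['Poor', 'Fair', 'Good', 'Very Good', 'Excellent']

def check_bins(bins: pd.Series) -> pd.Series:
    """Raise if any row has no band, then return band counts in score order."""
    missing = bins.isna().sum()
    if missing:
        raise ValueError(f"{missing} rows have no credit-score band")
    # reindex puts the bands in score order and shows empty bands as 0
    return bins.value_counts().reindex(BAND_ORDER, fill_value=0)

# Hypothetical labels standing in for df['CreditScoreBins']
example = pd.Series(['Fair', 'Fair', 'Poor', 'Good', 'Excellent', 'Very Good'])
print(check_bins(example))
```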


```python
df.to_csv('data/processed/ChurnModelling_Binning_Applied.csv', index=False)
```
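A CSV stores the bands as plain text, so their order (`Poor` through `Excellent`) is not saved with the file. A minimal sketch of restoring it after reloading, using an in-memory stand-in for the saved file (the sample rows are made up):

```python
import io
import pandas as pd

BAND_ORDER = ['Poor', 'Fair', 'Good', 'Very Good', 'Excellent']
band_dtype = pd.CategoricalDtype(categories=BAND_ORDER, ordered=True)

# Stand-in for the saved file; the notebook would pass the CSV path instead
csv_text = "Exited,CreditScoreBins\n1,Fair\n0,Excellent\n1,Poor\n"
df_back = pd.read_csv(io.StringIO(csv_text))

# Re-apply the band order after loading, since the CSV only holds strings
df_back['CreditScoreBins'] = df_back['CreditScoreBins'].astype(band_dtype)
print(df_back.sort_values('CreditScoreBins'))
```

The same dtype can also be passed at load time via `pd.read_csv(..., dtype={'CreditScoreBins': band_dtype})`.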