#### 1. Import Dependencies

How Does Binning Help?
1. Non-linear relationship with target
- Binning can help capture non-linear patterns that a linear model might miss

2. Skewed distribution
- Binning can smooth out skew and reduce the effect of extreme values

3. Interpretability is key
- Easier for business users to understand "age 18-25" than "age=23"

4. Model is prone to overfitting
- Binning reduces granularity, which leaves fewer candidate split points and can reduce overfitting

5. Need to reduce cardinality
- Helps when a numeric column has too many unique values

6. Sparse or noisy data
- Binning can group rare or noisy values to improve signal strength
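
As a rough illustration of the skew point above, the sketch below compares equal-width bins (`pd.cut`) with quantile bins (`pd.qcut`). The `balance` series is synthetic and generated only for this example; it is not the churn data, and the choice of four bins is an assumption.

```python
import numpy as np
import pandas as pd

# Synthetic right-skewed sample (an assumption for illustration, not the churn data).
rng = np.random.default_rng(0)
balance = pd.Series(rng.lognormal(mean=10, sigma=1, size=1000), name="balance")

# Equal-width bins: every bin spans the same range of values,
# so most rows pile into the lowest bin when the data is skewed.
equal_width = pd.cut(balance, bins=4)

# Quantile bins: every bin holds roughly the same number of rows.
equal_freq = pd.qcut(balance, q=4, labels=["Q1", "Q2", "Q3", "Q4"])

print(equal_width.value_counts().sort_index())
print(equal_freq.value_counts().sort_index())
```

Quantile bins tend to be the safer default for skewed columns, while equal-width or hand-picked edges are easier to explain to business users.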

In [12]:
import os
import pandas as pd 
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt

#### 2. Basic Processing

In [13]:
df = pd.read_csv("../Week 01/Data/processed/churnModelling_outliers_handled.csv")
df.head(5)

Unnamed: 0.1,Unnamed: 0,CreditScore,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited
0,0,619,France,0,42.0,2,0.0,1,1,1,101348.88,1
1,1,608,Spain,0,41.0,1,83807.86,1,0,1,112542.58,0
2,2,502,France,0,42.0,8,159660.8,3,1,0,113931.57,1
3,3,699,France,0,38.91,1,0.0,2,0,0,93826.63,0
4,4,850,Spain,0,43.0,2,125510.82,1,1,1,79084.1,0


In [8]:
df['Age'].min(), df['Age'].max()

(np.float64(18.0), np.float64(92.0))

Before binning a feature, make sure you understand its lower and upper bounds. If you are uncertain about them, don't bin the feature.

350 - 579 => 'Poor'
580 - 669 => 'Fair'
670 - 739 => 'Good'
740 - 799 => 'Very Good'
800 - 850 => 'Excellent'
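
Before applying these ranges, it can help to confirm that they actually cover the observed values. The following is a minimal sketch, assuming integer scores; the `scores` series just reuses the five values from the `df.head(5)` output, and in the notebook it would be `df['CreditScore']`.

```python
import pandas as pd

# Illustrative scores only; in the notebook this would be df['CreditScore'].
scores = pd.Series([619, 608, 502, 699, 850])

# Outer bounds of the credit score ranges listed above.
lower, upper = 350, 850

# Count values outside the bounds; these would not fit any intended bin.
below = int((scores < lower).sum())
above = int((scores > upper).sum())

print(f"min={scores.min()}, max={scores.max()}, below={below}, above={above}")
```

If either count is non-zero, the bounds (or the data) should be revisited before binning.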

In [14]:
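def custom_binning_credit_score(score):
    """Map a numeric credit score to a labelled band."""
    if score < 580:
        return 'Poor'
    elif score < 670:
        return 'Fair'
    elif score < 740:
        return 'Good'
    elif score < 800:
        return 'Very Good'
    elif score <= 850:
        return 'Excellent'
    else:
        # `assert True` never fails, so raise explicitly for invalid scores
        raise ValueError(f"Credit Score can't go higher than 850, got {score}")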

df['CreditScoreBins'] = df['CreditScore'].apply(custom_binning_credit_score)
del df['CreditScore']

df.head(10)

Unnamed: 0.1,Unnamed: 0,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited,CreditScoreBins
0,0,France,0,42.0,2,0.0,1,1,1,101348.88,1,Fair
1,1,Spain,0,41.0,1,83807.86,1,0,1,112542.58,0,Fair
2,2,France,0,42.0,8,159660.8,3,1,0,113931.57,1,Poor
3,3,France,0,38.91,1,0.0,2,0,0,93826.63,0,Good
4,4,Spain,0,43.0,2,125510.82,1,1,1,79084.1,0,Excellent
5,5,Spain,1,44.0,8,113755.78,2,1,0,149756.71,1,Fair
6,6,France,1,50.0,7,0.0,2,1,1,10062.8,0,Excellent
7,7,Germany,0,29.0,4,115046.74,4,1,0,119346.88,1,Poor
8,8,France,1,44.0,4,142051.07,2,0,1,74940.5,0,Poor
9,9,France,1,27.0,2,134603.88,1,1,1,71725.73,0,Good
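
The same mapping can also be sketched with `pd.cut`, which is vectorized and avoids a Python-level `apply`. This is an alternative, not the approach used in the cell above. It assumes integer scores, which is why the last edge is set to 851. The `scores` series reuses the first five values from the `df.head(5)` output.

```python
import pandas as pd

# First five scores from the df.head(5) output shown earlier.
scores = pd.Series([619, 608, 502, 699, 850])

labels = ["Poor", "Fair", "Good", "Very Good", "Excellent"]
# right=False makes each bin [low, high); the last edge is 851 so that an
# integer score of 850 still falls into 'Excellent'.
edges = [350, 580, 670, 740, 800, 851]

bins = pd.cut(scores, bins=edges, labels=labels, right=False)
print(bins.tolist())
```

One behavioural difference: `pd.cut` returns `NaN` for values outside the edges, whereas the custom function labels anything below 580 (including scores under 350) as 'Poor'.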


In [17]:
# Decode the binary gender flag into readable labels (1 -> Male, anything else -> Female)
df['Gender'] = df['Gender'].apply(lambda x: "Male" if x == 1 else "Female")
df

Unnamed: 0.1,Unnamed: 0,Geography,Gender,Age,Tenure,Balance,NumOfProducts,HasCrCard,IsActiveMember,EstimatedSalary,Exited,CreditScoreBins
0,0,France,Female,42.00,2,0.00,1,1,1,101348.88,1,Fair
1,1,Spain,Female,41.00,1,83807.86,1,0,1,112542.58,0,Fair
2,2,France,Female,42.00,8,159660.80,3,1,0,113931.57,1,Poor
3,3,France,Female,38.91,1,0.00,2,0,0,93826.63,0,Good
4,4,Spain,Female,43.00,2,125510.82,1,1,1,79084.10,0,Excellent
...,...,...,...,...,...,...,...,...,...,...,...,...
9995,9995,France,Female,39.00,5,0.00,2,1,0,96270.64,0,Very Good
9996,9996,France,Female,35.00,10,57369.61,1,1,1,101699.77,0,Poor
9997,9997,France,Female,36.00,7,0.00,1,0,1,42085.58,1,Good
9998,9998,Germany,Female,42.00,3,75075.31,2,1,0,92888.52,1,Very Good


In [18]:
# index=False avoids writing the row index as yet another 'Unnamed: 0' column
df.to_csv("../Week 01/Data/processed/churnModelling_binning_applied.csv", index=False)