# Bank Customer Churn Prediction

# -*- coding: utf-8 -*-
"""
# Customer Churn Prediction in Banking Institutions

## Background
Customer churn, the phenomenon where clients discontinue their use of a bank's services, 
poses a significant challenge within the financial sector. Elevated churn rates can result in:
- Substantial revenue loss
- Increased customer acquisition costs
- Diminished market share

Proactive retention strategies driven by behavioral insights can enhance customer satisfaction 
and foster long-term profitability.

---

## Problem Statement
Banking institutions struggle to predict churn due to:
1. Insufficient actionable insights from transactional/demographic data
2. Limitations of reactive methods (e.g., exit surveys provide only post-churn feedback)

---

## Objective
Develop a predictive model to:
1. **Identify churn drivers**: Key factors influencing attrition
2. **Predict high-risk customers**: Probability scores for near-future churn
3. **Recommend interventions**: Personalized retention strategies

---

## Dataset Description
Contains customer profiles with churn status (Exited: Yes/No).

### Features:
| Feature           | Description                                      | Type          |
|-------------------|--------------------------------------------------|---------------|
| RowNumber         | Sequential row identifier                        | Integer       |
| CustomerId        | Unique customer identifier                       | String        |
| Surname           | Customer surname                                 | String        |
| CreditScore       | Numerical credit assessment (300-850)            | Integer       |
| Geography         | Customer location (e.g., France, Germany)        | Categorical   |
| Gender            | Male/Female                                      | Binary        |
| Age               | Customer age in years                            | Integer       |
| Tenure            | Years as bank customer                           | Integer       |
| Balance           | Account balance in currency units                | Float         |
| NumOfProducts     | Number of bank products used                     | Integer       |
| HasCrCard         | Credit card ownership (1=Yes, 0=No)              | Binary        |
| IsActiveMember    | Active usage status (1=Active, 0=Inactive)       | Binary        |
| EstimatedSalary   | Approximate annual salary                        | Float         |
| Exited            | Churn status (1=Churned, 0=Retained)             | Binary        |

- Temporal dynamics (Behavior changes over time)
"""

<a id="cont"></a>
## 📑 Table of Contents

<a href=#one>1. Importing Packages</a>

<a href=#two>2. Importing Data and Data Cleaning</a>

<a href=#three>3. Exploratory Data Analysis (EDA)</a>

<a href=#four>4. Data Preprocessing</a>

<a href=#five>5. Modeling and Model Performance</a>

<a href=#six>6. Insights and Recommendations</a>

<a href=#seven>7. Conclusion</a>

<a id="one"></a>
###  1. Importing packages 
<a href=#cont>Back to Table of Contents</a>

In [2]:
import pandas as pd
import numpy as np 
import matplotlib.pyplot as plt 


<a id="two"></a>
### 2. Importing Data and Data Cleaning
<a href=#cont>Back to Table of Contents</a>

#### 2.1 Importing Data
<a href=#cont>Back to Table of Contents</a>

In [4]:
churn_dataset = pd.read_csv("Churn_Modelling.csv")
churn_dataset.head(5)

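|   | RowNumber | CustomerId | Surname  | CreditScore | Geography | Gender | Age | Tenure | Balance   | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited |
|---|-----------|------------|----------|-------------|-----------|--------|-----|--------|-----------|---------------|-----------|----------------|-----------------|--------|
| 0 | 1         | 15634602   | Hargrave | 619         | France    | Female | 42  | 2      | 0.0       | 1             | 1         | 1              | 101348.88       | 1      |
| 1 | 2         | 15647311   | Hill     | 608         | Spain     | Female | 41  | 1      | 83807.86  | 1             | 0         | 1              | 112542.58       | 0      |
| 2 | 3         | 15619304   | Onio     | 502         | France    | Female | 42  | 8      | 159660.8  | 3             | 1         | 0              | 113931.57       | 1      |
| 3 | 4         | 15701354   | Boni     | 699         | France    | Female | 39  | 1      | 0.0       | 2             | 0         | 0              | 93826.63        | 0      |
| 4 | 5         | 15737888   | Mitchell | 850         | Spain     | Female | 43  | 2      | 125510.82 | 1             | 1         | 1              | 79084.1         | 0      |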
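
Before cleaning, it can help to confirm that the loaded frame actually contains the columns listed in the dataset description. The sketch below is one possible check, not part of the original notebook: `EXPECTED_COLUMNS` and `missing_columns` are hypothetical names introduced for illustration, and the five preview rows stand in for the full `churn_dataset`.

```python
import pandas as pd

# Column names taken from the dataset description table.
EXPECTED_COLUMNS = [
    "RowNumber", "CustomerId", "Surname", "CreditScore", "Geography",
    "Gender", "Age", "Tenure", "Balance", "NumOfProducts", "HasCrCard",
    "IsActiveMember", "EstimatedSalary", "Exited",
]


def missing_columns(df, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from df (hypothetical helper)."""
    return [name for name in expected if name not in df.columns]


# Stand-in for churn_dataset: the five preview rows shown above.
preview = pd.DataFrame(
    [
        [1, 15634602, "Hargrave", 619, "France", "Female", 42, 2, 0.0, 1, 1, 1, 101348.88, 1],
        [2, 15647311, "Hill", 608, "Spain", "Female", 41, 1, 83807.86, 1, 0, 1, 112542.58, 0],
        [3, 15619304, "Onio", 502, "France", "Female", 42, 8, 159660.8, 3, 1, 0, 113931.57, 1],
        [4, 15701354, "Boni", 699, "France", "Female", 39, 1, 0.0, 2, 0, 0, 93826.63, 0],
        [5, 15737888, "Mitchell", 850, "Spain", "Female", 43, 2, 125510.82, 1, 1, 1, 79084.1, 0],
    ],
    columns=EXPECTED_COLUMNS,
)

# An empty list means every expected column is present.
print(missing_columns(preview))

# Dropping a column makes the helper report it.
print(missing_columns(preview.drop(columns=["Exited"])))
```

On the real data, the same call would be `missing_columns(churn_dataset)`; a non-empty result would suggest the CSV differs from the description.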


####  2.2 Data Cleaning
<a href=#cont>Back to Table of Contents</a>



**Simplifying the Dataset for Better Predictions**  
Our first look at the data revealed three columns that don’t add value to predicting customer churn:  

1. **`RowNumber`**: A sequential row counter (the DataFrame index already tracks row position).  
2. **`CustomerId`**: Random account numbers (no hidden trends here).  
3. **`Surname`**: Last names (unrelated to banking behavior).  

**Here’s why they’re being excluded:**  
- They don’t help predict outcomes.  
- Names like `Surname` could unintentionally lead to unfair decisions (e.g., discrimination based on cultural backgrounds).  
- They can’t be transformed into useful insights, even with advanced analysis.
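
One way to screen for identifier-like columns is to look at the share of distinct values in each column. The sketch below is illustrative only: `unique_ratio` is a hypothetical helper, and the frame holds just the identifier columns and `Exited` from the five preview rows, not the full dataset.

```python
import pandas as pd

# Stand-in data: identifier columns and the target from the five preview rows.
sample = pd.DataFrame({
    "RowNumber": [1, 2, 3, 4, 5],
    "CustomerId": [15634602, 15647311, 15619304, 15701354, 15737888],
    "Surname": ["Hargrave", "Hill", "Onio", "Boni", "Mitchell"],
    "Exited": [1, 0, 1, 0, 0],
})


def unique_ratio(df):
    """Share of distinct values per column (1.0 means every row is unique)."""
    return df.nunique() / len(df)


ratios = unique_ratio(sample)

# Columns where every value is distinct behave like row labels, not features.
id_like = ratios[ratios == 1.0].index.tolist()
print(id_like)
```

On the full dataset, surnames may repeat, so the ratio for `Surname` could fall below 1.0. The check is a screening aid rather than a rule; the fairness concern above is a separate reason to drop that column.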

In [5]:
churn_dataset_clean = churn_dataset.drop(columns=['RowNumber', 'CustomerId', 'Surname'])

In [11]:
churn_dataset_clean.head(5)

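|   | CreditScore | Geography | Gender | Age | Tenure | Balance   | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited |
|---|-------------|-----------|--------|-----|--------|-----------|---------------|-----------|----------------|-----------------|--------|
| 0 | 619         | France    | Female | 42  | 2      | 0.0       | 1             | 1         | 1              | 101348.88       | 1      |
| 1 | 608         | Spain     | Female | 41  | 1      | 83807.86  | 1             | 0         | 1              | 112542.58       | 0      |
| 2 | 502         | France    | Female | 42  | 8      | 159660.8  | 3             | 1         | 0              | 113931.57       | 1      |
| 3 | 699         | France    | Female | 39  | 1      | 0.0       | 2             | 0         | 0              | 93826.63        | 0      |
| 4 | 850         | Spain     | Female | 43  | 2      | 125510.82 | 1             | 1         | 1              | 79084.1         | 0      |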


In [12]:
churn_dataset_clean.shape

(10000, 11)

In [21]:
print("\nNumber of Duplicates:", churn_dataset_clean.duplicated().sum())


Number of Duplicates: 0


In [16]:
print("\nMissing Values:")
print(churn_dataset_clean.isnull().sum())


Missing Values:
CreditScore        0
Geography          0
Gender             0
Age                0
Tenure             0
Balance            0
NumOfProducts      0
HasCrCard          0
IsActiveMember     0
EstimatedSalary    0
Exited             0
dtype: int64
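
The duplicate and missing-value checks can also be bundled into one small report that is easy to rerun after each cleaning step. This is a sketch under stated assumptions: `quality_report` is a hypothetical helper, and the toy frame uses made-up rows so that both checks have something to find.

```python
import numpy as np
import pandas as pd


def quality_report(df):
    """Count duplicate rows and missing cells for a quick data-quality check."""
    return {
        "rows": len(df),
        "duplicate_rows": int(df.duplicated().sum()),
        "missing_cells": int(df.isnull().sum().sum()),
    }


# Toy frame with one repeated row and one missing balance (made-up values).
toy = pd.DataFrame({
    "CreditScore": [619, 608, 608, 502],
    "Balance": [0.0, 83807.86, 83807.86, np.nan],
})

report = quality_report(toy)
print(report)
```

Applied to `churn_dataset_clean`, the report would be expected to show zero duplicates and zero missing cells, matching the outputs above.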


In [24]:
print("\nSummary statistics (including the five-number summary):")
churn_dataset_clean.describe()


Summary statistics (including the five-number summary):


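|       | CreditScore | Age       | Tenure   | Balance      | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited   |
|-------|-------------|-----------|----------|--------------|---------------|-----------|----------------|-----------------|----------|
| count | 10000.0     | 10000.0   | 10000.0  | 10000.0      | 10000.0       | 10000.0   | 10000.0        | 10000.0         | 10000.0  |
| mean  | 650.5288    | 38.9218   | 5.0128   | 76485.889288 | 1.5302        | 0.7055    | 0.5151         | 100090.239881   | 0.2037   |
| std   | 96.653299   | 10.487806 | 2.892174 | 62397.405202 | 0.581654      | 0.45584   | 0.499797       | 57510.492818    | 0.402769 |
| min   | 350.0       | 18.0      | 0.0      | 0.0          | 1.0           | 0.0       | 0.0            | 11.58           | 0.0      |
| 25%   | 584.0       | 32.0      | 3.0      | 0.0          | 1.0           | 0.0       | 0.0            | 51002.11        | 0.0      |
| 50%   | 652.0       | 37.0      | 5.0      | 97198.54     | 1.0           | 1.0       | 1.0            | 100193.915      | 0.0      |
| 75%   | 718.0       | 44.0      | 7.0      | 127644.24    | 2.0           | 1.0       | 1.0            | 149388.2475     | 0.0      |
| max   | 850.0       | 92.0      | 10.0     | 250898.09    | 4.0           | 1.0       | 1.0            | 199992.48       | 1.0      |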
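
The `describe()` output reports count, mean, and standard deviation alongside the five-number summary (min, 25%, 50%, 75%, max). If only the five numbers are wanted, they can be pulled with `Series.quantile`. The sketch below assumes a hypothetical helper, `five_number_summary`, and uses the `Age` values from the five preview rows as stand-in data.

```python
import pandas as pd


def five_number_summary(series):
    """Return min, Q1, median, Q3 and max of a numeric Series."""
    summary = series.quantile([0.0, 0.25, 0.5, 0.75, 1.0])
    summary.index = ["min", "Q1", "median", "Q3", "max"]
    return summary


# Stand-in data: the Age values from the five preview rows.
ages = pd.Series([42, 41, 42, 39, 43], name="Age")

age_summary = five_number_summary(ages)
print(age_summary)
```

Because `Exited` is coded 0/1, its mean in the table (0.2037) is also the share of customers who churned, roughly 20% of the 10,000 rows.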
