# **Bank Customer Churn Classification**

**Bank Customer Churn:** Understanding Customer Attrition in the Banking Industry

In today's competitive banking landscape, retaining customers is a top priority for financial institutions. However, customer churn, the phenomenon where customers discontinue their relationship with a bank, poses a significant challenge. Understanding why customers churn and identifying early warning signs can enable banks to implement targeted retention strategies and maintain a loyal customer base.

In this notebook, we build a model to classify bank customer churn using this dataset: [Bank Customer Churn](https://www.kaggle.com/datasets/bhuviranga/customer-churn-data)

# **Import the libraries**

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

**Read the dataset with pandas and store it in the DataFrame `df`**

In [2]:
df=pd.read_csv('Bank Customer Churn Prediction.csv')

In [3]:
df.head(10)

| | customer_id | credit_score | country | gender | age | tenure | balance | products_number | credit_card | active_member | estimated_salary | churn |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 15634602 | 619 | France | Female | 42 | 2 | 0.0 | 1 | 1 | 1 | 101348.88 | 1 |
| 1 | 15647311 | 608 | Spain | Female | 41 | 1 | 83807.86 | 1 | 0 | 1 | 112542.58 | 0 |
| 2 | 15619304 | 502 | France | Female | 42 | 8 | 159660.8 | 3 | 1 | 0 | 113931.57 | 1 |
| 3 | 15701354 | 699 | France | Female | 39 | 1 | 0.0 | 2 | 0 | 0 | 93826.63 | 0 |
| 4 | 15737888 | 850 | Spain | Female | 43 | 2 | 125510.82 | 1 | 1 | 1 | 79084.1 | 0 |
| 5 | 15574012 | 645 | Spain | Male | 44 | 8 | 113755.78 | 2 | 1 | 0 | 149756.71 | 1 |
| 6 | 15592531 | 822 | France | Male | 50 | 7 | 0.0 | 2 | 1 | 1 | 10062.8 | 0 |
| 7 | 15656148 | 376 | Germany | Female | 29 | 4 | 115046.74 | 4 | 1 | 0 | 119346.88 | 1 |
| 8 | 15792365 | 501 | France | Male | 44 | 4 | 142051.07 | 2 | 0 | 1 | 74940.5 | 0 |
| 9 | 15592389 | 684 | France | Male | 27 | 2 | 134603.88 | 1 | 1 | 1 | 71725.73 | 0 |


In [4]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10000 entries, 0 to 9999
Data columns (total 12 columns):

| # | Column | Non-Null Count | Dtype |
|---|---|---|---|
| 0 | customer_id | 10000 non-null | int64 |
| 1 | credit_score | 10000 non-null | int64 |
| 2 | country | 10000 non-null | object |
| 3 | gender | 10000 non-null | object |
| 4 | age | 10000 non-null | int64 |
| 5 | tenure | 10000 non-null | int64 |
| 6 | balance | 10000 non-null | float64 |
| 7 | products_number | 10000 non-null | int64 |
| 8 | credit_card | 10000 non-null | int64 |
| 9 | active_member | 10000 non-null | int64 |
| 10 | estimated_salary | 10000 non-null | float64 |
| 11 | churn | 10000 non-null | int64 |

dtypes: float64(2), int64(8), object(2)
memory usage: 937.6+ KB


In [5]:
df.describe()

| | customer_id | credit_score | age | tenure | balance | products_number | credit_card | active_member | estimated_salary | churn |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 | 10000.0 |
| mean | 15690940.0 | 650.5288 | 38.9218 | 5.0128 | 76485.889288 | 1.5302 | 0.7055 | 0.5151 | 100090.239881 | 0.2037 |
| std | 71936.19 | 96.653299 | 10.487806 | 2.892174 | 62397.405202 | 0.581654 | 0.45584 | 0.499797 | 57510.492818 | 0.402769 |
| min | 15565700.0 | 350.0 | 18.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 11.58 | 0.0 |
| 25% | 15628530.0 | 584.0 | 32.0 | 3.0 | 0.0 | 1.0 | 0.0 | 0.0 | 51002.11 | 0.0 |
| 50% | 15690740.0 | 652.0 | 37.0 | 5.0 | 97198.54 | 1.0 | 1.0 | 1.0 | 100193.915 | 0.0 |
| 75% | 15753230.0 | 718.0 | 44.0 | 7.0 | 127644.24 | 2.0 | 1.0 | 1.0 | 149388.2475 | 0.0 |
| max | 15815690.0 | 850.0 | 92.0 | 10.0 | 250898.09 | 4.0 | 1.0 | 1.0 | 199992.48 | 1.0 |


# **Pre-processing the Dataset for Bank Customer Churn Prediction**

To train an accurate and effective churn classifier, the dataset should first be pre-processed. Typical steps are handling missing values, encoding categorical variables, and, for scale-sensitive models, normalizing numerical features. The steps applied in this notebook are described below.

**Handling Missing Values**
Missing values can hurt the classifier's performance. How to deal with them depends on how many there are and why they are missing; common options are dropping the affected rows or columns, or imputing the gaps (for example with the column mean, median, or mode). The first step is to count the missing values in each column:

In [6]:
df.isnull().sum()

customer_id         0
credit_score        0
country             0
gender              0
age                 0
tenure              0
balance             0
products_number     0
credit_card         0
active_member       0
estimated_salary    0
churn               0
dtype: int64
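The output shows zero missing values in every column, so this dataset needs no special handling. For datasets that do have gaps, two common techniques are dropping incomplete rows and imputing the missing entries. A minimal sketch of both on a small, hypothetical DataFrame (the `toy` frame and its values are invented for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with gaps; the churn dataset itself has none
toy = pd.DataFrame({
    "age": [42, np.nan, 39, 50],
    "country": ["France", "Spain", None, "France"],
})

# Option 1: drop every row that contains at least one missing value
dropped = toy.dropna()

# Option 2: impute; median for the numeric column, most frequent value for the categorical one
imputed = toy.copy()
imputed["age"] = imputed["age"].fillna(imputed["age"].median())
imputed["country"] = imputed["country"].fillna(imputed["country"].mode()[0])

print(dropped)
print(imputed)
```

Dropping is simplest but discards data; imputation keeps every row at the cost of inventing plausible values, so it is usually preferred when many rows are affected.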

**Encoding Categorical Variables**
Categorical variables must be converted to numbers before they are fed to the classifier, because scikit-learn estimators expect numeric input. Common encoding techniques include label encoding and one-hot encoding; this notebook uses label encoding:

**Label encoding:** Assigning a unique numerical label to each category of a categorical variable. However, this method may introduce an unintended ordinal relationship among the categories, which may not be appropriate in all cases.

In [7]:
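from sklearn.preprocessing import LabelEncoder

# Encode only the categorical (object-dtype) columns, i.e. 'country' and 'gender'.
# Numeric columns such as credit_score, balance and estimated_salary must keep
# their original values instead of being replaced by integer codes.
for attribute in df.select_dtypes(include='object').columns:
    # A fresh encoder per column, so each column keeps its own class mapping
    df[attribute] = LabelEncoder().fit_transform(df[attribute])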
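Label encoding assigns integer codes in sorted order, which is where the unintended ordinal relationship mentioned above comes from. One-hot encoding (for example with `pandas.get_dummies`) avoids it by creating one 0/1 indicator column per category. A minimal comparison on a few hypothetical `country` values:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample of the 'country' column
countries = pd.Series(["France", "Spain", "Germany", "France"], name="country")

# Label encoding: classes are sorted alphabetically, so France=0, Germany=1, Spain=2.
# A model may read this as Spain > Germany > France, an order that does not exist.
labels = LabelEncoder().fit_transform(countries)

# One-hot encoding: one indicator column per country, no implied order
one_hot = pd.get_dummies(countries, prefix="country", dtype=int)

print(labels)  # [0 2 1 0]
print(one_hot)
```

For tree ensembles such as Random Forest, label-encoding a low-cardinality variable like `country` is generally workable, since trees can isolate each code with successive splits; one-hot encoding matters more for linear and distance-based models. Note also that scikit-learn documents `LabelEncoder` as intended for target labels; `OrdinalEncoder` and `OneHotEncoder` are its feature-side counterparts.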


**Dropping uninformative columns:** `customer_id` is only an identifier for each customer and carries no information about churn, so it is removed before training.

In [9]:
df=df.drop('customer_id' , axis=1)
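The pre-processing overview also lists normalizing numerical features. This notebook skips that step, which is reasonable for Random Forest because tree splits depend only on the ordering of a feature's values. For scale-sensitive models (for example logistic regression or k-nearest neighbours), a minimal standardization sketch with scikit-learn's `StandardScaler` looks like this; the arrays are a few `age`/`balance` values copied from the first rows for illustration only:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Illustrative training/test values for two numeric columns: age and balance
X_train_demo = np.array([[42, 0.0], [41, 83807.86], [42, 159660.80], [39, 0.0]])
X_test_demo = np.array([[43, 125510.82]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train_demo)  # learn mean/std from training data only
X_test_scaled = scaler.transform(X_test_demo)        # reuse training statistics: no test-set leakage

print(X_train_scaled.mean(axis=0))  # approximately [0, 0]
print(X_train_scaled.std(axis=0))   # approximately [1, 1]
```

Fitting the scaler on the training split only, then applying it to the test split, keeps test-set information out of the model.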

# **Classification**
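Random Forest is a popular machine learning algorithm for both classification and regression. It is an ensemble learning method: it trains many decision trees, each on a bootstrap sample of the training data and considering a random subset of features at each split, and then combines their predictions. scikit-learn's `RandomForestClassifier` combines them by averaging the trees' predicted class probabilities.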
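To make the ensemble idea concrete, here is a minimal hand-rolled forest: bootstrap sampling, random feature subsets via `max_features="sqrt"`, and probability averaging. It is an illustrative sketch on synthetic data from `make_classification`, not scikit-learn's actual implementation, and the names `X_demo`/`y_demo` are hypothetical so they do not overwrite the notebook's `X` and `y`:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the churn features (an assumption, not the real data)
X_demo, y_demo = make_classification(n_samples=500, n_features=8, random_state=0)
rng = np.random.default_rng(0)

trees = []
for _ in range(25):
    # Bootstrap: draw rows with replacement
    idx = rng.integers(0, len(X_demo), size=len(X_demo))
    # max_features="sqrt": each split considers only a random subset of features
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=int(rng.integers(1_000_000)))
    trees.append(tree.fit(X_demo[idx], y_demo[idx]))

# Aggregate: average the class probabilities of all trees, then take the most likely class
proba = np.mean([t.predict_proba(X_demo) for t in trees], axis=0)
y_hat = proba.argmax(axis=1)
print("Training accuracy of the hand-rolled forest:", (y_hat == y_demo).mean())
```

In practice `RandomForestClassifier` does all of this internally; its `n_estimators` and `max_features` parameters correspond to the loop count and the per-split feature subset used here.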

In [11]:
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

In [12]:
# Split the dataset into features (X) and target variable (y)
X = df.drop('churn', axis=1)
y = df['churn']

In [13]:
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
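`describe()` reports a churn mean of about 0.20, so roughly one customer in five churned. With imbalanced classes, passing `stratify=y` to `train_test_split` keeps that ratio the same in the training and test sets; the split above does not stratify, so the test-set churn rate can drift slightly. A minimal sketch on hypothetical labels with the same 20% positive rate:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical labels with a 20% positive rate, plus a dummy feature column
y_demo = np.array([1] * 200 + [0] * 800)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)

_, _, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)
print(y_tr.mean(), y_te.mean())  # both 0.2
```

Applied to this notebook, the change would be adding `stratify=y` to the existing call.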

In [14]:
# Initialize and train the Random Forest classifier
rf_classifier = RandomForestClassifier()
rf_classifier.fit(X_train, y_train)


In [15]:
# Make predictions on the test set
y_pred = rf_classifier.predict(X_test)
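The evaluation cell that follows reports accuracy, precision, recall and F1-score. All four derive from the confusion-matrix counts: true negatives (TN), false positives (FP), false negatives (FN) and true positives (TP). A minimal sketch on hypothetical labels and predictions (named `y_true_demo`/`y_pred_demo` so they do not overwrite the notebook's variables), cross-checked against scikit-learn:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score, f1_score

# Hypothetical test labels and predictions (not the notebook's results)
y_true_demo = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred_demo = np.array([1, 1, 0, 0, 1, 0, 0, 0, 0, 0])

# For binary labels, ravel() returns the counts in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()

manual_accuracy = (tp + tn) / (tp + tn + fp + fn)  # share of all predictions that are correct
manual_precision = tp / (tp + fp)  # of customers predicted to churn, how many did
manual_recall = tp / (tp + fn)     # of customers who churned, how many were caught
manual_f1 = 2 * manual_precision * manual_recall / (manual_precision + manual_recall)

print(tn, fp, fn, tp)  # 5 1 2 2
print(manual_accuracy, manual_precision, manual_recall, manual_f1)
```

Because churners are the minority class, accuracy alone can look high even when many churners are missed, which is why precision and recall are reported alongside it.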

In [16]:
# Evaluate the classifier's performance
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

# Print the evaluation metrics
print("Accuracy: {:.2f}".format(accuracy))
print("Precision: {:.2f}".format(precision))
print("Recall: {:.2f}".format(recall))
print("F1-Score: {:.2f}".format(f1))

Accuracy: 0.86
Precision: 0.74
Recall: 0.47
F1-Score: 0.57
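Precision is well above recall here: the model identifies fewer than half of the churners in the test set (recall 0.47). `predict` picks the class with the highest averaged probability, which for two classes is roughly a 0.5 cut-off. One common way to trade some precision for recall is to apply a lower threshold to `predict_proba`. A minimal sketch on synthetic, imbalanced data (the data, the 0.3 threshold and the variable names are assumptions for illustration, not tuned values for this dataset):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the churn data with roughly 20% positives
X_demo, y_demo = make_classification(n_samples=2000, n_features=8, weights=[0.8], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo)

model = RandomForestClassifier(random_state=42).fit(X_tr, y_tr)
churn_proba = model.predict_proba(X_te)[:, 1]  # column 1 = probability of the positive class

recalls = {}
for threshold in (0.5, 0.3):
    y_hat = (churn_proba >= threshold).astype(int)
    recalls[threshold] = recall_score(y_te, y_hat)
    print(f"threshold={threshold}: precision={precision_score(y_te, y_hat):.2f}, "
          f"recall={recalls[threshold]:.2f}")
```

Lowering the threshold can only add positive predictions, so recall never decreases while precision usually drops. Another option is `RandomForestClassifier(class_weight="balanced")`, which weights the minority class more heavily during training. Any threshold should be chosen on validation data rather than on the test set.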
