<a href="https://colab.research.google.com/github/ShreyaKR26/bank-customer-churn/blob/main/bankcustomerchurn.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Objective:

A bank customer churn model predicts which customers are likely to leave or discontinue their banking relationship within a specified timeframe. Identifying these customers early lets the bank apply targeted retention strategies that improve satisfaction and loyalty, reduce churn, and increase profitability. The key goals are to identify high-risk customers through predictive analytics; understand the drivers of churn, such as service quality, fees, and customer service experiences; improve retention through personalized marketing campaigns; and focus resources on the customers most at risk.

Data Sources:

A churn prediction model can draw on several data sources: customer demographics (age, gender, income); account information (account types, tenure, balance history); behavioral data such as website visits and service call frequency; transaction amounts and frequencies; customer feedback from surveys and Net Promoter Scores (NPS); and external market data such as economic indicators and competitive analysis. Combining these sources gives the model a fuller picture of customer behavior and of which retention strategies are likely to work. This notebook uses a single file, Churn_Modelling.csv, whose demographic and account fields appear in the preview below.

In [None]:
# Upload CSV file
from google.colab import files
uploaded = files.upload()

# Import the uploaded CSV file using pandas
import pandas as pd
df = pd.read_csv(list(uploaded.keys())[0])

# Display the first few rows
print(df.head())

Saving Churn_Modelling.csv to Churn_Modelling (1).csv
   RowNumber  CustomerId   Surname  CreditScore Geography  Gender  Age  \
0          1    15634602  Hargrave          619    France  Female   42   
1          2    15647311      Hill          608     Spain  Female   41   
2          3    15619304      Onio          502    France  Female   42   
3          4    15701354      Boni          699    France  Female   39   
4          5    15737888  Mitchell          850     Spain    Male   43   

   Tenure    Balance  NumOfProducts  HasCrCard  IsActiveMember  \
0       2       0.00              1          1               1   
1       1   83807.86              1          0               1   
2       8  159660.80              3          1               0   
3       1       0.00              2          0               0   
4       2  125510.82              1          1               1   

   EstimatedSalary  Exited  
0        101348.88       1  
1        112542.58       0  
2        113931.5

In [None]:
# Import necessary libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score


In [None]:
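# Load dataset: reuse the DataFrame read from the upload above. Colab saved that
# upload as 'Churn_Modelling (1).csv', so reading 'Churn_Modelling.csv' by name
# may pick up an older copy of the file.
data = df.copy()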


In [None]:
# Explore dataset
print(data.head())



   RowNumber  CustomerId   Surname  CreditScore Geography  Gender  Age  \
0          1    15634602  Hargrave          619    France  Female   42   
1          2    15647311      Hill          608     Spain  Female   41   
2          3    15619304      Onio          502    France  Female   42   
3          4    15701354      Boni          699    France  Female   39   
4          5    15737888  Mitchell          850     Spain    Male   43   

   Tenure    Balance  NumOfProducts  HasCrCard  IsActiveMember  \
0       2       0.00              1          1               1   
1       1   83807.86              1          0               1   
2       8  159660.80              3          1               0   
3       1       0.00              2          0               0   
4       2  125510.82              1          1               1   

   EstimatedSalary  Exited  
0        101348.88       1  
1        112542.58       0  
2        113931.57       1  
3         93826.63       0  
4         790

In [None]:
# Select relevant columns (drop unnecessary ones)
data = data.drop(["RowNumber", "CustomerId", "Surname"], axis=1)



In [None]:
# Encode categorical features
label_encoder_geography = LabelEncoder()
data['Geography'] = label_encoder_geography.fit_transform(data['Geography'])

label_encoder_gender = LabelEncoder()
data['Gender'] = label_encoder_gender.fit_transform(data['Gender'])
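
`LabelEncoder` maps each category to an arbitrary integer, which implies an ordering between countries that does not exist. Tree-based models usually tolerate this, but one-hot encoding is a common alternative. The sketch below uses a small made-up frame (the rows, and the `Germany` value, are illustrative assumptions; only the column names mirror the dataset) to show what `pd.get_dummies` produces.

```python
import pandas as pd

# Tiny illustrative frame: the values are invented for this example,
# only the column names follow the dataset used in this notebook.
sample = pd.DataFrame({
    "Geography": ["France", "Spain", "Germany", "France"],
    "Gender": ["Female", "Male", "Male", "Female"],
    "Age": [42, 41, 39, 43],
})

# One-hot encode the nominal columns. drop_first=True drops one level per
# column so the remaining indicator columns are not redundant.
encoded = pd.get_dummies(sample, columns=["Geography", "Gender"], drop_first=True)

print(encoded.columns.tolist())
```

Each remaining indicator column marks membership in one category, with the dropped level (`France`, `Female`) acting as the baseline.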


In [None]:
# Define features (X) and target (y)
X = data.drop("Exited", axis=1)  # Features
y = data["Exited"]             # Target


In [None]:
# Split dataset into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
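
Churn targets are often imbalanced, and a plain random split can leave the test set with few or no churners. A minimal sketch of a stratified split follows, using synthetic stand-in data (`X_demo` and `y_demo` are hypothetical names, not part of this notebook) with a 20% positive rate.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 1000 rows, 5 numeric features, 20% positives.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(1000, 5))
y_demo = np.array([1] * 200 + [0] * 800)

# stratify=y_demo keeps the class ratio the same in both partitions.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print("train positive rate:", y_tr.mean())
print("test positive rate:", y_te.mean())
```

In the notebook itself this would amount to passing `stratify=y` to the existing `train_test_split` call.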



In [None]:
# Standardize features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)


In [None]:
# Train the model (Random Forest Classifier)
model = RandomForestClassifier(random_state=42, n_estimators=100)
model.fit(X_train, y_train)
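
One of the stated goals is understanding the drivers of churn. A fitted random forest exposes `feature_importances_`, which gives a rough ranking of the inputs. The sketch below runs on synthetic data; the feature names are borrowed from the dataset for readability, but the values are generated, so the ranking it prints says nothing about real customers.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Column names borrowed from the dataset; the data itself is synthetic.
feature_names = ["CreditScore", "Age", "Tenure", "Balance", "NumOfProducts"]
X_arr, y_demo = make_classification(
    n_samples=500, n_features=5, n_informative=3, n_redundant=0, random_state=42
)
X_demo = pd.DataFrame(X_arr, columns=feature_names)

clf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_demo, y_demo)

# Impurity-based importances sum to 1; sort them from most to least important.
importances = pd.Series(clf.feature_importances_, index=feature_names)
importances = importances.sort_values(ascending=False)
print(importances)
```

Because the notebook scales `X_train` into a NumPy array, the column names would need to be taken from `X.columns` before scaling.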


In [None]:
# Make predictions
y_pred = model.predict(X_test)



In [None]:
# Evaluate the model
print("Confusion Matrix:\n", confusion_matrix(y_test, y_pred))
print("Classification Report:\n", classification_report(y_test, y_pred))
print("Accuracy:", accuracy_score(y_test, y_pred))


Confusion Matrix:
 [[0 1]
 [0 0]]
Classification Report:
               precision    recall  f1-score   support

           0       0.00      0.00      0.00       1.0
           1       0.00      0.00      0.00       0.0

    accuracy                           0.00       1.0
   macro avg       0.00      0.00      0.00       1.0
weighted avg       0.00      0.00      0.00       1.0

Accuracy: 0.0
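
The report above has a total support of 1, meaning the test set held a single row, so these metrics say little about model quality. A hedged sketch of a more informative check follows: it confirms that both classes are present in the test set and then scores predicted probabilities with ROC AUC. It runs on synthetic data from `make_classification`, and all variable names are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data (roughly 80% class 0, 20% class 1).
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=10, weights=[0.8, 0.2], random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Guard: per-class metrics are ill-defined if a class is missing from the test set.
assert len(set(y_te)) == 2, "test set must contain both classes"

clf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_tr, y_tr)

# Score the probability of the positive class rather than hard labels.
churn_prob = clf.predict_proba(X_te)[:, 1]
auc = roc_auc_score(y_te, churn_prob)
print(f"Test rows: {len(y_te)}, ROC AUC: {auc:.3f}")
```

Printing `X_test.shape` and `y_test.value_counts()` before evaluating is a quick way to catch an undersized test set like the one above.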


  _warn_prf(average, modifier, f"{metric.capitalize()} is", len(result))


In [None]:
# Save the model
import joblib
joblib.dump(model, "bank_customer_churn_model.pkl")


['bank_customer_churn_model.pkl']
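
The saved file holds only the classifier, not the fitted `StandardScaler`, so new data would have to be scaled separately before prediction. One option is to bundle both steps in a scikit-learn `Pipeline` and persist that instead. The sketch below is an assumption about how this could look, using synthetic data and a temporary, hypothetical file name.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data for the demonstration.
X_demo, y_demo = make_classification(n_samples=300, n_features=10, random_state=42)

# Scaling and the classifier travel together as one estimator.
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("model", RandomForestClassifier(n_estimators=100, random_state=42)),
])
pipeline.fit(X_demo, y_demo)

# Save to a temporary directory (hypothetical file name) and load it back.
path = os.path.join(tempfile.mkdtemp(), "churn_pipeline_demo.pkl")
joblib.dump(pipeline, path)
restored = joblib.load(path)

# The restored pipeline accepts raw, unscaled features directly.
same = np.array_equal(pipeline.predict(X_demo), restored.predict(X_demo))
print("Predictions match after reload:", same)
```

With this layout, whoever loads the file calls `predict` on raw features and does not need to recreate the scaler.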

In [None]:
# Load and test the saved model
loaded_model = joblib.load("bank_customer_churn_model.pkl")
print("Loaded model accuracy:", loaded_model.score(X_test, y_test))

Loaded model accuracy: 0.0
