# Bank Customer Churn Model

-------------

## **Objective**

The objective of this project is to build a machine learning model to predict whether a bank customer will churn (leave the bank) based on various features such as customer demographics, account information, and transaction history.

## **Data Source**

The dataset for this project can be sourced from Kaggle: Bank Customer Churn Dataset

## **Import Libraries**

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score

## **Import Data**

In [None]:
data = pd.read_csv('Churn_Modelling.csv')

## **Describe Data**

In [None]:
# Display the first few rows of the dataset
print(data.head())

# Summary statistics of the dataset
print(data.describe())

# Information about the dataset (info() prints directly and returns None)
data.info()

## **Data Visualization**

In [None]:
# Visualize the distribution of the target variable
sns.countplot(x='Exited', data=data)
plt.title('Distribution of Churned and Non-Churned Customers')
plt.show()

# Visualize some feature distributions
sns.histplot(data['CreditScore'], kde=True, bins=30)
plt.title('Distribution of Credit Scores')
plt.show()

## **Data Preprocessing**

In [None]:
# Check for missing values
print(data.isnull().sum())

# Drop irrelevant columns
data = data.drop(['RowNumber', 'CustomerId', 'Surname'], axis=1)

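# Encode categorical variables as integers.
# A separate encoder per column keeps each fitted mapping available for inverse_transform.
gender_encoder = LabelEncoder()
geography_encoder = LabelEncoder()
data['Gender'] = gender_encoder.fit_transform(data['Gender'])
data['Geography'] = geography_encoder.fit_transform(data['Geography'])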

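# Standardize the feature variables.
# Note: fitting the scaler on the full dataset lets test-set statistics
# influence training; fit it on the training split alone for a stricter evaluation.
features = data.drop('Exited', axis=1)
scaler = StandardScaler()
scaled_features = scaler.fit_transform(features)

# Convert scaled features back to a dataframe, keeping the original column names and index
scaled_data = pd.DataFrame(scaled_features, columns=features.columns, index=features.index)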
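`LabelEncoder` maps a nominal column such as `Geography` to the integers 0, 1, 2, which a linear model treats as an ordering. One common alternative is one-hot encoding. The block below is a minimal sketch on a small hypothetical frame (`toy` and its values are made up for illustration, not taken from the dataset):

```python
import pandas as pd

# Hypothetical stand-in for a few rows of the churn data
toy = pd.DataFrame({
    "Geography": ["France", "Spain", "Germany", "France"],
    "Gender": ["Female", "Male", "Male", "Female"],
    "Age": [42, 41, 39, 50],
})

# One indicator column per category; drop_first removes the redundant first level
encoded = pd.get_dummies(
    toy, columns=["Geography", "Gender"], drop_first=True, dtype=int
)
print(encoded.columns.tolist())
```

With `drop_first=True`, the dropped level (here `France` and `Female`) becomes the baseline against which the remaining indicator coefficients are read.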


## **Define Target Variable (y) and Feature Variables (X)**

In [None]:
X = scaled_data
y = data['Exited']

## **Train Test Split**

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
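When the target is imbalanced, a plain random split can leave the train and test sets with slightly different class ratios. Passing `stratify` to `train_test_split` keeps the ratio the same in both. A minimal sketch on synthetic data (the 20% positive rate and the names `X_demo`, `y_demo` are assumptions for illustration):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced target: 200 positives out of 1000 rows
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(1000, 3))
y_demo = np.array([1] * 200 + [0] * 800)

# stratify=y_demo preserves the class proportions in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

print("Positive rate (train):", y_tr.mean())
print("Positive rate (test):", y_te.mean())
```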


## **Modeling**

In [None]:
model = LogisticRegression()
model.fit(X_train, y_train)
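Scaling and model fitting can also be wrapped in a single scikit-learn pipeline, so that the scaler is fitted on the training rows only. This is a sketch of that pattern on synthetic data from `make_classification`, not the notebook's exact workflow; `X_demo`, `y_demo`, and `pipe` are illustrative names:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the churn features and target
X_demo, y_demo = make_classification(n_samples=500, n_features=6, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# fit() runs the scaler's fit_transform on the training data only,
# then fits the classifier on the scaled result
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipe.fit(X_tr, y_tr)

# score() applies the already-fitted scaler to the test data before predicting
print("Test accuracy:", pipe.score(X_te, y_te))
```

Because the pipeline carries its own preprocessing, the same object can be passed unscaled rows at prediction time.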


## **Model Evaluation**

In [None]:
# Predictions on the test set
y_pred = model.predict(X_test)

# Confusion matrix
conf_matrix = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:\n", conf_matrix)

# Classification report
class_report = classification_report(y_test, y_pred)
print("Classification Report:\n", class_report)

# Accuracy score
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy Score:", accuracy)
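Accuracy alone can look good on an imbalanced target even when few churners are identified, so a threshold-independent metric such as ROC AUC is a useful complement. A minimal sketch on synthetic imbalanced data (the 80/20 class weights and all variable names are assumptions for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data (roughly 80% negatives) standing in for the churn set
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=6, weights=[0.8, 0.2], random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)

# Column 1 of predict_proba holds the estimated probability of the positive class
churn_proba = clf.predict_proba(X_te)[:, 1]
auc = roc_auc_score(y_te, churn_proba)
print("ROC AUC:", round(auc, 3))
```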


## **Prediction**

In [None]:
# Example prediction: double brackets keep the row as a one-row DataFrame,
# so the feature names match those seen during fit
sample_data = X_test.iloc[[0]]
sample_prediction = model.predict(sample_data)
print("Sample Prediction:", sample_prediction)
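Beyond the hard 0/1 label, `predict_proba` returns the estimated churn probability for a row, which can be more informative when ranking customers by risk. A small sketch on hypothetical, already-scaled values (the frame `X_demo` and its numbers are invented for illustration):

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical, already-scaled features for six customers
X_demo = pd.DataFrame({
    "Age": [-1.2, -0.5, 0.1, 0.4, 1.0, 1.5],
    "Balance": [-0.8, 0.3, -0.2, 0.9, 0.5, 1.1],
})
y_demo = pd.Series([0, 0, 0, 1, 1, 1])

clf = LogisticRegression().fit(X_demo, y_demo)

# A one-row DataFrame keeps the column names the model was fitted with
row = X_demo.iloc[[0]]
label = clf.predict(row)[0]
proba = clf.predict_proba(row)[0, 1]  # estimated probability of class 1

print("Predicted label:", label)
print("Estimated probability of class 1:", round(proba, 3))
```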

## **Explanation**

In this project, we built a logistic regression model to predict whether a bank customer will churn. We first preprocessed the data by dropping identifier columns, encoding the categorical variables, and standardizing the features, then split the data into training (80%) and test (20%) sets. We evaluated the model on the test set with a confusion matrix, a classification report, and an accuracy score, and made a sample prediction to show how the fitted model is used. The project illustrates how logistic regression can be applied to customer churn prediction in banking.