# **Bank Customer Churn Prediction**

-------------

## **Objective**

To predict whether a bank customer will churn, using a Random Forest classifier trained on historical customer data.

## **Data Source**

Kaggle (the notebook reads the dataset from the local file `Bank_churn.csv`).

## **Import Library**

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix


## **Import Data**

In [None]:
df = pd.read_csv('Bank_churn.csv')


## **Describe Data**

In [None]:
df.info()  # prints its summary directly and returns None, so it is not wrapped in print()
print(df.describe())
print(df.head())


## **Data Visualization**

In [None]:
sns.countplot(x='Churn', data=df)
plt.show()
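
The count plot shows class balance visually; the same information as proportions is often easier to quote. A minimal sketch on a toy frame (the values are made up for illustration, and only the `Churn` column name comes from this notebook):

```python
import pandas as pd

# Toy stand-in for the real dataset: 8 non-churners and 2 churners (made-up values)
toy = pd.DataFrame({"Churn": [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]})

# Share of each class; a heavily skewed split means accuracy alone can mislead
class_share = toy["Churn"].value_counts(normalize=True)
print(class_share)
```

On the real data, the equivalent call would be `df['Churn'].value_counts(normalize=True)`.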


## **Data Preprocessing**

In [None]:
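# One-hot encode categorical columns; drop_first=True drops one redundant dummy per category
df = pd.get_dummies(df, drop_first=True)
# Standardize the features. Note that scaled_features is not used by the later cells:
# X is built from the unscaled df, which is fine for tree-based models such as Random Forest.
scaler = StandardScaler()
scaled_features = scaler.fit_transform(df.drop('Churn', axis=1))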
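
If scaling is wanted (for example, to swap in a scale-sensitive model later), one common pattern is to put the scaler in a pipeline so it is fitted on the training rows only. The sketch below uses synthetic data from `make_classification` rather than the bank dataset, and the names `X_demo`, `y_demo`, and `pipe` are illustrative, not part of this notebook:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption: not the real churn dataset)
X_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42
)

# The pipeline fits the scaler on X_tr only, then reuses those statistics on X_te
pipe = make_pipeline(
    StandardScaler(),
    RandomForestClassifier(n_estimators=100, random_state=42),
)
pipe.fit(X_tr, y_tr)
test_preds = pipe.predict(X_te)

# The scaler's learned means match the training split, not the full data
scaler_means = pipe.named_steps["standardscaler"].mean_
print(np.allclose(scaler_means, X_tr.mean(axis=0)))
```

Fitting the scaler before the split, as in the cell above, lets test-set statistics influence the transformation; the pipeline avoids that.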


## **Define Target Variable (y) and Feature Variables (X)**

In [None]:
X = df.drop('Churn', axis=1)
y = df['Churn']


## **Train Test Split**

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
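
When churners are a minority, a plain random split can leave the train and test sets with different churn rates. A minimal sketch of a stratified split, using made-up labels (80 non-churners, 20 churners) rather than the real data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up imbalanced labels: 80 non-churners (0) and 20 churners (1)
y_demo = np.array([0] * 80 + [1] * 20)
X_demo = np.arange(100).reshape(-1, 1)  # placeholder single feature

# stratify=y_demo keeps the class proportions the same in both splits
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)

print(y_tr.mean(), y_te.mean())  # churn rate in each split
```

For this notebook, that would mean adding `stratify=y` to the existing `train_test_split` call.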


## **Modeling**

In [None]:
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
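
A fitted `RandomForestClassifier` exposes `feature_importances_`, which can be paired with the column names to see which inputs the forest relied on most. The sketch below uses synthetic data and hypothetical feature names (`feat_a` to `feat_d`), not the bank dataset:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data with hypothetical column names
X_arr, y_demo = make_classification(
    n_samples=300, n_features=4, n_informative=2, n_redundant=0, random_state=42
)
X_demo = pd.DataFrame(X_arr, columns=["feat_a", "feat_b", "feat_c", "feat_d"])

forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_demo, y_demo)

# Impurity-based importances, labelled by column and sorted from most to least important
importances = pd.Series(
    forest.feature_importances_, index=X_demo.columns
).sort_values(ascending=False)
print(importances)
```

For the model above, the same pattern would be `pd.Series(model.feature_importances_, index=X_train.columns)`.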


## **Model Evaluation**

In [None]:
predictions = model.predict(X_test)
print(classification_report(y_test, predictions))
print(confusion_matrix(y_test, predictions))
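
The numbers in the classification report can be derived from the confusion matrix. A small worked example on toy labels (these are not results from the churn model) shows the relationship:

```python
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score

# Toy labels for illustration only (1 = churn, 0 = no churn)
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0]

# For binary labels, ravel() returns the cells in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision = tp / (tp + fp)  # of the predicted churners, how many really churned
recall = tp / (tp + fn)     # of the real churners, how many were caught
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(precision, recall, f1)
```

The hand-computed values agree with `precision_score`, `recall_score`, and `f1_score` for the positive class.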


## **Prediction**

In [None]:
new_data = [...]  # Example new data: rows with the same columns, in the same order, as X
new_predictions = model.predict(new_data)
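
New rows have to go through the same encoding as the training data, and a single new customer rarely contains every category level. One way to line the columns up is `reindex`. This is a sketch with hypothetical columns (`Age`, `Geography`) and made-up values, since the real column names are not shown in this notebook:

```python
import pandas as pd

# Hypothetical training rows (made-up values)
train_raw = pd.DataFrame(
    {"Age": [30, 45, 52], "Geography": ["France", "Spain", "Germany"]}
)
train_X = pd.get_dummies(train_raw, drop_first=True)

# One hypothetical new customer; only one category level is present
new_raw = pd.DataFrame({"Age": [41], "Geography": ["Spain"]})

# Encode, then force the training column layout; missing dummy columns become 0
new_X = pd.get_dummies(new_raw).reindex(columns=train_X.columns, fill_value=0)
print(new_X)
```

The aligned frame can then be passed to `model.predict`.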


## **Explanation**

This notebook trains a Random Forest classifier to predict whether a bank customer is likely to churn, based on historical data. The classification report gives precision, recall, and F1-score for each class. For the churn class, precision is the fraction of predicted churners who actually churned, and recall is the fraction of actual churners the model identified. The F1-score is the harmonic mean of the two. The confusion matrix breaks the same predictions down into true positives, false positives, true negatives, and false negatives, showing where the model errs.

High accuracy alone does not show that the model separates churners from non-churners well: if churners are a small minority, a model can score high accuracy while missing most of them. Precision and recall on the churn class therefore deserve closer attention, especially when false negatives (missed churns) carry a significant business cost.

Feature importances from the fitted model (its `feature_importances_` attribute) indicate which customer attributes most influence the predictions, which can inform retention strategy. The model should be re-evaluated and retrained periodically as customer behavior changes.
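
When missed churns are costly, one option is to classify from predicted probabilities with a cutoff below 0.5, trading some precision for recall. The sketch below uses synthetic imbalanced data, and the 0.3 cutoff is an arbitrary illustration rather than a recommendation:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic data with roughly 20% positives (assumption: not the real churn dataset)
X_demo, y_demo = make_classification(
    n_samples=500, n_features=6, weights=[0.8, 0.2], random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=42, stratify=y_demo
)

forest = RandomForestClassifier(n_estimators=100, random_state=42).fit(X_tr, y_tr)

# Probability of the positive (churn) class for each test row
churn_proba = forest.predict_proba(X_te)[:, 1]

preds_default = (churn_proba >= 0.5).astype(int)  # 0.5 cutoff
preds_lenient = (churn_proba >= 0.3).astype(int)  # lower cutoff flags more customers

recall_default = recall_score(y_te, preds_default)
recall_lenient = recall_score(y_te, preds_lenient)
print(recall_default, recall_lenient)
```

Lowering the cutoff can only keep or increase recall, because every customer flagged at 0.5 is also flagged at 0.3. The cost is more false positives, so the cutoff should be chosen against the actual cost of each error type.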