# **Customer Churn Prediction**

Develop a model to predict customer churn for a subscription-based service or business. Use historical customer data, including features such as usage behavior and customer demographics, and compare algorithms such as Logistic Regression, Random Forest, and Gradient Boosting.

**Dataset Link:**
https://www.kaggle.com/datasets/shantanudhakadd/bank-customer-churn-prediction

**Dataset Description:** The dataset contains records of a U.S. bank's customers and is used to predict whether a given customer will leave the bank. It includes details such as CustomerId, Surname, CreditScore, and several other attributes.

**1. Import Libraries and Load data**

In [1]:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

In [3]:
data = pd.read_csv('/content/Churn_Modelling.csv')
data

| | RowNumber | CustomerId | Surname | CreditScore | Geography | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 15634602 | Hargrave | 619 | France | Female | 42 | 2 | 0.00 | 1 | 1 | 1 | 101348.88 | 1 |
| 1 | 2 | 15647311 | Hill | 608 | Spain | Female | 41 | 1 | 83807.86 | 1 | 0 | 1 | 112542.58 | 0 |
| 2 | 3 | 15619304 | Onio | 502 | France | Female | 42 | 8 | 159660.80 | 3 | 1 | 0 | 113931.57 | 1 |
| 3 | 4 | 15701354 | Boni | 699 | France | Female | 39 | 1 | 0.00 | 2 | 0 | 0 | 93826.63 | 0 |
| 4 | 5 | 15737888 | Mitchell | 850 | Spain | Female | 43 | 2 | 125510.82 | 1 | 1 | 1 | 79084.10 | 0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9995 | 9996 | 15606229 | Obijiaku | 771 | France | Male | 39 | 5 | 0.00 | 2 | 1 | 0 | 96270.64 | 0 |
| 9996 | 9997 | 15569892 | Johnstone | 516 | France | Male | 35 | 10 | 57369.61 | 1 | 1 | 1 | 101699.77 | 0 |
| 9997 | 9998 | 15584532 | Liu | 709 | France | Female | 36 | 7 | 0.00 | 1 | 0 | 1 | 42085.58 | 1 |
| 9998 | 9999 | 15682355 | Sabbatini | 772 | Germany | Male | 42 | 3 | 75075.31 | 2 | 1 | 0 | 92888.52 | 1 |
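
Before modelling, it can help to check for missing values and to see how imbalanced the target is. The sketch below re-types the first five rows of the table above into a small DataFrame (the name `sample` and the subset of columns are chosen only for illustration); on the real data the same two calls would be applied to `data`.

```python
import pandas as pd

# First five rows of the displayed table, re-typed by hand for illustration.
sample = pd.DataFrame({
    "CreditScore": [619, 608, 502, 699, 850],
    "Geography": ["France", "Spain", "France", "France", "Spain"],
    "Gender": ["Female"] * 5,
    "Age": [42, 41, 42, 39, 43],
    "Balance": [0.00, 83807.86, 159660.80, 0.00, 125510.82],
    "Exited": [1, 0, 1, 0, 0],
})

# Number of missing values per column.
missing_per_column = sample.isna().sum()

# Share of each target class; a small share of 1s signals class imbalance.
class_share = sample["Exited"].value_counts(normalize=True)

print(missing_per_column)
print(class_share)
```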


**2. Drop irrelevant columns**

In [5]:
data = data.drop(['RowNumber', 'CustomerId', 'Surname'], axis=1)

**3. Define features and target**

In [6]:
X = data.drop(columns=['Exited'])
y = data['Exited']

**4. Identify categorical and numerical features**

In [7]:
categorical_feature = ['Geography', 'Gender']
numerical_feature = X.drop(columns=categorical_feature).columns.tolist()
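
The categorical columns above are listed by hand. A possible alternative, sketched here under the assumption that text columns are the only categorical ones, is to derive both lists from the column dtypes with `select_dtypes`. The frame `X_demo` and the `*_demo` lists are hypothetical stand-ins for `X` and the lists above.

```python
import pandas as pd

# Toy stand-in for X with the same kinds of columns (values from the table above).
X_demo = pd.DataFrame({
    "CreditScore": [619, 608, 502],
    "Geography": ["France", "Spain", "France"],
    "Gender": ["Female", "Female", "Female"],
    "Balance": [0.00, 83807.86, 159660.80],
})

# Non-numeric columns are treated as categorical, numeric ones as numerical.
categorical_demo = X_demo.select_dtypes(exclude="number").columns.tolist()
numerical_demo = X_demo.select_dtypes(include="number").columns.tolist()

print(categorical_demo)
print(numerical_demo)
```

Binary flags such as `HasCrCard` and `IsActiveMember` are stored as integers, so this approach would keep them in the numerical list, which matches what the cell above does.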

**5. Preprocessing pipelines**

In [9]:
numerical_pipeline = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())
])

In [10]:
categorical_pipeline = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))
])

In [11]:
preprocessor = ColumnTransformer(transformers=[
    ('numerical', numerical_pipeline, numerical_feature),
    ('categorical', categorical_pipeline, categorical_feature)
])
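
To see what a preprocessor of this shape produces, the sketch below rebuilds the same two pipelines and applies them to a four-row toy frame with one missing number and one missing category. The names `toy`, `num_cols`, `cat_cols`, and `demo_preprocessor` are illustrative, and `sparse_threshold=0` is added here only to force a dense array that is easy to inspect.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy frame with one missing number and one missing category.
toy = pd.DataFrame({
    "CreditScore": [619.0, 608.0, np.nan, 699.0],
    "Age": [42, 41, 42, 39],
    "Geography": ["France", "Spain", np.nan, "Germany"],
    "Gender": ["Female", "Male", "Female", "Male"],
})
num_cols = ["CreditScore", "Age"]
cat_cols = ["Geography", "Gender"]

demo_preprocessor = ColumnTransformer(
    transformers=[
        ("numerical", Pipeline([
            ("imputer", SimpleImputer(strategy="median")),
            ("scaler", StandardScaler()),
        ]), num_cols),
        ("categorical", Pipeline([
            ("imputer", SimpleImputer(strategy="constant", fill_value="missing")),
            ("onehot", OneHotEncoder(handle_unknown="ignore")),
        ]), cat_cols),
    ],
    sparse_threshold=0,  # always return a dense array
)

transformed = demo_preprocessor.fit_transform(toy)

# 2 scaled numeric columns + 4 Geography indicators (France, Germany, Spain,
# and the imputed 'missing' category) + 2 Gender indicators.
print(transformed.shape)
```

The missing `Geography` value becomes its own `missing` category rather than being dropped, and the missing `CreditScore` is replaced by the column median before scaling.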

**6. Split data into training and test sets**

In [12]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
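
The split above is purely random. Because churners are the minority class (393 of the 2,000 test rows in the reports further down), one option is to pass `stratify=y` so that both splits keep the same churn rate. The following is a minimal sketch on synthetic labels; `X_toy`, `y_toy`, and the split names are made-up stand-ins, and applying the same argument to the real data would change the split and therefore the reported numbers.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 100 rows, 20 of them labelled as churn (1).
X_toy = np.arange(100).reshape(-1, 1)
y_toy = np.array([1] * 20 + [0] * 80)

X_toy_train, X_toy_test, y_toy_train, y_toy_test = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=42, stratify=y_toy
)

# Both splits keep the 20% churn rate of the full data.
print(y_toy_train.mean(), y_toy_test.mean())
```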

**7. Define, train, and evaluate models**

In [13]:
models = {
    'Logistic Regression': LogisticRegression(random_state=42, max_iter=1000),
    'Random Forest': RandomForestClassifier(random_state=42),
    'Gradient Boosting': GradientBoostingClassifier(random_state=42)
}
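
The reports further down show that Logistic Regression recovers only a small share of churners (recall 0.20 for class 1). One common remedy for imbalanced targets is `class_weight='balanced'`, which `LogisticRegression` and `RandomForestClassifier` accept (`GradientBoostingClassifier` has no such parameter). The sketch below illustrates the idea on synthetic data from `make_classification`; it is not a result for the bank dataset, and all variable names are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced data: roughly 20% positives, loosely mimicking churn.
X_syn, y_syn = make_classification(
    n_samples=2000, n_features=10, n_informative=5,
    weights=[0.8, 0.2], random_state=42,
)
X_syn_train, X_syn_test, y_syn_train, y_syn_test = train_test_split(
    X_syn, y_syn, test_size=0.2, random_state=42, stratify=y_syn
)

recalls = {}
for label, weight in [("unweighted", None), ("balanced", "balanced")]:
    clf = LogisticRegression(max_iter=1000, class_weight=weight, random_state=42)
    clf.fit(X_syn_train, y_syn_train)
    # Recall for the positive class: share of true positives that were found.
    recalls[label] = recall_score(y_syn_test, clf.predict(X_syn_test))

print(recalls)
```

Higher recall from reweighting usually comes with lower precision, so the choice depends on the relative cost of missing a churner versus contacting a customer who would have stayed.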

In [14]:
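results = {}
for name, model in models.items():
    # Bundle preprocessing and the estimator so that the imputers, scaler,
    # and encoder are fitted on the training data only.
    pipeline = Pipeline(steps=[
        ('preprocessor', preprocessor),
        ('model', model)
    ])
    pipeline.fit(X_train, y_train)
    y_pred = pipeline.predict(X_test)
    # Per-class precision, recall, and F1 on the held-out test set.
    report = classification_report(y_test, y_pred)
    results[name] = report

# Print one report per model.
for model_name, metrics in results.items():
    print(f"Results for {model_name}:")
    print(metrics)
    print("\n")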

Results for Logistic Regression:
              precision    recall  f1-score   support

           0       0.83      0.96      0.89      1607
           1       0.55      0.20      0.29       393

    accuracy                           0.81      2000
   macro avg       0.69      0.58      0.59      2000
weighted avg       0.78      0.81      0.77      2000



Results for Random Forest:
              precision    recall  f1-score   support

           0       0.88      0.96      0.92      1607
           1       0.75      0.46      0.57       393

    accuracy                           0.86      2000
   macro avg       0.81      0.71      0.75      2000
weighted avg       0.85      0.86      0.85      2000



Results for Gradient Boosting:
              precision    recall  f1-score   support

           0       0.88      0.96      0.92      1607
           1       0.74      0.47      0.58       393

    accuracy                           0.86      2000
   macro avg       0.81      0.72