# <center>💳 Credit Card Customer Churn Prediction 💳</center>


# **Table of Contents**

**1)** [**Introduction**](#Section1)<br>

**2)** [**Importing the Essential Libraries, Metrics**](#Section2)<br>

**3)** [**Loading the Data**](#Section3)<br>

**4)** [**Exploratory Data Analysis**](#Section4)<br>

**5)** [**One-Hot Encoding and Train-Test Split**](#Section5)<br>

**6)** [**Standardizing the Data**](#Section6)<br>

**7)** [**Machine Learning Models**](#Section7)<br>

  **7.1)** [**Logistic Regression**](#Section7.1)<br>

  **7.2)** [**KNN**](#Section7.2)<br>

  **7.3)** [**SVM**](#Section7.3)







<a name = Section1></a>

# **Introduction**

**A bank manager is concerned that more and more customers are leaving the bank's credit card services. The bank would like to predict which customers are likely to churn so that it can proactively reach out to them, offer better services, and change their minds.**

**The dataset comes from https://leaps.analyttica.com/home, a site I have used for a while as a source of datasets to work on. The site also explains how to solve a particular business problem.**

**The dataset covers roughly 10,000 customers and records their age, salary, marital status, credit card limit, credit card category, and so on. There are nearly 18 features.**

**Only 16.07% of the customers have churned, so the classes are imbalanced. This makes it harder to train a model that predicts churning customers.**
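
With so few churners, plain accuracy can look high even for a model that never flags a single one. The sketch below works out that majority-class baseline from the 16.07% rate quoted above. The variable names are illustrative and not part of the notebook.

```python
# Churn rate quoted in the introduction (16.07% of customers churned).
churn_rate = 0.1607

# A trivial model that always predicts "not churned" is correct
# whenever the customer did not churn.
baseline_accuracy = 1 - churn_rate

# The same trivial model never identifies a churner,
# so its recall on the churned class is zero.
baseline_churn_recall = 0.0

print(f"Majority-class baseline accuracy: {baseline_accuracy:.4f}")
print(f"Baseline recall on churners: {baseline_churn_recall:.1f}")
```

Any accuracy reported later is easier to judge when compared against this baseline.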



<a name = Section2></a>

## Importing the Essential Libraries, Metrics


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score
from sklearn.preprocessing import scale, StandardScaler, LabelEncoder
from sklearn.metrics import confusion_matrix, accuracy_score, mean_squared_error, r2_score, roc_auc_score, roc_curve, classification_report
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import GradientBoostingClassifier
import warnings
warnings.filterwarnings("ignore")


<a name = Section3></a>

## Loading the Data


In [None]:
df=pd.read_csv("/content/BankChurners.csv")

## Feature Selection


In [None]:
cols_to_use = ["Attrition_Flag","Customer_Age","Gender","Dependent_count","Education_Level","Marital_Status","Income_Category","Card_Category","Credit_Limit"]


In [None]:
df = df[cols_to_use]


<a name = Section4></a>

# Exploratory Data Analysis


**Taking a look at the first 5 rows of the dataset**



In [None]:
df.head()

Unnamed: 0,Attrition_Flag,Customer_Age,Gender,Dependent_count,Education_Level,Marital_Status,Income_Category,Card_Category,Credit_Limit
0,Existing Customer,45,M,3,High School,Married,$60K - $80K,Blue,12691.0
1,Existing Customer,49,F,5,Graduate,Single,Less than $40K,Blue,8256.0
2,Existing Customer,51,M,3,Graduate,Married,$80K - $120K,Blue,3418.0
3,Existing Customer,40,F,4,High School,Unknown,Less than $40K,Blue,3313.0
4,Existing Customer,40,M,3,Uneducated,Married,$60K - $80K,Blue,4716.0


**Checking the shape—i.e. size—of the data**



In [None]:
df.shape

(10127, 9)

**Checking each column's dtype and its number of non-null values**



In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10127 entries, 0 to 10126
Data columns (total 9 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Attrition_Flag   10127 non-null  object 
 1   Customer_Age     10127 non-null  int64  
 2   Gender           10127 non-null  object 
 3   Dependent_count  10127 non-null  int64  
 4   Education_Level  10127 non-null  object 
 5   Marital_Status   10127 non-null  object 
 6   Income_Category  10127 non-null  object 
 7   Card_Category    10127 non-null  object 
 8   Credit_Limit     10127 non-null  float64
dtypes: float64(1), int64(2), object(6)
memory usage: 712.2+ KB


**Getting the statistical summary of the dataset**



In [None]:
df.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
Customer_Age,10127.0,46.32596,8.016814,26.0,41.0,46.0,52.0,73.0
Dependent_count,10127.0,2.346203,1.298908,0.0,1.0,2.0,3.0,5.0
Credit_Limit,10127.0,8631.953698,9088.77665,1438.3,2555.0,4549.0,11067.5,34516.0


**Checking for the missing values**



In [None]:
df.isna().sum()


Attrition_Flag     0
Customer_Age       0
Gender             0
Dependent_count    0
Education_Level    0
Marital_Status     0
Income_Category    0
Card_Category      0
Credit_Limit       0
dtype: int64
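
A zero count from `isna()` does not necessarily mean that every value is known. The rows shown earlier contain the string `Unknown`, which pandas treats as an ordinary category. The sketch below uses a tiny made-up frame (not the real data) to show how such placeholders could be counted.

```python
import pandas as pd

# Tiny illustrative frame; the values mimic those seen in df.head().
sample = pd.DataFrame({
    "Education_Level": ["High School", "Graduate", "Unknown", "Uneducated"],
    "Marital_Status": ["Married", "Single", "Married", "Unknown"],
    "Income_Category": ["$60K - $80K", "Unknown", "Less than $40K", "Unknown"],
})

# isna() finds nothing because "Unknown" is a regular string.
nan_counts = sample.isna().sum()

# Comparing against the placeholder string reveals the hidden gaps.
unknown_counts = (sample == "Unknown").sum()

print(nan_counts)
print(unknown_counts)
```

Applying the same comparison to the real `df` would show how many rows carry an `Unknown` placeholder in each categorical column.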

**Checking for the duplicated values**



In [None]:
df.duplicated().sum()


30

**Deleting duplicate rows**

In [None]:
df.drop_duplicates(inplace=True)


In [None]:
df.duplicated().sum()


0

<a name = Section5></a>

# One-Hot Encoding and Train-Test Split





**Encoding the categorical features in X with one-hot encoding**

**Splitting the data into Train and Test chunks for better evaluation**


In [None]:
y = df["Attrition_Flag"]
X = df.drop("Attrition_Flag", axis =1)
X = pd.get_dummies(X, columns=["Education_Level","Marital_Status","Income_Category","Card_Category","Gender"])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)


In [None]:
X.head()


Unnamed: 0,Customer_Age,Dependent_count,Credit_Limit,Education_Level_College,Education_Level_Doctorate,Education_Level_Graduate,Education_Level_High School,Education_Level_Post-Graduate,Education_Level_Uneducated,Education_Level_Unknown,...,Income_Category_$60K - $80K,Income_Category_$80K - $120K,Income_Category_Less than $40K,Income_Category_Unknown,Card_Category_Blue,Card_Category_Gold,Card_Category_Platinum,Card_Category_Silver,Gender_F,Gender_M
0,45,3,12691.0,0,0,0,1,0,0,0,...,1,0,0,0,1,0,0,0,0,1
1,49,5,8256.0,0,0,1,0,0,0,0,...,0,0,1,0,1,0,0,0,1,0
2,51,3,3418.0,0,0,1,0,0,0,0,...,0,1,0,0,1,0,0,0,0,1
3,40,4,3313.0,0,0,0,1,0,0,0,...,0,0,1,0,1,0,0,0,1,0
4,40,3,4716.0,0,0,0,0,0,1,0,...,1,0,0,0,1,0,0,0,0,1


In [None]:
X_train.shape


(7572, 26)

In [None]:
X_test.shape


(2525, 26)
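
Because the churned class is small, a purely random split can leave the train and test chunks with slightly different churn rates. One common option is to pass `stratify` to `train_test_split`. The sketch below demonstrates this on synthetic data; `X_demo` and `y_demo` are made-up stand-ins for the real `X` and `y`.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data: 1000 rows, 16% positives.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(1000, 3))
y_demo = np.array([1] * 160 + [0] * 840)

# stratify=y_demo keeps the class ratio the same in both chunks.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=42, stratify=y_demo
)

print(f"Positive share in train: {y_tr.mean():.3f}")
print(f"Positive share in test:  {y_te.mean():.3f}")
```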

<a name = Section6></a>

# Standardizing the Data


In [None]:
scaler = StandardScaler()


In [None]:
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)


In [None]:
label = LabelEncoder()


In [None]:
y_train =  label.fit_transform(y_train)
y_test = label.transform(y_test)
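
`LabelEncoder` assigns integer codes in sorted order of the label strings, so it is worth checking which class ends up as 0 before reading any per-class metric. The sketch below assumes the churned label is the string `Attrited Customer`, which does not appear in the rows shown above.

```python
from sklearn.preprocessing import LabelEncoder

# Assumed target values; "Attrited Customer" is an assumption here.
flags = ["Existing Customer", "Attrited Customer", "Existing Customer"]

encoder = LabelEncoder()
encoded = encoder.fit_transform(flags)

# classes_ is sorted, so index 0 is the alphabetically first label.
mapping = {name: int(code) for code, name in enumerate(encoder.classes_)}
print(mapping)
```

Under that assumption the churned class is encoded as 0 and existing customers as 1. On the real data, `label.classes_` gives the same information.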


<a name = Section7></a>

# Machine Learning Models



<a name = Section7.1></a>

## Logistic Regression


In [None]:
loj_model = LogisticRegression(solver="liblinear").fit(X_train,y_train)


In [None]:
y_pred = loj_model.predict(X_test)


In [None]:
accuracy_score(y_test,y_pred)


0.8297029702970297
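
This accuracy is close to the share of non-churners in the data, so it is worth checking whether the model ever predicts the churned class. The sketch below uses made-up labels (not the notebook's predictions) to show how a confusion matrix and per-class recall expose a model that predicts the majority class for everyone.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score

# Illustrative labels: 1 = existing customer (83%), 0 = churned (17%).
y_true = np.array([1] * 83 + [0] * 17)

# A degenerate model that predicts "existing" for every customer.
y_all_existing = np.ones_like(y_true)

acc = accuracy_score(y_true, y_all_existing)
cm = confusion_matrix(y_true, y_all_existing, labels=[0, 1])
churn_recall = recall_score(y_true, y_all_existing, pos_label=0)

print(f"Accuracy: {acc:.2f}")
print(cm)
print(f"Recall on churned class: {churn_recall:.2f}")
```

Running `confusion_matrix(y_test, y_pred)` or `classification_report(y_test, y_pred)` on the real predictions would show whether this is what is happening here.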

## Model Tuning


In [None]:
loj_model = LogisticRegression(solver="liblinear").fit(X_train,y_train)


In [None]:
y_pred = loj_model.predict(X_test)


In [None]:
cross_val_score(loj_model,X_test,y_test,cv=10).mean()


0.8297023025283895
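
The cell above cross-validates on the test chunk. A more common setup is to cross-validate on the training chunk only and to put the scaler inside a pipeline, so that it is refit within each fold. The sketch below shows that alternative on synthetic data; `X_demo` and `y_demo` are stand-ins, not the notebook's variables.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for the unscaled training features.
X_demo, y_demo = make_classification(
    n_samples=500, n_features=8, weights=[0.84, 0.16], random_state=42
)

# The scaler is fit inside each fold, so no statistics leak across folds.
pipe = make_pipeline(StandardScaler(), LogisticRegression(solver="liblinear"))

# Cross-validate on the training portion only; the test set stays untouched.
scores = cross_val_score(pipe, X_demo, y_demo, cv=10)
print(f"Mean CV accuracy: {scores.mean():.3f}")
```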

<a name = Section7.2></a>

## KNN


In [None]:
knn_model = KNeighborsClassifier().fit(X_train,y_train)


In [None]:
y_pred = knn_model.predict(X_test)


In [None]:
accuracy_score(y_test,y_pred)


0.805940594059406

## Model Tuning


In [None]:
knn = KNeighborsClassifier()


In [None]:
knn_params = {"n_neighbors": np.arange(1,50)}


In [None]:
knn_cv_model = GridSearchCV(knn,knn_params, cv=10).fit(X_train,y_train)


In [None]:
knn_cv_model.best_score_


0.8421820266780061

In [None]:
knn_cv_model.best_params_


{'n_neighbors': 20}

In [None]:
knn_tuned =  KNeighborsClassifier(n_neighbors=20).fit(X_train,y_train)


In [None]:
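y_pred = knn_tuned.predict(X_test)  # evaluate the tuned model, not the untuned knn_model


In [None]:
accuracy_score(y_test,y_pred)


0.805940594059406

**The score above equals the untuned accuracy because it was recorded from `knn_model`. The tuned model's test accuracy has to be recomputed with `knn_tuned`.**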
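
One way to avoid evaluating the wrong model after a grid search is to use the search object's `best_estimator_`. With the default `refit=True`, it is already retrained on the whole training set with the best parameters. The sketch below runs on synthetic data with illustrative variable names.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data for the demonstration.
X_demo, y_demo = make_classification(n_samples=400, n_features=6, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=42
)

search = GridSearchCV(
    KNeighborsClassifier(), {"n_neighbors": np.arange(1, 20)}, cv=5
)
search.fit(X_tr, y_tr)

# best_estimator_ was refit on all of X_tr with best_params_.
tuned = search.best_estimator_
test_accuracy = tuned.score(X_te, y_te)

print(search.best_params_, f"test accuracy: {test_accuracy:.3f}")
```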

<a name = Section7.3></a>

## SVM


In [None]:
svm_model = SVC(kernel="linear").fit(X_train,y_train)


In [None]:
y_pred = svm_model.predict(X_test)


In [None]:
accuracy_score(y_test,y_pred)


0.8297029702970297

## Model Tuning


In [None]:
svm = SVC()


In [None]:
svm_params  = { "C": np.arange(1,10),
               "kernel":["linear","rbf"]}


In [None]:
svm_cv_model = GridSearchCV(svm,svm_params,cv=10, n_jobs=-1, verbose=2).fit(X_train,y_train)


Fitting 10 folds for each of 18 candidates, totalling 180 fits
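
The numbers in that log line follow directly from the grid: 9 values of `C` times 2 kernels gives 18 candidates, and each candidate is fit once per fold. The sketch below checks that arithmetic with `ParameterGrid`.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

# Same grid as in the tuning cell.
svm_params = {"C": np.arange(1, 10), "kernel": ["linear", "rbf"]}

n_candidates = len(ParameterGrid(svm_params))  # 9 values of C x 2 kernels
n_fits = n_candidates * 10                     # cv=10 folds per candidate

print(n_candidates, n_fits)
```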


In [None]:
svm_cv_model.best_params_


{'C': 1, 'kernel': 'linear'}

In [None]:
svm_tuned  = SVC(C=1, kernel='linear').fit(X_train,y_train)


In [None]:
y_pred = svm_tuned.predict(X_test)


In [None]:
accuracy_score(y_test,y_pred)


0.8297029702970297
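
Tuning `C` and the kernel left the accuracy unchanged. Given the class imbalance, one further option that might be worth trying is `class_weight="balanced"`, which penalises mistakes on the rare class more heavily. The sketch below compares minority-class recall with and without it on synthetic data, where class 1 is the rare class. It is an illustration only and makes no claim about results on the bank data.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic imbalanced data: roughly 16% of samples belong to class 1.
X_demo, y_demo = make_classification(
    n_samples=600, n_features=8, weights=[0.84, 0.16], random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, random_state=42, stratify=y_demo
)

# Same hyperparameters; only the class weighting differs.
plain = SVC(C=1, kernel="linear").fit(X_tr, y_tr)
weighted = SVC(C=1, kernel="linear", class_weight="balanced").fit(X_tr, y_tr)

recall_plain = recall_score(y_te, plain.predict(X_te), pos_label=1)
recall_weighted = recall_score(y_te, weighted.predict(X_te), pos_label=1)

print(f"Minority recall, unweighted: {recall_plain:.3f}")
print(f"Minority recall, balanced:   {recall_weighted:.3f}")
```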