# CUSTOMER CHURN ANALYSIS AND CLASSIFICATION


## AIM

With the rapid expansion of the telecom industry, service providers are increasingly focused on growing their subscriber base. In this competitive environment, retaining existing customers has become a challenge, and acquiring a new customer is widely considered to cost more than keeping an existing one. Telecom companies should therefore use advanced analytics to understand customer behaviour and predict which customers are likely to leave the service.

The analysis addresses the following questions:


a) What factors contribute to customer churn?

b) Which customers are more likely to churn?

c) What can be done to prevent them from leaving?

## Overview
1. Import data and python packages
    * Import packages
    * Import data
    * Data shape and info
2. Data visualization
    * Count Plots
    * Pie Plots
    * Box Plots
    * Heatmap(Correlation)
    * Pairplot
3. Classification

    3.1 Split data as train and test 
    
    3.2 Functions for models
    
    3.3 Models
      * Decision Tree Classifier
      * Gradient Boosting Classifier
      * KNN Classifier
      * Random Forest Classifier
      * Artificial Neural Network
4. Result
      * Cross validation scores

<a id="t1."></a>
# 1. Import data and python packages

In [3]:
#installation of packages and libraries required
!pip install mglearn tensorflow


In [4]:
#modules and libraries

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
import mglearn

from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

## Machine learning models (different algorithms)
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

from sklearn.metrics import classification_report
from sklearn.model_selection import cross_val_score
from sklearn.metrics import confusion_matrix,accuracy_score, f1_score, precision_score, recall_score
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense , Activation
from tensorflow.keras.utils import to_categorical

import warnings
warnings.simplefilter(action='ignore', category=FutureWarning)


In [None]:
churn_df = pd.read_csv('telecom_churn.csv')

In [None]:
# Show Data
churn_df.head()

In [5]:
# Data shape
churn_df.shape


In [None]:
#Column names, dtypes and non-null counts
churn_df.info()

> There are no null or missing values.

> All attributes are numeric (int or float).

In [None]:
#Data Description
churn_df_describe = churn_df.describe().T
churn_df_describe

Descriptive statistics for the attributes of the dataframe:
- The count is the same for every attribute, which confirms there is no missing data.
- Comparing the min and max with the mean shows whether extreme values are likely to have pulled the mean.
- The percentiles (25%, 50%, 75%) support the same check: a large gap between the median (50%) and the mean suggests skew or outliers.


# 2. Data visualization


#### 2.1 Analysis of target label

In [None]:
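#churn feature as target
sizes = churn_df['Churn'].value_counts(sort = True)

# value_counts sorts by frequency, so the majority class comes first;
# derive labels, colours and explode from the index (0 = Not Churn, 1 = Churn)
labels = ['Churn' if cls == 1 else 'Not Churn' for cls in sizes.index]
colors = ["#FF6A6A" if cls == 1 else "#C2C4E2" for cls in sizes.index]
explode = [0.2 if cls == 1 else 0 for cls in sizes.index]

fig1, ax1 = plt.subplots(figsize=(8, 6))
ax1.pie(sizes, explode=explode, colors=colors, labels=labels, autopct='%1.1f%%', shadow=True)
ax1.axis('equal')

plt.show()
sizes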

Target labels interpretation:

- 0 class: The customer will not leave the company service
- 1 class: The customer will leave the company service


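Most instances, almost 85%, belong to the Not Churn class (0); only about 15% of customers churn. The classes are therefore imbalanced, so accuracy alone can be misleading, and precision, recall and F1 are reported for each model as well.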
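With an imbalance like this, a model that never predicts churn already scores well on accuracy. A minimal sketch of that majority-class baseline using scikit-learn's `DummyClassifier`; the labels are hypothetical, chosen only to mirror the roughly 85/15 split, not taken from `churn_df`.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical labels with an 85/15 split (assumption, not the real churn data)
y = np.array([0] * 85 + [1] * 15)
X = np.zeros((len(y), 1))  # features are ignored by the dummy model

# Always predict the most frequent class (0 = Not Churn)
baseline = DummyClassifier(strategy="most_frequent").fit(X, y)
pred = baseline.predict(X)

print("Accuracy:", accuracy_score(y, pred))  # 0.85
print("Recall  :", recall_score(y, pred))    # 0.0, not a single churner is found
```

On the real split, `DummyClassifier(strategy="most_frequent").fit(X_train, y_train).score(X_test, y_test)` would give the accuracy that the models in section 3 need to beat.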

#### 2.2 Analysis of attributes; Counts

In [None]:
ax = plt.figure(figsize=(12,10))

#Contract Renewal
plt.subplot(2,2,1)
sns.countplot(data = churn_df , x = "ContractRenewal" ,palette=["lightgreen","yellow"], edgecolor='k')
plt.title("Count Plot for attribute Contract Renewal" , size=15, fontweight='bold', fontfamily='monospace')
plt.xlabel("Contract Renewal", size=13, fontweight='light', fontfamily='monospace')
plt.ylabel('')

#DataPlan
plt.subplot(2,2,2)
sns.countplot(data = churn_df , x = "DataPlan" , palette=["lightgreen","yellow"], edgecolor='k')
plt.title("Count Plot for attribute Data Plan" , size=15, fontweight='bold', fontfamily='monospace')
plt.xlabel("Data Plan", size=13, fontweight='light', fontfamily='monospace')
plt.ylabel('')

#Customer Service Calls
plt.subplot(2,2,(3,4))
sns.countplot(data = churn_df , x = "CustServCalls" , palette=["lightgreen","yellow"], edgecolor='k')
plt.title("Count Plot for attribute Customer Service Calls" , size=15, fontweight='bold', fontfamily='monospace')
plt.xlabel("Customer Service Calls", size=13, fontweight='light', fontfamily='monospace')
plt.ylabel('')
plt.tight_layout()
plt.show()

Interpretation of Analysis:
- **Contract Renewal**: counts customers who did (1) and did not (0) renew their contract. The plot is not split by Churn, so on its own it cannot show whether non-renewal, possibly reflecting unsatisfactory service, goes together with churn; section 2.4 compares the churn rates.
- **Data Plan**: counts customers with (1) and without (0) a data plan. Whether data-plan customers are more satisfied and therefore churn less is likewise checked in section 2.4.
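To see whether a category is associated with churn, the counts have to be split by the target. A minimal sketch using `pd.crosstab` with `normalize="index"` on a small hypothetical frame that reuses the `ContractRenewal` and `Churn` column names; on the real data, `pd.crosstab(churn_df["ContractRenewal"], churn_df["Churn"], normalize="index")` would give the churn rate per category (passing `hue="Churn"` to `sns.countplot` is the plotting equivalent).

```python
import pandas as pd

# Hypothetical toy data with the same column names as churn_df (not the real dataset)
toy = pd.DataFrame({
    "ContractRenewal": [1, 1, 1, 1, 0, 0, 1, 0],
    "Churn":           [0, 0, 0, 1, 1, 1, 0, 0],
})

# Row-normalised crosstab: for each ContractRenewal value, the share of
# customers with Churn = 0 and Churn = 1 (each row sums to 1)
churn_rate = pd.crosstab(toy["ContractRenewal"], toy["Churn"], normalize="index")
print(churn_rate)
```

Here 1 of the 5 renewing customers churns (0.2) against 2 of the 3 non-renewing customers (about 0.67).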

#### 2.3 Outliers detection

In [None]:
#AccountWeeks
ax = plt.figure(figsize=(10,8))
plt.subplot(2,2,1)
sns.boxenplot(data = churn_df , y = "AccountWeeks" , x = "Churn" , color="#668B8B", scale="linear")
plt.xlabel("Churn",size=13, fontweight='light', fontfamily='monospace')
plt.ylabel("AccountWeeks",size=13, fontweight='light', fontfamily='monospace')

#DataUsage
plt.subplot(2,2,2)
sns.boxenplot(data = churn_df , y = "DataUsage" ,  x = "Churn" , color="#FF0000", scale="linear")
plt.xlabel("Churn",size=13, fontweight='light', fontfamily='monospace')
plt.ylabel("DataUsage",size=13, fontweight='light', fontfamily='monospace')

#DayMins
plt.subplot(2,2,3)
sns.boxenplot(data = churn_df , y = "DayMins" , x = "Churn" , color="#FF0000", scale="linear")
plt.xlabel("Churn",size=13, fontweight='light', fontfamily='monospace')
plt.ylabel("DayMins",size=13, fontweight='light', fontfamily='monospace')

#DayCalls
plt.subplot(2,2,4)
sns.boxenplot(data = churn_df , y = "DayCalls" , x = "Churn" , color="#668B8B", scale="linear")
plt.xlabel("Churn",size=13, fontweight='light', fontfamily='monospace')
plt.ylabel("DayCalls",size=13, fontweight='light', fontfamily='monospace')
plt.tight_layout()
plt.show()

In [None]:
ax = plt.figure(figsize=(10,8))

#MonthlyCharge
plt.subplot(2,2,1)
sns.boxenplot(data = churn_df , y = "MonthlyCharge" , x = "Churn" , palette=["#DC143C","#458B00"])
plt.xlabel("Churn",size=13, fontweight='light', fontfamily='monospace')
plt.ylabel("MonthlyCharge",size=13, fontweight='light', fontfamily='monospace')

#OverageFee
plt.subplot(2,2,2)
sns.boxenplot(data = churn_df , y = "OverageFee" , x = "Churn" ,palette=["#DC143C","#458B00"])
plt.xlabel("Churn",size=13, fontweight='light', fontfamily='monospace')
plt.ylabel("OverageFee",size=13, fontweight='light', fontfamily='monospace')

#RoamMins
plt.subplot(2,2,3)
sns.boxenplot(data = churn_df , y = "RoamMins" , x = "Churn" , palette=["#DC143C","#458B00"])
plt.xlabel("Churn",size=13, fontweight='light', fontfamily='monospace')
plt.ylabel("RoamMins",size=13, fontweight='light', fontfamily='monospace')

#CustServCalls
plt.subplot(2,2,4)
sns.boxenplot(data = churn_df , y = "CustServCalls" , x = "Churn" , palette=["#DC143C","#458B00"])
plt.xlabel("Churn",size=13, fontweight='light', fontfamily='monospace')
plt.ylabel("CustServCalls",size=13, fontweight='light', fontfamily='monospace')
plt.tight_layout()
plt.show()

Analysis Interpretation:
- The boxen plots show some outliers, but they lie close to the outermost quantile boxes rather than far from the bulk of the data, so they are left untreated.
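How far outliers sit from the rest of the data can also be checked numerically. The sketch below uses Tukey's 1.5 * IQR fences, the classic box-plot rule; this is an assumption, since boxen plots draw additional quantile levels and will not flag exactly the same points. The helper name `iqr_outliers` is hypothetical; on the real data it could be applied per column, e.g. `iqr_outliers(churn_df["DayMins"])`.

```python
import numpy as np

def iqr_outliers(values, k=1.5):
    """Return the values outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR]."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return values[(values < lower) | (values > upper)]

# Hypothetical sample: a tight cluster plus one distant value
sample = [10, 11, 12, 12, 13, 13, 14, 15, 40]
print(iqr_outliers(sample))  # [40.]
```

Q1 = 12 and Q3 = 14 here, so the fences are 9 and 17 and only 40 falls outside them.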

#### 2.4 Paired Analysis

In [None]:
ax = plt.figure(figsize=(10,5))
churn_df.groupby(['ContractRenewal',"DataPlan"])['Churn'].mean().plot(figsize=(10,5),kind="bar",color="#FF6347",
                                                               edgecolor='k')
plt.title("Average Plot for Churn" , size=18, fontweight='bold', fontfamily='monospace')
plt.xlabel("ContractRenewal, DataPlan",size=13, fontweight='light', fontfamily='monospace')
plt.ylabel("Average of Churn (churn rate)",size=13, fontweight='light', fontfamily='monospace')
plt.show()

**Interpretation of Analysis:**

ContractRenewal and DataPlan are both informative for customer churn. The churn rate (the mean of the 0/1 Churn column) is lowest when both attributes equal 1, and DataPlan appears to have a greater impact than ContractRenewal.

Higher DataUsage is associated with a lower likelihood of churn, while for the other attributes lower values are associated with less churn. These observations come from grouping customers by Churn and comparing attribute averages (section 2.5); they describe associations, not causes.

#### 2.5 Basic Statistics about data

In [None]:
#mean of each attribute over all customers (Churn 0 and 1 together)
print("Churn:0 and Churn:1")
mean_df = churn_df.mean().reset_index()
mean_df.columns = ['Feature', 'Mean']
mean_df.set_index('Feature')

In [None]:
#mean corresponding to each attribute with churn(0)
print("Churn:0")
mean_df = churn_df.loc[churn_df["Churn"]==0].mean().reset_index()
mean_df.columns = ['Feature', 'Mean']
mean_df.set_index('Feature')

In [None]:
#mean corresponding to each attribute with churn(1)
print("Churn:1")
mean_df = churn_df.loc[churn_df["Churn"]==1].mean().reset_index()
mean_df.columns = ['Feature', 'Mean']
mean_df.set_index('Feature')
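The three cells above can be condensed into a single side-by-side table with `groupby`. A sketch on a hypothetical four-row frame that reuses two `churn_df` column names; on the real data, `churn_df.groupby("Churn").mean().T` gives the same per-class means as the second and third cells.

```python
import pandas as pd

# Hypothetical toy data with two of the churn_df columns (not real values)
toy = pd.DataFrame({
    "DayMins": [150.0, 170.0, 260.0, 240.0],
    "CustServCalls": [1, 2, 4, 2],
    "Churn": [0, 0, 1, 1],
})

# One row per feature, one column per class: the per-class means side by side
means = toy.groupby("Churn").mean().T
means["Difference (1 - 0)"] = means[1] - means[0]
print(means)
```

The difference column makes it easy to spot which attributes differ most between churners and non-churners.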

#### 2.6 Attributes correlation

In [None]:
#correlation matrix
ax = plt.figure(figsize=(12,10))
sns.heatmap(churn_df.corr(),annot=True,cmap="Blues", fmt='.0%')
plt.show()

**Interpretation of Analysis:**

Some feature pairs are highly correlated, with a correlation coefficient above 0.5 (shown as 50% in the heatmap). Correlated features carry redundant information, so dimensionality reduction with PCA is applied in section 3 to address this.
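Reading pairs off a large heatmap is error-prone, so the pairs above the threshold can also be listed directly. A sketch with a hypothetical helper `high_corr_pairs` that scans the upper triangle of `DataFrame.corr()`; on the real data it would be called as `high_corr_pairs(churn_df.drop("Churn", axis=1))`.

```python
import numpy as np
import pandas as pd

def high_corr_pairs(df, threshold=0.5):
    """List feature pairs whose absolute Pearson correlation exceeds threshold."""
    corr = df.corr()
    cols = corr.columns
    # j > i: upper triangle only, so each pair appears once and the diagonal is skipped
    rows = [(cols[i], cols[j], corr.iloc[i, j])
            for i in range(len(cols)) for j in range(i + 1, len(cols))
            if abs(corr.iloc[i, j]) > threshold]
    result = pd.DataFrame(rows, columns=["feature_1", "feature_2", "corr"])
    return result.sort_values("corr", key=lambda s: s.abs(), ascending=False,
                              ignore_index=True)

# Hypothetical data: b is a linear function of a, c is only weakly related to a
x = np.arange(10, dtype=float)
toy = pd.DataFrame({"a": x, "b": 2 * x + 1, "c": (-1.0) ** x})
print(high_corr_pairs(toy))
```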

In [None]:
sns.pairplot(data = churn_df[["DataUsage","RoamMins","DayMins","MonthlyCharge","Churn","DayCalls"]],
            hue="Churn", palette='Reds')
plt.show()

**Interpretation of Analysis:**

The diagonal panels show the distribution of each feature by class. The curves for class 0 (not churn) are sharply peaked (leptokurtic) and taller, partly because that class contains far more customers. The feature values are spread fairly evenly without much skewness, and each attribute shows a dense central cluster.

# 3. Classification

In [None]:
#Overview diagram of a model-selection workflow (grid search with cross-validation), from mglearn
mglearn.plots.plot_grid_search_overview();
plt.show()

### Data Scaling: Standardization

In [None]:
#standardize features to zero mean and unit variance
scaler = StandardScaler().fit(churn_df.drop("Churn",axis=1))

In [None]:
#separation of dependent and independent variables
X = scaler.transform(churn_df.drop("Churn",axis=1))
y = churn_df["Churn"]
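Here the scaler (and, below, PCA) is fitted on the full dataset before the train/test split, so statistics of the test rows leak into the preprocessing. The effect is often small for StandardScaler and PCA, but it can be avoided by wrapping the steps in a scikit-learn `Pipeline`, which fits every step on training data only. A sketch on synthetic data from `make_classification`, standing in for `churn_df` (assumption: 10 features, roughly 85/15 classes):

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for churn_df: 10 features, ~85/15 class balance (assumption)
X_raw, y_raw = make_classification(n_samples=1000, n_features=10, weights=[0.85],
                                   random_state=0)

X_tr, X_te, y_tr, y_te = train_test_split(X_raw, y_raw, test_size=0.20,
                                          random_state=0, stratify=y_raw)

# Scaler and PCA are fitted inside the pipeline, i.e. on the training data only
pipe = make_pipeline(StandardScaler(), PCA(n_components=7),
                     RandomForestClassifier(n_estimators=100, random_state=0))
pipe.fit(X_tr, y_tr)
print("Test accuracy:", pipe.score(X_te, y_te))

# In cross-validation the whole pipeline is refitted on each training fold
cv_scores = cross_val_score(pipe, X_tr, y_tr, cv=5)
print("CV accuracy  :", cv_scores.mean())
```

With the real data, `churn_df.drop("Churn", axis=1)` and `churn_df["Churn"]` would take the place of `X_raw` and `y_raw`.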

### Dimensionality Reduction: PCA

In [None]:
#Reducing dimensions using principal component analysis
pca = PCA(n_components=7)
principalComponents = pca.fit_transform(X)
cumulative_explained_variance = np.cumsum(pca.explained_variance_ratio_)
principalComponents_df = pd.DataFrame(data = principalComponents,
                                      columns = ['PC'+str(i+1) for i in range(7)])

In [None]:
#cumulative explained variance ratio of the first k principal components
n_pc = len(cumulative_explained_variance)
plt.bar(range(1, n_pc + 1), cumulative_explained_variance,
        alpha=0.5, align='center', edgecolor='black', label='Cumulative explained variance')
plt.xticks(range(1, n_pc + 1))
plt.ylabel('Cumulative explained variance ratio')
plt.xlabel('Number of principal components')
plt.legend(loc='best')
plt.tight_layout()
plt.show()

**Interpretation of Analysis:**

The plot shows the cumulative explained variance ratio: the fraction of the total variance in the standardized data that is retained by the first k principal components.
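The explained variance ratio of component $k$ is $\lambda_k / \sum_j \lambda_j$, where the $\lambda$ are the eigenvalues of the covariance matrix of the standardized data, sorted in decreasing order. A sketch on hypothetical random data that computes this by hand with NumPy and checks it against `PCA.explained_variance_ratio_`:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical data: 200 samples, 4 features, the last one nearly a copy of the first
base = rng.normal(size=(200, 3))
data = np.column_stack([base, base[:, 0] + 0.1 * rng.normal(size=200)])
X_std = StandardScaler().fit_transform(data)

# Eigenvalues of the sample covariance matrix, largest first
eigvals = np.linalg.eigvalsh(np.cov(X_std, rowvar=False))[::-1]
ratio = eigvals / eigvals.sum()   # explained variance ratio per component
cumulative = np.cumsum(ratio)     # the quantity plotted in the bar chart

pca = PCA().fit(X_std)
print(np.round(cumulative, 3))
print(np.allclose(ratio, pca.explained_variance_ratio_))  # True
```

Because two features are nearly identical, the first component absorbs their shared variance and the last one carries very little.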

In [None]:
#pca reduced dimension data
principalComponents_df.head()

## 3.1. Split data for train and test

In [None]:
#use the PCA-reduced data as the feature matrix X
X = principalComponents_df

In [None]:
#splitting data into training and test sets (80/20, stratified by Churn)
X_train, X_test, y_train, y_test = train_test_split(principalComponents_df, y, test_size =0.20,random_state=0, stratify = y)

In [None]:
#Shapes
print("----Shapes of the training and test sets----")
print("Shape of Train X: ", X_train.shape)
print("Shape of Train y: ", y_train.shape)
print("Shape of Test X: ", X_test.shape)
print("Shape of Test y: ", y_test.shape)


## 3.2 Functions for models and metrics

In [None]:
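from sklearn.metrics import RocCurveDisplay, ConfusionMatrixDisplay, accuracy_score, confusion_matrix

# plot_roc_curve and plot_confusion_matrix were removed from sklearn.metrics in
# scikit-learn 1.2; these thin wrappers keep the calls in the model cells working
# through the Display classes' from_estimator methods.
def plot_roc_curve(estimator, X, y, **kwargs):
    return RocCurveDisplay.from_estimator(estimator, X, y, **kwargs)

def plot_confusion_matrix(estimator, X, y, **kwargs):
    return ConfusionMatrixDisplay.from_estimator(estimator, X, y, **kwargs)

#function for estimator: test-set accuracy and confusion matrix
def Model(model):
    print(type(model).__name__)
    pred = model.predict(X_test)
    acs = accuracy_score(y_test,pred)
    print("Accuracy Score             :",acs)

    # confusion matrix on the held-out test set only
    plot_confusion_matrix(model, X_test, y_test, cmap="Reds")
    plt.title("Confusion Matrix")
    plt.show()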

In [None]:
#function to plot ROC-AUC
def Check(list_of_disp):
    ax = plt.gca()
    for i in list_of_disp: 
        i.plot(ax=ax)
    plt.plot([0,1],[0,1],"--",color="k",alpha=0.7)
    plt.title("ROC Curve of Classifiers")
    plt.show()

In [None]:
from sklearn.model_selection import cross_val_score

#function to provide cross-validation score of estimator
def CrossValidationScore(model_list):
    global X,y
    
    mean_cross_val_score = []
    model_name           = []
    
    for model in model_list:
        model_name.append(type(model).__name__)
        
    for i in model_list:
        scores = cross_val_score(i, X, y, cv=5)
        mean_cross_val_score.append(scores.mean())
        
    cvs = pd.DataFrame({"Model Name":model_name,"CVS":mean_cross_val_score})
    return cvs.style.background_gradient("Greens")

<a id="t3.3"></a>
## 3.3 Models

#### 3.3.1. DECISION TREE CLASSIFIER

In [None]:
#DecisionTreeClassifier
dt = DecisionTreeClassifier(random_state=0, max_depth=4, min_samples_split=10)
dt.fit(X_train, y_train)
pred_dt = dt.predict(X_test)   # not named `pd`, which would shadow pandas
plot_confusion_matrix(dt, X_test, y_test)

#roc
dt_disp = plot_roc_curve(dt, X_test, y_test)
plt.title("ROC Curve of {}".format(type(dt).__name__))
plt.plot([0,1],[0,1],"--",color="k",alpha=0.7)
plt.show()

#results of DecisionTreeClassifier
print("Accuracy Score: ", accuracy_score(y_test, pred_dt))
print("Precision Score: ", precision_score(y_test, pred_dt))
print("Recall Score: ", recall_score(y_test, pred_dt))
print("F1 Score: ",f1_score(y_test, pred_dt))

#### 3.3.2. GRADIENT BOOSTING CLASSIFIER

In [None]:
#GradientBoostingClassifier
gb = GradientBoostingClassifier(n_estimators=200, max_depth=2, random_state=0)
gb.fit(X_train, y_train)
pred_gb = gb.predict(X_test)
plot_confusion_matrix(gb, X_test, y_test)

#roc
gb_disp = plot_roc_curve(gb, X_test, y_test)
plt.title("ROC Curve of {}".format(type(gb).__name__))
plt.plot([0,1],[0,1],"--",color="k",alpha=0.7)
plt.show()

#results of GradientBoostingClassifier
print("Accuracy Score: ", accuracy_score(y_test, pred_gb))
print("Precision Score: ", precision_score(y_test, pred_gb))
print("Recall Score: ", recall_score(y_test, pred_gb))
print("F1 Score: ",f1_score(y_test, pred_gb))

#### 3.3.3. K NEIGHBORS CLASSIFIER

In [None]:
#searching for the K with the best test accuracy; K = 1 to 30
k_max = 30
accuracy = [[],[]]
for k in range(1,k_max+1):
    mdl = KNeighborsClassifier(n_neighbors=k).fit(X_train,y_train)
    pred = mdl.predict(X_test)
    accuracy[0].append(k)
    accuracy[1].append(accuracy_score(y_test, pred)) 
accuracy = np.array(accuracy)
max_acc_k = accuracy[1].argmax()
plt.figure(figsize=(10,5))
plt.plot(accuracy[0],accuracy[1], color='k', ls="--")
plt.scatter(x=accuracy[0][max_acc_k], y=accuracy[1][max_acc_k],s=50, label="Max Accuracy: {}\nBest K: {}".format(round(accuracy[1][max_acc_k],2),
                                                                                                 accuracy[0][max_acc_k]), color='#ffb3b3')
plt.legend()
plt.grid(True)
plt.title("Accuracy Score for each K" , size=18, fontweight='bold', fontfamily='monospace')
plt.xlabel("K", size=13, fontweight='light', fontfamily='monospace')
plt.ylabel('Accuracy Score', size=13, fontweight='light', fontfamily='monospace')
plt.show()

**Interpretation of Analysis:**

The best K value is 5, with a maximum accuracy of about 0.92 (92%). Larger values of K lower the accuracy, so a higher K is not chosen. Because K was selected using the test set itself, this 0.92 is somewhat optimistic; choosing K by cross-validation on the training set avoids that bias.
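A sketch of selecting K by cross-validation on the training split only, using scikit-learn's `GridSearchCV`; the data come from `make_classification` as a stand-in for the PCA features (assumption: 7 features, roughly 85/15 classes), so the chosen K will differ from the value found above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the PCA features (assumption): 7 features, ~85/15 classes
X_demo, y_demo = make_classification(n_samples=1000, n_features=7, weights=[0.85],
                                     random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.20,
                                          random_state=0, stratify=y_demo)

# K is chosen by 5-fold cross-validation on the training set only
search = GridSearchCV(KNeighborsClassifier(),
                      param_grid={"n_neighbors": list(range(1, 31))},
                      cv=5, scoring="accuracy")
search.fit(X_tr, y_tr)

print("Best K        :", search.best_params_["n_neighbors"])
print("CV accuracy   :", round(search.best_score_, 3))
# The held-out test set is used once, after K has been fixed
print("Test accuracy :", round(search.score(X_te, y_te), 3))
```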

In [None]:
#KNN
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train,y_train)
print("Model trained!")
print("Please Wait for Results..")
Model(knn)

#cm and roc
knn_disp = plot_roc_curve(knn, X_test, y_test)
plt.title("ROC Curve of {}".format(type(knn).__name__))
plt.plot([0,1],[0,1],"--",color="k",alpha=0.7)
plt.show()

#results of KNN
print("Precision Score: ", precision_score(y_test, knn.predict(X_test)))
print("Recall Score: ", recall_score(y_test, knn.predict(X_test)))
print("F1 Score: ",f1_score(y_test, knn.predict(X_test)))

#### 3.3.4. RANDOM FOREST CLASSIFIER

In [None]:
#RF
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train,y_train)
print("Model trained!")
print("Please Wait for Results..")
Model(rf)

rf_disp = plot_roc_curve(rf, X_test, y_test)
plt.title("ROC Curve of {}".format(type(rf).__name__))
plt.plot([0,1],[0,1],"--",color="k",alpha=0.7)
plt.show()

#results of RF
print("Precision Score: ", precision_score(y_test, rf.predict(X_test)))
print("Recall Score: ", recall_score(y_test, rf.predict(X_test)))
print("F1 Score: ",f1_score(y_test, rf.predict(X_test)))

In [None]:
#ROC-AUC of knn and RF
list_of_disp = [knn_disp,rf_disp]
Check(list_of_disp)

**Interpretation of Analysis:**

Of the four scikit-learn models, Random Forest has the highest test accuracy (0.931, against 0.916 for KNN). Its ROC curve also suggests that it separates the two classes better than KNN.
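The ROC legend drawn by `plot_roc_curve` / `RocCurveDisplay` already prints an AUC value, and the comparison can be made numeric with `roc_auc_score`. A small sketch on hypothetical scores showing what AUC measures: the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner. For the fitted models one would pass `rf.predict_proba(X_test)[:, 1]` as the score.

```python
from sklearn.metrics import roc_auc_score

# Toy example (hypothetical values): true labels and predicted P(churn)
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

auc = roc_auc_score(y_true, y_score)

# Same quantity counted directly: share of (churner, non-churner) pairs in which
# the churner gets the higher score (this toy data has no tied scores)
pos = [s for s, t in zip(y_score, y_true) if t == 1]
neg = [s for s, t in zip(y_score, y_true) if t == 0]
pairwise = sum(p > n for p in pos for n in neg) / (len(pos) * len(neg))

print(auc, pairwise)  # 0.75 0.75
```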

#### 3.3.5. ARTIFICIAL NEURAL NETWORK
##### Multi-layer Perceptron classifier

In [None]:
#one-hot encode the labels to match the 2-unit softmax output layer
y_train_cat = to_categorical(y_train)
y_test_cat = to_categorical(y_test)

X_train = np.array(X_train)
X_test = np.array(X_test)

In [None]:
# ANN model
model = Sequential()

model.add(Dense(64,input_shape=X_train[0].shape,activation="sigmoid"))

model.add(Dense(128,activation="relu"))

model.add(Dense(64,activation="relu"))

model.add(Dense(2,activation="softmax"))

model.compile(loss="binary_crossentropy",optimizer="adam",metrics=["acc"])

In [None]:
#fitting the network
history = model.fit(X_train,y_train_cat,batch_size=32,epochs=20,validation_data=(X_test, y_test_cat))

In [None]:
#performance visuals of ANN

#Accuracy
plt.figure(figsize=(14,5))
plt.subplot(1,2,1)
plt.plot(history.history["acc"],color="#C2C4E2")
plt.plot(history.history["val_acc"],color="#ffb3b3")
plt.xlabel("Epochs")
plt.ylabel("Acc")
plt.legend(["Training","Validation"])
plt.grid()

#loss
plt.subplot(1,2,2)
plt.plot(history.history["loss"],color="#C2C4E2")
plt.plot(history.history["val_loss"],color="#ffb3b3")
plt.xlabel("Epochs")
plt.ylabel("Loss")
plt.legend(["Training","Validation"])
plt.grid()
plt.show()

**Interpretation of Analysis:**

The plots above show the accuracy and loss at each epoch of fitting the neural network, for both the training data and the validation data (here, the test set passed as `validation_data`).

- Training and validation accuracy rise at a similar pace, reaching roughly 91% and 93% respectively.
- Training and validation loss both decrease towards their minimum; since the validation loss does not climb back up, there is no clear sign of overfitting within the 20 epochs.

In [None]:
#evaluation score on test set
model.evaluate(X_test, y_test_cat)

In [None]:
#probabilistic output to class output conversion
pred_class = model.predict(X_test)

def toClass(pred):   
    class_ = np.zeros(len(pred))
    for i in range(len(pred)):
        index = pred[i].argmax()
        class_[i] = index
        
    return class_

from sklearn.metrics import classification_report
print(classification_report(toClass(y_test_cat),toClass(pred_class)))
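The `toClass` loop picks the column with the largest probability in each row. A sketch of the vectorized equivalent with `np.argmax(..., axis=1)`, on hypothetical softmax outputs, also showing that with two classes this is the same as predicting churn when P(churn) > 0.5. Written as a threshold, it becomes explicit that 0.5 is a choice: lowering it flags more customers as churners, raising recall at the cost of precision.

```python
import numpy as np

# Hypothetical softmax outputs for 4 customers: columns = P(not churn), P(churn)
pred = np.array([[0.90, 0.10],
                 [0.30, 0.70],
                 [0.55, 0.45],
                 [0.20, 0.80]])

# Vectorized equivalent of the toClass loop: index of the largest probability per row
classes_argmax = np.argmax(pred, axis=1)

# With two classes whose probabilities sum to 1, this equals
# predicting churn whenever P(churn) > 0.5
classes_threshold = (pred[:, 1] > 0.5).astype(int)

print(classes_argmax)                                     # [0 1 0 1]
print(np.array_equal(classes_argmax, classes_threshold))  # True
```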

In [None]:
#results of ANN
print("Accuracy Score: ", accuracy_score(toClass(y_test_cat),toClass(pred_class)))
print("Precision Score: ", precision_score(toClass(y_test_cat),toClass(pred_class)))
print("Recall Score: ", recall_score(toClass(y_test_cat),toClass(pred_class)))
print("F1 Score: ",f1_score(toClass(y_test_cat),toClass(pred_class)))
print()

# 4. Result

#### CROSS VALIDATION SCORE

In [None]:
#Overview of cross validation score structure
mglearn.plots.plot_cross_validation();
plt.show();

In [None]:
#10-fold cross-validation accuracy of a support vector classifier (SVC) on the PCA features
from sklearn import svm
svc = svm.SVC()   # separate name, so the ANN `model` is not overwritten
cv_accuracy = cross_val_score(svc, X, y, scoring='accuracy', cv = 10)
print(cv_accuracy)
#mean accuracy over the 10 folds
print("Accuracy of Model with Cross Validation is:",cv_accuracy.mean() * 100)
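For a classifier, an integer `cv` in `cross_val_score` means `StratifiedKFold` without shuffling, so each fold keeps roughly the overall class ratio. A sketch that reproduces the scores with an explicit fold loop, on synthetic data from `make_classification` standing in for the PCA features (assumption):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

# Synthetic stand-in for the PCA features (assumption)
X_demo, y_demo = make_classification(n_samples=500, n_features=7, weights=[0.85],
                                     random_state=0)

# Explicit loop: fit on 9 folds, score accuracy on the held-out fold
skf = StratifiedKFold(n_splits=10)
manual = []
for train_idx, test_idx in skf.split(X_demo, y_demo):
    clf = SVC().fit(X_demo[train_idx], y_demo[train_idx])
    manual.append(clf.score(X_demo[test_idx], y_demo[test_idx]))

auto = cross_val_score(SVC(), X_demo, y_demo, scoring="accuracy", cv=10)
print(np.allclose(manual, auto))  # True
print("Mean accuracy: %.3f" % np.mean(auto))
```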

In [None]:
#performance plots
model_names = ['Decision Tree', 'Gradient Boosting', 'KNN', 'Random Forest', 'ANN']
# (true labels, predicted labels) of each model on the test set
predictions = [(y_test, dt.predict(X_test)),
               (y_test, gb.predict(X_test)),
               (y_test, knn.predict(X_test)),
               (y_test, rf.predict(X_test)),
               (toClass(y_test_cat), toClass(pred_class))]
metrics = [("Accuracy Score", accuracy_score), ("Precision Score", precision_score),
           ("Recall Score", recall_score), ("F1 Score", f1_score)]
palette = ["#20B2AA","#87CEFA","#B0E2FF", "#A4D3EE", "#8DB6CD"]

plt.figure(figsize=(12,8))
sns.set_color_codes('pastel')
for i, (metric_name, metric) in enumerate(metrics, start=1):
    plt.subplot(2,2,i)
    scores = [metric(y_true, y_pred) for y_true, y_pred in predictions]
    # x and y as keywords: recent seaborn versions do not accept them positionally
    sns.barplot(x=model_names, y=scores, palette=palette)
    plt.xlabel("Models")
    plt.ylabel(metric_name)

plt.tight_layout()
plt.show()

### **Interpretation of Results:**

Across all experiments, the ANN and Random Forest perform best, with the highest test-set accuracy and F1 scores of the five models. Recall stays modest for every model (at most about 0.62), so a substantial share of churners is still missed.

Overall report of all models
--------------------------------------

Scores are on the 20% test set, rounded to four decimals; cross-validation scores were reported for KNN and Random Forest only.

| Model | Cross-validation score | Accuracy | Precision | Recall | F1 score |
|---|---|---|---|---|---|
| Decision Tree | not reported | 0.9175 | 0.7917 | 0.5876 | 0.6746 |
| Gradient Boosting | not reported | 0.9220 | 0.8462 | 0.5670 | 0.6790 |
| KNN | 0.9007 | 0.9160 | 0.8727 | 0.4948 | 0.6316 |
| Random Forest | 0.9172 | 0.9310 | 0.8696 | 0.6186 | 0.7229 |
| ANN | not reported | 0.9355 | 0.9091 | 0.6186 | 0.7362 |
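The four metrics all derive from the confusion-matrix counts. As a worked check, the counts below are back-computed from the reported Decision Tree scores, assuming a 667-row test set with 97 churners (the smallest counts consistent with those scores); they are an inference, not printed output from the notebook.

```python
# Counts back-computed from the Decision Tree scores (assumption:
# 667 test customers, 97 of them churners)
tp, fp, fn, tn = 57, 15, 40, 555

accuracy  = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)           # share of predicted churners who really churn
recall    = tp / (tp + fn)           # share of real churners that are caught
f1        = 2 * tp / (2 * tp + fp + fn)  # equals 2PR / (P + R)

print(f"{accuracy:.4f} {precision:.4f} {recall:.4f} {f1:.4f}")
# 0.9175 0.7917 0.5876 0.6746
```

The low recall is visible directly in the counts: 40 of the 97 churners are predicted as staying.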

 
### **CONCLUSION**

**Artificial Neural Network and Random Forest Classifier** give the best results for classifying customer churn on this dataset, with the highest accuracy and F1 scores. KNN is competitive but trails both by one to two percentage points in accuracy and has the lowest recall (about 0.49); it may still be worth revisiting with further tuning. Between the two leaders, the **Artificial Neural Network (ANN)** has the higher precision (0.91 vs 0.87) and F1 score (0.74 vs 0.72) at the same recall (0.62): its churn predictions are more often correct, but, like every model here, it still misses roughly 38% of the customers who actually churn.