## Project: Analysis and Classification of Customer Churn in Telecommunication Companies

### Author: Munezero Mihigo

### Date: 13 November 2021

## Project Objective

### The objective of this project is to predict whether a given customer will churn.

## General Knowledge

### What is customer churn?
 Customer churn occurs when an existing customer, user, player, subscriber, or any other kind of returning client stops doing business with a company or ends the relationship with it.

### Types of churn

#### Contractual churn:
 Contractual churn occurs when a customer is under contract for a service and decides to cancel it. <br>Examples: cable TV, SaaS products (Software as a Service, e.g. Dropbox).

#### Voluntary churn:
 Voluntary churn occurs when a user chooses to cancel a service. Examples include prepaid cell phones and streaming subscriptions.

#### Non-contractual churn:
 Non-contractual churn occurs when a customer who is not under contract stops using a service. Examples include customer loyalty at a retail location and online browsing.

#### Involuntary churn:
 Involuntary churn occurs when the service ends without the customer requesting it. Examples: credit card expiration, utilities being shut off by the provider. <br>Most likely, you as a customer have cancelled a service for one of several reasons, including lack of usage, poor service, or a better price.

## Step-by-Step Process

1. Data Preprocessing <br>
2. One-hot Encoding <br>
3. Data Transformation (StandardScaler or MinMaxScaler) <br>
4. Grid Search and Cross-Validation with Decision Tree Classifier <br>
5. Tree diagram of the Decision Tree <br>
6. Confusion Matrix, Classification Report, and ROC-AUC <br>
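
The steps above can be sketched end to end on a small synthetic table. This is a minimal illustration, not the notebook's actual pipeline: the data are randomly generated (only the column names are borrowed from the churn dataset), the labelling rule is hypothetical, and the grid values are assumptions chosen for brevity.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the churn data (NOT the real dataset).
rng = np.random.default_rng(0)
n = 600
toy = pd.DataFrame({
    "Day_Mins": rng.normal(180, 55, n),
    "CustServ_Calls": rng.integers(0, 8, n),
    "Intl_Plan": rng.choice(["no", "yes"], n, p=[0.9, 0.1]),
})
# Hypothetical labelling rule, used only so the toy data has a learnable target.
toy["Churn"] = np.where(
    (toy["CustServ_Calls"] >= 4) | (toy["Day_Mins"] > 250), "yes", "no"
)

# Steps 1-2: encode the target as 0/1 and one-hot encode the categorical feature.
y = (toy["Churn"] == "yes").astype(int)
X = pd.get_dummies(
    toy.drop(columns="Churn"), columns=["Intl_Plan"], drop_first=True, dtype=float
)

# Split first, so the scaler is fitted on training rows only.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Step 3: standardise the features.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# Step 4: grid search with 5-fold cross-validation over a decision tree.
param_grid = {"max_depth": [2, 3, 4, 5], "min_samples_leaf": [1, 5, 10]}
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0), param_grid, cv=5, scoring="roc_auc"
)
search.fit(X_train_s, y_train)

# Step 6: evaluate on the held-out test set.
y_pred = search.predict(X_test_s)
y_prob = search.predict_proba(X_test_s)[:, 1]
cm = confusion_matrix(y_test, y_pred)
auc = roc_auc_score(y_test, y_prob)

print(search.best_params_)
print(cm)
print(classification_report(y_test, y_pred))
print(f"ROC-AUC: {auc:.3f}")
```

Step 5 is left out to keep the sketch free of plotting; `plot_tree(search.best_estimator_, feature_names=list(X.columns))` would draw the fitted tree. Scaling does not change the splits a decision tree finds, so step 3 matters mainly if other models are compared later.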

#### Import libraries

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.metrics import roc_curve, roc_auc_score, precision_score, recall_score, f1_score
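# Note: plot_confusion_matrix was deprecated in scikit-learn 1.0 and removed in 1.2;
# newer versions provide ConfusionMatrixDisplay.from_estimator instead.
from sklearn.metrics import plot_confusion_matrix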

import warnings
warnings.filterwarnings('ignore')

In [2]:
df = pd.read_csv("../Data/Churn_Data_set.csv")
display(df.head(5))

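| | Account_Length | Vmail_Message | Day_Mins | Eve_Mins | Night_Mins | Intl_Mins | CustServ_Calls | Churn | Intl_Plan | Vmail_Plan | ... | Day_Charge | Eve_Calls | Eve_Charge | Night_Calls | Night_Charge | Intl_Calls | Intl_Charge | State | Area_Code | Phone |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 128 | 25 | 265.1 | 197.4 | 244.7 | 10.0 | 1 | no | no | yes | ... | 45.07 | 99 | 16.78 | 91 | 11.01 | 3 | 2.7 | KS | 415 | 382-4657 |
| 1 | 107 | 26 | 161.6 | 195.5 | 254.4 | 13.7 | 1 | no | no | yes | ... | 27.47 | 103 | 16.62 | 103 | 11.45 | 3 | 3.7 | OH | 415 | 371-7191 |
| 2 | 137 | 0 | 243.4 | 121.2 | 162.6 | 12.2 | 0 | no | no | no | ... | 41.38 | 110 | 10.3 | 104 | 7.32 | 5 | 3.29 | NJ | 415 | 358-1921 |
| 3 | 84 | 0 | 299.4 | 61.9 | 196.9 | 6.6 | 2 | no | yes | no | ... | 50.9 | 88 | 5.26 | 89 | 8.86 | 7 | 1.78 | OH | 408 | 375-9999 |
| 4 | 75 | 0 | 166.7 | 148.3 | 186.9 | 10.1 | 3 | no | yes | no | ... | 28.34 | 122 | 12.61 | 121 | 8.41 | 3 | 2.73 | OK | 415 | 330-6626 |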
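
The preview shows that `Churn`, `Intl_Plan`, and `Vmail_Plan` hold `yes`/`no` strings, which a scikit-learn classifier cannot consume directly. A minimal sketch of one way to map them to 0/1 follows. It rebuilds only the five displayed rows rather than loading the CSV, and it assumes the full dataset contains no values other than `yes` and `no` in these columns; the `sample`, `encoded`, and `churn_share` names are illustrative.

```python
import pandas as pd

# The five rows displayed above, restricted to the yes/no columns.
sample = pd.DataFrame({
    "Churn": ["no", "no", "no", "no", "no"],
    "Intl_Plan": ["no", "no", "no", "yes", "yes"],
    "Vmail_Plan": ["yes", "yes", "no", "no", "no"],
})

# Map each binary column to integers; any unexpected value would become NaN,
# which makes violations of the yes/no assumption easy to spot.
binary_cols = ["Churn", "Intl_Plan", "Vmail_Plan"]
encoded = sample.copy()
for col in binary_cols:
    encoded[col] = encoded[col].map({"no": 0, "yes": 1})

# Share of each class in the target; worth checking on the full data
# because churn datasets are often imbalanced.
churn_share = sample["Churn"].value_counts(normalize=True)

print(encoded)
print(churn_share)
```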


### Data Description

In [3]:
df.describe()

| | Account_Length | Vmail_Message | Day_Mins | Eve_Mins | Night_Mins | Intl_Mins | CustServ_Calls | Day_Calls | Day_Charge | Eve_Calls | Eve_Charge | Night_Calls | Night_Charge | Intl_Calls | Intl_Charge | Area_Code |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 3333.0 | 3333.0 | 3333.0 | 3333.0 | 3333.0 | 3333.0 | 3333.0 | 3333.0 | 3333.0 | 3333.0 | 3333.0 | 3333.0 | 3333.0 | 3333.0 | 3333.0 | 3333.0 |
| mean | 101.064806 | 8.09901 | 179.775098 | 200.980348 | 200.872037 | 10.237294 | 1.562856 | 100.435644 | 30.562307 | 100.114311 | 17.08354 | 100.107711 | 9.039325 | 4.479448 | 2.764581 | 437.182418 |
| std | 39.822106 | 13.688365 | 54.467389 | 50.713844 | 50.573847 | 2.79184 | 1.315491 | 20.069084 | 9.259435 | 19.922625 | 4.310668 | 19.568609 | 2.275873 | 2.461214 | 0.753773 | 42.37129 |
| min | 1.0 | 0.0 | 0.0 | 0.0 | 23.2 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 33.0 | 1.04 | 0.0 | 0.0 | 408.0 |
| 25% | 74.0 | 0.0 | 143.7 | 166.6 | 167.0 | 8.5 | 1.0 | 87.0 | 24.43 | 87.0 | 14.16 | 87.0 | 7.52 | 3.0 | 2.3 | 408.0 |
| 50% | 101.0 | 0.0 | 179.4 | 201.4 | 201.2 | 10.3 | 1.0 | 101.0 | 30.5 | 100.0 | 17.12 | 100.0 | 9.05 | 4.0 | 2.78 | 415.0 |
| 75% | 127.0 | 20.0 | 216.4 | 235.3 | 235.3 | 12.1 | 2.0 | 114.0 | 36.79 | 114.0 | 20.0 | 113.0 | 10.59 | 6.0 | 3.27 | 510.0 |
| max | 243.0 | 51.0 | 350.8 | 363.7 | 395.0 | 20.0 | 9.0 | 165.0 | 59.64 | 170.0 | 30.91 | 175.0 | 17.77 | 20.0 | 5.4 | 510.0 |
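
By default `describe()` summarises only numeric columns, which is why `Churn`, `Intl_Plan`, `Vmail_Plan`, `State`, and `Phone` are absent from the table above. A minimal sketch of how the non-numeric columns could be summarised as well is shown below. It uses a hand-built frame holding three string columns from the five previewed rows (not the full dataset), and the `sample` and `summary` names are illustrative.

```python
import numpy as np
import pandas as pd

# String columns from the five previewed rows (not the full dataset).
sample = pd.DataFrame({
    "State": ["KS", "OH", "NJ", "OH", "OK"],
    "Churn": ["no", "no", "no", "no", "no"],
    "Phone": ["382-4657", "371-7191", "358-1921", "375-9999", "330-6626"],
})

# Excluding numeric dtypes makes describe() report count, unique, top, and freq
# for the remaining (string) columns.
summary = sample.describe(exclude=[np.number])
print(summary)
```

A column such as `Phone`, where every value is unique, carries no generalisable signal and is a natural candidate to drop during preprocessing.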


### Data Preprocessing