
![Customer Churn](images/customer_churn.jpg)
# **TELCO CUSTOMER CHURN ANALYSIS**
***


## **1. BUSINESS UNDERSTANDING**
***
###  1.1 OVERVIEW
***
Customer ‘churn’ generally refers to customers switching away from or leaving a company's services. In today's world, businesses need to respond to changes in customer behaviour caused by digital media, which can be achieved by thinking about and understanding consumer churn. According to a study by Gartner (2012), customer churn can affect a company's profitability: a five per cent reduction in churn can improve profitability by up to twenty-five per cent. Understanding and managing customer churn is therefore important for the organization.

The telecommunications service industry has long been known for the intensity of competition and rivalry. While companies fiercely compete for customers, market share, and long-term survival (Kyei and Bayoh 2017), customers tend to switch operators repeatedly (Kumar et al. 2018) due to the lower financial costs associated with switching service providers. Considering that attracting new customers is both difficult and expensive, it is suggested that retaining the most valuable existing customers, and hence avoiding churn, should be given a higher priority than trying to attract new customers (Ahn et al. 2006; Hadden, Tiwari, Roy and Ruta 2007).


### 1.2 PROBLEM STATEMENT
***
A telecommunications company seeks to identify the key factors that lead to customer attrition and to predict which customers are likely to churn in the future.

### 1.3 OBJECTIVES
***
#### 1.3.1 MAIN OBJECTIVES

The primary objective of this project is to develop a robust binary classification model capable of accurately predicting whether a customer will churn in the near future. This model aims to identify potential churners proactively, enabling targeted retention strategies and improving customer satisfaction and loyalty.

#### 1.3.2 SPECIFIC OBJECTIVES
1.  To identify the churn rate within the customer base
2.  To determine the demographic and service-related factors that are most strongly correlated with customer churn
3.  To build a predictive model to identify customers at high risk of churning
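
As a rough illustration of the first objective, the churn rate can be read as the share of customers whose `Customer Status` is `Churned`. The sketch below uses a small made-up frame rather than the real dataset, and the helper name `churn_rate` is an assumption introduced only for this example.

```python
import pandas as pd


def churn_rate(df: pd.DataFrame,
               status_col: str = "Customer Status",
               churn_label: str = "Churned") -> float:
    """Return the fraction of rows whose status equals the churn label."""
    return float((df[status_col] == churn_label).mean())


# Toy data for illustration only (not taken from the real dataset)
toy = pd.DataFrame(
    {"Customer Status": ["Stayed", "Churned", "Stayed", "Churned", "Stayed"]}
)

rate = churn_rate(toy)
print(f"Churn rate: {rate:.1%}")
```

On the real data the same call would be made on the loaded DataFrame, assuming the status labels match those shown in the dataset preview.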


### 1.4 SUCCESS METRICS
***


## DATA UNDERSTANDING
***

The table below summarizes the dataset columns and their descriptions:

| **Column Name**                       | **Description**                                                                                   |
|---------------------------------------|---------------------------------------------------------------------------------------------------|
| Customer ID                           | Unique identifier for each customer.                                                             |
| Gender                                | Demographic information about the customer (e.g., male, female).                                |
| Age                                   | Age of the customer.                                                                              |
| Married                               | Indicates whether the customer is married (yes/no).                                             |
| Number of Dependents                  | Number of dependents the customer has.                                                           |
| City                                  | The city where the customer resides.                                                              |
| Zip Code                              | Customer's postal code for more specific geographic analysis.                                     |
| Latitude                              | Geographic coordinate for the customer's location (for mapping).                                 |
| Longitude                             | Geographic coordinate for the customer's location (for mapping).                                 |
| Number of Referrals                   | Number of referrals made by the customer, indicating engagement.                                  |
| Tenure in Months                      | Duration of time the customer has been with the service, in months.                              |
| Offer                                 | Types of offers the customer has received (e.g., discounts, promotions).                        |
| Phone Service                         | Indicates if the customer uses phone services (yes/no).                                          |
| Avg Monthly Long Distance Charges      | Average monthly charges for long-distance calls.                                                  |
| Multiple Lines                        | Indicates whether the customer has multiple phone lines (yes/no).                                 |
| Internet Service                      | Type of internet service the customer has (e.g., DSL, fiber).                                    |
| Internet Type                         | Specific type of internet plan the customer subscribes to.                                        |
| Avg Monthly GB Download               | Average monthly data usage in gigabytes.                                                          |
| Online Security                       | Indicates if the customer has online security features (yes/no).                                  |
| Online Backup                         | Indicates if the customer has online backup services (yes/no).                                    |
| Device Protection Plan                | Indicates if the customer has a device protection plan (yes/no).                                  |
| Premium Tech Support                  | Availability of premium technical support (yes/no).                                              |
| Streaming TV                          | Indicates if the customer subscribes to streaming TV services (yes/no).                           |
| Streaming Movies                      | Indicates if the customer subscribes to streaming movie services (yes/no).                        |
| Streaming Music                       | Indicates if the customer subscribes to streaming music services (yes/no).                        |
| Unlimited Data                        | Indicates if the customer has an unlimited data plan (yes/no).                                    |
| Contract                              | Type of contract the customer is under (e.g., month-to-month, annual).                          |
| Paperless Billing                     | Indicates if the customer opts for paperless billing (yes/no).                                    |
| Payment Method                        | Method of payment used by the customer (e.g., Credit Card, Bank Withdrawal).                     |
| Monthly Charge                        | Monthly cost of the service.                                                                      |
| Total Charges                         | Total amount paid by the customer to date.                                                       |
| Total Refunds                         | Total refunds issued to the customer.                                                             |
| Total Extra Data Charges              | Total extra charges for exceeding data limits.                                                   |
| Total Long Distance Charges           | Total charges for long-distance calls to date.                                                   |
| Total Revenue                         | Total revenue generated from the customer.                                                        |
| Customer Status                       | Indicates the customer's current status (e.g., Stayed, Churned).                                  |
| Churn Category                        | Category of churn (e.g., Competitor, Dissatisfaction).                                            |
| Churn Reason                          | Specific reasons provided by the customer for churn.                                             |


### DATA PREPARATION
***
In summary, the following steps are carried out during data preparation, ahead of modeling in later stages:

 **1. Data Loading**      
- Load the Datasets  
- Inspect the Data

 **2. Data Cleaning**
- Validity Check
- Consistency Check
- Uniformity Check
- Completeness Check
  
 **3. Exploratory Data Analysis**
- Understand Data Distribution     
- Identify Relationships - Univariate and Bivariate Analysis
- Handle High Cardinality Columns

***
#### 1. DATA LOADING 
***
The following steps were carried out:
1. Loading the Datasets  
2. Inspecting the Data


In [1]:
import os 
import numpy as np
import pandas as pd
from classes import *

# Libraries for visualizations
import folium
import seaborn as sns
import plotly.express as px
from IPython.display import display, HTML
import matplotlib.pyplot as plt
%matplotlib inline

# libraries for Model Preprocessing
from sklearn.model_selection import train_test_split
from sklearn.preprocessing   import StandardScaler, OneHotEncoder, LabelEncoder, MinMaxScaler
from imblearn.over_sampling  import SMOTE

# Libraries for Modeling
from sklearn.linear_model    import LogisticRegression
from sklearn.naive_bayes     import MultinomialNB
from sklearn.tree            import DecisionTreeClassifier
from sklearn.neighbors       import KNeighborsClassifier 
from sklearn.ensemble        import GradientBoostingClassifier, RandomForestClassifier
from xgboost                 import XGBClassifier
from sklearn.model_selection import RandomizedSearchCV, cross_val_score
from sklearn.metrics         import accuracy_score, f1_score, make_scorer, confusion_matrix, ConfusionMatrixDisplay
 

In [18]:
# Instantiate the loader class
data_loader = DataLoader()

# Load the dataset
data = data_loader.read_data("customer_churn.csv")

# Instantiate the information class
information = DataInfo(data)

# Inspect the data
information.info()


Total Rows : 7043 
--------------------

Total Columns : 38 
--------------------

Column Names
--------------------
Index(['Customer ID', 'Gender', 'Age', 'Married', 'Number of Dependents',
       'City', 'Zip Code', 'Latitude', 'Longitude', 'Number of Referrals',
       'Tenure in Months', 'Offer', 'Phone Service',
       'Avg Monthly Long Distance Charges', 'Multiple Lines',
       'Internet Service', 'Internet Type', 'Avg Monthly GB Download',
       'Online Security', 'Online Backup', 'Device Protection Plan',
       'Premium Tech Support', 'Streaming TV', 'Streaming Movies',
       'Streaming Music', 'Unlimited Data', 'Contract', 'Paperless Billing',
       'Payment Method', 'Monthly Charge', 'Total Charges', 'Total Refunds',
       'Total Extra Data Charges', 'Total Long Distance Charges',
       'Total Revenue', 'Customer Status', 'Churn Category', 'Churn Reason'],
      dtype='object') 
 

Data Summary
--------------------
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 704

|  | Age | Number of Dependents | Zip Code | Latitude | Longitude | Number of Referrals | Tenure in Months | Avg Monthly Long Distance Charges | Avg Monthly GB Download | Monthly Charge | Total Charges | Total Refunds | Total Extra Data Charges | Total Long Distance Charges | Total Revenue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 7043.0 | 7043.0 | 7043.0 | 7043.0 | 7043.0 | 7043.0 | 7043.0 | 6361.0 | 5517.0 | 7043.0 | 7043.0 | 7043.0 | 7043.0 | 7043.0 | 7043.0 |
| mean | 46.509726 | 0.468692 | 93486.070567 | 36.197455 | -119.756684 | 1.951867 | 32.386767 | 25.420517 | 26.189958 | 63.596131 | 2280.381264 | 1.962182 | 6.860713 | 749.099262 | 3034.379056 |
| std | 16.750352 | 0.962802 | 1856.767505 | 2.468929 | 2.154425 | 3.001199 | 24.542061 | 14.200374 | 19.586585 | 31.204743 | 2266.220462 | 7.902614 | 25.104978 | 846.660055 | 2865.204542 |
| min | 19.0 | 0.0 | 90001.0 | 32.555828 | -124.301372 | 0.0 | 1.0 | 1.01 | 2.0 | -10.0 | 18.8 | 0.0 | 0.0 | 0.0 | 21.36 |
| 25% | 32.0 | 0.0 | 92101.0 | 33.990646 | -121.78809 | 0.0 | 9.0 | 13.05 | 13.0 | 30.4 | 400.15 | 0.0 | 0.0 | 70.545 | 605.61 |
| 50% | 46.0 | 0.0 | 93518.0 | 36.205465 | -119.595293 | 0.0 | 29.0 | 25.69 | 21.0 | 70.05 | 1394.55 | 0.0 | 0.0 | 401.44 | 2108.64 |
| 75% | 60.0 | 0.0 | 95329.0 | 38.161321 | -117.969795 | 3.0 | 55.0 | 37.68 | 30.0 | 89.75 | 3786.6 | 0.0 | 0.0 | 1191.1 | 4801.145 |
| max | 80.0 | 9.0 | 96150.0 | 41.962127 | -114.192901 | 11.0 | 72.0 | 49.99 | 85.0 | 118.75 | 8684.8 | 49.79 | 150.0 | 3564.72 | 11979.34 |



Dataset Overview
--------------------


|  | Customer ID | Gender | Age | Married | Number of Dependents | City | Zip Code | Latitude | Longitude | Number of Referrals | ... | Payment Method | Monthly Charge | Total Charges | Total Refunds | Total Extra Data Charges | Total Long Distance Charges | Total Revenue | Customer Status | Churn Category | Churn Reason |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0002-ORFBO | Female | 37 | Yes | 0 | Frazier Park | 93225 | 34.827662 | -118.999073 | 2 | ... | Credit Card | 65.6 | 593.3 | 0.0 | 0 | 381.51 | 974.81 | Stayed |  |  |
| 1 | 0003-MKNFE | Male | 46 | No | 0 | Glendale | 91206 | 34.162515 | -118.203869 | 0 | ... | Credit Card | -4.0 | 542.4 | 38.33 | 10 | 96.21 | 610.28 | Stayed |  |  |
| 2 | 0004-TLHLJ | Male | 50 | No | 0 | Costa Mesa | 92627 | 33.645672 | -117.922613 | 0 | ... | Bank Withdrawal | 73.9 | 280.85 | 0.0 | 0 | 134.6 | 415.45 | Churned | Competitor | Competitor had better devices |
| 3 | 0011-IGKFF | Male | 78 | Yes | 0 | Martinez | 94553 | 38.014457 | -122.115432 | 1 | ... | Bank Withdrawal | 98.0 | 1237.85 | 0.0 | 0 | 361.66 | 1599.51 | Churned | Dissatisfaction | Product dissatisfaction |
| 4 | 0013-EXCHZ | Female | 75 | Yes | 0 | Camarillo | 93010 | 34.227846 | -119.079903 | 3 | ... | Credit Card | 83.9 | 267.4 | 0.0 | 0 | 22.14 | 289.54 | Churned | Dissatisfaction | Network reliability |
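
The `classes` module that provides `DataLoader` and `DataInfo` is not shown in this notebook. The sketch below is one guess at what such helpers might look like, assuming they are thin wrappers around pandas. The method bodies are inferred from the printed output and are not the project's actual code.

```python
import io

import pandas as pd


class DataLoader:
    """Hypothetical loader: reads a CSV file into a DataFrame."""

    def read_data(self, path):
        return pd.read_csv(path)


class DataInfo:
    """Hypothetical inspector: prints shape, columns, summary and a preview."""

    def __init__(self, data):
        self.data = data

    def info(self):
        rows, cols = self.data.shape
        print(f"Total Rows : {rows}")
        print(f"Total Columns : {cols}")
        print("Column Names")
        print(self.data.columns)
        print("Data Summary")
        self.data.info()             # dtypes and non-null counts
        print(self.data.describe())  # numeric summary statistics
        print("Dataset Overview")
        print(self.data.head())      # first five rows


# Usage with an in-memory CSV so the example runs without the real file
csv_text = "Customer ID,Age\n0002-ORFBO,37\n0003-MKNFE,46\n"
demo = DataLoader().read_data(io.StringIO(csv_text))
DataInfo(demo).info()
```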


***
**Initial Observations**
***
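>- The dataset contains 7,043 rows and 38 columns
>- Null values are present and need to be investigated (for example, `Avg Monthly Long Distance Charges` has 6,361 non-null values and `Avg Monthly GB Download` has 5,517)
>- `Monthly Charge` has a minimum of -10.0, so negative values need to be checked
>- The data types need to be confirmed as appropriate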
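
A minimal sketch of how the null values and data types could be inspected with pandas is given below. It runs on a toy frame, and the helper name `missing_summary` is an assumption made for this example.

```python
import numpy as np
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values, for columns that have any."""
    counts = df.isna().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(2),
    })
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


# Toy data for illustration only
toy = pd.DataFrame({
    "Age": [37, 46, 50, 78],
    "Avg Monthly GB Download": [16.0, np.nan, 30.0, np.nan],
    "Churn Reason": [None, None, "Competitor had better devices", "Product dissatisfaction"],
})

report = missing_summary(toy)
print(report)
print(toy.dtypes)  # confirm each column has the expected type
```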

***
### DATA CLEANING
***

Data cleaning shall be carried out in the following steps:
1. Validity Check
2. Consistency Check
3. Uniformity Check
4. Completeness Check
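
The four checks are not defined in detail at this point, so the sketch below shows one possible reading of each, on toy data. The column choices and rules (non-negative charges, unique customer IDs, normalized category spelling) are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Toy data for illustration only
toy = pd.DataFrame({
    "Customer ID": ["0002-ORFBO", "0003-MKNFE", "0003-MKNFE", "0011-IGKFF"],
    "Gender": ["Female", "male ", "male ", "Male"],
    "Monthly Charge": [65.6, -4.0, -4.0, 98.0],
    "Avg Monthly GB Download": [16.0, 10.0, 10.0, np.nan],
})

# Validity: values should fall in a plausible range (charges should not be negative)
invalid_charges = toy[toy["Monthly Charge"] < 0]

# Consistency: each customer identifier should appear only once
duplicate_ids = toy[toy.duplicated(subset="Customer ID", keep=False)]

# Uniformity: one spelling per category (trim whitespace, normalize case)
toy["Gender"] = toy["Gender"].str.strip().str.title()

# Completeness: count missing values per column
missing = toy.isna().sum()

print(len(invalid_charges), len(duplicate_ids))
print(sorted(toy["Gender"].unique()))
print(missing)
```

How flagged rows are treated (dropped, corrected, or imputed) is a separate decision that depends on what each check reveals in the real data.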