<img src="images/upGrad.png" alt="upGrad" align="Right" style="width: 200px;"/>
<img src="images/IIITB.jpeg" alt="IITB" align="Left" style="width: 200px;"/>

# Telecom Churn Case Study

- <b>Authors:</b> Karthik Premanand, Anish Mahapatra
- <b>Email-id:</b> karthikprem26@gmail.com, anishmahapatra01@gmail.com

<i>Machine Learning II > Group Case Study 1 </i>

# Problem Statement
The telecommunications industry on average experiences a churn rate of 15-25%. It costs 5-10 times more to acquire a new customer than to retain an existing one, so customer retention has become more important than customer acquisition. To reduce customer churn, telecom companies need to predict which customers are at a high risk of churn.

The analysis uses customer-level data from a leading telecom firm to build predictive models that identify customers at a high risk of churn and the main indicators of churn.

#### Methods of defining churn:
- Postpaid: Customers directly inform the operator
- Prepaid: Churn prediction is usually more critical (and non-trivial) for prepaid customers, and the term ‘churn’ should be defined carefully

- Revenue-Based churn:  Customers who have not utilised any revenue-generating facilities such as mobile internet, outgoing calls, SMS etc. over a given period of time.

The main shortcoming of this definition is that there are customers who only receive calls/SMSes from their wage-earning counterparts, i.e. they don’t generate revenue but use the services. For example, many users in rural areas only receive calls from their wage-earning siblings in urban areas.

- Usage-based churn: Customers who have recorded no usage, either incoming or outgoing (calls, internet, etc.), over a period of time.

A potential shortcoming of this definition is that by the time a customer has stopped using the services for a while, it may be too late to take any corrective action to retain them. For example, if you define churn based on a ‘two-months zero usage’ period, predicting churn could be useless, since by that time the customer would already have switched to another operator.

In this project, we will use the <b>usage-based</b> definition to define churn.
    
#### High-value Churn
    
In the Indian and Southeast Asian markets, approximately 80% of revenue comes from the top 20% of customers (called high-value customers). Thus, if we can reduce churn among the high-value customers, we will be able to reduce significant revenue leakage.

In this project, you will define high-value customers and predict churn only on high-value customers.

#### Understanding Customer Behaviour During Churn
Customers usually do not decide to switch to a competitor instantly, but rather over a period of time (this is especially applicable to high-value customers). In churn prediction, we assume that there are three phases of the customer lifecycle:

- The ‘good’ phase: In this phase, the customer is happy with the service and behaves as usual.

- The ‘action’ phase: The customer experience starts to sour in this phase, e.g. he/she gets a compelling offer from a competitor, faces unjust charges, becomes unhappy with service quality etc. In this phase, the customer usually shows different behaviour than in the ‘good’ months. It is crucial to identify high-churn-risk customers in this phase, since corrective actions can still be taken at this point (such as matching the competitor’s offer or improving the service quality).

- The ‘churn’ phase: In this phase, the customer is said to have churned. You define churn based on this phase. Also, it is important to note that at the time of prediction (i.e. the action months), this data is not available to you for prediction. Thus, after tagging churn as 1/0 based on this phase, you discard all data corresponding to this phase.


#### Data Preparation:
1. Derive new features using business logic
2. Filter high-value customers: Those who have recharged with an amount more than or equal to X, where X is the 70th percentile of the average recharge amount in the first two months (the good phase).
After filtering the high-value customers, you should get about 29.9k rows.
3. Tag churners and remove attributes of the churn phase: 
Now tag the churned customers (churn=1, else 0) based on the fourth month as follows: Those who have not made any calls (either incoming or outgoing) AND have not used mobile internet even once in the churn phase. The attributes you need to use to tag churners are:
- total_ic_mou_9
- total_og_mou_9
- vol_2g_mb_9
- vol_3g_mb_9

After tagging churners, remove all the attributes corresponding to the churn phase (all attributes having ‘_9’, etc. in their names).
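
The tagging rule above can be sketched in pandas as follows. This is a minimal illustration on a made-up three-row frame: only the four `_9` usage columns are taken from the text, while `total_ic_mou_8` and all the values are invented for the example. It also assumes missing values in those columns have already been handled, since a missing value is not treated as zero here.

```python
import pandas as pd

# Made-up sample; only the four *_9 column names come from the problem statement.
df = pd.DataFrame({
    "total_ic_mou_8": [120.0, 15.0, 60.0],
    "total_ic_mou_9": [0.0, 35.5, 0.0],
    "total_og_mou_9": [0.0, 10.0, 0.0],
    "vol_2g_mb_9": [0.0, 0.0, 12.3],
    "vol_3g_mb_9": [0.0, 0.0, 0.0],
})

churn_cols = ["total_ic_mou_9", "total_og_mou_9", "vol_2g_mb_9", "vol_3g_mb_9"]

# churn = 1 only when every churn-phase usage column is exactly zero
# (no calls in or out AND no 2G/3G data).
df["churn"] = (df[churn_cols] == 0).all(axis=1).astype(int)

# Drop every churn-phase attribute (suffix "_9") so it cannot leak into the model.
churn_phase_cols = [c for c in df.columns if c.endswith("_9")]
df = df.drop(columns=churn_phase_cols)

print(df)
```

Any churn-phase column that does not follow the `_9` suffix convention would have to be added to the drop list by hand.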

#### Modelling:
Build models to predict churn. The predictive model that you’re going to build will serve two purposes:

1. It will be used to predict whether a high-value customer will churn or not in the near future (i.e. the churn phase). By knowing this, the company can take action steps such as providing special plans, discounts on recharge etc. (PCA + Class-Imbalance + Classification)

2. It will be used to identify important variables that are strong predictors of churn. These variables may also indicate why customers choose to switch to other networks.

You can take the following suggestive steps to build the model:

- Preprocess data (convert columns to appropriate formats, handle missing values, etc.)

- Conduct appropriate exploratory analysis to extract useful insights (whether directly useful for business or for eventual modelling/feature engineering).

- Derive new features.

- Reduce the number of variables using PCA.

- Train a variety of models, tune model hyperparameters, etc. (handle class imbalance using appropriate techniques).

- Evaluate the models using appropriate evaluation metrics. Note that it is more important to identify churners than the non-churners accurately - choose an appropriate evaluation metric which reflects this business goal.

- Finally, choose a model based on some evaluation metric.

Build another model with the main objective of identifying important predictor attributes which help the business understand indicators of churn. A good choice to identify important variables is a logistic regression model or a model from the tree family. In case of logistic regression, make sure to handle multi-collinearity.

After identifying important predictors, display them visually - you can use plots, summary tables etc. - whatever you think best conveys the importance of features.

Finally, recommend strategies to manage customer churn based on your observations.
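
As a rough illustration of the first modelling purpose (PCA + class imbalance + classification), the sketch below chains standardisation, PCA and a class-weighted logistic regression in scikit-learn and reports recall on the positive class. The synthetic data from `make_classification`, the 90% explained-variance target and the 70/30 split are assumptions made for the example, not choices taken from this case study.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the churn data: roughly 10% positives ("churners").
X, y = make_classification(n_samples=2000, n_features=30, n_informative=8,
                           weights=[0.9, 0.1], random_state=42)

# Stratify so both splits keep the same churn proportion.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

model = make_pipeline(
    StandardScaler(),                             # PCA is scale-sensitive
    PCA(n_components=0.90, svd_solver="full"),    # keep ~90% of the variance
    LogisticRegression(class_weight="balanced", max_iter=1000),
)
model.fit(X_train, y_train)

# Recall on the positive class is the sensitivity: the share of churners caught.
sensitivity = recall_score(y_test, model.predict(X_test))
print("components kept:", model.named_steps["pca"].n_components_)
print(f"sensitivity: {sensitivity:.3f}")
```

Putting the scaler and PCA inside the pipeline means they are fitted on the training split only, so nothing from the test split influences the components.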

### Goals of the Case Study:
The business objective is to predict the churn in the last (i.e. the ninth) month using the data (features) from the first three months. To do this task well, understanding the typical customer behaviour during churn will be helpful.


## Evaluation Rubric:
- Data Quality Checks - missing value imputation, removing duplicate data, data redundancies
- Data Quality Issues mentioned and clearly explained in comments
- EDA using plots and summaries
- Filter high-value customers
- Feature engineering is done rigorously and correctly

#### 1. Data Cleaning
- Handle actual missing values
- Find percentage of missing values
- Drop columns that have a high percentage of missing values
- Drop columns that are highly skewed
- Impute data wherever required
- Check if target variable (Sales price) is normally distributed or not. If it is not normally distributed, perform transformation to make it normally distributed
- Create dummy variables from categorical columns
- Convert year column to find age by using Age = max(YearColumn) - year in the row
- Scale the data
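
The missing-value steps in this checklist could look roughly like the pandas sketch below. The toy frame, the 50% drop threshold and the median imputation are all assumptions for illustration. The notebook may well use a different cut-off or one of the more advanced imputers.

```python
import numpy as np
import pandas as pd

# Toy frame with hypothetical columns a, b, c.
df = pd.DataFrame({
    "a": [1.0, 2.0, np.nan, 4.0],
    "b": [np.nan, np.nan, np.nan, 1.0],
    "c": [5.0, 6.0, 7.0, 8.0],
})

# Percentage of missing values per column.
missing_pct = df.isnull().mean() * 100
print(missing_pct)

# Drop columns whose missing share exceeds the (assumed) threshold.
threshold = 50
df = df.drop(columns=missing_pct[missing_pct > threshold].index)

# Impute what remains with the column median, a simple stand-in
# for more advanced imputers.
df = df.fillna(df.median())
print(df)
```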

#### 2. Modeling
- PCA or some sort of dimensionality reduction technique is used, including the data preparations required for it
- Class imbalance is handled for at least one of the techniques
- Model hyperparameters are tuned using correct principles and the approach is explained clearly
- A reasonable number and variety of different models are attempted and the best one is chosen based on key performance metrics
- Model evaluation is conducted using an appropriate metric
- Model evaluation results are at par with the best possible models on this data set

#### 3.  Factors affecting Customer Churn and Business Recommendations
- Important churn indicators are identified correctly
- Clear actionable recommendations are provided based on supporting evidence

#### Additional Notes:

- Data cleaning using advanced techniques such as iterative imputer, KNN imputer
- Filter HVC. Use month 6 and month 7 data. Make 3 derived columns:

total_rech_6 = total_calling_6 + total_data_6

total_rech_7 = total_calling_7 + total_data_7

avg_rech_6_7 = (total_rech_6 + total_rech_7) / 2

Find the 70th percentile of avg_rech_6_7 and keep the customers at or above it. The number of rows will be around 30k.

- Derive churn using month 9. If a customer has calling = 0 and data = 0, then churn = 1.

If calling_minutes_9 = calling_minutes_incoming_9 + calling_minutes_outgoing_9 is zero AND

data_usage = data_usage_2g_9 + data_usage_3g_9 is zero, then the customer has churned.

Remove all the columns corresponding to month 9.
- EDA, Train-test split, Outlier Treatment
- Class Imbalance Techniques - SMOTE, Weight of Class
- Modelling:
    - Interpretable model: 1 technique -> Random Forest, Logistic Regression + RFE
    - Model with good prediction: with PCA -> at least 3 different techniques - Logistic Regression, Regularized Regression, Decision Trees, Random Forest, SVM
    - Perform hyper-parameter tuning
- Look at sensitivity and make sure that Type 2 error is low
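
To make the last point concrete: sensitivity is TP / (TP + FN), and a Type 2 error is a false negative, i.e. a churner predicted as a non-churner. The helper below is a small pure-Python sketch. The function name and the sample labels are made up for illustration.

```python
def sensitivity_report(y_true, y_pred):
    """Count churners caught (TP) and missed (FN) and derive the rates."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    positives = tp + fn
    if positives == 0:
        raise ValueError("y_true contains no churners")
    return {
        "tp": tp,
        "fn": fn,
        "sensitivity": tp / positives,   # share of churners caught
        "type2_rate": fn / positives,    # share of churners missed
    }

# Four actual churners, of which three are caught and one is missed.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0, 0, 0]
report = sensitivity_report(y_true, y_pred)
print(report)
```

The two rates always sum to 1, so pushing sensitivity up is the same thing as pushing the Type 2 error rate down.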

### Purpose of the notebook

The purpose of the notebook is as follows:
1. Predict the churn of the customer in the 9th month
2. Identify the factors that affect the churn of the customer

The following steps are to be followed to perform the analysis:

- [#1 Data load, importing libraries & Sense Check of Data](#1)
- [#2 Data Cleaning, Missing Value Treatment](#2)
- [#3 Subsetting High-Value Customers using 70 percentile value](#3)
- [#4 Feature Engineering and derived columns](#4)
- [#5 Obtaining Customer Churn (1 - Churn, 0 - No Churn)](#5)
- [#6 Exploratory Data Analysis (EDA)](#6)
    - [#6.1 Univariate and Bivariate Analysis of Columns](#6.1)
    - [#6.2 Outlier Analysis of the Data](#6.2)
- [#7 Handling Class Imbalance in the data - SMOTE, Weight of Class](#7)
- [#8 Modelling - Part 1: Obtaining best churn classification](#8)
    - [#8.1 PCA](#8.1)
        - [#8.1.1 Standardizing the data](#8.1)
        - [#8.1.2 PCA on the dataset](#8.1)
        - [#8.1.3 Scree Plot](#8.1)
        - [#8.1.4 Plotting the Heatmap](#8.1)
        - [#8.1.5 Hopkins Statistic](#8.1)
        - [#8.1.6 Silhouette Score Plot](#8.1)
        - [#8.1.7 Elbow Curve](#8.1)
        - [#8.1.8 Test-Train Split](#8.1)
    - [#8.2 Algorithm 1: Logistic Regression + RFE](#8.2)
        - [#8.2.1 Recursive Feature Elimination](#8.2)
        - [#8.2.2 Model Iterations](#8.2)
        - [#8.2.3 Variance Inflation Factor (VIF)](#8.2)
        - [#8.2.4 Residual Analysis of Train Data](#8.2)
        - [#8.2.5 Making Predictions ](#8.2)
        - [#8.2.6 Analysis of Results - Sensitivity and Type 2 Error](#8.2)
    - [#8.3 Algorithm 2: Regularized Regression (Advanced Regression)](#8.3)
        - [#8.3.1 Analysis of Feature to be predicted](#8.3.1)
        - [#8.3.2 Ridge Regression](#8.3.2)
        - [#8.3.3 Lasso Regression](#8.3.3)
        - [#8.3.4 Analysis of Results - Sensitivity and Type 2 Error](#8.3.4)
    - [#8.4 Algorithm 3: Decision Trees](#8.4)
        - [#8.4.1 Decision Tree with default parameters ](#8.4)
        - [#8.4.2 Plotting the Decision Tree ](#8.4)
        - [#8.4.3 Hyperparameter Tuning](#8.4)
            - [#8.4.3.1 Tuning max_depth](#8.4)
            - [#8.4.3.2 Tuning min_samples_leaf ](#8.4)
            - [#8.4.3.3 Tuning min_samples_split ](#8.4)
        - [#8.4.4 Grid Search to find Optimal Hyperparameters](#8.4)
        - [#8.4.5 Running the model with best parameters obtained from Grid Search](#8.4)
        - [#8.4.6 Plotting the decision tree](#8.4)
- [#9 Modelling - Part 2: Interpretable Results](#9)
    - [#9.1 Logistic Regression + RFE](#9.1)
        - [#9.2.1 Recursive Feature Elimination](#9.2)
        - [#9.2.2 Model Iterations](#9.2)
        - [#9.2.3 Variance Inflation Factor (VIF)](#9.2)
        - [#9.2.4 Residual Analysis of Train Data](#9.2)
        - [#9.2.5 Making Predictions ](#9.2)
        - [#9.2.6 Analysis of Results - Sensitivity and Type 2 Error](#9.2)
- [#10 Model Output Discussion](#10)
- [#11 Outputs of the Analysis](#11)
- [#12 Business Recommendations to reduce churn](#12)


<a id='1'></a>
## #1 Data load, importing libraries & Sense Check of Data

<a id='2'></a>
## #2 Data Cleaning, Missing Value Treatment

<a id='3'></a>
## #3 Subsetting High-Value Customers using 70 percentile value
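
A minimal sketch of the high-value filter described in the Additional Notes, assuming the placeholder column names used there (`total_calling_6`, `total_data_6`, and so on) and a ten-row made-up sample. The real dataset's recharge columns are likely named differently, so treat the names as hypothetical.

```python
import pandas as pd

# Made-up recharge amounts for months 6 and 7; column names follow the notes.
df = pd.DataFrame({
    "total_calling_6": [100, 500, 50, 900, 300, 0, 250, 700, 120, 60],
    "total_data_6":    [0, 200, 0, 100, 0, 0, 50, 300, 0, 0],
    "total_calling_7": [120, 450, 40, 800, 350, 10, 200, 650, 100, 80],
    "total_data_7":    [0, 250, 0, 150, 0, 0, 0, 350, 0, 0],
})

# Total recharge per month, then the average over the two good-phase months.
df["total_rech_6"] = df["total_calling_6"] + df["total_data_6"]
df["total_rech_7"] = df["total_calling_7"] + df["total_data_7"]
df["avg_rech_6_7"] = (df["total_rech_6"] + df["total_rech_7"]) / 2

# Keep customers at or above the 70th percentile of the average recharge.
cutoff = df["avg_rech_6_7"].quantile(0.70)
hvc = df[df["avg_rech_6_7"] >= cutoff].copy()

print("cutoff:", cutoff)
print(hvc[["avg_rech_6_7"]])
```

The comparison uses `>=` to match the "more than or equal to X" wording in the Data Preparation steps. A strict `>` would drop customers sitting exactly on the cutoff.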

<a id='4'></a>
## #4 Feature Engineering and derived columns

<a id='5'></a>
## #5 Obtaining Customer Churn (1 - Churn, 0 - No Churn)

<a id='6'></a>
## #6 Exploratory Data Analysis (EDA)

<a id='6.1'></a>
### #6.1 Univariate and Bivariate Analysis of Columns

<a id='6.2'></a>
### #6.2 Outlier Analysis of the Data

<a id='7'></a>
## #7 Handling Class Imbalance in the data - SMOTE, Weight of Class
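
Of the two options named in the heading, class weighting is the simpler to show. The sketch below computes "balanced" weights as n_samples / (n_classes * class_count), the same formula scikit-learn applies for `class_weight="balanced"`. The helper name and the 90/10 label split are assumptions for illustration, and SMOTE, which generates synthetic minority samples instead, is not shown.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class inversely to its frequency."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}

# Hypothetical labels: 90 non-churners (0) and 10 churners (1).
y = [0] * 90 + [1] * 10
weights = balanced_class_weights(y)
print(weights)
```

With these weights a misclassified churner costs the model nine times as much as a misclassified non-churner, which pushes the fit towards higher sensitivity.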

<a id='8'></a>
## #8 Modelling - Part 1: Obtaining best churn classification

<a id='8.1'></a>
## #8.1 PCA

### #8.1.1 Standardizing the data

### #8.1.2 PCA on the dataset


### #8.1.3 Scree Plot


### #8.1.4 Plotting the Heatmap


### #8.1.5 Hopkins Statistic


### #8.1.6 Silhouette Score Plot


### #8.1.7 Elbow Curve


### #8.1.8 Test-Train Split


<a id='8.2'></a>
## #8.2 Algorithm 1: Logistic Regression + RFE


### #8.2.1 Recursive Feature Elimination


### #8.2.2 Model Iterations


### #8.2.3 Variance Inflation Factor (VIF)
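
The VIF of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on all the others. It equals the corresponding diagonal entry of the inverse correlation matrix. The NumPy sketch below uses that identity on synthetic data. The `vif` helper and the three columns are made up for illustration and are not the notebook's actual implementation.

```python
import numpy as np

def vif(X):
    """Return the VIF of each column of X (rows = samples)."""
    corr = np.corrcoef(X, rowvar=False)      # predictor correlation matrix
    return np.diag(np.linalg.inv(corr))      # diagonal of its inverse

rng = np.random.default_rng(0)
x1 = rng.normal(size=500)
x2 = rng.normal(size=500)                    # independent of x1
x3 = x1 + 0.1 * rng.normal(size=500)         # nearly a copy of x1
X = np.column_stack([x1, x2, x3])

scores = vif(X)
print(np.round(scores, 2))
```

Here `x1` and `x3` get very large VIFs because each is almost fully explained by the other, while `x2` stays close to 1. A common rule of thumb is to drop or combine predictors whose VIF is well above 5 to 10.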


### #8.2.4 Residual Analysis of Train Data


### #8.2.5 Making Predictions


### #8.2.6 Analysis of Results - Sensitivity and Type 2 Error

<a id='8.3'></a>
## #8.3 Algorithm 2: Regularized Regression (Advanced Regression)

### #8.3.1 Analysis of Feature to be predicted

### #8.3.2 Ridge Regression

### #8.3.3 Lasso Regression

### #8.3.4 Analysis of Results - Sensitivity and Type 2 Error

<a id='8.4'></a>
## #8.4 Algorithm 3: Decision Trees


### #8.4.1 Decision Tree with default parameters


### #8.4.2 Plotting the Decision Tree


### #8.4.3 Hyperparameter Tuning


#### #8.4.3.1 Tuning max_depth


#### #8.4.3.2 Tuning min_samples_leaf


#### #8.4.3.3 Tuning min_samples_split 


### #8.4.4 Grid Search to find Optimal Hyperparameters
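
A hedged sketch of a grid search over the three hyperparameters tuned above (`max_depth`, `min_samples_leaf`, `min_samples_split`), scored on recall to reflect the focus on catching churners. The synthetic data and the candidate values in the grid are assumptions for the example, not the values used in this analysis.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the churn data: roughly 10% positives.
X, y = make_classification(n_samples=1000, n_features=20, n_informative=6,
                           weights=[0.9, 0.1], random_state=0)

# Hypothetical candidate values for each hyperparameter.
param_grid = {
    "max_depth": [3, 5, 10],
    "min_samples_leaf": [20, 50],
    "min_samples_split": [50, 100],
}

search = GridSearchCV(
    DecisionTreeClassifier(class_weight="balanced", random_state=0),
    param_grid,
    scoring="recall",   # optimise for sensitivity, not accuracy
    cv=5,               # stratified 5-fold for classifiers
)
search.fit(X, y)

print("best params:", search.best_params_)
print("best CV recall:", round(search.best_score_, 3))
```

After fitting, `search.best_estimator_` holds a tree refitted on all the data with the winning combination.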


### #8.4.5 Running the model with best parameters obtained from Grid Search


### #8.4.6 Plotting the decision tree

<a id='9'></a>
## #9 Modelling - Part 2: Interpretable Results


### #9.1 Logistic Regression + RFE


### #9.2.1 Recursive Feature Elimination


### #9.2.2 Model Iterations


### #9.2.3 Variance Inflation Factor (VIF)


### #9.2.4 Residual Analysis of Train Data


### #9.2.5 Making Predictions


### #9.2.6 Analysis of Results - Sensitivity and Type 2 Error

<a id='10'></a>
## #10 Model Output Discussion

<a id='11'></a>
## #11 Outputs of the Analysis

<a id='12'></a>
## #12 Business Recommendations to reduce churn