# Understanding Telco Customer Churn

### A few definitions
- [Customer attrition](https://en.wikipedia.org/wiki/Customer_attrition#:~:text=Customer%20attrition%2C%20also%20known%20as,loss%20of%20clients%20or%20customers.&text=Gross%20attrition%20is%20the%20loss,services%20during%20a%20particular%20period.), also known as customer churn, is the loss of customers or subscriptions to goods/services by a business over a given period of time.
- Customer attrition rate is the number of customers lost during the period divided by the number of customers the business had at the start of the period.
- Gross attrition is the loss of existing customers and their recurring revenue for contracted goods or services during the period
- Net attrition is the loss of customers after factoring in the gain of new customers within the same group and location
- Monthly Recurring Revenue (MRR) is the recurring revenue expected on a monthly basis for the subscribed goods/services
- Net Revenue Retention (NRR) measures the change in the MRR over the period (includes benefits from expansion)
- Gross Revenue Retention (GRR) measures the share of recurring revenue retained from the business' existing customer base over the period, typically a year (excludes benefits from expansion)
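The revenue definitions above reduce to a few ratios. A minimal Python sketch, assuming a single period and splitting lost MRR into churn and downgrades ("contraction", a term not used above); the function names and all figures are hypothetical:

```python
def attrition_rate(customers_at_start: int, customers_lost: int) -> float:
    """Share of the starting customer base lost during the period."""
    return customers_lost / customers_at_start


def gross_revenue_retention(mrr_start: float, churned_mrr: float, contraction_mrr: float) -> float:
    """GRR: recurring revenue kept from existing customers, ignoring any expansion."""
    return (mrr_start - churned_mrr - contraction_mrr) / mrr_start


def net_revenue_retention(
    mrr_start: float, churned_mrr: float, contraction_mrr: float, expansion_mrr: float
) -> float:
    """NRR: like GRR, but expansion revenue (upsells) is added back."""
    return (mrr_start - churned_mrr - contraction_mrr + expansion_mrr) / mrr_start


# Hypothetical month: 1,000 customers and 10,000 of MRR at the start
print(attrition_rate(1_000, 80))                       # 0.08
print(gross_revenue_retention(10_000, 800, 200))       # 0.9
print(net_revenue_retention(10_000, 800, 200, 1_500))  # 1.05
```

With non-negative inputs GRR can never exceed 1, whereas an NRR above 1 means expansion more than offsets the revenue lost.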

### Introduction

Churn is a critical metric for subscription and SaaS companies, as it tells us how departing customers affect the company's monthly revenue and growth, and consequently investors' confidence in the company.  

The GRR is somewhat like a happiness indicator for existing customers. A high GRR shows that the company has high retention rates: customers are happy with the services/products they are provided with, and investors are reassured by this stability.  
If a company has a high GRR and an even higher NRR, it shows that on top of retaining existing customers, the company has grown its customer base further.  
A high NRR coupled with a low GRR implies that although the company has acquired many new customers, it has low retention rates.  
So even if some revenue is left over after the churn, there is a high risk that the new customers will churn too, and the growth of the company becomes less predictable.  

The telecommunication industry is highly sensitive to customer churn as technology advances and users' behaviour changes:
- with Mobile Number Portability (MNP), customers can easily switch to another provider while preserving their number
- OTT players such as Netflix, Amazon Prime Video and Disney+ are bypassing traditional operators' networks such as cable, broadcast and satellite television
- OTT applications such as WhatsApp, Google Hangouts and Skype are cannibalizing paid voice and messaging services
- customers are less willing to be bound by handset contracts as new models are released so frequently

In this notebook, we will look at the customer churn in the telecommunication sector.  
Using the [Telco Customer Churn data](https://www.kaggle.com/blastchar/telco-customer-churn) from Kaggle, we explore the accuracy of 4 machine learning algorithms against the actual churn in the past month:  
- Logistic Regression Prediction
- Logistic Regression (SMOTE) Prediction
- Naive Bayes Prediction
- SVM Classifier Linear Prediction

Note: we train the models with last month's churn data using the algorithm provided in [Telecom Customer Churn Prediction](https://www.kaggle.com/pavanraj159/telecom-customer-churn-prediction).  
We then compare each model's predictions against this same data set to assess their accuracy.

Assuming that we wish to retain 90% NRR for this particular telco, we use atoti to explore the impact of each model on:
- Predicted revenue loss
- Number of customers to retain
- Expense spent to retain or replace customers

Finally, we use what-if simulation to see how the above will change when we change:
- the target NRR
- the budget spent on customer retention or replacement

### Things to install
pip install imbalanced-learn  
pip install scikit-learn

In [1]:
!pip install imbalanced-learn scikit-learn

Successfully installed imbalanced-learn-0.7.0 imblearn-0.0 joblib-0.16.0 scikit-learn-0.23.1 scipy-1.5.1 sklearn-0.0 threadpoolctl-2.1.0


In [2]:
import atoti as tt
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

from _utils import data_utils, prediction

Welcome to atoti 0.4.2!



# Data preparation

Using the [Telco Customer Churn data from Kaggle](https://www.kaggle.com/blastchar/telco-customer-churn), we clean up the data as demonstrated in [Telecom Customer Churn Prediction](https://www.kaggle.com/pavanraj159/telecom-customer-churn-prediction).

In [3]:
telcom = pd.read_csv(
    "https://data.atoti.io/notebooks/telco-churn/WA_Fn-UseC_-Telco-Customer-Churn.csv"
)
# perform data clean up
telcom = data_utils.data_manipulation(telcom)

We create a few new columns in preparation for the machine learning output.  
In the actual churn data, `ChurnProbability` is fixed because the outcome is already known, so we give it the value 1.  
The same goes for `PredictionAccuracy`. In this base scenario, `ChurnPredicted` is simply the actual `Churn`.

In [4]:
# the statistics are based on the previous month, so the Churn/Non Churn probability is fixed at 1
telcom["ChurnProbability"] = 1.0
telcom["PredictionAccuracy"] = 1.0
telcom["ChurnPredicted"] = telcom["Churn"]

telcom.head()

| | CustomerID | Gender | SeniorCitizen | Partner | Dependents | Tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | ... | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | Churn | TenureGroup | ChurnProbability | PredictionAccuracy | ChurnPredicted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 7590-VHVEG | Female | No | Yes | No | 1 | No | No phone service | DSL | No | ... | Month-to-month | Yes | Electronic check | 29.85 | 29.85 | No | Tenure_0-12 | 1.0 | 1.0 | No |
| 1 | 5575-GNVDE | Male | No | No | No | 34 | Yes | No | DSL | Yes | ... | One year | No | Mailed check | 56.95 | 1889.5 | No | Tenure_24-48 | 1.0 | 1.0 | No |
| 2 | 3668-QPYBK | Male | No | No | No | 2 | Yes | No | DSL | Yes | ... | Month-to-month | Yes | Mailed check | 53.85 | 108.15 | Yes | Tenure_0-12 | 1.0 | 1.0 | Yes |
| 3 | 7795-CFOCW | Male | No | No | No | 45 | No | No phone service | DSL | Yes | ... | One year | No | Bank transfer (automatic) | 42.3 | 1840.75 | No | Tenure_24-48 | 1.0 | 1.0 | No |
| 4 | 9237-HQITU | Female | No | No | No | 2 | Yes | No | Fiber optic | No | ... | Month-to-month | Yes | Electronic check | 70.7 | 151.65 | Yes | Tenure_0-12 | 1.0 | 1.0 | Yes |


The function below, which we apply to the predictions of each model, shows how the 3 new columns above are populated.  

We assign each model's predictions to `ChurnPredicted`.  \
If churn is predicted, we assign the corresponding probability from the model to `ChurnProbability`; otherwise it is set to 0.  \
This is because we are only interested in cases where customers are churning.  

We convert the binary values of `ChurnPredicted` to 'Yes' and 'No' so that we can compare them against the actual `Churn`.  \
If the prediction matches the actual churn, we assign the value 1 to its `PredictionAccuracy`, and 0 otherwise.  

These are in preparation for our model comparison against the actual churned cases.

In [5]:
# we run the same month data against the trained models and see its accuracy against the actual churn
def model_scenario(predictions, probabilities):
    churnProbability = np.amax(probabilities, axis=1)
    churn_forecast = (telcom.copy()).drop(
        ["ChurnPredicted", "ChurnProbability", "PredictionAccuracy"], axis=1
    )

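    # index=churn_forecast.index aligns the new columns by position with the rows of telcom
    churn_forecast = pd.concat(
        [
            churn_forecast,
            pd.DataFrame(
                {"ChurnPredicted": predictions, "ChurnProbability": churnProbability},
                index=churn_forecast.index,
            ),
        ],
        axis=1,
    )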

    # we are not interested in the probability if it is predicted that the client will not churn
    churn_forecast["ChurnProbability"] = np.where(
        churn_forecast["ChurnPredicted"] == 1, churn_forecast["ChurnProbability"], 0
    )

    churn_forecast["ChurnPredicted"] = np.where(
        churn_forecast["ChurnPredicted"] == 1, "Yes", "No"
    )

    churn_forecast["PredictionAccuracy"] = (
        churn_forecast["ChurnPredicted"] == churn_forecast["Churn"]
    ).astype(int)

    return churn_forecast
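To make the logic of `model_scenario` concrete, here is the same sequence of NumPy operations applied to a hypothetical three-customer model output, without the DataFrame plumbing. The arrays are made up; only the use of `np.amax` and `np.where` mirrors the function above:

```python
import numpy as np

# Hypothetical output for three customers: column 0 = P(no churn), column 1 = P(churn)
predictions = np.array([1, 0, 1])
probabilities = np.array([[0.30, 0.70],
                          [0.80, 0.20],
                          [0.45, 0.55]])
actual_churn = np.array(["Yes", "No", "No"])

# Probability of the predicted class, kept only when churn is predicted
churn_probability = np.where(predictions == 1, np.amax(probabilities, axis=1), 0)
# Binary predictions mapped to the same labels as the actual Churn column
churn_predicted = np.where(predictions == 1, "Yes", "No")
# 1 when the prediction matches the actual outcome, 0 otherwise
prediction_accuracy = (churn_predicted == actual_churn).astype(int)

print(churn_predicted)      # ['Yes' 'No' 'Yes']
print(prediction_accuracy)  # [1 1 0]
```

The third customer is a false positive: churn is predicted with probability 0.55, so that probability is kept, but the prediction is scored as inaccurate.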

## atoti cube creation  

We create a simple cube on top of a single data store, the customer store.

In [6]:
# a session has to be created for atoti
session = tt.create_session()

In [7]:
types = {"ChurnProbability": tt.types.FLOAT}

customer_store = session.read_pandas(
    telcom, keys=["CustomerID"], store_name="customer_store", types=types
)
customer_store.head()

| CustomerID | Gender | SeniorCitizen | Partner | Dependents | Tenure | PhoneService | MultipleLines | InternetService | OnlineSecurity | OnlineBackup | ... | Contract | PaperlessBilling | PaymentMethod | MonthlyCharges | TotalCharges | Churn | TenureGroup | ChurnProbability | PredictionAccuracy | ChurnPredicted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 7590-VHVEG | Female | No | Yes | No | 1 | No | No phone service | DSL | No | Yes | ... | Month-to-month | Yes | Electronic check | 29.85 | 29.85 | No | Tenure_0-12 | 1.0 | 1.0 | No |
| 5575-GNVDE | Male | No | No | No | 34 | Yes | No | DSL | Yes | No | ... | One year | No | Mailed check | 56.95 | 1889.5 | No | Tenure_24-48 | 1.0 | 1.0 | No |
| 3668-QPYBK | Male | No | No | No | 2 | Yes | No | DSL | Yes | Yes | ... | Month-to-month | Yes | Mailed check | 53.85 | 108.15 | Yes | Tenure_0-12 | 1.0 | 1.0 | Yes |
| 7795-CFOCW | Male | No | No | No | 45 | No | No phone service | DSL | Yes | No | ... | One year | No | Bank transfer (automatic) | 42.3 | 1840.75 | No | Tenure_24-48 | 1.0 | 1.0 | No |
| 9237-HQITU | Female | No | No | No | 2 | Yes | No | Fiber optic | No | No | ... | Month-to-month | Yes | Electronic check | 70.7 | 151.65 | Yes | Tenure_0-12 | 1.0 | 1.0 | Yes |


In [8]:
cube = session.create_cube(customer_store, "customer_cube")

h = cube.hierarchies
m = cube.measures
l = cube.levels
cube

We create a simple function that helps change the dimension of our hierarchies as we group them into logical categories.

In [9]:
def set_dim(hier_name, dim_name):
    h[hier_name].dimension = dim_name

## Data classification using dimensions

We re-classify the hierarchies under Customer, Demographic, Account and Services.

In [10]:
customer_hierarchy = ["CustomerID", "Churn", "ChurnPredicted"]
for hier_name in customer_hierarchy:
    set_dim(hier_name, "Customer")

demographic_hierarchy = ["Gender", "SeniorCitizen", "Partner", "Dependents"]
for hier_name in demographic_hierarchy:
    set_dim(hier_name, "Demographic")

account_hierarchy = [
    "Tenure",
    "Contract",
    "PaperlessBilling",
    "PaymentMethod",
    "TenureGroup",
]
for hier_name in account_hierarchy:
    set_dim(hier_name, "Account")

services_hierarchy = [
    "PhoneService",
    "MultipleLines",
    "InternetService",
    "OnlineSecurity",
    "OnlineBackup",
    "DeviceProtection",
    "TechSupport",
    "StreamingTV",
    "StreamingMovies",
]
for hier_name in services_hierarchy:
    set_dim(hier_name, "Services")

cube

## Machine learning - Model training and Scenario creation

To prepare for model training, we preprocess the data by converting the labelled (categorical) values into binary values.  
You can read more about [LabelEncoder](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html).
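The `data_utils.data_preprocessing` helper is not shown in this notebook. Assuming it label-encodes two-valued columns such as `Churn`, this is what scikit-learn's `LabelEncoder` does to one such column (the values below are hypothetical):

```python
from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()
# Classes are sorted alphabetically, so "No" -> 0 and "Yes" -> 1
encoded = encoder.fit_transform(["No", "Yes", "Yes", "No"])

print(encoder.classes_.tolist())  # ['No', 'Yes']
print(encoded)                    # [0 1 1 0]
```

`encoder.inverse_transform` maps the integers back to the original labels, which is how a 0/1 prediction can be turned back into 'No'/'Yes'.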

In [11]:
# Columns to ignore for model training
ignore_col = ["CustomerID", "ChurnPredicted", "ChurnProbability", "PredictionAccuracy"]
# Target columns
target_col = ["Churn"]

binary_df = data_utils.data_preprocessing(telcom.copy())

We split the data into train and test data to be used by the machine learning algorithms.

In [12]:
train, test = train_test_split(binary_df, test_size=0.25, random_state=111)

# separating dependent and independent variables
cols = [i for i in binary_df.columns if i not in ignore_col + target_col]
train_X = train[cols]
train_Y = train[target_col]
test_X = test[cols]
test_Y = test[target_col]

X = binary_df[cols]
Y = binary_df[target_col]

### Model Training

The sections below show how we train each model. As the algorithms are taken from the referenced notebook, we will not explain them further: our purpose is to analyse the predictions and their impact on telco churn.

#### Logistic Regression Model

In [13]:
from sklearn.linear_model import LogisticRegression

logit = LogisticRegression(
    C=1.0,
    class_weight=None,
    dual=False,
    fit_intercept=True,
    intercept_scaling=1,
    max_iter=100,
    multi_class="ovr",
    n_jobs=1,
    penalty="l2",
    random_state=None,
    solver="liblinear",
    tol=0.0001,
    verbose=0,
    warm_start=False,
)

logit = prediction.churn_prediction(
    logit, train_X, test_X, train_Y, test_Y, cols, "coefficients", threshold_plot=True
)

LogisticRegression(multi_class='ovr', n_jobs=1, solver='liblinear')

 Classification report : 
               precision    recall  f1-score   support

           0       0.83      0.91      0.87      1268
           1       0.69      0.53      0.60       490

    accuracy                           0.80      1758
   macro avg       0.76      0.72      0.73      1758
weighted avg       0.79      0.80      0.79      1758

Accuracy   Score :  0.8014789533560864
Area under curve :  0.7177557458314555 
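The training helper `prediction.churn_prediction` is not shown either, but its report can be reproduced with a few `sklearn.metrics` functions. Note that the reported "Area under curve" equals the average of the two class recalls, e.g. (0.91 + 0.53) / 2 = 0.72 above. That is what ROC AUC reduces to when it is computed on hard 0/1 predictions instead of probabilities, so we infer, without being able to confirm it, that the helper scores predicted labels. A small sketch on hypothetical labels:

```python
from sklearn.metrics import (
    accuracy_score,
    balanced_accuracy_score,
    classification_report,
    roc_auc_score,
)

# Hypothetical test labels: 1 = churn, 0 = no churn
y_true = [0, 0, 1, 1, 0, 1]
y_pred = [0, 1, 1, 0, 0, 1]

print(classification_report(y_true, y_pred))

accuracy = accuracy_score(y_true, y_pred)           # 4 of 6 correct
auc_on_labels = roc_auc_score(y_true, y_pred)       # AUC on hard 0/1 predictions
balanced = balanced_accuracy_score(y_true, y_pred)  # mean of the per-class recalls
print(accuracy, auc_on_labels, balanced)
```

Passing churn probabilities instead (for example `model.predict_proba(X)[:, 1]`) as the second argument of `roc_auc_score` would measure ranking quality across all thresholds rather than at the default cut-off.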



#### Logistic Regression (SMOTE) Model

In [14]:
from imblearn.over_sampling import SMOTE

smote_X = binary_df[cols]
smote_Y = binary_df[target_col]

# Split train and test data
smote_train_X, smote_test_X, smote_train_Y, smote_test_Y = train_test_split(
    smote_X, smote_Y, test_size=0.25, random_state=111
)

# oversampling minority class using smote
smote = SMOTE(random_state=0)
os_smote_X, os_smote_Y = smote.fit_resample(smote_train_X, smote_train_Y)
os_smote_X = pd.DataFrame(data=os_smote_X, columns=cols)
os_smote_Y = pd.DataFrame(data=os_smote_Y, columns=target_col)

logit_smote = LogisticRegression(
    C=1.0,
    class_weight=None,
    dual=False,
    fit_intercept=True,
    intercept_scaling=1,
    max_iter=100,
    multi_class="ovr",
    n_jobs=1,
    penalty="l2",
    random_state=None,
    solver="liblinear",
    tol=0.0001,
    verbose=0,
    warm_start=False,
)

logit_smote = prediction.churn_prediction(
    logit_smote,
    os_smote_X,
    test_X,
    os_smote_Y,
    test_Y,
    cols,
    "coefficients",
    threshold_plot=True,
)

LogisticRegression(multi_class='ovr', n_jobs=1, solver='liblinear')

 Classification report : 
               precision    recall  f1-score   support

           0       0.87      0.82      0.84      1268
           1       0.59      0.69      0.64       490

    accuracy                           0.78      1758
   macro avg       0.73      0.75      0.74      1758
weighted avg       0.79      0.78      0.79      1758

Accuracy   Score :  0.7815699658703071
Area under curve :  0.754041395738106 
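SMOTE balances the training set by creating synthetic minority-class (churn) rows rather than duplicating existing ones. The function below is a simplified NumPy sketch of that interpolation idea, written for illustration only; it is not imbalanced-learn's implementation, which supports many more options. With imbalanced-learn's default `sampling_strategy="auto"`, enough synthetic rows are generated for the minority class to match the majority class.

```python
import numpy as np


def smote_sketch(X_minority, n_synthetic, k=5, seed=0):
    """Simplified SMOTE: each synthetic row lies on the segment between a random
    minority row and one of its k nearest minority-class neighbours."""
    rng = np.random.default_rng(seed)
    X_minority = np.asarray(X_minority, dtype=float)
    # Pairwise Euclidean distances within the minority class
    dist = np.linalg.norm(X_minority[:, None, :] - X_minority[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a row is not its own neighbour
    k = min(k, len(X_minority) - 1)
    neighbours = np.argsort(dist, axis=1)[:, :k]

    synthetic = np.empty((n_synthetic, X_minority.shape[1]))
    for j in range(n_synthetic):
        i = rng.integers(len(X_minority))                 # random minority row
        neighbour = X_minority[neighbours[i, rng.integers(k)]]
        gap = rng.random()                                # position along the segment, in [0, 1)
        synthetic[j] = X_minority[i] + gap * (neighbour - X_minority[i])
    return synthetic


# Hypothetical minority class: four churners described by two features
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_rows = smote_sketch(minority, n_synthetic=6, k=2)
print(new_rows.shape)  # (6, 2)
```

Because every synthetic row is an interpolation between two real churners, the new rows stay inside the region the minority class already occupies.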



#### Naive Bayes Model

In [15]:
from sklearn.naive_bayes import GaussianNB

gnb = GaussianNB(priors=None)

gnb = prediction.churn_prediction_alg(gnb, os_smote_X, test_X, os_smote_Y, test_Y)

GaussianNB()

 Classification report : 
               precision    recall  f1-score   support

           0       0.90      0.73      0.81      1268
           1       0.53      0.80      0.64       490

    accuracy                           0.75      1758
   macro avg       0.72      0.76      0.72      1758
weighted avg       0.80      0.75      0.76      1758

Accuracy Score   :  0.7480091012514221
Area under curve :  0.7645850769329814


#### SVM Classifier Linear Model

In [16]:
from sklearn.svm import SVC

# Support vector classifier
# using a linear hyperplane
svc_lin = SVC(
    C=1.0,
    cache_size=200,
    class_weight=None,
    coef0=0.0,
    decision_function_shape="ovr",
    degree=3,
    gamma=1.0,
    kernel="linear",
    max_iter=-1,
    probability=True,
    random_state=None,
    shrinking=True,
    tol=0.001,
    verbose=False,
)

svc_lin = prediction.churn_prediction(
    svc_lin,
    os_smote_X,
    test_X,
    os_smote_Y,
    test_Y,
    cols,
    "coefficients",
    threshold_plot=False,
)

SVC(gamma=1.0, kernel='linear', probability=True)

 Classification report : 
               precision    recall  f1-score   support

           0       0.87      0.82      0.84      1268
           1       0.60      0.67      0.63       490

    accuracy                           0.78      1758
   macro avg       0.73      0.75      0.74      1758
weighted avg       0.79      0.78      0.79      1758

Accuracy   Score :  0.7815699658703071
Area under curve :  0.7471544453743643 



### Predictions and Scenario creation

Now that the models are trained, we run the full dataset through each of them to get the corresponding predictions and probabilities.  
We then apply the `model_scenario` function defined earlier to get `ChurnPredicted`, `ChurnProbability` and `PredictionAccuracy`.  \
Finally, we load the resulting pandas DataFrame into `customer_store` as a [scenario](https://docs.atoti.io/0.4.1/tutorial/01-Basics.html#Source-simulation). 

#### Scenario 1 - Logistic Regression Prediction 

In [17]:
lr_prediction = logit.predict(X)
lr_probability = logit.predict_proba(X)

logit_df = model_scenario(lr_prediction, lr_probability)
customer_store.scenarios["Logistic Regression Prediction"].load_pandas(logit_df)

The store has been sampled because there are more than 10000 lines in the files to load. Call Session.load_all_data() to trigger the full load of the data.


#### Scenario 2 - Logistic Regression (SMOTE) Prediction

In [18]:
smote_prediction = logit_smote.predict(X)
smote_probability = logit_smote.predict_proba(X)

smote_df = model_scenario(smote_prediction, smote_probability)
customer_store.scenarios["Logistic Regression (SMOTE) Prediction"].load_pandas(smote_df)

#### Scenario 3 - Naive Bayes Prediction

In [19]:
gnb_prediction = gnb.predict(X)
gnb_probability = gnb.predict_proba(X)

gnb_df = model_scenario(gnb_prediction, gnb_probability)
customer_store.scenarios["Naive Bayes Prediction"].load_pandas(gnb_df)

#### Scenario 4 - SVM Classifier Linear Prediction

In [20]:
svm_prediction = svc_lin.predict(X)
svm_probability = svc_lin.predict_proba(X)

svm_df = model_scenario(svm_prediction, svm_probability)
customer_store.scenarios["SVM Classifier Linear Prediction"].load_pandas(svm_df)

## Telco Churn Data Analysis

Large stores are sampled by default in atoti, as we saw when we loaded the _Logistic Regression Prediction_ scenario.  
Now that we are ready for our analysis, let's [load all our data](https://docs.atoti.io/0.4.1/tutorial/02-Configuration.html#Sampling-mode).

In [21]:
session.load_all_data()

Comparing against the base scenario, we see that:
- _Logistic Regression Prediction_ and _Logistic Regression (SMOTE) Prediction_ underpredicted the actual churn by 419 and 410 customers respectively.  
- _Naive Bayes Prediction_ grossly overestimated the churn by 1,143 customers, predicting roughly 1.6 times the actual number of churns.
- _SVM Classifier Linear Prediction_ provides the closest estimate, though it overestimates the churn by 339 customers.

However, how reliable is this result?

In [23]:
cube.visualize("Number of customers churned last month by Tenure period")


#### Model Accuracy

Let's look at the accuracy of each algorithm by computing the number of correctly predicted churn cases in the pivot table below.  
Looking only at churn predictions, we see that _Logistic Regression Prediction_ has the highest percentage of correct predictions: 976 out of the 1,450 customers it predicted to churn.  
In fact, _Logistic Regression Prediction_ had the highest accuracy score (0.80) on the test set during model training, while the _Naive Bayes Model_ had the lowest (0.75).

In [24]:
cube.visualize("Algorithm Accuracy")


#### Churn and MRR Analysis

In [26]:
cube.visualize("Percentage customers churned last month")


We see that more than 25% of customers churned last month. If this attrition rate keeps up, the telco would lose most of its customers within a few months.  
Let's see the impact of this on the telco's net revenue retention (NRR).
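To see why a 25% monthly attrition rate is alarming, assume it stays constant and that no new customers join (both simplifying assumptions). The share of the original customers remaining then decays geometrically:

```python
def remaining_share(monthly_churn_rate: float, months: int) -> float:
    """Share of the original customers left after `months`, assuming a constant
    monthly churn rate and no new customers."""
    return (1 - monthly_churn_rate) ** months


def months_until_below(monthly_churn_rate: float, threshold: float) -> int:
    """First month in which the remaining share drops below `threshold`."""
    months = 0
    while remaining_share(monthly_churn_rate, months) >= threshold:
        months += 1
    return months


print(round(remaining_share(0.25, 6), 3))  # 0.178
print(months_until_below(0.25, 0.10))      # 9
```

After half a year fewer than one in five of the original customers would remain, and within 9 months fewer than one in ten.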

Since we only have one month's data, we have a simple formula for calculating NRR:

$$\text{NRR} = \frac{\text{MRR}_{\text{initial}} - \text{Churn}}{\text{MRR}_{\text{initial}}}$$

We use [`atoti.total`](https://docs.atoti.io/0.4.1/lib/atoti.html?#atoti.total) to get the total _MonthlyCharges_ across all customers for _MRR Initial_, because we want this measure to ignore any filter on the `CustomerID` hierarchy.  
The level `Churn` shows the actual status of customers churning, while the level `ChurnPredicted` reflects the predicted status from the models.  
We get the revenue lost by summing the _MonthlyCharges_ of customers who have churned or are predicted to churn.

In [27]:
m["MRR Initial"] = tt.total(m["MonthlyCharges.SUM"], on=h["CustomerID"])

m["Actual Revenue Lost"] = tt.total(
    tt.filter(m["MonthlyCharges.SUM"], l["Churn"] == "Yes"), on=h["CustomerID"]
)

#  we use ChurnPredicted here instead of churn because we want to see the difference between the prediction and the actual churn
m["Predicted Revenue Lost"] = tt.total(
    tt.filter(m["MonthlyCharges.SUM"], l["ChurnPredicted"] == "Yes"), on=h["CustomerID"]
)

m["NRR"] = (m["MRR Initial"] - m["Predicted Revenue Lost"]) / m["MRR Initial"]
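Outside atoti, the same measures amount to filtered sums over a customer table. A pandas sketch for comparison, using the `MonthlyCharges` and `Churn` values of the five customers shown by `telcom.head()` together with a hypothetical `ChurnPredicted` column:

```python
import pandas as pd

# MonthlyCharges and Churn from telcom.head(); ChurnPredicted is hypothetical
df = pd.DataFrame({
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.30, 70.70],
    "Churn": ["No", "No", "Yes", "No", "Yes"],
    "ChurnPredicted": ["No", "Yes", "Yes", "No", "No"],
})

# Analogue of "MRR Initial": total over all customers, whatever the churn status
mrr_initial = df["MonthlyCharges"].sum()
# Analogues of "Actual Revenue Lost" and "Predicted Revenue Lost"
actual_revenue_lost = df.loc[df["Churn"] == "Yes", "MonthlyCharges"].sum()
predicted_revenue_lost = df.loc[df["ChurnPredicted"] == "Yes", "MonthlyCharges"].sum()

nrr = (mrr_initial - predicted_revenue_lost) / mrr_initial
print(round(mrr_initial, 2), round(actual_revenue_lost, 2), round(predicted_revenue_lost, 2), round(nrr, 4))
```

The atoti version gives the same numbers but recomputes them for every cell of a pivot table and for every scenario, which is why the totals are pinned with `tt.total`.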

The chart below shows the importance of choosing the right machine learning model: we could grossly over- or under-estimate the loss if we are not careful with our projection.  
For one, based on this comparison, _Naive Bayes Prediction_ would not be a good fit.

In [29]:
cube.visualize("NRR - Actual vs Predicted")


#### Customer Retention Strategy

We have to reduce the rate of attrition before the telco loses all its customers.  
To do so, we have 2 potential strategies:
1. retain existing customers through better service offers, discounts, etc.
2. replace the churned customers through marketing effort

According to [Harvard Business Review](https://hbr.org/2014/10/the-value-of-keeping-the-right-customers), it can cost 5 to 25 times more to replace a customer than to retain one.  
It is not realistic to assume we will be able to retain all customers, so let's make some assumptions:
1. We aim to achieve a target NRR of 90%
2. We compute the number of customers that we need to retain in order to achieve this target NRR
3. For each customer identified, we will set aside a budget of $100 for retention purposes
4. We do not know who has really churned yet

Let's start by creating a measure for our target NRR. This is so that we can change our target later in our simulations.

In [30]:
m["TargetNRR"] = 0.9

To achieve the target NRR, we compute the maximum revenue loss allowed.

In [31]:
m["Max Loss Possible"] = m["MRR Initial"] - (m["TargetNRR"] * m["MRR Initial"])

_Revenue Compensation_ is the difference between the revenue we predict we will lose and the maximum loss we can afford while still achieving the target NRR.  
It is the target amount of revenue we need to obtain from customers who are either retained or replaced.

In [32]:
m["Revenue Compensation"] = m["Predicted Revenue Lost"] - m["Max Loss Possible"]

Let's take the average _MonthlyCharges_ of the customers predicted to churn as the amount each retained customer will contribute.  
Notice that we use [`atoti.parent_value`](https://docs.atoti.io/0.4.1/lib/atoti.html?#atoti.parent_value) on `churnMean`: we need this value to be constant across all customers in order to get a constant retention size.

By dividing the _Revenue Compensation_ by the average _MonthlyCharges_, we get the target number of customers to retain for each algorithm.

In [33]:
churnMean = tt.agg.mean(
    tt.filter(m["MonthlyCharges.SUM"], l["ChurnPredicted"] == "Yes"),
    scope=tt.scope.origin(l["CustomerID"]),
)

m["Churn MonthlyCharges.MEAN"] = tt.parent_value(
    churnMean, on=h["CustomerID"], total_value=churnMean
)

m["Retention size"] = tt.total(
    tt.ceil(m["Revenue Compensation"] / m["Churn MonthlyCharges.MEAN"]),
    on=h["ChurnPredicted"],
)
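Put together, _Max Loss Possible_, _Revenue Compensation_ and _Retention size_ are simple arithmetic. A plain-Python sketch with hypothetical figures; returning 0 when the predicted loss is already within the target is our assumption, as the atoti measure above does not clamp negative values:

```python
import math


def retention_size(mrr_initial, predicted_revenue_lost, target_nrr, mean_churn_charge):
    """Number of predicted churners to retain so that NRR reaches target_nrr."""
    max_loss_possible = mrr_initial - target_nrr * mrr_initial
    revenue_compensation = predicted_revenue_lost - max_loss_possible
    if revenue_compensation <= 0:  # assumption: nothing to retain if already on target
        return 0
    return math.ceil(revenue_compensation / mean_churn_charge)


# Hypothetical: 100,000 MRR, 25,000 predicted loss, 75 average charge per churner
print(retention_size(100_000, 25_000, 0.90, 75))  # 200
```

Here the allowed loss is 10,000, so 15,000 of revenue must be recovered, at 75 per retained customer. Rounding up with `ceil` guarantees the target is met rather than narrowly missed.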

Let's also create a measure for _Predicted Churn Count_ so that we can see how many customers are predicted to churn and how many we intend to retain.

In [34]:
m["Predicted Churn Count"] = tt.agg.sum(
    tt.filter(m["contributors.COUNT"], l["ChurnPredicted"] == "Yes"),
    scope=tt.scope.origin(l["CustomerID"]),
)

In [36]:
cube.visualize("Predicted churn vs retention size")


Now that we know the estimated number of customers to retain, how do we identify whom to retain?  
When building the scenarios with `model_scenario`, we ensured that only customers predicted to churn have a non-zero _ChurnProbability_, and are therefore eligible for retention.  
We then boost this score by 1 in _churnPositive_ when the customer's _MonthlyCharges_ is greater than or equal to the average: the higher their recurring charges, the fewer customers we need to retain.

We rank the customers with the highest likelihood of churning first, using [`atoti.rank`](https://docs.atoti.io/0.4.1/lib/atoti.html?#atoti.rank).  
The customers to retain are then those whose rank is less than or equal to the retention size.

In [37]:
# we only rank those customers who are churning. We give higher weightage to customer with higher charge so as to minimize the lost
m["churnPositive"] = tt.where(
    (m["MonthlyCharges.MEAN"] >= m["Churn MonthlyCharges.MEAN"])
    & (m["ChurnProbability.MEAN"] > 0),
    m["ChurnProbability.MEAN"] + 1,
    m["ChurnProbability.MEAN"],
)

m["Churn Rank"] = tt.rank(
    m["churnPositive"], h["CustomerID"], ascending=False, apply_filters=True
)
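The selection logic of _churnPositive_ and _Churn Rank_ can be mimicked in pandas on a hypothetical five-customer scenario. How atoti breaks ties in `rank` is not stated here, so the sketch simply breaks them by row order (an assumption):

```python
import pandas as pd

# Hypothetical scenario rows, shaped like the output of model_scenario
df = pd.DataFrame({
    "CustomerID": ["A", "B", "C", "D", "E"],
    "MonthlyCharges": [90.0, 30.0, 70.0, 20.0, 80.0],
    "ChurnPredicted": ["Yes", "Yes", "No", "Yes", "Yes"],
    "ChurnProbability": [0.60, 0.90, 0.00, 0.70, 0.55],
})
retention_size = 2

# Average charge of predicted churners (analogue of "Churn MonthlyCharges.MEAN")
churn_mean = df.loc[df["ChurnPredicted"] == "Yes", "MonthlyCharges"].mean()
# +1 boost for predicted churners paying at least the average
boost = (df["MonthlyCharges"] >= churn_mean) & (df["ChurnProbability"] > 0)
df["churnPositive"] = df["ChurnProbability"] + boost.astype(float)
# Rank 1 = highest score; ties broken by row order
df["ChurnRank"] = df["churnPositive"].rank(ascending=False, method="first")

to_retain = df.loc[(df["ChurnRank"] <= retention_size) & (df["churnPositive"] > 0), "CustomerID"]
print(to_retain.tolist())  # ['A', 'E']
```

Customer B has the highest churn probability (0.90) but is not selected: the boost ranks the two high-paying churners first, because retaining them recovers more revenue per customer.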

So let's set our _Retention budget_ and _Marketing budget_.

For the customers identified for retention, we create a measure _Forecast expense_, where we spend the _Retention budget_ on each of them, and a measure _Full recruitment expense_, where we instead spend the _Marketing budget_ to replace each of them.

In [38]:
m["Retention budget"] = 100
m["Marketing budget"] = 500

# we spend $100 on each of the customers identified and assume we manage to retain all of them
m["Forecast expense"] = tt.agg.sum(
    tt.where(
        (m["Churn Rank"] <= m["Retention size"]) & (m["churnPositive"] > 0),
        m["Retention budget"],
        0,
    ),
    scope=tt.scope.origin(l["CustomerID"]),
)

# we retain none of the customers, so we spend $500 on each new customer, recruiting as many as the retention size
m["Full recruitment expense"] = tt.agg.sum(
    tt.where(
        (m["Churn Rank"] <= m["Retention size"]) & (m["churnPositive"] > 0),
        m["Marketing budget"],
        0,
    ),
    scope=tt.scope.origin(l["CustomerID"]),
)

Based on the predicted churn, we see the expense needed to maintain 90% NRR by either retaining or replacing the customers in the retention size.  
We also see how much we save if we successfully retain our target customers.  
_Logistic Regression Prediction_ allows us to maintain the same NRR for the least money.

In [40]:
cube.visualize("Expenses on effort to maintain 90% NRR")


#### Reality check

Now let's compare our prediction against the actual churn results.

We assume that the retention campaign is successful, so none of the customers we attempt to retain churn.  
However, some customers whom we did not predict to churn did in fact churn.  
We compute the new revenue lost based on these assumptions.

In [41]:
# revenue is lost from customers who churned but were not targeted for retention,
# i.e. those the algorithm did not predict correctly or that fell outside the retention size
m["Post retention revenue lost"] = tt.agg.sum(
    tt.where(
        m["Forecast expense"] > 0,
        0,
        tt.where(l["Churn"] == "Yes", m["MonthlyCharges.SUM"], 0),
    ),
    scope=tt.scope.origin(l["CustomerID"]),
)

m["Post retention NRR"] = (
    m["MonthlyCharges.SUM"] - m["Post retention revenue lost"]
) / m["MonthlyCharges.SUM"]

In the table below, we see that NRR is above 80% across the different predictions, with _Naive Bayes Prediction_ retaining the most revenue (ignoring the base scenario).  
However, as we saw earlier, it also incurs the highest retention expense, because its retention size of 2,354 is larger than that of the other models.  
Such a large retention list greatly increases the chance of reaching the customers who will actually churn.

In [43]:
cube.visualize("NRR after retention effort")


A customer may be predicted to churn but in fact not churn, in which case the retention budget spent on them is wasted.  
We can see this with _Naive Bayes Prediction_, which predicted 3,012 customers churning, of whom only 1,508 actually churned.  

Let's see how much of the expense was actually well-spent.

In [44]:
# retention spend on customers who actually churned, i.e. money well spent;
# the rest went to customers with no intention to churn (a wrong prediction),
# and any remaining revenue gap must be covered by recruiting new customers at $500 each
m["Successful Retention Expense"] = tt.agg.sum(
    tt.where(
        (m["Forecast expense"] > 0) & (l["Churn"] == "Yes"), m["Forecast expense"], 0
    ),
    scope=tt.scope.origin(l["CustomerID"]),
)

In [46]:
cube.visualize("Successful retention with respect to expenses")


For the models that did not meet the 90% target NRR, chances are that the retention list included customers who did not actually churn.  
To make up for this gap in NRR, let's see how many new customers we need to recruit.  

Again, we compute the difference between our _Post retention revenue lost_ and the _Max Loss Possible_ to see how much revenue we need to replace.

In [47]:
m["Gap in revenue lost"] = m["Post retention revenue lost"] - m["Max Loss Possible"]

m["Clients to replace"] = tt.total(
    tt.where(
        m["Gap in revenue lost"] > 0,
        tt.ceil(m["Gap in revenue lost"] / m["Churn MonthlyCharges.MEAN"]),
        0,
    ),
    on=h["ChurnPredicted"],
)

The money spent on retention is already spent, so we add the marketing expense of replacing the customers on top of it.  
Let's compute the _Actual Expense_.

In [48]:
m["Actual Marketing budget"] = m["Clients to replace"] * m["Marketing budget"]
m["Actual Expense"] = m["Forecast expense"] + m["Actual Marketing budget"]
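The replacement step follows the same pattern as the retention size. A plain-Python sketch of _Clients to replace_ and _Actual Expense_, with hypothetical figures and the notebook's _Marketing budget_ of 500 as the default cost per replacement:

```python
import math


def clients_to_replace(post_retention_revenue_lost, max_loss_possible, mean_churn_charge):
    """New customers needed to cover revenue lost beyond the allowed maximum."""
    gap = post_retention_revenue_lost - max_loss_possible
    return math.ceil(gap / mean_churn_charge) if gap > 0 else 0


def actual_expense(forecast_expense, n_replace, marketing_budget=500):
    """Retention spend already incurred plus the cost of recruiting replacements."""
    return forecast_expense + n_replace * marketing_budget


# Hypothetical: 200 customers retained at 100 each, 12,000 lost after retention,
# 10,000 maximum allowed loss, 80 average charge per churner
n = clients_to_replace(12_000, 10_000, 80)
print(n, actual_expense(200 * 100, n))  # 25 32500
```

A model whose post-retention loss is already below the allowed maximum needs no replacements, so its actual expense is just its retention spend.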

Surprisingly, now that the NRR of every model is above 90%, _Logistic Regression (SMOTE) Prediction_ turns out to be the most cost-effective model!  

In [50]:
cube.visualize()


Now if we compare _Logistic Regression (SMOTE) Prediction_ against _Naive Bayes Prediction_, which has the highest NRR, we see that for a 23% difference in revenue lost, we have to spend 29% more to achieve 90% NRR.  
However, keep in mind that the expense (a difference of \$53,200) is a one-off, while the revenue (a difference of \$10,487.41) is recurring.  
It will take approximately 5 months to earn back the extra expense.  
So we have to weigh which model brings better value to the telco.
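The payback estimate can be checked directly from the two differences quoted above, assuming the monthly revenue difference stays constant:

```python
import math

extra_expense = 53_200.00        # one-off difference in Actual Expense
monthly_revenue_gap = 10_487.41  # recurring difference in revenue lost, per month

payback_months = extra_expense / monthly_revenue_gap
print(round(payback_months, 2))   # 5.07
print(math.ceil(payback_months))  # 6 full months to recover it entirely
```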

In [52]:
cube.visualize("Naive Bayes vs Logistic Regression (SMOTE)")


## What if we want 95% NRR?

We can easily set up a simulation that allows us to replace _TargetNRR_.

In [53]:
NRR_simulation = cube.setup_simulation(
    "NRR Simulation", base_scenario="90% NRR", replace=[m["TargetNRR"]],
).scenarios

NRR_simulation["95% NRR"] = 0.95

We see that the expense to achieve 95% NRR is close to 1.5 times the expense to achieve 90% NRR.  
All the predictions fall short of the targeted NRR, and hence require recruiting new customers to replace those who have churned.  
This consequently results in higher expenses.

In [55]:
cube.visualize()


## What if the Marketing budget is twice what we expected?

We can easily set up a simulation that allows us to scale the _Marketing budget_.

In [56]:
marketing_budget_simulation = cube.setup_simulation(
    "Marketing Budget Simulation",
    base_scenario="5 x Retention",
    multiply=[m["Marketing budget"]],
).scenarios

We create a scenario where we multiply _Marketing budget_ by 2.

In [57]:
marketing_budget_simulation["10 x Retention"] = 2

We see the final NRR and _Actual Expense_ being recomputed. This increase in _Marketing budget_ does not impact the _Naive Bayes Prediction_ model: its NRR already exceeds 90% when all targeted customers are retained, so it needs no replacement customers.  

In [59]:
cube.visualize()
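Conceptually, multiplying the _Marketing budget_ only changes _Actual Expense_ for models that still need replacement customers. A plain-Python sketch of that what-if, with hypothetical per-model figures and the scenario names borrowed from the simulation above; it does not use atoti's simulation API:

```python
BASE_MARKETING_BUDGET = 500
# Scenario name -> multiplier applied to the Marketing budget
scenarios = {"5 x Retention": 1, "10 x Retention": 2}

# Hypothetical figures per model: retention spend already incurred, replacements still needed
models = {
    "model_below_target": {"forecast_expense": 20_000, "clients_to_replace": 25},
    "model_above_target": {"forecast_expense": 30_000, "clients_to_replace": 0},
}

actual_expense = {
    (scenario, model): fig["forecast_expense"]
    + fig["clients_to_replace"] * BASE_MARKETING_BUDGET * factor
    for scenario, factor in scenarios.items()
    for model, fig in models.items()
}

for key, value in actual_expense.items():
    print(key, value)
```

The model that already meets the target keeps the same expense in both scenarios, which is the behaviour described above for _Naive Bayes Prediction_.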
