# Understanding Telco Customer Churn

### A few definitions
- [Customer attrition](https://en.wikipedia.org/wiki/Customer_attrition#:~:text=Customer%20attrition%2C%20also%20known%20as,loss%20of%20clients%20or%20customers.&text=Gross%20attrition%20is%20the%20loss,services%20during%20a%20particular%20period.), also known as customer churn, is the loss of customers or subscriptions to goods/services by a business over a given period of time.
- Customer attrition rate is the number of customers lost by the end of the period divided by the number of customers the business had at the start of the period.
- Gross attrition is the loss of revenue from churned customers.
- Net attrition is the loss of revenue from churned customers, including the benefits from expansion (new customers, upgrades...).
- Monthly Recurring Revenue (MRR) is the recurring revenue expected on a monthly basis for the subscribed goods/services.
- Gross Revenue Retention (GRR) rate measures the change in the MRR over the period, excluding benefits from expansion.
- Net Revenue Retention (NRR) rate measures the change in the MRR over the period, including benefits from expansion.
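
As a quick illustration of how these definitions relate to each other, the sketch below computes the attrition rate, GRR and NRR for a single period. All figures are made up for illustration, and `retention_metrics` is a hypothetical helper that is not part of this notebook's code.

```python
def retention_metrics(customers_start, customers_lost, mrr_start, churned_mrr, expansion_mrr):
    """Return (attrition rate, GRR, NRR) for a single period.

    Simplifying assumption: churned_mrr covers all recurring revenue lost
    over the period, and expansion_mrr all gains counted as expansion.
    """
    attrition_rate = customers_lost / customers_start
    grr = (mrr_start - churned_mrr) / mrr_start  # excludes expansion
    nrr = (mrr_start - churned_mrr + expansion_mrr) / mrr_start  # includes expansion
    return attrition_rate, grr, nrr


# Hypothetical figures: 1,000 customers paying 50,000 in MRR at the start of the month
rate, grr, nrr = retention_metrics(
    customers_start=1000,
    customers_lost=50,
    mrr_start=50_000,
    churned_mrr=4_000,
    expansion_mrr=1_500,
)
print(f"attrition rate={rate:.1%}, GRR={grr:.1%}, NRR={nrr:.1%}")
# attrition rate=5.0%, GRR=92.0%, NRR=95.0%
```

Because GRR ignores expansion, it can never exceed 100%, whereas NRR can.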

### Introduction

Churn is a critical metric for subscription and SaaS companies: it tells us how departing customers affect the company's monthly revenue and growth and, consequently, investors' confidence in the company.  

GRR is somewhat like a happiness indicator for existing customers. A high GRR shows that the company has a high retention rate: customers are happy with the services/products they are provided with, and investors are reassured by this stability.  
If a company has a high GRR and an even higher NRR, it shows that on top of retaining existing customers, the company has grown its customer base further.  
A high NRR coupled with a low GRR implies that although the company has acquired many new customers, it has a low retention rate.  
So even if there is still revenue left over after the churn, there is a high chance that the new customers will churn too, and the growth of the company becomes less predictable.  

The telecommunication industry is highly sensitive to customer churn as technology advances and users' behaviour changes:
- with Mobile Number Portability (MNP), customers can easily switch to another provider while keeping their number
- OTT players such as Netflix, Amazon Prime Video and Disney+ are bypassing traditional operators' networks such as cable, broadcast and satellite television
- OTT applications such as WhatsApp, Google Hangouts and Skype are cannibalizing paid voice and messaging services
- customers are less inclined to be bound by handset contracts as new models are released so frequently

In this notebook, we will look at the customer churn in the telecommunication sector.  
Using the [Telco Customer Churn data](https://www.kaggle.com/blastchar/telco-customer-churn) from Kaggle, we explore the accuracy of 4 machine learning algorithms against the actual churn in the past month:  
- Logistic Regression Prediction
- Logistic Regression (SMOTE) Prediction
- Naive Bayes Prediction
- SVM Classifier Linear Prediction

Note: we train the models on last month's churn data using the algorithms provided in [Telecom Customer Churn Prediction](https://www.kaggle.com/pavanraj159/telecom-customer-churn-prediction).  
We then compare each model's predictions against the same data set to assess its accuracy.

Assuming that we wish to retain 90% NRR for this particular telco, we will explore with atoti the impact of each model on:
- Predicted revenue loss
- Number of customers to retain
- Expenses incurred to retain or replace customers

Finally, we use what-if simulation to see how the above will change when we change:
- the target NRR
- the budget spent on customer retention or replacement

### Things to install
The packages are imported as `imblearn` and `sklearn`, but are published on PyPI as `imbalanced-learn` and `scikit-learn`:

!pip install imbalanced-learn scikit-learn

Load packages

In [1]:
import os
import atoti as tt
import numpy as np
import pandas as pd
import glob
from _utils import data_utils, prediction
from imblearn.combine import SMOTEENN
from imblearn.over_sampling import SMOTE
from pandas_profiling import ProfileReport
from sklearn.cross_decomposition import PLSRegression
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from collections import Counter
import pickle

### Global variables

In [2]:
PROJECT_PATH = './'
DATA_PATH = './data/'
MODELS_PATH = './models/'

# STEP 1: Load the data

In [3]:
telcom = pd.read_csv(
    "https://data.atoti.io/notebooks/telco-churn/WA_Fn-UseC_-Telco-Customer-Churn.csv"
)
# perform data clean up
telcom = data_utils.data_cleanup(telcom)

print('Original data size: {}\n'.format(telcom.shape))
telcom.head(2)

Original data size: (7032, 22)



Unnamed: 0,CustomerID,Gender,SeniorCitizen,Partner,Dependents,Tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,...,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn,TenureGroup
0,7590-VHVEG,Female,No,Yes,No,1,No,No phone service,DSL,No,...,No,No,No,Month-to-month,Yes,Electronic check,29.85,29.85,No,Tenure_0-12
1,5575-GNVDE,Male,No,No,No,34,Yes,No,DSL,Yes,...,No,No,No,One year,No,Mailed check,56.95,1889.5,No,Tenure_24-48


For this analysis, we have performed a series of transformations on the data:

- Processing of columns: encoding, normalization etc.
- Dimension reduction: Partial Least Squares and component selection

These preparation steps are optional. You can use a different approach that best fits the models you want to use to make predictions.

Here, we load the transformed dataset.

In [4]:
telcom_transf = pd.read_csv('https://s3.eu-west-3.amazonaws.com/data.atoti.io/notebooks/telco-churn/Telco-Customer-Churn_transformed.csv')

print('Transformed data size: {}\n'.format(telcom_transf.shape))
telcom_transf.head(2)

Transformed data size: (7032, 4)



Unnamed: 0,LV1,LV2,LV3,Churn
0,-0.73034,-0.174424,0.396948,0
1,1.221606,0.230012,0.859353,0


# STEP 2: Load the models

In [5]:
filename = os.path.join(MODELS_PATH, 'dummy_unif_clf.sav')
dummy_unif_clf = pickle.load(open(filename, 'rb'))

In [6]:
filename = os.path.join(MODELS_PATH, 'dummy_strat_clf.sav')
dummy_strat_clf = pickle.load(open(filename, 'rb'))

In [7]:
filename = os.path.join(MODELS_PATH, 'dummy_major_clf.sav')
dummy_major_clf = pickle.load(open(filename, 'rb'))

In [8]:
filename = os.path.join(MODELS_PATH, 'gnb_clf.sav')
gnb_clf = pickle.load(open(filename, 'rb'))

In [9]:
filename = os.path.join(MODELS_PATH, 'lr_clf.sav')
lr_clf = pickle.load(open(filename, 'rb'))

In [10]:
filename = os.path.join(MODELS_PATH, 'svc_clf.sav')
svc_clf = pickle.load(open(filename, 'rb'))
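
The six cells above repeat the same two steps for each file. A minimal sketch of a reusable loader is shown below, assuming every model is stored as `<name>.sav` under `MODELS_PATH`. `load_model` is a hypothetical helper, not a function from this notebook or from `_utils`; it uses a `with` block so that the file handle is closed once the model is loaded.

```python
import os
import pickle


def load_model(models_path, name):
    """Load a pickled object from '<models_path>/<name>.sav', closing the file afterwards."""
    filename = os.path.join(models_path, f"{name}.sav")
    with open(filename, "rb") as f:
        # Only unpickle files from a trusted source: pickle can execute arbitrary code
        return pickle.load(f)


# Usage sketch, assuming the files listed above exist under MODELS_PATH:
# models = {name: load_model(MODELS_PATH, name) for name in ["gnb_clf", "lr_clf", "svc_clf"]}
```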

# STEP 3: Machine Learning

You can expand the sections below to look at how we train the models. As we referenced the algorithms above, we will not explain them further. Our purpose is to analyse the predictions and their impact on the telco churn.

## Process the data

In [11]:
# since the statistics are based on the previous month, the actual churn status is known, so the churn probability is fixed at 1
telcom["ChurnProbability"] = 1.0
telcom["ChurnPredicted"] = telcom["Churn"]

In [12]:
cols = [c for c in telcom_transf.columns if c != 'Churn']
target_col = 'Churn'

X = telcom_transf[cols]
Y = telcom_transf[target_col]

## Models performance

#### Dummy Model - Uniform
This model predicts churn randomly

In [13]:
dummy_unif_clf = prediction.churn_prediction(
    dummy_unif_clf,
    X,
    Y,
    X.columns,
    "features",
    threshold_plot=True,
    coefs_or_features=False,
)

-------------------------------------------------------------------------------
DummyClassifier(strategy='uniform')
-------------------------------------------------------------------------------


 Classification report: 
               precision    recall  f1-score   support

           0       0.74      0.51      0.61      5163
           1       0.27      0.50      0.35      1869

    accuracy                           0.51      7032
   macro avg       0.51      0.51      0.48      7032
weighted avg       0.62      0.51      0.54      7032

F1 score:  0.35
ROC AUC:  0.51 



#### Dummy Model - Stratified
This model predicts churn by respecting the training set’s class distribution

In [14]:
dummy_strat_clf = prediction.churn_prediction(
    dummy_strat_clf,
    X,
    Y,
    X.columns,
    "features",
    threshold_plot=True,
    coefs_or_features=False,
)

-------------------------------------------------------------------------------
DummyClassifier(strategy='stratified')
-------------------------------------------------------------------------------


 Classification report: 
               precision    recall  f1-score   support

           0       0.74      0.74      0.74      5163
           1       0.27      0.27      0.27      1869

    accuracy                           0.61      7032
   macro avg       0.50      0.50      0.50      7032
weighted avg       0.61      0.61      0.61      7032

F1 score:  0.27
ROC AUC:  0.5 



#### Dummy Model - Most frequent
This model always predicts the majority class (the most frequent label in the training set).
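
The three baseline strategies used in this section (uniform, stratified, most frequent) can be sketched in plain Python as follows. This is a toy re-implementation for illustration only, not scikit-learn's `DummyClassifier` code; `dummy_predict` and its parameters are hypothetical names.

```python
import random
from collections import Counter


def dummy_predict(train_labels, n_samples, strategy, seed=0):
    """Toy version of three baseline strategies (not scikit-learn's code)."""
    rng = random.Random(seed)
    counts = Counter(train_labels)
    classes = sorted(counts)
    if strategy == "uniform":
        # every class is equally likely, whatever the training distribution
        return [rng.choice(classes) for _ in range(n_samples)]
    if strategy == "stratified":
        # classes are drawn with their training-set frequencies
        weights = [counts[c] for c in classes]
        return rng.choices(classes, weights=weights, k=n_samples)
    if strategy == "most_frequent":
        # always answer with the majority class
        return [counts.most_common(1)[0][0]] * n_samples
    raise ValueError(f"unknown strategy: {strategy}")


# Class balance taken from the classification reports in this section:
# 5163 non-churners (0) and 1869 churners (1)
labels = [0] * 5163 + [1] * 1869
for strategy in ("uniform", "stratified", "most_frequent"):
    preds = dummy_predict(labels, len(labels), strategy)
    print(strategy, "-> share of predicted churn:", round(sum(preds) / len(preds), 2))
```

With the most-frequent strategy no customer is ever flagged as churning, which is why recall and F1 score for class 1 drop to 0 for that baseline.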

In [15]:
dummy_major_clf = prediction.churn_prediction(
    dummy_major_clf,
    X,
    Y,
    X.columns,
    "features",
    threshold_plot=True,
    coefs_or_features=False,
)

-------------------------------------------------------------------------------
DummyClassifier(strategy='most_frequent')
-------------------------------------------------------------------------------


 Classification report: 
               precision    recall  f1-score   support

           0       0.73      1.00      0.85      5163
           1       0.00      0.00      0.00      1869

    accuracy                           0.73      7032
   macro avg       0.37      0.50      0.42      7032
weighted avg       0.54      0.73      0.62      7032

F1 score:  0.0
ROC AUC:  0.5 





#### Naive Bayes Model

The Gaussian Naive Bayes algorithm assumes that, within each class, the features are independent of each other and follow a Gaussian distribution.
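
Under these two assumptions, the posterior probability of a class $y$ (churn or no churn) given the features $x_1, \dots, x_n$ factorises as shown below, where $\mu_{iy}$ and $\sigma_{iy}^2$ denote the mean and variance of feature $i$ within class $y$, estimated from the training data. The notation is introduced here for illustration and does not appear elsewhere in this notebook.

$$P(y \mid x_1, \dots, x_n) \propto P(y) \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma_{iy}^2}} \exp\left(-\frac{(x_i - \mu_{iy})^2}{2\sigma_{iy}^2}\right)$$

The predicted class is the one with the highest posterior probability.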

In [16]:
gnb_clf = prediction.churn_prediction(
    gnb_clf,
    X,
    Y,
    X.columns,
    "features",
    threshold_plot=True,
    coefs_or_features=False,
)

-------------------------------------------------------------------------------
GaussianNB()
-------------------------------------------------------------------------------


 Classification report: 
               precision    recall  f1-score   support

           0       0.85      0.87      0.86      5163
           1       0.62      0.58      0.60      1869

    accuracy                           0.79      7032
   macro avg       0.74      0.72      0.73      7032
weighted avg       0.79      0.79      0.79      7032

F1 score:  0.6
ROC AUC:  0.72 



#### Logistic Regression Model

In [17]:
lr_clf = prediction.churn_prediction(
    lr_clf,
    X,
    Y,
    X.columns,
    "features",
    threshold_plot=True,
    coefs_or_features=False,
)

-------------------------------------------------------------------------------
LogisticRegression(C=0.1, class_weight={0: 1, 1: 1.5}, random_state=0,
                   solver='newton-cg')
-------------------------------------------------------------------------------


 Classification report: 
               precision    recall  f1-score   support

           0       0.86      0.86      0.86      5163
           1       0.61      0.62      0.61      1869

    accuracy                           0.79      7032
   macro avg       0.74      0.74      0.74      7032
weighted avg       0.79      0.79      0.79      7032

F1 score:  0.61
ROC AUC:  0.74 



#### SVM Classifier Linear Model

**That cell will take a few minutes to run!**

In [18]:
svc_clf = prediction.churn_prediction(
    svc_clf,
    X,
    Y,
    X.columns,
    "features",
    threshold_plot=True,
    coefs_or_features=False,
)

-------------------------------------------------------------------------------
SVC(C=0.1, class_weight='balanced', gamma='auto', kernel='linear',
    probability=True)
-------------------------------------------------------------------------------


 Classification report: 
               precision    recall  f1-score   support

           0       0.90      0.75      0.82      5163
           1       0.53      0.78      0.63      1869

    accuracy                           0.76      7032
   macro avg       0.72      0.76      0.73      7032
weighted avg       0.80      0.76      0.77      7032

F1 score:  0.63
ROC AUC:  0.76 



**From the results above, we can see that the models are underfitting the data, as the training and testing performance are the same and are both quite low (F1 score below 0.70).
This is not surprising given that we only have a small amount of data covering a single month.
Thus, both could be improved by collecting more data.**

# STEP 4: Atoti's magic!
In this part, we leverage the power of Atoti to post-process the results of the models and perform ***scenario simulations***.

## Atoti cube creation  

We have a simple cube that holds a single data store - Customer store.

In [19]:
# a session has to be created for atoti
session = tt.create_session()

In [20]:
types = {"ChurnProbability": tt.type.FLOAT}

customer_store = session.read_pandas(
    telcom, keys=["CustomerID"], store_name="customer_store", types=types
)
customer_store.head()

CustomerID,Gender,SeniorCitizen,Partner,Dependents,Tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,...,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,Churn,TenureGroup,ChurnProbability,ChurnPredicted
7590-VHVEG,Female,No,Yes,No,1,No,No phone service,DSL,No,Yes,...,No,Month-to-month,Yes,Electronic check,29.85,29.85,No,Tenure_0-12,1.0,No
5575-GNVDE,Male,No,No,No,34,Yes,No,DSL,Yes,No,...,No,One year,No,Mailed check,56.95,1889.5,No,Tenure_24-48,1.0,No
3668-QPYBK,Male,No,No,No,2,Yes,No,DSL,Yes,Yes,...,No,Month-to-month,Yes,Mailed check,53.85,108.15,Yes,Tenure_0-12,1.0,Yes
7795-CFOCW,Male,No,No,No,45,No,No phone service,DSL,Yes,No,...,No,One year,No,Bank transfer (automatic),42.3,1840.75,No,Tenure_24-48,1.0,No
9237-HQITU,Female,No,No,No,2,Yes,No,Fiber optic,No,No,...,No,Month-to-month,Yes,Electronic check,70.7,151.65,Yes,Tenure_0-12,1.0,Yes


In [21]:
cube = session.create_cube(customer_store, "customer_cube")

h = cube.hierarchies
m = cube.measures
l = cube.levels
cube

We create a simple function that helps change the dimension of our hierarchies as we group them into logical categories.

In [22]:
def set_dim(hier_name, dim_name):
    h[hier_name].dimension = dim_name

## Data classification using dimensions

We re-classify the hierarchies under Customer, Demographic, Account and Services.

In [23]:
customer_hierarchy = ["CustomerID", "Churn", "ChurnPredicted"]
for hier_name in customer_hierarchy:
    set_dim(hier_name, "Customer")

demographic_hierarchy = ["Gender", "SeniorCitizen", "Partner", "Dependents"]
for hier_name in demographic_hierarchy:
    set_dim(hier_name, "Demographic")

account_hierarchy = [
    "Tenure",
    "Contract",
    "PaperlessBilling",
    "PaymentMethod",
    "TenureGroup",
]
for hier_name in account_hierarchy:
    set_dim(hier_name, "Account")

services_hierarchy = [
    "PhoneService",
    "MultipleLines",
    "InternetService",
    "OnlineSecurity",
    "OnlineBackup",
    "DeviceProtection",
    "TechSupport",
    "StreamingTV",
    "StreamingMovies",
]
for hier_name in services_hierarchy:
    set_dim(hier_name, "Services")

cube

### Predictions and Scenario creation

Now that we have trained the model, we are going to load the full dataset into the model to get the corresponding prediction and probability. We run the function `model_scenario` to get the `ChurnPredicted` and `ChurnProbability`.

We assign the prediction from the data models to `ChurnPredicted`. If churn is predicted, we assign its corresponding probability from the data model to `ChurnProbability`. This is because we are only interested in cases where customers are churning.

We convert the binary result of `ChurnPredicted` to 'Yes' and 'No' value such that we can compare it against the actual `Churn`. If the prediction matches the actual churn, we assign the value 1 to its `PredictionAccuracy`. 

Thereafter, we load the resultant pandas dataframe into the `customer_store` as a [scenario](https://docs.atoti.io/0.4.1/tutorial/01-Basics.html#Source-simulation). 

In [24]:
# we run the same month data against the trained models and see its accuracy against the actual churn
def model_scenario(predictions, probabilities):

    churnProbability = np.amax(probabilities, axis=1)

    churn_forecast = telcom.copy().reset_index(drop=True)
    churn_forecast = churn_forecast.drop(["ChurnPredicted", "ChurnProbability"], axis=1)

    churn_forecast = pd.concat(
        [
            churn_forecast,
            pd.DataFrame(
                {"ChurnPredicted": predictions, "ChurnProbability": churnProbability}
            ),
        ],
        axis=1,
    )

    # we are not interested in the probability if it is predicted that the client will not churn
    churn_forecast["ChurnProbability"] = np.where(
        churn_forecast["ChurnPredicted"] == 1, churn_forecast["ChurnProbability"], 0
    )

    churn_forecast["ChurnPredicted"] = np.where(
        churn_forecast["ChurnPredicted"] == 1, "Yes", "No"
    )

    return churn_forecast

#### Scenario 1 - Using Naive Bayes Classifier

In [25]:
gnb_prediction = gnb_clf.predict(X)
gnb_probability = gnb_clf.predict_proba(X)

gnb_df = model_scenario(gnb_prediction, gnb_probability)
customer_store.scenarios["Naive Bayes Classifier"].load_pandas(gnb_df)

The store has been sampled because there are more than 10000 lines in the files to load and the appended lines. Call Session.load_all_data() to trigger the full load of the data.


#### Scenario 2 - Using Logistic Regression Classifier 

In [26]:
lr_prediction = lr_clf.predict(X)
lr_probability = lr_clf.predict_proba(X)

lr_df = model_scenario(lr_prediction, lr_probability)
customer_store.scenarios["Logistic Regression Classifier"].load_pandas(lr_df)

#### Scenario 3 - Using SVM Classifier

In [27]:
svm_prediction = svc_clf.predict(X)
svm_probability = svc_clf.predict_proba(X)

svm_df = model_scenario(svm_prediction, svm_probability)
customer_store.scenarios["SVM Classifier"].load_pandas(svm_df)

#### Scenario 4 - Using Dummy Uniform Classifier

In [28]:
dummy_unif_prediction = dummy_unif_clf.predict(X)
dummy_unif_probability = dummy_unif_clf.predict_proba(X)

dummy_unif_df = model_scenario(dummy_unif_prediction, dummy_unif_probability)
customer_store.scenarios["Dummy Uniform Classifier"].load_pandas(dummy_unif_df)

#### Scenario 5 - Using Dummy Stratified Classifier

#### Scenario 6 - Using Majority Class Classifier

## Telco Churn Data Analysis

Large stores are sampled by default in atoti, as indicated by the message shown when we loaded the first scenario.  
Now that we are ready for our analysis, let's [load all our data](https://docs.atoti.io/0.4.1/tutorial/02-Configuration.html#Sampling-mode).

In [29]:
session.load_all_data()

Let's have a quick overview of each model's prediction results.

In [30]:
session.visualize("Last month customer attrition by models")

This only gives us the count of churned and retained customers. A more relevant way to assess our models is to look at the F1-score, or at Precision and Recall.

#### Model F1-score

Let's look at the F1-score of each algorithm as we compute the number of correctly predicted churn cases in the pivot table below.  
Looking at just the customers who churned, we see that _SVM Classifier_ has the highest percentage of correct predictions (Recall of 0.76), with 73 churns detected out of 96.  
At the same time, _SVM Classifier_ is the one with the most false positives (Precision of 0.53), which means that 47% of its churn predictions are wrong!

As a consequence, we will focus on the F1-score to compare our classifiers in the following paragraphs, as it takes both Precision and Recall into account.

In [31]:
m["true positive"] = tt.agg.sum(
    tt.filter(
        tt.where(l["Churn"] == l["ChurnPredicted"], 1, 0), l["ChurnPredicted"] == "Yes"
    ),
    scope=tt.scope.origin(l["CustomerID"]),
)
m["true negative"] = tt.agg.sum(
    tt.filter(
        tt.where(l["Churn"] == l["ChurnPredicted"], 1, 0), l["ChurnPredicted"] == "No"
    ),
    scope=tt.scope.origin(l["CustomerID"]),
)

m["false positive"] = tt.agg.sum(
    tt.filter(
        tt.where(l["Churn"] != l["ChurnPredicted"], 1, 0), l["ChurnPredicted"] == "Yes"
    ),
    scope=tt.scope.origin(l["CustomerID"]),
)
m["false negative"] = tt.agg.sum(
    tt.filter(
        tt.where(l["Churn"] != l["ChurnPredicted"], 1, 0), l["ChurnPredicted"] == "No"
    ),
    scope=tt.scope.origin(l["CustomerID"]),
)

m["precision"] = m["true positive"] / (m["true positive"] + m["false positive"])
m["recall"] = m["true positive"] / (m["true positive"] + m["false negative"])
m["f1 score"] = 2 * (m["recall"] * m["precision"]) / (m["recall"] + m["precision"])
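
The three measures above follow the usual definitions of precision, recall and F1 score. The plain-Python sketch below reproduces the same arithmetic outside the cube. The 73 detected churners out of 96 come from the text above, while the 65 false positives are a hypothetical count chosen to give a precision close to 0.53; `precision_recall_f1` is an illustrative name.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# 73 churners caught, 65 false alarms (hypothetical), 23 churners missed
p, r, f1 = precision_recall_f1(tp=73, fp=65, fn=23)
print(f"precision={p:.2f}, recall={r:.2f}, f1={f1:.2f}")
# precision=0.53, recall=0.76, f1=0.62
```

The F1 score is the harmonic mean of the two, so a model cannot reach a high F1 by maximizing recall at the expense of precision, or the other way round.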

In [32]:
session.visualize("Last month customer churn F1-score by model")

The results above show that _Logistic Regression Classifier_ is the best model with respect to F1 score.
Let's analyze whether it is also the one that brings the highest revenue to the company.

#### Churn and MRR Analysis

In [75]:
session.visualize("Percentage customers churned last month")

We see that more than 25% of customers churned last month. The telco would lose all its customers within a few months if this attrition rate kept up.  
Let's see the impact of this on the telco's net revenue retention (NRR).

Since we only have one month's data, we have a simple formula for calculating NRR:

$$\text{NRR} = \frac{\text{MRR(initial)} + \text{Expansion} - \text{Churn}}{\text{MRR(initial)}}$$

We use [`atoti.total`](https://docs.atoti.io/0.4.1/lib/atoti.html?#atoti.total) to get the total _MonthlyCharges_ for _MRR Initial_ across all the customers because we want to ignore all filters for this measure.  
The level `Churn` shows the actual status of customers churning, while the level `ChurnPredicted` reflects the predicted status from the models.  
We get the revenue lost by taking the _MonthlyCharges_ of customers who have churned or are predicted to churn.

In [37]:
m["MRR Initial"] = tt.total(m["MonthlyCharges.SUM"], h["CustomerID"])

m["Actual RR Loss"] = tt.total(
    tt.filter(m["MonthlyCharges.SUM"], l["Churn"] == "Yes"), h["CustomerID"]
)

# We consider the mean cost of a churn to be the same for all scenarios
churnMean = tt.agg.mean(
    tt.filter(m["MonthlyCharges.SUM"], l["Churn"] == "Yes"),
    scope=tt.scope.origin(l["CustomerID"]),
)

m["Churn MonthlyCharges.MEAN"] = tt.parent_value(
    churnMean, on=h["CustomerID"], total_value=churnMean
)

# We use ChurnPredicted here instead of Churn because we want to see
# the difference between the prediction and the actual churn
m["Predicted RR Loss"] = tt.agg.sum(
    tt.filter(m["Churn MonthlyCharges.MEAN"], l["ChurnPredicted"] == "Yes"),
    scope=tt.scope.origin(l["CustomerID"]),
)

m["NRR"] = (m["MRR Initial"] - m["Predicted RR Loss"]) / m["MRR Initial"]

From the below chart, we see the importance of choosing an appropriate predictive model w.r.t our business case. We could grossly over-estimate or under-estimate the loss if we are not careful with our projection.  
For one, _SVM Classifier_ would not be a good match based on the comparison below.

In [74]:
session.visualize("NRR - Actual vs Predicted")

#### Customer Retention Strategy

We have to reduce the rate of attrition before the telco loses all its customers.  
To do so, we have 2 potential strategies:
1. retain existing customers through better service offers or discounts etc
2. replace the churned customers through marketing effort

According to [Harvard Business Review](https://hbr.org/2014/10/the-value-of-keeping-the-right-customers), it can cost 5 to 25 times more to acquire a new customer than to retain an existing one.  
It is not realistic to assume we will be able to retain all customers, so let's make some assumptions:
1. We aim to achieve a target NRR of 90%
2. We compute the number of customers that we need to retain in order to achieve this target NRR
3. For each customer identified, we will set aside a budget of $100 for retention purposes
4. We do not know who has really churned yet

Let's start by creating a measure for our target NRR. This is so that we can change our target later in our simulations.

In [39]:
m["TargetNRR"] = 0.9

To achieve the target NRR, we compute the maximum loss possible.

In [40]:
m["Expected RR Loss"] = m["MRR Initial"] - (m["TargetNRR"] * m["MRR Initial"])

We define _Predicted RR Loss Overflow_ here as the difference between the revenue we predict we will lose and the maximum loss we can afford while still achieving the target NRR.  
This is the target revenue amount that we need to recover from the customers who are either retained or replaced.

In [41]:
m["Predicted RR Loss Overflow"] = tt.total(
    m["Predicted RR Loss"] - m["Expected RR Loss"], h["CustomerID"]
)

Let's take the average _MonthlyCharges_ of the customers who churned as the amount that each retained customer will bring in.  
Notice that we use [`atoti.parent_value`](https://docs.atoti.io/0.4.1/lib/atoti.html?#atoti.parent_value) on `churnMean`: we need this value to be constant across all the customers in order to have a constant _Predicted Churn Overflow_.

By dividing the _Predicted RR Loss Overflow_ by the average _MonthlyCharges_, we get the target number of customers to retain for each algorithm.

In [42]:
m["Predicted Churn Overflow"] = tt.total(
    tt.math.ceil(m["Predicted RR Loss Overflow"] / m["Churn MonthlyCharges.MEAN"]),
    h["ChurnPredicted"],
)
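
The chain of measures above (_Expected RR Loss_, _Predicted RR Loss Overflow_, _Predicted Churn Overflow_) boils down to a few lines of arithmetic. The sketch below reproduces it in plain Python with hypothetical figures that are not taken from the dataset. `customers_to_retain` is an illustrative name, and the guard that returns 0 when the target is already met is an addition of this sketch rather than something the measures above do.

```python
import math


def customers_to_retain(mrr_initial, predicted_rr_loss, target_nrr, mean_churn_charge):
    """Number of predicted churners to win back so that NRR reaches target_nrr."""
    # largest loss compatible with the target (same form as "Expected RR Loss")
    expected_rr_loss = mrr_initial - target_nrr * mrr_initial
    # revenue that must be saved (same form as "Predicted RR Loss Overflow")
    overflow = predicted_rr_loss - expected_rr_loss
    if overflow <= 0:
        return 0  # the target is already met without any retention effort
    return math.ceil(overflow / mean_churn_charge)


# Hypothetical figures: 100,000 of initial MRR, 25,000 predicted to be lost,
# and an average monthly charge of 75 per churned customer
print(customers_to_retain(100_000, 25_000, target_nrr=0.90, mean_churn_charge=75))  # 200
print(customers_to_retain(100_000, 25_000, target_nrr=0.95, mean_churn_charge=75))  # 267
```

Raising the target NRR shrinks the acceptable loss, so more customers have to be retained for the same predicted churn.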

Let's also create a measure for _Predicted Churn Count_ so that we can see how many customers are predicted to churn and how many we intend to retain.

In [43]:
m["Predicted Churn Count"] = tt.agg.sum(
    tt.filter(m["contributors.COUNT"], l["ChurnPredicted"] == "Yes"),
    scope=tt.scope.origin(l["CustomerID"]),
)

In [73]:
session.visualize("Predicted churn vs Predicted Churn Overflow")

Now that we know the estimated number of customers to retain, how do we identify who to retain?  
When building the scenarios, we ensured that only customers predicted to churn have a non-zero _ChurnProbability_ and are therefore eligible for retention.  
However, we boost this probability in _Churn Score_ if the customer's _MonthlyCharges_ is higher than or equal to the average: the higher their recurring charges, the fewer customers we need to retain.

We rank the customers by their likelihood to churn using [`atoti.rank`](https://docs.atoti.io/0.4.1/lib/atoti.html?#atoti.rank).  
This allows us to identify the customers to retain as those whose rank is less than or equal to the _Predicted Churn Overflow_.

In [45]:
# we only rank customers who are predicted to churn; we give a higher weight to customers with higher charges so as to minimize the loss
m["Churn Score"] = tt.where(
    (m["MonthlyCharges.MEAN"] >= m["Churn MonthlyCharges.MEAN"])
    & (m["ChurnProbability.MEAN"] > 0),
    m["ChurnProbability.MEAN"] + 1,
    m["ChurnProbability.MEAN"],
)

m["Churn Rank"] = tt.rank(
    m["Churn Score"], h["CustomerID"], ascending=False, apply_filters=True
)
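
The scoring and ranking logic above can be sketched in plain Python as follows. The customers, the dictionary keys and the function name `rank_by_churn_score` are hypothetical, and ties are broken by input order here, which may differ from how `atoti.rank` handles equal scores.

```python
def rank_by_churn_score(customers, mean_churn_charge):
    """Rank customers by churn score, highest score first (rank 1).

    customers: list of dicts with hypothetical keys 'id', 'monthly_charges'
    and 'churn_probability' (0 when no churn is predicted).
    """
    scored = []
    for c in customers:
        score = c["churn_probability"]
        # boost predicted churners whose charges are at or above the average
        if score > 0 and c["monthly_charges"] >= mean_churn_charge:
            score += 1
        scored.append((c["id"], score))
    scored.sort(key=lambda item: item[1], reverse=True)
    return [(rank, cid, score) for rank, (cid, score) in enumerate(scored, start=1)]


customers = [
    {"id": "A", "monthly_charges": 90.0, "churn_probability": 0.6},
    {"id": "B", "monthly_charges": 40.0, "churn_probability": 0.9},
    {"id": "C", "monthly_charges": 80.0, "churn_probability": 0.0},
    {"id": "D", "monthly_charges": 75.0, "churn_probability": 0.7},
]
for rank, cid, score in rank_by_churn_score(customers, mean_churn_charge=75.0):
    print(rank, cid, score)
```

Customer B has the highest churn probability but ranks third, behind D and A, because the boost puts every high-paying predicted churner ahead of every low-paying one. Customer C is never boosted since no churn is predicted for it.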

So let's set our _Retention budget_ and _New Customer budget_.

For the customers identified for retention, we create a measure _Retention cost_ where we spend the _Retention budget_ and a measure _New Customer cost_ where we spend the _New Customer budget_.

In [46]:
m["Retention budget"] = 100
m["New Customer budget"] = 500

# we spent $100 on each of the customers identified and managed to retain all of them
m["Retention cost"] = tt.agg.sum(
    tt.where(
        (m["Churn Rank"] <= m["Predicted Churn Overflow"]) & (m["Churn Score"] > 0),
        m["Retention budget"],
        0,
    ),
    scope=tt.scope.origin(l["CustomerID"]),
)

# we retained none of the customers, hence spending $500 to recruit number of new customers equivalent to the Predicted Churn Overflow
m["New Customer cost"] = tt.agg.sum(
    tt.where(
        (m["Churn Rank"] <= m["Predicted Churn Overflow"]) & (m["Churn Score"] > 0),
        m["New Customer budget"],
        0,
    ),
    scope=tt.scope.origin(l["CustomerID"]),
)

Based on the predicted churn, we see the expenses needed to maintain 90% NRR by either retaining or replacing the customers in the _Predicted Churn Overflow_.  
We see how much we would save if we successfully retained our target customers instead of replacing them.  
We also see that _Naive Bayes Classifier_ allows us to maintain the same NRR at the lowest cost.

In [72]:
session.visualize("Cost to maintain 90% NRR")

#### Reality check

Now let's compare our prediction against the actual churn results.

We assume that the retention campaign is successful: none of the customers we attempted to retain churned.  
However, some customers whom we did not predict to churn did in fact churn.  
We compute the new revenue loss based on these assumptions.

In [48]:
# Churned customers that were not targeted by the campaign
m["After Campaign RR Loss"] = tt.agg.sum(
    tt.where(
        m["Retention cost"] == 100,
        0,
        tt.where(l["Churn"] == "Yes", m["Churn MonthlyCharges.MEAN"], 0),
    ),
    scope=tt.scope.origin(l["CustomerID"]),
)

m["After Campaign NRR"] = (m["MonthlyCharges.SUM"] - m["After Campaign RR Loss"]) / m[
    "MonthlyCharges.SUM"
]

In the table below, we see that the NRR across the different predictions is above 80%, with _SVM Classifier_ retaining the most revenue (ignoring the base scenario).  
However, it also incurs the highest retention expenses, as we saw earlier. This could be explained by the fact that it has a larger _Predicted Churn Overflow_ (168) than the rest.  
This greatly increases its chances of targeting customers who will actually churn, compared to the other prediction models.

In [71]:
session.visualize("NRR after retention effort")

It is possible that a customer is predicted to churn but in fact does not. In this case, the retention budget is effectively wasted.  
We can see this with _SVM Classifier_, which predicted 199 customers churning when only 96 of them actually churned.  

Let's see how much of the expense was actually well-spent.

In [50]:
# Churned customer targeted by the campaign
m["Successful Retention Cost"] = tt.agg.sum(
    tt.where(
        (m["Retention cost"] == 100) & (l["Churn"] == "Yes"), m["Retention cost"], 0
    ),
    scope=tt.scope.origin(l["CustomerID"]),
)

In [70]:
session.visualize("Successful retention with respect to cost")

For the models that didn't meet the 90% target NRR, chances are that their retention lists included customers who would not actually have churned.  
To make up for this gap in the NRR, let's see how many new customers we need to recruit.  

Again, we compute the difference between our _After Campaign RR Loss_ and the _Expected RR Loss_ to see how much revenue we need to replace.

In [52]:
gap_to_target_nrr = m["TargetNRR"] - m["After Campaign NRR"]

m["Gap in revenue loss"] = m["After Campaign RR Loss"] - m["Expected RR Loss"]

m["Clients to replace"] = tt.total(
    tt.where(
        m["Gap in revenue loss"] > 0,
        tt.math.ceil(m["Gap in revenue loss"] / m["Churn MonthlyCharges.MEAN"]),
        0,
    ),
    h["ChurnPredicted"],
)

The money spent on retention is already gone, so we have to add the marketing expense for replacing customers on top of it.  
Let's compute the _Actual Expense_.

In [53]:
m["Actual New Customer budget"] = m["Clients to replace"] * m["New Customer budget"]
m["Actual Expense"] = m["Retention cost"] + m["Actual New Customer budget"]
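
The last two cells can be summarised by the plain-Python sketch below, which turns the remaining revenue gap into a number of customers to recruit and then into a total expense. The figures are hypothetical, apart from the $100 and $500 budgets used in this notebook, and `actual_expense` is an illustrative name.

```python
import math


def actual_expense(after_campaign_rr_loss, expected_rr_loss, mean_churn_charge,
                   retention_cost, new_customer_budget):
    """Return (clients to replace, total spend) once the revenue gap is closed."""
    gap = after_campaign_rr_loss - expected_rr_loss
    # new customers are only needed when the loss still exceeds what the target allows
    clients_to_replace = math.ceil(gap / mean_churn_charge) if gap > 0 else 0
    return clients_to_replace, retention_cost + clients_to_replace * new_customer_budget


# Hypothetical figures: 200 customers targeted at $100 each, and a remaining
# loss of 11,500 where the target NRR only allows 10,000
clients, expense = actual_expense(
    after_campaign_rr_loss=11_500,
    expected_rr_loss=10_000,
    mean_churn_charge=75,
    retention_cost=200 * 100,
    new_customer_budget=500,
)
print(clients, expense)  # 20 30000
```

The retention cost is paid whether or not the campaign closes the gap, which is why a model with many false positives can end up costing more overall.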

Looking at the NRRs, which are now all above 90%, _Logistic Regression Classifier_ turns out to be the most cost-effective model.  

In [69]:
session.visualize()

Now if we compare _Logistic Regression Classifier_ against _SVM Classifier_, which has the highest NRR, we can see that with _Logistic Regression Classifier_ we lose 7% more revenue and have to spend 6% more to achieve 90% NRR. 
So, despite not having the best F1 score at first sight, _SVM Classifier_ brings better value to the telco.

Below, we compare _SVM Classifier_ with _Logistic Regression Classifier_, the best-performing model in terms of F1 score.

In [65]:
session.visualize("SVM Classifier vs Logistic Regression Classifier")

## What-if we want 95% NRR?

We can easily set up a simulation that allows us to replace the _TargetNRR_.

In [56]:
NRR_simulation = cube.setup_simulation(
    "NRR Simulation", base_scenario="90% NRR", replace=[m["TargetNRR"]],
).scenarios

NRR_simulation["95% NRR"] = 0.95

We see that the expense to achieve 95% NRR is close to 1.5 times the expense to achieve 90% NRR.  
All the predictions leave a gap to the target NRR, and hence require recruiting new customers to replace those who have churned.  
This consequently results in higher expenses.

In [66]:
session.visualize()

## What-if New Customer budget is twice the expected?

We can easily set up a simulation that allows us to scale the _New Customer budget_.

In [58]:
marketing_budget_simulation = cube.setup_simulation(
    "New Customer budget Simulation",
    base_scenario="5 x Retention",
    multiply=[m["New Customer budget"]],
).scenarios

We create a scenario where we multiply _New Customer budget_ by 2.

In [59]:
marketing_budget_simulation["10 x Retention"] = 2

We see the final NRR and _Actual Expense_ being recomputed. This increase in _New Customer budget_ does not impact the _Naive Bayes Classifier_ model, as its NRR exceeds 90% when the retention campaign is 100% successful.  

In [68]:
session.visualize()