# Understanding Telco Customer Churn
\[_In case you’re unable to see the atoti visualizations in GitHub, try viewing the notebook in [nbviewer](https://nbviewer.org/github/atoti/notebooks/blob/master/notebooks/customer-churn/main.ipynb)._]

### A few definitions
- [Customer attrition](https://en.wikipedia.org/wiki/Customer_attrition#:~:text=Customer%20attrition%2C%20also%20known%20as,loss%20of%20clients%20or%20customers.&text=Gross%20attrition%20is%20the%20loss,services%20during%20a%20particular%20period.), also known as customer churn, is the loss of customers or subscriptions to goods/services by a business over a given period of time.
- Customer attrition rate is the number of customers lost at the end of the period against the number of customers the business had at the start of the period. 
- Gross attrition is the loss of revenue from churned customers
- Net attrition is the loss of revenue from churned customers including the benefits from expansion (new customers, upgrades...)
- Monthly Recurring Revenue (MRR) is the recurring revenue expected on a monthly basis for the subscribed goods/services
- Gross Revenue Retention (GRR) rate measures the change in the MRR over the period, excluding benefits from expansion.
- Net Revenue Retention (NRR) rate measures the change in the MRR over the period, including benefits from expansion.
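
As a rough illustration of the definitions above, the sketch below computes the attrition rate, GRR and NRR for a single period. The function names and the sample figures are made up for illustration, and the GRR formula ignores downgrades, which is a simplifying assumption.

```python
def attrition_rate(customers_lost: int, customers_at_start: int) -> float:
    """Share of the starting customer base lost during the period."""
    return customers_lost / customers_at_start


def gross_revenue_retention(mrr_start: float, churned_mrr: float) -> float:
    """GRR: share of the starting MRR that is kept, ignoring any expansion."""
    return (mrr_start - churned_mrr) / mrr_start


def net_revenue_retention(
    mrr_start: float, churned_mrr: float, expansion_mrr: float
) -> float:
    """NRR: like GRR, but expansion revenue offsets the churned revenue."""
    return (mrr_start + expansion_mrr - churned_mrr) / mrr_start


# Hypothetical figures for one month
rate = attrition_rate(customers_lost=50, customers_at_start=1000)
grr = gross_revenue_retention(mrr_start=100_000, churned_mrr=8_000)
nrr = net_revenue_retention(mrr_start=100_000, churned_mrr=8_000, expansion_mrr=3_000)

print(f"attrition rate={rate:.1%}, GRR={grr:.1%}, NRR={nrr:.1%}")
```

With these numbers, NRR is higher than GRR because the expansion revenue partly compensates for the churned revenue.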

### Introduction

Churn is a critical metric for subscription and SaaS companies: it tells us how departing customers affect the company's monthly revenue and growth and, consequently, investors' confidence in the company.  

The GRR is somewhat like a happiness indicator for the existing customers. Having high GRR shows that the company has high retention rates. Customers are happy with the services/products that they are provided with. Investors would be assured by this stability.  
If a company has high GRR and even higher NRR, it shows that on top of retaining existing customers, the company has grown its customer base further.  
High NRR coupled with low GRR implies that although the company has acquired many new customers, it has low retention rates.  
So even if there is still revenue left over after the churn, there is a high chance that the new customers will churn too. The growth of the company becomes less predictable.  

The telecommunication industry is highly sensitive to customer churn as technology advances and users' behaviour changes:
- with Mobile Number Portability (MNP), customers can easily switch to another provider while keeping their number
- OTT players such as Netflix, Amazon Prime Video and Disney+ bypass traditional operator networks such as cable, broadcast and satellite television
- OTT applications such as WhatsApp, Google Hangouts and Skype are cannibalizing paid voice and messaging services
- customers are less willing to be bound by contracts for handsets as new models are released so frequently

In this notebook, we will look at the customer churn in the telecommunication sector.  
Using the [Telco Customer Churn data](https://www.kaggle.com/blastchar/telco-customer-churn) from Kaggle, we explore the accuracy of 4 machine learning algorithms against the actual churn in the past month:  
- Dummy Classifier Prediction
- Naive Bayes Prediction
- Logistic Regression Prediction
- SVM Classifier Linear Prediction

Note: we train the models with last month's churn data using the algorithm provided in [Telecom Customer Churn Prediction](https://www.kaggle.com/pavanraj159/telecom-customer-churn-prediction).  
We compare the model's prediction against the same set of data for accuracy comparison.

Assuming that we wish to retain 90% NRR for this particular telco, we will explore with atoti the impact of each model on:
- Predicted revenue loss
- Number of customers to retain
- Expense spent to retain or replace customers

Finally, we use what-if simulation to see how the above will change when we change:
- the target NRR
- the budget spent on customer retention or replacement

<div style="text-align: center;" ><a href="https://www.atoti.io/?utm_source=gallery&utm_content=drug-efficacy" target="_blank" rel="noopener noreferrer"><img src="https://data.atoti.io/notebooks/banners/discover.png" alt="Try atoti"></a></div>

### Things to install
pip install imbalanced-learn  
pip install scikit-learn

!pip install imbalanced-learn scikit-learn

Load packages

In [1]:
import glob
import os
import pickle
from collections import Counter

import atoti as tt
import numpy as np
import pandas as pd
from _utils import data_utils, prediction
from imblearn.combine import SMOTEENN
from imblearn.over_sampling import SMOTE
from pandas_profiling import ProfileReport
from sklearn.cross_decomposition import PLSRegression
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

### Global variables

In [2]:
MODELS_PATH = "./models/"

# STEP 1: Load the data

We are using a processed version of the [Telco Customer Churn data from Kaggle](https://www.kaggle.com/blastchar/telco-customer-churn).  

Refer to [0_prepare_data.ipynb](0_prepare_data.ipynb) for the data preprocessing details.

In [3]:
telcom = pd.read_csv(
    "https://data.atoti.io/notebooks/telco-churn/tranformed_customer_churn.csv"
)
# perform data clean up
telcom = data_utils.data_cleanup(telcom)

print("Data size: {}\n".format(telcom.shape))
telcom.head(2)

Data size: (7032, 25)



Unnamed: 0,CustomerID,Gender,SeniorCitizen,Partner,Dependents,Tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,...,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,TenureGroup,ChurnProbability,ChurnPredicted,Subset,Churn
0,6429-SHBCB,Male,No,No,No,19,Yes,Yes,DSL,No,...,Month-to-month,No,Mailed check,69.6,1394.55,Tenure_12-24,0.0,No,Train,No
1,0810-DHDBD,Female,No,No,No,52,Yes,Yes,DSL,Yes,...,One year,No,Credit card (automatic),74.0,3877.65,Tenure_48-60,0.0,No,Train,No


##### First, we set the column types.

In [4]:
col_types = {
    "str": [
        "CustomerID",
        "Gender",
        "Tenure",
        "InternetService",
        "Contract",
        "PaymentMethod",
        "TenureGroup",
        "Subset",
        "SeniorCitizen",
        "Partner",
        "Dependents",
        "PhoneService",
        "MultipleLines",
        "OnlineSecurity",
        "OnlineBackup",
        "DeviceProtection",
        "TechSupport",
        "StreamingTV",
        "StreamingMovies",
        "PaperlessBilling",
        "ChurnPredicted",
        "Churn",
    ],
    "float": ["MonthlyCharges", "TotalCharges", "ChurnProbability"],
}

reverse_col_types = {el: k for k, v in col_types.items() for el in v}

for c in telcom.columns:
    telcom[c] = telcom[c].astype(reverse_col_types[c])

For this analysis, we have performed a series of transformations on the data:

- Processing of columns: encoding, normalization etc.
- Dimension reduction: Partial Least Squares and component selection

These preparation steps are optional. You can use a different approach that best fits the models you want to use to make predictions.  
Here, we load the transformed dataset.

In [5]:
telcom_transf = pd.read_csv(
    "https://data.atoti.io/notebooks/telco-churn/Telco-Customer-Churn_transformed.csv"
)

print("Transformed data size: {}\n".format(telcom_transf.shape))
telcom_transf.head(2)

Transformed data size: (7032, 4)



Unnamed: 0,LV1,LV2,LV3,Churn
0,-0.73034,-0.174424,0.396948,0
1,1.221606,0.230012,0.859353,0


# STEP 2: Load the models  

We have trained the following models and [saved them using pickle](https://www.kaggle.com/prmohanty/python-how-to-save-and-load-ml-models) in [1_create_models.ipynb](1_create_models.ipynb):
- DummyClassifier
- Naive Bayes Model
- Logistic Regression Model
- SVM Classifier Linear Model

Refer to [1_create_models.ipynb](1_create_models.ipynb) to understand how we trained the models. We shall load the trained models to proceed with our prediction of the churn.

In [6]:
filename = os.path.join(MODELS_PATH, "dummy_unif_clf.sav")
with open(filename, "rb") as f:
    dummy_unif_clf = pickle.load(f)

In [7]:
filename = os.path.join(MODELS_PATH, "gnb_clf.sav")
with open(filename, "rb") as f:
    gnb_clf = pickle.load(f)

In [8]:
filename = os.path.join(MODELS_PATH, "lr_clf.sav")
with open(filename, "rb") as f:
    lr_clf = pickle.load(f)

In [9]:
filename = os.path.join(MODELS_PATH, "svc_clf.sav")
with open(filename, "rb") as f:
    svc_clf = pickle.load(f)
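
The loading cells above follow the usual pickle pattern. Below is a minimal, self-contained sketch of a save/load round trip. It uses a plain dictionary as a stand-in for a trained model and a temporary directory instead of `./models/`; both are assumptions made for illustration and are not how the models in this notebook were created.

```python
import os
import pickle
import tempfile

# Stand-in for a trained model; any picklable object behaves the same way
toy_model = {"name": "toy_clf", "coef": [0.1, -0.4, 0.7]}

with tempfile.TemporaryDirectory() as models_path:
    filename = os.path.join(models_path, "toy_clf.sav")

    # Save: the context manager closes the file even if pickling fails
    with open(filename, "wb") as f:
        pickle.dump(toy_model, f)

    # Load: mirrors the loading cells above
    with open(filename, "rb") as f:
        restored = pickle.load(f)

print(restored == toy_model)
```

Only unpickle files that come from a source you trust, since loading a pickle can execute arbitrary code.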

## Process the data

We will be using the selected features to perform our predictions.

In [10]:
X = telcom_transf.copy().drop(["Churn"], axis=1)

# STEP 3: Atoti's magic!
In this part, we leverage the power of Atoti to post-process the results of the models and perform ***scenario simulations***.

## Atoti cube creation  

We have a simple cube that holds a single data table - Customer table.

In [11]:
# a session has to be created for atoti
session = tt.create_session()

In [12]:
customer = session.read_pandas(telcom, keys=["CustomerID"], table_name="customer")
customer.head()

CustomerID,Gender,SeniorCitizen,Partner,Dependents,Tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,...,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,TotalCharges,TenureGroup,ChurnProbability,ChurnPredicted,Subset,Churn
6429-SHBCB,Male,No,No,No,19,Yes,Yes,DSL,No,No,...,Month-to-month,No,Mailed check,69.6,1394.55,Tenure_12-24,0.0,No,Train,No
0810-DHDBD,Female,No,No,No,52,Yes,Yes,DSL,Yes,No,...,One year,No,Credit card (automatic),74.0,3877.65,Tenure_48-60,0.0,No,Train,No
4471-KXAUH,Female,No,Yes,No,42,Yes,Yes,Fiber optic,No,No,...,Month-to-month,Yes,Electronic check,84.3,3588.4,Tenure_24-48,1.0,Yes,Train,Yes
4868-AADLV,Male,Yes,Yes,Yes,66,Yes,Yes,Fiber optic,Yes,Yes,...,One year,Yes,Electronic check,116.25,7862.25,Tenure_gt_60,0.0,No,Train,No
6478-HRRCZ,Male,No,Yes,No,32,Yes,No,DSL,Yes,Yes,...,One year,No,Mailed check,70.5,2201.75,Tenure_24-48,0.0,No,Train,No


In [13]:
cube = session.create_cube(customer, "customer_cube")

h, m, l = cube.hierarchies, cube.measures, cube.levels
cube

We create a simple function that helps change the dimension of our hierarchies as we group them into logical categories.

In [14]:
def set_dim(hier_name, dim_name):
    h[hier_name].dimension = dim_name

## Data classification using dimensions

We re-classify the hierarchies under Customer, Demographic, Account and Services.

In [15]:
customer_hierarchy = ["CustomerID", "Churn", "ChurnPredicted"]
[set_dim(hier_name, "Customer") for hier_name in customer_hierarchy]

demographic_hierarchy = ["Gender", "SeniorCitizen", "Partner", "Dependents"]
[set_dim(hier_name, "Demographic") for hier_name in demographic_hierarchy]

account_hierarchy = [
    "Tenure",
    "Contract",
    "PaperlessBilling",
    "PaymentMethod",
    "TenureGroup",
]
[set_dim(hier_name, "Account") for hier_name in account_hierarchy]

services_hierarchy = [
    "PhoneService",
    "MultipleLines",
    "InternetService",
    "OnlineSecurity",
    "OnlineBackup",
    "DeviceProtection",
    "TechSupport",
    "StreamingTV",
    "StreamingMovies",
]
[set_dim(hier_name, "Services") for hier_name in services_hierarchy]

cube

### Predictions and Scenario creation

We now load the full dataset into the trained models to get the corresponding predictions and probabilities. We run the function `model_scenario` to get the `ChurnPredicted` and `ChurnProbability`.

We assign the prediction from the data models to `ChurnPredicted`. If churn is predicted, we assign its corresponding probability from the data model to `ChurnProbability`. This is because we are only interested in cases where customers are churning.

We convert the binary result of `ChurnPredicted` to 'Yes' and 'No' values so that we can compare it against the actual `Churn`. If the prediction matches the actual churn, we assign the value 1 to its `PredictionAccuracy`. 

Thereafter, we load the resulting pandas dataframe into the `customer` table as a [scenario](https://docs.atoti.io/0.4.1/tutorial/01-Basics.html#Source-simulation). 

In [16]:
# we run the same month data against the trained models and see its accuracy against the actual churn
def model_scenario(predictions, probabilities):

    churnProbability = np.amax(probabilities, axis=1)

    churn_forecast = telcom.copy().reset_index(drop=True)
    churn_forecast = churn_forecast.drop(["ChurnPredicted", "ChurnProbability"], axis=1)

    churn_forecast = pd.concat(
        [
            churn_forecast,
            pd.DataFrame(
                {"ChurnPredicted": predictions, "ChurnProbability": churnProbability}
            ),
        ],
        axis=1,
    )

    # we are not interested in the probability if it is predicted that the client will not churn
    churn_forecast["ChurnProbability"] = np.where(
        churn_forecast["ChurnPredicted"] == 1, churn_forecast["ChurnProbability"], 0
    )

    churn_forecast["ChurnPredicted"] = np.where(
        churn_forecast["ChurnPredicted"] == 1, "Yes", "No"
    )

    return churn_forecast
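
To make the post-processing rule easier to follow, here is a pure-Python sketch of what `model_scenario` does to each row. The helper name `to_scenario_rows` and the toy classifier output are hypothetical; the real function works on NumPy arrays and the full `telcom` dataframe.

```python
def to_scenario_rows(predictions, probabilities):
    """Pure-Python illustration of the per-row logic in model_scenario."""
    rows = []
    for predicted, class_probs in zip(predictions, probabilities):
        # Highest class probability, like np.amax(probabilities, axis=1)
        confidence = max(class_probs)
        rows.append(
            {
                # Binary prediction converted to the labels used by `Churn`
                "ChurnPredicted": "Yes" if predicted == 1 else "No",
                # The probability is kept only for predicted churners
                "ChurnProbability": confidence if predicted == 1 else 0.0,
            }
        )
    return rows


# Hypothetical classifier output: [P(no churn), P(churn)] per customer
toy_predictions = [0, 1, 1]
toy_probabilities = [[0.9, 0.1], [0.2, 0.8], [0.45, 0.55]]

rows = to_scenario_rows(toy_predictions, toy_probabilities)
for row in rows:
    print(row)
```

Note that taking the maximum class probability only equals the churn probability when the predicted class is also the most probable one, which is an assumption about the classifier.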

#### Scenario 1 - Using Naive Bayes Classifier

In [17]:
gnb_prediction = gnb_clf.predict(X)
gnb_probability = gnb_clf.predict_proba(X)

gnb_df = model_scenario(gnb_prediction, gnb_probability)
customer.scenarios["Naive Bayes Classifier"].load_pandas(gnb_df)

#### Scenario 2 - Using Logistic Regression Classifier 

In [18]:
lr_prediction = lr_clf.predict(X)
lr_probability = lr_clf.predict_proba(X)

lr_df = model_scenario(lr_prediction, lr_probability)
customer.scenarios["Logistic Regression Classifier"].load_pandas(lr_df)

#### Scenario 3 - Using SVM Classifier

In [19]:
svm_prediction = svc_clf.predict(X)
svm_probability = svc_clf.predict_proba(X)

svm_df = model_scenario(svm_prediction, svm_probability)
customer.scenarios["SVM Classifier"].load_pandas(svm_df)

#### Scenario 4 - Using Dummy Uniform Classifier

In [20]:
dummy_unif_prediction = dummy_unif_clf.predict(X)
dummy_unif_probability = dummy_unif_clf.predict_proba(X)

dummy_unif_df = model_scenario(dummy_unif_prediction, dummy_unif_probability)
customer.scenarios["Dummy Uniform Classifier"].load_pandas(dummy_unif_df)

#### Scenario 5 - Using Dummy Stratified Classifier

#### Scenario 6 - Using Dummy Majority Class Classifier

## Telco Churn Data Analysis

We are now ready for our analysis.

Let's have a quick overview of each model's prediction results.

In [21]:
session.visualize("Last month customer attrition by models")

#### Model F1-score

This only gives us the count of churned and retained customers. A more relevant way to assess our models is to look at the F1-score, or at Precision and Recall.

In [22]:
m["true positive"] = tt.agg.sum(
    tt.filter(
        tt.where(l["Churn"] == l["ChurnPredicted"], 1, 0), l["ChurnPredicted"] == "Yes"
    ),
    scope=tt.scope.origin(l["CustomerID"]),
)
m["true negative"] = tt.agg.sum(
    tt.filter(
        tt.where(l["Churn"] == l["ChurnPredicted"], 1, 0), l["ChurnPredicted"] == "No"
    ),
    scope=tt.scope.origin(l["CustomerID"]),
)

m["false positive"] = tt.agg.sum(
    tt.filter(
        tt.where(l["Churn"] != l["ChurnPredicted"], 1, 0), l["ChurnPredicted"] == "Yes"
    ),
    scope=tt.scope.origin(l["CustomerID"]),
)
m["false negative"] = tt.agg.sum(
    tt.filter(
        tt.where(l["Churn"] != l["ChurnPredicted"], 1, 0), l["ChurnPredicted"] == "No"
    ),
    scope=tt.scope.origin(l["CustomerID"]),
)

m["precision"] = m["true positive"] / (m["true positive"] + m["false positive"])
m["recall"] = m["true positive"] / (m["true positive"] + m["false negative"])
m["f1 score"] = 2 * (m["recall"] * m["precision"]) / (m["recall"] + m["precision"])

Let's look at the F1-score of the algorithm as we compute the number of correctly predicted churn cases in the pivot table below.  

Looking at just the customers who churned, we see that the _SVM Classifier_ has the highest percentage of correct predictions (Recall of 0.78), with 1,451 churns detected out of 1,869. At the same time, the SVM Classifier is also the one with the most false positives (Precision of 0.53), which means that 47% of its churn predictions are wrong!

As a consequence, we will focus on the F1-score to compare our classifiers in the following paragraphs, as it takes into account Precision and Recall at the same time.
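
As a quick check of these figures, the sketch below recomputes Precision, Recall and F1-score in plain Python from the SVM Classifier counts quoted in this notebook (1,451 correctly detected churns, 1,869 actual churns and 2,731 predicted churns). The variable names are chosen for illustration only.

```python
true_positive = 1451    # churners correctly predicted by the SVM Classifier
actual_churn = 1869     # customers who really churned
predicted_churn = 2731  # customers the SVM Classifier predicted to churn

false_negative = actual_churn - true_positive     # churners the model missed
false_positive = predicted_churn - true_positive  # false alarms

precision = true_positive / (true_positive + false_positive)
recall = true_positive / (true_positive + false_negative)
f1 = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.2f}, recall={recall:.2f}, f1={f1:.2f}")
```

The F1-score is the harmonic mean of Precision and Recall, so a model cannot score well by maximizing one at the expense of the other.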

In [23]:
session.visualize("Precision and Recall comparison across models")

In [24]:
session.visualize("Last month customer churn F1-score by model")

The results above show that the SVM Classifier is the best model with respect to F1-score.
Let's analyze whether it would also be the one bringing the highest revenue to the company.

#### Churn and MRR Analysis

In [25]:
session.visualize("Percentage customers churned last month")

We see that more than 25% of customers churned last month. The telco would lose all its customers within the next few months if this attrition rate keeps up.  
  
The chart below shows the actual churn versus the false positives predicted by each model. The SVM Classifier has the most correct churn predictions but also the highest number of false churn predictions.  

In [26]:
session.visualize("Churn prediction comparison")

Let's see the impact of this on the telco's net revenue retention (NRR).

Since we only have one month's data, we have a simple formula for calculating NRR:

$$NRR = \frac{MRR(initial) + Expansion - Churn}{MRR(initial)}$$

We use [`atoti.total`](https://docs.atoti.io/0.4.1/lib/atoti.html?#atoti.total) to get the total _MonthlyCharges_ for _MRR Initial_ across all the customers because we want to ignore all filters for this measure.  
The level `Churn` shows the actual status of customers churning, while the level `ChurnPredicted` reflects the predicted status from the models.  
We get the revenue lost by taking the _MonthlyCharges_ for customers who have churned or are predicted to churn.

In [27]:
m["MRR Initial"] = tt.total(m["MonthlyCharges.SUM"], h["CustomerID"])

m["Actual RR Loss"] = tt.total(
    tt.filter(m["MonthlyCharges.SUM"], l["Churn"] == "Yes"), h["CustomerID"]
)

# we use ChurnPredicted here instead of churn because we want to see the difference between the prediction and the actual churn
# we know the exact amount of money (m["MonthlyCharges.SUM"]) lost if the customer churns
m["Predicted RR Loss"] = tt.agg.sum(
    tt.filter(m["MonthlyCharges.SUM"], l["ChurnPredicted"] == "Yes"),
    scope=tt.scope.origin(l["CustomerID"]),
)

m["NRR"] = (m["MRR Initial"] - m["Predicted RR Loss"]) / m["MRR Initial"]
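
The measures above can be mirrored in plain Python on a handful of customers. This is only a sketch with made-up data: the four toy customers and the variable names are assumptions, and since we have a single month of data the expansion term of the NRR formula is taken as zero.

```python
# Hypothetical customers: monthly charge, actual churn and predicted churn
customers = [
    {"MonthlyCharges": 70.0, "Churn": "No", "ChurnPredicted": "No"},
    {"MonthlyCharges": 80.0, "Churn": "Yes", "ChurnPredicted": "Yes"},
    {"MonthlyCharges": 50.0, "Churn": "No", "ChurnPredicted": "Yes"},
    {"MonthlyCharges": 100.0, "Churn": "Yes", "ChurnPredicted": "No"},
]

# Total recurring revenue across all customers, ignoring any filter
mrr_initial = sum(c["MonthlyCharges"] for c in customers)

# Revenue lost to customers who actually churned
actual_rr_loss = sum(c["MonthlyCharges"] for c in customers if c["Churn"] == "Yes")

# Revenue lost according to the model's predictions
predicted_rr_loss = sum(
    c["MonthlyCharges"] for c in customers if c["ChurnPredicted"] == "Yes"
)

actual_nrr = (mrr_initial - actual_rr_loss) / mrr_initial
predicted_nrr = (mrr_initial - predicted_rr_loss) / mrr_initial

print(f"MRR initial={mrr_initial}, actual NRR={actual_nrr:.2f}, predicted NRR={predicted_nrr:.2f}")
```

In this toy case the model under-predicts the loss, because the churner it missed pays more than the customer it flagged by mistake.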

We can see that all the models over-predicted the loss.

In [28]:
session.visualize("NRR - Actual vs Predicted")

From the chart below, we see the importance of choosing an appropriate predictive model with respect to our business case. We could grossly over-estimate or under-estimate the loss if we are not careful with our projection.  

In [29]:
session.visualize("NRR - Actual vs Predicted")

Is it better to adopt a model that over-estimates the loss? After all, if we manage to retain these customers, the revenue will be retained.
If we adopt a model that under-estimates the loss, we would end up retaining fewer customers than necessary to maintain our NRR.  

Let's help the business decide on a customer retention strategy by showing the loss and gain for each model.

#### Customer Retention Strategy

We have to reduce the rate of attrition before the telco loses all its customers.  
To do so, we have 2 potential strategies:
1. retain existing customers through better service offers or discounts etc
2. replace the churned customers through marketing effort

According to [Harvard Business Review](https://hbr.org/2014/10/the-value-of-keeping-the-right-customers), it can cost 5 to 25 times more money to replace a customer than to retain one.  
It is not realistic to assume we will be able to retain all customers, so let's make some assumptions:
1. We aim to achieve a target NRR of 90%
2. We compute the number of customers that we need to retain in order to achieve this target NRR
3. For each customer identified, we will set aside a budget of $100 for retention purposes
4. We do not know who has really churned yet

Let's start by creating a measure for our target NRR. This is so that we can change our target later in our simulations.

In [30]:
m["TargetNRR"] = 0.9

To achieve the target NRR, we compute the maximum loss possible.

In [31]:
m["Expected RR Loss"] = m["MRR Initial"] - (m["TargetNRR"] * m["MRR Initial"])

We define _Predicted RR Loss Overflow_ as the difference between the loss we predict and the maximum loss we can afford while still achieving the target NRR.  
This is the revenue that we need to recover from customers who are either retained or replaced.

In [32]:
m["Predicted RR Loss Overflow"] = tt.total(
    m["Predicted RR Loss"] - m["Expected RR Loss"], h["CustomerID"]
)

Let's take the average _MonthlyCharges_ of those who are predicted to churn as the amount that each retained customer will bring in.  
Notice that we use [`atoti.parent_value`](https://docs.atoti.io/0.4.1/lib/atoti.html?#atoti.parent_value) for `Churn MonthlyCharges.MEAN`: we need this value to be constant across all customers in order to have a constant _Predicted Churn Overflow_.

By dividing the _Predicted RR Loss Overflow_ by the average _MonthlyCharges_, we get the target number of customers to retain for each algorithm.

In [33]:
m["Churn MonthlyCharges.MEAN"] = tt.parent_value(
    measure=m["MonthlyCharges.MEAN"],
    degrees={h["CustomerID"]: 1, h["ChurnPredicted"]: 1},
    apply_filters=False,
    total_value=m["MonthlyCharges.MEAN"],
)

In [34]:
m["Predicted Churn Overflow"] = tt.total(
    tt.math.ceil(m["Predicted RR Loss Overflow"] / m["Churn MonthlyCharges.MEAN"]),
    h["ChurnPredicted"],
)
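
The arithmetic behind _Expected RR Loss_, _Predicted RR Loss Overflow_ and _Predicted Churn Overflow_ can be sketched in plain Python. All the figures below are hypothetical and the variable names are chosen for illustration; the cube computes the same quantities per scenario.

```python
import math

# Hypothetical inputs
mrr_initial = 100_000.0             # total monthly recurring revenue
target_nrr = 0.9                    # we want to keep 90% of the revenue
predicted_rr_loss = 25_030.0        # revenue of customers predicted to churn
mean_charge_predicted_churn = 75.0  # average MonthlyCharges of predicted churners

# Maximum loss we can afford while still reaching the target NRR
expected_rr_loss = mrr_initial - target_nrr * mrr_initial

# Revenue we must recover through retention or replacement
predicted_rr_loss_overflow = predicted_rr_loss - expected_rr_loss

# Number of customers to retain, rounded up to a whole customer
predicted_churn_overflow = math.ceil(
    predicted_rr_loss_overflow / mean_charge_predicted_churn
)

print(expected_rr_loss, predicted_rr_loss_overflow, predicted_churn_overflow)
```

Rounding up rather than down makes sure the recovered revenue is at least the overflow.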

Let's also create a measure for _Predicted Churn Count_ so that we can see how many customers are predicted to churn and how many we intend to retain.

In [35]:
m["Predicted Churn Count"] = tt.agg.sum(
    tt.filter(m["contributors.COUNT"], l["ChurnPredicted"] == "Yes"),
    scope=tt.scope.origin(l["CustomerID"]),
)

From the base scenario, we can see that in order to maintain NRR of 90%, we should retain 1,444 customers based on the actual churn.  
However, we can see that _Naive Bayes Classifier_ has the smallest gap to the base scenario in terms of number of customers to retain.  

In [36]:
session.visualize("Predicted churn vs Predicted Churn Overflow")

In [37]:
session.visualize("Difference in number of customers to retain")

Now that we know the estimated number of customers to retain, how do we identify who to retain?  
During data clean up, we ensured that only customers predicted to churn have a value under _ChurnProbability_ and are therefore eligible for retention.  
However, in _Churn Score_ we boost this probability if the customer's _MonthlyCharges_ is higher than or equal to the average: the higher their recurring charges, the fewer customers we need to retain.

We rank the customers most likely to churn by using [`atoti.rank`](https://docs.atoti.io/0.4.1/lib/atoti.html?#atoti.rank).  
This allows us to identify the customers to be retained as those whose rank is less than or equal to the _Predicted Churn Overflow_.

In [38]:
# we only rank those customers who are churning. We give a higher weight to customers with higher charges so as to minimize the loss
m["Churn Score"] = tt.where(
    (m["MonthlyCharges.MEAN"] >= m["Churn MonthlyCharges.MEAN"])
    & (m["ChurnProbability.MEAN"] > 0),
    m["ChurnProbability.MEAN"] + 1,
    m["ChurnProbability.MEAN"],
)

m["Churn Rank"] = tt.rank(
    m["Churn Score"], h["CustomerID"], ascending=False, apply_filters=True
)
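
The scoring and ranking rule can be illustrated outside the cube with a few toy customers. The sketch below is plain Python under stated assumptions: the customer data, the helper `churn_score` and the overflow of 2 are made up, and ties are not handled in any special way.

```python
# Hypothetical customers; a ChurnProbability of 0 means "not predicted to churn"
customers = {
    "A": {"MonthlyCharges": 100.0, "ChurnProbability": 0.6},
    "B": {"MonthlyCharges": 40.0, "ChurnProbability": 0.9},
    "C": {"MonthlyCharges": 90.0, "ChurnProbability": 0.0},
    "D": {"MonthlyCharges": 80.0, "ChurnProbability": 0.7},
}

# Average MonthlyCharges of the customers predicted to churn
predicted = [c for c in customers.values() if c["ChurnProbability"] > 0]
mean_charge = sum(c["MonthlyCharges"] for c in predicted) / len(predicted)


def churn_score(customer, mean_charge):
    """Add 1 to the probability of predicted churners with above-average charges."""
    probability = customer["ChurnProbability"]
    if customer["MonthlyCharges"] >= mean_charge and probability > 0:
        return probability + 1
    return probability


scores = {cid: churn_score(c, mean_charge) for cid, c in customers.items()}

# Rank 1 is the customer with the highest score
ranked_ids = sorted(scores, key=scores.get, reverse=True)
ranks = {cid: position for position, cid in enumerate(ranked_ids, start=1)}

# Customers to retain: rank within the overflow and a positive score
predicted_churn_overflow = 2
retention_list = [
    cid
    for cid in ranked_ids
    if ranks[cid] <= predicted_churn_overflow and scores[cid] > 0
]
print(ranks, retention_list)
```

Customer B has the highest churn probability but ranks below A and D, because the bonus favours customers whose charges are at or above the average.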

So let's set our _Retention budget_ and _New Customer budget_.

For the customers identified for retention, we create a measure _Retention cost_ where we spend the _Retention budget_ and a measure _New Customer cost_ where we spend the _New Customer budget_.

In [39]:
m["Retention budget"] = 100
m["New Customer budget"] = 500

# we spent $100 on each of the customers identified and managed to retain all of them
m["Retention cost"] = tt.agg.sum(
    tt.where(
        (m["Churn Rank"] <= m["Predicted Churn Overflow"]) & (m["Churn Score"] > 0),
        m["Retention budget"],
        0,
    ),
    scope=tt.scope.origin(l["CustomerID"], l["ChurnPredicted"]),
)

# we retained none of the customers, hence spending $500 to recruit number of new customers equivalent to the Predicted Churn Overflow
m["New Customer cost"] = tt.agg.sum(
    tt.where(
        (m["Churn Rank"] <= m["Predicted Churn Overflow"]) & (m["Churn Score"] > 0),
        m["New Customer budget"],
        0,
    ),
    scope=tt.scope.origin(l["CustomerID"], l["ChurnPredicted"]),
)

In [40]:
session.visualize("Customer retention based on churn ranking for Logistic Regression")

Based on the predicted churn, we see the expenses needed to maintain 90% NRR, either by retaining or by replacing the Predicted Churn Overflow.  
We see how much we would save if we successfully retained our target customers.  
We also see that the _Naive Bayes Classifier_ allows us to maintain the same NRR with the least amount of money.

In [41]:
session.visualize("Cost to maintain 90% NRR")

#### Reality check

Now let's compare our prediction against the actual churn results.

We assume that those we attempted to retain did not churn, i.e. the retention campaign is successful.  
However, there are customers whom we did not predict to churn but who in fact churned.  
We compute the new revenue loss based on these assumptions.

In [42]:
# Churned customers that were not targeted by the campaign
# we know the actual amount that was lost through the loss of these customers
m["After Campaign RR Loss"] = tt.agg.sum(
    tt.where(
        m["Retention cost"] == 100,
        0,
        tt.where(l["Churn"] == "Yes", m["MonthlyCharges.SUM"], 0),
    ),
    scope=tt.scope.origin(l["CustomerID"]),
)

m["After Campaign NRR"] = (m["MRR Initial"] - m["After Campaign RR Loss"]) / m[
    "MRR Initial"
]
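
The same reasoning can be sketched in plain Python: revenue is only lost for customers who actually churned and were not on the retention list. The toy customers and the `targeted` flag below are assumptions for illustration; in the cube, being targeted corresponds to a _Retention cost_ of 100.

```python
# Hypothetical customers; `targeted` means the customer was on the retention list
customers = [
    {"MonthlyCharges": 80.0, "Churn": "Yes", "targeted": True},    # churner we retained
    {"MonthlyCharges": 50.0, "Churn": "No", "targeted": True},     # false alarm
    {"MonthlyCharges": 100.0, "Churn": "Yes", "targeted": False},  # missed churner
    {"MonthlyCharges": 70.0, "Churn": "No", "targeted": False},    # loyal customer
]

mrr_initial = sum(c["MonthlyCharges"] for c in customers)

# Only churners that the campaign did not target are lost
after_campaign_rr_loss = sum(
    c["MonthlyCharges"]
    for c in customers
    if c["Churn"] == "Yes" and not c["targeted"]
)

after_campaign_nrr = (mrr_initial - after_campaign_rr_loss) / mrr_initial
print(f"loss={after_campaign_rr_loss}, NRR after campaign={after_campaign_nrr:.2f}")
```

This relies on the assumption stated above that every targeted customer is successfully retained.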

In the table below, we see that the NRR across the different predictions is above 80%, with the _SVM Classifier_ retaining the most revenue (ignoring the base scenario).  
However, it also incurs the highest retention expenses, which we saw earlier. This could be explained by the fact that it has a larger Predicted Churn Overflow of 2,496 compared to the rest.  
This greatly increases its chances of identifying the customers who will actually churn compared to the other prediction models.

In [43]:
session.visualize("NRR after retention effort")

It is possible that a customer was predicted to churn but in fact did not. In this case, the retention budget is effectively wasted.  
We can see this with the _SVM Classifier_, which predicted 2,731 customers churning while only 1,451 of them (slightly more than half) actually churned.  

Let's see how much of the expense was actually well-spent.

In [44]:
# Churned customer targeted by the campaign
m["Successful Retention Cost"] = tt.agg.sum(
    tt.where(
        (m["Retention cost"] == 100) & (l["Churn"] == "Yes"), m["Retention cost"], 0
    ),
    scope=tt.scope.origin(l["CustomerID"]),
)

We see that the _Naive Bayes Classifier_ and _Logistic Regression Classifier_ are very close in performance, while the _SVM Classifier_ doubled their unsuccessful retention costs.

In [45]:
session.visualize("Successful retention with respect to cost")

For the models that didn't meet the 90% target NRR, chances are that their retention lists included customers who would not actually have churned.  
We now need to make up for this gap in the NRR, so let's see how many new customers we need to recruit.  

Again, we compute the difference between our _After Campaign RR Loss_ and the _Expected RR Loss_ to see how much revenue we need to replace.

In [46]:
gap_to_target_nrr = m["TargetNRR"] - m["After Campaign NRR"]

m["Gap in revenue loss"] = m["After Campaign RR Loss"] - m["Expected RR Loss"]

m["Clients to replace"] = tt.total(
    tt.where(
        m["Gap in revenue loss"] > 0,
        tt.math.ceil(m["Gap in revenue loss"] / m["Churn MonthlyCharges.MEAN"]),
        0,
    ),
    h["ChurnPredicted"],
)

Money spent on retention is already spent. We have to add on marketing expense for replacing the customers.  
Let's compute the _Actual Expense_.

In [47]:
m["Actual New Customer budget"] = m["Clients to replace"] * m["New Customer budget"]
m["Actual Expense"] = m["Retention cost"] + m["Actual New Customer budget"]
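
A worked example of this arithmetic, in plain Python with hypothetical figures, may help: the remaining gap in revenue is converted into a number of customers to replace, and the replacement budget is added to the retention money already spent. None of the numbers below come from the dataset.

```python
import math

# Hypothetical inputs
mrr_initial = 100_000.0
target_nrr = 0.9
after_campaign_rr_loss = 13_020.0   # revenue still lost after the campaign
mean_charge_predicted_churn = 75.0  # average MonthlyCharges of predicted churners
retention_cost = 20_000.0           # e.g. 200 targeted customers at $100 each
new_customer_budget = 500.0         # cost to recruit one new customer

# Maximum loss compatible with the target NRR
expected_rr_loss = mrr_initial - target_nrr * mrr_initial

# Revenue that still has to be replaced with new customers
gap_in_revenue_loss = after_campaign_rr_loss - expected_rr_loss

# No replacement is needed when the target is already met
if gap_in_revenue_loss > 0:
    clients_to_replace = math.ceil(gap_in_revenue_loss / mean_charge_predicted_churn)
else:
    clients_to_replace = 0

actual_new_customer_budget = clients_to_replace * new_customer_budget
actual_expense = retention_cost + actual_new_customer_budget

print(clients_to_replace, actual_new_customer_budget, actual_expense)
```

In this example the replacement budget is about as large as the retention cost, even though far fewer customers are involved, because each new customer costs five times more than a retained one.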

Now that every model's NRR is above 90%, the _Logistic Regression Classifier_ turns out to be the most cost-effective model.  

In [48]:
session.visualize("Actual Expense to maintain 90% NRR")

Now if we compare the _Logistic Regression Classifier_ against the _SVM Classifier_, which has the highest NRR, we can see that we have to spend \\$50k more in order to gain an additional \\$12k in revenue. 

Below, we compare the SVM Classifier with the Logistic Regression Classifier, which is the best-performing model.

In [49]:
session.visualize("SVM Classifier vs Logistic Regression Classifier")

## What-if we want 95% NRR?

We can easily set up a new parameter simulation that allows us to replace the _TargetNRR_.

Here, we create a new measure called "NRR Simulation parameter" whose default value corresponds to the prior target NRR (= 0.9).

In [50]:
NRR_simulation = cube.create_parameter_simulation(
    "NRR Simulation",
    measures={"NRR Simulation parameter": 0.90},
    base_scenario_name="90% NRR",
)

We create a scenario corresponding to achieving 95% NRR.

In [51]:
NRR_simulation += ("95% NRR", 0.95)

Finally, we need to update all the measures involving the former "TargetNRR" measure in their calculation, so that we can also simulate them using our scenarios.

So, we replace "TargetNRR" by the "NRR Simulation parameter" in their calculation.

In [52]:
gap_to_target_nrr = m["NRR Simulation parameter"] - m["After Campaign NRR"]

To achieve the target NRR, we compute the maximum loss possible.

In [53]:
m["Expected RR Loss"] = m["MRR Initial"] - (
    m["NRR Simulation parameter"] * m["MRR Initial"]
)
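
To get a feel for what the parameter simulation changes, the sketch below recomputes the number of customers to retain for both target values in plain Python. It does not use the atoti API: the helper `customers_to_retain` and all the figures are hypothetical, and the cube performs the equivalent recomputation for each scenario.

```python
import math


def customers_to_retain(mrr_initial, predicted_rr_loss, mean_charge, target_nrr):
    """Number of customers to retain so that the target NRR is reached."""
    expected_rr_loss = mrr_initial - target_nrr * mrr_initial
    overflow = predicted_rr_loss - expected_rr_loss
    return math.ceil(overflow / mean_charge)


# Scenario name -> target NRR, mirroring the two scenarios of the simulation
scenarios = {"90% NRR": 0.90, "95% NRR": 0.95}

results = {
    name: customers_to_retain(
        mrr_initial=100_000.0,
        predicted_rr_loss=25_030.0,
        mean_charge=75.0,
        target_nrr=target,
    )
    for name, target in scenarios.items()
}
print(results)
```

A higher target leaves less room for loss, so more customers have to be retained or replaced.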

We see that the expense to achieve 95% NRR is close to 1.5 times the expense to achieve 90% NRR.  
All the predictions have a gap to the target NRR and hence require recruiting new customers to replace those who have churned.  
This consequently results in higher expenses.

In [54]:
session.visualize()

## What-if New Customer budget is twice the expected?

We can easily create a new parameter simulation that allows us to scale the _New Customer budget_.

Here, we create a simulation using the measure "New Customer budget" and set its default value to correspond to the previous "New Customer budget" measure (= 500).

In [55]:
marketing_budget_simulation = cube.create_parameter_simulation(
    "New Customer budget Simulation",
    measures={"New Customer budget": 500},
    base_scenario_name="5 x Retention",
)

We can create a scenario corresponding to a "New Customer budget" of 1000, i.e. 10 times our retention budget.

In [56]:
marketing_budget_simulation += ("10 x Retention", 1000)

We see the Final NRR and Actual Expense being recomputed. This increase in _New Customer budget_ does not impact the _SVM Classifier_ model as its NRR exceeds 90% if the retention rate is 100%.  

In [57]:
session.visualize("Budget adjustment against target NRR simulation")

<div style="text-align: center;" ><a href="https://www.atoti.io/?utm_source=gallery&utm_content=drug-efficacy" target="_blank" rel="noopener noreferrer"><img src="https://data.atoti.io/notebooks/banners/discover-try.png" alt="Try atoti"></a></div>