## Predicting Customer Churn in Telecom

In [1]:
import warnings
warnings.simplefilter(action="ignore", category=FutureWarning)

import os
import urllib.request
import zipfile
from tempfile import NamedTemporaryFile as tmpfile
import seaborn as sns
import matplotlib as mpl
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pyrsm as rsm
import statsmodels.formula.api as smf
from sklearn import preprocessing
from statsmodels.genmod.families import Binomial
from statsmodels.genmod.families.links import logit
from sklearn.inspection import permutation_importance
from sklearn.inspection import plot_partial_dependence
from sklearn.neural_network import MLPClassifier

# increase plot resolution
# mpl.rcParams["figure.dpi"] = 150

In [3]:
## load the data - this dataset must NOT be changed
s_mobile = pd.read_pickle("data/s_mobile.pkl")
s_mobile["churn_yes"] = rsm.ifelse(s_mobile["churn"] == "yes", 1, 0)

In [4]:
#s_mobile.head()

If you want access to the full 1M row dataset, use the code below to download and use the data.

The downside of using the dataset with 1M rows is, of course, that estimation time will increase substantially. I do NOT recommend using this dataset to select your final model or to tune hyperparameters. You can, however, use this larger dataset to re-estimate your chosen model and generate profit estimates for the representative sample.

In [5]:
# show dataset description
rsm.describe(s_mobile)

## S-mobile

Dataset used to investigate opportunities to decrease customer churn at S-mobile. The sample consists of three parts:

1. A training sample with 27,300 observations and a 50% churn rate ("training == 1")
2. A test sample with 11,700 observations and a 50% churn rate ("training == 0")
3. A representative sample with 30,000 observations and a churn rate of 2%, i.e., the actual monthly churn rate for S-mobile ("is.na(training)" or "representative == 1")

## Variables

* customer: Customer ID
* churn: Did consumer churn in the last 30 days? (yes or no)
* changer: % change in revenue over the most recent 4 month period
* changem: % change in minutes of use over the most recent 4 month period
* revenue: Mean monthly revenue in SGD
* mou: Mean monthly minutes of use
* overage: Mean monthly overage minutes
* roam: Mean number of roaming calls
* conference: Mean number of conference calls
* months: # of months the customer has had service with S-Mobile
* uniqsubs: Number of individuals listed on the customer account
* custcare: Mean number of calls to customer care 
* retcalls: Number of calls by the customer to the retention team
* dropvce: Mean number of dropped voice calls 
* eqpdays: Number of days customer has owned current handset
* refurb: Handset is refurbished (no or yes)
* smartphone: Handset is a smartphone (no or yes)
* creditr: High credit rating as opposed to medium or low (no or yes)
* mcycle: Subscriber owns a motorcycle (no or yes)
* car: Subscriber owns a car (no or yes)
* travel: Subscriber has traveled internationally (no or yes)
* region: Regions delineated by the 5 Community Development Council Districts (e.g., CS is Central Singapore)
* occupation: Categorical variable with 4 occupation levels (professional, student, retired, or other)
* training: 1 for training sample, 0 for test sample, NA for representative sample
* representative: 1 for representative sample, 0 for training and test sample


In [6]:
# select variables to standardize
to_std = s_mobile.loc[:, "changer":"eqpdays"].columns

# scale numeric variables by (x - mean(x)) / sd(x)
s_mobile_std = s_mobile.copy()
s_mobile_std[to_std] = rsm.scale_df(
    s_mobile[to_std], sf=1, train=s_mobile.training == 1
)
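
The comment in the cell above describes the scaling as `(x - mean(x)) / sd(x)`, with the statistics taken from the training rows only. The following is a minimal pandas sketch of that idea, not the `pyrsm` implementation: the helper name `scale_with_train_stats`, the toy data, and the choice of `ddof=0` are all assumptions made for illustration.

```python
import pandas as pd


def scale_with_train_stats(df, cols, train_mask, sf=1):
    """Hypothetical helper: standardize `cols` with training-row mean and sd."""
    train = df.loc[train_mask, cols]
    mean = train.mean()
    sd = train.std(ddof=0)  # assumption: population sd
    out = df.copy()
    # every row (training or not) is scaled with the training statistics
    out[cols] = (df[cols] - mean) / (sf * sd)
    return out


# toy data: three training rows and one held-out row
toy = pd.DataFrame(
    {"mou": [100.0, 200.0, 300.0, 1000.0], "training": [1, 1, 1, 0]}
)
toy_std = scale_with_train_stats(toy, ["mou"], toy["training"] == 1)
print(toy_std)
```

Using only the training rows for the mean and standard deviation keeps information from the test and representative samples out of the preprocessing step.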

In [7]:
s_mobile_std['cweight'] = rsm.ifelse(s_mobile.churn == 'yes', 2, 98)
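
The weights of 2 (churners) and 98 (non-churners) appear to be chosen so that the balanced training sample, which has a 50% churn rate, is re-weighted to the 2% monthly churn rate of the representative sample. The short check below illustrates that arithmetic; the variable names are introduced here for illustration only.

```python
# churn share in the training sample, per the dataset description
train_churn_rate = 0.50
w_churn, w_stay = 2, 98  # weights assigned in the cell above

weighted_churners = train_churn_rate * w_churn
weighted_stayers = (1 - train_churn_rate) * w_stay

# churn rate implied by the weighted training sample
weighted_churn_rate = weighted_churners / (weighted_churners + weighted_stayers)
print(round(weighted_churn_rate, 4))
```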

In [8]:
s_mobile_std.head()

Unnamed: 0,customer,churn,changer,changem,revenue,mou,overage,roam,conference,months,...,highcreditr,mcycle,car,travel,region,occupation,training,representative,churn_yes,cweight
0,U86940794,yes,3.46518,1.267979,0.110704,0.161153,1.078966,-0.165043,-0.24309,-1.123607,...,no,no,no,no,CS,other,0.0,0,1,2
1,U56788559,no,-0.299114,-0.235357,-0.074232,-0.70287,-0.956935,-0.165043,-0.24309,-1.123607,...,yes,no,no,no,SE,other,0.0,0,0,98
2,U47928407,no,-0.299114,-0.466639,-0.420987,1.223101,0.42921,-0.165043,0.679523,-1.019391,...,no,yes,no,yes,NW,professional,,1,0,98
3,U75794640,no,-0.299114,-0.447366,-0.513455,0.085028,-0.956935,-0.165043,-0.24309,1.898662,...,yes,yes,no,no,NW,retired,1.0,0,0,98
4,U41010771,no,-0.368184,-0.447366,0.989151,2.612388,0.660234,1.075619,-0.24309,0.126987,...,no,yes,yes,no,SW,other,,1,0,98


### Splitting the data

In [9]:
# Train Sample
s_mobile_train = s_mobile_std[s_mobile_std['training'] == 1]
s_mobile_train.head()

Unnamed: 0,customer,churn,changer,changem,revenue,mou,overage,roam,conference,months,...,highcreditr,mcycle,car,travel,region,occupation,training,representative,churn_yes,cweight
3,U75794640,no,-0.299114,-0.447366,-0.513455,0.085028,-0.956935,-0.165043,-0.24309,1.898662,...,yes,yes,no,no,NW,retired,1.0,0,0,98
8,U30531925,yes,0.080768,0.419943,0.341875,1.004241,0.891259,-0.165043,-0.24309,0.231203,...,no,no,no,no,NW,other,1.0,0,1,2
9,U34592616,yes,-0.299114,-0.505186,-1.091381,-0.976921,-0.956935,-0.165043,-0.24309,3.253473,...,no,no,no,no,CS,other,1.0,0,1,2
12,U88670364,yes,-0.26458,-0.389545,-0.074232,-0.160477,0.111552,-0.165043,-0.24309,-0.810958,...,no,no,no,no,CS,other,1.0,0,1,2
15,U51351177,yes,-0.471788,-0.543733,-0.975796,-0.778995,-0.956935,0.248511,-0.24309,-0.394094,...,no,no,no,no,SW,other,1.0,0,1,2


In [1]:
rsm.distr_plot(s_mobile_train.loc[:, "churn":])

NameError: name 'rsm' is not defined

In [11]:
# Test Sample 
s_mobile_test = s_mobile_std[s_mobile_std['training'] == 0]
s_mobile_test.head()

Unnamed: 0,customer,churn,changer,changem,revenue,mou,overage,roam,conference,months,...,highcreditr,mcycle,car,travel,region,occupation,training,representative,churn_yes,cweight
0,U86940794,yes,3.46518,1.267979,0.110704,0.161153,1.078966,-0.165043,-0.24309,-1.123607,...,no,no,no,no,CS,other,0.0,0,1,2
1,U56788559,no,-0.299114,-0.235357,-0.074232,-0.70287,-0.956935,-0.165043,-0.24309,-1.123607,...,yes,no,no,no,SE,other,0.0,0,0,98
6,U83335656,yes,0.011699,0.285029,-0.074232,2.174668,0.299259,-0.165043,-0.24309,-0.289878,...,no,no,no,no,CS,professional,0.0,0,1,2
13,U40356414,yes,-0.368184,-0.042621,-0.259168,-0.135736,0.71799,-0.165043,0.679523,-0.602526,...,no,no,no,no,CS,other,0.0,0,1,2
21,U19986413,no,-0.299114,-0.293177,-0.351636,-0.107189,-0.956935,0.524214,-0.24309,1.377581,...,no,no,no,no,NE,other,0.0,0,0,98


In [12]:
# Representative Sample
s_mobile_representative = s_mobile_std[s_mobile_std['representative'] == 1]
s_mobile_representative.head()

Unnamed: 0,customer,churn,changer,changem,revenue,mou,overage,roam,conference,months,...,highcreditr,mcycle,car,travel,region,occupation,training,representative,churn_yes,cweight
2,U47928407,no,-0.299114,-0.466639,-0.420987,1.223101,0.42921,-0.165043,0.679523,-1.019391,...,no,yes,no,yes,NW,professional,,1,0,98
4,U41010771,no,-0.368184,-0.447366,0.989151,2.612388,0.660234,1.075619,-0.24309,0.126987,...,no,yes,yes,no,SW,other,,1,0,98
5,U18263157,no,-0.368184,-0.428092,-0.282285,-0.021548,0.097113,0.11066,-0.24309,-0.810958,...,no,no,no,no,SW,other,,1,0,98
7,U18798421,no,3.016227,0.439217,0.06447,0.498007,0.963454,-0.165043,0.679523,-1.227823,...,yes,yes,yes,no,NW,other,,1,0,98
10,U30117312,no,-0.26458,2.135288,-0.629041,-0.944567,-0.956935,-0.165043,-0.24309,-0.602526,...,yes,no,no,no,NW,other,,1,0,98


In [13]:
s_mobile_std.columns

Index(['customer', 'churn', 'changer', 'changem', 'revenue', 'mou', 'overage',
       'roam', 'conference', 'months', 'uniqsubs', 'custcare', 'retcalls',
       'dropvce', 'eqpdays', 'refurb', 'smartphone', 'highcreditr', 'mcycle',
       'car', 'travel', 'region', 'occupation', 'training', 'representative',
       'churn_yes', 'cweight'],
      dtype='object')

### Logistic Regression With Weights

In [14]:
lr2 = smf.glm(
    formula="churn_yes ~ changer + changem + mou + \
    overage + months + uniqsubs +  \
    retcalls + dropvce + eqpdays + refurb + \
    highcreditr + mcycle + \
    travel + region + occupation + \
    churn:changer + churn:changem  + occupation:mou + \
    occupation:months + months:retcalls + \
    retcalls:churn + months:churn + churn:months + \
    churn:overage + overage:region",    family=Binomial(link=logit()),
    freq_weights=s_mobile_train.loc[s_mobile_train.training == 1, "cweight"],
    data= s_mobile_train.query("training == 1")
).fit(cov_type="HC1")
lr2.summary()
rsm.model_fit(lr2)
#rsm.or_ci(lr2)


Pseudo R-squared (McFadden): 0.062
Pseudo R-squared (McFadden adjusted): 0.061
Area under the RO Curve (AUC): 0.632
Log-likelihood: -125589.584, AIC: 251253.167, BIC: 251701.854
Chi-squared: 254747351.66 df(36), p.value < 0.001 
Nr obs: 27,300



In [15]:
(
    rsm.or_ci(lr2, importance=True, data=s_mobile_std[s_mobile_std.training==1])
    .sort_values("importance", ascending=False)
    .reset_index(drop=True)
)

| | index | OR | OR% | 2.5% | 97.5% | p.values | | dummy | importance | wmean | wstd | min | max |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | occupation[T.retired] | 0.187 | -81.3% | 0.173 | 0.202 | < .001 | *** | True | 5.342 | 0.137 | 0.344 | 0.0 | 1.0 |
| 1 | overage | 3.607 | 260.7% | 3.107 | 4.187 | < .001 | *** | False | 3.607 | -0.067 | 0.981 | -0.957 | 3.129 |
| 2 | changem | 0.313 | -68.7% | 0.174 | 0.562 | < .001 | *** | False | 3.196 | 0.022 | 0.961 | -1.797 | 9.748 |
| 3 | churn[T.no]:changem | 3.052 | 205.2% | 1.698 | 5.485 | < .001 | *** | True | 3.052 | 0.023 | 0.95 | -1.797 | 9.748 |
| 4 | churn[T.no]:overage | 0.351 | -64.9% | 0.302 | 0.408 | < .001 | *** | True | 2.85 | -0.069 | 0.971 | -0.957 | 3.129 |
| 5 | highcreditr[T.yes] | 0.499 | -50.1% | 0.479 | 0.52 | < .001 | *** | True | 2.004 | 0.18 | 0.384 | 0.0 | 1.0 |
| 6 | occupation[T.student] | 1.845 | 84.5% | 1.769 | 1.925 | < .001 | *** | True | 1.845 | 0.055 | 0.227 | 0.0 | 1.0 |
| 7 | region[T.SW] | 0.639 | -36.1% | 0.616 | 0.662 | < .001 | *** | True | 1.565 | 0.202 | 0.401 | 0.0 | 1.0 |
| 8 | changer | 1.556 | 55.6% | 1.291 | 1.875 | < .001 | *** | False | 1.556 | -0.016 | 0.969 | -2.44 | 10.234 |
| 9 | retcalls | 1.549 | 54.9% | 1.457 | 1.647 | < .001 | *** | False | 1.549 | -0.06 | 0.831 | -0.191 | 17.812 |


In [16]:
s_mobile_std["pred_logit"] = rsm.predict_ci(fitted=lr2,df=s_mobile_std)['prediction']

In [17]:
s_mobile_std.head()

Unnamed: 0,customer,churn,changer,changem,revenue,mou,overage,roam,conference,months,...,mcycle,car,travel,region,occupation,training,representative,churn_yes,cweight,pred_logit
0,U86940794,yes,3.46518,1.267979,0.110704,0.161153,1.078966,-0.165043,-0.24309,-1.123607,...,no,no,no,CS,other,0.0,0,1,2,0.063865
1,U56788559,no,-0.299114,-0.235357,-0.074232,-0.70287,-0.956935,-0.165043,-0.24309,-1.123607,...,no,no,no,SE,other,0.0,0,0,98,0.008829
2,U47928407,no,-0.299114,-0.466639,-0.420987,1.223101,0.42921,-0.165043,0.679523,-1.019391,...,yes,no,yes,NW,professional,,1,0,98,0.021248
3,U75794640,no,-0.299114,-0.447366,-0.513455,0.085028,-0.956935,-0.165043,-0.24309,1.898662,...,yes,no,no,NW,retired,1.0,0,0,98,0.000795
4,U41010771,no,-0.368184,-0.447366,0.989151,2.612388,0.660234,1.075619,-0.24309,0.126987,...,yes,yes,no,SW,other,,1,0,98,0.008714


In [18]:
s_mobile_std[s_mobile_std.representative == 1]['pred_logit'].describe()

count    30000.000000
mean         0.019792
std          0.022403
min          0.000006
25%          0.009868
50%          0.017279
75%          0.025313
max          0.953282
Name: pred_logit, dtype: float64

In [19]:
s_mobile_std[s_mobile_std.representative == 1]['occupation'].value_counts()

other           18923
professional     5208
retired          4215
student          1654
Name: occupation, dtype: int64

In [20]:
odds_feature = rsm.or_ci(lr2)
odds_feature = odds_feature.sort_values(by = 'OR%' , ascending=False)
odds_feature.head(10)

| | index | OR | OR% | 2.5% | 97.5% | p.values | |
|---|---|---|---|---|---|---|---|
| 10 | occupation[T.student] | 1.845 | 84.5% | 1.769 | 1.925 | < .001 | *** |
| 34 | dropvce | 1.064 | 6.4% | 1.049 | 1.079 | < .001 | *** |
| 12 | changer | 1.556 | 55.6% | 1.291 | 1.875 | < .001 | *** |
| 32 | retcalls | 1.549 | 54.9% | 1.457 | 1.647 | < .001 | *** |
| 9 | occupation[T.professional] | 1.451 | 45.1% | 1.408 | 1.495 | < .001 | *** |
| 35 | eqpdays | 1.388 | 38.8% | 1.368 | 1.409 | < .001 | *** |
| 1 | refurb[T.yes] | 1.325 | 32.5% | 1.281 | 1.37 | < .001 | *** |
| 28 | occupation[T.student]:months | 1.038 | 3.8% | 1.0 | 1.078 | 0.047 | * |
| 31 | uniqsubs | 1.033 | 3.3% | 1.021 | 1.045 | < .001 | *** |
| 20 | overage | 3.607 | 260.7% | 3.107 | 4.187 | < .001 | *** |
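
The rows above are not in order of magnitude (for example, `dropvce` at 6.4% sits above `changer` at 55.6%). A likely reason is that the `OR%` column holds strings, which pandas sorts character by character. The sketch below uses four rows copied from the table to illustrate the difference; sorting on the numeric `OR` column, or on a parsed helper column (named `OR_pct` here as an assumption), gives the ranking by size.

```python
import pandas as pd

# four rows copied from the odds-ratio table above
odds = pd.DataFrame(
    {
        "index": ["occupation[T.student]", "dropvce", "changer", "overage"],
        "OR": [1.845, 1.064, 1.556, 3.607],
        "OR%": ["84.5%", "6.4%", "55.6%", "260.7%"],
    }
)

# string sort: compares "8" > "6" > "5" > "2" on the first character
by_text = odds.sort_values("OR%", ascending=False)["index"].tolist()

# numeric sort on the odds ratio itself
by_value = odds.sort_values("OR", ascending=False)["index"].tolist()

# alternative: parse the percentage strings into floats first
odds["OR_pct"] = odds["OR%"].str.rstrip("%").astype(float)
by_pct = odds.sort_values("OR_pct", ascending=False)["index"].tolist()

print(by_text)
print(by_value)
```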


<div style="text-align: justify">Using our logit model, we predicted each customer's likelihood of churning. We used the odds ratios reported by rsm.or_ci to understand the relative importance of the features in the model.</div>

<div style="text-align: justify">As we can observe in the above table, the features with highest relative importance are:</div>


1) occupation[T.student]

2) occupation[T.professional]

3) eqpdays

4) overage

5) refurb[T.yes]

6) retcalls

**Occupation**

**1) Student**
<div style="text-align: justify">An odds ratio of 1.845 means that the odds of churn for a student are 1.845 times the odds for a customer in the baseline occupation category ("other"), holding the other variables constant.</div>


**2) Professional**
<div style="text-align: justify">An odds ratio of 1.451 means that the odds of churn for a professional are 1.451 times the odds for a customer in the baseline occupation category ("other"), holding the other variables constant.</div>


**eqpdays**
<div style="text-align: justify">The feature 'eqpdays' is the number of days the customer has owned the current handset. Because the numeric variables were standardized, the odds ratio of 1.388 means that a one-standard-deviation increase in eqpdays multiplies the odds of churn by 1.388, holding the other variables constant. In other words, customers who have owned their current handset for longer are more likely to churn than those with a newer handset.</div>


**overage**
<div style="text-align: justify">'overage' is the mean number of monthly minutes a customer uses beyond the minutes allotted by their plan. A customer is more likely to churn with higher overage, as they are charged separately for the extra minutes. In our model, a one-standard-deviation increase in overage multiplies the odds of churn by 3.607.</div>

**refurb[T.yes]**
<div style="text-align: justify">Customers with a refurbished handset have odds of churn 1.325 times those of customers whose handset is not refurbished.</div>

**retcalls**
<div style="text-align: justify">Customers who make more calls to the retention team are more likely to churn: a one-standard-deviation increase in retcalls multiplies the odds of churn by 1.549.</div>
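
An odds ratio multiplies the odds, not the probability, so it helps to convert back to probabilities when reading the numbers above. The sketch below assumes a baseline churn probability of 2% (the monthly churn rate of the representative sample) and applies the student odds ratio of 1.845. The helper `apply_odds_ratio` is introduced here for illustration.

```python
def apply_odds_ratio(base_prob, odds_ratio):
    """Return the probability after multiplying the baseline odds by odds_ratio."""
    odds = base_prob / (1 - base_prob)
    new_odds = odds * odds_ratio
    return new_odds / (1 + new_odds)


base = 0.02  # assumed baseline: monthly churn rate in the representative sample
student = apply_odds_ratio(base, 1.845)
print(f"baseline: {base:.2%}, student: {student:.2%}")
```

At low baseline probabilities the odds and the probability are close, so the churn probability rises by a bit less than a factor of 1.845.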

<div style="text-align: justify">1. Student discounts: Our first incentive to reduce churn at S-mobile is to offer discounted prices to students, targeting the customer segment variable “occupation”. A customer's occupation affects how much they are willing to pay for a monthly plan. Students tend to fall in the low-income group and are therefore more likely to churn. We plan to offer them a monthly discount of 15% to make the plan more affordable. We expect fewer calls from students to the retention team and longer service tenure, along with higher revenue.</div>
<br>

<div style="text-align: justify">2. Our team noticed that “eqpdays” is an important indicator of whether a current customer is likely to churn. Mobile subscribers are usually required to enroll in a two-year contract in exchange for a free handset with the plan. Most subscribers stay with the provider for the two-year period and are likely to churn as soon as it ends, because the benefits of switching to another provider that offers a new handset with better interfaces and services outweigh the switching cost. A second important variable, “refurb” (whether the handset is refurbished or new), points to nearly the same insight. A subscriber who receives a new handset as a promotion from the current provider is likely to stay until the end of the contract period. However, a customer who purchased a used or refurbished handset without any contractual obligation is more likely to switch to a new service provider. To tackle this problem, we suggest a micro-marketing strategy to retain customers who are likely to leave at the end of their contract period: we plan to offer them a new handset in exchange for another one- or two-year contract. We assume that they will remain with the company after the second handset is provided. This should deter customers from switching to a new carrier for the next contract cycle and hopefully beyond.</div>

<br>

<div style="text-align: justify">3. Our third plan of action is to target seniors, i.e., customers in the retired occupation group. We observed an extremely low churn rate among retired customers. Possible reasons include reluctance to switch to a new technology, or brand loyalty. These customers are extremely valuable to the company because they generate consistent profit. It is therefore important to retain them and to acquire more customers like them. To do so, our team suggests offering family packs to retired customers at a 20% discount. A plan can cover anywhere from 2 to 6 people. This encourages the entire family to opt for the same mobile carrier. The customer usage variable “uniqsubs” (number of individuals listed on the account) will increase, bringing in more subscribers and hence more revenue. Since the retired member of a household is less likely to churn, they are likely to make sure that the rest of the family also stays with the carrier.</div>

<br>

<div style="text-align: justify">4. Another tactic to reduce the churn rate is to manage the customer usage variable “overage”. Overage charges let customers pay for use beyond their monthly plan allowance. Heavy-overage users might benefit from upgrading their plan at a marginal cost increase, and the provider benefits by converting uncertain monthly overage charges into predictable cash flow. We plan to offer an incentive program to overage customers: an upgrade to a premium plan with 2 months of free subscription as a gesture of gratitude. This should benefit both the customer and the provider and, in turn, reduce the churn rate.</div>


#### Incentive strategy 1

In [21]:
#industry standard
breakeven = 0.25

In [22]:
s_mobile_std["mailto_logit"] = rsm.ifelse((s_mobile_std["pred_logit"] > breakeven)  & (s_mobile['occupation']== 'student') , 1, 0)

In [23]:
s_mobile_std[s_mobile_std.representative==1]["pred_logit"].mean() # mean predicted churn probability in the representative sample (about 2%)

0.019791736575096943

In [24]:
s_mobile_std[s_mobile_std.representative==1]["mailto_logit"].mean() # share of the representative sample targeted (students above breakeven)

0.0003333333333333333

In [25]:
s_mobile_std[s_mobile_std.representative == 1]['pred_logit'].describe()

count    30000.000000
mean         0.019792
std          0.022403
min          0.000006
25%          0.009868
50%          0.017279
75%          0.025313
max          0.953282
Name: pred_logit, dtype: float64

#### running model again

In [26]:
s_mobile_std['months_stu'] = s_mobile_std['months'] * 1.20
s_mobile_std['retcalls_stu'] = s_mobile_std['retcalls'] * 0.70

In [27]:
lr2 = smf.glm(
    formula="churn_yes ~ changer + changem + mou + \
    overage + months_stu + uniqsubs +  \
    retcalls_stu + dropvce + eqpdays + refurb + \
    highcreditr + mcycle + \
    travel + region + occupation + \
    churn:changer + churn:changem  + occupation:mou + \
    occupation:months_stu + months_stu:retcalls_stu + \
    retcalls_stu:churn + months_stu:churn + churn:months + \
    churn:overage + overage:region",    family=Binomial(link=logit()),
    freq_weights=s_mobile_std.loc[s_mobile_std.training == 1, "cweight"],
    data= s_mobile_std.query("training == 1")
).fit(cov_type="HC1")
lr2.summary()
rsm.model_fit(lr2)




Pseudo R-squared (McFadden): 0.062
Pseudo R-squared (McFadden adjusted): 0.061
Area under the RO Curve (AUC): 0.632
Log-likelihood: -125589.584, AIC: 251253.167, BIC: 251701.854
Chi-squared: 254744552.541 df(36), p.value < 0.001 
Nr obs: 27,300



In [28]:
s_mobile_std["pred_logit_stu"] = rsm.predict_ci(fitted=lr2,df=s_mobile_std)['prediction']



In [29]:
sum(s_mobile_std["pred_logit_stu"])==sum(s_mobile_std["pred_logit"])

False

In [30]:
s_mobile_std[s_mobile_std.representative == 1]['pred_logit_stu'].describe()

count    30000.000000
mean         0.019792
std          0.022403
min          0.000006
25%          0.009868
50%          0.017279
75%          0.025313
max          0.953282
Name: pred_logit_stu, dtype: float64

In [31]:
s_mobile_std["mailto_logit_stu"] = rsm.ifelse((s_mobile_std["pred_logit_stu"] > breakeven)  & (s_mobile['occupation']== 'student') , 1, 0)

In [32]:
s_mobile_std[s_mobile_std.representative==1]["pred_logit_stu"].mean() # mean predicted churn probability in the representative sample (about 2%)

0.019791818030628912

In [33]:
s_mobile_std[s_mobile_std.representative==1]["mailto_logit_stu"].mean() # share of the representative sample targeted (students above breakeven)

0.0003333333333333333
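
The summary statistics for `pred_logit_stu` match those for `pred_logit` to the printed precision. One plausible explanation is that multiplying a predictor by a constant and then refitting leaves the fitted probabilities unchanged, because the coefficient simply rescales to compensate. A common alternative for a what-if analysis is to keep the originally fitted coefficients and change only the input data. The sketch below illustrates both points on synthetic data with a small NumPy Newton-Raphson fit; the data, the helper functions, and the coefficients are all assumptions, not the notebook's model.

```python
import numpy as np


def fit_logit(X, y, n_iter=25):
    """Unregularized logistic regression fitted by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-X @ beta))
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1 - p))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def predict(X, beta):
    return 1 / (1 + np.exp(-X @ beta))


# synthetic data: churn becomes less likely as (standardized) months increase
rng = np.random.default_rng(0)
n = 2000
months = rng.normal(size=n)
true_prob = 1 / (1 + np.exp(-(-1.0 - 0.8 * months)))
y = (rng.random(n) < true_prob).astype(float)

X = np.column_stack([np.ones(n), months])
X_scaled = np.column_stack([np.ones(n), months * 1.20])

beta = fit_logit(X, y)
beta_refit = fit_logit(X_scaled, y)

# refit on the rescaled column: slope shrinks by 1/1.2, predictions do not move
same_after_refit = np.allclose(predict(X, beta), predict(X_scaled, beta_refit))

# what-if: original coefficients applied to the modified inputs
p_base = predict(X, beta)
p_whatif = predict(X_scaled, beta)
print(same_after_refit, p_base.mean(), p_whatif.mean())
```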

#### Incentive Strategy 2

In [34]:
s_mobile_std['eqpdays_phn'] = s_mobile_std['eqpdays'] * 20
s_mobile_std['months_phn'] = s_mobile_std['months'] + 2
s_mobile_std['retcalls_phn'] = s_mobile_std['retcalls'] * 0.70

In [35]:
lr2 = smf.glm(
    formula="churn_yes ~ changer + changem + mou + \
    overage + months_stu + uniqsubs +  \
    retcalls_stu + dropvce + eqpdays + refurb + \
    highcreditr + mcycle + \
    travel + region + occupation + \
    churn:changer + churn:changem  + occupation:mou + \
    occupation:months_stu + months_stu:retcalls_stu + \
    retcalls_stu:churn + months_stu:churn + churn:months + \
    churn:overage + overage:region",    family=Binomial(link=logit()),
    freq_weights=s_mobile_std.loc[s_mobile_std.training == 1, "cweight"],
    data= s_mobile_std.query("training == 1")
).fit(cov_type="HC1")
lr2.summary()
rsm.model_fit(lr2)



Pseudo R-squared (McFadden): 0.062
Pseudo R-squared (McFadden adjusted): 0.061
Area under the RO Curve (AUC): 0.632
Log-likelihood: -125589.584, AIC: 251253.167, BIC: 251701.854
Chi-squared: 254744552.541 df(36), p.value < 0.001 
Nr obs: 27,300



In [36]:
sum(s_mobile_std['churn_yes'])

20100

In [37]:
s_mobile_std["pred_logit_phn"] = rsm.predict_ci(fitted=lr2,df=s_mobile_std)['prediction']



In [38]:
sum(s_mobile_std["pred_logit_phn"])==sum(s_mobile_std["pred_logit"])

False

In [39]:
s_mobile_std["mailto_logit_phn"] = rsm.ifelse((s_mobile_std["pred_logit_phn"] > breakeven), 1, 0)

In [40]:
sum(s_mobile_std[s_mobile_std.representative==1]["pred_logit_phn"]) # expected number of churners in the representative sample

593.7545409188674

In [41]:
s_mobile_std["mailto_logit2"] = rsm.ifelse((s_mobile_std["pred_logit"] > breakeven), 1, 0)

In [42]:
sum(s_mobile_std[s_mobile_std.representative==1]["pred_logit"]) # expected number of churners in the representative sample

593.7520972529082

In [43]:
s_mobile_std[s_mobile_std.representative == 1]['pred_logit_phn'].describe()

count    30000.000000
mean         0.019792
std          0.022403
min          0.000006
25%          0.009868
50%          0.017279
75%          0.025313
max          0.953282
Name: pred_logit_phn, dtype: float64

<div style="text-align: justify">1) Student discount: Within the representative sample, we target customers whose occupation is student. The targeting criteria were based on the feature importance from our logit model. The customer characteristics ‘retcalls’ and ‘months’ were modified to reflect the expected impact of the monthly discount on the student segment.</div>

<br>

<div style="text-align: justify">2) Promotional smartphone: Within the representative sample, we target all customers, focusing on the number of days the customer has owned the current handset (eqpdays). The model feature most directly related to this incentive is ‘refurb’, which indicates whether the current handset is refurbished.</div>

### Incentive 1 Economics

In [54]:
churn = pd.DataFrame(
    {
        "student": [0.019791,0.0099, 0.012, 0.013, 0.015],
        "student_incentive": [0.0045,0.00495, 0.005445, 0.0059895, 0.00658845],

    }
)

empty_array = np.zeros(shape = (60,2))
churn8 = pd.DataFrame(empty_array, columns = ["student", "student_incentive"])

a_list = list(range(len(churn)))
multiplied_list = [element * 12 for element in a_list]
multiplied_list

churn8.loc[multiplied_list] = churn.values

# list your assumptions here
monthly_revenue = 30
annual_growth = 0.03
annual_discount_rate = 0.1
monthly_discount_rate = (1+annual_discount_rate)**(1/12)-1
cost_service = 0.15*monthly_revenue
marketing_cost = 0.05*monthly_revenue
nr_years = 5

retention_s = np.array(1-churn8.iloc[:, 0])
retention_s = retention_s.cumprod()

array0 = np.ones(60,)
for i in range(len(array0)):
    if i in range(0,12):
        array0[i] = 30
        continue
    if i%12 == 0: 
        array0[i] = array0[i-1] * (1 + annual_growth)
    else:
        array0[i] = array0[i-1]

revenues_s = array0
service_s = 0.15*revenues_s
marketing_s = 0.05*revenues_s
profit_s = revenues_s - service_s - marketing_s
expected_profit_s = retention_s * profit_s

pv_expected_profit_s = np.ones(60,)
for i in range(len(expected_profit_s)):
    pv_expected_profit_s[i] = expected_profit_s[i]/(1+monthly_discount_rate)**(i+1)


pv_expected_profit_s = pv_expected_profit_s # present value of expected profits
clv_s = np.cumsum(pv_expected_profit_s) 
clv_s

array([  23.33890827,   46.49318111,   69.4642792 ,   92.25365162,
        114.86273604,  137.2929587 ,  159.54573461,  181.62246755,
        203.5245502 ,  225.25336425,  246.81028041,  268.19665859,
        289.83401193,  311.30019094,  332.59654979,  353.72443194,
        374.68517023,  395.48008692,  416.11049386,  436.57769248,
        456.88297393,  477.02761916,  497.01289895,  516.84007407,
        536.85737975,  556.71632739,  576.41816978,  595.96414977,
        615.35550041,  634.59344498,  653.67919709,  672.61396073,
        691.39893038,  710.03529107,  728.52421847,  746.86687891,
        765.36669084,  783.72014972,  801.92841334,  819.99263036,
        837.91394034,  855.69347382,  873.33235241,  890.83168883,
        908.19258701,  925.41614215,  942.50344076,  959.45556079,
        976.5182737 ,  993.44600252, 1010.23981512, 1026.90077091,
       1043.42992095, 1059.82830794, 1076.09696637, 1092.23692252,
       1108.24919457, 1124.13479262, 1139.89471881, 1155.52996
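
The month-by-month loops in the cell above can also be written in vectorized form. The sketch below is one possible NumPy rewrite of the no-incentive case under the same stated assumptions (monthly revenue of 30, 3% annual growth, 10% annual discount rate, 15% service cost, 5% marketing cost, and churn applied in the first month of each year). The variable names are chosen here for illustration.

```python
import numpy as np

monthly_revenue = 30
annual_growth = 0.03
annual_discount_rate = 0.1
monthly_discount_rate = (1 + annual_discount_rate) ** (1 / 12) - 1
cost_share = 0.15 + 0.05  # service plus marketing, as a share of revenue

months = np.arange(60)

# churn is applied only in the first month of each year, as in the cell above
churn = np.zeros(60)
churn[::12] = [0.019791, 0.0099, 0.012, 0.013, 0.015]
retention = np.cumprod(1 - churn)

# revenue steps up by annual_growth at the start of each new year
revenue = monthly_revenue * (1 + annual_growth) ** (months // 12)
profit = revenue * (1 - cost_share)

# discount each month's expected profit back to the present and accumulate
pv_expected_profit = retention * profit / (1 + monthly_discount_rate) ** (months + 1)
clv = np.cumsum(pv_expected_profit)
print(clv[:3])
```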

In [55]:
retention_si = np.array(1-churn8.iloc[:, 1])
retention_si= retention_si.cumprod()

array1 = np.ones(60,)


for i in range(len(array1)):
    if i in range(0,12):
        array1[i] = 25.5
        continue
    if i%12 == 0: 
        array1[i] = array1[i-1] * (1 + annual_growth)
    else:
        array1[i] = array1[i-1]
revenues_si = array1
service_si= 0.15*revenues_si
marketing_si = 0.05*revenues_si
profit_si = revenues_si - service_si - marketing_si
expected_profit_si = retention_si * profit_si

pv_expected_profit_si = np.ones(60,)
for i in range(len(expected_profit_si)):
    pv_expected_profit_si[i] = expected_profit_si[i]/(1+monthly_discount_rate)**(i+1)


pv_expected_profit_si = pv_expected_profit_si # present value of expected profits
clv_si = np.cumsum(pv_expected_profit_si)
clv_si

array([  20.14754068,   40.13569303,   59.96571797,   79.63886647,
         99.15637957,  118.51948853,  137.72941483,  156.78737032,
        175.69455725,  194.45216836,  213.06138694,  231.52338694,
        250.29542754,  268.91896147,  287.39516357,  305.72519939,
        323.91022526,  341.95138836,  359.8498268 ,  377.60666969,
        395.22303718,  412.70004059,  430.03878245,  447.24035653,
        464.72210349,  482.06555135,  499.27179422,  516.34191753,
        533.27699812,  550.07810434,  566.74629606,  583.28262476,
        599.68813364,  615.96385761,  632.1108234 ,  648.13004964,
        664.40127803,  680.54378381,  696.55858532,  712.44669283,
        728.20910862,  743.84682705,  759.3608346 ,  774.75210996,
        790.02162408,  805.1703402 ,  820.19921397,  835.10919347,
        850.24459984,  865.26026915,  880.15714865,  894.9361781 ,
        909.59828981,  924.14440872,  938.57545247,  952.89233141,
        967.09594871,  981.18720039,  995.16697539, 1009.03615

The five-year CLV under the student discount incentive (about 1,009 USD) is approximately 146 USD lower than the CLV with no incentive (about 1,156 USD).

### Incentive 2 Economics

In [48]:
churn = pd.DataFrame(
    {
        "no_handset": [0.019791,0.0099, 0.012, 0.013, 0.015],
        "handset": [0,0, 0.019791, 0.0099, 0.012],

    }
)

empty_array = np.zeros(shape = (60,2))
churn8 = pd.DataFrame(empty_array, columns = ["no_handset", "handset"])

a_list = list(range(len(churn)))
multiplied_list = [element * 12 for element in a_list]
multiplied_list

churn8.loc[multiplied_list] = churn.values

# list your assumptions here
monthly_revenue = 30
annual_growth = 0.03
annual_discount_rate = 0.1
monthly_discount_rate = (1+annual_discount_rate)**(1/12)-1
cost_service = 0.15*monthly_revenue
marketing_cost = 0.05*monthly_revenue
cost_of_phone = 500
nr_years = 5

retention_nh = np.array(1-churn8.iloc[:, 0])
retention_nh = retention_nh.cumprod()

array0 = np.ones(60,)
for i in range(len(array0)):
    if i in range(0,12):
        array0[i] = 30
        continue
    if i%12 == 0: 
        array0[i] = array0[i-1] * (1 + annual_growth)
    else:
        array0[i] = array0[i-1]

revenues_nh = array0
service_nh = 0.15*revenues_nh
marketing_nh = 0.05*revenues_nh
profit_nh = revenues_nh - service_nh - marketing_nh 
expected_profit_nh = retention_nh * profit_nh

pv_expected_profit_nh = np.ones(60,)
for i in range(len(expected_profit_nh)):
    pv_expected_profit_nh[i] = expected_profit_nh[i]/(1+monthly_discount_rate)**(i+1)


pv_expected_profit_nh = pv_expected_profit_nh# present value of expected profits
clv_nh = np.cumsum(pv_expected_profit_nh) 
clv_nh

array([  23.33890827,   46.49318111,   69.4642792 ,   92.25365162,
        114.86273604,  137.2929587 ,  159.54573461,  181.62246755,
        203.5245502 ,  225.25336425,  246.81028041,  268.19665859,
        289.83401193,  311.30019094,  332.59654979,  353.72443194,
        374.68517023,  395.48008692,  416.11049386,  436.57769248,
        456.88297393,  477.02761916,  497.01289895,  516.84007407,
        536.85737975,  556.71632739,  576.41816978,  595.96414977,
        615.35550041,  634.59344498,  653.67919709,  672.61396073,
        691.39893038,  710.03529107,  728.52421847,  746.86687891,
        765.36669084,  783.72014972,  801.92841334,  819.99263036,
        837.91394034,  855.69347382,  873.33235241,  890.83168883,
        908.19258701,  925.41614215,  942.50344076,  959.45556079,
        976.5182737 ,  993.44600252, 1010.23981512, 1026.90077091,
       1043.42992095, 1059.82830794, 1076.09696637, 1092.23692252,
       1108.24919457, 1124.13479262, 1139.89471881, 1155.52996

In [56]:
retention_h = np.array(1-churn8.iloc[:, 1])
retention_h= retention_h.cumprod()

array1 = np.ones(60,)

for i in range(len(array1)):
    if i in range(0,12):
        array1[i] = 30
        continue
    if i%12 == 0: 
        array1[i] = array1[i-1] * (1 + annual_growth)
    else:
        array1[i] = array1[i-1]

revenues_h = array1
service_h = 0.15*revenues_h
marketing_h = 0.05*revenues_h
profit_h = revenues_h - service_h - marketing_h 
profit_h[0] = profit_h[0] - cost_of_phone  # one-time handset cost in the first month

expected_profit_h = retention_h * profit_h

pv_expected_profit_h = np.ones(60,)
for i in range(len(expected_profit_h)):
    pv_expected_profit_h[i] = expected_profit_h[i]/(1+monthly_discount_rate)**(i+1)


pv_expected_profit_h = pv_expected_profit_h# present value of expected profits
clv_h = np.cumsum(pv_expected_profit_h)
clv_h

array([-470.10928256, -446.59380921, -423.2643681 , -400.11948752,
       -377.1577074 , -354.37757922, -331.77766592, -309.35654181,
       -287.11279248, -265.04501471, -243.15181638, -221.43181638,
       -199.34706273, -177.43702281, -155.70031446, -134.13556643,
       -112.74141835,  -91.51652058,  -70.45953418,  -49.56913079,
        -28.84399256,   -8.28281208,   12.11570775,   32.35285373,
         52.91961485,   73.32367116,   93.56630983,  113.64880784,
        133.57243207,  153.33843939,  172.9480767 ,  192.40258106,
        211.70317974,  230.85109029,  249.84752064,  268.69366915,
        287.83629079,  306.82747406,  325.66841701,  344.3603082 ,
        362.90432678,  381.30164257,  399.55341617,  417.66079894,
        435.62493319,  453.44695216,  471.12798013,  488.66913249,
        506.47549292,  524.14098622,  541.66672682,  559.05382028,
        576.30336347,  593.41644454,  610.39414307,  627.23753006,
        643.94766806,  660.52561121,  676.97240532,  693.28908

The five-year CLV under the free handset incentive (about 693 USD) is approximately 462 USD lower than the CLV with no incentive (about 1,156 USD).
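
As a quick check, the incentive scenarios can be compared using the final (month 60) CLV values as printed in the outputs above. The variable names are introduced here for illustration; a negative difference means the incentive lowers CLV under the stated assumptions.

```python
# final cumulative CLV values, as printed in the outputs above
clv_no_incentive = 1155.52996
clv_student_discount = 1009.03615
clv_free_handset = 693.28908

diff_student = clv_student_discount - clv_no_incentive
diff_handset = clv_free_handset - clv_no_incentive

print(f"student discount vs. none: {diff_student:.2f}")
print(f"free handset vs. none:     {diff_handset:.2f}")
```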