# Telco Case Study

Let's start by loading the data and looking at the first few rows.

In [1]:
# Load pandas, a library for all sorts of data manipulation.
import pandas as pd

file_name = 'churn.csv'
df1 = pd.read_csv(file_name)
df1.head()

Unnamed: 0,gender,SeniorCitizen,Partner,Dependents,tenure,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,MonthlyCharges,Churn
0,Female,0,Yes,No,1,No,No phone service,DSL,No,Yes,No,No,No,No,Month-to-month,Yes,Electronic check,29.85,No
1,Male,0,No,No,34,Yes,No,DSL,Yes,No,Yes,No,No,No,One year,No,Mailed check,56.95,No
2,Male,0,No,No,2,Yes,No,DSL,Yes,Yes,No,No,No,No,Month-to-month,Yes,Mailed check,53.85,Yes
3,Male,0,No,No,45,No,No phone service,DSL,Yes,No,Yes,Yes,No,No,One year,No,Bank transfer (automatic),42.3,No
4,Female,0,No,No,2,Yes,No,Fiber optic,No,No,No,No,No,No,Month-to-month,Yes,Electronic check,70.7,Yes


Here's a brief summary of the data.

In [2]:
df1.describe()

Unnamed: 0,SeniorCitizen,tenure,MonthlyCharges
count,7032.0,7032.0,7032.0
mean,0.1624,32.421786,64.798208
std,0.368844,24.54526,30.085974
min,0.0,1.0,18.25
25%,0.0,9.0,35.5875
50%,0.0,29.0,70.35
75%,0.0,55.0,89.8625
max,1.0,72.0,118.75


In [3]:
df1.select_dtypes(include=['object']).describe()

Unnamed: 0,gender,Partner,Dependents,PhoneService,MultipleLines,InternetService,OnlineSecurity,OnlineBackup,DeviceProtection,TechSupport,StreamingTV,StreamingMovies,Contract,PaperlessBilling,PaymentMethod,Churn
count,7032,7032,7032,7032,7032,7032,7032,7032,7032,7032,7032,7032,7032,7032,7032,7032
unique,2,2,2,2,3,3,3,3,3,3,3,3,3,2,4,2
top,Male,No,No,Yes,No,Fiber optic,No,No,No,No,No,No,Month-to-month,Yes,Electronic check,No
freq,3549,3639,4933,6352,3385,3096,3497,3087,3094,3472,2809,2781,3875,4168,2365,5163


And here's a quick example of how to fit a logistic regression model to predict whether a customer will churn.

In [4]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn import metrics

# Set up the data and split into training and test
target_col = "Churn"
all_columns = df1.columns.values
feature_names = all_columns[all_columns != target_col]
X = df1[feature_names]
X = pd.get_dummies(X, drop_first=False)
y = df1[target_col] == "Yes"
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)

# Build predictive model
model = LogisticRegression(solver='liblinear')
model = model.fit(X_train, y_train)

# Get AUC
y_prob = model.predict_proba(X_test)[:, 1]
fpr, tpr, thresholds = metrics.roc_curve(y_test, y_prob)
print("AUC:", metrics.auc(fpr, tpr))

AUC: 0.8372489175952872
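
One way to read an AUC value: it is the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner. The sketch below checks that reading on made-up labels and scores. The `pairwise_auc` helper and the `toy_*` values are illustrative assumptions, not part of the notebook or of scikit-learn.

```python
def pairwise_auc(labels, scores):
    """Share of (churner, non-churner) pairs in which the churner scores higher.

    Ties count as half a win. Assumes at least one example of each class.
    """
    churner_scores = [s for s, is_churner in zip(scores, labels) if is_churner]
    stayer_scores = [s for s, is_churner in zip(scores, labels) if not is_churner]
    wins = 0.0
    for c in churner_scores:
        for s in stayer_scores:
            if c > s:
                wins += 1.0
            elif c == s:
                wins += 0.5
    return wins / (len(churner_scores) * len(stayer_scores))


# Made-up data: two churners and three non-churners.
toy_labels = [True, False, True, False, False]
toy_scores = [0.9, 0.4, 0.35, 0.2, 0.1]

toy_auc = pairwise_auc(toy_labels, toy_scores)
print("Toy AUC:", toy_auc)
```

Five of the six churner/non-churner pairs are ordered correctly here, so the toy AUC is 5/6. A model that scores customers at random would be expected to land near 0.5.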


With an AUC of about 0.84, the model ranks customers by churn risk considerably better than random (which would give an AUC of about 0.5). Let's now try to evaluate the potential economic impact it could have for TelCo. Please fill in the following benefit-cost matrix with the value TelCo would obtain in each scenario.

| Decision | Customer stays | Customer leaves |
| --- | --- | --- |
| Offer retention incentive | ??? | ??? |
| Do not offer incentive | ??? | ??? |

Do you have data/information to estimate the expected value for each of these decisions? If yes, list the data/information sources that you would use to do so. If not, list the data/information sources that you are missing. 
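
As a rough illustration of what "expected value of a decision" means here, each row of the benefit-cost matrix can be weighted by the probability of each outcome. The sketch below is one possible framing, not TelCo's actual economics: the offer cost, the share of would-be churners that the offer retains, and the customer's annual value are all hypothetical placeholders.

```python
def expected_value(p_leave, value_if_stays, value_if_leaves):
    """Probability-weighted value of one row of the benefit-cost matrix."""
    return (1 - p_leave) * value_if_stays + p_leave * value_if_leaves


# Hypothetical inputs for a single customer (none of these come from the data).
annual_value = 840.0   # e.g. a $70 monthly charge over 12 months
p_churn = 0.6          # model's predicted churn probability
offer_cost = 100.0     # assumed cost of the incentive, paid whether or not it works
save_rate = 0.3        # assumed share of would-be churners the offer retains

# Row "Do not offer incentive": full value if the customer stays, nothing otherwise.
ev_no_offer = expected_value(p_churn, annual_value, 0.0)

# Row "Offer retention incentive": the offer lowers the churn probability,
# and its cost is subtracted in both outcomes.
p_churn_with_offer = p_churn * (1 - save_rate)
ev_offer = expected_value(
    p_churn_with_offer, annual_value - offer_cost, -offer_cost
)

print(f"Expected value without offer: ${ev_no_offer:,.2f}")
print(f"Expected value with offer:    ${ev_offer:,.2f}")
print("Target this customer:", ev_offer > ev_no_offer)
```

Under these assumed numbers the offer pays off for this customer, but the conclusion depends entirely on `offer_cost` and `save_rate`, which are exactly the quantities the dataset does not contain.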


Now, use your answer above to make targeting decisions and estimate the potential economic impact (expected value) of those decisions. If you answered that you cannot compute the expected value because some data/information is missing, then use your best judgement to come up with assumptions that would enable you to approximate the expected value of your targeting decisions.

In [5]:
import numpy as np

# Decisions: target every customer whose predicted churn probability is above 50%.
# Replace this rule with one that makes more sense.
decisions = y_prob > 0.5
decisions.mean()

# Profits without churn offer. This estimates the annual profits in the test set when nobody is targeted.
# This is the baseline at TelCo
potential_values = X_test.MonthlyCharges * 12
actual_values = potential_values * (1 - y_test)
profits_without_offer = actual_values.sum()
print(f"Profits without offer: ${profits_without_offer:,.2f}")

# Estimate the value of your targeting decisions here.
profits_with_offer = 0 # YOUR ANSWER GOES HERE
print(f"Profits with offer: ${profits_with_offer:,.2f}")

Profits without offer: $375,756.00
Profits with offer: $0.00
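
For reference, here is one hedged way the placeholder could be filled in. It is a sketch under stated assumptions, not the intended answer: the `profit_with_offer` helper, the flat `offer_cost`, and the `save_rate` are all hypothetical, and the toy lists stand in for `potential_values`, `y_test` and `decisions`.

```python
def profit_with_offer(annual_values, churned, targeted, offer_cost, save_rate):
    """Estimate total annual value when some customers receive an offer.

    Assumptions (all hypothetical):
      - every targeted customer costs `offer_cost`, whether or not the offer works;
      - the offer retains a would-be churner with probability `save_rate`;
      - the offer has no effect on customers who would have stayed anyway.
    """
    total = 0.0
    for value, leaves, offered in zip(annual_values, churned, targeted):
        if offered:
            p_stay = save_rate if leaves else 1.0
            total += p_stay * value - offer_cost
        else:
            total += 0.0 if leaves else value
    return total


# Toy stand-ins for potential_values, y_test and decisions.
toy_values = [600.0, 900.0, 1200.0, 300.0]
toy_churned = [False, True, True, False]
toy_targeted = [False, True, True, True]

# Baseline: nobody is targeted, so only the customers who stay contribute.
toy_baseline = profit_with_offer(
    toy_values, toy_churned, [False] * 4, offer_cost=50.0, save_rate=0.25
)
toy_with_offer = profit_with_offer(
    toy_values, toy_churned, toy_targeted, offer_cost=50.0, save_rate=0.25
)

print(f"Toy profits without offer: ${toy_baseline:,.2f}")
print(f"Toy profits with offer:    ${toy_with_offer:,.2f}")
```

In the notebook, the same function could be called as `profit_with_offer(potential_values, y_test, decisions, offer_cost, save_rate)` with your own assumed values for the last two arguments. Like the baseline above, it uses the true outcomes in `y_test`, so it evaluates the targeting rule after the fact rather than forecasting it.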
