# Churn prediction 

The definition of churn depends on the company and its business model, but essentially a churn event occurs when a customer stops buying a product, stops using a service, or stops engaging with a product or app. Churn can occur in a contractual or a non-contractual business context.

Contractual churn occurs when customers explicitly cancel a service or subscription, whereas non-contractual churn is harder to observe and requires deeper data exploration. Churn can also be voluntary or involuntary. Voluntary churn means that customers decide to stop using the product or service, while involuntary churn occurs when a subscription fails to renew automatically because of an expired credit card or other blockers.
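
In a non-contractual setting there is no cancellation event, so churn has to be inferred from behaviour. The following is a minimal sketch, assuming a hypothetical rule that a customer counts as churned once they have been inactive for longer than a chosen window. The 90-day window, the `label_churn` helper and the sample dates are illustrative assumptions and are not part of the telco dataset used below.

```python
from datetime import date

# Hypothetical inactivity window; the right value depends on the business.
INACTIVITY_DAYS = 90

def label_churn(last_purchase, snapshot, window_days=INACTIVITY_DAYS):
    """Return 1 if the customer has been inactive longer than the window, else 0."""
    return int((snapshot - last_purchase).days > window_days)

# Illustrative data: customer id -> date of most recent purchase.
last_purchases = {
    "A": date(2023, 1, 10),
    "B": date(2023, 5, 20),
    "C": date(2023, 6, 1),
}
snapshot = date(2023, 6, 30)

labels = {cust: label_churn(d, snapshot) for cust, d in last_purchases.items()}
churn_rate = sum(labels.values()) / len(labels)
print(labels)
print(round(churn_rate, 3))
```

The choice of window drives the label directly, which is why non-contractual churn usually needs exploration of purchase intervals before a threshold is fixed.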

## Explore churn rate

In [202]:
import pandas as pd
import numpy as np

In [203]:
telco_raw = pd.read_csv('telco.csv', sep=';')

In [204]:
print(set(telco_raw['Churn']))

{'Yes', 'No'}


In [205]:
# Calculate the share (in %) of customers in each churn group
telco_raw.groupby(['Churn']).size() / telco_raw.shape[0] * 100

Churn
No     73.463013
Yes    26.536987
dtype: float64

## Target and Features

In [206]:
custid = ['customerID']
target = ['Churn']

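# NOTE: `telco` is the encoded and scaled frame built in the Scaling section
# below, so this part has to run after that section (the notebook cells were
# executed out of order).
features = [col for col in telco.columns
                if col not in custid+target]

X = telco[features]
Y = telco[target]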

In [207]:
categorical = telco_raw.nunique()[telco_raw.nunique()<10].keys().tolist()
categorical.remove(target[0])

numerical = [col for col in telco_raw.columns
                if col not in custid+target+categorical]

In [208]:
telco_raw = pd.get_dummies(data=telco_raw, columns=categorical, drop_first=True)

## Scaling

In [209]:
from sklearn.preprocessing import StandardScaler

In [210]:
scaler = StandardScaler()

scaled_numerical = scaler.fit_transform(telco_raw[numerical])
scaled_numerical = pd.DataFrame(scaled_numerical, columns=numerical)

In [211]:
telco_raw = telco_raw.drop(columns=numerical)
telco = telco_raw.merge(right=scaled_numerical, how='left', left_index=True, right_index=True)

In [212]:
telco['Churn'] = telco['Churn'].replace({'No': 0, 'Yes': 1})

## Predict churn with logistic regression

In [213]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score

In [256]:
train_X, test_X, train_Y, test_Y = train_test_split(X,Y, test_size=0.25)

In [259]:
logreg = LogisticRegression(penalty='l1',C=0.025, solver='liblinear')

In [260]:
logreg.fit(train_X, train_Y.values.ravel())

LogisticRegression(C=0.025, penalty='l1', solver='liblinear')

In [261]:
pred_train_Y = logreg.predict(train_X)
pred_test_Y = logreg.predict(test_X)

### Model performance Metrics

Key metrics

* Accuracy - The percentage of labels predicted correctly (both churn and non-churn)
* Precision - The percentage of the model's positive-class predictions (here, customers predicted as churn) that were actually positive
* Recall - The percentage of all positive-class samples (all churned customers) that the model correctly identified
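
The three definitions above can be checked by hand on a small example. This is a minimal sketch in plain Python; the `classification_metrics` helper and the toy labels are made up for illustration and are not the scikit-learn functions used in the rest of the notebook.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision and recall for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    # Guard against division by zero when nothing is predicted or labelled positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall

# Toy labels (1 = churn, 0 = no churn), invented for this example.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

accuracy, precision, recall = classification_metrics(y_true, y_pred)
print(accuracy, precision, recall)
```

Here two of the three customers flagged as churn really churned (precision), while only two of the four actual churners were caught (recall), which shows how the two metrics can diverge on the same predictions.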

In [219]:
def model_performance(train_Y, pred_train_Y, test_Y, pred_test_Y):
    """Print accuracy and per-class precision and recall for train and test sets."""
    train_accuracy = accuracy_score(train_Y, pred_train_Y)
    test_accuracy = accuracy_score(test_Y, pred_test_Y)

    print('\nTraining accuracy:', round(train_accuracy, 4))
    print('Test accuracy:', round(test_accuracy, 4))

    # average=None returns one score per class, ordered [0 (No), 1 (Yes)]
    train_precision = precision_score(train_Y, pred_train_Y, average=None)
    test_precision = precision_score(test_Y, pred_test_Y, average=None)

    print('\nTraining precision:', train_precision)
    print('Test precision:', test_precision)

    train_recall = recall_score(train_Y, pred_train_Y, average=None)
    test_recall = recall_score(test_Y, pred_test_Y, average=None)

    print('\nTraining recall:', train_recall)
    print('Test recall:', test_recall)

In [262]:
model_performance(train_Y, pred_train_Y,test_Y, pred_test_Y)


Training accuracy: 0.7997
Test accuracy: 0.8058

Training precision: [0.83207011 0.67075472]
Test precision: [0.84446023 0.65155807]

Training recall: [0.90963231 0.50070423]
Test recall: [0.90625    0.51224944]


### Optimal model and parameters

In [221]:
C = [1, .5, .25, .1, .05, .025, .01, .005, .0025]
l1_metrics = np.zeros((len(C), 7))
l1_metrics[:,0] = C
l1_metrics

array([[1.    , 0.    , 0.    , 0.    , 0.    , 0.    , 0.    ],
       [0.5   , 0.    , 0.    , 0.    , 0.    , 0.    , 0.    ],
       [0.25  , 0.    , 0.    , 0.    , 0.    , 0.    , 0.    ],
       [0.1   , 0.    , 0.    , 0.    , 0.    , 0.    , 0.    ],
       [0.05  , 0.    , 0.    , 0.    , 0.    , 0.    , 0.    ],
       [0.025 , 0.    , 0.    , 0.    , 0.    , 0.    , 0.    ],
       [0.01  , 0.    , 0.    , 0.    , 0.    , 0.    , 0.    ],
       [0.005 , 0.    , 0.    , 0.    , 0.    , 0.    , 0.    ],
       [0.0025, 0.    , 0.    , 0.    , 0.    , 0.    , 0.    ]])

In [226]:
for index in range(0, len(C)):
    logreg = LogisticRegression(penalty='l1', C=C[index], solver='liblinear')
    logreg.fit(train_X, train_Y.values.ravel())
    pred_test_Y = logreg.predict(test_X)
    
    l1_metrics[index,1] = np.count_nonzero(logreg.coef_)
    l1_metrics[index,2] = accuracy_score(test_Y, pred_test_Y)
    l1_metrics[index,3] = precision_score(test_Y, pred_test_Y,pos_label=0)
    l1_metrics[index,4] = precision_score(test_Y, pred_test_Y,pos_label=1)
    l1_metrics[index,5] = recall_score(test_Y, pred_test_Y,pos_label=0)
    l1_metrics[index,6] = recall_score(test_Y, pred_test_Y,pos_label=1)
    

In [227]:
col_names = ['C','Non-Zero Coeffs','Accuracy','Precision_No','Precision_Yes','Recall_No', 'Recall_Yes']
results = pd.DataFrame(l1_metrics,columns=col_names)

In [228]:
results

| | C | Non-Zero Coeffs | Accuracy | Precision_No | Precision_Yes | Recall_No | Recall_Yes |
|---|---|---|---|---|---|---|---|
| 0 | 1.0 | 27.0 | 0.795003 | 0.842613 | 0.640097 | 0.883956 | 0.555556 |
| 1 | 0.5 | 22.0 | 0.793299 | 0.841246 | 0.636804 | 0.883178 | 0.551363 |
| 2 | 0.25 | 20.0 | 0.792731 | 0.840623 | 0.635922 | 0.883178 | 0.549266 |
| 3 | 0.1 | 18.0 | 0.791596 | 0.837878 | 0.636139 | 0.885514 | 0.538784 |
| 4 | 0.05 | 15.0 | 0.793299 | 0.837739 | 0.641604 | 0.888629 | 0.536688 |
| 5 | 0.025 | 12.0 | 0.795003 | 0.832253 | 0.655914 | 0.900312 | 0.51153 |
| 6 | 0.01 | 8.0 | 0.786485 | 0.812242 | 0.664495 | 0.919782 | 0.427673 |
| 7 | 0.005 | 3.0 | 0.777399 | 0.789987 | 0.690583 | 0.946262 | 0.322851 |
| 8 | 0.0025 | 2.0 | 0.730267 | 0.730222 | 0.75 | 0.999221 | 0.006289 |


## Predict churn with decision trees

In [229]:
from sklearn.tree import DecisionTreeClassifier

In [230]:
clf = DecisionTreeClassifier(max_depth = 7, 
               criterion = 'gini', 
               splitter  = 'best')
treemodel = clf.fit(train_X, train_Y)

In [231]:
pred_train_Y =  treemodel.predict(train_X)
pred_test_Y =  treemodel.predict(test_X)

In [232]:
model_performance(train_Y, pred_train_Y,test_Y, pred_test_Y)


Training accuracy: 0.8258
Test accuracy: 0.7853

Training precision: [0.86343612 0.69732441]
Test precision: [0.83456425 0.62162162]

Training recall: [0.90694087 0.59913793]
Test recall: [0.88006231 0.53039832]


### Parameter Tuning 

In [233]:
depth_list = list(range(2,15))
depth_tuning = np.zeros((len(depth_list),6))
depth_tuning[:,0] = depth_list
depth_tuning

array([[ 2.,  0.,  0.,  0.,  0.,  0.],
       [ 3.,  0.,  0.,  0.,  0.,  0.],
       [ 4.,  0.,  0.,  0.,  0.,  0.],
       [ 5.,  0.,  0.,  0.,  0.,  0.],
       [ 6.,  0.,  0.,  0.,  0.,  0.],
       [ 7.,  0.,  0.,  0.,  0.,  0.],
       [ 8.,  0.,  0.,  0.,  0.,  0.],
       [ 9.,  0.,  0.,  0.,  0.,  0.],
       [10.,  0.,  0.,  0.,  0.,  0.],
       [11.,  0.,  0.,  0.,  0.,  0.],
       [12.,  0.,  0.,  0.,  0.,  0.],
       [13.,  0.,  0.,  0.,  0.,  0.],
       [14.,  0.,  0.,  0.,  0.,  0.]])

In [234]:
for index in range(len(depth_list)):
    tree = DecisionTreeClassifier(max_depth=depth_list[index])
    tree.fit(train_X, train_Y)
    pred_test_Y = tree.predict(test_X)
    depth_tuning[index,1] = accuracy_score(test_Y, pred_test_Y)
    
    depth_tuning[index,2] = precision_score(test_Y, pred_test_Y,pos_label=0)
    depth_tuning[index,3] = precision_score(test_Y, pred_test_Y,pos_label=1)
    
    depth_tuning[index,4] = recall_score(test_Y, pred_test_Y,pos_label=0)
    depth_tuning[index,5] = recall_score(test_Y, pred_test_Y,pos_label=1)

In [235]:
col_names = ['Depth','Accuracy','Precision_No','Precision_Yes','Recall_No', 'Recall_Yes']
tree_results = pd.DataFrame(depth_tuning,columns=col_names)

In [236]:
tree_results

| | Depth | Accuracy | Precision_No | Precision_Yes | Recall_No | Recall_Yes |
|---|---|---|---|---|---|---|
| 0 | 2.0 | 0.776831 | 0.802444 | 0.645833 | 0.920561 | 0.389937 |
| 1 | 3.0 | 0.776831 | 0.802444 | 0.645833 | 0.920561 | 0.389937 |
| 2 | 4.0 | 0.777399 | 0.802578 | 0.648084 | 0.92134 | 0.389937 |
| 3 | 5.0 | 0.779671 | 0.81728 | 0.627507 | 0.898754 | 0.459119 |
| 4 | 6.0 | 0.780239 | 0.807824 | 0.648026 | 0.916667 | 0.412998 |
| 5 | 7.0 | 0.784781 | 0.833456 | 0.621287 | 0.880841 | 0.526205 |
| 6 | 8.0 | 0.764338 | 0.821614 | 0.57561 | 0.864486 | 0.494759 |
| 7 | 9.0 | 0.752413 | 0.818797 | 0.547564 | 0.848131 | 0.494759 |
| 8 | 10.0 | 0.737649 | 0.824132 | 0.515213 | 0.813863 | 0.532495 |
| 9 | 11.0 | 0.74276 | 0.828458 | 0.524194 | 0.816199 | 0.545073 |


## Identify and interpret churn drivers

In [240]:
from sklearn import tree
import graphviz

In [278]:
exported = tree.export_graphviz(decision_tree=treemodel,
                               out_file=None,
                               feature_names=list(train_X.columns),
                               precision=1,
                               class_names=['Not churn','Churn'],
                               filled = True)

graph = graphviz.Source(exported)

In [281]:
#display(graph)

### Coefficients and odds

In [271]:
feature_names = pd.DataFrame(train_X.columns, columns = ['Feature'])
log_coef = pd.DataFrame(np.transpose(logreg.coef_), columns = ['Coefficient'])
coefficients = pd.concat([feature_names, log_coef], axis = 1)

In [272]:
coefficients.columns = ['Feature','Coefficient']
coefficients['Exp_Coefficient'] = np.exp(coefficients['Coefficient'])

coefficients = coefficients[coefficients['Coefficient']!=0]
print(coefficients.sort_values(by=['Exp_Coefficient']))

                           Feature  Coefficient  Exp_Coefficient
27                          tenure    -0.827912         0.436961
4                 PhoneService_Yes    -0.789483         0.454080
22               Contract_Two year    -0.643787         0.525299
16                 TechSupport_Yes    -0.452694         0.635913
10              OnlineSecurity_Yes    -0.413132         0.661575
21               Contract_One year    -0.406172         0.666196
3                   Dependents_Yes    -0.152704         0.858384
12                OnlineBackup_Yes    -0.121862         0.885270
14            DeviceProtection_Yes    -0.067107         0.935095
2                      Partner_Yes    -0.064901         0.937160
23            PaperlessBilling_Yes     0.103254         1.108773
25  PaymentMethod_Electronic check     0.259203         1.295897
28                  MonthlyCharges     0.879866         2.410577


Each coefficient can be interpreted as the change in the log-odds of churn associated with a one-unit increase in the corresponding input feature. For example, if tenure is measured in years, increasing tenure by one year changes the log-odds by the value of the tenure coefficient. Note that the numerical features in this notebook were standardized, so for them one unit corresponds to one standard deviation.
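
Written out under the standard logistic regression model, the log-odds are linear in the features. The symbols are notation introduced here rather than taken from the notebook: $p$ is the predicted churn probability, $\beta_0$ the intercept and $\beta_j$ the coefficient of feature $x_j$.

$$
\log\frac{p}{1-p} = \beta_0 + \sum_j \beta_j x_j
$$

Increasing a single feature $x_k$ by one unit while holding the others fixed, and writing $p'$ for the new probability, gives

$$
\log\frac{p'}{1-p'} - \log\frac{p}{1-p} = \beta_k
\quad\Longrightarrow\quad
\frac{p'/(1-p')}{p/(1-p)} = e^{\beta_k}
$$

so the odds are multiplied by $e^{\beta_k}$, which is the `Exp_Coefficient` column in the table above.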

On the odds scale, exponentiated coefficients below 1 decrease the odds and values above 1 increase them. The effect is obtained by multiplying the odds by the exponent of the coefficient. With the fitted values above, one additional unit of tenure multiplies the odds of churn by 0.437, that is, it decreases them by 1 minus 0.437, or roughly 56%.
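
The odds arithmetic can be checked directly. This is a minimal sketch using two coefficients copied from the table above; the `odds_effect` helper is a hypothetical name introduced only for this example.

```python
import math

# Coefficients copied from the table above.
coefficients = {"tenure": -0.827912, "MonthlyCharges": 0.879866}

def odds_effect(coef):
    """Return the odds multiplier and the % change in odds for a one-unit increase."""
    multiplier = math.exp(coef)
    return multiplier, (multiplier - 1) * 100

for name, coef in coefficients.items():
    multiplier, pct = odds_effect(coef)
    print(f"{name}: odds x {multiplier:.3f} ({pct:+.1f}%)")
```

A negative percentage means the feature lowers the odds of churn and a positive one means it raises them, which matches the reading of values below and above 1 in the `Exp_Coefficient` column.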