# **Churn prediction scoring with Jupyter, Voilà, and LIME**
---

## **Churn Prediction**

Customer attrition, also known as customer churn, customer turnover, or customer defection, is the loss of clients or customers.
Banks, telephone service companies, Internet service providers, pay TV companies, insurance firms, and alarm monitoring services often use **customer churn** analysis and churn rates as one of their key business metrics (along with cash flow, EBITDA, etc.), because retaining an existing customer costs far less than acquiring a new one.
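
For a labelled dataset like the one used here, the churn rate can be computed as the share of customers who left. Below is a minimal sketch. It assumes 0/1 labels where 1 means the customer left, which is the convention of the dataset's `Exited` column. The name `churn_rate` is a hypothetical helper.

```python
def churn_rate(exited_labels):
    """Fraction of customers who left, given 0/1 labels (1 = churned)."""
    labels = list(exited_labels)
    if not labels:
        raise ValueError("need at least one customer")
    return sum(labels) / len(labels)


# A hand-picked, deliberately balanced sample (three stayed, three left).
print(churn_rate([0, 0, 0, 1, 1, 1]))  # 0.5
```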

## **Dataset**

In this example, we use a bank customer churn dataset published on Kaggle.

Link: https://www.kaggle.com/nasirislamsujan/bank-customer-churn-prediction/data

## **How to use**
- The dataset is already loaded, and the preprocessing code is already in place.
- The user picks a model from the list and trains it on the preprocessed data.
- Important: **training might take a few seconds.**
- After training, the model's metrics (loss and accuracy on the test set) are displayed.
- Once the model is trained, the user can enter a single customer's data with the sliders and buttons and get a prediction for it.
- For each prediction, the user also gets an image that explains the model's prediction.

## **Code**

```python
import warnings

# Silence all library warnings (deprecation notices included) to keep the dashboard output clean.
warnings.filterwarnings("ignore")
```

### **Dataset**

```python
import pandas as pd
from sklearn import preprocessing
from sklearn.model_selection import train_test_split
```

```python
# Path in cnvrg datasets.
# path = '/data/churn-prediction-banking/Churn_Modelling.csv'

# Path in local.
path = '/Users/omerliberman/Desktop/datasets/churn prediction - banking/ds.csv'
```

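```python
# Load the data and shuffle the rows (reset_index turns RowNumber back into a column).
df = pd.read_csv(path, index_col='RowNumber')
df = df.sample(frac=1).reset_index()

# Split the customers by target class: Exited == 1 (churned) and Exited == 0 (stayed).
groupby = df.groupby('Exited')
all_ones = groupby.get_group(1).set_index('RowNumber')
all_zeros = groupby.get_group(0).set_index('RowNumber')
```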

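```python
# Show three customers who stayed (Exited == 0) and three who left (Exited == 1).
sample_of_zeros = all_zeros.head(3)
sample_of_ones = all_ones.head(3)

# DataFrame.append was removed in pandas 2.0; pd.concat does the same job.
sample = pd.concat([sample_of_zeros, sample_of_ones])

sample
```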

| RowNumber | CustomerId | Surname | CreditScore | Geography | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 271 | 15787071 | Dulhunty | 650 | Spain | Male | 41 | 9 | 0.0 | 2 | 0 | 1 | 191599.67 | 0 |
| 5063 | 15650432 | Liu | 849 | Germany | Male | 41 | 10 | 84622.13 | 1 | 1 | 1 | 198072.16 | 0 |
| 5532 | 15696744 | Miller | 705 | France | Female | 31 | 3 | 119794.67 | 1 | 0 | 0 | 182528.44 | 0 |
| 6497 | 15789313 | Ugorji | 595 | Germany | Female | 44 | 4 | 96553.52 | 2 | 1 | 0 | 143952.24 | 1 |
| 2640 | 15581036 | Beyer | 712 | Germany | Female | 40 | 3 | 109308.79 | 2 | 1 | 0 | 120158.72 | 1 |
| 5357 | 15655436 | Kendall | 839 | Germany | Male | 47 | 2 | 136911.07 | 1 | 1 | 1 | 168184.62 | 1 |


### **Preprocessing**

```python
# Categorical features to one-hot encode ("dummy").
features_to_dummy = ['Geography', 'Gender']

# Identifier and unused columns to drop.
features_to_drop = ['RowNumber', 'Surname', 'CustomerId', 'IsActiveMember']

# Numeric features to scale.
features_to_scale = ['CreditScore', 'Age', 'Balance', 'EstimatedSalary']
```

```python
# Drop redundant columns.
df = df.drop(labels=features_to_drop, axis=1)

# One-hot encode Geography and Gender (Geography_France, Geography_Germany, ...).
df = pd.get_dummies(df, columns=features_to_dummy, dtype=float)

# Scale each numeric feature by its maximum; keep the maxima to scale new inputs the same way.
max_values_dict = dict()
for col in features_to_scale:
    max_in_col = df[col].max()
    max_values_dict[col] = max_in_col
    df[col] /= max_in_col
```
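
The two preprocessing steps above can be sketched in plain Python. The sketch mainly shows why `max_values_dict` is kept: the divisors computed from the data must be reused when scaling a single customer at prediction time. A slider value above the column's maximum therefore maps to a number greater than 1, outside the range the model saw during training. This is a simplified illustration, not pandas' implementation. The helper names `fit_max_scaler` and `one_hot` and the sample ages are hypothetical.

```python
def fit_max_scaler(values):
    """Return the column maximum, used as the scaling divisor."""
    max_value = max(values)
    if max_value == 0:
        raise ValueError("cannot scale a column whose maximum is 0")
    return max_value


def one_hot(value, column, categories):
    """Mimic pandas.get_dummies naming: one 0/1 entry per sorted category."""
    return {f"{column}_{cat}": float(value == cat) for cat in sorted(categories)}


# Hypothetical training ages, for illustration only.
train_ages = [18, 35, 60, 92]
age_max = fit_max_scaler(train_ages)
print([a / age_max for a in train_ages])  # positive values end up in (0, 1]

# At prediction time the stored maximum is reused, so a new customer is on the same scale.
print(40 / age_max)

print(one_hot("Germany", "Geography", {"France", "Germany", "Spain"}))
# {'Geography_France': 0.0, 'Geography_Germany': 1.0, 'Geography_Spain': 0.0}
```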

```python
# Move the target column to the end so that iloc[:, -1] selects the label later on.
target_col = 'Exited'
df = df[[col for col in df.columns if col != target_col] + [target_col]]
```

```python
# Split into train & test sets (80% / 20%).
# Feature names exclude the target column, which is the last column.
feature_names = list(df.columns[:-1])
test_size = 0.2
train, test = train_test_split(df, test_size=test_size)
```

```python
# Features are all columns but the last; the label (Exited) is the last column.
X_train, y_train = train.iloc[:, :-1], train.iloc[:, -1]
X_test, y_test = test.iloc[:, :-1], test.iloc[:, -1]
```
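
`train_test_split` shuffles the rows before cutting off the test portion. The idea can be sketched in plain Python using row indices. `shuffled_split` is a hypothetical helper that rounds the test count to the nearest integer; it is not scikit-learn's exact implementation. A fixed `seed` makes the split reproducible, which `train_test_split` offers through its `random_state` parameter.

```python
import random


def shuffled_split(n_rows, test_size=0.2, seed=None):
    """Shuffle row indices and hold out `test_size` of them as the test set."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = round(n_rows * test_size)
    # The first n_test shuffled indices form the test set; the rest are for training.
    return indices[n_test:], indices[:n_test]


# Example with a hypothetical 10,000-row table.
train_idx, test_idx = shuffled_split(10_000, test_size=0.2, seed=0)
print(len(train_idx), len(test_idx))  # 8000 2000
```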

### **Training**

```python
import pickle
from ipywidgets import widgets, ButtonStyle
from sklearn.metrics import zero_one_loss, accuracy_score

from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier
```

```python
model_wid = widgets.Select(
    options=['10-nearest-neighbors', 'SVM', 'GaussianNaiveBayes', 'DecisionTree'],
    value='10-nearest-neighbors',
    rows=4,
    description='Model: ',
    disabled=False
)
model_wid
```


```python
def select_model(model_name):
    """Return a fresh, untrained classifier for the name chosen in the widget."""
    if model_name == '10-nearest-neighbors':
        return KNeighborsClassifier(n_neighbors=10)

    elif model_name == 'SVM':
        # probability=True is required for predict_proba (used for predictions and LIME).
        return SVC(gamma='auto', probability=True)

    elif model_name == 'GaussianNaiveBayes':
        return GaussianNB()

    elif model_name == 'DecisionTree':
        return DecisionTreeClassifier()

    else:
        raise ValueError('Unknown model: {}'.format(model_name))
```

```python
acc_wid = widgets.IntText(placeholder='Accuracy',
                          description='Accuracy: ',
                          disabled=False)

acc_wid
```


```python
loss_wid = widgets.IntText(placeholder='Loss',
                           description='Loss: ',
                           disabled=False)

loss_wid
```


```python
def train_model(_=None):
    """Train the selected classifier, report test metrics, and save it to disk."""
    global model  # make the freshly trained model available to the prediction cells
    model = select_model(model_wid.value)
    model.fit(X_train, y_train)

    prediction = model.predict(X_test)

    # Accuracy and 0-1 loss as whole percentages (multiply first, then round).
    acc_wid.value = round(accuracy_score(y_test, prediction) * 100)
    loss_wid.value = round(zero_one_loss(y_test, prediction) * 100)

    # Save the trained model so it can be reloaded later.
    with open("{}.sav".format(model_wid.value), 'wb') as f:
        pickle.dump(model, f)

    train_status.value = "{} model is trained!".format(model_wid.value)
    return model
```
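
Accuracy and zero-one loss are two views of the same count. Accuracy is the fraction of test customers predicted correctly, and zero-one loss is the fraction predicted wrongly, so the two always add up to 1 (100%). The plain-Python sketch below illustrates that relationship; it does not reproduce scikit-learn's code, and the helper name `accuracy_and_loss` is hypothetical. The sketch also shows a rounding detail that matters when filling an integer widget. If you round a fraction to two decimals and only then multiply by 100, the result can land just below a whole number, and `int()` then truncates it.

```python
def accuracy_and_loss(y_true, y_pred):
    """Return (accuracy, zero-one loss) as fractions; the two always sum to 1."""
    y_true, y_pred = list(y_true), list(y_pred)
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("need two non-empty label sequences of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    accuracy = correct / len(y_true)
    return accuracy, 1 - accuracy


acc, loss = accuracy_and_loss([0, 0, 1, 1, 0], [0, 1, 1, 1, 0])
print(acc, round(loss, 2))  # 0.8 0.2

# Whole-number percentages: multiply first, then round.
print(int(round(0.57, 2) * 100))  # 56, because 0.57 * 100 == 56.99999999999999
print(round(0.57 * 100))          # 57
```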

```python
train_button = widgets.Button(description='Train Model')
train_button.on_click(train_model)

train_status = widgets.Text(placeholder='Training Status',
                            disabled=True)

vb_train_and_status = widgets.HBox([train_button, train_status])
vb_train_and_status
```


## **Predict with Voilà**

```python
import numpy as np
from IPython.display import display
from ipywidgets import widgets, ButtonStyle, VBox, HBox
```

### **Set Values**

```python
# CustomerID (ignored)
customerId = widgets.Text(placeholder='CustomerID',
                          description='CustomerID: ',
                          disabled=False)

customerId
```


```python
# Surname (ignored)
Surname = widgets.Text(placeholder='Surname',
                       description='Surname: ',
                       disabled=False)

Surname
```


```python
# CreditScore
credit_score = widgets.IntSlider(
    value=500,
    min=0,
    max=1000,
    description='Credit Score:',
)
credit_score.style.handle_color = 'blue'

credit_score
```


```python
# Age
age = widgets.IntSlider(
    value=40,
    min=0,
    max=120,
    description='Age:',
)
age.style.handle_color = 'blue'

age
```


```python
# Tenure
tenure = widgets.IntSlider(
    value=3,
    min=0,
    max=10,
    step=1,
    description='Tenure:',
)
tenure.style.handle_color = 'blue'

tenure
```


```python
# Balance
balance = widgets.FloatSlider(
    value=50000,
    min=0,
    max=300000.0,
    step=0.1,
    description='Balance:',
)
balance.style.handle_color = 'blue'

balance
```


```python
# NumOfProducts
num_of_products = widgets.IntSlider(
    value=3,
    min=1,
    max=5,
    step=1,
    description='NumOfProducts:',
)
num_of_products.style.handle_color = 'blue'

num_of_products
```


```python
# EstimatedSalary
estimated_salary = widgets.FloatSlider(
    value=50000,
    min=0,
    max=300000.0,
    step=0.1,
    description='EstimatedSalary:',
)
estimated_salary.style.handle_color = 'blue'

estimated_salary
```


```python
# Geography
geography = widgets.Dropdown(
    options=['France', 'Germany', 'Spain'],
    value='Germany',
    description='Geography:',
)

geography
```


```python
# HasCrCard
has_cr_card = widgets.ToggleButtons(
    options=['True', 'False'],
    description='Has Card:',
)

has_cr_card
```


```python
# Gender
gender = widgets.ToggleButtons(
    options=['Female', 'Male'],
    description='Gender:',
)

gender
```


```python
def process_values(credit_score=credit_score,
                   age=age,
                   tenure=tenure,
                   balance=balance,
                   num_of_products=num_of_products,
                   has_cr_card=has_cr_card,
                   estimated_salary=estimated_salary,
                   geography=geography,
                   gender=gender):
    """Turn the current widget values into one preprocessed row (shape 1 x 12).

    Applies the same max-scaling and one-hot encoding used on the training data.
    """
    CreditScore = credit_score.value / max_values_dict['CreditScore']
    Age = age.value / max_values_dict['Age']
    Tenure = tenure.value
    Balance = balance.value / max_values_dict['Balance']
    NumOfProducts = num_of_products.value
    HasCrCard = 0 if has_cr_card.value == 'False' else 1
    EstimatedSalary = estimated_salary.value / max_values_dict['EstimatedSalary']
    Geography_Spain = 1 if geography.value == 'Spain' else 0
    Geography_Germany = 1 if geography.value == 'Germany' else 0
    Geography_France = 1 if geography.value == 'France' else 0
    Gender_Male = 1 if gender.value == 'Male' else 0
    Gender_Female = 1 if gender.value == 'Female' else 0

    # The order must match the training columns (X_train.columns).
    to_predict = [CreditScore, Age, Tenure, Balance, NumOfProducts, HasCrCard, EstimatedSalary,
                  Geography_France, Geography_Germany, Geography_Spain, Gender_Female, Gender_Male]

    return np.array(to_predict).reshape(1, -1)
```
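
`process_values` relies on a hand-written list whose order must match the columns the model was trained on. If the preprocessing ever changes (for example, by adding another dummy column), the positions shift silently. One way to guard against that is to build the row from a `{column: value}` dict and order it by the training column names. The sketch below assumes the training column order is the one produced by the preprocessing cells, which matches the list in `process_values`. `TRAIN_COLUMNS`, `build_row`, and the example values are hypothetical. In the notebook you would pass `list(X_train.columns)` instead of the hard-coded list.

```python
# Assumed training column order (in the notebook: list(X_train.columns)).
TRAIN_COLUMNS = [
    "CreditScore", "Age", "Tenure", "Balance", "NumOfProducts", "HasCrCard",
    "EstimatedSalary", "Geography_France", "Geography_Germany", "Geography_Spain",
    "Gender_Female", "Gender_Male",
]


def build_row(features, columns=TRAIN_COLUMNS):
    """Order a {column: value} dict by the training columns, failing loudly on mismatch."""
    missing = set(columns) - set(features)
    unexpected = set(features) - set(columns)
    if missing or unexpected:
        raise KeyError(f"missing={sorted(missing)}, unexpected={sorted(unexpected)}")
    return [features[col] for col in columns]


# Hypothetical, already-scaled values for one customer.
features = {
    "CreditScore": 0.6, "Age": 0.4, "Tenure": 3, "Balance": 0.2, "NumOfProducts": 1,
    "HasCrCard": 1, "EstimatedSalary": 0.3, "Geography_France": 0,
    "Geography_Germany": 1, "Geography_Spain": 0, "Gender_Female": 1, "Gender_Male": 0,
}
row = build_row(features)
print(len(row))  # 12
```

The resulting list can then be wrapped with `np.array(row).reshape(1, -1)`, as `process_values` does.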

### **Prediction and Explanation**

```python
prediction_wid = widgets.Text(placeholder='Prediction',
                              description='Prediction: ',
                              disabled=False)

probability_wid = widgets.FloatText(placeholder='Probability',
                                    description='Probability: ',
                                    disabled=False)

vb_pred_and_prob = VBox([prediction_wid, probability_wid])
```

```python
# Load the saved model; if none has been saved yet, train one first.
try:
    with open("{}.sav".format(model_wid.value), 'rb') as f:
        model = pickle.load(f)
except FileNotFoundError:
    train_model()
    with open("{}.sav".format(model_wid.value), 'rb') as f:
        model = pickle.load(f)

def show_prediction(_=None):
    to_predict = process_values()
    # predict_proba columns follow model.classes_ ([0, 1]): column 1 is Exited == 1, i.e. churn.
    not_churn_prob, churn_prob = model.predict_proba(to_predict)[0]
    prediction = "Not Churn" if not_churn_prob > churn_prob else "Churn"

    prediction_wid.value = prediction
    probability_wid.value = max(not_churn_prob, churn_prob)
    return to_predict.reshape(-1)
```
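
scikit-learn's `predict_proba` returns one column per entry of the fitted model's `classes_` attribute, in that order. With the 0/1 `Exited` labels, column 0 is "stayed" and column 1 is "churned". The small helper sketched below maps a probability row to a label by looking up the class rather than assuming a position. `label_from_proba` is a hypothetical name; in the notebook you would pass `model.classes_` as `classes`.

```python
def label_from_proba(proba, classes, positive_class=1):
    """Map one predict_proba row to (label, confidence) using the model's class order."""
    probs = dict(zip(classes, proba))  # predict_proba columns follow classes_
    best_class = max(probs, key=probs.get)
    label = "Churn" if best_class == positive_class else "Not Churn"
    return label, probs[best_class]


print(label_from_proba([0.7, 0.3], classes=[0, 1]))  # ('Not Churn', 0.7)
print(label_from_proba([0.2, 0.8], classes=[0, 1]))  # ('Churn', 0.8)
```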

```python
import matplotlib.pyplot as plt
from lime.lime_tabular import LimeTabularExplainer  # LIME explainer for tabular (table-shaped) data
```

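```python
def show_explanation(_=None):
    # Explain the customer currently described by the widgets.
    to_predict = process_values().reshape(-1)
    explainer = LimeTabularExplainer(X_train.values,
                                     feature_names=list(X_train.columns),
                                     class_names=['Not Churn', 'Churn'])
    exp = explainer.explain_instance(data_row=to_predict,
                                     predict_fn=model.predict_proba,
                                     num_features=10)
    # Plot the feature weights for label 1 (churn) and show the PNG in the image widget.
    fig = exp.as_pyplot_figure(label=1)
    fig.savefig("1.png", bbox_inches="tight")
    plt.close(fig)
    with open("1.png", "rb") as f:
        exp_wid1.value = f.read()
```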
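
The idea behind LIME's tabular explanations can be sketched with NumPy. The sketch perturbs the instance, scores the perturbed samples with the black-box model, weights each sample by how close it is to the instance, and fits a weighted linear model whose coefficients act as local feature effects. This is a simplified illustration, not the `lime` library's implementation. It borrows only LIME's default kernel-width formula (0.75 times the square root of the number of features). It uses a plain exponential kernel and skips LIME's default discretization of continuous features and its feature selection. The function name and parameters are hypothetical. In the notebook, `predict_fn` could be `lambda X: model.predict_proba(X)[:, 1]` and `feature_std` could be `X_train.values.std(axis=0)`.

```python
import numpy as np


def local_linear_explanation(predict_fn, instance, feature_std, num_samples=2000,
                             kernel_width=None, seed=0):
    """Fit a proximity-weighted linear model around `instance` (LIME-style sketch).

    predict_fn maps an (n, d) array to an (n,) array of scores, e.g. a churn probability.
    Returns one coefficient per feature, in units of that feature's standard deviation.
    """
    rng = np.random.default_rng(seed)
    instance = np.asarray(instance, dtype=float)
    feature_std = np.asarray(feature_std, dtype=float)
    if np.any(feature_std <= 0):
        raise ValueError("feature_std must be positive for every feature")
    d = instance.size
    if kernel_width is None:
        kernel_width = 0.75 * np.sqrt(d)

    # 1. Perturb the instance with Gaussian noise scaled per feature.
    z = rng.normal(size=(num_samples, d))
    samples = instance + z * feature_std
    # 2. Ask the black-box model for its score on every perturbed sample.
    scores = predict_fn(samples)
    # 3. Weight samples by closeness to the instance (exponential kernel on scaled distance).
    distances = np.linalg.norm(z, axis=1)
    weights = np.exp(-(distances ** 2) / kernel_width ** 2)
    # 4. Weighted least squares on the scaled deviations, with an intercept column.
    design = np.hstack([z, np.ones((num_samples, 1))])
    sqrt_w = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(design * sqrt_w[:, None], scores * sqrt_w, rcond=None)
    return coef[:-1]  # drop the intercept


# Sanity check with a model that is exactly linear: the local effects are w * std.
w = np.array([2.0, -1.0, 0.5])
std = np.array([1.0, 2.0, 0.5])
coef = local_linear_explanation(lambda X: X @ w + 3.0, [0.1, 0.2, 0.3], std)
print(coef)
```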

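```python
# Placeholder image shown before the first explanation (expects white.png in the working directory).
with open("white.png", "rb") as f:
    image = f.read()

exp_wid1 = widgets.Image(
    value=image,
    format='png',
    width=350,
    height=200,
)

hb_pred_prob_exp = HBox([vb_pred_and_prob, exp_wid1])
```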

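```python
predBtn = widgets.Button(description='Predict')

# Each click updates the prediction and then the LIME explanation.
predBtn.on_click(show_prediction)
predBtn.on_click(show_explanation)

# Fill the prediction widgets once, using the default widget values.
to_predict = show_prediction()

final_vbox = HBox([predBtn, hb_pred_prob_exp])

final_vbox
```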
