# Yet another look at Telco customer churn problem

## TL;DR
### This notebook aims to show the difference between the Kaggle world and the real world, using a telco churn dataset: an example of how you could look at the problem and build an applicable solution that might be acceptable to the business.
### How to do some feature engineering and how to create a training dataset that makes sense
### How to build a baseline model. I am not focusing on tuning the model or building multiple models; you all know how to do that better than I do.
### I am trying to focus on the consequences of the model, and on how important it is to put modelling into the right business context and to be able to show the model's value.

- <a href='#1'>1. EDA</a>
    - <a href='#1.1'>1.1. Exploratory data analysis</a>
    - <a href='#1.2'>1.2. Missing Values exploration</a>
    - <a href='#1.3'>1.3. Churn breakdown</a>
    - <a href='#1.4'>1.4. Data distributions against Churn flag</a>
- <a href='#2'>2. Feature preparation</a>
    - <a href='#2.1'>2.1. Feature engineering</a>
- <a href='#3'>3. Modelling</a>
    - <a href='#3.1'>3.1. Analysing and transforming results</a>
    - <a href='#3.2'>3.2. Threshold selection</a>
- <a href='#4'>4. The consequences of our model</a>
    - <a href='#4.1'>4.1. Feature Importance</a>
    - <a href='#4.2'>4.2. Unconfusing confusion matrices</a>
    - <a href='#4.3'>4.3. Campaigning the model</a>
    - <a href='#4.4'>4.4. Customer Lifetime Value</a>
- <a href='#5'>5. Conclusions</a>

In [None]:
%matplotlib inline
import pandas as pd
import numpy as np
import itertools
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelBinarizer, LabelEncoder
from sklearn.model_selection import train_test_split
import lightgbm as lgbm
from sklearn.metrics import auc, f1_score, roc_auc_score, roc_curve, confusion_matrix, accuracy_score, precision_score
import seaborn as sns
import time
pd.set_option('display.max_rows', 90)
pd.set_option('display.max_columns', 200)
pd.set_option('display.width', 1000)

### <a id='1'>1. EDA</a>

In [None]:
in_file = '/kaggle/input/datasets-for-churn-telecom/cell2celltrain.csv'

### <a id='1.1'>1.1. Exploratory data analysis</a>

In [None]:
df = pd.read_csv(in_file)

In [None]:
df.head()

In [None]:
df.shape

### Class sizes

In [None]:
df.Churn.value_counts()

### Ratios
We take note of the class imbalance.

In [None]:
df.Churn.value_counts()/df.shape[0]
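`value_counts(normalize=True)` gives the same ratios without dividing by `df.shape[0]`. The no-churn/churn ratio is also worth keeping in mind for imbalance-aware settings such as LightGBM's `scale_pos_weight`. Below is a minimal sketch that uses a hypothetical toy Series in place of `df.Churn`:

```python
import pandas as pd

# Hypothetical toy labels standing in for df.Churn (not the real data)
churn = pd.Series(['No', 'No', 'Yes', 'No', 'Yes', 'No', 'No', 'No'])

# Class shares; equivalent to churn.value_counts() / churn.shape[0]
ratios = churn.value_counts(normalize=True)

# No-churn/churn ratio: how many negatives there are per positive
neg_pos_ratio = (churn == 'No').sum() / (churn == 'Yes').sum()

print(ratios)
print(neg_pos_ratio)  # 6 negatives / 2 positives = 3.0
```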

### <a id='1.2'>1.2. Missing Values exploration</a>

### The usual suspects missing values check

In [None]:
# number of missing values in dataset
df.isnull().sum().values.sum()

### Boom! We have missing values, but why? Let's roll up our sleeves and check.
#### Although a number of techniques are available, we will rely mostly on common sense to decide whether a value is missing for a reason or missing at random

In [None]:
missing = list()
for x in df.columns:
    if df[x].isnull().sum() != 0:
        print(x, df[x].isnull().sum())
        missing.append(x)

#### From the list above, we can rule out straight away: MonthlyRevenue, MonthlyMinutes, TotalRecurringCharge, RoamingCalls, PercChangeMinutes, PercChangeRevenues, Handsets, HandsetModels, CurrentEquipmentDays, OverageMinutes.
#### These features could simply have a zero value or not be present at all, so we'll 'safely' set them to zero. We'll investigate the remaining ones!
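Here is a sketch of zero-filling only the columns judged safe, while keeping a missing-value indicator so the model can still tell "was missing" apart from "was zero". The `*_was_missing` indicator columns are a hypothetical addition and not part of this notebook's pipeline (which later uses `df.fillna(0)`). The toy frame stands in for `df`:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for df
toy = pd.DataFrame({
    'MonthlyRevenue': [24.0, np.nan, 57.5],
    'OverageMinutes': [np.nan, 3.0, 0.0],
    'AgeHH1': [30.0, np.nan, 0.0],
})

zero_fill_cols = ['MonthlyRevenue', 'OverageMinutes']  # columns judged safe to zero-fill

for col in zero_fill_cols:
    # Optional indicator: 1.0 where the value was originally missing
    toy[f'{col}_was_missing'] = toy[col].isnull().astype(float)
    toy[col] = toy[col].fillna(0)

# AgeHH1 is left untouched, to be investigated separately
print(toy)
```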

In [None]:
# First off, what on earth does this feature name mean? Is the Director assisting calls?
df.DirectorAssistedCalls.describe()

#### The stats show that most values are zero, so these could be calls to the call centre where the customer said
### 'I want to speak to your MANAGER!'

In [None]:
plt.figure(figsize=(7,7))
plt.grid(True)
sns.distplot(df.DirectorAssistedCalls.fillna(0))
plt.xlim(right=25)

Does it have an impact on churn?

In [None]:
df.Churn[df.DirectorAssistedCalls != 0].value_counts()

Hmm, interesting. Recall from above that the churn population in the dataset is 14711, and this feature captures a lot of churners. To be fair, though, it captures far more non-churners.

In [None]:
# ratios against the whole customer population
df.Churn[df.DirectorAssistedCalls != 0].value_counts()/df.shape[0]

### I think we can safely zero-fill this feature

### We're left with missing values in Age; let's look at the stats first

In [None]:
df.AgeHH1.describe()

In [None]:
df.AgeHH2.describe()

### How many records have an age of 0 in both AgeHH1 and AgeHH2?
This could be due to many reasons:
> customers being given the option not to report their age

> a 'clerical' error

> a recent change in the system

> a crashed ETL process


In [None]:
df.Churn[(df.AgeHH1.fillna(0) == 0)&(df.AgeHH2.fillna(0) == 0)].value_counts()

### Some missing-values conclusions.
In my opinion, it is safe to substitute 0 for all missing values in this dataset.
As shown above, some data could be missing at random. However, the vast majority is missing because of processes such as data gathering, data entry, or customer choices that result in missing information.

In [None]:
df = df.fillna(0)

### <a id='1.3'>1.3. Churn breakdown</a>

## Let's take a look at churn and how it's reflected in some of the features

In [None]:
#get a list of categoricals
categoricals = list()
for x in df.columns:
    if df[x].dtype == 'object':
        categoricals.append(x)

In [None]:
df[categoricals].nunique()

### Most of the categoricals are binary, so let's see how churn is reflected in some of the other variables

In [None]:
def plot_val_counts(df, col=''):
    plt.figure(figsize=(5,5))
    plt.grid(True)
    plt.bar(df[col][df.Churn=='Yes'].value_counts().index, 
            df[col][df.Churn=='Yes'].value_counts().values)
    plt.title(f'{col}')
    plt.xticks(rotation=-90)

In [None]:
plot_val_counts(df, col='HandsetPrice')

In [None]:
plot_val_counts(df, col='CreditRating')

In [None]:
plot_val_counts(df, col='Occupation')

In [None]:
plot_val_counts(df, col='PrizmCode')
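`plot_val_counts` shows how many churners fall into each category, and that count also reflects how large each category is. Dividing by the category size gives a churn rate, which is easier to compare across categories. Below is a minimal sketch that uses a hypothetical helper `churn_rate_by` and made-up toy data in place of `df`:

```python
import pandas as pd

def churn_rate_by(frame, col):
    """Share of customers with Churn == 'Yes' within each value of `col` (hypothetical helper)."""
    is_churner = frame['Churn'] == 'Yes'
    return is_churner.groupby(frame[col]).mean().sort_values(ascending=False)

# Toy data standing in for df
toy = pd.DataFrame({
    'CreditRating': ['1-Highest', '1-Highest', '2-High', '2-High', '2-High', '3-Good'],
    'Churn':        ['Yes',       'No',        'No',     'No',     'Yes',    'No'],
})

rates = churn_rate_by(toy, 'CreditRating')
print(rates)
```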

### <a id='1.4'>1.4. Data distributions against Churn flag</a>

In [None]:
def plot_distro(df, col = '', y_limit=None, x_limit_r=None, x_limit_l = None):
    plt.figure(figsize=(10,10))
    plt.grid(True)
    sns.distplot(df[col][df.Churn == 'Yes'])
    sns.distplot(df[col][df.Churn == 'No'])
    plt.legend(['churn_flag_yes', 'churn_flag_no'])
    if y_limit:
        plt.ylim(top=y_limit)
    if x_limit_r:
        plt.xlim(right=x_limit_r)
    if x_limit_l:
        plt.xlim(left=x_limit_l)

In [None]:
plot_distro(df, col='PercChangeMinutes', x_limit_r=1200, x_limit_l=-1200)

In [None]:
plot_distro(df, col='TotalRecurringCharge', x_limit_r=180)

In [None]:
plot_distro(df, col='DirectorAssistedCalls', y_limit=.3, x_limit_r=10)

In [None]:
plot_distro(df, col='MonthlyRevenue', x_limit_r=200)

In [None]:
plt.figure(figsize=(10,10))
plt.grid(True)
sns.boxplot(x=df.Occupation[df.Churn == 'Yes'], y=df.MonthlyRevenue[df.Churn == 'Yes'])
#sns.boxplot(x=df.Occupation[df.Churn == 'No'], y=df.MonthlyRevenue[df.Churn == 'No'])

### Not much of a difference is visible from this perspective, so let's zoom in

In [None]:
plt.figure(figsize=(10,10))
plt.grid(True)
sns.boxplot(x=df.Occupation, y=df.MonthlyRevenue, hue=df.Churn)
plt.ylim(top=100)

In [None]:
plt.figure(figsize=(10,10))
plt.grid(True)
sns.boxplot(x=df.ChildrenInHH, y=df.MonthlyRevenue, hue=df.Churn)
#sns.boxplot(x=df.ChildrenInHH[df.Churn == 'No'], y=df.MonthlyRevenue[df.Churn == 'No'])
plt.ylim(top=150)

### Note that the differences are subtle; there is no obvious relationship between Occupation and churn

### Customer tenure and churn

In [None]:
df.MonthsInService.describe()

In [None]:
tenure_churn = df.MonthsInService[df.Churn == 'Yes'].value_counts()
tenure_no_churn = df.MonthsInService[df.Churn == 'No'].value_counts()

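In [None]:
# Align the churn / no-churn counts on the tenure value (inner join, like the original merge).
# Renaming the series avoids relying on the column names produced by reset_index(),
# which differ between pandas 1.x ('index') and pandas 2.x ('MonthsInService' / 'count').
tenure = pd.concat([tenure_churn.rename('MonthsInService_x'),
                    tenure_no_churn.rename('MonthsInService_y')], axis=1, join='inner')
tenure = tenure.sort_index().rename_axis('index').reset_index()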

In [None]:
tenure.columns

In [None]:
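plt.figure(figsize=(10,10))
plt.grid(True)
# red: churners, green: retained customers, per month of tenure
sns.pointplot(x=tenure['index'], y=tenure.MonthsInService_x, color='red')
sns.pointplot(x=tenure['index'], y=tenure.MonthsInService_y, color='green')
plt.xticks(rotation=90)
plt.xlabel('MonthsInService')
plt.ylabel('Number of customers')
plt.title('When churn peaks')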

### <a id='2'>2. Feature preparation</a>

### Let's look at the types of features and how we could categorize them

In [None]:
def get_lists_of_dtypes(df):
    """
    Helper function to create list of features by type and by number of unique
    values they consist of.
    """
    strings = list()
    integers = list()
    floats = list()
    # Bucket each feature by the first three characters of its dtype name,
    # as there may be several int and float widths (int64, float32, ...).
    # Strings are returned as dicts, to keep the number of unique values for each feature.
    # The first two columns (customer ID and Churn) are skipped.
    for x in df.columns[2:]:
        if str(df[x].dtype)[:3] == 'obj':
            strings.append({x:len(df[x].unique())})
        elif str(df[x].dtype)[:3] == 'int':
            integers.append(x)
        elif str(df[x].dtype)[:3] == 'flo':
            floats.append(x)
        else:
            continue
    return strings,integers, floats

In [None]:
s, i, f = get_lists_of_dtypes(df)

## Let's investigate the strings and their number of unique values
### Below we can see that the features with 2 unique values are binary (e.g. Yes/No),
### so they only need mapping to [0, 1] rather than one-hot encoding.
### We will one-hot encode the remaining features, including ServiceArea despite its 748 unique values,
### as this feature might have an impact on the model: areas differ in population density
### and therefore might have a higher or lower churn propensity

In [None]:
s
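The encoding rule is simple: two-valued string features are mapped to 0/1, and features with more values are one-hot encoded. It can be illustrated on a toy frame. This sketch uses plain pandas (a comparison with `'Yes'` and `get_dummies`) rather than the notebook's `LabelBinarizer`. It assumes Yes/No values for the binary column, and the column values are made up:

```python
import pandas as pd

toy = pd.DataFrame({
    'ChildrenInHH': ['Yes', 'No', 'No'],                 # 2 unique values -> binary 0/1
    'Occupation': ['Professional', 'Retired', 'Other'],  # >2 unique values -> one-hot
})

n_unique = toy.nunique()
binary_cols = [c for c in toy.columns if n_unique[c] <= 2]
one_hot_cols = [c for c in toy.columns if n_unique[c] > 2]

encoded = toy.copy()
for c in binary_cols:
    encoded[c] = (encoded[c] == 'Yes').astype(float)  # assumes Yes/No values
encoded = pd.get_dummies(encoded, columns=one_hot_cols, dtype=float)
print(encoded)
```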

### <a id='2.1'>2.1. Feature engineering</a>

In [None]:
def prep_categorical_features(s):
    """
    helper function to split categorical features into those we want to one hot encode (more than 2 unique values) and binary ones
    """
    one_hot = list()
    binary = list()
    for x in s:
        for k, v in x.items():
            if v > 2:
                one_hot.append(k)
            else:
                binary.append(k)
    return one_hot, binary

In [None]:
one_hot, binary = prep_categorical_features(s)

### Prep numerical pairwise features
### As in many feature engineering approaches, we will use some well-known tricks:
> pairwise sum, diff, ratio

> min, max, mean

### Let's write functions to do just that
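For comparison, here is a compact variant of the pairwise idea. It is not the function used below. `itertools.combinations` visits each unordered pair once, whereas a double loop over all ordered pairs produces both `sum_x_e` and `sum_e_x`, which hold the same values. Zero denominators become NaN instead of `inf`. The helper name `pairwise_features` and the toy columns are hypothetical:

```python
import itertools
import numpy as np
import pandas as pd

def pairwise_features(frame, cols):
    """Sum, difference and ratio for each unordered pair of columns (hypothetical helper)."""
    new_cols = {}
    for a, b in itertools.combinations(cols, 2):
        new_cols[f'sum_{a}_{b}'] = frame[a] + frame[b]
        new_cols[f'diff_{a}_{b}'] = frame[a] - frame[b]
        # replace zero denominators with NaN so the ratio is NaN rather than +/-inf
        new_cols[f'ratio_{a}_{b}'] = frame[a] / frame[b].replace(0, np.nan)
    # build all new columns at once instead of inserting them one by one
    return pd.concat([frame, pd.DataFrame(new_cols, index=frame.index)], axis=1)

toy = pd.DataFrame({'MonthlyMinutes': [100.0, 50.0],
                    'OverageMinutes': [10.0, 0.0],
                    'RoamingCalls': [1.0, 2.0]})
out = pairwise_features(toy, list(toy.columns))
print(out.shape)  # 3 original columns + 3 pairs x 3 features = 12 columns
```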

In [None]:
def pairwise(col_1, col_2):
    """
    calculates pairwise features
    for given two dataframe columns
    """
    tot = col_1 + col_2
    diff = col_1 - col_2
    ratio = col_1/col_2
    return tot, diff, ratio

In [None]:
def stats(col):
    """
    calculates stats for given
    dataframe column
    """
    mini = col.min()
    maxi = col.max()
    avg = col.mean()
    return mini, maxi, avg

In [None]:
def feature_engine_numericals(dff, i, f):
    """
    Expands dataframe based on current lists of
    numerical features (int, floats)
    """
    numericals = i + f
    df = dff.copy()
    for x in numericals:
        # stats() returns scalars, so these three columns are constant across rows
        df[f'{x}_min'], df[f'{x}_max'], df[f'{x}_mean'] = stats(df[x])
        for e in numericals:
            if e == x:
                continue
            df[f'sum_{x}_{e}'], df[f'diff_{x}_{e}'], df[f'ratio_{x}_{e}'] = pairwise(df[x], df[e])
    return df

In [None]:
%%time
pair_df = feature_engine_numericals(df, i, f)

### Feature engineering: categorical transformations

In [None]:
def feature_engine_categoricals(dff, binary, one_hot):
    """
    Expands the dataframe by label-binarizing the binary categoricals
    and one-hot encoding the rest; the new columns are float
    """
    df = dff.copy()
    lb = LabelBinarizer()
    for b in binary:
        df[f'{b}_tr'] = lb.fit_transform(df[b]).astype(np.float64)
        df = df.drop(columns=b)
    df = pd.get_dummies(df, columns=one_hot, dtype=float)
    return df

In [None]:
%%time
pair_df = feature_engine_categoricals(pair_df, binary, one_hot)
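For a two-valued column, `LabelBinarizer` sorts the classes and returns a single column in which the alphabetically second class is 1. So `'No'` becomes 0 and `'Yes'` becomes 1. Here is a quick check on toy values:

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

lb = LabelBinarizer()
encoded = lb.fit_transform(['Yes', 'No', 'No', 'Yes'])

print(lb.classes_)                          # ['No' 'Yes']
print(encoded.shape)                        # (4, 1): one column for a binary problem
print(encoded.ravel().astype(np.float64))   # [1. 0. 0. 1.]
```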

In [None]:
pair_df.shape, df.shape

In [None]:
pair_df.head()

### <a id='3'>3. Modelling</a>

### Encode labels and separate features and labels

In [None]:
le = LabelEncoder()

In [None]:
lab = le.fit_transform(pair_df.Churn).astype(np.float64)

In [None]:
l = pd.DataFrame({'lbls':pair_df.Churn, 'l_tr':lab})

In [None]:
l.head()

> 1 - customers who have churned

> 0 - customers retained

In [None]:
feats = pair_df.iloc[:,2:]  # skip the first two columns: customer ID and the Churn label

### Split for train test

In [None]:
x_train, x_test, y_train, y_test = train_test_split(feats, lab, test_size = .25, random_state = 7)

### Split the training set into train and eval

In [None]:
x_tr, x_ev, y_tr, y_ev = train_test_split(x_train, y_train, test_size = .05, random_state = 7)

In [None]:
x_tr.shape, x_ev.shape, y_tr.dtype, y_ev.dtype
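Because of the class imbalance noted earlier, passing `stratify=` to `train_test_split` keeps the churn share the same in every split. The splits above don't do this, so their churn shares can drift slightly. Below is a sketch on toy labels. Stratification is an optional alternative here, not what this notebook does:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 100 rows, 30% positives (standing in for feats / lab)
X = np.arange(200).reshape(100, 2)
y = np.array([1.0] * 30 + [0.0] * 70)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=7, stratify=y)

print(y_train.mean(), y_test.mean())  # both close to 0.3
```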

In [None]:
train_data = lgbm.Dataset(data=x_tr, label=y_tr)
val_data = lgbm.Dataset(data=x_ev, label=y_ev)

### Model params

In [None]:
# tuning copied from https://www.kaggle.com/avanwyk/a-lightgbm-overview
# Note that there is no param search here, as this is meant to be a baseline model.

advanced_params = {
    'boosting_type': 'gbdt',
    'objective': 'binary',
    'metric': 'auc',
    
    'learning_rate': 0.1,
    'num_leaves': 141, # more leaves increases accuracy, but may lead to overfitting.
    # In practice capped by max_depth=7 below (at most 2**7 = 128 leaves).
    
    'max_depth': 7, # the maximum tree depth. Shallower trees reduce overfitting.
    'min_split_gain': 0, # minimal loss gain to perform a split
    'min_child_samples': 21, # or min_data_in_leaf: specifies the minimum samples per leaf node.
    'min_child_weight': 5, # minimal sum hessian in one leaf. Controls overfitting.
    
    'lambda_l1': 0.5, # L1 regularization
    'lambda_l2': 0.5, # L2 regularization
    
    'feature_fraction': 0.7, # randomly select a fraction of the features before building each tree.
    # Speeds up training and controls overfitting.
    'bagging_fraction': 0.5, # allows for bagging or subsampling of data to speed up training.
    'bagging_freq': 0, # perform bagging on every Kth iteration, disabled if 0.
    # With bagging_freq=0, bagging is disabled, so bagging_fraction above has no effect.
    
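    'scale_pos_weight': 99, # add a weight to the positive class examples (compensates for imbalance).
    # Copied as-is: 99 is far above the no-churn/churn ratio seen in the class sizes above,
    # which pushes the predicted scores towards 1.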
    
    'subsample_for_bin': 200000, # amount of data to sample to determine histogram bins
    'max_bin': 1000, # the maximum number of bins to bucket feature values in.
    # LightGBM autocompresses memory based on this value. Larger bins improves accuracy.
    
    'nthread': 4, # number of threads to use for LightGBM, best set to number of actual cores.
}

### Model train

In [None]:
# train function adapted from https://www.kaggle.com/avanwyk/a-lightgbm-overview
# LightGBM 4.x removed the early_stopping_rounds, evals_result and verbose_eval
# arguments of lgbm.train, so the same behaviour is configured through callbacks.
def train_gbm(params, training_set, validation_set, init_gbm=None, boost_rounds=100, early_stopping_rounds=0, metric='auc'):
    evals_result = {}  # dict to store evaluation results in
    callbacks = [lgbm.record_evaluation(evals_result),  # fill evals_result during training
                 lgbm.log_evaluation(period=1)]  # print evaluations during training
    if early_stopping_rounds > 0:
        # stop training if *no* metric improves on *any* validation data for this many rounds
        callbacks.append(lgbm.early_stopping(stopping_rounds=early_stopping_rounds))

    gbm = lgbm.train(params, # parameter dict to use
                    training_set,
                    init_model=init_gbm, # initial model to use, for continuous training.
                    num_boost_round=boost_rounds, # the boosting rounds or number of iterations.
                    valid_sets=[validation_set],
                    callbacks=callbacks)

    return gbm, evals_result

In [None]:
gbm, evals_result = train_gbm(advanced_params, training_set=train_data, validation_set=val_data,
                             boost_rounds=1000, early_stopping_rounds=50)

In [None]:
y_hat = gbm.predict(x_test)

### <a id='3.1'>3.1. Analysing and transforming results</a>

In [None]:
test_res = pd.DataFrame({'y_true':y_test, 'y_hat':y_hat})

In [None]:
test_res.y_hat[test_res.y_true == 0].shape, test_res.y_hat[test_res.y_true == 1].shape

In [None]:
roc_auc_score(test_res.y_true, test_res.y_hat)

In [None]:
test_res.y_hat[test_res.y_true == 0].describe()

In [None]:
test_res.y_hat[test_res.y_true == 1].describe()

In [None]:
def plot_distro(df, col = '', filter_col = '', y_limit=None, x_limit_r=None, x_limit_l = None):
    plt.figure(figsize=(10,10))
    plt.grid(True)
    sns.distplot(df[col][df[filter_col] == 1])
    sns.distplot(df[col][df[filter_col] == 0])
    plt.legend(['churn_flag_yes', 'churn_flag_no'])
    if y_limit:
        plt.ylim(top=y_limit)
    if x_limit_r:
        plt.xlim(right=x_limit_r)
    if x_limit_l:
        plt.xlim(left=x_limit_l)

In [None]:
plot_distro(test_res, col = 'y_hat', filter_col = 'y_true', y_limit=None, x_limit_r=None, x_limit_l = .8)

### Let's transform the results to separate the two distributions as far as possible

### <a id='3.2'>3.2. Threshold selection</a>

In [None]:
plt.figure(figsize=(10,10))
plt.grid(True)
sns.distplot(np.exp(test_res.y_hat[test_res.y_true == 0]), color='green')
sns.distplot(np.exp(test_res.y_hat[test_res.y_true == 1]), color='red')
plt.title('Distribution of the results, for two classes')
plt.legend(['no_churn', 'churn'])

In [None]:
plt.figure(figsize=(12,12))
plt.grid(True)
sns.distplot(1/np.log(test_res.y_hat[test_res.y_true == 0]), color='green')
sns.distplot(1/np.log(test_res.y_hat[test_res.y_true == 1]), color='red')
plt.plot([-36.3, -36.3], [0, 0.024], 'bo--', linewidth=2.5)
plt.plot([-45, -45], [0, 0.024], 'go--', linewidth=2.5)
plt.title('Distribution of the results, for two classes with upper thresholds')

plt.legend(['best_auc_threshold','threshold_business', 'no_churn', 'churn'])
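For predicted probabilities strictly between 0 and 1, `1/np.log(p)` is a strictly decreasing function of `p`. It reorders nothing and only stretches the scale. As a result, a threshold on the transformed value is a threshold on the probability with the direction flipped, and ROC AUC is unchanged once the sign is flipped. The quick check below uses synthetic scores, not the notebook's predictions:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=500)
# synthetic scores in (0, 1), loosely related to the label
p = np.clip(0.5 + 0.3 * (y_true - 0.5) + rng.normal(0, 0.2, size=500), 0.01, 0.99)

transformed = 1 / np.log(p)  # strictly decreasing in p on (0, 1)

auc_p = roc_auc_score(y_true, p)
auc_t = roc_auc_score(y_true, -transformed)  # flip the sign so larger = more likely churn

print(auc_p, auc_t)  # the same value
```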

### Now we can investigate how to select the threshold for setting churn flag = 1
### The plot above shows that there's no perfect separation, as is always the case in real life
### Therefore, we have to look for the so-called 'sweet spot', where we maximise the impact of the model
### In industry, the threshold is usually determined by the business that uses the model
### In Kaggle, we look for the threshold with the best AUC, if that's the leaderboard metric
### I will focus on the business case here. You all know how to do it the other way ;)
#### From the descriptive stats below, we could start at -39

In [None]:
# parentheses matter: transform first, then describe the transformed values
(1/np.log(test_res['y_hat'][test_res['y_true'] == 1])).describe()
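Since `y_transformed = 1/ln(y_hat)`, the rule `y_transformed < t` for a negative threshold `t` is the same as `y_hat > exp(1/t)`. Converting the thresholds used in this notebook back to the probability scale can make them easier to communicate. The helper name below is hypothetical:

```python
import numpy as np

def transformed_threshold_to_prob(t):
    """Probability cut-off equivalent to the rule 1/ln(p) < t, for t < 0 (hypothetical helper)."""
    if t >= 0:
        raise ValueError('expected a negative threshold on the 1/ln(p) scale')
    return float(np.exp(1.0 / t))

for t in (-36.3, -39, -43, -45):
    print(t, round(transformed_threshold_to_prob(t), 4))
# -36.3 0.9728
# -39 0.9747
# -43 0.977
# -45 0.978
```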

### Transform results accordingly

In [None]:
test_res['y_transformed'] = 1/np.log(test_res['y_hat'])

In [None]:
def plot_roc_curve(test_res, threshold = -39):
    # Flag churn when the transformed score is below the threshold
    y_flag = np.where(test_res.y_transformed < threshold, 1, 0)
    fpr, tpr, _ = roc_curve(test_res.y_true, y_flag)
    # No-skill baseline: the same score for every customer
    ns_probs = np.zeros(len(test_res))
    ns_fpr, ns_tpr, _ = roc_curve(test_res.y_true, ns_probs)
    roc_auc = auc(fpr, tpr)
    plt.figure(figsize=(10,10))
    plt.grid(True)
    plt.title("ROC Curve. Area under Curve: {:.3f}".format(roc_auc))
    plt.xlabel('False Positive Rate')
    plt.ylabel('True Positive Rate')
    plt.plot(fpr, tpr, 'r')
    plt.plot(ns_fpr, ns_tpr, 'b', ls='--')

In [None]:
plot_roc_curve(test_res, -36.3)

In [None]:
plot_roc_curve(test_res, -45)
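`plot_roc_curve` passes hard 0/1 predictions to `roc_curve`, so the curve has a single interior point. Its area then reduces to (TPR + TNR) / 2, which is the balanced accuracy at that threshold. Here is a check on synthetic labels:

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score, roc_auc_score

rng = np.random.default_rng(42)
y_true = rng.integers(0, 2, size=1000)
# synthetic hard predictions that agree with the label about 70% of the time
y_pred = np.where(rng.random(1000) < 0.7, y_true, 1 - y_true)

auc_hard = roc_auc_score(y_true, y_pred)
bal_acc = balanced_accuracy_score(y_true, y_pred)
print(auc_hard, bal_acc)  # equal up to floating-point rounding
```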

### <a id='4'>4. The consequences of our model</a>

### Each model deployed in production, or considered for deployment, has consequences. This is rarely discussed in ML literature in these terms, as we Data Scientists tend to talk about True Positives, True Negatives, False Positives and False Negatives. We need to translate these terms into the business's jargon, especially to explain the impact and put a dollar figure on the results. In my experience, the simpler these terms are, the more successful you will be at productionizing the model, and that's what we aim for, right?
### So, here we will focus a bit on how the business sees this problem and what we Data Scientists might do to maximise the chance of success

In [None]:
test_res.y_transformed

### Here we'll look at confusion matrices, which show how a change in the threshold impacts the results. With the results in this form we have one foot in the door, but we still need more explanation.

### So, here is a little bit of industry context. Most companies these days have models with customer retention as the main goal. These models are run by Marketing teams, CVM (Customer Value Management) teams, or perhaps some other function. As a Data Scientist you need to gain their trust. They won't buy into a black-box solution, e.g. a Deep Learning model, right away; they will need to understand what's going on under the hood. We can achieve this in multiple ways, for example:
> Feature importances that make sense to the business; tree-based models are good for that, although the results are sometimes funny

> Confusion matrix explanation and business impact of the model

### <a id='4.1'>4.1 Feature Importance</a> 

In [None]:
lgbm.plot_importance(gbm, figsize=(10,12), max_num_features=15,importance_type='split' )

In [None]:
lgbm.plot_importance(gbm, figsize=(10,12), max_num_features=15,importance_type='gain' )

### From the feature importance plots above, we see that almost all of the most important features are engineered ones.

### <a id='4.2'>4.2 Unconfusing confusion matrices</a>

In [None]:
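def plot_conf_mat(cm):
    """
    Helper function to plot a confusion matrix
    with centred text annotations.
    """
    plt.figure(figsize=(8,8))
    ax = sns.heatmap(cm, annot=True,fmt="d",annot_kws={"size": 16})
    ax.set_xlabel('Predicted label')  # confusion_matrix columns: predictions
    ax.set_ylabel('True label')       # confusion_matrix rows: actual classes
    # Workaround for matplotlib 3.1.1, which cropped the top and bottom heatmap rows;
    # on other versions this only adds a little padding.
    bottom, top = ax.get_ylim()
    ax.set_ylim(bottom + 0.5, top - 0.5)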

### Here we visualise the matrices:
> first from the AUC viewpoint

> second from what I call the business viewpoint

In [None]:
# AUC
cm_auc = confusion_matrix(test_res.y_true, np.where(test_res.y_transformed < -36.3, 1, 0), labels=[0, 1])

In [None]:
plot_conf_mat(cm_auc)

In [None]:
# Business
cm_bus = confusion_matrix(test_res.y_true, np.where(test_res.y_transformed < -43, 1, 0), labels=[0, 1])

In [None]:
plot_conf_mat(cm_bus)

In [None]:
# Our testing population
test_res.y_true.value_counts()
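To talk to the business, the four cells of the matrix can be turned into campaign terms. Everyone predicted as churn is contacted (TP + FP). FP are customers who get an incentive they didn't need, and FN are churners the campaign misses. Below is a sketch with a hypothetical helper and toy labels. For a 2×2 `confusion_matrix` with `labels=[0, 1]`, `ravel()` returns TN, FP, FN, TP:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def campaign_summary(y_true, y_flag):
    """Translate a binary confusion matrix into campaign figures (hypothetical helper)."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_flag, labels=[0, 1]).ravel()
    targeted = tp + fp
    return {
        'targeted': int(targeted),          # customers who receive the offer
        'needed_offer': int(tp),            # targeted customers who would have churned
        'did_not_need_offer': int(fp),      # targeted customers who would have stayed
        'missed_churners': int(fn),         # churners not contacted
        'precision': tp / targeted if targeted else float('nan'),
    }

y_true = np.array([1, 0, 1, 0, 0, 1, 0, 0])
y_flag = np.array([1, 1, 0, 0, 1, 1, 0, 0])
summary = campaign_summary(y_true, y_flag)
print(summary)
```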

### <a id='4.3'>4.3 Campaigning the model</a>

### So now, let's see how the model would be campaigned, and what the impact of the model would be

In [None]:
test_res.head()

In [None]:
# first threshold to explore is AUC oriented -36.3
auc_based = test_res[test_res.y_transformed <= -36.3]
auc_based.y_true.value_counts()

In [None]:
auc_based.y_true.value_counts()/auc_based.shape[0]

### So, our model would target over 5k customers in a campaign: 60% of them wouldn't churn and 40% have a higher churn propensity. The model would offer incentives to 3k customers who don't need them and to 2k who might. The question is, how many of those 2k customers would we retain, and how many would churn anyway?

### Unfortunately, there's no ideal answer; we'd have to run the campaign and study the results with A/B tests

In [None]:
# second threshold to explore would be Business oriented -43
bus_based = test_res[test_res.y_transformed <= -43]
bus_based.y_true.value_counts()

In [None]:
bus_based.y_true.value_counts()/bus_based.shape[0]

### <a id='4.4'>4.4 Customer Lifetime Value</a>

### We now know that the business-oriented campaign will be smaller, targeting 3.5k customers, of whom 55% wouldn't churn and 45% have a higher churn propensity.

### One way to look at this problem from the business point of view is to consider Customer Lifetime Value (CLV).

### Where CLV = Avg. revenue per period x [ Retention Rate / (1 - Retention Rate) ] as per http://www.customer-lifetimevalue.com/

### Here we make the following assumptions: a period of 12 months and a retention rate of 0.6
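Here are these assumptions plugged into the formula above, with the average monthly revenue of 58.5 that the next cells use. This gives a worked value for a single customer, assuming the revenue per period is simply 12 times the monthly average:

$$
\text{CLV} = \underbrace{12 \times 58.5}_{\text{revenue per period}} \times \frac{0.6}{1 - 0.6} = 702 \times 1.5 = 1053
$$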

In [None]:
df.MonthlyRevenue.describe()

In [None]:
# average per month and retention rate
avg_pm = 58.5
ret_rate = .6

In [None]:
def clv(avg_pm, ret_rate):
    """
    Example calculation of CLV per year
    with assumed retention rate of customers.
    """
    clv = 12 * avg_pm / (ret_rate/(1-ret_rate))
    return clv

In [None]:
cust_lv = clv(avg_pm, ret_rate)

In [None]:
cust_lv

### If we assume that the retention campaign offers an extra 10% discount per month, we can see the impact right away for both the AUC and the business models.
### To see it, let's assume that the campaign retains 50% of the customers with high churn propensity
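One way to arrive at the customer counts used below is to start from the rounded campaign sizes and shares quoted in 4.3. Every targeted customer who would not have churned keeps the discount, plus the assumed 50% of targeted churners who are retained. Because this sketch uses rounded figures, it lands near the 4000 and 2650 below rather than exactly on them; the exact counts would come from the `value_counts` outputs above. The helper name is hypothetical:

```python
def discounted_customers(targeted, churn_share, retained_share=0.5):
    """Customers paying the discounted price after the campaign (hypothetical helper).

    Targeted non-churners keep the discount; of the targeted churners,
    only the assumed `retained_share` stay.
    """
    non_churners = targeted * (1 - churn_share)
    retained_churners = targeted * churn_share * retained_share
    return non_churners + retained_churners

auc_customers = discounted_customers(5000, 0.40)   # rounded figures from section 4.3
bus_customers = discounted_customers(3500, 0.45)
print(auc_customers, bus_customers)  # about 4000 and about 2712.5
```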

In [None]:
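# AUC model: ~3k targeted non-churners + 50% of ~2k targeted churners, so about 4000 customers
# business model: ~1.9k targeted non-churners + 50% of ~1.6k targeted churners, so roughly 2650 customers
# the assumed retention rate stays the same; only the monthly revenue drops by the 10% discount
new_avg = avg_pm * .9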

In [None]:
new_avg

In [None]:
campaign_clv = clv(new_avg, ret_rate)

In [None]:
campaign_clv

In [None]:
auc_campaign = 4000 * campaign_clv
bus_campaign = 2650 * campaign_clv

In [None]:
auc_campaign, bus_campaign

### The AUC campaign gives a 10% discount to ~3k customers who wouldn't have churned; the business campaign gives it to ~1.9k. During the next retention cycle, these customers would expect another loyalty discount. In my opinion, it is better to optimise campaigns for precision

In [None]:
pr = precision_score(test_res.y_true, np.where(test_res.y_transformed < -43, 1, 0))
pr

In [None]:
acc = accuracy_score(test_res.y_true, np.where(test_res.y_transformed < -43, 1, 0))
acc

In [None]:
precision_score(test_res.y_true, np.where(test_res.y_transformed < -36.3, 1, 0))

In [None]:
accuracy_score(test_res.y_true, np.where(test_res.y_transformed < -36.3, 1, 0))

In [None]:
f1_score(test_res.y_true, np.where(test_res.y_transformed < -43, 1, 0))

In [None]:
f1_score(test_res.y_true, np.where(test_res.y_transformed < -36.3, 1, 0))
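Instead of comparing two hand-picked thresholds, a small sweep over candidate thresholds shows how campaign size, precision and recall trade off. That kind of table is something the business can pick a threshold from. The sketch below runs on synthetic scores. With the notebook's data, one would pass `test_res.y_true`, `test_res.y_transformed` and negative thresholds such as -36.3 and -43. The function name is hypothetical:

```python
import numpy as np
import pandas as pd
from sklearn.metrics import precision_score, recall_score

def threshold_table(y_true, score, thresholds):
    """Campaign size, precision and recall when flagging score < threshold (hypothetical helper)."""
    rows = []
    for t in thresholds:
        flag = np.where(score < t, 1, 0)
        rows.append({
            'threshold': t,
            'targeted': int(flag.sum()),
            'precision': precision_score(y_true, flag, zero_division=0),
            'recall': recall_score(y_true, flag, zero_division=0),
        })
    return pd.DataFrame(rows)

# synthetic data: lower scores for churners, mimicking the direction of the 1/ln(p) scale
rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, size=1000)
score = rng.normal(-30, 5, size=1000) - 8 * y_true

table = threshold_table(y_true, score, [-30, -35, -40, -45])
print(table)
```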

### <a id='5'>5. Conclusions:</a>

### In this example we looked at the churn problem through a somewhat in-depth analysis. This example should also show you the journey that most models built in industry go through. Before you educate the business in the terms Data Scientists use, you might want to simplify your solution to one you can fully explain. Very few business units would sign up for a solution they don't trust and understand. Building a trustworthy solution is a rather lengthy process, so be patient and keep going; it will work eventually.
### I would like to stress that this is not a full solution. I was inspired by this excellent example of analysis, visualisation and multiple models in this kernel https://www.kaggle.com/pavanraj159/telecom-customer-churn-prediction.

## Next steps:
### Build multiple models to improve precision or accuracy. Which of the two to target you can always get from the business, as they know their costs once you explain the upside and downside of each solution. Set up A/B tests to see how well campaigns do across treatment and control groups. Consider lift modelling to optimise a specific KPI or business target. You might also like my github repo 😊 https://github.com/jackall/churn_analysis, and upvote this kernel 😊 if you found it useful.