## <center>  Decision trees with a toy task and the Telecom dataset 


We will apply decision tree algorithms to (1) a built-in toy dataset and (2) the Telecom churn dataset, in three variants: 2.1 without hyperparameter tuning, 2.2 with hyperparameter tuning using grid search and stratified K-fold cross-validation, and 2.3 with manual stratified K-fold cross-validation, without grid search.

Loading all necessary libraries.

In [None]:
pip install pydotplus

In [None]:
%matplotlib inline
from matplotlib import pyplot as plt
plt.rcParams['figure.figsize'] = (10, 8)
import seaborn as sns
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder
import collections
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn import preprocessing
from sklearn.tree import DecisionTreeClassifier, export_graphviz
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from ipywidgets import Image
from io import StringIO
import pydotplus 
from sklearn.model_selection import train_test_split, cross_val_score
from tqdm import tqdm_notebook

### Part 1. Toy dataset "Will They? Won't They?"

Your goal is to figure out how decision trees work by walking through a toy problem. While a single decision tree does not yield outstanding results, other performant algorithms like gradient boosting and random forests are based on the same idea. That is why knowing how decision trees work might be useful.
We'll go through a toy example of binary classification - Person A is deciding whether they will go on a second date with Person B. It will depend on their looks, eloquence, alcohol consumption (only for example), and how much money was spent on the first date.


Creating the dataset


In [None]:
#Creating dataframe with dummy variables
def df(dic,features):
    out=pd.DataFrame(dic)
    out=pd.concat([out,pd.get_dummies(out[features])],axis=1)
    out.drop(features,axis=1,inplace=True)
    return out

# Keep only the features present in both train and test.
def int_features(train, test):
    com_feat = sorted(set(train.keys()) & set(test.keys()))  # sorted for a stable column order
    return train[com_feat], test[com_feat]

In [None]:
feat=['Looks', 'Alcoholic_beverage','Eloquence','Money_spent']

Train data

In [None]:
df_train = {}
df_train['Looks'] = ['handsome', 'handsome', 'handsome', 'repulsive',
                         'repulsive', 'repulsive', 'handsome'] 
df_train['Alcoholic_beverage'] = ['yes', 'yes', 'no', 'no', 'yes', 'yes', 'yes']
df_train['Eloquence'] = ['high', 'low', 'average', 'average', 'low',
                                   'high', 'average']
df_train['Money_spent'] = ['lots', 'little', 'lots', 'little', 'lots',
                                  'lots', 'lots']
df_train['Will_go'] = LabelEncoder().fit_transform(['+', '-', '+', '-', '-', '+', '+'])

df_train = df(df_train, feat)
df_train

Test Data

In [None]:
df_test = {}
df_test['Looks'] = ['handsome', 'handsome', 'repulsive'] 
df_test['Alcoholic_beverage'] = ['no', 'yes', 'yes']
df_test['Eloquence'] = ['average', 'high', 'average']
df_test['Money_spent'] = ['lots', 'little', 'lots']
df_test = df(df_test, feat)
df_test

In [None]:
# Separate the target from the features. Features present in train but not in
# test (e.g. Eloquence_low) still need aligning, e.g. with int_features, before predicting.
y = df_train.pop('Will_go')
df_train
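One-hot encoding can leave the train and test frames with different columns, because a category that occurs only in the training data produces a dummy column the test frame never gets. Below is a minimal sketch of that check using plain Python sets. The column names are written out by hand, on the assumption that `pd.get_dummies` names them `<feature>_<value>`; they are not read from the actual frames.

```python
# Column names as pd.get_dummies would be expected to produce them for the
# toy data (written out by hand for illustration).
train_cols = {
    "Looks_handsome", "Looks_repulsive",
    "Alcoholic_beverage_no", "Alcoholic_beverage_yes",
    "Eloquence_average", "Eloquence_high", "Eloquence_low",
    "Money_spent_little", "Money_spent_lots",
}
# The test data contains no 'low' eloquence, so that dummy column is never created.
test_cols = train_cols - {"Eloquence_low"}

only_in_train = sorted(train_cols - test_cols)
common = sorted(train_cols & test_cols)  # the columns int_features would keep

print(only_in_train)
print(len(common))
```

A tree fitted on all nine training columns could not be applied to the eight-column test frame directly, which is why the intersection step matters before predicting.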

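What is the entropy $S_0$ of the initial system? By system states, we mean values of the binary feature "Will_go" - 0 or 1 - two states in total.

The training target has 4 positive and 3 negative examples out of 7, so $S_0 = -\frac{3}{7}\log_2\frac{3}{7} - \frac{4}{7}\log_2\frac{4}{7} \approx 0.524 + 0.461 = 0.985$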
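The initial entropy can be checked numerically. The sketch below uses a small helper, `entropy`, which is introduced here for illustration and is not part of the notebook. It takes the class counts of the training target, which has four '+' and three '-' labels.

```python
import math

def entropy(counts):
    """Shannon entropy (in bits) of a distribution given as class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# Will_go in the training data: four '+' and three '-'
s0 = entropy([4, 3])
print(round(s0, 3))  # 0.985
```

A value close to 1 reflects that the two classes are almost evenly balanced in the toy training set.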

**Train decision tree using sklearn classifier**

In [None]:
tree=DecisionTreeClassifier(criterion='entropy', random_state=12) 
tree.fit(df_train, y) 

### The "Telecom" dataset

Each row represents a customer; each column contains one of the customer’s attributes. The dataset has the following attributes (features):

- State: string
- Account length: integer
- Area code: integer
- International plan: string
- Voice mail plan: string
- Number vmail messages: integer
- Total day minutes: double
- Total day calls: integer
- Total day charge: double
- Total eve minutes: double
- Total eve calls: integer
- Total eve charge: double
- Total night minutes: double
- Total night calls: integer
- Total night charge: double
- Total intl minutes: double
- Total intl calls: integer
- Total intl charge: double
- Customer service calls: integer
- Churn: string


The dataset contains 3333 rows (customers) and 20 columns (19 features plus the target); this matches the 2333 training clients left after the 70/30 split used below.

The "Churn" column is the target to predict.

Reading the dataset

In [None]:
data = pd.read_csv('../input/edadata/telecom_churn.csv')

In [None]:
data.head()

**Slight preprocessing**

Encode the International plan and Voice mail plan columns as 1 (Yes) and 0 (No).

In [None]:
data['International plan']= data['International plan'].map({'Yes':1,'No':0})

data['Voice mail plan']= data['Voice mail plan'].map({'Yes':1,'No':0})

Convert the Churn variable into 0s and 1s.

In [None]:
data['Churn']=data['Churn'].astype('int')

Primary analysis of the data.

In [None]:
data.info()

Check for missing values in the dataset.

In [None]:
data.isna().sum()

Remove the State column from the dataframe and save it as a separate Series.

In [None]:
state=data.pop('State')

**Split the dataframe into x matrix and y target vector.**

In [None]:
x,y = data.drop(['Churn'], axis=1),data['Churn']
    

In [None]:
x.shape,y.shape

The matrix and the vector have the same number of rows, and the matrix keeps all the feature columns.


Split x and y into training and validation sets.

In [None]:
x_train,x_valid,y_train,y_valid=train_test_split(x,y,test_size=0.3,random_state=19)
x_train.shape,x_valid.shape,y_train.shape,y_valid.shape

Building the classifier.

In [None]:
tk=DecisionTreeClassifier(random_state=19)

Fit the classifier on the training data.

In [None]:
tk.fit(x_train,y_train)

Predict on the validation set.


In [None]:
pre_valid=tk.predict(x_valid)

In [None]:
pre_valid.shape,y_valid.shape

In [None]:
accuracy_score(pre_valid,y_valid)

In [None]:
y.value_counts(normalize=True)



The value for Churn = 1 is the share of clients who churn. A baseline that always predicts "no churn" would therefore be wrong 14.4% of the time, which is considerably worse than the 8.7% error rate (91.3% accuracy) of the decision tree classifier.
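The comparison with the class shares amounts to a majority-class baseline: a model that always predicts the most frequent class. Below is a minimal sketch. The helper name and the label list are hypothetical; the labels are constructed only to reproduce the 14.4% churn share and are not the real data.

```python
def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most frequent class."""
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return max(counts.values()) / len(labels)

# Hypothetical labels with a 14.4% churn share (1 = churn, 0 = loyal).
labels = [1] * 144 + [0] * 856
baseline = majority_baseline_accuracy(labels)

print(baseline)                # accuracy of always predicting "no churn"
print(round(1 - baseline, 3))  # its error rate
```

On an imbalanced target like this one, a classifier's accuracy is only informative relative to that baseline.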

HYPERPARAMETER TUNING using GridSearch Cross validation

Set the combination of parameters for grid creation.

In [None]:
param = {'max_depth': np.arange(2,11),'min_samples_leaf': np.arange(1,11)}

For cross-validation we need to set the number of splits and enable shuffling. Stratification keeps the class proportions roughly the same in every fold, hence the name stratified K-fold CV.

In [None]:
Kfold= StratifiedKFold(n_splits=5,shuffle=True,random_state=19)
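To see what stratification does, the sketch below deals the sample indices of each class out to the folds in turn, so every fold ends up with a similar class mix. It is a simplified, hand-rolled illustration and not scikit-learn's actual algorithm. The function `stratified_folds` and the label list are hypothetical.

```python
import random
from collections import Counter, defaultdict

def stratified_folds(labels, n_splits, seed=0):
    """Assign each sample index to a fold so class shares stay similar.

    A simplified illustration of stratification, not scikit-learn's algorithm.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        rng.shuffle(indices)  # shuffle within each class
        for position, idx in enumerate(indices):
            folds[position % n_splits].append(idx)  # deal out round-robin
    return folds

# Hypothetical imbalanced labels: 20 loyal (0) and 5 churned (1) clients.
labels = [0] * 20 + [1] * 5
folds = stratified_folds(labels, n_splits=5)
for fold in folds:
    print(Counter(labels[i] for i in fold))
```

Without stratification, a random split of such a small minority class could leave some folds with no churned clients at all.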

Create the grid search object.

In [None]:
best = GridSearchCV(estimator=tk,param_grid=param,cv=Kfold,n_jobs=-1,verbose=1)

Fit the grid search on the training data.

In [None]:
best.fit(x_train,y_train)

The 90 candidates are the combinations of 9 values of max_depth with 10 values of min_samples_leaf; with 5 folds each, this gives 450 cross-validation fits.
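The number of candidates follows directly from the grid definition. The sketch below counts them with the standard library; `range` is used here in place of `np.arange`, which yields the same integer values.

```python
from itertools import product

# Same value ranges as np.arange(2, 11) and np.arange(1, 11) in the grid.
max_depth_values = range(2, 11)         # 2, 3, ..., 10 -> 9 values
min_samples_leaf_values = range(1, 11)  # 1, 2, ..., 10 -> 10 values
n_splits = 5

candidates = list(product(max_depth_values, min_samples_leaf_values))
print(len(candidates))             # 90
print(len(candidates) * n_splits)  # 450
```

Each candidate is trained once per fold, so the cost of grid search grows with the product of all parameter ranges and the number of folds.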

Cross-validation assessment of model quality.

Our best set of parameters.

In [None]:
best.best_params_

Our best estimator.

In [None]:
best.best_estimator_

Our CV score for 'best' tree.

In [None]:
best.best_score_

**Validation Assessment**

Time to check accuracy of the model on validation set.

In [None]:
pred_val=best.predict(x_valid)

In [None]:
accuracy_score(pred_val,y_valid)

**So the validation accuracy has increased from 91.3% to 94.2% after tuning, a clear improvement on the same validation set.**

# Tree visuals

In [None]:
dot_data=StringIO()
export_graphviz(decision_tree=best.best_estimator_,out_file=dot_data,filled=True,feature_names=data.drop(['Churn'], axis=1).columns)
graph=pydotplus.graph_from_dot_data(dot_data.getvalue())
Image(value=graph.create_png())

This tree is too large to read comfortably, so let us reduce max_depth to 3 purely for visualization.

In [None]:
tk9=DecisionTreeClassifier(random_state=19,max_depth=3).fit(x_train,y_train)

In [None]:
dot2_data=StringIO()
export_graphviz(decision_tree=tk9,out_file=dot2_data,filled=True,feature_names=data.drop(['Churn'], axis=1).columns)
graph=pydotplus.graph_from_dot_data(dot2_data.getvalue())
Image(value=graph.create_png())

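Orange boxes are nodes where loyal clients (class 0) are the majority, and blue boxes are nodes where churners are the majority; the deeper the color, the purer the node. The first line in each box is the split condition (feature and threshold). Gini impurity, like entropy, measures how mixed a node is: 0 means a pure node, and for two classes its maximum is 0.5, whereas the maximum of entropy is 1. Of the 2333 clients in the training set (the root node), 1984 are loyal and the remaining 349 churned. From there we move down the tree through successive thresholds and splits.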
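The Gini value of the root node can be reproduced from the class counts quoted above. The helper `gini` is introduced here for illustration and is not part of the notebook.

```python
def gini(counts):
    """Gini impurity of a node given its class counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

# Root node figures quoted above: 1984 loyal clients out of 2333.
root = gini([1984, 2333 - 1984])
print(round(root, 3))  # 0.254

# Reference points for a two-class node.
print(gini([50, 50]))  # 0.5, the most mixed node possible
print(gini([100, 0]))  # 0.0, a pure node
```

Each split in the tree is chosen to reduce this impurity in the child nodes as much as possible.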

HYPERPARAMETER TUNING using manual cross-validation, without grid search.

In [None]:
Kfold2= StratifiedKFold(n_splits=5,shuffle=True,random_state=19)

In [None]:
# Lists of mean CV accuracies and validation accuracies

acc_depth, valid_acc = [], []
max_depth_val = np.arange(2, 10)
# For each value of max_depth
for new_max_depth in tqdm_notebook(max_depth_val):
    new = DecisionTreeClassifier(random_state=19, max_depth=new_max_depth)

    # Cross-validate the tree with the current depth (not the untuned tk)
    val = cross_val_score(estimator=new, X=x_train, y=y_train, cv=Kfold2)

    # Store the mean CV score across the folds
    acc_depth.append(val.mean())

    # Assess the model on the validation set
    new.fit(x_train, y_train)
    val2 = new.predict(x_valid)
    valid_acc.append(accuracy_score(val2, y_valid))

In [None]:
acc_depth,valid_acc

For each of the 8 max_depth values, we get the corresponding mean cross-validation accuracy and the accuracy on the validation set.
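One way to use the two lists is to pick the depth with the highest mean CV accuracy and then read off the validation accuracy at that depth. The sketch below does this on made-up scores: the numbers are hypothetical placeholders and are not results from this notebook.

```python
# Hypothetical scores for illustration only; in the notebook these would be
# the acc_depth and valid_acc lists computed in the loop.
max_depth_val = list(range(2, 10))
acc_depth = [0.88, 0.90, 0.92, 0.93, 0.94, 0.93, 0.93, 0.92]
valid_acc = [0.89, 0.91, 0.92, 0.93, 0.94, 0.94, 0.93, 0.93]

# Choose the depth by CV score, then report the validation score at that depth.
best_idx = max(range(len(acc_depth)), key=acc_depth.__getitem__)
best_depth = max_depth_val[best_idx]

print(best_depth, acc_depth[best_idx], valid_acc[best_idx])
```

Selecting on the CV score rather than on the validation score keeps the validation set as an independent check of the chosen depth.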