**This notebook gives an overview of classification model performance using [lazypredict](https://github.com/shankarpandala/lazypredict).**




**Import libraries**


In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_style('ticks')
plt.rcParams["figure.figsize"] = (10,8)

import warnings
warnings.filterwarnings("ignore")

%matplotlib inline

**Load data file**


In [None]:
df = pd.read_csv("./data/telecom-churn_dummies.csv")

In [None]:
df.drop(labels = "Unnamed: 0", axis=1, inplace=True)

**Data Scaling**


In [None]:
from sklearn.preprocessing import MinMaxScaler

In [None]:
data = df.copy()

In [None]:
MMs = MinMaxScaler()
data["MonthlyCharges"] = MMs.fit_transform(data[["MonthlyCharges"]])
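Min-max scaling maps a column to the range [0, 1] with `(x - min) / (max - min)`. The cell above fits the scaler on the full dataset before the split. As a minimal sketch of an alternative, assuming hypothetical toy values in place of the real `MonthlyCharges` column, the minimum and maximum can be learned from the training rows only and then reused on the test rows:

```python
import pandas as pd

# Hypothetical toy values standing in for the MonthlyCharges column.
train = pd.DataFrame({"MonthlyCharges": [20.0, 50.0, 80.0, 110.0]})
test = pd.DataFrame({"MonthlyCharges": [35.0, 120.0]})

# Learn the range from the training rows only.
col_min = train["MonthlyCharges"].min()
col_max = train["MonthlyCharges"].max()


def min_max(series, lo, hi):
    """Apply the min-max formula (x - lo) / (hi - lo)."""
    return (series - lo) / (hi - lo)


train_scaled = min_max(train["MonthlyCharges"], col_min, col_max)
test_scaled = min_max(test["MonthlyCharges"], col_min, col_max)

print(train_scaled.tolist())
print(test_scaled.tolist())
```

With this approach a test value above the training maximum scales to more than 1, which is expected when the range is learned from the training rows alone.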

**Train/test split and resampling**


In [None]:
from sklearn.model_selection import train_test_split

In [None]:
X = data.drop(['Churn'], axis=1)
y = data["Churn"]

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state=42, stratify=data["Churn"])
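The `stratify` argument keeps the class proportions roughly equal in both partitions. A small sketch on hypothetical labels (800 negatives, 200 positives, not the churn data) illustrates the effect:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels: 800 negatives and 200 positives.
y_demo = np.array([0] * 800 + [1] * 200)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)

Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Share of the positive class in each partition.
train_rate = yd_train.mean()
test_rate = yd_test.mean()
print(train_rate, test_rate)
```

Both rates should stay close to the overall positive share of 0.2, whereas an unstratified split can drift away from it by chance.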

In [None]:
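# Compatibility shim: scikit-learn 0.23 removed sklearn.externals.six,
# which older packages may still try to import.
import six
import sys
sys.modules['sklearn.externals.six'] = six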

In [None]:
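from imblearn.combine import SMOTEENN
# Resample the training set only; the test set keeps its original class balance.
sm = SMOTEENN()
# fit_resample replaces the deprecated fit_sample method.
X_train_resampled, y_train_resampled = sm.fit_resample(X_train, y_train)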

In [None]:
# data_no = data[data.Churn == 0]
# data_yes = data[data.Churn == 1]

In [None]:
# data_yes_upsampled = data_yes.sample(n=len(data_no), replace=True, random_state=42)

In [None]:
# data_upsampled = data_no.append(data_yes_upsampled).reset_index(drop=True)
# sns.countplot('Churn', data=data_upsampled).set_title('Class Distribution After Resampling')

In [None]:
# X = data_upsampled.drop(['Churn'], axis=1)
# y = data_upsampled["Churn"]
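The commented-out cells above outline a simpler alternative: random oversampling of the minority class with replacement. `DataFrame.append` was removed in pandas 2.0, so a version of the same idea using `pd.concat` might look like the sketch below. The toy frame is hypothetical and only stands in for the churn data.

```python
import pandas as pd

# Hypothetical toy frame: 6 non-churners and 2 churners.
data_demo = pd.DataFrame({
    "MonthlyCharges": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
    "Churn": [0, 0, 0, 0, 0, 0, 1, 1],
})

data_no = data_demo[data_demo.Churn == 0]
data_yes = data_demo[data_demo.Churn == 1]

# Sample the minority class with replacement until it matches the majority.
data_yes_upsampled = data_yes.sample(n=len(data_no), replace=True, random_state=42)

# pd.concat replaces DataFrame.append, which was removed in pandas 2.0.
data_upsampled = pd.concat([data_no, data_yes_upsampled]).reset_index(drop=True)

counts = data_upsampled["Churn"].value_counts()
print(counts)
```

If this route were used, it would usually be applied to the training rows only, so that duplicated rows cannot end up in the test set.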

**Imbalanced Dataset**

**Lazypredict**


In [None]:
!pip install lazypredict

Collecting lazypredict
  Downloading lazypredict-0.2.9-py2.py3-none-any.whl (12 kB)
Collecting xgboost==1.1.1
  Downloading xgboost-1.1.1-py3-none-manylinux2010_x86_64.whl (127.6 MB)
Collecting PyYAML==5.3.1
  Downloading PyYAML-5.3.1.tar.gz (269 kB)
Collecting scikit-learn==0.23.1
  Downloading scikit_learn-0.23.1-cp37-cp37m-manylinux1_x86_64.whl (6.8 MB)
Collecting tqdm==4.56.0
  Downloading tqdm-4.56.0-py2.py3-none-any.whl (72 kB)
Collecting numpy==1.19.1
  Downloading numpy-1.19.1-cp37-cp37m-manylinux2010_x86_64.whl (14.5 MB)
Collecting scipy==1.5.4
  Downloading scipy-1.5.4-cp37-cp37m-manylinux1_x86_64.whl (25.9 MB)

In [None]:
import lazypredict
from lazypredict.Supervised import LazyClassifier

In [None]:
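# Fit every available classifier on the resampled training set and score it on the untouched test set.
clf = LazyClassifier(verbose=0, ignore_warnings=True, custom_metric=None)
models, predictions = clf.fit(X_train_resampled, X_test, y_train_resampled, y_test)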

100%|██████████| 29/29 [00:13<00:00,  2.19it/s]


In [None]:
models

| Model | Accuracy | Balanced Accuracy | ROC AUC | F1 Score | Time Taken |
|---|---|---|---|---|---|
| AdaBoostClassifier | 0.72 | 0.75 | 0.75 | 0.74 | 0.56 |
| RandomForestClassifier | 0.74 | 0.75 | 0.75 | 0.75 | 0.77 |
| LogisticRegression | 0.70 | 0.75 | 0.75 | 0.72 | 0.10 |
| BaggingClassifier | 0.75 | 0.75 | 0.75 | 0.76 | 0.35 |
| CalibratedClassifierCV | 0.70 | 0.74 | 0.74 | 0.72 | 2.00 |
| ExtraTreesClassifier | 0.74 | 0.74 | 0.74 | 0.75 | 0.59 |
| SVC | 0.73 | 0.74 | 0.74 | 0.74 | 0.73 |
| LinearSVC | 0.70 | 0.74 | 0.74 | 0.72 | 0.53 |
| LGBMClassifier | 0.73 | 0.74 | 0.74 | 0.75 | 0.86 |
| LinearDiscriminantAnalysis | 0.69 | 0.74 | 0.74 | 0.71 | 0.10 |


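In the result table above, **AdaBoostClassifier** and **RandomForestClassifier** take the top two rows, each with a balanced accuracy of 0.75. The margin is narrow: LogisticRegression and BaggingClassifier reach the same balanced accuracy at two decimal places, and all ten models shown fall between 0.74 and 0.75.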
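Several models in the results table share the same balanced accuracy at two decimal places, so a secondary metric can help order them. The sketch below copies the top four rows from the table and sorts them by Balanced Accuracy, then F1 Score, then Accuracy. The choice of tie-break metrics is an assumption for illustration, and a different choice would give a different order.

```python
import pandas as pd

# Top four rows copied from the results table above.
top_models = pd.DataFrame(
    {
        "Accuracy": [0.72, 0.74, 0.70, 0.75],
        "Balanced Accuracy": [0.75, 0.75, 0.75, 0.75],
        "F1 Score": [0.74, 0.75, 0.72, 0.76],
    },
    index=[
        "AdaBoostClassifier",
        "RandomForestClassifier",
        "LogisticRegression",
        "BaggingClassifier",
    ],
)
top_models.index.name = "Model"

# Break ties on Balanced Accuracy with F1 Score, then Accuracy.
ranked = top_models.sort_values(
    ["Balanced Accuracy", "F1 Score", "Accuracy"], ascending=False
)
print(ranked)
```

The same `sort_values` call can be applied directly to the `models` frame returned by `LazyClassifier.fit`, since it is an ordinary pandas DataFrame indexed by model name.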


In [None]:
from sklearn.ensemble import RandomForestClassifier

# Only the non-default settings are listed; every other parameter keeps its
# scikit-learn default. min_impurity_split is omitted because it was removed
# in scikit-learn 1.0.
RandomForestClassifier(bootstrap=False, max_depth=20, max_features=5,
                       min_samples_leaf=5, n_estimators=150,
                       random_state=100)