# The Super Learner Classifier

## Import Packages Etc

In [1]:
from IPython.display import display, HTML, Image

import sys
from time import time
from sklearn.base import BaseEstimator, ClassifierMixin
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.utils.validation import check_X_y, check_array, check_is_fitted
from sklearn.utils.multiclass import unique_labels
from sklearn.utils import resample
from sklearn.metrics import euclidean_distances
from sklearn.metrics import mean_squared_error as MSE
from sklearn.model_selection import cross_val_predict
from sklearn import metrics
from sklearn.model_selection import GridSearchCV
from sklearn.model_selection import KFold
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

from sklearn.model_selection import train_test_split
from sklearn import tree
from sklearn import clone
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.model_selection import ParameterGrid
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import AdaBoostClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn import neural_network
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.linear_model import LinearRegression, LogisticRegression, Ridge, Lasso
from sklearn.ensemble import RandomForestRegressor, GradientBoostingClassifier, ExtraTreesRegressor, GradientBoostingRegressor, AdaBoostRegressor, RandomForestClassifier, BaggingClassifier

# Add more packages as required

%matplotlib inline
#%qtconsole

## Define Super Learner Classifier

The *Super Learner* is a heterogeneous stacked ensemble classifier. This is a classification model that uses a set of base classifiers of different types, the outputs of which are then combined in another classifier at the stacked layer. The Super Learner was described in [(van der Laan et al, 2007)](https://pdfs.semanticscholar.org/19e9/c732082706f39d2ba12845851309714db135.pdf) but the stacked ensemble idea has been around for a long time. 

Figure 1 shows a flow diagram of the Super Learner process (from van der Laan et al, 2007). The base classifiers are trained and their outputs are combined, along with the training dataset labels, into a training set for the stack layer classifier. To avoid overfitting, the stacked layer training set is generated using a k-fold cross validation process (described as V-fold in Figure 1). To add further variety to the base estimators, bootstrap sampling of the training data can also be applied.
 
![Super Learner Process Flow](SuperLearnerProcessFlow.png "Super Learner Process Flow")
Figure 1: A flow diagram for the Super Learner
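
The process in Figure 1 can be sketched in a few lines, independently of the full class defined later in this notebook. This is a minimal illustration only: the two base models and the decision tree at the stack layer are arbitrary choices made for brevity, and `cross_val_predict` is used to produce the out-of-fold predictions that become the stack layer training set.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Illustrative base models (an assumption for this sketch, not the notebook defaults)
base_models = [KNeighborsClassifier(n_neighbors=5), GaussianNB()]

# Out-of-fold predictions: every row is predicted by a model that was not trained on it
oof_columns = [cross_val_predict(model, X, y, cv=5) for model in base_models]
stack_train = np.column_stack(oof_columns)  # shape (n_samples, n_base_models)

# The stack layer model learns from the base model predictions and the true labels
stack_model = DecisionTreeClassifier(random_state=0).fit(stack_train, y)

# The base models are refitted on the full training set for use at prediction time
for model in base_models:
    model.fit(X, y)

# Prediction: the base model outputs form the input to the stack layer model
query = X[:5]
stack_query = np.column_stack([model.predict(query) for model in base_models])
stack_pred = stack_model.predict(stack_query)
print(stack_train.shape, stack_pred.shape)
```

Because the stack layer only ever sees predictions made on held-out folds, it is trained on outputs that resemble what the base models will produce for unseen data.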


### SuperLearnerClassifier Class

In [2]:
# Create a new classifier which is based on the scikit-learn BaseEstimator and ClassifierMixin classes
class SuperLearnerClassifier(BaseEstimator, ClassifierMixin):
    
    """An ensemble classifier that uses heterogeneous models at the base layer and an aggregation model at the aggregation layer. A k-fold cross validation is used to generate training data for the stack layer model.
    
    
    Parameters
    ----------
    base_clfs: dict, optional (default: "Decision Tree", "Logistic Regression", "Multi Layer Perceptron", "SVC", "Gradient Boosting", "Extra Trees")
        The base estimators from which the stack layer is built.
        Available base estimators are: "Decision Tree", "Logistic Regression", "K-Nearest-Neighbour", "SVC", "Random Forest",
                                       "Multi Layer Perceptron", "Ada Boost", "Bagging Classifer", "Naive Bayes", "Extra Trees", "Gradient Boosting"
        Input is in the form of a dict such as: {"Decision Tree": dt}, where dt is an estimator object

    stacked_clf: dict, optional (default: "Decision Tree")
        The stack estimator which will train on the stack layer.
        Available stack estimators are: "Decision Tree", "Logistic Regression", "K-Nearest-Neighbour", "SVC", "Random Forest",
                                        "Multi Layer Perceptron", "Ada Boost", "Bagging Classifer", "Naive Bayes", "Extra Trees", "Gradient Boosting"
        Input is in the form of a dict such as: {"Decision Tree": dt}, where dt is an estimator object
        
    cv: int, number of cross validation folds the base estimators will run to build the stack layer
        Default: 10
        
    base_proba_layer: Boolean, if True a probability based stack layer training set is used instead of a label based stack layer training set
        Default: False
    
    use_training_data: Boolean, if True the original descriptive features are added to the stack layer. If False, only the predictions from the base learners are used as the stack layer
        Default: False
    
    bootstrap_full_training: Boolean, if True bootstrapping with replacement is applied to the full original training set to train the base learners after the stack layer has been constructed
        Default: True
        
    bootstrap_cv_stage: Boolean, if True bootstrapping with replacement is applied to the cross validation data that is used to build the stack layer
        Default: True
    
    verbose: int, controls the verbosity: the higher, the more messages
        Default: 0
    
    Attributes
    ----------
    stacked_layer_training_set_: numpy array containing the stack layer training set built during the fit phase

    _X: numpy array containing the stack layer built by the most recent call to fit() or predict()

    Notes
    -----
    predictive_power(): Returns a DataFrame of the base estimators' accuracy and F1 scores 
    base_learner_correlation(): Returns a correlation heatmap of the base estimators' predictions 

    References
    ----------
    .. [1]  van der Laan, M., Polley, E. & Hubbard, A. (2007). 
            Super Learner. Statistical Applications in Genetics 
            and Molecular Biology, 6(1) 
            doi:10.2202/1544-6115.1309
    Examples
    --------
    >>> from sklearn.datasets import load_iris
    >>> from sklearn.model_selection import cross_val_score
    >>> clf = SuperLearnerClassifier()
    >>> iris = load_iris()
    >>> cross_val_score(clf, iris.data, iris.target, cv=10)
    """
    # Constructor for the classifier object
    def __init__(self, base_clfs= None, stacked_clf= None, cv=10, base_proba_layer = False,\
                 use_training_data = False, bootstrap_cv_stage = True, bootstrap_full_training = True ,verbose = 0):
        
    
        
        # If there is no argument for base_clfs the below estimators will be created 
        if base_clfs:    
            self.base_clfs = base_clfs
        else:
            dt = tree.DecisionTreeClassifier(criterion= 'entropy',max_depth= 47, min_samples_split= 50)
            logreg = LogisticRegression( C= 0.2,max_iter= 100,multi_class= 'ovr', solver= 'newton-cg')
            mlp = neural_network.MLPClassifier(alpha= 0.01, hidden_layer_sizes= 400)
            etc = ExtraTreesClassifier(criterion = 'entropy',max_depth = 22, n_estimators= 150)
            gbk = GradientBoostingClassifier(learning_rate= 0.3, n_estimators= 200)
            svc = SVC(C= 1, gamma= 1, kernel= 'linear')
           

            # Dict of base estimators containing their names and estimator objects 
            self.base_clfs = {"Decision Tree": dt, "Logistic Regression": logreg, "Multi Layer Perceptron": mlp, "SVC": svc,\
                              "Gradient Boosting": gbk, "Extra Trees": etc}
    
        # If there is no argument for stacked_clf the below estimator will be created 
        if stacked_clf:
            self.stacked_clf = stacked_clf
        else:
            dt_stack = tree.DecisionTreeClassifier(criterion= 'entropy',max_depth= 47, min_samples_split= 50)
            # Dict of the stack estimator containing its name and estimator object 
            self.stacked_clf = {"Decision Tree": dt_stack}
        
    
        self.cv = cv
        self.base_proba_layer = base_proba_layer
        self.use_training_data = use_training_data
        self.verbose = verbose
        self.bootstrap_cv_stage = bootstrap_cv_stage
        self.bootstrap_full_training = bootstrap_full_training
        
    # The fit function to train a classifier
    def fit(self, X, y):
        """Build a SuperLearner classifier from the training set (X, y).
        Parameters
        ----------
        X : array-like, shape = [n_samples, n_features]
            The training input samples. 
        y : array-like, shape = [n_samples] 
            The target values (class labels) as integers or strings.
        Returns
        -------
        self : object
        """     
        
        
        
        # Check that X and y have correct shape
        X, y = check_X_y(X, y)
    
        
        self._X = X
        self._y = y
        y_temp_ = []


        # Grid search was originally developed inside this class, but it was cleaner to run it externally 

#        self.model_tuned_params_list = dict()
        
#         if self.grid_search:
#             print( "Performing Grid Search")
#             for clf_name, clf in self.base_clfs.items():
#                     print("Running GridSearchCV for %s." %clf_name)
#                     param_grid = self.grid_params[0][clf_name]
#                     print(param_grid)
#                     clf = GridSearchCV(clf, param_grid, cv=5, verbose = verbose, return_train_score=True, scoring=scoring, refit=refit,n_jobs=self.n_jobs)
#                     clf.fit(X,y)
#                     display(clf.best_params_)
#                     self.model_tuned_params_list[clf_name] = clf.best_params_
                    
        
        # Iterate through the dict of base estimators 
        for clf_name, clf in self.base_clfs.items():
            # If verbose is greater than 0, progress details are printed 
            if self.verbose > 0:
                print("Generating CV predictions for base model: %s" %clf_name)
            # SVC only provides predict_proba when its probability parameter is True, so keep it in step with base_proba_layer
            if clf_name == "SVC" and self.base_proba_layer == True:
                clf.set_params(**{'probability': True})
            if clf_name == "SVC" and self.base_proba_layer == False:
                clf.set_params(**{'probability': False})

            # CV and Bootstrap stack building layer    
            if self.bootstrap_cv_stage:
                kf = KFold(n_splits=self.cv)
                # list to store all predictions of a base learner 
                cv_predict_ = []
                for train, test in kf.split(X, y):
                    cloned_model = clone(clf)
                    X_test_fold_ = X[test]
                    y_test_fold_ = y[test]
                    # Bootstrapping is performed using the resample function 
                    X_train_fold_, y_train_fold_ = resample(X[train], y[train], replace=True, random_state=None)
                    # A freshly cloned estimator is fitted in each iteration 
                    cloned_model.fit(X_train_fold_, y_train_fold_)
                    # All predictions are added to the cv_predict_ list
                    if self.base_proba_layer:
                        y_predict = cloned_model.predict_proba(X_test_fold_)
                    else:
                        y_predict = cloned_model.predict(X_test_fold_)
                    cv_predict_.append(y_predict)

                # The fold predictions are stacked to create the column(s) for this base learner
                if self.base_proba_layer:
                    # Probabilities: one column per class
                    y_predict = np.vstack(cv_predict_)
                else:
                    # Labels: a single column
                    y_predict = np.hstack(cv_predict_)
                    y_predict = np.reshape(y_predict,(len(y_predict),1))

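            # If bootstrapping is not required then plain cross validation is performed 
            else:
                cloned_model = clone(clf)
                if self.base_proba_layer:
                    # Use class probabilities so the stack layer matches what predict() builds
                    y_predict = cross_val_predict(cloned_model, X, y, cv=self.cv, method='predict_proba')
                else:
                    y_predict = cross_val_predict(cloned_model, X, y, cv=self.cv)
                    y_predict = np.reshape(y_predict,(len(y_predict),1))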
            if self.verbose > 0:   
                print("Training model on the whole training set")

            # Predictions from a base learner are appended to a list    
            y_temp_.append(y_predict)
            # Apply one level of bootstrapping to the original training set
            if self.bootstrap_full_training:
                # Apply bootstrapping to the whole training set
                X_bootstrapped, y_bootstrapped = resample(X, y, random_state = 0, replace = True)
                clf.fit(X_bootstrapped, y_bootstrapped)
            else:
                # Fit the base learner on the original training set
                clf.fit(X, y)
            if self.verbose > 0:
                print("----------------------------------------------------------------------------")
                
        
        #horizontally build the array
        if self.use_training_data == False:
            self._X = np.hstack(y_temp_)

        else:
            X_temp_ = np.hstack(y_temp_)
            self._X = np.hstack((X, X_temp_))

        # Grid search to find the best parameters for the stack layer estimator (now run externally)
#         if self.grid_search:
#             print( "Performing Grid Search on stacking classifer")
#             for clf_name, clf in self.stacked_clf.items():
#                     print("Running GridSearchCV for %s." %clf_name)
#                     param_grid = self.grid_params[1][clf_name]
#                     clf = GridSearchCV(clf, param_grid, cv=5, verbose = verbose, return_train_score=True, scoring=scoring, refit=refit)
#                     clf.fit(X,y)
#                     display(clf.best_params_)
                    
        # Attribute of the stack layer 
        self.stacked_layer_training_set_ = self._X
        
        
        ######Stacking Layer###########
        # fit the stacked layer
        for clf_name, clf in self.stacked_clf.items():
            if self.verbose > 0:
                print("****Building stacked model: %s****" %clf_name)
            clf.fit(self._X,self._y)
            
        
        # Return the classifier
        return self

    # The predict function to make a set of predictions for a set of query instances
    def predict(self, X):
        """Predict class labels of the input samples X.
        Parameters
        ----------
        X : array-like matrix of shape = [n_samples, n_features]
            The input samples. 
        Returns
        -------
        p : array of shape = [n_samples, ].
            The predicted class labels of the input samples. 
        """
        if self.verbose > 0:
            print("*************************************************************************************************************")
        
        # Check that the input features match the type and shape of the training features
        X = check_array(X)
        # holds the predictions 
        y_temp_ = []
        # Iterate through the dict of base learners
        for clf_name, clf in self.base_clfs.items():
            if self.verbose > 0:
                print("Predicting output for base model: %s" %clf_name)

            if self.base_proba_layer == False:
                # Predict the labels and give the numpy array a shape of (len(X), 1)
                y_predict = clf.predict(X)
                y_predict = np.reshape(y_predict,(len(y_predict),1))
            else:
                # Predict the class probabilities
                y_predict = clf.predict_proba(X)
            y_temp_.append(y_predict)
                    
        
        # Horizontally build the array
        if self.use_training_data == False:
            self._X = np.hstack(y_temp_)
        else:
            X_temp_ = np.hstack(y_temp_)
            self._X = np.hstack((X, X_temp_))
        
        # Holds the final prediction
        y_final = []
        for clf_name, clf in self.stacked_clf.items():
            if self.verbose > 0:
                print("****Predicting stacked model: %s****" %clf_name)
            # Predict the stack outcome
            y_final = clf.predict(self._X)
        
        return np.array(y_final)
    
    # The predict_proba function to make a set of probability predictions for a set of query instances
    def predict_proba(self, X):
        """Predict class probabilities of the input samples X.
        Parameters
        ----------
        X : array-like matrix of shape = [n_samples, n_features]
            The input samples. 
        Returns
        -------
        p : array of shape = [n_samples, n_labels].
            The predicted class label probabilities of the input samples. 
        """
        # Check that the input features match the type and shape of the training features
        X = check_array(X)
        
        # Holds the predictions of the base learners
        y_temp_ = []

        for clf_name, clf in self.base_clfs.items():
            if self.verbose > 0:
                print("Predicting output for base model: %s" %clf_name)
            if self.base_proba_layer == False:
                y_predict = clf.predict(X)
                y_predict = np.reshape(y_predict,(len(y_predict),1))
            else:
                y_predict = clf.predict_proba(X)
            y_temp_.append(y_predict)
        
        # Horizontally build the array
        if self.use_training_data == False:
            self._X = np.hstack(y_temp_)
        else:
            X_temp_ = np.hstack(y_temp_)
            self._X = np.hstack((X, X_temp_))
        
        y_final = []
        for clf_name, clf in self.stacked_clf.items():
            if self.verbose > 0:
                print("****Predicting stacked model: %s****" %clf_name)
            # Use self._X (self.__X would be name-mangled and raise an AttributeError)
            y_final = clf.predict_proba(self._X)
        
        return np.array(y_final)
    
    

    # Returns a correlation heatmap of the base estimators' predictions
    # (assumes a label based stack layer: base_proba_layer=False and use_training_data=False)
    def base_learner_correlation(self):
        base_learners_list = [key for key, value in self.base_clfs.items()]
        df = pd.DataFrame(data= self.stacked_layer_training_set_, columns=base_learners_list)
        corr = df.corr(method='pearson').style.format("{:.2}").background_gradient(cmap=plt.get_cmap('coolwarm'), axis=1)

        return corr
    
    # Returns a DataFrame of the base estimators' accuracy and F1 scores 
    def predictive_power(self):
        # Create a list of the base learner names
        base_learners_list = [key for key, value in self.base_clfs.items()]
        # Create a DataFrame with the predictions of the base learners 
        df = pd.DataFrame(data= self.stacked_layer_training_set_, columns=base_learners_list)
        base_learners_accuracy_list = []
        f1_score_list = []
        for base_learner in base_learners_list:
            # Get the prediction column for a specific base learner
            y_predict = df[base_learner]
            accuracy = metrics.accuracy_score(self._y, y_predict)
            f1_score_list.append(metrics.f1_score(self._y, y_predict, average='macro'))
            base_learners_accuracy_list.append(accuracy)
        # Create a DataFrame from both lists of accuracy and F1 scores, with headings, indexed by the base learner names
        final_df =pd.DataFrame(list(zip(base_learners_accuracy_list, f1_score_list)),index= base_learners_list,
              columns=['Accuracy','F1 Score'])
        
        return final_df
    
    
       
        

        

### Test the SuperLearnerClassifier

Perform a simple test of the SuperLearnerClassifier on the Iris dataset.

In [3]:
from sklearn.datasets import load_iris
clf = SuperLearnerClassifier()
iris = load_iris()
clf.fit(iris.data, iris.target)
cross_val_score(clf, iris.data, iris.target, cv=5)





array([ 0.96666667,  1.        ,  0.96666667,  0.96666667,  1.        ])

## Load & Partition Data

### Setup - IMPORTANT

Take only a sample of the dataset for fast testing

In [34]:
# Originally tested with 10% but runs were taking much too long 
data_sampling_rate = 0.02

Setup the number of folds for all grid searches

In [28]:
# Chose 5 to reduce computational time 
cv_folds = 5

In [6]:
# Dicts to hold accuracies and params
model_test_accuracy_comparisons = {}
model_valid_accuracy_comparisons = {}
model_tuned_params_list = {}

### Load Dataset

Load the dataset and explore it.

In [35]:
dataset = pd.read_csv('fashion-mnist_train.csv')
dataset = dataset.sample(frac=data_sampling_rate) # Take a sample from the dataset so everything runs smoothly
num_classes = 10
classes = {0: "T-shirt/top", 1:"Trouser", 2: "Pullover", 3:"Dress", 4:"Coat", 5:"Sandal", 6:"Shirt", 7:"Sneaker", 8:"Bag", 9:"Ankle boot"}
display(dataset.head())

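|  | label | pixel1 | pixel2 | pixel3 | pixel4 | pixel5 | pixel6 | pixel7 | pixel8 | pixel9 | ... | pixel775 | pixel776 | pixel777 | pixel778 | pixel779 | pixel780 | pixel781 | pixel782 | pixel783 | pixel784 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 40009 | 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 237 | 137 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 |
| 53438 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 54539 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 36562 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 1 | 0 | 70 | 202 | 187 | 97 | 0 | 0 | 0 | 0 |
| 3837 | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 0 | 1 | 1 | ... | 130 | 132 | 148 | 4 | 0 | 2 | 0 | 0 | 0 | 0 |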
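
The sample above is drawn without a seed, so a different subset is selected on every run. If repeatable experiments are wanted, one option is to pass `random_state` to `DataFrame.sample`. The following is a minimal sketch on a small toy frame; the toy data and the seed value 42 are assumptions made purely for illustration.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real dataset (hypothetical data: 1000 rows, 10 classes)
rng = np.random.RandomState(0)
toy = pd.DataFrame({"label": rng.randint(0, 10, size=1000),
                    "pixel1": rng.randint(0, 256, size=1000)})

# Passing random_state makes the same rows come back on every run
sample_a = toy.sample(frac=0.02, random_state=42)
sample_b = toy.sample(frac=0.02, random_state=42)

print(len(sample_a))                          # number of sampled rows
print(sample_a.index.equals(sample_b.index))  # identical selection both times
# Class counts in a small sample can be quite uneven
print(sample_a["label"].value_counts().sort_index())
```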


### Pre-process & Partition Data

Perform data pre-processing and manipulation as required

In [36]:
#Isolate the descriptive features we are interested in
X = dataset[dataset.columns[1:]]
y = np.array(dataset["label"])

#Normalise the data
X = X/255

#Divide into test/train+val sets
X_train_plus_valid, X_test, y_train_plus_valid, y_test = train_test_split(X, y, random_state=0, train_size = 0.7)

#Divide into train/validation sets
X_train, X_valid, y_train, y_valid = train_test_split(X_train_plus_valid, y_train_plus_valid, random_state=0, train_size = 0.5/0.7)

X_train.shape



(600, 784)
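
The two `train_size` values used above combine into an overall 50% / 20% / 30% train / validation / test split. The arithmetic can be checked with a short sketch; `split_fractions` is a hypothetical helper introduced here, and the figure of 60,000 rows in the full CSV is an assumption that is consistent with the `(600, 784)` shape shown above.

```python
from fractions import Fraction

def split_fractions(first_train_size, second_train_size):
    """Return the overall (train, valid, test) fractions of a two-stage split."""
    train_plus_valid = first_train_size
    test = 1 - train_plus_valid
    train = train_plus_valid * second_train_size
    valid = train_plus_valid - train
    return train, valid, test

# Exact fractions avoid floating point noise in the illustration
first = Fraction(7, 10)                      # train_size = 0.7
second = Fraction(5, 10) / Fraction(7, 10)   # train_size = 0.5/0.7

train, valid, test = split_fractions(first, second)
print(train, valid, test)  # 1/2 1/5 3/10

n_rows = 1200  # 2% of an assumed 60,000 rows
print([int(f * n_rows) for f in (train, valid, test)])  # [600, 240, 360]
```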

## Train and Evaluate a Simple Model

Train a Super Learner Classifier using the prepared dataset

In [9]:

clf = SuperLearnerClassifier()

clf.fit(X_train, y_train)

SuperLearnerClassifier(base_clfs={'Decision Tree': DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=47,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=50,
            min_weight_fr...timators=150, n_jobs=1,
           oob_score=False, random_state=None, verbose=0, warm_start=False)},
            base_proba_layer=False, bootstrap_cv_stage=True,
            bootstrap_full_training=True, cv=10,
            stacked_clf={'Decision Tree': DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=47,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=50,
            min_weight_fraction_leaf=0.0, presort=False, random_state=None,
            splitter='best')},
            use_training_data=False, verbose=0)

Evaluate the trained classifier

We can see that the model is slightly overfitting: accuracy on the training data itself (93.67%) is well above the accuracy on the validation and test sets. It might be wise to reduce the influence of the tree-based base classifiers, or to increase their min_samples_split, to help reduce overfitting. 

The stack layer was built using the classifier's default of 10-fold cross validation (cv=10). An accuracy (proportion of correctly predicted classes) of 82.33% was observed for the validation set and 81.56% for the test set. From the confusion matrices, the classifier struggled most with classes 6 (Shirt) and 4 (Coat) on both the validation set and the test set. 

An average precision of 83% was recorded for the validation set and 82% for the test set (the `avg / total` row of each report). This means that roughly 82-83% of the predictions made for a class were correct. An average recall of 82% was recorded for both the validation set and the test set. This means that 82% of the instances of a class were correctly identified.

In [10]:

y_predict = clf.predict(X_train)

# Print performance details
accuracy = metrics.accuracy_score(y_train, y_predict) # , normalize=True, sample_weight=None
print("Accuracy: " +  str(accuracy))
print(metrics.classification_report(y_train, y_predict))

# Print confusion matrix
print("Confusion Matrix")
pd.crosstab(np.array(y_train), y_predict, rownames=['True'], colnames=['Predicted'], margins=True)

Accuracy: 0.936666666667
             precision    recall  f1-score   support

          0       0.88      0.93      0.90       137
          1       0.98      0.99      0.98       158
          2       0.90      0.86      0.88       145
          3       0.98      0.92      0.95       155
          4       0.82      0.91      0.87       145
          5       0.96      0.99      0.97       141
          6       0.91      0.82      0.86       136
          7       0.98      0.95      0.96       171
          8       0.98      0.98      0.98       154
          9       0.98      0.99      0.98       158

avg / total       0.94      0.94      0.94      1500

Confusion Matrix


| True \ Predicted | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | All |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 128 | 1 | 0 | 1 | 2 | 0 | 3 | 0 | 2 | 0 | 137 |
| 1 | 0 | 157 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 158 |
| 2 | 2 | 0 | 125 | 0 | 16 | 0 | 2 | 0 | 0 | 0 | 145 |
| 3 | 3 | 2 | 0 | 143 | 5 | 0 | 2 | 0 | 0 | 0 | 155 |
| 4 | 0 | 1 | 7 | 1 | 132 | 0 | 4 | 0 | 0 | 0 | 145 |
| 5 | 0 | 0 | 0 | 0 | 0 | 139 | 0 | 2 | 0 | 0 | 141 |
| 6 | 12 | 0 | 7 | 0 | 5 | 0 | 111 | 0 | 1 | 0 | 136 |
| 7 | 0 | 0 | 0 | 0 | 0 | 5 | 0 | 162 | 0 | 4 | 171 |
| 8 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 151 | 0 | 154 |
| 9 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 157 | 158 |


Assess the performance on the validation dataset

In [11]:

y_predict = clf.predict(X_valid)

# Print performance details
accuracy = metrics.accuracy_score(y_valid, y_predict) # , normalize=True, sample_weight=None
print("Accuracy: " +  str(accuracy))
print(metrics.classification_report(y_valid, y_predict))

# Print confusion matrix
print("Confusion Matrix")
pd.crosstab(np.array(y_valid), y_predict, rownames=['True'], colnames=['Predicted'], margins=True)

Accuracy: 0.823333333333
             precision    recall  f1-score   support

          0       0.72      0.91      0.80        56
          1       0.91      0.98      0.95        64
          2       0.74      0.65      0.69        60
          3       0.94      0.76      0.84        58
          4       0.54      0.73      0.62        52
          5       0.93      0.95      0.94        59
          6       0.70      0.44      0.54        59
          7       0.88      0.94      0.91        68
          8       0.97      0.97      0.97        67
          9       0.91      0.84      0.87        57

avg / total       0.83      0.82      0.82       600

Confusion Matrix


| True \ Predicted | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | All |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 51 | 0 | 0 | 0 | 1 | 0 | 3 | 0 | 1 | 0 | 56 |
| 1 | 0 | 63 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 64 |
| 2 | 4 | 0 | 39 | 1 | 16 | 0 | 0 | 0 | 0 | 0 | 60 |
| 3 | 2 | 4 | 2 | 44 | 5 | 0 | 1 | 0 | 0 | 0 | 58 |
| 4 | 0 | 1 | 5 | 2 | 38 | 0 | 6 | 0 | 0 | 0 | 52 |
| 5 | 0 | 0 | 0 | 0 | 0 | 56 | 0 | 2 | 0 | 1 | 59 |
| 6 | 14 | 1 | 7 | 0 | 10 | 0 | 26 | 0 | 1 | 0 | 59 |
| 7 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 64 | 0 | 3 | 68 |
| 8 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 65 | 1 | 67 |
| 9 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 7 | 0 | 48 | 57 |
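
The per-class figures in the classification report can be recovered directly from the validation confusion matrix above. The following sketch copies the counts from that matrix and recomputes accuracy, precision and recall with NumPy, as a way of showing how the three measures relate to the rows and columns of the matrix.

```python
import numpy as np

# Validation confusion matrix from the output above
# (rows: true class, columns: predicted class)
cm = np.array([
    [51, 0, 0, 0, 1, 0, 3, 0, 1, 0],
    [0, 63, 0, 0, 0, 0, 1, 0, 0, 0],
    [4, 0, 39, 1, 16, 0, 0, 0, 0, 0],
    [2, 4, 2, 44, 5, 0, 1, 0, 0, 0],
    [0, 1, 5, 2, 38, 0, 6, 0, 0, 0],
    [0, 0, 0, 0, 0, 56, 0, 2, 0, 1],
    [14, 1, 7, 0, 10, 0, 26, 0, 1, 0],
    [0, 0, 0, 0, 0, 1, 0, 64, 0, 3],
    [0, 0, 0, 0, 0, 1, 0, 0, 65, 1],
    [0, 0, 0, 0, 0, 2, 0, 7, 0, 48],
])

correct = np.diag(cm)
accuracy = correct.sum() / cm.sum()
precision = correct / cm.sum(axis=0)  # correct / everything predicted as the class
recall = correct / cm.sum(axis=1)     # correct / everything truly in the class

print("Accuracy: %.4f" % accuracy)
for label in (4, 6):
    print("class %d: precision %.2f, recall %.2f"
          % (label, precision[label], recall[label]))
```

For class 6 (Shirt) only 26 of the 59 true instances sit on the diagonal, which is where the low recall of 0.44 in the report comes from.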


Assess the performance on the test dataset

In [12]:

y_predict = clf.predict(X_test)

# Print performance details
accuracy = metrics.accuracy_score(y_test, y_predict) # , normalize=True, sample_weight=None
print("Accuracy: " +  str(accuracy))
print(metrics.classification_report(y_test, y_predict))

# Print confusion matrix
print("Confusion Matrix")
pd.crosstab(np.array(y_test), y_predict, rownames=['True'], colnames=['Predicted'], margins=True)

Accuracy: 0.815555555556
             precision    recall  f1-score   support

          0       0.72      0.82      0.77        85
          1       0.95      0.96      0.95        95
          2       0.66      0.67      0.67        79
          3       0.90      0.69      0.78        88
          4       0.65      0.76      0.70        97
          5       0.88      0.87      0.88        77
          6       0.56      0.46      0.51        84
          7       0.87      0.92      0.89        91
          8       0.94      0.96      0.95       106
          9       0.99      0.95      0.97        98

avg / total       0.82      0.82      0.81       900

Confusion Matrix


| True \ Predicted | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | All |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 70 | 0 | 1 | 1 | 1 | 0 | 11 | 0 | 1 | 0 | 85 |
| 1 | 0 | 91 | 3 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 95 |
| 2 | 2 | 0 | 53 | 2 | 17 | 0 | 5 | 0 | 0 | 0 | 79 |
| 3 | 6 | 5 | 2 | 61 | 9 | 0 | 4 | 0 | 1 | 0 | 88 |
| 4 | 1 | 0 | 8 | 2 | 74 | 0 | 11 | 0 | 1 | 0 | 97 |
| 5 | 0 | 0 | 0 | 0 | 0 | 67 | 0 | 9 | 0 | 1 | 77 |
| 6 | 18 | 0 | 11 | 1 | 12 | 0 | 39 | 0 | 3 | 0 | 84 |
| 7 | 0 | 0 | 0 | 0 | 0 | 7 | 0 | 84 | 0 | 0 | 91 |
| 8 | 0 | 0 | 2 | 1 | 0 | 1 | 0 | 0 | 102 | 0 | 106 |
| 9 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 4 | 0 | 93 | 98 |
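
To see which classes are being mixed up, the largest off-diagonal entries of the test confusion matrix above can be listed. This is a small sketch using the counts from that matrix and the class names from the `classes` dict defined when the dataset was loaded; the variable names are introduced here for illustration.

```python
import numpy as np

classes = {0: "T-shirt/top", 1: "Trouser", 2: "Pullover", 3: "Dress", 4: "Coat",
           5: "Sandal", 6: "Shirt", 7: "Sneaker", 8: "Bag", 9: "Ankle boot"}

# Test confusion matrix from the output above
# (rows: true class, columns: predicted class)
cm = np.array([
    [70, 0, 1, 1, 1, 0, 11, 0, 1, 0],
    [0, 91, 3, 0, 0, 0, 0, 0, 1, 0],
    [2, 0, 53, 2, 17, 0, 5, 0, 0, 0],
    [6, 5, 2, 61, 9, 0, 4, 0, 1, 0],
    [1, 0, 8, 2, 74, 0, 11, 0, 1, 0],
    [0, 0, 0, 0, 0, 67, 0, 9, 0, 1],
    [18, 0, 11, 1, 12, 0, 39, 0, 3, 0],
    [0, 0, 0, 0, 0, 7, 0, 84, 0, 0],
    [0, 0, 2, 1, 0, 1, 0, 0, 102, 0],
    [0, 0, 0, 0, 0, 1, 0, 4, 0, 93],
])

# Zero the diagonal so that only misclassifications remain
errors = cm.copy()
np.fill_diagonal(errors, 0)

# Flat indices of the three largest misclassification counts
top_flat = np.argsort(errors, axis=None)[::-1][:3]
top_confusions = []
for flat_index in top_flat:
    true_label, predicted_label = np.unravel_index(flat_index, errors.shape)
    top_confusions.append((classes[int(true_label)],
                           classes[int(predicted_label)],
                           int(errors[true_label, predicted_label])))

for true_name, predicted_name, count in top_confusions:
    print("%s predicted as %s: %d" % (true_name, predicted_name, count))
```

The three largest confusions all involve upper-body garments (Shirt, T-shirt/top, Pullover and Coat).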


## Cross Validation Experiment

Perform a 5-fold cross validation experiment to evaluate the performance of the SuperLearnerClassifier.

The mean cross validation accuracy (about 80.2%) is slightly lower than the validation (82.3%) and test (81.6%) accuracies reported above. 

In [13]:
# Changed to 5-fold due to computational expense
clf = SuperLearnerClassifier()
scores = cross_val_score(clf, X_train_plus_valid, y_train_plus_valid, cv=cv_folds)
print(scores)
print("Mean of the scores is:%f" %scores.mean())

[ 0.80614657  0.79146919  0.81666667  0.78229665  0.81294964]
Mean of the scores is:0.801906
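
A mean on its own hides how much the folds differ. As a small sketch, the fold scores printed above can be summarised with their spread as well, here using the sample standard deviation from the standard library.

```python
from statistics import mean, stdev

# Fold accuracies copied from the output above
scores = [0.80614657, 0.79146919, 0.81666667, 0.78229665, 0.81294964]

mean_score = mean(scores)
std_score = stdev(scores)  # sample standard deviation across the five folds

print("Mean accuracy: %.4f (+/- %.4f)" % (mean_score, std_score))
```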


#### Functions to initiate estimators

Two functions that make it easy to create a dictionary of base classifiers or stack classifiers. Each takes a list of classifier names as its parameter and returns a dictionary that maps each name to a newly created estimator. 
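
The mapping from names to estimators described above can also be sketched as a lookup table of factory callables. This is one possible design and not the notebook's implementation: `ESTIMATOR_FACTORIES` and `build_estimators` are hypothetical names introduced here, and only three of the estimator names are included for brevity.

```python
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Hypothetical registry: each name maps to a callable that builds a new estimator
ESTIMATOR_FACTORIES = {
    "Decision Tree": lambda: DecisionTreeClassifier(
        criterion="entropy", max_depth=47, min_samples_split=50),
    "K-Nearest-Neighbour": lambda: KNeighborsClassifier(
        n_neighbors=5, weights="distance"),
    "Naive Bayes": lambda: GaussianNB(),
}

def build_estimators(names):
    """Return a dict of {name: new estimator}, skipping unknown names with a message."""
    estimators = {}
    for name in names:
        factory = ESTIMATOR_FACTORIES.get(name)
        if factory is None:
            print("%s is currently not an option" % name)
            continue
        # Calling the factory creates a fresh, unfitted estimator each time
        estimators[name] = factory()
    return estimators

base_clfs = build_estimators(["Decision Tree", "Naive Bayes", "Unknown Model"])
print(sorted(base_clfs))
```

With a table like this, adding an estimator means adding one entry rather than another branch, at the cost of the hyperparameters being slightly less visible in the function body.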

In [14]:
def create_base_learners(list_of_esitmators):
    
    base_clfs_dict = {}
    
    for key in list_of_esitmators:
        if key == "Decision Tree":
            dt = tree.DecisionTreeClassifier(criterion= 'entropy',max_depth= 47, min_samples_split= 50)
            base_clfs_dict.update({key: dt})
        elif key == "Logistic Regression":
            logreg = LogisticRegression( C= 0.2,max_iter= 100,multi_class= 'ovr', solver= 'newton-cg')
            base_clfs_dict.update({key: logreg})
        elif key == "K-Nearest-Neighbour":
            knn = KNeighborsClassifier(n_neighbors= 5, weights = "distance")
            base_clfs_dict.update({key: knn})
        elif key == "SVC":
            svc = SVC(C= 1, gamma= 1, kernel= 'linear')
            base_clfs_dict.update({key: svc})
        elif key == "Random Forest":
            rf = RandomForestClassifier( max_features= 8, min_samples_split= 50, n_estimators= 400, n_jobs= -1)
            base_clfs_dict.update({key: rf})
        elif key == "Gradient Boosting":    
            gbk = GradientBoostingClassifier(learning_rate= 0.3, n_estimators= 200)
            base_clfs_dict.update({key: gbk})
        elif key == "Naive Bayes":
            gnb = GaussianNB()
            base_clfs_dict.update({key: gnb})
        elif key == "Ada Boost":
            ada = AdaBoostClassifier(base_estimator = tree.DecisionTreeClassifier(criterion="entropy", min_samples_leaf = 200),\
                         n_estimators=100) 
            base_clfs_dict.update({key: ada})
        elif key == "Extra Trees":
            etc = ExtraTreesClassifier(criterion= 'entropy', max_depth= 22, n_estimators= 150)
            base_clfs_dict.update({key: etc})
        elif key == "Multi Layer Perceptron":
            mlp = neural_network.MLPClassifier(alpha= 0.01, hidden_layer_sizes= 400)
            base_clfs_dict.update({key: mlp})
        elif key == "Bagging Classifer":
            bagging_c = BaggingClassifier(base_estimator = tree.DecisionTreeClassifier(criterion="entropy", min_samples_leaf = 50),\
                              n_estimators=10, n_jobs= -1)
            base_clfs_dict.update({key: bagging_c})
        else:
            print("Please make sure you have selected an available estimator, %s is currently not an option" %key)

    return base_clfs_dict
   

def create_stacked_learner(list_of_stack_estimator):
    
    stacked_clf_dict = {}
    
    for key in list_of_stack_estimator:
        if key == "Decision Tree":
            dt_stack = tree.DecisionTreeClassifier(criterion= 'entropy',max_depth= 47, min_samples_split= 50)
            stacked_clf_dict.update({key: dt_stack})
        elif key == "Logistic Regression":
            logreg_stack = LogisticRegression(C= 0.2,max_iter= 100,multi_class= 'ovr', solver= 'newton-cg')
            stacked_clf_dict.update({key: logreg_stack})
        elif key == "K-Nearest-Neighbour":
            knn_stack = KNeighborsClassifier()
            stacked_clf_dict.update({key: knn_stack})
        elif key == "SVC":
            svc_stack = SVC()
            stacked_clf_dict.update({key: svc_stack})
        elif key == "Random Forest":
            rf_stack = RandomForestClassifier(n_estimators=10)
            stacked_clf_dict.update({key: rf_stack})
        elif key == "Gradient Boosting":    
            gbk_stack = GradientBoostingClassifier(learning_rate= 0.3, n_estimators= 200)
            stacked_clf_dict.update({key: gbk_stack})
        elif key == "Naive Bayes":
            gnb_stack = GaussianNB()
            stacked_clf_dict.update({key: gnb_stack})
        elif key == "Ada Boost":
            ada_stack = AdaBoostClassifier() 
            stacked_clf_dict.update({key: ada_stack})
        elif key == "Extra Trees":
            etc_stack = ExtraTreesClassifier()
            stacked_clf_dict.update({key: etc_stack})
        elif key == "Multi Layer Perceptron":
            mlp_stack = neural_network.MLPClassifier(hidden_layer_sizes=(300, 100))
            stacked_clf_dict.update({key: mlp_stack})
        else:
            print("Please make sure you have selected an available stacked estimator, %s is currently not an option" %key)

    return stacked_clf_dict
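
The two factory functions above repeat the same `if`/`elif` pattern for every estimator name. As a hedged alternative, the sketch below keeps the estimators in a lookup table and clones them on request. The names `BASE_LEARNER_REGISTRY` and `build_learners` are hypothetical and not part of the notebook; only three of the estimators are listed to keep the example short.

```python
from sklearn.base import clone
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Hypothetical registry: estimator name -> unfitted prototype with the tuned parameters
BASE_LEARNER_REGISTRY = {
    "Decision Tree": DecisionTreeClassifier(criterion="entropy", max_depth=47, min_samples_split=50),
    "K-Nearest-Neighbour": KNeighborsClassifier(n_neighbors=5, weights="distance"),
    "Naive Bayes": GaussianNB(),
}


def build_learners(names, registry):
    """Return a dict of fresh, unfitted copies of the requested estimators."""
    learners = {}
    for name in names:
        if name not in registry:
            # Same behaviour as the original functions: warn and skip unknown names
            print("Please make sure you have selected an available estimator, "
                  "%s is currently not an option" % name)
            continue
        # clone() copies the parameters but none of the fitted state
        learners[name] = clone(registry[name])
    return learners


learners = build_learners(["Decision Tree", "Naive Bayes", "Unknown Model"], BASE_LEARNER_REGISTRY)
print(list(learners))
```

Adding a new estimator then only needs one new registry entry, and a second registry could hold the stack layer defaults.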

### Find optimal parameters for the base estimators

I ran the grid search below to find the best-scoring parameters for each of the base learners.

These are the parameters it found (anything not listed was left at its default):

- Decision Tree: {criterion='entropy', max_depth=47, min_samples_split=50}
- K-Nearest-Neighbour: {n_neighbors=5, weights='distance'}
- Logistic Regression: {C=0.2, max_iter=100, multi_class='ovr', solver='newton-cg'}
- Random Forest: {max_features=8, min_samples_split=50, n_estimators=400}
- SVC: {C=1, gamma=1, kernel='linear'}
- Gradient Boosting: {learning_rate=0.3, n_estimators=200}
- Ada Boost: {base_estimator=DecisionTreeClassifier(criterion='entropy', max_depth=6, min_samples_leaf=200), n_estimators=150}
- Extra Trees: {criterion='entropy', max_depth=22, n_estimators=150}
- Multi Layer Perceptron: {alpha=0.01, hidden_layer_sizes=400}

In [15]:
# Set up the parameter grid to search
# param_grid =    {'SVC': {'kernel': ['linear', 'rbf'],'C': [1, 10, 100, 1000], 'gamma': list(range(1, 10, 10))}, 
#                  'Decision Tree':{'criterion': [ "entropy", "gini"], 'max_depth': list(range(2,50, 5)),'min_samples_split': [200,100,50,300,400]},
#                  "Logistic Regression":{'multi_class': ['ovr'], 'C': [x / 10.0 for x in range(2, 21, 2)],'solver':['newton-cg', 'lbfgs', 'liblinear', 'sag'], 'max_iter':[100,500,1000]},
#                  "K-Nearest-Neighbour": {'n_neighbors': list(range(5, 60, 5)), 'weights': ['uniform', 'distance']},
#                  "Random Forest": {'n_estimators': list(range(100, 501, 50)),'max_features': list(range(2, 10, 2)), 'min_samples_split': [100,200,300,100,50] },
#                  "Gradient Boosting": {'n_estimators': [60,100,150,200], 'learning_rate': [0.8, 1.0,0.5,0.3,0.1] },
#                  "Ada Boost": {'n_estimators': list(range(50, 501, 50)),'base_estimator': [tree.DecisionTreeClassifier(criterion="entropy", max_depth = 6, min_samples_leaf = 200),]},
#                  "Bagging Classifer": {'n_estimators': list(range(50, 501, 50)),'base_estimator': [tree.DecisionTreeClassifier(criterion="entropy", max_depth = 6, min_samples_leaf = 200),tree.DecisionTreeClassifier()]},
#                  "Multi Layer Perceptron": {'hidden_layer_sizes': [(400), (400, 200), (400, 200, 100)], 'alpha': list(10.0 ** -np.arange(1, 7))},
#                  "Extra Trees": {'n_estimators': list(range(100, 501, 50)), 'criterion': [ "entropy", "gini"], 'max_depth': list(range(2,50, 5))}
#                 }

# #list of all the base estimators
# base_learners = ['Decision Tree', "Logistic Regression","K-Nearest-Neighbour","SVC","Random Forest","Gradient Boosting",\
#                  "Naive Bayes","Ada Boost","Extra Trees","Multi Layer Perceptron"]


# # instantiate the base classifiers and return them as a dict
# base_learners_dict = create_base_learners(base_learners)

# # dicts to store the best scores and best parameters of the base estimators
# model_tuned_params_list = {}
# grid_search_scores = {}

# # Iterate through all selected estimators and perform a grid search on each one
# print( "Performing Grid Search")
# for clf_name, clf in base_learners_dict.items():
#         print("Running GridSearchCV for %s." %clf_name)
#         # Get parameters for specific estimator
#         param_grid_estimator = param_grid[clf_name]
        
#         # Perform the search
#         grd = GridSearchCV(clf, param_grid_estimator, cv=cv_folds, verbose = 5, return_train_score=True ,n_jobs= -1)
#         grd.fit(X_train_plus_valid,y_train_plus_valid)
        
#         # Print and save the results
#         display(grd.best_params_)
#         model_tuned_params_list[clf_name] = grd.best_params_
#         grid_search_scores[clf_name] = grd.best_score_

## Comparing the Performance of Different Stack Layer Approaches

I compared the performance of the ensemble when the stack layer is trained on the base learners' predicted labels (label-based) and when it is trained on their predicted class probabilities (probability-based).

With the default Decision Tree stack estimator, the label-based stack layer scored slightly higher in cross-validation (mean accuracy 0.807 versus 0.789), while accuracy on the test set was almost identical (0.798 versus 0.796). This result varies with the stack layer estimator used: with logistic regression as the stack estimator, performance was much better when it was trained on the probability-based stack layer training set rather than the label-based one.
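
To make the two kinds of stack layer training set concrete, here is a minimal sketch built with `cross_val_predict`. It is not the `SuperLearnerClassifier` implementation: the dataset (`load_digits`), the two base learners, and the names `Z_label` and `Z_proba` are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Small stand-in dataset (assumption: any multi-class dataset would do)
X, y = load_digits(return_X_y=True)
X, y = X[:500], y[:500]
n_classes = len(np.unique(y))

base_clfs = {
    "Decision Tree": DecisionTreeClassifier(max_depth=8, random_state=0),
    "Naive Bayes": GaussianNB(),
}

label_columns, proba_columns = [], []
for name, clf in base_clfs.items():
    # Out-of-fold predictions, so the stack layer never sees a base learner's
    # output on data that base learner was trained on
    label_columns.append(cross_val_predict(clf, X, y, cv=3, method="predict"))
    proba_columns.append(cross_val_predict(clf, X, y, cv=3, method="predict_proba"))

# Label-based: one column per base learner
Z_label = np.column_stack(label_columns)
# Probability-based: n_classes columns per base learner
Z_proba = np.hstack(proba_columns)

print(Z_label.shape, Z_proba.shape)

# Either matrix can then be used to fit the stack layer estimator
stack_on_labels = DecisionTreeClassifier(random_state=0).fit(Z_label, y)
stack_on_probas = DecisionTreeClassifier(random_state=0).fit(Z_proba, y)
```

The probability-based matrix is wider and carries the base learners' confidence, which may explain why a linear model such as logistic regression benefits from it more than a tree does.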

In [16]:

probability_training = [True, False]
probability_training_scores = {}


print("Results where probability based stack layer is True")
clf = SuperLearnerClassifier(base_proba_layer=True)
scores = cross_val_score(clf, X_train_plus_valid, y_train_plus_valid, cv=cv_folds)
print("\n" + "Cross Validation Scores:")
print(scores)
print("\n" + "Mean of the scores where probability based stack layer is True :%f" %(scores.mean()))
probability_training_scores["True_mean_score"] = scores.mean()
clf.fit(X_train, y_train)


Results where probability based stack layer is True

Cross Validation Scores:
[ 0.77777778  0.76303318  0.80952381  0.78229665  0.81055156]

Mean of the scores where probability based stack layer is True :0.788637


SuperLearnerClassifier(base_clfs={'Decision Tree': DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=47,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=50,
            min_weight_fr...timators=150, n_jobs=1,
           oob_score=False, random_state=None, verbose=0, warm_start=False)},
            base_proba_layer=True, bootstrap_cv_stage=True,
            bootstrap_full_training=True, cv=10,
            stacked_clf={'Decision Tree': DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=47,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=50,
            min_weight_fraction_leaf=0.0, presort=False, random_state=None,
            splitter='best')},
            use_training_data=False, verbose=0)

In [19]:
y_predict = clf.predict(X_test)

# Print performance details
accuracy = metrics.accuracy_score(y_test, y_predict) # , normalize=True, sample_weight=None
print("\n" + "Accuracy: " +  str(accuracy))
print(metrics.classification_report(y_test, y_predict))

# Print confusion matrix
print("Confusion Matrix")
pd.crosstab(np.array(y_test), y_predict, rownames=['True'], colnames=['Predicted'], margins=True)


Accuracy: 0.795555555556
             precision    recall  f1-score   support

          0       0.76      0.84      0.79        85
          1       0.98      0.94      0.96        95
          2       0.58      0.66      0.62        79
          3       0.76      0.89      0.82        88
          4       0.70      0.53      0.60        97
          5       0.80      0.94      0.86        77
          6       0.54      0.51      0.52        84
          7       0.81      0.87      0.84        91
          8       0.98      0.95      0.97       106
          9       1.00      0.82      0.90        98

avg / total       0.80      0.80      0.79       900

Confusion Matrix


| True \ Predicted | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | All |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 71 | 0 | 1 | 8 | 0 | 1 | 4 | 0 | 0 | 0 | 85 |
| 1 | 0 | 89 | 2 | 3 | 0 | 0 | 1 | 0 | 0 | 0 | 95 |
| 2 | 0 | 0 | 52 | 2 | 15 | 2 | 8 | 0 | 0 | 0 | 79 |
| 3 | 3 | 2 | 2 | 78 | 0 | 1 | 2 | 0 | 0 | 0 | 88 |
| 4 | 0 | 0 | 19 | 8 | 51 | 0 | 19 | 0 | 0 | 0 | 97 |
| 5 | 0 | 0 | 0 | 0 | 0 | 72 | 0 | 4 | 1 | 0 | 77 |
| 6 | 19 | 0 | 12 | 3 | 6 | 1 | 43 | 0 | 0 | 0 | 84 |
| 7 | 0 | 0 | 0 | 0 | 0 | 11 | 0 | 79 | 1 | 0 | 91 |
| 8 | 0 | 0 | 1 | 1 | 0 | 0 | 3 | 0 | 101 | 0 | 106 |
| 9 | 1 | 0 | 0 | 0 | 1 | 2 | 0 | 14 | 0 | 80 | 98 |
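
The confusion matrix above can also be summarised numerically. The sketch below copies the counts from that table into a NumPy array (without the `All` column) and recomputes the accuracy, the per-class recall, and the most frequent misclassifications. The variable names are illustrative only.

```python
import numpy as np

# Rows are true classes 0-9, columns are predicted classes 0-9
cm = np.array([
    [71,  0,  1,  8,  0,  1,  4,  0,   0,  0],
    [ 0, 89,  2,  3,  0,  0,  1,  0,   0,  0],
    [ 0,  0, 52,  2, 15,  2,  8,  0,   0,  0],
    [ 3,  2,  2, 78,  0,  1,  2,  0,   0,  0],
    [ 0,  0, 19,  8, 51,  0, 19,  0,   0,  0],
    [ 0,  0,  0,  0,  0, 72,  0,  4,   1,  0],
    [19,  0, 12,  3,  6,  1, 43,  0,   0,  0],
    [ 0,  0,  0,  0,  0, 11,  0, 79,   1,  0],
    [ 0,  0,  1,  1,  0,  0,  3,  0, 101,  0],
    [ 1,  0,  0,  0,  1,  2,  0, 14,   0, 80],
])

# Accuracy is the share of samples on the diagonal
accuracy = np.trace(cm) / cm.sum()
# Recall per class is the diagonal divided by the row totals (the support)
recall = np.diag(cm) / cm.sum(axis=1)

# Zero the diagonal to look only at the errors
errors = cm.copy()
np.fill_diagonal(errors, 0)
largest = np.argsort(errors, axis=None)[::-1][:3]

print("Accuracy: %.4f" % accuracy)
print("Recall per class:", np.round(recall, 2))
for true_label, predicted_label in zip(*np.unravel_index(largest, errors.shape)):
    print("true %d predicted as %d: %d times"
          % (true_label, predicted_label, errors[true_label, predicted_label]))
```

The recomputed values agree with the accuracy and recall columns printed in the classification report, and the largest off-diagonal counts involve classes 0, 2, 4 and 6, which are also the classes with the lowest f1-scores.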


In [20]:

print("Results where probability based stack layer is False")
clf = SuperLearnerClassifier(base_proba_layer=False)
scores = cross_val_score(clf, X_train_plus_valid, y_train_plus_valid, cv=cv_folds)
print("\n" + "Cross Validation Scores:")
print(scores)
print("\n" + "Mean of the scores where probability based stack layer is False :%f" %(scores.mean()))
probability_training_scores["False_mean_score"] = scores.mean()
clf.fit(X_train, y_train)


Results where probability based stack layer is False

Cross Validation Scores:
[ 0.82033097  0.79383886  0.82619048  0.77751196  0.81534772]

Mean of the scores where probability based stack layer is False :0.806644


SuperLearnerClassifier(base_clfs={'Decision Tree': DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=47,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=50,
            min_weight_fr...timators=150, n_jobs=1,
           oob_score=False, random_state=None, verbose=0, warm_start=False)},
            base_proba_layer=False, bootstrap_cv_stage=True,
            bootstrap_full_training=True, cv=10,
            stacked_clf={'Decision Tree': DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=47,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=50,
            min_weight_fraction_leaf=0.0, presort=False, random_state=None,
            splitter='best')},
            use_training_data=False, verbose=0)

In [23]:
y_predict = clf.predict(X_test)

# Print performance details
accuracy = metrics.accuracy_score(y_test, y_predict) # , normalize=True, sample_weight=None
print("\n" + "Accuracy: " +  str(accuracy))
print(metrics.classification_report(y_test, y_predict))

# Print confusion matrix
print("Confusion Matrix")
pd.crosstab(np.array(y_test), y_predict, rownames=['True'], colnames=['Predicted'], margins=True)


Accuracy: 0.797777777778
             precision    recall  f1-score   support

          0       0.71      0.85      0.77        85
          1       0.95      0.96      0.95        95
          2       0.62      0.48      0.54        79
          3       0.86      0.74      0.79        88
          4       0.57      0.78      0.66        97
          5       0.81      0.91      0.86        77
          6       0.66      0.46      0.55        84
          7       0.81      0.91      0.86        91
          8       0.99      0.93      0.96       106
          9       0.99      0.87      0.92        98

avg / total       0.81      0.80      0.80       900

Confusion Matrix


| True \ Predicted | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | All |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 72 | 0 | 1 | 0 | 3 | 0 | 8 | 0 | 0 | 1 | 85 |
| 1 | 0 | 91 | 0 | 1 | 2 | 0 | 0 | 0 | 1 | 0 | 95 |
| 2 | 2 | 0 | 38 | 6 | 27 | 0 | 6 | 0 | 0 | 0 | 79 |
| 3 | 6 | 5 | 0 | 65 | 12 | 0 | 0 | 0 | 0 | 0 | 88 |
| 4 | 1 | 0 | 11 | 2 | 76 | 1 | 6 | 0 | 0 | 0 | 97 |
| 5 | 0 | 0 | 0 | 0 | 0 | 70 | 0 | 7 | 0 | 0 | 77 |
| 6 | 20 | 0 | 11 | 1 | 11 | 2 | 39 | 0 | 0 | 0 | 84 |
| 7 | 0 | 0 | 0 | 0 | 0 | 8 | 0 | 83 | 0 | 0 | 91 |
| 8 | 0 | 0 | 0 | 1 | 2 | 4 | 0 | 0 | 99 | 0 | 106 |
| 9 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 12 | 0 | 85 | 98 |


## Grid Search Through SuperLearnerClassifier Architectures & Parameters

I performed a grid search experiment to determine the best architecture and hyper-parameter values for the SuperLearnerClassifier on the Fashion-MNIST classification problem.

Due to computational limits I had to reduce the search space for the grid search. I had originally hoped to evaluate around 32 to 60 different combinations, but this would have taken more than 6 hours (300 fits). I also had to reduce the sample of the dataset to save time. I expect the overall performance of the classifier would have been better with a larger sample, but this was limited by the available computational power.

I would have liked to evaluate whether bootstrapping the full training set when fitting the base learners improves performance, and whether different types of stack learners would boost performance.
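
The cost of a grid search is the number of parameter combinations multiplied by the number of cross-validation folds. As a rough sketch, `ParameterGrid` can count the combinations before anything is fitted. The string values below are placeholders standing in for the estimator dictionaries, and the 5 folds are taken from the grid search log in this notebook.

```python
from sklearn.model_selection import ParameterGrid

# Placeholder values: only the number of options per parameter matters here
grid = {
    "base_clfs": ["combination1", "combination2"],
    "stacked_clf": ["Decision Tree", "Logistic Regression"],
    "base_proba_layer": [True, False],
    "bootstrap_full_training": [True],
}
n_folds = 5

n_candidates = len(ParameterGrid(grid))   # 2 * 2 * 2 * 1
n_fits = n_candidates * n_folds

print("%d candidates x %d folds = %d fits" % (n_candidates, n_folds, n_fits))

# The larger search originally planned: 60 combinations with the same 5 folds
print("60 candidates x %d folds = %d fits" % (n_folds, 60 * n_folds))
```

With `refit=True` (the `GridSearchCV` default) one more fit on the whole training set follows the search, and every fit of a stacked ensemble trains all of its base learners, which is why each candidate is so expensive.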


In [37]:
# Taken from http://scikit-learn.org/stable/auto_examples/model_selection/plot_randomized_search.html
def report(results, n_top=3):
    for i in range(1, n_top + 1):
        candidates = np.flatnonzero(results['rank_test_score'] == i)
        for candidate in candidates:
            print("Model with rank: {0}".format(i))
            print("Mean validation score: {0:.3f} (std: {1:.3f})".format(
                  results['mean_test_score'][candidate],
                  results['std_test_score'][candidate]))
            print("Parameters: {0}".format(results['params'][candidate]))
            print("")
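
As an alternative to printing the top candidates one by one, `cv_results_` can be loaded into a pandas DataFrame and sorted by rank. The sketch below is self-contained and uses a small stand-in search (a decision tree on the iris data), which is an assumption for illustration and not the search run in this notebook.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Small stand-in search so the example runs in well under a second
X, y = load_iris(return_X_y=True)
search = GridSearchCV(DecisionTreeClassifier(random_state=0),
                      param_grid={"max_depth": [1, 2, 3]}, cv=3)
search.fit(X, y)

# cv_results_ is a dict of equal-length arrays, so it maps directly onto a DataFrame
results = pd.DataFrame(search.cv_results_)
columns = ["rank_test_score", "mean_test_score", "std_test_score", "params"]
top = results.sort_values("rank_test_score")[columns].head(3)

print(top.to_string(index=False))
```

The same columns are the ones `report` reads, so both views show the same information.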

I had to drop some of the combinations due to time constraints: running all the combinations listed here on a 5% sample of the dataset would have taken over 12 hours.


In [38]:

base_learner_combination1 = create_base_learners(["Decision Tree", "Logistic Regression",\
                                                  "K-Nearest-Neighbour", "SVC", "Random Forest",\
                                         'Multi Layer Perceptron', 'Ada Boost', 'Bagging Classifer', 'Naive Bayes', "Extra Trees","Gradient Boosting"])

base_learner_combination2 = create_base_learners([ "Logistic Regression", "Random Forest", 'Multi Layer Perceptron', 'Gradient Boosting',\
                                         'Bagging Classifer', "SVC"])

# base_learner_combination3 = create_base_learners(["Decision Tree", "Logistic Regression","SVC"])

# base_learner_combination4 = create_base_learners(["Decision Tree", "Logistic Regression","SVC", "Random Forest",'Gradient Boosting',\
#                                          'Bagging Classifer','Extra Trees'])

# I had to drop some of the combinations due to time constraints; running all the combinations listed here on a 5% sample of the dataset would have taken over 12 hours

stack_learner1 = create_stacked_learner(["Decision Tree"])
stack_learner2 = create_stacked_learner(["Logistic Regression"])
# stack_learner3 = create_stacked_learner(["Random Forest"])
# stack_learner4 = create_stacked_learner(['Multi Layer Perceptron'])




super_learner_params = {"base_clfs": [base_learner_combination1, base_learner_combination2],\
                        "stacked_clf" : [stack_learner1,stack_learner2], \
                        "base_proba_layer" : [True, False], "bootstrap_full_training" : [True]} 


In [39]:
# Create the gridsearch object
grd = GridSearchCV(SuperLearnerClassifier() ,param_grid= super_learner_params, verbose=2, cv=cv_folds)

#set start time
start = time()
#Fit and run the gridsearch 
grd.fit(X_train_plus_valid, y_train_plus_valid)

print("GridSearchCV took %.2f seconds for %d candidate parameter settings."
      % (time() - start, len(grd.cv_results_['params'])))



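# display(grd.best_params_)
# model_tuned_params_list is only created in the commented-out base learner grid search,
# so create it here if that cell was not run
try:
    model_tuned_params_list
except NameError:
    model_tuned_params_list = {}
model_tuned_params_list["Tuned SuperLearner"] = grd.best_params_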

Fitting 5 folds for each of 8 candidates, totalling 40 fits
[CV] base_clfs={'Decision Tree': DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=47,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=50,
            min_weight_fraction_leaf=0.0, presort=False, random_state=None,
            splitter='best'), 'Logistic Regression': LogisticRegression(C=0.2, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='newton-cg', tol=0.0001,
          verbose=0, warm_start=False), 'K-Nearest-Neighbour': KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=5, p=2,
           weights='distance'), 'SVC': SVC(C=1, cache_size=200, class_weight=None, coef0=0.0,
  decision_fun

[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:  2.9min remaining:    0.0s


[CV]  base_clfs={'Decision Tree': DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=47,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=50,
            min_weight_fraction_leaf=0.0, presort=False, random_state=None,
            splitter='best'), 'Logistic Regression': LogisticRegression(C=0.2, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='newton-cg', tol=0.0001,
          verbose=0, warm_start=False), 'K-Nearest-Neighbour': KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=5, p=2,
           weights='distance'), 'SVC': SVC(C=1, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma=1, kernel='linear',
  ma

[CV]  base_clfs={'Decision Tree': DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=47,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=50,
            min_weight_fraction_leaf=0.0, presort=False, random_state=None,
            splitter='best'), 'Logistic Regression': LogisticRegression(C=0.2, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='newton-cg', tol=0.0001,
          verbose=0, warm_start=False), 'K-Nearest-Neighbour': KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=5, p=2,
           weights='distance'), 'SVC': SVC(C=1, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma=1, kernel='linear',
  ma

[CV]  base_clfs={'Decision Tree': DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=47,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=50,
            min_weight_fraction_leaf=0.0, presort=False, random_state=None,
            splitter='best'), 'Logistic Regression': LogisticRegression(C=0.2, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='newton-cg', tol=0.0001,
          verbose=0, warm_start=False), 'K-Nearest-Neighbour': KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=5, p=2,
           weights='distance'), 'SVC': SVC(C=1, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma=1, kernel='linear',
  ma

[CV]  base_clfs={'Decision Tree': DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=47,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=50,
            min_weight_fraction_leaf=0.0, presort=False, random_state=None,
            splitter='best'), 'Logistic Regression': LogisticRegression(C=0.2, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='newton-cg', tol=0.0001,
          verbose=0, warm_start=False), 'K-Nearest-Neighbour': KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=5, p=2,
           weights='distance'), 'SVC': SVC(C=1, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma=1, kernel='linear',
  ma

[CV]  base_clfs={'Decision Tree': DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=47,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=50,
            min_weight_fraction_leaf=0.0, presort=False, random_state=None,
            splitter='best'), 'Logistic Regression': LogisticRegression(C=0.2, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='newton-cg', tol=0.0001,
          verbose=0, warm_start=False), 'K-Nearest-Neighbour': KNeighborsClassifier(algorithm='auto', leaf_size=30, metric='minkowski',
           metric_params=None, n_jobs=1, n_neighbors=5, p=2,
           weights='distance'), 'SVC': SVC(C=1, cache_size=200, class_weight=None, coef0=0.0,
  decision_function_shape='ovr', degree=3, gamma=1, kernel='linear',
  ma

[CV]  base_clfs={'Logistic Regression': LogisticRegression(C=0.2, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='newton-cg', tol=0.0001,
          verbose=0, warm_start=False), 'Random Forest': RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features=8, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=50,
            min_weight_fraction_leaf=0.0, n_estimators=400, n_jobs=-1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False), 'Multi Layer Perceptron': MLPClassifier(activation='relu', alpha=0.01, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=400, learning_rate='constant',
       learning_rate_init=0.001, max_ite

[Parallel(n_jobs=1)]: Done  40 out of  40 | elapsed: 105.0min finished


GridSearchCV took 6548.38 seconds for 8 candidate parameter settings.


Evaluate the performance of the model selected by the grid search on a hold-out dataset

The best combination chosen was:
    
base learners: "Logistic Regression", "Random Forest", 'Multi Layer Perceptron', 'Gradient Boosting','Bagging Classifer', "SVC"

stack_learner: "Logistic Regression"

Probability stack layer: True


In [40]:
report(grd.cv_results_)

Model with rank: 1
Mean validation score: 0.813 (std: 0.018)
Parameters: {'base_clfs': {'Logistic Regression': LogisticRegression(C=0.2, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='newton-cg', tol=0.0001,
          verbose=0, warm_start=False), 'Random Forest': RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',
            max_depth=None, max_features=8, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=50,
            min_weight_fraction_leaf=0.0, n_estimators=400, n_jobs=-1,
            oob_score=False, random_state=None, verbose=0,
            warm_start=False), 'Multi Layer Perceptron': MLPClassifier(activation='relu', alpha=0.01, batch_size='auto', beta_1=0.9,
       beta_2=0.999, early_stopping=False, epsilon=1e-08,
       hidden_layer_sizes=

In [41]:
# load hold out test dataset
test_dataset = pd.read_csv('fashion-mnist_test.csv')
# Separate the target feature
test_X = test_dataset[test_dataset.columns[1:]]
test_Y = np.array(test_dataset["label"])
# normalise the data
test_X = test_X/255

In [42]:
# create and fit the tuned classifier
clf_tuned = SuperLearnerClassifier(base_clfs=grd.best_params_['base_clfs'],
                                   stacked_clf=grd.best_params_['stacked_clf'],
                                   base_proba_layer=grd.best_params_['base_proba_layer'])
clf_tuned.fit(X, y)

SuperLearnerClassifier(base_clfs={'Logistic Regression': LogisticRegression(C=0.2, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='newton-cg', tol=0.0001,
          verbose=0, warm_start=False), ...r',
  max_iter=-1, probability=True, random_state=None, shrinking=True,
  tol=0.001, verbose=False)},
            base_proba_layer=True, bootstrap_cv_stage=True,
            bootstrap_full_training=True, cv=10,
            stacked_clf={'Logistic Regression': LogisticRegression(C=0.2, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='ovr', n_jobs=1,
          penalty='l2', random_state=None, solver='newton-cg', tol=0.0001,
          verbose=0, warm_start=False)},
            use_training_data=False, verbose=0)

We can see that the performance on the hold-out set (accuracy 0.823) is quite close to the mean validation score from the grid search (0.813). Again, I feel that the classifier would have performed much better with a larger sample of the dataset.

Again we can see that the classifier especially struggles with class label 6 (recall 0.43), mainly misclassifying it as class 0 (241 of 1,000 cases).


In [44]:
# Make a set of predictions for the test data
y_predict = clf_tuned.predict(test_X)

# Print performance details
accuracy = metrics.accuracy_score(test_Y, y_predict) # , normalize=True, sample_weight=None
print("Accuracy: " +  str(accuracy))
print(metrics.classification_report(test_Y, y_predict))

# Print confusion matrix
print("Confusion Matrix")
pd.crosstab(np.array(test_Y), y_predict, rownames=['True'], colnames=['Predicted'], margins=True)

Accuracy: 0.8229
             precision    recall  f1-score   support

          0       0.71      0.85      0.78      1000
          1       0.96      0.94      0.95      1000
          2       0.69      0.72      0.70      1000
          3       0.83      0.84      0.84      1000
          4       0.69      0.79      0.74      1000
          5       0.91      0.91      0.91      1000
          6       0.71      0.43      0.54      1000
          7       0.89      0.87      0.88      1000
          8       0.93      0.95      0.94      1000
          9       0.90      0.92      0.91      1000

avg / total       0.82      0.82      0.82     10000

Confusion Matrix


| True \ Predicted | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | All |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 850 | 6 | 25 | 57 | 2 | 4 | 37 | 1 | 18 | 0 | 1000 |
| 1 | 5 | 943 | 20 | 24 | 3 | 0 | 5 | 0 | 0 | 0 | 1000 |
| 2 | 35 | 2 | 720 | 11 | 165 | 1 | 51 | 0 | 15 | 0 | 1000 |
| 3 | 55 | 24 | 21 | 844 | 45 | 2 | 8 | 0 | 1 | 0 | 1000 |
| 4 | 3 | 2 | 106 | 33 | 790 | 1 | 60 | 0 | 5 | 0 | 1000 |
| 5 | 1 | 0 | 1 | 0 | 2 | 914 | 2 | 49 | 6 | 25 | 1000 |
| 6 | 241 | 5 | 134 | 36 | 132 | 1 | 429 | 0 | 22 | 0 | 1000 |
| 7 | 0 | 0 | 0 | 0 | 0 | 50 | 0 | 868 | 0 | 82 | 1000 |
| 8 | 2 | 0 | 16 | 11 | 5 | 4 | 6 | 6 | 949 | 1 | 1000 |
| 9 | 0 | 0 | 0 | 0 | 0 | 24 | 3 | 51 | 0 | 922 | 1000 |
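
The weakness on class 6 can be checked directly from the hold-out confusion matrix above. The following is a minimal sketch, with the matrix values copied from that output; the variable names are illustrative and not part of the notebook.

```python
import numpy as np

# Hold-out confusion matrix copied from the output above
# (rows = true label, columns = predicted label, "All" margin omitted).
cm = np.array([
    [850,   6,  25,  57,   2,   4,  37,   1,  18,   0],
    [  5, 943,  20,  24,   3,   0,   5,   0,   0,   0],
    [ 35,   2, 720,  11, 165,   1,  51,   0,  15,   0],
    [ 55,  24,  21, 844,  45,   2,   8,   0,   1,   0],
    [  3,   2, 106,  33, 790,   1,  60,   0,   5,   0],
    [  1,   0,   1,   0,   2, 914,   2,  49,   6,  25],
    [241,   5, 134,  36, 132,   1, 429,   0,  22,   0],
    [  0,   0,   0,   0,   0,  50,   0, 868,   0,  82],
    [  2,   0,  16,  11,   5,   4,   6,   6, 949,   1],
    [  0,   0,   0,   0,   0,  24,   3,  51,   0, 922],
])

# Overall accuracy is the share of samples on the diagonal.
accuracy = cm.trace() / cm.sum()

# Per-class recall: correct predictions divided by the row total.
recall = cm.diagonal() / cm.sum(axis=1)
worst_class = int(recall.argmin())

# Largest off-diagonal cell = most frequent single confusion.
off_diagonal = cm.copy()
np.fill_diagonal(off_diagonal, 0)
true_label, predicted_label = np.unravel_index(off_diagonal.argmax(), off_diagonal.shape)

print("Accuracy:", accuracy)
print("Lowest recall: class", worst_class, "with recall", round(recall[worst_class], 3))
print("Most frequent confusion: true", int(true_label), "predicted as", int(predicted_label))
```

The same three steps apply to any of the confusion matrices in this notebook once the `All` margin is dropped.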


## Evaluating the Impact of Adding Original Descriptive Features at the Stack Layer

This section evaluates the impact of adding the original descriptive features at the stack layer.

From analysing the results, adding the original descriptive features to the stack layer reduces the overall performance of the classifier: the mean cross-validation score is 72.2% when they are added, compared to 78.8% when they are not. The same pattern holds on the test split, with an accuracy of 68.9% versus 77.2%.
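
To make the comparison concrete, here is a minimal sketch of what "adding the original descriptive features at the stack layer" means for the stack learner's input. The helper `build_stack_input` is hypothetical and is not the notebook's implementation of `use_training_data`; the number of base learners (6) and the random data are assumptions for illustration only.

```python
import numpy as np

def build_stack_input(base_predictions, X=None):
    """Hypothetical helper: assemble the training matrix for the stack layer.

    base_predictions: (n_samples, n_base_learners) array of base-learner outputs.
    X: optional original descriptive features, appended column-wise when given.
    """
    Z = np.asarray(base_predictions)
    if X is not None:
        Z = np.hstack([Z, np.asarray(X)])
    return Z

rng = np.random.default_rng(0)
n_samples, n_base_learners, n_pixels = 100, 6, 784  # 784 = 28 x 28 Fashion-MNIST pixels

# Stand-ins for base-learner label predictions and normalised pixel features.
base_predictions = rng.integers(0, 10, size=(n_samples, n_base_learners))
X = rng.random((n_samples, n_pixels))

Z_without = build_stack_input(base_predictions)
Z_with = build_stack_input(base_predictions, X)

print(Z_without.shape)  # base-learner columns only
print(Z_with.shape)     # base-learner columns plus the original pixels
```

With the pixels appended, the stack learner has to find a handful of informative prediction columns among several hundred raw features. That may be one reason for the lower score, although the notebook does not test this explanation.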

In [45]:
# store the scores 
org_training_added_scores = {}


print("Results where Original Descriptive Features were added to the stack layer")
clf = SuperLearnerClassifier(use_training_data=True)
scores = cross_val_score(clf, X_train_plus_valid, y_train_plus_valid, cv=cv_folds)
org_training_added_scores["desciptive__f_added_mean_score"] = scores.mean()
print("\n" + "Cross Validation Scores:")
print(scores)
print("\n" + "Mean of the scores where where Original Descriptive Features is added is True :%f" %(scores.mean()))
clf.fit(X_train, y_train)


Results where Original Descriptive Features were added to the stack layer

Cross Validation Scores:
[ 0.74269006  0.71345029  0.73529412  0.70909091  0.71165644]

Mean of the scores where where Original Descriptive Features is added is True :0.722436


SuperLearnerClassifier(base_clfs={'Decision Tree': DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=47,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=50,
            min_weight_fr...timators=150, n_jobs=1,
           oob_score=False, random_state=None, verbose=0, warm_start=False)},
            base_proba_layer=False, bootstrap_cv_stage=True,
            bootstrap_full_training=True, cv=10,
            stacked_clf={'Decision Tree': DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=47,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=50,
            min_weight_fraction_leaf=0.0, presort=False, random_state=None,
            splitter='best')},
            use_training_data=True, verbose=0)

In [46]:
y_predict = clf.predict(X_test)



# Print performance details
accuracy = metrics.accuracy_score(y_test, y_predict) # , normalize=True, sample_weight=None
print("\n" + "Accuracy: " +  str(accuracy))
print(metrics.classification_report(y_test, y_predict))

# Print confusion matrix
print("Confusion Matrix")
pd.crosstab(np.array(y_test), y_predict, rownames=['True'], colnames=['Predicted'], margins=True)



Accuracy: 0.688888888889
             precision    recall  f1-score   support

          0       0.81      0.71      0.76        42
          1       0.81      0.89      0.85        28
          2       0.67      0.56      0.61        43
          3       0.71      0.60      0.65        40
          4       0.34      0.77      0.47        26
          5       0.90      0.78      0.84        46
          6       0.80      0.30      0.43        27
          7       0.61      0.65      0.63        34
          8       0.81      0.92      0.86        37
          9       0.71      0.68      0.69        37

avg / total       0.73      0.69      0.69       360

Confusion Matrix


| True \ Predicted | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | All |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 30 | 2 | 1 | 3 | 4 | 0 | 0 | 0 | 2 | 0 | 42 |
| 1 | 0 | 25 | 0 | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 28 |
| 2 | 2 | 0 | 24 | 1 | 16 | 0 | 0 | 0 | 0 | 0 | 43 |
| 3 | 0 | 3 | 3 | 24 | 10 | 0 | 0 | 0 | 0 | 0 | 40 |
| 4 | 0 | 0 | 5 | 0 | 20 | 0 | 1 | 0 | 0 | 0 | 26 |
| 5 | 0 | 0 | 1 | 0 | 0 | 36 | 0 | 6 | 2 | 1 | 46 |
| 6 | 5 | 1 | 2 | 3 | 8 | 0 | 8 | 0 | 0 | 0 | 27 |
| 7 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 22 | 0 | 9 | 34 |
| 8 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 34 | 0 | 37 |
| 9 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 7 | 4 | 25 | 37 |


In [47]:

print("Results where Original Descriptive Features were NOT added to the stack layer")
clf = SuperLearnerClassifier(use_training_data=False)
scores = cross_val_score(clf, X_train_plus_valid, y_train_plus_valid, cv=cv_folds)
org_training_added_scores["desciptive__f_NOT_added_mean_score"] = scores.mean()
print("\n" + "Cross Validation Scores:")
print(scores)
print("\n" + "Mean of the scores where where Original Descriptive Features is added is False :%f" %(scores.mean()))
clf.fit(X_train, y_train)


Results where Original Descriptive Features were NOT added to the stack layer

Cross Validation Scores:
[ 0.79532164  0.78947368  0.8         0.77575758  0.7791411 ]

Mean of the scores where where Original Descriptive Features is added is False :0.787939


SuperLearnerClassifier(base_clfs={'Decision Tree': DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=47,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=50,
            min_weight_fr...timators=150, n_jobs=1,
           oob_score=False, random_state=None, verbose=0, warm_start=False)},
            base_proba_layer=False, bootstrap_cv_stage=True,
            bootstrap_full_training=True, cv=10,
            stacked_clf={'Decision Tree': DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=47,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=50,
            min_weight_fraction_leaf=0.0, presort=False, random_state=None,
            splitter='best')},
            use_training_data=False, verbose=0)

In [48]:
y_predict = clf.predict(X_test)

# Print performance details
accuracy = metrics.accuracy_score(y_test, y_predict) # , normalize=True, sample_weight=None
print("\n" + "Accuracy: " +  str(accuracy))
print(metrics.classification_report(y_test, y_predict))

# Print confusion matrix
print("Confusion Matrix")
pd.crosstab(np.array(y_test), y_predict, rownames=['True'], colnames=['Predicted'], margins=True)



Accuracy: 0.772222222222
             precision    recall  f1-score   support

          0       0.66      0.83      0.74        42
          1       0.76      0.93      0.84        28
          2       0.72      0.72      0.72        43
          3       0.86      0.62      0.72        40
          4       0.50      0.50      0.50        26
          5       0.91      0.85      0.88        46
          6       0.48      0.44      0.46        27
          7       0.84      0.91      0.87        34
          8       0.97      0.86      0.91        37
          9       0.92      0.92      0.92        37

avg / total       0.78      0.77      0.77       360

Confusion Matrix


| True \ Predicted | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | All |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 35 | 2 | 1 | 1 | 2 | 0 | 1 | 0 | 0 | 0 | 42 |
| 1 | 0 | 26 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 28 |
| 2 | 0 | 1 | 31 | 0 | 6 | 0 | 5 | 0 | 0 | 0 | 43 |
| 3 | 9 | 2 | 2 | 25 | 1 | 0 | 1 | 0 | 0 | 0 | 40 |
| 4 | 0 | 0 | 6 | 1 | 13 | 0 | 6 | 0 | 0 | 0 | 26 |
| 5 | 0 | 1 | 0 | 0 | 0 | 39 | 0 | 3 | 1 | 2 | 46 |
| 6 | 9 | 2 | 2 | 0 | 2 | 0 | 12 | 0 | 0 | 0 | 27 |
| 7 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 31 | 0 | 1 | 34 |
| 8 | 0 | 0 | 0 | 0 | 2 | 2 | 0 | 1 | 32 | 0 | 37 |
| 9 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 2 | 0 | 34 | 37 |


## Explore the Ensemble Model

Perform an analysis to investigate the strength of the base estimators and the strengths of the correlations between them.

In [49]:
# create all base estimators
base_learners_list = ["Decision Tree", "Logistic Regression",
                      "K-Nearest-Neighbour", "SVC", "Random Forest",
                      'Multi Layer Perceptron', 'Ada Boost', 'Bagging Classifer',
                      'Naive Bayes', "Extra Trees", 'Gradient Boosting']



base_learners = create_base_learners(base_learners_list)

stack_learner = create_stacked_learner(["Decision Tree"])

# create the super learner classifier and fit it
clf = SuperLearnerClassifier(base_clfs= base_learners, stacked_clf= stack_learner, verbose=0)
clf.fit(X_train_plus_valid, y_train_plus_valid)

SuperLearnerClassifier(base_clfs={'Decision Tree': DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=47,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=50,
            min_weight_fr...       presort='auto', random_state=None, subsample=1.0, verbose=0,
              warm_start=False)},
            base_proba_layer=False, bootstrap_cv_stage=True,
            bootstrap_full_training=True, cv=10,
            stacked_clf={'Decision Tree': DecisionTreeClassifier(class_weight=None, criterion='entropy', max_depth=47,
            max_features=None, max_leaf_nodes=None,
            min_impurity_decrease=0.0, min_impurity_split=None,
            min_samples_leaf=1, min_samples_split=50,
            min_weight_fraction_leaf=0.0, presort=False, random_state=None,
            splitter='best')},
            use_training_data=False, verbose=0)

In [50]:
# Make a set of predictions for the test data
y_predict = clf.predict(test_X)

# Print performance details
accuracy = metrics.accuracy_score(test_Y, y_predict) # , normalize=True, sample_weight=None
print("Accuracy: " +  str(accuracy))
print(metrics.classification_report(test_Y, y_predict))

# Print confusion matrix
print("Confusion Matrix")
pd.crosstab(np.array(test_Y), y_predict, rownames=['True'], colnames=['Predicted'], margins=True)

Accuracy: 0.7697
             precision    recall  f1-score   support

          0       0.76      0.75      0.75      1000
          1       0.96      0.94      0.95      1000
          2       0.71      0.52      0.60      1000
          3       0.73      0.89      0.81      1000
          4       0.56      0.76      0.64      1000
          5       0.81      0.88      0.84      1000
          6       0.55      0.38      0.45      1000
          7       0.82      0.83      0.82      1000
          8       0.92      0.89      0.91      1000
          9       0.89      0.87      0.88      1000

avg / total       0.77      0.77      0.76     10000

Confusion Matrix


| True \ Predicted | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | All |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 753 | 5 | 19 | 115 | 21 | 4 | 55 | 3 | 25 | 0 | 1000 |
| 1 | 2 | 935 | 8 | 40 | 9 | 0 | 6 | 0 | 0 | 0 | 1000 |
| 2 | 14 | 15 | 515 | 14 | 303 | 11 | 118 | 0 | 10 | 0 | 1000 |
| 3 | 14 | 13 | 10 | 893 | 49 | 1 | 17 | 0 | 3 | 0 | 1000 |
| 4 | 0 | 1 | 88 | 45 | 755 | 13 | 94 | 0 | 4 | 0 | 1000 |
| 5 | 0 | 0 | 12 | 1 | 1 | 879 | 0 | 79 | 5 | 23 | 1000 |
| 6 | 214 | 9 | 74 | 93 | 192 | 19 | 375 | 1 | 23 | 0 | 1000 |
| 7 | 0 | 0 | 0 | 0 | 0 | 80 | 0 | 829 | 2 | 89 | 1000 |
| 8 | 0 | 0 | 4 | 16 | 13 | 24 | 17 | 37 | 889 | 0 | 1000 |
| 9 | 0 | 0 | 0 | 0 | 0 | 60 | 0 | 65 | 1 | 874 | 1000 |


#### Strengths of correlations between base estimators

The base_learner_correlation() function creates a Pearson correlation heatmap of the stack layer training set built in the fit function, which consists of the cross-validation predictions of all base learners on the original training set.

From analysing the correlation chart below we can see that the tree-based ensembles correlate highly with each other (for example, Random Forest and Extra Trees at 0.92). Random Forest and Extra Trees also correlate strongly with SVC and Multi Layer Perceptron (0.88 to 0.90), while correlations with Naive Bayes are noticeably lower (0.79 at most).

Ada Boost and Naive Bayes have the lowest correlations with the other base classifiers (at most 0.78 and 0.79 respectively). This may be because both classifiers perform poorly on this data and are therefore weak learners in this situation.
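
As a rough illustration of the idea behind this analysis, the sketch below computes a Pearson correlation matrix over the columns of a stack layer training set using pandas. The learner names and the synthetic predictions are assumptions made for the example; this is not the body of `base_learner_correlation()`.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
y = rng.integers(0, 10, size=500)  # synthetic true labels for 10 classes

def noisy_copy(labels, flip_rate):
    """Return labels with a fraction replaced by random classes (a fake base learner)."""
    out = labels.copy()
    flip = rng.random(labels.size) < flip_rate
    out[flip] = rng.integers(0, 10, size=int(flip.sum()))
    return out

# One column of predictions per (hypothetical) base learner.
stack_train = pd.DataFrame({
    "learner_a": noisy_copy(y, 0.1),
    "learner_b": noisy_copy(y, 0.2),
    "learner_c": noisy_copy(y, 0.5),
})

# Pairwise Pearson correlation between the prediction columns.
corr = stack_train.corr(method="pearson").round(2)
print(corr)
```

Note that Pearson correlation treats the class labels as numbers, so it reflects agreement only approximately when the predictions are label-encoded classes rather than probabilities.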



In [61]:
clf.base_learner_correlation()

| | Decision Tree | Logistic Regression | K-Nearest-Neighbour | SVC | Random Forest | Multi Layer Perceptron | Ada Boost | Bagging Classifer | Naive Bayes | Extra Trees | Gradient Boosting |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Decision Tree | 1.0 | 0.82 | 0.8 | 0.82 | 0.84 | 0.83 | 0.7 | 0.81 | 0.72 | 0.82 | 0.8 |
| Logistic Regression | 0.82 | 1.0 | 0.89 | 0.91 | 0.91 | 0.92 | 0.78 | 0.84 | 0.77 | 0.93 | 0.88 |
| K-Nearest-Neighbour | 0.8 | 0.89 | 1.0 | 0.87 | 0.9 | 0.88 | 0.75 | 0.82 | 0.77 | 0.91 | 0.85 |
| SVC | 0.82 | 0.91 | 0.87 | 1.0 | 0.88 | 0.91 | 0.75 | 0.83 | 0.76 | 0.9 | 0.85 |
| Random Forest | 0.84 | 0.91 | 0.9 | 0.88 | 1.0 | 0.89 | 0.76 | 0.87 | 0.79 | 0.92 | 0.88 |
| Multi Layer Perceptron | 0.83 | 0.92 | 0.88 | 0.91 | 0.89 | 1.0 | 0.76 | 0.85 | 0.76 | 0.9 | 0.88 |
| Ada Boost | 0.7 | 0.78 | 0.75 | 0.75 | 0.76 | 0.76 | 1.0 | 0.71 | 0.7 | 0.78 | 0.73 |
| Bagging Classifer | 0.81 | 0.84 | 0.82 | 0.83 | 0.87 | 0.85 | 0.71 | 1.0 | 0.76 | 0.85 | 0.83 |
| Naive Bayes | 0.72 | 0.77 | 0.77 | 0.76 | 0.79 | 0.76 | 0.7 | 0.76 | 1.0 | 0.79 | 0.77 |
| Extra Trees | 0.82 | 0.93 | 0.91 | 0.9 | 0.92 | 0.9 | 0.78 | 0.85 | 0.79 | 1.0 | 0.87 |


#### Strength of base estimators 

The results below show the predictive power of each base estimator. Extra Trees, Multi Layer Perceptron and Logistic Regression are the best performers, with accuracy scores of 79.2%-80.5% and F1 scores of 78.3%-80.1%. SVC, Random Forest, Gradient Boosting and K-NN perform moderately, with accuracy scores of 73.2%-78.0% and F1 scores of 73.1%-77.5%. The Decision Tree, Bagging Classifier, Ada Boost and Naive Bayes estimators all perform poorly and are clearly weak learners in this case, with accuracy scores of 55%-65% and F1 scores of 52%-65%.

I had imagined that the decision tree classifier would have performed better. It may be down to the parameters that were chosen for it, or it may have struggled with the structure of the stack layer.

It would be interesting to test the top 4 base estimators in the super learner classifier to see how it performs.
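
A ranking like the one discussed here could be produced along the following lines. This is a sketch only: `predictive_power_table` is a hypothetical helper, the toy predictions are made up, and macro-averaged F1 is an assumption, since the averaging used by the notebook's `predictive_power()` is not shown in this section.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, f1_score

def predictive_power_table(y_true, predictions):
    """Hypothetical helper: score each base learner's predictions against y_true.

    predictions: dict mapping learner name -> array of predicted labels.
    Returns a DataFrame sorted by accuracy, best learner first.
    """
    rows = {
        name: {
            "Accuracy": accuracy_score(y_true, y_pred),
            # Macro averaging is an assumption made for this sketch.
            "F1 Score": f1_score(y_true, y_pred, average="macro"),
        }
        for name, y_pred in predictions.items()
    }
    table = pd.DataFrame.from_dict(rows, orient="index")
    return table.sort_values(by="Accuracy", ascending=False)

# Toy labels and predictions for two made-up learners.
y_true = np.array([0, 1, 2, 0, 1, 2])
predictions = {
    "poor_learner": np.array([0, 2, 1, 0, 1, 1]),
    "good_learner": np.array([0, 1, 2, 0, 1, 1]),
}

power = predictive_power_table(y_true, predictions)
print(power)

# Names of the strongest learners, e.g. to rebuild the ensemble from the top k.
top_learners = power.head(1).index.tolist()
print(top_learners)
```

Replacing `head(1)` with `head(4)` on the real table would give the four strongest base estimators for the follow-up experiment suggested above.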


In [62]:
df = clf.predictive_power()
df.sort_values(by=["Accuracy"], ascending= False)

| | Accuracy | F1 Score |
|---|---|---|
| Extra Trees | 0.804762 | 0.800513 |
| Multi Layer Perceptron | 0.79881 | 0.794834 |
| Logistic Regression | 0.791667 | 0.783413 |
| SVC | 0.779762 | 0.775003 |
| Random Forest | 0.758333 | 0.749325 |
| Gradient Boosting | 0.752381 | 0.748669 |
| K-Nearest-Neighbour | 0.732143 | 0.731254 |
| Decision Tree | 0.655952 | 0.652292 |
| Bagging Classifer | 0.619048 | 0.580553 |
| Ada Boost | 0.596429 | 0.592451 |
