# Stacking

This notebook implements stacking methods on the Kickstarter data, with the aim of reaching the best possible score.

1. [Premises](#Premises)
    - [Packages](#Packages)
    - [Functions](#Functions)
    
2. [Final predictions](#Final-predictions)
    - [Importing models and datasets](#Importing-models-and-datasets)
    - [Preparing the inputs of the level 1 model](#Preparing-the-inputs-of-the-level-1-model)
    - [Majority Vote](#Majority-vote)
    - [Stacking](#Stacking)
        a. [Adaboost](#Adaboost)
        b. [Logistic regression](#Logistic-regression)
        c. [Neural network](#Neural-network)
        d. [SVM](#SVM)
        e. [Random forest](#Random-forest)
        f. [XGB](#XGBoost)

3. [Enhancements](#Enhancements)
    - [Gridsearch](#Gridsearch)
        a. [Adaboost](#GS-:-Adaboost)
        b. [Logistic regression](#GS-:-Logistic-regression)
        c. [SVM](#GS-:-SVM)
        d. [Random forest](#GS-:-Random-forest)
        e. [XGBoost](#GS-:-XGBoost)
        
4. [Conclusion](#Conclusion)

## Premises
### Packages
Let's import the packages we need.


In [16]:
import pandas as pd
pd.set_option('display.max_columns', 50) # Display up to 50 columns at a time
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from matplotlib import cm
plt.style.use('seaborn')
from matplotlib.pylab import rcParams
rcParams['figure.figsize'] = 12,5
import glob # To read all csv files in the directory
import seaborn as sns
import calendar
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_curve, auc
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.ensemble import AdaBoostClassifier
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier 
from sklearn.ensemble import GradientBoostingClassifier as xgb
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import confusion_matrix, f1_score, precision_recall_fscore_support
import itertools
import time
import os
import pickle as pk
import joblib  # sklearn.externals.joblib is deprecated
from keras.models import Sequential, load_model
from keras.layers import Dense, Activation
from keras.utils import to_categorical
from sklearn.linear_model import LogisticRegression as LR
from sklearn import svm

### Functions
We will need a few functions throughout this notebook.

In [2]:
def stacked_dataset(models, inputX, scaler = None, fit_scaler = False):
    """
    Input : list of learners, np.array, sklearn scaler (optional), bool
    Output: np.array of shape (n_samples, n_models)
    The function takes a list of pretrained level 0 models, the observations and, optionally,
    a standard scaler to scale the stacked data.
    The boolean tells whether the scaler must be fitted here or has already been fitted.
    It returns the class predictions of every model, one column per model. The output is the
    input of the level 1 model to train with trainStack.
    """
    stackX = None
    for model in models:
        # predict class labels, shape (n_samples,)
        yhat = model.predict(inputX)
        # stack the predictions along a new last axis: (1, n_samples, n_models)
        if stackX is None:
            stackX = yhat
        else:
            stackX = np.dstack((stackX, yhat))
    # drop the leading axis to get (n_samples, n_models)
    if len(models) > 1:
        stackX = stackX.reshape((stackX.shape[1], stackX.shape[2]))
        
#     print("There are {0} models, the shape of the observations is: {1} and the shape of the stacked observations is: {2}".format(len(models), inputX.shape, stackX.shape))
#     print("The first five rows look like this: {}".format(stackX[:5,:]))

    if scaler is not None:
        if fit_scaler:
            stackX = scaler.fit_transform(stackX)
        else:
            stackX = scaler.transform(stackX)
    
#     print('The aggregated outputs of the models look like this: {}'.format(stackX[:5,:]))
    return stackX
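
As a rough illustration of the shape `stacked_dataset` produces, the sketch below stacks the 1D class predictions of a few stand-in models into one column per model. `ConstantModel` and `ThresholdModel` are hypothetical toy classes invented for this example, not the level 0 models of this notebook, and `np.column_stack` is used as a simpler alternative to the `np.dstack` plus `reshape` combination above.

```python
import numpy as np

class ConstantModel:
    """Hypothetical stand-in model: always predicts the same class."""
    def __init__(self, label):
        self.label = label

    def predict(self, X):
        return np.full(X.shape[0], self.label)

class ThresholdModel:
    """Hypothetical stand-in model: predicts 1 when the first feature is positive."""
    def predict(self, X):
        return (X[:, 0] > 0).astype(int)

toy_models = [ConstantModel(0), ThresholdModel(), ConstantModel(1)]
toy_X = np.array([[-1.0, 2.0], [0.5, 0.0], [3.0, -1.0], [-2.0, 1.0]])

# One column per model, one row per observation
toy_stack = np.column_stack([m.predict(toy_X) for m in toy_models])
print(toy_stack.shape)  # (4, 3)
print(toy_stack)
```

One possible advantage of `np.column_stack` is that it also returns a 2D array when the list contains a single model.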

In [3]:
def trainStack(first_models, final_model, X_stacked, y_train, is_MLP = False, epochs = 300):
    """
    Input : list of learners, learner, np.array, np.array, bool, int
    Output : learner
    The function takes the level 0 trained learners, the level 1 learner to train, the stacked training observations, the 
    training labels, the boolean telling whether or not the top-level classifier is a Multi-Layer Perceptron and the integer
    corresponding to the number of training epochs if we have an MLP.
    It returns the level 1 trained model.
    """
    
    if is_MLP:
        y_train_categ = to_categorical(y_train)
#         print("The training labels look like this: {}".format(y_train_categ[:5]))
        final_model.fit(X_stacked, y_train_categ, epochs = epochs, verbose = 0)
    else:
        final_model.fit(X_stacked, y_train)
    
    return final_model
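
`to_categorical` turns integer labels into one-hot rows, with one row per observation and one column per class. A minimal NumPy sketch of the same encoding, assuming binary labels 0 and 1 (the variable names are illustrative only):

```python
import numpy as np

labels = np.array([0, 1, 1, 0])
n_classes = 2  # assumption: binary labels

# Row i of the identity matrix is the one-hot vector of class i
one_hot = np.eye(n_classes)[labels]
print(one_hot)
```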

In [4]:
def predictStack(final_model, X_stacked):
    """
    Returns the prediction of the final model on X_stacked which is the stack of the predictions of the level 0 models.
    """        
    y_predicted = final_model.predict(X_stacked)
    return y_predicted

In [5]:
def probabilitiesToClass(array):
    """
    Input: np.array of size (n,2)
    Output : np.array of size (n,)
    The function takes an array whose rows are the observations and whose columns hold the probability of belonging 
    to each class. It returns, for each observation, the class with the higher probability (class 0 in case of a tie). 
    For instance, [0.33,0.66] gives 1 and [0.7,0.3] gives 0.
    """
    res = np.array(array.shape[0]*[0])
    for i in range(array.shape[0]):
        if array[i,1] > array[i,0]:
            res[i] = 1
    return res
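
The same conversion can be sketched without an explicit loop. The example below uses `np.argmax` on a toy probability array (made-up values) and should behave like `probabilitiesToClass` for two-column inputs, including ties, which resolve to class 0.

```python
import numpy as np

probs = np.array([[0.33, 0.66], [0.7, 0.3], [0.5, 0.5]])

# Index of the largest probability in each row; ties resolve to class 0
classes = np.argmax(probs, axis=1)
print(classes)  # [1 0 0]
```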

In [6]:
def displayPerformances(y_test, y_test_pred, y_train, y_train_pred):
    """
    Input: np.array (n,1), np.array (n,1), np.array (N,1), np.array (N,1)
    Output : pd.DataFrame
    The function takes the true and predicted labels of the test set, followed by the true and predicted labels of the
    training set. It returns a DataFrame containing the weighted precision, recall and f1-score on each data set.
    """
    test_precision, test_recall, test_f1score, test_support = precision_recall_fscore_support(y_test, y_test_pred, average='weighted')
    train_precision, train_recall, train_f1score, train_support = precision_recall_fscore_support(y_train, y_train_pred, average='weighted')
    results = {'Precision':[test_precision, train_precision], 'Recall':[test_recall, train_recall], 'F1_score': [test_f1score, train_f1score]}
    results = pd.DataFrame(results, index=['Test','Train'])
    
    return results
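
To make the `average='weighted'` option more concrete, here is a hand-rolled sketch of the computation: per-class precision, recall and F1 are averaged with weights proportional to each class's support. `weighted_scores` is a hypothetical helper written for this illustration, not scikit-learn's implementation, and the labels are toy values.

```python
import numpy as np

def weighted_scores(y_true, y_pred):
    """Support-weighted precision, recall and F1, computed by hand."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    precision = recall = f1 = 0.0
    for c in np.unique(y_true):
        tp = np.sum((y_true == c) & (y_pred == c))   # true positives for class c
        predicted = np.sum(y_pred == c)              # observations predicted as c
        support = np.sum(y_true == c)                # observations truly in c
        p = tp / predicted if predicted else 0.0
        r = tp / support
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = support / len(y_true)
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return float(precision), float(recall), float(f1)

scores = weighted_scores([0, 0, 1, 1, 1], [0, 1, 1, 1, 0])
print(scores)  # approximately (0.6, 0.6, 0.6)
```

With this weighting, the recall coincides with the plain accuracy, which may explain why the Recall column reads like an accuracy in the tables of this notebook.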

## Final predictions
Now we will start making our final predictions based on the outputs of the level 0 classifiers.

### Importing models and datasets

In [7]:
#Let's import a bunch of the level-0 classifiers
level0_models = []

dir_name = 'zero_level_models'
file_list = os.listdir(dir_name)

for f in file_list:        
    if ('model1_' in f) or ('model2_' in f):
        with open(os.path.join(dir_name, f),'rb') as file:
            pickler = pk.Unpickler(file)
            try:
                temp = pickler.load()
                level0_models.append(temp)
            except Exception:
                # Skip files that cannot be unpickled
                continue



In [8]:
# Loading the datasets. Since the level 0 models were trained on the first part of the dataset, we must use a different
# subset here (observations3.csv and labels3.csv)
X = pd.read_csv('processed_datasets/observations3.csv', index_col=0)
y = pd.read_csv('processed_datasets/labels3.csv', index_col=0)

In [9]:
length = X.shape[0]//2
X_train = X.iloc[:length,:]
y_train = y.iloc[:length,:]
X_test = X.iloc[length:,:]
y_test = y.iloc[length:,:]

### Preparing the inputs of the level 1 model
We compute the predictions of every level 0 model once, on fixed training and test sets, so that we do not have to recompute them each time we train a level 1 model.

In [36]:
samples_train = X_train.shape[0]
samples_test = X_test.shape[0]

In [37]:
X_train_stacked = stacked_dataset(level0_models, X_train.iloc[:samples_train,:])

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=1)]: Done 200 out of 200 | elapsed:    1.7s finished


In [38]:
X_test_stacked = stacked_dataset(level0_models, X_test.iloc[:samples_test,:])

[Parallel(n_jobs=1)]: Using backend SequentialBackend with 1 concurrent workers.
[Parallel(n_jobs=1)]: Done   1 out of   1 | elapsed:    0.0s remaining:    0.0s
[Parallel(n_jobs=1)]: Done 200 out of 200 | elapsed:    1.7s finished


### Majority vote

One should keep in mind that ensemble learning and stacking may behave better with **lots of level 0 classifiers**. Indeed, to take a majority vote, it may seem better to have a lot of different outputs from level 0 learners. 

In [76]:
def majorityVote(first_models, X_stacked):
    """
    Input : list of learners, np.array (N,d)
    Output : np.array (N,)
    The function takes the level 0 trained learners, and the stacked training observations. It returns the array containing 
    the majority vote coming from the level-0 classifiers.
    """
    res = np.array(X_stacked.shape[0]*[0])
    
    for i in range(X_stacked.shape[0]):
        line = X_stacked[i,:]
        if (line == 1).sum() > (line == 0).sum():
            res[i] = 1
    
    return res
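
For reference, the same vote can be written without a Python loop. The sketch below uses a toy matrix of 0/1 predictions (made-up values, one column per level 0 model) and keeps the rule of `majorityVote`: class 1 wins only with strictly more votes than class 0.

```python
import numpy as np

votes = np.array([[1, 1, 0],
                  [0, 0, 1],
                  [1, 0, 1],
                  [0, 1, 0]])

# Count the votes for each class on every row, then compare
ones = (votes == 1).sum(axis=1)
zeros = (votes == 0).sum(axis=1)
majority = (ones > zeros).astype(int)
print(majority)  # [1 0 1 0]
```

With an even number of level 0 models, ties are possible and fall to class 0 under this rule.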

In [186]:
y_mv = majorityVote(level0_models, X_test_stacked)

In [187]:
# Majority vote needs no training, so dummy training labels are passed and the Train row is meaningless
res_mv = displayPerformances(y_test.iloc[:samples_test,:], y_mv, y_train.iloc[:10,:],y_train.iloc[:10,:])
res_mv

| | Precision | Recall | F1_score |
|---|---|---|---|
| Test | 0.750401 | 0.741585 | 0.738815 |
| Train | 1.0 | 1.0 | 1.0 |


### Stacking
#### Adaboost
The level 1 classifier is the adaboost algorithm.

In [188]:
top_model1 = AdaBoostClassifier();

In [189]:
stack1 = trainStack(level0_models, top_model1, X_train_stacked, y_train.iloc[:samples_train,:])

  y = column_or_1d(y, warn=True)
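
The `column_or_1d` line above appears to be the tail of scikit-learn's `DataConversionWarning`, which is raised when labels are passed as a column of shape `(n, 1)` instead of a flat array. A minimal sketch of the reshaping that avoids it, using a toy array in place of `y_train` (an assumption made for illustration):

```python
import numpy as np

y_column = np.array([[0], [1], [1], [0]])   # shape (4, 1), like a one-column DataFrame
y_flat = y_column.ravel()                   # shape (4,)
print(y_column.shape, y_flat.shape)  # (4, 1) (4,)
```

In this notebook, passing `y_train.values.ravel()` to `trainStack` would play the same role.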


In [190]:
y_test_adab = predictStack(stack1, X_test_stacked)

In [191]:
# Displaying the results
y_train_adab = predictStack(stack1, X_train_stacked)

In [192]:
res = displayPerformances(y_test.iloc[:samples_test,:], y_test_adab, y_train.iloc[:samples_train,:], y_train_adab)
res

| | Precision | Recall | F1_score |
|---|---|---|---|
| Test | 0.76372 | 0.756853 | 0.754888 |
| Train | 0.761941 | 0.759152 | 0.75656 |


#### Logistic regression
The level 1 classifier is a logistic regression. At first, we did not adjust the parameters but then, after gridsearching, we took the best parameters found by the gridsearch algorithm (see the [Enhancements section](#Enhancements)).

In [193]:
top_model_lr = LogisticRegression(penalty = 'l2', fit_intercept = True, C = 0.02)
sc = StandardScaler();

In [194]:
# We want to scale our data
X_train_scaled = sc.fit_transform(X_train_stacked)
X_test_scaled = sc.transform(X_test_stacked)
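
The point of calling `fit_transform` on the training stack and only `transform` on the test stack is that the mean and standard deviation come from the training data alone. A small NumPy sketch of that idea, on toy arrays (made-up values, not the stacked predictions of this notebook):

```python
import numpy as np

train = np.array([[0.0, 1.0], [1.0, 1.0], [1.0, 0.0], [0.0, 0.0]])
test = np.array([[1.0, 1.0], [0.0, 1.0]])

# Statistics come from the training data only
mean = train.mean(axis=0)
std = train.std(axis=0)      # population standard deviation (ddof=0)
std[std == 0] = 1.0          # avoid dividing by zero for constant columns

train_scaled = (train - mean) / std
test_scaled = (test - mean) / std   # reuse the training statistics
print(train_scaled.mean(axis=0), train_scaled.std(axis=0))
```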

In [195]:
stack_lr = trainStack(level0_models, top_model_lr, X_train_scaled, y_train.iloc[:samples_train,:])

  y = column_or_1d(y, warn=True)


In [196]:
y_test_lr = predictStack(stack_lr, X_test_scaled)

In [197]:
# Displaying the results
y_train_lr = predictStack(stack_lr, X_train_scaled)

In [198]:
res_lr = displayPerformances(y_test.iloc[:samples_test,:], y_test_lr, y_train.iloc[:samples_train,:],y_train_lr)
res_lr

| | Precision | Recall | F1_score |
|---|---|---|---|
| Test | 0.76372 | 0.756853 | 0.754888 |
| Train | 0.761941 | 0.759152 | 0.75656 |


#### Neural network
The level 1 classifier is a neural network.

In [199]:
nn = Sequential()
nn.add(Dense(25, input_dim=len(level0_models), activation='relu'))
nn.add(Dense(2, activation='softmax'))
nn.compile(loss='binary_crossentropy', optimizer='adam', metrics=['binary_accuracy']);

In [200]:
stack_nn = trainStack(level0_models, nn, X_train_scaled, y_train.iloc[:samples_train,:], is_MLP = True)

In [201]:
y_test_nn = probabilitiesToClass(predictStack(stack_nn, X_test_scaled))

In [202]:
# Displaying the results
y_train_nn = probabilitiesToClass(predictStack(stack_nn, X_train_scaled))

In [203]:
res_nn = displayPerformances(y_test.iloc[:samples_test,:], y_test_nn, y_train.iloc[:samples_train,:],y_train_nn)
res_nn

| | Precision | Recall | F1_score |
|---|---|---|---|
| Test | 0.761077 | 0.754474 | 0.752539 |
| Train | 0.766211 | 0.763448 | 0.760977 |


#### SVM
The top level classifier is a support vector machine.

In [204]:
from sklearn import svm
# With a linear kernel, coef0 and gamma are ignored by SVC
svm_poly = svm.SVC(kernel = 'linear', C = 0.01, coef0 = 0.01, gamma = 'scale')

In [205]:
stack_svm = trainStack(level0_models, svm_poly, X_train_scaled, y_train.iloc[:samples_train,:])

  y = column_or_1d(y, warn=True)


In [206]:
y_test_svm = predictStack(stack_svm, X_test_scaled)

In [207]:
# Displaying the results
y_train_svm = predictStack(stack_svm, X_train_scaled)

In [208]:
res_svm = displayPerformances(y_test.iloc[:samples_test,:], y_test_svm, y_train.iloc[:samples_train,:],y_train_svm)
res_svm

| | Precision | Recall | F1_score |
|---|---|---|---|
| Test | 0.769431 | 0.75703 | 0.753717 |
| Train | 0.768447 | 0.761034 | 0.756668 |


#### Random forest
The top level classifier is a random forest. First, we trained the model without adjusting its parameters, but then, after gridsearching, we took the supposedly best parameters (see the [Gridsearch section](#Gridsearch)).

In [209]:
rf = RandomForestClassifier(max_depth=3, n_estimators=50, ccp_alpha = 0.001, min_samples_leaf = 1)
# max_depth=10, min_samples_split=0.001, n_estimators=100
rf

RandomForestClassifier(bootstrap=True, ccp_alpha=0.001, class_weight=None,
                       criterion='gini', max_depth=3, max_features='auto',
                       max_leaf_nodes=None, max_samples=None,
                       min_impurity_decrease=0.0, min_impurity_split=None,
                       min_samples_leaf=1, min_samples_split=2,
                       min_weight_fraction_leaf=0.0, n_estimators=50,
                       n_jobs=None, oob_score=False, random_state=None,
                       verbose=0, warm_start=False)

In [210]:
stack_rf = trainStack(level0_models, rf, X_train_scaled, y_train.iloc[:samples_train,:])

  app.launch_new_instance()


In [211]:
y_test_rf = predictStack(stack_rf, X_test_scaled)

In [212]:
# Displaying the results
y_train_rf = predictStack(stack_rf, X_train_scaled)

In [213]:
res_rf = displayPerformances(y_test.iloc[:samples_test,:], y_test_rf, y_train.iloc[:samples_train,:],y_train_rf)
res_rf

| | Precision | Recall | F1_score |
|---|---|---|---|
| Test | 0.769431 | 0.75703 | 0.753717 |
| Train | 0.768447 | 0.761034 | 0.756668 |


#### XGBoost
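The top level classifier is a gradient boosting model. Note that `xgb` here is only an alias for scikit-learn's `GradientBoostingClassifier`, not the XGBoost library. First, we trained the model without adjusting its parameters, but then, after gridsearching, we took the supposedly best parameters (see the [Gridsearch section](#Gridsearch)).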

In [49]:
from sklearn.ensemble import GradientBoostingClassifier as xgb
xgboost = xgb(n_estimators = 30, min_samples_leaf = 3, max_depth = 5, ccp_alpha = 0);

In [50]:
stack_xgb = trainStack(level0_models, xgboost, X_train_stacked, y_train.iloc[:samples_train,:])

  y = column_or_1d(y, warn=True)


In [51]:
y_test_xgb = predictStack(stack_xgb, X_test_stacked)

In [52]:
# Displaying the results
y_train_xgb = predictStack(stack_xgb, X_train_stacked)

In [53]:
res_xgb = displayPerformances(y_test.iloc[:samples_test,:], y_test_xgb, y_train.iloc[:samples_train,:], y_train_xgb)
res_xgb

| | Precision | Recall | F1_score |
|---|---|---|---|
| Test | 0.764739 | 0.756782 | 0.754543 |
| Train | 0.766897 | 0.7632 | 0.760322 |


# Enhancements
We will try to enhance our top-level models.

## Gridsearch

One of the best ways to do so is to find the best parameters for each algorithm with a grid search. However, it can be computationally expensive, so we take a small number of observations and assume that the best parameters found on them remain the best with a greater number of observations.
Note that not all the grid searches we ran are shown here: to save time and space, we reran the same cells and changed the parameter grids each time.
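
To give a feel for the cost, a grid search fits one model per parameter combination and per cross-validation fold, plus, by default, one final refit on the whole training set. The sketch below only enumerates the combinations of an example grid with `itertools.product`; it is an illustration of the counting, not `GridSearchCV`'s actual implementation.

```python
import itertools

# Example grid, same form as the param_grid dictionaries passed to GridSearchCV
param_grid = {'max_depth': [3, 4, 5], 'min_samples_leaf': [1, 2, 3], 'ccp_alpha': [0, 0.001]}
cv_folds = 2

names = sorted(param_grid)
combinations = [dict(zip(names, values))
                for values in itertools.product(*(param_grid[n] for n in names))]

# One fit per combination and per fold (the final refit is not counted here)
n_fits = len(combinations) * cv_folds
print(len(combinations), n_fits)  # 18 36
```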

### GS : Adaboost

Let's grid search on the parameters of adaboost.

In [112]:
# Using GridSearchCV to test different parameters
adab_start = time.time()

adab = AdaBoostClassifier()

params_adab = [{'n_estimators': [50, 75, 100]}]

grid_adab = GridSearchCV(estimator=adab, param_grid=params_adab, cv=2)

grid_adab.fit(X_train_stacked, y_train.iloc[:samples_train,:])

adab_end = time.time()

adab_best_score = grid_adab.best_score_
adab_best_params = grid_adab.best_params_

print(f"Time taken to run: {round((adab_end - adab_start)/60,1)} minutes")
print("Best accuracy:", round(adab_best_score,2))
print("Best parameters:", adab_best_params)

  y = column_or_1d(y, warn=True)


Time taken to run: 0.1 minutes
Best accuracy: 0.75
Best parameters: {'n_estimators': 50}


### GS : Logistic regression
Let's grid search over the parameters of the logistic regression.

In [118]:
# Using GridSearchCV to test different parameters
lr_start = time.time()

lr = LogisticRegression()

params_lr = [{'penalty': ['l2'], 'fit_intercept': [True],'C': [0.01,0.02,0.03,0.04,0.05,0.06,0.07,0.08,0.09,0.1]}]

grid_lr = GridSearchCV(estimator=lr, param_grid=params_lr, cv=2)

grid_lr.fit(X_train_stacked, y_train.iloc[:samples_train,:])

lr_end = time.time()

lr_best_score = grid_lr.best_score_
lr_best_params = grid_lr.best_params_

print(f"Time taken to run: {round((lr_end - lr_start)/60,1)} minutes")
print("Best accuracy:", round(lr_best_score,2))
print("Best parameters:", lr_best_params)

  y = column_or_1d(y, warn=True)


Time taken to run: 0.0 minutes
Best accuracy: 0.75
Best parameters: {'C': 0.02, 'fit_intercept': True, 'penalty': 'l2'}



### GS : SVM
Let's grid search over the parameters of the Support Vector Machine.

In [139]:
# Using GridSearchCV to test different parameters
svm_start = time.time()

# Use a separate name so that the svm module is not overwritten
svc = svm.SVC()

params_svm = [{'kernel': ['linear'], 'gamma':['scale'], 'coef0':[0.01, 0.01],'C': [0.01, 0.01]}]

grid_svm = GridSearchCV(estimator=svc, param_grid=params_svm, cv=2)

grid_svm.fit(X_train_stacked, y_train.iloc[:samples_train,:])

svm_end = time.time()

svm_best_score = grid_svm.best_score_
svm_best_params = grid_svm.best_params_

print(f"Time taken to run: {round((svm_end - svm_start)/60,1)} minutes")
print("Best accuracy:", round(svm_best_score,2))
print("Best parameters:", svm_best_params)

  y = column_or_1d(y, warn=True)


Time taken to run: 0.2 minutes
Best accuracy: 0.76
Best parameters: {'C': 0.01, 'coef0': 0.01, 'gamma': 'scale', 'kernel': 'linear'}


### GS : Random forest
Let's grid search over the parameters of the Random forest algorithm.

In [149]:
# Using GridSearchCV to test different parameters
rf_start = time.time()

rf = RandomForestClassifier()

params_rf = [{'max_depth': [3,4,5], 'min_samples_leaf':[1,2,3], 'ccp_alpha':[0,0.001]}]

grid_rf = GridSearchCV(estimator=rf, param_grid=params_rf, cv=2)

grid_rf.fit(X_train_stacked, y_train.iloc[:samples_train,:])

rf_end = time.time()

rf_best_score = grid_rf.best_score_
rf_best_params = grid_rf.best_params_

print(f"Time taken to run: {round((rf_end - rf_start)/60,1)} minutes")
print("Best accuracy:", round(rf_best_score,2))
print("Best parameters:", rf_best_params)

  estimator.fit(X_train, y_train, **fit_params)
  self.best_estimator_.fit(X, y, **fit_params)


Time taken to run: 0.3 minutes
Best accuracy: 0.76
Best parameters: {'ccp_alpha': 0.001, 'max_depth': 3, 'min_samples_leaf': 1}


### GS : XGBoost
Let's grid search over the parameters of the Gradient Boosting Algorithm.

In [48]:
from sklearn.ensemble import GradientBoostingClassifier
# Using GridSearchCV to test different parameters
xgb_start = time.time()

# Named gb so that the xgb alias used in the Stacking section is not overwritten
gb = GradientBoostingClassifier()

params_xgb = [{'n_estimators': [10, 20, 30], 'min_samples_leaf':[3], 'max_depth':[5, 6], 'ccp_alpha':[0]}]

grid_xgb = GridSearchCV(estimator = gb, param_grid = params_xgb, cv=2)

grid_xgb.fit(X_train_stacked, y_train.iloc[:samples_train,:])

xgb_end = time.time()

xgb_best_score = grid_xgb.best_score_
xgb_best_params = grid_xgb.best_params_

print(f"Time taken to run: {round((xgb_end - xgb_start)/60,1)} minutes")
print("Best accuracy:", round(xgb_best_score,2))
print("Best parameters:", xgb_best_params)

  y = column_or_1d(y, warn=True)


Time taken to run: 0.1 minutes
Best accuracy: 0.76
Best parameters: {'ccp_alpha': 0, 'max_depth': 5, 'min_samples_leaf': 3, 'n_estimators': 30}


# Conclusion
Finally, after searching for good parameters (probably not the optimal ones, since an exhaustive search would require a gigantic amount of computation time), the <font color = purple>logistic regression and adaboost</font> proved to be the best top-level learners with a final **f1-score of 75.48%**. We obtained better results than the author of the base notebook we used.
For further improvements, we could have taken a closer look at the misclassified examples and tried some ad hoc fixes to correct them.