# Ensemble methods. Exercises


In this section we have only one exercise:

1. Find the best three classifier in the stacking method using the classifiers from scikit-learn package, such as:


* Linear regression,
* Nearest Neighbors,
* Linear SVM,
* Decision Tree,
* Naive Bayes,
* QDA.

In [1]:
%store -r data_set
%store -r labels
%store -r test_data_set
%store -r test_labels
%store -r unique_labels

## Exercise 1: Find the best three classifier in the stacking method

In [2]:
import numpy as np
from sklearn.metrics import accuracy_score

from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

In [3]:
def build_classifiers():
    """
    Creates objects of models from Sklearn library: 
        - Linear regression
        - Nearest Neighbors
        - Linear SVM
        - Decision Tree
        - Naive Bayes
        - QDA
        
    Returns
    -------
    dict(str: BaseEstimator)
        Dictionary containing all models
    
    """
    
    linear_regression_model = LinearRegression()
    linear_regression_model.fit(data_set, labels)
    
    k_neighbors_model = KNeighborsClassifier()
    k_neighbors_model.fit(data_set, labels)
    
    svc_model = SVC()
    svc_model.fit(data_set, labels)
    
    decision_tree_model = DecisionTreeClassifier()
    decision_tree_model.fit(data_set, labels)
    
    naive_bayes_model = GaussianNB()
    naive_bayes_model.fit(data_set, labels)
    
    quadratic_discriminant_model = QuadraticDiscriminantAnalysis()
    quadratic_discriminant_model.fit(data_set, labels)
    
    return dict(linear_regression_model = linear_regression_model,
                k_neighbors_model = k_neighbors_model,
                svc_model = svc_model,
                decision_tree_model = decision_tree_model,
                naive_bayes_model = naive_bayes_model, 
                quadratic_discriminant_model = quadratic_discriminant_model)

In [4]:
def build_stacked_classifier(classifiers):
    output = []
    for classifier in classifiers:
        output.append(classifier.predict(data_set))
    output = np.array(output).reshape((130,3))
    
    # stacked classifier part:
    stacked_classifier = DecisionTreeClassifier()
    stacked_classifier.fit(output.reshape((130,3)), labels.reshape((130,)))
    test_set = []
    for classifier in classifiers:
        test_set.append(classifier.predict(test_data_set))
    test_set = np.array(test_set).reshape((len(test_set[0]),3))
    predicted = stacked_classifier.predict(test_set)
    return predicted

In [5]:
import itertools

def predict_all():
    """
    Creates all listed classifiers, for each combination creates stacked classifier
    and saves the accuracy.
    
    Returns
    -------
    classifiers_sets: List[ dict( str: BaseEstimator, float) ]
        each dictionary contains 3 values of BaseEstimator 
        and float value for 'accuracy' key
    
    """
    all_classifiers = build_classifiers()
    classifiers_sets = []
    combinations = itertools.combinations(all_classifiers, 3)
    for c in combinations:
        # if labels are unique
        if len(c) == len(set(c)):
            c_set = {}
            for i in c:
                c_set[i] = all_classifiers[i]
            classifiers_sets.append(c_set)

    for c_set in classifiers_sets:
        predicted = build_stacked_classifier(c_set.values())
        accuracy = accuracy_score(test_labels, predicted)
        c_set['accuracy'] = accuracy
        
    return classifiers_sets

In [6]:
classifiers_sets = predict_all()
best_set = max(classifiers_sets, key = lambda x: x['accuracy'])
accuracy = best_set.pop('accuracy')

print("Best set : {} , accuracy : {} ".format(best_set.keys(),accuracy ))

Best set : dict_keys(['k_neighbors_model', 'decision_tree_model', 'quadratic_discriminant_model']) , accuracy : 0.95 




## Summary

The best accuracy of stacked classifier that was generated is 0,95.
The stacked classifier was created basen on 3 classifiers : Nearest Neighbors, Decision Tree and QDA. 
