# Ensemble methods. Exercises


In this section we have only one exercise:

1. Find the best three classifier in the stacking method using the classifiers from scikit-learn package, such as:


* Linear regression,
* Nearest Neighbors,
* Linear SVM,
* Decision Tree,
* Naive Bayes,
* QDA.

In [6]:
%store -r data_set
%store -r labels
%store -r test_data_set
%store -r test_labels
%store -r unique_labels

## Exercise 1: Find the best three classifier in the stacking method

In [7]:
import numpy as np
from sklearn.metrics import accuracy_score

from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

from itertools import combinations

In [8]:
def build_classifiers():
    
    linear_regression = LinearRegression()
    linear_regression.fit(data_set, labels)
    
    neighbors = KNeighborsClassifier()
    neighbors.fit(data_set, labels)
    
    svc = SVC(gamma='auto')
    svc.fit(data_set, labels)
    
    decision_tree = DecisionTreeClassifier()
    decision_tree.fit(data_set, labels)

    gaussian = GaussianNB()
    gaussian.fit(data_set, labels)
    
    qda = QuadraticDiscriminantAnalysis()
    qda.fit(data_set, labels)
    
    classifiers = [linear_regression, neighbors, svc, decision_tree, gaussian, qda]
    comb = combinations(classifiers, 3)

    return comb

In [9]:
def build_stacked_classifier(combinations):
    for classifiers in combinations:
        output = []
        trio = []
        for classifier in classifiers:
            output.append(classifier.predict(data_set))
            trio.append(classifier.__class__.__name__)
        output = np.array(output).reshape((130,3))
    
        # stacked classifier part:
        stacked_classifier = DecisionTreeClassifier() # For stacked classifier - DecisionTree CLassifier
        stacked_classifier.fit(output.reshape((130,3)), labels.reshape((130,)))
        test_set = []
        for classifier in classifiers:
            test_set.append(classifier.predict(test_data_set))
        test_set = np.array(test_set).reshape((len(test_set[0]),3))
        predicted = stacked_classifier.predict(test_set)
        
        accuracy = accuracy_score(test_labels, predicted)
        print(trio[0],"+",trio[1],"+",trio[2]," --> ",accuracy)


In [10]:
classifiers = build_classifiers()
predicted = build_stacked_classifier(classifiers)

LinearRegression + KNeighborsClassifier + SVC  -->  0.65
LinearRegression + KNeighborsClassifier + DecisionTreeClassifier  -->  0.7
LinearRegression + KNeighborsClassifier + GaussianNB  -->  0.7
LinearRegression + KNeighborsClassifier + QuadraticDiscriminantAnalysis  -->  0.75
LinearRegression + SVC + DecisionTreeClassifier  -->  0.65
LinearRegression + SVC + GaussianNB  -->  0.1
LinearRegression + SVC + QuadraticDiscriminantAnalysis  -->  0.65
LinearRegression + DecisionTreeClassifier + GaussianNB  -->  0.1
LinearRegression + DecisionTreeClassifier + QuadraticDiscriminantAnalysis  -->  0.75
LinearRegression + GaussianNB + QuadraticDiscriminantAnalysis  -->  0.6
KNeighborsClassifier + SVC + DecisionTreeClassifier  -->  0.85
KNeighborsClassifier + SVC + GaussianNB  -->  0.0
KNeighborsClassifier + SVC + QuadraticDiscriminantAnalysis  -->  0.85
KNeighborsClassifier + DecisionTreeClassifier + GaussianNB  -->  0.0
KNeighborsClassifier + DecisionTreeClassifier + QuadraticDiscriminantAnalysis

### The best set is (SVC + DecisionTreeClassifier + QuadraticDiscriminantAnalysis) which gives us 0.95 of accuracy.