# Ensemble methods. Exercises


This section contains two exercises:

1. Find the best three classifiers for the stacking method, using classifiers from the scikit-learn package.

2. Build the arc-x4 arcing method.

In [1]:
%store -r data_set
%store -r labels
%store -r test_data_set
%store -r test_labels
%store -r unique_labels

## Exercise 1: Find the best three classifiers for the stacking method

Please use the following classifiers:

* Logistic regression,
* Nearest Neighbors,
* Linear SVM,
* Decision Tree,
* Naive Bayes,
* QDA.

In [2]:
import numpy as np
from sklearn.metrics import accuracy_score
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

In [3]:
def select_top_classifiers(classifiers, X, y):
    # Score every fitted classifier on (X, y) and keep the three most accurate.
    scored = []
    for name, clf in classifiers:
        predictions = clf.predict(X)
        acc = accuracy_score(y, predictions)
        scored.append((acc, name, clf))

    # Sort by accuracy alone; the sort is stable, so ties keep their input order.
    scored.sort(key=lambda item: item[0], reverse=True)

    print("Top 3 classifiers:")
    for acc, name, _ in scored[:3]:
        print(f"{name}: {acc:.4f}")

    return [clf for _, _, clf in scored[:3]]

In [4]:
def build_classifiers():
    classifiers = [
        ("Logistic Regression", LogisticRegression(max_iter=1000)),
        ("KNN", KNeighborsClassifier()),
        ("SVM", SVC()),
        ("Decision Tree", DecisionTreeClassifier()),
        ("Naive Bayes", GaussianNB()),
        ("QDA", QuadraticDiscriminantAnalysis())
    ]
    
    trained = []
    for name, clf in classifiers:
        clf.fit(data_set, labels)
        trained.append((name, clf))
    
    # Ranking uses training accuracy, which favors models that memorize the data.
    top_classifiers = select_top_classifiers(trained, data_set, labels)
    return top_classifiers

In [5]:
def build_stacked_classifier(classifiers):
    output = []
    for classifier in classifiers:
        output.append(classifier.predict(data_set))
    
    output = np.array(output).T
    
    stacked_classifier = LogisticRegression(max_iter=1000)
    stacked_classifier.fit(output, labels)
    
    test_set = []
    for classifier in classifiers:
        test_set.append(classifier.predict(test_data_set))
    
    test_set = np.array(test_set).T
    predicted = stacked_classifier.predict(test_set)
    return predicted

In [6]:
classifiers = build_classifiers()
predicted = build_stacked_classifier(classifiers)
accuracy = accuracy_score(test_labels, predicted)
print(accuracy)

Top 3 classifiers:
Decision Tree: 1.0000
QDA: 0.9846
Naive Bayes: 0.9692
0.85
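
The ranking above uses accuracy on the training data, which is why the unpruned decision tree reaches 1.0000 while the stacked model scores 0.85 on the test set. One alternative, sketched here under assumptions, is to compare every triple of candidates with cross-validation using scikit-learn's `StackingClassifier`. The Iris data is only a stand-in for the notebook's stored `data_set` and `labels`, and the short estimator names are illustrative.

```python
from itertools import combinations

from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Stand-in data (assumption): the notebook loads its own data via %store.
X, y = load_iris(return_X_y=True)

# Candidate base learners; the names are illustrative labels only.
candidates = [
    ("logreg", LogisticRegression(max_iter=1000)),
    ("knn", KNeighborsClassifier()),
    ("svm", SVC(kernel="linear")),  # linear kernel, matching "Linear SVM"
    ("tree", DecisionTreeClassifier(random_state=0)),
    ("nb", GaussianNB()),
    ("qda", QuadraticDiscriminantAnalysis()),
]

results = []
for triple in combinations(candidates, 3):
    # StackingClassifier clones the estimators and trains the final
    # estimator on out-of-fold predictions of the base learners.
    stack = StackingClassifier(
        estimators=list(triple),
        final_estimator=LogisticRegression(max_iter=1000),
        cv=3,
    )
    score = cross_val_score(stack, X, y, cv=3).mean()
    results.append((score, tuple(name for name, _ in triple)))

best_score, best_names = max(results, key=lambda item: item[0])
print(best_names, round(best_score, 4))
```

Six candidates give 20 triples, so the exhaustive search stays cheap here. With a larger pool it may be more practical to pre-filter candidates by their individual cross-validated scores.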


## Exercise 2: Build the arc-x4 arcing method

Use the boosting method and change the code to fulfill the following requirements:

* the weights should be calculated as:
$w_{n}^{(t+1)}=\frac{1+ I(y_{n}\neq h_{t}(x_{n}))}{\sum_{i=1}^{N}\left(1+I(y_{i}\neq h_{t}(x_{i}))\right)}$,
* the prediction is done with a voting method.
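
As a small worked example of the weight formula, the sketch below applies it to four samples. The label and prediction arrays are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import numpy as np

y_true = np.array([0, 1, 1, 0])  # hypothetical labels y_n
y_pred = np.array([0, 0, 1, 1])  # hypothetical predictions h_t(x_n)

# Indicator I(y_n != h_t(x_n)): 1 for a misclassified sample, 0 otherwise.
indicator = (y_true != y_pred).astype(int)

# Numerator 1 + I, normalized by its sum over all N samples.
unnormalized = 1 + indicator
weights = unnormalized / unnormalized.sum()
print(weights)
```

Samples 2 and 4 are misclassified, so the unnormalized values are 1, 2, 1, 2 with sum 6, giving weights 1/6, 1/3, 1/6, 1/3. A misclassified sample therefore always carries exactly twice the weight of a correctly classified one.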

In [7]:
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# prepare data set

def generate_data(sample_number, feature_number, label_number):
    data_set = np.random.random_sample((sample_number, feature_number))
    labels = np.random.choice(label_number, sample_number)
    return data_set, labels

labels = 2
dimension = 2
test_set_size = 1000
train_set_size = 5000
train_set, train_labels = generate_data(train_set_size, dimension, labels)
test_set, test_labels = generate_data(test_set_size, dimension, labels)

# init weights
number_of_iterations = 10
weights = np.ones((train_set_size,)) / train_set_size

def train_model(classifier, weights):
    return classifier.fit(X=train_set, y=train_labels, sample_weight=weights)

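def calculate_accuracy_vector(predicted, labels):
    # Assumed helper (not defined in this notebook): 1 where the prediction is wrong, 0 otherwise.
    return (np.asarray(predicted) != np.asarray(labels)).astype(int)

def calculate_error(model):
    predicted = model.predict(test_set)
    I = calculate_accuracy_vector(predicted, test_labels)
    Z = np.sum(I)
    return (1 + Z) / 1.0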

Fill in the two functions below:

In [8]:
def set_new_weights(model):
    predictions = model.predict(train_set)
    I = (predictions != train_labels).astype(int)
    new_weights = 1 + I
    new_weights = new_weights / np.sum(new_weights)
    return new_weights
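
The `set_new_weights` function above reweights using only the most recent classifier's errors, as the exercise formula specifies. Breiman's arc-x4, as it is usually described, instead tracks the cumulative number of misclassifications $m_n$ of each sample by all classifiers built so far and sets the weights proportional to $1 + m_n^4$. A minimal sketch of that variant follows. The helper name `arc_x4_weights` and the toy labels and predictions are hypothetical and not part of the notebook.

```python
import numpy as np

def arc_x4_weights(miscounts):
    """Return weights proportional to 1 + m_n**4.

    miscounts[n] is the number of classifiers so far that misclassified sample n.
    """
    raw = 1.0 + np.asarray(miscounts, dtype=float) ** 4
    return raw / raw.sum()

# Hypothetical labels and per-round predictions of two weak learners.
y_true = np.array([0, 1, 1, 0])
round_predictions = [
    np.array([0, 0, 1, 0]),  # round 1: sample 2 is wrong
    np.array([0, 0, 1, 1]),  # round 2: samples 2 and 4 are wrong
]

miscounts = np.zeros(len(y_true), dtype=int)
for y_pred in round_predictions:
    # Accumulate errors across rounds instead of using only the latest one.
    miscounts += (y_pred != y_true).astype(int)
    weights = arc_x4_weights(miscounts)

print(miscounts, weights)
```

After two rounds the counts are 0, 2, 0, 1, so the unnormalized weights are 1, 17, 1, 2. The fourth power makes repeatedly misclassified samples dominate much faster than the single-round formula does.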

Train the classifier with the code below:

In [9]:
classifiers = []
for iteration in range(number_of_iterations):
    # Fit a fresh stump each round; refitting one shared object would leave
    # `classifiers` holding ten references to the same (last) model.
    classifier = DecisionTreeClassifier(max_depth=1, random_state=1)
    model = train_model(classifier, weights)
    weights = set_new_weights(model)
    classifiers.append(model)

print(weights)

[0.0001317 0.0001317 0.0002634 ... 0.0001317 0.0001317 0.0001317]


Set the validation data set:

In [10]:
validate_x, validate_label = generate_data(1, dimension, labels)

Fill in the prediction code:

In [11]:
from collections import Counter

def get_prediction(x):
    preds = []
    for model in classifiers:
        preds.append(model.predict(x))
    preds = np.array(preds)
    
    final_preds = []
    for i in range(preds.shape[1]):
        votes = preds[:, i]
        most_common = Counter(votes).most_common(1)[0][0]
        final_preds.append(most_common)
    return np.array(final_preds)

Test it:

In [12]:
prediction = get_prediction(validate_x)[0]

print(prediction)

1
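
With ten voters and two labels, a 5-5 tie is possible. `Counter.most_common` returns tied elements in the order they were first encountered, so in `get_prediction` a tie goes to whichever tied label appears first in the model list. The sketch below is one way to make the tie-breaking rule explicit with a vectorized vote. The function name `majority_vote` and the sample votes are hypothetical, and labels are assumed to be non-negative integers.

```python
import numpy as np

def majority_vote(predictions):
    """Majority vote over an integer array of shape (n_models, n_samples).

    Ties are resolved in favor of the smallest label, because argmax
    returns the first maximum.
    """
    predictions = np.asarray(predictions)
    n_labels = predictions.max() + 1
    # counts[label, sample] = number of models voting `label` for `sample`.
    counts = np.apply_along_axis(np.bincount, 0, predictions, minlength=n_labels)
    return counts.argmax(axis=0)

# Four hypothetical models voting on three samples.
votes = np.array([
    [0, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 0],
])
result = majority_vote(votes)
print(result)
```

The first sample receives two votes for each label and resolves to 0. The second and third samples have clear majorities of 1 and 0 respectively.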
