# Advanced Machine Learning (Aprendizagem Automática Avançada) - Assignment 3
# Cláudia Afonso nº 36273 & Rita Rodrigues nº 54859

# Problem 2

**Implement a Weighted Average Ensemble System in Python.**
**This ensemble system combines the output of several experts with a linear combination, whose weights are the accuracy scores of the experts on the dataset. This output should be rounded to the nearest integer.**

**Use different types of classifiers as experts, with different hyperparameters, e.g. two Decision Trees with gini and entropy as criterion, two SVMs with polynomial and RBF kernels, and other classifiers you have previously learned.**

In [33]:
from sklearn.datasets import load_breast_cancer
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

In the following code, the breast cancer dataset was first loaded using the load_breast_cancer() function from scikit-learn. The feature data (data.data) and target values (data.target) were then extracted.

To inspect the data, a pandas DataFrame (df) object was created from the feature data and the columns were labelled with the respective feature names (data.feature_names).

The presence of any missing values in the created DataFrame was subsequently checked by calling the isna() method and summing the results.

In [34]:
data = load_breast_cancer()
X = data.data
y = data.target

In [35]:
df = pd.DataFrame(data.data, columns=data.feature_names)
df

,mean radius,mean texture,mean perimeter,mean area,mean smoothness,mean compactness,mean concavity,mean concave points,mean symmetry,mean fractal dimension,...,worst radius,worst texture,worst perimeter,worst area,worst smoothness,worst compactness,worst concavity,worst concave points,worst symmetry,worst fractal dimension
0,17.99,10.38,122.80,1001.0,0.11840,0.27760,0.30010,0.14710,0.2419,0.07871,...,25.380,17.33,184.60,2019.0,0.16220,0.66560,0.7119,0.2654,0.4601,0.11890
1,20.57,17.77,132.90,1326.0,0.08474,0.07864,0.08690,0.07017,0.1812,0.05667,...,24.990,23.41,158.80,1956.0,0.12380,0.18660,0.2416,0.1860,0.2750,0.08902
2,19.69,21.25,130.00,1203.0,0.10960,0.15990,0.19740,0.12790,0.2069,0.05999,...,23.570,25.53,152.50,1709.0,0.14440,0.42450,0.4504,0.2430,0.3613,0.08758
3,11.42,20.38,77.58,386.1,0.14250,0.28390,0.24140,0.10520,0.2597,0.09744,...,14.910,26.50,98.87,567.7,0.20980,0.86630,0.6869,0.2575,0.6638,0.17300
4,20.29,14.34,135.10,1297.0,0.10030,0.13280,0.19800,0.10430,0.1809,0.05883,...,22.540,16.67,152.20,1575.0,0.13740,0.20500,0.4000,0.1625,0.2364,0.07678
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
564,21.56,22.39,142.00,1479.0,0.11100,0.11590,0.24390,0.13890,0.1726,0.05623,...,25.450,26.40,166.10,2027.0,0.14100,0.21130,0.4107,0.2216,0.2060,0.07115
565,20.13,28.25,131.20,1261.0,0.09780,0.10340,0.14400,0.09791,0.1752,0.05533,...,23.690,38.25,155.00,1731.0,0.11660,0.19220,0.3215,0.1628,0.2572,0.06637
566,16.60,28.08,108.30,858.1,0.08455,0.10230,0.09251,0.05302,0.1590,0.05648,...,18.980,34.12,126.70,1124.0,0.11390,0.30940,0.3403,0.1418,0.2218,0.07820
567,20.60,29.33,140.10,1265.0,0.11780,0.27700,0.35140,0.15200,0.2397,0.07016,...,25.740,39.42,184.60,1821.0,0.16500,0.86810,0.9387,0.2650,0.4087,0.12400


In [36]:
df.isna().sum().sum()

0

Since the obtained value was zero, it was confirmed that the dataset does not contain any missing values. 

The dataset was then split into training and test sets using the train_test_split() function from scikit-learn. A test size of 25% of the dataset was chosen, with a random state of 42 for reproducibility. The training set was subsequently split, using the same function and the same settings, into a new training set and a validation set.
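
The resulting subset sizes can be worked out by hand. The sketch below assumes the 569 samples of scikit-learn's breast cancer dataset and assumes that train_test_split() rounds a fractional test size up to the next whole sample; the variable names are illustrative only.

```python
import math

n_samples = 569        # assumed size of the breast cancer dataset
test_fraction = 0.25   # test_size used in both splits

# First split: the test set takes ceil(0.25 * 569) samples.
n_test = math.ceil(test_fraction * n_samples)
n_train_full = n_samples - n_test

# Second split: the validation set takes ceil(0.25 * n_train_full) samples.
n_val = math.ceil(test_fraction * n_train_full)
n_train = n_train_full - n_val

print(n_train, n_val, n_test)  # 319 107 143
```

Under these assumptions, roughly 56% of the data is used for training, 19% for validation and 25% for testing.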

Before training the models, the feature values were standardized using the StandardScaler class from scikit-learn so that all features are on a similar scale. To accomplish this, the scaler was fitted on the training set only, and the same fitted transformation was then applied to the validation and test sets.

In [37]:
# Split the dataset into train and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Split the train set into train and validation sets
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=42)

# Scale features: fit the scaler on the training set only, then reuse it for the other sets
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
X_val = scaler.transform(X_val)

In order to implement a weighted average ensemble system, it is necessary to first train several expert classifiers on the data. For this purpose, six classifiers with varying hyperparameters were initialized. Specifically, two decision tree classifiers were used, one with the gini criterion and a maximum depth of 5, and the other with the entropy criterion and a maximum depth of 3. Two support vector machine classifiers were also used, one with a polynomial kernel of degree 3 and the other with an RBF kernel. Finally, a Gaussian naïve Bayes classifier and a logistic regression classifier were included.

After initializing the expert classifiers, each one was trained on the training set and its accuracy was evaluated on the validation set using the accuracy_score() function from scikit-learn. The accuracy score of each expert classifier was stored in the expert_accuracies list.

In [38]:
# Initialize expert classifiers
experts = [
    DecisionTreeClassifier(criterion='gini', max_depth=5),
    DecisionTreeClassifier(criterion='entropy', max_depth=3),
    SVC(kernel='poly', degree=3),
    SVC(kernel='rbf'),
    GaussianNB(),
    LogisticRegression(),
]

# Train experts and evaluate their accuracy
expert_accuracies = []
for expert in experts:
    expert.fit(X_train, y_train)
    y_pred = expert.predict(X_val)
    expert_accuracies.append(accuracy_score(y_val, y_pred))

expert_accuracies

[0.9252336448598131,
 0.9252336448598131,
 0.9158878504672897,
 1.0,
 0.9345794392523364,
 0.9906542056074766]
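
Since the list above carries no labels, the sketch below pairs each reported accuracy with its expert, in the order in which the experts were initialized, and ranks them. The label strings are hypothetical names chosen for readability; the accuracy values are copied from the output above.

```python
# Hypothetical labels, listed in the same order as the experts list.
expert_names = [
    "DecisionTree (gini, depth 5)",
    "DecisionTree (entropy, depth 3)",
    "SVC (poly, degree 3)",
    "SVC (rbf)",
    "GaussianNB",
    "LogisticRegression",
]

# Validation accuracies copied from the output above.
reported_accuracies = [
    0.9252336448598131,
    0.9252336448598131,
    0.9158878504672897,
    1.0,
    0.9345794392523364,
    0.9906542056074766,
]

# Rank the experts from most to least accurate on the validation set.
ranking = sorted(zip(expert_names, reported_accuracies),
                 key=lambda pair: pair[1], reverse=True)
for name, acc in ranking:
    print(f"{name:32s} {acc:.3f}")
```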

Following the training and evaluation of each expert classifier on the validation set, the weight of each expert was computed to determine its importance in the ensemble system. Rather than using the raw accuracy scores directly, the softmax function was applied to the list of expert accuracies, which yields positive weights that sum to 1.

In [39]:
# Compute expert weights using softmax
expert_weights = np.exp(expert_accuracies) / np.sum(np.exp(expert_accuracies))
expert_weights

array([0.16272525, 0.16272525, 0.16121154, 0.175358  , 0.16425318,
       0.17372677])
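
For comparison, the sketch below recomputes the softmax weights from the reported accuracies and sets them next to a simpler alternative in which each accuracy is divided by the sum of all accuracies. The second scheme is an assumption added for illustration, not part of the notebook; it is closer to using the accuracy scores themselves as weights.

```python
import numpy as np

# Validation accuracies copied from the output above.
accuracies = np.array([
    0.9252336448598131, 0.9252336448598131, 0.9158878504672897,
    1.0, 0.9345794392523364, 0.9906542056074766,
])

# Softmax weights, as used in the notebook.
softmax_weights = np.exp(accuracies) / np.exp(accuracies).sum()

# Illustrative alternative: accuracies normalized to sum to 1.
normalized_weights = accuracies / accuracies.sum()

print(softmax_weights.round(4), softmax_weights.sum())
print(normalized_weights.round(4), normalized_weights.sum())
```

Because all accuracies lie in a narrow range, both schemes give weights close to the uniform value 1/6, so neither lets a single expert dominate the vote.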

After computing the weights for each expert using the softmax function, these weights can be used to calculate the ensemble output on the test set. The idea is to combine the predictions of the experts by multiplying each prediction by the weight of its expert and summing the weighted predictions. Since the weights sum to 1 and each prediction is either 0 or 1, this sum lies between 0 and 1, and rounding it to the nearest integer gives the final class label.

In [40]:
# Make predictions using the weighted average ensemble system
predictions = []
for i in range(len(X_test)):
    ensemble_prediction = 0
    for j in range(len(experts)):
        expert_prediction = experts[j].predict([X_test[i]])[0]
        ensemble_prediction += expert_weights[j] * expert_prediction
    predictions.append(round(ensemble_prediction))
predictions[:10]

[1, 0, 0, 1, 1, 0, 0, 0, 1, 1]
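
The same weighted combination can also be written without explicit loops. The following is a minimal sketch on hypothetical toy arrays; in the notebook, the prediction matrix could presumably be built by calling predict(X_test) once per expert and stacking the results.

```python
import numpy as np

# Hypothetical predictions: one row per expert, one column per sample.
toy_predictions = np.array([
    [1, 0, 1, 1],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
])

# Hypothetical expert weights that sum to 1.
toy_weights = np.array([0.5, 0.3, 0.2])

# Weighted sum over the experts for every sample at once.
weighted_average = toy_weights @ toy_predictions

# Round to the nearest integer to obtain the class labels.
ensemble_labels = np.rint(weighted_average).astype(int)
print(ensemble_labels)  # [1 0 1 1]
```

Note that np.rint, like Python's built-in round(), rounds exact halves to the nearest even integer, so a weighted sum of exactly 0.5 would be mapped to class 0.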

In [41]:
# Evaluate the performance of the ensemble system
accuracy = accuracy_score(y_test, predictions)
print("Accuracy:", accuracy)

Accuracy: 0.9790209790209791


The accuracy of the weighted average ensemble system on the test set was 0.979, meaning that the system correctly predicted the class of 97.9% of the instances in the test set. This high accuracy suggests that the ensemble system is effective at combining multiple expert classifiers to make accurate predictions.

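It is worth noting that the test accuracy of the ensemble system (0.979) is higher than the validation accuracies of four of the six individual experts, which ranged from 0.916 to 1.0; only the RBF SVM (1.0) and the logistic regression classifier (0.991) scored higher. Since the experts were evaluated on the validation set and the ensemble on the test set, this comparison is only indicative. Still, it is consistent with the general benefit of ensemble systems, which can often outperform individual classifiers by leveraging their complementary strengths.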
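
To compare the experts and the ensemble on equal footing, all of them can be scored on the same samples. The sketch below shows the idea on hypothetical toy arrays in which each expert makes a different mistake; the values are invented for illustration and are not results from the notebook.

```python
import numpy as np

# Hypothetical true labels and expert predictions (one row per expert).
y_true_toy = np.array([1, 0, 1, 1, 0])
preds_toy = np.array([
    [1, 0, 1, 0, 0],  # wrong on sample 3
    [1, 1, 1, 1, 0],  # wrong on sample 1
    [0, 0, 1, 1, 0],  # wrong on sample 0
])

# Accuracy of each expert on the shared samples.
expert_acc = (preds_toy == y_true_toy).mean(axis=1)

# Weights proportional to accuracy, then the rounded weighted average.
weights_toy = expert_acc / expert_acc.sum()
ensemble_toy = np.rint(weights_toy @ preds_toy).astype(int)
ensemble_acc = (ensemble_toy == y_true_toy).mean()

print(expert_acc, ensemble_acc)
```

In this toy case every expert scores 0.8 while the ensemble scores 1.0, because the errors of the experts do not overlap.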