<a href="https://colab.research.google.com/github/Marukos/Ensemble-Methods/blob/main/EnsembleMethods.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## About iPython Notebooks ##

iPython Notebooks are interactive coding environments embedded in a webpage. You will be using iPython notebooks in this class. Make sure you fill in any place that says `# BEGIN CODE HERE #END CODE HERE`. After writing your code, you can run the cell by either pressing "SHIFT"+"ENTER" or by clicking on "Run" (denoted by a play symbol). Before you turn this problem in, make sure everything runs as expected. First, **restart the kernel** (in the menubar, select Kernel$\rightarrow$Restart) and then **run all cells** (in the menubar, select Cell$\rightarrow$Run All).

 **What you need to remember:**

- Run your cells using SHIFT+ENTER (or "Run cell")
- Write code in the designated areas using Python 3 only
- Do not modify the code outside of the designated areas
- In some cases you will also need to explain the results. There will also be designated areas for that.

Fill in your **NAME** and **AEM** below:

In [None]:
NAME = "Markos Koletsas"
AEM = "3557"

---

# Assignment 3 - Ensemble Methods #

Welcome to your third assignment. This exercise will test your understanding of ensemble methods.

In [None]:
# Always run this cell
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score, f1_score, make_scorer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import StackingClassifier, RandomForestClassifier, GradientBoostingClassifier,  ExtraTreesClassifier, BaggingClassifier, AdaBoostClassifier, VotingClassifier
from sklearn.svm import LinearSVC
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import StratifiedShuffleSplit, cross_validate
from sklearn.tree import DecisionTreeClassifier
# USE THE FOLLOWING RANDOM STATE FOR YOUR CODE
RANDOM_STATE = 42

## Download the Dataset ##
Download the dataset using the following cell or from this [link](https://github.com/sakrifor/public/tree/master/machine_learning_course/EnsembleDataset) and put the files in the same folder as the .ipynb file.
In this assignment you will work with a dataset that originates from [ImageCLEFmed: The Medical Task 2016](https://www.imageclef.org/2016/medical) and its **Compound figure detection** subtask. The goal of this subtask is to identify whether a figure is a compound figure (one image consisting of more than one figure) or not. The training set consists of 4197 examples/figures, and each figure has 4096 features, which were extracted using a deep neural network. The *CLASS* column gives the class of each example, where 1 is a compound figure and 0 is not.


In [None]:
import urllib.request
url_train = 'https://github.com/sakrifor/public/raw/master/machine_learning_course/EnsembleDataset/train_set.csv'
filename_train = 'train_set.csv'
urllib.request.urlretrieve(url_train, filename_train)
url_test = 'https://github.com/sakrifor/public/raw/master/machine_learning_course/EnsembleDataset/test_set_noclass.csv'
filename_test = 'test_set_noclass.csv'
urllib.request.urlretrieve(url_test, filename_test)

('test_set_noclass.csv', <http.client.HTTPMessage at 0x228f9343910>)

In [None]:
# Run this cell to load the data
train_set = pd.read_csv("train_set.csv").sample(frac=1).reset_index(drop=True)
train_set.head()
X = train_set.drop(columns=['CLASS'])
y = train_set['CLASS'].values

## 1.0 Testing different ensemble methods ##
In this part of the assignment you are asked to create and test different ensemble methods using the train_set.csv dataset. You should use **10-fold cross validation** for your tests and report the average weighted F-measure and balanced accuracy of your models. You can use [cross_validate](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.cross_validate.html#sklearn.model_selection.cross_validate) and select both metrics to be measured during the evaluation. Alternatively, you can use [KFold](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.KFold.html#sklearn.model_selection.KFold).
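As a minimal sketch of the evaluation protocol described above, the snippet below runs 10-fold cross validation with both metrics in a single `cross_validate` call. The synthetic data from `make_classification` and the names `X_demo`, `y_demo` and `demo_scores` are assumptions made for illustration; they stand in for the real `train_set.csv`.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for train_set.csv (assumption: the real data is not needed here).
X_demo, y_demo = make_classification(
    n_samples=300, n_features=20, weights=[0.6, 0.4], random_state=42
)

demo_scores = cross_validate(
    DecisionTreeClassifier(random_state=42),
    X_demo,
    y_demo,
    cv=10,
    scoring=["f1_weighted", "balanced_accuracy"],
)

# One score per fold and per metric; result keys are prefixed with "test_".
avg_f1 = demo_scores["test_f1_weighted"].mean()
avg_bal_acc = demo_scores["test_balanced_accuracy"].mean()
print(f"F1 weighted: {avg_f1:.4f}, balanced accuracy: {avg_bal_acc:.4f}")
```

Passing the built-in scorer names as strings is equivalent to building the scorers with `make_scorer`, and keeps the result keys free of spaces.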

### !!! Use n_jobs=-1 wherever possible to use all the cores of a machine for running your tests ###

### 1.1 Voting ###
Create a voting classifier which uses three **simple** estimators/classifiers. Test both soft and hard voting and choose the best one. Consider as simple estimators the following:


*   Decision Trees
*   Linear Models
*   Probabilistic Models (Naive Bayes)
*   KNN Models  
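To see why soft and hard voting can disagree, the following small sketch works through a single example by hand. The probabilities are made up for illustration and do not come from the assignment data.

```python
import numpy as np

# Hypothetical class probabilities [P(class 0), P(class 1)] from three
# classifiers for one example; the numbers are made up for illustration.
probas = np.array([
    [0.90, 0.10],  # classifier A is very confident in class 0
    [0.40, 0.60],  # classifier B leans slightly towards class 1
    [0.45, 0.55],  # classifier C leans slightly towards class 1
])

# Hard voting: each classifier casts one vote for its most likely class.
votes = probas.argmax(axis=1)
hard_prediction = np.bincount(votes, minlength=2).argmax()

# Soft voting: average the probabilities, then pick the most likely class.
soft_prediction = probas.mean(axis=0).argmax()

print("hard:", hard_prediction, "soft:", soft_prediction)  # hard: 1 soft: 0
```

Hard voting follows the two weak votes for class 1, while soft voting lets the one confident classifier outweigh them. Soft voting also requires every estimator to expose `predict_proba`.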

In [None]:
# BEGIN CODE HERE

cls1 = DecisionTreeClassifier(random_state=RANDOM_STATE, max_depth=1) # Classifier #1
cls2 = KNeighborsClassifier(n_jobs=-1, n_neighbors=7) # Classifier #2
cls3 = LogisticRegression(n_jobs=-1, random_state=RANDOM_STATE) # Classifier #3

soft_vcls = VotingClassifier(voting='soft', estimators=[('Decision Tree', cls1), ('7NN', cls2),
                                                        ('Logistic Regression', cls3)],
                             n_jobs=-1) # Voting Classifier

hard_vcls = VotingClassifier(voting='hard', estimators=[('Decision Tree', cls1), ('KNN', cls2),
                                                        ('Logistic Regression', cls3)],
                             n_jobs=-1) # Voting Classifier

svlcs_scores = cross_validate(soft_vcls, X, y, n_jobs=-1, scoring={'f1 weighted':make_scorer(f1_score, average='weighted'),
                                                                   'balanced accuracy':make_scorer(balanced_accuracy_score)},
                              cv=10)

s_avg_fmeasure = sum(svlcs_scores['test_f1 weighted'])/len(svlcs_scores['test_f1 weighted']) # The average f-measure
s_avg_accuracy = sum(svlcs_scores['test_balanced accuracy'])/len(svlcs_scores['test_balanced accuracy']) # The average accuracy

hvlcs_scores = cross_validate(hard_vcls, X, y, n_jobs=-1, scoring={'f1 weighted':make_scorer(f1_score, average='weighted'),
                                                                   'balanced accuracy':make_scorer(balanced_accuracy_score)},
                              cv=10)

h_avg_fmeasure = sum(hvlcs_scores['test_f1 weighted'])/len(hvlcs_scores['test_f1 weighted']) # The average f-measure
h_avg_accuracy = sum(hvlcs_scores['test_balanced accuracy'])/len(hvlcs_scores['test_balanced accuracy']) # The average accuracy
#END CODE HERE

In [None]:
print("Classifier:")
print(soft_vcls)
print("F1 Weighted-Score: {} & Balanced Accuracy: {}".format(round(s_avg_fmeasure,4), round(s_avg_accuracy,4)))

Classifier:
VotingClassifier(estimators=[('Decision Tree',
                              DecisionTreeClassifier(max_depth=1,
                                                     random_state=42)),
                             ('7NN',
                              KNeighborsClassifier(n_jobs=-1, n_neighbors=7)),
                             ('Logistic Regression',
                              LogisticRegression(n_jobs=-1, random_state=42))],
                 n_jobs=-1, voting='soft')
F1 Weighted-Score: 0.8413 & Balanced Accuracy: 0.8334


You should achieve above 82% (Soft Voting Classifier)

In [None]:
print("Classifier:")
print(hard_vcls)
print("F1 Weighted-Score: {} & Balanced Accuracy: {}".format(round(h_avg_fmeasure,4), round(h_avg_accuracy,4)))

Classifier:
VotingClassifier(estimators=[('Decision Tree',
                              DecisionTreeClassifier(max_depth=1,
                                                     random_state=42)),
                             ('KNN',
                              KNeighborsClassifier(n_jobs=-1, n_neighbors=7)),
                             ('Logistic Regression',
                              LogisticRegression(n_jobs=-1, random_state=42))],
                 n_jobs=-1)
F1 Weighted-Score: 0.8215 & Balanced Accuracy: 0.8084


You should achieve above 80% in both! (Hard Voting Classifier)

### 1.2 Stacking ###
Create a stacking classifier which uses two more complex estimators. Try different simple classifiers (like the ones mentioned before) for the combination of the initial estimators. Report your results in the following cell.

Consider as complex estimators the following:

*   Random Forest
*   SVM
*   Gradient Boosting
*   MLP
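As a rough sketch of how stacking fits together, the example below combines two of the complex estimators listed above and passes an explicit final estimator. The synthetic data, the small model sizes and the variable names are assumptions chosen to keep the example fast; they are not tuned for the assignment dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Synthetic stand-in data (assumption), split into a training and a held-out part.
X_demo, y_demo = make_classification(n_samples=300, n_features=20, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=42
)

stack_demo = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=25, random_state=42)),
        ("svm", LinearSVC(random_state=42, max_iter=10000)),
    ],
    # The final estimator is trained on out-of-fold predictions of the base models.
    final_estimator=LogisticRegression(),
    cv=5,
)
stack_demo.fit(X_tr, y_tr)
test_predictions = stack_demo.predict(X_te)
print("Held-out accuracy:", stack_demo.score(X_te, y_te))
```

If `final_estimator` is omitted, `StackingClassifier` uses a `LogisticRegression` by default, so the explicit argument here only makes that choice visible.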




In [None]:
# BEGIN CODE HERE

cls1 = MLPClassifier(random_state=RANDOM_STATE) # Classifier #1
cls2 = LinearSVC(random_state=RANDOM_STATE, max_iter=500000) # Classifier #2
cls3 = GradientBoostingClassifier(random_state=RANDOM_STATE) # Classifier #3
cls4 = '' # Final estimator (optional); unused, so StackingClassifier falls back to its default LogisticRegression
scls = StackingClassifier(estimators=[('MLP',cls1),('Linear SVM',cls2),('Gradient Boosting', cls3)], n_jobs=-1) # Stacking Classifier

scores = cross_validate(scls, X, y, n_jobs=-1, scoring={'f1 weighted':make_scorer(f1_score, average='weighted'),
                                                        'balanced accuracy':make_scorer(balanced_accuracy_score)}, cv=10)

avg_fmeasure = sum(scores['test_f1 weighted'])/len(scores['test_f1 weighted']) # The average f-measure
avg_accuracy = sum(scores['test_balanced accuracy'])/len(scores['test_balanced accuracy']) # The average accuracy
#END CODE HERE

In [None]:
print("Classifier:")
print(scls)
print("F1 Weighted Score: {} & Balanced Accuracy: {}".format(round(avg_fmeasure,4), round(avg_accuracy,4)))

Classifier:
StackingClassifier(estimators=[('MLP', MLPClassifier(random_state=42)),
                               ('Linear SVM',
                                LinearSVC(max_iter=500000, random_state=42)),
                               ('Gradient Boosting',
                                GradientBoostingClassifier(random_state=42))],
                   n_jobs=-1)
F1 Weighted Score: 0.8556 & Balanced Accuracy: 0.8489


You should achieve above 85% in both

## 2.0 Randomization ##

**2.1** You are asked to create three ensembles of decision trees where each one uses a different method for producing homogeneous ensembles. Compare them with a simple decision tree classifier and report your results in the dictionaries (dict) below using as key the given name of your classifier and as value the f1_weighted/balanced_accuracy score. The dictionaries should contain four different elements.  
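One common reading of "different methods for producing homogeneous ensembles" is bagging, random forests and extremely randomized trees, which differ in where the randomness is injected. The sketch below compares them with a single tree on synthetic data; the data, the ensemble sizes and the dictionary names are assumptions for illustration and are independent of the solution cell that follows.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_validate
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption).
X_demo, y_demo = make_classification(n_samples=300, n_features=20, random_state=42)

demo_models = {
    "Simple Decision Tree": DecisionTreeClassifier(random_state=42),
    # Bagging: each tree sees a different bootstrap sample of the rows.
    "Bagging": BaggingClassifier(
        DecisionTreeClassifier(random_state=42), n_estimators=25, random_state=42
    ),
    # Random forest: bootstrap samples plus a random feature subset at every split.
    "Random Forest": RandomForestClassifier(n_estimators=25, random_state=42),
    # Extra trees: random feature subsets and randomly drawn split thresholds.
    "Extra Trees": ExtraTreesClassifier(n_estimators=25, random_state=42),
}

demo_f_measures, demo_accuracies = {}, {}
for name, model in demo_models.items():
    cv_scores = cross_validate(
        model, X_demo, y_demo, cv=10, scoring=["f1_weighted", "balanced_accuracy"]
    )
    demo_f_measures[name] = cv_scores["test_f1_weighted"].mean()
    demo_accuracies[name] = cv_scores["test_balanced_accuracy"].mean()

for name in demo_models:
    print(f"{name}: F1 weighted={demo_f_measures[name]:.4f}, "
          f"balanced accuracy={demo_accuracies[name]:.4f}")
```

The base tree is passed to `BaggingClassifier` positionally because the keyword name differs between scikit-learn releases (`base_estimator` in older ones, `estimator` in newer ones).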

In [None]:
# BEGIN CODE HERE

forest=[
    DecisionTreeClassifier(random_state=RANDOM_STATE, max_features='auto'),
    DecisionTreeClassifier(random_state=RANDOM_STATE, max_features='sqrt'),
    DecisionTreeClassifier(random_state=RANDOM_STATE, max_features='log2')
]

voting_soft = VotingClassifier(voting='soft', estimators=[('Criterion Entropy', forest[0]), ('Criterion Log Loss', forest[1]),
                                                        ('Criterion Gini', forest[2])],
                             n_jobs=-1) # Voting Classifier

voting_hard = VotingClassifier(voting='hard', estimators=[('Criterion Entropy', forest[0]), ('Criterion Log Loss', forest[1]),
                                                        ('Criterion Gini', forest[2])],
                             n_jobs=-1) # Hard Voting Classifier

stack = StackingClassifier(estimators=[('Criterion Entropy', forest[0]), ('Criterion Log Loss', forest[1]),
                                       ('Criterion Gini', forest[2])],
                           n_jobs=-1) # Stacking Classifier


ens1 = BaggingClassifier(base_estimator=voting_soft, random_state=RANDOM_STATE, n_jobs=-1)
ens2 = BaggingClassifier(base_estimator=voting_hard, random_state=RANDOM_STATE, n_jobs=-1)
ens3 = BaggingClassifier(base_estimator=stack, random_state=RANDOM_STATE, n_jobs=-1)
tree = DecisionTreeClassifier(random_state=RANDOM_STATE)

f_measures = dict()
accuracies = dict()

for classifier, name in [(ens1,'Bagging with Soft Voting Trees'), (ens2,'Bagging with Hard Voting Trees'),
                        (ens3,'Bagging with Stacking Trees'), (tree, 'Simple Decision')]:

    scores = cross_validate(classifier, X, y, n_jobs=-1, scoring={'f1 weighted':make_scorer(f1_score, average='weighted'),
                                                        'balanced accuracy':make_scorer(balanced_accuracy_score)}, cv=10,
                            error_score='raise')
    avg_fmeasure = sum(scores['test_f1 weighted'])/len(scores['test_f1 weighted']) # The average f-measure
    avg_accuracy = sum(scores['test_balanced accuracy'])/len(scores['test_balanced accuracy']) # The average accuracy
    f_measures[name] = avg_fmeasure
    accuracies[name] = avg_accuracy
# Example f_measures = {'Simple Decision': 0.8551, 'Ensemble with random ...': 0.92, ...}

#END CODE HERE

In [None]:
print(ens1)
print(ens2)
print(ens3)
print(tree)
for name,score in f_measures.items():
    print("Classifier:{} -  F1 Weighted:{}".format(name,round(score,4)))
for name,score in accuracies.items():
    print("Classifier:{} -  BalancedAccuracy:{}".format(name,round(score,4)))

BaggingClassifier(base_estimator=VotingClassifier(estimators=[('Criterion '
                                                               'Entropy',
                                                               DecisionTreeClassifier(max_features='auto',
                                                                                      random_state=42)),
                                                              ('Criterion Log '
                                                               'Loss',
                                                               DecisionTreeClassifier(max_features='sqrt',
                                                                                      random_state=42)),
                                                              ('Criterion Gini',
                                                               DecisionTreeClassifier(max_features='log2',
                                                                                      r

**2.2** Describe your classifiers and your results.

We have three simple decision trees, each using a different rule for selecting features (`max_features`), and on top of them we build three different ensembles: a voting classifier with soft voting, a voting classifier with hard voting, and a stacking classifier. We observed that the two voting classifiers, soft and hard, achieve the same score despite such a small change in the hyperparameters, both in weighted F1 and in balanced accuracy. The stacking classifier scores slightly lower. What we want to emphasize, however, is that all of them outperformed the simple decision tree despite the negligible changes in hyperparameters, which lets us see first-hand the advantage of ensembles over single classifiers.

**2.3** Increasing the number of estimators in a bagging classifier can drastically increase the training time of a classifier. Is there any solution to this problem? Can the same solution be applied to boosting classifiers?

In bagging classifiers, each model can be trained and used for prediction independently, so more than one model can be trained in parallel. Multiple compute units can therefore be used at the same time to speed up the training of a bagging classifier.
Boosting classifiers, on the other hand, do not allow this: they are sequential rather than parallel, since each model depends on the ones before it, so it is natural that training them takes longer.
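The contrast between parallel bagging and sequential boosting can be sketched in scikit-learn terms as follows. The synthetic data and the helper `fit_bagging` are hypothetical, introduced only for this illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption).
X_demo, y_demo = make_classification(n_samples=300, n_features=20, random_state=42)


def fit_bagging(n_jobs):
    """Fit the same bagging ensemble with a given number of parallel workers."""
    model = BaggingClassifier(
        DecisionTreeClassifier(), n_estimators=20, n_jobs=n_jobs, random_state=42
    )
    return model.fit(X_demo, y_demo)


serial = fit_bagging(n_jobs=1)
parallel = fit_bagging(n_jobs=2)

# Ensemble members are built independently from pre-drawn seeds, so the
# number of workers should not change the predictions.
same_predictions = np.array_equal(serial.predict(X_demo), parallel.predict(X_demo))

# Bagging exposes n_jobs; AdaBoost does not, because each boosting round
# depends on the errors of the previous one.
bagging_has_n_jobs = "n_jobs" in BaggingClassifier().get_params()
adaboost_has_n_jobs = "n_jobs" in AdaBoostClassifier().get_params()
print(same_predictions, bagging_has_n_jobs, adaboost_has_n_jobs)
```

The absence of an `n_jobs` parameter on `AdaBoostClassifier` reflects the sequential dependency between boosting rounds described above.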

## 3.0 Creating the best classifier ##

**3.1** In this part of the assignment you are asked to train the best possible ensemble! Describe the process you followed to achieve this result. How did you choose your classifier and your parameters and why. Report the f-measure (weighted) & balanced accuracy (10-fold cross validation) of your final classifier and results of classifiers you tried in the cell following the code. Can you achieve a balanced accuracy over 83-84%?

In [None]:
# BEGIN CODE HERE

best_cls = StackingClassifier(estimators=[('MLP', MLPClassifier(random_state=RANDOM_STATE)),
                                      ('Gradient Boosting',GradientBoostingClassifier(random_state=RANDOM_STATE)),
                                      ('Linear SVM',LinearSVC(random_state=RANDOM_STATE, max_iter=500000))],
                          n_jobs=-1) # Stacking Classifier


scores = cross_validate(best_cls, X, y, n_jobs=-1, scoring={'f1 weighted':make_scorer(f1_score, average='weighted'),
                                                        'balanced accuracy':make_scorer(balanced_accuracy_score)},
                        cv=10, error_score="raise")

best_fmeasure = sum(scores['test_f1 weighted'])/len(scores['test_f1 weighted']) # The average f-measure
best_accuracy = sum(scores['test_balanced accuracy'])/len(scores['test_balanced accuracy']) # The average accuracy

#END CODE HERE

In [None]:
print("Classifier:")
print(best_cls)
print("F1 Weighted-Score:{} & Balanced Accuracy:{}".format(best_fmeasure, best_accuracy))

Classifier:
StackingClassifier(estimators=[('MLP', MLPClassifier(random_state=42)),
                               ('Gradient Boosting',
                                GradientBoostingClassifier(random_state=42)),
                               ('Linear SVM',
                                LinearSVC(max_iter=500000, random_state=42))],
                   n_jobs=-1)
F1 Weighted-Score:0.8555955691096606 & Balanced Accuracy:0.8489449221507813


**3.2** Describe the process you followed to achieve this result. How did you choose your classifier and your parameters and why. Report the f-measure & accuracy (10-fold cross validation) of your final classifier and results of classifiers you tried in the cell following the code.

We ran many experiments with combinations of stacking and voting classifiers, for example stacking voting classifiers together with other classifiers. None of them, however, managed to beat the results of the classifier from exercise 1.2. The reasoning behind that classifier was as follows: we first find the two different classifiers that work best together (MLP and SVM), then find which other classifier works equally well with those two (Gradient Boosting), and combine them. Finally, to make the model a little more complex, we tried different classifiers as final estimators. This did not give the desired results, since they might raise one metric at the expense of the other, so we went back to Logistic Regression (the default final estimator of `StackingClassifier`), which also seemed to perform quite well on this problem on its own. We therefore ended up with the same classifier as in exercise 1.2.
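The first step of the search described above, finding the pair of base classifiers that works best together, could be sketched as below. Cheap models and synthetic data are used as assumed stand-ins for the MLP, SVM and gradient boosting models, so the winning pair here says nothing about the assignment dataset.

```python
from itertools import combinations

from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption).
X_demo, y_demo = make_classification(n_samples=200, n_features=10, random_state=42)

# Cheap stand-ins for the heavier models tried in the text (assumption).
candidates = {
    "logreg": LogisticRegression(max_iter=1000),
    "tree": DecisionTreeClassifier(max_depth=3, random_state=42),
    "knn": KNeighborsClassifier(n_neighbors=7),
    "nb": GaussianNB(),
}

# Score every pair of candidates as the base layer of a stacking classifier.
pair_scores = {}
for name_a, name_b in combinations(candidates, 2):
    stack = StackingClassifier(
        estimators=[(name_a, candidates[name_a]), (name_b, candidates[name_b])],
        cv=3,
    )
    scores = cross_val_score(stack, X_demo, y_demo, cv=3, scoring="balanced_accuracy")
    pair_scores[(name_a, name_b)] = scores.mean()

best_pair = max(pair_scores, key=pair_scores.get)
print("Best pair:", best_pair, "balanced accuracy:", round(pair_scores[best_pair], 4))
```

The same loop could be repeated with the best pair fixed and each remaining candidate added as a third estimator, which mirrors the second step of the process.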

**3.3** Create a classifier that is going to be used in production - in a live system. Use the *test_set_noclass.csv* to make predictions. Store the predictions in a list.  

In [None]:
# BEGIN CODE HERE
data = pd.read_csv('train_set.csv')
X = data.drop(columns=['CLASS']).values
y = data['CLASS'].values
sss = StratifiedShuffleSplit(n_splits = 10, test_size = 0.33, random_state=RANDOM_STATE)

best = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
best_train = [[[], []], [[], []], [[], []]]
for train_indexes, test_indexes in sss.split(X, y):
    X_train, X_test = X[train_indexes], X[test_indexes]
    y_train, y_test = y[train_indexes], y[test_indexes]
    classifier = StackingClassifier(estimators=[('MLP', MLPClassifier(random_state=RANDOM_STATE)),
                                      ('Gradient Boosting',GradientBoostingClassifier(random_state=RANDOM_STATE)),
                                      ('Linear SVM',LinearSVC(random_state=RANDOM_STATE, max_iter=500000))],
                              n_jobs=-1) # Stacking Classifier 0
    classifier.fit(X_train,y_train)
    y_predict = classifier.predict(X_test)
    fmeasure = f1_score(y_test, y_predict, average='weighted')
    accuracy = balanced_accuracy_score(y_test, y_predict)

    if fmeasure > best[0][0]:
        best[0][0] = fmeasure
        best[0][1] = accuracy
        best_train[0][0] = X_train
        best_train[0][1] = y_train

    if accuracy > best[1][1]:
        best[1][0] = fmeasure
        best[1][1] = accuracy
        best_train[1][0] = X_train
        best_train[1][1] = y_train

    if fmeasure > best[2][0] and accuracy > best[2][1]:
        best[2][0] = fmeasure
        best[2][1] = accuracy
        best_train[2][0] = X_train
        best_train[2][1] = y_train

# best = [[0.8542529215319144, 0.8414739217804011],
        # [0.8538251210873148, 0.8474281079359889],
        # [0.8538251210873148, 0.8474281079359889]]

cls = classifier = StackingClassifier(estimators=[('MLP', MLPClassifier(random_state=RANDOM_STATE)),
                                      ('Gradient Boosting',GradientBoostingClassifier(random_state=RANDOM_STATE)),
                                      ('Linear SVM',LinearSVC(random_state=RANDOM_STATE, max_iter=500000))],
                              n_jobs=-1) # Stacking Classifier 0

cls.fit(best_train[2][0], best_train[2][1])

# END CODE HERE
test_set = pd.read_csv("test_set_noclass.csv")
predictions = cls.predict(test_set)

The only comment needed is that we train the model on one part of the dataset (67%), selected with a stratified split, and use the rest (33%) to check the quality of the split. We then pick the best split according to one of the two metrics, or both at once (this time the choice was made according to our own judgment and to how far apart the maximum scores were), and train the final model on it.
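What "stratified" means for these splits can be checked with a small sketch: every training part keeps roughly the same class proportions as the full dataset. The labels below are made up (30% positives) and are not the class distribution of `train_set.csv`.

```python
import numpy as np
from sklearn.model_selection import StratifiedShuffleSplit

# Made-up labels with 30% positives; the features are irrelevant for the split itself.
y_demo = np.array([1] * 90 + [0] * 210)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)

splitter = StratifiedShuffleSplit(n_splits=10, test_size=0.33, random_state=42)

train_ratios = []
for train_idx, test_idx in splitter.split(X_demo, y_demo):
    # Fraction of positives in the training part of this split.
    train_ratios.append(y_demo[train_idx].mean())

print("Overall positive rate:", y_demo.mean())
print("Training positive rates:", np.round(train_ratios, 3))
```

Because the class balance is nearly identical across splits, differences in score between splits come from which individual rows land in the training and held-out parts.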

#### The following cell will not be executed. The test_set.csv with the classes will be made available after the deadline, and this cell is for testing purposes! Do not modify it! ####

In [None]:
if False:
    from sklearn.metrics import f1_score, balanced_accuracy_score
    final_test_set = pd.read_csv('test_set.csv')
    ground_truth = final_test_set['CLASS']
    print("Balanced Accuracy: {}".format(balanced_accuracy_score(predictions, ground_truth)))
    print("F1 Weighted-Score: {}".format(f1_score(predictions, ground_truth, average='weighted')))

Both should aim above 85%!