# 3 - Comparing Machine Learning Algorithms

Welcome to the third part of the practical session of the Workshop __Machine Learning for Optical Network Systems!__ In this exercise you will:

 - Build multiple models for the same use case as exercise 2, using various algorithms.
 - Learn how to compare these models against each other so that you can select an appropriate one.
 
__Let's go!__

Similar to exercise 2, this part of the practical session is based on the papers:
- Active Wavelength Load as a Feature for QoT Estimation Based on Support Vector Machine
- A Performance Analysis of Supervised Learning Classifiers for QoT Estimation in ROADM-based Networks

## Getting started

Please refer to the publications for detailed explanations of the use cases. Go to File -> Open, click on the directory publications/, and open the files diaz2019active.pdf and diaz2019performance.pdf.

### 1 - Import the required libraries

As in the previous exercises, execute the cell below to import the necessary libraries.

In [None]:
import file_reader as fr
import numpy as np
import time
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier, BaggingClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn import metrics

### 2 - Load the dataset

Now, load the data from the provided files below, and use a scaler to standardize the features so that every algorithm receives inputs on a comparable scale.

In [None]:
training_set_file = 'dataset/balanced-20372.csv'
testing_set_file = 'dataset/testset-2351.csv'

# Retrieve and format data for training set
X_train, y_train = fr.FileReader.read_array_three_class(fr.FileReader(), training_set_file)

# Retrieve and format data for testing set
X_test, y_test = fr.FileReader.read_array_three_class(fr.FileReader(), testing_set_file)

# Use a StandardScaler to scale your dataset.
scaler = StandardScaler()

# Fit the scaler on the training data only, then apply the same transformation to the test data
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
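
A common convention is to learn the scaling statistics from the training set alone and reuse them on the test set, so that both sets are mapped with the same mean and standard deviation. The sketch below illustrates this on synthetic arrays; the `*_demo` names and the random data are assumptions made for illustration and are not part of the workshop dataset.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins for the workshop data (hypothetical values)
rng = np.random.default_rng(0)
X_train_demo = rng.normal(loc=5.0, scale=2.0, size=(200, 3))
X_test_demo = rng.normal(loc=5.0, scale=2.0, size=(50, 3))

scaler_demo = StandardScaler()
# fit_transform learns the per-feature mean and standard deviation from the training data
X_train_scaled = scaler_demo.fit_transform(X_train_demo)
# transform reuses the training statistics instead of re-estimating them on the test data
X_test_scaled = scaler_demo.transform(X_test_demo)

print("Training means after scaling:", X_train_scaled.mean(axis=0))
print("Test means after scaling:", X_test_scaled.mean(axis=0))
```

The training features end up with zero mean and unit variance, while the test features are only approximately standardized because they are scaled with the training statistics.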

### 3 - Declaring parameters

The cell below may seem like a lot of code, but you are simply generating lists with names and parameters to be used by your multiple algorithms. __Don't forget to execute it though!__

In [None]:
names = ["Nearest Neighbors", "RBF SVM", "Linear SVM",
         "Logistic Regression","Decision Tree", "Neural Network",
         "Naive Bayes", "LDA"]
   
ensemble_names = ["Random Forest", "AdaBoost", "Bagging"]

parameters = {
    "Nearest Neighbors": {'n_neighbors': [1]},
    "RBF SVM": {'kernel':['rbf'], 'gamma':[100], 'C':[0.0001]},
    "Linear SVM": {'kernel':['linear'], 'gamma':[100], 'C':[0.0001]},
    "Logistic Regression":{'solver':['lbfgs'], 'multi_class':['multinomial'], 'random_state':[1]}, 
    "Decision Tree": {'max_depth':[5]},
    "Neural Network": {'alpha':[1], 'max_iter':[10000]},
    "Naive Bayes": {},
    "LDA": {'n_components':[None], 'priors':[None], 'shrinkage':['auto'],
              'solver':['lsqr']}         
    }

ensemble_parameters = {
    "Random Forest": {'max_depth':[5], 'n_estimators':[10], 'max_features':[1]},
    "AdaBoost": {'n_estimators':[10]}, 
    "Bagging":{'n_estimators':[100],'max_samples':[0.8], 'max_features':[0.8]}
    }

classifiers = [
    KNeighborsClassifier(1),
    SVC(),
    SVC(),
    LogisticRegression(), 
    DecisionTreeClassifier(),
    MLPClassifier(),
    GaussianNB(),
    LinearDiscriminantAnalysis()
    ]
    
ensemble_classifiers = [
    RandomForestClassifier(),
    AdaBoostClassifier(),
    BaggingClassifier()
    ]

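# Collapse each label row to a single class index (position of the largest entry)
y_train2 = np.argmax(y_train, axis=1)
y_test2 = np.argmax(y_test, axis=1)

# Classifiers listed here are trained on the class-index labels (y_train2 / y_test2)
y2_clfs = ["Linear SVM", "RBF SVM", "Logistic Regression", "Decision Tree", "AdaBoost", "Naive Bayes", "LDA", "Bagging"]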

classifier_stats = {}
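
The `np.argmax(..., axis=1)` calls turn each label row into a single class index. The sketch below shows the effect on a small hand-made array, assuming the labels are stored as one-hot rows with one column per class; the `*_demo` arrays are hypothetical and not taken from the workshop files.

```python
import numpy as np

# Hypothetical one-hot labels for four samples and three classes
y_onehot_demo = np.array([
    [1, 0, 0],
    [0, 0, 1],
    [0, 1, 0],
    [0, 0, 1],
])

# argmax along axis=1 returns the column index of the largest entry in each row
y_labels_demo = np.argmax(y_onehot_demo, axis=1)
print(y_labels_demo)  # [0 2 1 2]
```

Classifiers such as `SVC` expect a one-dimensional target like `y_labels_demo`, which is why some models in this exercise are trained on the converted labels.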

### 4 - Fitting the models and evaluating them

The for loop below fits each Machine Learning algorithm in turn and records its accuracy, F1-score, and training time. This may take several minutes to run.

In [None]:
# iterate over classifiers
for name, clf in zip(names, classifiers):
    print("Running execution for classifier: %s" %name )
    # wrap the classifier in GridSearchCV: 5-fold cross-validation (cv=5) over the parameter grid, using up to 10 parallel jobs (n_jobs=10)
    clf_grid = GridSearchCV(clf, parameters[name], n_jobs=10, cv=5)
    # classifiers listed in y2_clfs use the class-index labels; the others use the original label arrays
    if name in y2_clfs:
        ts = time.time()
        clf_grid.fit(X_train, y_train2)
        new_ts = time.time()
        total_time = new_ts - ts
        score = clf_grid.score(X_test, y_test2) # compute average accuracy
        y_pred = clf_grid.predict(X_test) # predict X_test based on trained model
        f1_score = metrics.f1_score(y_test2, y_pred, average='micro')
    else:
        ts = time.time()
        clf_grid.fit(X_train, y_train)
        new_ts = time.time()
        total_time = new_ts - ts
        score = clf_grid.score(X_test, y_test)
        y_pred = clf_grid.predict(X_test) # predict X_test based on trained model
        f1_score = metrics.f1_score(y_test, y_pred, average='micro')
    # save results on classifier statistics object
    classifier_stats[name] = (score, f1_score, total_time)

print("\n\n\n")
for clfs in classifier_stats:
    (score, f1_score, total_time) = classifier_stats[clfs]
    print("Classifier: %s.\nAccuracy: %s.\nF1-score: %s.\nTraining time: %s seconds." %(clfs, score, f1_score, total_time))
    print("% - % - % - % - % - % - % - % - % - % - % - %")
    print("\n")
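
When reading the F1-scores, it may help to know that for single-label multi-class predictions the micro-averaged F1-score equals the plain accuracy, whereas the macro average weights every class equally. The sketch below compares them on a small hand-made example; the label lists are hypothetical and only serve to illustrate the `average` parameter of `metrics.f1_score`.

```python
from sklearn import metrics

# Hypothetical true and predicted class indices for eight samples
y_true_demo = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred_demo = [0, 0, 0, 0, 1, 0, 2, 0]

accuracy_demo = metrics.accuracy_score(y_true_demo, y_pred_demo)
# 'micro' pools true/false positives over all classes before computing F1
f1_micro_demo = metrics.f1_score(y_true_demo, y_pred_demo, average='micro')
# 'macro' computes F1 per class and takes the unweighted mean
f1_macro_demo = metrics.f1_score(y_true_demo, y_pred_demo, average='macro')
# average=None returns one F1 value per class
f1_per_class_demo = metrics.f1_score(y_true_demo, y_pred_demo, average=None)

print("Accuracy:", accuracy_demo)
print("Micro F1:", f1_micro_demo)
print("Macro F1:", f1_macro_demo)
print("Per-class F1:", f1_per_class_demo)
```

If the classes in a test set are imbalanced, the macro average or the per-class values can reveal weaknesses on rare classes that the micro average hides.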

### __Congratulations!__

Now you know the basic tools that the Scikit-learn framework offers for running Machine Learning algorithms quickly and easily! Feel free to go back to the previous cell to check the performance of the ensemble classifiers as well, or change the hyperparameters of the algorithms to see what works best!
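
As a starting point for both suggestions, here is a minimal sketch that runs the three ensemble classifiers through `GridSearchCV` with more than one candidate value per hyperparameter. It uses synthetic data from `make_classification` and integer class labels for every model, so it runs without the workshop files. The `*_demo` names, the candidate values, and the data are assumptions chosen for illustration, not the settings used in the papers.

```python
import time

from sklearn import metrics
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic three-class data standing in for the workshop dataset (assumption)
X_demo, y_demo = make_classification(n_samples=300, n_features=6, n_informative=4,
                                     n_redundant=0, n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.25,
                                          stratify=y_demo, random_state=0)

# Scale with statistics learned from the training split only
scaler_demo = StandardScaler()
X_tr = scaler_demo.fit_transform(X_tr)
X_te = scaler_demo.transform(X_te)

# Each entry pairs an estimator with a small grid of candidate hyperparameters
demo_ensembles = {
    "Random Forest": (RandomForestClassifier(random_state=0),
                      {'max_depth': [3, 5], 'n_estimators': [10, 50]}),
    "AdaBoost": (AdaBoostClassifier(random_state=0),
                 {'n_estimators': [10, 50]}),
    "Bagging": (BaggingClassifier(random_state=0),
                {'n_estimators': [10, 50], 'max_samples': [0.8], 'max_features': [0.8]}),
}

ensemble_stats_demo = {}
for name, (clf, grid) in demo_ensembles.items():
    search = GridSearchCV(clf, grid, cv=3)
    start = time.time()
    search.fit(X_tr, y_tr)  # cross-validates every combination, then refits the best one
    elapsed = time.time() - start
    y_pred = search.predict(X_te)
    f1 = metrics.f1_score(y_te, y_pred, average='micro')
    ensemble_stats_demo[name] = (f1, elapsed, search.best_params_)

# Print the models from best to worst F1-score
for name, (f1, elapsed, best) in sorted(ensemble_stats_demo.items(),
                                        key=lambda item: item[1][0], reverse=True):
    print("Classifier: %s. F1-score: %.3f. Training time: %.2f seconds. Best parameters: %s"
          % (name, f1, elapsed, best))
```

With several values per hyperparameter, `best_params_` reports which combination scored highest in cross-validation, which makes it easier to see whether a change actually helped.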