<a href="https://colab.research.google.com/github/Sjoerd-de-Witte/Machine-Learning-2023/blob/main/4_4_Model_comparison_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
!gdown -O /tmp/ml.py 174lBNvDBJSVWs3OpNL3a68cnhWIcWYuY
%run /tmp/ml.py

Downloading...
From: https://drive.google.com/uc?id=174lBNvDBJSVWs3OpNL3a68cnhWIcWYuY
To: /tmp/ml.py


# No Free Lunch Theorem

The English saying "there is no such thing as a free lunch" means there is no easy shortcut to success. Wolpert and Macready showed that, when their performance is averaged over all possible problems (objective functions), all optimization algorithms perform equally well. Thus, if you make no assumptions about the data, there is no reason to prefer one model over another.

In practice, we usually analyze the data before choosing a model, and that analysis sometimes makes us lean toward a particular family of models: image classification, for example, usually points to convolutional networks. Although there certainly are problem classes in which one specific model dominates, they are rare. It is therefore wise to keep an open mind and try out several models, instead of prematurely dismissing models based on our (perhaps false) assumptions.

# Data

You will use the `wine_quality` dataset. In this dataset, (chemical) observations are recorded for bottles of Portuguese red wine. The target variable `quality` is a grade given by an expert jury on a scale from 1 to 10. We will turn this dataset into a classification task by assigning grades 1-5 to class 0 (bad wine) and grades 6-10 to class 1 (good wine).
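A minimal sketch of this relabeling with pandas, assuming the grades are stored in a column named `quality` as described above. The six grades below are made-up values for illustration, not rows from the actual dataset:

```python
import pandas as pd

# Made-up grades for illustration only; the real wine_quality data is not loaded here.
df = pd.DataFrame({'quality': [3, 5, 6, 7, 5, 8]})

# Grades 1-5 become class 0 (bad wine), grades 6-10 become class 1 (good wine).
df['quality'] = (df['quality'] >= 6).astype(int)

print(df['quality'].tolist())  # [0, 0, 1, 1, 0, 1]
```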

In [2]:
from pipetorch import DFrame
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score, precision_score

# Data

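For the Iris dataset, we will compare four binary classification algorithms on the task of telling Iris-versicolor apart from Iris-virginica, using only petal length and petal width. To rule out differences that are caused by how the data happens to be split, we set up the data once and use the exact same train/test split for all experiments.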
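Within one run of the notebook, all models share a single split. However, the notebook calls `train_test_split` without `random_state`, so unless a global seed is set elsewhere, rerunning it may produce a different split and slightly different scores. A minimal sketch on made-up data of how the `random_state` and `stratify` parameters of `train_test_split` make the split reproducible and keep the class balance equal in both halves; the seed `42` is an arbitrary choice:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 20 samples with 2 features and balanced binary labels.
X = np.arange(40).reshape(20, 2)
y = np.array([0, 1] * 10)

# The same random_state gives the identical split on every call and every rerun.
split_a = train_test_split(X, y, test_size=0.5, random_state=42, stratify=y)
split_b = train_test_split(X, y, test_size=0.5, random_state=42, stratify=y)
for part_a, part_b in zip(split_a, split_b):
    assert np.array_equal(part_a, part_b)

# train_test_split returns (train_X, test_X, train_y, test_y);
# stratify=y keeps 5 samples of each class in the test half.
test_y = split_a[3]
print(np.bincount(test_y))  # [5 5]
```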

In [3]:
df = DFrame.read_from_kaggle('uciml/iris')
df = df.loc[ df.Species != 'Iris-setosa',
             ['PetalLengthCm', 'PetalWidthCm', 'Species']
           ]
df['Species'] = (df.Species == 'Iris-virginica')

Downloading dataset uciml/iris from kaggle to /root/.pipetorchuser/iris


In [4]:
train_X, test_X, train_y, test_y = train_test_split(
        df.drop(columns='Species'),
        df.Species,
        test_size=0.5
    )

In [5]:
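scaler = StandardScaler()
# learn the mean and standard deviation from the training set only...
train_X = scaler.fit_transform(train_X)
# ...and apply that same transformation to the test set, so no test information leaks into training
test_X = scaler.transform(test_X)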

# Refactoring

For a fair comparison between two or more models, it is important that we train and evaluate each model in **exactly** the same way. One way to mess this up is to duplicate (copy-paste) the code that runs an experiment, once for every model. If we change the experiment along the way, we might overlook one of the copies and no longer have a valid comparison.

A general programming guideline is: **Don't Repeat Yourself (DRY)**. Using a technique called **refactoring** (restructuring code without changing what it does), you replace repeated code with (parameterized) functions. Giving these functions and their parameters informative names makes your code shorter and more readable. And since changes are made in only one place, the experiments stay consistent with each other.
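The same idea applies inside an experiment. As a hypothetical sketch (the names `METRICS` and `evaluate` are ours, not part of the notebook), the metrics to report can be kept in one dictionary, so that adding or removing a metric is a one-line change that automatically applies to every model:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# The single place that defines which metrics every experiment reports.
METRICS = {
    'recall': recall_score,
    'precision': precision_score,
    'accuracy': accuracy_score,
}

def evaluate(y_true, y_pred, metrics):
    """Apply every metric function in `metrics` to one set of predictions."""
    return {name: metric(y_true, y_pred) for name, metric in metrics.items()}

# Made-up labels and predictions, for illustration only.
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
scores = evaluate(y_true, y_pred, METRICS)
print(scores)
```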

Therefore, we write a function that runs one experiment: we pass it a model, and it trains and evaluates that model. Another source of errors is the use of global variables, which is common in notebooks you find online. Your comparisons are also invalid when a global variable somehow changes between experiments, and mistakes with global variables are difficult to trace. The easy fix: don't be lazy, pass everything a function needs as a parameter, and never use global variables inside your functions.
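A small illustration of the problem, with made-up numbers and hypothetical functions `classify_with_global` and `classify`: the first reads a global threshold, so the very same call returns a different result after a later cell changes that global; the second receives the threshold as a parameter, so its result depends only on its arguments.

```python
# Hypothetical global setting, as it might appear in an earlier notebook cell.
threshold = 0.5

def classify_with_global(scores):
    # Reads the global `threshold`: the result depends on whatever value it has right now.
    return [int(s >= threshold) for s in scores]

def classify(scores, threshold):
    # Everything the function needs is passed in explicitly.
    return [int(s >= threshold) for s in scores]

scores = [0.3, 0.55, 0.7]

first = classify_with_global(scores)   # uses threshold = 0.5
threshold = 0.6                        # e.g. changed in some later cell
second = classify_with_global(scores)  # same call, different result

print(first, second)                                  # [0, 1, 1] [0, 0, 1]
print(classify(scores, 0.5), classify(scores, 0.6))   # [0, 1, 1] [0, 0, 1]
```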

In [6]:
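def run_experiment(model, train_X, train_y, test_X, test_y):
    # train the model on the training set
    model.fit(train_X, train_y)
    # predict the test set and score those predictions
    predictions = model.predict(test_X)

    recall = recall_score(test_y, predictions)
    precision = precision_score(test_y, predictions)

    # return the model's class name with its scores, so results can be collected in a table
    return model.__class__.__name__, recall, precision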
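A possible variation, sketched under our own assumptions rather than taken from the course material: `fit` modifies the model object in place, so after an experiment the object you passed in is no longer unfitted. Wrapping it in `sklearn.base.clone` trains a fresh, unfitted copy with the same hyperparameters instead. Returning a dictionary also lets `pd.DataFrame(results)` pick up the column names directly. The toy data below is generated, not the Iris data:

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score

def run_experiment(model, train_X, train_y, test_X, test_y):
    # Fit an unfitted copy with the same hyperparameters; the caller's object stays untouched.
    model = clone(model)
    model.fit(train_X, train_y)
    predictions = model.predict(test_X)
    return {
        'model': model.__class__.__name__,
        'recall': recall_score(test_y, predictions),
        'precision': precision_score(test_y, predictions),
    }

# Generated toy data for illustration only.
X, y = make_classification(n_samples=100, n_features=4, random_state=0)
template = LogisticRegression()
result = run_experiment(template, X[:50], y[:50], X[50:], y[50:])
print(result)
print(hasattr(template, 'coef_'))  # False: the template itself was never fitted
```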

# Model and Train

We can then write a loop over two different models, call our `run_experiment` function and collect the results.

In [9]:
# run two experiments, appending the results to a list
results = []
for m in [LogisticRegression(), KNeighborsClassifier(n_neighbors=1)]:
    model_name, recall, precision = run_experiment(m, train_X, train_y, test_X, test_y)
    results.append((model_name, recall, precision))
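Because `run_experiment` labels each result with `model.__class__.__name__`, two configurations of the same class (say, k-nearest neighbors with `n_neighbors=1` and with `n_neighbors=5`) would both appear as `KNeighborsClassifier`. One way around this, sketched on generated data, is to loop over a dictionary that maps a label of your choosing to each model. The helper `score_model` and the labels `knn_k1` and `knn_k5` are hypothetical:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.metrics import precision_score, recall_score
from sklearn.neighbors import KNeighborsClassifier

def score_model(model, train_X, train_y, test_X, test_y):
    # Same steps as run_experiment, but the caller supplies the name.
    model.fit(train_X, train_y)
    predictions = model.predict(test_X)
    return recall_score(test_y, predictions), precision_score(test_y, predictions)

# Generated toy data for illustration only.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
train_X, test_X, train_y, test_y = X[:100], X[100:], y[:100], y[100:]

# The keys distinguish two models of the same class.
models = {
    'knn_k1': KNeighborsClassifier(n_neighbors=1),
    'knn_k5': KNeighborsClassifier(n_neighbors=5),
}

results = []
for name, model in models.items():
    recall, precision = score_model(model, train_X, train_y, test_X, test_y)
    results.append((name, recall, precision))

print(pd.DataFrame(results, columns=['model', 'recall', 'precision']))
```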

# Evaluate

Then we can compare the results.

In [10]:
pd.DataFrame(results, columns=['model', 'recall', 'precision'])

|   | model                | recall | precision |
|---|----------------------|--------|-----------|
| 0 | LogisticRegression   | 1.0    | 0.851852  |
| 1 | KNeighborsClassifier | 1.0    | 0.793103  |
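Recall is 1.0 for both models here, so precision alone decides the ranking. When the two metrics disagree, a common way to summarize them in one number is the F1 score, the harmonic mean $F_1 = 2PR/(P + R)$, which scikit-learn also provides as `f1_score`. A minimal sketch that adds an F1 column to the scores reported above and sorts by it. Keep in mind that these numbers come from a single split with 50 test samples, so small differences may not be meaningful:

```python
import pandas as pd

# Recall and precision copied from the results table above.
results = pd.DataFrame(
    [('LogisticRegression', 1.0, 0.851852),
     ('KNeighborsClassifier', 1.0, 0.793103)],
    columns=['model', 'recall', 'precision'],
)

# F1 is the harmonic mean of precision and recall.
results['f1'] = 2 * results.precision * results.recall / (results.precision + results.recall)

print(results.sort_values('f1', ascending=False).round(3))
```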


# Exercise

Add a RandomForestClassifier and an SVC to the comparison:

In [11]:
# run four experiments, one for each model
results = []
for m in [LogisticRegression(), KNeighborsClassifier(n_neighbors=1), RandomForestClassifier(), SVC()]:
    model_name, recall, precision = run_experiment(m, train_X, train_y, test_X, test_y)
    results.append((model_name, recall, precision))

pd.DataFrame(results, columns=['model', 'recall', 'precision'])
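Unlike the other three models as configured here, `RandomForestClassifier` is itself randomized (each tree is grown on a bootstrap sample with random feature choices), so its row in the table can change between runs even on a fixed split. A minimal sketch on generated data showing that passing the same `random_state` gives identical predictions; the helper `fit_forest`, the seed, and `n_estimators=50` are arbitrary choices for illustration:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Generated toy data for illustration only; not the Iris data from the notebook.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

def fit_forest(train_X, train_y, seed):
    # random_state fixes the bootstrap samples and random feature choices of every tree.
    model = RandomForestClassifier(n_estimators=50, random_state=seed)
    return model.fit(train_X, train_y)

pred_a = fit_forest(X[:100], y[:100], seed=0).predict(X[100:])
pred_b = fit_forest(X[:100], y[:100], seed=0).predict(X[100:])
print(np.array_equal(pred_a, pred_b))  # True
```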

|   | model                  | recall | precision |
|---|------------------------|--------|-----------|
| 0 | LogisticRegression     | 1.0    | 0.851852  |
| 1 | KNeighborsClassifier   | 1.0    | 0.793103  |
| 2 | RandomForestClassifier | 1.0    | 0.821429  |
| 3 | SVC                    | 1.0    | 0.884615  |


In [None]:
halt_notebook()