## Exercises: classifiers and evaluation

These exercises follow the general setup of [this tutorial in the R language](https://compgenomr.github.io/book/model-tuning-and-avoiding-overfitting.html).

In this exercise session you will go through some concepts related to classifiers and evaluation in Python. Parts of the code are already implemented; you need to fill in the remaining parts.

Your code should produce figures similar to the ones in the tutorial, but because the data you use is different, you will not get the same values.

If you want to see whether you get the same plots for the tutorial's data, check earlier chapters of the tutorial.

## Load all the things

In [None]:
# Packages
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


# Clone repository with example images
!rm -rf fyp2022-imaging
!git clone https://github.com/vcheplygina/fyp2022-imaging.git


# Load features and labels
file_data = 'fyp2022-imaging/data/example_ground_truth.csv'
file_features = 'fyp2022-imaging/features/features.csv'

df = pd.read_csv(file_data)
features = pd.read_csv(file_features)


# Combine variables we want in one place
df = df.drop(['image_id','seborrheic_keratosis'],axis=1)
df['area'] = features['area']
df['perimeter'] = features['perimeter']

# Please remember that area and perimeter alone are often not sufficient for classification.
# When doing your project, you could also try the other features here.

print(df.head())

Cloning into 'fyp2022-imaging'...
remote: Enumerating objects: 325, done.
remote: Total 325 (delta 0), reused 0 (delta 0), pack-reused 325 (from 1)
Receiving objects: 100% (325/325), 825.97 MiB | 26.69 MiB/s, done.
Resolving deltas: 100% (94/94), done.
Updating files: 100% (309/309), done.
   melanoma      area  perimeter
0       0.0  216160.0     2013.0
1       0.0  130493.0     1372.0
2       0.0  205116.0     1720.0
3       0.0  161705.0     1344.0
4       0.0  317040.0     2063.0


In [None]:
# Prepare development (train and validation) and test splits
from sklearn.model_selection import train_test_split

x = df[['area','perimeter']]
y = df[['melanoma']]

dev_x, test_x, dev_y, test_y = train_test_split(
        x, y, stratify=y, random_state=0)

train_x, val_x, train_y, val_y = train_test_split(
        dev_x, dev_y, stratify=dev_y)


## TODOs for students

# To reduce computation, you may want to reduce the size of the selected data using the train_size and test_size parameters.
# However, reducing the size of the dataset will lead to more variability in your results.

# Think back to the exercises from the last lecture: is there anything else we should do with the data? Do it here.



In [None]:
# Import classifier and metric
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import roc_auc_score


# Example of training a classifier
knn1 = KNeighborsClassifier(n_neighbors=1) # other hyperparameters possible
knn1_trained = knn1.fit(train_x, np.ravel(train_y))

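# Example of prediction
# Note: predict() returns hard class labels. For k > 1, consider passing
# predict_proba(val_x)[:, 1] to roc_auc_score so that the AUC is based on class scores.
val_pred_knn1 = knn1_trained.predict(val_x)
val_auc_knn1 = roc_auc_score(val_y, val_pred_knn1)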

print(val_auc_knn1)


# TODOs for students:

# 1) Try out different values of k using a for loop

# 2) Plot the training and validation AUC in the same figure.
# You can also plot the accuracy metric in a separate figure.
# Remember to add proper axis labels and a legend to the plots (Fig 5.5 in tutorial)

# Q: Do you see the values you would expect?



0.7347826086956522
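
A minimal sketch of the loop over k described in the TODOs above. It runs on synthetic data from `make_classification` (an assumption, standing in for the area and perimeter features), and the names `k_values`, `train_auc` and `val_auc` are illustrative. Unlike the example cell, it scores with `predict_proba` rather than `predict`, so the AUC is computed from class scores.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (assumption): two features and a binary label
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, stratify=y, random_state=0)

k_values = [1, 3, 5, 9, 15, 25]
train_auc, val_auc = [], []
for k in k_values:
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    # Column 1 of predict_proba holds the score for the positive class
    train_auc.append(roc_auc_score(y_train, knn.predict_proba(X_train)[:, 1]))
    val_auc.append(roc_auc_score(y_val, knn.predict_proba(X_val)[:, 1]))

for k, tr, va in zip(k_values, train_auc, val_auc):
    print(f"k={k:2d}  train AUC={tr:.3f}  validation AUC={va:.3f}")
```

The two lists can be passed to `plt.plot(k_values, ...)` to draw both curves in one figure.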


In [None]:
# Visualize the decision boundary

# TODO for students:

# Use the exercises from previous weeks to visualize the boundary of some appropriate / less appropriate choices for k (Fig 5.7 in tutorial)


In [None]:
# Now let's try with a different classifier
from sklearn.tree import DecisionTreeClassifier

# TODOs for students:

# 1) Investigate the max_depth parameter of the classifier, and repeat the procedure above

# 2) Optionally, look at some other parameters of the classifier.




## Variance of performance depending on the split

Notice the random_state parameter of train_test_split (some classifiers, such as DecisionTreeClassifier, have one as well). Why do we need this?

You can try different values for this parameter and investigate the effect on the results.
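
As a rough illustration of how much a result can depend on the split, the sketch below repeats a stratified split with ten different seeds and reports the spread of the validation AUC. The synthetic data, the fixed choice of k=5 and the name `seed_aucs` are assumptions made for this example only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (assumption)
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)

seed_aucs = []
for seed in range(10):
    # Only the seed of the split changes; the classifier stays the same
    X_tr, X_va, y_tr, y_va = train_test_split(
        X, y, stratify=y, random_state=seed)
    knn = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
    seed_aucs.append(roc_auc_score(y_va, knn.predict_proba(X_va)[:, 1]))

print(f"Validation AUC over 10 splits: "
      f"mean={np.mean(seed_aucs):.3f}, std={np.std(seed_aucs):.3f}")
```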

In [None]:
# TODO for students:

# Choose one of the experiments above, and create the performance vs parameter plot, for two different seeds




1) What kind of strategies could you use to obtain more reliable performance estimates?

ANSWER:


2) What does this tell you about random_state? How is it different from, for example, a tunable k parameter?

ANSWER:

## Cross-validation

In this part we will use cross-validation on the development set to find good values for the k / max_depth parameters.
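
One possible way to do this is with `cross_val_score`, sketched here on synthetic data (an assumption). The shuffled `KFold`, the candidate values of k and the names `cv_means` and `best_k` are choices made for this illustration, not part of the exercise.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for the development set (assumption)
X_dev, y_dev = make_classification(n_samples=200, n_features=2, n_informative=2,
                                   n_redundant=0, random_state=0)

cv = KFold(n_splits=5, shuffle=True, random_state=0)
cv_means = {}
for k in (1, 5, 15):
    # One AUC per fold; each fold is held out once for validation
    scores = cross_val_score(KNeighborsClassifier(n_neighbors=k),
                             X_dev, y_dev, cv=cv, scoring="roc_auc")
    cv_means[k] = scores.mean()
    print(f"k={k:2d}  mean AUC={scores.mean():.3f}  std={scores.std():.3f}")

# Candidate with the highest mean cross-validated AUC
best_k = max(cv_means, key=cv_means.get)
print("Best k by mean cross-validated AUC:", best_k)
```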

In [None]:
from sklearn.model_selection import KFold

# Create the folds
kfold = KFold(n_splits=5, random_state=None, shuffle=False)
kfold.get_n_splits(dev_x, dev_y)

# Check the contents of the folds
for i, (train_index, dev_index) in enumerate(kfold.split(dev_x)):
    print(f"Fold {i}:")
    print(f"  Train: index={train_index}")
    # "Test" here is the held-out fold within the development set
    print(f"  Test:  index={dev_index}")

# TODO for students:

# Investigate other parameters of the KFold


Fold 0:
  Train: index=[ 23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40
  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58
  59  60  61  62  63  64  65  66  67  68  69  70  71  72  73  74  75  76
  77  78  79  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94
  95  96  97  98  99 100 101 102 103 104 105 106 107 108 109 110 111]
  Test:  index=[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22]
Fold 1:
  Train: index=[  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
  18  19  20  21  22  46  47  48  49  50  51  52  53  54  55  56  57  58
  59  60  61  62  63  64  65  66  67  68  69  70  71  72  73  74  75  76
  77  78  79  80  81  82  83  84  85  86  87  88  89  90  91  92  93  94
  95  96  97  98  99 100 101 102 103 104 105 106 107 108 109 110 111]
  Test:  index=[23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45]
Fold 2:
  Train: index=[  0   1   2   3   4   5   6   7   

## Other types of KFold

In many medical imaging applications, we need to use GroupKFold or another splitter from sklearn's model_selection module instead of KFold.

**Question:**  

Why might we need to use GroupKFold? Does it apply to this example data you are using? And does it apply to the PAD-UFES data? Which variable creates the groups?
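
To show only the mechanics of GroupKFold, here is a small sketch with hypothetical patient identifiers (`p1` to `p4`, three images each). It does not answer the question for the course data; it only checks that no group ends up on both sides of a split.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Hypothetical toy data: 12 images from 4 patients, 3 images per patient
groups = np.repeat(["p1", "p2", "p3", "p4"], 3)
X = np.arange(12).reshape(-1, 1)
y = np.array([0, 1] * 6)

gkf = GroupKFold(n_splits=4)
overlaps = []
for fold, (train_idx, test_idx) in enumerate(gkf.split(X, y, groups=groups)):
    # Groups that appear on both sides of the split (should be none)
    shared = set(groups[train_idx]) & set(groups[test_idx])
    overlaps.append(shared)
    print(f"Fold {fold}: held-out groups={sorted(set(groups[test_idx]))}, "
          f"shared groups={sorted(shared)}")
```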



## Effect of noisy features

Uninformative/noisy features will affect the generalization ability of your classifiers.

In this part we add uninformative features (not correlated with the class label) and test some classifiers.
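
A hedged sketch of one such comparison: the cross-validated AUC of a k-NN classifier with and without 20 added noise features, on synthetic data (an assumption). Standardizing the features with `StandardScaler` is a choice made here. Without scaling, features with a very small range contribute little to the distances that k-NN uses.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the two informative features (assumption)
X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)

# 20 uninformative features, generated independently of the labels
noise = np.random.RandomState(42).uniform(0, 0.1, size=(X.shape[0], 20))
X_noisy = np.hstack((X, noise))

# Scaling puts informative and noisy features on comparable ranges
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
auc_clean = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
auc_noisy = cross_val_score(model, X_noisy, y, cv=5, scoring="roc_auc").mean()

print(f"Mean AUC, informative features only: {auc_clean:.3f}")
print(f"Mean AUC, with 20 noisy features:    {auc_noisy:.3f}")
```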

In [None]:
# Generate some noisy features
n_noisy_features = 20
noise = np.random.RandomState(42).uniform(0, 0.1, size=(df.shape[0], n_noisy_features))

# Add the noisy data to the informative features
x_noisy = np.hstack((df[['area', 'perimeter']], noise))


# TODO for students
# Investigate the behavior of classifiers with regard to overfitting

## Combining everything and final evaluation

The examples above give you an idea of how you can investigate the quality of different classifiers and parameters. To keep track of the results (on the development set) of the options you try, you might want to make a "result dataset" where you record all the classifier versions and the corresponding results. MLflow also allows doing this. We do not cover it in the course, but perhaps the TAs can show you.

Once you have evaluated all the options, you can select some of the better ones and retrain them. For retraining, you can now use the entire development set.

Your choice of these options needs to be reproducible in your project, i.e., you should not try out some classifiers/parameters and then completely remove them from your project.

You can evaluate the retrained classifiers on the test set, which has been held out so far. The test performance of each classifier might differ from its performance on the validation set.

You are NOT allowed to edit the classifiers after this point.
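
The workflow described above might be sketched as follows, again on synthetic data (an assumption). The candidate list, the 5-fold cross-validation on the development set, and the names `candidates`, `results` and `test_auc` are illustrative choices. Your project may track different classifiers and metrics.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data, split into development and test sets (assumption)
X, y = make_classification(n_samples=300, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)
X_dev, X_test, y_dev, y_test = train_test_split(
    X, y, stratify=y, random_state=0)

# Every option that is tried stays in this dictionary, so the choice is traceable
candidates = {f"knn_k={k}": KNeighborsClassifier(n_neighbors=k)
              for k in (1, 5, 15)}
candidates.update({f"tree_depth={d}": DecisionTreeClassifier(max_depth=d, random_state=0)
                   for d in (1, 3, 10)})

# "Result dataset": one row per candidate, evaluated on the development set only
rows = []
for name, clf in candidates.items():
    scores = cross_val_score(clf, X_dev, y_dev, cv=5, scoring="roc_auc")
    rows.append({"model": name,
                 "cv_auc_mean": scores.mean(),
                 "cv_auc_std": scores.std()})
results = pd.DataFrame(rows).sort_values("cv_auc_mean", ascending=False)
print(results)

# Retrain the top 3 on the full development set, then evaluate once on the test set
test_auc = {}
for name in results.head(3)["model"]:
    clf = candidates[name].fit(X_dev, y_dev)
    test_auc[name] = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])
print(test_auc)
```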


In [None]:
# TODO for students

# Keep track of different classifiers/parameters with an array

# Select some better ones (e.g. top 3)

# Re-train

# Evaluate on held-out set

In [None]:
# After this point you are not allowed to adjust the classifier, otherwise you are overfitting!