# 12. Investigating why KNN generates the same results regardless of the model

For certain datasets, such as `birds`, KNN produces identical results regardless of the multilabel classifier being used. This Jupyter notebook investigates the issue.

## 12.1. Setup

In [1]:
import os
import sys

module_path = os.path.abspath(os.path.join('..'))
if module_path not in sys.path:
    sys.path.append(module_path)

In [33]:
from skmultilearn.dataset import load_dataset
import pandas as pd
import sklearn.metrics as metrics
from sklearn.preprocessing import MinMaxScaler
from sklearn.neighbors import KNeighborsClassifier
from skmultilearn.problem_transform import BinaryRelevance

from lib.base_models import (DependantBinaryRelevance, PatchedClassifierChain,
                             StackedGeneralization)
from lib.classifiers import (ClassifierChainWithFTestOrdering,
                             ClassifierChainWithGeneticAlgorithm,
                             ClassifierChainWithLOP,
                             PartialClassifierChainWithLOP, StackingWithFTests)

from sklearn.ensemble import RandomForestClassifier

from scipy import sparse

## 12.2. Exploring the data

Let's start by exploring the data.

### 12.2.1. Loading the data

In [7]:
full_data = load_dataset("birds", "undivided")
train_data = load_dataset("birds", "train")
test_data = load_dataset("birds", "test")


birds:undivided - exists, not redownloading
birds:train - exists, not redownloading
birds:test - exists, not redownloading


In [8]:
X_train, y_train, _, _ = train_data
X_test, y_test, _, _ = test_data

In [9]:
X_train.todense()[:5,:5]

matrix([[0.016521, 0.039926, 0.089632, 0.134119, 0.17047 ],
        [0.0066  , 0.035984, 0.089956, 0.123214, 0.172273],
        [0.006894, 0.017722, 0.048062, 0.065802, 0.103443],
        [0.031046, 0.127675, 0.221428, 0.272707, 0.358743],
        [0.064721, 0.226644, 0.304482, 0.274662, 0.34698 ]])

In [11]:
df_X_train = pd.DataFrame(X_train.todense())
display(df_X_train.shape)
display(df_X_train.head())
display(df_X_train.describe())

(322, 260)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,250,251,252,253,254,255,256,257,258,259
0,0.016521,0.039926,0.089632,0.134119,0.17047,0.176872,0.171546,0.182392,0.162482,0.159083,...,0.0,13.0,16.384615,20.617394,46.769231,71.863118,788.923077,1761.80218,1.0,2.0
1,0.0066,0.035984,0.089956,0.123214,0.172273,0.177068,0.165507,0.179655,0.161744,0.163678,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0
2,0.006894,0.017722,0.048062,0.065802,0.103443,0.091397,0.084931,0.088666,0.075676,0.074408,...,0.0,2.0,24.0,2.828427,28.0,1.414214,674.0,113.137085,1.0,2.0
3,0.031046,0.127675,0.221428,0.272707,0.358743,0.349389,0.316029,0.330656,0.310752,0.306288,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0
4,0.064721,0.226644,0.304482,0.274662,0.34698,0.334063,0.307223,0.324666,0.29707,0.292258,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0


Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,250,251,252,253,254,255,256,257,258,259
count,322.0,322.0,322.0,322.0,322.0,322.0,322.0,322.0,322.0,322.0,...,322.0,322.0,322.0,322.0,322.0,322.0,322.0,322.0,322.0,322.0
mean,0.074173,0.063212,0.093619,0.108182,0.138203,0.128656,0.119044,0.119205,0.102015,0.101231,...,0.01067,3.475155,21.736089,32.136264,18.195065,22.491003,1141.726623,3233.798286,0.478261,5.36646
std,0.114796,0.122165,0.114913,0.109565,0.119769,0.113204,0.105552,0.103957,0.094116,0.092338,...,0.060898,5.584888,31.435103,66.028183,23.710292,45.592704,2407.629423,9478.450729,0.500305,3.273969
min,0.001333,0.002663,0.005359,0.0077,0.015742,0.01791,0.019795,0.021672,0.018647,0.021929,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,0.009321,0.011322,0.022108,0.031978,0.055613,0.046923,0.044714,0.048128,0.040252,0.040736,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.25
50%,0.025292,0.022394,0.0484,0.065138,0.099753,0.090633,0.080998,0.080968,0.068796,0.065617,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,6.0
75%,0.09277,0.054388,0.121966,0.137353,0.182059,0.175599,0.166207,0.166231,0.133771,0.134478,...,0.0,5.0,36.15,42.463354,32.0,23.385103,1342.770833,2016.496446,1.0,8.0
max,0.850176,1.31813,0.916178,0.58258,0.638798,0.62409,0.602372,0.623863,0.571527,0.551693,...,0.5,36.0,188.0,596.180941,111.538462,251.984335,18638.333333,90578.948552,1.0,11.0


### 12.2.2. Features with big intervals

We can see that some features span a very wide range from min to max. Feature 257, for example, goes from 0 to about 90,579, while features 0 through 9 never exceed about 1.32.

**Hypothesis**: this causes KNN to effectively ignore the labels that are appended as features, because their values (ranging only from 0 to 1) are tiny compared to the other features. Since KNN relies on distances between points, **the distance between two points is dominated by the wide-range features**.
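
The hypothesis can be illustrated with a minimal sketch. The two-column rows below are made-up values (one wide-range feature followed by one binary label used as a feature), not rows from `birds`; they only show how little a label flip moves a Euclidean distance when another feature spans thousands of units.

```python
import math

# Hypothetical rows: [wide-range feature, binary label used as a feature].
query = [1500.0, 1.0]
same_label_far = [1800.0, 1.0]    # same label, wide feature differs by 300
flipped_label_far = [1800.0, 0.0]  # same wide feature value, label flipped
other_label_near = [1510.0, 0.0]  # different label, wide feature differs by 10

d_same = math.dist(query, same_label_far)
d_flipped = math.dist(query, flipped_label_far)
d_other = math.dist(query, other_label_near)

# Flipping the label adds at most 1 to the squared distance,
# so it barely changes the distance itself.
print(d_flipped - d_same)

# The point with the "wrong" label is still by far the closest one.
print(d_other < d_same)
```

In this toy setup the label contributes a few thousandths to a distance of about 300, so it cannot change which neighbors are selected.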

Let's apply a normalization to the data, so that all features have values between 0 and 1.

In [16]:
train_scaler = MinMaxScaler()
train_scaler.fit(X_train.todense())
X_norm_train = train_scaler.transform(X_train.todense())

df_norm_X_train = pd.DataFrame(X_norm_train)
display(df_norm_X_train.shape)
display(df_norm_X_train.head())
display(df_norm_X_train.describe())

(322, 260)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,250,251,252,253,254,255,256,257,258,259
0,0.017893,0.028327,0.092524,0.219905,0.248337,0.262236,0.260482,0.266892,0.260156,0.258896,...,0.0,0.361111,0.087152,0.034582,0.41931,0.285189,0.042328,0.01945,1.0,0.181818
1,0.006205,0.02533,0.09288,0.200936,0.251231,0.262559,0.250116,0.262347,0.258821,0.26757,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.181818
2,0.006551,0.011448,0.046884,0.101068,0.140759,0.12123,0.111807,0.11125,0.103149,0.099061,...,0.0,0.055556,0.12766,0.004744,0.251034,0.005612,0.036162,0.001249,1.0,0.181818
3,0.035004,0.095032,0.237225,0.460978,0.550514,0.546833,0.508489,0.5131,0.528333,0.536765,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.181818
4,0.074676,0.170267,0.328411,0.464379,0.531634,0.52155,0.493373,0.503153,0.503587,0.510282,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.181818


Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,250,251,252,253,254,255,256,257,258,259
count,322.0,322.0,322.0,322.0,322.0,322.0,322.0,322.0,322.0,322.0,...,322.0,322.0,322.0,322.0,322.0,322.0,322.0,322.0,322.0,322.0
mean,0.085811,0.046028,0.096901,0.174788,0.196548,0.182694,0.170362,0.161963,0.150789,0.149693,...,0.02134,0.096532,0.115617,0.053904,0.163128,0.089256,0.061257,0.035701,0.478261,0.48786
std,0.135238,0.092868,0.126164,0.190588,0.192229,0.18675,0.181181,0.172631,0.170228,0.174301,...,0.121795,0.155136,0.167208,0.110752,0.212575,0.180935,0.129176,0.104643,0.500305,0.297634
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,0.00941,0.006582,0.018389,0.042231,0.063993,0.047861,0.042773,0.043934,0.039078,0.0355,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.204545
50%,0.028225,0.014999,0.047256,0.099912,0.134838,0.119969,0.105055,0.098467,0.090704,0.082468,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.545455
75%,0.10772,0.03932,0.128024,0.225531,0.266937,0.260135,0.251318,0.240055,0.208225,0.212452,...,0.0,0.138889,0.192287,0.071226,0.286897,0.092804,0.072044,0.022262,1.0,0.727273
max,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0


In [17]:
test_scaler = MinMaxScaler()
test_scaler.fit(X_test.todense())
X_norm_test = test_scaler.transform(X_test.todense())

df_norm_X_test = pd.DataFrame(X_norm_test)
display(df_norm_X_test.shape)
display(df_norm_X_test.head())
display(df_norm_X_test.describe())

(323, 260)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,250,251,252,253,254,255,256,257,258,259
0,0.169427,0.119099,0.405931,0.532735,0.603135,0.587116,0.572269,0.607537,0.617235,0.605901,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.181818
1,0.129555,0.107622,0.406638,0.500621,0.570366,0.573708,0.579976,0.614983,0.620716,0.607323,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.181818
2,0.004787,0.012634,0.067817,0.099825,0.132754,0.115211,0.111944,0.103125,0.099715,0.092307,...,0.0,0.708333,0.134164,0.087809,0.488784,0.984225,0.194613,0.087097,1.0,0.181818
3,0.022433,0.008429,0.040841,0.058439,0.079501,0.082712,0.080382,0.064052,0.052151,0.048243,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.181818
4,0.007192,0.009875,0.052332,0.064084,0.081095,0.077372,0.07826,0.061142,0.052566,0.046756,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.181818


Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,250,251,252,253,254,255,256,257,258,259
count,323.0,323.0,323.0,323.0,323.0,323.0,323.0,323.0,323.0,323.0,...,323.0,323.0,323.0,323.0,323.0,323.0,323.0,323.0,323.0,323.0
mean,0.082006,0.056813,0.189124,0.214874,0.227654,0.210094,0.200754,0.193137,0.17852,0.178267,...,0.009367,0.135191,0.101243,0.066762,0.094406,0.100478,0.068542,0.031759,0.47678,0.449479
std,0.1313,0.104293,0.230108,0.244094,0.236043,0.223869,0.216624,0.211537,0.208748,0.212759,...,0.068002,0.207031,0.148171,0.141721,0.131754,0.197365,0.142054,0.101176,0.500236,0.297613
min,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,0.008174,0.006533,0.028549,0.038427,0.051637,0.044253,0.038992,0.037729,0.030069,0.029417,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.181818
50%,0.029879,0.015652,0.07897,0.10924,0.132754,0.116464,0.112831,0.116895,0.098248,0.089366,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.454545
75%,0.110474,0.059874,0.268487,0.318377,0.32469,0.289363,0.276364,0.263191,0.242605,0.238897,...,0.0,0.229167,0.161392,0.063922,0.164501,0.129262,0.070653,0.014673,1.0,0.681818
max,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0
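
Note that the cells above fit a separate `MinMaxScaler` on the test split, so the same raw value can map to different normalized values in train and test. A common alternative, sketched below with small made-up arrays (`X_train_demo` and `X_test_demo` are hypothetical stand-ins, not the `birds` data), is to fit the scaler on the training split only and reuse it for the test split. Whether this changes the metrics reported in this notebook has not been checked here.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical stand-ins for the dense train and test feature matrices.
X_train_demo = np.array([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
X_test_demo = np.array([[2.0, 40.0], [12.0, 15.0]])

scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train_demo)  # learn min/max from train only
X_test_scaled = scaler.transform(X_test_demo)        # reuse the train min/max

# Train values land exactly in [0, 1].
print(X_train_scaled.min(axis=0), X_train_scaled.max(axis=0))

# Test values may fall outside [0, 1] when they exceed the train range.
print(X_test_scaled)
```

With this approach a test value above the training maximum is scaled to something greater than 1, which is expected and keeps both splits on the same scale.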


## 12.3. Testing the KNN

### 12.3.1. Testing the KNN with the original data

In [27]:
br_model = BinaryRelevance(
    classifier=KNeighborsClassifier(),
    require_dense=[False, True]
)
br_model.fit(X_train, y_train)
predictions = br_model.predict(X_test)

print("accuracy")
print(metrics.accuracy_score(y_test, predictions))

print("hamming loss")
print(metrics.hamming_loss(y_test, predictions))

print("f1 score")
print(metrics.f1_score(y_test, predictions, average="macro"))

accuracy
0.47368421052631576
hamming loss
0.05719406876323937
f1 score
0.060496581458468944
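
As a reading aid for these three numbers: on multilabel indicator input, scikit-learn's `accuracy_score` computes subset accuracy (a row counts only if every label matches), `hamming_loss` is the fraction of individual label cells that are wrong, and macro F1 averages the per-label F1 scores. The tiny example below uses made-up label matrices (`y_true_demo` and `y_pred_demo` are hypothetical) to show how the three can diverge.

```python
import numpy as np
import sklearn.metrics as metrics

# Hypothetical two-sample, two-label problem.
y_true_demo = np.array([[1, 0],
                        [0, 1]])
y_pred_demo = np.array([[1, 0],
                        [0, 0]])  # label 1 is never predicted

# Subset accuracy: only the first row matches exactly.
subset_acc = metrics.accuracy_score(y_true_demo, y_pred_demo)

# Hamming loss: one wrong cell out of four.
hamming = metrics.hamming_loss(y_true_demo, y_pred_demo)

# Macro F1: label 0 scores 1.0, label 1 scores 0.0 (it is never predicted).
macro_f1 = metrics.f1_score(
    y_true_demo, y_pred_demo, average="macro", zero_division=0
)

print(subset_acc, hamming, macro_f1)
```

A label that is never predicted contributes an F1 of 0 to the macro average, which is one way a low macro F1 can coexist with a low Hamming loss.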


In [28]:
sg_model = StackedGeneralization(
    base_classifier=KNeighborsClassifier(),
)
sg_model.fit(X_train, y_train)
predictions = sg_model.predict(X_test)

print("accuracy")
print(metrics.accuracy_score(y_test, predictions))

print("hamming loss")
print(metrics.hamming_loss(y_test, predictions))

print("f1 score")
print(metrics.f1_score(y_test, predictions, average="macro"))

FIT: X shape is (322, 260)
FIT: X_extended shape is (322, 279)
accuracy
0.47368421052631576
hamming loss
0.05719406876323937
f1 score
0.060496581458468944


In [29]:
dbr_model = DependantBinaryRelevance(
    base_classifier=KNeighborsClassifier(),
)
dbr_model.fit(X_train, y_train)
predictions = dbr_model.predict(X_test)

print("accuracy")
print(metrics.accuracy_score(y_test, predictions))

print("hamming loss")
print(metrics.hamming_loss(y_test, predictions))

print("f1 score")
print(metrics.f1_score(y_test, predictions, average="macro"))

FIT: X shape is (322, 260)
FIT: X_extended shape, for label 0, is (322, 278)
FIT: X_extended shape, for label 1, is (322, 278)
FIT: X_extended shape, for label 2, is (322, 278)
FIT: X_extended shape, for label 3, is (322, 278)
FIT: X_extended shape, for label 4, is (322, 278)
FIT: X_extended shape, for label 5, is (322, 278)
FIT: X_extended shape, for label 6, is (322, 278)
FIT: X_extended shape, for label 7, is (322, 278)
FIT: X_extended shape, for label 8, is (322, 278)
FIT: X_extended shape, for label 9, is (322, 278)
FIT: X_extended shape, for label 10, is (322, 278)
FIT: X_extended shape, for label 11, is (322, 278)
FIT: X_extended shape, for label 12, is (322, 278)
FIT: X_extended shape, for label 13, is (322, 278)
FIT: X_extended shape, for label 14, is (322, 278)
FIT: X_extended shape, for label 15, is (322, 278)
FIT: X_extended shape, for label 16, is (322, 278)
FIT: X_extended shape, for label 17, is (322, 278)
FIT: X_extended shape, for label 18, is (322, 278)
accuracy
0.473

### 12.3.2. Testing the KNN with normalization

In [30]:
br_model = BinaryRelevance(
    classifier=KNeighborsClassifier(),
    require_dense=[False, True]
)
br_model.fit(sparse.csr_matrix(X_norm_train), sparse.csr_matrix(y_train))
predictions = br_model.predict(sparse.csr_matrix(X_norm_test))

print("accuracy")
print(metrics.accuracy_score(y_test, predictions))

print("hamming loss")
print(metrics.hamming_loss(y_test, predictions))

print("f1 score")
print(metrics.f1_score(y_test, predictions, average="macro"))

accuracy
0.5108359133126935
hamming loss
0.04741730487208734
f1 score
0.22682211086927584


In [31]:
sg_model = StackedGeneralization(
    base_classifier=KNeighborsClassifier(),
)
sg_model.fit(sparse.csr_matrix(X_norm_train), sparse.csr_matrix(y_train))
predictions = sg_model.predict(sparse.csr_matrix(X_norm_test))

print("accuracy")
print(metrics.accuracy_score(y_test, predictions))

print("hamming loss")
print(metrics.hamming_loss(y_test, predictions))

print("f1 score")
print(metrics.f1_score(y_test, predictions, average="macro"))

FIT: X shape is (322, 260)
FIT: X_extended shape is (322, 279)
accuracy
0.5077399380804953
hamming loss
0.04709141274238227
f1 score
0.2223864362331855


In [32]:
dbr_model = DependantBinaryRelevance(
    base_classifier=KNeighborsClassifier(),
)
dbr_model.fit(sparse.csr_matrix(X_norm_train), sparse.csr_matrix(y_train))
predictions = dbr_model.predict(sparse.csr_matrix(X_norm_test))

print("accuracy")
print(metrics.accuracy_score(y_test, predictions))

print("hamming loss")
print(metrics.hamming_loss(y_test, predictions))

print("f1 score")
print(metrics.f1_score(y_test, predictions, average="macro"))

FIT: X shape is (322, 260)
FIT: X_extended shape, for label 0, is (322, 278)
FIT: X_extended shape, for label 1, is (322, 278)
FIT: X_extended shape, for label 2, is (322, 278)
FIT: X_extended shape, for label 3, is (322, 278)
FIT: X_extended shape, for label 4, is (322, 278)
FIT: X_extended shape, for label 5, is (322, 278)
FIT: X_extended shape, for label 6, is (322, 278)
FIT: X_extended shape, for label 7, is (322, 278)
FIT: X_extended shape, for label 8, is (322, 278)
FIT: X_extended shape, for label 9, is (322, 278)
FIT: X_extended shape, for label 10, is (322, 278)
FIT: X_extended shape, for label 11, is (322, 278)
FIT: X_extended shape, for label 12, is (322, 278)
FIT: X_extended shape, for label 13, is (322, 278)
FIT: X_extended shape, for label 14, is (322, 278)
FIT: X_extended shape, for label 15, is (322, 278)
FIT: X_extended shape, for label 16, is (322, 278)
FIT: X_extended shape, for label 17, is (322, 278)
FIT: X_extended shape, for label 18, is (322, 278)
accuracy
0.501

### 12.3.3. Results so far

**Good news: the metrics now change**. _Bad news: accuracy and macro F1 get slightly worse when we try to add a bit of label correlation_. It is probably worth running a full metrics pipeline on the normalized data.

The results show clearly that, without any sort of normalization, the metrics end up being the same across models. Since there is no randomness in the KNN algorithm, the only way to change the results is to change the features. When the label features are appended to the unnormalized dataset, **no changes are observed**. This strongly suggests that the label features are too small (in numeric value) compared to the other features.

When using the normalized data, the metrics change. This is a good sign, as it means that the label features are now being considered and are capable of influencing the final result. Unfortunately, the changes are mostly for the worse: going from binary relevance to stacked generalization, accuracy drops from 0.511 to 0.508 and macro F1 from 0.227 to 0.222, while Hamming loss improves only marginally (0.0474 to 0.0471).
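
The argument above can be reproduced on a toy problem. The sketch below uses made-up data (one wide-range feature, one binary label appended as a feature) and a hand-written 1-nearest-neighbor lookup, `nearest`, which is a hypothetical helper and not part of `lib` or scikit-learn. It shows the nearest neighbor staying the same on the raw scale and changing after min-max scaling.

```python
import numpy as np

def nearest(train, query):
    """Index of the training row closest to `query` (Euclidean, k=1)."""
    return int(np.argmin(np.linalg.norm(train - query, axis=1)))

# Hypothetical data: one wide-range feature and one binary label per row.
feature = np.array([[0.0], [1000.0], [2000.0], [3000.0]])
labels = np.array([[1.0], [0.0], [1.0], [0.0]])
query_feature = np.array([1100.0])
query_label = np.array([1.0])

# Raw scale: appending the label does not change the nearest neighbor.
raw_plain = nearest(feature, query_feature)
raw_extended = nearest(
    np.hstack([feature, labels]), np.hstack([query_feature, query_label])
)

# Min-max scaling using the training min/max.
lo, hi = feature.min(), feature.max()
scaled = (feature - lo) / (hi - lo)
scaled_query = (query_feature - lo) / (hi - lo)

# Scaled: the label now carries enough weight to change the neighbor.
scaled_plain = nearest(scaled, scaled_query)
scaled_extended = nearest(
    np.hstack([scaled, labels]), np.hstack([scaled_query, query_label])
)

print(raw_plain, raw_extended)        # 1 1
print(scaled_plain, scaled_extended)  # 1 2
```

On the raw scale the extended and plain feature sets pick the same neighbor, which mirrors the identical metrics in 12.3.1. After scaling, the label feature is on the same footing as the others and the chosen neighbor changes, for better or worse.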

## 12.4. Testing the `RandomForestClassifier`

For comparison, let's test the `RandomForestClassifier` with the same data.