# KNN Distance Matrix

Here, we run a KNN classifier using a distance matrix. The classifier is run on multiple randomized training and testing sets; the number of iterations is set with the _nBootstrapping_ parameter of the function ``KNN_Bootstrapping``. The classifier can try multiple values of k, and the results for the k that gave the highest classification score (F1) are returned. The values of k to try are passed to the same function, for example:
``kNeighbours=np.int_(np.linspace(3, 12, 10))``
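
The k selection described above can be sketched as follows. This is a minimal illustration, not the code in ``KNN_distanceMatrix``: the helper names (`knn_predict`, `macro_f1`, `best_k`) are hypothetical, the toy data is made up, and macro-averaged F1 is an assumption, since the averaging used by the notebook's function is not stated here.

```python
import numpy as np

def knn_predict(dists, train_idx, test_idx, labels, k):
    """Majority vote among the k nearest training samples, read from a distance matrix."""
    train_idx = np.asarray(train_idx)
    train_labels = np.asarray(labels)[train_idx]
    preds = []
    for i in test_idx:
        # Distances from test sample i to every training sample
        d = dists[i, train_idx]
        nearest = np.argsort(d, kind="stable")[:k]
        values, counts = np.unique(train_labels[nearest], return_counts=True)
        preds.append(values[np.argmax(counts)])  # ties go to the smallest label
    return np.array(preds)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (assumed averaging scheme)."""
    scores = []
    for c in np.unique(y_true):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(scores))

def best_k(dists, train_idx, test_idx, labels, k_values):
    """Score every candidate k and return the one with the highest F1."""
    labels = np.asarray(labels)
    y_true = labels[np.asarray(test_idx)]
    scores = {
        int(k): macro_f1(y_true, knn_predict(dists, train_idx, test_idx, labels, int(k)))
        for k in k_values
    }
    top = max(scores, key=scores.get)  # first k wins when scores are tied
    return top, scores

# Toy data: two well-separated groups of points on a line
points = np.concatenate([np.arange(10) * 0.1, 10 + np.arange(10) * 0.1])
labels = np.array([1] * 10 + [2] * 10)
dists = np.abs(points[:, None] - points[None, :])  # pairwise distance matrix

train_idx = [0, 1, 2, 3, 4, 10, 11, 12, 13, 14]
test_idx = [5, 6, 7, 8, 9, 15, 16, 17, 18, 19]
k_candidates = np.int_(np.linspace(3, 5, 3))  # 3, 4, 5

top_k, f1_by_k = best_k(dists, train_idx, test_idx, labels, k_candidates)
print(top_k, f1_by_k)
```

Note that ``np.int_(np.linspace(3, 12, 10))`` yields the integers 3 through 12, the same values as ``np.arange(3, 13)``.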

#### Imports

In [1]:
import numpy as np
import pandas as pd
from KNN_distanceMatrix import *
import os
import tqdm

#### Data

In [2]:
# THINGS TO EDIT:
dfMeta = pd.read_csv("pots\\Code\\leavesIndex.csv")
dfDists = pd.read_csv("pots\\Code\\Leaves_Scaled_L2_Dists.csv")
classType = int  # NOTE: if your classes are integers, e.g. Species 1-10, write int; if they are strings, write str

############################################
classes = np.unique(list(dfMeta['Class']))

#### KNN Bootstrapping

In [16]:
results = KNN_Bootstrapping(
    dfMeta,
    dfDists,
    classes,
    nBootstrapping=3,
    classType=classType,
    trainingSetSize=15,
    trainingProportion=0.35
)
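
How ``KNN_Bootstrapping`` draws its randomized training sets, and how _trainingSetSize_ and _trainingProportion_ interact, is not shown in this notebook. As a rough sketch of one way a randomized split per iteration could work, the hypothetical helper below (`draw_training_set`) picks a fixed number of samples per class at random and leaves the rest for testing. The fixed per-class count, the toy metadata, and the seed are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def draw_training_set(df_meta, n_per_class, rng):
    """Randomly pick n_per_class names from each class; the remaining names form the test set."""
    training = []
    for _, group in df_meta.groupby("Class"):
        names = group["Name"].to_numpy()
        n = min(n_per_class, len(names))  # never ask for more samples than the class has
        training.extend(rng.choice(names, size=n, replace=False).tolist())
    training_set = set(training)
    testing = [nm for nm in df_meta["Name"] if nm not in training_set]
    return training, testing

# Toy metadata with the same two columns used in this notebook ('Name', 'Class')
toy_meta = pd.DataFrame({
    "Name": [f"l{c}nr{i:03d}" for c in (1, 2, 3) for i in range(1, 7)],
    "Class": [c for c in (1, 2, 3) for _ in range(6)],
})

rng = np.random.default_rng(42)  # fixed seed so the splits can be reproduced
splits = [draw_training_set(toy_meta, n_per_class=2, rng=rng) for _ in range(3)]

for training, testing in splits:
    print(len(training), "training /", len(testing), "testing")
```

Sampling within each class keeps every class represented in the training set, which matters when some classes have few samples.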

#### Saving Results

In [8]:
# Save the classification results from the run that produced the highest F1 score in the KNN bootstrapping:
results['topClassification'].to_csv('ClassificationTest.csv',index=False)

In [19]:
# Save the F1 scores from the KNN bootstrapping:
f1Scores = pd.DataFrame(results['allF1Scores'])
f1Scores.to_csv('allF1Scores_Test.csv', index=False)
f1Scores  # display the scores as the cell output

Unnamed: 0,0,1,2,3,4
0,0.825791,0.798759,0.762409,0.784465,0.775548


In [20]:
# Save the randomized training sets from the KNN bootstrapping, in case you need to recreate the results:
pd.DataFrame(results['allTrainingSamples']).T.to_csv('allTrainingSamples.csv',index=False)
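
To recreate results later, the saved training sets need to be read back into per-iteration lists. The sketch below shows one way to do that round trip. It assumes that ``results['allTrainingSamples']`` is a list with one list of sample names per iteration, so that each CSV column holds one iteration; the toy names and the temporary file path are made up for illustration.

```python
import os
import tempfile

import pandas as pd

# Toy stand-in for results['allTrainingSamples']: one list of names per iteration
all_training = [["a1", "a2", "b1"], ["a2", "a3"]]

# Save exactly as in the cell above: one column per iteration after the transpose
path = os.path.join(tempfile.mkdtemp(), "allTrainingSamples.csv")
pd.DataFrame(all_training).T.to_csv(path, index=False)

# Read back; shorter iterations are padded with NaN, so drop those before use
saved = pd.read_csv(path)
recovered = [saved[col].dropna().tolist() for col in saved.columns]

print(recovered)
```

After ``read_csv`` the column labels are strings (``'0'``, ``'1'``, ...), so iterating over ``saved.columns`` is safer than indexing by integer.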

In [5]:
# Save training and testing samples from the top KNN test:
topTraining = results['topTrainingTesting'][0]
topTesting = results['topTrainingTesting'][1]
all_names = list(dfMeta['Name'])
training_testing = ["train" if nm in topTraining else "test" for nm in all_names]
trainTestData = pd.DataFrame({"Name": all_names, "Train_Test": training_testing})
trainTestData.to_csv('trainingTestingSamples.csv',index=False)

In [7]:
trainTestData[trainTestData['Train_Test']=='train']

Unnamed: 0,Name,Train_Test
1,l10nr002,train
2,l10nr003,train
8,l10nr011,train
16,l10nr039,train
21,l10nr046,train
...,...,...
418,l9nr009,train
423,l9nr015,train
428,l9nr021,train
433,l9nr027,train


In [29]:
# Combine the classification results from all iterations side by side and save them:
all_classification = pd.concat(results['allClassificationResults'], axis=1)
all_classification.to_csv('allClassificationResults.csv', index=False)