<a href="https://colab.research.google.com/github/Tyred/TimeSeries_OCC-PUL/blob/main/Notebooks/OC_JKNN.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<h1> One-Class J-K Nearest Neighbor Classifier</h1>
The main idea is to fit a nearest-neighbour model with data from the positive class only and then perform one-class classification (OCC) as follows [1]:

- For each data sample in the test dataset, do:
    - Compute the distances to its J nearest neighbours in the training set and take their average, D_j.
    - For each of those J neighbours, compute the average distance to its own K nearest neighbours in the training set, then average these J values to get D_k.
    - If D_j / D_k <= T, the test sample is classified as a member of the positive class.
    - Otherwise, the test sample is classified as not a member of the positive class.

- Evaluate the model's accuracy, precision, recall and F1-score.
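
The steps above can be sketched for a single test sample in plain NumPy. This is only an illustration of the arithmetic: the function name `jknn_accepts` and the toy 1-D training values are assumptions made for this example, not part of the notebook.

```python
import numpy as np

def jknn_accepts(train, x, j=1, k=1, threshold=1.0):
    """Return True if x is accepted as positive under the D_j / D_k <= T rule."""
    # Distances from the test sample to every training sample
    d_test = np.linalg.norm(train - x, axis=1)
    j_idx = np.argsort(d_test)[:j]           # indices of the J nearest neighbours
    d_j = d_test[j_idx].mean()

    # Distances from each selected neighbour to all training samples
    d_train = np.linalg.norm(train[j_idx, None, :] - train[None, :, :], axis=2)
    # Sort each row; column 0 is the zero self-distance, so it is skipped
    d_k = np.sort(d_train, axis=1)[:, 1:k + 1].mean()

    # Same test as D_j / D_k <= T, written without a division
    return bool(d_j <= threshold * d_k)

# Hypothetical positive-class training set (illustrative values only)
train = np.array([[0.0], [1.0], [2.0], [3.0]])

near = jknn_accepts(train, np.array([1.4]))   # D_j = 0.4, D_k = 1.0
far = jknn_accepts(train, np.array([10.0]))   # D_j = 7.0, D_k = 1.0
print(near, far)
```

With J = K = T = 1, the sample at 1.4 lies closer to its nearest neighbour than that neighbour lies to its own nearest neighbour, so it is accepted; the sample at 10.0 is rejected.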

We have three hyperparameters: K, J and T. The classifier's performance can vary considerably with different values of these hyperparameters. Initially we will use the value 1 for each hyperparameter just as a proof of concept (the run in the Results section uses J = 2, K = 5 and T = 1.5). Later we will develop a parameter optimization technique.
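
To give a feel for how sensitive the rule is to J, K and T, the sketch below computes the ratio D_j / D_k once per (J, K) pair and then applies several thresholds to it. It is not the parameter optimization technique mentioned above. The synthetic Gaussian data, the held-out split of positive samples, the helper name `jknn_ratios` and the grid of values are all assumptions made for illustration.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def jknn_ratios(train, samples, j, k):
    """D_j / D_k for each sample; the threshold T is compared against this ratio."""
    nbrs = NearestNeighbors(n_neighbors=k + 1).fit(train)
    # Mean distance of each training sample to its k nearest neighbours (self excluded)
    train_dist, _ = nbrs.kneighbors(train)
    d_k_per_train = train_dist[:, 1:].mean(axis=1)
    # Mean distance of each query sample to its j nearest training samples
    dist, idx = nbrs.kneighbors(samples, n_neighbors=j)
    return dist.mean(axis=1) / d_k_per_train[idx].mean(axis=1)

# Synthetic positive class (assumption: stands in for a real dataset)
rng = np.random.default_rng(0)
positives = rng.normal(size=(200, 2))
train, held_out = positives[:150], positives[150:]

thresholds = [0.5, 1.0, 1.5, 2.0]
rates = {}
for j, k in [(1, 1), (2, 5)]:
    ratios = jknn_ratios(train, held_out, j, k)
    # Fraction of held-out positive samples accepted at each threshold
    rates[(j, k)] = [float(np.mean(ratios <= t)) for t in thresholds]
    print(f"J={j}, K={k}:", rates[(j, k)])
```

Because T only enters through the comparison with the ratio, the acceptance rate for a fixed (J, K) can never decrease as T grows; only the neighbour searches need to be repeated when J or K changes.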

[1] [Relationship between Variants of One-Class Nearest Neighbours and Creating their Accurate Ensembles](https://arxiv.org/abs/1604.01686)

# Imports

In [131]:
import numpy as np
import matplotlib.pyplot as plt

from sklearn.neighbors import NearestNeighbors
from sklearn.metrics import precision_score, accuracy_score, recall_score, f1_score

from sklearn.decomposition import PCA
from fastdtw import fastdtw
from scipy.spatial.distance import euclidean

# OneClassJKNN Class

In [132]:
class OneClassJKNN():
    """One-class J-K nearest neighbour classifier trained on positive samples only."""

    def __init__(self, j, k): # j, k >= 1
        self.j = j
        self.k = k

        # k+1 neighbours: each training sample is returned as its own nearest neighbour
        self.nbrs = NearestNeighbors(n_neighbors=self.k+1)

    def fit(self, train_data):
        self.nbrs.fit(train_data)
        distances, indices = self.nbrs.kneighbors(train_data)

        # Average distance from each training sample to its k nearest neighbours,
        # skipping column 0 (the zero distance to itself)
        self.distances_avg = np.mean(distances[:, 1:], axis=1)

        return self

    def predict(self, test_samples, threshold):
        predictions = np.zeros(len(test_samples))
        for sample_no, test_sample in enumerate(test_samples):
            # D_j: average distance from the test sample to its j nearest neighbours
            distances, indices = self.nbrs.kneighbors(test_sample.reshape(1,-1), self.j)
            jnbrs_avg = np.mean(distances)

            # D_k: average, over those j neighbours, of their precomputed k-NN distances
            j_knbrs_avg = np.mean(self.distances_avg[indices[0]])

            # Same test as D_j / D_k <= T, written without dividing by D_k
            if jnbrs_avg <= (j_knbrs_avg * threshold):
                predictions[sample_no] = 1
            else:
                predictions[sample_no] = -1

        return predictions
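
As a possible alternative to looping over test samples one at a time, the same decision rule can be applied to the whole test set with a single `kneighbors` call. The sketch below is an assumption about how that could look, not the notebook's implementation; the function name `jknn_predict` and the small grid-shaped training set are made up for the example.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def jknn_predict(train, test, j, k, threshold):
    """Batch version of the J-K rule: returns 1 (positive) or -1 per test row."""
    nbrs = NearestNeighbors(n_neighbors=k + 1).fit(train)

    # Mean distance of every training sample to its k nearest neighbours (self excluded)
    train_dist, _ = nbrs.kneighbors(train)
    d_k_per_train = train_dist[:, 1:].mean(axis=1)

    # One query for all test rows: distances and indices have shape (n_test, j)
    test_dist, test_idx = nbrs.kneighbors(test, n_neighbors=j)
    d_j = test_dist.mean(axis=1)
    d_k = d_k_per_train[test_idx].mean(axis=1)

    return np.where(d_j <= threshold * d_k, 1, -1)

# Hypothetical positive class: a 5 x 5 unit grid in the plane
train = np.array([[x, y] for x in range(5) for y in range(5)], dtype=float)
# One point inside the grid and one far outside it
test = np.array([[2.2, 2.1], [20.0, 20.0]])

preds = jknn_predict(train, test, j=2, k=2, threshold=1.5)
print(preds)
```

Every grid point has its two nearest neighbours at distance 1, so D_k is 1 for both test points; the inner point has a D_j of roughly 0.5 and is accepted, while the distant point is rejected.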

# Reports function

In [133]:
def print_stats(predictions, labels):
    print("Accuracy = %.2f"  % (accuracy_score(labels, predictions)  *100) + "%")
    print("Precision = %.2f" % (precision_score(labels, predictions) *100) + "%")
    print("Recall = %.2f"    % (recall_score(labels, predictions)    *100) + "%")
    print("F1-Score = %.2f"  % (f1_score(labels, predictions)        *100) + "%")
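
The metric functions are called here with labels in {1, -1}. With their default settings (`pos_label=1`, binary averaging), scikit-learn treats 1 as the positive class, so precision and recall refer to the positive class of the OCC task. A small hand-checkable example, with label vectors invented for illustration:

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Made-up labels: 3 positives, 2 negatives
y_true = [1, 1, 1, -1, -1]
y_pred = [1, 1, -1, 1, -1]
# Counts for the positive class (label 1): TP = 2, FN = 1, FP = 1, TN = 1

accuracy = accuracy_score(y_true, y_pred)     # (TP + TN) / 5 = 3/5
precision = precision_score(y_true, y_pred)   # TP / (TP + FP) = 2/3
recall = recall_score(y_true, y_pred)         # TP / (TP + FN) = 2/3
f1 = f1_score(y_true, y_pred)                 # harmonic mean of 2/3 and 2/3 = 2/3

print(accuracy, precision, recall, f1)
```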

# Reading Data from Google Drive

In [134]:
path = 'drive/My Drive/UFSCar/FAPESP/IC/Data/UCRArchive_2018'

dataset = input('Dataset: ')
tr_data = np.genfromtxt(path + "/" + dataset + "/" + dataset + "_TRAIN.tsv", delimiter="\t",)
te_data = np.genfromtxt(path + "/" + dataset + "/" + dataset + "_TEST.tsv", delimiter="\t",)

labels = te_data[:, 0]
print("Labels:", np.unique(labels))

Dataset: Yoga
Labels: [1. 2.]


# Choosing the Positive Class label
This is necessary in order to emulate the One-Class Classification scenario.

In [135]:
class_label = int(input('Positive class label: '))

train_data  = tr_data[tr_data[:, 0] == class_label, 1:] # train
test_data   = te_data[:, 1:]                            # test

print("Train data shape:", train_data.shape)
print("Test data shape:", test_data.shape)

Positive class label: 2
Train data shape: (163, 426)
Test data shape: (3000, 426)


# Labeling for OCC Task

In [136]:
occ_labels = [1 if x == class_label else -1 for x in labels]
print("Positive samples:", occ_labels.count(1))
print("Negative samples:", occ_labels.count(-1))

Positive samples: 1607
Negative samples: 1393


# Results

In [150]:
j = 2
k = 5
threshold = 1.5

In [151]:
clf = OneClassJKNN(j, k).fit(train_data)

result_labels = clf.predict(test_data, threshold)

print_stats(result_labels, occ_labels)

Accuracy = 61.97%
Precision = 59.19%
Recall = 93.34%
F1-Score = 72.45%
