In this notebook, you will see how an unsupervised anomaly detection algorithm can be imported into the `oab` framework and evaluated.
After making `oab` importable, we look at what such an algorithm can look like and how its performance is evaluated.

In [1]:
import sys
sys.path.append('../..')

%load_ext autoreload
%autoreload 2

In [2]:
# download example algorithm and inspect content
import wget
wget.download('https://raw.githubusercontent.com/jandeller/test/main/RandomGuesser.py', "RandomGuesser.py")
!cat RandomGuesser.py

import numpy as np

class RandomGuesser():

    def fit(self, X):
        "Assign a random number to each sample"
        n_samples = X.shape[0]
        self.decision_scores_ = np.random.randn(n_samples)


The sample `RandomGuesser` algorithm shown here is, as the name suggests, a random guesser: it assigns a random anomaly score to each sample.

An algorithm used for unsupervised anomaly detection must define a `fit(X)` method that assigns values to `self.decision_scores_`. The length of `self.decision_scores_` must equal the number of samples in `X`.
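To make this convention concrete, here is a minimal sketch of a non-random detector with the same interface. The class name `MeanDistanceDetector` and its scoring rule are hypothetical and not part of `oab`; the sketch also assumes that a higher score means "more anomalous", as in PyOD.

```python
import numpy as np


class MeanDistanceDetector:
    """Hypothetical detector: scores each sample by its distance from the feature-wise mean."""

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        center = X.mean(axis=0)
        # One score per sample; a larger distance is treated as more anomalous (assumption).
        self.decision_scores_ = np.linalg.norm(X - center, axis=1)
        return self


# Toy data: three samples near the origin and one far away.
X = np.array([[0.0, 0.0], [0.1, -0.1], [-0.1, 0.1], [5.0, 5.0]])
detector = MeanDistanceDetector().fit(X)
scores = detector.decision_scores_
```

After `fit`, `scores` holds one value per row of `X`, which is all the evaluation loop in this notebook relies on.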

It is of course possible to rename the method and the field, add a method for accessing the anomaly scores, etc. If you do so, the evaluation code in this notebook has to be changed accordingly.
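If an algorithm already uses different names, one alternative to changing the evaluation code is a thin adapter that maps its interface onto `fit(X)` and `decision_scores_`. The sketch below is only an illustration: `CustomDetector`, `train`, `anomaly_scores`, and `ConventionAdapter` are hypothetical names invented for this example.

```python
import numpy as np


class CustomDetector:
    """Hypothetical algorithm with its own naming: train() and anomaly_scores()."""

    def train(self, data):
        data = np.asarray(data, dtype=float)
        # Score: summed absolute deviation from the feature-wise median.
        self._scores = np.abs(data - np.median(data, axis=0)).sum(axis=1)

    def anomaly_scores(self):
        return self._scores


class ConventionAdapter:
    """Wraps a detector so that it exposes fit(X) and decision_scores_."""

    def __init__(self, detector):
        self.detector = detector

    def fit(self, X):
        self.detector.train(X)
        self.decision_scores_ = self.detector.anomaly_scores()
        return self


X = np.array([[1.0, 2.0], [1.1, 2.1], [0.9, 1.9], [8.0, 9.0]])
adapted = ConventionAdapter(CustomDetector()).fit(X)
```

The wrapped object can then be used wherever `RandomGuesser` is used in this notebook, since only `fit` and `decision_scores_` are accessed.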

Adhering to the conventions described above has the advantage that algorithms from [PyOD](https://pyod.readthedocs.io/en/latest/) can be used without modification, as shown when [comparing algorithms using `oab`](https://colab.research.google.com/drive/1aV_itaYCJgzdZ1lQ7SUyHQ7z01xSPxDN?usp=sharing#scrollTo=QnAfCGTGL7xv).

In [3]:
# import objects/functions from oab
from oab.data.load_dataset import load_dataset
from oab.evaluation import EvaluationObject

# and import the RandomGuesser
from RandomGuesser import RandomGuesser

In [4]:
# load dataset
wilt = load_dataset('wilt')

Credits: Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.


In [5]:
# evaluate the random guesser
eval_obj = EvaluationObject("Random")

for (x, y), settings in wilt.sample_multiple(n=50, n_steps=5, contamination_rate=0.1):
    rg = RandomGuesser()
    rg.fit(x)  # fit RandomGuesser to the sampled data
    pred = rg.decision_scores_  # read the anomaly scores assigned by fit
    eval_obj.add(y, pred, settings)
_ = eval_obj.evaluate()

Evaluation on dataset wilt with normal labels ['n'] and anomaly labels ['w'].
Total of 5 datasets. Per dataset:
50 instances, contamination_rate 0.1.
Mean 	 Std_dev 	 Metric
0.543 	 0.080 		 roc_auc
0.187 	 0.076 		 average_precision
0.096 	 0.084 		 adjusted_average_precision


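As one would expect, the results are no better than random: the mean `roc_auc` of 0.543 lies within one standard deviation of 0.5, the value expected for random scores.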
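The chance level for ROC AUC can be checked independently of `oab`. The following sketch computes ROC AUC by hand as the fraction of (anomaly, normal) pairs that are ranked correctly and averages it over many random score vectors. It is an illustration only and makes no claim about how `oab` computes its metrics internally; the helper `roc_auc` and the simulation settings (50 samples, 5 anomalies, 2000 repetitions) are assumptions chosen to mirror the setup above.

```python
import numpy as np


def roc_auc(y_true, scores):
    """ROC AUC as the fraction of (anomaly, normal) pairs ranked correctly; ties count half."""
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true]    # scores of anomalies
    neg = scores[~y_true]   # scores of normal samples
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


rng = np.random.default_rng(0)
y = np.zeros(50, dtype=bool)
y[:5] = True  # 5 anomalies among 50 samples, i.e. contamination rate 0.1

# Average ROC AUC of purely random scores over many repetitions.
aucs = [roc_auc(y, rng.standard_normal(50)) for _ in range(2000)]
mean_auc = float(np.mean(aucs))
```

With only 50 samples per dataset, individual runs scatter noticeably around the chance level, which is why averaging over several sampled datasets, as `sample_multiple` does above, gives a more stable picture.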