# Test LODA and RS-Hash with River methods

This notebook tests the LODA and RS-Hash methods adapted to the River framework. The adaptations do not perfectly follow River's guidelines yet and still rely on NumPy arrays.

### imports

In [1]:
import numpy as np
import pandas as pd

Arrhythmia dataset

In [2]:
import scipy.io
#data = scipy.io.loadmat('/Users/Sophie/Library/CloudStorage/OneDrive-Personnel/IP/Data Stream Processing/PySAD_to_River/data/cover.mat') 
data_arr = scipy.io.loadmat('C:/Users/e32cl/Documents/M2/P2 Data Stream/Projet/projet_v2/PySAD_to_River/data/arrhythmia.mat') 

In [3]:
data = data_arr
X = data['X']
y = data['y']
y_flat = data['y'].flatten()

In [4]:
print(type(X),X.shape)
print(type(y),y.shape)
print(y.flatten().shape)

<class 'numpy.ndarray'> (452, 274)
<class 'numpy.ndarray'> (452, 1)
(452,)


In [5]:
y.mean()

0.14601769911504425

#### import our models

In [6]:
from anomaly.base import AnomalyDetector

In [7]:
from river import utils, metrics
from river.stream import iter_array, iter_pandas

In [8]:
from models.loda import LODA
from models.rs_hash import RSHash

In [9]:
from pysad.evaluation import AUROCMetric
from pysad.transform.probability_calibration import ConformalProbabilityCalibrator

### LODA

In [10]:
model = LODA()

# River ROC-AUC
ROCAUC = metrics.ROCAUC()
# PySAD/sklearn ROC-AUC
ROCAUC_1 = AUROCMetric()

for i, (xi, yi) in enumerate(iter_array(X, y_flat)):
    model = model.learn_one(xi)
    anomaly_score = model.score_one(xi)
    # Print every tenth score to monitor the model
    if i % 10 == 0:
        print(anomaly_score)
    ROCAUC.update(yi, anomaly_score)
    ROCAUC_1.update(yi, anomaly_score)
print('ROCAUC River : ', ROCAUC)
print('ROCAUC sklearn : ', ROCAUC_1.get())

9.000133971466786e-14
[the same value repeats for every printed sample; the rest of the output is truncated]

The anomaly scores collapse to roughly $10^{-14}$, i.e. effectively 0, for every sample. Building the histograms and projections from a single sample at a time seems to be the problem.
PySAD gives the same results, and a calibrator is not enough to fix this issue.
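
Since the note above points at single-sample updates as the likely cause, one workaround to try (not tested in this notebook) is to buffer the stream into small batches before fitting the histograms. The sketch below only shows the buffering step. `iter_batches` is a hypothetical helper, not part of River or PySAD, and how LODA would consume a batch is left open.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_batches(stream: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group a stream of samples into lists of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    it = iter(stream)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


# Toy usage: a stream of 452 items (the size of the arrhythmia dataset)
sizes = [len(batch) for batch in iter_batches(range(452), 100)]
print(sizes)  # [100, 100, 100, 100, 52]
```

The last batch is shorter than the others, so any batch-based fitting would need to accept a variable batch size.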

### RS-Hash

In [20]:
model = RSHash(X.min(axis=0), X.max(axis=0))

# River ROC-AUC
ROCAUC = metrics.ROCAUC()
# PySAD/sklearn ROC-AUC
ROCAUC_1 = AUROCMetric()

for xi, yi in iter_array(X, y):
    model = model.learn_one(xi)
    anomaly_score = model.score_one(xi)
    # RS-Hash gives low scores to anomalies, so negate before updating the metrics
    ROCAUC.update(yi[0], -anomaly_score)
    ROCAUC_1.update(yi[0], -anomaly_score)
print('ROCAUC River : ', ROCAUC)
print('ROCAUC sklearn : ', ROCAUC_1.get())

ROCAUC River :  ROCAUC: -0.00%
ROCAUC sklearn :  0.7524336630554247


RS-Hash takes the minimum and maximum of each feature as input. How can this be implemented for online learning, where samples arrive one at a time? A potential solution is rolling preprocessing, but it is not easy to implement and may not be relevant for this problem.
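
As a rough illustration of the rolling preprocessing idea, the sketch below keeps running per-feature bounds that are updated one sample at a time. `RunningMinMax` is a hypothetical name, and samples are assumed to be dicts, as yielded by `iter_array`. It is not wired into `RSHash` here.

```python
from typing import Dict, Hashable


class RunningMinMax:
    """Track per-feature minimum and maximum over a stream of dict samples."""

    def __init__(self) -> None:
        self.min: Dict[Hashable, float] = {}
        self.max: Dict[Hashable, float] = {}

    def update(self, x: Dict[Hashable, float]) -> "RunningMinMax":
        # A feature seen for the first time initialises both bounds
        for key, value in x.items():
            self.min[key] = min(self.min.get(key, value), value)
            self.max[key] = max(self.max.get(key, value), value)
        return self

    def scale(self, x: Dict[Hashable, float]) -> Dict[Hashable, float]:
        """Min-max scale x with the bounds seen so far; constant features map to 0."""
        out = {}
        for key, value in x.items():
            lo, hi = self.min.get(key, value), self.max.get(key, value)
            out[key] = (value - lo) / (hi - lo) if hi > lo else 0.0
        return out


tracker = RunningMinMax()
for sample in [{0: 1.0, 1: 5.0}, {0: 3.0, 1: 2.0}, {0: 2.0, 1: 4.0}]:
    tracker.update(sample)
print(tracker.min, tracker.max)  # {0: 1.0, 1: 2.0} {0: 3.0, 1: 5.0}
print(tracker.scale({0: 2.0, 1: 5.0}))  # {0: 0.5, 1: 1.0}
```

The bounds are only estimates early in the stream, so a value scaled before it has been passed to `update` can fall outside [0, 1]. River's `preprocessing.MinMaxScaler` follows a similar running-bounds idea and may be a better starting point than a hand-written tracker.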
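The anomaly score is reversed: a low score means a high potential of abnormality, hence the minus sign in the ROC-AUC updates. River's `ROCAUC` is a threshold-based approximation that expects scores in [0, 1], so the negated scores probably fall below every threshold, which would explain the -0.00% above while the PySAD/sklearn value (0.752) is unaffected.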
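
The effect of the sign flip can be checked on toy data. The sketch below computes an exact, rank-based ROC-AUC in plain Python (`exact_auc` is a hypothetical helper, not the River or PySAD metric) and shows that negating a reversed score turns an AUC of a into 1 - a. It also shows one way to flip the score while keeping it inside [0, 1], assuming the score range is known in advance.

```python
from typing import Sequence


def exact_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Exact ROC-AUC: share of (positive, negative) pairs ranked correctly, ties count 0.5."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data where anomalies (label 1) tend to get LOW raw scores
y = [0, 0, 1, 0, 1]
raw = [0.9, 0.3, 0.2, 0.7, 0.6]

negated = [-s for s in raw]
# Flip and rescale into [0, 1]; lo and hi are assumed known here
lo, hi = min(raw), max(raw)
flipped = [(hi - s) / (hi - lo) for s in raw]

print(round(exact_auc(y, raw), 3))      # 0.167
print(round(exact_auc(y, negated), 3))  # 0.833
print(round(exact_auc(y, flipped), 3))  # 0.833
```

Because the exact AUC depends only on the ordering of the scores, `negated` and `flipped` give the same value. Only `flipped` stays in [0, 1], which is what a threshold-based approximation would need.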

#### Test PySAD calibrator
Test the Conformal Probability Calibrator with a window size of 300 on the RS-Hash implementation, to check whether the results improve or change.
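
To give an intuition of what a windowed calibrator does, here is a minimal sketch that maps each raw score to the fraction of recent scores it is greater than or equal to. This is an assumption about the general idea, not PySAD's implementation; `WindowedRankCalibrator` and `fit_transform_one` are hypothetical names, and PySAD's exact definition may differ.

```python
from collections import deque


class WindowedRankCalibrator:
    """Map a raw score to the fraction of recent scores that are <= it."""

    def __init__(self, window_size: int = 300) -> None:
        self.window = deque(maxlen=window_size)

    def fit_transform_one(self, score: float) -> float:
        if self.window:
            calibrated = sum(s <= score for s in self.window) / len(self.window)
        else:
            calibrated = 0.0  # no history yet
        # The current score only joins the window after it has been calibrated
        self.window.append(score)
        return calibrated


toy_calibrator = WindowedRankCalibrator(window_size=3)
calibrated = [toy_calibrator.fit_transform_one(s) for s in [1.0, 3.0, 2.0, 5.0]]
print(calibrated)  # [0.0, 1.0, 0.5, 1.0]
```

The output always lies in [0, 1], but the mapping changes as the window moves, so it is not a fixed monotone transform of the raw score.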

In [21]:
model = RSHash(X.min(axis=0), X.max(axis=0))

calibrator = ConformalProbabilityCalibrator(windowed=True, window_size=300)
# River ROC-AUC
ROCAUC = metrics.ROCAUC()
# PySAD/sklearn ROC-AUC
ROCAUC_1 = AUROCMetric()

for i, (xi, yi) in enumerate(iter_array(X, y)):
    model = model.learn_one(xi)
    anomaly_score = model.score_one(xi)
    calibrated_score = calibrator.fit_transform(np.array([anomaly_score]))
    # Print every tenth calibrated score to monitor the calibrator
    if i % 10 == 0:
        print(calibrated_score)
    ROCAUC.update(yi[0], calibrated_score[0])
    ROCAUC_1.update(yi[0], calibrated_score[0])
print('ROCAUC River : ', ROCAUC)
print('ROCAUC sklearn : ', ROCAUC_1.get())

[0.]
[0.18181818]
[0.42857143]
[0.]
[0.04878049]
[0.17647059]
[0.42622951]
[0.07042254]
[0.]
[0.15384615]
[0.11881188]
[0.09009009]
[0.10743802]
[0.00763359]
[0.]
[0.10596026]
[0.26708075]
[0.04678363]
[0.]
[0.17801047]
[0.1641791]
[0.71563981]
[0.02714932]
[0.10822511]
[0.04149378]
[0.60557769]
[0.29501916]
[0.]
[0.70462633]
[0.19931271]
[0.65666667]
[0.31333333]
[0.65666667]
[0.57]
[0.18333333]
[0.82333333]
[0.10666667]
[0.93333333]
[0.51333333]
[0.04333333]
[0.25666667]
[0.51]
[0.42333333]
[0.98]
[0.56333333]
[0.29]
ROCAUC River :  ROCAUC: 68.51%
ROCAUC sklearn :  0.6870976605432564


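Calibration seems to work with River's ROCAUC: with scores in [0, 1] it gives a usable value (68.51%), close to the PySAD/sklearn one (0.687). The sklearn-based ROC-AUC is rank-based and does not need calibrated scores, so calibration may be an unnecessary extra step that reduces performance here (0.752 without calibration)? Note that, unlike the previous cell, the calibrated score is passed to the metrics without a minus sign, so the two runs are not directly comparable.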