# Summary

This notebook illustrates how to transform temporal data into tabular data through feature extraction.

Using the [tsfresh](https://tsfresh.readthedocs.io/en/latest/index.html) library, we can extract temporal features from overlapping rolling windows.

tsfresh does not perform anomaly detection itself; it only handles feature engineering.
It can be combined with any other library that performs anomaly detection. This notebook reuses the baseline sklearn algorithms (see sklearn.ipynb) and illustrates the performance gained by using temporal features instead of treating each sample as independent (tabular data).

In [1]:
import sys
sys.path.append('../src')
import evaluation_utils, data_utils

import json
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM
from sklearn.neighbors import LocalOutlierFactor
from tsfresh import extract_features
from tsfresh.utilities.dataframe_functions import roll_time_series

import pandas as pd

In [2]:
# Load data
X, y = data_utils.get_data('../data/6_cardio.npz')

In [3]:
# convert the matrix to a dataframe
df = pd.DataFrame(data=X)
# tsfresh expects long format for multivariate time series: one row per (time index, series id) pair
long_df = df.reset_index().melt(id_vars='index', var_name='id', value_name='value')
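
To make the wide-to-long reshaping concrete, here is a minimal sketch on a made-up 3 x 2 matrix (the values and the names `wide_toy` and `long_toy` are illustrative and not part of the dataset). It applies the same `reset_index().melt(...)` call as the cell above.

```python
import pandas as pd

# Toy wide matrix: 3 time steps (rows) x 2 series (columns); values are made up.
wide_toy = pd.DataFrame([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])

# Same reshaping as above: one row per (time index, column) pair.
long_toy = wide_toy.reset_index().melt(id_vars='index', var_name='id', value_name='value')
print(long_toy)
```

Each original column becomes a block of rows that share the same `id`, and `index` keeps the time order within each block.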

In [4]:
# build overlapping rolling windows over each series
rolled_df = roll_time_series(long_df, column_id='id', column_sort='index', max_timeshift=30, min_timeshift=10)

Rolling: 100%|██████████| 40/40 [00:10<00:00,  3.65it/s]
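
The effect of `max_timeshift` and `min_timeshift` can be sketched in plain Python. This is not the tsfresh implementation. It assumes that each window ends at position `t`, holds at most `max_timeshift + 1` points, and is discarded when it holds `min_timeshift` or fewer points; check the tsfresh documentation for the exact boundary behaviour. The helper name `rolling_windows` is hypothetical.

```python
def rolling_windows(values, max_timeshift, min_timeshift):
    """Return {end_index: window} for backward-looking windows.

    Assumed semantics (not taken from the tsfresh source): a window ending at
    position t holds at most max_timeshift + 1 points, and windows with
    min_timeshift or fewer points are discarded.
    """
    windows = {}
    for t in range(len(values)):
        start = max(0, t - max_timeshift)
        window = values[start:t + 1]
        if len(window) > min_timeshift:
            windows[t] = window
    return windows


series = list(range(8))  # toy series: 0, 1, ..., 7
windows = rolling_windows(series, max_timeshift=3, min_timeshift=1)
for end, window in windows.items():
    print(end, window)
```

Under these assumptions, some time steps never receive a window because too few points are available, which is why the labels have to be re-aligned with the surviving indices later on.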


In [5]:
# extract temporal features from each rolling window
features = extract_features(rolled_df, column_id='id', column_sort='index')


Feature Extraction: 100%|██████████| 40/40 [07:40<00:00, 11.50s/it]  
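
Conceptually, feature extraction turns every window into one row of summary values. The sketch below uses four hand-picked statistics as stand-ins; tsfresh computes a much larger feature set by default, and the helper `summarize` and the toy windows are illustrative assumptions.

```python
import statistics


def summarize(window):
    """Collapse one window into a handful of illustrative features."""
    return {
        "mean": statistics.fmean(window),
        "std": statistics.pstdev(window),  # population standard deviation
        "min": min(window),
        "max": max(window),
    }


# Toy windows keyed by the index at which each window ends (values are made up).
toy_windows = {2: [1.0, 2.0, 3.0], 3: [2.0, 3.0, 10.0]}

# One row of features per window: this is the tabular view of the time series.
feature_table = {end: summarize(w) for end, w in toy_windows.items()}
for end, row in feature_table.items():
    print(end, row)
```

A spike such as the `10.0` in the second window shows up in `max` and `std`, information that a detector looking at one sample at a time would not see in context.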


In [6]:
features = features.reset_index()
# drop columns with missing values
features = features.dropna(axis=1)
# the first two columns (level_0 and level_1) hold the series id and the window index
cols = features.columns[2:].values
# pivot the table to have the features as columns
extracted_features = features.pivot(index='level_1', columns='level_0', values=cols)
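
The pivot step can be checked on a toy frame. The column names `value__mean` and `value__maximum` below are placeholders chosen for illustration, and the numbers are made up; only the `pivot` call mirrors the cell above.

```python
import pandas as pd

# Toy extraction result: two series (level_0), two window indices (level_1).
toy_features = pd.DataFrame({
    "level_0": [0, 0, 1, 1],
    "level_1": [10, 11, 10, 11],
    "value__mean": [1.0, 2.0, 3.0, 4.0],
    "value__maximum": [5.0, 6.0, 7.0, 8.0],
})

# One row per window index, one column per (feature, series id) pair.
toy_pivot = toy_features.pivot(
    index="level_1", columns="level_0", values=["value__mean", "value__maximum"]
)
print(toy_pivot)
```

The result has one row per time index and a two-level column index, so the features of every series sit side by side in a single sample.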

In [7]:
# rolling drops the samples whose window is too short (see min_timeshift), so keep only the labels whose index survived
y = y[extracted_features.index.values]
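
The evaluation in the next cell passes `do_point_adjustment=True`. The code of `evaluation_utils.run_evaluation` is not shown in this notebook, so the following is only a sketch of the point-adjustment protocol commonly used in time-series anomaly detection, assuming the helper follows it: if at least one point inside a contiguous ground-truth anomaly segment is flagged, the whole segment counts as detected. The function name `point_adjust` is hypothetical.

```python
def point_adjust(y_true, y_pred):
    """Return a copy of y_pred where every true anomaly segment that was hit
    at least once is marked as fully detected (labels: 0 = normal, 1 = anomaly)."""
    adjusted = list(y_pred)
    n = len(y_true)
    i = 0
    while i < n:
        if y_true[i] == 1:
            # Find the end (exclusive) of the contiguous anomaly segment.
            j = i
            while j < n and y_true[j] == 1:
                j += 1
            # If any point in the segment was flagged, flag the whole segment.
            if any(y_pred[k] == 1 for k in range(i, j)):
                for k in range(i, j):
                    adjusted[k] = 1
            i = j
        else:
            i += 1
    return adjusted


toy_true = [0, 1, 1, 1, 0, 0, 1, 1, 0]
toy_pred = [0, 0, 1, 0, 0, 1, 0, 0, 0]
toy_adjusted = point_adjust(toy_true, toy_pred)
print(toy_adjusted)
```

In this sketch, false positives outside anomaly segments are left untouched and missed segments stay missed, so the adjustment can raise precision and recall but never lowers them.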

In [8]:
# Define the anomaly detection methods
methods = {
    "Isolation Forest": IsolationForest(contamination=0.1),
    "One-Class SVM": OneClassSVM(nu=0.1),
    "Local Outlier Factor": LocalOutlierFactor(n_neighbors=20, contamination=0.1)
}

# Apply each method (LocalOutlierFactor in its default outlier-detection mode only offers fit_predict)
for name, method in methods.items():
    if name == "Local Outlier Factor":
        predicted_anomalies = method.fit_predict(extracted_features)
    else:
        method.fit(extracted_features)
        predicted_anomalies = method.predict(extracted_features)
    
    # Remap sklearn's labels (1 = inlier, -1 = outlier) to 0 = normal, 1 = anomaly
    predicted_anomalies[predicted_anomalies == 1] = 0
    predicted_anomalies[predicted_anomalies == -1] = 1

    print(f"\n{name} Results:")
    scores = evaluation_utils.run_evaluation(y, predicted_anomalies, do_point_adjustment=True)
    print(json.dumps(scores, indent=4))
    # Save the results to a file
    data_utils.save_results(f"results/tsfresh sklearn {name}.npz", scores)


Isolation Forest Results:
{
    "AUCROC": 0.9101201989499861,
    "AUCPR": 0.6991923507574249,
    "F1": 0.8268106438927704,
    "Precision": 0.8131868131868132,
    "Recall": 0.8409090909090909,
    "Adjusted AUCROC": 0.9896656534954407,
    "Adjusted AUCPR": 0.8380952380952381,
    "Adjusted F1": 0.9119121372655744,
    "Adjusted Precision": 0.8380952380952381,
    "Adjusted Recall": 1.0
}
Results saved to results/tsfresh sklearn Isolation Forest.npz

One-Class SVM Results:
{
    "AUCROC": 0.5343465045592706,
    "AUCPR": 0.10304449648711944,
    "F1": 0.1868348242363068,
    "Precision": 0.10304449648711944,
    "Recall": 1.0,
    "Adjusted AUCROC": 0.5343465045592706,
    "Adjusted AUCPR": 0.10304449648711944,
    "Adjusted F1": 0.1868348242363068,
    "Adjusted Precision": 0.10304449648711944,
    "Adjusted Recall": 1.0
}
Results saved to results/tsfresh sklearn One-Class SVM.npz

Local Outlier Factor Results:
{
    "AUCROC": 0.5390266648245372,
    "AUCPR": 0.1082726307190887,
 