QCAD: Explainable Contextual Anomaly Detection using Quantile Regression Forests

The paper has been accepted for publication in the Data Mining and Knowledge Discovery journal (DAMI) (July 2023) and is available via this link.

Repo Structure

The QCAD repo includes two folders, Code and Data.

Code

Specifically, the Code folder contains the following sub-folders:

Implementation: implementations of the contextual anomaly detection algorithms and the traditional anomaly detection algorithms, as follows:
- QCAD.py: algorithm proposed and implemented by us.
- CAD.py: algorithm proposed by Song, Xiuyao, et al., 2007; implemented by us.
- ROCOD.py: algorithm proposed by Liang, Jiongqian, and Srinivasan Parthasarathy, 2016; implemented by us.
- LoPAD.py: algorithm proposed by Lu, Sha, et al.; implemented by us.
- PyODtest.py: other traditional anomaly detection algorithms such as KNN, LOF, SOD, IForest, and HBOS, implemented by Yue Zhao, Zain Nasrullah, and Zheng Li, 2019; API written by us.
Utilities: utility functions and scripts, as follows:
- SynDataGen.py: generate synthetic datasets.
- ContextualAnomalyInject.py: inject contextual anomalies.
- FindMB.R: find Markov Blankets for the LoPAD algorithm.
Examples: the scripts used to generate the examples in our paper, as follows:
- ExampleFootball.py: generate the football application example in the Experiment Results section.
- ExampleQuantileHeight.py: generate the figures in the Introduction section.
- ExampleBeanPlot.py: generate the Beanplot in the Method section.
MultipleRunningAverage: runs each of the detection algorithms 10 times independently.
- AverageTest.py: execute all anomaly detection algorithms except CAD 10 times on each of the 20 real-world datasets.
- AverageTestCAD.py: execute CAD 10 times on each of the 20 real-world datasets. CAD is run separately because it takes a long time.
- SynAverageTest.py: execute all anomaly detection algorithms except CAD 10 times on each of the 10 synthetic datasets.
- SynAverageTestCAD.py: execute CAD 10 times on each of the 10 synthetic datasets. CAD is run separately because it takes a long time.
AblationStuides: investigates the impact of different components on detection performance.
- AblationStudy.py: conduct two ablation studies.
RuntimeAnalysis: inspects the computational cost of QCAD and CAD.
- RuntimeAnalysis.py: inspect the running time while varying, in turn, the number of behavioural features, contextual features, or samples.
SensitivityStudies: investigates the impact of the parameter k (the number of neighbours).
- SensitivityOfNeighbours.py: inspect the detection accuracy in terms of ROC AUC, PR AUC, and P@n while varying the number of neighbours.
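
For readers unfamiliar with the three accuracy measures named above, the following is a minimal, dependency-free sketch of how ROC AUC, PR AUC (estimated here as average precision), and P@n can be computed from anomaly scores. The function names are hypothetical and chosen for illustration; this is not the evaluation code used in the repo's scripts, and ties between scores are not specially handled in the ranking-based measures.

```python
def precision_at_n(labels, scores, n=None):
    """Fraction of true anomalies among the n highest-scoring samples.

    By default n is the number of true anomalies in `labels`.
    """
    if n is None:
        n = sum(labels)
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    return sum(label for _, label in ranked[:n]) / n


def roc_auc(labels, scores):
    """Probability that a random anomaly outscores a random normal sample.

    Ties between an anomaly and a normal sample count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """PR AUC estimate: mean of the precision at each true anomaly's rank."""
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Toy example: 1 marks an anomaly, higher scores mean "more anomalous".
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))                      # 0.75
print(round(average_precision(labels, scores), 4))  # 0.8333
print(precision_at_n(labels, scores))               # 0.5
```

When an algorithm is run several times on the same dataset, each measure can be computed per run and then averaged across the runs.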

Data

Specifically, the Data folder contains the following sub-folders:

RawData: 20 real-world datasets, assumed to contain no contextual anomalies
SynData: 10 synthetic datasets without contextual anomalies
GenData: 20 real-world datasets with injected contextual anomalies, 10 synthetic datasets with contextual anomalies, and the Markov Blankets of these 30 datasets in the subfolder ~/MB/
Examples: the football dataset with unknown real-world contextual anomalies
TempFiles: temporary or intermediate results
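
To illustrate what "injected contextual anomalies" can mean, here is a minimal sketch of one simple perturbation-style scheme: a selected sample keeps its contextual values but receives the behavioural values of a dissimilar donor sample, so each value stays plausible on its own while the combination is unusual for that context. This is an assumption for illustration only and may differ from what ContextualAnomalyInject.py actually does; the function name, its parameters, and the donor-selection rule are all hypothetical.

```python
import random


def inject_contextual_anomalies(rows, behavioural_idx, n_anomalies,
                                n_candidates=5, seed=0):
    """Return (perturbed_rows, labels); label 1 marks an injected anomaly.

    rows            : list of numeric feature lists (left unmodified)
    behavioural_idx : column indices treated as behavioural features
    n_candidates    : number of random donor candidates per selected sample
    """
    rng = random.Random(seed)
    data = [list(r) for r in rows]          # work on a copy
    labels = [0] * len(data)
    targets = rng.sample(range(len(data)), n_anomalies)

    for i in targets:
        others = [j for j in range(len(rows)) if j != i]
        candidates = rng.sample(others, min(n_candidates, len(others)))

        # Squared distance between sample i and candidate j,
        # measured on the behavioural columns of the original data.
        def behavioural_distance(j):
            return sum((rows[i][c] - rows[j][c]) ** 2
                       for c in behavioural_idx)

        # The donor is the candidate that differs most in behaviour.
        donor = max(candidates, key=behavioural_distance)
        for c in behavioural_idx:
            data[i][c] = rows[donor][c]     # contextual columns stay as-is
        labels[i] = 1

    return data, labels


# Toy data: column 0 is contextual, column 1 is behavioural.
rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
perturbed, labels = inject_contextual_anomalies(rows, [1], n_anomalies=1)
print(perturbed, labels)
```

Because the donor values come from the dataset itself, the perturbed sample is hard to spot with a detector that ignores context, which is the property a contextual anomaly benchmark needs.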
