MIDAS

C++ implementation of

Real-time Streaming Anomaly Detection in Dynamic Graphs. Siddharth Bhatia, Rui Liu, Bryan Hooi, Minji Yoon, Kijung Shin, Christos Faloutsos. TKDD 2022.
MIDAS: Microcluster-Based Detector of Anomalies in Edge Streams. Siddharth Bhatia, Bryan Hooi, Minji Yoon, Kijung Shin, Christos Faloutsos. AAAI 2020.

The old implementation lives in the OldImplementation branch. It should be considered archived and is unlikely to receive feature updates.

Features

Finds Anomalies in Dynamic/Time-Evolving Graph: (Intrusion Detection, Fake Ratings, Financial Fraud)
Detects Microcluster Anomalies (suddenly arriving groups of suspiciously similar edges e.g. DoS attack)
Theoretical Guarantees on False Positive Probability
Constant Memory (independent of graph size)
Constant Update Time (real-time anomaly detection to minimize harm)
Up to 55% more accurate and 929 times faster than state-of-the-art approaches
Experiments are performed using the following datasets:

Demo

If you use Windows:

Open a Visual Studio developer command prompt (the build uses its toolchain)
cd to the project root MIDAS/
cmake -DCMAKE_BUILD_TYPE=Release -GNinja -S . -B build/release
cmake --build build/release --target Demo
cd to MIDAS/build/release/
.\Demo.exe

If you use Linux/macOS:

Open a terminal
cd to the project root MIDAS/
cmake -DCMAKE_BUILD_TYPE=Release -S . -B build/release
cmake --build build/release --target Demo
cd to MIDAS/build/release/
./Demo

The demo runs on MIDAS/data/DARPA/darpa_processed.csv, which has 4.5M records, with the filtering core (MIDAS-F).

The scores will be exported to MIDAS/temp/Score.txt. A higher score means a more anomalous record.

All file paths are absolute and "hardcoded" by CMake. Even so, running the executable by double-clicking it is NOT recommended; start it from a terminal as shown above.

Requirements

Core

C++11
C++ standard libraries

Demo (with the experimental ROC-AUC implementation)

C++ standard libraries

Demo (with the sklearn ROC-AUC implementation)

Python 3 (MIDAS/util/EvaluateScore.py)
- pandas: I/O
- scikit-learn: Compute ROC-AUC

Experiment

(Optional) Intel TBB: Parallelization
(Optional) OpenMP: Parallelization

Other python utility scripts

Python 3
- pandas
- scikit-learn

Customization

Switch to `sklearn` ROC-AUC Implementation

In MIDAS/example/Demo.cpp:
Comment out the section "Evaluate scores (experimental)".
Uncomment the sections "Write output scores" and "Evaluate scores".

Different CMS Size / Decay Factor / Threshold

These are arguments of the cores' constructors, which are called at MIDAS/example/Demo.cpp:67-69.
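
To make the CMS size and decay factor more concrete, here is a minimal sketch of a count-min sketch (CMS) with a decay step. It is an illustration only and not the code in MIDAS/src: the class name `TinyCms`, its method names and the hash function are all assumptions made for this example. The number of rows and columns fixes the memory used, and the decay factor scales every stored count down.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical, simplified count-min sketch; not the class used in MIDAS/src.
class TinyCms {
public:
    TinyCms(std::size_t numRow, std::size_t numColumn)
        : numRow_(numRow), numColumn_(numColumn), cells_(numRow * numColumn, 0.0f) {
        assert(numRow > 0 && numColumn > 0);
    }

    // Add one occurrence of the edge (source, destination) to every row.
    void Add(std::uint32_t source, std::uint32_t destination) {
        for (std::size_t row = 0; row < numRow_; ++row)
            cells_[Index(row, source, destination)] += 1.0f;
    }

    // Estimated count: the minimum over rows. Collisions can only inflate it.
    float Estimate(std::uint32_t source, std::uint32_t destination) const {
        float best = std::numeric_limits<float>::max();
        for (std::size_t row = 0; row < numRow_; ++row) {
            const float value = cells_[Index(row, source, destination)];
            if (value < best) best = value;
        }
        return best;
    }

    // Scale every cell by a decay factor, expected to lie in [0, 1].
    void Decay(float factor) {
        for (float& cell : cells_) cell *= factor;
    }

private:
    // Per-row mixing hash, chosen for brevity rather than quality.
    std::size_t Index(std::size_t row, std::uint32_t source, std::uint32_t destination) const {
        std::uint64_t h = (static_cast<std::uint64_t>(source) << 32) | destination;
        h ^= (row + 1) * 0x9E3779B97F4A7C15ULL;
        h ^= h >> 33;
        h *= 0xFF51AFD7ED558CCDULL;
        h ^= h >> 33;
        return row * numColumn_ + static_cast<std::size_t>(h % numColumn_);
    }

    std::size_t numRow_;
    std::size_t numColumn_;
    std::vector<float> cells_;
};
```

The memory footprint is numRow * numColumn floats regardless of how many edges are added. More columns mean fewer collisions, and more rows make it less likely that every row collides at once.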

Switch Cores

Cores are instantiated at MIDAS/example/Demo.cpp:67-69. Uncomment the one you want to use and leave the others commented out.

Custom Dataset + `Demo.cpp`

You need to prepare three files:

Meta file
- Only includes an integer N, the number of records in the dataset
- Use its path for pathMeta
- E.g. MIDAS/data/DARPA/darpa_shape.txt
Data file
- A header-less csv format file of shape [N,3]
- Columns are sources, destinations, timestamps
- Use its path for pathData
- E.g. MIDAS/data/DARPA/darpa_processed.csv
Label file
- A header-less csv format file of shape [N,1]
- The corresponding label for data records
  - 0 means normal record
  - 1 means anomalous record
- Use its path for pathGroundTruth
- E.g. MIDAS/data/DARPA/darpa_ground_truth.csv
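
The three file formats above can be read with a few lines of standard C++. The sketch below is only an illustration and is not how Demo.cpp loads its input: the names `Edge`, `ReadMeta`, `ReadData` and `ReadLabels` are hypothetical, and the functions take any `std::istream` so they work with `std::ifstream` as well as in-memory strings.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// One record of the data file (hypothetical type for this example).
struct Edge {
    int source;
    int destination;
    int timestamp;
};

// Meta file: a single integer N, the number of records.
std::size_t ReadMeta(std::istream& in) {
    std::size_t n = 0;
    in >> n;
    return n;
}

// Data file: up to n header-less CSV rows "source,destination,timestamp".
// Rows that do not match this shape are skipped.
std::vector<Edge> ReadData(std::istream& in, std::size_t n) {
    std::vector<Edge> edges;
    edges.reserve(n);
    std::string line;
    while (edges.size() < n && std::getline(in, line)) {
        std::istringstream row(line);
        Edge e{};
        char comma1 = 0, comma2 = 0;
        if (row >> e.source >> comma1 >> e.destination >> comma2 >> e.timestamp &&
            comma1 == ',' && comma2 == ',') {
            edges.push_back(e);
        }
    }
    return edges;
}

// Label file: up to n labels, one per line; 0 = normal, 1 = anomalous.
std::vector<int> ReadLabels(std::istream& in, std::size_t n) {
    std::vector<int> labels;
    labels.reserve(n);
    int label = 0;
    while (labels.size() < n && (in >> label)) {
        assert(label == 0 || label == 1);  // the format allows only 0 and 1
        labels.push_back(label);
    }
    return labels;
}
```

A quick sanity check before running on a custom dataset is that the data and label vectors both have exactly N entries.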

Custom Dataset + Custom Runner

Include the header MIDAS/src/NormalCore.hpp, MIDAS/src/RelationalCore.hpp or MIDAS/src/FilteringCore.hpp
Instantiate cores with required parameters
Call operator() on each data record; it returns the anomaly score for that record
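
The shape of such a runner can be sketched as follows. To stay self-contained, the example does not include the repository headers. `ToyCore` is a hypothetical stand-in that keeps exact per-edge counts in a `std::map` instead of a count-min sketch, and its score is the chi-squared statistic described in the AAAI paper, reproduced here from the paper rather than from the source files. It also assumes that timestamps are integers starting at 1 and arrive in non-decreasing order. `Record` and `ScoreAll` are likewise names made up for this sketch.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a MIDAS core; exact counts replace the count-min sketch.
class ToyCore {
public:
    // Process one record and return its anomaly score (higher = more anomalous).
    float operator()(int source, int destination, int timestamp) {
        assert(timestamp >= timestamp_);  // records must arrive in time order
        if (timestamp > timestamp_) {     // new time tick: reset the per-tick counts
            current_.clear();
            timestamp_ = timestamp;
        }
        const std::pair<int, int> edge(source, destination);
        const float a = ++current_[edge];  // occurrences in the current tick
        const float s = ++total_[edge];    // occurrences over all ticks so far
        const float t = static_cast<float>(timestamp);
        if (t <= 1.0f) return 0.0f;        // no history to compare against yet
        const float diff = (a - s / t) * t;
        return diff * diff / (s * (t - 1.0f));
    }

private:
    int timestamp_ = 1;
    std::map<std::pair<int, int>, float> current_;
    std::map<std::pair<int, int>, float> total_;
};

// One input record (hypothetical type for this example).
struct Record {
    int source;
    int destination;
    int timestamp;
};

// Feed the records to the core one by one and collect the scores.
std::vector<float> ScoreAll(const std::vector<Record>& records) {
    ToyCore core;
    std::vector<float> scores;
    scores.reserve(records.size());
    for (const Record& r : records)
        scores.push_back(core(r.source, r.destination, r.timestamp));
    return scores;
}
```

With the real headers, the loop stays the same in spirit: construct one of the three cores once, then call it once per record. Check the constructor signature in the header you include for the exact parameters.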

Other Files

`example/`

`Experiment.cpp`

The code we used for experiments.
It will try to use Intel TBB or OpenMP for parallelization.
You should comment out all but one runner function call in main(), as most results are exported to MIDAS/temp/Experiment.csv together with many intermediate files.

`Reproducible.cpp`

Similar to Demo.cpp, but with all random parameters hardcoded, so it always produces the same result.
It's for other developers and us to test if the implementation in other languages can produce acceptable results.

`util/`

DeleteTempFile.py, EvaluateScore.py and ReproduceROC.py will show their usage and a short description when executed without any argument.

`AUROC.hpp`

Experimental ROC-AUC implementation in C++11. More info at this repo.
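
For readers unfamiliar with the metric, ROC-AUC can be computed from 0/1 labels and scores with the rank-sum (Mann-Whitney) formulation. The following is a minimal sketch under that formulation, not the code in AUROC.hpp; the function name `RocAuc` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// ROC-AUC via the rank-sum statistic; tied scores share their average rank.
// Requires at least one positive (1) and one negative (0) label.
double RocAuc(const std::vector<int>& labels, const std::vector<float>& scores) {
    assert(labels.size() == scores.size());
    const std::size_t n = labels.size();

    // Indices sorted by ascending score.
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::sort(order.begin(), order.end(),
              [&](std::size_t a, std::size_t b) { return scores[a] < scores[b]; });

    double positiveRankSum = 0.0;
    std::size_t numPositive = 0;
    for (std::size_t i = 0; i < n;) {
        std::size_t j = i;
        while (j < n && scores[order[j]] == scores[order[i]]) ++j;  // tie group [i, j)
        // The group holds the 1-based ranks i+1 .. j; use their mean.
        const double averageRank = (static_cast<double>(i + 1) + static_cast<double>(j)) / 2.0;
        for (std::size_t k = i; k < j; ++k) {
            if (labels[order[k]] == 1) {
                positiveRankSum += averageRank;
                ++numPositive;
            }
        }
        i = j;
    }

    const std::size_t numNegative = n - numPositive;
    assert(numPositive > 0 && numNegative > 0);
    const double p = static_cast<double>(numPositive);
    const double q = static_cast<double>(numNegative);
    return (positiveRankSum - p * (p + 1.0) / 2.0) / (p * q);
}
```

The result is the probability that a randomly chosen anomalous record receives a higher score than a randomly chosen normal one, with ties counted as one half.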

`PreprocessData.py`

The code to process the raw dataset into an easy-to-read format.
Datasets are always assumed to be in a folder in MIDAS/data/.
It can process the following dataset(s)

DARPA/darpa_original.csv -> DARPA/darpa_processed.csv, DARPA/darpa_ground_truth.csv, DARPA/darpa_shape.txt

In Other Languages

Python: Rui Liu's MIDAS.Python, Ritesh Kumar's pyMIDAS
Python (pybind): Wong Mun Hou's MIDAS
Golang: Steve Tan's midas
Ruby: Andrew Kane's midas
Rust: Scott Steele's midas_rs
R: Tobias Heidler's MIDASwrappeR
Java: Joshua Tokle's MIDAS-Java
Julia: Ashrya Agrawal's MIDAS.jl

Online Coverage

Citation

If you use this code for your research, please consider citing the TKDD paper

@misc{bhatia2020realtime,
    title={Real-Time Anomaly Detection in Edge Streams},
    author={Siddharth Bhatia and Rui Liu and Bryan Hooi and Minji Yoon and Kijung Shin and Christos Faloutsos},
    booktitle={Transactions on Knowledge Discovery from Data (TKDD)},
    year={2022}
}

or the AAAI paper

@inproceedings{bhatia2020midas,
    title="MIDAS: Microcluster-Based Detector of Anomalies in Edge Streams",
    author="Siddharth {Bhatia} and Bryan {Hooi} and Minji {Yoon} and Kijung {Shin} and Christos {Faloutsos}",
    booktitle="AAAI Conference on Artificial Intelligence (AAAI)",
    year="2020"
}
