Anomaly detection on dynamic (time-evolving) graphs in a real-time, streaming manner. Detects intrusions (DoS and DDoS attacks), fraud, and fake-rating anomalies. Implementations in C++, Golang, Python, R, Rust, and Ruby.

MIDAS

Microcluster-Based Detector of Anomalies in Edge Streams

GIF demo ...

Features

  • Finds Anomalies in Dynamic/Time-Evolving Graphs
  • Detects Microcluster Anomalies (suddenly arriving groups of suspiciously similar edges e.g. DoS attack)
  • Theoretical Guarantees on False Positive Probability
  • Constant Memory (independent of graph size)
  • Constant Update Time (real-time anomaly detection to minimize harm)
  • Up to 48% more accurate and 644 times faster than state-of-the-art approaches

For more details, please read the paper: "MIDAS: Microcluster-Based Detector of Anomalies in Edge Streams" by Siddharth Bhatia, Bryan Hooi, Minji Yoon, Kijung Shin, and Christos Faloutsos. AAAI 2020.

Use Cases

  1. Intrusion Detection
  2. Fake Ratings
  3. Financial Fraud

Getting Started

  1. Run make to compile the code and create the midas executable.
  2. Run ./midas -i followed by the path to the input file.
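
When runs are scripted rather than typed by hand, it can help to assemble the command line in one place. The following is a minimal Python sketch, assuming the executable built by make sits in the current directory as ./midas. It uses only the flags listed under Command-Line Options; the helper names build_midas_command and run_midas and the file name edges.csv are hypothetical and not part of this repository.

```python
import subprocess


def build_midas_command(input_path, output_path="scores.txt", rows=2,
                        buckets=769, alpha=0.6, relations=True,
                        undirected=False, executable="./midas"):
    """Return the argument list for one MIDAS run (hypothetical helper).

    Defaults mirror the ones documented under Command-Line Options.
    """
    cmd = [
        executable,
        "-i", str(input_path),
        "-o", str(output_path),
        "-r", str(rows),
        "-b", str(buckets),
        "-a", str(alpha),
    ]
    if not relations:
        cmd.append("--norelations")  # run MIDAS instead of MIDAS-R
    if undirected:
        cmd.append("--undirected")   # treat the graph as undirected
    return cmd


def run_midas(input_path, **options):
    """Run the compiled executable; raises CalledProcessError on failure."""
    subprocess.run(build_midas_command(input_path, **options), check=True)
```

For example, run_midas("edges.csv", buckets=1024, undirected=True) would invoke ./midas -i edges.csv -o scores.txt -r 2 -b 1024 -a 0.6 --undirected, provided the executable has been built.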

Demo

  1. Run ./demo.sh to compile the code and run it on an example dataset.

Command-Line Options

  • -h --help: print the help message
  • -i --input: input file name
  • -o --output: output file name (default: scores.txt)
  • -r --rows: number of hash functions (default: 2)
  • -b --buckets: number of buckets (default: 769)
  • -a --alpha: temporal decay factor (default: 0.6)
  • --norelations: run MIDAS instead of MIDAS-R
  • --undirected: treat the graph as undirected instead of directed
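
To give a feel for what the rows, buckets, and alpha options control, here is a minimal Python sketch of a count-min-style table with temporal decay. It is an illustration under stated assumptions, not the code in this repository: the class name DecayingCountMinSketch, the linear hash, and the seed are all hypothetical, and it assumes that alpha multiplies every stored count when the timestamp advances. The point it shows is that memory is fixed at rows × buckets counters regardless of how many edges arrive.

```python
import random


class DecayingCountMinSketch:
    """Approximate edge counter using rows x buckets counters (illustrative)."""

    def __init__(self, rows=2, buckets=769, seed=0):
        rng = random.Random(seed)
        self.rows = rows
        self.buckets = buckets
        # One (a, b) pair per row for a simple linear hash; an assumption,
        # not the hash used by the C++ implementation.
        self.params = [(rng.randrange(1, buckets), rng.randrange(buckets))
                       for _ in range(rows)]
        self.table = [[0.0] * buckets for _ in range(rows)]

    def _bucket(self, row, src, dst):
        a, b = self.params[row]
        key = src * 1_000_003 + dst  # combine the two endpoints into one key
        return (a * key + b) % self.buckets

    def add(self, src, dst, weight=1.0):
        """Record one occurrence of the edge in every row."""
        for row in range(self.rows):
            self.table[row][self._bucket(row, src, dst)] += weight

    def estimate(self, src, dst):
        """Take the minimum over rows, limiting the effect of collisions."""
        return min(self.table[row][self._bucket(row, src, dst)]
                   for row in range(self.rows))

    def decay(self, alpha=0.6):
        """Scale all counts down, e.g. when the timestamp advances."""
        for row in self.table:
            for i in range(len(row)):
                row[i] *= alpha
```

More rows and buckets reduce hash collisions at the cost of memory and per-edge work; a smaller alpha makes old edges fade faster.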

Input File Format

MIDAS expects the input edge stream to be stored in a single file containing the following three columns in order:

  1. source (int): source ID of the edge
  2. destination (int): destination ID of the edge
  3. time (int): timestamp of the edge

Thus, each line represents one edge. Edges should be sorted in non-decreasing order of their timestamps, and the column delimiter should be a comma (,).
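
Before passing a file to MIDAS, it may be worth checking that it matches the format above. The following Python sketch reads comma-separated edges and rejects rows that do not have three integer columns or whose timestamps go backwards. The function name read_edges is hypothetical, and skipping blank lines is an assumption of this sketch rather than documented behaviour of the executable.

```python
import csv
import io


def read_edges(stream):
    """Parse 'source,destination,time' lines into a list of int triples.

    Raises ValueError on malformed rows or decreasing timestamps.
    """
    edges = []
    last_time = None
    for line_no, row in enumerate(csv.reader(stream), start=1):
        if not row:
            continue  # assumption: blank lines are ignored
        if len(row) != 3:
            raise ValueError(f"line {line_no}: expected 3 columns, got {len(row)}")
        # int() raises ValueError itself if a field is not an integer.
        src, dst, time = (int(field) for field in row)
        if last_time is not None and time < last_time:
            raise ValueError(f"line {line_no}: timestamp {time} < {last_time}")
        last_time = time
        edges.append((src, dst, time))
    return edges


# Small in-memory example: three edges with non-decreasing timestamps.
sample = io.StringIO("1,2,1\n1,3,1\n2,3,2\n")
edges = read_edges(sample)
```

For a file on disk, the same function can be called as read_edges(open(path, newline="")).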

Datasets

  1. DARPA: Original Format, MIDAS format
  2. TwitterWorldCup2014
  3. TwitterSecurity

Online Articles

  1. KDnuggets: Introducing MIDAS: A New Baseline for Anomaly Detection in Graphs
  2. Towards Data Science: Controlling Fake News using Graphs and Statistics
  3. Towards Data Science: Anomaly detection in dynamic graphs using MIDAS
  4. Towards AI: Anomaly Detection with MIDAS
  5. AIhub Interview

MIDAS in other Languages

  1. Golang by Steve Tan
  2. Ruby by Andrew Kane
  3. Rust by Scott Steele
  4. R by Tobias Heidler
  5. Python by Ritesh Kumar

Citation

If you use this code for your research, please consider citing our paper.

@article{bhatia2019midas,
  title={MIDAS: Microcluster-Based Detector of Anomalies in Edge Streams},
  author={Bhatia, Siddharth and Hooi, Bryan and Yoon, Minji and Shin, Kijung and Faloutsos, Christos},
  journal={arXiv preprint arXiv:1911.04464},
  year={2019}
}


Webpage https://www.comp.nus.edu.sg/~sbhatia/  ·  Email siddharth@comp.nus.edu.sg  ·  Twitter @siddharthb_
