
Anomaly Detectives

An in-depth approach to detecting significant real-time shifts in network performance that indicate network degradation. Building on the data generation process behind DANE and Viasat's network stats, we build a classification system that determines whether there are substantial changes to the packet loss rate and the degree of latency. Please visit our webpage for a more comprehensive view of this project.


To generate data for this project:

  1. Generate data using our modified fork of DANE

    • make, docker.io, and docker-compose are required on your machine to run modified_dane properly.
    • The --recursive flag is required to properly install modified_dane:
      git clone https://github.com/jenna-my/modified_dane --recursive
  2. Clone this repository:

    git clone https://github.com/LauraDiao/Anomaly_Detectives
    
  3. Place all raw DANE csv files in the data/raw directory of this repository. If the directory does not exist yet, run `python run.py` once to generate all relevant directories, as in the example below.
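For example, assuming modified_dane wrote its csv output to a sibling directory (the path below is a placeholder), setup could look like:

```sh
cd Anomaly_Detectives
python run.py                                # first run creates data/raw and the other directories
cp ../modified_dane/output/*.csv data/raw/   # placeholder path to your DANE output
```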


To use this repository:

All code is executed through run.py according to the targets specified below; each target implements a core feature of the repository. A minimal sketch of this dispatch pattern follows the target list.
Example call: python run.py data inference

Target List:

  • data: generates features from seen and unseen data
  • eda: generates the visualizations used to explore which features to include in the model
  • train: prints the performance of four models with varying architectures (decision tree, random forest, extra trees, and gradient boosting) evaluated on the training ("seen") data
  • inference: (deprecated) prints the performance of the same models evaluated on the testing ("unseen") data
  • clean: removes files generated by the other targets from commonly used output directories
  • test: verifies target functionality by running the data, eda, train, and inference targets on a subset of the original model training data
  • all: runs all targets except test
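For orientation, here is a minimal sketch of how this kind of target dispatch could be wired up; the function names are hypothetical stand-ins, and run.py itself is the source of truth:

```python
import sys

# Hypothetical stand-ins for the real target implementations in this
# repository; only the dispatch pattern is illustrated here.
def run_data():      print("generating features from seen and unseen data")
def run_eda():       print("building EDA visualizations")
def run_train():     print("training the four models")
def run_inference(): print("evaluating on unseen data")
def run_clean():     print("removing generated output files")
def run_test():      print("running targets on a small data subset")

TARGETS = {
    "data": run_data,
    "eda": run_eda,
    "train": run_train,
    "inference": run_inference,
    "clean": run_clean,
    "test": run_test,
}

def main(args):
    # "all" expands to every target except test
    targets = [t for t in TARGETS if t != "test"] if "all" in args else args
    for name in targets:
        TARGETS[name]()

if __name__ == "__main__":
    main(sys.argv[1:])
```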



Our modified version of DANE creates csv files named according to the following format:

datevalue_latency-loss-deterministic-laterlatency-laterloss-iperf.csv

e.g. 20220117T015822_200-100-true-200-10000-iperf.csv

This format is crucial: the model parses each filename to recover the proper training labels. A sketch of how the fields could be parsed follows.
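A minimal sketch, assuming the field layout shown above; the parse_labels helper is hypothetical, and the repository's own parsing code may differ:

```python
import re
from pathlib import Path

# Field layout: datevalue_latency-loss-deterministic-laterlatency-laterloss-iperf.csv
PATTERN = re.compile(
    r"(?P<date>\d{8}T\d{6})_"
    r"(?P<latency>\d+)-(?P<loss>\d+)-(?P<deterministic>true|false)-"
    r"(?P<later_latency>\d+)-(?P<later_loss>\d+)-iperf\.csv"
)

def parse_labels(path):
    match = PATTERN.fullmatch(Path(path).name)
    if match is None:
        raise ValueError(f"unexpected DANE filename: {path}")
    return match.groupdict()

print(parse_labels("20220117T015822_200-100-true-200-10000-iperf.csv"))
# {'date': '20220117T015822', 'latency': '200', 'loss': '100',
#  'deterministic': 'true', 'later_latency': '200', 'later_loss': '10000'}
```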

Configuration Files

eda.json

  • lst: [1, 2] - list of runs to compare side by side, plotted by plottogether() inside eda.py
  • filen1: "combined_subset_latency.csv" - subset of the processed data used for EDA
  • filen2: "combined_t_latency.csv" - features generated from the processed data
  • filen3: "combined_all_latency.csv" - all of the processed data
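Put together, the documented fields correspond to a file like the following; check your local eda.json for the exact values:

```json
{
    "lst": [1, 2],
    "filen1": "combined_subset_latency.csv",
    "filen2": "combined_t_latency.csv",
    "filen3": "combined_all_latency.csv"
}
```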

model.json

  • n_jobs: -1 - number of cores used for model training (-1 means all available cores)
  • train_window: 20 - number of seconds the model aggregates over for the training window
  • pca_components: 4 - number of PCA components; we determined 4 was optimal for our model
  • test_size: 0.005 - size of the model validation set (train/test split)
  • threshold: -0.15 - threshold for loss anomaly detection
  • emplosswindow: 25 - rolling window for aggregating empirical loss, set at 25 seconds
  • pct_change_window: 2 - number of seconds the anomaly detection system looks back when determining change
  • verbose: "True" - whether terminal output should be verbose; useful for debugging
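As a rough illustration of how these parameters fit together, the sketch below loads the config and flags seconds where the rolling empirical loss changes sharply. The comparison logic and the config path are assumptions, not the repository's exact pipeline:

```python
import json
import pandas as pd

# Assumption: model.json sits in the working directory.
with open("model.json") as f:
    cfg = json.load(f)

# Toy per-second empirical loss measurements.
loss = pd.Series([0.01] * 30 + [0.20] * 10)

# Rolling aggregation of empirical loss over emplosswindow seconds.
smoothed = loss.rolling(cfg["emplosswindow"]).mean()

# Percent change over the last pct_change_window seconds, flagged when it
# crosses the configured threshold in magnitude (an assumption).
change = smoothed.pct_change(periods=cfg["pct_change_window"])
anomalies = change.abs() > abs(cfg["threshold"])
print("anomalous seconds:", list(anomalies[anomalies].index))
```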
