An in-depth approach to detecting significant real-time shifts in network performance that indicate network degradation. Building on the data generation process behind DANE and Viasat's network statistics, we build a classification system that determines whether there are substantial changes to packet loss rate and degree of latency. Please visit our webpage for a more comprehensive view of this project.
- Generate data using our modified fork of DANE. `make`, `docker.io`, and `docker-compose` are required on your machine to run modified_dane properly. A recursive flag is required to properly install modified_dane:

  ```
  git clone https://github.com/jenna-my/modified_dane --recursive
  ```
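  If the repository was cloned without the flag, the submodules can usually be pulled in afterwards; this is generic git behavior rather than a project-specific step:

  ```
  cd modified_dane
  git submodule update --init --recursive
  ```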
- Clone this branch of the repository:

  ```
  git clone https://github.com/LauraDiao/Anomaly_Detectives
  ```
- Place all raw DANE csv files within the `data/raw` directory of this repository. If the directory has not been created, run `run.py` once to generate all relevant directories.
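For reference, a minimal sketch of the kind of directory bootstrapping a first `run.py` call performs; only `data/raw` is named above, so the other paths are illustrative assumptions.

```python
import os

# data/raw is documented above; the remaining paths are illustrative assumptions.
for d in ["data/raw", "data/temp", "data/out"]:
    os.makedirs(d, exist_ok=True)  # no-op if the directory already exists
```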
All code can be executed through `run.py` according to the targets specified below; each target implements a core feature of the repository.

Example call: `python run.py data inference`
- `data`: generates features from seen and unseen data
- `eda`: generates the visualizations used in exploring which features to use for the model
- `train`: prints results of model performance tested on training ("seen") data with four models of varying architectures: decision tree, random forest, extra trees, and gradient boosting (see the sketch after this list)
- `inference`: (deprecated) prints results of model performance tested on testing ("unseen") data with the same models
- `clean`: removes files generated by targets in commonly used output directories
- `test`: verifies target functionality by running the targets `data`, `eda`, `train`, and `inference` with a subset of the original model training data
- `all`: runs all targets except `test`
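As a rough illustration of what the `train` target compares, here is a minimal sketch using scikit-learn's implementations of the four architectures. The feature file path, the `label` column, and the scoring are illustrative assumptions rather than the repository's actual pipeline; the `test_size` and `n_jobs` values come from the configuration documented below.

```python
import pandas as pd
from sklearn.ensemble import (ExtraTreesClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical feature file and label column; the real pipeline lives in run.py.
df = pd.read_csv("data/temp/combined_t_latency.csv")
X, y = df.drop(columns=["label"]), df["label"]
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.005)

models = {
    "decision tree": DecisionTreeClassifier(),
    "random forest": RandomForestClassifier(n_jobs=-1),
    "extra trees": ExtraTreesClassifier(n_jobs=-1),
    "gradient boosting": GradientBoostingClassifier(),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(name, model.score(X_val, y_val))  # accuracy on the held-out split
```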
Our modified version of DANE creates csv files with a naming scheme in the following format:

```
datevalue_latency-loss-deterministic-laterlatency-laterloss-iperf.csv
```

e.g. `20220117T015822_200-100-true-200-10000-iperf.csv`

This format is crucial for the model to train on the proper labels.
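A minimal sketch of how such a filename could be split back into its label fields, assuming the hyphen-delimited layout above; the field names mirror the naming scheme rather than the repository's actual parsing code.

```python
def parse_dane_filename(name: str) -> dict:
    # e.g. "20220117T015822_200-100-true-200-10000-iperf.csv"
    datevalue, rest = name.split("_", 1)
    latency, loss, deterministic, later_latency, later_loss, _tool = (
        rest.removesuffix(".csv").split("-")
    )
    return {
        "datevalue": datevalue,
        "latency": int(latency),
        "loss": int(loss),
        "deterministic": deterministic == "true",
        "later_latency": int(later_latency),
        "later_loss": int(later_loss),
    }

print(parse_dane_filename("20220117T015822_200-100-true-200-10000-iperf.csv"))
```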
- `lst`: [1, 2] - list of runs to compare side by side, made by `plottogether()` inside of `eda.py`
- `filen1`: "combined_subset_latency.csv" - subset of the processed data used to make the EDA
- `filen2`: "combined_t_latency.csv" - features generated from processed data
- `filen3`: "combined_all_latency.csv" - all processed data
- `n_jobs`: -1 - number of cores used for model training
- `train_window`: 20 - number of seconds the model aggregates over for its training window size
- `pca_components`: 4 - number of components for PCA; we determined 4 was optimal for our model
- `test_size`: 0.005 - model validation set size (train/test split)
- `threshold`: -0.15 - threshold for loss anomaly detection
- `emplosswindow`: 25 - rolling-window aggregation of empirical loss, set at 25 seconds
- `pct_change_window`: 2 - how many seconds the anomaly detection system looks back when determining change (see the sketch after this list)
- `verbose`: "True" - whether terminal output should be verbose, for debugging purposes