Commit

Merge pull request #7 from MentatInnovations/dmo-review
Take one
canagnos committed Dec 22, 2017
2 parents 9308062 + eb7a060 commit 8489604
Showing 29 changed files with 763 additions and 1,283 deletions.
5 changes: 4 additions & 1 deletion .gitignore
@@ -134,4 +134,7 @@ vignettes/*.pdf

# Temporary files created by R markdown
*.utf8.md
*.knit.md

# VSCode
.vscode
Empty file added HISTORY.md
53 changes: 39 additions & 14 deletions README.md
@@ -1,22 +1,47 @@
# datastream.io
An open-source framework for real-time anomaly detection using Python, ElasticSearch and Kibana.

## MODULES
## Installation
The recommended installation method is to use pip within a Python 3.x virtualenv.

We will offer the following functionality for v0.1:
virtualenv --python=python3 dsio-env
source dsio-env/bin/activate
pip install -e git+https://github.com/MentatInnovations/datastream.io#egg=dsio
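You can confirm the environment is set up as described by running a few lines of Python from inside the activated virtualenv. This is a minimal sketch and is not part of dsio. It assumes the package installed by pip is importable as `dsio`, the name used by the `#egg=dsio` fragment and the CLI. The helper names are hypothetical.

```python
import importlib.util
import sys


def in_virtualenv():
    """True when running inside a venv/virtualenv.

    The stdlib venv and virtualenv >= 20 make sys.prefix differ from
    sys.base_prefix; legacy virtualenv set sys.real_prefix instead.
    """
    return hasattr(sys, "real_prefix") or sys.base_prefix != sys.prefix


def dsio_importable():
    """True if a module named `dsio` (assumed import name) can be found."""
    return importlib.util.find_spec("dsio") is not None


print("Python", sys.version.split()[0])
print("virtualenv active:", in_virtualenv())
print("dsio importable:", dsio_importable())
```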

### File listener
A utility that can listen to a folder, either local or in HDFS, and track new files or additions to existing files, which it then submits for ingestion to another service.
## Usage

### Re-streaming
A utility that can take an offline batch CSV file and re-stream it at realistic speeds to an ingestion service.
### Elasticsearch & Kibana

### Anomaly Detector Class Interface
An abstract interface for anomaly detection on CSV files, with the assumptions it makes about the data stated explicitly, to be generalised further in subsequent versions.
You need access to running Elasticsearch and Kibana 5.x instances in order to use dsio. If you don't already have them, you can easily start them on your machine using the docker-compose.yaml file in the examples directory. Docker and docker-compose must be installed for this to work.

### Basic Pipeline
- Setup Data Sample > Kibana dashboard generation
- Data Ingest > forked to:
  - DSIO > generates alerts > ES > Kibana
  - raw data write to ES > Kibana
cd dsio-env/src/dsio/examples
docker-compose up -d

Check that Elasticsearch and Kibana are up.

docker-compose ps
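If you prefer to check from Python instead of using docker-compose, you can poll the HTTP endpoints directly. The sketch below is not part of dsio. It assumes the default locations given in this README (http://localhost:9200/ and http://localhost:5601/app/kibana). It also reads the version from the JSON body of Elasticsearch's root endpoint so you can confirm a 5.x instance. The helpers `is_up`, `es_major_version` and `wait_until_up` are hypothetical names.

```python
import http.client
import json
import time
import urllib.request

# Default locations from this README; change them if you use --es-uri/--kibana-uri.
ES_URI = "http://localhost:9200/"
KIBANA_URI = "http://localhost:5601/app/kibana"


def is_up(url, timeout=2.0):
    """Return True if an HTTP GET to `url` completes with a status below 400."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status < 400
    except (OSError, ValueError, http.client.HTTPException):
        # URLError/HTTPError, refused connections and timeouts are OSErrors;
        # ValueError covers malformed URLs; HTTPException covers non-HTTP replies.
        return False


def es_major_version(root_body):
    """Parse the major version from the JSON returned by Elasticsearch's root endpoint."""
    return int(json.loads(root_body)["version"]["number"].split(".")[0])


def wait_until_up(urls=(ES_URI, KIBANA_URI), attempts=30, delay=2.0):
    """Poll until every URL responds, giving up after `attempts` rounds."""
    for _ in range(attempts):
        if all(is_up(url) for url in urls):
            return True
        time.sleep(delay)
    return False
```

With the defaults, `wait_until_up()` waits for roughly a minute before giving up. Once Elasticsearch responds, `es_major_version(urllib.request.urlopen(ES_URI).read())` should return 5 for a 5.x instance.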

Once you're done working with dsio, you can bring them down.

docker-compose down

Keep in mind that docker-compose commands need to be run in the directory where the docker-compose.yaml file resides (e.g. dsio-env/src/dsio/examples).

### Examples

You can use the example CSV datasets or provide your own. If the dataset includes a time dimension, dsio will attempt to detect it automatically. Alternatively, you can use the --timefield argument to specify the field that holds the time dimension manually. If no such field exists, dsio will assume the data is a time series starting from now, with 1-second intervals between samples.
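The README does not say how dsio detects the time field. The sketch below shows one plausible heuristic. It is an assumption, not dsio's actual logic. It picks the first column whose values all parse as ISO 8601 timestamps. If no column qualifies, it falls back to synthetic timestamps starting now at 1-second intervals, as described above. `detect_timefield` and `synthetic_timestamps` are hypothetical names, and `datetime.fromisoformat` requires Python 3.7+.

```python
import csv
import io
from datetime import datetime, timedelta


def detect_timefield(rows):
    """Return the first column whose values all parse as ISO 8601, else None.

    `rows` is a list of dicts, e.g. the output of csv.DictReader.
    """
    if not rows:
        return None
    for name in rows[0]:
        try:
            for row in rows:
                datetime.fromisoformat(row[name])
        except (TypeError, ValueError):
            continue  # this column is not a timestamp; try the next one
        return name
    return None


def synthetic_timestamps(n, start=None):
    """Fallback: n timestamps starting at `start` (default: now), 1 second apart."""
    if start is None:
        start = datetime.now()
    return [start + timedelta(seconds=i) for i in range(n)]


sample = io.StringIO("time,speed\n2017-12-22 10:00:00,12.5\n2017-12-22 10:00:01,13.0\n")
rows = list(csv.DictReader(sample))
print(detect_timefield(rows))  # → time
```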

dsio data/cardata_sample.csv

The above command will load the cardata sample CSV and use the default Quantile1D anomaly detector to score each numeric column. It will then generate an appropriate Kibana dashboard and re-stream the data to Elasticsearch. A browser window should open, pointing to the generated Kibana dashboard. Elasticsearch and Kibana are assumed to be running at their default locations, http://localhost:9200/ and http://localhost:5601/app/kibana respectively. You can customize these locations using the --es-uri and --kibana-uri arguments.
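Re-streaming replays a static file as if it were arriving live. The following is a minimal sketch of the idea. It assumes each row already carries a parsed `datetime` in its time field. The function waits between consecutive rows for the original gap, optionally compressed by a `speedup` factor, and passes each row to a `sink` callable. The `sink` stands in for indexing into Elasticsearch. Neither `restream` nor its parameters are dsio's API.

```python
import time
from datetime import datetime


def restream(rows, timefield, sink, speedup=1.0, sleep=time.sleep):
    """Replay `rows` in order, pausing for the original time gaps / `speedup`.

    `sink` receives each row (hypothetical stand-in for an Elasticsearch
    index call); `sleep` is injectable so the timing can be tested.
    """
    previous = None
    for row in rows:
        current = row[timefield]
        if previous is not None:
            gap = (current - previous).total_seconds() / speedup
            if gap > 0:
                sleep(gap)
        sink(row)
        previous = current
```

Injecting `sleep` keeps the function deterministic in tests. In real use, the default `time.sleep` paces the output.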

You can experiment with different datasets and anomaly detectors. For example:

dsio --detector gaussian1d data/kddup_sample.csv
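The README does not document the internals of `gaussian1d`. As an illustration only, a Gaussian 1-D detector could keep a running mean and variance, using Welford's online algorithm so that `update` works on streaming data. It would then score each value by how far into the tails of the fitted normal distribution the value falls. The class below is a hypothetical sketch, not dsio's implementation. Its scores range from 0 at the mean towards 1 for extreme values.

```python
import math


class Gaussian1DSketch:
    """Hypothetical Gaussian 1-D detector (illustration, not dsio's code)."""

    def __init__(self):
        self._reset()

    def _reset(self):
        self.n = 0        # number of values seen
        self.mean = 0.0   # running mean
        self.m2 = 0.0     # running sum of squared deviations from the mean

    def train(self, values):
        """Fit from scratch on a batch of values."""
        self._reset()
        self.update(values)

    def update(self, values):
        """Fold new values into the running statistics (Welford's algorithm)."""
        for x in values:
            self.n += 1
            delta = x - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (x - self.mean)

    def score(self, values):
        """Return P(|Z| < |z|) per value: 0 at the mean, approaching 1 in the tails."""
        std = math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0
        out = []
        for x in values:
            if std == 0.0:
                # Degenerate fit: anything different from the mean is maximally anomalous.
                out.append(0.0 if x == self.mean else 1.0)
            else:
                z = abs(x - self.mean) / std
                out.append(math.erf(z / math.sqrt(2.0)))
        return out
```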

### Defining your own anomaly detectors

You can use dsio with your own hand-coded anomaly detectors. These should inherit from the AnomalyDetector abstract base class and implement at least the train, update and score methods. You can find an example 99th percentile anomaly detector in the examples directory. Load the Python modules that contain your detectors using the --modules argument, and select the target detector by passing its class name (case insensitive) to the --detector argument.

dsio --modules detector.py --detector percentile data/cardata_sample.csv
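Below is a minimal sketch of such a detector, under stated assumptions. The real base class lives in dsio, but its import path and method signatures are not given here. The snippet therefore defines a stand-in `AnomalyDetector` ABC with the three methods the README names. Each method is assumed to take a sequence of values, and `score` is assumed to return one score per value. The `Percentile` class is hypothetical and is not the example shipped in the examples directory. Given the case-insensitive class-name matching described above, a class named `Percentile` is what `--detector percentile` would select.

```python
import math
from abc import ABC, abstractmethod


class AnomalyDetector(ABC):
    """Stand-in for dsio's base class (assumed interface: train/update/score)."""

    @abstractmethod
    def train(self, x): ...

    @abstractmethod
    def update(self, x): ...

    @abstractmethod
    def score(self, x): ...


class Percentile(AnomalyDetector):
    """Scores 1.0 for values above the q-th percentile of all data seen so far."""

    def __init__(self, q=0.99):
        self.q = q
        self.seen = []
        self.threshold = None

    def _refit(self):
        ordered = sorted(self.seen)
        # Nearest-rank percentile: smallest value with at least q of the data at or below it.
        rank = max(math.ceil(self.q * len(ordered)), 1)
        self.threshold = ordered[rank - 1]

    def train(self, x):
        self.seen = list(x)
        self._refit()

    def update(self, x):
        self.seen.extend(x)
        self._refit()

    def score(self, x):
        if self.threshold is None:
            raise RuntimeError("call train() before score()")
        return [1.0 if v > self.threshold else 0.0 for v in x]
```

Keeping every value makes `update` simple but memory grows with the stream. A production detector would more likely use a bounded window or an approximate quantile sketch.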

76 changes: 0 additions & 76 deletions arxiv/config_json_car.py

This file was deleted.

76 changes: 0 additions & 76 deletions arxiv/config_json_cyber.py

This file was deleted.

1 change: 0 additions & 1 deletion arxiv/dashboard_example.json

This file was deleted.

82 changes: 0 additions & 82 deletions arxiv/main.py

This file was deleted.
