Initial commit
tatbul committed Nov 30, 2018
0 parents commit 3d88d30
Showing 55 changed files with 468,614 additions and 0 deletions.
7 changes: 7 additions & 0 deletions AUTHORS
@@ -0,0 +1,7 @@
The original authors of this software package are:

Nesime Tatbul, Intel Labs, nesime.tatbul@intel.com
Justin Gottschlich, Intel Labs, justin.gottschlich@intel.com

Future contributors can add their names here:

9 changes: 9 additions & 0 deletions LICENSE
@@ -0,0 +1,9 @@
The MIT License

Copyright (c) 2018 Intel Corporation

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
124 changes: 124 additions & 0 deletions README.md
@@ -0,0 +1,124 @@

# TSAD-Evaluator

This software package, publicly released under the MIT License, implements the customizable evaluation model for time series anomaly detection presented in the following paper:

**"Precision and Recall for Time Series"**, Nesime Tatbul, Tae Jun Lee, Stan Zdonik, Mejbah Alam, Justin Gottschlich, 32nd Annual Conference on Neural Information Processing Systems (NeurIPS'18), Montreal, Canada, December 2018. (https://arxiv.org/abs/1803.03639/)

## Building

```
cd src
make clean
make
```

## Running

There are two alternative ways to run TSAD-Evaluator:

```
./evaluate {-v} [-c | -t | -n] <real_data_file> <predicted_data_file>
```

```
./evaluate {-v} [-c | -t | -n] <real_data_file> <predicted_data_file> <beta> <alpha_r> <gamma> <delta_p> <delta_r>
```

Here is a description of all command line options, inputs, and parameters:

```
-v                    : Produce verbose output.
-c                    : Compute classical metrics.
-t                    : Compute time series metrics.
-n                    : Compute numenta-like metrics.
<real_data_file>      : File with real data labels.
<predicted_data_file> : File with predicted data labels.
<beta>                : F-Score parameter (relative importance of Recall vs. Precision).
                        Positive real number, Default = 1, Most common = 1.
<alpha_r>             : Relative weight of existence reward for Recall.
                        Real number in [0 .. 1], Default = 0, Most common = 0.
<gamma>               : Customizable overlap cardinality function for Precision&Recall.
                        Values = {one, reciprocal, udf_gamma}.
                        Default = one, Most common = reciprocal.
<delta_p>             : Customizable positional bias function for Precision.
                        Values = {flat, front, middle, back, udf_delta}.
                        Default = flat, Most common = flat.
<delta_r>             : Customizable positional bias function for Recall.
                        Values = {flat, front, middle, back, udf_delta}.
                        Default = flat, Most common = {flat, front, back}.
```
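
As a point of reference for `<beta>`, the following is a minimal sketch of the textbook F-beta formula, the weighted harmonic mean of Precision and Recall. It is not code taken from TSAD-Evaluator; the function name `f_beta` and the zero-denominator convention are assumptions made for illustration.

```cpp
#include <cassert>

// Textbook F-beta score: the weighted harmonic mean of precision and recall.
// beta > 1 weights recall more heavily; beta < 1 weights precision more heavily.
// Hypothetical helper for illustration, not part of TSAD-Evaluator.
double f_beta(double precision, double recall, double beta) {
    assert(beta > 0.0);  // <beta> is documented as a positive real number
    const double b2 = beta * beta;
    const double denom = b2 * precision + recall;
    if (denom == 0.0) {
        return 0.0;  // convention chosen here for precision == recall == 0
    }
    return (1.0 + b2) * precision * recall / denom;
}
```

With a precision of 1.0 and a recall of 0.5, `f_beta(1.0, 0.5, 2.0)` is about 0.556, while `f_beta(1.0, 0.5, 0.5)` is about 0.833, which shows how `<beta>` shifts the balance between the two.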

When no parameters are specified (as in the first usage above), the following default values are used:

```
./evaluate {-v} [-c | -t | -n] <real_data_file> <predicted_data_file> 1 0 one flat flat
```

To produce verbose output (i.e., to print the list of all real and predicted anomaly ranges), please use the `-v` option.

Note that `-v` is optional, whereas a metric option (`-c`, `-t`, or `-n`) must always be specified.

When the `-c` option is used, both real and predicted anomalies are represented as unit-size intervals (i.e., as points), and one of the following parameter settings is expected:

```
./evaluate -c <real_data_file> <predicted_data_file> <beta> 0 one flat flat
```

```
./evaluate -c <real_data_file> <predicted_data_file> <beta> 1 x x x
```

In the second usage above, `x` means that the value of that parameter does not matter.
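
For orientation, classical point-based Precision and Recall over two 0/1 label sequences can be sketched as below. This is the standard counting of true positives, false positives, and false negatives, not the package's own code; `PointScores` and `classical_scores` are hypothetical names, and returning 0 for an empty denominator is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct PointScores {
    double precision;
    double recall;
};

// Classical point-based precision and recall over two equally long 0/1 label
// sequences. Hypothetical helper for illustration, not part of TSAD-Evaluator.
PointScores classical_scores(const std::vector<int>& real,
                             const std::vector<int>& predicted) {
    assert(real.size() == predicted.size());
    std::size_t tp = 0, fp = 0, fn = 0;
    for (std::size_t t = 0; t < real.size(); ++t) {
        if (predicted[t] == 1 && real[t] == 1) ++tp;       // correctly flagged point
        else if (predicted[t] == 1 && real[t] == 0) ++fp;  // false alarm
        else if (predicted[t] == 0 && real[t] == 1) ++fn;  // missed anomaly
    }
    PointScores s;
    // Assumption: report 0 when a denominator is empty.
    s.precision = (tp + fp == 0) ? 0.0 : static_cast<double>(tp) / (tp + fp);
    s.recall = (tp + fn == 0) ? 0.0 : static_cast<double>(tp) / (tp + fn);
    return s;
}
```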

When the `-t` option is used, both real and predicted anomalies are represented as ranges, and the parameters can be set as required by the application. For example:

```
./evaluate -t examples/simple/simple.real examples/simple/simple.pred 1 0 reciprocal flat front
```
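
The range representation can be illustrated with a small sketch that collapses each maximal run of 1-labels into one inclusive `<start,end>` range. The names `Timestamp`, `Range`, and `labels_to_ranges` are hypothetical, and counting timestamps from 0 is an assumption; the package's own parsing code may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical types for illustration; the package's own definitions may differ.
using Timestamp = long;
using Range = std::pair<Timestamp, Timestamp>;  // <start,end>, inclusive on both ends

// Collapse each maximal run of 1-labels into one inclusive range.
// Assumes labels[t] is the label at timestamp t, counting from 0.
std::vector<Range> labels_to_ranges(const std::vector<int>& labels) {
    std::vector<Range> ranges;
    bool in_run = false;
    Timestamp start = 0;
    for (std::size_t t = 0; t < labels.size(); ++t) {
        assert(labels[t] == 0 || labels[t] == 1);
        if (labels[t] == 1 && !in_run) {
            in_run = true;  // a new anomalous run begins here
            start = static_cast<Timestamp>(t);
        } else if (labels[t] == 0 && in_run) {
            in_run = false;  // the run ended at the previous timestamp
            ranges.emplace_back(start, static_cast<Timestamp>(t) - 1);
        }
    }
    if (in_run) {  // a run that extends to the last label
        ranges.emplace_back(start, static_cast<Timestamp>(labels.size()) - 1);
    }
    return ranges;
}
```

Under these assumptions, the labels `0,1,1,1,0,0,1,1` yield the vector `[<1,3>,<6,7>]`.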

When the `-n` option is used, the predicted anomaly intervals are represented as unit-size intervals (i.e., as points) and the real anomaly intervals are represented as ranges, and one of the following parameter settings is expected:

```
./evaluate -n <real_data_file> <predicted_data_file> 1 0 one flat front
```

```
./evaluate -n <real_data_file> <predicted_data_file> 0.5 0 one flat front
```

```
./evaluate -n <real_data_file> <predicted_data_file> 2 0 one flat front
```

The first setting mimics Numenta-Standard, the second mimics Numenta-Reward-Low-FP, and the third mimics Numenta-Reward-Low-FN.

## Additional Usage Notes

+ `<real_data_file>` and `<predicted_data_file>` are CSV files with 0/1 anomaly labels that correspond to each timestamp. In v1.0, these files contain only the labels, simply assuming label=1 at line=t indicates the presence of an anomaly at timestamp=t. More sophisticated input file formats can be supported in future releases. For example input files, please see: `examples/*/*.real` and `examples/*/*.pred`.

+ An anomaly range is represented as a pair of timestamps `<start,end>`. In v1.0, we simply assume timestamps are integers `>= 0`. For example, `<1,5>` denotes the time period from 1 to 5, inclusive on both ends.

+ A collection of anomaly ranges is represented as a vector of anomaly ranges, ordered by ascending timestamps. For example, `[<1,5>,<10,12>]` denotes a vector of two anomalous ranges.

+ The application developer can provide user-defined functions for `<gamma>`, `<delta_p>`, and `<delta_r>` if the pre-defined choices are not sufficient for their purpose. To do this, select `udf_gamma` and/or `udf_delta` for the corresponding command line arguments above, then define the custom functions in `evaluator.cpp` by filling in the following templates provided in that file and rebuild:

```
double evaluator::udf_gamma_def(int overlap_count, e_metric m)
double evaluator::udf_delta_def(timestamp t, timestamp anomaly_length, e_metric m)
```
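
To give a sense of what such functions might compute, here is a standalone sketch that mirrors the parameter lists of the two templates. Everything in it is an assumption made for illustration: the stand-in definitions of `timestamp` and `e_metric`, the `_sketch` function names, the choice of formulas, and the reading of `t` as a 1-based position within the anomaly range. The actual bodies belong in the member functions shown above.

```cpp
#include <cassert>
#include <cmath>

// Stand-ins so the sketch compiles on its own; the real `timestamp` and
// `e_metric` types come from the package and may be defined differently.
using timestamp = long;
enum class e_metric { precision, recall };  // hypothetical enumerators

// Example cardinality function: decays more gently than a reciprocal,
// 1/overlap_count, would. Stays in (0, 1] for overlap_count >= 1.
double udf_gamma_sketch(int overlap_count, e_metric /*m*/) {
    assert(overlap_count >= 1);
    return 1.0 / std::sqrt(static_cast<double>(overlap_count));
}

// Example positional bias: quadratic emphasis on the front of the range.
// Assumes t is a 1-based position with 1 <= t <= anomaly_length.
double udf_delta_sketch(timestamp t, timestamp anomaly_length, e_metric /*m*/) {
    assert(t >= 1 && t <= anomaly_length);
    const double w = static_cast<double>(anomaly_length - t + 1);
    return w * w;
}
```

In this sketch the metric argument is ignored; a real implementation could branch on it to use different behavior for Precision and Recall.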

## References

+ Paper: https://arxiv.org/abs/1803.03639/
+ Poster: https://people.csail.mit.edu/tatbul/talks/NeurIPS18_poster.pdf
+ Slides: https://people.csail.mit.edu/tatbul/talks/NeurIPS18_spotlight.pdf
+ Short Video: https://www.youtube.com/watch?v=K5f-dUBiQP4

## Contact

+ Nesime Tatbul, Intel Labs, nesime.tatbul@intel.com
+ Justin Gottschlich, Intel Labs, justin.gottschlich@intel.com
