Add new multivariate dataset to pipeline and test in notebook #72

Chaoste · 2018-06-12T15:53:04Z

Addresses #40

~~Comments are missing~~
Pollution is only approximate: the pollution param sets the sampling probabilities passed to random.choice, so the actual share of anomalies varies between runs.
~~Code might be confusing -> I'll refactor it with the purpose of reusing it for other types of MV outliers.~~
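
The pollution behaviour noted above can be sketched as follows. This is a minimal illustration, assuming each curve is flagged as anomalous independently through a weighted `random.choice` draw; the names `sample_anomaly_flags`, `n_curves` and `seed` are hypothetical and do not come from this PR.

```python
import numpy as np


def sample_anomaly_flags(n_curves, pollution, seed=None):
    """Flag each curve as anomalous with probability `pollution` (hypothetical helper)."""
    rng = np.random.RandomState(seed)
    # Each draw is independent, so the realised share of anomalies
    # matches `pollution` only in expectation, not exactly.
    return rng.choice([True, False], size=n_curves, p=[pollution, 1 - pollution])


flags = sample_anomaly_flags(n_curves=20, pollution=0.1, seed=0)
print(f'{flags.sum()} of {len(flags)} curves are anomalous')
```

If an exact anomaly count were required, sampling a fixed number of curve indices without replacement would be one alternative, at the cost of changing the dataset's semantics.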

UPDATE:

Only supports two dimensions right now
SyntheticMultivariateDataset is the class for generating any multivariate anomaly
For anomaly examples see the "dim2" functions defined in the same file as the class.
The class is now able to handle some parameters for configuring the dataset
Code is commented as well as possible

Already implemented anomalies:

doubled_dim2 (Anomaly: not doubled but quadrupled values)
inversed_dim2 (Anomaly: values are inverted and doubled instead of just doubled)
shrinked_dim2 (Anomaly: curve has half the length of the original curve)
delayed_dim2 (Anomaly: curve is delayed by 10% of the curve length)
xor_dim2 (Anomaly: the curve occurs in both dimensions at the same time, or at least overlaps)
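
As a rough illustration of the interface these functions share, here is a minimal sketch based on the signature and return shape visible in the review snippets: `curve_values, anomalous, interval_length` go in, and a triple of values plus anomaly start and end indices comes out, with `-1, -1` for non-anomalous curves. The function bodies, the `_sketch` names, and the assumption that `curve_values` is a 1-D NumPy array are illustrative guesses, not the code from this PR.

```python
import numpy as np


def doubled_dim2_sketch(curve_values, anomalous, interval_length):
    """Second dimension is twice the first; the anomaly quadruples it instead."""
    # `interval_length` is accepted for interface compatibility but unused here.
    if not anomalous:
        return curve_values * 2, -1, -1
    # Assumption: the whole curve is labelled as anomalous.
    return curve_values * 4, 0, len(curve_values)


def delayed_dim2_sketch(curve_values, anomalous, interval_length):
    """Second dimension copies the first; the anomaly delays it by 10% of the curve."""
    if not anomalous:
        return curve_values, -1, -1
    delay = max(1, len(curve_values) // 10)
    # Prepend zeros so the curve starts `delay` timestamps later.
    delayed = np.concatenate([np.zeros(delay), curve_values])
    return delayed, delay, len(delayed)


curve = np.sin(np.linspace(0, np.pi, 20))
values, start, end = delayed_dim2_sketch(curve, anomalous=True, interval_length=30)
```

Keeping one signature for every anomaly type lets the dataset class treat the anomaly function as a plug-in parameter.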

WGierke · 2018-06-18T08:50:50Z

src/datasets/synthetic_multivariate_dataset.py

+from .dataset import Dataset
+
+"""
+TODO:


Maybe add an issue about this (with some more details)?

Deprecated comment removed

WGierke · 2018-06-18T08:55:22Z

src/datasets/synthetic_multivariate_dataset.py

+"""
+
+
+def get_random(x, strength=1):


Maybe "add_scaled_random" or something would be a more suitable method name?

maxifischer

LGTM

WGierke · 2018-06-19T13:08:52Z

Nice! Do you mind adding the datasets to main.py?

WGierke · 2018-06-19T13:12:55Z

src/datasets/synthetic_multivariate_dataset.py

+        self.mean_curve_length = mean_curve_length
+        self.mean_curve_amplitude = mean_curve_amplitude
+        self.global_noise = 0.1  # Noise added to all dimensions over the whole timeseries
+        self.dim2 = dim2


Why "dim2"?

Might be better to call it "anomaly_function". At the moment there are only 2D anomalies; another PR will address that.

WGierke · 2018-06-19T13:14:31Z

src/datasets/synthetic_multivariate_dataset.py

+# The last two values are ignored for generation of not anomalous data
+
+
+def doubled_dim2(curve_values, anomalous, interval_length):


In the next 4 methods you're not using `interval_length`. I think it'd make sense to rename it to `_` to indicate that it is not used.
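
A minimal sketch of the suggested rename, assuming the anomaly functions are always called positionally (a keyword call with `interval_length=...` would stop working after the rename). The body is placeholder logic for illustration, not the PR's implementation.

```python
def doubled_dim2(curve_values, anomalous, _):
    """`_` signals that the third positional argument is accepted but ignored."""
    factor = 4 if anomalous else 2  # placeholder logic for illustration
    return [value * factor for value in curve_values]


# Callers keep passing three positional arguments, so every anomaly
# function can still be invoked through the same interface.
result = doubled_dim2([1, 2, 3], False, 100)
```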

WGierke · 2018-06-19T13:16:05Z

src/datasets/synthetic_multivariate_dataset.py

+    if not anomalous:
+        return curve_values, -1, -1
+    else:
+        # The curve in the second dimension occures a few timestamps later


WGierke · 2018-06-19T13:17:30Z

src/datasets/synthetic_multivariate_dataset.py

+        # Add anomaly labels with slight padding (dont start with the first interval value).
+        # The padding is curve_length / padding_factor
+        if create_anomaly:
+            assert end > start and start >= 0, f'Invalid anomaly indizes: {start} to {end}'


You can simplify that to `assert end > start >= 0`.

Also: the spelling is "indices", not "indizes".
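
For reference, a small self-contained sketch of the chained comparison. The wrapper function `check_anomaly_indices` is hypothetical; in the PR the assertion sits inline in the dataset generation code.

```python
def check_anomaly_indices(start, end):
    """Raise AssertionError unless 0 <= start < end."""
    # Chained comparison: equivalent to `end > start and start >= 0`,
    # with `start` evaluated only once.
    assert end > start >= 0, f'Invalid anomaly indices: {start} to {end}'


check_anomaly_indices(3, 10)  # passes silently
```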

WGierke · 2018-06-20T08:09:42Z

Do you mind uncommenting run_experiments() in main.py so the experiments are actually run when main.py is executed?

maxifischer and others added 24 commits June 5, 2018 16:12

rename special variable input to input_vars

c04661a

shift binarized values by len_out-len_in

11d85bb

lower threshold

609499e

Merge branch 'master' into fix/lstmad_threshold

d5163d9

Remove scaling factor

11f8fad

Fix padding

d65c46e

Merge remote-tracking branch 'origin/master' into fix/lstmad_threshold

1cf13fa

# Conflicts: # src/algorithms/lstm_ad.py

fix binarize function, adjusted threshold

a165731

add LSTMAD to CircleCI

4bb1294

Merge branch 'fix/lstmad_threshold' of https://github.com/KDD-OpenSource/MP-2018 into fix/lstmad_threshold

754fc45

add get_optimal_threshold function to call in benchmarks binarize

5efad36

Merge branch 'master' into feature/dynamic_thresholds

3853d15

fix build

3988733

rename, refactor threshold plots

e584e73

rename th

20b7864

lint

2c62a85

lint more

a382a88

Merge branch 'master' into feature/dynamic_thresholds

d822e81

add differing extreme outlier experiment

7f1d7bb

merge master & add extremeness experiment

aabc9c1

fix merge conflicts

46bd3e0

lint

e9a4d16

Merge branch 'feature/dynamic_thresholds' into extremeness_experiment

149d6fb

Add new multivariate dataset to pipeline and test in notebook

873e229

Chaoste added the enhancement New feature or request label Jun 12, 2018

Chaoste self-assigned this Jun 12, 2018

Chaoste added this to To do in MP via automation Jun 12, 2018

Merge remote-tracking branch 'origin/master' into feature-40/func-dependencies

8dde2db

Chaoste moved this from To do to In progress in MP Jun 12, 2018

merge master

61083c6

Chaoste added 4 commits June 15, 2018 23:38

Fix merge conflicts in main py

0de017b

Add param for pause length

13e1cd8

Prettify multivariate dataset before adding more features

632df15

Working MUltivariate anomaly timeseries

c0919cb

WGierke reviewed Jun 18, 2018

Chaoste added 3 commits June 18, 2018 14:32

Plots figures in notebook, update code

d246bde

flake

43bcdd6

PR Review

03c8fa7

maxifischer approved these changes Jun 19, 2018

WGierke reviewed Jun 19, 2018

maxifischer added 3 commits June 19, 2018 15:20

unused parameter to _

90f174f

renaming

7ba7379

refactored anomaly functions in new class & created experiment run with these

f57569b

maxifischer added 7 commits June 20, 2018 11:12

comment run_experiments in

e8c4773

merge master

d52df1f

flake8

58b9441

clean run_pipeline and comment in

1b75b00

refactor run_experiments into main, move experiments to base dir

6ad1bb9

add CircleCI option for experiments

096aeef

flake8

9cd3da1

WGierke merged commit 09d7010 into master Jun 20, 2018

MP automation moved this from In progress to Done Jun 20, 2018

WGierke deleted the feature-40/func-dependencies branch June 20, 2018 11:50

WGierke mentioned this pull request Jun 20, 2018

define algorithms including params in a global function #84

Closed
