Add new multivariate dataset to pipeline and test in notebook #72

Merged
merged 55 commits into from
Jun 20, 2018

Conversation

Contributor

@Chaoste Chaoste commented Jun 12, 2018

Addresses #40

  • Comments are missing
  • Pollution is a bit random (pollution param defines the sampling of random.choice)
  • Code might be confusing -> I'll refactor it with the purpose of reusing it for other types of MV outliers.
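Why the pollution is "a bit random" can be shown with a small sketch. It assumes (this is not taken from the PR's code) that each curve is flagged as anomalous independently by sampling with `numpy`'s `choice` and the probabilities `[1 - pollution, pollution]`. The helper name `sample_anomaly_flags` is hypothetical.

```python
import numpy as np


def sample_anomaly_flags(n_curves, pollution, seed=None):
    """Hypothetical helper: flag each curve as anomalous with probability `pollution`."""
    rng = np.random.RandomState(seed)
    # Every curve is drawn independently, so the realised share of anomalies
    # matches `pollution` only in expectation, not exactly.
    return rng.choice([False, True], size=n_curves, p=[1 - pollution, pollution])


flags = sample_anomaly_flags(n_curves=20, pollution=0.1, seed=0)
print(f"requested pollution: 0.1, realised: {flags.mean():.2f}")
```

With few curves the realised share can differ noticeably from the requested one. Drawing a fixed number of anomalous curves instead would make the pollution exact, at the cost of no longer treating curves independently.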

UPDATE:

  • Only supports two dimensions right now
  • SyntheticMultivariateDataset is the class for generating any multivariate anomaly
  • For anomaly examples see the "dim2" functions defined in the same file as the class.
  • The class is now able to handle some parameters for configuring the dataset
  • Code is commented as well as possible

Already implemented anomalies:

  • doubled_dim2 (Anomaly: values are quadrupled instead of doubled)
  • inversed_dim2 (Anomaly: values are inverted and doubled instead of just doubled)
  • shrinked_dim2 (Anomaly: curve has half the length of the original curve)
  • delayed_dim2 (Anomaly: curve is delayed by 10% of the curve length)
  • xor_dim2 (Anomaly: curves occur in both dimensions at the same time, or at least overlap)
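To make the first two entries concrete, here is a minimal sketch. It is not the PR's implementation: it reuses the signature `(curve_values, anomalous, interval_length)` visible in the diff and assumes the return value is `(values, anomaly_start, anomaly_end)` with `-1, -1` for normal data, that "inversed" means negated, and that the anomaly label spans the whole curve. The `_sketch` names are hypothetical.

```python
import numpy as np


def doubled_dim2_sketch(curve_values, anomalous, interval_length):
    """Second dimension is 2x the first; the anomaly is 4x instead."""
    factor = 4 if anomalous else 2
    values = factor * np.asarray(curve_values, dtype=float)
    if not anomalous:
        # Assumed convention: the last two values are ignored for normal data
        return values, -1, -1
    # Assumption: the whole curve is labelled as anomalous
    return values, 0, len(values)


def inversed_dim2_sketch(curve_values, anomalous, interval_length):
    """Second dimension is 2x the first; the anomaly is -2x instead."""
    factor = -2 if anomalous else 2
    values = factor * np.asarray(curve_values, dtype=float)
    if not anomalous:
        return values, -1, -1
    return values, 0, len(values)


curve = np.sin(np.linspace(0, np.pi, 5))
normal, _, _ = doubled_dim2_sketch(curve, False, None)
anomaly, start, end = doubled_dim2_sketch(curve, True, None)
```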

@Chaoste Chaoste added the enhancement New feature or request label Jun 12, 2018
@Chaoste Chaoste self-assigned this Jun 12, 2018
@Chaoste Chaoste added this to To do in MP via automation Jun 12, 2018
@Chaoste Chaoste moved this from To do to In progress in MP Jun 12, 2018
from .dataset import Dataset

"""
TODO:
Contributor

Maybe add an issue about this (with some more details)?

Contributor Author

Deprecated comment removed

"""


def get_random(x, strength=1):
Contributor

Maybe "add_scaled_random" or something would be a more suitable method name?

Contributor

@maxifischer maxifischer left a comment

LGTM

Contributor

WGierke commented Jun 19, 2018

Nice! Do you mind adding the datasets to main.py?

self.mean_curve_length = mean_curve_length
self.mean_curve_amplitude = mean_curve_amplitude
self.global_noise = 0.1 # Noise added to all dimensions over the whole timeseries
self.dim2 = dim2
Contributor

Why "dim2"?

Contributor

It might be better to call it "anomaly_function". At the moment there are only 2D anomalies; another PR will address that.
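A rough sketch of the rename, for illustration only: the class below is a reduced stand-in for `SyntheticMultivariateDataset`, the default values are placeholders that do not come from the PR, and `identity_anomaly` is a hypothetical function following the signature shown in the diff.

```python
class SyntheticMultivariateDatasetSketch:
    """Reduced stand-in for the PR's class; only the constructor is sketched."""

    def __init__(self, anomaly_function, mean_curve_length=40, mean_curve_amplitude=1):
        # Default values are placeholders, not taken from the PR.
        self.mean_curve_length = mean_curve_length
        self.mean_curve_amplitude = mean_curve_amplitude
        self.global_noise = 0.1  # Noise added to all dimensions over the whole timeseries
        # Renamed from `dim2`: the callable that shapes the dependent dimension.
        self.anomaly_function = anomaly_function


def identity_anomaly(curve_values, anomalous, interval_length):
    """Trivial placeholder: returns the curve unchanged and no anomaly bounds."""
    return curve_values, -1, -1


dataset = SyntheticMultivariateDatasetSketch(anomaly_function=identity_anomaly)
values, start, end = dataset.anomaly_function([0.0, 1.0, 0.0], False, 3)
```

The name states what the attribute does rather than how many dimensions it currently supports, so it would not need to change once anomalies for more dimensions are added.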

# The last two values are ignored for generation of not anomalous data


def doubled_dim2(curve_values, anomalous, interval_length):
Contributor

In the next 4 methods you're not using interval_length. I think it'd make sense to rename it to _ to indicate that it is not used.
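The suggested rename might look like the sketch below. The function name and body are placeholders, not the PR's code; only the signature matters here.

```python
def anomaly_without_interval(curve_values, anomalous, _):
    """Hypothetical anomaly function that ignores its third argument."""
    if not anomalous:
        return curve_values, -1, -1
    return curve_values, 0, len(curve_values)


# Positional callers are unaffected by the rename
values, start, end = anomaly_without_interval([0.0, 1.0, 0.0], True, 3)

# Keyword callers would break, because the parameter is no longer
# called `interval_length`
try:
    anomaly_without_interval([0.0, 1.0, 0.0], True, interval_length=3)
except TypeError as error:
    print(f"keyword call fails: {error}")
```

If any caller passes the argument by keyword, a leading-underscore name such as `_interval_length` signals the same intent while staying more descriptive, though such callers would still need updating.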

if not anomalous:
    return curve_values, -1, -1
else:
    # The curve in the second dimension occures a few timestamps later
Contributor

occurs

# Add anomaly labels with slight padding (dont start with the first interval value).
# The padding is curve_length / padding_factor
if create_anomaly:
    assert end > start and start >= 0, f'Invalid anomaly indizes: {start} to {end}'
Contributor

You can simplify that to assert end > start >= 0

Contributor

Also: indices
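Both suggestions from this thread can be combined as sketched below. Python evaluates the chained comparison `end > start >= 0` as `(end > start) and (start >= 0)`, so the behaviour is unchanged. The wrapper function `check_anomaly_indices` is hypothetical and exists only to make the snippet runnable on its own.

```python
def check_anomaly_indices(start, end):
    """Hypothetical wrapper around the assertion from the diff."""
    # Chained comparison: equivalent to `end > start and start >= 0`
    assert end > start >= 0, f'Invalid anomaly indices: {start} to {end}'


check_anomaly_indices(2, 5)  # valid range, passes silently

# Negative start, reversed range, and empty range are all rejected
for start, end in [(-1, 5), (5, 2), (3, 3)]:
    try:
        check_anomaly_indices(start, end)
    except AssertionError as error:
        print(error)
```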

Contributor

WGierke commented Jun 20, 2018

Do you mind uncommenting run_experiments() in main.py so the experiments are actually run when main.py is executed?

@WGierke WGierke merged commit 09d7010 into master Jun 20, 2018
MP automation moved this from In progress to Done Jun 20, 2018
@WGierke WGierke deleted the feature-40/func-dependencies branch June 20, 2018 11:50