
Why are there only three anomaly categories (normal, weak anomaly, strong anomaly)? Isn't that limiting?

Most, if not all, anomaly detection algorithms work by assigning anomaly scores to data points in some algorithm-specific way. In many cases these scores are continuous, so it may seem limiting to discretize them into three categories.

This isn't necessarily the case, though. The first thing to keep in mind is that anomaly detection passes the anomaly score along to downstream processes, so anyone who cares about the actual score has access to it.

The other thing to keep in mind is the main point of having the classification in the first place: we're trying to provide a hint to the anomaly validation process. And there seem to be three logical possibilities, corresponding to the three categories:

  • we're pretty sure it's not an anomaly (normal),
  • we're pretty sure that it is an anomaly (strong), and
  • we're not sure whether it's an anomaly (weak).

The classification drives validator behavior. If we're confident that the data point isn't an anomaly, there's no point in processing it further. If we're confident that it is an anomaly, then we're going to fire off an alert even if the validator can't find any underlying cause. And if we don't know whether it's an anomaly, then the validator is going to investigate a bit further so we can decide whether to generate an alert.
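To make this concrete, here's a minimal Java sketch of how a validator might dispatch on the three categories. The enum, class, and method names below are hypothetical and illustrative only; they are not the actual Adaptive Alerting API.

```java
// Illustrative only -- hypothetical names, not the actual Adaptive Alerting API.
enum AnomalyLevel { NORMAL, WEAK, STRONG }

public class Validator {

    public void validate(double score, AnomalyLevel level) {
        switch (level) {
            case NORMAL:
                // Confident it's not an anomaly: no further processing.
                break;
            case STRONG:
                // Confident it is an anomaly: alert even if we can't find an underlying cause.
                fireAlert(score);
                break;
            case WEAK:
                // Not sure: investigate further before deciding whether to alert.
                if (investigationSupportsAnomaly(score)) {
                    fireAlert(score);
                }
                break;
        }
    }

    private boolean investigationSupportsAnomaly(double score) {
        // Placeholder for deeper investigation (e.g., checking related metrics).
        return false;
    }

    private void fireAlert(double score) {
        // Placeholder for alert generation.
    }
}
```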

Ideally, anomaly scores would reduce to continuous probabilities, and we could decide to fire alerts based on crossing some probability threshold. In practice, though, most anomaly scores aren't probabilities and don't directly reduce to probabilities. And the validator needs to decide what to do either way. The classification helps with that.


Some of the time series models use machine learning, but the anomaly classifier just uses manually specified rules (e.g., anomaly if too many sigmas away from the mean). Can't we use machine learning to train the classifier too? Can we somehow incorporate user feedback about classifications in that training?

We would very much like to do this. One challenge we need to address is that the labels for classifier training mostly don't exist. There may be algorithms that can take advantage of partially labeled data; see, for example, this paper. But something simple we can likely do is auto-adjust alerting sensitivity in response to user feedback.
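As a rough illustration of both ideas (a sigma-based rule plus feedback-driven sensitivity adjustment), here's a hedged Java sketch. It reuses the AnomalyLevel enum from the sketch above; the class, thresholds, and feedback hook are hypothetical and not taken from the actual classifier implementation.

```java
// Hypothetical sketch: a sigma-based classifier whose thresholds shift in
// response to user feedback. Not the actual Adaptive Alerting implementation.
public class SigmaClassifier {

    // Example defaults: "weak" beyond 3 sigmas, "strong" beyond 4.
    private double weakSigmas = 3.0;
    private double strongSigmas = 4.0;

    public AnomalyLevel classify(double value, double mean, double stdDev) {
        double sigmas = Math.abs(value - mean) / stdDev;
        if (sigmas >= strongSigmas) return AnomalyLevel.STRONG;
        if (sigmas >= weakSigmas) return AnomalyLevel.WEAK;
        return AnomalyLevel.NORMAL;
    }

    // Auto-adjust sensitivity: user feedback nudges the thresholds.
    public void onFeedback(boolean wasFalseAlarm) {
        double step = 0.1;
        if (wasFalseAlarm) {
            // Too many false alarms: become less sensitive.
            weakSigmas += step;
            strongSigmas += step;
        } else {
            // Confirmed anomalies: become more sensitive, keeping strong above weak.
            weakSigmas = Math.max(1.0, weakSigmas - step);
            strongSigmas = Math.max(weakSigmas + 0.5, strongSigmas - step);
        }
    }
}
```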


Why is the Aquila anomaly detector in a separate repo, when the other detectors are in the Adaptive Alerting repo?

The AA repo contains well-known algorithms and models, and intentionally avoids experimental approaches since experimentation involves a level of churn incompatible with having a stable core library. Aquila falls under the "experimental" category and hence has its own repo.