DES

Brian Harrington edited this page Mar 1, 2016 · 22 revisions

Double exponential smoothing (DES) is a simple technique for generating a smooth trend line from another time series. This technique is often used to generate a dynamic threshold for alerting.

:warning: Alerts on dynamic thresholds should be expected to be noisy. They are looking for strange behavior rather than an actual problem causing impact. Make sure you will actually spend the time to tune and investigate the alarms before using this approach. See the alerting philosophy guide for more information on best practices.

Tuning

The :des operator takes 4 parameters:

  • An input time series
  • training - the number of intervals to use for warming up before generating an output
  • alpha - the data smoothing factor
  • beta - the trend smoothing factor
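
As a rough illustration of how these parameters interact, here is a minimal Python sketch of textbook double exponential smoothing (Holt's linear method). It is not the Atlas implementation: the function name des_sketch, seeding the level with the first datapoint, and emitting NaN during the training window are all assumptions made for the example.

```python
import math


def des_sketch(values, training, alpha, beta):
    """Textbook double exponential smoothing over a list of floats.

    Hypothetical helper for illustration only. Emits NaN for the first
    `training` datapoints, then the smoothed level.
    """
    out = []
    level = None  # smoothed value
    trend = 0.0   # smoothed trend
    for i, y in enumerate(values):
        if level is None:
            # Assumption: seed the level with the first observation.
            level = y
        else:
            prev = level
            # alpha blends the new datapoint with the previous projection.
            level = alpha * y + (1 - alpha) * (prev + trend)
            # beta blends the latest change in level with the previous trend.
            trend = beta * (level - prev) + (1 - beta) * trend
        # Suppress output until the training window has passed.
        out.append(level if i >= training else math.nan)
    return out
```

With alpha = 1.0 the output simply repeats the input once the training window has passed: des_sketch([10.0, 12.0, 11.0, 13.0], 2, 1.0, 0.5) yields NaN for the first two datapoints, followed by 11.0 and 13.0.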

Training

The training parameter defines how many intervals to allow the DES to warm up. In the graph below, the gaps from the start of the chart to the smoothed lines reflect the training window used:

bfc8f139.png

Typically a training window of 10 has been sufficient, as DES will adjust to the input fairly quickly. However, a massive change in the input can cause DES to oscillate, for example:

fa5f443a.png

Alpha

Alpha is the data smoothing factor. A value of 1 means no smoothing. The closer the value gets to 0 the smoother the line should get. Example:

16c8869d.png
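
The effect of alpha can be read off the level update in the textbook form of double exponential smoothing (Holt's linear method). That form is assumed here and may differ in detail from what :des does internally. Here y_t is the input datapoint, s_t the smoothed value, and b_t the trend estimate:

```latex
s_t = \alpha\, y_t + (1 - \alpha)\,(s_{t-1} + b_{t-1})
```

With alpha = 1 this reduces to s_t = y_t, which is why a value of 1 means no smoothing. As alpha approaches 0, each new datapoint contributes less and the line follows its previous projection more closely.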

Beta

Beta is a trend smoothing factor. Visually it is most apparent when alpha is small. Example with alpha = 0.01:

7632e077.png
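
In the textbook form of double exponential smoothing (Holt's linear method, assumed here and not confirmed against the :des implementation), beta controls how the trend estimate b_t is updated from the change in the smoothed value s_t:

```latex
b_t = \beta\,(s_t - s_{t-1}) + (1 - \beta)\, b_{t-1}
```

A larger beta lets the trend estimate follow recent changes in the smoothed value more quickly, while a smaller beta keeps it closer to its previous value.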

Recommended Values

Experimentally we have converged on 3 sets of values based on how quickly the DES line should adjust to changing levels in the input signal.

Helper        Alpha   Beta
:des-fast     0.1     0.02
:des-slower   0.05    0.03
:des-slow     0.03    0.04

Here is an example of how they behave for a sharp drop and recovery:

f2d7a074.png

For a more gradual drop:

d9edd36f.png

If the drop is smooth enough then DES can adjust without the alert ever triggering.

Alerting

For alerting purposes, the DES line is typically multiplied by a fraction, and the alert checks whether the input line drops below the scaled DES value for a given interval.

# Query to generate the input line
nf.cluster,alerttest,:eq,
name,requestsPerSecond,:eq,:and,
:sum,

# Create a copy on the stack
:dup,

# Apply a DES function to generate a prediction
:des-fast,

# Used to set a threshold. The prediction should
# be roughly equal to the line, in this case the
# threshold would be 85% of the prediction.
0.85,:mul,

# Create a boolean signal line that is 1
# for datapoints where the actual value is
# less than the threshold and 0 where it
# is greater than or equal to the threshold.
# The 1 values are where the alert should
# trigger.
:lt,

# Apply presentation details.
:rot,$name,:legend,

The vertical spans show when the expression would have triggered due to the input dropping below the DES line at 85%:

b0a33c92.png

fcff822d.png
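
The threshold comparison above can also be sketched outside of the stack language. The Python below is only an illustration: the function names are hypothetical, the prediction is passed in as a plain list, and reading "for a given interval" as a run of consecutive datapoints is an assumption.

```python
import math


def below_threshold(values, prediction, fraction=0.85):
    """Return a 0/1 signal that is 1 where value < fraction * prediction."""
    signal = []
    for v, p in zip(values, prediction):
        # Comparisons against NaN are False, so the signal stays 0
        # while the prediction is still warming up.
        signal.append(1 if v < fraction * p else 0)
    return signal


def should_alert(signal, consecutive):
    """True if the signal is 1 for at least `consecutive` datapoints in a row."""
    run = 0
    for s in signal:
        run = run + 1 if s else 0
        if run >= consecutive:
            return True
    return False


values = [100.0, 100.0, 80.0, 70.0, 100.0]
prediction = [math.nan, 100.0, 100.0, 100.0, 100.0]
signal = below_threshold(values, prediction)  # [0, 0, 1, 1, 0]
```

Requiring several consecutive datapoints before alerting is one way to trade detection delay for fewer false alarms.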

Epic Macros

There are two helper macros, des-epic-signal and des-epic-viz, that match the behavior of the previous epic DES alarms. The first generates a signal line for the alarm. The second creates a visualization to make it easier to see what is happening. Both take the following arguments:

  • line - input line
  • trainingSize - training size parameter for DES
  • alpha - alpha parameter for DES
  • beta - beta parameter for DES
  • maxPercent - percentage offset to use for the upper bound. Can be set to NaN to disable the upper bound check.
  • minPercent - percentage offset to use for the lower bound. Can be set to NaN to disable the lower bound check.
  • noise - a fixed offset that is the minimum difference between the signal and prediction that is required before the signal should trigger. This is primarily used to avoid false alarms where the percentage bound can become ineffective for routine noise during the troughs.
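
One way to read how maxPercent, minPercent, and noise combine for a single datapoint is sketched below in Python. This is an interpretation of the argument descriptions above, not the actual macro definition. The function name epic_signal_sketch is hypothetical, and the percentages are assumed to be fractions such as 0.15 for 15%.

```python
import math


def epic_signal_sketch(value, prediction, max_percent, min_percent, noise):
    """Return 1 if value falls outside the allowed band around prediction, else 0."""
    diff = value - prediction
    # Assumption: differences smaller than the fixed noise offset never trigger,
    # and neither do datapoints where the value or prediction is missing.
    if math.isnan(diff) or abs(diff) < noise:
        return 0
    # A NaN percentage disables the corresponding bound check.
    too_high = (not math.isnan(max_percent)
                and value > prediction * (1 + max_percent))
    too_low = (not math.isnan(min_percent)
               and value < prediction * (1 - min_percent))
    return 1 if (too_high or too_low) else 0
```

For example, epic_signal_sketch(80.0, 100.0, 0.15, 0.15, 10.0) returns 1 because the value is more than 15% below the prediction. With minPercent set to NaN, the same datapoint returns 0. A value of 5.0 against a prediction of 8.0 is well outside 15% but within the noise offset of 10, so it also returns 0.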

Examples:

/api/v1/graph?
  e=2012-01-01T09:00
  &h=150
  &l=0
  &q=
    nf.cluster,alerttest,:eq,
    name,requestsPerSecond,:eq,
    :and,
    :sum,
    10,0.1,0.02,0.15,0.15,10,:des-epic-viz
  &s=e-6h
  &tz=UTC
  &w=750

cd7253aa.png

Example with no lower bound:

/api/v1/graph?
  e=2012-01-02T09:00
  &h=150
  &l=0
  &q=
    nf.cluster,alerttest,:eq,
    name,requestsPerSecond,:eq,
    :and,
    :sum,
    10,0.1,0.02,0.15,NaN,10,:des-epic-viz
  &s=e-6h
  &tz=UTC
  &w=750

1dfef7be.png