In [56]:
from anomaly import Anomaly, shape, amplitudes
import numpy as np
import bqplot.pyplot as plt
import pandas as pd

# Introduction to Anomaly Generation for Time Series Datasets with the anomaly package

In this tutorial we will go over the basics of synthetic anomaly generation using the anomaly Python package. The goal is to introduce the different types of anomalies that we believe will occur in our system, so that we can check that our outlier detection algorithms pick them up. The anomaly package is designed to take raw time series data and perturb it in specific ways, helping you generate synthetic datasets with realistic anomalies specific to your use cases.

Let's start with a simple example using the Anomaly object. Here we set the anomaly locations to samples 300 and 700, and we guarantee that anomalies are generated there by setting the anomaly rate to 1. To begin with, we specify only a single value each for frequency_range, cycle_range and amplitude_range.

In [70]:
a = Anomaly(frequency_range=[10], 
                cycle_range=[5],
                amplitude_range=[100],
                anomaly_rate=1,
                anomaly_locations=[300, 700]
                )

Now that we have specified our anomaly creator object, we are going to pass some data in to see what types of anomalies are generated. In this case we pass in data that is all zeros and specify a sample rate of 100 for it.

Notice how the two anomalies are created at the specified locations, 300 and 700. By default, the anomalies are generated with a sinusoidal pattern and a constant amplitude. We'll change that shortly.

In [71]:
data = a.modify_signal(np.zeros(1000), sample_rate=100)
fig = plt.figure(); fig.layout.width='100%'; plt.plot(data); fig

Figure(axes=[Axis(scale=LinearScale()), Axis(orientation='vertical', scale=LinearScale())], fig_margin={'top':…
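To make the plot above more concrete, here is a minimal NumPy-only sketch of what injecting a constant-amplitude sine burst into an all-zero signal could look like. It is not the anomaly package's implementation: the helper name `inject_sine_burst` and its arguments are hypothetical, chosen to mirror the parameters used in the cell above.

```python
import numpy as np

def inject_sine_burst(signal, location, frequency, num_cycles, amplitude, sample_rate):
    """Hypothetical helper: add a fixed-amplitude sine burst starting at `location`."""
    n = int(round(num_cycles * sample_rate / frequency))  # burst length in samples
    t = np.arange(n) / sample_rate                         # time axis in seconds
    burst = amplitude * np.sin(2 * np.pi * frequency * t)
    out = np.asarray(signal, dtype=float).copy()
    end = min(location + n, len(out))                      # clip at the end of the signal
    out[location:end] += burst[: end - location]
    return out

signal = np.zeros(1000)
for loc in (300, 700):
    signal = inject_sine_burst(signal, loc, frequency=10, num_cycles=5,
                               amplitude=100, sample_rate=100)
```

With these values each burst spans 50 samples, so the signal stays at zero everywhere except in the windows starting at 300 and 700.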

Let's add a few more options to our anomaly creation model. Instead of generating a single type of anomaly, it will now choose random combinations of the supplied values. Let's see what we get.

In [59]:
a = Anomaly(frequency_range=[10,20], 
                cycle_range=[5,2],
                amplitude_range=[100,200,400],
                anomaly_rate=1,
                anomaly_locations=[100, 300, 700, 900]
                )
data = a.modify_signal(np.zeros(1000), sample_rate=100)
fig = plt.figure(); fig.layout.width='100%'; plt.plot(data); fig

Figure(axes=[Axis(scale=LinearScale()), Axis(orientation='vertical', scale=LinearScale())], fig_margin={'top':…
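One way to picture "random combinations of the supplied values" is an independent draw from each list for every anomaly location. The sketch below is an assumption about the general idea, not the package's actual sampling code, and the dictionary keys are simply borrowed from the columns of the anomaly description.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable

frequency_range = [10, 20]
cycle_range = [5, 2]
amplitude_range = [100, 200, 400]
anomaly_locations = [100, 300, 700, 900]

# Assumption: one independent draw per parameter for every anomaly location.
combos = [
    {
        "index": loc,
        "anomaly_frequency": int(rng.choice(frequency_range)),
        "num_cycles": int(rng.choice(cycle_range)),
        "amplitude": int(rng.choice(amplitude_range)),
    }
    for loc in anomaly_locations
]
```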

We can also look at the anomaly_description property to see which anomalies were actually generated.

In [60]:
pd.DataFrame(a.anomaly_description)

Unnamed: 0,amplitude,anomaly_frequency,num_cycles,index,duration
0,400,10,2,100,20
1,100,20,5,300,25
2,100,10,5,700,50
3,100,20,5,900,25
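The duration column appears to be the anomaly length in samples. The formula below is inferred from the table and from the sine construction used later in this tutorial, so treat it as an assumption rather than the package's documented behaviour; `expected_duration` is a hypothetical helper.

```python
import math

def expected_duration(num_cycles, anomaly_frequency, sample_rate):
    """Hypothetical helper: samples spanned by an anomaly, never fewer than one."""
    return max(1, math.ceil(num_cycles * sample_rate / anomaly_frequency))

# (num_cycles, anomaly_frequency) pairs taken from the table above, at sample_rate=100
rows = [(2, 10), (5, 20), (5, 10), (5, 20)]
durations = [expected_duration(c, f, sample_rate=100) for c, f in rows]
# durations == [20, 25, 50, 25], matching the duration column
```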


Next, let's stop passing in specific anomaly locations. When we do this, anomalies are generated at random locations. We'll also reduce the anomaly rate to 0.005 so that they aren't always generated.

After running this cell you'll see a variety of anomalies generated at many locations.

In [61]:
a = Anomaly(frequency_range=[10,20], 
                cycle_range=[5,2],
                amplitude_range=[100,200,400],
                anomaly_rate=.005,
                )

data = a.modify_signal(np.zeros(1000), sample_rate=100)
fig = plt.figure(); fig.layout.width='100%'; plt.plot(data); fig

Figure(axes=[Axis(scale=LinearScale()), Axis(orientation='vertical', scale=LinearScale())], fig_margin={'top':…
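As a rough mental model, we can think of the anomaly rate as a per-sample probability of starting an anomaly. This is only an assumption made for illustration; the package may interpret anomaly_rate differently, so the number of anomalies it produces need not match this sketch.

```python
import numpy as np

rng = np.random.default_rng(42)  # seeded only for repeatability

n_samples = 1000
anomaly_rate = 0.005

# Assumption: each sample independently starts an anomaly with probability anomaly_rate.
starts = np.flatnonzero(rng.random(n_samples) < anomaly_rate)
```

Under this assumption the expected number of start positions is anomaly_rate times the number of samples, and the positions change with every unseeded run.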

What happens when we specify a sample rate that is too low to capture our anomalies? After running the next cell, notice that no anomalies show up on the graph. However, if we look at the anomaly description, we will see that quite a few were generated; they simply oscillate faster than we are sampling.

In [62]:
a = Anomaly(frequency_range=[10,20], 
                cycle_range=[5,2],
                amplitude_range=[100,200,400],
                anomaly_rate=.005,
                )

data = a.modify_signal(np.zeros(1000), sample_rate=1)
fig = plt.figure(); plt.plot(data); fig

Figure(axes=[Axis(scale=LinearScale()), Axis(orientation='vertical', scale=LinearScale())], fig_margin={'top':…

In [63]:
pd.DataFrame(a.anomaly_description)

Unnamed: 0,amplitude,anomaly_frequency,num_cycles,index,duration
0,400,20,5,52,1
1,400,20,5,263,1
2,200,10,5,478,1
3,200,10,2,542,1
4,200,20,2,598,1
5,200,20,5,626,1
6,100,20,5,667,1
7,200,10,5,763,1
8,200,10,5,792,1
9,400,20,5,926,1
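To see why nothing shows up, we can sample a sine burst by hand using the same construction as the custom Sinusoidal class defined later in this tutorial. Whether the built-in default shape is computed in exactly this way is an assumption; the function name `sinusoid` is hypothetical.

```python
import numpy as np

def sinusoid(num_cycles, anomaly_frequency, sample_rate):
    """Sine burst sampled every 1 / sample_rate seconds."""
    x = np.arange(0, num_cycles / anomaly_frequency, 1 / sample_rate)
    return np.sin(2 * np.pi * x * anomaly_frequency)

well_sampled = sinusoid(num_cycles=5, anomaly_frequency=10, sample_rate=100)
under_sampled = sinusoid(num_cycles=5, anomaly_frequency=10, sample_rate=1)
# under_sampled is a single sample taken at x = 0, where sin(0) == 0
```

At a sample rate of 1 the whole half-second burst falls between two sample points, so only the first sample at time zero is taken and its value is zero. That is consistent with the duration of 1 in the table and with the flat plot.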


Finally, let's add a different type of anomaly to our plot. We will change both the amplitude function and the shape function, and set our sample rate back to 100. Can you see the difference? The shape is now always positive, and the amplitude envelope is shaped like a pyramid.

In [64]:
a.set_shape(shape.Peak)
a.set_adjust_amplitude(amplitudes.Pyramid)


data = a.modify_signal(np.zeros(1000), sample_rate=100)
fig = plt.figure(); fig.layout.width='100%'; plt.plot(data); fig

Figure(axes=[Axis(scale=LinearScale()), Axis(orientation='vertical', scale=LinearScale())], fig_margin={'top':…
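For intuition, an always-positive shape with a pyramid-like amplitude could be built as follows. These are hypothetical stand-ins written in plain NumPy, not the definitions of shape.Peak or amplitudes.Pyramid, whose exact formulas are not shown in this tutorial.

```python
import numpy as np

def peak_shape(num_cycles, anomaly_frequency, sample_rate):
    """Hypothetical always-positive shape: the absolute value of a sine wave."""
    n = int(round(num_cycles * sample_rate / anomaly_frequency))
    t = np.arange(n) / sample_rate
    return np.abs(np.sin(2 * np.pi * anomaly_frequency * t))

def pyramid_envelope(n, amplitude):
    """Hypothetical envelope that rises linearly towards `amplitude`, then falls back."""
    ramp = 1.0 - np.abs(np.linspace(-1.0, 1.0, n))  # 0 -> up -> 0
    return amplitude * ramp

wave = peak_shape(num_cycles=5, anomaly_frequency=10, sample_rate=100)
anomaly = wave * pyramid_envelope(len(wave), amplitude=200)
```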

You can also create your own custom shape and amplitude functions and plug them in with set_shape and set_adjust_amplitude.

In [65]:
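class LinearSpike(object):
    """Amplitude function: ramps linearly from 0 up to self.amplitude."""

    def _adjust_amplitude(self, anomaly):
        # Scale each sample by a linearly increasing factor.
        amplitude_shift = np.linspace(0, self.amplitude, num=len(anomaly))
        anomaly = np.multiply(anomaly, amplitude_shift)

        return anomaly


class Sinusoidal(object):
    """Shape function: a sine wave lasting self.num_cycles cycles."""

    def _shape(self):
        # Time axis in seconds, one point per sample.
        x = np.arange(
            0, self.num_cycles / (self.anomaly_frequency), 1 / self.sample_rate
        )

        y = np.sin(2 * np.pi * x * self.anomaly_frequency)

        return y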
    
    
a.set_shape(Sinusoidal)
a.set_adjust_amplitude(LinearSpike)

In [68]:
data = a.modify_signal(np.zeros(1000), sample_rate=100)
fig = plt.figure(); fig.layout.width='100%'; plt.plot(data); fig

Figure(axes=[Axis(scale=LinearScale()), Axis(orientation='vertical', scale=LinearScale())], fig_margin={'top':…
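The two custom classes read their parameters from self, so they cannot be called on their own. To inspect the resulting anomaly outside the package, the same arithmetic can be reproduced with plain NumPy. The parameter values below are chosen for illustration only.

```python
import numpy as np

# Illustrative values, not read from the Anomaly object
sample_rate, anomaly_frequency, num_cycles, amplitude = 100, 10, 5, 200

# Sine wave, as in Sinusoidal._shape (time axis built from integer sample indices)
n = int(round(num_cycles * sample_rate / anomaly_frequency))
t = np.arange(n) / sample_rate
wave = np.sin(2 * np.pi * anomaly_frequency * t)

# Linear ramp from 0 to amplitude, as in LinearSpike._adjust_amplitude
ramp = np.linspace(0, amplitude, num=len(wave))
anomaly = np.multiply(wave, ramp)
```

The oscillation starts at zero and grows steadily, which is the growing-spike pattern visible in the plot above.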

That's all for generating anomalies. In the next tutorial we will look at how you can apply this to real datasets in order to test anomaly detection algorithms on synthetically generated data.