---
# Collective Anomalies  
<img src="../img/collective.png" width="400">

Collective anomalies occur when a record is anomalous only in combination with its adjacent records. They are prevalent in time-series data, where a set of consecutive records is anomalous compared to the rest of the dataset.

![](../img/collective2.png)

Concretely, a record may not be anomalous on its own; however, as part of a collective of sequential records, it may be.
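To make the idea concrete, here is a minimal sketch on synthetic data (the signal, the injected flat run, the window of 20 samples, and the 0.05 threshold are all illustrative assumptions, not part of the sunspots analysis). Every value in the flat run is ordinary on its own, so a point-wise z-score check finds nothing, while a rolling standard deviation exposes the run as a group.

```python
import numpy as np
import pandas as pd

# Hypothetical synthetic signal: a sine wave whose values always stay in [-1, 1].
t = np.arange(400)
signal = np.sin(2 * np.pi * t / 40)

# Inject a collective anomaly: 40 consecutive samples stuck at 0.5.
# 0.5 is a perfectly ordinary value for this signal when seen in isolation.
signal[200:240] = 0.5
series = pd.Series(signal)

# Point-wise check: flag records more than 3 standard deviations from the mean.
z_scores = (series - series.mean()) / series.std()
point_anomalies = series[z_scores.abs() > 3]

# Collective check: the rolling standard deviation over 20 samples collapses
# toward zero only where a whole window lies inside the flat run.
rolling_std = series.rolling(window=20).std()
collective_idx = rolling_std[rolling_std < 0.05].index

print("Point-wise anomalies found:", len(point_anomalies))
print("Indices flagged by the rolling check:", list(collective_idx))
```

The point-wise check comes back empty, while the rolling check flags only indices inside the flat run. The records are unremarkable individually and anomalous as a sequence.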

--- 
# Low-Pass Statistical Filter

## Anomalies on sunspots

The file has 3,143 rows containing sunspot observations collected between 1749 and 1984. Sunspots are dark spots on the surface of the sun. Studying them helps scientists understand how the sun's properties, in particular its magnetic properties, change over time.

In [None]:
import numpy as np
import pandas as pd
from itertools import count
from pandas.core.window import Rolling
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
%matplotlib inline

In [None]:
df = pd.read_csv("../datasets/sunspots.csv", sep='\t')

In [None]:
df.head()

In [None]:
plt.figure(figsize=(16, 8))
plt.title("Sunspots")
plt.plot(range(len(df)), df['sunspots'], '-', c='forestgreen')

#### Moving Average Using Discrete Linear Convolution

In [None]:
def moving_average(data, window_size):
    window = np.ones(int(window_size))/float(window_size)
    return np.convolve(data, window, 'same')
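
One property of `np.convolve(..., 'same')` worth keeping in mind is that it treats values beyond the ends of the series as zeros, so the moving average is pulled toward zero near the edges. The sketch below illustrates this on a constant series (the `moving_average` function is repeated so the snippet runs on its own). The name `edge_corrected_moving_average` is hypothetical and shows one possible alternative, not the approach used in this notebook.

```python
import numpy as np

def moving_average(data, window_size):
    window = np.ones(int(window_size)) / float(window_size)
    return np.convolve(data, window, 'same')

def edge_corrected_moving_average(data, window_size):
    # Hypothetical alternative: divide by the fraction of the window that
    # actually overlaps the data, so edge values are averaged over fewer points.
    data = np.asarray(data, dtype=float)
    window = np.ones(int(window_size)) / float(window_size)
    coverage = np.convolve(np.ones_like(data), window, 'same')
    return np.convolve(data, window, 'same') / coverage

constant = np.full(10, 6.0)
smoothed = moving_average(constant, 3)
corrected = edge_corrected_moving_average(constant, 3)

print("plain 'same' convolution:", smoothed)
print("edge-corrected version:  ", corrected)
```

With the plain version, the first and last smoothed values fall below the true level of the series, which can produce spurious anomalies at the boundaries when the window is large.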

In [None]:
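def explain_anomalies(y, window_size, sigma=1.0):
    """Flag points lying more than `sigma` standard deviations of the
    residuals away from the moving average."""
    avg = moving_average(y, window_size).tolist()
    residual = y - avg
    # A single (stationary) standard deviation computed over all residuals
    std = np.std(residual)
    return {'standard_deviation': round(std, 3),
            'anomalies_dict': {index: y_i for index, y_i, avg_i in zip(count(), y, avg)
              if (y_i > avg_i + (sigma*std)) or (y_i < avg_i - (sigma*std))}}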

In [None]:
def explain_anomalies_rolling_std(y, window_size, sigma=1.0):
    """Flag points lying more than `sigma` rolling standard deviations of the
    residuals away from the moving average."""
    avg = moving_average(y, window_size)
    residual = pd.Series(y) - avg
    # Rolling std of the residuals; the first window_size - 1 values are NaN,
    # so fill them with the first valid rolling std.
    rolling_std = residual.rolling(window_size).std()
    rolling_std = rolling_std.fillna(rolling_std.iloc[window_size - 1]).round(3)
    std = np.std(residual)
    return {'stationary standard_deviation': round(std, 3),
            'anomalies_dict': {index: y_i for index, y_i, avg_i, rs_i in zip(count(), y, avg, rolling_std)
              if (y_i > avg_i + (sigma * rs_i)) or (y_i < avg_i - (sigma * rs_i))}}
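
The rolling standard deviation used above has two details that are easy to miss: a rolling window of size `window_size` yields `NaN` for its first `window_size - 1` positions, and pandas computes the sample standard deviation (`ddof=1`) by default, whereas `np.std` defaults to the population form (`ddof=0`). The following sketch shows both on a small made-up residual series; the numbers are illustrative only.

```python
import numpy as np
import pandas as pd

# Made-up residuals, purely for illustration.
residual = pd.Series([1.0, -1.0, 2.0, -2.0, 0.5, -0.5, 3.0])
window_size = 3

# The first window_size - 1 entries have no full window and come back as NaN.
rolling_std = residual.rolling(window_size).std()
n_missing = int(rolling_std.isna().sum())

# Fill the leading NaNs with the first valid rolling std.
filled = rolling_std.fillna(rolling_std.iloc[window_size - 1])

# pandas uses ddof=1 by default; np.std uses ddof=0 unless told otherwise.
first_window = residual.iloc[:window_size].to_numpy()
sample_std = np.std(first_window, ddof=1)
population_std = np.std(first_window)

print("Leading NaN count:", n_missing)
print("Filled rolling std:", filled.round(3).tolist())
print("Sample vs population std of first window:", sample_std, population_std)
```

Because of the `ddof` difference, the stationary and rolling thresholds are not computed on exactly the same footing, which matters most for small windows.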

In [None]:
# Plot the data, its moving average, and the anomalies detected on the given dataset.
def plot_results(x, y, window_size, sigma_value=1, text_xlabel="X Axis", text_ylabel="Y Axis", applying_rolling_std=False):
    plt.figure(figsize=(15, 8))
    # Use a marker-only format string; the color is set once via `c`
    plt.plot(x, y, ".", alpha=0.4, c='blue')
    y_av = moving_average(y, window_size)
    plt.plot(x, y_av, color='green')
    plt.xlim(0, len(y))
    plt.xlabel(text_xlabel)
    plt.ylabel(text_ylabel)
    events = {}
    if applying_rolling_std:
        events = explain_anomalies_rolling_std(y, window_size=window_size, sigma=sigma_value)
    else:
        events = explain_anomalies(y, window_size=window_size, sigma=sigma_value)

    x_anomaly = np.fromiter(events['anomalies_dict'].keys(), dtype=int, count=len(events['anomalies_dict']))
    y_anomaly = np.fromiter(events['anomalies_dict'].values(), dtype=float,
                                            count=len(events['anomalies_dict']))
    plt.scatter(x_anomaly, y_anomaly, s=50, c='red')
    plt.grid(True)
    plt.show()

In [None]:
plot_results(range(len(df.index)), 
             y=df['sunspots'], 
             window_size=10, 
             text_xlabel="Months", 
             sigma_value=3, 
             text_ylabel="Values")

In [None]:
events = explain_anomalies(df['sunspots'], window_size=5, sigma=3)

print("Information about the anomalies model: {}".format(events.keys()))

In [None]:
events['anomalies_dict']

In [None]:
events_std = explain_anomalies_rolling_std(df['sunspots'], window_size=5, sigma=3)

print("Information about the anomalies model: {}".format(events_std))