# Anomaly Detection Playground

Dataset used: [AnoML-IoT](https://www.kaggle.com/datasets/hkayan/anomliot)

Sample dataset format:

| Time       | Temperature | Humidity | ... |
|------------|-------------|----------|-----|
| 1623781306 | 37.94       | 28.94    | ... |
| ...        | ...         | ...      | ... |

Although the data is unlabelled, the dataset description states that anomalies were created during the following time periods:

- 18:21:46 - 19:37:16 (first day)
- 02:26:36 - 04:15:56 (second day)
- 08:54:46 - 10:45:36 (second day)
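
The ground-truth cell later in this notebook uses Unix timestamps rather than clock times. The sketch below shows how the clock times above map to those epoch values. It assumes the times are in UTC and that the "first day" is 2021-06-15; both are inferred from the epoch values in this notebook, not taken from the dataset description. `to_epoch` and `ANOMALY_WINDOWS` are illustrative names.

```python
from datetime import datetime, timezone

# Assumption: clock times are UTC and the first day is 2021-06-15.
WINDOWS_UTC = [
    ("2021-06-15 18:21:46", "2021-06-15 19:37:16"),
    ("2021-06-16 02:26:36", "2021-06-16 04:15:56"),
    ("2021-06-16 08:54:46", "2021-06-16 10:45:36"),
]


def to_epoch(text):
    """Parse a 'YYYY-MM-DD HH:MM:SS' string as UTC and return epoch seconds."""
    parsed = datetime.strptime(text, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


# Inclusive (start, end) bounds in epoch seconds
ANOMALY_WINDOWS = [(to_epoch(start), to_epoch(end)) for start, end in WINDOWS_UTC]
print(ANOMALY_WINDOWS)
# [(1623781306, 1623785836), (1623810396, 1623816956), (1623833686, 1623840336)]
```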

## Load Dataset

In [None]:
import pandas as pd

anoml_iot_dataset = pd.read_csv('./datasets/dataset_final.csv')

In [None]:
anoml_iot_dataset.plot("Time", ["Temperature"])

## Make the Dataframe a Timeseries Object

In [None]:
from darts.timeseries import TimeSeries

ts_data = TimeSeries.from_dataframe(anoml_iot_dataset, time_col="Time", value_cols=["Temperature"])
ts_data

## Generate Anomaly Detection Ground Truth

In [None]:
def is_data_point_anomaly(row):
    """Return 1 if the row's timestamp falls inside a known anomaly window, else 0."""
    t = row["Time"]
    # 18:21:46 - 19:37:16 (first day)
    if 1623781306 <= t <= 1623785836:
        return 1
    # 02:26:36 - 04:15:56 (second day)
    if 1623810396 <= t <= 1623816956:
        return 1
    # 08:54:46 - 10:45:36 (second day)
    if 1623833686 <= t <= 1623840336:
        return 1
    return 0

In [None]:
anoml_iot_dataset["Anomaly"] = anoml_iot_dataset.apply(is_data_point_anomaly, axis=1)

anomaly_ground_truth = TimeSeries.from_dataframe(anoml_iot_dataset, time_col="Time", value_cols=["Anomaly"])
anomaly_ground_truth
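
The row-wise `apply` above can be slow on large frames. As a possible alternative, the sketch below labels the same windows with vectorized pandas operations. `label_anomalies`, `ANOMALY_WINDOWS`, and the toy frame are illustrative and not part of the original notebook.

```python
import pandas as pd

# Inclusive (start, end) epoch-second bounds, copied from is_data_point_anomaly
ANOMALY_WINDOWS = [
    (1623781306, 1623785836),
    (1623810396, 1623816956),
    (1623833686, 1623840336),
]


def label_anomalies(times):
    """Return a 0/1 Series marking timestamps that fall inside any anomaly window."""
    mask = pd.Series(False, index=times.index)
    for start, end in ANOMALY_WINDOWS:
        # Series.between is inclusive on both ends by default
        mask |= times.between(start, end)
    return mask.astype(int)


# Toy frame standing in for anoml_iot_dataset
toy = pd.DataFrame(
    {"Time": [1623781306, 1623785846, 1623810396, 1623820000, 1623840336]}
)
toy["Anomaly"] = label_anomalies(toy["Time"])
print(toy["Anomaly"].tolist())  # [1, 0, 1, 0, 1]
```

On the real data this would be `anoml_iot_dataset["Anomaly"] = label_anomalies(anoml_iot_dataset["Time"])`.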

## Anomaly Detection

In [None]:
from darts.ad.anomaly_model.filtering_am import FilteringAnomalyModel
from darts.models.filtering.moving_average_filter import MovingAverageFilter
from darts.ad.scorers.difference_scorer import DifferenceScorer
from darts.ad.anomaly_model.forecasting_am import ForecastingAnomalyModel
from darts.models.forecasting.auto_arima import AutoARIMA
from darts.models.forecasting.arima import ARIMA

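# Score each point by its difference from a trailing 59-point moving average
ad_model = FilteringAnomalyModel(
    model=MovingAverageFilter(window=59, centered=False),
    scorer=DifferenceScorer(),
)
# Alternative: score by forecast error instead (the forecasting model must be fitted first)
# ad_model = ForecastingAnomalyModel(model=ARIMA(p=2), scorer=DifferenceScorer())
# ad_model.fit(ts_data, allow_model_training=True)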


In [None]:
ad_model.show_anomalies(ts_data)

In [None]:
score = ad_model.score(ts_data)

In [None]:
from darts.ad.detectors.threshold_detector import ThresholdDetector

# Flag a point when its score falls outside [-THRESHOLD_RANGE, THRESHOLD_RANGE]
THRESHOLD_RANGE = 2

td = ThresholdDetector(low_threshold=-THRESHOLD_RANGE, high_threshold=THRESHOLD_RANGE)
anomalies = td.detect(score)
anomalies.pd_dataframe()

## Evaluate Model Accuracy

In [None]:
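# eval_accuracy runs detect() on the series it is given, so pass the anomaly score,
# not the already-binarized `anomalies` series
td.eval_accuracy(anomaly_ground_truth, score, metric="accuracy")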
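
To make the reported number easier to interpret, here is a minimal plain-Python sketch of how accuracy, precision, and recall follow from binary labels. The helper names and the toy labels are illustrative; this is not how Darts computes its metrics internally. It also shows why accuracy alone can look good when anomalies are rare: a detector that never flags anything still scores well.

```python
def confusion_counts(truth, predicted):
    """Return (tp, fp, fn, tn) for two equal-length lists of 0/1 labels."""
    tp = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 0)
    return tp, fp, fn, tn


def binary_metrics(truth, predicted):
    """Return accuracy, precision, and recall; undefined ratios are reported as 0.0."""
    tp, fp, fn, tn = confusion_counts(truth, predicted)
    return {
        "accuracy": (tp + tn) / len(truth),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Toy labels: 2 anomalous points out of 10
truth = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
flags_nothing = [0] * 10

result = binary_metrics(truth, flags_nothing)
print(result)  # {'accuracy': 0.8, 'precision': 0.0, 'recall': 0.0}
```

Passing `metric="recall"`, `"precision"`, or `"f1"` to `eval_accuracy` gives a fuller picture than accuracy alone.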

## Plot Anomalies

In [None]:
anomalies.plot()

In [None]:
anomalies_df_raw = anomalies.pd_dataframe()

# Flatten the detector output into a two-column dataframe
anomaly_df = pd.DataFrame(
    {
        "Time": anomalies_df_raw["0"].index,
        "IsAnomaly": anomalies_df_raw["0"].values,
    },
)

# Keep only the points flagged as anomalous; copy to avoid a SettingWithCopyWarning below
anomaly_df = anomaly_df[anomaly_df["IsAnomaly"] > 0].copy()
anomaly_df["Time"] = pd.to_datetime(anomaly_df["Time"], unit="s", origin="unix")

anomaly_df

In [None]:
# Convert the series once and reuse it for both columns
temperature_series = ts_data.pd_dataframe()["Temperature"]

temperature_df = pd.DataFrame(
    {
        "Time": temperature_series.index,
        "Temperature": temperature_series.values,
    },
)

temperature_df["Time"] = pd.to_datetime(temperature_df["Time"], unit="s", origin="unix")
temperature_df


In [None]:
filtered_temperature = temperature_df[temperature_df["Time"].isin(anomaly_df["Time"])]
filtered_temperature

In [None]:
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(20, 10))

plt.title('Anomalies in the Temperature time series')
plt.ylabel('Temperature')
plt.xlabel('Time')
plt.plot(temperature_df["Time"], temperature_df["Temperature"], label='Temperature')
# Mark the anomalous data point with a red circle, based on the binary output of the anomaly detector
plt.plot(filtered_temperature["Time"], filtered_temperature["Temperature"], 'ro', label='Anomaly')
plt.legend()
plt.show()


## Simulate Real-time Anomaly Detection (Incremental/Online Learning)

Darts appears to be designed for batch learning rather than incremental learning, which makes it a poor fit for real-time anomaly detection.
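
As a rough illustration of what an incremental detector could look like, the sketch below processes one reading at a time using only the standard library. It follows the same idea as the batch pipeline above (difference from a trailing moving average, then a symmetric threshold), but it is not the Darts implementation and is not guaranteed to reproduce its scores. `StreamingMovingAverageDetector` is a hypothetical name, and including the current reading in the window is a design choice made here.

```python
from collections import deque


class StreamingMovingAverageDetector:
    """Flag a reading that deviates from the trailing-window mean by more than `threshold`."""

    def __init__(self, window=59, threshold=2.0):
        self.values = deque(maxlen=window)
        self.threshold = threshold
        self.running_sum = 0.0

    def update(self, value):
        """Consume one reading and return True if it is flagged as anomalous."""
        if len(self.values) == self.values.maxlen:
            # The oldest reading is about to be evicted by append()
            self.running_sum -= self.values[0]
        self.values.append(value)
        self.running_sum += value
        mean = self.running_sum / len(self.values)
        score = value - mean  # same sign convention as a plain difference
        return abs(score) > self.threshold


# Toy stream: a flat signal with a single spike at position 10
readings = [20.0] * 10 + [30.0] + [20.0] * 3
detector = StreamingMovingAverageDetector(window=5, threshold=2.0)
flagged = [i for i, reading in enumerate(readings) if detector.update(reading)]
print(flagged)  # [10]
```

Each update costs constant time and memory is bounded by the window size, so the detector can keep up with a sensor feed without refitting on the full history. Over very long streams the running sum can accumulate floating-point error; recomputing it from the window periodically would avoid that.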