# Anomaly Detection Analysis

This notebook demonstrates anomaly detection based on ARIMA forecast residuals: points whose observed value deviates strongly from the model's forecast are flagged as anomalies.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.seasonal import seasonal_decompose
import sys
sys.path.append('..')
from src.data_generation import generate_synthetic_data
from src.anomaly_detection import detect_anomalies
from src.visualization import plot_time_series_with_anomalies, plot_residuals_distribution

%matplotlib inline

## Generate and Explore Data

In [2]:
# Generate synthetic data
df = generate_synthetic_data()
output_path = '../data/synthetic_data.csv'
df.to_csv(output_path, index=False)
print(f"Synthetic data generated and saved to {output_path}")

# Display the first few rows and basic info about the dataset
print(df.head())
df.info()  # info() prints its own summary and returns None, so it is not wrapped in print()

## Time Series Decomposition

We'll decompose the time series to understand its components: trend, seasonality, and residuals.

In [3]:
# Aggregate data by date for decomposition: one summed value per date,
# returned as a DataFrame indexed by 'date' with a single 'value' column
df_agg = df.groupby('date')[['value']].sum()

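# Perform additive time series decomposition.
# period=365 assumes daily observations with a yearly cycle and requires
# at least two complete cycles (730 observations) in the series.
result = seasonal_decompose(df_agg['value'], model='additive', period=365)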
result.plot()
plt.tight_layout()
plt.show()

The decomposition plot has four panels (observed, trend, seasonal, and residual). For this dataset it shows:
1. A clear upward trend.
2. A strong seasonal component that repeats yearly.
3. Residuals that appear mostly random, with a few large values that are potential anomalies.

## ARIMA Modeling and Anomaly Detection

In [4]:
# Perform anomaly detection
df_with_anomalies = detect_anomalies(df_agg)
print(f"Detected {df_with_anomalies['anomaly'].sum()} anomalies")

# Plot the time series with detected anomalies
plot_time_series_with_anomalies(df_with_anomalies)

# Plot the distribution of residuals
residuals = df_with_anomalies['value'] - df_with_anomalies['forecast']
plot_residuals_distribution(residuals)

## Interpretation of Results

1. The anomaly detection algorithm flagged points whose observed value deviates significantly from the ARIMA forecast; the exact count is printed by the cell above.
2. These anomalies could represent important events or data issues and warrant further investigation.
3. The residuals (observed minus forecast) appear roughly normally distributed; the outliers in the tails correspond to the detected anomalies.
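
One way to examine the tails of the residual distribution without letting the outliers distort the scale is a robust z-score based on the median and the median absolute deviation (MAD). This is a different scoring rule from the one used above, shown only as a cross-check; the helper name `robust_z_scores`, the toy values, and the 3.5 cutoff are assumptions.

```python
import numpy as np


def robust_z_scores(residuals):
    """Return z-scores based on the median and MAD instead of mean and std."""
    residuals = np.asarray(residuals, dtype=float)
    median = np.median(residuals)
    mad = np.median(np.abs(residuals - median))
    if mad == 0:
        # Degenerate case: at least half of the residuals equal the median.
        return np.zeros_like(residuals)
    # 1.4826 * MAD estimates the standard deviation for normally distributed data.
    return (residuals - median) / (1.4826 * mad)


# Toy residuals: five small values and one large outlier.
toy_residuals = [0.1, -0.2, 0.0, 0.3, -0.1, 8.0]
scores = robust_z_scores(toy_residuals)
flags = np.abs(scores) > 3.5  # cutoff chosen for illustration
print(flags)
```

In the notebook, the same function could be applied to `residuals.dropna()` to see whether the points it flags agree with the `anomaly` column.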

Next steps could include:
1. Investigating the specific dates and circumstances of the detected anomalies.
2. Refining the ARIMA order (p, d, q), for example by comparing candidate models on an information criterion such as AIC, to improve the forecasts.
3. Applying this method to real-world data and validating the results with domain experts.