# Three-Lens Approach to Time Series Anomaly Detection

## Introduction

This project explores time series anomaly detection with the goal of developing a deeper, hands-on understanding of the methods available and how they behave across diverse contexts. My motivation comes from a broader interest in time series analysis: after completing a forecasting course, I became curious about anomaly detection as a related but less emphasized area. Since anomalies are rare, difficult to define, and highly context-dependent, I saw this as an opportunity to **explore techniques** that I would not encounter in as much depth through regular coursework.

The purpose of the project is **not to benchmark methods or produce definitive results**, but rather to **gain intuition about model strengths and weaknesses through experimentation**. I focused on building experience with both the data processing pipeline and the modeling approaches, comparing how they perform in different situations.


### Datasets
To cover a range of settings, I worked with four distinct datasets. See `/notebooks` for individual reports on each. 

- Seasonal synthetic series: `short_seasonal`
- Financial series: `quant_finance`
- Quality control dataset: `quality_control`
- Physics dataset: `physics_oscillation`

In choosing these datasets, I deliberately emphasized diversity: synthetic vs. real, short vs. long, univariate vs. multivariate, labeled vs. unlabeled, and a range of domains.

### Models

On the modeling side, I experimented with approaches from three broad categories: classical/statistical methods, self-trained machine learning models, and pre-trained or package-based methods.

The following table indicates which models are applied to which datasets.

| Model | `short_seasonal` | `quant_finance` | `quality_control` | `physics_oscillation` |
|-------|------------------|-----------------|-------------------|-----------------------|
| STL | Yes | Yes | | |
| SARIMA/ARMA | Yes | Yes | | Yes |
| Kalman Filters | | Yes | | Yes |
| GARCH | | Yes | | |
| MVN | | | Yes | |
| BOCPD | | | Yes | |
| CUSUM | Yes | | Yes | Yes |
| LSTM-AE | Yes | Yes | Yes | Yes |
| Isolation Forest | Yes | Yes | Yes | Yes |
| Prophet | Yes | Yes | | Yes |

Taken together, these experiments form a broad exploratory study into anomaly detection in time series. The emphasis throughout is on learning by doing—understanding how different techniques behave, what assumptions they rely on, and where they succeed or fail.

## Lessons Learned

Through this project I gained a number of insights into both the practice of anomaly detection and the broader process of working with time series data.

**Model strengths and weaknesses are highly context-dependent**
- Every model explored had situations where it worked well and others where it was a poor fit. 
- Classical statistical methods in particular excel when applied to datasets that align with their design assumptions. The GARCH model stood out on the financial dataset, identifying volatility-driven anomalies that no other model detected. STL and SARIMA fit the seasonal synthetic dataset exceptionally well because they model seasonality explicitly, which the machine learning models used here do not.
- These cases suggest that simple, domain-tailored statistical methods can outperform more complex approaches, while also offering significant interpretability.
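
To illustrate why an explicit seasonal model makes anomalies easy to isolate, here is a minimal sketch of residual-based detection. It is not STL itself and not the code from the notebooks: it uses a much simpler per-phase median as the seasonal profile, and the function name, the threshold of 3.5, and the toy series are all assumptions chosen for illustration.

```python
from statistics import median


def seasonal_residual_anomalies(series, period, threshold=3.5):
    """Flag points that deviate strongly from a per-phase seasonal profile.

    Simplified stand-in for STL-style residual analysis (hypothetical helper).
    """
    # Seasonal profile: the median of all values that share the same phase.
    profile = [median(series[phase::period]) for phase in range(period)]
    residuals = [x - profile[i % period] for i, x in enumerate(series)]

    # Robust scale estimate: median absolute deviation (MAD) of the residuals.
    center = median(residuals)
    mad = median(abs(r - center) for r in residuals)
    if mad == 0:
        # Degenerate case: more than half the residuals are identical.
        return [i for i, r in enumerate(residuals) if r != center]

    # 0.6745 makes the MAD-based score comparable to a z-score for normal data.
    return [
        i
        for i, r in enumerate(residuals)
        if 0.6745 * abs(r - center) / mad > threshold
    ]


# Toy data: a period-4 pattern with small deterministic noise and one spike.
pattern = [0.0, 2.0, 4.0, 2.0]
series = [pattern[i % 4] + ((i * 7) % 5 - 2) * 0.1 for i in range(40)]
series[17] += 3.0  # injected anomaly

flagged = seasonal_residual_anomalies(series, period=4)
print(flagged)
```

Once the seasonal component is removed, the spike stands out against a small, stable residual, which is the same intuition behind thresholding STL or SARIMA residuals.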

**Machine learning models are more adaptable but less specialized**
- The LSTM autoencoder tended to produce reasonable results across all datasets. Its flexibility makes it broadly applicable, but effective use requires hyperparameter tuning.
- The tuning process highlighted both the potential and the frustration of ML methods: they can adapt across domains, but their opacity and tuning burden can make them difficult to interpret and control.

**The nature of the anomaly itself matters**
- Large, obvious deviations in value are detected by almost any model, regardless of fit. 
- Subtle shifts in mean or variance are much more difficult to capture, even with methods specifically intended for them (e.g., CUSUM or GARCH). This underlined how anomaly detection is not only about choosing the right model but also about carefully defining what constitutes an “anomaly” in context.
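
The contrast between obvious spikes and subtle shifts can be sketched with a two-sided tabular CUSUM next to a simple point-wise 3-sigma rule. This is an illustrative sketch, not the notebook implementation: the series is a deterministic toy example, and the `slack` and `threshold` values are assumptions picked so the behavior is easy to follow.

```python
from statistics import pstdev


def cusum_alarms(series, target, slack, threshold):
    """Two-sided tabular CUSUM: return indices where a cumulative sum alarms."""
    high, low = 0.0, 0.0
    alarms = []
    for i, x in enumerate(series):
        # Accumulate deviations beyond the slack, never dropping below zero.
        high = max(0.0, high + (x - target - slack))
        low = max(0.0, low + (target - x - slack))
        if high > threshold or low > threshold:
            alarms.append(i)
            high, low = 0.0, 0.0  # restart accumulation after an alarm
    return alarms


# Toy data: 40 in-control points around 0, then a +0.8 mean shift.
baseline = [1.0 if i % 2 == 0 else -1.0 for i in range(40)]
shifted = [x + 0.8 for x in baseline]
series = baseline + shifted

# Point-wise rule: flag values more than 3 baseline standard deviations away.
sigma = pstdev(baseline)
point_flags = [i for i, x in enumerate(series) if abs(x) > 3 * sigma]

alarms = cusum_alarms(series, target=0.0, slack=0.5, threshold=4.0)
print(point_flags, alarms)
```

No single point in the shifted segment is extreme, so the point-wise rule stays silent, while the cumulative sum builds up and raises an alarm some steps after the shift begins. Smaller shifts or a larger `slack` delay the alarm further, which matches the difficulty described above.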

**Tooling support is strong**
- A pleasant surprise was that all models and libraries used in this project were straightforward to implement.
- The accessibility of modern anomaly detection packages made it possible to focus on exploration and comparison rather than wrestling with implementation details.

**Hands-on experience builds intuition**
- Working through the full pipeline—data preparation, model application, synthetic data generation, and visualization—gave me a much stronger sense of how different approaches behave in practice. 
- I now have a clearer intuition for which models are suited to which datasets, what types of anomalies they can detect, and how to think about anomaly detection problems in general.

## Next Steps

Building on the insights from this project, there are several directions I plan to pursue:

- **Expanding model coverage**: Thus far, the project has emphasized classical approaches with some coverage of self-trained and package-based models. The next step will be to broaden this scope by experimenting with modern methods such as transformers, additional neural network architectures, and pre-trained services such as Amazon Lookout for Metrics. This will provide a stronger comparison between traditional, self-trained, and pre-trained approaches.

- **Improving modularity and reusability**: As the number of models and datasets grows, I plan to refactor and modularize the codebase so that functions and pipelines can be reused more easily. The goal is to structure the project in a way that supports straightforward addition of new datasets and models, while also acting as a small personal library for time series experiments.

- **Diversifying datasets with a focus on real-world data**: Future work will include exploring more datasets, especially real-world cases where anomalies are subtle, labels are scarce, and preprocessing plays a larger role. By doing so, I hope to gain more practical experience with the challenges of applying anomaly detection methods outside of controlled or synthetic settings.

- **Continuing exploration within time series**: I have greatly enjoyed working with time series data, and plan to continue developing expertise in this area. Once I feel more confident in anomaly detection, I may expand into related areas such as time series prediction, data synthesis, or benchmarking models. This project is therefore both a learning experience in itself and a stepping stone toward broader exploration in time series analysis.
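
For the modularity goal listed above, one possible shape is a small registry in which every detector shares the same call signature, so that adding a model or a dataset does not require touching the rest of the pipeline. This is only a sketch of one design option: `DETECTORS`, `register`, `run_all`, and the example detector are hypothetical names, not part of the current codebase.

```python
from typing import Callable, Dict, List, Sequence

# A detector maps a series to one anomaly score per point (assumed convention).
Detector = Callable[[Sequence[float]], List[float]]

DETECTORS: Dict[str, Detector] = {}


def register(name: str) -> Callable[[Detector], Detector]:
    """Decorator that adds a detector to the shared registry under `name`."""

    def decorator(fn: Detector) -> Detector:
        DETECTORS[name] = fn
        return fn

    return decorator


@register("abs_deviation")
def abs_deviation(series: Sequence[float]) -> List[float]:
    """Placeholder detector: absolute distance from the series mean."""
    mean = sum(series) / len(series)
    return [abs(x - mean) for x in series]


def run_all(series: Sequence[float]) -> Dict[str, List[float]]:
    """Apply every registered detector to the same series."""
    return {name: fn(series) for name, fn in DETECTORS.items()}


results = run_all([0.0, 0.0, 3.0])
print(results)
```

Keeping a single score-per-point convention would also make it simpler to plot and compare detectors side by side across datasets.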