# Chapter 58: Anomaly Detection

## Learning Objectives

By the end of this chapter, you will be able to:

- Understand different types of anomalies in time‑series data: point, contextual, and collective
- Apply statistical methods such as Z‑score, IQR, and modified Z‑score to detect outliers in NEPSE stock data
- Implement machine learning models for anomaly detection, including Isolation Forest, One‑Class SVM, and Local Outlier Factor
- Use deep learning techniques like autoencoders, LSTM autoencoders, and variational autoencoders for more complex anomaly patterns
- Recognise the importance of domain knowledge in distinguishing true anomalies from normal market events
- Evaluate anomaly detection performance using precision, recall, and F1‑score when labels are available
- Build real‑time anomaly detection pipelines for monitoring NEPSE data streams
- Interpret detected anomalies to inform trading decisions or data quality checks

---

## Introduction

In financial markets, anomalies can signal both opportunities and risks. A sudden spike in trading volume might indicate an impending price move, an erroneous data feed, or insider trading. A sharp drop in price could be a flash crash or a genuine market correction. Detecting these unusual events in real time allows traders to react quickly and data engineers to clean noisy data.

**Anomaly detection** is the task of identifying patterns in data that do not conform to expected behaviour. In time‑series, anomalies can be of three types:

- **Point anomalies**: A single data point that deviates significantly from the rest (e.g., a price jump caused by a typo).
- **Contextual anomalies**: A data point that is unusual in a specific context (e.g., a high temperature in winter; a stock price spike during a typically quiet period).
- **Collective anomalies**: A sequence of points that is anomalous as a whole, even if individual points are normal (e.g., a flatline in price for several hours).

For the NEPSE system, anomaly detection can be applied to:

- **Data quality**: Detect missing or corrupted ticks in the live feed.
- **Market surveillance**: Identify unusual trading activity that may precede announcements.
- **Risk management**: Flag extreme price movements that could trigger circuit breakers.
- **Feature engineering**: Create features indicating anomalous events (e.g., "was yesterday an anomaly?").

In this chapter, we will explore a range of anomaly detection techniques, from simple statistical rules to sophisticated deep learning models. We will use the NEPSE dataset to illustrate each method and discuss how to deploy them in production.

---

## 58.1 Types of Anomalies

Before diving into algorithms, it is essential to understand the nature of anomalies in time‑series.

### 58.1.1 Point Anomalies

A point anomaly is an individual data point that is far from the rest of the data. In stock prices, this could be a data entry error (e.g., a price of 10,000 NPR when the typical range is 500–600) or a genuine but extreme event (e.g., a 10% drop in one day). Point anomalies are often detected using statistical tests on the value itself.

### 58.1.2 Contextual Anomalies

A contextual anomaly is a point that is normal in the global sense but abnormal in a specific context. For example, a 2% intraday move might be unremarkable shortly after the market opens, but the same move during a typically quiet part of the trading session could be suspicious. Contextual anomalies require modelling the expected behaviour conditioned on time, season, or other variables.
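One simple way to condition on context is to score each value only against other values from the same context. The sketch below assumes a hypothetical `context` label per observation (for example, an intraday session bucket such as `"open"` or `"midday"`) and computes a modified Z‑score (Section 58.2.2) within each group using pandas `groupby`. The numbers are made up for illustration.

```python
import pandas as pd

def contextual_outliers(values, context, threshold=3.5):
    """Flag values whose modified Z-score is extreme *within their own context group*."""
    df = pd.DataFrame({"value": values, "context": context})

    def group_scores(x):
        median = x.median()
        mad = (x - median).abs().median()
        if mad == 0:
            return pd.Series(0.0, index=x.index)  # score undefined: treat group as normal
        return 0.6745 * (x - median) / mad

    scores = df.groupby("context")["value"].transform(group_scores)
    return scores.abs() > threshold

# Hypothetical 5-minute returns (%): a 0.5% move is typical at the open but not at midday
values = [0.48, 0.50, 0.52, 0.51, 0.49, 0.01, -0.01, 0.0, 0.005, 0.5]
context = ["open"] * 5 + ["midday"] * 5
flags = contextual_outliers(values, context)
print(flags.tolist())  # only the last value (0.5 at midday) is flagged
```

Scored globally, the midday 0.5% move would look ordinary because half the sample sits near 0.5%; scoring within the context is what exposes it.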

### 58.1.3 Collective Anomalies

A collective anomaly is a subsequence of the time‑series that is anomalous as a whole, even if each point individually is within normal range. Examples include a prolonged period of zero volatility (flatline) or a repeating pattern that deviates from the usual seasonality. Collective anomalies often require analysing the shape of the curve, not just individual values.
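A flatline like the one described above can be caught with a run‑length check instead of a per‑point test. This is a minimal sketch, assuming a pandas Series of prices; `flatline_mask` and its `min_run` parameter are hypothetical, and because thinly traded stocks can legitimately repeat the same price, `min_run` should be chosen with that in mind.

```python
import pandas as pd

def flatline_mask(series, min_run=5):
    """Flag every point that belongs to a run of at least `min_run` identical consecutive values."""
    # A new run starts whenever the value differs from the previous one
    run_id = (series != series.shift()).cumsum()
    # Length of the run that each point belongs to
    run_length = series.groupby(run_id).transform(len)
    return run_length >= min_run

prices = pd.Series([501.0, 502.5, 503.0, 503.0, 503.0, 503.0, 503.0, 504.0, 505.5])
print(flatline_mask(prices, min_run=5).tolist())
# [False, False, True, True, True, True, True, False, False]
```

No single 503.0 is unusual on its own; only the run as a whole is flagged, which is what makes it a collective anomaly.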

In financial markets, all three types occur. A single erroneous tick is a point anomaly. A flash crash (a rapid drop followed by a recovery) is a collective anomaly. A large price spike is expected on the day of an earnings announcement, but the same spike on an ordinary trading day with no news is a contextual anomaly.

---

## 58.2 Statistical Methods

Statistical methods are simple, fast, and often effective for point anomalies. Some assume that the data follows a particular distribution (usually Gaussian), while others rely only on quantiles; all of them flag points that deviate significantly from the bulk of the data.

### 58.2.1 Z‑Score

The Z‑score measures how many standard deviations a point is from the mean. For a normally distributed variable, values with |Z| > 3 are often considered outliers.

```python
import numpy as np
import pandas as pd

def zscore_outliers(data, threshold=3):
    mean = np.mean(data)
    std = np.std(data)
    z_scores = (data - mean) / std
    return np.abs(z_scores) > threshold

# Example on NEPSE daily returns
# (df is assumed to be a DataFrame of daily closing prices, one column per symbol)
returns = df['NABIL'].pct_change().dropna()
outliers = zscore_outliers(returns)
print(f"Detected {outliers.sum()} point anomalies via Z-score")
```

**Explanation:**  
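The Z‑score assumes the data is Gaussian and independent. Financial returns often have heavy tails, so a threshold of 3 might still produce many false positives; a higher threshold (e.g., 4) can be used to keep only extreme anomalies. Because the mean and standard deviation are computed from the same data, a few very large outliers also inflate the standard deviation and can hide one another, an effect known as masking.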

### 58.2.2 Modified Z‑Score

The modified Z‑score uses the median and median absolute deviation (MAD), which are robust to outliers. It is preferred for non‑normal or heavy‑tailed distributions.

```python
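def modified_zscore_outliers(data, threshold=3.5):
    median = np.median(data)
    mad = np.median(np.abs(data - median))
    if mad == 0:
        # More than half the values equal the median, so the score is undefined
        return np.zeros(len(data), dtype=bool)
    modified_z = 0.6745 * (data - median) / mad  # 0.6745 makes MAD comparable to std for normal data
    return np.abs(modified_z) > threshold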

outliers_mod = modified_zscore_outliers(returns)
print(f"Detected {outliers_mod.sum()} anomalies via modified Z-score")
```

**Explanation:**  
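The constant 0.6745 is the 75th percentile of the standard normal distribution. Dividing the MAD by 0.6745 (i.e., multiplying it by about 1.4826) gives a consistent estimator of the standard deviation for normal data, so the modified Z‑score is on the same scale as the ordinary Z‑score. A threshold of 3.5 is a commonly recommended value.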
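A toy example makes the difference concrete. With the population standard deviation (NumPy's `np.std` default), no point in a sample of n values can have |Z| larger than √(n − 1), so in a sample of ten values the ordinary Z‑score with threshold 3 can never flag anything, however extreme the outlier. The median and MAD are barely moved by the outlier, so the modified Z‑score flags it. The two functions are repeated (with the MAD = 0 guard) so the snippet runs on its own; the data are made up.

```python
import numpy as np

def zscore_outliers(data, threshold=3):
    z_scores = (data - np.mean(data)) / np.std(data)
    return np.abs(z_scores) > threshold

def modified_zscore_outliers(data, threshold=3.5):
    median = np.median(data)
    mad = np.median(np.abs(data - median))
    if mad == 0:
        return np.zeros(len(data), dtype=bool)
    return np.abs(0.6745 * (data - median) / mad) > threshold

# Ten toy values with one obvious outlier
data = np.array([1, 2, 3, 2, 1, 2, 3, 2, 1, 100], dtype=float)

print(zscore_outliers(data).any())                     # False: |Z| of 100 is only about 2.999
print(np.flatnonzero(modified_zscore_outliers(data)))  # [9]: modified Z of 100 is about 66
```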

### 58.2.3 IQR (Interquartile Range)

The IQR method defines outliers as points below Q1 – 1.5×IQR or above Q3 + 1.5×IQR. It is robust and does not assume normality.

```python
def iqr_outliers(data, k=1.5):
    Q1 = np.percentile(data, 25)
    Q3 = np.percentile(data, 75)
    IQR = Q3 - Q1
    lower_bound = Q1 - k * IQR
    upper_bound = Q3 + k * IQR
    return (data < lower_bound) | (data > upper_bound)

outliers_iqr = iqr_outliers(returns)
print(f"Detected {outliers_iqr.sum()} anomalies via IQR")
```

**Explanation:**  
The multiplier `k` controls sensitivity. For financial data, `k=3` might be used to detect extreme events, while `k=1.5` captures mild outliers.
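To see how `k` changes sensitivity, here is a worked example on a made‑up array, chosen so that NumPy's default (linear interpolation) percentiles land exactly on data points: Q1 = 3, Q3 = 7, IQR = 4.

```python
import numpy as np

def iqr_outliers(data, k=1.5):
    Q1, Q3 = np.percentile(data, [25, 75])
    IQR = Q3 - Q1
    return (data < Q1 - k * IQR) | (data > Q3 + k * IQR)

data = np.array([1, 2, 3, 4, 5, 6, 7, 14, 30], dtype=float)

print(data[iqr_outliers(data, k=1.5)])  # fences [-3, 13]: [14. 30.]
print(data[iqr_outliers(data, k=3)])    # fences [-9, 19]: [30.]
```

Doubling `k` widens each fence by another 1.5 × IQR, so the milder outlier (14) is no longer flagged.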

---

## 58.3 Machine Learning Methods

Machine learning models can capture more complex patterns and are suitable for high‑dimensional data (e.g., multiple features per time step). Many anomaly detection algorithms are unsupervised, learning the normal behaviour from the data and flagging deviations.

### 58.3.1 Isolation Forest

Isolation Forest isolates anomalies by randomly partitioning the data. Anomalies are few and different, so they require fewer splits to isolate. The algorithm builds an ensemble of trees, and the anomaly score is based on the average path length to isolation.

```python
from sklearn.ensemble import IsolationForest

# Prepare data: use multiple features (e.g., returns, volume, volatility)
features = df[['return_NABIL', 'volume_NABIL', 'volatility_NABIL']].dropna()
X = features.values

iso_forest = IsolationForest(contamination=0.01, random_state=42)  # assume 1% anomalies
preds = iso_forest.fit_predict(X)  # -1 for anomaly, 1 for normal
anomalies = preds == -1
print(f"Detected {anomalies.sum()} anomalies with Isolation Forest")
```

**Explanation:**  
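`contamination` is the expected proportion of anomalies in the data; it only sets the threshold applied to the anomaly scores, not the scores themselves. If unknown, you can leave it at `'auto'` (scikit‑learn's default, which uses the threshold from the original paper) or tune it. The algorithm scales well to many features, does not assume any distribution, and, because each split uses a single feature, does not require feature scaling.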
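To see how the scores relate to the threshold, the sketch below fits Isolation Forest on synthetic two‑feature data with three planted extremes. `score_samples` returns the raw score (lower means easier to isolate), and `predict` labels a point −1 when `decision_function`, i.e. the raw score shifted by an offset derived from `contamination`, is negative. The data are synthetic; with real NEPSE features you would inspect the lowest‑scoring dates instead.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
# 500 synthetic "normal" days with two standardised features, plus three planted extremes
normal = rng.normal(0, 1, size=(500, 2))
planted = np.array([[8.0, 8.0], [-8.0, 8.0], [8.0, -8.0]])
X = np.vstack([normal, planted])

iso = IsolationForest(n_estimators=200, random_state=42).fit(X)  # contamination='auto'
scores = iso.score_samples(X)  # lower = more abnormal (shorter average path length)
labels = iso.predict(X)        # -1 where decision_function(X) < 0

most_abnormal = np.argsort(scores)[:3]
print(sorted(most_abnormal.tolist()))  # the three planted rows: [500, 501, 502]
```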

### 58.3.2 One‑Class SVM

One‑Class SVM learns a boundary around the normal data and classifies points outside as anomalies. It is effective when the normal data is clustered.

```python
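from sklearn.svm import OneClassSVM
from sklearn.preprocessing import StandardScaler

# The RBF kernel is distance-based, so put features (e.g., volume vs. returns) on a common scale
X_scaled = StandardScaler().fit_transform(X)

svm = OneClassSVM(nu=0.01, kernel='rbf', gamma='auto')
preds = svm.fit_predict(X_scaled)  # -1 for anomaly, 1 for normal
anomalies = preds == -1
print(f"Detected {anomalies.sum()} anomalies with One-Class SVM")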
```

**Explanation:**  
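`nu` is an upper bound on the fraction of training points treated as anomalies (and a lower bound on the fraction of support vectors), so it plays a role similar to `contamination`. The RBF kernel can capture non‑linear boundaries, but because it is distance‑based the features must be standardised first. Training cost grows quickly with the number of samples, so One‑Class SVM can be slow on large datasets.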
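Putting the scaler and the model in one pipeline keeps training and scoring consistent. The sketch below uses synthetic data with hypothetical feature scales (a return‑like column and a volume‑like column) and wraps `StandardScaler` and `OneClassSVM` with scikit‑learn's `make_pipeline`, so the scaler fitted on history is applied automatically to new rows. It uses `gamma="scale"` (scikit‑learn's default) rather than `'auto'`.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(1)
# Synthetic history: a small-scale return-like feature and a large-scale volume-like feature
returns = rng.normal(0, 0.01, 500)
volume = rng.normal(100_000, 10_000, 500)
X_hist = np.column_stack([returns, volume])

# The scaler learned on history is reused automatically when scoring new rows
detector = make_pipeline(StandardScaler(), OneClassSVM(nu=0.01, kernel="rbf", gamma="scale"))
detector.fit(X_hist)

X_new = np.array([
    [0.0, 100_000],   # a typical day
    [0.10, 200_000],  # 10% move on double the usual volume
])
print(detector.predict(X_new))            # -1 marks an anomaly
print(detector.decision_function(X_new))  # signed distance to the learned boundary
```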

### 58.3.3 Local Outlier Factor (LOF)

LOF measures the local density deviation of a point compared to its neighbours. Points with substantially lower density than neighbours are considered outliers.

```python
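from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import StandardScaler

# LOF compares distances between points, so standardise the features first
X_scaled = StandardScaler().fit_transform(X)

lof = LocalOutlierFactor(n_neighbors=20, contamination=0.01)
preds = lof.fit_predict(X_scaled)  # -1 for anomaly, 1 for normal
anomalies = preds == -1
print(f"Detected {anomalies.sum()} anomalies with LOF")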
```

**Explanation:**  
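LOF is local: it works well when the data has regions of varying density. The parameter `n_neighbors` controls the locality. In its default mode, `fit_predict` only labels the data the model was fitted on; to score new observations, construct the model with `novelty=True`, fit it on historical data, and call `predict` on the new points.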
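The two modes look like this in scikit‑learn. This is a sketch on synthetic, already standardised data: the first model labels only its own training points (with scores in `negative_outlier_factor_`), and the second, built with `novelty=True`, scores points it has never seen.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(2)
X_train = rng.normal(0, 1, size=(300, 2))  # synthetic, already standardised

# Outlier-detection mode: labels and scores only for the fitted data
lof = LocalOutlierFactor(n_neighbors=20)
train_labels = lof.fit_predict(X_train)
train_scores = lof.negative_outlier_factor_  # close to -1 for inliers, much lower for outliers

# Novelty mode: fit once on history, then score unseen points
lof_novel = LocalOutlierFactor(n_neighbors=20, novelty=True).fit(X_train)
X_new = np.array([[0.0, 0.0], [6.0, 6.0]])
print(lof_novel.predict(X_new))  # [ 1 -1]
```

In novelty mode `fit_predict` is not available, which guards against accidentally scoring the training data as if it were new.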

---

## 58.4 Deep Learning Methods

Deep learning models can capture complex temporal dependencies and are particularly suited for collective and contextual anomalies.

### 58.4.1 Autoencoders

An autoencoder is a neural network trained to reconstruct its input. The assumption is that normal data can be reconstructed well, while anomalies result in high reconstruction error. The model is trained on normal data only (or on all data but with the assumption that anomalies are rare).

**Example: Simple dense autoencoder for multivariate time steps**

```python
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
from sklearn.preprocessing import StandardScaler

class Autoencoder(nn.Module):
    def __init__(self, input_dim, encoding_dim):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 64),
            nn.ReLU(),
            nn.Linear(64, encoding_dim)  # linear bottleneck
        )
        self.decoder = nn.Sequential(
            nn.Linear(encoding_dim, 64),
            nn.ReLU(),
            nn.Linear(64, input_dim)
        )

    def forward(self, x):
        encoded = self.encoder(x)
        decoded = self.decoder(encoded)
        return decoded

# Prepare data: each sample is a vector of features at a single time step
# (if you want to capture temporal context, you need to include lags).
# Standardise first so that large-valued features such as volume do not dominate the MSE.
X_scaled = StandardScaler().fit_transform(features.values)
X_train = torch.tensor(X_scaled, dtype=torch.float32)

# The bottleneck must be narrower than the input (here 2 < 3 features);
# otherwise the network can learn to copy its input and reconstruct anomalies too.
model = Autoencoder(input_dim=X_train.shape[1], encoding_dim=2)
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Train on normal data (assuming most is normal)
epochs = 100
batch_size = 64
dataset = TensorDataset(X_train)
dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

model.train()
for epoch in range(epochs):
    for (x,) in dataloader:
        reconstructed = model(x)
        loss = criterion(reconstructed, x)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    if epoch % 10 == 0:
        print(f"Epoch {epoch}, last batch loss {loss.item():.4f}")

# Compute reconstruction error on all data
model.eval()
with torch.no_grad():
    reconstructions = model(X_train)
    mse = torch.mean((X_train - reconstructions)**2, dim=1).numpy()

# Set threshold (e.g., 95th percentile, which flags about 5% of points by construction)
threshold = np.percentile(mse, 95)
anomalies = mse > threshold
print(f"Detected {anomalies.sum()} anomalies with autoencoder")
```

**Explanation:**  
The autoencoder learns a compressed representation of normal data. Anomalies will have higher reconstruction error. The threshold can be set based on a validation set with known anomalies or a percentile of the training errors.

### 58.4.2 LSTM Autoencoder

For time‑series, it's natural to use recurrent networks. An LSTM autoencoder reads the input sequence, compresses it into a context vector, and then reconstructs the sequence. It can capture temporal dependencies.

```python
import torch
import torch.nn as nn

class LSTMAutoencoder(nn.Module):
    def __init__(self, seq_len, n_features, encoding_dim, hidden_dim=64):
        super().__init__()
        self.seq_len = seq_len
        # Encoder: an LSTM reads the window; its final hidden state is compressed
        self.encoder_lstm = nn.LSTM(input_size=n_features, hidden_size=hidden_dim, batch_first=True)
        self.encoder_fc = nn.Linear(hidden_dim, encoding_dim)
        # Decoder: the encoding is fed to a second LSTM at every time step
        self.decoder_lstm = nn.LSTM(input_size=encoding_dim, hidden_size=hidden_dim, batch_first=True)
        # Linear head projects each decoder hidden state back to the feature space
        self.output_fc = nn.Linear(hidden_dim, n_features)

    def forward(self, x):
        # x: (batch, seq_len, n_features)
        _, (hidden, _) = self.encoder_lstm(x)          # hidden: (1, batch, hidden_dim)
        encoding = self.encoder_fc(hidden[-1])         # (batch, encoding_dim)
        # Repeat the encoding along the time axis: (batch, seq_len, encoding_dim)
        decoder_input = encoding.unsqueeze(1).repeat(1, self.seq_len, 1)
        decoded, _ = self.decoder_lstm(decoder_input)  # (batch, seq_len, hidden_dim)
        return self.output_fc(decoded)                 # (batch, seq_len, n_features)

# Shape check on a random batch of 8 windows, each 20 steps of 3 features
model = LSTMAutoencoder(seq_len=20, n_features=3, encoding_dim=8)
x = torch.randn(8, 20, 3)
assert model(x).shape == x.shape

# Training works as for the dense autoencoder, with MSE between model(x) and x.
# In Keras, the same architecture is usually built from LSTM, RepeatVector and
# TimeDistributed(Dense) layers.
```

**Explanation:**  
The encoder LSTM compresses the sequence into a fixed‑size vector. The decoder LSTM then reconstructs the sequence step by step. Anomalies yield higher reconstruction error.
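To train and score an LSTM autoencoder you need some bookkeeping: slicing the series into overlapping windows, and mapping window‑level errors back to individual dates. The helpers below are a NumPy‑only sketch; `reconstruct` is a hypothetical stand‑in for any trained model that maps an array of windows to reconstructions of the same shape (for a PyTorch model, something like `lambda w: model(torch.tensor(w)).detach().numpy()`). Giving each date the largest error of any window that contains it is one design choice; averaging is another.

```python
import numpy as np

def make_windows(values, seq_len):
    """Stack overlapping windows: (T, n_features) -> (T - seq_len + 1, seq_len, n_features)."""
    values = np.asarray(values, dtype=np.float32)
    if values.ndim == 1:
        values = values[:, None]
    n_windows = len(values) - seq_len + 1
    return np.stack([values[i:i + seq_len] for i in range(n_windows)])

def window_errors(reconstruct, windows):
    """Mean squared reconstruction error of each window."""
    recon = reconstruct(windows)
    return ((windows - recon) ** 2).mean(axis=(1, 2))

def errors_per_time_step(window_err, seq_len, n_steps):
    """Assign each time step the largest error of any window that contains it."""
    step_err = np.zeros(n_steps)
    for start, err in enumerate(window_err):
        end = start + seq_len
        step_err[start:end] = np.maximum(step_err[start:end], err)
    return step_err

# Toy check: a flat series with one spike and a "model" that always predicts zeros
series = np.zeros(12)
series[5] = 3.0
windows = make_windows(series, seq_len=3)                        # shape (10, 3, 1)
err = window_errors(lambda w: np.zeros_like(w), windows)
step_err = errors_per_time_step(err, seq_len=3, n_steps=len(series))
print(np.flatnonzero(err))                                       # windows starting at 3, 4, 5
```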

### 58.4.3 Variational Autoencoder (VAE)

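VAEs learn a probabilistic latent space and can score anomalies by reconstruction probability rather than reconstruction error. Because this score takes into account the variability the model expects for each input, it can be less sensitive to noise than the plain reconstruction error of a deterministic autoencoder.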

```python
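# Using a library like `pyod`, which includes a VAE for anomaly detection
from pyod.models.vae import VAE

# PyOD works on NumPy arrays, not torch tensors
X_np = features.values

# These argument names follow older PyOD releases; newer releases rename the
# layer-size arguments, so check the documentation of your installed version.
vae = VAE(encoder_neurons=[64, 32], decoder_neurons=[32, 64], contamination=0.01)
vae.fit(X_np)
y_pred = vae.predict(X_np)  # 0 normal, 1 anomaly
anomalies = y_pred == 1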
```

**Explanation:**  
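A VAE assumes a probabilistic generative process. In the original formulation, the reconstruction probability (the likelihood of the data given its latent encoding) is used as the anomaly score; library implementations may instead score by reconstruction error, so check which score your library reports.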

---

## 58.5 Time‑Series Specific Methods

### 58.5.1 Seasonal Decomposition + Residual Analysis

A classic approach for univariate time‑series with seasonality: decompose the series into trend, seasonal, and residual components, then detect anomalies in the residual. The STL (Seasonal‑Trend decomposition using Loess) is a robust method.

```python
from statsmodels.tsa.seasonal import STL
import matplotlib.pyplot as plt

# Assume daily data with weekly seasonality (period=5 for NEPSE trading days)
series = df['NABIL_close'].dropna()
stl = STL(series, period=5, robust=True)
result = stl.fit()

# Extract residual
residual = result.resid

# Detect outliers in residual (e.g., using IQR)
outliers = iqr_outliers(residual, k=3)
print(f"Detected {outliers.sum()} anomalies in residual")

# Plot
fig = result.plot()
plt.show()
```

**Explanation:**  
The residual contains the irregular component after removing trend and seasonality. Anomalies will appear as spikes in the residual. This method captures contextual anomalies because it accounts for expected seasonal patterns.

### 58.5.2 Twitter's AnomalyDetection

Twitter's AnomalyDetection R package implements Seasonal Hybrid ESD (S‑H‑ESD). It removes the seasonal component with STL, subtracts the median, and applies the generalized extreme Studentized deviate (ESD) test to what remains, using the median and median absolute deviation in place of the mean and standard deviation. Its main parameters are `max_anoms` (the maximum fraction of points that may be flagged, e.g. 0.02) and `direction` (`'pos'`, `'neg'`, or `'both'`). The reference implementation is written in R; unofficial Python ports exist, but their package names and APIs differ, so check the documentation of whichever one you install.

---

## 58.6 Evaluation

When ground truth labels are available (e.g., manually flagged erroneous ticks), we can evaluate anomaly detectors using standard classification metrics:

- **Precision** = TP / (TP + FP)
- **Recall** = TP / (TP + FN)
- **F1‑score** = 2 * (Precision * Recall) / (Precision + Recall)

For unlabeled data, we can use domain knowledge to check if detected anomalies coincide with known events (earnings announcements, circuit breaker triggers).

**Example evaluation:**

```python
from sklearn.metrics import classification_report

# Assuming df['is_anomaly'] is a boolean column of true labels (True if anomaly) from an
# external source, and outliers is a boolean Series from some detector: align them on dates
true_labels = df['is_anomaly'].reindex(outliers.index).fillna(False).astype(bool)
pred_labels = outliers.astype(bool)

print(classification_report(true_labels, pred_labels))
```
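A small worked example with made‑up labels shows how the formulas map onto counts, and that scikit‑learn's metric functions give the same values:

```python
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score

# Hypothetical labels for eight days (True = anomaly)
true_labels = np.array([False, False, True, True, False, True, False, False])
pred_labels = np.array([False, True, True, False, False, True, False, False])

tp = np.sum(true_labels & pred_labels)    # 2 (positions 2 and 5)
fp = np.sum(~true_labels & pred_labels)   # 1 (position 1)
fn = np.sum(true_labels & ~pred_labels)   # 1 (position 3)

precision = tp / (tp + fp)                           # 2/3
recall = tp / (tp + fn)                              # 2/3
f1 = 2 * precision * recall / (precision + recall)   # 2/3

# scikit-learn agrees
assert np.isclose(precision, precision_score(true_labels, pred_labels))
assert np.isclose(recall, recall_score(true_labels, pred_labels))
assert np.isclose(f1, f1_score(true_labels, pred_labels))
print(round(precision, 3), round(recall, 3), round(f1, 3))  # 0.667 0.667 0.667
```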

---

## 58.7 Real‑Time Anomaly Detection

For production, anomaly detection must run in real time on streaming data. The methods can be adapted:

- **Statistical methods**: maintain rolling mean/std and flag points as they arrive.
- **Isolation Forest**: can be trained offline and used online (scoring is fast).
- **Autoencoder**: can score each new point (or window of recent points) with a single forward pass.

**Example: Rolling Z‑score on a stream**

```python
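from collections import deque
import numpy as np

class RollingZScore:
    def __init__(self, window=100, threshold=3, min_history=20):
        self.threshold = threshold
        self.min_history = min_history      # don't score until enough history exists
        self.values = deque(maxlen=window)  # oldest values drop out automatically

    def update(self, value):
        # Score the new value against the *preceding* window, so that a spike
        # cannot inflate the mean and standard deviation used to judge it
        is_anomaly = False
        if len(self.values) >= self.min_history:
            std = np.std(self.values)
            if std > 0:
                z = (value - np.mean(self.values)) / std
                is_anomaly = abs(z) > self.threshold
        self.values.append(value)
        return is_anomaly

# Simulate stream (stream is assumed to be an iterable of incoming prices or returns)
detector = RollingZScore()
for price in stream:
    if detector.update(price):
        print(f"Anomaly detected: {price}")
```

**Explanation:**  
This class maintains a sliding window of recent values, scores each new value against the window that precedes it, and only then adds the value to the window. It is suitable for detecting point anomalies in real time; the `min_history` parameter suppresses alerts until enough values have been seen for the mean and standard deviation to be meaningful.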
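The Isolation Forest approach from the list above can be sketched the same way: fit once offline, then score each arriving feature vector with `decision_function`. The history here is synthetic and `score_tick` is a hypothetical helper; in practice you would likely refit the model periodically as market conditions change.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Offline: fit on historical feature vectors (synthetic here)
rng = np.random.default_rng(3)
history = rng.normal(0, 1, size=(1000, 3))
model = IsolationForest(contamination=0.01, random_state=0).fit(history)

def score_tick(features):
    """Online: score one incoming feature vector; a negative score means anomalous."""
    x = np.asarray(features, dtype=float).reshape(1, -1)
    return float(model.decision_function(x)[0])

for tick in ([0.1, -0.2, 0.0], [9.0, 9.0, 9.0]):
    s = score_tick(tick)
    print(tick, "anomaly" if s < 0 else "normal", round(s, 3))
```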

---

## 58.8 Interpretation of Anomalies

Detecting an anomaly is only half the work; understanding why it happened is crucial. Domain knowledge must be applied:

- **Market events**: Check if the anomaly coincides with news, earnings, or regulatory changes.
- **Data issues**: Examine raw data for obvious errors (e.g., price outside bid‑ask spread).
- **Model limitations**: Some anomalies may be false positives caused by model inadequacy (e.g., failure to capture volatility changes).

Visualisation helps: plot the time series with anomalies highlighted, and overlay relevant context (e.g., trading volume, news indicators).

```python
import matplotlib.pyplot as plt

# outliers: boolean Series aligned with series (e.g., from the STL residual analysis above)
flagged = series[outliers]

plt.figure(figsize=(12, 6))
plt.plot(series.index, series.values, label='Price')
plt.scatter(flagged.index, flagged.values, color='red', label='Anomaly')
plt.legend()
plt.show()
```

---

## 58.9 Case Study: Detecting Anomalies in NEPSE Volume

Let's apply some methods to detect unusual volume spikes in NEPSE stocks. Volume anomalies can indicate institutional activity or data errors.

```python
# Load volume data for a stock
volume = df['NABIL_volume'].dropna()

# Method 1: IQR on log volume (volume is often approximately log-normally distributed)
log_vol = np.log(volume + 1)
outliers_iqr = iqr_outliers(log_vol, k=2.5)

# Method 2: Isolation Forest using multiple features (volume, price change, volatility)
features = pd.DataFrame({
    'volume': volume,
    'return': df['NABIL_return'].loc[volume.index],
    'volatility': df['NABIL_volatility'].loc[volume.index]
}).dropna()
X = features.values

iso = IsolationForest(contamination=0.02, random_state=42)
preds = iso.fit_predict(X)
# Keep the predictions indexed by date so they can be aligned with other detectors
outliers_if = pd.Series(preds == -1, index=features.index)

# Compare
print(f"IQR outliers: {outliers_iqr.sum()}")
print(f"Isolation Forest outliers: {outliers_if.sum()}")

# Check if they coincide (on the dates that both methods scored)
common = outliers_iqr.reindex(features.index, fill_value=False) & outliers_if
print(f"Common: {common.sum()}")
```

**Explanation:**  
Volume is often approximately log‑normally distributed, so a log transform makes the IQR rule better suited to it. Using multiple features (return and volatility) helps detect contextual anomalies (e.g., high volume on a low‑volatility day).

---

## Chapter Summary

In this chapter, we explored anomaly detection techniques and their application to the NEPSE stock prediction system. We covered:

- The three types of anomalies: point, contextual, and collective.
- Statistical methods: Z‑score, modified Z‑score, IQR – simple and fast for point anomalies.
- Machine learning methods: Isolation Forest, One‑Class SVM, Local Outlier Factor – suitable for multivariate data.
- Deep learning methods: autoencoders, LSTM autoencoders, VAEs – capture complex patterns and temporal dependencies.
- Time‑series specific methods: STL decomposition, Twitter's AnomalyDetection – incorporate seasonality.
- Evaluation metrics and real‑time implementation considerations.
- Interpretation of anomalies using domain knowledge and visualisation.

Anomaly detection is a vital component of a robust prediction system, ensuring data quality, enabling rapid response to market events, and providing input features for models. In the next chapter, we will discuss **Causal Inference**, exploring how to go beyond correlation to understand cause‑effect relationships in financial time‑series.

---

**End of Chapter 58**