# Lesson 17 - Anomaly Detection


## Objectives
- Fit a Gaussian model for anomaly detection.
- Compute anomaly scores and thresholds.
- Visualize anomalies on a 2D dataset.


## From the notes

**Anomaly detection**
- Fit $p(x)$ to normal data and flag low-probability points.
- Use multivariate Gaussian to model features.
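For reference, here is the standard multivariate Gaussian density in $d$ dimensions, the maximum-likelihood estimates of its parameters from $m$ normal examples $x^{(1)}, \dots, x^{(m)}$, and the flagging rule. The threshold symbol $\varepsilon$ is our own notation. Note that `np.cov` divides by $m-1$ by default rather than $m$; with 200 points the difference is negligible.

$$
p(x; \mu, \Sigma) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu)^\top \Sigma^{-1} (x-\mu)\right)
$$

$$
\mu = \frac{1}{m}\sum_{i=1}^{m} x^{(i)}, \qquad \Sigma = \frac{1}{m}\sum_{i=1}^{m} \left(x^{(i)}-\mu\right)\left(x^{(i)}-\mu\right)^\top
$$

$$
\text{flag } x \text{ as anomalous if } p(x; \mu, \Sigma) < \varepsilon
$$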

_TODO: Validate anomaly detection formulas in the CS229 main notes PDF._


## Intuition
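Anomalies are points that fall in low-density regions of the distribution that generated the normal data. A Gaussian model gives a simple baseline: estimate $\mu$ and $\Sigma$ from normal examples, then flag any point whose density $p(x)$ falls below a threshold $\varepsilon$.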
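Because the Gaussian density is a decreasing function of the squared Mahalanobis distance $(x-\mu)^\top\Sigma^{-1}(x-\mu)$, thresholding $p(x)$ amounts to drawing an ellipse around $\mu$ and flagging everything outside it. The sketch below checks this numerically on its own synthetic sample (not the lesson's dataset); the helper name `mahalanobis_sq` is hypothetical.

```python
import numpy as np

def mahalanobis_sq(X, mean, cov):
    """Squared Mahalanobis distance of each row of X from `mean` (hypothetical helper)."""
    diff = X - mean
    # Solve cov @ z = diff^T rather than forming the inverse explicitly
    z = np.linalg.solve(cov, diff.T).T
    return np.sum(diff * z, axis=1)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
mu = X.mean(axis=0)
cov = np.cov(X.T)

d = X.shape[1]
m2 = mahalanobis_sq(X, mu, cov)
# Gaussian density written in terms of the squared Mahalanobis distance
p = np.exp(-0.5 * m2) / np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))

# Lowest density corresponds to the largest Mahalanobis distance
print(np.argmin(p) == np.argmax(m2))  # True
```

In practice this means a density threshold and a distance threshold rank points identically; only the scale of the cutoff differs.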


## Data
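We sample 200 normal points from a standard 2D Gaussian and inject 10 outliers drawn uniformly from $[-6, 6]^2$. Because the outliers are uniform, a few may land inside the normal cluster and be indistinguishable from normal points.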


In [None]:
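import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)

# 200 normal points from a standard 2D Gaussian, plus 10 uniform outliers
X_norm = np.random.multivariate_normal([0, 0], np.eye(2), 200)
X_out = np.random.uniform(low=-6, high=6, size=(10, 2))
X = np.vstack([X_norm, X_out])

# Fit mu and Sigma on the normal points only (training data assumed anomaly-free)
mu = X_norm.mean(axis=0)
cov = np.cov(X_norm.T)

def gaussian_pdf(X, mean, cov):
    """Multivariate Gaussian density evaluated at each row of X."""
    d = X.shape[1]
    diff = X - mean
    # Use the true inverse so it stays consistent with det(cov) below
    inv = np.linalg.inv(cov)
    # Squared Mahalanobis distance (x - mu)^T Sigma^{-1} (x - mu) for each row
    mahal_sq = np.sum((diff @ inv) * diff, axis=1)
    norm_const = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * mahal_sq) / norm_const

scores = gaussian_pdf(X, mu, cov)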
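
In higher dimensions the raw density can underflow to zero for every point, which makes thresholding meaningless. A common remedy, sketched below, is to score with the log-density and use `np.linalg.slogdet` for the log-determinant. The function name `gaussian_logpdf` is our own; thresholding $\log p(x)$ at $\log\varepsilon$ flags the same points as thresholding $p(x)$ at $\varepsilon$, since $\log$ is monotone.

```python
import numpy as np

def gaussian_logpdf(X, mean, cov):
    """Log of the multivariate Gaussian density for each row of X."""
    d = X.shape[1]
    diff = X - mean
    # Squared Mahalanobis distance via a linear solve (no explicit inverse)
    mahal_sq = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    # slogdet returns (sign, log|det|); sign must be +1 for a valid covariance
    sign, logdet = np.linalg.slogdet(cov)
    if sign <= 0:
        raise ValueError("covariance matrix must be positive definite")
    return -0.5 * (d * np.log(2 * np.pi) + logdet + mahal_sq)

# Standard 2D Gaussian at the origin: p(0) = 1 / (2*pi)
x0 = np.zeros((1, 2))
print(np.exp(gaussian_logpdf(x0, np.zeros(2), np.eye(2)))[0])  # ≈ 0.1592

# In 1000 dimensions the density itself underflows, but its log stays finite
D = 1000
xD = np.ones((1, D))
print(gaussian_logpdf(xD, np.zeros(D), np.eye(D))[0])  # ≈ -1418.94
```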


## Experiments


In [None]:
# Flag the 5% of points with the lowest density as anomalies
threshold = np.percentile(scores, 5)
anomalies = scores < threshold
anomalies.sum()  # number of flagged points
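
Because the data are synthetic, we know which rows were injected (the last 10 rows of `X`), so we can check how the 5th-percentile threshold performs. The sketch below regenerates the same data with the same seed so that it runs on its own; `y_true` and the helper `precision_recall` are illustrative names introduced here. Uniform outliers that happen to land inside the normal cluster will show up as misses.

```python
import numpy as np

# Regenerate the lesson's data with the same seed and sampling calls
np.random.seed(42)
X_norm = np.random.multivariate_normal([0, 0], np.eye(2), 200)
X_out = np.random.uniform(low=-6, high=6, size=(10, 2))
X = np.vstack([X_norm, X_out])

# Illustrative ground-truth labels: 1 for the 10 injected rows, 0 otherwise
y_true = np.r_[np.zeros(len(X_norm), dtype=int), np.ones(len(X_out), dtype=int)]

# Gaussian fitted on the normal rows, as in the lesson
mu = X_norm.mean(axis=0)
cov = np.cov(X_norm.T)
diff = X - mu
mahal_sq = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
scores = np.exp(-0.5 * mahal_sq) / np.sqrt((2 * np.pi) ** 2 * np.linalg.det(cov))

threshold = np.percentile(scores, 5)
y_pred = (scores < threshold).astype(int)

def precision_recall(y_true, y_pred):
    """Precision and recall for the anomaly class (hypothetical helper)."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    precision = tp / (tp + fp) if tp + fp > 0 else 0.0
    recall = tp / (tp + fn) if tp + fn > 0 else 0.0
    return precision, recall

precision, recall = precision_recall(y_true, y_pred)
print(f"flagged={y_pred.sum()}  precision={precision:.2f}  recall={recall:.2f}")
```

With 210 points, the 5th percentile flags 11 of them, one more than the number injected, so recall can reach 1.0 only at the cost of at least one false positive.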


## Visualizations


In [None]:
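# Scatter plot: flagged points in red, the rest in blue
plt.figure(figsize=(6,4))
plt.scatter(X[:,0], X[:,1], c=anomalies.astype(int), cmap="coolwarm", alpha=0.7)
plt.title("Anomaly detection (red = flagged)")
plt.xlabel("x1")
plt.ylabel("x2")
plt.show()

# Histogram of log10 p(x): a log scale spreads out the tiny densities of outliers
plt.figure(figsize=(6,4))
plt.hist(np.log10(scores), bins=30, alpha=0.7)
plt.axvline(np.log10(threshold), color="red", linestyle="--")
plt.title("Anomaly score distribution")
plt.xlabel("log10 p(x)")
plt.ylabel("count")
plt.show()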


## Takeaways
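- A Gaussian density fitted to normal data flags points in low-density regions as anomalies.
- The threshold $\varepsilon$ can be set from a percentile of the scores, which implicitly assumes a known contamination rate, or tuned on a labeled validation set (for example, by maximizing F1).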
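
One common way to tune $\varepsilon$ on a small labeled validation set is to scan candidate thresholds and keep the one with the best F1 score, since accuracy is uninformative when anomalies are rare. The sketch below assumes validation densities `p_val` and labels `y_val` (1 = anomaly) are available; both names are hypothetical, and the toy values are made up purely for illustration.

```python
import numpy as np

def select_threshold(p_val, y_val):
    """Return (epsilon, f1) maximizing F1 for the rule 'flag if p(x) < epsilon'."""
    best_eps, best_f1 = None, -1.0
    # Candidates: midpoints between consecutive sorted unique scores, plus one
    # value above the maximum so that "flag everything" is also considered
    s = np.unique(p_val)
    candidates = np.append((s[:-1] + s[1:]) / 2, s[-1] + 1.0)
    for eps in candidates:
        pred = p_val < eps
        tp = np.sum(pred & (y_val == 1))
        fp = np.sum(pred & (y_val == 0))
        fn = np.sum(~pred & (y_val == 1))
        if tp == 0:
            continue  # F1 is zero without any true positives
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:
            best_eps, best_f1 = eps, f1
    return best_eps, best_f1

# Toy validation data (illustrative): two low-density anomalies among normal points
p_val = np.array([0.001, 0.004, 0.05, 0.08, 0.12, 0.15])
y_val = np.array([1, 1, 0, 0, 0, 0])
eps, f1 = select_threshold(p_val, y_val)
print(f"epsilon={eps:.3f}  F1={f1:.2f}")  # epsilon=0.027  F1=1.00
```

Using midpoints between observed scores keeps the candidate list small while still covering every distinct way of splitting the validation set.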


## Explain it in an interview
- Explain how you would choose an anomaly threshold.
- Describe when anomaly detection is preferred over supervised classification.


## Exercises
- Use a diagonal covariance instead of full covariance.
- Inject more subtle anomalies and evaluate detection.
- Implement precision/recall evaluation for anomalies.
