
# Autoencoder-Based Anomaly Detection on Scikit-learn Data

**Author**: Ammar Yousuf Abrahani  
**Course/Project**: A novel Deep Q Learning Anomaly Detection 

---

### Description:
This notebook evaluates a basic feedforward Autoencoder for anomaly detection on a synthetic 2-D dataset generated with scikit-learn's `make_blobs`. Anomalies are injected manually as points drawn uniformly from the square [-6, 6] × [-6, 6]. The Autoencoder is trained to reconstruct its input using only the normal (inlier) points, and points with a high reconstruction error (mean squared error, MSE) are flagged as anomalies.
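In symbols, the scoring and decision rule used in Step 3 can be written as follows. Here $x_i \in \mathbb{R}^d$ is an input, $\hat{x}_i$ its reconstruction, $c$ the assumed outlier fraction (`outliers_fraction`), and $Q_{1-c}$ the empirical $(1-c)$ quantile of the $n$ scores, which is what `np.percentile(mse, 100 * (1 - outliers_fraction))` computes:

$$
e_i = \frac{1}{d}\sum_{j=1}^{d}\left(x_{ij} - \hat{x}_{ij}\right)^2,
\qquad
\tau = Q_{1-c}\left(\{e_1, \dots, e_n\}\right),
\qquad
\hat{y}_i = \mathbb{1}\left[e_i > \tau\right]
$$

A prediction $\hat{y}_i = 1$ marks point $i$ as an anomaly.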

---

### References and Source Links:
- [TensorFlow Keras Documentation](https://www.tensorflow.org/api_docs/python/tf/keras)
- [Keras Autoencoder Example (TensorFlow Guide)](https://www.tensorflow.org/tutorials/generative/autoencoder)
- [scikit-learn make_blobs](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_blobs.html)
- [scikit-learn F1-score Metric](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html)

---

> **Disclaimer:** This notebook was created for academic and research purposes. Please cite all libraries and external sources properly. Do not reuse or submit without proper attribution.


```python
# Imports
import numpy as np
import pandas as pd
from sklearn.datasets import make_blobs
from sklearn.metrics import f1_score
from sklearn.preprocessing import StandardScaler
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.optimizers import Adam
```

## Step 1: Define Autoencoder Model

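```python
# Define helper function to build the Autoencoder
def build_autoencoder(input_dim):
    # Input: one row of `input_dim` features
    input_layer = Input(shape=(input_dim,))
    # Encoder: a single ReLU layer with 8 units (wider than a 2-D input)
    encoder = Dense(8, activation="relu")(input_layer)
    # Decoder: linear layer mapping back to the original feature space
    decoder = Dense(input_dim, activation="linear")(encoder)
    autoencoder = Model(inputs=input_layer, outputs=decoder)
    # Trained to minimise the mean squared reconstruction error
    autoencoder.compile(optimizer=Adam(learning_rate=0.001), loss="mse")
    return autoencoder
```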
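For `input_dim=2`, `build_autoencoder` maps 2 inputs to 8 hidden units and back, so nothing in the architecture itself forces compression. To illustrate what a true bottleneck does, here is a NumPy-only sketch of a *linear* autoencoder with a `k`-dimensional code. It relies on the well-known result that the optimal linear autoencoder under MSE spans the same subspace as the top-`k` principal components. This is a swapped-in illustration (PCA via SVD), not the notebook's Keras model, and `pca_autoencoder` is a hypothetical helper name.

```python
import numpy as np

def pca_autoencoder(X_train, k):
    """Linear 'autoencoder' with a k-dimensional bottleneck, fitted via SVD (PCA).

    Illustrative only: PCA matches the optimal *linear* autoencoder under MSE,
    not the notebook's nonlinear Keras model.
    """
    mean = X_train.mean(axis=0)
    # Rows of Vt are principal directions, sorted by decreasing singular value
    _, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
    W = Vt[:k].T  # shape (d, k): encoder weights; the decoder uses W.T

    def encode(X):
        return (X - mean) @ W   # (n, d) -> (n, k)

    def decode(Z):
        return Z @ W.T + mean   # (n, k) -> (n, d)

    return encode, decode

# Toy "normal" data lying close to the line y = 2x
rng = np.random.RandomState(0)
t = rng.normal(size=200)
X_train = np.column_stack([t, 2 * t + 0.05 * rng.normal(size=200)])
encode, decode = pca_autoencoder(X_train, k=1)

X_test = np.array([[1.0, 2.0],    # on the line: reconstructed well
                   [1.0, -2.0]])  # off the line: large reconstruction error
# Same per-sample MSE score as the notebook
mse = np.mean(np.square(X_test - decode(encode(X_test))), axis=1)
print(mse)
```

With a 1-D code, the point on the line is reconstructed almost exactly, while the off-line point is projected onto the line and receives a much larger error. That gap is what MSE-based detection relies on.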


## Step 2: Generate Dataset

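```python
# Generate datasets
n_samples = 300
outliers_fraction = 0.15
n_outliers = int(outliers_fraction * n_samples)  # 45 outliers
# n_inliers equals n_samples, so the outliers are added on top of the
# 300 inliers (345 points in total)
n_inliers = n_samples
blobs_params = dict(random_state=0, n_samples=n_inliers, n_features=2)
datasets = [
    # Two identical centres at the origin: effectively a single Gaussian blob
    make_blobs(centers=[[0, 0], [0, 0]], cluster_std=0.5, **blobs_params)[0],
]
```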
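Because the cell above sets `n_inliers = n_samples`, the true outlier share is lower than the 15% later used to set the threshold. The plain-Python arithmetic below reuses the notebook's variable names to make this explicit. For comparison, it also computes the split obtained with `n_inliers = n_samples - n_outliers`, an alternative convention assumed here only for illustration.

```python
n_samples = 300
outliers_fraction = 0.15
n_outliers = int(outliers_fraction * n_samples)  # 45

# As written in the notebook: outliers appended to a full set of inliers
n_inliers = n_samples
n_total = n_inliers + n_outliers
actual_fraction = n_outliers / n_total

# Alternative split that keeps the total at n_samples
alt_inliers = n_samples - n_outliers
alt_fraction = n_outliers / (alt_inliers + n_outliers)

print(n_total, round(actual_fraction, 4))
print(alt_inliers, alt_fraction)
```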


## Step 3: Train Autoencoder and Evaluate

```python
# Evaluate the Autoencoder on each dataset
results = []

for i, X in enumerate(datasets):
    # Inject uniform noise in [-6, 6]^2 as anomalies (fixed seed)
    rng = np.random.RandomState(42)
    X_outliers = rng.uniform(low=-6, high=6, size=(n_outliers, 2))
    X_full = np.concatenate([X, X_outliers], axis=0)
    # Ground-truth labels: 0 = inlier, 1 = outlier
    y_true = np.concatenate([np.zeros(n_inliers), np.ones(n_outliers)])

    # Standardise features (scaler fitted on the full set, outliers included)
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X_full)

    # Train the Autoencoder to reconstruct the inliers only
    model = build_autoencoder(input_dim=2)
    model.fit(X_scaled[y_true == 0], X_scaled[y_true == 0],
              epochs=20, batch_size=16, verbose=0)

    # Per-sample reconstruction error: MSE across the features
    reconstructions = model.predict(X_scaled)
    mse = np.mean(np.square(X_scaled - reconstructions), axis=1)
    # Flag the top `outliers_fraction` of reconstruction errors as anomalies
    threshold = np.percentile(mse, 100 * (1 - outliers_fraction))
    y_pred = (mse > threshold).astype(int)

    f1 = f1_score(y_true, y_pred, average='macro')
    acc = np.mean(y_true == y_pred)

    results.append({
        'Model': 'Autoencoder',
        'Macro F1-Score': round(f1, 4),
        'Accuracy': round(acc, 4)
    })
```
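The threshold in this loop is the `100 * (1 - outliers_fraction)` percentile of the errors over all 345 points, so the rule flags a fixed share of points regardless of how many true outliers exist. The sketch below applies the same rule to stand-in scores (random exponential values, not the notebook's actual MSE values) to count how many points are flagged when n = 345.

```python
import numpy as np

n_total, outliers_fraction = 345, 0.15

# Stand-in anomaly scores; continuous random draws are distinct with probability 1
rng = np.random.RandomState(0)
scores = rng.exponential(size=n_total)

# Same rule as the notebook: flag everything strictly above the (1 - c) percentile
threshold = np.percentile(scores, 100 * (1 - outliers_fraction))
y_pred = (scores > threshold).astype(int)

print(y_pred.sum())
```

With distinct scores, linear interpolation places the threshold between the 293rd and 294th smallest values, so exactly 52 points are flagged. Since only 45 points are true outliers, even a perfect ranking would mislabel 7 inliers. Under these assumptions, the attainable accuracy is therefore capped at 338/345 ≈ 0.980.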


[1m11/11[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m0s[0m 3ms/step 


## Step 4: Print Results

```python
df = pd.DataFrame(results)
print(df.to_string(index=False))
```

```text
      Model  Macro F1-Score  Accuracy
Autoencoder           0.922    0.9623
```
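Macro F1 weights the inlier and outlier classes equally when averaging their F1 scores. Suppose the 345 reconstruction errors are distinct, so the threshold flags exactly 52 points. Given 45 true outliers and an accuracy of 0.9623 (= 332/345), the printed figures then imply TP = 42, FP = 10, FN = 3 and TN = 290 for the outlier class. These counts are derived from the reported numbers, not logged by the notebook. The sketch below recomputes both metrics by hand from them.

```python
# Confusion counts inferred from the printed results (outlier = positive class)
tp, fp, fn, tn = 42, 10, 3, 290

def f1(tp, fp, fn):
    """Per-class F1 = 2*TP / (2*TP + FP + FN)."""
    return 2 * tp / (2 * tp + fp + fn)

f1_outlier = f1(tp, fp, fn)  # outliers as the positive class
f1_inlier = f1(tn, fn, fp)   # inliers as positive: the roles of FP and FN swap
macro_f1 = (f1_outlier + f1_inlier) / 2
accuracy = (tp + tn) / (tp + fp + fn + tn)

print(round(f1_outlier, 4), round(f1_inlier, 4))
print(round(macro_f1, 4), round(accuracy, 4))
```

The outlier-class F1 alone is about 0.866. The macro average of 0.922 is therefore lifted by the easier inlier class, which is worth keeping in mind when comparing against other detectors.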
