# Autoencoder: Advanced Tutorial

In this notebook, we build and analyze a dense autoencoder on the scikit-learn handwritten digits dataset (8×8 grayscale images).
We cover:
- Theory: what are autoencoders?
- Dense autoencoder with Keras
- Dimensionality reduction vs PCA
- Image reconstruction
- Anomaly detection with autoencoders


## 1. Import Required Libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import tensorflow as tf
from tensorflow.keras import layers, models
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

sns.set_theme(style="whitegrid")  # set_theme is the preferred name; sns.set is an alias for it


## 2. What is an Autoencoder?

An **Autoencoder (AE)** is a neural network trained to reconstruct its own input through a narrow bottleneck, which forces it to learn a compressed representation (an encoding) of the data. Typical uses:
- Dimensionality reduction
- Denoising
- Anomaly detection

It consists of two parts:
- **Encoder**: Compresses the input into a lower-dimensional code
- **Decoder**: Reconstructs the input from that code

The model is trained to minimize reconstruction error (e.g., MSE loss).
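
To make the encoder/decoder split and the reconstruction error concrete, here is a minimal NumPy sketch of a single forward pass. It is not the Keras model built later in this notebook: the weights are random and untrained, the inputs are random stand-ins, and the names `W_enc`, `W_dec`, `encode`, and `decode` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed sizes: 64 input features squeezed through a 16-unit bottleneck.
n_samples, input_dim, code_dim = 5, 64, 16
x = rng.random((n_samples, input_dim))  # stand-in inputs in [0, 1)

# Untrained, randomly initialised weights (training would adjust these).
W_enc = rng.normal(scale=0.1, size=(input_dim, code_dim))
W_dec = rng.normal(scale=0.1, size=(code_dim, input_dim))

def encode(x):
    """Compress each row of x into a code_dim-dimensional code (ReLU units)."""
    return np.maximum(x @ W_enc, 0.0)

def decode(z):
    """Map codes back to input space; the sigmoid keeps outputs in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-(z @ W_dec)))

z = encode(x)
x_hat = decode(z)

# Reconstruction error: mean squared difference between input and output.
per_sample_mse = np.mean((x - x_hat) ** 2, axis=1)
loss = per_sample_mse.mean()
print(z.shape, x_hat.shape, loss)
```

Training amounts to adjusting `W_enc` and `W_dec` so that `loss` shrinks; the per-sample errors are the quantity used later for anomaly detection.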


## 3. Load and Preprocess Data

In [None]:
digits = load_digits()
X = digits.data / 16.0  # pixel intensities range from 0 to 16; scale to [0, 1]
y = digits.target

X_train, X_test, _, _ = train_test_split(X, y, test_size=0.3, random_state=42)

print("Input shape:", X_train.shape)


## 4. Build a Dense Autoencoder with Keras

In [None]:
input_dim = X_train.shape[1]
encoding_dim = 16

# Encoder
input_layer = layers.Input(shape=(input_dim,))
encoded = layers.Dense(64, activation='relu')(input_layer)
encoded = layers.Dense(encoding_dim, activation='relu')(encoded)

# Decoder
decoded = layers.Dense(64, activation='relu')(encoded)
decoded = layers.Dense(input_dim, activation='sigmoid')(decoded)

# Autoencoder model
autoencoder = models.Model(inputs=input_layer, outputs=decoded)
autoencoder.compile(optimizer='adam', loss='mse')

autoencoder.summary()
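
As a sanity check on what `autoencoder.summary()` reports, the parameter count of this architecture can be worked out by hand. The sketch below assumes `input_dim` is 64 (the digits images are 8×8) and uses a hypothetical helper, `dense_params`, which is not part of Keras.

```python
def dense_params(n_in, n_out):
    """Trainable parameters of a fully connected layer: weights plus biases."""
    return n_in * n_out + n_out

# Layer widths of the model above: input -> 64 -> 16 (code) -> 64 -> output.
layer_sizes = [64, 64, 16, 64, 64]
per_layer = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
total = sum(per_layer)

print(per_layer)  # [4160, 1040, 1088, 4160]
print(total)      # 10448
print(layer_sizes[0] / min(layer_sizes))  # compression factor: 4.0
```

The bottleneck keeps 16 of 64 dimensions, so each image is compressed by a factor of 4 before being reconstructed.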


## 5. Train the Autoencoder

In [None]:
history = autoencoder.fit(X_train, X_train,
                          epochs=50,
                          batch_size=32,
                          shuffle=True,
                          validation_data=(X_test, X_test))


## 6. Plot Loss Curve

In [None]:
plt.plot(history.history['loss'], label='Train Loss')
plt.plot(history.history['val_loss'], label='Validation Loss')
plt.title("Training and Validation Loss")
plt.xlabel("Epoch")
plt.ylabel("MSE Loss")
plt.legend()
plt.show()
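
The curve can also be summarized numerically. The following sketch finds the epoch with the lowest validation loss and the gap between validation and training loss at the end of training. `summarize_history` is a hypothetical helper, and the numbers in `example` are made up for illustration; they are not results from this notebook.

```python
import numpy as np

def summarize_history(history_dict):
    """Summarize a Keras-style history dict with 'loss' and 'val_loss' lists."""
    train = np.asarray(history_dict["loss"], dtype=float)
    val = np.asarray(history_dict["val_loss"], dtype=float)
    best = int(np.argmin(val))  # index of the lowest validation loss
    return {
        "best_epoch": best + 1,                     # epochs are 1-based
        "best_val_loss": float(val[best]),
        "final_gap": float(val[-1] - train[-1]),    # a growing gap suggests overfitting
    }

# Illustrative values only.
example = {
    "loss": [0.10, 0.06, 0.04, 0.03, 0.025],
    "val_loss": [0.11, 0.07, 0.05, 0.052, 0.055],
}
summary = summarize_history(example)
print(summary)
```

In this notebook the helper would be called as `summarize_history(history.history)`. A validation loss that bottoms out well before the last epoch is a hint that fewer epochs would suffice.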


## 7. Compare PCA vs Autoencoder Embedding

In [None]:
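encoder = models.Model(inputs=input_layer, outputs=encoded)
X_encoded = encoder.predict(X_test)  # shape: (n_test, 16)

# The code is 16-dimensional, so project it to 2D with PCA for plotting
X_encoded_2d = PCA(n_components=2).fit_transform(X_encoded)

# Baseline: PCA applied directly to the raw pixels
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_test)

# Recover the labels that belong to X_test; the same test_size and
# random_state reproduce the split made in section 3
_, y_test = train_test_split(y, test_size=0.3, random_state=42)

plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
plt.scatter(X_encoded_2d[:, 0], X_encoded_2d[:, 1], c=y_test, cmap='tab10', s=15)
plt.title("Autoencoder Code (16D, projected to 2D with PCA)")

plt.subplot(1, 2, 2)
plt.scatter(X_pca[:, 0], X_pca[:, 1], c=y_test, cmap='tab10', s=15)
plt.title("PCA Embedding (2D)")

plt.show()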
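
Scatter plots give only a qualitative comparison. A quantitative one is to measure how well PCA reconstructs the data when it keeps the same number of dimensions as the autoencoder's code. The sketch below implements PCA reconstruction with a plain SVD so that it is self-contained; `pca_reconstruction_mse` is a hypothetical helper, and the random `X_demo` array is only a stand-in for the digits data.

```python
import numpy as np

def pca_reconstruction_mse(X_fit, X_eval, n_components):
    """Fit PCA on X_fit via SVD, then return the MSE of reconstructing X_eval."""
    mean = X_fit.mean(axis=0)
    # Rows of Vt are the principal directions, sorted by explained variance.
    _, _, Vt = np.linalg.svd(X_fit - mean, full_matrices=False)
    components = Vt[:n_components]
    codes = (X_eval - mean) @ components.T   # encode: project onto the components
    X_rec = codes @ components + mean        # decode: map back to input space
    return float(np.mean((X_eval - X_rec) ** 2))

# Stand-in data with the same width as the digits features (64 columns).
rng = np.random.default_rng(0)
X_demo = rng.random((200, 64))
X_fit, X_eval = X_demo[:150], X_demo[150:]

errors = {k: pca_reconstruction_mse(X_fit, X_eval, k) for k in (2, 16, 64)}
for k, err in errors.items():
    print(k, err)
```

In the notebook, `pca_reconstruction_mse(X_train, X_test, 16)` could be compared with `autoencoder.evaluate(X_test, X_test, verbose=0)`, since both are mean squared errors over the same test set with a 16-dimensional code.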


## 8. Visualize Reconstruction Quality

In [None]:
X_test_decoded = autoencoder.predict(X_test)

n = 10
plt.figure(figsize=(20, 4))
for i in range(n):
    # Top row: original image
    plt.subplot(2, n, i + 1)
    plt.imshow(X_test[i].reshape(8, 8), cmap="gray")
    plt.axis("off")

    # Bottom row: reconstruction of the same image
    plt.subplot(2, n, i + 1 + n)
    plt.imshow(X_test_decoded[i].reshape(8, 8), cmap="gray")
    plt.axis("off")

plt.suptitle("Original (top) vs Reconstructed (bottom) Images")
plt.show()


## 9. Anomaly Detection: Inject Noise

In [None]:
# Corrupt the first 10 test samples with Gaussian noise to simulate anomalies
rng = np.random.default_rng(42)  # fixed seed so the result is reproducible
n_anomalies = 10
X_noisy = X_test.copy()
X_noisy[:n_anomalies] += rng.normal(0, 1.5, X_noisy[:n_anomalies].shape)

# Reconstruct and compute the per-sample reconstruction error
X_decoded = autoencoder.predict(X_noisy)
mse = np.mean((X_noisy - X_decoded) ** 2, axis=1)

plt.hist(mse, bins=30)
plt.title("Reconstruction Error (MSE)")
plt.xlabel("Error")
plt.ylabel("Frequency")
plt.axvline(np.mean(mse[:n_anomalies]), color='red', linestyle='--',
            label="Mean error of noisy samples")
plt.legend()
plt.show()
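
The histogram shows the errors but does not yet decide which samples count as anomalies. One common option, sketched here as an assumption rather than as the method of this notebook, is to flag every sample whose error exceeds a high percentile of the errors on clean data. `flag_anomalies` is a hypothetical helper, and the error values are synthetic.

```python
import numpy as np

def flag_anomalies(reference_errors, new_errors, percentile=95.0):
    """Flag errors above the given percentile of the clean reference errors."""
    threshold = float(np.percentile(reference_errors, percentile))
    return threshold, np.asarray(new_errors) > threshold

# Synthetic stand-ins: small errors for clean samples, large ones for anomalies.
rng = np.random.default_rng(0)
clean_errors = rng.gamma(shape=2.0, scale=0.01, size=500)
new_errors = np.concatenate([clean_errors[:20], np.full(5, 1.0)])

threshold, flags = flag_anomalies(clean_errors, new_errors)
print("threshold:", threshold)
print("flagged:", int(flags.sum()), "of", flags.size)
```

In the notebook, the reference errors would come from reconstructing clean data (for example `X_train`) and the new errors would be the `mse` array above. A 95th-percentile threshold flags about 5% of clean samples by construction, so the percentile trades false alarms against missed anomalies.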


## 10. Summary

- Autoencoders compress data into a low-dimensional code and reconstruct the input from it
- Useful for nonlinear dimensionality reduction and for anomaly detection via reconstruction error
- Can be extended to convolutional and variational autoencoders
- Compare embeddings with PCA, visualize reconstructions, and track reconstruction error