# Autoencoder for Anomaly Detection in Energy Systems

This notebook implements a denoising autoencoder on synthetic energy system signals and explains how its reconstruction error can serve as an anomaly score. Autoencoders are neural networks that learn to compress and reconstruct data, which makes them useful for both signal denoising and anomaly detection.

## Introduction to Autoencoders

An **autoencoder** is a type of neural network designed to learn efficient representations of data by compressing it into a lower-dimensional space (encoding) and then reconstructing it back to the original dimensions (decoding).

### Key Components:
- **Encoder**: Compresses input data into a latent representation
- **Bottleneck/Latent Space**: Compressed representation of the input
- **Decoder**: Reconstructs the original data from the latent representation

### Applications in Energy Systems:
- **Anomaly Detection**: Identifying unusual power consumption patterns and equipment failures
- **Signal Denoising**: Cleaning sensor data from electrical noise and interference
- **Predictive Maintenance**: Detecting early signs of equipment degradation
- **Data Compression**: Efficient storage and transmission of sensor data
- **Quality Control**: Identifying faulty measurements or sensor malfunctions

## Problem Setup: Denoising Energy System Signals

**Objective**: Train an autoencoder to remove noise from complex energy system signals while preserving important patterns.

**Challenge**: Energy systems produce complex, multi-frequency signals that can be corrupted by:
- Electrical interference
- Sensor noise
- Environmental factors
- Equipment vibrations
- Communication errors

## 1. Import Required Libraries

We'll use the following libraries:
- **PyTorch**: Deep learning framework for building and training the autoencoder
- **NumPy**: Numerical operations and data generation
- **Matplotlib**: Visualization of results
- **scikit-learn**: Data splitting utilities

PyTorch is particularly well-suited for this task because:
- Automatic differentiation for gradient computation
- Flexible neural network building blocks
- GPU acceleration capabilities
- Extensive optimization algorithms

In [None]:
import torch
import torch.nn as nn
import torch.nn.init as init
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

## 2. Set Random Seeds for Reproducibility

Setting random seeds ensures that:
- **Consistent Results**: Same output across different runs
- **Debugging**: Easier to identify and fix issues
- **Comparison**: Fair evaluation of different approaches
- **Research**: Reproducible experiments for scientific validity

This is crucial for machine learning experiments where randomness affects:
- Weight initialization
- Data shuffling
- Dropout layers
- Data generation

In [None]:
# Set random seeds for reproducibility
torch.manual_seed(50)
np.random.seed(50)    
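A quick way to confirm the effect of seeding is to draw random numbers twice after re-seeding and compare them. The helper name `seeded_draws` below is hypothetical and exists only for this illustration.

```python
import numpy as np
import torch


def seeded_draws(seed=50):
    """Re-seed both generators, then draw a few random numbers from each."""
    torch.manual_seed(seed)
    np.random.seed(seed)
    return torch.rand(3), np.random.rand(3)


first_torch, first_numpy = seeded_draws()
second_torch, second_numpy = seeded_draws()

# Both comparisons are True: the same seed yields the same sequence of draws.
print(torch.equal(first_torch, second_torch))
print(np.array_equal(first_numpy, second_numpy))
```

Seeding fixes the random streams only. Some GPU operations can still be non-deterministic, so exact repeatability on GPU may need extra settings.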

## 3. Generate Complex Synthetic Energy System Data

Since real energy system data may be proprietary or sensitive, we'll create realistic synthetic data that mimics common characteristics of energy system signals.

### Signal Characteristics:
Our synthetic data simulates typical energy system signals with:

1. **Multi-frequency Components**:
   - **Fundamental frequency**: Main sinusoidal component (50/60 Hz power systems)
   - **Harmonics**: Higher frequency components (2x, 4x fundamental)
   - **Phase variations**: Random phase shifts simulating different operating conditions

2. **Complex Noise Model**:
   - **Gaussian noise**: Random measurement uncertainty
   - **Spike noise**: Sudden electrical disturbances or switching events
   - **Uniform noise**: Background electrical interference

### Why This Approach?
- **Realistic**: Captures real-world signal complexity
- **Controlled**: We know the ground truth for evaluation
- **Scalable**: Can easily modify parameters for different scenarios
- **Educational**: Demonstrates how autoencoders handle various noise types

In [None]:
# Step 1: Generate Complex Synthetic Dataset
def generate_complex_sequence(length=50, num_sequences=1000, noise_factor=0.5):
    x = np.linspace(0, 4 * np.pi, length)
    
    # Create complex clean signal by combining sinusoids of different frequencies
    clean_sequences = np.array([
        np.sin(x + np.random.uniform(0, 2 * np.pi)) + 
        0.5 * np.sin(2 * x + np.random.uniform(0, 2 * np.pi)) + 
        0.25 * np.sin(4 * x + np.random.uniform(0, 2 * np.pi)) 
        for _ in range(num_sequences)
    ])
    
    # Add complex noise: Gaussian noise + occasional spikes + uniform noise
    gaussian_noise = noise_factor * np.random.normal(size=clean_sequences.shape)
    spike_noise = np.random.choice([0, 1], size=clean_sequences.shape, p=[0.98, 0.02]) * np.random.uniform(-3, 3, size=clean_sequences.shape)
    uniform_noise = noise_factor * np.random.uniform(-1, 1, size=clean_sequences.shape)
    
    noisy_sequences = clean_sequences + gaussian_noise + spike_noise + uniform_noise
    return torch.tensor(noisy_sequences, dtype=torch.float32), torch.tensor(clean_sequences, dtype=torch.float32)
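The three noise terms are independent and zero-mean, so their variances add. That gives a baseline for later: a "denoiser" that returned its noisy input unchanged would score an MSE equal to the total noise variance, so a trained model should end up well below it. The sketch below checks the closed-form value against a simulation. It uses `np.random.default_rng` with an arbitrary seed instead of the notebook's global seed, purely as an assumption for this example.

```python
import numpy as np

rng = np.random.default_rng(0)  # assumption: local generator, arbitrary seed
shape = (1000, 50)
noise_factor = 0.5

# Same three noise terms as in generate_complex_sequence
gaussian = noise_factor * rng.normal(size=shape)
spikes = rng.choice([0, 1], size=shape, p=[0.98, 0.02]) * rng.uniform(-3, 3, size=shape)
uniform = noise_factor * rng.uniform(-1, 1, size=shape)
noise = gaussian + spikes + uniform

# Closed form: Var[N(0,1)] = 1, Var[U(a,b)] = (b - a)^2 / 12
analytic_var = (
    noise_factor**2 * 1.0            # Gaussian term: 0.25
    + 0.02 * (6**2 / 12)             # spikes: 2% of points, U(-3, 3): 0.06
    + noise_factor**2 * (2**2 / 12)  # uniform term: ~0.0833
)
empirical_var = np.mean(noise**2)    # MSE of returning the noisy input as-is

print(round(analytic_var, 4))  # 0.3933
print(round(empirical_var, 4))
```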


## 4. Data Preparation and Train-Test Split

### Dataset Creation:
- **1000 sequences** of length 50 time steps each
- **Training set**: 80% of the data (800 sequences)
- **Test set**: 20% of the data (200 sequences)

### Training Strategy:
The autoencoder will be trained to:
- **Input**: Noisy sequences (what sensors actually measure)
- **Target**: Clean sequences (ideal signal without noise)
- **Learn**: Mapping from noisy → clean (denoising function)

This supervised learning approach allows the autoencoder to learn robust features that distinguish between signal and noise.

In [None]:
# Generate data
noisy_data, clean_data = generate_complex_sequence()
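# Passing both arrays keeps each noisy sequence paired with its clean target;
# random_state makes the split itself repeatable.
train_noisy, test_noisy, train_clean, test_clean = train_test_split(
    noisy_data, clean_data, test_size=0.2, random_state=50
)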
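The denoising setup depends on every noisy sequence staying aligned with its clean target after the split. The sketch below checks this on placeholder arrays built so that each noisy row is exactly its clean row plus 0.5. The arrays and the `random_state` value are assumptions made for this check only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: row i of `clean` is filled with the value i
clean = np.repeat(np.arange(1000, dtype=np.float32)[:, None], 50, axis=1)
noisy = clean + 0.5

train_noisy, test_noisy, train_clean, test_clean = train_test_split(
    noisy, clean, test_size=0.2, random_state=50
)

print(train_noisy.shape, test_noisy.shape)  # (800, 50) (200, 50)
# True only if the rows of both arrays were shuffled identically
print(np.allclose(train_noisy - train_clean, 0.5))
```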

## 5. Autoencoder Architecture Design

### Network Structure:
Our denoising autoencoder follows a symmetric encoder-decoder architecture:

**Encoder Path:**
1. **Input Layer**: 50 time steps → Hidden Layer (64 neurons)
2. **Hidden Layer**: 64 neurons → Bottleneck (32 neurons)
3. **Activation**: ReLU (Rectified Linear Unit) for non-linearity

**Decoder Path:**
1. **Bottleneck**: 32 neurons → Hidden Layer (64 neurons)
2. **Hidden Layer**: 64 neurons → Output Layer (50 time steps)
3. **Final Output**: Reconstructed/denoised signal
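The layer sizes listed above determine the model size. Each fully connected layer has `n_in * n_out` weights plus `n_out` biases, so the count can be worked out by hand:

```python
# (inputs, outputs) for each fully connected layer: 50 -> 64 -> 32 -> 64 -> 50
layer_shapes = [(50, 64), (64, 32), (32, 64), (64, 50)]

# weights + biases per layer
params_per_layer = [n_in * n_out + n_out for n_in, n_out in layer_shapes]
total_params = sum(params_per_layer)

print(params_per_layer)  # [3264, 2080, 2112, 3250]
print(total_params)      # 10706
```

With roughly 10.7k parameters and 800 training sequences of 50 points each, the network is small enough to train quickly on a CPU.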

### Key Design Decisions:

**Bottleneck Size (32)**: 
- Smaller than input (50) forces compression
- Large enough to retain important information
- Creates information bottleneck that filters out noise

**ReLU Activation**:
- Introduces non-linearity for complex pattern learning
- Computationally efficient
- Helps with gradient flow during training

**Xavier Initialization**:
- Reduces the risk of vanishing/exploding gradients at the start of training
- Scales initial weights according to each layer's input and output size
- Tends to improve training stability and convergence
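For reference, Xavier (Glorot) uniform initialization draws each weight from $U(-a, a)$ with $a = \sqrt{6 / (\text{fan\_in} + \text{fan\_out})}$ at the default gain of 1. The sketch below checks that bound on a single 50 → 64 layer. The seed is an arbitrary choice for this example.

```python
import math

import torch
import torch.nn as nn
import torch.nn.init as init

torch.manual_seed(0)  # arbitrary seed for this example

layer = nn.Linear(50, 64)           # fan_in = 50, fan_out = 64
init.xavier_uniform_(layer.weight)  # default gain = 1.0

bound = math.sqrt(6.0 / (50 + 64))
max_abs_weight = layer.weight.abs().max().item()

print(round(bound, 4))          # 0.2294
print(max_abs_weight <= bound)  # True: every weight lies inside [-a, a]
```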

In [None]:
# Step 2: Define the Autoencoder Model with Initialization
class DenoisingAutoencoder(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, hidden_size // 2),
            nn.ReLU()
        )
        self.decoder = nn.Sequential(
            nn.Linear(hidden_size // 2, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, input_size)
        )
        
        # Initialize weights
        self.apply(self._init_weights)

    def forward(self, x):
        encoded = self.encoder(x)
        decoded = self.decoder(encoded)
        return decoded

    def _init_weights(self, module):
        if isinstance(module, nn.Linear):
            init.xavier_uniform_(module.weight)  # Xavier initialization for weights
            if module.bias is not None:
                init.zeros_(module.bias)  # Initialize biases to zero

## 6. Model Configuration and Training Setup

### Loss Function: Mean Squared Error (MSE)
$$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

**Why MSE?**
- **Point-wise comparison**: Measures reconstruction accuracy at each time step
- **Smooth gradients**: Provides stable training signals
- **Noise sensitivity**: Penalizes deviations from clean signal
- **Standard choice**: Well-established for regression and reconstruction tasks
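As a small worked example of the formula, the hand-computed mean of squared differences matches `nn.MSELoss` with its default `reduction='mean'`. The three-point tensors are made-up values for illustration.

```python
import torch
import torch.nn as nn

target = torch.tensor([1.0, 2.0, 3.0])      # y
prediction = torch.tensor([1.5, 2.0, 2.0])  # y_hat

# Squared errors: 0.25, 0.0, 1.0 -> mean = 1.25 / 3
manual_mse = ((target - prediction) ** 2).mean()
library_mse = nn.MSELoss()(prediction, target)

print(manual_mse.item(), library_mse.item())  # both approximately 0.4167
```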

### Optimizer: Adam
**Advantages of Adam:**
- **Adaptive learning rates**: Automatically adjusts step sizes for each parameter
- **Momentum**: Averages recent gradients, which smooths noisy updates and usually speeds up convergence
- **Bias correction**: Accounts for initialization bias in early training
- **Robust**: Works well across different problem types
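The adaptive step size and bias correction are easiest to see on the very first update. After bias correction, the first Adam step is approximately `lr * sign(gradient)` for every parameter, regardless of how large the gradient is. The sketch below uses a toy two-parameter loss with deliberately mismatched gradient scales. The tensors are illustrative assumptions.

```python
import torch

lr = 0.001
params = torch.tensor([1.0, -2.0], requires_grad=True)
scales = torch.tensor([1.0, 100.0])  # makes the two gradients differ by 200x
optimizer = torch.optim.Adam([params], lr=lr)

loss = (scales * params ** 2).sum()
loss.backward()
grad = params.grad.clone()           # tensor([2., -400.])

before = params.detach().clone()
optimizer.step()
step = params.detach() - before

print(grad)
print(step)  # approximately [-0.001, +0.001]: same magnitude for both parameters
```

Plain gradient descent with the same learning rate would have moved the second parameter 200 times further than the first.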

### Learning Rate: 0.001
- **Conservative choice**: Prevents overshooting optimal solutions
- **Stable training**: Reduces risk of divergence
- **Fine-tuning**: Can be adjusted based on training progress

In [None]:
# Model, loss function, and optimizer
input_size = train_noisy.shape[1]
hidden_size = 64
model = DenoisingAutoencoder(input_size, hidden_size)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

## 7. Training Process

### Training Loop Mechanics:

**Batch Processing:**
- **Batch size**: 32 sequences processed simultaneously
- **Memory efficiency**: Balances memory usage and gradient estimation
- **Parallel computation**: Leverages vectorized operations

**Training Steps:**
1. **Forward Pass**: Input noisy sequences → Model → Reconstructed sequences
2. **Loss Calculation**: Compare reconstructed vs. clean sequences (MSE)
3. **Backward Pass**: Compute gradients via backpropagation
4. **Parameter Update**: Adjust weights using Adam optimizer
5. **Repeat**: Continue for all batches in each epoch

### Training Dynamics:
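- **1000 epochs**: Generous for a dataset this small; training can stop earlier once the loss plateaus
- **Progress monitoring**: Loss printed every 10 epochs (the printed value is the loss of the last batch in that epoch, not an epoch average)
- **Expected behavior**: Loss should decrease and then stabilize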

### What the Model Learns:
- **Feature extraction**: Identifies important signal characteristics
- **Noise patterns**: Distinguishes between signal and noise
- **Reconstruction mapping**: Learns inverse transformation from compressed representation
- **Generalization**: Applies learned patterns to unseen data

In [None]:
# Step 3: Train the Autoencoder
num_epochs = 1000
batch_size = 32

for epoch in range(num_epochs):
    for i in range(0, len(train_noisy), batch_size):
        batch_noisy = train_noisy[i:i+batch_size]
        batch_clean = train_clean[i:i+batch_size]
        
        # Forward pass
        outputs = model(batch_noisy)
        loss = criterion(outputs, batch_clean)
        
        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    
    if (epoch+1) % 10 == 0:
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}')
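The loop above visits the batches in the same order every epoch and reports only the last batch's loss. One possible variant, sketched here for a single epoch, uses `TensorDataset` and `DataLoader` to reshuffle the data and averages the loss over all batches. The random tensors and the single linear layer are placeholders so that the block runs on its own; they are not the notebook's data or model.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(50)

# Placeholder tensors standing in for train_noisy / train_clean
train_noisy = torch.randn(800, 50)
train_clean = torch.randn(800, 50)
loader = DataLoader(TensorDataset(train_noisy, train_clean), batch_size=32, shuffle=True)

model = nn.Linear(50, 50)  # stand-in for DenoisingAutoencoder
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# One epoch: 800 sequences / 32 per batch = 25 batches
epoch_loss = 0.0
num_batches = 0
for batch_noisy, batch_clean in loader:
    optimizer.zero_grad()
    loss = criterion(model(batch_noisy), batch_clean)
    loss.backward()
    optimizer.step()
    epoch_loss += loss.item()
    num_batches += 1

mean_epoch_loss = epoch_loss / num_batches
print(num_batches, round(mean_epoch_loss, 4))
```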


## 8. Results Visualization and Analysis

This visualization allows us to evaluate the autoencoder's denoising performance across different test sequences.

### Plot Interpretation:

**Three Signal Types Displayed:**
1. **Abnormal noisy Input (Blue)**: Original noisy measurements from sensors
2. **Clean Sequence (Orange)**: Ground truth signal (ideal, noise-free)
3. **Denoised Output (Green, dashed)**: Autoencoder's reconstruction

### Performance Metrics to Observe:

**Visual Assessment:**
- **Signal fidelity**: How well does the denoised output match the clean signal?
- **Noise reduction**: Is random noise effectively suppressed?
- **Feature preservation**: Are important signal characteristics maintained?
- **Artifact introduction**: Does the model introduce any unwanted distortions?

**Expected Results:**
- Denoised output should closely follow the clean signal
- Random noise should be significantly reduced
- Important signal features (peaks, trends) should be preserved
- Smooth reconstruction without high-frequency artifacts

### Real-World Implications:
- **Equipment monitoring**: Cleaner signals enable better fault detection
- **Control systems**: Reduced noise improves control accuracy
- **Data analysis**: Cleaner data leads to more reliable insights
- **Predictive maintenance**: Early anomaly detection becomes more sensitive

In [None]:
# Step 4: Visualize Results
def visualize_results(model, noisy_data, clean_data):
    model.eval()
    with torch.no_grad():
        test_outputs = model(noisy_data)
    
    fig, axs = plt.subplots(3, 1, figsize=(10, 8), sharex=True)
    for i in range(3):
        axs[i].plot(noisy_data[i].numpy(), label='Abnormal noisy Input')
        axs[i].plot(clean_data[i].numpy(), label='Clean Sequence')
        axs[i].plot(test_outputs[i].numpy(), label='Denoised Output', linestyle='dashed')
        axs[i].legend()
        axs[i].set_title(f'Sequence {i+1}')
    
    plt.xlabel('Time Step')
    plt.show()

# Visualize on test data
visualize_results(model, test_noisy, test_clean)
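The visual check can be complemented with a number. One common choice is the signal-to-noise ratio in decibels, computed once for the noisy input and once for the denoised output, each against the clean signal; the difference is the improvement. The helper `snr_db` below is a hypothetical name, and the four-point signal is a toy example.

```python
import numpy as np


def snr_db(clean, estimate):
    """SNR in dB of `estimate` relative to the `clean` reference signal."""
    clean = np.asarray(clean, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    signal_power = np.sum(clean ** 2)
    error_power = np.sum((clean - estimate) ** 2)
    return 10.0 * np.log10(signal_power / error_power)


# Toy check: error amplitude 0.1 on a unit signal -> power ratio 100 -> 20 dB
clean = np.ones(4)
noisy = clean + np.array([0.1, -0.1, 0.1, -0.1])
print(round(snr_db(clean, noisy), 2))  # 20.0
```

With the notebook's tensors, a call such as `snr_db(test_clean.numpy(), test_outputs.numpy())` would give the SNR after denoising.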

## 9. Key Insights and Anomaly Detection Applications

### How Autoencoders Enable Anomaly Detection:

**Reconstruction Error as Anomaly Indicator:**
$$\text{Anomaly Score} = \|\text{Input} - \text{Reconstructed}\|_2^2$$

- **Normal data**: Low reconstruction error (model has seen similar patterns)
- **Anomalous data**: High reconstruction error (model struggles to reconstruct unfamiliar patterns)
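A minimal sketch of this scoring rule follows. It computes one squared-L2 score per sequence and flags scores above a threshold. The mean-plus-three-standard-deviations rule and all numbers are assumptions chosen for illustration, not values from this notebook.

```python
import numpy as np


def anomaly_scores(inputs, reconstructions):
    """Squared L2 reconstruction error, one score per sequence (row)."""
    diff = np.asarray(inputs, dtype=float) - np.asarray(reconstructions, dtype=float)
    return np.sum(diff ** 2, axis=1)


# Hypothetical inputs and model reconstructions (3 sequences, 3 time steps)
inputs = np.array([[1.0, 2.0, 3.0],
                   [1.0, 2.0, 3.0],
                   [1.0, 2.0, 9.0]])   # last point of row 2 is a spike
reconstructions = np.array([[1.1, 2.0, 2.9],
                            [0.9, 2.1, 3.0],
                            [1.0, 2.0, 3.0]])

scores = anomaly_scores(inputs, reconstructions)  # roughly [0.02, 0.02, 36.0]

# Assumed threshold rule: mean + 3 * std of scores measured on normal data
reference_scores = np.array([0.02, 0.03, 0.01, 0.02, 0.02])
threshold = reference_scores.mean() + 3 * reference_scores.std()

is_anomaly = scores > threshold
print(is_anomaly)  # [False False  True]
```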

### Energy System Applications:

**1. Equipment Health Monitoring:**
- **Motor vibration analysis**: Detect bearing wear, misalignment
- **Transformer monitoring**: Identify insulation degradation, hot spots
- **Power quality assessment**: Detect harmonics, voltage fluctuations

**2. Predictive Maintenance:**
- **Early fault detection**: Identify subtle changes before failures
- **Maintenance scheduling**: Optimize based on actual equipment condition
- **Cost reduction**: Prevent unexpected downtime and catastrophic failures

**3. Grid Stability Analysis:**
- **Load pattern anomalies**: Detect unusual consumption patterns
- **Frequency deviations**: Monitor grid stability indicators
- **Cybersecurity**: Identify potential cyber attacks on grid infrastructure

### Advantages of Autoencoder Approach:

**Unsupervised Learning:**
- No labeled anomaly data required; training on normal operating data is enough
- Learns normal patterns directly from the data
- Can be retrained periodically as operating conditions change

**Feature Learning:**
- Automatically extracts relevant features
- Handles high-dimensional data
- Captures complex non-linear patterns

**Scalability:**
- Handles multiple sensors simultaneously
- Processes real-time streaming data
- Scales to large energy systems

## 10. Exercises and Advanced Topics

### Try These Modifications:

**1. Architecture Experiments:**
- **Deeper networks**: Add more layers to encoder/decoder
- **Different activations**: Try Tanh, LeakyReLU, or Swish
- **Regularization**: Add dropout layers or weight decay
- **Attention mechanisms**: Implement attention-based autoencoders

**2. Data Variations:**
- **Different noise types**: Try correlated noise, periodic interference
- **Variable sequence lengths**: Handle time series of different durations
- **Multi-channel data**: Simulate multiple sensor inputs
- **Real-world data**: Apply to actual energy system datasets

**3. Training Improvements:**
- **Learning rate scheduling**: Implement decay or cyclic schedules
- **Early stopping**: Monitor validation loss to prevent overfitting
- **Data augmentation**: Add noise variations during training
- **Transfer learning**: Pre-train on one system, fine-tune on another
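As a starting point for the early stopping exercise, here is a minimal patience-based helper. The class name, its parameters, and the validation losses are hypothetical. In practice the losses would come from evaluating the model on a held-out validation set after each epoch.

```python
class EarlyStopping:
    """Signal a stop once validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def should_stop(self, val_loss):
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss  # improvement: reset the counter
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1       # no improvement this epoch
        return self.bad_epochs >= self.patience


# Made-up validation losses: best value at epoch 3, then three epochs without improvement
val_losses = [0.50, 0.40, 0.35, 0.36, 0.37, 0.36, 0.30]
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, val_loss in enumerate(val_losses, start=1):
    if stopper.should_stop(val_loss):
        stopped_at = epoch
        break

print(stopped_at, stopper.best_loss)  # 6 0.35
```

The last value (0.30) is never reached, which shows the trade-off: a small `patience` saves time but can stop before a late improvement.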

**4. Anomaly Detection Extensions:**
- **Threshold selection**: Develop statistical methods for anomaly thresholds
- **Online detection**: Implement real-time anomaly monitoring
- **Multivariate analysis**: Handle multiple correlated signals
- **Temporal patterns**: Detect anomalous sequences rather than individual points

### Advanced Autoencoder Variants:

**Variational Autoencoders (VAE):**
- Probabilistic approach with explicit uncertainty modeling
- Better for generating new samples and handling uncertainty

**Convolutional Autoencoders:**
- Excellent for spatial data (images, 2D sensor arrays)
- Preserve spatial relationships in energy system layouts

**Recurrent Autoencoders (LSTM/GRU):**
- Handle sequential dependencies in time series
- Better for long-term temporal patterns

**Transformer-based Autoencoders:**
- State-of-the-art for sequence modeling
- Handle long-range dependencies effectively

### Performance Evaluation Metrics:

**Reconstruction Quality:**
- **MSE/RMSE**: Mean squared error and its square root (RMSE is in the same units as the signal)
- **MAE**: Mean absolute error
- **SSIM**: Structural similarity index (designed for images; relevant mainly for 2D representations such as spectrograms)
- **Correlation coefficient**: Linear relationship preservation
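The point-wise metrics above can be computed in a few lines of NumPy. The function name `reconstruction_metrics` and the four-point example are assumptions made for illustration.

```python
import numpy as np


def reconstruction_metrics(clean, estimate):
    """Return MSE, RMSE, MAE and Pearson correlation between two signals."""
    clean = np.asarray(clean, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    error = estimate - clean
    mse = np.mean(error ** 2)
    return {
        "mse": mse,
        "rmse": np.sqrt(mse),
        "mae": np.mean(np.abs(error)),
        "corr": np.corrcoef(clean.ravel(), estimate.ravel())[0, 1],
    }


# One point is off by 2: MSE = 4 / 4 = 1, RMSE = 1, MAE = 2 / 4 = 0.5
metrics = reconstruction_metrics([1, 2, 3, 4], [1, 2, 3, 6])
print(metrics)
```

Note how the single large error dominates MSE and RMSE more than MAE, which is why reporting both is informative when spikes are present.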

**Anomaly Detection Performance:**
- **Precision/Recall**: Balance between false positives and detection rate
- **F1-Score**: Harmonic mean of precision and recall
- **AUC-ROC**: Area under receiver operating characteristic curve
- **Detection delay**: Time between anomaly occurrence and detection
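Precision, recall, and F1 follow directly from the counts of true positives, false positives, and false negatives. The sketch below uses made-up labels, where 1 marks an anomaly. The function name is hypothetical.

```python
import numpy as np


def detection_metrics(y_true, y_pred):
    """Precision, recall and F1 for binary anomaly labels (1 = anomaly)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)    # anomalies correctly flagged
    fp = np.sum(~y_true & y_pred)   # normal points flagged by mistake
    fn = np.sum(y_true & ~y_pred)   # anomalies that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]
precision, recall, f1 = detection_metrics(y_true, y_pred)  # TP=2, FP=1, FN=1
print(precision, recall, f1)  # each equals 2/3
```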

### Real-World Implementation Considerations:

**Data Preprocessing:**
- Normalization strategies for different signal ranges
- Handling missing data and sensor failures
- Time synchronization across multiple sensors
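One common normalization strategy is z-scoring with statistics taken from the training set only, then reusing those same statistics for test data so that no test information leaks into preprocessing. The voltage-like numbers below are synthetic placeholders, not real measurements.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for this example

# Placeholder signals centred around 230 with a spread of 5
train = rng.normal(loc=230.0, scale=5.0, size=(800, 50))
test = rng.normal(loc=230.0, scale=5.0, size=(200, 50))

# Fit the statistics on the training data only
mean, std = train.mean(), train.std()

train_scaled = (train - mean) / std
test_scaled = (test - mean) / std  # reuse the training statistics

print(round(train_scaled.mean(), 6), round(train_scaled.std(), 6))
print(round(test_scaled.mean(), 3), round(test_scaled.std(), 3))
```

The scaled training data has mean 0 and standard deviation 1 by construction. The test values are close to that but not identical, which is expected.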

**Computational Efficiency:**
- Model compression for edge deployment
- Quantization for faster inference
- Batch processing for throughput optimization

**Robustness:**
- Handling concept drift in operational conditions
- Graceful degradation with sensor failures
- Cybersecurity considerations for industrial deployment