The AI Build-from-Scratch Challenges 💡
1. Designing AI-Powered Emissions Anomaly Detection 📈
Scenario: We need a real-time pipeline to detect anomalies in diverse emissions data streams, considering seasonality and noise.

Tasks:

Architect & Build: Design an end-to-end pipeline. Implement a basic LSTM Autoencoder in PyTorch or TensorFlow. The provided snippet has a flawed structure. Fix the model architecture (ensure input/output shapes match, layers are appropriate) and write a basic training loop (using simulated data) that calculates reconstruction error.
Refine: Explain your choices and how you'd handle preprocessing (scaling, detrending) and feature extraction for real-world deployment.
Code Sample (PyTorch LSTM AE - Needs Fixing!):


In [8]:
import torch
import torch.nn as nn

class LSTM_AE(nn.Module):
    def __init__(self, input_dim=5, hidden_dim=32, latent_dim=16, num_layers=1):
        super().__init__()

        # Encoder: (batch, seq_len, input_dim) -> hidden states of size hidden_dim
        self.encoder = nn.LSTM(input_dim, hidden_dim, num_layers, batch_first=True)

        # Bottleneck: compress the final hidden state to latent_dim, then expand it back
        self.fc_enc = nn.Linear(hidden_dim, latent_dim)
        self.fc_dec = nn.Linear(latent_dim, hidden_dim)

        # Decoder: takes hidden_dim-sized inputs and emits input_dim-sized outputs.
        # Note: LSTM outputs cannot exceed 1 in magnitude, so reconstructions are range-limited.
        self.decoder = nn.LSTM(hidden_dim, input_dim, num_layers, batch_first=True)

    def forward(self, x):
        seq_len = x.size(1)  # original sequence length

        # 1. Encoder: process the input x and keep the final hidden state
        _, (hidden, _) = self.encoder(x)  # hidden: (num_layers, batch, hidden_dim)

        # 2. Latent space: project the last layer's hidden state into latent_dim
        latent = self.fc_enc(hidden[-1])  # → (batch, latent_dim)

        # 3. Decoder prep: expand latent back, add a time axis, repeat per time step
        decoder_hidden = self.fc_dec(latent)  # → (batch, hidden_dim)
        decoder_input = decoder_hidden.unsqueeze(1).repeat(1, seq_len, 1)

        # 4. Decoder: reconstruct the sequence from the repeated latent representation
        reconstructed, _ = self.decoder(decoder_input)  # → (batch, seq_len, input_dim)

        return reconstructed

The encoder compresses the input sequence into a hidden representation.

I added a fully connected layer to compress the final hidden state into a latent vector (the bottleneck),

then a second fully connected layer to expand it back to hidden_dim, the size the decoder takes as input.

Finally, the decoder reconstructs the sequence from a repeated copy of this expanded latent vector.


For Problem 3, the *final* hidden state is taken as last_hidden = hidden[-1]. This last_hidden tensor, with shape (batch_size, hidden_dim), is then passed through a linear layer to form the latent bottleneck: latent = self.fc_enc(last_hidden)

For Problem 4, the decoder expects a full sequence of inputs, not just one vector. We turn the latent vector into a "fake" sequence by expanding it back to hidden_dim (step 3), repeating it across all seq_len time steps, and then feeding the result into the decoder.
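
The shape flow described above can be traced outside the class as a quick sanity check. This is a minimal sketch that rebuilds the same four layers with the default sizes; the batch size of 4 and the `shapes` dictionary are illustrative assumptions, not part of the original notebook.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
batch, seq_len, input_dim, hidden_dim, latent_dim = 4, 50, 5, 32, 16

# Same layer layout as LSTM_AE, built standalone so each step can be inspected
encoder = nn.LSTM(input_dim, hidden_dim, num_layers=1, batch_first=True)
fc_enc = nn.Linear(hidden_dim, latent_dim)
fc_dec = nn.Linear(latent_dim, hidden_dim)
decoder = nn.LSTM(hidden_dim, input_dim, num_layers=1, batch_first=True)

x = torch.randn(batch, seq_len, input_dim)

_, (hidden, _) = encoder(x)        # (num_layers, batch, hidden_dim)
last_hidden = hidden[-1]           # (batch, hidden_dim)
latent = fc_enc(last_hidden)       # (batch, latent_dim)
expanded = fc_dec(latent)          # (batch, hidden_dim)
decoder_input = expanded.unsqueeze(1).repeat(1, seq_len, 1)  # (batch, seq_len, hidden_dim)
reconstructed, _ = decoder(decoder_input)                    # (batch, seq_len, input_dim)

shapes = {
    "hidden": tuple(hidden.shape),
    "last_hidden": tuple(last_hidden.shape),
    "latent": tuple(latent.shape),
    "decoder_input": tuple(decoder_input.shape),
    "reconstructed": tuple(reconstructed.shape),
}
for name, shape in shapes.items():
    print(f"{name:14s} {shape}")
```

The final shape equals the input shape, which is what lets the MSE loss compare the reconstruction with x element by element.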

PROBLEM 5: Needs a training loop
  1) Generate synthetic emissions-like data
  2) Wrap it in a DataLoader
  3) Initialize model, MSE loss, and optimizer
  4) For each epoch:
       a) Forward pass → reconstruction
       b) Compute MSE loss between recon and input
       c) Backward + optimizer.step()
       d) Optionally print/log average loss

Our Take: LSTM Autoencoders require both a proper architecture (encoder, bottleneck, decoder) and a straightforward training loop that measures reconstruction error.
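
On the architecture side, one limitation is worth noting. The model above uses an LSTM as its last layer, and LSTM outputs cannot exceed 1 in magnitude, while sine-plus-noise data can. A possible variant, sketched below, keeps the decoder LSTM at hidden_dim and adds a linear output head. This is an alternative design and not the notebook's model; the names `LSTMAEWithHead` and `out` are hypothetical, and whether it trains better on this data has not been verified here.

```python
import torch
import torch.nn as nn


class LSTMAEWithHead(nn.Module):
    """Hypothetical variant: decoder LSTM stays at hidden_dim, a Linear maps to input_dim."""

    def __init__(self, input_dim=5, hidden_dim=32, latent_dim=16, num_layers=1):
        super().__init__()
        self.encoder = nn.LSTM(input_dim, hidden_dim, num_layers, batch_first=True)
        self.fc_enc = nn.Linear(hidden_dim, latent_dim)
        self.fc_dec = nn.Linear(latent_dim, hidden_dim)
        self.decoder = nn.LSTM(hidden_dim, hidden_dim, num_layers, batch_first=True)
        self.out = nn.Linear(hidden_dim, input_dim)  # unbounded output range

    def forward(self, x):
        seq_len = x.size(1)
        _, (hidden, _) = self.encoder(x)
        latent = self.fc_enc(hidden[-1])                       # (batch, latent_dim)
        decoder_input = self.fc_dec(latent).unsqueeze(1).repeat(1, seq_len, 1)
        decoded, _ = self.decoder(decoder_input)               # (batch, seq_len, hidden_dim)
        return self.out(decoded)                               # (batch, seq_len, input_dim)


torch.manual_seed(0)
x = torch.randn(8, 50, 5)
recon = LSTMAEWithHead()(x)
print(recon.shape)

# For comparison: the raw output of an LSTM layer stays within [-1, 1]
raw, _ = nn.LSTM(5, 5, batch_first=True)(x)
print(raw.abs().max().item())
```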


In [9]:
# Training setup:
from torch.utils.data import DataLoader, TensorDataset

In [10]:
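# 1) Generate data
def generate_data(n_seq=1000, seq_len=50, feat=5):
    """Return (n_seq, seq_len, feat) noisy copies of one sine period."""
    t = torch.linspace(0, 2 * torch.pi, seq_len)
    clean = torch.sin(t).unsqueeze(-1).repeat(1, feat)  # shape (seq_len, feat)
    # Broadcasting adds the same clean wave to every sequence; noise std is 0.1
    return clean + 0.1 * torch.randn(n_seq, seq_len, feat)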

In [11]:
dataset = TensorDataset(generate_data())
loader  = DataLoader(dataset, batch_size=32, shuffle=True)
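
The Refine task asks about scaling and detrending. The simulated data needs neither, but real emissions streams likely would. The following is a minimal sketch under stated assumptions: first-order differencing is used as the detrending step (one option among several), and z-score statistics are fitted on the training split only. The helpers `detrend_diff`, `fit_scaler`, and `apply_scaler`, and the drifting toy data, are hypothetical.

```python
import torch


def detrend_diff(x):
    """First-order differencing along time: (n, seq_len, feat) -> (n, seq_len - 1, feat)."""
    return x[:, 1:, :] - x[:, :-1, :]


def fit_scaler(x_train, eps=1e-8):
    """Per-feature mean and std, computed over sequences and time steps."""
    mean = x_train.mean(dim=(0, 1), keepdim=True)
    std = x_train.std(dim=(0, 1), keepdim=True) + eps
    return mean, std


def apply_scaler(x, mean, std):
    return (x - mean) / std


torch.manual_seed(0)
t = torch.linspace(0, 1, 50).view(1, 50, 1)
raw = 5.0 * t + 0.1 * torch.randn(200, 50, 3)   # toy data: linear drift plus noise
train, test = raw[:160], raw[160:]

train_d, test_d = detrend_diff(train), detrend_diff(test)
mean, std = fit_scaler(train_d)                  # statistics from training data only
train_s = apply_scaler(train_d, mean, std)
test_s = apply_scaler(test_d, mean, std)         # reuse training statistics
print(train_s.shape, test_s.shape)
```

Fitting the scaler on the training split and reusing it at inference time keeps the deployed pipeline from leaking information from the data it is scoring.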

In [12]:

# 2) Model, loss, optimizer
learning_rate = 0.001
model     = LSTM_AE(input_dim=5, hidden_dim=32, latent_dim=16)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)


In [13]:

# 3) Training loop
num_epochs = 50
for epoch in range(num_epochs):
    total_loss = 0.0
    for (x_batch,) in loader:  # TensorDataset yields 1-tuples
        recon = model(x_batch)
        loss  = criterion(recon, x_batch)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        total_loss += loss.item()

    avg_loss = total_loss / len(loader)
    print(f"Epoch {epoch:2d} — Avg Recon MSE: {avg_loss:.4f}")
   

Epoch  0 — Avg Recon MSE: 0.5060
Epoch  1 — Avg Recon MSE: 0.4999
Epoch  2 — Avg Recon MSE: 0.4997
Epoch  3 — Avg Recon MSE: 0.4998
Epoch  4 — Avg Recon MSE: 0.4997
Epoch  5 — Avg Recon MSE: 0.4999
Epoch  6 — Avg Recon MSE: 0.4999
Epoch  7 — Avg Recon MSE: 0.4999
Epoch  8 — Avg Recon MSE: 0.4998
Epoch  9 — Avg Recon MSE: 0.4997
Epoch 10 — Avg Recon MSE: 0.4998
Epoch 11 — Avg Recon MSE: 0.4997
Epoch 12 — Avg Recon MSE: 0.4999
Epoch 13 — Avg Recon MSE: 0.4997
Epoch 14 — Avg Recon MSE: 0.4997
Epoch 15 — Avg Recon MSE: 0.4998
Epoch 16 — Avg Recon MSE: 0.4998
Epoch 17 — Avg Recon MSE: 0.4999
Epoch 18 — Avg Recon MSE: 0.4998
Epoch 19 — Avg Recon MSE: 0.4998
Epoch 20 — Avg Recon MSE: 0.5000
Epoch 21 — Avg Recon MSE: 0.4997
Epoch 22 — Avg Recon MSE: 0.4997
Epoch 23 — Avg Recon MSE: 0.4998
Epoch 24 — Avg Recon MSE: 0.4998
Epoch 25 — Avg Recon MSE: 0.4998
Epoch 26 — Avg Recon MSE: 0.4998
Epoch 27 — Avg Recon MSE: 0.4998
Epoch 28 — Avg Recon MSE: 0.4998
Epoch 29 — Avg Recon MSE: 0.4998
Epoch 30 —
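
The logged loss stays near 0.50 from the first epoch. For reference, that is roughly what an all-zero reconstruction would score on this simulated data: the mean of sin² over the 50-point grid is 0.49, and the noise adds about 0.01. The sketch below checks this by regenerating data with the same recipe as generate_data; the seed and variable names are illustrative assumptions.

```python
import torch

torch.manual_seed(0)
n_seq, seq_len, feat = 1000, 50, 5

# Same recipe as generate_data: one sine period plus Gaussian noise (std 0.1)
t = torch.linspace(0, 2 * torch.pi, seq_len)
clean = torch.sin(t).unsqueeze(-1).repeat(1, feat)
data = clean + 0.1 * torch.randn(n_seq, seq_len, feat)

zero_baseline = (data ** 2).mean().item()           # MSE of predicting all zeros
signal_power = (clean ** 2).mean().item()           # contribution of the sine wave
noise_floor = ((data - clean) ** 2).mean().item()   # MSE of a perfect denoiser

print(f"zero-predictor MSE: {zero_baseline:.4f}")
print(f"signal power:       {signal_power:.4f}")
print(f"noise floor:        {noise_floor:.4f}")
```

A plateau at the zero-predictor level suggests the model in this run may be outputting values close to zero and not tracking the sine wave. A reconstruction that captured the wave would be expected to approach the noise floor of about 0.01.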

In [14]:
# 4) Save the model
torch.save(model.state_dict(), 'lstm_ae.pth')
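
Since the scenario's goal is anomaly detection, the reconstruction error still has to be turned into a per-sequence score and compared with a threshold. A minimal sketch follows, assuming the threshold is set at the 99th percentile of scores on normal data (an illustrative choice). `MeanProfile` is a hypothetical stand-in for a trained autoencoder so that the example runs on its own, and `score_sequences` is likewise a hypothetical helper.

```python
import torch
import torch.nn as nn


class MeanProfile(nn.Module):
    """Hypothetical stand-in for a trained autoencoder: reconstructs the mean training sequence."""

    def __init__(self, train_data):
        super().__init__()
        self.register_buffer("profile", train_data.mean(dim=0, keepdim=True))

    def forward(self, x):
        return self.profile.expand_as(x)


@torch.no_grad()
def score_sequences(model, x):
    """Per-sequence reconstruction MSE, averaged over time steps and features."""
    model.eval()
    recon = model(x)
    return ((recon - x) ** 2).mean(dim=(1, 2))


torch.manual_seed(0)
t = torch.linspace(0, 2 * torch.pi, 50)
clean = torch.sin(t).unsqueeze(-1).repeat(1, 5)
normal = clean + 0.1 * torch.randn(500, 50, 5)

model = MeanProfile(normal)
threshold = torch.quantile(score_sequences(model, normal), 0.99)

test = clean + 0.1 * torch.randn(20, 50, 5)
test[0, 20:30, :] += 2.0                      # inject a level shift into sequence 0
scores = score_sequences(model, test)
flags = scores > threshold
print(f"threshold={threshold.item():.4f}, flagged={flags.nonzero().flatten().tolist()}")
```

With the saved weights, the stand-in could be replaced by an LSTM_AE instance restored through model.load_state_dict(torch.load('lstm_ae.pth')).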