# Building a simple auto-encoder

In this notebook I'm trying to create a simple neural net that acts as an auto-encoder: it compresses data such as images into a small latent vector and then reconstructs them at inference.

In [2]:
import torch
from torch import nn, optim
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

In [3]:
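# data prep work

# ToTensor converts each image to a float tensor of shape (1, 28, 28) with pixel values scaled to [0, 1]
transform = transforms.ToTensor()
train_dataset = datasets.MNIST(root='./data', train=True, transform=transform, download=True)
# Shuffle so each epoch sees the images in a different order
train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)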



In [10]:
#model prep

class AutoEncoder(nn.Module):

  def __init__(self) -> None:
    super().__init__()

    self.encoder = nn.Sequential(
        nn.Linear(28*28, 128),
        nn.ReLU(),
        nn.Linear(128,32)
    )

    self.decoder = nn.Sequential(
        nn.Linear(32, 128),
        nn.ReLU(),
        nn.Linear(128, 28*28),
        nn.Sigmoid()
    )

  def forward(self,x):
    # Flatten the 4D MNIST batch (batch, 1, 28, 28) into (batch, 784) for the encoder.
    # .view() reshapes a tensor without copying its data.
    # x.size(0) is the batch size (64 here, except for a smaller final batch), so it is kept as-is.
    # -1 tells PyTorch to infer that dimension from the total number of elements: 1*28*28 = 784.
    x = x.view(x.size(0), -1)

    z = self.encoder(x)
    out = self.decoder(z)
    out = out.view(x.size(0),1,28,28) #batch, channels, height, width
    return out
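
To make the `-1` in `x.view(x.size(0), -1)` concrete, here is a minimal plain-Python sketch of the shape arithmetic. `resolve_view_shape` is a hypothetical helper written for this note, not a PyTorch function; it only mimics how a single `-1` dimension is inferred from the total element count.

```python
from math import prod


def resolve_view_shape(numel, shape):
    """Hypothetical helper: fill in a single -1 so the shape holds `numel` elements."""
    if shape.count(-1) > 1:
        raise ValueError("only one dimension can be inferred")
    known = prod(d for d in shape if d != -1)  # product of the explicit dimensions
    if -1 not in shape:
        if known != numel:
            raise ValueError(f"shape {shape} does not hold {numel} elements")
        return tuple(shape)
    if known == 0 or numel % known != 0:
        raise ValueError(f"cannot split {numel} elements into shape {shape}")
    return tuple(numel // known if d == -1 else d for d in shape)


batch_shape = (64, 1, 28, 28)  # batch, channels, height, width
numel = prod(batch_shape)      # total number of pixel values in the batch

flat_shape = resolve_view_shape(numel, (batch_shape[0], -1))
restored_shape = resolve_view_shape(numel, (batch_shape[0], 1, 28, 28))

print(flat_shape)      # (64, 784)
print(restored_shape)  # (64, 1, 28, 28)
```

The same arithmetic holds for a smaller final batch: 32 images would flatten to (32, 784).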


In [11]:
model = AutoEncoder()
criterion = nn.MSELoss()
optimizer = optim.Adam(model.parameters(),lr=0.001)

In [13]:
#train loop

epochs = 5

for epoch in range(epochs):
  running_loss = 0.0

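  for imgs, _ in train_loader:       # labels are ignored: the input is its own target
    optimizer.zero_grad()            # clear gradients from the previous step
    outputs = model(imgs)            # reconstruct the batch
    loss = criterion(outputs, imgs)  # pixel-wise MSE between reconstruction and input
    loss.backward()
    optimizer.step()

    running_loss += loss.item()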

  avg_loss = running_loss / len(train_loader)

  print(f"Epoch {epoch+1}/{epochs}, Loss: {avg_loss:.4f}")

Epoch 1/5, Loss: 0.0366
Epoch 2/5, Loss: 0.0156
Epoch 3/5, Loss: 0.0117
Epoch 4/5, Loss: 0.0101
Epoch 5/5, Loss: 0.0091
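
The loss printed per epoch is the mean of the per-batch losses, so it helps to know how many batches there are. Here is a small sketch of that arithmetic, assuming the standard 60,000-image MNIST training split and the `DataLoader` default of keeping the last partial batch. The batch losses at the end are made-up numbers, used only to show the calculation.

```python
import math

num_images = 60_000  # assumed size of the MNIST training split
batch_size = 64      # same value passed to DataLoader in this notebook

num_batches = math.ceil(num_images / batch_size)  # what len(train_loader) returns
last_batch_size = num_images - (num_batches - 1) * batch_size

print(num_batches)      # 938
print(last_batch_size)  # 32


def epoch_average(batch_losses):
    """Mean of per-batch mean losses, like running_loss / len(train_loader)."""
    return sum(batch_losses) / len(batch_losses)


# Made-up batch losses, only to show the calculation
print(epoch_average([0.04, 0.03, 0.02]))
```

Because the last batch holds only 32 images, this average weights those images slightly more than a per-image mean would; the difference is negligible here.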


## Some quick notes:

- __Why ReLU:__ ReLU is the go-to activation for hidden layers because it’s simple and cheap to compute, and its gradient is exactly 1 for positive inputs, so it is far less prone to vanishing gradients than sigmoid or tanh, which saturate for large inputs. It basically says, “If the input is positive, keep it; if not, zero it out.” Zeroing negative inputs also makes the activations sparse, and in practice ReLU networks tend to train faster.

- __Why sigmoid at the end of decoder:__ `transforms.ToTensor()` scales the MNIST grayscale pixel intensities to the range 0 to 1. Sigmoid squashes every output into the open interval (0, 1), so the decoder’s output can be compared directly to the original input pixels. Without it, the network could output values below 0 or above 1, which can never match a valid pixel and only add to the reconstruction loss.

- __Why Adam:__ Adam is like the Swiss Army knife of optimizers: it keeps a running average of each parameter’s gradient (momentum) and of its squared gradient (as in RMSProp), and uses them to adapt the step size per parameter. This usually makes training faster and more stable than plain SGD with little tuning, which is handy for a small network like our autoencoder. Plus, it’s widely used and reliable in practice.

- Each batch the `DataLoader` yields from MNIST is a 4D tensor whose dimensions are:
(__batch_size__, __channels__, __height__, __width__), here (64, 1, 28, 28)
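
The two activation notes above can be checked with a few lines of plain Python. This is a scalar sketch for intuition only; `relu` and `sigmoid` here are illustrative stand-ins, not the `nn.ReLU` and `nn.Sigmoid` modules, which apply the same functions element-wise to tensors.

```python
import math


def relu(x):
    """Keep positive inputs, zero out the rest."""
    return max(0.0, x)


def sigmoid(x):
    """Map a real number into the interval (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # this form avoids overflow for large negative x
    return e / (1.0 + e)


for value in (-3.0, -0.5, 0.0, 0.5, 3.0):
    print(f"x={value:+.1f}  relu={relu(value):.1f}  sigmoid={sigmoid(value):.4f}")
```

Negative inputs come out of `relu` as 0.0, while every `sigmoid` output stays between 0 and 1, which is the range of the pixel values it is compared against.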
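
The Adam note above can be sketched for a single scalar parameter. This follows the standard Adam update rule, using the defaults PyTorch documents for `optim.Adam` (`betas=(0.9, 0.999)`, `eps=1e-8`) and the `lr=0.001` from this notebook. `adam_step` is a hypothetical helper for illustration, not PyTorch's implementation.

```python
import math


def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; returns (param, m, v)."""
    m = beta1 * m + (1 - beta1) * grad         # running mean of gradients (momentum)
    v = beta2 * v + (1 - beta2) * grad * grad  # running mean of squared gradients (RMSProp-like)
    m_hat = m / (1 - beta1 ** t)               # bias correction for the zero initialisation
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Two parameters with very different gradient scales take a similar-sized first step
big, _, _ = adam_step(param=1.0, grad=5.0, m=0.0, v=0.0, t=1)
small, _, _ = adam_step(param=1.0, grad=0.01, m=0.0, v=0.0, t=1)
print(1.0 - big, 1.0 - small)  # both close to lr = 0.001
```

Dividing by the root of the squared-gradient average is what makes the step size per-parameter: a large raw gradient does not translate into a proportionally large step.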