# Lab 9 - Multilayer Perceptron (MLP)

Dominik Gaweł

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/dg7s/Machine-Learning/blob/main/hw/Neural_Network_Dreams.ipynb)

-------------------------------

# **Homework Assignment - *Do Androids Dream of Electric Sheep?***

-------------------------------------  

"Do Androids Dream of Electric Sheep?" – the famous title of Philip K. Dick’s novel – raises a fascinating question: if artificial intelligence could dream, what would it see?  

In this assignment, we explore a phenomenon known as **neural network dreams**, where instead of optimizing a neural network's weights, we **optimize the input itself** to achieve a desired classification outcome. Given a fully trained MNIST classification network, your goal is to manipulate its inputs so that it confidently predicts each digit from 0 to 9, starting from pure noise.  

## **Tasks Description**  

During this class we designed and trained a **MNIST classification neural network**, which takes a **batch of grayscale images** of size **$28 \times 28$** as input and outputs a probability distribution over the 10 digit classes (0–9). However, instead of using real MNIST images, you will **treat the input batch itself as a set of trainable parameters** and optimize it so that the network classifies each image as a specific digit.  

1. Your first task is to generate **a batch of 10 images**, where each image is
   classified as one of the digits **0, 1, 2, ..., 9**, starting from an initial batch of ten random Gaussian noise images.  

   Discuss the following question: do the generated images resemble real MNIST digits? Why or why not?  

2. Discuss how you would approach a second task of
   generating an image that
   bears similarity to two or more digits simultaneously. **Implement your idea to see the results.**

3. Third task: repeat the previous tasks with an additional L2 penalty on noise within the images. Experiment with adding a `lambda_l2 * dreamed_input_batch.pow(2).mean()` loss term, with `lambda_l2` being the penalty coefficient taken from an exponential progression, say from 0.001 to 10.0. Are the new digits recognized correctly? How does the penalty impact the digit quality? Explain.
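
The exponential progression of `lambda_l2` mentioned in the third task could, for instance, be built from powers of ten. This is only a sketch; the name `lambda_grid` is an illustrative choice and not part of the assignment.

```python
# Hypothetical helper list: five penalty coefficients, each 10x the previous one.
lambda_grid = [10.0 ** exponent for exponent in range(-3, 2)]

for lambda_l2 in lambda_grid:
    # Each value would be used in: ce_loss + lambda_l2 * dreamed_input_batch.pow(2).mean()
    print(f"lambda_l2 = {lambda_l2:g}")
```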

### **Optimization Process for Task 1**  

1. Start with a **batch of 10 random Gaussian noise images** as the initial input and $(0, 1, 2, \ldots, 9)$ as the expected output batch of target digits.  
2. Define the objective: maximize the neural network's confidence for the corresponding target digit for each image in the batch.  
3. Use **gradient descent** to modify the pixels in each image, making the network classify each one as the assigned digit.  
4. Repeat until the network assigns sufficiently high confidence to each image’s target class.  
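
The four steps above can be sketched on a toy model. The block below assumes a small, randomly initialized linear classifier (`toy_net`) as a stand-in for the trained MNIST network, and the 0.99 stopping threshold is an arbitrary illustrative value; it shows the mechanics only, not the results of the real experiment.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-in for the trained classifier (assumption: any model mapping
# 28x28 images to 10 logits is optimized against in the same way).
toy_net = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))
toy_net.eval()

# Step 1: ten Gaussian-noise images and the target digits 0..9.
dream = torch.randn(10, 1, 28, 28, requires_grad=True)
targets = torch.arange(10)

# Only the images are handed to the optimizer, so only they get updated.
optimizer = torch.optim.Adam([dream], lr=0.1)

# Steps 2-4: minimize cross-entropy until every image is confidently classified.
for step in range(500):
    optimizer.zero_grad()
    loss = F.cross_entropy(toy_net(dream), targets)
    loss.backward()
    optimizer.step()

    with torch.no_grad():
        probs = F.softmax(toy_net(dream), dim=1)
        confidence = probs[torch.arange(10), targets]
    if confidence.min().item() > 0.99:  # illustrative stopping threshold
        break

print(f"stopped after {step + 1} steps, min confidence {confidence.min().item():.3f}")
```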

### **Implementation Details**  

- The neural network weights **must remain frozen** during optimization. You are modifying only the input images.  
- The loss function should be the **cross-entropy loss** between the predicted probabilities and the desired class labels (plus an optional weighted L2 penalty regularizing the images in task 3).
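
One possible way to satisfy both implementation details is sketched below: the weights are frozen explicitly with `requires_grad_(False)`, and the loss is wrapped in a helper. The names `toy_net` and `dream_loss` are hypothetical, and the randomly initialized linear model only stands in for the trained classifier.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-in classifier (assumption); in the notebook this would be the trained `net`.
toy_net = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))
toy_net.eval()

# Freeze the weights explicitly so no gradients are computed or stored for them.
for param in toy_net.parameters():
    param.requires_grad_(False)

weights_before = [p.clone() for p in toy_net.parameters()]

def dream_loss(logits, targets, images, lambda_l2=0.0):
    """Cross-entropy plus an optional weighted L2 penalty on the images."""
    return F.cross_entropy(logits, targets) + lambda_l2 * images.pow(2).mean()

dream = torch.randn(10, 1, 28, 28, requires_grad=True)
targets = torch.arange(10)
optimizer = torch.optim.Adam([dream], lr=0.1)

for _ in range(20):
    optimizer.zero_grad()
    loss = dream_loss(toy_net(dream), targets, dream, lambda_l2=0.01)
    loss.backward()
    optimizer.step()

# The network is untouched; only the images moved.
weights_unchanged = all(
    torch.equal(before, after)
    for before, after in zip(weights_before, toy_net.parameters())
)
print("weights unchanged:", weights_unchanged)
```

Passing only `[dream]` to the optimizer already keeps the weights fixed; freezing the parameters additionally avoids computing gradients that are never used.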


## **Points to Note**  

1. **Visualize** the optimization process: Save images of the generated inputs at different steps and plot the classification confidence evolution over iterations.  
2. **Document your findings** and explain the behavior you observe.  
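
The confidence plot mentioned in the first point could look roughly like the sketch below. It assumes the probabilities were recorded as a list with one `(10, 10)` array per step (row `i` holding the class probabilities of image `i`); the function name `plot_confidence_evolution` and the synthetic data are illustrative only.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_confidence_evolution(prob_frames):
    """Plot, for each digit d, the probability that image d is classified as d."""
    history = np.stack(prob_frames)                          # (steps, 10, 10)
    target_conf = history[:, np.arange(10), np.arange(10)]   # (steps, 10)

    fig, ax = plt.subplots(figsize=(8, 4))
    for digit in range(10):
        ax.plot(target_conf[:, digit], label=str(digit))
    ax.set_xlabel("recorded step")
    ax.set_ylabel("confidence for target digit")
    ax.set_ylim(0, 1.05)
    ax.legend(title="digit", ncol=5, fontsize=8)
    return fig, target_conf

# Synthetic stand-in data, used only to make the sketch runnable on its own.
rng = np.random.default_rng(0)
fake_frames = [rng.dirichlet(np.ones(10), size=10) for _ in range(5)]
fig, target_conf = plot_confidence_evolution(fake_frames)
```

With the `frame_probs` list collected in the Task 1 cell, the call would be `plot_confidence_evolution(frame_probs)`.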

## **Task & Deliverables**  

- A **Colab notebook** containing solutions for all three tasks:
  - The full implementation.
  - Visualizations of the generated batch of images.
  - A written explanation of your observations.
- **Bonus:** If you create an **animation** showing the evolution of the input images during optimization, it will be considered a strong enhancement to your submission.
  - You can generate an animation programmatically (e.g., using Matplotlib or OpenCV).
  - Or, save image frames and use external tools to create a video.
  - Provide a **link** to any video files in the README.
- Upload your notebook and results to your **GitHub repository** for the course.
- In the **README**, include a **link** to the notebook.
- In the notebook, include an **“Open in Colab”** badge so it can be launched directly.




In [None]:
import torch
import torchvision
import matplotlib.pyplot as plt
import torch.nn.functional as F
from matplotlib.animation import FuncAnimation
from IPython.display import HTML
import random

In [None]:
SEED = 42
torch.manual_seed(SEED)
random.seed(SEED)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(SEED)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

In [None]:
transform = torchvision.transforms.Compose(
    [ torchvision.transforms.ToTensor(),
      # MNIST mean and std, passed as one-element tuples (one value per channel).
      torchvision.transforms.Normalize((0.1307,), (0.3081,))])

trainset = torchvision.datasets.MNIST(root='./data',
                                      train=True,
                                      download=True,
                                      transform=transform)

trainloader = torch.utils.data.DataLoader(trainset,
                                          batch_size=2048,
                                          shuffle=True)



In [None]:
class MLP(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.mlp = torch.nn.Sequential(
            torch.nn.Flatten(),
            torch.nn.Linear(1*28*28, 1024),
            torch.nn.ReLU(),
            torch.nn.Linear(1024, 2048),
            torch.nn.ReLU(),
            torch.nn.Linear(2048, 256),
            torch.nn.ReLU(),
            torch.nn.Linear(256, 10),
        )
        self.dropout = torch.nn.Dropout(0.05)

    def forward(self, x):
        x = self.mlp(x)
        x = self.dropout(x)
        return x

In [None]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Working on {device}")

net = MLP().to(device)
optimizer = torch.optim.Adam(net.parameters(), 0.001)

net.train()
for epoch in range(8):

    loss = 0.0
    for batch, data in enumerate(trainloader):
        batch_inputs, batch_labels = data

        batch_inputs = batch_inputs.to(device)
        batch_labels = batch_labels.to(device)

        optimizer.zero_grad()

        batch_outputs = net(batch_inputs)
        loss = torch.nn.functional.cross_entropy(batch_outputs, batch_labels, reduction = "mean")
        print("epoch:", epoch, "batch:", batch, "current batch loss:", loss.item())
        loss.backward()
        optimizer.step()


Working on cuda
epoch: 0 batch: 0 current batch loss: 2.310203790664673
epoch: 0 batch: 1 current batch loss: 2.0768327713012695
epoch: 0 batch: 2 current batch loss: 1.624901294708252
epoch: 0 batch: 3 current batch loss: 1.1932710409164429
epoch: 0 batch: 4 current batch loss: 1.1437439918518066
epoch: 0 batch: 5 current batch loss: 0.9622821807861328
epoch: 0 batch: 6 current batch loss: 0.9507745504379272
epoch: 0 batch: 7 current batch loss: 0.8162147998809814
epoch: 0 batch: 8 current batch loss: 0.7120282053947449
epoch: 0 batch: 9 current batch loss: 0.6511026620864868
epoch: 0 batch: 10 current batch loss: 0.6360457539558411
epoch: 0 batch: 11 current batch loss: 0.6028648018836975
epoch: 0 batch: 12 current batch loss: 0.5230752825737
epoch: 0 batch: 13 current batch loss: 0.5151314735412598
epoch: 0 batch: 14 current batch loss: 0.4442121982574463
epoch: 0 batch: 15 current batch loss: 0.4606562554836273
epoch: 0 batch: 16 current batch loss: 0.46927446126937866
epoch: 0 bat

### Task 1

In [None]:
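net.eval()
# Freeze the classifier: only the input images are optimized below.
for param in net.parameters():
    param.requires_grad_(False)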
dream = torch.randn(10, 1, 28, 28, device=device, requires_grad=True)
targets = torch.arange(10, device=device)
optimizer = torch.optim.Adam([dream], lr=0.1)

fig, axes = plt.subplots(2, 5, figsize=(10, 5))
plt.tight_layout()

frame_images = []
frame_probs = []

for step in range(50):
    optimizer.zero_grad()
    logits = net(dream)
    loss = F.cross_entropy(logits, targets)
    loss.backward()
    optimizer.step()
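    with torch.no_grad():
        # Keep the normalized pixel values bounded after each update.
        dream.clamp_(-1, +1)
        # De-normalize for display and record the post-update probabilities.
        imgs = dream.detach().cpu() * 0.3081 + 0.1307
        imgs = imgs.clamp(0, 1).numpy()
        probs = F.softmax(net(dream), dim=1).cpu().numpy()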

    frame_images.append(imgs)
    frame_probs.append(probs)

imshow_objects = []
text_objects = []
for idx, ax in enumerate(axes.flatten()):
    img = ax.imshow(frame_images[0][idx, 0], cmap='gray')
    txt = ax.set_title(f"{idx}: {frame_probs[0][idx, idx]*100:.1f}%")
    ax.axis('off')
    imshow_objects.append(img)
    text_objects.append(txt)

def update(frame):
    for idx in range(10):
        imshow_objects[idx].set_data(frame_images[frame][idx, 0])
        text_objects[idx].set_text(f"{idx}: {frame_probs[frame][idx, idx]*100:.1f}%")
    return imshow_objects + text_objects

ani = FuncAnimation(fig, update, frames=len(frame_images), interval=100, blit=True)

plt.close()
HTML(ani.to_html5_video())

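The optimized images don’t look like real MNIST samples. The classifier was trained only to tell real digits apart, so nothing in the cross-entropy objective rewards an input for looking digit-like, and many noise-like inputs receive high confidence. Because the network is a plain MLP operating on flattened pixels, it also has no built-in notion of spatial structure: each pixel is adjusted according to its own gradient, with no pressure to stay consistent with its neighbors. The global shape of a digit therefore never forms, and only high-confidence noise patterns remain.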
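
A further detail that may contribute to the visual difference is the value range. The arithmetic below, a sketch using the normalization constants from the `transform` cell (the helper names are illustrative), compares the range real MNIST pixels occupy after normalization with the pixel intensities reachable once the dream is clamped to $[-1, 1]$: the clamp caps displayed brightness at about 0.44, while real strokes reach 1.0.

```python
MEAN, STD = 0.1307, 0.3081  # normalization constants from the transform cell

def to_pixel(normalized):
    """Map a normalized value back to pixel intensity (real images span 0..1)."""
    return normalized * STD + MEAN

def to_normalized(pixel):
    """Map a pixel intensity in 0..1 to the normalized scale the network sees."""
    return (pixel - MEAN) / STD

# Range of normalized values that real MNIST pixels can take.
real_low, real_high = to_normalized(0.0), to_normalized(1.0)

# Pixel intensities reachable after clamping the dream to [-1, 1].
dream_low, dream_high = to_pixel(-1.0), to_pixel(1.0)

print(f"real MNIST, normalized scale: [{real_low:.3f}, {real_high:.3f}]")
print(f"clamped dream, pixel scale:   [{dream_low:.3f}, {dream_high:.3f}]")
```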

----

### Task 2

In [None]:
def generate_hybrid_digit(class1, class2, steps=50, blend_ratio=0.5):
    img = torch.randn(1, 1, 28, 28, device=device, requires_grad=True)
    target = torch.zeros(10, device=device)
    target[class1] = blend_ratio
    target[class2] = 1 - blend_ratio

    optimizer = torch.optim.Adam([img], lr=0.1)
    frames = []
    prob_history = []

    for step in range(steps):
        optimizer.zero_grad()
        logits = net(img)
        # KL divergence between the predicted distribution and the soft target.
        loss = F.kl_div(F.log_softmax(logits, dim=1), target.unsqueeze(0), reduction='batchmean')
        loss.backward()
        optimizer.step()

        with torch.no_grad():
            img.clamp_(-1, +1)
            # Record the image and its probabilities after the update,
            # so that each frame is paired with its own prediction.
            denorm = img.detach().cpu() * 0.3081 + 0.1307
            frames.append(denorm.squeeze().numpy())
            prob_history.append(F.softmax(net(img), dim=1).cpu().numpy())

    return frames, prob_history

hybrid_frames, prob_history = generate_hybrid_digit(3, 5)

fig, ax = plt.subplots(figsize=(6, 6))
im = ax.imshow(hybrid_frames[0], cmap='gray')
ax.axis('off')
title = ax.set_title(f'Step 0\nP(3): {prob_history[0][0,3]:.2f}, P(5): {prob_history[0][0,5]:.2f}')

def update_hybrid(frame):
    im.set_array(hybrid_frames[frame])
    title.set_text(f'Step {frame}\nP(3): {prob_history[frame][0,3]:.2f}, P(5): {prob_history[frame][0,5]:.2f}')
    return [im, title]

ani = FuncAnimation(fig, update_hybrid, frames=len(hybrid_frames), interval=100, blit=True)

plt.close()
HTML(ani.to_html5_video())

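We optimized the input so that the network’s output splits 50 % for one target digit and 50 % for the other, by minimizing the KL divergence between the predicted distribution and a soft target that puts half of the mass on each digit. For a 50/50 blend this is equivalent, up to an additive constant, to averaging the two cross-entropy losses. The resulting image is classified at about 48 % vs. 52 %, confirming that the method works.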
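
The relationship between the KL objective used in `generate_hybrid_digit` and an average of two cross-entropy losses can be checked numerically. The sketch below uses arbitrary illustrative logits (not values from the notebook) and plain Python; the gap between the two losses is the entropy of the 50/50 target, $\log 2$, regardless of the logits.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]

# Arbitrary illustrative logits for the ten classes.
logits = [0.3, -1.2, 0.8, 2.0, -0.5, 1.7, 0.0, -2.1, 0.4, 1.1]
probs = softmax(logits)

class1, class2 = 3, 5
target = [0.0] * 10
target[class1] = target[class2] = 0.5

# KL(target || probs); classes with zero target mass contribute nothing.
kl = sum(t * math.log(t / p) for t, p in zip(target, probs) if t > 0)

# Average of the two single-label cross-entropy losses.
avg_ce = -0.5 * (math.log(probs[class1]) + math.log(probs[class2]))

# The gap is the entropy of the 50/50 target and does not depend on the logits.
gap = avg_ce - kl
print(f"KL = {kl:.4f}, averaged CE = {avg_ce:.4f}, gap = {gap:.4f}")
```

Since the two losses differ only by a constant, they have the same gradient with respect to the image and lead to the same optimization path.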

---


### Task 3

In [None]:
initial_dream = torch.randn(10, 1, 28, 28, device=device)

def dream_with_l2_animated(initial_dream, lambda_l2, steps=200):
    dream = initial_dream.clone().detach().requires_grad_(True)
    net.eval()
    targets = torch.arange(10, device=device)

    optimizer = torch.optim.Adam([dream], lr=0.1)
    frames = []
    confidence_history = []

    for step in range(steps):
        optimizer.zero_grad()
        logits = net(dream)
        ce_loss = F.cross_entropy(logits, targets)
        l2_loss = lambda_l2 * dream.pow(2).mean()
        total_loss = ce_loss + l2_loss

        total_loss.backward()
        optimizer.step()
        with torch.no_grad():
            dream.clamp_(-1, +1)  # keep normalized pixel values bounded

        if step % 10 == 0 or step == steps-1:
            with torch.no_grad():
                denorm = dream.detach().cpu() * 0.3081 + 0.1307
                denorm = denorm.clamp(0, 1).numpy()
                probs = F.softmax(net(dream), dim=1)
                confidences = probs[range(10), targets].detach().cpu().numpy()

            frames.append(denorm)
            confidence_history.append(confidences)

    fig, axes = plt.subplots(2, 5, figsize=(12, 5))
    fig.suptitle(f"λ = {lambda_l2}", fontsize=14, y=0.95)
    plt.subplots_adjust(top=0.85, hspace=0.3, wspace=0.3)
    plt.close()

    ims = []
    for i, ax in enumerate(axes.flatten()):
        im = ax.imshow(frames[0][i,0], cmap='gray', animated=True)
        ax.axis('off')
        ax.set_title(f"{i}\n", fontsize=10)
        ims.append(im)

    def update(frame):
        for i, im in enumerate(ims):
            im.set_array(frames[frame][i,0])
            axes.flatten()[i].set_title(f"{i}\n{confidence_history[frame][i]:.1%}", fontsize=10)
        return ims

    return FuncAnimation(fig, update, frames=len(frames), interval=100, blit=True)

lambdas = [0.001, 0.01, 0.1, 1.0, 10.0]

for lam in lambdas:
    ani = dream_with_l2_animated(initial_dream, lam)
    display(HTML(ani.to_html5_video()))
    plt.close()

As $\lambda$ increases, the images become progressively smoother but also more blurred. The $L_2$ term penalizes the squared magnitude of every normalized pixel value and pulls it toward zero, which corresponds to a dark gray after de-normalization, so the optimizer is pushed toward darker tones. All generated digits are still classified correctly; however, at $\lambda=10$ the network’s confidence never reaches $100\%$ and instead stabilizes at about $98\%$. Only at $\lambda=10$ do the outputs for digits $0$ and $5$, despite their blurriness, actually resemble recognizable MNIST digits.
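
A short derivation may make the effect of the penalty more concrete. It assumes the penalty is exactly the `lambda_l2 * dream.pow(2).mean()` term from the code, where the mean runs over all $N = 10 \cdot 28 \cdot 28 = 7840$ entries of the batch:

$$
\mathcal{L}_{2}(x) = \frac{\lambda}{N} \sum_{i=1}^{N} x_i^2,
\qquad
\frac{\partial \mathcal{L}_{2}}{\partial x_i} = \frac{2\lambda}{N}\, x_i .
$$

The penalty's gradient therefore always points away from zero with a magnitude proportional to $x_i$, so minimizing it drives every normalized pixel toward $0$, which de-normalizes to $0 \cdot 0.3081 + 0.1307 = 0.1307$. The $1/N$ factor introduced by `.mean()` keeps the per-pixel pull small, which is plausibly part of the reason the effect only becomes clearly visible for the larger values of $\lambda$.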
