<a href="https://colab.research.google.com/github/MihaiDogariu/CV3/blob/main/laborator/CV%203%20-%20Lab%20%238.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Autoencoder
In this lab we study how an autoencoder works and implement one particular variant: the denoising autoencoder. An autoencoder transforms each input image into a code whose dimension is either smaller (undercomplete autoencoder) or larger (overcomplete autoencoder) than that of the input image.

The structure of an autoencoder is shown in Figure 1.
<div>
  <center>
    <img src="https://drive.google.com/uc?export=view&id=13B18HaVSoVULk5uMVA4Hv9cOWyneXNV8" width="300" class="center">
    <p>Figure 1. General structure of an autoencoder.</p>
  </center>
</div>

An autoencoder consists of 2 components:
- encoder - transforms the input image into a latent descriptor: $h=f(x)$
- decoder - transforms the latent descriptor back into an image: $r=g(h)=g(f(x))$. Ideally, $g = f^{-1}$, so that $r=x$.
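
As a toy illustration of $h=f(x)$ and $r=g(h)$, the NumPy sketch below uses a hypothetical linear encoder that keeps the first two of four input values, and a decoder that maps the code back by padding with zeros. The matrix `W` and the helper names `encode` and `decode` are assumptions made for this example only; they are not the lab's convolutional model.

```python
import numpy as np

# Hypothetical linear encoder: keeps the first 2 of 4 input components
W = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0]])

def encode(x):
    """h = f(x): project the input onto a 2-dimensional code."""
    return W @ x

def decode(h):
    """r = g(h): map the code back to the input space."""
    return W.T @ h

x_a = np.array([3.0, -1.0, 0.0, 0.0])   # lies entirely in the kept subspace
x_b = np.array([3.0, -1.0, 2.0, 0.0])   # has a component the code cannot store

r_a = decode(encode(x_a))
r_b = decode(encode(x_b))

print(r_a)  # same values as x_a: exact reconstruction
print(r_b)  # the third component is lost
```

In this undercomplete toy case `g` inverts `f` only for inputs that the code can represent, which is why reconstruction is in general only approximate.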

The autoencoder's reconstruction is evaluated with the mean squared error, where $x_i$ and $r_i$ are the $i$-th elements of the input and of the reconstruction:
$MSE = \frac{1}{N}\sum_{i=1}^{N}{(x_i-r_i)^2}$
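
A small worked example of the formula, written with NumPy. The helper name `mse` and the four sample values are made up for illustration; the lab itself relies on `nn.MSELoss`.

```python
import numpy as np

def mse(x, r):
    """Mean squared error over all elements of x and r."""
    x = np.asarray(x, dtype=np.float64)
    r = np.asarray(r, dtype=np.float64)
    return np.mean((x - r) ** 2)

x = np.array([0.0, 0.5, 1.0, -1.0])   # "original" pixel values
r = np.array([0.0, 0.0, 1.0, 0.0])    # "reconstructed" pixel values

# squared differences: 0, 0.25, 0, 1 -> mean = 1.25 / 4
print(mse(x, r))  # 0.3125
```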

A particular case of the autoencoder is the denoising autoencoder, whose structure is shown in Figure 2.
<div>
  <center>
    <img src="https://drive.google.com/uc?export=view&id=1_Cx7qkHIA2K-x6Zx6p-wjB-ge11KaM5D" width="300" class="center">
    <p>Figure 2. General structure of a denoising autoencoder.</p>
  </center>
</div>

In this case the input image is "corrupted", i.e. noise is added on top of it, and the autoencoder is forced to learn to reproduce the noise-free image. Here:
- noisy input: $\tilde{x}=x+n$, where $n$ is white Gaussian noise
- encoder: $h=f(\tilde{x})$
- decoder: $r=g(h)=g(f(\tilde{x}))=g(f(x+n))\approx{x}$

The reconstruction is again evaluated with the mean squared error, as for the original autoencoder; the reference is the clean image $x$, not the noisy input $\tilde{x}$.
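
The corruption step and the choice of loss target can be sketched in NumPy as follows. This is only an illustration under stated assumptions: the function name `corrupt`, the all-zero stand-in batch, and the "identity model" are hypothetical, and the lab's own `add_noise` for PyTorch tensors remains an exercise.

```python
import numpy as np

def corrupt(x, mean=0.0, std=0.2, rng=None):
    """Return x + n, with n drawn i.i.d. from a Gaussian N(mean, std^2)."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.normal(loc=mean, scale=std, size=x.shape)
    return x + noise

rng = np.random.default_rng(0)        # fixed seed, for repeatability only
x = np.zeros((4, 1, 28, 28))          # stand-in batch of clean images
x_noisy = corrupt(x, std=0.2, rng=rng)

# Stand-in "reconstruction": an identity model that just returns its input
r = x_noisy

# The denoising loss compares r with the clean x, not with x_noisy
loss_vs_clean = np.mean((x - r) ** 2)
loss_vs_noisy = np.mean((x_noisy - r) ** 2)
print(loss_vs_clean, loss_vs_noisy)
```

An identity model gets zero loss against its own noisy input but a loss of roughly `std**2` against the clean target, so the denoising objective cannot be solved by simply copying the input.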

#TODO:
1. find hyperparameter values that yield an acceptable reconstruction of the images;
2. write the function that adds white Gaussian noise;
3. replace the encoder and the decoder with fully connected architectures (fully connected layers).

In [None]:
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

import torch
from torch import nn, optim
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

In [None]:
# transform that normalizes the input data: pixel values go from [0, 1] to [-1, 1]
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,), (0.5,))])

# download the training set
# note: the loaders use a fixed batch size of 64; the `batch_size` hyperparameter defined later is not used here
trainset = datasets.FashionMNIST('./data', download=True, train=True, transform=transform)
train_loader = DataLoader(trainset, batch_size=64, shuffle=True)

# download the test set (used here for validation)
validationset = datasets.FashionMNIST('./data', download=True, train=False, transform=transform)
val_loader = DataLoader(validationset, batch_size=64, shuffle=True)
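
The `Normalize((0.5,), (0.5,))` step above computes `(x - 0.5) / 0.5` for every pixel, so the `ToTensor` output in [0, 1] ends up in [-1, 1]. A small NumPy check of that arithmetic is sketched below; the helper names `normalize` and `denormalize` are hypothetical and are not torchvision functions.

```python
import numpy as np

def normalize(x, mean=0.5, std=0.5):
    """Same arithmetic as Normalize((0.5,), (0.5,)): (x - mean) / std."""
    return (x - mean) / std

def denormalize(y, mean=0.5, std=0.5):
    """Inverse mapping, handy when pixel values in [0, 1] are needed again."""
    return y * std + mean

pixels = np.array([0.0, 0.25, 0.5, 1.0])   # sample values in [0, 1]
scaled = normalize(pixels)                  # values: -1, -0.5, 0, 1
restored = denormalize(scaled)              # back to the sample values
print(scaled, restored)
```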

In [None]:
# hyperparameters
latent_dims = 10
num_epochs = 3
batch_size = 128
capacity = 64
learning_rate = 1e-2
use_gpu = True

In [None]:
class Encoder(nn.Module):
    def __init__(self):
        super(Encoder, self).__init__()
        c = capacity
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=c, kernel_size=4, stride=2, padding=1)
        self.conv2 = nn.Conv2d(in_channels=c, out_channels=c*2, kernel_size=4, stride=2, padding=1)
        self.fc = nn.Linear(in_features=c*2*7*7, out_features=latent_dims)
            
    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = x.view(x.size(0), -1) # flatten batch of multi-channel feature maps to a batch of feature vectors
        x = self.fc(x)
        return x

In [None]:
class Decoder(nn.Module):
    def __init__(self):
        super(Decoder, self).__init__()
        c = capacity
        self.fc = nn.Linear(in_features=latent_dims, out_features=c*2*7*7)
        self.conv2 = nn.ConvTranspose2d(in_channels=c*2, out_channels=c, kernel_size=4, stride=2, padding=1)
        self.conv1 = nn.ConvTranspose2d(in_channels=c, out_channels=1, kernel_size=4, stride=2, padding=1)
            
    def forward(self, x):
        x = self.fc(x)
        x = x.view(x.size(0), capacity*2, 7, 7) # unflatten batch of feature vectors to a batch of multi-channel feature maps
        x = F.relu(self.conv2(x))
        x = torch.tanh(self.conv1(x)) # last layer before output is tanh, since the images are normalized and 0-centered
        return x
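
The `7*7` in the fully connected layers follows from the convolution settings used above (kernel 4, stride 2, padding 1) applied to 28x28 FashionMNIST images. The sketch below redoes that size arithmetic in plain Python; the helper names `conv_out` and `conv_transpose_out` are hypothetical, and the formulas assume dilation 1 and no output padding.

```python
def conv_out(size, kernel, stride, padding):
    """Output size of nn.Conv2d along one spatial dimension (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1

def conv_transpose_out(size, kernel, stride, padding):
    """Output size of nn.ConvTranspose2d along one dimension
    (dilation 1, output_padding 0)."""
    return (size - 1) * stride - 2 * padding + kernel

# Encoder: 28 -> 14 -> 7
s1 = conv_out(28, kernel=4, stride=2, padding=1)
s2 = conv_out(s1, kernel=4, stride=2, padding=1)

# Decoder: 7 -> 14 -> 28
t1 = conv_transpose_out(s2, kernel=4, stride=2, padding=1)
t2 = conv_transpose_out(t1, kernel=4, stride=2, padding=1)

print(s1, s2, t1, t2)  # 14 7 14 28
```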

In [None]:
class Autoencoder(nn.Module):
    def __init__(self):
        super(Autoencoder, self).__init__()
        self.encoder = Encoder()
        self.decoder = Decoder()
    
    def forward(self, x):
        latent = self.encoder(x)
        x_recon = self.decoder(latent)
        return x_recon

    def get_latent_representation(self, x):
        return self.encoder(x)
    
    def get_reconstruction(self, x):
        return self.decoder(x)

In [None]:
# display a single image
def display(img, img_size=28, title=""):
    plt.imshow(img.cpu().detach().numpy().reshape((img_size, img_size)), cmap='gray')
    plt.title(title)
    plt.show()

In [None]:
# display a set of 10 images: original + reconstructed
def display10(original, reconstructed):
  n = 10
  plt.figure(figsize=(30, 6))
  for i in range(n):
    # show the original
    ax = plt.subplot(2, n, i + 1)
    plt.imshow(original.cpu().detach().numpy()[i].squeeze())
    plt.title("original")
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

    # show the reconstruction
    ax = plt.subplot(2, n, i + 1 + n)
    plt.imshow(reconstructed.cpu().detach().numpy()[i].squeeze())
    plt.title("reconstructed")
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
  plt.show()

In [None]:
# display a set of 10 images: original + noisy + reconstructed
def display10_denoising(original, noisy, reconstructed):
  n = 10
  plt.figure(figsize=(30, 6))
  for i in range(n):
    # show the original
    ax = plt.subplot(3, n, i + 1)
    plt.imshow(original.cpu().detach().numpy()[i].squeeze())
    plt.title("original")
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

    # show the noisy version
    ax = plt.subplot(3, n, i + 1 + n)
    plt.imshow(noisy.cpu().detach().numpy()[i].squeeze())
    plt.title("noisy")
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)

    # show the reconstruction
    ax = plt.subplot(3, n, i + 1 + 2*n)
    plt.imshow(reconstructed.cpu().detach().numpy()[i].squeeze())
    plt.title("reconstructed")
    plt.gray()
    ax.get_xaxis().set_visible(False)
    ax.get_yaxis().set_visible(False)
  plt.show()

In [None]:
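def add_noise(original_batch, mean=0., std=0.2):
    # TODO: return original_batch with white Gaussian noise (given mean and std) added
    raise NotImplementedError("add_noise is left as an exercise (TODO item 2)")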

In [None]:
autoencoder = Autoencoder()

device = torch.device("cuda:0" if use_gpu and torch.cuda.is_available() else "cpu")
autoencoder = autoencoder.to(device)

num_params = sum(p.numel() for p in autoencoder.parameters() if p.requires_grad)
print('Number of parameters: %d' % num_params)
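
As a cross-check for the printed value, the parameter count can also be worked out by hand from the layer shapes. The sketch below assumes the default hyperparameters from the cell above (`capacity = 64`, `latent_dims = 10`) and the unmodified convolutional encoder and decoder; the helper names `conv_params` and `linear_params` are hypothetical.

```python
capacity, latent_dims = 64, 10   # values from the hyperparameter cell

def conv_params(c_in, c_out, k):
    """Weights (c_in * c_out * k * k) plus one bias per output channel.
    The count is the same for Conv2d and ConvTranspose2d."""
    return c_in * c_out * k * k + c_out

def linear_params(n_in, n_out):
    """Weight matrix plus one bias per output feature."""
    return n_in * n_out + n_out

c = capacity
flat = c * 2 * 7 * 7   # size of the flattened feature maps

encoder = conv_params(1, c, 4) + conv_params(c, 2 * c, 4) + linear_params(flat, latent_dims)
decoder = linear_params(latent_dims, flat) + conv_params(2 * c, c, 4) + conv_params(c, 1, 4)

print(encoder, decoder, encoder + decoder)  # 195018 201153 396171
```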

In [None]:
optimizer = torch.optim.Adam(params=autoencoder.parameters(), lr=learning_rate, weight_decay=1e-5)
loss = nn.MSELoss()

# switch to training mode
autoencoder.train()

denoising = True

train_loss_avg = []

print('Training ...')
for epoch in range(num_epochs):
    train_loss_avg.append(0)
    num_batches = 0
    
    for image_batch, _ in train_loader:
        if denoising:
            noisy_image_batch = add_noise(image_batch)

        image_batch = image_batch.to(device)
        
        if denoising:
            image_batch_recon = autoencoder(noisy_image_batch.to(device))
        else:
            image_batch_recon = autoencoder(image_batch)
        
        # reconstruction error, measured against the clean images
        reconstruction_loss = loss(image_batch_recon, image_batch)
        
        # backpropagation
        optimizer.zero_grad()
        reconstruction_loss.backward()
        
        # optimization step
        optimizer.step()
        
        train_loss_avg[-1] += reconstruction_loss.item()
        num_batches += 1
        
    train_loss_avg[-1] /= num_batches
    print('Epoch [%d / %d] Average reconstruction error: %f' % (epoch+1, num_epochs, train_loss_avg[-1]))

In [None]:
# switch the autoencoder to evaluation mode
autoencoder.eval()

# take one batch from the test set and run it through the autoencoder
val_batch_original = next(iter(val_loader))[0]
if denoising:
    val_batch_noisy = add_noise(val_batch_original)
    val_batch_noisy = val_batch_noisy.to(device)
    val_batch_original = val_batch_original.to(device)
    val_batch_reconstructed = autoencoder(val_batch_noisy)
else:
    val_batch_original = val_batch_original.to(device)
    val_batch_reconstructed = autoencoder(val_batch_original)

In [None]:
if denoising:
    display10_denoising(val_batch_original, val_batch_noisy, val_batch_reconstructed)
else:
    display10(val_batch_original, val_batch_reconstructed)