**Aeronautics Institute of Technology – ITA**

**Computer Vision – CM-203**

**Professors:** 

Marcos Ricardo Omena de Albuquerque Maximo

Gabriel Adriano de Melo


**Instructions:**

Before submitting your lab, make sure that everything runs correctly in sequence: first, **restart the kernel** (`Runtime->Restart Runtime` in Colab or `Kernel->Restart` in Jupyter). Then, execute all cells (`Runtime->Run All` in Colab or `Cell->Run All` in Jupyter) and verify that all cells run without any errors, especially the automatic grading ones, i.e. the ones with `assert`s.

**Do not delete the answer cells**, i.e. the ones that contain `WRITE YOUR CODE HERE` or `WRITE YOUR ANSWER HERE`, because they contain metadata with the ids of the cells for the grading system. For the same reason, **do not delete the test cells**, i.e. the ones with `assert`s. The autograding system executes all the code sequentially, adding extra tests in the test cells. There is no problem in creating new cells, as long as you do not delete answer or test cells. Moreover, keep your solutions within the reserved spaces.

The notebooks are implemented to be compatible with Google Colab, and they install the dependencies and download the datasets automatically. The commands which start with ! (exclamation mark) are bash commands and can be executed in a Linux terminal.

---

## Laboratory 9 - Generative Adversarial Network (GAN)

In this laboratory, you will implement the DCGAN (Deep Convolutional Generative Adversarial Network) technique to generate fake images of cats.
This laboratory was based on many tutorials available on the Internet. Therefore, I encourage you to try to implement the functions without looking for DCGAN tutorials. Of course, you can check the libraries' documentation and other websites. Moreover, do not copy code from tutorials. Remember that the intention here is for you to learn.

## Installations and Configurations

The following cells install dependencies and configure the notebook for the implementation of the laboratory.

In [None]:
!pip install numpy matplotlib torch torchvision tqdm opencv-python

## Imports

The following cell imports the needed libraries.

In [None]:
import os
import random
import numpy as np
from torch.utils.data import DataLoader
from torchvision.datasets import ImageFolder
import torchvision.transforms as tt
import torch
import torch.nn as nn
import cv2
from tqdm.notebook import tqdm
import torch.nn.functional as F
from torchvision.utils import save_image
from torchvision.utils import make_grid
import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
# This cell defines a function for resetting the random seeds

def reset_seeds(seed=42):
  # 42 is the answer to the Ultimate Question of Life, the Universe, and Everything
  random.seed(seed)
  np.random.seed(seed)
  torch.manual_seed(seed)

In [None]:
reset_seeds()

## Downloading the Dataset

The following cell downloads a dataset with many pictures of cute cats.

In [None]:
if not os.path.exists('archive.zip'):
    !gdown https://drive.google.com/uc?id=1WrW8nXvYFafz5qNY9Mi8tU32Y-Z2M0Zd
if not os.path.exists('cats'):
    !unzip archive.zip

In [None]:
DATA_DIR = '.'

## Visualizing the Dataset

The following cell shows some pictures of the data. Beware of cuteness!

In [None]:
grid_size = 8
num_imgs = grid_size * grid_size
fig = plt.figure(figsize=(grid_size, grid_size))
for num, fn in enumerate(os.listdir(DATA_DIR + '/cats')[:num_imgs]):
    path = DATA_DIR + '/cats/' + fn
    img = cv2.imread(path)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    plt.subplot(grid_size, grid_size, num + 1)
    plt.axis('off')
    plt.imshow(img)

## Creating the Data Loader

The following cell creates a data loader which resizes and normalizes the images.

In [None]:
image_size = 64
batch_size = 128
normalize_mean = [0.5, 0.5, 0.5]
normalize_std = [0.5, 0.5, 0.5]

train_ds = ImageFolder(DATA_DIR + '/cats', transform=tt.Compose([
    tt.Resize(image_size),
    tt.CenterCrop(image_size),
    tt.ToTensor(),
    tt.Normalize(mean=normalize_mean, std=normalize_std)
]))

train_dl = DataLoader(train_ds, batch_size, shuffle=True, num_workers=2)
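
With a mean and standard deviation of 0.5 per channel, `tt.Normalize` maps pixel values from [0, 1] (the range produced by `tt.ToTensor`) to [-1, 1]. A minimal pure-Python sketch of that arithmetic and of its inverse follows; the helper names `normalize` and `denormalize` are illustrative only and are not part of torchvision.

```python
def normalize(pixel, mean=0.5, std=0.5):
    """Per-channel arithmetic applied by tt.Normalize: (x - mean) / std."""
    return (pixel - mean) / std


def denormalize(value, mean=0.5, std=0.5):
    """Inverse mapping, useful before displaying a normalized image."""
    return value * std + mean


for pixel in (0.0, 0.25, 0.5, 1.0):
    print(pixel, "->", normalize(pixel))
```

The inverse mapping is what a display routine needs in order to bring values back to [0, 1] before plotting.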

In [None]:
# Defining some variables

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
latent_size = 100 # the dimension of the latent space used in the generator
size_multiplier = 64 # multiplier used to define the number of filters at each layer
num_color_channels = 3 # number of color channels of the input image
learning_rate = 0.0002 # learning rate used in the optimization algorithm
beta1 = 0.5 # hyperparameter beta1 of the Adam optimization algorithm
beta2 = 0.999 # hyperparameter beta2 of the Adam optimization algorithm

## Implementing the Generator

In the following cell, implement the generator network.

Instructions:
- The architecture of the network is based on the DCGAN architecture, but is not exactly the same.
- Five convolutional-transpose layers are used to convert the latent space into a 3x64x64 image.
- Each convolutional-transpose layer uses a kernel size of 4.
- The stride of the first convolutional-transpose layer is 1, while the other ones use stride of 2.
- No padding is used in the first convolutional-transpose layer, but the subsequent ones use a padding of 1.
- No bias is used in the convolutional-transpose layers.
- The intermediate layers use the ReLU activation function, while the last one uses Tanh.
- Batch Normalization is used after each convolutional-transpose layer, except for the last one.
- Use `size_multiplier` to help you obtain the correct dimensions at each layer.
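
To check the spatial sizes implied by these instructions, recall that a transposed convolution with no dilation and no output padding (the PyTorch defaults) produces an output of size `(in - 1) * stride - 2 * padding + kernel_size`. The sketch below applies that formula layer by layer; `conv_transpose_out` is a hypothetical helper written only for this illustration.

```python
def conv_transpose_out(size, kernel_size=4, stride=2, padding=1):
    """Output size of ConvTranspose2d with dilation=1 and output_padding=0."""
    return (size - 1) * stride - 2 * padding + kernel_size


size = 1  # the latent vector enters as latent_size x 1 x 1
sizes = [size]

# First layer: stride 1, no padding
size = conv_transpose_out(size, stride=1, padding=0)
sizes.append(size)

# Remaining four layers: stride 2, padding 1
for _ in range(4):
    size = conv_transpose_out(size)
    sizes.append(size)

print(sizes)  # [1, 4, 8, 16, 32, 64]
```

Each stride-2 layer doubles the spatial size, so four of them take the 4x4 feature map to 64x64.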

In [None]:
class Generator(nn.Module):
    """
    Defines the generator network of the DCGAN technique.
    """
    def __init__(self):
        """
        Constructor of the generator.
        """
        super(Generator, self).__init__()
        self.network = nn.Sequential(
            # in: latent_size x 1 x 1
            nn.ConvTranspose2d(latent_size, 8 * size_multiplier, kernel_size=4, stride=1, padding=0, bias=False),
            nn.BatchNorm2d(8 * size_multiplier),
            nn.ReLU(True),
            # tensor: 512 x 4 x 4        
            nn.ConvTranspose2d(8 * size_multiplier, 4 * size_multiplier, kernel_size=4, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(4 * size_multiplier),
            nn.ReLU(True),
            # tensor: 256 x 8 x 8
            # Implement the next layers:
            # tensor: 128 x 16 x 16
            # tensor: 64 x 32 x 32
            # WRITE YOUR CODE HERE! (you can delete this comment, but do not delete this cell so the ID is not lost)
            raise NotImplementedError()
            nn.ConvTranspose2d(size_multiplier, num_color_channels, kernel_size=4, stride=2, padding=1, bias=False),
            nn.Tanh(),
            # out: 3 x 64 x 64
        )

    def forward(self, input):
        return self.network(input)

In [None]:
# Prints the generator network
generator = Generator().to(device)
print(generator)

In [None]:
def count_parameters(model):
    """
    Auxiliary function to count the number of trainable parameters of a PyTorch model.
    """
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

In [None]:
reset_seeds()

generator = Generator().to(device)

assert count_parameters(generator) == 3576704

latent = torch.randn(batch_size, latent_size, 1, 1, device=device)

output = generator.forward(latent)

assert output.shape[0] == batch_size
assert output.shape[1] == num_color_channels
assert output.shape[2] == image_size
assert output.shape[3] == image_size
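
The expected parameter count can be reproduced by hand, which may help when debugging a failing `assert`. The sketch below assumes the channel widths given in the comments of the generator cell (512, 256, 128, 64, then 3), bias-free 4x4 kernels, and two trainable parameters (weight and bias) per Batch Normalization channel.

```python
latent_size = 100
kernel_area = 4 * 4
channels = [latent_size, 512, 256, 128, 64, 3]

# Weights of the five bias-free transposed convolutions
conv_params = sum(c_in * c_out * kernel_area
                  for c_in, c_out in zip(channels[:-1], channels[1:]))

# Batch Normalization follows every layer except the last one
bn_params = sum(2 * c for c in channels[1:-1])

print(conv_params + bn_params)  # 3576704
```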


## Implementing the Discriminator

In the following cell, implement the discriminator.

Instructions:
- The architecture of the network is based on the DCGAN architecture, but is not exactly the same.
- Five convolutional layers are used to transform the input image into a scalar which encodes the probability of the image being a real one.
- Each convolutional layer uses a kernel size of 4.
- The stride of the last convolutional layer is 1, while the other ones use stride of 2.
- No padding is used in the last convolutional layer, but the previous ones use a padding of 1.
- No bias is used in the convolutional layers.
- The intermediate layers use the LeakyReLU activation function with a negative slope of 0.2, while the last one uses Sigmoid.
- Batch Normalization is used after each convolutional layer, except for the last one.
- Use `size_multiplier` to help you obtain the correct dimensions at each layer.
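
The spatial sizes in the discriminator follow the usual convolution formula `floor((in + 2 * padding - kernel_size) / stride) + 1` (valid for the default dilation of 1). The sketch below applies it to the five layers described above; `conv_out` is a hypothetical helper written only for this illustration.

```python
def conv_out(size, kernel_size=4, stride=2, padding=1):
    """Output size of Conv2d with dilation=1."""
    return (size + 2 * padding - kernel_size) // stride + 1


size = 64  # input images are 3 x 64 x 64
sizes = [size]

# First four layers: stride 2, padding 1
for _ in range(4):
    size = conv_out(size)
    sizes.append(size)

# Last layer: stride 1, no padding
size = conv_out(size, stride=1, padding=0)
sizes.append(size)

print(sizes)  # [64, 32, 16, 8, 4, 1]
```

Each stride-2 layer halves the spatial size, and the final 4x4 kernel collapses the 4x4 feature map into a single value per image.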

In [None]:
class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()
        self.network = nn.Sequential(
            # in: 3 x 64 x 64
            nn.Conv2d(num_color_channels, size_multiplier, kernel_size=4, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(size_multiplier),
            nn.LeakyReLU(0.2, inplace=True),
            # tensor: 64 x 32 x 32
            # Implement the next layers:
            # tensor: 128 x 16 x 16
            # tensor: 256 x 8 x 8
            # tensor: 512 x 4 x 4
            # WRITE YOUR CODE HERE! (you can delete this comment, but do not delete this cell so the ID is not lost)
            raise NotImplementedError()
            nn.Conv2d(8 * size_multiplier, 1, kernel_size=4, stride=1, padding=0, bias=False),
            # out: 1 x 1 x 1
            nn.Flatten(),
            nn.Sigmoid(),
        )

    def forward(self, input):
        return self.network(input)

In [None]:
# Prints the discriminator network

discriminator = Discriminator().to(device)
print(discriminator)

In [None]:
reset_seeds()

discriminator = Discriminator().to(device)

assert count_parameters(discriminator) == 2765696

random_img = torch.randn(batch_size, num_color_channels, image_size, image_size, device=device)

output = discriminator.forward(random_img)

assert output.shape[0] == batch_size
assert output.shape[1] == 1


## Initializing the Weights

In DCGAN, the weights are initialized in a particular way. The weights of the convolutional and convolutional-transpose layers are initialized with a Normal distribution with zero mean and standard deviation of 0.02. Notice that the convolutional and convolutional-transpose layers do not have biases in this case.
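
To get a feel for the scale of this initialization, the following sketch draws samples from a Normal distribution with zero mean and standard deviation of 0.02 using only the standard library. It illustrates the distribution only; it does not use the PyTorch initializers and is not the code expected in the answer cell.

```python
import random
import statistics

random.seed(42)
samples = [random.gauss(0.0, 0.02) for _ in range(10_000)]

print(f"mean = {statistics.mean(samples):+.5f}")
print(f"std  = {statistics.pstdev(samples):.5f}")

# About 95% of the samples fall within two standard deviations (|w| < 0.04)
inside = sum(abs(w) < 0.04 for w in samples) / len(samples)
print(f"fraction with |w| < 0.04: {inside:.3f}")
```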

In [None]:
def weights_init(m):
    """
    Initializes the weights of a given layer following the DCGAN convention.
    """
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        # WRITE YOUR CODE HERE! (you can delete this comment, but do not delete this cell so the ID is not lost)
        raise NotImplementedError()
    elif classname.find('BatchNorm') != -1:
        nn.init.normal_(m.weight.data, 1.0, 0.02)
        nn.init.constant_(m.bias.data, 0)

In [None]:
reset_seeds()

discriminator = Discriminator().to(device)
generator = Generator().to(device)

discriminator.apply(weights_init)
generator.apply(weights_init)

# abs() makes the tolerance check two-sided
assert abs(torch.sum(discriminator.network[0].weight.data) - (-1.0316)) < 1e-3
assert abs(torch.sum(discriminator.network[1].weight.data) - 64.0033) < 1e-3

assert abs(torch.sum(generator.network[0].weight.data) - 5.0504) < 1e-3
assert abs(torch.sum(generator.network[1].weight.data) - 512.3093) < 1e-3


In [None]:
sample_dir = 'generated'
os.makedirs(sample_dir, exist_ok=True)

def save_samples(index, latent_tensors, show=True):
    """
    Auxiliary function to save the generated samples during training.
    """
    fake_images = generator(latent_tensors)
    fake_fname = 'generated_images_{0:0=4d}.png'.format(index)
    save_image(fake_images, os.path.join(sample_dir, fake_fname), nrow=8, padding=2, normalize=True)
    print('Saving ', fake_fname)
    if show:
        fig, ax = plt.subplots(figsize=(8,8))
        ax.set_xticks([])
        ax.set_yticks([])
        # normalize=True rescales the Tanh output from [-1, 1] to [0, 1] for display
        ax.imshow(make_grid(fake_images.cpu().detach(), nrow=8, normalize=True).permute(1, 2, 0))

## Implementing the Discriminator's Training

To train the discriminator, we use the following loss function:
\begin{equation}
J^{(D)} = -\mathbb{E}_{x \sim p_{\mathrm{data}}} \log D(x) - \mathbb{E}_z \log \left( 1 - D(G(z)) \right),
\end{equation}
where the first and second terms are the losses related to the real and fake (i.e. generated by the generator) images, respectively. Using this loss function, implement an iteration of the discriminator's training in the following cell.

Hints:
- Use `F.binary_cross_entropy(predictions, targets)` for computing the binary cross-entropy between `predictions` and `targets`.
- To learn how to generate fake images, have a look at the test of the generator network.
- Use the code related to how the real loss is computed as a template.
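
The connection between the loss above and binary cross-entropy can be checked numerically. For a prediction `p` and a target `t`, the binary cross-entropy is `-(t log p + (1 - t) log(1 - p))`, so a target of 1 leaves only `-log p` and a target of 0 leaves only `-log(1 - p)`. The sketch below uses a hypothetical pure-Python helper `bce` (averaging over the batch, as `F.binary_cross_entropy` does by default) and made-up discriminator outputs.

```python
import math


def bce(predictions, targets):
    """Mean binary cross-entropy between probabilities and 0/1 targets."""
    total = 0.0
    for p, t in zip(predictions, targets):
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(predictions)


# Made-up discriminator outputs, for illustration only
d_real = [0.9, 0.8, 0.7]  # D(x) on real images
d_fake = [0.2, 0.4, 0.1]  # D(G(z)) on generated images

real_loss = bce(d_real, [1.0] * len(d_real))  # estimates -E[log D(x)]
fake_loss = bce(d_fake, [0.0] * len(d_fake))  # estimates -E[log(1 - D(G(z)))]

# The same quantity written directly from the loss definition
direct = (-sum(math.log(p) for p in d_real) / len(d_real)
          - sum(math.log(1 - p) for p in d_fake) / len(d_fake))

print(real_loss + fake_loss, direct)
```

The two printed numbers agree up to floating-point rounding, which is why the expectations in the loss can be estimated with two calls to the binary cross-entropy.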

In [None]:
def training_step_discriminator(discriminator, generator, real_images, opt_d):
    """
    Executes an iteration of the discriminator's training.
    param discriminator: the discriminator's model.
    param generator: the generator's model.
    param real_images: real images from the dataset.
    param opt_d: the optimizer used to execute a step of Gradient Descent.
    return: three values are returned: the total loss, the score from the 
            real images, and the score from the fake images.
    """
    opt_d.zero_grad()
    
    real_preds = discriminator(real_images)
    real_targets = torch.ones(real_images.size(0), 1, device=device)
    real_loss = F.binary_cross_entropy(real_preds, real_targets)
    real_score = torch.mean(real_preds).item()
    
    # Create fake images and their targets
    # Compute the fake loss using binary cross-entropy
    # Use how the real loss is computed as a template
    # WRITE YOUR CODE HERE! (you can delete this comment, but do not delete this cell so the ID is not lost)
    raise NotImplementedError()
    
    loss = real_loss + fake_loss
    loss.backward()
    opt_d.step()
    
    return loss.item(), real_score, fake_score

In [None]:
reset_seeds()

generator = Generator().to(device)
discriminator = Discriminator().to(device)

discriminator.apply(weights_init)
generator.apply(weights_init)

opt_d = torch.optim.Adam(discriminator.parameters(), lr=learning_rate, betas=(beta1, beta2))

latent = torch.randn(batch_size, latent_size, 1, 1, device=device)

real_images = generator(latent) # Using them as real images, but they are fake
loss_d, real_score, fake_score = training_step_discriminator(discriminator, generator, real_images.to(device), opt_d)

# abs() makes the tolerance check two-sided
assert abs(loss_d - 1.8112266063690186) < 1e-3
assert abs(real_score - 0.6484211087226868) < 1e-3
assert abs(fake_score - 0.6575933694839478) < 1e-3


## Implementing the Generator's Training

To train the generator, we use the following loss function:
\begin{equation}
J^{(G)} = -\mathbb{E}_z \log \left( D(G(z)) \right).
\end{equation}
Using this loss function, implement an iteration of the generator's training in the following cell.

Hints:
- Use `F.binary_cross_entropy(predictions, targets)` for computing the binary cross-entropy between `predictions` and `targets`.
- To learn how to generate fake images, have a look at the test of the generator network.
- Use the code related to how the real loss is computed in the discriminator's training step as a template.
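
Numerically, this loss is the binary cross-entropy between the discriminator's predictions on generated images and a target of 1, so it shrinks as the generator fools the discriminator more often. A minimal sketch with made-up predictions follows; `generator_loss` is a hypothetical helper and not part of the lab code.

```python
import math


def generator_loss(fake_preds):
    """Batch estimate of -E[log D(G(z))]."""
    return -sum(math.log(p) for p in fake_preds) / len(fake_preds)


# Made-up values of D(G(z)), for illustration only
for fake_preds in ([0.1, 0.2], [0.5, 0.5], [0.9, 0.95]):
    print(fake_preds, round(generator_loss(fake_preds), 4))
```

When the discriminator outputs 0.5 for every generated image, the loss equals `log 2`, roughly 0.693.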

In [None]:
def training_step_generator(discriminator, generator, opt_g):
    """
    Executes an iteration of the generator's training.
    param discriminator: the discriminator's model.
    param generator: the generator's model.
    param opt_g: the optimizer used to execute a step of Gradient Descent.
    return: the loss value.
    """
    opt_g.zero_grad()

    # WRITE YOUR CODE HERE! (you can delete this comment, but do not delete this cell so the ID is not lost)
    raise NotImplementedError()
    
    loss.backward()
    opt_g.step()

    return loss.item()

In [None]:
reset_seeds()

generator = Generator().to(device)
discriminator = Discriminator().to(device)

discriminator.apply(weights_init)
generator.apply(weights_init)

opt_g = torch.optim.Adam(generator.parameters(), lr=learning_rate, betas=(beta1, beta2))

latent = torch.randn(batch_size, latent_size, 1, 1, device=device)

real_images = generator(latent) # Using them as real images, but they are fake
loss_g = training_step_generator(discriminator, generator, opt_g)

# abs() makes the tolerance check two-sided
assert abs(loss_g - 0.49504929780960083) < 1e-3


In [None]:
# Generating a fixed latent vector so we can compare results from different epochs
reset_seeds()

fixed_latent = torch.randn(batch_size, latent_size, 1, 1, device=device)

## Implementing the Training

As mentioned in class, the training steps of the discriminator and the generator are interleaved. In the following function, implement the training step considering that we first execute a training step of the discriminator, then a training step of the generator. **Hint:** take a look at the tests of the training step functions.

In [None]:
def fit_step(discriminator, generator, real_images, opt_d, opt_g):
    """
    Executes a step of the DCGAN training.
    :param discriminator: the discriminator's model.
    :param generator: the generator's model.
    :param real_images: the real images.
    :param opt_d: the discriminator's optimizer.
    :param opt_g: the generator's optimizer.
    :return: the discriminator's loss, the generator's loss, the score related to the real images, and
             the score related to the fake images.
    """
    # WRITE YOUR CODE HERE! (you can delete this comment, but do not delete this cell so the ID is not lost)
    raise NotImplementedError()
    return loss_d, loss_g, real_score, fake_score

In [None]:
reset_seeds()

generator = Generator().to(device)
discriminator = Discriminator().to(device)

discriminator.apply(weights_init)
generator.apply(weights_init)

real_images, _ = next(iter(train_dl))
# Move the batch to the same device as the models before the step
fit_step(discriminator, generator, real_images.to(device), opt_d, opt_g)

assert (torch.sum(discriminator.network[0].weight.data) - (-1.0316)) < 1e-3
assert (torch.sum(discriminator.network[1].weight.data) - 64.0033) < 1e-3

assert (torch.sum(generator.network[0].weight.data) - 5.0504) < 1e-3
assert (torch.sum(generator.network[1].weight.data) - 512.3093) < 1e-3


In [None]:
def fit(num_epochs, learning_rate, start_idx=1, show=False, autograding=False):
    """
    Trains the DCGAN.
    param num_epochs: number of epochs used in the training.
    param learning_rate: learning rate used for the optimizers.
    param start_idx: start index of the epochs (used if training is resumed).
    param show: if the training results should be shown during training.
    """
    torch.cuda.empty_cache()

    # Create the optimizers
    opt_d = torch.optim.Adam(discriminator.parameters(), lr=learning_rate, betas=(beta1, beta2))
    opt_g = torch.optim.Adam(generator.parameters(), lr=learning_rate, betas=(beta1, beta2))

    for epoch in range(num_epochs):
        for real_images, _ in tqdm(train_dl):
            # Executes a step of the training
            loss_d, loss_g, real_score, fake_score = fit_step(discriminator, generator, real_images.to(device), opt_d, opt_g)
            
        # Log losses and scores from the last batch
        print('Epoch [{}/{}], loss_g: {:.4f}, loss_d: {:.4f}, real_score: {:.4f}, fake_score: {:.4f}'.format(
            epoch + 1, num_epochs, loss_g, loss_d, real_score, fake_score
        ))

        save_samples(epoch + start_idx, fixed_latent, show)

## Training More

Training for a single epoch gives results that resemble faint images of cats. For better quality, we need to train for longer. The following cell trains the GAN for 10 epochs.

You definitely need a GPU for executing this training. This training is optional, but I highly recommend you execute it to see the results. You can get much better quality with even more training.

To see the results, look for a folder called `generated` in your computer or in Google Colab.

Even with 10 epochs, the cats will still look like cute furry demons :). You can also see the evolution across the epochs.

In [None]:
reset_seeds()

generator = Generator().to(device)
discriminator = Discriminator().to(device)

discriminator.apply(weights_init)
generator.apply(weights_init)

train_dl = DataLoader(train_ds, batch_size, shuffle=True, num_workers=2)

num_epochs = 10
fit(num_epochs, learning_rate)

# Your data and feedback:

Write feedback for the lab so we can make it better for the next years.

In the following variables, write the number of hours spent on this lab, the perceived difficulty, and the expected grade (you may delete the `raise` and the comments):

In [None]:
# meta_eval manual_graded_answer 0

horas_gastas = None    # 1.5   - Float number with the number of hours spent 
dificuldade_lab = None # 0     - Float number from 0.0 to 10.0 (inclusive)
nota_esperada = None   # 10    - Float number from 0.0 to 10.0 (inclusive)

# WRITE YOUR CODE HERE! (you can delete this comment, but do not delete this cell so the ID is not lost)
raise NotImplementedError()

Write below any other comments or feedback about the lab. If you did not understand anything about the lab, please also comment here.

If you find any typo or bug in the lab, please comment below so we can fix it.

WRITE YOUR SOLUTION HERE! (do not change this first line):

**ATTENTION**

**DISCURSIVE QUESTION**

WRITE YOUR ANSWER HERE (do not delete this cell so the ID is not lost)

**End of the lab!**