# **Generative Adversarial Network; From Data to Dollars: Using GANs to Generate Bags for Your Online Shop**

**Spring 2023,
Course: Imaging for Security Applications**

**J.-L. Dugelay, Sahar Husseini**

**Student(s) name:** Amalie Urdshals


**General instructions**

- Send the complete notebook with your answers (in English) before **April 21st**.

- Remember to indicate your name & surname at the beginning of the notebook.

- The code in the notebook has to run. If it raises errors, the student will be asked to submit a new running version and will be penalized.

**Challenge description**
The goal of this project is to familiarize you with the Generative Adversarial Network (GAN) architecture and to improve the model's architecture so that it generates better images. You will work with the Fashion MNIST dataset and try to produce realistic bags for your online shop.
- In this script you will find all the parts related to the GAN architecture. First run the code with the initial architecture and parameters; your task is then to change the network's layers and hyper-parameters to improve the generator's performance so that it generates more realistic images.

# Installation
You need to install matplotlib, tqdm, torchvision and torch. If you are going to use a GPU, remember to install the GPU build of torch. (Optional: if you are going to use the CPU on Windows, you can use the requirement.txt file and Anaconda for the installation.)

This code is based on PyTorch. If you are not familiar with PyTorch, you can refer to the following links:

https://pytorch.org/tutorials/

https://pytorch.org/docs/stable/nn.html


In [1]:
#Important libraries to be imported
from os import makedirs
import torch
from torch import nn
from tqdm.auto import tqdm
from torchvision import transforms
from torchvision.datasets import MNIST # Training dataset
from torchvision.datasets import FashionMNIST
from torchvision.utils import make_grid
from torch.utils.data import DataLoader
import matplotlib.pyplot as plt
from torchvision.utils import save_image
torch.manual_seed(0) # Set for testing purposes, please do not change!

<torch._C.Generator at 0x7dca7c8bf910>

# Visualization
You will use the save_tensor_images_models function to visualize your generated images and save the corresponding models. You do not need to modify this function.

In [2]:
def save_tensor_images_models(step, image_tensor, model, num_images=25, size=(1, 28, 28)):
    '''
    Function for saving generated images and models: Given a tensor of images, number of images, and
    size per image, save the image generated in specific step with the corresponding model.
    '''
    image_unflat = image_tensor.detach().cpu().view(-1, *size)
    image_grid = make_grid(image_unflat[:num_images], nrow=5)
    save_image(image_grid,'results_baseline/generated_plot_%03d.png' % (step) )
    torch.save(model.state_dict(), 'results_baseline/model_%03d.h5' % (step))

# Generator:
The first step in building the generator is a function that constructs a single layer, or block, of the neural network. The block consists of a linear transformation that maps the input to another shape, a batch normalization step to improve stability, and a Rectified Linear Unit (ReLU) activation that introduces non-linearity. This block may need changes later to improve performance, but at this stage it is recommended to keep its current structure.

In [3]:
def get_generator_block(input_dim, output_dim):
    '''
    Function for returning a block of the generator's neural network
    given input and output dimensions.
    Parameters:
        input_dim: the dimension of the input vector, a scalar
        output_dim: the dimension of the output vector, a scalar
    Returns:
        a generator neural network layer, with two linear transformations
          followed by a batch normalization and then a ReLU activation
    '''
    return nn.Sequential(
        nn.Linear(input_dim, output_dim),
        nn.Linear(output_dim, output_dim),
        nn.BatchNorm1d(output_dim),
        nn.ReLU(inplace=True),
    )
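
As a sanity check, the sketch below pushes a small batch through a block with the Linear, BatchNorm, ReLU structure described in the text. The sizes (10 inputs, 32 outputs, batch of 4) are arbitrary choices for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)

# A single block: linear map, batch normalization, then ReLU.
block = nn.Sequential(
    nn.Linear(10, 32),
    nn.BatchNorm1d(32),
    nn.ReLU(),
)

noise = torch.randn(4, 10)   # batch of 4 vectors with 10 features each
out = block(noise)

print(out.shape)                 # torch.Size([4, 32])
print((out >= 0).all().item())   # True, because ReLU clips negative values to zero
```

Note that `BatchNorm1d` in training mode needs more than one sample per batch, which is why the example uses a batch of 4.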

**The generator class can now be constructed by providing 3 values: (See next code section below)**

- The noise vector dimension (z_dim)
- The image dimension (im_dim)
- The initial hidden dimension (hidden_dim)

Using these values, the generator will build a neural network with 3 layers/blocks. Beginning with the noise vector, the generator will apply non-linear transformations via the block function until the tensor is mapped to the size of the image to be outputted (the same size as the real images from fashion MNIST).

# Your task:
- You will need to change the generator's architecture to improve the quality of the generated images.


In [4]:
class Generator(nn.Module):
    '''
    Generator Class
    Values:
        z_dim: the dimension of the noise vector, a scalar
        im_dim: the dimension of the images, fitted for the dataset used, a scalar
          (fashion MNIST images are 28 x 28 = 784 so that is your default)
        hidden_dim: the inner dimension, a scalar
    '''
    def __init__(self, z_dim=10, im_dim=784, hidden_dim=128):
        super(Generator, self).__init__()
        # Build the neural network

        self.gen = nn.Sequential(
          #get_generator_block(z_dim, hidden_dim),
          #get_generator_block(hidden_dim, hidden_dim * 2),
          #get_generator_block(hidden_dim * 2, hidden_dim * 4),
          #get_generator_block(hidden_dim * 4, hidden_dim * 8),
          #nn.Linear(hidden_dim * 8, im_dim),
          #nn.Sigmoid()

          get_generator_block(z_dim, hidden_dim),
          get_generator_block(hidden_dim, hidden_dim),
          get_generator_block(hidden_dim, hidden_dim * 2),
          get_generator_block(hidden_dim * 2, hidden_dim * 2),
          nn.Linear(hidden_dim * 2, im_dim),
          nn.Sigmoid()
        )


    def forward(self, noise):
        '''
        A forward pass function for the generator that takes a noise tensor as input and generates images as output.
        Parameters:
            noise: a noise tensor with dimensions (n_samples, z_dim)
        '''

        return self.gen(noise)

    # Needed for grading
    def get_gen(self):
        '''
        Returns:
            the sequential model
        '''
        return self.gen
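
The sketch below checks the two properties the generator output should have: one 784-value vector per noise vector, and values in [0, 1] because of the final Sigmoid. `toy_gen` is a deliberately small stand-in, not the Generator class defined above.

```python
import torch
from torch import nn

torch.manual_seed(0)
z_dim, hidden_dim, im_dim = 10, 128, 784

# Hypothetical stand-in generator: one hidden block followed by the output layer.
toy_gen = nn.Sequential(
    nn.Linear(z_dim, hidden_dim),
    nn.BatchNorm1d(hidden_dim),
    nn.ReLU(),
    nn.Linear(hidden_dim, im_dim),
    nn.Sigmoid(),  # squashes outputs to [0, 1], matching ToTensor-scaled images
)

noise = torch.randn(16, z_dim)
fake = toy_gen(noise)
images = fake.view(-1, 1, 28, 28)  # reshape flat vectors back into 28x28 images

print(fake.shape)    # torch.Size([16, 784])
print(images.shape)  # torch.Size([16, 1, 28, 28])
```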

# Noise
To use your generator, you need to be able to create noise vectors. The noise vector z ensures that the images generated from the same class don't all look the same -- think of it as a random seed. You will generate it with PyTorch by sampling random numbers from the normal distribution. Since multiple images are processed per pass, you will generate all the noise vectors at once.
(You do not need to modify the following function.)

In [5]:
def get_noise(n_samples, z_dim, device='cpu'):
    '''
    A function to create noise vectors that takes the dimensions (n_samples, z_dim) as input and generates a tensor of
    the same shape filled with random numbers drawn from a normal distribution.
    Parameters:
        n_samples: the number of samples to generate, a scalar
        z_dim: the dimension of the noise vector, a scalar
        device: the device type
    '''
    return torch.randn(n_samples,z_dim,device=device)
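
To illustrate what `torch.randn` returns, the sketch below draws a large batch of noise vectors and checks that the sample mean and standard deviation are close to 0 and 1. The batch size of 10,000 is chosen only to make the statistics stable.

```python
import torch

torch.manual_seed(0)

# One row per sample, one column per noise dimension.
noise = torch.randn(10000, 10)

print(noise.shape)  # torch.Size([10000, 10])
# Sample statistics should be close to those of a standard normal distribution.
print(round(noise.mean().item(), 2), round(noise.std().item(), 2))
```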


# Discriminator
To construct the second component, the discriminator, you will need to begin by creating a function that builds a neural network block for the discriminator. See below

In [6]:
def get_discriminator_block(input_dim, output_dim):
    '''
    Discriminator Block
    Function for returning a neural network of the discriminator given input and output dimensions.
    Parameters:
        input_dim: the dimension of the input vector, a scalar
        output_dim: the dimension of the output vector, a scalar
    Returns:
        a discriminator neural network layer, with a linear transformation
          followed by an nn.LeakyReLU activation with negative slope of 0.2
    '''
    return nn.Sequential(
         nn.Linear(input_dim, output_dim), #Layer 1
         nn.LeakyReLU(0.2, inplace=True)
    )
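
The difference between LeakyReLU and plain ReLU is easiest to see on a few values: negative inputs are scaled by the negative slope instead of being set to zero. A minimal illustration:

```python
import torch
from torch import nn

act = nn.LeakyReLU(0.2)
x = torch.tensor([-1.0, 0.0, 2.0])
y = act(x)

# Negative inputs are multiplied by 0.2; non-negative inputs pass through unchanged.
print(y.tolist())  # approximately [-0.2, 0.0, 2.0]
```

Because the negative side keeps a small slope, gradients still flow for negative pre-activations, which is commonly cited as a reason to prefer it in discriminators.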

**The discriminator class can now be constructed by providing 2 values: (See next code section below)**


- The image dimension (im_dim)
- The hidden dimension (hidden_dim)

The discriminator's neural network takes the image tensor as input and transforms it until it produces a single numerical output indicating whether the image is genuine or fake.

# Your task:
- You will need to change the discriminator's architecture to help the generator produce better images.

In [7]:
class Discriminator(nn.Module):
    '''
    Discriminator Class
    Values:
        im_dim: the dimension of the images, fitted for the dataset used, a scalar
            (MNIST images are 28x28 = 784 so that is your default)
        hidden_dim: the inner dimension, a scalar
    '''
    def __init__(self, im_dim=784, hidden_dim=128):
        super(Discriminator, self).__init__()
        self.disc = nn.Sequential(
            #get_discriminator_block(im_dim, hidden_dim),
            #get_discriminator_block(hidden_dim, hidden_dim * 2),
            #get_discriminator_block(hidden_dim * 2, hidden_dim * 4),
            #nn.Linear(hidden_dim * 4, 1)

            #get_discriminator_block(im_dim, hidden_dim * 2),
            #get_discriminator_block(hidden_dim * 2, hidden_dim),
            #get_discriminator_block(hidden_dim, hidden_dim),
            #nn.Linear(hidden_dim, 1)

            get_discriminator_block(im_dim, hidden_dim * 2),
            get_discriminator_block(hidden_dim * 2, hidden_dim * 4),
            get_discriminator_block(hidden_dim * 4, hidden_dim * 8),
            nn.Linear(hidden_dim * 8, 1)
            )

    def forward(self, image):
        '''
        Function for completing a forward pass of the discriminator: Given an image tensor,
        returns a 1-dimension tensor representing fake/real.
        Parameters:
            image: a flattened image tensor with dimension (im_dim)
        '''
        return self.disc(image)

    def get_disc(self):
        '''
        Returns:
            the sequential model
        '''
        return self.disc

# Training parameters
Now that you have defined the two network architectures, you can put it all together! First, you will set your parameters:

- criterion: the loss function
- n_epochs: the number of times you iterate through the entire dataset when training
- z_dim: the dimension of the noise vector
- display_step: how often to display/visualize the images
- batch_size: the number of images per forward/backward pass
- lr: the learning rate
- device: the device type, CPU or GPU
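
The discriminator above ends with a plain `nn.Linear` layer, so it outputs raw scores (logits) rather than probabilities. `nn.BCEWithLogitsLoss` applies the sigmoid internally; the sketch below, using made-up logits and targets, shows that it matches `nn.BCELoss` applied to sigmoid outputs.

```python
import torch
from torch import nn

# Made-up discriminator outputs (logits) and ground-truth labels (0 = fake, 1 = real).
logits = torch.tensor([[-2.0], [0.0], [3.0]])
targets = torch.tensor([[0.0], [1.0], [1.0]])

# Loss computed directly from logits.
loss_logits = nn.BCEWithLogitsLoss()(logits, targets)

# Equivalent two-step version: sigmoid first, then plain binary cross-entropy.
loss_probs = nn.BCELoss()(torch.sigmoid(logits), targets)

print(torch.allclose(loss_logits, loss_probs))  # True
```

The fused version is generally preferred because it is more numerically stable for large positive or negative logits.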

# Your task:
- Change the network hyper-parameters to improve the performance of your architecture.



In [8]:
# Set your parameters
criterion = nn.BCEWithLogitsLoss()
# increasing from 10 to 20 or 30 to allow the model more time to learn from the data
n_epochs = 30
# (64) Increase the dimensionality of the noise vector to make it easier for the generator to create diverse images
z_dim = 2
display_step = 94
# 128 Increase the batch size to allow the model to see more examples in each iteration
batch_size = 128
# 0.2 Reduce the learning rate (lr) to allow the model to converge more slowly and potentially reach a better optimum.
lr = 0.001
# cpu
device = 'cpu'
# Load only the bag images (class label 8) from the Fashion MNIST dataset
dataset = FashionMNIST(root=".", transform=transforms.ToTensor(), download=True)
idx = dataset.targets==8
dataset.data = dataset.data[idx]
dataset.targets = dataset.targets[idx]  # keep the labels aligned with the filtered images
dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True)


Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz to ./FashionMNIST/raw/train-images-idx3-ubyte.gz
Extracting ./FashionMNIST/raw/train-images-idx3-ubyte.gz to ./FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz to ./FashionMNIST/raw/train-labels-idx1-ubyte.gz
Extracting ./FashionMNIST/raw/train-labels-idx1-ubyte.gz to ./FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-images-idx3-ubyte.gz to ./FashionMNIST/raw/t10k-images-idx3-ubyte.gz
Extracting ./FashionMNIST/raw/t10k-images-idx3-ubyte.gz to ./FashionMNIST/raw

Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz to ./FashionMNIST/raw/t10k-labels-idx1-ubyte.gz
Extracting ./FashionMNIST/raw/t10k-labels-idx1-ubyte.gz to ./FashionMNIST/raw
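
The class filter relies on boolean-mask indexing, and the resulting dataset size determines how many batches make up one epoch. The sketch below reproduces both steps on toy tensors; the figure of 6,000 bag images assumes the standard Fashion MNIST training split, which contains 6,000 images per class.

```python
import math
import torch

# Toy labels and "images": keep only the entries whose label is 8 (the bag class).
targets = torch.tensor([8, 1, 8, 3, 8])
data = torch.arange(5 * 4).view(5, 2, 2)
mask = targets == 8
bags = data[mask]
print(bags.shape)  # torch.Size([3, 2, 2])

# Batches per epoch for 6,000 bag images with batch_size = 128.
n_bags, batch_size = 6000, 128
batches_per_epoch = math.ceil(n_bags / batch_size)
print(batches_per_epoch)  # 47
```

With 47 batches per epoch, `display_step = 94` corresponds to saving images and a model every two epochs.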



At this point, it's time to set up your generator, discriminator, and optimizers. Keep in mind that each optimizer is designed to optimize the parameters of a specific model, so you'll need to assign each optimizer to the appropriate model. This ensures that each optimizer only optimizes the model it's intended for.

In [9]:
gen = Generator(z_dim).to(device)
gen_opt = torch.optim.Adam(gen.parameters(), lr=lr)
disc = Discriminator().to(device)
disc_opt = torch.optim.Adam(disc.parameters(), lr=lr)
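
The point about each optimizer updating only its own model can be checked on two tiny models. In this sketch `model_a` and `model_b` are hypothetical stand-ins for the generator and discriminator; only `model_a` has an optimizer step applied.

```python
import torch
from torch import nn

torch.manual_seed(0)
model_a = nn.Linear(3, 1)
model_b = nn.Linear(3, 1)

# The optimizer is given only model_a's parameters.
opt_a = torch.optim.Adam(model_a.parameters(), lr=0.1)

before_a = model_a.weight.detach().clone()
before_b = model_b.weight.detach().clone()

# A loss that depends on both models, so both receive gradients.
x = torch.ones(2, 3)
loss = (model_a(x) + model_b(x)).sum()
loss.backward()
opt_a.step()

a_changed = not torch.equal(before_a, model_a.weight)
b_unchanged = torch.equal(before_b, model_b.weight)
print(a_changed, b_unchanged)  # True True
```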

In preparation for GAN training, it's necessary to develop functions that can calculate the losses of both the discriminator and the generator. These loss functions serve as a feedback mechanism for the discriminator and generator to evaluate their performance and improve. However, because the generator is involved in calculating the discriminator's loss, it's important to use the .detach() method to prevent the generator's parameters from being updated during this step.

In [10]:
def get_disc_loss(gen, disc, criterion, real, num_images, z_dim, device):
    '''
    Return the loss of the discriminator given inputs.
    Parameters:
        gen: the generator model, which returns an image given z-dimensional noise
        disc: the discriminator model, which returns a single-dimensional prediction of real/fake
        criterion: the loss function, which should be used to compare
               the discriminator's predictions to the ground truth reality of the images
               (e.g. fake = 0, real = 1)
        real: a batch of real images
        num_images: the number of images the generator should produce,
                which is also the length of the real images
        z_dim: the dimension of the noise vector, a scalar
        device: the device type
    Returns:
        disc_loss: a torch scalar loss value for the current batch
    '''
    #1) Create noise vectors and generate a batch (num_images) of fake images.
    fake_noise = get_noise(num_images, z_dim, device=device)
    #2) Get the discriminator's prediction of the fake image
    #            and calculate the loss. Don't forget to detach the generator!
    #            (Remember the loss function you set earlier -- criterion. You need a
    #            'ground truth' tensor in order to calculate the loss.
    fake = gen(fake_noise)
    disc_fake_pred = disc(fake.detach())   #remember to detach
    disc_fake_loss = criterion(disc_fake_pred, torch.zeros_like(disc_fake_pred))
    # 3) Get the discriminator's prediction of the real image and calculate the loss.
    disc_real_pred = disc(real)
    disc_real_loss = criterion(disc_real_pred, torch.ones_like(disc_real_pred))
    #4) Calculate the discriminator's loss by averaging the real and fake loss
    #            and set it to disc_loss.
    disc_loss = (disc_fake_loss + disc_real_loss) / 2
    return disc_loss
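
The effect of `.detach()` can be verified directly: after backpropagating a discriminator loss computed on detached fakes, the generator has no gradients at all. `toy_gen` and `toy_disc` below are minimal single-layer stand-ins used only for this check.

```python
import torch
from torch import nn

torch.manual_seed(0)
toy_gen = nn.Linear(2, 4)    # stand-in generator
toy_disc = nn.Linear(4, 1)   # stand-in discriminator
criterion = nn.BCEWithLogitsLoss()

fake = toy_gen(torch.randn(8, 2))
pred = toy_disc(fake.detach())                   # cut the graph between the two models
loss = criterion(pred, torch.zeros_like(pred))   # fake images are labelled 0
loss.backward()

print(toy_gen.weight.grad)                # None: no gradient reached the generator
print(toy_disc.weight.grad is not None)   # True: the discriminator did get gradients
```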

The next step is to define the generator loss. Here, the generator wants the discriminator to think that its fake images are real.

In [11]:
def get_gen_loss(gen, disc, criterion, num_images, z_dim, device):
    '''
    Return the loss of the generator given inputs.
    Parameters:
        gen: the generator model, which returns an image given z-dimensional noise
        disc: the discriminator model, which returns a single-dimensional prediction of real/fake
        criterion: the loss function, which should be used to compare
               the discriminator's predictions to the ground truth reality of the images
               (e.g. fake = 0, real = 1)
        num_images: the number of images the generator should produce,
                which is also the length of the real images
        z_dim: the dimension of the noise vector, a scalar
        device: the device type
    Returns:
        gen_loss: a torch scalar loss value for the current batch
    '''
    #1) Create noise vectors and generate a batch of fake images.
    fake_noise = get_noise(num_images, z_dim, device=device)
    fake = gen(fake_noise)
    #2) Get the discriminator's prediction of the fake image.
    disc_fake_pred = disc(fake)
    #3) Calculate the generator's loss. Generator wants
    #          the discriminator to think that its fake images are real
    gen_loss = criterion(disc_fake_pred, torch.ones_like(disc_fake_pred))
    return gen_loss
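
To get a feel for the scale of the generator loss, the sketch below evaluates the criterion with target 1 for a few made-up discriminator logits. A strongly negative logit (the discriminator is confident the image is fake) gives a large loss, while a strongly positive one gives a loss near zero.

```python
import torch
from torch import nn

criterion = nn.BCEWithLogitsLoss()
losses = {}

for logit in (-5.0, 0.0, 5.0):
    # Pretend the discriminator gave the same logit to every fake image in the batch.
    pred = torch.full((4, 1), logit)
    # The generator's target is "real" (1) for its own fake images.
    losses[logit] = criterion(pred, torch.ones_like(pred)).item()
    print(logit, round(losses[logit], 4))
# -5.0 -> 5.0067, 0.0 -> 0.6931, 5.0 -> 0.0067
```

Seen this way, generator losses well above 1 indicate that the discriminator is confidently rejecting the generated images.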

The next function visualizes the loss values for the discriminator and generator. Later you can check these loss values to see if your model is training well. See "results_baseline" folder.

In [12]:
# Create a line plot of the GAN losses and save it to file
def plot_history(d_hist, g_hist):
    # plot loss
    plt.plot(d_hist, label='disc-loss')
    plt.plot(g_hist, label='gen-loss')
    plt.legend()
    # save plot to file
    plt.savefig('results_baseline/plot_line_loss.png')
    plt.close()
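
Per-batch GAN losses tend to be noisy, so a smoothed curve can be easier to read. The helper below is a hypothetical addition (`moving_average` is not part of the notebook); it computes a simple moving average that could be applied to `d_hist` and `g_hist` before plotting.

```python
def moving_average(values, window):
    """Return the simple moving average of `values` over `window` consecutive items."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])
    averaged = [running / window]
    for i in range(window, len(values)):
        # Slide the window: add the newest value and drop the oldest one.
        running += values[i] - values[i - window]
        averaged.append(running / window)
    return averaged


smoothed = moving_average([1.0, 2.0, 3.0, 4.0], 2)
print(smoothed)  # [1.5, 2.5, 3.5]
```

A window equal to the number of batches per epoch would give roughly one smoothed value per epoch.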

Once you have reached this point, you are ready to put all the components together. In each epoch, the entire dataset is processed in batches, and for each batch you update the discriminator and the generator using their respective loss functions. Each loss is computed on the predictions for a whole batch of images; because binary cross-entropy is unbounded above, loss values greater than 1 are normal.

It is common for the discriminator to initially outperform the generator since its task is simpler. However, it is important to prevent either model from achieving near-perfect accuracy, which would halt learning. Achieving a balance between the two models can be challenging in a standard GAN.





In [13]:


# make folder for results
makedirs('results_baseline', exist_ok=True)
cur_step = 0
mean_generator_loss = 0
mean_discriminator_loss = 0
gen_loss = False
error = False
d_hist, g_hist = list(), list()

for epoch in range(n_epochs):
    # Dataloader returns the batches
    for real, _ in tqdm(dataloader):
        cur_batch_size = len(real)
        # Flatten the batch of real images from the dataset
        real = real.view(cur_batch_size, -1).to(device)

        ### Update discriminator ###
        # Zero out the gradients before backpropagation
        disc_opt.zero_grad()

        # Calculate discriminator loss
        disc_loss = get_disc_loss(gen, disc, criterion, real, cur_batch_size, z_dim, device)

        # Update gradients
        disc_loss.backward(retain_graph=True)

        # Update optimizer
        disc_opt.step()

        ### Update generator ###
        #       1) Zero out the gradients.
        #       2) Calculate the generator loss, assigning it to gen_loss.
        #       3) Backprop through the generator: update the gradients and optimizer.

        gen_opt.zero_grad()
        gen_loss = get_gen_loss(gen, disc, criterion, cur_batch_size, z_dim, device)
        gen_loss.backward()
        gen_opt.step()

        # Record the loss on this batch in the history
        d_hist.append(disc_loss.item())
        g_hist.append(gen_loss.item())

        # Keep track of the average discriminator loss
        mean_discriminator_loss += disc_loss.item() / display_step

        # Keep track of the average generator loss
        mean_generator_loss += gen_loss.item() / display_step

        ### Visualization code ###
        if cur_step % display_step == 0 and cur_step > 0:
            print(f"Step {cur_step}: Generator loss: {mean_generator_loss}, discriminator loss: {mean_discriminator_loss}")
            fake_noise = get_noise(cur_batch_size, z_dim, device=device)
            fake = gen(fake_noise)
            save_tensor_images_models(cur_step,fake, model = gen)
            mean_generator_loss = 0
            mean_discriminator_loss = 0
        cur_step += 1
#save loss curves
plot_history(d_hist, g_hist)


Step 94: Generator loss: 5.562290709703524, discriminator loss: 0.3093638439821277
Step 188: Generator loss: 4.904736814346721, discriminator loss: 0.19917314953388685
Step 282: Generator loss: 4.275495246369787, discriminator loss: 0.18065103649736397
Step 376: Generator loss: 7.064081166652925, discriminator loss: 0.11368820717201586
Step 470: Generator loss: 7.941087981487839, discriminator loss: 0.1623119300508753
Step 564: Generator loss: 7.2443303884343875, discriminator loss: 0.15277382566970088
Step 658: Generator loss: 9.279727002407643, discriminator loss: 0.12281514939535018
Step 752: Generator loss: 9.200297913652784, discriminator loss: 0.17846201250250357
Step 846: Generator loss: 7.059273182077609, discriminator loss: 0.0792903078303851
Step 940: Generator loss: 8.160991871610602, discriminator loss: 0.13135730756565614
Step 1034: Generator loss: 4.9165913094865505, discriminator loss: 0.18852223212176802
Step 1128: Generator loss: 6.709297631649259, discriminator loss: 0.21576963965483803
Step 1222: Generator loss: 8.19405849182859, discriminator loss: 0.12499209196663441
Step 1316: Generator loss: 5.79113980176601, discriminator loss: 0.14855283128216543

Finally, with the current architecture, your results should look similar to the images below.

![generated_plot_376.png](attachment:generated_plot_376.png)

# Your task:
- In this step, save your generated images and the loss curves and put them in your report. In the report, explain why the results are not very realistic and what you are going to do to improve them.
- Start tuning your network parameters and changing your network architecture to get more realistic images that you can use in your online shop.
- Explain all the steps you took to improve your generator. In particular, visualize your loss curve and explain at which point you obtain the best result.
- After the deadline, choose your best generated image and upload it to the folder **“synthesized_images”** in Moodle. Remember to rename your image to **"firstName_lastName"**. After the deadline, other students will evaluate your result subjectively.
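
To produce images from a saved checkpoint, the weights can be loaded back into a generator with the same architecture. The sketch below shows the round trip on a small stand-in model; `make_toy_gen` and the temporary file path are hypothetical, and in the notebook you would instead rebuild your own `Generator` and point to a file written by `save_tensor_images_models`.

```python
import os
import tempfile

import torch
from torch import nn


def make_toy_gen():
    # Hypothetical stand-in; the real checkpoint requires the matching Generator class.
    return nn.Sequential(
        nn.Linear(2, 16), nn.BatchNorm1d(16), nn.ReLU(),
        nn.Linear(16, 784), nn.Sigmoid(),
    )


torch.manual_seed(0)
trained = make_toy_gen()
path = os.path.join(tempfile.mkdtemp(), "model_demo.h5")
torch.save(trained.state_dict(), path)   # same save call as in the notebook

restored = make_toy_gen()
restored.load_state_dict(torch.load(path))

# eval() makes BatchNorm use its stored running statistics instead of batch statistics.
trained.eval()
restored.eval()
with torch.no_grad():                     # no gradients are needed for sampling
    noise = torch.randn(25, 2)
    same = torch.equal(trained(noise), restored(noise))
print(same)  # True
```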

