<a href="https://colab.research.google.com/github/FFFreitas/2nd-Workshop-on-Compact-Objects-Gravitational-Waves-and-Deep-Learning/blob/main/hands-on/Hands%20on%202.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

# Building Your First GAN with PyTorch

## Deep Convolutional GANs

DCGAN (Deep Convolutional Generative Adversarial Network) is one of the early well-performing and stable approaches to generating images with adversarial training. Even when we train a GAN only on 1D data, we have to use multiple techniques to keep training stable. A lot can go wrong when training GANs. For example, if either the generator or the discriminator fails to converge, the other network may overfit. Sometimes the generator produces only a handful of sample varieties; this is called mode collapse.

To ensure stable training of GANs on image data, a DCGAN uses three techniques:

- Getting rid of fully connected layers and only using convolution layers 
- Using strided convolution layers to perform downsampling, instead of using pooling layers
- Using ReLU/leakyReLU activation functions instead of Tanh between hidden layers

In this lecture, we will introduce the architectures of the generator and discriminator of the DCGAN and learn how to generate images with it. We'll use a spectrogram dataset I generated for this lecture.

# The architecture of the generator

The generator network of a DCGAN contains 4 hidden layers (we treat the input layer as the 1st hidden layer for simplicity) and 1 output layer. The hidden layers are transposed convolution layers, each followed by a batch normalization layer and a ReLU activation function. The output layer is also a transposed convolution layer, with Tanh as its activation function. The architecture of the generator is shown in the following diagram:

![](../figs/DCGAN_gen.png)

The $2^{nd}$, $3^{rd}$, and $4^{th}$ hidden layers and the output layer have a stride value of 2. The 1st layer has a padding value of 0 and the other layers have a padding value of 1. As the image (feature map) size doubles in each deeper layer, the number of channels is halved. This is a common convention in the architecture design of neural networks. All kernel sizes of the transposed convolution layers are set to 4 x 4. The output channel can be either 1 or 3, depending on whether you want to generate grayscale images or color images.
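The sizes quoted above can be checked with the standard output-size formula for a transposed convolution without dilation or output padding, `out = (in - 1) * stride - 2 * padding + kernel`. The following is a minimal sketch; the helper name `tconv_out` is ours, and it assumes that the latent vector enters the network as a 1 x 1 feature map and that the 1st layer uses a stride of 1.

```python
def tconv_out(size, kernel=4, stride=2, padding=1):
    """Spatial output size of a transposed convolution (no dilation, no output padding)."""
    return (size - 1) * stride - 2 * padding + kernel


# Assumption: the latent vector is fed in as a 1 x 1 feature map.
sizes = [1]
# 1st layer: padding 0 (stride 1 is assumed here).
sizes.append(tconv_out(sizes[-1], stride=1, padding=0))
# 2nd to 4th hidden layers and the output layer: stride 2, padding 1.
for _ in range(4):
    sizes.append(tconv_out(sizes[-1]))

print(sizes)  # [1, 4, 8, 16, 32, 64]
```

Under these assumptions, each stride-2 layer exactly doubles the feature map, ending at 64 x 64.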

## A note on transposed convolution
The transposed convolution layer can be thought of as running a normal convolution in reverse: it maps a small feature map to a larger one. Some have called it a deconvolution layer, which is misleading because the transposed convolution is not the inverse of convolution. Most convolution layers are not invertible, because they are ill-conditioned (have extremely large condition numbers) from the linear algebra perspective, which makes their pseudoinverse matrices unfit for representing the inverse process.
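A small experiment can make this concrete. The sketch below is illustrative only: the single-channel layers and the 8 x 8 input are arbitrary choices of ours. It applies a strided convolution followed by the transposed convolution that shares the same kernel. The spatial shape comes back, but the original values do not.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A strided convolution and a transposed convolution with matching hyperparameters.
conv = nn.Conv2d(1, 1, kernel_size=4, stride=2, padding=1, bias=False)
tconv = nn.ConvTranspose2d(1, 1, kernel_size=4, stride=2, padding=1, bias=False)

with torch.no_grad():
    # Share the kernel so that tconv is the transpose of conv.
    tconv.weight.copy_(conv.weight)
    x = torch.randn(1, 1, 8, 8)
    y = conv(x)        # 8 x 8 -> 4 x 4
    x_back = tconv(y)  # 4 x 4 -> 8 x 8

shape_restored = x_back.shape == x.shape
values_restored = torch.allclose(x_back, x, atol=1e-4)
print(shape_restored, values_restored)
```

The convolution compresses 64 input values into 16, so no layer applied afterwards can recover the input in general; the transposed convolution only restores the shape.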

# The architecture of a discriminator

The discriminator network of a DCGAN consists of 4 hidden layers (again, we treat the input layer as the 1st hidden layer) and 1 output layer. All layers are convolution layers, and each hidden layer except the first is followed by a batch normalization layer. LeakyReLU activation functions are used in the hidden layers and Sigmoid is used for the output layer. The architecture of the discriminator is shown in the following diagram:

![](../figs/DCGAN_disc.png)

The input channel can be either 1 or 3, depending on whether you are dealing with grayscale images or color images. All hidden layers have a stride value of 2 and a padding value of 1, so each output feature map is half the size of its input. As the image size halves in each deeper layer, the number of channels doubles. All kernels in the convolution layers have a size of 4 x 4. The output layer has a stride value of 1 and a padding value of 0. It maps 4 x 4 feature maps to single values so that the Sigmoid function can transform each value into a prediction confidence.
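The same kind of check works for the discriminator, using the standard convolution output-size formula `out = floor((in + 2 * padding - kernel) / stride) + 1`. This is a minimal sketch; the helper name `conv_out` is ours, and 64 x 64 input images are assumed.

```python
def conv_out(size, kernel=4, stride=2, padding=1):
    """Spatial output size of a convolution (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


# Assumption: the discriminator receives 64 x 64 images.
sizes = [64]
# Four hidden layers: stride 2, padding 1.
for _ in range(4):
    sizes.append(conv_out(sizes[-1]))
# Output layer: stride 1, padding 0, which maps 4 x 4 to a single value.
sizes.append(conv_out(sizes[-1], stride=1, padding=0))

print(sizes)  # [64, 32, 16, 8, 4, 1]
```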

# Creating a DCGAN with PyTorch

In [None]:
import os
import sys

import numpy as np
import torch
import torch.nn as nn
import torch.nn.parallel
import torch.backends.cudnn as cudnn
import torch.optim as optim
import torch.utils.data
import torchvision.datasets as dset
import torchvision.transforms as transforms
import torchvision.utils as vutils

from pathlib import Path

In [None]:
CUDA = True
DATA_PATH = 'datasets/pure/'
OUT_PATH = 'ouputs_2/'
log_file = os.path.join(OUT_PATH, 'log.txt')
BATCH_SIZE = 128
IMAGE_CHANNEL = 3
Z_DIM = 100
G_HIDDEN = 64
X_DIM = 64
D_HIDDEN = 64
EPOCH_NUM = 100
REAL_LABEL = 1
FAKE_LABEL = 0
lr = 2e-4
seed = 42

If you don't have a CUDA-enabled graphics card and want to train the networks on the CPU, you can change CUDA to False. DATA_PATH points to the root directory of our image dataset. BATCH_SIZE has a major impact on how much GPU memory your code will consume. If you are not sure what batch size is appropriate for your system, you can start at a small value, train your model for 1 epoch, and keep doubling the batch size until out-of-memory errors appear; then fall back to the last value that worked.

In [None]:
print(f"Logging to {log_file}")
CUDA = CUDA and torch.cuda.is_available()
print(f"PyTorch version: {torch.__version__}")
if CUDA:
    print(f"CUDA version: {torch.version.cuda}\n")
if seed is None:
    seed = np.random.randint(1,10000)
print(f"random seed: {seed}")
np.random.seed(seed)
torch.manual_seed(seed)
if CUDA:
    torch.cuda.manual_seed(seed)
cudnn.benchmark = True
device = torch.device("cuda:0" if CUDA else "cpu")

# Generator network
Now, let's define the generator network with PyTorch:
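The cell that defined the network is not reproduced here, so the following is a sketch reconstructed from the architecture description above, not necessarily the exact code used in the lecture. It assumes the latent vector is reshaped to `Z_DIM x 1 x 1`, that the 1st layer uses a stride of 1, and that bias terms are dropped in the transposed convolution layers (a common choice when batch normalization follows).

```python
import torch
import torch.nn as nn

# Values copied from the configuration cell.
Z_DIM, G_HIDDEN, IMAGE_CHANNEL = 100, 64, 3


class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.main = nn.Sequential(
            # 1st layer: Z_DIM x 1 x 1 -> (G_HIDDEN*8) x 4 x 4
            nn.ConvTranspose2d(Z_DIM, G_HIDDEN * 8, 4, 1, 0, bias=False),
            nn.BatchNorm2d(G_HIDDEN * 8),
            nn.ReLU(True),
            # 2nd layer: -> (G_HIDDEN*4) x 8 x 8
            nn.ConvTranspose2d(G_HIDDEN * 8, G_HIDDEN * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(G_HIDDEN * 4),
            nn.ReLU(True),
            # 3rd layer: -> (G_HIDDEN*2) x 16 x 16
            nn.ConvTranspose2d(G_HIDDEN * 4, G_HIDDEN * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(G_HIDDEN * 2),
            nn.ReLU(True),
            # 4th layer: -> G_HIDDEN x 32 x 32
            nn.ConvTranspose2d(G_HIDDEN * 2, G_HIDDEN, 4, 2, 1, bias=False),
            nn.BatchNorm2d(G_HIDDEN),
            nn.ReLU(True),
            # Output layer: -> IMAGE_CHANNEL x 64 x 64, values in [-1, 1]
            nn.ConvTranspose2d(G_HIDDEN, IMAGE_CHANNEL, 4, 2, 1, bias=False),
            nn.Tanh(),
        )

    def forward(self, z):
        return self.main(z)


# Quick shape check with a batch of two random latent vectors.
netG = Generator()
fake = netG(torch.randn(2, Z_DIM, 1, 1))
print(fake.shape)  # torch.Size([2, 3, 64, 64])
```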

Note that the output layer does not have a batch normalization layer connected to it.

Let's create a helper function to initialize the network parameters:

In [None]:
def weight_init(m):
    classname = m.__class__.__name__
    if classname.find('Conv') != -1:
        m.weight.data.normal_(0., 0.02)
    elif classname.find('BatchNorm') != -1:
        m.weight.data.normal_(1., 0.02)
        m.bias.data.fill_(0)

# Discriminator network

Now, let's define the discriminator network:
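As with the generator, the defining cell is not reproduced here, so this is a sketch reconstructed from the architecture description and may differ from the code used in the lecture. The LeakyReLU negative slope of 0.2 and the absence of bias terms are our assumptions; the text does not specify them.

```python
import torch
import torch.nn as nn

# Values copied from the configuration cell.
IMAGE_CHANNEL, D_HIDDEN = 3, 64


class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        self.main = nn.Sequential(
            # 1st layer (no batch norm): IMAGE_CHANNEL x 64 x 64 -> D_HIDDEN x 32 x 32
            nn.Conv2d(IMAGE_CHANNEL, D_HIDDEN, 4, 2, 1, bias=False),
            nn.LeakyReLU(0.2, inplace=True),
            # 2nd layer: -> (D_HIDDEN*2) x 16 x 16
            nn.Conv2d(D_HIDDEN, D_HIDDEN * 2, 4, 2, 1, bias=False),
            nn.BatchNorm2d(D_HIDDEN * 2),
            nn.LeakyReLU(0.2, inplace=True),
            # 3rd layer: -> (D_HIDDEN*4) x 8 x 8
            nn.Conv2d(D_HIDDEN * 2, D_HIDDEN * 4, 4, 2, 1, bias=False),
            nn.BatchNorm2d(D_HIDDEN * 4),
            nn.LeakyReLU(0.2, inplace=True),
            # 4th layer: -> (D_HIDDEN*8) x 4 x 4
            nn.Conv2d(D_HIDDEN * 4, D_HIDDEN * 8, 4, 2, 1, bias=False),
            nn.BatchNorm2d(D_HIDDEN * 8),
            nn.LeakyReLU(0.2, inplace=True),
            # Output layer: -> 1 x 1 x 1, squashed to (0, 1) by the Sigmoid
            nn.Conv2d(D_HIDDEN * 8, 1, 4, 1, 0, bias=False),
            nn.Sigmoid(),
        )

    def forward(self, x):
        # Flatten to one confidence value per image.
        return self.main(x).view(-1)


# Quick shape check with a batch of two random "images".
netD = Discriminator()
score = netD(torch.randn(2, IMAGE_CHANNEL, 64, 64))
print(score.shape)  # torch.Size([2])
```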

Note that the input layer does not have a batch normalization layer connected to it. This is because applying batch normalization to all layers can lead to sample oscillation and model instability, as pointed out in the original paper.

Similarly, we can create a Discriminator object as follows:

# Model training and evaluation

We will use Adam as the optimizer for both the generator and discriminator networks. Let's first define the loss function for the discriminator network and the optimizers for both networks:
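One possible setup is sketched below. Binary cross-entropy matches the Sigmoid output of the discriminator, and the learning rate is the `lr` value from the configuration cell. The helper name `make_training_objects` and the Adam setting `beta1 = 0.5` (the value suggested in the DCGAN paper) are our assumptions.

```python
import torch.nn as nn
import torch.optim as optim


def make_training_objects(netG, netD, lr=2e-4, beta1=0.5):
    """Return the loss function and one Adam optimizer per network."""
    # Binary cross-entropy between predicted confidence and real/fake labels.
    criterion = nn.BCELoss()
    optimizerD = optim.Adam(netD.parameters(), lr=lr, betas=(beta1, 0.999))
    optimizerG = optim.Adam(netG.parameters(), lr=lr, betas=(beta1, 0.999))
    return criterion, optimizerD, optimizerG


# Tiny stand-in modules are used here only to keep the example self-contained;
# in the notebook you would pass the real generator and discriminator.
criterion, optimizerD, optimizerG = make_training_objects(
    nn.Linear(4, 4), nn.Linear(4, 1)
)
```

Keeping two separate optimizers lets each network be updated on its own, which the alternating training scheme requires.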

# The dataset

In [None]:
from torch.utils.data import Dataset, DataLoader, random_split
from PIL import Image

Our dataset consists of spectrograms of different waveforms generated with PyCBC:

In [None]:
image_ex = Image.open('datasets/pure/0_17.4_29.0_1092_2.4749_2.9193_-0.7559_3.9973.png')

In [None]:
image_ex

In [None]:
image_ex.size

## Let's create the dataset using the *Dataset* class

In [None]:
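class DataGravGan(Dataset):
    """Dataset of spectrogram PNG images stored in a single directory."""

    def __init__(self, path_to_data, transform=None):
        self.path_to_data = Path(path_to_data)
        self.transform = transform
        # Sort so that the index -> file mapping is reproducible across runs.
        self.list_of_images = sorted(self.path_to_data.glob('*.png'))

    def __getitem__(self, i):
        # Force 3 channels to match IMAGE_CHANNEL; a no-op for images already in RGB.
        img = Image.open(self.list_of_images[i]).convert('RGB')
        if self.transform is not None:
            img = self.transform(img)
        else:
            # Fall back to a plain tensor conversion when no transform is given.
            img = transforms.ToTensor()(img)
        return img

    def __len__(self):
        return len(self.list_of_images)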

In [None]:
tfms = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.ToTensor(),
])

In [None]:
GW_dataset = DataGravGan('datasets/pure/', transform=tfms)

In [None]:
GW_dataset[1]

In [None]:
GW_dataset[1].shape

In [None]:
dataloader = DataLoader(GW_dataset, batch_size=BATCH_SIZE, shuffle=True, num_workers=8, pin_memory=True)

# Training iteration

1. Train the discriminator on real data so that it recognizes it as real.
2. Train the discriminator on fake (generated) data so that it recognizes it as fake.
3. Train the generator so that the discriminator recognizes its fake data as real.

The first two steps let the discriminator learn how to tell the difference between real data and fake data. The third step teaches the generator how to confuse the discriminator with generated samples:
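A single iteration of these three steps could look like the sketch below. It is not the lecture's exact training loop: the function name `train_step` and its argument list are ours, and tiny stand-in networks are used for the smoke test so that the example runs on its own.

```python
import torch
import torch.nn as nn
import torch.optim as optim

REAL_LABEL, FAKE_LABEL = 1.0, 0.0


def train_step(netG, netD, real, criterion, optimizerG, optimizerD, z_dim, device):
    """Run one discriminator update and one generator update on a batch."""
    batch = real.size(0)
    real_target = torch.full((batch,), REAL_LABEL, device=device)
    fake_target = torch.full((batch,), FAKE_LABEL, device=device)

    # Step 1: discriminator on real data, target "real".
    netD.zero_grad()
    loss_real = criterion(netD(real).view(-1), real_target)
    loss_real.backward()

    # Step 2: discriminator on fake data, target "fake".
    # detach() stops gradients from flowing into the generator here.
    z = torch.randn(batch, z_dim, 1, 1, device=device)
    fake = netG(z)
    loss_fake = criterion(netD(fake.detach()).view(-1), fake_target)
    loss_fake.backward()
    optimizerD.step()

    # Step 3: generator update, asking the discriminator to say "real".
    netG.zero_grad()
    loss_G = criterion(netD(fake).view(-1), real_target)
    loss_G.backward()
    optimizerG.step()

    return (loss_real + loss_fake).item(), loss_G.item()


# Smoke test with tiny stand-in networks working on 4 x 4 images.
device = torch.device("cpu")
netG = nn.Sequential(nn.ConvTranspose2d(8, 3, 4, 1, 0), nn.Tanh())
netD = nn.Sequential(nn.Conv2d(3, 1, 4, 1, 0), nn.Sigmoid())
criterion = nn.BCELoss()
optimizerG = optim.Adam(netG.parameters(), lr=2e-4, betas=(0.5, 0.999))
optimizerD = optim.Adam(netD.parameters(), lr=2e-4, betas=(0.5, 0.999))
d_loss, g_loss = train_step(
    netG, netD, torch.rand(5, 3, 4, 4), criterion, optimizerG, optimizerD, 8, device
)
print(f"loss_D: {d_loss:.4f}  loss_G: {g_loss:.4f}")
```

In a full run, this function would be called once per batch from the data loader, for each of the `EPOCH_NUM` epochs.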