# Deep Convolutional GANs

In this notebook, you'll build a GAN using convolutional layers in the generator and discriminator. This is called a Deep Convolutional GAN, or DCGAN for short. The DCGAN architecture was first described in a paper posted to arXiv in November 2015 and has produced impressive results in generating new images; you can read the [original paper here](https://arxiv.org/pdf/1511.06434.pdf).

You'll be training DCGAN on the [CIFAR10](https://www.cs.toronto.edu/~kriz/cifar.html) dataset. These are color images of different classes, such as airplanes, dogs or trucks. This dataset is much more complex and diverse than the MNIST dataset and justifies the use of the DCGAN architecture.

<img src='assets/cifar10_data.png' width=80% />


So, our goal is to create a DCGAN that can generate new, realistic-looking images. We'll go through the following steps to do this:
* Load in and pre-process the CIFAR10 dataset
* **Define discriminator and generator networks**
* Train these adversarial networks
* Visualize the loss over time and some sample, generated images

In this notebook, we will focus on defining the networks.

#### Deeper Convolutional Networks

Since this dataset is more complex than our MNIST data, we'll need a deeper network to accurately identify patterns in these images and be able to generate new ones. Specifically, we'll use a series of convolutional or transpose convolutional layers in the discriminator and generator. It's also necessary to use batch normalization to get these convolutional networks to train. 

Besides these changes in network structure, training the discriminator and generator networks should be the same as before. That is, the discriminator will alternate training on real and fake (generated) images, and the generator will aim to trick the discriminator into thinking that its generated images are real!
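
As a reminder of what that alternating objective looks like, here is a minimal sketch of the two losses. The function names `discriminator_loss` and `generator_loss` are hypothetical helpers, not part of this notebook, and the logits are hand-made stand-ins for real discriminator outputs.

```python
import torch
import torch.nn as nn

# Assumption: the discriminator returns raw logits, so the sigmoid lives in the loss.
criterion = nn.BCEWithLogitsLoss()

def discriminator_loss(real_logits: torch.Tensor, fake_logits: torch.Tensor) -> torch.Tensor:
    """Real images are labelled 1 and generated images are labelled 0."""
    real_loss = criterion(real_logits, torch.ones_like(real_logits))
    fake_loss = criterion(fake_logits, torch.zeros_like(fake_logits))
    return real_loss + fake_loss

def generator_loss(fake_logits: torch.Tensor) -> torch.Tensor:
    """The generator wants its fakes to be classified as real (label 1)."""
    return criterion(fake_logits, torch.ones_like(fake_logits))

# Stand-in logits for a discriminator that is confidently right on both batches.
real_logits = torch.full((4, 1), 5.0)
fake_logits = torch.full((4, 1), -5.0)

d_loss = discriminator_loss(real_logits, fake_logits)
g_loss = generator_loss(fake_logits)
print(f"d_loss={d_loss.item():.4f}, g_loss={g_loss.item():.4f}")
```

When the discriminator is confidently correct, its own loss is small while the generator's loss is large, which is the tension the two networks train against.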

## Discriminator

Here you'll build the discriminator. This is a convolutional classifier like you've built before, only without any maxpooling layers. 
* The inputs to the discriminator are 32x32x3 tensor images
* You'll want a few convolutional, hidden layers
* Then a fully connected layer for the output; as before, we want a sigmoid output, but we'll add that in the loss function, [BCEWithLogitsLoss](https://pytorch.org/docs/stable/nn.html#bcewithlogitsloss), later
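
To see why the sigmoid can be left out of the network, the short check below compares `nn.BCEWithLogitsLoss` on raw logits with `nn.BCELoss` on sigmoid outputs. The logits and targets are random placeholders, used only for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Placeholder logits and binary targets for a batch of 8 samples.
logits = torch.randn(8, 1)
targets = torch.randint(0, 2, (8, 1)).float()

# Sigmoid folded into the loss (what the discriminator relies on).
loss_from_logits = nn.BCEWithLogitsLoss()(logits, targets)

# Explicit sigmoid followed by plain binary cross-entropy.
loss_from_probs = nn.BCELoss()(torch.sigmoid(logits), targets)

print(torch.allclose(loss_from_logits, loss_from_probs, atol=1e-6))
```

The two values agree up to floating-point error; the combined version is generally preferred because it is more numerically stable for large-magnitude logits.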

<img src='assets/conv_discriminator.png' width=80%/>

For the depths of the convolutional layers, I suggest starting with 32 filters in the first layer, then doubling that depth as you add layers (to 64, 128, etc.). **Note that in the DCGAN paper, they did all the downsampling using only strided convolutional layers with no maxpooling layers.**

You'll also want to use batch normalization with [nn.BatchNorm2d](https://pytorch.org/docs/stable/nn.html#batchnorm2d) on each layer **except** the first convolutional layer and final, linear output layer. 

#### Helper `ConvBlock` module 

In general, each layer should look something like convolution > batch norm > leaky ReLU, and so we'll define a **custom torch Module** to put these layers together. This module will create a sequential series of a convolutional layer, an optional batch norm layer, and a leaky ReLU activation.
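
One possible shape for such a module is sketched below. The constructor signature, the default values, and the choice to enable the convolution bias only when batch norm is disabled are assumptions made for illustration, not a reference solution.

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """Conv2d -> optional BatchNorm2d -> LeakyReLU (illustrative sketch)."""

    def __init__(self, in_channels: int, out_channels: int, kernel_size: int = 4,
                 stride: int = 2, padding: int = 1, batch_norm: bool = True):
        super().__init__()
        # Assumption: the conv bias is dropped when batch norm follows,
        # since batch norm has its own learnable shift.
        layers = [nn.Conv2d(in_channels, out_channels, kernel_size,
                            stride=stride, padding=padding, bias=not batch_norm)]
        if batch_norm:
            layers.append(nn.BatchNorm2d(out_channels))
        layers.append(nn.LeakyReLU(0.2))
        self.block = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.block(x)

# First discriminator layer: no batch norm, 32x32 input halved to 16x16.
block = ConvBlock(3, 32, batch_norm=False)
out = block(torch.randn(2, 3, 32, 32))
print(out.shape)  # torch.Size([2, 32, 16, 16])
```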

Note: It is also suggested that you use a **kernel_size of 4** and a **stride of 2** for strided convolutions.
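
To check how these settings shrink the image, the sketch below applies the standard convolution output-size formula, `floor((size + 2 * padding - kernel_size) / stride) + 1`, assuming no dilation. The helper name `conv_output_size` is hypothetical.

```python
def conv_output_size(size: int, kernel_size: int = 4, stride: int = 2, padding: int = 1) -> int:
    """Spatial size after one Conv2d layer (dilation assumed to be 1)."""
    return (size + 2 * padding - kernel_size) // stride + 1

# Three layers with padding 1 halve the size each time;
# a last layer with padding 0 collapses the 4x4 map to 1x1.
sizes = [32]
for padding in (1, 1, 1, 0):
    sizes.append(conv_output_size(sizes[-1], padding=padding))

print(sizes)  # [32, 16, 8, 4, 1]
```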

### First exercise

Implement the `ConvBlock` module below and use it for your implementation of the `Discriminator` module. Your discriminator should take a 32x32x3 image as input and output a single logit.

In [1]:
import torch
import torch.nn as nn
import torch.fx

import tests
import fuser
%load_ext autoreload
%autoreload 2

In [2]:
class Discriminator(nn.Module):
    """
    The discriminator model adapted from the DCGAN paper. It should only contain a few layers.
    args:
    - conv_dim: control the number of filters
    """
    def __init__(self, conv_dim: int):
        super().__init__()
        self.conv_dim = conv_dim
        ####
        # IMPLEMENT HERE
        ####
        self.conv1 = self._block(in_channels=3, out_channels=conv_dim, kernel_size=4, bias=True, batch_norm = False) # 32x32 -> 16x16
        self.conv2 = self._block(in_channels=conv_dim, out_channels=conv_dim*2, kernel_size=4, bias=False) # 16x16 -> 8x8
        self.conv3 = self._block(in_channels=conv_dim*2, out_channels=conv_dim*4, kernel_size=4, bias=False) # 8x8 -> 4x4
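        # note: batch_norm defaults to True in _block, so BatchNorm2d and LeakyReLU are also applied to this single-channel output
        self.conv4 = self._block(in_channels=conv_dim*4, out_channels=1, kernel_size=4, padding=0, bias=False) # 4x4 -> 1x1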
        
        # following the DCGAN paper, no fully connected layer is used: conv4 already reduces the feature map to 1x1
        self.flatten = nn.Flatten()
        # self.fc1 = nn.Linear((4*4)*(conv_dim*4), 1)
    
    def _block(self,
               in_channels: int,
               out_channels: int,
               kernel_size: int,
               stride: int = 2,
               padding: int = 1,
               bias: bool = False,
               batch_norm: bool = True):
        """
        A convolutional block is made of up to 3 layers: Conv -> (optional) BatchNorm -> LeakyReLU.
        args:
        - in_channels: number of channels in the input to the conv layer
        - out_channels: number of filters in the conv layer
        - kernel_size: filter dimension of the conv layer
        - stride: stride of the conv layer
        - padding: padding of the conv layer
        - bias: whether the conv layer has a bias term
        - batch_norm: whether to use batch norm or not
        """
        if batch_norm:
            return nn.Sequential(
                nn.Conv2d(in_channels, out_channels, kernel_size, stride=stride, padding=padding, bias=bias),
                nn.BatchNorm2d(out_channels),
                nn.LeakyReLU(0.2)
            )
        else:
            return nn.Sequential(
                # in the DCGAN paper, they say not to use batchnorm on the first layer of the discriminator and the last layer of the generator
                # however, in the discriminator, the bias will be canceled out in the following layers, I think, so in the end we may not have any effective bias
                nn.Conv2d(in_channels, out_channels, kernel_size, stride=stride, padding=padding, bias=bias),
                nn.LeakyReLU(0.2)
            )
    def forward(self, x):
        ####
        # IMPLEMENT HERE
        ####      
        x = self.conv1(x)
        # print(f"shape at conv1: {x.shape}")
        x = self.conv2(x)
        # print(f"shape at conv2: {x.shape}")
        x = self.conv3(x)
        # print(f"shape at conv3: {x.shape}")
        x = self.conv4(x)
        # print(f"shape at conv4: {x.shape}")
        x = self.flatten(x)
        return x

In [6]:
discriminator = Discriminator(128)
print(discriminator)


Discriminator(
  (conv1): Sequential(
    (0): Conv2d(3, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
    (1): LeakyReLU(negative_slope=0.2)
  )
  (conv2): Sequential(
    (0): Conv2d(128, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): LeakyReLU(negative_slope=0.2)
  )
  (conv3): Sequential(
    (0): Conv2d(256, 512, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): LeakyReLU(negative_slope=0.2)
  )
  (conv4): Sequential(
    (0): Conv2d(512, 1, kernel_size=(4, 4), stride=(2, 2), bias=False)
    (1): BatchNorm2d(1, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): LeakyReLU(negative_slope=0.2)
  )
  (flatten): Flatten(start_dim=1, end_dim=-1)
)


In [7]:
from torch.ao.quantization.quantize_fx import fuse_fx
discriminator.eval()
fuse_fx(discriminator)

GraphModule(
  (conv1): Module(
    (0): Conv2d(3, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
    (1): LeakyReLU(negative_slope=0.2)
  )
  (conv2): Module(
    (0): Conv2d(128, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
    (2): LeakyReLU(negative_slope=0.2)
  )
  (conv3): Module(
    (0): Conv2d(256, 512, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1))
    (2): LeakyReLU(negative_slope=0.2)
  )
  (conv4): Module(
    (0): Conv2d(512, 1, kernel_size=(4, 4), stride=(2, 2))
    (2): LeakyReLU(negative_slope=0.2)
  )
  (flatten): Flatten(start_dim=1, end_dim=-1)
)

In [87]:
image_res = 32
# input_dim: int, padding: int, kernel: int, stride: int, layers: int
tests.image_size_conv_output(image_res, [1,1,1,0], 4, 2, 4)

layer_1: 16
layer_2: 8
layer_3: 4
layer_4: 1


In [88]:
tests.check_discriminator(discriminator, image_res=32)

shape at conv1: torch.Size([16, 128, 16, 16])
shape at conv2: torch.Size([16, 256, 8, 8])
shape at conv3: torch.Size([16, 512, 4, 4])
shape at conv4: torch.Size([16, 1, 1, 1])
Congrats, you successfully implemented your discriminator


## Generator

Next, you'll build the generator network. The input will be our noise vector `z`, as before, and the output will pass through a `tanh` activation, this time with size 32x32x3 to match our CIFAR10 images.

<img src='assets/conv_generator.png' width=80% />

What's new here is we'll use transpose convolutional layers to create our new images. 
* The first layer is a fully connected layer which is reshaped into a deep and narrow layer, something like 4x4x512. 
* Then, we use batch normalization and a leaky ReLU activation. 
* Next is a series of [transpose convolutional layers](https://pytorch.org/docs/stable/nn.html#convtranspose2d), where you typically halve the depth and double the width and height of the previous layer. 
* And, we'll apply batch normalization and ReLU to all but the last of these hidden layers, where we will just apply a `tanh` activation.

#### Helper `DeconvBlock` module

For each of these layers, the general scheme is transpose convolution > batch norm > ReLU, and so we'll define a function to put these layers together. This function will create a sequential series of a transpose convolutional layer, an optional batch norm layer, and an activation. We'll create these using PyTorch's `Sequential` container, which takes in a series of layers and applies them in the order that they are passed in to the `Sequential` constructor.
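
A minimal sketch of such a block, written as a module, might look like the following. The class signature, the `activation` argument, and the bias handling are assumptions chosen for illustration rather than a required design.

```python
from typing import Optional

import torch
import torch.nn as nn

class DeconvBlock(nn.Module):
    """ConvTranspose2d -> optional BatchNorm2d -> activation (illustrative sketch)."""

    def __init__(self, in_channels: int, out_channels: int, kernel_size: int = 4,
                 stride: int = 2, padding: int = 1, batch_norm: bool = True,
                 activation: Optional[nn.Module] = None):
        super().__init__()
        # Assumption: the bias is only kept when no batch norm follows.
        layers = [nn.ConvTranspose2d(in_channels, out_channels, kernel_size,
                                     stride=stride, padding=padding, bias=not batch_norm)]
        if batch_norm:
            layers.append(nn.BatchNorm2d(out_channels))
        # ReLU for hidden layers by default; pass nn.Tanh() for the output layer.
        layers.append(activation if activation is not None else nn.ReLU())
        self.block = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.block(x)

# Latent vectors shaped (batch, latent_dim, 1, 1) are expanded to 4x4 maps,
# then an output-style block doubles the size and squashes values with tanh.
z = torch.randn(2, 128, 1, 1)
hidden = DeconvBlock(128, 512, padding=0)(z)
image = DeconvBlock(512, 3, batch_norm=False, activation=nn.Tanh())(hidden)
print(hidden.shape, image.shape)
```

Passing the activation in as an argument keeps one block usable for both the hidden layers and the final `tanh` layer.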

Note: It is also suggested that you use a **kernel_size of 4** and a **stride of 2** for transpose convolutions.
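
The effect of these settings on the image size can be checked with the transpose convolution output-size formula, `(size - 1) * stride - 2 * padding + kernel_size`, assuming `output_padding=0` and no dilation. The helper name `tconv_output_size` is hypothetical.

```python
def tconv_output_size(size: int, kernel_size: int = 4, stride: int = 2, padding: int = 1) -> int:
    """Spatial size after one ConvTranspose2d layer (output_padding=0, dilation=1 assumed)."""
    return (size - 1) * stride - 2 * padding + kernel_size

# A first layer with padding 0 expands the 1x1 latent input to 4x4;
# the following layers with padding 1 double the size each time.
sizes = [1]
for padding in (0, 1, 1, 1):
    sizes.append(tconv_output_size(sizes[-1], padding=padding))

print(sizes)  # [1, 4, 8, 16, 32]
```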

### Second exercise

Implement the `DeconvBlock` module below and use it for your implementation of the `Generator` module. Your generator should take a latent vector of dimension 128 as input and output a 32x32x3 image.

In [196]:
class Generator(nn.Module):
    """
    The generator model adapted from DCGAN
    args:
    - latent_dim: dimension of the latent vector (the input has shape latent_dim x 1 x 1)
    - conv_dim: control the number of filters in the convtranspose layers
    """
    def __init__(self, latent_dim: int, conv_dim: int = 32):
        super().__init__()
        ####
        # IMPLEMENT HERE
        ####
        self.conv_dim = conv_dim
        # self.fc1 = nn.Linear(latent_dim, conv_dim*4*4, bias=False)
        # reshape to (batch_size, conv_dim, 4, 4)
        self.tconv1 = self._block(in_channels=latent_dim, out_channels=conv_dim*16, kernel_size=4, stride=2, padding=0, bias=False)
        self.tconv2 = self._block(in_channels=conv_dim*16, out_channels=conv_dim*8, kernel_size=4, stride=2, padding=1, bias=False)
        self.tconv3 = self._block(in_channels=conv_dim*8, out_channels=conv_dim*4, kernel_size=4, stride=2, padding=1, bias=False)
        self.tconv4 = self._block(in_channels=conv_dim*4, out_channels=3, kernel_size=4, stride=2, padding=1, bias=True, batch_norm=False)
    
    
    
    def _block(self,
               in_channels: int,
               out_channels: int,
               kernel_size: int,
               stride: int,
               padding: int,
               bias: bool = False,
               batch_norm: bool = True):
        """
        A "de-convolutional" block is made of 3 layers: ConvTranspose -> BatchNorm -> Activation.
        args:
        - in_channels: number of channels in the input to the conv layer
        - out_channels: number of filters in the conv layer
        - kernel_size: filter dimension of the conv layer
        - stride: stride of the conv layer
        - padding: padding of the conv layer
        - batch_norm: whether to use batch norm or not
        """
        if batch_norm:
            return nn.Sequential(
                nn.ConvTranspose2d(in_channels, out_channels, kernel_size, stride=stride, padding=padding, bias=bias),
                nn.BatchNorm2d(out_channels),
                nn.ReLU(),
            )
        else:
            return nn.Sequential(
                nn.ConvTranspose2d(in_channels, out_channels, kernel_size, stride=stride, padding=padding, bias=bias),
                nn.Tanh(),
            )
            
    def forward(self, x):
        ####
        # IMPLEMENT HERE
        ####
        # x = self.fc1(x)
        # x = x.view(-1, self.conv_dim, 4, 4)
        print(f"input {x.shape}")
        x = self.tconv1(x)
        print(f"shape at conv1: {x.shape}")
        x = self.tconv2(x)
        print(f"shape at conv2: {x.shape}")
        x = self.tconv3(x)
        print(f"shape at conv3: {x.shape}")
        x = self.tconv4(x)
        print(f"shape at conv4: {x.shape}")
        
        return x

In [197]:
generator = Generator(128)
print(generator)

Generator(
  (tconv1): Sequential(
    (0): ConvTranspose2d(128, 512, kernel_size=(4, 4), stride=(2, 2), bias=False)
    (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
  )
  (tconv2): Sequential(
    (0): ConvTranspose2d(512, 256, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
  )
  (tconv3): Sequential(
    (0): ConvTranspose2d(256, 128, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
  )
  (tconv4): Sequential(
    (0): ConvTranspose2d(128, 3, kernel_size=(4, 4), stride=(2, 2), padding=(1, 1), bias=False)
    (1): Tanh()
  )
)


In [198]:
image_res = 1
# input_dim: int, padding: int, kernel: int, stride: int, layers: int
tests.image_size_trans_conv_output(image_res, padding=[1,1,1,1], kernel = 2, stride=2, layers = 4)

layer_1: 0
layer_2: -2
layer_3: -6
layer_4: -14


In [202]:

# Padding calculator
output_dim = 4  # Adjust the size as needed
input_dim = 1
stride=2
kernel_size=4

# Calculate the required padding to match the output size by solving
# output_dim = (input_dim - 1) * stride - 2 * padding + kernel_size for padding
padding = ((input_dim - 1) * stride - output_dim + kernel_size) // 2
print(padding)

0


In [203]:
import torch
import torch.nn as nn

torch.manual_seed(1)

input = torch.ones(1, 1, 1, 1)
upsample = nn.ConvTranspose2d(1, 1, kernel_size=4, stride=2, padding=0)
output = upsample(input)
print(f"Output size: {output.size()}")


Output size: torch.Size([1, 1, 4, 4])


In [201]:
tests.check_generator(model=generator, latent_dim=128)

input torch.Size([16, 128, 1, 1])
shape at conv1: torch.Size([16, 512, 4, 4])
shape at conv2: torch.Size([16, 256, 8, 8])
shape at conv3: torch.Size([16, 128, 16, 16])
shape at conv4: torch.Size([16, 3, 32, 32])
Congrats, you successfully implemented your discriminator
