# Laymanz Notebooks: Generative Adversarial Networks
Author: Ambrose Ling

**What is this notebook about?**

In this notebook, we will go over some of the most fundamental ideas behind General Adversarial Networks, how they work and why they have been a major advancement in the field of computer vision and generative artifical intelligence. We hope that you can walk away capable of building your own GAN framework along with training your own model from scratch and understanding some of the core ideas that are trending in this field of research.

**What do I need to set up my environment?**

All of our notebooks will only use numpy, pytorch, matplotlib for visualizations. We will not use any other third-party libraries for model development, optimization or anything like that.

**How is this notebook structured?**
1.
2.
3.


**Covered papers in this notebook**

(will do after finishing)

# What is a Generative Adversarial Network? (https://github.com/soumith/ganhacks?tab=readme-ov-file#authors)

A Generative Adversarial Network (GAN), is a generative model. It aims to learn the distribution of data through an adversarial process. Meaning that the model is in adversary with another (in competition with another).

### How does it work?

In the GAN framework, you have 2 components:

1. The Generator
- Its goal is to generate realistic/synthetic images similar to the ones in our dataset. This model is trying to fit our real data distribution.
We represent this generator as $G(z,\theta)$, the generator also defines a mapping from input latent noise to data space.
- It recevies noise as input and tries to output a result close to data
- **Intuition**: Think of the generator as the counterfeits, they are trying to generate fake money (as realistic as possible) to fool the police (the discriminator).

2. The Discriminator
- Its goal is to determine if its input comes from the training dataset or from the generator. 
- More specifically it determines whether a sample comes from the data distribution or the generator distribution
- **Intuition**: Think of the discriminator as the police, they are trying to determine if the money they see is fake or real.

Some math notation:
- $p_{data}(x)$: data distribution
- $p_{g}(x)$: generator distribution
- $D$: discriminator
- $G$: generator
- $z$: latent noise variable


### How do we train a GAN ?

The training objective:
$$
min_G max_D V(D,G) = E_{x \sim p_{data}(x)}[log(D(x))] + E_{z \sim p_z(z)}[log(1 - D(G(z)))]
$$

**What is this telling us?**
- We are training the **discriminator D** to maximize the following expression (maximize the probability that D assigns the correct label to the sample)
- We are training the **generator G** to minimize the expression (minimize the probability that D assigns the correct label to the sample, the generator wants to fool the discriminiator D)


**NOTE**:
- Conv tranpoes: $o = (i-1) \times s + k - 2p$

**Some stuff about WGANs**:
https://arxiv.org/pdf/1701.07875

### How do we measure the similarity / difference between 2 probability distributions?

### KL divergence
KL divergence measures how one probability distribution $p$ diverges from a second expected probability distribution.

$$
D_{KL} = \int_x p(x) log(\frac{p(x)}{q(x)}) dx
$$

**Some nice properties of the KL divergence**:
- KL divergence abhors regions where $q(x)$ has non-null mass and $p(x)$ has null mass. This is useful when you are trying to approximate a complex (intractable) distribution $q(x)$ with a tractable distribution $p(x)$.
- KL divergence is always non-negative $D_{KL}(P||Q) = 0$ iff $p(x) == q(x)$

**Challenges with using the KL divergence**:
- Dependence on support: (support is the set of points where probabilty is nonzero or $P > 0$, or the subset of the domain where elements are **not** mapped to zero.). In order to have a defined KL divergence, $support(P) \subset support(Q)$ and it means that $D_{KL}(P||Q)$ is finite.
- Asymetry: $D(P||Q) \neq D(Q||P)$, these are different operations. If $q(x) >>> 0$ and $p(x) ~ 0$, $q$ has a very small effect on the divergence., wont be a good measure when you have 2 equally important distributions

**NOTE:**
KL Divergence is not a metric proper.
https://stats.stackexchange.com/questions/111445/analysis-of-kullback-leibler-divergence 

### Jensen Shannon Divergence
Jensen Shannon Divergence als measures how one probability distribution $p$ diverges from a second expected probability distribution.

In [23]:
# KL divergence
import torch
import torch.nn.functional as F
import random
import numpy as np
from PIL import Image
p = torch.randn(1,100)
q = torch.randn(1,100)

# Softmax
# exp_sum = p.exp().sum()
# p = p.exp() / exp_sum

# We get a probabiltiy distribution
p = F.log_softmax(p)
q = F.log_softmax(q)

kl_div = p.exp() * (p - q)

#Reduce it to a loss value for backprop
loss = kl_div.sum() / p.shape[0]

print(f"KL Div Loss: {loss}")


KL Div Loss: 0.9990202188491821


  p = F.log_softmax(p)
  q = F.log_softmax(q)


### Creating our dataset

In [9]:
import torch
from torch.utils.data import Dataset, DataLoader
import torchvision
import torchvision.transforms as v2
from torchvision.io import read_image
root = "/Users/ambroseling/Desktop/carly-dataset"
def Preprocess(tensor: torch.Tensor):
    tensor = v2.Resize(64)(tensor)
    tensor = tensor.float()/127.5 - 1.0
    tensor = tensor[:3]
    return tensor
transforms = v2.Compose([Preprocess])
carly_dataset = torchvision.datasets.DatasetFolder(root,loader = read_image,transform= transforms,extensions=['png'])
dataloader = DataLoader(carly_dataset,batch_size=2)
print(carly_dataset[10][0].shape)
print(carly_dataset[0][0])

  from .autonotebook import tqdm as notebook_tqdm


torch.Size([3, 64, 64])
tensor([[[-0.6784, -0.7098, -0.6000,  ..., -0.4275, -0.4745, -0.3412],
         [-0.6706, -0.5686, -0.2784,  ..., -0.7412, -0.3255, -0.6235],
         [-0.6863, -0.2314, -0.5922,  ..., -0.2314, -0.4196, -0.4824],
         ...,
         [ 0.0196, -0.3961,  0.6157,  ..., -0.4118, -0.4039, -0.4118],
         [ 0.5529,  0.0510,  0.3647,  ..., -0.4118, -0.3725, -0.4039],
         [ 0.6784,  0.7725,  0.2078,  ...,  0.6392,  0.0275, -0.4196]],

        [[-0.6471, -0.6941, -0.5529,  ..., -0.3412, -0.3882, -0.2627],
         [-0.6392, -0.5294, -0.2471,  ..., -0.6706, -0.2471, -0.5529],
         [-0.6549, -0.2235, -0.5451,  ..., -0.1373, -0.3333, -0.4118],
         ...,
         [-0.1294, -0.6706,  0.1922,  ..., -0.3490, -0.3333, -0.3412],
         [-0.2392,  0.0196, -0.0510,  ..., -0.3569, -0.3020, -0.3333],
         [ 0.3255,  0.5294, -0.4275,  ...,  0.5137, -0.0431, -0.3490]],

        [[-0.7569, -0.7961, -0.6863,  ..., -0.6078, -0.6471, -0.5373],
         [-0.8039, -0



### Defining our generator


In [10]:
import torch.nn as nn

class DCGAN(nn.Module):
    '''
    Input: (100,)
    Output: (64,64) 
    '''
    def __init__(self):
        super().__init__()
        self.leaky_relu = nn.LeakyReLU()
        self.conv_up = nn.ConvTranspose2d(100,1024,4)
        self.conv1_up = nn.ConvTranspose2d(1024,512,4,stride=2,padding=1) # (4-1)*2 +4 - 2 = 8
        self.batch_norm1 = nn.BatchNorm2d(512)
        self.conv2_up = nn.ConvTranspose2d(512,256,4,stride=2,padding=1) # (8-1)*2 + 4 - 2= 16
        self.batch_norm2 = nn.BatchNorm2d(256)
        self.conv3_up = nn.ConvTranspose2d(256,128,4,stride=2,padding=1) # (16 - 1)*2 +4 -2 = 32
        self.batch_norm3 = nn.BatchNorm2d(128)
        self.conv4_up = nn.ConvTranspose2d(128,3,4,stride=2,padding=1) # (32-1) *2+4-2 = 64
        self.batch_norm4 = nn.BatchNorm2d(3)
        self.tanh = nn.Tanh()
    def forward(self,x):
        x = self.conv_up(x)
        x = self.conv1_up(x)
        x = self.leaky_relu(x)
        x = self.batch_norm1(x)
        x = self.conv2_up(x)
        x = self.leaky_relu(x)
        x = self.batch_norm2(x)
        x = self.conv3_up(x)
        x = self.leaky_relu(x)
        x = self.batch_norm3(x)
        x = self.conv4_up(x)
        x = self.leaky_relu(x)
        x = self.batch_norm4(x)
        x = self.tanh(x)
        return x

In [11]:
class GAN(nn.Module):
    def __init__(self):
        super().__init__()
        self.leaky_relu = nn.LeakyReLU()
        self.h1 = nn.Linear(100,16)
        self.batch_norm1 = nn.BatchNorm1d(16)
        self.h2 = nn.Linear(16,256)
        self.batch_norm2 = nn.BatchNorm1d(256)
        self.h3 = nn.Linear(256,512)
        self.batch_norm3 = nn.BatchNorm1d(512)
        self.h4 = nn.Linear(512,1024)
        self.batch_norm4 = nn.BatchNorm1d(1024)
        self.tanh = nn.Tanh()
    def forward(self,x):
        x = self.h1(x)
        x = self.leaky_relu(x)
        x = self.batch_norm1(x)
        x = self.h2(x)
        x = self.leaky_relu(x)
        x = self.batch_norm2(x)
        x = self.h3(x)
        x = self.leaky_relu(x)
        x = self.batch_norm3(x)
        x = self.h4(x)
        x = self.leaky_relu(x)
        x = self.batch_norm4(x)
        x = self.tanh(x)
        return x



In [12]:
generator = DCGAN()
x = torch.randn(1,100,1,1)
out =  generator(x)


In [13]:
# Defining an optimizer for the generator

optimizer_g = torch.optim.Adam(generator.parameters(),lr=0.002)

# Why do we use Adam for the generator?


### Defining our discriminator

In [14]:
class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()# 64,64
        self.conv1 = nn.Conv2d(3,64,4,stride=2,padding=1) # 32
        self.batch_norm1 = nn.BatchNorm2d(64)
        self.conv2 = nn.Conv2d(64,128,4,stride=2,padding=1) # 16
        self.batch_norm2 = nn.BatchNorm2d(128)
        self.conv3 = nn.Conv2d(128,256,4,stride=2,padding=1) # 8
        self.batch_norm3 = nn.BatchNorm2d(256)
        self.conv4 = nn.Conv2d(256,512,4,stride=2,padding=1) # 4
        self.batch_norm4 = nn.BatchNorm2d(512)
        self.conv5 = nn.Conv2d(512,1,4,stride=1) # floor((64 - 5) /1) +1 = 60
        self.leaky_relu = nn.LeakyReLU(0.02)
        self.sigmoid = nn.Sigmoid()
    def forward(self,x):
        x = self.conv1(x)
        x = self.batch_norm1(x)
        x = self.leaky_relu(x)
        x = self.conv2(x)
        x = self.batch_norm2(x)
        x = self.leaky_relu(x)
        x = self.conv3(x)
        x = self.batch_norm3(x)
        x = self.leaky_relu(x)
        x = self.conv4(x)
        x = self.batch_norm4(x)
        x = self.leaky_relu(x)
        x = self.conv5(x)
        x = self.sigmoid(x)
        return x


In [15]:
discriminator = Discriminator()

x = torch.randn(1,3,64,64)
out = discriminator(x)
print(out.squeeze())

tensor(0.6898, grad_fn=<SqueezeBackward0>)


In [16]:

optimizer_d = torch.optim.SGD(discriminator.parameters(),lr=0.002)

# Why do we use SGD for the discriminator?


In [17]:
loss_fn = nn.BCELoss()

### Defining our training process

In [18]:
#Hyperparameters
epochs = 100
steps = 1
d_loss = []
g_loss = []




In [19]:
# Use tensorboard to log the loss and val images
from torch.utils.tensorboard import SummaryWriter
writer = SummaryWriter()

In [38]:
def train():
    training_step = 0
    for epoch in range(epochs):
        for batch in dataloader:
            for step in range(steps):
                x = batch[0]
                prob = discriminator(x).squeeze()
                truth = (torch.randint(900,1000,(x.shape[0],))/1000).float()
                loss_real = loss_fn(prob,truth)
                loss_real.backward()

                z = torch.randn(x.shape[0],100,1,1)
                x_gen = generator(z)
                prob = discriminator(x_gen).squeeze()
                truth = (torch.randint(0,100,(x.shape[0],))/1000).float()
                loss_fake = loss_fn(prob,truth)
                loss_fake.backward()

                loss = loss_real + loss_fake
                d_loss.append(loss)
                writer.add_scalar("Discriminator Loss/train", loss, training_step)
                optimizer_d.step()
                optimizer_d.zero_grad()

                if training_step % 5 ==0:
                    with torch.no_grad():
                        z = torch.randn(1,100,1,1)
                        x_gen = generator(z)
                        x_rgb = (x_gen*(255/2) +(255/2)).permute(0,2,3,1)[0].round().numpy().astype(np.uint8)
                        writer.add_image("Generator result",x_rgb,global_step = training_step,dataformats="HWC")
                        # x_rgb.save("val.png")

            z = torch.randn(x.shape[0],100,1,1)
            x_gen = generator(z)
            prob = discriminator(x_gen).squeeze()
            truth = (torch.randint(0,100,(x.shape[0],))/1000).float()
            loss = loss_fn(prob,truth)
            writer.add_scalar("Generator Loss/train", loss, training_step)
            g_loss.append(loss)
            loss.backward()
            optimizer_g.step()
            optimizer_g.zero_grad()

            print(f"Epoch {epoch} - D Loss: {d_loss[-1]} G Loss: {g_loss[-1]}")
            training_step +=1


    

In [39]:
train()
writer.flush()
writer.close()

# To see tensorboard results runt he following:
# tensorboard --logdir=runs

Epoch 0 - D Loss: 0.3971705734729767 G Loss: 0.26423323154449463
Epoch 0 - D Loss: 0.43171441555023193 G Loss: 0.30197861790657043
Epoch 0 - D Loss: 0.2593371570110321 G Loss: 0.11955482512712479
Epoch 0 - D Loss: 0.4137299656867981 G Loss: 0.168511301279068
Epoch 0 - D Loss: 0.47505325078964233 G Loss: 0.3165205419063568
Epoch 0 - D Loss: 0.2644991874694824 G Loss: 0.31334739923477173
Epoch 0 - D Loss: 0.3902342915534973 G Loss: 0.22432981431484222
Epoch 0 - D Loss: 0.32798030972480774 G Loss: 0.17144706845283508
Epoch 0 - D Loss: 0.5607513189315796 G Loss: 0.09795008599758148
Epoch 0 - D Loss: 0.4063386619091034 G Loss: 0.1572287380695343
Epoch 0 - D Loss: 0.3646537661552429 G Loss: 0.22211939096450806
Epoch 0 - D Loss: 0.3750917911529541 G Loss: 0.29652512073516846
Epoch 0 - D Loss: 0.30438241362571716 G Loss: 0.21281683444976807
Epoch 0 - D Loss: 0.4127422273159027 G Loss: 0.2008722573518753
Epoch 0 - D Loss: 0.38680100440979004 G Loss: 0.11734530329704285
Epoch 0 - D Loss: 0.43143

KeyboardInterrupt: 