# Laymanz Notebooks: Generative Adversarial Networks
Author: Ambrose Ling

**What is this notebook about?**

In this notebook, we will go over some of the most fundamental ideas behind Generative Adversarial Networks (GANs): how they work and why they have been a major advancement in computer vision and generative artificial intelligence. We hope you walk away able to build your own GAN framework, train your own model from scratch, and understand some of the core ideas that are trending in this field of research.

**What do I need to set up my environment?**

All of our notebooks use only NumPy, PyTorch (including torchvision for loading images), and matplotlib for visualizations. We will not use any other third-party libraries for model development, optimization or anything like that.

**How is this notebook structured?**
1.
2.
3.


**Covered papers in this notebook**

(will do after finishing)

# What is a Generative Adversarial Network?

A Generative Adversarial Network (GAN) is a generative model that learns the distribution of the data through an adversarial process: two models are trained in competition with each other.

### How does it work?

In the GAN framework, you have 2 components:

1. The Generator
- Its goal is to generate synthetic images that look as realistic as possible, similar to the ones in our dataset. In other words, this model tries to fit the real data distribution.
- We represent the generator as $G(z,\theta)$, where $\theta$ are its parameters. It defines a mapping from input latent noise to data space.
- It receives noise as input and tries to output a result close to the data.
- **Intuition**: Think of the generator as a counterfeiter who tries to produce fake money that is as realistic as possible in order to fool the police (the discriminator).

2. The Discriminator
- Its goal is to determine if its input comes from the training dataset or from the generator.
- More specifically, $D(x)$ is the probability that a sample $x$ comes from the data distribution rather than the generator distribution.
- **Intuition**: Think of the discriminator as the police, who try to determine whether the money they see is fake or real.
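
To make the two roles concrete, here is a minimal sketch using small fully connected networks on flattened inputs. The layer sizes, `latent_dim`, `data_dim`, and the names `toy_generator` and `toy_discriminator` are illustrative assumptions, not the architecture defined later in this notebook.

```python
import torch
import torch.nn as nn

latent_dim = 16   # assumed size of the noise vector z
data_dim = 784    # assumed size of one flattened sample x (e.g. 28 x 28)

# G: maps latent noise z to a point in data space
toy_generator = nn.Sequential(
    nn.Linear(latent_dim, 64),
    nn.ReLU(),
    nn.Linear(64, data_dim),
    nn.Tanh(),            # outputs in [-1, 1]
)

# D: maps a sample x to the probability that x came from the real data
toy_discriminator = nn.Sequential(
    nn.Linear(data_dim, 64),
    nn.LeakyReLU(0.2),
    nn.Linear(64, 1),
    nn.Sigmoid(),         # outputs in [0, 1]
)

z = torch.randn(8, latent_dim)        # a batch of 8 noise vectors
fake = toy_generator(z)               # G(z): 8 generated samples
p_real = toy_discriminator(fake)      # D(G(z)): one probability per sample

print(fake.shape, p_real.shape)
```

Before any training, both networks are random, so `p_real` carries no meaning yet. Training is what turns these two mappings into a counterfeiter and a police officer.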

Some math notation:
- $p_{data}(x)$: data distribution
- $p_{g}(x)$: generator distribution
- $D$: discriminator
- $G$: generator
- $z$: latent noise variable, sampled from a prior $p_z(z)$


### How do we train a GAN?

The training objective:
$$
\min_G \max_D V(D,G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]
$$

**What is this telling us?**
- We train the **discriminator D** to maximize this expression, i.e. to maximize the probability that D assigns the correct label to both real and generated samples.
- We train the **generator G** to minimize the same expression, i.e. to minimize the probability that D correctly labels generated samples. The generator wants to fool the discriminator D.
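
As a small numeric illustration of the objective, the sketch below estimates $V(D,G)$ from a batch of discriminator outputs. The tensors passed in are made-up values standing in for $D(x)$ and $D(G(z))$, and `value_fn` is a hypothetical helper name.

```python
import torch

def value_fn(d_real: torch.Tensor, d_fake: torch.Tensor) -> torch.Tensor:
    """Monte Carlo estimate of V(D, G) = E[log D(x)] + E[log(1 - D(G(z)))]."""
    return torch.log(d_real).mean() + torch.log(1.0 - d_fake).mean()

# A confident, correct discriminator: real -> close to 1, fake -> close to 0
v_good = value_fn(torch.tensor([0.9, 0.95]), torch.tensor([0.05, 0.1]))

# A fooled discriminator: it rates generated samples as likely real
v_fooled = value_fn(torch.tensor([0.9, 0.95]), torch.tensor([0.8, 0.9]))

# A discriminator that cannot tell the two apart outputs 0.5 everywhere
v_chance = value_fn(torch.tensor([0.5, 0.5]), torch.tensor([0.5, 0.5]))

print(v_good.item(), v_fooled.item(), v_chance.item())
```

The value is highest when D labels everything correctly and drops when G fools it, which is why D maximizes and G minimizes the same quantity. When D outputs 0.5 everywhere, the value is $-\log 4 \approx -1.386$.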


### How do we measure the similarity / difference between 2 probability distributions?

### KL divergence
KL divergence measures how one probability distribution $p$ diverges from a second, expected probability distribution $q$.

$$
D_{KL}(P\|Q) = \int p(x) \log\left(\frac{p(x)}{q(x)}\right) dx
$$

**Some nice properties of the KL divergence**:
- $D_{KL}(P\|Q)$ heavily penalizes regions where $p(x)$ has non-zero mass and $q(x)$ has zero (or near-zero) mass. This is useful when you are trying to approximate a complex (intractable) distribution $q(x)$ with a tractable distribution $p(x)$, because it keeps the approximation away from regions the target does not cover.
- KL divergence is always non-negative: $D_{KL}(P\|Q) \geq 0$, with equality iff $p(x) = q(x)$.

**Challenges with using the KL divergence**:
- Dependence on support: the support of a distribution is the set of points where its probability is nonzero, i.e. where $p(x) > 0$. $D_{KL}(P\|Q)$ is finite only if $support(P) \subseteq support(Q)$; if $P$ puts mass where $Q$ has none, the divergence is infinite.
- Asymmetry: $D_{KL}(P\|Q) \neq D_{KL}(Q\|P)$ in general. If $q(x) \gg 0$ and $p(x) \approx 0$, that region contributes almost nothing to $D_{KL}(P\|Q)$, so $q$ has very little effect there. This makes KL a poor measure when the two distributions are equally important.
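
These properties can be checked on small discrete distributions. This is a sketch with made-up probability vectors; `kl` is a hypothetical helper that uses the convention $0 \log 0 = 0$.

```python
import torch

def kl(p: torch.Tensor, q: torch.Tensor) -> torch.Tensor:
    """Discrete D_KL(P || Q) = sum_x p(x) * log(p(x) / q(x)), with 0 * log 0 := 0."""
    mask = p > 0                      # terms with p(x) = 0 contribute nothing
    return (p[mask] * (p[mask] / q[mask]).log()).sum()

p = torch.tensor([0.7, 0.2, 0.1])
q = torch.tensor([0.1, 0.3, 0.6])

print(kl(p, p))            # zero when the distributions are identical
print(kl(p, q), kl(q, p))  # both positive, but not equal (asymmetry)

# Support mismatch: r puts mass on the third outcome, s puts none there
r = torch.tensor([0.5, 0.4, 0.1])
s = torch.tensor([0.5, 0.5, 0.0])
print(kl(r, s))            # infinite: support(R) is not inside support(S)
print(kl(s, r))            # finite: support(S) is inside support(R)
```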

**NOTE:**
KL divergence is not a proper metric: it is not symmetric and it does not satisfy the triangle inequality.
https://stats.stackexchange.com/questions/111445/analysis-of-kullback-leibler-divergence 

### Jensen Shannon Divergence
Jensen-Shannon divergence also measures how one probability distribution $p$ diverges from a second, expected probability distribution $q$.
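
For reference, the Jensen-Shannon divergence is usually defined through the KL divergence of each distribution to their mixture $M = \frac{1}{2}(P + Q)$:

$$
D_{JS}(P\|Q) = \frac{1}{2} D_{KL}(P\|M) + \frac{1}{2} D_{KL}(Q\|M)
$$

Unlike KL, it is symmetric and stays finite even when the supports do not overlap (with the natural log it is bounded above by $\log 2$). A minimal sketch on discrete distributions follows; the helper names `kl` and `js` and the probability vectors are assumptions made for illustration.

```python
import torch

def kl(p: torch.Tensor, q: torch.Tensor) -> torch.Tensor:
    """Discrete D_KL(P || Q), skipping terms where p(x) = 0."""
    mask = p > 0
    return (p[mask] * (p[mask] / q[mask]).log()).sum()

def js(p: torch.Tensor, q: torch.Tensor) -> torch.Tensor:
    """Jensen-Shannon divergence: average KL of P and Q to their mixture M."""
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = torch.tensor([0.5, 0.5, 0.0])
q = torch.tensor([0.0, 0.5, 0.5])

print(js(p, q), js(q, p))   # same value in both directions

# Disjoint supports: KL would be infinite, JS stays finite at log(2)
a = torch.tensor([1.0, 0.0])
b = torch.tensor([0.0, 1.0])
print(js(a, b), torch.log(torch.tensor(2.0)))
```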

In [5]:
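# KL divergence between two random categorical distributions
import torch
import torch.nn.functional as F

# Random logits for two distributions over 100 categories (batch size 1)
p = torch.randn(1, 100)
q = torch.randn(1, 100)

# log_softmax is a numerically stable log(softmax(x)), where
# softmax(x) = x.exp() / x.exp().sum(dim=-1, keepdim=True)

# Convert logits to log-probabilities along the category dimension
p = F.log_softmax(p, dim=-1)
q = F.log_softmax(q, dim=-1)

# Pointwise terms of D_KL(P || Q) = sum_x p(x) * (log p(x) - log q(x))
kl_div = p.exp() * (p - q)

# Reduce to a scalar loss for backprop: sum over categories, average over the batch
loss = kl_div.sum() / p.shape[0]

print(f"KL Div Loss: {loss}")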


KL Div Loss: 0.9689055681228638




### Creating our dataset

In [29]:
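import torch
from torch.utils.data import Dataset, DataLoader
import torchvision
import torchvision.transforms as T
from torchvision.io import read_image

# Root folder of the image dataset; DatasetFolder expects one subfolder per class
root = "/Users/ambroseling/Desktop/carly-dataset"

def preprocess(tensor: torch.Tensor) -> torch.Tensor:
    # Keep only the RGB channels (drops the alpha channel of RGBA PNGs)
    tensor = tensor[:3]
    # Scale uint8 pixels from [0, 255] to [-1, 1], the range of a tanh output
    return tensor.float() / 127.5 - 1.0

transform = T.Compose([preprocess])
carly_dataset = torchvision.datasets.DatasetFolder(
    root, loader=read_image, transform=transform, extensions=("png",)
)
print(carly_dataset[10][0].shape)  # shape of one image tensor
print(carly_dataset[0][0])         # pixel values, now in [-1, 1]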

torch.Size([3, 256, 256])
tensor([[[-0.7412, -0.7882, -0.8039,  ..., -0.2157, -0.2471, -0.3255],
         [-0.5922, -0.5216, -0.7176,  ..., -0.3333, -0.3804, -0.3882],
         [-0.6784, -0.6157, -0.8588,  ..., -0.3412, -0.3020, -0.2471],
         ...,
         [ 0.2706,  0.7725,  0.8588,  ..., -0.4353, -0.4039, -0.4039],
         [ 0.2314,  0.7490,  0.3490,  ..., -0.4275, -0.4039, -0.4039],
         [ 0.7647,  0.8824,  0.2392,  ..., -0.3647, -0.4118, -0.3961]],

        [[-0.7098, -0.7569, -0.7725,  ..., -0.1373, -0.1686, -0.2471],
         [-0.5608, -0.4902, -0.6863,  ..., -0.2549, -0.3020, -0.3098],
         [-0.6471, -0.5843, -0.8275,  ..., -0.2627, -0.2235, -0.1686],
         ...,
         [-0.3882,  0.1686,  0.6000,  ..., -0.3647, -0.3333, -0.3333],
         [-0.2392,  0.3020,  0.2392,  ..., -0.3647, -0.3333, -0.3333],
         [ 0.2078,  0.5373,  0.2000,  ..., -0.3176, -0.3569, -0.3412]],

        [[-0.8431, -0.8980, -0.9137,  ..., -0.4039, -0.4431, -0.5216],
         [-0.6863, 

### Defining our generator


In [None]:
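import torch.nn as nn

class Generator(nn.Module):
    '''
    Input: 256 x 256 (assumed here to be a 3-channel noise image)
    Output: 3 x 256 x 256 image with values in [-1, 1]
    '''
    def __init__(self):
        super().__init__()
        # Downsampling path (spatial sizes follow the assumed order in forward)
        self.conv1_down = nn.Conv2d(3, 12, 3, stride=1)     # 256 -> 254
        self.conv2_down = nn.Conv2d(12, 60, 3, stride=1)    # 252 -> 250
        self.conv3_down = nn.Conv2d(60, 120, 5, stride=2)   # 248 -> 122
        self.avg_pool_1 = nn.AvgPool2d(3, 1)                # 254 -> 252
        self.avg_pool_2 = nn.AvgPool2d(3, 1)                # 250 -> 248
        self.leaky_relu = nn.LeakyReLU()
        # Upsampling path. Assumption: the arguments were left empty in the
        # original; these values are one choice that restores 256 x 256.
        self.conv1_up = nn.ConvTranspose2d(120, 60, 5, stride=2, output_padding=1)  # 122 -> 248
        self.conv2_up = nn.ConvTranspose2d(60, 12, 5, stride=1)   # 248 -> 252
        self.conv3_up = nn.ConvTranspose2d(12, 3, 5, stride=1)    # 252 -> 256
        self.tanh = nn.Tanh()

    def forward(self, x):
        # Assumed layer order; the forward pass was left unimplemented
        x = self.avg_pool_1(self.leaky_relu(self.conv1_down(x)))
        x = self.avg_pool_2(self.leaky_relu(self.conv2_down(x)))
        x = self.leaky_relu(self.conv3_down(x))
        x = self.leaky_relu(self.conv1_up(x))
        x = self.leaky_relu(self.conv2_up(x))
        # tanh keeps outputs in [-1, 1], the same range as the preprocessed images
        return self.tanh(self.conv3_up(x))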

In [None]:
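# Defining an optimizer for the generator
generator = Generator()
# Assumption: lr and betas were not specified; these are commonly used GAN values
optimizer_g = torch.optim.Adam(generator.parameters(), lr=2e-4, betas=(0.5, 0.999))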

### Defining our discriminator

In [None]:
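class Discriminator(nn.Module):
    def __init__(self):
        super().__init__()
        # Assumption: every layer argument was left empty in the original;
        # these sizes are one choice that works for 3 x 256 x 256 inputs.
        self.conv1 = nn.Conv2d(3, 16, 4, stride=2, padding=1)   # 256 -> 128
        self.conv2 = nn.Conv2d(16, 32, 4, stride=2, padding=1)  # 64 -> 32
        self.conv3 = nn.Conv2d(32, 64, 4, stride=2, padding=1)  # 16 -> 8
        self.max_pool = nn.MaxPool2d(2)                         # halves H and W
        self.linear1 = nn.Linear(64 * 4 * 4, 256)
        self.linear2 = nn.Linear(256, 64)
        self.linear3 = nn.Linear(64, 1)
        self.sigmoid = nn.Sigmoid()  # must be instantiated, not just referenced

    def forward(self, x):
        # Assumed forward pass: conv -> ReLU -> pool, three times, then the MLP head
        for conv in (self.conv1, self.conv2, self.conv3):
            x = self.max_pool(torch.relu(conv(x)))
        x = x.flatten(1)  # (N, 64, 4, 4) -> (N, 1024)
        x = torch.relu(self.linear2(torch.relu(self.linear1(x))))
        return self.sigmoid(self.linear3(x))  # probability that x is real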

In [None]:
discriminator = Discriminator()
optimizer_d = torch.optim.SGD(discriminator.parameters(), lr=0.01)  # lr is an assumed value

### Defining our training process
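
One common way to implement the minimax objective is to alternate a discriminator update and a generator update on each batch. The sketch below is a generic version under stated assumptions, not this notebook's final training loop: `train_step` is a hypothetical helper, the tiny linear models, toy 2-D data, and learning rates exist only so the snippet runs on its own, and the generator uses the non-saturating loss (maximize $\log D(G(z))$) instead of minimizing $\log(1 - D(G(z)))$ directly.

```python
import torch
import torch.nn as nn

def train_step(G, D, opt_g, opt_d, real, latent_dim):
    """One alternating GAN update on a batch of real samples."""
    bce = nn.BCELoss()
    n = real.shape[0]
    ones, zeros = torch.ones(n, 1), torch.zeros(n, 1)

    # 1) Discriminator step: push D(x) toward 1 and D(G(z)) toward 0.
    #    detach() stops gradients from flowing into G during this step.
    fake = G(torch.randn(n, latent_dim))
    loss_d = bce(D(real), ones) + bce(D(fake.detach()), zeros)
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    # 2) Generator step: push D(G(z)) toward 1 (non-saturating loss).
    loss_g = bce(D(fake), ones)
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()

# Toy setup (assumed): 2-D "real" data clustered around (2, 2)
torch.manual_seed(0)
latent_dim = 4
G = nn.Sequential(nn.Linear(latent_dim, 16), nn.ReLU(), nn.Linear(16, 2))
D = nn.Sequential(nn.Linear(2, 16), nn.LeakyReLU(0.2), nn.Linear(16, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)

for step in range(100):
    real = torch.randn(32, 2) * 0.5 + 2.0
    loss_d, loss_g = train_step(G, D, opt_g, opt_d, real, latent_dim)

print(f"loss_d={loss_d:.3f}, loss_g={loss_g:.3f}")
```

The same step would apply to the image models in this notebook by swapping in their generator, discriminator, and a `DataLoader` over the dataset, with noise shaped to match the generator's input.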

In [None]:
def train():
    # Training loop not implemented yet
    pass