# Lab 08: Cycle GANs and Pix2Pix

This lab introduces CycleGANs and Pix2Pix. We will see how to use the code from the GitHub repositories
and also take a close look at how that code works, which will be useful if you use these algorithms in your project.

## CycleGANs

The CycleGAN is a very popular GAN architecture. It is used to learn transformations between images of different styles.

Example applications of Cycle GANs:

 - Mapping between artistic and realistic images
 - Transformation between images of horses and zebras
 - Transformation between winter images and summer images
 - FaceApp or DeepFake
 - Super-resolution reconstruction of images

Assume that ${\cal X}$ is the set of images of horses and ${\cal Y}$ is the set of images of zebras.

The goal of CycleGAN is to learn a mapping function $G: \mathcal{X} \rightarrow \mathcal{Y}$
such that images $G(X)$ generated from some $X\in\mathcal{X}$ are indistinguishable from
samples $Y$ from a training set over $\mathcal{Y}$.
This objective is achieved using an adversarial loss function. We not only learn $G(\cdot)$, but we also learn an inverse mapping function $F: \mathcal{Y} \rightarrow \mathcal{X}$
with the help of a cycle-consistency loss to encourage $F(G(X)) \approx X$.

While training, two kinds of training observations may be given as input:

 - *Paired* images $\{(X^{(i)}, Y^{(i)})\}_{i\in 1..N}$.
 - *Unpaired* image sets $\{X^{(i)}\}_{i\in 1..N_{X}}$ and $\{Y^{(i)}\}_{i\in 1..N_{Y}}$ without any special relationship between $X^{(i)}$ and $Y^{(i)}$.

The adversarial formulation of the Cycle GAN includes a discriminator $D_Y$ that attempts to classify observations
$G(X^{(i)})$ and $Y^{(i)}$ as fake or real.
Similarly, we also have a discriminator $D_X$ that attempts to distinguish $F(Y^{(i)})$ from $X^{(i)}$.

<img src="img/CycleGANmodel.jpg" title="CycleGAN" style="width: 640px;" />

The generator may look like this:

<img src="img/CycleGANGenerator.jpg" title="CycleGAN Generator" style="width: 640px;" />

And the discriminator may look like this:

<img src="img/CycleGANdiscriminator.jpg" title="CycleGAN Discriminator" style="width: 640px;" />

Besides the adversarial loss, the Cycle GAN uses two cycle-consistency losses; this enables training without paired images.
We minimize the reconstruction losses $\| F(G(X)) - X \|_1$ and $\|G(F(Y)) - Y\|_1$ (both the paper and the code measure them with the L1 norm).
In summary, the Cycle GAN objective combines three loss terms, two adversarial losses and the cycle-consistency loss:

<img src="img/CycleGAN-formulation.png" title="CycleGAN formulation" style="width: 560px;" />
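
Written out in the notation used above, with the log-likelihood adversarial loss of the original paper, the three terms are as follows. Here $\lambda$ weights the cycle-consistency term (the paper uses $\lambda = 10$); note that the repository defaults to a least-squares adversarial loss instead (`--gan_mode lsgan`).

$$
\begin{aligned}
\mathcal{L}_{\text{GAN}}(G, D_Y, \mathcal{X}, \mathcal{Y}) &= \mathbb{E}_{Y}\left[\log D_Y(Y)\right] + \mathbb{E}_{X}\left[\log\left(1 - D_Y(G(X))\right)\right] \\
\mathcal{L}_{\text{cyc}}(G, F) &= \mathbb{E}_{X}\left[\|F(G(X)) - X\|_1\right] + \mathbb{E}_{Y}\left[\|G(F(Y)) - Y\|_1\right] \\
\mathcal{L}(G, F, D_X, D_Y) &= \mathcal{L}_{\text{GAN}}(G, D_Y, \mathcal{X}, \mathcal{Y}) + \mathcal{L}_{\text{GAN}}(F, D_X, \mathcal{Y}, \mathcal{X}) + \lambda\, \mathcal{L}_{\text{cyc}}(G, F)
\end{aligned}
$$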

The optimization is similar to that of the ordinary GAN, except we have two generators and two discriminators:

<img src="img/Optimized-loss-function-CycleGan.png" title="CycleGAN optimization" style="width: 320px;" />
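
In other words, we solve $G^*, F^* = \arg\min_{G,F}\max_{D_X,D_Y}\mathcal{L}(G,F,D_X,D_Y)$ by alternating generator and discriminator updates. The generator side of one update can be sketched as below. This is a minimal NumPy sketch, not the repository's code: arrays stand in for image tensors, plain Python functions stand in for the networks, the adversarial term is the least-squares variant (the repository default `--gan_mode lsgan`), and the weights `lambda_A = lambda_B = 10` and `lambda_idt = 0.5` are assumed to mirror the repository's CycleGAN defaults. The identity terms correspond to `idt_A`/`idt_B` in the code.

```python
import numpy as np

def lsgan_loss(pred, target_is_real):
    """Least-squares GAN loss: mean squared distance of D's scores from 1 (real) or 0 (fake)."""
    target = 1.0 if target_is_real else 0.0
    return float(np.mean((pred - target) ** 2))

def l1_loss(a, b):
    """Mean absolute error, used for the cycle and identity reconstructions."""
    return float(np.mean(np.abs(a - b)))

def generator_objective(real_A, real_B, G_A, G_B, D_A, D_B,
                        lambda_A=10.0, lambda_B=10.0, lambda_idt=0.5):
    """Generator-side CycleGAN loss. Code names: G_A = G, G_B = F, D_A = D_Y, D_B = D_X."""
    fake_B = G_A(real_A)   # G(X)
    rec_A = G_B(fake_B)    # F(G(X))
    fake_A = G_B(real_B)   # F(Y)
    rec_B = G_A(fake_A)    # G(F(Y))
    # Adversarial terms: each generator wants its fakes to be scored as real.
    loss_G_A = lsgan_loss(D_A(fake_B), True)
    loss_G_B = lsgan_loss(D_B(fake_A), True)
    # Cycle-consistency terms: translating there and back should reproduce the input.
    loss_cycle = lambda_A * l1_loss(rec_A, real_A) + lambda_B * l1_loss(rec_B, real_B)
    # Identity terms: G(Y) should stay close to Y and F(X) close to X.
    loss_idt = lambda_idt * (lambda_B * l1_loss(G_A(real_B), real_B)
                             + lambda_A * l1_loss(G_B(real_A), real_A))
    return loss_G_A + loss_G_B + loss_cycle + loss_idt

# Toy check: G and F are exact inverses and both discriminators score everything as real.
real_A = np.zeros((4, 4))
real_B = np.zeros((4, 4))
G = lambda x: x + 1.0
F = lambda y: y - 1.0
D_real = lambda img: np.ones_like(img)
total = generator_objective(real_A, real_B, G, F, D_real, D_real)
```

In the toy check the adversarial and cycle terms vanish, so only the two identity terms remain (0.5 * 10 * 1 each). The discriminators are then updated in a separate step, pushing their scores toward 1 on real images and toward 0 on generated ones.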

## Results

Here are some results from a Cycle GAN trained on horses and zebras:

<img src="img/CycleGANResultsA2B.jpg" title="CycleGAN Results" style="width: 640px;" />

<img src="img/CycleGANdistortionB2A.jpg" title="CycleGAN Distort" style="width: 640px;" />

You can take a look at [the diegoalejogm GitHub repository](https://github.com/diegoalejogm/gans/blob/master/CycleGans.ipynb) for some
examples of Cycle GANs constructed from scratch.

## Get and prepare Cycle GAN implementation

Today, we'll use the authors' implementation of Cycle GANs. Go ahead and download the Cycle GAN implementation:

In [1]:
!git clone https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix.git



This implementation requires dominate and visdom for visualization: dominate writes HTML pages of intermediate results, and visdom runs a Web server that lets you monitor training live.

In [2]:
!pip install dominate visdom


Next, download a data set. You can try a different data set if you like.

In [None]:
!cd pytorch-CycleGAN-and-pix2pix && bash ./datasets/download_cyclegan_dataset.sh horse2zebra

## Start a training run

We won't be able to finish a training run in class -- 200 epochs of the horse2zebra dataset with batch size 1 takes 10-20 hours on our GPUs. However, we can start a run and see how it goes.

We'll also see an example of how to use visdom, which is probably better than matplotlib for visualization when we are running on the server.

### In terminal 1:

    python -m visdom.server
   
### In terminal 2 (ssh with parameter -L 8097:localhost:8097 or let VSCode forward the port for you):

    python train.py --dataroot ./datasets/horse2zebra --name horse2zebra_cyclegan --model cycle_gan

## Tips

Take a look at `CycleGAN.ipynb` in the repository to understand how to train/test on your dataset.

Configuration options are listed in the classes in the `options` subdirectory.

Get your GANs off and running. Losses are printed every 100 iterations (`--print_freq`), and every 400 iterations (`--display_freq`) you should see updated samples for a real (unpaired) sample pair $(X, Y)$, including $G(X), F(Y), F(G(X)), G(F(Y)), F(X),$ and $G(Y)$. These
are labeled fake_B, fake_A, rec_A, rec_B, idt_B, and idt_A. If we can all run concurrently, you should be getting horsey zebras and somewhat striped horses by the end of the lab.

## Looking into the code

First, let's look at <code>train.py</code>.

In [None]:
import time
from options.train_options import TrainOptions
from data import create_dataset
from models import create_model
from util.visualizer import Visualizer

if __name__ == '__main__':
    opt = TrainOptions().parse()   # get training options
    dataset = create_dataset(opt)  # create a dataset given opt.dataset_mode and other options
    dataset_size = len(dataset)    # get the number of images in the dataset.
    print('The number of training images = %d' % dataset_size)

    model = create_model(opt)      # create a model given opt.model and other options
    model.setup(opt)               # regular setup: load and print networks; create schedulers
    visualizer = Visualizer(opt)   # create a visualizer that display/save images and plots
    total_iters = 0                # the total number of training iterations

    for epoch in range(opt.epoch_count, opt.n_epochs + opt.n_epochs_decay + 1):    # outer loop for different epochs; we save the model by <epoch_count>, <epoch_count>+<save_latest_freq>
        epoch_start_time = time.time()  # timer for entire epoch
        iter_data_time = time.time()    # timer for data loading per iteration
        epoch_iter = 0                  # the number of training iterations in current epoch, reset to 0 every epoch
        visualizer.reset()              # reset the visualizer: make sure it saves the results to HTML at least once every epoch
        model.update_learning_rate()    # update learning rates in the beginning of every epoch.
        for i, data in enumerate(dataset):  # inner loop within one epoch
            iter_start_time = time.time()  # timer for computation per iteration
            if total_iters % opt.print_freq == 0:
                t_data = iter_start_time - iter_data_time

            total_iters += opt.batch_size
            epoch_iter += opt.batch_size
            model.set_input(data)         # unpack data from dataset and apply preprocessing
            model.optimize_parameters()   # calculate loss functions, get gradients, update network weights

            if total_iters % opt.display_freq == 0:   # display images on visdom and save images to a HTML file
                save_result = total_iters % opt.update_html_freq == 0
                model.compute_visuals()
                visualizer.display_current_results(model.get_current_visuals(), epoch, save_result)

            if total_iters % opt.print_freq == 0:    # print training losses and save logging information to the disk
                losses = model.get_current_losses()
                t_comp = (time.time() - iter_start_time) / opt.batch_size
                visualizer.print_current_losses(epoch, epoch_iter, losses, t_comp, t_data)
                if opt.display_id > 0:
                    visualizer.plot_current_losses(epoch, float(epoch_iter) / dataset_size, losses)

            if total_iters % opt.save_latest_freq == 0:   # cache our latest model every <save_latest_freq> iterations
                print('saving the latest model (epoch %d, total_iters %d)' % (epoch, total_iters))
                save_suffix = 'iter_%d' % total_iters if opt.save_by_iter else 'latest'
                model.save_networks(save_suffix)

            iter_data_time = time.time()
        if epoch % opt.save_epoch_freq == 0:              # cache our model every <save_epoch_freq> epochs
            print('saving the model at the end of epoch %d, iters %d' % (epoch, total_iters))
            model.save_networks('latest')
            model.save_networks(epoch)

        print('End of epoch %d / %d \t Time Taken: %d sec' % (epoch, opt.n_epochs + opt.n_epochs_decay, time.time() - epoch_start_time))

The script has three parts:

1. Load the dataset
2. Load the model
3. Training
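
All the periodic work in the training loop is keyed to `total_iters`, which grows by `batch_size` per step and is compared with the frequency options using `%`. The pure-Python sketch below (a hypothetical helper, not part of the repository) replays that bookkeeping for one epoch with the default frequencies from `TrainOptions`. It shows one consequence worth knowing: when `batch_size` does not divide a frequency, the corresponding event fires less often than the option suggests.

```python
def events_in_epoch(dataset_size, batch_size, print_freq=100, display_freq=400,
                    save_latest_freq=5000, total_iters=0):
    """Replay train.py's counter logic for one epoch; return the final counter and firing points."""
    events = {'print': [], 'display': [], 'save_latest': []}
    n_batches = (dataset_size + batch_size - 1) // batch_size   # last partial batch is kept
    for _ in range(n_batches):
        total_iters += batch_size    # incremented by batch_size even for a partial last batch
        if total_iters % display_freq == 0:
            events['display'].append(total_iters)
        if total_iters % print_freq == 0:
            events['print'].append(total_iters)
        if total_iters % save_latest_freq == 0:
            events['save_latest'].append(total_iters)
    return total_iters, events

total, ev = events_in_epoch(1000, batch_size=1)
total3, ev3 = events_in_epoch(1000, batch_size=3)
```

With `batch_size=1`, losses are printed 10 times and images displayed at iterations 400 and 800. With `batch_size=3`, `total_iters` only hits multiples of 300 for printing and never reaches a multiple of both 3 and 400 within this epoch, so nothing is displayed.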

### Command line arguments (`argparse`)

To select the model, dataset, and other settings, we use the <code>argparse</code> library:

In [None]:
opt = TrainOptions().parse()   # get training options

<code>TrainOptions</code> is a class defined in <code>/options/train_options.py</code>; it inherits from <code>BaseOptions</code> in <code>/options/base_options.py</code>.
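
The pattern is that the base class registers the shared flags, the subclass calls the base `initialize()` and adds its own, and `parse()` builds the parser and records whether we are training. A simplified, self-contained sketch of that pattern follows; the class names and the short flag list are illustrative, not the repository's full option set.

```python
import argparse

class BaseOptionsSketch:
    """Simplified stand-in for BaseOptions: registers shared flags and parses them."""
    isTrain = False

    def initialize(self, parser):
        parser.add_argument('--dataroot', required=True, help='path to the dataset folder')
        parser.add_argument('--name', type=str, default='experiment_name')
        parser.add_argument('--batch_size', type=int, default=1)
        return parser

    def parse(self, args=None):
        parser = argparse.ArgumentParser()
        parser = self.initialize(parser)   # subclasses extend the flag set here
        opt = parser.parse_args(args)      # args=None reads sys.argv, as on the command line
        opt.isTrain = self.isTrain
        return opt

class TrainOptionsSketch(BaseOptionsSketch):
    """Adds training-only flags on top of the shared ones, like TrainOptions."""

    def initialize(self, parser):
        parser = BaseOptionsSketch.initialize(self, parser)
        parser.add_argument('--n_epochs', type=int, default=100)
        parser.add_argument('--lr', type=float, default=0.0002)
        self.isTrain = True
        return parser

opt = TrainOptionsSketch().parse(['--dataroot', './datasets/horse2zebra', '--lr', '0.001'])
```

Flags passed on the command line override the defaults (`opt.lr` becomes 0.001), while unspecified ones keep their defaults.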

In [None]:
from .base_options import BaseOptions


class TrainOptions(BaseOptions):
    """This class includes training options.

    It also includes shared options defined in BaseOptions.
    """

    def initialize(self, parser):
        parser = BaseOptions.initialize(self, parser)
        # visdom and HTML visualization parameters
        parser.add_argument('--display_freq', type=int, default=400, help='frequency of showing training results on screen')
        parser.add_argument('--display_ncols', type=int, default=4, help='if positive, display all images in a single visdom web panel with certain number of images per row.')
        parser.add_argument('--display_id', type=int, default=1, help='window id of the web display')
        parser.add_argument('--display_server', type=str, default="http://localhost", help='visdom server of the web display')
        parser.add_argument('--display_env', type=str, default='main', help='visdom display environment name (default is "main")')
        parser.add_argument('--display_port', type=int, default=8097, help='visdom port of the web display')
        parser.add_argument('--update_html_freq', type=int, default=1000, help='frequency of saving training results to html')
        parser.add_argument('--print_freq', type=int, default=100, help='frequency of showing training results on console')
        parser.add_argument('--no_html', action='store_true', help='do not save intermediate training results to [opt.checkpoints_dir]/[opt.name]/web/')
        # network saving and loading parameters
        parser.add_argument('--save_latest_freq', type=int, default=5000, help='frequency of saving the latest results')
        parser.add_argument('--save_epoch_freq', type=int, default=5, help='frequency of saving checkpoints at the end of epochs')
        parser.add_argument('--save_by_iter', action='store_true', help='whether saves model by iteration')
        parser.add_argument('--continue_train', action='store_true', help='continue training: load the latest model')
        parser.add_argument('--epoch_count', type=int, default=1, help='the starting epoch count, we save the model by <epoch_count>, <epoch_count>+<save_latest_freq>, ...')
        parser.add_argument('--phase', type=str, default='train', help='train, val, test, etc')
        # training parameters
        parser.add_argument('--n_epochs', type=int, default=100, help='number of epochs with the initial learning rate')
        parser.add_argument('--n_epochs_decay', type=int, default=100, help='number of epochs to linearly decay learning rate to zero')
        parser.add_argument('--beta1', type=float, default=0.5, help='momentum term of adam')
        parser.add_argument('--lr', type=float, default=0.0002, help='initial learning rate for adam')
        parser.add_argument('--gan_mode', type=str, default='lsgan', help='the type of GAN objective. [vanilla| lsgan | wgangp]. vanilla GAN loss is the cross-entropy objective used in the original GAN paper.')
        parser.add_argument('--pool_size', type=int, default=50, help='the size of image buffer that stores previously generated images')
        parser.add_argument('--lr_policy', type=str, default='linear', help='learning rate policy. [linear | step | plateau | cosine]')
        parser.add_argument('--lr_decay_iters', type=int, default=50, help='multiply by a gamma every lr_decay_iters iterations')

        self.isTrain = True
        return parser
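
With the default `--lr_policy linear`, `--n_epochs` and `--n_epochs_decay` together define the schedule: the learning rate stays at `--lr` for roughly the first `n_epochs` epochs and then decays linearly toward zero over the next `n_epochs_decay` epochs. The multiplier can be sketched as follows; the formula is modeled on the linear rule in the repository's `networks.get_scheduler` (not shown in this lab), so treat it as an approximation of that code. In PyTorch, such a function is typically passed to `torch.optim.lr_scheduler.LambdaLR`.

```python
def linear_lr_factor(epoch, epoch_count=1, n_epochs=100, n_epochs_decay=100):
    """Multiplier applied to the base learning rate after `epoch` scheduler steps."""
    return 1.0 - max(0, epoch + epoch_count - n_epochs) / float(n_epochs_decay + 1)

base_lr = 0.0002   # default --lr
lrs = [base_lr * linear_lr_factor(e) for e in range(200)]
```

The learning rate is constant for the first 100 steps and reaches `base_lr / 101` at the last one.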


### Dataset class and function

Next, let's see how the dataset is loaded. The call in <code>train.py</code> is:

In [None]:
dataset = create_dataset(opt)  # create a dataset given opt.dataset_mode and other options

<code>create_dataset()</code> is defined in <code>/data/__init__.py</code>; it wraps the <code>CustomDatasetDataLoader</code> class, which in turn instantiates the dataset class selected by <code>--dataset_mode</code>.

Don't worry about this chain of wrappers. What matters is that there are two main types of dataset classes, each in its own file:

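1. Paired data (<code>aligned_dataset.py</code>, or the special-purpose <code>colorization_dataset.py</code>): each A image has a corresponding B image. This is used for Pix2Pix. For <code>aligned_dataset.py</code>, the dataset folder must contain a **train** directory for the training phase, in which each file is a single image with A on the left half and B on the right half (see the code below). The repository's <code>datasets/combine_A_and_B.py</code> script builds such images from two folders of A and B images with the **same image names**.
2. Unpaired data (<code>unaligned_dataset.py</code>): the images come from two folders with no correspondence between them. The dataset folder must contain **trainA** and **trainB** for the training phase.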
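
For your own unpaired data, it is enough to put the images of each domain in `<dataroot>/trainA` and `<dataroot>/trainB` (and `testA`/`testB` for testing). The sketch below mimics how `UnalignedDataset` builds its folder paths from `--dataroot` and `--phase`, so you can sanity-check a folder before training. `list_domain_images` is a hypothetical helper; it does not search subfolders, and its extension list is a simplified assumption (the repository's `make_dataset` accepts more formats).

```python
import os
import tempfile

IMG_EXTENSIONS = ('.jpg', '.jpeg', '.png', '.bmp')   # simplified list (assumption)

def list_domain_images(dataroot, phase, domain):
    """Sorted image paths in <dataroot>/<phase><domain>, e.g. ./datasets/horse2zebra/trainA."""
    folder = os.path.join(dataroot, phase + domain)   # same path rule as UnalignedDataset
    if not os.path.isdir(folder):
        raise FileNotFoundError('missing folder: ' + folder)
    return sorted(os.path.join(folder, name) for name in os.listdir(folder)
                  if name.lower().endswith(IMG_EXTENSIONS))

# Demo on a throwaway folder: 2 images in domain A, 3 images (plus a stray text file) in domain B.
with tempfile.TemporaryDirectory() as root:
    for domain, names in (('A', ['h1.jpg', 'h2.jpg']),
                          ('B', ['z1.png', 'z2.png', 'z3.png', 'notes.txt'])):
        os.makedirs(os.path.join(root, 'train' + domain))
        for name in names:
            open(os.path.join(root, 'train' + domain, name), 'w').close()
    paths_A = list_domain_images(root, 'train', 'A')
    paths_B = list_domain_images(root, 'train', 'B')
    epoch_length = max(len(paths_A), len(paths_B))   # UnalignedDataset.__len__ uses the larger domain
```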

In [None]:
def create_dataset(opt):
    """Create a dataset given the option.

    This function wraps the class CustomDatasetDataLoader.
        This is the main interface between this package and 'train.py'/'test.py'

    Example:
        >>> from data import create_dataset
        >>> dataset = create_dataset(opt)
    """
    data_loader = CustomDatasetDataLoader(opt)
    dataset = data_loader.load_data()
    return dataset

class CustomDatasetDataLoader():
    """Wrapper class of Dataset class that performs multi-threaded data loading"""

    def __init__(self, opt):
        """Initialize this class

        Step 1: create a dataset instance given the name [dataset_mode]
        Step 2: create a multi-threaded data loader.
        """
        self.opt = opt
        dataset_class = find_dataset_using_name(opt.dataset_mode)
        self.dataset = dataset_class(opt)
        print("dataset [%s] was created" % type(self.dataset).__name__)
        self.dataloader = torch.utils.data.DataLoader(
            self.dataset,
            batch_size=opt.batch_size,
            shuffle=not opt.serial_batches,
            num_workers=int(opt.num_threads))

    def load_data(self):
        return self

    def __len__(self):
        """Return the number of data in the dataset"""
        return min(len(self.dataset), self.opt.max_dataset_size)

    def __iter__(self):
        """Return a batch of data"""
        for i, data in enumerate(self.dataloader):
            if i * self.opt.batch_size >= self.opt.max_dataset_size:
                break
            yield data


Here is the code for the unaligned and aligned datasets:

In [None]:
# unaligned_dataset.py
# the default dataset class, used for CycleGAN

import os
from data.base_dataset import BaseDataset, get_transform
from data.image_folder import make_dataset
from PIL import Image
import random


class UnalignedDataset(BaseDataset):
    """
    This dataset class can load unaligned/unpaired datasets.

    It requires two directories to host training images from domain A '/path/to/data/trainA'
    and from domain B '/path/to/data/trainB' respectively.
    You can train the model with the dataset flag '--dataroot /path/to/data'.
    Similarly, you need to prepare two directories:
    '/path/to/data/testA' and '/path/to/data/testB' during test time.
    """

    def __init__(self, opt):
        """Initialize this dataset class.

        Parameters:
            opt (Option class) -- stores all the experiment flags; needs to be a subclass of BaseOptions
        """
        BaseDataset.__init__(self, opt)
        self.dir_A = os.path.join(opt.dataroot, opt.phase + 'A')  # create a path '/path/to/data/trainA'
        self.dir_B = os.path.join(opt.dataroot, opt.phase + 'B')  # create a path '/path/to/data/trainB'

        self.A_paths = sorted(make_dataset(self.dir_A, opt.max_dataset_size))   # load images from '/path/to/data/trainA'
        self.B_paths = sorted(make_dataset(self.dir_B, opt.max_dataset_size))    # load images from '/path/to/data/trainB'
        self.A_size = len(self.A_paths)  # get the size of dataset A
        self.B_size = len(self.B_paths)  # get the size of dataset B
        btoA = self.opt.direction == 'BtoA'
        input_nc = self.opt.output_nc if btoA else self.opt.input_nc       # get the number of channels of input image
        output_nc = self.opt.input_nc if btoA else self.opt.output_nc      # get the number of channels of output image
        self.transform_A = get_transform(self.opt, grayscale=(input_nc == 1))
        self.transform_B = get_transform(self.opt, grayscale=(output_nc == 1))

    def __getitem__(self, index):
        """Return a data point and its metadata information.

        Parameters:
            index (int)      -- a random integer for data indexing

        Returns a dictionary that contains A, B, A_paths and B_paths
            A (tensor)       -- an image in the input domain
            B (tensor)       -- its corresponding image in the target domain
            A_paths (str)    -- image paths
            B_paths (str)    -- image paths
        """
        A_path = self.A_paths[index % self.A_size]  # make sure index is within the range
        if self.opt.serial_batches:   # fixed pairing: make sure index_B is within the range
            index_B = index % self.B_size
        else:   # randomize the index for domain B to avoid fixed pairs.
            index_B = random.randint(0, self.B_size - 1)
        B_path = self.B_paths[index_B]
        A_img = Image.open(A_path).convert('RGB')
        B_img = Image.open(B_path).convert('RGB')
        # apply image transformation
        A = self.transform_A(A_img)
        B = self.transform_B(B_img)

        return {'A': A, 'B': B, 'A_paths': A_path, 'B_paths': B_path}

    def __len__(self):
        """Return the total number of images in the dataset.

        As we have two datasets with potentially different number of images,
        we take the maximum of the two sizes.
        """
        return max(self.A_size, self.B_size)


In [None]:
# aligned_dataset.py
# this dataset is used for pix2pix

import os
from data.base_dataset import BaseDataset, get_params, get_transform
from data.image_folder import make_dataset
from PIL import Image


class AlignedDataset(BaseDataset):
    """A dataset class for paired image dataset.

    It assumes that the directory '/path/to/data/train' contains image pairs in the form of {A,B}.
    During test time, you need to prepare a directory '/path/to/data/test'.
    """

    def __init__(self, opt):
        """Initialize this dataset class.

        Parameters:
            opt (Option class) -- stores all the experiment flags; needs to be a subclass of BaseOptions
        """
        BaseDataset.__init__(self, opt)
        self.dir_AB = os.path.join(opt.dataroot, opt.phase)  # get the image directory
        self.AB_paths = sorted(make_dataset(self.dir_AB, opt.max_dataset_size))  # get image paths
        assert(self.opt.load_size >= self.opt.crop_size)   # crop_size should be smaller than the size of loaded image
        self.input_nc = self.opt.output_nc if self.opt.direction == 'BtoA' else self.opt.input_nc
        self.output_nc = self.opt.input_nc if self.opt.direction == 'BtoA' else self.opt.output_nc

    def __getitem__(self, index):
        """Return a data point and its metadata information.

        Parameters:
            index - - a random integer for data indexing

        Returns a dictionary that contains A, B, A_paths and B_paths
            A (tensor) - - an image in the input domain
            B (tensor) - - its corresponding image in the target domain
            A_paths (str) - - image paths
            B_paths (str) - - image paths (same as A_paths)
        """
        # read an image given a random integer index
        AB_path = self.AB_paths[index]
        AB = Image.open(AB_path).convert('RGB')
        # split AB image into A and B
        w, h = AB.size
        w2 = int(w / 2)
        A = AB.crop((0, 0, w2, h))
        B = AB.crop((w2, 0, w, h))

        # apply the same transform to both A and B
        transform_params = get_params(self.opt, A.size)
        A_transform = get_transform(self.opt, transform_params, grayscale=(self.input_nc == 1))
        B_transform = get_transform(self.opt, transform_params, grayscale=(self.output_nc == 1))

        A = A_transform(A)
        B = B_transform(B)

        return {'A': A, 'B': B, 'A_paths': AB_path, 'B_paths': AB_path}

    def __len__(self):
        """Return the total number of images in the dataset."""
        return len(self.AB_paths)
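
Since `AlignedDataset` simply cuts each file in half, preparing paired data means concatenating A and B horizontally. A NumPy sketch of both directions follows, with images as height x width x channel arrays instead of PIL images; `combine_ab` and `split_ab` are hypothetical helper names.

```python
import numpy as np

def combine_ab(a, b):
    """Place A (left) and B (right) side by side, the layout AlignedDataset expects."""
    if a.shape != b.shape:
        raise ValueError('A and B must have the same shape')
    return np.concatenate([a, b], axis=1)   # axis 1 is the width for H x W x C arrays

def split_ab(ab):
    """Inverse of combine_ab: cut at w2 = w // 2, as AlignedDataset does with int(w / 2)."""
    w2 = ab.shape[1] // 2
    return ab[:, :w2], ab[:, w2:]

a = np.zeros((256, 256, 3), dtype=np.uint8)       # e.g. a label map
b = np.full((256, 256, 3), 255, dtype=np.uint8)   # e.g. the corresponding photo
ab = combine_ab(a, b)
a2, b2 = split_ab(ab)
```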


### The model

Next, let's look at how the model is created.

In [None]:
model = create_model(opt)      # create a model given opt.model and other options
model.setup(opt)               # regular setup: load and print networks; create schedulers
visualizer = Visualizer(opt)   # create a visualizer that display/save images and plots

The <code>create_model()</code> function creates the model selected by <code>--model</code>. If you select 'cycle_gan' (the default), the code goes to <code>/models/cycle_gan_model.py</code>, and if you select 'pix2pix', it goes to <code>/models/pix2pix_model.py</code>.

Before reading the model code, look again at the training loop. Every step that touches the networks is a method of the model class, so reading the model class tells us how training works.

#### Important functions
The model methods called during training are:

In [None]:
model.update_learning_rate()    # update learning rates in the beginning of every epoch.
model.set_input(data)         # unpack data from dataset and apply preprocessing
model.optimize_parameters()   # calculate loss functions, get gradients, update network weights
model.compute_visuals()
model.save_networks(save_suffix)
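
Because `train.py` only calls these methods, any class that provides them can be driven by the same loop. The deliberately tiny example below is hypothetical (not part of the repository): it fits $y = wx$ by stochastic gradient descent, and its learning-rate decay is an arbitrary choice made for illustration.

```python
class ToyModel:
    """Hypothetical model exposing the methods train.py calls; it fits y = w * x with SGD."""

    def __init__(self, lr=0.05, lr_decay=0.95):
        self.w = 0.0
        self.lr = lr
        self.lr_decay = lr_decay
        self.loss = float('nan')

    def update_learning_rate(self):
        # Called once at the start of every epoch in train.py; here a simple multiplicative decay.
        self.lr *= self.lr_decay

    def set_input(self, data):
        # Unpack one sample; the keys mimic the 'A'/'B' dictionaries returned by the dataset classes.
        self.x, self.y = data['A'], data['B']

    def optimize_parameters(self):
        # Forward pass, squared-error loss, gradient, and one SGD update.
        err = self.w * self.x - self.y
        self.loss = err * err
        self.w -= self.lr * 2.0 * err * self.x

    def get_current_losses(self):
        return {'mse': self.loss}

dataset = [{'A': 1.0, 'B': 2.0}, {'A': 2.0, 'B': 4.0}]   # samples of y = 2x
model = ToyModel()
for epoch in range(1, 21):   # same call order as the loop in train.py
    model.update_learning_rate()
    for data in dataset:
        model.set_input(data)
        model.optimize_parameters()
```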

#### CycleGAN model

Let's look at the initialization function (<code>__init__</code>) in <code>cycle_gan_model.py</code>:

In [None]:
def __init__(self, opt):
    """Initialize the CycleGAN class.

    Parameters:
        opt (Option class)-- stores all the experiment flags; needs to be a subclass of BaseOptions
    """
    BaseModel.__init__(self, opt)
    # specify the training losses you want to print out. The training/test scripts will call <BaseModel.get_current_losses>
    self.loss_names = ['D_A', 'G_A', 'cycle_A', 'idt_A', 'D_B', 'G_B', 'cycle_B', 'idt_B']
    # specify the images you want to save/display. The training/test scripts will call <BaseModel.get_current_visuals>
    visual_names_A = ['real_A', 'fake_B', 'rec_A']
    visual_names_B = ['real_B', 'fake_A', 'rec_B']
    if self.isTrain and self.opt.lambda_identity > 0.0:  # if identity loss is used, we also visualize idt_B=G_B(A) and idt_A=G_A(B)
        visual_names_A.append('idt_B')
        visual_names_B.append('idt_A')

    self.visual_names = visual_names_A + visual_names_B  # combine visualizations for A and B
    # specify the models you want to save to the disk. The training/test scripts will call <BaseModel.save_networks> and <BaseModel.load_networks>.
    if self.isTrain:
        self.model_names = ['G_A', 'G_B', 'D_A', 'D_B']
    else:  # during test time, only load Gs
        self.model_names = ['G_A', 'G_B']

    # define networks (both Generators and discriminators)
    # The naming is different from those used in the paper.
    # Code (vs. paper): G_A (G), G_B (F), D_A (D_Y), D_B (D_X)
    self.netG_A = networks.define_G(opt.input_nc, opt.output_nc, opt.ngf, opt.netG, opt.norm,
                                    not opt.no_dropout, opt.init_type, opt.init_gain, self.gpu_ids)
    self.netG_B = networks.define_G(opt.output_nc, opt.input_nc, opt.ngf, opt.netG, opt.norm,
                                    not opt.no_dropout, opt.init_type, opt.init_gain, self.gpu_ids)

    if self.isTrain:  # define discriminators
        self.netD_A = networks.define_D(opt.output_nc, opt.ndf, opt.netD,
                                        opt.n_layers_D, opt.norm, opt.init_type, opt.init_gain, self.gpu_ids)
        self.netD_B = networks.define_D(opt.input_nc, opt.ndf, opt.netD,
                                        opt.n_layers_D, opt.norm, opt.init_type, opt.init_gain, self.gpu_ids)

    if self.isTrain:
        if opt.lambda_identity > 0.0:  # only works when input and output images have the same number of channels
            assert(opt.input_nc == opt.output_nc)
        self.fake_A_pool = ImagePool(opt.pool_size)  # create image buffer to store previously generated images
        self.fake_B_pool = ImagePool(opt.pool_size)  # create image buffer to store previously generated images
        # define loss functions
        self.criterionGAN = networks.GANLoss(opt.gan_mode).to(self.device)  # define GAN loss.
        self.criterionCycle = torch.nn.L1Loss()
        self.criterionIdt = torch.nn.L1Loss()
        # initialize optimizers; schedulers will be automatically created by function <BaseModel.setup>.
        self.optimizer_G = torch.optim.Adam(itertools.chain(self.netG_A.parameters(), self.netG_B.parameters()), lr=opt.lr, betas=(opt.beta1, 0.999))
        self.optimizer_D = torch.optim.Adam(itertools.chain(self.netD_A.parameters(), self.netD_B.parameters()), lr=opt.lr, betas=(opt.beta1, 0.999))
        self.optimizers.append(self.optimizer_G)
        self.optimizers.append(self.optimizer_D)
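
`ImagePool(opt.pool_size)` keeps a history of up to 50 generated images (`--pool_size 50`) so that the discriminators are updated with a mix of current and earlier fakes, which reduces oscillation during training. Below is a simplified pure-Python sketch of the idea, modeled on the repository's `util/image_pool.py` (not shown in this lab) but handling one item per call rather than a batch; treat details such as the 50% swap probability as assumptions.

```python
import random

class ImagePoolSketch:
    """Simplified history buffer of generated images (one item per query)."""

    def __init__(self, pool_size, rng=None):
        self.pool_size = pool_size
        self.images = []
        self.rng = rng if rng is not None else random.Random()

    def query(self, image):
        if self.pool_size == 0:                 # buffer disabled: pass the new fake through
            return image
        if len(self.images) < self.pool_size:   # fill the buffer first
            self.images.append(image)
            return image
        if self.rng.random() > 0.5:             # about half the time: return an older fake
            idx = self.rng.randint(0, self.pool_size - 1)
            old = self.images[idx]
            self.images[idx] = image            # and store the new fake in its place
            return old
        return image                            # otherwise: return the current fake

pool = ImagePoolSketch(3, random.Random(0))
first = [pool.query(i) for i in range(3)]
later = [(i, pool.query(i)) for i in range(3, 100)]
```

The first `pool_size` queries simply return their inputs; after that, the discriminator sometimes sees an earlier fake instead of the latest one.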

#### Generator and Discriminator

CycleGANs contain two generators and two discriminators, usually with the same structure. The code names them differently from the paper (code = paper):
- G_A = G
- G_B = F
- D_A = D_Y
- D_B = D_X

<img src="img/CycleGANArc.png" title="CycleGAN" style="width: 300px;" />

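In the default configuration, the generators use 9 ResNet blocks (<code>--netG resnet_9blocks</code>) and the discriminators are PatchGAN classifiers built from a few strided convolutional layers (<code>--netD basic</code> with <code>--n_layers_D 3</code>).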
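
A PatchGAN discriminator outputs a grid of real/fake scores, one per overlapping image patch, rather than a single number. The sketch below computes the patch size (receptive field) and the output grid size from a list of convolution layers. The layer list is an assumption about the basic discriminator with `--n_layers_D 3`: five 4x4 convolutions with padding 1 and strides 2, 2, 2, 1, 1 (the `NLayerDiscriminator` class itself is not shown in this lab).

```python
def receptive_field(layers):
    """Input patch size seen by one output unit; layers are (kernel, stride) from input to output."""
    r = 1
    for k, s in reversed(layers):
        r = (r - 1) * s + k
    return r

def output_size(size, layers, padding=1):
    """Spatial size after applying the convolutions in order."""
    for k, s in layers:
        size = (size + 2 * padding - k) // s + 1
    return size

# Assumed layer list for --netD basic with --n_layers_D 3.
BASIC_D = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]
patch = receptive_field(BASIC_D)
grid = output_size(256, BASIC_D)
```

Under these assumptions each score looks at a 70x70 patch, and a 256x256 input yields a 30x30 grid of scores.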

#### The generator

The generator code is in <code>/models/networks.py</code>:

In [None]:
def define_G(input_nc, output_nc, ngf, netG, norm='batch', use_dropout=False, init_type='normal', init_gain=0.02, gpu_ids=[]):
    """Create a generator

    Parameters:
        input_nc (int) -- the number of channels in input images
        output_nc (int) -- the number of channels in output images
        ngf (int) -- the number of filters in the last conv layer
        netG (str) -- the architecture's name: resnet_9blocks | resnet_6blocks | unet_256 | unet_128
        norm (str) -- the name of normalization layers used in the network: batch | instance | none
        use_dropout (bool) -- if use dropout layers.
        init_type (str)    -- the name of our initialization method.
        init_gain (float)  -- scaling factor for normal, xavier and orthogonal.
        gpu_ids (int list) -- which GPUs the network runs on: e.g., 0,1,2

    Returns a generator

    Our current implementation provides two types of generators:
        U-Net: [unet_128] (for 128x128 input images) and [unet_256] (for 256x256 input images)
        The original U-Net paper: https://arxiv.org/abs/1505.04597

        Resnet-based generator: [resnet_6blocks] (with 6 Resnet blocks) and [resnet_9blocks] (with 9 Resnet blocks)
        Resnet-based generator consists of several Resnet blocks between a few downsampling/upsampling operations.
        We adapt Torch code from Justin Johnson's neural style transfer project (https://github.com/jcjohnson/fast-neural-style).


    The generator has been initialized by <init_net>. It uses RELU for non-linearity.
    """
    net = None
    norm_layer = get_norm_layer(norm_type=norm)

    if netG == 'resnet_9blocks':
        net = ResnetGenerator(input_nc, output_nc, ngf, norm_layer=norm_layer, use_dropout=use_dropout, n_blocks=9)
    elif netG == 'resnet_6blocks':
        net = ResnetGenerator(input_nc, output_nc, ngf, norm_layer=norm_layer, use_dropout=use_dropout, n_blocks=6)
    elif netG == 'unet_128':
        net = UnetGenerator(input_nc, output_nc, 7, ngf, norm_layer=norm_layer, use_dropout=use_dropout)
    elif netG == 'unet_256':
        net = UnetGenerator(input_nc, output_nc, 8, ngf, norm_layer=norm_layer, use_dropout=use_dropout)
    else:
        raise NotImplementedError('Generator model name [%s] is not recognized' % netG)
    return init_net(net, init_type, init_gain, gpu_ids)

And here is the <code>ResnetGenerator</code> class:
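
It may help to trace how a 256x256 RGB image flows through the default `resnet_9blocks` generator with `ngf=64`. The hypothetical helper below only does the shape arithmetic, using the standard PyTorch output-size formulas for `Conv2d` and `ConvTranspose2d` with the kernel, stride, and padding values used in `ResnetGenerator`. It returns (stage, channels, height) triples; width behaves like height.

```python
def resnet_generator_shapes(output_nc=3, ngf=64, n_blocks=9, size=256, n_downsampling=2):
    """Shape trace of ResnetGenerator for a square input of side `size`."""
    shapes = []
    # ReflectionPad2d(3) + 7x7 conv with padding 0: size + 6 - 7 + 1 = size
    c, h = ngf, size
    shapes.append(('stem', c, h))
    for i in range(n_downsampling):
        # 3x3 conv, stride 2, padding 1: halves the resolution, doubles the channels
        c, h = c * 2, (h + 2 * 1 - 3) // 2 + 1
        shapes.append(('down%d' % (i + 1), c, h))
    # ResNet blocks keep both channels and resolution
    shapes.append(('resblocks x%d' % n_blocks, c, h))
    for i in range(n_downsampling):
        # ConvTranspose2d(k=3, s=2, p=1, output_padding=1): (h - 1) * 2 - 2 + 3 + 1 = 2h
        c, h = c // 2, (h - 1) * 2 - 2 * 1 + 3 + 1
        shapes.append(('up%d' % (i + 1), c, h))
    # ReflectionPad2d(3) + 7x7 conv to output_nc channels, then Tanh
    shapes.append(('out', output_nc, h))
    return shapes

shapes = resnet_generator_shapes()
```

The nine ResNet blocks therefore operate on 256-channel 64x64 feature maps, and the output has the same size as the input.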

In [None]:
class ResnetGenerator(nn.Module):
    """Resnet-based generator that consists of Resnet blocks between a few downsampling/upsampling operations.

    We adapt Torch code and idea from Justin Johnson's neural style transfer project(https://github.com/jcjohnson/fast-neural-style)
    """

    def __init__(self, input_nc, output_nc, ngf=64, norm_layer=nn.BatchNorm2d, use_dropout=False, n_blocks=6, padding_type='reflect'):
        """Construct a Resnet-based generator

        Parameters:
            input_nc (int)      -- the number of channels in input images
            output_nc (int)     -- the number of channels in output images
            ngf (int)           -- the number of filters in the last conv layer
            norm_layer          -- normalization layer
            use_dropout (bool)  -- if use dropout layers
            n_blocks (int)      -- the number of ResNet blocks
            padding_type (str)  -- the name of padding layer in conv layers: reflect | replicate | zero
        """
        assert(n_blocks >= 0)
        super(ResnetGenerator, self).__init__()
        if type(norm_layer) == functools.partial:
            use_bias = norm_layer.func == nn.InstanceNorm2d
        else:
            use_bias = norm_layer == nn.InstanceNorm2d

        model = [nn.ReflectionPad2d(3),
                 nn.Conv2d(input_nc, ngf, kernel_size=7, padding=0, bias=use_bias),
                 norm_layer(ngf),
                 nn.ReLU(True)]

        n_downsampling = 2
        for i in range(n_downsampling):  # add downsampling layers
            mult = 2 ** i
            model += [nn.Conv2d(ngf * mult, ngf * mult * 2, kernel_size=3, stride=2, padding=1, bias=use_bias),
                      norm_layer(ngf * mult * 2),
                      nn.ReLU(True)]

        mult = 2 ** n_downsampling
        for i in range(n_blocks):       # add ResNet blocks

            model += [ResnetBlock(ngf * mult, padding_type=padding_type, norm_layer=norm_layer, use_dropout=use_dropout, use_bias=use_bias)]

        for i in range(n_downsampling):  # add upsampling layers
            mult = 2 ** (n_downsampling - i)
            model += [nn.ConvTranspose2d(ngf * mult, int(ngf * mult / 2),
                                         kernel_size=3, stride=2,
                                         padding=1, output_padding=1,
                                         bias=use_bias),
                      norm_layer(int(ngf * mult / 2)),
                      nn.ReLU(True)]
        model += [nn.ReflectionPad2d(3)]
        model += [nn.Conv2d(ngf, output_nc, kernel_size=7, padding=0)]
        model += [nn.Tanh()]

        self.model = nn.Sequential(*model)

    def forward(self, input):
        """Standard forward"""
        return self.model(input)


class ResnetBlock(nn.Module):
    """Define a Resnet block"""

    def __init__(self, dim, padding_type, norm_layer, use_dropout, use_bias):
        """Initialize the Resnet block

        A resnet block is a conv block with skip connections
        We construct a conv block with build_conv_block function,
        and implement skip connections in <forward> function.
        Original Resnet paper: https://arxiv.org/pdf/1512.03385.pdf
        """
        super(ResnetBlock, self).__init__()
        self.conv_block = self.build_conv_block(dim, padding_type, norm_layer, use_dropout, use_bias)

    def build_conv_block(self, dim, padding_type, norm_layer, use_dropout, use_bias):
        """Construct a convolutional block.

        Parameters:
            dim (int)           -- the number of channels in the conv layer.
            padding_type (str)  -- the name of padding layer: reflect | replicate | zero
            norm_layer          -- normalization layer
            use_dropout (bool)  -- if use dropout layers.
            use_bias (bool)     -- if the conv layer uses bias or not

        Returns a conv block (with a conv layer, a normalization layer, and a non-linearity layer (ReLU))
        """
        conv_block = []
        p = 0
        if padding_type == 'reflect':
            conv_block += [nn.ReflectionPad2d(1)]
        elif padding_type == 'replicate':
            conv_block += [nn.ReplicationPad2d(1)]
        elif padding_type == 'zero':
            p = 1
        else:
            raise NotImplementedError('padding [%s] is not implemented' % padding_type)

        conv_block += [nn.Conv2d(dim, dim, kernel_size=3, padding=p, bias=use_bias), norm_layer(dim), nn.ReLU(True)]
        if use_dropout:
            conv_block += [nn.Dropout(0.5)]

        p = 0
        if padding_type == 'reflect':
            conv_block += [nn.ReflectionPad2d(1)]
        elif padding_type == 'replicate':
            conv_block += [nn.ReplicationPad2d(1)]
        elif padding_type == 'zero':
            p = 1
        else:
            raise NotImplementedError('padding [%s] is not implemented' % padding_type)
        conv_block += [nn.Conv2d(dim, dim, kernel_size=3, padding=p, bias=use_bias), norm_layer(dim)]

        return nn.Sequential(*conv_block)

    def forward(self, x):
        """Forward function (with skip connections)"""
        out = x + self.conv_block(x)  # add skip connections
        return out
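The residual addition `x + self.conv_block(x)` only works because each block preserves the tensor shape. The pure-Python sketch below traces shapes through `ResnetGenerator` to show this. It assumes a 256x256 input, `ngf=64`, and `output_nc=3`. The helper names are hypothetical. The formulas are the standard output-size formulas for `nn.Conv2d` and `nn.ConvTranspose2d` with dilation 1.

```python
def conv_out(size, kernel, stride, padding):
    """Output height/width of nn.Conv2d (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def conv_transpose_out(size, kernel, stride, padding, output_padding):
    """Output height/width of nn.ConvTranspose2d (dilation 1)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


def trace_resnet_generator(size=256, output_nc=3, ngf=64, n_downsampling=2):
    """Return (stage, channels, spatial size) after each stage of the generator."""
    stages = []
    # ReflectionPad2d(3) adds 3 pixels per side, then a 7x7 conv with padding=0
    size = conv_out(size + 6, kernel=7, stride=1, padding=0)
    channels = ngf
    stages.append(("c7s1", channels, size))
    for i in range(n_downsampling):
        # 3x3 conv, stride 2, padding 1: halves the size, doubles the channels
        size = conv_out(size, kernel=3, stride=2, padding=1)
        channels = ngf * 2 ** (i + 1)
        stages.append(("down", channels, size))
    # each ResnetBlock pads by 1 before each 3x3 conv, so the shape is unchanged
    stages.append(("resnet blocks", channels, size))
    for _ in range(n_downsampling):
        # transposed conv with output_padding=1: doubles the size, halves the channels
        size = conv_transpose_out(size, kernel=3, stride=2, padding=1, output_padding=1)
        channels //= 2
        stages.append(("up", channels, size))
    size = conv_out(size + 6, kernel=7, stride=1, padding=0)
    stages.append(("c7s1 + tanh", output_nc, size))
    return stages


for stage in trace_resnet_generator():
    print(stage)
```

With these defaults, the ResNet blocks operate on 256-channel, 64x64 feature maps. The final output has the same 256x256 size as the input.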

#### The discriminator

For the discriminator, the function <code>define_D</code> is shown below:

In [None]:
def define_D(input_nc, ndf, netD, n_layers_D=3, norm='batch', init_type='normal', init_gain=0.02, gpu_ids=[]):
    """Create a discriminator

    Parameters:
        input_nc (int)     -- the number of channels in input images
        ndf (int)          -- the number of filters in the first conv layer
        netD (str)         -- the architecture's name: basic | n_layers | pixel
        n_layers_D (int)   -- the number of conv layers in the discriminator; effective when netD=='n_layers'
        norm (str)         -- the type of normalization layers used in the network.
        init_type (str)    -- the name of the initialization method.
        init_gain (float)  -- scaling factor for normal, xavier and orthogonal.
        gpu_ids (int list) -- which GPUs the network runs on: e.g., 0,1,2

    Returns a discriminator

    Our current implementation provides three types of discriminators:
        [basic]: 'PatchGAN' classifier described in the original pix2pix paper.
        It can classify whether 70×70 overlapping patches are real or fake.
        Such a patch-level discriminator architecture has fewer parameters
        than a full-image discriminator and can work on arbitrarily-sized images
        in a fully convolutional fashion.

        [n_layers]: With this mode, you can specify the number of conv layers in the discriminator
        with the parameter <n_layers_D> (default=3 as used in [basic] (PatchGAN).)

        [pixel]: 1x1 PixelGAN discriminator can classify whether a pixel is real or not.
        It encourages greater color diversity but has no effect on spatial statistics.

    The discriminator is initialized by <init_net>. It uses Leaky ReLU for non-linearity.
    """
    net = None
    norm_layer = get_norm_layer(norm_type=norm)

    if netD == 'basic':  # default PatchGAN classifier
        net = NLayerDiscriminator(input_nc, ndf, n_layers=3, norm_layer=norm_layer)
    elif netD == 'n_layers':  # more options
        net = NLayerDiscriminator(input_nc, ndf, n_layers_D, norm_layer=norm_layer)
    elif netD == 'pixel':     # classify if each pixel is real or fake
        net = PixelDiscriminator(input_nc, ndf, norm_layer=norm_layer)
    else:
        raise NotImplementedError('Discriminator model name [%s] is not recognized' % netD)
    return init_net(net, init_type, init_gain, gpu_ids)

And here is the <code>NLayerDiscriminator</code> class:

In [None]:
class NLayerDiscriminator(nn.Module):
    """Defines a PatchGAN discriminator"""

    def __init__(self, input_nc, ndf=64, n_layers=3, norm_layer=nn.BatchNorm2d):
        """Construct a PatchGAN discriminator

        Parameters:
            input_nc (int)  -- the number of channels in input images
            ndf (int)       -- the number of filters in the first conv layer
            n_layers (int)  -- the number of conv layers in the discriminator
            norm_layer      -- normalization layer
        """
        super(NLayerDiscriminator, self).__init__()
        if type(norm_layer) == functools.partial:  # no need to use bias as BatchNorm2d has affine parameters
            use_bias = norm_layer.func == nn.InstanceNorm2d
        else:
            use_bias = norm_layer == nn.InstanceNorm2d

        kw = 4
        padw = 1
        sequence = [nn.Conv2d(input_nc, ndf, kernel_size=kw, stride=2, padding=padw), nn.LeakyReLU(0.2, True)]
        nf_mult = 1
        nf_mult_prev = 1
        for n in range(1, n_layers):  # gradually increase the number of filters
            nf_mult_prev = nf_mult
            nf_mult = min(2 ** n, 8)
            sequence += [
                nn.Conv2d(ndf * nf_mult_prev, ndf * nf_mult, kernel_size=kw, stride=2, padding=padw, bias=use_bias),
                norm_layer(ndf * nf_mult),
                nn.LeakyReLU(0.2, True)
            ]

        nf_mult_prev = nf_mult
        nf_mult = min(2 ** n_layers, 8)
        sequence += [
            nn.Conv2d(ndf * nf_mult_prev, ndf * nf_mult, kernel_size=kw, stride=1, padding=padw, bias=use_bias),
            norm_layer(ndf * nf_mult),
            nn.LeakyReLU(0.2, True)
        ]

        sequence += [nn.Conv2d(ndf * nf_mult, 1, kernel_size=kw, stride=1, padding=padw)]  # output 1 channel prediction map
        self.model = nn.Sequential(*sequence)

    def forward(self, input):
        """Standard forward."""
        return self.model(input)
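To see what "patch-level" means in practice, compute the shape of the map that `NLayerDiscriminator` outputs. The pure-Python sketch below follows the layer list above with `kw=4`, `padw=1`, `ndf=64`, and `n_layers=3`. The helper names are hypothetical.

```python
def conv_out(size, kernel=4, stride=2, padding=1):
    """Output height/width of nn.Conv2d (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def patchgan_output_size(size, n_layers=3):
    """Spatial size of the prediction map for a square input of the given size."""
    for _ in range(n_layers):  # first conv + (n_layers - 1) conv layers, all stride 2
        size = conv_out(size, stride=2)
    for _ in range(2):  # the stride-1 conv, then the 1-channel output conv
        size = conv_out(size, stride=1)
    return size


def patchgan_channels(ndf=64, n_layers=3):
    """Number of output channels of each conv layer, in order."""
    channels = [ndf]
    for n in range(1, n_layers):
        channels.append(ndf * min(2 ** n, 8))
    channels.append(ndf * min(2 ** n_layers, 8))
    channels.append(1)
    return channels


print(patchgan_output_size(256))  # 30
print(patchgan_channels())        # [64, 128, 256, 512, 1]
```

A 256x256 image is therefore scored by a 30x30 map of real/fake logits. Each logit looks at an overlapping patch of the input rather than at the whole image.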

### Loss function

For the GAN loss, CycleGAN uses **lsgan** by default, i.e., a least-squares loss computed with <code>MSELoss</code>. The <code>GANLoss</code> class is defined in <code>networks.py</code>.

The other losses, i.e., the cycle loss and the identity loss, use <code>L1Loss()</code>.

In [None]:
# from the line self.criterionGAN = networks.GANLoss(opt.gan_mode).to(self.device)  # define GAN loss.

class GANLoss(nn.Module):
    """Define different GAN objectives.

    The GANLoss class abstracts away the need to create the target label tensor
    that has the same size as the input.
    """

    def __init__(self, gan_mode, target_real_label=1.0, target_fake_label=0.0):
        """ Initialize the GANLoss class.

        Parameters:
            gan_mode (str) -- the type of GAN objective. It currently supports vanilla, lsgan, and wgangp.
            target_real_label (float) -- label for a real image
            target_fake_label (float) -- label for a fake image

        Note: Do not use sigmoid as the last layer of Discriminator.
        LSGAN needs no sigmoid. vanilla GANs will handle it with BCEWithLogitsLoss.
        """
        super(GANLoss, self).__init__()
        self.register_buffer('real_label', torch.tensor(target_real_label))
        self.register_buffer('fake_label', torch.tensor(target_fake_label))
        self.gan_mode = gan_mode
        if gan_mode == 'lsgan':
            self.loss = nn.MSELoss()
        elif gan_mode == 'vanilla':
            self.loss = nn.BCEWithLogitsLoss()
        elif gan_mode in ['wgangp']:
            self.loss = None
        else:
            raise NotImplementedError('gan mode %s not implemented' % gan_mode)

    def get_target_tensor(self, prediction, target_is_real):
        """Create label tensors with the same size as the input.

        Parameters:
            prediction (tensor) -- typically the prediction from a discriminator
            target_is_real (bool) -- if the ground truth label is for real images or fake images

        Returns:
            A label tensor filled with ground truth label, and with the size of the input
        """

        if target_is_real:
            target_tensor = self.real_label
        else:
            target_tensor = self.fake_label
        return target_tensor.expand_as(prediction)

    def __call__(self, prediction, target_is_real):
        """Calculate loss given Discriminator's output and ground truth labels.

        Parameters:
            prediction (tensor) -- typically the prediction output from a discriminator
            target_is_real (bool) -- if the ground truth label is for real images or fake images

        Returns:
            the calculated loss.
        """
        if self.gan_mode in ['lsgan', 'vanilla']:
            target_tensor = self.get_target_tensor(prediction, target_is_real)
            loss = self.loss(prediction, target_tensor)
        elif self.gan_mode == 'wgangp':
            if target_is_real:
                loss = -prediction.mean()
            else:
                loss = prediction.mean()
        return loss
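The pure-Python sketch below makes the three modes concrete. A flattened prediction map (a list of floats) stands in for the discriminator's output tensor. The sketch mirrors the logic of `GANLoss`, but it is not the repository's code. Note that one scalar label is compared with every element, which is what `expand_as` achieves for tensors. The WGAN-GP gradient penalty is not part of this loss.

```python
import math


def lsgan_loss(pred, target_is_real, real_label=1.0, fake_label=0.0):
    """Mean squared error against a constant label map (like nn.MSELoss)."""
    target = real_label if target_is_real else fake_label
    return sum((p - target) ** 2 for p in pred) / len(pred)


def vanilla_loss(logits, target_is_real):
    """Binary cross-entropy on logits (like nn.BCEWithLogitsLoss)."""
    t = 1.0 if target_is_real else 0.0
    # numerically stable form of -[t*log(sigmoid(x)) + (1-t)*log(1-sigmoid(x))]
    total = sum(max(x, 0.0) - x * t + math.log1p(math.exp(-abs(x))) for x in logits)
    return total / len(logits)


def wgangp_loss(pred, target_is_real):
    """Critic term: push the mean score up for real samples, down for fake ones."""
    mean = sum(pred) / len(pred)
    return -mean if target_is_real else mean


patch_scores = [0.9, 0.2, 0.6, 1.1]  # hypothetical 2x2 prediction map, flattened
print(lsgan_loss(patch_scores, True), lsgan_loss(patch_scores, False))
print(vanilla_loss(patch_scores, True), wgangp_loss(patch_scores, True))
```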

### Optimizer

Both the generators and the discriminators are trained with the **Adam** optimizer. The code uses two optimizers instead of four: the parameters of the two generators are chained into one optimizer, and those of the two discriminators into another.

In [None]:
self.optimizer_G = torch.optim.Adam(itertools.chain(self.netG_A.parameters(), self.netG_B.parameters()), lr=opt.lr, betas=(opt.beta1, 0.999))
self.optimizer_D = torch.optim.Adam(itertools.chain(self.netD_A.parameters(), self.netD_B.parameters()), lr=opt.lr, betas=(opt.beta1, 0.999))
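The sketch below shows that one Adam optimizer built from `itertools.chain` updates the parameters of both networks. Two 1x1 convolutions stand in for `netG_A` and `netG_B`; they are hypothetical toy modules. `lr=2e-4` and `beta1=0.5` are assumed values that match the repository's usual defaults.

```python
import itertools

import torch
import torch.nn as nn

torch.manual_seed(0)
netG_A = nn.Conv2d(3, 3, kernel_size=1)  # hypothetical stand-in for G_A
netG_B = nn.Conv2d(3, 3, kernel_size=1)  # hypothetical stand-in for G_B

# one optimizer over the parameters of both generators
optimizer_G = torch.optim.Adam(itertools.chain(netG_A.parameters(), netG_B.parameters()),
                               lr=2e-4, betas=(0.5, 0.999))

before_A = netG_A.weight.detach().clone()
before_B = netG_B.weight.detach().clone()

x = torch.randn(2, 3, 8, 8)
loss = (netG_B(netG_A(x)) - x).abs().mean()  # depends on both nets, like the cycle loss
optimizer_G.zero_grad()
loss.backward()
optimizer_G.step()

n_params = sum(len(group["params"]) for group in optimizer_G.param_groups)
print(n_params)  # 4: weight and bias of each conv
```

`torch.optim.Adam` copies the parameters it receives into a list. This makes it safe to pass a one-shot iterator such as `itertools.chain`.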

### set_input function

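The <code>set_input()</code> function unpacks a batch from the data loader into the images <code>real_A</code> and <code>real_B</code> (plus their file paths), swapping the two domains if the direction is 'BtoA'.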

In [None]:
def set_input(self, input):
    """Unpack input data from the dataloader and perform necessary pre-processing steps.

    Parameters:
        input (dict): include the data itself and its metadata information.

    The option 'direction' can be used to swap domain A and domain B.
    """
    AtoB = self.opt.direction == 'AtoB'
    self.real_A = input['A' if AtoB else 'B'].to(self.device)
    self.real_B = input['B' if AtoB else 'A'].to(self.device)
    self.image_paths = input['A_paths' if AtoB else 'B_paths']
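The direction handling in `set_input` reduces to choosing dictionary keys. Below is a minimal pure-Python sketch. Strings stand in for the image tensors, and the `.to(self.device)` call is omitted. `unpack_batch` is a hypothetical name.

```python
def unpack_batch(batch, direction="AtoB"):
    """Return (real_A, real_B, image_paths), swapping the domains for 'BtoA'."""
    a_to_b = direction == "AtoB"
    real_A = batch["A" if a_to_b else "B"]
    real_B = batch["B" if a_to_b else "A"]
    image_paths = batch["A_paths" if a_to_b else "B_paths"]
    return real_A, real_B, image_paths


batch = {"A": "horse image", "B": "zebra image",
         "A_paths": ["horse_001.jpg"], "B_paths": ["zebra_001.jpg"]}
print(unpack_batch(batch, "AtoB"))  # ('horse image', 'zebra image', ['horse_001.jpg'])
print(unpack_batch(batch, "BtoA"))  # ('zebra image', 'horse image', ['zebra_001.jpg'])
```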

### Training step function

The training step is implemented in the <code>optimize_parameters</code> function, which performs CycleGAN training as follows:

**Note**: $fake$ denotes a generated image, $real$ a real image, and $rec$ a reconstructed image. $A$ and $B$ are the two image domains (datasets), $G$ denotes a generator, and $D$ denotes a discriminator.

1. Do forward propagation (function <code>forward()</code>). This step generates the translated and reconstructed images:
    1. Create <code>fake_B</code> from generator A with input <code>real_A</code>
        - $fake_B = G_A(real_A)$
    2. Create <code>rec_A</code> from generator B with input <code>fake_B</code>
        - $rec_A = G_B(fake_B)$
    3. Create <code>fake_A</code> from generator B with input <code>real_B</code>
        - $fake_A = G_B(real_B)$
    4. Create <code>rec_B</code> from generator A with input <code>fake_A</code>
        - $rec_B = G_A(fake_A)$
2. Do back propagation of G (function <code>backward_G()</code>). This step computes the gradients of both generators and updates their weights while the discriminators are frozen:
    1. Calculate the identity loss:
        - $\mathcal{L}_{idt_A} = \|G_A(real_B) - real_B\|_1$
        - $\mathcal{L}_{idt_B} = \|G_B(real_A) - real_A\|_1$
    2. Calculate the GAN loss using the discriminators, with an all-ones ("real") target:
        - $\mathcal{L}_{G_A} = \text{MSE}(D_A(fake_B), 1)$
        - $\mathcal{L}_{G_B} = \text{MSE}(D_B(fake_A), 1)$
    3. Calculate the cycle loss:
        - $\mathcal{L}_{cyc_A} = \| rec_A - real_A\|_1$
        - $\mathcal{L}_{cyc_B} = \| rec_B - real_B\|_1$
    4. Sum the weighted losses into $\mathcal{L}_G$ and backpropagate:
        - $\mathcal{L}_G = \mathcal{L}_{G_A} + \mathcal{L}_{G_B} + \lambda_A \mathcal{L}_{cyc_A} + \lambda_B \mathcal{L}_{cyc_B} + \lambda_{idt}(\lambda_B \mathcal{L}_{idt_A} + \lambda_A \mathcal{L}_{idt_B})$
3. Do back propagation of D (functions <code>backward_D_A</code> and <code>backward_D_B</code>). This step computes the discriminator losses and updates the discriminators' weights. The fake images are drawn from a pool of previously generated images:
    - $\mathcal{L}_{D_A} = \frac{1}{2}\left[\text{MSE}(D_A(real_B), 1) + \text{MSE}(D_A(fake_B), 0)\right]$
    - $\mathcal{L}_{D_B} = \frac{1}{2}\left[\text{MSE}(D_B(real_A), 1) + \text{MSE}(D_B(fake_A), 0)\right]$
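The three steps above can be condensed into a single toy training iteration. This sketch rests on several assumptions:

- Tiny two-layer conv nets stand in for the real generators and discriminators.
- Random tensors stand in for images.
- The LSGAN loss is written out directly.
- The image pool is skipped; the current fakes are used as-is.
- `lambda_A = lambda_B = 10` and `lambda_identity = 0.5` are assumed values that match the repository's usual defaults.

```python
import itertools

import torch
import torch.nn as nn

torch.manual_seed(0)


def tiny_net(in_channels, out_channels):
    """Hypothetical stand-in for ResnetGenerator / NLayerDiscriminator."""
    return nn.Sequential(nn.Conv2d(in_channels, 8, 3, padding=1), nn.ReLU(),
                         nn.Conv2d(8, out_channels, 3, padding=1))


G_A, G_B = tiny_net(3, 3), tiny_net(3, 3)  # G_A: A -> B, G_B: B -> A
D_A, D_B = tiny_net(3, 1), tiny_net(3, 1)  # D_A scores domain B, D_B scores domain A
mse, l1 = nn.MSELoss(), nn.L1Loss()
lambda_A = lambda_B = 10.0
lambda_idt = 0.5
opt_G = torch.optim.Adam(itertools.chain(G_A.parameters(), G_B.parameters()), lr=2e-4, betas=(0.5, 0.999))
opt_D = torch.optim.Adam(itertools.chain(D_A.parameters(), D_B.parameters()), lr=2e-4, betas=(0.5, 0.999))


def gan_loss(pred, is_real):
    """LSGAN: MSE against a map of ones (real) or zeros (fake)."""
    target = torch.ones_like(pred) if is_real else torch.zeros_like(pred)
    return mse(pred, target)


def set_discriminators_trainable(flag):
    for p in itertools.chain(D_A.parameters(), D_B.parameters()):
        p.requires_grad_(flag)


real_A, real_B = torch.randn(2, 3, 16, 16), torch.randn(2, 3, 16, 16)

# 1. forward: translate and reconstruct in both directions
fake_B = G_A(real_A)
rec_A = G_B(fake_B)
fake_A = G_B(real_B)
rec_B = G_A(fake_A)

# 2. generators: D is frozen, but gradients still flow through D into fake_A / fake_B
set_discriminators_trainable(False)
opt_G.zero_grad()
loss_G = (gan_loss(D_A(fake_B), True) + gan_loss(D_B(fake_A), True)
          + lambda_A * l1(rec_A, real_A) + lambda_B * l1(rec_B, real_B)
          + lambda_idt * (lambda_B * l1(G_A(real_B), real_B) + lambda_A * l1(G_B(real_A), real_A)))
loss_G.backward()
opt_G.step()

# 3. discriminators: detach the fakes so no gradient reaches the generators
set_discriminators_trainable(True)
opt_D.zero_grad()
loss_D_A = 0.5 * (gan_loss(D_A(real_B), True) + gan_loss(D_A(fake_B.detach()), False))
loss_D_B = 0.5 * (gan_loss(D_B(real_A), True) + gan_loss(D_B(fake_A.detach()), False))
(loss_D_A + loss_D_B).backward()
opt_D.step()
print(loss_G.item(), loss_D_A.item(), loss_D_B.item())
```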

#### optimize_parameters function

In [None]:
def optimize_parameters(self):
    """Calculate losses, gradients, and update network weights; called in every training iteration"""
    # forward
    self.forward()      # compute fake images and reconstruction images.
    # G_A and G_B
    self.set_requires_grad([self.netD_A, self.netD_B], False)  # Ds require no gradients when optimizing Gs
    self.optimizer_G.zero_grad()  # set G_A and G_B's gradients to zero
    self.backward_G()             # calculate gradients for G_A and G_B
    self.optimizer_G.step()       # update G_A and G_B's weights
    # D_A and D_B
    self.set_requires_grad([self.netD_A, self.netD_B], True)
    self.optimizer_D.zero_grad()   # set D_A and D_B's gradients to zero
    self.backward_D_A()      # calculate gradients for D_A
    self.backward_D_B()      # calculate gradients for D_B
    self.optimizer_D.step()  # update D_A and D_B's weights
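`set_requires_grad` is a helper of the base model class and is not shown in this lab. The minimal sketch below assumes it simply toggles `requires_grad` on every parameter of the given networks. The toy `nn.Linear` modules are hypothetical stand-ins for a generator and a discriminator.

```python
import torch
import torch.nn as nn


def set_requires_grad(nets, requires_grad=False):
    """Enable or disable gradient computation for all parameters of the given nets."""
    if not isinstance(nets, list):
        nets = [nets]
    for net in nets:
        if net is not None:
            for param in net.parameters():
                param.requires_grad = requires_grad


G = nn.Linear(4, 4)  # hypothetical stand-in for a generator
D = nn.Linear(4, 1)  # hypothetical stand-in for a discriminator
set_requires_grad(D, False)
loss = D(G(torch.randn(8, 4))).mean()  # a generator-style loss that passes through D
loss.backward()
print(G.weight.grad is not None, D.weight.grad is None)  # True True
```

Freezing D does not cut the gradient path through D to G. It only prevents D's own parameters from accumulating gradients during the generator update.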

#### forward function

In [None]:
def forward(self):
    """Run forward pass; called by both functions <optimize_parameters> and <test>."""
    self.fake_B = self.netG_A(self.real_A)  # G_A(A)
    self.rec_A = self.netG_B(self.fake_B)   # G_B(G_A(A))
    self.fake_A = self.netG_B(self.real_B)  # G_B(B)
    self.rec_B = self.netG_A(self.fake_A)   # G_A(G_B(B))

#### backward_G function

In [None]:
def backward_G(self):
    """Calculate the loss for generators G_A and G_B"""
    lambda_idt = self.opt.lambda_identity
    lambda_A = self.opt.lambda_A
    lambda_B = self.opt.lambda_B
    # Identity loss
    if lambda_idt > 0:
        # G_A should be identity if real_B is fed: ||G_A(B) - B||
        self.idt_A = self.netG_A(self.real_B)
        self.loss_idt_A = self.criterionIdt(self.idt_A, self.real_B) * lambda_B * lambda_idt
        # G_B should be identity if real_A is fed: ||G_B(A) - A||
        self.idt_B = self.netG_B(self.real_A)
        self.loss_idt_B = self.criterionIdt(self.idt_B, self.real_A) * lambda_A * lambda_idt
    else:
        self.loss_idt_A = 0
        self.loss_idt_B = 0

    # GAN loss D_A(G_A(A))
    self.loss_G_A = self.criterionGAN(self.netD_A(self.fake_B), True)
    # GAN loss D_B(G_B(B))
    self.loss_G_B = self.criterionGAN(self.netD_B(self.fake_A), True)
    # Forward cycle loss || G_B(G_A(A)) - A||
    self.loss_cycle_A = self.criterionCycle(self.rec_A, self.real_A) * lambda_A
    # Backward cycle loss || G_A(G_B(B)) - B||
    self.loss_cycle_B = self.criterionCycle(self.rec_B, self.real_B) * lambda_B
    # combined loss and calculate gradients
    self.loss_G = self.loss_G_A + self.loss_G_B + self.loss_cycle_A + self.loss_cycle_B + self.loss_idt_A + self.loss_idt_B
    self.loss_G.backward()

#### backward_D function

In [None]:
def backward_D_A(self):
    """Calculate GAN loss for discriminator D_A"""
    fake_B = self.fake_B_pool.query(self.fake_B)
    self.loss_D_A = self.backward_D_basic(self.netD_A, self.real_B, fake_B)

def backward_D_B(self):
    """Calculate GAN loss for discriminator D_B"""
    fake_A = self.fake_A_pool.query(self.fake_A)
    self.loss_D_B = self.backward_D_basic(self.netD_B, self.real_A, fake_A)

def backward_D_basic(self, netD, real, fake):
    """Calculate GAN loss for the discriminator

    Parameters:
        netD (network)      -- the discriminator D
        real (tensor array) -- real images
        fake (tensor array) -- images generated by a generator

    Return the discriminator loss.
    We also call loss_D.backward() to calculate the gradients.
    """
    # Real
    pred_real = netD(real)
    loss_D_real = self.criterionGAN(pred_real, True)
    # Fake
    pred_fake = netD(fake.detach())
    loss_D_fake = self.criterionGAN(pred_fake, False)
    # Combined loss and calculate gradients
    loss_D = (loss_D_real + loss_D_fake) * 0.5
    loss_D.backward()
    return loss_D
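`backward_D_A` and `backward_D_B` draw their fake images from `fake_B_pool` and `fake_A_pool`. These are buffers of previously generated images, so the discriminators also see older generator outputs. The pool class is not shown in this lab. Below is a hypothetical, simplified pure-Python sketch of such a history buffer that handles one image at a time. `HistoryPool` is an illustrative name, and the 50% swap rule is an assumption. With `pool_size=0`, as pix2pix sets it, the pool simply returns its input.

```python
import random


class HistoryPool:
    """Hypothetical, simplified history buffer of generated images."""

    def __init__(self, pool_size, rng=None):
        self.pool_size = pool_size
        self.images = []
        self.rng = rng if rng is not None else random.Random()

    def query(self, image):
        """Return either `image` or a previously stored image that it replaces."""
        if self.pool_size == 0:  # no buffer: pass the image through
            return image
        if len(self.images) < self.pool_size:  # fill the buffer first
            self.images.append(image)
            return image
        if self.rng.random() < 0.5:  # half of the time, swap with an old image
            idx = self.rng.randrange(self.pool_size)
            old = self.images[idx]
            self.images[idx] = image
            return old
        return image  # otherwise return the current image


pool = HistoryPool(pool_size=2, rng=random.Random(0))
print([pool.query(f"fake_{step}") for step in range(6)])
```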

## Pix2Pix or Image-to-Image translation

Reference: https://towardsdatascience.com/pix2pix-869c17900998

Image-to-image translation is one of the tasks GANs are used for.
It is a form of conditional image synthesis. Instead of generating an image from a random vector $z$ alone, the generator takes a source image as input and translates it into the target image.
For example, Pix2Pix can convert an edge map of an object, a semantic segmentation map, or an ordinary photo into a photo, an edge map, or a segmented image.

Applications of image-to-image translation include:
- Colorization
- Super-resolution
- Image to drawing or drawing to image
- Semantic segmentation

Most of these applications are forms of image synthesis.

<img src="img/pix2pix1.png" title="Pix2Pix" style="width: 400px;" />

<img src="img/pix2pix2.png" title="Pix2Pix" style="width: 400px;" />

### The process of Pix2Pix

The Pix2Pix procedure is very simple. In training, the steps are:
1. Feed a source image into the generator, which creates a fake target image.
2. Concatenate the source image with the real or fake target image along the channel dimension. Feed the pair into the discriminator, which predicts whether the pair is real or fake.
3. Calculate the losses and update the weights as in a normal GAN.

<img src="img/pix2pix3.png" title="Pix2Pix" style="width: 640px;" />

At test time, only the generator is needed; the discriminator is discarded.
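Step 2 of the training procedure relies on channel-wise concatenation. This is why the pix2pix discriminator is built with `input_nc + output_nc` input channels. In the small sketch below, random tensors stand in for a 3-channel source image and a 3-channel target image.

```python
import torch

source = torch.randn(1, 3, 256, 256)  # e.g. an edge map or label map (domain A)
target = torch.randn(1, 3, 256, 256)  # a real photo or the generator's output (domain B)
pair = torch.cat((source, target), dim=1)  # concatenate along the channel dimension
print(pair.shape)  # torch.Size([1, 6, 256, 256])
```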

### Loss function

The conditional GAN objective is:

$$\mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{x,y}[\log D(x,y)] + \mathbb{E}_{x,z}[\log(1 - D(x,G(x,z)))].$$

The discriminator is trained to maximize $\mathcal{L}_{cGAN}$. In other words, it learns to output $D(x,y) \approx 1$ for real pairs and $D(x,G(x,z)) \approx 0$ for generated pairs.

**Note**: The noise $z$ is optional. If there is no noise vector, simply drop $z$ from the expressions. The pix2pix authors provide noise only through dropout in the generator.

The generator loss has two parts: the GAN loss and the L1 loss. The L1 loss is:

$$\mathcal{L}_{L1} = \mathbb{E}_{x,y,z}[\|y-G(x,z)\|_1],$$

and the overall generator loss, which the generator minimizes, is:

$$\mathcal{L}_G = -\mathbb{E}_{x,z}[\log D(x,G(x,z))] + \lambda \mathcal{L}_{L1},$$

where $\lambda$ is a hyperparameter. The authors reported the best results with $\lambda = 100$. (From TA: Oh my gosh, I used 0.1 for my experiment with another generator. 0_o)
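The pure-Python sketch below shows the relative size of the two generator terms when $\lambda = 100$. It assumes the discriminator outputs probabilities and computes the losses for a single example. The numbers are made up for illustration.

```python
import math


def discriminator_objective(d_real, d_fake):
    """log D(x, y) + log(1 - D(x, G(x))); the discriminator maximizes this."""
    return math.log(d_real) + math.log(1.0 - d_fake)


def generator_loss(d_fake, l1_error, lam=100.0):
    """-log D(x, G(x)) + lambda * L1; the generator minimizes this."""
    return -math.log(d_fake) + lam * l1_error


d_real, d_fake, l1_error = 0.9, 0.2, 0.05  # hypothetical values
print(discriminator_objective(d_real, d_fake))
print(generator_loss(d_fake, l1_error))
```

Here the GAN term is $-\log 0.2 \approx 1.61$, while the weighted L1 term is $100 \times 0.05 = 5$. With $\lambda = 100$, a mean absolute error of only 0.05 already outweighs the adversarial term.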

### Generator networks

The generator in Pix2Pix is an encoder-decoder network.
It takes the image to be translated and compresses it into a low-dimensional "bottleneck" representation. It then learns how to upsample this representation into the output image.
In the paper, the authors used a U-Net as the generator (<code>unet_256</code> in the code).

U-Net is similar to ResNets in that information from earlier layers is integrated into later layers. The U-Net skip connections are also interesting because they do not require any resizing or projection. Each skip connection joins an encoder layer with the decoder layer of the same spatial resolution, and the two feature maps are concatenated along the channel dimension.

<img src="img/u-net-architecture.png" title="U-Net" style="width: 640px;" />

Source: https://lmb.informatik.uni-freiburg.de/people/ronneber/u-net/
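A little arithmetic confirms that no resizing is needed. `unet_256` uses eight downsamplings, each a 4x4 conv with stride 2 and padding 1. These are mirrored by 4x4 transposed convs with the same stride and padding. The pure-Python sketch below lists the spatial sizes on the way down and on the way up. The helper name is hypothetical.

```python
def unet_spatial_sizes(size=256, num_downs=8):
    """Feature-map sizes along the encoder path and along the decoder path."""
    down = [size]
    for _ in range(num_downs):
        size = (size + 2 * 1 - 4) // 2 + 1  # Conv2d(k=4, s=2, p=1) halves the size
        down.append(size)
    up = [size]
    for _ in range(num_downs):
        size = (size - 1) * 2 - 2 * 1 + 4  # ConvTranspose2d(k=4, s=2, p=1) doubles it
        up.append(size)
    return down, up


down, up = unet_spatial_sizes()
print(down)  # [256, 128, 64, 32, 16, 8, 4, 2, 1]
print(up)    # [1, 2, 4, 8, 16, 32, 64, 128, 256]
```

Each decoder size matches the encoder size at the mirrored depth, so the feature maps can be concatenated directly.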

### Discriminator networks (PatchGAN)

The PatchGAN discriminator used in pix2pix is another distinctive component of this design. The PatchGAN (Markovian) discriminator classifies individual (N x N) patches of the image as “real” or “fake”, as opposed to classifying the entire image. The authors argue that this constraint encourages sharp, high-frequency detail. Additionally, the PatchGAN has fewer parameters and runs faster than a discriminator that classifies the entire image. The image below shows results for different patch sizes N:

<img src="img/patchgan.png" title="PatchGAN" style="width: 640px;" />
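The patch size N is the receptive field of one output value of the discriminator. `NLayerDiscriminator` uses 4x4 kernels: `n_layers` stride-2 convs followed by two stride-1 convs. Its receptive field can be computed by walking back from one output pixel to the input. A pure-Python sketch (hypothetical helper name):

```python
def patchgan_receptive_field(n_layers):
    """Side length of the input patch seen by one output value of NLayerDiscriminator."""
    # (kernel, stride) of each conv layer, from input to output
    layers = [(4, 2)] * n_layers + [(4, 1), (4, 1)]
    field = 1
    for kernel, stride in reversed(layers):
        field = (field - 1) * stride + kernel
    return field


for n_layers in (1, 2, 3, 5):
    print(n_layers, patchgan_receptive_field(n_layers))
```

This prints 16, 34, 70, and 286 for `n_layers` equal to 1, 2, 3, and 5. The default `n_layers=3` therefore gives the 70x70 PatchGAN, and `--netD n_layers --n_layers_D 5` gives 286x286 patches. The 1x1 PixelGAN corresponds to `netD='pixel'`.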

## Using the code

### Datasets

Download one of the official datasets with:

    bash ./datasets/download_pix2pix_dataset.sh [cityscapes, night2day, edges2handbags, edges2shoes, facades, maps]

Or use your own dataset by creating the appropriate folders and adding in the images. Follow the instructions [here](https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix/blob/master/docs/datasets.md#pix2pix-datasets).

### Training

The command in **Terminal 2** is:

    python train.py --dataroot ./datasets/facades --name facades_pix2pix --model pix2pix --direction BtoA
    
You can also monitor training with visdom (<code>python -m visdom.server</code>).

## Looking into the code (Pix2Pix)

Since we have already explained the dataset code and the main script, let's look only at <code>models/pix2pix_model.py</code>.

First, take a look at the initialization function:

In [None]:
    def __init__(self, opt):
        """Initialize the pix2pix class.

        Parameters:
            opt (Option class)-- stores all the experiment flags; needs to be a subclass of BaseOptions
        """
        BaseModel.__init__(self, opt)
        # specify the training losses you want to print out. The training/test scripts will call <BaseModel.get_current_losses>
        self.loss_names = ['G_GAN', 'G_L1', 'D_real', 'D_fake']
        # specify the images you want to save/display. The training/test scripts will call <BaseModel.get_current_visuals>
        self.visual_names = ['real_A', 'fake_B', 'real_B']
        # specify the models you want to save to the disk. The training/test scripts will call <BaseModel.save_networks> and <BaseModel.load_networks>
        if self.isTrain:
            self.model_names = ['G', 'D']
        else:  # during test time, only load G
            self.model_names = ['G']
        # define networks (both generator and discriminator)
        self.netG = networks.define_G(opt.input_nc, opt.output_nc, opt.ngf, opt.netG, opt.norm,
                                      not opt.no_dropout, opt.init_type, opt.init_gain, self.gpu_ids)

        if self.isTrain:  # define a discriminator; conditional GANs need to take both input and output images; Therefore, #channels for D is input_nc + output_nc
            self.netD = networks.define_D(opt.input_nc + opt.output_nc, opt.ndf, opt.netD,
                                          opt.n_layers_D, opt.norm, opt.init_type, opt.init_gain, self.gpu_ids)

        if self.isTrain:
            # define loss functions
            self.criterionGAN = networks.GANLoss(opt.gan_mode).to(self.device)
            self.criterionL1 = torch.nn.L1Loss()
            # initialize optimizers; schedulers will be automatically created by function <BaseModel.setup>.
            self.optimizer_G = torch.optim.Adam(self.netG.parameters(), lr=opt.lr, betas=(opt.beta1, 0.999))
            self.optimizer_D = torch.optim.Adam(self.netD.parameters(), lr=opt.lr, betas=(opt.beta1, 0.999))
            self.optimizers.append(self.optimizer_G)
            self.optimizers.append(self.optimizer_D)

All the steps are the same as in CycleGAN, with one exception. The discriminator's number of input channels is the sum of the input and output image channels, because it sees the source image and the (real or fake) target image concatenated together:

    opt.input_nc + opt.output_nc
    
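The pix2pix model also changes two defaults. The default generator becomes **U-Net-256** (<code>unet_256</code>). The default GAN loss becomes the vanilla GAN loss, which applies a **sigmoid** and binary cross-entropy to the discriminator output (<code>BCEWithLogitsLoss</code>). During training, it also disables the image pool (<code>pool_size=0</code>).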

In [None]:
def modify_commandline_options(parser, is_train=True):
    """Add new dataset-specific options, and rewrite default values for existing options.

    Parameters:
        parser          -- original option parser
        is_train (bool) -- whether training phase or test phase. You can use this flag to add training-specific or test-specific options.

    Returns:
        the modified parser.

    For pix2pix, we do not use image buffer
    The training objective is: GAN Loss + lambda_L1 * ||G(A)-B||_1
    By default, we use vanilla GAN loss, UNet with batchnorm, and aligned datasets.
    """
    # changing the default values to match the pix2pix paper (https://phillipi.github.io/pix2pix/)
    parser.set_defaults(norm='batch', netG='unet_256', dataset_mode='aligned')
    if is_train:
        parser.set_defaults(pool_size=0, gan_mode='vanilla')
        parser.add_argument('--lambda_L1', type=float, default=100.0, help='weight for L1 loss')

    return parser
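`set_defaults` only changes default values; options passed explicitly on the command line still take precedence. The sketch below demonstrates this with the standard-library `argparse` module. The base options and their defaults are hypothetical stand-ins for the repository's option classes.

```python
import argparse

parser = argparse.ArgumentParser()
# hypothetical base options with CycleGAN-style defaults
parser.add_argument("--netG", type=str, default="resnet_9blocks")
parser.add_argument("--gan_mode", type=str, default="lsgan")
parser.add_argument("--pool_size", type=int, default=50)

# what a model-specific hook such as modify_commandline_options does
parser.set_defaults(netG="unet_256", gan_mode="vanilla", pool_size=0)
parser.add_argument("--lambda_L1", type=float, default=100.0, help="weight for L1 loss")

opt_default = parser.parse_args([])  # no flags given: the new defaults apply
print(opt_default.netG, opt_default.gan_mode, opt_default.pool_size, opt_default.lambda_L1)

opt_cli = parser.parse_args(["--netG", "resnet_9blocks", "--lambda_L1", "10"])
print(opt_cli.netG, opt_cli.lambda_L1)  # explicit flags override the defaults
```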

### Input images

Pix2Pix can translate from A to B or from B to A, so the inputs are swapped here when needed:
- If 'AtoB', <code>real_A</code> comes from domain A and <code>real_B</code> from domain B (no change).
- If 'BtoA', <code>real_A</code> comes from domain B and <code>real_B</code> from domain A (swapped).

In [None]:
def set_input(self, input):
    """Unpack input data from the dataloader and perform necessary pre-processing steps.

    Parameters:
        input (dict): include the data itself and its metadata information.

    The option 'direction' can be used to swap images in domain A and domain B.
    """
    AtoB = self.opt.direction == 'AtoB'
    self.real_A = input['A' if AtoB else 'B'].to(self.device)
    self.real_B = input['B' if AtoB else 'A'].to(self.device)
    self.image_paths = input['A_paths' if AtoB else 'B_paths']

### Generator Network (U-Net class)

The U-Net generator class is defined in <code>networks.py</code>:

In [None]:
class UnetGenerator(nn.Module):
    """Create a Unet-based generator"""

    def __init__(self, input_nc, output_nc, num_downs, ngf=64, norm_layer=nn.BatchNorm2d, use_dropout=False):
        """Construct a Unet generator
        Parameters:
            input_nc (int)  -- the number of channels in input images
            output_nc (int) -- the number of channels in output images
            num_downs (int) -- the number of downsamplings in UNet. For example, if |num_downs| == 7,
                                an image of size 128x128 will become of size 1x1 at the bottleneck
            ngf (int)       -- the number of filters in the last conv layer
            norm_layer      -- normalization layer

        We construct the U-Net from the innermost layer to the outermost layer.
        It is a recursive process.
        """
        super(UnetGenerator, self).__init__()
        # construct unet structure
        unet_block = UnetSkipConnectionBlock(ngf * 8, ngf * 8, input_nc=None, submodule=None, norm_layer=norm_layer, innermost=True)  # add the innermost layer
        for i in range(num_downs - 5):          # add intermediate layers with ngf * 8 filters
            unet_block = UnetSkipConnectionBlock(ngf * 8, ngf * 8, input_nc=None, submodule=unet_block, norm_layer=norm_layer, use_dropout=use_dropout)
        # gradually reduce the number of filters from ngf * 8 to ngf
        unet_block = UnetSkipConnectionBlock(ngf * 4, ngf * 8, input_nc=None, submodule=unet_block, norm_layer=norm_layer)
        unet_block = UnetSkipConnectionBlock(ngf * 2, ngf * 4, input_nc=None, submodule=unet_block, norm_layer=norm_layer)
        unet_block = UnetSkipConnectionBlock(ngf, ngf * 2, input_nc=None, submodule=unet_block, norm_layer=norm_layer)
        self.model = UnetSkipConnectionBlock(output_nc, ngf, input_nc=input_nc, submodule=unet_block, outermost=True, norm_layer=norm_layer)  # add the outermost layer

    def forward(self, input):
        """Standard forward"""
        return self.model(input)


class UnetSkipConnectionBlock(nn.Module):
    """Defines the Unet submodule with skip connection.
        X -------------------identity----------------------
        |-- downsampling -- |submodule| -- upsampling --|
    """

    def __init__(self, outer_nc, inner_nc, input_nc=None,
                 submodule=None, outermost=False, innermost=False, norm_layer=nn.BatchNorm2d, use_dropout=False):
        """Construct a Unet submodule with skip connections.

        Parameters:
            outer_nc (int) -- the number of filters in the outer conv layer
            inner_nc (int) -- the number of filters in the inner conv layer
            input_nc (int) -- the number of channels in input images/features
            submodule (UnetSkipConnectionBlock) -- previously defined submodules
            outermost (bool)    -- if this module is the outermost module
            innermost (bool)    -- if this module is the innermost module
            norm_layer          -- normalization layer
            use_dropout (bool)  -- if use dropout layers.
        """
        super(UnetSkipConnectionBlock, self).__init__()
        self.outermost = outermost
        if type(norm_layer) == functools.partial:
            use_bias = norm_layer.func == nn.InstanceNorm2d
        else:
            use_bias = norm_layer == nn.InstanceNorm2d
        if input_nc is None:
            input_nc = outer_nc
        downconv = nn.Conv2d(input_nc, inner_nc, kernel_size=4,
                             stride=2, padding=1, bias=use_bias)
        downrelu = nn.LeakyReLU(0.2, True)
        downnorm = norm_layer(inner_nc)
        uprelu = nn.ReLU(True)
        upnorm = norm_layer(outer_nc)

        if outermost:
            upconv = nn.ConvTranspose2d(inner_nc * 2, outer_nc,
                                        kernel_size=4, stride=2,
                                        padding=1)
            down = [downconv]
            up = [uprelu, upconv, nn.Tanh()]
            model = down + [submodule] + up
        elif innermost:
            upconv = nn.ConvTranspose2d(inner_nc, outer_nc,
                                        kernel_size=4, stride=2,
                                        padding=1, bias=use_bias)
            down = [downrelu, downconv]
            up = [uprelu, upconv, upnorm]
            model = down + up
        else:
            upconv = nn.ConvTranspose2d(inner_nc * 2, outer_nc,
                                        kernel_size=4, stride=2,
                                        padding=1, bias=use_bias)
            down = [downrelu, downconv, downnorm]
            up = [uprelu, upconv, upnorm]

            if use_dropout:
                model = down + [submodule] + up + [nn.Dropout(0.5)]
            else:
                model = down + [submodule] + up

        self.model = nn.Sequential(*model)

    def forward(self, x):
        if self.outermost:
            return self.model(x)
        else:   # add skip connections
            return torch.cat([x, self.model(x)], 1)
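Every non-outermost block returns `torch.cat([x, self.model(x)], 1)`. The up-convolution of the enclosing block must therefore accept twice as many channels, which is why it is built with `inner_nc * 2` inputs. The pure-Python sketch below tabulates the channels block by block for `UnetGenerator` with `num_downs=8` (as in `unet_256`), `ngf=64`, and RGB input and output. The function name is hypothetical.

```python
def unet_channel_table(ngf=64, num_downs=8, input_nc=3, output_nc=3):
    """Rows (down_in, down_out, up_in, block_out), innermost block first."""
    # (outer_nc, inner_nc) pairs in the order UnetGenerator builds them:
    # the innermost block plus (num_downs - 5) intermediate blocks all use ngf * 8
    pairs = [(ngf * 8, ngf * 8)] * (num_downs - 4)
    pairs += [(ngf * 4, ngf * 8), (ngf * 2, ngf * 4), (ngf, ngf * 2)]
    rows = []
    for i, (outer_nc, inner_nc) in enumerate(pairs):
        innermost = i == 0
        # a submodule returns cat([x, f(x)]), i.e. 2 * inner_nc channels
        up_in = inner_nc if innermost else 2 * inner_nc
        # this block's own skip connection doubles its outer_nc channels
        rows.append((outer_nc, inner_nc, up_in, 2 * outer_nc))
    # outermost block: takes the image and returns output_nc channels, no concatenation
    rows.append((input_nc, ngf, 2 * ngf, output_nc))
    return rows


for row in unet_channel_table():
    print(row)
```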

### Discriminator network

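Pix2Pix uses the same PatchGAN discriminator as CycleGAN (<code>netD='basic'</code>, i.e., <code>NLayerDiscriminator</code> with <code>n_layers=3</code> by default).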

### Training step function

The training step is in <code>optimize_parameters</code> function. If you look inside and compare between CycleGAN and Pix2Pix, you can see that Pix2Pix is too simpler than CycleGAN a lot. The function explain the step by step of Pix2Pix training as:

**Note**: Given $fake$ as fake image, $real$ as real image, and rec as rectified image. $A$ and $B$ is the source and destination of 2 kinds of dataset. $G$ is a generator model, and $D$ is a discriminator model.

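1. Forward propagation (function <code>forward()</code>): the generator creates <code>fake_B</code> from the input <code>real_A</code>.
    - $fake_B = G(real_A)$
2. Update D (function <code>backward_D()</code>): compute the discriminator loss and its gradients, then update D's weights with <code>optimizer_D</code>. The discriminator sees the input image concatenated with either the real or the generated output along the channel dimension. $D$ outputs a grid of patch logits; below, $\sigma$ is the sigmoid function and every loss is averaged over that grid and the batch.
    - $real_{AB} = (real_A \,|\, real_B)$
    - $fake_{AB} = (real_A \,|\, fake_B)$
    - $\mathcal{L}_{D} = \frac{1}{2}\left[-\log \sigma\big(D(real_{AB})\big) - \log\big(1 - \sigma(D(fake_{AB}))\big)\right]$
    - $fake_B$ is detached here, so this loss sends no gradients into $G$.
3. Update G (function <code>backward_G()</code>): with D's weights frozen, compute the generator loss and its gradients, then update G's weights with <code>optimizer_G</code>. The process is:
    1. Calculate the GAN loss, which is small when $D$ classifies the fake pair as real:
        - $\mathcal{L}_{GAN} = -\log \sigma\big(D(fake_{AB})\big)$
    2. Calculate the L1 loss between the generated and the real target image:
        - $\mathcal{L}_{L1} = \|real_B - fake_B\|_1$
    3. Sum the losses into $\mathcal{L}_G$ and backpropagate from it:
        - $\mathcal{L}_G = \mathcal{L}_{GAN} + \lambda \mathcal{L}_{L1}$
        - Note that $\lambda = 100$ by default (<code>lambda_L1</code>).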
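
To make the two losses concrete, here is a self-contained sketch of one Pix2Pix-style iteration on random tensors, in the order used by `optimize_parameters` (forward, then D, then G). The one-layer networks `G` and `D`, the Adam settings, and the tensor sizes are placeholders chosen for illustration; only the loss structure (BCE-with-logits GAN loss, L1 loss weighted by $\lambda = 100$, discriminator loss halved) is taken from the steps described here.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Placeholder networks: G maps a 3-channel image to a 3-channel image,
# D scores a 6-channel (input, output) pair with a grid of logits.
G = nn.Sequential(nn.Conv2d(3, 3, kernel_size=3, padding=1), nn.Tanh())
D = nn.Conv2d(6, 1, kernel_size=4, stride=2, padding=1)
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))
bce, l1, lambda_L1 = nn.BCEWithLogitsLoss(), nn.L1Loss(), 100.0

real_A = torch.randn(4, 3, 32, 32)  # input images
real_B = torch.randn(4, 3, 32, 32)  # paired target images

# 1. Forward pass: fake_B = G(real_A)
fake_B = G(real_A)

# 2. Update D: real pairs -> 1, fake pairs -> 0 (fake_B detached, so G gets no gradient)
real_AB = torch.cat((real_A, real_B), 1)
fake_AB = torch.cat((real_A, fake_B), 1)
pred_real = D(real_AB)
pred_fake = D(fake_AB.detach())
loss_D = 0.5 * (bce(pred_real, torch.ones_like(pred_real)) +
                bce(pred_fake, torch.zeros_like(pred_fake)))
opt_D.zero_grad()
loss_D.backward()
opt_D.step()

# 3. Update G: make D call the fake pair real, and stay close to real_B in L1
for p in D.parameters():
    p.requires_grad_(False)  # freeze D while optimizing G
pred_fake = D(fake_AB)
loss_G = bce(pred_fake, torch.ones_like(pred_fake)) + lambda_L1 * l1(fake_B, real_B)
opt_G.zero_grad()
loss_G.backward()
opt_G.step()
for p in D.parameters():
    p.requires_grad_(True)   # unfreeze D for the next iteration

print(f"loss_D={loss_D.item():.3f}  loss_G={loss_G.item():.3f}")
```

Because the L1 term is multiplied by 100, it usually dominates `loss_G` early in training; the GAN term mainly sharpens details that a pure L1 objective would blur.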

#### optimize_parameters function

```python
def optimize_parameters(self):
    self.forward()                   # compute fake images: G(A)
    # update D
    self.set_requires_grad(self.netD, True)  # enable backprop for D
    self.optimizer_D.zero_grad()     # set D's gradients to zero
    self.backward_D()                # calculate gradients for D
    self.optimizer_D.step()          # update D's weights
    # update G
    self.set_requires_grad(self.netD, False)  # D requires no gradients when optimizing G
    self.optimizer_G.zero_grad()     # set G's gradients to zero
    self.backward_G()                # calculate gradients for G
    self.optimizer_G.step()          # update G's weights
```
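
`set_requires_grad` is a helper defined in the model's base class and is not shown in this lab. A minimal standalone sketch of what such a helper does, assuming it simply toggles `requires_grad` on every parameter (this free function is our own illustration, not the repository's code). Freezing D while optimizing G avoids computing gradients for D's weights that `optimizer_G` would never use.

```python
import torch.nn as nn


def set_requires_grad(nets, requires_grad=False):
    """Hypothetical helper: enable or disable gradients for one network or a list of them."""
    if not isinstance(nets, (list, tuple)):
        nets = [nets]
    for net in nets:
        if net is not None:
            for param in net.parameters():
                param.requires_grad = requires_grad


netD = nn.Linear(4, 1)  # placeholder discriminator
set_requires_grad(netD, False)
print(any(p.requires_grad for p in netD.parameters()))  # False
```

Gradients still flow *through* a frozen D back to the generator's output; only D's own weights stop accumulating `.grad`.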

#### forward function

```python
def forward(self):
    """Run forward pass; called by both functions <optimize_parameters> and <test>."""
    self.fake_B = self.netG(self.real_A)  # G(A)
```

#### backward_G function

```python
def backward_G(self):
    """Calculate GAN and L1 loss for the generator"""
    # First, G(A) should fool the discriminator
    fake_AB = torch.cat((self.real_A, self.fake_B), 1)
    pred_fake = self.netD(fake_AB)
    self.loss_G_GAN = self.criterionGAN(pred_fake, True)
    # Second, G(A) should be close to B
    self.loss_G_L1 = self.criterionL1(self.fake_B, self.real_B) * self.opt.lambda_L1
    # combine loss and calculate gradients
    self.loss_G = self.loss_G_GAN + self.loss_G_L1
    self.loss_G.backward()
```
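
`self.criterionGAN(pred_fake, True)` compares the discriminator's patch logits with a target of all ones (`True`) or all zeros (`False`). A minimal sketch of such a criterion for the vanilla (BCE) objective, assuming the target is a scalar label expanded to the prediction's shape. The class name `VanillaGANLoss` is hypothetical; the repository's own GAN loss class also supports other objectives such as LSGAN.

```python
import torch
import torch.nn as nn


class VanillaGANLoss(nn.Module):
    """Hypothetical sketch: BCE-with-logits against an all-ones or all-zeros target."""

    def __init__(self, real_label=1.0, fake_label=0.0):
        super().__init__()
        self.register_buffer("real_label", torch.tensor(real_label))
        self.register_buffer("fake_label", torch.tensor(fake_label))
        self.loss = nn.BCEWithLogitsLoss()

    def forward(self, prediction, target_is_real):
        label = self.real_label if target_is_real else self.fake_label
        target = label.expand_as(prediction)  # one target per patch logit
        return self.loss(prediction, target)


criterion = VanillaGANLoss()
logits = torch.zeros(2, 1, 30, 30)  # an undecided D: sigmoid(0) = 0.5 everywhere
print(f"{criterion(logits, True).item():.4f}")  # 0.6931
```

With all-zero logits the loss equals $\ln 2$ for either target, which is a handy sanity check that a freshly initialised discriminator is behaving as expected.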

#### backward_D function

```python
def backward_D(self):
    """Calculate GAN loss for the discriminator"""
    # Fake; stop backprop to the generator by detaching fake_B
    fake_AB = torch.cat((self.real_A, self.fake_B), 1)  # we use conditional GANs; we need to feed both input and output to the discriminator
    pred_fake = self.netD(fake_AB.detach())
    self.loss_D_fake = self.criterionGAN(pred_fake, False)
    # Real
    real_AB = torch.cat((self.real_A, self.real_B), 1)
    pred_real = self.netD(real_AB)
    self.loss_D_real = self.criterionGAN(pred_real, True)
    # combine loss and calculate gradients
    self.loss_D = (self.loss_D_fake + self.loss_D_real) * 0.5
    self.loss_D.backward()
```
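
The `.detach()` call in `backward_D` is what keeps the discriminator update from sending gradients into the generator. A small check with placeholder one-layer networks (not the lab's models, and `.mean()` standing in for the real loss) shows the effect.

```python
import torch
import torch.nn as nn

G = nn.Conv2d(3, 3, kernel_size=3, padding=1)            # placeholder generator
D = nn.Conv2d(6, 1, kernel_size=4, stride=2, padding=1)  # placeholder discriminator
real_A = torch.randn(1, 3, 16, 16)
fake_B = G(real_A)
fake_AB = torch.cat((real_A, fake_B), 1)

# Discriminator step as in backward_D: the fake pair is detached
D(fake_AB.detach()).mean().backward()
g_grad_with_detach = G.weight.grad is not None
d_grad_with_detach = D.weight.grad is not None

# Same computation without detach: the gradient now flows back into G
D(fake_AB).mean().backward()
g_grad_without_detach = G.weight.grad is not None

print(g_grad_with_detach, d_grad_with_detach, g_grad_without_detach)  # False True True
```

Detaching also lets `backward_G` reuse `fake_B` afterwards: the D step never traverses the generator's part of the graph, so that part is still available for the generator's backward pass.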

## Independent experiments

Do the following:
1. Train a Cycle GAN on the `horses2zebras` dataset provided. Document your results in your report.
2. Create a new data set `aitict2celeba` consisting of AIT ICT faces and CelebA faces. Use the URLs for the datasets provided in class. Document your results in your report.
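
For the second experiment, the CycleGAN code's unpaired loader expects a folder with `trainA`, `trainB`, `testA` and `testB` subfolders (assuming the repository's default unaligned dataset mode is used, as for `horses2zebras`). A minimal sketch for arranging the downloaded AIT ICT and CelebA images that way; `build_unaligned_dataset`, the source folder names, the 90/10 split, and the accepted extensions are all assumptions to adapt.

```python
import random
import shutil
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}  # assumed image formats


def build_unaligned_dataset(src_a, src_b, dst, test_fraction=0.1, seed=0):
    """Hypothetical helper: copy images from two folders into dst/{trainA,testA,trainB,testB}."""
    rng = random.Random(seed)
    dst = Path(dst)
    counts = {}
    for domain, src in (("A", Path(src_a)), ("B", Path(src_b))):
        files = sorted(p for p in src.iterdir() if p.suffix.lower() in IMAGE_EXTS)
        rng.shuffle(files)
        n_test = int(len(files) * test_fraction)
        for split, split_files in (("test", files[:n_test]), ("train", files[n_test:])):
            out_dir = dst / f"{split}{domain}"
            out_dir.mkdir(parents=True, exist_ok=True)
            for f in split_files:
                shutil.copy2(f, out_dir / f.name)
            counts[f"{split}{domain}"] = len(split_files)
    return counts


# Hypothetical paths: adjust to wherever the downloaded images were extracted.
# build_unaligned_dataset("downloads/ait_ict_faces", "downloads/celeba", "datasets/aitict2celeba")
```

Domain A (AIT ICT faces) and domain B (CelebA) need not have the same number of images, since CycleGAN samples the two sets independently.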
