# Lab 3: Generating Data

Based on assignments by Lisa Zhang and Jimmy Ba.

In this lab, you will build models to perform image colourization. That is, given a greyscale image, we wish to predict the colour at each pixel. Image colourization is a difficult problem for many reasons, one of which is that it is ill-posed: for a single greyscale image, there can be multiple, equally valid colourings.
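The ill-posedness can be seen in a minimal sketch. It assumes greyscale is computed as the unweighted mean of the R, G and B channels (the convention used by the `process` helper later in this lab); the two colour values are made up for illustration.

```python
import numpy as np

# Two hypothetical pixel colours (R, G, B), values in [0, 1].
reddish = np.array([0.9, 0.3, 0.3])
bluish = np.array([0.3, 0.3, 0.9])

# Greyscale as the unweighted channel mean (an assumption for this sketch).
grey_reddish = reddish.mean()
grey_bluish = bluish.mean()

# Both colours collapse to the same grey value, so the grey pixel alone
# cannot tell us which colour was the original.
print(grey_reddish, grey_bluish)
same_grey = np.isclose(grey_reddish, grey_bluish)
```

Any model trained on greyscale input therefore has to pick one of several plausible colours, which is why context from neighbouring pixels matters.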

To keep the training time manageable we will use the CIFAR-10 data set, which consists of images of size 32x32 pixels. For most of the questions we will use a subset of the dataset. The data loading script is included with the notebooks, and should download automatically the first time it is loaded. 

We will start with a convolutional autoencoder and tweak it along the way to improve our performance. Then, as a second part of the assignment, we will compare the autoencoder approach to conditional generative adversarial networks (cGANs).

In the process, you are expected to learn to:

1. Clean and process the dataset and create greyscale images.
2. Implement and modify an autoencoder architecture.
3. Tune the hyperparameters of an autoencoder.
4. Implement skip connections and other techniques to improve performance.
5. Implement a cGAN and compare with an autoencoder.
6. Improve on the cGAN by trying one of several techniques to enhance training.


### What to submit

Submit an HTML file containing all your code, outputs, and write-up
from parts A and B. You can produce an HTML file directly from Google Colab. The Colab instructions are provided at the end of this document.

**Do not submit any other files produced by your code.**

Include a link to your colab file in your submission.

Please use Google Colab to complete this assignment. If you want to use Jupyter Notebook, please complete the assignment and upload your Jupyter Notebook file to Google Colab for submission. 

## Colab Link

Include a link to your Colab file here. If you would like the TA to look at your
Colab file in case your solutions are cut off, **please make sure that your Colab
file is publicly accessible at the time of submission**.

Colab Link:https://colab.research.google.com/drive/19-DjPZutSCDB2ZHDEMhdNtNClDiC0pBP?usp=sharing

# PART A - Autoencoder [20 pt]

In this part we will construct and compare different autoencoder models for the image colourization task.

#### Helper code

Provided are some helper functions for loading and preparing the data. Note that you will need to use the Colab GPU for this assignment.
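Before running the helper code it can be useful to confirm that a GPU is actually visible. The snippet below is a minimal sketch using standard PyTorch calls; the variable names `device` and `x` are illustrative and not part of the provided helpers.

```python
import torch

# Use the GPU when one is visible to PyTorch, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("Running on:", device)

# A tensor created on the chosen device; models can be moved with .to(device).
x = torch.zeros(2, 1, 32, 32, device=device)
```

If this prints `cpu` in Colab, the runtime type probably needs to be switched to a GPU runtime before setting `"gpu": True` in the training arguments.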

In [None]:
"""
Colourization of CIFAR-10 Horses via classification.
"""
import argparse
import math
import time

import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import numpy.random as npr
import scipy.misc
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable


In [None]:
######################################################################
# Setup working directory
######################################################################
%mkdir -p /content/a3/
%cd /content/a3


In [None]:
######################################################################
# Helper functions for loading data
######################################################################
# adapted from
# https://github.com/fchollet/keras/blob/master/keras/datasets/cifar10.py

import os
import pickle
import sys
import tarfile

import numpy as np
from PIL import Image
from six.moves.urllib.error import HTTPError, URLError
from six.moves.urllib.request import urlretrieve


def get_file(fname, origin, untar=False, extract=False, archive_format="auto", cache_dir="data"):
    datadir = os.path.join(cache_dir)
    if not os.path.exists(datadir):
        os.makedirs(datadir)

    if untar:
        untar_fpath = os.path.join(datadir, fname)
        fpath = untar_fpath + ".tar.gz"
    else:
        fpath = os.path.join(datadir, fname)

    print("File path: %s" % fpath)
    if not os.path.exists(fpath):
        print("Downloading data from", origin)

        error_msg = "URL fetch failure on {}: {} -- {}"
        try:
            try:
                urlretrieve(origin, fpath)
            # HTTPError is a subclass of URLError, so it must be caught first
            except HTTPError as e:
                raise Exception(error_msg.format(origin, e.code, e.msg))
            except URLError as e:
                raise Exception(error_msg.format(origin, e.errno, e.reason))
        except (Exception, KeyboardInterrupt) as e:
            if os.path.exists(fpath):
                os.remove(fpath)
            raise

    if untar:
        if not os.path.exists(untar_fpath):
            print("Extracting file.")
            with tarfile.open(fpath) as archive:
                archive.extractall(datadir)
        return untar_fpath

    if extract:
        _extract_archive(fpath, datadir, archive_format)

    return fpath


def load_batch(fpath, label_key="labels"):
    """Internal utility for parsing CIFAR data.
    # Arguments
        fpath: path to the file to parse.
        label_key: key for label data in the retrieved
            dictionary.
    # Returns
        A tuple `(data, labels)`.
    """
    f = open(fpath, "rb")
    if sys.version_info < (3,):
        d = pickle.load(f)
    else:
        d = pickle.load(f, encoding="bytes")
        # decode utf8
        d_decoded = {}
        for k, v in d.items():
            d_decoded[k.decode("utf8")] = v
        d = d_decoded
    f.close()
    data = d["data"]
    labels = d[label_key]

    data = data.reshape(data.shape[0], 3, 32, 32)
    return data, labels


def load_cifar10(transpose=False):
    """Loads CIFAR10 dataset.
    # Returns
        Tuple of Numpy arrays: `(x_train, y_train), (x_test, y_test)`.
    """
    dirname = "cifar-10-batches-py"
    origin = "http://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz"
    path = get_file(dirname, origin=origin, untar=True)

    num_train_samples = 50000

    x_train = np.zeros((num_train_samples, 3, 32, 32), dtype="uint8")
    y_train = np.zeros((num_train_samples,), dtype="uint8")

    for i in range(1, 6):
        fpath = os.path.join(path, "data_batch_" + str(i))
        data, labels = load_batch(fpath)
        x_train[(i - 1) * 10000 : i * 10000, :, :, :] = data
        y_train[(i - 1) * 10000 : i * 10000] = labels

    fpath = os.path.join(path, "test_batch")
    x_test, y_test = load_batch(fpath)

    y_train = np.reshape(y_train, (len(y_train), 1))
    y_test = np.reshape(y_test, (len(y_test), 1))

    if transpose:
        x_train = x_train.transpose(0, 2, 3, 1)
        x_test = x_test.transpose(0, 2, 3, 1)
    return (x_train, y_train), (x_test, y_test)

In [None]:
# Download CIFAR dataset
m = load_cifar10()

## Part 1. Data Preparation [7 pt]

To start off run the above code to load the CIFAR dataset and then work through the following questions/tasks. 

### Part (a) [1pt]
Verify that the dataset has loaded correctly. How many samples do we have? How is the data organized?

Answer#

There are 60000 samples in total, spread over 10 classes. The data is divided into a training set of 50000 samples and a test set of 10000 samples. Every image has 3 channels (RGB) and a size of 32x32 pixels.

In [None]:
# code to examine the dataset
train_data = m[0]
test_data = m[1]
train_sample = train_data[0]
train_lables = train_data[1]
test_sample = test_data[0]
test_lables = test_data[1]

print('Train Sample Size ', train_sample.shape)
print('Test Sample Size ', test_sample.shape)

In [None]:
train_sample[1]

In [None]:
plt.figure(figsize=(20, 10))
for i, item in enumerate(train_sample):
        if i >= 9: break
        plt.subplot(1, 9, i+1)
        # images are stored channel-first (3, 32, 32); imshow expects (32, 32, 3)
        plt.imshow(np.transpose(item, (1, 2, 0)))

In [None]:
print(train_lables)
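A per-class count is a quick additional sanity check on how the data is organized. The sketch below uses a small synthetic label array in place of the real labels, so the function name `class_counts` and the numbers are illustrative only; with the real data one would pass the label array loaded above.

```python
import numpy as np

def class_counts(labels, num_classes=10):
    """Return how many samples fall in each class.

    labels: integer array of shape (N, 1) or (N,), as returned by load_cifar10.
    """
    return np.bincount(np.asarray(labels).ravel(), minlength=num_classes)

# Synthetic stand-in for the (N, 1) label array: 3 samples for each of 10 classes.
fake_labels = np.repeat(np.arange(10, dtype="uint8"), 3).reshape(-1, 1)
counts = class_counts(fake_labels)
print(counts)
```

On the real training labels, a balanced dataset should show the same count for every class.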

### Part (b) [2pt]
Preprocess the data to select only images of horses. Learning to generate only horse images will make our task easier. Your function will also convert the colour images to greyscale to create our input data.

In [None]:
import torch
from torchvision import datasets, transforms
import matplotlib.pyplot as plt

In [None]:
# select a single category.
HORSE_CATEGORY = 7

# convert colour images into greyscale
def process(xs, ys, max_pixel=256.0, downsize_input=False):
    """
    Pre-process CIFAR10 images by taking only the horse category,
    shuffling, and have colour values be bound between 0 and 1

    Args:
      xs: the colour RGB pixel values
      ys: the category labels
      max_pixel: maximum pixel value in the original data
    Returns:
      xs: value normalized and shuffled colour images
      grey: greyscale images, also normalized so values are between 0 and 1
    """
    xs = xs / max_pixel
    xs = xs[np.where(ys == HORSE_CATEGORY)[0], :, :, :]
    npr.shuffle(xs)

    grey = np.mean(xs, axis=1, keepdims=True)

    if downsize_input:
        downsize_module = nn.Sequential(
            nn.AvgPool2d(2),
            nn.AvgPool2d(2),
            nn.Upsample(scale_factor=2),
            nn.Upsample(scale_factor=2),
        )
        xs_downsized = downsize_module.forward(torch.from_numpy(xs).float())
        xs_downsized = xs_downsized.data.numpy()
        return (xs, xs_downsized)
    else:
        return (xs, grey)


Printing processed data to verify the images

In [None]:
x_col, x_grey = process(train_sample,train_lables)

In [None]:
fig = plt.figure(figsize=(25, 4))
for idx in np.arange(9):
    ax = fig.add_subplot(2, 9, idx+1)
    plt.imshow(np.transpose(x_col[idx], (1, 2, 0)))


In [None]:
plt.figure(figsize=(25, 4))
for i, item in enumerate(x_grey):
        if i >= 9: break
        plt.subplot(1, 9, i+1)
        plt.imshow(item[0], 'gray')
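A check on shapes and value ranges complements the visual inspection. The following is a minimal sketch on synthetic data that mirrors the core steps of `process` (class selection, scaling, channel averaging) without the shuffling; `select_and_grey` is a hypothetical helper written for this illustration.

```python
import numpy as np

HORSE_CATEGORY = 7

def select_and_grey(xs, ys, category=HORSE_CATEGORY, max_pixel=256.0):
    """Keep one category, scale to [0, 1), and average channels to greyscale."""
    keep = np.where(ys == category)[0]          # row indices of the chosen class
    colour = xs[keep] / max_pixel               # (M, 3, H, W), floats in [0, 1)
    grey = colour.mean(axis=1, keepdims=True)   # (M, 1, H, W)
    return colour, grey

rng = np.random.default_rng(0)
xs = rng.integers(0, 256, size=(20, 3, 32, 32), dtype=np.uint8)
ys = np.tile(np.arange(10), 2).reshape(-1, 1)   # each class appears twice

colour, grey = select_and_grey(xs, ys)
print(colour.shape, grey.shape)
```

The greyscale array keeps a singleton channel dimension, which is what the convolutional models later in the lab expect as input.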

### Part (c) [2pt]
Create a dataloader (or function) to batch the samples.

In [None]:
# dataloader for batching samples

def get_batch(x, y, batch_size):
    """
    Generator that yields batches of data

    Args:
      x: input values
      y: output values
      batch_size: size of each batch
    Yields:
      batch_x: a batch of inputs of size at most batch_size
      batch_y: a batch of outputs of size at most batch_size
    """
    N = np.shape(x)[0]
    assert N == np.shape(y)[0]
    for i in range(0, N, batch_size):
        batch_x = x[i : i + batch_size, :, :, :]
        batch_y = y[i : i + batch_size, :, :, :]
        yield (batch_x, batch_y)

In [None]:
xs, ys = next(iter(get_batch(x_grey, x_col, 10)))

In [None]:
type(xs)
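The generator above yields NumPy arrays in a fixed order. As an alternative, PyTorch's own `DataLoader` can batch and reshuffle the data every epoch. The sketch below assumes synthetic stand-in arrays with the same layout as the greyscale and colour data; it is not the approach used in the rest of this notebook.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-ins for the greyscale inputs and colour targets.
grey = np.random.rand(25, 1, 32, 32).astype(np.float32)
colour = np.random.rand(25, 3, 32, 32).astype(np.float32)

dataset = TensorDataset(torch.from_numpy(grey), torch.from_numpy(colour))
# shuffle=True draws a new sample order every epoch.
loader = DataLoader(dataset, batch_size=10, shuffle=True)

batch_shapes = [(xb.shape, yb.shape) for xb, yb in loader]
print(batch_shapes)
```

As with `get_batch`, the last batch is smaller when the dataset size is not a multiple of the batch size.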

### Part (d) [2pt]
Verify and visualize that we are able to generate different batches of data.

In [None]:
# code to load different batches of horse dataset

print("Loading data...")
(x_train, y_train), (x_test, y_test) = load_cifar10()

print("Transforming data...")
train_rgb, train_grey = process(x_train, y_train)
test_rgb, test_grey = process(x_test, y_test)


In [None]:
# shape of data and labels before selection
print(x_train.shape, y_train.shape)

In [None]:
# shape of training data
print('Training Data: ', train_rgb.shape, train_grey.shape)
# shape of testing data
print('Testing Data: ', test_rgb.shape, test_grey.shape)

Load Batches

In [None]:
# obtain batches of images
xs, ys = next(iter(get_batch(train_grey, train_rgb, 10)))
print(xs.shape, ys.shape)

In [None]:
# obtain batches of images
xstest, ystest = next(iter(get_batch(test_grey, test_rgb, 10)))
print(xstest.shape, ystest.shape)

Visualization

In [None]:
# visualize 5 train/test images
fig = plt.figure(figsize=(25, 4))
for idx in np.arange(5):
    ax = fig.add_subplot(2, 9, idx+1)
    plt.imshow(np.transpose(ys[idx], (1, 2, 0)))

In [None]:
plt.figure(figsize=(25, 4))
for i, item in enumerate(xs):
        if i >= 5: break
        plt.subplot(1, 9, i+1)
        plt.imshow(item[0], 'gray')

In [None]:
# visualize 5 train/test images
fig = plt.figure(figsize=(25, 4))
for idx in np.arange(5):
    ax = fig.add_subplot(2, 9, idx+1)
    plt.imshow(np.transpose(ystest[idx], (1, 2, 0)))

In [None]:
plt.figure(figsize=(25, 4))
for i, item in enumerate(xstest):
        if i >= 5: break
        plt.subplot(1, 9, i+1)
        plt.imshow(item[0], 'gray')

## Part 2. Colourization as Regression [5 pt]

There are many ways to frame the problem of image colourization as a machine learning problem. One naive approach is to frame it as a regression problem, where we build a model to predict the RGB intensities at each pixel given the greyscale input. In this case, the outputs are continuous, and so squared error can be used to train the model.
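To make the squared-error objective concrete, the sketch below computes the mean squared error by hand and compares it with `nn.MSELoss`, the criterion used by the training code in this part. The tensors are random stand-ins for predicted and ground-truth RGB batches.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
pred = torch.rand(4, 3, 32, 32)    # stand-in for predicted RGB values
target = torch.rand(4, 3, 32, 32)  # stand-in for ground-truth RGB values

# Squared error averaged over every pixel and channel in the batch.
manual_mse = ((pred - target) ** 2).mean()
# nn.MSELoss uses reduction="mean" by default, which is the same quantity.
library_mse = nn.MSELoss()(pred, target)
print(manual_mse.item(), library_mse.item())
```

Because the error is averaged per pixel, a prediction that hedges between several plausible colours can score well, which is one reason regression outputs tend to look washed out.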

In this section, you will get familiar with training neural networks using cloud GPUs. Run the helper code and answer the questions that follow.

#### Helper Code

Regression Architecture

In [None]:
class RegressionCNN(nn.Module):
    def __init__(self, kernel, num_filters):
        # first call parent's initialization function
        super().__init__()
        padding = kernel // 2

        self.downconv1 = nn.Sequential(
            nn.Conv2d(1, num_filters, kernel_size=kernel, padding=padding),
            nn.BatchNorm2d(num_filters),
            nn.ReLU(),
            nn.MaxPool2d(2),)
        self.downconv2 = nn.Sequential(
            nn.Conv2d(num_filters, num_filters*2, kernel_size=kernel, padding=padding),
            nn.BatchNorm2d(num_filters*2),
            nn.ReLU(),
            nn.MaxPool2d(2),)

        self.rfconv = nn.Sequential(
            nn.Conv2d(num_filters*2, num_filters*2, kernel_size=kernel, padding=padding),
            nn.BatchNorm2d(num_filters*2),
            nn.ReLU())

        self.upconv1 = nn.Sequential(
            nn.Conv2d(num_filters*2, num_filters, kernel_size=kernel, padding=padding),
            nn.BatchNorm2d(num_filters),
            nn.ReLU(),
            nn.Upsample(scale_factor=2),)
        self.upconv2 = nn.Sequential(
            nn.Conv2d(num_filters, 3, kernel_size=kernel, padding=padding),
            nn.BatchNorm2d(3),
            nn.ReLU(),
            nn.Upsample(scale_factor=2),)
        self.finalconv = nn.Conv2d(3, 3, kernel_size=kernel, padding=padding)

    def forward(self, x):
        out = self.downconv1(x)
        out = self.downconv2(out)
        out = self.rfconv(out)
        out = self.upconv1(out)
        out = self.upconv2(out)
        out = self.finalconv(out)
        return out

Training code

In [None]:
class AttrDict(dict):
    def __init__(self, *args, **kwargs):
        super(AttrDict, self).__init__(*args, **kwargs)
        self.__dict__ = self

def get_torch_vars(xs, ys, gpu=False):
    """
    Helper function to convert numpy arrays to pytorch tensors.
    If GPU is used, move the tensors to GPU.

    Args:
      xs (float numpy tensor): greyscale input
      ys (float numpy tensor): rgb targets
      gpu (bool): whether to move pytorch tensor to GPU
    Returns:
      Variable(xs), Variable(ys)
    """
    xs = torch.from_numpy(xs).float()
    ys = torch.from_numpy(ys).float()
    if gpu:
        xs = xs.cuda()
        ys = ys.cuda()
    return Variable(xs), Variable(ys)

def train(args, gen=None):

    # Numpy random seed
    npr.seed(args.seed)

    # Save directory
    save_dir = "outputs/" + args.experiment_name

    # LOAD THE MODEL
    if gen is None:
        Net = globals()[args.model]
        gen = Net(args.kernel, args.num_filters)

    # LOSS FUNCTION
    criterion = nn.MSELoss()
    optimizer = torch.optim.Adam(gen.parameters(), lr=args.learn_rate)

    # DATA
    print("Loading data...")
    (x_train, y_train), (x_test, y_test) = load_cifar10()

    print("Transforming data...")
    train_rgb, train_grey = process(x_train, y_train, downsize_input=args.downsize_input)
    test_rgb, test_grey = process(x_test, y_test, downsize_input=args.downsize_input)

    # Create the outputs folder if not created already
    if not os.path.exists(save_dir):
        os.makedirs(save_dir)

    print("Beginning training ...")
    if args.gpu:
        gen.cuda()
    start = time.time()

    iters = []
    train_losses = []
    valid_losses = []
    valid_accs = []
    n = 0 # the number of iterations
    for epoch in range(args.epochs):
        # Train the Model
        gen.train()  # Change model to 'train' mode
        losses = []
        for i, (xs, ys) in enumerate(get_batch(train_grey, train_rgb, args.batch_size)):
            images, labels = get_torch_vars(xs, ys, args.gpu)
            # Forward + Backward + Optimize
            optimizer.zero_grad()
            outputs = gen(images)

            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            losses.append(loss.data.item())
        
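            # Evaluate on the held-out horses in eval mode, without tracking
            # gradients, so BatchNorm statistics and memory use are unaffected.
            gen.eval()
            with torch.no_grad():
                imagesval, labelsval = get_torch_vars(test_grey, test_rgb, args.gpu)
                outputval = gen(imagesval)
                lossval = criterion(outputval, labelsval)
            gen.train()
            iters.append(n)
            train_losses.append(loss.item())
            valid_losses.append(lossval.item())
            n += 1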

        print(epoch, loss.cpu().detach())
        if args.plot:
          visual(images, labels, outputs, args.gpu, 1)
    
    plt.title("Training Curve")
    plt.plot(iters, train_losses, label="Train")
    plt.plot(iters, valid_losses, label="Validation")
    plt.xlabel("Iterations")
    plt.ylabel("MSE Loss")
    plt.legend(loc='best')
    plt.show()

    print("Final Training Loss: {}".format(train_losses[-1]))
    print("Final Validation Loss: {}".format(valid_losses[-1]))

    return gen

Training visualization code

In [None]:
# visualize 5 train/test images
def visual(img_grey, img_real, img_fake, gpu = 0, flag_torch = 0):

  if gpu:
    img_grey = img_grey.cpu().detach()
    img_real = img_real.cpu().detach()
    img_fake = img_fake.cpu().detach()

  if flag_torch:
    img_grey = img_grey.numpy()
    img_real = img_real.numpy()
    img_fake = img_fake.numpy()

  if flag_torch == 2:
    img_real = np.transpose(img_real[:, :, :, :, :], [0, 4, 2, 3, 1]).squeeze()
    img_fake = np.transpose(img_fake[:, :, :, :, :], [0, 4, 2, 3, 1]).squeeze()

  #correct image structure
  img_grey = np.transpose(img_grey[:5, :, :, :], [0, 2, 3, 1]).squeeze()
  img_real = np.transpose(img_real[:5, :, :, :], [0, 2, 3, 1])
  img_fake = np.transpose(img_fake[:5, :, :, :], [0, 2, 3, 1])

  for i in range(5):
      ax = plt.subplot(3, 5, i + 1)
      ax.imshow(img_grey[i], cmap='gray')
      ax.axis("off")
      ax = plt.subplot(3, 5, i + 1 + 5)
      ax.imshow(img_real[i])
      ax.axis("off")
      ax = plt.subplot(3, 5, i + 1 + 10)
      ax.imshow(img_fake[i])
      ax.axis("off")
  plt.show()

Main training loop for regression CNN

In [None]:
#Main training loop for CNN
args = AttrDict()
args_dict = {
    "gpu": True,
    "valid": False,
    "checkpoint": "",
    "colours": "./data/colours/colour_kmeans24_cat7.npy",
    "model": "RegressionCNN",
    "kernel": 3,
    "num_filters": 32,
    'learn_rate':0.001, 
    "batch_size": 100,
    "epochs": 25,
    "seed": 0,
    "plot": True,
    "experiment_name": "colourization_cnn",
    "visualize": False,
    "downsize_input": False,
}

args.update(args_dict)
cnn = train(args)

## Part (c) Answers

In [None]:
#Main training loop for CNN
args = AttrDict()
args_dict = {
    "gpu": True,
    "valid": False,
    "checkpoint": "",
    "colours": "./data/colours/colour_kmeans24_cat7.npy",
    "model": "RegressionCNN",
    "kernel": 3,
    "num_filters": 64,
    'learn_rate':0.01, 
    "batch_size": 100,
    "epochs": 30,
    "seed": 0,
    "plot": True,
    "experiment_name": "colourization_cnn",
    "visualize": False,
    "downsize_input": False,
}

args.update(args_dict)
cnn = train(args)

In [None]:
#Main training loop for CNN
args = AttrDict()
args_dict = {
    "gpu": True,
    "valid": False,
    "checkpoint": "",
    "colours": "./data/colours/colour_kmeans24_cat7.npy",
    "model": "RegressionCNN",
    "kernel": 3,
    "num_filters": 64,
    'learn_rate':0.001, 
    "batch_size": 50,
    "epochs": 30,
    "seed": 0,
    "plot": True,
    "experiment_name": "colourization_cnn",
    "visualize": False,
    "downsize_input": False,
}

args.update(args_dict)
cnn = train(args)

In [None]:
#Main training loop for CNN
args = AttrDict()
args_dict = {
    "gpu": True,
    "valid": False,
    "checkpoint": "",
    "colours": "./data/colours/colour_kmeans24_cat7.npy",
    "model": "RegressionCNN",
    "kernel": 3,
    "num_filters": 64,
    'learn_rate':0.01, 
    "batch_size": 100,
    "epochs": 50,
    "seed": 0,
    "plot": True,
    "experiment_name": "colourization_cnn",
    "visualize": False,
    "downsize_input": False,
}

args.update(args_dict)
cnn = train(args)

In [None]:
#Main training loop for CNN
args = AttrDict()
args_dict = {
    "gpu": True,
    "valid": False,
    "checkpoint": "",
    "colours": "./data/colours/colour_kmeans24_cat7.npy",
    "model": "RegressionCNN",
    "kernel": 3,
    "num_filters": 64,
    'learn_rate':0.001, 
    "batch_size": 75,
    "epochs": 30,
    "seed": 0,
    "plot": True,
    "experiment_name": "colourization_cnn",
    "visualize": False,
    "downsize_input": False,
}

args.update(args_dict)
cnn = train(args)

### Part (a) [1 pt]
Describe the model RegressionCNN. How many convolution layers does it have? What are the filter sizes and number of filters at each layer? Construct a table or draw a diagram.

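Answer: The model has 6 convolution layers, all with a 3x3 kernel and padding 1. Every layer except the last is followed by batch normalization and ReLU. The first two layers are each followed by 2x2 max pooling, the third (bottleneck) layer keeps the resolution, and the fourth and fifth layers are each followed by 2x upsampling. With `num_filters = 32` the layers are:

| Layer | Input channels | Output filters | Kernel | Followed by |
|---|---|---|---|---|
| downconv1 | 1 | 32 | 3x3 | BatchNorm, ReLU, MaxPool(2) |
| downconv2 | 32 | 64 | 3x3 | BatchNorm, ReLU, MaxPool(2) |
| rfconv | 64 | 64 | 3x3 | BatchNorm, ReLU |
| upconv1 | 64 | 32 | 3x3 | BatchNorm, ReLU, Upsample(2) |
| upconv2 | 32 | 3 | 3x3 | BatchNorm, ReLU, Upsample(2) |
| finalconv | 3 | 3 | 3x3 | (none) |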
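As a worked check on the layer sizes, the number of learnable parameters in each convolution can be computed as (k · k · C_in + 1) · C_out, where the +1 accounts for the bias. The sketch below applies this to the channel sizes read off the `RegressionCNN` definition with `num_filters = 32`; it counts convolution parameters only and leaves out the batch normalization parameters. The helper `conv_params` is written for this illustration.

```python
def conv_params(in_ch, out_ch, kernel):
    """Weights plus biases of one Conv2d layer with a square kernel."""
    return (kernel * kernel * in_ch + 1) * out_ch

kernel, nf = 3, 32
# (name, in_channels, out_channels), read off the RegressionCNN definition.
layers = [
    ("downconv1", 1, nf),
    ("downconv2", nf, 2 * nf),
    ("rfconv", 2 * nf, 2 * nf),
    ("upconv1", 2 * nf, nf),
    ("upconv2", nf, 3),
    ("finalconv", 3, 3),
]
counts = {name: conv_params(i, o, kernel) for name, i, o in layers}
total = sum(counts.values())
for name, n in counts.items():
    print(f"{name:10s} {n:6d}")
print("total conv parameters:", total)
```

Most of the parameters sit in the middle of the network, where the channel counts are largest.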

### Part (b) [1 pt]
Run the regression training code (should run without errors). This will generate some images. How many epochs are we training the CNN model in the given setting?

Answer: The code ran without errors, and in the given setting the model is trained for 25 epochs.

### Part (c) [3 pt]
Re-train a couple of new models using a different number of training epochs. You may train each new model in a new code cell by copying and modifying the code from the last notebook cell. Comment on how the results (output images, training loss) change as we increase or decrease the number of epochs.

*Answer*: The model was retrained several times with different learning rates, numbers of epochs, and batch sizes; see the runs above. In total, the model was run 3 separate times. A batch size of 100 gave a better loss than the smaller batch sizes, so in these runs a larger batch resulted in a lower loss. Increasing the number of epochs also led to better predictions. The lowest loss was achieved with batch size 100, learning rate 0.01, and 30 epochs: a training loss of 0.006165 and a testing loss of 0.005752.

## Part 3. Skip Connections [8 pt]
A skip connection in a neural network is a connection that skips one or more layers and connects to a later layer. We will now introduce skip connections into our model.

### Part (a) [4 pt]
Add a skip connection from the first layer to the last, second layer to the second last, etc.
That is, the final convolution should have both the output of the previous layer and the initial greyscale input as input. This type of skip-connection is introduced by [3], and is called a "UNet". Following the CNN class that you have completed, complete the `__init__` and `forward` methods of the UNet class.
Hint: You will need to use the function torch.cat.
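The hint can be illustrated with a small shape check. The sketch below concatenates two random tensors along the channel dimension, the way a skip connection joins a decoder output with an earlier feature map; the tensor names and sizes are illustrative.

```python
import torch

decoder_features = torch.rand(8, 3, 32, 32)  # e.g. output of the last up block
grey_input = torch.rand(8, 1, 32, 32)        # the original greyscale input

# dim=1 is the channel dimension in PyTorch's (N, C, H, W) layout.
merged = torch.cat((decoder_features, grey_input), dim=1)
print(merged.shape)
```

The layer that consumes the concatenated tensor must accept the summed channel count (here 3 + 1 = 4), and the two tensors must agree in batch size, height, and width.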

In [None]:
#complete the code

class UNet(nn.Module):
    def __init__(self, kernel, num_filters, num_colours=3, num_in_channels=1):
        super().__init__()

        # Useful parameters
        stride = 2
        padding = kernel // 2
        output_padding = 1

        self.downconv1 = nn.Sequential(
            nn.Conv2d(1, num_filters, kernel_size=kernel, padding=padding),
            nn.BatchNorm2d(num_filters),
            nn.ReLU(),
            nn.MaxPool2d(2),)
        
        self.downconv2 = nn.Sequential(
            nn.Conv2d(num_filters, num_filters*2, kernel_size=kernel, padding=padding),
            nn.BatchNorm2d(num_filters*2),
            nn.ReLU(),
            nn.MaxPool2d(2),)

        self.rfconv = nn.Sequential(
            nn.Conv2d(num_filters*2, num_filters*2, kernel_size=kernel, padding=padding),
            nn.BatchNorm2d(num_filters*2),
            nn.ReLU())

        self.upconv1 = nn.Sequential(
            nn.Conv2d(num_filters*4, num_filters, kernel_size=kernel, padding=padding),
            nn.BatchNorm2d(num_filters),
            nn.ReLU(),
            nn.Upsample(scale_factor=2),)
        
        self.upconv2 = nn.Sequential(
            nn.Conv2d(num_filters*2, 3, kernel_size=kernel, padding=padding),
            nn.BatchNorm2d(3),
            nn.ReLU(),
            nn.Upsample(scale_factor=2),)
        
        self.finalconv = nn.Conv2d(3+1, 3, kernel_size=kernel, padding=padding)


    def forward(self, x):
        out_1 = self.downconv1(x)
        out_2 = self.downconv2(out_1)
        out_3 = self.rfconv(out_2)
        out_4 = self.upconv1(torch.cat((out_3, out_2),1))
        out_5 = self.upconv2(torch.cat((out_4, out_1),1))
        out_6 = self.finalconv(torch.cat((out_5, x),1))
        
        return out_6

### Part (b) [2 pt]
Train the "UNet" model for the same number of epochs as the previous CNN and plot the training curve using a batch size of 100. How does the result compare to the previous model? Did skip connections improve the validation loss and accuracy? Did the skip connections improve the output qualitatively? How? Give at least two reasons why skip connections might improve the performance of our CNN models.

Answer:

The result improved significantly: the skip connections improved both the loss and the appearance of the images. Since the model is trained and evaluated with MSE, the loss is the only metric reported: the final validation loss was 0.007986 for the previous model and dropped to 0.005397 for this one. Image quality also improved. The images were less blurry, and more colour can be seen after the same number of epochs.

Skip connections might improve performance for two reasons. First, they pass information directly between layers, so the model can learn whether to rely on the features from the earlier layer or on those from the immediately preceding layer. Second, max pooling discards some information; skip connections let that information bypass the pooling, so the later layers receive more detail, which improves the accuracy of the model.
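The claim that max pooling discards information can be checked on a toy example. In the sketch below, two different 2x2 inputs pool to the same value, so the position of the bright pixel cannot be recovered from the pooled output alone; `max_pool_2x2` is a small NumPy helper written for this illustration, not the layer used in the models.

```python
import numpy as np

def max_pool_2x2(img):
    """2x2 max pooling with stride 2 on a single (H, W) array."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

a = np.array([[1.0, 0.0],
              [0.0, 0.0]])
b = np.array([[0.0, 0.0],
              [0.0, 1.0]])

# The bright pixel sits in a different corner, yet both pool to the same value.
pooled_a = max_pool_2x2(a)
pooled_b = max_pool_2x2(b)
print(pooled_a, pooled_b)
```

A skip connection from a layer before the pooling gives later layers access to the fine spatial detail that the pooled path no longer carries.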

In [None]:
# Main training loop for UNet
args = AttrDict()
args_dict = {
    "gpu": True,
    "valid": False,
    "checkpoint": "",
    "colours": "./data/colours/colour_kmeans24_cat7.npy",
    "model": "UNet",
    "kernel": 3,
    "num_filters": 32,
    'learn_rate':0.001, 
    "batch_size": 100,
    "epochs": 25,
    "seed": 0,
    "plot": True,
    "experiment_name": "colourization_cnn",
    "visualize": False,
    "downsize_input": False,
}
args.update(args_dict)
# train() returns only the trained model
cnn = train(args)

### Part (c) [2 pt]
Re-train a few more "UNet" models using different mini batch sizes with a fixed number of epochs. Describe the effect of batch sizes on the training/validation loss, and the final image output.

Answer: The batch size affects both the loss and the image quality. In these runs, the smaller the batch size, the better the image quality and the lower the loss.
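One possible contributing factor is that, with a fixed number of epochs, a smaller batch size means more optimizer updates. The sketch below works through the arithmetic, assuming the 5000 horse images in the CIFAR-10 training set and the 25 epochs used for the UNet runs in this part; it does not by itself prove that the number of updates is what drives the difference.

```python
import math

num_train = 5000   # CIFAR-10 training images per class (horses included)
epochs = 25        # fixed, as in the UNet runs in this part

for batch_size in (25, 50, 100, 200):
    steps_per_epoch = math.ceil(num_train / batch_size)
    total_steps = steps_per_epoch * epochs
    print(f"batch_size={batch_size:3d}  steps/epoch={steps_per_epoch:3d}  total={total_steps}")

updates = {b: math.ceil(num_train / b) * epochs for b in (25, 50, 100, 200)}
```

A fairer comparison across batch sizes would hold the total number of updates roughly constant, or adjust the learning rate along with the batch size.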

In [None]:
# Main training loop for UNet
args = AttrDict()
args_dict = {
    "gpu": True,
    "valid": False,
    "checkpoint": "",
    "colours": "./data/colours/colour_kmeans24_cat7.npy",
    "model": "UNet",
    "kernel": 3,
    "num_filters": 32,
    'learn_rate':0.001, 
    "batch_size": 50,
    "epochs": 25,
    "seed": 0,
    "plot": True,
    "experiment_name": "colourization_cnn",
    "visualize": False,
    "downsize_input": False,
}
args.update(args_dict)
cnn = train(args)

In [None]:
# Main training loop for UNet
args = AttrDict()
args_dict = {
    "gpu": True,
    "valid": False,
    "checkpoint": "",
    "colours": "./data/colours/colour_kmeans24_cat7.npy",
    "model": "UNet",
    "kernel": 3,
    "num_filters": 32,
    'learn_rate':0.001, 
    "batch_size": 25,
    "epochs": 25,
    "seed": 0,
    "plot": True,
    "experiment_name": "colourization_cnn",
    "visualize": False,
    "downsize_input": False,
}
args.update(args_dict)
cnn = train(args)

In [None]:
# Main training loop for UNet
args = AttrDict()
args_dict = {
    "gpu": True,
    "valid": False,
    "checkpoint": "",
    "colours": "./data/colours/colour_kmeans24_cat7.npy",
    "model": "UNet",
    "kernel": 3,
    "num_filters": 32,
    'learn_rate':0.001, 
    "batch_size": 200,
    "epochs": 25,
    "seed": 0,
    "plot": True,
    "experiment_name": "colourization_cnn",
    "visualize": False,
    "downsize_input": False,
}
args.update(args_dict)
cnn = train(args)

# PART B - Conditional GAN [30 pt]

In this second half of the assignment we will construct a conditional generative adversarial network for our image colourization task.

## Part 1. Conditional GAN [15 pt]

To start, we will modify the previous sample code to construct and train a conditional GAN. We will explore different architectures to identify and select our best image colourization model.

Note: This second half of the assignment should be started after the lecture on generative adversarial networks (GANs). 


### Part (a) [3 pt]
Modify the provided training code to implement a generator. Then test to verify it works on the desired input (Hint: you can reuse some of your earlier autoencoder models here to act as a generator)

In [None]:
%matplotlib inline
import torch
import torch.nn as nn
import pandas as pd
import numpy as np
from torchvision import transforms
from torchvision import datasets

from torch.utils.data import Dataset, DataLoader
from PIL import Image
from torch import autograd
from torch.autograd import Variable
from torchvision.utils import make_grid
import matplotlib.pyplot as plt

In [None]:
class Generator(nn.Module):
    def __init__(self, kernel, num_filters, num_colours=3, num_in_channels=1):
        super().__init__()

        # Useful parameters
        stride = 2
        padding = kernel // 2
        output_padding = 1

        self.downconv1 = nn.Sequential(
            nn.Conv2d(1, num_filters, kernel_size=kernel, padding=padding),
            nn.BatchNorm2d(num_filters),
            nn.ReLU(),
            nn.MaxPool2d(2),)
        
        self.downconv2 = nn.Sequential(
            nn.Conv2d(num_filters, num_filters*2, kernel_size=kernel, padding=padding),
            nn.BatchNorm2d(num_filters*2),
            nn.ReLU(),
            nn.MaxPool2d(2),)

        self.rfconv = nn.Sequential(
            nn.Conv2d(num_filters*2, num_filters*2, kernel_size=kernel, padding=padding),
            nn.BatchNorm2d(num_filters*2),
            nn.ReLU())

        self.upconv1 = nn.Sequential(
            nn.Conv2d(num_filters*4, num_filters, kernel_size=kernel, padding=padding),
            nn.BatchNorm2d(num_filters),
            nn.ReLU(),
            nn.Upsample(scale_factor=2),)
        
        self.upconv2 = nn.Sequential(
            nn.Conv2d(num_filters*2, 3, kernel_size=kernel, padding=padding),
            nn.BatchNorm2d(3),
            nn.ReLU(),
            nn.Upsample(scale_factor=2),)
        
        self.finalconv = nn.Conv2d(3+1, 3, kernel_size=kernel, padding=padding)
        
        ############### YOUR CODE GOES HERE ############### 
        ###################################################
 



    def forward(self, x):
        out_1 = self.downconv1(x)
        out_2 = self.downconv2(out_1)
        out_3 = self.rfconv(out_2)
        out_4 = self.upconv1(torch.cat((out_3, out_2),1))
        out_5 = self.upconv2(torch.cat((out_4, out_1),1))
        out_6 = self.finalconv(torch.cat((out_5, x),1))
        
        ############### YOUR CODE GOES HERE ###############
        ###################################################
        return out_6

Testing the Generator

In [None]:
#test generator architecture
#provide arguments to create generator architecture
g = Generator(3,16,24,1)
#create a sample mini-batch of greyscale images
img_greyscale = torch.rand(100,1,32,32)
#generate fake image
img_fake = g(img_greyscale)

In [None]:
#verify output dimensions are correct 
print(img_fake.shape)
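
The expected output shape can be checked by hand before running the model. The sketch below traces the channel count and spatial size through each stage of the generator above. It assumes an odd kernel size, so that `padding = kernel // 2` preserves the spatial size; the helper name `trace_generator_shapes` is hypothetical and not part of the lab code.

```python
def trace_generator_shapes(num_filters, out_channels, size=32, in_channels=1):
    """Return (stage, channels, side length) for each stage of the generator."""
    nf = num_filters
    shapes = [("input", in_channels, size)]
    # Each down block: conv keeps the size, MaxPool2d(2) halves it
    shapes.append(("downconv1", nf, size // 2))
    shapes.append(("downconv2", nf * 2, size // 4))
    # Bottleneck: no pooling, so the size is unchanged
    shapes.append(("rfconv", nf * 2, size // 4))
    # upconv1 receives cat(out_3, out_2): 2*nf + 2*nf input channels
    shapes.append(("upconv1", nf, size // 2))
    # upconv2 receives cat(out_4, out_1): nf + nf input channels
    shapes.append(("upconv2", out_channels, size))
    # finalconv receives cat(out_5, x): out_channels + in_channels input channels
    shapes.append(("finalconv", out_channels, size))
    return shapes


for name, channels, side in trace_generator_shapes(num_filters=16, out_channels=3):
    print(f"{name:10s} -> ({channels}, {side}, {side})")
```

For a mini-batch of 100 greyscale images, the final stage corresponds to an output of shape `(100, 3, 32, 32)`.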

### Part (b) [3 pt]
Modify the provided training code to implement a discriminator. Then test to verify it works on the desired input.

In [None]:
# discriminator code

class Discriminator(nn.Module):
    def __init__(self, kernel, num_filters, num_colours=3, num_in_channels=1):
        super().__init__()
        
        # Useful parameters
        stride = 2
        padding = kernel // 2
        output_padding = 1
        
        self.model = nn.Sequential(
            nn.Linear(4096, 1024),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Dropout(0.3),
            nn.Linear(1024, 512),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Dropout(0.3),
            nn.Linear(512, 256),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Dropout(0.3),
            nn.Linear(256, 1),
            nn.Sigmoid()
        )        
        ############### YOUR CODE GOES HERE ############### 
        ###################################################
    
    
    def forward(self, x, img_greyscale):
        
        x = x.view(x.size(0), -1)
        c = img_greyscale.view(img_greyscale.size(0), -1)
        x = torch.cat([x, c], 1)
        out = self.model(x)
        out =  out.squeeze()
        ############### YOUR CODE GOES HERE ###############
        ###################################################
        return out

Testing the discriminator

In [None]:
# test discriminator architecture
#provide arguments to create Discriminator architecture
d = Discriminator(3,64)
#create a sample mini-batch of greyscale images
img_greyscale = torch.rand(100,1,32,32)
#create a sample mini-batch of rgb images
img_rgb = torch.rand(100,3,32,32)
#compute the discriminator's prediction for each (rgb, greyscale) pair
prediction = d(img_rgb,img_greyscale)

In [None]:
print(prediction.shape)
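
The first `nn.Linear` layer of the discriminator must match the length of the flattened, concatenated input. As a small sketch of that arithmetic (the helper name `discriminator_input_features` is hypothetical):

```python
def discriminator_input_features(colour_channels, grey_channels=1, height=32, width=32):
    """Length of the vector formed by flattening and concatenating both inputs."""
    return (colour_channels + grey_channels) * height * width


# RGB image (3 channels) plus the greyscale condition (1 channel)
print(discriminator_input_features(3))
# Two colour channels plus one condition channel, as in a Lab-style setup
print(discriminator_input_features(2))
```

With 32x32 CIFAR-10 images the RGB case gives 4096, which is the input size used by `nn.Linear(4096, 1024)` above.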

### Part (c) [3 pt]
Modify the provided training code to implement a conditional GAN.

In [None]:
class AttrDict(dict):
    def __init__(self, *args, **kwargs):
        super(AttrDict, self).__init__(*args, **kwargs)
        self.__dict__ = self

def get_torch_vars(xs, ys, gpu=False):
    """
    Helper function to convert numpy arrays to pytorch tensors.
    If GPU is used, move the tensors to GPU.

    Args:
      xs (float numpy tensor): greyscale input
      ys (float numpy tensor): RGB colour targets
      gpu (bool): whether to move pytorch tensor to GPU
    Returns:
      Variable(xs), Variable(ys)
    """
    xs = torch.from_numpy(xs).float()
    ys = torch.from_numpy(ys).float() #--> ADDED for cGAN
    if gpu:
        xs = xs.cuda()
        ys = ys.cuda()
    return Variable(xs), Variable(ys)

def train(args, cnn=None):
    # Set the maximum number of threads to prevent crash in Teaching Labs
    # TODO: necessary?
    torch.set_num_threads(5)
    # Numpy random seed
    npr.seed(args.seed)

    # Save directory
    save_dir = "outputs/" + args.experiment_name

    # LOAD THE COLOURS CATEGORIES

    # INPUT CHANNEL
    num_in_channels = 1 if not args.downsize_input else 3
    # LOAD THE MODEL
    if cnn is None:
        Net = globals()[args.model]
        cnn = Net(args.kernel, args.num_filters)
    # Always create the discriminator, even when a generator is passed in
    discriminator = Discriminator(args.kernel, args.num_filters)


    # LOSS FUNCTION

    criterion = nn.BCELoss()                                                    
    g_optimizer = torch.optim.Adam(cnn.parameters(), lr=args.learn_rate)
    d_optimizer = torch.optim.Adam(discriminator.parameters(), lr=args.learn_rate)

    # DATA
    print("Loading data...")
    (x_train, y_train), (x_test, y_test) = load_cifar10()

    print("Transforming data...")
    train_rgb, train_grey = process(x_train, y_train, downsize_input=args.downsize_input)
    test_rgb, test_grey = process(x_test, y_test, downsize_input=args.downsize_input)

    # Create the outputs folder if not created already
    if not os.path.exists(save_dir):
        os.makedirs(save_dir)

    print("Beginning training ...")
    if args.gpu:
        cnn.cuda()
        discriminator.cuda()
    start = time.time()

    train_losses = []
    valid_losses = []
    valid_accs = []
    for epoch in range(args.epochs):
        # Train the Model
        cnn.train()
        discriminator.train()
        losses = []
 
        for i, (xs, ys) in enumerate(get_batch(train_grey, train_rgb, args.batch_size)):
            images, labels = get_torch_vars(xs, ys, args.gpu)

            #--->ADDED 5
            img_grey = images
            img_real = labels
            batch_size = args.batch_size
            
            #discriminator training
            d_optimizer.zero_grad()

            # train with real images
            real_validity = discriminator(img_real, img_grey)
            # build targets from the prediction so they match its shape and device
            real_loss = criterion(real_validity, torch.ones_like(real_validity))

            # train with fake images (detached so this step only updates the discriminator)
            fake_images = cnn(img_grey)
            fake_validity = discriminator(fake_images.detach(), img_grey)
            fake_loss = criterion(fake_validity, torch.zeros_like(fake_validity))
    
            d_loss = real_loss + fake_loss            

            d_loss.backward()
            d_optimizer.step()

            # generator training
            g_optimizer.zero_grad()
            fake_images = cnn(img_grey)
            validity = discriminator(fake_images, img_grey)
            # targets match the prediction's shape and device
            g_loss = criterion(validity, torch.ones_like(validity))

            g_loss.backward()
            g_optimizer.step()


        # print and visualize
        print(epoch, g_loss.cpu().detach(), d_loss.cpu().detach())
        visual(images, labels, fake_images, args.gpu, 1)

    return cnn
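
The two objectives in the training loop above can be illustrated for a single sample. This is a minimal sketch in plain Python rather than the PyTorch code used in the lab; the discriminator outputs `d_real` and `d_fake` are made-up values chosen for illustration.

```python
import math


def bce(prediction, target):
    """Binary cross-entropy for one probability and one 0/1 target."""
    return -(target * math.log(prediction) + (1 - target) * math.log(1 - prediction))


def discriminator_loss(d_real, d_fake):
    # Real pairs should be scored 1, generated pairs should be scored 0
    return bce(d_real, 1.0) + bce(d_fake, 0.0)


def generator_loss(d_fake):
    # The generator is rewarded when the discriminator scores its output as real
    return bce(d_fake, 1.0)


d_real, d_fake = 0.9, 0.2  # hypothetical discriminator outputs
print(discriminator_loss(d_real, d_fake))
print(generator_loss(d_fake))
```

The generator loss shrinks as `d_fake` approaches 1, which is exactly the case in which the discriminator loss grows.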

### Part (d) [3 pt]
Train a conditional GAN for image colourization.

In [None]:
args = AttrDict()
args_dict = {
    "gpu": True,
    "valid": False,
    "checkpoint": "",
    "colours": "./data/colours/colour_kmeans24_cat7.npy",
    "model": "Generator",
    "kernel": 5,
    "num_filters": 64,
    'learn_rate':0.0001, 
    "batch_size": 200,
    "epochs": 200,
    "seed": 0,
    "plot": False,
    "experiment_name": "colourization_cnn",
    "visualize": False,
    "downsize_input": False,
}
args.update(args_dict)
cnn = train(args)

#batch size of 50 with 100 epochs seemed to work

### Part (e) [1 pt]
How does the performance of the cGAN compare with the autoencoder models that you tested in the first half of this assignment?

Answer: The cGAN's image quality was not as good as the autoencoder's. Its outputs were blurrier, although some images had better colour than the autoencoder produced. One likely reason is the training signal: the autoencoder is trained with a pixel-wise loss against the ground-truth colour image, whereas the cGAN generator here learns only through the discriminator's real/fake judgement. cGANs are much more versatile, however: once trained well, the same framework can be used to create images from scratch or from random noise.

### Part (f) [2 pt]

A colour space is a choice of mapping of colours into three-dimensional coordinates. Some colours could be close together in one colour space, but further apart in others. The RGB colour space is probably the most familiar to you; the model used in our regression colourization example computes squared error in RGB colour space. But most state-of-the-art colourization models do not use RGB colour space. How could using the RGB colour space be problematic? Your answer should relate how human perception of colour differs from the squared distance. You may use the Wikipedia article on colour space to help you answer the question.

Answer:
Human eyes are not equally sensitive to all colours and shades: they are more sensitive to changes in green and less sensitive to changes in blue, and some colours are hard to tell apart at all. Squared error in RGB space ignores this. A shift of a given size in the green channel and a shift of the same size in the blue channel produce the same MSE, yet a human viewer perceives them as changes of different magnitude. Equal distances in RGB space therefore do not correspond to equal perceived differences, so minimizing RGB squared error does not necessarily minimize the error a person would see.
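
A small numerical sketch can illustrate the mismatch. The code below converts sRGB values to CIE L\*a\*b\* using the standard sRGB (D65) formulas and compares two colour shifts that have identical Euclidean distance in RGB. Distance in Lab is used here only as a rough proxy for perceived difference; the specific colours are arbitrary choices for illustration.

```python
import math


def srgb_to_lab(r, g, b):
    """Convert sRGB values in [0, 1] to CIE L*a*b* (D65 white point)."""
    def linearize(c):
        # Undo the sRGB gamma encoding
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = linearize(r), linearize(g), linearize(b)
    # Linear RGB -> CIE XYZ
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b

    def f(t):
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    # Normalize by the D65 reference white before applying f
    fx, fy, fz = f(x / 0.95047), f(y / 1.0), f(z / 1.08883)
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))


def distance(p, q):
    """Euclidean distance between two colour triples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


grey = (0.5, 0.5, 0.5)
greener = (0.5, 0.7, 0.5)  # +0.2 in the green channel
bluer = (0.5, 0.5, 0.7)    # +0.2 in the blue channel

print("RGB distances:", distance(grey, greener), distance(grey, bluer))
de_green = distance(srgb_to_lab(*grey), srgb_to_lab(*greener))
de_blue = distance(srgb_to_lab(*grey), srgb_to_lab(*bluer))
print("Lab distances:", de_green, de_blue)
```

Both shifts have the same RGB distance, but their Lab distances differ, with the green shift coming out larger.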

## Part 2. Exploration [10 pt]

At this point we have trained a few different generative models for our image colourization task with varying results. What makes this work exciting is that there are many other approaches we could take. In this part of the assignment you will be exploring at least one of several approaches towards improving our performance on the image colourization task. Some well-known approaches you can consider include:

- Lab colour space representation instead of RGB, which simplifies the problem and requires you to predict two output channels instead of three
- k-means to represent the RGB colour space by 'k' distinct colours, which effectively changes the problem from regression to classification.

Other interesting approaches include:
- combining L1 loss along with the discriminator-based loss
- starting with a pretrained generator (e.g. ResNet)
- patch discriminator trained on local regions

A great example of some of these different approaches can be found in a <a href="https://towardsdatascience.com/colorizing-black-white-images-with-u-net-and-conditional-gan-a-tutorial-81b2df111cd8">blog post by Moein Shariatnia</a>.

Note you are only required to pick one of the suggested modifications.
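
As an illustration of one option from the list, combining an L1 loss with the discriminator-based loss amounts to adding a weighted pixel-wise term to the generator objective. The sketch below uses plain Python on flat lists of pixel values; the function names and the weight `l1_weight=100.0` are assumptions made for illustration, not values prescribed by this lab.

```python
import math


def l1_loss(predicted, target):
    """Mean absolute error between two equally long sequences of pixel values."""
    return sum(abs(p - t) for p, t in zip(predicted, target)) / len(target)


def combined_generator_loss(d_fake, predicted, target, l1_weight=100.0):
    """Adversarial term plus a weighted L1 term (l1_weight is a hypothetical choice)."""
    adversarial = -math.log(d_fake)  # BCE against a target of 1
    return adversarial + l1_weight * l1_loss(predicted, target)


target = [0.2, 0.4, 0.6]
close = [0.21, 0.39, 0.6]
far = [0.5, 0.1, 0.9]
print(combined_generator_loss(0.5, close, target))
print(combined_generator_loss(0.5, far, target))
```

The L1 term keeps the generated colours close to the ground truth, while the adversarial term rewards outputs that the discriminator scores as real.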

**Answer:**
I will be trying the first option:

Lab colour space representation instead of RGB, which simplifies the problem and requires you to predict two output channels instead of three.

I believe this will perform well because the number of predicted channels is reduced from three to two. The L channel is taken directly from the input, so the same model only has to predict the two colour channels (a and b), which should let it predict colours more accurately.


In [None]:
from skimage.color import rgb2lab, lab2rgb

New process function that takes the training samples in RGB format and converts them to L\*a\*b\*. Parts of the function were taken from the following article: https://towardsdatascience.com/colorizing-black-white-images-with-u-net-and-conditional-gan-a-tutorial-81b2df111cd8

In [None]:
# convert colour images into LAB
def process2(xs, ys, max_pixel=256.0, downsize_input=False, label=True):
    """
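    Pre-process CIFAR10 images by optionally keeping only the horse category,
    shuffling, and converting from RGB to L*a*b*

    Args:
      xs: the colour RGB pixel values, shape (N, 3, 32, 32)
      ys: the category labels
      max_pixel: maximum pixel value in the original data
      downsize_input: unused, kept for compatibility with process()
      label: if True, keep only images labelled HORSE_CATEGORY
    Returns:
      L: lightness channel scaled to [-1, 1], shape (N, 1, 32, 32)
      ab: a and b channels scaled to roughly [-1, 1], shape (N, 2, 32, 32)
    """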
    xs = xs / max_pixel
    if label:
      xs = xs[np.where(ys == HORSE_CATEGORY)[0], :, :, :]
    npr.shuffle(xs)
    xs = xs.transpose(0, 2, 3, 1)
    #print(xs.shape)

    img_lab = rgb2lab(xs).astype("float32") # Converting RGB to L*a*b

    L = np.zeros((len(xs), 32,32,1))
    ab = np.zeros((len(xs), 32,32,2))

    #print(img_lab.shape)

    for i in range(len(xs)):
      L1 = img_lab[i,:,:,:1]
      L[i] = L1 / 50. - 1. # Between -1 and 1
      ab1 = img_lab[i,:,:,1:] 
      ab[i] = ab1 / 110. # Between -1 and 1

    L = np.transpose(L, (0,3,1,2))
    ab = np.transpose(ab, (0,3,1,2))

    return (L, ab)


In [None]:
HORSE_CATEGORY = 7

In [None]:
L, ab = process2(train_sample,train_lables) #converting images to lab colour space

In [None]:
L.shape

In [None]:
ab.shape

In [None]:
fig = plt.figure(figsize=(25, 4))
for idx in np.arange(9):
    ax = fig.add_subplot(2, 9, idx+1)
    plt.imshow(L[idx][0], cmap='gray')

In [None]:
fig = plt.figure(figsize=(25, 4))
for idx in np.arange(9):
    ax = fig.add_subplot(2, 9, idx+1)
    plt.imshow(ab[idx][0])

Function to convert Lab to RGB. Parts of this function are taken from the following article:
https://towardsdatascience.com/colorizing-black-white-images-with-u-net-and-conditional-gan-a-tutorial-81b2df111cd8

In [None]:
def lab_to_rgb(L, ab):
    """
    Takes a batch of images
    """
    L = L.cpu().detach().numpy()
    ab = ab.cpu().detach().numpy()
    L = (L + 1.) * 50.
    ab = ab * 110.
    Lab = np.concatenate((L, ab), axis=1).transpose(0, 2, 3, 1)
    rgb_imgs = []
    for img in Lab:
        img_rgb = lab2rgb(img)
        rgb_imgs.append(img_rgb)
    rgb_imgs = np.stack(rgb_imgs, axis=0)
    rgb_imgs = torch.from_numpy(rgb_imgs)
    return rgb_imgs
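
The scaling in `lab_to_rgb` is the inverse of the scaling applied in `process2`. A minimal sketch of that round trip on a single pixel, using hypothetical helper names:

```python
def normalize_lab(L, a, b):
    """Scale L from [0, 100] to [-1, 1] and a, b by 110, as in process2."""
    return L / 50.0 - 1.0, a / 110.0, b / 110.0


def denormalize_lab(L_n, a_n, b_n):
    """Undo the scaling, as in lab_to_rgb."""
    return (L_n + 1.0) * 50.0, a_n * 110.0, b_n * 110.0


pixel = (53.4, -27.1, 20.8)  # an arbitrary Lab value chosen for illustration
scaled = normalize_lab(*pixel)
restored = denormalize_lab(*scaled)
print(scaled)
print(restored)
```

Keeping the two functions symmetric matters because any mismatch in the constants would shift the colours of every reconstructed image.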

In [None]:
L = torch.from_numpy(L)
ab = torch.from_numpy(ab)

In [None]:
rgb = lab_to_rgb(L, ab) #converting to rgb

In [None]:
L.shape

In [None]:
ab.shape

In [None]:
rgb.shape

In [None]:
fig = plt.figure(figsize=(25, 4))
for idx in np.arange(9):
    ax = fig.add_subplot(2, 9, idx+1)
    plt.imshow(rgb[idx])

In [None]:
%matplotlib inline
import torch
import torch.nn as nn
import pandas as pd
import numpy as np
from torchvision import transforms
from torchvision import datasets

from torch.utils.data import Dataset, DataLoader
from PIL import Image
from torch import autograd
from torch.autograd import Variable
from torchvision.utils import make_grid
import matplotlib.pyplot as plt

In [None]:
import warnings
warnings.filterwarnings("ignore")

The Generator and Discriminator are from the cGAN above. They were modified to match the new dimensions of the dataset.

In [None]:
class Generator(nn.Module):
    def __init__(self, kernel, num_filters, num_colours=3, num_in_channels=1):
        super().__init__()

        # Useful parameters
        stride = 2
        padding = kernel // 2
        output_padding = 1

        self.downconv1 = nn.Sequential(
            nn.Conv2d(1, num_filters, kernel_size=kernel, padding=padding),
            nn.BatchNorm2d(num_filters),
            nn.ReLU(),
            nn.MaxPool2d(2),)
        
        self.downconv2 = nn.Sequential(
            nn.Conv2d(num_filters, num_filters*2, kernel_size=kernel, padding=padding),
            nn.BatchNorm2d(num_filters*2),
            nn.ReLU(),
            nn.MaxPool2d(2),)

        self.rfconv = nn.Sequential(
            nn.Conv2d(num_filters*2, num_filters*2, kernel_size=kernel, padding=padding),
            nn.BatchNorm2d(num_filters*2),
            nn.ReLU())

        self.upconv1 = nn.Sequential(
            nn.Conv2d(num_filters*4, num_filters, kernel_size=kernel, padding=padding),
            nn.BatchNorm2d(num_filters),
            nn.ReLU(),
            nn.Upsample(scale_factor=2),)
        
        self.upconv2 = nn.Sequential(
            nn.Conv2d(num_filters*2, 2, kernel_size=kernel, padding=padding),
            nn.BatchNorm2d(2),
            nn.ReLU(),
            nn.Upsample(scale_factor=2),)
        
        self.finalconv = nn.Conv2d(2+1, 2, kernel_size=kernel, padding=padding)
        
        ############### YOUR CODE GOES HERE ############### 
        ###################################################
 



    def forward(self, x):
        out_1 = self.downconv1(x)
        out_2 = self.downconv2(out_1)
        out_3 = self.rfconv(out_2)
        out_4 = self.upconv1(torch.cat((out_3, out_2),1))
        out_5 = self.upconv2(torch.cat((out_4, out_1),1))
        out_6 = self.finalconv(torch.cat((out_5, x),1))
        
        ############### YOUR CODE GOES HERE ###############
        ###################################################
        return out_6

Testing generator

In [None]:
#test generator architecture
#provide arguments to create generator architecture
g = Generator(3,16,24,1)
#create a sample mini-batch of greyscale images
img_greyscale = torch.rand(100,1,32,32)
#generate fake image
img_fake = g(img_greyscale)

In [None]:
#verify output dimensions are correct 
print(img_fake.shape)

In [None]:
# discriminator code

class Discriminator(nn.Module):
    def __init__(self, kernel, num_filters, num_colours=3, num_in_channels=1):
        super().__init__()
        
        # Useful parameters
        stride = 2
        padding = kernel // 2
        output_padding = 1
        
        self.model = nn.Sequential(
            nn.Linear(3072, 1024),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Dropout(0.3),
            nn.Linear(1024, 512),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Dropout(0.3),
            nn.Linear(512, 256),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Dropout(0.3),
            nn.Linear(256, 1),
            nn.Sigmoid()
        )        
        ############### YOUR CODE GOES HERE ############### 
        ###################################################
    
    
    def forward(self, x, img_greyscale):
        
        x = x.contiguous().view(x.size(0), -1)
        c = img_greyscale.view(img_greyscale.size(0), -1)
        x = torch.cat([x, c], 1)
        out = self.model(x)
        out =  out.squeeze()
        ############### YOUR CODE GOES HERE ###############
        ###################################################
        return out

Testing Discriminator

In [None]:
# test discriminator architecture
#provide arguments to create Discriminator architecture
d = Discriminator(3,64)
#create a sample mini-batch of greyscale images
L_samp = torch.rand(100,1,32,32)
#create a sample mini-batch of ab colour channels
ab_samp = torch.rand(100,2,32,32)
#compute the discriminator's prediction for each (ab, L) pair
prediction = d(ab_samp,L_samp)

In [None]:
print(prediction.shape)

In [None]:
# visualize 5 train/test images
def visual(img_grey, img_real, img_fake, gpu = 0, flag_torch = 0):

  if gpu:
    img_grey = img_grey.cpu().detach()
    img_real = img_real.cpu().detach()
    img_fake = img_fake.cpu().detach()

  if flag_torch:
    img_grey = img_grey.numpy()
    img_real = img_real.numpy()
    img_fake = img_fake.numpy()

  if flag_torch == 2:
    img_real = np.transpose(img_real[:, :, :, :, :], [0, 4, 2, 3, 1]).squeeze()
    img_fake = np.transpose(img_fake[:, :, :, :, :], [0, 4, 2, 3, 1]).squeeze()

  #correct image structure
  img_grey = np.transpose(img_grey[:5, :, :, :], [0, 2, 3, 1]).squeeze()
  img_real = np.transpose(img_real[:5, :, :, :], [0, 1, 2, 3])
  img_fake = np.transpose(img_fake[:5, :, :, :], [0, 1, 2, 3])

  for i in range(5):
      ax = plt.subplot(3, 5, i + 1)
      ax.imshow(img_grey[i], cmap='gray')
      ax.axis("off")
      ax = plt.subplot(3, 5, i + 1 + 5)
      ax.imshow(img_real[i])
      ax.axis("off")
      ax = plt.subplot(3, 5, i + 1 + 10)
      ax.imshow(img_fake[i])
      ax.axis("off")
  plt.show()

Train function for new data type

In [None]:
class AttrDict(dict):
    def __init__(self, *args, **kwargs):
        super(AttrDict, self).__init__(*args, **kwargs)
        self.__dict__ = self

def get_torch_vars(xs, ys, gpu=False):
    """
    Helper function to convert numpy arrays to pytorch tensors.
    If GPU is used, move the tensors to GPU.

    Args:
      xs (float numpy tensor): L channel input
      ys (float numpy tensor): ab channel targets
      gpu (bool): whether to move pytorch tensor to GPU
    Returns:
      Variable(xs), Variable(ys)
    """
    xs = torch.from_numpy(xs).float()
    ys = torch.from_numpy(ys).float() #--> ADDED for cGAN
    if gpu:
        xs = xs.cuda()
        ys = ys.cuda()
    return Variable(xs), Variable(ys)

def train(args, cnn=None):
    # Set the maximum number of threads to prevent crash in Teaching Labs
    # TODO: necessary?
    torch.set_num_threads(5)
    # Numpy random seed
    npr.seed(args.seed)

    # Save directory
    save_dir = "outputs/" + args.experiment_name

    # LOAD THE COLOURS CATEGORIES

    # INPUT CHANNEL
    num_in_channels = 1 if not args.downsize_input else 3
    # LOAD THE MODEL
    if cnn is None:
        Net = globals()[args.model]
        cnn = Net(args.kernel, args.num_filters)
    # Always create the discriminator, even when a generator is passed in
    discriminator = Discriminator(args.kernel, args.num_filters)


    # LOSS FUNCTION

    criterion = nn.BCELoss()                                                    
    g_optimizer = torch.optim.Adam(cnn.parameters(), lr=args.learn_rate)
    d_optimizer = torch.optim.Adam(discriminator.parameters(), lr=args.learn_rate)

    # DATA
    print("Loading data...")
    (x_train, y_train), (x_test, y_test) = load_cifar10()

    print("Transforming data...")
    #L, ab = process2(train_sample,train_lables)
    train_L, train_ab = process2(x_train, y_train)
    test_L, test_ab = process2(x_test, y_test)

    # Create the outputs folder if not created already
    if not os.path.exists(save_dir):
        os.makedirs(save_dir)

    print("Beginning training ...")
    if args.gpu:
        cnn.cuda()
        discriminator.cuda()
    start = time.time()

    train_losses = []
    valid_losses = []
    valid_accs = []
    for epoch in range(args.epochs):
        # Train the Model
        cnn.train()
        discriminator.train()
        losses = []
 
        for i, (xs, ys) in enumerate(get_batch(train_L, train_ab, args.batch_size)):
            images, labels = get_torch_vars(xs, ys, args.gpu)

            #--->ADDED 5
            img_L = images
            #print(img_L.shape)
            img_ab = labels
            batch_size = args.batch_size
            
            #discriminator training
            d_optimizer.zero_grad()

            # train with real images
            real_validity = discriminator(img_ab, img_L)
            # build targets from the prediction so they match its shape and device
            real_loss = criterion(real_validity, torch.ones_like(real_validity))

            # train with fake images (detached so this step only updates the discriminator)
            fake_images = cnn(img_L)
            fake_validity = discriminator(fake_images.detach(), img_L)
            fake_loss = criterion(fake_validity, torch.zeros_like(fake_validity))
    
            d_loss = real_loss + fake_loss            

            d_loss.backward()
            d_optimizer.step()

            # generator training
            g_optimizer.zero_grad()
            fake_images = cnn(img_L)
            validity = discriminator(fake_images, img_L)
            # targets match the prediction's shape and device
            g_loss = criterion(validity, torch.ones_like(validity))

            g_loss.backward()
            g_optimizer.step()


        # print and visualize
        print(epoch, g_loss.cpu().detach(), d_loss.cpu().detach())
        realrgb = lab_to_rgb(images, labels)
        fakergb = lab_to_rgb(images, fake_images)
        visual(images, realrgb, fakergb, args.gpu, 1)

    return cnn

Training CGAN with L.A.B.

In [None]:
args = AttrDict()
args_dict = {
    "gpu": True,
    "valid": False,
    "checkpoint": "",
    "colours": "./data/colours/colour_kmeans24_cat7.npy",
    "model": "Generator",
    "kernel": 3,
    "num_filters": 64,
    'learn_rate':0.0001, 
    "batch_size": 50,
    "epochs": 100,
    "seed": 0,
    "plot": False,
    "experiment_name": "colourization_cnn",
    "visualize": False,
    "downsize_input": False,
}
args.update(args_dict)
cnn = train(args)

#batch size of 50 with 100 epochs seemed to work

## Part 3. New Data [5 pt]
Retrieve sample pictures from online and demonstrate how well your best model performs. Provide all your code.

In [None]:
# mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# resize all images to 32 x 32
data_transform = transforms.Compose([transforms.Resize((32,32)), transforms.ToTensor()])

## Loading Test Images

In [None]:
data_dir = '/content/drive/My Drive/Lab3_images/'
data = datasets.ImageFolder(data_dir, transform=data_transform)

In [None]:
# print out some data stats
print('Num test images: ', len(data))

# show the mapping from folder (class) name to label index
print(data.class_to_idx)

In [None]:
classes = data.class_to_idx['horses']

In [None]:
train_loader = torch.utils.data.DataLoader(dataset=data, batch_size=10, 
                                           shuffle=True)

In [None]:
dataiter = iter(train_loader)
images, labels = next(dataiter)  # DataLoader iterators do not provide a .next() method in recent PyTorch
images = images.numpy() # convert images to numpy for display

## Converting Images to L.A.B Format

In [None]:
l_test, ab_test = process2(images, labels,1, label=False)

L Images

In [None]:
fig = plt.figure(figsize=(25, 4))
for idx in np.arange(10):
    ax = fig.add_subplot(2, 9, idx+1)
    plt.imshow(l_test[idx][0], cmap='gray')

AB Images

In [None]:
fig = plt.figure(figsize=(25, 4))
for idx in np.arange(10):
    ax = fig.add_subplot(2, 9, idx+1)
    plt.imshow(ab_test[idx][0])

In [None]:
l_test = torch.from_numpy(l_test).float()
ab_test = torch.from_numpy(ab_test).float()

Converting L.A.B to RGB Images

In [None]:
rgb = lab_to_rgb(l_test, ab_test)

In [None]:
fig = plt.figure(figsize=(25, 4))
for idx in np.arange(10):
    ax = fig.add_subplot(2, 9, idx+1)
    plt.imshow(rgb[idx])  # images are already in (H, W, C) order

## Loading the CGAN Trained on LAB Images

In [None]:
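# The generator trained above is still in memory as `cnn`, so no checkpoint is loaded here.
# Switch to evaluation mode so BatchNorm uses its running statistics at inference time.
cnn.eval()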

## Predicting and Visualizing Results

In [None]:
prediction = cnn(l_test.cuda())

In [None]:
realrgb = lab_to_rgb(l_test, ab_test)
fakergb = lab_to_rgb(l_test, prediction)
visual(l_test, realrgb, fakergb, args.gpu, 1)

My best model was the CGAN with the LAB colour space, which outperformed all the other methods. One reason for the better performance is the smaller output space. In RGB space the model predicts three channels per pixel, giving 256^3 possible colours, whereas in LAB it predicts only the two ab channels, giving 256^2 possible combinations. Additionally, this model only predicts the colours: the L channel already carries the image structure, whereas in RGB the model was also trying to predict the shape. This makes the model better suited to colourization. The predictions are very good. The model is good at predicting brown and green, but it struggles to predict blue and yellow. This could be improved by introducing more training examples containing a wider range of colours and by training for more epochs.
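
The size of the two output spaces can be compared with a short calculation. This assumes each channel is quantized to 256 levels, which is an idealization, since the network actually predicts continuous values.

```python
levels = 256  # assumed number of quantization levels per channel

rgb_colours = levels ** 3  # three predicted channels (R, G, B)
ab_pairs = levels ** 2     # two predicted channels (a, b)

print(rgb_colours, ab_pairs, rgb_colours // ab_pairs)
```

Under this assumption the RGB output space is 256 times larger per pixel than the ab output space.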

### Saving to HTML
Detailed instructions for saving to HTML can be found <a href="https://stackoverflow.com/questions/53460051/convert-ipynb-notebook-to-html-in-google-colab/64487858#64487858">here</a>. Below is a summary of the instructions:

(1) download your ipynb file by clicking on File -> Download .ipynb

(2) reupload your file to the temporary Google Colab storage (you can access the temporary storage from the tab to the left)

(3) run the following:

In [None]:
%%shell
jupyter nbconvert --to html /content/LAB_3_Generating_Data1.ipynb

(4) the html file will be available for download in the temporary Google Colab storage

(5) review the html file and make sure all the results are visible before submitting your assignment to Quercus