

# Computer Vision Assignment 02
## Name: Afsah Hyder
## ID: ah07065

In this question, we will expand on your knowledge of Generative Adversarial Networks (GANs). Building on what we have already implemented, the primary task in this question is to implement a Deep Convolutional GAN (DCGAN).

You can read about GANs from online resources to understand their working principles. To get an insight into the power of advanced GANs, check out this resource. Simply put, a DCGAN is a type of GAN that uses a CNN as the discriminator. Its generator uses a CNN-like architecture built from transposed convolutions instead of normal convolution layers. You can read more about DCGANs in the original DCGAN paper.

To successfully complete the question, you need to complete all the parts below. Make sure to cite all the sources used.

1. Load one of the datasets listed below that you find interesting to work on in your notebook (training split only). We recommend downscaling the images, e.g., to 64x64 or 32x32, and, if needed, converting them to greyscale. [5]

List of Datasets:
- Flowers
- Aircraft
- Cute Dogs
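The transposed convolutions mentioned above enlarge a feature map, while ordinary convolutions and pooling shrink it. The plain-Python sketch below works through that size arithmetic using the output-size formulas from the PyTorch documentation for `nn.Conv2d`/`nn.MaxPool2d` and `nn.ConvTranspose2d`, with dilation 1 assumed. The discriminator path mirrors the 32x32 pipeline used later in this notebook. The generator settings are an assumption: kernel 4, stride 2, padding 1, starting from a 1x1 latent map. They are a common DCGAN configuration, not something the assignment prescribes.

```python
# Output-size formulas from the PyTorch docs (dilation = 1, floor division).


def conv_out(n, kernel, stride=1, padding=0):
    """Spatial size after nn.Conv2d or nn.MaxPool2d."""
    return (n + 2 * padding - kernel) // stride + 1


def conv_transpose_out(n, kernel, stride=1, padding=0, output_padding=0):
    """Spatial size after nn.ConvTranspose2d."""
    return (n - 1) * stride - 2 * padding + kernel + output_padding


# Discriminator path: each 3x3 conv with padding 1 keeps the size,
# each 2x2 max-pool with stride 2 halves it.
size = 32
disc_sizes = []
for _ in range(3):
    size = conv_out(size, kernel=3, padding=1)
    size = conv_out(size, kernel=2, stride=2)
    disc_sizes.append(size)
flat_features = 128 * size * size  # input width of the first Linear layer

# Hypothetical DCGAN-style generator path: grow a 1x1 latent map to 32x32.
gen_sizes = [conv_transpose_out(1, kernel=4)]  # 1 -> 4
for _ in range(3):
    gen_sizes.append(conv_transpose_out(gen_sizes[-1], kernel=4, stride=2, padding=1))

print(disc_sizes, flat_features)  # [16, 8, 4] 2048
print(gen_sizes)                  # [4, 8, 16, 32]
```

The `flat_features` value equals the 128 x 4 x 4 flattened size that the discriminator below uses for 32x32 inputs. Each stride-2 transposed convolution in the generator exactly undoes one halving step.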

In [None]:
import tensorflow as tf
import tensorflow_datasets as tfds

# Load the flowers dataset
dataset, info = tfds.load(
    'tf_flowers',
    split='train',
    with_info=True,
    as_supervised=True,
)

# Print information about the dataset
print(info)

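IMG_SIZE = 32

def format_image(image, label):
    # Downscale to IMG_SIZE x IMG_SIZE; tf.image.resize returns float32
    image = tf.image.resize(image, (IMG_SIZE, IMG_SIZE))

    # Optional greyscale conversion (3 channels -> 1); the discriminator's
    # first Conv2d would then need 1 input channel instead of 3:
    # image = tf.image.rgb_to_grayscale(image)

    # Normalisation: scale pixel values from [0, 255] to [0, 1]
    image = image / 255.0
    return image, label

# Shuffle with a 300-image buffer, preprocess, and prefetch one element ahead
training_set = dataset.shuffle(300).map(format_image).prefetch(1)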


tfds.core.DatasetInfo(
    name='tf_flowers',
    full_name='tf_flowers/3.0.1',
    description="""
    A large set of images of flowers
    """,
    homepage='https://www.tensorflow.org/tutorials/load_data/images',
    data_dir='/root/tensorflow_datasets/tf_flowers/3.0.1',
    file_format=tfrecord,
    download_size=218.21 MiB,
    dataset_size=221.83 MiB,
    features=FeaturesDict({
        'image': Image(shape=(None, None, 3), dtype=uint8),
        'label': ClassLabel(shape=(), dtype=int64, num_classes=5),
    }),
    supervised_keys=('image', 'label'),
    disable_shuffling=False,
    splits={
        'train': <SplitInfo num_examples=3670, num_shards=2>,
    },
    citation="""@ONLINE {tfflowers,
    author = "The TensorFlow Team",
    title = "Flowers",
    month = "jan",
    year = "2019",
    url = "http://download.tensorflow.org/example_images/flower_photos.tgz" }""",
)
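Part 1 also suggests an optional greyscale conversion. The DCGAN paper scales training images to [-1, 1], the range of the generator's tanh output, whereas `format_image` above scales them to [0, 1]. The NumPy sketch below illustrates these preprocessing steps as a stand-in for the `tf.image` calls. The function names and the 2x2 average-pooling downscale are illustrative assumptions. The greyscale weights are the ITU-R BT.601 luma coefficients.

```python
import numpy as np


def to_greyscale(image):
    """(H, W, 3) RGB -> (H, W) using ITU-R BT.601 luma weights."""
    weights = np.array([0.299, 0.587, 0.114])
    return image[..., :3] @ weights


def downscale_2x(image):
    """Halve height and width by averaging non-overlapping 2x2 blocks."""
    h, w = image.shape[0] // 2 * 2, image.shape[1] // 2 * 2
    image = image[:h, :w]
    return image.reshape(h // 2, 2, w // 2, 2, *image.shape[2:]).mean(axis=(1, 3))


def to_tanh_range(image):
    """Map pixel values in [0, 255] to float32 values in [-1, 1]."""
    return image.astype(np.float32) / 127.5 - 1.0


# Placeholder 64x64 RGB image standing in for one tf_flowers sample.
rgb = np.random.default_rng(0).integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
x = to_tanh_range(to_greyscale(downscale_2x(rgb)))
print(x.shape, x.min(), x.max())
```

If you keep [0, 1] inputs, the generator should end in a sigmoid rather than tanh so that real and generated images share the same value range.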


Implement a CNN-based discriminator. Justify, in no more than 2 lines, why you have chosen your architecture for the discriminator. Remember to use only PyTorch to build your networks. [12+3]

In [None]:
import torch
import torch.nn as nn

class CNNModel(nn.Module):
  """CNN discriminator: maps a (3, size, size) image to P(image is real)."""

  def __init__(self, size=32):
    super().__init__()
    self.conv_layers = nn.Sequential(
        nn.Conv2d(3, 32, kernel_size=3, padding=1),  # input channels = 3 (RGB)
        nn.BatchNorm2d(32),
        nn.ReLU(),
        nn.MaxPool2d(kernel_size=2, stride=2),       # output: 32 x 16 x 16 (for size=32)

        nn.Conv2d(32, 64, kernel_size=3, padding=1),
        nn.BatchNorm2d(64),
        nn.ReLU(),
        nn.MaxPool2d(kernel_size=2, stride=2),       # output: 64 x 8 x 8

        nn.Conv2d(64, 128, kernel_size=3, padding=1),
        nn.BatchNorm2d(128),
        nn.ReLU(),
        nn.MaxPool2d(kernel_size=2, stride=2),       # output: 128 x 4 x 4
    )
    feat = size // 8  # three 2x2 max-pools halve the spatial size three times
    self.fc_layers = nn.Sequential(
        nn.Flatten(),
        nn.Linear(128 * feat * feat, 512),
        nn.BatchNorm1d(512),
        nn.ReLU(),
        nn.Dropout(0.1),  # small dataset, so a low dropout rate is used

        nn.Linear(512, 145),
        nn.BatchNorm1d(145),
        nn.ReLU(),
        nn.Dropout(0.1),

        nn.Linear(145, 1),  # one real/fake score per image
        nn.Sigmoid(),       # probability that the input image is real
    )

  def forward(self, x):
    x = self.conv_layers(x)
    x = self.fc_layers(x)
    return x
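A GAN discriminator is normally trained as a binary classifier that outputs the probability that its input is real. The NumPy sketch below shows the standard binary cross-entropy objectives, which are conceptually what `torch.nn.BCELoss` computes. It assumes the discriminator returns one probability per image. The discriminator outputs in the usage lines are made-up numbers for illustration. The generator term is the non-saturating variant, in which G maximises log D(G(z)).

```python
import numpy as np

EPS = 1e-7  # keeps log() finite when a probability is exactly 0 or 1


def bce(probs, target):
    """Mean binary cross-entropy between probabilities and a 0/1 target."""
    probs = np.clip(probs, EPS, 1 - EPS)
    return float(-np.mean(target * np.log(probs) + (1 - target) * np.log(1 - probs)))


def discriminator_loss(d_real, d_fake):
    """D is rewarded for outputting 1 on real images and 0 on generated ones."""
    return bce(d_real, 1.0) + bce(d_fake, 0.0)


def generator_loss(d_fake):
    """Non-saturating generator loss: G is rewarded when D(G(z)) is close to 1."""
    return bce(d_fake, 1.0)


# Made-up discriminator outputs for a batch of three real and three fake images.
d_real = np.array([0.9, 0.8, 0.95])
d_fake = np.array([0.1, 0.2, 0.05])
print(discriminator_loss(d_real, d_fake), generator_loss(d_fake))
```

In PyTorch, the same objectives are usually obtained with `nn.BCELoss` on a sigmoid output. Alternatively, `nn.BCEWithLogitsLoss` on raw scores gives better numerical stability.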

**Question 02** In this question, you will have to read up a bit on the different types of GANs: Style Transfer GAN, cGANs, CycleGAN and SRGANs. Reading only the abstracts and introductions of these papers will equip you to answer these questions; reading them in full is up to you. Then, go through the given situations and suggest a suitable GAN for each, along with a short explanation:



---


a. A researcher gave a student the job of performing some preprocessing on an image dataset. While experimenting with the data, the student applies a sufficiently large median blurring kernel to the images but deletes the original files. The blurred images were padded correctly so as to retain the original image size. Which GAN is best suited to recover the original HD images, and why? [10]

Super-Resolution Generative Adversarial Networks (SRGANs) are the most suitable architecture for restoring the missing detail in the blurred HD images. Unlike general-purpose GANs, SRGANs are designed specifically for super-resolution, i.e., recovering fine high-frequency detail from a degraded image. Their strength lies in a combined loss function that incorporates both an adversarial loss and a content loss. The adversarial loss trains the network to generate images that a discriminator cannot distinguish from natural high-resolution photos. The content loss, in turn, keeps the generated image close to the ground-truth HD image in feature space, so the essential details and structures of the scene are preserved. This targeted approach makes SRGANs well suited to recovering the information lost in the corrupted HD images.
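The combined loss described above can be sketched as follows. In the SRGAN paper, the perceptual loss is the content loss plus the adversarial loss weighted by 10^-3. The content loss is a mean squared error between VGG19 feature maps, and the adversarial term is -log D(G(x)). This NumPy sketch takes the feature maps and discriminator outputs as given arrays, with no VGG network included. It also averages rather than sums the adversarial term. Both are simplifying assumptions. This scenario also requires an assumption about training data: because the originals were deleted, pairs would have to come from elsewhere, for example other HD images blurred with the same median kernel.

```python
import numpy as np


def content_loss(sr_features, hr_features):
    """MSE between feature maps of the restored (SR) and ground-truth (HR) image.
    SRGAN uses VGG19 activations; here the features are simply given arrays."""
    return float(np.mean((sr_features - hr_features) ** 2))


def adversarial_loss(d_sr, eps=1e-7):
    """Mean of -log D(G(x)); small when D believes the restored images are real."""
    return float(np.mean(-np.log(np.clip(d_sr, eps, 1.0))))


def srgan_perceptual_loss(sr_features, hr_features, d_sr, adv_weight=1e-3):
    """Content loss + 1e-3 * adversarial loss, the weighting used in the SRGAN paper."""
    return content_loss(sr_features, hr_features) + adv_weight * adversarial_loss(d_sr)


rng = np.random.default_rng(0)
hr = rng.random((8, 16, 16))                     # placeholder HR feature maps
sr = hr + 0.05 * rng.standard_normal(hr.shape)   # placeholder SR feature maps
d_sr = np.array([0.3, 0.6])                      # placeholder discriminator outputs
print(srgan_perceptual_loss(sr, hr, d_sr))
```

The small adversarial weight lets the content term dominate. The discriminator then only nudges the output towards realistic texture rather than overriding its fidelity to the target.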



---


b. Interns at a computer vision company are tasked with adding different skin textures to images of pandas and regular bears. Given a picture of a regular bear, the model should change its skin to match that of a panda. The interns do not have paired images of pandas and bears that they can use as direct mappings of each other; they have a set of images of pandas and a different set of images of bears. Which GAN is best suited to achieve this image-to-image translation task, and why? [10]

CycleGAN is ideal for this unpaired image-to-image translation task. The separate panda and bear image sets are used to train two generators: one that gives bears panda-like skin and another that gives pandas bear-like skin, each paired with a discriminator for its target domain. The cycle-consistency loss requires that translating a bear to panda-like skin and back again yields an image close to the original bear, and likewise for pandas. This cyclical constraint lets the model learn realistic skin-texture swaps that generalise to new images, even without paired training data.
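The cycle-consistency term can be written as λ(‖F(G(x)) − x‖₁ + ‖G(F(y)) − y‖₁), where G maps bears to pandas and F maps pandas to bears. The CycleGAN paper uses λ = 10. The NumPy sketch below uses placeholder generators. The two functions named `*_placeholder` are hypothetical stand-ins, not trained networks, and the L1 norms are averaged over pixels.

```python
import numpy as np


def cycle_consistency_loss(x_bear, y_panda, G, F, lam=10.0):
    """lam * (mean|F(G(x)) - x| + mean|G(F(y)) - y|).
    G translates bears -> pandas, F translates pandas -> bears."""
    forward = np.mean(np.abs(F(G(x_bear)) - x_bear))
    backward = np.mean(np.abs(G(F(y_panda)) - y_panda))
    return float(lam * (forward + backward))


def G_placeholder(images):
    """Hypothetical bear -> panda generator (not a trained network)."""
    return np.clip(images + 0.1, 0.0, 1.0)


def F_placeholder(images):
    """Hypothetical panda -> bear generator (not a trained network)."""
    return np.clip(images - 0.1, 0.0, 1.0)


rng = np.random.default_rng(0)
x_bear = rng.random((3, 32, 32))   # placeholder bear image, (C, H, W) in [0, 1]
y_panda = rng.random((3, 32, 32))  # placeholder panda image
print(cycle_consistency_loss(x_bear, y_panda, G_placeholder, F_placeholder))
```

The clipping makes these placeholders not perfectly invertible near 0 and 1, so the printed loss is small but non-zero; an exactly inverse pair would give 0. In the full CycleGAN objective, this term is added to the two adversarial losses.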



---


c. A daughter wants to give her Van Gogh-fanatic mother a present, so she decides to turn a set of family photos into a set that looks as if it has been painted by Van Gogh. Which GAN is best suited to achieve this task, and how would she train her model (using what datasets, etc.)? [10]

A Style Transfer GAN can turn the family photos into paintings in Van Gogh's style. It achieves this through two key mechanisms: "style vectors" that encode the artistic characteristics, and a "progressive training" approach in which the network gradually learns to generate higher-resolution images. A technique called adaptive instance normalization (AdaIN) lets StyleGAN apply the learned style to different image content. To train it, she would need two image sets: the family photos as content images and a collection of Van Gogh paintings as style images.
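Adaptive instance normalization, mentioned above, re-normalises each channel of the content features so that its mean and standard deviation match those of the style features: AdaIN(x, y) = σ(y)·(x − μ(x))/σ(x) + μ(y). The NumPy sketch below applies it to (channels, height, width) arrays. In practice, x and y would be encoder feature maps, for example VGG activations of a family photo and of a Van Gogh painting. That choice is an assumption here, and the random arrays are placeholders.

```python
import numpy as np


def adain(content, style, eps=1e-5):
    """AdaIN on (C, H, W) feature maps: per channel,
    sigma(style) * (content - mu(content)) / sigma(content) + mu(style)."""
    c_mean = content.mean(axis=(1, 2), keepdims=True)
    c_std = content.std(axis=(1, 2), keepdims=True) + eps  # eps avoids division by 0
    s_mean = style.mean(axis=(1, 2), keepdims=True)
    s_std = style.std(axis=(1, 2), keepdims=True)
    return s_std * (content - c_mean) / c_std + s_mean


rng = np.random.default_rng(0)
content = rng.normal(size=(3, 16, 16))                     # placeholder photo features
style = rng.normal(loc=2.0, scale=3.0, size=(3, 16, 16))   # placeholder painting features
out = adain(content, style)
print(out.mean(axis=(1, 2)), style.mean(axis=(1, 2)))      # channel means now match
```

AdaIN itself has no learnable parameters. The spatial layout of the photo is kept, and only the per-channel statistics, which carry much of the "style", are swapped.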


