# GAN

Finally got my hands on implementing "the coolest thing that has happened in deep learning in the last 20 years". Yep, it's Generative Adversarial Networks (for the poor). This is just an implementation to understand the basic working of a GAN (a DCGAN, actually) and apply it to generate images of new anime faces.

The images are not going to be highly detailed: I am poor and my only savior is Google Colab, so I am restricted to small images of 64 x 64 pixels. I downloaded the dataset from Kaggle. The link to the dataset is:

https://www.kaggle.com/aadilmalik94/animecharacterfaces

### Importing and installing the necessary libraries

In [0]:
%tensorflow_version 1.x
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

from keras.layers import Input, Reshape, Dropout, Dense, Flatten, BatchNormalization, Activation, ZeroPadding2D
from keras.layers import LeakyReLU
from keras.layers.convolutional import UpSampling2D, Conv2D
from keras.models import Sequential, Model, load_model
from keras.optimizers import Adam

import sys
from PIL import Image
import cv2
import os
import time
import glob

### Importing data from Kaggle

In [3]:
#Installing Kaggle
!pip install -U kaggle
#Make a directory named .kaggle
!mkdir -p ~/.kaggle
#Upload the kaggle.json file
from google.colab import files
files.upload()
#Copy the json file to the Kaggle directory
!cp kaggle.json ~/.kaggle/
#Download the dataset. This can take a while (the zip is about 760 MB according to the download log)
!kaggle datasets download -d aadilmalik94/animecharacterfaces
!ls
#Unzipping the files
!unzip "animecharacterfaces.zip"
!ls
%cd animeface-character-dataset/data/
!ls
%cd ../../

Requirement already up-to-date: kaggle in /usr/local/lib/python3.6/dist-packages (1.5.6)


Saving kaggle.json to kaggle (1).json
Downloading animecharacterfaces.zip to /content
 98% 746M/760M [00:16<00:00, 33.5MB/s]
100% 760M/760M [00:17<00:00, 46.7MB/s]


Here we define the functions to load the images from the dataset and preprocess them. The pixel values are rescaled to the range [-1, 1] (as done in most of the implementations available), which matches the tanh output of the generator.

In [0]:
def load_dataset(batch_size, image_shape, data_dir=None):
    # data_dir is a glob pattern, e.g. "some_folder/*.png"
    sample_dim = (batch_size,) + image_shape
    sample = np.empty(sample_dim, dtype=np.float32)
    all_data_dirlist = list(glob.glob(data_dir))
    # Sample batch_size file paths (with replacement, so duplicates are possible)
    sample_imgs_paths = np.random.choice(all_data_dirlist, batch_size)
    for index, img_filename in enumerate(sample_imgs_paths):
        # Force 3 channels so grayscale or RGBA files still fit image_shape
        image = Image.open(img_filename).convert("RGB")
        image = image.resize(image_shape[:-1])
        image = np.asarray(image)
        # Rescale pixel values from [0, 255] to [-1, 1]
        image = (image / 127.5) - 1
        sample[index, ...] = image
    return sample

def read_data():
    image_shape = (64, 64, 3)
    X_train = load_dataset(30000, image_shape, "animeface-character-dataset/data/*.png")
    print('data loaded')
    return X_train
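
To make the rescaling step concrete, here is a minimal standalone sketch of the [0, 255] to [-1, 1] mapping and its inverse. The helper names `to_tanh_range` and `to_uint8_range` are my own for illustration and are not used elsewhere in the notebook.

```python
import numpy as np

def to_tanh_range(pixels):
    """Map uint8 pixel values in [0, 255] to floats in [-1, 1]."""
    return pixels.astype(np.float32) / 127.5 - 1.0

def to_uint8_range(scaled):
    """Map floats in [-1, 1] back to uint8 pixel values in [0, 255]."""
    return np.clip(np.rint((scaled + 1.0) * 127.5), 0, 255).astype(np.uint8)

pixels = np.array([0, 64, 128, 255], dtype=np.uint8)
scaled = to_tanh_range(pixels)      # smallest value is -1, largest is 1
restored = to_uint8_range(scaled)   # round trip back to the original pixels
```

The inverse mapping is what is needed whenever generated images (which come out of a tanh layer) have to be turned back into something viewable.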

Now that we have the data ready, it's time to code up the GAN.

# Deep Convolutional Generative Adversarial Networks (DCGAN) Theory

So what exactly are GANs? Well, a Generative Adversarial Network (GAN) is a class of deep learning model introduced by Ian Goodfellow and colleagues in 2014. Two deep neural networks compete with each other in a game. Given a training set, this technique learns to generate new data with the same statistics as the training set. For example, a GAN trained on photographs can generate new photographs that look at least superficially authentic to human observers, having many realistic characteristics.

So the basic structure consists of two deep neural networks. One of them is the Generator Network and the other is the Discriminator Network.

![GAN Structure](./img1.png)

### Generator Network

The Generator Network takes random noise as input and propagates it through a neural network that transforms and reshapes the noise into an image similar to those in the dataset. The end goal of the Generator is to learn a distribution close to the distribution of the training dataset, so that sampling from it yields realistic images.

### Discriminator Network

The Discriminator Network is a binary classifier that outputs the probability that the image fed to it is real. Throughout training it is fed real images from the dataset with label 1 (original) and fake images from the Generator with label 0 (fake).

Thus, it boils down to this: during training, the Generator has to generate images that look similar enough to the original images that the Discriminator cannot tell them apart, while the Discriminator has to train itself to better identify the fake images. This competition pushes the Generator to converge towards generating more realistic images.
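
The two competing objectives can be sketched with plain NumPy. This is only an illustration of the binary cross-entropy bookkeeping; the probabilities below are made-up numbers, not outputs of the networks in this notebook, and `binary_crossentropy` is a hand-written helper rather than the Keras loss.

```python
import numpy as np

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy between labels and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    losses = -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))
    return float(np.mean(losses))

# Hypothetical discriminator outputs: probability that each image is real
d_on_real = np.array([0.9, 0.8, 0.7])
d_on_fake = np.array([0.2, 0.4, 0.1])

# Discriminator: real images carry label 1, generated images label 0
d_loss_real = binary_crossentropy(np.ones(3), d_on_real)
d_loss_fake = binary_crossentropy(np.zeros(3), d_on_fake)
d_loss = 0.5 * (d_loss_real + d_loss_fake)

# Generator: it wants the discriminator to say 1 on the generated images
g_loss = binary_crossentropy(np.ones(3), d_on_fake)
```

Note that the same discriminator outputs on fake images give a low loss for the Discriminator and a high loss for the Generator, which is exactly the tug of war described above.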

DCGANs are a type of GAN that uses deep Convolutional Neural Networks (CNNs) as function approximators. CNNs are well suited to finding correlations within spatial data.

![DCGAN Structure](./img2.png)

That's almost all the basic theory required. We can now move on to code the architectures and define the other necessary components. But before that, let's keep all the possible parameters in one place.

Note: I did not write the code to load a pre-trained model, because I wanted to train it from scratch. However, I have saved the models after training, which can be loaded back using the load_model() function.

## Hyperparameters

In [0]:
img_row = img_col = 64
n_channels = 3
img_shape = (img_row, img_col, n_channels)

## Defining the Generator and Discriminator network

In [0]:
def build_generator():

    model = Sequential()

    model.add(Dense(64 * 4 * 4, activation="relu", input_dim=100))
    model.add(Reshape((4, 4, 64)))
    model.add(UpSampling2D())
    model.add(Conv2D(128, kernel_size=3, padding="same"))
    model.add(BatchNormalization(momentum=0.8))
    model.add(Activation("relu"))
    model.add(UpSampling2D())
    model.add(Conv2D(256, kernel_size=3, padding="same"))
    model.add(BatchNormalization(momentum=0.8))
    model.add(Activation("relu"))
    model.add(UpSampling2D())
    model.add(Conv2D(128, kernel_size=3, padding="same"))
    model.add(BatchNormalization(momentum=0.8))
    model.add(Activation("relu"))
    model.add(UpSampling2D())
    model.add(Conv2D(64, kernel_size=3, padding="same"))
    model.add(BatchNormalization(momentum=0.8))
    model.add(Activation("relu"))
    model.add(Conv2D(16, kernel_size=3, padding="same"))
    model.add(BatchNormalization(momentum=0.8))
    model.add(Activation("relu"))
    model.add(Conv2D(n_channels, kernel_size=3, padding="same"))
    model.add(Activation("tanh"))

    model.summary()

    noise = Input(shape=(100,))
    img = model(noise)

    return Model(noise, img)

def build_discriminator():

    model = Sequential()

    model.add(Conv2D(32, kernel_size=3, strides=2, input_shape=img_shape, padding="same"))
    model.add(LeakyReLU(alpha=0.2))
    model.add(Dropout(0.25))
    model.add(Conv2D(64, kernel_size=3, strides=2, padding="same"))
    model.add(ZeroPadding2D(padding=((0,1),(0,1))))
    model.add(BatchNormalization(momentum=0.8))
    model.add(LeakyReLU(alpha=0.2))
    model.add(Dropout(0.25))
    model.add(Conv2D(128, kernel_size=3, strides=2, padding="same"))
    model.add(BatchNormalization(momentum=0.8))
    model.add(LeakyReLU(alpha=0.2))
    model.add(Dropout(0.25))
    model.add(Conv2D(256, kernel_size=3, strides=1, padding="same"))
    model.add(BatchNormalization(momentum=0.8))
    model.add(LeakyReLU(alpha=0.2))
    model.add(Dropout(0.25))
    model.add(Flatten())
    model.add(Dense(1, activation='sigmoid'))

    model.summary()

    img = Input(shape=img_shape)
    validity = model(img)

    return Model(img, validity)
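
As a sanity check on the two architectures, the spatial sizes can be traced by hand. The sketch below assumes the Keras defaults that I believe apply here: `UpSampling2D()` doubles height and width, and a `Conv2D` with `padding="same"` produces `ceil(size / stride)`. The helper `same_conv` is mine, for illustration only.

```python
import math

def same_conv(size, stride):
    """Spatial size after a Conv2D with padding="same"."""
    return math.ceil(size / stride)

# Generator: Reshape to 4x4, then four UpSampling2D layers (each doubles the size).
# The stride-1 "same" convolutions in between keep the size unchanged.
gen_sizes = [4]
for _ in range(4):
    gen_sizes.append(gen_sizes[-1] * 2)

# Discriminator: conv s2, conv s2, ZeroPadding2D ((0,1),(0,1)), conv s2, conv s1
disc_sizes = [64]
disc_sizes.append(same_conv(disc_sizes[-1], 2))  # first strided conv
disc_sizes.append(same_conv(disc_sizes[-1], 2))  # second strided conv
disc_sizes.append(disc_sizes[-1] + 1)            # one extra row and column of zeros
disc_sizes.append(same_conv(disc_sizes[-1], 2))  # third strided conv
disc_sizes.append(same_conv(disc_sizes[-1], 1))  # stride-1 conv keeps the size

# Number of features that Flatten() hands to the final Dense(1) layer
flat_features = disc_sizes[-1] * disc_sizes[-1] * 256

print(gen_sizes, disc_sizes, flat_features)
```

The first few discriminator sizes (32, 16, 17) agree with the shapes printed by `model.summary()`.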

In [28]:
optimizer = Adam(0.0002, 0.5)

# Build and compile the discriminator
discriminator = build_discriminator()
discriminator.compile(loss='binary_crossentropy', optimizer=optimizer, metrics=['accuracy'])

# Build the generator
generator = build_generator()

# The generator takes noise as input and generates imgs
z = Input(shape=(100,))
img = generator(z)

# For the combined model we will only train the generator
discriminator.trainable = False
valid = discriminator(img)

# The combined GAN  (stacked generator and discriminator) trains the generator to fool the discriminator
gan = Model(z, valid)
gan.compile(loss='binary_crossentropy', optimizer=optimizer)

Model: "sequential_9"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
conv2d_34 (Conv2D)           (None, 32, 32, 32)        896       
_________________________________________________________________
leaky_re_lu_17 (LeakyReLU)   (None, 32, 32, 32)        0         
_________________________________________________________________
dropout_17 (Dropout)         (None, 32, 32, 32)        0         
_________________________________________________________________
conv2d_35 (Conv2D)           (None, 16, 16, 64)        18496     
_________________________________________________________________
zero_padding2d_5 (ZeroPaddin (None, 17, 17, 64)        0         
_________________________________________________________________
batch_normalization_26 (Batc (None, 17, 17, 64)        256       
_________________________________________________________________
leaky_re_lu_18 (LeakyReLU)   (None, 17, 17, 64)       

## Defining the function to save sample image grids during training

The images are 64 x 64 x 3 (height x width x channels).

Source: https://github.com/jeffheaton/t81_558_deep_learning/blob/63bbb19f092736a7077fdebd39fba7c87db27014/t81_558_class_07_2_Keras_gan.ipynb

In [0]:
def save_images(epoch,noise):
  image_array = np.full(( 
      16 + (4 * (64+16)), 
      16 + (7 * (64+16)), 3), 
      255, dtype=np.uint8)
  
  generated_images = generator.predict(noise)

  generated_images = 0.5 * generated_images + 0.5

  image_count = 0
  for row in range(4):
      for col in range(7):
        r = row * (64+16) + 16
        c = col * (64+16) + 16
        image_array[r:r+64,c:c+64] = generated_images[image_count] * 255
        image_count += 1

          
  output_path = './output_images'
  if not os.path.exists(output_path):
    os.makedirs(output_path)
  
  filename = os.path.join(output_path,f"train_image_{epoch}.png")
  im = Image.fromarray(image_array)
  im.save(filename)
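
The index arithmetic in `save_images` is easier to follow in isolation. Here is a small sketch of the same 4 x 7 grid layout with 64-pixel tiles and 16-pixel margins; the constants and the helpers `canvas_shape` and `tile_origin` are names I made up for the illustration, and black placeholder tiles stand in for generated faces.

```python
import numpy as np

TILE = 64      # side length of each generated image
MARGIN = 16    # white gap around every tile
ROWS, COLS = 4, 7

def canvas_shape(rows=ROWS, cols=COLS, tile=TILE, margin=MARGIN):
    """Height, width and channels of the white canvas holding the grid."""
    return (margin + rows * (tile + margin), margin + cols * (tile + margin), 3)

def tile_origin(row, col, tile=TILE, margin=MARGIN):
    """Top-left pixel of the tile at grid position (row, col)."""
    return (row * (tile + margin) + margin, col * (tile + margin) + margin)

canvas = np.full(canvas_shape(), 255, dtype=np.uint8)

# Paste a black placeholder tile into each of the 28 slots
for row in range(ROWS):
    for col in range(COLS):
        r, c = tile_origin(row, col)
        canvas[r:r + TILE, c:c + TILE] = 0
```

Since the grid has 28 slots, the noise array passed to `save_images` needs at least 28 rows; with a batch size of 32 that holds.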

## Loading the data into X_train variable

In [22]:
X_train = read_data()

data loaded


## Defining the training function

In [0]:
def train(epochs, batch_size, saveAt):
    for epoch in range(epochs):

        real_ids = np.random.randint(0, X_train.shape[0], batch_size)
        real_images = X_train[real_ids]

        noise = np.random.normal(0, 1, (batch_size, 100))
        gen_images = generator.predict(noise)


        real_loss = discriminator.train_on_batch(real_images, np.ones((batch_size, 1)))
        fake_loss = discriminator.train_on_batch(gen_images, np.zeros((batch_size, 1)))
        disc_loss = 0.5 * np.add(real_loss, fake_loss)


        noise = np.random.normal(0, 1, (batch_size, 100))
        gen_loss = gan.train_on_batch(noise, np.ones((batch_size, 1)))

        # Print details
        print ("%d [D loss: %f, acc.: %.2f%%] [G loss: %f]" % (epoch, disc_loss[0], 100*disc_loss[1], gen_loss))

        # Save images at saveAt points
        if epoch % saveAt == 0:
            save_images(epoch, noise)
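
One detail worth spelling out: because the discriminator is compiled with `metrics=['accuracy']`, each `train_on_batch` call on it returns a `[loss, accuracy]` pair, and `0.5 * np.add(...)` averages the real and fake pairs element-wise. A tiny sketch with made-up numbers (not values from an actual run):

```python
import numpy as np

# Hypothetical return values of discriminator.train_on_batch: [loss, accuracy]
real_loss = [0.50, 0.80]
fake_loss = [0.70, 0.60]

# Element-wise mean of the two pairs: index 0 is the loss, index 1 the accuracy
disc_loss = 0.5 * np.add(real_loss, fake_loss)

message = "[D loss: %f, acc.: %.2f%%]" % (disc_loss[0], 100 * disc_loss[1])
print(message)
```

The combined `gan` model is compiled without metrics, so its `train_on_batch` returns a single loss value, which is why `gen_loss` is printed directly.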

In [0]:
train(epochs=40000, batch_size=32, saveAt=50)

  'Discrepancy between trainable weights and collected trainable'


0 [D loss: 0.542992, acc.: 67.19%] [G loss: 1.729649]
1 [D loss: 0.502218, acc.: 78.12%] [G loss: 1.886701]
2 [D loss: 0.487596, acc.: 75.00%] [G loss: 1.964919]
3 [D loss: 0.452026, acc.: 85.94%] [G loss: 1.876930]
4 [D loss: 0.271426, acc.: 92.19%] [G loss: 2.012665]
5 [D loss: 0.409888, acc.: 84.38%] [G loss: 2.651031]
6 [D loss: 0.622145, acc.: 65.62%] [G loss: 2.703025]
7 [D loss: 0.354155, acc.: 85.94%] [G loss: 3.286540]
8 [D loss: 0.587899, acc.: 70.31%] [G loss: 3.545927]
9 [D loss: 0.345541, acc.: 84.38%] [G loss: 3.706892]
10 [D loss: 0.329320, acc.: 85.94%] [G loss: 2.677814]
11 [D loss: 0.378747, acc.: 82.81%] [G loss: 2.536362]
12 [D loss: 0.201384, acc.: 95.31%] [G loss: 2.666023]
13 [D loss: 0.199269, acc.: 96.88%] [G loss: 2.527017]
14 [D loss: 0.273919, acc.: 87.50%] [G loss: 2.225196]
15 [D loss: 0.414209, acc.: 75.00%] [G loss: 3.822218]
16 [D loss: 0.179617, acc.: 95.31%] [G loss: 4.159556]
17 [D loss: 0.239672, acc.: 95.31%] [G loss: 3.226823]
18 [D loss: 0.311909

## Saving the models

In [None]:
generator.save("generator_latest.h5")
discriminator.save("discriminator_latest.h5")
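
For completeness, here is a rough sketch of how the saved generator could be loaded back and used to sample new faces. I have not run this as part of the notebook; the helper names `load_saved_generator` and `sample_faces` are hypothetical, and the latent size of 100 is taken from the generator definition above.

```python
import os
import numpy as np

def load_saved_generator(path="generator_latest.h5"):
    """Return the saved generator, or None if the file is missing."""
    if not os.path.exists(path):
        return None
    # Imported lazily so the existence check works even without Keras installed
    from keras.models import load_model
    # compile=False: the model is only used for prediction here
    return load_model(path, compile=False)

def sample_faces(model, n_images=4, latent_dim=100):
    """Generate n_images faces and map them from [-1, 1] to [0, 1]."""
    noise = np.random.normal(0, 1, (n_images, latent_dim))
    return 0.5 * model.predict(noise) + 0.5

restored_generator = load_saved_generator()
if restored_generator is not None:
    faces = sample_faces(restored_generator)
    print(faces.shape)
```

The returned array can be passed to `plt.imshow` one image at a time, since the values are already in the [0, 1] range that matplotlib expects for float images.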

## Conclusion

This was a very simple implementation of a DCGAN, written to understand the basics of how the model works. With the basics done, I can now move on to more applications of it, although I seriously doubt that Google Colab will provide enough computation power. Let's just hope for the best.

### References:

1. https://github.com/pavitrakumar78/Anime-Face-GAN-Keras : One of the most detailed implementations to start with. The model there has been trained for 10000 steps. Sadly I could not do that because my Colab session somehow seemed to crash.

2. https://towardsdatascience.com/generate-anime-style-face-using-dcgan-and-explore-its-latent-feature-representation-ae0e905f3974

3. https://heartbeat.fritz.ai/my-mangagan-building-my-first-generative-adversarial-network-2ec1920257e3