## DCGAN using CIFAR-10

A Deep Convolutional Generative Adversarial Network [DCGAN] is an Unsupervised Learning technique that learns a hierarchy of representations, from object parts to scenes, in both the Generator and the Discriminator. The Generative Adversarial Network [GAN] was introduced by Ian Goodfellow. It is an architecture in which a generator learns from its loss and tunes its parameters until it can produce output that fools a discriminator. The discriminator is simply a classifier that is trained to label the real data as real and the data produced by the generator as fake.
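For reference, the original GAN formulation writes this game as a minimax objective. The symbols are not used elsewhere in this notebook and are introduced only for this formula: $D$ is the discriminator, $G$ the generator, $x$ a real sample and $z$ a noise vector.

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z(z)}\left[\log\left(1 - D(G(z))\right)\right]$$

The discriminator tries to push $D(x)$ towards 1 and $D(G(z))$ towards 0, while the generator tries to push $D(G(z))$ towards 1.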

In this tutorial, we'll be implementing the DCGAN Architecture, slightly modified to fit the CIFAR-10 Dataset, and see how the DCGAN learns to generate CIFAR-10 images from a noise input by learning the representations and decreasing the generator loss.

So, let's get started.

## Step-1: Import Dependencies

In [None]:
# Import Dependencies
import numpy as np

# Dataset
from keras.datasets import cifar10
from keras.models import Sequential

# Common Layers
from keras.layers import Dense, Activation, BatchNormalization, Reshape, Flatten 
from keras.optimizers import Adam

# Layers specific to Generator
from keras.layers import Conv2DTranspose

# Layers specific to Discriminator
from keras.layers import Conv2D, LeakyReLU 

# Keras backend, used later to clear the session before building the models
import keras.backend as k

import matplotlib.pyplot as plt
%matplotlib inline

For this code, I'll be using the CIFAR-10 Dataset.

The CIFAR-10 dataset will be loaded using the Keras "load_data" function. It returns the data as two tuples: a tuple of training features and labels, and a tuple of test features and labels.

Then we will analyze the dataset for its number of features and labels and visualize the dataset at the end.

## Step-2: Load and Visualize Dataset

In [None]:
# Load Dataset
(X_train, y_train), (X_test, y_test) = cifar10.load_data() 

In [None]:
# Get Data Analysis
print('Training Data: \n')
print('Num. Features: ', len(X_train))
print('Num. Labels: ', len(y_train))
print('Shape of Features: ', X_train.shape)
print('Shape of Labels: ', y_train.shape)
print('\n\n')

print('Test Data: \n')
print('Num. Features: ', len(X_test))
print('Num. Labels: ', len(y_test))
print('Shape of Features: ', X_test.shape)
print('Shape of Labels: ', y_test.shape)

So, as we can see, there are 50,000 training images with labels and 10,000 test images with labels. The shapes of the feature arrays are:

Training: (50000, 32, 32, 3)

Test: (10000, 32, 32, 3)

Let's take a look at the individual image shape.

In [None]:
# Shape of One Image
# Draw a single scalar index so that indexing returns one image of shape (32, 32, 3)
rand_idx = np.random.randint(0, len(X_train))
print('Shape of one Image: ', X_train[rand_idx].shape)

Each image in the CIFAR-10 dataset is of the shape (32, 32, 3) i.e. a total of 3072 pixel values.

In [None]:
# Visualize Images
label_names = ['airplane','automobile','bird','cat','deer','dog','frog','horse','ship','truck']
fig, ax = plt.subplots(nrows=3, ncols=3, figsize=(10,5))
for i in range(0,3):
    for j in range(0,3):
        idx = np.random.randint(0, len(X_train), 1)
        idx = idx[0]
        ax[i,j].imshow(X_train[idx])
        ax[i,j].set_axis_off()
        ax[i,j].title.set_text('Label: {}'.format(label_names[y_train[idx][0]]))
        plt.tight_layout()

## Step-3: Data Preprocessing

According to the paper, the input images were scaled to the range [-1,1], the range of the tanh activation function. This ensures that each input feature, i.e. each pixel in the case of images, has a similar distribution, which helps speed up convergence while training the model. It also helps avoid the vanishing gradient problem during backpropagation.

So, next, we'll write a function that does just that.

In [None]:
# Taking a random image and looking at its pixel values
idx = np.random.randint(0, len(X_train), 1)
print('Image Index No.: ', idx)
print('\nImage Pixel Values [Before Normalization]: \n\n',X_train[idx])
print('\n\n Shape of Image: ',X_train[idx].shape)

In the above lines, we take a random image and see that its values lie in the range [0,255]. So, let's write a function that takes in these images and normalizes them to the range [-1,1]. The convolution layers expect a 4-D tensor of shape (number of images, height, width, channels), so the function also reshapes its input to (-1, 32, 32, 3). The CIFAR-10 arrays are already loaded in this shape, so the reshape only matters when a single image is passed in.

We'll also write a de-normalizing function that brings a generated image back to the shape (32,32,3) with pixel values in the range [0,255] before plotting the final generated images.

In [None]:
# Function to apply Normalization similar to tanh activation function range i.e. [-1,1]
def normalize_images(img):
    img = img.reshape(-1,32,32,3)
    img = np.float32(img)
    img = (img / 255 - 0.5) * 2
    img = np.clip(img, -1, 1)
    return img

In [None]:
# Function to DeNormalize the Images once we are done Training the DCGAN Model
def denormalize_images(img):
    # Inverse of the normalization above: [-1, 1] -> [0, 255]
    img = (img / 2 + 0.5) * 255
    img = np.clip(img, 0, 255)
    img = np.uint8(img)
    img = img.reshape(32, 32, 3)
    return img
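A quick way to check a normalize/de-normalize pair is to round-trip every possible pixel value and confirm that nothing changes. The sketch below is self-contained; `to_unit_range` and `from_unit_range` are hypothetical helper names, and the rounding step before the cast to `uint8` is an assumption added here to absorb floating-point error.

```python
import numpy as np

def to_unit_range(pixels):
    """Map uint8 pixel values in [0, 255] to float32 values in [-1, 1]."""
    return (np.float32(pixels) / 255 - 0.5) * 2

def from_unit_range(scaled):
    """Inverse mapping: values in [-1, 1] back to uint8 values in [0, 255]."""
    restored = (scaled / 2 + 0.5) * 255
    # Round before casting so that tiny float errors do not truncate a value
    return np.uint8(np.clip(np.rint(restored), 0, 255))

# Every value a pixel can take
all_pixel_values = np.arange(256, dtype=np.uint8)

scaled = to_unit_range(all_pixel_values)
restored = from_unit_range(scaled)

print('Scaled range:', scaled.min(), scaled.max())
print('Round trip is lossless:', np.array_equal(restored, all_pixel_values))
```

If the round trip is not lossless, the two functions are not inverses of each other and the plotted images will look washed out or saturated.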

Now that we are done with the normalizing and denormalizing functions, let's normalize the training and test features and have a look at the same image after normalization and the overall shape of the train and test features.

In [None]:
# Normalize the Training and Test Features
X_train = normalize_images(X_train)
X_test = normalize_images(X_test)

In [None]:
# Test the Normalization Function
print('Image Index No.: ', idx)
print('\nImage Pixel Values [After Normalization]: \n\n',X_train[idx])

So, as we can see, using normalization our image pixel values have been changed from the scale of (0,255) to the scale of (-1,1).

In [None]:
print(X_train.shape)
print(X_test.shape)

## Step-4: DCGAN Generator Architecture

The paper describes the DCGAN Generator Architecture as shown in the following image:

In [None]:
# Display Generator Architecture
from IPython.display import Image
Image(filename='./Images/generator.png', width=900) 

The Generator for DCGAN has the following components:

**1. Input Layer [Dense]:**

This layer is where we provide the noise input, which the Generator learns, over the course of training, to convert into an image at the output.

This layer is a Fully Connected or Dense layer that takes the noise vector. For this tutorial, the layer has 4*4*256 output units. As per the paper, the input to the dense layer is a 100-dimensional noise vector z, so we'll be using an input shape of 100.

**2. Reshape:**

Before giving the data into the Transposed Convolution function, we need to reshape the input data so that we can convert the array of values to a matrix and apply convolution operations on it.

Input Shape: **[4*4*256]**

After Reshaping, Input to Transpose Convolution Function: **[4,4,256]**

**3. 2-D Transposed Convolution [Conv2DTranspose]:**

As per the architecture of DCGAN mentioned in the paper, the Generator performs a series of Transposed Convolutions after getting the data from the dense layer, and at the final layer we get a 64x64 image from these high level representations. For CIFAR-10 we stop at 32x32 instead.
Since the input to the first Transposed Convolution layer has a shape of 4 x 4 x 256, with a filter of 5 x 5 and a stride of 2 the output of that layer has the shape 8 x 8 x 128, the next layer gives 16 x 16 x 64, and the final layer gives 32 x 32 x 3, i.e. the image.

**4. Activation Functions [ReLU, Tanh]:**

As per the paper, the Transposed Convolution layers use the ReLU activation function whereas we use a tanh activation function for the final layer. Using the bounded activation function allows the model to learn more quickly to saturate and cover the color space of the training distribution.
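To make the generator shapes concrete, here is a minimal sketch that traces them without building a model. It assumes the configuration used in this notebook's generator code: a 4 x 4 x 256 tensor after the Reshape layer and three Conv2DTranspose layers with stride 2 and 'same' padding, for which the spatial size is simply multiplied by the stride. The helper name `transposed_conv_same` is made up for this example.

```python
def transposed_conv_same(size, stride):
    """Spatial output size of a transposed convolution with 'same' padding."""
    return size * stride

# (filters, stride) of each Conv2DTranspose layer in the generator
layers = [(128, 2), (64, 2), (3, 2)]

# Shape after the Dense + Reshape layers
size, channels = 4, 256
shapes = [(size, size, channels)]

for filters, stride in layers:
    size = transposed_conv_same(size, stride)
    shapes.append((size, size, filters))

for shape in shapes:
    print(shape)
```

The last shape printed should match the shape of a CIFAR-10 image, (32, 32, 3).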

** -------------------------------------------------------------------------------------------------------------------------- **

**NOTE:** In case you are wondering where all these shapes for the convolutions come from, let's have a look.
1. We know that each CIFAR-10 image has a shape of 32 x 32 x 3, where 3 is the number of color channels, since the images are RGB.
2. Look at the generator from the image side. You have an image with the shape 32x32x3 as input.
3. Then, for a filter of 5x5, a stride of 2 and same padding, we can find out the shape of the next output as follows:

### output shape = floor((W + 2*P - F) / S) + 1

where, 

**W:** Width of Input Image

**P:** Padding

**F:** Size of Filter

**S:** Stride

So, using the values that we defined above, we get the output as:

**Shape of output after 1st convolution:** floor((32 + 2*2 - 5) / 2) + 1  => 16 or  16 x 16 x number of filters

So, if the number of filters = 64, the shape of output becomes:  16 x 16 x 64

Similarly, for the next two layers with the same configuration, the shapes will be 8 x 8 x 128 and 4 x 4 x 256.

You might wonder why I said convolutions and not Transposed Convolutions. That is because we went from the image to the last filter, which is the architecture of the Discriminator. If we use these shapes in the opposite order, we get the shapes for the Generator. You will see more when we print out the summary of the Generator and the Discriminator.
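The same arithmetic can be sketched in a few lines of Python. This example uses the standard convolution output-size formula, floor((W + 2P - F) / S) + 1, with a padding of 2 for the 5 x 5 filter, and starts from a 32 x 32 CIFAR-10 image. The function name `conv_output_size` is an assumption for illustration only.

```python
def conv_output_size(width, kernel, stride, padding):
    """Output width of a convolution: floor((W + 2P - F) / S) + 1."""
    return (width + 2 * padding - kernel) // stride + 1

# Start from the image width and apply three stride-2 convolutions
size = 32
sizes = [size]
for _ in range(3):
    size = conv_output_size(size, kernel=5, stride=2, padding=2)
    sizes.append(size)

print(sizes)
```

Each stride-2 convolution halves the width, so the list runs from the image width down to the width of the last feature map.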

** -------------------------------------------------------------------------------------------------------------------------- **

So, let's define the Generator Function.

In [None]:
# Generator
def generator(inputSize):
    generator_model = Sequential()
    # Input Dense Layer
    generator_model.add(Dense(4*4*256, input_shape=(inputSize,)))
    # Reshape the Input, apply Batch Normalization and ReLU Activation.
    generator_model.add(Reshape(target_shape=(4,4,256)))
    generator_model.add(BatchNormalization())
    generator_model.add(Activation('relu'))
    
    # First Transpose Convolution Layer
    generator_model.add(Conv2DTranspose(filters=128, kernel_size=5, strides=2, padding='same'))
    generator_model.add(BatchNormalization())
    generator_model.add(Activation('relu'))
        
    # Second Transpose Convolution Layer
    generator_model.add(Conv2DTranspose(filters=64, kernel_size=5, strides=2, padding='same'))
    generator_model.add(BatchNormalization())
    generator_model.add(Activation('relu'))
    
    # Output Layer: CIFAR-10 images have 3 color channels, so filters for the Generated Image = 3
    generator_model.add(Conv2DTranspose(filters=3, kernel_size=5, strides=2, padding='same'))
    generator_model.add(Activation('tanh'))
    
    generator_model.summary()
    
    return generator_model

## Step-5: DCGAN Discriminator Architecture

The paper describes the DCGAN Discriminator Architecture as shown in the following image:

In [None]:
# Display Discriminator Architecture
from IPython.display import Image
Image(filename='./Images/discriminator.png', width=900) 

The discriminator for DCGAN has the following components:

**1. 2-D Convolution [Conv2D]:**

Since the aim of the discriminator is to classify images as real or fake, it takes in a complete image, either from the training data or generated by the generator, and tries to tell whether it is a real or a fake image. This is where CNNs come into play, as they are widely used for image classification. So, we use Convolution filters for the first 3 layers as opposed to Transposed Convolutions in the Generator.


**2. Activation Functions [LeakyReLU]:**

As per the paper, the Convolution layers use the LeakyReLU activation function throughout the discriminator layers. Unlike ReLU, LeakyReLU keeps a small non-zero slope for negative inputs, so gradients can still flow through units that would otherwise be inactive.

**NOTE:** The shapes for the Discriminator have been defined in the Generator text.
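As a small illustration of what LeakyReLU does, the sketch below implements it in NumPy with a slope of 0.2, the value passed as `leakSlope` later in this notebook. The function `leaky_relu` here is a stand-in written for this example, not the Keras layer itself.

```python
import numpy as np

def leaky_relu(x, alpha=0.2):
    """Return x for positive inputs and alpha * x for the rest."""
    x = np.asarray(x, dtype=np.float32)
    return np.where(x > 0, x, alpha * x)

values = leaky_relu([-2.0, -0.5, 0.0, 1.5])
print(values)
```

Negative inputs are scaled down rather than set to zero, which is the difference from a plain ReLU.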

In [None]:
# Discriminator
def discriminator(leakSlope):
    discriminator_model = Sequential()
    
    # Input and First Conv2D Layer
    discriminator_model.add(Conv2D(filters=64, kernel_size=5, strides=2, padding='same', input_shape=(32,32,3)))
    discriminator_model.add(LeakyReLU(alpha=leakSlope))
    
    # Second Conv2D Layer
    discriminator_model.add(Conv2D(filters=128, kernel_size=5, strides=2, padding='same'))
    discriminator_model.add(BatchNormalization())
    discriminator_model.add(LeakyReLU(alpha=leakSlope))
    
    discriminator_model.add(Conv2D(filters=256, kernel_size=5, strides=2, padding='same'))
    discriminator_model.add(BatchNormalization())
    discriminator_model.add(LeakyReLU(alpha=leakSlope))
    
    discriminator_model.add(Flatten())
    discriminator_model.add(Dense(32*32*3))
    discriminator_model.add(BatchNormalization())
    discriminator_model.add(LeakyReLU(alpha=leakSlope))
    
    # Output Layer: a single unit, matching the (batch_size, 1) real/fake labels
    discriminator_model.add(Dense(1))
    discriminator_model.add(Activation('sigmoid'))
    
    discriminator_model.summary()
    
    return discriminator_model

Now that we have defined the Generator and the Discriminator, we just need to combine these two in a single place to form a DCGAN architecture. So, let's do that.

## Step-6: DCGAN Architecture

The paper defines the DCGAN Architecture as follows:

In [None]:
# Display Complete DCGAN Architecture
from IPython.display import Image
Image(filename='./Images/complete_dcgan.png', width=900) 

In [None]:
# Define DCGAN Architecture
def DCGAN(sample_size, generator_lr, generator_momentum, discriminator_lr, discriminator_momentum, leakyAlpha, show_summary=False):
    
    # Clear Session
    k.clear_session()
    
    # Generator
    gen = generator(inputSize=sample_size)
    
    # Discriminator
    dis = discriminator(leakSlope=leakyAlpha)
    dis.compile(loss='binary_crossentropy', optimizer=Adam(lr=discriminator_lr, beta_1=discriminator_momentum))
    
    dis.trainable = False
    
    dcgan = Sequential([gen, dis])
    dcgan.compile(loss='binary_crossentropy', optimizer=Adam(lr=generator_lr, beta_1=generator_momentum))
    
    if show_summary == True:
        print("\n Generator Model Summary: \n")
        gen.summary()
        
        print("\n\n Discriminator Model Summary: \n")
        dis.summary()
        
        print("\n\nDCGAN Model Summary\n")
        dcgan.summary()
    
    return dcgan, gen, dis

In the above block, we define the DCGAN Architecture where we combine the Generator and the Discriminator into one. We'll be using Adam Optimizer for the training of the generator and the discriminator with a binary_crossentropy loss function.
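To see why binary cross-entropy works as the generator's training signal, consider what the loss looks like when fake images are labeled as real (label 1), which is how the combined model is trained. The sketch below is a plain-Python version of the per-sample loss, written for illustration. The clipping constant `eps` is an assumption to avoid log(0).

```python
import math

def binary_crossentropy(label, prediction, eps=1e-7):
    """Per-sample binary cross-entropy between a label and a probability."""
    p = min(max(prediction, eps), 1 - eps)
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))

# Generator step: the fake image is labeled as real (1).
# The loss shrinks as the discriminator's output for it approaches 1.
for discriminator_output in (0.1, 0.5, 0.9):
    loss = binary_crossentropy(1.0, discriminator_output)
    print(discriminator_output, round(loss, 4))
```

The more confidently the discriminator calls a generated image real, the lower the generator's loss, so minimizing this loss pushes the generator towards images that fool the discriminator.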

## Helper Function to Plot Images after every 10 Epochs of Training

In [None]:
# Function to Plot Images
def plot_images(generated_images):
    n_images = len(generated_images)
    rows = 4
    cols = n_images//rows
    
    plt.figure(figsize=(cols, rows))
    for i in range(n_images):
        img = denormalize_images(generated_images[i])
        plt.subplot(rows, cols, i+1)
        plt.imshow(img)
        plt.xticks([])
        plt.yticks([])
    plt.tight_layout()
    plt.show() 

Now that we have our model ready, it's time to train it. In the function below, we will generate random noise data which will be passed into the generator as input. Over time, the generator will learn the correct features from its loss and eventually start outputting images that resemble the CIFAR-10 images.
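As a rough sketch of what the noise input looks like, the snippet below draws one batch of noise vectors from a standard normal distribution. The batch size of 128 and noise dimension of 100 match the values used at the end of this notebook. The seeded `default_rng` generator is an assumption made here for reproducibility, whereas the training function calls `np.random.normal` directly.

```python
import numpy as np

batch_size, sample_size = 128, 100

# Seeded generator so that the example is repeatable
rng = np.random.default_rng(seed=0)

# One noise vector of length sample_size per image in the batch
noise = rng.normal(loc=0.0, scale=1.0, size=(batch_size, sample_size))

print('Noise shape:', noise.shape)
print('Mean and standard deviation:', noise.mean(), noise.std())
```

Each row of `noise` is turned into one generated image by the generator.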

## Step-7: Train the Model

In [None]:
# Function to Train the Model
def train_model(sample_size, generator_lr, generator_momentum, discriminator_lr, discriminator_momentum, leakyAlpha, epochs, batch_size, eval_size, smooth):
    
    # To Do: Add Label Noise Data
    # Training Labels [Real, Fake]
    training_labels = [np.ones([batch_size, 1]), np.zeros([batch_size, 1])]
    
    # Test Labels [Real, Fake]
    test_labels = [np.ones([eval_size, 1]), np.zeros([eval_size, 1])]
    
    # Total Number of Batches = (Total Training Images / Images per Batch)
    num_batches = (len(X_train) // batch_size)
    
    # Call the DCGAN Architecture
    dcgan, generator, discriminator = DCGAN(sample_size, generator_lr, generator_momentum, discriminator_lr, discriminator_momentum, leakyAlpha, show_summary=True)
    
    # Array to Store Cost/Loss Values
    cost = []
    
    # Train the Generator and Discriminator
    for i in range(epochs):
        for j in range(num_batches):
            
            # Noise Input for Generator
            # Mean = 0, Stddev = 1
            noise_data = np.random.normal(loc=0, scale=1, size=(batch_size, sample_size))
            
            # Make Predictions using Generator and Generate Fake Images
            fake_images = generator.predict_on_batch(noise_data)
            
            # Load CIFAR-10 Data in Batches
            # [0:128], [128:256], ...
            train_image_batch = X_train[j*batch_size:(j+1)*batch_size]
            
            # Train the Discriminator
            discriminator.trainable = True
            
            # Train the Discriminator on Training Data and Labels
            discriminator.train_on_batch(train_image_batch, training_labels[0] * (1 - smooth))
            
            # Train Discriminator on Fake Generated Images and Labels
            discriminator.train_on_batch(fake_images, training_labels[1])
            
            # Set Discriminator training to False when Generator is Training
            discriminator.trainable = False
            
            # Train the Generator on Noise Data Input with Training Labels to reduce Cost/Loss
            # This way, the Discriminator gets trained twice for each one training step of Generator
            dcgan.train_on_batch(noise_data, training_labels[0])
        
            
        # To Do: Add Eval Code
        # Eval/Test Features [Real,Fake]
        real_eval_features = X_test[np.random.choice(len(X_test), size= eval_size, replace=False)]
        
        # Eval Noise Data
        noise_data = np.random.normal(loc=0, scale=1, size=(eval_size, sample_size)) 
        
        # Fake Eval Features: Creates the Images to Fool the Discriminator
        fake_eval_features = generator.predict_on_batch(noise_data)
        
        # Calculate Loss
        # Discriminator Loss: Actual Training Loss for Classification + Loss on Fake Data
        discriminator_loss  = discriminator.test_on_batch(real_eval_features, test_labels[0])
        discriminator_loss += discriminator.test_on_batch(fake_eval_features, test_labels[1])
        
        # Generator Loss: DCGAN Loss
        generator_loss  = dcgan.test_on_batch(noise_data, test_labels[0])
        
        # Add calculated cost/loss to array for plotting
        cost.append((discriminator_loss, generator_loss))
        
        print("Epochs: {0}, Generator Loss: {1}, Discriminator Loss: {2}".format(i+1, generator_loss, discriminator_loss))
       
        # Plot the Generated Images after every 10 epochs
        if ((i+1)%10 == 0):
            plot_images(fake_eval_features)
        
    # Save Trained Models
    generator.save('./cifar_generator.h5')
    discriminator.save('./cifar_discriminator.h5')
    dcgan.save('./cifar_dcgan.h5')
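Two details of the training loop are easy to miss: the label smoothing applied to the real labels, and the number of batches per epoch. The sketch below reproduces both with the values used in the call that follows (`batch_size=128`, `smooth=0.1`) and the 50,000 training images of CIFAR-10. The variable names are chosen for this example only.

```python
import numpy as np

batch_size, smooth = 128, 0.1

# Real labels are scaled from 1.0 down to (1 - smooth); fake labels stay at 0
real_labels = np.ones((batch_size, 1)) * (1 - smooth)
fake_labels = np.zeros((batch_size, 1))

# Integer division drops the final, incomplete batch
num_images = 50000
num_batches = num_images // batch_size
leftover = num_images - num_batches * batch_size

print('Real label value:', real_labels[0, 0])
print('Fake label value:', fake_labels[0, 0])
print('Batches per epoch:', num_batches, '| images skipped per epoch:', leftover)
```

Smoothing the real labels keeps the discriminator from becoming overconfident. Because the data is not shuffled, the same leftover images at the end of `X_train` are skipped in every epoch.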

In [None]:
# sample_size, generator_lr, generator_momentum, discriminator_lr, discriminator_momentum, leakyAlpha, epochs, batch_size, eval_size, smooth
train_model(sample_size=100,generator_lr=0.0001, generator_momentum=0.9, discriminator_lr=0.001, discriminator_momentum=0.9, leakyAlpha=0.2, epochs=100, batch_size=128, eval_size=16, smooth=0.1);

So, finally, after 100 epochs we see that the DCGAN Model has learnt to generate CIFAR-10-like images using the generator and fool the discriminator.