# | WGAN-GP | GM | QuickDraw | Image Generation |

## WGAN-GP (Wasserstein GAN with Gradient Penalty) and GM (Generative Models) for QuickDraw Image Generation

# <b>1 <span style='color:#78D118'>|</span> Introduction</b>

This project explores Generative Models (GM) and their capabilities, focusing on the generation of bicycle images with a Wasserstein Generative Adversarial Network with Gradient Penalty (WGAN-GP), which trains a generator against a critic (discriminator).

WGAN-GP (Wasserstein GAN with Gradient Penalty) is a specific type of generative adversarial network (GAN) utilized for generating realistic data, particularly images.

- **Architecture**:
   - **WGAN-GP**: WGAN-GP is a variant of the GAN framework that emphasizes training stability and the quality of generated outputs. It builds on the Wasserstein distance, a measure of the dissimilarity between two probability distributions. WGAN-GP adds a gradient penalty that pushes the norm of the discriminator's gradient with respect to its input toward 1, which leads to more stable training and smoother gradients.

- **Loss Function**:
   - **WGAN-GP**: In contrast to traditional GANs, WGAN-GP uses a loss based on the Wasserstein distance, also referred to as the Earth-Mover (EM) distance. This loss measures the dissimilarity between the real and generated distributions and gives the generator useful gradients even when the two distributions barely overlap. In addition to this loss, WGAN-GP adds a gradient penalty term to further stabilize training.

- **Training Stability**:
   - **WGAN-GP**: The primary objective of WGAN-GP is to enhance training stability. By incorporating a gradient penalty, it mitigates issues that can affect some conventional GANs, such as mode collapse, where the generator tends to produce similar-looking samples.

In summary, WGAN-GP is a specialized GAN variant tailored to enhance the training stability and output quality, particularly when generating images.
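
For reference, the loss and gradient penalty described above can be written compactly. The notation below is an assumption, since the notebook itself defines no symbols: $D$ is the critic, $G$ the generator, $P_r$ and $P_g$ the real and generated distributions, and $\lambda$ the penalty weight (the original WGAN-GP paper uses $\lambda = 10$; the value used by this project's implementation is not shown).

```latex
% Critic (discriminator) loss: Wasserstein term plus gradient penalty
L_D = \mathbb{E}_{\tilde{x} \sim P_g}\big[D(\tilde{x})\big]
    - \mathbb{E}_{x \sim P_r}\big[D(x)\big]
    + \lambda \, \mathbb{E}_{\hat{x}}\Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\big)^2\Big]

% Interpolated samples on which the penalty is evaluated
\hat{x} = \epsilon \, x + (1 - \epsilon) \, \tilde{x}, \qquad \epsilon \sim U[0, 1]

% Generator loss, with \tilde{x} = G(z)
L_G = - \mathbb{E}_{z}\big[D(G(z))\big]
```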

## Objectives:
 - Develop and train a WGAN-GP model on the QuickDraw dataset.
 - Build a working understanding of the WGAN-GP architecture and of generative AI techniques.

## The QuickDraw Dataset:
The [Quick Draw dataset](https://quickdraw.withgoogle.com/data) is a collection of approximately 50 million drawings contributed by players of the Quick, Draw! game. This project uses a subset of 117,555 bicycle drawings.

**Access the QuickDraw Dataset:**
 - Dataset Repository: [https://github.com/googlecreativelab/quickdraw-dataset](https://github.com/googlecreativelab/quickdraw-dataset)
 - Numpy Bitmap Files: [https://console.cloud.google.com/storage/quickdraw_dataset/full/numpy_bitmap](https://console.cloud.google.com/storage/quickdraw_dataset/full/numpy_bitmap)
 - Bicycle Dataset: [https://storage.googleapis.com/quickdraw_dataset/full/numpy_bitmap/bicycle.npy](https://storage.googleapis.com/quickdraw_dataset/full/numpy_bitmap/bicycle.npy)

## Project Workflow:

- **Setup**: Imports and parameters.
- **Data Exploration**: Discovering bicycle drawings in the dataset.
- **Model Architecture**: Designing a WGAN-GP (Wasserstein Generative Adversarial Network with Gradient Penalty).
- **Model Building**: Creating the GAN model.
- **Model Training**: Feeding data to the model and watching it learn to generate bicycles.
- **Artistic Analysis**: Examining the generated bicycles.

# <b>2 <span style='color:#78D118'>|</span> Setup</b>

## <b>2.1 <span style='color:#78D118'>|</span> Imports</b>

In [None]:
!pip install -q git+https://github.com/YanSteph/SKit.git

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from skimage import io
import sys

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.callbacks import TensorBoard
from tensorflow.keras.utils import plot_model
from tensorflow.keras.optimizers.legacy import Adam 

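# WGANGP and ImagesCallback are custom classes used later in this notebook.
# They must be defined or importable before the Model Building section runs.
#import WGANGP
#import ImagesCallback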

from skit.show import show_images, show_history, show_text

## <b>2.2 <span style='color:#78D118'>|</span> Parameters</b>

**Train**

`scale`: The fraction of the dataset used for training (1 = the full dataset). With `scale=1`, training takes approximately 5 to 6 minutes on a V100 GPU and more than 2 hours on a CPU.

`latent_dim`: This variable represents the dimensionality of the latent space. In many machine learning applications, particularly in generative models like GANs, the latent space is where the model learns to represent complex data in a lower-dimensional space. In this case, `latent_dim` is set to 128, indicating that the latent space has 128 dimensions.

`epochs`: It specifies the number of complete passes through the entire training dataset during the training process. Setting it to 3 means that the model will go through the dataset three times during training.

`n_critic`: The number of times the critic (discriminator) is trained before the generator is updated. Training the critic several times per generator update helps the critic give meaningful feedback to the generator. Here it is set to 2.

`batch_size`: It represents the number of data samples used in each iteration of training. A batch is a subset of the entire dataset, and the model's weights are updated after processing each batch. A batch size of 64 means that 64 data samples are processed together before updating the model's parameters.
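
As a rough illustration of how these training parameters interact, the sketch below counts optimizer updates for a full run. It assumes `scale=1`, the 117,555 bicycle drawings mentioned earlier, and a training step that performs `n_critic` critic updates followed by one generator update per batch; the actual `WGANGP` implementation is not shown in this notebook and may schedule updates differently.

```python
import math

n_samples = 117_555   # bicycle drawings in the subset (scale = 1)
batch_size = 64
epochs = 3
n_critic = 2

# Number of batches per epoch; the last batch may be smaller than batch_size
steps_per_epoch = math.ceil(n_samples / batch_size)
total_steps = steps_per_epoch * epochs

# Assumption: each batch triggers n_critic critic updates, then one generator update
critic_updates = total_steps * n_critic
generator_updates = total_steps

print(steps_per_epoch, total_steps, critic_updates, generator_updates)
```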

**Adam**

`Learning Rate`: The size of the steps the optimizer takes when updating the model's weights at each iteration. A learning rate that is too high can make training unstable, while one that is too low can make training very slow or leave it stuck in a poor local minimum. It needs to be tuned carefully for good results. Here it is set to 0.0002.

`Beta_1`: The exponential decay rate for the moving average of the first moment (the moving average of gradients). It is a number between 0 and 1 that sets the relative importance of past gradients compared to recent ones. A value closer to 1 gives more weight to past gradients and yields a slower-moving average, while a value closer to 0 makes the average follow the most recent gradients. The Adam default is 0.9; this notebook uses 0.5, a common choice for GAN training.

`Beta_2`: The exponential decay rate for the moving average of the second moment (the moving average of squared gradients). Like `beta_1`, it is a number between 0 and 1, and it determines how fast the second-moment average adapts to gradient variations. The Adam default is 0.999; this notebook uses 0.9.
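
To see how a decay coefficient shapes a moving average, here is a minimal sketch in plain Python. The helper `ema` is hypothetical and leaves out Adam's bias correction; it only illustrates the recurrence m_t = beta * m_(t-1) + (1 - beta) * g_t on a gradient that jumps from 0 to 1.

```python
def ema(values, beta):
    """Exponential moving average with m_0 = 0 (no bias correction)."""
    m = 0.0
    out = []
    for g in values:
        m = beta * m + (1.0 - beta) * g
        out.append(m)
    return out

# A toy gradient signal that jumps from 0 to 1 at the fourth step
grads = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]

fast = ema(grads, beta=0.5)   # reacts quickly to the jump
slow = ema(grads, beta=0.9)   # keeps more memory of the past, reacts slowly

print(fast[-1], slow[-1])
```

After three steps at the new value, the `beta=0.5` average has covered most of the jump while the `beta=0.9` average is still far below it, which is why a smaller `beta_1` makes the optimizer respond faster to changing gradients.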

**Option**

`fit_verbosity`: Verbosity level during training: 0 = silent, 1 = progress bar, 2 = one line per epoch. 

`num_img`: The number of images to visualize during the training.

In [None]:
# Train
# ----
scale         = 1
latent_dim    = 128
epochs        = 3
n_critic      = 2
batch_size    = 64

# Adam
# ----
learning_rate = 0.0002
beta_1        = 0.5
beta_2        = 0.9

# Option
# ----
num_img       = 12
fit_verbosity = 1

# <b>3 <span style='color:#78D118'>|</span> Data Exploration</b>

In [None]:
# Load dataset
x_data = np.load("./bike.npy")
print('Original dataset shape : ',x_data.shape)

# Rescale
n=int(scale*len(x_data))
x_data = x_data[:n]
print('Rescaled dataset shape : ',x_data.shape)

# Normalize, reshape and shuffle
x_data = x_data/255
x_data = x_data.reshape(-1,28,28,1)
np.random.shuffle(x_data)
print('Final dataset shape    : ',x_data.shape)


These images were drawn by real people.

In [None]:
show_images(
    x_data, 
    indices      = range(72), 
    columns      = num_img, 
    figure_size  = (2,2), 
    padding      = 0,
    spines_alpha = 0
)

# <b>4 <span style='color:#78D118'>|</span> Model Architecture</b>

## <b>4.1 <span style='color:#78D118'>|</span> Discriminator</b>


In [None]:
# Input layer
inputs = keras.Input(shape=(28, 28, 1))

# Convolutional layers with LeakyReLU activation
x = layers.Conv2D(64, kernel_size=4, strides=2, padding="same")(inputs)
x = layers.LeakyReLU(alpha=0.2)(x)

x = layers.Conv2D(128, kernel_size=4, strides=2, padding="same")(x)
x = layers.LeakyReLU(alpha=0.2)(x)

x = layers.Conv2D(128, kernel_size=4, strides=2, padding="same")(x)
x = layers.LeakyReLU(alpha=0.2)(x)

# Flatten the output
x = layers.Flatten()(x)

# Apply dropout for regularization
x = layers.Dropout(0.2)(x)

# Output layer with a single linear unit: an unbounded critic score, not a probability
x = layers.Dense(1)(x)

# Create and summarize the discriminator model
discriminator = keras.Model(inputs, x, name="discriminator")
discriminator.summary()
plot_model(discriminator, show_shapes=True, show_layer_names=True)
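
The spatial sizes reported by `discriminator.summary()` can be checked by hand. The sketch below relies on the fact that a Keras convolution with `padding="same"` produces an output of size ceil(input / stride); the helper name `same_padding_output` is hypothetical.

```python
import math

def same_padding_output(size, stride):
    """Output size of a convolution with padding='same'."""
    return math.ceil(size / stride)

# Trace the 28x28 input through the three stride-2 convolutions
size = 28
sizes = [size]
for _ in range(3):
    size = same_padding_output(size, stride=2)
    sizes.append(size)

# The last convolution has 128 filters, so Flatten yields size * size * 128 features
flattened = sizes[-1] * sizes[-1] * 128

print(sizes, flattened)
```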

## <b>4.2 <span style='color:#78D118'>|</span> Generator</b>

In [None]:
# Define the input layer
inputs = keras.Input(shape=(latent_dim,))

# Fully connected layer followed by reshaping
x = layers.Dense(7 * 7 * 64)(inputs)
x = layers.Reshape((7, 7, 64))(x)

# Upsampling layers
x = layers.UpSampling2D()(x)
x = layers.Conv2D(128, kernel_size=3, strides=1, padding='same', activation='relu')(x)

x = layers.UpSampling2D()(x)
x = layers.Conv2D(256, kernel_size=3, strides=1, padding='same', activation='relu')(x)

# Output layer
outputs = layers.Conv2D(1, kernel_size=5, strides=1, padding="same", activation='sigmoid')(x)

# Create the generator model
generator = keras.Model(inputs, outputs, name="generator")

# Display model summary
generator.summary()
plot_model(generator, show_shapes=True, show_layer_names=True)
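
The generator grows its 7x7 feature map to 28x28 with two `UpSampling2D` layers, which by default repeat each pixel 2x2 (nearest-neighbour interpolation). The NumPy sketch below mimics only that spatial step; the helper `upsample_nearest` is hypothetical, and the real generator also changes the channel count through the `Conv2D` layers placed between the upsamplings.

```python
import numpy as np

def upsample_nearest(x, factor=2):
    """Nearest-neighbour upsampling of a (batch, height, width, channels) array."""
    x = np.repeat(x, factor, axis=1)      # repeat rows
    return np.repeat(x, factor, axis=2)   # repeat columns

# Stand-in for the output of Reshape((7, 7, 64))
feature_map = np.arange(7 * 7 * 64, dtype=np.float32).reshape(1, 7, 7, 64)

up1 = upsample_nearest(feature_map)   # 7x7   -> 14x14
up2 = upsample_nearest(up1)           # 14x14 -> 28x28

print(feature_map.shape, up1.shape, up2.shape)
```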

# <b>5 <span style='color:#78D118'>|</span> Model Building</b>

In [None]:
gan = WGANGP(
        discriminator = discriminator, 
        generator     = generator, 
        latent_dim    = latent_dim, 
        n_critic      = n_critic
    )

In [None]:
gan.compile(
    discriminator_optimizer = Adam(learning_rate=learning_rate, beta_1=beta_1, beta_2=beta_2),
    generator_optimizer     = Adam(learning_rate=learning_rate, beta_1=beta_1, beta_2=beta_2)
)
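
The `WGANGP` class itself is not shown in this notebook. As a minimal sketch of the gradient penalty such a class would need to compute, assuming TensorFlow 2.x and image batches of shape `(batch, height, width, channels)`, the function below could be used. The names `gradient_penalty`, `critic`, `real_images` and `fake_images` are hypothetical, and the project's implementation may differ.

```python
import tensorflow as tf

def gradient_penalty(critic, real_images, fake_images):
    """Mean of (||d critic(x_hat) / d x_hat||_2 - 1)^2 over the batch."""
    real_images = tf.cast(real_images, tf.float32)
    fake_images = tf.cast(fake_images, tf.float32)
    batch_size = tf.shape(real_images)[0]

    # One random mixing coefficient per sample, broadcast over H, W and C
    epsilon = tf.random.uniform([batch_size, 1, 1, 1], 0.0, 1.0)
    interpolated = epsilon * real_images + (1.0 - epsilon) * fake_images

    with tf.GradientTape() as tape:
        tape.watch(interpolated)  # interpolated is a tensor, not a variable
        scores = critic(interpolated, training=True)

    # Each score depends only on its own sample, so this gives per-sample gradients
    grads = tape.gradient(scores, interpolated)
    norms = tf.sqrt(tf.reduce_sum(tf.square(grads), axis=[1, 2, 3]))
    return tf.reduce_mean(tf.square(norms - 1.0))
```

In a training step, this value would typically be multiplied by a penalty weight and added to the critic loss before computing the critic's gradients.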

# <b>6 <span style='color:#78D118'>|</span> Model Training</b>

In [None]:
imagesCallback = ImagesCallback(
    num_img    = num_img, 
    latent_dim = latent_dim, 
    run_dir    = './images'
)

history = gan.fit( 
            x_data, 
            epochs     = epochs, 
            batch_size = batch_size, 
            callbacks  = [imagesCallback], 
            verbose    = fit_verbosity 
          ) 

gan.save('./models/model.h5')

## <b>6.1 <span style='color:#78D118'>|</span> History</b>

In [None]:
show_history(
    history,
    title = "Loss",
    metrics = ["d_loss", "g_loss"], 
    metric_labels = ["Discriminator loss", "Generator loss"]
)

In [None]:
for epoch in range(epochs):
    images = []
    
    for i in range(num_img):
        filename = f'./images/image-{epoch:03d}-{i:02d}.jpg'
        image    = io.imread(filename)
        images.append(image)
        
    show_text("b", f"Epoch: {epoch}", False)
    show_images(
        images, 
        None, 
        indices='all', 
        columns=num_img, 
        figure_size=(1,1), 
        interpolation=None, 
        padding=0, 
        spines_alpha=0
    )

# <b>7 <span style='color:#78D118'>|</span> Artistic Analysis</b>

In [None]:
gan.reload('./models/model.h5')

Generate some images from the latent space:

In [None]:
nb_images = num_img*15

z = np.random.normal(size=(nb_images,latent_dim))
images = gan.predict(z, verbose=0)

In [None]:
show_images(
    images, 
    None, 
    indices='all', 
    columns=num_img, 
    figure_size=(1,1), 
    interpolation=None, 
    padding=0, 
    spines_alpha=0
)

## References

The creation of this document was greatly influenced by the following key sources of information:

1. [Quick Draw dataset](https://quickdraw.withgoogle.com/data) - A collection of approximately 50 million drawings contributed by players of the Quick, Draw! game.
2. [Fidle](https://gricad-gitlab.univ-grenoble-alpes.fr/talks/fidle/-/wikis/home) - An informative guide that provides in-depth explanations and examples on various data science topics.