# Handwriting Generation using Conditional GANs (cGANs)

## Introduction

Handwriting generation is a challenging task in generative AI that involves generating images of handwritten text that mimic natural human handwriting. By conditioning a generative model on specific text inputs, we can control the text that is generated by the model. **Conditional GANs (cGANs)** are particularly useful for this task as they allow us to generate data (images) conditioned on a specific label or input—in this case, a text label.

### What is a Conditional GAN (cGAN)?

A **Conditional Generative Adversarial Network (cGAN)** is a GAN in which both the generator and the discriminator receive additional conditioning information. The generator receives it alongside the random noise vector, and the discriminator receives it alongside the image it is judging. In handwriting generation, this information is typically the text or a label (e.g., "hello", "world"), which lets the model generate handwriting that corresponds to a specific textual input.

- **Generator**: The generator in a cGAN creates data (images) conditioned on a specific input, such as the text label. The generator takes two inputs:
  1. A random noise vector (latent vector).
  2. A textual representation (e.g., a one-hot encoded vector or an embedding vector representing the text to be written).
  
- **Discriminator**: The discriminator tries to distinguish between real and fake images, and is also conditioned on the text. It takes two inputs:
  1. A handwritten image (real or generated).
  2. The text label (the conditioning information).

The generator's goal is to create realistic handwriting images that correspond to the text input, while the discriminator's goal is to correctly identify whether an image is real or generated.

## Architecture Overview

### 1. Generator

The **generator** is responsible for generating handwriting images based on the given text input. The generator receives:
- A random noise vector (latents).
- A text vector (the condition) representing the target word/phrase to be written.

The generator processes these inputs through dense layers and upsampling layers (e.g., **transposed convolutions**) to generate the image. The text conditioning is typically incorporated by concatenating the text vector to the noise vector, allowing the generator to control the content of the generated handwriting.
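Here is a minimal NumPy sketch of conditioning by concatenation. It assumes the sizes used by the notebook's MNIST placeholder: a 100-dimensional noise vector and a 10-class one-hot label. The batch of four labels is made up for illustration. The generator's first layer receives one vector per example: the first part is noise and the last part identifies the requested content.

```python
import numpy as np

# Assumed sizes, matching the notebook's MNIST placeholder.
latent_dim, text_dim, batch = 100, 10, 4

rng = np.random.default_rng(0)
noise = rng.standard_normal((batch, latent_dim)).astype(np.float32)

# One-hot encode the requested labels (digits 3, 1, 4, 1 here).
labels = np.array([3, 1, 4, 1])
text = np.eye(text_dim, dtype=np.float32)[labels]

# Concatenate along the feature axis: each row is [noise | one-hot label].
merged = np.concatenate([noise, text], axis=1)
print(merged.shape)  # (4, 110)
```

Keeping the noise fixed and changing only the label part changes what the generator is asked to draw. Keeping the label fixed and resampling the noise asks for a different rendering of the same content.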

### 2. Discriminator

The **discriminator** evaluates whether an image is real or fake. It receives:
- An image (handwritten text).
- A text vector representing the condition.

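The discriminator is a convolutional neural network (CNN). It extracts features from the image, combines them with a representation of the text, and judges whether the image is a real example from the dataset or one generated for that text. In the implementation below, its final layer outputs a single unnormalized score (a logit) rather than a probability. Applying a sigmoid to this logit gives the probability that the image is real.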
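The discriminator code in this notebook ends in `Dense(1)` with no activation, so its output is a logit. This small sketch shows, with made-up logit values, how a sigmoid turns such scores into probabilities. A logit of 0 corresponds to 50%, and positive logits lean towards "real".

```python
import numpy as np

def sigmoid(logit):
    """Map an unnormalized discriminator score (logit) to P(real)."""
    return 1.0 / (1.0 + np.exp(-logit))

# Hypothetical logits for three (image, text) pairs.
logits = np.array([-4.0, 0.0, 4.0])
probs = sigmoid(logits)
print(probs.round(3))  # ≈ [0.018, 0.5, 0.982]
```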

### 3. Loss Functions

- **Generator Loss**: The generator is trained to fool the discriminator. Its loss is the binary cross-entropy between the discriminator's output for generated images and a target label of 1 (real). The loss therefore decreases as the discriminator becomes more likely to classify generated images as real.

- **Discriminator Loss**: The discriminator is trained to classify real images as real (target label = 1) and generated images as fake (target label = 0). Its loss is the sum of the binary cross-entropy terms for the real and the generated images.
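The two losses can be written out in NumPy as a sketch of what `tf.keras.losses.BinaryCrossentropy(from_logits=True)` computes on the discriminator's logits. The result should agree up to floating-point differences, averaged over the batch. The expression `max(s, 0) - s*t + log(1 + exp(-|s|))` is the standard numerically stable form of binary cross-entropy for a logit `s` and target `t`. The logit values below are invented.

```python
import numpy as np

def bce_with_logits(targets, logits):
    """Mean binary cross-entropy computed directly from logits.

    Equals -[t*log(sigmoid(s)) + (1 - t)*log(1 - sigmoid(s))], rewritten so
    that large |s| does not overflow.
    """
    s = np.asarray(logits, dtype=np.float64)
    t = np.asarray(targets, dtype=np.float64)
    return np.mean(np.maximum(s, 0) - s * t + np.log1p(np.exp(-np.abs(s))))

def generator_loss(fake_logits):
    # The generator wants its images labeled as real (target 1).
    return bce_with_logits(np.ones_like(fake_logits), fake_logits)

def discriminator_loss(real_logits, fake_logits):
    # Real images should score 1, generated images 0; the terms are summed.
    real_loss = bce_with_logits(np.ones_like(real_logits), real_logits)
    fake_loss = bce_with_logits(np.zeros_like(fake_logits), fake_logits)
    return real_loss + fake_loss

# Hypothetical logits for a batch of two.
real_logits = np.array([[2.0], [3.0]])
fake_logits = np.array([[-1.0], [0.5]])
print(generator_loss(fake_logits), discriminator_loss(real_logits, fake_logits))
```

When the discriminator is undecided (all logits 0), each cross-entropy term equals log 2. The generator loss is then about 0.693 and the discriminator loss about 1.386.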

### 4. Training Process

Training updates the **discriminator** and the **generator** in turn. In the implementation below, both updates are computed from the same batch and applied within a single training step:
1. **Discriminator Update**: The discriminator is trained to differentiate between real and generated images, conditioned on the text.
2. **Generator Update**: The generator is trained to fool the discriminator into classifying generated images as real, conditioned on the text.

The generator and discriminator are trained iteratively, improving their performance over time. The generator learns to generate more realistic handwriting, and the discriminator becomes better at distinguishing real from fake images.

## Benefits of Conditional GANs for Handwriting Generation

- **Control over Generated Content**: Conditioning the generator on the text gives direct control over the content of the generated handwriting. Any word or sentence covered by the text representation and the training data can be requested.
  
- **Realistic Handwriting Generation**: Given enough diverse training data, cGANs can generate realistic handwriting that resembles human-written text.

- **Flexibility**: With embeddings or other richer text representations, cGANs can be conditioned on complex inputs such as multi-word sentences. Adding style information to the condition can also target different handwriting styles.
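In this notebook the condition is a 10-class digit label, but conditioning on words requires a text representation. One simple option is sketched here under assumed settings: the `ALPHABET` and `MAX_LEN` values and the `encode_text` helper are hypothetical. It builds a character-level one-hot encoding and flattens it into a fixed-length vector. That vector could be passed as the `text` input with `text_dim = MAX_LEN * len(ALPHABET)`. Learned embeddings are a common alternative.

```python
import numpy as np

# Hypothetical vocabulary and maximum word length (assumptions for illustration).
ALPHABET = "abcdefghijklmnopqrstuvwxyz "
MAX_LEN = 8

def encode_text(word, alphabet=ALPHABET, max_len=MAX_LEN):
    """Encode a string as a flattened (max_len * len(alphabet)) one-hot vector.

    Characters beyond max_len are truncated; shorter words are padded with
    all-zero rows, so every word maps to a vector of the same length.
    """
    index = {ch: i for i, ch in enumerate(alphabet)}
    one_hot = np.zeros((max_len, len(alphabet)), dtype=np.float32)
    for pos, ch in enumerate(word.lower()[:max_len]):
        one_hot[pos, index[ch]] = 1.0  # raises KeyError for characters outside the alphabet
    return one_hot.reshape(-1)

vec = encode_text("hello")
print(vec.shape)  # (216,), i.e. text_dim = 8 * 27
```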

## Key Challenges

- **Complexity of Handwriting Styles**: Handwriting generation involves dealing with variability in handwriting styles, sizes, and strokes. Capturing this complexity requires large, diverse datasets and advanced model architectures.

- **Training Stability**: GANs, in general, are known for their instability during training. Proper tuning of the hyperparameters, including the learning rates for both the generator and discriminator, is essential for achieving stable training.

## Applications of Handwriting Generation

- **Digital Document Creation**: Automatically generate handwritten text for various digital applications (e.g., emails, notes).
  
- **Personalization**: Create personalized handwritten messages based on specific user input or preferences.

- **Handwriting Synthesis**: Generate handwritten content that mimics a particular person's handwriting, useful for applications like digital signatures or stylized writing.

- **OCR and Text Recognition**: Handwriting generation models can also be used to augment OCR (Optical Character Recognition) training datasets by generating synthetic handwritten data for specific words or phrases.

---

## Conclusion

Handwriting generation using Conditional GANs (cGANs) is a powerful approach for generating realistic handwritten text based on specific input conditions. By leveraging both random noise and text embeddings, cGANs allow for controlled generation of handwriting, which can be used in various applications such as digital document creation, personalized messages, and handwriting synthesis. The combination of adversarial training and text conditioning enables the generation of high-quality, coherent handwriting images that are aligned with the given textual input.



```python
import tensorflow as tf
from tensorflow.keras import layers, models
import numpy as np
import matplotlib.pyplot as plt
```

```python
def load_data():
    # Placeholder: MNIST digits stand in for a custom handwriting dataset
    # in which each image comes with its corresponding text.
    (x_train, y_train), (_, _) = tf.keras.datasets.mnist.load_data()

    # Normalize images to the [-1, 1] range (matches the generator's tanh output)
    x_train = (x_train.astype(np.float32) - 127.5) / 127.5
    x_train = np.expand_dims(x_train, axis=-1)  # Add a channel dimension: (N, 28, 28, 1)

    # Encode the conditioning labels. Here they are one-hot digit classes;
    # a real handwriting dataset would need a text encoding or embedding instead.
    y_train = tf.keras.utils.to_categorical(y_train, 10).astype(np.float32)

    return x_train, y_train
```
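The following NumPy check covers the scaling used in `load_data` and its inverse, which is applied later for plotting. Mapping `[0, 255]` to `[-1, 1]` matches the range of the generator's `tanh` output. The inverse `(x + 1) / 2` maps back to `[0, 1]`, which is the same as `pixel / 255`. The sample pixel values are arbitrary.

```python
import numpy as np

pixels = np.array([0, 64, 128, 255], dtype=np.uint8)

# Forward mapping used in load_data: [0, 255] -> [-1, 1]
scaled = (pixels.astype(np.float32) - 127.5) / 127.5

# Inverse mapping used before plotting: [-1, 1] -> [0, 1]
for_display = (scaled + 1) / 2.0

print(scaled.min(), scaled.max())  # -1.0 1.0
```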


```python
def build_generator(latent_dim, text_dim):
    # Separate inputs for the noise vector and the text (label) vector
    noise_input = layers.Input(shape=(latent_dim,))
    text_input = layers.Input(shape=(text_dim,))

    # Condition the generator by concatenating noise and text
    merged_input = layers.concatenate([noise_input, text_input])
    x = layers.Dense(128 * 7 * 7, activation='relu')(merged_input)
    x = layers.Reshape((7, 7, 128))(x)  # Start from a 7x7 feature map

    # Two stride-2 transposed convolutions upsample 7x7 -> 14x14 -> 28x28
    x = layers.Conv2DTranspose(64, kernel_size=3, strides=2, padding='same', activation='relu')(x)
    x = layers.BatchNormalization()(x)

    # tanh keeps pixel values in [-1, 1], matching the normalization in load_data
    x = layers.Conv2DTranspose(1, kernel_size=3, strides=2, padding='same', activation='tanh')(x)

    # Model with two inputs (noise, text) and one output (a 28x28x1 image)
    model = models.Model(inputs=[noise_input, text_input], outputs=x)

    return model
```
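The upsampling arithmetic can be checked without TensorFlow. This sketch assumes the Keras rule that `Conv2DTranspose` with `padding='same'` (and no `output_padding`) multiplies each spatial dimension by the stride. It traces the shapes produced by `build_generator` and shows why the dense layer starts from a 7×7 map for 28×28 images.

```python
def conv_transpose_same(size, stride):
    # Assumed Keras behavior for Conv2DTranspose(padding="same"), no output_padding:
    # each spatial dimension is multiplied by the stride.
    return size * stride

# build_generator: Dense(128 * 7 * 7) -> Reshape((7, 7, 128)), then
# Conv2DTranspose(64, strides=2) and Conv2DTranspose(1, strides=2).
shape = (7, 7, 128)
for filters in (64, 1):
    shape = (conv_transpose_same(shape[0], 2), conv_transpose_same(shape[1], 2), filters)
    print(shape)  # (14, 14, 64), then (28, 28, 1)

# Inverse question: which starting size reaches a 28x28 image
# after two stride-2 upsamplings?
target = 28
print(target // 2 // 2)  # 7
```

For a dataset with a different image size, the same reasoning suggests starting from one quarter of the target height and width, provided both are divisible by 4.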

```python
def build_discriminator(image_shape, text_dim):
    image_input = layers.Input(shape=image_shape)
    text_input = layers.Input(shape=(text_dim,))

    # Convolutional layers extract image features (28x28 -> 14x14 -> 7x7 for MNIST)
    x = layers.Conv2D(64, kernel_size=5, strides=2, padding='same', activation='relu')(image_input)
    x = layers.Dropout(0.3)(x)

    x = layers.Conv2D(128, kernel_size=5, strides=2, padding='same', activation='relu')(x)
    x = layers.Dropout(0.3)(x)

    # Flatten the image features and combine them with a learned text embedding
    x = layers.Flatten()(x)
    text_embedding = layers.Dense(128, activation='relu')(text_input)
    x = layers.concatenate([x, text_embedding])

    # Single output unit with no activation: a logit for "real" vs. "fake"
    x = layers.Dense(1)(x)

    model = models.Model(inputs=[image_input, text_input], outputs=x)

    return model
```
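Similarly, this sketch assumes that `Conv2D` with `padding='same'` produces `ceil(size / stride)` along each spatial axis. Under that rule it computes the size of the feature vector that `build_discriminator` feeds into its final `Dense(1)` layer.

```python
import math

def conv_same(size, stride):
    # Assumed Keras behavior for Conv2D(padding="same"): ceil(size / stride).
    return math.ceil(size / stride)

# build_discriminator: two Conv2D layers (64 and 128 filters, strides=2).
h = w = 28
channels = 1
for filters in (64, 128):
    h, w, channels = conv_same(h, 2), conv_same(w, 2), filters

image_features = h * w * channels          # length of the Flatten output
text_features = 128                        # Dense(128) text embedding
combined = image_features + text_features  # input size of the final Dense(1)
print(h, w, image_features, combined)      # 7 7 6272 6400
```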


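```python
# The discriminator's final Dense(1) layer outputs logits, hence from_logits=True
cross_entropy = tf.keras.losses.BinaryCrossentropy(from_logits=True)

def generator_loss(fake_output):
    # The generator wants generated images to be classified as real (label 1)
    return cross_entropy(tf.ones_like(fake_output), fake_output)

def discriminator_loss(real_output, fake_output):
    # Real images should be classified as 1, generated images as 0
    real_loss = cross_entropy(tf.ones_like(real_output), real_output)
    fake_loss = cross_entropy(tf.zeros_like(fake_output), fake_output)
    return real_loss + fake_loss
```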


```python
@tf.function
def train_step(images, text, generator, discriminator, gen_optimizer, disc_optimizer, latent_dim):
    # Sample noise for the actual size of this batch: the final batch can be
    # smaller than BATCH_SIZE (60,000 % 64 = 32 for MNIST).
    batch_size = tf.shape(images)[0]
    noise = tf.random.normal([batch_size, latent_dim])

    # Record both forward passes so each network gets gradients of its own loss
    with tf.GradientTape() as gen_tape, tf.GradientTape() as disc_tape:
        generated_images = generator([noise, text], training=True)

        # The discriminator scores (image, text) pairs for real and generated images
        real_output = discriminator([images, text], training=True)
        fake_output = discriminator([generated_images, text], training=True)

        gen_loss = generator_loss(fake_output)
        disc_loss = discriminator_loss(real_output, fake_output)

    gradients_of_generator = gen_tape.gradient(gen_loss, generator.trainable_variables)
    gradients_of_discriminator = disc_tape.gradient(disc_loss, discriminator.trainable_variables)

    gen_optimizer.apply_gradients(zip(gradients_of_generator, generator.trainable_variables))
    disc_optimizer.apply_gradients(zip(gradients_of_discriminator, discriminator.trainable_variables))
    return gen_loss, disc_loss

def train(dataset, epochs, generator, discriminator, gen_optimizer, disc_optimizer, latent_dim):
    for epoch in range(epochs):
        for image_batch, text_batch in dataset:
            gen_loss, disc_loss = train_step(image_batch, text_batch, generator, discriminator,
                                             gen_optimizer, disc_optimizer, latent_dim)

        # Losses from the last batch of the epoch, as a rough progress signal
        print(f"Epoch {epoch + 1}/{epochs} - gen_loss: {float(gen_loss):.4f}, disc_loss: {float(disc_loss):.4f}")
```


```python
latent_dim = 100  # Size of the noise vector
text_dim = 10     # One-hot size for the 10 MNIST digit classes (adjust for your dataset)

# Hyperparameters
BATCH_SIZE = 64
EPOCHS = 20

# Load data, then shuffle and batch it (the final batch may be smaller than BATCH_SIZE)
x_train, y_train = load_data()
train_dataset = (
    tf.data.Dataset.from_tensor_slices((x_train, y_train))
    .shuffle(buffer_size=len(x_train))
    .batch(BATCH_SIZE)
)

# Build models
generator = build_generator(latent_dim, text_dim)
discriminator = build_discriminator(image_shape=(28, 28, 1), text_dim=text_dim)

# Optimizers
gen_optimizer = tf.keras.optimizers.Adam(1e-4)
disc_optimizer = tf.keras.optimizers.Adam(1e-4)

# Train the model
train(train_dataset, EPOCHS, generator, discriminator, gen_optimizer, disc_optimizer, latent_dim)
```


The recorded output of this cell was the error below. This error appears when the noise is sampled with the fixed `BATCH_SIZE` (64) instead of the size of the current batch. The last batch of the 60,000 MNIST training images holds only 32 examples, so the noise (64 × 100) and text (32 × 10) tensors cannot be concatenated.

```text
ValueError: in user code:

    File "<ipython-input-24-f5d4bc241b67>", line 8, in train_step  *
        generated_images = generator([noise, text], training=True)
    File "/usr/local/lib/python3.10/dist-packages/keras/src/utils/traceback_utils.py", line 122, in error_handler  **
        raise e.with_traceback(filtered_tb) from None

    ValueError: Exception encountered when calling Concatenate.call().

    Dimension 0 in both shapes must be equal, but are 64 and 32. Shapes are [64] and [32]. for '{{node functional_16_1/concatenate_9_1/concat}} = ConcatV2[N=2, T=DT_FLOAT, Tidx=DT_INT32](random_normal, functional_16_1/Cast, functional_16_1/concatenate_9_1/concat/axis)' with input shapes: [64,100], [32,10], [] and with computed input tensors: input[2] = <-1>.

    Arguments received by Concatenate.call():
      • inputs=['tf.Tensor(shape=(64, 100), dtype=float32)', 'tf.Tensor(shape=(32, 10), dtype=float32)']
```
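The arithmetic behind this error can be shown with a plain NumPy sketch. `Dataset.batch(64)` without `drop_remainder` splits 60,000 examples into 937 full batches and one final batch of 32. Sampling the noise from each batch's own size keeps the noise and text tensors aligned. Passing `drop_remainder=True` to `Dataset.batch` is an alternative that discards the short batch.

```python
import numpy as np

NUM_EXAMPLES = 60_000  # size of the MNIST training split
BATCH_SIZE = 64
LATENT_DIM, TEXT_DIM = 100, 10

# Batch sizes produced by grouping 60,000 examples in batches of 64,
# keeping the final partial batch (as with drop_remainder=False).
batch_sizes = [min(BATCH_SIZE, NUM_EXAMPLES - start)
               for start in range(0, NUM_EXAMPLES, BATCH_SIZE)]
print(len(batch_sizes), batch_sizes[-1])  # 938 32

# Per-batch noise: its first dimension always matches the text batch.
rng = np.random.default_rng(0)
for n in (batch_sizes[0], batch_sizes[-1]):
    text = np.zeros((n, TEXT_DIM), dtype=np.float32)  # stand-in labels
    noise = rng.standard_normal((n, LATENT_DIM)).astype(np.float32)
    merged = np.concatenate([noise, text], axis=1)
    print(merged.shape)  # (64, 110), then (32, 110)
```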


```python
def display_generated_images(generator, epoch, latent_dim, text_dim, label=1):
    # Sample 16 images, all conditioned on the same class `label`
    noise = tf.random.normal([16, latent_dim])
    text = tf.one_hot(tf.fill([16], label), depth=text_dim)  # One-hot vectors for `label`
    generated_images = generator([noise, text], training=False)
    generated_images = (generated_images.numpy() + 1) / 2.0  # Rescale from [-1, 1] to [0, 1]

    plt.figure(figsize=(4, 4))
    for i in range(16):
        plt.subplot(4, 4, i + 1)
        plt.imshow(generated_images[i, :, :, 0], cmap='gray')
        plt.axis('off')
    plt.suptitle(f"Epoch {epoch} (label {label})")
    plt.show()
```
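Here is a NumPy sketch of valid conditioning batches for `text_dim = 10`. An all-ones vector such as `np.ones((16, text_dim))` switches on every class at once and matches none of the one-hot training labels. Indexing an identity matrix instead gives one-hot rows, either for a single chosen digit or for a mix of digits. The mix here cycles through 0 to 9 across 16 samples, which is an arbitrary choice.

```python
import numpy as np

text_dim = 10
num_samples = 16

all_ones = np.ones((num_samples, text_dim), dtype=np.float32)  # not a valid label

# One-hot rows for a single digit (here "1") ...
same_digit = np.eye(text_dim, dtype=np.float32)[np.full(num_samples, 1)]

# ... or for a mix of digits, cycling 0-9 across the 16 samples.
mixed_digits = np.eye(text_dim, dtype=np.float32)[np.arange(num_samples) % text_dim]

print(all_ones.sum(axis=1)[0])        # 10.0: every class active at once
print(same_digit.argmax(axis=1)[:4])  # [1 1 1 1]
print(mixed_digits.argmax(axis=1))    # digits 0-9, then 0-5
```

Either array of one-hot rows can be passed as the text input, e.g. `generator([noise, mixed_digits], training=False)` with a noise batch of 16.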
