# Handwritten Text Generation with PyTorch

This notebook demonstrates generating handwritten-like text using a stroke-level recurrent neural network (RNN) implemented in PyTorch.

The notebook uses the DeepWriting dataset ([DeepWriting Dataset on PapersWithCode](https://paperswithcode.com/dataset/deepwriting)), which contains stroke-level handwriting data.

The notebook covers dataset loading, model definition, training, inference, saving the model, and evaluation.

## Setup
Install necessary libraries if not already installed.

In [1]:
!pip install torch numpy

In [8]:
# Step 1: Download DeepWriting dataset (already done in previous execution)
# !mkdir data_set # Directory already exists
# !wget https://files.ait.ethz.ch/projects/deepwriting/deepwriting_dataset.zip -P data_set/ # File already downloaded
# !unzip data_set/deepwriting_dataset.zip -d data_set/ # Files already unzipped

# Step 2: Load data from .npz files
import os
import numpy as np

training_data_path = "data_set/deepwriting_training.npz"
validation_data_path = "data_set/deepwriting_validation.npz"

all_strokes = None

if os.path.exists(training_data_path):
    training_data = np.load(training_data_path, allow_pickle=True)
    if 'strokes' in training_data:
        all_strokes = training_data['strokes']
        print(f"Loaded training data from {training_data_path}")
    else:
        for key in training_data.keys():
            if isinstance(training_data[key], np.ndarray):
                all_strokes = training_data[key]
                print(f"Assuming key '{key}' contains the stroke data from {training_data_path}")
                break

if os.path.exists(validation_data_path) and all_strokes is not None:
    validation_data = np.load(validation_data_path, allow_pickle=True)
    if 'strokes' in validation_data:
        all_strokes = np.concatenate((all_strokes, validation_data['strokes']), axis=0)
        print(f"Loaded and concatenated validation data from {validation_data_path}")
    else:
        for key in validation_data.keys():
            if isinstance(validation_data[key], np.ndarray):
                all_strokes = np.concatenate((all_strokes, validation_data[key]), axis=0)
                print(f"Assuming key '{key}' contains the stroke data from {validation_data_path} and concatenated.")
                break
elif os.path.exists(validation_data_path) and all_strokes is None:
     validation_data = np.load(validation_data_path, allow_pickle=True)
     if 'strokes' in validation_data:
         all_strokes = validation_data['strokes']
         print(f"Loaded validation data from {validation_data_path}")
     else:
         for key in validation_data.keys():
             if isinstance(validation_data[key], np.ndarray):
                 all_strokes = validation_data[key]
                 print(f"Assuming key '{key}' contains the stroke data from {validation_data_path}")
                 break


# Step 3: Save to .npy
if all_strokes is not None:
    np.save("strokes.npy", all_strokes, allow_pickle=True)  # working directory, where later cells look for the file
    print(f"✅ Saved {len(all_strokes)} stroke sequences to strokes.npy")
else:
    print("❌ No stroke data was loaded from the .npz files.")

Loaded training data from data_set/deepwriting_training.npz
Loaded and concatenated validation data from data_set/deepwriting_validation.npz
✅ Saved 35282 stroke sequences to strokes.npy
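
The loading cell falls back to the first array it finds when an archive has no `strokes` key, which can silently pick the wrong array. Listing the stored keys and shapes first makes that choice explicit. The sketch below builds a small stand-in archive so it runs on its own; the key names and shapes in it are hypothetical, not the actual contents of the DeepWriting files.

```python
import os
import tempfile

import numpy as np

# Build a tiny stand-in archive; the key names and shapes are hypothetical.
tmp_dir = tempfile.mkdtemp()
archive_path = os.path.join(tmp_dir, "toy_archive.npz")
np.savez(
    archive_path,
    strokes=np.zeros((4, 10, 3), dtype=np.float32),
    mean=np.zeros(3, dtype=np.float32),
)

# An .npz archive exposes the names of its stored arrays through `files`.
with np.load(archive_path) as archive:
    summary = {name: archive[name].shape for name in archive.files}

for name, shape in summary.items():
    print(f"{name}: shape={shape}")
```

Pointing `archive_path` at `data_set/deepwriting_training.npz` (with `allow_pickle=True` if the archive stores object arrays) would show which key actually holds the stroke sequences.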


## Dataset Loading

The cell above saves the combined stroke sequences as `strokes.npy`, and the rest of the notebook expects that file in the working directory.

If you skipped that cell, download the data from [DeepWriting Dataset](https://paperswithcode.com/dataset/deepwriting) and create `strokes.npy` first.

In [9]:
import numpy as np
import os

data_path = 'strokes.npy'
if not os.path.exists(data_path):
    print(f"Dataset file {data_path} not found. Please download it from the DeepWriting dataset and place it in the working directory.")
else:
    print(f"Dataset file {data_path} found.")

Dataset file strokes.npy found.


## Dataset Class
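Define a PyTorch `Dataset` that loads the stroke sequences and returns each one as an (input, target) pair, where the target is the input shifted forward by one time step.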

In [10]:
import torch
from torch.utils.data import Dataset

class StrokeDataset(Dataset):
    def __init__(self, file_path):
        self.data = np.load(file_path, allow_pickle=True)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        seq = self.data[idx]
        x = torch.tensor(seq[:-1], dtype=torch.float32)   # input (T-1, 3)
        y = torch.tensor(seq[1:], dtype=torch.float32)    # target (T-1, 3)
        return x, y
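
The one-step shift in `__getitem__` is easiest to see on a toy sequence. The following is a minimal sketch with made-up values; it uses plain NumPy slicing, which behaves the same way as the slicing applied before the tensors are built.

```python
import numpy as np

# Toy sequence of T=4 points with three features each (values are arbitrary).
seq = np.array(
    [[0.0, 0.0, 0.0],
     [1.0, 0.5, 0.0],
     [0.5, 1.0, 0.0],
     [0.0, 0.0, 1.0]],
    dtype=np.float32,
)

x = seq[:-1]  # inputs: points 0 .. T-2
y = seq[1:]   # targets: points 1 .. T-1

# Each target row is the point that follows the matching input row.
for t in range(len(x)):
    assert np.array_equal(y[t], seq[t + 1])

print(x.shape, y.shape)
```

Both slices have length T-1, so the model is trained to predict point t+1 from the points up to t.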

## Model Definition
Define the model: a stacked LSTM followed by layer normalization, dropout, and a linear layer that predicts the next stroke point.

In [16]:
import torch
import torch.nn as nn

class HandwritingRNN(nn.Module):
    def __init__(self, input_size=3, hidden_size=256, num_layers=2, output_size=3, dropout_prob=0.2):
        super().__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers

        self.lstm = nn.LSTM(
            input_size, hidden_size, num_layers,
            batch_first=True, dropout=dropout_prob
        )
        self.layer_norm = nn.LayerNorm(hidden_size)
        self.dropout = nn.Dropout(dropout_prob)
        self.fc = nn.Linear(hidden_size, output_size)

        self.init_weights()

    def init_weights(self):
        # Xavier initialization for LSTM weights
        for name, param in self.lstm.named_parameters():
            if 'weight' in name:
                nn.init.xavier_uniform_(param)
            elif 'bias' in name:
                nn.init.constant_(param, 0.0)
        nn.init.xavier_uniform_(self.fc.weight)
        nn.init.constant_(self.fc.bias, 0.0)

    def forward(self, x, hidden=None):
        out, hidden = self.lstm(x, hidden)            # LSTM output
        out = self.layer_norm(out)                    # Layer Normalization
        out = self.dropout(out)                       # Dropout
        out = self.fc(out)                            # Final output layer
        return out, hidden
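
The layer sizes above determine the number of trainable parameters. The arithmetic can be sketched in plain Python, assuming PyTorch's usual `nn.LSTM` parameterization (four gates per layer, with separate input and hidden bias vectors). The helper names are illustrative and not part of the notebook.

```python
def lstm_layer_params(input_size: int, hidden_size: int) -> int:
    """Parameters in one unidirectional LSTM layer (four gates, two bias vectors)."""
    weights = 4 * hidden_size * (input_size + hidden_size)
    biases = 2 * 4 * hidden_size
    return weights + biases


def handwriting_rnn_params(input_size=3, hidden_size=256, num_layers=2, output_size=3) -> int:
    """Total parameters for an LSTM stack + LayerNorm + Linear with these sizes."""
    total = lstm_layer_params(input_size, hidden_size)
    # Layers after the first take the previous layer's hidden state as input.
    total += (num_layers - 1) * lstm_layer_params(hidden_size, hidden_size)
    total += 2 * hidden_size                          # LayerNorm weight and bias
    total += hidden_size * output_size + output_size  # Linear weight and bias
    return total


print(handwriting_rnn_params())
```

With the default arguments this comes to 794,883 parameters, which can be compared against `sum(p.numel() for p in model.parameters())` on an instantiated model.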


## Training Loop
Define the training loop function.

In [20]:
def train_model(model, dataloader, criterion, optimizer, device, epochs=10, scheduler=None):
    model.train()
    for epoch in range(epochs):
        total_loss = 0
        for x, y in dataloader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            pred, _ = model(x)
            loss = criterion(pred, y)
            loss.backward()
            optimizer.step()
            total_loss += loss.item()
        # Step the scheduler at the end of each epoch
        if scheduler:
            scheduler.step()
        print(f"Epoch {epoch+1}/{epochs}, Loss: {total_loss / len(dataloader):.4f}")
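
One caveat with this loop: when batches are zero-padded to a common length, as the collate function in this notebook does, a plain mean squared error also averages over the padded steps. The sketch below shows the effect in NumPy and one possible masked alternative. It assumes the true sequence lengths are available, which the notebook's collate function does not currently return, and `masked_mse` is a hypothetical helper.

```python
import numpy as np


def masked_mse(pred, target, lengths):
    """Mean squared error over real (non-padded) time steps only.

    pred, target: arrays of shape (batch, max_len, features), zero-padded.
    lengths: true length of each sequence in the batch.
    """
    batch, max_len, features = pred.shape
    # mask[b, t] is True where step t belongs to sequence b.
    mask = np.arange(max_len)[None, :] < np.asarray(lengths)[:, None]
    squared_error = (pred - target) ** 2
    return squared_error[mask].sum() / (mask.sum() * features)


# Two sequences with true lengths 3 and 1, padded to length 3.
target = np.zeros((2, 3, 3), dtype=np.float32)
target[0] = 1.0
target[1, 0] = 1.0
pred = np.zeros_like(target)  # a model that always predicts zeros

plain = ((pred - target) ** 2).mean()          # padded steps count as perfect predictions
masked = masked_mse(pred, target, lengths=[3, 1])
print(plain, masked)
```

Here the plain mean is lower than the masked one because the padded steps contribute zero error, so a loss reported over padded batches can look better than the fit on real points. The same mask could be applied to tensors inside `train_model` if the collate function were extended to return lengths.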

## Inference (Sampling)
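Define a function that generates a handwriting sequence autoregressively: the model first consumes a seed sequence, and each predicted point is then fed back as the next input.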

In [18]:
def generate_sequence(model, seed_seq, length=300, device='cpu'):
    model.eval()
    generated = []
    input_seq = torch.tensor(seed_seq, dtype=torch.float32).unsqueeze(0).to(device)
    hidden = None
    with torch.no_grad():
        for _ in range(length):
            out, hidden = model(input_seq, hidden)
            next_point = out[0, -1, :].cpu().numpy()   # latest prediction, shape (3,)
            generated.append(next_point)
            input_seq = out[:, -1:, :]                 # feed the prediction back as the next input
    return np.array(generated)                         # shape (length, 3)
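
To draw a generated sequence, the rows usually need converting into absolute pen positions. The sketch below assumes that the first two columns are relative offsets (dx, dy) and the third is a pen-up signal; the notebook does not confirm this layout, so check it against the dataset first. `offsets_to_points` and the 0.5 threshold are illustrative choices.

```python
import numpy as np


def offsets_to_points(strokes, pen_threshold=0.5):
    """Convert (dx, dy, pen) rows into absolute (x, y) points and a pen-up flag.

    Assumes columns 0-1 are relative offsets and column 2 is a pen-up signal.
    """
    # Accept either (length, 3) or (length, 1, 3) input.
    strokes = np.asarray(strokes, dtype=np.float32).reshape(-1, 3)
    xy = np.cumsum(strokes[:, :2], axis=0)   # running sum of the offsets
    pen_up = strokes[:, 2] > pen_threshold   # the regression output is continuous
    return xy, pen_up


# Stand-in for generate_sequence output (values are made up).
generated = np.array([[[1.0, 0.0, 0.1]], [[1.0, 1.0, 0.2]], [[0.0, 1.0, 0.9]]])
xy, pen_up = offsets_to_points(generated)
print(xy)
print(pen_up)
```

Because the model is trained with a regression loss, the third channel comes out as a real number rather than a 0/1 flag, which is why the sketch thresholds it.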

## Main Training and Saving
Set parameters, load dataset, create model, train, and save the model.

In [22]:
import numpy as np
import torch
import torch.nn as nn
import torch.nn.utils.rnn as rnn_utils
import torch.optim as optim
from torch.utils.data import DataLoader

# ---- Collate Function ----
def collate_fn(batch):
    """Pads sequences to the longest sequence in the batch."""
    sequences, targets = zip(*batch)
    sequences_padded = rnn_utils.pad_sequence(sequences, batch_first=True, padding_value=0)
    targets_padded = rnn_utils.pad_sequence(targets, batch_first=True, padding_value=0)
    return sequences_padded, targets_padded

# ---- Training Parameters ----
device = 'cuda' if torch.cuda.is_available() else 'cpu'
batch_size = 64
hidden_size = 256
epochs = 50 # Increased epochs
learning_rate = 0.001
model_save_path = 'handwriting_rnn.pth'
data_path = 'strokes.npy' # Assuming strokes.npy is in the current directory

# Load dataset
# Using the StrokeDataset class defined in a previous cell
dataset = StrokeDataset(data_path)
dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True,
                        drop_last=True, collate_fn=collate_fn)

# Create model
# Using the HandwritingRNN class defined in a previous cell
model = HandwritingRNN(hidden_size=hidden_size).to(device)

# Optimizer and Criterion
optimizer = optim.Adam(model.parameters(), lr=learning_rate)
criterion = nn.MSELoss() # Using nn.MSELoss as in the original cell

# Learning Rate Scheduler
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)


# Train model
# Using the train_model function defined in a previous cell
train_model(model, dataloader, criterion, optimizer, device, epochs=epochs, scheduler=scheduler)

# Save model
torch.save(model.state_dict(), model_save_path)
print(f"✅ Model saved at {model_save_path}")

Epoch 1/50, Loss: 0.4392
Epoch 2/50, Loss: 0.2643
Epoch 3/50, Loss: 0.2499
Epoch 4/50, Loss: 0.2429
Epoch 5/50, Loss: 0.2365
Epoch 6/50, Loss: 0.2313
Epoch 7/50, Loss: 0.2264
Epoch 8/50, Loss: 0.2227
Epoch 9/50, Loss: 0.2193
Epoch 10/50, Loss: 0.2178
Epoch 11/50, Loss: 0.2115
Epoch 12/50, Loss: 0.2102
Epoch 13/50, Loss: 0.2083
Epoch 14/50, Loss: 0.2072
Epoch 15/50, Loss: 0.2059
Epoch 16/50, Loss: 0.2041
Epoch 17/50, Loss: 0.2031
Epoch 18/50, Loss: 0.2018
Epoch 19/50, Loss: 0.2001
Epoch 20/50, Loss: 0.1993
Epoch 21/50, Loss: 0.1945
Epoch 22/50, Loss: 0.1931
Epoch 23/50, Loss: 0.1929
Epoch 24/50, Loss: 0.1921
Epoch 25/50, Loss: 0.1910
Epoch 26/50, Loss: 0.1891
Epoch 27/50, Loss: 0.1888
Epoch 28/50, Loss: 0.1886
Epoch 29/50, Loss: 0.1881
Epoch 30/50, Loss: 0.1864
Epoch 31/50, Loss: 0.1839
Epoch 32/50, Loss: 0.1826
Epoch 33/50, Loss: 0.1822
Epoch 34/50, Loss: 0.1812
Epoch 35/50, Loss: 0.1805
Epoch 36/50, Loss: 0.1803
Epoch 37/50, Loss: 0.1800
Epoch 38/50, Loss: 0.1794
Epoch 39/50, Loss: 0.
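
The `StepLR` settings used here (`step_size=10`, `gamma=0.5`) halve the learning rate after every ten epochs. The closed form below is a small sketch of that schedule for a scheduler stepped once at the end of each epoch, as `train_model` does; `step_lr` is an illustrative helper, not a PyTorch function.

```python
def step_lr(initial_lr: float, epoch: int, step_size: int = 10, gamma: float = 0.5) -> float:
    """Learning rate in effect during a 1-based epoch under step decay."""
    # The scheduler has been stepped (epoch - 1) times when this epoch starts.
    return initial_lr * gamma ** ((epoch - 1) // step_size)


for epoch in (1, 10, 11, 21, 31, 41):
    print(f"epoch {epoch:2d}: lr = {step_lr(0.001, epoch):.7f}")
```

The larger-than-usual loss drops at epochs 11, 21, and 31 in the log above are consistent with these halvings, although the log alone does not prove the cause.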

In [23]:
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
import torch.nn.utils.rnn as rnn_utils
import numpy as np

# Assuming StrokeDataset and HandwritingRNN classes are defined in previous cells
# Assuming collate_fn is defined in a previous cell

def evaluate_model(model, dataloader, criterion, device):
    model.eval()
    total_loss = 0
    total_samples = 0

    with torch.no_grad():
        for x, y in dataloader:
            x, y = x.to(device), y.to(device)
            output, _ = model(x)
            loss = criterion(output, y)
            total_loss += loss.item() * x.size(0)  # Weighted by batch size
            total_samples += x.size(0)

    avg_loss = total_loss / total_samples
    return avg_loss

In [24]:
# ---- Evaluation Setup ----
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model_save_path = 'handwriting_rnn.pth'
# NOTE: strokes.npy combines the training and validation archives, so this loss
# is measured largely on data the model was trained on, not on a held-out set.
data_path = 'strokes.npy'

# Load dataset
dataset = StrokeDataset(data_path)
# Using the same batch size and collate_fn as training for consistency
batch_size = 64
dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=False, # No need to shuffle for eval
                        drop_last=False, collate_fn=collate_fn) # Keep all samples for eval

# Create model instance and load trained weights
hidden_size = 256 # Must match the hidden size used for training
model = HandwritingRNN(hidden_size=hidden_size).to(device)
model.load_state_dict(torch.load(model_save_path, map_location=device))

# Define Criterion
criterion = nn.MSELoss()

# Evaluate the model
avg_eval_loss = evaluate_model(model, dataloader, criterion, device)
print(f"Evaluation Loss: {avg_eval_loss:.4f}")

Evaluation Loss: 0.1658
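
Since the evaluation above reuses the data the model was trained on, the reported loss says little about generalization. One option is to reserve a fixed subset before training. The sketch below produces disjoint index sets from a seeded shuffle; the 10% fraction, the seed, and the helper name are arbitrary assumptions.

```python
import numpy as np


def split_indices(num_items, val_fraction=0.1, seed=0):
    """Return disjoint (train, validation) index arrays from a seeded shuffle."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(num_items)
    num_val = int(round(num_items * val_fraction))
    return order[num_val:], order[:num_val]


# 35282 is the number of sequences saved to strokes.npy earlier in the notebook.
train_idx, val_idx = split_indices(35282)
print(len(train_idx), len(val_idx))
```

The index arrays could then be wrapped with `torch.utils.data.Subset(dataset, train_idx.tolist())` and `Subset(dataset, val_idx.tolist())` to build separate training and evaluation loaders. Another option is to keep `deepwriting_validation.npz` out of `strokes.npy` and evaluate on it alone.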
