# Scribe Handwriting Synthesis - Colab Training

Train a handwriting synthesis model on Google Colab Pro with GPU acceleration.

**Requirements:**
- Google Colab Pro subscription ($10/month)
- ~15 GB Google Drive space for data + checkpoints
- GPU runtime (T4/V100/A100)

**Expected training time:** 3-6 hours (250 epochs with rnn_size=400)

**Important:** This notebook saves all outputs to Google Drive for persistence across sessions.

## 1. Install TensorFlow 2.15 and Dependencies

Colab comes with TensorFlow 2.19 by default, but Scribe requires TensorFlow 2.15.
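
One way to confirm that the pins in the install cell took effect is to compare them with what is actually installed. The sketch below is an illustration and not part of Scribe: `PINS`, `installed_version`, and `find_mismatches` are hypothetical names, and the pinned versions simply mirror the `pip install` command in this section.

```python
from importlib import metadata

# Pins copied from the pip install command in this section.
PINS = {
    "tensorflow": "2.15.0",
    "numpy": "1.26.4",
    "svgwrite": "1.4.3",
    "matplotlib": "3.8.2",
}


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pins, lookup=installed_version):
    """Return {package: (expected, found)} for every pin that is not satisfied."""
    mismatches = {}
    for package, expected in pins.items():
        found = lookup(package)
        if found != expected:
            mismatches[package] = (expected, found)
    return mismatches
```

Calling `find_mismatches(PINS)` after the install cell should return an empty dictionary when every pin matches. This reads package metadata on disk, so if a different TensorFlow or NumPy was already imported in the session, a runtime restart is typically needed before the pinned version is the one in memory.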

In [None]:
# Install TensorFlow 2.15 and required dependencies
!pip install -q tensorflow==2.15.0 numpy==1.26.4 svgwrite==1.4.3 matplotlib==3.8.2

print("✓ Dependencies installed")

## 2. Verify Installation and GPU

Check that TensorFlow 2.15 is installed correctly and GPU is available.

In [None]:
import tensorflow as tf
import sys

print(f"Python version: {sys.version}")
print(f"TensorFlow version: {tf.__version__}")

# Check GPU availability
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    print(f"\n✓ GPU available: {gpus[0].name}")
    # Enable memory growth to prevent OOM errors on T4
    for gpu in gpus:
        tf.config.experimental.set_memory_growth(gpu, True)
    print("✓ GPU memory growth enabled")
else:
    print("\n⚠️  WARNING: No GPU detected!")
    print("Go to Runtime > Change runtime type > Hardware accelerator > GPU")

print("\n✓ Setup verification complete")

## 3. Mount Google Drive and Setup Directories

All project files and outputs will be saved to Google Drive for persistence.

**Before running this cell:**
1. Upload the entire `scribe` folder to your Google Drive root
2. Ensure `data/strokes_training_data.cpkl` is present (44 MB)
3. Ensure `data/styles/` directory contains 26 .npy files

In [None]:
from google.colab import drive
import os

# Mount Google Drive
drive.mount('/content/drive')

# Define project paths
PROJECT_DIR = '/content/drive/MyDrive/scribe'
DATA_DIR = f'{PROJECT_DIR}/data'
SAVED_DIR = f'{PROJECT_DIR}/saved'
LOGS_DIR = f'{PROJECT_DIR}/logs'

# Create directories if they don't exist
os.makedirs(SAVED_DIR, exist_ok=True)
os.makedirs(LOGS_DIR, exist_ok=True)
os.makedirs(f'{LOGS_DIR}/figures', exist_ok=True)

print("✓ Google Drive mounted")
print(f"✓ Project directory: {PROJECT_DIR}")
print("✓ Output directories created")

## 4. Verify Required Files

Check that all necessary files are present before starting training.
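
Section 3 expects a training file of about 44 MB and 26 `.npy` style files, so it can help to compare those numbers against what is actually on Drive. The following is a minimal sketch under that assumption; `count_style_files` and `size_in_mb` are hypothetical helpers, not functions from the Scribe code base.

```python
import os


def count_style_files(styles_dir):
    """Count .npy files directly inside styles_dir (0 if the directory is missing)."""
    if not os.path.isdir(styles_dir):
        return 0
    return sum(1 for name in os.listdir(styles_dir) if name.endswith(".npy"))


def size_in_mb(path):
    """Return the file size in megabytes (10**6 bytes), or None if the file is missing."""
    if not os.path.isfile(path):
        return None
    return os.path.getsize(path) / 1e6
```

With the paths from Section 3, `count_style_files(f'{DATA_DIR}/styles')` would be expected to return 26, and `size_in_mb(f'{DATA_DIR}/strokes_training_data.cpkl')` a value near 44. A much smaller size may indicate an incomplete upload.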

In [None]:
import os

# Check critical files
required_files = [
    f'{PROJECT_DIR}/train.py',
    f'{PROJECT_DIR}/model.py',
    f'{PROJECT_DIR}/utils.py',
    f'{DATA_DIR}/strokes_training_data.cpkl',
]

all_present = True
for filepath in required_files:
    if os.path.exists(filepath):
        size = os.path.getsize(filepath)
        print(f"✓ {os.path.basename(filepath)} ({size:,} bytes)")
    else:
        print(f"✗ MISSING: {filepath}")
        all_present = False

# Check style files
styles_dir = f'{DATA_DIR}/styles'
if os.path.exists(styles_dir):
    style_files = [f for f in os.listdir(styles_dir) if f.endswith('.npy')]
    print(f"\n✓ Found {len(style_files)} style files")
else:
    print("\n⚠️  WARNING: Styles directory not found")

if all_present:
    print("\n✓ All required files present - ready to train!")
else:
    print("\n✗ ERROR: Missing required files - please upload the scribe folder to Google Drive")

## 5. Verify Training Data

Run the data verification script to ensure the dataset is valid.
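
The contents of `verify_data.py` are not shown in this notebook. As a separate, hedged illustration of a first sanity check, the sketch below loads the data file and reports only its top-level type and length. It assumes the `.cpkl` file is a standard Python pickle, which the extension suggests but this notebook does not state; `summarize_pickle` is a hypothetical helper.

```python
import pickle


def summarize_pickle(path):
    """Load a pickle file and describe its top-level object without assuming a schema."""
    with open(path, "rb") as f:
        # encoding="latin1" lets Python 3 read pickles written by Python 2.
        # Only unpickle files you trust: loading a pickle can execute code.
        obj = pickle.load(f, encoding="latin1")
    summary = {"type": type(obj).__name__}
    if hasattr(obj, "__len__"):
        summary["length"] = len(obj)
    return summary
```

For example, `summarize_pickle(f'{DATA_DIR}/strokes_training_data.cpkl')` would show whether the file unpickles at all before any training time is spent.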

In [None]:
%cd {PROJECT_DIR}
!python verify_data.py

## 6. Start Training

Train with recommended parameters for style priming support:
- `rnn_size=400` (required for style priming)
- `nmixtures=20` (high quality output)
- `nepochs=250` (full training)
- `save_every=250` (more frequent checkpoints for Colab)

**Expected training time:** 3-6 hours on T4/V100 GPU

**Note:** If the session disconnects, simply re-run this cell. The script automatically resumes from the latest checkpoint.
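
The 3-6 hour estimate, spread over 250 epochs, works out to roughly 43 to 86 seconds per epoch. The arithmetic can be sketched as follows; the helper names are hypothetical, and the projection assumes every epoch takes about the same time.

```python
def seconds_per_epoch(total_hours, epochs):
    """Average seconds per epoch implied by a total run time in hours."""
    return total_hours * 3600 / epochs


def projected_hours(measured_seconds_per_epoch, epochs):
    """Total hours for a run, extrapolated from one measured epoch duration."""
    return measured_seconds_per_epoch * epochs / 3600
```

Timing the first epoch and passing it to `projected_hours(..., 250)` gives a rough check on whether the run is likely to land inside the 3-6 hour window on the GPU that was assigned.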

In [None]:
%cd {PROJECT_DIR}

!python train.py \
    --rnn_size 400 \
    --nmixtures 20 \
    --nepochs 250 \
    --batch_size 32 \
    --learning_rate 1e-4 \
    --save_every 250 \
    --data_dir {DATA_DIR} \
    --save_path {SAVED_DIR}/model \
    --log_dir {LOGS_DIR}

## 7. Monitor Training Progress (Optional)

Inspect the most recent training log output. Colab executes one cell at a time, so this cell only runs once the training cell has finished or been interrupted.

In [None]:
# Show the last 50 lines of each log file
!tail -n 50 {LOGS_DIR}/*.log
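
The same log check can be done in Python, which makes it easy to pick only the newest log file. This is a minimal sketch that assumes the logs use a `.log` extension, as the shell command in this section does; `tail_latest_log` is a hypothetical helper.

```python
import glob
import os
from collections import deque


def tail_latest_log(log_dir, n=20):
    """Return the last n lines of the most recently modified .log file in log_dir."""
    log_files = glob.glob(os.path.join(log_dir, "*.log"))
    if not log_files:
        return []
    latest = max(log_files, key=os.path.getmtime)
    with open(latest, "r", errors="replace") as f:
        # A deque with maxlen keeps only the final n lines while streaming the file.
        return [line.rstrip("\n") for line in deque(f, maxlen=n)]
```

For instance, `print("\n".join(tail_latest_log(LOGS_DIR, 50)))` prints the last 50 lines of the newest log and then returns, so the cell does not block.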

## 8. Generate Samples After Training

Test the trained model by generating handwriting samples.

In [None]:
%cd {PROJECT_DIR}

# Generate basic sample
!python sample.py \
    --text "The quick brown fox jumps over the lazy dog. 1234567890" \
    --bias 1.0 \
    --format svg \
    --save_path {SAVED_DIR}/model

# Generate multi-line sample with styles
!python sample.py \
    --lines "Dear friend, I hope this finds you well." \
            "The meeting is scheduled for 3:00pm on June 15th." \
            "Looking forward to seeing you soon!" \
    --biases 1.2 1.0 1.2 \
    --styles 0 3 0 \
    --format svg \
    --save_path {SAVED_DIR}/model

print(f"\n✓ Samples generated in {LOGS_DIR}/figures/")
print("View them in Google Drive or download below")

## 9. Download Results (Optional)

Download checkpoints and samples directly from Colab.
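
When there are many sample files, bundling them into one archive before downloading can be more convenient than fetching them one by one. The sketch below uses the standard library's `shutil.make_archive`; `zip_directory` is a hypothetical helper, and the archive location is an assumption.

```python
import shutil


def zip_directory(src_dir, archive_base):
    """Create archive_base + '.zip' with the contents of src_dir and return its path."""
    # Keep archive_base outside src_dir so the archive does not try to include itself.
    return shutil.make_archive(archive_base, "zip", root_dir=src_dir)
```

In this notebook that could look like `files.download(zip_directory(f'{LOGS_DIR}/figures', '/content/scribe_figures'))`, which writes the archive to local Colab storage and then downloads it.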

In [None]:
from google.colab import files
import glob
import os

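# Option 1: List the five most recently written checkpoint files
print("Available checkpoints:")
checkpoints = glob.glob(f'{SAVED_DIR}/checkpoint-*')
# Sort by modification time; a plain name sort would place "1000" before "250"
for cp in sorted(checkpoints, key=os.path.getmtime)[-5:]:
    print(f"  {os.path.basename(cp)}")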

# Option 2: Download generated SVG samples
print("\nGenerated samples:")
samples = glob.glob(f'{LOGS_DIR}/figures/*.svg')
for sample in sorted(samples)[-5:]:
    print(f"  {os.path.basename(sample)}")

# Uncomment to download the latest sample:
# if samples:
#     latest_sample = sorted(samples)[-1]
#     files.download(latest_sample)
#     print(f"\n✓ Downloaded: {os.path.basename(latest_sample)}")

print("\n💡 Tip: All files are saved to Google Drive and persist across sessions")