# BDH Training on Kaggle - Monte Cristo Dataset

This notebook trains the Baby Dragon Hatchling (BDH) model on two classic novels:
- **The Count of Monte Cristo** by Alexandre Dumas (~2.7 MB)
- **In Search of the Castaways** by Jules Verne (~845 KB)

**Total training data: ~3.5 MB of text**

## Setup Instructions for Kaggle:
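1. **Enable GPU**: Settings → Accelerator → GPU (the run logged below used a single Tesla P100; the training script uses only one GPU, so the second card under "T4 x2" stays idle)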
2. **Internet**: Settings → Internet → ON
3. Upload these files to Kaggle input:
   - `The Count of Monte Cristo.txt`
   - `In search of the castaways.txt`
4. Run all cells in order

Training 7000 iterations takes about 2 hours 50 minutes on the Tesla P100 used in the logged run (roughly 1.46 s per iteration).

In [1]:
# Check GPU availability
!nvidia-smi

Sat Jan 10 08:00:22 2026       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.172.08             Driver Version: 570.172.08     CUDA Version: 12.8     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla P100-PCIE-16GB           Off |   00000000:00:04.0 Off |                    0 |
| N/A   37C    P0             28W /  250W |       0MiB /  16384MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                     

In [2]:
# Clear GPU memory
import torch
import gc

gc.collect()
if torch.cuda.is_available():
    torch.cuda.empty_cache()
    print("GPU memory cleared!")
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"Available memory: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.2f} GB")

GPU memory cleared!
GPU: Tesla P100-PCIE-16GB
Available memory: 15.89 GB


## Clone Original BDH Repository from Pathway

In [3]:
# Clone the official Pathway BDH repository
!rm -rf bdh
!git clone https://github.com/pathwaycom/bdh.git
%cd bdh
!ls -la

Cloning into 'bdh'...
remote: Enumerating objects: 77, done.
remote: Counting objects: 100% (26/26), done.
remote: Compressing objects: 100% (17/17), done.
remote: Total 77 (delta 21), reused 9 (delta 9), pack-reused 51 (from 1)
Receiving objects: 100% (77/77), 998.95 KiB | 25.61 MiB/s, done.
Resolving deltas: 100% (28/28), done.
/kaggle/working/bdh
total 48
drwxr-xr-x 4 root root 4096 Jan 10 08:00 .
drwxr-xr-x 3 root root 4096 Jan 10 08:00 ..
-rw-r--r-- 1 root root 5051 Jan 10 08:00 bdh.py
drwxr-xr-x 2 root root 4096 Jan 10 08:00 figs
drwxr-xr-x 8 root root 4096 Jan 10 08:00 .git
-rw-r--r-- 1 root root   10 Jan 10 08:00 .gitignore
-rw-r--r-- 1 root root 1072 Jan 10 08:00 LICENSE.md
-rw-r--r-- 1 root root 4706 Jan 10 08:00 README.md
-rw-r--r-- 1 root root   21 Jan 10 08:00 requirements.txt
-rw-r--r-- 1 root root 3670 Jan 10 08:00 train.py


In [4]:
# Install dependencies
!pip install torch numpy tqdm -q

## Verify BDH Model

In [5]:
import sys
import importlib.util
import torch

# Load bdh module
spec = importlib.util.spec_from_file_location("bdh", "bdh.py")
bdh = importlib.util.module_from_spec(spec)
sys.modules["bdh"] = bdh
spec.loader.exec_module(bdh)

# Show model configuration
config = bdh.BDHConfig()
print("BDH Model Configuration:")
print(f"  Layers: {config.n_layer}")
print(f"  Embedding dimension: {config.n_embd}")
print(f"  Attention heads: {config.n_head}")
print(f"  Dropout: {config.dropout}")
print(f"  Vocabulary size: {config.vocab_size} (byte-level)")

# Create model and show parameter count
model = bdh.BDH(config)
total_params = sum(p.numel() for p in model.parameters())
print(f"\nTotal parameters: {total_params:,} (~{total_params/1e6:.1f}M)")

BDH Model Configuration:
  Layers: 6
  Embedding dimension: 256
  Attention heads: 4
  Dropout: 0.1
  Vocabulary size: 256 (byte-level)

Total parameters: 25,296,896 (~25.3M)


## Load Training Data (Monte Cristo Books)

**IMPORTANT**: Upload these files to Kaggle as Input Files:
1. Go to "Add Input" → "Upload" → Select both .txt files
2. They will be available at `/kaggle/input/your-dataset-name/`

In [6]:
import os
from pathlib import Path

# Find the input directory (Kaggle automatically mounts uploaded datasets)
input_dirs = list(Path('/kaggle/input').glob('*'))
print("Available input datasets:")
for dataset_dir in input_dirs:
    print(f"  {dataset_dir}")
    for txt_file in dataset_dir.glob('*.txt'):
        file_size_mb = txt_file.stat().st_size / (1024 * 1024)
        print(f"    - {txt_file.name} ({file_size_mb:.2f} MB)")

Available input datasets:
  /kaggle/input/bdh-book
    - In search of the castaways.txt (0.81 MB)
    - The Count of Monte Cristo.txt (2.66 MB)
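The next cell hard-codes the dataset folder, which differs for every upload. As a hedged alternative, the sketch below searches an input root for the first folder that contains every expected file. `find_dataset_dir` is a hypothetical helper (not part of Kaggle or the BDH repository), and the demo runs on a temporary directory that only mimics the `/kaggle/input/<dataset-name>/` layout.

```python
import tempfile
from pathlib import Path
from typing import Iterable, Optional


def find_dataset_dir(root: Path, required: Iterable[str]) -> Optional[Path]:
    """Return the first subdirectory of `root` holding every required file."""
    required = list(required)
    if not root.is_dir():
        return None
    for candidate in sorted(p for p in root.iterdir() if p.is_dir()):
        if all((candidate / name).is_file() for name in required):
            return candidate
    return None


books = ["The Count of Monte Cristo.txt", "In search of the castaways.txt"]

# Demo on a throwaway directory that mimics /kaggle/input/<dataset-name>/.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "other-dataset").mkdir()
    (root / "bdh-book").mkdir()
    for name in books:
        (root / "bdh-book" / name).write_text("placeholder", encoding="utf-8")
    found = find_dataset_dir(root, books)
    found_name = found.name if found else None

print(found_name)
```

On Kaggle the call would be `find_dataset_dir(Path('/kaggle/input'), books)`; a `None` result means no uploaded dataset contains both files.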


In [7]:
# Load and combine both books
# MODIFY THIS PATH to match your uploaded dataset name
data_dir = Path('/kaggle/input/bdh-book')  # Change this!

books = [
    'The Count of Monte Cristo.txt',
    'In search of the castaways.txt'
]

combined_text = ""
for book in books:
    book_path = data_dir / book
    if book_path.exists():
        with open(book_path, 'r', encoding='utf-8', errors='replace') as f:
            text = f.read()
            combined_text += text + "\n\n"  # Add separator
        print(f"✓ Loaded {book}: {len(text):,} characters")
    else:
        print(f"❌ File not found: {book_path}")
        print(f"   Please upload it to Kaggle input!")

# Save combined text
with open('monte_cristo_combined.txt', 'w', encoding='utf-8') as f:
    f.write(combined_text)

print(f"\n✓ Total training data: {len(combined_text):,} characters")
print(f"✓ Saved to monte_cristo_combined.txt")
print(f"\nFirst 200 characters:")
print(combined_text[:200])

✓ Loaded The Count of Monte Cristo.txt: 2,646,614 characters
✓ Loaded In search of the castaways.txt: 826,131 characters

✓ Total training data: 3,472,749 characters
✓ Saved to monte_cristo_combined.txt

First 200 characters:
﻿The Project Gutenberg eBook of The Count of Monte Cristo
    
This ebook is for the use of anyone anywhere in the United States and
most other parts of the world at no cost and with almost no restric
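The preview appears to begin with an invisible byte-order mark (U+FEFF). When a file is read with `encoding='utf-8'`, that mark stays in the text and becomes three extra bytes of training data. A minimal sketch of the difference, using Python's standard `utf-8-sig` codec on a throwaway file (the file name and contents are made up for the demonstration):

```python
import tempfile
from pathlib import Path

BOM = "\ufeff"

with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.txt"
    # Write a UTF-8 file whose first three bytes are the encoded BOM.
    sample.write_bytes(b"\xef\xbb\xbfThe Project Gutenberg eBook")

    plain = sample.read_text(encoding="utf-8")         # BOM kept as U+FEFF
    stripped = sample.read_text(encoding="utf-8-sig")  # BOM removed on read

print("utf-8    :", plain.startswith(BOM), len(plain))
print("utf-8-sig:", stripped.startswith(BOM), len(stripped))
```

Whether to strip the mark is a judgment call; at one occurrence per book it has a negligible effect on training.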


## Create Training Script with Model Saving

In [8]:
%%writefile train_monte_cristo.py
import torch
import torch.nn as nn
import sys
import importlib.util
import os
from tqdm import tqdm

# Load bdh module
spec = importlib.util.spec_from_file_location("bdh", "bdh.py")
bdh = importlib.util.module_from_spec(spec)
sys.modules["bdh"] = bdh
spec.loader.exec_module(bdh)

# Configuration
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
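# Query bfloat16 support only when CUDA is present. float16 is the fallback;
# note that float16 training without gradient scaling can underflow.
use_bf16 = torch.cuda.is_available() and torch.cuda.is_bf16_supported()
dtype = torch.bfloat16 if use_bf16 else torch.float16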
print(f"Using device: {device} with dtype {dtype}")

# Load data
with open('monte_cristo_combined.txt', 'r', encoding='utf-8') as f:
    data = f.read()

print(f"Training data: {len(data):,} characters")

# Convert to bytes
data_bytes = bytearray(data, "utf-8")
data_tensor = torch.tensor(data_bytes, dtype=torch.long, device=device)
print(f"Token count: {len(data_tensor):,}")

# Training parameters (adjusted for larger dataset)
max_iters = 7000  # More iterations for bigger dataset
eval_interval = 100
batch_size = 8  # Larger batch
block_size = 512  # Longer context
learning_rate = 3e-4

print(f"\nTraining parameters:")
print(f"  Iterations: {max_iters}")
print(f"  Batch size: {batch_size}")
print(f"  Block size: {block_size}")
print(f"  Learning rate: {learning_rate}")

# Create model
config = bdh.BDHConfig()
model = bdh.BDH(config).to(device)
total_params = sum(p.numel() for p in model.parameters())
print(f"\nModel parameters: {total_params:,}")

# Optimizer
optimizer = torch.optim.AdamW(model.parameters(), lr=learning_rate)

# Training function
def get_batch():
    ix = torch.randint(len(data_tensor) - block_size, (batch_size,))
    x = torch.stack([data_tensor[i:i+block_size] for i in ix])
    y = torch.stack([data_tensor[i+1:i+block_size+1] for i in ix])
    return x, y

# Training loop
print("\n" + "="*60)
print("STARTING TRAINING")
print("="*60)

model.train()
losses = []
best_loss = float('inf')

for step in tqdm(range(max_iters), desc="Training"):
    xb, yb = get_batch()
    
    # Forward pass with mixed precision (autocast follows the selected device)
    with torch.amp.autocast(device_type=device.type, dtype=dtype):
        logits, loss = model(xb, yb)
    
    # Backward pass
    optimizer.zero_grad(set_to_none=True)
    loss.backward()
    optimizer.step()
    
    losses.append(loss.item())
    
    if step % eval_interval == 0:
        print(f"\nStep: {step}/{max_iters} loss {loss.item():.3f}")
        
        # Track the lowest loss seen at a logging step (no checkpoint is written here)
        if loss.item() < best_loss:
            best_loss = loss.item()

print(f"\nTraining complete! Final loss: {losses[-1]:.4f}")
print(f"Best loss: {best_loss:.4f}")

# Generate sample
print("\n" + "="*60)
print("GENERATING SAMPLE TEXT")
print("="*60)
model.eval()
context = torch.tensor(
    bytearray("The Count of Monte Cristo", "utf-8"), 
    dtype=torch.long, 
    device=device
).unsqueeze(0)

with torch.no_grad():
    generated = model.generate(context, max_new_tokens=300, temperature=0.8, top_k=10)
    sample_text = bytes(generated.to(torch.uint8).to("cpu").squeeze(0)).decode(errors="backslashreplace")
    print(sample_text)

# Save the model
print("\n" + "="*60)
print("SAVING MODEL")
print("="*60)

save_dir = "saved_models"
os.makedirs(save_dir, exist_ok=True)

checkpoint = {
    'model_state_dict': model.state_dict(),
    'optimizer_state_dict': optimizer.state_dict(),
    'config': config,
    'training_info': {
        'final_loss': losses[-1],
        'best_loss': best_loss,
        'iterations': max_iters,
        'dataset': 'monte_cristo_combined',
        'dataset_size': len(data),
        'total_params': total_params,
        'batch_size': batch_size,
        'block_size': block_size,
    },
    'losses': losses,
}

save_path = os.path.join(save_dir, "bdh_monte_cristo.pt")
torch.save(checkpoint, save_path)

file_size_mb = os.path.getsize(save_path) / (1024 * 1024)
print(f"✓ Model saved to: {save_path}")
print(f"  File size: {file_size_mb:.2f} MB")
print(f"  Final loss: {losses[-1]:.4f}")
print(f"  Best loss: {best_loss:.4f}")
print(f"  Parameters: {total_params:,}")
print(f"  Training data: {len(data):,} characters")
print("\n✓ Training complete!")

Writing train_monte_cristo.py
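The script works at byte level: every UTF-8 byte is one token, so accented letters and curly quotes cost more than one token each, and the token count can exceed the character count. `get_batch` then draws random windows whose targets are the inputs shifted by one position. The sketch below reproduces both ideas with plain Python lists instead of tensors; `sample_window` is an illustrative name, and the tiny `block_size` is chosen only to keep the printout readable.

```python
import random

text = "Edmond Dantès"
tokens = list(text.encode("utf-8"))  # one integer in 0..255 per byte

# 'è' occupies two bytes in UTF-8, so there is one more token than characters.
print(len(text), "characters ->", len(tokens), "tokens")


def sample_window(tokens, block_size, rng):
    """Pick a random start so that both x and the shifted y fit in the data."""
    start = rng.randrange(len(tokens) - block_size)
    x = tokens[start : start + block_size]
    y = tokens[start + 1 : start + block_size + 1]  # next-byte targets
    return x, y


rng = random.Random(0)
x, y = sample_window(tokens, block_size=4, rng=rng)
print(bytes(x), bytes(y))
```

The upper bound `len(tokens) - block_size` mirrors the `torch.randint` call in the script: it guarantees that the target slice, which ends one position later than the input slice, never runs past the data.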


## Run Training (about 2 hours 50 minutes on a P100)

In [9]:
# Run the training script
!python train_monte_cristo.py

Using device: cuda with dtype torch.bfloat16
Training data: 3,472,749 characters
Token count: 3,551,719

Training parameters:
  Iterations: 7000
  Batch size: 8
  Block size: 512
  Learning rate: 0.0003

Model parameters: 25,296,896

STARTING TRAINING
Training:   0%|                                        | 0/7000 [00:00<?, ?it/s]
Step: 0/7000 loss 5.593
Training:   1%|▍                           | 100/7000 [02:26<2:47:41,  1.46s/it]
Step: 100/7000 loss 2.511
Training:   3%|▊                           | 200/7000 [04:52<2:45:15,  1.46s/it]
Step: 200/7000 loss 2.406
Training:   4%|█▏                          | 300/7000 [07:17<2:42:50,  1.46s/it]
Step: 300/7000 loss 1.888
Training:   6%|█▌                          | 400/7000 [09:43<2:40:21,  1.46s/it]
Step: 400/7000 loss 1.591
Training:   7%|██                          | 500/7000 [12:09<2:37:55,  1.46s/it]
Step: 500/7000 loss 1.353
Training:   9%|██▍                         | 600/7000 [14:35<2:35:30,  1.46s/it]
[training output truncated here]
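The progress bar gives enough information to estimate the full run: tqdm reports about 1.46 s per iteration, and the script runs 7000 iterations. A small arithmetic sketch (the helper name `format_duration` is made up here):

```python
def format_duration(total_seconds: float) -> str:
    """Render a duration in seconds as 'Hh MMm SSs'."""
    seconds = int(round(total_seconds))
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


max_iters = 7000
seconds_per_iter = 1.46  # read off the tqdm progress bar in the log above

estimate = max_iters * seconds_per_iter
print(format_duration(estimate))  # 2h 50m 20s
```

The estimate assumes a constant rate; a different GPU, batch size, or block size will change the per-iteration time.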

## Verify Saved Model

In [10]:
import os

# Check saved model
model_path = "saved_models/bdh_monte_cristo.pt"

if os.path.exists(model_path):
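    # weights_only=False is required because the checkpoint pickles a BDHConfig
    # object alongside the tensors; only load files you trust this way.
    checkpoint = torch.load(model_path, weights_only=False)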
    
    print("✓ Model loaded successfully!")
    print(f"\nTraining Info:")
    for key, value in checkpoint['training_info'].items():
        if isinstance(value, float):
            print(f"  {key}: {value:.4f}")
        else:
            print(f"  {key}: {value}")
    
    print(f"\nModel file size: {os.path.getsize(model_path) / 1024 / 1024:.2f} MB")
    print(f"Loss history points: {len(checkpoint['losses'])}")
else:
    print("❌ Model not found!")

✓ Model loaded successfully!

Training Info:
  final_loss: 0.7665
  best_loss: 0.6660
  iterations: 7000
  dataset: monte_cristo_combined
  dataset_size: 3472749
  total_params: 25296896
  batch_size: 8
  block_size: 512

Model file size: 289.60 MB
Loss history points: 7000
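The loss values are easier to interpret after conversion. Assuming the model reports mean cross-entropy in nats per byte (the repository's `bdh.py` is not shown here, but the step-0 loss of 5.593 sits close to ln 256 ≈ 5.545, the value for a uniform guess over 256 bytes), the losses translate to bits per byte and perplexity as sketched below. `describe` is an illustrative helper.

```python
import math
from typing import Tuple

vocab_size = 256
uniform_loss = math.log(vocab_size)  # loss of a model that guesses uniformly


def describe(loss_nats: float) -> Tuple[float, float]:
    """Return (bits per byte, perplexity) for a cross-entropy loss in nats."""
    return loss_nats / math.log(2), math.exp(loss_nats)


for name, loss in [("uniform", uniform_loss), ("final", 0.7665), ("best", 0.6660)]:
    bits, ppl = describe(loss)
    print(f"{name:>7}: {loss:.4f} nats = {bits:.3f} bits/byte, perplexity {ppl:.2f}")
```

Both reported losses are single-batch training losses, so they describe fit to the training text and say nothing about held-out data; the script has no validation split.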


## Load and Test the Trained Model

In [11]:
# Load the saved model
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
checkpoint = torch.load('saved_models/bdh_monte_cristo.pt', map_location=device, weights_only=False)

# Reconstruct model

loaded_model = bdh.BDH(checkpoint['config'])
loaded_model.load_state_dict(checkpoint['model_state_dict'])
loaded_model = loaded_model.to(device)
loaded_model.eval()

print("✓ Model loaded and ready for inference!")

# Test with custom prompts
prompts = [
    "Edmond Dantès",
    "The treasure of Monte Cristo",
    "Chapter 1.",
]

print("\n" + "="*60)
print("GENERATED TEXT SAMPLES")
print("="*60)

for custom_prompt in prompts:
    print(f"\nPrompt: '{custom_prompt}'")
    print("-" * 40)
    
    context = torch.tensor(
        bytearray(custom_prompt, "utf-8"), 
        dtype=torch.long, 
        device=device
    ).unsqueeze(0)
    
    with torch.no_grad():
        generated = loaded_model.generate(
            context, 
            max_new_tokens=200, 
            temperature=0.8, 
            top_k=10
        )
        result = bytes(generated.to(torch.uint8).to("cpu").squeeze(0)).decode(
            errors="backslashreplace"
        )
    
    print(result)
    print()

✓ Model loaded and ready for inference!

GENERATED TEXT SAMPLES

Prompt: 'Edmond Dantès'
----------------------------------------
Edmond Dantès,
after thought that the tribe temperor returned to the powder and belong
to him were a lesson than he had given them good, and the house was
married by his master’s daughter. But that she was decei


Prompt: 'The treasure of Monte Cristo'
----------------------------------------
The treasure of Monte Cristo was, was also with him
that Marseilles support the coffer of the Count of Monte Cristo and to
the hearth almost three hundred persons who were passing twenty years of
excellent person, whose proceedi


Prompt: 'Chapter 1.'
----------------------------------------
Chapter 1. Haydée
Chapter 89. The Villefort Family Louis XVIII. A Smugglery
Chapter 4. The Baron Danglars
Chapter 10. The Breakfast
Chapter 3. The Corsican Ogre
Chapter 19. The Rue
Chapter 8. Father and Son
Ch
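Because generation stops after a fixed number of bytes, a sample can end in the middle of a multi-byte UTF-8 character, and a byte-level model can also emit sequences that are not valid UTF-8. That is why the decode calls above pass `errors="backslashreplace"`. A small illustration with hand-made bytes (not model output):

```python
# 'è' is encoded as the two bytes 0xC3 0xA8. Cutting the byte string right
# after 0xC3 simulates a generation that stopped mid-character.
complete = "Dantès".encode("utf-8")
truncated = complete[:-2]

# The dangling byte is shown as an escape instead of raising an error.
print(truncated.decode(errors="backslashreplace"))  # Dant\xc3

try:
    truncated.decode()  # strict decoding rejects the dangling byte
except UnicodeDecodeError as exc:
    print("strict decoding failed:", exc.reason)
```

Escapes such as `\xc3` in a generated sample therefore indicate invalid or cut-off byte sequences, not a bug in the printing code.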



## Download Model for Local Use

In [12]:
# On Kaggle, output files are automatically saved in /kaggle/working/
# The model will be available in your output after committing the notebook

import shutil

# Copy to Kaggle output directory
output_path = "/kaggle/working/bdh_monte_cristo.pt"
shutil.copy("saved_models/bdh_monte_cristo.pt", output_path)

print(f"✓ Model copied to {output_path}")
print(f"  Size: {os.path.getsize(output_path) / (1024*1024):.2f} MB")
print("\nTo download:")
print("1. Click 'Save Version' (top right)")
print("2. Select 'Save & Run All'")
print("3. After completion, go to 'Output' tab")
print("4. Download bdh_monte_cristo.pt")
print("\n✓ You can now use this model with the long_inference.py script!")

✓ Model copied to /kaggle/working/bdh_monte_cristo.pt
  Size: 289.60 MB

To download:
1. Click 'Save Version' (top right)
2. Select 'Save & Run All'
3. After completion, go to 'Output' tab
4. Download bdh_monte_cristo.pt

✓ You can now use this model with the long_inference.py script!
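The 289.60 MB file size follows from what the checkpoint stores. Assuming float32 parameters and that AdamW keeps two float32 moment tensors per parameter (the usual behaviour of PyTorch's AdamW, not verified against this particular file), the weights plus optimizer state amount to three float32 copies of every parameter:

```python
total_params = 25_296_896   # from the parameter count printed earlier
bytes_per_float32 = 4
copies = 3                  # weights + AdamW first moment + AdamW second moment


def mib(num_bytes: int) -> float:
    """Convert a byte count to mebibytes (1024 * 1024 bytes)."""
    return num_bytes / (1024 * 1024)


weights_only = mib(total_params * bytes_per_float32)
full_checkpoint = mib(total_params * bytes_per_float32 * copies)
print(f"weights only:   {weights_only:.2f} MiB")
print(f"with optimizer: {full_checkpoint:.2f} MiB")
```

The small remainder is consistent with the loss history, the config, and serialization overhead. If the optimizer state is not needed to resume training, saving only `model_state_dict` would shrink the download to roughly a third.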


## Summary

### What Was Done:
1. ✅ Cloned official Pathway BDH repository
2. ✅ Loaded Monte Cristo books (~3.5 MB combined)
3. ✅ Trained BDH model for 7000 iterations
4. ✅ Saved trained model with full checkpoint
5. ✅ Verified model can generate text in Monte Cristo style

### Model File Contents:
- `model_state_dict`: All trained weights
- `optimizer_state_dict`: Optimizer state
- `config`: Model architecture
- `training_info`: Final and best loss, dataset info, hyperparameters
- `losses`: Full training loss history
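The `losses` list holds one value per iteration, and with a batch size of 8 the individual values are noisy (the final loss of 0.7665 is higher than the best logged loss of 0.6660). A trailing moving average gives a steadier picture. The sketch below uses a handful of rounded values from the training log in place of `checkpoint['losses']`, and `moving_average` is a hypothetical helper:

```python
from collections import deque
from typing import Iterable, List


def moving_average(values: Iterable[float], window: int) -> List[float]:
    """Trailing mean over the last `window` values (fewer at the start)."""
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = deque(maxlen=window)
    running_sum = 0.0
    smoothed = []
    for value in values:
        if len(recent) == window:
            running_sum -= recent[0]  # this value is about to be evicted
        recent.append(value)
        running_sum += value
        smoothed.append(running_sum / len(recent))
    return smoothed


# Rounded losses from steps 0, 100, ..., 500 of the log; stand-in data only.
logged = [5.59, 2.51, 2.41, 1.89, 1.59, 1.35]
print([round(v, 3) for v in moving_average(logged, window=3)])
```

With the real checkpoint, `moving_average(checkpoint['losses'], window=100)` would smooth all 7000 points.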

### Next Steps:
1. Download `bdh_monte_cristo.pt` from Kaggle output
2. Use with `long_inference.py` to process 100k+ token books
3. Expect output that imitates the novels' vocabulary and chapter layout without being coherent, as the generated samples above show