# Baby Dragon Hatchling (BDH) Training

This notebook trains Baby Dragon Hatchling (BDH), a biologically inspired language model architecture, on the tiny Shakespeare dataset.

## Setup Instructions
1. **Enable GPU**: Go to `Runtime` → `Change runtime type` → Select `T4 GPU`
2. Run all cells in order

Training takes ~10-15 minutes on Colab's free T4 GPU.

In [8]:
# Check GPU availability
!nvidia-smi

Sun Oct 12 09:07:49 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   48C    P0             26W /   70W |    1570MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                

In [9]:
# Release cached GPU memory held by this process (memory used by other processes is unaffected)
import torch
if torch.cuda.is_available():
    torch.cuda.empty_cache()
    print("GPU memory cleared!")

GPU memory cleared!


In [10]:
# Clone the repository fresh (remove any old clone first)
!rm -rf bdh
!git clone https://github.com/Git-Faisal/bdh.git
%cd bdh
!pwd  # Expect /content/bdh on a fresh runtime; /content/bdh/bdh means the cell ran from inside an earlier clone

Cloning into 'bdh'...
remote: Enumerating objects: 78, done.
remote: Counting objects: 100% (25/25), done.
remote: Compressing objects: 100% (16/16), done.
remote: Total 78 (delta 15), reused 18 (delta 9), pack-reused 53 (from 1)
Receiving objects: 100% (78/78), 997.08 KiB | 30.21 MiB/s, done.
Resolving deltas: 100% (28/28), done.
/content/bdh/bdh
/content/bdh/bdh


In [11]:
# Install dependencies (already installed in Colab, but just to be safe)
!pip install torch numpy requests -q

## View the Model Architecture
Let's take a quick look at the BDH model:

In [12]:
# Import the BDH module
import sys
import importlib.util
import torch

# Load bdh module from current directory
spec = importlib.util.spec_from_file_location("bdh", "bdh.py")
bdh = importlib.util.module_from_spec(spec)
sys.modules["bdh"] = bdh
spec.loader.exec_module(bdh)

# Show model configuration
config = bdh.BDHConfig()
print("BDH Model Configuration:")
print(f"  Layers: {config.n_layer}")
print(f"  Embedding dimension: {config.n_embd}")
print(f"  Attention heads: {config.n_head}")
print(f"  Dropout: {config.dropout}")
print(f"  Vocabulary size: {config.vocab_size} (byte-level)")

# Create model and show parameter count
model = bdh.BDH(config)
total_params = sum(p.numel() for p in model.parameters())
print(f"\nTotal parameters: {total_params:,} (~{total_params/1e6:.1f}M)")

BDH Model Configuration:
  Layers: 6
  Embedding dimension: 256
  Attention heads: 4
  Dropout: 0.1
  Vocabulary size: 256 (byte-level)

Total parameters: 25,296,896 (~25.3M)
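
A vocabulary size of 256 means each UTF-8 byte is one token. The following is a minimal sketch of that mapping using only the standard library; the variable names are illustrative and not taken from `bdh.py`.

```python
# Byte-level tokenization: every UTF-8 byte becomes one token id in [0, 255].
text = "To be, or not to be"
token_ids = list(text.encode("utf-8"))
assert all(0 <= t < 256 for t in token_ids)

# Decoding reverses the mapping. errors="backslashreplace" keeps invalid
# byte sequences visible as escapes instead of raising an exception.
decoded = bytes(token_ids).decode("utf-8", errors="backslashreplace")
print(token_ids[:5], decoded == text)  # [84, 111, 32, 98, 101] True

# Non-ASCII characters occupy more than one token.
print(len("é"), len("é".encode("utf-8")))  # 1 2
```

Because no tokenizer has to be trained, any text can be encoded, at the cost of longer sequences than a subword vocabulary would produce.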


## Start Training!
This will:
1. Download the tiny Shakespeare dataset (~1MB)
2. Train for 3000 iterations (~10-15 minutes)
3. Show loss every 100 steps
4. Generate sample text at the end

In [13]:
# Run training
!python train.py

Using device: cuda with dtype bfloat16
Step: 0/3000 loss 5.66
Step: 100/3000 loss 3.03
Step: 200/3000 loss 2.47
Step: 300/3000 loss 2.04
Step: 400/3000 loss 1.75
Step: 500/3000 loss 1.59
Step: 600/3000 loss 1.49
Step: 700/3000 loss 1.41
Step: 800/3000 loss 1.37
Step: 900/3000 loss 1.34
Step: 1000/3000 loss 1.3
Step: 1100/3000 loss 1.28
Step: 1200/3000 loss 1.25
Step: 1300/3000 loss 1.23
Step: 1400/3000 loss 1.2
Step: 1500/3000 loss 1.18
Step: 1600/3000 loss 1.17
Step: 1700/3000 loss 1.16
Step: 1800/3000 loss 1.13
Step: 1900/3000 loss 1.11
Step: 2000/3000 loss 1.09
Step: 2100/3000 loss 1.08
Step: 2200/3000 loss 1.05
Step: 2300/3000 loss 1.03
Step: 2400/3000 loss 1.01
Step: 2500/3000 loss 0.998
Step: 2600/3000 loss 0.966
Step: 2700/3000 loss 0.944
Step: 2800/3000 loss 0.92
Step: 2900/3000 loss 0.897
Training done, now generating a sample 
To be or none with stones.

LEONTES:
It is a man that this is not to die?

LEONTES:
Why, then, I'll prove him


## Custom Text Generation (Optional)
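The cell below shows how to generate text from a custom prompt. Because `train.py` runs in a separate process and does not save its weights, the model created here is untrained until you add code to save and load a checkpoint: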

In [1]:
import sys
import importlib.util
import torch

# Load bdh module (run this from the repository directory that contains bdh.py)
spec = importlib.util.spec_from_file_location("bdh", "bdh.py")
bdh = importlib.util.module_from_spec(spec)
sys.modules["bdh"] = bdh
spec.loader.exec_module(bdh)

# Note: train.py runs in a separate process and does not save the model,
# so the trained weights are not available in this kernel. The model below
# is randomly initialized until you load a saved checkpoint.

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create the model (untrained; load saved weights here to get meaningful output)
config = bdh.BDHConfig()
model = bdh.BDH(config).to(device)
model.eval()

# Your custom prompt
prompt_text = "To be or not to be"  # Change this!

prompt = torch.tensor(
    bytearray(prompt_text, "utf-8"),
    dtype=torch.long,
    device=device
).unsqueeze(0)

# Generate
with torch.no_grad():
    output = model.generate(prompt, max_new_tokens=200, top_k=5)
    result = bytes(output.to(torch.uint8).to("cpu").squeeze(0)).decode(
        errors="backslashreplace"
    )
    print(result)

FileNotFoundError: [Errno 2] No such file or directory: '/content/bdh.py'
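
For the generation cell to use trained weights, the training script would need to write a checkpoint that the notebook can read back. The following is a minimal sketch of that round trip with `torch.save` and `load_state_dict`. It assumes a small `nn.Linear` as a stand-in for `bdh.BDH(config)` and a hypothetical file name `bdh_weights.pt`; neither comes from the repository.

```python
import os
import tempfile

import torch
from torch import nn

# Stand-in model so the sketch runs on its own; in practice this would be
# the trained bdh.BDH(config) instance inside train.py (assumption).
model = nn.Linear(4, 2)

# Hypothetical checkpoint path; choose a location that outlives the session.
ckpt_path = os.path.join(tempfile.gettempdir(), "bdh_weights.pt")

# After training: persist only the parameters, not the whole Python object.
torch.save(model.state_dict(), ckpt_path)

# In the generation cell: rebuild the same architecture, then load the weights.
restored = nn.Linear(4, 2)
state = torch.load(ckpt_path, map_location="cpu")
restored.load_state_dict(state)
restored.eval()

# Both models now produce identical outputs for the same input.
x = torch.ones(1, 4)
print(torch.equal(model(x), restored(x)))
```

If the training script wraps the model in `torch.compile`, the saved keys may carry an `_orig_mod.` prefix, so it is usually simpler to save the state dict of the uncompiled module.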

## What to Observe

During training, watch the **loss** value:
- **Initial loss (~5.5-5.7)**: Close to ln(256) ≈ 5.55, the loss of guessing uniformly over the 256 byte values
- **After training (~0.9-1.0)**: The model has learned character-level patterns
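
To put the logged numbers in context, the sketch below computes the loss of a uniform guess over the byte vocabulary and converts a few logged losses to perplexity. It assumes the reported loss is the mean cross-entropy in nats, which is the PyTorch default but is not confirmed by the training output itself.

```python
import math

vocab_size = 256  # byte-level vocabulary size printed by the config cell

# Cross-entropy (in nats) of guessing uniformly over the vocabulary.
uniform_loss = math.log(vocab_size)
print(f"uniform-guess loss: {uniform_loss:.2f}")  # 5.55

# Perplexity = exp(loss): the effective number of equally likely next bytes.
for loss in (5.66, 1.30, 0.897):  # values copied from the training log
    print(f"loss {loss:>5} -> perplexity {math.exp(loss):.1f}")
```

A step-0 loss slightly above the uniform baseline is expected for a randomly initialized model, whose predictions are not perfectly uniform.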

The generated text should look Shakespearean-ish by the end.

## Next Steps
- Try modifying the model config (more layers, bigger embeddings)
- Train for more iterations
- Experiment with different generation parameters (temperature, top_k)
- Add code to save/load the model weights
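
As a reference for the generation parameters mentioned above, here is a generic sketch of top-k sampling with temperature in plain Python. It is not the implementation of `model.generate` in `bdh.py`; the function name `sample_top_k` and the example logits are hypothetical.

```python
import math
import random


def sample_top_k(logits, top_k=5, temperature=1.0, rng=random):
    """Sample one token id from the top_k highest logits after temperature scaling."""
    # Keep the indices of the top_k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Temperature below 1 sharpens the distribution; above 1 flattens it.
    scaled = [logits[i] / temperature for i in top]
    # Numerically stable softmax weights over the surviving candidates.
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(top, weights=weights, k=1)[0]


logits = [0.1, 2.0, -1.0, 3.5, 0.7, 1.2]
rng = random.Random(0)
samples = [sample_top_k(logits, top_k=2, temperature=0.8, rng=rng) for _ in range(1000)]
print(sorted(set(samples)))  # only the two highest-logit ids can appear
```

A smaller `top_k` or a lower temperature makes the output more repetitive and conservative, while larger values increase variety at the cost of more errors.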