# Baby Dragon Hatchling (BDH) Training

This notebook trains the BDH model, a biologically inspired language model architecture.

## Setup Instructions
1. **Enable GPU**: Go to `Runtime` → `Change runtime type` → Select `T4 GPU`
2. Run all cells in order

Training takes ~10-15 minutes on Colab's free T4 GPU.

In [None]:
# Check GPU availability
!nvidia-smi

In [None]:
# Clone the repository
!git clone https://github.com/Git-Faisal/bdh.git
%cd bdh

In [None]:
# Install dependencies (already installed in Colab, but just to be safe)
!pip install torch numpy requests -q

## View the Model Architecture
Let's take a quick look at the BDH model:

In [None]:
import bdh

# Show model configuration
config = bdh.BDHConfig()
print("BDH Model Configuration:")
print(f"  Layers: {config.n_layer}")
print(f"  Embedding dimension: {config.n_embd}")
print(f"  Attention heads: {config.n_head}")
print(f"  Dropout: {config.dropout}")
print(f"  Vocabulary size: {config.vocab_size} (byte-level)")

# Create model and show parameter count
model = bdh.BDH(config)
total_params = sum(p.numel() for p in model.parameters())
print(f"\nTotal parameters: {total_params:,} (~{total_params/1e6:.1f}M)")
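
The "byte-level" vocabulary in the configuration printout means that text is fed to the model as raw UTF-8 bytes, one token per byte. The following is a minimal, library-free sketch of that round trip, assuming token ids are simply byte values in the range 0 to 255; the variable names are illustrative and not part of the `bdh` module.

```python
prompt_text = "To be or not to be"

# Encode: each UTF-8 byte becomes one token id in the range 0..255.
token_ids = list(prompt_text.encode("utf-8"))

# Decode: turn token ids back into bytes, then into text.
# "backslashreplace" keeps invalid byte sequences visible instead of raising.
decoded = bytes(token_ids).decode("utf-8", errors="backslashreplace")

print(len(token_ids), token_ids[:5])
print(decoded)
```

Under this scheme, plain ASCII text uses one token per character, while non-ASCII characters take two or more tokens each.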

## Start Training!
This will:
1. Download the tiny Shakespeare dataset (~1MB)
2. Train for 3000 iterations (~10-15 minutes)
3. Show loss every 100 steps
4. Generate sample text at the end

In [None]:
# Run training
!python train.py

## Custom Text Generation (Optional)
After training, you can generate your own text with custom prompts:

In [None]:
import torch
import bdh

# Note: `!python train.py` runs in a separate process, and the vanilla
# script does not save the model, so the trained weights are not available
# in this kernel. As written, this cell generates from an untrained model.

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create the model (load trained weights here once train.py saves them)
config = bdh.BDHConfig()
model = bdh.BDH(config).to(device)
model.eval()

# Your custom prompt
prompt_text = "To be or not to be"  # Change this!

prompt = torch.tensor(
    bytearray(prompt_text, "utf-8"), 
    dtype=torch.long, 
    device=device
).unsqueeze(0)

# Generate
with torch.no_grad():
    output = model.generate(prompt, max_new_tokens=200, top_k=5)
    result = bytes(output.to(torch.uint8).to("cpu").squeeze(0)).decode(
        errors="backslashreplace"
    )
    print(result)
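
The `top_k=5` argument in the cell above restricts sampling to the five most likely next bytes. The sketch below shows one common way to combine top-k filtering with a temperature, written in plain Python so it runs without a GPU. It is an illustration under that assumption, not necessarily how `bdh.BDH.generate` is implemented; `sample_top_k` is a hypothetical helper.

```python
import math
import random


def sample_top_k(logits, top_k=5, temperature=1.0, rng=random):
    """Sample one token id from raw scores using temperature and top-k filtering."""
    # Lower temperature sharpens the distribution; higher temperature flattens it.
    scaled = [x / temperature for x in logits]
    # Keep only the ids of the top_k highest-scoring tokens.
    k = min(top_k, len(scaled))
    keep = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:k]
    # Softmax weights over the kept scores (subtract the max for stability).
    m = max(scaled[i] for i in keep)
    weights = [math.exp(scaled[i] - m) for i in keep]
    # Draw one id in proportion to its weight.
    return rng.choices(keep, weights=weights, k=1)[0]


# Toy scores for a 6-token vocabulary (hypothetical values).
logits = [2.0, 0.5, -1.0, 3.0, 0.0, 1.5]
rng = random.Random(0)
samples = [sample_top_k(logits, top_k=2, temperature=0.8, rng=rng) for _ in range(1000)]
print(sorted(set(samples)))  # only the two highest-scoring ids can appear
```

A smaller `top_k` or a lower temperature makes the output more conservative and repetitive; larger values make it more varied but also more error-prone.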

## What to Observe

During training, watch the **loss** value:
- **Initial loss (~4-5)**: Close to random guessing
- **After training (~1.0-1.5)**: The model has learned patterns in the text

The generated text should look Shakespearean-ish by the end.
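
As a reference point for these numbers, the short calculation below gives the loss of a model that guesses uniformly over all byte values, and converts the trained loss range into perplexity. It assumes a vocabulary of 256 byte values and that `train.py` reports mean cross-entropy in nats (the PyTorch default); neither is verified here.

```python
import math

vocab_size = 256  # assumed byte-level vocabulary: one token per byte value

# Cross-entropy (in nats) of a uniform guess over the whole vocabulary.
uniform_loss = math.log(vocab_size)
print(f"uniform-guess loss: {uniform_loss:.2f}")

# Perplexity = exp(loss): the effective number of equally likely next bytes.
for loss in (1.0, 1.5):
    print(f"loss {loss:.1f} -> perplexity {math.exp(loss):.2f}")
```

A freshly initialized model is not guaranteed to start exactly at the uniform value, so the first logged loss can sit somewhat above or below it.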

## Next Steps
- Try modifying the model config (more layers, bigger embeddings)
- Train for more iterations
- Experiment with different generation parameters (temperature, top_k)
- Add code to save/load the model weights
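
For the last item, a minimal sketch of saving and restoring weights with `state_dict` is shown below. It uses a small `nn.Linear` as a stand-in so the example runs on its own; with the real model, the save call would go at the end of `train.py` and the load call in the generation cell. The file name `bdh_weights.pt` is a hypothetical choice, not something the repository defines.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Stand-in for bdh.BDH(config); any nn.Module saves and loads the same way.
model = nn.Linear(4, 2)

# Hypothetical checkpoint path. In train.py, save after the training loop.
ckpt_path = os.path.join(tempfile.gettempdir(), "bdh_weights.pt")
torch.save(model.state_dict(), ckpt_path)

# Later: rebuild the same architecture, then load the saved weights into it.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
restored = nn.Linear(4, 2).to(device)
state = torch.load(ckpt_path, map_location=device)
restored.load_state_dict(state)
restored.eval()  # switch to inference mode before generating
```

Saving the `state_dict` rather than the whole model object keeps the checkpoint independent of the notebook session, provided the model is rebuilt with the same configuration before loading.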