# 🚀 Matryoshka Transcoder Training (Nested Groups) - Private Repo

This notebook trains a Matryoshka Transcoder with a **nested groups** architecture on Google Colab, using your private repository.

## 🔐 Private Repository Access
This notebook uses a **Personal Access Token** for secure authentication with your private GitHub repository.

## 🎯 What You'll Get
- **Nested Groups**: Each group includes all previous groups (matches original paper)
- **Hierarchical Learning**: Features organize from coarse to fine
- **Adaptive Complexity**: Each group is a complete model
- **Interpretability**: Activation samples for feature analysis

## 📊 Training Options
- **Layer 17** (Recommended): ~6 hours, research quality
- **Layer 8** (Quick Test): ~30 minutes, proof of concept
- **Layer 12** (Alternative): ~8 hours, mid-layer analysis

---


## 🔧 Setup & Installation


In [None]:
# Check GPU availability
import torch
print("🖥️  GPU Check:")
print(f"   CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"   GPU: {torch.cuda.get_device_name(0)}")
    print(f"   Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")
else:
    print("   ⚠️  No GPU detected! Go to Runtime → Change runtime type → T4 GPU")

print("\n📦 Installing dependencies...")
%pip install torch transformers transformer-lens wandb datasets einops jaxtyping -q
print("✅ Dependencies installed")


In [None]:
# Clone repository with nested groups implementation
print("📥 Cloning repository...")
print("   This will use your personal access token for authentication")

# Get token from user
import getpass
token = getpass.getpass("Enter your GitHub Personal Access Token: ")

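# Clone using token authentication (IPython substitutes {token} into the shell command)
!git clone https://{token}@github.com/a1842806/matryoshka_transcoder.git
%cd matryoshka_transcoder

# Switch to the nested groups branch
!git checkout interpretability-evaluation
!git pull

# The clone URL, including the token, is saved in .git/config.
# Reset the remote to a token-free URL and drop the variable from memory.
!git remote set-url origin https://github.com/a1842806/matryoshka_transcoder.git
del token
print("✅ Repository cloned and updated")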

# Verify we're on the right branch
!git branch --show-current


## 🔐 Weights & Biases Setup (Optional)

W&B tracks your training metrics. You can skip this if you don't want to use it.
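
If you skip the login, one way to keep W&B from prompting during training is to set the `WANDB_MODE` environment variable before launching a script. This is a minimal sketch: it assumes the training scripts do not override the mode when they call `wandb.init`, which this notebook does not confirm, and `set_wandb_mode` is a hypothetical helper name.

```python
import os

# Modes accepted by W&B through the WANDB_MODE environment variable.
VALID_WANDB_MODES = {"online", "offline", "disabled"}


def set_wandb_mode(mode: str) -> str:
    """Set WANDB_MODE for this process and for child processes it starts."""
    if mode not in VALID_WANDB_MODES:
        raise ValueError(f"Unknown W&B mode: {mode!r}")
    os.environ["WANDB_MODE"] = mode
    return mode


# "offline" keeps run data on local disk; "disabled" turns W&B calls into no-ops.
set_wandb_mode("offline")
```

Scripts started with `!python ...` from the same notebook inherit this environment variable.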


In [None]:
# Login to Weights & Biases (optional)
import wandb
print("🔐 Weights & Biases Login")
print("   You will be asked to paste your W&B API key")
print("   You can skip this by interrupting the cell (Runtime → Interrupt execution)")

try:
    wandb.login()
    print("✅ W&B login successful")
except (Exception, KeyboardInterrupt):
    # Interrupting the cell raises KeyboardInterrupt, which `except Exception` alone would not catch
    print("⚠️  W&B login skipped - metrics won't be tracked online")
    print("   Training will still work, just without W&B dashboard")


## 🎯 Choose Training Configuration

Select one of the training options below:
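
Each of the three options below runs one script. As a sketch, the choice can also be made in a single cell. The script paths are the ones used in this notebook's training cells, while `TRAINING_SCRIPTS` and `training_command` are hypothetical names introduced here for illustration.

```python
import sys

# Script paths as they appear in the training cells of this notebook.
TRAINING_SCRIPTS = {
    17: "src/scripts/train_gemma_layer17_with_warmup_decay_samples.py",
    8: "src/scripts/train_gemma_layer8_with_warmup_decay_samples.py",
    12: "src/scripts/train_gemma_layer12_with_warmup_decay_samples.py",
}


def training_command(layer: int) -> list[str]:
    """Return the command line that launches training for the given layer."""
    if layer not in TRAINING_SCRIPTS:
        raise ValueError(
            f"No training script for layer {layer}; choose from {sorted(TRAINING_SCRIPTS)}"
        )
    return [sys.executable, TRAINING_SCRIPTS[layer]]


# Show the command for the quick test without running it.
print(training_command(8))
```

To start training this way, pass the result to `subprocess.run(training_command(8), check=True)` from the repository root.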


### Option 1: Layer 17 (Recommended) - ~6 hours on T4 GPU

**Best for**: Research quality results, final training
- Dictionary size: 18,432 features (nested groups)
- Training steps: ~15,000
- Features: Activation sample collection
- Optimized hyperparameters


In [None]:
# Train on Layer 17 (RECOMMENDED)
print("🚀 Starting Layer 17 training...")
print("   This will take ~6 hours on T4 GPU")
print("   You can monitor progress on W&B dashboard")
print("\n" + "="*80)

!python src/scripts/train_gemma_layer17_with_warmup_decay_samples.py


### Option 2: Layer 8 (Quick Test) - ~30 minutes on T4 GPU

**Best for**: Testing setup, quick experiments
- Dictionary size: 18,432 features (nested groups)
- Training steps: ~1,000
- Fast iteration


In [None]:
# Train on Layer 8 (QUICK TEST)
print("🚀 Starting Layer 8 training...")
print("   This will take ~30 minutes on T4 GPU")
print("   Good for testing your setup")
print("\n" + "="*80)

!python src/scripts/train_gemma_layer8_with_warmup_decay_samples.py


### Option 3: Layer 12 (Alternative) - ~8 hours on T4 GPU

**Best for**: Mid-layer analysis, comprehensive training
- Dictionary size: 36,864 features (nested groups)
- Training steps: ~20,000
- Full training run


In [None]:
# Train on Layer 12 (ALTERNATIVE)
print("🚀 Starting Layer 12 training...")
print("   This will take ~8 hours on T4 GPU")
print("   Comprehensive training run")
print("\n" + "="*80)

!python src/scripts/train_gemma_layer12_with_warmup_decay_samples.py


## 💾 Save Results to Google Drive

Run this after training completes to save your checkpoints to Google Drive.


In [None]:
# Mount Google Drive and save results
from google.colab import drive
import os
import shutil

print("💾 Saving results to Google Drive...")

# Mount Google Drive
drive.mount('/content/drive')

# Define paths
src_checkpoints = 'checkpoints/transcoder/gemma-2-2b/'
dst_checkpoints = '/content/drive/MyDrive/matryoshka_checkpoints/'

# Check if checkpoints exist
if os.path.exists(src_checkpoints):
    # Create destination directory
    os.makedirs(dst_checkpoints, exist_ok=True)
    
    # Copy checkpoints
    shutil.copytree(src_checkpoints, dst_checkpoints, dirs_exist_ok=True)
    
    print(f"✅ Checkpoints saved to: {dst_checkpoints}")
    
    # List saved files
    print("\n📁 Saved files:")
    for root, dirs, files in os.walk(dst_checkpoints):
        for file in files:
            if file.endswith(('.pt', '.json')):
                rel_path = os.path.relpath(os.path.join(root, file), dst_checkpoints)
                print(f"   - {rel_path}")
    
    # Also save activation samples if they exist
    activation_samples_dirs = [d for d in os.listdir(src_checkpoints) if 'activation_samples' in d]
    if activation_samples_dirs:
        dst_samples = '/content/drive/MyDrive/matryoshka_samples/'
        os.makedirs(dst_samples, exist_ok=True)
        
        for sample_dir in activation_samples_dirs:
            src_sample = os.path.join(src_checkpoints, sample_dir)
            dst_sample = os.path.join(dst_samples, sample_dir)
            shutil.copytree(src_sample, dst_sample, dirs_exist_ok=True)
        
        print(f"✅ Activation samples saved to: {dst_samples}")
    
    print("\n🎉 All results saved to Google Drive!")
    
else:
    print("⚠️  No checkpoints found. Make sure training completed successfully.")
    print(f"   Expected location: {src_checkpoints}")


## 🔑 How to Get a GitHub Personal Access Token

### Step 1: Create a Personal Access Token
1. Go to [GitHub Settings → Developer settings → Personal access tokens](https://github.com/settings/tokens)
2. Click **"Generate new token"** → **"Generate new token (classic)"**
3. Give it a name like "Colab Training"
4. Set expiration (recommend 90 days)
5. Select scopes:
   - ✅ **repo** (Full control of private repositories); this is the scope that cloning a private repository requires
   - **read:org** (Read org and team membership) is optional and not needed for cloning
6. Click **"Generate token"**
7. **Copy the token immediately** (you won't see it again!)

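### Step 2: Use the Token
- When prompted in the notebook, paste your token
- The token is hidden as you type (`getpass` does not echo input)
- The token is needed only for cloning and pulling. Note that `git clone` with a token in the URL records that URL in the repository's `.git/config`, so the token stays on the Colab VM until the remote URL is reset or the runtime is deleted

### Security Notes:
- 🔒 Token is only used for repository access
- 🔒 Never paste the token directly into a cell; use the hidden prompt so it does not end up in the saved notebook
- 🔒 You can revoke the token anytime from GitHub settings
- 🔒 Token expires automatically based on your setting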
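
To check whether a remote URL still carries a token, and to compute a token-free version of it, the standard library's `urllib.parse` is enough. This is a small sketch: `has_credentials` and `strip_credentials` are hypothetical helper names, and the token in the example is a placeholder.

```python
from urllib.parse import urlsplit, urlunsplit


def has_credentials(url: str) -> bool:
    """True if the URL embeds a username (for a token clone, the token itself)."""
    return urlsplit(url).username is not None


def strip_credentials(url: str) -> str:
    """Return `url` without any `token@` or `user:password@` part."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))


# Placeholder token, not a real one.
remote = "https://ghp_exampletoken@github.com/a1842806/matryoshka_transcoder.git"
print(has_credentials(remote))
print(strip_credentials(remote))
```

The clean URL can be applied with `git remote set-url origin <clean-url>`, and `git remote get-url origin` shows what is currently stored.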


## 🎉 Next Steps

### What You've Accomplished:
✅ Trained a Matryoshka Transcoder with **nested groups**  
✅ Learned hierarchical (coarse-to-fine) features  
✅ Collected activation samples for interpretability  
✅ Saved results to Google Drive  
✅ Evaluated model performance  

### Your Files:
- **Checkpoints**: `/content/drive/MyDrive/matryoshka_checkpoints/`
- **Activation Samples**: `/content/drive/MyDrive/matryoshka_samples/`
- **Evaluation Results**: In your model's `interpretability_eval/` folder
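
Before the Colab runtime is recycled, it can be worth confirming that everything reached Drive. The sketch below compares relative paths and file sizes between a source and a destination directory. `relative_files` and `missing_or_mismatched` are hypothetical helper names, and a size comparison is only a cheap sanity check, not a checksum.

```python
import os


def relative_files(root: str) -> dict[str, int]:
    """Map each file under `root` (by relative path) to its size in bytes."""
    sizes = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            sizes[os.path.relpath(full, root)] = os.path.getsize(full)
    return sizes


def missing_or_mismatched(src: str, dst: str) -> list[str]:
    """List files in `src` that are absent from `dst` or differ in size."""
    src_files = relative_files(src)
    dst_files = relative_files(dst)
    return sorted(
        path for path, size in src_files.items() if dst_files.get(path) != size
    )
```

For example, `missing_or_mismatched('checkpoints/transcoder/gemma-2-2b/', '/content/drive/MyDrive/matryoshka_checkpoints/')` returns an empty list when every source file has a same-sized copy on Drive.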

### Further Analysis:
1. **Compare with Google's Transcoder**: Use `src/eval/compare_interpretability.py`
2. **Feature Visualization**: Analyze the activation samples in detail
3. **Hierarchical Analysis**: Study how different groups learn different abstractions
4. **Transfer Learning**: Use your transcoder for downstream tasks

### Documentation:
- **Quick Reference**: `QUICK_START_COLAB.md`
- **Detailed Guide**: `COLAB_TRAINING.md`
- **Architecture**: `NESTED_GROUPS_SUMMARY.md`

---

## 🎓 Understanding Nested Groups

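Your transcoder now uses **nested groups** (matching the original Matryoshka paper). For the 18,432-feature dictionaries (the Layer 17 and Layer 8 options), the groups cover these feature indices: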

| Group | Features | Description |
|-------|----------|-------------|
| 0 | 0-1151 | Base coarse features |
| 1 | 0-3455 | Includes Group 0 + new features |
| 2 | 0-8063 | Includes Groups 0,1 + new features |
| 3 | 0-18431 | All features (complete model) |
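
The ranges in the table follow from a running sum of how many new features each group adds. In the sketch below, the per-group counts are derived by differencing the table's upper bounds rather than read from the repository's config, and `nested_group_ranges` is a hypothetical helper name.

```python
from itertools import accumulate

# New features added by each group, derived from the table above
# (1152, then 3456 - 1152, 8064 - 3456, 18432 - 8064).
NEW_FEATURES_PER_GROUP = [1152, 2304, 4608, 10368]


def nested_group_ranges(new_features: list[int]) -> list[tuple[int, int]]:
    """Return the inclusive (first, last) feature index covered by each nested group."""
    return [(0, end - 1) for end in accumulate(new_features)]


for group, (first, last) in enumerate(nested_group_ranges(NEW_FEATURES_PER_GROUP)):
    print(f"Group {group}: features {first}-{last}")
```

Every range starts at 0 because each group contains all earlier groups. Only the upper bound grows.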

**Benefits:**
- 🎯 **Hierarchical Learning**: Features organize coarse → fine
- 🔄 **Adaptive Complexity**: Each group is a complete model
- 📚 **Paper Alignment**: Matches original Matryoshka design
- 🧠 **Better Interpretability**: Clear feature hierarchy
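
"Each group is a complete model" means that decoding with only the first *k* features already yields a usable reconstruction, and larger groups refine it. The toy example below illustrates that idea in plain Python with made-up numbers. It is a sketch of prefix decoding in general, not this repository's implementation, and `prefix_reconstruction` is a hypothetical name.

```python
def prefix_reconstruction(acts, decoder, bias, n_features):
    """Decode using only the first `n_features` features.

    acts:    feature activations, length F
    decoder: F rows, each a list of length D (one output direction per feature)
    bias:    decoder bias, length D
    """
    out = list(bias)
    for i in range(n_features):
        for d in range(len(out)):
            out[d] += acts[i] * decoder[i][d]
    return out


# Toy sizes: 4 features, 2 output dimensions, nested groups ending at 1, 2 and 4.
acts = [1.0, 0.5, 0.0, 2.0]
decoder = [[1.0, 0.0], [0.0, 2.0], [3.0, 3.0], [0.5, -0.5]]
bias = [0.0, 0.0]
group_ends = [1, 2, 4]

# One reconstruction per nested group, each using a longer feature prefix.
reconstructions = [prefix_reconstruction(acts, decoder, bias, end) for end in group_ends]
for group, recon in enumerate(reconstructions):
    print(f"Group {group}: {recon}")
```

In Matryoshka-style training, a reconstruction loss is typically computed for each such prefix and the losses are combined. How this repository weights them is not shown in this notebook.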

**Happy training! 🚀**
