# CPU Fallback: DistilGPT-2 Finetune

## When to Choose This Path

Use this notebook if:
- You don't have GPU access (e.g., no Colab, Kaggle, or RunPod GPU available)
- You want a simpler, full finetune (no quantization)
- You're okay with a smaller model (DistilGPT-2 is 82M params)

## Expected Speed

- **Training:** roughly 2-4 hours on CPU for 2 epochs; actual time scales with dataset size, sequence length, and CPU core count
- **Inference:** Fast on CPU (small model)
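CPU speeds vary widely, so a more reliable number comes from timing a few steps and extrapolating. The sketch below assumes that one "step" is one optimizer update covering `batch_size × grad_accum` sequences, and that you measure `seconds_per_step` yourself, for example from the first few logged steps. `estimate_training_hours` is a hypothetical helper. Trainer's own step accounting may round the last partial step slightly differently.

```python
import math


def estimate_training_hours(num_sequences: int, batch_size: int, grad_accum: int,
                            epochs: int, seconds_per_step: float) -> float:
    """Extrapolate wall-clock hours from a measured time per optimizer step."""
    effective_batch = batch_size * grad_accum                 # sequences per optimizer update
    steps_per_epoch = math.ceil(num_sequences / effective_batch)
    total_steps = steps_per_epoch * epochs
    return total_steps * seconds_per_step / 3600


# Hypothetical inputs: 4,000 grouped sequences, batch 8, accumulation 4, 2 epochs,
# and a seconds_per_step value you would replace with your own measurement.
print(round(estimate_training_hours(4000, 8, 4, 2, seconds_per_step=30.0), 2))
```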

## How It Differs from QLoRA

- **Full finetune:** Updates all weights (not just adapters)
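- **No quantization:** Weights stay in full precision (FP32 on CPU; FP16 mixed-precision training requires a GPU)
- **Smaller model:** DistilGPT-2 (82M params) vs Mistral-7B (~7B params)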
- **CPU-friendly:** Designed to run without GPU

Trade-off: a smaller model has less capacity, but it is much cheaper to train and deploy.
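A back-of-envelope memory estimate shows why a full finetune is practical at this size and not at 7B. The sketch assumes AdamW in FP32, which needs 4 bytes each for the weight, its gradient, and the two moment buffers, for 16 bytes per parameter. It ignores activations and uses approximate parameter counts (82M and ~7.2B):

```python
BYTES_PER_PARAM_FULL_FT = 4 * 4  # FP32 weight + gradient + AdamW exp_avg + exp_avg_sq


def full_finetune_gb(n_params: int) -> float:
    """Weights, gradients and AdamW state in FP32; activations not included."""
    return n_params * BYTES_PER_PARAM_FULL_FT / 1e9


def four_bit_weights_gb(n_params: int) -> float:
    """4-bit weights only (0.5 byte each); ignores quantization constants and adapters."""
    return n_params * 0.5 / 1e9


DISTILGPT2_PARAMS = 82_000_000     # ~82M
MISTRAL_7B_PARAMS = 7_200_000_000  # ~7.2B, approximate

print(f"DistilGPT-2 full FP32 finetune: ~{full_finetune_gb(DISTILGPT2_PARAMS):.1f} GB")
print(f"Mistral-7B full FP32 finetune:  ~{full_finetune_gb(MISTRAL_7B_PARAMS):.1f} GB")
print(f"Mistral-7B 4-bit weights only:  ~{four_bit_weights_gb(MISTRAL_7B_PARAMS):.1f} GB")
```

The roughly 1.3 GB versus 115 GB gap explains the design split. The QLoRA path quantizes the 7B base and trains only small adapters. DistilGPT-2 fits a plain full finetune in a few GB of ordinary RAM.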


In [None]:
# === TODO (you code this) ===
# Tokenize dataset with DistilGPT2; set reasonable seq_length for CPU.
# Hints:
#   - Load DistilGPT-2 tokenizer
#   - Use smaller seq_length (256-384) for CPU efficiency
#   - Tokenize train/val splits
# Acceptance:
#   - tokenized DatasetDict prepared

from transformers import AutoTokenizer
from datasets import load_dataset

def tokenize_for_distilgpt2(dset, seq_length: int):
    """
    Tokenize dataset for DistilGPT-2 training.
    
    Args:
        dset: DatasetDict with train/validation
        seq_length: Maximum sequence length (256-384 for CPU)
        
    Returns:
        DatasetDict: Tokenized datasets
    """
    raise NotImplementedError

# Load dataset and tokenize
# dset = load_dataset("YOURUSER/frankenstein-fanfic-snippets")
# dset_tokenized = tokenize_for_distilgpt2(dset, seq_length=256)
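
One common way to enforce `seq_length` for causal LM training is the "group texts" pattern from Hugging Face's `run_clm` example. It concatenates all tokenized examples and then cuts them into fixed-size blocks, so no padding is needed. The function below is a pure-Python sketch of that idea. It assumes the batch is the dict-of-lists that `Dataset.map(..., batched=True)` passes in, and it is not the notebook's reference solution:

```python
from itertools import chain


def group_texts(batch: dict, seq_length: int = 256) -> dict:
    """Concatenate tokenized examples and split them into seq_length blocks.

    `batch` looks like {"input_ids": [[...], ...], "attention_mask": [[...], ...]},
    the shape datasets' batched map provides.
    """
    concatenated = {k: list(chain.from_iterable(v)) for k, v in batch.items()}
    total = len(concatenated["input_ids"])
    total = (total // seq_length) * seq_length  # drop the tail that can't fill a block
    result = {
        k: [v[i : i + seq_length] for i in range(0, total, seq_length)]
        for k, v in concatenated.items()
    }
    # Causal LM labels are the inputs themselves; GPT-2 shifts them internally.
    result["labels"] = [list(block) for block in result["input_ids"]]
    return result
```

One plausible way to wire this in, assuming the raw text column is named `text`, starts with `tok = AutoTokenizer.from_pretrained("distilgpt2")`. Then run `dset.map(lambda b: tok(b["text"]), batched=True, remove_columns=dset["train"].column_names)`, followed by `.map(group_texts, batched=True, fn_kwargs={"seq_length": seq_length})`. A simpler alternative is per-example truncation with `tok(b["text"], truncation=True, max_length=seq_length)`, but that requires padding.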


## Train on CPU

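Configure `TrainingArguments` for CPU. Use a small per-device batch size together with gradient accumulation: the optimizer then steps on an effective batch of `per_device_train_batch_size × gradient_accumulation_steps` examples, while only one small batch's activations are held in memory at a time.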
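Gradient accumulation is equivalent to a large batch. When each micro-batch gradient is scaled by 1 / (number of accumulation steps) and summed, the result equals the full-batch gradient of a mean loss. The toy sketch below uses a one-weight linear model with MSE loss instead of a transformer. It mirrors the common practice of dividing each micro-batch loss by the accumulation count, and it assumes equal-size micro-batches:

```python
def grad_mse(w: float, xs: list[float], ys: list[float]) -> float:
    """d/dw of mean((w*x - y)^2) over one batch."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(w: float, xs: list[float], ys: list[float], micro_batch_size: int) -> float:
    """Sum micro-batch gradients, each scaled by 1 / accumulation steps."""
    if len(xs) % micro_batch_size != 0:
        raise ValueError("equivalence only holds for equal-size micro-batches")
    accum_steps = len(xs) // micro_batch_size
    total = 0.0
    for i in range(0, len(xs), micro_batch_size):
        # One "forward/backward" on a small batch; only this slice is in memory.
        total += grad_mse(w, xs[i : i + micro_batch_size], ys[i : i + micro_batch_size]) / accum_steps
    return total


xs = [float(x) for x in range(1, 9)]
ys = [2 * x + 1 for x in xs]
print(grad_mse(0.5, xs, ys), accumulated_grad(0.5, xs, ys, micro_batch_size=4))
```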


In [None]:
# === TODO (you code this) ===
# Configure TrainingArguments; train for 1-2 epochs on CPU.
# Hints:
#   - Use TrainingArguments(use_cpu=True) (older transformers versions: no_cuda=True)
#   - Small per_device_train_batch_size (4-8) with gradient_accumulation_steps
#   - Save model to outputs/ directory
# Acceptance:
#   - training runs to completion; model saved to outputs/

from transformers import AutoModelForCausalLM, TrainingArguments, Trainer

def train_distilgpt2_cpu(ds_train, ds_val, cfg: dict, out_dir: str):
    """
    Train DistilGPT-2 on CPU.
    
    Args:
        ds_train: Training dataset
        ds_val: Validation dataset
        cfg: Config dict; cfg['distilgpt2'] holds lr, batch_size, epochs
        out_dir: Output directory for model
    """
    raise NotImplementedError

# Train
cfg = {
    'distilgpt2': {
        'lr': 5.0e-5,
        'batch_size': 8,
        'epochs': 2
    }
}
# train_distilgpt2_cpu(dset_tokenized['train'], dset_tokenized['validation'], cfg, "outputs/distilgpt2-frankenstein")
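
The sketch below covers the configuration half of `train_distilgpt2_cpu`. It maps the `cfg` dict onto real `TrainingArguments` keyword names and assumes transformers 4.34 or later, where `use_cpu` replaced `no_cuda`. The helper name `build_training_kwargs` and the `grad_accum` key (default 4) are hypothetical, since the `cfg` above has no such key. The helper returns a plain dict, so it stays importable without transformers:

```python
def build_training_kwargs(cfg: dict, out_dir: str) -> dict:
    """Translate cfg['distilgpt2'] into TrainingArguments keyword arguments (hypothetical helper)."""
    c = cfg["distilgpt2"]
    return {
        "output_dir": out_dir,
        "learning_rate": c["lr"],
        "per_device_train_batch_size": c["batch_size"],
        "per_device_eval_batch_size": c["batch_size"],
        # Hypothetical key with an arbitrary default; the notebook's cfg does not define it.
        "gradient_accumulation_steps": c.get("grad_accum", 4),
        "num_train_epochs": c["epochs"],
        "use_cpu": True,           # transformers >= 4.34; older versions: no_cuda=True
        "save_strategy": "epoch",  # checkpoint once per epoch
        "report_to": "none",       # no external experiment trackers
    }


cfg = {"distilgpt2": {"lr": 5.0e-5, "batch_size": 8, "epochs": 2}}
print(build_training_kwargs(cfg, "outputs/distilgpt2-frankenstein"))
```

Inside the stub, the next steps would be `args = TrainingArguments(**build_training_kwargs(cfg, out_dir))`, then `trainer = Trainer(model=AutoModelForCausalLM.from_pretrained("distilgpt2"), args=args, train_dataset=ds_train, eval_dataset=ds_val)`, then `trainer.train()` and `trainer.save_model(out_dir)`. If the datasets have no `labels` column, pass `data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False)`. GPT-2's tokenizer has no pad token, so set `tokenizer.pad_token = tokenizer.eos_token` before padding.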
