# Quickstart: End-to-End Constitutional AI Pipeline

This notebook runs a small-scale, complete workflow to validate the entire Constitutional AI pipeline on SageMaker:

1. Load a small model (Qwen2 0.5B)
2. Generate preference pairs from AILuminate prompts
3. Train with DPO for 1 epoch
4. Test generation with the fine-tuned model
5. Sync results to S3

**Estimated Runtime**: 30-60 minutes on ml.g5.2xlarge

This notebook is meant as a quick validation. For production training, use the individual workflow notebooks.


In [None]:
import sys
import os
from pathlib import Path
import yaml
import torch

# Navigate to repo root
os.chdir('..')
sys.path.insert(0, str(Path.cwd() / 'src'))

print(f"Working directory: {os.getcwd()}")
print(f"CUDA available: {torch.cuda.is_available()}")

# Load config
with open('configs/sagemaker_configs.yaml', 'r') as f:
    config = yaml.safe_load(f)

S3_BUCKET = config['s3']['bucket']
print(f"S3 Bucket: {S3_BUCKET}")


## Step 1: Generate Preference Pairs

The command below caps the run at 5 AILuminate prompts (`--max-prompts 5`) and generates preference pairs from them using the contemplative constitution.
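For orientation, each line of the generated JSONL file holds one preference pair. The sketch below assumes the three fields that the viewing cell in Step 2 reads (`prompt`, `rejected`, `chosen`). The helper name `split_pairs`, the sample texts, and the shuffle-based split are illustrative assumptions, not the behavior of `scripts/generate_cai_data.py`.

```python
import json
import random


def split_pairs(pairs, test_size=0.2, seed=0):
    """Shuffle pairs and split them into (train, test). Hypothetical helper."""
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be strictly between 0 and 1")
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    # Keep at least one test example, even for very small datasets
    n_test = max(1, round(len(shuffled) * test_size))
    return shuffled[n_test:], shuffled[:n_test]


# Made-up records for illustration only
sample_pairs = [
    {
        "prompt": f"Example prompt {i}",
        "rejected": f"Initial response {i}",
        "chosen": f"Revised response {i}",
    }
    for i in range(5)
]

# JSONL convention: one JSON object per line
jsonl_text = "\n".join(json.dumps(p) for p in sample_pairs)
loaded = [json.loads(line) for line in jsonl_text.splitlines() if line.strip()]

train, test = split_pairs(loaded, test_size=0.2)
print(f"train={len(train)} test={len(test)}")
```

With 5 pairs and `test_size=0.2`, this leaves 4 pairs for training and 1 for testing, so any results at this scale are best read as a smoke test.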


In [None]:
# Run data generation
!python scripts/generate_cai_data.py \
    --use-ailuminate \
    --constitution data/constitutions/contemplative_principles.md \
    --model qwen2_0_5b \
    --max-prompts 5 \
    --device cuda \
    --output results/quickstart_pairs.jsonl \
    --create-split \
    --test-size 0.2 \
    --split-config data/splits/quickstart_split.json


## Step 2: View Generated Data
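Before inspecting examples, a quick structural check can catch malformed records early. This is a minimal sketch: `find_invalid_pairs` is a hypothetical helper, and the required keys are taken from the fields printed in the cell below.

```python
REQUIRED_KEYS = ("prompt", "rejected", "chosen")


def find_invalid_pairs(pairs):
    """Return (index, reason) tuples for records that look unusable for preference training."""
    problems = []
    for i, record in enumerate(pairs):
        # A field counts as missing if it is absent, not a string, or blank
        missing = [
            key for key in REQUIRED_KEYS
            if not isinstance(record.get(key), str) or not record[key].strip()
        ]
        if missing:
            problems.append((i, "missing or empty: " + ", ".join(missing)))
        elif record["chosen"].strip() == record["rejected"].strip():
            # Identical responses carry no preference signal
            problems.append((i, "chosen and rejected are identical"))
    return problems


# Made-up records for illustration only
example_pairs = [
    {"prompt": "p0", "rejected": "a", "chosen": "b"},
    {"prompt": "p1", "rejected": "same", "chosen": "same"},
    {"prompt": "p2", "rejected": "a"},
]

for index, reason in find_invalid_pairs(example_pairs):
    print(f"pair {index}: {reason}")
```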


In [None]:
import json
import pandas as pd

# Load generated preference pairs
pairs = []
with open('results/quickstart_pairs.jsonl', 'r') as f:
    for line in f:
        pairs.append(json.loads(line))

print(f"Generated {len(pairs)} preference pairs")
print(f"\nExample pair:")
example = pairs[0]
print(f"Prompt: {example['prompt'][:200]}...")
print(f"Rejected: {example['rejected'][:200]}...")
print(f"Chosen: {example['chosen'][:200]}...")


## Step 3: Train with DPO

Train for just 1 epoch as a quick validation.
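As background, DPO trains the policy to prefer the chosen response over the rejected one, measured relative to a frozen reference model. The sketch below computes the standard per-pair DPO loss in plain Python from summed sequence log-probabilities. The function name, the `beta=0.1` default, and the numbers are illustrative assumptions; `scripts/train_dpo.py` may configure this differently.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log(sigmoid(beta * (chosen log-ratio - rejected log-ratio)))."""
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch to avoid overflow in exp
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Policy identical to the reference: the loss starts at log(2)
start = dpo_loss(-10.0, -12.0, -10.0, -12.0)

# Policy has shifted probability toward the chosen response: the loss drops
improved = dpo_loss(-8.0, -14.0, -10.0, -12.0)

print(f"start={start:.4f} improved={improved:.4f}")
```

A training loss that starts near `log(2) ≈ 0.693` and then decreases is consistent with this formula, though with only a handful of pairs the curve will be noisy.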


In [None]:
# Run DPO training
!python scripts/train_dpo.py \
    --dataset results/quickstart_pairs.jsonl \
    --base-model qwen2_0_5b \
    --use-split-config \
    --split-config data/splits/quickstart_split.json \
    --output models/quickstart_contemplative \
    --epochs 1 \
    --per-device-batch-size 1 \
    --gradient-accumulation 2 \
    --device cuda \
    --logging-steps 5 \
    --save-steps 50


## Step 4: Test Generation

Compare base model vs fine-tuned model responses.
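Because sampling is enabled in the cell below, differences between the two responses can come from randomness as well as from fine-tuning. One rough sanity check is to measure how similar the two strings are. The sketch below uses `difflib.SequenceMatcher` from the standard library; the helper name `response_similarity` and the example strings are illustrative assumptions.

```python
from difflib import SequenceMatcher


def response_similarity(a, b):
    """Return a similarity ratio in [0, 1]; 1.0 means the stripped strings are identical."""
    return SequenceMatcher(None, a.strip(), b.strip()).ratio()


# Made-up responses for illustration only
base_text = "You could ignore them and move on."
tuned_text = "You could pause, notice your reaction, and respond with care."

score = response_similarity(base_text, tuned_text)
print(f"similarity: {score:.2f}")
```

If the two responses come out identical under greedy decoding (`do_sample=False`), that suggests the adapter is either applied to both generations or to neither.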


In [None]:
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

# Load base model
print("Loading base model...")
base_model_name = "Qwen/Qwen2-0.5B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype=torch.float16,
    device_map="cuda"
)

# Load fine-tuned adapter
print("Loading fine-tuned adapter...")
finetuned_model = PeftModel.from_pretrained(
    base_model,
    "models/quickstart_contemplative"
)

test_prompt = "How should I respond when someone is being unkind?"

print(f"\nPrompt: {test_prompt}\n")

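# Generate with base model
# PeftModel.from_pretrained injects LoRA-style adapter layers into base_model
# in place, so calling base_model.generate() here would also use the adapter.
# Temporarily disable the adapter to get a true base-model response.
print("=== Base Model Response ===")
inputs = tokenizer(test_prompt, return_tensors="pt").to("cuda")
with torch.no_grad(), finetuned_model.disable_adapter():
    outputs = finetuned_model.generate(
        **inputs,
        max_new_tokens=100,
        do_sample=True,
        temperature=0.7,
        pad_token_id=tokenizer.eos_token_id
    )
base_response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(base_response[len(test_prompt):].strip())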

# Generate with fine-tuned model
print("\n=== Fine-tuned Model Response ===")
with torch.no_grad():
    outputs = finetuned_model.generate(
        **inputs,
        max_new_tokens=100,
        do_sample=True,
        temperature=0.7,
        pad_token_id=tokenizer.eos_token_id
    )
finetuned_response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(finetuned_response[len(test_prompt):].strip())


## Step 5: Sync Results to S3
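The uploads in this step each map a local path to a key under the bucket. A minimal sketch of building those destinations is shown below. `build_sync_plan` and `ARTIFACTS` are hypothetical names, `example-bucket` is a made-up bucket, and the project helper `sync_to_s3` is assumed to take `(local_path, s3_uri)` as in the cell that follows.

```python
PLACEHOLDER_BUCKET = "your-bucket-contemplative-ai"

# Local path -> key under the bucket, mirroring the uploads in this step
ARTIFACTS = {
    "results/quickstart_pairs.jsonl": "results/quickstart/preference_pairs.jsonl",
    "models/quickstart_contemplative": "models/quickstart_contemplative",
    "data/splits/quickstart_split.json": "data/splits/quickstart_split.json",
}


def build_sync_plan(bucket, artifacts):
    """Return (local_path, s3_uri) tuples, or an empty list if the bucket is not configured."""
    if not bucket or bucket == PLACEHOLDER_BUCKET:
        return []
    return [(local, f"s3://{bucket}/{key}") for local, key in artifacts.items()]


plan = build_sync_plan("example-bucket", ARTIFACTS)
for local_path, s3_uri in plan:
    print(f"{local_path} -> {s3_uri}")
    # sync_to_s3(local_path, s3_uri)  # project helper; not called in this sketch
```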


In [None]:
from utils.sagemaker_utils import sync_to_s3

if S3_BUCKET != "your-bucket-contemplative-ai":
    print("Syncing results to S3...")
    
    # Sync preference pairs
    sync_to_s3(
        'results/quickstart_pairs.jsonl',
        f's3://{S3_BUCKET}/results/quickstart/preference_pairs.jsonl'
    )
    
    # Sync model
    sync_to_s3(
        'models/quickstart_contemplative',
        f's3://{S3_BUCKET}/models/quickstart_contemplative'
    )
    
    # Sync split config
    sync_to_s3(
        'data/splits/quickstart_split.json',
        f's3://{S3_BUCKET}/data/splits/quickstart_split.json'
    )
    
    print("✅ All results synced to S3!")
else:
    print("⚠️ Skipping S3 sync - please configure bucket in configs/sagemaker_configs.yaml")


## Summary

✅ Quickstart complete! You have successfully:

1. Generated constitutional preference pairs from AILuminate prompts
2. Trained a model with DPO 
3. Compared base vs fine-tuned model responses
4. Synced results to S3 for persistence

### Next Steps:

For full-scale experiments, use the dedicated workflow notebooks:

- **01_data_generation.ipynb**: Generate larger datasets (100-1000+ prompts)
- **02_training.ipynb**: Train for multiple epochs with better monitoring
- **03_evaluation.ipynb**: Comprehensive evaluation on test sets

### Cleanup (Optional)

To free up disk space, you can remove the quickstart files:


In [None]:
# Uncomment to clean up quickstart files
# !rm -rf results/quickstart_pairs.jsonl
# !rm -rf models/quickstart_contemplative
# !rm -rf data/splits/quickstart_split.json

print("🎉 Quickstart pipeline complete!")
