# Constitutional AI Evaluation - Google Colab

**Purpose**: Run a comprehensive evaluation of all three models (Base, Stage 2, Stage 3) on the Constitutional AI test set.

## Cell 1: Check GPU Availability

In [None]:
import torch

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA version: {torch.version.cuda}")
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"GPU memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")
else:
    print("No GPU detected. Change the runtime type to GPU (Runtime > Change runtime type).")

## Cell 2: Clone Repository

**Note**: Update the GitHub URL below to point to your own repository once you have pushed it.

In [None]:
import os

# Change this to your GitHub repository URL
REPO_URL = "https://github.com/Jai-Dhiman/ml-learning.git"

# Clone repository
if not os.path.exists('ml-learning'):
    !git clone {REPO_URL}
else:
    print("✓ Repository already cloned")

# Change to project directory
%cd ml-learning/constitutional-ai-stage4

# Verify structure
!ls -la

## Cell 3: Install Dependencies

In [None]:
# Install required packages
!pip install -q torch transformers peft datasets accelerate sentencepiece protobuf

print("✓ Dependencies installed")
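
Because `pip install -q` suppresses most output, it can help to confirm which versions actually ended up in the runtime. The following is a minimal sketch using the standard library's `importlib.metadata`; the helper name `installed_versions` is hypothetical, and the package list is simply copied from the install line above.

```python
from importlib import metadata

# Package names copied from the pip install line above.
PACKAGES = ["torch", "transformers", "peft", "datasets",
            "accelerate", "sentencepiece", "protobuf"]

def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

for name, version in installed_versions(PACKAGES).items():
    print(f"{name:15s} {version or 'NOT INSTALLED'}")
```

Any package reported as `NOT INSTALLED` suggests the install cell should be re-run before continuing.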

## Cell 4: Optional - Mount Google Drive

Mount Google Drive to automatically save results. Skip this cell if you want to download results manually.

In [None]:
from google.colab import drive

# Mount Google Drive
drive.mount('/content/drive')

# Create results directory in Google Drive
DRIVE_RESULTS_DIR = '/content/drive/MyDrive/constitutional_ai_evaluation_results'
!mkdir -p {DRIVE_RESULTS_DIR}

print(f"✓ Google Drive mounted. Results will be saved to: {DRIVE_RESULTS_DIR}")

## Cell 5: Validate Setup

Check that all artifacts and models are accessible.

In [None]:
# Run validation script
!python3 src/inference/validate_setup.py

## Cell 6: Run Quick Test (Optional)

Test the evaluation pipeline with 5 prompts to make sure everything works.

In [None]:
# Quick test with 5 prompts
!python3 src/evaluation/evaluation_runner.py \
  --models stage3_constitutional \
  --max-prompts 5

print("\n✓ Quick test complete! If this worked, proceed to full evaluation.")

## Cell 7: Run Full Evaluation (⏰ 4-6 hours)

**This will take roughly 4-6 hours on a T4 GPU, or 2-3 hours on an L4 GPU.**

The evaluation will:
1. Load all 3 models (Base, Stage 2, Stage 3)
2. Generate responses for 50 test prompts
3. Evaluate each response on 4 constitutional principles
4. Save results to JSON and CSV
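
The steps above imply a fixed amount of work, which gives a rough sense of how long each response should take. The sketch below works through that arithmetic. It assumes that each of the 3 models answers all 50 prompts, and that the quoted runtime is spread evenly across responses; neither assumption is confirmed by the evaluation script itself.

```python
NUM_MODELS = 3        # Base, Stage 2, Stage 3
NUM_PROMPTS = 50      # matches --max-prompts 50
NUM_PRINCIPLES = 4    # constitutional principles scored per response

def workload(models, prompts, principles):
    """Return (responses generated, principle scores computed)."""
    responses = models * prompts
    return responses, responses * principles

def seconds_per_response(total_hours, responses):
    """Average wall-clock budget per response, assuming an even spread."""
    return total_hours * 3600 / responses

responses, scores = workload(NUM_MODELS, NUM_PROMPTS, NUM_PRINCIPLES)
print(f"{responses} responses, {scores} principle scores")
for hours in (4, 6):  # T4 estimate quoted above
    print(f"{hours} h total -> ~{seconds_per_response(hours, responses):.0f} s per response")
```

If the 5-prompt quick test takes much longer per response than this budget, the full run will likely exceed the estimate.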

In [None]:
import time
from datetime import datetime

# Record start time
start_time = time.time()
print(f"Starting evaluation at: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
print("\nThis will take approximately 4-6 hours on T4 GPU...\n")
print("=" * 70)

# Run full evaluation
!python3 src/evaluation/evaluation_runner.py \
  --models base stage2_helpful stage3_constitutional \
  --max-prompts 50 \
  --output-dir artifacts/evaluation/final_results \
  --save-csv --save-json

# Record end time
end_time = time.time()
duration = (end_time - start_time) / 3600  # Convert to hours

print("=" * 70)
print("\n✓ Evaluation complete!")
print(f"Duration: {duration:.2f} hours")
print(f"Finished at: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")
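
One caveat with the cell above: a `!` shell command does not raise a Python exception when the script exits with an error, so the "Evaluation complete!" message prints either way. A minimal sketch of a stricter alternative using `subprocess` follows. The helper `run_checked` is hypothetical, and the commented usage simply mirrors the flags from the cell above.

```python
import subprocess
import sys
import time

def run_checked(cmd):
    """Run a command and raise if it exits non-zero.

    Returns the elapsed wall-clock time in hours.
    """
    start = time.time()
    completed = subprocess.run(cmd)
    if completed.returncode != 0:
        raise RuntimeError(f"Command failed with exit code {completed.returncode}")
    return (time.time() - start) / 3600

# Hypothetical usage, mirroring the flags used in the cell above:
# hours = run_checked([
#     sys.executable, "src/evaluation/evaluation_runner.py",
#     "--models", "base", "stage2_helpful", "stage3_constitutional",
#     "--max-prompts", "50",
#     "--output-dir", "artifacts/evaluation/final_results",
#     "--save-csv", "--save-json",
# ])
# print(f"Evaluation complete in {hours:.2f} hours")
```

With this approach a failed run stops the cell with an error instead of reporting success.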

## Cell 8: Display Results Summary

In [None]:
import json
import pandas as pd

# Load JSON results
with open('artifacts/evaluation/final_results/results.json', 'r') as f:
    results = json.load(f)

print("=" * 70)
print("EVALUATION RESULTS SUMMARY")
print("=" * 70)

# Display aggregate scores
if 'comparison_summary' in results:
    print("\nAggregate Scores by Model:")
    for model, scores in results['comparison_summary'].get('aggregate_scores', {}).items():
        print(f"  {model}: {scores:.4f}")

# Load and display CSV
print("\n" + "=" * 70)
print("Detailed Comparison Table:")
print("=" * 70)
df = pd.read_csv('artifacts/evaluation/final_results/comparison.csv')
print(df.to_string())

print("\n" + "=" * 70)
print("Results saved to:")
print("  - artifacts/evaluation/final_results/results.json")
print("  - artifacts/evaluation/final_results/comparison.csv")
print("=" * 70)
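
The aggregate-score loop above formats each value with `:.4f`, which only works if every model maps to a single number. The layout of `results.json` is not shown in this notebook, so the sketch below handles two assumed shapes: a flat `{model: score}` mapping and a nested `{model: {principle: score}}` mapping. The helper `format_scores` is hypothetical, and the example values are illustrative only.

```python
def format_scores(aggregate_scores, indent="  "):
    """Return printable lines for flat or one-level-nested score mappings."""
    lines = []
    for model, scores in aggregate_scores.items():
        if isinstance(scores, dict):
            # Assumed nested shape: {model: {principle: score}}
            lines.append(f"{indent}{model}:")
            for name, value in scores.items():
                lines.append(f"{indent * 2}{name}: {float(value):.4f}")
        else:
            # Assumed flat shape: {model: score}
            lines.append(f"{indent}{model}: {float(scores):.4f}")
    return lines

# Illustrative values only, not real evaluation output.
example = {"base": 0.5, "stage3_constitutional": {"truthfulness": 0.75}}
print("\n".join(format_scores(example)))
```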

## Cell 9: Copy Results to Google Drive (if mounted)

In [None]:
import os

# Check if Google Drive is mounted
if os.path.exists('/content/drive/MyDrive'):
    DRIVE_RESULTS_DIR = '/content/drive/MyDrive/constitutional_ai_evaluation_results'
    
    # Copy results
    !cp -r artifacts/evaluation/final_results/* {DRIVE_RESULTS_DIR}/
    
    print(f"✓ Results copied to Google Drive: {DRIVE_RESULTS_DIR}")
    print("  You can access these files from your Google Drive at any time!")
else:
    print("Google Drive not mounted. Use Cell 10 to download results manually.")
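
Note that `cp -r` into the same Drive folder overwrites files from earlier runs that share a name. If keeping every run matters, one option is to copy into a timestamped subfolder instead. This is a minimal sketch using `shutil.copytree`; the helper `copy_results` and the timestamp format are assumptions, not part of the project.

```python
import shutil
from datetime import datetime
from pathlib import Path

def copy_results(src_dir, dest_root, stamp=None):
    """Copy src_dir into a timestamped subfolder of dest_root and return its path."""
    stamp = stamp or datetime.now().strftime("%Y%m%d_%H%M%S")
    dest = Path(dest_root) / stamp
    # copytree creates missing parent folders; dirs_exist_ok needs Python 3.8+.
    shutil.copytree(src_dir, dest, dirs_exist_ok=True)
    return dest

# Hypothetical usage with the paths from the cell above:
# dest = copy_results("artifacts/evaluation/final_results", DRIVE_RESULTS_DIR)
# print(f"Results copied to: {dest}")
```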

## Cell 10: Download Results (Manual Download)

In [None]:
from google.colab import files
import os

# Create a zip file of all results
!zip -r evaluation_results.zip artifacts/evaluation/final_results/

print("Downloading results...")
files.download('evaluation_results.zip')

print("\n✓ Results downloaded!")
print("\nExtract the zip file locally to access:")
print("  - results.json (complete evaluation data)")
print("  - comparison.csv (model comparison table)")
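
If the `zip` command is unavailable in the runtime, the archive can also be built from Python. The sketch below uses `shutil.make_archive`; the helper `zip_results` is hypothetical. Unlike `zip -r` above, this places the files at the top level of the archive rather than under `artifacts/evaluation/final_results/`.

```python
import shutil

def zip_results(results_dir, archive_name="evaluation_results"):
    """Zip the contents of results_dir and return the path of the created .zip file."""
    return shutil.make_archive(archive_name, "zip", root_dir=results_dir)

# Hypothetical usage with the path from the cell above:
# archive = zip_results("artifacts/evaluation/final_results")
# files.download(archive)
```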

## Cell 11: Generate Quick Visualizations (Optional)

In [None]:
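import matplotlib.pyplot as plt
import numpy as np
import pandas as pd              # imported here so this cell runs without Cell 8
from google.colab import files   # imported here so this cell runs without Cell 10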

# Load results
df = pd.read_csv('artifacts/evaluation/final_results/comparison.csv')

# Extract model names and scores (assumes specific column structure)
# Adjust column names based on actual CSV structure
models = df['model'].tolist() if 'model' in df.columns else ['Base', 'Stage 2', 'Stage 3']
principles = ['harm_prevention', 'truthfulness', 'helpfulness', 'fairness']

# Create radar chart
fig, ax = plt.subplots(figsize=(10, 8), subplot_kw=dict(projection='polar'))

angles = np.linspace(0, 2 * np.pi, len(principles), endpoint=False).tolist()
angles += angles[:1]

for model in models:
    # Placeholder: every model is drawn with the same hard-coded scores.
    # Replace this list with the model's actual scores read from df.
    scores = [0.5, 0.6, 0.7, 0.65]
    scores += scores[:1]
    ax.plot(angles, scores, 'o-', linewidth=2, label=model)
    ax.fill(angles, scores, alpha=0.25)

ax.set_theta_offset(np.pi / 2)
ax.set_theta_direction(-1)
ax.set_xticks(angles[:-1])
ax.set_xticklabels([p.replace('_', ' ').title() for p in principles])
ax.set_ylim(0, 1)
ax.set_ylabel('Score', labelpad=30)
ax.set_title('Constitutional Principle Scores by Model', size=16, pad=20)
ax.legend(loc='upper right', bbox_to_anchor=(1.3, 1.1))
ax.grid(True)

plt.tight_layout()
plt.savefig('principle_comparison_radar.png', dpi=300, bbox_inches='tight')
print("\n✓ Radar chart saved to: principle_comparison_radar.png")
plt.show()

# Download visualization
files.download('principle_comparison_radar.png')
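
The radar chart above draws placeholder scores rather than values from the CSV. As a rough sketch of how real values might be extracted, the helper below assumes `comparison.csv` is in wide format, with a `model` column and one column per principle. That layout is an assumption and should be checked against the actual file. The helper `scores_by_model` is hypothetical, and the sample rows are illustrative only.

```python
import csv
import io

PRINCIPLES = ["harm_prevention", "truthfulness", "helpfulness", "fairness"]

def scores_by_model(csv_text, principles=PRINCIPLES):
    """Map each model to a closed radar polygon: its scores plus the first one repeated."""
    polygons = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        scores = [float(row[p]) for p in principles]
        polygons[row["model"]] = scores + scores[:1]
    return polygons

# Illustrative rows only, not real evaluation output.
sample = (
    "model,harm_prevention,truthfulness,helpfulness,fairness\n"
    "base,0.5,0.25,0.75,0.5\n"
)
print(scores_by_model(sample))
```

Each returned list already has its first value appended at the end, so it can be passed to `ax.plot(angles, ...)` without the extra `scores += scores[:1]` step.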

## Summary

**Evaluation Complete! 🎉**

You now have:
- Complete evaluation results (JSON)
- Model comparison table (CSV)
- Optional visualizations

**Next Steps**:
1. Review results in `results.json` and `comparison.csv`
2. Create statistical analysis scripts (significance testing, effect sizes)
3. Generate publication-quality figures
4. Write the paper!

See `RESEARCH_PUBLICATION_PLAN.md` for detailed next steps.