# SageMaker Environment Setup

This notebook sets up the environment for running Contemplative Constitutional AI on SageMaker.

**Run this notebook once after creating your SageMaker instance or after restarting it.**

## What this notebook does:
1. Installs all required dependencies
2. Initializes git submodules (AILuminate dataset)
3. Verifies GPU/CUDA availability
4. Checks that the model loader works
5. Tests S3 connectivity
6. Tests the SageMaker utilities and creates necessary local directories
7. Prints system information


## 1. Install Dependencies

This will install all packages from `requirements.txt`.


In [None]:
# Upgrade pip first (%pip installs into the environment of the running kernel)
%pip install --upgrade pip


In [None]:
# Install all requirements (%pip installs into the environment of the running kernel)
%pip install -r ../requirements.txt
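
To confirm that the install took effect in the current kernel, a small check like the one below can help. It is a minimal sketch, not part of the project: `requirement_names` and `missing_packages` are hypothetical helpers, and the parser only handles simple `name==version` style lines (it skips `-r`/`-e` option lines and does not understand URL or VCS requirements).

```python
import re
from importlib import metadata


def requirement_names(text: str) -> list[str]:
    """Extract distribution names from simple requirements.txt content."""
    names = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names: list[str]) -> list[str]:
    """Return the names that have no installed distribution metadata."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

From the notebook directory, this could be used as `missing_packages(requirement_names(open('../requirements.txt').read()))`; an empty list suggests every listed package is visible to the kernel.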


## 2. Initialize Git Submodules

This downloads the AILuminate benchmark dataset.


In [None]:
import os

# Move to the repo root; the guard keeps this cell safe to re-run
if not os.path.exists('requirements.txt'):
    os.chdir('..')
print(f"Current directory: {os.getcwd()}")
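
If the notebook might be launched from a nested directory, a more general approach is to walk upward until a marker file is found. The sketch below assumes the repo root is the nearest ancestor containing `requirements.txt` or `.git`; `find_repo_root` is a hypothetical helper, not part of the project.

```python
from pathlib import Path


def find_repo_root(start: Path, markers=("requirements.txt", ".git")) -> Path:
    """Return the nearest directory at or above `start` that contains a marker."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if any((candidate / marker).exists() for marker in markers):
            return candidate
    raise FileNotFoundError(f"No repo root found at or above {start}")
```

Calling `os.chdir(find_repo_root(Path.cwd()))` then lands in the same place no matter how many times the cell is run. Note that it would stop early if a nested directory carried its own `requirements.txt`.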


In [None]:
# Initialize submodules
!git submodule update --init --recursive
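
To check that the submodules actually initialized, the output of `git submodule status` can be inspected: each line starts with a one-character flag (a space for initialized, `-` for not initialized, `+` when the checked-out commit differs from the index, `U` for merge conflicts). The parser below is a minimal sketch; the helper names and the sample paths are hypothetical, and paths containing spaces are not handled.

```python
import subprocess

STATUS_MEANING = {
    " ": "initialized",
    "-": "not initialized",
    "+": "checked-out commit differs from the index",
    "U": "merge conflicts",
}


def parse_submodule_status(output: str) -> dict[str, str]:
    """Map each submodule path to a readable status."""
    result = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        flag = line[0]
        fields = line[1:].split()  # [sha, path, optional "(describe)"]
        if len(fields) < 2:
            continue
        result[fields[1]] = STATUS_MEANING.get(flag, "unknown")
    return result


def current_submodule_status() -> dict[str, str]:
    """Run git in the current directory and parse its output."""
    completed = subprocess.run(
        ["git", "submodule", "status", "--recursive"],
        capture_output=True, text=True, check=True,
    )
    return parse_submodule_status(completed.stdout)
```

Any entry reported as `not initialized` after the update command suggests the submodule fetch failed, for example because of missing network access or credentials.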


## 3. Verify GPU and CUDA


In [None]:
import torch
import sys
from pathlib import Path

# Add src to path
sys.path.insert(0, str(Path.cwd() / 'src'))

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

if torch.cuda.is_available():
    print("\n✅ GPU Detected!")
    print(f"GPU Count: {torch.cuda.device_count()}")
    print(f"GPU Name: {torch.cuda.get_device_name(0)}")
    print(f"GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")
    print(f"CUDA Version: {torch.version.cuda}")
else:
    print("\n⚠️ No GPU detected! You may be running on a CPU instance.")
    print("Consider switching to a GPU instance (ml.g5.2xlarge recommended).")


## 4. Test Model Loading


In [None]:
from models.model_loader import ModelLoader

# Initialize model loader
loader = ModelLoader()

# Detect device
device = loader.detect_device()
print(f"\nDetected device: {device}")

# Get model info without loading
model_info = loader.get_model_info('qwen2_0_5b')
print("\nModel info for QWEN2-0.5B:")
print(f"  Model name: {model_info['model_name']}")
print(f"  Size: {model_info['model_size']}")
print(f"  Estimated memory: {model_info['estimated_memory_gb']}GB")

print("\n✅ Model loader working correctly!")


## 5. Configure S3 Access

**Important:** Update the bucket name in `configs/sagemaker_configs.yaml` before running this cell.


In [None]:
import yaml
import boto3
from botocore.exceptions import NoCredentialsError, ClientError

# Load SageMaker config
with open('configs/sagemaker_configs.yaml', 'r') as f:
    sagemaker_config = yaml.safe_load(f)

# Get S3 bucket name
S3_BUCKET = sagemaker_config['s3']['bucket']
print(f"S3 Bucket: {S3_BUCKET}")

if S3_BUCKET == "your-bucket-contemplative-ai":
    print("\n⚠️ WARNING: Please update the S3 bucket name in configs/sagemaker_configs.yaml")
    print("   Current value is the default placeholder.")
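
The placeholder check above can be extended into a small validation step that also catches a missing key or a malformed name. This is a sketch under assumptions: `validate_s3_config` is a hypothetical helper, and the regular expression covers only the basic S3 naming rules (3 to 63 characters; lowercase letters, digits, dots, and hyphens; starting and ending with a letter or digit).

```python
import re

PLACEHOLDER_BUCKET = "your-bucket-contemplative-ai"
BUCKET_NAME_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def validate_s3_config(config) -> list[str]:
    """Return a list of problems found in the loaded config (empty if none)."""
    s3 = config.get("s3") if isinstance(config, dict) else None
    if not isinstance(s3, dict):
        return ["config has no 's3' section"]
    bucket = s3.get("bucket")
    if not bucket:
        return ["'s3.bucket' is missing or empty"]
    if bucket == PLACEHOLDER_BUCKET:
        return ["'s3.bucket' is still the default placeholder"]
    if not BUCKET_NAME_RE.match(str(bucket)):
        return [f"'{bucket}' does not look like a valid S3 bucket name"]
    return []
```

Running `validate_s3_config(sagemaker_config)` before the connectivity test gives a clearer message than a failed S3 call would.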


In [None]:
# Test S3 access
try:
    s3_client = boto3.client('s3')
    
    # Try to access the bucket
    s3_client.head_bucket(Bucket=S3_BUCKET)
    print(f"✅ Successfully connected to S3 bucket: {S3_BUCKET}")
    
    # List some objects (if any)
    response = s3_client.list_objects_v2(Bucket=S3_BUCKET, MaxKeys=5)
    if 'Contents' in response:
        print(f"\nFound {len(response['Contents'])} objects (showing up to 5):")
        for obj in response['Contents']:
            print(f"  - {obj['Key']}")
    else:
        print("\nBucket is empty (which is fine for first setup)")
        
except NoCredentialsError:
    print("❌ AWS credentials not found!")
    print("   Make sure your SageMaker instance has an IAM role with S3 access.")
except ClientError as e:
    if e.response['Error']['Code'] == '404':
        print(f"❌ Bucket '{S3_BUCKET}' does not exist!")
        print("   Please create the bucket or update the name in configs/sagemaker_configs.yaml")
    else:
        print(f"❌ Error accessing S3: {e}")
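
Besides `404`, `head_bucket` commonly fails with `403`, which means the bucket exists but the current role is not allowed to access it. The sketch below shows one way to turn the error code into a more specific hint; `explain_head_bucket_error` is a hypothetical helper that takes the `e.response` dictionary from a `ClientError`.

```python
def explain_head_bucket_error(error_response: dict, bucket: str) -> str:
    """Translate a head_bucket error response into a readable hint."""
    code = str(error_response.get("Error", {}).get("Code", ""))
    if code == "404":
        return f"Bucket '{bucket}' does not exist. Create it or fix the name in the config."
    if code == "403":
        return (
            f"Access to bucket '{bucket}' was denied. "
            "Check the S3 permissions on the instance's IAM role."
        )
    return f"Unexpected S3 error (code {code or 'unknown'}) for bucket '{bucket}'."
```

Inside the `except ClientError as e:` branch, this could be called as `print(explain_head_bucket_error(e.response, S3_BUCKET))`.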


## 6. Test SageMaker Utilities


In [None]:
from utils.sagemaker_utils import (
    is_sagemaker_environment,
    detect_sagemaker_device,
    get_sagemaker_paths,
    ensure_local_directories
)

print(f"Is SageMaker environment: {is_sagemaker_environment()}")
print(f"Detected device: {detect_sagemaker_device()}")

print("\nSageMaker paths:")
paths = get_sagemaker_paths()
for name, path in paths.items():
    print(f"  {name}: {path}")

# Create necessary directories
ensure_local_directories()
print("\n✅ Local directories initialized")


## 7. System Information


In [None]:
import psutil
import platform

print("System Information:")
print(f"  Platform: {platform.platform()}")
print(f"  Python version: {sys.version.split()[0]}")
print(f"  CPU cores: {psutil.cpu_count()}")

memory = psutil.virtual_memory()
print(f"  Total memory: {memory.total / (1024**3):.1f} GB")
print(f"  Available memory: {memory.available / (1024**3):.1f} GB")

disk = psutil.disk_usage('/')
print(f"  Total disk: {disk.total / (1024**3):.1f} GB")
print(f"  Free disk: {disk.free / (1024**3):.1f} GB")
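
Model checkpoints and datasets can fill a disk quickly, so it may be worth turning the free-space figure into an explicit check. The sketch below uses only the standard library; the helper names and the 20 GiB default threshold are assumptions chosen for illustration, not project requirements.

```python
import shutil

GIB = 1024 ** 3


def free_gib(path: str = "/") -> float:
    """Free space on the filesystem containing `path`, in GiB."""
    return shutil.disk_usage(path).free / GIB


def has_free_space(path: str = "/", min_free_gib: float = 20.0) -> bool:
    """True if at least `min_free_gib` GiB are free at `path`."""
    return free_gib(path) >= min_free_gib
```

For example, `has_free_space('/', 20)` returning `False` would be a signal to enlarge the volume or clean up old artifacts before training.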


## 8. Summary

If all cells above ran successfully with ✅ indicators, your environment is ready!

### Next Steps:

1. **Validate Setup**: Run `sagemaker_smoke_test.ipynb` to thoroughly test the environment
2. **Quick Test**: Run `00_quickstart.ipynb` for a small end-to-end test
3. **Generate Data**: Use `01_data_generation.ipynb` to create preference pairs
4. **Train Model**: Use `02_training.ipynb` to fine-tune with DPO
5. **Evaluate**: Use `03_evaluation.ipynb` to assess model performance

### Common Issues:

- **No GPU**: Make sure you selected a GPU instance (ml.g5.2xlarge recommended)
- **S3 Access**: Check that the IAM role has S3 permissions and that the bucket name is correct
- **Package Errors**: Try restarting the kernel and running setup again


In [None]:
print("🎉 Setup complete! You're ready to run experiments.")
