# 04 - Variable CFG Experiments

**Main Experiment Notebook**

This notebook runs the core variable classifier-free guidance (V-CFG) experiments, comparing CFG schedules for weather-based adversarial attacks on a GTSRB traffic-sign classifier.

## Key Hypothesis
Variable CFG schedules (linear, cosine decay) produce more realistic weather perturbations than constant CFG while maintaining attack effectiveness.
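
As a minimal sketch of what "constant", "linear", and "cosine" could mean here, the function below maps a denoising step to a guidance scale. The name `cfg_scale`, the endpoints `w_max=7.5` and `w_min=1.0`, and the direction of decay (strong guidance early, weak guidance late) are assumptions for illustration; the actual definitions live in `src.diffusion.cfg_schedules` and may differ.

```python
import math


def cfg_scale(step, num_steps, schedule="constant", w_max=7.5, w_min=1.0):
    """Guidance scale for one denoising step (0 = first, num_steps - 1 = last).

    Hypothetical helper: endpoints and decay direction are assumptions.
    """
    if num_steps < 1:
        raise ValueError("num_steps must be at least 1")
    # Progress through the denoising loop, in [0, 1].
    t = step / max(num_steps - 1, 1)
    if schedule == "constant":
        return w_max
    if schedule == "linear":
        # Straight line from w_max down to w_min.
        return w_max + (w_min - w_max) * t
    if schedule == "cosine":
        # Half-cosine from w_max down to w_min.
        return w_min + 0.5 * (w_max - w_min) * (1.0 + math.cos(math.pi * t))
    raise ValueError(f"unknown schedule: {schedule}")


for name in ("constant", "linear", "cosine"):
    scales = [cfg_scale(s, 30, name) for s in range(30)]
    print(f"{name:>8}: first={scales[0]:.2f}, mid={scales[15]:.2f}, last={scales[-1]:.2f}")
```

Under these assumptions all three schedules start at `w_max`; the linear and cosine variants both end at `w_min`, with the cosine curve staying closer to `w_max` during the early steps.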

## 1. Setup

In [None]:
# Clone and install
!git clone https://github.com/YOUR_USERNAME/adaptive-weather-attacks.git 2>/dev/null || \
    (cd adaptive-weather-attacks && git pull)
%cd /content/adaptive-weather-attacks
!pip install -e . -q
!pip install diffusers accelerate lpips torchattacks -q

In [None]:
# Mount Drive and copy data
from google.colab import drive
drive.mount('/content/drive')

import shutil, os
if not os.path.exists('/content/GTSRB_dataset'):
    shutil.copytree('/content/drive/MyDrive/GTSRB_dataset', '/content/GTSRB_dataset')
    print("✅ Dataset copied")

In [None]:
# Imports
import torch
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
from tqdm import tqdm

from src.config import DEVICE, print_config
from src.data import get_dataloaders, get_raw_test_dataset
from src.models import load_checkpoint, ModelWrapper
from src.diffusion import VariableCFGPipeline, get_weather_prompt, WEATHER_PROMPTS
from src.diffusion.cfg_schedules import visualize_schedules
from src.metrics import RealismMetrics, compute_attack_success_rate
from src.utils import plot_images, save_results

print_config()

## 2. Load Models and Data

In [None]:
# Load trained classifier
from src.models import get_model
from src.config import CHECKPOINT_DIR

# Try to load from checkpoint, otherwise use pretrained
model = get_model('resnet50', num_classes=43, pretrained=True)
model = ModelWrapper(model)

checkpoint_path = CHECKPOINT_DIR / 'resnet50_best.pth'
if checkpoint_path.exists():
    model.load_state_dict(torch.load(checkpoint_path, map_location=DEVICE))
    print("✅ Loaded trained checkpoint")
else:
    print("⚠️ Checkpoint not found; using pretrained weights (run notebook 01 first for best results)")

model = model.to(DEVICE).eval()

In [None]:
# Load test data (raw PIL images for diffusion)
raw_dataset = get_raw_test_dataset('/content/GTSRB_dataset')
print(f"✅ Loaded {len(raw_dataset)} test images")

# Also get normalized loader for evaluation
from src.data import get_test_loader
test_loader = get_test_loader('/content/GTSRB_dataset', batch_size=32)

## 3. Initialize V-CFG Pipeline

In [None]:
# Initialize diffusion pipeline (this will download Stable Diffusion)
pipeline = VariableCFGPipeline(device=DEVICE)
print("✅ Pipeline ready")
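
Classifier-free guidance combines an unconditional and a prompt-conditioned noise prediction at every denoising step, and a variable schedule only changes the scale used at each step. The sketch below shows that combination with NumPy arrays standing in for the UNet outputs. `apply_cfg` is a hypothetical helper for illustration, not part of the `VariableCFGPipeline` API.

```python
import numpy as np


def apply_cfg(noise_uncond, noise_cond, scale):
    """Standard classifier-free guidance combination for one step.

    scale = 0 returns the unconditional prediction, scale = 1 the
    conditional one, and scale > 1 extrapolates toward the prompt.
    """
    return noise_uncond + scale * (noise_cond - noise_uncond)


# Stand-ins for the two UNet predictions at a single step (assumed shapes).
rng = np.random.default_rng(0)
noise_uncond = rng.standard_normal((4, 8, 8))
noise_cond = rng.standard_normal((4, 8, 8))

# A per-step scale (e.g. taken from a decaying schedule) plugs in directly.
for scale in (7.5, 4.0, 1.0):
    guided = apply_cfg(noise_uncond, noise_cond, scale)
    print(scale, float(np.abs(guided - noise_cond).mean()))
```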

In [None]:
# Visualize CFG schedules
visualize_schedules(num_steps=30, schedules=['constant', 'linear', 'cosine', 'step'])

## 4. Single Image Comparison

In [None]:
# Get a sample image
sample_idx = 100
sample_image, sample_label = raw_dataset[sample_idx]

print(f"Sample: Class {sample_label}")
plt.imshow(sample_image)
plt.title(f"Original (Class {sample_label})")
plt.axis('off')
plt.show()

In [None]:
# Compare CFG schedules on single image
prompt = "a traffic sign in dense fog"
schedules = ['constant', 'linear', 'cosine']

results = pipeline.compare_schedules(
    image=sample_image,
    prompt=prompt,
    schedules=schedules,
    strength=0.5,
    seed=42
)

# Display results
plot_images(results, titles=list(results.keys()))

## 5. Main Experiment: Compare CFG Schedules

In [None]:
# Experiment parameters
NUM_SAMPLES = 100  # Number of images to attack
WEATHER_TYPE = 'fog'  # fog, rain, snow, night, glare
STRENGTH = 0.5
CFG_SCHEDULES = ['constant', 'linear', 'cosine']
SEED = 42

# Sample indices
np.random.seed(SEED)
sample_indices = np.random.choice(len(raw_dataset), NUM_SAMPLES, replace=False)

In [None]:
from src.data.transforms import get_test_transforms

# Initialize metrics
realism_metrics = RealismMetrics(device=DEVICE)
transform = get_test_transforms()

# Results storage
# 'tensors' is filled in after generation; the keys match what the later cells read
all_results = {schedule: {'images': [], 'tensors': None} for schedule in CFG_SCHEDULES}
original_images = []
original_tensors = []
labels = []

In [None]:
# Collect original images first
print("Collecting original images...")
for idx in tqdm(sample_indices):
    img, label = raw_dataset[idx]
    original_images.append(img)
    original_tensors.append(transform(img))
    labels.append(label)

original_tensors = torch.stack(original_tensors)
labels = torch.tensor(labels)
print(f"✅ Collected {len(original_images)} images")

In [None]:
# Generate adversarial images for each schedule
for schedule in CFG_SCHEDULES:
    print(f"\n{'='*60}")
    print(f"Generating with {schedule.upper()} CFG schedule")
    print(f"{'='*60}")
    
    generated_images = []
    generated_tensors = []
    
    for i, img in enumerate(tqdm(original_images, desc=f"{schedule} CFG")):
        prompt = get_weather_prompt(WEATHER_TYPE)
        
        adv_img = pipeline.generate_single(
            image=img,
            prompt=prompt,
            cfg_schedule=schedule,
            strength=STRENGTH,
            seed=SEED + i  # Different seed per image, same across schedules
        )
        
        generated_images.append(adv_img)
        generated_tensors.append(transform(adv_img))
    
    all_results[schedule]['images'] = generated_images
    all_results[schedule]['tensors'] = torch.stack(generated_tensors)
    
    print(f"✅ Generated {len(generated_images)} images with {schedule} CFG")
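
Using `SEED + i` gives each image its own seed while reusing that seed for every schedule, so differences between schedules for a given image should come from the guidance schedule rather than from different starting noise. The sketch below illustrates the pairing with NumPy. `initial_noise` is a hypothetical stand-in, and the assumption that the pipeline's `seed` argument fully determines the starting latent is not confirmed by this notebook.

```python
import numpy as np

SEED = 42
SCHEDULES = ["constant", "linear", "cosine"]


def initial_noise(seed, shape=(4, 8, 8)):
    """Hypothetical stand-in for the latent noise a seeded pipeline would draw."""
    return np.random.default_rng(seed).standard_normal(shape)


# One noise tensor per (schedule, image), seeded the same way as the loop above.
noise = {
    schedule: [initial_noise(SEED + i) for i in range(3)]
    for schedule in SCHEDULES
}

# Same image index, different schedules: identical starting noise.
print(np.array_equal(noise["constant"][0], noise["cosine"][0]))
# Different image indices: different starting noise.
print(np.array_equal(noise["constant"][0], noise["constant"][1]))
```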

## 6. Evaluate Results

In [None]:
# Compute metrics for each schedule
final_results = {}

for schedule in CFG_SCHEDULES:
    print(f"\nEvaluating {schedule} schedule...")
    
    adv_tensors = all_results[schedule]['tensors'].to(DEVICE)
    orig_tensors = original_tensors.to(DEVICE)
    
    # Attack success rate
    attack_metrics = compute_attack_success_rate(
        model, orig_tensors, adv_tensors, labels
    )
    
    # Realism metrics
    realism = realism_metrics.compute_all(orig_tensors, adv_tensors)
    
    final_results[schedule] = {
        'asr': attack_metrics['attack_success_rate'],
        'clean_acc': attack_metrics['clean_accuracy'],
        'adv_acc': attack_metrics['adversarial_accuracy'],
        'lpips': realism['lpips'],
        'ssim': realism['ssim'],
        'psnr': realism['psnr'],
    }
    
    print(f"  ASR: {attack_metrics['attack_success_rate']:.1f}%")
    print(f"  LPIPS: {realism['lpips']:.3f}")
    print(f"  SSIM: {realism['ssim']:.3f}")
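
For reference, one common definition of attack success rate counts only images the classifier gets right before the attack and reports the share of those that are misclassified afterwards. The sketch below implements that definition on plain prediction arrays. It is an assumption about what `compute_attack_success_rate` computes, and `attack_success_rate` is a hypothetical name; check `src.metrics` for the definition actually used.

```python
import numpy as np


def attack_success_rate(clean_pred, adv_pred, labels):
    """Percentage of correctly classified clean images that the attack flips.

    Returns NaN when no clean image is classified correctly.
    """
    clean_pred = np.asarray(clean_pred)
    adv_pred = np.asarray(adv_pred)
    labels = np.asarray(labels)

    correct = clean_pred == labels            # attackable images
    if not correct.any():
        return float("nan")
    flipped = correct & (adv_pred != labels)  # right before, wrong after
    return 100.0 * flipped.sum() / correct.sum()


# Toy predictions (made up for illustration, not experiment output).
labels = [0, 1, 2, 3]
clean_pred = [0, 1, 2, 0]   # three of four correct
adv_pred = [0, 5, 2, 3]     # one of the three correct ones is flipped
asr = attack_success_rate(clean_pred, adv_pred, labels)
print(f"ASR: {asr:.1f}%")
```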

In [None]:
# Display results table
import pandas as pd

df = pd.DataFrame(final_results).T
df = df.round(3)
df.index.name = 'CFG Schedule'

print("\n" + "="*60)
print("EXPERIMENT RESULTS")
print("="*60)
print(df.to_string())
print("\n(Lower LPIPS = perceptually closer to the original, used here as a realism proxy; higher SSIM = more similar)")

## 7. Visual Comparison

In [None]:
# Show comparison for a few samples
NUM_DISPLAY = 5

fig, axes = plt.subplots(NUM_DISPLAY, len(CFG_SCHEDULES) + 1, figsize=(4*(len(CFG_SCHEDULES)+1), 4*NUM_DISPLAY))

for row in range(NUM_DISPLAY):
    # Original
    axes[row, 0].imshow(original_images[row])
    axes[row, 0].set_title(f"Original\nLabel: {labels[row].item()}")
    axes[row, 0].axis('off')
    
    # Each schedule
    for col, schedule in enumerate(CFG_SCHEDULES, 1):
        axes[row, col].imshow(all_results[schedule]['images'][row])
        axes[row, col].set_title(f"{schedule.capitalize()} CFG")
        axes[row, col].axis('off')

plt.tight_layout()
os.makedirs('results/figures', exist_ok=True)  # savefig does not create missing directories
plt.savefig('results/figures/cfg_comparison.png', dpi=150, bbox_inches='tight')
plt.show()

## 8. Save Results

In [None]:
# Save results
experiment_config = {
    'num_samples': NUM_SAMPLES,
    'weather_type': WEATHER_TYPE,
    'strength': STRENGTH,
    'seed': SEED,
    'schedules': CFG_SCHEDULES,
}

save_data = {
    'config': experiment_config,
    'results': final_results,
}

save_results(save_data, f'vcfg_experiment_{WEATHER_TYPE}')
print("✅ Results saved!")

In [None]:
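# Copy results to Drive (create the target folder first in case it does not exist yet)
!mkdir -p /content/drive/MyDrive/adaptive-weather-attacks
!cp -r results/ /content/drive/MyDrive/adaptive-weather-attacks/
print("✅ Results copied to Google Drive")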

---

## ✅ Experiment Complete!

### Key Comparisons:
- Compare ASR (Attack Success Rate) across schedules
- Compare LPIPS (lower = more realistic)
- Compare SSIM (higher = more similar to original)

### Expected Results:
- **Linear/Cosine CFG** should have **lower LPIPS** (more realistic)
- **Constant CFG** may have slightly higher ASR but worse realism
- **Linear/Cosine CFG** should offer the best tradeoff between ASR and realism
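
One way to make "best tradeoff" concrete is to check which schedules are Pareto-optimal on (higher ASR, lower LPIPS). The sketch below is a hypothetical helper, not part of the repository, and the numbers are placeholders chosen only to exercise the function; they are not experiment results. In practice `final_results` from Section 6 could be passed in instead.

```python
def pareto_front(results):
    """Return the names not dominated on (higher 'asr', lower 'lpips').

    A schedule is dominated if another one is at least as good on both
    metrics and strictly better on at least one.
    """
    front = []
    for name, r in results.items():
        dominated = any(
            o["asr"] >= r["asr"]
            and o["lpips"] <= r["lpips"]
            and (o["asr"] > r["asr"] or o["lpips"] < r["lpips"])
            for other, o in results.items()
            if other != name
        )
        if not dominated:
            front.append(name)
    return front


# Placeholder values for illustration only (NOT measured results).
placeholder = {
    "schedule_a": {"asr": 60.0, "lpips": 0.40},
    "schedule_b": {"asr": 55.0, "lpips": 0.30},
    "schedule_c": {"asr": 50.0, "lpips": 0.35},
}
print(pareto_front(placeholder))
```

Here `schedule_c` drops out because `schedule_b` beats it on both metrics, while `schedule_a` and `schedule_b` each win on one metric and so both remain candidates.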

**Next:** Run `05_results_analysis.ipynb` for more visualizations and transferability experiments.