# Weight Optimization for C23 Dataset

## Objective:
- Test all 3 models (Xception, F3Net, Effort-CLIP) on c23 dataset
- Find optimal ensemble weights through grid search
- Show comprehensive metrics (F1, Accuracy, Precision, Recall, AUC)
- Use compute units efficiently (~10-15 units, not all 41)

## Dataset Structure:
```
dataset_c23/
├── manipulated_sequences/
│   ├── Deepfakes/c23/frames/000_003/*.png
│   ├── Face2Face/c23/frames/012_026/*.png
│   └── FaceSwap/c23/frames/...
└── original_sequences/
    └── youtube/c23/frames/000/*.png
```

## test.json Format:
Each entry is an `[original_id, fake_id]` pair:
```json
[["953", "974"], ["000", "003"], ...]
```

**Note:** Fake video folders are named `{original_id}_{fake_id}` (e.g., "000_003")
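The naming rule above can be written as a small helper that turns one `test.json` entry into the frame directories the notebook reads. This is a sketch. `pair_to_dirs` is a hypothetical name, and it assumes the `c23/frames` layout shown in the tree. It returns one candidate fake directory per manipulation method, because a pair's fake video may live under any of them.

```python
from pathlib import Path
from typing import Dict, List, Tuple

METHODS = ["Deepfakes", "Face2Face", "FaceSwap"]


def pair_to_dirs(root: str, pair: List[str]) -> Tuple[Path, Dict[str, Path]]:
    """Map one [original_id, fake_id] entry to its real dir and per-method fake dirs."""
    original_id, fake_id = pair
    base = Path(root)
    real_dir = base / "original_sequences" / "youtube" / "c23" / "frames" / original_id
    # Fake folders are named "{original_id}_{fake_id}", e.g. "000_003"
    fake_name = f"{original_id}_{fake_id}"
    fake_dirs = {
        method: base / "manipulated_sequences" / method / "c23" / "frames" / fake_name
        for method in METHODS
    }
    return real_dir, fake_dirs


real_dir, fake_dirs = pair_to_dirs("/content/dataset_c23", ["000", "003"])
print(real_dir)
print(fake_dirs["Deepfakes"])
```

Checking `real_dir.exists()` and `any(d.exists() for d in fake_dirs.values())` over all pairs before the heavy image loading is a cheap way to spot path mismatches early.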

## Step 1: Mount Google Drive & Install Dependencies

In [None]:
from google.colab import drive
drive.mount('/content/drive')

In [None]:
# Install dependencies
!pip install -q timm==0.9.12 transformers==4.36.0 facenet-pytorch scikit-learn pillow==10.2.0

## Step 2: Upload Dataset to Colab

**Option 1: Upload ZIP to Google Drive, then extract**
```bash
# The dataset is already in: C:\Users\Admin\Downloads\dataset_c23.zip (21.1 GB)

# Upload dataset_c23.zip to Google Drive (use Drive desktop app for large files)
# Then in Colab:
!unzip /content/drive/MyDrive/dataset_c23.zip -d /content/
```

**Option 2: Direct upload (NOT recommended for 21GB)**
```python
from google.colab import files
uploaded = files.upload()  # Too slow for 21GB!
!unzip dataset_c23.zip -d /content/
```

**Recommended:** Upload `dataset_c23.zip` to Google Drive first, then extract in Colab

In [None]:
# Set dataset path (adjust based on your upload method)
# Note: Update this to match your extracted folder name
DATASET_ROOT = "/content/dataset_c23"  # or "/content/drive/MyDrive/dataset_c23"

# Verify structure
!ls -la {DATASET_ROOT}
!ls {DATASET_ROOT}/manipulated_sequences/
!ls {DATASET_ROOT}/original_sequences/

## Step 3: Load Model Weights from Drive

In [None]:
# Define paths to your model weights in Google Drive
WEIGHTS_DIR = "/content/drive/MyDrive/deepfake-detection/backend/app/models/weights"

XCEPTION_PATH = f"{WEIGHTS_DIR}/xception_best.pth"
F3NET_PATH = f"{WEIGHTS_DIR}/f3net_best.pth"
EFFORT_PATH = f"{WEIGHTS_DIR}/effort_clip_L14_trainOn_FaceForensic.pth"

# Verify files exist
import os
print(f"Xception exists: {os.path.exists(XCEPTION_PATH)}")
print(f"F3Net exists: {os.path.exists(F3NET_PATH)}")
print(f"Effort exists: {os.path.exists(EFFORT_PATH)}")
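Before writing key-remapping rules, it helps to see which top-level prefixes each checkpoint actually uses (for example `module.`, `backbone.` or `FAD_head`). The sketch below counts keys by prefix. `prefix_summary` is a hypothetical helper, and the toy keys only stand in for a real `torch.load(...)` result.

```python
from collections import Counter
from collections.abc import Mapping


def prefix_summary(state_dict: Mapping, depth: int = 1) -> Counter:
    """Count state-dict keys by their first `depth` dot-separated components."""
    return Counter(".".join(key.split(".")[:depth]) for key in state_dict)


# Toy keys standing in for torch.load(F3NET_PATH, map_location="cpu")
toy = {
    "backbone.conv1.weight": None,
    "backbone.bn1.weight": None,
    "backbone.last_linear.1.weight": None,
    "FAD_head.example_param": None,  # illustrative name only
}
for prefix, count in prefix_summary(toy).most_common():
    print(prefix, count)
```

With a real checkpoint, `prefix_summary(torch.load(F3NET_PATH, map_location="cpu"), depth=2)` also shows second-level names such as the classifier prefix. A summary that shows a single non-parameter key at the top level suggests the checkpoint is nested and needs unwrapping first.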

## Step 4: Define Corrected Model Classes

In [None]:
import torch
import torch.nn as nn
import timm
from typing import Tuple

# ================================
# 1. CORRECTED XCEPTION MODEL
# ================================

class XceptionModel:
    def __init__(self, weights_path: str, device: str = 'cuda'):
        self.device = torch.device(device if torch.cuda.is_available() else 'cpu')
        self.model = self._load_model(weights_path)
        self.model.eval()

    def _load_model(self, weights_path: str):
        print("\n[INFO] Loading Xception model...")
        
        # Create timm Xception with fc classifier (NOT last_linear)
        model = timm.create_model('xception', pretrained=False, num_classes=2)
        
        # Load checkpoint
        checkpoint = torch.load(weights_path, map_location='cpu')
        
        # Map keys: backbone.fc.* → fc.*
        new_state_dict = {}
        for k, v in checkpoint.items():
            new_k = k.replace('module.', '')
            new_k = new_k.replace('backbone.', '')  # CRITICAL: remove backbone prefix
            new_k = new_k.replace('model.', '')
            new_k = new_k.replace('encoder.', '')
            
            # Map last_linear to fc (timm Xception uses fc)
            new_k = new_k.replace('last_linear.', 'fc.')
            
            new_state_dict[new_k] = v
        
        # Load weights
        missing, unexpected = model.load_state_dict(new_state_dict, strict=False)
        
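        # Verify the classifier actually loaded: check the model's missing keys,
        # not just whether an 'fc.' name appears in the remapped checkpoint
        classifier_loaded = not any(k.startswith('fc.') for k in missing)
        print(f"[DEBUG] Mapped {len(new_state_dict)} keys")
        print(f"[DEBUG] Missing keys: {len(missing)}, unexpected keys: {len(unexpected)}")
        print(f"[DEBUG] Classifier layer loaded: {classifier_loaded}")
        
        if not classifier_loaded:
            print("[WARNING] Classifier weights NOT loaded from checkpoint!")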
        
        model = model.to(self.device)
        print("[OK] Xception model loaded\n")
        return model

    @torch.no_grad()
    def predict(self, image_tensor: torch.Tensor) -> Tuple[float, float]:
        image_tensor = image_tensor.to(self.device)
        logits = self.model(image_tensor)
        probs = torch.softmax(logits, dim=1)
        
        real_prob = probs[0][0].item()
        fake_prob = probs[0][1].item()
        
        return fake_prob, real_prob
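`str.replace` removes a substring anywhere in a key, not only at the start. A hypothetical key containing `submodel.` would lose part of its name under `replace('model.', '')`. The sketch below is a stricter alternative: it strips the known wrapper prefixes only from the front and unwraps a nested checkpoint first. `clean_state_dict` is a hypothetical helper. The wrapper keys `state_dict` and `model` are assumptions about how some training scripts save checkpoints.

```python
from collections.abc import Mapping
from typing import Dict, Optional, Tuple

WRAPPER_PREFIXES: Tuple[str, ...] = ("module.", "backbone.", "model.", "encoder.")


def strip_leading_prefixes(key: str, prefixes: Tuple[str, ...] = WRAPPER_PREFIXES) -> str:
    """Remove known wrapper prefixes only from the start of a key (repeatedly)."""
    stripped = True
    while stripped:
        stripped = False
        for prefix in prefixes:
            if key.startswith(prefix):
                key = key[len(prefix):]
                stripped = True
    return key


def clean_state_dict(checkpoint: Mapping, renames: Optional[Dict[str, str]] = None) -> Dict:
    """Unwrap an (assumed) nested checkpoint and normalize its key names."""
    for wrapper in ("state_dict", "model"):  # assumption: common wrapper keys
        inner = checkpoint.get(wrapper)
        if isinstance(inner, Mapping):
            checkpoint = inner
            break
    renames = renames or {}
    cleaned = {}
    for key, value in checkpoint.items():
        new_key = strip_leading_prefixes(key)
        for old, new in renames.items():  # first matching rename wins
            if new_key.startswith(old):
                new_key = new + new_key[len(old):]
                break
        cleaned[new_key] = value
    return cleaned
```

For Xception this would be called as `clean_state_dict(torch.load(XCEPTION_PATH, map_location="cpu"), renames={"last_linear.": "fc."})`. For F3Net, `renames={"last_linear.1.": "fc.", "last_linear.": "fc."}` lists the longer prefix first, because the first match wins.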

In [None]:
# ================================
# 2. CORRECTED F3NET MODEL
# ================================

class F3NetModel:
    def __init__(self, weights_path: str, device: str = 'cuda'):
        self.device = torch.device(device if torch.cuda.is_available() else 'cpu')
        self.model = self._load_model(weights_path)
        self.model.eval()

    def _load_model(self, weights_path: str):
        print("\n[INFO] Loading F3Net model...")
        
        # Create timm Xception
        model = timm.create_model('xception', pretrained=False, num_classes=2)
        
        # Modify first conv for 12 channels: F3Net's FAD head outputs 4 DCT frequency bands x 3 color channels
        original_conv1 = model.conv1
        model.conv1 = nn.Conv2d(
            in_channels=12,  # 4 frequency bands x 3 RGB channels
            out_channels=original_conv1.out_channels,
            kernel_size=original_conv1.kernel_size,
            stride=original_conv1.stride,
            padding=original_conv1.padding,
            bias=False
        )
        
        # Load checkpoint
        checkpoint = torch.load(weights_path, map_location='cpu')
        
        # Map keys
        new_state_dict = {}
        fad_head_skipped = 0
        
        for k, v in checkpoint.items():
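            # Skip FAD_head params. NOTE: this frequency-decomposition head builds the 12-channel
            # input, so without it the backbone will not see the input distribution it was trained on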
            if k.startswith('FAD_head'):
                fad_head_skipped += 1
                continue
            
            new_k = k.replace('module.', '')
            new_k = new_k.replace('backbone.', '')  # CRITICAL
            new_k = new_k.replace('model.', '')
            new_k = new_k.replace('encoder.', '')
            
            # Map Sequential layer to Linear: last_linear.1.weight → fc.weight
            new_k = new_k.replace('last_linear.1.', 'fc.')
            new_k = new_k.replace('last_linear.', 'fc.')
            
            # Map other classifier names
            new_k = new_k.replace('classifier.', 'fc.')
            new_k = new_k.replace('head.', 'fc.')
            
            new_state_dict[new_k] = v
        
        # Load weights
        missing, unexpected = model.load_state_dict(new_state_dict, strict=False)
        
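        # Verify the classifier actually loaded: check the model's missing keys,
        # not just whether an 'fc.' name appears in the remapped checkpoint
        classifier_loaded = not any(k.startswith('fc.') for k in missing)
        print(f"[DEBUG] Mapped {len(new_state_dict)} keys")
        print(f"[DEBUG] Skipped {fad_head_skipped} FAD_head layers")
        print(f"[DEBUG] Missing keys: {len(missing)}, unexpected keys: {len(unexpected)}")
        print(f"[DEBUG] Classifier layer loaded: {classifier_loaded}")
        
        if not classifier_loaded:
            print("[WARNING] Classifier weights NOT loaded from checkpoint!")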
        
        model = model.to(self.device)
        print("[OK] F3Net model loaded\n")
        return model

    @torch.no_grad()
    def predict(self, image_tensor: torch.Tensor) -> Tuple[float, float]:
        image_tensor = image_tensor.to(self.device)
        
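        # Placeholder: tile RGB to 12 channels. This does NOT reproduce the FAD-head frequency
        # bands the backbone was trained on, so F3Net scores are likely to be unreliable.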
        if image_tensor.shape[1] == 3:
            image_tensor = image_tensor.repeat(1, 4, 1, 1)  # [B, 3, H, W] → [B, 12, H, W]
        
        logits = self.model(image_tensor)
        probs = torch.softmax(logits, dim=1)
        
        real_prob = probs[0][0].item()
        fake_prob = probs[0][1].item()
        
        return fake_prob, real_prob
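In F3Net, the 12 input channels come from the frequency-aware decomposition (FAD) head. Each RGB channel is transformed with a 2-D DCT, masked into low, middle, high and full frequency bands, and transformed back, which gives 4 bands x 3 channels. Tiling RGB four times does not produce anything like that input. The sketch below rebuilds only the fixed part of such a head. It assumes band limits of `size // 2.82`, `size // 2` and `2 * size` on the DCT index sum `i + j`, which are the values used in the public F3Net reference code as far as we know. It omits the learnable per-band filters, so it is not a drop-in replacement for the checkpoint's trained `FAD_head` weights; loading those weights instead of skipping them is the faithful option. The DCT matrix size must also match the square input resolution the checkpoint was trained at. `FADHeadSketch` is a hypothetical name.

```python
import math

import torch
import torch.nn as nn


def dct_matrix(size: int) -> torch.Tensor:
    """Orthonormal DCT-II basis: row i is frequency i, column j is pixel position j."""
    i = torch.arange(size, dtype=torch.float64)[:, None]
    j = torch.arange(size, dtype=torch.float64)[None, :]
    m = torch.cos((j + 0.5) * math.pi * i / size) * math.sqrt(2.0 / size)
    m[0, :] = math.sqrt(1.0 / size)  # DC row uses the smaller scale
    return m.float()


def band_mask(start: float, end: float, size: int) -> torch.Tensor:
    """1 where start <= i + j <= end in DCT coefficient space, else 0."""
    idx = torch.arange(size)
    index_sum = idx[:, None] + idx[None, :]
    return ((index_sum >= start) & (index_sum <= end)).float()


class FADHeadSketch(nn.Module):
    """Fixed-filter FAD-style head: [B, 3, S, S] -> [B, 12, S, S] (4 bands x 3 channels)."""

    def __init__(self, size: int):
        super().__init__()
        d = dct_matrix(size)
        self.register_buffer("dct", d)
        self.register_buffer("dct_t", d.t().contiguous())
        # Assumed band limits: low, middle, high, all
        bands = [(0, size // 2.82), (size // 2.82, size // 2), (size // 2, size * 2), (0, size * 2)]
        self.register_buffer("masks", torch.stack([band_mask(a, b, size) for a, b in bands]))
        # NOTE: the learnable per-band refinement of the original head is omitted here.

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        freq = self.dct @ x @ self.dct_t  # 2-D DCT of every channel
        bands = [self.dct_t @ (freq * mask) @ self.dct for mask in self.masks]  # inverse DCT per band
        return torch.cat(bands, dim=1)


head = FADHeadSketch(size=64)
print(head(torch.randn(1, 3, 64, 64)).shape)  # torch.Size([1, 12, 64, 64])
```

The last three output channels (the "all" band) reconstruct the input exactly, because the DCT basis is orthonormal. This is a quick sanity check that the transform is wired correctly.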

In [None]:
# ================================
# 3. CORRECTED EFFORT-CLIP MODEL
# ================================

class EffortModel:
    def __init__(self, weights_path: str, device: str = 'cuda'):
        self.device = torch.device(device if torch.cuda.is_available() else 'cpu')
        self.model, self.classifier = self._load_model(weights_path)
        self.model.eval()
        self.classifier.eval()

    def _load_model(self, weights_path: str):
        print("\n[INFO] Loading Effort-CLIP model...")
        
        # Import transformers CLIP model
        from transformers import CLIPVisionModel, CLIPVisionConfig
        
        # Load checkpoint
        checkpoint = torch.load(weights_path, map_location='cpu')
        
        # Detect classifier input dimension
        classifier_weight = checkpoint.get('module.head.weight', checkpoint.get('head.weight'))
        if classifier_weight is None:
            raise ValueError("Cannot find head.weight in checkpoint!")
        
        hidden_dim = classifier_weight.shape[1]  # Should be 1024
        print(f"[DEBUG] Detected classifier input dim: {hidden_dim}")
        
        # Create CLIP vision config (1024 dim, 24 layers - CLIP-L/14)
        config = CLIPVisionConfig(
            hidden_size=hidden_dim,
            num_hidden_layers=24,
            num_attention_heads=16,
            intermediate_size=4096,
            image_size=224,
            patch_size=14,
            num_channels=3
        )
        
        model = CLIPVisionModel(config)
        
        # Map checkpoint keys to CLIP model keys
        clip_state_dict = {}
        loaded_count = 0
        skipped_count = 0
        
        for k, v in checkpoint.items():
            if not k.startswith('module.backbone.'):
                continue
            
            # Remove prefix
            new_k = k.replace('module.backbone.', '')
            
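            # Skip SVD residual factors (S_residual, U_residual, V_residual). NOTE: these are not LoRA
            # adapters; they appear to hold the fine-tuned residual part of each weight, so skipping
            # them loads weight_main only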
            if 'residual' in new_k.lower():
                skipped_count += 1
                continue
            
            # CRITICAL: Add vision_model. prefix for CLIPVisionModel
            new_k = 'vision_model.' + new_k
            
            # Map checkpoint naming to transformers CLIP naming
            # Remove _main suffix if exists
            new_k = new_k.replace('.weight_main', '.weight')
            new_k = new_k.replace('.bias_main', '.bias')
            
            clip_state_dict[new_k] = v
            loaded_count += 1
        
        print(f"[DEBUG] Processed {loaded_count} backbone params")
        print(f"[DEBUG] Skipped {skipped_count} SVD residual params (not merged into weights)")
        
        # Load weights into CLIP model
        missing, unexpected = model.load_state_dict(clip_state_dict, strict=False)
        
        # Calculate match rate
        total_params = len(model.state_dict())
        loaded_params = total_params - len(missing)
        match_rate = (loaded_params / total_params) * 100
        
        print(f"[DEBUG] Loaded {loaded_params}/{total_params} params ({match_rate:.1f}% match rate)")
        print(f"[DEBUG] Missing keys: {len(missing)}")
        print(f"[DEBUG] Unexpected keys: {len(unexpected)}")
        
        if match_rate < 50:
            print(f"\n[ERROR] Low match rate! Model may not work correctly.")
            print(f"[INFO] Sample missing keys: {list(missing)[:5]}")
            print(f"[INFO] Sample unexpected keys: {list(unexpected)[:5]}")
        else:
            print(f"\n[OK] Match rate >= 50%; still check the missing keys before trusting predictions.")
        
        model = model.to(self.device)
        
        # Load classifier head
        classifier = nn.Linear(hidden_dim, 2)
        
        # Get head weights
        head_weight_key = 'module.head.weight' if 'module.head.weight' in checkpoint else 'head.weight'
        head_bias_key = 'module.head.bias' if 'module.head.bias' in checkpoint else 'head.bias'
        
        classifier.weight.data.copy_(checkpoint[head_weight_key])
        classifier.bias.data.copy_(checkpoint[head_bias_key])
        classifier = classifier.to(self.device)
        
        print(f"[DEBUG] Classifier head loaded ({hidden_dim} → 2)")
        print("[OK] Effort-CLIP model loaded\n")
        
        return model, classifier

    @torch.no_grad()
    def predict(self, image_tensor: torch.Tensor) -> Tuple[float, float]:
        image_tensor = image_tensor.to(self.device)
        
        # CLIP vision encoder forward pass
        outputs = self.model(pixel_values=image_tensor)
        features = outputs.pooler_output  # Use pooler output instead of last_hidden_state
        features = features.to(self.device)
        
        # Classifier
        logits = self.classifier(features)
        probs = torch.softmax(logits, dim=1)
        
        real_prob = probs[0][0].item()
        fake_prob = probs[0][1].item()
        
        return fake_prob, real_prob
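The Effort checkpoint stores adapted linear layers as `weight_main` plus `U_residual`, `S_residual` and `V_residual`. These names suggest an SVD split in which the effective weight is `weight_main + U_residual @ diag(S_residual) @ V_residual`. Under that assumption, skipping the residual keys loads only part of each trained weight. The sketch below folds the residuals back into dense `.weight` tensors before the key-mapping loop runs. `merge_svd_residuals` is a hypothetical helper. Verify the orientation of `V_residual` against the code that produced the checkpoint: the shape check only catches mistakes on non-square layers, not on the square 1024 x 1024 attention projections.

```python
from typing import Dict

import torch

RESIDUAL_SUFFIXES = (".U_residual", ".S_residual", ".V_residual")


def merge_svd_residuals(state_dict: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]:
    """Fold assumed SVD residual factors back into dense weights.

    For each '<prefix>.weight_main', emit '<prefix>.weight' =
    weight_main + U_residual @ diag(S_residual) @ V_residual (when all three exist).
    """
    merged: Dict[str, torch.Tensor] = {}
    for key, value in state_dict.items():
        if key.endswith(RESIDUAL_SUFFIXES):
            continue  # consumed together with the matching weight_main below
        if not key.endswith(".weight_main"):
            merged[key] = value
            continue
        prefix = key[: -len(".weight_main")]
        u = state_dict.get(prefix + ".U_residual")
        s = state_dict.get(prefix + ".S_residual")
        v = state_dict.get(prefix + ".V_residual")
        weight = value.clone()
        if u is not None and s is not None and v is not None:
            residual = u @ torch.diag(s) @ v
            if residual.shape != weight.shape:
                raise ValueError(f"{prefix}: residual {tuple(residual.shape)} != weight {tuple(weight.shape)}")
            weight = weight + residual
        merged[prefix + ".weight"] = weight
    return merged
```

To try it, replace `checkpoint` in `_load_model` with `merge_svd_residuals(torch.load(weights_path, map_location='cpu'))`. The existing loop then finds plain `.weight` keys and no residual keys to skip, and the `head.*` keys pass through unchanged.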

## Step 5: Load All Models

In [None]:
# Load all 3 models
print("="*60)
print("LOADING MODELS")
print("="*60)

xception = XceptionModel(XCEPTION_PATH)
f3net = F3NetModel(F3NET_PATH)
effort = EffortModel(EFFORT_PATH)

print("\n" + "="*60)
print("ALL MODELS LOADED SUCCESSFULLY")
print("="*60)

## Step 6: Load Dataset and Sample Test Set

In [None]:
import json
import random
from pathlib import Path
from PIL import Image
import torchvision.transforms as transforms

# Load test.json
test_json_path = Path(DATASET_ROOT) / "test.json"
with open(test_json_path, 'r') as f:
    test_pairs = json.load(f)

print(f"Total test pairs: {len(test_pairs)}")
print(f"Example pairs: {test_pairs[:3]}")

# Define transforms for different models
transform_xception = transforms.Compose([
    transforms.Resize((299, 299)),  # Xception input size
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
])

# Effort-CLIP: use the OpenAI CLIP normalization statistics (CLIP does not use ImageNet stats)
transform_effort = transforms.Compose([
    transforms.Resize((224, 224)),  # CLIP ViT-L/14 input size
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.48145466, 0.4578275, 0.40821073],
                         std=[0.26862954, 0.26130258, 0.27577711])  # CLIP stats
])

# Function to load image from video folder
def load_frame(video_id: str, is_fake: bool, method: str = "Deepfakes", original_id: str = None) -> Image.Image:
    """
    Load a frame from the dataset.

    Args:
        video_id: Video ID (e.g., "000" for original, "003" for fake)
        is_fake: True if loading fake video, False if loading original
        method: Manipulation method (Deepfakes, Face2Face, FaceSwap)
        original_id: Original video ID (needed for fake videos - folder name is original_fake)
    """
    if is_fake:
        # Fake videos are stored as "original_fake" (e.g., "000_003")
        if original_id is None:
            raise ValueError("original_id required for fake videos")
        folder_name = f"{original_id}_{video_id}"
        frames_dir = Path(DATASET_ROOT) / "manipulated_sequences" / method / "c23" / "frames" / folder_name
    else:
        # Original videos are stored by ID (e.g., "000")
        frames_dir = Path(DATASET_ROOT) / "original_sequences" / "youtube" / "c23" / "frames" / video_id

    # Get first available frame
    frame_files = sorted(frames_dir.glob("*.png"))
    if not frame_files:
        raise FileNotFoundError(f"No frames found in {frames_dir}")

    # Load first frame
    return Image.open(frame_files[0]).convert('RGB')

# Sample test set efficiently (to conserve compute units)
SAMPLE_SIZE = 1000  # Sample 1000 pairs = 2000 images
random.seed(42)
sampled_pairs = random.sample(test_pairs, min(SAMPLE_SIZE, len(test_pairs)))

print(f"\nSampled {len(sampled_pairs)} pairs for testing")
print(f"Total images to process: {len(sampled_pairs) * 2}")
if len(test_pairs) < SAMPLE_SIZE:
    print(f"⚠️  NOTE: only {len(test_pairs)} pairs available, so all of them are used")

In [None]:
# Prepare dataset
from tqdm import tqdm

test_data = []
manipulation_methods = ["Deepfakes", "Face2Face", "FaceSwap"]

print("\nLoading test images...")
errors = 0

for original_id, fake_id in tqdm(sampled_pairs, desc="Loading images"):
    try:
        # Load original (REAL)
        original_img = load_frame(original_id, is_fake=False)
        test_data.append({
            'image_299': transform_xception(original_img),  # For Xception/F3Net
            'image_224': transform_effort(original_img),     # For Effort-CLIP
            'label': 0,  # 0 = REAL
            'video_id': original_id
        })
        
        # Load fake - try different methods until one works
        # Folder name format: original_fake (e.g., "953_974")
        fake_loaded = False
        for method in manipulation_methods:
            try:
                fake_img = load_frame(fake_id, is_fake=True, method=method, original_id=original_id)
                test_data.append({
                    'image_299': transform_xception(fake_img),
                    'image_224': transform_effort(fake_img),
                    'label': 1,  # 1 = FAKE
                    'video_id': f"{original_id}_{fake_id}_{method}"
                })
                fake_loaded = True
                break
            except FileNotFoundError:
                continue
        
        if not fake_loaded:
            errors += 1
            
    except Exception as e:
        errors += 1
        continue

print(f"\nLoaded {len(test_data)} images successfully")
print(f"Errors: {errors}")
print(f"Real images: {sum(1 for d in test_data if d['label'] == 0)}")
print(f"Fake images: {sum(1 for d in test_data if d['label'] == 1)}")

## Step 7: Evaluate Individual Models

In [None]:
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score

def evaluate_model(model, test_data, model_name: str, image_size: int = 299):
    """
    Evaluate a single model on test data.
    
    Args:
        model: The model to evaluate
        test_data: List of test samples
        model_name: Name for display
        image_size: Image size to use (299 for Xception/F3Net, 224 for Effort)
    
    Returns:
        predictions: List of fake probabilities
        labels: List of ground truth labels
        metrics: Dict of metrics
    """
    predictions = []
    labels = []
    
    print(f"\n{'='*60}")
    print(f"Evaluating {model_name}")
    print(f"{'='*60}")
    
    # Select correct image size key
    image_key = f'image_{image_size}'
    
    for item in tqdm(test_data, desc=f"{model_name} inference"):
        image_tensor = item[image_key].unsqueeze(0)  # Add batch dimension
        label = item['label']
        
        # Predict
        fake_prob, real_prob = model.predict(image_tensor)
        
        predictions.append(fake_prob)
        labels.append(label)
    
    # Convert to numpy
    predictions = np.array(predictions)
    labels = np.array(labels)
    
    # Calculate binary predictions (threshold = 0.5)
    binary_preds = (predictions > 0.5).astype(int)
    
    # Calculate metrics
    metrics = {
        'accuracy': accuracy_score(labels, binary_preds),
        'precision': precision_score(labels, binary_preds, zero_division=0),
        'recall': recall_score(labels, binary_preds, zero_division=0),
        'f1': f1_score(labels, binary_preds, zero_division=0),
        'auc': roc_auc_score(labels, predictions)
    }
    
    # Print metrics
    print(f"\nResults:")
    print(f"  Accuracy:  {metrics['accuracy']:.4f}")
    print(f"  Precision: {metrics['precision']:.4f}")
    print(f"  Recall:    {metrics['recall']:.4f}")
    print(f"  F1 Score:  {metrics['f1']:.4f}")
    print(f"  AUC:       {metrics['auc']:.4f}")
    
    return predictions, labels, metrics
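`predict()` reads only `probs[0]`, so `evaluate_model` runs one image per forward pass, which is an expensive way to spend compute units. The sketch below is a batched variant. It assumes a callable that maps a `[B, C, H, W]` tensor to `[B, 2]` logits with class 1 = fake, the convention used above. `batched_fake_probs` is a hypothetical helper. For Xception the callable could be `xception.model`, and for Effort-CLIP `lambda x: effort.classifier(effort.model(pixel_values=x).pooler_output)`. For F3Net, the same 12-channel input preparation as in `predict()` has to be applied first.

```python
from typing import Callable, List, Sequence

import numpy as np
import torch


@torch.no_grad()
def batched_fake_probs(
    logits_fn: Callable[[torch.Tensor], torch.Tensor],
    images: Sequence[torch.Tensor],
    batch_size: int = 32,
    device: str = "cpu",
) -> np.ndarray:
    """Return P(fake) for each [C, H, W] image, running `batch_size` images per forward pass."""
    probs: List[np.ndarray] = []
    for start in range(0, len(images), batch_size):
        batch = torch.stack(list(images[start:start + batch_size])).to(device)
        logits = logits_fn(batch)
        probs.append(torch.softmax(logits, dim=1)[:, 1].float().cpu().numpy())
    return np.concatenate(probs) if probs else np.empty(0, dtype=np.float32)
```

Example: `batched_fake_probs(xception.model, [d['image_299'] for d in test_data], device='cuda')`. The model must already be in `eval()` mode, as the wrapper classes ensure. The batch size is limited by GPU memory.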

In [None]:
# Evaluate all 3 models (use correct image size for each)
xception_preds, labels, xception_metrics = evaluate_model(xception, test_data, "Xception", image_size=299)
f3net_preds, _, f3net_metrics = evaluate_model(f3net, test_data, "F3Net", image_size=299)
effort_preds, _, effort_metrics = evaluate_model(effort, test_data, "Effort-CLIP", image_size=224)

## Step 8: Grid Search for Optimal Ensemble Weights

In [None]:
from itertools import product

# Check if Effort-CLIP is working (AUC > 0.6)
effort_working = effort_metrics['auc'] > 0.6

if not effort_working:
    print("⚠️  WARNING: Effort-CLIP is close to chance level (AUC <= 0.6)")
    print("⚠️  Proceeding with 2-model ensemble (Xception + F3Net only)\n")
    
    # 2-model optimization
    weight_range = np.arange(0.0, 1.1, 0.1)
    
    print(f"\n{'='*60}")
    print("GRID SEARCH FOR OPTIMAL ENSEMBLE WEIGHTS (2 MODELS)")
    print(f"{'='*60}\n")
    
    best_f1 = -1.0  # ensures some combination is recorded even if every F1 is 0
    best_weights = None
    best_metrics = None
    results = []
    
    total_combinations = 0
    for w_xception in weight_range:
        w_f3net = round(1.0 - w_xception, 1)
        
        if w_f3net < 0 or w_f3net > 1:
            continue
        
        total_combinations += 1
        
        # Ensemble predictions (2 models only)
        ensemble_preds = (
            w_xception * xception_preds +
            w_f3net * f3net_preds
        )
        
        # Binary predictions
        binary_preds = (ensemble_preds > 0.5).astype(int)
        
        # Calculate metrics
        f1 = f1_score(labels, binary_preds, zero_division=0)
        accuracy = accuracy_score(labels, binary_preds)
        precision = precision_score(labels, binary_preds, zero_division=0)
        recall = recall_score(labels, binary_preds, zero_division=0)
        auc = roc_auc_score(labels, ensemble_preds)
        
        results.append({
            'w_xception': w_xception,
            'w_f3net': w_f3net,
            'w_effort': 0.0,
            'f1': f1,
            'accuracy': accuracy,
            'precision': precision,
            'recall': recall,
            'auc': auc
        })
        
        # Update best
        if f1 > best_f1:
            best_f1 = f1
            best_weights = (w_xception, w_f3net, 0.0)
            best_metrics = {
                'accuracy': accuracy,
                'precision': precision,
                'recall': recall,
                'f1': f1,
                'auc': auc
            }

else:
    print("✅ All 3 models working well - optimizing 3-model ensemble\n")
    
    # 3-model optimization
    weight_range = np.arange(0.0, 1.1, 0.1)
    
    print(f"\n{'='*60}")
    print("GRID SEARCH FOR OPTIMAL ENSEMBLE WEIGHTS (3 MODELS)")
    print(f"{'='*60}\n")
    
    best_f1 = -1.0  # ensures some combination is recorded even if every F1 is 0
    best_weights = None
    best_metrics = None
    results = []
    
    total_combinations = 0
    for w_xception in weight_range:
        for w_f3net in weight_range:
            for w_effort in weight_range:
                # Weights must sum to 1.0 (with tolerance)
                if abs(w_xception + w_f3net + w_effort - 1.0) > 0.01:
                    continue
                
                total_combinations += 1
                
                # Ensemble predictions (weighted average)
                ensemble_preds = (
                    w_xception * xception_preds +
                    w_f3net * f3net_preds +
                    w_effort * effort_preds
                )
                
                # Binary predictions
                binary_preds = (ensemble_preds > 0.5).astype(int)
                
                # Calculate metrics
                f1 = f1_score(labels, binary_preds, zero_division=0)
                accuracy = accuracy_score(labels, binary_preds)
                precision = precision_score(labels, binary_preds, zero_division=0)
                recall = recall_score(labels, binary_preds, zero_division=0)
                auc = roc_auc_score(labels, ensemble_preds)
                
                results.append({
                    'w_xception': w_xception,
                    'w_f3net': w_f3net,
                    'w_effort': w_effort,
                    'f1': f1,
                    'accuracy': accuracy,
                    'precision': precision,
                    'recall': recall,
                    'auc': auc
                })
                
                # Update best
                if f1 > best_f1:
                    best_f1 = f1
                    best_weights = (w_xception, w_f3net, w_effort)
                    best_metrics = {
                        'accuracy': accuracy,
                        'precision': precision,
                        'recall': recall,
                        'f1': f1,
                        'auc': auc
                    }

print(f"Tested {total_combinations} weight combinations\n")
print(f"{'='*60}")
print("OPTIMAL ENSEMBLE WEIGHTS FOUND")
print(f"{'='*60}")
print(f"\nWeights:")
print(f"  Xception:    {best_weights[0]:.2f}")
print(f"  F3Net:       {best_weights[1]:.2f}")
print(f"  Effort-CLIP: {best_weights[2]:.2f}")
print(f"\nMetrics:")
print(f"  Accuracy:  {best_metrics['accuracy']:.4f}")
print(f"  Precision: {best_metrics['precision']:.4f}")
print(f"  Recall:    {best_metrics['recall']:.4f}")
print(f"  F1 Score:  {best_metrics['f1']:.4f}")
print(f"  AUC:       {best_metrics['auc']:.4f}")
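Both branches above search the same space: weight vectors on a 0.1 grid that sum to 1, scored by F1 at a 0.5 threshold. A version that works for any number of models can enumerate that simplex with integer steps. This also avoids floating-point artifacts such as `3 * 0.1 == 0.30000000000000004`. This is a sketch with hypothetical names (`simplex_weights`, `grid_search`). Like the notebook, it selects and scores weights on the same sample, so the best F1 is an optimistic estimate.

```python
from itertools import product
from typing import Dict, List, Tuple

import numpy as np
from sklearn.metrics import f1_score


def simplex_weights(n_models: int, steps: int = 10) -> List[Tuple[float, ...]]:
    """All weight vectors on a 1/steps grid whose entries sum to exactly 1."""
    return [
        tuple(c / steps for c in combo)
        for combo in product(range(steps + 1), repeat=n_models)
        if sum(combo) == steps
    ]


def grid_search(preds: Dict[str, np.ndarray], labels: np.ndarray,
                steps: int = 10, threshold: float = 0.5):
    """Return (best_weights_by_name, best_f1) for a weighted average of fake probabilities."""
    names = list(preds)
    stacked = np.stack([preds[n] for n in names])  # [n_models, n_samples]
    best_w, best_f1 = None, -1.0
    for w in simplex_weights(len(names), steps):
        ensemble = np.asarray(w) @ stacked
        f1 = f1_score(labels, (ensemble > threshold).astype(int), zero_division=0)
        if f1 > best_f1:
            best_w, best_f1 = dict(zip(names, w)), f1
    return best_w, best_f1
```

With three models and `steps=10`, this enumerates 66 combinations. `grid_search({'xception': xception_preds, 'f3net': f3net_preds, 'effort': effort_preds}, labels)` returns the best weights keyed by model name.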

## Step 9: Generate Optimized Config

In [None]:
# Create optimized config
optimized_config = {
    "models": {
        "xception": {
            "name": "xception",
            "path": "app/models/weights/xception_best.pth",
            "description": "Fast and reliable baseline",
            "weight": round(best_weights[0], 2),
            "enabled": True
        },
        "efficientnet_b4": {
            "name": "tf_efficientnet_b4",
            "path": "app/models/weights/effnb4_best.pth",
            "description": "Balanced performance (DISABLED: incompatible checkpoint format)",
            "weight": 0.0,
            "enabled": False
        },
        "f3net": {
            "name": "f3net",
            "path": "app/models/weights/f3net_best.pth",
            "description": "Frequency-aware network with spatial attention",
            "weight": round(best_weights[1], 2),
            "enabled": True
        },
        "effort": {
            "name": "effort_clip",
            "path": "app/models/weights/effort_clip_L14_trainOn_FaceForensic.pth",
            "description": "CLIP-based multimodal detection",
            "weight": round(best_weights[2], 2),
            "enabled": bool(best_weights[2] > 0.0)  # Auto-disable if weight is 0; bool() because np.bool_ is not JSON-serializable
        }
    },
    "ensemble": {
        "method": "weighted_average",
        "threshold": 0.5,
        "min_models": 2
    },
    "device": "cuda",
    "face_detection": {
        "min_confidence": 0.85,
        "min_face_size": 40
    },
    "inference": {
        "batch_size": 1,
        "generate_gradcam": False
    },
    "optimization_metadata": {
        "dataset": "FaceForensics++ c23",
        "test_samples": len(test_data),
        "optimization_method": "grid_search",
        "individual_metrics": {
            "xception": xception_metrics,
            "f3net": f3net_metrics,
            "effort": effort_metrics
        },
        "ensemble_metrics": best_metrics
    }
}

# Save config
with open('config_optimized.json', 'w') as f:
    json.dump(optimized_config, f, indent=2)

print("\n✅ Optimized config saved to 'config_optimized.json'")
print("\nDownload this file and replace backend/app/config.json")

# Show configuration summary
print("\n" + "="*60)
print("CONFIGURATION SUMMARY")
print("="*60)
print(f"\nEnabled Models: {sum(1 for m in optimized_config['models'].values() if m['enabled'])}/4")
for model_key, model_cfg in optimized_config['models'].items():
    if model_cfg['enabled']:
        print(f"  ✅ {model_cfg['name']}: weight={model_cfg['weight']:.2f}")
    else:
        print(f"  ❌ {model_cfg['name']}: DISABLED")
print(f"\nPerformance on the tuning sample (optimistic: weights were selected on this same data):")
print(f"  F1 Score:  {best_metrics['f1']:.2%}")
print(f"  Accuracy:  {best_metrics['accuracy']:.2%}")
print(f"  AUC:       {best_metrics['auc']:.2%}")
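`json.dump` only accepts built-in types. `np.float64` happens to subclass Python `float`, but `np.bool_`, `np.float32` and `np.int64` do not, and neither do arrays. Comparing an `np.float64` weight with `0.0`, for example, produces an `np.bool_`. A `default=` hook converts these values generically. `numpy_default` is a hypothetical name; pass it as `json.dump(optimized_config, f, indent=2, default=numpy_default)`.

```python
import json

import numpy as np


def numpy_default(obj):
    """json.dump fallback: convert NumPy scalars/arrays to built-in Python types."""
    if isinstance(obj, np.generic):  # np.bool_, np.int64, np.float32, ...
        return obj.item()
    if isinstance(obj, np.ndarray):
        return obj.tolist()
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


payload = {"enabled": np.float64(0.3) > 0.0, "weight": np.float32(0.25), "n": np.int64(7)}
print(json.dumps(payload, default=numpy_default))  # {"enabled": true, "weight": 0.25, "n": 7}
```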

## Step 10: Summary and Top 10 Results

In [None]:
import pandas as pd

# Create DataFrame
df_results = pd.DataFrame(results)

# Sort by F1 score
df_results = df_results.sort_values('f1', ascending=False)

print("\n" + "="*80)
print("TOP 10 ENSEMBLE CONFIGURATIONS")
print("="*80 + "\n")
print(df_results.head(10).to_string(index=False))

print("\n" + "="*80)
print("INDIVIDUAL MODEL PERFORMANCE")
print("="*80 + "\n")
print(f"Xception:")
print(f"  Accuracy: {xception_metrics['accuracy']:.4f}")
print(f"  F1:       {xception_metrics['f1']:.4f}")
print(f"  AUC:      {xception_metrics['auc']:.4f}\n")

print(f"F3Net:")
print(f"  Accuracy: {f3net_metrics['accuracy']:.4f}")
print(f"  F1:       {f3net_metrics['f1']:.4f}")
print(f"  AUC:      {f3net_metrics['auc']:.4f}\n")

print(f"Effort-CLIP:")
print(f"  Accuracy: {effort_metrics['accuracy']:.4f}")
print(f"  F1:       {effort_metrics['f1']:.4f}")
print(f"  AUC:      {effort_metrics['auc']:.4f}\n")

print(f"Ensemble (Optimized):")
print(f"  Weights:  Xception={best_weights[0]:.2f}, F3Net={best_weights[1]:.2f}, Effort={best_weights[2]:.2f}")
print(f"  Accuracy: {best_metrics['accuracy']:.4f}")
print(f"  F1:       {best_metrics['f1']:.4f}")
print(f"  AUC:      {best_metrics['auc']:.4f}")

## Next Steps:

1. **Download `config_optimized.json`** from Colab
2. **Replace** `backend/app/config.json` with the optimized version
3. **Restart backend server**:
   ```bash
   cd backend
   python -m uvicorn app.main:app --reload
   ```
4. **Test** with real images to verify improvements

---

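**Note:** If accuracy is still ~50%, the checkpoints are not being used the way they were trained. Before concluding that the weights are incompatible, check the preprocessing: whether the frames are face crops like the training data, the input resolution, and the normalization. Also check the skipped F3Net `FAD_head` and the skipped Effort-CLIP residual weights. If the problem persists, you'll need to train models from scratch, for example on Kaggle Notebooks (free GPU quota of about 30 hrs/week).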