# 🎯 Complete Improved Multimodal Deepfake Detection - All Datasets

## 🆕 Key Improvements in This Notebook

### ✅ Addressed Issues:
1. **Class Balancing (Highest Priority)** - Multiple strategies implemented
2. **Focal Loss** - Addresses hard examples and class imbalance
3. **Class Weights** - Automatic computation for each dataset
4. **Threshold Tuning** - Optimized decision thresholds
5. **Comprehensive Results Analysis** - Detailed metrics and visualizations

### 📊 Datasets Used:
1. **Deepfake Image Detection Dataset** (Images)
2. **FaceForensics++** (Videos/Images)
3. **Celeb-DF V2** (Videos)
4. **FakeAVCeleb** (Audio-Visual)
5. **DFD** (Videos)
6. **DeepFake_Audio Dataset** (Audio)

### 🎓 Techniques Applied:
- **Data Augmentation** for minority class
- **Focal Loss** (γ=2, α=0.25)
- **Class Weights** (sklearn.utils.class_weight)
- **Threshold Optimization** (F1, Youden's J)
- **SMOTE** for severe imbalance
- **Stratified Sampling** throughout
- **Early Fusion + Late Fusion** multimodal approaches
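
As a rough illustration of the threshold optimization listed above, the sketch below scans candidate thresholds over a set of predicted probabilities and keeps the one that maximizes F1 and the one that maximizes Youden's J (sensitivity + specificity - 1). The helper names (`confusion_counts`, `f1_at`, `youden_j_at`, `best_threshold`) and the toy scores are assumptions made for illustration; they are not taken from this notebook.

```python
def confusion_counts(y_true, y_prob, threshold):
    """Return (tp, fp, tn, fn) when predicting 1 for prob >= threshold."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def f1_at(y_true, y_prob, threshold):
    """F1 score of the positive class at the given threshold."""
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def youden_j_at(y_true, y_prob, threshold):
    """Youden's J = true positive rate + true negative rate - 1."""
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return tpr + tnr - 1.0


def best_threshold(y_true, y_prob, metric):
    """Use the observed scores as candidate thresholds; return the best one."""
    candidates = sorted(set(y_prob))
    return max(candidates, key=lambda t: metric(y_true, y_prob, t))


# Toy validation set (hypothetical values): 1 = fake, 0 = real.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]
y_prob = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.45, 0.60, 0.80, 0.55]

t_f1 = best_threshold(y_true, y_prob, f1_at)
t_j = best_threshold(y_true, y_prob, youden_j_at)
print(f"best F1 threshold: {t_f1}, best Youden's J threshold: {t_j}")
print(f"F1 at tuned threshold: {f1_at(y_true, y_prob, t_f1):.3f}, "
      f"F1 at 0.5: {f1_at(y_true, y_prob, 0.5):.3f}")
```

On this toy data both criteria select 0.45, which gives a higher F1 than the default 0.5 cutoff. On real data the two criteria can disagree, and the threshold should be tuned on a validation split rather than on the test set.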

## 📦 Installation & Setup

In [1]:
# Install required packages
!pip install torch torchvision torchaudio
!pip install opencv-python librosa soundfile
!pip install scikit-learn imbalanced-learn
!pip install matplotlib seaborn pandas numpy
!pip install pillow tqdm facenet-pytorch
!pip install timm efficientnet-pytorch

Collecting facenet-pytorch
  Downloading facenet_pytorch-2.6.0-py3-none-any.whl.metadata (12 kB)
Collecting numpy<2.0.0,>=1.24.0 (from facenet-pytorch)
  Downloading numpy-1.26.4-cp310-cp310-win_amd64.whl.metadata (61 kB)
Collecting pillow
  Downloading pillow-10.2.0-cp310-cp310-win_amd64.whl.metadata (9.9 kB)
Collecting torch<2.3.0,>=2.2.0 (from facenet-pytorch)
  Downloading torch-2.2.2-cp310-cp310-win_amd64.whl.metadata (26 kB)
Collecting torchvision<0.18.0,>=0.17.0 (from facenet-pytorch)
  Downloading torchvision-0.17.2-cp310-cp310-win_amd64.whl.metadata (6.6 kB)
Downloading facenet_pytorch-2.6.0-py3-none-any.whl (1.9 MB)
   ---------------------------------------- 0.0/1.9 MB ? eta -:--:--
   ---------------------------------------- 1.9/1.9 MB 20.8 MB/s  0:00:00
Downloading pillow-10.2.0-cp310-cp310-win_amd64.whl (2.6 MB)
   ---------------------------------------- 0.0/2.6 MB ? eta -:--:--
   ---------------------------------------- 2.6/2.6 MB 29.9 MB/s  0:00:00
Downloading numpy-1

  You can safely remove it manually.
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
opencv-python 4.12.0.88 requires numpy<2.3.0,>=2; python_version >= "3.9", but you have numpy 1.26.4 which is incompatible.
torchaudio 2.9.1+cu128 requires torch==2.9.1, but you have torch 2.2.2 which is incompatible.


Collecting efficientnet-pytorch
  Downloading efficientnet_pytorch-0.7.1.tar.gz (21 kB)
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Building wheels for collected packages: efficientnet-pytorch
  Building wheel for efficientnet-pytorch (pyproject.toml): started
  Building wheel for efficientnet-pytorch (pyproject.toml): finished with status 'done'
  Created wheel for efficientnet-pytorch: filename=efficientnet_pytorch-0.7.1-py3-none-any.whl size=16519 sha256=8d5525e96904f454c325c9d21f5fcaafd7987db04309a25e996b9ef03641060d
  Stored in directory: c:\users\akshay-stu\appdata\local\pip\cache\wheels\03\3f\e9\911b1bc46869644912bda90a56bcf7b960f20b5187feea3baf
Successfully built efficientnet-pytorch
Installing co

In [4]:
import torch
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA version: {torch.version.cuda}")
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")
else:
    print("⚠️ WARNING: CUDA not available - using CPU")

PyTorch version: 2.2.2+cpu
CUDA available: False


## 📚 Import Libraries

In [2]:
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from tqdm import tqdm
import warnings
warnings.filterwarnings('ignore')

# PyTorch
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader, WeightedRandomSampler
import torchvision.transforms as transforms
from torchvision import models

# Image/Video/Audio processing
import cv2
from PIL import Image
import librosa
import soundfile as sf

# Sklearn
from sklearn.model_selection import train_test_split, StratifiedKFold
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score,
    confusion_matrix, classification_report, roc_curve, auc,
    precision_recall_curve, average_precision_score
)
from sklearn.utils.class_weight import compute_class_weight
from imblearn.over_sampling import SMOTE, RandomOverSampler
from imblearn.under_sampling import RandomUnderSampler

# Set random seeds for reproducibility
torch.manual_seed(42)
np.random.seed(42)

# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")

Using device: cpu


## 📊 Dataset Statistics & Paths

### Current Dataset Distribution:

**Note:** The file counts for each dataset are computed and displayed by the cells below.

In [None]:
# Dataset paths configuration
DATASET_PATHS = {
    'deepfake_images': {
        'train_real': '../Deepfake image detection dataset/train-20250112T065955Z-001/train/real',
        'train_fake': '../Deepfake image detection dataset/train-20250112T065955Z-001/train/fake',
        'test_real': '../Deepfake image detection dataset/test-20250112T065939Z-001/test/real',
        'test_fake': '../Deepfake image detection dataset/test-20250112T065939Z-001/test/fake'
    },
    'faceforensics': {
        'original': '../FaceForensics++/FaceForensics++_C23/original',
        'deepfakes': '../FaceForensics++/FaceForensics++_C23/Deepfakes',
        'face2face': '../FaceForensics++/FaceForensics++_C23/Face2Face',
        'faceswap': '../FaceForensics++/FaceForensics++_C23/FaceSwap',
        'neuraltextures': '../FaceForensics++/FaceForensics++_C23/NeuralTextures'
    },
    'celebdf': {
        'celeb_real': '../Celeb V2/Celeb-real',
        'youtube_real': '../Celeb V2/YouTube-real',
        'celeb_synthesis': '../Celeb V2/Celeb-synthesis'
    },
    'dfd': {
        'original': '../DFD/DFD_original sequences',
        'manipulated': '../DFD/DFD_manipulated_sequences/DFD_manipulated_sequences'
    },
    'audio': {
        'real': '../DeepFake_AudioDataset/KAGGLE/AUDIO/REAL',
        'fake': '../DeepFake_AudioDataset/KAGGLE/AUDIO/FAKE'
    },
    'fakeavceleb': {
        'real_av': '../FakeAVCeleb/FakeAVCeleb_v1.2/FakeAVCeleb_v1.2/RealVideo-RealAudio',
        'fake_vv_aa': '../FakeAVCeleb/FakeAVCeleb_v1.2/FakeAVCeleb_v1.2/FakeVideo-FakeAudio',
        'fake_v_ra': '../FakeAVCeleb/FakeAVCeleb_v1.2/FakeAVCeleb_v1.2/FakeVideo-RealAudio',
        'fake_rv_a': '../FakeAVCeleb/FakeAVCeleb_v1.2/FakeAVCeleb_v1.2/RealVideo-FakeAudio'
    }
}

print("✅ Dataset paths configured successfully!")

## 🔧 Utility Functions

In [None]:
def count_files_in_directory(directory, extensions=None):
    """Count files in a directory with optional extension filter."""
    if not os.path.exists(directory):
        return 0
    
    if extensions is None:
        extensions = ['.jpg', '.jpeg', '.png', '.mp4', '.avi', '.mov', '.wav', '.mp3', '.flac']
    
    count = 0
    for root, dirs, files in os.walk(directory):
        for file in files:
            if any(file.lower().endswith(ext) for ext in extensions):
                count += 1
    return count

def get_dataset_statistics():
    """Get comprehensive statistics for all datasets."""
    stats = {}
    
    # Deepfake Images
    print("📊 Counting Deepfake Image Detection Dataset...")
    stats['deepfake_images'] = {
        'train_real': count_files_in_directory(DATASET_PATHS['deepfake_images']['train_real'], ['.jpg', '.jpeg', '.png']),
        'train_fake': count_files_in_directory(DATASET_PATHS['deepfake_images']['train_fake'], ['.jpg', '.jpeg', '.png']),
        'test_real': count_files_in_directory(DATASET_PATHS['deepfake_images']['test_real'], ['.jpg', '.jpeg', '.png']),
        'test_fake': count_files_in_directory(DATASET_PATHS['deepfake_images']['test_fake'], ['.jpg', '.jpeg', '.png'])
    }
    
    # FaceForensics++
    print("📊 Counting FaceForensics++...")
    stats['faceforensics'] = {
        'original': count_files_in_directory(DATASET_PATHS['faceforensics']['original']),
        'deepfakes': count_files_in_directory(DATASET_PATHS['faceforensics']['deepfakes']),
        'face2face': count_files_in_directory(DATASET_PATHS['faceforensics']['face2face']),
        'faceswap': count_files_in_directory(DATASET_PATHS['faceforensics']['faceswap']),
        'neuraltextures': count_files_in_directory(DATASET_PATHS['faceforensics']['neuraltextures'])
    }
    
    # Celeb-DF V2
    print("📊 Counting Celeb-DF V2...")
    stats['celebdf'] = {
        'celeb_real': count_files_in_directory(DATASET_PATHS['celebdf']['celeb_real']),
        'youtube_real': count_files_in_directory(DATASET_PATHS['celebdf']['youtube_real']),
        'celeb_synthesis': count_files_in_directory(DATASET_PATHS['celebdf']['celeb_synthesis'])
    }
    
    # DFD
    print("📊 Counting DFD...")
    stats['dfd'] = {
        'original': count_files_in_directory(DATASET_PATHS['dfd']['original']),
        'manipulated': count_files_in_directory(DATASET_PATHS['dfd']['manipulated'])
    }
    
    # Audio
    print("📊 Counting DeepFake Audio Dataset...")
    stats['audio'] = {
        'real': count_files_in_directory(DATASET_PATHS['audio']['real'], ['.wav', '.mp3', '.flac']),
        'fake': count_files_in_directory(DATASET_PATHS['audio']['fake'], ['.wav', '.mp3', '.flac'])
    }
    
    # FakeAVCeleb
    print("📊 Counting FakeAVCeleb...")
    stats['fakeavceleb'] = {
        'real_av': count_files_in_directory(DATASET_PATHS['fakeavceleb']['real_av']),
        'fake_vv_aa': count_files_in_directory(DATASET_PATHS['fakeavceleb']['fake_vv_aa']),
        'fake_v_ra': count_files_in_directory(DATASET_PATHS['fakeavceleb']['fake_v_ra']),
        'fake_rv_a': count_files_in_directory(DATASET_PATHS['fakeavceleb']['fake_rv_a'])
    }
    
    return stats

def print_dataset_statistics(stats):
    """Print formatted dataset statistics with class imbalance ratio."""
    print("\n" + "="*80)
    print("📊 COMPREHENSIVE DATASET STATISTICS")
    print("="*80)
    
    # Deepfake Images
    print("\n1️⃣  DEEPFAKE IMAGE DETECTION DATASET")
    train_real = stats['deepfake_images']['train_real']
    train_fake = stats['deepfake_images']['train_fake']
    test_real = stats['deepfake_images']['test_real']
    test_fake = stats['deepfake_images']['test_fake']
    print(f"   Train Real: {train_real:,}")
    print(f"   Train Fake: {train_fake:,}")
    print(f"   Test Real:  {test_real:,}")
    print(f"   Test Fake:  {test_fake:,}")
    print(f"   Total:      {train_real + train_fake + test_real + test_fake:,}")
    if train_fake > 0:
        print(f"   ⚠️  Train Imbalance Ratio (Real:Fake): {train_real/train_fake:.2f}:1")
    
    # FaceForensics++
    print("\n2️⃣  FACEFORENSICS++")
    ff_original = stats['faceforensics']['original']
    ff_deepfakes = stats['faceforensics']['deepfakes']
    ff_face2face = stats['faceforensics']['face2face']
    ff_faceswap = stats['faceforensics']['faceswap']
    ff_neural = stats['faceforensics']['neuraltextures']
    ff_total_fake = ff_deepfakes + ff_face2face + ff_faceswap + ff_neural
    print(f"   Original (Real):      {ff_original:,}")
    print(f"   Deepfakes (Fake):     {ff_deepfakes:,}")
    print(f"   Face2Face (Fake):     {ff_face2face:,}")
    print(f"   FaceSwap (Fake):      {ff_faceswap:,}")
    print(f"   NeuralTextures (Fake): {ff_neural:,}")
    print(f"   Total Fake:           {ff_total_fake:,}")
    print(f"   Total:                {ff_original + ff_total_fake:,}")
    if ff_total_fake > 0:
        print(f"   ⚠️  Imbalance Ratio (Real:Fake): {ff_original/ff_total_fake:.2f}:1")
    
    # Celeb-DF V2
    print("\n3️⃣  CELEB-DF V2")
    celeb_real = stats['celebdf']['celeb_real']
    youtube_real = stats['celebdf']['youtube_real']
    celeb_fake = stats['celebdf']['celeb_synthesis']
    total_real = celeb_real + youtube_real
    print(f"   Celeb-real:           {celeb_real:,}")
    print(f"   YouTube-real:         {youtube_real:,}")
    print(f"   Total Real:           {total_real:,}")
    print(f"   Celeb-synthesis (Fake): {celeb_fake:,}")
    print(f"   Total:                {total_real + celeb_fake:,}")
    if celeb_fake > 0:
        print(f"   ⚠️  Imbalance Ratio (Real:Fake): {total_real/celeb_fake:.2f}:1")
    
    # DFD
    print("\n4️⃣  DFD (DEEPFAKE DETECTION)")
    dfd_real = stats['dfd']['original']
    dfd_fake = stats['dfd']['manipulated']
    print(f"   Original (Real):      {dfd_real:,}")
    print(f"   Manipulated (Fake):   {dfd_fake:,}")
    print(f"   Total:                {dfd_real + dfd_fake:,}")
    if dfd_fake > 0:
        print(f"   ⚠️  Imbalance Ratio (Real:Fake): {dfd_real/dfd_fake:.2f}:1")
    
    # Audio
    print("\n5️⃣  DEEPFAKE AUDIO DATASET")
    audio_real = stats['audio']['real']
    audio_fake = stats['audio']['fake']
    print(f"   Real Audio:           {audio_real:,}")
    print(f"   Fake Audio:           {audio_fake:,}")
    print(f"   Total:                {audio_real + audio_fake:,}")
    if audio_fake > 0:
        print(f"   ⚠️  Imbalance Ratio (Real:Fake): {audio_real/audio_fake:.2f}:1")
    
    # FakeAVCeleb
    print("\n6️⃣  FAKEAVCELEB (AUDIO-VISUAL)")
    fav_real = stats['fakeavceleb']['real_av']
    fav_fake1 = stats['fakeavceleb']['fake_vv_aa']
    fav_fake2 = stats['fakeavceleb']['fake_v_ra']
    fav_fake3 = stats['fakeavceleb']['fake_rv_a']
    fav_total_fake = fav_fake1 + fav_fake2 + fav_fake3
    print(f"   RealVideo-RealAudio (Real):  {fav_real:,}")
    print(f"   FakeVideo-FakeAudio (Fake):  {fav_fake1:,}")
    print(f"   FakeVideo-RealAudio (Fake):  {fav_fake2:,}")
    print(f"   RealVideo-FakeAudio (Fake):  {fav_fake3:,}")
    print(f"   Total Fake:                  {fav_total_fake:,}")
    print(f"   Total:                       {fav_real + fav_total_fake:,}")
    if fav_total_fake > 0:
        print(f"   ⚠️  Imbalance Ratio (Real:Fake): {fav_real/fav_total_fake:.2f}:1")
    
    print("\n" + "="*80)

# Get and display statistics
print("🔍 Analyzing datasets... This may take a few minutes...")
dataset_stats = get_dataset_statistics()
print_dataset_statistics(dataset_stats)
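
The real/fake counts gathered above can be turned into class weights for the loss. The sketch below assumes the "balanced" heuristic documented for `sklearn.utils.class_weight.compute_class_weight`, that is `n_samples / (n_classes * class_count)`. The helper name `balanced_class_weights` and the counts are hypothetical placeholders, not values measured on these datasets.

```python
def balanced_class_weights(counts):
    """Map each class label to n_samples / (n_classes * class_count)."""
    if any(n <= 0 for n in counts.values()):
        raise ValueError("every class needs at least one sample")
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {label: n_samples / (n_classes * n) for label, n in counts.items()}


# Hypothetical counts: 1,000 real files vs 4,000 fake files (Real:Fake = 0.25:1).
counts = {"real": 1000, "fake": 4000}
weights = balanced_class_weights(counts)
print(weights)  # {'real': 2.5, 'fake': 0.625}
```

The minority class receives the larger weight, so each of its samples contributes more to the loss. The same per-class values, looked up per sample, are one plausible input for the `WeightedRandomSampler` imported earlier.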

## 🎯 Focal Loss Implementation

Focal Loss addresses class imbalance by down-weighting easy examples and focusing on hard examples.

**Formula:** `FL(p_t) = -α_t * (1 - p_t)^γ * log(p_t)`

Where:
- `α_t`: Balancing factor for class imbalance, equal to `α` for the positive class and `1 - α` for the negative class (default `α`: 0.25)
- `γ`: Focusing parameter (default: 2.0)
- `p_t`: Model's estimated probability for the true class
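
To make the effect of the focusing term concrete, the sketch below evaluates the formula directly for one easy and one hard example. It is plain Python arithmetic, not the PyTorch implementation; the function names `focal_loss_value` and `cross_entropy_value` and the two probabilities are assumptions chosen for illustration.

```python
import math


def cross_entropy_value(p_t):
    """Standard cross-entropy for a single example: -log(p_t)."""
    return -math.log(p_t)


def focal_loss_value(p_t, alpha_t=0.25, gamma=2.0):
    """FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t) for a single example."""
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# An easy example (true class predicted with 0.9) and a hard one (0.1).
for name, p_t in [("easy", 0.9), ("hard", 0.1)]:
    ce = cross_entropy_value(p_t)
    fl = focal_loss_value(p_t)
    factor = (1.0 - p_t) ** 2.0
    print(f"{name}: p_t={p_t}, CE={ce:.4f}, FL={fl:.6f}, (1 - p_t)^2={factor:.2f}")
```

With `γ = 2`, the easy example's cross-entropy is scaled by about 0.01 and the hard example's by about 0.81, so hard examples dominate the total loss. Setting `gamma=0` and `alpha_t=1` reduces the expression to ordinary cross-entropy.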

In [None]:
class FocalLoss(nn.Module):
    """Binary focal loss for addressing class imbalance.
    
    Expects raw logits and targets in {0, 1} with matching shapes.
    Reference: Lin et al. "Focal Loss for Dense Object Detection" (2017)
    """
    def __init__(self, alpha=0.25, gamma=2.0, reduction='mean'):
        super().__init__()
        self.alpha = alpha
        self.gamma = gamma
        self.reduction = reduction
    
    def forward(self, inputs, targets):
        targets = targets.float()
        bce_loss = F.binary_cross_entropy_with_logits(inputs, targets, reduction='none')
        pt = torch.exp(-bce_loss)  # Probability assigned to the true class
        # alpha_t: alpha for positives (target 1), 1 - alpha for negatives (target 0)
        alpha_t = self.alpha * targets + (1 - self.alpha) * (1 - targets)
        focal_loss = alpha_t * (1 - pt) ** self.gamma * bce_loss
        
        if self.reduction == 'mean':
            return focal_loss.mean()
        elif self.reduction == 'sum':
            return focal_loss.sum()
        else:
            return focal_loss

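class WeightedFocalLoss(nn.Module):
    """Binary focal loss with per-class weights for severe imbalance.
    
    `weight` is an optional 1-D sequence indexed by class label:
    [weight_for_class_0, weight_for_class_1].
    """
    def __init__(self, alpha=0.25, gamma=2.0, weight=None, reduction='mean'):
        super().__init__()
        self.alpha = alpha
        self.gamma = gamma
        # Accept lists, NumPy arrays, or tensors (e.g. output of compute_class_weight)
        self.weight = None if weight is None else torch.as_tensor(weight, dtype=torch.float32)
        self.reduction = reduction
    
    def forward(self, inputs, targets):
        targets = targets.float()
        bce_loss = F.binary_cross_entropy_with_logits(inputs, targets, reduction='none')
        pt = torch.exp(-bce_loss)  # Probability assigned to the true class
        # alpha_t: alpha for positives (target 1), 1 - alpha for negatives (target 0)
        alpha_t = self.alpha * targets + (1 - self.alpha) * (1 - targets)
        focal_loss = alpha_t * (1 - pt) ** self.gamma * bce_loss
        
        if self.weight is not None:
            # Look up each sample's class weight on the same device as the logits
            weights = self.weight.to(inputs.device)[targets.long()]
            focal_loss = focal_loss * weights
        
        if self.reduction == 'mean':
            return focal_loss.mean()
        elif self.reduction == 'sum':
            return focal_loss.sum()
        else:
            return focal_loss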

print("✅ Focal Loss implemented successfully!")
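
Both classes above rely on the identity `p_t = exp(-BCE)`: binary cross-entropy on a logit equals `-log(p_t)`, so exponentiating its negative recovers the probability assigned to the true class. The check below verifies this in plain Python. The helper `bce_with_logits` is hand-written for illustration and is only assumed to mirror the per-element value of `F.binary_cross_entropy_with_logits`; it is not the PyTorch function.

```python
import math


def sigmoid(x):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def bce_with_logits(logit, target):
    """Per-element binary cross-entropy computed from a raw logit."""
    p = sigmoid(logit)
    return -(target * math.log(p) + (1 - target) * math.log(1.0 - p))


for logit, target in [(2.0, 1), (2.0, 0), (-1.5, 1), (-1.5, 0)]:
    p = sigmoid(logit)
    p_t = p if target == 1 else 1.0 - p  # probability of the true class
    recovered = math.exp(-bce_with_logits(logit, target))
    assert math.isclose(recovered, p_t, rel_tol=1e-9)
    print(f"logit={logit:+.1f}, target={target}: p_t={p_t:.4f}, exp(-BCE)={recovered:.4f}")
```

Computing `p_t` this way avoids a separate sigmoid and branch on the target, which is presumably why the loss classes reuse the BCE value.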