# Building a Deepfake Detector using Machine Learning Models
This notebook demonstrates the development of a deepfake detection system that combines several pre-trained **CNN (Convolutional Neural Network)** models (**ResNet**, **EfficientNet** and **Xception**) with an **LSTM (Long Short-Term Memory)** network for temporal analysis. The datasets used are **FaceForensics++**, **DFDC** and **Celeb-DF (v2)**. `OpenCV` handles video frame extraction and preprocessing, and `dlib` handles face detection and cropping.

## 1. Importing Libraries and Setup
Importing all necessary libraries at the top keeps the notebook organized and makes the whole pipeline easier to debug and run.

In [1]:
import os
import sys
import cv2
import numpy as np
import pandas as pd
import shutil
import dlib
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix
from tensorflow.keras.preprocessing.image import img_to_array
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Model
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Input, Dense, Flatten, GlobalAveragePooling2D, Dropout, LSTM, TimeDistributed, Concatenate
from tensorflow.keras.applications import ResNet50, EfficientNetV2B0, Xception
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping, ModelCheckpoint, CSVLogger

### 1.1 Checking TensorFlow GPU Support and Listing Available GPUs

In [2]:
import tensorflow as tf

# Check if TensorFlow is built with CUDA support and list GPUs
print("TensorFlow CUDA Support:", tf.test.is_built_with_cuda())
gpus = tf.config.list_physical_devices('GPU')
print("Num GPUs Available:", len(gpus))

if gpus:
    for i, gpu in enumerate(gpus):
        print(f"GPU {i}: {tf.config.experimental.get_device_details(gpu)['device_name']}")
else:
    print("No GPU detected.")

TensorFlow CUDA Support: True
Num GPUs Available: 1
GPU 0: NVIDIA GeForce 940MX


### 1.2 Configuring GPU Memory Growth
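Enabling memory growth makes TensorFlow allocate GPU memory as it is needed instead of reserving all of it at startup, which helps avoid allocation failures on a small GPU.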

In [3]:
# Enable GPU memory growth to prevent allocation issues
physical_devices = tf.config.list_physical_devices('GPU')
if physical_devices:
    for gpu in physical_devices:
        tf.config.experimental.set_memory_growth(gpu, True)
        print(f"Enabled memory growth for: {tf.config.experimental.get_device_details(gpu)['device_name']}")
else:
    print("No GPU devices found. Ensure proper GPU setup.")

Enabled memory growth for: NVIDIA GeForce 940MX


## 2. Dataset Preparation
This section prepares the dataset: it extracts video frames, detects and crops faces, and then organizes the cropped faces into structured train and validation directories.

### 2.1 Defining Paths and Creating Directories for the Training Dataset

In [8]:
# Defining base directory where the dataset resides
base_dir = os.getcwd() # Current working directory where my Jupyter Notebook is located

# Defining paths for dataset directories
real_videos_dir = os.path.join(base_dir, "Datasets", "FaceForensic++", "real")
fake_videos_dir = os.path.join(base_dir, "Datasets", "FaceForensic++", "fake")

# Defining paths for cropped faces directories
real_faces_dir = os.path.join(base_dir, "Cropped_Faces", "real")
fake_faces_dir = os.path.join(base_dir, "Cropped_Faces", "fake")

# Defining paths for training and validation directories
train_dir = os.path.join(base_dir, "Cropped_Faces", "train")
val_dir = os.path.join(base_dir, "Cropped_Faces", "val")

# Creating necessary directories if they don’t already exist
os.makedirs(real_faces_dir, exist_ok=True)
os.makedirs(fake_faces_dir, exist_ok=True)
os.makedirs(os.path.join(train_dir, "real"), exist_ok=True)
os.makedirs(os.path.join(train_dir, "fake"), exist_ok=True)
os.makedirs(os.path.join(val_dir, "real"), exist_ok=True)
os.makedirs(os.path.join(val_dir, "fake"), exist_ok=True)

print("Directories for processing and output:")
print(f"Real Videos: {real_videos_dir}")
print(f"Fake Videos: {fake_videos_dir}")
print(f"Real Faces: {real_faces_dir}")
print(f"Fake Faces: {fake_faces_dir}")
print(f"Train Directory: {train_dir}")
print(f"Validation Directory: {val_dir}")

Directories for processing and output:
Real Videos: C:\Users\atul\Datasets\FaceForensic++\real
Fake Videos: C:\Users\atul\Datasets\FaceForensic++\fake
Real Faces: C:\Users\atul\Cropped_Faces\real
Fake Faces: C:\Users\atul\Cropped_Faces\fake
Train Directory: C:\Users\atul\Cropped_Faces\train
Validation Directory: C:\Users\atul\Cropped_Faces\val
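
The `train` and `val` directories created above start out empty. When they are filled, splitting at the level of whole videos rather than individual frames keeps near-identical frames from one clip out of both sets. The following is a minimal sketch of such a split, not necessarily the method this notebook uses later. `split_video_folders`, `val_fraction` and `seed` are hypothetical names, and the `FF_real<i>` folder names only mimic the naming scheme of the cropping step.

```python
import random

def split_video_folders(folder_names, val_fraction=0.2, seed=42):
    """Split per-video folder names into (train, val) lists.

    Whole folders are assigned to one side, so frames from the same
    video never end up in both sets.
    """
    names = sorted(folder_names)      # sort first so the result is reproducible
    rng = random.Random(seed)         # local RNG, does not touch global state
    rng.shuffle(names)
    n_val = int(round(len(names) * val_fraction))
    return names[n_val:], names[:n_val]

# Illustrative folder names (assumption: one folder per video)
folders = [f"FF_real{i}" for i in range(10)]
train_folders, val_folders = split_video_folders(folders)
print(len(train_folders), len(val_folders))  # 8 2
```

Each selected folder could then be copied into `train_dir` or `val_dir` with `shutil.copytree`. Running the split separately for the real and fake classes keeps the class balance similar on both sides.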


### 2.2 Face Detection and Cropping from Dataset for Train/Val

In [9]:
# Initialize the face detector
detector = dlib.get_frontal_face_detector()

def crop_faces(input_dir, output_dir, dataset_name, face_size=(224, 224), is_real=True):
    """
    Detects and crops faces from videos in the input directory.
    Cropped faces are saved in specific folders with unique names in the output directory.

    Args:
    - input_dir (str): Path to the directory containing videos.
    - output_dir (str): Path to the directory to save cropped face images.
    - dataset_name (str): Prefix for naming folders and files (e.g., dataset name).
    - face_size (tuple): Dimensions to resize each face (width, height).
    - is_real (bool): Indicates whether the videos are from the "real" or "fake" category.
    """
    # Check if input and output directories exist
    if not os.path.exists(input_dir):
        print(f"Input directory {input_dir} does not exist. Skipping.")
        return
    os.makedirs(output_dir, exist_ok=True)

    # Supported video formats
    supported_formats = (".mp4", ".avi", ".mkv", ".mov")

    # Initialize folder counter
    folder_counter = 0

    # Looping through each file in the input directory
    for file in os.listdir(input_dir):
        if file.lower().endswith(supported_formats):  # Process only supported video files
            video_name = os.path.splitext(file)[0]  # Extract the video name (without extension)

            # Generate a unique folder name based on the dataset name and category
            category = "real" if is_real else "fake"
            folder_name = f"{dataset_name}_{category}{folder_counter}"
            folder_counter += 1

            # Create the unique folder
            folder_path = os.path.join(output_dir, folder_name)

            # Skip already processed videos
            if os.path.exists(folder_path) and len(os.listdir(folder_path)) > 0:
                print(f"Skipping already processed video: {file}")
                continue

            os.makedirs(folder_path, exist_ok=True)

            video_path = os.path.join(input_dir, file)
            cap = cv2.VideoCapture(video_path)  # Open the video file

            if not cap.isOpened():
                print(f"Failed to open video {file}. Skipping.")
                continue

            frame_count = 0
            cropped_count = 0  # Counter for cropped faces

            # Looping through frames in the video
            while cap.isOpened():
                ret, frame = cap.read()  # Read a frame
                if not ret:  # Exit when no more frames
                    break

                frame_count += 1
                gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  # Convert frame to grayscale
                faces = detector(gray)  # Detect faces in the frame

                # Save each detected face
                for i, face in enumerate(faces):
                    # Clamp both corners to the frame. dlib can return boxes that extend
                    # past the frame edges, and right()/bottom() are inclusive, so add 1
                    # to get exclusive slice bounds.
                    x = max(0, face.left())
                    y = max(0, face.top())
                    x2 = min(frame.shape[1], face.right() + 1)
                    y2 = min(frame.shape[0], face.bottom() + 1)
                    w, h = x2 - x, y2 - y

                    if w <= 0 or h <= 0:  # Check if the cropped region is valid
                        print(f"Invalid face region in frame {frame_count}, video {file}. Skipping.")
                        continue

                    # Crop the face from the frame
                    cropped_face = frame[y:y+h, x:x+w]

                    # Resize the cropped face to the specified size
                    cropped_face = cv2.resize(cropped_face, face_size)

                    # Generate a unique filename for the cropped face
                    file_name = f"{folder_name}_frame{frame_count}_face{i}.jpg"
                    save_path = os.path.join(folder_path, file_name)

                    # Save the cropped face
                    cv2.imwrite(save_path, cropped_face)
                    cropped_count += 1  # Increment the cropped face counter

            cap.release()  # Release the video capture object
            print(f"Processed {file}: {cropped_count} face(s) cropped into {folder_name}.")
    print("--- Face cropping complete ---")

# Process real videos
print("--- Processing Real videos from FaceForensic++ dataset ---")
crop_faces(real_videos_dir, real_faces_dir, "FF", is_real=True)

# Process fake videos
print("\n--- Processing Fake videos from FaceForensic++ dataset ---")
crop_faces(fake_videos_dir, fake_faces_dir, "FF", is_real=False)

--- Processing Real videos from FaceForensic++ dataset ---
Processed 01__exit_phone_room.mp4: 255 face(s) cropped into FF_real0.
Processed 01__hugging_happy.mp4: 702 face(s) cropped into FF_real1.
Processed 01__kitchen_pan.mp4: 534 face(s) cropped into FF_real2.
Processed 01__kitchen_still.mp4: 800 face(s) cropped into FF_real3.
Processed 01__meeting_serious.mp4: 1199 face(s) cropped into FF_real4.
Processed 01__outside_talking_pan_laughing.mp4: 599 face(s) cropped into FF_real5.
Processed 01__outside_talking_still_laughing.mp4: 841 face(s) cropped into FF_real6.
Processed 01__podium_speech_happy.mp4: 902 face(s) cropped into FF_real7.
Processed 01__secret_conversation.mp4: 1000 face(s) cropped into FF_real8.
Processed 01__talking_against_wall.mp4: 860 face(s) cropped into FF_real9.
Processed 01__talking_angry_couch.mp4: 1489 face(s) cropped into FF_real10.
Processed 01__walking_and_outside_surprised.mp4: 2032 face(s) cropped into FF_real11.
Processed 01__walking_down_indoor_hall_disgu
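
One side effect of the skip check in `crop_faces` is worth noting. The per-video folder is created before any frame is read, so a video in which no face is detected leaves an empty folder behind, and the `len(os.listdir(folder_path)) > 0` test will process that video again on every run. The sketch below lists such empty folders so they can be reviewed. `find_empty_face_folders` is a hypothetical helper, not part of the notebook, and the demo uses a temporary directory rather than the real `Cropped_Faces` tree.

```python
import os
import tempfile

def find_empty_face_folders(faces_dir):
    """Return the names of subfolders in faces_dir that contain no entries."""
    empty = []
    for name in sorted(os.listdir(faces_dir)):
        path = os.path.join(faces_dir, name)
        if os.path.isdir(path) and not os.listdir(path):
            empty.append(name)
    return empty

# Demo on a throwaway directory: one folder with a (dummy) crop, one without
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "FF_real0"))
    os.makedirs(os.path.join(root, "FF_real1"))
    dummy_crop = os.path.join(root, "FF_real0", "FF_real0_frame1_face0.jpg")
    with open(dummy_crop, "wb") as fh:
        fh.write(b"")
    empty_demo = find_empty_face_folders(root)

print(empty_demo)  # ['FF_real1']
```

Because folder names come from a running counter over `os.listdir`, adding or removing a source video can also shift which folder belongs to which video, so it may be worth recording the video-to-folder mapping alongside the crops.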

### 2.3 Defining Paths and Creating Directories for the Test Dataset

In [4]:
# Defining base directory where the dataset resides
base_dir = os.getcwd() # Current working directory where my Jupyter Notebook is located

# Defining paths for dataset directories
real_videos_dir = os.path.join(base_dir, "Datasets", "SDFVD", "real")
fake_videos_dir = os.path.join(base_dir, "Datasets", "SDFVD", "fake")

# Defining paths for cropped faces directories
real_faces_dir = os.path.join(base_dir, "Cropped_Faces", "test", "real")
fake_faces_dir = os.path.join(base_dir, "Cropped_Faces", "test", "fake")

# Creating necessary directories if they don’t already exist
os.makedirs(real_faces_dir, exist_ok=True)
os.makedirs(fake_faces_dir, exist_ok=True)

print("Directories for Test:")
print(f"Real Videos Directory: {real_videos_dir}")
print(f"Fake Videos Directory: {fake_videos_dir}")
print(f"Real Faces Directory: {real_faces_dir}")
print(f"Fake Faces Directory: {fake_faces_dir}")

Directories for Test:
Real Videos Directory: C:\Users\atul\Datasets\SDFVD\real
Fake Videos Directory: C:\Users\atul\Datasets\SDFVD\fake
Real Faces Directory: C:\Users\atul\Cropped_Faces\test\real
Fake Faces Directory: C:\Users\atul\Cropped_Faces\test\fake


### 2.4 Face Detection and Cropping from Dataset for Testing

In [6]:
# Initialize the face detector
detector = dlib.get_frontal_face_detector()

def crop_faces(input_dir, output_dir, dataset_name, face_size=(224, 224), is_real=True):
    """
    Detects and crops faces from videos in the input directory.
    Cropped faces are saved in specific folders with unique names in the output directory.

    Args:
    - input_dir (str): Path to the directory containing videos.
    - output_dir (str): Path to the directory to save cropped face images.
    - dataset_name (str): Prefix for naming folders and files (e.g., dataset name).
    - face_size (tuple): Dimensions to resize each face (width, height).
    - is_real (bool): Indicates whether the videos are from the "real" or "fake" category.
    """
    # Check if input and output directories exist
    if not os.path.exists(input_dir):
        print(f"Input directory {input_dir} does not exist. Skipping.")
        return
    os.makedirs(output_dir, exist_ok=True)

    # Supported video formats
    supported_formats = (".mp4", ".avi", ".mkv", ".mov")

    # Initialize folder counter
    folder_counter = 0

    # Looping through each file in the input directory
    for file in os.listdir(input_dir):
        if file.lower().endswith(supported_formats):  # Process only supported video files
            video_name = os.path.splitext(file)[0]  # Extract the video name (without extension)

            # Generate a unique folder name based on the dataset name and category
            category = "real" if is_real else "fake"
            folder_name = f"{dataset_name}_{category}{folder_counter}"
            folder_counter += 1

            # Create the unique folder
            folder_path = os.path.join(output_dir, folder_name)

            # Skip already processed videos
            if os.path.exists(folder_path) and len(os.listdir(folder_path)) > 0:
                print(f"Skipping already processed video: {file}")
                continue

            os.makedirs(folder_path, exist_ok=True)

            video_path = os.path.join(input_dir, file)
            cap = cv2.VideoCapture(video_path)  # Open the video file

            if not cap.isOpened():
                print(f"Failed to open video {file}. Skipping.")
                continue

            frame_count = 0
            cropped_count = 0  # Counter for cropped faces

            # Looping through frames in the video
            while cap.isOpened():
                ret, frame = cap.read()  # Read a frame
                if not ret:  # Exit when no more frames
                    break

                frame_count += 1
                gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  # Convert frame to grayscale
                faces = detector(gray)  # Detect faces in the frame

                # Save each detected face
                for i, face in enumerate(faces):
                    # Clamp both corners to the frame. dlib can return boxes that extend
                    # past the frame edges, and right()/bottom() are inclusive, so add 1
                    # to get exclusive slice bounds.
                    x = max(0, face.left())
                    y = max(0, face.top())
                    x2 = min(frame.shape[1], face.right() + 1)
                    y2 = min(frame.shape[0], face.bottom() + 1)
                    w, h = x2 - x, y2 - y

                    if w <= 0 or h <= 0:  # Check if the cropped region is valid
                        print(f"Invalid face region in frame {frame_count}, video {file}. Skipping.")
                        continue

                    # Crop the face from the frame
                    cropped_face = frame[y:y+h, x:x+w]

                    # Resize the cropped face to the specified size
                    cropped_face = cv2.resize(cropped_face, face_size)

                    # Generate a unique filename for the cropped face
                    file_name = f"{folder_name}_frame{frame_count}_face{i}.jpg"
                    save_path = os.path.join(folder_path, file_name)

                    # Save the cropped face
                    cv2.imwrite(save_path, cropped_face)
                    cropped_count += 1  # Increment the cropped face counter

            cap.release()  # Release the video capture object
            print(f"Processed {file}: {cropped_count} face(s) cropped into {folder_name}.")
    print("--- Face cropping complete ---")

# Process real videos
print("--- Processing Real videos from SDFVD dataset ---")
crop_faces(real_videos_dir, real_faces_dir, "SDFVD", is_real=True)

# Process fake videos
print("\n--- Processing Fake videos from SDFVD dataset ---")
crop_faces(fake_videos_dir, fake_faces_dir, "SDFVD", is_real=False)

--- Processing Real videos from SDFVD dataset ---
Skipping already processed video: v1.mp4
Skipping already processed video: v10.mp4
Skipping already processed video: v11.mp4
Skipping already processed video: v12.mp4
Skipping already processed video: v13.mp4
Processed v14.mp4: 0 face(s) cropped into SDFVD_real5.
Skipping already processed video: v15.mp4
Skipping already processed video: v16.mp4
Skipping already processed video: v17.mp4
Skipping already processed video: v18.mp4
Skipping already processed video: v19.mp4
Skipping already processed video: v2.mp4
Skipping already processed video: v20.mp4
Skipping already processed video: v21.mp4
Skipping already processed video: v22.mp4
Skipping already processed video: v23.mp4
Skipping already processed video: v24.mp4
Skipping already processed video: v25.mp4
Skipping already processed video: v26.mp4
Processed v27.mp4: 0 face(s) cropped into SDFVD_real19.
Skipping already processed video: v28.mp4
Skipping already processed video: v29.mp4
S