# Deepfake Detection with CNN and OpenCV
---

### The Dataset

This project uses the [SDFVD 2.0 dataset](https://data.mendeley.com/datasets/zzb7jyy8w8/1) from Mendeley, which includes:

*   461 real videos
*   461 fake videos

The clips are short, high-quality, and feature diverse faces. The dataset also provides augmented versions of the videos.

---
### Steps

1. Dataset Organization

  The dataset is already structured into two main folders, `SDFVD2.0_real/` and `SDFVD2.0_fake/`, each containing:

   *   `original/` - raw, unaltered videos
   *   `augmented/` - videos with transformations (e.g., brightness, noise, blur)

  Each augmented video follows this naming pattern:

  `<prefix>_<original_filename>_aug_<augmentation_index>.mp4`

  Because the original filename is embedded in each augmented name, every augmented clip can be traced back to its source video, which makes it easier to manage training splits. Splitting the data into training and testing sets is an important step: it lets us evaluate the model's performance on unseen data and helps detect overfitting.
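
One way to use the naming pattern is to keep every augmented clip in the same split as its source video, so near-duplicates of a training video do not end up in the test set. The sketch below is illustrative only: `source_name` and `split_by_source` are hypothetical helpers, and the regular expression assumes that `<prefix>` contains no underscore and that `<original_filename>` is the source file's name without its extension.

```python
import random
import re
from collections import defaultdict
from pathlib import Path

# Assumption: <prefix> has no underscore, so the first "_" ends the prefix.
AUG_PATTERN = re.compile(r"^[^_]+_(?P<source>.+)_aug_\d+\.mp4$")


def source_name(filename):
    """Return the source video's stem for an original or augmented file."""
    name = Path(filename).name
    match = AUG_PATTERN.match(name)
    if match:
        return match.group("source")
    # Not an augmented name: treat the file as an original video.
    return Path(name).stem


def split_by_source(filenames, test_fraction=0.2, seed=42):
    """Split files so that all clips derived from one source share a split."""
    groups = defaultdict(list)
    for name in filenames:
        groups[source_name(name)].append(name)

    sources = sorted(groups)
    random.Random(seed).shuffle(sources)  # deterministic for a given seed
    n_test = max(1, round(len(sources) * test_fraction))
    test_sources = set(sources[:n_test])

    train = [f for s in sources if s not in test_sources for f in groups[s]]
    test = [f for s in sources if s in test_sources for f in groups[s]]
    return train, test
```

Running the split separately for the real and fake folders would keep the two classes balanced across the training and testing sets.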

2. Load Videos and Extract Frames (OpenCV)

  Extract frames from each `mp4` file.
  Save the extracted frames under `processed_frames/`, organized by label (`real/` or `fake/`) and by video source (`original/` or `augmented/`).
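
A minimal sketch of this step with OpenCV's `VideoCapture` might look like the following. The helper names, the `every_n` sampling interval, the JPEG file naming, and the per-video subfolder are assumptions made for illustration, not part of the dataset or the original notebook.

```python
from pathlib import Path


def frame_output_dir(video_path, root="processed_frames"):
    """Map .../SDFVD2.0_<label>/<source>/<clip>.mp4 to root/<label>/<source>/<clip>."""
    video_path = Path(video_path)
    source = video_path.parent.name  # "original" or "augmented"
    label = video_path.parent.parent.name.rsplit("_", 1)[-1]  # "real" or "fake"
    return Path(root) / label / source / video_path.stem


def extract_frames(video_path, out_dir, every_n=10):
    """Save every `every_n`-th frame of a video as a JPEG; return the count saved."""
    import cv2  # imported here so the path helper works without OpenCV

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    capture = cv2.VideoCapture(str(video_path))
    saved = 0
    index = 0
    try:
        while True:
            ok, frame = capture.read()
            if not ok:  # end of video or unreadable file
                break
            if index % every_n == 0:
                cv2.imwrite(str(out_dir / f"frame_{index:05d}.jpg"), frame)
                saved += 1
            index += 1
    finally:
        capture.release()
    return saved
```

Sampling every few frames rather than all of them keeps the frame set smaller and reduces near-identical consecutive images; the right interval depends on clip length and frame rate.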

3. Detect Faces (OpenCV)

  We want consistency, so it's important to focus only on the relevant part of the image. Run face detection on each frame to isolate the facial region.
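
The text does not name a specific detector, so the sketch below assumes OpenCV's bundled Haar cascade for frontal faces, which is one common choice. `largest_box` and `detect_face` are hypothetical helper names, and the `scaleFactor` and `minNeighbors` values are typical starting points rather than tuned settings.

```python
def largest_box(boxes):
    """Return the (x, y, w, h) box with the largest area, or None if empty."""
    boxes = [tuple(int(v) for v in box) for box in boxes]
    if not boxes:
        return None
    return max(boxes, key=lambda b: b[2] * b[3])


def detect_face(frame_bgr, detector=None):
    """Detect faces in a BGR frame and return the largest bounding box."""
    import cv2  # imported here so largest_box works without OpenCV

    if detector is None:
        # Assumption: the Haar cascade shipped with opencv-python.
        detector = cv2.CascadeClassifier(
            cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
        )
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    boxes = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return largest_box(boxes)
```

When processing many frames, creating the `CascadeClassifier` once and passing it in as `detector` avoids reloading the cascade file for every frame.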

4. Preprocess Images

  For each detected face:

  *   Crop to the face region
  *   Resize to 224 × 224
  *   Convert BGR → RGB
  *   Normalize pixel values for CNN input
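
The four steps above can be sketched as follows. The function names are hypothetical, and the normalization shown (scaling to the range 0 to 1 and reordering to channels-first) is an assumption, since the text does not specify a scheme; a model pretrained on ImageNet would typically also need mean and standard-deviation normalization.

```python
import numpy as np


def crop_box(frame, box):
    """Crop an (x, y, w, h) box out of an H x W x C image array."""
    x, y, w, h = box
    return frame[y:y + h, x:x + w]


def to_model_input(face_bgr):
    """Convert a BGR uint8 face crop to a float32 RGB array in C x H x W order."""
    rgb = face_bgr[..., ::-1]  # same result as cv2.cvtColor(..., cv2.COLOR_BGR2RGB)
    scaled = rgb.astype(np.float32) / 255.0  # assumption: scale pixels to [0, 1]
    return np.ascontiguousarray(scaled.transpose(2, 0, 1))


def preprocess_face(frame_bgr, box, size=224):
    """Crop, resize to size x size, convert to RGB, and normalize one face."""
    import cv2  # imported here so the NumPy-only helpers work without OpenCV

    face = crop_box(frame_bgr, box)
    face = cv2.resize(face, (size, size), interpolation=cv2.INTER_AREA)
    return to_model_input(face)
```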

5. Train the CNN

  Train a **convolutional neural network** to classify each face image as:

  *   `0` → Real
  *   `1` → Fake

  Use a training loop with a loss function such as `CrossEntropyLoss` and an optimizer such as `Adam`.
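
As a rough illustration of such a training loop in PyTorch, the sketch below defines a deliberately small two-class network and a single-epoch training function. `SmallFaceCNN` and `train_one_epoch` are hypothetical; the architecture used in this project may be entirely different.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


class SmallFaceCNN(nn.Module):
    """Hypothetical two-class CNN used only to illustrate the training loop."""

    def __init__(self, num_classes=2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # makes the head independent of input size
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))


def train_one_epoch(model, loader, optimizer, criterion, device="cpu"):
    """Run one pass over the loader and return the mean loss per sample."""
    model.train()
    total_loss, total_items = 0.0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)  # labels: 0 = real, 1 = fake
        loss.backward()
        optimizer.step()
        total_loss += loss.item() * images.size(0)
        total_items += images.size(0)
    return total_loss / total_items
```

A typical setup would pair this with `torch.optim.Adam(model.parameters(), lr=1e-3)` and `nn.CrossEntropyLoss()`, calling `train_one_epoch` once per epoch; the learning rate here is a common default, not a tuned value.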

6. Predict from Preprocessed Frames

  Run the trained CNN on new frames and collect its predictions, either as class labels or as probabilities.
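
A sketch of frame-level inference, assuming a PyTorch model that outputs two logits per image in the order (real, fake). `predict_fake_probabilities` is a hypothetical helper name.

```python
import torch


@torch.no_grad()
def predict_fake_probabilities(model, frames, device="cpu"):
    """Return the probability of class 1 (fake) for a batch of N x 3 x H x W frames."""
    model.eval()  # disable dropout and use running batch-norm statistics
    logits = model(frames.to(device))
    return torch.softmax(logits, dim=1)[:, 1].cpu()


def probabilities_to_labels(fake_probs, threshold=0.5):
    """Convert fake probabilities to labels: 0 = real, 1 = fake."""
    return (fake_probs >= threshold).long()
```

Keeping the probabilities rather than only the hard labels preserves the information needed later when frame-level predictions are combined into a video-level decision.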

   > **(OPTIONAL) Use PyTorch Grad-CAM to explain the model's predictions**
   >
   > [The PyTorch Grad-CAM library](https://github.com/jacobgil/pytorch-grad-cam) implements several methods for interpreting a CNN's decision when it classifies an image as real or fake.
   
![Grad-CAM example from the library's GitHub repository (placeholder, to be replaced with our own)](https://raw.githubusercontent.com/jacobgil/jacobgil.github.io/master/assets/cam_dog.gif)

7. Classify the Entire Video

  Aggregate the frame-level predictions into a single label for the video. Two options:

  * Majority vote: if most frames are classified as fake, the video is fake.
  * Probability averaging: average the frame-level fake probabilities. If the average exceeds a threshold, label the video fake; otherwise, real.
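
Both aggregation rules can be sketched in a few lines of plain Python. The function names and the default thresholds of 0.5 are assumptions for illustration; in practice the thresholds would be chosen on a validation set.

```python
def classify_by_majority(frame_probs, frame_threshold=0.5):
    """Label the video fake if more than half of its frames are classified fake."""
    if not frame_probs:
        raise ValueError("need at least one frame probability")
    fake_votes = sum(p >= frame_threshold for p in frame_probs)
    return "fake" if fake_votes > len(frame_probs) / 2 else "real"


def classify_by_mean(frame_probs, video_threshold=0.5):
    """Label the video fake if the mean fake probability exceeds the threshold."""
    if not frame_probs:
        raise ValueError("need at least one frame probability")
    mean_prob = sum(frame_probs) / len(frame_probs)
    return "fake" if mean_prob > video_threshold else "real"
```

The two rules can disagree. For frame probabilities `[0.9, 0.95, 0.4, 0.4, 0.4]`, only two of five frames cross 0.5, so the majority vote says real, while the mean of 0.61 exceeds 0.5, so averaging says fake. Averaging gives more weight to a few highly confident frames.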



# 1. Mount Google Drive in Colab

In [None]:
from google.colab import drive
drive.mount('/content/drive')

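In [None]:
import sys
import os

# Project root on Google Drive; adjust if the project lives elsewhere.
project_path = '/content/drive/My Drive/project'
src_path = os.path.join(project_path, 'src')

# Make modules in src/ importable and work from the project root.
if src_path not in sys.path:
    sys.path.append(src_path)
os.chdir(project_path)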