# **DEEPFAKE DETECTION THROUGH OPTICAL FLOW**

This notebook contains all the steps required to build, train and test a model that detects DeepFake videos through a combination of frames and optical flow.

To run this notebook you have to add your **Kaggle Username** and **Token** in order to download the dataset from Kaggle. You can generate your API Token from your profile page.

First we import the needed packages:

In [None]:
import torch
import torchvision
import cv2

from imutils import paths
import shutil

import torch.nn as nn
import torch.optim as optim

from torchvision import transforms
from torchvision.models import resnet18, ResNet18_Weights
from torchvision.datasets import ImageFolder

from torch.utils.data import DataLoader

import matplotlib.pyplot as plt
from PIL import Image

import os
import numpy as np
from tqdm import tqdm  # progress bar


-------------------------------------------------------------------
# **THE DATASET**

Here, we download the dataset. We chose to use **FaceForensics++** to train and test the model. 

This is a forensics dataset consisting of **1000 original video** sequences that have been manipulated with four automated face manipulation methods: *Deepfakes*, *Face2Face*, *FaceSwap* and *NeuralTextures*.

The videos included in this subset have a compression rate factor of 23 (`c23`). To work with the full dataset, you have to submit a request to the creators through a Google form. For our purposes, this subset is enough.

In [None]:
# insert your kaggle API Token here
os.environ['KAGGLE_USERNAME'] = "[insert your kaggle username]"
os.environ['KAGGLE_KEY'] = "[insert your kaggle API Token]"

# download the dataset
!kaggle datasets download -d sorokin/faceforensics

After the download, we unzip all the content inside the new directory '*data*'. The process will take a few minutes.

In [None]:
!mkdir /content/data

# unzip the dataset in the new directory
!unzip faceforensics.zip -d /content/data

----------------------------------------------------------------------
# **HANDLE THE DATA**

Before defining the model, we must handle and pre-process the dataset. 

Here, we specify: 

1.   **The current path**;
2.   **the new paths for training, validation and test**;
3.   **the percentage of videos we want for training, validation and test**. 

We chose a separation of 70% for training, 20% for validation and 10% for test.

In [None]:
# specify the path to the dataset
DATASET_PATH = "/content/data"

# specify the paths to our training, validation and test data
TRAIN = "train"
VALIDATION = "val"
TEST = "test"

# split the dataset into training, validation and test data
TRAIN_SPLIT = 0.7
VAL_SPLIT = 0.2
TEST_SPLIT = 0.1

Below, we have the '*copy_videos()*' function, which takes as input a list of video paths, a destination folder for our videos and the label subfolder ('altered' or 'original').

In other words, we take the videos and we split them into training, validation and test videos. Then, we save them in the proper directories. 

We've decided to take only the DeepFake videos and the original videos. In total, **960 videos** were stored (480 DeepFakes and 480 originals).
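
As a quick check of these numbers, the following sketch computes how many videos of each label land in each split, assuming a cap of 960 videos in total (480 per label) and the 70/20/10 split defined earlier. The helper name `videos_per_label` is hypothetical and not part of the notebook.

```python
# Split fractions, as defined earlier in the notebook
TRAIN_SPLIT, VAL_SPLIT, TEST_SPLIT = 0.7, 0.2, 0.1
TOTAL_VIDEOS = 960  # 480 altered + 480 original

def videos_per_label(total, fraction):
    """Number of videos of ONE label (altered or original) in a split."""
    return round(total * fraction / 2)

counts = {
    "train": videos_per_label(TOTAL_VIDEOS, TRAIN_SPLIT),
    "val": videos_per_label(TOTAL_VIDEOS, VAL_SPLIT),
    "test": videos_per_label(TOTAL_VIDEOS, TEST_SPLIT),
}
print(counts)  # {'train': 336, 'val': 96, 'test': 48}
```

The three values add up to 480 videos per label, that is 960 videos overall.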

In [None]:
def copy_videos(videoPaths, folder, Set):

  # create the proper directory if it does not exist ('train', 'val', 'test')
  if not os.path.exists(folder):
    os.makedirs(folder)

  # cap the number of videos: 960 in total, 480 for each label
  # 336 training videos for each label (altered and original)
  if folder == TRAIN:
    max_videos = round(960*TRAIN_SPLIT/2)
  # 96 validation videos for each label (altered and original)
  elif folder == VALIDATION:
    max_videos = round(960*VAL_SPLIT/2)
  # 48 test videos for each label (altered and original)
  else:
    max_videos = round(960*TEST_SPLIT/2)

  # never read past the end of the list
  max_videos = min(max_videos, len(videoPaths))

  # check to see if the label folder exists and if not create it
  labelFolder = os.path.join(folder, Set)
  if not os.path.exists(labelFolder):
    os.makedirs(labelFolder)

  for path in videoPaths[:max_videos]:
    # grab the video name from the path, construct the destination
    # path inside the label folder and copy the video to it
    videoName = path.split(os.path.sep)[-1]
    destination = os.path.join(labelFolder, videoName)
    shutil.copy(path, destination)

In [None]:
# load all the altered video paths and randomly shuffle them
print("[INFO] loading video paths...")
videoAlteredPaths = list(paths.list_files(DATASET_PATH+'/manipulated_sequences/Deepfakes/c23/videos'))
np.random.shuffle(videoAlteredPaths)

# generate altered training, validation and test paths
valAlteredPathsLen = int(len(videoAlteredPaths) * VAL_SPLIT)
trainAlteredPathsLen = int(len(videoAlteredPaths) * TRAIN_SPLIT)
trainPaths = videoAlteredPaths[:trainAlteredPathsLen]
valPaths = videoAlteredPaths[trainAlteredPathsLen:trainAlteredPathsLen+valAlteredPathsLen]
testPaths = videoAlteredPaths[trainAlteredPathsLen+valAlteredPathsLen:]

# copy the altered training, validation and test videos to their respective
# directories
print("[INFO] copying training, validation and test altered videos...")
copy_videos(trainPaths, TRAIN, "altered")
copy_videos(valPaths, VALIDATION, "altered")
copy_videos(testPaths, TEST, "altered")

# load all the original video paths and randomly shuffle them
print("[INFO] loading video paths...")
videoOriginalPaths = list(paths.list_files(DATASET_PATH+'/original_sequences/youtube/c23/videos'))
np.random.shuffle(videoOriginalPaths)

# generate original training, validation and test paths
valOriginalPathsLen = int(len(videoOriginalPaths) * VAL_SPLIT)
trainOriginalPathsLen = int(len(videoOriginalPaths) * TRAIN_SPLIT)
trainPaths = videoOriginalPaths[:trainOriginalPathsLen]
valPaths = videoOriginalPaths[trainOriginalPathsLen:trainOriginalPathsLen+valOriginalPathsLen]
testPaths = videoOriginalPaths[trainOriginalPathsLen+valOriginalPathsLen:]

# copy the original training, validation and test videos to their respective
# directories
print("[INFO] copying training, validation and test original videos...")
copy_videos(trainPaths, TRAIN, "original")
copy_videos(valPaths, VALIDATION, "original")
copy_videos(testPaths, TEST, "original")

--------------------------------------------------------------------------------
# **OPTICAL FLOW**

This is the main pre-processing step: for each video, we extract the frames and compute the optical flow.

Because we had a large number of videos and did not use any deep learning method for this step, computing the optical flow for each pair of full frames would have been too slow. So, before estimating the optical flow, we extract the face from each frame and then compute the dense optical flow between consecutive face crops.

At the end, we save a **fusion** between the RGB frames and the optical flow. 

We begin by creating the directories where we store the results:

In [None]:
opticalPath = "/content/optical_fusion/"

if not os.path.exists(opticalPath):
  os.makedirs(opticalPath)
  os.makedirs(opticalPath+"training/original")
  os.makedirs(opticalPath+"training/altered")
  os.makedirs(opticalPath+"validation/original")
  os.makedirs(opticalPath+"validation/altered")
  os.makedirs(opticalPath+"test/original")
  os.makedirs(opticalPath+"test/altered")
  

The '*detect_face()*' function takes as input the **BGR frame** and the **face cascade classifier**, which in this case is the Haar cascade frontal-face classifier.

We chose this classifier after several trials and some parameter tuning, because it was the one that best detected the faces in our frames. The classifier is defined outside the function so that it is not reloaded for each frame, which saves a lot of time.

In [None]:
# define the classifier
face_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_alt2.xml')

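def detect_face(frame_bgr, face_cascade):
    
    # convert the frame to gray, the classifier works with gray images
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.2, minNeighbors=5)  

    if (len(faces) == 0):
        return None
    
    # if several faces are detected, keep the last one
    x, y, w, h = faces[-1]
    
    # add a 40-pixel margin; clamp at 0 so that negative indices
    # do not wrap around (slicing already clips the upper bounds)
    top, left = max(y - 40, 0), max(x - 40, 0)
    
    # return the BGR face (rows span the height h, columns the width w)
    return frame_bgr[top : y + h + 40, left : x + w + 40]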

The '*bgr_fusion()*' function takes as input the two BGR frames and the optical flow (which was converted to an HSV representation and then to a BGR image).
First, it **blends** the two BGR frames with equal weights, then it blends the result with the optical flow, this time with different weights (0.7 for the frames and 0.3 for the flow). Finally, it returns this fusion.
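
To see what the two blends amount to, the sketch below expands them on a single pixel value. It assumes plain floating-point arithmetic and ignores the rounding and saturation that `cv2.addWeighted` applies to 8-bit images; the function name `fuse_pixel` and the sample values are hypothetical.

```python
def fuse_pixel(face1, face2, flow):
    """Apply the two blends of bgr_fusion() to scalar pixel values."""
    faces = 0.5 * face1 + 0.5 * face2   # equal-weight average of the frames
    return 0.7 * faces + 0.3 * flow     # weighted blend with the optical flow

# The two steps collapse into fixed per-source weights:
# 0.7 * 0.5 = 0.35 for each frame and 0.3 for the optical flow.
f1, f2, fl = 100.0, 120.0, 200.0
two_step = fuse_pixel(f1, f2, fl)
one_step = 0.35 * f1 + 0.35 * f2 + 0.3 * fl
print(two_step, one_step)
```

Both expressions give the same value up to floating-point error, so each frame contributes 35% and the optical flow 30% of the fused image.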

In [None]:
def bgr_fusion(face1, face2, optical_flow):
  
  # average between the two frames
  facesWeighted = cv2.addWeighted(face1, 0.5, face2, 0.5, 0)
  
  # weighted average between the two frames and the optical flow
  optical_fusion = cv2.addWeighted(facesWeighted, 0.7, optical_flow, 0.3, 0)
  
  return optical_fusion


The '*compute_optical_flow()*' function takes as input the **two frames** (the face crops), the optical flow **number**, the type and name (**path**) of the video, and the **set** (training, validation or test). For training and validation we compute 10 optical flows per video, so the number ranges from 0 to 9. For test we compute 30 optical flows per video, so it ranges from 0 to 29.

The optical flow is computed using the **Farneback** method, which is a dense method. First, we create a zero-filled array with the same dimensions as the BGR frame and set its saturation channel to 255. Then, we compute the optical flow, extract the magnitude and angle of the 2D vectors, and set the image hue and value according to the optical flow direction and magnitude. Finally, we convert the HSV image to a BGR color representation.

With this optical flow and the two frames we call the '*bgr_fusion()*' function.
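
The mapping from flow vectors to colors can be illustrated on a few hand-picked vectors. This is a minimal pure-Python sketch of the same arithmetic, not the notebook's OpenCV code: `flow_to_hue_value` is a hypothetical helper, and `math.atan2` (wrapped to the range 0 to 2π) stands in for `cv2.cartToPolar`.

```python
import math

def flow_to_hue_value(vectors):
    """Map (dx, dy) flow vectors to (hue, value) pairs.

    Hue is the direction scaled to OpenCV's 0-180 range; value is the
    magnitude min-max normalized to 0-255.
    """
    mags = [math.hypot(dx, dy) for dx, dy in vectors]
    angs = [math.atan2(dy, dx) % (2 * math.pi) for dx, dy in vectors]
    lo, hi = min(mags), max(mags)
    span = (hi - lo) or 1.0  # sketch-only guard against uniform motion
    hues = [a * 180 / math.pi / 2 for a in angs]
    values = [255 * (m - lo) / span for m in mags]
    return list(zip(hues, values))

# motion to the right, down (image y axis points down), left, and no motion
for hue, value in flow_to_hue_value([(2, 0), (0, 1), (-1, 0), (0, 0)]):
    print(round(hue), round(value))
```

Opposite directions end up 90 hue units apart, and the fastest-moving pixel in the crop is always rendered at full brightness because of the min-max normalization.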

In [None]:
def compute_optical_flow(face1_bgr, face2_bgr, number, path, Set=None):
    
    # convert the BGR frames to grayscale
    face1_gray = cv2.cvtColor(face1_bgr, cv2.COLOR_BGR2GRAY)
    face2_gray = cv2.cvtColor(face2_bgr, cv2.COLOR_BGR2GRAY)

    # Creates an array filled with zero 
    # with the same dimensions of the frame
    hsv = np.zeros_like(face1_bgr)
    hsv[..., 1] = 255

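    # Compute the dense optical flow with the Farneback method
    # arguments after None: pyr_scale=0.5, levels=3, winsize=15,
    # iterations=3, poly_n=5, poly_sigma=1.2, flags=0
    flow = cv2.calcOpticalFlowFarneback(face1_gray, face2_gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)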
    
    # Magnitude and angle of the 2D vectors
    mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])

    # Sets image hue and value according to the optical flow direction
    # and magnitude, then converts HSV to RGB (BGR) color representation
    hsv[..., 0] = ang*180/np.pi/2
    hsv[..., 2] = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX)
    optical_bgr = cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR)

    # fuse the frames with the optical flow and convert to rgb
    fused_bgr = bgr_fusion(face1_bgr, face2_bgr, optical_bgr)
    fused_rgb = cv2.cvtColor(fused_bgr, cv2.COLOR_BGR2RGB)
    
    # save the optical flow (fused with the frames) as RGB  
    Image.fromarray(fused_rgb).save(opticalPath+'{}/{}-{}.jpg'.format(Set,path,number))

The '*frames()*' function takes as input the video path and the set of the video. For training and validation data, we compute 10 optical flows per video, while for test data we compute 30 (skipping frames between pairs to avoid redundancy).
We've defined some parameters in order to obtain the same number of optical flows for videos of different lengths.

First, we open the video with VideoCapture(). Then, we repeat the following process a number of times that depends on the set:

1.   **Read two frames**;
2.   **detect the faces**;
3.   **resize the faces to 224x224**;
4.   **compute the optical flow**;
5.   **skip some frames to avoid redundancy**.
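
The skip values can be checked with a little arithmetic: each iteration reads two frames and then discards `skip` more. The sketch below assumes every face detection succeeds (a failed pair consumes two frames without producing a flow) and counts the frames read per video. `frames_read` is a hypothetical helper; the settings mirror those in the '*frames()*' function.

```python
def frames_read(max_flows, skip):
    """Frames consumed when every pair yields an optical flow."""
    return max_flows * (2 + skip)

# (set, long video?) -> (max_flows, skip), as in frames()
settings = {
    ("training/validation", True): (10, 16),
    ("training/validation", False): (10, 8),
    ("test", True): (30, 4),
    ("test", False): (30, 2),
}
budget = {key: frames_read(*value) for key, value in settings.items()}
for (name, is_long), total in budget.items():
    label = "long (> 200 frames)" if is_long else "short"
    print(f"{name:20s} {label:20s} {total} frames")
```

Under these assumptions a long video always has enough frames (180 is below the 200-frame threshold), while a short video may run out of frames before reaching the target number of optical flows.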

In [None]:
def frames(path, Set):
    
    # capture the video
    video = cv2.VideoCapture(path)
    
    # type (original or altered) and name of the video
    path = path.split('/')[1]+'/'+ path.split('/')[2][0:-4]
    
    number = 0
    
    # set some parameters according to the set and the video length:
    # number of optical flows computed and number of frames skipped
    video_length = int(video.get(cv2.CAP_PROP_FRAME_COUNT))
    long = video_length > 200
    
    if Set == 'training' or Set == 'validation':
      max_flows = 10
      skip = 16 if long else 8
  
    elif Set == 'test':
      max_flows = 30
      skip = 4 if long else 2

    else:
      raise ValueError("Set must be 'training', 'validation' or 'test'")
      
    # iterate over the video to compute the optical flow
    while(number<max_flows):
          
      # capture the first frame
      ret, old_frame = video.read()
      if not ret:
        break

      # capture the second frame
      ret, new_frame = video.read()
      if not ret:
        break

      # detect the faces 
      face1 = detect_face(old_frame, face_cascade)
      face2 = detect_face(new_frame, face_cascade)

      try:
        # resize the faces
        face1 = cv2.resize(face1, (224, 224), interpolation = cv2.INTER_AREA)
        face2 = cv2.resize(face2, (224, 224), interpolation = cv2.INTER_AREA)
            
        # compute the optical flow
        compute_optical_flow(face1, face2, number, path, Set)
        number +=1

      except Exception:
        # no usable face in one of the two frames: move on to the next pair
        continue
          
      # skip some frames to avoid redundancy between consecutive pairs
      for _ in range(skip):
        video.read()

    video.release()

Below, we call the '*frames()*' function for each element of a list containing the video paths. We do that for the three sets.

This process will take a while (**1h30m/2h**).

In [None]:
# take the training video paths and extract the frames 
videoTrainPaths = list(paths.list_files(TRAIN))
print("Extracting training frames:")
for i in tqdm(videoTrainPaths):
  frames(i,'training')

# take the validation video paths and extract the frames
videoValPaths = list(paths.list_files(VALIDATION))
print("Extracting validation frames:")
for i in tqdm(videoValPaths):
  frames(i,'validation')

# take the test video paths and extract the frames
videoTestPaths = list(paths.list_files(TEST))
print("Extracting test frames:")
for i in tqdm(videoTestPaths):
  frames(i,'test')

--------------------------------------------------------------------------------
# **LOAD THE DATASET**

Now, it's time to load the dataset. We **transform** each image into a PyTorch tensor and we **normalize** the data with the ImageNet mean and standard deviation. The input images have a 224x224 resolution and they are RGB images (a fusion of the optical flow and the frames). It is important to feed the model data with properties similar to those of the data on which it was pretrained.

We also apply data augmentation to the training set (random horizontal flips), trying to improve the performance of the model by showing it mirrored versions of the images.
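
For reference, the normalization applied after `ToTensor()` is a per-channel shift and scale. The sketch below reproduces that arithmetic on one RGB pixel in pure Python; `normalize_pixel` is a hypothetical helper and the sample pixels are made up.

```python
# ImageNet channel statistics, the same values used in the next cell
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]

def normalize_pixel(rgb_uint8):
    """Scale an 8-bit RGB pixel to [0, 1], then standardize each channel."""
    scaled = [c / 255 for c in rgb_uint8]  # what ToTensor() does for 8-bit images
    return [(c - m) / s for c, m, s in zip(scaled, MEAN, STD)]

print(normalize_pixel([124, 116, 104]))  # close to the mean: values near 0
print(normalize_pixel([255, 255, 255]))  # white: roughly 2.2 to 2.6
```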

In [13]:
# mean and standard deviation
mean=[0.485, 0.456, 0.406]
std=[0.229, 0.224, 0.225]

# apply some transformations
train_tran = transforms.Compose([
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize(mean, std)
])
val_tran = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize(mean, std)
])


Then, we load the training, validation and test sets with ImageFolder and DataLoader. The **batch size** is 128 for training and test (256 for validation), and the number of **epochs** is 50.

Subsequently, we store the **dataset classes**, the **number of batches** in the training, validation and test loaders, and the **device**; if a GPU is available, the training, validation and test phases will be performed on it. We also create two dictionaries to store the losses and the accuracies.
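
Note that `len()` of a `DataLoader` returns the number of batches, not the number of images. With the default `drop_last=False`, this is the dataset size divided by the batch size, rounded up. A small sketch of that relation follows; `num_batches` is a hypothetical helper and the dataset sizes are illustrative upper bounds (videos in the split times optical flows per video, assuming every video yields all of them).

```python
import math

def num_batches(num_images, batch_size):
    """Length of a DataLoader with drop_last=False."""
    return math.ceil(num_images / batch_size)

# upper bounds: videos in the split x optical flows per video
print(num_batches(672 * 10, 128))  # training
print(num_batches(192 * 10, 256))  # validation
print(num_batches(96 * 30, 128))   # test
```

Because the epoch loss and accuracy are averaged over batches, a smaller final batch weighs as much as a full one.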

In [14]:
# define the batch size and the number of epochs
BATCH_SIZE = 128
epochs = 50

# load the data using ImageFolder and DataLoader
trainDataset=ImageFolder('/content/optical_fusion/training',transform=train_tran)
valDataset=ImageFolder('/content/optical_fusion/validation',transform=val_tran)
testDataset=ImageFolder('/content/optical_fusion/test',transform=val_tran)

train_loader=DataLoader(trainDataset, batch_size=BATCH_SIZE, shuffle=True)
val_loader=DataLoader(valDataset,batch_size=256, shuffle=True)
test_loader=DataLoader(testDataset,batch_size=BATCH_SIZE)

# data classes
class_name = trainDataset.classes

# number of batches in the training, validation and test loaders
train_size = len(train_loader) 
val_size = len(val_loader)
test_size = len(test_loader)

# dictionaries to store the losses and the accuracies
losses = {'train':[], 'val':[]}
accuracies = {'train':[], 'val':[]}

# define the device
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

--------------------------------------------------
# **TRAINING AND VALIDATION FUNCTIONS**

At this point, we have to use our data to **train** and **validate** a model. 
For every epoch, we train and validate the model to keep track of how it improves. In order to do this, we wrote the training and validation functions.

In the *train()* function, we feed the inputs and the labels (organized in batches) to our model, which gives back the predictions. Then, we compute the loss, backpropagate and update the weights.

The accuracy is obtained using softmax: for each frame, we take the probability (as a percentage) that the model assigns to the correct class and we average it over the batch. At the end, we average the batch values over the whole set. Note that this measures the mean confidence in the correct class rather than the fraction of correctly classified frames.

The epoch losses and accuracies are then saved in dictionaries and printed to the console.
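
To make the accuracy described above concrete (the average softmax probability assigned to the correct class), the sketch below computes it for a tiny hypothetical batch of two-class logits, alongside the usual fraction of correct predictions. It uses pure Python instead of PyTorch, and the names `softmax`, `mean_true_class_prob` and `argmax_accuracy` are illustrative.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def mean_true_class_prob(batch_logits, labels):
    """Average softmax probability (in %) given to the correct class."""
    probs = [softmax(row)[y] * 100 for row, y in zip(batch_logits, labels)]
    return sum(probs) / len(probs)

def argmax_accuracy(batch_logits, labels):
    """Fraction (in %) of samples whose top-scoring class is correct."""
    hits = [row.index(max(row)) == y for row, y in zip(batch_logits, labels)]
    return 100 * sum(hits) / len(hits)

logits = [[2.0, 0.0], [0.2, 0.0], [0.0, 1.0]]  # hypothetical model outputs
labels = [0, 0, 0]                             # all samples belong to class 0
print(round(mean_true_class_prob(logits, labels), 1))
print(round(argmax_accuracy(logits, labels), 1))
```

On this batch the two measures differ (about 56.7 versus 66.7): a correct but unconfident prediction counts fully toward the argmax accuracy, while it adds only its probability to the softmax-based value.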

In [15]:
def train(model, loss_fn, optimizer, sched, epoch):
  print(f'Epoch {epoch}/{epochs}')

  # model in train mode
  model.train() 
  
  # total accuracy and total loss
  total_accuracy = 0   
  total_loss = 0 

  # iterate for each batch of data
  for data in tqdm(train_loader):

    accuracy = 0
    
    # take the input and send it to the GPU
    inputs, labels = data[0].to(device), data[1].to(device) # takes inputs and classes from the train dataset
    
    # get the predictions
    outputs = model(inputs) 

    # apply softmax on the output and get a tensor of probabilities
    probabilities = torch.nn.functional.softmax(outputs, dim=1)*100
    _, indices = torch.sort(outputs, descending = True)

    for i in range(len(probabilities)):
      for j in indices[i][:2]:
        if j == labels[i].item():
          accuracy += probabilities[i][j].item()

    # compute the loss
    loss = loss_fn(outputs, labels)
    # zero grad
    optimizer.zero_grad()
    # back propagation
    loss.backward()
    optimizer.step()
    
    # update loss and accuracy
    accuracy = round(accuracy/len(labels), 3)
    total_accuracy += accuracy
    total_loss += loss.item()

  # scheduler step
  sched.step()

  #compute epoch loss
  epoch_accuracy = round(total_accuracy/train_size, 3)
  epoch_loss = round(total_loss/train_size, 3)
  
  # save epoch losses and accuracies and print them
  accuracies['train'].append(epoch_accuracy)
  losses['train'].append(epoch_loss)
  print('Train Loss: %.3f | Accuracy: %.3f'%(epoch_loss, epoch_accuracy))

In [16]:
def validation(model, loss_fn, epoch):
  # model in evaluation mode
  model.eval() 

  # total accuracy and total loss
  total_loss=0
  total_accuracy = 0 

  with torch.no_grad():
    # iterate for each batch of data
    for data in tqdm(val_loader):
      
      accuracy = 0
      
      # take the input and send it to the GPU
      images, labels=data[0].to(device), data[1].to(device)
      
      # get the predictions
      outputs=model(images)

      # apply softmax on the output and get a tensor of probabilities
      probabilities = torch.nn.functional.softmax(outputs, dim=1)*100
      _, indices = torch.sort(outputs, descending = True)

      for i in range(len(probabilities)):
        for j in indices[i][:2]:
          if j == labels[i].item():
            accuracy += probabilities[i][j].item()

      # compute the loss
      loss = loss_fn(outputs, labels)

      # update loss and accuracy
      total_loss += loss.item()
      accuracy = round(accuracy/len(labels), 3)
      total_accuracy += accuracy

  # compute epoch loss
  epoch_accuracy = round(total_accuracy/val_size, 3)  
  epoch_loss = round(total_loss/val_size, 3)
  
  # save epoch losses and accuracies and print them
  losses['val'].append(epoch_loss)
  accuracies['val'].append(epoch_accuracy)
  print('Validation Loss: %.3f | Accuracy: %.3f'%(epoch_loss, epoch_accuracy)) 

--------------------------------------------------------------------------------
# **THE MODEL**
The model we used is a pretrained **ResNet18**. To limit overfitting, we froze the first two child modules of the network (the initial convolution and its batch normalization), keeping their weights untouched. The last fully connected layer was replaced by a new fully connected one with 2 output neurons.
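
To make explicit which modules the counter-based freezing loop covers, the sketch below replays it over the names of ResNet18's top-level children, listed in the order in which `model.children()` yields them in torchvision. It is a pure-Python illustration and does not load the model.

```python
# Top-level children of torchvision's ResNet18, in definition order
RESNET18_CHILDREN = ["conv1", "bn1", "relu", "maxpool", "layer1",
                     "layer2", "layer3", "layer4", "avgpool", "fc"]

frozen = []
ct = 0
for name in RESNET18_CHILDREN:
    ct += 1
    if ct < 3:  # same condition as the freezing loop in the notebook
        frozen.append(name)

print(frozen)  # ['conv1', 'bn1']
```

Raising the threshold would extend the freezing to the residual stages; for example, `ct < 6` would also cover `layer1`.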

We trained this model for 50 epochs using a **cross entropy loss**, **stochastic gradient descent** (with momentum) as the optimizer and a **step scheduler**.

The scheduler multiplies the learning rate by 'gamma' every 'step_size' epochs.
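
With the values used in the next cell (`lr=0.001`, `step_size=6`, `gamma=0.1`), the resulting schedule can be tabulated with the closed form of a step decay. This is a sketch of the formula, not a call to `torch.optim.lr_scheduler.StepLR`; `step_lr` is a hypothetical helper and `epoch` counts completed scheduler steps, starting from 0.

```python
def step_lr(base_lr, step_size, gamma, epoch):
    """Learning rate after `epoch` scheduler steps of a step decay."""
    return base_lr * gamma ** (epoch // step_size)

for epoch in (0, 5, 6, 12, 18, 49):
    print(epoch, f"{step_lr(0.001, 6, 0.1, epoch):.0e}")
```

Under this schedule the learning rate has already fallen to about 1e-7 after 24 epochs, so the later epochs are likely to change the weights very little.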

In [18]:
# ResNet18 with pretrained weights on ImageNet
model = resnet18(weights=ResNet18_Weights.DEFAULT)

# freeze some parameters
ct = 0
for child in model.children():
  ct += 1
  if ct < 3:
    for param in child.parameters():
      param.requires_grad = False

# change the last FC layer
num_ftrs = model.fc.in_features
model.fc = nn.Linear(num_ftrs, 2)

# send the model to the GPU (or to CPU if you don't have it)
model = model.to(device)

# define the loss
loss_fn = nn.CrossEntropyLoss()

# SGD Optimizer
optimizer_ft = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

# StepLR scheduler
scheduler = torch.optim.lr_scheduler.StepLR(optimizer_ft, step_size=6, gamma=0.1)

In [None]:
# train and validate for 50 epochs
for epoch in range(1,epochs+1): 
  train(model,loss_fn, optimizer_ft, scheduler, epoch)
  validation(model, loss_fn, epoch)

-------------------------------------------------------------------------------
# **PLOT THE RESULTS**

Below, we can see the results. Our model reaches an accuracy of 95% on the training set and 80-85% on the validation set.
There is still some overfitting.

In [None]:
plt.figure(figsize=(11, 11))
plt.subplot(2, 1, 1)
plt.plot(accuracies['train'], label='Training Accuracy')
plt.plot(accuracies['val'], label='Validation Accuracy')
plt.legend(loc='lower right')
plt.ylabel('Accuracy')
plt.title('Training and Validation Accuracy')

plt.subplot(2, 1, 2)
plt.plot(losses['train'], label='Training Loss')
plt.plot(losses['val'], label='Validation Loss')
plt.legend(loc='upper right')
plt.ylabel('Cross Entropy')
plt.ylim([0,1.0])
plt.title('Training and Validation Loss')
plt.xlabel('epoch')
plt.show()

--------------------------------------------------------------------------------
# **TEST THE MODEL ON THE TEST SET**
Here, we evaluate our final model on the test set. Since it would have been tricky to test the videos separately, we average the softmax over all the frames. This is equivalent to averaging the predictions per video, as long as the number of frames taken from each video is the same.

We also report a separate accuracy for the original and the altered sets to highlight how the model performs on these two types of videos.
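
The equivalence between frame-level and video-level averaging holds only when every video contributes the same number of frames. The sketch below checks it on made-up per-frame scores; the helper names and the numbers are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def frame_level(videos):
    """Average over all frames pooled together (what the test loop does)."""
    return mean([p for frames in videos.values() for p in frames])

def video_level(videos):
    """Average of the per-video averages."""
    return mean([mean(frames) for frames in videos.values()])

# per-frame true-class probabilities (%) for two hypothetical videos
equal = {"video_a": [90.0, 80.0, 70.0], "video_b": [60.0, 50.0, 40.0]}
unequal = {"video_a": [90.0, 80.0, 70.0], "video_b": [40.0]}

print(frame_level(equal), video_level(equal))      # 65.0 65.0
print(frame_level(unequal), video_level(unequal))  # 70.0 60.0
```

In the notebook a video can yield fewer than 30 frames when face detection fails or the video is short, so the two averages may differ slightly in practice.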

In [None]:
# set the model in evaluation mode
model.eval()
with torch.no_grad():
  fake_accuracy =0
  fake_video = 0
  original_accuracy = 0
  original_video = 0
  video_accuracy = 0
  
  # iterate for each batch
  for data in test_loader:
    
    # take the input and send it to the GPU
    images, labels = data[0].to(device), data[1].to(device)

    # get the predictions from the model 
    outputs = model(images)

    # get the probabilities and compute the accuracy
    probabilities = torch.nn.functional.softmax(outputs, dim=1)*100
    _, indices = torch.sort(outputs, descending = True)
    
    for i in range(len(probabilities)):
      for j in indices[i][:2]:
        if j==labels[i].item():
          video_accuracy += probabilities[i][j].item()
          # label 0 is 'altered': ImageFolder sorts class names alphabetically
          if labels[i].item() == 0:
            fake_accuracy += probabilities[i][j].item()
            fake_video +=1
          else:
            original_accuracy += probabilities[i][j].item()
            original_video +=1
  
  # print the results
  fake_accuracy = round(fake_accuracy/fake_video,2)
  print("Total Accuracy for fake videos is: {}".format(fake_accuracy))
  print()
  original_accuracy = round(original_accuracy/original_video,2)
  print("Total Accuracy for original videos is: {}".format(original_accuracy))
  print()
  total_accuracy = round(video_accuracy/(fake_video+original_video),2)
  print("Total Accuracy is: {}".format(total_accuracy))