# Preprocessing

In this notebook we read in the training videos and extract n frames from each one. We then run face detection to find every face in those frames and write each face out as its own image (after resizing).

No augmentation is done in this notebook, which leaves us the option of applying it after the raw face images have been written. That way we can try numerous augmentation techniques without having to extract the frames again, and we apply each technique to the same raw images every time (which gives us a more reliable testing environment).

We use OpenCV to read the videos, extract the frames and resize the face crops. The [MTCNN algorithm](https://github.com/ipazc/mtcnn) is used for face detection. It is effective, but I am keen to try quicker, more lightweight detectors such as BlazeFace and YOLOv2 later on.
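
As a rough illustration of the detection step: the `mtcnn` package's `detect_faces` returns a list of dictionaries, each with a `box` (`[x, y, width, height]`) and a `confidence` score. The sketch below uses hand-made sample detections rather than real model output, and `select_face_boxes` is a hypothetical helper name, not part of MTCNN. It keeps the confident detections and clips each box to the frame, since reported coordinates can fall slightly outside it.

```python
def select_face_boxes(detections, frame_width, frame_height, conf_level=0.75):
    """Return clipped (x1, y1, x2, y2) boxes for detections at or above conf_level."""
    boxes = []
    for det in detections:
        if det['confidence'] < conf_level:
            continue
        x, y, w, h = det['box']
        # Clip the box to the frame so that slicing never uses negative indices
        x1, y1 = max(x, 0), max(y, 0)
        x2, y2 = min(x + w, frame_width), min(y + h, frame_height)
        if x2 > x1 and y2 > y1:
            boxes.append((x1, y1, x2, y2))
    return boxes


# Hand-made detections in the same shape as MTCNN's output (illustrative values)
sample_detections = [
    {'box': [-5, 10, 100, 120], 'confidence': 0.99},      # spills past the left edge
    {'box': [300, 200, 80, 90], 'confidence': 0.40},      # below the threshold
    {'box': [1850, 1000, 100, 100], 'confidence': 0.90},  # spills past bottom-right
]

boxes = select_face_boxes(sample_detections, frame_width=1920, frame_height=1080)
print(boxes)
```

Each returned box can then be used to crop a frame with `frame[y1:y2, x1:x2]`.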

------------------------------
*PLEASE NOTE*:
The scripts in this notebook have been designed for the FULL training dataset on [Kaggle](https://www.kaggle.com/c/deepfake-detection-challenge). Expect some path- and folder-related errors if you attempt this using the train_sample data.

In [1]:
#!pip install ../input/mtcnn-package/mtcnn-0.1.0-py3-none-any.whl

import pandas as pd
import numpy as np

import os
import sys
import shutil

import cv2
from mtcnn import MTCNN

from tqdm.notebook import tqdm
import random

import warnings
warnings.filterwarnings("ignore")

Using TensorFlow backend.


First we define our directory paths, including the directory where we will save the train images that we extract from the videos.

In [2]:
train_videos_path = '../input/train_videos/'
train_metadata_path = '../input/train_metadata/'
train_images_path = "../input/train_images/" # path to save train images to

We'll loop through all of the videos in all the train folder locations to make one list of paths. We will also rename each metadata file (so we can tell which folder it came from) and copy it to a new directory, 'train_metadata'.

In [3]:
train_videos_files = [] # List of all train videos paths
train_metadata_files = [] # List of train metadata paths

# Make sure the metadata directory exists before copying into it
os.makedirs(train_metadata_path, exist_ok=True)

for idx, folder in enumerate(os.listdir(train_videos_path)):
    for file in os.listdir(train_videos_path + folder):
        if file == 'metadata.json':
            # Rename and copy the metadata to a new directory
            old_path = train_videos_path + folder + '/' + file
            new_path = train_metadata_path + 'metadata' + str(idx) + '.json'
            shutil.copy(old_path, new_path)
            train_metadata_files.append(new_path)
        else:
            train_videos_files.append(train_videos_path + folder + '/' + file)
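
For anyone who wants to try the folder walk on a small scale first, here is a minimal sketch of the same idea using `pathlib` on a throwaway directory tree. The helper name `collect_videos` and the `chunk_a`/`chunk_b` folder names are made up for illustration and are not the real dataset layout. Sorting the folders is an added assumption, so that the folder-to-index mapping is the same on every run.

```python
import shutil
import tempfile
from pathlib import Path


def collect_videos(videos_root, metadata_dir):
    """Return (video_paths, metadata_paths) for a root of per-chunk folders."""
    videos_root, metadata_dir = Path(videos_root), Path(metadata_dir)
    metadata_dir.mkdir(parents=True, exist_ok=True)
    video_paths, metadata_paths = [], []
    # Sort so that the folder-to-index mapping is the same on every run
    folders = sorted(p for p in videos_root.iterdir() if p.is_dir())
    for idx, folder in enumerate(folders):
        for file in sorted(folder.iterdir()):
            if file.name == 'metadata.json':
                # Copy the metadata under a name that records its folder index
                target = metadata_dir / f'metadata{idx}.json'
                shutil.copy(file, target)
                metadata_paths.append(str(target))
            else:
                video_paths.append(str(file))
    return video_paths, metadata_paths


# Tiny throwaway directory tree standing in for the real dataset
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / 'train_videos'
    for chunk in ('chunk_a', 'chunk_b'):
        (root / chunk).mkdir(parents=True)
        (root / chunk / 'metadata.json').write_text('{}')
        (root / chunk / f'{chunk}_clip.mp4').write_bytes(b'')
    videos, metadata = collect_videos(root, Path(tmp) / 'train_metadata')
    video_names = [Path(v).name for v in videos]
    metadata_names = [Path(m).name for m in metadata]

print(video_names, metadata_names)
```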

Now we loop over all the videos and extract face images from each one.

In [4]:
def extract_faces(videos_dir_path, images_dir_path, frames=1, conf_level=0.75):
    """
    Inputs a directory of videos (or a list of video paths), extracts n frames. 
    Outputs images of ANY faces detected in those frames.
        
    videos_dir_path: (str or list) Path to your directory of videos, or a list of video file paths
    images_dir_path: (str) Path to where you'll save your images to
    frames: (int) Extract n number of random frames from video.
    conf_level: (float) Minimum detection confidence for a face to be saved.
    """
    if isinstance(videos_dir_path, list):
        videos_dir = videos_dir_path
    else:
        # Build full paths so that both input types are handled the same way
        videos_dir = [os.path.join(videos_dir_path, f) for f in os.listdir(videos_dir_path)]
    # cv2.imwrite returns False rather than raising if the directory is missing
    os.makedirs(images_dir_path, exist_ok=True)
    # Extract images from videos
    print(f'Extracting {frames} random frame(s) from videos')
    detector = MTCNN() # Face detection algorithm
    
    for file_path in tqdm(videos_dir):
        file_name = os.path.basename(file_path)
        vid_name = os.path.splitext(file_name)[0]
        frames_list = [] # We'll store the raw frames here
        try:
            # Open the video once and sample random frames from it
            cap = cv2.VideoCapture(file_path)
            total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
            for _ in range(frames):
                # randint includes both ends, so the last valid index is total_frames - 1
                frame_number = random.randint(0, max(total_frames - 1, 0))
                cap.set(cv2.CAP_PROP_POS_FRAMES, frame_number)
                success, frame = cap.read()
                if success:
                    frames_list.append(frame)
            cap.release()
            
            ## USE BELOW FOR TESTING - IS NOT SCALABLE FOR TRAIN EXTRACTION ## 
            
            ## Number to get every nth frame from. E.g. if we want
            ## 10 frames, and the video has 300 total frames, 
            ## we take every 30th frame.
            #num_frames = total_frames / frames 
            #success, frame = cap.read()
            #count = 0
            #while success:
            #    if count % num_frames == 0:
            #        frames_list.append(frame)
            #    success, frame = cap.read()
            #    count += 1
            #    if count > num_frames:
            #        break
            
            ## USE ABOVE FOR TESTING - IS NOT SCALABLE FOR TRAIN EXTRACTION ## 
            
            # Extract faces from frames (done once, after all frames are collected)
            for frame_idx, image in enumerate(frames_list):
                frame_name = vid_name + '_' + str(frame_idx)
                # MTCNN expects RGB input, whereas OpenCV reads frames as BGR
                result = detector.detect_faces(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
                # Extract and save faces as their own images
                for face_idx, face in enumerate(result):
                    # Only extract the face if its confidence is at least conf_level
                    if face['confidence'] >= conf_level:
                        startX, startY, width, height = face['box'] # Get box coordinates
                        # Box coordinates can be slightly negative, so clamp them to the frame
                        startX, startY = max(startX, 0), max(startY, 0)
                        img_crop = image[startY:startY + height, startX:startX + width]
                        img_crop_resize = cv2.resize(img_crop, (256, 256))
                        
                        crop_img_name = images_dir_path + frame_name + '_' + str(face_idx) + '.jpg'
                        cv2.imwrite(crop_img_name, img_crop_resize)
        except Exception as e:
            # Skip videos that cannot be read or processed, but report them
            print(f'Skipping {file_path}: {e}')
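
The two frame-selection ideas in the function above (random frames, and the commented-out "every nth frame" approach) can be sketched as plain index calculations, independent of OpenCV. Both helper names below are hypothetical. Sampling without replacement is an added assumption, so that the same frame is never picked twice for one video.

```python
import random


def random_frame_indices(total_frames, n, seed=None):
    """Pick up to n distinct frame indices in [0, total_frames - 1], sorted."""
    rng = random.Random(seed)
    n = max(min(n, total_frames), 0)
    return sorted(rng.sample(range(max(total_frames, 0)), n))


def evenly_spaced_frame_indices(total_frames, n):
    """Pick up to n indices spread evenly through the video (every k-th frame)."""
    n = max(min(n, total_frames), 0)
    if n == 0:
        return []
    step = total_frames // n
    return [i * step for i in range(n)]


# 10 frames from a 300-frame video: every 30th frame
print(evenly_spaced_frame_indices(300, 10))
# 3 distinct random frames; a fixed seed makes the choice repeatable
print(random_frame_indices(300, 3, seed=0))
```

Sorting the random indices means any subsequent seeks move forward through the video, which may or may not help depending on the codec and backend.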

In [5]:
extract_faces(train_videos_files, train_images_path, frames=1)

Extracting 1 random frame(s) from videos


HBox(children=(FloatProgress(value=0.0, max=119146.0), HTML(value='')))




We have completed the face extraction and image preprocessing stage for the training data. We should now have a directory of images that we will train our model with.

This is by no means a perfect solution: even with a GPU, it took ~1.5 days to run on the entire training set, and I ended up with ~240,000 images.
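
To put those numbers in perspective, here is a quick back-of-the-envelope calculation. It takes the 119,146 total shown by the progress bar as the number of videos, and assumes the ~240,000 images all came from that run; both figures are approximate.

```python
num_videos = 119_146                  # progress-bar total from the run
num_images = 240_000                  # approximate number of face images written
elapsed_seconds = 1.5 * 24 * 60 * 60  # ~1.5 days

videos_per_second = num_videos / elapsed_seconds
seconds_per_video = elapsed_seconds / num_videos
images_per_video = num_images / num_videos

print(f'{videos_per_second:.2f} videos per second')
print(f'{seconds_per_video:.2f} seconds per video')
print(f'{images_per_video:.2f} face images per video')
```

That works out to roughly one video per second and about two face images per video.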

Ideally, extracting n frames would not require working through large parts of each video, which is where most of the time goes. However, I haven't yet found a way to pluck out just the desired frames, and not for lack of trying! Video codecs are complicated, and I have not found a way to do this efficiently with OpenCV.

We'll revisit this code in the test stage, when we create our test pipeline.

The next stage of this project is the [Train notebook](https://github.com/TheNerdyCat/deepfake-detection-challenge/blob/master/output/train.ipynb)