## Preparing Data

By: Alex Comerford (alexanderjcomerford@gmail.com)

In this notebook we will be preparing and organizing data for our machine learning model. This model is based off of neural networks, specifically using the Generative Adversarial Network architecture. The data we will be providing to this network will be image based data extracted from a video.

Our input format is `webm` and our desired output format is `png`.

#### Environment setup

First we will install some high level dependencies for our data extraction process. In this case we will be using the `ffmpeg` library to extract images from our `webm` input format.

## Import dependencies

Now that we have our environment setup we will be importing all the dependencies we will need to extract images from our videos

In [1]:
import os
import dlib
import ffmpeg
import shutil
import PIL.Image
import IPython.display
import numpy as np
import cv2

## Paramaters

In this notebook we will be defining the high level parameters to be used throughout the rest of the notebook. In this case we only need to define 3. The input webm file, the output save path, and the number of images to produce.

In [2]:
FILE                     = "./data/raw/2018-12-02-215035.webm"
ORIGINAL_SAVE_PATH       = "./data/original_images/"
LANDMARK_SAVE_PATH       = "./data/landmark_images/"
FACE_LANDMARK_SHAPE_FILE = "./models/shape_predictor_68_face_landmarks.dat"
MAX_NUM_IMAGES           = 5000
DOWNSAMPLE_RATIO         = 2
input_filetype           = ".jpg"

## About the input file

In this next cell we will get some high level informatino about our input video including the number of total frames in the video, location, and size

In [3]:
## Use ffmpeg to get the number of frames
def get_num_frames(input_file):
    frames = !ffmpeg -i {FILE} -map 0:v:0 -c copy -f null -y /dev/null 2>&1 | grep -Eo 'frame= *[0-9]+ *' | grep -Eo '[0-9]+' | tail -1
    frames = int(frames[0])
    return frames

## Extract informatino about input file
probe = ffmpeg.probe(FILE)
video = ffmpeg.input(FILE)
video_info = next(s for s in probe['streams'] if s['codec_type'] == 'video')
width = int(video_info['width'])
height = int(video_info['height'])
num_frames = get_num_frames(FILE)

print ("File Location = ", FILE)
print ("File Size     = ", os.path.getsize(FILE))
print ("Num Frames    = ", num_frames)
print ("Width         = ", width)
print ("Height        = ", height)

File Location =  ./data/raw/2018-12-02-215035.webm
File Size     =  53890813
Num Frames    =  11743
Width         =  640
Height        =  480


In [4]:
def get_n_frame(video_input, n):
    '''get_n_frame
    
    Given an ffmpeg video input source, return the
    nth frame as a numpy array
    '''

    out, _ = (
        video_input.filter('select', 'gte(n,{})'.format(n))
        .output('pipe:', format='rawvideo', pix_fmt='rgb24', vframes=1)
        .run(capture_stdout=True, capture_stderr=True)
    )
    extracted_frame = np.frombuffer(out, np.uint8).reshape([height, width, 3])
    return extracted_frame

def save_n_frame(video_input, save_path, n):
    '''save_n_frame
    
    Given an ffmpeg video input source save, the nth
    frame as a jpg
    '''
    video_input.filter('select', 'gte(n,{})'.format(n))\
               .output(os.path.join(save_path,'%d.jpg'%n), 
                       vframes=1, 
                       format='image2', 
                       vcodec='mjpeg')\
               .overwrite_output()\
               .run()

In [5]:
def extract_landmarks(input_frame):
    def reshape_for_polyline(array):
        return np.array(array, np.int32).reshape((-1, 1, 2))
    
    frame_resize = cv2.resize(input_frame, None, fx=1 / DOWNSAMPLE_RATIO, fy=1 / DOWNSAMPLE_RATIO)
    gray = cv2.cvtColor(frame_resize, cv2.COLOR_BGR2GRAY)
    faces = detector(gray, 1)
    black_image = np.zeros(input_frame.shape, np.uint8)

    # Perform if there is a face detected
    if len(faces) >= 1:
        for face in faces:
            detected_landmarks = predictor(gray, face).parts()
            landmarks = [[p.x * DOWNSAMPLE_RATIO, p.y * DOWNSAMPLE_RATIO] for p in detected_landmarks]

            jaw = reshape_for_polyline(landmarks[0:17])
            left_eyebrow = reshape_for_polyline(landmarks[22:27])
            right_eyebrow = reshape_for_polyline(landmarks[17:22])
            nose_bridge = reshape_for_polyline(landmarks[27:31])
            lower_nose = reshape_for_polyline(landmarks[30:35])
            left_eye = reshape_for_polyline(landmarks[42:48])
            right_eye = reshape_for_polyline(landmarks[36:42])
            outer_lip = reshape_for_polyline(landmarks[48:60])
            inner_lip = reshape_for_polyline(landmarks[60:68])

            color = (255, 255, 255)
            thickness = 3

            cv2.polylines(black_image, [jaw], False, color, thickness)
            cv2.polylines(black_image, [left_eyebrow], False, color, thickness)
            cv2.polylines(black_image, [right_eyebrow], False, color, thickness)
            cv2.polylines(black_image, [nose_bridge], False, color, thickness)
            cv2.polylines(black_image, [lower_nose], True, color, thickness)
            cv2.polylines(black_image, [left_eye], True, color, thickness)
            cv2.polylines(black_image, [right_eye], True, color, thickness)
            cv2.polylines(black_image, [outer_lip], True, color, thickness)
            cv2.polylines(black_image, [inner_lip], True, color, thickness)
            
        return input_frame, black_image
    return (False, False)

## Extracting frames/images

In the next cell we will be using ffmpeg in python to extract MAX_NUM_IMAGES of images into our output directory. Afterwards we will rearchetest the entire directory structure in a tree format. This seems weird but is easier to batch files in sub directories

In [13]:
def prepare_dataset(input_file, 
                    original_save_path, 
                    landmark_save_path,
                    max_num_images,
                    log_every=100):
    
    ## make dir if doesn't exist
    os.makedirs(original_save_path, exist_ok=True)
    os.makedirs(landmark_save_path, exist_ok=True)
    
    ## Func to count files in dir
    num_files_in_dir = lambda path: sum(list(len(f) for _,_,f in os.walk(path)))
    
    ## Divisor for number of frames seperated between iterations
    divisor = 1 if not int(num_frames / max_num_images) \
                else int(num_frames / max_num_images)
    
    ## Check if dataset already exists
    if (num_files_in_dir(original_save_path) >= (num_frames / divisor)) and \
       (num_files_in_dir(landmark_save_path) >= (num_frames / divisor)):
        print ("Dataset already created, returning ...")
        return
    
    successful_extractions = 0
    failed_extractions = 0
    for i in range(0,num_frames,divisor):

            ## Log images
            if i%log_every==0:
                print ("--------------------------")
                print ("%d iterations..."%i)
                print ("%d successful_extractions"%successful_extractions)
                print ("%d failed_extractions"%failed_extractions)

            ## make numeric group directories
            original_save_path_group = os.path.join(original_save_path, '%s'%str(i)[0])
            landmark_save_path_group = os.path.join(landmark_save_path, '%s'%str(i)[0])
            os.makedirs(original_save_path_group, exist_ok=True)
            os.makedirs(landmark_save_path_group, exist_ok=True)
            
            if len(str(i)) > 1:
                original_save_path_group = os.path.join(original_save_path_group, '%s'%str(i)[1])
                landmark_save_path_group = os.path.join(landmark_save_path_group, '%s'%str(i)[1])
                os.makedirs(original_save_path_group,exist_ok=True)
                os.makedirs(landmark_save_path_group,exist_ok=True)            
            
            ## extract landmarks and frame
            extracted_frame = get_n_frame(video, i)
            extracted_frame, extracted_landmark = extract_landmarks(extracted_frame)
            
            if type(extracted_frame) == bool:
                failed_extractions += 1
            else:
                
                try:
                    save_n_frame(video, original_save_path_group, i)
                    cv2.imwrite(os.path.join(landmark_save_path_group, "%d.jpg"%i), 
                                extracted_landmark)
                except Exception as e:
                    if os.path.exists(os.path.join(landmark_save_path_group, "%d.jpg"%i)) and \
                       os.path.exists(os.path.join(original_save_path_group,'%d.jpg'%i)):
                        successful_extractions += 1
                    else:
                        failed_extractions += 1

In [None]:
# Create the face predictor and landmark predictor
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor(FACE_LANDMARK_SHAPE_FILE)

prepare_dataset(FILE, 
                ORIGINAL_SAVE_PATH, 
                LANDMARK_SAVE_PATH,
                MAX_NUM_IMAGES)

--------------------------
0 iterations...
0 successful_extractions
0 failed_extractions
--------------------------
100 iterations...
1 successful_extractions
45 failed_extractions
--------------------------
200 iterations...
1 successful_extractions
95 failed_extractions
