# frame_extraction
We've broken each film down into individual image frames, performed some basic cinematic analyses on them, and stored the resulting metadata in DataFrames. We now want to use these analyses to select specific frames for future model training, such as all frames with a specific color palette or with a certain number of characters present.

For the purposes of training CNNs and GANs with Keras/TensorFlow, we'll want to copy the selected frame images into a single folder. We can use `os` and other Python libraries to navigate the directory structure and copy files around.
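
As a minimal sketch of that copy step, the helper below gathers a list of frame files from one film's directory into a single destination folder. The function name `copy_frames` and its parameters are hypothetical, and it assumes the caller already knows the image filenames; this notebook does not show how frames are named on disk.

```python
import os
from shutil import copy


def copy_frames(src_dir, dst_dir, filenames):
    """Copy the named frame images from src_dir into dst_dir.

    Hypothetical helper: returns the list of destination paths
    for the files that were actually copied.
    """
    # Create the destination folder if it does not exist yet.
    os.makedirs(dst_dir, exist_ok=True)
    copied = []
    for name in filenames:
        src_path = os.path.join(src_dir, name)
        # Skip frames that are missing on disk instead of raising.
        if os.path.isfile(src_path):
            copied.append(copy(src_path, dst_dir))
    return copied
```

Skipping missing files keeps a long copy job from aborting halfway; comparing `len(copied)` with `len(filenames)` afterwards shows whether anything was left out.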

In [5]:
import os
from distutils.dir_util import copy_tree
from shutil import copy, move, rmtree
import pandas as pd

Each movie's frames are stored in its own directory under `frame_per_second/`.

In [2]:
%ls ../frame_per_second/

[0m[01;34mall_is_true_2018[0m/
[01;34mbefore_sunset_2004[0m/
[01;34mblack_and_blue_2019[0m/
[01;34mbooksmart_2019[0m/
[01;34mextremely_wicked_shockingly_evil_and_vile_2019[0m/
[01;34mford_v_ferrari_2019[0m/
[01;34mjoker_2019[0m/
[01;34mknives_out_2019[0m/
[01;34mlost_in_translation_2003[0m/
[01;34moceans_eleven_2001[0m/
[01;34monce_upon_a_time_in_hollywood_2019[0m/
[01;34mplus_one_2019[0m/
[01;34msecond_act_2018[0m/
[01;34mthe_hustle_2019[0m/
[01;34mtoy_story_4_2019[0m/


### Medium Close-Ups
Let's say we want to compile movie frames that show a close-up of a single character. Most two-character dialogue scenes make extensive use of medium close-ups of each character.

We'll start with *The Hustle* (2019).

In [7]:
import sys
sys.path.append('../data_serialization')
from serialization_preprocessing_io import *

In [9]:
film = 'the_hustle_2019'
srt_df, subtitle_df, sentence_df, vision_df, face_df = read_pickle(film)

First, we look at frames with `prim_char_flag` equal to 1. These are frames that satisfy one of two conditions:
- There is one face in the frame
- There are multiple faces in the frame, but one face is much larger than the others (each other face is 75% or less of its size)
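
The two conditions above can be sketched as a small function. This is an illustrative reconstruction, not the code that produced `prim_char_flag`: the function name, the `ratio` parameter, and the reading of the 75% rule (every other face is at most 75% of the largest face) are assumptions.

```python
def primary_character_flag(face_sizes, ratio=0.75):
    """Return 1 if the frame has a single dominant face, else 0.

    face_sizes: list of face sizes for one frame (any consistent unit).
    ratio: assumed cutoff; other faces must be <= ratio * largest face.
    """
    if len(face_sizes) == 0:
        return 0  # no faces, so no primary character
    if len(face_sizes) == 1:
        return 1  # a single face is trivially the primary character
    ordered = sorted(face_sizes, reverse=True)
    # Only the second-largest face needs checking: if it is small
    # enough relative to the largest, every other face is too.
    return int(ordered[1] <= ratio * ordered[0])
```

For example, `primary_character_flag([8.0, 6.0])` returns 1 because the smaller face is exactly 75% of the larger, while `primary_character_flag([8.0, 7.0])` returns 0.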

In [26]:
face_df[face_df['prim_char_flag'] == 1][100:104]

frame,face_locations,face_sizes,face_encodings,faces_found,prim_char_flag,p_center_alignment,p_horizontal_distance,p_open_mouth,p_face_cluster,p_emotion
194,"[(102, 547, 263, 387)]",[8.37],"[[-0.1602196991443634, 0.05712738633155823, 0....",1,1,,,1.0,28.0,surprise
195,"[(102, 405, 263, 244)]",[8.48],"[[-0.1326746940612793, 0.05337563902139664, 0....",1,1,left,183.0,1.0,10.0,fear
196,"[(100, 366, 233, 233)]",[5.79],"[[-0.0987188071012497, 0.0743609368801117, 0.0...",1,1,left,194.0,1.0,6.0,surprise
197,"[(66, 654, 227, 494)]",[8.37],"[[-0.18016411364078522, 0.09841547906398773, 0...",1,1,right,227.0,0.0,28.0,surprise


In [27]:
len(face_df[face_df['prim_char_flag'] == 1])

3163

In [31]:
face_df[face_df['face_sizes'][0] == 1]

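KeyError: 0

This lookup fails because `face_df['face_sizes'][0]` asks for the row whose index label is 0, and the index holds frame numbers. It does not take the first element of each row's `face_sizes` list, so the comparison has to be done element-wise instead.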

Next, we'll want to make sure the sizes of the faces are at least a certain percentage of the overall frame size. We'll set this threshold at 4%.
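
A minimal sketch of this size filter on a toy DataFrame follows. The toy values, the name `MIN_FACE_PCT`, and the helper `largest_face` are made up for illustration; it assumes each `face_sizes` entry is a Python list of percentages and that the primary character is the largest face in the frame.

```python
import pandas as pd

# Toy stand-in for face_df, indexed by frame number (values are illustrative).
toy_face_df = pd.DataFrame(
    {
        "face_sizes": [[8.37], [2.10], [5.79, 1.20], []],
        "prim_char_flag": [1, 1, 1, 0],
    },
    index=pd.Index([194, 195, 196, 197], name="frame"),
)

MIN_FACE_PCT = 4  # minimum face size as a percentage of the frame


def largest_face(sizes):
    """Size of the largest face in one frame, or 0.0 if there are none."""
    return max(sizes, default=0.0)


# Apply the helper row by row, then combine both conditions element-wise.
largest = toy_face_df["face_sizes"].apply(largest_face)
medium_close_ups = toy_face_df[
    (toy_face_df["prim_char_flag"] == 1) & (largest > MIN_FACE_PCT)
]
selected_frames = medium_close_ups.index.tolist()
```

Here `selected_frames` ends up as `[194, 196]`: frame 195 has a primary character but its face covers only 2.10% of the frame, and frame 197 has no faces at all.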

In [30]:
# .str[0] takes the first element of each row's face_sizes list (assumes the values are lists)
face_df[(face_df['prim_char_flag'] == 1) & (face_df['face_sizes'].str[0] > 4)]

In [21]:
hustle_prim_char_indices = face_df[face_df['prim_char_flag'] == 1].index.tolist()
len(hustle_prim_char_indices)

3163