# frame_analysis
This notebook explores several computer vision analyses that operate on individual frames. They support broader *Moviegoer* goals such as dialogue attribution and character identification.

In [1]:
import os
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering
import face_recognition
from scene_cluster_io import *
from keras import models
import math
import pandas as pd  # used below to build scene_df
import tensorflow as tf

Using TensorFlow backend.


# Mouth Open or Closed
A major goal of the project is dialogue attribution: determining which character is speaking. This is easy for humans but difficult for machines.

Films usually show whoever is currently speaking, but sometimes it's more important to show a character listening and reacting to dialogue. If we can determine that the onscreen character's mouth is open, we can reasonably assume that they're the one speaking.

In [2]:
film = 'hobbs_shaw'
frame = 766
dialogue_folder = os.path.join('dialogue_frames', film)
img_path = os.path.join(dialogue_folder, film + '_frame' + str(frame) + '.jpg')

We'll use the face_recognition library to find a face in an individual movie frame. Then we'll take a closer look at the position of the face landmarks.

In [3]:
image = face_recognition.load_image_file(img_path)
face_locations = face_recognition.face_locations(image)

print('Found ' + str(len(face_locations)) + ' face(s) in frame ' + str(frame))

Found 1 face(s) in frame 766


In [4]:
face_landmarks_list = face_recognition.face_landmarks(image, face_locations)

In [5]:
face_landmarks = face_landmarks_list[0]

With the locations of all the face landmarks, we can take a closer look at the character's mouth, specifically at the locations of the top and bottom lip. Knowing these, we can compare the gap between the lips to the thickness of the lips themselves. If the gap exceeds a set fraction of the thinner lip's height, we declare the mouth open.

In [6]:
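def get_lip_height(lip):
    # Average distance between outer-edge points 2, 3, 4 and the
    # inner-edge points opposite them (10, 9, 8) on the same lip.
    total = 0  # initialized once, outside the loop, so all three pairs count
    for i in [2,3,4]:
        distance = math.sqrt( (lip[i][0] - lip[12-i][0])**2 +
                              (lip[i][1] - lip[12-i][1])**2   )
        total += distance
    return total / 3


def get_mouth_height(top_lip, bottom_lip):
    # Average gap between the inner edge of the top lip (points 8, 9, 10)
    # and the inner edge of the bottom lip opposite them (10, 9, 8).
    total = 0
    for i in [8,9,10]:
        distance = math.sqrt( (top_lip[i][0] - bottom_lip[18-i][0])**2 +
                              (top_lip[i][1] - bottom_lip[18-i][1])**2   )
        total += distance
    return total / 3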


def mouth_open_check(face_landmarks, open_ratio=.8):
    top_lip = face_landmarks['top_lip']
    bottom_lip = face_landmarks['bottom_lip']
    
    top_lip_height =    get_lip_height(top_lip)
    bottom_lip_height = get_lip_height(bottom_lip)
    mouth_height =      get_mouth_height(top_lip, bottom_lip)

    if mouth_height > min(top_lip_height, bottom_lip_height) * open_ratio:
        return 1
    else:
        return 0

In [7]:
mouth_open_check(face_landmarks)

1
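To see how the ratio test behaves, here is a small self-contained sketch on synthetic landmarks. It assumes the 12-point `top_lip` / `bottom_lip` layout returned by face_recognition (outer-edge points first, then inner-edge points) and averages all three point pairs for each measurement. The helper names (`lip_thickness`, `mouth_gap`, `is_mouth_open`, `make_lips`) and the coordinates are made up for illustration and are not part of the notebook's pipeline.

```python
import math


def _dist(p, q):
    # Euclidean distance between two (x, y) points.
    return math.hypot(p[0] - q[0], p[1] - q[1])


def lip_thickness(lip):
    # Outer-edge points 2, 3, 4 pair with inner-edge points 10, 9, 8.
    return sum(_dist(lip[i], lip[12 - i]) for i in (2, 3, 4)) / 3


def mouth_gap(top_lip, bottom_lip):
    # Inner edge of the top lip (8, 9, 10) against the inner edge
    # of the bottom lip (10, 9, 8).
    return sum(_dist(top_lip[i], bottom_lip[18 - i]) for i in (8, 9, 10)) / 3


def is_mouth_open(top_lip, bottom_lip, open_ratio=0.8):
    thinner = min(lip_thickness(top_lip), lip_thickness(bottom_lip))
    return int(mouth_gap(top_lip, bottom_lip) > thinner * open_ratio)


def make_lips(gap, top_thickness=4, bottom_thickness=5):
    # Hypothetical flat lips: every edge is a horizontal line, so each
    # measured distance is simply a difference in y.
    top_outer_y = 10
    top_inner_y = top_outer_y + top_thickness
    bottom_inner_y = top_inner_y + gap
    bottom_outer_y = bottom_inner_y + bottom_thickness
    # Top lip: outer edge left to right, then inner edge right to left.
    top_lip = ([(x, top_outer_y) for x in range(7)] +
               [(x, top_inner_y) for x in range(5, 0, -1)])
    # Bottom lip: outer edge right to left, then inner edge left to right.
    bottom_lip = ([(x, bottom_outer_y) for x in range(6, -1, -1)] +
                  [(x, bottom_inner_y) for x in range(1, 6)])
    return top_lip, bottom_lip


print(is_mouth_open(*make_lips(gap=1)))  # 0: gap of 1 is below 0.8 * 4 = 3.2
print(is_mouth_open(*make_lips(gap=6)))  # 1: gap of 6 exceeds 3.2
```

Because the gap is compared against lip thickness rather than a pixel count, the same `open_ratio` applies whether the face fills the frame or sits far from the camera.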

We can add this result to the per-frame DataFrame that was originally created during scene clustering. Below I've manually designated a scene to analyze. We cluster its frames into shots, assign each shot a unique Shot ID, and predict whether each frame is a Medium Close-Up (MCU).

In [8]:
film = 'hobbs_shaw'
frame_choice = list(range(766, 823))
threshold = 3000

dialogue_folder = os.path.join('dialogue_frames', film)
print('There are', len(os.listdir(dialogue_folder)), 'images in the folder')
print('Selected', len(frame_choice), 'of those frames')

hac_labels = label_clusters(dialogue_folder, frame_choice, film, threshold)

There are 8194 images in the folder
Selected 57 of those frames
Number of clusters: 4


In [9]:
tuned_model = models.load_model('saved_models/tuned_model')

In [10]:
y_pred_values = predict_mcu(dialogue_folder, tuned_model, frame_choice, film)
shot_id_list = get_shot_ids(frame_choice, hac_labels)
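`get_shot_ids` is imported from `scene_cluster_io`, and its implementation isn't shown in this notebook. One plausible reading is that each run of consecutive frames sharing a cluster label becomes a new shot, which is consistent with the first rows of the DataFrame built later (clusters 3, 3, 3, 3, 2 map to shot IDs 0, 0, 0, 0, 1). The sketch below illustrates that idea with a hypothetical helper, `shot_ids_from_labels`; it is an assumption, not the project's actual function.

```python
def shot_ids_from_labels(labels):
    """Give each run of consecutive identical cluster labels its own shot ID."""
    shot_ids = []
    current_id = -1
    previous = object()  # sentinel that never equals a real label
    for label in labels:
        if label != previous:
            # The cluster label changed, so a new shot starts here.
            current_id += 1
            previous = label
        shot_ids.append(current_id)
    return shot_ids


print(shot_ids_from_labels([3, 3, 3, 3, 2, 2, 3]))  # [0, 0, 0, 0, 1, 1, 2]
```

Under this scheme a cluster that reappears later (the final 3 above) gets a fresh shot ID, so the cluster label identifies the camera setup while the shot ID identifies one uninterrupted stretch of it.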

Now we can check whether a character's mouth is open in each frame. We run through each frame, appending a 0 or 1 to `mouth_open_list`; frames where no face landmarks are found are recorded as 0. This list will be zipped into a DataFrame along with our other frame data.

In [15]:
mouth_open_list = []

for x in frame_choice:
    img_path = os.path.join(dialogue_folder, film + '_frame' + str(x) + '.jpg')
    image = face_recognition.load_image_file(img_path)
    face_locations = face_recognition.face_locations(image, number_of_times_to_upsample=1)
    face_landmarks_list = face_recognition.face_landmarks(image, face_locations)
    print('Found landmarks for ' + str(len(face_landmarks_list)) + ' face(s) in frame ' + str(x))

    if face_landmarks_list:
        face_landmarks = face_landmarks_list[0]
        mouth_open_list.append(mouth_open_check(face_landmarks))
    else:
        mouth_open_list.append(0)

Found landmarks for 1 face(s) in frame 766
Found landmarks for 1 face(s) in frame 767
Found landmarks for 0 face(s) in frame 768
Found landmarks for 1 face(s) in frame 769
Found landmarks for 1 face(s) in frame 770
Found landmarks for 1 face(s) in frame 771
Found landmarks for 1 face(s) in frame 772
Found landmarks for 1 face(s) in frame 773
Found landmarks for 0 face(s) in frame 774
Found landmarks for 1 face(s) in frame 775
Found landmarks for 1 face(s) in frame 776
Found landmarks for 1 face(s) in frame 777
Found landmarks for 1 face(s) in frame 778
Found landmarks for 1 face(s) in frame 779
Found landmarks for 1 face(s) in frame 780
Found landmarks for 1 face(s) in frame 781
Found landmarks for 1 face(s) in frame 782
Found landmarks for 0 face(s) in frame 783
Found landmarks for 0 face(s) in frame 784
Found landmarks for 0 face(s) in frame 785
Found landmarks for 1 face(s) in frame 786
Found landmarks for 1 face(s) in frame 787
Found landmarks for 1 face(s) in frame 788
Found landm
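The loop reads landmarks from `face_landmarks_list[0]`, that is, whichever face the detector lists first. When a frame contains several faces, one option is to pick the face with the largest bounding box, on the assumption that the most prominent face belongs to the character the shot is about. The sketch below uses the `(top, right, bottom, left)` tuples that `face_recognition.face_locations` returns; `largest_face_index` and the sample boxes are hypothetical, and it assumes the landmark list is in the same order as the locations passed in.

```python
def largest_face_index(face_locations):
    """Return the index of the biggest (top, right, bottom, left) box, or None."""
    if not face_locations:
        return None

    def area(box):
        top, right, bottom, left = box
        return (bottom - top) * (right - left)

    return max(range(len(face_locations)), key=lambda i: area(face_locations[i]))


# Made-up boxes: a small background face and a large foreground face.
boxes = [(50, 200, 100, 150), (40, 400, 240, 200)]
print(largest_face_index(boxes))  # 1
```

The returned index could then replace the hard-coded `0` when selecting from `face_landmarks_list`, with the `None` case handled the same way as a frame with no faces.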

In [16]:
scene_df = pd.DataFrame(
    zip(frame_choice, hac_labels, shot_id_list, y_pred_values, mouth_open_list),
    columns=['frame_file', 'cluster', 'shot_id', 'mcu', 'mouth_open'])
scene_df.head(5)

Unnamed: 0,frame_file,cluster,shot_id,mcu,mouth_open
0,766,3,0,0,1
1,767,3,0,1,1
2,768,3,0,1,0
3,769,3,0,1,1
4,770,2,1,1,0
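With `mouth_open` stored per frame, one possible next step toward dialogue attribution is to summarize it per shot, for example as the fraction of frames in each shot where the mouth is open. The sketch below does this in plain Python on the five rows shown above; `open_fraction_by_shot` is a hypothetical helper, not part of `scene_cluster_io`.

```python
from collections import defaultdict

# (frame_file, shot_id, mouth_open) taken from the five rows displayed above.
rows = [
    (766, 0, 1),
    (767, 0, 1),
    (768, 0, 0),
    (769, 0, 1),
    (770, 1, 0),
]


def open_fraction_by_shot(rows):
    """Map each shot_id to the share of its frames flagged mouth_open."""
    flags = defaultdict(list)
    for _frame, shot_id, mouth_open in rows:
        flags[shot_id].append(mouth_open)
    return {shot_id: sum(values) / len(values) for shot_id, values in flags.items()}


print(open_fraction_by_shot(rows))  # {0: 0.75, 1: 0.0}
```

On the full DataFrame, `scene_df.groupby('shot_id')['mouth_open'].mean()` computes the same quantity. Note that frame 768 counts as closed only because no face landmarks were found, so missed detections pull the fraction down.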
