# self_intro_character_identification
Now that we've explored the visual, audio, and subtitle tracks and extracted features from each, we can start combining them to accomplish broader *Moviegoer* goals. This notebook is the first example of doing so.

We'll generate a list of possible characters, then look for these names in self-introductions ("My name is Alice." or "I'm Ben."). From the frames where a character introduces themselves, we can build a composite, "average" encoding of their face, which lets us track them every time their face appears in the film.

In [1]:
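import sys
import datetime

sys.path.append('../subtitle_features')
from subtitle_dataframes_io import *
from subtitle_auxiliary_io import *
sys.path.append('../vision_features')
from vision_dataframes_io import *
sys.path.append('../audio_features')
from audio_dataframes_io import *
from time_reference_io import *

# These are otherwise only available through the star imports above;
# importing them explicitly makes the notebook's dependencies visible
import numpy as np
import pandas as pd
import pysrt
import spacy
import face_recognition

pd.set_option('display.max_colwidth', None)
# spaCy 3 removed the 'en' shortcut, so load the English pipeline by its package name
nlp = spacy.load('en_core_web_sm')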

## Generating the Film's Character List
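We'll generate the `subtitle_df` and `sentence_df` DataFrames from the subtitle file. `subtitle_df` has one row per subtitle, while `sentence_df` regroups the cleaned text into sentences and records which subtitles each sentence spans.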

In [2]:
subs = pysrt.open('../subtitles/plus_one.srt')
subtitle_df = generate_base_subtitle_df(subs)
subtitle_df = generate_subtitle_features(subtitle_df)
subtitle_df['cleaned_text'] = subtitle_df['concat_sep_text'].map(clean_line)
sentences = partition_sentences(remove_blanks(subtitle_df['cleaned_text'].tolist()), nlp)
subtitle_indices = tie_sentence_subtitle_indices(sentences, subtitle_df)
sentence_df = pd.DataFrame(list(zip(sentences, subtitle_indices)), columns=['sentence', 'subtitle_indices'])
sentence_df = generate_sentence_features(sentence_df, nlp)

We've previously defined two functions that read through the subtitles and count character names: one counts names mentioned in dialogue, and the other counts the labels that identify an offscreen speaker.

In [3]:
chars_sub_mentions = character_subtitle_mentions(sentences, nlp)
chars_sub_mentions

[('Ben', 95),
 ('Alice', 33),
 ('Dad', 21),
 ('Gina', 11),
 ('Brett', 7),
 ('Nick', 6),
 ('Matt', 5),
 ('Jess Ramsey', 5),
 ('Amanda', 4),
 ('Ben King', 4)]

In [4]:
chars_offscreen_speakers = character_offscreen_speakers(subtitle_df)
chars_offscreen_speakers

[('ALICE', 51),
 ('BEN', 47),
 ('CHUCK', 14),
 ('ANGELA', 5),
 ('DAVIS', 3),
 ('MATT', 3),
 ('NICK', 3),
 ('BRETT', 2),
 ('DEEJAY', 2),
 ('PAUL', 1)]

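We'll keep the most frequent names (at least 10 mentions in dialogue, or at least 5 offscreen-speaker labels) and assume they're the main characters. Lowercasing the names lets duplicates such as `'Ben'` and `'BEN'` collapse into a single entry.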

In [5]:
characters = []

# Names mentioned in dialogue at least 10 times
for name, count in chars_sub_mentions:
    if count >= 10:
        characters.append(name.lower())

# Names labelling an offscreen speaker at least 5 times
for name, count in chars_offscreen_speakers:
    if count >= 5:
        characters.append(name.lower())

# Deduplicate the lowercased names
characters = list(set(characters))
characters

['ben', 'gina', 'alice', 'dad', 'angela', 'chuck']
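The cell above can also be wrapped in a small function with the two thresholds exposed as parameters, which makes them easier to tune from film to film. This is a sketch: `select_main_characters` and its parameter names are hypothetical, and the defaults simply reuse the thresholds chosen above. Sorting the result also makes the output order deterministic, which `list(set(...))` does not guarantee.

```python
def select_main_characters(subtitle_mentions, offscreen_speakers,
                           mention_min=10, speaker_min=5):
    """Hypothetical helper: lowercase names that clear either threshold.

    Both inputs are lists of (name, count) tuples, shaped like the output of
    character_subtitle_mentions() and character_offscreen_speakers().
    """
    names = {name.lower() for name, count in subtitle_mentions if count >= mention_min}
    names |= {name.lower() for name, count in offscreen_speakers if count >= speaker_min}
    # Sort so the result doesn't depend on set iteration order
    return sorted(names)


mentions = [('Ben', 95), ('Alice', 33), ('Dad', 21), ('Gina', 11), ('Brett', 7)]
speakers = [('ALICE', 51), ('BEN', 47), ('CHUCK', 14), ('ANGELA', 5), ('DAVIS', 3)]
print(select_main_characters(mentions, speakers))
# ['alice', 'angela', 'ben', 'chuck', 'dad', 'gina']
```

Note that `'dad'` clears the threshold even though it's a relationship term rather than a name; a small stoplist could filter entries like that out.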

## Finding Self-Introduction Sentences
For this exercise, we'll identify characters based solely on self-introductions. We have a function that finds phrases like "My name is Alice." or "I'm Ben." and records the introduced name in the `self_intro` column.

In [6]:
sentence_df[sentence_df['self_intro'].notnull()]

| | sentence | subtitle_indices | self_intro | other_intro | direct_address | conv_boundary |
|---|---|---|---|---|---|---|
| 539 | I'm Ben, by the way." | [506] | Ben | | | |
| 564 | I'm Ben. | [527] | Ben | | | |
| 566 | I'm Kara. | [528] | Kara | | | |
| 748 | I'm Ellie, by the way. | [697] | Ellie | | | |
| 752 | I'm Alice. | [701] | Alice | | | |
| 909 | I'm Ben King. | [843] | Ben King | | | |
| 2324 | Uh, I'm Ben. | [2068] | Ben | | | |
| 2327 | I'm Alice. | [2069] | Alice | | | |
| 2331 | I'm Jackie. | [2070] | Jackie | | | |
| 2961 | I'm Ben, and most of you here probably know me as Chuck's son. | [2619, 2620] | Ben | | | |
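The self-introduction detector itself isn't shown in this notebook. As a rough illustration only, a regular expression can catch the simplest patterns: "I'm", "I am", or "my name is" followed by one or two capitalized words. The function name `find_self_intro` and the pattern are hypothetical, not the project's implementation, and the pattern can't tell a capitalized name from any other capitalized word.

```python
import re

# Hypothetical pattern: an introduction phrase followed by one or two
# capitalized words (a first name and an optional surname).
SELF_INTRO = re.compile(
    r"\b(?:I['’]m|I am|[Mm]y name is|[Mm]y name's)\s+([A-Z][a-z]+(?:\s+[A-Z][a-z]+)?)"
)


def find_self_intro(sentence):
    """Return the introduced name, or None if no introduction is found."""
    match = SELF_INTRO.search(sentence)
    return match.group(1) if match else None


for s in ["I'm Ben, by the way.", "I'm Ben King.", "Uh, I'm Ben. This is Alice.",
          "My name is Alice.", "I'm fine, thanks."]:
    print(repr(s), '->', find_self_intro(s))
```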


For now, we'll focus on Ben's self-introductions. The film has five sentences where he introduces himself. 

In [7]:
ben_string = 'ben'

In [8]:
sentence_df[sentence_df.self_intro.str.contains(ben_string, na=False, case=False)]

| | sentence | subtitle_indices | self_intro | other_intro | direct_address | conv_boundary |
|---|---|---|---|---|---|---|
| 539 | I'm Ben, by the way." | [506] | Ben | | | |
| 564 | I'm Ben. | [527] | Ben | | | |
| 909 | I'm Ben King. | [843] | Ben King | | | |
| 2324 | Uh, I'm Ben. | [2068] | Ben | | | |
| 2961 | I'm Ben, and most of you here probably know me as Chuck's son. | [2619, 2620] | Ben | | | |


## Calculating the Film Times and Frames
Starting from `sentence_df`, we'll take Ben's self-introduction sentences and look up the indices of the subtitles they span in `subtitle_df`, which holds the start and end times of the individual subtitles (rather than sentences).

In [9]:
ben_indices = sentence_df[sentence_df.self_intro.str.contains(ben_string, na=False, case=False)].subtitle_indices.values
ben_indices

array([list([506]), list([527]), list([843]), list([2068]),
       list([2619, 2620])], dtype=object)

In [10]:
ben_flattened_indices = np.concatenate(ben_indices).ravel()
ben_flattened_indices

array([ 506,  527,  843, 2068, 2619, 2620])

One of these sentences spans two separate subtitles. We'll keep both: together they're onscreen for nearly five seconds (01:31:08.547 to 01:31:13.427), so there's a good chance that at least one of their frames shows Ben's face.

In [11]:
subtitle_df[subtitle_df.index.isin(ben_flattened_indices)]

| | srt_index | original_text | start_time | end_time | concat_sep_text | separated_flag | laugh | hesitation | speaker | music | parenthetical | el_parenthetical | el_italic | cleaned_text |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 506 | 450 | I'm Ben, by the way." | 00:17:57.244000 | 00:17:58.662000 | I'm Ben, by the way." | 0 | 0 | 0 | | 0 | | 0 | 0 | I'm Ben, by the way." |
| 527 | 466 | - I'm Ben.\n- Hi. I'm Kara. | 00:18:26.356000 | 00:18:28.066000 | I'm Ben. | 1 | 0 | 0 | | 0 | | 0 | 0 | I'm Ben. |
| 843 | 747 | It's the same thing with me.\nI'm Ben King. | 00:29:33.982000 | 00:29:35.650000 | It's the same thing with me. I'm Ben King. | 0 | 0 | 0 | | 0 | | 0 | 0 | It's the same thing with me. I'm Ben King. |
| 2068 | 1742 | Uh, I'm Ben.\nThis is Alice. | 01:10:01.448000 | 01:10:03.200000 | Uh, I'm Ben. This is Alice. | 0 | 0 | 0 | | 0 | | 0 | 0 | Uh, I'm Ben. This is Alice. |
| 2619 | 2232 | I'm Ben, | 01:31:08.547000 | 01:31:10.341000 | I'm Ben, | 0 | 0 | 0 | | 0 | | 0 | 0 | I'm Ben, |
| 2620 | 2233 | and most of you here\nprobably know me as Chuck's son. | 01:31:10.383000 | 01:31:13.427000 | and most of you here probably know me as Chuck's son. | 0 | 0 | 0 | | 0 | | 0 | 0 | and most of you here probably know me as Chuck's son. |


For each subtitle, we'll calculate the `mid_time`, the midpoint between its start and end times, and convert it to a frame number with `time_to_frame()`.

In [12]:
mid_time = subtitle_mid_time(subtitle_df.iloc[506].start_time, subtitle_df.iloc[506].end_time)
time_to_frame(mid_time)

1078

In [13]:
mid_time_frames = []
for sub_index in ben_flattened_indices:
    mid_time = subtitle_mid_time(subtitle_df.iloc[sub_index].start_time, subtitle_df.iloc[sub_index].end_time)
    mid_time_frames.append(time_to_frame(mid_time))

mid_time_frames

[1078, 1107, 1775, 4202, 5469, 5472]
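`subtitle_mid_time()` and `time_to_frame()` are defined elsewhere in the project. The six frame numbers above are consistent with frames sampled at one frame per second of runtime, with the midpoint rounded to the nearest second; that rate is an inference from these outputs, not something this notebook states. Under that assumption, and assuming the time columns hold `datetime.time` objects (as their printed form suggests), a minimal stand-alone sketch with hypothetical helper names reproduces the list:

```python
import datetime


def to_seconds(t):
    """Convert a datetime.time to seconds since 00:00:00."""
    return t.hour * 3600 + t.minute * 60 + t.second + t.microsecond / 1_000_000


def midpoint_seconds(start, end):
    """Midpoint of a subtitle's display interval, in seconds."""
    return (to_seconds(start) + to_seconds(end)) / 2


def seconds_to_frame(seconds, fps=1):
    """Frame index at `seconds`.

    Assumption: frames were extracted at `fps` frames per second; 1 fps is
    inferred from the notebook's outputs, not confirmed by the project code.
    """
    return round(seconds * fps)


# Start/end times of subtitles 506, 527, 843, 2068, 2619 and 2620 (from subtitle_df)
T = datetime.time
pairs = [
    (T(0, 17, 57, 244000), T(0, 17, 58, 662000)),
    (T(0, 18, 26, 356000), T(0, 18, 28, 66000)),
    (T(0, 29, 33, 982000), T(0, 29, 35, 650000)),
    (T(1, 10, 1, 448000), T(1, 10, 3, 200000)),
    (T(1, 31, 8, 547000), T(1, 31, 10, 341000)),
    (T(1, 31, 10, 383000), T(1, 31, 13, 427000)),
]
print([seconds_to_frame(midpoint_seconds(start, end)) for start, end in pairs])
# [1078, 1107, 1775, 4202, 5469, 5472]
```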

## Collecting Face Encodings
We can now load each of these frames (images), detect the faces in it, and collect their encodings. Only five of the six frames yielded a face encoding.

In [14]:
movie_choice = 'plus_one'

ben_encodings = []

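for frame_number in mid_time_frames:
    frame = load_frame(movie_choice, frame_number)

    # Upsampling twice helps find smaller faces, at extra computational cost
    locations = face_recognition.face_locations(frame, number_of_times_to_upsample=2)
    encodings = face_recognition.face_encodings(frame, locations)
    # Keep each frame's list of encodings; frames with no detected face are skipped
    if encodings:
        ben_encodings.append(encodings)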

In [15]:
len(ben_encodings)

5

## Comparing Face Encodings
We can now use `compare_faces()` (with its default tolerance of 0.6) to compare each of the five encodings with the other four. The first two faces don't match any of the others, while the final three all match one another.
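`compare_faces()` takes a list of known encodings, a single encoding to check, and an optional `tolerance` (default 0.6), and returns one boolean per known encoding. As we understand the library, a match means the Euclidean distance between the two 128-dimensional encodings is at most `tolerance`, so lower values are stricter; `face_recognition.face_distance()` returns the raw distances if you'd rather choose a threshold yourself. The NumPy sketch below mirrors that check so the threshold is explicit. `within_tolerance` is a hypothetical name, and the vectors are synthetic rather than real face encodings.

```python
import numpy as np


def within_tolerance(known_encodings, candidate, tolerance=0.6):
    """Assumed equivalent of face_recognition.compare_faces(): one bool per
    known encoding, True when its Euclidean distance to `candidate` is
    <= tolerance."""
    known = np.asarray(known_encodings, dtype=float)
    distances = np.linalg.norm(known - np.asarray(candidate, dtype=float), axis=1)
    return [bool(d <= tolerance) for d in distances]


# Synthetic 128-d vectors, not real face encodings
candidate = np.zeros(128)
known = [np.zeros(128), np.full(128, 0.1), np.full(128, 0.05)]
print(within_tolerance(known, candidate))
# distances are 0.0, ~1.13 and ~0.57, so this prints [True, False, True]
```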

In [16]:
ben_scratch = ben_encodings.copy()
ben_flattened = []
for x in ben_scratch:
    for y in x:
        ben_flattened.append(y)
ben_compare = ben_flattened[0]
del ben_flattened[0]
face_recognition.compare_faces(ben_flattened, ben_compare)

[False, False, False, False]

In [17]:
ben_scratch = ben_encodings.copy()
ben_flattened = []
for x in ben_scratch:
    for y in x:
        ben_flattened.append(y)
ben_compare = ben_flattened[1]
del ben_flattened[1]
face_recognition.compare_faces(ben_flattened, ben_compare)

[False, False, False, False]

In [18]:
ben_scratch = ben_encodings.copy()
ben_flattened = []
for x in ben_scratch:
    for y in x:
        ben_flattened.append(y)
ben_compare = ben_flattened[2]
del ben_flattened[2]
face_recognition.compare_faces(ben_flattened, ben_compare)

[False, False, True, True]

In [19]:
ben_scratch = ben_encodings.copy()
ben_flattened = []
for x in ben_scratch:
    for y in x:
        ben_flattened.append(y)
ben_compare = ben_flattened[3]
del ben_flattened[3]
face_recognition.compare_faces(ben_flattened, ben_compare)

[False, False, True, True]

In [20]:
ben_scratch = ben_encodings.copy()
ben_flattened = []
for x in ben_scratch:
    for y in x:
        ben_flattened.append(y)
ben_compare = ben_flattened[4]
del ben_flattened[4]
face_recognition.compare_faces(ben_flattened, ben_compare)

[False, False, True, True]

In this example, we looked at five candidate encodings for Ben and found that three of them match one another. We can reasonably assume these three are Ben's face.

A reasonable heuristic is to keep any encoding that matches at least half of the other candidates. We can automate the comparisons above to build an array of the encodings that pass this test, which here are the three that match one another.

In [21]:
# Flatten the per-frame lists into one list of 128-dimensional encodings
ben_flattened = [encoding for frame_encodings in ben_encodings for encoding in frame_encodings]
face_candidates = len(ben_flattened)

good_bens = []
for i, ben_compare in enumerate(ben_flattened):
    # Compare this candidate with every other candidate
    others = ben_flattened[:i] + ben_flattened[i + 1:]
    matches = face_recognition.compare_faces(others, ben_compare)
    # Keep it if it matches at least half of the other candidates
    if sum(matches) >= (face_candidates - 1) / 2:
        good_bens.append(ben_compare)

good_bens = np.array(good_bens)

len(good_bens)

3
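The five near-identical comparison cells and the selection loop can also be collapsed into a single pairwise distance matrix. This sketch assumes the same Euclidean-distance test that `compare_faces()` applies, with its default tolerance of 0.6, and keeps an encoding when it matches at least half of the other candidates, the same criterion as the loop above. `consistent_encodings` is a hypothetical name, and the test vectors are synthetic.

```python
import numpy as np


def consistent_encodings(encodings, tolerance=0.6):
    """Keep encodings that match at least half of the other candidates.

    Assumes the same test as face_recognition.compare_faces(): two encodings
    match when their Euclidean distance is <= tolerance.
    """
    enc = np.asarray(encodings, dtype=float)      # shape (n, 128)
    n = len(enc)
    if n == 0:
        return enc
    # All pairwise distances at once: (n, 1, 128) - (1, n, 128) -> (n, n)
    distances = np.linalg.norm(enc[:, None, :] - enc[None, :, :], axis=-1)
    matches = distances <= tolerance
    np.fill_diagonal(matches, False)              # ignore self-matches
    keep = matches.sum(axis=1) >= (n - 1) / 2
    return enc[keep]


# Synthetic stand-ins: three nearby vectors plus two distant outliers
candidates = [np.zeros(128), np.full(128, 0.01), np.full(128, 0.02),
              np.ones(128), -np.ones(128)]
print(consistent_encodings(candidates).shape)   # (3, 128)
```

Row `i` of `matches` corresponds to the output of the `i`-th manual cell above, except that the self-comparison is left in as `False` rather than deleted. With a single candidate the threshold is 0, so a lone encoding is always kept; the same is true of the loop above.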

## Creating Average Encodings
We now have an array of three encodings of Ben's face. Averaging them element-wise gives a composite encoding of his face, which we'll eventually use to identify Ben's other appearances throughout the film.

In [22]:
good_bens.shape

(3, 128)

In [23]:
average_ben = np.average(good_bens, axis=0)
average_ben

array([-0.07721068,  0.06586049,  0.02200168, -0.07368417, -0.01955807,
       -0.10598054,  0.01534823, -0.10837072,  0.12311224, -0.05983678,
        0.19644454, -0.08710126, -0.26179906, -0.10360246,  0.01502854,
        0.07001646, -0.15321859, -0.08398456, -0.04728389, -0.10109255,
        0.08364282,  0.04475699,  0.03514893,  0.07407011, -0.10499338,
       -0.29861903, -0.08578485, -0.16280653,  0.04825203, -0.09466489,
        0.02931186,  0.12031828, -0.12802502, -0.05757039, -0.01983338,
        0.05774689, -0.07193449, -0.07432778,  0.23797781, -0.01733976,
       -0.11766649, -0.06165399, -0.02650265,  0.28524284,  0.11283993,
       -0.06366006,  0.0512409 , -0.04913463,  0.17728387, -0.24886096,
        0.03229232,  0.15899217,  0.1631519 ,  0.07925956,  0.12238127,
       -0.13485139,  0.05493894,  0.1408784 , -0.27828231,  0.14787856,
        0.05263238, -0.10969398, -0.00398294, -0.0787286 ,  0.16714332,
        0.08753599, -0.10992935, -0.0644823 ,  0.19467644, ...
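Once `average_ben` exists, checking whether a newly detected face is Ben reduces to a single distance comparison against the composite. A minimal sketch, assuming the same 0.6 tolerance carries over to averaged encodings (which this notebook hasn't verified); `is_character` is a hypothetical helper and the vectors are synthetic.

```python
import numpy as np


def is_character(composite, encoding, tolerance=0.6):
    """True if `encoding` lies within `tolerance` (Euclidean distance) of the
    composite. The 0.6 default is an assumption borrowed from compare_faces()."""
    distance = np.linalg.norm(np.asarray(encoding, dtype=float)
                              - np.asarray(composite, dtype=float))
    return bool(distance <= tolerance)


# Synthetic stand-ins for three matching encodings
good = np.stack([np.full(128, 0.00), np.full(128, 0.02), np.full(128, 0.04)])
composite = good.mean(axis=0)    # same result as np.average(good, axis=0)
print(is_character(composite, np.full(128, 0.03)))   # distance ~0.11 -> True
print(is_character(composite, np.full(128, 0.50)))   # distance ~5.4 -> False
```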

We've successfully used multiple streams of data together: we identified a specific phrase in the subtitles and used it to generate a composite face encoding. We've packaged this whole process as a function in `character_identification_io.py`.

However, this process required detecting and encoding faces in individual frames. Face detection and encoding are computationally expensive (especially with `number_of_times_to_upsample=2`), so repeating them every time we need a frame's faces would be wasteful. It would be much more efficient to detect the faces in each frame once, store their encodings in a DataFrame, and serialize that DataFrame for later lookups. We'll explore this in the next notebook.
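A rough sketch of that caching idea, assuming one row per detected face holding the frame number, the face location, and its 128-dimensional encoding. The column names and file path are hypothetical, and the encodings are random stand-ins for real `face_encodings()` output. `DataFrame.to_pickle()` keeps the NumPy arrays intact, which a CSV export would not.

```python
import os
import tempfile

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical cache layout: one row per detected face
face_df = pd.DataFrame({
    'frame': [1078, 1107, 1775],
    # (top, right, bottom, left), the order face_locations() uses
    'location': [(10, 60, 60, 10), (12, 62, 62, 12), (8, 58, 58, 8)],
    # Random stand-ins for 128-d face encodings
    'encoding': [rng.normal(size=128) for _ in range(3)],
})

path = os.path.join(tempfile.mkdtemp(), 'plus_one_faces.pkl')
face_df.to_pickle(path)

# Later: look up a frame's faces without re-running detection
cached = pd.read_pickle(path)
frame_faces = cached[cached['frame'] == 1775]
print(len(frame_faces), frame_faces['encoding'].iloc[0].shape)   # 1 (128,)
```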