# dialogue_scene_boundary
This notebook contains preliminary code for identifying the beginnings and ends of two-character dialogue scenes, based on common scene layouts found in film editing. We'll also use a CNN image model trained elsewhere in this repository to determine whether shots are Medium Close-Ups (MCUs), the most common shot type for two-character dialogue scenes.

As an example, we'll be analyzing 400 frames from the film "The Hustle". These frames, each representing one second of the film, depict two consecutive full scenes and portions of the two scenes before and after. We'll be trying to identify the beginning and end frames for each of the two full scenes.

It's *strongly* recommended to follow along using the readme. It's very helpful to see the frames represented as actual images, instead of just abstract file or cluster numbers.

In [1]:
import os
from keras.preprocessing import image
from keras.preprocessing.image import ImageDataGenerator, array_to_img, img_to_array, load_img
from keras.applications.vgg16 import VGG16
from keras.applications.vgg16 import preprocess_input
from keras import models
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans, AgglomerativeClustering

Using TensorFlow backend.


# Data Preparation
## Designating film and frames
We select frames 600-999 from The Hustle's directory.

In [2]:
# choose film and frames
film = 'hustle'
frame_choice = list(range(600, 1000))

In [3]:
# establish folder for this film
dialogue_folder = os.path.join('dialogue_frames', film)

print('There are', len(os.listdir(dialogue_folder)), 'images in the folder')
print('Selected', len(frame_choice), 'of those frames')

There are 5877 images in the folder
Selected 400 of those frames


## VGG16 Vectorization
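Keras' VGG16 CNN model will be used to vectorize the input frames. We'll be using the "imagenet" weights, which come from training VGG16 on the ImageNet ILSVRC dataset (roughly 1.3 million training images across 1,000 classes). Because include_top=False, the model outputs convolutional feature maps rather than class predictions.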

In [4]:
model = VGG16(weights='imagenet', include_top=False)
model.summary()

vgg16_feature_list = []


for x in frame_choice:
    # build the path to this frame's image file
    img_path = os.path.join(dialogue_folder, film + '_frame' + str(x) + '.jpg')
    img = image.load_img(img_path, target_size=(256, 256))
    img_data = image.img_to_array(img)
    img_data = np.expand_dims(img_data, axis=0)  # add a batch dimension
    img_data = preprocess_input(img_data)

    # extract convolutional features and flatten them into a single vector
    vgg16_feature = model.predict(img_data)
    vgg16_feature_list.append(np.array(vgg16_feature).flatten())

Downloading data from https://github.com/fchollet/deep-learning-models/releases/download/v0.1/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5
Model: "vgg16"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_1 (InputLayer)         (None, None, None, 3)     0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, None, None, 64)    1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, None, None, 64)    36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, None, None, 64)    0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, None, None, 128)   73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, None, None, 1

In [5]:
# convert to NumPy array and verify shape
vgg16_feature_list_np = np.array(vgg16_feature_list)
vgg16_feature_list_np.shape

(400, 32768)
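The second dimension of that shape follows from the VGG16 architecture. As a rough check, the sketch below recomputes it from the input size; the helper name `vgg16_flat_feature_length` is hypothetical and not part of Keras.

```python
# VGG16 without its top layers applies five 2x2 max-pooling stages,
# so each spatial dimension shrinks by a factor of 2**5 = 32.
# The last convolutional block has 512 channels.
def vgg16_flat_feature_length(height, width, pool_stages=5, channels=512):
    """Length of the flattened VGG16 (include_top=False) output for one image."""
    factor = 2 ** pool_stages
    return (height // factor) * (width // factor) * channels

feature_length = vgg16_flat_feature_length(256, 256)
print(feature_length)  # 32768
```

A 256x256 frame therefore becomes an 8x8x512 feature map, which flattens to the 32,768 values per frame seen above.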

## Clustering
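A hierarchical agglomerative clustering (HAC) object will be fit to the VGG16 feature vectors of the input frames. We set the distance_threshold to 3000: a higher threshold means fewer clusters, and a lower threshold means more. Setting distance_threshold requires n_clusters to be None, so the number of clusters is determined by the threshold alone. This threshold can be tuned for better results during future development.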

In [6]:
hac = AgglomerativeClustering(n_clusters = None, distance_threshold = 3000).fit(vgg16_feature_list_np)
hac_labels = hac.labels_
print('Number of clusters:', hac.n_clusters_)
print(hac_labels)

Number of clusters: 37
[12 12 29 29 29 23 23 23 23 23 23 23 23 23 23 12 12 12 12 12 12 12  2  2
  2  2 20 20 20 20  8  8  8  8  8  8  8  8  8 11 11 11 28 28 28 11 11 11
  5  5  5  5  5 25 25 17 17 31 31 31 31 31 31  1  1  1 14 14 14 14  1  1
  1  1  1 35 31 31 31 31 14 14  5  5  5 17 17 17 17 10 10 10  4  4 30 30
 30 30 10 10  4  4 17 17 17 25 25 25 25  5  5  5 35 17 17 17 17 17 17 17
 35 35 35 35 30 30  4  4  4  4 30 30 30 30  4  4  1  1  1  1  1 14 14 14
 14 14 30 30 30 35 35 35 35 30 30 30 35 30 10 10 10 26 26 26 27 27 27 27
 27 22 22 22 22 22 22 22 22  8  8  8  8  8 27 27 27 27 27 27 27 27  2  2
  2  2 27  2  2  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  3  9  9
  9  9  9  9 33 33 33  9  9 33  9  9  9  9  9  9  9  9 33 33 33 33 33  9
  9  9  9  9  9  9  9  9 21 21 21 21 21 21 21 21 21 21 21 33 33 33 33 33
  0  0  0 33 33  9  9 33 33 33  9  9  9  9  9  0  0  0  0  0  2  2  2 33
 33 33  0  0  0  0  0  0  9  9  9  9 33 33  9  9  9  9 33 33 33 33  9  9
  9  9  9  9  9  9 19 19 19 
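To build intuition for how the distance threshold controls the number of clusters, here is a toy sketch in plain Python. It is not the algorithm used above: it clusters hypothetical one-dimensional points with complete linkage, whereas scikit-learn's AgglomerativeClustering defaults to Ward linkage on the full feature vectors. The function name `threshold_clusters` and the sample points are assumptions for illustration.

```python
def threshold_clusters(points, distance_threshold):
    """Toy 1-D agglomerative clustering with complete linkage.

    Repeatedly merges the two closest clusters and stops once the closest
    pair is at least `distance_threshold` apart.
    """
    clusters = [[p] for p in points]

    def linkage(a, b):
        # complete linkage: distance between the two farthest members
        return max(abs(x - y) for x in a for y in b)

    while len(clusters) > 1:
        # find the closest pair of clusters
        dist, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if dist >= distance_threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

points = [0, 1, 2, 10, 11, 30]  # hypothetical data
counts = [len(threshold_clusters(points, t)) for t in (1.5, 5, 25, 100)]
print(counts)  # [4, 3, 2, 1]
```

Raising the threshold lets more distant groups merge, so the cluster count falls, which is the same trade-off being tuned with the value 3000.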

## Load Saved Model and Generate MCU Predictions

We've previously trained a CNN model to identify whether a movie frame is a Medium Close-Up (MCU). MCUs are the most common shot type in dialogue scenes. We'll use the model to make a prediction for each frame, and use those predictions later in the scene identification process.

In [7]:
tuned_model = models.load_model('saved_models/tuned_model')

In [8]:
image_list = []
for x in frame_choice:
    image_list.append(img_to_array(load_img(dialogue_folder + '/' + film + '_frame'+ str(x) + '.jpg', target_size = (128, 128), color_mode = 'grayscale')))

In [9]:
image_array = np.array(image_list)
y_pred = tuned_model.predict_classes(image_array)

In [10]:
# the model's predict_classes method creates a NumPy array of arrays; this converts it to a list of 0/1 integers
y_pred_values = []
for prediction in y_pred:
    y_pred_values.append(prediction[0])
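Note that predict_classes is only available on Sequential models in older Keras releases and has been removed from later ones. Assuming the saved model ends in a single sigmoid unit (which the one-element predictions above suggest, but which is not confirmed here), the same 0/1 labels can be obtained by thresholding the output of predict at 0.5. A minimal sketch with hypothetical probabilities:

```python
# hypothetical sigmoid outputs, shaped like model.predict(...) for three frames
probabilities = [[0.91], [0.08], [0.50]]

# a frame counts as an MCU (1) only when its probability is above 0.5
y_pred_values = [int(row[0] > 0.5) for row in probabilities]
print(y_pred_values)  # [1, 0, 0]
```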

## Create DataFrame

Before we create the DataFrame, we'll create an identification system for an individual shot. Each time the cluster value changes, it's a new shot. Multiple shots can share the same cluster value.

In [11]:
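shot_id = 0
shot_id_list = []
prev_cluster = None  # no previous cluster before the first frame

for frame_file, cluster in zip(frame_choice, hac_labels):
    # a change in cluster value marks the start of a new shot
    if prev_cluster is not None and cluster != prev_cluster:
        shot_id += 1
    shot_id_list.append(shot_id)
    prev_cluster = cluster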
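The same numbering can be expressed as "one id per run of identical consecutive labels". The sketch below is an alternative formulation using itertools.groupby, not the notebook's own code; the function name `shot_ids_from_clusters` is hypothetical.

```python
from itertools import groupby

def shot_ids_from_clusters(cluster_labels):
    """Assign one id per run of identical consecutive cluster labels."""
    shot_ids = []
    for shot_id, (_, run) in enumerate(groupby(cluster_labels)):
        shot_ids.extend([shot_id] * len(list(run)))
    return shot_ids

# first seven cluster labels from the clustering output above
first_ids = shot_ids_from_clusters([12, 12, 29, 29, 29, 23, 23])
print(first_ids)  # [0, 0, 1, 1, 1, 2, 2]
```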

The DataFrame is created using the frame_file, its cluster, its shot_id, and its MCU prediction.

In [12]:
scene_df = pd.DataFrame(zip(frame_choice, hac_labels, shot_id_list, y_pred_values), columns=['frame_file', 'cluster', 'shot_id', 'mcu'])
scene_df.head(7)

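| | frame_file | cluster | shot_id | mcu |
|---|---|---|---|---|
| 0 | 600 | 12 | 0 | 0 |
| 1 | 601 | 12 | 0 | 0 |
| 2 | 602 | 29 | 1 | 0 |
| 3 | 603 | 29 | 1 | 0 |
| 4 | 604 | 29 | 1 | 0 |
| 5 | 605 | 23 | 2 | 0 |
| 6 | 606 | 23 | 2 | 0 |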


# Scene Identification
*Note: it is strongly recommended to follow along using the readme; the example code can be better understood by seeing the actual frames in the readme.*

Two-character dialogue scenes are primarily composed of two speakers, shown in an alternating pattern: Speaker A, Speaker B, Speaker A, Speaker B, etc. We'll look for clusters that fit this pattern, then use the MCU image classifier model to verify that they are Medium Close-Ups, the standard cinematography shot for dialogue scenes.

Clusters representing Speaker A and Speaker B are the Anchor clusters, and a rough designation of the scene can be defined by the first and last frames with Anchor clusters.

The scene can be further expanded by considering the Cutaway clusters: any other cluster that appears within the Anchor scene boundary. If frames from these clusters appear directly before the Anchor start or directly after the Anchor end, they can be considered part of the scene.

1. Check all clusters for each pair of two clusters that form an A/B/A/B pattern
2. Verify that each of the two clusters in each pattern are Medium Close-Ups (MCUs), and discard patterns containing non-MCUs
3. Identify the earliest and latest frames with either speaker cluster, to determine the scene's Anchor start and end
4. Identify all clusters that lie in between the Anchor start and end frames, to determine the Cutaway clusters
5. Expand the scene in either direction by checking for adjacent Cutaway clusters before the starting Anchor and after the ending Anchor

Below is a step-by-step walkthrough, followed by the same code bundled into functions.

## 1. Identifying A/B/A/B cluster pairs
The first step is finding cluster pairs that form an A/B/A/B pattern. Every time there's a new shot (cluster), we store the previous cluster in memory; we'll need to do this for the previous three clusters. When the current cluster matches prev_clust_2, and when prev_clust_1 matches prev_clust_3, we have an A/B/A/B pattern.

The DataFrame below shows an example of an A/B/A/B cluster pattern.

In [13]:
scene_df.loc[(scene_df['frame_file'] > 746) & (scene_df['frame_file'] < 757)]

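| | frame_file | cluster | shot_id | mcu |
|---|---|---|---|---|
| 147 | 747 | 30 | 39 | 1 |
| 148 | 748 | 30 | 39 | 1 |
| 149 | 749 | 35 | 40 | 1 |
| 150 | 750 | 35 | 40 | 0 |
| 151 | 751 | 35 | 40 | 1 |
| 152 | 752 | 35 | 40 | 1 |
| 153 | 753 | 30 | 41 | 1 |
| 154 | 754 | 30 | 41 | 1 |
| 155 | 755 | 30 | 41 | 1 |
| 156 | 756 | 35 | 42 | 1 |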


In [14]:
# to check for an A/B/A/B pattern, we must store the previous three clusters in memory
prev_clust_1 = 1001
prev_clust_2 = 1002
prev_clust_3 = 1003
prev_shot_id = -1
alternate_a_list = []
alternate_b_list = []

# zip our various lists into a usable data structure
for frame_file, cluster, mcu_flag, shot_id in zip(frame_choice, hac_labels, y_pred_values, shot_id_list):
    # when iterating through each frame, look for an A/B/A/B pattern, and save the clusters of any patterns
    if cluster == prev_clust_2 and prev_clust_1 == prev_clust_3:
        alternate_a_list.append(min(cluster, prev_clust_1)) # min and max are used to avoid duplicates of (1, 2), (2, 1)
        alternate_b_list.append(max(cluster, prev_clust_1))

    # we use prev_shot_id to identify when there's a new shot (when the cluster value changes)
    # every time there's a new shot, we update the cluster memory
    if shot_id != prev_shot_id:
        prev_shot_id = shot_id
        prev_clust_3 = prev_clust_2
        prev_clust_2 = prev_clust_1
        prev_clust_1 = cluster
        
    # the below print can be used for troubleshooting and visualizing the memory state at each frame
    # print(frame_file, '\t', mcu_flag, '\t', cluster,'\t', shot_id, '\t', prev_shot_id, '\t', prev_clust_1, '\t', prev_clust_2, '\t', prev_clust_3, '\tend')

# save unique alternating pairs
alternating_pairs = []
for a, b in zip(alternate_a_list, alternate_b_list):
    pair = [int(a), int(b)]
    if pair not in alternating_pairs:
        alternating_pairs.append(pair)
        
alternating_pairs

[[4, 30], [30, 35], [2, 27], [9, 33]]
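The three-cluster memory used above is equivalent to sliding a window of four over the sequence of shots. The sketch below shows that formulation; it is an alternative for illustration rather than the notebook's implementation, and the function name `alternating_pairs_from_clusters` is hypothetical.

```python
from itertools import groupby

def alternating_pairs_from_clusters(cluster_labels):
    """Return unique [A, B] cluster pairs whose shots appear in an A/B/A/B run."""
    # collapse consecutive identical labels so each shot appears once
    shots = [label for label, _ in groupby(cluster_labels)]
    pairs = []
    for a, b, c, d in zip(shots, shots[1:], shots[2:], shots[3:]):
        if a == c and b == d:
            pair = [min(a, b), max(a, b)]  # sorted to avoid (1, 2) / (2, 1) duplicates
            if pair not in pairs:
                pairs.append(pair)
    return pairs

# cluster labels for frames 747-756, taken from the example table above
sample = [30, 30, 35, 35, 35, 35, 30, 30, 30, 35]
sample_pairs = alternating_pairs_from_clusters(sample)
print(sample_pairs)  # [[30, 35]]
```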

## 2. Checking if both clusters are MCUs
Although we now have cluster pairs that form an A/B/A/B pattern, we're not sure if these actually represent characters. We'll check each pair to determine if both clusters represent Medium Close-Ups, the classic cinematography shot for dialogue scenes. We evaluate the predictions for EVERY frame assigned to a specific cluster.

This is necessary because the MCU-identification model is more discriminating than the image clustering algorithm (at its current threshold). So a few frames might be grouped in the same cluster but still have differing MCU/non-MCU predictions.

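For now, we'll only accept cluster pairs if both clusters have an MCU-prediction mean greater than 0.5, meaning more than half of each cluster's frames were predicted to be MCUs. In this example, two of the four pairs are discarded.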

In [15]:
speaker_pairs = []
print('cluster\t', 'count\t', 'mcu probability')

for pair in alternating_pairs:
    # calculate the mean of each cluster's MCU column
    mean_a = scene_df.loc[scene_df['cluster'] == pair[0]]['mcu'].mean()
    mean_b = scene_df.loc[scene_df['cluster'] == pair[1]]['mcu'].mean()
    print(pair[0], '\t', scene_df.loc[scene_df['cluster'] == pair[0]]['mcu'].count(), '\t', '{0:.2f}%'.format(mean_a * 100))
    print(pair[1], '\t', scene_df.loc[scene_df['cluster'] == pair[1]]['mcu'].count(), '\t', '{0:.2f}%'.format(mean_b * 100))
    
    # an alternating pair will pass the MCU check if BOTH clusters have an MCU mean greater than 0.5
    if mean_a > .5 and mean_b > .5:
        print('Passes MCU check')
        speaker_pairs.append(pair)
    else:
        print('Fails MCU check')
    print()
    
speaker_pairs

cluster	 count	 mcu probability
4 	 10 	 0.00%
30 	 17 	 88.24%
Fails MCU check

30 	 17 	 88.24%
35 	 11 	 81.82%
Passes MCU check

2 	 30 	 33.33%
27 	 14 	 100.00%
Fails MCU check

9 	 48 	 97.92%
33 	 28 	 100.00%
Passes MCU check



[[30, 35], [9, 33]]
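The check itself is just a comparison of two proportions. The sketch below restates it in plain Python; the per-frame flags are rebuilt from the printed counts and percentages (for example, 15 of 17 frames gives 88.24%), and the function name `passes_mcu_check` is hypothetical.

```python
def passes_mcu_check(mcu_flags_a, mcu_flags_b, threshold=0.5):
    """True when more than `threshold` of the frames in BOTH clusters are MCUs."""
    share_a = sum(mcu_flags_a) / len(mcu_flags_a)
    share_b = sum(mcu_flags_b) / len(mcu_flags_b)
    return share_a > threshold and share_b > threshold

# flags rebuilt from the output above:
# cluster 4: 0 of 10, cluster 30: 15 of 17, cluster 35: 9 of 11
cluster_4 = [0] * 10
cluster_30 = [1] * 15 + [0] * 2
cluster_35 = [1] * 9 + [0] * 2

print(passes_mcu_check(cluster_4, cluster_30))   # False
print(passes_mcu_check(cluster_30, cluster_35))  # True
```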

## 3. Establishing first and last frames of Anchor clusters
After checking that the cluster pairs are indeed MCUs, we can assume that they represent shots of Speakers A and B. As a preliminary designation of a given scene, we can designate the earliest frame and last frame containing EITHER of these shots as the anchor_start and anchor_end, a primitive definition of the scene's start and end frames.

Since we have two speaker_pairs that passed the MCU check, we'll pick one as an example to continue.

In [16]:
pair = speaker_pairs[0]
pair

[30, 35]

In [17]:
# earliest frame with an Anchor cluster
scene_df.loc[(scene_df['cluster'] == pair[0]) | (scene_df['cluster'] == pair[1])].head(3)

| | frame_file | cluster | shot_id | mcu |
|---|---|---|---|---|
| 75 | 675 | 35 | 17 | 0 |
| 94 | 694 | 30 | 24 | 0 |
| 95 | 695 | 30 | 24 | 1 |


In [18]:
# last frame with an Anchor cluster
scene_df.loc[(scene_df['cluster'] == pair[0]) | (scene_df['cluster'] == pair[1])].tail(3)

| | frame_file | cluster | shot_id | mcu |
|---|---|---|---|---|
| 155 | 755 | 30 | 41 | 1 |
| 156 | 756 | 35 | 42 | 1 |
| 157 | 757 | 30 | 43 | 1 |


In [19]:
anchor_start = scene_df.loc[(scene_df['cluster'] == pair[0]) | (scene_df['cluster'] == pair[1])].frame_file.min()
anchor_end = scene_df.loc[(scene_df['cluster'] == pair[0]) | (scene_df['cluster'] == pair[1])].frame_file.max()
print(anchor_start, anchor_end)

675 757
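Stripped of the DataFrame syntax, the Anchor boundaries are the minimum and maximum frame numbers among frames belonging to either speaker cluster. A minimal sketch on a hypothetical eight-frame excerpt (the function name `anchor_bounds` and the data are assumptions):

```python
def anchor_bounds(frames, clusters, pair):
    """First and last frame whose cluster is one of the two speaker clusters."""
    speaker_frames = [f for f, c in zip(frames, clusters) if c in pair]
    return min(speaker_frames), max(speaker_frames)

# hypothetical excerpt: cluster 7 is not a speaker cluster
frames = [100, 101, 102, 103, 104, 105, 106, 107]
clusters = [7, 30, 30, 35, 7, 30, 35, 7]
bounds = anchor_bounds(frames, clusters, [30, 35])
print(bounds)  # (101, 106)
```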


## 4. Identifying cutaways
With the Anchor start and end frames, we have a preliminary idea of where the scene starts and ends. However, we should look at all of the other shots (clusters) between the Anchor start and end frames. These clusters represent cutaways, which can include the following:
- POV shots, showing what characters are looking at offscreen
- Inserts, different shots of Speaker A or B, such as a one-off close-up
- Other characters, both silent and speaking

In [20]:
cutaways = scene_df.loc[(scene_df['frame_file'] > anchor_start) & (scene_df['frame_file'] < anchor_end)].cluster.unique()
cutaways = cutaways[cutaways != pair[0]] # remove the Speaker A and Speaker B clusters from this list
cutaways = cutaways[cutaways != pair[1]]
cutaways

array([31, 14,  5, 17, 10,  4, 25,  1])
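The same selection can be written without pandas: walk the frames strictly between the Anchor boundaries and collect every cluster that is not one of the two speakers. The sketch below uses hypothetical data, and the function name `cutaway_clusters` is an assumption.

```python
def cutaway_clusters(frames, clusters, pair, anchor_start, anchor_end):
    """Unique non-speaker clusters strictly between the Anchor frames,
    in order of first appearance."""
    found = []
    for frame, cluster in zip(frames, clusters):
        inside = anchor_start < frame < anchor_end
        if inside and cluster not in pair and cluster not in found:
            found.append(cluster)
    return found

# hypothetical ten-frame excerpt with speaker clusters 30 and 35
frames = list(range(100, 110))
clusters = [5, 30, 30, 5, 35, 8, 30, 35, 5, 9]
cutaway_list = cutaway_clusters(frames, clusters, [30, 35], 101, 107)
print(cutaway_list)  # [5, 8]
```

Cluster 9 is not reported because it only occurs outside the Anchor boundaries.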

## 5. Expanding the scene's beginning and end using cutaways
After we identify these cutaways, we may be able to expand the scene's start frame backward, and the end frame forward. If we see these cutaways again, but before the Anchor start or after the Anchor end, they are very likely still part of the scene. In the interest of caution, we will only look for cutaways that are adjacent to the Anchor frames.

Beginning with the Anchor's start frame, we look at the previous frame (which currently isn't designated part of the scene). If that frame's cluster value is in the cutaway list, we include it in the scene and continue backward. This continues until we encounter a frame which isn't a cutaway, or until we run out of frames. We repeat this for the Anchor's end frame, this time progressing forward.

In [21]:
scene_start = anchor_start
min_flag = 0

while min_flag == 0:
    try:
        if int(scene_df.loc[scene_df['frame_file'] == (scene_start - 1)].cluster) in cutaways:
            scene_start -= 1
        else:
            min_flag = 1
    except TypeError: # error if hitting the beginning of the frame list
        min_flag = 1
scene_start

648

In [22]:
scene_end = anchor_end
max_flag = 0
while max_flag == 0:
    try:
        if int(scene_df.loc[scene_df['frame_file'] == (scene_end + 1)].cluster) in cutaways:
            scene_end += 1
        else:
            max_flag = 1
    except TypeError: # error if hitting the end of the frame list
        max_flag = 1 
scene_end

760

The beginning of the scene was expanded by 27 frames, and the ending by 3 frames.
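The expansion loops can also be written with a frame-to-cluster lookup, which avoids relying on an exception to detect the edge of the frame list. This is a sketch under that assumption, using hypothetical data; the function name `expand_scene` is not part of the notebook.

```python
def expand_scene(cluster_by_frame, cutaways, anchor_start, anchor_end):
    """Grow the scene outward while the neighbouring frame is a cutaway cluster."""
    scene_start, scene_end = anchor_start, anchor_end
    # dict.get returns None past either end of the frame list, which stops the loop
    while cluster_by_frame.get(scene_start - 1) in cutaways:
        scene_start -= 1
    while cluster_by_frame.get(scene_end + 1) in cutaways:
        scene_end += 1
    return scene_start, scene_end

# hypothetical excerpt: speaker clusters 30 and 35, cutaway clusters 5 and 8
cluster_by_frame = dict(zip(range(100, 110), [5, 30, 30, 5, 35, 8, 30, 35, 5, 9]))
expanded = expand_scene(cluster_by_frame, [5, 8], 101, 107)
print(expanded)  # (100, 108)
```

The start stops at frame 100 because there is no frame 99, and the end stops at 108 because frame 109 belongs to cluster 9, which is not a cutaway.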

## Using functions
The above functionality is replicated below, this time using functions. For this example, we'll be looking at 400 frames from *Extremely Wicked, Shockingly Evil and Vile*. The functions are defined further below, at the end of this notebook, and must be run before the cells in this section.

In [31]:
film = 'extremely_wicked'
frame_choice = list(range(650, 1050))
threshold = 3000

dialogue_folder = os.path.join('dialogue_frames', film)
print('There are', len(os.listdir(dialogue_folder)), 'images in the folder')
print('Selected', len(frame_choice), 'of those frames')

hac_labels = label_clusters(dialogue_folder, frame_choice, film, threshold)

There are 6603 images in the folder
Selected 400 of those frames
Model: "vgg16"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_2 (InputLayer)         (None, None, None, 3)     0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, None, None, 64)    1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, None, None, 64)    36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, None, None, 64)    0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, None, None, 128)   73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, None, None, 128)   147584    
______________________________________________________________

In [None]:
# only necessary if this wasn't run in the previous example
# tuned_model = models.load_model('saved_models/tuned_model')

In [33]:
y_pred_values = predict_mcu(dialogue_folder, tuned_model, frame_choice, film)

In [34]:
shot_id_list = get_shot_ids(frame_choice, hac_labels)

In [35]:
scene_df = pd.DataFrame(zip(frame_choice, hac_labels, shot_id_list, y_pred_values), columns=['frame_file', 'cluster', 'shot_id', 'mcu'])
scene_df.head(3)

| | frame_file | cluster | shot_id | mcu |
|---|---|---|---|---|
| 0 | 650 | 2 | 0 | 1 |
| 1 | 651 | 2 | 0 | 1 |
| 2 | 652 | 2 | 0 | 1 |


In [36]:
alternating_pairs = get_alternating_pairs(frame_choice, hac_labels, y_pred_values, shot_id_list)
alternating_pairs

[[0, 2], [9, 13], [0, 1], [5, 10], [0, 6]]

In [37]:
speaker_pairs = mcu_check(alternating_pairs, scene_df)
speaker_pairs

cluster	 count	 mcu probability
0 	 165 	 48.48%
2 	 61 	 65.57%
Fails MCU check

9 	 28 	 96.43%
13 	 21 	 100.00%
Passes MCU check

0 	 165 	 48.48%
1 	 9 	 66.67%
Fails MCU check

5 	 23 	 95.65%
10 	 17 	 100.00%
Passes MCU check

0 	 165 	 48.48%
6 	 9 	 55.56%
Fails MCU check



[[9, 13], [5, 10]]

The anchor_scenes() and expand_scenes() functions are to be used separately. anchor_scenes() only returns the Anchor frames of each scene, while expand_scenes() does this and also tries to expand the scene, returning Expansion frames.

In [38]:
anchors = anchor_scenes(speaker_pairs, scene_df)
anchors

Speaker A and B Clusters: [9, 13]
Anchor Start/End Frames: 690 743

Speaker A and B Clusters: [5, 10]
Anchor Start/End Frames: 954 997



[(690, 743), (954, 997)]

In [39]:
scenes = expand_scenes(speaker_pairs, scene_df)
scenes

Speaker A and B Clusters: [9, 13]
Anchor Start/End Frames: 690 743
Cutaway Clusters: [0]
Expanded Start/End Frames: 690 745

Speaker A and B Clusters: [5, 10]
Anchor Start/End Frames: 954 997
Cutaway Clusters: [16]
Expanded Start/End Frames: 954 997



[(690, 745), (954, 997)]
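Comparing the two results shows how much each scene grew. The short sketch below copies the boundaries from the two outputs above and computes the frames gained at each end; the variable name `gains` is illustrative.

```python
# Anchor and expanded boundaries copied from the two outputs above
anchors = [(690, 743), (954, 997)]
scenes = [(690, 745), (954, 997)]

# frames gained at the start and at the end of each scene
gains = [
    (a_start - s_start, s_end - a_end)
    for (a_start, a_end), (s_start, s_end) in zip(anchors, scenes)
]
print(gains)  # [(0, 2), (0, 0)]
```

Here only the first scene was expanded, by two frames at its end.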

### Functions

In [24]:
def label_clusters(dialogue_folder, frame_choice, film, threshold):

    model = VGG16(weights='imagenet', include_top=False)
    model.summary()

    vgg16_feature_list = []

    for x in frame_choice:
        # build the path to this frame's image file
        img_path = os.path.join(dialogue_folder, film + '_frame' + str(x) + '.jpg')
        img = image.load_img(img_path, target_size=(256, 256))
        img_data = image.img_to_array(img)
        img_data = np.expand_dims(img_data, axis=0)  # add a batch dimension
        img_data = preprocess_input(img_data)

        # extract convolutional features and flatten them into a single vector
        vgg16_feature = model.predict(img_data)
        vgg16_feature_list.append(np.array(vgg16_feature).flatten())

    vgg16_feature_list_np = np.array(vgg16_feature_list)

    hac = AgglomerativeClustering(n_clusters = None, distance_threshold = threshold).fit(vgg16_feature_list_np)
    hac_labels = hac.labels_
    print('Number of clusters:', hac.n_clusters_)

    return hac_labels

In [25]:
def predict_mcu(dialogue_folder, model, frame_choice, film):
    image_list = []
    for x in frame_choice:
        image_list.append(img_to_array(load_img(dialogue_folder + '/' + film + '_frame'+ str(x) + '.jpg', target_size = (128, 128), color_mode = 'grayscale')))

    image_array = np.array(image_list)
    y_pred = model.predict_classes(image_array)

    # the model's predict_classes method creates a NumPy array of arrays; this converts it to a list of 0/1 integers
    y_pred_values = []
    for prediction in y_pred:
        y_pred_values.append(prediction[0])
        
    return y_pred_values

In [26]:
def get_shot_ids(frame_choice, hac_labels):
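    shot_id = 0
    shot_id_list = []
    prev_cluster = None  # no previous cluster before the first frame

    for frame_file, cluster in zip(frame_choice, hac_labels):
        # a change in cluster value marks the start of a new shot
        if prev_cluster is not None and cluster != prev_cluster:
            shot_id += 1
        shot_id_list.append(shot_id)
        prev_cluster = cluster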
    
    return shot_id_list

In [27]:
def get_alternating_pairs(frame_choice, hac_labels, y_pred_values, shot_id_list):
    
    # to check for an A/B/A/B pattern, we must store the previous three clusters in memory
    prev_clust_1 = 1001
    prev_clust_2 = 1002
    prev_clust_3 = 1003
    prev_shot_id = -1
    alternate_a_list = []
    alternate_b_list = []

    # zip our various lists into a usable data structure
    for frame_file, cluster, mcu_flag, shot_id in zip(frame_choice, hac_labels, y_pred_values, shot_id_list):
        # when iterating through each frame, look for an A/B/A/B pattern, and save the clusters of any patterns
        if cluster == prev_clust_2 and prev_clust_1 == prev_clust_3:
            alternate_a_list.append(min(cluster, prev_clust_1)) # min and max are used to avoid duplicates of (1, 2), (2, 1)
            alternate_b_list.append(max(cluster, prev_clust_1))

        # we use prev_shot_id to identify when there's a new shot (when the cluster value changes)
        # every time there's a new shot, we update the cluster memory
        if shot_id != prev_shot_id:
            prev_shot_id = shot_id
            prev_clust_3 = prev_clust_2
            prev_clust_2 = prev_clust_1
            prev_clust_1 = cluster

    # save unique alternating pairs
    alternating_pairs = []
    for a, b in zip(alternate_a_list, alternate_b_list):
        pair = [int(a), int(b)]
        if pair not in alternating_pairs:
            alternating_pairs.append(pair)
        
    return alternating_pairs

In [28]:
def mcu_check(alternating_pairs, scene_df):
    
    speaker_pairs = []
    print('cluster\t', 'count\t', 'mcu probability')
    
    for pair in alternating_pairs:
        # calculate the mean of each cluster's MCU column
        mean_a = scene_df.loc[scene_df['cluster'] == pair[0]]['mcu'].mean()
        mean_b = scene_df.loc[scene_df['cluster'] == pair[1]]['mcu'].mean()
        print(pair[0], '\t', scene_df.loc[scene_df['cluster'] == pair[0]]['mcu'].count(), '\t', '{0:.2f}%'.format(mean_a * 100))
        print(pair[1], '\t', scene_df.loc[scene_df['cluster'] == pair[1]]['mcu'].count(), '\t', '{0:.2f}%'.format(mean_b * 100))
        
        # an alternating pair will pass the MCU check if BOTH clusters have an MCU mean greater than 0.5
        if mean_a > .5 and mean_b > .5:
            print('Passes MCU check')
            speaker_pairs.append(pair)
        else:
            print('Fails MCU check')
        print()
    
    return speaker_pairs

In [29]:
def anchor_scenes(speaker_pairs, scene_df):
    
    anchors = []  # named differently from the function to avoid shadowing it

    for pair in speaker_pairs:
        # designate the first and last frames with either Speaker A or Speaker B clusters as Anchors
        anchor_start = scene_df.loc[(scene_df['cluster'] == pair[0]) | (scene_df['cluster'] == pair[1])].frame_file.min()
        anchor_end = scene_df.loc[(scene_df['cluster'] == pair[0]) | (scene_df['cluster'] == pair[1])].frame_file.max()

        print('Speaker A and B Clusters:', pair)
        print('Anchor Start/End Frames:', anchor_start, anchor_end)
        print()
        anchors.append((anchor_start, anchor_end))
        
    return anchors

In [30]:
def expand_scenes(speaker_pairs, scene_df):
    
    expanded_scenes = []

    for pair in speaker_pairs:
        # designate the first and last frames with either Speaker A or Speaker B clusters as Anchors
        anchor_start = scene_df.loc[(scene_df['cluster'] == pair[0]) | (scene_df['cluster'] == pair[1])].frame_file.min()
        anchor_end = scene_df.loc[(scene_df['cluster'] == pair[0]) | (scene_df['cluster'] == pair[1])].frame_file.max()
        # find all unique clusters between the anchor_start and anchor_end frames
        cutaways = scene_df.loc[(scene_df['frame_file'] > anchor_start) & (scene_df['frame_file'] < anchor_end)].cluster.unique()
        cutaways = cutaways[cutaways != pair[0]] # remove the Speaker A and Speaker B clusters from this list
        cutaways = cutaways[cutaways != pair[1]]
        print('Speaker A and B Clusters:', pair)
        print('Anchor Start/End Frames:', anchor_start, anchor_end)
        print('Cutaway Clusters:', cutaways)

        scene_start = anchor_start
        min_flag = 0

        # expand the scene backward while the preceding frame belongs to a cutaway cluster
        while min_flag == 0:
            try:
                if int(scene_df.loc[scene_df['frame_file'] == (scene_start - 1)].cluster) in cutaways:
                    scene_start -= 1
                else:
                    min_flag = 1
            except TypeError: # error if hitting the beginning of the frame list
                min_flag = 1

        scene_end = anchor_end
        max_flag = 0
        while max_flag == 0:
            try:
                if int(scene_df.loc[scene_df['frame_file'] == (scene_end + 1)].cluster) in cutaways:
                    scene_end += 1
                else:
                    max_flag = 1
            except TypeError: # error if hitting the end of the frame list
                max_flag = 1
        
        print('Expanded Start/End Frames:', scene_start, scene_end)
        print()
        expanded_scenes.append((scene_start, scene_end))
            
    return expanded_scenes