# Location Context
To learn more about a scene, we can mine the subtitles for location clues. A restaurant scene might contain dialogue that gives away its setting, like "What can I get you?" or "Check, please." A scene in a car will have engine sound effects, and SDH subtitles (subtitles for the deaf and hard of hearing) describe them in parenthetical labels such as "(engine roaring)". We can look at both the dialogue and these parenthetical labels.

In [1]:
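import pandas as pd
import pysrt
import spacy
from subtitle_dataframes_io import *
from collections import Counter
from datetime import datetime, date, timedelta
# 'en' is a spaCy 2.x shortcut link; spaCy 3+ needs a full model name such as 'en_core_web_sm'
nlp = spacy.load('en')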

In [2]:
subs = pysrt.open('../subtitles/lost_in_translation_2003.srt')
subtitle_df = generate_base_subtitle_df(subs)
subtitle_df = generate_subtitle_features(subtitle_df)
subtitle_df['cleaned_text'] = subtitle_df['concat_sep_text'].map(clean_line)
sentences = partition_sentences(remove_blanks(subtitle_df['cleaned_text'].tolist()), nlp)
subtitle_indices = tie_sentence_subtitle_indices(sentences, subtitle_df)
sentence_df = pd.DataFrame(list(zip(sentences, subtitle_indices)), columns=['sentence', 'subtitle_indices'])

## Scene Location Contextual Dialogue
We can check for dialogue common to certain types of scenes. A restaurant scene may have a waiter taking an order.

In [3]:
sentence = sentence_df.iloc[566].sentence
sentence

'What can I get you?'

In [4]:
restaurant_dialogue = ['what can i get you', 'check please', 'check, please']

In [5]:
for dialogue in restaurant_dialogue:
    if dialogue in sentence.lower():
        print('scene possibly in restaurant')

scene possibly in restaurant


Below is a running list of contextual dialogue.

In [6]:
restaurant_dialogue = ['what can i get you', 'check please', 'check, please']
casino_dialogue = ['no more bets']
airport_dialogue = ['final boarding call']
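
The phrase lists above could be folded into a single lookup, so that one call checks a sentence against every location. The following is a minimal sketch rather than part of the original notebook: the dictionary layout and the names `LOCATION_DIALOGUE`, `normalize`, and `guess_locations_from_dialogue` are assumptions made for illustration. Stripping punctuation before matching means "check please" and "check, please" no longer need separate entries.

```python
import string

# Phrases copied from the running list above; the dict layout is an assumption.
LOCATION_DIALOGUE = {
    'restaurant': ['what can i get you', 'check please'],
    'casino': ['no more bets'],
    'airport': ['final boarding call'],
}

def normalize(text):
    """Lowercase, drop ASCII punctuation, and collapse runs of whitespace."""
    stripped = text.lower().translate(str.maketrans('', '', string.punctuation))
    return ' '.join(stripped.split())

def guess_locations_from_dialogue(sentence, phrase_map=LOCATION_DIALOGUE):
    """Return every location whose phrase list matches the sentence."""
    cleaned = normalize(sentence)
    return [location for location, phrases in phrase_map.items()
            if any(phrase in cleaned for phrase in phrases)]

print(guess_locations_from_dialogue('Check, please.'))
```

Returning a list rather than printing keeps the result usable downstream, and an empty list signals that no phrase matched.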

## Scene Location Contextual Parenthetical
We can also use parentheticals to identify scene types or locations.

In [7]:
subs = pysrt.open('../subtitles/booksmart_2019.srt')
subtitle_df = generate_base_subtitle_df(subs)
subtitle_df = generate_subtitle_features(subtitle_df)
subtitle_df['cleaned_text'] = subtitle_df['concat_sep_text'].map(clean_line)
sentences = partition_sentences(remove_blanks(subtitle_df['cleaned_text'].tolist()), nlp)
subtitle_indices = tie_sentence_subtitle_indices(sentences, subtitle_df)
sentence_df = pd.DataFrame(list(zip(sentences, subtitle_indices)), columns=['sentence', 'subtitle_indices'])

In [8]:
parenthetical_1 = subtitle_df.iloc[2419].parenthetical
parenthetical_1

'tires screeching'

In [9]:
parenthetical_2 = subtitle_df.iloc[2393].parenthetical
parenthetical_2

'engine revving'

In [10]:
car_parenthetical = ['engine starts', 'tires screeching', 'engine revving']

In [11]:
for parenthetical in car_parenthetical:
    if parenthetical in parenthetical_1.lower():
        print('scene possibly in car')

scene possibly in car


In [12]:
for parenthetical in car_parenthetical:
    if parenthetical in parenthetical_2.lower():
        print('scene possibly in car')

scene possibly in car


Below is a running list of contextual parentheticals.

In [13]:
car_parenthetical = ['engine starts', 'engine rev', 'engine roar', 'tires screech', 'brakes screech', 'hand brake', 'handbrake']
train_parenthetical = ['brakes screech', 'train whistle']
bathroom_parenthetical = ['toilet', 'flush']
elevator_parenthetical = ['elevator bell']
airport_parenthetical = ['plane passes overhead', 'plane flies overhead']
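
Some phrases are ambiguous on their own: "brakes screech" appears in both the car and the train lists. One way to handle this, sketched below under stated assumptions, is to tally matches over all parentheticals in a scene and let the counts decide. The names `LOCATION_PARENTHETICALS` and `tally_location_votes` are hypothetical, and the input is assumed to be a plain list of parenthetical values in which missing entries may be `None` or NaN.

```python
from collections import Counter

# Phrases copied from the running list above; the dict layout is an assumption.
LOCATION_PARENTHETICALS = {
    'car': ['engine starts', 'engine rev', 'engine roar', 'tires screech',
            'brakes screech', 'hand brake', 'handbrake'],
    'train': ['brakes screech', 'train whistle'],
    'bathroom': ['toilet', 'flush'],
    'elevator': ['elevator bell'],
    'airport': ['plane passes overhead', 'plane flies overhead'],
}

def tally_location_votes(parentheticals, phrase_map=LOCATION_PARENTHETICALS):
    """Count, per location, how many parentheticals contain one of its phrases."""
    votes = Counter()
    for parenthetical in parentheticals:
        if not isinstance(parenthetical, str):
            continue  # skip missing values such as None or NaN
        lowered = parenthetical.lower()
        for location, phrases in phrase_map.items():
            if any(phrase in lowered for phrase in phrases):
                votes[location] += 1
    return votes

scene = ['engine revving', 'brakes screech', None, 'tires screeching']
print(tally_location_votes(scene).most_common())
```

For this sample scene the car phrases match three parentheticals and the train phrases match one, so the car reading wins even though "brakes screech" counts toward both.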

## Scene Location Context Speaker
When a character speaks from offscreen, the subtitle usually labels the speaker. Often the label is the character's name, but for a generic, unnamed character it may be a role like "Pilot" or "Waiter". These role labels give useful clues about the scene's setting.

In [14]:
airplane_speaker = ['pilot', 'flight attendant']

In [15]:
offscreen_speaker = 'PILOT'

In [16]:
for phrase in airplane_speaker:
    if phrase in offscreen_speaker.lower():
        print('scene possibly in airplane')

scene possibly in airplane


Below is a running list of contextual speakers.

In [17]:
airplane_speaker = ['pilot', 'flight attendant']
restaurant_speaker = ['waiter', 'waitress']
wheel_speaker = ['driver']
reception_speaker = ['receptionist']
casino_speaker = ['dealer', 'gambler', 'pit boss', 'croupier']
party_speaker = ['deejay']
train_speaker = ['train attendant']
school_speaker = ['teacher', 'principal', 'student']
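
To connect these lists to raw subtitle text, the speaker label has to be pulled out of the line first. The sketch below assumes labels look like `PILOT:` (an uppercase name followed by a colon at the start of the line). That format is an assumption, subtitle files vary, and the notebook's own helper module may already extract speakers differently. The names `SPEAKER_PATTERN`, `extract_speaker`, and `guess_locations_from_speaker` are hypothetical.

```python
import re

# Labels copied from the running list above; the dict layout is an assumption.
LOCATION_SPEAKERS = {
    'airplane': ['pilot', 'flight attendant'],
    'restaurant': ['waiter', 'waitress'],
    'wheel': ['driver'],
    'reception': ['receptionist'],
    'casino': ['dealer', 'gambler', 'pit boss', 'croupier'],
    'party': ['deejay'],
    'train': ['train attendant'],
    'school': ['teacher', 'principal', 'student'],
}

# Assumed label format: an uppercase name followed by a colon at the line start.
SPEAKER_PATTERN = re.compile(r"^\s*([A-Z][A-Z .'\-]*):")

def extract_speaker(line):
    """Return the lowercased speaker label, or None if the line has none."""
    match = SPEAKER_PATTERN.match(line)
    return match.group(1).strip().lower() if match else None

def guess_locations_from_speaker(line, speaker_map=LOCATION_SPEAKERS):
    """Return every location whose speaker list matches the line's label."""
    speaker = extract_speaker(line)
    if speaker is None:
        return []
    return [location for location, labels in speaker_map.items()
            if any(label in speaker for label in labels)]

print(guess_locations_from_speaker('PILOT: Ladies and gentlemen, welcome aboard.'))
```

Matching only against the extracted label, not the whole line, avoids false positives when a character merely mentions a pilot or a waiter in dialogue.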

## Walla
Walla is industry parlance for the indistinct background dialogue that makes a crowded scene sound alive. The name allegedly comes from background extras simulating crowd noise by repeating "walla walla walla". These days, extras stay quiet during shooting and walla is added as part of the sound mix.

We can look for subtitle descriptions of walla, like "indistinct conversation".

In [18]:
subs = pysrt.open('../subtitles/plus_one_2019.srt')
subtitle_df = generate_base_subtitle_df(subs)
subtitle_df = generate_subtitle_features(subtitle_df)
subtitle_df['cleaned_text'] = subtitle_df['concat_sep_text'].map(clean_line)
sentences = partition_sentences(remove_blanks(subtitle_df['cleaned_text'].tolist()), nlp)
subtitle_indices = tie_sentence_subtitle_indices(sentences, subtitle_df)
sentence_df = pd.DataFrame(list(zip(sentences, subtitle_indices)), columns=['sentence', 'subtitle_indices'])

In [19]:
parenthetical = subtitle_df.iloc[816].parenthetical
parenthetical

'Indistinct conversations'

In [20]:
walla_parentheticals = ['indistinct conversation', 'chatter']

In [21]:
# Check the parenthetical against the walla phrases (not the restaurant dialogue list)
for phrase in walla_parentheticals:
    if phrase in parenthetical.lower():
        print('walla found')

walla found
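
To find every walla cue in a film, not just one row, the same substring check can be run over the whole parenthetical column. This is a rough sketch under assumptions: `is_walla` and `walla_indices` are hypothetical helpers, and the input is treated as a plain list in which rows without a parenthetical hold a non-string value such as `None` or NaN.

```python
# Phrases copied from the walla list above.
WALLA_PHRASES = ['indistinct conversation', 'chatter']

def is_walla(parenthetical, phrases=WALLA_PHRASES):
    """Return True if the parenthetical describes background crowd talk."""
    if not isinstance(parenthetical, str):
        return False  # missing values (None, NaN) are never walla
    lowered = parenthetical.lower()
    return any(phrase in lowered for phrase in phrases)

def walla_indices(parentheticals):
    """Return the positions of all parentheticals that look like walla."""
    return [i for i, p in enumerate(parentheticals) if is_walla(p)]

sample = ['Indistinct conversations', None, 'engine revving', 'crowd chattering']
print(walla_indices(sample))
```

With a DataFrame like the one above, something along the lines of `walla_indices(subtitle_df['parenthetical'].tolist())` would give the subtitle rows where a crowd is audible, which hints at public settings such as bars, parties, or restaurants.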
