# scene_details
Now that we can automatically identify a few scenes in a film, we can take a closer look at each one. In this notebook, we'll try to identify as many features as we can: the level of drama in the dialogue, the emotions of the characters, the mood of the score, and so on.

In [1]:
import sys
from sklearn.feature_extraction.text import TfidfVectorizer
from time_reference_io import *
from film_details_io import *
from collections import Counter
import pandas as pd
from scene_identification_io import *
# add ../subtitle_features to the import path so subtitle_dataframes_io can be found
sys.path.append('../subtitle_features')
from subtitle_dataframes_io import *

In [2]:
film = 'lost_in_translation_2003'
srt_df, subtitle_df, sentence_df, vision_df, face_df = read_pickle(film)

## Creating Scene-Specific DataFrames
We have a function, `generate_scenes`, that returns one dictionary per identified scene.

In [3]:
scene_dictionaries = generate_scenes(vision_df, face_df, substantial_minimum=4, anchor_search=6)
len(scene_dictionaries)

5

In [4]:
scene_dict = scene_dictionaries[3]
scene_dict

{'scene_id': 3,
 'first_frame': 2489,
 'last_frame': 2533,
 'scene_duration': 45,
 'left_anchor_shot_cluster': 196,
 'left_anchor_face_cluster': 18.0,
 'matching_left_face_clusters': [9.0, 12.0],
 'right_anchor_shot_cluster': 38,
 'right_anchor_face_cluster': 6.0,
 'matching_right_face_clusters': [],
 'cutaway_shot_clusters': [29]}

Each dictionary gives us the first and last frames. We can use these to create new subtitle-related dataframes containing only data from this scene.

In [5]:
scene_duration = scene_dict['last_frame'] + 1 - scene_dict['first_frame']
scene_start_time = frame_to_time(scene_dict['first_frame'])
# the last frame stays onscreen for one second, so the scene ends at the start of the following frame
scene_end_time = frame_to_time(scene_dict['last_frame'] + 1)

# keep every subtitle whose display interval overlaps the scene's time window
scene_subtitle_df = subtitle_df[
    (subtitle_df['end_time'] > scene_start_time) & (subtitle_df['start_time'] < scene_end_time)].copy()

# collect the positions of sentences drawn from at least one in-scene subtitle
scene_subtitle_indices = set(scene_subtitle_df.index)
scene_sentence_indices = [
    position
    for position, sub_index_list in enumerate(sentence_df.subtitle_indices.values)
    if any(sub_index in scene_subtitle_indices for sub_index in sub_index_list)
]
# slice by position from the first matching sentence through the last
scene_sentence_df = sentence_df.iloc[scene_sentence_indices[0]: scene_sentence_indices[-1] + 1]
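
To make the filtering logic easier to check, here is a minimal sketch on toy data. `frame_to_time_sketch` is a hypothetical stand-in for `frame_to_time` that assumes one frame per second (consistent with the "add 1 second" comment above, but not confirmed against the real helper), and the toy tables and their values are invented for illustration. Unlike the cell above, which takes a contiguous slice between the first and last matching sentences, this sketch keeps only the sentences that match.

```python
from datetime import time

import pandas as pd


def frame_to_time_sketch(frame_number):
    """Hypothetical stand-in for frame_to_time, assuming one frame per second."""
    hours, remainder = divmod(frame_number, 3600)
    minutes, seconds = divmod(remainder, 60)
    return time(hours, minutes, seconds)


# Toy subtitle table (invented values); the index plays the role of subtitle_df's index.
toy_subtitle_df = pd.DataFrame(
    {
        "start_time": [time(0, 41, 20), time(0, 41, 28), time(0, 41, 31), time(0, 42, 20)],
        "end_time": [time(0, 41, 22), time(0, 41, 30), time(0, 41, 33), time(0, 42, 22)],
    },
    index=[628, 629, 630, 631],
)

scene_start = frame_to_time_sketch(2489)     # first_frame of scene 3
scene_end = frame_to_time_sketch(2533 + 1)   # one second after last_frame

# Two intervals overlap when each one starts before the other ends.
overlaps = (toy_subtitle_df["end_time"] > scene_start) & (
    toy_subtitle_df["start_time"] < scene_end
)
toy_scene_subtitle_df = toy_subtitle_df[overlaps].copy()

# Toy sentence table: each sentence lists the subtitles it was drawn from.
toy_sentence_df = pd.DataFrame(
    {
        "sentence": ["Before.", "Straddles the cut.", "Hello.", "How are you?", "After."],
        "subtitle_indices": [[628], [629], [630], [630], [631]],
    }
)

# Keep a sentence if any of its source subtitles falls inside the scene.
scene_subtitle_ids = set(toy_scene_subtitle_df.index)
in_scene = toy_sentence_df["subtitle_indices"].apply(
    lambda indices: any(i in scene_subtitle_ids for i in indices)
)
toy_scene_sentence_df = toy_sentence_df[in_scene]

print(toy_scene_subtitle_df.index.tolist())
print(toy_scene_sentence_df["sentence"].tolist())
```

Subtitle 629 starts before the scene but ends inside it, so the overlap test keeps it; a stricter test on `start_time` alone would drop dialogue that straddles the cut.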

In [6]:
scene_subtitle_df.head(3)

| Unnamed: 0 | srt_index | original_text | start_time | end_time | concat_sep_text | separated_flag | laugh | hesitation | speaker | music | parenthetical | el_parenthetical | el_italic | cleaned_text |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 630 | 631 | (GIGGLES)\nHello. | 00:41:30.488000 | 00:41:31.780000 | (GIGGLES) Hello. | 0 | 1 | 0 |  | 0 | GIGGLES | 0 | 0 | Hello. |
| 631 | 632 | Hello.\nHow are you? | 00:41:31.865000 | 00:41:33.073000 | Hello. How are you? | 0 | 0 | 0 |  | 0 |  | 0 | 0 | Hello. How are you? |
| 632 | 633 | Good. How are you? | 00:41:33.158000 | 00:41:34.575000 | Good. How are you? | 0 | 0 | 0 |  | 0 |  | 0 | 0 | Good. How are you? |


In [7]:
scene_sentence_df.head(3)

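| Unnamed: 0 | sentence | subtitle_indices | profanity | self_intro | other_intro | direct_address | conv_boundary | offscreen_speaker | implied_speaker |
|---|---|---|---|---|---|---|---|---|---|
| 721 | Hello. | [630] | 0 |  |  |  |  |  |  |
| 722 | Hello. | [631] | 0 |  |  |  |  |  | 18.0 |
| 723 | How are you? | [631] | 0 |  |  |  | starter |  | 18.0 |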
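
Because each sentence row lists the `subtitle_indices` it was drawn from, sentences can be given approximate timings by looking those indices up in the subtitle table. The sketch below illustrates the idea on toy tables built from the rows shown above; the `approx_start` and `approx_end` column names are hypothetical, and it assumes the time columns hold `datetime.time` values.

```python
from datetime import time

import pandas as pd

# Toy copies of the first rows shown above (times assumed to be datetime.time objects).
toy_subtitles = pd.DataFrame(
    {
        "start_time": [time(0, 41, 30, 488000), time(0, 41, 31, 865000)],
        "end_time": [time(0, 41, 31, 780000), time(0, 41, 33, 73000)],
    },
    index=[630, 631],
)
toy_sentences = pd.DataFrame(
    {
        "sentence": ["Hello.", "Hello.", "How are you?"],
        "subtitle_indices": [[630], [631], [631]],
    },
    index=[721, 722, 723],
)

# A sentence can span several subtitles, so take the earliest start
# and the latest end among the subtitles it was drawn from.
toy_sentences["approx_start"] = toy_sentences["subtitle_indices"].apply(
    lambda ids: min(toy_subtitles.loc[ids, "start_time"])
)
toy_sentences["approx_end"] = toy_sentences["subtitle_indices"].apply(
    lambda ids: max(toy_subtitles.loc[ids, "end_time"])
)

print(toy_sentences[["sentence", "approx_start", "approx_end"]])
```

These timings are only as fine-grained as the subtitles themselves: sentences 722 and 723 share subtitle 631, so they receive the same window.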
