# Dialogue Attribution
Dialogue attribution is an important component of film comprehension. The subtitles already give us a ground-truth transcription of all spoken dialogue, but we still need to tie each sentence to the character who speaks it.

This isn't the first time we've attempted dialogue attribution. In the directory `/depreciated_modules/dialogue_attribution`, we tried automated speaker diarization, assigning each sentence to one of two voice encodings. A conversation would be divided into sentences by speaker M and speaker N, but the labels were sometimes entirely flipped, attributing almost all lines to the wrong speaker.
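The flipping problem arises because diarization cluster labels are arbitrary: the audio alone cannot say which voice is "M" and which is "N". As a minimal sketch, assuming we had a few hand-labelled sentences to check against (the labels below are invented for illustration and are not project data), a flip can be detected by comparing agreement before and after swapping the two labels. The helper names `agreement` and `swap_labels` are hypothetical.

```python
def agreement(predicted, reference):
    """Fraction of sentences whose predicted label equals the reference label."""
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


def swap_labels(labels, first="M", second="N"):
    """Exchange the two speaker labels in a two-speaker labelling."""
    return [second if label == first else first for label in labels]


# Hypothetical labels for six sentences (illustration only).
reference = ["M", "N", "M", "N", "N", "M"]
predicted = ["N", "M", "N", "M", "N", "N"]

direct = agreement(predicted, reference)
swapped = agreement(swap_labels(predicted), reference)

# If swapping the labels improves agreement, the diarizer most likely
# separated the voices correctly but named the clusters the other way round.
labels_look_flipped = swapped > direct
```

In this toy case the diarizer separates the voices well but names them backwards, so agreement is low until the labels are swapped.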

Another attempt was made with the function `get_implied_speaker()`. It would collect the frames shown for the entire duration of a subtitle and try to count the onscreen faces with their mouths open. It returned the face encoding of the onscreen character, but it also wasn't reliably accurate.
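The counting idea can be sketched as follows. This is not the original `get_implied_speaker()` implementation. It assumes each observation is a hypothetical `(frame, face_id, mouth_open)` tuple, with integer face IDs standing in for face encodings.

```python
from collections import Counter


def most_frequent_open_mouth_face(observations, start_frame, end_frame):
    """Return the face seen most often with an open mouth in the frame range.

    observations: iterable of (frame, face_id, mouth_open) tuples
    (a hypothetical layout chosen for this sketch).
    Returns None when no open-mouth face appears in the range.
    """
    counts = Counter(
        face_id
        for frame, face_id, mouth_open in observations
        if start_frame <= frame <= end_frame and mouth_open
    )
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Hypothetical observations (illustration only).
observations = [
    (10, 6, True),
    (11, 6, True),
    (11, 17, False),
    (12, 17, True),
    (20, 17, True),  # outside the subtitle's frame range below
]

speaker_guess = most_frequent_open_mouth_face(observations, 10, 12)
no_guess = most_frequent_open_mouth_face(observations, 13, 19)
```

One weakness is visible even in this sketch: an offscreen speaker produces either no guess or a guess for the listener, which may be part of why the approach was unreliable on its own.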

After continued development of the visual, audio, and subtitle features, we can make another attempt at dialogue attribution. It will combine the two approaches described above: diarizing the conversation audio, and then looking for onscreen faces.
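One possible way to combine the two signals, offered only as a sketch under stated assumptions, is to let the onscreen-face guesses vote on which face belongs to each diarized voice, then attribute every sentence through that mapping. The function names and the sample data below are hypothetical.

```python
from collections import Counter, defaultdict


def map_voices_to_faces(voice_labels, face_guesses):
    """Map each voice label to the face it co-occurs with most often.

    voice_labels: one diarization label per sentence (e.g. 'M' or 'N').
    face_guesses: one face ID per sentence, or None when no face was found.
    """
    votes = defaultdict(Counter)
    for voice, face in zip(voice_labels, face_guesses):
        if face is not None:
            votes[voice][face] += 1
    return {voice: counter.most_common(1)[0][0] for voice, counter in votes.items()}


def attribute_sentences(voice_labels, face_guesses):
    """Assign a face to every sentence via its voice label."""
    mapping = map_voices_to_faces(voice_labels, face_guesses)
    # Voices that never co-occurred with a face stay unattributed (None).
    return [mapping.get(voice) for voice in voice_labels]


# Hypothetical per-sentence inputs (illustration only).
voice_labels = ["M", "N", "M", "N", "M"]
face_guesses = [6, 17, None, 17, 6]

voice_to_face = map_voices_to_faces(voice_labels, face_guesses)
attributed = attribute_sentences(voice_labels, face_guesses)
```

Because the mapping is voted on across the whole conversation, a sentence with no visible face (the third one here) can still be attributed through its voice label, and the arbitrary naming of the voice clusters no longer matters.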

In [1]:
import os
import sys
sys.path.append('../subtitle_features')
sys.path.append('../audio_features')
sys.path.append('../data_serialization')
from serialization_preprocessing_io import *
from time_reference_io import *
from film_details_io import *
from scene_identification_io import *
from scene_details_io import *
from character_identification_io import *
from character_details_io import *
from subtitle_dataframes_io import *
import pyAudioAnalysis.audioSegmentation
import pandas as pd
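import spacy
# 'en' is a spaCy 2.x shortcut link; spaCy 3+ requires a full model name such as 'en_core_web_sm'
nlp = spacy.load('en')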

## Choosing a Scene
We'll work with the first scene identified in *Lost in Translation* (2003).

In [2]:
film = 'lost_in_translation_2003'
srt_df, subtitle_df, sentence_df, vision_df, face_df = read_pickle(film)
scene_dictionaries = generate_scenes(vision_df, face_df, substantial_minimum=4, anchor_search=6)

In [3]:
scene_dictionaries[1]

{'scene_id': 1,
 'first_frame': 1929,
 'last_frame': 2082,
 'scene_duration': 154,
 'left_anchor_shot_cluster': 207,
 'left_anchor_face_cluster': 6.0,
 'matching_left_face_clusters': [],
 'right_anchor_shot_cluster': 66,
 'right_anchor_face_cluster': 17.0,
 'matching_right_face_clusters': [18.0, 12.0],
 'cutaway_shot_clusters': [7, 29, 142]}
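As a quick sanity check on this output, the sketch below copies the dictionary and confirms that `scene_duration` matches an inclusive frame count. It also gathers the face clusters on each side, on the assumption (not confirmed by the source) that the `matching_*_face_clusters` lists hold additional clusters belonging to the same side as the anchor. Both helper names are hypothetical.

```python
scene = {
    'scene_id': 1,
    'first_frame': 1929,
    'last_frame': 2082,
    'scene_duration': 154,
    'left_anchor_shot_cluster': 207,
    'left_anchor_face_cluster': 6.0,
    'matching_left_face_clusters': [],
    'right_anchor_shot_cluster': 66,
    'right_anchor_face_cluster': 17.0,
    'matching_right_face_clusters': [18.0, 12.0],
    'cutaway_shot_clusters': [7, 29, 142],
}


def inclusive_frame_count(scene):
    """Count frames from first_frame to last_frame, including both ends."""
    return scene['last_frame'] - scene['first_frame'] + 1


def face_clusters_by_side(scene):
    """Group the anchor face cluster with its matching clusters, per side.

    Assumption: matching clusters belong to the same side as the anchor.
    """
    return {
        'left': [scene['left_anchor_face_cluster']] + scene['matching_left_face_clusters'],
        'right': [scene['right_anchor_face_cluster']] + scene['matching_right_face_clusters'],
    }


frame_count = inclusive_frame_count(scene)
clusters = face_clusters_by_side(scene)
```

Under that assumption, a two-speaker conversation in this scene would have one candidate face cluster on the left and three on the right.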