# conversation_mood
One goal of *Moviegoer* is calculating and comparing the emotional aspects of scenes' conversations. We can quantify a conversation's mood/vibe in various ways.

In [1]:
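from subtitle_dataframes_io import *  # expected to also bring spacy, pysrt, and pd into scope
nlp = spacy.load('en')  # spaCy 3+ dropped the 'en' shortcut; load a named model such as 'en_core_web_sm' there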

In [2]:
subs = pysrt.open('../subtitles/plus_one_2019.srt')
subtitle_df = generate_base_subtitle_df(subs)
subtitle_df = generate_subtitle_features(subtitle_df)
subtitle_df['cleaned_text'] = subtitle_df['concat_sep_text'].map(clean_line)
sentences = partition_sentences(remove_blanks(subtitle_df['cleaned_text'].tolist()), nlp)
subtitle_indices = tie_sentence_subtitle_indices(sentences, subtitle_df)
sentence_df = pd.DataFrame(list(zip(sentences, subtitle_indices)), columns=['sentence', 'subtitle_indices'])

## Profanity
The easiest way to measure drama is to identify and count profanity. We define a list of profanities to search for, then reduce each word in the sentence to its lemma, so that we also catch derivations in the vein of "hecked", "heckin'", "hecking", "hecky", etc.

This is defined as a function *profanity()* in *phrases_io.py* and is used to find profanity at the scene- and film-level. These can be compared to find scenes with higher profanity than the film average.
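
The scene-versus-film comparison could be sketched roughly as follows. This is an illustration only, not the code in *phrases_io.py*: the function name `scenes_above_film_average` and the input format (one `(profanity_count, sentence_count)` pair per scene) are assumptions made for this example, and the numbers are invented.

```python
def scenes_above_film_average(scene_counts):
    """Return indices of scenes whose profanity rate exceeds the film-wide rate.

    scene_counts: list of (profanity_count, sentence_count) pairs, one per scene.
    This input format is hypothetical, chosen only for the sketch.
    """
    total_profanity = sum(profanity for profanity, _ in scene_counts)
    total_sentences = sum(sentences for _, sentences in scene_counts)
    if total_sentences == 0:
        return []

    # Film-level rate: profanities per sentence across all scenes
    film_rate = total_profanity / total_sentences

    above = []
    for index, (profanity, sentences) in enumerate(scene_counts):
        if sentences == 0:
            continue  # a scene with no dialogue has no rate to compare
        if profanity / sentences > film_rate:
            above.append(index)
    return above


# Invented counts for four scenes; the film-wide rate here is 10 / 100 = 0.1
example_counts = [(1, 20), (6, 30), (0, 25), (3, 25)]
print(scenes_above_film_average(example_counts))
```

Normalizing by sentence count, rather than comparing raw counts, keeps long scenes from looking more profane simply because they contain more dialogue.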

In [3]:
sentence = sentence_df.iloc[814].sentence
sentence

"He's fucking talking about babies."

In [4]:
sent_doc = nlp(sentence)
profanity_count = 0

# The list holds base forms alongside a few inflected variants
profanities = ['fuck', 'fucking', 'fuckin', 'fucked', 'shit', 'shitty', 'bullshit', 'ass', 'dumbass', 'bitch', 'asshole', 'tit', 'cunt', 'goddamn', 'damn', 'dammit', 'cock', 'cocksucker', 'dick']
# Compare each token's lowercased lemma against the list
for word in sent_doc:
    if word.lemma_.lower() in profanities:
        profanity_count += 1

profanity_count


1

## Cadence (Conversation Speed)
Cadence, or conversation speed, is an easy way to quantify a scene's conversation. Comedies might fire off jokes at lightning speed, while introspective dramas might have characters speak very slowly.

Since we can now identify specific scenes, let's measure the cadence of a scene.

In [5]:
import sys
sys.path.append('../data_serialization')
from serialization_preprocessing_io import *
from time_reference_io import *
from scene_identification_io import *
from scene_details_io import *
from character_identification_io import *
from character_details_io import *

In [6]:
film = 'plus_one_2019'
srt_df, subtitle_df, sentence_df, vision_df, face_df = read_pickle(film)
scene_dictionaries = generate_scenes(vision_df, face_df, substantial_minimum=4, anchor_search=8)
character_dictionaries = generate_characters(scene_dictionaries)

In [8]:
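scene_dict = scene_dictionaries[7]
# Number of frames in the scene, counting both the first and last frame
scene_duration = scene_dict['last_frame'] + 1 - scene_dict['first_frame']
# Keep every subtitle that overlaps the scene's time window.
# scene_start_time and scene_end_time are not defined in the cells shown here.
scene_subtitle_df = subtitle_df[(subtitle_df['end_time'] > scene_start_time) & (subtitle_df['start_time'] < scene_end_time)].copy()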

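We simply take the number of subtitle lines per minute, using each subtitle line as a proxy for a sentence: the count of subtitle rows in the scene divided by its duration in minutes. Dividing *scene_duration* by 60 treats the frame count as seconds, which assumes frames were sampled once per second.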

In [10]:
cadence = len(scene_subtitle_df) / (scene_duration / 60)
cadence

25.90909090909091

We can define three speed categories based on the rounded cadence: fast (35 or more per minute), slow (fewer than 20), and medium (everything in between).

In [11]:
rounded_cadence = round(cadence)
if rounded_cadence >= 35:
    print('This scene has a fast cadence, with a conversation speed of', rounded_cadence, 'sentences per minute.')
elif rounded_cadence < 20:
    print('This scene has a slow cadence, with a conversation speed of', rounded_cadence, 'sentences per minute.')
else:
    print('This scene has a medium cadence, with a conversation speed of', rounded_cadence, 'sentences per minute.')

This scene has a medium cadence, with a conversation speed of 26 sentences per minute.
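
To compare cadence across many scenes, the calculation and the categorization could be wrapped into one helper. The sketch below is a minimal version under stated assumptions: `classify_cadence` is a hypothetical name (not a function from the *Moviegoer* modules), the duration is assumed to be in seconds, and the default thresholds simply mirror the 35 and 20 cutoffs used above.

```python
def classify_cadence(line_count, duration_seconds, fast_threshold=35, slow_threshold=20):
    """Return (cadence, label) for a scene.

    line_count: number of subtitle lines (or sentences) in the scene.
    duration_seconds: scene length in seconds (assumed unit for this sketch).
    """
    if duration_seconds <= 0:
        raise ValueError('duration_seconds must be positive')

    # Lines per minute
    cadence = line_count / (duration_seconds / 60)

    # Categorize on the rounded value, as in the cell above
    rounded = round(cadence)
    if rounded >= fast_threshold:
        label = 'fast'
    elif rounded < slow_threshold:
        label = 'slow'
    else:
        label = 'medium'
    return cadence, label


# Invented inputs: 30 lines in a 60-second scene
cadence, label = classify_cadence(30, 60)
print(round(cadence), label)
```

Keeping the thresholds as parameters makes it easy to tune them per genre, since what counts as fast dialogue in a drama may be ordinary in a comedy.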
