# film_baseline
To get a general feel for a film, we'll define some baseline characteristics. Some of these, such as conversation speed, can then be compared against individual scenes, which helps us find scenes that are particularly dramatic or emotional.

In [1]:
import sys
sys.path.append('../data_serialization')
from serialization_preprocessing_io import *

In [2]:
film = 'plus_one_2019'
srt_df, subtitle_df, sentence_df, vision_df, face_df = read_pickle(film)

### Disregarding credits
First, we'll need a way to disregard frames that contain the credits, since they would throw off our baseline characteristics.

As a quick-and-dirty way of accomplishing this, we can look for the last frame that has a face in it, and then disregard every frame after that.

In [3]:
frame_before_credits = face_df[face_df['faces_found'] > 0].index[-1]  # index of the last frame with a detected face
vision_nocreds_df = vision_df[0:frame_before_credits].copy()
face_nocreds_df = face_df[0:frame_before_credits].copy()
subtitle_nocreds_df = subtitle_df[0:frame_before_credits].copy()
sentence_nocreds_df = sentence_df[0:frame_before_credits].copy()
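
One detail worth checking in the slicing above: assuming the DataFrames use a default integer index, `df[0:n]` slices by position just as a Python list does, so the end of the range is exclusive. The sketch below uses a plain list of made-up per-frame face counts (the `faces_found` values here are toy data, not the film's) to show the effect.

```python
# Toy per-frame face counts; the trailing zeros stand in for the credits.
faces_found = [1, 2, 0, 1, 0, 0, 0]

# Index of the last frame in which a face was detected.
last_face_frame = max(i for i, n in enumerate(faces_found) if n > 0)

# An exclusive end drops the last face frame itself ...
exclusive = faces_found[0:last_face_frame]
# ... while adding 1 to the end keeps it.
inclusive = faces_found[0:last_face_frame + 1]

print(last_face_frame)  # 3
print(exclusive)        # [1, 2, 0]
print(inclusive)        # [1, 2, 0, 1]
```

Losing a single frame makes little difference to a film-wide baseline, so the exclusive slice is a reasonable shortcut for a quick-and-dirty cutoff.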

## Dialogue
### Profanity Per Word
We'll start with the dialogue by quantifying how much profanity the film contains as a whole. We'll express this as "profanity per word", which can be read as "1 in X words is a profanity". Once we have this baseline, we can search for dramatic scenes with an above-average amount of profanity.

We'll define a function `get_word_count()` to estimate the total word count by counting whitespace characters and adding one word per sentence. Then we'll count up the number of profanities.

In [4]:
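def get_word_count(sentence_nocreds_df):
    """Estimate the total number of words across all sentences."""
    space_count = 0
    sentence_list = sentence_nocreds_df.sentence.tolist()
    for sentence in sentence_list:
        for character in sentence:
            if character.isspace():
                space_count += 1
    # a sentence with n whitespace characters has n + 1 words,
    # assuming words are separated by single whitespace characters
    word_count = space_count + len(sentence_list)

    return word_count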
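
The whitespace-counting approach assumes words are separated by exactly one whitespace character. As a minimal sketch on made-up sentences (the helper names `count_words_by_whitespace` and `count_words_by_split` are illustrative and not part of the notebook), the following compares it with `str.split()`, which collapses runs of whitespace:

```python
def count_words_by_whitespace(sentences):
    # Same idea as get_word_count(): whitespace characters plus one per sentence.
    spaces = sum(ch.isspace() for s in sentences for ch in s)
    return spaces + len(sentences)


def count_words_by_split(sentences):
    # str.split() with no arguments splits on runs of whitespace.
    return sum(len(s.split()) for s in sentences)


clean = ["Hello there.", "How are you?"]
messy = ["Hello  there.", " How are you?"]  # double space, leading space

print(count_words_by_whitespace(clean), count_words_by_split(clean))  # 5 5
print(count_words_by_whitespace(messy), count_words_by_split(messy))  # 7 5
```

If the subtitle text has already been cleaned of stray whitespace, the two approaches agree, so the simpler heuristic is adequate for a baseline.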

In [5]:
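def get_profanity_per_word(sentence_nocreds_df):
    """Return the fraction of words in the dialogue that are profanities."""
    word_count = get_word_count(sentence_nocreds_df)
    if word_count == 0:  # no dialogue at all, so avoid dividing by zero
        return 0

    profanity_count = sentence_nocreds_df.profanity.sum()

    profanity_per_word = profanity_count / word_count

    return profanity_per_word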

In [6]:
profanity_per_word = get_profanity_per_word(sentence_nocreds_df)

In [7]:
if profanity_per_word == 0:
    print('The film contains no profanity.')
else:
    print('One in', round(1 / profanity_per_word), 'words is a profanity.')

One in 117 words is a profanity.
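
To illustrate how this baseline might later be used to flag scenes, here is a minimal sketch in plain Python. The data is made up, the scene boundaries are hypothetical, and the helper `profanity_rate` is an assumption for illustration (it counts words with `str.split()` rather than with `get_word_count()`); the pairs stand in for the `sentence` and `profanity` columns.

```python
def profanity_rate(rows):
    """rows: list of (sentence, profanity_count) pairs."""
    words = sum(len(sentence.split()) for sentence, _ in rows)
    profanities = sum(count for _, count in rows)
    return profanities / words if words else 0.0


# Made-up data standing in for the `sentence` and `profanity` columns.
film = [
    ("What a lovely day.", 0),
    ("Damn it, not again!", 1),
    ("I said I was sorry.", 0),
    ("Pass the salt, please.", 0),
]
scene = film[1:2]  # hypothetical scene boundaries

baseline = profanity_rate(film)
scene_rate = profanity_rate(scene)
is_above_baseline = scene_rate > baseline

print(is_above_baseline)  # True
```

A scene whose rate exceeds the film-wide rate would be a candidate for closer inspection; how far above the baseline counts as notable is a choice left open here.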
