In [286]:
from textblob import TextBlob, Sentence
from nltk.corpus import stopwords
from collections import defaultdict
import pandas as pd
import re

## Some Processing Stuff

In [6]:
script_dir = './scripts'
script_name = 'Thanksgiving.txt'
with open("{}/{}".format(script_dir, script_name), 'r') as f:
    text = f.read()

In [39]:
text[:1000]

"1\n  [Bob Barker] The first\n Showcase Showdown.\n  On its way, ladies and gentlemen.\n  It is round and around and around\n that it goes.\n  [cheers and applause on TV]\n  Where Denise at?\n  She upstairs with her little boyfriend.\n  [laughing] Oh, wait.\n Denise got a little boyfriend?\n  - Mm-hmm.\n - [laughs]\n  [cheers and applause on TV]\n  Y'all better stop running\n through this house.\n  [Denise] Yes, Grandma.\n  Denise, what were y'all doing up there?\n  Watching Fresh Prince.\n  You weren't eating candy, were you?\n  No.\n  Dev, do y'all even celebrate\n Thanksgiving in your house?\n  Is that a thing y'all do\n in the Indian community?\n  We have lunch together.\n  Then my dad watches The Godfather\n and falls asleep.\n  [laughs]\n  Well, you are welcome to come have\n  Thanksgiving with us anytime you want.\n  What's the Indian community?\n  [laughs]\n  Dev is Indian.\n  Wait. I thought Dev was black.\n  - I'm brown.\n - Black people are brown, too.\n  [Catherine] Oh, Lor

We've got some things that aren't part of the dialogue here: names like [Denise] and [Catherine], and actions like [laughs] and [cheers and applause on TV]. While the latter can give some context to the mood of the scene, I'm making an executive decision here to remove them since I want to limit my analysis to the script dialogue.

In [113]:
def clean_text(text):
    """
    Return text with special characters and non-dialogue (e.g. [laughs], [Denise]) removed.
    Note that this also drops commas, hyphens, and newlines.
    Params:
        text: string
    """
    text = re.sub(r'^1\s*', '', text)  # scraped scripts start with 1
    # Non-greedy match, so two [tags] on one line don't swallow the dialogue between them
    dialogue = re.sub(r'\[.*?\]', '', text)
    return re.sub(r"[^A-Za-z0-9'?!. ]+", '', dialogue)

In [114]:
cleaned_text = clean_text(text)
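
A quick sketch of why greediness matters when stripping bracketed tags. The sample line is made up for illustration (it is not taken from the script); a greedy `.*` runs from the first `[` to the last `]` on the line, while `.*?` stops at the nearest `]`.

```python
import re

# Made-up line with two bracketed tags (not from the actual script)
line = "[Denise] Yes, Grandma. [laughs] Okay."

# Greedy: matches from the first "[" to the last "]" on the line
greedy = re.sub(r'\[.*\]', '', line)

# Non-greedy: matches each [tag] separately
non_greedy = re.sub(r'\[.*?\]', '', line)

print(repr(greedy))
print(repr(non_greedy))
```

With the greedy pattern, the dialogue sitting between the two tags disappears along with them.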

When I analyze word usage frequency, I want to exclude common stopwords without any particular significance, like "the" and "and". Just gonna leave this here for now.

In [155]:
def remove_stop_words(text):
    """
    Return text with stopwords removed.
    Note that this removes punctuation like (!?.,) as well.
    Params:
        text: string
    """
    # NLTK's stopword list is lowercase, so compare lowercased words;
    # a set keeps each membership check cheap
    stop_words = set(stopwords.words('english'))
    blob = TextBlob(text)
    words = [word for word in blob.words if word.lower() not in stop_words]
    return ' '.join(words)

In [156]:
text_no_stopwords = remove_stop_words(cleaned_text)
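
Since NLTK's English stopword list is all lowercase, a capitalized "The" at the start of a sentence only gets filtered if the comparison is done on lowercased words. Here's a minimal sketch of the difference. The tiny `stop_words` set and the plain `split()` are stand-ins I'm assuming for illustration, not the real NLTK list or TextBlob's tokenizer.

```python
# Hypothetical stand-in for stopwords.words('english'); the real list is much longer
stop_words = {"the", "and", "is", "it", "that"}

def drop_stop_words(text, stop_words):
    """Return text without stopwords, comparing case-insensitively."""
    kept = [w for w in text.split() if w.lower() not in stop_words]
    return ' '.join(kept)

sample = "The first Showcase Showdown and it is on"

# Case-sensitive check: the capitalized "The" slips through
case_sensitive = ' '.join(w for w in sample.split() if w not in stop_words)

# Case-insensitive check: "The" is dropped too
case_insensitive = drop_stop_words(sample, stop_words)

print(case_sensitive)
print(case_insensitive)
```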

A look at what we've got now:

In [115]:
cleaned_text[:1000]

" The first Showcase Showdown.  On its way ladies and gentlemen.  It is round and around and around that it goes.    Where Denise at?  She upstairs with her little boyfriend.   Oh wait. Denise got a little boyfriend?   Mmhmm.      Y'all better stop running through this house.   Yes Grandma.  Denise what were y'all doing up there?  Watching Fresh Prince.  You weren't eating candy were you?  No.  Dev do y'all even celebrate Thanksgiving in your house?  Is that a thing y'all do in the Indian community?  We have lunch together.  Then my dad watches The Godfather and falls asleep.    Well you are welcome to come have  Thanksgiving with us anytime you want.  What's the Indian community?    Dev is Indian.  Wait. I thought Dev was black.   I'm brown.  Black people are brown too.   Oh Lord.  Okay.  Look both of you are minorities.  What's a minority?  It's a group of people who have to work  twice as hard in life to get half as far  and Denise you a black woman  so you gonna have to work three 

In [182]:
blob = TextBlob(cleaned_text)

### Overall Sentiment of Entire Episode

In [184]:
blob.sentiment

Sentiment(polarity=0.12602021893037516, subjectivity=0.5488213854382333)

In [247]:
def get_sentence_polarities(blob):
    """
    Return dictionary with {sentence (string): polarity} for sentences in blob.
    Repeated sentences share one key, so each distinct sentence appears once.
    Params:
        blob: TextBlob
    """
    sentence_polarities = {}

    for sent in blob.sentences:
        polarity = sent.sentiment.polarity
        sentence_polarities[str(sent)] = polarity
        
    return sentence_polarities

In [263]:
sentence_polarities = get_sentence_polarities(blob)

Cool, now I have a dictionary mapping sentences to their respective polarities. I'm gonna load all of that into a dataframe now and see what we can find.

In [270]:
sentence_df = pd.DataFrame(list(sentence_polarities.items()), columns=['sentence', 'polarity'])
sorted_sentences = sentence_df.sort_values(by=['polarity'], ascending=False)
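
One caveat with keying on the sentence text: a line that shows up more than once in the episode ends up as a single dictionary entry. If every occurrence should count, a list of (sentence, polarity) pairs keeps them all. This is just a sketch; `toy_score` is a made-up stand-in for `sent.sentiment.polarity`, and the sentences are hand-picked for the example.

```python
def sentence_polarity_rows(sentences, score):
    """Return a list of (sentence, polarity) pairs, one per occurrence."""
    return [(s, score(s)) for s in sentences]

# Hypothetical stand-in for sent.sentiment.polarity
def toy_score(sentence):
    return 0.7 if "good" in sentence.lower() else 0.0

# Hand-picked sample with one repeated sentence
sentences = ["They good?", "No.", "That's good to know.", "No."]

as_dict = {s: toy_score(s) for s in sentences}          # duplicates collapse
as_rows = sentence_polarity_rows(sentences, toy_score)  # duplicates kept

print(len(as_dict), len(as_rows))
```

The list of pairs can go straight into `pd.DataFrame(as_rows, columns=['sentence', 'polarity'])`, so the rest of the analysis wouldn't need to change.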

### Top 10 Positive Sentences

In [275]:
print('Sentence|Polarity')
for index, row in sorted_sentences[:10].iterrows():
    print('{}|{}'.format(row['sentence'], row['polarity']))

Sentence|Polarity
I'm happy for you.|0.8
They're great.|0.8
Welcome darling.|0.8
Well you are welcome to come have  Thanksgiving with us anytime you want.|0.8
I said your yams turned out really nice this year!|0.75
Grandma Ernestine your yams turned out really nice this year!|0.75
That's good to know.|0.7
Yeah man it's really good.|0.7
It's so great to  Nice to meet you.|0.7
They good?|0.7


### Top 10 Negative Sentences

In [285]:
print('Sentence|Polarity')
for index, row in sorted_sentences[::-1][:10].iterrows():
    print('{}|{}'.format(row['sentence'], row['polarity']))

Sentence|Polarity
Horrible.|-1.0
That's horrible.|-1.0
Damn why you got to hate on the Pan?|-0.8
That character was an idiot.|-0.8
Stupid.|-0.7999999999999999
seventeenfoot aluminum boat that broke apart    Man I told you this is stupid.|-0.7999999999999999
Don't you ask another fucking question!|-0.75
Well you know my hearing is bad.|-0.6999999999999998
Why would they be mad?|-0.625
I feel like all three of them are gonna be mad at you about that.|-0.625
