<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# Sentiment Analysis of Movie Reviews with Spacy and VADER

_Authors: Kiefer Katovich (SF)_

---

### Learning Objectives
- Understand the goal of basic sentiment analysis.
- Calculate sentiment scores manually using a reviews dataset and word-level sentiment scores.
- Practice using the spaCy parser to extract part-of-speech tags from text.
- Fit a model using sentiment and grammar features.
- Use the VADER sentiment analyzer to obtain more accurate sentiment scores and compare the models.

### Lesson Guide
- [Introduction to sentiment analysis](#intro)
- [Load the word sentiment dataset](#load-sen)
    - [Engineer objectivity and positive difference scores](#adj-scores)
    - [Put scores in a part of speech dictionary](#pos-dict)
- [Load the rotten tomatoes review dataset](#rt-reviews)
    - [Restrict reviews to valid lengths and ratings](#subset)
- [Import spacy](#spacy)
    - [Parse all the quotes using spacy's multithreaded parser](#multi)
- [Part of speech features](#pos-features)
- [Assign sentiment scores](#assign)
- [Print out the most positive and most negative reviews](#print-most)
- [Print out the most objective and most subjective reviews](#print-most-obj)
- [Build a model to classify fresh vs. rotten with the sentiment and grammar features](#model)
- [Use the VADER library to get better sentiment scores](#vader)
    - [Build a model using the VADER sentiment features](#vader-model)
    - [Print out the top reviews by VADER score](#vader-top)

<a id='intro'></a>

## Introduction to sentiment analysis
---

Sentiment analysis is one of the most popular topics in NLP. Most commonly, it means quantifying text with valence (how positive or negative it is) and subjectivity scores.

First we will load a dataset of pre-coded positivity and negativity scores for words. Each word is also tagged with its part of speech. We can use these valence scores to evaluate the sentiment of Rotten Tomatoes movie reviews. Many packages, such as TextBlob, come with built-in word sentiment scores that they apply after parsing text, but doing the sentiment scoring manually will show you how it can be done without any "magic".

We will also explore a more advanced sentiment analysis library in Python: [VADER](https://github.com/cjhutto/vaderSentiment). We can score the movie reviews with this package and compare the results to our more basic method.



<a id='load-sen'></a>

## Load the word sentiment dataset
---

Below we will load in some pre-tagged positive and negative valence scores for a dictionary of words. Each row of the dataset contains the part of speech, the word, the positive score, and the negative score for the word. A word may appear more than once if it can appear with different part of speech tags. 

These scores are designed so that we can also derive the *objectivity score* of the word from the positive and negative scores.

Objectivity is calculated: 

    1. - (positive_score + negative_score)

Thus if a word has zero positive and zero negative score, it is completely objective. If a word has, for example, 0.5 positive and 0.5 negative, it is no more positive than negative, but we can tell that it is completely subjective (objectivity = 0.).
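A quick check of the arithmetic: the sketch below applies the objectivity formula to the two cases just described plus one mixed case. The function name `objectivity` and the score pairs are illustrative, not taken from the dataset.

```python
def objectivity(pos_score, neg_score):
    """Share of a word's score that is neither positive nor negative."""
    return 1. - (pos_score + neg_score)

# hypothetical (positive, negative) score pairs
print(objectivity(0.0, 0.0))      # 1.0: fully objective
print(objectivity(0.5, 0.5))      # 0.0: balanced, but fully subjective
print(objectivity(0.625, 0.125))  # 0.25: mostly positive, largely subjective
```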


In [1]:
import pandas as pd
import numpy as np

In [2]:
sen = pd.read_csv('../datasets/sentiment_words_simple.csv')

In [3]:
sen.head()

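|   | pos | word | pos_score | neg_score |
|---|-----|------|-----------|-----------|
| 0 | adj | .22-caliber | 0.0 | 0.0 |
| 1 | adj | .22-calibre | 0.0 | 0.0 |
| 2 | adj | .22_caliber | 0.0 | 0.0 |
| 3 | adj | .22_calibre | 0.0 | 0.0 |
| 4 | adj | .38-caliber | 0.0 | 0.0 |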


**Make the part of speech tags uppercase so they match the coarse tags spaCy produces later (`ADJ`, `NOUN`, `VERB`, `ADV`).**

In [4]:
sen.pos = sen.pos.map(lambda x: x.upper())

<a id='adj-scores'></a>

### Engineer objectivity and positive difference scores

Since objectivity vs. subjectivity is embedded in the positive and negative scores, we should extract it as its own score and convert the positive and negative scores into a single relative difference score.

**Calculate two new scores:**

    objectivity = 1. - (pos_score + neg_score)
    pos_vs_neg = pos_score - neg_score
    

In [5]:
sen['objectivity'] = 1. - (sen.pos_score + sen.neg_score)
sen['pos_vs_neg'] = sen.pos_score - sen.neg_score

In [6]:
sen.head()

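|   | pos | word | pos_score | neg_score | objectivity | pos_vs_neg |
|---|-----|------|-----------|-----------|-------------|------------|
| 0 | ADJ | .22-caliber | 0.0 | 0.0 | 1.0 | 0.0 |
| 1 | ADJ | .22-calibre | 0.0 | 0.0 | 1.0 | 0.0 |
| 2 | ADJ | .22_caliber | 0.0 | 0.0 | 1.0 | 0.0 |
| 3 | ADJ | .22_calibre | 0.0 | 0.0 | 1.0 | 0.0 |
| 4 | ADJ | .38-caliber | 0.0 | 0.0 | 1.0 | 0.0 |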


<a id='pos-dict'></a>

### Put scores in a part of speech dictionary

The dictionary format of the data will be much easier to index using our parsing functions later on. Create a dictionary where the keys are the four part of speech tags:

    ADJ
    NOUN
    VERB
    ADV

For each key, store a dictionary that contains all of the words for that part of speech with their objectivity and positive vs. negative scores.

In [9]:
sen_dict = {'ADJ': {}, 'NOUN': {}, 'VERB': {}, 'ADV': {}}

# nested lookup table: sen_dict[part_of_speech][word] -> {'objectivity': ..., 'pos_vs_neg': ...}
for row in sen.itertuples(index=False):
    sen_dict[row.pos][row.word] = {'objectivity': row.objectivity,
                                   'pos_vs_neg': row.pos_vs_neg}
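To see why the nested layout is convenient, here is a toy version built from a few hypothetical rows (the words and scores are made up for illustration, not read from `sentiment_words_simple.csv`). A lookup is just two key accesses, and `dict.get` returns `None` for words the dataset does not cover instead of raising a `KeyError`.

```python
# hypothetical (pos, word, objectivity, pos_vs_neg) rows, tags already upper-cased
rows = [
    ('ADJ', 'charming', 0.25, 0.75),
    ('ADJ', 'dull', 0.375, -0.625),
    ('NOUN', 'charm', 0.5, 0.5),
    ('VERB', 'charm', 0.75, 0.25),
]

toy_dict = {'ADJ': {}, 'NOUN': {}, 'VERB': {}, 'ADV': {}}
for pos, word, obj, pvn in rows:
    toy_dict[pos][word] = {'objectivity': obj, 'pos_vs_neg': pvn}

# the same word can carry different scores under different tags
print(toy_dict['NOUN']['charm'])
print(toy_dict['VERB']['charm'])

# a word missing from the dataset comes back as None rather than an error
print(toy_dict['ADJ'].get('ingenious'))
```

Keying on the tag first is what lets the scoring step later distinguish, say, "charm" the noun from "charm" the verb.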


<a id='rt-reviews'></a>

## Load the rotten tomatoes reviews dataset

---

This dataset has:
    
    critic: critic's name
    fresh: fresh vs. rotten rating
    imdb: code for imdb
    publication: where the review was published
    quote: the review snippet
    review_date: date of review
    rtid: rottentomatoes id
    title: name of movie

In [10]:
rt = pd.read_csv('../datasets/rt_critics.csv')

In [11]:
rt.head()

|   | critic | fresh | imdb | publication | quote | review_date | rtid | title |
|---|--------|-------|------|-------------|-------|-------------|------|-------|
| 0 | Derek Adams | fresh | 114709.0 | Time Out | So ingenious in concept, design and execution ... | 2009-10-04 | 9559.0 | Toy story |
| 1 | Richard Corliss | fresh | 114709.0 | TIME Magazine | The year's most inventive comedy. | 2008-08-31 | 9559.0 | Toy story |
| 2 | David Ansen | fresh | 114709.0 | Newsweek | A winning animated feature that has something ... | 2008-08-18 | 9559.0 | Toy story |
| 3 | Leonard Klady | fresh | 114709.0 | Variety | The film sports a provocative and appealing st... | 2008-06-09 | 9559.0 | Toy story |
| 4 | Jonathan Rosenbaum | fresh | 114709.0 | Chicago Reader | An entertaining computer-generated, hyperreali... | 2008-03-10 | 9559.0 | Toy story |


<a id='subset'></a>

### Restrict data to reviews with valid ratings and reviews over 10 words long

Also clean up the reviews by creating a column with the text lowercased and punctuation removed.

In [12]:
rt = rt[rt.fresh.isin(['fresh','rotten'])].copy()
# encode the target: 1 = fresh, 0 = rotten
rt['fresh'] = rt.fresh.map(lambda x: 1 if x == 'fresh' else 0)

In [13]:
rt['quote_len'] = rt.quote.map(lambda x: len(x.split()))
# .copy() avoids SettingWithCopyWarning when new columns are added below
rt = rt[rt.quote_len > 10].copy()
rt.shape

(11215, 9)

In [18]:
import string
rt['qt'] = rt.quote.map(lambda x: ''.join([y for y in list(x.lower()) if y in string.ascii_lowercase+" -'"]))
rt['qt'] = rt['qt'].map(lambda x: x.replace('-',' '))
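The two `.map()` calls above can be written as one plain Python function, which makes it easy to check what the cleaning does to a sample sentence. The name `clean_quote` and the sample strings are illustrative. Note that accented letters are not in `string.ascii_lowercase`, so they are dropped rather than kept.

```python
import string

# letters, spaces, hyphens and apostrophes survive; everything else is removed
ALLOWED = set(string.ascii_lowercase + " -'")

def clean_quote(text):
    """Lower-case, keep only ALLOWED characters, then turn hyphens into spaces."""
    kept = ''.join(ch for ch in text.lower() if ch in ALLOWED)
    return kept.replace('-', ' ')

print(clean_quote("You could watch it on a postage-stamp-sized screen."))
# you could watch it on a postage stamp sized screen
print(clean_quote("There isn't a moment of wonder."))
print(clean_quote("Pokémon: The First Movie"))  # the accented é is dropped
```

Splitting hyphenated words this way is why spaCy later sees "postage stamp sized" as three separate tokens.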

In [19]:
rt.sample(5)

|   | critic | fresh | imdb | publication | quote | review_date | rtid | title | quote_len | qt |
|---|--------|-------|------|-------------|-------|-------------|------|-------|-----------|----|
| 2458 | James Berardinelli | 1 | 107614.0 | ReelViews | In terms of plot, the film is rather feeble, b... | 2000-01-01 | 10997.0 | Mrs. Doubtfire | 37 | in terms of plot the film is rather feeble but... |
| 12576 | Robert Horton | 0 | 190641.0 | Film.com | There isn't a moment of wonder or poetry in it... | 2000-01-01 | 9629.0 | Pokémon: The First Movie | 14 | there isn't a moment of wonder or poetry in it... |
| 4900 | Desson Thomson | 1 | 91288.0 | Washington Post | You may also become permanently sick of goats.... | 2000-01-01 | 10960.0 | Jean de Florette | 32 | you may also become permanently sick of goats ... |
| 1154 | James Berardinelli | 1 | 114558.0 | ReelViews | It's big, explosive entertainment and, althoug... | 2000-01-01 | 13697.0 | Strange Days | 22 | it's big explosive entertainment and although ... |
| 11301 |  | 0 | 72684.0 | Time Out | The constant array of waxworks figures against... | 2006-01-26 | 11717.0 | Barry Lyndon | 14 | the constant array of waxworks figures against... |


<a id='spacy'></a>

## Import spacy

---

The spaCy package is a fast, widely used library for parsing the grammatical structure of text. We are going to use it to find the part-of-speech tags for the review words.

Once we have parsed the tags with spacy, we can assign objectivity and valence scores by finding the match in our sentiment dataset.

In [20]:
import spacy
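# install the model first: python -m spacy download en_core_web_sm
# (the old 'en' shortcut no longer works in spaCy v3)
en_nlp = spacy.load('en_core_web_sm')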

**Parse a single quote:**

In [21]:
tmp = en_nlp(rt.qt.values[0])

In [22]:
tmp

so ingenious in concept design and execution that you could watch it on a postage stamp sized screen and still be engulfed by its charm

In [23]:
# a parsed Doc can be indexed like a list to get individual tokens:
tmp[3]

concept

**Print out the part of speech tags for each word in the quote:**

In [26]:
for token in tmp:
    print(token,token.pos_)

so ADV
ingenious ADJ
in ADP
concept NOUN
design NOUN
and CCONJ
execution NOUN
that ADP
you PRON
could VERB
watch VERB
it PRON
on ADP
a DET
postage NOUN
stamp NOUN
sized ADJ
screen NOUN
and CCONJ
still ADV
be VERB
engulfed VERB
by ADP
its ADJ
charm NOUN


<a id='multi'></a>
### Parse all the quotes using spacy's multithreaded parser

Parsing a lot of text can take quite a while. spaCy's `pipe` method speeds this up considerably by streaming the texts through the parser in batches instead of making one call per quote. The code below parses all the quotes and collects the parsed documents in a list.

In [29]:
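parsed_quotes = []
# nlp.pipe processes the texts in batches of 50.
# n_threads=4 was only honoured by spaCy 2.0; later releases ignore or reject it
# (spaCy 2.2.2+ offers n_process for multiprocessing instead).
for i, parsed in enumerate(en_nlp.pipe(rt.qt.values, batch_size=50)):
    assert parsed.is_parsed
    if (i % 1000) == 0:
        print(i)
    parsed_quotes.append(parsed)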

0
1000
2000
3000
4000
5000
6000
7000
8000
9000
10000
11000


<a id='pos-features'></a>

## Create features with part of speech proportions

---

With our spacy parsed reviews, we have a lot of feature engineering potential even before we get to sentiment. Something simple we could do is calculate the proportion of words in the quote that have each part of speech tag. We can try using these as predictors in a model later.
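Here is a plain Python sketch of the per-quote proportion calculation, using the tags spaCy printed for the first quote above. `collections.Counter` tallies every tag in a single pass instead of rescanning the quote once per tag. The function name `pos_proportions` is just for illustration.

```python
from collections import Counter

# part-of-speech tags spaCy produced for the first quote (copied from the output above)
tags = ['ADV', 'ADJ', 'ADP', 'NOUN', 'NOUN', 'CCONJ', 'NOUN', 'ADP', 'PRON',
        'VERB', 'VERB', 'PRON', 'ADP', 'DET', 'NOUN', 'NOUN', 'ADJ', 'NOUN',
        'CCONJ', 'ADV', 'VERB', 'VERB', 'ADP', 'ADJ', 'NOUN']

def pos_proportions(tags, all_tags):
    """Share of tokens carrying each tag; tags that never occur get 0.0."""
    counts = Counter(tags)  # a Counter returns 0 for missing keys
    n = len(tags)
    return {tag: counts[tag] / n for tag in all_tags}

props = pos_proportions(tags, ['ADJ', 'ADP', 'ADV', 'NOUN', 'VERB', 'INTJ'])
print(props)  # NOUN: 7 of the 25 tokens
```

Passing the full list of unique tags as `all_tags` guarantees every quote gets a value for every proportion column, even for tags it never uses.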

**Find all the unique part of speech categories in the reviews.**

In [31]:
unique_pos = []
for parsed in parsed_quotes:
    unique_pos.extend([t.pos_ for t in parsed])
unique_pos = np.unique(unique_pos)
print(unique_pos)

['ADJ' 'ADP' 'ADV' 'CCONJ' 'DET' 'INTJ' 'NOUN' 'NUM' 'PART' 'PRON' 'PROPN'
 'PUNCT' 'SPACE' 'SYM' 'VERB' 'X']


**Create the proportion columns for each part of speech.**

In [32]:
for pos in unique_pos:
    rt[pos+'_prop'] = 0.
       

**Iterate through the reviews and calculate the proportions of each part of speech tag.**

In [36]:
rt = rt.reset_index(drop=True)
for i, parsed in enumerate(parsed_quotes):
    if (i % 100) == 0:
        print(i, end=' ')
    parsed_len = len(parsed)
    for pos in unique_pos:
        count = len([x for x in parsed if x.pos_ == pos])
        rt.loc[i, pos+'_prop'] = float(count)/parsed_len

0 100 200 300 400 500 600 700 800 900 1000 1100 1200 1300 1400 1500 1600 1700 1800 1900 2000 2100 2200 2300 2400 2500 2600 2700 2800 2900 3000 3100 3200 3300 3400 3500 3600 3700 3800 3900 4000 4100 4200 4300 4400 4500 4600 4700 4800 4900 5000 5100 5200 5300 5400 5500 5600 5700 5800 5900 6000 6100 6200 6300 6400 6500 6600 6700 6800 6900 7000 7100 7200 7300 7400 7500 7600 7700 7800 7900 8000 8100 8200 8300 8400 8500 8600 8700 8800 8900 9000 9100 9200 9300 9400 9500 9600 9700 9800 9900 10000 10100 10200 10300 10400 10500 10600 10700 10800 10900 11000 11100 11200 

<a id='assign'></a>

## Assign sentiment scores
---

We will now use the parsed reviews and the sentiment dataset to assign the average objectivity and positive vs. negative scores.

If a word cannot be found in the dataset, we ignore it. If a review has no words that match something in our dataset, we can assign overall neutral scores of `objectivity = 1` and `pos_vs_neg = 0`.

There are definitely problems with this approach, but for now we can keep it "dumb" and see if things improve when we use the VADER analyzer later.
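Stripped of spaCy, the scoring rule is: look each word up under its tag, average whatever matched, and fall back to neutral values when nothing matched. The toy sketch below uses hypothetical `(word, tag)` pairs and a made-up mini dictionary `toy_sen`; the function name `average_scores` is illustrative.

```python
from statistics import mean

# hypothetical mini sentiment dictionary: {tag: {word: scores}}
toy_sen = {
    'ADJ': {'ingenious': {'objectivity': 0.25, 'pos_vs_neg': 0.75}},
    'NOUN': {'charm': {'objectivity': 0.5, 'pos_vs_neg': 0.5}},
    'VERB': {},
    'ADV': {},
}

def average_scores(tagged_words, lexicon):
    """Average objectivity / pos_vs_neg over words found in the lexicon.
    Falls back to neutral (1.0, 0.0) when no word matches."""
    obj, pvn = [], []
    for word, tag in tagged_words:
        entry = lexicon.get(tag, {}).get(word)
        if entry is None:
            continue  # unknown tag or unknown word: ignore it
        obj.append(entry['objectivity'])
        pvn.append(entry['pos_vs_neg'])
    if not obj:
        return 1.0, 0.0
    return mean(obj), mean(pvn)

quote = [('so', 'ADV'), ('ingenious', 'ADJ'), ('in', 'ADP'), ('charm', 'NOUN')]
print(average_scores(quote, toy_sen))                             # (0.375, 0.625)
print(average_scores([('the', 'DET'), ('film', 'NOUN')], toy_sen))  # (1.0, 0.0)
```

One of the "problems" is visible here: a review whose words simply are not in the dictionary looks exactly as neutral as a genuinely neutral one.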

In [37]:
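def scorer(parsed):
    """Collect objectivity and pos_vs_neg scores for the content words of a parsed quote."""
    obj_scores, pvn_scores = [], []
    for token in parsed:
        # only the four tags covered by the sentiment dataset can be looked up
        if token.pos_ not in sen_dict:
            continue
        word_scores = sen_dict[token.pos_].get(token.text)
        if word_scores is None:
            continue  # word not in the sentiment dataset: ignore it
        obj_scores.append(word_scores['objectivity'])
        pvn_scores.append(word_scores['pos_vs_neg'])
    # fall back to neutral scores when nothing matched
    if len(obj_scores) == 0:
        obj_scores = [1.]
    if len(pvn_scores) == 0:
        pvn_scores = [0.]
    return [obj_scores, pvn_scores]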


scores = []
for i, parsed in enumerate(parsed_quotes):
    if (i % 1000) == 0:
        print(i)
    scores.append(scorer(parsed))
    
rt['objectivity_avg'] = [np.mean(x[0]) for x in scores]
rt['pos_vs_neg_avg'] = [np.mean(x[1]) for x in scores]

0
1000
2000
3000
4000
5000
6000
7000
8000
9000
10000
11000


<a id='print-most'></a>
## Print out the most positive and most negative reviews
---

Now that we have the average valence for reviews, try printing out the top 10 most positive and top 10 most negative reviews to visually verify that our approach makes sense.

In [40]:
rt.sort_values('pos_vs_neg_avg', ascending=False, inplace=True)
for quote in rt.quote[0:10]:
    print(quote)
    print('-'*80)

If you love vintage Woody Allen, you'll like the nouveau Rob Reiner.
--------------------------------------------------------------------------------
As bustling and impassioned as the best Sturges and Capra movies.
--------------------------------------------------------------------------------
Hanks is superb, reemploying the childlike presence he brought to Big.
--------------------------------------------------------------------------------
Streep (the best thing she has done in ages) carries it along.
--------------------------------------------------------------------------------
High Noon combines its points about good citizenship with some excellent picturemaking.
--------------------------------------------------------------------------------
Paths of Glory is all about that greatest of all movie subjects: power.
--------------------------------------------------------------------------------
Improbabilities and all, Simpatico still boasts wonderful scenes and a cast that is t

In [41]:
rt.sort_values('pos_vs_neg_avg', ascending=True, inplace=True)
for quote in rt.quote[0:10]:
    print(quote)
    print('-'*80)

Unoriginal and insulting, 3 Strikes goes down without scoring a single chuckle.
--------------------------------------------------------------------------------
Brooding, somber film is ragged around the edges and not without problematic aspects.
--------------------------------------------------------------------------------
Its tone is never exactly comedic and its horrific touches are more disgusting than scary.
--------------------------------------------------------------------------------
It's a disturbing, hopeless, irredeemable series of images that will scar you if you wander into it unprepared.
--------------------------------------------------------------------------------
...Liar Liar stands to make a liar out of those who predicted that Carrey's career was on the skids.
--------------------------------------------------------------------------------
A silly movie, with silly jokes and a silly story. But the talents at work in it are not silly.
-----------------------------

<a id='print-most-obj'></a>

## Print out the most objective and most subjective reviews
---

Do the same as above, but now sort by the objectivity. What kind of differences do you notice between these? Does our approach actually appear to capture meaningful subjectivity and objectivity in the reviews?

In [42]:
rt.sort_values('objectivity_avg', ascending=False, inplace=True)
for quote in rt.quote[0:10]:
    print(quote)
    print('-'*80)

One of the finest collaborations between husband and wife ever committed to film.
--------------------------------------------------------------------------------
Brian De Palma's take on Tom Wolfe's The Bonfire of the Vanities is a misfire of inanities.
--------------------------------------------------------------------------------
Barbara Stanwyck is the sexiest con woman ever captured on film.
--------------------------------------------------------------------------------
The Ref benefits from having actor's actors like Davis and Spacey in the leads.
--------------------------------------------------------------------------------
A crackling thriller that feels unusually attuned to its lowlife characters.
--------------------------------------------------------------------------------
Vicky Cristina Barcelona is the cinematic equivalent of a book on tape: a movie that watches itself for you and tells you what it sees.
---------------------------------------------------------------

In [43]:
rt.sort_values('objectivity_avg', ascending=True, inplace=True)
for quote in rt.quote[0:10]:
    print(quote)
    print('-'*80)

They keep getting worse and worse and worse . . .
--------------------------------------------------------------------------------
Jumps adroitly between the macho and anti-macho, the romantic and anti-romantic.
--------------------------------------------------------------------------------
It's grave, lumbering, arrhythmic, and bloated, an emotional hogwallow of catchpenny insights and easy sentimentality.
--------------------------------------------------------------------------------
If you love vintage Woody Allen, you'll like the nouveau Rob Reiner.
--------------------------------------------------------------------------------
I am not sure why this isn't very funny, but it's not.
--------------------------------------------------------------------------------
Hawthorne is by turn outrageous and pathetic and imperious and poignant and very funny.
--------------------------------------------------------------------------------
Delivers guilt-free escapism about pretty people hav

In [27]:
# There doesn't seem to be much signal in the objectivity score. They look pretty
# similar to me!

<a id='model'></a>

## Build a model to classify fresh vs. rotten with the sentiment and grammar features

---

Let's use the features we've created to fit a logistic regression that predicts whether a review is fresh or rotten. 

Don't forget to check the baseline accuracy, and it's good practice to standardize your predictors.
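Both ideas fit in a few lines of plain Python. The baseline is the accuracy of always predicting the majority class, and standardizing converts each predictor to z-scores. The sketch uses the population standard deviation (ddof=0), which is the convention scikit-learn's `StandardScaler` follows; the labels and lengths are hypothetical.

```python
from statistics import mean, pstdev

def baseline_accuracy(labels):
    """Accuracy of always predicting the majority class (labels are 0/1)."""
    p = mean(labels)
    return max(p, 1 - p)

def standardize(values):
    """z-scores using the population standard deviation (ddof=0)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

labels = [1, 1, 1, 0, 1, 0, 1, 1]   # hypothetical fresh (1) / rotten (0) labels
print(baseline_accuracy(labels))     # 0.75: always guess "fresh"
print(standardize([12, 15, 18]))     # hypothetical quote lengths
```

Since most reviews in this dataset are fresh, the baseline is simply the share of fresh reviews, which is why `rt.fresh.mean()` is printed next to the cross-validated score.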


In [44]:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import StandardScaler

In [45]:
X = rt[['objectivity_avg','pos_vs_neg_avg','quote_len']+[x for x in rt.columns if x.endswith('_prop')]]
y = rt.fresh.values

ss = StandardScaler()
Xs = ss.fit_transform(X)

lr_scores = cross_val_score(LogisticRegression(), Xs, y, cv=10)
print(np.mean(lr_scores), rt.fresh.mean())

0.6367301195070518 0.6150691038787338


In [46]:
# We do a bit better than the baseline using the default logistic regression.

In [47]:
X.columns

Index(['objectivity_avg', 'pos_vs_neg_avg', 'quote_len', 'ADJ_prop',
       'ADP_prop', 'ADV_prop', 'CCONJ_prop', 'DET_prop', 'INTJ_prop',
       'NOUN_prop', 'NUM_prop', 'PART_prop', 'PRON_prop', 'PROPN_prop',
       'PUNCT_prop', 'SPACE_prop', 'SYM_prop', 'VERB_prop', 'X_prop'],
      dtype='object')

In [48]:
lr = LogisticRegression().fit(Xs, y)
for var, coef in zip(X.columns, lr.coef_[0]):
    print(var, coef)

objectivity_avg -0.131444297627609
pos_vs_neg_avg 0.4464137790705639
quote_len 0.09727097153371356
ADJ_prop 0.06213084448904941
ADP_prop 0.003694343967884319
ADV_prop -0.09793705720106265
CCONJ_prop 0.06760724087733976
DET_prop -0.038602326689058074
INTJ_prop -0.018496623504046832
NOUN_prop 0.11570754102438945
NUM_prop 0.025407887114821125
PART_prop -0.06921766261357128
PRON_prop 0.09500622741485829
PROPN_prop 0.02207238224866559
PUNCT_prop -0.013472040372492371
SPACE_prop -0.0011233154043179935
SYM_prop -0.0015272322848200398
VERB_prop -0.1570218432085458
X_prop 0.017936629166324103


In [49]:
# The coefficients make sense for the sentiment features.
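Because the predictors were standardized, each coefficient is the change in the log-odds of a fresh rating for a one-standard-deviation increase in that feature, holding the others fixed. Exponentiating turns it into an odds multiplier. The sketch below only reuses the two sentiment coefficients printed above.

```python
import math

# coefficients on standardized features, copied from the output above
coefs = {'objectivity_avg': -0.131444297627609,
         'pos_vs_neg_avg': 0.4464137790705639}

for name, c in coefs.items():
    # exp(coef): multiplicative change in the odds of "fresh" per +1 SD of the feature
    print(name, round(math.exp(c), 3))
```

So a one-SD rise in `pos_vs_neg_avg` multiplies the odds of a fresh rating by roughly 1.56, while a one-SD rise in `objectivity_avg` shrinks them to roughly 0.88 of their previous value.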

In [50]:
from sklearn.linear_model import LogisticRegressionCV

# Try a lasso penalty
lrcv = LogisticRegressionCV(penalty='l1', solver='liblinear', Cs=25, cv=10)
lrcv.fit(Xs, y)

LogisticRegressionCV(Cs=25, class_weight=None, cv=10, dual=False,
           fit_intercept=True, intercept_scaling=1.0, max_iter=100,
           multi_class='ovr', n_jobs=1, penalty='l1', random_state=None,
           refit=True, scoring=None, solver='liblinear', tol=0.0001,
           verbose=0)
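When `Cs` is an integer, scikit-learn's `LogisticRegressionCV` searches that many values of the inverse regularization strength `C` on a logarithmic scale between 1e-4 and 1e4; smaller `C` means a stronger L1 penalty and more coefficients pushed to exactly zero. The grid for `Cs=25` can be reproduced with the standard library (it matches `np.logspace(-4, 4, 25)`), and after fitting, `lrcv.C_` holds the value that was selected.

```python
# 25 values evenly spaced in log10 between -4 and 4
n = 25
Cs = [10 ** (-4 + 8 * i / (n - 1)) for i in range(n)]

print(Cs[0], Cs[12], Cs[-1])  # 0.0001 1.0 10000.0
print(Cs[1] / Cs[0])          # each step multiplies C by 10**(1/3)
```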

In [51]:
# Looks like it still keeps most of the features: only 3 of the 19 coefficients are zeroed out.
lrcv.coef_

array([[-0.12699201,  0.44109708,  0.09206572,  0.0489623 ,  0.        ,
        -0.10325072,  0.06131766, -0.04171676, -0.01529365,  0.097199  ,
         0.0183418 , -0.06937002,  0.08047236,  0.01666603, -0.01109301,
         0.        ,  0.        , -0.16260935,  0.01268696]])

<a id='vader'></a>

## Use the VADER library to get better sentiment scores
---

The [VADER](https://github.com/cjhutto/vaderSentiment) package for Python is a more advanced way to calculate positivity, negativity, and objectivity in our reviews. The GitHub page describes VADER as:

> VADER Sentiment Analysis. VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media, and works well on texts from other domains.

You will likely need to install VADER with pip (`pip install vaderSentiment`) or conda; instructions can be found on the GitHub page. Once it is installed, you can load the `SentimentIntensityAnalyzer` and parse text.

**Parse a couple of quotes with the `SentimentIntensityAnalyzer` and print out the dictionary of scores using `analyzer.polarity_scores`:**

In [52]:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

In [53]:
analyzer = SentimentIntensityAnalyzer()
for sentence in rt.quote.values[0:2]:
    vs = analyzer.polarity_scores(sentence)
    print(sentence)
    print(vs)

They keep getting worse and worse and worse . . .
{'neg': 0.65, 'neu': 0.35, 'pos': 0.0, 'compound': -0.8519}
Jumps adroitly between the macho and anti-macho, the romantic and anti-romantic.
{'neg': 0.0, 'neu': 0.787, 'pos': 0.213, 'compound': 0.4019}


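These scores look much more sensible. VADER polarity score dictionaries have four elements: `neg`, `neu`, `pos` and `compound`. The first three are the proportions of the text that are negative, neutral and positive (they sum to 1), while `compound` is a single normalized score between -1 (most negative) and +1 (most positive) that represents the "overall" valence.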
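For intuition about `compound`: in the vaderSentiment source, the valence scores of the words (after VADER's rule-based adjustments) are summed, and the sum is squashed into [-1, 1] with the normalization below, where `alpha=15` is the library's default. This is a re-implementation for illustration, not an import from the package.

```python
import math

def normalize(score, alpha=15):
    """Map an unbounded valence sum into [-1, 1]: score / sqrt(score**2 + alpha)."""
    norm = score / math.sqrt(score * score + alpha)
    return max(-1.0, min(1.0, norm))  # clip, as a safeguard

for raw in [0.0, 1.5, 5.8, -5.8, 20.0]:
    print(raw, round(normalize(raw), 4))
```

The mapping is monotonic and symmetric around zero, so it preserves the ranking of raw valence sums; for example, a raw sum of 5.8 becomes about 0.8316, and larger sums creep toward +1 without ever passing it.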

**Calculate the four scores for each review and save them as features in the dataframe.**

In [54]:
# score every quote once; polarity_scores returns a dict with neg/neu/pos/compound
vader_scores = pd.DataFrame([analyzer.polarity_scores(q) for q in rt.quote.values],
                            index=rt.index)

# store each score as its own float column: vader_neg, vader_pos, vader_neu, vader_compound
for key in ['neg', 'pos', 'neu', 'compound']:
    rt['vader_' + key] = vader_scores[key]

<a id='vader-model'></a>

### Fit a model using the VADER sentiment features

Does this model perform better? 

In [55]:
X = rt[['vader_neg','vader_pos','vader_neu','vader_compound','quote_len']]
y = rt.fresh.values

Xs = StandardScaler().fit_transform(X)

scores = cross_val_score(LogisticRegression(), Xs, y, cv=10)
print(scores)
print(np.mean(scores))

# We do slightly better (about 0.647 vs. 0.637), even though the part-of-speech features are left out of this model.

[0.67736185 0.68092692 0.65240642 0.66755793 0.65508021 0.65151515
 0.64795009 0.62087422 0.62410714 0.59017857]
0.6467958507707682
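To put "slightly better" in perspective, compare the gain with how much accuracy moves from fold to fold. The sketch below only summarizes numbers already printed in this notebook; treating the folds as independent samples is a rough approximation, since they share training data.

```python
from statistics import mean, stdev

# the ten fold accuracies printed above for the VADER model
fold_scores = [0.67736185, 0.68092692, 0.65240642, 0.66755793, 0.65508021,
               0.65151515, 0.64795009, 0.62087422, 0.62410714, 0.59017857]
earlier_mean = 0.6367301195070518  # mean CV accuracy of the first model

avg = mean(fold_scores)
sd = stdev(fold_scores)             # sample standard deviation across folds
se = sd / len(fold_scores) ** 0.5   # rough standard error of the mean
print(f'mean={avg:.4f}  sd={sd:.4f}  se={se:.4f}')
print(f'gain over the earlier model: {avg - earlier_mean:.4f}')
```

The gain of about one percentage point is roughly one standard error, so it is suggestive rather than conclusive.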


In [56]:
lr = LogisticRegression().fit(Xs, y)

In [58]:
# reuse the model fitted above instead of refitting it
for v, c in zip(['neg','pos','neu','compound','len'], lr.coef_[0]):
    print(v, c)
    
# All the coefficients make sense.

neg -0.23561761659663846
pos 0.32067595059188947
neu -0.12730574331071565
compound 0.19851782945578514
len 0.10070925676621627


<a id='vader-top'></a>

### Print out the most negative, positive, neutral, and subjective reviews by VADER score

In [61]:
rt.sort_values('vader_neg', ascending=False, inplace=True)
for i in range(5):
    print(rt['quote'].values[i])
    print('-'*80)

I hated this movie. Hated hated hated hated hated this movie. Hated it.
--------------------------------------------------------------------------------
They keep getting worse and worse and worse . . .
--------------------------------------------------------------------------------
What's the fourth "Die Hard" called? I keep forgetting. "Die Hard: With a Pension"? "Die Hardened Arteries"? "Die Laughing"?
--------------------------------------------------------------------------------
A shambolic, deafening, intelligence-insulting mess, a crushing failure on almost all counts.
--------------------------------------------------------------------------------
So distressingly dark, grim, and cynical it's likely to make kids cry!
--------------------------------------------------------------------------------


In [62]:
rt.sort_values('vader_pos', ascending=False, inplace=True)
for i in range(5):
    print(rt['quote'].values[i])
    print('-'*80)

A truly funny, sophisticated, compassionate, mainstream Hollywood comedy about very modern homosexuality.
--------------------------------------------------------------------------------
Extremely handsome production values and a great supporting cast round out the virtues.
--------------------------------------------------------------------------------
For lovers of romantic comedies through the ages, Roman Holiday remains a favorite.
--------------------------------------------------------------------------------
The Karate Kid exhibits warmth and friendly, predictable humor, its greatest assets.
--------------------------------------------------------------------------------
Charming performances and easygoing humour are the strengths of Jones's enjoyable Oirish romp.
--------------------------------------------------------------------------------


In [63]:
rt.sort_values('vader_neu', ascending=False, inplace=True)
for i in range(5):
    print(rt['quote'].values[i])
    print('-'*80)

Raoul Coutard's Techniscope cinematography contemplates an espresso, filling the screen in monumental close-up with a rotating vortex of bubbles and foam.
--------------------------------------------------------------------------------
This film was produced by Mr. Wilder for Allied Artists -- in black-and-white. It is a hit.
--------------------------------------------------------------------------------
This film, minus the deft and artistic handling of the director, Alfred Hitchcock, despite its cast and photography, would not stand up for Grade A candidacy.
--------------------------------------------------------------------------------
The movie is all anticlimax once we realize it's going to be about gimmicks, not characters.
--------------------------------------------------------------------------------
The musical numbers, actually performed by the on-screen band, are sensational.
--------------------------------------------------------------------------------


In [64]:
rt.sort_values('vader_neu', ascending=True, inplace=True)
for i in range(5):
    print(rt['quote'].values[i])
    print('-'*80)

I hated this movie. Hated hated hated hated hated this movie. Hated it.
--------------------------------------------------------------------------------
Welcome to Natural Born Killers, Stone's empty, manic meditation on society's glorification of violence and the ugly heroes it loves to hate.
--------------------------------------------------------------------------------
Branagh's lame stab at a romantic psychological thriller makes no sense.
--------------------------------------------------------------------------------
A truly funny, sophisticated, compassionate, mainstream Hollywood comedy about very modern homosexuality.
--------------------------------------------------------------------------------
Hilarious, sexy, clever, playful and as initially teasing as it is ultimately satisfying.
--------------------------------------------------------------------------------
