### Home Work - 8
Balaji Avvaru

In [1]:
#import required libraries
import requests
from bs4 import BeautifulSoup
import nltk
from nltk.tokenize import word_tokenize 
import numpy as np
import re
import pandas as pd
import itertools
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score
from nltk.sentiment.vader import SentimentIntensityAnalyzer
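import random
# seed both generators: KMeans draws from NumPy's global RNG when random_state is not set
random.seed(10)
np.random.seed(10)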

#### Perform a vocabulary-based sentiment analysis of the movie reviews you used in homework 5 and homework 7, by doing the following: 


1.	In Python, load one of the sentiment vocabularies referenced in the textbook, and run the sentiment analyzer as explained in the corresponding reference. Add words to the sentiment vocabulary, if you think you need to, to better fit your particular text collection.

2.	For each of the clusters you created in homework 7, compute the average, median, high, and low sentiment scores for each cluster. Explain whether you think this reveals anything interesting about the clusters.


3.	For extra credit, analyze sentiment of chunks as follows:

    a.	Take the chunks from homework 5, and in Python, run each chunk individually through your sentiment analyzer that you used in question 1. If the chunk registers a nonneutral sentiment, save it in a tabular format (the chunk, the sentiment score).

    b.	Now sort the table twice, once to show the highest negative-sentiment-scoring chunks at the top and again to show the highest positive-sentiment-scoring chunks at the top. Examine the upper portions of both sorted lists, to identify any trends, and explain what you see. 

Submit all of your inputs and outputs and your code for this assignment, along with a brief written explanation of your findings. 



In [2]:
#IMDB website URL
base_url = "https://www.imdb.com"

# Search URL: 100 thriller feature films rated at least 4.0 with at least 50,000 votes, sorted by rating (descending)
# (built from adjacent string literals so that no line break ends up inside the URL)
url = ('https://www.imdb.com/search/title/?title_type=feature&user_rating=4.0,10.0'
       '&num_votes=50000,&genres=thriller&view=simple&sort=user_rating,desc&count=100')

# Convert IMDB url to a BeautifulSoup object
response = requests.get(url)
movies_soup = BeautifulSoup(response.text, 'html.parser')

# get all anchor tags that have no class attribute
movie_tags = movies_soup.find_all('a', attrs={'class': None})

# keep only the hrefs that point at a title page, e.g. '/title/tt0468569/'
# (tag.get avoids a KeyError for anchors without an href)
movie_tags = [tag.get('href', '') for tag in movie_tags]
movie_tags = [href for href in movie_tags
                  if href.startswith('/title') and href.endswith('/')]

# remove duplicate links
movie_tags = list(dict.fromkeys(movie_tags))

# Print out the number of movie links we have and show the first 5 items
print("There are a total of " + str(len(movie_tags)) + " movie user reviews")
print("Displaying first 5 user reviews links")
movie_tags[:5]

There are a total of 100 movie user reviews
Displaying first 5 user reviews links


['/title/tt0468569/',
 '/title/tt1375666/',
 '/title/tt6751668/',
 '/title/tt0114369/',
 '/title/tt0102926/']
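
The filtering and de-duplication in the cell above can be tried offline on a few hand-written hrefs. This is a minimal sketch: the `hrefs` list is made up for illustration and is not scraped from IMDb. `dict.fromkeys` keeps the first occurrence of each key in insertion order, which is why the ranking order of the links survives the de-duplication.

```python
# Hypothetical hrefs standing in for the anchor tags on the search page.
hrefs = [
    "/title/tt0468569/",
    "/title/tt0468569/",          # duplicate link to the same title
    "/title/tt0468569/reviews",   # does not end with "/", so it is dropped
    "/search/title/",             # does not start with "/title", so it is dropped
    "/title/tt1375666/",
]

# Same two conditions as the notebook cell, combined with `and`.
title_links = [h for h in hrefs if h.startswith("/title") and h.endswith("/")]

# dict.fromkeys removes duplicates while preserving the original order.
unique_links = list(dict.fromkeys(title_links))
print(unique_links)
```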

In [3]:
# build out the list of reviews
review_links = [base_url + tag + 'reviews' for tag in movie_tags]

print("There are a total of " + str(len(review_links)) + " movie user reviews")
print("Displaying first 5 user reviews full url")
review_links[:5]

There are a total of 100 movie user reviews
Displaying first 5 user reviews full url


['https://www.imdb.com/title/tt0468569/reviews',
 'https://www.imdb.com/title/tt1375666/reviews',
 'https://www.imdb.com/title/tt6751668/reviews',
 'https://www.imdb.com/title/tt0114369/reviews',
 'https://www.imdb.com/title/tt0102926/reviews']

In [4]:
# get a list of soup objects
movie_soups = []
for link in review_links:
    response = requests.get(link)
    soup = BeautifulSoup(response.text, 'html.parser')
    movie_soups.append(soup)


In [5]:
# get a list of review links: one negative and one positive review per movie
movie_review_list = []
for movie_soup in movie_soups:
    # get a list of user ratings
    user_review_ratings = [tag.previous_element for tag in 
                           movie_soup.find_all('span', attrs={'class': 'point-scale'})]
    
    # find the index of the negative and the positive review: the lowest user
    # rating is treated as negative and the highest as positive
    ratings = [int(rating) for rating in user_review_ratings]
    n_index = ratings.index(min(ratings))
    p_index = ratings.index(max(ratings))
    
    # get the review tags
    user_review_list = movie_soup.find_all('a', attrs={'class':'title'})
    
    # get the negative and positive review tags
    n_review_tag = user_review_list[n_index]
    p_review_tag = user_review_list[p_index]
    
    # return the negative and positive review link
    n_review_link = base_url + n_review_tag['href']
    p_review_link = base_url + p_review_tag['href']
    
    movie_review_list.append(n_review_link)
    movie_review_list.append(p_review_link)

movie_review_list[:5]

['https://www.imdb.com/review/rw1945777/',
 'https://www.imdb.com/review/rw1999145/',
 'https://www.imdb.com/review/rw2365579/',
 'https://www.imdb.com/review/rw2879376/',
 'https://www.imdb.com/review/rw5204791/']
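
The selection rule used above (lowest rating as the negative review, highest as the positive one) can be checked on a small list. The sketch below uses hypothetical ratings, not values taken from IMDb. Two caveats follow from how `list.index` works: ties go to the review that appears first on the page, and if every rating were equal both indexes would point at the same review. The rule also assumes that every review on the page carries a rating, so that the rating list and the review list line up; the notebook does not check this.

```python
# Hypothetical ratings as they would come out of the page (strings).
ratings_text = ["7", "1", "10", "3", "10"]
ratings = [int(r) for r in ratings_text]

# list.index returns the first position of the value, so the earlier of the
# two 10-star reviews is picked as the positive one.
n_index = ratings.index(min(ratings))
p_index = ratings.index(max(ratings))
print(n_index, p_index)
```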

In [6]:
# get review text from the review link
review_texts = []
for url in movie_review_list:
    # get the review_url's soup
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')
    
    # find div tags with class text show-more__control
    tag = soup.find('div', attrs={'class': 'text show-more__control'})
    review_texts.append(tag.getText())

review_texts[:5]

["The first film in the re-imagining of the series was a big hit, but this sequel was a global success, especially with the superb performance by the star of Brokeback Mountain who tragically died from a (prescribed) drugs overdose shortly after filming had finished, from director Christopher Nolan (Memento, Insomnia). Basically a criminal terrorist and mastermind calling himself the Joker (posthumous Oscar, BAFTA and Golden Globe winning Heath Ledger) robs the bank run by the mob, and to take on the Mafia district attorney Harvey Dent (Aaron Eckhart) becomes the new face for justice and hope in Gotham City, with the help of Batman aka Bruce Wayne (Christian Bale) and Lieutenant James 'Jim' Gordon (Gary Oldman). Mob bosses Sal Maroni (Eric Roberts), Gambol (Michael Jai White) and the Chechen (Ritchie Coaster), who have had Chinese accountant Lau (Chin Han) hide their funds, are confronted by the Joker because he wants to kill the, but they all refuse to help, putting a bounty on him. T

In [7]:
# function for KMeans clustering with default k value of 5
def getKMeans(reviews, kVal = 5):
    # build a TfidfVectorizer with the English stop words
    vectorizer = TfidfVectorizer(stop_words='english')
    X = vectorizer.fit_transform(reviews)

    # execute KMeans on the vectorized data
    model = KMeans(n_clusters=kVal, init='k-means++', max_iter=100, n_init=1)
    model.fit(X)

    # print out the top terms per cluster for the user
    print("Top terms per cluster:")
    order_centroids = model.cluster_centers_.argsort()[:, ::-1]
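    # get_feature_names() was removed in scikit-learn 1.2; get_feature_names_out() needs 1.0 or later
    terms = vectorizer.get_feature_names_out()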
    
    clusters = []
    for i in range(kVal):
        cluster_terms = []
        print("Cluster %d:" % i)
        for ind in order_centroids[i, :10]:
            cluster_terms.append(terms[ind])
            print(' %s' % terms[ind])
        
        clusters.append(cluster_terms)
        print('\n')

    print("\n")
    return clusters
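
The "top terms per cluster" step sorts each centroid's weights in descending order and reads off the matching vocabulary entries. The sketch below is a plain-Python stand-in for `argsort()[:, ::-1]`, written under the assumption of a single centroid; the vocabulary and the weights are hypothetical, and `top_terms` is an illustrative helper that does not appear in the notebook.

```python
def top_terms(centroid, terms, n=3):
    """Return the n terms with the largest centroid weight, largest first."""
    order = sorted(range(len(centroid)), key=lambda i: centroid[i], reverse=True)
    return [terms[i] for i in order[:n]]

# Hypothetical vocabulary and one hypothetical centroid of TF-IDF weights.
terms = ["film", "hitchcock", "movie", "shark", "great"]
centroid = [0.40, 0.05, 0.30, 0.00, 0.20]
print(top_terms(centroid, terms))
```

Terms such as "film" and "movie" appear in almost every cluster above because they carry weight in nearly every review; adding them to the stop-word list would be one way to make the clusters more distinctive.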

In [8]:
# Execute the K-Means function on the reviews. We'll initially use the default number of clusters which is 5
clusters_k5 = getKMeans(review_texts)

Top terms per cluster:
Cluster 0:
 dicaprio
 nascimento
 movie
 film
 elite
 cole
 blood
 best
 war
 max


Cluster 1:
 10
 best
 film
 cox
 bethany
 movie
 script
 direction
 superb
 story


Cluster 2:
 film
 great
 really
 times
 thought
 like
 movie
 guy
 films
 just


Cluster 3:
 movie
 film
 time
 man
 scene
 like
 just
 grant
 good
 things


Cluster 4:
 film
 movie
 great
 films
 best
 like
 bit
 really
 just
 say






In [9]:
# Execute the K-Means function on the reviews, use a number of clusters equal to 10
clusters_k10 = getKMeans(review_texts,10)

Top terms per cluster:
Cluster 0:
 film
 max
 going
 welles
 haider
 think
 drama
 character
 turing
 movie


Cluster 1:
 hitchcock
 movie
 time
 like
 things
 murder
 characters
 film
 great
 people


Cluster 2:
 film
 movie
 films
 really
 violence
 like
 action
 seen
 just
 good


Cluster 3:
 nick
 amy
 100
 nominated
 best
 jane
 affleck
 blanche
 johnny
 years


Cluster 4:
 movie
 bollywood
 twists
 nascimento
 awesomely
 story
 adrián
 blind
 watch
 quality


Cluster 5:
 10
 film
 best
 cox
 bethany
 superb
 direction
 performances
 brilliant
 great


Cluster 6:
 kurosawa
 samurai
 people
 dollars
 inspired
 town
 westerns
 kinds
 film
 mifune


Cluster 7:
 watergate
 post
 woodward
 investigation
 history
 players
 oliver
 stone
 president
 washington


Cluster 8:
 film
 great
 best
 just
 bit
 movie
 brando
 special
 point
 laughton


Cluster 9:
 grant
 cary
 mortimer
 movie
 christina
 11
 die
 brewster
 arsenic
 devlin






In [10]:
# Execute the K-Means function on the reviews, use a number of clusters equal to 20
clusters_k20 = getKMeans(review_texts,20)

Top terms per cluster:
Cluster 0:
 movie
 bollywood
 awesomely
 twists
 blind
 quality
 watch
 production
 andhadhun
 movies


Cluster 1:
 haider
 turing
 going
 film
 room
 keller
 remarkable
 feel
 life
 really


Cluster 2:
 anime
 animation
 watch
 shell
 amazing
 ghost
 film
 silence
 10
 cyberpunk


Cluster 3:
 cole
 alien
 future
 fi
 sci
 james
 gilliam
 cast
 virus
 1996


Cluster 4:
 nascimento
 elite
 brazilian
 rio
 tropa
 squad
 janeiro
 bope
 fraga
 matias


Cluster 5:
 best
 nominated
 dicaprio
 nick
 amy
 oscar
 kill
 dent
 ripley
 bell


Cluster 6:
 portman
 vendetta
 work
 reno
 besson
 hurt
 didn
 film
 does
 victim


Cluster 7:
 shark
 great
 10
 escape
 theme
 scary
 film
 lorre
 richard
 island


Cluster 8:
 hitchcock
 scene
 film
 psycho
 revenge
 shower
 vertigo
 masterpiece
 rebecca
 creepy


Cluster 9:
 hitchcock
 bergman
 films
 romantic
 persona
 rear
 film
 10
 performance
 strangers


Cluster 10:
 ajay
 kher
 anupam
 given
 wednesday
 pandey
 akshay
 neeraj

#### 1. In Python, load one of the sentiment vocabularies referenced in the textbook, and run the sentiment analyzer as explained in the corresponding reference. Add words to the sentiment vocabulary, if you think you need to, to better fit your particular text collection.

In [11]:
# download the VADER lexicon (a no-op if it is already up to date), then build the analyzer
nltk.download('vader_lexicon')
sid = SentimentIntensityAnalyzer()

# quick check with an emoticon
sid.polarity_scores('):{')

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\balaj\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


{'neg': 1.0, 'neu': 0.0, 'pos': 0.0, 'compound': -0.5106}
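
The `compound` value is a normalized score. In the VADER reference implementation, the summed valence x of the text is mapped to x / sqrt(x² + α) with α = 15, which keeps the result between -1 and 1. The sketch below applies only that formula. The valence of -2.3 for the emoticon `):{` is an assumption inferred from the output above, not a value read from the lexicon file.

```python
import math

def normalize(valence_sum, alpha=15):
    """Map an unbounded valence sum into the open interval (-1, 1)."""
    return valence_sum / math.sqrt(valence_sum * valence_sum + alpha)

# -2.3 is an assumed valence, chosen because it reproduces the output above.
print(round(normalize(-2.3), 4))
```

In NLTK's `SentimentIntensityAnalyzer` the word-to-valence mapping is held in the `lexicon` dictionary, so a collection-specific word could be added there with a valence on the same scale. No words were added in this notebook.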

#### 2. For each of the clusters you created in homework 7, compute the average, median, high, and low sentiment scores for each cluster. Explain whether you think this reveals anything interesting about the clusters.

In [12]:
# score the top terms of each cluster and print the mean, median, max and min compound score
def getClusterSentimentScores(termsList):
    all_scores = []

    for terms in termsList:
        scores = []

        for term in terms:
            score = sid.polarity_scores(term)
            scores.append(score['compound'])

        all_scores.append(np.array(scores))

    for i, scores in enumerate(all_scores):
        print('Cluster {}: '.format(i), end='')

        mean = scores.mean()
        print('mean: {0:.1}'.format(mean), end=' | ')

        median = np.median(scores)
        print('median: {}'.format(median), end=' | ')

        _max = scores.max()
        print('max: {0:.1}'.format(_max), end=' | ')

        _min = scores.min()
        print('min: {0:.1}'.format(_min))

In [13]:
# get the cluster sentiment scores for the 5 clusters
getClusterSentimentScores(clusters_k5)

Cluster 0: mean: 0.004 | median: 0.0 | max: 0.6 | min: -0.6
Cluster 1: mean: 0.1 | median: 0.0 | max: 0.6 | min: 0e+00
Cluster 2: mean: 0.1 | median: 0.0 | max: 0.6 | min: 0e+00
Cluster 3: mean: 0.1 | median: 0.0 | max: 0.4 | min: 0e+00
Cluster 4: mean: 0.2 | median: 0.0 | max: 0.6 | min: 0e+00


In [14]:
# get the cluster sentiment scores for the 10 clusters
getClusterSentimentScores(clusters_k10)

Cluster 0: mean: 0e+00 | median: 0.0 | max: 0e+00 | min: 0e+00
Cluster 1: mean: 0.03 | median: 0.0 | max: 0.6 | min: -0.7
Cluster 2: mean: 0.02 | median: 0.0 | max: 0.4 | min: -0.6
Cluster 3: mean: 0.06 | median: 0.0 | max: 0.6 | min: 0e+00
Cluster 4: mean: -0.04 | median: 0.0 | max: 0e+00 | min: -0.4
Cluster 5: mean: 0.2 | median: 0.0 | max: 0.6 | min: 0e+00
Cluster 6: mean: 0.05 | median: 0.0 | max: 0.5 | min: 0e+00
Cluster 7: mean: 0e+00 | median: 0.0 | max: 0e+00 | min: 0e+00
Cluster 8: mean: 0.2 | median: 0.0 | max: 0.6 | min: 0e+00
Cluster 9: mean: -0.02 | median: 0.0 | max: 0.4 | min: -0.6


In [15]:
# get the cluster sentiment for the 20 clusters
getClusterSentimentScores(clusters_k20)

Cluster 0: mean: -0.04 | median: 0.0 | max: 0e+00 | min: -0.4
Cluster 1: mean: 0.06 | median: 0.0 | max: 0.6 | min: 0e+00
Cluster 2: mean: 0.03 | median: 0.0 | max: 0.6 | min: -0.3
Cluster 3: mean: 0e+00 | median: 0.0 | max: 0e+00 | min: 0e+00
Cluster 4: mean: 0e+00 | median: 0.0 | max: 0e+00 | min: 0e+00
Cluster 5: mean: -0.005 | median: 0.0 | max: 0.6 | min: -0.7
Cluster 6: mean: -0.08 | median: 0.0 | max: 0e+00 | min: -0.5
Cluster 7: mean: 0.03 | median: 0.0 | max: 0.6 | min: -0.5
Cluster 8: mean: 0.01 | median: 0.0 | max: 0.6 | min: -0.5
Cluster 9: mean: 0.04 | median: 0.0 | max: 0.4 | min: 0e+00
Cluster 10: mean: 0.06 | median: 0.0 | max: 0.6 | min: 0e+00
Cluster 11: mean: 0.07 | median: 0.0 | max: 0.4 | min: 0e+00
Cluster 12: mean: 0.06 | median: 0.0 | max: 0.6 | min: 0e+00
Cluster 13: mean: -0.05 | median: 0.0 | max: 0e+00 | min: -0.5
Cluster 14: mean: 0.08 | median: 0.0 | max: 0.4 | min: 0e+00
Cluster 15: mean: 0.2 | median: 0.0 | max: 0.6 | min: 0e+00
Cluster 16: mean: 0e+00 |

For the five-cluster run, every mean is above zero, which suggests a mildly positive sentiment in all clusters, although cluster 0 is close to neutral (mean 0.004). The scores are computed on the ten top terms of each cluster, and the median is 0.0 in every cluster, so most top terms are neutral and each mean is driven by a few scored words, presumably terms such as "best" or "great".

For the ten-cluster run, most clusters still have a positive mean. Clusters 0 and 7 have a mean of zero, which indicates a neutral sentiment. Clusters 4 and 9 have a negative mean, which indicates a negative sentiment for these clusters.

For the twenty-cluster run, clusters 3, 4 and 16 have a mean of zero, which indicates a neutral sentiment. Clusters 0, 5, 6 and 13 have negative means, which indicates a negative sentiment. The remaining clusters have a positive mean, which indicates a positive sentiment.
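
The pattern in the outputs above (a small nonzero mean next to a median of exactly 0.0) is what one would expect when most of the ten top terms are neutral. The sketch below reproduces it with the standard library; the score list is hypothetical and `summarize` is an illustrative helper, not part of the notebook. The last line also shows why some values above print as `0e+00`: the format spec `.1` keeps one significant digit, whereas `.2f` prints a fixed number of decimals.

```python
import statistics

def summarize(scores):
    """Return the mean, median, high and low of a list of compound scores."""
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "high": max(scores),
        "low": min(scores),
    }

# Hypothetical top-10 term scores: eight neutral terms and two scored ones.
scores = [0.0, 0.0, 0.6249, 0.0, 0.0, 0.0, -0.5106, 0.0, 0.0, 0.0]
summary = summarize(scores)
print(summary)

# '{0:.1}' renders 0.0 as '0e+00'; '{0:.2f}' renders it as '0.00'.
print("{0:.1}".format(0.0), "{0:.2f}".format(0.0))
```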

#### 3.	For extra credit, analyze sentiment of chunks as follows:

a.	Take the chunks from homework 5, and in Python, run each chunk individually through your sentiment analyzer that you used in question 1. If the chunk registers a nonneutral sentiment, save it in a tabular format (the chunk, the sentiment score).


In [16]:
# get movie name from the review link
movie_titles = []
for url in movie_review_list:
    # get the review_url's soup
    response = requests.get(url)
    soup = BeautifulSoup(response.text, 'html.parser')
    
    # the movie title is the second child of the page's h1 tag
    tag = soup.find('h1')
    movie_titles.append(list(tag.children)[1].getText())

movie_titles[:5]

['The Dark Knight', 'The Dark Knight', 'Inception', 'Inception', 'Parasite']

In [17]:
# label each review with negative or positive
review_sentiment = np.array(['negative', 'positive'] * (len(movie_review_list)//2))

In [18]:
# construct a dataframe
df = pd.DataFrame({'movie': movie_titles, 'user_review_permalink': movie_review_list,
             'user_review': review_texts, 'sentiment': review_sentiment})

# First five records from the dataframe
df.head()

Unnamed: 0,movie,user_review_permalink,user_review,sentiment
0,The Dark Knight,https://www.imdb.com/review/rw1945777/,The first film in the re-imagining of the seri...,negative
1,The Dark Knight,https://www.imdb.com/review/rw1999145/,Batman (Christian Bale) joins force with Lieut...,positive
2,Inception,https://www.imdb.com/review/rw2365579/,"As I type this, ""Inception"" is sitting at #6 o...",negative
3,Inception,https://www.imdb.com/review/rw2879376/,This is a world where people can go into your ...,positive
4,Parasite,https://www.imdb.com/review/rw5204791/,"I was able to see ""Parasite"" a few days ago at...",negative


In [19]:
grammar = """
    NP:    {<DT><WP><VBP>*<RB>*<VBN><IN><NN>}
           {<NN|NNS|NNP|NNPS><IN>*<NN|NNS|NNP|NNPS>+}
           {<JJ>*<NN|NNS|NNP|NNPS><CC>*<NN|NNS|NNP|NNPS>+}
           {<JJ>*<NN|NNS|NNP|NNPS>+}
    """   

user_review_chunks2 = []
for user_review in review_texts:
    user_review_ch = []
    sentences = nltk.sent_tokenize(user_review)
    sentences = [nltk.word_tokenize(sent) for sent in sentences]
    sentences = [nltk.pos_tag(sent) for sent in sentences]
    for sent in sentences:
        nps = []
        cp = nltk.RegexpParser(grammar)
        tree = cp.parse(sent)
           
        # loop through the trees produced and pull out only the 
        # NP subtrees
        for subtree in tree.subtrees():
            if subtree.label() == 'NP':
                t = subtree
                t = ' '.join(word for word, tag in t.leaves())
                nps.append(t)
        user_review_ch.append(nps)
        
    user_review_chunks2.append(user_review_ch)   

df['user_review_chunks2'] = user_review_chunks2
df.head()

Unnamed: 0,movie,user_review_permalink,user_review,sentiment,user_review_chunks2
0,The Dark Knight,https://www.imdb.com/review/rw1945777/,The first film in the re-imagining of the seri...,negative,"[[first film, re-imagining, series, big hit, s..."
1,The Dark Knight,https://www.imdb.com/review/rw1999145/,Batman (Christian Bale) joins force with Lieut...,positive,"[[Batman, Christian Bale, force with Lieutenan..."
2,Inception,https://www.imdb.com/review/rw2365579/,"As I type this, ""Inception"" is sitting at #6 o...",negative,"[[Inception, IMDb, firm, voting average], [], ..."
3,Inception,https://www.imdb.com/review/rw2879376/,This is a world where people can go into your ...,positive,"[[world, people, dreams], [dreams, actions], [..."
4,Parasite,https://www.imdb.com/review/rw5204791/,"I was able to see ""Parasite"" a few days ago at...",negative,"[[Parasite, few days, Philadelphia Film Festiv..."


In [20]:
# Sentiment score for a sentence
def sentiment_scores(sentence,printScores=True): 
  
    # Reuse the analyzer created earlier (sid) instead of reloading the
    # lexicon on every call.
    sid_obj = sid
  
    # polarity_scores returns a sentiment dictionary
    # that contains the pos, neg, neu, and compound scores.
    sentiment_dict = sid_obj.polarity_scores(sentence) 
    
    if (printScores):
        print("Overall sentiment dictionary is : ", sentiment_dict) 
        print("sentence was rated as ", sentiment_dict['neg']*100, "% Negative") 
        print("sentence was rated as ", sentiment_dict['neu']*100, "% Neutral") 
        print("sentence was rated as ", sentiment_dict['pos']*100, "% Positive") 
  
        print("Sentence Overall Rated As", end = " ") 
  
        # decide sentiment as positive, negative and neutral 
        if sentiment_dict['compound'] >= 0.05 : 
            print("Positive") 
  
        elif sentiment_dict['compound'] <= - 0.05 : 
            print("Negative") 
  
        else : 
            print("Neutral") 

    return sentiment_dict['compound']

In [21]:
# get the sentiment score for each chunk
def getChunkedSentimentScores(reviewChunk,printScores=False):

    termList = []
    scoreList = []

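    # reviewChunk is a list of reviews; each review is a list of sentences,
    # and each sentence is a list of noun-phrase chunks
    for terms in reviewChunk:
        for term in terms:
            # join the chunks of one sentence into a single string
            termStr = ' '.join(term)
            sent_score = sentiment_scores(termStr,printScores)

            # keep only non-neutral scores (same cut-offs as in sentiment_scores)
            if abs(sent_score) >= 0.05:
                termList.append(termStr)
                scoreList.append(sent_score)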

    return termList, scoreList
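
The non-neutral filter relies on the same ±0.05 cut-offs that `sentiment_scores` uses when it prints a label. A minimal sketch of that rule on its own follows; `label` is an illustrative helper and the (chunk, score) pairs are hypothetical, not results from the reviews.

```python
def label(compound, threshold=0.05):
    """Classify a compound score with the cut-offs used in the notebook."""
    if compound >= threshold:
        return "Positive"
    if compound <= -threshold:
        return "Negative"
    return "Neutral"

# Hypothetical (chunk, compound score) pairs.
scored = [
    ("great surprise", 0.75),
    ("voting average", 0.0),
    ("serial killer", -0.65),
]

# Keep only the rows that are not neutral, as the table in part (a) requires.
non_neutral = [(chunk, score) for chunk, score in scored if label(score) != "Neutral"]
print(non_neutral)
```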

In [22]:
# get the chunked sentiment analysis results. Get the score and the chunk
termList, scoreList = getChunkedSentimentScores(df['user_review_chunks2'].tolist())

# build out a data frame 
df_sent = pd.DataFrame({'TermsStr': termList, 'Sentiment_Score': scoreList})

# first five records from dataframe
df_sent.head()

Unnamed: 0,TermsStr,Sentiment_Score
0,first film re-imagining series big hit sequel ...,0.8316
1,criminal terrorist and mastermind Joker posthu...,0.1027
2,Mob Sal Maroni Eric Roberts Gambol Michael Jai...,0.128
3,Joker Gambol gang control Batman new technolog...,0.4939
4,Joker plan into action city Dark Knight true i...,0.5106


b.	Now sort the table twice, once to show the highest negative-sentiment-scoring chunks at the top and again to show the highest positive-sentiment-scoring chunks at the top. Examine the upper portions of both sorted lists, to identify any trends, and explain what you see. 

In [23]:
# Sort the data frame on sentiment score, ascending so we get the negative reviews at the top
df_sent.sort_values(by=['Sentiment_Score'],ascending=True,inplace=True)

# Display the first 10 records with negative sentiment
df_sent.head(10)

Unnamed: 0,TermsStr,Sentiment_Score
1009,Sierra Leona story of Danny Archer Leonardo Di...,-0.9887
62,serial killer criminal spree FBI chief Scott G...,-0.9678
152,people mental health stigma need of help treat...,-0.9643
685,murder mystery character frustration clear pat...,-0.9612
512,suffocating Film Noir shadow title Bhardwaj & ...,-0.9517
892,chilling suspenseful story timid young woman m...,-0.9517
63,cruel killer Buffalo Bill Ted Levine victims H...,-0.9403
567,Flashbacks Nick and Amy marriage apart time Ne...,-0.9337
906,story team fictional secret agents Baby task f...,-0.9325
259,theme in Alfred Hitchcock movies chaos innocen...,-0.9325


In [24]:
# Sort the data frame on sentiment score, descending so we get the positive reviews at the top
df_sent.sort_values(by=['Sentiment_Score'],ascending=False,inplace=True)

# Display the first 10 records with positive sentiment
df_sent.head(10)

Unnamed: 0,TermsStr,Sentiment_Score
15,Oscar for Best Sound Editing Best Art Directio...,0.997
651,Oscars for Best Motion Picture Year Best Writi...,0.9957
239,Oscars for Best Sound Effects Editing Best Vis...,0.9935
1228,Oscars for Best Film Editing Best Song Tex Rit...,0.9913
280,Sir Alfred Hitchcock number Greatest Pop Cultu...,0.9803
1229,Grace Kelly number Greatest Movie Stars number...,0.9729
481,case with Rope film lot good reviews great rev...,0.9595
400,Whether Kim Basinger Oscar win debate worthy w...,0.9595
100,Departed Oscars for Best Adapted Screenplay Be...,0.9571
19,Dark Knight great surprise blockbuster deep st...,0.9509


Looking at the top ten items in the most negative and most positive lists, we see trends in the individual entries. In the negative list there are many negative words such as killer, murder and frustration. These terms all elicit a negative emotion, which makes sense for thriller plots.

In the positive list there are many positive words such as best, great and good, and several of the top entries are lists of Oscar categories in which "Best" is repeated. The positive entries do not seem to focus on a particular genre or a particular movie; the positive sentiment appears to be more spread out.
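
The two sorted views can also be produced side by side, without re-sorting the table in place. The sketch below uses plain Python and hypothetical rows rather than the notebook's data. With a pandas DataFrame, `nsmallest` and `nlargest` on the score column would be an equivalent option.

```python
# Hypothetical (chunk, compound score) rows standing in for df_sent.
rows = [
    ("serial killer criminal spree", -0.65),
    ("great surprise blockbuster", 0.75),
    ("Oscar for Best Sound", 0.64),
    ("murder mystery frustration", -0.69),
    ("lot good reviews", 0.44),
]

# sorted() returns a new list, so both views can be kept at the same time.
most_negative = sorted(rows, key=lambda row: row[1])[:2]
most_positive = sorted(rows, key=lambda row: row[1], reverse=True)[:2]

print(most_negative)
print(most_positive)
```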