## Exercise

Use NLTK to calculate the sentiment of the following sentences in two ways: with the average of the SentiWordNet scores, and with VADER.

Load the libraries:

In [1]:
import nltk
from nltk.corpus import wordnet as wn
from nltk.stem import WordNetLemmatizer
from nltk.corpus import sentiwordnet as swn

In [2]:
sen1 = "The Harry Potter Series are terribly written and disappointing :(."
sen2 = "The Harry Potter Series are brilliant, well written and funny."

### SentiWordNet

Write a function that classifies a sentence using a tokenizer, a lemmatizer, and SentiWordNet. Return a pair: the overall positive score and the overall negative score:

In [3]:
# Hint 1: swn.senti_synset expects a synset name string such as 'nice.a.01'
sent = swn.senti_synset('nice.a.01')
sent.pos_score()

0.875

In [4]:
print(wn.synsets('cat'))
print(wn.synsets('Cats'))

[Synset('cat.n.01'), Synset('guy.n.01'), Synset('cat.n.03'), Synset('kat.n.01'), Synset('cat-o'-nine-tails.n.01'), Synset('caterpillar.n.02'), Synset('big_cat.n.01'), Synset('computerized_tomography.n.01'), Synset('cat.v.01'), Synset('vomit.v.01')]
[Synset('cat.n.01'), Synset('guy.n.01'), Synset('cat.n.03'), Synset('kat.n.01'), Synset('cat-o'-nine-tails.n.01'), Synset('caterpillar.n.02'), Synset('big_cat.n.01'), Synset('computerized_tomography.n.01'), Synset('cat.v.01'), Synset('vomit.v.01')]


In [5]:
# Hint 2: wn.synsets seems to work without a lemmatizer (see above), but to be safe, use wn_lem.lemmatize(token)
wn_lem = WordNetLemmatizer()
wn_lem.lemmatize('Token')

'Token'

In [6]:
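def classifySentence(sen):
    """Return the summed (positive, negative) SentiWordNet scores of a sentence."""
    wn_lem = WordNetLemmatizer()
    pos = 0
    neg = 0
    for token in nltk.word_tokenize(sen):
        lemma = wn_lem.lemmatize(token)
        synsets = wn.synsets(lemma)
        # Skip tokens that have no WordNet entry (stop words, punctuation)
        if synsets:
            # Take the first synset, whatever its part of speech
            sent = swn.senti_synset(synsets[0].name())
            print("Sentiment of " + token + " " + str(sent))
            pos += sent.pos_score()
            neg += sent.neg_score()
    return (pos, neg)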

Check your result:

In [7]:
pos, neg = classifySentence(sen1)
print("Sen 1: is pos: "+str(pos)+" neg: "+str(neg))

Sentiment of Harry <harass.v.01: PosScore=0.0 NegScore=0.125>
Sentiment of Potter <potter.n.01: PosScore=0.0 NegScore=0.0>
Sentiment of Series <series.n.01: PosScore=0.0 NegScore=0.0>
Sentiment of are <are.n.01: PosScore=0.0 NegScore=0.0>
Sentiment of terribly <terribly.r.01: PosScore=0.25 NegScore=0.0>
Sentiment of written <write.v.01: PosScore=0.0 NegScore=0.0>
Sentiment of disappointing <disappoint.v.01: PosScore=0.0 NegScore=0.25>
Sen 1: is pos: 0.25 neg: 0.375
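
The exercise statement asks for an average, whereas `classifySentence` returns sums, which grow with sentence length. A minimal sketch of the averaging step follows, using the per-token scores printed above for `sen1`. The helper name `average_scores` is hypothetical and not part of NLTK.

```python
def average_scores(scores):
    """Average a list of (pos, neg) pairs; return (0.0, 0.0) for an empty list."""
    if not scores:
        return (0.0, 0.0)
    n = len(scores)
    pos = sum(p for p, _ in scores) / n
    neg = sum(q for _, q in scores) / n
    return (pos, neg)

# (pos, neg) scores of the seven sen1 tokens that matched a synset,
# copied from the output above
sen1_scores = [
    (0.0, 0.125),  # Harry
    (0.0, 0.0),    # Potter
    (0.0, 0.0),    # Series
    (0.0, 0.0),    # are
    (0.25, 0.0),   # terribly
    (0.0, 0.0),    # written
    (0.0, 0.25),   # disappointing
]

avg_pos, avg_neg = average_scores(sen1_scores)
print(round(avg_pos, 4), round(avg_neg, 4))  # prints: 0.0357 0.0536
```

Dividing by the number of matched tokens, rather than by all tokens, is a design choice: it keeps words without a WordNet entry from diluting the score.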


In [8]:
pos, neg = classifySentence(sen2)
print("Sen 2: is pos: "+str(pos)+" neg: "+str(neg))

Sentiment of Harry <harass.v.01: PosScore=0.0 NegScore=0.125>
Sentiment of Potter <potter.n.01: PosScore=0.0 NegScore=0.0>
Sentiment of Series <series.n.01: PosScore=0.0 NegScore=0.0>
Sentiment of are <are.n.01: PosScore=0.0 NegScore=0.0>
Sentiment of brilliant <brilliant.s.01: PosScore=0.875 NegScore=0.0>
Sentiment of well <well.n.01: PosScore=0.0 NegScore=0.0>
Sentiment of written <write.v.01: PosScore=0.0 NegScore=0.0>
Sentiment of funny <funny_story.n.01: PosScore=0.0 NegScore=0.0>
Sen 2: is pos: 0.875 neg: 0.125
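
In the output above, the first synset is often the wrong sense: `are` and `well` resolve to nouns and `funny` to `funny_story.n.01`, so clearly positive words contribute nothing. One possible refinement, which the solution above does not use, is to tag each token with `nltk.pos_tag` and restrict the lookup to that part of speech. The sketch below covers only the tag mapping. `penn_to_wordnet` is a hypothetical helper, and the single-letter codes are the values of `wn.ADJ`, `wn.VERB`, `wn.NOUN`, and `wn.ADV`.

```python
def penn_to_wordnet(tag):
    """Map a Penn Treebank tag (as produced by nltk.pos_tag) to a WordNet POS code.

    Returns None for tags with no WordNet counterpart (determiners, punctuation, ...).
    """
    if tag.startswith("JJ"):
        return "a"  # adjective, same value as wn.ADJ
    if tag.startswith("VB"):
        return "v"  # verb, same value as wn.VERB
    if tag.startswith("NN"):
        return "n"  # noun, same value as wn.NOUN
    if tag.startswith("RB"):
        return "r"  # adverb, same value as wn.ADV
    return None

for tag in ["JJ", "VBP", "NNP", "RB", "DT"]:
    print(tag, penn_to_wordnet(tag))
```

Inside the loop of `classifySentence`, the result could then be passed as `wn.synsets(lemma, pos=penn_to_wordnet(tag))`; a `pos` of `None` leaves the lookup unrestricted.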


### VADER

Now do the same using VADER. Also return the compound score, which VADER computes by summing the valence scores of the words in the sentence and normalizing the sum to the range [-1, 1]:

In [9]:
from nltk.sentiment.vader import SentimentIntensityAnalyzer as SIA
sid = SIA()

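def classifySentenceVADER(sen):
    """Return the (positive, negative, compound) VADER scores of a sentence."""
    # polarity_scores returns a dict with the keys 'neg', 'neu', 'pos' and 'compound'
    ss = sid.polarity_scores(sen)
    print(ss)
    return (ss['pos'], ss['neg'], ss['compound'])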

In [10]:
pos, neg, comp = classifySentenceVADER(sen1)
print("Sen 1: is pos: "+str(pos)+" neg: "+str(neg), " overall: ", str(comp))

{'neg': 0.459, 'neu': 0.541, 'pos': 0.0, 'compound': -0.7783}
Sen 1: is pos: 0.0 neg: 0.459  overall:  -0.7783


In [11]:
pos, neg, comp = classifySentenceVADER(sen2)
print("Sen 2: is pos: "+str(pos)+" neg: "+str(neg), " overall: ", str(comp))

{'neg': 0.0, 'neu': 0.443, 'pos': 0.557, 'compound': 0.8316}
Sen 2: is pos: 0.557 neg: 0.0  overall:  0.8316
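
The compound score is a normalized sum of per-word valences. As a sketch of that normalization, assuming the formula used in NLTK's VADER module, x / sqrt(x² + α) with α = 15, the function below maps any raw sum into the interval (-1, 1). The name `normalize_valence` is illustrative, and the raw sums passed to it are made-up inputs, not values VADER computed for the sentences above.

```python
import math

def normalize_valence(score, alpha=15):
    """Squash an unbounded valence sum into the open interval (-1, 1)."""
    return score / math.sqrt(score * score + alpha)

# Made-up raw sums, chosen only to show the shape of the curve
for raw in [-4.0, -1.0, 0.0, 1.0, 4.0]:
    print(raw, round(normalize_valence(raw), 4))
```

The curve is roughly linear near zero and flattens as the sum grows, so a sentence with many strong words approaches but never reaches -1 or 1.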


In [12]:
# Or, more directly, call the analyzer without the wrapper function
sid.polarity_scores(sen1)

{'neg': 0.459, 'neu': 0.541, 'pos': 0.0, 'compound': -0.7783}
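
To turn the compound score into a label, a common convention from the VADER documentation treats scores of 0.05 and above as positive and scores of -0.05 and below as negative. A minimal sketch, with `label_compound` as a hypothetical helper and the threshold as an adjustable assumption:

```python
def label_compound(compound, threshold=0.05):
    """Label a VADER compound score as positive, negative, or neutral."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

print(label_compound(-0.7783))  # compound score of sen1 -> negative
print(label_compound(0.8316))   # compound score of sen2 -> positive
```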