# In this notebook:
1. How to use ``WordNet``
2. Path similarity and Lin similarity
3. How to use ``SentiWordNet``
4. How to use ``VADER``

## 1. Using WordNet (NLTK)

Check the following websites if you want to know more about ``WordNet``:

Tutorial on YouTube
- https://www.youtube.com/watch?v=T68P5-8tM-Y

A step-by-step walk-through analysis
- https://nlpforhackers.io/sentiment-analysis-intro/

``wordnet.synsets`` documentation
- http://www.nltk.org/api/nltk.corpus.reader.html?highlight=wordnet#nltk.corpus.reader.wordnet.Lemma.synset

First, let's import the libraries:

In [1]:
import nltk
# on the first run, download the 'sentiwordnet' package by uncommenting the following line
# nltk.download('sentiwordnet')
from nltk.corpus import wordnet as wn
from nltk.stem import WordNetLemmatizer
from nltk.corpus import sentiwordnet as swn

Next, let's look at some basic operations using ``WordNet``:

In [2]:
# list of WordNet synsets for the word 'good'
syns_wn = wn.synsets('good')
print(syns_wn)
# name of the first lemma of the 0-th synset
print('\n The 0-th element:', syns_wn[0].lemmas()[0].name())

[Synset('good.n.01'), Synset('good.n.02'), Synset('good.n.03'), Synset('commodity.n.01'), Synset('good.a.01'), Synset('full.s.06'), Synset('good.a.03'), Synset('estimable.s.02'), Synset('beneficial.s.01'), Synset('good.s.06'), Synset('good.s.07'), Synset('adept.s.01'), Synset('good.s.09'), Synset('dear.s.02'), Synset('dependable.s.04'), Synset('good.s.12'), Synset('good.s.13'), Synset('effective.s.04'), Synset('good.s.15'), Synset('good.s.16'), Synset('good.s.17'), Synset('good.s.18'), Synset('good.s.19'), Synset('good.s.20'), Synset('good.s.21'), Synset('well.r.01'), Synset('thoroughly.r.02')]

 The 0-th element: good


In [3]:
# Check word's definition
print(syns_wn[0].definition())
# Sentence examples
print(syns_wn[0].examples())

benefit
['for your own good', "what's the good of worrying?"]


Some relations, such as antonymy, are defined in ``WordNet`` only over lemmas rather than synsets; see https://www.nltk.org/howto/wordnet.html

In [4]:
wn.synset('nice.a.01').lemmas()[0].antonyms()

[Lemma('nasty.a.01.nasty')]

In [5]:
# or another way
wn.lemmas('nice')[1].antonyms()

[Lemma('nasty.a.01.nasty')]

In [6]:
# set of synonyms of the word 'nice'
synonym = []
for syn in wn.synsets('nice'):
    for lemma in syn.lemmas():
        synonym.append(lemma.name())

set(synonym)

{'Nice',
 'courteous',
 'dainty',
 'decent',
 'gracious',
 'nice',
 'overnice',
 'prissy',
 'skillful',
 'squeamish'}

In [7]:
# compare two synsets using Wu-Palmer similarity
w1 = wn.synset('phone.n.01')
w2 = wn.synset('computer.n.01')
print(w1.wup_similarity(w2))
# or print(wn.wup_similarity(w1, w2))

w1 = wn.synset('bad.a.01')
w2 = wn.synset('regretful.a.01')
print(w1.wup_similarity(w2))


w1 = wn.synset('have.v.01')
w2 = wn.synset('own.v.01')
print(w1.wup_similarity(w2))

0.6666666666666666
0.5
0.5


Note that adjectives and adverbs in ``WordNet`` are not arranged in a hierarchy, so shortest-path measures do not work reliably for them.
You might get ``None`` when comparing two adjectives or two adverbs.
See http://wn-similarity.sourceforge.net/ for more details.
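
Because a similarity call may return ``None`` for such pairs, code that averages or compares scores needs a guard. Below is a minimal sketch, assuming a hypothetical helper ``safe_similarity`` (not part of NLTK) that wraps any similarity function. The two stand-in functions exist only so the sketch runs without WordNet data.

```python
def safe_similarity(sim_fn, a, b, default=0.0):
    """Call sim_fn(a, b) and fall back to `default` when it returns None.

    Hypothetical helper for illustration; it is not part of NLTK.
    """
    score = sim_fn(a, b)
    return default if score is None else score


# Stand-in similarity functions so the sketch runs without WordNet data.
# In the notebook, sim_fn could be e.g. wn.path_similarity.
def no_path(a, b):
    return None


def half(a, b):
    return 0.5


print(safe_similarity(no_path, 'bad', 'regretful'))  # 0.0
print(safe_similarity(half, 'dog', 'corgi'))         # 0.5
```

Whether 0.0 is a sensible default depends on the task; skipping such pairs entirely is an equally reasonable choice.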

## 2. Text similarity

### Path similarity

In [8]:
w1 = wn.synset('dog.n.01')
w2 = wn.synset('corgi.n.01')
print(wn.path_similarity(w1, w2))

0.5
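
``path_similarity`` scores two synsets as 1 / (1 + d), where d is the number of edges on the shortest path between them in the hypernym hierarchy. Here 'corgi' sits one edge below 'dog', so d = 1 and the score is 0.5. The sketch below applies the same formula to a hypothetical toy taxonomy; the edges are made up for illustration and are not WordNet's actual hierarchy.

```python
from collections import deque

# Hypothetical toy taxonomy: child -> parent (hypernym) edges.
HYPERNYM = {
    'corgi': 'dog',
    'dog': 'canine',
    'wolf': 'canine',
    'canine': 'carnivore',
    'cat': 'feline',
    'feline': 'carnivore',
}


def neighbours(node):
    """Nodes one edge away from `node`, in either direction."""
    found = set()
    if node in HYPERNYM:
        found.add(HYPERNYM[node])
    found.update(child for child, parent in HYPERNYM.items() if parent == node)
    return found


def shortest_path_length(start, goal):
    """Breadth-first search; returns None if the nodes are not connected."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in neighbours(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def toy_path_similarity(a, b):
    """1 / (1 + shortest path length), or None when no path exists."""
    dist = shortest_path_length(a, b)
    return None if dist is None else 1 / (1 + dist)


print(toy_path_similarity('dog', 'corgi'))  # 0.5
print(toy_path_similarity('dog', 'cat'))    # 0.2
```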


### Lin similarity

In [9]:
# first, load an information content (IC) dictionary; this one is precomputed from the Brown corpus
from nltk.corpus import wordnet_ic
brown_ic = wordnet_ic.ic('ic-brown.dat')

w1 = wn.synset('dog.n.01')
w2 = wn.synset('cat.n.01')
print(wn.lin_similarity(w1,w2, brown_ic))

0.8768009843733973
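
Lin similarity is defined as 2 * IC(lcs) / (IC(s1) + IC(s2)), where IC(c) = -log P(c) is the information content of a concept and lcs is the most specific ancestor shared by the two synsets. Since P(c) is estimated from a corpus, the score depends on which corpus supplies the counts. A minimal sketch of the formula follows; the probabilities are hypothetical values made up for illustration, not taken from the Brown corpus.

```python
import math


def information_content(prob):
    """IC(c) = -log P(c): rarer concepts carry more information."""
    return -math.log(prob)


def lin_similarity(p_a, p_b, p_lcs):
    """Lin similarity from the corpus probabilities of two concepts and
    of their most specific common ancestor (least common subsumer)."""
    return 2 * information_content(p_lcs) / (
        information_content(p_a) + information_content(p_b))


# Hypothetical probabilities, made up for illustration only.
p_dog, p_cat, p_carnivore = 0.001, 0.002, 0.005

print(round(lin_similarity(p_dog, p_cat, p_carnivore), 4))
```

When both concepts coincide with their common ancestor, the numerator equals the denominator and the score is 1.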


In [10]:
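# alternatively, build an IC dictionary from another corpus (here: Genesis)
from nltk.corpus import genesis
genesis_ic = wn.ic(genesis, False, 0.0)  # weight_senses_equally=False, smoothing=0.0
print(wn.lin_similarity(w1, w2, genesis_ic))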

0.8043806652422293


<br/>


## 3. Using SentiWordNet (NLTK)

One of the most straightforward approaches to sentiment analysis is to use ``SentiWordNet`` to compute the polarity of each word and then average those values. Let's see how to use ``SentiWordNet``.

In [11]:
# recall the WordNet synsets for the word 'good'
print(wn.synsets('good'))

[Synset('good.n.01'), Synset('good.n.02'), Synset('good.n.03'), Synset('commodity.n.01'), Synset('good.a.01'), Synset('full.s.06'), Synset('good.a.03'), Synset('estimable.s.02'), Synset('beneficial.s.01'), Synset('good.s.06'), Synset('good.s.07'), Synset('adept.s.01'), Synset('good.s.09'), Synset('dear.s.02'), Synset('dependable.s.04'), Synset('good.s.12'), Synset('good.s.13'), Synset('effective.s.04'), Synset('good.s.15'), Synset('good.s.16'), Synset('good.s.17'), Synset('good.s.18'), Synset('good.s.19'), Synset('good.s.20'), Synset('good.s.21'), Synset('well.r.01'), Synset('thoroughly.r.02')]


Let's now retrieve the same synsets through ``SentiWordNet``.

In [13]:
# list of SentiWordNet synsets for the word 'good'
nltk.download('sentiwordnet')
syns_swn = swn.senti_synsets('good')
list(syns_swn)

[nltk_data] Downloading package sentiwordnet to
[nltk_data]     /home/jovyan/nltk_data...
[nltk_data]   Unzipping corpora/sentiwordnet.zip.


[SentiSynset('good.n.01'),
 SentiSynset('good.n.02'),
 SentiSynset('good.n.03'),
 SentiSynset('commodity.n.01'),
 SentiSynset('good.a.01'),
 SentiSynset('full.s.06'),
 SentiSynset('good.a.03'),
 SentiSynset('estimable.s.02'),
 SentiSynset('beneficial.s.01'),
 SentiSynset('good.s.06'),
 SentiSynset('good.s.07'),
 SentiSynset('adept.s.01'),
 SentiSynset('good.s.09'),
 SentiSynset('dear.s.02'),
 SentiSynset('dependable.s.04'),
 SentiSynset('good.s.12'),
 SentiSynset('good.s.13'),
 SentiSynset('effective.s.04'),
 SentiSynset('good.s.15'),
 SentiSynset('good.s.16'),
 SentiSynset('good.s.17'),
 SentiSynset('good.s.18'),
 SentiSynset('good.s.19'),
 SentiSynset('good.s.20'),
 SentiSynset('good.s.21'),
 SentiSynset('well.r.01'),
 SentiSynset('thoroughly.r.02')]

In [14]:
synset_filter = swn.senti_synsets('good', 'a')  # optionally restrict the part of speech, e.g. 'a' for adjectives
synset_list = list(synset_filter)
synset_list[:5] # check the first 5 elements

[SentiSynset('good.a.01'),
 SentiSynset('full.s.06'),
 SentiSynset('good.a.03'),
 SentiSynset('estimable.s.02'),
 SentiSynset('beneficial.s.01')]

``SentiWordNet`` assigns three sentiment scores to each ``WordNet`` synset: positivity, negativity, and objectivity.

In [15]:
# positivity score
swn.senti_synset('good.a.01').pos_score()

0.75

In [16]:
# negativity score
swn.senti_synset('good.a.01').neg_score()

0.0

In [17]:
# objectivity score
swn.senti_synset('good.a.01').obj_score()

0.25
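
The objectivity score is not an independent quantity: it is what remains once positivity and negativity are subtracted from 1, so the three scores always sum to 1. A minimal sketch of that relationship, using the scores printed above for ``good.a.01`` (``objectivity`` is a hypothetical helper, not an NLTK function):

```python
def objectivity(pos, neg):
    """Objectivity as the remainder once positivity and negativity are removed."""
    return 1.0 - (pos + neg)


pos, neg = 0.75, 0.0  # scores printed above for 'good.a.01'
print(objectivity(pos, neg))  # 0.25
```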

Some useful attributes of a synset:

In [18]:
good_syns = swn.senti_synsets('good')

# for each sense, print its definition, its lemmas and its sentiment scores
for good_sentisyn in good_syns:
    good_synset = good_sentisyn.synset
    print("\n", good_synset.definition())
    print(good_synset.lemmas())
    print(f"Pos: {good_sentisyn.pos_score()} neg: {good_sentisyn.neg_score()}")


 benefit
[Lemma('good.n.01.good')]
Pos: 0.5 neg: 0.0

 moral excellence or admirableness
[Lemma('good.n.02.good'), Lemma('good.n.02.goodness')]
Pos: 0.875 neg: 0.0

 that which is pleasing or valuable or useful
[Lemma('good.n.03.good'), Lemma('good.n.03.goodness')]
Pos: 0.625 neg: 0.0

 articles of commerce
[Lemma('commodity.n.01.commodity'), Lemma('commodity.n.01.trade_good'), Lemma('commodity.n.01.good')]
Pos: 0.0 neg: 0.0

 having desirable or positive qualities especially those suitable for a thing specified
[Lemma('good.a.01.good')]
Pos: 0.75 neg: 0.0

 having the normally expected amount
[Lemma('full.s.06.full'), Lemma('full.s.06.good')]
Pos: 0.0 neg: 0.0

 morally admirable
[Lemma('good.a.03.good')]
Pos: 1.0 neg: 0.0

 deserving of esteem and respect
[Lemma('estimable.s.02.estimable'), Lemma('estimable.s.02.good'), Lemma('estimable.s.02.honorable'), Lemma('estimable.s.02.respectable')]
Pos: 1.0 neg: 0.0

 promoting or enhancing well-being
[Lemma('beneficial.s.01.beneficial'), L
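
The averaging approach mentioned at the start of this section can be sketched as follows. This is a minimal illustration, not a definitive method: ``polarity`` and ``average_polarity`` are hypothetical helpers, and net polarity is assumed here to be positivity minus negativity. The input pairs are copied from the first eight senses of 'good' printed above.

```python
def polarity(pos, neg):
    """Net polarity of one sense: positivity minus negativity (an assumption)."""
    return pos - neg


def average_polarity(scores):
    """Mean net polarity over an iterable of (pos, neg) pairs."""
    scores = list(scores)
    if not scores:
        return 0.0
    return sum(polarity(p, n) for p, n in scores) / len(scores)


# (pos, neg) pairs copied from the first eight senses of 'good' printed above.
# With NLTK they could be collected as:
#   [(s.pos_score(), s.neg_score()) for s in swn.senti_synsets('good')]
good_scores = [
    (0.5, 0.0), (0.875, 0.0), (0.625, 0.0), (0.0, 0.0),
    (0.75, 0.0), (0.0, 0.0), (1.0, 0.0), (1.0, 0.0),
]

print(average_polarity(good_scores))  # 0.59375
```

Averaging over all senses is simple but ignores which sense is meant in context; restricting the part of speech or using only the first sense are alternative choices.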

<br/>

## 4. Using Valence Aware Dictionary and sEntiment Reasoner (VADER)

``VADER`` is another tool for calculating polarity, and it works at the sentence level.

``SentimentIntensityAnalyzer()`` documentation
- http://www.nltk.org/howto/sentiment.html

``VADER`` demonstration
- https://towardsdatascience.com/sentimental-analysis-using-vader-a3415fef7664

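Note:
The ``compound`` score is computed by summing the valence scores of the words in the text, adjusting them according to VADER's rules, and normalizing the result to lie between -1 (most negative) and +1 (most positive). The ``neg``, ``neu`` and ``pos`` scores returned by ``SentimentIntensityAnalyzer()`` are the proportions of the text that fall into each category.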

For more details, see https://towardsdatascience.com/sentimental-analysis-using-vader-a3415fef7664
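
In VADER's reference implementation, the summed and rule-adjusted valence x is squashed into the interval (-1, 1) as x / sqrt(x^2 + alpha), with alpha = 15 by default. A minimal sketch of that normalization step follows; the raw valences are assumed values chosen for illustration and are not read from the lexicon.

```python
import math


def normalize(score, alpha=15):
    """Squash an unbounded valence sum into the open interval (-1, 1)."""
    return score / math.sqrt(score * score + alpha)


# Assumed raw valence sums (illustrative only, not read from the VADER lexicon).
for raw in (-2.1, -1.3, 0.0, 1.3):
    print(raw, round(normalize(raw), 4))
```

The function is odd and monotonic, so the sign of the raw sum is preserved and larger magnitudes move the result closer to -1 or +1 without ever reaching them.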

In [19]:
from nltk.sentiment.vader import SentimentIntensityAnalyzer as SIA
nltk.download('vader_lexicon')

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     /home/jovyan/nltk_data...


True

In [20]:
sentences=["Delivery was terrible."
           , "Media and web analytics is boring."
           , "M&WA is the best course in the world!"
           , "M&WA is the best course in the world"]

# Create a sentiment intensity analyzer object:
sid = SIA()

# Loop over the sentences
for sentence in sentences:
    ss = sid.polarity_scores(sentence) 
    print(ss)
    if ss['compound'] < 0:
        print(sentence, ' is overall negative ', ss['compound'])
    elif ss['compound'] == 0:
        print(sentence, ' is overall neutral')
    else:
        print(sentence, ' is overall positive ', ss['compound'])

{'neg': 0.608, 'neu': 0.392, 'pos': 0.0, 'compound': -0.4767}
Delivery was terrible.  is overall negative  -0.4767
{'neg': 0.315, 'neu': 0.685, 'pos': 0.0, 'compound': -0.3182}
Media and web analytics is boring.  is overall negative  -0.3182
{'neg': 0.0, 'neu': 0.609, 'pos': 0.391, 'compound': 0.6696}
M&WA is the best course in the world!  is overall positive  0.6696
{'neg': 0.0, 'neu': 0.625, 'pos': 0.375, 'compound': 0.6369}
M&WA is the best course in the world  is overall positive  0.6369


Some more examples:

In [21]:
sentences=["That's cool."
           , "That's quite cool."
           , "That's cool :)"
           , "That's super cool :)"]

sid = SIA()

# Loop over the sentences
for sentence in sentences:
    ss = sid.polarity_scores(sentence) 
    if ss['compound'] < 0:
        print(sentence, ' is overall negative ', ss['compound'])
    elif ss['compound'] == 0:
        print(sentence, ' is overall neutral')
    else:
        print(sentence, ' is overall positive ', ss['compound'])

That's cool.  is overall positive  0.3182
That's quite cool.  is overall positive  0.3804
That's cool :)  is overall positive  0.6486
That's super cool :)  is overall positive  0.8481
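
The loops above treat only an exact 0 as neutral. A common convention, suggested in VADER's own documentation, is to use a small band around zero instead: a compound score of 0.05 or more counts as positive, -0.05 or less as negative, and anything in between as neutral. A minimal sketch with a hypothetical helper ``label_compound``; the threshold is an adjustable assumption, and this is not what the cells above do.

```python
def label_compound(compound, threshold=0.05):
    """Map a compound score to a label using a symmetric neutral band."""
    if compound >= threshold:
        return 'positive'
    if compound <= -threshold:
        return 'negative'
    return 'neutral'


# Compound scores taken from the outputs above, plus one hypothetical near-zero score.
for score in (-0.4767, 0.3182, 0.3804, 0.6486, 0.8481, 0.01):
    print(score, label_compound(score))
```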
