In [1]:
import pandas as pd

### Sentiment Prediction Strategy 1: Word-based scoring
AFINN is a list of English words rated for valence with an integer
between minus five (negative) and plus five (positive). The words have
been manually labeled by Finn Årup Nielsen in 2009-2011. The file
is tab-separated. There are two versions:

AFINN-111: Newest version with 2477 words and phrases.

AFINN-96: 1468 unique words and phrases on 1480 lines. Note that there
are 1480 lines, as some words are listed twice. The word list is not
entirely in alphabetical order.

An evaluation of the word list is available in:

Finn Årup Nielsen, "A new ANEW: Evaluation of a word list for
sentiment analysis in microblogs", http://arxiv.org/abs/1103.2903

Valence, as used in psychology, especially in discussing emotions, means the intrinsic attractiveness/"good"-ness (positive valence) or averseness/"bad"-ness (negative valence) of an event, object, or situation. ... For example, emotions popularly referred to as "negative", such as anger and fear, have negative valence.
    - source Wikipedia

In [2]:
afinn = pd.read_csv("AFINN-111.txt", 
                    sep="\t", header=None)
# AFINN is a lexicon (a term -> score list) that we use for lookups

In [3]:
afinn.head()

Unnamed: 0,0,1
0,abandon,-2
1,abandoned,-2
2,abandons,-2
3,abducted,-2
4,abduction,-2


In [4]:
afinn.tail()

Unnamed: 0,0,1
2472,yucky,-2
2473,yummy,3
2474,zealot,-2
2475,zealots,-2
2476,zealous,2


In [5]:
afinn.columns = ['Term','Score']

In [6]:
afinn.Score.value_counts()

-2    966
 2    448
-1    309
-3    264
 1    208
 3    172
 4     45
-4     43
-5     16
 5      5
 0      1
Name: Score, dtype: int64

In [7]:
afinn.head()

Unnamed: 0,Term,Score
0,abandon,-2
1,abandoned,-2
2,abandons,-2
3,abducted,-2
4,abduction,-2


In [8]:
term_scores = dict(afinn.values)
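
For comparison, here is a minimal sketch of building the same term-to-score lookup without pandas, using the standard-library `csv` module. The helper name `load_term_scores` and the in-memory sample are assumptions for illustration. The three sample rows are copied from the `head()` and `tail()` outputs above.

```python
import csv
import io

# Three tab-separated rows copied from the outputs above (illustrative sample,
# not the full AFINN-111 file).
sample = "abandon\t-2\nabandoned\t-2\nyummy\t3\n"

def load_term_scores(handle):
    """Read tab-separated `term<TAB>score` lines into a dict of int scores.

    Hypothetical helper: it assumes every line has exactly two fields.
    """
    reader = csv.reader(handle, delimiter="\t")
    return {term: int(score) for term, score in reader}

term_scores_demo = load_term_scores(io.StringIO(sample))
print(term_scores_demo["yummy"])

# With the real file, the call would look like:
# with open("AFINN-111.txt", encoding="utf-8") as fh:
#     term_scores = load_term_scores(fh)
```

Converting each score with `int(...)` keeps the values as plain Python integers, which is what the scoring loop further down adds up.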

In [9]:
list(term_scores.keys())[:10]

['abandon',
 'abandoned',
 'abandons',
 'abducted',
 'abduction',
 'abductions',
 'abhor',
 'abhorred',
 'abhorrent',
 'abhors']

In [10]:
term_scores['abandon']

-2

In [47]:
term_scores['marvellous']
# Indexing with [] raises a KeyError if the word is not in the lexicon

KeyError: 'marvellous'

In [48]:
term_scores.get('marvellous',0)

0

In [49]:
term_scores.get('good',0)

3

In [50]:
term_scores.get('sweet',0)

2

In [51]:
term_scores.get('terrible',0)

-3

In [52]:
txt = "nlp is amazing"

In [53]:
from nltk.tokenize import word_tokenize

In [54]:
word_tokenize(txt)

['nlp', 'is', 'amazing']

In [55]:
term_scores.get('nlp',0)

0

In [20]:
term_scores.get('is',0)

0

In [21]:
term_scores.get('amazing',0)

4

In [22]:
score = 0
for term in word_tokenize(txt):
    score += term_scores.get(term, 0)  # a += b is shorthand for a = a + b
print(score)

4


In [23]:
def get_sentiment(sent):
    tokens = word_tokenize(sent.lower())
    score = 0
    for term in tokens:
        score += term_scores.get(term,0)
    return score

In [24]:
get_sentiment('nlp is amazing')

4

In [25]:
get_sentiment('This car is amazing with a terrible experience')

1
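
AFINN contains phrases as well as single words, so a token-by-token lookup like `get_sentiment` cannot match multi-word entries. The sketch below shows one possible way to check two-word phrases before falling back to single words. It is not part of the original notebook. The function name, the regex tokenizer (a stand-in for `word_tokenize`), and the `"not good"` entry with its score are assumptions made for the demo.

```python
import re

# Illustrative subset only. The single-word scores match the lookups shown
# above; the phrase entry and its score are assumed for this demo.
sample_scores = {"good": 3, "amazing": 4, "terrible": -3, "not good": -2}

def get_sentiment_with_phrases(sent, scores):
    """Sum lexicon scores, preferring a two-word phrase match over single words."""
    # Simple regex tokenizer used here as a stand-in for nltk's word_tokenize.
    tokens = re.findall(r"[a-z']+", sent.lower())
    total = 0
    i = 0
    while i < len(tokens):
        bigram = " ".join(tokens[i:i + 2])
        if i + 1 < len(tokens) and bigram in scores:
            total += scores[bigram]   # phrase matched: consume both tokens
            i += 2
        else:
            total += scores.get(tokens[i], 0)  # unknown words score 0
            i += 1
    return total

print(get_sentiment_with_phrases("the food is not good", sample_scores))
print(get_sentiment_with_phrases("nlp is amazing", sample_scores))
```

With this sample lexicon, a plain word-by-word sum would score "the food is not good" as +3, because only `good` matches. The phrase-aware version returns the phrase score instead.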

### Sentiment Prediction Strategy 2: VADER

**VADER (*V*alence *A*ware *D*ictionary and s*E*ntiment *R*easoner)** is a lexicon- and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media.

In [26]:
import nltk
nltk.download('vader_lexicon')
# vader_lexicon is the lexicon shipped with NLTK that the analyser uses for lookups

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     C:\Users\S.Joshi\AppData\Roaming\nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


True

In [27]:
from nltk.sentiment.vader import SentimentIntensityAnalyzer

In [29]:
analyser = SentimentIntensityAnalyzer()
analyser

<nltk.sentiment.vader.SentimentIntensityAnalyzer at 0x267671ecdd8>

In [31]:
analyser.polarity_scores("the food is great")
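# neg      - proportion of the text rated negative (0.0 here)
# neu      - proportion rated neutral (0.423, i.e. 42.3%)
# pos      - proportion rated positive (0.577, i.e. 57.7%)
# compound - normalized overall score in [-1, +1]; it is not a proportion
# neg, neu and pos are proportions that sum to 1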

{'neg': 0.0, 'neu': 0.423, 'pos': 0.577, 'compound': 0.6249}

Compound scores are VADER's final, normalized sentiment scores, ranging from -1 to +1:
- A score close to +1 means the sentence is strongly positive
- A score close to -1 means the sentence is strongly negative
- A score close to 0 means the sentence is mostly neutral or mixed

The output of `polarity_scores` also gives the proportion of the sentence that is negative (`neg`), neutral (`neu`), and positive (`pos`).
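
To make "normalized" concrete, here is a minimal sketch of the squashing step as I understand it from the VADER implementation: the summed valence `x` is mapped to `x / sqrt(x^2 + alpha)` with `alpha = 15`. The single-word valences used below (3.1 for "great", 1.9 for "good", -2.1 for "terrible") are assumed lexicon values. They reproduce the compound scores printed in this notebook, but the real analyser also applies rules for punctuation, capitalization, negation, and so on.

```python
import math

def normalize(raw_score, alpha=15):
    """Squash an unbounded summed valence into the open interval (-1, 1)."""
    return raw_score / math.sqrt(raw_score * raw_score + alpha)

# Assumed lexicon valences for the single sentiment word in each sentence.
for word, valence in [("great", 3.1), ("good", 1.9), ("terrible", -2.1)]:
    print(word, round(normalize(valence), 4))
```

Because the denominator is always larger than the absolute value of the numerator, the result never reaches exactly +1 or -1, however strong the sentence is.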

In [32]:
analyser.polarity_scores("the food is terrible")

{'neg': 0.508, 'neu': 0.492, 'pos': 0.0, 'compound': -0.4767}

In [33]:
analyser.polarity_scores("the food is good")

{'neg': 0.0, 'neu': 0.508, 'pos': 0.492, 'compound': 0.4404}

In [34]:
analyser.polarity_scores("the food is good!")

{'neg': 0.0, 'neu': 0.484, 'pos': 0.516, 'compound': 0.4926}

In [35]:
analyser.polarity_scores("the food is GOOD!")

{'neg': 0.0, 'neu': 0.433, 'pos': 0.567, 'compound': 0.6027}

In [36]:
analyser.polarity_scores("heard the news")

{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}

In [37]:
analyser.polarity_scores("heard the news smh")

{'neg': 0.434, 'neu': 0.566, 'pos': 0.0, 'compound': -0.3182}

In [39]:
analyser.polarity_scores("the food is good")

{'neg': 0.0, 'neu': 0.508, 'pos': 0.492, 'compound': 0.4404}

In [40]:
analyser.polarity_scores("the food is good :)")

{'neg': 0.0, 'neu': 0.337, 'pos': 0.663, 'compound': 0.7096}

In [41]:
# VADER is lexicon- and rule-based, so it does not detect sarcasm

In [42]:
analyser.polarity_scores("I love tea. I hate coffee")

{'neg': 0.374, 'neu': 0.202, 'pos': 0.424, 'compound': 0.128}

In [43]:
# Fetch only the compound score
analyser.polarity_scores("I love tea. I hate coffee")['compound']

0.128

In [44]:
def get_vader_sentiment(sent):
    return analyser.polarity_scores(sent)['compound']

In [45]:
get_vader_sentiment("this is HORRIBLE")

-0.6408

In [46]:
get_vader_sentiment("the food is good :)")

0.7096
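
To turn a compound score into a predicted label, a common convention (the one suggested in the VADER documentation, as far as I know) is to treat scores of at least 0.05 as positive and scores of at most -0.05 as negative. The sketch below assumes those cutoffs. The function name `label_from_compound` and the `threshold` parameter are hypothetical, and the threshold would likely need tuning for a given dataset.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a coarse sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

# Compound scores copied from the outputs above.
for score in (0.7096, -0.6408, 0.0, 0.128):
    print(score, label_from_compound(score))
```

Note that the mixed sentence "I love tea. I hate coffee" (compound 0.128) comes out as positive under these cutoffs, so a single label can hide conflicting sentiment within a text.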

In [21]:
?analyser.polarity_scores