<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# Sentiment Analysis With SpaCy and VADER

# What is Sentiment Analysis?



## SpaCy and Part of Speech (PoS)

---


In [None]:
# !pip install spacy

In [3]:
# !python -m spacy download en  # spaCy 2.x shortcut; on spaCy 3+ use: python -m spacy download en_core_web_sm

In [4]:
import spacy
# 'en' is a spaCy 2.x shortcut link; spaCy 3+ removed it, so load 'en_core_web_sm' there instead
en_nlp = spacy.load('en')

**Parse a single sentence.**

In [8]:
sentence = u"this is a very nice sentence about football and food"
sentence_parsed = en_nlp(sentence)

In [9]:
len(sentence_parsed)  # number of tokens (here every token is a word)

10

In [10]:
sentence_parsed[0]

this

In [11]:
type(sentence_parsed[0])

spacy.tokens.token.Token

In [12]:
sentence_parsed.sentiment  # stays 0.0 unless a pipeline component assigns a score

0.0

In [13]:
for token in sentence_parsed:
    print(token, token.pos_)

this DET
is VERB
a DET
very ADV
nice ADJ
sentence NOUN
about ADP
football NOUN
and CCONJ
food NOUN


In [14]:
# count how many tokens carry each PoS tag
pos_counts = {}
for token in sentence_parsed:
    pos = token.pos_
    pos_counts[pos] = pos_counts.get(pos, 0) + 1
pos_counts

{'DET': 2, 'VERB': 1, 'ADV': 1, 'ADJ': 1, 'NOUN': 3, 'ADP': 1, 'CCONJ': 1}

In [15]:
# share of tokens per PoS tag (proportions that sum to 1, not percentages)
pos_perc = {}
for k, v in pos_counts.items():
    pos_perc[k] = 1.0 * v / len(sentence_parsed)
pos_perc

{'DET': 0.2,
 'VERB': 0.1,
 'ADV': 0.1,
 'ADJ': 0.1,
 'NOUN': 0.3,
 'ADP': 0.1,
 'CCONJ': 0.1}

#### These PoS proportions are new features you can use!
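
One way to turn these proportions into a fixed-length feature vector is sketched below. It works on a plain list of tag strings, so it runs without spaCy. The helper name `pos_proportions` and the `TAGSET` ordering are assumptions made for illustration; they are not part of the notebook.

```python
from collections import Counter

# Hypothetical fixed tag order, so every sentence maps to a vector of the same length.
TAGSET = ['ADJ', 'ADP', 'ADV', 'CCONJ', 'DET', 'NOUN', 'VERB']

def pos_proportions(tags, tagset=TAGSET):
    """Return the share of tokens carrying each tag in `tagset`."""
    counts = Counter(tags)
    total = len(tags)
    if total == 0:
        # An empty sentence has no tokens to count; return all zeros.
        return [0.0] * len(tagset)
    return [counts[tag] / total for tag in tagset]

# Tags of the example sentence, copied from the output above.
tags = ['DET', 'VERB', 'DET', 'ADV', 'ADJ', 'NOUN', 'ADP', 'NOUN', 'CCONJ', 'NOUN']
features = pos_proportions(tags)
print(dict(zip(TAGSET, features)))
```

Tags outside `TAGSET` are simply ignored, which keeps the vector length stable across sentences.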

## Sentiment analysis

In [16]:
import pandas as pd

sen = pd.read_csv('datasets/sentiment_words_simple.csv')
sen['pos'] = sen['pos'].str.upper()

sen.sample(10)

Unnamed: 0,pos,word,pos_score,neg_score
154020,VERB,take_after,0.1875,0.0625
105183,NOUN,postmodernism,0.0,0.0
73031,NOUN,herniated_disc,0.0,0.0
49492,NOUN,demonism,0.0,0.0
77772,NOUN,interpol,0.0,0.0
102140,NOUN,phospholipid,0.0,0.0
30715,NOUN,basinful,0.0,0.0
134917,NOUN,vocaliser,0.0,0.0
95153,NOUN,nonmalignant_tumor,0.0,0.125
3866,ADJ,circumstantial,0.125,0.0


In [17]:
# net polarity: positive score minus negative score
sen['pos_vs_neg'] = sen['pos_score'] - sen['neg_score']

In [18]:
# example 1
sen[(sen['word']=='sentence') & (sen['pos']=='NOUN')]

Unnamed: 0,pos,word,pos_score,neg_score,pos_vs_neg
116721,NOUN,sentence,0.0,0.0,0.0


### We can get a score for each word and average the results

In [23]:
import numpy as np

sentiments = []
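for token in sentence_parsed:
    # look up the lexicon row that matches both the word and its PoS tag
    match = (sen['word'] == token.text) & (sen['pos'] == token.pos_)
    score = sen[match]['pos_vs_neg'].values
    if len(score) > 0:
        # words missing from the lexicon are skipped rather than scored as 0
        print(token, token.pos_, score[0])
        sentiments.append(score[0])
print('Average sentiment: {}'.format(np.mean(sentiments)))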

very ADV 0.125
nice ADJ 0.5750000000000001
sentence NOUN 0.0
football NOUN 0.0
food NOUN -0.0416666666667
Average sentiment: 0.13166666666666
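
Filtering the whole DataFrame once per token gets slow on longer texts. A minimal sketch of the same averaging with a dictionary keyed on `(word, pos)` follows. The tiny `lexicon` holds only the five scores printed above, and `average_sentiment` is a hypothetical helper name. With the real data, such a dictionary could be built once from `sen`.

```python
# Scores copied from the output above; the real lexicon has many more rows.
lexicon = {
    ('very', 'ADV'): 0.125,
    ('nice', 'ADJ'): 0.575,
    ('sentence', 'NOUN'): 0.0,
    ('football', 'NOUN'): 0.0,
    ('food', 'NOUN'): -0.0416666666667,
}

def average_sentiment(tagged_tokens, lexicon):
    """Average the scores of the tokens found in the lexicon (None if none match)."""
    scores = [lexicon[key] for key in tagged_tokens if key in lexicon]
    if not scores:
        return None
    return sum(scores) / len(scores)

# (word, PoS) pairs of the example sentence, as tagged above.
tagged = [('this', 'DET'), ('is', 'VERB'), ('a', 'DET'), ('very', 'ADV'),
          ('nice', 'ADJ'), ('sentence', 'NOUN'), ('about', 'ADP'),
          ('football', 'NOUN'), ('and', 'CCONJ'), ('food', 'NOUN')]
print('Average sentiment: {}'.format(average_sentiment(tagged, lexicon)))
```

Returning `None` when nothing matches avoids averaging an empty list.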


<a id='print-most-obj'></a>
## Objective and Subjective
---

The objective score is whatever remains after the positive and negative scores are subtracted from 1:

Objective = 1 - (positive + negative)

"terrible":
* positive = 0.0
* negative = 0.8
* objective = 0.2

"very":
* positive = 0.7
* negative = 0.0
* objective = 0.3

"room":
* positive = 0.02
* negative = 0.03
* objective = 0.95
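
The formula can be checked against the three examples with a few lines of Python. This is only a sketch; `objectivity` is a hypothetical helper name, and the scores are the ones listed above.

```python
def objectivity(pos_score, neg_score):
    """Objective = 1 - (positive + negative)."""
    return 1.0 - (pos_score + neg_score)

# (positive, negative) pairs from the three examples above.
examples = {'terrible': (0.0, 0.8), 'very': (0.7, 0.0), 'room': (0.02, 0.03)}
for word, (pos, neg) in examples.items():
    # Round to two decimals to hide floating-point noise.
    print(word, round(objectivity(pos, neg), 2))
```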



## Sentiment Scores with VADER Library
---

In [25]:
#!pip install vaderSentiment

Collecting vaderSentiment
  Downloading https://files.pythonhosted.org/packages/86/9e/c53e1fc61aac5ee490a6ac5e21b1ac04e55a7c2aba647bb8411c9aadf24e/vaderSentiment-3.2.1-py2.py3-none-any.whl (125kB)
Installing collected packages: vaderSentiment
Successfully installed vaderSentiment-3.2.1


In [26]:
# Requires the vaderSentiment package: pip install vaderSentiment

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

In [27]:
sentences = ['Hawthorne is by turn outrageous and pathetic and imperious and poignant and very funny.',
             'Delivers guilt-free escapism about pretty people having wicked-hot fun in pretty places.',
             'Brian De Palma take on Tom Wolfe The Bonfire of the Vanities is a misfire of inanities.',
             'I hated this movie. Hated hated hated hated hated this movie. Hated it.']

In [29]:
analyzer = SentimentIntensityAnalyzer()
for sentence in sentences:
    vs = analyzer.polarity_scores(sentence)
    print(sentence)
    print(vs)
    print('')

Hawthorne is by turn outrageous and pathetic and imperious and poignant and very funny.
{'neg': 0.321, 'neu': 0.526, 'pos': 0.153, 'compound': -0.5434}

Delivers guilt-free escapism about pretty people having wicked-hot fun in pretty places.
{'neg': 0.0, 'neu': 0.481, 'pos': 0.519, 'compound': 0.8658}

Brian De Palma take on Tom Wolfe The Bonfire of the Vanities is a misfire of inanities.
{'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}

I hated this movie. Hated hated hated hated hated this movie. Hated it.
{'neg': 0.855, 'neu': 0.145, 'pos': 0.0, 'compound': -0.9854}
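
The `compound` value is a normalized score between -1 and 1. The VADER documentation suggests cutoffs of ±0.05 for labelling a text as positive, neutral, or negative. The sketch below applies those cutoffs to the compound scores printed above; `label_from_compound` is a hypothetical helper, and the threshold is a convention that can be tuned.

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score to a coarse sentiment label."""
    if compound >= threshold:
        return 'positive'
    if compound <= -threshold:
        return 'negative'
    return 'neutral'

# Compound scores copied from the output above, in sentence order.
compounds = [-0.5434, 0.8658, 0.0, -0.9854]
labels = [label_from_compound(c) for c in compounds]
print(labels)
```

Note that the third sentence comes out neutral even though it reads as a negative review, presumably because none of its words carry a score in VADER's lexicon.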

