__Survey of various sentiment analysis tools__

__Tool:__ VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media.  

In [59]:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer 

In [60]:
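# Build the analyzer once and reuse it, instead of re-creating it on every call
analyzer = SentimentIntensityAnalyzer()

def sentiment(sentence):
    """Print the overall VADER sentiment label and the raw polarity scores."""
    scores = analyzer.polarity_scores(sentence)
    # Thresholds on the normalized compound score (range -1 to 1) recommended by VADER
    if scores['compound'] >= 0.05:
        label = 'positive'
    elif scores['compound'] <= -0.05:
        label = 'negative'
    else:
        label = 'neutral'
    print('Overall sentiment: {}  scores: {}'.format(label, scores))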
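The labelling rule inside `sentiment()` can also be kept as a pure function that returns the label instead of printing it, which makes it easier to reuse on a list of texts. This is a minimal sketch: `label_from_compound` is a hypothetical helper name, and the ±0.05 cut-offs are the same ones used in `sentiment()` (the thresholds suggested in the VADER README).

```python
def label_from_compound(compound, threshold=0.05):
    """Map a VADER compound score in [-1, 1] to a sentiment label.

    Hypothetical helper that mirrors the thresholds used in sentiment().
    """
    if compound >= threshold:
        return 'positive'
    if compound <= -threshold:
        return 'negative'
    return 'neutral'


# Compound scores copied from the outputs recorded in this notebook
for text, score in [('I am very happy', 0.6115),
                    ('I am very sad', -0.5256),
                    ('My boss thinks I am lazy', -0.3612),
                    ('Mt fovrite color is green', 0.0)]:
    print(text, '->', label_from_compound(score))
```

Returning the label rather than printing it means the same rule can be applied with a list comprehension or a pandas `apply` over a column of compound scores.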

In [61]:
sentiment('I am very happy')

Overall sentiment: positive  scores: {'neg': 0.0, 'neu': 0.429, 'pos': 0.571, 'compound': 0.6115}


In [62]:
sentiment('I am very sad')

Overall sentiment: negative  scores: {'neg': 0.531, 'neu': 0.469, 'pos': 0.0, 'compound': -0.5256}


In [63]:
sentiment('My boss thinks I am lazy')

Overall sentiment: negative  scores: {'neg': 0.333, 'neu': 0.667, 'pos': 0.0, 'compound': -0.3612}


In [64]:
sentiment('Dogs are fun')

Overall sentiment: positive  scores: {'neg': 0.0, 'neu': 0.377, 'pos': 0.623, 'compound': 0.5106}


In [65]:
sentiment('Mt fovrite color is green')

Overall sentiment: neutral  scores: {'neg': 0.0, 'neu': 1.0, 'pos': 0.0, 'compound': 0.0}
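Under the hood, VADER sums the lexicon valences of the words in a sentence (after adjusting them with rules, such as boosters like "very") and squashes the total into [-1, 1] with x / sqrt(x² + α), α = 15, rounded to 4 decimals. The toy sketch below reproduces the compound scores printed above for the four examples that contain a sentiment word. It is not the library's implementation: the valences (happy 2.7, sad -2.1, fun 2.3, lazy -1.5) and the 0.293 booster increment are assumed values, and VADER's other rules (negation, capitalization, punctuation, "but") are left out. The misspelled sentence contains no lexicon words, so its sum is 0 and its compound score is 0.0.

```python
import math

ALPHA = 15       # normalization constant used in VADER's compound score
B_INCR = 0.293   # assumed booster increment for intensifiers such as "very"

# Assumed lexicon valences for the sentiment-bearing words in the examples
LEXICON = {'happy': 2.7, 'sad': -2.1, 'fun': 2.3, 'lazy': -1.5}


def normalize(score, alpha=ALPHA):
    """Squash an unbounded valence sum into the range [-1, 1]."""
    norm = score / math.sqrt(score * score + alpha)
    return max(-1.0, min(1.0, norm))


def toy_compound(words):
    """Sum word valences (with a single 'very' booster rule) and normalize.

    A toy subset of VADER's rules: lexicon lookup plus one booster check
    on the immediately preceding word.
    """
    total = 0.0
    for i, word in enumerate(words):
        valence = LEXICON.get(word.lower(), 0.0)
        if valence and i > 0 and words[i - 1].lower() == 'very':
            # A booster pushes the valence further from zero in its own direction
            valence += B_INCR if valence > 0 else -B_INCR
        total += valence
    return round(normalize(total), 4)


for sentence in ['I am very happy', 'I am very sad',
                 'My boss thinks I am lazy', 'Dogs are fun',
                 'Mt fovrite color is green']:
    print(sentence, '->', toy_compound(sentence.split()))
```

The normalization is why the compound score never reaches ±1 for short sentences: a single strongly positive word still lands well below 1.0.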


__Tool:__ spaCy library

This example comes directly from the spaCy docs (https://spacy.io/) and the rule-based matching guide (https://spacy.io/usage/rule-based-matching).

In [66]:
import spacy
nlp = spacy.load("en_core_web_sm")

In [67]:
text = ("When Sebastian Thrun started working on self-driving cars at "
        "Google in 2007, few people outside of the company took him "
        "seriously.")
doc = nlp(text)

In [68]:
nlp.pipe_names

['tok2vec', 'tagger', 'parser', 'ner', 'attribute_ruler', 'lemmatizer']

In [69]:
for token in doc:
    # Print the token and its part-of-speech tag
    print(token.text, "-->", token.pos_)

When --> ADV
Sebastian --> PROPN
Thrun --> PROPN
started --> VERB
working --> VERB
on --> ADP
self --> NOUN
- --> PUNCT
driving --> VERB
cars --> NOUN
at --> ADP
Google --> PROPN
in --> ADP
2007 --> NUM
, --> PUNCT
few --> ADJ
people --> NOUN
outside --> ADP
of --> ADP
the --> DET
company --> NOUN
took --> VERB
him --> PRON
seriously --> ADV
. --> PUNCT


In [70]:
# Print each named entity found by the NER component and its label
for entity in doc.ents:
    print(entity.text, entity.label_)

Sebastian Thrun PERSON
2007 DATE


In [71]:
spacy.explain('PROPN')

'proper noun'

In [72]:
# Dependency parsing: print each token's syntactic relation to its head
for token in doc:
    print(token.text, "-->", token.dep_)

When --> advmod
Sebastian --> compound
Thrun --> nsubj
started --> advcl
working --> xcomp
on --> prep
self --> npadvmod
- --> punct
driving --> amod
cars --> pobj
at --> prep
Google --> pobj
in --> prep
2007 --> pobj
, --> punct
few --> amod
people --> nsubj
outside --> prep
of --> prep
the --> det
company --> pobj
took --> ROOT
him --> dobj
seriously --> advmod
. --> punct


In [73]:
spacy.explain('advmod')

'adverbial modifier'

In [75]:
# Filter out stop words and tokens of one or two characters (this also drops punctuation)
filtered_tokens = [token for token in doc if not token.is_stop and len(token) > 2]
filtered_tokens

[Sebastian,
 Thrun,
 started,
 working,
 self,
 driving,
 cars,
 Google,
 2007,
 people,
 outside,
 company,
 took,
 seriously]

In [76]:
# Lemmatization: normalize each word to its base form (lemma)
normalized_tokens = [token.lemma_ for token in filtered_tokens]
normalized_tokens

['Sebastian',
 'Thrun',
 'start',
 'work',
 'self',
 'drive',
 'car',
 'Google',
 '2007',
 'people',
 'outside',
 'company',
 'take',
 'seriously']

In [77]:
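# Vectorizing text (en_core_web_sm ships without static word vectors, so token.vector falls back to the context-sensitive doc.tensor)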
vectorized_tokens = [token.vector for token in filtered_tokens]
vectorized_tokens

[array([-5.71817160e-01,  1.65022194e+00,  5.35619080e-01, -5.80845475e-01,
        -8.51170719e-02,  2.22893178e-01, -6.09330714e-01,  1.23475575e+00,
        -1.03335309e+00, -6.34859502e-01, -6.43051028e-01, -7.78789580e-01,
        -5.22866726e-01,  5.22352755e-01, -9.43297893e-02, -6.21124148e-01,
        -1.64939427e+00,  1.40644759e-02,  2.47086716e+00,  5.93555808e-01,
        -3.76887977e-01,  3.23743439e+00, -5.46526074e-01,  9.19954479e-01,
         6.10621452e-01,  1.08310580e-03, -1.68772489e-02, -5.82848787e-01,
         2.43205214e+00, -6.05870485e-01,  8.21614146e-01, -1.00611478e-01,
        -1.43793678e+00, -1.57914984e+00, -1.26228705e-02,  7.05610737e-02,
         7.99672008e-01,  3.53682309e-01,  3.29753816e-01,  6.48135662e-01,
        -1.74151444e+00,  4.38590646e-01, -3.60152662e-01, -6.46073759e-01,
        -1.03747994e-01, -1.08078790e+00,  4.12641406e-01, -1.78395867e-01,
         2.38993734e-01,  5.37644565e-01,  2.25287223e+00,  6.87343776e-01,
        -8.1
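Token vectors like these are typically compared with cosine similarity, which is what spaCy's `similarity()` methods use by default. A pure-Python sketch on hypothetical three-dimensional toy vectors; the real vectors above are much longer, and the names `car`, `drive` and `company` are illustrative only.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # avoid dividing by zero; treat an all-zero vector as dissimilar
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors, not real spaCy output
car = [0.9, 0.1, 0.3]
drive = [0.8, 0.2, 0.4]
company = [-0.2, 0.9, 0.1]

print(round(cosine_similarity(car, drive), 3))
print(round(cosine_similarity(car, company), 3))
```

Cosine similarity ignores vector length and compares only direction, so words used in similar contexts score close to 1 even if their vectors differ in magnitude.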

Still need a full NLP example for spaCy.

__Tool:__ NLTK (Natural Language Toolkit) from the University of Pennsylvania

__Tool:__ TextBlob