VADER Sentiment Analysis
1. http://www.nltk.org/howto/sentiment.html
2. https://github.com/cjhutto/vaderSentiment

VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon- and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media, and it also works well on texts from other domains.

Hutto, C.J. & Gilbert, E.E. (2014). VADER: A Parsimonious Rule-based Model for Sentiment Analysis of Social Media Text. Eighth International Conference on Weblogs and Social Media (ICWSM-14). Ann Arbor, MI, June 2014.
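The "lexicon and rule-based" idea can be sketched in a few lines. This toy version looks words up in a small hypothetical lexicon, sums their valences, and squashes the sum into [-1, 1] with the normalization used for VADER's compound score, x / sqrt(x² + α) with α = 15 in the reference implementation. The lexicon values and the names `TOY_LEXICON`, `normalize` and `toy_compound` are illustrative assumptions; the real tool also applies rules for negation, intensifiers, capitalization, punctuation emphasis and contrastive "but", none of which this sketch attempts.

```python
import math

# Hypothetical mini-lexicon: word -> valence on VADER's -4..+4 scale.
# Values are for illustration only, not copied from vader_lexicon.txt.
TOY_LEXICON = {"good": 1.9, "great": 3.1, "bad": -2.5, "terrible": -2.1}


def normalize(score: float, alpha: float = 15.0) -> float:
    """Map an unbounded valence sum into [-1, 1] (VADER-style normalization)."""
    norm = score / math.sqrt(score * score + alpha)
    return max(-1.0, min(1.0, norm))


def toy_compound(text: str) -> float:
    """Sum the lexicon valences of the words in `text`, then normalize.

    Unlike VADER, this ignores negation, intensifiers ("very"),
    capitalization, punctuation and "but" clauses.
    """
    words = text.lower().split()
    total = sum(TOY_LEXICON.get(w, 0.0) for w in words)
    return normalize(total)


print(toy_compound("a great day"))       # positive, between 0 and 1
print(toy_compound("terrible and bad"))  # negative, between -1 and 0
print(toy_compound("the debate"))        # 0.0 (no lexicon words)
```

The normalization keeps the score bounded no matter how many sentiment words a text contains, which is what makes a single fixed threshold on the compound score usable across short and long texts.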

In [35]:
import json

import nltk
import pandas as pd
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Download the VADER lexicon (no-op if it is already installed)
nltk.download('vader_lexicon')

[nltk_data] Downloading package vader_lexicon to
[nltk_data]     /Users/cesar/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


True

Fields extracted from each tweet
- id
- created_at
- text
- user -> location (the free-text location from the author's profile)

In [24]:
# Load the tweets collected for this hashtag (one JSON object per line)
hashtag = 'Macron'

tweets = []
with open(hashtag + '.json', 'r', encoding='utf-8') as f:
    for line in f:
        dict_tweet = json.loads(line)
        # Keep only the fields needed for the analysis
        tweet = {
            'id': dict_tweet['id'],
            'created_at': dict_tweet['created_at'],
            'text': dict_tweet['text'],
            'location': dict_tweet['user']['location'],
        }
        tweets.append(tweet)
tweets[0]

{'created_at': 'Sun Apr 23 20:51:48 +0000 2017',
 'id': 856249266834223104,
 'location': 'New Jersey',
 'text': 'RT @simonjhix: #Macron v #LePen =&gt; end of the economic left-right of C20th, &amp; a return to the left-right of C19th, of liberalism v conserva…'}
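The loading loop above raises a `KeyError` on any line that is not a regular tweet object, and it keeps `created_at` as a string. A more defensive parser could look like the sketch below. It assumes one JSON object per line, skips objects without a `text` key (for example the `limit` notices the Twitter streaming API can interleave with tweets), and converts `created_at` to a timezone-aware `datetime`. The helper name `parse_tweet` and the sample lines are hypothetical.

```python
import json
from datetime import datetime

# Twitter's 'created_at' format, e.g. 'Sun Apr 23 20:51:48 +0000 2017'
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def parse_tweet(line):
    """Return the fields used in this notebook, or None for non-tweet lines."""
    obj = json.loads(line)
    if "text" not in obj:
        # Not a tweet (e.g. a stream control message): skip it
        return None
    user = obj.get("user") or {}
    return {
        "id": obj["id"],
        "created_at": datetime.strptime(obj["created_at"], CREATED_AT_FORMAT),
        "text": obj["text"],
        "location": user.get("location"),  # None when the profile has no location
    }


# Hypothetical sample lines, shaped like the fields used above
sample = ('{"id": 1, "created_at": "Sun Apr 23 20:51:48 +0000 2017", '
          '"text": "hi", "user": {"location": null}}')
print(parse_tweet(sample))
print(parse_tweet('{"limit": {"track": 5}}'))  # None
```

Parsed timestamps can then be sorted or bucketed by hour without further string handling.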

In [41]:
df_tweets = pd.DataFrame.from_dict(tweets)

In [36]:
sid = SentimentIntensityAnalyzer()

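Compound Score
The compound score sums the valence scores of the words in the text, adjusts them with VADER's rules, and normalizes the result to lie between -1 (most negative) and +1 (most positive). This notebook uses the following cutoffs (the current vaderSentiment README suggests ±0.05 instead of ±0.5):
- positive sentiment: compound score >= 0.5
- neutral sentiment: (compound score > -0.5) and (compound score < 0.5)
- negative sentiment: compound score <= -0.5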
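The threshold rules can be written as a small helper. This is a sketch; the name `label` and the `threshold` parameter are our own, and the default of 0.5 reproduces the three rules listed above.

```python
def label(compound: float, threshold: float = 0.5) -> str:
    """Bucket a VADER compound score using symmetric cutoffs."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"  # strictly between -threshold and +threshold


for score in (0.62, 0.5, 0.1, -0.49, -0.5):
    print(score, label(score))
# 0.62 positive
# 0.5 positive
# 0.1 neutral
# -0.49 neutral
# -0.5 negative
```

Note that both boundaries are inclusive on the polar side, so a score of exactly ±0.5 is not neutral.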

In [55]:
def sentiment(x):
    """Return the VADER compound score of the row's 'text' field."""
    scores = sid.polarity_scores(x['text'])  # dict with 'neg', 'neu', 'pos', 'compound'
    return scores['compound']

In [56]:
df_tweets['sentiment'] = df_tweets.apply(sentiment, axis=1)

In [61]:
df_tweets.head(2)

,created_at,id,location,text,sentiment
0,Sun Apr 23 20:51:48 +0000 2017,856249266834223104,New Jersey,RT @simonjhix: #Macron v #LePen =&gt; end of t...,0.0
1,Sun Apr 23 20:51:48 +0000 2017,856249267169755138,,"RT @nicolasbayfn: ""Le clivage a le mérite d'êt...",0.0


In [68]:
df_tweets.count()

created_at    847
id            847
location      538
text          847
sentiment     847
dtype: int64

In [64]:
df_tweets_pos = df_tweets[(df_tweets.sentiment>=0.5)]
df_tweets_pos.count()

created_at    19
id            19
location      15
text          19
sentiment     19
dtype: int64

In [67]:
df_tweets_neg = df_tweets[(df_tweets.sentiment<=-0.5)]
df_tweets_neg.count()

created_at    13
id            13
location       5
text          13
sentiment     13
dtype: int64
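Putting the reported counts together: 19 tweets are positive and 13 negative, so the rest fall in the neutral band. The sketch below only does that arithmetic on the numbers printed above (no new data). The large neutral share may partly reflect that VADER's lexicon is English, while some tweets, such as the second row shown earlier, are in French.

```python
# Counts reported by df_tweets.count() and the two filters above
TOTAL = 847
POSITIVE = 19
NEGATIVE = 13
neutral = TOTAL - POSITIVE - NEGATIVE  # tweets with -0.5 < compound < 0.5

for name, n in (("positive", POSITIVE), ("neutral", neutral), ("negative", NEGATIVE)):
    print(f"{name:>8}: {n:4d} ({n / TOTAL:.1%})")
# positive:   19 (2.2%)
#  neutral:  815 (96.2%)
# negative:   13 (1.5%)
```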