VADER Sentiment Analysis
1. http://www.nltk.org/howto/sentiment.html
2. https://github.com/cjhutto/vaderSentiment

VADER Sentiment Analysis. VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media, and works well on texts from other domains.

Hutto, C.J. & Gilbert, E.E. (2014). VADER: A Parsimonious Rule-based Model for Sentiment Analysis of Social Media Text. Eighth International Conference on Weblogs and Social Media (ICWSM-14). Ann Arbor, MI, June 2014.

In [1]:
import pandas as pd
import os
from nltk.sentiment import SentimentAnalyzer
from nltk.sentiment.util import *
from nltk.classify import NaiveBayesClassifier
from nltk.corpus import subjectivity
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download('vader_lexicon')



[nltk_data] Downloading package vader_lexicon to
[nltk_data]     /Users/cesar/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


True

Information of a tweet
- id
- created_at
- text
- user -> location

Location is a bit useless, we can use:
- timezone
- geo
- coordinates

In [2]:
# Analyze tweet
hashtag = 'trump'

tweets = []
with open(hashtag+'.json', 'r') as f:
    for line in f:
        tweet = {}
        dict_tweet = json.loads(line)
        tweet['id'] = dict_tweet['id']
        tweet['created_at'] = dict_tweet['created_at']
        tweet['text'] = dict_tweet['text']
        tweet['location'] = dict_tweet['user']['location']
        tweet['timezone'] = dict_tweet['user']['time_zone']
        tweet['coord'] = dict_tweet['coordinates']
        tweet['place'] = dict_tweet['place']
        tweets.append(tweet)
tweets[0]

{'coord': None,
 'created_at': 'Thu May 11 20:44:26 +0000 2017',
 'id': 862770395853860864,
 'location': 'Ireland',
 'place': None,
 'text': 'RT @2ALAW: 📣Hey Hillary The FBI Is Going To Re-open Your Investigation!!\n\nHillary: Wait What? Like With A Can Opener Or Someth… ',
 'timezone': None}

In [3]:
df_tweets = pd.DataFrame.from_dict(tweets)

In [4]:
df_tweets.count()

coord            0
created_at    1681
id            1681
location      1075
place           19
text          1681
timezone       865
dtype: int64

In [5]:
sid = SentimentIntensityAnalyzer()

Compound Variable
- positive sentiment: compound score >= 0.5
- neutral sentiment: (compound score > -0.5) and (compound score < 0.5)
- negative sentiment: compound score <= -0.5

In [6]:
def sentiment(x):
    sentence = x['text']
    sentiment = 'neutral'
    ss = sid.polarity_scores(sentence)
    for k in sorted(ss):
        if(k=='compound'):
            if(ss[k]>=0.5):
                sentiment = 'positive'
            elif(ss[k]<=-0.5):
                sentiment = 'negative'
            else:
                sentiment = 'neutral'
    return sentiment

In [7]:
def sentiment_compound(x):
    sentence = x['text']
    sentiment_compound = 0
    ss = sid.polarity_scores(sentence)
    for k in sorted(ss):
        sentiment_compound = ss[k]
    return sentiment_compound

In [8]:
df_tweets['sentiment'] = df_tweets.apply(lambda x: sentiment(x), axis=1)
df_tweets['sentiment_compound'] = df_tweets.apply(lambda x: sentiment_compound(x), axis=1)

In [9]:
df_tweets.head(2)

Unnamed: 0,coord,created_at,id,location,place,text,timezone,sentiment,sentiment_compound
0,,Thu May 11 20:44:26 +0000 2017,862770395853860864,Ireland,,RT @2ALAW: 📣Hey Hillary The FBI Is Going To Re...,,neutral,0.134
1,,Thu May 11 20:44:26 +0000 2017,862770395992272896,,,He's a madman https://t.co/B2ltvqzop3 #trump #...,,neutral,0.0


In [10]:
df_tweets.count()

coord                    0
created_at            1681
id                    1681
location              1075
place                   19
text                  1681
timezone               865
sentiment             1681
sentiment_compound    1681
dtype: int64

In [11]:
df_tweets.groupby(['sentiment']).count()['id']

sentiment
negative     278
neutral     1238
positive     165
Name: id, dtype: int64

In [12]:
pd.options.display.max_colwidth = 266

In [13]:
df_tweets[(df_tweets['sentiment']=='positive')].head(5) 

Unnamed: 0,coord,created_at,id,location,place,text,timezone,sentiment,sentiment_compound
8,,Thu May 11 20:44:37 +0000 2017,862770443622666240,"California, USA",,RT @Unpersuaded112: Here are 3 #conservative #republican #sexual #predators that republicans support and love. #Oreilly #Ailes #trump…,,positive,0.33
10,,Thu May 11 20:44:39 +0000 2017,862770450132213760,citrus heights CA,,RT @Unpersuaded112: Here are 3 #conservative #republican #sexual #predators that republicans support and love. #Oreilly #Ailes #trump…,,positive,0.33
18,,Thu May 11 20:44:45 +0000 2017,862770475423985666,,,RT @DrDenaGrayson: @20committee #Russia🇷🇺sent the message👉🏼it can manipulate #Trump whenever they feel like it. Putin doesn’t care if…,,positive,0.263
29,,Thu May 11 20:44:51 +0000 2017,862770502208811008,"Wexford, Ireland",,People in the US really love #Trump. https://t.co/JTpuvohg4a,Dublin,positive,0.391
30,,Thu May 11 20:44:52 +0000 2017,862770503639027713,,,"""I don’t trust anything coming out of this White House, &amp; I don’t trust this feckless #Congress to constrain #Trump"" https://t.co/u4ZJZbyAfr",,positive,0.268


In [14]:
df_tweets[(df_tweets['sentiment']=='negative')].head(5)

Unnamed: 0,coord,created_at,id,location,place,text,timezone,sentiment,sentiment_compound
5,,Thu May 11 20:44:30 +0000 2017,862770411452456960,,,The UN is a Bad Joke. #Trump should pull us out of that miserable failure and save a Bundle. https://t.co/FW2f7LzmOd,,negative,0.19
9,,Thu May 11 20:44:37 +0000 2017,862770443496943616,,,RT @RonanLTynan: #Trump killing #Syria/n civilians ignoring #Assad helping #ISIS because resp &gt;90% of civilian deaths &amp; cause of rad…,,negative,0.084
11,,Thu May 11 20:44:39 +0000 2017,862770451805868033,"Knoxville, TN",,RT @2ALAW: Right After Maxine Waters Argued That Trump Should Not Have Fired James Comey....She Said This⬇️\n\n#Trump 🇺🇸…,Eastern Time (US & Canada),negative,0.0
17,,Thu May 11 20:44:43 +0000 2017,862770467576393728,M.I.A.,,@realDonaldTrump Lmao!!!!! @Rosie needs to delete her account and lay low for a few after this one. #Trump,,negative,0.0
27,,Thu May 11 20:44:50 +0000 2017,862770498006114304,United States,,RT @bocavista2016: LYING LIBS\n\nMcCabe\n\n👉#Trump DIDN'T interfere\n👉#ComeyFiring has ZERO impact\n👉Comey DIDN'T ask for funds\n\nhttps://t.co/3I1…,,negative,0.0


In [15]:
# Save
dir_df = os.path.join(os.path.abspath(''),'stg')
result_filename = r'df_tweets.pkl'
result_fullpath = os.path.join(dir_df, result_filename)
df_tweets.to_pickle(result_fullpath)