VADER Sentiment Analysis
1. http://www.nltk.org/howto/sentiment.html
2. https://github.com/cjhutto/vaderSentiment

VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media, and works well on texts from other domains.

Hutto, C.J. & Gilbert, E.E. (2014). VADER: A Parsimonious Rule-based Model for Sentiment Analysis of Social Media Text. Eighth International Conference on Weblogs and Social Media (ICWSM-14). Ann Arbor, MI, June 2014.
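
To make "lexicon and rule-based" concrete, here is a toy sketch of the lexicon half only. The word valences in `TOY_LEXICON` are hypothetical and are not VADER's; the `normalize` step follows the form used in the vaderSentiment source (alpha = 15), stated here as an assumption. VADER's rules for negation, intensifiers, punctuation and capitalization are left out.

```python
import math

# Hypothetical toy lexicon: these valences are illustrative, not VADER's.
TOY_LEXICON = {'brilliant': 3.0, 'good': 2.0, 'threat': -2.0, 'war': -3.0}


def normalize(score, alpha=15):
    """Squash an unbounded valence sum into the open interval (-1, 1)."""
    return score / math.sqrt(score * score + alpha)


def toy_compound(text):
    """Sum the valences of known words and normalize the total."""
    words = (w.strip('.,!?#').lower() for w in text.split())
    total = sum(TOY_LEXICON.get(w, 0.0) for w in words)
    return normalize(total)


positive_score = toy_compound('Thanks for the brilliant march, keep up the good work')
negative_score = toy_compound("LET'S HAVE A WAR! #Brexit")
```

Words missing from the lexicon contribute nothing, so a text with no known words scores exactly 0.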

In [1]:
import json
import os

import nltk
import pandas as pd
from nltk.sentiment.vader import SentimentIntensityAnalyzer

# Fetch the VADER lexicon (a no-op if it is already up to date)
nltk.download('vader_lexicon')



[nltk_data] Downloading package vader_lexicon to
[nltk_data]     /Users/cesar/nltk_data...
[nltk_data]   Package vader_lexicon is already up-to-date!


True

Fields extracted from each tweet:
- id
- created_at
- text
- user -> location

The user location is free text and often unreliable, so we also keep:
- timezone (user -> time_zone)
- coordinates
- place

In [15]:
# Load the tweets collected for a hashtag (one JSON object per line)
hashtag = 'brexit'

tweets = []
with open(hashtag+'.json', 'r') as f:
    for line in f:
        tweet = {}
        dict_tweet = json.loads(line)
        tweet['id'] = dict_tweet['id']
        tweet['created_at'] = dict_tweet['created_at']
        tweet['text'] = dict_tweet['text']
        tweet['location'] = dict_tweet['user']['location']
        tweet['timezone'] = dict_tweet['user']['time_zone']
        tweet['coord'] = dict_tweet['coordinates']
        tweet['place'] = dict_tweet['place']
        tweets.append(tweet)
tweets[0]

{'coord': None,
 'created_at': 'Sun Mar 19 17:26:30 +0000 2017',
 'id': 843514025287794689,
 'location': 'New Jersey',
 'place': None,
 'text': '#Career #opportunity for #Python Developer (17-00822) - NY - New York https://t.co/IAzcLi3kbm #ApTask. More here: https://t.co/d4T7jqfBNy',
 'timezone': 'Pacific Time (US & Canada)'}
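
The loader above assumes every line in the file is a complete tweet. If the dump came from a streaming endpoint it may also contain blank lines or notices without a `text` field. A more defensive variant could look like the sketch below; `parse_tweet_line` is a hypothetical helper and the sample line is made up for illustration.

```python
import json


def parse_tweet_line(line):
    """Return the fields of interest from one JSON line, or None if it is not a tweet."""
    line = line.strip()
    if not line:
        return None  # blank line
    record = json.loads(line)
    if 'text' not in record:
        return None  # not a tweet object (for example a notice message)
    user = record.get('user') or {}
    return {
        'id': record.get('id'),
        'created_at': record.get('created_at'),
        'text': record['text'],
        'location': user.get('location'),
        'timezone': user.get('time_zone'),
        'coord': record.get('coordinates'),
        'place': record.get('place'),
    }


# Made-up sample line, not taken from the dataset
sample = '{"id": 1, "created_at": "Sun Mar 19 17:26:30 +0000 2017", "text": "hello #brexit", "user": {"location": "London"}}'
parsed = parse_tweet_line(sample)
```

Missing keys become `None`, which pandas treats as missing values, so the `count()` output keeps the same meaning.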

In [16]:
df_tweets = pd.DataFrame.from_dict(tweets)

In [17]:
df_tweets.count()

coord           0
created_at    616
id            616
location      454
place           6
text          616
timezone      397
dtype: int64

In [18]:
sid = SentimentIntensityAnalyzer()

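Compound score
The compound score is a normalized value between -1 (most negative) and +1 (most positive). This notebook labels a tweet as:
- positive: compound score >= 0.5
- neutral: -0.5 < compound score < 0.5
- negative: compound score <= -0.5

These cut-offs are strict; the vaderSentiment README suggests ±0.05 as typical thresholds, so expect most tweets to fall in the neutral class here.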
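
The threshold rule can be isolated from the analyzer, which makes it easy to check without downloading the lexicon. This is a minimal sketch; `label_from_compound` and its `threshold` parameter are hypothetical names, with the default taken from the list above.

```python
def label_from_compound(compound, threshold=0.5):
    """Map a compound score in [-1, 1] to a sentiment label."""
    if compound >= threshold:
        return 'positive'
    if compound <= -threshold:
        return 'negative'
    return 'neutral'


labels = [label_from_compound(c) for c in (0.8, 0.5, 0.49, 0.0, -0.5, -0.9)]
```

Both boundaries are inclusive, so a score of exactly 0.5 or -0.5 is not neutral. A different cut-off can be passed through `threshold` to see how sensitive the class sizes are to it.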

In [19]:
def sentiment(x):
    """Label a tweet row as 'positive', 'negative' or 'neutral' from its compound score."""
    compound = sid.polarity_scores(x['text'])['compound']
    if compound >= 0.5:
        return 'positive'
    if compound <= -0.5:
        return 'negative'
    return 'neutral'

In [20]:
df_tweets['sentiment'] = df_tweets.apply(sentiment, axis=1)

In [21]:
df_tweets.head(2)

Unnamed: 0,coord,created_at,id,location,place,text,timezone,sentiment
0,,Sun Mar 19 17:26:30 +0000 2017,843514025287794689,New Jersey,,#Career #opportunity for #Python Developer (17...,Pacific Time (US & Canada),neutral
1,,Sun Mar 19 17:26:32 +0000 2017,843514034590769152,"London, England",,RT @raamana_: So @MathWorks itself put togethe...,,neutral


In [22]:
df_tweets.count()

coord           0
created_at    616
id            616
location      454
place           6
text          616
timezone      397
sentiment     616
dtype: int64

In [23]:
df_tweets.groupby('sentiment')['id'].count()

sentiment
negative     88
neutral     440
positive     88
Name: id, dtype: int64
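
Raw counts are easier to compare as shares of the total. The sketch below recomputes them from the three counts printed above; the variable names are illustrative.

```python
# Counts copied from the groupby output above
counts = {'negative': 88, 'neutral': 440, 'positive': 88}

total = sum(counts.values())
# Percentage of tweets per label, rounded to one decimal place
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}

for label, share in shares.items():
    print(f'{label:<8} {share:5.1f}%')
```

With the ±0.5 thresholds, roughly seven tweets in ten are labelled neutral, and the positive and negative classes are the same size.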

In [24]:
pd.options.display.max_colwidth = 266

In [25]:
df_tweets[(df_tweets['sentiment']=='positive')].head(5) 

Unnamed: 0,coord,created_at,id,location,place,text,timezone,sentiment
4,,Sun Apr 02 16:50:06 +0000 2017,848578298447876096,The EU Hegemony,,Angels &amp; ministers of grace defend us!\n\nIts Armageddon! \n\nAbandon all hope the end is nigh!\n\n#Brexit is the harbing… https://t.co/J9X9IUYXSm,,positive
18,,Sun Apr 02 16:50:12 +0000 2017,848578322640642049,London,,I'm sure those who will suffer a massive drop in income will love that the holidays they can't afford will require a blue passport #Brexit,London,positive
23,,Sun Apr 02 16:50:18 +0000 2017,848578347571585024,,,RT @socioblah: Tory blue passports\nLab/TU won rights bonfire\nRoyal Yacht\nByebye #NHS public service + civic life\nHello ColonialCuckooLand\n#…,Edinburgh,positive
26,,Sun Apr 02 16:50:25 +0000 2017,848578374763204608,,,RT @LostChordof1963: @BathforEurope thanks for the brilliant March today...keep up the good work\n#Bath #Brexit https://t.co/9s6MdWNYuK,,positive
39,,Sun Apr 02 16:50:39 +0000 2017,848578436805402624,"Macclesfield, England","{'id': '8ef32ff56ef11c22', 'url': 'https://api.twitter.com/1.1/geo/id/8ef32ff56ef11c22.json', 'place_type': 'admin', 'name': 'Engeland', 'full_name': 'Engeland, Verenigd Koninkrijk', 'country_code': 'GB', 'country': 'Verenigd Koninkrijk', 'bounding_box': {'type'...","The problem is: Everyone involved want the best deal for #Brexit, except nobody knows what it looks like.",Amsterdam,positive


In [26]:
df_tweets[(df_tweets['sentiment']=='negative')].head(5)

Unnamed: 0,coord,created_at,id,location,place,text,timezone,sentiment
2,,Sun Apr 02 16:50:05 +0000 2017,848578291430830080,"Belfast, Ireland",,RT @stewartcdickson: Serious threat to airlines and business as Brexit mad PM drags us out of Open Skies agreement https://t.co/UjvzjPkCpu,London,negative
8,,Sun Apr 02 16:50:08 +0000 2017,848578306157031424,"The Christmas Barse, Biffin's Bridge",,LET’S HAVE A WAR! #Brexit,London,negative
9,,Sun Apr 02 16:50:09 +0000 2017,848578310707785729,"birmingham, UK",,"I know, let's blow the extra money we were going to spend on the NHS on blue passports, a royal yacht and war with Spain #Brexit",London,negative
10,,Sun Apr 02 16:50:09 +0000 2017,848578311475396609,"Swansea, Wales",,Tories threaten withdrawal of defence cooperation then war at the beginning of #Brexit negotiation. Unfit for office https://t.co/ecy20yseZ9,,negative
11,,Sun Apr 02 16:50:10 +0000 2017,848578312914046976,,,RT @GeraintDaviesMP: Tories threaten withdrawal of defence cooperation then war at the beginning of #Brexit negotiation. Unfit for office h…,,negative


In [27]:
# Save the labelled tweets to ./stg/df_tweets.pkl
dir_df = os.path.join(os.getcwd(), 'stg')
os.makedirs(dir_df, exist_ok=True)  # create the output directory if it is missing
result_fullpath = os.path.join(dir_df, 'df_tweets.pkl')
df_tweets.to_pickle(result_fullpath)