# Tweet Sentiment Analysis

*Simple sentiment analysis using TextBlob*

Results, as the share of tweets in each sentiment class:

- Whole tweet dataset (English only): 51% positive, 36% neutral, 13% negative.

- Tweets with #apple or @apple: 52% positive, 41% neutral, 7% negative.

- Tweets with #samsung or @samsung: 50% positive, 37% neutral, 13% negative.

TextBlob appears to label a non-trivial share of tweets about shopping, deals and discounts as negative when they are not (this notebook calls that the False Negative rate). However, we can still compare the sentiment about different companies with each other, assuming that this rate is roughly the same across companies when the samples are large. Relatively speaking, tweets about Apple around Black Friday and Cyber Monday were more positive than tweets about Samsung: 7% of the Apple tweets were classified as negative, versus 13% of the Samsung tweets.

Sentiment analysis can help businesses detect negative tweets and comments early and address customer concerns.

In [1]:
import pandas as pd
import numpy as np

In [2]:
# infer_datetime_format is omitted: it is deprecated since pandas 2.0, where format inference is the default
tweets = pd.read_csv('df_tweets.csv',
                     usecols=['created_at', 'language', 'full_text', 'hashtags', 'user_mentions'],
                     parse_dates=['created_at'], low_memory=False)

In [3]:
# Drop tweets in languages other than English; copy so that columns can be added later without warnings
tweets = tweets[tweets['language'] == 'en'].copy()

In [4]:
print(len(tweets), 'tweets out of the ~1.7 million collected tweets were in English')

1204244 tweets out of the ~1.7 million collected tweets were in English


In [5]:
tweets.head()

| | created_at | language | user_mentions | hashtags | full_text |
|---|---|---|---|---|---|
| 2 | 2017-11-20 11:15:18 | en | [] | ['AMAZON', 'DEALS', 'Christmas', 'holiday', 't... | ['HURRY #AMAZON LIGHTNING #DEALS LIVE &gt; htt... |
| 3 | 2017-11-20 11:15:18 | en | ['blackfriday'] | ['BlackFriday', 'CORSETS', 'dress', 'fashion'] | ['Black Friday Sale- 55% Off\nNayla Brocade Ov... |
| 4 | 2017-11-20 11:15:18 | en | [] | ['ghd', 'Christmas', 'hair', 'BlackFriday', 'B... | ['Black Friday Deals\nSave £20 on the ghd IV S... |
| 5 | 2017-11-20 11:15:19 | en | [] | ['sale', 'save', 'blackfriday', 'bathmat', 'ut... | ['23% discount #sale #save #blackfriday #bathm... |
| 6 | 2017-11-20 11:15:20 | en | [] | ['blackfriday', 'printer', 'deal', 'sale', 'be... | ['Are you Looking for Officejet Printer Visit ... |


In [6]:
# Sentiment analysis
from textblob import TextBlob
import re

def clean_tweet(tweet):
    '''
    Utility function to clean the text in a tweet by removing
    user mentions, links and special characters using regex.
    '''
    # Raw string avoids invalid-escape warnings for \w, \/ and \S
    return ' '.join(re.sub(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)", " ", tweet).split())

def analize_sentiment(tweet):
    '''
    Utility function to classify the polarity of a tweet using TextBlob:
    1 for positive, 0 for neutral, -1 for negative.
    '''
    # Compute the polarity once and reuse it in both comparisons
    polarity = TextBlob(clean_tweet(tweet)).sentiment.polarity
    if polarity > 0:
        return 1
    elif polarity == 0:
        return 0
    else:
        return -1
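
To make the cleaning step concrete, the sketch below applies the same regular expression to a made-up tweet (the sample text is an assumption, not taken from the dataset). It suggests that mentions, links and punctuation are dropped, while the word part of a hashtag is kept without the `#` sign.

```python
import re

def clean_tweet(tweet):
    """Same cleaning rule as in the notebook: drop mentions, special characters and links."""
    return ' '.join(re.sub(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)", " ", tweet).split())

# Hypothetical sample tweet, for illustration only
sample = "@Samsung Great #deal on TVs! https://t.co/abc123"
cleaned = clean_tweet(sample)
print(cleaned)
# Great deal on TVs
```

Because hashtag words survive the cleaning, promotional tags such as `#sale` or `#deals` still feed into the polarity score.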

In [7]:
# Add a column for sentiment
tweets['SA'] = tweets['full_text'].apply(analize_sentiment)

In [8]:
# Count and fraction of total per sentiment class
sentiments = tweets.groupby('SA').agg({'SA': 'count'}).rename(columns={'SA': 'Count'})
sentiments['% of total'] = sentiments['Count'] / sentiments['Count'].sum()
# Map each class code to its label instead of relying on row order
sentiments['Sentiment'] = sentiments.index.map({-1: 'Negative', 0: 'Neutral', 1: 'Positive'})
sentiments = sentiments.reindex(columns=['Sentiment', 'Count', '% of total'])
sentiments

| SA | Sentiment | Count | % of total |
|---|---|---|---|
| -1 | Negative | 157020 | 0.130389 |
| 0 | Neutral | 434201 | 0.360559 |
| 1 | Positive | 613023 | 0.509052 |
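
The same aggregation is repeated for the Apple and Samsung subsets later in the notebook, so it could be wrapped in a small helper. The sketch below is one possible way to do that; `summarize_sentiment` and `LABELS` are hypothetical names that do not appear in the original notebook, and the toy frame stands in for the real data.

```python
import pandas as pd

# Hypothetical mapping from class code to label
LABELS = {-1: 'Negative', 0: 'Neutral', 1: 'Positive'}

def summarize_sentiment(df, column='SA'):
    """Return the count and fraction of rows per sentiment class."""
    # reindex keeps all three classes, even if one is absent from df
    counts = df[column].value_counts().reindex(list(LABELS), fill_value=0)
    return pd.DataFrame(
        {
            'Sentiment': [LABELS[code] for code in counts.index],
            'Count': counts.to_numpy(),
            '% of total': (counts / counts.sum()).to_numpy(),
        },
        index=pd.Index(counts.index, name=column),
    )

# Toy data: two positive, one neutral, one negative
toy = pd.DataFrame({'SA': [1, 1, 0, -1]})
summary = summarize_sentiment(toy)
print(summary)
```

Unlike assigning a fixed list of labels, this version does not depend on all three classes being present in the subset.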


## Tweet Sentiments: Apple vs Samsung

In [9]:
# Convert the hashtags and user_mentions columns from strings to lists of lowercase tags
tweets['hashtags'] = tweets['hashtags'].apply(lambda x: x.strip('[]').lower().replace("'",'').split(', '))
tweets['user_mentions'] = tweets['user_mentions'].apply(lambda x: x.strip('[]').lower().replace("'",'').split(', '))
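
Since both columns hold the string form of a Python list (as the `tweets.head()` output shows), `ast.literal_eval` is an alternative to manual stripping and splitting. The sketch below assumes every cell is a valid list literal; `parse_tag_list` is a hypothetical helper name.

```python
import ast

def parse_tag_list(cell):
    """Parse a string such as "['BlackFriday', 'dress']" into a list of lowercase tags."""
    return [tag.lower() for tag in ast.literal_eval(cell)]

print(parse_tag_list("['BlackFriday', 'CORSETS', 'dress', 'fashion']"))
# ['blackfriday', 'corsets', 'dress', 'fashion']

print(parse_tag_list("[]"))
# []

# For comparison, the strip/replace/split approach turns an empty list into ['']
print("[]".strip('[]').lower().replace("'", '').split(', '))
# ['']
```

The difference for empty lists does not affect the `'apple' in x` membership test, but it matters if the list lengths are ever counted.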

In [10]:
# Flag tweets whose hashtags or user mentions include apple or samsung
tags = tweets['hashtags'] + tweets['user_mentions']
tweets['apple'] = tags.apply(lambda x: 'apple' in x)
tweets['samsung'] = tags.apply(lambda x: 'samsung' in x)

In [11]:
apple = tweets[tweets['apple']]
apple.head()

| | created_at | language | user_mentions | hashtags | full_text | SA | apple | samsung |
|---|---|---|---|---|---|---|---|---|
| 894 | 2017-11-20 11:37:01 | en | [apple] | [twitterpoll, blackfriday, cybermonday, iphone... | ['Please vote! #TwitterPoll #BlackFriday #Cybe... | 0 | True | False |
| 1085 | 2017-11-20 11:42:00 | en | [] | [macbook, blackfriday, apple] | ['#MacBook MEGA MADNESS : #BlackFriday korting... | 0 | True | False |
| 1086 | 2017-11-20 11:42:01 | en | [] | [macbook, blackfriday, apple] | ['#MacBook MEGA MADNESS : #BlackFriday korting... | 0 | True | False |
| 1119 | 2017-11-20 11:43:00 | en | [] | [macbook, apple, blackfriday] | ['#MacBook MEGA Madness : tous les MacBooks #A... | 0 | True | False |
| 1120 | 2017-11-20 11:43:00 | en | [] | [macbook, apple, blackfriday] | ['#MacBook MEGA Madness : tous les MacBooks #A... | 0 | True | False |


In [12]:
samsung = tweets[tweets['samsung']]
samsung.head()

| | created_at | language | user_mentions | hashtags | full_text | SA | apple | samsung |
|---|---|---|---|---|---|---|---|---|
| 1210 | 2017-11-20 11:45:48 | en | [] | [blackfridayfeeling, blackfriday, blackfriday2... | ["Hurry Up! Black Friday at Kohl's is Live....... | -1 | False | True |
| 2821 | 2017-11-20 12:22:37 | en | [] | [blackfriday, blackfriday2017, cybermonday, de... | ["Microsoft's Special Deal on Xbox One S\n#Bla... | 1 | False | True |
| 4309 | 2017-11-20 12:58:51 | en | [samsung, curryspcworld] | [discrimination, badsalestechnique, blackfrida... | ['@Samsung @curryspcworld DISGUSTED with the w... | -1 | False | True |
| 4401 | 2017-11-20 13:00:09 | en | [samsung] | [sale, save, blackfriday, samsung, smarttv] | ['35% discount #sale #save #blackfriday #samsu... | 1 | False | True |
| 4411 | 2017-11-20 13:00:11 | en | [samsung] | [sale, save, blackfriday, samsung, smarttv] | ['35% discount #sale #save #blackfriday #samsu... | 1 | False | True |


In [13]:
# Sentiment around Apple: count and fraction of total per class
apple_SA = apple.groupby('SA').agg({'SA': 'count'}).rename(columns={'SA': 'Count'})
apple_SA['% of total'] = apple_SA['Count'] / apple_SA['Count'].sum()
# Map each class code to its label instead of relying on row order
apple_SA['Sentiment'] = apple_SA.index.map({-1: 'Negative', 0: 'Neutral', 1: 'Positive'})
apple_SA = apple_SA.reindex(columns=['Sentiment', 'Count', '% of total'])
apple_SA

| SA | Sentiment | Count | % of total |
|---|---|---|---|
| -1 | Negative | 239 | 0.073516 |
| 0 | Neutral | 1328 | 0.40849 |
| 1 | Positive | 1684 | 0.517994 |


In [14]:
# Sentiment around Samsung: count and fraction of total per class
samsung_SA = samsung.groupby('SA').agg({'SA': 'count'}).rename(columns={'SA': 'Count'})
samsung_SA['% of total'] = samsung_SA['Count'] / samsung_SA['Count'].sum()
# Map each class code to its label instead of relying on row order
samsung_SA['Sentiment'] = samsung_SA.index.map({-1: 'Negative', 0: 'Neutral', 1: 'Positive'})
samsung_SA = samsung_SA.reindex(columns=['Sentiment', 'Count', '% of total'])
samsung_SA

| SA | Sentiment | Count | % of total |
|---|---|---|---|
| -1 | Negative | 252 | 0.132701 |
| 0 | Neutral | 704 | 0.370721 |
| 1 | Positive | 943 | 0.496577 |
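
One way to check whether the Apple and Samsung shares differ by more than sampling noise is a two-proportion z-test on the counts from the two tables above. This is only a sketch: the test is not part of the original analysis, `two_proportion_z` is a hypothetical helper, and the test treats tweets as independent observations, which the duplicated tweets in the samples likely violate.

```python
import math

def two_proportion_z(hits_a, n_a, hits_b, n_b):
    """z statistic for the difference between two proportions, using the pooled standard error."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# Totals from the Apple and Samsung sentiment tables
apple_total = 239 + 1328 + 1684    # 3251 tweets
samsung_total = 252 + 704 + 943    # 1899 tweets

z_negative = two_proportion_z(239, apple_total, 252, samsung_total)
z_positive = two_proportion_z(1684, apple_total, 943, samsung_total)
print(f"negative share, Apple vs Samsung: z = {z_negative:.2f}")
print(f"positive share, Apple vs Samsung: z = {z_positive:.2f}")
```

With these counts the statistic comes out near -7 for the negative share and near 1.5 for the positive share, so under these assumptions the gap in negative tweets is much clearer than the gap in positive tweets.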


In [15]:
# Example of a tweet about Samsung that is correctly classified as negative (True Negative)
samsung.loc[4309, 'full_text']

'[\'@Samsung @curryspcworld DISGUSTED with the way you have treated a disabled customer this morning! Reviews state u send out smart remote no issues but now refuse! Not easy for a disabled customer to return a 58" tv! #discrimination #badsalestechnique #blackfriday #4kUltra #Samsung\']'

In [16]:
# Example of a tweet about Samsung that is classified as negative but is not actually negative (False Negative)
samsung.loc[1210, 'full_text']

'["Hurry Up! Black Friday at Kohl\'s is Live.....\\n#blackfridayfeeling #BlackFriday #BlackFriday2017 #Kohls #BlackFriday #CyberMonday #KohlsSweeps #deals #giveaway #sale #Samsung #Christmas #Thanksgiving #Retail https://t.co/lXdXouydhD https://t.co/LYyLRMlBPj"]'