# Twitter Sentiment Analysis

## Abstract

The goal of this project is to use python for determining the sentiment of the given Twitter post. The text can be of following major category- positive, negative, neutral.

Following questions will be answered at the end:

* What is the ratio of positive to negative words for the given trending topic?
* What is my interpretation of the ratio?
* What is the managerial insight that can be offered based on the results?

To perform the analysis will use following python libraries:

* tweepy, to access twitter API
* Stanford Natural Language Toolkit, for natural languages functionalities
* re, for Regular Expression
* string, to perform string manipulations


## Twitter API configuration and tweet extraction

In [1]:
import tweepy
import codecs
import credentials

#### Authenicating user by passing credential keys

In [2]:
def twitter_setup():
    auth = tweepy.OAuthHandler(credentials.CONSUMER_KEY, credentials.CONSUMER_SECRET)
    auth.set_access_token(credentials.ACCESS_TOKEN, credentials.ACCESS_SECRET)
    api = tweepy.API(auth)
    return api

#### Extracting tweets for the keyword='Trump'

In [3]:
api=twitter_setup()
alltweets = []
#max limit to extract tweet is 200 per request
new_tweets = api.user_timeline(screen_name = "Trump",count=200)

alltweets.extend(new_tweets)

oldest = alltweets[-1].id - 1
   
#keep grabbing tweets until there are no tweets left to grab
while len(new_tweets) > 0:
    print("getting tweets before %s" % (oldest))

    #all subsiquent requests use the max_id param to prevent duplicates
    new_tweets = api.user_timeline(screen_name = "Trump",count=200,max_id=oldest)

    #save most recent tweets
    alltweets.extend(new_tweets)
        
    #update the id of the oldest tweet less one
    oldest = alltweets[-1].id - 1

    print("...%s tweets downloaded so far" % (len(alltweets)))

getting tweets before 866641990716465151
...400 tweets downloaded so far
getting tweets before 783314566386110463
...479 tweets downloaded so far
getting tweets before 765283905062727679
...479 tweets downloaded so far


#### Writing tweets in a text file 

In [4]:
file = codecs.open("TrumpTweets.txt", "w", "utf-8")
for tweet in alltweets:
    print(tweet.text)
    file.write(tweet.text)
    file.write("\n")
    
file.close() 

Routed on over 500 acres and forged from what was previously flat and barren desert, @TrumpGolfDubai has become a p… https://t.co/2K5XAz7Yt1
View our 4Q 2017 @TrumpRealty Luxury Market Report, featuring the latest luxury market statistics and analysis… https://t.co/2PrN1C3hBG
1,300 acres of stunning panoramas, rolling Blue Ridge Mountains, large lakes and 210 acres of vines @TrumpWinery https://t.co/421e8vcACW
We are very proud to announce that @TrumpVancouver, developed by the @HolbornGrp, has won the Best International Ho… https://t.co/g3OWQjSxEu
Newly Engaged? Explore our portfolio of award-winning properties to bring your wedding day vision to life… https://t.co/wEoeqr8CYc
Rise and shine! It's the first working Monday of 2018 #MondayMotivation https://t.co/nyfgzZsI6a
Happy Birthday, @EricTrump! 🎂 https://t.co/fTAzMSJlB8
Live from @TrumpTower: Winter Storm Grayson is bringing whiteout conditions to New York City and most of the East C… https://t.co/e8qaErtpLo
As much of the East Coa

#### Now we have generated a file containing tweets for the further analysis. 

## Pre-processing Tweets

In [5]:
#importing packages
import re
import nltk
from nltk.corpus import stopwords
import string
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

#### Clean the tweets using regular expressions

In [6]:
def clean_tweet(tweet):
    #Convert to lower case
    tweets = tweet.lower()
    #Convert www.* or https?://* or  http?://* to URL
    tweet = re.sub('((www\.[^\s]+)|(https?://[^\s]+)|(http?://[^\s]+)|(https:/[^\s]+)|(http?[^\s]+))','URL',tweet)
    #Convert @username to AT_USER
    tweet = re.sub('@[^\s]+',"AT_USER",tweet)
    #Remove additional white spaces
    tweet = re.sub('[\s]+', ' ', tweet)
    #Replace #word with word
    tweet = re.sub(r'#([^\s]+)', r'\1', tweet)
    #Replace all non alphanumeric i.e remove all special characters
    tweet=re.sub(r'[^\w]', ' ', tweet)
    #trim
    tweet = tweet.strip('\'"')
    #tweet= tweet.strip()
    tweet_list=tweet.split()
    return tweet_list


#### Making a list of Stop words

In [7]:
#make a list of stop words from the imported list of stopwords

stop_words = list(stopwords.words('english'))

#append url and at_user as stop words
def stop_word_list(stop_words):
    stop_words.append('AT_USER')
    stop_words.append('URL')
    stop_words.append('RT')
    stop_words.append('a')
    return stop_words

# add these stop words to stopword list
stop_words= stop_word_list(stop_words)

#### Uploading file of positive and negative list of words

In [8]:
# accessing positive and negative set of words

positive_tweets=open('PositiveWords.txt','r').read().split("\n")

negative_tweets=open('NegativeWords.txt','r').read().split("\n")
            

## Determining Sentiment

#### Assign sentiment to each word in the list of cleaned tweets

In [9]:
def determine_sentiment(final_processed_tweet):
    count_sentiment=0
    for words in final_processed_tweet:
        if words in positive_tweets:
            count_sentiment=count_sentiment+1
        if words in negative_tweets:
            count_sentiment=count_sentiment-1
    return count_sentiment

#### Determining sentiment by uploading the text file containing 500 tweets

In [10]:
final_processed_tweet=[]
processed_tweet=[]
#Read the file line by line
fp = open('TrumpTweets.txt', 'r',encoding="utf8")
line = fp.readline()
while line:
    line = fp.readline()

    #Add the cleaned tweets to list
    clean_tweets=clean_tweet(line)
    
    #removing stopwords from the clean_tweets list
    for i in clean_tweets:
        if i in stop_words:
            continue
        else:
            processed_tweet.append(i) 
            
final_processed_tweet.extend(processed_tweet) 

#Add sentiment to words for each tweet
count=determine_sentiment(final_processed_tweet)
if count > 0:
    print('The sentiment is positive')
elif count <0:
    print('The sentiment is negative')
else:
    print('The sentiment is neutral')


print(count)
#end loop
fp.close()

The sentiment is positive
305


## Counting Positive and Negative words from the Cleaned tweets

In [11]:
Positive_word=0
Negative_word=0
for word in final_processed_tweet:
    if word not in stop_words:
        if word in positive_tweets:
            Positive_word+=1
        elif word in negative_tweets:
            Negative_word+=1
print('Positive words are %s' %Positive_word)
print('Negative words are %s' %Negative_word)

Positive words are 332
Negative words are 27


## Ratio of Positive and Negative words in Tweets

In [12]:
Ratio=Positive_word/Negative_word
print('Ratio of Positive words to Negative words is:',Ratio)

Ratio of Positive words to Negative words is: 12.296296296296296


## Analysis on Sentiment

We can clearly see that the twitter sentiment for the given topic is Positive

The ratio of positive to negative words is also greater than 1 which tells that the number of positive words are more in the given twitter sample

Managerial Insights: If anyone is publically associating with the keyword='trump' in social media then it is positively received