## Sentiment Analysis on Tweets

First, we need to collect a set of tweets on a particular subject. Tweepy is a popular Python library for interacting with Twitter, and its "API class provides access to the entire twitter RESTful API methods" ([source](https://tweepy.readthedocs.io/en/latest/getting_started.html)).

In [1]:
import tweepy

Twitter's API does not allow unauthenticated access, so we first create an app at https://developer.twitter.com and state its purpose. We then obtain two keys, `consumer_key` and `consumer_secret`, and pass them to Tweepy's OAuth handler ([source](https://tweepy.readthedocs.io/en/latest/auth_tutorial.html)).
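
The keys are secrets, so it is usually safer not to paste them into the notebook. A minimal sketch of reading them from environment variables follows; the helper name `load_twitter_credentials` and the variable names `TWITTER_CONSUMER_KEY` and `TWITTER_CONSUMER_SECRET` are assumptions made for this example, not part of Tweepy.

```python
import os


def load_twitter_credentials(env=os.environ):
    """Return (consumer_key, consumer_secret) read from the given mapping.

    The variable names are hypothetical; use whatever names you exported.
    """
    try:
        return env["TWITTER_CONSUMER_KEY"], env["TWITTER_CONSUMER_SECRET"]
    except KeyError as missing:
        raise RuntimeError(f"Missing environment variable: {missing}") from None


# Usage (not run here, it needs real credentials and network access):
# key, secret = load_twitter_credentials()
# auth = tweepy.AppAuthHandler(key, secret)
```

Passing the mapping as a parameter keeps the helper easy to try out with a plain dictionary.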

In [9]:
auth = tweepy.AppAuthHandler("--", "--")

Now we can use the auth object to create an API object and use its methods, for example to search for tweets containing a particular term.

In [72]:
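tweets = []
api = tweepy.API(auth)
# Collect the text of up to 500 tweets that match the query.
# Note: Tweepy 4.x renamed api.search to api.search_tweets.
for tweet in tweepy.Cursor(api.search, q='trump').items(500):
    print(tweet.text)
    tweets.append(tweet.text)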

@ElliotSKaufman @DamonLinker @lionel_trolling As for Codevilla, there’s a straight line from “America’s Ruling Clas… https://t.co/Nr6oXOqrEh
RT @seanhannity: BLOWBACK... https://t.co/vEDxvfdzNz
RT @RWPUSA: Trump is giving his middle finger to the Constitution.
He needs to be impeached and removed from office NOW. 

Can anyone lay a…
RT @piersmorgan: *NEW COLUMN*
Boris Johnson's triumph proves democracy-denying radical socialists backed by self-righteous celebrities on T…
RT @kylegriffin1: Trump's senior aides have further restricted the number of admin officials allowed to listen to Trump's phone calls with…
RT @RealMattCouch: Bernie Sanders says that President Trump is the most dangerous President in the history of the United States..

Record S…
RT @RadioFreeTom: People ask me when I think protest is appropriate and matters, since I'm usually not a fan. 
This, right here, should bri…
Dobbs breaks down phase one of Trump's China trade deal
https://t.co/oJwWfxjFA2
RT @Danie1607: "It’s a

#### We are going to use NumPy to store and manipulate the data.

In [74]:
import numpy as np

In [75]:
tweets_copy = np.array(tweets)

In [76]:
tweets_copy[:20]

array(['@ElliotSKaufman @DamonLinker @lionel_trolling As for Codevilla, there’s a straight line from “America’s Ruling Clas… https://t.co/Nr6oXOqrEh',
       'RT @seanhannity: BLOWBACK... https://t.co/vEDxvfdzNz',
       'RT @RWPUSA: Trump is giving his middle finger to the Constitution.\nHe needs to be impeached and removed from office NOW. \n\nCan anyone lay a…',
       "RT @piersmorgan: *NEW COLUMN*\nBoris Johnson's triumph proves democracy-denying radical socialists backed by self-righteous celebrities on T…",
       "RT @kylegriffin1: Trump's senior aides have further restricted the number of admin officials allowed to listen to Trump's phone calls with…",
       'RT @RealMattCouch: Bernie Sanders says that President Trump is the most dangerous President in the history of the United States..\n\nRecord S…',
       "RT @RadioFreeTom: People ask me when I think protest is appropriate and matters, since I'm usually not a fan. \nThis, right here, should bri…",
       "Dobbs breaks down

Now we need to preprocess the tweets. We use the Tweet Preprocessor package to do this ([source](https://pypi.org/project/tweet-preprocessor/)).

In [77]:
import preprocessor as prep

In [78]:
for i, tweet in enumerate(tweets_copy):
    tweets_copy[i] = prep.clean(tweet)

In [79]:
tweets_copy[:20]

array(['As for Codevilla, there’s a straight line from “America’s Ruling Clas…',
       ': BLOWBACK...',
       ': Trump is giving his middle finger to the Constitution. He needs to be impeached and removed from office NOW. Can anyone lay a…',
       ": *NEW COLUMN* Boris Johnson's triumph proves democracy-denying radical socialists backed by self-righteous celebrities on T…",
       ": Trump's senior aides have further restricted the number of admin officials allowed to listen to Trump's phone calls with…",
       ': Bernie Sanders says that President Trump is the most dangerous President in the history of the United States.. Record S…',
       ": People ask me when I think protest is appropriate and matters, since I'm usually not a fan. This, right here, should bri…",
       "Dobbs breaks down phase one of Trump's China trade deal",
       ': "It’s a horrible thing to be using the tool of impeachment which is supposed to be used in an emergency!" …',
       ': Based on the libturds t

As we can see, the tweets need some further preprocessing.

In [80]:
import re

In [153]:
#source: https://stackoverflow.com/questions/5843518/remove-all-special-characters-punctuation-and-spaces-from-string
for i, tweet in enumerate(tweets_copy):
    tweets_copy[i] = re.sub('[^A-Za-z0-9 ]+', '', tweet)
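
The pattern above keeps only ASCII letters, digits, and spaces. As a small sketch of its effect on a single string, the function below applies the same pattern and additionally trims the whitespace left at the edges; `strip_special` is a hypothetical helper name and the `.strip()` call is an addition, not part of the original cell.

```python
import re

# Same pattern as in the cell above: drop every run of characters that is
# not an ASCII letter, digit, or space.
NON_ALNUM = re.compile(r"[^A-Za-z0-9 ]+")


def strip_special(text):
    """Remove special characters, then trim leftover edge whitespace."""
    return NON_ALNUM.sub("", text).strip()


sample = ": Trump's senior aides have further restricted the number of admin officials"
print(strip_special(sample))
```

Note that apostrophes and hyphens are removed without a replacement, so "Trump's" becomes "Trumps" and "democracy-denying" becomes "democracydenying". This may matter later if the sentiment analyzer looks words up in a lexicon.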

In [88]:
tweets_copy[:20]

array(['As for Codevilla theres a straight line from Americas Ruling Clas',
       ' BLOWBACK',
       ' Trump is giving his middle finger to the Constitution He needs to be impeached and removed from office NOW Can anyone lay a',
       ' NEW COLUMN Boris Johnsons triumph proves democracydenying radical socialists backed by selfrighteous celebrities on T',
       ' Trumps senior aides have further restricted the number of admin officials allowed to listen to Trumps phone calls with',
       ' Bernie Sanders says that President Trump is the most dangerous President in the history of the United States Record S',
       ' People ask me when I think protest is appropriate and matters since Im usually not a fan This right here should bri',
       'Dobbs breaks down phase one of Trumps China trade deal',
       ' Its a horrible thing to be using the tool of impeachment which is supposed to be used in an emergency ',
       ' Based on the libturds threshold for Impeachment Obama should have 

In [90]:
tweets[:20]

['@ElliotSKaufman @DamonLinker @lionel_trolling As for Codevilla, there’s a straight line from “America’s Ruling Clas… https://t.co/Nr6oXOqrEh',
 'RT @seanhannity: BLOWBACK... https://t.co/vEDxvfdzNz',
 'RT @RWPUSA: Trump is giving his middle finger to the Constitution.\nHe needs to be impeached and removed from office NOW. \n\nCan anyone lay a…',
 "RT @piersmorgan: *NEW COLUMN*\nBoris Johnson's triumph proves democracy-denying radical socialists backed by self-righteous celebrities on T…",
 "RT @kylegriffin1: Trump's senior aides have further restricted the number of admin officials allowed to listen to Trump's phone calls with…",
 'RT @RealMattCouch: Bernie Sanders says that President Trump is the most dangerous President in the history of the United States..\n\nRecord S…',
 "RT @RadioFreeTom: People ask me when I think protest is appropriate and matters, since I'm usually not a fan. \nThis, right here, should bri…",
 "Dobbs breaks down phase one of Trump's China trade deal\nhttps://

Looks good! Now we are going to use TextBlob to perform sentiment analysis ([source](https://textblob.readthedocs.io/en/dev/)).

In [92]:
from textblob import TextBlob

In [94]:
!python -m textblob.download_corpora

[nltk_data] Downloading package brown to /Users/iDev/nltk_data...
[nltk_data]   Unzipping corpora/brown.zip.
[nltk_data] Downloading package punkt to /Users/iDev/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package wordnet to /Users/iDev/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package averaged_perceptron_tagger to
[nltk_data]     /Users/iDev/nltk_data...
[nltk_data]   Package averaged_perceptron_tagger is already up-to-
[nltk_data]       date!
[nltk_data] Downloading package conll2000 to /Users/iDev/nltk_data...
[nltk_data]   Unzipping corpora/conll2000.zip.
[nltk_data] Downloading package movie_reviews to
[nltk_data]     /Users/iDev/nltk_data...
[nltk_data]   Unzipping corpora/movie_reviews.zip.
Finished.


In [97]:
blobs = []
for tweet in tweets_copy:
    blobs.append(TextBlob(tweet))

In [98]:
blobs[0]

TextBlob("As for Codevilla theres a straight line from Americas Ruling Clas")

In [99]:
blobs[0].tags

[('As', 'IN'),
 ('for', 'IN'),
 ('Codevilla', 'NNP'),
 ('theres', 'VBZ'),
 ('a', 'DT'),
 ('straight', 'JJ'),
 ('line', 'NN'),
 ('from', 'IN'),
 ('Americas', 'NNP'),
 ('Ruling', 'NNP'),
 ('Clas', 'NNP')]

In [105]:
polarity = np.zeros(500)
for i, blob in enumerate(blobs):
    polarity[i] = blob.sentiment.polarity

In [106]:
polarity[:20]

array([ 0.2       ,  0.        ,  0.        ,  0.21818182,  0.        ,
       -0.05      ,  0.30357143, -0.15555556, -1.        ,  0.        ,
       -0.4       ,  0.        , -0.125     ,  0.        ,  0.        ,
        0.        ,  0.1       ,  0.        ,  0.        , -0.38958333])
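
TextBlob reports polarity as a float between -1.0 (negative) and 1.0 (positive), and its default analyzer is lexicon-based. To give a rough intuition for how such a score can arise, here is a deliberately simplified sketch that averages per-word scores from a tiny lexicon. The lexicon values and the function `toy_polarity` are invented for illustration; this is not TextBlob's actual algorithm.

```python
# Hypothetical mini-lexicon: word -> polarity in [-1.0, 1.0].
# These values are made up for illustration, not taken from TextBlob.
TOY_LEXICON = {"horrible": -1.0, "dangerous": -0.6, "good": 0.7, "straight": 0.2}


def toy_polarity(text, lexicon=TOY_LEXICON):
    """Average the polarity of the words found in the lexicon (0.0 if none)."""
    scores = [lexicon[word] for word in text.lower().split() if word in lexicon]
    return sum(scores) / len(scores) if scores else 0.0


print(toy_polarity("its a horrible thing"))  # one known word
print(toy_polarity("blowback"))              # no known words
```

Texts without any known word score 0.0 in this sketch, which is one plausible reason why many short tweets end up with a neutral polarity.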

In [109]:
tags = np.zeros(500)

for i, num in enumerate(polarity):
    # elif keeps the three cases mutually exclusive, so a negative tag
    # is not overwritten by the final else branch.
    if num < 0:
        tags[i] = -1
    elif num > 0:
        tags[i] = +1
    else:
        tags[i] = 0

In [111]:
tags[:20]

array([ 1.,  0.,  0.,  1.,  0., -1.,  1., -1., -1.,  0., -1.,  0., -1.,
        0.,  0.,  0.,  1.,  0.,  0., -1.])

In [116]:
result = np.vstack((tweets, polarity, tags))

In [118]:
result[:20]

array([['@ElliotSKaufman @DamonLinker @lionel_trolling As for Codevilla, there’s a straight line from “America’s Ruling Clas… https://t.co/Nr6oXOqrEh',
        'RT @seanhannity: BLOWBACK... https://t.co/vEDxvfdzNz',
        'RT @RWPUSA: Trump is giving his middle finger to the Constitution.\nHe needs to be impeached and removed from office NOW. \n\nCan anyone lay a…',
        ...,
        'RT @MaxKennerly: If we had absolutely no information about what Trump did with Ukraine, it would still be shocking and incriminating that t…',
        'RT @BernieSanders: Despair is not an option. Stand up, get involved, organize. That is the only way we defeat Trump.',
        'RT @seanhannity: BLOWBACK... https://t.co/vEDxvfdzNz'],
       ['0.2', '0.0', '0.0', ..., '-0.55', '0.0', '0.0'],
       ['1.0', '0.0', '0.0', ..., '-1.0', '0.0', '0.0']], dtype='<U144')
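
Because `np.vstack` needs a single dtype, the stacked array stores the scores as strings (`dtype='<U144'`). If the numbers are needed later, one alternative is to keep plain Python records and summarize them with `collections.Counter`. The sketch below uses the first 20 polarity scores printed earlier; the helper name `label` is an assumption made for this example.

```python
from collections import Counter


def label(polarity):
    """Map a polarity score to -1 (negative), 0 (neutral), or +1 (positive)."""
    if polarity < 0:
        return -1
    if polarity > 0:
        return 1
    return 0


# First 20 polarity scores as printed earlier in this notebook.
polarity = [0.2, 0.0, 0.0, 0.21818182, 0.0, -0.05, 0.30357143, -0.15555556,
            -1.0, 0.0, -0.4, 0.0, -0.125, 0.0, 0.0, 0.0, 0.1, 0.0, 0.0,
            -0.38958333]

# (score, label) pairs keep the numeric types intact.
records = [(score, label(score)) for score in polarity]
counts = Counter(lbl for _, lbl in records)
print(counts)
```

For these 20 scores the counts come out as 10 neutral, 6 negative, and 4 positive.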

Now we can do the same using NLTK.

In [144]:
import nltk
from nltk.tokenize import word_tokenize

In [145]:
# Repeat the cleaning steps from above on a fresh copy of the tweets
tweets_copy = np.array(tweets)

for i, tweet in enumerate(tweets_copy):
    tweets_copy[i] = re.sub('[^A-Za-z0-9 ]+', '', prep.clean(tweet))

In [146]:
tweets_copy[:20]

array(['As for Codevilla theres a straight line from Americas Ruling Clas',
       ' BLOWBACK',
       ' Trump is giving his middle finger to the Constitution He needs to be impeached and removed from office NOW Can anyone lay a',
       ' NEW COLUMN Boris Johnsons triumph proves democracydenying radical socialists backed by selfrighteous celebrities on T',
       ' Trumps senior aides have further restricted the number of admin officials allowed to listen to Trumps phone calls with',
       ' Bernie Sanders says that President Trump is the most dangerous President in the history of the United States Record S',
       ' People ask me when I think protest is appropriate and matters since Im usually not a fan This right here should bri',
       'Dobbs breaks down phase one of Trumps China trade deal',
       ' Its a horrible thing to be using the tool of impeachment which is supposed to be used in an emergency ',
       ' Based on the libturds threshold for Impeachment Obama should have 

In [147]:
for i, tweet in enumerate(tweets_copy):
    # Lowercase and tokenize; no stemming is applied at this stage.
    tokens = word_tokenize(tweet.lower())
    tweets_copy[i] = " ".join(tokens)

In [148]:
tweets_copy[:20]

array(['as for codevilla theres a straight line from americas ruling clas',
       'blowback',
       'trump is giving his middle finger to the constitution he needs to be impeached and removed from office now can anyone lay a',
       'new column boris johnsons triumph proves democracydenying radical socialists backed by selfrighteous celebrities on t',
       'trumps senior aides have further restricted the number of admin officials allowed to listen to trumps phone calls with',
       'bernie sanders says that president trump is the most dangerous president in the history of the united states record s',
       'people ask me when i think protest is appropriate and matters since im usually not a fan this right here should bri',
       'dobbs breaks down phase one of trumps china trade deal',
       'its a horrible thing to be using the tool of impeachment which is supposed to be used in an emergency',
       'based on the libturds threshold for impeachment obama should have been impe

In [149]:
posed_tweets = []
for i, tweet in enumerate(tweets_copy):
    tokenized_tweet = word_tokenize(tweet)
    posed_tweets.append(nltk.pos_tag(tokenized_tweet))

In [150]:
# The per-tweet tag lists differ in length, so store them as an object array.
posed_tweets = np.array(posed_tweets, dtype=object)

In [151]:
posed_tweets[0]

[('as', 'IN'),
 ('for', 'IN'),
 ('codevilla', 'NN'),
 ('theres', 'VBZ'),
 ('a', 'DT'),
 ('straight', 'JJ'),
 ('line', 'NN'),
 ('from', 'IN'),
 ('americas', 'NN'),
 ('ruling', 'NN'),
 ('clas', 'NNS')]

### Conclusion
To implement sentiment analysis with NLTK, we need a corpus to compare the text against. TextBlob provides this built in, and it also extracts the parts of the text that most influence its sentiment. In NLTK, sentiment analysis can be framed as a classification task, but that requires a labeled dataset for feature extraction and training.
### Future Work
Labeled datasets could be used to measure the performance of the algorithm. It would be interesting to see how stemming or lemmatization affects the accuracy.
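
As a minimal sketch of such a measurement, the function below computes plain accuracy, that is, the fraction of tweets whose predicted tag matches a hand-assigned one. The label lists are hypothetical and exist only to show the call; no labeled data was collected in this notebook.

```python
def accuracy(predicted, expected):
    """Fraction of positions where the predicted label equals the expected one."""
    if len(predicted) != len(expected):
        raise ValueError("label lists must have the same length")
    if not expected:
        raise ValueError("cannot score an empty dataset")
    matches = sum(p == e for p, e in zip(predicted, expected))
    return matches / len(expected)


# Hypothetical labels, for illustration only.
predicted = [1, 0, -1, 1, 0]
gold = [1, -1, -1, 1, 1]
print(accuracy(predicted, gold))  # 3 of 5 labels match
```

Running the same comparison with and without stemming would show whether that step changes the score.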

## Version History

* June 2018 - Project Created
* December 2019 - Project Updated