# Twitter Sentiment Analysis using Python
This project aims to conduct sentiment analysis of a particular topic by parsing the tweets fetched from Twitter using Python.

## What is sentiment analysis?
Sentiment Analysis is the process of ‘computationally’ determining whether a piece of writing is positive, negative or neutral. It’s also known as opinion mining: deriving the opinion or attitude of a speaker or set of speakers.
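To make "computationally" determining sentiment concrete, here is a deliberately tiny sketch of lexicon-based scoring. It counts positive and negative words and maps the balance to a score between -1 and +1. The word lists and the formula are assumptions for illustration only. TextBlob, used later in this project, ships its own lexicon and scoring rules.

```python
# A toy lexicon-based polarity scorer. The word lists are made up for
# illustration; this is NOT how TextBlob computes polarity.
POSITIVE = {"good", "great", "happy", "love", "amazing"}
NEGATIVE = {"bad", "sad", "hate", "awful", "terrible"}


def polarity(text: str) -> float:
    """Return (pos - neg) / (pos + neg), a score in [-1, 1], or 0.0 if no lexicon words occur."""
    words = text.lower().split()
    pos = sum(word in POSITIVE for word in words)
    neg = sum(word in NEGATIVE for word in words)
    if pos + neg == 0:
        return 0.0
    return (pos - neg) / (pos + neg)


def label(text: str) -> str:
    """Bucket the score into positive / negative / neutral."""
    score = polarity(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(label("what a great happy day"))
print(label("this is sad and awful"))
print(label("the meeting is at noon"))
```

Real scorers also have to cope with negation, intensifiers and sarcasm. Those cases are where the limitations seen later in this project come from.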

## Benefits of Sentiment Analysis
### Business 
In marketing, companies use sentiment analysis to develop strategies, understand customers’ attitudes towards products or brands, learn how people respond to strategies, campaigns, or product launches, and gain deeper insight into why consumers may or may not like certain products, support certain teams or companies, and so on.

### Politics
In politics, SA is used to keep track of political points of view, to detect consistency or inconsistency between statements and actions at the government level, to monitor elections, and as a proxy for election predictions (who will win).

### Public Actions
Sentiment analysis is also used to monitor and analyze social phenomena. For instance, the intelligence community uses SA to track potentially dangerous situations, to gauge the general mood of the blogosphere in high-threat areas, and to monitor a criminal suspect or a network of suspects.

## Project Programs / Packages Needed:
For this project, Tweepy and TextBlob will be used. Tweepy is a Python client for the official Twitter API, and TextBlob is a Python library for processing textual data (which is what tweets are!). We'll also use the Natural Language Toolkit (NLTK), the Python library on which TextBlob is built.
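The first code cell fails with a `ModuleNotFoundError` if any of these packages is missing. A quick check can save a confusing traceback. The sketch below uses only the standard library (`importlib.util.find_spec`). The package list simply mirrors the paragraph above:

```python
import importlib.util

# Packages this walkthrough relies on (TextBlob is built on NLTK).
REQUIRED = ["tweepy", "textblob", "nltk"]


def missing_packages(names=REQUIRED):
    """Return the names from `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages()
if missing:
    print("Install first, e.g.: pip install " + " ".join(missing))
else:
    print("All required packages are installed.")
```

TextBlob also needs its corpora, which can be fetched with `python -m textblob.download_corpora`.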

## Setup
### Authentication
In order to fetch tweets through Twitter's API, one needs to register an app through a Twitter account. This was done as follows:

1) Navigated to https://apps.twitter.com/ and clicked ‘Create New App’.
2) Completed the application details and made the mock application.
3) Copied the necessary ‘Consumer Key’, ‘Consumer Secret’, ‘Access token’ and ‘Access Token Secret’ for app development.

'Consumer Key': `YOUR_CONSUMER_KEY`
'Consumer Secret': `YOUR_CONSUMER_SECRET`
'Access Token': `YOUR_ACCESS_TOKEN`
'Access Token Secret': `YOUR_ACCESS_TOKEN_SECRET`
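Hard-coding these four values in a notebook makes them easy to leak when the notebook is shared. One common alternative is to read them from the environment at runtime. The sketch below assumes you export them under the hypothetical variable names shown:

```python
import os

# Hypothetical environment-variable names; choose whatever fits your setup.
CREDENTIAL_VARS = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
}


def load_credentials(env=os.environ):
    """Read the four Twitter credentials from `env`, failing loudly if any is missing."""
    missing = [var for var in CREDENTIAL_VARS.values() if not env.get(var)]
    if missing:
        raise RuntimeError("Missing credentials: " + ", ".join(missing))
    return {name: env[var] for name, var in CREDENTIAL_VARS.items()}
```

The returned dictionary could then be used as `OAuthHandler(creds["consumer_key"], creds["consumer_secret"])` and `set_access_token(creds["access_token"], creds["access_token_secret"])` in place of the literal strings.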

### Finally, on to implementing our program in Python!

In [1]:
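```python
# First, we import the packages necessary for our analysis:
import re
import tweepy
from tweepy import OAuthHandler
from textblob import TextBlob
```

If this cell fails with `ModuleNotFoundError: No module named 'tweepy'`, install the packages first. For example, run `pip install "tweepy<4" textblob`; the code below uses the tweepy 3.x API.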
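Before building the full client class, it helps to see what its tweet-cleaning step does in isolation. The sketch below reuses the same regular expression as `clean_tweet` in the next cell, compiled once and written as a raw string. The first sample tweet is made up:

```python
import re

# Same pattern as clean_tweet in the TwitterClient class. Alternatives:
# @mentions | any character that is not a letter, digit, space or tab | URLs.
# The raw string passes \w, \/ and \S to the regex engine unchanged.
TWEET_NOISE = re.compile(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)")


def clean_tweet(tweet: str) -> str:
    """Replace mentions, URLs and special characters with spaces, then collapse whitespace."""
    return " ".join(TWEET_NOISE.sub(" ", tweet).split())


print(clean_tweet("RT @user42: Happy Friday!!! https://t.co/abc123"))
print(clean_tweet("@Peruzzi_VIBES: I celebrate everything!"))
```

The "RT" retweet marker survives cleaning. Because the mention pattern stops at letters and digits, a handle such as `@Peruzzi_VIBES` is only partly removed: the `_` is stripped as a special character and `VIBES` is left behind.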

In [34]:
```python
# Next we create a generic Twitter class for our analysis.
#
# The TwitterClient class contains all the methods needed to interact with the
# Twitter API and to parse tweets. Its initialization (__init__) method handles
# authentication of the Twitter API client.
# Note: this uses the tweepy 3.x API (API.search, tweepy.TweepError); tweepy 4.x
# renamed these to API.search_tweets and tweepy.TweepyException.

class TwitterClient(object):
    '''
    Generic Twitter class for sentiment analysis.
    '''
    def __init__(self):
        '''
        Authenticate with the Twitter API.
        '''
        # keys and tokens from the Twitter Dev Console (replace with your own)
        consumer_key = 'YOUR_CONSUMER_KEY'
        consumer_secret = 'YOUR_CONSUMER_SECRET'
        access_token = 'YOUR_ACCESS_TOKEN'
        access_token_secret = 'YOUR_ACCESS_TOKEN_SECRET'

        # attempt authentication
        try:
            # create OAuthHandler object
            self.auth = OAuthHandler(consumer_key, consumer_secret)
            # set access token and secret
            self.auth.set_access_token(access_token, access_token_secret)
            # create tweepy API object to fetch tweets
            self.api = tweepy.API(self.auth)
        except Exception as e:
            print("Error: Authentication Failed: " + str(e))

    def clean_tweet(self, tweet):
        '''
        Remove @mentions, hyperlinks and special characters from tweet text.
        '''
        # raw string, so \w, \/ and \S reach the regex engine unchanged
        return ' '.join(re.sub(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)", " ", tweet).split())

    def get_tweet_sentiment(self, tweet):
        '''
        Classify the sentiment of the passed tweet
        using TextBlob's built-in sentiment analysis.
        '''
        # create TextBlob object of the cleaned tweet text
        analysis = TextBlob(self.clean_tweet(tweet))
        # polarity is a float in [-1, 1]
        if analysis.sentiment.polarity > 0:
            return 'positive'
        elif analysis.sentiment.polarity < 0:
            return 'negative'
        else:
            return 'neutral'

    def get_tweets(self, query, count=10):
        '''
        Fetch tweets matching the query and parse them into a list of dictionaries.
        '''
        # empty list to store parsed tweets
        tweets = []

        try:
            # call the Twitter API to fetch tweets
            # (the standard search endpoint returns at most 100 tweets per request)
            fetched_tweets = self.api.search(q=query, count=count)

            # parse tweets one by one
            for tweet in fetched_tweets:
                # dictionary to store the required fields of a tweet
                parsed_tweet = {}

                # save the text of the tweet
                parsed_tweet['text'] = tweet.text
                # save the sentiment of the tweet
                parsed_tweet['sentiment'] = self.get_tweet_sentiment(tweet.text)

                # append the parsed tweet to the tweets list
                if tweet.retweet_count > 0:
                    # if the tweet has retweets, ensure that it is appended only once
                    if parsed_tweet not in tweets:
                        tweets.append(parsed_tweet)
                else:
                    tweets.append(parsed_tweet)

        except tweepy.TweepError as e:
            # print the error (if any)
            print("Error : " + str(e))

        # return the parsed tweets (an empty list if the request failed)
        return tweets

# Then we define and call our main function to get back all our tweets:
def main(q):
    # create an object of the TwitterClient class
    api = TwitterClient()
    # call the function to get tweets mentioning a specific phrase
    tweets = api.get_tweets(query=q, count=5000)
    if not tweets:
        print("No tweets found.")
        return

    # pick positive tweets from tweets
    ptweets = [tweet for tweet in tweets if tweet['sentiment'] == 'positive']
    # percentage of positive tweets
    print("Positive tweets percentage: {} %".format(100*len(ptweets)/len(tweets)))
    # pick negative tweets from tweets
    ntweets = [tweet for tweet in tweets if tweet['sentiment'] == 'negative']
    # percentage of negative tweets
    print("Negative tweets percentage: {} %".format(100*len(ntweets)/len(tweets)))

    # print the first 5 positive tweets
    print("\n\nPositive tweets:")
    for tweet in ptweets[:5]:
        print(tweet['text'])

    # print the first 5 negative tweets
    print("\n\nNegative tweets:")
    for tweet in ntweets[:5]:
        print(tweet['text'])

if __name__ == "__main__":
    # call the main function with a simple test query to check that the program works
    main('sad')
```

```
Positive tweets percentage: 20.512820512820515 %
Negative tweets percentage: 70.51282051282051 %


Positive tweets:
RT @WeAreQ4451: What a shame!! How can anyone in the Liberla party, MSM or any sane person justify this? I can honestly say I'm embarrassed…
RT @itsyaboyyheng: gets home from a great day with friends

don’t do it 
don’t do it
don’t do it
don’t do it
don’t do it 
don’t do it
don’t…
RT @Ziyaatulhaqq: I’m not even pro marriage. But I’ve noticed how women have nothing good to say about marriage ths days. Please MARRIAGE I…
Watching #ChildrenInNeed. In bits. So many sad stories. So many amazing brave kids...x
why is it when i am rwally sad that nobody is online lmao


Negative tweets:
RT @namtiddies: i still can't get over the fact that fucking tata hides in his dimple im so fucking sad  https://t.co/SsdLnJ1Ndr
RT @JULIIIEETTTEEE: ...i just know this is gonna be sad as fuck😔 https://t.co/a6F9DQbC3N
RT @Runiktv: i got a happy ass personality w a sad soul, sorry if i b acting we
```

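As we would expect, we see far more negative than positive tweets under the search term "sad". Note that although `main` requests `count = 5000`, the standard search endpoint returns at most 100 tweets per request, and duplicate retweets are then dropped, so each run here is based on fewer than 100 tweets. We can check the classifier the other way by looking for "happy" tweets: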

In [31]:
```python
main('happy')
```

```
Positive tweets percentage: 84.44444444444444 %
Negative tweets percentage: 8.88888888888889 %


Positive tweets:
RT @Peruzzi_VIBES: I celebrate everything! I catch every cruise!! I enjoy every minute of my life and these moments!!! Won’t even act like…
So like y’all are happy that school is closed due to the air condition ?
People are making jokes cause others are w… https://t.co/U1iGsPDNlm
RT @ohteenquotes: Being happy is a very personal thing and it has nothing to do with someone else.
Happy Friday everyone. We hope that everyone has a great wonderful weekend!!!!!!! #theoutcastclub2 #Happy #Friday… https://t.co/3wvuRWNQdK
@grounder761 Hope your day was a good one Chris! Wishing you a happy weekend ☺️
RT @MichaelaOkla: If you’re happy and you know it yeet your hands
[yeet yeet]
RT @OURUFNEKS: We know we belong to the land, and the land we belong to is GRAND! 

Happy Birthday, Oklahoma! https://t.co/NOva37ME30
RT @sahluwal: Happy Friday! Laura Ingram’s radio show was dropped by Fox.
R
```

Happy is overwhelmingly positive! Now let's try a real topic: the search term "Iran Nuclear Deal". Why? We want to see how the twitterverse feels about it right now, as a rough proxy for whether people are thinking or speaking about Iran positively or negatively:

In [35]:
```python
main('iran nuclear deal')
```

```
Positive tweets percentage: 22.972972972972972 %
Negative tweets percentage: 18.91891891891892 %


Positive tweets:
✌ @Reading "Iran deal: More sanctions from Trump mean misery for Iran's fliers" https://t.co/9dIIOP89GZ
@JoyVBehar How is Democracy at risk? Passing huge legislation after lying about keeping your doctor and coverage, “… https://t.co/GbL6tEKAXH
Aside from all the accusations, who in their right mind would make a deal w/ Trump, when he just tore up the Iran N… https://t.co/7A8zDWs1Nw
RT @HeshmatAlavi: 9)
…
Here @michaelkugelman cites a “scholar” to promote the main purpose of his article:

Trump adopting Obama’s approach…
RT @TheIranPulse: The reimposition of #sanctions is directly hitting civil aviation in #Iran, dashing the nuclear deal’s promises of safer…


Negative tweets:
RT @AtlanticCouncil: [ANALYSIS] Macron has yet to achieve any major foreign policy breakthroughs and his attempts to persuade Trump to stay…
RT @dotcomdon1: @BreitbartNews If you think that’s bad, w
```

Interestingly, slightly more tweets about the deal are positive than negative (about 23% versus 19%), while most fall into neither bucket. This can be deceiving, however: people could be speaking positively about withdrawal from the Iran nuclear deal while holding negative sentiment toward Iran, or toward the deal as a whole. A single polarity score per tweet cannot tell us what the sentiment is directed at, which is one limitation of sentiment analysis that we can see in action here.

Let's try this again. Lately, LaMelo Ball (the younger brother of Lonzo Ball, a professional NBA player, and a top amateur basketball player himself) has seen his school's opponents cancel games against him and his team because he had previously turned professional. That is, many schools don't believe he is a "valid" amateur athlete. So let's see how the twitterverse feels about LaMelo Ball:

In [36]:
```python
main('LaMelo Ball')
```

```
Positive tweets percentage: 71.7948717948718 %
Negative tweets percentage: 7.6923076923076925 %


Positive tweets:
RT @JordanStrack: Ok. So here’s the word:

Tomorrow (Saturday) at 9 am will be Rogers vs. Spire followed by St. Francis vs. Spire at 10 am.…
I liked a @YouTube video https://t.co/hqpggt8tmC LAMELO BALL PLAYS BASKETBALL LIKE...
RT @TMZ: Why Opponents Of LaMelo Ball’s New High School Team Cancel Games (via @NESN) https://t.co/3BjLsBuA5L
Lol Lamelo Ball is being treated like a cancer. If he makes it to the league, I'd be thoroughly surprised.
Why Opponents Of LaMelo Ball’s New High School Team Cancel Games (via @NESN) https://t.co/3BjLsBuA5L


Negative tweets:
RT @VSUMPC: 🏀 Life Christian Academy will play SPIRE Institute and LaMelo Ball in a high school basketball non-conference game on Wednesday…
Little kids be thinking you’re talking about LaMelo Ball when you’re talking about Carmelo Anthony
🏀 Life Christian Academy will play SPIRE Institute and LaMelo Ball in a high scho
```

Again, we see some limitations of the TextBlob package. At least one tweet classified as positive looks negative ("being treated like a cancer"), while the tweets classified as negative appear neutral to positive. One thing we can do is tighten our sentiment score thresholds. Polarity scores range from -1 to +1, so let's change the cut-offs in our function from 0 to +0.5 and -0.5, so that only strongly positive and strongly negative tweets are counted:

In [37]:
```python
class TwitterClient(object):
    def __init__(self):
        consumer_key = 'YOUR_CONSUMER_KEY'
        consumer_secret = 'YOUR_CONSUMER_SECRET'
        access_token = 'YOUR_ACCESS_TOKEN'
        access_token_secret = 'YOUR_ACCESS_TOKEN_SECRET'

        try:
            self.auth = OAuthHandler(consumer_key, consumer_secret)
            self.auth.set_access_token(access_token, access_token_secret)
            self.api = tweepy.API(self.auth)
        except Exception as e:
            print("Error: Authentication Failed: " + str(e))

    def clean_tweet(self, tweet):
        return ' '.join(re.sub(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)", " ", tweet).split())

    def get_tweet_sentiment(self, tweet):
        analysis = TextBlob(self.clean_tweet(tweet))
        if analysis.sentiment.polarity > 0.5: # here we changed our positive score threshold
            return 'positive'
        elif analysis.sentiment.polarity < -0.5: # here we changed our negative score threshold
            return 'negative'
        else:
            return 'neutral'

    def get_tweets(self, query, count=10):
        tweets = []

        try:
            fetched_tweets = self.api.search(q=query, count=count)

            for tweet in fetched_tweets:
                parsed_tweet = {}

                parsed_tweet['text'] = tweet.text
                parsed_tweet['sentiment'] = self.get_tweet_sentiment(tweet.text)

                # keep retweeted tweets only once
                if tweet.retweet_count > 0:
                    if parsed_tweet not in tweets:
                        tweets.append(parsed_tweet)
                else:
                    tweets.append(parsed_tweet)

        except tweepy.TweepError as e:
            print("Error : " + str(e))

        return tweets

def main(q):
    api = TwitterClient()
    tweets = api.get_tweets(query=q, count=5000)
    if not tweets:
        print("No tweets found.")
        return

    ptweets = [tweet for tweet in tweets if tweet['sentiment'] == 'positive']
    print("Positive tweets percentage: {} %".format(100*len(ptweets)/len(tweets)))
    ntweets = [tweet for tweet in tweets if tweet['sentiment'] == 'negative']
    print("Negative tweets percentage: {} %".format(100*len(ntweets)/len(tweets)))

    print("\n\nPositive tweets:")
    for tweet in ptweets[:5]:
        print(tweet['text'])

    print("\n\nNegative tweets:")
    for tweet in ntweets[:5]:
        print(tweet['text'])

if __name__ == "__main__":
    main('LaMelo Ball')
```

```
Positive tweets percentage: 10.526315789473685 %
Negative tweets percentage: 0.0 %


Positive tweets:
I liked a @YouTube video https://t.co/hqpggt8tmC LAMELO BALL PLAYS BASKETBALL LIKE...
I liked a @YouTube video https://t.co/GF6f6ZcCO4 This 7’7" HS Junior Is Now LAMELO BALL'S Teammate. Can Robert Bobroczky Go From
I liked a @YouTube video https://t.co/mnIRGVAw3N The Professor Reacts to Lamelo Ball's 92pts
@Djcammon They told you what it was? When I got mine restricted for dogging lamelo ball for slapping a man they never told me! Lol


Negative tweets:
```

No negative tweets about LaMelo? Maybe people really like the kid :).
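An empty negative bucket at a ±0.5 cut-off does not necessarily mean nobody is critical. Raising the threshold moves every tweet scoring between -0.5 and +0.5 into the unreported neutral bucket. The sketch below shows this with a hypothetical list of polarity scores, not the scores from the run above. It also takes the threshold as a parameter, which is one possible alternative to redefining the whole `TwitterClient` class for each setting:

```python
from collections import Counter


def classify(polarity: float, threshold: float = 0.0) -> str:
    """Bucket a polarity score in [-1, 1]; scores within [-threshold, threshold] are neutral."""
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


# Hypothetical polarity scores, NOT taken from the LaMelo Ball run.
scores = [0.8, 0.6, 0.35, 0.1, 0.0, 0.0, -0.05, -0.2, -0.4, 0.25]

for threshold in (0.0, 0.5):
    counts = Counter(classify(s, threshold) for s in scores)
    print(threshold, dict(counts))
```

With the cut-off at 0.5, the three mildly negative scores in this made-up list are reported as neutral, so the negative count drops to zero.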
    
Let's try a more polarizing and prominent figure: Donald Trump. How does the twitterverse feel about The Donald today?

In [38]:
```python
main('Donald Trump')
```

```
Positive tweets percentage: 3.389830508474576 %
Negative tweets percentage: 5.084745762711864 %


Positive tweets:
RT @kimguilfoyle: Heading all across the country today with the incredible @DonaldJTrumpJr - we are hitting five states doing six stops put…
https://t.co/HtDk2F6s5S

Trump wrote his own answers. God, I hope so!!!!


Negative tweets:
Donald Trump’s defense mechanisms have truly failed him (Insanely projects chaos and evil in himself on others!)… https://t.co/d7NdAfYQGK
RT @CNN: Democratic Sen. Tammy Duckworth said President Donald Trump has "failed miserably" in his attempts to support US troops. Duckworth…
'the worst communicator' https://t.co/3A2nuJiari via @DailyMailCeleb
```


Here we see something interesting. With our thresholds set to +0.5 and -0.5, Donald Trump is tweeted about in a marginally more negative than positive context (about 5% versus 3% of the tweets retrieved), with the vast majority falling in between. If, however, we were to return our score thresholds to zero, what do people tend to think about Trump?

In [40]:
```python
class TwitterClient(object):
    def __init__(self):
        consumer_key = 'YOUR_CONSUMER_KEY'
        consumer_secret = 'YOUR_CONSUMER_SECRET'
        access_token = 'YOUR_ACCESS_TOKEN'
        access_token_secret = 'YOUR_ACCESS_TOKEN_SECRET'

        try:
            self.auth = OAuthHandler(consumer_key, consumer_secret)
            self.auth.set_access_token(access_token, access_token_secret)
            self.api = tweepy.API(self.auth)
        except Exception as e:
            print("Error: Authentication Failed: " + str(e))

    def clean_tweet(self, tweet):
        return ' '.join(re.sub(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)", " ", tweet).split())

    def get_tweet_sentiment(self, tweet):
        analysis = TextBlob(self.clean_tweet(tweet))
        if analysis.sentiment.polarity > 0: # positive threshold back to 0
            return 'positive'
        elif analysis.sentiment.polarity < 0: # negative threshold back to 0
            return 'negative'
        else:
            return 'neutral'

    def get_tweets(self, query, count=10):
        tweets = []

        try:
            fetched_tweets = self.api.search(q=query, count=count)

            for tweet in fetched_tweets:
                parsed_tweet = {}

                parsed_tweet['text'] = tweet.text
                parsed_tweet['sentiment'] = self.get_tweet_sentiment(tweet.text)

                # keep retweeted tweets only once
                if tweet.retweet_count > 0:
                    if parsed_tweet not in tweets:
                        tweets.append(parsed_tweet)
                else:
                    tweets.append(parsed_tweet)

        except tweepy.TweepError as e:
            print("Error : " + str(e))

        return tweets

def main(q):
    api = TwitterClient()
    tweets = api.get_tweets(query=q, count=5000)
    if not tweets:
        print("No tweets found.")
        return

    ptweets = [tweet for tweet in tweets if tweet['sentiment'] == 'positive']
    print("Positive tweets percentage: {} %".format(100*len(ptweets)/len(tweets)))
    ntweets = [tweet for tweet in tweets if tweet['sentiment'] == 'negative']
    print("Negative tweets percentage: {} %".format(100*len(ntweets)/len(tweets)))

    print("\n\nPositive tweets:")
    for tweet in ptweets[:5]:
        print(tweet['text'])

    print("\n\nNegative tweets:")
    for tweet in ntweets[:5]:
        print(tweet['text'])

if __name__ == "__main__":
    main('Donald Trump')
```

```
Positive tweets percentage: 31.34328358208955 %
Negative tweets percentage: 20.895522388059703 %


Positive tweets:
RT @PalmerReport: Donald Trump’s day so far:

- Judge reinstates Jim Acosta’s press pass
- Mueller arresting big fish within ten days
- Jul…
California, you know how to welcome Trump tomorrow as he goes there to see firefighters only

TRUMP SUGGESTS NOT RA… https://t.co/vtdPNYyE3T
RT @itsJeffTiedrich: Melania Trump calls for the ouster of a top national security aide and then returns to her day job: sitting in her roo…
RT @HuffPost: President Donald Trump says he was too busy "on calls" to honor soldiers on Veterans Day. https://t.co/XNthkSt2C2
1/2 Love it!: "Whitakers all the way down"--nice turn of phrase by the ever-estimable David Brooks for the inept, c… https://t.co/MqCowWDhxW


Negative tweets:
#DonaldTrump has "failed miserably" in supporting troops and their families, says #Democratsenator and...… https://t.co/zUc7fnyeQM
RT @CNNTonight: .@DonLemon: Pres Trump, in
```

With the thresholds back at zero, more of the latest tweets about the POTUS are positive than negative (about 31% versus 21%), although roughly half are neutral. Finally, we look at one last topic: the stock market.

In [41]:
```python
main('stock market')
```

```
Positive tweets percentage: 31.46067415730337 %
Negative tweets percentage: 17.97752808988764 %


Positive tweets:
RT @FinancialTimes: As China’s economic growth slows and a trade war with the US damages consumer sentiment and the stock market, Beijing i…
RT @psis226k: Thanks to our long standing JPMorgan Chase PENCIL partnership, our 6th graders just picked their stocks playing the Stock Mar…
🇺🇸 Health care, energy companies power US stock market higher 📈 https://t.co/3soF95znFY YAHOO!
Adage 格言 According to an old adage on Wall Street, the stock market can deal with good news and bad, but it cannot tackle uncertainty.
RT @bopinion: It seems like a good time to revisit the stock market's performance since Election Day 2016 https://t.co/x5OiOFjYOr


Negative tweets:
@vogul1960 Have you seen him celebrating when something wrong happen a to Pakistan. When stock market went down he was like Yahoooooo.
CNBC’s Jim Cramer says stock market is in ‘a very serious correction’ — and there’s nowhe
```

Good news today, traders: the sentiment on The Street leans upbeat (about 31% positive versus 18% negative)!

## Conclusion
What can we learn from all of this? First and foremost, we can learn that SA (sentiment analysis) is tough. We as humans read things and interpret them based on our points of view. A computational method of analysis is inherently going to contain some biases, so the question becomes: how can we overcome those biases to be more objective? Things like language, context, past history and sarcasm are all phenomena we consider as humans when we assess the "sentiment" of a statement. When a computer scores sentiment without that context, it can lead to some interesting results, as with LaMelo Ball, the Iran deal, or even Donald Trump.

We also see, though, that in the case of Donald Trump, for instance, the TextBlob package does a decent job. Of the latest tweets we retrieved, only about 31% are positive when referencing Trump, versus about 21% negative. That means roughly half are "neutral". We could dive deeper and look at the neutral tweets to see whether they come from news organizations or supposedly "unbiased" journalists, for instance to disprove "fake news" and bias claims. Either way, the margin between positive and negative is narrow, which fits our view of Trump in the public sphere: sentiment toward him can lean positive, but only narrowly, thanks to his base. That holds true here.
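The "roughly half are neutral" reasoning is simple arithmetic once every tweet carries one of three labels. Below is a small sketch, assuming each tweet has been labeled 'positive', 'negative' or 'neutral'. It returns zeros for an empty sample instead of dividing by zero. The sample labels are hypothetical:

```python
from collections import Counter


def sentiment_shares(labels):
    """Percentage of each label; returns zeros instead of dividing by zero on an empty sample."""
    total = len(labels)
    counts = Counter(labels)
    return {
        name: (100 * counts[name] / total if total else 0.0)
        for name in ("positive", "negative", "neutral")
    }


# Hypothetical sample of 10 labels, not data from the runs above.
labels = ["positive"] * 3 + ["negative"] * 2 + ["neutral"] * 5
print(sentiment_shares(labels))
```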

So in closing, we took in live tweets, bucketed them by their sentiment score, and then played around with different queries and thresholds to test our program. Businesses, politicians, and even public service officers can do the same to aid them in their daily work!