# EXTRACTING TWITTER DATA FOR SENTIMENT ANALYSIS

First, we import the libraries we need: tweepy, pandas, and NumPy, plus a few plotting utilities.

In [57]:
import tweepy           # To consume Twitter's API
import pandas as pd     # To handle data
import numpy as np      # For number computing

# For plotting and visualization:
from IPython.display import display
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

## Importing data from Twitter 

To extract tweets for later analysis, we need a Twitter developer account. Once the account is set up, we can generate the API keys and tokens shown below.

In [58]:
# Twitter app credentials
# (placeholders: substitute your own keys and never publish real ones)

# Consumer:
CONSUMER_KEY    = 'YOUR_CONSUMER_KEY'
CONSUMER_SECRET = 'YOUR_CONSUMER_SECRET'

# Access:
ACCESS_TOKEN  = 'YOUR_ACCESS_TOKEN'
ACCESS_SECRET = 'YOUR_ACCESS_SECRET'

Here we import the access keys from a local `credentials` module so that they are available as variables, and then define a helper that sets up the API client.

In [59]:
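# Import the keys explicitly rather than with a wildcard:
from credentials import CONSUMER_KEY, CONSUMER_SECRET, ACCESS_TOKEN, ACCESS_SECRET

# API setup:
def twitter_setup():
    """
    Utility function to set up the Twitter API client
    with the access keys provided.
    """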
    # Authentication and access using keys:
    auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
    auth.set_access_token(ACCESS_TOKEN, ACCESS_SECRET)

    # Return API with authentication:
    api = tweepy.API(auth)
    return api
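
As an alternative to keeping the keys in a file, they could be read from environment variables. The following is only a minimal sketch: the helper name `load_twitter_credentials` and the `TWITTER_*` variable names are assumptions made for illustration, not part of the original notebook or of tweepy.

```python
import os

def load_twitter_credentials(env=os.environ):
    """Read the four keys from environment variables.

    The TWITTER_* variable names below are hypothetical; use whatever
    names you set in your own shell or deployment.
    """
    names = {
        "CONSUMER_KEY": "TWITTER_CONSUMER_KEY",
        "CONSUMER_SECRET": "TWITTER_CONSUMER_SECRET",
        "ACCESS_TOKEN": "TWITTER_ACCESS_TOKEN",
        "ACCESS_SECRET": "TWITTER_ACCESS_SECRET",
    }
    # Fail early with a clear message if any variable is missing.
    missing = [var for var in names.values() if var not in env]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {key: env[var] for key, var in names.items()}
```

The returned dictionary could then be unpacked into the same four variables that `twitter_setup()` expects, which keeps secrets out of the notebook itself.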

In [60]:
# We create an extractor object:
extractor = twitter_setup()

# We create a tweet list as follows:
tweets = extractor.user_timeline(screen_name="JoeBiden", count=200)
print("Number of tweets extracted: {}.\n".format(len(tweets)))

# We print the most recent 5 tweets:
print("5 recent tweets:\n")
for tweet in tweets[:5]:
    print(tweet.text)
    print()

Number of tweets extracted: 200.

5 recent tweets:

The middle class built America, and unions built the middle class.  

This Labor Day, we honor all the workers, and… https://t.co/bJjfCLdLWC

RT @POTUS: I know some folks are hesitant to get vaccinated, but the vaccine is safe, effective, and the best way to protect yourself and t…

We are now the only developed country in the world whose economy is bigger than it was before the pandemic. Our eco… https://t.co/bYtOyMJt8I

The Supreme Court's overnight ruling is an unprecedented assault on constitutional rights and requires an immediate… https://t.co/WR1tWg5Lfo

To everyone who is still in harm's way and for all those struggling to deal with the aftermath of the storms and fi… https://t.co/kTr5qgq4jM
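
Most of the tweets printed above end in `…` because, by default, the timeline call returns text truncated to 140 characters. If the tweets are instead fetched with `tweet_mode="extended"` passed to `user_timeline`, each status carries a `full_text` attribute. The helper below is a hedged sketch of how the full text might be read in that case; the name `full_text_of` is hypothetical, and the stand-in objects only mimic the attributes of a tweepy status.

```python
from types import SimpleNamespace

def full_text_of(status):
    """Return the untruncated text of a status, if it is available."""
    # For a retweet, the complete text lives on the original status.
    # Note that this drops the "RT @user:" prefix of the retweet.
    original = getattr(status, "retweeted_status", None)
    if original is not None:
        return full_text_of(original)
    # Fall back to .text when the status was fetched in the default mode.
    return getattr(status, "full_text", None) or status.text

# Stand-in for a status fetched in extended mode (made-up content):
example = SimpleNamespace(full_text="a complete tweet", text="a compl…")
print(full_text_of(example))
```

With real data, the dataframe's text column could then be built from `full_text_of(tweet)` instead of `tweet.text`.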



## Creating a Dataframe

In [61]:
# We create a pandas dataframe as follows:
data = pd.DataFrame(data=[tweet.text for tweet in tweets], columns=['Tweets'])

# We display the first 10 elements of the dataframe:
display(data.head(10))

|   | Tweets |
|---|--------|
| 0 | The middle class built America, and unions bui... |
| 1 | RT @POTUS: I know some folks are hesitant to g... |
| 2 | We are now the only developed country in the w... |
| 3 | The Supreme Court's overnight ruling is an unp... |
| 4 | To everyone who is still in harm's way and for... |
| 5 | RT @POTUS: I want to express my heartfelt than... |
| 6 | Texas law SB8 will significantly impair people... |
| 7 | I was not going to extend this forever war. ht... |
| 8 | This decision about Afghanistan is not just ab... |
| 9 | There is nothing low-grade, low-risk, or low-c... |


In [62]:
# Attributes and methods of a single tweet object:
print(dir(tweets[0]))

['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getstate__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_api', '_json', 'author', 'contributors', 'coordinates', 'created_at', 'destroy', 'entities', 'favorite', 'favorite_count', 'favorited', 'geo', 'id', 'id_str', 'in_reply_to_screen_name', 'in_reply_to_status_id', 'in_reply_to_status_id_str', 'in_reply_to_user_id', 'in_reply_to_user_id_str', 'is_quote_status', 'lang', 'parse', 'parse_list', 'place', 'possibly_sensitive', 'retweet', 'retweet_count', 'retweeted', 'retweets', 'source', 'source_url', 'text', 'truncated', 'user']


Next, we print a few fields from the first tweet and then add the relevant ones to the dataframe as new columns.

In [63]:
print(tweets[0].id)
print(tweets[0].created_at)
print(tweets[0].source)
print(tweets[0].favorite_count)
print(tweets[0].retweet_count)
print(tweets[0].geo)
print(tweets[0].coordinates)
print(tweets[0].entities)

1434894331845005318
2021-09-06 15:00:42
Sprout Social
13058
1969
None
None
{'hashtags': [], 'symbols': [], 'user_mentions': [], 'urls': [{'url': 'https://t.co/bJjfCLdLWC', 'expanded_url': 'https://twitter.com/i/web/status/1434894331845005318', 'display_url': 'twitter.com/i/web/status/1…', 'indices': [117, 140]}]}


In [64]:
data['len']  = np.array([len(tweet.text) for tweet in tweets])
data['ID']   = np.array([tweet.id for tweet in tweets])
data['Date'] = np.array([tweet.created_at for tweet in tweets])
data['Source'] = np.array([tweet.source for tweet in tweets])
data['Likes']  = np.array([tweet.favorite_count for tweet in tweets])
data['RTs']    = np.array([tweet.retweet_count for tweet in tweets])
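
The same columns could also be collected in a single pass over the tweets, building one dictionary per tweet. This is a sketch under stated assumptions: `tweets_to_records` is a hypothetical helper name, and the stand-in object below only mimics the attributes used in the cell above.

```python
from types import SimpleNamespace

def tweets_to_records(tweets):
    """Build one dict per tweet, with the same column names as above."""
    return [
        {
            "Tweets": t.text,
            "len": len(t.text),
            "ID": t.id,
            "Date": t.created_at,
            "Source": t.source,
            "Likes": t.favorite_count,
            "RTs": t.retweet_count,
        }
        for t in tweets
    ]

# Made-up stand-in for a tweepy status object:
fake_tweet = SimpleNamespace(
    text="hello world", id=1, created_at="2021-01-01 00:00:00",
    source="Example Client", favorite_count=3, retweet_count=2,
)
print(tweets_to_records([fake_tweet]))
```

Passing the result to `pd.DataFrame(...)` would give a dataframe with these seven columns, which avoids iterating over the tweet list once per column.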

Now let us display the first 10 rows of the dataframe:

In [65]:
display(data.head(10))

|   | Tweets | len | ID | Date | Source | Likes | RTs |
|---|--------|-----|----|------|--------|-------|-----|
| 0 | The middle class built America, and unions bui... | 140 | 1434894331845005318 | 2021-09-06 15:00:42 | Sprout Social | 13058 | 1969 |
| 1 | RT @POTUS: I know some folks are hesitant to g... | 140 | 1434188759747211267 | 2021-09-04 16:17:00 | Twitter for iPhone | 0 | 3240 |
| 2 | We are now the only developed country in the w... | 140 | 1433891056953790482 | 2021-09-03 20:34:03 | Sprout Social | 23882 | 3181 |
| 3 | The Supreme Court's overnight ruling is an unp... | 140 | 1433562887298199555 | 2021-09-02 22:50:01 | Sprout Social | 79058 | 9889 |
| 4 | To everyone who is still in harm's way and for... | 140 | 1433546883205578756 | 2021-09-02 21:46:25 | Sprout Social | 21524 | 2336 |
| 5 | RT @POTUS: I want to express my heartfelt than... | 140 | 1433515992492253187 | 2021-09-02 19:43:40 | Twitter Web App | 0 | 2308 |
| 6 | Texas law SB8 will significantly impair people... | 140 | 1433183957424689157 | 2021-09-01 21:44:17 | Sprout Social | 45763 | 6559 |
| 7 | I was not going to extend this forever war. ht... | 67 | 1433063313362104325 | 2021-09-01 13:44:53 | Sprout Social | 22266 | 2228 |
| 8 | This decision about Afghanistan is not just ab... | 140 | 1432884413235347462 | 2021-09-01 01:54:00 | Sprout Social | 110957 | 10798 |
| 9 | There is nothing low-grade, low-risk, or low-c... | 131 | 1432876149437255683 | 2021-09-01 01:21:10 | Sprout Social | 23331 | 2723 |


## Sentiment Analysis

Here we perform sentiment analysis with VADER's `SentimentIntensityAnalyzer`, using list comprehensions to create a new dataframe column for each `polarity_scores` metric. The VADER Sentiment Analysis library makes it easy to generate simple sentiment metrics without training a model: it works out of the box, and its scores are easy to interpret.

VADER stands for Valence Aware Dictionary and sEntiment Reasoner. It is a simple lexicon and rule-based model for general sentiment analysis.
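
To give a feel for what "lexicon and rule-based" means, here is a toy scorer. It is not VADER's implementation: the five-word lexicon, its valence values, and the single negation rule are all made up for illustration. The final normalization has the same form VADER uses for its compound score, `x / sqrt(x² + 15)`, which squashes the summed valence into the range (-1, 1).

```python
import math

# Hypothetical mini-lexicon: word -> valence (VADER's real lexicon is far larger).
TOY_LEXICON = {"safe": 1.5, "effective": 2.0, "best": 3.0, "war": -2.5, "harm": -2.0}
NEGATIONS = {"not", "no", "never"}

def toy_compound(text, alpha=15):
    """Sum word valences, flipping the next lexicon word after a negation."""
    total = 0.0
    negate = False
    for raw in text.lower().split():
        word = raw.strip(".,!?")
        if word in NEGATIONS:
            negate = True
            continue
        valence = TOY_LEXICON.get(word, 0.0)
        if valence:
            total += -valence if negate else valence
            negate = False
    # Squash the unbounded sum into (-1, 1).
    return total / math.sqrt(total * total + alpha)

print(toy_compound("the vaccine is safe and effective"))
print(toy_compound("this is not safe"))
```

The real library adds many more rules (for example, for punctuation, capitalization, and intensifiers), so its scores will differ from this sketch.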

In [68]:
!pip install vaderSentiment

Collecting vaderSentiment
  Downloading vaderSentiment-3.3.2-py2.py3-none-any.whl (125 kB)
Installing collected packages: vaderSentiment
Successfully installed vaderSentiment-3.3.2


In [71]:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()

In [72]:
# Add VADER metrics to the dataframe, reusing the analyzer created above.
# Each tweet is scored once, then one column is added per metric:
scores = [analyzer.polarity_scores(v) for v in data['Tweets']]
for metric in ['compound', 'neg', 'neu', 'pos']:
    data[metric] = [s[metric] for s in scores]
data.head(10)

|   | Tweets | len | ID | Date | Source | Likes | RTs | compound | neg | neu | pos |
|---|--------|-----|----|------|--------|-------|-----|----------|-----|-----|-----|
| 0 | The middle class built America, and unions bui... | 140 | 1434894331845005318 | 2021-09-06 15:00:42 | Sprout Social | 13058 | 1969 | 0.4939 | 0.0 | 0.862 | 0.138 |
| 1 | RT @POTUS: I know some folks are hesitant to g... | 140 | 1434188759747211267 | 2021-09-04 16:17:00 | Twitter for iPhone | 0 | 3240 | 0.9565 | 0.038 | 0.529 | 0.433 |
| 2 | We are now the only developed country in the w... | 140 | 1433891056953790482 | 2021-09-03 20:34:03 | Sprout Social | 23882 | 3181 | 0.0 | 0.0 | 1.0 | 0.0 |
| 3 | The Supreme Court's overnight ruling is an unp... | 140 | 1433562887298199555 | 2021-09-02 22:50:01 | Sprout Social | 79058 | 9889 | -0.0516 | 0.17 | 0.67 | 0.161 |
| 4 | To everyone who is still in harm's way and for... | 140 | 1433546883205578756 | 2021-09-02 21:46:25 | Sprout Social | 21524 | 2336 | -0.4215 | 0.109 | 0.891 | 0.0 |
| 5 | RT @POTUS: I want to express my heartfelt than... | 140 | 1433515992492253187 | 2021-09-02 19:43:40 | Twitter Web App | 0 | 2308 | 0.872 | 0.0 | 0.669 | 0.331 |
| 6 | Texas law SB8 will significantly impair people... | 140 | 1433183957424689157 | 2021-09-01 21:44:17 | Sprout Social | 45763 | 6559 | 0.4939 | 0.0 | 0.842 | 0.158 |
| 7 | I was not going to extend this forever war. ht... | 67 | 1433063313362104325 | 2021-09-01 13:44:53 | Sprout Social | 22266 | 2228 | -0.6617 | 0.404 | 0.596 | 0.0 |
| 8 | This decision about Afghanistan is not just ab... | 140 | 1432884413235347462 | 2021-09-01 01:54:00 | Sprout Social | 110957 | 10798 | 0.0 | 0.0 | 1.0 | 0.0 |
| 9 | There is nothing low-grade, low-risk, or low-c... | 131 | 1432876149437255683 | 2021-09-01 01:21:10 | Sprout Social | 23331 | 2723 | -0.8316 | 0.302 | 0.698 | 0.0 |


As simple as that, the sentiment metrics have been added to the dataframe. The `neg`, `neu`, and `pos` columns give the proportion of each tweet's text that falls into the negative, neutral, and positive categories, and `compound` is a single normalized score between -1 (most negative) and +1 (most positive).
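
A common next step is to turn the `compound` score into a label. The sketch below uses the thresholds suggested in the VADER documentation (positive at 0.05 and above, negative at -0.05 and below, neutral in between); the function name `label_sentiment` is hypothetical, and the thresholds can be tuned for a given dataset.

```python
def label_sentiment(compound, threshold=0.05):
    """Map a VADER compound score to 'positive', 'negative', or 'neutral'."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"

# Compound scores taken from the first four rows of the table above:
for score in [0.4939, 0.9565, 0.0, -0.0516]:
    print(score, label_sentiment(score))
```

Applied to the dataframe, this could be written as `data['sentiment'] = data['compound'].apply(label_sentiment)`, where the `sentiment` column name is again only an illustration.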