# Reading Data From Twitter

Install tweepy

In [1]:
!pip install git+https://github.com/tweepy/tweepy.git



In [None]:
!pip install TextBlob

## Insert bearer token

1. You will need a bearer token from the Twitter developer portal: https://developer.twitter.com/en
2. Store the token in a variable
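Hardcoding the token in a notebook makes it easy to leak when the notebook is shared. Below is a minimal sketch of an alternative. It assumes you export the token beforehand as an environment variable. The variable name `TWITTER_BEARER_TOKEN` and the helper `load_bearer_token` are our own hypothetical choices, not part of tweepy.

```python
import os


def load_bearer_token(env_var: str = "TWITTER_BEARER_TOKEN") -> str:
    """Read the bearer token from an environment variable (hypothetical variable name)."""
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"Please set the {env_var} environment variable to your bearer token.")
    return token
```

Usage: after running e.g. `export TWITTER_BEARER_TOKEN=...` in the shell that starts Jupyter, call `bearer_token = load_bearer_token()`.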

In [7]:
bearer_token = 'YOUR_BEARER_TOKEN'  # replace with your own token; never share or publish it

Create a client with your bearer_token

In [8]:
import tweepy as tw
client = tw.Client(bearer_token=bearer_token)

Search recent tweets from the official HSG Twitter account @HSGStGallen

In [9]:
response = client.search_recent_tweets('(from:HSGStGallen)')
response

Response(data=[<Tweet id=1597865509005176832 text='Gerade bei #Krisen wie der #Pandemie ist wissenschaftliche Expertise wichtig. In politischem Handeln münden die Fakten aber oft nicht. Warum ist das Verhältnis #Wissenschaft &amp; #Politik so schwierig? #HSGProf Caspar Hirschi @hsg_shss war zu Gast bei @3sat https://t.co/1gwkJ3A42f https://t.co/CeKPCZi0ra'>, <Tweet id=1597501566818582532 text='Morgen ist Andreas Hänni zu Gast im Sports Economics Talk. Er ist ein ehemaliger Schweizer Eishockeyprofi und im Jahr 2016 gründete er die Firma «49 mining, analytics &amp; consulting GmbH», zu deren Kunden alle Schweizer Profi-Eishockeymannschaften und einige Amateurvereine zählen. https://t.co/xzLOelviK0'>, <Tweet id=1597207834005233664 text='Die HSG bildet Medizinstudierende an ihrer School of Medicine für den Ernstfall aus: Die angehenden Ärztinnen und Ärzte trainieren bei der Behandlung von Schauspieler:innen, die medizinische Probleme vorspielen, für die Berufspraxis.\n\nhttps://t.co/74k1mu

We get a Response object with
- data: the returned tweets
- meta: a dictionary with additional information. Here it tells us that 10 results were returned, along with a next_token
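Both fields can be read directly from the response. Below is a small sketch with a hypothetical helper `summarize_response`. It assumes, as in tweepy 4.x, that:
- `data` is a list of tweet objects with a `.text` attribute, or `None` when nothing matched.
- `meta` is a plain dictionary.

The `SimpleNamespace` object only stands in for a real response so that the example runs offline.

```python
from types import SimpleNamespace


def summarize_response(response):
    """Return (result_count, next_token, texts) from a tweepy-style Response."""
    tweets = response.data or []  # data is None when no tweet matched the query
    meta = response.meta or {}
    result_count = meta.get("result_count", len(tweets))
    next_token = meta.get("next_token")  # missing on the last page of results
    return result_count, next_token, [tweet.text for tweet in tweets]


# Offline stand-in for a real response object (illustrative values only)
fake_response = SimpleNamespace(
    data=[SimpleNamespace(text="first tweet"), SimpleNamespace(text="second tweet")],
    meta={"result_count": 2, "next_token": "abc123"},
)
print(summarize_response(fake_response))  # (2, 'abc123', ['first tweet', 'second tweet'])
```

With the real client: `summarize_response(client.search_recent_tweets('(from:HSGStGallen)'))`.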

Next, try to request more tweets from HSG. According to the documentation, this endpoint returns at most 100 tweets per request (the default is 10) and only covers tweets from the last seven days.
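These limits can be checked before a request is sent. Below is a minimal sketch. It assumes the documented recent-search bounds of 10 to 100 results per request and a seven-day window. The constant and function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumed limits of the recent-search endpoint (see the Twitter API v2 documentation)
RECENT_MIN_RESULTS = 10
RECENT_MAX_RESULTS = 100
RECENT_WINDOW = timedelta(days=7)


def clamp_max_results(requested: int) -> int:
    """Force a requested page size into the allowed range for recent search."""
    return max(RECENT_MIN_RESULTS, min(RECENT_MAX_RESULTS, requested))


def earliest_start_time(now: Optional[datetime] = None) -> datetime:
    """Oldest point in time that recent search still covers."""
    now = now or datetime.now(timezone.utc)
    return now - RECENT_WINDOW


print(clamp_max_results(250))  # 100
print(earliest_start_time(datetime(2022, 12, 8, tzinfo=timezone.utc)))  # 2022-12-01 00:00:00+00:00
```

For example, `client.search_recent_tweets(query, max_results=clamp_max_results(250))` stays within the limit. Getting more than 100 tweets requires pagination.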

In [10]:
response = client.search_recent_tweets('(from:HSGStGallen)', max_results=100)
response

Unauthorized: 401 Unauthorized
Unauthorized

Only around 10 tweets, not many. Next, try to request all tweets from HSG with the full-archive search.

In [54]:
response = client.search_all_tweets('(from:HSGStGallen)', max_results=100)
response

Forbidden: 403 Forbidden
When authenticating requests to the Twitter API v2 endpoints, you must use keys and tokens from a Twitter developer App that is attached to a Project. You can create a project via the developer portal.

The API returns a '403 Forbidden' status code. Our authentication seems to be correct (otherwise the status would be '401 Unauthorized'), but we lack the required access rights. According to the documentation, full-archive search requires Academic Research access.
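The distinction between these status codes can be captured in a small lookup. Below is a minimal sketch; the helper `explain_status` and its messages are our own wording.

In tweepy 4.x, these responses surface as exceptions, all subclasses of `tw.errors.HTTPException`:
- `tw.errors.Unauthorized` for 401
- `tw.errors.Forbidden` for 403
- `tw.errors.TooManyRequests` for 429

The exception's `response.status_code` attribute holds the code.

```python
# Hypothetical helper: short explanations for common Twitter API v2 error codes
STATUS_EXPLANATIONS = {
    401: "Unauthorized: the credentials (e.g. the bearer token) are missing or invalid.",
    403: "Forbidden: authenticated, but the app lacks access to this endpoint.",
    429: "Too Many Requests: the rate limit was reached; wait before retrying.",
}


def explain_status(status_code: int) -> str:
    """Return a human-readable explanation for an HTTP status code."""
    return STATUS_EXPLANATIONS.get(status_code, f"Unexpected HTTP status {status_code}.")


print(explain_status(403))  # Forbidden: authenticated, but the app lacks access to this endpoint.
```

With tweepy, one could wrap the call in `try: ... except tw.errors.HTTPException as e: print(explain_status(e.response.status_code))`.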

# Pagination

Search for more tweets: get all recent tweets that contain the hashtag #FifaWorldCup and are written in English (lang:en).
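Queries combine search operators separated by spaces, which the API treats as a logical AND. Below is a sketch of a hypothetical `build_query` helper that assembles the operators used in this notebook. The optional `-is:retweet` operator (exclude retweets) is an addition not used in the original query. Per the query documentation, `lang:` and `is:retweet` cannot stand alone; they need at least one standalone operator such as a hashtag or `from:`.

```python
from typing import Optional


def build_query(hashtag: Optional[str] = None, from_user: Optional[str] = None,
                lang: Optional[str] = None, exclude_retweets: bool = False) -> str:
    """Assemble a Twitter API v2 search query; space-separated terms are ANDed."""
    parts = []
    if hashtag:
        parts.append("#" + hashtag.lstrip("#"))
    if from_user:
        parts.append("from:" + from_user.lstrip("@"))
    if lang:
        parts.append("lang:" + lang)
    if exclude_retweets:
        parts.append("-is:retweet")
    return " ".join(parts)


print(build_query(hashtag="FifaWorldCup", lang="en"))  # #FifaWorldCup lang:en
print(build_query(from_user="@HSGStGallen"))  # from:HSGStGallen
```

Excluding retweets, e.g. with `build_query(hashtag='FifaWorldCup', lang='en', exclude_retweets=True)`, would likely reduce the many identical retweets that appear in the results.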

In [4]:
query = "#FifaWorldCup lang:en"
response = client.search_recent_tweets(query, max_results=100)
response



That's a lot! It seems we have hit the maximum of 100 results per request.

In [5]:
print(response.meta["result_count"])
print(len(response.data))

100
100


The API provides us with a next_token in the meta information to indicate that more results are available. Because at most 100 tweets are returned per request, we need to send a new request to get them.

In [6]:
next_token = response.meta["next_token"]
next_token

'b26v89c19zqg8o3fpzhlqmd8a7985914sej8l5559jbzx'

Request the next 100 tweets for this search

In [7]:
response = client.search_recent_tweets(query, max_results=100, next_token=next_token)
response



We got another 100 tweets. This process is called *Pagination*. Imagine a telephone book: each page holds up to 100 tweets, and you have to turn to the next page to get more.
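Turning the pages by hand can be written as a loop that keeps following next_token until the API stops returning one. Below is a minimal sketch with a hypothetical `fetch_pages` helper:
- `search` is any function with the call signature of `client.search_recent_tweets`.
- `fake_search` is an offline stand-in that serves two made-up pages.

```python
from types import SimpleNamespace


def fetch_pages(search, query, max_pages=3, **kwargs):
    """Collect tweets page by page by following meta['next_token']."""
    tweets = []
    next_token = None
    for _ in range(max_pages):
        if next_token is None:
            response = search(query, **kwargs)  # first page
        else:
            response = search(query, next_token=next_token, **kwargs)  # following pages
        tweets.extend(response.data or [])
        next_token = (response.meta or {}).get("next_token")
        if next_token is None:  # the last page carries no next_token
            break
    return tweets


def fake_search(query, next_token=None, **kwargs):
    """Offline stand-in for client.search_recent_tweets serving two made-up pages."""
    pages = {None: (["t1", "t2"], "page2"), "page2": (["t3"], None)}
    texts, token = pages[next_token]
    meta = {"result_count": len(texts)}
    if token is not None:
        meta["next_token"] = token
    return SimpleNamespace(data=[SimpleNamespace(text=t) for t in texts], meta=meta)


collected = fetch_pages(fake_search, "#FifaWorldCup lang:en", max_pages=5)
print([tweet.text for tweet in collected])  # ['t1', 't2', 't3']
```

With the real client: `fetch_pages(client.search_recent_tweets, query, max_pages=3, max_results=100)`. The `max_pages` cap keeps a large result set from using up the rate limit.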

To simplify this process, tweepy provides a Paginator class.

In [11]:
tweets_paginator = tw.Paginator(client.search_recent_tweets, query, max_results=100).flatten(limit=250)
tweets_paginator

<generator object Paginator.flatten at 0x000001B59B0A9DD0>

Paginator(...).flatten(limit=250) returns a generator that handles the requests for us. When we iterate over it, it fetches pages of up to 100 tweets as needed and yields the individual tweets, stopping after the first 250.

In [10]:
for tweet in tweets_paginator:
    print(tweet.text)
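Conceptually, flattening chains the pages into one stream of tweets and stops once the limit is reached, so no further pages are requested. The sketch below reproduces that idea with `itertools`. It is an illustration, not tweepy's actual implementation. `fake_pages` is a hypothetical stand-in that records how many pages were requested.

```python
from itertools import chain, islice


def flatten_pages(pages, limit):
    """Return a lazy iterator over single items from an iterable of pages, capped at `limit`."""
    return islice(chain.from_iterable(pages), limit)


requested_pages = []  # records every simulated request


def fake_pages(page_size=100):
    """Hypothetical endless result set; each page is produced only when needed."""
    page = 0
    while True:
        requested_pages.append(page)
        yield [f"tweet {page * page_size + i}" for i in range(page_size)]
        page += 1


flat = list(flatten_pages(fake_pages(), limit=250))
print(len(flat), len(requested_pages))  # 250 3
```

With a page size of 100, a limit of 250 needs three requests, and the third page is only partly consumed.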

Generators are single-use objects: once exhausted, they yield nothing more. Rerunning the loop in the next cell therefore produces no output.

In [121]:
for tweet in tweets_paginator:
    print(tweet.text)
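The same behaviour can be reproduced with any generator. Here is a small illustration with a hypothetical generator `numbers`:

```python
def numbers():
    """A tiny generator that yields 0, 1, 2."""
    yield from range(3)


gen = numbers()
first_pass = list(gen)        # consumes every value
second_pass = list(gen)       # the generator is exhausted, so nothing is left
fresh_pass = list(numbers())  # calling the function again creates a new generator
print(first_pass, second_pass, fresh_pass)  # [0, 1, 2] [] [0, 1, 2]
```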

To keep the tweets for later use, we store them in a list. Remember to recreate the paginator each time, because the previous generator has already been exhausted.

In [122]:
tweets_paginator = tw.Paginator(client.search_recent_tweets, query, max_results=100).flatten(limit=250)

tweets = []

for tweet in tweets_paginator:
    tweets.append(tweet.text)

tweets

['RT @pressat: RED BULL HEADQUARTERS HIT BY UKRAINE PROTEST\n\n"RED BULL GIVES PUTIN WINGS"\n\nhttps://t.co/l841e5rbj7\n\n@SolidarityUKR #Ukraine #…',
 "RT @jcbehrends: Personally, I don't believe #Kadyrov or #Prigozhin will topple #Putin. \nI regard them as warlords. They stand for the disin…",
 'RT @pressat: RED BULL HEADQUARTERS HIT BY UKRAINE PROTEST\n\n"RED BULL GIVES PUTIN WINGS"\n\nhttps://t.co/l841e5rbj7\n\n@SolidarityUKR #Ukraine #…',
 'RT @pressat: RED BULL HEADQUARTERS HIT BY UKRAINE PROTEST\n\n"RED BULL GIVES PUTIN WINGS"\n\nhttps://t.co/l841e5rbj7\n\n@SolidarityUKR #Ukraine #…',
 'RT @pressat: RED BULL HEADQUARTERS HIT BY UKRAINE PROTEST\n\n"RED BULL GIVES PUTIN WINGS"\n\nhttps://t.co/l841e5rbj7\n\n@SolidarityUKR #Ukraine #…',
 'RT @pressat: RED BULL HEADQUARTERS HIT BY UKRAINE PROTEST\n\n"RED BULL GIVES PUTIN WINGS"\n\nhttps://t.co/l841e5rbj7\n\n@SolidarityUKR #Ukraine #…',
 'RT @pressat: RED BULL HEADQUARTERS HIT BY UKRAINE PROTEST\n\n"RED BULL GIVES PUTIN WINGS"\n\nhttp
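A list only lives as long as the notebook kernel. If the tweets should survive a restart, one option is to write them to a JSON file. Below is a minimal sketch with hypothetical helpers `save_tweets` and `load_tweets`; the file name in the example is arbitrary.

```python
import json
import tempfile
from pathlib import Path


def save_tweets(texts, path):
    """Write a list of tweet texts to a UTF-8 JSON file (keeps emojis readable)."""
    Path(path).write_text(json.dumps(texts, ensure_ascii=False, indent=2), encoding="utf-8")


def load_tweets(path):
    """Read the list of tweet texts back from the JSON file."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


demo_path = Path(tempfile.gettempdir()) / "tweets_demo.json"
save_tweets(["first tweet", "second tweet 🔥"], demo_path)
restored = load_tweets(demo_path)
print(restored)  # ['first tweet', 'second tweet 🔥']
```

In the notebook, call `save_tweets(tweets, 'tweets.json')` after collecting the tweets, and `tweets = load_tweets('tweets.json')` in a later session.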

# Sentiment Analysis of Tweets


In [123]:
import pandas as pd
tweets_df = pd.DataFrame(data=tweets, columns=['Tweets']).drop_duplicates()
tweets_df

|    | Tweets |
|---:|:-------|
| 0 | RT @pressat: RED BULL HEADQUARTERS HIT BY UKRA... |
| 1 | RT @jcbehrends: Personally, I don't believe #K... |
| 8 | @GlasnostGone @ZelenskyyUa A true leader unlik... |
| 14 | #MikePompeo proved he’s not ready for Prime ti... |
| 19 | RT @usownstheplanet: The Embassy would like to... |
| 22 | #Ukraine after russia’s latest attack on our c... |
| 26 | If you don’t own ANY $FBX you are NOT prepared... |
| 27 | @SenRonJohnson @SenateGOP yeah, the Russian me... |
| 31 | RT @wilkoskillz: Good morning sir , \nYour att... |
| 32 | They never learnt their lesson!! 🔥💪👑❤️‍🔥 #Trut... |
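The index of the table has gaps (0, 1, 8, 14, ...) because drop_duplicates() keeps the original row label of the first occurrence of each tweet. Here is a small illustration with made-up texts. Passing `ignore_index=True` (available since pandas 1.0) renumbers the remaining rows instead.

```python
import pandas as pd

texts = ["RT @a: same text", "other text", "RT @a: same text", "third text"]
df = pd.DataFrame(data=texts, columns=["Tweets"])

kept_labels = df.drop_duplicates()                  # original labels survive: 0, 1, 3
renumbered = df.drop_duplicates(ignore_index=True)  # relabelled as 0, 1, 2
print(kept_labels.index.tolist(), renumbered.index.tolist())  # [0, 1, 3] [0, 1, 2]
```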


Recap Module 2: text mining and sentiment analysis with TextBlob

In [124]:
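from textblob import TextBlob

# Return only the polarity score of TextBlob (-1.0 = negative, 1.0 = positive)
def polarity(text):
    try:
        return TextBlob(text).sentiment.polarity
    except TypeError:  # TextBlob raises TypeError for non-string input such as NaN
        return None

# Return only the subjectivity score of TextBlob (0.0 = objective, 1.0 = subjective)
def subjectivity(text):
    try:
        return TextBlob(text).sentiment.subjectivity
    except TypeError:  # TextBlob raises TypeError for non-string input such as NaN
        return None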

In [125]:
tweets_df["Polarity"] = tweets_df["Tweets"].apply(polarity)
tweets_df["Subjectivity"] = tweets_df["Tweets"].apply(subjectivity)
tweets_df

|    | Tweets | Polarity | Subjectivity |
|---:|:-------|---------:|-------------:|
| 0 | RT @pressat: RED BULL HEADQUARTERS HIT BY UKRA... | 0.0 | 0.0 |
| 1 | RT @jcbehrends: Personally, I don't believe #K... | 0.0 | 0.3 |
| 8 | @GlasnostGone @ZelenskyyUa A true leader unlik... | 0.1125 | 0.5125 |
| 14 | #MikePompeo proved he’s not ready for Prime ti... | -0.109524 | 0.577381 |
| 19 | RT @usownstheplanet: The Embassy would like to... | 0.0 | 0.0 |
| 22 | #Ukraine after russia’s latest attack on our c... | 0.25 | 0.85 |
| 26 | If you don’t own ANY $FBX you are NOT prepared... | 0.3 | 0.5625 |
| 27 | @SenRonJohnson @SenateGOP yeah, the Russian me... | 0.125 | 0.1 |
| 31 | RT @wilkoskillz: Good morning sir , \nYour att... | 0.7 | 0.6 |
| 32 | They never learnt their lesson!! 🔥💪👑❤️‍🔥 #Trut... | 0.0 | 0.0 |


In [126]:
tweets_df.mean(numeric_only=True)


Polarity        0.042639
Subjectivity    0.342943
dtype: float64
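A mean close to zero can hide a mix of clearly positive and clearly negative tweets. A complementary view is to count the tweets per sentiment label. Below is a minimal sketch. The helper `label_polarity` and its neutral band (`threshold`) are our own assumptions, not part of TextBlob. The scores are the ten polarity values shown in the table above.

```python
from collections import Counter


def label_polarity(score: float, threshold: float = 0.0) -> str:
    """Map a TextBlob polarity score in [-1, 1] to a coarse label."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"


# Polarity values of the ten rows displayed above
scores = [0.0, 0.0, 0.1125, -0.109524, 0.0, 0.25, 0.3, 0.125, 0.7, 0.0]
counts = Counter(label_polarity(s) for s in scores)
print(counts)  # Counter({'positive': 5, 'neutral': 4, 'negative': 1})
```

On the full DataFrame, `tweets_df['Polarity'].dropna().apply(label_polarity).value_counts()` gives the same breakdown for all collected tweets. The `dropna()` call skips tweets whose score could not be computed.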