# Get Old Tweets
This notebook downloads tweets with GetOldTweets3, a Python 3 library and companion command-line utility for accessing old tweets.
https://pypi.org/project/GetOldTweets3/

In [1]:
# pip install GetOldTweets3

In [2]:
import GetOldTweets3 as got
import pandas as pd

### TweetCriteria: A collection of search parameters to be used together with TweetManager.

- setUsername (str or iterable): One or more optional usernames of Twitter accounts (with or without "@").
- setSince (str, "yyyy-mm-dd"): A lower bound date (UTC) to restrict the search.
- setUntil (str, "yyyy-mm-dd"): An upper bound date (not included) to restrict the search.
- setQuerySearch (str): A query text to be matched.
- setTopTweets (bool): If True, only the Top Tweets will be retrieved.
- setNear (str): A reference location area from where tweets were generated.
- setWithin (str): A distance radius from the "near" location (e.g. 15mi).
- setMaxTweets (int): The maximum number of tweets to be retrieved. If this number is unset or lower than 1, all possible tweets will be retrieved.
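
The date parameters above are easy to get wrong by one day, so here is a minimal sketch of the window that a `since`/`until` pair covers. It uses only the standard library and does not call GetOldTweets3. The helper name `search_window` is hypothetical, and treating the lower bound as inclusive is an assumption; the list above only states that the upper bound is not included.

```python
from datetime import date, timedelta
from typing import List


def search_window(since: str, until: str) -> List[date]:
    """Return the dates covered by a since/until pair.

    Assumption: `since` is inclusive and `until` is exclusive.
    """
    start = date.fromisoformat(since)  # raises ValueError on a malformed date
    end = date.fromisoformat(until)
    if end <= start:
        raise ValueError("until must be later than since")
    return [start + timedelta(days=i) for i in range((end - start).days)]


days = search_window("2015-05-01", "2015-09-30")
print(days[0], days[-1], len(days))  # 2015-05-01 2015-09-29 152
```

Under that assumption, a search meant to include 30 September would need `until` set to `"2015-10-01"`.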

### Get tweets by query search

In [3]:
# Example:
tweetCriteria = got.manager.TweetCriteria().setQuerySearch('europe refugees')\
                                           .setSince("2015-05-01")\
                                           .setUntil("2015-09-30")\
                                           .setMaxTweets(1)
tweet = got.manager.TweetManager.getTweets(tweetCriteria)[0]
print(tweet.text)

[106] Europe's Refugees &amp; American Elections w/ Chris Hedges https://youtu.be/GYqgj3l4r18 via @YouTube
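
The text printed above contains `&amp;`, an HTML entity rather than a literal ampersand. A minimal sketch of decoding it with the standard library's `html.unescape` follows; whether every tweet returned by the library needs this step is an assumption, and the string below is only an excerpt of the output above.

```python
import html

raw = "Europe's Refugees &amp; American Elections w/ Chris Hedges"
clean = html.unescape(raw)  # decodes &amp;, &lt;, &gt;, &quot; and similar entities
print(clean)  # Europe's Refugees & American Elections w/ Chris Hedges
```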


### Defining criteria

In [4]:
country = 'portugal'
query_search = 'Uber Eats'
start_date = '2020-03-15'
end_date = '2020-07-15'
max_tweets = 10000

### Requesting by criteria

In [5]:
tweetCriteria = got.manager.TweetCriteria().setQuerySearch(query_search)\
                                           .setSince(start_date)\
                                           .setUntil(end_date)\
                                           .setMaxTweets(max_tweets)\
                                           .setEmoji("unicode")\
                                           .setNear(country)

# getTweets returns a list of Tweet objects
tweets = got.manager.TweetManager.getTweets(tweetCriteria)
print(len(tweets))

21


# Searching in multiple locations

In [97]:
locations = ['lisboa', 'porto']
query_search = 'Uber Eats'
start_date = '2020-03-15'
end_date = '2020-07-15'
max_tweets = 10000

In [98]:
# Build one TweetCriteria per location
tweetCriteria_list = []
for location in locations:
    tweetCriteria = got.manager.TweetCriteria().setQuerySearch(query_search)\
                                           .setSince(start_date)\
                                           .setUntil(end_date)\
                                           .setMaxTweets(max_tweets)\
                                           .setEmoji("unicode")\
                                           .setNear(location)
    tweetCriteria_list.append(tweetCriteria)

In [100]:
# Fetch the tweets for each location
tweet_dict = {}
for criteria, location in zip(tweetCriteria_list, locations):
    tweets = got.manager.TweetManager.getTweets(criteria)
    tweet_dict[location] = tweets
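
A long-running loop like the one above loses everything if a single request fails. The sketch below shows one possible way to keep the results already collected. The helper `collect_by_location` and the stand-in `fake_fetch` are hypothetical names introduced here for illustration; nothing in this block calls GetOldTweets3 or the network.

```python
from typing import Callable, Dict, List


def collect_by_location(
    locations: List[str],
    fetch: Callable[[str], list],
) -> Dict[str, list]:
    """Call `fetch` once per location and key the results by location."""
    results = {}
    for location in locations:
        try:
            results[location] = fetch(location)
        except Exception as exc:  # keep going if one location fails
            print(f"{location}: request failed ({exc})")
            results[location] = []
    return results


# Stand-in fetcher so the sketch runs without network access.
def fake_fetch(location: str) -> list:
    if location == "porto":
        raise RuntimeError("simulated network error")
    return [f"tweet from {location}"]


demo = collect_by_location(["lisboa", "porto"], fake_fetch)
print({k: len(v) for k, v in demo.items()})  # {'lisboa': 1, 'porto': 0}
```

In the notebook, `fetch` would wrap the `TweetCriteria` construction and the `getTweets` call used in the cells above.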

In [101]:
# Build a long-format DataFrame: one row per (location, tweet).
# Shorter lists are padded with NaN by pd.Series, so those rows are dropped.
tweet_df = pd.DataFrame({k: pd.Series(v) for k, v in tweet_dict.items()})
tweet_df['tweet_count'] = tweet_df.index
tweet_df = pd.melt(tweet_df, id_vars=["tweet_count"], var_name='Country', value_name='got_criteria')
tweet_df = tweet_df.dropna()

In [102]:
print(len(tweet_df))
tweet_df.head()

307


Unnamed: 0,tweet_count,Country,got_criteria
0,0,lisboa,<GetOldTweets3.models.Tweet.Tweet object at 0x...
1,1,lisboa,<GetOldTweets3.models.Tweet.Tweet object at 0x...
2,2,lisboa,<GetOldTweets3.models.Tweet.Tweet object at 0x...
3,3,lisboa,<GetOldTweets3.models.Tweet.Tweet object at 0x...
4,4,lisboa,<GetOldTweets3.models.Tweet.Tweet object at 0x...
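
The same long layout can be built without the wide intermediate table and the NaN padding it produces. This is a minimal sketch using plain Python objects as stand-ins for the library's Tweet objects; the names `tweet_dict_demo` and the `location` key are illustrative choices, not part of the notebook.

```python
from types import SimpleNamespace

# Stand-in objects; real entries would be GetOldTweets3 Tweet objects.
tweet_dict_demo = {
    "lisboa": [SimpleNamespace(text="a"), SimpleNamespace(text="b")],
    "porto": [SimpleNamespace(text="c")],
}

# One dict per (location, tweet) pair, numbered within each location.
rows = [
    {"tweet_count": i, "location": location, "got_criteria": tweet}
    for location, tweets in tweet_dict_demo.items()
    for i, tweet in enumerate(tweets)
]
print(len(rows))  # 3
print([(r["location"], r["tweet_count"]) for r in rows])
```

Passing `rows` to `pd.DataFrame(rows)` yields one row per tweet directly, so no `dropna()` step is needed afterwards.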


In [105]:
tweet_df['got_criteria'][2].text

'Esquece o jejum. Isso faz mal à alma. A menos que permita um copo de branco fresquinho... Que por sua vez pede um aperitivo... que por sua vez pede... Ubereats'

# Extract Info

In [106]:
# Extract selected Tweet attributes into new DataFrame columns
def get_twitter_info(df):
    """Return a copy of df with body, date, hashtags and link columns added."""
    df = df.copy()
    df["body"] = df["got_criteria"].apply(lambda x: x.text)
    df["date"] = df["got_criteria"].apply(lambda x: x.date)
    df["hashtags"] = df["got_criteria"].apply(lambda x: x.hashtags)
    df["link"] = df["got_criteria"].apply(lambda x: x.permalink)
    return df

tweet_df = get_twitter_info(tweet_df)

In [34]:
# # Extract the text body
# def get_text(x):
#     return x.text

# tweet_df['body'] = tweet_df['got_criteria'].apply(get_text)

# tweet_df.head()

Unnamed: 0,tweet_count,Country,got_criteria,body
0,0,portugal,<GetOldTweets3.models.Tweet.Tweet object at 0x...,Máscara será obrigatória em ambientes fechados...
1,1,portugal,<GetOldTweets3.models.Tweet.Tweet object at 0x...,Covid-19: fase final dos ensaios clínicos de v...
2,2,portugal,<GetOldTweets3.models.Tweet.Tweet object at 0x...,Saúde: Covid-19: O que mais preocupa as crianç...
3,3,portugal,<GetOldTweets3.models.Tweet.Tweet object at 0x...,Saúde: Covid-19: O que mais preocupa as crianç...
4,4,portugal,<GetOldTweets3.models.Tweet.Tweet object at 0x...,Covid-19: Governo não contabiliza prazos duran...


### Tweet: Model class that describes a specific tweet.

- id (str)
- permalink (str)
- username (str)
- to (str)
- text (str)
- date (datetime) in UTC
- retweets (int)
- favorites (int)
- mentions (str)
- hashtags (str)
- geo (str)
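
The attributes listed above can be copied into a plain dict in one pass, which is convenient for building a table with one column per attribute. This is a minimal sketch: `TWEET_FIELDS` and `tweet_to_record` are hypothetical names, and `fake_tweet` is a stand-in with made-up values rather than a real Tweet object.

```python
from datetime import datetime, timezone
from types import SimpleNamespace

# Attribute names taken from the list above.
TWEET_FIELDS = ("id", "permalink", "username", "to", "text", "date",
                "retweets", "favorites", "mentions", "hashtags", "geo")


def tweet_to_record(tweet) -> dict:
    """Copy the listed attributes into a dict; missing ones become None."""
    return {name: getattr(tweet, name, None) for name in TWEET_FIELDS}


# Stand-in object with made-up values; `to` is deliberately left out.
fake_tweet = SimpleNamespace(
    id="1", permalink="https://example.com/status/1", username="someone",
    text="Ubereats", date=datetime(2020, 4, 1, tzinfo=timezone.utc),
    retweets=0, favorites=2, mentions="", hashtags="", geo="",
)
record = tweet_to_record(fake_tweet)
print(record["text"], record["to"])  # Ubereats None
```

A list of such records, one per tweet, can be passed to `pd.DataFrame` to get every attribute as a column.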

# To do:
1. Retrieve the data and put it in the df
2. Data cleaning following the same standard as the sentiment analysis tutorial
3. Run the sentiment analysis algorithm on these new tweets -> label positive or negative
4. Clustering of the tweets (unsupervised) -> see the unsupervised learning class
5. Dashboard? -> investing.com