# Get Old Tweets
Download old tweets with GetOldTweets3, a Python 3 library and companion command-line utility for accessing old tweets.
https://pypi.org/project/GetOldTweets3/

Useful when you need:
- Tweets older than one week (the Twitter API only serves Tweets from the past week)
- A large volume of Tweets (the Twitter API caps downloads at around 3,000 Tweets)

In [1]:
# pip install GetOldTweets3

In [2]:
import GetOldTweets3 as got
import pandas as pd

### TweetCriteria: A collection of search parameters to be used together with TweetManager.

- setUsername (str or iterable): One or more optional usernames of Twitter accounts (with or without "@").
- setSince (str, "yyyy-mm-dd"): A lower bound date (UTC) to restrict the search.
- setUntil (str, "yyyy-mm-dd"): An upper bound date (not included) to restrict the search.
- setQuerySearch (str): A query text to be matched.
- setTopTweets (bool): If True, only the Top Tweets will be retrieved.
- setNear (str): A reference location area from where tweets were generated.
- setWithin (str): A distance radius from the "near" location (e.g. 15mi).
- setMaxTweets (int): The maximum number of tweets to be retrieved. If this number is unset or lower than 1, all possible tweets will be retrieved.
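
Because `setSince` and `setUntil` take plain "yyyy-mm-dd" strings and the upper bound is excluded, it can help to check a date window before building the criteria. The following is a minimal sketch using only the standard library; `validate_window` is a hypothetical helper name and is not part of GetOldTweets3.

```python
from datetime import date


def validate_window(since: str, until: str) -> int:
    """Return the number of days covered by the half-open window [since, until).

    Hypothetical helper (not part of GetOldTweets3): parses both strings as
    ISO dates and checks that `since` comes before `until`.
    """
    start = date.fromisoformat(since)  # raises ValueError on a malformed date
    end = date.fromisoformat(until)
    if start >= end:
        raise ValueError(f"since ({since}) must be earlier than until ({until})")
    # `until` is excluded, so the window covers exactly (end - start) days
    return (end - start).days


days = validate_window("2015-05-01", "2015-09-30")
print(days)
```

The same two strings can then be passed unchanged to `setSince` and `setUntil`.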

### Get tweets by query search

In [3]:
# Example:
tweetCriteria = got.manager.TweetCriteria().setQuerySearch('europe refugees')\
                                           .setSince("2015-05-01")\
                                           .setUntil("2015-09-30")\
                                           .setMaxTweets(1)
tweet = got.manager.TweetManager.getTweets(tweetCriteria)[0]
print(tweet.text)

[106] Europe's Refugees &amp; American Elections w/ Chris Hedges https://youtu.be/GYqgj3l4r18 via @YouTube
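
The printed text above contains the HTML entity `&amp;`. If such entities are unwanted, the standard library's `html.unescape` can decode them. A minimal sketch, using the output above as a literal string rather than a live request:

```python
import html

# Literal copy of the start of the tweet text printed above
raw_text = "[106] Europe's Refugees &amp; American Elections w/ Chris Hedges"

# Decode HTML entities such as &amp; back into plain characters
clean_text = html.unescape(raw_text)
print(clean_text)
```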


### Defining criteria

In [4]:
country = 'portugal'
query_search = 'Uber Eats'
start_date = '2020-03-15'
end_date = '2020-07-15'
max_tweets = 10000

### Requesting by criteria

In [5]:
tweetCriteria = got.manager.TweetCriteria().setQuerySearch(query_search)\
                                           .setSince(start_date)\
                                           .setUntil(end_date)\
                                           .setMaxTweets(max_tweets)\
                                           .setEmoji("unicode")\
                                           .setNear(country)

tweet = got.manager.TweetManager.getTweets(tweetCriteria)
print(len(tweet))

21
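
For a long date range, one option is to split it into shorter windows and issue one request per window. The sketch below only computes the windows; `split_window` is a hypothetical helper, and whether chunking actually helps is an assumption to check against your own runs. Each pair can be passed to `setSince` and `setUntil`, and because `setUntil` excludes its bound, consecutive windows do not overlap.

```python
from datetime import date, timedelta


def split_window(since: str, until: str, days: int = 30):
    """Yield (since, until) string pairs that together cover [since, until)."""
    if days < 1:
        raise ValueError("days must be at least 1")
    start = date.fromisoformat(since)
    end = date.fromisoformat(until)
    step = timedelta(days=days)
    while start < end:
        stop = min(start + step, end)  # the last window may be shorter
        yield start.isoformat(), stop.isoformat()
        start = stop


windows = list(split_window("2020-03-15", "2020-07-15", days=30))
for since, until in windows:
    print(since, until)
```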


# Searching in multiple locations

In [18]:
locations = ['New York', 'Los Angeles']
query_search = 'Uber Eats'
start_date = '2020-03-15'
end_date = '2020-07-15'
max_tweets = 10000

In [19]:
# Build one TweetCriteria per city
tweetCriteria_list = []
for location in locations:
    tweetCriteria = got.manager.TweetCriteria().setQuerySearch(query_search)\
                                               .setSince(start_date)\
                                               .setUntil(end_date)\
                                               .setMaxTweets(max_tweets)\
                                               .setEmoji("unicode")\
                                               .setNear(location)
    tweetCriteria_list.append(tweetCriteria)

In [20]:
# Fetch the tweets for each city and store them by city name
tweet_dict = {}
for criteria, location in zip(tweetCriteria_list, locations):
    tweets = got.manager.TweetManager.getTweets(criteria)
    tweet_dict[location] = tweets

In [21]:
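# Build a wide DataFrame with one column per city; pd.Series pads shorter columns with NaN
tweet_df = pd.DataFrame({k: pd.Series(v) for k, v in tweet_dict.items()})
tweet_df['tweet_count'] = tweet_df.index
# Reshape to long format: one row per (tweet_count, city) pair
tweet_df = pd.melt(tweet_df, id_vars=["tweet_count"], var_name='city', value_name='got_criteria')
# Drop the padding rows left over from the shorter columns
tweet_df = tweet_df.dropna()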
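
The wide-then-melt approach above pads the shorter city column with missing values that `dropna` then removes. An alternative sketch builds the long-format rows directly, so no padding is needed. Plain strings stand in for the `Tweet` objects here, and `tweet_dict_example` is a hypothetical stand-in for `tweet_dict`.

```python
# Hypothetical stand-in for tweet_dict: city name -> list of tweets
tweet_dict_example = {
    "New York": ["tweet a", "tweet b", "tweet c"],
    "Los Angeles": ["tweet d"],
}

# One record per tweet, numbered within its own city
records = [
    {"tweet_count": i, "city": city, "got_criteria": tweet}
    for city, tweets in tweet_dict_example.items()
    for i, tweet in enumerate(tweets)
]
print(len(records))
```

Passing `records` to `pd.DataFrame` should give the same three columns (`tweet_count`, `city`, `got_criteria`), though the row index will differ from the melted version.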

In [22]:
print(len(tweet_df))
tweet_df.head()

3412


| | tweet_count | city | got_criteria |
|---|---|---|---|
| 0 | 0 | New York | `<GetOldTweets3.models.Tweet.Tweet object at 0x...` |
| 1 | 1 | New York | `<GetOldTweets3.models.Tweet.Tweet object at 0x...` |
| 2 | 2 | New York | `<GetOldTweets3.models.Tweet.Tweet object at 0x...` |
| 3 | 3 | New York | `<GetOldTweets3.models.Tweet.Tweet object at 0x...` |
| 4 | 4 | New York | `<GetOldTweets3.models.Tweet.Tweet object at 0x...` |


In [29]:
tweet_df['got_criteria'][1].text

'Yea they’re terrible smh. @UberEats @Uber_Support'

# Extract Info

### Tweet: Model class that describes a specific tweet.

- id (str)
- permalink (str)
- username (str)
- to (str)
- text (str)
- date (datetime) in UTC
- retweets (int)
- favorites (int)
- mentions (str)
- hashtags (str)
- geo (str)
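
The attributes listed above can be collected into a plain dictionary per tweet, which is convenient when building a table. A minimal sketch: `tweet_to_row` is a hypothetical helper, and `SimpleNamespace` stands in for a real `Tweet` object so the example runs without a network request.

```python
from types import SimpleNamespace

# Attribute names taken from the Tweet model list above
FIELDS = ("id", "permalink", "username", "to", "text", "date",
          "retweets", "favorites", "mentions", "hashtags", "geo")


def tweet_to_row(tweet, fields=FIELDS):
    """Return a dict of the requested attributes (hypothetical helper)."""
    # getattr with a default keeps the row complete if an attribute is missing
    return {name: getattr(tweet, name, None) for name in fields}


# Stand-in object with only a few attributes set
fake_tweet = SimpleNamespace(username="example_user", text="hello", retweets=0)
row = tweet_to_row(fake_tweet)
print(row["text"], row["geo"])
```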

In [30]:
# Extract attributes from the Tweet objects into DataFrame columns
def get_twitter_info():
    """Add body, date, hashtags, link and geo columns to the global tweet_df in place."""
    tweet_df["body"] = tweet_df["got_criteria"].apply(lambda x: x.text)
    tweet_df["date"] = tweet_df["got_criteria"].apply(lambda x: x.date)
    tweet_df["hashtags"] = tweet_df["got_criteria"].apply(lambda x: x.hashtags)
    tweet_df["link"] = tweet_df["got_criteria"].apply(lambda x: x.permalink)
    tweet_df["geo"] = tweet_df["got_criteria"].apply(lambda x: x.geo)

In [31]:
get_twitter_info()
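# Drop the raw Tweet objects; the positional axis argument is not accepted by recent pandas versions
tweet_df = tweet_df.drop(columns="got_criteria")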
tweet_df.head()

| | tweet_count | city | body | date | hashtags | link | geo |
|---|---|---|---|---|---|---|---|
| 0 | 0 | New York | @UberEats Promo Code never works for 50%. what... | 2020-07-14 23:57:38+00:00 | | https://twitter.com/FashionsWeek/status/128318... | |
| 1 | 1 | New York | Yea they’re terrible smh. @UberEats @Uber_Support | 2020-07-14 23:36:33+00:00 | | https://twitter.com/kensthetic_/status/1283183... | |
| 2 | 2 | New York | @UberEats when are you coming to upstate ny? | 2020-07-14 23:33:33+00:00 | | https://twitter.com/Thebobover/status/12831828... | |
| 3 | 3 | New York | @diginn why does the dig inn app and @UberEats... | 2020-07-14 23:30:41+00:00 | | https://twitter.com/Hot4TaterTots/status/12831... | |
| 4 | 4 | New York | For my brother to have his Uber job back! He h... | 2020-07-14 23:04:47+00:00 | | https://twitter.com/Rayofshine69/status/128317... | |


In [32]:
tweet_df.to_csv('old_tweets.csv',index=False)
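
When the CSV is read back, the `date` column arrives as text such as `2020-07-14 23:33:33+00:00`. Below is a minimal standard-library sketch of parsing it back into a timezone-aware `datetime`. The in-memory CSV is a stand-in for `old_tweets.csv` and reuses one row from the output above with a reduced set of columns.

```python
import csv
import io
from datetime import datetime

# Stand-in for the contents of old_tweets.csv (one row, fewer columns)
csv_text = (
    "tweet_count,city,body,date\n"
    "2,New York,@UberEats when are you coming to upstate ny?,2020-07-14 23:33:33+00:00\n"
)

rows = list(csv.DictReader(io.StringIO(csv_text)))
for row in rows:
    # fromisoformat understands the "YYYY-MM-DD HH:MM:SS+00:00" layout written by to_csv
    row["date"] = datetime.fromisoformat(row["date"])

print(rows[0]["date"].isoformat())
```

With pandas, `pd.read_csv('old_tweets.csv', parse_dates=['date'])` is the usual shortcut for the same conversion.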