This is a tweet-scraping script based heavily on the example script found here: https://gist.github.com/bjmarsh/315a632aa1ab0e8436e631f8a1acf40b, originally created by Bennett Marsh.

In [1]:
from collections import defaultdict
import os, sys
import time
import pandas as pd
import GetOldTweets3 as got


In [3]:
os.makedirs('tweet_data', exist_ok=True)
users = ["elonmusk"]
username = users[0]

In [18]:
count = 10
# Build the query object
tweetCriteria = got.manager.TweetCriteria().setUsername(username)\
                                               .setMaxTweets(count)\
                                               .setSince("2020-05-30")\
                                               .setUntil("2020-05-31")
tweets = None
for ntries in range(2):        
    try:
        tweets = got.manager.TweetManager.getTweets(tweetCriteria)
    except SystemExit:
        print("Trying again in 15 minutes.")
        time.sleep(15*60)
    else:
        break
if tweets is None:
    print("Failed after 2 tries, quitting!")
    exit(1)

In [19]:
len(tweets)

4

In [9]:
tweets[0]

<GetOldTweets3.models.Tweet.Tweet at 0x115037fa0>

Got it: getTweets() returns a list of Tweet objects. There is no docstring on the GetOldTweets3 Tweet object, though.
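
With no docstring to read, one way to see what an object carries is to list its instance attributes. The sketch below assumes the Tweet object stores its fields as plain instance attributes; `public_attributes` and `FakeTweet` are hypothetical names used only for illustration, and `FakeTweet` is a stand-in, not the real GetOldTweets3 class.

```python
def public_attributes(obj):
    """Return a dict of an object's public instance attributes and their values."""
    return {
        name: value
        for name, value in vars(obj).items()
        if not name.startswith("_")
    }


class FakeTweet:
    """Hypothetical stand-in for a scraped tweet object (not the real class)."""

    def __init__(self):
        self.id = "1267160409498357764"
        self.to = "NASASpaceflight"
        self._cache = None  # private attributes are filtered out


# List the attribute names available on the object.
print(sorted(public_attributes(FakeTweet())))
```

In the notebook the same helper could be tried on `tweets[0]` to see which fields are available.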

In [12]:
tweets[0].id
tweets[0].to

'NASASpaceflight'

Bennett's original script does just fine at gathering up all of Elon's tweets. I'd like a record of the semantic content of the tweet/reply conversations that Elon has with his followers, but Twitter's API does not make this properly available. I'm going to have to fudge it, but I think this algorithm will do a decent job of capturing at least some of those conversations: search a secondary user's tweets within a time range and keep the ones that reply to or mention @elonmusk.

In [20]:
def get_other_user_reply(username,t_init,t_final): 
    # Search a secondary user's tweets between t_init and t_final (YYYY-MM-DD strings)
    # and return those that either reply to or mention @elonmusk.
    print(username)
    count = 0  # 0 means no cap on the number of tweets returned
    # Build the query object
    tweetCriteria = got.manager.TweetCriteria().setUsername(username)\
                                               .setMaxTweets(count)\
                                               .setSince(t_init)\
                                               .setUntil(t_final)
    # Fetch the matching tweets, retrying if the request fails
    tweets = None
    for ntries in range(5):
        try:
            tweets = got.manager.TweetManager.getTweets(tweetCriteria)
        except SystemExit:
            print("Trying again in 15 minutes.")
            time.sleep(15*60)
        else:
            break
    if tweets is None:
        print("Failed after 5 tries, quitting!")
        exit(1)

    data = defaultdict(list)
    for t in tweets:
        if t.to == 'elonmusk' or '@elonmusk' in t.mentions.split():
            data["username"].append(username)
            data["tweet_id"].append(t.id)
            data["reply_to"].append(t.to)
            data["date"].append(t.date)
            data["retweets"].append(t.retweets)
            data["favorites"].append(t.favorites)
            data["hashtags"].append(list(set(t.hashtags.split())))
            data["mentions"].append(t.mentions)
            data["text"].append(t.text)
            data["permalink"].append(t.permalink)
    return data
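
The mentions field can hold several handles in one space-separated string, so the check for @elonmusk needs to compare handle by handle rather than against the whole string. A minimal sketch of such a check follows; `mentions_user` is a hypothetical helper, not part of GetOldTweets3, and it assumes handles are separated by whitespace and should match case-insensitively.

```python
def mentions_user(mentions, handle):
    """Return True if `handle` is one of the space-separated handles in `mentions`."""
    # Missing values (None, NaN read back from a CSV) are not strings.
    if not isinstance(mentions, str):
        return False
    target = "@" + handle.lstrip("@").lower()
    # Compare whole tokens so '@elonmusk_fan' does not count as '@elonmusk'.
    return target in (m.lower() for m in mentions.split())


print(mentions_user("@NASA @elonmusk", "elonmusk"))
print(mentions_user("@elonmusk_fan", "elonmusk"))
```

Matching whole tokens avoids false positives from handles that merely start with the same characters.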

In [30]:
elon_tweets_df = pd.read_csv('./elonmusk.csv')

In [31]:
elon_tweets_df.columns

Index(['Unnamed: 0', 'username', 'tweet_id', 'reply_to', 'date', 'retweets',
       'favorites', 'hashtags', 'mentions', 'text', 'permalink'],
      dtype='object')

In [32]:
# Parse 'date' into a new timezone-aware 'Time' column; the time of day is kept.
# (Append .dt.date to strip the time information.)
elon_tweets_df['Time'] = pd.to_datetime(elon_tweets_df['date'])

In [33]:
elon_tweets_df.dtypes

Unnamed: 0                  int64
username                   object
tweet_id                    int64
reply_to                   object
date                       object
retweets                    int64
favorites                   int64
hashtags                   object
mentions                   object
text                       object
permalink                  object
Time          datetime64[ns, UTC]
dtype: object
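
If only the calendar day is needed, there are two common ways to strip the time of day from a parsed column. The sketch below uses a small made-up frame with the same `date` format as the CSV; the column names `day` and `day_ts` are assumptions chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"date": ["2020-05-31 19:46:25+00:00", "2020-05-30 19:17:55+00:00"]}
)
df["Time"] = pd.to_datetime(df["date"])

# Option 1: plain Python date objects (the column becomes dtype object).
df["day"] = df["Time"].dt.date

# Option 2: midnight timestamps; the column stays timezone-aware datetime.
df["day_ts"] = df["Time"].dt.normalize()

print(df[["day", "day_ts"]])
```

The second option keeps the datetime dtype, which is convenient if the column will later be used for date arithmetic or resampling.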

In [35]:
elon_tweets_df = elon_tweets_df.drop(['Unnamed: 0','date'],axis='columns')

In [36]:
elon_tweets_df.index

RangeIndex(start=0, stop=9807, step=1)

In [48]:
elon_tweets_df.head(15)

,username,tweet_id,reply_to,retweets,favorites,hashtags,mentions,text,permalink,Time
0,elonmusk,1267180654896254976,SpaceX,22581,250519,[],,Nine years later,https://twitter.com/elonmusk/status/1267180654...,2020-05-31 19:46:25+00:00
1,elonmusk,1267160409498357764,NASASpaceflight,81,2494,[],,Must be due to relativistic aging,https://twitter.com/elonmusk/status/1267160409...,2020-05-31 18:25:58+00:00
2,elonmusk,1267157474886455296,NASASpaceflight,708,14436,[],,Brought home by same person who placed it ther...,https://twitter.com/elonmusk/status/1267157474...,2020-05-31 18:14:19+00:00
3,elonmusk,1267156817295085575,Rogozin,1209,7558,[],,"Спасибо, сэр, ха-ха. Мы рассчитываем на взаимо...",https://twitter.com/elonmusk/status/1267156817...,2020-05-31 18:11:42+00:00
4,elonmusk,1267146619562201090,SpaceX,5576,67423,[],@Space_Station,Congratulations Bob & Doug on docking & hatch ...,https://twitter.com/elonmusk/status/1267146619...,2020-05-31 17:31:11+00:00
5,elonmusk,1267057495773675521,TeslaGong,81,3948,[],,Sure,https://twitter.com/elonmusk/status/1267057495...,2020-05-31 11:37:02+00:00
6,elonmusk,1267056905601638404,TeslaTested,1650,84762,[],,Probably,https://twitter.com/elonmusk/status/1267056905...,2020-05-31 11:34:41+00:00
7,elonmusk,1267056312497721344,SpaceX,16259,149590,[],@Space_Station,Dragon docks with @Space_Station in ~3 hours,https://twitter.com/elonmusk/status/1267056312...,2020-05-31 11:32:20+00:00
8,elonmusk,1266890648587776003,NASA,4042,64610,[],,Dragonship Endeavor,https://twitter.com/elonmusk/status/1266890648...,2020-05-31 00:34:02+00:00
9,elonmusk,1266811094527508481,,54238,862612,[],,5 mins to T-0,https://twitter.com/elonmusk/status/1266811094...,2020-05-30 19:17:55+00:00


In [51]:
elon_replies_df = elon_tweets_df.loc[elon_tweets_df['reply_to'].notna()]
elon_mentions_df = elon_tweets_df.loc[elon_tweets_df['mentions'].notna()]

In [57]:
elon_mentions_df.index

Int64Index([   4,    7,   31,   63,  187,  209,  360,  381,  392,  429,
            ...
            9739, 9755, 9756, 9761, 9762, 9777, 9778, 9779, 9780, 9795],
           dtype='int64', length=821)
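
One possible next step is to turn each of Elon's replies into a search window for the other user's timeline. The sketch below is an assumption about how that could look, not the notebook's actual method: `reply_windows` and `days_before` are hypothetical names, and the window (one day before the reply through the day after it) is an arbitrary choice.

```python
import pandas as pd


def reply_windows(replies, days_before=1):
    """Yield (handle, since, until) date strings for each reply row.

    `replies` needs a 'reply_to' column and a datetime 'Time' column.
    """
    for handle, ts in zip(replies["reply_to"], replies["Time"]):
        since = (ts - pd.Timedelta(days=days_before)).strftime("%Y-%m-%d")
        # One extra day so the day of the reply itself falls inside the window.
        until = (ts + pd.Timedelta(days=1)).strftime("%Y-%m-%d")
        yield handle, since, until


# Made-up single-row sample in the same shape as elon_replies_df.
sample = pd.DataFrame(
    {
        "reply_to": ["NASASpaceflight"],
        "Time": pd.to_datetime(["2020-05-31 18:25:58+00:00"]),
    }
)
for handle, since, until in reply_windows(sample):
    print(handle, since, until)
```

Each tuple could then be passed to `get_other_user_reply(handle, since, until)`; deduplicating identical windows first would avoid repeated queries for the same user and day.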