In [1]:
"""
To run as a standalone script, set your CONSUMER_KEY and CONSUMER_SECRET. To
call search from code, pass in your credentials to the search_twitter function.

Script to fetch a twitter search of tweets into a directory. Fetches all available
tweet history accessible by the application (7 days historical).

## Operation

Search fetches tweets in pages of 100 from the most recent tweet backwards.
Thus, you could fetch just the most recent few by interrupting the script at
any time.

By default tweets will be fetched into a zip file containing one .json file per
tweets. The --nozip flag will result in .json files being writting directly to
the output directory.

## Subsequent search execution

In case of interrupted searches, you may continue where you left off:

On subsequent runs of the same query, search will check for existing tweets in
the output directory and will pick up where it left off at the lowest tweet ID,
and again work backwards in pages through the remaining history.

Thus, in order to execute a full query from scratch, be sure to remove any
existing tweets from the relevant output directory -- but note that some of the
oldest tweets may no longer be available for a fresh search.

During subsequent runs of a query you may also use the --new flag wich will
cause the search to only fetch tweets newer than those currently in the
output directory.

Search will throttle at 440 requests per 15 minutes to keep it safely under the
designated 450 allowed as per the Twitter docs here:
https://developer.twitter.com/en/docs/tweets/search/api-reference/get-search-tweets.html
"""

'\nTo run as a standalone script, set your CONSUMER_KEY and CONSUMER_SECRET. To\ncall search from code, pass in your credentials to the search_twitter function.\n\nScript to fetch a twitter search of tweets into a directory. Fetches all available\ntweet history accessible by the application (7 days historical).\n\n## Operation\n\nSearch fetches tweets in pages of 100 from the most recent tweet backwards.\nThus, you could fetch just the most recent few by interrupting the script at\nany time.\n\nBy default tweets will be fetched into a zip file containing one .json file per\ntweets. The --nozip flag will result in .json files being writting directly to\nthe output directory.\n\n## Subsequent search execution\n\nIn case of interrupted searches, you may continue where you left off:\n\nOn subsequent runs of the same query, search will check for existing tweets in\nthe output directory and will pick up where it left off at the lowest tweet ID,\nand again work backwards in pages through the 

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Go to this URL in a browser: https://accounts.google.com/o/oauth2/auth?client_id=947318989803-6bn6qk8qdgf4n4g3pfee6491hc0brc4i.apps.googleusercontent.com&redirect_uri=urn%3aietf%3awg%3aoauth%3a2.0%3aoob&response_type=code&scope=email%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdocs.test%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive%20https%3a%2f%2fwww.googleapis.com%2fauth%2fdrive.photos.readonly%20https%3a%2f%2fwww.googleapis.com%2fauth%2fpeopleapi.readonly

Enter your authorization code:
··········
Mounted at /content/drive


In [3]:
!pip install birdy
!pip install ratelimiter


Collecting birdy
  Downloading https://files.pythonhosted.org/packages/cc/30/3f825b8d4248ebd9de9d218ba4b931c93be664e077c328c4b6dd19eb9d8a/birdy-0.3.2.tar.gz
Building wheels for collected packages: birdy
  Building wheel for birdy (setup.py) ... [?25l[?25hdone
  Created wheel for birdy: filename=birdy-0.3.2-cp36-none-any.whl size=10853 sha256=6ce83512f97bbe4690e27e0f39179fbea926b3afe99500dd4c198d646f059c50
  Stored in directory: /root/.cache/pip/wheels/ad/f9/a7/928ef99a65cfa8182e42fb0a052b0a61faa69b7d085fae2723
Successfully built birdy
Installing collected packages: birdy
Successfully installed birdy-0.3.2
Collecting ratelimiter
  Downloading https://files.pythonhosted.org/packages/51/80/2164fa1e863ad52cc8d870855fba0fbb51edd943edffd516d54b5f6f8ff8/ratelimiter-1.2.0.post0-py3-none-any.whl
Installing collected packages: ratelimiter
Successfully installed ratelimiter-1.2.0.post0


In [0]:
import json, os, sys, time
from zipfile import ZipFile
from birdy.twitter import AppClient, UserClient, TwitterRateLimitError
from ratelimiter import RateLimiter


"""
Credentials can be found by selecting the "Keys and tokens" tab for your
application selected from:

https://developer.twitter.com/en/apps/
"""
CONSUMER_KEY = 'fyzZfPI1BYmz788aesPLmsrY3'
CONSUMER_SECRET = 'lQ3WnI3Hjm70jByOK5MpgirJpfM3EEVMIpf4jbEQcrWWZTBB6N'

# need to spcify output 
OUTPUT_DIR = '/content/drive/My Drive/Colab Notebooks/Network Analysis Project'
MAX_TWEETS = 10000 # max results for a search
max_id = None
_client = None


def client(consumer_key=None, consumer_secret=None):
    global _client
    if consumer_key is None:
        consumer_key = CONSUMER_KEY
    if consumer_secret is None:
        consumer_secret = CONSUMER_SECRET
    if _client is None:
        _client = AppClient(consumer_key, consumer_secret)
        access_token = _client.get_access_token()
        _client = AppClient(consumer_key, consumer_secret, access_token)
    return _client


def limited(until):
    duration = int(round(until - time.time()))
    print('Rate limited, sleeping for {:d} seconds'.format(duration))


@RateLimiter(max_calls=440, period=60*15, callback=limited)
def fetch_tweets(query, consumer_key=None, consumer_secret=None):
    global max_id
    print(f'Fetching: "{query}" TO MAX ID: {max_id}')
    try:
        tweets = client(consumer_key, consumer_secret).api.search.tweets.get(
            q=query,
            count=100,
            max_id=max_id).data['statuses']
    except TwitterRateLimitError:
        sys.exit("You've reached your Twitter API rate limit. "\
            "Wait 15 minutes before trying again")
    try:
        id_ = min([tweet['id'] for tweet in tweets])
    except ValueError:
        return None
    if max_id is None or id_ <= max_id:
        max_id = id_ - 1
    return tweets


def initialize_max_id(file_list):
    global max_id
    for fn in file_list:
        n = int(fn.split('.')[0])
        if max_id is None or n < max_id:
            max_id = n - 1
    if max_id is not None:
        print('Found previously fetched tweets. Setting max_id to %d' % max_id)


def halt(_id):
    print('Reached historically fetched ID: %d' % _id)
    print('In order to re-fetch older tweets, ' \
        'remove tweets from the output directory or output zip file.')
    sys.exit('\n!!IMPORTANT: Tweets older than 7 days will not be re-fetched')


def search_twitter(query, consumer_key=None, consumer_secret=None,
            newtweets=False, dozip=True, verbose=False):
    output_dir = os.path.join(OUTPUT_DIR, '_'.join(query.split()))
    if not os.path.exists(output_dir):
        os.makedirs(output_dir)
    if dozip:
        fn = os.path.join(output_dir, '%s.zip' % '_'.join(query.split()))
        outzip = ZipFile(fn, 'a')
    if not newtweets:
        if dozip:
            file_list = [f for f in outzip.namelist() if f.endswith('.json')]
        else:
            file_list = [f for f in os.listdir(output_dir) if f.endswith('.json')]
        initialize_max_id(file_list)
    while True:
        try:
            tweets = fetch_tweets(
                query,
                consumer_key=consumer_key,
                consumer_secret=consumer_secret)
            if tweets is None:
                print('Search Completed')
                if dozip:
                    outzip.close()
                break
            for tweet in tweets:
                if verbose:
                    print(tweet['id'])
                fn = '%d.json' % tweet['id']
                if dozip:
                    if fn in (file_list):
                        outzip.close()
                        halt(tweet['id'])
                    else:
                        outzip.writestr(fn, json.dumps(tweet, indent=4))
                        file_list.append(fn)
                else:
                    path = os.path.join(output_dir, fn)
                    if fn in (file_list):
                        halt(tweet['id'])
                    else:
                        with open(path, 'w') as outfile:
                            json.dump(tweet, outfile, indent=4)
                        file_list.append(fn)
                if len(file_list) >= MAX_TWEETS:
                    if fn in (file_list):
                        outzip.close()
                    # sys.exit('Reached maximum tweet limit of: %d' % MAX_TWEETS)
        except:
            if dozip:
                outzip.close()
            raise

In [18]:
search_twitter('Colorado Avalanche')

Fetching: "Colorado Avalanche" TO MAX ID: None
Fetching: "Colorado Avalanche" TO MAX ID: 1230725424369434629
Fetching: "Colorado Avalanche" TO MAX ID: 1230628482490798079
Fetching: "Colorado Avalanche" TO MAX ID: 1230474439780794368
Fetching: "Colorado Avalanche" TO MAX ID: 1230328751130632198
Fetching: "Colorado Avalanche" TO MAX ID: 1230256787338731522
Fetching: "Colorado Avalanche" TO MAX ID: 1230198045909880831
Fetching: "Colorado Avalanche" TO MAX ID: 1230130791687938047
Fetching: "Colorado Avalanche" TO MAX ID: 1229923671021621248
Fetching: "Colorado Avalanche" TO MAX ID: 1229886504035594242
Fetching: "Colorado Avalanche" TO MAX ID: 1229858456267567106
Fetching: "Colorado Avalanche" TO MAX ID: 1229794678481932287
Fetching: "Colorado Avalanche" TO MAX ID: 1229733563437809664
Fetching: "Colorado Avalanche" TO MAX ID: 1229632397232267263
Fetching: "Colorado Avalanche" TO MAX ID: 1229595164001423360
Fetching: "Colorado Avalanche" TO MAX ID: 1229581673634947073
Fetching: "Colorado Ava

In [15]:
search_twitter('St. Louis Blues')


Fetching: "St. Louis Blues" TO MAX ID: None
Fetching: "St. Louis Blues" TO MAX ID: 1230820770198294527
Fetching: "St. Louis Blues" TO MAX ID: 1230692890415968256
Fetching: "St. Louis Blues" TO MAX ID: 1230635655794954239
Fetching: "St. Louis Blues" TO MAX ID: 1230514707854852095
Fetching: "St. Louis Blues" TO MAX ID: 1230263872952684543
Fetching: "St. Louis Blues" TO MAX ID: 1230194972453937152
Fetching: "St. Louis Blues" TO MAX ID: 1230105910959378434
Fetching: "St. Louis Blues" TO MAX ID: 1229980592923652095
Fetching: "St. Louis Blues" TO MAX ID: 1229950009317216258
Fetching: "St. Louis Blues" TO MAX ID: 1229927638392852479
Fetching: "St. Louis Blues" TO MAX ID: 1229905478886608899
Fetching: "St. Louis Blues" TO MAX ID: 1229889429508173823
Fetching: "St. Louis Blues" TO MAX ID: 1229881749280370687
Fetching: "St. Louis Blues" TO MAX ID: 1229875215745744896
Fetching: "St. Louis Blues" TO MAX ID: 1229873235124260863
Fetching: "St. Louis Blues" TO MAX ID: 1229862299596312575
Fetching: "S

ValueError: ignored

In [20]:
search_twitter('Dallas Stars')

Fetching: "Dallas Stars" TO MAX ID: None
Fetching: "Dallas Stars" TO MAX ID: 1230856054965231615
Fetching: "Dallas Stars" TO MAX ID: 1230658726006317055
Fetching: "Dallas Stars" TO MAX ID: 1230573198699352063
Fetching: "Dallas Stars" TO MAX ID: 1230507131369029631
Fetching: "Dallas Stars" TO MAX ID: 1230355349837942784
Fetching: "Dallas Stars" TO MAX ID: 1230326845482115072
Fetching: "Dallas Stars" TO MAX ID: 1230307709444067328
Fetching: "Dallas Stars" TO MAX ID: 1230265667053596671
Fetching: "Dallas Stars" TO MAX ID: 1230212238180593663
Fetching: "Dallas Stars" TO MAX ID: 1230162464161062912
Fetching: "Dallas Stars" TO MAX ID: 1229959343329574911
Fetching: "Dallas Stars" TO MAX ID: 1229846163257483264
Fetching: "Dallas Stars" TO MAX ID: 1229672427220656128
Fetching: "Dallas Stars" TO MAX ID: 1229531018606018562
Fetching: "Dallas Stars" TO MAX ID: 1229436355949649919
Fetching: "Dallas Stars" TO MAX ID: 1229256371855556607
Fetching: "Dallas Stars" TO MAX ID: 1229220776953421830
Fetchin