In [13]:
import twitter
import datetime
import sys
import time

### Twitter API Authorization
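It is good practice to keep credentials in a separate text file rather than in the notebook itself, so that you cannot accidentally publish them when you push the notebook to GitHub. The code below expects `twitter_auth.txt` to hold the consumer key, consumer secret, OAuth token, and OAuth token secret, one per line and in that order.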

In [14]:
print("Authorizing...")

with open('twitter_auth.txt') as f:
    file_content = f.readlines()
    file_content = [x.strip() for x in file_content]

CONSUMER_KEY = file_content[0]
CONSUMER_SECRET = file_content[1]
OAUTH_TOKEN = file_content[2]
OAUTH_TOKEN_SECRET = file_content[3]

#twitter authorization
auth = twitter.oauth.OAuth(OAUTH_TOKEN, OAUTH_TOKEN_SECRET,
                           CONSUMER_KEY, CONSUMER_SECRET)

twitter_api = twitter.TwitterStream(auth=auth)
  
if (not twitter_api):
    print ("Can't Authenticate")
    sys.exit(-1)

print("Authorization successful")
    

Authorizing...
Authorization successful


### Streaming API
The streaming API rate limit is not published. For this API, you are limited not by the number of tweets but by the number of requests. A request is a single opening of the Twitter stream. If you have several queries, you can combine them into one request. If you need to keep track of which query each tweet came from, however, you will have to loop through the queries individually. This is when you might run into rate limiting, because each query then constitutes a separate request.

For example, if you want tweets that might be questions, you would run a single query (request) for any tweet containing one of the terms in `['?', 'where', 'what', 'how',...]`, since you don't need to know which term matched which tweet.

If you wanted 200 tweets for each of `['?', 'where', 'what', 'how']`, however, you might need to make a separate request for each term. There is also a limit on how many terms you can combine in one request.
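
One way to combine terms while still keeping per-term counts is to send all terms in a single `track` value (the streaming API treats commas as OR) and attribute each tweet to its terms locally. The sketch below is only an illustration: `build_track` and `matched_terms` are hypothetical helper names, `sample_texts` stands in for `tweet['text']` values, and the case-insensitive substring test is an assumption that only approximates Twitter's own matching rules.

```python
def build_track(terms):
    """Join terms into one `track` value; commas act as OR in the streaming API."""
    return ','.join(terms)


def matched_terms(text, terms):
    """Return the terms found in the text (case-insensitive substring match, an approximation)."""
    lowered = text.lower()
    return [term for term in terms if term.lower() in lowered]


terms = ['?', 'where', 'what', 'how']
track = build_track(terms)  # would be passed as statuses.filter(track=track)

# tally tweets per term locally instead of opening one stream per term
counts = {term: 0 for term in terms}
sample_texts = ['Where is everyone?', 'no match here', 'What now']  # stand-ins for tweet['text']
for text in sample_texts:
    for term in matched_terms(text, terms):
        counts[term] += 1
```

With this approach a single connection serves every term, at the cost of doing the matching yourself.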

### From [Twitter](https://dev.twitter.com/streaming/overview/connecting):
>Rate limiting
>Clients which do not implement backoff and attempt to reconnect as often as possible will have their connections rate limited for a small number of minutes. Rate limited clients will receive HTTP 420 responses for all connection requests.

>Clients which break a connection and then reconnect frequently (to change query parameters, for example) run the risk of being rate limited.

>Twitter does not make public the number of connection attempts which will cause a rate limiting to occur, but there is some tolerance for testing and development. A few dozen connection attempts from time to time will not trigger a limit. However, it is essential to stop further connection attempts for a few minutes if a HTTP 420 response is received. If your client is rate limited frequently, it is possible that your IP will be blocked from accessing Twitter for an indeterminate period of time.

Also:

>Back off exponentially for HTTP 420 errors. Start with a 1 minute wait and double each attempt. Note that every HTTP 420 received increases the time you must wait until rate limiting will no longer will be in effect for your account.
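
The doubling schedule described in the quote above can be sketched as a small generator. This is a minimal illustration, not part of the `twitter` library: `backoff_delays` is a hypothetical helper, and the `cap` on the maximum wait is an assumption added here so the delay cannot grow without bound.

```python
import itertools


def backoff_delays(start=60, factor=2, cap=960):
    """Yield wait times in seconds: start, start*factor, ..., never exceeding cap (cap is an assumption)."""
    delay = start
    while True:
        yield min(delay, cap)
        delay *= factor


# the first five waits after consecutive HTTP 420 responses
first_five = list(itertools.islice(backoff_delays(), 5))
```

Each value would be passed to `time.sleep` before the next connection attempt.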

In [15]:
# the queries we are going to run
qs = ['the', 'an', 'it', 'who', 'were']

# the twitter stream object
twitter_stream = twitter.TwitterStream(auth=twitter_api.auth)

# initialize the counters
requests = 0
backoff_timer = 60 # this is how long we'll sleep if we get rate limited
sleep_timer = 0 # this is how long we'll sleep after each query
uids = []

# we are going to iterate a few times to demonstrate
for iters in range(0,3):
    
    # for each query in the queries
    for q in qs:
        
        requests += 1 # count of requests made
        count=0 # count of tweets per query
        
        
        
        ### this is the chunk that handles rate limiting ################
        
        while True:
            try: # try to open the stream
                stream = twitter_stream.statuses.filter(track=q)
                break # the stream opened, so stop retrying
            
            except Exception as e: # if it doesn't work (i.e. we were limited)
                print('\nrate limited...sleeping for {0} seconds\n'.format(backoff_timer))
                sys.stdout.flush()
                time.sleep(backoff_timer) # 'back off' for a certain amount of time
                backoff_timer = backoff_timer * 2 # double the backoff timer
                
                ### since we got rate limited, we must not be sleeping long enough per request
                sleep_timer = sleep_timer + 2 # add 2 seconds to the sleep timer
                # the loop now retries the same query
            
        ###################################################################
        
        
        ### get user id's from our stream #################################
        
        for tweet in stream:
            try:
                uids.append(tweet['user']['id'])
            except (KeyError, TypeError): # not a tweet (e.g. a deletion notice), so skip it
                continue

            count += 1

            if count % 100 == 0: # for every hundred tweets we get

                print('Request {0} complete'.format(requests))
                print('100 tweets seen from "{0}"'.format(q))
                print(datetime.datetime.now())
                sys.stdout.flush()

                break # go to the next query
        ###################################################################
        
        
        time.sleep(sleep_timer) # this is the key to not getting rate limited

Request 1 complete
100 tweets seen from "the"
2015-11-09 17:18:25.279737
Request 2 complete
100 tweets seen from "an"
2015-11-09 17:18:29.610144
Request 3 complete
100 tweets seen from "it"
2015-11-09 17:18:37.037874
Request 4 complete
100 tweets seen from "who"
2015-11-09 17:18:47.391463
Request 5 complete
100 tweets seen from "were"
2015-11-09 17:18:56.465825
Request 6 complete
100 tweets seen from "the"
2015-11-09 17:19:06.781647

rate limited...sleeping for 60 seconds

Request 7 complete
100 tweets seen from "an"
2015-11-09 17:20:13.256494
Request 8 complete
100 tweets seen from "it"
2015-11-09 17:20:20.055611
Request 9 complete
100 tweets seen from "who"
2015-11-09 17:20:26.719127
Request 10 complete
100 tweets seen from "were"
2015-11-09 17:20:38.724599
Request 11 complete
100 tweets seen from "the"
2015-11-09 17:20:45.472743
Request 12 complete
100 tweets seen from "an"
2015-11-09 17:20:53.342037
Request 13 complete
100 tweets seen from "it"
2015-11-09 17:20:58.000863

rate limi

In [16]:
len(uids)

1500

### REST API

For user timelines and other requests from the REST API, the [rate limits are made public](https://dev.twitter.com/rest/public/rate-limits). These rate limits are defined as a maximum number of requests per 15 minute window. For pulling user timelines, the maximum is 180 requests per 15 minutes. There is also a maximum number of tweets you can get per request.
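
As a quick worked example of what that limit means in practice, 180 requests spread evenly over a 15 minute window works out to one request every 5 seconds. The helper name `min_interval` is hypothetical and only illustrates the arithmetic.

```python
WINDOW_SECONDS = 15 * 60  # length of one rate limit window
MAX_REQUESTS = 180        # user timeline limit per window, as stated above


def min_interval(max_requests=MAX_REQUESTS, window_seconds=WINDOW_SECONDS):
    """Seconds to wait between requests so they are spread evenly over the window."""
    return float(window_seconds) / max_requests


interval = min_interval()
```

Sleeping for `interval` seconds between requests is one way to pace them; sending requests in a batch and then waiting out the rest of the window is another.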

From [Twitter](https://dev.twitter.com/rest/reference/get/statuses/user_timeline) on the `count` parameter:
>Specifies the number of tweets to try and retrieve, **up to a maximum of 200 per distinct request**. The value of count is best thought of as a limit to the number of tweets to return because suspended or deleted content is removed after the count has been applied. We include retweets in the count, even if include_rts is not supplied. It is recommended you always send include_rts=1 when using this API method.

If more than 200 tweets are desired, pagination must be used. See [Working with Timelines](https://dev.twitter.com/rest/public/timelines).
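
A minimal sketch of that pagination, assuming the `max_id` approach of walking backwards through a timeline: each request asks only for tweets with an ID at or below one less than the oldest ID already seen. `fetch_timeline` and its `fetch_page` argument are hypothetical names; `fetch_page` is any callable that accepts `user_id`, `count`, and optionally `max_id` and returns a list of tweets.

```python
def fetch_timeline(fetch_page, user_id, total, page_size=200):
    """Collect up to `total` tweets, newest first, by paging backwards with max_id."""
    tweets = []
    max_id = None
    while len(tweets) < total:
        params = {'user_id': user_id,
                  'count': min(page_size, total - len(tweets))}
        if max_id is not None:
            params['max_id'] = max_id
        page = fetch_page(**params)
        if not page:  # nothing older is left
            break
        tweets.extend(page)
        # next page: only tweets strictly older than the oldest one seen so far
        max_id = min(tweet['id'] for tweet in page) - 1
    return tweets


# Assumed usage with a twitter.Twitter object named t (not run here):
# tweets = fetch_timeline(t.statuses.user_timeline, user_id=some_user_id, total=500)
```

Every page is a separate request, so each one counts against the rate limit.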

Twitter will occasionally refuse a request, and it helps to know why. The code below waits and retries when it is rate limited and prints any other error. For help with the error codes, see the [response codes reference](https://dev.twitter.com/overview/api/response-codes).

In [30]:
# get the twitter object
t = twitter.Twitter(auth=auth)

# initialize the counter
counter = 0

#initialize the start time
start = datetime.datetime.now()

# for each user (just the first 300)
for user in uids[:300]:
    counter += 1
    
    while True:
        try: # try to get a user's timeline
            tweets = t.statuses.user_timeline(user_id=user, count=100)
            break # success, so move on to the next user
        except twitter.TwitterHTTPError as e: # if Twitter rejects the request
            if e.e.code == 429: # the error code for rate limit with REST API is 429
                print('Rate Limited! sleeping for 1 minute')
                sys.stdout.flush()
                time.sleep(60) # then loop around and retry the same user
            else: # if not rate limit, print the error and user
                print('Failed on {0} with error {1}'.format(user,e.e.code))
                break
    
    # for every 170 requests (to stay below limit)
    if counter % 170 == 0:
        # get the elapsed time
        elapsed = datetime.datetime.now() - start
        # calculate how long to sleep (if it is negative, set to 0)
        # right now we make 170 requests every 16 minute window to stay below limit
        sleep_timer = max((16*60 - int(elapsed.total_seconds())),0)
        print('{0} requests made, sleeping for {1} seconds'.format(counter, sleep_timer))
        sys.stdout.flush()
        # sleep for exactly how long we need to
        time.sleep(sleep_timer)
        # reset the timer
        start = datetime.datetime.now()


170 requests made, sleeping for 796 seconds
Failed on 48218817 with error 401


Note that the 401 error means 'Not Authorized'. Since the same credentials worked for the other users, the most likely cause here is that this user's account is private (protected).