## Gather Follower Descriptions
We start with a Twitter user name (plus your Twitter API credentials) and pull all the followers of that user. We then pull the descriptions of those followers and write them to a file for future mining. A group can contain more than one user.

I wrote some functions for you in the file `twitter_functions.py`. That file needs to be in the same folder as this notebook.

In [17]:
from datetime import datetime
import json
from pprint import pprint

from twitter_functions import * # these are the functions I wrote for you.

The next cell loads your Twitter authorization credentials and then calls a function that initializes your connection to Twitter. The credentials come from `twitter_config.py`, which is expected to define a dictionary named `auth`. I've left my keys in a comment (slightly perturbed so that they won't actually work) so that you can see the form these take.

In [10]:
from twitter_config import *

#auth =  { "consumer_key": "xks2XTK4gr2PajPio1RBGWsYU",
#          "consumer_secret": "jkjkCjph2vx38uBVbVHLkhzesGVY6ZqEywXd3B0sDeSAWVcDNo",
#          "access_key": "33029025-i1Mm907o7BsKnufMIxjVzByKsuDEhOBb0yV3EAa1E",
#          "access_secret": "jkjkXqAijIwRQMZmW3b7AgpFXU6Ve0RU30fzsbzpfx9uf"
#        }

api = initialize_twitter(auth)
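
If you would rather not keep keys in a file at all, one option is to have `twitter_config.py` build the `auth` dictionary from environment variables. The following is only a sketch under that assumption: the environment variable names and the `load_auth` helper are hypothetical, and only the four dictionary keys come from the commented-out example above.

```python
import os

# Maps each key expected in the auth dictionary to a HYPOTHETICAL
# environment variable name. Rename these to suit your own setup.
_ENV_NAMES = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_key": "TWITTER_ACCESS_KEY",
    "access_secret": "TWITTER_ACCESS_SECRET",
}


def load_auth(environ=os.environ):
    """Build the auth dictionary from a mapping of environment variables."""
    missing = [name for name in _ENV_NAMES.values() if name not in environ]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {key: environ[name] for key, name in _ENV_NAMES.items()}


# In twitter_config.py you would then write:
# auth = load_auth()
```

Passing the mapping in as an argument keeps the helper easy to check with a plain dictionary before you touch your real environment.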

Now you set the handle (or handles) that represent one group or topic on Twitter. These should be in a list. The output file name (`ofile_name`) is built from today's date and the first element in the list. Feel free to modify it.

In [11]:
starting_user = ['GeneralMills','kraftfoods'] # my first group
#starting_user = ['michaelpollan'] # my second group

ofile_name = (datetime.today().strftime("%Y%m%d") + "_" + 
             starting_user[0] + "_" + # Just take the first one if there are multiple
             "followers.txt")

In [12]:
# We'll now look up the full information on your starting user(s).
starting_user_id = []

# all_records is a dictionary with the Twitter ID as the key and
# a UserRecord as the value. UserRecord is a named tuple I created.
all_records = lookup_users_from_handles(api, starting_user)

# We need the IDs that we're getting followers from in a list.
for user_id in all_records : # iterating over a dict yields its keys, which are the IDs
    starting_user_id.append(user_id)

Start lookup_users_from_ids on 2 handles.
20181106-000057: looking up user records for 2 handles.
20181106-000057:  users pulled:  2
total failures: 0
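
To make the shape of these records concrete, here is a small stand-in built with made-up data. It is a sketch only: the field list is an assumption (this notebook confirms just `followers_count`), and the real `UserRecord` in `twitter_functions.py` may carry different fields.

```python
from collections import namedtuple

# HYPOTHETICAL field list. Only followers_count is confirmed by this notebook.
UserRecord = namedtuple(
    "UserRecord",
    ["id", "screen_name", "description", "followers_count"],
)

# Made-up records keyed by ID, mirroring the layout described for all_records.
example_records = {
    1001: UserRecord(1001, "example_one", "Cereal fan.", 250),
    1002: UserRecord(1002, "example_two", "Home cook.", 40),
}

# Iterating over a dictionary yields its keys, so this is the list of IDs.
example_ids = list(example_records)

# Fields are read by name, which is what the follower count cell relies on.
example_total = sum(rec.followers_count for rec in example_records.values())
```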


In [13]:
# How long is it going to take us to pull these followers?
total_followers = 0
for id, rec in all_records.items() :
    total_followers += rec.followers_count
    
print("Ooh, {fol:d} followers. A complete run with no limits is ".format(fol=total_followers) +
      "going to take {min:.2f} minutes ({hour:.2f} hours)".format(min=total_followers/5000,
                                                                  hour=total_followers/(60*5000)))

Ooh, 93217 followers. A complete run with no limits is going to take 18.64 minutes (0.31 hours)


In [14]:
# Now let's pull all the followers of our starting_user.
# The function I wrote lets you cap the number of followers you pull
# and uses the ID to generate the query.
#
# Note that this pull is subject to rate limiting. You can make 15 calls per
# 15-minute window, and each call can return up to 5000 user IDs.
followers_of_starting = gather_followers(api,
                                         starting_user_id,
                                         follower_limit=None) # Modify this limit if you need to. 
                                                              # Set it to "None" to get all   

# followers_of_starting will be a dictionary with the key being the id(s) in starting_user_id
# and the value is a list of all the followers.

Pulling followers for 280557152
Rate limit reached. Sleeping for: 770
Number pulled: 5000
Number pulled: 10000
Number pulled: 15000
Number pulled: 20000
Number pulled: 25000
Rate limit reached. Sleeping for: 898
Number pulled: 30000
Number pulled: 35000
Number pulled: 40000
Number pulled: 45000
Number pulled: 50000
Number pulled: 55000
Number pulled: 60000
Number pulled: 65000
Rate limit reached. Sleeping for: 898
Number pulled: 70000
Number pulled: 75000
Number pulled: 80000
Number pulled: 85000
Number pulled: 90000
Number pulled: 93217
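
The capping behavior can be pictured as follows. This is a sketch under stated assumptions, not the code inside `gather_followers`: `collect_follower_ids` is a hypothetical helper, and `pages` stands in for the pages of up to 5000 IDs that the API returns one call at a time.

```python
def collect_follower_ids(pages, follower_limit=None):
    """Collect IDs page by page, stopping once follower_limit is reached.

    pages is any iterable of lists of IDs. Because it is consumed lazily,
    no further page is requested after the limit has been met.
    """
    collected = []
    for page in pages:
        collected.extend(page)
        if follower_limit is not None and len(collected) >= follower_limit:
            # Trim the overshoot from the final page.
            return collected[:follower_limit]
    return collected
```

With `follower_limit=None` every page is consumed, which matches the "get all" setting used in the cell above.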


In [15]:
# And now we'll "hydrate" these user records.
for start_id, list_of_followers in followers_of_starting.items() :
    
    # Using a set here instead of a list so that we pull each ID only once.
    ids_to_hydrate = {user_id for user_id in list_of_followers if user_id not in all_records}
    
    these_records = lookup_users_from_ids(api, ids=ids_to_hydrate)

    # Merge the newly hydrated records into the main dictionary.
    all_records.update(these_records)


Start lookup_users_from_ids on 93217 IDs.
20181106-004404: looking up user records for 100 IDs.
20181106-004405: looking up user records for 100 IDs.
20181106-004406: looking up user records for 100 IDs.
20181106-004407: looking up user records for 100 IDs.
20181106-004409: looking up user records for 100 IDs.
20181106-004410: looking up user records for 100 IDs.
20181106-004411: looking up user records for 100 IDs.
20181106-004412: looking up user records for 100 IDs.
20181106-004413: looking up user records for 100 IDs.
20181106-004415: looking up user records for 100 IDs.
20181106-004416: looking up user records for 100 IDs.
20181106-004417: looking up user records for 100 IDs.
20181106-004418: looking up user records for 100 IDs.
20181106-004420: looking up user records for 100 IDs.
20181106-004421: looking up user records for 100 IDs.
20181106-004422: looking up user records for 100 IDs.
20181106-004423: looking up user records for 100 IDs.
20181106-004424: looking up user records
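
The log shows the IDs being looked up 100 at a time. A minimal sketch of that kind of batching is below. It assumes nothing about how `lookup_users_from_ids` is actually written, and `batches` is a hypothetical helper name.

```python
from itertools import islice


def batches(ids, batch_size=100):
    """Yield successive lists of at most batch_size items from ids.

    Works with any iterable, including the set of IDs built above.
    """
    iterator = iter(ids)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

At 100 IDs per request, the 93,217 IDs in this run would need 933 requests, the last of which carries only 17 IDs.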

In [16]:
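# Now let's write out all the records. I wrote some functions to help.
# The explicit encoding keeps non-ASCII descriptions from failing on platforms
# whose default encoding is not UTF-8.
with open(ofile_name, 'w', encoding='utf-8') as ofile :
    write_user_rec_headers(ofile)
    for rec in all_records.values() :
        write_user_rec(ofile, rec)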
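
If you want to write the file yourself, one way is a tab-delimited layout with one record per line. The sketch below is an assumption about a reasonable format, not a copy of `write_user_rec`: `SketchRecord`, `clean`, and `write_records` are hypothetical names, and the field list is invented for illustration.

```python
import csv
import io
from collections import namedtuple

# HYPOTHETICAL record layout. The real UserRecord fields may differ.
SketchRecord = namedtuple("SketchRecord", ["id", "screen_name", "description"])


def clean(value):
    """Collapse tabs and line breaks so each record stays on one line."""
    if value is None:
        return ""
    return " ".join(str(value).split())


def write_records(ofile, records):
    """Write a header row followed by one tab-delimited row per record."""
    writer = csv.writer(ofile, delimiter="\t", lineterminator="\n")
    writer.writerow(SketchRecord._fields)
    for rec in records:
        writer.writerow([clean(field) for field in rec])


# Demonstrate on an in-memory buffer with a made-up, multi-line description.
buffer = io.StringIO()
write_records(
    buffer,
    [SketchRecord(1001, "example_one", "Cereal fan.\nAlso\ttea.")],
)
```

Cleaning the description first matters because user-written descriptions often contain line breaks, which would otherwise split one record across several lines of the output file.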