# Retrieving Twitter Data

In this notebook we retrieve the Twitter followers of a list of tweeters, as well as the followers of their followers. Because the Twitter API allows only 15 requests before imposing a 15-minute time-out, this takes some patience.
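
To get a feel for how much patience is needed, here is a rough back-of-the-envelope sketch. It assumes every follower lookup costs exactly one request and that a full 15-minute wait follows each batch of 15 requests; the helper name `estimated_wait_minutes` and the example request count are made up for illustration.

```python
import math

REQUESTS_PER_WINDOW = 15   # limit quoted above
WINDOW_MINUTES = 15        # length of the time-out quoted above


def estimated_wait_minutes(n_requests):
    """Minutes spent sleeping if every request counts against the limit."""
    if n_requests <= 0:
        return 0
    # A time-out is hit after every full batch of 15 requests except the last.
    timeouts = math.ceil(n_requests / REQUESTS_PER_WINDOW) - 1
    return timeouts * WINDOW_MINUTES


# Example: 3 seed accounts plus 60 hypothetical followers = 63 requests
print(estimated_wait_minutes(63))  # 60
```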

This is kind of a hack and there is definitely a cleaner way to do this.

### Uncomment and run if we need to install tweepy

In [None]:
#!conda install tweepy -y

In [None]:
# General:
import tweepy           # To consume Twitter's API
import pandas as pd     # To handle data
import numpy as np      # For number computing

# For plotting and visualization:
from IPython.display import display, Image
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
import networkx as nx

# Standard library:
import os
import time
import pickle
import gzip
from itertools import product

### You will need to edit the `credentials.py` file as described [here](https://dev.to/rodolfoferro/sentiment-analysis-on-trumpss-tweets-using-python-)

In [None]:
# We import our access keys:
from credentials import *    # This will allow us to use the keys as variables

# API's setup:
def twitter_setup():
    """
    Utility function to set up the Twitter API
    with the access keys provided.
    """
    # Authentication and access using keys:
    auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
    auth.set_access_token(ACCESS_TOKEN, ACCESS_SECRET)

    # Return API with authentication:
    api = tweepy.API(auth)
    return api
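
For reference, the `from credentials import *` line above expects a `credentials.py` file next to the notebook that defines the four names used in `twitter_setup`. A minimal sketch of that file might look like the following; the string values are placeholders, not real keys.

```python
# credentials.py: placeholder values only; never commit real keys
CONSUMER_KEY = "your-consumer-key"
CONSUMER_SECRET = "your-consumer-secret"
ACCESS_TOKEN = "your-access-token"
ACCESS_SECRET = "your-access-secret"
```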

### Here we authenticate with Twitter
#### We will have to call this repeatedly because Twitter will log us off

In [None]:
# We create an extractor object:
extractor = twitter_setup()


## A function to test whether tweeter `follower` actually follows `followed`
#### deprecated

In [None]:
def is_following(followed, follower):
    try:
        followers = extractor.followers(followed)
        return follower in [f.screen_name for f in followers]
    except tweepy.RateLimitError:
        print("Hit rate limit; waiting 15 minutes")
        time.sleep(15*60)
        return is_following(followed, follower)

## A function to get the followers of a tweeter

We try to get the followers. If we fail because of a `RateLimitError` we sleep for 15 minutes and then recursively call the function again.

#### Note: we may be logged out by Twitter while we sleep

In [None]:
def get_followers(tweeter, extractor):
    print(tweeter)
    try:
        return extractor.followers(tweeter)
    except tweepy.RateLimitError:
        print("Hit rate limit; waiting 15 minutes")
        time.sleep(15*60)
        return get_followers(tweeter, extractor)
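
The recursive retry above works, but each rate-limit hit adds a stack frame and there is no upper bound on retries. One possible alternative is a loop with a retry cap. The sketch below is not the notebook's method: `fetch_with_retry`, `FakeRateLimitError`, and `flaky_fetch` are hypothetical names, and the fake error class stands in for `tweepy.RateLimitError` so the example runs without tweepy or network access.

```python
import time


class FakeRateLimitError(Exception):
    """Stand-in for tweepy.RateLimitError so the sketch runs without tweepy."""


def fetch_with_retry(fetch, tweeter, rate_limit_error=FakeRateLimitError,
                     wait_seconds=15 * 60, max_retries=5, sleep=time.sleep):
    """Call fetch(tweeter), sleeping and retrying when the rate limit is hit."""
    for attempt in range(max_retries + 1):
        try:
            return fetch(tweeter)
        except rate_limit_error:
            if attempt == max_retries:
                raise  # give up instead of waiting forever
            print("Hit rate limit; waiting", wait_seconds, "seconds")
            sleep(wait_seconds)


# Demo with a fake fetch that fails twice, then succeeds.
calls = {"n": 0}


def flaky_fetch(tweeter):
    calls["n"] += 1
    if calls["n"] <= 2:
        raise FakeRateLimitError()
    return ["follower_a", "follower_b"]


# The sleep is replaced with a no-op so the demo finishes instantly.
result = fetch_with_retry(flaky_fetch, "chapmanbe", sleep=lambda s: None)
print(result)  # ['follower_a', 'follower_b']
```

In this notebook the call would presumably become `fetch_with_retry(extractor.followers, tweeter, rate_limit_error=tweepy.RateLimitError)`. Tweepy's `API` constructor also accepts `wait_on_rate_limit=True`, which may make the manual sleep unnecessary.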
    

## Define an initial set of Tweeters/Nodes

In [None]:
tweeters = ["chapmanbe", "wendywchapman", "meh1rad"]

### Find all their followers and create a network

In [None]:
followers = {}
for t in tweeters:
    # Use get_followers so a rate-limit error triggers a wait instead of a crash
    followers[t] = [f.screen_name for f in get_followers(t, extractor)]
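
The cell above only fills the dictionary; the network itself still has to be built from it. One way to do that, sketched here under the assumption that an edge should point from follower to followed, is to turn the dictionary into a list of directed edges. The sample data is invented for illustration.

```python
# Hypothetical sample shaped like the `followers` dict built above
sample_followers = {
    "chapmanbe": ["alice", "bob"],
    "wendywchapman": ["bob", "carol"],
}

# One directed edge per (follower, followed) pair
edges = [(follower, followed)
         for followed, names in sample_followers.items()
         for follower in names]
print(edges)

# With networkx installed, the edges could then be loaded as:
#   import networkx as nx
#   graph = nx.DiGraph(edges)
```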

## Put all the followers of our initial tweeters in a list

In [None]:
gen2 = []
for v in followers.values():
    gen2.extend(v)
gen2
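
Accounts that follow more than one of the initial tweeters appear several times in `gen2`, and each duplicate costs another rate-limited request later. A possible refinement, shown here on invented sample data, is to drop duplicates while keeping first-seen order; `gen2_unique` is a hypothetical name.

```python
# Hypothetical sample shaped like the `followers` dict built above
sample_followers = {
    "chapmanbe": ["alice", "bob"],
    "wendywchapman": ["bob", "carol"],
    "meh1rad": ["alice"],
}

gen2_unique = []
seen = set()
for names in sample_followers.values():
    for name in names:
        if name not in seen:      # skip accounts already collected
            seen.add(name)
            gen2_unique.append(name)

print(gen2_unique)  # ['alice', 'bob', 'carol']
```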

## Create a second dictionary to store the followers of the followers

In [None]:
followers2 = {}
gen3 = gen2[:]
fails = []

### If `get_followers` fails, save the tweeter it failed on
#### Then restart the `while` loop

### If you get a "not authorized" error

Rerun the cell with this code:

```Python
# We create an extractor object:
extractor = twitter_setup()
```

Then rerun the while loop cell:

```Python
while gen3:
    t = gen3.pop()
    followers2[t] = get_followers(t, extractor)
```

In [None]:
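# Run this cell only after the loop below has stopped on a failure:
# it records the tweeter `t` that `get_followers` failed on.
fails.append(t)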

In [None]:
while gen3:
    t = gen3.pop()
    followers2[t] = get_followers(t, extractor)
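
The manual routine of recording the failed tweeter and restarting the loop could also be folded into the loop itself. The following is a sketch rather than the notebook's approach: `crawl` and `fake_fetch` are hypothetical names, and the fake fetch simply raises for one account so the example runs without Twitter access.

```python
def crawl(pending, fetch):
    """Fetch followers for every name in pending; return (results, fails)."""
    results, fails = {}, []
    while pending:
        t = pending.pop()
        try:
            results[t] = fetch(t)
        except Exception as err:   # broad on purpose: record and move on
            print("Failed on", t, "-", err)
            fails.append(t)
    return results, fails


# Demo with a fake fetch that fails for one account.
def fake_fetch(name):
    if name == "private_user":
        raise PermissionError("Not authorized.")
    return [name + "_follower"]


results, fails = crawl(["alice", "private_user", "bob"], fake_fetch)
print(results)
print(fails)  # ['private_user']
```

Catching every exception keeps the crawl going, at the cost of hiding unexpected errors; narrowing the `except` clause to the specific tweepy error would be stricter.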
    

## Dump our data into a pickle file

In [None]:
with gzip.open("twitter.pickle.gz", "wb") as f0:
    pickle.dump((followers, followers2), f0)
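
Reading the data back in a later session mirrors the dump. The sketch below round-trips invented sample data through a temporary file so it can run anywhere; in practice the path would be `twitter.pickle.gz`. The `_loaded` names are hypothetical.

```python
import gzip
import os
import pickle
import tempfile

# Hypothetical sample data standing in for (followers, followers2)
sample = ({"chapmanbe": ["alice"]}, {"alice": ["bob"]})

path = os.path.join(tempfile.mkdtemp(), "twitter.pickle.gz")
with gzip.open(path, "wb") as f0:
    pickle.dump(sample, f0)

# Reading mirrors writing: open in "rb" mode and unpack the tuple.
with gzip.open(path, "rb") as f0:
    followers_loaded, followers2_loaded = pickle.load(f0)

print(followers_loaded == sample[0], followers2_loaded == sample[1])
```

Because `followers2` holds the objects returned by tweepy rather than plain strings, loading the real file requires tweepy to be importable in the session that unpickles it.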