# Bot or Not: Using the Twitter API to look at our "fake" followers

![Bot](http://media.npr.org/assets/img/2017/02/04/trumpbot_51_wide-99a6d194a30546394eff755db56b8579745e5921-s800-c85.png "Bot")

Before we get started with the Twitter API and detecting bots, let's take a look at [HTTP](https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol) and how it works.

## HTTP

Most APIs are built on top of a simple protocol called HTTP. HTTP powers most of the communications on the web, including your browser and probably most of the apps that you use. HTTP allows you (via your browser, a mobile app or even code you write!) to **request** data (HTML, PDFs, MP3s, etc) from a service across the internets (e.g. google.com, twitter.com) and that service will respond with the requested data (i.e. the **response**).
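
To make the idea of a **request** a little more concrete, here is a minimal sketch (written for Python 3) that builds a request object without sending it and prints its parts. The host `api.example.com` is a placeholder we made up, not a real service.

```python
# Python 3 sketch: build (but do not send) an HTTP request and inspect its parts
from urllib.request import Request

# api.example.com is a placeholder host used only for illustration
request = Request(
    "https://api.example.com/search.json?q=tesla",
    headers={"Accept": "application/json"},
)

print(request.get_method())          # the HTTP verb (GET when no body is attached)
print(request.host)                  # the service we are asking
print(request.selector)              # the path and query string on that service
print(request.get_header("Accept"))  # the format we would like back
```

Nothing crosses the network here. Sending the request is a separate step, and that is when the service would reply with a response.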

Let's take a look at how HTTP (and the Internet) works in more detail...


## What's an API?

An API, or application programming interface, allows you to specify the data you want and returns it in a computer-friendly format like JSON or XML. The "interface" is a regularized way to make requests, and a consistent specification for the data you get back. Many organizations now publish APIs for their data, from [The New York Times](https://developer.nytimes.com/) and [ProPublica](https://propublica.github.io/campaign-finance-api-docs/), to governmental organizations like the [EPA](https://developer.epa.gov/category/api/), to social media sites like [Twitter](https://dev.twitter.com/overview/api), [Instagram](https://www.instagram.com/developer/) and [LinkedIn](https://developer.linkedin.com).
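
As a rough illustration of what "computer-friendly" means, the sketch below parses a small JSON document into ordinary Python dictionaries and lists. The response text and its field names are invented for this example and do not come from any real API.

```python
import json

# a made-up API response; the field names are illustrative only
response_body = '{"query": "tesla", "hits": 2, "articles": [{"title": "First"}, {"title": "Second"}]}'

# turn the JSON text into Python dictionaries, lists, numbers and strings
data = json.loads(response_body)

print(data["hits"])
for article in data["articles"]:
    print(article["title"])
```

Because the structure is consistent, the same few lines keep working no matter what you searched for.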


## API Authentication

Most API providers require you, as the developer, to authenticate when you use their APIs. There are various forms of authentication: OAuth, API keys and even usernames and passwords.

For example, some providers, like [The New York Times](https://developer.nytimes.com/), only require that you use an API key when making API calls. With API keys, you usually just pass the key in your API calls, like:

```
https://developer.nytimes.com/article_search_v2.json?api_key=abcxyz&q=tesla
```
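
If you are building such a URL in code, it is usually safer to let a library encode the parameters for you. This is a small Python 3 sketch; the endpoint and the key are placeholders, and the parameter name a real provider expects may differ.

```python
from urllib.parse import urlencode  # Python 3; in Python 2 urlencode lives in urllib

BASE_URL = "https://api.example.com/search.json"  # placeholder endpoint
params = [("api_key", "abcxyz"), ("q", "tesla model 3")]

# urlencode escapes spaces and other special characters in the values
url = BASE_URL + "?" + urlencode(params)
print(url)
```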

[OAuth](https://en.wikipedia.org/wiki/OAuth) is a bit more complicated but provides more fine-grained control for the API service as well as the users. Let's come back to it right after we set up our Twitter API keys (yep, they use OAuth for their API authentication).


## Using The Twitter API

To access the Twitter API, we need to register an "application" (um, an "app"!) that will be pulling data from their service. This means that in one sweet instant you have become a developer! 😮 The steps are pretty easy and listed below. You'll first need to get a set of "keys" to drive this bad boy and then install a Python library that exposes the Twitter API through special objects. 
    
**1) Get Your API Keys**

If you don't already have credentials for Twitter, you have to create an application and generate a set of keys (an API key, API secret, Access token and Access token secret) on the Twitter developer site. There are five easy steps!

1. Create a Twitter user account if you do not already have one.
2. Go to [https://apps.twitter.com](https://apps.twitter.com/) and log in with your Twitter user account. This step gives you a Twitter developer's account under the same name as your user account. (Um, and congratulations! You're now a developer!)
3. Click “Create New App”
4. Fill out the form, agree to the terms, and click “Create your Twitter application”
5. On the next page, click the “Keys and Access Tokens” tab, and copy your “API key” and “API secret”. Scroll down and click “Create my access token”, and copy your “Access token” and “Access token secret”.

Once you have your tokens, copy them below. (Twitter's “API key” and “API secret” are the consumer key and consumer secret in the code.)

In [None]:
CONSUMER_KEY = "put your consumer key here"
CONSUMER_SECRET = "put your consumer secret here"
ACCESS_TOKEN = "your access token goes here"
ACCESS_TOKEN_SECRET = "and your token secret goes here"

**2) Install the Tweepy Library**

The developer community has created [hundreds of Twitter libraries](https://dev.twitter.com/resources/twitter-libraries) that help you access Twitter's API. By "help" we mean they have created objects that hide the details of making requests for data from Twitter, and leave you with a clean coding interface. Your requests to Twitter are in the form of neat methods (verbs) that return data on users, their statuses and followers. You can even post tweets using these libraries.

We will be using Tweepy to call the Twitter API. Why? It has many of the best features of the other libraries and its documentation is complete. Often, free software projects can be thinly documented, leaving you a little out to sea if you have a problem.

Keep these two links open in tabs as we go through the code below: [Tweepy documentation](http://tweepy.readthedocs.io/en/v3.5.0/) and [source code](https://github.com/tweepy/tweepy).

Use the following to install the Tweepy library (version 3.5) on your machine. Recall that the double percent signs indicate that the code in the cell is to be interpreted as something other than Python commands. In this case, we are giving instructions to the UNIX **sh**ell. We'll have more to say about that later, but the shell is essentially what you're typing into when you initiate this notebook with "jupyter notebook".

In [None]:
%%sh
pip install tweepy==3.5.0

Before we start making API calls, we need to initialize our Tweepy object.

In [None]:
# before we can make Twitter API calls, we need to initialize a few things...
from tweepy import OAuthHandler, API

# setup the authentication
auth = OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)

# create an object we will use to communicate with the Twitter API
api = API(auth)
type(api)

## Getting a User's Profile Info

Now you are prepped and ready to start making Twitter API calls. First, let's look at some user profiles.

We will be calling the `users/show` api: https://dev.twitter.com/rest/reference/get/users/show
    

In [None]:
# get a user's profile (the 'nytimes' in this case)
user = api.get_user('nytimes')

# print out some of the user's information
print user.screen_name
print user.statuses_count
print user.friends_count
print user.followers_count
print user.description

>**NOTE:** If you run the `api.get_user()` code above and get an error that looks like this:
```
TweepError: [{u'message': u'Invalid or expired token.', u'code': 89}]
```
it means that your API keys, tokens or secrets are wrong. Double-check that you copied them correctly.

In [None]:
print type(user)

The result of the call to the API is technically a JSON string. As we did with tweets, we could parse it into primitive Python objects like lists and dictionaries and numbers and strings. Instead, Tweepy creates high-level objects to represent the result of an API call. This is why you access ".screen_name" and ".followers_count" as attributes of the object.

Objects have both data and methods, and the methods for this object include follow() and unfollow(), which follow and unfollow the user. All of this is conveniently wrapped up in a high-level object.
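
To see roughly how a library can turn parsed JSON into an object with attributes and methods, here is a toy sketch. The `Profile` class and its `describe()` method are hypothetical names invented for this example; this is not how Tweepy is actually implemented.

```python
import json


class Profile(object):
    """Hypothetical wrapper around parsed JSON; not Tweepy's real code."""

    def __init__(self, raw):
        # copy every key of the parsed JSON onto the object as an attribute
        for key, value in raw.items():
            setattr(self, key, value)

    def describe(self):
        # a method bundles behavior together with the data it needs
        return "@{0} has {1} followers".format(self.screen_name, self.followers_count)


# made-up profile data for illustration
raw = json.loads('{"screen_name": "example_user", "followers_count": 300}')
profile = Profile(raw)

print(profile.screen_name)
print(profile.describe())
```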

**Try This!** 

Modify the code above to get the user profile information for `@realDonaldTrump` and `@rosieTuring`.

## Send a Tweet!

If you were writing a bot, it would need to tweet! You can send out a tweet with one line of code.

We will be using the `statuses/update` api to send the tweet: https://dev.twitter.com/rest/reference/post/statuses/update

In [None]:
# send a tweet!
# you probably want to modify this message before you send it :-)

api.update_status(status='Learning all about bots. I really love this class! Period.')

## Look at a User's Tweets

Now, let's look at [`@realDonaldTrump`](https://twitter.com/realDonaldTrump)'s tweets. If you've had enough of all that, replace it with [`@justinbieber`](https://twitter.com/justinbieber) or someone less anxious-making.

We will use the `statuses/user_timeline` api to do this: https://dev.twitter.com/rest/reference/get/statuses/user_timeline

In [None]:
# get the "real" Donald's last 100 tweets
tweets = api.user_timeline(screen_name='realDonaldTrump', count=100)

print len(tweets)

You can loop over the 100 tweets and print them out. Here, a tweet object from Tweepy has data attributes like the text of the tweet, stored in ".text".

In [None]:
# get the "real" Donald's last 100 tweets
tweets = api.user_timeline(screen_name='realDonaldTrump', count=100)

# loop over the tweets and print out the tweet text. don't tell Mark we are doing loops!
for tweet in tweets:
    print tweet.text

In [None]:
print type(tweet)

**Try This!**

Use the example above to get the latest tweets for yourself, `scottoiesky`, `@nytimes`, etc. What other information would be useful to have besides the text of the tweet?

## Let's Look at @rosieturing's Followers

Our goal is to see if we can detect bots/fake Twitter accounts, so let's start looking at `@rosieturing`'s followers.

If you look at the Twitter API documentation, you will see there are a few ways to get information about a user's followers:

1. the [`followers/list` api](https://dev.twitter.com/rest/reference/get/followers/list), or
2. the [`followers/ids` api](https://dev.twitter.com/rest/reference/get/followers/ids)

Which one should we use?


In [None]:
# get Rosie's followers list
follower_ids = api.followers_ids(screen_name='rosieturing')

# how many followers (i.e. what's the "length" of the list of follower ids)?
print type(follower_ids)
print len(follower_ids)

In [None]:
for follower_id in follower_ids:
    print follower_id

**Great!** But we need the profile information for these users. How do we get the user profiles for ~2500 users?

We could use the `users/show` api (which we used above) to get each of the 2500 users' profile info: https://dev.twitter.com/rest/reference/get/users/show


In [None]:
# remember how we got the user profile for a single user?
user = api.get_user(user_id=2489833821)
print user

In [None]:
# so, could we do something like this?
follower_ids = api.followers_ids(screen_name='rosieturing')

# loop over each follower id and make an api call to get the user's profile info
#for follower_id in follower_ids:
    #user = api.get_user(user_id=follower_id)
    #print user
    
# what's wrong with this code?

Instead of making 2500 API calls to Twitter, let's see if we can be more efficient.

The **`users/lookup`** api allows us to get 100 users at a time!
https://dev.twitter.com/rest/reference/get/users/lookup
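
A quick back-of-the-envelope calculation shows why batching matters. The follower count is the approximate figure used in this notebook, and the batch size is the 100-user limit mentioned above.

```python
import math

number_of_followers = 2500   # approximate figure used in this notebook
batch_size = 100             # users/lookup returns up to 100 users per call

# one call per follower versus one call per batch of 100
calls_one_at_a_time = number_of_followers
calls_in_batches = int(math.ceil(number_of_followers / float(batch_size)))

print(calls_one_at_a_time)
print(calls_in_batches)
```

Fewer calls means less waiting, and it also keeps you further away from Twitter's rate limits.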

Ok, so how do we get 100 user ids at a time from our `follower_ids` list?

In [None]:
follower_ids = api.followers_ids(screen_name='rosieturing')

# get the total number of ids in the list
number_of_followers = len(follower_ids)

# get 100 ids from the list at a time
for i in range(0, number_of_followers, 100):
    
    subset_of_follower_ids = follower_ids[i:i + 100]
    print len(subset_of_follower_ids)
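
The same slicing idea can be wrapped in a small reusable helper. The `chunks` function below is a name we made up for this sketch, and the ids are stand-in numbers rather than real accounts.

```python
def chunks(items, size):
    """Yield successive slices of `items`, each at most `size` long."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


fake_follower_ids = list(range(250))   # stand-in ids, not real accounts

# the last batch is shorter because 250 is not a multiple of 100
batch_sizes = [len(batch) for batch in chunks(fake_follower_ids, 100)]
print(batch_sizes)
```

Note that slicing past the end of a list is safe in Python, so the final, shorter batch needs no special handling.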

Ok, let's pull it all together now. We have our list of Rosie's ~2500 follower ids and know how to slice the list to get 100 of them at a time. Next, we can use the **`users/lookup`** api to get the profile info for these 100 users at a time.

https://dev.twitter.com/rest/reference/get/users/lookup

In [None]:
# first, we get the list of Rosie's follower ids (this is old hat by now)
follower_ids = api.followers_ids(screen_name='rosieturing')

# how many follower ids do we have in the list?
number_of_followers = len(follower_ids)

# loop over 100 of the follower ids at a time
for i in range(0, number_of_followers, 100):
    
    # slice the list of follower_ids to get 100 at a time
    subset_of_follower_ids = follower_ids[i:i + 100]
    
    # call the users/lookup api on 100 users
    user_profiles = api.lookup_users(subset_of_follower_ids)
    
    # did we get 100 user profiles? let's check by calling len()
    print len(user_profiles)
    
    # looks good. now, let's loop over these 100 user profiles and print them
    # definitely don't tell Mark that we are doing loops inside of loops!!!
    for user_profile in user_profiles:
        print user_profile.screen_name
        print user_profile.description
        
    # just for fun, i'm going to "break" out of the loop so we don't get all 2500 followers just yet
    break
    

## Save @rosieturing's Followers to a CSV File

It's easier to work on data stored locally on your machine. Calling Twitter 10s or 100s of times can be slow!

So, let's grab all of `@rosieturing`'s followers and save them in a CSV file.

What information about each of her followers should we save?

But before we do that, let's revisit the Python `csv` module, which lets us easily read and write csv files.

In [None]:
# write some test data to a csv file
from csv import writer

# first, open the file that we will "write" our csv data to
test_file = open("test.csv", "wb")

# second, create the csv "writer" object
csv_file = writer(test_file)

# write the header row and then a few rows of data
csv_file.writerow(["screen name", "number of followers"])
csv_file.writerow(["myoung", "4000"])
csv_file.writerow(["cocteau", "3000"])
csv_file.writerow(["realDonaldTrump", "24000000"])

# close the file so all of the rows are actually written to disk
test_file.close()


In [None]:
# read the csv file
from pandas import read_csv
data = read_csv("test.csv")
data
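
For comparison, here is a sketch of the same write-then-read round trip using a `with` block, which closes the file for you when the block ends. It is written for Python 3 (text mode with `newline=""`), and it writes to a temporary file, `sketch.csv`, a name chosen only for this example.

```python
import csv
import os
import tempfile

rows = [
    ["screen name", "number of followers"],
    ["myoung", "4000"],
    ["cocteau", "3000"],
]

# write into a temporary directory so the sketch does not touch test.csv
path = os.path.join(tempfile.mkdtemp(), "sketch.csv")

# Python 3 style: text mode with newline=""; the file is closed when the block ends
with open(path, "w", newline="") as handle:
    csv.writer(handle).writerows(rows)

# read the rows back with the csv module and compare them to what we wrote
with open(path, newline="") as handle:
    rows_read = list(csv.reader(handle))

print(rows_read == rows)
```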

Ok, let's get back to Rosie...now, we can get all of her followers and save their profile info to a CSV file.

In [None]:
# first, let's create the csv file
rosies_file = open("rosies_followers.csv", "wb")

# next, create the csv "writer" object
csv_file = writer(rosies_file)

# write the "header" row to the csv file
csv_file.writerow(["screen name", "bio", "friends", "followers", "tweets", "listed", "favorites"])

# make the twitter api call to get rosie's follower ids
follower_ids = api.followers_ids(screen_name='rosieturing')

# loop through the list of follower ids and get 100 user profiles from twitter at a time
number_of_followers = len(follower_ids)
for i in range(0, number_of_followers, 100):
    
    # get 100 follower ids at a time
    subset_of_follower_ids = follower_ids[i:i + 100]
    
    print "getting 100 more user profiles..."
    # the "user lookup" api call - this should return 100 user profiles
    user_profiles = api.lookup_users(subset_of_follower_ids)
    
    # write some of the user's info to the csv file
    for user_profile in user_profiles:
        csv_file.writerow([
            user_profile.screen_name.encode('utf-8'),
            user_profile.description.encode('utf-8'),
            user_profile.friends_count,
            user_profile.followers_count,
            user_profile.statuses_count,
            user_profile.listed_count,
            user_profile.favourites_count,
        ])
        
# close the file so all of the rows are actually written to disk
rosies_file.close()

print "done!"

In [None]:
# read the csv file
from pandas import read_csv
data = read_csv("rosies_followers.csv")
data.head(50)

Yay, we have our data and it is stored on our machine. We can use the Twitter API if we need to get updated information about these users, but we have what we need for now.

## Bot or Not?
How can we tell which of Rosie's followers are bots and which are not? Here are some resources to help you address the question. We will dig into it more on Thursday, but for now think about what you'd do. What data are available? How would you use it?


[twitteraudit](https://www.twitteraudit.com/)

[Fake Follower Check](https://fakers.statuspeople.com/)

[Bot or Not](http://botornot.co/)

[Why can't Twitter kill its bots](http://fusion.net/story/195901/twitter-bots-spam-detection/)

[The Rise of Social Bots](http://cacm.acm.org/magazines/2016/7/204021-the-rise-of-social-bots/fulltext)

[The DARPA Twitter Bot Challenge](https://arxiv.org/abs/1601.05140)

[How Twitter Bots Are Shaping the Election](https://www.theatlantic.com/technology/archive/2016/11/election-bots/506072/)
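
As one possible starting point for your own thinking, the sketch below turns the columns we saved into a few simple signals per account. The rows are made up, and the signals (`has_bio`, `has_tweeted`, `followers_per_friend`) are only illustrative names and ideas. None of them is a reliable bot test on its own.

```python
import csv
import io

# made-up rows in the same column layout as rosies_followers.csv
sample_csv = u"""screen name,bio,friends,followers,tweets,listed,favorites
account_a,Coffee and data,150,300,1200,4,87
account_b,,2001,3,0,0,0
"""

features = []
for row in csv.DictReader(io.StringIO(sample_csv)):
    friends = int(row["friends"])
    followers = int(row["followers"])
    features.append({
        "screen name": row["screen name"],
        "has_bio": bool(row["bio"].strip()),
        "has_tweeted": int(row["tweets"]) > 0,
        # max(..., 1) avoids dividing by zero for accounts that follow nobody
        "followers_per_friend": followers / float(max(friends, 1)),
    })

for feature in features:
    print(feature)
```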
