In this tutorial, we use the Python package Tweepy to collect a user's public tweets.

## Tutorial contents
* [Providing authorization to the Twitter API](#Providing-authorization-to-the-Twitter-API)
* [Collecting tweets](#Collecting-tweets)
* [Getting information about a user account](#Getting-information-about-a-user-account)
* [Getting follower IDs](#Getting-follower-IDs)
* [Getting the IDs of users being followed by a specified account](#Getting-the-IDs-of-users-being-followed-by-a-specified-account)
* [Getting tweets favorited by a user](#Getting-tweets-favorited-by-a-user)
* [Getting info on friendship relations](#Getting-info-on-friendship-relations)
* [Getting retweets of a certain status](#Getting-retweets-of-a-certain-status)
* [Rate limits and cursor](#Rate-limits-and-cursor)

## Providing authorization to the Twitter API

The first step is to become a Twitter developer. For this you need your own Twitter account, and then you need [to create a new app](https://apps.twitter.com/).

Once you're a developer, you will find your access credentials under the Keys and Access Tokens tab of your new app. You will need the following fields:

1. Consumer Key (API Key)
2. Consumer Secret (API Secret)
3. Access Token
4. Access Token Secret

With these credentials we can access the API from Python using Tweepy, a package available from the PyPI repository:

    pip install tweepy
    
Tweepy provides a convenient front-end for the Twitter API, giving us easy access without having to leave our Python environment.

In [2]:
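import tweepy

# Replace these placeholders with the credentials from your app's
# Keys and Access Tokens tab
CONSUMER_KEY        = 'your-consumer-key'
CONSUMER_KEY_SECRET = 'your-consumer-key-secret'
ACCESS_TOKEN        = 'your-access-token'
ACCESS_TOKEN_SECRET = 'your-access-token-secret'

# OAuth 1.0a: the app's consumer key/secret plus the access token/secret
# of the account that authorizes the app
auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_KEY_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)

# The API object is the entry point for all subsequent requests
api = tweepy.API(auth)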
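
Hard-coding secrets in a notebook makes it easy to leak them when the notebook is shared. A minimal alternative, sketched under our own assumptions: export the four credentials as environment variables and read them at start-up. The variable names (`TWITTER_CONSUMER_KEY` and so on) and the helper `load_credentials` are hypothetical choices of ours, not something Tweepy requires.

```python
import os

# Hypothetical environment variable names; any names will do
ENV_NAMES = {
    "consumer_key": "TWITTER_CONSUMER_KEY",
    "consumer_secret": "TWITTER_CONSUMER_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
}


def load_credentials(environ=os.environ):
    """Return the four OAuth credentials, raising KeyError if any is unset or empty."""
    missing = [var for var in ENV_NAMES.values() if not environ.get(var)]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {key: environ[var] for key, var in ENV_NAMES.items()}


# With real credentials exported, the authorization step becomes:
# creds = load_credentials()
# auth = tweepy.OAuthHandler(creds["consumer_key"], creds["consumer_secret"])
# auth.set_access_token(creds["access_token"], creds["access_token_secret"])
```

Accepting the mapping as an argument keeps the helper easy to test without touching the real environment.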

Now that we have access to the Twitter API, there is a range of different requests we can make. We can use GET requests to retrieve information about public users or tweets, and even POST requests to make changes to the account we used to authorize, such as following accounts and posting tweets. All functions of the API are [thoroughly documented](https://dev.twitter.com/rest/reference), so below we only go over a few examples of the most common tasks.

## Collecting tweets

Statuses posted by a specified user can be collected with a [GET statuses/user_timeline](https://dev.twitter.com/rest/reference/get/statuses/user_timeline) request. We need to identify the user by `id`, `user_id`, or `screen_name`, and we can set further options: `count` (the number of statuses to retrieve, at most 200 per request), `since_id` and `max_id` (only return statuses newer than, or no newer than, a given status ID), and `include_rts` (whether retweets are included). Note that retweets still count towards `count` even when `include_rts` is false. A single request therefore returns at most 200 statuses, and the API only gives access to a user's 3,200 most recent tweets. See the last section, [Rate limits and cursor](#Rate-limits-and-cursor), to learn how to handle rate limits and use the cursor to collect more tweets.

The request returns a list of `Status` objects.

In [3]:
search = api.user_timeline(screen_name = 'david_cameron', count = 100, include_rts = True) 

Each `Status` object contains a number of fields, which can be accessed as attributes (`status.text`, `status.created_at`, and so on).

In [4]:
status = search[0]
print("Tweet text:", status.text)

Tweet text: Wishing everyone a very happy and peaceful Christmas! https://t.co/TavD4h1ake


You can get a list of all the field names:  

In [5]:
for key in vars(status):  # same keys as status.__dict__
    print(key)

contributors
truncated
text
is_quote_status
in_reply_to_status_id
id
favorite_count
_api
author
_json
coordinates
entities
in_reply_to_screen_name
id_str
retweet_count
in_reply_to_user_id
favorited
source_url
user
geo
in_reply_to_user_id_str
possibly_sensitive
lang
created_at
in_reply_to_status_id_str
place
source
extended_entities
retweeted


Printing the entire content of a status is not very informative, since it contains a large amount of metadata. And while it is useful to know how to access particular fields, often what we want is to retrieve all the information and store it for later processing. We will therefore write the output to a file in which each line holds one tweet in JSON format. Conveniently, the raw JSON returned by Twitter is stored in the `_json` field of each `Status` object, as the field list above shows.

In [6]:
import json
F_NAME = 'david_cameron_timeline.json'
with open(F_NAME,'w') as f_out:
    for status in search:
        json.dump(status._json, f_out)
        f_out.write('\n')
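
Because each line of the file is a complete JSON object, the tweets can be read back one at a time for later processing. A minimal sketch, assuming the file written above: `read_jsonl` and `count_hashtags` are hypothetical helpers of ours, and the hashtag lookup relies on the `entities` field, whose `hashtags` entries carry the tag in a `text` key in Twitter's v1.1 tweet JSON.

```python
import json
from collections import Counter


def read_jsonl(path):
    """Yield one dict per non-empty line of a JSON Lines file."""
    with open(path) as f_in:
        for line in f_in:
            line = line.strip()
            if line:
                yield json.loads(line)


def count_hashtags(statuses):
    """Tally hashtags, case-insensitively, across an iterable of status dicts."""
    counts = Counter()
    for status in statuses:
        for tag in status.get("entities", {}).get("hashtags", []):
            counts[tag["text"].lower()] += 1
    return counts


# Usage with the file written above:
# tags = count_hashtags(read_jsonl('david_cameron_timeline.json'))
# print(tags.most_common(10))
```

Reading lazily with a generator means even a large timeline file never has to fit in memory at once.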

## Getting information about a user account

We can also get detailed information about an account, such as the account description, number of followers, number of users followed, the date the account was created, location, number of tweets, a link to the profile image, number of favorites, etc. The user is identified by `id`, `user_id`, or `screen_name`, and the result is a `User` object. We will again save its raw JSON to a file.

In [7]:
user_info = api.get_user(screen_name = 'david_cameron')
print("Account description:", user_info.description)
print("Followers:", user_info.followers_count)

# To list all available fields, uncomment:
# for key in vars(user_info):
#     print(key)

F_NAME = 'david_cameron_user_info.json'
with open(F_NAME,'w') as f_out:
    json.dump(user_info._json, f_out)

Account description: Former Prime Minister of the United Kingdom
Followers: 1611537
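
The raw `User` JSON is convenient to store but verbose to analyse. A minimal sketch that keeps a handful of the fields mentioned above and turns `created_at` into a Python `datetime`; the helper name `summarize_user` and the choice of fields are ours. The format string assumes Twitter's v1.1 timestamp layout and an English locale for the day and month abbreviations.

```python
from datetime import datetime

# Twitter's v1.1 timestamp layout, e.g. "Wed Aug 27 13:08:45 +0000 2008"
# (%a and %b assume an English locale)
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"

# A selection of User fields; note Twitter's British spelling of
# `favourites_count` in the raw JSON
FIELDS = ["screen_name", "description", "location", "followers_count",
          "friends_count", "statuses_count", "favourites_count", "created_at"]


def summarize_user(user_json):
    """Return the selected fields of a raw User dict, parsing `created_at`."""
    row = {field: user_json.get(field) for field in FIELDS}
    if row["created_at"]:
        # %z turns "+0000" into a timezone-aware UTC datetime
        row["created_at"] = datetime.strptime(row["created_at"], TWITTER_TIME_FORMAT)
    return row


# Usage with the object fetched above:
# print(summarize_user(user_info._json))
```

Missing fields come back as `None` rather than raising, so the same helper works on partial records.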


## Getting follower IDs

We can get a list of the IDs of users following a certain account with `api.followers_ids(id/screen_name/user_id)`. A single request returns at most 5,000 IDs; see the [Rate limits and cursor](#Rate-limits-and-cursor) section at the end to find out how to get more.

In [8]:
followers = api.followers_ids(screen_name = 'david_cameron')

#Save list of followers:
F_NAME = 'david_cameron_followers.txt'
with open(F_NAME,'w') as f_out:
    for follower in followers:
        f_out.write("%s\n" % follower)
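
Since one `followers_ids` request returns at most 5,000 IDs, collecting every follower of a large account takes many requests. A back-of-the-envelope sketch, assuming the limit Twitter documented for this v1.1 endpoint of 15 requests per 15-minute window (check the rate-limit page linked in the last section before relying on it); `estimate_fetch_minutes` is a hypothetical helper of ours.

```python
import math


def estimate_fetch_minutes(n_ids, ids_per_request=5000,
                           requests_per_window=15, window_minutes=15):
    """Return (requests needed, upper bound on minutes) to page through n_ids."""
    requests = math.ceil(n_ids / ids_per_request)
    windows = math.ceil(requests / requests_per_window)
    return requests, windows * window_minutes


# The follower count printed for the account above
print(estimate_fetch_minutes(1611537))  # → (323, 330)
```

That is 323 requests and up to 330 minutes, i.e. roughly five and a half hours, most of it spent waiting for rate-limit windows to reset.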

## Getting the IDs of users being followed by a specified account

We can also get the IDs of users being followed by the specified user:

In [9]:
friends = api.friends_ids(screen_name = 'david_cameron')
print("Cameron follows", len(friends), "users.")

F_NAME = 'followed_by_david_cameron.txt'
with open(F_NAME,'w') as f_out:
    for friend in friends:
        f_out.write("%s\n" % friend)

Cameron follows 391 users.
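
With the follower and friend ID lists in hand, Python sets make it easy to see which relationships are mutual and which are one-way, for many users at once. A minimal sketch; the helper `relationship_sets` and its category names are ours, and the results are only as complete as the input lists (the follower request above returns at most 5,000 IDs).

```python
def relationship_sets(followers, friends):
    """Split an account's follower and friend IDs into mutual and one-way sets."""
    followers, friends = set(followers), set(friends)
    return {
        "mutual": followers & friends,              # follow each other
        "fans": followers - friends,                # follow the account, not followed back
        "not_following_back": friends - followers,  # followed by the account, don't follow it
    }


# Usage with the lists fetched above:
# sets = relationship_sets(followers, friends)
# print(len(sets["mutual"]), "mutual follows")
```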


## Getting tweets favorited by a user
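We can get a list of tweets favorited (liked) by a user. By default a single request returns only the 20 most recent ones; pass `count` (up to 200) or use the cursor to get more: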

In [10]:
favorites = api.favorites(screen_name = 'david_cameron')
print("Number of likes:", len(favorites))

F_NAME = 'david_cameron_favorites.json'
with open(F_NAME,'w') as f_out:
    for favorite in favorites:
        json.dump(favorite._json, f_out)
        f_out.write('\n')

Number of likes: 2


## Getting info on friendship relations

We can check whether a friendship exists between two users (a source user and a target user), along with other characteristics of the relationship, with `api.show_friendship(source_id/source_screen_name, target_id/target_screen_name)`. It returns a pair of `Friendship` objects: the first describes the source, the second the target.

In [11]:
friendship = api.show_friendship(source_screen_name="david_cameron", target_screen_name="ExeterQStep")
print("Source(Cameron) followed by target(Exeter Q-Step)?", friendship[0].followed_by)
print("Target(Exeter Q-Step) followed by source(Cameron)?", friendship[1].followed_by)

Source(Cameron) followed by target(Exeter Q-Step)? True
Target(Exeter Q-Step) followed by source(Cameron)? False


## Getting retweets of a certain status
`api.retweets(id[, count])` returns up to 100 of the most recent retweets of a given tweet.

In [12]:
retweets = api.retweets(id = 701057384869969921, count=100)

F_NAME = 'status_retweets.json'
with open(F_NAME,'w') as f_out:
    for retweet in retweets:
        json.dump(retweet._json, f_out)
        f_out.write('\n')

## Rate limits and cursor

Twitter [API rate limits](https://dev.twitter.com/rest/public/rate-limiting) restrict the number of requests you can make within a time window (15 minutes for most endpoints). Tweepy can help handle these limitations. 
First, you can set a number of additional parameters when creating the `tweepy.API` object:

* `retry_count`: number of retries to attempt when an error occurs
* `retry_delay`: number of seconds to wait between retries
* `retry_errors`: which HTTP status codes to retry
* `wait_on_rate_limit`: whether or not to automatically wait for rate limits to replenish
* `wait_on_rate_limit_notify`: whether or not to print a notification when Tweepy is waiting for rate limits to replenish

Setting the last two parameters to `True` usually takes care of the rate limits. We can therefore redefine our API instance with these parameters:

In [43]:
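api = tweepy.API(auth,
                 retry_count=5,
                 retry_delay=10,
                 # Retry only server-side errors, which are usually transient;
                 # client errors such as 401 or 404 will not go away on retry
                 retry_errors={500, 502, 503, 504},
                 wait_on_rate_limit=True,
                 wait_on_rate_limit_notify=True)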
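
With `wait_on_rate_limit=True`, Tweepy sleeps until the current rate-limit window resets. Twitter reports the window state in the headers of each response, notably `x-rate-limit-remaining` and `x-rate-limit-reset` (a UTC epoch timestamp). The sketch below computes how long to wait from those headers; it is a simplified illustration of the idea, not Tweepy's actual code, and the function name and the `margin` safety buffer are our own choices.

```python
import time


def seconds_until_reset(headers, now=None, margin=5):
    """Return how many seconds to sleep before the next request (0 if none needed)."""
    # HTTP header names are case-insensitive, so normalise them first
    headers = {k.lower(): v for k, v in headers.items()}
    # If the header is absent, assume we are not rate limited
    remaining = int(headers.get("x-rate-limit-remaining", 1))
    if remaining > 0:
        return 0
    reset_at = int(headers["x-rate-limit-reset"])  # UTC epoch seconds
    now = time.time() if now is None else now
    return max(0, reset_at - now) + margin


# Example: window exhausted, reset 100 seconds from "now"
print(seconds_until_reset({"x-rate-limit-remaining": "0",
                           "x-rate-limit-reset": "1000"}, now=900))  # → 105
```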

To handle pagination, Tweepy provides the very helpful `Cursor` object. Instead of manually requesting the pages of a user timeline or a follower list one at a time, we can let the cursor iterate over all of them:

In [None]:
F_NAME = 'ExeterQStep_timeline_all.json'
with open(F_NAME,'w') as f_out:
    # The cursor requests one page after another until no more tweets are returned
    for status in tweepy.Cursor(api.user_timeline, screen_name = 'ExeterQStep', include_rts = True).items():
        json.dump(status._json, f_out)
        f_out.write('\n')
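
For timelines, Twitter pages by tweet ID: each request passes `max_id` set just below the oldest ID already received, and an empty page means the end (at most the 3,200 most recent tweets). This is, roughly, the bookkeeping the cursor takes care of for us. A sketch of the idea that runs offline; `paginate_by_max_id` is our own helper, and `fake_fetch` is a hypothetical stand-in for `api.user_timeline`.

```python
def paginate_by_max_id(fetch_page, page_size=200):
    """Yield statuses newest first, requesting older pages via max_id.

    `fetch_page(max_id, count)` stands in for a timeline request: it must
    return up to `count` status dicts with an integer "id", newest first,
    all with id <= max_id (or the newest ones if max_id is None).
    """
    max_id = None
    while True:
        page = fetch_page(max_id, page_size)
        if not page:  # an empty page means we reached the end
            break
        for status in page:
            yield status
        # Next request: only tweets strictly older than the oldest one seen
        max_id = min(s["id"] for s in page) - 1


# Hypothetical offline stand-in: a timeline of 450 tweets with IDs 1..450
FAKE_TIMELINE = [{"id": i} for i in range(450, 0, -1)]


def fake_fetch(max_id, count):
    eligible = [s for s in FAKE_TIMELINE if max_id is None or s["id"] <= max_id]
    return eligible[:count]


ids = [s["id"] for s in paginate_by_max_id(fake_fetch)]
print(len(ids), ids[0], ids[-1])  # → 450 450 1
```

Subtracting 1 from the oldest ID matters: `max_id` is inclusive, so without it the boundary tweet would be returned twice.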

In [None]:
#Save list of followers:
F_NAME = 'ExeterQStep_followers.txt'
with open(F_NAME,'w') as f_out:
    for follower in tweepy.Cursor(api.followers_ids, screen_name = 'ExeterQStep').items():
        f_out.write("%s\n" % follower)
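
Follower and friend ID lists are paged differently, by cursoring: the first request passes `cursor=-1`, each response carries a `next_cursor` value to send with the next request, and a `next_cursor` of `0` signals the last page. A sketch of that loop; `paginate_by_cursor` is our own helper and `fake_followers_page` is a hypothetical stand-in for `api.followers_ids`. Unlike real Twitter cursors, which are opaque values, the fake one simply uses list offsets.

```python
def paginate_by_cursor(fetch_page):
    """Yield IDs from a cursored endpoint until next_cursor is 0.

    `fetch_page(cursor)` must return a dict with "ids" and "next_cursor" keys.
    """
    cursor = -1  # Twitter's convention for "first page"
    while cursor != 0:
        response = fetch_page(cursor)
        for user_id in response["ids"]:
            yield user_id
        cursor = response["next_cursor"]


# Hypothetical offline stand-in: 12 follower IDs served 5 per page
FAKE_FOLLOWERS = list(range(101, 113))


def fake_followers_page(cursor, page_size=5):
    start = 0 if cursor == -1 else cursor
    end = start + page_size
    return {
        "ids": FAKE_FOLLOWERS[start:end],
        "next_cursor": end if end < len(FAKE_FOLLOWERS) else 0,
    }


print(list(paginate_by_cursor(fake_followers_page)))
```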