# Exercise 9: Twitter API with Tweepy

Welcome to exercise 9! In this session, you will set up your own Twitter API credentials so you can access the Twitter API from Python using `tweepy`.

If you have launched this notebook in Binder, the `tweepy` library should already be installed on the virtual machine that Jupyter Notebook is running in. If you have downloaded this notebook to use on your own computer (or in the CTR), you may need to install `tweepy` before running it. To install `tweepy` with Anaconda, open the command line (`Command prompt` on Windows, or `Git Bash` in the CTR) and type:

```
conda install tweepy
```

and press Enter. If you only have Python 3 and do not have Anaconda installed, type:

```
pip install tweepy
```

and press Enter. Some text should whiz by indicating that it's installing various things. Once that is done, please restart Jupyter Notebook.

For full documentation on how to use `tweepy`, see http://www.tweepy.org

<img src="images/twitter-python-json.png"/>

We are going to use Python to query Twitter's web-based API, which returns JSON data. We can then transform this JSON into something we can work with: a `DataFrame`.
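
To make the JSON-to-`DataFrame` idea concrete, here is a minimal sketch. The payload is made up and heavily trimmed (real Twitter responses carry many more fields), and in the rest of this notebook `tweepy` does the JSON parsing for us, so treat this only as an illustration of the general idea.

```python
import json
import pandas as pd

# Hypothetical, heavily trimmed payload: two tweet-like records as JSON text.
raw_json = """
[
  {"id": 1, "text": "Hello world", "favorite_count": 3, "retweet_count": 1},
  {"id": 2, "text": "Python and pandas", "favorite_count": 10, "retweet_count": 4}
]
"""

# json.loads turns the JSON text into a list of Python dictionaries...
records = json.loads(raw_json)

# ...and pandas builds one row per dictionary and one column per key.
frame = pd.DataFrame(records)
print(frame)
```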

### Import our libraries

In [None]:
import tweepy
import pandas as pd
import numpy as np
from IPython.display import display
import matplotlib.pyplot as plt
%matplotlib inline

Before we can do anything, you need to sign up to Twitter (if you haven't already got an account) so that Twitter can grant you programmatic access credentials. Access is free, but there are limitations for non-commercial access.

To get your consumer and access credentials set up, please follow the instructions found here: https://www.gabfirethemes.com/create-twitter-api-key/

In [None]:
# Enter your Twitter API key and access tokens here
api_key = ''
api_secret = ''
access_token = ''
access_token_secret = ''
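
If you would rather not type your credentials into the notebook itself (for example, because you plan to share it), one option is to read them from environment variables. The sketch below is only a suggestion: the variable names such as `TWITTER_API_KEY` and the helper `load_credentials` are hypothetical, and you would need to set the variables before starting Jupyter.

```python
import os

# Hypothetical environment variable names; choose whatever suits your setup.
ENV_NAMES = {
    "api_key": "TWITTER_API_KEY",
    "api_secret": "TWITTER_API_SECRET",
    "access_token": "TWITTER_ACCESS_TOKEN",
    "access_token_secret": "TWITTER_ACCESS_TOKEN_SECRET",
}

def load_credentials(env=os.environ):
    """Return the four credentials from env, using '' for any that are unset."""
    return {key: env.get(name, "") for key, name in ENV_NAMES.items()}

credentials = load_credentials()

# Report which credentials still need to be provided.
missing = [key for key, value in credentials.items() if not value]
if missing:
    print("Missing credentials:", ", ".join(missing))
```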

In [None]:
# Set up the Tweepy auth object. This takes your credentials from above
# and authenticates with Twitter.
# Note: wait_on_rate_limit_notify is only accepted by Tweepy 3.x. It was
# removed in Tweepy 4, so drop that argument if you use a newer version.
auth = tweepy.OAuthHandler(api_key, api_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth, wait_on_rate_limit_notify=True, wait_on_rate_limit=True)

Now let's extract some tweets for a particular user. For example, we could use `realDonaldTrump` or another screen name of a Twitter user (you can choose your own if you wish).

In [None]:
# screen_name is a string holding the screen name of the user whose
# tweets we want to gather. count is the number of tweets to request.
tweets = api.user_timeline(screen_name="realDonaldTrump", count=200)
print("Number of tweets extracted: {}.\n".format(len(tweets)))

# We print the most recent 5 tweets:
print("5 recent tweets:\n")
for tweet in tweets[:5]:
    print(tweet.text)
    print()

### Creating a DataFrame

Next, create a `DataFrame` from the tweets gathered in the previous step. Essentially we have a series of tweets, so this should be straightforward.

In [None]:
# Each tweet from above is an object that holds many properties about
# each tweet. Here, we use a Python structure called a list comprehension
# to only extract the tweet text, given by tweet.text on each tweet 
# found in the tweets collection returned above.
data = pd.DataFrame(data=[tweet.text for tweet in tweets], columns=['Tweets'])
data.head(10)
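
If list comprehensions are new to you, the following sketch shows that one is simply shorthand for a loop that builds a list. The `fake_tweets` objects are stand-ins created here for illustration; they only imitate the `.text` attribute of the real tweet objects.

```python
from types import SimpleNamespace

# Stand-ins for tweet objects: only the .text attribute is modelled here.
fake_tweets = [
    SimpleNamespace(text="First tweet"),
    SimpleNamespace(text="Second tweet"),
    SimpleNamespace(text="Third tweet"),
]

# Explicit loop: build the list one item at a time.
texts_loop = []
for tweet in fake_tweets:
    texts_loop.append(tweet.text)

# List comprehension: the same result in a single expression.
texts_comp = [tweet.text for tweet in fake_tweets]

# Both approaches produce the same list of three strings.
print(texts_loop == texts_comp)
```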

Now we have a simple table based on the text of tweets. Let's take a look at what other things we might be able to extract from a `tweet` object. 

In [None]:
# dir lists the attributes that we can access on an object using dot
# notation. Ignore the names that begin with an underscore, as these
# are intended for internal use.
print(dir(tweets[0]))
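
The output of `dir` can be long. As a small sketch, the underscore-prefixed names can be filtered out with a list comprehension. The `example` object below is a made-up stand-in, not a real tweet; the same filter should work on `tweets[0]`.

```python
from types import SimpleNamespace

# Stand-in object with two made-up public attributes and one internal one.
example = SimpleNamespace(text="Hello", lang="en", _cache=None)

# Keep only the names that do not start with an underscore.
public_names = [name for name in dir(example) if not name.startswith("_")]
print(public_names)
```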

We can see that there are other interesting attributes, such as:

In [None]:
# Print some info from the first tweet:
print(tweets[0].id)
print(tweets[0].created_at)

In [None]:
# Add a print statement for each of the following:
print( ... )  # tweet source
print( ... )  # number of favourites
print( ... )  # number of retweets
print( ... )  # geographical information
print( ... )  # coordinates of the tweet
print( ... )  # other entities related to the tweet

Let's add some columns to our `DataFrame`. Use what you have learned about adding columns to an existing `DataFrame`, together with the list comprehension pattern from above, to add some more relevant data that we can then process.
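
As a reminder of the general pattern, here is a small sketch on made-up data. The `fake_tweets` objects and the `Lang` and `Words` columns are chosen purely for illustration and are not part of the exercise below.

```python
from types import SimpleNamespace
import pandas as pd

# Stand-ins for tweet objects; the attribute values are made up.
fake_tweets = [
    SimpleNamespace(text="Bonjour", lang="fr"),
    SimpleNamespace(text="Good morning", lang="en"),
]

frame = pd.DataFrame(data=[t.text for t in fake_tweets], columns=["Tweets"])

# Assigning a list to a new column name adds that column.
# The list needs one item per row.
frame["Lang"] = [t.lang for t in fake_tweets]

# A column can also be derived from an existing column.
frame["Words"] = frame["Tweets"].str.split().str.len()
print(frame)
```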

In [None]:
# We add relevant data:
data['len']    = ...  # number of characters in a tweet (think length)
data['ID']     = ...
data['Date']   = ...
data['Source'] = ...
data['Likes']  = ...  # favorite_count
data['RTs']    = ...  # retweet_count

In [None]:
# Display the first 10 rows of the DataFrame:
display(data.head(10))

**What is the average length of a tweet?**

In [None]:
mean = ...
print("The average length of the tweets: {}".format(mean))

**Which tweet had the most likes?**

In [None]:
fav_max = ...
fav_tweet = ...

# Max FAVs:
print("The tweet with the most likes is: \n{}".format(data['Tweets'][fav_tweet]))
print("Number of likes: {}".format(fav_max))
print("{} characters.\n".format(data['len'][fav_tweet]))

**Which tweet had the most retweets (RTs)?**

In [None]:
rt_max = ...
rt_tweet = ...

# Max RTs:
print("The tweet with the most retweets is: \n{}".format(data['Tweets'][rt_tweet]))
print("Number of retweets: {}".format(rt_max))
print("{} characters.\n".format(data['len'][rt_tweet]))

## Time series

Note that earlier, we added a `Date` column to our `DataFrame` containing tweet data. This means we can plot or analyze time series data directly from Pandas `Series` objects. For example:

In [None]:
# We can create a time series of tweet length like this
tlen = pd.Series(data=data['len'].values, index=data['Date'])

In [None]:
# ...and plot it against time
tlen.plot(figsize=(16,4), color='r')
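
A `Series` indexed by timestamps supports more than plotting. The sketch below uses made-up values and dates to show two common operations, selecting by date and averaging per day; it is meant as an illustration rather than part of the exercise.

```python
import pandas as pd

# Made-up values and timestamps, purely for illustration.
dates = pd.to_datetime([
    "2019-01-01 09:00", "2019-01-01 18:30",
    "2019-01-02 12:00", "2019-01-03 08:15",
])
lengths = pd.Series(data=[120, 80, 140, 60], index=dates)

# Because the index holds timestamps, values can be selected by date...
first_day = lengths.loc["2019-01-01"]

# ...and aggregated over calendar periods, here the mean per day.
daily_mean = lengths.resample("D").mean()
print(daily_mean)
```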

In [None]:
# Plot a likes vs retweets visualization by creating one time series
# for likes and one for retweets
tfav = ...
tret = ...

tfav.plot(figsize=(16,4), label="Likes", legend=True)
tret.plot(figsize=(16,4), label="Retweets", legend=True)

When you're finished with exercise 9:

If you are running this notebook using Binder, choose **Save and Checkpoint** from the **File** menu, **rename** your notebook to add a hyphen and your initials to the notebook name e.g. `Ex09_Twitter_API_with_Tweepy-DJ`, then choose **Download as Notebook** and save it to your computer or USB stick.

If you are running this notebook on your own machine, choose **Save and Checkpoint** from the **File** menu, choose **Make a copy** from the **File** menu, then **rename** your notebook to add a hyphen and your initials to the notebook name e.g. rename from `Ex09_Twitter_API_with_Tweepy-Copy1` to `Ex09_Twitter_API_with_Tweepy-DJ`.

<sup>Copyright © David Johnson, 2018. This notebook is provided for use with permission for the Hilary Term 2019 University of Oxford course "Programming for Data Science".</sup>