# <font color='#eb3483'> HTTP Requests \& APIs </font>

In this notebook we are going to learn how to get information from a website. The fancy way of saying that is making **HTTP requests**. HTTP requests have methods that let you **GET** and **POST** data. We'll focus on the GET method, which is used for information retrieval.

Some extra reading material: https://realpython.com/python-requests/

The most common library used to make HTTP requests in Python is the `requests` library (see [documentation](https://requests.readthedocs.io/en/master/)).

In [None]:
import requests

## <font color='#eb3483'> GET Requests </font>

To retrieve a resource from an online service, we use the HTTP **GET** method, which can be called with `requests.get(url)`. For example, if we want to get the main page of [Wikipedia](https://wikipedia.com/) we can do it like this:

In [None]:
response = requests.get("https://wikipedia.com/")

We can check the status of the request to verify that the request was successful.

In [None]:
response.status_code

In this case we get a **200** status code, meaning that our request was successful and the server returned the data we requested. You can check the meaning of standard response statuses [here](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status). 
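As a rough illustration of how status codes group into classes, the sketch below uses Python's standard `http.HTTPStatus` enum. The `describe_status` helper is a hypothetical name introduced here for illustration; it is not part of `requests`.

```python
from http import HTTPStatus

def describe_status(code):
    """Return a short label for the class of an HTTP status code (hypothetical helper)."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    # The first digit of the code determines its class.
    return classes.get(code // 100, "unknown")

for code in (200, 404, 502):
    print(code, HTTPStatus(code).phrase, "->", describe_status(code))
```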

A shortcut for checking for success is:

In [None]:
response.ok

We can see what data the server has returned with `response.content` (bytes) or `response.text` (string).

In [None]:
response.content

In [None]:
type(response.content)

In [None]:
response.text

In [None]:
type(response.text)
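The difference between the two attributes comes down to raw bytes versus decoded text. A minimal sketch with a made-up sample string (no network request involved):

```python
# A made-up sample; a real response body would come from the server.
raw = "caf\u00e9".encode("utf-8")   # bytes, analogous to response.content
text = raw.decode("utf-8")          # str, analogous to response.text

print(type(raw), len(raw))          # length counts bytes
print(type(text), len(text))        # length counts characters
```

The byte string is one element longer than the text because `é` takes two bytes in UTF-8.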

We can now save the request content to an html file and open it locally.

In [None]:
with open("wikipedia.html", "wb") as fname:
    fname.write(response.content)

In [None]:
import webbrowser
webbrowser.open("./wikipedia.html")

The Wikipedia page we got via `requests` has most of the relevant page content, but some of the style elements are missing. This is because `requests.get()` only gets the HTML content; it doesn't load the CSS or other resources that the page needs.
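To see which extra resources a page refers to, one option is to scan the HTML for stylesheet, script and image tags. The sketch below uses the standard library's `html.parser`; the `ResourceFinder` class and the sample page are assumptions made up for illustration.

```python
from html.parser import HTMLParser

class ResourceFinder(HTMLParser):
    """Collect URLs of extra resources (CSS, scripts, images) that a page refers to."""

    def __init__(self):
        super().__init__()
        self.resources = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and attrs.get("rel") == "stylesheet":
            self.resources.append(attrs.get("href"))
        elif tag in ("script", "img") and attrs.get("src"):
            self.resources.append(attrs["src"])

# Hypothetical page; with a real response you could feed response.text instead.
sample_html = """
<html><head>
  <link rel="stylesheet" href="/static/style.css">
  <script src="/static/app.js"></script>
</head><body><img src="/static/logo.png"><p>Hello</p></body></html>
"""

finder = ResourceFinder()
finder.feed(sample_html)
print(finder.resources)
```

Each of these URLs would need its own request, which is what a browser does automatically and `requests.get()` does not.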

## <font color='#eb3483'>GETting useful responses</font>
### Reddit

Let's try to get the top 5 posts from the `/r/datascience` subreddit.

In [None]:
response = requests.get("http://www.reddit.com/r/datascience")
response.ok

Oh no. It looks like our request did not work. Let's check the status code.

In [None]:
response.status_code

We know that status codes in the *2xx* range indicate success. Codes in the *5xx* range indicate server errors, meaning that the server from which we requested data experienced a problem. To get more details, we can check the `reason` attribute of the response object.

In [None]:
response.reason

**Bad Gateway?** That's weird. We can still open Reddit in our browser.

In [None]:
webbrowser.open("http://www.reddit.com/r/datascience")

When we make an HTTP request, we send **Request Headers** alongside the request method (GET) and URL. The headers provide metadata about the request and the requester. Let's check our request headers:

In [None]:
response.request.headers

The **User-Agent** header identifies the requester. Ours is set to something like `python-requests/2.22.0` (the exact version depends on your installation), which is the generic default for the `requests` library. This may make Reddit suspicious of our request, so our access is denied.

We can fix this by providing our own user agent with the `headers` argument.

In [None]:
response = requests.get(
    "http://www.reddit.com/r/datascience", 
    headers={"User-Agent":"My User Agent 1.0"})
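A more descriptive User-Agent than a generic string is often considered good etiquette when calling a public service. The sketch below builds one from an application name, a version and an optional contact; `build_headers` and all the values passed to it are hypothetical and only meant as an illustration.

```python
def build_headers(app_name, version, contact=None):
    """Build a headers dict with a descriptive User-Agent (hypothetical helper)."""
    user_agent = f"{app_name}/{version}"
    if contact:
        user_agent += f" (contact: {contact})"
    return {"User-Agent": user_agent}

# Made-up application name and contact address.
headers = build_headers("ds-course-notebook", "0.1", contact="student@example.com")
print(headers)
# The dict could then be passed along as: requests.get(url, headers=headers)
```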

In [None]:
response.ok

In [None]:
response.content

Sweet, it worked! But the content is a pain to parse. Thankfully Reddit has a JSON API. Data **API**s (short for Application Programming Interface) are frequently made available by data providers to manage data access and facilitate common requests. Let's take advantage of it.

In [None]:
response = requests.get("http://www.reddit.com/r/datascience/.json?limit=5", 
                        headers={"User-Agent":"My User Agent 1.0"})

response.headers
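The `?limit=5` part of the URL above is a query string. As a small sketch, the standard library can build such a URL from a dictionary; `requests.get()` also accepts a `params` argument that does the encoding for you.

```python
from urllib.parse import urlencode

base_url = "http://www.reddit.com/r/datascience/.json"
params = {"limit": 5}

# Encode the dictionary as key=value pairs and append it to the base URL.
url = f"{base_url}?{urlencode(params)}"
print(url)
# Roughly equivalent with requests: requests.get(base_url, params=params)
```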

The API returns the response as JSON, which maps closely onto Python's dictionary and list types. We can use the `response.json()` method to parse it:

In [None]:
data = response.json()
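Calling `response.json()` does roughly what `json.loads(response.text)` would do. A minimal sketch with a hand-written payload that only mimics the shape of the Reddit response (the titles are made up):

```python
import json

# Hand-written JSON text standing in for a response body.
payload = (
    '{"kind": "Listing", "data": {"children": ['
    '{"data": {"title": "First post"}}, '
    '{"data": {"title": "Second post"}}]}}'
)

sample = json.loads(payload)
print(type(sample))                                     # JSON objects become dicts
print(type(sample["data"]["children"]))                 # JSON arrays become lists
print(sample["data"]["children"][1]["data"]["title"])
```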

And make the response look pretty:

In [None]:
from pprint import pprint

In [None]:
pprint(data)

The top-level dictionary has a `data` key, and one of the keys nested directly under it is `children`, which holds the actual posts. We can get the title of the second post with:

In [None]:
data['data']['children'][1]['data']['title']

If we want to get all the posts' titles we just have to iterate:

In [None]:
titles = []
for post in data['data']['children']:
    titles.append(post['data']['title'])
    
titles
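The same loop can be written as a list comprehension. A sketch on made-up data with the same nesting as the Reddit response:

```python
# Made-up data with the same nesting as the Reddit response.
sample = {"data": {"children": [
    {"data": {"title": "First post"}},
    {"data": {"title": "Second post"}},
]}}

# One expression instead of an explicit loop with append().
sample_titles = [post["data"]["title"] for post in sample["data"]["children"]]
print(sample_titles)
```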

This gets a bit complicated. An alternative is to use [`glom`](https://github.com/mahmoud/glom), which lets us express this kind of nested lookup more concisely.  
`conda install -c conda-forge glom`  
**NOTE:** `glom` is a fairly new library so it might be unstable.

In [None]:
from glom import glom
glom(data, ('data.children', ['data.title']))
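To get a feel for what a dotted path like `'data.children'` means, here is a small standard-library sketch. It only illustrates the idea and is not how `glom` is implemented; `get_path` is a hypothetical helper name and the data is made up.

```python
from functools import reduce

def get_path(obj, path):
    """Follow a dotted path such as 'data.children' through nested dicts."""
    return reduce(lambda current, key: current[key], path.split("."), obj)

# Made-up data with the same nesting as the Reddit response.
sample = {"data": {"children": [
    {"data": {"title": "First post"}},
    {"data": {"title": "Second post"}},
]}}

children = get_path(sample, "data.children")
sample_titles = [get_path(child, "data.title") for child in children]
print(sample_titles)
```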

## <font color='#eb3483'>Tweeting up a Storm</font>

Many apps manage access to their data much more tightly with APIs. For example, while data from the Twitter website can be retrieved with an HTTP request, it won't give you much useful information. Instead, Twitter has created an API that allows developers to access much richer subsets of their data, based on user permissions. 

You can access the [Twitter API](https://developer.twitter.com/en/docs/basics/getting-started) directly, or use the [tweepy](http://docs.tweepy.org/en/latest/getting_started.html) python library, which is more convenient. We will use tweepy.

To access the data, you need to be a Twitter user and follow an authentication process for a developer account.

1. Create a Twitter account
2. Create a Twitter [app](https://developer.twitter.com/en/apps) by following the link and selecting the **create an app** button
3. Select the **Student** option from the Academic column and fill in the necessary details in the forms that follow. For security reasons Twitter asks several questions on how you intend to use the data. You need to answer these and can elaborate that you are learning to use APIs as part of a Data Science course.
4. Twitter will now review your request, which may take a while.

Once your developer account has been approved, you can set up access for the Twitter API. You will need to provide your credentials to the API before you can retrieve data with Python. It is good practice to read sensitive information like credentials from a separate file. We will use a YAML file to store our Twitter credentials, as YAML is human-readable and commonly used for providing credentials.

In [None]:
import tweepy
import yaml

Create a `credentials.yml` file in your working directory. The file must contain your *consumer_key*, *consumer_secret*, *access_token* and *access_token_secret*. You can retrieve them from within the Twitter app (after it has been approved). The credentials file must be formatted as follows:

```
consumer_key: my_key
consumer_secret: my_secret
access_token: my_token
access_token_secret: my_token_secret
```

Next, load the credentials file so that you can pass your credentials to the Twitter API.

In [None]:
with open('credentials.yml') as f:
    # safe_load only builds plain Python types, which is all this file needs
    credentials = yaml.safe_load(f)
    
credentials
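Before passing the credentials on, it can help to check that all four keys are present. A minimal sketch, where `missing_keys` is a hypothetical helper and the dictionary stands in for the loaded YAML file:

```python
REQUIRED_KEYS = ("consumer_key", "consumer_secret", "access_token", "access_token_secret")

def missing_keys(credentials):
    """Return the required keys that are absent or empty in the credentials dict."""
    return [key for key in REQUIRED_KEYS if not credentials.get(key)]

# Placeholder values like those in the example file; never hard-code real secrets.
example = {
    "consumer_key": "my_key",
    "consumer_secret": "my_secret",
    "access_token": "my_token",
}
print(missing_keys(example))
```

An empty list means the file has everything the authentication step needs.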

When authenticating the API, you may need to grant it access by accepting standard OAuth pop-ups (more information on authentication [here](http://docs.tweepy.org/en/latest/auth_tutorial.html)).

In [None]:
auth = tweepy.OAuthHandler(credentials['consumer_key'], credentials['consumer_secret'])
auth.set_access_token(credentials['access_token'], credentials['access_token_secret'])

api = tweepy.API(auth)

And finally you are ready to get some tweets. The `user_timeline()` method returns the 20 most recent statuses posted from the authenticating user or another user specified via the id parameter.

In [None]:
public_tweets = api.user_timeline()
for tweet in public_tweets:
    print(tweet.text)

To do more fun things, you can browse through the [tweepy API reference guide](http://docs.tweepy.org/en/latest/api.html).

### Exercise

Post a tweet to share your experience of the first two days of the Data Science course. Here is some inspiration:

```
"Just started learning #datascience with @iXperienceCT. Tweeting from my jupyter notebook. Love my TA @wiebketous. Wish I was in Cape Town though."

```