![ADSA Logo](http://i.imgur.com/BV0CdHZ.png?2 "ADSA Logo")

# ADSA Workshop 5 - Data Mining with Web APIs
Workshop content created by Shivam Bharuka and Jackson Davis, with some content adapted from:
* [Data Science from Scratch - First Principles with Python](http://www.amazon.com/Data-Science-Scratch-Principles-Python/dp/149190142X).

In this workshop, we will discuss the OAuth authentication protocol, what web APIs are, and how we can use them to get meaningful data for our projects. We will be working with popular social media APIs such as the Twitter and Instagram APIs, and cover examples showing how we can pull, run computations on, and visualize real-world data.

If you’ve missed any of our previous Python workshops (Basic Python; Advanced Python; NumPy, Statistics, and Probability; and Pandas and Matplotlib) or want to learn Python from scratch, we have provided follow-along tutorials on our GitHub: https://github.com/ADSA-UIUC. Installation instructions are included in the Introduction to Python tutorial.
***

# What Is an API?

As we know, the internet contains a massive amount of useful data. To access this data (weather reports, news articles, social media posts, etc.), developers often have to write programs known as web scrapers, which go through a web page's HTML code to extract data. This is often a very tedious task, and any change made to the HTML on the company's end can force us to change our web scraper as well.

An API, or Application Programming Interface, is an interface provided by a company or organization that allows us to explicitly request data in a structured format, without having to rely on a web scraper to get the data ourselves!

# JSON

To communicate data effectively between a server and a client (us, in this case), developers often rely on a data format known as JSON. JSON, or JavaScript Object Notation, is a text-based format made up of keys and values, and it behaves much like a Python dictionary.

JSON can be thought of as the JavaScript equivalent of a dictionary (JavaScript being a language widely used in web development). Although the format is native to JavaScript, Python provides tools for parsing it into data structures it can understand.

Here is an example of a JSON object:

In [6]:
"""{"title" : "Data Science Book",
"author" : "Joel Grus",
"publicationYear" : 2014,
"topics" : [ "data", "science", "data science"] }"""

{'data': [{'attributes': {'body': 'ADSA is a cool club.',
    'title': 'ADSA Rocks'},
   'id': '1',
   'type': 'articles'}],
 'included': [{'attributes': {'name': 'John'}, 'id': '42', 'type': 'people'}]}

This might look identical to a dictionary; however, JSON is sent as a string of text rather than as a data structure. Furthermore, a real server's response contains more information than just the JSON object (we will go over examples of this shortly). Now we can discuss how we request data from a server.
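To make the difference between a JSON string and a dictionary concrete, here is a minimal sketch using Python's built-in `json` module. It reuses the book example from above; the variable names `raw`, `book`, and `as_text` are just illustrative choices.

```python
import json

# The JSON text a server might send back: a plain string, not a dictionary.
raw = ('{"title": "Data Science Book", "author": "Joel Grus", '
       '"publicationYear": 2014, "topics": ["data", "science", "data science"]}')

# json.loads parses the string into native Python objects.
book = json.loads(raw)

print(type(raw).__name__)    # str
print(type(book).__name__)   # dict
print(book["author"])        # Joel Grus
print(book["topics"][2])     # data science

# json.dumps goes the other way, turning a dictionary back into a JSON string.
as_text = json.dumps(book)
```

Once the string has been parsed, we can index into `book` exactly as we would with any other dictionary.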

# HTTP Requests

HTTP (the Hypertext Transfer Protocol) is a system dictating how both requests and responses should be made over the internet. It is supported by all browsers and is the basis for how information is transferred across the web. We will now discuss how data is requested through HTTP.

When a client makes a request, either through a web browser or from code, the request carries an HTTP verb that tells the server what sort of request is being made. HTTP defines a list of "HTTP methods", or "HTTP verbs", outlining what requests we can make. A full list of the HTTP request methods can be found [here](http://www.tutorialspoint.com/http/http_requests.htm); for now, we will quickly go over the two most commonly used ones.

### GET 

An HTTP GET request is sent to a specific URL or URI (more on this shortly) when a client wants to retrieve information from a server. In this case, the server will pull the requested information from its database, package it into JSON (or XML in some cases), and return the data as an HTTP response. As we will discuss in a second, a server knows which information to return based on the URL the GET request is sent to. After a GET request, no data is changed on the server's end; it is only copied and sent to the client.

### POST

On the other hand, an HTTP POST request is sent to a specific URL or URI when a client wants to send information to a server. The data is usually packaged into the body of the request (for example, as form data) rather than into the URL we see (we will show an example of this later). In this case, the server will take whatever information has been sent and modify its database to add the new data. When it completes its task, the server will often return an HTTP response stating whether the data went through properly or not. POST requests are common on social media, for example when publishing a Facebook post, and any time new data is being sent from a client to a server.
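The difference between the two verbs can be sketched without contacting any server. The snippet below is written for Python 3 and uses only the standard library's `urllib` to build, but not send, one GET and one POST request. The URL `http://example.com/api/posts` and the field names are hypothetical, chosen only for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

base_url = "http://example.com/api/posts"   # hypothetical endpoint

# GET: the parameters travel in the URL itself, after the "?".
get_request = Request(base_url + "?" + urlencode({"author": "adsa"}))

# POST: the new data travels in the request body, encoded as form data.
form_data = urlencode({"title": "ADSA Rocks", "body": "ADSA is a cool club."})
post_request = Request(base_url, data=form_data.encode("utf-8"), method="POST")

print(get_request.get_method(), get_request.full_url)
print(post_request.get_method(), post_request.full_url)
print(post_request.data)
```

Notice that the POST request's URL contains no data at all; everything the client wants to store is carried in the body.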

# HTTP Responses

Whenever a server receives a request, it tells the client whether or not the request went through properly by returning an HTTP response code. While verbs such as GET, POST, PATCH, or PUT are used for HTTP requests, three-digit status codes are used for HTTP responses. The full list of HTTP response codes can be viewed [here](https://en.wikipedia.org/wiki/List_of_HTTP_status_codes). While there are many codes, they fall into five major categories, grouped by hundreds:
* 1xx - Informational Responses
* 2xx - Success Responses
* 3xx - Redirection Responses
* 4xx - Client Error Responses
* 5xx - Server Error Responses

If any of this sounds confusing, a response code that many of us have seen is 404, or Not Found. We often see this code when we try to visit a web page that no longer exists, or a URL path that is not valid. An example of a 404 error can be found at https://www.facebook.com/wut. Since this URL does not actually exist, and there is no data for that page, Facebook, like servers in general, returns a placeholder page like the one shown. On any page that does exist, though, our goal is to receive a 200, or more generally a 2xx, response code, meaning that everything is OK!
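As a small illustration of the five categories, the sketch below maps a status code to its category by looking at the first digit, and looks up the standard phrase for the code using Python 3's built-in `http.HTTPStatus`. The helper name `describe_status` and the `CATEGORIES` table are our own, not part of any library.

```python
from http import HTTPStatus

# The five categories listed above, keyed by the first digit of the code.
CATEGORIES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}

def describe_status(code):
    """Return a readable description such as '404 Not Found (Client Error)'."""
    category = CATEGORIES[code // 100]
    phrase = HTTPStatus(code).phrase
    return "{} {} ({})".format(code, phrase, category)

for code in (200, 301, 404, 500):
    print(describe_status(code))
```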

# OAuth

Because much of the data on social media websites, and on servers in general, is private, and because a client's password must stay private too, many APIs require you to comply with the OAuth authentication protocol. The OAuth protocol allows an access token to be created for a user, which gives that user access to various API functionality. To create an access token, many APIs require users to enter their username and password once, and then issue a token in exchange. Using an OAuth token rather than requiring a password with every request helps ensure that the client's password is not visible to anyone on, or watching, a server.

While the specifics of the OAuth protocol are not important for now, it is necessary to know that most APIs will require you to authenticate your information with the server, and send requests with an access token.
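To give a rough idea of what "sending a request with an access token" can look like, here is a minimal Python 3 sketch that attaches a token to a request as an `Authorization` header. It assumes the bearer-token style used by OAuth 2.0; APIs based on OAuth 1.0a sign each request instead, and libraries usually handle either scheme for us. The token and URL below are placeholders, and nothing is actually sent.

```python
from urllib.request import Request

access_token = "YOUR_ACCESS_TOKEN"        # placeholder, not a real token
url = "https://api.example.com/v1/me"     # hypothetical endpoint

# Build (but do not send) a request that carries the token in a header.
request = Request(url)
request.add_header("Authorization", "Bearer " + access_token)

print(request.get_header("Authorization"))
```

The password never appears in the request; only the token does, and a token can be revoked without changing the password.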

# Using an API

Since HTTP gives us a convenient and well-defined way of carrying out web communication, companies and organizations have created well-defined lists of URLs which, when sent an HTTP request, will return specific pieces of data.

We will now go over some examples using the OpenWeatherMap API. The OpenWeatherMap documentation is located [here](http://openweathermap.org/api), and contains information about making requests, the structure of its responses, and the various URLs and HTTP verbs to use to retrieve data.

In order to interact with the API, we are required to make a developer account so OpenWeatherMap knows where its requests are coming from. For the purposes of this demo, I have created an account using an extra email. (Note that there are limits to how many requests we can make per day, so the account may not always work if multiple people are following this tutorial at once.)

## Python Requests

To make HTTP requests in Python, we can use the conveniently named requests library. As mentioned, in order to use the OpenWeatherMap API we need to authorize our account. Each API works differently, so it's important to read the documentation. I have already saved us the trouble of registering for a developer account and receiving an API key, as described on [this link](http://openweathermap.org/appid#use).

Now that we have the API key, we can make requests to the OpenWeatherMap API like so:

In [10]:
import requests, json

# From following the authorization instructions online
api_key = "a0ed496d90b0244be3083ed1223af1e4"

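# The current-weather endpoint; the city and API key are sent as query parameters
city = "Chicago"
endpoint = "http://api.openweathermap.org/data/2.5/weather"
response = requests.get(endpoint, params={"q": city, "APPID": api_key})
# Parse the JSON body of the response into a Python dictionary
weather = response.json()

print(weather)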

{u'clouds': {u'all': 1}, u'name': u'Chicago', u'coord': {u'lat': 41.85, u'lon': -87.65}, u'sys': {u'country': u'US', u'sunset': 1459729145, u'message': 0.0033, u'type': 1, u'id': 961, u'sunrise': 1459682931}, u'weather': [{u'main': u'Clear', u'id': 800, u'icon': u'01n', u'description': u'clear sky'}], u'cod': 200, u'base': u'cmc stations', u'dt': 1459656533, u'main': {u'pressure': 1016, u'temp_min': 271.15, u'temp_max': 274.15, u'temp': 272.56, u'humidity': 72}, u'id': 4887398, u'wind': {u'speed': 4.6, u'deg': 320}}
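Once the response has been parsed, we can pull out individual fields just as we would from any nested dictionary. The sketch below works on a hand-copied subset of the response printed above, so it runs without contacting the API. By default OpenWeatherMap reports temperatures in Kelvin, so we convert to Celsius; the helper name `kelvin_to_celsius` is our own.

```python
# A subset of the response printed above, copied by hand.
weather = {
    "name": "Chicago",
    "weather": [{"main": "Clear", "description": "clear sky"}],
    "main": {"temp": 272.56, "humidity": 72},
}

def kelvin_to_celsius(kelvin):
    """Convert a temperature from Kelvin to Celsius, rounded to 2 decimals."""
    return round(kelvin - 273.15, 2)

city_name = weather["name"]
# "weather" is a list of condition objects, so we take the first one.
description = weather["weather"][0]["description"]
temp_c = kelvin_to_celsius(weather["main"]["temp"])

print("{}: {}, {} C".format(city_name, description, temp_c))
```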


# The Twitter API

In [2]:
import tweepy
print("Module Imported!")

Module Imported!


In [35]:
# Initialize keys and secret tokens from the twitter dev account: apps.twitter.com
consumer_key = "L9kS8qkG3UCO1v1XaUl4rzPH8"
consumer_secret = "7cMcsIeaWExidr2iwRtiGhkgdYUV2o9MvoSakUFZMGNXJTlQ2m"
access_token = "1620955808-NwFL70fMhxRSXBMxyFRlVAvFAg5aiQ3pp31Bs1j"
access_token_secret = "DPI3295IRmktAqcF0r0q7fM2CsHXY5FndrvYEHy4wJdil"

In [37]:
# Create an OAuthHandler instance
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
# Construct the API instance
api = tweepy.API(auth)

In [49]:
# Tweet a test message and then destroy it using the tweet id
status = api.update_status("Hi this is a test message.")
status_id = status.id
status_destroy = api.destroy_status(status_id)

In [41]:
# Get the user object for "realDonaldTrump"
user = api.get_user('realDonaldTrump')
# user.status holds that user's most recent tweet
statuses = user.status
print(statuses.text)

.@FoxNews should be ashamed for allowing experts to explain how to make a nuclear attack!


# Alchemy API

In [58]:
# import alchemy API
from alchemyapi import AlchemyAPI
# initialize api instance
alchemyapi = AlchemyAPI()

In [62]:
# Analyze the sentiment of the text
response = alchemyapi.sentiment("text", statuses.text)
print("Sentiment: ", response["docSentiment"]["type"])

Sentiment:  negative
