In [None]:
#jupyter nbconvert api_workshop.ipynb --to slides --post serve

# Working with Open Data - Intro to APIs


### http://bit.ly/2nMa0Pn



<br/>
<br/>

### Amir Imani / @amiros_imani

## What is an API?

<img src="./images/api-explained.png">

source: https://www.humcommerce.com

Application Programming Interfaces, or APIs, are commonly used to retrieve data from remote websites. Sites like Twitter and Facebook offer certain data through their APIs. To use an API, you make a request to a remote web server and retrieve the data you need.

+++++++++++++++++++++++++++++++++++++++++++++++++++

An API, or Application Programming Interface, makes it easy for developers to integrate one app with another.
APIs expose some of a program's inner workings in a limited way.

You can use APIs to get information from other programs or to automate things you normally do in your web browser. Sometimes you can use APIs to do things you just can't do any other way.

This workshop assumes you're comfortable with Python's syntax, structure, and some built-in functions.

# Why do APIs matter?

- The data is changing quickly.

- You want a small piece of a much larger set of data.

- There is repeated computation involved. 

- Stock price data is an example of quickly changing data. It doesn't make sense to regenerate a dataset and download it every minute -- this would take a lot of bandwidth and be pretty slow.

- Reddit comments are an example of wanting a small piece of a larger dataset. What if you just want to pull your own comments on Reddit? It doesn't make much sense to download the entire Reddit database and then filter for your own comments.

- Spotify is an example of repeated computation. Its API can tell you the genre of a piece of music. You could theoretically create your own classifier and use it to categorize music, but you'll never have as much data as Spotify does.

<img src="./images/nyt.png">

<br/>


Request a key from https://developer.nytimes.com

In [1]:
KEY = '03ad3061eb13451d84af8c1eb86474f2'

# API Requests

APIs are hosted on web servers. When you type www.google.com in your browser's address bar, your computer is actually asking the www.google.com server for a webpage, which it then returns to your browser.

APIs work much the same way, except instead of your web browser asking for a webpage, your program asks for data. This data is usually returned in JSON format.

In order to get the data, we make a request to a webserver. The server then replies with our data. In Python, we'll use the requests library to do this.

There are many different types of requests. The most commonly used one, a GET request, is used to retrieve data.

We can use a simple GET request to retrieve information from an API.


#  Main Questions to Ask

- What does a request look like? 
- What does a response look like? 
- What goes into the request or response headers? *
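To make the first question concrete, here is a minimal sketch that takes a request URL apart with the standard library. It reuses the workshop's endpoint; `YOUR_KEY` and the `q=open+data` query are placeholders chosen for illustration, and nothing is sent over the network.

```python
from urllib.parse import urlsplit, parse_qs

# Same endpoint as the workshop; the key and query below are placeholders.
url = "http://api.nytimes.com/svc/search/v2/articlesearch.json?api-key=YOUR_KEY&q=open+data"

parts = urlsplit(url)
print(parts.scheme)   # protocol used to talk to the server
print(parts.netloc)   # host that receives the request
print(parts.path)     # endpoint, i.e. the resource being asked for

# Query parameters decoded into a dict of lists ('+' is decoded as a space)
query = parse_qs(parts.query)
print(query)
```

A GET request is essentially this URL plus a set of headers; the query string is where the API key and search parameters travel.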

In [3]:
# !pip install requests
import requests

# Set base url path
base_url = 'http://api.nytimes.com/svc/search/v2/articlesearch.json?api-key={}'.format(KEY)

# Send a request
response = requests.get(base_url)

In [4]:
# Print the status code of the response.
print(response.status_code)

200


<img src="./images/status_codes.jpg">

The 200 series means "success" -- your request was valid, and the response is what logically follows from it.
The 400 series means "bad request" -- something was wrong with the request, so the server did not process it as you wanted it to. Common causes for HTTP 400-level errors are badly formatted requests and authentication problems.
The 500 series means "server error" -- your request may have been OK, but the server couldn't give you a good response right now for reasons out of your control. These should be rare, but you need to be aware of the possibility so you can handle them in your code.
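Since the series is determined by the first digit of the code, a small helper can sort any status code into one of these groups. This is a sketch: the function name `status_series` is made up for illustration, and the labels follow the usual HTTP naming ("client error" for the 400 series).

```python
def status_series(code):
    """Return a coarse label for an HTTP status code based on its first digit."""
    labels = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    # Integer division by 100 keeps only the leading digit (e.g. 403 -> 4)
    return labels.get(code // 100, "unknown")


for code in (200, 403, 500):
    print(code, status_series(code))
```

In a script you might call `status_series(response.status_code)` before deciding whether to parse the response.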

In [6]:
# Trying a wrong URL
new_url = base_url + '?abc'
new_response = requests.get(new_url)

print(new_response.status_code)

403



200 -- everything went okay, and the result has been returned (if any)

301 -- the server is redirecting you to a different endpoint. This can happen when a company switches domain names, or an endpoint name is changed.

401 -- the server thinks you're not authenticated. This happens when you don't send the right credentials to access an API.

400 -- the server thinks you made a bad request. This can happen when you don't send along the right data, among other things.

403 -- the resource you're trying to access is forbidden -- you don't have the right permissions to see it.

404 -- the resource you tried to access wasn't found on the server.
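Python's standard library already knows the official reason phrase for each code. The sketch below pairs that phrase with a suggested next step; the `describe` function and its suggested actions are illustrative assumptions, not rules defined by any API.

```python
from http import HTTPStatus


def describe(code):
    """Return the standard reason phrase plus a hypothetical next step."""
    phrase = HTTPStatus(code).phrase  # raises ValueError for unknown codes
    if code >= 500:
        action = "retry later"
    elif code in (401, 403):
        action = "check the API key"
    elif code >= 400:
        action = "fix the request"
    elif code >= 300:
        action = "follow the redirect"
    else:
        action = "use the response"
    return "{} {}: {}".format(code, phrase, action)


for code in (200, 301, 400, 401, 403, 404):
    print(describe(code))
```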

# Query Parameters





Available in the API documentation: http://nyti.ms/2EcCiNc

In [24]:
# Set up the parameters we want to pass to the API.
parameters = {"q": ['trump+russia'], 'fl': ['byline', 'pub_date', 'headline'],
              'sort': 'newest',
              'begin_date': '20180101', 'end_date': '20180201',
             }


# Make a get request with the parameters.
response = requests.get(base_url, params=parameters)

# Print the status code of the response
print(response.status_code)

200
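Before a parameters dictionary reaches the server, it is turned into a percent-encoded query string. The sketch below uses the standard library's `urlencode` to show what that encoding looks like; it assumes `requests` applies a similar encoding to `params`, and that the API accepts a comma-separated `fl` value, so check both against the documentation.

```python
from urllib.parse import urlencode

params = {
    "q": "trump russia",                                 # a space, not a literal '+'
    "fl": ",".join(["byline", "pub_date", "headline"]),  # one comma-separated value
    "sort": "newest",
}

# Spaces become '+', commas become '%2C'
query = urlencode(params)
print(query)

# A literal '+' in a value is escaped, so the server sees a plus sign, not a space
plus_query = urlencode({"q": "trump+russia"})
print(plus_query)

# A list value with doseq=True repeats the key once per item
repeated = urlencode({"fl": ["byline", "pub_date"]}, doseq=True)
print(repeated)
```

The last two lines are worth comparing with the dictionary above: a `'+'` typed inside a value and a list passed for a single parameter may not reach the server in the form you intended.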


#  Main Questions to Ask

- What does a request look like? 
- What does a response look like? 
- What goes into the request or response headers? *
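Headers are key-value pairs that travel alongside the URL. As a rough sketch of the request side, the standard library's `Request` object can be built and inspected without sending anything; the `Accept` header here is just an example value.

```python
from urllib.request import Request

# Build (but do not send) a GET request that asks for a JSON reply
req = Request(
    "http://api.nytimes.com/svc/search/v2/articlesearch.json",
    headers={"Accept": "application/json"},
)

print(req.get_method())     # GET, because no request body is attached
print(req.header_items())   # list of (name, value) pairs
```

With `requests`, the same information is available after a call as `response.request.headers` (what was sent) and `response.headers` (what came back).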

In [25]:
# Check the retrieved information
# print(response.headers)
print(response.content)



b'{"status":"OK","copyright":"Copyright (c) 2018 The New York Times Company. All Rights Reserved.","response":{"docs":[{"headline":{"main":"Trump, Brenda Fitzgerald, Virginia: Your Wednesday Evening Briefing","kicker":null,"content_kicker":null,"print_headline":"Your Evening Briefing","name":null,"seo":null,"sub":null},"pub_date":"2018-01-31T22:57:59+0000","byline":{"original":"By KAREN ZRAICK and DAVID SCULL","person":[{"firstname":"Karen","middlename":null,"lastname":"ZRAICK","qualifier":null,"title":null,"role":"reported","organization":"","rank":1},{"firstname":"David","middlename":null,"lastname":"SCULL","qualifier":null,"title":null,"role":"reported","organization":"","rank":2}],"organization":null},"score":0.10885632},{"headline":{"main":"Syria\'s Kurds Push US to Stop Turkish Assault on Key Enclave","kicker":null,"content_kicker":null,"print_headline":"Syria\'s Kurds Push US to Stop Turkish Assault on Key Enclave","name":null,"seo":null,"sub":null},"pub_date":"2018-01-31T22:24:

# Working With JSON

<img src="./images/json.png">

JSON is a way to encode data structures like lists and dictionaries to strings that ensures that they are easily readable by machines. JSON is the primary format in which data is passed back and forth to APIs, and most API servers will send their responses in JSON format.

Python has great JSON support through the json package. The json package is part of the standard library, so we don't have to install anything to use it. We can convert lists and dictionaries to JSON strings, and convert JSON strings back to lists and dictionaries. In the case of our NYT article data, the response is a dictionary encoded as a string in JSON format.
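A quick round trip shows both directions. The `article` dictionary below is a made-up example shaped loosely like the API response.

```python
import json

# Hypothetical record, for illustration only
article = {
    "headline": {"main": "Example headline"},
    "keywords": ["api", "json"],
    "score": 0.5,
    "kicker": None,
}

# Python objects -> JSON string (None becomes null)
encoded = json.dumps(article)
print(encoded)

# JSON string -> Python objects (null becomes None again)
decoded = json.loads(encoded)
print(decoded == article)
```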

In [26]:
# Parse the JSON response body into Python objects
data = response.json()
data

{'copyright': 'Copyright (c) 2018 The New York Times Company. All Rights Reserved.',
 'response': {'docs': [{'byline': {'organization': None,
     'original': 'By KAREN ZRAICK and DAVID SCULL',
     'person': [{'firstname': 'Karen',
       'lastname': 'ZRAICK',
       'middlename': None,
       'organization': '',
       'qualifier': None,
       'rank': 1,
       'role': 'reported',
       'title': None},
      {'firstname': 'David',
       'lastname': 'SCULL',
       'middlename': None,
       'organization': '',
       'qualifier': None,
       'rank': 2,
       'role': 'reported',
       'title': None}]},
    'headline': {'content_kicker': None,
     'kicker': None,
     'main': 'Trump, Brenda Fitzgerald, Virginia: Your Wednesday Evening Briefing',
     'name': None,
     'print_headline': 'Your Evening Briefing',
     'seo': None,
     'sub': None},
    'pub_date': '2018-01-31T22:57:59+0000',
    'score': 0.10885632},
   {'byline': {'organization': 'THE ASSOCIATED PRESS',
     'or

In [27]:
# Check the first article
data['response']['docs'][0]

{'byline': {'organization': None,
  'original': 'By KAREN ZRAICK and DAVID SCULL',
  'person': [{'firstname': 'Karen',
    'lastname': 'ZRAICK',
    'middlename': None,
    'organization': '',
    'qualifier': None,
    'rank': 1,
    'role': 'reported',
    'title': None},
   {'firstname': 'David',
    'lastname': 'SCULL',
    'middlename': None,
    'organization': '',
    'qualifier': None,
    'rank': 2,
    'role': 'reported',
    'title': None}]},
 'headline': {'content_kicker': None,
  'kicker': None,
  'main': 'Trump, Brenda Fitzgerald, Virginia: Your Wednesday Evening Briefing',
  'name': None,
  'print_headline': 'Your Evening Briefing',
  'seo': None,
  'sub': None},
 'pub_date': '2018-01-31T22:57:59+0000',
 'score': 0.10885632}
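Many fields in a record like this are nested or `None`, so chained indexing can fail with a `KeyError` or `TypeError`. One possible approach is a small helper that walks the keys and falls back to a default; `dig` is a hypothetical name, and `doc` is a trimmed copy of the article above.

```python
def dig(obj, *keys, default=None):
    """Follow keys into nested dicts; return default if a level is missing or None."""
    for key in keys:
        if not isinstance(obj, dict) or key not in obj:
            return default
        obj = obj[key]
    return default if obj is None else obj


# Trimmed copy of the first article from the response
doc = {
    "byline": {"organization": None, "original": "By KAREN ZRAICK and DAVID SCULL"},
    "headline": {"kicker": None, "print_headline": "Your Evening Briefing"},
    "pub_date": "2018-01-31T22:57:59+0000",
}

print(dig(doc, "headline", "print_headline"))
print(dig(doc, "byline", "organization", "name", default="n/a"))
print(dig(doc, "headline", "kicker", default=""))
```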

In [30]:
import json
# Save JSON to local drive
with open('./local_file_on_my_laptop.json', 'w') as outfile:
    json.dump(data, outfile)

In [None]:
with open('./local_file_on_my_laptop.json') as json_data:
    d = json.load(json_data)
    print(d)

# JSON to CSV

In [29]:
import pandas as pd

def json_to_csv(json_file):
    """
    Take the parsed JSON response (a dict) and return a pandas DataFrame
    """
    # Get the useful part
    articles = json_file['response']['docs']

    # Find the column titles
    cols = list(articles[0].keys())

    # Creating new list
    row_list = []

    # Iterate through all articles and collect the value for each column
    for item in articles:
        row = []
        for col in cols:
            if col != 'headline':
                row.append(item[col])
            else:
                row.append(item[col]['main'])
        row_list.append(row)

    # Create a DataFrame
    df = pd.DataFrame(row_list, columns=cols)
    
    return df


df = json_to_csv(data)
df.head()

Unnamed: 0,headline,pub_date,byline,score
0,"Trump, Brenda Fitzgerald, Virginia: Your Wedne...",2018-01-31T22:57:59+0000,{'original': 'By KAREN ZRAICK and DAVID SCULL'...,0.108856
1,Syria's Kurds Push US to Stop Turkish Assault ...,2018-01-31T22:24:34+0000,"{'original': 'By THE ASSOCIATED PRESS', 'perso...",0.139104
2,Trump Asked No. 2 Justice Department Official ...,2018-01-31T21:18:38+0000,"{'original': 'By REUTERS', 'person': [], 'orga...",1.179438
3,House Republican Nunes Calls FBI Objections to...,2018-01-31T20:33:42+0000,"{'original': 'By REUTERS', 'person': [], 'orga...",0.349224
4,Trump’s Debt to Reagan,2018-01-31T19:46:01+0000,"{'original': 'By CHARLES R. KESLER', 'person':...",0.240419
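The `byline` column above still holds a nested dictionary, and nothing has been written out as CSV yet. As a sketch of one way to finish the job with only the standard library, the code below flattens each article to four plain columns and writes them as CSV text. The `flatten` helper is hypothetical, the first record comes from the table above, and the second is invented to show a missing byline.

```python
import csv
import io


def flatten(doc):
    """Reduce one article dict to flat, CSV-friendly columns."""
    headline = doc.get("headline") or {}
    byline = doc.get("byline") or {}
    return {
        "headline": headline.get("main"),
        "pub_date": doc.get("pub_date"),
        "byline": byline.get("original"),
        "score": doc.get("score"),
    }


docs = [
    {"headline": {"main": "Trump’s Debt to Reagan"},
     "pub_date": "2018-01-31T19:46:01+0000",
     "byline": {"original": "By CHARLES R. KESLER"},
     "score": 0.240419},
    # Hypothetical record with no byline
    {"headline": {"main": "Hypothetical item without a byline"},
     "pub_date": "2018-01-31T00:00:00+0000",
     "byline": None,
     "score": 0.1},
]

# Write to an in-memory buffer; pass a file opened with newline="" to save to disk
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["headline", "pub_date", "byline", "score"])
writer.writeheader()
writer.writerows(flatten(d) for d in docs)
print(buffer.getvalue())
```

If you stay in pandas instead, `df.to_csv('articles.csv', index=False)` writes the DataFrame to disk; the file name here is only an example.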


# Questions?

# Extra

# REST API

REST (Representational State Transfer) was introduced and defined in 2000 by Roy Fielding in his doctoral dissertation.
REST is an architectural style for designing distributed systems. It is not a standard but a set of constraints,
such as being stateless, having a client/server relationship, and a uniform interface.


# HTTP Methods

## CRUD (create, retrieve, update, delete)

REST is not strictly related to HTTP, but it is most commonly associated with it.

In order to get the data, we make a request to a webserver. The server then replies with our data. In Python, we'll use the requests library to do this. In this Python API tutorial we'll be using Python 3.4 for all of our examples.

## GET / POST / PUT / PATCH / DELETE

HTTP methods

- GET. Retrieve information. ...
- POST. Request that the resource at the URI do something with the provided entity. ...
- PUT. Store an entity at a URI. ...
- PATCH. Update only the specified fields of an entity at a URI. ...
- DELETE. Request that a resource be removed; however, the resource does not have to be removed immediately.
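To see how the methods line up with CRUD, the sketch below builds one request per method without sending any of them. The `api.example.com` endpoint and the JSON body are hypothetical, and the CRUD mapping in the comments is the common convention rather than a rule every API follows.

```python
import json
from urllib.request import Request

# Hypothetical endpoint and payload, used only for illustration; nothing is sent
collection = "https://api.example.com/articles"
item = collection + "/42"
body = json.dumps({"title": "Draft"}).encode("utf-8")

examples = [
    Request(collection, data=body, method="POST"),  # create a new article
    Request(item, method="GET"),                    # retrieve article 42
    Request(item, data=body, method="PUT"),         # replace article 42
    Request(item, data=body, method="PATCH"),       # update some fields of article 42
    Request(item, method="DELETE"),                 # delete article 42
]

for req in examples:
    print(req.get_method(), req.full_url)
```

The `requests` library offers a matching function for each method: `requests.get`, `requests.post`, `requests.put`, `requests.patch`, and `requests.delete`.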

https://spring.io/understanding/REST
Examples for each method