## What is an API?

<img src="./img/lab_12_api-explained.png">
source: https://www.humcommerce.com

---
Application Programming Interfaces, or APIs, are commonly used to retrieve data from remote websites. Sites like Twitter and Facebook offer certain data through their APIs. To use an API, you make a request to a remote web server and retrieve the data you need.

---
An API makes it easy for developers to integrate one app with another. It exposes some of a program's inner workings in a limited way.

You can use APIs to get information from other programs or to automate things you normally do in your web browser. Sometimes you can use APIs to do things you just can't do any other way.

## Why should I use APIs?

- The data is changing quickly. Stock price data is one example. It doesn't make sense to regenerate a dataset and download it every minute; this would take a lot of bandwidth and be slow.

- You want a small piece of a much larger set of data. Reddit comments are one example. If you only want your own comments on Reddit, it doesn't make sense to download the entire Reddit database and then filter it down to them.

- There is repeated computation involved. Spotify has an API that can tell you the genre of a piece of music. You could theoretically create your own classifier and use it to categorize music, but you'll never have as much data as Spotify does.

# Working with APIs

APIs are hosted on web servers. When you type www.google.com in your browser's address bar, your computer is actually asking the www.google.com server for a webpage, which it then returns to your browser.

APIs work much the same way, except instead of your web browser asking for a webpage, your program asks for data. This data is usually returned in JSON format.
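
To make the JSON point concrete, here is a minimal sketch that parses a small JSON payload with Python's standard `json` module. The payload and its field names (`status`, `results`, `title`) are invented for illustration and do not come from any real API.

```python
import json

# A made-up JSON payload, shaped like the text an API might send back.
# The field names here are hypothetical.
raw_body = '{"status": "OK", "results": [{"title": "First"}, {"title": "Second"}]}'

# json.loads turns JSON text into ordinary Python dicts and lists.
payload = json.loads(raw_body)

# Once parsed, the data is navigated like any other dict or list.
titles = [item["title"] for item in payload["results"]]
print(payload["status"])
print(titles)
```

Because the result is made of plain dicts and lists, no special tooling is needed to work with it after parsing.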

In order to get the data, we make a request to a web server. The server then replies with our data. In Python, we'll use the `requests` library to do this.

There are many different types of requests. The most commonly used one, a GET request, is used to retrieve data.

We can use a simple **GET** request to retrieve information from an API.


## API Key
<img src="./img/lab_12_nyt.png">

<br/>


Request a key from https://developer.nytimes.com
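
One way to avoid pasting the key directly into a notebook is to read it from an environment variable. This is only a sketch: the variable name `NYT_API_KEY` and the helper `load_api_key` are hypothetical choices, not something the NYT developer portal prescribes.

```python
import os


def load_api_key(var_name: str = "NYT_API_KEY") -> str:
    """Read the API key from an environment variable (the name is hypothetical)."""
    key = os.environ.get(var_name)
    if not key:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key
```

With the variable set in your shell, `KEY = load_api_key()` could stand in for a hard-coded key.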

In [None]:
# !pip install requests
import requests
import pandas as pd

In [None]:
KEY = 'YOUR KEY HERE' 

# Set base url path
base_url = 'https://api.nytimes.com/svc/search/v2/articlesearch.json?api-key={}'.format(KEY)
base_url

# HTTP Status Codes

In [None]:
# Send a request
response = requests.get(base_url)

# Print the status code of the response.
print(response.status_code)

In [None]:
# # Trying a wrong URL
# new_url = base_url + '?abc'
# new_response = requests.get(new_url)

# print(new_response.status_code)

<img src="./img/lab_12_status_codes.jpg" style="width: 200px;">

- The 200 series means "success": your request was valid, and the response is what logically follows from it.

- The 400 series means "client error": something was wrong with the request, so the server did not process it as you wanted it to. Common causes for HTTP 400-level errors are badly formatted requests and authentication problems.

- The 500 series means "server error": your request may have been OK, but the server couldn't give you a good response right now for reasons out of your control. These should be rare, but you need to be aware of the possibility so you can handle them in your code.


200 -- everything went okay, and the result has been returned (if any)

301 -- the server is redirecting you to a different endpoint. This can happen when a company switches domain names, or an endpoint name is changed.

401 -- the server thinks you're not authenticated. This happens when you don't send the right credentials to access an API (we'll talk about authentication in a later post).

400 -- the server thinks you made a bad request. This can happen when you don't send along the right data, among other things.

403 -- the resource you're trying to access is forbidden -- you don't have the right permissions to see it.

404 -- the resource you tried to access wasn't found on the server.
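
The code families above can be turned into a small check. The sketch below uses only the standard library; the helper name `status_family` and its labels are illustrative choices, while `http.HTTPStatus` supplies the standard reason phrase for each code.

```python
from http import HTTPStatus


def status_family(code: int) -> str:
    """Return a rough label for the family a status code belongs to."""
    if 200 <= code < 300:
        return "success"
    if 300 <= code < 400:
        return "redirect"
    if 400 <= code < 500:
        return "client error"
    if 500 <= code < 600:
        return "server error"
    return "other"


for code in (200, 301, 400, 401, 403, 404, 500):
    # HTTPStatus(code).phrase is the standard reason phrase for the code.
    print(code, HTTPStatus(code).phrase, "->", status_family(code))
```

In a notebook you might call `status_family(response.status_code)` before trying to read the response body.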

# Query Parameters

###  Main Questions to Ask

- What does a request look like? 
- What does a response look like? 
- What goes into the request or response headers?
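
For the first question, it can help to see how a dictionary of parameters ends up in the URL of a GET request. The sketch below uses only the standard library (`urllib.parse`), so it runs without sending anything; `requests` performs a similar encoding when you pass `params=`. The key and the parameter values are placeholders.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

endpoint = "https://api.nytimes.com/svc/search/v2/articlesearch.json"

# Placeholder values; replace the key with your own.
params = {
    "q": "crispr china",
    "begin_date": "20190101",
    "api-key": "YOUR KEY HERE",
}

# urlencode escapes spaces and other special characters for us.
url = endpoint + "?" + urlencode(params)
print(url)

# Splitting the URL again shows its parts: host, path, and query string.
parts = urlsplit(url)
print(parts.netloc)
print(parts.path)
print(parse_qs(parts.query))
```

Note that the space in the search term is written as `+` in the URL and decoded back to a space when the query string is parsed.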

In [None]:
# Set up the parameters we want to pass to the API.
# 'fl' (the fields to return) is given as one comma-separated string.
user_input = {'q': 'trump',
              'fl': 'byline,pub_date,headline',
              'sort': 'newest',
              'begin_date': '20200101', 'end_date': '20200201',
             }


# Make a GET request with the parameters.
response = requests.get(base_url, params=user_input)

# Print the status code of the response (200 means the request succeeded).
print(response.status_code)


In [None]:
# Check the retrieved information
print(response.headers)
# print(response.content)

In [None]:
response.json()['response']['docs']
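
The nested lookup above is easier to follow on a small example. The dictionary below is a hand-written stand-in for `response.json()`: the outer `response` and `docs` keys match the lookup above, while the layout inside each article (`headline.main`, `byline.original`, `pub_date`) and all values are assumptions made for illustration.

```python
# Hand-written stand-in for response.json(); all values are invented.
sample = {
    "status": "OK",
    "response": {
        "docs": [
            {
                "headline": {"main": "Example headline one"},
                "byline": {"original": "By A. Reporter"},
                "pub_date": "2020-01-15T10:00:00+0000",
            },
            {
                "headline": {"main": "Example headline two"},
                "byline": {"original": None},
                "pub_date": "2020-01-14T08:30:00+0000",
            },
        ]
    },
}

rows = []
for doc in sample["response"]["docs"]:
    rows.append({
        # .get() avoids a KeyError when a field is missing from an article.
        "headline": doc.get("headline", {}).get("main"),
        "byline": (doc.get("byline") or {}).get("original"),
        # Keep only the YYYY-MM-DD part of the timestamp.
        "date": doc.get("pub_date", "")[:10],
    })

for row in rows:
    print(row)
```

Flattening each article into a simple dict like this makes the later conversion to a dataframe more predictable.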

# More Advanced Queries

In [None]:
# set variables
KEY = 'YOUR KEY HERE'

base_url = 'https://api.nytimes.com/svc/search/v2/articlesearch.json'
query = 'crispr china'   # requests URL-encodes the space for us
begin_date = '20190101'

# collect one dataframe per page
frames = []

# iterate through 5 pages (page numbers start at 0)
for i in range(5):
    # parameters to send to the NYT API
    parameters = {"api-key": KEY,
                  "q": query,
                  "begin_date": begin_date,
                  "page": i}   # here is where page number goes

    # send the request and stop early if the server reports an error
    response = requests.get(base_url, params=parameters)
    response.raise_for_status()
    data = response.json()

    # get all articles - remove header data
    articles = data['response']['docs']
    # convert the list of article dicts to a dataframe
    frames.append(pd.DataFrame(articles))

# concatenate all pages into a final dataframe
df_all = pd.concat(frames, ignore_index=True)

In [None]:
df_all.head()