# Intro 

Here are a few situations where data sets don't work well:

* The data change frequently. It doesn't really make sense to regenerate a data set of stock prices, for example, and download it every minute. This approach would require a lot of bandwidth, and be very slow.
* You only want a small piece of a much larger data set. Reddit comments are one example. What if you want to pull just your own comments from reddit? It doesn't make much sense to download the entire reddit database, then filter it for a few items.
* The data involve repeated computation. For example, Spotify has an API that can tell you the genre of a piece of music. You could theoretically create your own classifier and use it to categorize music, but you'll never have as much data as Spotify does.

In cases like these, an application programming interface (API) is the right solution. An **API is a set of methods and tools that allows different applications to interact with each other**. Programmers use APIs to query and retrieve data dynamically (which they can then integrate with their own apps). A client can retrieve information quickly and effectively through an API.

# API Requests
Organizations host their APIs on Web servers. When you type www.google.com in your browser's address bar, your computer is actually asking the www.google.com server for a Web page, which it then returns to your browser.

APIs work much the same way, except instead of your Web browser asking for a Web page, your program asks for data. The API usually returns this data in JavaScript Object Notation (JSON) format. We'll discuss JSON more later on in this mission.

We use the `requests` library (http://www.python-requests.org/en/latest/) to send requests.


## GET Requests
There are many different types of requests. The most common is a GET request, which we use to retrieve data. We'll explore the other types in later missions.

OpenNotify (http://open-notify.org/) offers several API endpoints. An **endpoint is a server route for retrieving specific data** from an API. For example, the **/comments endpoint** on the reddit API might retrieve information about **comments**, while the /users endpoint might retrieve data about users.





In [2]:
import requests

# Make a get request to get the latest position of the ISS from the OpenNotify API.
response = requests.get("http://api.open-notify.org/iss-now.json")

status_code = response.status_code
print(status_code)

200


In [3]:
print(response.text)

{"timestamp": 1532348377, "iss_position": {"latitude": "-47.9044", "longitude": "-47.1942"}, "message": "success"}


In [8]:
print(response.json())

{'timestamp': 1532348377, 'iss_position': {'latitude': '-47.9044', 'longitude': '-47.1942'}, 'message': 'success'}


In [10]:
r = requests.get("http://api.open-notify.org/iss-pass.json")
status_code = r.status_code
print(status_code)

400
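
The two responses above returned different status codes. As a rough sketch, the leading digit of a status code tells us which broad class the response belongs to. The helper name `describe_status` is made up for this illustration, and it only groups codes by range rather than explaining each individual code.

```python
def describe_status(status_code):
    """Return a coarse description of an HTTP status code based on its range."""
    if 200 <= status_code < 300:
        return "success"
    if 300 <= status_code < 400:
        return "redirect"
    if 400 <= status_code < 500:
        return "client error"
    if 500 <= status_code < 600:
        return "server error"
    return "unknown"


# The two codes we have seen so far.
for code in (200, 400):
    print(code, describe_status(code))
```

A check like this is handy before trying to decode a response, because an error response may not contain the data we expect.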


### Handling parameters

#### Example: ISS Pass

In the last example, we got a `400` status code, which indicates a bad request. The documentation for the OpenNotify API shows that the ISS Pass endpoint requires two parameters.

This endpoint returns the next time the ISS will pass over a given location on the Earth.

To request this information, we'll need to pass the coordinates for a specific location to the API. We do this by passing in two parameters, latitude and longitude.

To accomplish this, we can add an optional keyword argument, params, to our request. In this case, we need to pass in **two parameters**:

* lat - The latitude of the location
* lon - The longitude of the location

We can make a **dictionary that contains these parameters**, and then pass them into the function.

We can also do the same thing directly by adding the query parameters to the URL, like this:
http://api.open-notify.org/iss-pass.json?lat=40.71&lon=-74

It's almost always **preferable to set up the parameters as a dictionary**, because the requests library we mentioned earlier takes care of certain issues, like properly formatting the query parameters.
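
To give a feel for the formatting involved, here is a small sketch that builds the same query string with `urlencode` from Python's standard library. It only illustrates the kind of escaping a query string needs; it is not a claim about how `requests` does this internally. The `city` parameter is a made-up example and is not part of the OpenNotify API.

```python
from urllib.parse import urlencode

base_url = "http://api.open-notify.org/iss-pass.json"
parameters = {"lat": 40.71, "lon": -74}

# urlencode turns the dictionary into a query string.
query_string = urlencode(parameters)
full_url = base_url + "?" + query_string
print(full_url)

# Spaces and symbols must be escaped; doing this by hand is easy to get wrong.
# "city" is a hypothetical parameter used only to show the escaping.
print(urlencode({"city": "New York & Co"}))
```

Building the second query string by hand would mean remembering to escape both the spaces and the `&`, which is the kind of detail the dictionary approach handles for us.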



In [11]:
parameters = {"lat": 40.71, "lon": -74}

# Make a get request with the parameters.
response = requests.get("http://api.open-notify.org/iss-pass.json", params=parameters)

# Print the content of the response (the data the server returned)
print(response.content)


b'{\n  "message": "success", \n  "request": {\n    "altitude": 100, \n    "datetime": 1532348846, \n    "latitude": 40.71, \n    "longitude": -74.0, \n    "passes": 5\n  }, \n  "response": [\n    {\n      "duration": 605, \n      "risetime": 1532389397\n    }, \n    {\n      "duration": 628, \n      "risetime": 1532395168\n    }, \n    {\n      "duration": 559, \n      "risetime": 1532401035\n    }, \n    {\n      "duration": 565, \n      "risetime": 1532406885\n    }, \n    {\n      "duration": 635, \n      "risetime": 1532412678\n    }\n  ]\n}\n'


### JSON
JSON is the **primary format** for sending and receiving data through APIs. This format **encodes data structures like lists and dictionaries as strings** to ensure that machines can read them easily. 

Python offers **great support for JSON through its json library**. We can convert lists and dictionaries to JSON, and vice versa. Our ISS Pass data, for example, is a dictionary encoded as a string in JSON format.

The **`json` library** has two main functions:

* **dumps** -- Takes in a Python object, and converts it to a JSON string
* **loads** -- Takes a JSON string, and converts it to a Python object

We can get the **content of a response as a Python object** by using the `.json()` method on the response.
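
As a quick illustration of `dumps` and `loads`, the sketch below converts a small dictionary to a JSON string and back. The dictionary is a single pass copied from the ISS Pass output shown earlier.

```python
import json

iss_pass = {"duration": 605, "risetime": 1532389397}

# dumps: Python object -> JSON string
encoded = json.dumps(iss_pass)
print(type(encoded))
print(encoded)

# loads: JSON string -> Python object
decoded = json.loads(encoded)
print(type(decoded))
print(decoded == iss_pass)
```

The round trip gives back an equal dictionary, which is why a JSON response can be treated like ordinary Python data once it has been decoded.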




In [12]:
json_data = response.json()
print(type(json_data))
print(json_data)


<class 'dict'>
{'message': 'success', 'request': {'altitude': 100, 'datetime': 1532348846, 'latitude': 40.71, 'longitude': -74.0, 'passes': 5}, 'response': [{'duration': 605, 'risetime': 1532389397}, {'duration': 628, 'risetime': 1532395168}, {'duration': 559, 'risetime': 1532401035}, {'duration': 565, 'risetime': 1532406885}, {'duration': 635, 'risetime': 1532412678}]}


In [19]:
# get the duration of the first pass of the ISS
print(json_data["response"][0]["duration"])

605
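
Once the response is a Python dictionary, we can loop over it like any other data. The sketch below uses a shortened copy of the output above and assumes that `risetime` is a Unix timestamp in seconds, which is consistent with the `Date` the server reported but is not stated in this document.

```python
from datetime import datetime, timezone

# A shortened copy of the response shown above.
json_data = {
    "message": "success",
    "response": [
        {"duration": 605, "risetime": 1532389397},
        {"duration": 628, "risetime": 1532395168},
        {"duration": 559, "risetime": 1532401035},
    ],
}

pass_times = []
for iss_pass in json_data["response"]:
    # Assumption: risetime is a Unix timestamp in seconds.
    rise = datetime.fromtimestamp(iss_pass["risetime"], tz=timezone.utc)
    pass_times.append((rise, iss_pass["duration"]))
    print(rise.isoformat(), iss_pass["duration"], "seconds")

# Total visible time across the listed passes, in seconds.
total_duration = sum(duration for _, duration in pass_times)
print(total_duration)
```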


### Content type and headers

The server sends more than a status code and the data when it generates a response. It also sends **metadata containing information on how it generated the data and how to decode it**. This information **appears in the response headers**. We can access it using the `.headers` property that responses have.

The headers appear as a dictionary. For now, `Content-Type` is the most important key. It tells us the format of the response, and how to decode it. For the OpenNotify API, the format is JSON, which is why we could decode the response with `.json()` earlier.

In [20]:
print(response.headers)

{'Server': 'nginx/1.10.3', 'Date': 'Mon, 23 Jul 2018 12:27:26 GMT', 'Content-Type': 'application/json', 'Content-Length': '519', 'Connection': 'keep-alive', 'Via': '1.1 vegur'}


In [21]:
print(type(response.headers))

<class 'requests.structures.CaseInsensitiveDict'>


In [27]:
content_type = response.headers["Content-Type"]

print(content_type)

application/json
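
To show how the content type can drive decoding, here is a minimal sketch that picks a decoder from a `Content-Type` value. The function name `decode_body` is made up for this example, and it assumes the body is UTF-8 encoded. Some servers append parameters such as `; charset=utf-8` to the header, so the sketch strips those before comparing.

```python
import json


def decode_body(content_type, body):
    """Decode raw response bytes based on a Content-Type header value."""
    # Drop parameters such as "; charset=utf-8" and normalize the case.
    media_type = content_type.split(";")[0].strip().lower()
    text = body.decode("utf-8")  # assumption: the body is UTF-8 encoded
    if media_type == "application/json":
        return json.loads(text)
    # Anything else is returned as plain text in this sketch.
    return text


raw = b'{"message": "success"}'
print(decode_body("application/json", raw))
print(decode_body("text/plain; charset=utf-8", b"hello"))
```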


# Intermediate APIs

The API we previously used didn't **require authentication**, but most do. Imagine that you're using the reddit API to pull a list of your private messages. It would be a huge privacy breach for reddit to give that information to anyone, so requiring authentication makes sense.

APIs also **use authentication to perform rate limiting**. Developers typically use APIs to build interesting applications or services. In order to ensure that it remains available and responsive for all users, an API will prevent you from making too many requests in too short a time. We call this restriction **rate limiting**. It ensures that one user **can't overload the API server by making too many requests too fast**.
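
Many APIs report rate-limit status in response headers. The sketch below assumes the header names `X-RateLimit-Remaining` and `X-RateLimit-Reset`, which GitHub documents for its API; other providers may use different names, so check the documentation of the API in question. The function name `seconds_to_wait` is made up for this illustration.

```python
import time


def seconds_to_wait(headers, now=None):
    """Return how long to pause before the next request, based on rate-limit headers."""
    # Header names are case-insensitive, so normalize them first.
    lowered = {key.lower(): value for key, value in headers.items()}
    remaining = int(lowered.get("x-ratelimit-remaining", 1))
    if remaining > 0:
        return 0
    now = time.time() if now is None else now
    # Assumption: the reset header holds a Unix timestamp in seconds.
    reset_at = int(lowered.get("x-ratelimit-reset", now))
    return max(0, reset_at - now)


example_headers = {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1532350000"}
print(seconds_to_wait(example_headers, now=1532349940))
```

A script could call `time.sleep` with the returned value before sending its next request, so that it stays within the limit instead of receiving errors.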


## Authentication 

We'll use the GitHub API to play around with authentication.

To authenticate with the GitHub API, we'll need to use an **access token**. An access token is a credential we can generate on GitHub's website. The token is a string that the API can read and associate with your account.

**Using a token is preferable to a username and password** for a few reasons:

* Typically, you'll be accessing an API from a script. If you put your username and password in the script and someone manages to get their hands on it, they can take over your account. In contrast, you can **revoke an access token to cancel an unauthorized person's access** if there's a security breach.
* Access tokens can have **scopes and specific permissions**. For instance, you can make a token that has permission to write to your GitHub repositories and make new ones. Or, you can make a token that can only read from your repositories. **Using read-access-only tokens** in potentially insecure or shared scripts gives you more control over security.

You'll need to **pass your token to the GitHub API through an Authorization header**. Just like the server sends headers in response to our request, we can send the server headers when we make a request. Headers contain metadata about the request. We can make **a dictionary of headers**, and then pass it into our request using the `headers` keyword argument of Python's `requests` library.

We need to include the word `token` in the Authorization header, followed by our access token. Here's an example of an Authorization header:

```JSON
{"Authorization": "token 1f36137fbbe1602f779300dad26e4c1b7fbab631"}
```
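
Since a token written into a script can leak along with the script, one common approach is to read it from the environment instead. The sketch below assumes an environment variable named `GITHUB_TOKEN`; that name and the helper `build_auth_headers` are choices made for this example, not something the API requires.

```python
import os


def build_auth_headers(token):
    """Build the headers dictionary for a token-authenticated request."""
    if not token:
        raise ValueError("an access token is required")
    return {"Authorization": "token " + token}


# GITHUB_TOKEN is a hypothetical environment variable name chosen for this sketch.
token = os.environ.get("GITHUB_TOKEN", "")
if token:
    headers = build_auth_headers(token)
    print("Authorization header is set")
else:
    print("No token found; set GITHUB_TOKEN first")
```

The resulting dictionary can be passed to a request through the `headers` keyword argument, and the token itself never appears in the source code.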




In [3]:
import requests

# Create a dictionary of headers containing our Authorization header.
headers = {"Authorization": "token 1f36137fbbe1602f779300dad26e4c1b7fbab631"}

# Make a GET request to the GitHub API with our headers.
# This API endpoint will give us details about Vik Paruchuri.
response = requests.get("https://api.github.com/users/VikParuchuri", headers=headers)

# Print the content of the response. The example token is not valid, so the API reports bad credentials.
print(response.json())

r = requests.get("https://api.github.com/users/VikParuchuri/orgs", headers=headers)
print(r.json())

{'message': 'Bad credentials', 'documentation_url': 'https://developer.github.com/v3'}
{'message': 'Bad credentials', 'documentation_url': 'https://developer.github.com/v3'}


### Other objects

In addition to users, the GitHub API has a few other types of objects. For example, `https://api.github.com/orgs/dataquestio` will retrieve information about the Dataquest organization on GitHub. `https://api.github.com/repos/octocat/Hello-World` will give us information about the Hello-World repository that the user octocat owns.

GitHub offers full documentation for all of the API's endpoints: https://developer.github.com/v3/



## Pagination

GitHub docs on pagination: https://developer.github.com/v3/#pagination

Sometimes, a request can return a lot of objects. This might happen when you're doing something like listing out all of a user's repositories, for example. Returning too much data will take a long time and slow the server down. For example, if a user has 1,000+ repositories, requesting all of them might take 10+ seconds. This isn't a great user experience, so it's **typical for API providers to implement pagination**. This means that the API provider will only return a certain number of records per page. You can **specify the page number** that you want to access. To access all of the pages, you'll need to write a loop.

To get the repositories a user has starred (marked as interesting), we can use the following API endpoint:

https://api.github.com/users/VikParuchuri/starred

We can **add two pagination query parameters** to it: `page` and `per_page`. `page` is **the page we want to access**, and `per_page` is the **number of records** we want to see on each page. Typically, API providers enforce a cap on how high `per_page` can be, because setting it to an extremely high value defeats the purpose of pagination.
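
The loop over pages can be sketched as follows. To keep the example runnable without a network connection or a token, `fetch_all_pages` takes a function that fetches one page; `fake_get_page` is a stand-in for the real API. The sketch assumes pages are numbered from 1 and that the final page is either short or empty.

```python
def fetch_all_pages(get_page, per_page=50, max_pages=100):
    """Collect records from every page until a short or empty page comes back."""
    records = []
    for page in range(1, max_pages + 1):
        batch = get_page({"per_page": per_page, "page": page})
        if not batch:
            break
        records.extend(batch)
        # A page with fewer records than requested must be the last one.
        if len(batch) < per_page:
            break
    return records


# A stand-in for the API: 120 fake records served one page at a time.
fake_records = list(range(120))


def fake_get_page(params):
    start = (params["page"] - 1) * params["per_page"]
    return fake_records[start:start + params["per_page"]]


all_records = fetch_all_pages(fake_get_page, per_page=50)
print(len(all_records))
```

With `requests`, the page-fetching function could be something like `lambda params: requests.get(url, headers=headers, params=params).json()`, where `url` is the starred endpoint above. The `max_pages` limit is a safety net so that a misbehaving endpoint can't keep the loop running forever.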



In [4]:

params = {"per_page": 50, "page": 2}
r = requests.get("https://api.github.com/users/VikParuchuri/starred", headers=headers, params=params)
page2_repos = r.json()

## User level endpoints

Since we've authenticated with our token, the system knows who we are, and can show us some relevant information without us having to specify our username.

Making a GET request to https://api.github.com/user will give us information about the user the authentication token is for.



In [5]:
user = requests.get("https://api.github.com/user", headers=headers).json()

## POST Requests

With the GitHub API, we can use POST requests to **create new repositories**.

Different API endpoints choose what types of requests they will accept. Not all endpoints will accept a POST request, and not all will accept a GET request. You'll have to consult the API's documentation to figure out which endpoints accept which types of requests.

We can make POST requests using `requests.post`. POST requests **almost always include data**, because we need to send the data the server will use to create the new object.





In [6]:
payload = {"name": "test"}
requests.post("https://api.github.com/user/repos", json=payload)

<Response [401]>

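The code above attempts to **create a new repository named test** under the account of the currently authenticated user. It converts the payload dictionary to JSON, and passes it along with the POST request. Because we didn't include our authentication headers, the server responds with a 401 status code instead of creating the repository.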

Following the documentation https://developer.github.com/v3/repos/, we need to provide a **set of data** to the POST endpoint such as: 

* **name** -- Required, the name of the repository
* **description** -- Optional, the description of the repository

A successful POST request will usually **return a 201 status code** indicating that it was able to create the object on the server. Sometimes, the API will **return the JSON representation of the new object** as the content of the response.
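
As a small sketch of the two points above, the helpers below build a payload with the required `name` and optional `description`, and check for the 201 status code. The helper names `build_repo_payload` and `was_created` are invented for this illustration, and the description text is just sample data.

```python
import json


def build_repo_payload(name, description=None):
    """Build the data for creating a repository; only name is required."""
    if not name:
        raise ValueError("name is required")
    payload = {"name": name}
    if description is not None:
        payload["description"] = description
    return payload


def was_created(status_code):
    """Return True when the status code signals that the object was created."""
    return status_code == 201


payload = build_repo_payload("learning-about-apis", "Notes on using APIs")
# This is the JSON text that would travel in the body of the POST request.
print(json.dumps(payload))
print(was_created(201), was_created(401))
```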




In [8]:
# The example token is not valid, so this request returns 401 rather than 201.

# Create the data we'll pass into the API endpoint.  While this endpoint only requires the "name" key, there are other optional keys.
payload = {"name": "learning-about-apis"}

# We need to pass in our authentication headers!
response = requests.post("https://api.github.com/user/repos", json=payload, headers=headers)
status = response.status_code
print(status)

401


## PUT / PATCH requests

Sometimes we want to update an existing object, rather than create a new one. This is where PATCH and PUT requests come into play.

We **use `PATCH` requests** when we want to **change a few attributes of an object**, but don't want to resend the entire object to the server. Maybe we just want to change the name of our repository, for example.

We **use `PUT` requests** to send the complete object we're **revising as a replacement** for the server's existing version.
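
The difference between the two can be sketched with plain dictionaries standing in for the object stored on the server. This is a simplified model of the semantics described above, not how any particular API implements them; the `private` attribute is included only to show what happens to fields we don't send.

```python
def apply_patch(stored, changes):
    """PATCH-style update: keep existing attributes, overwrite only those sent."""
    updated = dict(stored)
    updated.update(changes)
    return updated


def apply_put(stored, replacement):
    """PUT-style update: the sent object replaces the stored one entirely."""
    return dict(replacement)


repo = {"name": "test", "description": "", "private": False}

patched = apply_patch(repo, {"description": "The best repository ever!"})
replaced = apply_put(repo, {"name": "test", "description": "The best repository ever!"})

print(patched)
print(replaced)
```

After the patch, the `private` attribute is still present because we never touched it. After the replacement it is gone, because the object we sent did not include it.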

In [10]:
# Patch the test repository with a new description (the name stays "test").
payload = {"description": "The best repository ever!", "name": "test"}
response = requests.patch("https://api.github.com/repos/VikParuchuri/test", json=payload, headers=headers)

print(response.status_code)

401


## DELETE requests

The final major request type is the `DELETE` request. The DELETE request **removes objects from the server**. We can use the **DELETE request to remove repositories**.




In [13]:
response = requests.delete("https://api.github.com/repos/VikParuchuri/test", headers=headers)
status = response.status_code
print(status)

401


A **successful DELETE request** will usually **return a 204 status code** indicating that it successfully deleted the object.
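
A 204 status code means the response has no content, so there is nothing to decode, and trying to parse an empty body as JSON raises an error. The sketch below guards against that. The helper name `parse_response_body` is made up for this illustration.

```python
import json


def parse_response_body(status_code, text):
    """Return parsed JSON, or None when there is no body to parse."""
    if status_code == 204 or not text.strip():
        return None
    return json.loads(text)


# A successful DELETE has no body.
print(parse_response_body(204, ""))
# Other responses can be decoded as usual.
print(parse_response_body(200, '{"message": "success"}'))
```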

