# API Calls with Python


----

This notebook shows you how to query APIs using Python. Upon completion you will be able to use the `requests` library to make API requests that do not require authentication keys. It walks you through all the steps of a successful API call, explains how to interpret the responses, and shows you how to further process JSON data.

## Table of Contents
* [Import all packages we need](#Import-all-packages-we-need)
* [How the request package works](#How-the-request-package-works)
* [JavaScript Object Notation](#JavaScript-Object-Notation)
* [Querying the Patent API](#Querying-the-Patent-API)
* [Customize the number of results](#Customize-the-number-of-results)

APIs are hosted on web servers. When you type www.google.com in your browser's address bar, your computer is actually asking the www.google.com server for a webpage, which it then returns to your browser.

APIs work much the same way, except that instead of your web browser asking for a webpage, your program asks for data. This data is usually returned in JSON format, which we can process with Python's built-in `json` package.

To retrieve data, we make a request to a web server, and the server replies with our data. In Python, we'll use the `requests` library to send these HTTP requests.

## Import all packages we need

In [2]:
# interacting with websites and web-APIs
import requests # easy way to interact with web sites and services
import json # read/write JavaScript Object Notation (JSON)

# data manipulation
import pandas as pd 
# normalize nested JSON files
# (pandas >= 1.0; older versions used: from pandas.io.json import json_normalize)
from pandas import json_normalize

## How the request package works

We first need to understand what information can be accessed from an API. As an example, we use the OpenNotify API, which reports data about the International Space Station (ISS), make an API call, and inspect what we get back. To tell Python what to access, we specify the URL of the API endpoint and pass it to `requests.get(url)`. Calling `.json()` on the response, as in `requests.get(url).json()`, parses the body into Python objects; the `json.dumps()` function can then turn those objects back into an indented string that is easier to read.

### Making a Request
When you ping a website or portal for information this is called making a request. That is exactly what the requests library has been designed to do. There are many different types of requests. The most commonly used one, a GET request, is used to retrieve data.

OpenNotify has several API endpoints. An endpoint is a server route that is used to retrieve different data from the API. You can see a listing of all the endpoints on OpenNotify here: http://open-notify.org/Open-Notify-API/

In [3]:
r = requests.get("http://api.open-notify.org/iss-now.json")

### Get the Response Code
Before you do anything with the response, it's a good idea to check its status code, which is stored in the response's `status_code` attribute. The documentation of an API will tell you which response codes to expect; in general, you will see codes like these:
- 200: everything went okay, and the result has been returned (if any)
- 301: the server is redirecting you to a different endpoint. This can happen when a company switches domain names, or an endpoint name is changed.
- 400: the server thinks you made a bad request. This can happen when you don't send along the right data, among other things.
- 401: the server thinks you're not authenticated. This happens when you don't send the right credentials to access an API.
- 403: the resource you're trying to access is forbidden -- you don't have the right permissions to see it.
- 404: the resource you tried to access wasn't found on the server.
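The list above can also be checked in code. Python's standard library ships the official reason phrases in `http.HTTPStatus`; the sketch below pairs them with the short explanations from this notebook. The helper name `describe_status` is hypothetical and not part of `requests`.

```python
from http import HTTPStatus

# Short explanations paraphrased from the list above
EXPLANATIONS = {
    200: "everything went okay",
    301: "redirected to a different endpoint",
    400: "bad request",
    401: "not authenticated",
    403: "forbidden",
    404: "not found",
}

def describe_status(code: int) -> str:
    """Hypothetical helper: standard reason phrase plus a short explanation."""
    phrase = HTTPStatus(code).phrase  # e.g. 404 -> "Not Found"
    explanation = EXPLANATIONS.get(code, "see the API documentation")
    return f"{code} {phrase}: {explanation}"

print(describe_status(404))  # 404 Not Found: not found
```

In the notebook you would call it as `describe_status(r.status_code)`.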

In [4]:
r.status_code

200

### Get the Content
After a web server returns a response, you can access the raw body of that response through its `content` attribute, which holds the data as bytes.

In [4]:
print(r.content)

b'{"iss_position": {"latitude": "-23.4012", "longitude": "-51.9494"}, "message": "success", "timestamp": 1583506358}'
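`r.content` is a `bytes` object rather than a dictionary. As a sketch, the standard-library `json.loads` can parse it directly (it accepts `bytes` since Python 3.6). The byte string below is copied from the output above, so no network call is needed.

```python
import json

# Raw body copied from the r.content output above
raw = b'{"iss_position": {"latitude": "-23.4012", "longitude": "-51.9494"}, "message": "success", "timestamp": 1583506358}'

data = json.loads(raw)  # bytes -> Python dict

# The API sends the coordinates as strings, so convert them to floats for calculations
lat = float(data["iss_position"]["latitude"])
lon = float(data["iss_position"]["longitude"])
print(data["message"], lat, lon)  # success -23.4012 -51.9494
```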


Some endpoints, such as the `iss-pass` endpoint, require specific parameters. We first set up these parameters in a dictionary and then pass them to the request with the `params` argument.

In [5]:
# Set up the parameters we want to pass to the API.
# This is the latitude and longitude of New York City.
parameters = {"lat": 40.71, "lon": -74}

# Make a get request with the parameters.
r = requests.get("http://api.open-notify.org/iss-pass.json", params=parameters)

# Print the content of the response (the data the server returned)
print(r.content)
print(r.url)

b'{\n  "message": "success", \n  "request": {\n    "altitude": 100, \n    "datetime": 1583504352, \n    "latitude": 40.71, \n    "longitude": -74.0, \n    "passes": 5\n  }, \n  "response": [\n    {\n      "duration": 167, \n      "risetime": 1583505219\n    }, \n    {\n      "duration": 528, \n      "risetime": 1583553670\n    }, \n    {\n      "duration": 653, \n      "risetime": 1583559388\n    }, \n    {\n      "duration": 599, \n      "risetime": 1583565248\n    }, \n    {\n      "duration": 562, \n      "risetime": 1583571135\n    }\n  ]\n}\n'
http://api.open-notify.org/iss-pass.json?lat=40.71&lon=-74


In [6]:
# This gets the same data as the command above
# We can just include the parameters in the URL
r2 = requests.get("http://api.open-notify.org/iss-pass.json?lat=40.71&lon=-74")
print(r2.content)

b'{\n  "message": "success", \n  "request": {\n    "altitude": 100, \n    "datetime": 1583504352, \n    "latitude": 40.71, \n    "longitude": -74.0, \n    "passes": 5\n  }, \n  "response": [\n    {\n      "duration": 167, \n      "risetime": 1583505219\n    }, \n    {\n      "duration": 528, \n      "risetime": 1583553670\n    }, \n    {\n      "duration": 653, \n      "risetime": 1583559388\n    }, \n    {\n      "duration": 599, \n      "risetime": 1583565248\n    }, \n    {\n      "duration": 562, \n      "risetime": 1583571135\n    }\n  ]\n}\n'
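Both cells reach the same URL because `requests` encodes the `params` dictionary into a query string. The standard-library `urllib.parse.urlencode` performs a comparable encoding, which lets us check this offline; treat it as an illustration of the idea, not as the exact code `requests` runs internally.

```python
from urllib.parse import urlencode

base = "http://api.open-notify.org/iss-pass.json"
parameters = {"lat": 40.71, "lon": -74}

# Encode the dictionary as key=value pairs joined by '&'
url = base + "?" + urlencode(parameters)
print(url)  # http://api.open-notify.org/iss-pass.json?lat=40.71&lon=-74
```

The result matches the `r.url` printed earlier.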


## JavaScript Object Notation

JSON is a way to encode data structures like lists and dictionaries to strings that ensures that they are easily readable by machines. JSON is the primary format in which data is passed back and forth to APIs, and most API servers will send their responses in JSON format.

Python has great JSON support, with the json package. The json package is part of the standard library, so we don't have to install anything to use it. We can both convert lists and dictionaries to JSON, and convert strings to lists and dictionaries. In the case of our ISS Pass data, it is a dictionary encoded to a string in JSON format.

The json library has two main methods:

1. `dumps` -- Takes in a Python object, and converts it to a string.
2. `loads` -- Takes a JSON string, and converts it to a Python object.
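A short round trip shows the two methods side by side. The dictionary is shaped like the `request` part of the ISS pass response; `indent=2` is an optional argument of `json.dumps` that produces a more readable string.

```python
import json

# A small dictionary shaped like the "request" part of the ISS pass response
request_info = {"altitude": 100, "latitude": 40.71, "longitude": -74.0, "passes": 5}

text = json.dumps(request_info)              # Python dict -> JSON string
pretty = json.dumps(request_info, indent=2)  # same data, indented for reading
back = json.loads(text)                      # JSON string -> Python dict

print(type(text).__name__, type(back).__name__)  # str dict
print(pretty)
```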

In [7]:
# Same request, but this time parse the JSON response into a Python object
parameters = {"lat": 40.71, "lon": -74}
r = requests.get("http://api.open-notify.org/iss-pass.json", params=parameters)

# Get the response data as a python object. Verify that it's a dictionary.
data = r.json()
print(type(data))
print(data)

<class 'dict'>
{'message': 'success', 'request': {'altitude': 100, 'datetime': 1583504352, 'latitude': 40.71, 'longitude': -74.0, 'passes': 5}, 'response': [{'duration': 167, 'risetime': 1583505219}, {'duration': 528, 'risetime': 1583553670}, {'duration': 653, 'risetime': 1583559388}, {'duration': 599, 'risetime': 1583565248}, {'duration': 562, 'risetime': 1583571135}]}


In [8]:
# Now we can access specific keys in the dictionary
print(data["request"])
print(data["request"]["passes"])
print(data["response"])
print(data["response"])

{'altitude': 100, 'datetime': 1583504352, 'latitude': 40.71, 'longitude': -74.0, 'passes': 5}
5
[{'duration': 167, 'risetime': 1583505219}, {'duration': 528, 'risetime': 1583553670}, {'duration': 653, 'risetime': 1583559388}, {'duration': 599, 'risetime': 1583565248}, {'duration': 562, 'risetime': 1583571135}]
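Because `data["response"]` is a list of dictionaries, ordinary Python loops and comprehensions work on it. As an illustration, the sketch below copies the five passes from the output above and computes the total and the longest `duration` (assumed here to be in seconds).

```python
# The five passes copied from data["response"] above
passes = [
    {"duration": 167, "risetime": 1583505219},
    {"duration": 528, "risetime": 1583553670},
    {"duration": 653, "risetime": 1583559388},
    {"duration": 599, "risetime": 1583565248},
    {"duration": 562, "risetime": 1583571135},
]

durations = [p["duration"] for p in passes]          # pull one key out of every dictionary
longest = max(passes, key=lambda p: p["duration"])   # the whole record with the largest duration

print(sum(durations))       # 2509
print(longest["risetime"])  # 1583559388
```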


In [9]:
df = pd.DataFrame(data['response'])
df

| | duration | risetime |
|---|---|---|
| 0 | 167 | 1583505219 |
| 1 | 528 | 1583553670 |
| 2 | 653 | 1583559388 |
| 3 | 599 | 1583565248 |
| 4 | 562 | 1583571135 |
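The `risetime` column looks like Unix timestamps (seconds since 1970-01-01 UTC); that reading is an assumption to confirm in the OpenNotify documentation. Under it, `pd.to_datetime` with `unit="s"` turns the numbers into readable dates. The table is rebuilt from the values shown above so the sketch runs offline.

```python
import pandas as pd

# Same table as above, rebuilt from the values shown in the output
df = pd.DataFrame({
    "duration": [167, 528, 653, 599, 562],
    "risetime": [1583505219, 1583553670, 1583559388, 1583565248, 1583571135],
})

# Assumption: risetime is a Unix timestamp in seconds; unit="s" tells pandas so
df["risetime_utc"] = pd.to_datetime(df["risetime"], unit="s", utc=True)
print(df[["risetime_utc", "duration"]])
```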


## Querying the Patent API

The PatentsView platform is built on data derived from the US Patent and Trademark Office (USPTO) to link inventors, their organizations, locations, and overall patenting activity. The PatentsView API provides programmatic access to longitudinal data and metadata on patents, inventors, companies, and geographic locations. 

To access the API, we use the `requests.get` function. To tell Python what to access, we need to specify the URL of the API endpoint.

PatentsView has several API endpoints. An endpoint is a server route that is used to retrieve different data from the API; you can think of each endpoint as specifying what type of data you want. The endpoints are documented at: http://www.patentsview.org/api/doc.html

Currently no key is necessary to access the PatentsView API. To make a request, we need to provide a query URL in the format defined by PatentsView. How to do that is explained at this link: https://www.patentsview.org/api/query-language.html.

### Query String Format

The query string is always a single JSON object: **{`<field>`:`<value>`}**, where `<field>` is the name of a database field and `<value>` is the value the field will be compared to for equality (Each API Endpoint section contains a list of the data fields that can be selected for inclusion in output datasets).
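Because the query string is JSON, one way to avoid hand-typing quotes and braces is to build it from a Python dictionary with `json.dumps`. This is a sketch of an alternative, not the approach used in the rest of this notebook; note that `json.dumps` adds a space after the colon, which is still valid JSON.

```python
import json

# Build the {<field>: <value>} object from a dictionary instead of typing it by hand
criteria = {"assignee_organization": "stanford university"}
q = json.dumps(criteria)
print(q)  # {"assignee_organization": "stanford university"}

# Parsing it back confirms that the string is valid JSON
assert json.loads(q) == criteria
```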

We use the following base URL for the Patents Endpoint:

**Base URL**: `http://www.patentsview.org/api/patents/query?q={criteria}`


### First Query: Retrieve all patents for Stanford University

In this example, we will only pull patents from one organization: Stanford University. Let's go to the Patents Endpoint (http://www.patentsview.org/api/patent.html) and find the appropriate field for the organization's name.

The variable that we need is called `"assignee_organization"` (the organization name, if the assignee is an organization).

> _Note_: **Assignee**: the name of the entity (company, foundation, partnership, holding company, or individual) that owns the patent. In this example we are looking at universities (organization-level).

We will pull from the API using a step-by-step process:
- build the query,
- get the response,
- check the response code,
- get the content,
- convert to table.
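The five steps above can be collected into one function. This is a minimal sketch: the name `fetch_patents` and its `get` argument are hypothetical, the URL is the base URL used in this notebook, and it assumes the API accepts the `q` and `f` values URL-encoded by `requests` (standard for query strings). Passing `get` lets you substitute a fake response for testing; otherwise `requests.get` is used.

```python
import json
import pandas as pd

BASE_URL = "http://www.patentsview.org/api/patents/query"

def fetch_patents(criteria, fields, get=None):
    """Hypothetical helper: build query, get response, check code, parse JSON, return a table."""
    if get is None:
        import requests  # imported lazily so the function can be tested without network access
        get = requests.get
    # 1. build the query (q and f are JSON strings)
    params = {"q": json.dumps(criteria), "f": json.dumps(fields)}
    # 2. get the response
    r = get(BASE_URL, params=params)
    # 3. check the response code
    if r.status_code != 200:
        raise RuntimeError(f"PatentsView returned status {r.status_code}")
    # 4. get the content
    content = r.json()
    # 5. convert to table
    return pd.DataFrame(content["patents"])
```

With network access, `fetch_patents({"assignee_organization": "stanford university"}, ["patent_id", "patent_title"])` would return a DataFrame like the ones built step by step below.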

By the end, we should have data about patents that we can work with using the tools we've already learned.

### Build the URL query 

Let's build our first URL query by combining the base url with one criterion (name of the `assignee_organization`)

**base url**: `http://www.patentsview.org/api/patents/query?` + **criterion**: `q={"assignee_organization":"stanford university"}`

In [2]:
# Save the URL in an object
base_url = 'http://www.patentsview.org/api/patents/query?'

In [3]:
# Save the query criterion in an object
criterion = 'q={"assignee_organization":"stanford university"}'

In [4]:
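# Return all subentities (such as every assignee of a patent), not only those matching the criterion
option = '&o={"matched_subentities_only": "false"}'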

In [5]:
# Select the fields to return
field = '&f=["assignee_organization", "patent_id" , "patent_number", "patent_title"]'

In [6]:
# we can check the entries
print(base_url)
print(criterion)
print(option)

http://www.patentsview.org/api/patents/query?
q={"assignee_organization":"stanford university"}
&o={"matched_subentities_only": "false"}


In [7]:
# Combine the base URL with the criterion, fields, and option to create the final URL
url = base_url + criterion + field + option
print(url)

http://www.patentsview.org/api/patents/query?q={"assignee_organization":"stanford university"}&f=["assignee_organization", "patent_id" , "patent_number", "patent_title"]&o={"matched_subentities_only": "false"}
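A quick way to catch a typo in a hand-built URL (a missing quote, or the same parameter added twice) before sending it is to split the query part into its parameters and check that each one holds exactly one valid JSON value. This sketch uses only the standard library; the URL is the one printed above.

```python
import json
from urllib.parse import urlsplit, parse_qs

url = ('http://www.patentsview.org/api/patents/query?q={"assignee_organization":"stanford university"}'
       '&f=["assignee_organization", "patent_id" , "patent_number", "patent_title"]'
       '&o={"matched_subentities_only": "false"}')

# Split the query part into its parameters: {'q': [...], 'f': [...], 'o': [...]}
params = parse_qs(urlsplit(url).query)

for name, values in params.items():
    # More than one value means the parameter appears more than once in the URL
    parsed = json.loads(values[0])  # raises json.JSONDecodeError if quotes or brackets are off
    print(name, len(values), parsed)

# A criterion with a missing quote is not valid JSON, so json.loads rejects it
broken = '{assignee_organization":"stanford university"}'
try:
    json.loads(broken)
except json.JSONDecodeError as err:
    print("invalid JSON:", err.msg)
```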


### Get the response and check the response code
Now let's get the response using the URL defined above and the `requests` library. Checking the response code tells us whether the query was successful; the meaning of each code is given in the API documentation.

The following are the response codes for the PatentsView API:

`200` - the query parameters are all valid; the results will be in the body of the response.

`400` - the query parameters are not valid, typically either because they are not in valid JSON format, or a specified field or value is not valid; the “status reason” in the header will contain the error message.

`500` -  there is an internal error with the processing of the query; the “status reason” in the header will contain the error message.
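The three codes above can be handled explicitly. In this sketch, `check_patentsview_response` is a hypothetical helper; it reports `r.reason`, the HTTP reason phrase that `requests` exposes on every response. Whether PatentsView puts its detailed error message there or in a separate header is an assumption to verify against the live API. The usage at the end uses a stand-in object so it runs offline.

```python
from types import SimpleNamespace

def check_patentsview_response(r):
    """Hypothetical helper: return parsed JSON for 200, otherwise raise with code and reason."""
    if r.status_code == 200:
        return r.json()
    if r.status_code == 400:
        problem = "invalid query parameters (check the JSON in q, f and o)"
    elif r.status_code == 500:
        problem = "internal error while processing the query"
    else:
        problem = "unexpected status code"
    # r.reason is the HTTP status reason phrase on a requests Response
    raise RuntimeError(f"{r.status_code} {r.reason}: {problem}")

# Stand-in for a failed response, so no request is sent
fake = SimpleNamespace(status_code=400, reason="Bad Request", json=lambda: {})
try:
    check_patentsview_response(fake)
except RuntimeError as err:
    print(err)  # 400 Bad Request: invalid query parameters (check the JSON in q, f and o)
```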

In [8]:
# Get response from the URL.
r = requests.get(url)
r.status_code  # Check the status code

200

### Get the content
After a web server returns a response, we can parse its JSON body into Python objects with the response's `.json()` method. As described in the JSON section above, JSON is the format most APIs use to exchange data.

In [9]:
# Convert response to JSON format
response = r.json() 

Each patent record contains the fields we requested with the `f` parameter; the assignee fields are nested in an `assignees` list. The response also reports how many results this request returned (variable `count`) and the total number of patents that match the query (variable `total_patent_count`).

In [10]:
# View JSON
response

{'patents': [{'patent_id': '4200770',
   'patent_number': '4200770',
   'patent_title': 'Cryptographic apparatus and method',
   'assignees': [{'assignee_organization': 'Stanford University',
     'assignee_key_id': '301180'}]},
  {'patent_id': '4214918',
   'patent_number': '4214918',
   'patent_title': 'Method of forming polycrystalline semiconductor interconnections, resistors and contacts by applying radiation beam',
   'assignees': [{'assignee_organization': 'Stanford University',
     'assignee_key_id': '301180'}]},
  {'patent_id': '4233671',
   'patent_number': '4233671',
   'patent_title': 'Read only memory and integrated circuit and method of programming by laser means',
   'assignees': [{'assignee_organization': 'Stanford University',
     'assignee_key_id': '301180'}]},
  {'patent_id': '4234352',
   'patent_number': '4234352',
   'patent_title': 'Thermophotovoltaic converter and cell for use therein',
   'assignees': [{'assignee_organization': 'Electric Power Research Instit

The returned JSON has a dictionary structure (dictionaries are identified by curly brackets `{}`), which includes keys and values (e.g. in `patent_id`:`4200770`, `patent_id` is a key and `4200770` is the value of that key).

In [11]:
# We can see this when we ask for the type
type(response)

dict

We can list the keys of this dictionary with the `.keys()` method:

In [12]:
response.keys()

dict_keys(['patents', 'count', 'total_patent_count'])

And we can look at the value stored under each key:

In [13]:
# Print all the contents of our response
print(response['count'])
print(response['total_patent_count'])
print(response['patents'])

25
143
[{'patent_id': '4200770', 'patent_number': '4200770', 'patent_title': 'Cryptographic apparatus and method', 'assignees': [{'assignee_organization': 'Stanford University', 'assignee_key_id': '301180'}]}, {'patent_id': '4214918', 'patent_number': '4214918', 'patent_title': 'Method of forming polycrystalline semiconductor interconnections, resistors and contacts by applying radiation beam', 'assignees': [{'assignee_organization': 'Stanford University', 'assignee_key_id': '301180'}]}, {'patent_id': '4233671', 'patent_number': '4233671', 'patent_title': 'Read only memory and integrated circuit and method of programming by laser means', 'assignees': [{'assignee_organization': 'Stanford University', 'assignee_key_id': '301180'}]}, {'patent_id': '4234352', 'patent_number': '4234352', 'patent_title': 'Thermophotovoltaic converter and cell for use therein', 'assignees': [{'assignee_organization': 'Electric Power Research Institute, Inc.', 'assignee_key_id': '261116'}, {'assignee_organizat

The `patents` key is interesting: its value is not a single item but a list of values (lists are identified by square brackets `[]`). This shows that dictionaries can also be hierarchical, or nested: the `patents` level contains a list with multiple elements in it.

The key `patents` contains 25 elements, each of which is a smaller dictionary (identified by the curly brackets `{}`) with its own keys and values. Every element in the list is a patent with its ID, number, title, and assignees. Each patent can be accessed by its position (index) in the list:

In [14]:
# Show the first patent only
response['patents'][0]  

{'patent_id': '4200770',
 'patent_number': '4200770',
 'patent_title': 'Cryptographic apparatus and method',
 'assignees': [{'assignee_organization': 'Stanford University',
   'assignee_key_id': '301180'}]}

In [15]:
# Show the first 3 patents
response['patents'][0:3] 

[{'patent_id': '4200770',
  'patent_number': '4200770',
  'patent_title': 'Cryptographic apparatus and method',
  'assignees': [{'assignee_organization': 'Stanford University',
    'assignee_key_id': '301180'}]},
 {'patent_id': '4214918',
  'patent_number': '4214918',
  'patent_title': 'Method of forming polycrystalline semiconductor interconnections, resistors and contacts by applying radiation beam',
  'assignees': [{'assignee_organization': 'Stanford University',
    'assignee_key_id': '301180'}]},
 {'patent_id': '4233671',
  'patent_number': '4233671',
  'patent_title': 'Read only memory and integrated circuit and method of programming by laser means',
  'assignees': [{'assignee_organization': 'Stanford University',
    'assignee_key_id': '301180'}]}]
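Since each element is a dictionary, list comprehensions can pull values out of all patents at once, including from the nested `assignees` lists. The two records below are copied from the outputs in this notebook (the second assignee of patent 4234352 appears in the normalized table further down).

```python
# Two records copied from response['patents'] in the outputs of this notebook
patents = [
    {"patent_id": "4200770", "patent_number": "4200770",
     "patent_title": "Cryptographic apparatus and method",
     "assignees": [{"assignee_organization": "Stanford University", "assignee_key_id": "301180"}]},
    {"patent_id": "4234352", "patent_number": "4234352",
     "patent_title": "Thermophotovoltaic converter and cell for use therein",
     "assignees": [{"assignee_organization": "Electric Power Research Institute, Inc.", "assignee_key_id": "261116"},
                   {"assignee_organization": "Stanford University", "assignee_key_id": "301180"}]},
]

# One value per patent
titles = [p["patent_title"] for p in patents]

# Nested access: patents -> assignees list -> organization name
owners = {p["patent_id"]: [a["assignee_organization"] for a in p["assignees"]] for p in patents}

# Patents with more than one assignee
co_owned = [p["patent_id"] for p in patents if len(p["assignees"]) > 1]

print(titles)
print(co_owned)  # ['4234352']
```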

### Convert to a dataframe
We can convert JSON information on patents into a table format very easily. 

In [16]:
# Convert to pandas dataframe
stanford = pd.DataFrame(response['patents'])  
stanford.head()

| | patent_id | patent_number | patent_title | assignees |
|---|---|---|---|---|
| 0 | 4200770 | 4200770 | Cryptographic apparatus and method | [{'assignee_organization': 'Stanford Universit... |
| 1 | 4214918 | 4214918 | Method of forming polycrystalline semiconducto... | [{'assignee_organization': 'Stanford Universit... |
| 2 | 4233671 | 4233671 | Read only memory and integrated circuit and me... | [{'assignee_organization': 'Stanford Universit... |
| 3 | 4234352 | 4234352 | Thermophotovoltaic converter and cell for use ... | [{'assignee_organization': 'Electric Power Res... |
| 4 | 4325611 | 4325611 | Electrochromic material and electro-optical di... | [{'assignee_organization': 'Stanford Universit... |


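The `assignees` column still holds a nested list of dictionaries. The `json_normalize` function from `pandas` can flatten it: `record_path='assignees'` produces one row per assignee, and `meta` copies the listed patent-level fields onto each of those rows. Patent 4234352 therefore appears twice, once for each of its two assignees.

In [17]:
json_normalize(response['patents'], record_path='assignees', meta=['patent_id','patent_title', 'patent_number']).head()

| | assignee_organization | assignee_key_id | patent_id | patent_title | patent_number |
|---|---|---|---|---|---|
| 0 | Stanford University | 301180 | 4200770 | Cryptographic apparatus and method | 4200770 |
| 1 | Stanford University | 301180 | 4214918 | Method of forming polycrystalline semiconducto... | 4214918 |
| 2 | Stanford University | 301180 | 4233671 | Read only memory and integrated circuit and me... | 4233671 |
| 3 | Electric Power Research Institute, Inc. | 261116 | 4234352 | Thermophotovoltaic converter and cell for use ... | 4234352 |
| 4 | Stanford University | 301180 | 4234352 | Thermophotovoltaic converter and cell for use ... | 4234352 |


### Adding more fields to the query

So far we have pulled basic information on the patents (`patent_id`, `patent_number`, `patent_title`) and their assignees.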

It might be useful to know additional information on patents, such as patent classification and application date.

Let's look for those variables in the API Endpoint (http://www.patentsview.org/api/patent.html), and add those fields to our query.

We will use the USPC classification (United States Patent Classification) variable called `uspc_mainclass_title`.

The application date variable is called `app_date`.

Instead of appending a second `f` parameter to the URL created above (which would leave two conflicting field lists in one query), we build a new URL with the fields parameter `&f=["patent_id", "patent_title","uspc_mainclass_title","app_date"]`.

In [18]:
# Build a new URL with the new field list; keep the same criterion and option
field_new = '&f=["patent_id", "patent_title","uspc_mainclass_title","app_date"]'
url_fields = base_url + criterion + field_new + option
print(url_fields)

http://www.patentsview.org/api/patents/query?q={"assignee_organization":"stanford university"}&f=["patent_id", "patent_title","uspc_mainclass_title","app_date"]&o={"matched_subentities_only": "false"}


In [19]:
r = requests.get(url_fields)  # Get response from the URL
r.raise_for_status()  # Raise an error if the status code signals a failure (4xx/5xx)
response = r.json()  # Convert response to JSON format
response  # View JSON

{'patents': [{'patent_id': '4200770',
   'patent_title': 'Cryptographic apparatus and method',
   'applications': [{'app_date': '1977-09-06', 'app_id': '05/830754'}],
   'uspcs': [{'uspc_mainclass_title': 'Cryptography'},
    {'uspc_mainclass_title': 'Electrical computers and digital processing systems:  support'}]},
  {'patent_id': '4214918',
   'patent_title': 'Method of forming polycrystalline semiconductor interconnections, resistors and contacts by applying radiation beam',
   'applications': [{'app_date': '1978-10-12', 'app_id': '05/950828'}],
   'uspcs': [{'uspc_mainclass_title': 'Metal treatment'},
    {'uspc_mainclass_title': 'Electric heating'},
    {'uspc_mainclass_title': 'Active solid-state devices (e.g., transistors, solid-state diodes)'},
    {'uspc_mainclass_title': 'Coating processes'},
    {'uspc_mainclass_title': 'Semiconductor device manufacturing: process'}]},
  {'patent_id': '4233671',
   'patent_title': 'Read only memory and integrated circuit and method of progr

We can see that this response contains nested JSON. Nested data is harder to access, but luckily `pandas` provides the `json_normalize` function to unpack it (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.io.json.json_normalize.html).

In [20]:
# Unpack the patents key
json_normalize(response['patents']).head()

| | patent_id | patent_title | applications | uspcs |
|---|---|---|---|---|
| 0 | 4200770 | Cryptographic apparatus and method | [{'app_date': '1977-09-06', 'app_id': '05/8307... | [{'uspc_mainclass_title': 'Cryptography'}, {'u... |
| 1 | 4214918 | Method of forming polycrystalline semiconducto... | [{'app_date': '1978-10-12', 'app_id': '05/9508... | [{'uspc_mainclass_title': 'Metal treatment'}, ... |
| 2 | 4233671 | Read only memory and integrated circuit and me... | [{'app_date': '1979-01-05', 'app_id': '06/0013... | [{'uspc_mainclass_title': 'Active solid-state ... |
| 3 | 4234352 | Thermophotovoltaic converter and cell for use ... | [{'app_date': '1978-07-26', 'app_id': '05/9281... | [{'uspc_mainclass_title': 'Batteries: thermoe... |
| 4 | 4325611 | Electrochromic material and electro-optical di... | [{'app_date': '1979-12-26', 'app_id': '06/1065... | [{'uspc_mainclass_title': 'Compositions'}, {'u... |


We can now see the `patent_id` and `patent_title`, but we still need to break down `applications` and `uspcs` further. For that, we specify the `record_path` argument.

In [21]:
# First, specify the record path for the applications
json_normalize(response['patents'], record_path='applications').head()

| | app_date | app_id |
|---|---|---|
| 0 | 1977-09-06 | 05/830754 |
| 1 | 1978-10-12 | 05/950828 |
| 2 | 1979-01-05 | 06/001360 |
| 3 | 1978-07-26 | 05/928103 |
| 4 | 1979-12-26 | 06/106547 |


It shows us the values of the `applications` key, but we also want to know the associated `patent_id` and `patent_title`. We can specify those variables using the `meta` argument.

In [22]:
json_normalize(response['patents'], record_path='applications', meta=['patent_id','patent_title']).head()

| | app_date | app_id | patent_id | patent_title |
|---|---|---|---|---|
| 0 | 1977-09-06 | 05/830754 | 4200770 | Cryptographic apparatus and method |
| 1 | 1978-10-12 | 05/950828 | 4214918 | Method of forming polycrystalline semiconducto... |
| 2 | 1979-01-05 | 06/001360 | 4233671 | Read only memory and integrated circuit and me... |
| 3 | 1978-07-26 | 05/928103 | 4234352 | Thermophotovoltaic converter and cell for use ... |
| 4 | 1979-12-26 | 06/106547 | 4325611 | Electrochromic material and electro-optical di... |
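The effect of `record_path` and `meta` is easiest to see on a single record. The sketch below uses one patent copied from the output above and the top-level name `pd.json_normalize`, which newer pandas versions provide. Each call yields one row per element of the chosen list, and the `meta` columns are repeated on every row.

```python
import pandas as pd

# One patent record copied from response['patents'] above
patents = [
    {"patent_id": "4200770",
     "patent_title": "Cryptographic apparatus and method",
     "applications": [{"app_date": "1977-09-06", "app_id": "05/830754"}],
     "uspcs": [{"uspc_mainclass_title": "Cryptography"},
               {"uspc_mainclass_title": "Electrical computers and digital processing systems:  support"}]},
]

# One row per application (1) and one row per USPC class (2); meta columns are appended at the end
apps = pd.json_normalize(patents, record_path="applications", meta=["patent_id", "patent_title"])
classes = pd.json_normalize(patents, record_path="uspcs", meta=["patent_id"])

print(apps.shape, list(apps.columns))        # (1, 4) ['app_date', 'app_id', 'patent_id', 'patent_title']
print(classes.shape, list(classes.columns))  # (2, 2) ['uspc_mainclass_title', 'patent_id']
```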


In [23]:
# Save result in dataframe
patent_app_date = (json_normalize(response['patents'], record_path='applications', 
                                  meta=['patent_id','patent_title']))

In [24]:
# repeat the process for the uspcs key and save it into another dataframe
patent_class = (json_normalize(response['patents'], record_path='uspcs', 
                               meta=['patent_id','patent_title']))

Now we can merge the two tables on `patent_id` and `patent_title` to get a complete table with the application date and the patent class.

In [25]:
stanford_all = patent_class.merge(patent_app_date,on=['patent_id','patent_title'])
stanford_all

| | uspc_mainclass_title | patent_id | patent_title | app_date | app_id |
|---|---|---|---|---|---|
| 0 | Cryptography | 4200770 | Cryptographic apparatus and method | 1977-09-06 | 05/830754 |
| 1 | Electrical computers and digital processing sy... | 4200770 | Cryptographic apparatus and method | 1977-09-06 | 05/830754 |
| 2 | Metal treatment | 4214918 | Method of forming polycrystalline semiconducto... | 1978-10-12 | 05/950828 |
| 3 | Electric heating | 4214918 | Method of forming polycrystalline semiconducto... | 1978-10-12 | 05/950828 |
| 4 | Active solid-state devices (e.g., transistors,... | 4214918 | Method of forming polycrystalline semiconducto... | 1978-10-12 | 05/950828 |
| 5 | Coating processes | 4214918 | Method of forming polycrystalline semiconducto... | 1978-10-12 | 05/950828 |
| 6 | Semiconductor device manufacturing: process | 4214918 | Method of forming polycrystalline semiconducto... | 1978-10-12 | 05/950828 |
| 7 | Active solid-state devices (e.g., transistors,... | 4233671 | Read only memory and integrated circuit and me... | 1979-01-05 | 06/001360 |
| 8 | Miscellaneous active electrical nonlinear devi... | 4233671 | Read only memory and integrated circuit and me... | 1979-01-05 | 06/001360 |
| 9 | Static information storage and retrieval | 4233671 | Read only memory and integrated circuit and me... | 1979-01-05 | 06/001360 |
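Because a patent can have several USPC classes, the merge repeats the application data on every class row of that patent, which is why patent 4214918 fills five rows above. The toy sketch below (values copied from the tables above) shows this; `validate="many_to_one"` is an optional `merge` argument that makes pandas raise an error if a patent unexpectedly has more than one row in the application table.

```python
import pandas as pd

# One row per USPC main class, values copied from the tables above
patent_class = pd.DataFrame({
    "uspc_mainclass_title": ["Cryptography", "Metal treatment", "Electric heating"],
    "patent_id": ["4200770", "4214918", "4214918"],
})
# One row per application
patent_app_date = pd.DataFrame({
    "app_date": ["1977-09-06", "1978-10-12"],
    "app_id": ["05/830754", "05/950828"],
    "patent_id": ["4200770", "4214918"],
})

# Inner join (the default); many class rows may match one application row
merged = patent_class.merge(patent_app_date, on="patent_id", validate="many_to_one")

print(len(merged))  # 3
print(merged.groupby("patent_id").size().to_dict())  # {'4200770': 1, '4214918': 2}
```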


## Customize the number of results

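As you might have noticed, by default only 25 results are returned per request. To change the number of results returned (for example, to 50), add `"page":1,"per_page":50` to the options parameter `o`. To get the second page, use `"page":2,"per_page":50`. Since our URL already contains an `o` parameter, all options should go into that single JSON object rather than into a second `&o=`.

In [26]:
# Request 50 results per page; all options share one o object
field_new = '&f=["patent_id", "patent_title","uspc_mainclass_title","app_date"]'
option_page_1 = '&o={"matched_subentities_only": "false", "page": 1, "per_page": 50}'
url_page_1 = base_url + criterion + field_new + option_page_1
r = requests.get(url_page_1)
r.raise_for_status()  # Raise an error if the request failed
response_page_1 = r.json()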

Now the JSON shows 50 results (as noted in the variable `count`).

In [27]:
response_page_1['count']

50
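The earlier response reported 143 matching patents (`total_patent_count`), so at 50 results per page, ceil(143 / 50) = 3 requests would be needed. The sketch below loops over pages until that count is reached. The function name `fetch_all_pages` and its `get` argument are hypothetical, and it assumes every page has the same `patents` / `total_patent_count` structure shown above; the test replaces `get` with a fake so no requests are sent.

```python
import json
import math

BASE_URL = "http://www.patentsview.org/api/patents/query"

def fetch_all_pages(criteria, fields, per_page=50, get=None):
    """Hypothetical helper: request page 1, 2, ... until all matching patents are collected."""
    if get is None:
        import requests  # lazy import so the loop can be tested with a fake get
        get = requests.get
    patents, page, n_pages = [], 1, 1
    while page <= n_pages:
        options = {"matched_subentities_only": "false", "page": page, "per_page": per_page}
        params = {"q": json.dumps(criteria), "f": json.dumps(fields), "o": json.dumps(options)}
        r = get(BASE_URL, params=params)
        r.raise_for_status()  # stop on 4xx/5xx
        content = r.json()
        # Assumption: "patents" may be missing or null when nothing matches
        patents.extend(content.get("patents") or [])
        # total_patent_count tells us how many pages exist in total
        n_pages = math.ceil(content["total_patent_count"] / per_page)
        page += 1
    return patents
```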