# Acquire Data

In this lesson, we'll explore a REST API that returns JSON data.

## Definitions

REST, short for *Representational State Transfer*, is a set of guidelines for designing web services, most visibly in how their URLs are structured. You will often see the term RESTful used to describe websites or web services that follow REST guidelines. [More on REST](https://en.wikipedia.org/wiki/Representational_state_transfer)

API stands for *Application Programming Interface*. It is the way a developer interacts with a program, or the way one program interacts with another.

JSON stands for *JavaScript Object Notation*. All JSON is technically valid JavaScript code, and it is very commonly used as a data representation and interchange format. In fact, if you were to open a Jupyter notebook in a plain text editor, you would see one big JSON object. JSON data structures consist of arrays (analogous to lists in Python), objects (analogous to dictionaries), strings, numbers, booleans, and `null` (analogous to `None`).

Here is an example JSON data structure that represents students:

```json
[
    {"id": 1, "name": "billy", "grades": [90, 79, 80, 81]},
    {"id": 2, "name": "sally", "grades": [90, 96, 91, 88]},
    {"id": 3, "name": "margaret", "grades": [78, 91, 99, 86]}
]
```

In this example, the outermost element is an array, and the elements in the array are objects, each of which represents a student. Each student has an id, name, and grades property.
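
To make the mapping between JSON and Python concrete, here is a minimal sketch that parses the same student data with the standard library's `json` module and computes each student's average grade. The variable names (`raw`, `students`, `averages`) are chosen for illustration only.

```python
import json

# The same student data as above, held in a Python string.
raw = """
[
    {"id": 1, "name": "billy", "grades": [90, 79, 80, 81]},
    {"id": 2, "name": "sally", "grades": [90, 96, 91, 88]},
    {"id": 3, "name": "margaret", "grades": [78, 91, 99, 86]}
]
"""

# The JSON array becomes a list, and each JSON object becomes a dict.
students = json.loads(raw)

# Map each student's name to the mean of their grades.
averages = {s["name"]: sum(s["grades"]) / len(s["grades"]) for s in students}

for name, average in averages.items():
    print(f"{name}: {average:.2f}")
```

Once parsed, the data is ordinary Python: we index the list by position and each dictionary by key.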

A REST (or RESTful) JSON API is one in which the URLs follow a RESTful convention, and all the data sent to and from the server is JSON.

## Making HTTP Requests

The way we interact with web sites and web servers (including RESTful JSON APIs) is through HTTP **requests** and **responses**.

We can use the `requests` library to make HTTP requests. This is much like visiting a URL in your browser, except that we can interact with the responses programmatically in Python.

In [1]:
import requests

We will use the `get` function from `requests` and pass it a url:

In [2]:
# aphorisms.glitch.me returns a random quotation
response = requests.get('http://aphorisms.glitch.me/')
response

<Response [200]>

We get back a Python object that represents an HTTP response.

The response object has several interesting properties:

- `.ok`: a boolean that indicates whether the request succeeded (`True` when the status code is below 400, such as the 200 shown above)
- `.status_code`: a number indicating the HTTP response status code
- `.text`: the raw response text

In [3]:
response.ok

True

In [4]:
response.status_code

200
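
Status codes fall into families: 2xx for success, 3xx for redirects, 4xx for client errors, and 5xx for server errors. The sketch below uses only the standard library to look up a code's reason phrase and to apply an error check in the spirit of `.ok`. Note that `describe_status` and `is_error_free` are hypothetical helper names for illustration, not part of `requests`.

```python
from http import HTTPStatus


def describe_status(status_code):
    """Return the code with its standard reason phrase, e.g. '404 Not Found'."""
    status = HTTPStatus(status_code)
    return f"{status.value} {status.phrase}"


def is_error_free(status_code):
    """Hypothetical helper: False for 4xx and 5xx codes, True otherwise."""
    return not 400 <= status_code < 600


for code in (200, 301, 404, 500):
    print(describe_status(code), "->", is_error_free(code))
```

In real code, calling `response.raise_for_status()` is a common way to turn a 4xx or 5xx response into an exception instead of checking the code by hand.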

In [5]:
response.text

'{"quote":"We cannot direct the wind, but we can adjust the sails.","author":"Dolly Parton"}'
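
The raw text is just a string that happens to contain JSON. As a small illustration, the sketch below copies that string into a literal and parses it with the standard library's `json` module; the variable names are illustrative.

```python
import json

# The raw response text from above, copied here as a literal string.
text = '{"quote":"We cannot direct the wind, but we can adjust the sails.","author":"Dolly Parton"}'

# Parsing turns the JSON object into a Python dict.
quote = json.loads(text)

print(type(quote).__name__)
print(f'{quote["quote"]} ({quote["author"]})')
```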

## Example JSON API

For an example of a JSON API, we'll interact with the [Star Wars API](https://swapi.dev/).

In [6]:
url = 'https://swapi.dev/api/people/5'
response = requests.get(url)
print(response.text)

{"name":"Leia Organa","height":"150","mass":"49","hair_color":"brown","skin_color":"light","eye_color":"brown","birth_year":"19BBY","gender":"female","homeworld":"https://swapi.dev/api/planets/2/","films":["https://swapi.dev/api/films/1/","https://swapi.dev/api/films/2/","https://swapi.dev/api/films/3/","https://swapi.dev/api/films/6/"],"species":[],"vehicles":["https://swapi.dev/api/vehicles/30/"],"starships":[],"created":"2014-12-10T15:20:09.791000Z","edited":"2014-12-20T21:17:50.315000Z","url":"https://swapi.dev/api/people/5/"}


Here we see that the response we got back contains a JSON object (we could also verify this by visiting the URL in a web browser).

Since the response is JSON, we can use the `.json` method on the response object to get a data structure we can work with:

In [7]:
data = response.json()
print(type(data))
data

<class 'dict'>


{'name': 'Leia Organa',
 'height': '150',
 'mass': '49',
 'hair_color': 'brown',
 'skin_color': 'light',
 'eye_color': 'brown',
 'birth_year': '19BBY',
 'gender': 'female',
 'homeworld': 'https://swapi.dev/api/planets/2/',
 'films': ['https://swapi.dev/api/films/1/',
  'https://swapi.dev/api/films/2/',
  'https://swapi.dev/api/films/3/',
  'https://swapi.dev/api/films/6/'],
 'species': [],
 'vehicles': ['https://swapi.dev/api/vehicles/30/'],
 'starships': [],
 'created': '2014-12-10T15:20:09.791000Z',
 'edited': '2014-12-20T21:17:50.315000Z',
 'url': 'https://swapi.dev/api/people/5/'}

Now we have a dictionary that we can work with.
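
Two details of this dictionary are worth noticing: numeric fields such as `height` arrive as strings, and related resources are given as URLs. The sketch below works on a trimmed copy of the dictionary so that it runs on its own. Treating the last URL path segment as the resource id is an assumption based on the URLs shown above.

```python
# A trimmed copy of the dictionary above, so this block runs without a request.
leia = {
    "name": "Leia Organa",
    "height": "150",
    "mass": "49",
    "films": [
        "https://swapi.dev/api/films/1/",
        "https://swapi.dev/api/films/2/",
        "https://swapi.dev/api/films/3/",
        "https://swapi.dev/api/films/6/",
    ],
    "starships": [],
}

# Numeric fields are strings in the response, so convert before doing arithmetic.
height = int(leia["height"])
mass = int(leia["mass"])

# Assumption: the id of a related resource is the last segment of its URL.
film_ids = [int(url.rstrip("/").split("/")[-1]) for url in leia["films"]]

print(height, mass, film_ids)
```

A plain `int(...)` raises `ValueError` on a non-numeric string, so code that processes many records may need a fallback for such values.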

This API lists the available resources when we call the root URL, so let's make a request and take a look.

In [8]:
root_url = 'https://swapi.dev/api/'
response = requests.get(root_url)
print(response)
response.json()

<Response [200]>


{'people': 'https://swapi.dev/api/people/',
 'planets': 'https://swapi.dev/api/planets/',
 'films': 'https://swapi.dev/api/films/',
 'species': 'https://swapi.dev/api/species/',
 'vehicles': 'https://swapi.dev/api/vehicles/',
 'starships': 'https://swapi.dev/api/starships/'}

Based on this, let's take a look at the people. We'll make our request, and explore the shape of the response that we get back.

In [9]:
response = requests.get('https://swapi.dev/api/people/')

data = response.json()
data.keys()

dict_keys(['count', 'next', 'previous', 'results'])

In [10]:
number_of_people = data['count']
next_page = data['next']
previous_page = data['previous']

print(f'number_of_people: {number_of_people}')
print(f'next_page: {next_page}')
print(f'previous_page: {previous_page}')

number_of_people: 82
next_page: https://swapi.dev/api/people/?page=2
previous_page: None


Here the response has some built-in properties that tell us how to get the next page and the previous one. We can see that this API uses a query parameter at the end of the URL (`?page=2`) that lets us pick which page of results we want to look at.
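
As a rough sketch of how such paging URLs can be taken apart and put together, the block below uses the standard library's `urllib.parse`. The helper names `page_number` and `page_url` are hypothetical, and treating a URL without a `page` parameter as page 1 is an assumption based on the responses shown in this lesson.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def page_number(url):
    """Read the `page` query parameter from a URL; assume page 1 if absent."""
    query = parse_qs(urlparse(url).query)
    return int(query.get("page", ["1"])[0])


def page_url(base_url, page):
    """Build the URL for a given page by appending a `page` query parameter."""
    return f"{base_url}?{urlencode({'page': page})}"


print(page_number("https://swapi.dev/api/people/?page=2"))
print(page_url("https://swapi.dev/api/people/", 3))
```

With `requests`, the same query string can also be supplied through the `params` argument, as in `requests.get(url, params={'page': 2})`.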

Once we've drilled down into the data structure, we'll find that the entire response is a sort of wrapper around the `results` property:

In [11]:
data['results'][:2]

[{'name': 'Luke Skywalker',
  'height': '172',
  'mass': '77',
  'hair_color': 'blond',
  'skin_color': 'fair',
  'eye_color': 'blue',
  'birth_year': '19BBY',
  'gender': 'male',
  'homeworld': 'https://swapi.dev/api/planets/1/',
  'films': ['https://swapi.dev/api/films/1/',
   'https://swapi.dev/api/films/2/',
   'https://swapi.dev/api/films/3/',
   'https://swapi.dev/api/films/6/'],
  'species': [],
  'vehicles': ['https://swapi.dev/api/vehicles/14/',
   'https://swapi.dev/api/vehicles/30/'],
  'starships': ['https://swapi.dev/api/starships/12/',
   'https://swapi.dev/api/starships/22/'],
  'created': '2014-12-09T13:50:51.644000Z',
  'edited': '2014-12-20T21:17:56.891000Z',
  'url': 'https://swapi.dev/api/people/1/'},
 {'name': 'C-3PO',
  'height': '167',
  'mass': '75',
  'hair_color': 'n/a',
  'skin_color': 'gold',
  'eye_color': 'yellow',
  'birth_year': '112BBY',
  'gender': 'n/a',
  'homeworld': 'https://swapi.dev/api/planets/1/',
  'films': ['https://swapi.dev/api/films/1/',
 

So we should be able to use the count of people and number of results to calculate what the last page number would be.

In [12]:
import math

number_of_results = len(data['results'])
max_page = math.ceil(number_of_people / number_of_results)

print(f'number_of_results: {number_of_results}')
print(f'max_page: {max_page}')

number_of_results: 10
max_page: 9


We can also turn `results` data into a pandas dataframe:

In [13]:
import pandas as pd

df = pd.DataFrame(data['results'])
df.head()

|   | name | height | mass | hair_color | skin_color | eye_color | birth_year | gender | homeworld | films | species | vehicles | starships | created | edited | url |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Luke Skywalker | 172 | 77 | blond | fair | blue | 19BBY | male | https://swapi.dev/api/planets/1/ | [https://swapi.dev/api/films/1/, https://swapi... | [] | [https://swapi.dev/api/vehicles/14/, https://s... | [https://swapi.dev/api/starships/12/, https://... | 2014-12-09T13:50:51.644000Z | 2014-12-20T21:17:56.891000Z | https://swapi.dev/api/people/1/ |
| 1 | C-3PO | 167 | 75 | n/a | gold | yellow | 112BBY | n/a | https://swapi.dev/api/planets/1/ | [https://swapi.dev/api/films/1/, https://swapi... | [https://swapi.dev/api/species/2/] | [] | [] | 2014-12-10T15:10:51.357000Z | 2014-12-20T21:17:50.309000Z | https://swapi.dev/api/people/2/ |
| 2 | R2-D2 | 96 | 32 | n/a | white, blue | red | 33BBY | n/a | https://swapi.dev/api/planets/8/ | [https://swapi.dev/api/films/1/, https://swapi... | [https://swapi.dev/api/species/2/] | [] | [] | 2014-12-10T15:11:50.376000Z | 2014-12-20T21:17:50.311000Z | https://swapi.dev/api/people/3/ |
| 3 | Darth Vader | 202 | 136 | none | white | yellow | 41.9BBY | male | https://swapi.dev/api/planets/1/ | [https://swapi.dev/api/films/1/, https://swapi... | [] | [] | [https://swapi.dev/api/starships/13/] | 2014-12-10T15:18:20.704000Z | 2014-12-20T21:17:50.313000Z | https://swapi.dev/api/people/4/ |
| 4 | Leia Organa | 150 | 49 | brown | light | brown | 19BBY | female | https://swapi.dev/api/planets/2/ | [https://swapi.dev/api/films/1/, https://swapi... | [] | [https://swapi.dev/api/vehicles/30/] | [] | 2014-12-10T15:20:09.791000Z | 2014-12-20T21:17:50.315000Z | https://swapi.dev/api/people/5/ |


In [14]:
df.shape

(10, 16)

Now that we've gotten the data from the first page, we can extract the data from the next page (as indicated by the API's response), and add it onto our dataframe:

In [15]:
response = requests.get(data['next'])
data = response.json()

number_of_people = data['count']
next_page = data['next']
previous_page = data['previous']
number_of_results = len(data['results'])
max_page = math.ceil(number_of_people / number_of_results)

print(f'number_of_people: {number_of_people}')
print(f'next_page: {next_page}')
print(f'previous_page: {previous_page}')
print(f'number_of_results: {number_of_results}')
print(f'max_page: {max_page}')

# ignore_index=True renumbers the rows without adding an extra index column
df = pd.concat([df, pd.DataFrame(data['results'])], ignore_index=True)

number_of_people: 82
next_page: https://swapi.dev/api/people/?page=3
previous_page: https://swapi.dev/api/people/?page=1
number_of_results: 10
max_page: 9


We'll repeat the process one more time:

In [16]:
response = requests.get(data['next'])
data = response.json()

number_of_people = data['count']
next_page = data['next']
previous_page = data['previous']
number_of_results = len(data['results'])
max_page = math.ceil(number_of_people / number_of_results)

print(f'number_of_people: {number_of_people}')
print(f'next_page: {next_page}')
print(f'previous_page: {previous_page}')
print(f'number_of_results: {number_of_results}')
print(f'max_page: {max_page}')

# ignore_index=True renumbers the rows without adding an extra index column
df = pd.concat([df, pd.DataFrame(data['results'])], ignore_index=True)

number_of_people: 82
next_page: https://swapi.dev/api/people/?page=4
previous_page: https://swapi.dev/api/people/?page=2
number_of_results: 10
max_page: 9


In [17]:
df.shape

(30, 16)

Let's look ahead and see what the last page looks like.

In [18]:
url = 'https://swapi.dev/api/people/?page=9'
response = requests.get(url)
data = response.json()

number_of_people = data['count']
next_page = data['next']
previous_page = data['previous']
number_of_results = len(data['results'])

print(f'number_of_people: {number_of_people}')
print(f'next_page: {next_page}')
print(f'previous_page: {previous_page}')
print(f'number_of_results: {number_of_results}')

number_of_people: 82
next_page: None
previous_page: https://swapi.dev/api/people/?page=8
number_of_results: 2


When the API reports that the next page is `None`, we know we have reached the last page and can stop making requests.
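
The page-by-page process can be wrapped in a loop that keeps following `next` until it is `None`. The following is a minimal sketch, not a definitive implementation: `fetch_all` and `get_json` are hypothetical names, and the stand-in pages with made-up `example.test` URLs exist only so the loop can run without a network connection.

```python
def fetch_all(start_url, get_json):
    """Follow `next` links from start_url and collect every item in `results`.

    `get_json` is any callable that takes a URL and returns the parsed JSON
    body as a dict, for example `lambda url: requests.get(url).json()`.
    """
    items = []
    url = start_url
    while url is not None:
        page = get_json(url)
        items.extend(page["results"])
        url = page["next"]  # None on the last page, which ends the loop
    return items


# Stand-in pages with the same shape as the API's responses (made-up URLs).
fake_pages = {
    "https://example.test/people/": {
        "count": 3,
        "next": "https://example.test/people/?page=2",
        "previous": None,
        "results": [{"name": "a"}, {"name": "b"}],
    },
    "https://example.test/people/?page=2": {
        "count": 3,
        "next": None,
        "previous": "https://example.test/people/",
        "results": [{"name": "c"}],
    },
}

people = fetch_all("https://example.test/people/", fake_pages.__getitem__)
print(len(people))
```

Passing the fetching function in as an argument keeps the loop itself free of network code. Against the real API, `get_json` could be `lambda url: requests.get(url).json()`, and the returned list can be handed to `pd.DataFrame`.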

## Further Reading

- [Using APIs in Python](https://www.dataquest.io/blog/python-api-tutorial/)
- [Understanding and using REST APIs](https://www.smashingmagazine.com/2018/01/understanding-using-rest-api/)

## Exercises

Create a new local git repository and a remote repository on GitHub named `time-series-exercises`. Save your work for this module in your `time-series-exercises` repo.

The end result of this exercise should be a file named `acquire.py`.

1. Using the code from the lesson as a guide and the REST API from https://swapi.dev/ as we did in the lesson, create a dataframe named `people` that has all of the data for people.
1. Do the same thing, but for `planets`.
1. Extract the data for `starships`.
1. Save each dataframe to a local CSV file so that the data will be faster to access in the future.
1. Combine the data from your three separate dataframes into one large dataframe.
1. Acquire the Open Power Systems Data for Germany, which has been rapidly expanding its renewable energy production in recent years. The data set includes country-wide totals of electricity consumption, wind power production, and solar power production for 2006-2017. You can get the data here: https://raw.githubusercontent.com/jenfly/opsd/master/opsd_germany_daily.csv
1. Make sure all the work that you have done above is reproducible. That is, you should put the code above into separate functions in the `acquire.py` file and be able to re-run the functions and get the same data.