# Acquire Data

### Let's examine a new way of ingesting data: a RESTful API!

## Some Vocabulary:

 - API:
     *Application Programming Interface*. It is a way for developers to interact with a program, or for one program to interact with another. Think of it as a liaison with a set of rules that lets you programmatically interact with something whose back end you likely don't have permission to access.

 - REST:
     *Representational State Transfer* is a set of guidelines for structuring URLs. You will often encounter the word RESTful used to describe web sites or web services that follow REST guidelines. [More on REST](https://en.wikipedia.org/wiki/Representational_state_transfer)

 - JSON:
     *JavaScript Object Notation*. JSON is valid JavaScript syntax, and it is very commonly used as a data representation and data interchange format. In fact, if you were to open a Jupyter notebook in a plain text editor, you would see one big JSON object. JSON data structures consist of arrays (analogous to lists in Python), objects (analogous to dictionaries), strings, booleans, numbers, and null.

Here is an example JSON data structure that represents students:

```json
[
    {"id": 1, "name": "billy", "grades": [90, 79, 80, 81]},
    {"id": 2, "name": "sally", "grades": [90, 96, 91, 88]},
    {"id": 3, "name": "margaret", "grades": [78, 91, 99, 86]}
]
```

> This looks familiar, right?

> What you're looking at is not a Python list containing dictionaries, despite looking like one. Think about this as a Magma -> Lava situation. This content is still JSON, which means that we refer to it using the terms appropriate for JavaScript (hence JavaScript Object Notation!). What we are looking at here is an array containing three objects, each of which holds the record of a single student.
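
To make the distinction concrete, here is a small sketch using Python's standard `json` module. The JSON above is only text until it is parsed; the variable names (`students_json`, `students`, `averages`) are illustrative and not part of the lesson's code.

```python
import json

# The JSON document is plain text: from Python's point of view it is a str.
students_json = """
[
    {"id": 1, "name": "billy", "grades": [90, 79, 80, 81]},
    {"id": 2, "name": "sally", "grades": [90, 96, 91, 88]},
    {"id": 3, "name": "margaret", "grades": [78, 91, 99, 86]}
]
"""

# json.loads parses the text: arrays become lists, objects become dicts.
students = json.loads(students_json)

print(type(students_json).__name__)  # str
print(type(students).__name__)       # list
print(type(students[0]).__name__)    # dict

# Once parsed, the data behaves like any other Python structure.
averages = {s["name"]: sum(s["grades"]) / len(s["grades"]) for s in students}
print(averages["billy"])  # 82.5
```

Only after parsing does the "array of objects" become a Python list of dictionaries.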



For our purposes, a REST (or RESTful) JSON API is one in which the URLs follow a RESTful convention, and all the data sent to and from the server is JSON.

## Making HTTP Requests

The way we interact with web sites and web servers (including RESTful JSON APIs) is through HTTP **requests** and **responses**.

We can use the `requests` library to make HTTP requests. This is much like visiting a URL in your browser, except that we can interact with the responses programmatically in Python.

In [1]:
import requests

We will use the `get` function from `requests` and pass it a URL:

In [2]:
response = requests.get('http://request-inspector.glitch.me/')
response

<Response [200]>

We get back a Python object that represents an HTTP response.

The response object has several interesting properties:

- `.ok`: a boolean that indicates whether the response was successful (the status code is less than 400, for example 200)
- `.status_code`: a number indicating the HTTP response status code
- `.text`: the raw response text
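
As a rough illustration of how status codes relate to `.ok`, the sketch below uses only the standard library's `http.HTTPStatus`. The helper `summarize_status` is a hypothetical name for this example; the `code < 400` rule mirrors how `requests` documents `Response.ok`.

```python
from http import HTTPStatus


def summarize_status(code):
    """Return the standard reason phrase and whether the code counts as 'ok'."""
    phrase = HTTPStatus(code).phrase
    # requests documents Response.ok as True when status_code is below 400.
    ok = code < 400
    return phrase, ok


for code in (200, 301, 404, 500):
    print(code, summarize_status(code))
```

Codes in the 400s indicate a problem with the request, and codes in the 500s indicate a problem on the server, so both count as not ok.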

In [3]:
response.ok

True

In [4]:
response.status_code

200

In [5]:
response.text

'{"method":"GET","query":{},"body":{}}'

In this case, the response text is a string of JSON that echoes back details of our request. Many URLs instead return HTML, which is what makes up web pages that are intended for humans to read; if you go to http://example.com in a browser, you'll see what HTML looks like when rendered. JSON, by contrast, is usually intended to be worked with programmatically.

## Example JSON API

For an example of a JSON API, we'll interact with [SWAPI](https://swapi.dev/), an API that serves Star Wars data.

In [6]:
url = 'https://swapi.dev/api/people/5'
response = requests.get(url)
print(response.text)

{"name":"Leia Organa","height":"150","mass":"49","hair_color":"brown","skin_color":"light","eye_color":"brown","birth_year":"19BBY","gender":"female","homeworld":"https://swapi.dev/api/planets/2/","films":["https://swapi.dev/api/films/1/","https://swapi.dev/api/films/2/","https://swapi.dev/api/films/3/","https://swapi.dev/api/films/6/"],"species":[],"vehicles":["https://swapi.dev/api/vehicles/30/"],"starships":[],"created":"2014-12-10T15:20:09.791000Z","edited":"2014-12-20T21:17:50.315000Z","url":"https://swapi.dev/api/people/5/"}


Here we see that the response we got back contains a JSON object (we could also verify this by visiting the URL in a web browser).

Since the response is JSON, we can use the `.json` method on the response object to get a data structure we can work with:

In [7]:
data = response.json()
print(type(data))
data

<class 'dict'>


{'name': 'Leia Organa',
 'height': '150',
 'mass': '49',
 'hair_color': 'brown',
 'skin_color': 'light',
 'eye_color': 'brown',
 'birth_year': '19BBY',
 'gender': 'female',
 'homeworld': 'https://swapi.dev/api/planets/2/',
 'films': ['https://swapi.dev/api/films/1/',
  'https://swapi.dev/api/films/2/',
  'https://swapi.dev/api/films/3/',
  'https://swapi.dev/api/films/6/'],
 'species': [],
 'vehicles': ['https://swapi.dev/api/vehicles/30/'],
 'starships': [],
 'created': '2014-12-10T15:20:09.791000Z',
 'edited': '2014-12-20T21:17:50.315000Z',
 'url': 'https://swapi.dev/api/people/5/'}

Now we have a dictionary that we can work with.
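
As a short sketch of what "working with" the dictionary can look like, the example below uses a trimmed copy of the response above so that it runs without a network connection. The variable names, and the `nickname` key used to show a missing-key lookup, are illustrative only.

```python
# A trimmed copy of the dictionary shown above (only some keys are kept).
person = {
    "name": "Leia Organa",
    "height": "150",
    "mass": "49",
    "films": [
        "https://swapi.dev/api/films/1/",
        "https://swapi.dev/api/films/2/",
        "https://swapi.dev/api/films/3/",
        "https://swapi.dev/api/films/6/",
    ],
    "starships": [],
}

# Values are looked up by key, exactly as with any dict.
name = person["name"]

# In this response, numbers such as height arrive as strings,
# so convert them before doing any math.
height_cm = int(person["height"])

# Nested lists can be counted or looped over.
film_count = len(person["films"])

# .get avoids a KeyError when a key might be missing.
nickname = person.get("nickname", "unknown")

print(f"{name} is {height_cm} cm tall and appears in {film_count} films.")
```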

Let's now take a look at another API. We'll start by looking at just the base URL:

In [8]:
base_url = 'https://python.zgulde.net'
print(requests.get(base_url).text)

{"api":"/api/v1","help":"/documentation"}



This API provides some documentation, so let's make a request so that we can take a look at it.

In [9]:
response = requests.get(base_url + '/documentation')
print(response.json()['payload'])


The API accepts GET requests for all endpoints, where endpoints are prefixed
with

    /api/{version}

Where version is "v1"

Valid endpoints:

- /stores[/{store_id}]
- /items[/{item_id}]
- /sales[/{sale_id}]

All endpoints accept a `page` parameter that can be used to navigate through
the results.
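
Based on the documentation above, the endpoint URLs can be assembled from a few pieces. The following is a minimal sketch; `endpoint_url`, `BASE_URL`, and `API_PREFIX` are hypothetical names chosen for this example, and only the URL patterns themselves come from the documentation.

```python
from urllib.parse import urlencode

BASE_URL = "https://python.zgulde.net"
API_PREFIX = "/api/v1"


def endpoint_url(resource, resource_id=None, page=None):
    """Build a URL for one of the documented endpoints (hypothetical helper)."""
    path = f"{API_PREFIX}/{resource}"
    if resource_id is not None:
        # e.g. /stores/1 for a single store
        path += f"/{resource_id}"
    url = BASE_URL + path
    if page is not None:
        # The documentation says every endpoint accepts a `page` parameter.
        url += "?" + urlencode({"page": page})
    return url


print(endpoint_url("items"))                  # https://python.zgulde.net/api/v1/items
print(endpoint_url("stores", resource_id=1))  # https://python.zgulde.net/api/v1/stores/1
print(endpoint_url("sales", page=2))          # https://python.zgulde.net/api/v1/sales?page=2
```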



> Mini Exercise:
> Try to follow the paths to the data that is stored [here](https://python.zgulde.net). It is made publicly accessible through a JSON REST API.

Based on this, let's take a look at the items. We'll make our request, and explore the shape of the response that we get back.

In [10]:
import pandas as pd

In [11]:
base_url = 'https://python.zgulde.net/'

In [12]:
response = requests.get(base_url + '/api/v1/items')

In [13]:
response.ok

True

In [14]:
response.text

'{"payload":{"items":[{"item_brand":"Riceland","item_id":1,"item_name":"Riceland American Jazmine Rice","item_price":0.84,"item_upc12":"35200264013","item_upc14":"35200264013"},{"item_brand":"Caress","item_id":2,"item_name":"Caress Velvet Bliss Ultra Silkening Beauty Bar - 6 Ct","item_price":6.44,"item_upc12":"11111065925","item_upc14":"11111065925"},{"item_brand":"Earths Best","item_id":3,"item_name":"Earths Best Organic Fruit Yogurt Smoothie Mixed Berry","item_price":2.43,"item_upc12":"23923330139","item_upc14":"23923330139"},{"item_brand":"Boars Head","item_id":4,"item_name":"Boars Head Sliced White American Cheese - 120 Ct","item_price":3.14,"item_upc12":"208528800007","item_upc14":"208528800007"},{"item_brand":"Back To Nature","item_id":5,"item_name":"Back To Nature Gluten Free White Cheddar Rice Thin Crackers","item_price":2.61,"item_upc12":"759283100036","item_upc14":"759283100036"},{"item_brand":"Sally Hansen","item_id":6,"item_name":"Sally Hansen Nail Color Magnetic 903 Silver

In [15]:
response.json()

{'payload': {'items': [{'item_brand': 'Riceland',
    'item_id': 1,
    'item_name': 'Riceland American Jazmine Rice',
    'item_price': 0.84,
    'item_upc12': '35200264013',
    'item_upc14': '35200264013'},
   {'item_brand': 'Caress',
    'item_id': 2,
    'item_name': 'Caress Velvet Bliss Ultra Silkening Beauty Bar - 6 Ct',
    'item_price': 6.44,
    'item_upc12': '11111065925',
    'item_upc14': '11111065925'},
   {'item_brand': 'Earths Best',
    'item_id': 3,
    'item_name': 'Earths Best Organic Fruit Yogurt Smoothie Mixed Berry',
    'item_price': 2.43,
    'item_upc12': '23923330139',
    'item_upc14': '23923330139'},
   {'item_brand': 'Boars Head',
    'item_id': 4,
    'item_name': 'Boars Head Sliced White American Cheese - 120 Ct',
    'item_price': 3.14,
    'item_upc12': '208528800007',
    'item_upc14': '208528800007'},
   {'item_brand': 'Back To Nature',
    'item_id': 5,
    'item_name': 'Back To Nature Gluten Free White Cheddar Rice Thin Crackers',
    'item_price':

In [16]:
items_data = response.json()

In [17]:
items_data.keys()

dict_keys(['payload', 'status'])

In [18]:
items_data['payload']['next_page'], \
items_data['payload']['max_page']

('/api/v1/items?page=2', 3)

In [19]:
items_data

{'payload': {'items': [{'item_brand': 'Riceland',
    'item_id': 1,
    'item_name': 'Riceland American Jazmine Rice',
    'item_price': 0.84,
    'item_upc12': '35200264013',
    'item_upc14': '35200264013'},
   {'item_brand': 'Caress',
    'item_id': 2,
    'item_name': 'Caress Velvet Bliss Ultra Silkening Beauty Bar - 6 Ct',
    'item_price': 6.44,
    'item_upc12': '11111065925',
    'item_upc14': '11111065925'},
   {'item_brand': 'Earths Best',
    'item_id': 3,
    'item_name': 'Earths Best Organic Fruit Yogurt Smoothie Mixed Berry',
    'item_price': 2.43,
    'item_upc12': '23923330139',
    'item_upc14': '23923330139'},
   {'item_brand': 'Boars Head',
    'item_id': 4,
    'item_name': 'Boars Head Sliced White American Cheese - 120 Ct',
    'item_price': 3.14,
    'item_upc12': '208528800007',
    'item_upc14': '208528800007'},
   {'item_brand': 'Back To Nature',
    'item_id': 5,
    'item_name': 'Back To Nature Gluten Free White Cheddar Rice Thin Crackers',
    'item_price':

In [22]:
data=items_data

In [23]:
data['payload']

{'items': [{'item_brand': 'Riceland',
   'item_id': 1,
   'item_name': 'Riceland American Jazmine Rice',
   'item_price': 0.84,
   'item_upc12': '35200264013',
   'item_upc14': '35200264013'},
  {'item_brand': 'Caress',
   'item_id': 2,
   'item_name': 'Caress Velvet Bliss Ultra Silkening Beauty Bar - 6 Ct',
   'item_price': 6.44,
   'item_upc12': '11111065925',
   'item_upc14': '11111065925'},
  {'item_brand': 'Earths Best',
   'item_id': 3,
   'item_name': 'Earths Best Organic Fruit Yogurt Smoothie Mixed Berry',
   'item_price': 2.43,
   'item_upc12': '23923330139',
   'item_upc14': '23923330139'},
  {'item_brand': 'Boars Head',
   'item_id': 4,
   'item_name': 'Boars Head Sliced White American Cheese - 120 Ct',
   'item_price': 3.14,
   'item_upc12': '208528800007',
   'item_upc14': '208528800007'},
  {'item_brand': 'Back To Nature',
   'item_id': 5,
   'item_name': 'Back To Nature Gluten Free White Cheddar Rice Thin Crackers',
   'item_price': 2.61,
   'item_upc12': '759283100036',

In [24]:
print('max_page: %s' % data['payload']['max_page'])
print('next_page: %s' % data['payload']['next_page'])

max_page: 3
next_page: /api/v1/items?page=2


In [25]:
base_url

'https://python.zgulde.net/'

In [26]:
new_response = requests.get(base_url + data['payload']['next_page'])

In [27]:
new_response.json()

{'payload': {'items': [{'item_brand': 'Doctors Best',
    'item_id': 21,
    'item_name': 'Doctors Best Best Curcumin C3 Complex 1000mg Tablets - 120 Ct',
    'item_price': 8.09,
    'item_upc12': '753950001954',
    'item_upc14': '753950001954'},
   {'item_brand': 'Betty Crocker',
    'item_id': 22,
    'item_name': 'Betty Crocker Twin Pack Real Potatoes Scalloped 2 Pouches For 2 Meals - 2 Pk',
    'item_price': 7.31,
    'item_upc12': '16000288829',
    'item_upc14': '16000288829'},
   {'item_brand': 'Reese',
    'item_id': 23,
    'item_name': 'Reese Mandarin Oranges Segments In Light Syrup',
    'item_price': 1.78,
    'item_upc12': '70670009658',
    'item_upc14': '70670009658'},
   {'item_brand': 'Smart Living',
    'item_id': 24,
    'item_name': 'Smart Living Charcoal Lighter Fluid',
    'item_price': 5.34,
    'item_upc12': '688267084225',
    'item_upc14': '688267084225'},
   {'item_brand': 'Hood',
    'item_id': 25,
    'item_name': 'Hood Latte Iced Coffee Drink Vanilla Latt

Here the response has some built-in properties that tell us how to get to subsequent pages.
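
One detail worth noticing: `base_url` ends with a slash and `next_page` starts with one, so plain string concatenation produces a doubled slash. The server in this lesson evidently tolerated that, but not every server will. A small sketch of a more careful way to combine the two, using the standard library's `urljoin`:

```python
from urllib.parse import urljoin

base_url = "https://python.zgulde.net/"
next_page = "/api/v1/items?page=2"

# Plain concatenation keeps both slashes.
concatenated = base_url + next_page
print(concatenated)  # https://python.zgulde.net//api/v1/items?page=2

# urljoin resolves the path against the base, so only one slash remains.
joined = urljoin(base_url, next_page)
print(joined)  # https://python.zgulde.net/api/v1/items?page=2
```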

Once we've drilled down into the data structure, we'll find that the entire response is a sort of wrapper around the `items` property:

In [28]:
data['payload']['items']

[{'item_brand': 'Riceland',
  'item_id': 1,
  'item_name': 'Riceland American Jazmine Rice',
  'item_price': 0.84,
  'item_upc12': '35200264013',
  'item_upc14': '35200264013'},
 {'item_brand': 'Caress',
  'item_id': 2,
  'item_name': 'Caress Velvet Bliss Ultra Silkening Beauty Bar - 6 Ct',
  'item_price': 6.44,
  'item_upc12': '11111065925',
  'item_upc14': '11111065925'},
 {'item_brand': 'Earths Best',
  'item_id': 3,
  'item_name': 'Earths Best Organic Fruit Yogurt Smoothie Mixed Berry',
  'item_price': 2.43,
  'item_upc12': '23923330139',
  'item_upc14': '23923330139'},
 {'item_brand': 'Boars Head',
  'item_id': 4,
  'item_name': 'Boars Head Sliced White American Cheese - 120 Ct',
  'item_price': 3.14,
  'item_upc12': '208528800007',
  'item_upc14': '208528800007'},
 {'item_brand': 'Back To Nature',
  'item_id': 5,
  'item_name': 'Back To Nature Gluten Free White Cheddar Rice Thin Crackers',
  'item_price': 2.61,
  'item_upc12': '759283100036',
  'item_upc14': '759283100036'},
 {'i

In [29]:
data['payload']

{'items': [{'item_brand': 'Riceland',
   'item_id': 1,
   'item_name': 'Riceland American Jazmine Rice',
   'item_price': 0.84,
   'item_upc12': '35200264013',
   'item_upc14': '35200264013'},
  {'item_brand': 'Caress',
   'item_id': 2,
   'item_name': 'Caress Velvet Bliss Ultra Silkening Beauty Bar - 6 Ct',
   'item_price': 6.44,
   'item_upc12': '11111065925',
   'item_upc14': '11111065925'},
  {'item_brand': 'Earths Best',
   'item_id': 3,
   'item_name': 'Earths Best Organic Fruit Yogurt Smoothie Mixed Berry',
   'item_price': 2.43,
   'item_upc12': '23923330139',
   'item_upc14': '23923330139'},
  {'item_brand': 'Boars Head',
   'item_id': 4,
   'item_name': 'Boars Head Sliced White American Cheese - 120 Ct',
   'item_price': 3.14,
   'item_upc12': '208528800007',
   'item_upc14': '208528800007'},
  {'item_brand': 'Back To Nature',
   'item_id': 5,
   'item_name': 'Back To Nature Gluten Free White Cheddar Rice Thin Crackers',
   'item_price': 2.61,
   'item_upc12': '759283100036',

In [30]:
data['payload']['items'][:2]

[{'item_brand': 'Riceland',
  'item_id': 1,
  'item_name': 'Riceland American Jazmine Rice',
  'item_price': 0.84,
  'item_upc12': '35200264013',
  'item_upc14': '35200264013'},
 {'item_brand': 'Caress',
  'item_id': 2,
  'item_name': 'Caress Velvet Bliss Ultra Silkening Beauty Bar - 6 Ct',
  'item_price': 6.44,
  'item_upc12': '11111065925',
  'item_upc14': '11111065925'}]

We can turn this data into a pandas dataframe:

In [31]:
import pandas as pd

df = pd.DataFrame(data['payload']['items'])
df.head()

| | item_brand | item_id | item_name | item_price | item_upc12 | item_upc14 |
|---|---|---|---|---|---|---|
| 0 | Riceland | 1 | Riceland American Jazmine Rice | 0.84 | 35200264013 | 35200264013 |
| 1 | Caress | 2 | Caress Velvet Bliss Ultra Silkening Beauty Bar... | 6.44 | 11111065925 | 11111065925 |
| 2 | Earths Best | 3 | Earths Best Organic Fruit Yogurt Smoothie Mixe... | 2.43 | 23923330139 | 23923330139 |
| 3 | Boars Head | 4 | Boars Head Sliced White American Cheese - 120 Ct | 3.14 | 208528800007 | 208528800007 |
| 4 | Back To Nature | 5 | Back To Nature Gluten Free White Cheddar Rice ... | 2.61 | 759283100036 | 759283100036 |


Now that we've gotten the data from the first page, we can extract the data from the next page (as indicated by the API's response), and add it onto our dataframe:

In [32]:
response = requests.get(base_url + data['payload']['next_page'])
# base_url is a string assigned to a variable, and
# data['payload']['next_page'] is a string taken from the previous response;
# concatenate them to build the URL of the next page programmatically:
# requests.get('https://python.zgulde.net//api/v1/items?page=2')


# Use this technique to get to the second page of data.
# This is the same request we made for new_response, but assigned to response
# (remember that df still holds only the page 1 data)
data = response.json()

print('max_page: %s' % data['payload']['max_page'])
print('next_page: %s' % data['payload']['next_page'])

max_page: 3
next_page: /api/v1/items?page=3


In [33]:
# To glue two pages of data together, use pd.concat to combine the
# page 1 dataframe with a new dataframe built from the page 2 data.
# reset_index() then renumbers the rows from 0; because drop=True is
# not passed, the old row labels are kept in a new `index` column.
df = pd.concat([df, pd.DataFrame(data['payload']['items'])]).reset_index()

In [34]:
df

| | index | item_brand | item_id | item_name | item_price | item_upc12 | item_upc14 |
|---|---|---|---|---|---|---|---|
| 0 | 0 | Riceland | 1 | Riceland American Jazmine Rice | 0.84 | 35200264013 | 35200264013 |
| 1 | 1 | Caress | 2 | Caress Velvet Bliss Ultra Silkening Beauty Bar... | 6.44 | 11111065925 | 11111065925 |
| 2 | 2 | Earths Best | 3 | Earths Best Organic Fruit Yogurt Smoothie Mixe... | 2.43 | 23923330139 | 23923330139 |
| 3 | 3 | Boars Head | 4 | Boars Head Sliced White American Cheese - 120 Ct | 3.14 | 208528800007 | 208528800007 |
| 4 | 4 | Back To Nature | 5 | Back To Nature Gluten Free White Cheddar Rice ... | 2.61 | 759283100036 | 759283100036 |
| 5 | 5 | Sally Hansen | 6 | Sally Hansen Nail Color Magnetic 903 Silver El... | 6.93 | 74170388732 | 74170388732 |
| 6 | 6 | Twinings Of London | 7 | Twinings Of London Classics Lady Grey Tea - 20 Ct | 9.64 | 70177154004 | 70177154004 |
| 7 | 7 | Lea & Perrins | 8 | Lea & Perrins Marinade In-a-bag Cracked Pepper... | 1.68 | 51600080015 | 51600080015 |
| 8 | 8 | Van De Kamps | 9 | Van De Kamps Fillets Beer Battered - 10 Ct | 1.79 | 19600923015 | 19600923015 |
| 9 | 9 | Ahold | 10 | Ahold Cocoa Almonds | 3.17 | 688267141676 | 688267141676 |
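
The extra `index` column in the output above comes from calling `reset_index()` without `drop=True`, which moves the old row labels into a column. One possible alternative, sketched here with two tiny made-up frames rather than the real API data, is to pass `ignore_index=True` to `pd.concat`:

```python
import pandas as pd

# Two small stand-in "pages" (made-up subsets of the item fields).
page_1 = pd.DataFrame([{"item_id": 1, "item_brand": "Riceland"},
                       {"item_id": 2, "item_brand": "Caress"}])
page_2 = pd.DataFrame([{"item_id": 21, "item_brand": "Doctors Best"},
                       {"item_id": 22, "item_brand": "Betty Crocker"}])

# reset_index() moves the old row labels into a new "index" column.
with_extra_column = pd.concat([page_1, page_2]).reset_index()
print(with_extra_column.columns.tolist())  # ['index', 'item_id', 'item_brand']

# ignore_index=True renumbers the rows 0..n-1 without adding a column.
combined = pd.concat([page_1, page_2], ignore_index=True)
print(combined.columns.tolist())  # ['item_id', 'item_brand']
print(combined.index.tolist())    # [0, 1, 2, 3]
```

Calling `reset_index(drop=True)` after the concat would give the same result as `ignore_index=True`.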


We'll repeat the process one more time:

In [35]:
response = requests.get(base_url + data['payload']['next_page'])
data = response.json()

print('max_page: %s' % data['payload']['max_page'])
print('next_page: %s' % data['payload']['next_page'])

df = pd.concat([df, pd.DataFrame(data['payload']['items'])]).reset_index()

max_page: 3
next_page: None


Now that the API says that the `next_page` is None, we'll stop making requests, and assume that we have all of the `items` data.
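
The request, check `next_page`, repeat pattern above can be written as a loop. Below is a minimal sketch, not the lesson's own code: `fetch_all` and `FAKE_PAGES` are hypothetical names, and the fake pages stand in for the real API so that the example runs offline. It assumes each response has the `payload` / `next_page` shape shown earlier.

```python
def fetch_all(first_path, get_json, key):
    """Collect rows from every page by following `next_page` until it is None.

    get_json is any callable that takes a path and returns the parsed response.
    """
    rows = []
    path = first_path
    while path is not None:
        payload = get_json(path)["payload"]
        rows.extend(payload[key])
        path = payload["next_page"]
    return rows


# Stand-in for the real API so the sketch runs offline (made-up rows).
FAKE_PAGES = {
    "/api/v1/items": {"payload": {"items": [{"item_id": 1}, {"item_id": 2}],
                                  "next_page": "/api/v1/items?page=2"}},
    "/api/v1/items?page=2": {"payload": {"items": [{"item_id": 3}],
                                         "next_page": None}},
}

items = fetch_all("/api/v1/items", FAKE_PAGES.__getitem__, key="items")
print(len(items))  # 3

# Against the live API, get_json could be something like:
#   lambda path: requests.get("https://python.zgulde.net" + path).json()
```

Passing the fetching function in as an argument keeps the loop itself independent of the network, which makes it easy to try out on small fake data first.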

In [36]:
df.shape

(50, 8)

## Further Reading

- [Using APIs in Python](https://www.dataquest.io/blog/python-api-tutorial/)
- [Understanding and Using REST APIs](https://www.smashingmagazine.com/2018/01/understanding-using-rest-api/)

## Exercises

Within your `codeup-data-science` directory, create a new repo named `time-series-exercises`. This will be where you do your work for this module. Create a repository on GitHub with the same name, and link your local repository to GitHub.

Save this work in your `time-series-exercises` repo. Then add, commit, and push your changes.

The end result of this exercise should be a file named `acquire.py`.

1. Using the code from the lesson as a guide and the REST API from https://python.zgulde.net/api/v1/items as we did in the lesson, create a dataframe named `items` that has all of the data for items.

1. Do the same thing, but for `stores` (https://python.zgulde.net/api/v1/stores)

1. Extract the data for `sales` (https://python.zgulde.net/api/v1/sales). There are a lot of pages of data here, so your code will need to be a little more complex. Your code should continue fetching data from the next page until all of the data is extracted.

1. Save the data from your dataframes to local CSV files so that it will be faster to access in the future.

1. Combine the data from your three separate dataframes into one large dataframe. 

1. Acquire the Open Power Systems Data for Germany, which has been rapidly expanding its renewable energy production in recent years. The data set includes country-wide totals of electricity consumption, wind power production, and solar power production for 2006-2017. You can get the data here: https://raw.githubusercontent.com/jenfly/opsd/master/opsd_germany_daily.csv

1. Make sure all the work that you have done above is reproducible. That is, you should put the code above into separate functions in the `acquire.py` file and be able to re-run the functions and get the same data.