# Data Acquisition

* HTTP (*HyperText Transfer Protocol*): The protocol for exchanging plain-text messages between a client and a server
    * Request
    * Response
* HTML (*HyperText Markup Language*): Document structure for a webpage
* JSON (*JavaScript Object Notation*): Data interchange format based on JavaScript (structure is very similar to Python dictionaries)
* API (*Application Programming Interface*): How things are interacted with programmatically
* REST (*Representational State Transfer*): A set of rules for application URLs

| HTTP Method | Endpoint         | Description                |
| ---         | ---              | ---                        |
| GET         | /{resource}/{id} | Read details of a resource |
| GET         | /{resource}      | A listing of resources     |
| POST        | /{resource}      | Create a new resource      |
| PATCH       | /{resource}/{id} | Update a resource          |
| DELETE      | /{resource}/{id} | Delete a resource          |

We'll focus on GET requests, since they are the ones that retrieve information for us to read.
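To make the table concrete, here is a small sketch of how an endpoint pattern such as `/{resource}/{id}` turns into a full URL. The helper name `build_url` is hypothetical (it is not part of any library); the base URL and the `users` and `posts` resources come from the JSONPlaceholder examples listed later in this lesson.

```python
def build_url(base, resource, resource_id=None):
    """Fill in the /{resource} or /{resource}/{id} pattern from the table.

    `build_url` is a hypothetical helper for illustration only.
    """
    path = f'/{resource}'
    if resource_id is not None:
        path += f'/{resource_id}'
    return base.rstrip('/') + path


base = 'https://jsonplaceholder.typicode.com'
listing_url = build_url(base, 'users')     # GET /{resource}: a listing of resources
detail_url = build_url(base, 'posts', 1)   # GET /{resource}/{id}: details of one resource

print(listing_url)
print(detail_url)
```

Either URL could then be passed to `requests.get`.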

## Imports

In [None]:
import pandas as pd

import requests

Example HTML pages:

- http://example.com
- https://alumni.codeup.com

## Using the requests library

We get a response by making a request.

### Some things we can get:

#### Response Code

In [None]:
response = requests.get('http://example.com')
response

HTTP status codes:

* 200s: everything's good
* 300s: redirecting
* 400s: you did something wrong
* 500s: something is wrong with the server
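The ranges above can be sketched as a small function. In `requests`, the numeric code is available as `response.status_code`, and `response.raise_for_status()` raises an exception for 400s and 500s. The function name `describe_status` and its labels are made up for this illustration.

```python
def describe_status(status_code):
    """Map an HTTP status code to the rough category listed above.

    `describe_status` is a hypothetical helper, not part of requests.
    """
    if 200 <= status_code < 300:
        return 'success'
    if 300 <= status_code < 400:
        return 'redirect'
    if 400 <= status_code < 500:
        return 'client error'
    if 500 <= status_code < 600:
        return 'server error'
    return 'other'


for code in (200, 301, 404, 503):
    print(code, describe_status(code))
```

With a real response, this would be called as `describe_status(response.status_code)`.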

#### Response Text

In [None]:
print(response.text)

### Example JSON API endpoints:

* https://aphorisms.glitch.me
* https://jsonplaceholder.typicode.com/posts/1
* https://jsonplaceholder.typicode.com/users

### Let's start with the quote generator

In [None]:
response = requests.get('https://aphorisms.glitch.me')
response

In [None]:
response.text

In [None]:
data = response.json()
data

#### What is the difference between `response.text` and `response.json()`?

In [None]:
print('response.text type is:', type(response.text))
print('response.json() type is:', type(response.json()))
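In short, `response.text` is the raw body as a string, while `response.json()` parses that body into Python objects, roughly what `json.loads(response.text)` would do. The sketch below shows the same step without a network call; the sample string is made up and only mimics the `quote` and `author` keys of the quote API.

```python
import json

# Made-up sample shaped like the quote API's response body
text = '{"quote": "An example quote.", "author": "Example Author"}'

data = json.loads(text)   # parse the JSON string into Python objects

print(type(text))         # <class 'str'>
print(type(data))         # <class 'dict'>
print(data['author'])
```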

#### So now we can treat `data` just like a dictionary?

In [None]:
data

In [None]:
data['quote']

In [None]:
data['author']

## Now let's work with an API storing time series data:

In [None]:
url = 'https://python.zgulde.net'
response = requests.get(url)
response.json()

#### ↑ This tells us what we can add to url to get new info

#### Let's look at the documentation

In [None]:
url = 'https://python.zgulde.net' + '/documentation'
response = requests.get(url)
response.json()

In [None]:
response.json()['payload']

In [None]:
print(response.json()['payload'])

#### What's an endpoint?

##### * An endpoint is the part of the url that comes after the main part of the url, called the domain.

##### * In this case our endpoints go after .net in the url, separated by slashes.

   * **Extra**: .com, .gov, .net are known as TLDs or *Top Level Domains* in a url
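As a quick illustration, the standard library's `urllib.parse.urlsplit` can pull these pieces apart. This is only a sketch using the stores url from this lesson; note that the lesson's own `domain` variable also keeps the `https://` scheme in front, which is what `base` reproduces here.

```python
from urllib.parse import urlsplit

url = 'https://python.zgulde.net/api/v1/stores'
parts = urlsplit(url)

host = parts.netloc                          # 'python.zgulde.net'
endpoint = parts.path                        # '/api/v1/stores'
tld = host.rsplit('.', 1)[-1]                # 'net'
base = f'{parts.scheme}://{parts.netloc}'    # 'https://python.zgulde.net'

print(host, endpoint, tld, base)
```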

#### So with this info we can now start retrieving data from the api

#### Let's check out the stores data

In [None]:
url = 'https://python.zgulde.net/api/v1/stores'
response = requests.get(url)
data = response.json()
data

In [None]:
data.keys()

In [None]:
data['status']

In [None]:
data['payload']

In [None]:
data['payload'].keys()

In [None]:
data['payload']['stores']

In [None]:
pd.DataFrame(data['payload']['stores'])

#### Let's do the same with items

In [None]:
url = 'https://python.zgulde.net/api/v1/items'
response = requests.get(url)
data = response.json()
data.keys()

In [None]:
data['status']

In [None]:
data['payload']

In [None]:
data['payload'].keys()

In [None]:
(
    data['payload']['page'], 
    data['payload']['max_page'], 
    data['payload']['next_page'],
    data['payload']['previous_page'],
)

In [None]:
pd.DataFrame(data['payload']['items'])

### Now what to do about the multiple pages?

In [None]:
domain = 'https://python.zgulde.net'
endpoint = '/api/v1/items'
items = []

url = domain + endpoint

response = requests.get(url)
data = response.json()
# .extend adds the elements of one list to another list
items.extend(data['payload']['items'])

In [None]:
data['payload']['next_page']

In [None]:
url = domain + data['payload']['next_page']
print('Next url:', url)

In [None]:
response = requests.get(url)
data = response.json()
items.extend(data['payload']['items'])

In [None]:
url = domain + data['payload']['next_page']
print('next url:', url)

In [None]:
response = requests.get(url)
data = response.json()
items.extend(data['payload']['items'])

In [None]:
# Hint hint: if data['payload']['next_page'] is None:
print('next endpoint', data['payload']['next_page'])

In [None]:
pd.DataFrame(items)
# next steps:
# save to a csv or wrap up everything in a function

In [None]:
# setup
domain = 'https://python.zgulde.net'
endpoint = '/api/v1/items'
items = []

# For each page -- until next page is None
url = domain + endpoint
response = requests.get(url)
data = response.json()
items.extend(data['payload']['items'])
# update the endpoint
endpoint = data['payload']['next_page']

## Guidance for the exercise
1. Setup
    * url (base + endpoint)
    * empty list
1. Loop
    1. make a request
    1. handle the response, add to the list
    1. find the next url endpoint
        1. if it's None, stop looping
        1. if it's a string, use it to construct the next url
1. Turn the list into a dataframe
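One possible shape for the loop described above is sketched below. It is not the only way to do it. The names `get_json` and `fetch_all` are hypothetical, and the code assumes every response has the same `payload` / `next_page` layout as the items endpoint. The fetcher is passed in as a parameter so that the loop can be tried offline against made-up pages.

```python
def get_json(url):
    """Default fetcher: GET the url and parse the JSON body."""
    import requests  # imported here so the offline demo below runs without it
    return requests.get(url).json()


def fetch_all(domain, endpoint, key, fetch=get_json):
    """Follow next_page links until there are none, collecting every record.

    Assumes each response looks like the items endpoint in this lesson:
    {'payload': {key: [...], 'next_page': '<endpoint>' or None, ...}}
    """
    records = []
    while endpoint is not None:
        data = fetch(domain + endpoint)
        records.extend(data['payload'][key])
        endpoint = data['payload']['next_page']   # None on the last page
    return records


# Offline demo: two made-up pages standing in for a real API
fake_pages = {
    'https://example.test/items': {
        'payload': {'items': [{'id': 1}, {'id': 2}], 'next_page': '/items?page=2'}},
    'https://example.test/items?page=2': {
        'payload': {'items': [{'id': 3}], 'next_page': None}},
}
demo = fetch_all('https://example.test', '/items', 'items', fetch=fake_pages.get)
print(demo)
```

With network access, `pd.DataFrame(fetch_all('https://python.zgulde.net', '/api/v1/items', 'items'))` should give the same result as the manual steps in the lesson, provided the API keeps the response layout shown there.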

### General Tips

![Alt Text](https://c.tenor.com/NpDMsR4GdTAAAAAC/salute-captain-america.gif)

* solve an easy problem first (the items endpoint), then apply that solution to the larger problem (sales)
* informational print statements are helpful as you are developing code, especially inside of a loop to see what changes
* Don't be afraid to command + shift + p (command + shift + c for jupyter lab) and "interrupt the kernel"

# Exercises

Create a new local git repository and remote repository on GitHub named `time-series-exercises`. Save the work for this module in your `time-series-exercises` repo.

The end result of this exercise should be a file named `acquire.py`.

1. Using the code from the lesson as a guide and the REST API from https://python.zgulde.net/api/v1/items as we did in the lesson, create a dataframe named `items` that has all of the data for items.
2. Do the same thing, but for `stores` (https://python.zgulde.net/api/v1/stores)
3. Extract the data for `sales` (https://python.zgulde.net/api/v1/sales). There are a lot of pages of data here, so your code will need to be a little more complex. Your code should continue fetching data from the next page until all of the data is extracted.
4. Save the data in your files to local csv files so that it will be faster to access in the future.
5. Combine the data from your three separate dataframes into one large dataframe.
6. Acquire the Open Power Systems Data for Germany, which has been rapidly expanding its renewable energy production in recent years. The data set includes country-wide totals of electricity consumption, wind power production, and solar power production for 2006-2017. You can get the data here: https://raw.githubusercontent.com/jenfly/opsd/master/opsd_germany_daily.csv
7. Make sure all the work that you have done above is reproducible. That is, you should put the code above into separate functions in the `acquire.py` file and be able to re-run the functions and get the same data.