# Data Acquisition

- request / response: a client sends a request, a server sends back a response
- HTTP: plain-text transport protocol for requests and responses
- HTML: document structure (compilation target for Markdown)
- JSON: data interchange format based on JavaScript
- API: how software is interacted with programmatically
- REST: a convention for structuring application URLs

RESTful URLs:

| HTTP Method | Endpoint         | Description                |
| ---         | ---              | ---                        |
| GET         | /{resource}/{id} | Read details of a resource |
| GET         | /{resource}      | A listing of resources     |
| POST        | /{resource}      | Create a new resource      |
| PATCH       | /{resource}/{id} | Update a resource          |
| DELETE      | /{resource}/{id} | Delete a resource          |

We'll focus on the GET requests, since they are the ones that retrieve information for us to read.
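To make the table concrete, here is a small sketch that builds the two GET URLs for a resource. The helper name `resource_url` is made up for illustration; the base URL is the jsonplaceholder API used later in this lesson.

```python
def resource_url(base, resource, resource_id=None):
    """Build a REST-style URL: /{resource} for a listing, /{resource}/{id} for one item."""
    url = f"{base.rstrip('/')}/{resource}"
    if resource_id is not None:
        url = f"{url}/{resource_id}"
    return url


base = 'https://jsonplaceholder.typicode.com'
print(resource_url(base, 'posts'))     # listing of posts
print(resource_url(base, 'posts', 1))  # details of the post with id 1
```

Either URL can then be passed to `requests.get`.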

In [1]:
import pandas as pd
# The requests library simplifies the process of making HTTP requests
import requests

Example HTML pages:

- http://example.com
- https://alumni.codeup.com

In [2]:
response = requests.get('http://example.com')

Demo: the response text is HTML

In [5]:
response

<Response [200]>

HTTP status codes: (http.cat/code)

- 200s: everything's good
- 300s: redirecting
- 400s: you did something wrong
- 500s: something is wrong with the server
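As a rough illustration of these ranges, the sketch below maps a status code to its category. The function name `describe_status` is hypothetical. With `requests`, the integer lives in `response.status_code`, and `response.raise_for_status()` raises an exception for 400s and 500s.

```python
def describe_status(status_code):
    """Map an HTTP status code to its rough category."""
    if 200 <= status_code < 300:
        return 'success'
    if 300 <= status_code < 400:
        return 'redirect'
    if 400 <= status_code < 500:
        return 'client error'
    if 500 <= status_code < 600:
        return 'server error'
    return 'other'


print(describe_status(200))
print(describe_status(404))
```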

In [7]:
print(response.text)

<!doctype html>
<html>
<head>
    <title>Example Domain</title>

    <meta charset="utf-8" />
    <meta http-equiv="Content-type" content="text/html; charset=utf-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1" />
    <style type="text/css">
    body {
        background-color: #f0f0f2;
        margin: 0;
        padding: 0;
        font-family: -apple-system, system-ui, BlinkMacSystemFont, "Segoe UI", "Open Sans", "Helvetica Neue", Helvetica, Arial, sans-serif;
        
    }
    div {
        width: 600px;
        margin: 5em auto;
        padding: 2em;
        background-color: #fdfdff;
        border-radius: 0.5em;
        box-shadow: 2px 3px 7px 2px rgba(0,0,0,0.02);
    }
    a:link, a:visited {
        color: #38488f;
        text-decoration: none;
    }
    @media (max-width: 700px) {
        div {
            margin: 0 auto;
            width: auto;
        }
    }
    </style>    
</head>

<body>
<div>
    <h1>Example Domain</h1>
    <p>This domai

Example JSON API endpoints:

- https://aphorisms.glitch.me
- https://jsonplaceholder.typicode.com/posts/1
- https://jsonplaceholder.typicode.com/users

In [17]:
response = requests.get('https://aphorisms.glitch.me')
response.text

'{"quote":"Programming is more about practicing than studying: a little bit of study but a lot of practice","author":"Luis Montealegre"}'

In [21]:
data = response.json()
data['quote']

'Programming is more about practicing than studying: a little bit of study but a lot of practice'
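Roughly speaking, `response.json()` is shorthand for parsing `response.text` with the standard library's `json` module. The sketch below shows that step on a hard-coded string (a shortened, illustrative copy of the response above) so it runs without a network connection.

```python
import json

# Shortened stand-in for response.text
text = '{"quote": "a little bit of study but a lot of practice", "author": "Luis Montealegre"}'

data = json.loads(text)  # JSON object -> Python dict
print(type(data).__name__)
print(data['author'])
```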

Demo: JSON responses are a special case

In [24]:
url = 'https://jsonplaceholder.typicode.com/posts/3'
response = requests.get(url)
response.json()

{'userId': 1,
 'id': 3,
 'title': 'ea molestias quasi exercitationem repellat qui ipsa sit aut',
 'body': 'et iusto sed quo iure\nvoluptatem occaecati omnis eligendi aut ad\nvoluptatem doloribus vel accusantium quis pariatur\nmolestiae porro eius odio et labore et velit aut'}

In [32]:
url = 'https://jsonplaceholder.typicode.com/users/'
response = requests.get(url)
data = response.json()
df = pd.DataFrame(data)
df['street'] = df.address.apply(lambda address_dict: address_dict['street'])
df.head()

|   | id | name | username | email | address | phone | website | company | street |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 1 | Leanne Graham | Bret | Sincere@april.biz | {'street': 'Kulas Light', 'suite': 'Apt. 556',... | 1-770-736-8031 x56442 | hildegard.org | {'name': 'Romaguera-Crona', 'catchPhrase': 'Mu... | Kulas Light |
| 1 | 2 | Ervin Howell | Antonette | Shanna@melissa.tv | {'street': 'Victor Plains', 'suite': 'Suite 87... | 010-692-6593 x09125 | anastasia.net | {'name': 'Deckow-Crist', 'catchPhrase': 'Proac... | Victor Plains |
| 2 | 3 | Clementine Bauch | Samantha | Nathan@yesenia.net | {'street': 'Douglas Extension', 'suite': 'Suit... | 1-463-123-4447 | ramiro.info | {'name': 'Romaguera-Jacobson', 'catchPhrase': ... | Douglas Extension |
| 3 | 4 | Patricia Lebsack | Karianne | Julianne.OConner@kory.org | {'street': 'Hoeger Mall', 'suite': 'Apt. 692',... | 493-170-9623 x156 | kale.biz | {'name': 'Robel-Corkery', 'catchPhrase': 'Mult... | Hoeger Mall |
| 4 | 5 | Chelsey Dietrich | Kamren | Lucio_Hettinger@annie.ca | {'street': 'Skiles Walks', 'suite': 'Suite 351... | (254)954-1289 | demarco.info | {'name': 'Keebler LLC', 'catchPhrase': 'User-c... | Skiles Walks |
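The `address` and `company` columns still hold nested dictionaries. One possible way to pull every nested field out at once, instead of one `apply` per field, is to flatten each record before building the dataframe. The helper `flatten_record` below is a hypothetical plain-Python sketch, and the sample record is a trimmed-down copy of the first user above.

```python
def flatten_record(record, sep='_'):
    """Copy a dict, replacing each nested dict with prefixed top-level keys."""
    flat = {}
    for key, value in record.items():
        if isinstance(value, dict):
            # Recurse so that deeper levels are flattened too
            for sub_key, sub_value in flatten_record(value, sep).items():
                flat[f'{key}{sep}{sub_key}'] = sub_value
        else:
            flat[key] = value
    return flat


# Trimmed-down record shaped like one element of the /users response
user = {
    'id': 1,
    'name': 'Leanne Graham',
    'address': {'street': 'Kulas Light', 'suite': 'Apt. 556'},
}
print(flatten_record(user))
```

A list of flattened records can be passed to `pd.DataFrame` as before. pandas also ships `pd.json_normalize`, which does a similar job and uses `.` as its default separator.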


Demo: converting /users to a dataframe

- https://swapi.dev/api/people/5
- https://api.data.codeup.com

In [33]:
url = 'https://swapi.dev/api/people/5'
response = requests.get(url)
response.json()

{'name': 'Leia Organa',
 'height': '150',
 'mass': '49',
 'hair_color': 'brown',
 'skin_color': 'light',
 'eye_color': 'brown',
 'birth_year': '19BBY',
 'gender': 'female',
 'homeworld': 'https://swapi.dev/api/planets/2/',
 'films': ['https://swapi.dev/api/films/1/',
  'https://swapi.dev/api/films/2/',
  'https://swapi.dev/api/films/3/',
  'https://swapi.dev/api/films/6/'],
 'species': [],
 'vehicles': ['https://swapi.dev/api/vehicles/30/'],
 'starships': [],
 'created': '2014-12-10T15:20:09.791000Z',
 'edited': '2014-12-20T21:17:50.315000Z',
 'url': 'https://swapi.dev/api/people/5/'}

Demo: multiple pages and wrapper responses

In [36]:
response = requests.get('https://alumni.codeup.com')
response.json()

JSONDecodeError: Expecting value: line 1 column 1 (char 0)
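The error above is expected: that page returns HTML, not JSON, so there is nothing for the JSON parser to decode. A defensive sketch, assuming we would rather get `None` than an exception, is shown below. The helper name `parse_json_or_none` is made up, and the sketch works on plain strings so it runs offline; the error raised by `response.json()` should likewise be catchable as a `ValueError`.

```python
import json


def parse_json_or_none(text):
    """Return the parsed JSON value, or None if the text is not valid JSON."""
    try:
        return json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None


print(parse_json_or_none('{"api": "/api/v1"}'))
print(parse_json_or_none('<!doctype html><html></html>'))
```

Note that a body containing the JSON literal `null` also comes back as `None`, so this shortcut only suits cases where that ambiguity does not matter.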

In [37]:
url = 'https://api.data.codeup.com'
response = requests.get(url)
response.json()

{'api': '/api/v1', 'help': '/documentation'}

In [41]:
url = 'https://api.data.codeup.com/documentation'
response = requests.get(url)
print(response.json()['payload'])


The API accepts GET requests for all endpoints, where endpoints are prefixed
with

    /api/{version}

Where version is "v1"

Valid endpoints:

- /stores[/{store_id}]
- /items[/{item_id}]
- /sales[/{sale_id}]

All endpoints accept a `page` parameter that can be used to navigate through
the results.
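As a small sketch of what the `page` parameter looks like in a URL, the snippet below builds a paged URL with the standard library. The helper name `page_url` is hypothetical. With `requests`, the same effect can be had by passing `params={'page': 2}` to `requests.get`.

```python
from urllib.parse import urlencode


def page_url(base, endpoint, page):
    """Build the URL for one page of results."""
    query = urlencode({'page': page})
    return f'{base}{endpoint}?{query}'


print(page_url('https://api.data.codeup.com', '/api/v1/items', 2))
```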



In [45]:
url = 'https://api.data.codeup.com/api/v1/stores'
response = requests.get(url)
data = response.json()
data.keys()

dict_keys(['payload', 'status'])

In [46]:
data['status']

'ok'

In [48]:
data['payload'].keys()

dict_keys(['max_page', 'next_page', 'page', 'previous_page', 'stores'])

In [51]:
pd.DataFrame(data['payload']['stores'])

|   | store_address | store_city | store_id | store_state | store_zipcode |
| --- | --- | --- | --- | --- | --- |
| 0 | 12125 Alamo Ranch Pkwy | San Antonio | 1 | TX | 78253 |
| 1 | 9255 FM 471 West | San Antonio | 2 | TX | 78251 |
| 2 | 2118 Fredericksburg Rdj | San Antonio | 3 | TX | 78201 |
| 3 | 516 S Flores St | San Antonio | 4 | TX | 78204 |
| 4 | 1520 Austin Hwy | San Antonio | 5 | TX | 78218 |
| 5 | 1015 S WW White Rd | San Antonio | 6 | TX | 78220 |
| 6 | 12018 Perrin Beitel Rd | San Antonio | 7 | TX | 78217 |
| 7 | 15000 San Pedro Ave | San Antonio | 8 | TX | 78232 |
| 8 | 735 SW Military Dr | San Antonio | 9 | TX | 78221 |
| 9 | 8503 NW Military Hwy | San Antonio | 10 | TX | 78231 |


In [54]:
url = 'https://api.data.codeup.com/api/v1/items'
response = requests.get(url)
data = response.json()
data.keys()

dict_keys(['payload', 'status'])

In [70]:
type(data)

dict

In [55]:
data['status']

'ok'

In [57]:
data['payload'].keys()

dict_keys(['items', 'max_page', 'next_page', 'page', 'previous_page'])

In [71]:
data['payload']['page'], data['payload']['max_page'], data['payload']['next_page']

(1, 3, '/api/v1/items?page=2')

In [63]:
data['payload']['previous_page']

In [68]:
pd.DataFrame(data['payload']['items'])

|   | item_brand | item_id | item_name | item_price | item_upc12 | item_upc14 |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | Riceland | 1 | Riceland American Jazmine Rice | 0.84 | 35200264013 | 35200264013 |
| 1 | Caress | 2 | Caress Velvet Bliss Ultra Silkening Beauty Bar... | 6.44 | 11111065925 | 11111065925 |
| 2 | Earths Best | 3 | Earths Best Organic Fruit Yogurt Smoothie Mixe... | 2.43 | 23923330139 | 23923330139 |
| 3 | Boars Head | 4 | Boars Head Sliced White American Cheese - 120 Ct | 3.14 | 208528800007 | 208528800007 |
| 4 | Back To Nature | 5 | Back To Nature Gluten Free White Cheddar Rice ... | 2.61 | 759283100036 | 759283100036 |
| 5 | Sally Hansen | 6 | Sally Hansen Nail Color Magnetic 903 Silver El... | 6.93 | 74170388732 | 74170388732 |
| 6 | Twinings Of London | 7 | Twinings Of London Classics Lady Grey Tea - 20 Ct | 9.64 | 70177154004 | 70177154004 |
| 7 | Lea & Perrins | 8 | Lea & Perrins Marinade In-a-bag Cracked Pepper... | 1.68 | 51600080015 | 51600080015 |
| 8 | Van De Kamps | 9 | Van De Kamps Fillets Beer Battered - 10 Ct | 1.79 | 19600923015 | 19600923015 |
| 9 | Ahold | 10 | Ahold Cocoa Almonds | 3.17 | 688267141676 | 688267141676 |


In [83]:
domain = 'https://api.data.codeup.com'
endpoint = '/api/v1/items'
items = []

url = domain + endpoint

response = requests.get(url)
data = response.json()
# .extend adds the elements of one list to another list
items.extend(data['payload']['items'])

In [84]:
data['payload']['next_page']

'/api/v1/items?page=2'

In [85]:
url = domain + data['payload']['next_page']
print('next url:', url)

next url: https://api.data.codeup.com/api/v1/items?page=2


In [89]:
response = requests.get(url)
data = response.json()
items.extend(data['payload']['items'])

In [92]:
url = domain + data['payload']['next_page']
print('next url:', url)

next url: https://api.data.codeup.com/api/v1/items?page=3


In [93]:
response = requests.get(url)
data = response.json()
items.extend(data['payload']['items'])

In [95]:
# Hint hint: if data['payload']['next_page'] is None:
print('next endpoint', data['payload']['next_page'])
url = domain + data['payload']['next_page']
print('next url:', url)

next endpoint None


TypeError: can only concatenate str (not "NoneType") to str
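The `TypeError` above comes from concatenating the domain with `None` on the last page. A minimal guard, assuming we want `None` back when there is no next page, could look like the sketch below (the function name `build_next_url` is made up for illustration).

```python
def build_next_url(domain, next_page):
    """Return the full URL of the next page, or None when there is no next page."""
    if next_page is None:
        return None
    return domain + next_page


domain = 'https://api.data.codeup.com'
print(build_next_url(domain, '/api/v1/items?page=3'))
print(build_next_url(domain, None))
```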

In [99]:
pd.DataFrame(items)
# next steps:
# save to a csv or wrap up everything in a function

|   | item_brand | item_id | item_name | item_price | item_upc12 | item_upc14 |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | Riceland | 1 | Riceland American Jazmine Rice | 0.84 | 35200264013 | 35200264013 |
| 1 | Caress | 2 | Caress Velvet Bliss Ultra Silkening Beauty Bar... | 6.44 | 11111065925 | 11111065925 |
| 2 | Earths Best | 3 | Earths Best Organic Fruit Yogurt Smoothie Mixe... | 2.43 | 23923330139 | 23923330139 |
| 3 | Boars Head | 4 | Boars Head Sliced White American Cheese - 120 Ct | 3.14 | 208528800007 | 208528800007 |
| 4 | Back To Nature | 5 | Back To Nature Gluten Free White Cheddar Rice ... | 2.61 | 759283100036 | 759283100036 |
| 5 | Sally Hansen | 6 | Sally Hansen Nail Color Magnetic 903 Silver El... | 6.93 | 74170388732 | 74170388732 |
| 6 | Twinings Of London | 7 | Twinings Of London Classics Lady Grey Tea - 20 Ct | 9.64 | 70177154004 | 70177154004 |
| 7 | Lea & Perrins | 8 | Lea & Perrins Marinade In-a-bag Cracked Pepper... | 1.68 | 51600080015 | 51600080015 |
| 8 | Van De Kamps | 9 | Van De Kamps Fillets Beer Battered - 10 Ct | 1.79 | 19600923015 | 19600923015 |
| 9 | Ahold | 10 | Ahold Cocoa Almonds | 3.17 | 688267141676 | 688267141676 |


In [None]:
# setup
domain = 'https://api.data.codeup.com'
endpoint = '/api/v1/items'
items = []

# For each page -- until next page is None
url = domain + endpoint
response = requests.get(url)
data = response.json()
items.extend(data['payload']['items'])
# update the endpoint
endpoint = data['payload']['next_page']

## Guidance for the exercise

1. Setup
    - url (base + endpoint)
    - empty list
1. Loop
    1. make a request
    1. handle the response, add to the list
    1. find the next url endpoint
        1. if it's None, stop looping
        1. if it's a string, use it to construct the next url
1. Turn the list into a dataframe
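The steps above can be sketched as a single loop. This is one possible shape, not the only one: the function name `fetch_all` and the `get_json` argument (any callable that takes a URL and returns the decoded JSON) are assumptions made for illustration, and the in-memory `FAKE_PAGES` data is made up so that the sketch runs without a network connection.

```python
def fetch_all(domain, endpoint, key, get_json):
    """Follow next_page links from endpoint and collect every record under payload[key].

    get_json is any callable that takes a URL and returns the decoded JSON dict.
    """
    records = []
    while endpoint is not None:
        data = get_json(domain + endpoint)
        records.extend(data['payload'][key])
        # next_page is None on the last page, which ends the loop
        endpoint = data['payload']['next_page']
    return records


# A tiny in-memory stand-in for the API (made-up data) so the sketch runs offline
FAKE_PAGES = {
    'https://example.test/api/v1/items': {
        'payload': {'items': [{'item_id': 1}, {'item_id': 2}],
                    'next_page': '/api/v1/items?page=2'},
        'status': 'ok',
    },
    'https://example.test/api/v1/items?page=2': {
        'payload': {'items': [{'item_id': 3}], 'next_page': None},
        'status': 'ok',
    },
}

items = fetch_all('https://example.test', '/api/v1/items', 'items', FAKE_PAGES.get)
print(len(items))
```

Against the real API, passing `lambda url: requests.get(url).json()` as `get_json` with the `https://api.data.codeup.com` domain should behave the same way, and the resulting list can go straight into `pd.DataFrame`.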

General Tips

- solve an easy problem first (the items endpoint), then apply that solution to the larger problem (sales)
- informational print statements are helpful as you are developing code, especially inside of a loop to see what changes
- Don't be afraid to interrupt the kernel: open the command palette with command + shift + p (command + shift + c in JupyterLab) and run "interrupt the kernel"
- The curriculum says https://python.zgulde.net; that will work, or use https://api.data.codeup.com