Goal: Create acquire.py, a module of functions that acquire the data through the API.

**Material:**

https://python.zach.lol

https://www.gesis.org/en/services/data-analysis/social-indicators/the-german-system-of-social-indicators/

1. Create a dataframe named items that has all of the data for items.

2. Create a dataframe named stores that has all of the data for stores.

3. Extract the data for sales. There are a lot of pages of data here, so your code will need to be a little more complex. Your code should continue fetching data from the next page until all of the data is extracted.

4. Save the data in your dataframes to local csv files so that it will be faster to access in the future.

5. Combine the data from your three separate dataframes into one large dataframe.

6. Acquire the Open Power Systems Data for Germany, which has been rapidly expanding its renewable energy production in recent years. The data set includes country-wide totals of electricity consumption, wind power production, and solar power production for 2006-2017. You can get the data here: https://raw.githubusercontent.com/jenfly/opsd/master/opsd_germany_daily.csv

7. Create an acquire.py file containing functions that get the data from the API.
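
For exercise 6, a minimal sketch, assuming pandas can read the CSV directly from the URL given above. The function name `get_opsd_data` and the constant `OPSD_URL` are illustrative choices, not part of the exercise.

```python
import pandas as pd

# URL taken from exercise 6; the constant name is an assumption.
OPSD_URL = 'https://raw.githubusercontent.com/jenfly/opsd/master/opsd_germany_daily.csv'

def get_opsd_data(source=OPSD_URL):
    """Read the Open Power Systems Data CSV from a URL or a local path."""
    # pd.read_csv accepts both URLs and file paths
    return pd.read_csv(source)
```

Calling `get_opsd_data()` with no argument would download the file; passing a local path reads a saved copy instead.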

In [15]:
import requests
import pandas as pd

In [16]:
base_url = 'https://python.zach.lol'
print(requests.get(base_url).text)

{"api":"/api/v1","help":"/documentation"}



In [17]:
response = requests.get(base_url + '/documentation')
print(response.json()['payload'])


The API accepts GET requests for all endpoints, where endpoints are prefixed
with

    /api/{version}

Where version is "v1"

Valid endpoints:

- /stores[/{store_id}]
- /items[/{item_id}]
- /sales[/{sale_id}]

All endpoints accept a `page` parameter that can be used to navigate through
the results.
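
The `page` parameter mentioned in the documentation can be passed through the `params` argument of `requests` instead of being concatenated by hand. The sketch below only prepares the request, without sending it, to show the URL that would be used; the helper name `build_page_request` is hypothetical.

```python
import requests

def build_page_request(endpoint, page, base_url='https://python.zach.lol'):
    """Prepare (but do not send) a GET request for one page of an endpoint."""
    request = requests.Request(
        'GET',
        f'{base_url}/api/v1/{endpoint}',
        params={'page': page},   # requests encodes this as ?page=<page>
    )
    return request.prepare()

prepared = build_page_request('sales', 2)
print(prepared.url)
```

This prints `https://python.zach.lol/api/v1/sales?page=2`. Sending the request for real would be `requests.get(url, params={'page': 2})`.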



In [18]:
response = requests.get('https://python.zach.lol/api/v1/items')

data = response.json()
data.keys()

dict_keys(['payload', 'status'])

In [19]:
data['payload'].keys()

dict_keys(['items', 'max_page', 'next_page', 'page', 'previous_page'])

In [20]:
data['payload']['items']

[{'item_brand': 'Riceland',
  'item_id': 1,
  'item_name': 'Riceland American Jazmine Rice',
  'item_price': 0.84,
  'item_upc12': '35200264013',
  'item_upc14': '35200264013'},
 {'item_brand': 'Caress',
  'item_id': 2,
  'item_name': 'Caress Velvet Bliss Ultra Silkening Beauty Bar - 6 Ct',
  'item_price': 6.44,
  'item_upc12': '11111065925',
  'item_upc14': '11111065925'},
 {'item_brand': 'Earths Best',
  'item_id': 3,
  'item_name': 'Earths Best Organic Fruit Yogurt Smoothie Mixed Berry',
  'item_price': 2.43,
  'item_upc12': '23923330139',
  'item_upc14': '23923330139'},
 {'item_brand': 'Boars Head',
  'item_id': 4,
  'item_name': 'Boars Head Sliced White American Cheese - 120 Ct',
  'item_price': 3.14,
  'item_upc12': '208528800007',
  'item_upc14': '208528800007'},
 {'item_brand': 'Back To Nature',
  'item_id': 5,
  'item_name': 'Back To Nature Gluten Free White Cheddar Rice Thin Crackers',
  'item_price': 2.61,
  'item_upc12': '759283100036',
  'item_upc14': '759283100036'},
 {'i

In [21]:
print('max_page: %s' % data['payload']['max_page'])
print('next_page: %s' % data['payload']['next_page'])

max_page: 3
next_page: /api/v1/items?page=2
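
Because each payload reports its own `next_page` (and `None` on the last page), the pages can be followed in a loop rather than hard-coding the page count. A minimal sketch, assuming the records always sit under a key named after the endpoint, as the payloads in this notebook show; `fetch_json` and `get_all_records` are hypothetical names.

```python
import requests

BASE_URL = 'https://python.zach.lol'

def fetch_json(path):
    """GET BASE_URL + path and return the decoded JSON body."""
    response = requests.get(BASE_URL + path)
    response.raise_for_status()   # fail loudly on HTTP errors
    return response.json()

def get_all_records(endpoint, fetch=fetch_json):
    """Collect the records from every page of 'items', 'stores', or 'sales'."""
    records = []
    path = f'/api/v1/{endpoint}'
    while path is not None:
        payload = fetch(path)['payload']
        records.extend(payload[endpoint])   # records live under the endpoint's name
        path = payload['next_page']         # None once the last page is reached
    return records
```

The `fetch` argument exists so the loop can be exercised with a stand-in function instead of the live API; `pd.DataFrame(get_all_records('items'))` would then build the full dataframe.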


In [22]:
data['payload']['items'][:2]

[{'item_brand': 'Riceland',
  'item_id': 1,
  'item_name': 'Riceland American Jazmine Rice',
  'item_price': 0.84,
  'item_upc12': '35200264013',
  'item_upc14': '35200264013'},
 {'item_brand': 'Caress',
  'item_id': 2,
  'item_name': 'Caress Velvet Bliss Ultra Silkening Beauty Bar - 6 Ct',
  'item_price': 6.44,
  'item_upc12': '11111065925',
  'item_upc14': '11111065925'}]

In [23]:
df = pd.DataFrame(data['payload']['items'])
df.head()

Unnamed: 0,item_brand,item_id,item_name,item_price,item_upc12,item_upc14
0,Riceland,1,Riceland American Jazmine Rice,0.84,35200264013,35200264013
1,Caress,2,Caress Velvet Bliss Ultra Silkening Beauty Bar...,6.44,11111065925,11111065925
2,Earths Best,3,Earths Best Organic Fruit Yogurt Smoothie Mixe...,2.43,23923330139,23923330139
3,Boars Head,4,Boars Head Sliced White American Cheese - 120 Ct,3.14,208528800007,208528800007
4,Back To Nature,5,Back To Nature Gluten Free White Cheddar Rice ...,2.61,759283100036,759283100036


In [24]:
df

Unnamed: 0,item_brand,item_id,item_name,item_price,item_upc12,item_upc14
0,Riceland,1,Riceland American Jazmine Rice,0.84,35200264013,35200264013
1,Caress,2,Caress Velvet Bliss Ultra Silkening Beauty Bar...,6.44,11111065925,11111065925
2,Earths Best,3,Earths Best Organic Fruit Yogurt Smoothie Mixe...,2.43,23923330139,23923330139
3,Boars Head,4,Boars Head Sliced White American Cheese - 120 Ct,3.14,208528800007,208528800007
4,Back To Nature,5,Back To Nature Gluten Free White Cheddar Rice ...,2.61,759283100036,759283100036
5,Sally Hansen,6,Sally Hansen Nail Color Magnetic 903 Silver El...,6.93,74170388732,74170388732
6,Twinings Of London,7,Twinings Of London Classics Lady Grey Tea - 20 Ct,9.64,70177154004,70177154004
7,Lea & Perrins,8,Lea & Perrins Marinade In-a-bag Cracked Pepper...,1.68,51600080015,51600080015
8,Van De Kamps,9,Van De Kamps Fillets Beer Battered - 10 Ct,1.79,19600923015,19600923015
9,Ahold,10,Ahold Cocoa Almonds,3.17,688267141676,688267141676


In [25]:
response = requests.get(base_url + data['payload']['next_page'])
data = response.json()

print('max_page: %s' % data['payload']['max_page'])
print('next_page: %s' % data['payload']['next_page'])

# ignore_index=True renumbers the rows without adding an extra 'index' column
df = pd.concat([df, pd.DataFrame(data['payload']['items'])], ignore_index=True)

max_page: 3
next_page: /api/v1/items?page=3


In [26]:
response = requests.get(base_url + data['payload']['next_page'])
data = response.json()

print('max_page: %s' % data['payload']['max_page'])
print('next_page: %s' % data['payload']['next_page'])

# ignore_index=True renumbers the rows without adding an extra 'index' column
df = pd.concat([df, pd.DataFrame(data['payload']['items'])], ignore_index=True)

max_page: 3
next_page: None


In [27]:
df.shape

(50, 6)
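
How the index is handled when stacking pages determines whether extra columns show up in the result. A small sketch with made-up two-row frames standing in for API pages:

```python
import pandas as pd

# Made-up stand-ins for two pages of results
page1 = pd.DataFrame({'item_id': [1, 2]})
page2 = pd.DataFrame({'item_id': [3, 4]})

# ignore_index=True renumbers rows 0..n-1 and adds no columns
stacked = pd.concat([page1, page2], ignore_index=True)

# reset_index() without drop=True keeps the old index as a new 'index' column
with_extra = pd.concat([page1, page2]).reset_index()

print(stacked.shape)      # (4, 1)
print(with_extra.shape)   # (4, 2)
```

Calling `reset_index()` once per page therefore adds one column per call, which is why `ignore_index=True` (or `reset_index(drop=True)`) is usually the safer choice here.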

In [29]:
base_url = 'https://python.zach.lol'
response = requests.get(base_url)

In [30]:
# peek what's inside
response.text

'{"api":"/api/v1","help":"/documentation"}\n'

In [31]:
# peek help
response = requests.get(base_url+'/documentation')
from pprint import pprint
pprint(response.text)

('{"payload":"\\nThe API accepts GET requests for all endpoints, where '
 'endpoints are prefixed\\nwith\\n\\n    /api/{version}\\n\\nWhere version is '
 '\\"v1\\"\\n\\nValid endpoints:\\n\\n- /stores[/{store_id}]\\n- '
 '/items[/{item_id}]\\n- /sales[/{sale_id}]\\n\\nAll endpoints accept a `page` '
 'parameter that can be used to navigate through\\nthe '
 'results.\\n","status":"ok"}\n')


In [28]:
response = requests.get('https://python.zach.lol/api/v1/items')

In [29]:
response.json()

{'payload': {'items': [{'item_brand': 'Riceland',
    'item_id': 1,
    'item_name': 'Riceland American Jazmine Rice',
    'item_price': 0.84,
    'item_upc12': '35200264013',
    'item_upc14': '35200264013'},
   {'item_brand': 'Caress',
    'item_id': 2,
    'item_name': 'Caress Velvet Bliss Ultra Silkening Beauty Bar - 6 Ct',
    'item_price': 6.44,
    'item_upc12': '11111065925',
    'item_upc14': '11111065925'},
   {'item_brand': 'Earths Best',
    'item_id': 3,
    'item_name': 'Earths Best Organic Fruit Yogurt Smoothie Mixed Berry',
    'item_price': 2.43,
    'item_upc12': '23923330139',
    'item_upc14': '23923330139'},
   {'item_brand': 'Boars Head',
    'item_id': 4,
    'item_name': 'Boars Head Sliced White American Cheese - 120 Ct',
    'item_price': 3.14,
    'item_upc12': '208528800007',
    'item_upc14': '208528800007'},
   {'item_brand': 'Back To Nature',
    'item_id': 5,
    'item_name': 'Back To Nature Gluten Free White Cheddar Rice Thin Crackers',
    'item_price':

In [31]:
data = response.json()
data

{'payload': {'items': [{'item_brand': 'Riceland',
    'item_id': 1,
    'item_name': 'Riceland American Jazmine Rice',
    'item_price': 0.84,
    'item_upc12': '35200264013',
    'item_upc14': '35200264013'},
   {'item_brand': 'Caress',
    'item_id': 2,
    'item_name': 'Caress Velvet Bliss Ultra Silkening Beauty Bar - 6 Ct',
    'item_price': 6.44,
    'item_upc12': '11111065925',
    'item_upc14': '11111065925'},
   {'item_brand': 'Earths Best',
    'item_id': 3,
    'item_name': 'Earths Best Organic Fruit Yogurt Smoothie Mixed Berry',
    'item_price': 2.43,
    'item_upc12': '23923330139',
    'item_upc14': '23923330139'},
   {'item_brand': 'Boars Head',
    'item_id': 4,
    'item_name': 'Boars Head Sliced White American Cheese - 120 Ct',
    'item_price': 3.14,
    'item_upc12': '208528800007',
    'item_upc14': '208528800007'},
   {'item_brand': 'Back To Nature',
    'item_id': 5,
    'item_name': 'Back To Nature Gluten Free White Cheddar Rice Thin Crackers',
    'item_price':

In [32]:
data.keys()

dict_keys(['payload', 'status'])

In [33]:
data['payload'].keys()

dict_keys(['items', 'max_page', 'next_page', 'page', 'previous_page'])

In [34]:
data['payload']

{'items': [{'item_brand': 'Riceland',
   'item_id': 1,
   'item_name': 'Riceland American Jazmine Rice',
   'item_price': 0.84,
   'item_upc12': '35200264013',
   'item_upc14': '35200264013'},
  {'item_brand': 'Caress',
   'item_id': 2,
   'item_name': 'Caress Velvet Bliss Ultra Silkening Beauty Bar - 6 Ct',
   'item_price': 6.44,
   'item_upc12': '11111065925',
   'item_upc14': '11111065925'},
  {'item_brand': 'Earths Best',
   'item_id': 3,
   'item_name': 'Earths Best Organic Fruit Yogurt Smoothie Mixed Berry',
   'item_price': 2.43,
   'item_upc12': '23923330139',
   'item_upc14': '23923330139'},
  {'item_brand': 'Boars Head',
   'item_id': 4,
   'item_name': 'Boars Head Sliced White American Cheese - 120 Ct',
   'item_price': 3.14,
   'item_upc12': '208528800007',
   'item_upc14': '208528800007'},
  {'item_brand': 'Back To Nature',
   'item_id': 5,
   'item_name': 'Back To Nature Gluten Free White Cheddar Rice Thin Crackers',
   'item_price': 2.61,
   'item_upc12': '759283100036',

In [35]:
items = pd.DataFrame(data['payload']['items'])
items.head()

Unnamed: 0,item_brand,item_id,item_name,item_price,item_upc12,item_upc14
0,Riceland,1,Riceland American Jazmine Rice,0.84,35200264013,35200264013
1,Caress,2,Caress Velvet Bliss Ultra Silkening Beauty Bar...,6.44,11111065925,11111065925
2,Earths Best,3,Earths Best Organic Fruit Yogurt Smoothie Mixe...,2.43,23923330139,23923330139
3,Boars Head,4,Boars Head Sliced White American Cheese - 120 Ct,3.14,208528800007,208528800007
4,Back To Nature,5,Back To Nature Gluten Free White Cheddar Rice ...,2.61,759283100036,759283100036


In [36]:
response = requests.get(base_url + data['payload']['next_page'])
data = response.json()

print('max_page: %s' % data['payload']['max_page'])
print('next_page: %s' % data['payload']['next_page'])

# ignore_index=True renumbers the rows without adding an extra 'index' column
items = pd.concat([items, pd.DataFrame(data['payload']['items'])], ignore_index=True)

max_page: 3
next_page: /api/v1/items?page=3


In [37]:
response = requests.get(base_url + data['payload']['next_page'])
data = response.json()

print('max_page: %s' % data['payload']['max_page'])
print('next_page: %s' % data['payload']['next_page'])

# ignore_index=True renumbers the rows without adding an extra 'index' column
items = pd.concat([items, pd.DataFrame(data['payload']['items'])], ignore_index=True)

max_page: 3
next_page: None


In [38]:
response = requests.get('https://python.zach.lol/api/v1/stores')

In [39]:
store_data = response.json()

In [40]:
store_data.keys()

dict_keys(['payload', 'status'])

In [41]:
# max_page = 1
store_data['payload']

{'max_page': 1,
 'next_page': None,
 'page': 1,
 'previous_page': None,
 'stores': [{'store_address': '12125 Alamo Ranch Pkwy',
   'store_city': 'San Antonio',
   'store_id': 1,
   'store_state': 'TX',
   'store_zipcode': '78253'},
  {'store_address': '9255 FM 471 West',
   'store_city': 'San Antonio',
   'store_id': 2,
   'store_state': 'TX',
   'store_zipcode': '78251'},
  {'store_address': '2118 Fredericksburg Rdj',
   'store_city': 'San Antonio',
   'store_id': 3,
   'store_state': 'TX',
   'store_zipcode': '78201'},
  {'store_address': '516 S Flores St',
   'store_city': 'San Antonio',
   'store_id': 4,
   'store_state': 'TX',
   'store_zipcode': '78204'},
  {'store_address': '1520 Austin Hwy',
   'store_city': 'San Antonio',
   'store_id': 5,
   'store_state': 'TX',
   'store_zipcode': '78218'},
  {'store_address': '1015 S WW White Rd',
   'store_city': 'San Antonio',
   'store_id': 6,
   'store_state': 'TX',
   'store_zipcode': '78220'},
  {'store_address': '12018 Perrin Beitel 

In [42]:
store = pd.DataFrame(store_data['payload']['stores'])

In [43]:
store

Unnamed: 0,store_address,store_city,store_id,store_state,store_zipcode
0,12125 Alamo Ranch Pkwy,San Antonio,1,TX,78253
1,9255 FM 471 West,San Antonio,2,TX,78251
2,2118 Fredericksburg Rdj,San Antonio,3,TX,78201
3,516 S Flores St,San Antonio,4,TX,78204
4,1520 Austin Hwy,San Antonio,5,TX,78218
5,1015 S WW White Rd,San Antonio,6,TX,78220
6,12018 Perrin Beitel Rd,San Antonio,7,TX,78217
7,15000 San Pedro Ave,San Antonio,8,TX,78232
8,735 SW Military Dr,San Antonio,9,TX,78221
9,8503 NW Military Hwy,San Antonio,10,TX,78231


In [8]:
import requests
import pandas as pd
response = requests.get('https://python.zach.lol/api/v1/sales?page=2')
sale_data = response.json()

In [9]:
sale_data

{'payload': {'max_page': 183,
  'next_page': '/api/v1/sales?page=3',
  'page': 2,
  'previous_page': '/api/v1/sales?page=1',
  'sales': [{'item': 1,
    'sale_amount': 33.0,
    'sale_date': 'Sat, 10 Sep 2016 00:00:00 GMT',
    'sale_id': 5001,
    'store': 3},
   {'item': 1,
    'sale_amount': 27.0,
    'sale_date': 'Sun, 11 Sep 2016 00:00:00 GMT',
    'sale_id': 5002,
    'store': 3},
   {'item': 1,
    'sale_amount': 26.0,
    'sale_date': 'Mon, 12 Sep 2016 00:00:00 GMT',
    'sale_id': 5003,
    'store': 3},
   {'item': 1,
    'sale_amount': 22.0,
    'sale_date': 'Tue, 13 Sep 2016 00:00:00 GMT',
    'sale_id': 5004,
    'store': 3},
   {'item': 1,
    'sale_amount': 25.0,
    'sale_date': 'Wed, 14 Sep 2016 00:00:00 GMT',
    'sale_id': 5005,
    'store': 3},
   {'item': 1,
    'sale_amount': 22.0,
    'sale_date': 'Thu, 15 Sep 2016 00:00:00 GMT',
    'sale_id': 5006,
    'store': 3},
   {'item': 1,
    'sale_amount': 35.0,
    'sale_date': 'Fri, 16 Sep 2016 00:00:00 GMT',
    'sal

In [128]:
sales = pd.DataFrame(sale_data['payload']['sales'])
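
The `sale_date` values arrive as strings such as `'Sat, 10 Sep 2016 00:00:00 GMT'`. A minimal sketch of converting them to datetimes with an explicit format string, using two rows copied from the payload above; the frame name `sample` is illustrative.

```python
import pandas as pd

# Two rows copied from the page-2 payload shown above
sample = pd.DataFrame({
    'sale_date': ['Sat, 10 Sep 2016 00:00:00 GMT', 'Sun, 11 Sep 2016 00:00:00 GMT'],
    'sale_amount': [33.0, 27.0],
})

# An explicit format avoids per-row format guessing; ' GMT' is matched literally
sample['sale_date'] = pd.to_datetime(
    sample['sale_date'], format='%a, %d %b %Y %H:%M:%S GMT'
)
```

Supplying the format tends to matter for speed on a frame the size of the full sales data.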

In [130]:
response = requests.get(base_url + sale_data['payload']['next_page'])

In [3]:
sale_data['payload']['page']

1

In [None]:
# Example URL for the second page of sales:
# https://python.zach.lol/api/v1/sales?page=2

In [3]:
# build the query string for every page of sales (max_page is 183)
for page in range(1, 184):
    query = '?page=' + str(page)
    print(query)

?page=1
?page=2
?page=3
...
?page=183

In [12]:
def acquire(url):
    import requests
    import pandas as pd
    pages = []  # one DataFrame per page, created once outside the loop
    for page in range(1, 184):  # the sales endpoint reports max_page 183
        response = requests.get(url + '?page=' + str(page))
        pages.append(pd.DataFrame(response.json()['payload']['sales']))
    # stack all pages; reset_index() keeps each row's per-page position as an 'index' column
    return pd.concat(pages).reset_index()

In [8]:
sales = acquire('https://python.zach.lol/api/v1/sales')
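
Fetching every page of sales is slow, so exercise 4 asks for a local CSV copy. A minimal sketch of that caching step; the helper name `cached_csv` and the idea of passing the acquire step as a function are assumptions, not part of the exercise.

```python
import os
import pandas as pd

def cached_csv(filename, acquire_fn):
    """Return the DataFrame saved in filename, calling acquire_fn only if the file is missing."""
    if os.path.exists(filename):
        return pd.read_csv(filename)
    df = acquire_fn()
    # index=False keeps the row index out of the file, so reloading adds no 'Unnamed: 0' column
    df.to_csv(filename, index=False)
    return df
```

With the function above, `cached_csv('sales.csv', lambda: acquire('https://python.zach.lol/api/v1/sales'))` would hit the API only on the first call.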

In [69]:
sales.rename(columns = {'store':'store_id'}, inplace = True)

In [70]:
sales.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 915000 entries, 0 to 914999
Data columns (total 6 columns):
index          915000 non-null int64
item           915000 non-null int64
sale_amount    915000 non-null float64
sale_date      915000 non-null object
sale_id        915000 non-null int64
store_id       915000 non-null int64
dtypes: float64(1), int64(4), object(1)
memory usage: 41.9+ MB


In [74]:
store.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 5 columns):
store_address    10 non-null object
store_city       10 non-null object
store_id         10 non-null int64
store_state      10 non-null object
store_zipcode    10 non-null object
dtypes: int64(1), object(4)
memory usage: 480.0+ bytes


In [71]:
sales_store = sales.join(store.set_index('store_id'), on='store_id')
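
The join above attaches store details to each sale; exercise 5 also needs the item details. A sketch of combining all three frames with `merge`, assuming the key columns are named as in this notebook (`store`/`store_id` and `item`/`item_id`). The function name `combine` and the tiny sample frames are made up for illustration.

```python
import pandas as pd

def combine(sales, stores, items):
    """Left-join store and item details onto each sale."""
    # rename ignores labels that are absent, so an already-renamed frame is fine
    sales = sales.rename(columns={'store': 'store_id', 'item': 'item_id'})
    combined = sales.merge(stores, on='store_id', how='left')
    return combined.merge(items, on='item_id', how='left')

# Made-up miniature frames with the same key columns as the real data
sales_sample = pd.DataFrame({'item': [1, 2], 'store': [1, 1], 'sale_amount': [13.0, 11.0]})
stores_sample = pd.DataFrame({'store_id': [1], 'store_city': ['San Antonio']})
items_sample = pd.DataFrame({'item_id': [1, 2], 'item_brand': ['Riceland', 'Caress']})

combined = combine(sales_sample, stores_sample, items_sample)
```

A left join keeps every sale even if a store or item id has no match, which makes missing keys visible as NaN rather than silently dropping rows.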

In [93]:
sales

Unnamed: 0,index,item,sale_amount,sale_date,sale_id,store_id
0,0,1,13.0,"Tue, 01 Jan 2013 00:00:00 GMT",1,1
1,1,1,11.0,"Wed, 02 Jan 2013 00:00:00 GMT",2,1
2,2,1,14.0,"Thu, 03 Jan 2013 00:00:00 GMT",3,1
3,3,1,13.0,"Fri, 04 Jan 2013 00:00:00 GMT",4,1
4,4,1,10.0,"Sat, 05 Jan 2013 00:00:00 GMT",5,1
5,5,1,12.0,"Sun, 06 Jan 2013 00:00:00 GMT",6,1
6,6,1,10.0,"Mon, 07 Jan 2013 00:00:00 GMT",7,1
7,7,1,9.0,"Tue, 08 Jan 2013 00:00:00 GMT",8,1
8,8,1,12.0,"Wed, 09 Jan 2013 00:00:00 GMT",9,1
9,9,1,9.0,"Thu, 10 Jan 2013 00:00:00 GMT",10,1


In [91]:
items.item_id.nunique()

50
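
Counting unique ids is one sanity check before joining; another is confirming that every item id used in sales exists in items. A small sketch, with the hypothetical helper `unmatched_ids` and the column names used in this notebook as defaults:

```python
import pandas as pd

def unmatched_ids(sales, items, sales_key='item', items_key='item_id'):
    """Return the sorted ids that appear in sales but not in items."""
    return sorted(set(sales[sales_key]) - set(items[items_key]))
```

An empty list means every sale can be matched to an item.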
