# Acquire Exercises:

### The end result of this exercise should be a file named `acquire.py`.

1. Using the code from the lesson as a guide and the REST API from https://python.zgulde.net/api/v1/items as we did in the lesson, create a dataframe named `items` that has all of the data for items.

1. Do the same thing, but for `stores` (https://python.zgulde.net/api/v1/stores)

1. Extract the data for `sales` (https://python.zgulde.net/api/v1/sales). There are a lot of pages of data here, so your code will need to be a little more complex. Your code should continue fetching data from the next page until all of the data is extracted.

1. Save the data in your dataframes to local CSV files so that it will be faster to access in the future.

1. Combine the data from your three separate dataframes into one large dataframe. 

1. Acquire the Open Power Systems Data for Germany, which has been rapidly expanding its renewable energy production in recent years. The data set includes country-wide totals of electricity consumption, wind power production, and solar power production for 2006-2017. You can get the data here: https://raw.githubusercontent.com/jenfly/opsd/master/opsd_germany_daily.csv

1. Make sure all the work that you have done above is reproducible. That is, you should put the code above into separate functions in the `acquire.py` file and be able to re-run the functions and get the same data.

In [1]:
#Imports needed to acquire data
import requests

import pandas as pd

In [2]:
base_url = 'https://python.zgulde.net'
print(requests.get(base_url).text)

{"api":"/api/v1","help":"/documentation"}



In [4]:
#requesting the documentation:
response = requests.get(base_url + '/documentation')
print(response.json())

{'payload': '\nThe API accepts GET requests for all endpoints, where endpoints are prefixed\nwith\n\n    /api/{version}\n\nWhere version is "v1"\n\nValid endpoints:\n\n- /stores[/{store_id}]\n- /items[/{item_id}]\n- /sales[/{sale_id}]\n\nAll endpoints accept a `page` parameter that can be used to navigate through\nthe results.\n', 'status': 'ok'}


In [5]:
# The response is a dict with a 'payload' key; index into it to print the documentation text:
response = requests.get(base_url + '/documentation')
print(response.json()['payload'])


The API accepts GET requests for all endpoints, where endpoints are prefixed
with

    /api/{version}

Where version is "v1"

Valid endpoints:

- /stores[/{store_id}]
- /items[/{item_id}]
- /sales[/{sale_id}]

All endpoints accept a `page` parameter that can be used to navigate through
the results.
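
Since the documentation says every endpoint accepts a `page` parameter, a specific page can be requested by passing it as a query parameter. The sketch below only builds the URL, without sending anything, to show what `requests` would request. The `page_url` helper is a hypothetical name used for illustration.

```python
import requests

base_url = 'https://python.zgulde.net'

def page_url(endpoint, page):
    """Build the URL for one page of an endpoint without sending the request."""
    request = requests.Request('GET', base_url + endpoint, params={'page': page})
    return request.prepare().url

print(page_url('/api/v1/items', 2))
# prints: https://python.zgulde.net/api/v1/items?page=2

# To actually fetch that page (network call):
# response = requests.get(base_url + '/api/v1/items', params={'page': 2})
```

Letting `requests` encode the query string avoids assembling `'?page=' + str(i)` by hand.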



In [7]:
# Following the documentation: endpoints accept GET requests and are prefixed with /api/{version} (v1 here), followed by a valid endpoint
response = requests.get(base_url + '/api/v1/items')
response.ok

True

In [11]:
data = response.json()
data.keys()

dict_keys(['payload', 'status'])

In [15]:
data['payload'].keys()

dict_keys(['items', 'max_page', 'next_page', 'page', 'previous_page'])

In [16]:
#discovering the max page and where we are in navigation of the pages:
print('max_page: %s' % data['payload']['max_page'])
print('next_page: %s' % data['payload']['next_page'])
print('page: %s' % data['payload']['page'])
print('previous_page: %s' % data['payload']['previous_page'])

max_page: 3
next_page: /api/v1/items?page=2
page: 1
previous_page: None


In [14]:
#look at the first 2 items in the dictionary
data['payload']['items'][:2]

[{'item_brand': 'Riceland',
  'item_id': 1,
  'item_name': 'Riceland American Jazmine Rice',
  'item_price': 0.84,
  'item_upc12': '35200264013',
  'item_upc14': '35200264013'},
 {'item_brand': 'Caress',
  'item_id': 2,
  'item_name': 'Caress Velvet Bliss Ultra Silkening Beauty Bar - 6 Ct',
  'item_price': 6.44,
  'item_upc12': '11111065925',
  'item_upc14': '11111065925'}]

In [17]:
# create a dataframe from the payload items

df = pd.DataFrame(data['payload']['items'])
df.head()

|   | item_brand | item_id | item_name | item_price | item_upc12 | item_upc14 |
|---|---|---|---|---|---|---|
| 0 | Riceland | 1 | Riceland American Jazmine Rice | 0.84 | 35200264013 | 35200264013 |
| 1 | Caress | 2 | Caress Velvet Bliss Ultra Silkening Beauty Bar... | 6.44 | 11111065925 | 11111065925 |
| 2 | Earths Best | 3 | Earths Best Organic Fruit Yogurt Smoothie Mixe... | 2.43 | 23923330139 | 23923330139 |
| 3 | Boars Head | 4 | Boars Head Sliced White American Cheese - 120 Ct | 3.14 | 208528800007 | 208528800007 |
| 4 | Back To Nature | 5 | Back To Nature Gluten Free White Cheddar Rice ... | 2.61 | 759283100036 | 759283100036 |


In [18]:
response = requests.get(base_url + data['payload']['next_page'])
data = response.json()

print('max_page: %s' % data['payload']['max_page'])
print('next_page: %s' % data['payload']['next_page'])

# reset_index() without drop=True keeps the old index as an extra `index` column
df = pd.concat([df, pd.DataFrame(data['payload']['items'])]).reset_index()

max_page: 3
next_page: /api/v1/items?page=3


In [19]:
response = requests.get(base_url + data['payload']['next_page'])
data = response.json()

print('max_page: %s' % data['payload']['max_page'])
print('next_page: %s' % data['payload']['next_page'])

# a second reset_index() adds one more extra column, `level_0`
df = pd.concat([df, pd.DataFrame(data['payload']['items'])]).reset_index()

max_page: 3
next_page: None


In [20]:
df.shape

(50, 8)

In [21]:
items = df
items.shape

(50, 8)
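
The manual steps above (request a page, append its rows, follow `next_page`) can be folded into one loop. What follows is a minimal sketch rather than the lesson's own code: `get_json` and `fetch_all` are hypothetical names, and `fetch` is a parameter added here so the loop can be tried without the network. Building the dataframe once from a list of rows also avoids the two extra columns (`index` and `level_0`) that the repeated `reset_index()` calls add, which is why the shape above is `(50, 8)` rather than `(50, 6)`.

```python
import pandas as pd
import requests

base_url = 'https://python.zgulde.net'

def get_json(path):
    """Request one page from the API and return the parsed JSON body."""
    response = requests.get(base_url + path)
    response.raise_for_status()
    return response.json()

def fetch_all(path, key, fetch=get_json):
    """Follow `next_page` links starting at `path`, collecting payload[key] rows."""
    rows = []
    while path is not None:
        payload = fetch(path)['payload']
        rows.extend(payload[key])
        path = payload['next_page']  # None on the last page, which ends the loop
    return pd.DataFrame(rows)

# Usage (makes network calls):
# items = fetch_all('/api/v1/items', 'items')
# stores = fetch_all('/api/v1/stores', 'stores')
```

Because the loop stops when `next_page` is `None`, the same function should work for the single-page `stores` endpoint and for the three-page `items` endpoint.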

### Now we have created a df named items.  We will repeat the steps above to create a df for "stores".

In [24]:
response = requests.get(base_url + '/api/v1/stores')
response.ok

True

In [25]:
data = response.json()
data.keys()

dict_keys(['payload', 'status'])

In [26]:
data['payload'].keys()

dict_keys(['max_page', 'next_page', 'page', 'previous_page', 'stores'])

### Notice the keys look similar, but now have 'stores' instead.

In [28]:
#discovering the max page and where we are in navigation of the pages:
print('max_page: %s' % data['payload']['max_page'])
print('next_page: %s' % data['payload']['next_page'])
print('page: %s' % data['payload']['page'])
print('previous_page: %s' % data['payload']['previous_page'])

max_page: 1
next_page: None
page: 1
previous_page: None


#### There is only one page of data here.

In [32]:
# look at the first 2 stores in the payload
data['payload']['stores'][:2]

[{'store_address': '12125 Alamo Ranch Pkwy',
  'store_city': 'San Antonio',
  'store_id': 1,
  'store_state': 'TX',
  'store_zipcode': '78253'},
 {'store_address': '9255 FM 471 West',
  'store_city': 'San Antonio',
  'store_id': 2,
  'store_state': 'TX',
  'store_zipcode': '78251'}]

In [30]:
# create a dataframe from the payload stores

df = pd.DataFrame(data['payload']['stores'])
df.head()

|   | store_address | store_city | store_id | store_state | store_zipcode |
|---|---|---|---|---|---|
| 0 | 12125 Alamo Ranch Pkwy | San Antonio | 1 | TX | 78253 |
| 1 | 9255 FM 471 West | San Antonio | 2 | TX | 78251 |
| 2 | 2118 Fredericksburg Rdj | San Antonio | 3 | TX | 78201 |
| 3 | 516 S Flores St | San Antonio | 4 | TX | 78204 |
| 4 | 1520 Austin Hwy | San Antonio | 5 | TX | 78218 |


### Right away I can see there are typos in these entries (look at the 3rd record: "Rdj" instead of "Rd"). Something to keep in mind for the future.

In [35]:
df.shape

(10, 5)

In [37]:
stores = df
stores.shape

(10, 5)

### Now I have two dataframes: items and stores.
### I will repeat the above for the "sales".

In [46]:
# Following the documentation: endpoints accept GET requests and are prefixed with /api/{version} (v1 here), followed by a valid endpoint
response = requests.get(base_url + '/api/v1/sales')
response.ok

True

In [47]:
data = response.json()
data.keys()

dict_keys(['payload', 'status'])

In [48]:
data['payload'].keys()

dict_keys(['max_page', 'next_page', 'page', 'previous_page', 'sales'])

In [49]:
#discovering the max page and where we are in navigation of the pages:
print('max_page: %s' % data['payload']['max_page'])
print('next_page: %s' % data['payload']['next_page'])
print('page: %s' % data['payload']['page'])
print('previous_page: %s' % data['payload']['previous_page'])

max_page: 183
next_page: /api/v1/sales?page=2
page: 1
previous_page: None


In [50]:
base_url

'https://python.zgulde.net'

In [55]:
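suffix = '/api/v1/sales'

sales_list = []
data = response.json()
n = data['payload']['max_page']

# request each page in turn and append its sales to one list
# NOTE: `data` is not re-assigned inside the loop, so every pass appends
# the page that was parsed above rather than the page just requested
for i in range(1,n+1):
    url = base_url + suffix + '?page=' + str(i)
    response = requests.get(url)
    sales_page = data['payload']['sales']
    sales_list += sales_page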

In [56]:
sales = pd.DataFrame(sales_list)

In [57]:
sales.head()

|   | item | sale_amount | sale_date | sale_id | store |
|---|---|---|---|---|---|
| 0 | 25 | 106.0 | Wed, 06 Jul 2016 00:00:00 GMT | 445001 | 4 |
| 1 | 25 | 111.0 | Thu, 07 Jul 2016 00:00:00 GMT | 445002 | 4 |
| 2 | 25 | 120.0 | Fri, 08 Jul 2016 00:00:00 GMT | 445003 | 4 |
| 3 | 25 | 140.0 | Sat, 09 Jul 2016 00:00:00 GMT | 445004 | 4 |
| 4 | 25 | 152.0 | Sun, 10 Jul 2016 00:00:00 GMT | 445005 | 4 |


In [58]:
sales.shape

(915000, 5)

In [59]:
sales.to_csv("store_sales.csv")

In [70]:
sales = pd.read_csv('store_sales.csv', index_col=0)
sales.head()

|   | item | sale_amount | sale_date | sale_id | store |
|---|---|---|---|---|---|
| 0 | 25 | 106.0 | Wed, 06 Jul 2016 00:00:00 GMT | 445001 | 4 |
| 1 | 25 | 111.0 | Thu, 07 Jul 2016 00:00:00 GMT | 445002 | 4 |
| 2 | 25 | 120.0 | Fri, 08 Jul 2016 00:00:00 GMT | 445003 | 4 |
| 3 | 25 | 140.0 | Sat, 09 Jul 2016 00:00:00 GMT | 445004 | 4 |
| 4 | 25 | 152.0 | Sun, 10 Jul 2016 00:00:00 GMT | 445005 | 4 |
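
The write-then-read pattern above can be wrapped so the slow acquisition only runs when no local file exists yet. This is a sketch under stated assumptions: `cached_csv` is a hypothetical helper, and it writes with `index=False`, so the file is read back without `index_col=0` (unlike the `store_sales.csv` file written above, which includes the index).

```python
import os

import pandas as pd

def cached_csv(filename, load):
    """Return the dataframe stored in `filename`; if the file is missing,
    call `load()` to build the dataframe and save it first."""
    if os.path.exists(filename):
        return pd.read_csv(filename)
    df = load()
    df.to_csv(filename, index=False)
    return df

# Usage sketches (not run here; the file names are assumptions):
# items = cached_csv('items.csv', lambda: items)
# opsd = cached_csv(
#     'opsd_germany_daily.csv',
#     lambda: pd.read_csv('https://raw.githubusercontent.com/jenfly/opsd/master/opsd_germany_daily.csv'),
# )
```

Passing the loader as a function keeps the helper independent of where the data comes from, so the same wrapper could serve the API dataframes and the Open Power Systems CSV.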


In [63]:
stores.head()

|   | store_address | store_city | store_id | store_state | store_zipcode |
|---|---|---|---|---|---|
| 0 | 12125 Alamo Ranch Pkwy | San Antonio | 1 | TX | 78253 |
| 1 | 9255 FM 471 West | San Antonio | 2 | TX | 78251 |
| 2 | 2118 Fredericksburg Rdj | San Antonio | 3 | TX | 78201 |
| 3 | 516 S Flores St | San Antonio | 4 | TX | 78204 |
| 4 | 1520 Austin Hwy | San Antonio | 5 | TX | 78218 |


In [69]:
items.head()

|   | level_0 | index | item_brand | item_id | item_name | item_price | item_upc12 | item_upc14 |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0.0 | Riceland | 1 | Riceland American Jazmine Rice | 0.84 | 35200264013 | 35200264013 |
| 1 | 1 | 1.0 | Caress | 2 | Caress Velvet Bliss Ultra Silkening Beauty Bar... | 6.44 | 11111065925 | 11111065925 |
| 2 | 2 | 2.0 | Earths Best | 3 | Earths Best Organic Fruit Yogurt Smoothie Mixe... | 2.43 | 23923330139 | 23923330139 |
| 3 | 3 | 3.0 | Boars Head | 4 | Boars Head Sliced White American Cheese - 120 Ct | 3.14 | 208528800007 | 208528800007 |
| 4 | 4 | 4.0 | Back To Nature | 5 | Back To Nature Gluten Free White Cheddar Rice ... | 2.61 | 759283100036 | 759283100036 |


In [71]:
## Now to merge the dfs together. Look at the id columns to decide how they should be stitched together.
# sales.store matches stores.store_id, so each sale should pick up the matching row from the stores df.
# sales.item matches items.item_id, so each sale should also pick up the matching row from the items df.
# item_id is like roles.id (right_on='id'), so for us it would be left_on='item' and right_on='item_id'.
# On inspection, something went wrong in the acquisition of the sales df: every item value is 25, because the loop appended the same already-parsed page on every pass. Fix that before merging.
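
One possible way to fix the sales acquisition and then do the two merges described in the comments above. This is a sketch, not the lesson's solution: `get_page`, `get_sales`, and `combine` are hypothetical names, and the `fetch` parameter is added here only so the loop can be exercised without the network. The key change is that each page's response is parsed before its rows are appended.

```python
import pandas as pd
import requests

base_url = 'https://python.zgulde.net'

def get_page(endpoint, page):
    """Request one page of an endpoint and return its payload dict."""
    response = requests.get(base_url + endpoint, params={'page': page})
    response.raise_for_status()
    return response.json()['payload']

def get_sales(fetch=get_page):
    """Collect every page of sales, parsing each response as it arrives."""
    payload = fetch('/api/v1/sales', 1)
    rows = list(payload['sales'])
    for page in range(2, payload['max_page'] + 1):
        rows += fetch('/api/v1/sales', page)['sales']
    return pd.DataFrame(rows)

def combine(sales, stores, items):
    """Attach the matching store row and item row to every sale."""
    df = sales.merge(stores, how='left', left_on='store', right_on='store_id')
    return df.merge(items, how='left', left_on='item', right_on='item_id')

# Usage (network calls; assumes `stores` and `items` already exist):
# sales = get_sales()
# df = combine(sales, stores, items)
```

A left merge keeps one row per sale even if a store or item id has no match. If `items` still carries the extra `level_0` and `index` columns, they could be dropped first with `items.drop(columns=['level_0', 'index'])`.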