# Time Series: Data Acquisition Exercises

<hr style="border:2px solid blue"> </hr>


### 1. Using the code from the lesson as a guide and the REST API from https://python.zgulde.net/api/v1/items as we did in the lesson, create a dataframe named items that has all of the data for items.

```python
# Data Science Libraries
import pandas as pd

# New libraries for this lesson
import requests
```

```python
# Requesting the data from the website
response = requests.get('https://python.zgulde.net/api/v1/items')

# Inspecting data received
data = response.json()
data['payload']['next_page']
```

```
'/api/v1/items?page=2'
```

Since the exercise asks for *all* of the data for items, I need to grab the items from the remaining pages as well, not just the first page.
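The page-following logic can be tried out without touching the network. This is a minimal sketch that assumes the payload layout seen above: `payload[endpoint]` holds the records, and `payload['next_page']` is a relative path that becomes `None` on the last page. `iter_pages` and its injected `get_json` callable are hypothetical names. `urllib.parse.urljoin` resolves the relative `next_page` against the current URL.

```python
from urllib.parse import urljoin


def iter_pages(start_url, endpoint, get_json):
    """Yield the record list from each page, following next_page until it is None.

    get_json is any callable mapping a URL to the decoded JSON body, e.g.
    lambda url: requests.get(url).json() (an assumption for this sketch).
    """
    url = start_url
    while url is not None:
        data = get_json(url)
        yield data['payload'][endpoint]
        next_page = data['payload']['next_page']
        # next_page is relative ('/api/v1/items?page=2'); urljoin swaps in the new path
        url = urljoin(url, next_page) if next_page is not None else None


# Usage with canned payloads instead of live HTTP requests
fake_api = {
    'https://python.zgulde.net/api/v1/items': {
        'payload': {'items': [{'item_id': 1}, {'item_id': 2}], 'next_page': '/api/v1/items?page=2'}
    },
    'https://python.zgulde.net/api/v1/items?page=2': {
        'payload': {'items': [{'item_id': 3}], 'next_page': None}
    },
}
pages = list(iter_pages('https://python.zgulde.net/api/v1/items', 'items', fake_api.__getitem__))
```

You can then flatten the pages into one frame with `pd.DataFrame([r for page in pages for r in page])`.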

```python
# Starting the dataframe with the data I have so far
items = pd.DataFrame(data['payload']['items'])

# What is the shape?
items.shape
```

```
(20, 6)
```

```python
# What page is next?
data['payload']['next_page']
```

```
'/api/v1/items?page=2'
```

```python
# What does my dataframe look like so far?
items.head()
```

|   | item_brand | item_id | item_name | item_price | item_upc12 | item_upc14 |
|---|---|---|---|---|---|---|
| 0 | Riceland | 1 | Riceland American Jazmine Rice | 0.84 | 35200264013 | 35200264013 |
| 1 | Caress | 2 | Caress Velvet Bliss Ultra Silkening Beauty Bar... | 6.44 | 11111065925 | 11111065925 |
| 2 | Earths Best | 3 | Earths Best Organic Fruit Yogurt Smoothie Mixe... | 2.43 | 23923330139 | 23923330139 |
| 3 | Boars Head | 4 | Boars Head Sliced White American Cheese - 120 Ct | 3.14 | 208528800007 | 208528800007 |
| 4 | Back To Nature | 5 | Back To Nature Gluten Free White Cheddar Rice ... | 2.61 | 759283100036 | 759283100036 |


```python
# Requesting information on next_page (page 2)
response = requests.get('https://python.zgulde.net' + data['payload']['next_page'])
data = response.json()

# Inspecting pages
print('max_page: %s' % data['payload']['max_page'])
print('next_page: %s' % data['payload']['next_page'])

# Concatenating page 2 items to page 1 items
# drop=True discards the old index instead of adding it as an 'index' column
items = pd.concat([items, pd.DataFrame(data['payload']['items'])]).reset_index(drop=True)

# Shape of dataframe
print(items.shape)

# Looking at dataframe thus far
items.head()
```

```
max_page: 3
next_page: /api/v1/items?page=3
(40, 6)
```

|   | item_brand | item_id | item_name | item_price | item_upc12 | item_upc14 |
|---|---|---|---|---|---|---|
| 0 | Riceland | 1 | Riceland American Jazmine Rice | 0.84 | 35200264013 | 35200264013 |
| 1 | Caress | 2 | Caress Velvet Bliss Ultra Silkening Beauty Bar... | 6.44 | 11111065925 | 11111065925 |
| 2 | Earths Best | 3 | Earths Best Organic Fruit Yogurt Smoothie Mixe... | 2.43 | 23923330139 | 23923330139 |
| 3 | Boars Head | 4 | Boars Head Sliced White American Cheese - 120 Ct | 3.14 | 208528800007 | 208528800007 |
| 4 | Back To Nature | 5 | Back To Nature Gluten Free White Cheddar Rice ... | 2.61 | 759283100036 | 759283100036 |

```python
# Requesting information on next_page (page 3)
response = requests.get('https://python.zgulde.net' + data['payload']['next_page'])
data = response.json()

# Inspecting pages
print('max_page: %s' % data['payload']['max_page'])
print('next_page: %s' % data['payload']['next_page'])

# Concatenating page 3 items to pages 1 & 2 items
items = pd.concat([items, pd.DataFrame(data['payload']['items'])]).reset_index(drop=True)

# Shape of dataframe
print(items.shape)
```

```
max_page: 3
next_page: None
(50, 6)
```

Page 3 is the last page, so `next_page` is now `None` and all 50 items are in `items`. Running this cell a second time fails with `TypeError: can only concatenate str (not "NoneType") to str`, because the URL is then built by adding `None` to a string. Without `drop=True`, each `reset_index()` call also keeps the old index as an extra column, which is how an earlier run ended up with shape `(50, 8)`. Instead of copying the cell once per page, a loop that stops when `next_page` is `None` handles any number of pages:

```python
import pandas as pd
import requests

BASE_URL = 'https://python.zgulde.net'


def grab_this(endpoint):
    '''Fetch every page of /api/v1/{endpoint} and return one DataFrame.'''
    # Requesting the first page of data from the website
    response = requests.get(f'{BASE_URL}/api/v1/{endpoint}')
    response.raise_for_status()
    data = response.json()

    # Collecting each page in a list and concatenating once at the end,
    # instead of re-copying the growing dataframe on every iteration
    frames = [pd.DataFrame(data['payload'][endpoint])]

    # Looping through remaining pages until next_page is None
    next_page = data['payload']['next_page']
    while next_page is not None:
        # next_page is a relative path such as '/api/v1/items?page=2'
        response = requests.get(BASE_URL + next_page)
        response.raise_for_status()
        data = response.json()
        frames.append(pd.DataFrame(data['payload'][endpoint]))
        next_page = data['payload']['next_page']

    return pd.concat(frames, ignore_index=True)
```


```python
# Testing my new function
test = grab_this('items')
test.shape
```

```
(50, 6)
```

```python
# Success!
test.head()
```

|   | item_brand | item_id | item_name | item_price | item_upc12 | item_upc14 |
|---|---|---|---|---|---|---|
| 0 | Riceland | 1 | Riceland American Jazmine Rice | 0.84 | 35200264013 | 35200264013 |
| 1 | Caress | 2 | Caress Velvet Bliss Ultra Silkening Beauty Bar... | 6.44 | 11111065925 | 11111065925 |
| 2 | Earths Best | 3 | Earths Best Organic Fruit Yogurt Smoothie Mixe... | 2.43 | 23923330139 | 23923330139 |
| 3 | Boars Head | 4 | Boars Head Sliced White American Cheese - 120 Ct | 3.14 | 208528800007 | 208528800007 |
| 4 | Back To Nature | 5 | Back To Nature Gluten Free White Cheddar Rice ... | 2.61 | 759283100036 | 759283100036 |


### 2. Do the same thing, but for stores (https://python.zgulde.net/api/v1/stores)



```python
stores = grab_this('stores')
stores.shape
```

```
(10, 5)
```

### 3. Extract the data for sales (https://python.zgulde.net/api/v1/sales). There are a lot of pages of data here, so your code will need to be a little more complex. Your code should continue fetching data from the next page until all of the data is extracted.

```python
sales = grab_this('sales')
sales.shape
```

```
(913000, 5)
```
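At the scale of `sales`, two details matter more: reusing one HTTP connection across requests, and confirming that every page was actually fetched. This is a minimal sketch. `fetch_all_pages` is a hypothetical name, and it assumes that `max_page` (printed earlier for items) is the total page count the API reports. A `requests.Session` lets consecutive page requests reuse the same connection, and the `session` parameter also lets a stand-in object be passed for offline testing.

```python
import pandas as pd
import requests

BASE_URL = 'https://python.zgulde.net'


def fetch_all_pages(endpoint, session=None):
    """Fetch every page of /api/v1/{endpoint} and check the page count against max_page."""
    session = session if session is not None else requests.Session()
    url = f'{BASE_URL}/api/v1/{endpoint}'
    frames = []
    max_page = None
    while url is not None:
        response = session.get(url)
        response.raise_for_status()
        payload = response.json()['payload']
        frames.append(pd.DataFrame(payload[endpoint]))
        max_page = payload['max_page']
        next_page = payload['next_page']
        url = BASE_URL + next_page if next_page is not None else None
    # Assumption: max_page is the total number of pages the API serves
    if len(frames) != max_page:
        raise RuntimeError(f'fetched {len(frames)} pages but max_page is {max_page}')
    return pd.concat(frames, ignore_index=True)
```

If the API is unchanged, `fetch_all_pages('sales')` should return the same `(913000, 5)` frame as `grab_this('sales')`.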

### 4. Save the data in your files to local csv files so that it will be faster to access in the future.
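A common pattern is to read the local CSV if it already exists and call the API only when it does not. This is a minimal sketch. `get_cached` and its `fetch` argument are hypothetical names. `fetch` stands for any zero-argument function that returns the dataframe, for instance `lambda: grab_this('items')`. Writing with `index=False` keeps the integer index from coming back as an extra `Unnamed: 0` column on the next read.

```python
import os

import pandas as pd


def get_cached(filename, fetch):
    """Return the dataframe stored in filename, calling fetch() and caching the result if it is missing."""
    if os.path.isfile(filename):
        return pd.read_csv(filename)
    df = fetch()
    # index=False: the default integer index carries no information here
    df.to_csv(filename, index=False)
    return df
```

For example, `items = get_cached('items.csv', lambda: grab_this('items'))` hits the API only on the first run.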



### 5. Combine the data from your three separate dataframes into one large dataframe.
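Each sale refers to one store and one item, so the three frames can be joined on those keys. This is a minimal sketch, and its column names are partly assumptions because the `sales` and `stores` columns are not shown above. It assumes `sales` has `item` and `store` columns and that `stores` is keyed by `store_id`. `items` is keyed by `item_id`, as in the output above. `combine` is a hypothetical helper. Left joins keep exactly one row per sale as long as each key matches at most one store or item.

```python
import pandas as pd


def combine(sales, stores, items):
    """Attach store and item details to each sale (key column names are assumptions)."""
    df = sales.merge(stores, how='left', left_on='store', right_on='store_id')
    df = df.merge(items, how='left', left_on='item', right_on='item_id')
    # The right-hand keys duplicate store/item, so drop them
    return df.drop(columns=['store_id', 'item_id'])
```

Because the joins are many-to-one, the combined frame should keep the 913,000 rows of `sales`.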



### 6. Acquire the Open Power Systems Data for Germany, which has been rapidly expanding its renewable energy production in recent years. The data set includes country-wide totals of electricity consumption, wind power production, and solar power production for 2006-2017. You can get the data here: https://raw.githubusercontent.com/jenfly/opsd/master/opsd_germany_daily.csv
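The OPSD file is a plain CSV, so `pd.read_csv` can read it directly from the URL. This is a minimal sketch. `get_opsd_germany` is a hypothetical name, and the code assumes the date column is named `Date` (check the file's header if it differs). Parsing the dates and using them as a sorted index prepares the frame for time-series work. The `source` parameter also accepts a local path or file-like object.

```python
import pandas as pd

OPSD_URL = 'https://raw.githubusercontent.com/jenfly/opsd/master/opsd_germany_daily.csv'


def get_opsd_germany(source=OPSD_URL):
    """Read the daily German power data with a DatetimeIndex (assumes a 'Date' column)."""
    df = pd.read_csv(source, parse_dates=['Date'])
    return df.set_index('Date').sort_index()
```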

### 7. Make sure all the work that you have done above is reproducible. That is, you should put the code above into separate functions in the acquire.py file and be able to re-run the functions and get the same data.
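One way to test the "re-run and get the same data" requirement is to call an acquire function twice and compare the results. This is a minimal sketch. `check_reproducible` is a hypothetical helper. The optional `sort_by` argument guards against a source that does not guarantee row order, which is an assumption rather than a known trait of this API. `pd.testing.assert_frame_equal` raises an `AssertionError` that describes the first mismatch.

```python
import pandas as pd


def check_reproducible(acquire_fn, sort_by=None):
    """Run acquire_fn twice and raise AssertionError if the two dataframes differ.

    sort_by (optional) names key column(s) to sort on before comparing.
    """
    runs = []
    for _ in range(2):
        df = acquire_fn()
        if sort_by is not None:
            df = df.sort_values(sort_by).reset_index(drop=True)
        runs.append(df)
    pd.testing.assert_frame_equal(runs[0], runs[1])
    return runs[0]
```

For example, call `check_reproducible(get_items, sort_by='item_id')`, where `get_items` stands for whichever function acquire.py exposes.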

Expected result for exercise 5: the combined dataframe should have 913,000 rows (one per sale) and about 14 columns.