**ETL and APIs - Saving and Using API Credentials:**
https://app.clickup.com/9015081401/v/dc/8cneedt-1515/8cneedt-1535

In [1]:
# Standard Imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Additional Imports
# os - for saving and loading files
# json - to work with json files
# math - to round up results
# time - to add a short pause to not overwhelm the server
import os, json, math, time

# to make yelpapi calls
from yelpapi import YelpAPI

# progress bar from tqdm_notebook
from tqdm.notebook import tqdm_notebook

In [None]:
# Install the following libraries if necessary
# !pip install yelpapi
# !pip install tqdm

**Import json library and open the yelp_api.json file to check your client_id and api_key.**

In [2]:
import json
with open('/Users/tspiet/.secret/yelp_api.json') as f: #change the path to match YOUR path!!
    login = json.load(f)
login.keys()

dict_keys(['client-id', 'api-key'])
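
The file loaded above has to be created once before it can be read. A minimal sketch of that step, assuming the same two key names shown in the output; the temporary folder and the placeholder values are illustrative assumptions, not real credentials or the course's actual path:

```python
import json
import os
import tempfile

# Placeholder values (assumption): replace with the credentials from your own Yelp developer account.
login = {'client-id': 'YOUR-CLIENT-ID', 'api-key': 'YOUR-API-KEY'}

# A temporary folder stands in for a private folder such as a .secret directory (assumption).
secret_dir = tempfile.mkdtemp()
cred_path = os.path.join(secret_dir, 'yelp_api.json')

# Write the dictionary to disk as JSON.
with open(cred_path, 'w') as f:
    json.dump(login, f)

# Read it back to confirm that the keys survived the round trip.
with open(cred_path) as f:
    reloaded = json.load(f)

print(list(reloaded.keys()))
```

Keeping the credentials file outside any folder that is tracked by Git helps avoid publishing the key by accident.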

**Import Yelpapi Libraries and set the timeout in seconds**

In [2]:
from yelpapi import YelpAPI
yelp_api = YelpAPI(login['api-key'], timeout_s=5.0)
yelp_api

<yelpapi.yelpapi.YelpAPI at 0x257c1c99f90>

**ETL and APIs - Traversing JSON with Python:**
https://app.clickup.com/9015081401/v/dc/8cneedt-1555/8cneedt-1575

**Open the file in write mode with the 'w' option**
    "r" = read text,
    "w" = write text,
    "rb" = read binary (like images),
    "wb" = write binary (like images)

Whenever we want to safely read and write to a file without accidentally corrupting it, we will use the **with** statement. 

The **with** statement works like an if statement or a for loop: it introduces an indented block, and the file is closed automatically when that block ends.
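
A small sketch of what the **with** statement does for us, using the `closed` attribute of the file object. The file name and the temporary folder are assumptions made for illustration:

```python
import os
import tempfile

# Write inside a temporary folder so the sketch leaves no files in your project.
path = os.path.join(tempfile.mkdtemp(), 'with_demo.txt')

with open(path, 'w') as f:
    f.write('hello')
    open_inside = not f.closed   # the file is still open inside the block

closed_after = f.closed          # the file has been closed once the block ends

print(open_inside, closed_after)
```

Both values are True: the file is open while the indented block runs and is closed as soon as the block is left, even if an error occurs inside it.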

In [6]:
# This is the example message we want to save as .txt
message = """My test message to save to file.
It is a multi-line string."""
message

'My test message to save to file.\nIt is a multi-line string.'

In [7]:
# Save it to a file
with open('example_file.txt','w') as file:
    file.write(message)

In [8]:
with open('example_file.txt','r') as f:
    loaded = f.read()
    
loaded

'My test message to save to file.\nIt is a multi-line string.'
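
The binary modes from the list above follow the same pattern, except that they read and write `bytes` rather than text. A minimal sketch, assuming a made-up four-byte payload in place of a real image file:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'example_file.bin')

# Binary modes work with bytes objects instead of str.
payload = bytes([0, 1, 2, 255])

# "wb" = write binary
with open(path, 'wb') as f:
    f.write(payload)

# "rb" = read binary
with open(path, 'rb') as f:
    loaded_bytes = f.read()

print(type(loaded_bytes), loaded_bytes == payload)
```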

In [9]:
import json

In [10]:
## Saving a dictionary to a JSON file
data = {'stack':4, 'week':2, 
       'things learned':['MySQL','MySQL WorkBench',
                         'GitHub Desktop','Jupyter Notebooks','JSON' ]}
data

{'stack': 4,
 'week': 2,
 'things learned': ['MySQL',
  'MySQL WorkBench',
  'GitHub Desktop',
  'Jupyter Notebooks',
  'JSON']}

In [12]:
## save dict to json file with json.dump
with open('example_saved_data.json','w') as f:
    json.dump(data, f)

In [13]:
## Load saved json file back to dictionary
with open('example_saved_data.json') as f:
    loaded = json.load(f)
loaded

{'stack': 4,
 'week': 2,
 'things learned': ['MySQL',
  'MySQL WorkBench',
  'GitHub Desktop',
  'Jupyter Notebooks',
  'JSON']}

In [14]:
print(type(loaded))
loaded.keys()

<class 'dict'>


dict_keys(['stack', 'week', 'things learned'])
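
The example dictionary came back unchanged, but that is not guaranteed for every Python object. A short sketch of two common differences after a JSON round trip, using a made-up dictionary and the string-based `json.dumps` / `json.loads` so that no file is needed:

```python
import json

# Made-up data (assumption): a tuple value and an integer key.
original = {'scores': (90, 85), 1: 'integer key'}

# json.dumps writes to a string instead of a file; json.loads reads it back.
text = json.dumps(original)
restored = json.loads(text)

print(restored)
```

The tuple comes back as a list and the integer key comes back as the string `'1'`, because JSON only has arrays and string keys.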

**ETL and APIs - Traversing JSON with Python:**
https://app.clickup.com/9015081401/v/dc/8cneedt-1635/8cneedt-1655

**Questions to Ask**

 - What is the very top level of the JSON data? Is it a list or a dictionary?
      - If it's a dictionary, what are the keys?
      - If it's a list, how long is it?
          - What does the first entry look like?

- Repeat these questions for each level of the dictionary.
   - If the next level is a dictionary, what are its keys?
   - If the next level is a list, how long is the list?
     - What are the items in the list? Integers? Dictionaries?
       - If they're dictionaries, do they seem to have the same keys for each dict in the list?
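
The questions above can also be asked programmatically. The helper below is a hypothetical sketch (the name `describe` and the small `sample` dictionary are made up for illustration); it reports the type of each level, the keys of dictionaries, and the length and first entry of lists:

```python
def describe(obj, name='top level', depth=0, max_depth=2):
    """Return one summary line per level: dict keys, list length, or value type."""
    indent = '  ' * depth
    lines = []
    if isinstance(obj, dict):
        lines.append(f"{indent}{name}: dict with keys {list(obj)}")
        if depth < max_depth:
            for key, value in obj.items():
                lines.extend(describe(value, repr(key), depth + 1, max_depth))
    elif isinstance(obj, list):
        lines.append(f"{indent}{name}: list of length {len(obj)}")
        # Only the first entry is inspected, mirroring the manual approach.
        if obj and depth < max_depth:
            lines.extend(describe(obj[0], 'first entry', depth + 1, max_depth))
    else:
        lines.append(f"{indent}{name}: {type(obj).__name__}")
    return lines


# Made-up sample with the same general shape as an API response (assumption).
sample = {
    'businesses': [{'name': 'A', 'rating': 4.0}, {'name': 'B', 'rating': 4.5}],
    'total': 2,
    'region': {'center': {'latitude': 0.0, 'longitude': 0.0}},
}

summary = describe(sample)
print('\n'.join(summary))
```

Because only the first entry of each list is inspected, it is still worth checking a second entry by hand to confirm that the keys match.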

What is the top level of the JSON data?

In [10]:
# Open the jsonfile and display the first level of keys
# open API results WITH json module
import json
with open('/Users/tspiet/Documents/example_yelp_results.json') as f:   #adjust for your path
    json_file = json.load(f)
    
## What type is top-level of json?
type(json_file)


dict

In [25]:
# what are the keys?
json_file.keys()

dict_keys(['businesses', 'total', 'region'])

What is stored in the second level?

We have three keys to explore.  We will start by exploring the region.

In [13]:
# what type is region?
type(json_file['region'])

dict

Explore 'total'

In [26]:
# what is stored under the "total" key?
type(json_file['total'])

int

In [27]:
## what is the value?
json_file['total']

435

Explore 'businesses'

In [28]:
# what is in the businesses key?
type(json_file['businesses'])

list

In [29]:
# how long is businesses?
len(json_file['businesses'])

20

It looks like there are 20 businesses stored in the list.

In [30]:
# what does the first entry of business look like?
json_file['businesses'][0]

{'id': 'D9A33FM394q99o4QtK5YwA',
 'alias': 'faidleys-seafood-baltimore-3',
 'name': 'Faidleys Seafood',
 'image_url': 'https://s3-media3.fl.yelpcdn.com/bphoto/OTjVDCVS7pGopH6GZcfjBA/o.jpg',
 'is_closed': False,
 'url': 'https://www.yelp.com/biz/faidleys-seafood-baltimore-3?adjust_creative=KJtcufKUS887p24u6rjVIQ&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=KJtcufKUS887p24u6rjVIQ',
 'review_count': 1184,
 'categories': [{'alias': 'seafood', 'title': 'Seafood'},
  {'alias': 'beerbar', 'title': 'Beer Bar'}],
 'rating': 4.0,
 'coordinates': {'latitude': 39.291696, 'longitude': -76.62224},
 'transactions': ['delivery'],
 'price': '$$',
 'location': {'address1': '203 N Paca St',
  'address2': '',
  'address3': '',
  'city': 'Baltimore',
  'zip_code': '21201',
  'country': 'US',
  'state': 'MD',
  'display_address': ['203 N Paca St', 'Baltimore, MD 21201']},
 'phone': '+14107274898',
 'display_phone': '(410) 727-4898',
 'distance': 1349.560720156645}

In [20]:
## what are the keys of the first entry in businesses?
json_file['businesses'][0].keys()

dict_keys(['id', 'alias', 'name', 'image_url', 'is_closed', 'url', 'review_count', 'categories', 'rating', 'coordinates', 'transactions', 'price', 'location', 'phone', 'display_phone', 'distance'])

In [22]:
## what are the keys of the NEXT entry in businesses? Do they match the first?
json_file['businesses'][1].keys()

dict_keys(['id', 'alias', 'name', 'image_url', 'is_closed', 'url', 'review_count', 'categories', 'rating', 'coordinates', 'transactions', 'price', 'location', 'phone', 'display_phone', 'distance'])

**Identifying "records"**

- There is a name for the particular type of json format in which we have a list that contains dictionaries with matching keys for each item.
- We call these "records".
    
It looks like the "businesses" key in our json data is a list of records.
    
- Records can be turned into DataFrames for easier inspection!

In [23]:
import pandas as pd
df_businesses = pd.DataFrame(json_file['businesses'])
df_businesses.head()

Unnamed: 0,id,alias,name,image_url,is_closed,url,review_count,categories,rating,coordinates,transactions,price,location,phone,display_phone,distance
0,D9A33FM394q99o4QtK5YwA,faidleys-seafood-baltimore-3,Faidleys Seafood,https://s3-media3.fl.yelpcdn.com/bphoto/OTjVDC...,False,https://www.yelp.com/biz/faidleys-seafood-balt...,1184,"[{'alias': 'seafood', 'title': 'Seafood'}, {'a...",4.0,"{'latitude': 39.291696, 'longitude': -76.62224}",[delivery],$$,"{'address1': '203 N Paca St', 'address2': '', ...",14107274898,(410) 727-4898,1349.56072
1,u65W69AhbjUlvJJBkEhGNQ,miss-shirleys-cafe-baltimore-9,Miss Shirley's Cafe,https://s3-media4.fl.yelpcdn.com/bphoto/9FsOyV...,False,https://www.yelp.com/biz/miss-shirleys-cafe-ba...,2919,"[{'alias': 'breakfast_brunch', 'title': 'Break...",4.0,"{'latitude': 39.2870995, 'longitude': -76.6053...","[delivery, pickup]",$$,"{'address1': '750 E Pratt St', 'address2': '',...",14105285373,(410) 528-5373,1028.736468
2,ieS_5zqxDHcWMCm8BKUYbg,thames-street-oyster-house-baltimore,Thames Street Oyster House,https://s3-media1.fl.yelpcdn.com/bphoto/9hGjo5...,False,https://www.yelp.com/biz/thames-street-oyster-...,2729,"[{'alias': 'seafood', 'title': 'Seafood'}, {'a...",4.5,"{'latitude': 39.28214, 'longitude': -76.59162}",[delivery],$$$,"{'address1': '1728 Thames St', 'address2': '',...",14434497726,(443) 449-7726,2090.712792
3,6am8TZAFnvND52MOz-Yctg,mamas-on-the-half-shell-baltimore,Mama's On The Half Shell,https://s3-media2.fl.yelpcdn.com/bphoto/HWY8OF...,False,https://www.yelp.com/biz/mamas-on-the-half-she...,1279,"[{'alias': 'bars', 'title': 'Bars'}, {'alias':...",4.0,"{'latitude': 39.27986, 'longitude': -76.5752399}","[delivery, pickup]",$$,"{'address1': '2901 Odonnell St', 'address2': '...",14102763160,(410) 276-3160,3328.825798
4,ISn7WyGQaFpsVSRSh0NSqg,sal-and-sons-baltimore,Sal and Sons,https://s3-media3.fl.yelpcdn.com/bphoto/LmVL4j...,False,https://www.yelp.com/biz/sal-and-sons-baltimor...,153,"[{'alias': 'seafood', 'title': 'Seafood'}, {'a...",4.5,"{'latitude': 39.284, 'longitude': -76.59337}",[delivery],$,"{'address1': '1640 Aliceanna St', 'address2': ...",14106751466,(410) 675-1466,1817.406978


- Notice how cleanly the list of dictionaries was converted into a DataFrame!
  - Even if we didn't save this dataframe as a variable, temporarily converting lists in JSON files to DataFrames can be very helpful for sifting through the file contents!
- Now, the DataFrame definitely is not perfectly formatted. Take note of the "categories", "coordinates", and "location" columns.

In [30]:
df_businesses[['categories','coordinates','location']].head(3)


Unnamed: 0,categories,coordinates,location
0,"[{'alias': 'seafood', 'title': 'Seafood'}, {'a...","{'latitude': 39.291696, 'longitude': -76.62224}","{'address1': '203 N Paca St', 'address2': '', ..."
1,"[{'alias': 'breakfast_brunch', 'title': 'Break...","{'latitude': 39.2870995, 'longitude': -76.6053...","{'address1': '750 E Pratt St', 'address2': '',..."
2,"[{'alias': 'seafood', 'title': 'Seafood'}, {'a...","{'latitude': 39.28214, 'longitude': -76.59162}","{'address1': '1728 Thames St', 'address2': '',..."


- These cells contain more than one piece of information. **Each one holds a dictionary or, in the case of "categories", a list of dictionaries.**
- While making our records into a DataFrame helped us organize and digest the business data, it is not a perfect solution and we may need to do additional cleaning on the columns that are filled with dictionaries or lists.
- We will discuss how to deal with scenarios like this in an upcoming lesson on **Advanced Transformations with Pandas.**
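
As a preview, and not necessarily the approach the later lesson takes, `pd.json_normalize` can expand dictionary-valued fields into separate columns. A minimal sketch on two made-up records (not real Yelp results):

```python
import pandas as pd

# Made-up records with the same nesting as the "coordinates" and "categories" columns.
records = [
    {'name': 'Example A',
     'coordinates': {'latitude': 39.29, 'longitude': -76.62},
     'categories': [{'alias': 'seafood', 'title': 'Seafood'}]},
    {'name': 'Example B',
     'coordinates': {'latitude': 39.28, 'longitude': -76.60},
     'categories': [{'alias': 'bars', 'title': 'Bars'}]},
]

# Nested dictionaries become dotted column names such as "coordinates.latitude".
flat = pd.json_normalize(records)
print(flat.columns.tolist())
```

Note that columns holding lists, such as "categories", are left as lists and still need separate handling.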

**ETL and APIs - Yelp API Package:**
https://app.clickup.com/9015081401/v/dc/8cneedt-1675/8cneedt-1695

**Constructing an API Call**
- To construct an API call, we must combine the base URL with the Path for the specific endpoint we want to use, as well as any parameters required for that endpoint.
- Click on the blue text for the Path of the endpoint you are using to get detailed information on what parameters the endpoint accepts.

- Endpoint Documentation: https://docs.developer.yelp.com/docs/fusion-intro
- We can see the URL for GETTING results from the API.

**Note:** as data scientists, we will be focusing on Extracting data from an API and will therefore be focusing on GET methods.

- According to the Parameters table, there are many parameters we can specify. The first few are:
  - term: the phrase/food/cuisine to search for
     - For our example, this will be "crab cakes"
  - location: physical location to search (City/zipcode/etc.)
     - For our example, this will be "Baltimore, MD"
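
To make the "base URL + path + parameters" idea concrete, here is a sketch that only builds the URL string and makes no request. The base URL and path are my reading of the Yelp Fusion documentation and should be confirmed there:

```python
from urllib.parse import urlencode

# Assumed values for the Yelp Fusion business search endpoint; check the endpoint documentation.
base_url = 'https://api.yelp.com/v3'
path = '/businesses/search'

# The two parameters from the example above.
params = {'term': 'crab cakes', 'location': 'Baltimore, MD'}

# urlencode escapes spaces and commas so the values are safe to place in a URL.
url = f"{base_url}{path}?{urlencode(params)}"
print(url)
```

An actual request would also need the API key sent along with it for authorization, which is one of the details the package handles for us.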

**API Calls Using the YelpAPI Package**

- Thankfully, we do not need to construct our API calls manually. There is a Python package for most APIs that we can leverage. For more information on how to construct API calls manually, see the optional "Manually Constructing API Calls" lesson at the end of this week.
- For Yelp, we have the YelpAPI Package (https://github.com/gfairchild/yelpapi)

**Using the code - Example**

from yelpapi import YelpAPI

yelp_api = YelpAPI(api_key, timeout_s=3.0)  # You can set a timeout so API calls do not block indefinitely in degraded network conditions

search_results = yelp_api.search_query(args)

In [34]:
import json
with open('/Users/tspiet/.secret/yelp_api.json') as f:
    login = json.load(f)
login.keys()

dict_keys(['client-id', 'api-key'])

In [35]:
# import the YelpAPI Class
from yelpapi import YelpAPI
# Create an instance with your key
yelp_api = YelpAPI(login['api-key'], timeout_s=5.0)
yelp_api

<yelpapi.yelpapi.YelpAPI at 0x17603bb81f0>

- To use the "businesses search" endpoint, we will use the yelp_api.search_query method.

  - If we inspect the docstring for the function (either run the help function on it or place your cursor inside its parentheses and hit Shift+Tab), we see that it doesn't tell us very much.

In [36]:
help(yelp_api.search_query)

Help on method search_query in module yelpapi.yelpapi:

search_query(**kwargs) method of yelpapi.yelpapi.YelpAPI instance
    Query the Yelp Search API.
    
    documentation: https://www.yelp.com/developers/documentation/v3/business_search
    
    required parameters:
        * one of either:
            * location - text specifying a location to search for
            * latitude and longitude



In [37]:
# use our yelp_api variable's search_query method to perform our API call
search_results = yelp_api.search_query(location='NY, NY',
                                       term='Pizza')
print(type(search_results))
search_results.keys()

<class 'dict'>


dict_keys(['businesses', 'total', 'region'])

The package returns the results in the **JSON format** we have been exploring. Note that the exact results may vary as the Yelp site is constantly changing.

In [38]:
search_results['total']

12400

In [39]:
biz = pd.DataFrame(search_results['businesses'])
biz.head(2) 

Unnamed: 0,id,alias,name,image_url,is_closed,url,review_count,categories,rating,coordinates,transactions,price,location,phone,display_phone,distance
0,ysqgdbSrezXgVwER2kQWKA,julianas-brooklyn-3,Juliana's,https://s3-media2.fl.yelpcdn.com/bphoto/od36nF...,False,https://www.yelp.com/biz/julianas-brooklyn-3?a...,2703,"[{'alias': 'pizza', 'title': 'Pizza'}]",4.5,"{'latitude': 40.70274718768062, 'longitude': -...",[delivery],$$,"{'address1': '19 Old Fulton St', 'address2': '...",17185966700,(718) 596-6700,308.569844
1,zj8Lq1T8KIC5zwFief15jg,prince-street-pizza-new-york-2,Prince Street Pizza,https://s3-media4.fl.yelpcdn.com/bphoto/PfI8oV...,False,https://www.yelp.com/biz/prince-street-pizza-n...,5082,"[{'alias': 'pizza', 'title': 'Pizza'}, {'alias...",4.5,"{'latitude': 40.72308755605564, 'longitude': -...","[pickup, delivery]",$,"{'address1': '27 Prince St', 'address2': None,...",12129664100,(212) 966-4100,1961.877142


In [40]:
## total number of matching businesses
search_results['total']

12400

In [41]:
## how many businesses in our results
len(search_results['businesses'])

20

**The Yelp API will only return a "page" of 20 results at a time. The general term for this is "pagination".**
- If we want to get the next page of results, we will perform another API call, but we will add an additional argument called "offset."
  - The offset is the position of the result to use as the FIRST result on the page.
  - If we had 20 businesses in our first result, we would want to add an offset of 20.
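
The offsets for every page follow directly from the total and the page size. A minimal sketch of that arithmetic, using the numbers from the example call above (your total will likely differ, and an API may cap how many results can actually be retrieved):

```python
import math

total = 12400     # search_results['total'] in the example call
per_page = 20     # len(search_results['businesses'])

# Round up so that a final, partially filled page is still counted.
n_pages = math.ceil(total / per_page)

# The offset of each page is the index of its first result.
offsets = [page * per_page for page in range(n_pages)]

print(n_pages, offsets[:3])
```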

In [42]:
# add offset to our original api call
search_results = yelp_api.search_query(location='NY, NY',
                                       term='Pizza',
                                       offset = 20)

In [43]:
biz20 = pd.DataFrame(search_results['businesses'])
biz20.head(2)

Unnamed: 0,id,alias,name,image_url,is_closed,url,review_count,categories,rating,coordinates,transactions,price,location,phone,display_phone,distance
0,vpkTNjq9dRh9rT3Nh0pe-A,nolita-pizza-new-york-2,Nolita Pizza,https://s3-media2.fl.yelpcdn.com/bphoto/_BKVVo...,False,https://www.yelp.com/biz/nolita-pizza-new-york...,226,"[{'alias': 'pizza', 'title': 'Pizza'}, {'alias...",4.5,"{'latitude': 40.7208883, 'longitude': -73.9962...","[delivery, pickup]",$,"{'address1': '68 Kenmare St', 'address2': None...",16468959131,(646) 895-9131,1725.054309
1,2xQmBB6w-W6lxiex80fA9A,luigis-pizzeria-brooklyn-4,Luigi's Pizzeria,https://s3-media4.fl.yelpcdn.com/bphoto/j8wXRU...,False,https://www.yelp.com/biz/luigis-pizzeria-brook...,288,"[{'alias': 'pizza', 'title': 'Pizza'}]",4.5,"{'latitude': 40.6897, 'longitude': -73.965369}",[delivery],$,"{'address1': '326 Dekalb Ave', 'address2': '',...",17187832430,(718) 783-2430,3001.831815


**You should have different results than your original call when you make the offset call.**

Now you can combine the results (so far) into one dataframe using pd.concat()

In [44]:
## concatenate the previous results and new results. 
businesses = pd.concat([biz, biz20],
                      ignore_index=True)
display(businesses.head(3), businesses.tail(3))

Unnamed: 0,id,alias,name,image_url,is_closed,url,review_count,categories,rating,coordinates,transactions,price,location,phone,display_phone,distance
0,ysqgdbSrezXgVwER2kQWKA,julianas-brooklyn-3,Juliana's,https://s3-media2.fl.yelpcdn.com/bphoto/od36nF...,False,https://www.yelp.com/biz/julianas-brooklyn-3?a...,2703,"[{'alias': 'pizza', 'title': 'Pizza'}]",4.5,"{'latitude': 40.70274718768062, 'longitude': -...",[delivery],$$,"{'address1': '19 Old Fulton St', 'address2': '...",17185966700,(718) 596-6700,308.569844
1,zj8Lq1T8KIC5zwFief15jg,prince-street-pizza-new-york-2,Prince Street Pizza,https://s3-media4.fl.yelpcdn.com/bphoto/PfI8oV...,False,https://www.yelp.com/biz/prince-street-pizza-n...,5082,"[{'alias': 'pizza', 'title': 'Pizza'}, {'alias...",4.5,"{'latitude': 40.72308755605564, 'longitude': -...","[pickup, delivery]",$,"{'address1': '27 Prince St', 'address2': None,...",12129664100,(212) 966-4100,1961.877142
2,WG639VkTjmK5dzydd1BBJA,rubirosa-new-york-2,Rubirosa,https://s3-media3.fl.yelpcdn.com/bphoto/l0Phrn...,False,https://www.yelp.com/biz/rubirosa-new-york-2?a...,3193,"[{'alias': 'italian', 'title': 'Italian'}, {'a...",4.5,"{'latitude': 40.722766, 'longitude': -73.996233}",[pickup],$$,"{'address1': '235 Mulberry St', 'address2': ''...",12129650500,(212) 965-0500,1932.94677


Unnamed: 0,id,alias,name,image_url,is_closed,url,review_count,categories,rating,coordinates,transactions,price,location,phone,display_phone,distance
37,yH6t1JWcwkdWG4CS5KPvow,adriennes-pizzabar-new-york,Adrienne's Pizzabar,https://s3-media3.fl.yelpcdn.com/bphoto/YePG9J...,False,https://www.yelp.com/biz/adriennes-pizzabar-ne...,1850,"[{'alias': 'pizza', 'title': 'Pizza'}, {'alias...",4.0,"{'latitude': 40.70424819999999, 'longitude': -...","[delivery, pickup]",$$,"{'address1': '54 Stone St', 'address2': None, ...",12122483838,(212) 248-3838,1340.117592
38,l6yVO8l8E5XI9ArgOy5rgw,ltd-pizza-and-bar-new-york,LTD Pizza and Bar,https://s3-media2.fl.yelpcdn.com/bphoto/Lt0xvq...,False,https://www.yelp.com/biz/ltd-pizza-and-bar-new...,32,"[{'alias': 'pizza', 'title': 'Pizza'}, {'alias...",4.5,"{'latitude': 40.724242, 'longitude': -74.00802}","[delivery, pickup, restaurant_reservation]",,"{'address1': '225 Hudson St', 'address2': None...",12124191618,(212) 419-1618,2391.701096
39,MwcRbM6lS6_8N67LmyBELA,99-cent-fresh-pizza-new-york-5,99 Cent Fresh Pizza,https://s3-media1.fl.yelpcdn.com/bphoto/6v8b4O...,False,https://www.yelp.com/biz/99-cent-fresh-pizza-n...,713,"[{'alias': 'pizza', 'title': 'Pizza'}]",4.0,"{'latitude': 40.76458, 'longitude': -73.98247}","[delivery, pickup]",$,"{'address1': '1723 Broadway', 'address2': '', ...",12122452155,(212) 245-2155,6652.683138
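
The same steps can be repeated in a loop until every page has been collected. The sketch below does this against `fake_search`, a hypothetical stand-in written for this example so that it runs without an API key or network access; with the real package, the call inside the loop would be `yelp_api.search_query` with the same `offset` argument:

```python
import math
import time

import pandas as pd


def fake_search(offset=0, limit=20, total=45):
    """Hypothetical stand-in for an API call; returns one page of made-up results."""
    stop = min(offset + limit, total)
    businesses = [{'id': f'biz-{i}', 'rating': 4.0} for i in range(offset, stop)]
    return {'businesses': businesses, 'total': total}


# The first call tells us the total and the page size.
first = fake_search(offset=0)
per_page = len(first['businesses'])
n_pages = math.ceil(first['total'] / per_page)

pages = [pd.DataFrame(first['businesses'])]
for page in range(1, n_pages):
    time.sleep(0.01)  # short pause between calls; adjust the length for a real server
    results = fake_search(offset=page * per_page)
    pages.append(pd.DataFrame(results['businesses']))

# Combine all pages into a single DataFrame.
all_businesses = pd.concat(pages, ignore_index=True)
print(all_businesses.shape)
```

Collecting the pages in a list and concatenating once at the end avoids rebuilding the DataFrame on every iteration.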


**Summary**
- Making API calls requires our API Key and knowledge of the endpoints we want to use.
- Reading the documentation is particularly important for using APIs.
- The output of your **API call may be broken into bite-sized chunks ("pages"); this is known as pagination**. You will need to use offset to access all of the relevant information.
- The results can then be concatenated into a single data frame.