# Part 1 - Extracting and Saving Data from Yelp API

## Obective

- For this CodeAlong, we will be working with the Yelp API. 
- You will use the the Yelp API to search your home town for a cuisine type of your choice.
- Next class, we will then use Plotly Express to create a map with the Mapbox API to visualize the results.
    
    

## Tools You Will Use
- Part 1:
    - Yelp API:
        - Getting Started: 
            - https://www.yelp.com/developers/documentation/v3/get_started

    - `YelpAPI` python package
        -  "YelpAPI": https://github.com/gfairchild/yelpapi
- Part 2:

    - Plotly Express: https://plotly.com/python/getting-started/
        - With Mapbox API: https://www.mapbox.com/
        - `px.scatter_mapbox` [Documentation](https://plotly.com/python/scattermapbox/): 




### Applying Code From
- Efficient API Calls Lesson Link: https://login.codingdojo.com/m/376/12529/88078

In [1]:
# Standard Imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Additional Imports
# os - for saving and loading files
# json - to work with json files
# math - to round up results
# time - to add a short pause to not overwhelm the server
import os, json, math, time

# to make yelpapi calls
from yelpapi import YelpAPI

# progress bar from tqdm_notebook
from tqdm.notebook import tqdm_notebook

In [2]:
!pip install yelpapi
!pip install tqdm



## 1. Registering for Required APIs


- Yelp: https://www.yelp.com/developers/documentation/v3/get_started


> Check the official API documentation to know what arguments we can search for: https://www.yelp.com/developers/documentation/v3/business_search

### Load Credentials and Create Yelp API Object

In [3]:
# Load API Credentials
with open('/Users/sherlin01/.secret/yelp_api.json', 'r') as f:
    login = json.load(f)

In [4]:
login.keys()

dict_keys(['Client-ID', 'API Key'])

In [5]:
#login.items()
#login['API Key']

In [6]:
# Instantiate YelpAPI Variable
yelp = YelpAPI(login['API Key'], timeout_s = 5.0)

### Define Search Terms and File Paths

In [7]:
# set our API call parameters and filename before the first call
location = 'Fresno, TX 77545'
term = 'barbecue'

In [8]:
location.split(',')[0]

'Fresno'

In [9]:
## Specify folder for saving data
FOLDER = 'Data/'

os.makedirs(FOLDER, exist_ok = True)
# Specifying JSON_FILE filename (can include a folder)
JSON_FILE = FOLDER+f"{location.split(',')[0]}-{term}.json"

In [10]:
JSON_FILE

'Data/Fresno-barbecue.json'

### Check if Json File exists and Create it if it doesn't

In [11]:
## Check if JSON_FILE exists
file_exists = os.path.isfile(JSON_FILE)

## If it does not exist: 
if file_exists == False:    
    ## CREATE ANY NEEDED FOLDERS
    # Get the Folder Name only
    folder = os.path.dirname(JSON_FILE)
    
    ## If JSON folder name is not empty:
    if len(folder)>0:
        # create the folder
        os.makedirs(folder, exist_ok = True)
        
        
    ## INFORM USER AND SAVE EMPTY LIST
    print(f"[i] {JSON_FILE} not found. Saving empty list to file.")

    
    ## save the first page of results
    with open(JSON_FILE, 'w') as f:
          json.dump([], f)
        
## If it exists, inform user
else:
    print(f"[i] {JSON_FILE} already exists.")

[i] Data/Fresno-barbecue.json already exists.


In [12]:
os.path.isfile(JSON_FILE)

True

### Load JSON FIle and account for previous results

### Make the first API call to get the first page of data

- We will use this first result to check:
    - how many total results there are?
    - Where is the actual data we want to save?
    - how many results do we get at a time?


In [13]:
# use our yelp_api variable's search_query method to perform our API call
results = yelp.search_query(term = term, location = location)

In [14]:
type(results)

dict

In [15]:
len(results)

3

In [16]:
results.keys()

dict_keys(['businesses', 'total', 'region'])

In [17]:
results['total']

187

In [18]:
results['region']

{'center': {'longitude': -95.46981811523438, 'latitude': 29.537609279094596}}

In [19]:
results['businesses']

[{'id': 'chuj4i5TRnFdFS0z_vAU9w',
  'alias': 'boutte-s-bbq-fresno-2',
  'name': 'Boutte’s BBQ',
  'image_url': 'https://s3-media2.fl.yelpcdn.com/bphoto/YjCrPAy8z1K_YiUs3CZ8sw/o.jpg',
  'is_closed': False,
  'url': 'https://www.yelp.com/biz/boutte-s-bbq-fresno-2?adjust_creative=rZiFgfUX1eajVOVygdFykg&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=rZiFgfUX1eajVOVygdFykg',
  'review_count': 9,
  'categories': [{'alias': 'foodtrucks', 'title': 'Food Trucks'},
   {'alias': 'bbq', 'title': 'Barbeque'}],
  'rating': 5.0,
  'coordinates': {'latitude': 29.53606, 'longitude': -95.47645},
  'transactions': [],
  'location': {'address1': '1514 Trammel Fresno Rd',
   'address2': None,
   'address3': '',
   'city': 'Fresno',
   'zip_code': '77545',
   'country': 'US',
   'state': 'TX',
   'display_address': ['1514 Trammel Fresno Rd', 'Fresno, TX 77545']},
  'phone': '+17134692669',
  'display_phone': '(713) 469-2669',
  'distance': 660.1141558818418},
 {'id': '2Gun8_aVi3zpEj12

In [20]:
## How many results total?
pd.DataFrame(results['businesses'])

Unnamed: 0,id,alias,name,image_url,is_closed,url,review_count,categories,rating,coordinates,transactions,location,phone,display_phone,distance,price
0,chuj4i5TRnFdFS0z_vAU9w,boutte-s-bbq-fresno-2,Boutte’s BBQ,https://s3-media2.fl.yelpcdn.com/bphoto/YjCrPA...,False,https://www.yelp.com/biz/boutte-s-bbq-fresno-2...,9,"[{'alias': 'foodtrucks', 'title': 'Food Trucks...",5.0,"{'latitude': 29.53606, 'longitude': -95.47645}",[],"{'address1': '1514 Trammel Fresno Rd', 'addres...",17134692669,(713) 469-2669,660.114156,
1,2Gun8_aVi3zpEj12VFf6YA,papa-nick-s-fresno,Papa Nick’s,,False,https://www.yelp.com/biz/papa-nick-s-fresno?ad...,1,"[{'alias': 'bbq', 'title': 'Barbeque'}, {'alia...",5.0,"{'latitude': 29.53606, 'longitude': -95.47645}",[],"{'address1': '1514 B Trammel', 'address2': '',...",18329160227,(832) 916-0227,660.3751,
2,CJcm3JmMQhDUF0wclPu2AQ,slow-smoked-meats-missouri-city,Slow Smoked Meats,https://s3-media2.fl.yelpcdn.com/bphoto/-Vzlgj...,False,https://www.yelp.com/biz/slow-smoked-meats-mis...,1,"[{'alias': 'catering', 'title': 'Caterers'}, {...",5.0,"{'latitude': 29.60216, 'longitude': -95.50744}",[],"{'address1': None, 'address2': None, 'address3...",13463101372,(346) 310-1372,7320.022096,
3,HC_v0TxyQyvHt595crwQZw,the-greatest-bbq-missouri-city,The Greatest BBQ,https://s3-media3.fl.yelpcdn.com/bphoto/5Af1lH...,False,https://www.yelp.com/biz/the-greatest-bbq-miss...,17,"[{'alias': 'bbq', 'title': 'Barbeque'}]",4.0,"{'latitude': 29.5943214, 'longitude': -95.5268...","[pickup, delivery]","{'address1': '2304-2428 Texas Pkwy', 'address2...",12812612264,(281) 261-2264,8376.697777,
4,4-TxcY8ZgNquIqMWuWFrCg,seven-seeds-texas-barbecue-rosharon-2,Seven Seeds Texas Barbecue,https://s3-media1.fl.yelpcdn.com/bphoto/9odk3x...,False,https://www.yelp.com/biz/seven-seeds-texas-bar...,22,"[{'alias': 'bbq', 'title': 'Barbeque'}, {'alia...",4.0,"{'latitude': 29.5429847, 'longitude': -95.4174...","[pickup, delivery]","{'address1': '3040 Cr 48', 'address2': '', 'ad...",12816150496,(281) 615-0496,5096.971199,
5,jx7VXB5oUE1RnSKs4zFUHw,big-horn-bbq-pearland,Big Horn BBQ,https://s3-media3.fl.yelpcdn.com/bphoto/SAvHCd...,False,https://www.yelp.com/biz/big-horn-bbq-pearland...,337,"[{'alias': 'bbq', 'title': 'Barbeque'}, {'alia...",3.0,"{'latitude': 29.5692407905726, 'longitude': -9...",[delivery],"{'address1': '2300 Smith Ranch Rd', 'address2'...",12817413289,(281) 741-3289,8888.390692,$$
6,uHF2wLqsesoVqh0lyHXUBw,texas-biergarten-missouri-city,Texas Biergarten,https://s3-media4.fl.yelpcdn.com/bphoto/Fgy60W...,False,https://www.yelp.com/biz/texas-biergarten-miss...,286,"[{'alias': 'german', 'title': 'German'}, {'ali...",4.5,"{'latitude': 29.562035, 'longitude': -95.562823}",[delivery],"{'address1': '6302 Hwy 6', 'address2': 'Ste Q'...",12817780030,(281) 778-0030,9400.384008,$$
7,ihHp3_rxOrx-HTp1kg7z0w,remos-cafe-houston,ReMo's Cafe,https://s3-media1.fl.yelpcdn.com/bphoto/Xf31sy...,False,https://www.yelp.com/biz/remos-cafe-houston?ad...,81,"[{'alias': 'bbq', 'title': 'Barbeque'}, {'alia...",4.5,"{'latitude': 29.621766, 'longitude': -95.508692}","[pickup, delivery]","{'address1': '8420 S Sam Houston Pkwy W', 'add...",18326995800,(832) 699-5800,10082.899511,
8,3G9HjbFXeQCGba7nGUPqoA,skeets-barbeque-pearland,Skeets Barbeque,https://s3-media3.fl.yelpcdn.com/bphoto/bw23Hh...,False,https://www.yelp.com/biz/skeets-barbeque-pearl...,132,"[{'alias': 'bbq', 'title': 'Barbeque'}, {'alia...",3.0,"{'latitude': 29.55443126406246, 'longitude': -...",[delivery],"{'address1': '10228 Broadway St', 'address2': ...",17134360012,(713) 436-0012,9056.534922,$$
9,tf-LwW5wcyCEGw4mIWEhrw,boogies-chicago-style-bbq-missouri-city-2,Boogie's Chicago Style BBQ,https://s3-media3.fl.yelpcdn.com/bphoto/J-Yg6H...,False,https://www.yelp.com/biz/boogies-chicago-style...,139,"[{'alias': 'bbq', 'title': 'Barbeque'}, {'alia...",3.5,"{'latitude': 29.600604, 'longitude': -95.5259907}","[pickup, delivery]","{'address1': '1767 Texas Pkwy', 'address2': No...",12819698626,(281) 969-8626,8864.474896,$


- Where is the actual data we want to save?

In [21]:
## How many did we get the details for?
results_per_page = len(results['businesses'])
results_per_page

20

- Calculate how many pages of results needed to cover the total_results

In [22]:
(results['total'])/ results_per_page

9.35

In [23]:
# Use math.ceil to round up for the total number of pages of results.
n_pages = math.ceil((results['total'])/ results_per_page)
n_pages

10

In [24]:
for i in tqdm_notebook(range(1,n_pages+1)):
    ## The block of code we want to TRY to run
    try:
        
        time.sleep(.2)
        
        ## Read in results in progress file and check the length
        with open(JSON_FILE, 'r') as f:
            previous_results = json.load(f)
        
        ## save number of results to use as offset
        n_results = len(previous_results)
        
        
        ## use n_results as the OFFSET 
        results = yelp.search_query(location = location, term = term,
                                   offset = n_results+1)

        ## append new results and save to file
        previous_results.extend(results['businesses'])
        
        with open(JSON_FILE, 'w') as f:
            json.dump(previous_results, f)

            
    ## What to do if we get an error/exception.
    except Exception as e:
        print(' [!] ERROR', e)


  0%|          | 0/10 [00:00<?, ?it/s]

## Open the Final JSON File with Pandas

In [25]:
df = pd.read_json(JSON_FILE)

In [26]:
df.head()

Unnamed: 0,id,alias,name,image_url,is_closed,url,review_count,categories,rating,coordinates,transactions,location,phone,display_phone,distance,price
0,2Gun8_aVi3zpEj12VFf6YA,papa-nick-s-fresno,Papa Nick’s,,False,https://www.yelp.com/biz/papa-nick-s-fresno?ad...,1,"[{'alias': 'bbq', 'title': 'Barbeque'}, {'alia...",5.0,"{'latitude': 29.53606, 'longitude': -95.47645}",[],"{'address1': '1514 B Trammel', 'address2': '',...",18329160227,(832) 916-0227,660.3751,
1,HC_v0TxyQyvHt595crwQZw,the-greatest-bbq-missouri-city,The Greatest BBQ,https://s3-media3.fl.yelpcdn.com/bphoto/5Af1lH...,False,https://www.yelp.com/biz/the-greatest-bbq-miss...,17,"[{'alias': 'bbq', 'title': 'Barbeque'}]",4.0,"{'latitude': 29.5943214, 'longitude': -95.5268...","[delivery, pickup]","{'address1': '2304-2428 Texas Pkwy', 'address2...",12812612264,(281) 261-2264,8376.697777,
2,CJcm3JmMQhDUF0wclPu2AQ,slow-smoked-meats-missouri-city,Slow Smoked Meats,https://s3-media2.fl.yelpcdn.com/bphoto/-Vzlgj...,False,https://www.yelp.com/biz/slow-smoked-meats-mis...,1,"[{'alias': 'catering', 'title': 'Caterers'}, {...",5.0,"{'latitude': 29.60216, 'longitude': -95.50744}",[],"{'address1': None, 'address2': None, 'address3...",13463101372,(346) 310-1372,7320.022096,
3,4-TxcY8ZgNquIqMWuWFrCg,seven-seeds-texas-barbecue-rosharon-2,Seven Seeds Texas Barbecue,https://s3-media1.fl.yelpcdn.com/bphoto/9odk3x...,False,https://www.yelp.com/biz/seven-seeds-texas-bar...,21,"[{'alias': 'bbq', 'title': 'Barbeque'}, {'alia...",4.0,"{'latitude': 29.5429847, 'longitude': -95.4174...","[delivery, pickup]","{'address1': '3040 Cr 48', 'address2': '', 'ad...",12816150496,(281) 615-0496,5096.971199,
4,jx7VXB5oUE1RnSKs4zFUHw,big-horn-bbq-pearland,Big Horn BBQ,https://s3-media1.fl.yelpcdn.com/bphoto/TjNvun...,False,https://www.yelp.com/biz/big-horn-bbq-pearland...,335,"[{'alias': 'bbq', 'title': 'Barbeque'}, {'alia...",3.0,"{'latitude': 29.5692407905726, 'longitude': -9...",[delivery],"{'address1': '2300 Smith Ranch Rd', 'address2'...",12817413289,(281) 741-3289,8888.390692,$$


In [27]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 189 entries, 0 to 188
Data columns (total 16 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   id             189 non-null    object 
 1   alias          189 non-null    object 
 2   name           189 non-null    object 
 3   image_url      189 non-null    object 
 4   is_closed      189 non-null    bool   
 5   url            189 non-null    object 
 6   review_count   189 non-null    int64  
 7   categories     189 non-null    object 
 8   rating         189 non-null    float64
 9   coordinates    189 non-null    object 
 10  transactions   189 non-null    object 
 11  location       189 non-null    object 
 12  phone          189 non-null    object 
 13  display_phone  189 non-null    object 
 14  distance       189 non-null    float64
 15  price          149 non-null    object 
dtypes: bool(1), float64(2), int64(1), object(12)
memory usage: 22.5+ KB


In [28]:
## convert the filename to a .csv.gz
csv_file = JSON_FILE.replace('.json','.csv.gz')
csv_file

'Data/Fresno-barbecue.csv.gz'

In [29]:
## Save it as a compressed csv (to save space)
df.to_csv(csv_file, compression = 'gzip', index = False)

## Bonus: compare filesize with os module's `os.path.getsize`

In [30]:
size_json = os.path.getsize(JSON_FILE)
size_csv_gz = os.path.getsize(JSON_FILE.replace('.json','.csv.gz'))

print(f'JSON FILE: {size_json:,} Bytes')
print(f'CSV.GZ FILE: {size_csv_gz:,} Bytes')

print(f'the csv.gz is {size_json/size_csv_gz} times smaller!')

JSON FILE: 186,792 Bytes
CSV.GZ FILE: 26,102 Bytes
the csv.gz is 7.156233238832273 times smaller!


## Next Class: Processing the Results and Mapping 