# Part 1 - Extracting and Saving Data from Yelp API

## Objective

- For this CodeAlong, we will be working with the Yelp API. 
- You will use the Yelp API to search your home town for a cuisine type of your choice.
- Next class, we will then use Plotly Express to create a map with the Mapbox API to visualize the results.
    
    

## Tools You Will Use
- Part 1:
    - Yelp API:
        - Getting Started: 
            - https://www.yelp.com/developers/documentation/v3/get_started

    - `yelpapi` Python package (provides the `YelpAPI` class)
        - https://github.com/gfairchild/yelpapi
- Part 2:

    - Plotly Express: https://plotly.com/python/getting-started/
        - With Mapbox API: https://www.mapbox.com/
        - `px.scatter_mapbox` [Documentation](https://plotly.com/python/scattermapbox/)




### Applying Code From
- Efficient API Calls Lesson Link: https://login.codingdojo.com/m/376/12529/88078

In [154]:
!pip install tqdm



In [155]:
# Standard Imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Additional Imports
import os, json, math, time
from yelpapi import YelpAPI
from tqdm.notebook import tqdm_notebook

## 1. Registering for Required APIs


- Yelp: https://www.yelp.com/developers/documentation/v3/get_started


> Check the official API documentation to see which arguments the business search accepts: https://www.yelp.com/developers/documentation/v3/business_search

### Load Credentials and Create Yelp API Object

In [156]:
# Load API Credentials
with open('/Users/LP-Ca/.secret/yelp_api.json') as f:
    creds = json.load(f)
creds.keys()

dict_keys(['client id', 'API key'])
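
The output above suggests the credentials file is a JSON object with `'client id'` and `'API key'` keys. A minimal sketch of loading such a file and checking for the key before using it, assuming that structure. `load_yelp_creds` is a hypothetical helper (not part of `yelpapi`), and the demo writes placeholder values to a temporary file rather than reading real credentials.

```python
import json
import os
import tempfile


def load_yelp_creds(path, required_key="API key"):
    """Hypothetical helper: load a JSON credentials file and check for a key."""
    # expanduser lets the path start with "~" instead of a hard-coded home folder
    with open(os.path.expanduser(path)) as f:
        creds = json.load(f)
    if required_key not in creds:
        raise KeyError(f"{required_key!r} not found in {path}")
    return creds


# Demo with a placeholder file (these are NOT real credentials)
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "yelp_api.json")
    with open(demo_path, "w") as f:
        json.dump({"client id": "PLACEHOLDER", "API key": "PLACEHOLDER"}, f)
    demo_creds = load_yelp_creds(demo_path)

print(list(demo_creds.keys()))
```

Failing early with a clear `KeyError` tends to be easier to debug than an authentication error returned later by the API.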

In [157]:
# Instantiate YelpAPI Variable
yelp = YelpAPI(creds['API key'])
yelp

<yelpapi.yelpapi.YelpAPI at 0x199eeb13b88>

### Define Search Terms and File Paths

In [158]:
# set our API call parameters and filename before the first call
LOCATION = "Pawcatuck, CT 06379"
TERM = "Burgers"

In [181]:
## Specify folder for saving data
FOLDER = "Data/"
os.makedirs(FOLDER, exist_ok=True)
# Specifying JSON_FILE filename (can include a folder)
JSON_FILE = FOLDER + f"{LOCATION.split(',')[0]}-{TERM}.json"
JSON_FILE

'Data/Pawcatuck-Burgers.json'
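
The filename above is built from the text before the first comma in `LOCATION`, joined to `TERM` with a hyphen. The same logic can be sketched as a small function so it can be reused for other searches. `build_json_path` is a hypothetical name introduced here for illustration.

```python
import os


def build_json_path(folder, location, term):
    """Hypothetical helper: '<folder>/<city>-<term>.json' from 'City, ST ZIP'."""
    # Keep only the part before the first comma (the city name)
    city = location.split(",")[0].strip()
    return os.path.join(folder, f"{city}-{term}.json")


demo_json_path = build_json_path("Data/", "Pawcatuck, CT 06379", "Burgers")
print(demo_json_path)
```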

### Check if JSON File exists and Create it if it doesn't

### Make the first API call to get the first page of data

- We will use this first result to check:
    - how many total results there are?
    - Where is the actual data we want to save?
    - how many results do we get at a time?


In [160]:
# use our yelp variable's search_query method to perform our API call
results = yelp.search_query(term=TERM, location=LOCATION)
type(results)

dict

In [161]:
results.keys()

dict_keys(['businesses', 'total', 'region'])

In [162]:
type(results['businesses'])

list

In [163]:
len(results['businesses'])

20

In [164]:
results['total']

29

In [165]:
pd.DataFrame(results['businesses'])

Unnamed: 0,id,alias,name,image_url,is_closed,url,review_count,categories,rating,coordinates,transactions,price,location,phone,display_phone,distance
0,sODdKPCWnkYxQ9BxP0aNGw,b-and-b-dockside-westerly-2,B & B Dockside,https://s3-media1.fl.yelpcdn.com/bphoto/jpeD0p...,False,https://www.yelp.com/biz/b-and-b-dockside-west...,136,"[{'alias': 'burgers', 'title': 'Burgers'}, {'a...",4.5,"{'latitude': 41.3670539855957, 'longitude': -7...",[delivery],$$,"{'address1': '19 Margin St', 'address2': '', '...",14013152520,(401) 315-2520,1722.989668
1,SEUBkQCAhdkKRUjXUdVO2w,graze-burgers-westerly,Graze Burgers,https://s3-media3.fl.yelpcdn.com/bphoto/1XCzJ6...,False,https://www.yelp.com/biz/graze-burgers-westerl...,111,"[{'alias': 'burgers', 'title': 'Burgers'}, {'a...",3.5,"{'latitude': 41.37068576556191, 'longitude': -...",[delivery],,"{'address1': '127 Granite St', 'address2': Non...",14019928223,(401) 992-8223,2575.705237
2,ydxhFI0FRgLjhex8pNC5Iw,the-malted-barley-westerly,The Malted Barley,https://s3-media1.fl.yelpcdn.com/bphoto/00N8ZI...,False,https://www.yelp.com/biz/the-malted-barley-wes...,299,"[{'alias': 'bars', 'title': 'Bars'}, {'alias':...",4.5,"{'latitude': 41.3788, 'longitude': -71.8303}","[delivery, pickup]",$$,"{'address1': '42 High St', 'address2': None, '...",14013152184,(401) 315-2184,1645.153838
3,8gXaeFOak_To2t2xvtWUQw,the-shallows-westerly,The Shallows,https://s3-media3.fl.yelpcdn.com/bphoto/WzE-Br...,False,https://www.yelp.com/biz/the-shallows-westerly...,58,"[{'alias': 'newamerican', 'title': 'American (...",4.5,"{'latitude': 41.379296, 'longitude': -71.830527}",[delivery],$$,"{'address1': '54 High St', 'address2': '', 'ad...",14013155737,(401) 315-5737,1631.203716
4,AxGW4itHJXGh8oqZVsjRug,bridge-restaurant-westerly,Bridge Restaurant [Raw Bar] River Patio,https://s3-media1.fl.yelpcdn.com/bphoto/6bCQn4...,False,https://www.yelp.com/biz/bridge-restaurant-wes...,385,"[{'alias': 'newamerican', 'title': 'American (...",3.5,"{'latitude': 41.376973, 'longitude': -71.831896}",[delivery],$$,"{'address1': '37 Main St', 'address2': '', 'ad...",14013489700,(401) 348-9700,1502.652791
5,QiQbsj7umDOd_jXgGYmLIQ,surf-cantina-westerly,Surf cantina,https://s3-media2.fl.yelpcdn.com/bphoto/DJ3JVy...,False,https://www.yelp.com/biz/surf-cantina-westerly...,34,"[{'alias': 'tacos', 'title': 'Tacos'}, {'alias...",4.0,"{'latitude': 41.38028082416941, 'longitude': -...",[],$$,"{'address1': '15 Canal St', 'address2': '', 'a...",14013888626,(401) 388-8626,1696.661533
6,Ck4wvlxKt0lF4-6SyP0SpA,back-track-bar-and-grill-westerly,Back Track Bar & Grill,https://s3-media3.fl.yelpcdn.com/bphoto/7s2oD2...,False,https://www.yelp.com/biz/back-track-bar-and-gr...,4,"[{'alias': 'breakfast_brunch', 'title': 'Break...",4.5,"{'latitude': 41.38224759078481, 'longitude': -...",[delivery],,"{'address1': '13 Industrial Dr', 'address2': N...",14013480150,(401) 348-0150,1681.655927
7,9yXbtKS_pszWG4SnKZpxnw,mias-cafe-pawcatuck-5,Mia's Cafe,https://s3-media2.fl.yelpcdn.com/bphoto/NklrHg...,False,https://www.yelp.com/biz/mias-cafe-pawcatuck-5...,182,"[{'alias': 'breakfast_brunch', 'title': 'Break...",4.0,"{'latitude': 41.3774172, 'longitude': -71.8331...","[delivery, pickup]",$$,"{'address1': '1 W Broad St', 'address2': '', '...",18605993840,(860) 599-3840,1398.69461
8,_qQD9hSPFhTgzRd4X0Kvhg,cc-o-briens-sports-cafe-pawcatuck,CC O'Brien's Sports Cafe,https://s3-media2.fl.yelpcdn.com/bphoto/GHM1xT...,False,https://www.yelp.com/biz/cc-o-briens-sports-ca...,50,"[{'alias': 'newamerican', 'title': 'American (...",4.0,"{'latitude': 41.3768699, 'longitude': -71.83311}",[delivery],$$,"{'address1': '8 Mechanic St', 'address2': '', ...",18605992034,(860) 599-2034,1400.935649
9,2mRTVwcBuwKsXASmL4kokw,brazen-hen-westerly,Brazen Hen,https://s3-media3.fl.yelpcdn.com/bphoto/PzstG2...,False,https://www.yelp.com/biz/brazen-hen-westerly?a...,110,"[{'alias': 'irish', 'title': 'Irish'}, {'alias...",3.5,"{'latitude': 41.37968, 'longitude': -71.83035}","[delivery, pickup]",$$,"{'address1': '4 Canal St', 'address2': '', 'ad...",14013488100,(401) 348-8100,1677.713257


In [166]:
os.path.isfile(JSON_FILE)

True

In [167]:
## Check if JSON_FILE exists
if not os.path.isfile(JSON_FILE):
    ## If it does not exist:
    ## inform the user and save the first page of results
    print("The file does not exist. Saving the first page of results.")

    with open(JSON_FILE, 'w') as f:
        json.dump(results['businesses'], f)

## If it exists, inform user
else:
    print('File already exists.')

File already exists.


### Load JSON File and account for previous results

In [182]:
## Load previous results and use len of results for offset
prev_df = pd.read_json(JSON_FILE)
prev_df
## set offset based on previous results


ValueError: Expected object or value
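
A `ValueError: Expected object or value` from `pd.read_json` usually means pandas found no JSON to parse, for example because the file at that path is empty or missing. Judging by the cell numbers, one possible cause here is that `JSON_FILE` was redefined (cell 181) after the existence check ran (cell 167), so the new path may never have been written. A minimal sketch of a loader that tolerates both cases, using only the standard library; `load_previous_results` is a hypothetical helper name.

```python
import json
import os
import tempfile


def load_previous_results(json_file):
    """Return the list saved in json_file, or [] if it is missing or empty."""
    if not os.path.isfile(json_file) or os.path.getsize(json_file) == 0:
        return []
    with open(json_file) as f:
        return json.load(f)


# Demo: one path that does not exist, one file holding two made-up records
with tempfile.TemporaryDirectory() as tmp:
    missing = os.path.join(tmp, "missing.json")
    saved = os.path.join(tmp, "saved.json")
    with open(saved, "w") as f:
        json.dump([{"id": "a"}, {"id": "b"}], f)

    n_missing = len(load_previous_results(missing))
    n_saved = len(load_previous_results(saved))

print(n_missing, n_saved)
```

The length of the returned list can then serve as the offset for the next API call.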

In [170]:
## How many results total?
total_results = results['total']
total_results

- Where is the actual data we want to save?

In [171]:
## How many did we get the details for?
results_per_page = len(results['businesses'])
results_per_page

- Calculate how many pages of results are needed to cover the total_results

In [172]:
# Use math.ceil to round up for the total number of pages of results.
n_pages = math.ceil(total_results / results_per_page)
n_pages
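
As a worked example of the page calculation, the first call earlier in this notebook reported 29 total results and returned 20 businesses. The sketch below hard-codes those two numbers, so it runs without an API call; the `offsets` list is an illustration of where each page would start.

```python
import math

total_results = 29     # results['total'] from the first call
results_per_page = 20  # len(results['businesses']) from the first call

# Plain division gives 1.45 pages; rounding up covers the partial last page
n_pages = math.ceil(total_results / results_per_page)

# Starting index of each page, usable as the offset of each call
offsets = [page * results_per_page for page in range(n_pages)]

print(n_pages, offsets)
```

Rounding down instead would drop the last 9 results.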


In [173]:
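for i in tqdm_notebook(range(1, n_pages + 1)):
    ## The block of code we want to TRY to run
    try:
        ## brief pause so we do not send requests too quickly
        time.sleep(0.2)

        ## Read in results in progress file and check the length
        with open(JSON_FILE, 'r') as f:
            previous_results = json.load(f)

        ## save number of results to use as offset
        n_results = len(previous_results)

        ## use n_results as the OFFSET
        results = yelp.search_query(term=TERM, location=LOCATION,
                                    offset=n_results)

        ## append new results and save to file
        previous_results.extend(results['businesses'])
        with open(JSON_FILE, 'w') as f:
            json.dump(previous_results, f)

    ## What to do if we get an error/exception.
    except Exception as e:
        print('[!] ERROR: ', e)
        break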
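
The offset-based loop can be tried without spending API calls. The sketch below is an illustration under stated assumptions: `fake_search` is a made-up stand-in that mimics only the `'businesses'` and `'total'` keys seen earlier, and `collect_all` is a hypothetical helper. It reads the saved file on every pass, so a run that stops partway can resume from what is already on disk.

```python
import json
import os
import tempfile

# Made-up data standing in for 29 search results
FAKE_BUSINESSES = [{"id": f"biz-{i}"} for i in range(29)]


def fake_search(offset=0, limit=20):
    """Hypothetical stand-in for an API call returning one page of results."""
    return {"businesses": FAKE_BUSINESSES[offset:offset + limit],
            "total": len(FAKE_BUSINESSES)}


def collect_all(json_file, search, max_calls=10):
    """Fetch pages until the file holds every result, resuming from the file."""
    if not os.path.isfile(json_file):
        with open(json_file, "w") as f:
            json.dump([], f)
    for _ in range(max_calls):
        # The number of saved results is the offset of the next page
        with open(json_file) as f:
            saved = json.load(f)
        page = search(offset=len(saved))
        if not page["businesses"]:
            break
        saved.extend(page["businesses"])
        with open(json_file, "w") as f:
            json.dump(saved, f)
        if len(saved) >= page["total"]:
            break
    with open(json_file) as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as tmp:
    all_results = collect_all(os.path.join(tmp, "demo.json"), fake_search)

print(len(all_results))
```

Saving after every page costs a little extra disk I/O, but an error on a later page does not lose the pages already fetched.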

## Open the Final JSON File with Pandas

In [None]:
df = pd.read_json(JSON_FILE)
df.head()

In [None]:
## convert the filename to a .csv.gz
csv_file = JSON_FILE.replace('.json','.csv.gz')
csv_file

In [None]:
## Save it as a compressed csv (to save space)
df.to_csv(csv_file, compression='gzip', index=False)


## Bonus: compare filesize with os module's `os.path.getsize`

In [None]:
size_json = os.path.getsize(JSON_FILE)
size_csv_gz = os.path.getsize(JSON_FILE.replace('.json','.csv.gz'))

print(f'JSON FILE: {size_json:,} Bytes')
print(f'CSV.GZ FILE: {size_csv_gz:,} Bytes')

print(f'The csv.gz is {size_json/size_csv_gz:.2f} times smaller!')
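
The size comparison can also be tried on made-up data before the real files exist. This sketch swaps pandas for the standard library's `csv` and `gzip` modules so it is self-contained; the rows are invented and the exact sizes will differ from the Yelp data.

```python
import csv
import gzip
import json
import os
import tempfile

# Made-up rows standing in for the saved search results
rows = [{"id": i, "name": f"Business {i}", "rating": 4.0} for i in range(500)]

with tempfile.TemporaryDirectory() as tmp:
    json_path = os.path.join(tmp, "demo.json")
    csv_gz_path = os.path.join(tmp, "demo.csv.gz")

    with open(json_path, "w") as f:
        json.dump(rows, f)

    # gzip.open in text mode lets csv write straight into the compressed file
    with gzip.open(csv_gz_path, "wt", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "name", "rating"])
        writer.writeheader()
        writer.writerows(rows)

    demo_size_json = os.path.getsize(json_path)
    demo_size_csv_gz = os.path.getsize(csv_gz_path)

    # Read the compressed file back to confirm no rows were lost
    with gzip.open(csv_gz_path, "rt", newline="") as f:
        n_rows_back = sum(1 for _ in csv.DictReader(f))

print(f"JSON: {demo_size_json:,} Bytes")
print(f"CSV.GZ: {demo_size_csv_gz:,} Bytes")
print(f"Rows read back: {n_rows_back}")
```

Part of the saving comes from CSV writing the column names once rather than in every record, and part from gzip compressing the repeated text.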

## Next Class: Processing the Results and Mapping 