# Mapping Yelp Search Results - Part 1

- 04/26/22


## Objective

- For this CodeAlong, we will be working with the Yelp API. 
- You will use the Yelp API to search your home town for a cuisine type of your choice.
- Next class, we will use Plotly Express with the Mapbox API to map the results.
    
    

## Tools You Will Use
- Part 1:
    - Yelp API:
        - Getting Started: 
            - https://www.yelp.com/developers/documentation/v3/get_started

    - `yelpapi` Python package (provides the `YelpAPI` class):
        - https://github.com/gfairchild/yelpapi
- Part 2:

    - Plotly Express: https://plotly.com/python/getting-started/
        - With Mapbox API: https://www.mapbox.com/
        - `px.scatter_mapbox` [Documentation](https://plotly.com/python/scattermapbox/)




### Applying Code From
- Efficient API Calls Lesson Link: https://login.codingdojo.com/m/376/12529/88078

In [20]:
# Standard Imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Additional Imports
import os, json, math, time
from yelpapi import YelpAPI
from tqdm.notebook import tqdm_notebook

## 1. Registering for Required APIs


- Yelp: https://www.yelp.com/developers/documentation/v3/get_started


> Check the official API documentation to know what arguments we can search for: https://www.yelp.com/developers/documentation/v3/business_search

### Load Credentials and Create Yelp API Object

In [21]:
# Load API Credentials
with open('/Users/chas/.secret/yelp_api.json', 'r') as f:
    login = json.load(f)
login.keys()

dict_keys(['client-id', 'api-key'])
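The credentials file is expected to be a JSON object containing at least an `api-key` entry (the key names are taken from the output above). As a minimal sketch, a small loader can fail early with a readable message when that key is missing. `load_credentials` is a hypothetical helper written for this lesson; it is not part of the `yelpapi` package.

```python
import json


def load_credentials(path, required_keys=("api-key",)):
    """Load a JSON credentials file and check that the expected keys exist.

    Hypothetical helper for this lesson; not part of the yelpapi package.
    """
    with open(path, "r") as f:
        creds = json.load(f)
    # Collect any expected keys that are absent from the file
    missing = [key for key in required_keys if key not in creds]
    if missing:
        raise KeyError(f"{path} is missing required keys: {missing}")
    return creds
```

With this in place, `login = load_credentials('/Users/chas/.secret/yelp_api.json')` would behave like the cell above, but raise a `KeyError` naming the missing key instead of failing later inside the API call.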

In [22]:
# Instantiate YelpAPI Variable
yelpapi = YelpAPI(login['api-key'], timeout_s=5)
yelpapi

<yelpapi.yelpapi.YelpAPI at 0x7fee14e35250>

### Define Search Terms and File Paths

In [23]:
# set our API call parameters and filename before the first call
LOCATION = 'Reading, PA, 19601'
TERM = 'Chinese'

In [24]:
## Specify folder for saving data
FOLDER = 'Data/'
os.makedirs(FOLDER, exist_ok=True)

In [25]:
LOCATION.split(',')[0]

'Reading'

In [26]:
# Specifying JSON_FILE filename (can include a folder)
JSON_FILE = f"{FOLDER}{TERM}_{LOCATION.split(',')[0]}.json"
JSON_FILE

'Data/Chinese_Reading.json'
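The filename logic above can also be wrapped in a small function so it is easy to reuse with a different term or location. This is only a sketch: `build_json_path` is a hypothetical name, and replacing spaces with underscores is an added assumption (useful for cities such as "New York"), not something the cells above do.

```python
import os


def build_json_path(folder, term, location):
    """Build '<folder>/<term>_<city>.json' from a search term and location string."""
    # Keep only the text before the first comma (the city) and trim whitespace
    city = location.split(",")[0].strip()
    # Assumption: replace spaces so multi-word cities or terms give a tidy filename
    filename = f"{term}_{city}.json".replace(" ", "_")
    return os.path.join(folder, filename)


print(build_json_path("Data", "Chinese", "Reading, PA, 19601"))
```

On macOS or Linux this should reproduce the same path as `JSON_FILE` above; on Windows `os.path.join` would use a backslash as the separator.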

### Check Whether the JSON File Exists and Create It If It Doesn't

In [27]:
## Check if JSON_FILE exists
file_exists = os.path.isfile(JSON_FILE)
## If it does not exist: 
if not file_exists:
    ## CREATE ANY NEEDED FOLDERS
    # Get the Folder Name only
    folder = os.path.dirname(JSON_FILE)
    
    ## If JSON_FILE included a folder:
    if len(folder)>0:
        # create the folder
        os.makedirs(folder,exist_ok=True)
        
        
    ## INFORM USER AND SAVE EMPTY LIST
    print(f"[i] {JSON_FILE} not found. Saving empty list to file.")
    
    #saving empty file
    with open(JSON_FILE, 'w') as f:
        json.dump([], f)
        
## If it exists, inform user
else:
    print(f"[i] {JSON_FILE} already exists.")

[i] Data/Chinese_Reading.json already exists.
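The same check-then-create logic can be packaged as a function, which makes it easier to reuse for other searches. The sketch below assumes a hypothetical helper name, `ensure_json_file`, and returns a flag instead of printing.

```python
import json
import os


def ensure_json_file(path):
    """Create `path` holding an empty list if it does not exist yet.

    Returns True when the file was created and False when it was already there.
    """
    if os.path.isfile(path):
        return False
    # Create the parent folder only when the path actually includes one
    folder = os.path.dirname(path)
    if folder:
        os.makedirs(folder, exist_ok=True)
    # Start the file off as an empty JSON list
    with open(path, "w") as f:
        json.dump([], f)
    return True
```

Calling `ensure_json_file(JSON_FILE)` twice in a row would return `True` the first time (file created) and `False` the second time (file left untouched).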


### Load JSON File and Account for Previous Results

In [28]:
## Load previous results; their count is used as the offset for the next API call
with open(JSON_FILE, 'r') as f:
    previous_results = json.load(f)
## set offset based on previous results
n_results = len(previous_results)
print(f"-{n_results} previous results found")

-0 previous results found


### Make the first API call to get the first page of data

- We will use this first result to check:
    - How many total results are there?
    - Where is the actual data we want to save?
    - How many results do we get at a time?


In [29]:
# use our yelpapi object's search_query method to perform our API call
results = yelpapi.search_query(location = LOCATION, 
                              term = TERM)
results.keys()

dict_keys(['businesses', 'total', 'region'])

In [30]:
## How many results total?
results['total']

36

- Where is the actual data we want to save?

In [31]:
type(results['businesses'])

list

In [32]:
results['businesses']

[{'id': 'bGoiESr-CoxvhLy3wE_EMA',
  'alias': 'great-wall-chinese-restaurant-west-reading-2',
  'name': 'Great Wall Chinese Restaurant',
  'image_url': 'https://s3-media2.fl.yelpcdn.com/bphoto/OVeudbsNLduHBici992cHA/o.jpg',
  'is_closed': False,
  'url': 'https://www.yelp.com/biz/great-wall-chinese-restaurant-west-reading-2?adjust_creative=pmEh1CnvTrbj1V5gF4_06w&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=pmEh1CnvTrbj1V5gF4_06w',
  'review_count': 26,
  'categories': [{'alias': 'cantonese', 'title': 'Cantonese'}],
  'rating': 3.5,
  'coordinates': {'latitude': 40.33542, 'longitude': -75.94889},
  'transactions': ['delivery'],
  'price': '$',
  'location': {'address1': '532 Penn Ave',
   'address2': '',
   'address3': '',
   'city': 'West Reading',
   'zip_code': '19611',
   'country': 'US',
   'state': 'PA',
   'display_address': ['532 Penn Ave', 'West Reading, PA 19611']},
  'phone': '+16106858585',
  'display_phone': '(610) 685-8585',
  'distance': 2447.80181
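Each business is a dictionary, and some fields (such as `coordinates` and `location`) are themselves dictionaries. As a rough sketch of how those nested values can be reached, the example below uses a trimmed copy of the record shown above; `summarize` is a hypothetical helper, and the choice of fields is only an illustration.

```python
# One business record, trimmed to a few of the fields shown in the output above
business = {
    "name": "Great Wall Chinese Restaurant",
    "rating": 3.5,
    "coordinates": {"latitude": 40.33542, "longitude": -75.94889},
    "location": {"city": "West Reading", "state": "PA"},
}


def summarize(biz):
    """Pull a few top-level and nested fields out of one business record."""
    # Fall back to empty dicts so a missing or null field does not raise
    coords = biz.get("coordinates") or {}
    location = biz.get("location") or {}
    return {
        "name": biz.get("name"),
        "latitude": coords.get("latitude"),
        "longitude": coords.get("longitude"),
        "city": location.get("city"),
    }


print(summarize(business))
```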

In [33]:
## How many did we get the details for?
results_per_page = len(results['businesses'])
results_per_page

20

- Calculate how many pages of results are needed to cover the total number of results.

In [34]:
# Use math.ceil to round up for the total number of pages of results.
n_pages = math.ceil(results['total']/results_per_page)
n_pages

2
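To make the arithmetic concrete: 36 results at 20 per page is 1.8 pages, which rounds up to 2. The sketch below also lists the offset each page would start from, assuming offsets count from zero. `page_offsets` is a hypothetical helper, not part of `yelpapi`.

```python
import math


def page_offsets(total, per_page):
    """Return the starting offset to request for each page of results."""
    n_pages = math.ceil(total / per_page)
    return [page * per_page for page in range(n_pages)]


print(page_offsets(36, 20))  # → [0, 20]
```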

In [35]:
for i in tqdm_notebook(range(0, n_pages + 1)):
    ## The block of code we want to TRY to run
    try:
        ## Brief pause so we do not send requests too quickly
        time.sleep(.2)
        ## Read in results in progress file and check the length
        with open(JSON_FILE, 'r') as f:
            previous_results = json.load(f)
        ## set offset based on previous results
        n_results = len(previous_results)

        ## use n_results as the OFFSET (the offset counts from zero, so the
        ## next unseen result is at n_results; n_results + 1 would skip one)
        results = yelpapi.search_query(location=LOCATION,
                                       term=TERM,
                                       offset=n_results)

        ## append new results and save to file
        previous_results.extend(results['businesses'])

        with open(JSON_FILE, 'w') as f:
            json.dump(previous_results, f)
    ## What to do if we get an error/exception.
    except Exception as e:
        print(e)
        break


  0%|          | 0/3 [00:00<?, ?it/s]
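The offset logic can be tried out without an API key by swapping the real call for a stand-in. In the sketch below, `fake_search` and `collect_all` are hypothetical functions that only imitate the shape of a search response (`total` and `businesses`); they are not part of `yelpapi`, and the record ids are made up.

```python
def fake_search(offset, limit=20, total=36):
    """Stand-in for an API call: returns one page of fake business records."""
    ids = range(offset, min(offset + limit, total))
    return {"total": total, "businesses": [{"id": f"biz-{i}"} for i in ids]}


def collect_all(search, max_pages=10):
    """Request pages until one comes back empty, using len(collected) as the offset."""
    collected = []
    for _ in range(max_pages):
        page = search(offset=len(collected))
        # An empty page means there is nothing left to fetch
        if not page["businesses"]:
            break
        collected.extend(page["businesses"])
    return collected


businesses = collect_all(fake_search)
print(len(businesses))  # → 36
```

Using the number of records already collected as the offset means each request picks up exactly where the previous one ended, so no record is skipped or fetched twice.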

## Open the Final JSON File with Pandas

In [37]:
df = pd.read_json(JSON_FILE)
df

Unnamed: 0,id,alias,name,image_url,is_closed,url,review_count,categories,rating,coordinates,transactions,price,location,phone,display_phone,distance
0,CVnvnYpDu3DuFBEnTiN5_g,wongs-chinese-restaurant-reading,Wong's Chinese Restaurant,https://s3-media2.fl.yelpcdn.com/bphoto/bYG6bW...,False,https://www.yelp.com/biz/wongs-chinese-restaur...,14,"[{'alias': 'chinese', 'title': 'Chinese'}, {'a...",4.5,"{'latitude': 40.365425, 'longitude': -75.919975}",[delivery],$,"{'address1': '835-5 Hiesters Ln', 'address2': ...",16109291999,(610) 929-1999,2062.233179
1,x4aZiu4up011rNrkS-F_aQ,fortune-cafe-wyomissing-2,Fortune Cafe,https://s3-media1.fl.yelpcdn.com/bphoto/7rInx1...,False,https://www.yelp.com/biz/fortune-cafe-wyomissi...,18,"[{'alias': 'chinese', 'title': 'Chinese'}]",4.5,"{'latitude': 40.3423, 'longitude': -75.9809}",[delivery],,"{'address1': '1177 Berkshire Blvd', 'address2'...",16103965329,(610) 396-5329,3721.789492
2,-kVxn8rVRPkcbRChaMQF9A,wonderful-chinese-restaurant-reading,Wonderful Chinese Restaurant,https://s3-media4.fl.yelpcdn.com/bphoto/q5t1vH...,False,https://www.yelp.com/biz/wonderful-chinese-res...,19,"[{'alias': 'chinese', 'title': 'Chinese'}, {'a...",4.5,"{'latitude': 40.3262672424316, 'longitude': -7...",[delivery],$,"{'address1': '4301 Penn Ave', 'address2': None...",16106788388,(610) 678-8388,8194.366784
3,0AXHF41K7QZcfWlvCJYXzw,no-1-chinese-restaurant-reading-4,No.1 Chinese Restaurant,https://s3-media2.fl.yelpcdn.com/bphoto/WQv6hO...,False,https://www.yelp.com/biz/no-1-chinese-restaura...,30,"[{'alias': 'chinese', 'title': 'Chinese'}]",4.5,"{'latitude': 40.2987075, 'longitude': -75.9473...",[delivery],$,"{'address1': '160 Kenhorst Plz', 'address2': '...",16107750875,(610) 775-0875,6377.472186
4,UDjJEM5xlmmxseGERcihNw,chans-chinese-restaurant-reading,Chan's Chinese Restaurant,https://s3-media2.fl.yelpcdn.com/bphoto/UIzJqI...,False,https://www.yelp.com/biz/chans-chinese-restaur...,6,"[{'alias': 'chinese', 'title': 'Chinese'}]",4.5,"{'latitude': 40.3321, 'longitude': -75.92966}",[delivery],$$,"{'address1': '201 S 4th St', 'address2': '', '...",16103721200,(610) 372-1200,2911.877238
5,Dx1Ay1Vtk8eCZCGRHHQoyw,mikura-reading,Mikura,https://s3-media2.fl.yelpcdn.com/bphoto/r5gkRd...,False,https://www.yelp.com/biz/mikura-reading?adjust...,106,"[{'alias': 'asianfusion', 'title': 'Asian Fusi...",3.5,"{'latitude': 40.3464, 'longitude': -75.95944}","[pickup, delivery]",$$,"{'address1': '840 N Park Rd', 'address2': '', ...",16103735851,(610) 373-5851,1910.164228
6,qAT7gKm5Ndi0e0prictoOA,china-wok-reading,China Wok,https://s3-media2.fl.yelpcdn.com/bphoto/dIhrAS...,False,https://www.yelp.com/biz/china-wok-reading?adj...,15,"[{'alias': 'chinese', 'title': 'Chinese'}]",3.5,"{'latitude': 40.38523, 'longitude': -75.92713}",[delivery],$,"{'address1': '3401 N 5th Street Hwy', 'address...",16109298383,(610) 929-8383,3391.138994
7,nWSYTpcPOTmoWAw-_6vpvA,tokyo-hibachi-and-bar-wyomissing,Tokyo Hibachi & Bar,https://s3-media2.fl.yelpcdn.com/bphoto/6BXRDe...,False,https://www.yelp.com/biz/tokyo-hibachi-and-bar...,128,"[{'alias': 'chinese', 'title': 'Chinese'}, {'a...",3.5,"{'latitude': 40.34523, 'longitude': -75.96808}","[pickup, delivery]",$$,"{'address1': '960 Woodland Rd', 'address2': ''...",16106852888,(610) 685-2888,2628.777696
8,Qwd8unlaye8Oov7qPmU3uA,china-fun-restaurant-reading,China Fun Restaurant,https://s3-media1.fl.yelpcdn.com/bphoto/kXEyFd...,False,https://www.yelp.com/biz/china-fun-restaurant-...,13,"[{'alias': 'chinese', 'title': 'Chinese'}]",4.0,"{'latitude': 40.3867987, 'longitude': -75.9345...",[delivery],$$,"{'address1': '3611 Pottsville Pike', 'address2...",16109390775,(610) 939-0775,3417.693143
9,Uvf6IJ3rtIoUf17tQAgdeg,great-china-reading,Great China,https://s3-media4.fl.yelpcdn.com/bphoto/OryjzC...,False,https://www.yelp.com/biz/great-china-reading?a...,20,"[{'alias': 'chinese', 'title': 'Chinese'}]",4.0,"{'latitude': 40.3160268, 'longitude': -76.0009...",[delivery],$,"{'address1': '2675 Shillington Rd', 'address2'...",16106789222,(610) 678-9222,6755.290103


In [38]:
# Check for duplicate business ids
df.duplicated(subset='id').sum()

0
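No duplicates were found here. If a rerun ever did append the same page twice, the records could be de-duplicated by `id` before saving. The following is a sketch in plain Python that works on the list of business dictionaries; `drop_duplicate_ids` is a hypothetical helper and the sample ids are made up.

```python
def drop_duplicate_ids(businesses):
    """Keep the first record seen for each 'id', preserving the original order."""
    seen = set()
    unique = []
    for biz in businesses:
        if biz["id"] not in seen:
            seen.add(biz["id"])
            unique.append(biz)
    return unique


sample = [{"id": "a"}, {"id": "b"}, {"id": "a"}]
print(drop_duplicate_ids(sample))  # → [{'id': 'a'}, {'id': 'b'}]
```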

In [39]:
## convert the filename to a .csv.gz
csv_file = JSON_FILE.replace('.json','.csv.gz')
csv_file

'Data/Chinese_Reading.csv.gz'

In [40]:
## Save it as a compressed csv (to save space)
df.to_csv(csv_file, compression='gzip', index=False)

## Bonus: Compare File Sizes with `os.path.getsize`

In [41]:
size_json = os.path.getsize(JSON_FILE)
size_csv_gz = os.path.getsize(csv_file)

print(f'JSON FILE: {size_json:,} Bytes')
print(f'CSV.GZ FILE: {size_csv_gz:,} Bytes')

print(f'the csv.gz is {size_json/size_csv_gz} times smaller!')

JSON FILE: 32,808 Bytes
CSV.GZ FILE: 5,180 Bytes
the csv.gz is 6.333590733590734 times smaller!
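Much of the saving comes from gzip compressing text that repeats itself, such as the same keys appearing in every record. The sketch below illustrates this in memory with the standard-library `gzip` module; the records are made-up placeholders, so the exact sizes and ratio will differ from the files above.

```python
import gzip
import json

# Fake records with repeated keys, similar in shape to API results (made-up values)
records = [
    {"id": f"biz-{i}", "name": "Example Restaurant", "rating": 4.0}
    for i in range(200)
]

# Serialize to JSON text, then compress the bytes with gzip
raw = json.dumps(records).encode("utf-8")
compressed = gzip.compress(raw)

print(f"raw:  {len(raw):,} bytes")
print(f"gzip: {len(compressed):,} bytes")
print(f"ratio: {len(raw) / len(compressed):.1f}x")
```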


## Next Class: Processing the Results and Mapping 