# Efficient Yelp API Calls (Core)

## Task

- For this assignment, you will be working with the Yelp API.

- As before, you will use the Yelp API to search your favorite city for a cuisine type of your choice.

- Extract all of the results from your search and compile them into one dataframe using a for loop as shown in the lesson "Code for Efficient API Extraction"

- Save your notebook, commit the change to your repository and submit the repository URL for this assignment.
    
    

## Tools You Will Use
- Part 1:
    - Yelp API:
        - Getting Started: https://www.yelp.com/developers/documentation/v3/get_started
    - `YelpAPI` Python package:
        - https://github.com/gfairchild/yelpapi

### Applying Code From
- Efficient API Calls Lesson Link: https://login.codingdojo.com/m/376/12529/88078

In [1]:
# Standard Imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Additional Imports
import os, json, math, time
from yelpapi import YelpAPI
from tqdm.notebook import tqdm_notebook

In [2]:
!pip install yelpapi --quiet


In [4]:
# Install tqdm (only need to run once)
!pip install tqdm



## 1. Registering for Required APIs


- Yelp: https://www.yelp.com/developers/documentation/v3/get_started


> Check the official API documentation to see which search arguments are available: https://www.yelp.com/developers/documentation/v3/business_search

### Load Credentials and Create Yelp API Object

In [5]:
import json
with open(r'C:\Users\ASUS TUF\Documents\GitHub\data-enrichment-wk14-activity-mapping-yelp-api-results\.secret\yelp_api.json') as f:
    creds = json.load(f)
print(creds.keys())

dict_keys(['client-id', 'api-key'])


In [23]:
# import json
# import os

# create a relative filepath
#relative_path = os.path.join('.secret', 'yelp_api.json')

In [24]:
# Load API Credentials
#with open(relative_path) as file:
#    login = json.load(file)

#print(login.keys())
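
The commented-out cells above outline loading the credentials through a relative path instead of a machine-specific absolute one. A minimal sketch of that approach follows, assuming the `.secret/yelp_api.json` layout and the `api-key` field shown in the earlier output. The `load_credentials` helper and its `required` parameter are hypothetical names used for illustration, not part of the lesson code.

```python
import json
import os


def load_credentials(path=os.path.join('.secret', 'yelp_api.json'),
                     required=('api-key',)):
    """Load API credentials from a JSON file and verify the expected keys.

    `path` defaults to a location relative to the notebook's working
    directory (an assumption based on the commented-out cell above).
    """
    with open(path) as file:
        creds = json.load(file)

    # Fail early with a clear message if a key we rely on is absent
    missing = [key for key in required if key not in creds]
    if missing:
        raise KeyError(f'Missing credential keys: {missing}')
    return creds
```

With this in place, `creds = load_credentials()` would work on any machine where the notebook sits next to a `.secret` folder, and `creds['api-key']` can be passed to `YelpAPI` as before.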

### Define Search Terms and File Paths

In [7]:
# set our API call parameters and filename before the first call
LOCATION = 'NY, NY'
TERM = 'Sushi'

# filename for saving the data
# json_filename = 'yelp_search_results.json'

In [8]:
## Specify folder for saving data
folder_path = r"C:\Users\ASUS TUF\Documents\GitHub\data-enrichment-wk14-activity-mapping-yelp-api-results"

# Specifying JSON_FILE filename (can include a folder)
# JSON_FILE = None

print(f'data will be saved to:{folder_path}')

data will be saved to:C:\Users\ASUS TUF\Documents\GitHub\data-enrichment-wk14-activity-mapping-yelp-api-results


In [9]:
# Name the file to save results
JSON_FILE = r"C:\Users\ASUS TUF\Documents\GitHub\data-enrichment-wk14-activity-mapping-yelp-api-results/results_NY_Sushi.json"
JSON_FILE

'C:\\Users\\ASUS TUF\\Documents\\GitHub\\data-enrichment-wk14-activity-mapping-yelp-api-results/results_NY_Sushi.json'

### Check if JSON File Exists and Create It if It Doesn't

In [10]:
## Check if JSON_FILE exists
file_exists = os.path.isfile(JSON_FILE)

## If it does not exist:
if not file_exists:
    
    ## CREATE ANY NEEDED FOLDERS
    # Get the Folder Name only
    folder = os.path.dirname(JSON_FILE)
    
    ## If JSON_FILE included a folder:
    if len(folder) > 0:
        # create the folder
        os.makedirs(folder, exist_ok = True)
        
    ## INFORM USER AND SAVE EMPTY LIST
    print(f'[i] {JSON_FILE} not found. Saving empty list to file.')
    
    ## save an empty list as a placeholder for future results
    with open(JSON_FILE, 'w') as file:
        json.dump([], file)
        
## If it exists, inform user
else:
    print(f'[i] {JSON_FILE} already exists.')

[i] C:\Users\ASUS TUF\Documents\GitHub\data-enrichment-wk14-activity-mapping-yelp-api-results/results_NY_Sushi.json not found. Saving empty list to file.
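
The check-and-create step above can also be wrapped in a small reusable function. The sketch below is one possible way to do that; `ensure_json_file` is a hypothetical helper name, and returning a boolean to signal whether the file was created is a design choice made here, not something the lesson prescribes.

```python
import json
import os


def ensure_json_file(path):
    """Create `path` containing an empty JSON list if it does not exist.

    Returns True if the file was created, False if it was already there.
    """
    if os.path.isfile(path):
        return False

    # Create the parent folder only when the path includes one
    folder = os.path.dirname(path)
    if folder:
        os.makedirs(folder, exist_ok=True)

    # An empty list is a valid starting point for appending results later
    with open(path, 'w') as file:
        json.dump([], file)
    return True
```

Calling `ensure_json_file(JSON_FILE)` before the first API request would leave an existing results file untouched, so a rerun can resume from the saved results.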


### Load JSON File and Account for Previous Results

In [11]:
## Load previous results and use len of results for offset
with open(JSON_FILE, 'r') as file:
    previous_results = json.load(file)
    
## set offset based on previous results
n_results = len(previous_results)

print(f'- {n_results} previous results found.')

- 0 previous results found.


### Make the first API call to get the first page of data

- We will use this first result to check:
    - How many total results are there?
    - Where is the actual data we want to save?
    - How many results do we get at a time?


In [14]:
# Instantiate YelpAPI Variable
yelp_api = YelpAPI(creds['api-key'], timeout_s=5.0)

In [15]:
# use our yelp_api variable's search_query method to perform our API call
results = yelp_api.search_query(location = LOCATION,
                                term = TERM,
                                offset = n_results)

results.keys()

dict_keys(['businesses', 'total', 'region'])

- Where is the actual data we want to save?

In [16]:
business_data = results['businesses']

# specify the filename where you want to save the data
json_file_path = JSON_FILE

# save the business data to a JSON file
with open(json_file_path, 'w') as file:
    json.dump(business_data, file, indent = 4)

In [17]:
## How many did we get the details for?
results_per_page = len(business_data)
print(f'number of results retrieved per page {results_per_page}')

number of results retrieved per page 20


In [19]:
## How many results total?
total_results = results['total']
total_results

4500

- Calculate how many pages of results are needed to cover `total_results`.

In [20]:
# Use math.ceil to round up for the total number of pages of results.
n_pages = math.ceil(total_results / results_per_page)

print(f'Total number of pages: {n_pages}')

Total number of pages: 225
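
As a worked check of the arithmetic, the sketch below reproduces the page count and also shows what happens when the number of reachable results is capped. The `pages_needed` function and its `max_results` parameter are hypothetical; the value 1000 is an assumption inferred from the run later in this notebook, where requests with `offset=1000` are rejected with a 400 error.

```python
import math


def pages_needed(total_results, results_per_page, max_results=None):
    """Return how many pages are needed to cover the reachable results."""
    if results_per_page <= 0:
        raise ValueError('results_per_page must be positive')

    # Optionally cap the total at the number of results the API will serve
    reachable = total_results
    if max_results is not None:
        reachable = min(total_results, max_results)

    # Round up so a partially filled last page is still requested
    return math.ceil(reachable / results_per_page)


print(pages_needed(4500, 20))        # 225, matching the output above
print(pages_needed(4500, 20, 1000))  # 50 pages if only 1000 results are reachable
```

Under that assumption, only 50 of the 225 calculated pages would return data.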


In [21]:
# Request one page per iteration; n_pages was calculated above as
# math.ceil(total_results / results_per_page)
for i in tqdm_notebook(range(1, n_pages + 1)):

    ## The block of code we want to TRY to run
    try:        
        # Introduce a short delay to respect API rate limits
        time.sleep(0.2)
        
        ## Read in results in progress file and check the length
        with open(JSON_FILE, 'r') as file:
            previous_results = json.load(file)
        
        ## Save number of results to use as offset
        n_results = len(previous_results)
        
        ## Use n_results as the OFFSET 
        results = yelp_api.search_query(location = LOCATION,
                                        term = TERM,
                                        offset = n_results)

        ## Append new results and save to file
        previous_results.extend(results['businesses'])
        with open(JSON_FILE, 'w') as file:
            json.dump(previous_results, file)
            
    ## What to do if we get an error/exception.
    except Exception as e:
        # check if we are at rate limit
        if 'Too Many Requests for url' in str(e):
            print('Rate limit exceeded. Stop data collection.')
            break
        else:
            # any other error is reported and the loop moves on to the next iteration
            print(f'an error occurred {e}')
            continue            

  0%|          | 0/225 [00:00<?, ?it/s]

an error occurred 400 Client Error: Bad Request for url: https://api.yelp.com/v3/businesses/search?location=NY%2C+NY&term=Sushi&offset=1000
an error occurred 400 Client Error: Bad Request for url: https://api.yelp.com/v3/businesses/search?location=NY%2C+NY&term=Sushi&offset=1000
an error occurred 400 Client Error: Bad Request for url: https://api.yelp.com/v3/businesses/search?location=NY%2C+NY&term=Sushi&offset=1000
an error occurred 400 Client Error: Bad Request for url: https://api.yelp.com/v3/businesses/search?location=NY%2C+NY&term=Sushi&offset=1000
an error occurred 400 Client Error: Bad Request for url: https://api.yelp.com/v3/businesses/search?location=NY%2C+NY&term=Sushi&offset=1000
an error occurred 400 Client Error: Bad Request for url: https://api.yelp.com/v3/businesses/search?location=NY%2C+NY&term=Sushi&offset=1000
an error occurred 400 Client Error: Bad Request for url: https://api.yelp.com/v3/businesses/search?location=NY%2C+NY&term=Sushi&offset=1000
an error occurred 400 Client
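
The repeated 400 errors above all occur at `offset=1000`, which suggests the API stops serving results at that point, so every remaining iteration spends a request that cannot succeed. One way to avoid this is to stop as soon as the cap is reached or a page comes back empty. The sketch below illustrates the idea under stated assumptions: `fetch_all`, `page_size`, `max_results`, and `pause_s` are hypothetical names, the cap of 1000 is inferred from the output above, and `search` stands in for any callable that takes an offset and a limit and returns a dict with `businesses` and `total` keys.

```python
import time


def fetch_all(search, start_offset=0, page_size=20, max_results=1000, pause_s=0.0):
    """Collect paged results until the cap, the total, or an empty page is hit.

    `search(offset, limit)` must return a dict with a 'businesses' list and,
    optionally, a 'total' count.
    """
    collected = []
    offset = start_offset

    while offset < max_results:
        # Never ask for results beyond the assumed cap
        limit = min(page_size, max_results - offset)
        page = search(offset, limit)

        businesses = page.get('businesses', [])
        if not businesses:
            break  # an empty page means there is nothing left to fetch

        collected.extend(businesses)
        offset += len(businesses)

        # Stop once every available result has been collected
        if offset >= page.get('total', max_results):
            break

        time.sleep(pause_s)  # short pause between requests

    return collected
```

With the client from this notebook, `search` could be `lambda offset, limit: yelp_api.search_query(location=LOCATION, term=TERM, offset=offset, limit=limit)`. Unlike the loop above, this sketch keeps results in memory rather than writing to `JSON_FILE` after each page, so it trades resumability for simplicity.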

## Open the Final JSON File with Pandas

In [22]:
df = pd.read_json(JSON_FILE)
display(df.head(), df.tail())

Unnamed: 0,id,alias,name,image_url,is_closed,url,review_count,categories,rating,coordinates,transactions,location,phone,display_phone,distance,price
0,k8YJDry6_pbPIiCPeK-6mQ,mikado-brooklyn,Mikado,https://s3-media2.fl.yelpcdn.com/bphoto/MNIqGY...,False,https://www.yelp.com/biz/mikado-brooklyn?adjus...,63,"[{'alias': 'japanese', 'title': 'Japanese'}, {...",4.7,"{'latitude': 40.69032888195518, 'longitude': -...","[pickup, delivery]","{'address1': '177 Atlantic Ave', 'address2': N...",12123811388,(212) 381-1388,1681.307993,
1,lv1zewxaq-zfYsKoWydsWQ,mr-sushi-brooklyn-4,Mr Sushi,https://s3-media4.fl.yelpcdn.com/bphoto/NXbUWO...,False,https://www.yelp.com/biz/mr-sushi-brooklyn-4?a...,41,"[{'alias': 'japanese', 'title': 'Japanese'}, {...",4.9,"{'latitude': 40.71424074485434, 'longitude': -...","[pickup, delivery]","{'address1': '331 Graham Ave', 'address2': Non...",13476894141,(347) 689-4141,4298.671833,
2,Q_81mthGMrQM6Qg-QdnOkA,mikado-new-york-5,Mikado,https://s3-media3.fl.yelpcdn.com/bphoto/UFSd49...,False,https://www.yelp.com/biz/mikado-new-york-5?adj...,20,"[{'alias': 'sushi', 'title': 'Sushi Bars'}]",4.9,"{'latitude': 40.70589172111019, 'longitude': -...","[pickup, delivery]","{'address1': '164 Pearl St', 'address2': None,...",12128717499,(212) 871-7499,1106.219054,
3,AudBWxeAr3zHr1ITrTcVpg,izakaya-fuku-jackson-heights,Izakaya Fuku,https://s3-media1.fl.yelpcdn.com/bphoto/mVURSC...,False,https://www.yelp.com/biz/izakaya-fuku-jackson-...,322,"[{'alias': 'japanese', 'title': 'Japanese'}]",4.4,"{'latitude': 40.74640809234451, 'longitude': -...","[delivery, pickup, restaurant_reservation]","{'address1': '71-28 Roosevelt Ave', 'address2'...",17182551120,(718) 255-1120,9609.34191,$$
4,g8loNCHFIviiXEMN4nbKkA,sushi-295-mercer-new-york,Sushi 295 Mercer,https://s3-media1.fl.yelpcdn.com/bphoto/JIeOyS...,False,https://www.yelp.com/biz/sushi-295-mercer-new-...,36,"[{'alias': 'sushi', 'title': 'Sushi Bars'}]",4.9,"{'latitude': 40.730331, 'longitude': -73.993914}","[pickup, delivery]","{'address1': '295 Mercer St', 'address2': None...",12127770295,(212) 777-0295,2770.936494,


Unnamed: 0,id,alias,name,image_url,is_closed,url,review_count,categories,rating,coordinates,transactions,location,phone,display_phone,distance,price
995,479rycCXZojMT9S0cl8Oxg,yakiniku-gen-new-york,Yakiniku Gen,https://s3-media1.fl.yelpcdn.com/bphoto/0CZBc5...,False,https://www.yelp.com/biz/yakiniku-gen-new-york...,176,"[{'alias': 'japanese', 'title': 'Japanese'}, {...",3.8,"{'latitude': 40.75611, 'longitude': -73.96815}","[pickup, delivery]","{'address1': '250 E 52nd St', 'address2': None...",12126021129.0,(212) 602-1129,6046.752812,$$$
996,S-QizulX5qYwqUWUudTlkw,bondi-sushi-new-york-4,Bondi Sushi,https://s3-media4.fl.yelpcdn.com/bphoto/-kY_GC...,False,https://www.yelp.com/biz/bondi-sushi-new-york-...,24,"[{'alias': 'sushi', 'title': 'Sushi Bars'}]",2.7,"{'latitude': 40.71518511014587, 'longitude': -...","[pickup, delivery]","{'address1': '275 Greenwich St', 'address2': '...",,,1773.221927,
997,jCettteufZZP9bAx5Pxovw,19-juku-omakase-new-york-2,19 @ Juku - Omakase,https://s3-media3.fl.yelpcdn.com/bphoto/DG-Rqe...,False,https://www.yelp.com/biz/19-juku-omakase-new-y...,40,"[{'alias': 'sushi', 'title': 'Sushi Bars'}, {'...",4.4,"{'latitude': 40.7146, 'longitude': -73.9995299}",[],"{'address1': '32 Mulberry St', 'address2': '',...",16465902111.0,(646) 590-2111,1105.275693,
998,tznio-FInuk87Y15Dgq5iw,udon-st-marks-new-york-2,Udon St Marks,https://s3-media2.fl.yelpcdn.com/bphoto/v7DD1z...,False,https://www.yelp.com/biz/udon-st-marks-new-yor...,940,"[{'alias': 'japanese', 'title': 'Japanese'}, {...",3.7,"{'latitude': 40.729424, 'longitude': -73.989004}","[pickup, delivery]","{'address1': '11 St Marks Pl', 'address2': '',...",12129229677.0,(212) 922-9677,2706.058999,$$
999,fcCF0Kh_RCbv6KSVlb-Seg,sticky-rice-new-york,Sticky Rice,https://s3-media3.fl.yelpcdn.com/bphoto/z09IeM...,False,https://www.yelp.com/biz/sticky-rice-new-york?...,438,"[{'alias': 'thai', 'title': 'Thai'}, {'alias':...",3.9,"{'latitude': 40.71798, 'longitude': -73.99042}","[pickup, delivery]","{'address1': '85 Orchard St', 'address2': '', ...",12122748208.0,(212) 274-8208,1439.783276,$$


In [23]:
# check for duplicate IDs
df.duplicated(subset = 'id').sum()

0

In [24]:
# There are no duplicates to remove
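
Had the check reported duplicates, they would need to be dropped before saving. In pandas this is typically done with `df.drop_duplicates(subset = 'id')`. The sketch below shows the same idea on the raw list of records loaded from the JSON file, keeping the first occurrence of each ID; `dedupe_by_id` is a hypothetical helper name.

```python
def dedupe_by_id(records, key='id'):
    """Return records with duplicate `key` values removed, keeping the first."""
    seen = set()
    unique = []
    for record in records:
        record_id = record[key]
        # Keep a record only the first time its ID appears
        if record_id not in seen:
            seen.add(record_id)
            unique.append(record)
    return unique


sample = [{'id': 'a', 'name': 'Mikado'},
          {'id': 'b', 'name': 'Mr Sushi'},
          {'id': 'a', 'name': 'Mikado'}]
print(len(dedupe_by_id(sample)))  # 2
```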

In [25]:
## convert the filename to a .csv.gz
csv_file = JSON_FILE.replace('.json','.csv.gz')
csv_file

'C:\\Users\\ASUS TUF\\Documents\\GitHub\\data-enrichment-wk14-activity-mapping-yelp-api-results/results_NY_Sushi.csv.gz'

## Creating a Relative File Path

In [26]:
# specify directory and filename
directory = 'Data'
# include the .csv.gz extension so the filename matches the gzip-compressed contents
filename = 'final_results_NY_Sushi.csv.gz'
path = os.path.join(directory, filename)

# ensure that the 'Data' directory exists
os.makedirs(directory, exist_ok = True)

## Save it as a compressed csv (to save space)
df.to_csv(path, compression = 'gzip', index = False)

In [27]:
# Step 1: Correctly Save the JSON File
json_file = 'Data/final_results_NY_Sushi.json'  # Specify the correct JSON file name
os.makedirs('Data', exist_ok = True)  # Ensure the Data directory exists
df.to_json(json_file, orient = 'records', lines = True)  # Save the DataFrame as JSON

# Step 2: Convert and Save as .CSV.GZ
csv_gz_file = json_file.replace('.json', '.csv.gz')  # Create the CSV.GZ file name based on the JSON file name
df.to_csv(csv_gz_file, compression = 'gzip', index = False)  # Save the DataFrame as compressed CSV

# Step 3: Compare File Sizes
if os.path.exists(json_file) and os.path.exists(csv_gz_file):
    size_json = os.path.getsize(json_file)
    size_csv_gz = os.path.getsize(csv_gz_file)

    print(f'JSON FILE: {size_json:,} Bytes')
    print(f'CSV.GZ FILE: {size_csv_gz:,} Bytes')

    if size_csv_gz > 0:
        compression_ratio = size_json / size_csv_gz
        print(f'The csv.gz file is {compression_ratio:.2f} times smaller than the JSON file.')
    else:
        print("CSV.GZ file size is 0, cannot compare sizes.")
else:
    print("One or both files do not exist, check file paths.")

JSON FILE: 905,563 Bytes
CSV.GZ FILE: 139,889 Bytes
The csv.gz file is 6.47 times smaller than the JSON file.
