# Part 1 - Extracting and Saving Data from Yelp API

## Obective

- For this CodeAlong, we will be working with the Yelp API. 
- You will use the the Yelp API to search your home town for a cuisine type of your choice.
- Next class, we will then use Plotly Express to create a map with the Mapbox API to visualize the results.
    
    

## Tools You Will Use
- Part 1:
    - Yelp API:
        - Getting Started: 
            - https://www.yelp.com/developers/documentation/v3/get_started

    - `YelpAPI` python package
        -  "YelpAPI": https://github.com/gfairchild/yelpapi
- Part 2:

    - Plotly Express: https://plotly.com/python/getting-started/
        - With Mapbox API: https://www.mapbox.com/
        - `px.scatter_mapbox` [Documentation](https://plotly.com/python/scattermapbox/): 




### Applying Code From
- Efficient API Calls Lesson Link: https://login.codingdojo.com/m/376/12529/88078

In [1]:
# Standard Imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Additional Imports
import os, json, math, time
from yelpapi import YelpAPI
from tqdm.notebook import tqdm_notebook

## 1. Registering for Required APIs


- Yelp: https://www.yelp.com/developers/documentation/v3/get_started


> Check the official API documentation to know what arguments we can search for: https://www.yelp.com/developers/documentation/v3/business_search

### Load Credentials and Create Yelp API Object

In [2]:
# Load API Credentials
with open("/Users/Beemo/.secret/yelp_api.json") as f:
    login = json.load(f)
login.keys()

dict_keys(['client-id', 'api-key'])

In [3]:
# Instantiate YelpAPI Variable
from yelpapi import YelpAPI
yelp = YelpAPI(login['api-key'], timeout_s=5.0)
yelp

<yelpapi.yelpapi.YelpAPI at 0x1470f1f5cd0>

### Define Search Terms and File Paths

In [4]:
# set our API call parameters and filename before the first call
location = "Chicago, IL, 60647"
term = 'American'

In [5]:
## Specify fodler for saving data
folder = "Data/"
os.makedirs(folder,exist_ok=True)
#exist_ok = true creates an exception'

# Specifying JSON_FILE filename (can include a folder)
JSON_FILE = folder+f"Chicago-American.json"

In [6]:
location.split(',')[0]

'Chicago'

In [7]:
#JSON_FILE = FOLDER+f{LOCATION.split(',')[0]}

### Check if Json File exists and Create it if it doesn't

In [8]:
# ## Check if JSON_FILE exists
# if os.path.isfile(JSON_FILE) ==False:
# ## If it does not exist: 
#     print("The file does not exist...bitch")
#     ## CREATE ANY NEEDED FOLDERS
#     # Get the Folder Name only
#     with open(JSON_FILE, 'w') as f:
#         json.dump(results['businesses'], f)
# else:
#     print("File exists...")
# ## If JSON_FILE included a folder:
# # create the folder  
# ## INFORM USER AND SAVE EMPTY LIST 
# ## save the first page of results        
# ## If it exists, inform user


### Load JSON FIle and account for previous results

In [9]:
## Load previous results and use len of results for offset

## set offset based on previous results


### Make the first API call to get the first page of data

- We will use this first result to check:
    - how many total results there are?
    - Where is the actual data we want to save?
    - how many results do we get at a time?


In [10]:
# use our yelp_api variable's search_query method to perform our API call
results =yelp.search_query(term=term, location=location)
results

{'businesses': [{'id': 'okaqMJEoHfHblpKz9Q-CMA',
   'alias': 'the-perch-chicago',
   'name': 'The Perch',
   'image_url': 'https://s3-media4.fl.yelpcdn.com/bphoto/6S7CbMwTmvjlbNSNJFW5Tw/o.jpg',
   'is_closed': False,
   'url': 'https://www.yelp.com/biz/the-perch-chicago?adjust_creative=RrPFJ6lUMwZMYSl1xY3Y_Q&utm_campaign=yelp_api_v3&utm_medium=api_v3_business_search&utm_source=RrPFJ6lUMwZMYSl1xY3Y_Q',
   'review_count': 331,
   'categories': [{'alias': 'breweries', 'title': 'Breweries'},
    {'alias': 'newamerican', 'title': 'American (New)'}],
   'rating': 4.5,
   'coordinates': {'latitude': 41.90348, 'longitude': -87.676221},
   'transactions': ['delivery', 'pickup'],
   'price': '$$',
   'location': {'address1': '1932 W Division',
    'address2': None,
    'address3': '',
    'city': 'Chicago',
    'zip_code': '60622',
    'country': 'US',
    'state': 'IL',
    'display_address': ['1932 W Division', 'Chicago, IL 60622']},
   'phone': '',
   'display_phone': '',
   'distance': 2498.

In [11]:
## How many results total?
results.keys()

dict_keys(['businesses', 'total', 'region'])

In [12]:
results['total']
results
results_per_page = len(results['businesses'])
results_per_page

20

- Where is the actual data we want to save?

In [13]:
buss = results['businesses']

In [14]:
pd.DataFrame(buss)

Unnamed: 0,id,alias,name,image_url,is_closed,url,review_count,categories,rating,coordinates,transactions,price,location,phone,display_phone,distance
0,okaqMJEoHfHblpKz9Q-CMA,the-perch-chicago,The Perch,https://s3-media4.fl.yelpcdn.com/bphoto/6S7CbM...,False,https://www.yelp.com/biz/the-perch-chicago?adj...,331,"[{'alias': 'breweries', 'title': 'Breweries'},...",4.5,"{'latitude': 41.90348, 'longitude': -87.676221}","[delivery, pickup]",$$,"{'address1': '1932 W Division', 'address2': No...",,,2498.281633
1,gzhkdb6YoiFm5s3vriG1AA,gretel-chicago,Gretel,https://s3-media1.fl.yelpcdn.com/bphoto/2SniAq...,False,https://www.yelp.com/biz/gretel-chicago?adjust...,126,"[{'alias': 'coffee', 'title': 'Coffee & Tea'},...",4.5,"{'latitude': 41.917275, 'longitude': -87.698577}","[delivery, pickup]",$$,"{'address1': '2833 W Armitage Ave', 'address2'...",17737703427.0,(773) 770-3427,121.339066
2,zm6Peew9j8YtowzUu4sQfA,the-whale-chicago-chicago,The Whale Chicago,https://s3-media2.fl.yelpcdn.com/bphoto/PTPZJy...,False,https://www.yelp.com/biz/the-whale-chicago-chi...,762,"[{'alias': 'newamerican', 'title': 'American (...",4.0,"{'latitude': 41.92555, 'longitude': -87.70112}","[delivery, pickup]",$$,"{'address1': '2427 N Milwaukee Ave', 'address2...",17738252900.0,(773) 825-2900,825.964359
3,DZ4lM8OHFK9gAdLffFWlTA,the-leavitt-street-inn-and-tavern-chicago,The Leavitt Street Inn & Tavern,https://s3-media2.fl.yelpcdn.com/bphoto/NTFPOv...,False,https://www.yelp.com/biz/the-leavitt-street-in...,38,"[{'alias': 'bedbreakfast', 'title': 'Bed & Bre...",5.0,"{'latitude': 41.924083, 'longitude': -87.682524}","[delivery, pickup]",$$,"{'address1': '2345 N Leavitt St', 'address2': ...",17736619639.0,(773) 661-9639,1482.40521
4,sWYRaPCYHrTxsZPLWmzkzQ,scofflaw-chicago,Scofflaw,https://s3-media2.fl.yelpcdn.com/bphoto/pQw6qJ...,False,https://www.yelp.com/biz/scofflaw-chicago?adju...,789,"[{'alias': 'newamerican', 'title': 'American (...",4.5,"{'latitude': 41.9173240661621, 'longitude': -8...",[delivery],$$,"{'address1': '3201 W Armitage Ave', 'address2'...",17732529700.0,(773) 252-9700,710.137448
5,wfkj7DK8YzhdwhhFc2OntA,giant-chicago-2,Giant,https://s3-media1.fl.yelpcdn.com/bphoto/mkHyg5...,False,https://www.yelp.com/biz/giant-chicago-2?adjus...,568,"[{'alias': 'newamerican', 'title': 'American (...",4.5,"{'latitude': 41.9171, 'longitude': -87.70746}",[delivery],$$$,"{'address1': '3209 W Armitage Ave', 'address2'...",17732520997.0,(773) 252-0997,745.440723
6,bwJ7IcLgQ228nkDCd4KO1A,second-generation-chicago,Second Generation,https://s3-media3.fl.yelpcdn.com/bphoto/V-1GgC...,False,https://www.yelp.com/biz/second-generation-chi...,3,"[{'alias': 'newamerican', 'title': 'American (...",4.5,"{'latitude': 41.92796, 'longitude': -87.70473}",[],,"{'address1': '3057 W Logan Blvd', 'address2': ...",17739047620.0,(773) 904-7620,1177.837755
7,fJkO3cfnNXrLIoYYGBX7_g,table-donkey-and-stick-chicago,"Table, Donkey and Stick",https://s3-media3.fl.yelpcdn.com/bphoto/kSHOCZ...,False,https://www.yelp.com/biz/table-donkey-and-stic...,473,"[{'alias': 'modern_european', 'title': 'Modern...",4.0,"{'latitude': 41.91775, 'longitude': -87.695965}","[delivery, pickup]",$$,"{'address1': '2728 W Armitage Ave', 'address2'...",17734868525.0,(773) 486-8525,236.902482
8,mBG17MKjI3TNwYew5dzdyQ,the-moonlighter-chicago,The Moonlighter,https://s3-media4.fl.yelpcdn.com/bphoto/9OFncM...,False,https://www.yelp.com/biz/the-moonlighter-chica...,227,"[{'alias': 'burgers', 'title': 'Burgers'}, {'a...",4.0,"{'latitude': 41.91766, 'longitude': -87.70738}",[delivery],$$,"{'address1': '3204 W Armitage Ave', 'address2'...",17733608896.0,(773) 360-8896,723.391938
9,VPJk-SEWSWS_nGoQvM-COw,penumbra-chicago,Penumbra,https://s3-media1.fl.yelpcdn.com/bphoto/2KoXY4...,False,https://www.yelp.com/biz/penumbra-chicago?adju...,722,"[{'alias': 'wine_bars', 'title': 'Wine Bars'},...",5.0,"{'latitude': 41.92445, 'longitude': -87.710933}","[delivery, restaurant_reservation]",$$,"{'address1': '3309 W Fullerton Ave', 'address2...",17737722343.0,(773) 772-2343,1217.117336


In [15]:
## How many did we get the details for?
#results_per_page = None
#results_per_page

- Calculate how many pages of results needed to cover the total_results

In [16]:
# Use math.ceil to round up for the total number of pages of results.
import math
n_pages = math.ceil(results['total']/results_per_page)
n_pages

90

In [17]:
if os.path.isfile(JSON_FILE)==False:
    # file does not exist so print message and create empty file
    print(" The file does not exist. Creating empty file.")
    with open(JSON_FILE,'w') as f:
        json.dump(results['businesses'],f)
# if file exists
else:
    print("File already exists")


 The file does not exist. Creating empty file.


In [18]:
for i in tqdm_notebook( range(1,n_pages+1)):
    with open(JSON_FILE) as f:
        prev_results = json.load(f)
    n_results = len(prev_results)
    results = yelp.search_query(term=term, location=location, offset =5)
    prev_results.extend(results['businesses'])
    with open(JSON_FILE, 'w') as f:
        json.dump(prev_results,f)

  0%|          | 0/90 [00:00<?, ?it/s]

In [19]:
## The block of code we want to TRY to run
## Read in results in progress file and check the length
## use n_results as the OFFSET     
## append new results and save to fill
## What to do if we get an error/exception.


## Open the Final JSON File with Pandas

In [20]:
df = pd.read_json('./Data/Chicago-American.json')

In [21]:
df = df.replace('json', 'csv.gz')

## Bonus: compare filesize with os module's `os.path.getsize`

In [22]:
df = pd.read_json(JSON_FILE)
df

## convert the filename to a .csv.gz
csv_file = JSON_FILE.replace('.json','.csv.gz')
csv_file


df.to_csv(csv_file,compression='gzip',index=False)


# ## Bonus: compare filesize with os module's `os.path.getsize`

size_json = os.path.getsize(JSON_FILE)
print(f'JSON FILE: {size_json:,} Bytes')


size_json = os.path.getsize(JSON_FILE)
size_csv_gz = os.path.getsize(JSON_FILE.replace('.json','.csv.gz'))

print(f'JSON FILE: {size_json:,} Bytes')
print(f'CSV.GZ FILE: {size_csv_gz:,} Bytes')

print(f'the csv.gz is {size_json/size_csv_gz} times smaller!')


JSON FILE: 1,781,919 Bytes
JSON FILE: 1,781,919 Bytes
CSV.GZ FILE: 15,827 Bytes
the csv.gz is 112.58728754659758 times smaller!


## Next Class: Processing the Results and Mapping 