Web Crawling & Scraping (Yelp API)

This script retrieves Yelp business listings and reviews via the Yelp API.

First, we show how to search for businesses by geo-location (e.g. within a 2-kilometer radius of a particular city).
Next, we show how to retrieve user reviews for a given business, identified by its business ID.

Ref (Yelp API Documentation): https://www.yelp.com/developers/documentation/v3

Client ID
YOUR_CLIENT_ID

API Key
YOUR_API_KEY

In [None]:
# Import the modules
import requests
import json

In [None]:
# Define the API key, the request headers, and the endpoint
API_KEY = 'YOUR_API_KEY'
HEADERS = {'Authorization': 'Bearer %s' % API_KEY}
ENDPOINT = 'https://api.yelp.com/v3/businesses/search'
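
Hard-coding the key in the notebook makes it easy to leak when the notebook is shared. A minimal sketch of one alternative, assuming the key has been exported in an environment variable beforehand. The variable name YELP_API_KEY and the helper load_headers are hypothetical and not part of the original notebook.

```python
import os


def load_headers(env_var='YELP_API_KEY'):
    """Build the Authorization header from an environment variable.

    The variable name is an assumption made for this sketch.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError('Set the %s environment variable first' % env_var)
    return {'Authorization': 'Bearer %s' % api_key}
```

With the variable set, HEADERS = load_headers() could stand in for the hard-coded definition.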

How do we find all businesses within a particular city?
How do we find all businesses of a particular category (e.g. food)?
Below, we use the Business Search API to find businesses matching the term 'food' within 10 kilometers of San Francisco.

Refer to this API documentation for details on what parameters can be set:
https://www.yelp.com/developers/documentation/v3/business_search

term: a search term such as 'food' or 'restaurants' (it can also be a specific business name such as Starbucks)

location: (required if latitude and longitude are not both provided) a city name is enough, e.g. "New York City"

radius (optional): search radius in meters, e.g. 40000 meters is about 25 miles

limit (optional): number of business results to return (20 by default, 50 at most)

offset (optional): number of results to skip from the start of the result list; combined with limit, it pages through the results

In [None]:
PARAMETERS = {'term': 'food',
              'limit': 50,       # maximum page size
              'offset': 50,      # skip the first 50 results (fetch results 51-100)
              'radius': 10000,   # meters
              'location': 'San Francisco'}
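
The parameters above search by city name. For the geo-location search mentioned in the introduction, the Business Search API also accepts latitude and longitude in place of location. Below is a rough sketch of a helper that builds either kind of parameter set and checks the documented limits. The helper name build_search_params is hypothetical, the 40000-meter radius cap is taken from the API documentation as best we recall it, and the coordinates are only an approximate point in San Francisco.

```python
def build_search_params(term, location=None, latitude=None, longitude=None,
                        radius=None, limit=20, offset=0):
    """Return a parameter dict for the Business Search API (hypothetical helper)."""
    has_coords = latitude is not None and longitude is not None
    if location is None and not has_coords:
        raise ValueError('Provide a location, or both latitude and longitude')
    if not 1 <= limit <= 50:
        raise ValueError('limit must be between 1 and 50')
    if radius is not None and not 0 <= radius <= 40000:
        raise ValueError('radius must be between 0 and 40000 meters')

    params = {'term': term, 'limit': limit, 'offset': offset}
    if has_coords:
        # Coordinates take the place of the free-text location
        params['latitude'] = latitude
        params['longitude'] = longitude
    else:
        params['location'] = location
    if radius is not None:
        params['radius'] = radius
    return params


# Food businesses within 2 km of an approximate point in San Francisco
GEO_PARAMETERS = build_search_params('food',
                                     latitude=37.7749,
                                     longitude=-122.4194,
                                     radius=2000)
```

GEO_PARAMETERS can then be passed as params in the same requests.get call used for the city-name search.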

In [None]:
# Make a request to the Yelp API
response = requests.get(url = ENDPOINT,
                        params = PARAMETERS,
                        headers = HEADERS)

# Parse the JSON response into a Python dict
business_search_results = response.json()

# Pretty-print the response
print(json.dumps(business_search_results, indent = 3))
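
The request above assumes the call succeeds. With an invalid key or a malformed query, the API returns an error body instead, and later cells fail with a KeyError on 'businesses'. A minimal sketch of a status check follows. The names YelpAPIError and parse_response are hypothetical, and the error layout ({'error': {'code': ..., 'description': ...}}) is an assumption about the API's error format.

```python
class YelpAPIError(Exception):
    """Raised when the API answers with a non-200 status (hypothetical name)."""


def parse_response(response):
    """Return the parsed JSON body, or raise YelpAPIError on failure.

    Works with any object exposing .status_code and .json(),
    such as a requests.Response.
    """
    if response.status_code != 200:
        try:
            body = response.json()
        except ValueError:
            # The error body was not valid JSON
            body = {}
        # Assumed error layout: {'error': {'code': ..., 'description': ...}}
        error = body.get('error', {}) if isinstance(body, dict) else {}
        raise YelpAPIError('HTTP %s: %s %s' % (response.status_code,
                                               error.get('code', ''),
                                               error.get('description', '')))
    return response.json()
```

Calling parse_response(response) in place of response.json() would surface the failure at the request step rather than several cells later.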

In [None]:
# Retrieve each business & display its attributes

# Note that the 'id' attribute is important: it is used later to query
# the Business Reviews API and retrieve review comments

for result in business_search_results['businesses']:
    print("============== business ============")
    print("id: ", result['id'])
    print("name: ", result['name'])
    print("# of reviews: ", result['review_count'])
    print("rating: ", result['rating'])
    if 'price' in result:
        print("price: ", result['price'])
    else:
        print("price: NOT_AVAILABLE")
    print("location: ", result['location']['display_address'])
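
A single request returns at most 50 businesses, so collecting more means repeating the request with an increasing offset. The generator below is a sketch of that loop. The names iter_businesses and fetch_page are hypothetical, and the default cap of 1000 results is an assumption about the API's paging limit that should be checked against the current documentation.

```python
def iter_businesses(fetch_page, limit=50, max_results=1000):
    """Yield businesses page by page until the results run out.

    fetch_page(offset=..., limit=...) must return the parsed JSON dict
    of one Business Search response.
    """
    offset = 0
    while offset < max_results:
        page_size = min(limit, max_results - offset)
        page = fetch_page(offset=offset, limit=page_size)
        businesses = page.get('businesses', [])
        if not businesses:
            break
        yield from businesses
        offset += len(businesses)
        if len(businesses) < page_size:
            # A short page means this was the last one
            break


# Possible wiring with the notebook's names (not executed here):
# def fetch_page(offset, limit):
#     params = dict(PARAMETERS, offset=offset, limit=limit)
#     return requests.get(ENDPOINT, params=params, headers=HEADERS).json()
#
# all_businesses = list(iter_businesses(fetch_page))
```

Passing the page fetcher in as a function keeps the paging logic independent of the network call, so it can be tried out with canned data first.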

Below, we show how to retrieve user reviews for a specific business, identified by its business ID.

The search above returned an ID for each business.

If you know a specific business's ID, you can call Yelp's Business Reviews API
and retrieve its user reviews.

In [None]:
# Define a business ID
business_id = 'DhCJ7D47swvT5DdsC0PCGQ'

In [None]:
# Change ENDPOINT to Business Reviews API
ENDPOINT = 'https://api.yelp.com/v3/businesses/{}/reviews'.format(business_id)

In [None]:
# The reviews request below needs no search parameters.
# The commented-out dictionaries are example parameters for other endpoints.

# BUSINESS SEARCH PARAMETERS - EXAMPLE
# PARAMETERS = {'term': 'food',
#               'limit': 50,
#               'offset': 50,
#               'radius': 10000,
#               'location': 'San Diego'}

# BUSINESS MATCH PARAMETERS - EXAMPLE
# PARAMETERS = {'name': 'Peets Coffee & Tea',
#               'address1': '7845 Highland Village Pl',
#               'city': 'San Diego',
#               'state': 'CA',
#               'country': 'US'}

In [None]:
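# Make a request to the Yelp API.
# The reviews request needs no query parameters here, so 'params' is omitted
# (the commented-out call shows the form used for business search).
# response = requests.get(url = ENDPOINT,
#                         params = PARAMETERS,
#                         headers = HEADERS)

response = requests.get(url = ENDPOINT,
                        headers = HEADERS)

# Parse the JSON response into a Python dict
business_data = response.json()

# Pretty-print the response
print(json.dumps(business_data, indent = 3))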

In [None]:
# Display the ID, URL, and text of each review
for review in business_data['reviews']:
    print("========== review ==========")
    print(review['id'])
    print(review['url'])
    print(review['text'])
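
Printing is fine for inspection, but for later analysis it is usually handier to keep the reviews as flat records. The sketch below turns one reviews response into a list of dicts. The helper name reviews_to_rows is hypothetical, and the fields rating, time_created, and user.name are assumed to be present in the response in addition to the id, url, and text fields used above. Missing fields simply come back as None.

```python
def reviews_to_rows(business_id, reviews_payload):
    """Flatten a Business Reviews response into one dict per review."""
    rows = []
    for review in reviews_payload.get('reviews', []):
        rows.append({
            'business_id': business_id,
            'review_id': review.get('id'),
            'rating': review.get('rating'),              # assumed field
            'time_created': review.get('time_created'),  # assumed field
            'user_name': review.get('user', {}).get('name'),  # assumed field
            'text': review.get('text'),
            'url': review.get('url'),
        })
    return rows
```

For the notebook's variables, this would be called as rows = reviews_to_rows(business_id, business_data). The resulting list can be written out with the csv module or loaded into a DataFrame.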