We are interested in looking at trends in restaurant openings and closings by neighborhood in New York City.

## Introduction

### Yelp!

Yelp! is an online aggregator of user reviews of local businesses, and restaurants figure prominently among them. Yelp! has a widely used API for accessing its data, as well as a company-maintained Python client for it. You can see an example of the output [here](https://www.yelp.com/developers/documentation/v2/business).

There's one important limitation to what they provide, however: the API will only give you a single "sample" review per business. It remains to be seen whether this is the *first* review (which would be useful to us, since it would track when the restaurant opened, potentially *very* closely) or just *any* review (which would be useless for our purposes).

At a minimum, Yelp! reports whether a business is still open (the `is_closed` field). This is very important to us.

In [8]:
from yelp.client import Client
from yelp.oauth1_authenticator import Oauth1Authenticator
from yelp.errors import BusinessUnavailable
import os
import json

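def import_credentials(filename='../data/yelp_credentials.json'):
    """Load the Yelp! API OAuth credentials from a JSON file."""
    try:
        # Use a context manager so the file handle is closed after reading.
        with open(filename) as f:
            return json.load(f)
    except (IOError, ValueError):
        # Missing/unreadable file (IOError) or malformed JSON (ValueError).
        raise IOError('This API requires Yelp credentials to work. Did you forget to define them?')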

credentials = import_credentials()

auth = Oauth1Authenticator(
    consumer_key=credentials['consumer_key'],
    consumer_secret=credentials['consumer_secret'],
    token=credentials['token'],
    token_secret=credentials['token_secret']
)

client = Client(auth)
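`import_credentials` expects a JSON object holding the four OAuth values passed to `Oauth1Authenticator` above. If one of them is missing, the failure shows up as a bare `KeyError`. A small check like the hypothetical `validate_credentials` below fails with a clearer message instead. The placeholder values in the example are not real keys.

```python
import json

# The four OAuth values that Oauth1Authenticator is called with above.
REQUIRED_KEYS = ("consumer_key", "consumer_secret", "token", "token_secret")

def validate_credentials(credentials):
    """Raise a descriptive error if any OAuth value is missing or empty.

    Hypothetical helper: returns the credentials unchanged if they look complete.
    """
    missing = [key for key in REQUIRED_KEYS if not credentials.get(key)]
    if missing:
        raise ValueError("Yelp credentials file is missing: " + ", ".join(missing))
    return credentials

# Example shape of ../data/yelp_credentials.json (placeholder values only).
example = json.loads(
    '{"consumer_key": "...", "consumer_secret": "...",'
    ' "token": "...", "token_secret": "..."}'
)
validate_credentials(example)
```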

There's no immediate way to match the CAMIS IDs (the restaurant identifiers in the NYC inspection data) to Yelp! IDs, since Yelp! uses its own slug-style IDs built from the restaurant name and neighborhood (e.g. `terrace-on-the-park-corona`). Surprisingly, the DBA ("doing business as") name isn't as much help as you would expect either, at least not from initial probing.

However, searching by phone number appears to be accurate. Join-by-phone-number? Not something I expected going into this! But it does make sense: restaurants are loath to change their phone numbers, since that would cut them off from a lot of their customers.
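A phone-number join only works if both sides use the same format. The sketch below assumes the inspection data may store phones with punctuation or a leading country code `1`. That is an assumption, so check the actual column. The hypothetical `normalize_phone` reduces a value to the bare ten-digit string passed to `client.phone_search` in the next cell. It returns `None` when that isn't possible.

```python
import re

def normalize_phone(raw):
    """Reduce a US phone number to 10 digits, e.g. for client.phone_search.

    Hypothetical helper: strips every non-digit character, drops a leading
    country code "1", and returns None if ten digits don't remain.
    """
    if raw is None:
        return None
    digits = re.sub(r"\D", "", str(raw))
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits if len(digits) == 10 else None

print(normalize_phone("(718) 592-5000"))   # 7185925000
print(normalize_phone("+1 718-592-5000"))  # 7185925000
print(normalize_phone("__________"))       # None
```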

In [40]:
client.phone_search("7185925000")

<yelp.obj.search_response.SearchResponse at 0x546cf98>

In [41]:
response = _

In [43]:
response.businesses[0].url

'https://www.yelp.com/biz/terrace-on-the-park-corona?adjust_creative=dkJPGu_jtTyHwsEgZIZN6g&utm_campaign=yelp_api&utm_medium=api_v2_phone_search&utm_source=dkJPGu_jtTyHwsEgZIZN6g'

Unfortunately, the phone search result includes a `review_count` but no review snippet. To get that, we have to query the business API separately.
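Querying the business API means looking a business up by its Yelp! ID. As far as I know, the v2 Python client does this with `client.get_business(business_id)`. It returns a response whose `.business` attribute holds the `Business` object, and that object carries `reviews`. Treat that interface as an assumption to verify against the installed `yelp` package. The sketch is written against any client-like object so it runs without credentials. `FakeClient` and `sample_reviews` are hypothetical and not part of the package.

```python
from types import SimpleNamespace

def sample_reviews(client, business_id):
    """Return the reviews attached to a business lookup, or [] if none.

    Assumes the yelp-python (API v2) interface: client.get_business(id)
    returns a response whose .business has a .reviews attribute, which
    may be None when Yelp! sends no review.
    """
    response = client.get_business(business_id)
    return response.business.reviews or []

# Hypothetical stand-in for yelp.client.Client, so the sketch runs offline.
class FakeClient:
    def get_business(self, business_id):
        business = SimpleNamespace(id=business_id, reviews=None)
        return SimpleNamespace(business=business)

print(sample_reviews(FakeClient(), "terrace-on-the-park-corona"))  # []

# With the real client created earlier, the call would look like:
# sample_reviews(client, "terrace-on-the-park-corona")
```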

In [94]:
client.search("terrace-on-the-park-corona")

<yelp.obj.search_response.SearchResponse at 0x6279fd0>

In [95]:
response = _

In [96]:
business = response.businesses[0]

In [98]:
business.url

'https://www.yelp.com/biz/yelps-12th-burstday-aboard-the-zephyr-new-york?adjust_creative=dkJPGu_jtTyHwsEgZIZN6g&utm_campaign=yelp_api&utm_medium=api_v2_search&utm_source=dkJPGu_jtTyHwsEgZIZN6g'

In [99]:
business.is_closed

False

In [100]:
business.name

"Yelp's 12th Burstday Aboard THE ZEPHYR"

In [101]:
business.review_count

69

In [102]:
business.menu_date_updated

In [103]:
business.reviews

In [104]:
business.reviews == None

True

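No review response at all! Huh. Also note that the business returned here is an unrelated event listing, not the restaurant. `client.search` takes a *location* as its first argument rather than a business ID, which would explain why these ID-based searches don't find the restaurant.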

In [66]:
client.search("taqueria-tehuitzingo-new-york")

<yelp.obj.search_response.SearchResponse at 0x5496b00>

In [67]:
response = _

In [70]:
response.businesses[0].url

'https://www.yelp.com/biz/yelps-12th-burstday-aboard-the-zephyr-new-york?adjust_creative=dkJPGu_jtTyHwsEgZIZN6g&utm_campaign=yelp_api&utm_medium=api_v2_search&utm_source=dkJPGu_jtTyHwsEgZIZN6g'

In [71]:
response.businesses[0].reviews

In [72]:
response.businesses[0].review_count

69

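It looks like at some point the Yelp! API output included a sample review, but that was removed, and the docs haven't been updated to reflect this. One caveat: the documentation linked above describes the business endpoint, while these responses came from the search endpoint, so the business endpoint is worth checking before ruling this out.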

Yelp! in general is very aggressive about monetizing its reviews. Presumably they decided that even this bare sample review was giving away too much through the API.

You can scrape this off their website pretty easily by sorting a business page oldest-first with `sort_by=date_asc` ([example](https://www.yelp.com/biz/terrace-on-the-park-corona?sort_by=date_asc)) and then grabbing the first review on the page.

The following cells demonstrate this.

In [73]:
import requests

In [78]:
taq_r = requests.get("https://www.yelp.com/biz/taqueria-tehuitzingo-new-york?sort_by=date_asc")
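The `date_asc` URL above is written out by hand. To do the same for every business the API returns, we need two steps: strip the tracking parameters the API appends (`adjust_creative`, `utm_*`, as in the phone-search output earlier), then add `sort_by=date_asc`. A standard-library sketch follows. The helper names are hypothetical. The assumption that business pages live at `https://www.yelp.com/biz/<id>` comes from the URLs seen in this notebook, not from Yelp! documentation.

```python
from urllib.parse import urlsplit, urlunsplit, urlencode

def yelp_id_from_url(url):
    """Pull the slug ID (e.g. 'terrace-on-the-park-corona') out of a /biz/ URL."""
    path = urlsplit(url).path  # e.g. '/biz/terrace-on-the-park-corona'
    prefix = "/biz/"
    if not path.startswith(prefix):
        raise ValueError("not a Yelp! business URL: " + url)
    return path[len(prefix):].strip("/")

def oldest_first_url(url):
    """Rebuild a business URL without tracking params, sorted oldest review first."""
    parts = urlsplit(url)
    # Keep scheme, host and path; replace the query string and drop any fragment.
    return urlunsplit((parts.scheme, parts.netloc, parts.path,
                       urlencode({"sort_by": "date_asc"}), ""))

api_url = ("https://www.yelp.com/biz/terrace-on-the-park-corona"
           "?adjust_creative=dkJPGu_jtTyHwsEgZIZN6g&utm_campaign=yelp_api"
           "&utm_medium=api_v2_phone_search&utm_source=dkJPGu_jtTyHwsEgZIZN6g")
print(yelp_id_from_url(api_url))  # terrace-on-the-park-corona
print(oldest_first_url(api_url))
# https://www.yelp.com/biz/terrace-on-the-park-corona?sort_by=date_asc
```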

In [77]:
import bs4

In [80]:
taq = bs4.BeautifulSoup(taq_r.content, "html.parser")

In [86]:
taq.find_all("div", {'class': "review-content"})[0]\
    .find("meta", {'itemprop': 'datePublished'})\
    .text\
    .strip()

'2/10/2014'
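The scraped value comes back as a month/day/year string. To compare first-review dates across restaurants, it helps to convert it to a real date. The sketch below assumes the `M/D/YYYY` format seen in the output above. Yelp!'s markup and date format may change, and the helper name is hypothetical. Python's `strptime` accepts months and days without zero padding, so `'2/10/2014'` parses with `%m/%d/%Y`.

```python
from datetime import date, datetime

def parse_review_date(text):
    """Convert a scraped review date like '2/10/2014' into a datetime.date.

    Assumes US month/day/year order; returns None for unparseable strings.
    """
    try:
        return datetime.strptime(text.strip(), "%m/%d/%Y").date()
    except ValueError:
        return None

first_review = parse_review_date("2/10/2014")
print(first_review)                     # 2014-02-10
print(first_review < date(2015, 1, 1))  # True
```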

There's talk online that making lots of requests against the Yelp! API will get your IP banned (again, they defend this data fiercely). We might test that.
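Until we know where Yelp!'s limits are, one cautious option is to space requests out. Below is a minimal sketch of a throttle that enforces a minimum interval between calls. The class name and any interval you pick are illustrative assumptions, not a documented Yelp! rate limit.

```python
import time

class Throttle:
    """Block just long enough that successive calls are >= `interval` seconds apart."""

    def __init__(self, interval):
        self.interval = interval
        self._last = None  # monotonic timestamp of the previous call

    def wait(self):
        now = time.monotonic()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()

# Usage: call wait() before each request, e.g.
# throttle = Throttle(2.0)
# for url in urls:
#     throttle.wait()
#     page = requests.get(url)
```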