# Yelp Dataset Challenge
---
Timothy Helton

Yelp is a website that allows patrons to review restaurants and other businesses they have visited. The company runs a recurring challenge to see whether anyone can derive additional insights from the raw user reviews.
More information about the challenge may be found on the
[Yelp Dataset Challenge page](https://www.yelp.com/dataset_challenge).

---
*For exercises 1-4, use the Yelp business JSON file. For exercises 5-6, use the Yelp review JSON file.*

---
<br>
<font color="red">
    NOTE:
    <br>
    This notebook uses code found in the
    <a href="https://github.com/TimothyHelton/k2datascience/blob/master/k2datascience/yelp.py">
    <strong>k2datascience.yelp</strong></a> module.
    To execute all the cells, do one of the following:
    <ul>
        <li>Install the k2datascience package to the active Python interpreter.</li>
        <li>Add k2datascience/k2datascience to the PYTHONPATH environment variable.</li>
        <li>Create a link to the yelp.py file in the same directory as this notebook.</li>
    </ul>
</font>

---

### Imports

In [1]:
from k2datascience import yelp

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
%matplotlib inline

---
### Load Data

#### Create Instance of Yelp Class

In [2]:
ydc = yelp.YDC()

In [3]:
ydc.load_data()

04/30/2017 08:52:54      INFO  -> root <- (line: 100) File Loaded: yelp_academic_dataset_business.json



In [4]:
business = ydc.file_data['business']
business.shape
business.head()
business.tail()

(85901, 15)

Unnamed: 0,attributes,business_id,categories,city,full_address,hours,latitude,longitude,name,neighborhoods,open,review_count,stars,state,type
0,"{'Take-out': True, 'Drive-Thru': False, 'Good ...",5UmKMjUEUNdYWqANhGckJw,"[Fast Food, Restaurants]",Dravosburg,"4734 Lebanon Church Rd\nDravosburg, PA 15034","{'Friday': {'close': '21:00', 'open': '11:00'}...",40.354327,-79.900706,Mr Hoagie,[],True,7,3.5,PA,business
1,"{'Happy Hour': True, 'Accepts Credit Cards': T...",UsFtqoBl7naz8AVUBZMjQQ,[Nightlife],Dravosburg,"202 McClure St\nDravosburg, PA 15034",{},40.350553,-79.886814,Clancy's Pub,[],True,5,3.0,PA,business
2,{'Good for Kids': True},cE27W9VPgO88Qxe4ol6y_g,"[Active Life, Mini Golf, Golf]",Bethel Park,"1530 Hamilton Rd\nBethel Park, PA 15234",{},40.354115,-80.01466,Cool Springs Golf Center,[],False,5,2.5,PA,business
3,"{'Alcohol': 'full_bar', 'Noise Level': 'averag...",mVHrayjG3uZ_RLHkLj-AMg,"[Bars, American (New), Nightlife, Lounges, Res...",Braddock,"414 Hawkins Ave\nBraddock, PA 15104","{'Tuesday': {'close': '19:00', 'open': '10:00'...",40.40883,-79.866211,Emil's Lounge,[],True,26,4.5,PA,business
4,"{'Parking': {'garage': False, 'street': False,...",mYSpR_SLPgUVymYOvTQd_Q,"[Active Life, Golf]",Braddock,"1000 Clubhouse Dr\nBraddock, PA 15104","{'Sunday': {'close': '15:00', 'open': '10:00'}...",40.403405,-79.855782,Grand View Golf Club,[],True,3,5.0,PA,business


Unnamed: 0,attributes,business_id,categories,city,full_address,hours,latitude,longitude,name,neighborhoods,open,review_count,stars,state,type
85896,{'Accepts Credit Cards': True},m7-3lyY0CJEhePfJKWtD3w,"[Bridal, Fashion, Shopping, Formal Wear]",Las Vegas,"3899 East Sunset Rd\nSte 105\nLas Vegas, NV 89120","{'Tuesday': {'close': '18:00', 'open': '10:00'...",36.070535,-115.089318,Bowties Bridal,[],True,61,4.0,NV,business
85897,"{'Take-out': True, 'Wi-Fi': 'no', 'Good For': ...",g0vvhkZWZKlwF8BUeSPaTA,"[Mexican, Restaurants]",Goodyear,"525 N Estrella Pkwy\nSte 100\nGoodyear, AZ 85338",{},33.452205,-112.392009,Senor Taco,[],True,89,3.5,AZ,business
85898,{},46L_7y9QXffPpOaXNLX8hg,"[Car Wash, Automotive]",Phoenix,"9215 North 7th St\nPhoenix, AZ 85020","{'Monday': {'close': '18:00', 'open': '07:00'}...",33.570417,-112.064854,Cobblestone Auto Spa,[],True,7,3.0,AZ,business
85899,"{'Accepts Credit Cards': True, 'Wi-Fi': 'free'...",HuLzZUBkHEcHk6ETDJIVhQ,"[Home Services, Real Estate, Apartments]",Edinburgh,16 Waterloo Place\nOld Town\nEdinburgh EH1 3EG,{},55.953447,-3.186813,Princess Street Suites,[Old Town],True,5,4.0,EDH,business
85900,{},DH2Ujt_hwcMBIz8VvCb0Lg,"[Mexican, Restaurants]",Charlotte,Charlotte Douglas International Airport Termin...,{},35.224223,-80.94029,Salsarita's Express,[],True,57,2.5,NC,business


---
### Exercise 1: Create a new column that contains only the zipcode.

In [5]:
ydc.get_zip_codes()
business.head()

Unnamed: 0,attributes,business_id,categories,city,full_address,hours,latitude,longitude,name,neighborhoods,open,review_count,stars,state,type,zip_code
0,"{'Take-out': True, 'Drive-Thru': False, 'Good ...",5UmKMjUEUNdYWqANhGckJw,"[Fast Food, Restaurants]",Dravosburg,"4734 Lebanon Church Rd\nDravosburg, PA 15034","{'Friday': {'close': '21:00', 'open': '11:00'}...",40.354327,-79.900706,Mr Hoagie,[],True,7,3.5,PA,business,15034
1,"{'Happy Hour': True, 'Accepts Credit Cards': T...",UsFtqoBl7naz8AVUBZMjQQ,[Nightlife],Dravosburg,"202 McClure St\nDravosburg, PA 15034",{},40.350553,-79.886814,Clancy's Pub,[],True,5,3.0,PA,business,15034
2,{'Good for Kids': True},cE27W9VPgO88Qxe4ol6y_g,"[Active Life, Mini Golf, Golf]",Bethel Park,"1530 Hamilton Rd\nBethel Park, PA 15234",{},40.354115,-80.01466,Cool Springs Golf Center,[],False,5,2.5,PA,business,15234
3,"{'Alcohol': 'full_bar', 'Noise Level': 'averag...",mVHrayjG3uZ_RLHkLj-AMg,"[Bars, American (New), Nightlife, Lounges, Res...",Braddock,"414 Hawkins Ave\nBraddock, PA 15104","{'Tuesday': {'close': '19:00', 'open': '10:00'...",40.40883,-79.866211,Emil's Lounge,[],True,26,4.5,PA,business,15104
4,"{'Parking': {'garage': False, 'street': False,...",mYSpR_SLPgUVymYOvTQd_Q,"[Active Life, Golf]",Braddock,"1000 Clubhouse Dr\nBraddock, PA 15104","{'Sunday': {'close': '15:00', 'open': '10:00'}...",40.403405,-79.855782,Grand View Golf Club,[],True,3,5.0,PA,business,15104
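
The `get_zip_codes` method is defined in the `k2datascience.yelp` module, and its implementation is not shown in this notebook. As a rough sketch of one way to pull a zip code out of `full_address`, the helper below uses a regular expression. The name `extract_zip_code` and the pattern are assumptions made for illustration: the pattern only recognizes US-style 5-digit codes (optionally followed by a ZIP+4 extension) at the end of the address, and it returns a string rather than an integer.

```python
import re

# Assumption: a US-style address ends with a 5-digit ZIP code, optionally
# followed by a 4-digit extension (ZIP+4). Other formats are not handled.
ZIP_PATTERN = re.compile(r"\b(\d{5})(?:-\d{4})?\s*$")


def extract_zip_code(full_address):
    """Return the trailing 5-digit ZIP code of an address, or None."""
    match = ZIP_PATTERN.search(full_address)
    return match.group(1) if match else None


print(extract_zip_code("4734 Lebanon Church Rd\nDravosburg, PA 15034"))
print(extract_zip_code("16 Waterloo Place\nOld Town\nEdinburgh EH1 3EG"))
```

With a DataFrame like `business`, such a helper could be applied with `business['full_address'].apply(extract_zip_code)`. Non-US addresses, such as the Edinburgh entry, yield `None` under this pattern.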


---
### Exercise 2: The table contains a column called 'categories', and each entry in this column is a list. We are interested in the businesses that are restaurants. Create a new column 'Restaurant_type' that describes the restaurant based on the other elements of 'categories'.
### That is, if 'categories' contains '[Sushi Bars, Japanese, Restaurants]', then 'Restaurant_type' will be '{'SushiBars': 1, 'Japanese': 1, 'Mexican': 0, ...}'.

In [6]:
ydc.get_restaurant_type()
business.head()

Unnamed: 0,attributes,business_id,categories,city,full_address,hours,latitude,longitude,name,neighborhoods,open,review_count,stars,state,type,zip_code,restaurant_type
0,"{'Take-out': True, 'Drive-Thru': False, 'Good ...",5UmKMjUEUNdYWqANhGckJw,"[Fast Food, Restaurants]",Dravosburg,"4734 Lebanon Church Rd\nDravosburg, PA 15034","{'Friday': {'close': '21:00', 'open': '11:00'}...",40.354327,-79.900706,Mr Hoagie,[],True,7,3.5,PA,business,15034,"{'Fast Food': 1, 'Bars': 0, 'American (New)': ..."
1,"{'Happy Hour': True, 'Accepts Credit Cards': T...",UsFtqoBl7naz8AVUBZMjQQ,[Nightlife],Dravosburg,"202 McClure St\nDravosburg, PA 15034",{},40.350553,-79.886814,Clancy's Pub,[],True,5,3.0,PA,business,15034,remove
2,{'Good for Kids': True},cE27W9VPgO88Qxe4ol6y_g,"[Active Life, Mini Golf, Golf]",Bethel Park,"1530 Hamilton Rd\nBethel Park, PA 15234",{},40.354115,-80.01466,Cool Springs Golf Center,[],False,5,2.5,PA,business,15234,remove
3,"{'Alcohol': 'full_bar', 'Noise Level': 'averag...",mVHrayjG3uZ_RLHkLj-AMg,"[Bars, American (New), Nightlife, Lounges, Res...",Braddock,"414 Hawkins Ave\nBraddock, PA 15104","{'Tuesday': {'close': '19:00', 'open': '10:00'...",40.40883,-79.866211,Emil's Lounge,[],True,26,4.5,PA,business,15104,"{'Fast Food': 0, 'Bars': 1, 'American (New)': ..."
4,"{'Parking': {'garage': False, 'street': False,...",mYSpR_SLPgUVymYOvTQd_Q,"[Active Life, Golf]",Braddock,"1000 Clubhouse Dr\nBraddock, PA 15104","{'Sunday': {'close': '15:00', 'open': '10:00'}...",40.403405,-79.855782,Grand View Golf Club,[],True,3,5.0,PA,business,15104,remove
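
The `get_restaurant_type` method is also part of `k2datascience.yelp` and is not reproduced here. The sketch below shows one possible way to build the indicator dictionaries. The function name `build_restaurant_types` is hypothetical, and the choice to build the vocabulary only from categories that appear alongside 'Restaurants' is an assumption. The 'remove' placeholder for non-restaurants mirrors the output above. Category names are kept unchanged, so 'Sushi Bars' stays 'Sushi Bars' rather than 'SushiBars'.

```python
def build_restaurant_types(categories_column):
    """Return one indicator dict per business, or 'remove' for non-restaurants."""
    rows = list(categories_column)

    # Assumption: the vocabulary is every category that co-occurs with
    # 'Restaurants' in at least one business.
    vocabulary = sorted({
        category
        for categories in rows
        if "Restaurants" in categories
        for category in categories
        if category != "Restaurants"
    })

    result = []
    for categories in rows:
        if "Restaurants" not in categories:
            # Placeholder value matching the notebook output for non-restaurants.
            result.append("remove")
            continue
        present = set(categories)
        result.append({name: int(name in present) for name in vocabulary})
    return result


sample = [
    ["Fast Food", "Restaurants"],
    ["Nightlife"],
    ["Bars", "American (New)", "Restaurants"],
]
for entry in build_restaurant_types(sample):
    print(entry)
```

Every restaurant receives a dictionary with the same keys, which makes the column straightforward to expand into separate indicator columns later.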


In [7]:
business.restaurant_type.loc[0]  # .ix was removed in pandas 1.0; .loc selects by label

{'Active Life': 0,
 'Adult Entertainment': 0,
 'Afghan': 0,
 'African': 0,
 'Alsatian': 0,
 'Amateur Sports Teams': 0,
 'American (New)': 0,
 'American (Traditional)': 0,
 'Amusement Parks': 0,
 'Antiques': 0,
 'Appliances': 0,
 'Arabian': 0,
 'Arcades': 0,
 'Argentine': 0,
 'Art Galleries': 0,
 'Arts & Crafts': 0,
 'Arts & Entertainment': 0,
 'Asian Fusion': 0,
 'Australian': 0,
 'Austrian': 0,
 'Automotive': 0,
 'Baden': 0,
 'Bagels': 0,
 'Bakeries': 0,
 'Bangladeshi': 0,
 'Banks & Credit Unions': 0,
 'Barbeque': 0,
 'Bars': 0,
 'Bartenders': 0,
 'Basque': 0,
 'Bavarian': 0,
 'Beauty & Spas': 0,
 'Bed & Breakfast': 0,
 'Beer Bar': 0,
 'Beer Garden': 0,
 'Beer Gardens': 0,
 'Beer Hall': 0,
 'Beer, Wine & Spirits': 0,
 'Belgian': 0,
 'Bikes': 0,
 'Bistros': 0,
 'Boating': 0,
 'Books, Mags, Music & Video': 0,
 'Bookstores': 0,
 'Bowling': 0,
 'Brasseries': 0,
 'Brazilian': 0,
 'Breakfast & Brunch': 0,
 'Breweries': 0,
 'British': 0,
 'Bubble Tea': 0,
 'Buffets': 0,
 'Burgers': 0,
 'Burm

---
### Exercise 3: Let's clean the 'attributes' column. The entries in this column are dictionaries. We need to do two things:
#### 1) Turn every True or False in the dictionary into 1 or 0.
#### 2) Some entries within the dictionaries are dictionaries themselves. Let's flatten each entry into a single dictionary. For example, if we have
##### '{'Accepts Credit Cards': True, 'Alcohol': 'none','Ambience': {'casual': False,'classy': False}}'
##### then turn it into
##### '{'Accepts Credit Cards':1, 'Alcohol_none': 1, 'Ambience_casual': 0, 'Ambience_classy': 0}'.
##### Other entries, such as {'Price Range': 1}, have numerical values; these can be changed to {'Price_Range_1': 1}.

*The reason we modify categorical variables like this is that most machine learning algorithms expect numerical inputs such as 1 and 0 and cannot work directly with values like "True" and "False".*

In [8]:
business.attributes = yelp.convert_boolean(business.attributes)
business.head()

Unnamed: 0,attributes,business_id,categories,city,full_address,hours,latitude,longitude,name,neighborhoods,open,review_count,stars,state,type,zip_code,restaurant_type
0,"{'Take-out': 1, 'Drive-Thru': 0, 'Good For': {...",5UmKMjUEUNdYWqANhGckJw,"[Fast Food, Restaurants]",Dravosburg,"4734 Lebanon Church Rd\nDravosburg, PA 15034","{'Friday': {'close': '21:00', 'open': '11:00'}...",40.354327,-79.900706,Mr Hoagie,[],True,7,3.5,PA,business,15034,"{'Fast Food': 1, 'Bars': 0, 'American (New)': ..."
1,"{'Happy Hour': 1, 'Accepts Credit Cards': 1, '...",UsFtqoBl7naz8AVUBZMjQQ,[Nightlife],Dravosburg,"202 McClure St\nDravosburg, PA 15034",{},40.350553,-79.886814,Clancy's Pub,[],True,5,3.0,PA,business,15034,remove
2,{'Good for Kids': 1},cE27W9VPgO88Qxe4ol6y_g,"[Active Life, Mini Golf, Golf]",Bethel Park,"1530 Hamilton Rd\nBethel Park, PA 15234",{},40.354115,-80.01466,Cool Springs Golf Center,[],False,5,2.5,PA,business,15234,remove
3,"{'Has TV': 1, 'Ambience': {'romantic': 0, 'int...",mVHrayjG3uZ_RLHkLj-AMg,"[Bars, American (New), Nightlife, Lounges, Res...",Braddock,"414 Hawkins Ave\nBraddock, PA 15104","{'Tuesday': {'close': '19:00', 'open': '10:00'...",40.40883,-79.866211,Emil's Lounge,[],True,26,4.5,PA,business,15104,"{'Fast Food': 0, 'Bars': 1, 'American (New)': ..."
4,"{'Parking': {'garage': 0, 'street': 0, 'valida...",mYSpR_SLPgUVymYOvTQd_Q,"[Active Life, Golf]",Braddock,"1000 Clubhouse Dr\nBraddock, PA 15104","{'Sunday': {'close': '15:00', 'open': '10:00'}...",40.403405,-79.855782,Grand View Golf Club,[],True,3,5.0,PA,business,15104,remove


In [9]:
business.attributes.loc[0]  # .ix was removed in pandas 1.0; .loc selects by label

{'Accepts Credit Cards': 1,
 'Alcohol_none': 1,
 'Ambience': {'casual': 0,
  'classy': 0,
  'divey': 0,
  'hipster': 0,
  'intimate': 0,
  'romantic': 0,
  'touristy': 0,
  'trendy': 0,
  'upscale': 0},
 'Attire_casual': 1,
 'Caters': 0,
 'Delivery': 0,
 'Drive-Thru': 0,
 'Good For': {'breakfast': 0,
  'brunch': 0,
  'dessert': 0,
  'dinner': 0,
  'latenight': 0,
  'lunch': 0},
 'Good For Groups': 1,
 'Good for Kids': 1,
 'Has TV': 0,
 'Noise Level_average': 1,
 'Outdoor Seating': 0,
 'Parking': {'garage': 0, 'lot': 0, 'street': 0, 'valet': 0, 'validated': 0},
 'Price Range_1': 1,
 'Take-out': 1,
 'Takes Reservations': 0,
 'Waiter Service': 0}
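
The output above still contains nested dictionaries such as 'Ambience' and 'Parking'. As a minimal sketch of the full flattening described in the exercise, the recursive helper below handles booleans, nested dictionaries, and string or numeric values. The name `flatten_attributes` is hypothetical and is not part of `k2datascience.yelp`. Keys keep their spaces (for example 'Price Range_1'), which follows the notebook output rather than the 'Price_Range_1' spelling in the exercise text.

```python
def flatten_attributes(attributes, prefix=""):
    """Flatten a nested attribute dict into one level of 0/1 indicators."""
    flat = {}
    for key, value in attributes.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Nested dictionary: recurse and prefix child keys with the parent key.
            flat.update(flatten_attributes(value, prefix=f"{name}_"))
        elif isinstance(value, bool):
            # bool must be tested before other types because bool subclasses int.
            flat[name] = int(value)
        else:
            # Strings and numbers become their own indicator, e.g. 'Alcohol_none'.
            flat[f"{name}_{value}"] = 1
    return flat


example = {
    "Accepts Credit Cards": True,
    "Alcohol": "none",
    "Ambience": {"casual": False, "classy": False},
}
print(flatten_attributes(example))
```

Applied to the example from the exercise statement, this produces the four flat keys 'Accepts Credit Cards', 'Alcohol_none', 'Ambience_casual', and 'Ambience_classy'.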

---
### Exercise 4: Create a new column for every day of the week and fill it with the number of hours the business is open that day.

*Your approach should handle businesses that stay open late like bars and nightclubs.*

In [10]:
ydc.calc_open_hours()
business.head(6)

Unnamed: 0,attributes,business_id,categories,city,full_address,hours,latitude,longitude,name,neighborhoods,...,type,zip_code,restaurant_type,Sunday,Monday,Tuesday,Wednesday,Thursday,Friday,Saturday
0,"{'Take-out': 1, 'Drive-Thru': 0, 'Good For': {...",5UmKMjUEUNdYWqANhGckJw,"[Fast Food, Restaurants]",Dravosburg,"4734 Lebanon Church Rd\nDravosburg, PA 15034","{'Friday': {'close': '21:00', 'open': '11:00'}...",40.354327,-79.900706,Mr Hoagie,[],...,business,15034,"{'Fast Food': 1, 'Bars': 0, 'American (New)': ...",0.0,10.0,10.0,10.0,10.0,10.0,0.0
1,"{'Happy Hour': 1, 'Accepts Credit Cards': 1, '...",UsFtqoBl7naz8AVUBZMjQQ,[Nightlife],Dravosburg,"202 McClure St\nDravosburg, PA 15034",{},40.350553,-79.886814,Clancy's Pub,[],...,business,15034,remove,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,{'Good for Kids': 1},cE27W9VPgO88Qxe4ol6y_g,"[Active Life, Mini Golf, Golf]",Bethel Park,"1530 Hamilton Rd\nBethel Park, PA 15234",{},40.354115,-80.01466,Cool Springs Golf Center,[],...,business,15234,remove,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"{'Has TV': 1, 'Ambience': {'romantic': 0, 'int...",mVHrayjG3uZ_RLHkLj-AMg,"[Bars, American (New), Nightlife, Lounges, Res...",Braddock,"414 Hawkins Ave\nBraddock, PA 15104","{'Tuesday': {'close': '19:00', 'open': '10:00'...",40.40883,-79.866211,Emil's Lounge,[],...,business,15104,"{'Fast Food': 0, 'Bars': 1, 'American (New)': ...",0.0,0.0,9.0,9.0,9.0,10.0,6.0
4,"{'Parking': {'garage': 0, 'street': 0, 'valida...",mYSpR_SLPgUVymYOvTQd_Q,"[Active Life, Golf]",Braddock,"1000 Clubhouse Dr\nBraddock, PA 15104","{'Sunday': {'close': '15:00', 'open': '10:00'}...",40.403405,-79.855782,Grand View Golf Club,[],...,business,15104,remove,5.0,0.0,0.0,9.0,9.0,9.0,9.0
5,"{'Has TV': 1, 'Ambience': {'romantic': 0, 'int...",KayYbHCt-RkbGcPdGOThNg,"[Bars, American (Traditional), Nightlife, Rest...",Carnegie,"141 Hawthorne St\nGreentree\nCarnegie, PA 15106","{'Monday': {'close': '02:00', 'open': '11:00'}...",40.415486,-80.067549,Alexion's Bar & Grill,[Greentree],...,business,15106,"{'Fast Food': 0, 'Bars': 1, 'American (New)': ...",10.0,15.0,15.0,15.0,15.0,15.0,14.0
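
The `calc_open_hours` method is not shown in this notebook. The sketch below illustrates one way to compute daily hours while handling businesses that close after midnight. The helper names `hours_open` and `daily_hours` are hypothetical. Two assumptions are made: a closing time at or before the opening time means the business closes the following day, and a day missing from the `hours` dictionary counts as 0 hours.

```python
from datetime import datetime

DAYS = ["Sunday", "Monday", "Tuesday", "Wednesday",
        "Thursday", "Friday", "Saturday"]


def hours_open(open_time, close_time):
    """Return the hours between two 'HH:MM' strings, wrapping past midnight."""
    fmt = "%H:%M"
    delta = datetime.strptime(close_time, fmt) - datetime.strptime(open_time, fmt)
    hours = delta.total_seconds() / 3600
    # Assumption: closing at or before the opening time means the next day.
    return hours if hours > 0 else hours + 24


def daily_hours(hours):
    """Map each day of the week to the number of hours open (0.0 if absent)."""
    return {
        day: hours_open(hours[day]["open"], hours[day]["close"])
        if day in hours else 0.0
        for day in DAYS
    }


print(hours_open("11:00", "21:00"))
print(hours_open("11:00", "02:00"))
```

An 11:00 to 02:00 schedule comes out as 15 hours, which agrees with the Monday value shown for the last row of the table above.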


---
### Exercise 5: Create a table with the average review rating for each business.

*You will need to pull in a new JSON file and merge DataFrames for the next 2 exercises.*
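
No solution for this exercise appears in the notebook. As a hedged sketch of one approach, the snippet below averages star ratings per business with a pandas `groupby`. The small `reviews` DataFrame is made-up toy data, and the column names `business_id` and `stars` are assumptions about the review file's layout.

```python
import pandas as pd

# Toy stand-in for the review file. The values are invented for illustration,
# and the column names 'business_id' and 'stars' are assumed.
reviews = pd.DataFrame({
    "business_id": ["a", "a", "b", "b", "b"],
    "stars": [4, 5, 2, 3, 4],
})

# One row per business with the mean of its review ratings.
average_rating = (
    reviews.groupby("business_id")["stars"]
    .mean()
    .rename("average_rating")
    .reset_index()
)
print(average_rating)
```

Keeping `business_id` as a regular column makes the result easy to merge with the business table in the next exercise.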

---
### Exercise 6: Create a new table that only contains restaurants with the following schema:

#### Business_Name | Restaurant_type | Friday hours | Saturday hours | Attributes | Zipcode | Average Rating
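
This exercise is also left without a solution in the notebook. The sketch below shows one way the pieces from the earlier exercises could be combined into the requested schema. Both DataFrames contain invented toy rows, and the `average_rating` column name is an assumption. Restaurants are identified here by checking that `restaurant_type` holds a dictionary rather than the 'remove' placeholder.

```python
import pandas as pd

# Toy stand-ins for the cleaned business table and the per-business averages.
# All values are invented for illustration.
business = pd.DataFrame({
    "business_id": ["a", "b"],
    "name": ["Example Diner", "Example Golf Club"],
    "restaurant_type": [{"Fast Food": 1}, "remove"],
    "Friday": [10.0, 9.0],
    "Saturday": [0.0, 9.0],
    "attributes": [{"Take-out": 1}, {"Good for Kids": 1}],
    "zip_code": ["15034", "15104"],
})
average_rating = pd.DataFrame({
    "business_id": ["a", "b"],
    "average_rating": [3.5, 5.0],
})

# Keep only rows whose restaurant_type is an indicator dictionary.
is_restaurant = business["restaurant_type"].map(lambda v: isinstance(v, dict))
restaurants = business[is_restaurant]

# Attach the average rating, then select and rename columns to the schema.
table = restaurants.merge(average_rating, on="business_id", how="left")
table = table[["name", "restaurant_type", "Friday", "Saturday",
               "attributes", "zip_code", "average_rating"]]
table = table.rename(columns={
    "name": "Business_Name",
    "restaurant_type": "Restaurant_type",
    "Friday": "Friday hours",
    "Saturday": "Saturday hours",
    "attributes": "Attributes",
    "zip_code": "Zipcode",
    "average_rating": "Average Rating",
})
print(table)
```

A left merge keeps restaurants that have no reviews; their 'Average Rating' would be missing rather than dropped.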