# Machine Learning For Local Restaurant Price Assessment And Prediction

## 1. Generate a pandas.DataFrame from JSON

In this section, we build a DataFrame from the data we collected with the Yelp API. Each feature is explained at [https://www.yelp.com/developers/documentation/v3/business_search].

In [1]:
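import json

import pandas as pd
from IPython.display import display  # built in to Jupyter; imported so the cell also runs as a script

# Read the raw Yelp API results
with open('./Dataset/restaurants_VA.json', encoding='utf8') as inFile:
    data = json.load(inFile)

# Convert the parsed JSON into a DataFrame and preview the first rows
data = pd.DataFrame.from_dict(data)
display(data.head(5))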

print("Total {0} samples".format(len(data)))

| | alias | categories | coordinates | display_phone | distance | id | image_url | is_closed | location | name | phone | price | rating | review_count | transactions | url |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | the-dutch-treat-rose-hill | [{'alias': 'delis', 'title': 'Delis'}, {'alias... | {'latitude': 36.6903325, 'longitude': -83.3104... | (276) 445-4024 | 19538.084959 | rd8oYQOtyc4LxjPQp8Muvw | | False | {'address1': '21332 Wilderness Rd', 'address2'... | The Dutch Treat | 12764454024 | | 5.0 | 2 | [] | https://www.yelp.com/biz/the-dutch-treat-rose-... |
| 1 | a-better-burger-jonesville | [{'alias': 'burgers', 'title': 'Burgers'}, {'a... | {'latitude': 36.689639, 'longitude': -83.108766} | (276) 346-6768 | 34871.790634 | bwCj2AcoOroZfCTxb6rCcg | https://s3-media2.fl.yelpcdn.com/bphoto/3KS3Xs... | False | {'address1': '33739 Main St', 'address2': 'Ste... | A Better Burger | 12763466768 | $$ | 3.5 | 6 | [] | https://www.yelp.com/biz/a-better-burger-jones... |
| 2 | el-castillo-jonesville | [{'alias': 'mexican', 'title': 'Mexican'}] | {'latitude': 36.7263373464484, 'longitude': -8... | (276) 346-4000 | 37317.843873 | S9S9kFJSkmfpbjFForCWLQ | https://s3-media1.fl.yelpcdn.com/bphoto/NGC_GJ... | False | {'address1': '236 Trade Center Ln', 'address2'... | El Castillo | 12763464000 | $ | 4.0 | 2 | [] | https://www.yelp.com/biz/el-castillo-jonesvill... |
| 3 | el-centenario-pennington-gap | [{'alias': 'mexican', 'title': 'Mexican'}] | {'latitude': 36.7602500915527, 'longitude': -8... | (276) 546-0044 | 36034.813254 | XFksdPFZhPHk458C0pl0Cg | https://s3-media1.fl.yelpcdn.com/bphoto/G5XFTv... | False | {'address1': '930 E Morgan Ave', 'address2': N... | El Centenario | 12765460044 | | 5.0 | 3 | [] | https://www.yelp.com/biz/el-centenario-penning... |
| 4 | rubys-country-steak-house-pennington-gap | [{'alias': 'restaurants', 'title': 'Restaurant... | {'latitude': 36.7624955624342, 'longitude': -8... | (276) 546-6900 | 38271.262707 | AsZk7i1UyQSElluN_ixSPQ | https://s3-media3.fl.yelpcdn.com/bphoto/n6JFx4... | False | {'address1': '131 Industrial Dr', 'address2': ... | Rubys Country Steak House | 12765466900 | | 3.5 | 2 | [] | https://www.yelp.com/biz/rubys-country-steak-h... |


Total 6413 samples
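
The preview shows that `coordinates` and `location` are nested dictionaries. As a hedged alternative to unpacking them by hand later, `pd.json_normalize` can flatten nested keys into dotted column names at load time. This is a minimal sketch that assumes the raw data is available as a list of per-restaurant dicts; the two records below are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical records shaped like Yelp business search results
records = [
    {
        "id": "a1",
        "name": "Example Deli",
        "price": None,
        "coordinates": {"latitude": 36.69, "longitude": -83.31},
        "location": {"city": "Rose Hill", "state": "VA"},
    },
    {
        "id": "b2",
        "name": "Example Burger",
        "price": "$$",
        "coordinates": {"latitude": 36.68, "longitude": -83.10},
        "location": {"city": "Jonesville", "state": "VA"},
    },
]

# Nested dict keys become columns such as 'coordinates.latitude'
flat = pd.json_normalize(records)
print(sorted(flat.columns))
print(flat[["name", "coordinates.latitude", "location.city"]])
```

Lists of dicts, such as `categories`, are left as-is by this call, so they would still need separate handling.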


In this step, we drop the features that are not useful for our task and extract the latitude, longitude, and city from the JSON-formatted columns.

In [2]:
# Drop columns that are not used as features
data = data.drop(columns=['alias', 'is_closed', 'url', 'transactions', 'phone', 'display_phone', 'distance'])

# extract latitude and longitude values
coords = data['coordinates'].tolist()
coords_json = str(coords).replace("'", "\"")
df = pd.DataFrame(json.loads(coords_json))
data['latitude'] = df['latitude']
data['longitude'] = df['longitude']

# extract city info
locations = data['location'].tolist()
cities = []
for loc in locations:
    cities.append(loc['city'])
data['city'] = cities

# drop columns: 'coordinates', 'location'
data = data.drop(columns=['coordinates', 'location'])
data.head()

| | categories | id | image_url | name | price | rating | review_count | latitude | longitude | city |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | [{'alias': 'delis', 'title': 'Delis'}, {'alias... | rd8oYQOtyc4LxjPQp8Muvw | | The Dutch Treat | | 5.0 | 2 | 36.690332 | -83.310449 | Rose Hill |
| 1 | [{'alias': 'burgers', 'title': 'Burgers'}, {'a... | bwCj2AcoOroZfCTxb6rCcg | https://s3-media2.fl.yelpcdn.com/bphoto/3KS3Xs... | A Better Burger | $$ | 3.5 | 6 | 36.689639 | -83.108766 | Jonesville |
| 2 | [{'alias': 'mexican', 'title': 'Mexican'}] | S9S9kFJSkmfpbjFForCWLQ | https://s3-media1.fl.yelpcdn.com/bphoto/NGC_GJ... | El Castillo | $ | 4.0 | 2 | 36.726337 | -83.099858 | Jonesville |
| 3 | [{'alias': 'mexican', 'title': 'Mexican'}] | XFksdPFZhPHk458C0pl0Cg | https://s3-media1.fl.yelpcdn.com/bphoto/G5XFTv... | El Centenario | | 5.0 | 3 | 36.76025 | -83.023682 | Pennington Gap |
| 4 | [{'alias': 'restaurants', 'title': 'Restaurant... | AsZk7i1UyQSElluN_ixSPQ | https://s3-media3.fl.yelpcdn.com/bphoto/n6JFx4... | Rubys Country Steak House | | 3.5 | 2 | 36.762496 | -83.017144 | Pennington Gap |
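
The coordinate extraction above round-trips the column through `str()` and `json.loads`, which works as long as every value prints as valid JSON once the quotes are swapped. A missing coordinate would print as `None` and fail to parse. One possible alternative, sketched below on a small made-up frame rather than the real dataset, builds the new columns straight from the dicts and reuses the original index so rows stay aligned.

```python
import pandas as pd

# Hypothetical frame with the same nested columns as the Yelp data
data = pd.DataFrame({
    "name": ["Example Deli", "Example Burger", "No Coordinates Cafe"],
    "coordinates": [
        {"latitude": 36.69, "longitude": -83.31},
        {"latitude": 36.68, "longitude": -83.10},
        {"latitude": None, "longitude": None},
    ],
    "location": [
        {"city": "Rose Hill"},
        {"city": "Jonesville"},
        {"city": "Pennington Gap"},
    ],
})

# Expand the list of dicts into columns; no string parsing involved
coords = pd.DataFrame(data["coordinates"].tolist(), index=data.index)
data["latitude"] = coords["latitude"]
data["longitude"] = coords["longitude"]

# Pull the city out of each location dict (None if the key is absent)
data["city"] = data["location"].map(lambda loc: loc.get("city"))

# The nested columns are no longer needed
data = data.drop(columns=["coordinates", "location"])
print(data)
```

The third row keeps a missing value for its coordinates instead of raising an error, which leaves the decision about dropping or imputing such rows to a later step.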


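The `categories` column still holds a list of dicts for each restaurant. Printing every row produces thousands of lines, so a frequency summary may be easier to scan. The following is a minimal sketch, assuming each entry is a list of dicts with `alias` and `title` keys as in the table above; the sample rows are invented for illustration.

```python
from collections import Counter

import pandas as pd

# Hypothetical rows shaped like the `categories` column
data = pd.DataFrame({
    "categories": [
        [{"alias": "delis", "title": "Delis"},
         {"alias": "cheese", "title": "Cheese Shops"}],
        [{"alias": "mexican", "title": "Mexican"}],
        [{"alias": "mexican", "title": "Mexican"},
         {"alias": "vegan", "title": "Vegan"}],
    ]
})

# Count how many restaurants list each category alias
alias_counts = Counter(
    cat["alias"] for cats in data["categories"] for cat in cats
)

# Show the most frequent aliases first
for alias, count in alias_counts.most_common(5):
    print(alias, count)
```

On the full dataset, the same counter would show which categories are common enough to be worth encoding as features.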
In [3]:
cate_info = data.categories
for cate in cate_info:
    print(cate)

[{'alias': 'delis', 'title': 'Delis'}, {'alias': 'cheese', 'title': 'Cheese Shops'}, {'alias': 'meats', 'title': 'Meat Shops'}]
[{'alias': 'burgers', 'title': 'Burgers'}, {'alias': 'hotdog', 'title': 'Hot Dogs'}, {'alias': 'wraps', 'title': 'Wraps'}]
[{'alias': 'mexican', 'title': 'Mexican'}]
[{'alias': 'mexican', 'title': 'Mexican'}]
[{'alias': 'restaurants', 'title': 'Restaurants'}]
[{'alias': 'salad', 'title': 'Salad'}, {'alias': 'burgers', 'title': 'Burgers'}, {'alias': 'sandwiches', 'title': 'Sandwiches'}]
[{'alias': 'restaurants', 'title': 'Restaurants'}]
[{'alias': 'burgers', 'title': 'Burgers'}, {'alias': 'tradamerican', 'title': 'American (Traditional)'}, {'alias': 'hotdog', 'title': 'Hot Dogs'}]
[{'alias': 'newamerican', 'title': 'American (New)'}]
[{'alias': 'burgers', 'title': 'Burgers'}]
[{'alias': 'chicken_wings', 'title': 'Chicken Wings'}, {'alias': 'pizza', 'title': 'Pizza'}, {'alias': 'bbq', 'title': 'Barbeque'}]
[{'alias': 'restaurants', 'title': 'Restaurants'}]
...

[{'alias': 'newamerican', 'title': 'American (New)'}, {'alias': 'pizza', 'title': 'Pizza'}, {'alias': 'cocktailbars', 'title': 'Cocktail Bars'}]
[{'alias': 'burgers', 'title': 'Burgers'}]
[{'alias': 'newamerican', 'title': 'American (New)'}]
[{'alias': 'caribbean', 'title': 'Caribbean'}]
[{'alias': 'catering', 'title': 'Caterers'}, {'alias': 'bbq', 'title': 'Barbeque'}]
[{'alias': 'newamerican', 'title': 'American (New)'}, {'alias': 'wine_bars', 'title': 'Wine Bars'}, {'alias': 'venues', 'title': 'Venues & Event Spaces'}]
[{'alias': 'cocktailbars', 'title': 'Cocktail Bars'}, {'alias': 'newamerican', 'title': 'American (New)'}, {'alias': 'seafood', 'title': 'Seafood'}]
[{'alias': 'seafood', 'title': 'Seafood'}, {'alias': 'tradamerican', 'title': 'American (Traditional)'}, {'alias': 'breakfast_brunch', 'title': 'Breakfast & Brunch'}]
[{'alias': 'newamerican', 'title': 'American (New)'}, {'alias': 'cocktailbars', 'title': 'Cocktail Bars'}, {'alias': 'beerbar', 'title': 'Beer Bar'}]
...

[{'alias': 'newamerican', 'title': 'American (New)'}, {'alias': 'cocktailbars', 'title': 'Cocktail Bars'}]
[{'alias': 'french', 'title': 'French'}, {'alias': 'mediterranean', 'title': 'Mediterranean'}]
[{'alias': 'bars', 'title': 'Bars'}, {'alias': 'raw_food', 'title': 'Live/Raw Food'}, {'alias': 'seafood', 'title': 'Seafood'}]
[{'alias': 'mexican', 'title': 'Mexican'}, {'alias': 'vegan', 'title': 'Vegan'}, {'alias': 'vegetarian', 'title': 'Vegetarian'}]
[{'alias': 'tradamerican', 'title': 'American (Traditional)'}]
[{'alias': 'mexican', 'title': 'Mexican'}]
[{'alias': 'thai', 'title': 'Thai'}]
[{'alias': 'newamerican', 'title': 'American (New)'}]
[{'alias': 'chicken_wings', 'title': 'Chicken Wings'}, {'alias': 'sandwiches', 'title': 'Sandwiches'}, {'alias': 'hotdog', 'title': 'Hot Dogs'}]
[{'alias': 'asianfusion', 'title': 'Asian Fusion'}, {'alias': 'chinese', 'title': 'Chinese'}, {'alias': 'thai', 'title': 'Thai'}]
[{'alias': 'caribbean', 'title': 'Caribbean'}]
...

[{'alias': 'bbq', 'title': 'Barbeque'}, {'alias': 'cajun', 'title': 'Cajun/Creole'}]
[{'alias': 'diners', 'title': 'Diners'}, {'alias': 'breakfast_brunch', 'title': 'Breakfast & Brunch'}, {'alias': 'tradamerican', 'title': 'American (Traditional)'}]
[{'alias': 'coffee', 'title': 'Coffee & Tea'}, {'alias': 'icecream', 'title': 'Ice Cream & Frozen Yogurt'}, {'alias': 'creperies', 'title': 'Creperies'}]
[{'alias': 'sushi', 'title': 'Sushi Bars'}, {'alias': 'tradamerican', 'title': 'American (Traditional)'}]
[{'alias': 'tapas', 'title': 'Tapas Bars'}]
[{'alias': 'sushi', 'title': 'Sushi Bars'}]
[{'alias': 'pizza', 'title': 'Pizza'}, {'alias': 'tradamerican', 'title': 'American (Traditional)'}, {'alias': 'bars', 'title': 'Bars'}]
[{'alias': 'greek', 'title': 'Greek'}, {'alias': 'mediterranean', 'title': 'Mediterranean'}, {'alias': 'tradamerican', 'title': 'American (Traditional)'}]
[{'alias': 'tradamerican', 'title': 'American (Traditional)'}, {'alias': 'hotdogs', 'title': 'Fast Food'}]
...

[{'alias': 'newamerican', 'title': 'American (New)'}, {'alias': 'bars', 'title': 'Bars'}]
[{'alias': 'southern', 'title': 'Southern'}]
[{'alias': 'french', 'title': 'French'}]
[{'alias': 'pizza', 'title': 'Pizza'}, {'alias': 'beerbar', 'title': 'Beer Bar'}]
[{'alias': 'burgers', 'title': 'Burgers'}, {'alias': 'tradamerican', 'title': 'American (Traditional)'}]
[{'alias': 'italian', 'title': 'Italian'}]
[{'alias': 'burgers', 'title': 'Burgers'}, {'alias': 'sandwiches', 'title': 'Sandwiches'}, {'alias': 'cocktailbars', 'title': 'Cocktail Bars'}]
[{'alias': 'cocktailbars', 'title': 'Cocktail Bars'}, {'alias': 'asianfusion', 'title': 'Asian Fusion'}]
[{'alias': 'newamerican', 'title': 'American (New)'}]
[{'alias': 'pubs', 'title': 'Pubs'}, {'alias': 'british', 'title': 'British'}]
[{'alias': 'healthmarkets', 'title': 'Health Markets'}, {'alias': 'salad', 'title': 'Salad'}, {'alias': 'hotdogs', 'title': 'Fast Food'}]
[{'alias': 'southern', 'title': 'Southern'}]
...

[{'alias': 'tradamerican', 'title': 'American (Traditional)'}, {'alias': 'sportsbars', 'title': 'Sports Bars'}, {'alias': 'burgers', 'title': 'Burgers'}]
[{'alias': 'salad', 'title': 'Salad'}, {'alias': 'soup', 'title': 'Soup'}, {'alias': 'sandwiches', 'title': 'Sandwiches'}]
[{'alias': 'pizza', 'title': 'Pizza'}, {'alias': 'salad', 'title': 'Salad'}, {'alias': 'coffee', 'title': 'Coffee & Tea'}]
[{'alias': 'hotdogs', 'title': 'Fast Food'}, {'alias': 'burgers', 'title': 'Burgers'}, {'alias': 'hotdog', 'title': 'Hot Dogs'}]
[{'alias': 'foodtrucks', 'title': 'Food Trucks'}, {'alias': 'catering', 'title': 'Caterers'}, {'alias': 'pizza', 'title': 'Pizza'}]
[{'alias': 'diners', 'title': 'Diners'}, {'alias': 'tradamerican', 'title': 'American (Traditional)'}]
[{'alias': 'pizza', 'title': 'Pizza'}]
[{'alias': 'coffee', 'title': 'Coffee & Tea'}, {'alias': 'bakeries', 'title': 'Bakeries'}, {'alias': 'cafes', 'title': 'Cafes'}]
...

[{'alias': 'mexican', 'title': 'Mexican'}]
[{'alias': 'burgers', 'title': 'Burgers'}]
[{'alias': 'gluten_free', 'title': 'Gluten-Free'}, {'alias': 'coffeeroasteries', 'title': 'Coffee Roasteries'}]
[{'alias': 'breweries', 'title': 'Breweries'}, {'alias': 'newamerican', 'title': 'American (New)'}, {'alias': 'musicvenues', 'title': 'Music Venues'}]
[{'alias': 'southern', 'title': 'Southern'}, {'alias': 'soulfood', 'title': 'Soul Food'}, {'alias': 'bbq', 'title': 'Barbeque'}]
[{'alias': 'tradamerican', 'title': 'American (Traditional)'}]
[{'alias': 'bars', 'title': 'Bars'}, {'alias': 'newamerican', 'title': 'American (New)'}]
[{'alias': 'breakfast_brunch', 'title': 'Breakfast & Brunch'}, {'alias': 'tradamerican', 'title': 'American (Traditional)'}]
[{'alias': 'indpak', 'title': 'Indian'}]
[{'alias': 'japanese', 'title': 'Japanese'}, {'alias': 'sushi', 'title': 'Sushi Bars'}, {'alias': 'steak', 'title': 'Steakhouses'}]
[{'alias': 'cafes', 'title': 'Cafes'}]
[{'alias': 'italian', 'title': '