# KBBQ Hit Location Project

## *Introduction*
#### While travelling in New York, we noticed that Manhattan has fewer Korean BBQ restaurants than restaurants of other cuisines, such as Italian or Japanese.
#### Some of the Korean BBQ restaurants we visited, such as Jongro BBQ on 32nd Street, seemed very busy and profitable, with fully booked tables and people waiting in line for over an hour to be served.
#### However, we saw few, if any, Korean BBQ restaurants in other parts of Manhattan. We want to find the best possible location for a Korean BBQ restaurant: somewhere population traffic is dense but there are few or no Korean restaurants nearby.
#### We also want to avoid areas with other types of BBQ restaurants, since they could become serious competitors.

## *Data Description*
#### First, we need the locations of all Korean BBQ restaurants and of restaurants that serve similar food. We collect these with the Foursquare API, using search keywords such as "BBQ" and "Restaurant". The resulting dataset is shown in the Methodology section.
#### Second, we need to estimate population traffic at these locations. We use each venue's rating count and tip count as proxies for traffic, and we merge the location data with the venue data to build the final dataset.
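Each Foursquare search boils down to a GET request with a handful of query parameters. As a minimal sketch (the helper name `build_search_url` is hypothetical, and the query `Korean BBQ` is only an example), the request URL can be assembled with the standard library's `urlencode`, which escapes the space in the query and the comma in `ll`. `requests.get(url, params=...)` performs the same encoding. We use `limit=50` on the assumption that the v2 `venues/search` endpoint documents 50 as its maximum.

```python
from urllib.parse import urlencode

# Base endpoint of the Foursquare v2 venue search used in this notebook
SEARCH_URL = 'https://api.foursquare.com/v2/venues/search'

def build_search_url(client_id, client_secret, access_token, version,
                     lat, lng, query, radius, limit):
    """Return a venues/search URL with every parameter URL-encoded.

    Hypothetical helper: parameter names mirror the ones the notebook sends.
    """
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'oauth_token': access_token,
        'v': version,
        'll': '{},{}'.format(lat, lng),  # "lat,lng" centre of the search
        'query': query,
        'radius': radius,  # metres
        'limit': limit,
    }
    return SEARCH_URL + '?' + urlencode(params)

url = build_search_url('ID', 'SECRET', 'TOKEN', '20180604',
                       40.747574, -73.987043, 'Korean BBQ', 10000, 50)
print(url)
```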


## *Methodology*
#### We merge the location and venue data and preprocess the merged dataset. We then apply a 75th-percentile threshold to the rating, rating-count, and tip-count columns to identify potential hot spots.
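A minimal sketch of this labelling rule on made-up data (five hypothetical venues A to E, not taken from the dataset): a venue is a Hit only if it is at or above the 75th percentile on all three features.

```python
import numpy as np

# Toy data, invented for illustration: rating count, rating, and tip count per venue
names = np.array(['A', 'B', 'C', 'D', 'E'])
num_rated = np.array([10, 50, 40, 20, 30])
rating = np.array([7.0, 8.5, 9.0, 7.5, 8.0])
tip_count = np.array([1, 5, 2, 3, 4])

# 75th-percentile threshold per feature (NumPy's default linear interpolation)
thresholds = [np.percentile(x, 75) for x in (num_rated, rating, tip_count)]

# Hit only when the venue reaches the threshold on all three features
is_hit = (
    (num_rated >= thresholds[0])
    & (rating >= thresholds[1])
    & (tip_count >= thresholds[2])
)
flags = np.where(is_hit, 'Hit', 'Miss')

for name, flag in zip(names, flags):
    print(name, flag)
```

Venue C clears the rating-count and rating thresholds but falls short on tips, so it is a Miss; only B passes all three.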

```python
# Import libraries
import json

import folium
import matplotlib.cm as cm
import matplotlib.colors as colors
import numpy as np
import pandas as pd
import requests
from geopy.geocoders import Nominatim
from pandas import json_normalize
from sklearn.cluster import KMeans

# Show every row and column when displaying DataFrames
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
```

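```python
# Foursquare credentials: replace the placeholders with your own values
CLIENT_ID = 'ID'
CLIENT_SECRET = 'SECRET'
ACCESS_TOKEN = 'TOKEN'
VERSION = '20180604'  # API version date sent as the `v` parameter
LIMIT = 50  # the v2 venues/search endpoint documents a maximum of 50 results
```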

### First, we find where KBBQ and other BBQ restaurants are located in Manhattan.

```python
# Geocode the search centre: Jongro BBQ in Koreatown
address = '22 W 32nd St, New York, NY'
geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

# Search for BBQ venues within 10 km of the centre
search_query = 'BBQ'
radius = 10000  # metres
url = 'https://api.foursquare.com/v2/venues/search'
params = {
    'client_id': CLIENT_ID,
    'client_secret': CLIENT_SECRET,
    'oauth_token': ACCESS_TOKEN,
    'v': VERSION,
    'll': '{},{}'.format(latitude, longitude),
    'query': search_query,
    'radius': radius,
    'limit': LIMIT,
}
results = requests.get(url, params=params).json()
venues = results['response']['venues']
dataframe = json_normalize(venues)

# Keep the name, categories, all location.* fields, and the venue id
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe.loc[:, filtered_columns].copy()

def get_category_type(row):
    """Return the name of a venue's first category, or None if it has none."""
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']

dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)
# Strip the 'location.' prefix from the column names
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]
dataframe_filtered.sort_values(by='distance', ascending=True, inplace=True)
# Keep only venues whose category mentions 'Restaurant'
dataframe_filtered = dataframe_filtered[dataframe_filtered['categories'].str.contains('Restaurant', na=False, regex=False)]
dataframe_filtered
```

|  | name | categories | address | crossStreet | lat | lng | labeledLatLngs | distance | postalCode | cc | neighborhood | city | state | country | formattedAddress | id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Jongro BBQ | Korean Restaurant | 22 W 32nd St Fl 2 | btwn Broadway & 5th Ave | 40.747574 | -73.987043 | [{'label': 'display', 'lat': 40.747574, 'lng':... | 10 | 10001.0 | US | Koreatown | New York | NY | United States | [22 W 32nd St Fl 2 (btwn Broadway & 5th Ave), ... | 540f86da498e020149fa7676 |
| 22 | The Kunjip | Korean Restaurant | 32 W 32nd St | btwn Broadway & 5th Ave | 40.747945 | -73.987134 | [{'label': 'display', 'lat': 40.74794485506209... | 46 | 10001.0 | US |  | New York | NY | United States | [32 W 32nd St (btwn Broadway & 5th Ave), New Y... | 49e6e227f964a52074641fe3 |
| 4 | Chung Moo Ro BBQ | Korean Restaurant | 10 W 32nd St | 5th Ave. | 40.747311 | -73.986464 | [{'label': 'display', 'lat': 40.74731099999999... | 47 | 10001.0 | US |  | New York | NY | United States | [10 W 32nd St (5th Ave.), New York, NY 10001] | 4a1e0dd2f964a520c07b1fe3 |
| 10 | miss KOREA BBQ | Korean Restaurant | 10 W 32nd St Fl 3 #1 | btwn Broadway & 5th Ave | 40.747286 | -73.98641 | [{'label': 'display', 'lat': 40.747286, 'lng':... | 52 | 10001.0 | US |  | New York | NY | United States | [10 W 32nd St Fl 3 #1 (btwn Broadway & 5th Ave... | 4c254ba6136d20a19f63e361 |
| 1 | Samwon Garden BBQ | Korean Restaurant | 37 W 32nd St | btwn 5th & 6th Ave | 40.74801 | -73.98728 | [{'label': 'display', 'lat': 40.74801, 'lng': ... | 58 | 10001.0 | US | Koreatown | New York | NY | United States | [37 W 32nd St (btwn 5th & 6th Ave), New York, ... | 5aab0d2ae179107b87768ff8 |
| 6 | K-Town BBQ (고기주점) | Korean Restaurant | 2 W 32nd St Frnt 2 |  | 40.747418 | -73.986121 | [{'label': 'display', 'lat': 40.74741821848371... | 68 | 10001.0 | US |  | New York | NY | United States | [2 W 32nd St Frnt 2, New York, NY 10001] | 5a2c7151a22db744e299c521 |
| 14 | LOVE Korean BBQ | Korean Restaurant | 319 5th Ave |  | 40.747163 | -73.985188 | [{'label': 'display', 'lat': 40.747163, 'lng':... | 152 | 10016.0 | US |  | New York | NY | United States | [319 5th Ave, New York, NY 10016] | 5e3b69b257fcde0008c0c3a3 |
| 2 | Don's Bogam Korean BBQ & Wine | Korean Restaurant | 17 E 32nd St. | Between Madison & 5th | 40.746788 | -73.984444 | [{'label': 'display', 'lat': 40.74678829586255... | 225 | 10016.0 | US |  | New York | NY | United States | [17 E 32nd St. (Between Madison & 5th), New Yo... | 41e46880f964a520d81e1fe3 |
| 26 | Hongchun Korean BBQ | Korean Restaurant | 739 Avenue of the Americas | 6th And 27th | 40.745432 | -73.990989 | [{'label': 'display', 'lat': 40.74543229305341... | 417 | 10010.0 | US |  | New York | NY | United States | [739 Avenue of the Americas (6th And 27th), Ne... | 532891cd498e46e4f3868cca |
| 35 | Oppa New Korean BBQ @madsqeats | Korean Restaurant |  |  | 40.742676 | -73.98892 | [{'label': 'display', 'lat': 40.742676, 'lng':... | 569 |  | US |  | New York | NY | United States | [New York, NY] | 591dece61543c72695710ae0 |


### The map below suggests that the BBQ restaurants found by the search are concentrated in Midtown Manhattan.

```python
# Generate a map centred on the geocoded address (Jongro BBQ)
venues_map = folium.Map(location=[latitude, longitude], zoom_start=13)

# Add a red circle marker for Jongro BBQ
folium.CircleMarker(
    [latitude, longitude],
    radius=10,
    color='red',
    popup='Jongro BBQ',
    fill=True,
    fill_color='red',
    fill_opacity=0.6,
).add_to(venues_map)

# Add the BBQ restaurants as blue circle markers, labelled by category
for lat, lng, label in zip(dataframe_filtered.lat, dataframe_filtered.lng, dataframe_filtered.categories):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill=True,
        fill_color='blue',
        fill_opacity=0.6,
    ).add_to(venues_map)

# Display the map
venues_map
```
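To put a number on this concentration, the sketch below computes great-circle (haversine) distances from Jongro BBQ's coordinates to the ten venues listed in the search output above and reports the share that falls within a given radius. The 600 m radius and the helper names are illustrative choices, not part of the original analysis; Foursquare's own `distance` column measures from the geocoded search point instead.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two (lat, lng) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def share_within(center, points, radius_m):
    """Fraction of points lying within radius_m metres of center."""
    inside = sum(haversine_m(*center, lat, lng) <= radius_m for lat, lng in points)
    return inside / len(points)

# Coordinates copied from the search output; the centre is Jongro BBQ
jongro = (40.747574, -73.987043)
kbbq = [
    (40.747574, -73.987043), (40.747945, -73.987134), (40.747311, -73.986464),
    (40.747286, -73.98641), (40.74801, -73.98728), (40.747418, -73.986121),
    (40.747163, -73.985188), (40.746788, -73.984444), (40.745432, -73.990989),
    (40.742676, -73.98892),
]
print(share_within(jongro, kbbq, 600))
```

With a 600 m radius every listed venue qualifies, consistent with the `distance` values of 569 m or less in the table.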

### Restaurants in Lower Manhattan 

```python
# Geocode a point in lower Manhattan (138 Lafayette St)
address = '138 Lafayette St, New York, NY 10013'
geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

# Search for restaurants within 2 km
search_query = 'Restaurant'
radius = 2000  # metres
url = 'https://api.foursquare.com/v2/venues/search'
params = {
    'client_id': CLIENT_ID,
    'client_secret': CLIENT_SECRET,
    'oauth_token': ACCESS_TOKEN,
    'v': VERSION,
    'll': '{},{}'.format(latitude, longitude),
    'query': search_query,
    'radius': radius,
    'limit': LIMIT,
}
results = requests.get(url, params=params).json()
venues = results['response']['venues']
dataframe = json_normalize(venues)

filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe.loc[:, filtered_columns].copy()

# get_category_type() is defined in the BBQ search cell above
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]
dataframe_filtered.sort_values(by='distance', ascending=True, inplace=True)
dataframe_filtered = dataframe_filtered[dataframe_filtered['categories'].str.contains('Restaurant', na=False, regex=False)]

# Start the combined dataset with the lower Manhattan results
total_data = dataframe_filtered.copy()
```

### Restaurants in Upper Manhattan

```python
# Geocode a point in upper Manhattan (1544 Madison Ave)
address = '1544 Madison Ave, New York, NY 10029'
geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

# Search for restaurants within 2 km
search_query = 'Restaurant'
radius = 2000  # metres
url = 'https://api.foursquare.com/v2/venues/search'
params = {
    'client_id': CLIENT_ID,
    'client_secret': CLIENT_SECRET,
    'oauth_token': ACCESS_TOKEN,
    'v': VERSION,
    'll': '{},{}'.format(latitude, longitude),
    'query': search_query,
    'radius': radius,
    'limit': LIMIT,
}
results = requests.get(url, params=params).json()
venues = results['response']['venues']
dataframe = json_normalize(venues)

filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe.loc[:, filtered_columns].copy()

# get_category_type() is defined in the BBQ search cell above
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]
dataframe_filtered.sort_values(by='distance', ascending=True, inplace=True)
dataframe_filtered = dataframe_filtered[dataframe_filtered['categories'].str.contains('Restaurant', na=False, regex=False)]

# Add the upper Manhattan results (DataFrame.append was removed in pandas 2.0)
total_data = pd.concat([total_data, dataframe_filtered])
```

### We merge the lower and upper Manhattan datasets.

```python
total_data = pd.DataFrame(total_data, index=None)
total_data
```

|  | name | categories | address | crossStreet | lat | lng | labeledLatLngs | distance | postalCode | cc | city | state | country | formattedAddress | neighborhood | id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9 | Canal Street Seafood Restaurant | Seafood Restaurant | 266 Canal St |  | 40.718768 | -74.001015 | [{'label': 'display', 'lat': 40.718768, 'lng':... | 74 | 10013 | US | New York | NY | United States | [266 Canal St, New York, NY 10013] |  | 4ab80dbff964a520fc7b20e3 |
| 13 | Canal Best Chinese Restaurant | Chinese Restaurant | 266 Canal St |  | 40.71879 | -74.001055 | [{'label': 'display', 'lat': 40.71878993882605... | 75 | 10013 | US | New York | NY | United States | [266 Canal St, New York, NY 10013] |  | 4d83993c81fdb1f7cf87eabf |
| 12 | Sun Sai Gai Restaurant | Chinese Restaurant | 220 Canal St | at Baxter St | 40.717369 | -73.999415 | [{'label': 'display', 'lat': 40.71736941942955... | 207 | 10013 | US | New York | NY | United States | [220 Canal St (at Baxter St), New York, NY 10013] |  | 4a81ac53f964a5203af71fe3 |
| 21 | Beijing Pop Kabob Restaurant | Chinese Restaurant | 122 Mulberry St |  | 40.717912 | -73.998105 | [{'label': 'display', 'lat': 40.71791177404022... | 226 | 10013 | US | New York | NY | United States | [122 Mulberry St, New York, NY 10013] |  | 540df8c8498e1524a6acd4f8 |
| 8 | Puglia Restaurant | Italian Restaurant | 189 Hester St | btwn Mott & Mulberry | 40.718165 | -73.997822 | [{'label': 'display', 'lat': 40.71816511464507... | 231 | 10013 | US | New York | NY | United States | [189 Hester St (btwn Mott & Mulberry), New Yor... |  | 3fd66200f964a520ade61ee3 |
| 20 | Lunela Restaurant | Italian Restaurant | 173 Mulberry St |  | 40.720106 | -73.997215 | [{'label': 'display', 'lat': 40.720106, 'lng':... | 280 | 10013 | US | New York | NY | United States | [173 Mulberry St, New York, NY 10013] |  | 4f32b1c219836c91c7f095f1 |
| 3 | Galli Restaurant | Italian Restaurant | 45 Mercer St | Broome & Grand Streets | 40.721607 | -74.001235 | [{'label': 'display', 'lat': 40.72160721760932... | 289 | 10013 | US | New York | NY | United States | [45 Mercer St (Broome & Grand Streets), New Yo... |  | 5018507fe4b03a729d0b40f9 |
| 2 | Royal Seafood Restaurant | Seafood Restaurant | 103-105 Mott St | btwn Canal & Hester St | 40.717305 | -73.997497 | [{'label': 'display', 'lat': 40.71730464970235... | 308 | 10013 | US | New York | NY | United States | [103-105 Mott St (btwn Canal & Hester St), New... |  | 4bdd7814b0f5c928c4684ce3 |
| 7 | Shanghai Heping Restaurant | Chinese Restaurant | 104 Mott St | btwn Hester & Canal St | 40.717438 | -73.997347 | [{'label': 'display', 'lat': 40.71743807449281... | 309 | 10013 | US | New York | NY | United States | [104 Mott St (btwn Hester & Canal St), New Yor... |  | 4f19bd72e4b0808f629111d1 |
| 10 | Hoy Wong Restaurant 喜喜饭店 | Chinese Restaurant | 81 Mott St | at Canal St. | 40.716607 | -73.997912 | [{'label': 'display', 'lat': 40.71660726686447... | 342 | 10013 | US | New York | NY | United States | [81 Mott St (at Canal St.), New York, NY 10013] |  | 4a7b6d4df964a520fbea1fe3 |
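The upper Manhattan cell adds its search results to `total_data`. A minimal sketch of that step with `pd.concat` (the replacement for `DataFrame.append`, which was removed in pandas 2.0), plus a deduplication on the Foursquare venue `id` in case two searches ever return the same venue. The two toy frames, their ids, and venue names are hypothetical.

```python
import pandas as pd

# Toy stand-ins for the lower and upper Manhattan search results
lower = pd.DataFrame({'id': ['a1', 'a2'], 'name': ['Venue A', 'Venue B']})
upper = pd.DataFrame({'id': ['a2', 'b1'], 'name': ['Venue B', 'Venue C']})

# Stack the two result sets into one frame
combined = pd.concat([lower, upper], ignore_index=True)

# Keep the first occurrence of each venue id, then renumber the rows
combined = combined.drop_duplicates(subset='id', keep='first').reset_index(drop=True)
print(combined)
```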


### Next, we fetch each venue's rating count, rating, and tip count and merge them with the location data.

```python
def rating_count_extractor(venue_id):
    """Fetch the rating count, rating, and tip count for one venue."""
    url = 'https://api.foursquare.com/v2/venues/{}'.format(venue_id)
    params = {'client_id': CLIENT_ID, 'client_secret': CLIENT_SECRET,
              'oauth_token': ACCESS_TOKEN, 'v': VERSION}
    result = requests.get(url, params=params).json()
    try:
        num_rated = result['response']['venue']['ratingSignals']
        rating = result['response']['venue']['rating']
    except KeyError:
        # Unrated venue or error response: no signals and no rating
        num_rated = 0
        rating = None
    try:
        tip_count = result['response']['venue']['tips']['count']
    except KeyError:
        tip_count = 0
    data_columns = ['Venue_ID', 'Number_of_Rated', 'Rating', 'Tip_Count']
    return pd.DataFrame([[venue_id, num_rated, rating, tip_count]], columns=data_columns)

# One venue-details request per venue
total_df = [rating_count_extractor(str(venue_id)) for venue_id in total_data['id'].tolist()]
total_df = pd.concat(total_df, ignore_index=True)

# Attach the statistics to the location data and drop unrated venues
df_merge = total_data.merge(total_df, left_on='id', right_on='Venue_ID', how='left')
df_merge.dropna(subset=['Rating'], inplace=True)
```
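When a venue-details response lacks a field, the extractor falls back to zero ratings, no rating, and zero tips. A minimal sketch of the same extraction using `dict.get` (the function name `parse_venue_stats` is hypothetical) can be tested offline against hand-written responses; the sample values are Galli Restaurant's from the Hit table below. Unlike the extractor, it reads each field independently, so a venue with `ratingSignals` but no `rating` keeps its count; such venues are removed by `dropna(subset=['Rating'])` anyway.

```python
def parse_venue_stats(result):
    """Return (num_rated, rating, tip_count) from a venue-details response dict.

    Missing fields fall back to 0 ratings, no rating (None), and 0 tips.
    """
    venue = result.get('response', {}).get('venue', {})
    num_rated = venue.get('ratingSignals', 0)
    rating = venue.get('rating')  # None when the venue has no rating
    tip_count = venue.get('tips', {}).get('count', 0)
    return num_rated, rating, tip_count

# Hand-written responses shaped like the fields the notebook reads
sample = {'response': {'venue': {'ratingSignals': 716, 'rating': 8.2, 'tips': {'count': 141}}}}
print(parse_venue_stats(sample))
print(parse_venue_stats({'response': {}}))
```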

### As explained at the beginning of this section, we apply a 75th-percentile threshold to the number of people who rated a venue, its rating, and its tip count to label each venue as a hit or a miss. We chose these three features as indicators of the target variable because we want to know which venues have been visited by many people and rated well; our working assumption is that well-rated venues attract repeat visitors.
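Written as formulas, let $n_i$, $r_i$, and $t_i$ be venue $i$'s rating count, rating, and tip count over $m$ venues. With `np.percentile`'s default linear interpolation over the sorted values $x_{(0)} \le \dots \le x_{(m-1)}$ of a feature, the 75th percentile and the Hit rule are:

$$
Q_{0.75}(x) = x_{(\lfloor h \rfloor)} + \left(h - \lfloor h \rfloor\right)\left(x_{(\lfloor h \rfloor + 1)} - x_{(\lfloor h \rfloor)}\right), \qquad h = 0.75\,(m - 1)
$$

$$
\text{Hit}_i \iff n_i \ge Q_{0.75}(n) \;\wedge\; r_i \ge Q_{0.75}(r) \;\wedge\; t_i \ge Q_{0.75}(t)
$$

For example, with $m = 5$ venues, $h = 3$, so $Q_{0.75}$ is simply the fourth-smallest value.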

```python
clean_list = ['name', 'categories', 'lat', 'lng', 'Number_of_Rated', 'Rating', 'Tip_Count', 'formattedAddress']
# .copy() makes an independent frame, which avoids SettingWithCopyWarning below
df_merge_clean = df_merge[clean_list].copy()

# 75th-percentile threshold for each feature
sfperc_num_rated = np.percentile(df_merge_clean['Number_of_Rated'], 75)
sfperc_rating = np.percentile(df_merge_clean['Rating'], 75)
sfperc_tip_count = np.percentile(df_merge_clean['Tip_Count'], 75)

# Hit: at or above the threshold on all three features; every other venue is a Miss
is_hit = ((df_merge_clean['Number_of_Rated'] >= sfperc_num_rated)
          & (df_merge_clean['Rating'] >= sfperc_rating)
          & (df_merge_clean['Tip_Count'] >= sfperc_tip_count))
df_merge_clean['Potential_Spot_Flag'] = np.where(is_hit, 'Hit', 'Miss')
df_merge_clean.dropna(inplace=True)
df_merge_final = df_merge_clean[df_merge_clean['Potential_Spot_Flag'] == 'Hit']
df_merge_final
```




|  | name | categories | lat | lng | Number_of_Rated | Rating | Tip_Count | formattedAddress | Potential_Spot_Flag |
|---|---|---|---|---|---|---|---|---|---|
| 6 | Galli Restaurant | Italian Restaurant | 40.721607 | -74.001235 | 716 | 8.2 | 141 | [45 Mercer St (Broome & Grand Streets), New Yo... | Hit |
| 14 | Bo Ky Restaurant 波記潮州小食 | Chinese Restaurant | 40.715696 | -73.998667 | 229 | 8.1 | 74 | [80 Bayard St (at Mott St), New York, NY 10013] | Hit |
| 17 | Deluxe Green Bo Restaurant | Chinese Restaurant | 40.715545 | -73.998137 | 326 | 8.3 | 135 | [66 Bayard St (btwn Elizabeth & Mott St), New ... | Hit |
| 23 | Golden Unicorn Restaurant 麒麟金閣 | Dim Sum Restaurant | 40.713629 | -73.99723 | 1187 | 8.0 | 228 | [18 E Broadway (at Catherine St), New York, NY... | Hit |
| 29 | Arturo's Restaurant | Italian Restaurant | 40.727407 | -74.000378 | 463 | 8.5 | 133 | [106 W Houston St (at Thompson St.), New York,... | Hit |
| 32 | Frank Restaurant | Italian Restaurant | 40.726939 | -73.988899 | 606 | 8.8 | 164 | [88 2nd Ave (at E 5th St), New York, NY 10003] | Hit |
| 61 | Heidelberg Restaurant | German Restaurant | 40.777532 | -73.951979 | 360 | 8.8 | 104 | [1648 2nd Ave (btwn 85th & 86th St.), New York... | Hit |
| 68 | Carmine's Italian Restaurant | Italian Restaurant | 40.791096 | -73.973991 | 579 | 8.3 | 132 | [2450 Broadway (btwn W 90th & W 91st), New Yor... | Hit |
| 72 | Yuka Japanese Restaurant | Sushi Restaurant | 40.774581 | -73.954206 | 233 | 8.0 | 97 | [1557 2nd Ave (btwn E 80th & E 81st St), New Y... | Hit |
| 75 | Fred's Restaurant | American Restaurant | 40.785658 | -73.976539 | 583 | 8.2 | 149 | [476 Amsterdam Ave. (at W 83rd St), New York, ... | Hit |


```python
# Keep the first six Hit rows (the lower Manhattan venues in the table above)
df_merge_final = df_merge_final.iloc[:6]
```

### We can clearly see there are three potential hit areas in this map.

```python
# Map centred on lower Manhattan (folium expects numeric coordinates)
venues_map = folium.Map(location=[40.7128, -74.0060], zoom_start=11)

# Add the Hit restaurants as blue circle markers, labelled by category
for lat, lng, label in zip(df_merge_final.lat, df_merge_final.lng, df_merge_final.categories):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill=True,
        fill_color='blue',
        fill_opacity=0.6,
    ).add_to(venues_map)

# Display the map
venues_map
```

## *Results*
#### The visualization reveals three potential hit areas. Of these, lower Manhattan appears to be the best place to open a KBBQ restaurant: its Hit venues show very high rating and tip counts, our proxy for population traffic, and no KBBQ or other BBQ restaurants appear to be present there. The map below highlights this area.

```python
# Zoom in on lower Manhattan (folium expects numeric coordinates)
venues_map = folium.Map(location=[40.7128, -74.0060], zoom_start=14)

# Add the Hit restaurants as blue circle markers, labelled by category
for lat, lng, label in zip(df_merge_final.lat, df_merge_final.lng, df_merge_final.categories):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill=True,
        fill_color='blue',
        fill_opacity=0.6,
    ).add_to(venues_map)

# Centre the highlight on the mean Hit location, shifted by a
# hand-tuned fraction of the standard deviation
avg_lat = np.mean(df_merge_final.lat)
std_lat = np.std(df_merge_final.lat)
avg_lng = np.mean(df_merge_final.lng)
std_lng = np.std(df_merge_final.lng)

# Green circle marking the proposed KBBQ area
# (CircleMarker radius is in pixels, so its ground size depends on the zoom level)
folium.CircleMarker(
    [avg_lat + 0.2 * std_lat, avg_lng + 0.6 * std_lng],
    radius=120,
    color='green',
    fill=True,
    fill_color='green',
    fill_opacity=0.2,
).add_to(venues_map)

venues_map
```
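The green `CircleMarker` above uses `radius=120` in screen pixels, so the ground area it covers changes with the zoom level. As an alternative sketch (helper names are hypothetical; the centre is the plain coordinate mean and the radius is the distance to the farthest venue, which gives a covering circle rather than the smallest possible one), the radius can be computed in metres for the six lower Manhattan Hit venues listed in the Hit table:

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two (lat, lng) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def covering_circle(points):
    """Return (centre_lat, centre_lng, radius_m) of a circle covering all points.

    The centre is the mean coordinate (adequate over a few kilometres);
    the radius is the distance from that centre to the farthest point.
    """
    lat_c = sum(lat for lat, _ in points) / len(points)
    lng_c = sum(lng for _, lng in points) / len(points)
    radius = max(haversine_m(lat_c, lng_c, lat, lng) for lat, lng in points)
    return lat_c, lng_c, radius

# The six lower Manhattan Hit venues (Galli, Bo Ky, Deluxe Green Bo,
# Golden Unicorn, Arturo's, Frank)
hits = [
    (40.721607, -74.001235), (40.715696, -73.998667), (40.715545, -73.998137),
    (40.713629, -73.99723), (40.727407, -74.000378), (40.726939, -73.988899),
]
lat_c, lng_c, radius_m = covering_circle(hits)
print(round(lat_c, 6), round(lng_c, 6), round(radius_m))
```

The resulting `radius_m` (roughly 1 km for these six venues) could be passed to `folium.Circle`, whose `radius` is measured in metres.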

## *Discussion*
#### In this project, I refrained from using existing machine learning algorithms for two reasons. First, the dataset is small: as the Methodology section shows, there is not enough data to train and test a model. Second, the problem was too open-ended to approach in an unsupervised way. I ended up using labels, but the problem could also be approached with an unsupervised method by adding more features and grouping venues based on those features. In my judgment, an unsupervised approach would need more data than was available here, so for the sake of this project I stayed with labels. The 75th-percentile threshold was a purely subjective choice, so it has almost certainly introduced bias into the analysis. Despite this bias, the feature metrics of the surrounding restaurants suggest that the area picked for the KBBQ restaurant draws many people and offers strong patronage opportunities if the food is served right.
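As an illustration of the unsupervised route mentioned above (a sketch, not something the original analysis ran), the code below groups the ten Hit venues from the Methodology table by location alone, using k-means (Lloyd's algorithm) with farthest-first initialisation in plain NumPy. It clusters raw latitude/longitude degrees, which exaggerates east-west distances relative to north-south ones at Manhattan's latitude; a fuller version would add and scale the rating features.

```python
import numpy as np

def farthest_first_init(X, k):
    """Start from the first point, then repeatedly add the point farthest from all chosen centres."""
    centers = [X[0]]
    for _ in range(1, k):
        # Squared distance from every point to its nearest chosen centre
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    return np.array(centers)

def kmeans(X, k, max_iter=100):
    """Lloyd's algorithm; assumes no cluster ever becomes empty."""
    centers = farthest_first_init(X, k)
    labels = None
    for _ in range(max_iter):
        # Assign each point to its nearest centre
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments are stable: converged
        labels = new_labels
        # Move each centre to the mean of its assigned points
        centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centers

# (lat, lng) of the ten Hit venues, in table order
hits = np.array([
    [40.721607, -74.001235], [40.715696, -73.998667], [40.715545, -73.998137],
    [40.713629, -73.99723], [40.727407, -74.000378], [40.726939, -73.988899],
    [40.777532, -73.951979], [40.791096, -73.973991], [40.774581, -73.954206],
    [40.785658, -73.976539],
])
labels, centers = kmeans(hits, k=3)
print(labels)
```

On these ten points the sketch puts the six lower Manhattan venues in one cluster, Heidelberg and Yuka (2nd Avenue) in a second, and Carmine's and Fred's (Broadway and Amsterdam Avenue) in a third.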

## *Conclusion*
#### We set out to answer a seemingly very broad question: "KBBQ seems very profitable; can we find the best location for a KBBQ venue in Manhattan?" Combining location and venue data turned this broad idea into a specific goal. Even without a large dataset or sophisticated machine learning algorithms, simple data preprocessing, statistical methods, and data visualization let us pinpoint an area where a potential investor in the food industry could profit from running one or more KBBQ restaurants.

## Thank you for reading!