# Fine Great Food @BKK: Bangkok Food & Restaurant Data-assisted Suggestion

## Capstone Project : IBM Data Science Professional Certificate

### Phumrapee Pisutsin

Import required libraries.

In [1]:
import math
import folium
import requests
import numpy as np
import pandas as pd
from pandas import json_normalize  # top-level location since pandas 1.0

__transportation.csv__ records every station of Bangkok's rail public transportation. The code below reads the CSV file into a dataframe. The author collected the station coordinates from Google Maps.

The _Lines_ column gives the rail line that the station is on. The _isOperating_ column states whether the station is currently in use. A _Lines_ value of _grey_ marks a hub station that connects more than one line.

In [2]:
transportation_df = pd.read_csv('./transportation.csv')
transportation_df.head()

Unnamed: 0,Service,Station,Lines,Latitude,Longitude,isOperating
0,BTS,Siam,grey,13.745511,100.53388,True
1,BTS,National Stadium,blue,13.746345,100.528863,True
2,BTS,Ratchadamri,blue,13.7395,100.539267,True
3,BTS,Sala Daeng,blue,13.728426,100.534436,True
4,BTS,Chong Nonsi,blue,13.723851,100.52945,True


Obtain the midpoint latitude and longitude of all stations so the map can be centred at the right point.

In [3]:
mid_latitude = (transportation_df['Latitude'].min() + transportation_df['Latitude'].max())/2
mid_longitude = (transportation_df['Longitude'].min() + transportation_df['Longitude'].max())/2

Plot the locations of all rail stations (BTS, MRT, Airport Link) on the map.

In [4]:
station_map = folium.Map(location=[mid_latitude, mid_longitude], zoom_start=11)

for lat, lon in zip(transportation_df['Latitude'], transportation_df['Longitude']):
    folium.CircleMarker(
        [lat, lon],
        radius=2,
        color="#424242",
    ).add_to(station_map)
       
station_map

If the map does not render, see the following image.
<img src="./map_5.jpg">

Plot and label the stations on the map again, this time colouring each station according to the rail line it is on.

In [5]:
colored_station_map = folium.Map(location=[mid_latitude, mid_longitude], tiles="cartodbpositron", zoom_start=12)

for lat, lon, color, name, service in zip(transportation_df['Latitude'], transportation_df['Longitude'], transportation_df['Lines'], transportation_df['Station'], transportation_df['Service']):

    if color == 'grey':  # the Lines column stores hub stations as lowercase 'grey'
        ifHub = ', Hub'
    else:
        ifHub = ''

    folium.CircleMarker(
        [lat, lon],
        radius=2,
        color=color,
        tooltip= "{} Station ({}{})".format(name, service, ifHub ),
    ).add_to(colored_station_map)
       
colored_station_map

If the map does not render, see the following image.
<img src="./map_4.jpg">

Collect the locations of the users who want to meet up.
The users enter all of the information themselves.

In [6]:
userLocation = []
userLatitude = []
userLongitude = []

n = int(input("How many people are meeting up? "))
for i in range(n):
    latitude = float(input("User {}'s Latitude : ".format(i+1)))
    longitude = float(input("User {}'s Longitude : ".format(i+1)))
    userLocation.append([latitude, longitude])
    userLatitude.append(latitude)
    userLongitude.append(longitude)

The experiment in this notebook uses two users located at (latitude, longitude) (13.751725, 100.531142) and (13.740558, 100.525142).
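
The input loop above passes whatever the user types straight to `float`. As a minimal sketch of how the entries could be checked first, the helper below (a hypothetical `parse_coordinate`, not part of the original notebook) converts the text and rejects values outside the valid latitude or longitude range. The two sample users are given as strings, the way `input()` would return them.

```python
def parse_coordinate(text, low, high):
    """Convert text to a float and check that it lies within [low, high]."""
    value = float(text)  # raises ValueError for non-numeric text
    if not low <= value <= high:
        raise ValueError("{} is outside the range [{}, {}]".format(value, low, high))
    return value

# The two users from this notebook, as raw strings.
raw_locations = [("13.751725", "100.531142"), ("13.740558", "100.525142")]

# Latitude must lie in [-90, 90] and longitude in [-180, 180].
user_locations = [
    [parse_coordinate(lat, -90, 90), parse_coordinate(lon, -180, 180)]
    for lat, lon in raw_locations
]
print(user_locations)
```

A swapped latitude and longitude (for example 100.53 entered as a latitude) would raise a `ValueError` here instead of silently producing a meeting point far from Bangkok.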

Plot the position of users on the same map __(in red)__.

In [7]:
for location in userLocation:
    folium.CircleMarker(
        [location[0], location[1]],
        radius=2,
        color="red",
    ).add_to(colored_station_map)
       
colored_station_map

If the map does not render, see the following image.
<img src="./map_3.PNG">

Define the function to find the average value of elements in a list.

In [8]:
def findAvg(values):
    # Named `values` and `total` so the built-ins `list` and `sum` are not shadowed.
    total = 0
    for element in values:
        total += element
    return total/len(values)

Calculate the midpoint of all users from the average latitude and average longitude, and plot it __in pink__.

A natural meeting point is the one whose latitude equals the average latitude and whose longitude equals the average longitude. With two users this point is equally far from both; with more users it is a central point rather than an exactly equidistant one.

In [9]:
userAvgLatitude = findAvg(userLatitude)
userAvgLongitude = findAvg(userLongitude)
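
As a worked example of the averaging step, the sketch below computes the midpoint for the two users of this notebook. It uses `statistics.fmean` from the standard library instead of `findAvg`; the variable names are illustrative only.

```python
from statistics import fmean

# The two users from this notebook as (latitude, longitude) pairs.
user_locations = [(13.751725, 100.531142), (13.740558, 100.525142)]

# The midpoint is the per-axis arithmetic mean of the coordinates.
avg_lat = fmean(lat for lat, _ in user_locations)
avg_lon = fmean(lon for _, lon in user_locations)

print(avg_lat, avg_lon)
```

The arithmetic mean minimises the sum of squared straight-line distances to the users, which is why it is a reasonable "central" point even when it is not exactly equidistant.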

In [10]:
folium.CircleMarker(
    [userAvgLatitude, userAvgLongitude],
    radius=2,
    color="#ff3786",
).add_to(colored_station_map)
       
colored_station_map

If the map does not render, see the following image.
<img src="./map_2.jpg">

Define the function to find the minimum element of the list. This function returns the index and value of the element with the minimum value.

In [11]:
def findMin(values):
    # Named `values` so the built-in `list` is not shadowed.
    minIndex = 0
    minElement = values[0]
    for i, element in enumerate(values):
        if element < minElement:
            minElement = element
            minIndex = i
    return minIndex, minElement

Define the function that calculates a simplified distance (in degrees, for simplicity) from the meetup point to a station location, using the Pythagorean theorem.

In [12]:
def findDistance(avgLat, avgLong, stationLocation):
    return math.sqrt((avgLat - stationLocation[0])**2 + (avgLong - stationLocation[1])**2)
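
The degree-based distance is enough for ranking nearby stations, but it treats a degree of longitude as being as long as a degree of latitude, and at Bangkok's latitude a degree of longitude is roughly 3% shorter. A minimal sketch of a distance in metres, assuming a spherical Earth of radius 6,371 km and using the haversine formula, could look like the following. The function name `haversine_m` is hypothetical and not part of the original notebook.

```python
import math

EARTH_RADIUS_M = 6371000  # mean Earth radius in metres (spherical-Earth assumption)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

# Siam and National Stadium stations, coordinates taken from transportation_df.
siam_to_national_stadium = haversine_m(13.745511, 100.53388, 13.746345, 100.528863)
print(round(siam_to_national_stadium), "m")
```

A result in metres can also be compared directly against the 800-metre walking radius used later in the notebook.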

However, meeting exactly at the midpoint would be hard in terms of transportation, since there might not be a public transportation service there. Therefore, this part of the code finds the rail station nearest to the midpoint, where everyone can meet.

The code below calculates the distance from the midpoint to every station.

In [13]:
distances = []

for i in range(transportation_df.shape[0]):
    stationLocation = [transportation_df['Latitude'][i], transportation_df['Longitude'][i]]
    x = findDistance(userAvgLatitude, userAvgLongitude, stationLocation)
    distances.append(x)

Add the distance to each station as a new column of the transportation dataframe.

In [14]:
transportation_df['Distance'] = distances
transportation_df.head()

Unnamed: 0,Service,Station,Lines,Latitude,Longitude,isOperating,Distance
0,BTS,Siam,grey,13.745511,100.53388,True,0.005773
1,BTS,National Stadium,blue,13.746345,100.528863,True,0.000749
2,BTS,Ratchadamri,blue,13.7395,100.539267,True,0.012957
3,BTS,Sala Daeng,blue,13.728426,100.534436,True,0.0188
4,BTS,Chong Nonsi,blue,13.723851,100.52945,True,0.022329


Get the index and the name of the station with the minimum distance from the meetup point.

In [15]:
indexMin = findMin(distances)[0]
print('The closest station is \'{}\''.format(transportation_df['Station'][indexMin]))

The closest station is 'National Stadium'
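
The distance loop and `findMin` can also be expressed with Python's built-in `min` and a key function. The sketch below is an illustration rather than the notebook's own method: it hard-codes the five stations shown in the table above and the midpoint of the two sample users, and the names `stations` and `degree_distance` are hypothetical.

```python
import math

# (name, latitude, longitude) for the five stations shown in the table above.
stations = [
    ("Siam", 13.745511, 100.53388),
    ("National Stadium", 13.746345, 100.528863),
    ("Ratchadamri", 13.7395, 100.539267),
    ("Sala Daeng", 13.728426, 100.534436),
    ("Chong Nonsi", 13.723851, 100.52945),
]

# Midpoint of the two sample users.
meet_lat, meet_lon = 13.7461415, 100.528142

def degree_distance(station):
    """Straight-line distance in degrees from the midpoint to a station."""
    _, lat, lon = station
    return math.hypot(meet_lat - lat, meet_lon - lon)

# min() with a key returns the station whose distance is smallest.
nearest = min(stations, key=degree_distance)
print(nearest[0], round(degree_distance(nearest), 6))
```

For these five stations the result agrees with the `Distance` column above: National Stadium, at about 0.000749 degrees.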


Find the location (latitude and longitude) of the station that is closest to the midpoint. Then, the search for the restaurant will use this station location as the centre.

In [16]:
lat_centre = transportation_df['Latitude'][indexMin]
long_centre = transportation_df['Longitude'][indexMin]

Create a search query for restaurants near the station, using a radius of 800 metres (approximately a 10-minute walk).

Please note that _CLIENT_ID_ and _CLIENT_SECRET_ are intentionally hidden for security reasons.

In [17]:
CLIENT_ID = 'unavailable'
CLIENT_SECRET = 'unavailable'
VERSION = '20180604'
LIMIT = 20
search_query = 'Restaurant'
radius = 800
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, lat_centre, long_centre, VERSION, search_query, radius, LIMIT)
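
The URL above is assembled by string formatting, which works here because none of the values contain characters that need escaping. As a hedged alternative, the query string can be built from a dictionary with `urllib.parse.urlencode`, which escapes each value. The sketch below only constructs the URL and does not call the API; the coordinates are those of National Stadium station, and `params` and `search_url` are illustrative names.

```python
from urllib.parse import urlencode

# Same parameters as the cell above, collected in one dictionary.
params = {
    "client_id": "unavailable",
    "client_secret": "unavailable",
    "ll": "{},{}".format(13.746345, 100.528863),  # "latitude,longitude"
    "v": "20180604",
    "query": "Restaurant",
    "radius": 800,
    "limit": 20,
}

# urlencode percent-escapes each value (the comma in `ll` becomes %2C).
search_url = "https://api.foursquare.com/v2/venues/search?" + urlencode(params)
print(search_url)
```

`requests.get` also accepts such a dictionary through its `params` argument and performs the same encoding, so the dictionary could be passed to it directly together with the base URL.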

Use the query to get the data, then convert it to a dataframe.

In [18]:
results = requests.get(url).json()
venues = results['response']['venues']
restaurant_df = json_normalize(venues)

restaurant_df.head()

Unnamed: 0,id,name,categories,referralId,hasPerk,location.address,location.crossStreet,location.lat,location.lng,location.labeledLatLngs,location.distance,location.postalCode,location.cc,location.city,location.state,location.country,location.formattedAddress,location.neighborhood
0,4bdd057bb0f5c92817924be3,PH1 -Party House One Bar & Restaurant,"[{'id': '4bf58dd8d48988d1d5941735', 'name': 'H...",v-1590343182,False,Siam@Siam Design Hotel & Spa,G Fl. & M Fl.,13.747002,100.526976,"[{'label': 'display', 'lat': 13.74700239042918...",216,10330.0,TH,ปทุมวัน,กรุงเทพมหานคร,ประเทศไทย,"[Siam@Siam Design Hotel & Spa (G Fl. & M Fl.),...",
1,4f3b48bde4b06a3b8f065498,City Center Restaurant (ห้องอาหารกลางเมือง),"[{'id': '4bf58dd8d48988d1c4941735', 'name': 'R...",v-1590343182,False,กฤษณ์ไทย แมนช่ันส์,,13.746248,100.528488,"[{'label': 'display', 'lat': 13.74624750593096...",41,,TH,พระนคร,กรุงเทพมหานคร,ประเทศไทย,"[กฤษณ์ไทย แมนช่ันส์, พระนคร, กรุงเทพมหานคร, ปร...",
2,4d1f3b88dd6a236a4b132e38,Yana Restaurant,"[{'id': '52e81612bcbc57f1066b79ff', 'name': 'H...",v-1590343182,False,MBK Center,5th Fl.,13.745851,100.530206,"[{'label': 'display', 'lat': 13.74585105952405...",155,10330.0,TH,ปทุมวัน,กรุงเทพมหานคร,ประเทศไทย,"[MBK Center (5th Fl.), ปทุมวัน, กรุงเทพมหานคร ...",
3,509fadd0e4b00adc5ee9170f,Grandmother Restaurant (ร้านคุณยาย),"[{'id': '4bf58dd8d48988d149941735', 'name': 'T...",v-1590343182,False,Phaya Thai Rd,,13.75053,100.531317,"[{'label': 'display', 'lat': 13.75052988129684...",536,10400.0,TH,ราชเทวี,กรุงเทพมหานคร,ประเทศไทย,"[Phaya Thai Rd, ราชเทวี, กรุงเทพมหานคร 10400, ...",Thanon Phetchaburi
4,4f0514cb9a523e111eeba1d6,Jim Thompson Bar & Restaurant,"[{'id': '4bf58dd8d48988d149941735', 'name': 'T...",v-1590343182,False,6 Soi Kasem San 2,,13.749316,100.528363,"[{'label': 'display', 'lat': 13.74931611624435...",335,10330.0,TH,ปทุมวัน,กรุงเทพมหานคร,ประเทศไทย,"[6 Soi Kasem San 2, ปทุมวัน, กรุงเทพมหานคร 103...",Wang Mai


Drop the unnecessary columns from the restaurant dataframe, and drop any row whose distance is greater than 800 metres (more than a 10-minute walk, so not feasible).

In [19]:
restaurant_df = restaurant_df.drop(restaurant_df[restaurant_df['location.distance'] > 800].index)

restaurant_df = restaurant_df.drop(['categories', 'hasPerk', 'referralId', 'location.postalCode', 'location.city', 'location.state', 'location.country', 'location.address', 'location.cc', 'location.formattedAddress', 'location.neighborhood', 'location.labeledLatLngs', 'location.crossStreet'], axis = 1)

# The API's column names ('id', 'name', 'location.lat', 'location.lng', 'location.distance') are kept: the next cell reads restaurant_df['id'].

restaurant_df.head()

Unnamed: 0,id,name,location.lat,location.lng,location.distance
0,4bdd057bb0f5c92817924be3,PH1 -Party House One Bar & Restaurant,13.747002,100.526976,216
1,4f3b48bde4b06a3b8f065498,City Center Restaurant (ห้องอาหารกลางเมือง),13.746248,100.528488,41
2,4d1f3b88dd6a236a4b132e38,Yana Restaurant,13.745851,100.530206,155
3,509fadd0e4b00adc5ee9170f,Grandmother Restaurant (ร้านคุณยาย),13.75053,100.531317,536
4,4f0514cb9a523e111eeba1d6,Jim Thompson Bar & Restaurant,13.749316,100.528363,335
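
An alternative to listing every column to drop is to list the few columns to keep. The sketch below is an illustration under stated assumptions: it builds a tiny stand-in dataframe from two rows of the table above plus one made-up row (labelled hypothetical) that lies beyond 800 metres, then filters, selects, and renames in one step. It writes to a separate, hypothetical `tidy_df`, because the following cells still look up the original `id` column of `restaurant_df`.

```python
import pandas as pd

# Stand-in for the API result: two real rows from the table above and
# one HYPOTHETICAL row that is farther than 800 m, to show the filter.
raw_df = pd.DataFrame({
    "id": ["4bdd057bb0f5c92817924be3", "4f3b48bde4b06a3b8f065498", "hypothetical-id"],
    "name": ["PH1 -Party House One Bar & Restaurant",
             "City Center Restaurant (ห้องอาหารกลางเมือง)",
             "Hypothetical far-away venue"],
    "hasPerk": [False, False, False],
    "location.lat": [13.747002, 13.746248, 13.755000],
    "location.lng": [100.526976, 100.528488, 100.528000],
    "location.distance": [216, 41, 950],
})

# Columns to keep, mapped to friendlier names.
keep = {
    "id": "Id",
    "name": "Name",
    "location.lat": "Latitude",
    "location.lng": "Longitude",
    "location.distance": "Distance",
}

# Keep rows within 800 m, select the wanted columns, then rename them.
within_walk = raw_df["location.distance"] <= 800
tidy_df = raw_df.loc[within_walk, list(keep)].rename(columns=keep)
print(tidy_df)
```

Selecting by name keeps working if the API adds or omits optional columns such as `location.neighborhood`, whereas dropping a column that is absent raises an error.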


Get the rating of each restaurant in the table above.

In [20]:
ratedList = []

for venue_id in restaurant_df['id']:
    url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION)
    result = requests.get(url).json()
    try:
        venue = result['response']['venue']
        ratedList.append([venue['rating'], venue['name'], venue['location']['lat'], venue['location']['lng']])
    except KeyError:
        # Skip venues whose response has no rating or is missing a field.
        pass
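
The cell above skips a venue whenever one of the expected keys is missing. The sketch below shows the same idea without exception handling, on mocked responses rather than live API calls. The response shape is assumed from the keys the cell above reads, the second venue is made up for illustration, and `extract_rating_row` is a hypothetical helper.

```python
def extract_rating_row(result):
    """Return [rating, name, lat, lng], or None when the venue has no rating."""
    venue = result.get("response", {}).get("venue", {})
    if "rating" not in venue:
        return None
    location = venue.get("location", {})
    return [venue["rating"], venue.get("name"), location.get("lat"), location.get("lng")]

# Mocked response for a rated venue (values taken from ratedList in this notebook).
rated = {"response": {"venue": {
    "rating": 7.5,
    "name": "Yana Restaurant",
    "location": {"lat": 13.745851059524055, "lng": 100.53020551662877},
}}}

# HYPOTHETICAL venue without a "rating" key.
unrated = {"response": {"venue": {
    "name": "Unrated venue (hypothetical)",
    "location": {"lat": 13.746, "lng": 100.528},
}}}

rows = [row for row in map(extract_rating_row, [rated, unrated]) if row is not None]
print(rows)
```

Only the rated venue ends up in `rows`, which mirrors how fewer restaurants appear in the rating list than in the search result.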

The list of ratings, _ratedList_, is now available.

In [21]:
ratedList

[[7.1,
  'PH1 -Party House One Bar & Restaurant',
  13.747002390429184,
  100.52697572012217],
 [7.5, 'Yana Restaurant', 13.745851059524055, 100.53020551662877],
 [7.8,
  'Jim Thompson Bar & Restaurant',
  13.749316116244351,
  100.52836349409179],
 [7.2,
  'Scala Restaurant (ภัตตาคารสกาล่า)',
  13.745749901265727,
  100.53126870602846],
 [7.3, 'The Great Wall Restaurant', 13.751497465166223, 100.53034327118345],
 [5.2, 'The Eight Restaurant', 13.746647620627424, 100.5289768821219]]

Sort the list by restaurant rating, in ascending order.

In [22]:
sortedRatingList = sorted(ratedList, key = lambda x: x[0])


Print the three restaurants with the highest ratings.

In [23]:
print('Suggested Restaurant')
# Walk the (up to) three highest-rated entries from best to third best.
for rank, restaurant in enumerate(reversed(sortedRatingList[-3:]), start=1):
    print("{}.".format(rank), restaurant[1])

Suggested Restaurant
1. Jim Thompson Bar & Restaurant
2. Yana Restaurant
3. The Great Wall Restaurant
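
Sorting the whole list and reading it from the end works well for a handful of venues. As an alternative sketch, `heapq.nlargest` from the standard library returns the top entries directly, already ordered from best to worst. The ratings and names below are copied from `ratedList` above, with coordinates omitted for brevity; `rated_list` and `top_three` are illustrative names.

```python
import heapq

# [rating, name] pairs copied from ratedList (coordinates omitted).
rated_list = [
    [7.1, "PH1 -Party House One Bar & Restaurant"],
    [7.5, "Yana Restaurant"],
    [7.8, "Jim Thompson Bar & Restaurant"],
    [7.2, "Scala Restaurant (ภัตตาคารสกาล่า)"],
    [7.3, "The Great Wall Restaurant"],
    [5.2, "The Eight Restaurant"],
]

# nlargest returns at most 3 rows, highest rating first; it simply
# returns fewer rows when the list has fewer than 3 entries.
top_three = heapq.nlargest(3, rated_list, key=lambda row: row[0])

for rank, (rating, name) in enumerate(top_three, start=1):
    print("{}. {} ({})".format(rank, name, rating))
```

The three names match the suggestion printed above.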


Plot the suggested restaurants on the same map, together with the other locations. The suggested restaurants (ranked by rating) are marked in yellow.

In [24]:
# Mark the (up to) three highest-rated restaurants.
for rating, name, lat, lng in sortedRatingList[-3:]:
    folium.CircleMarker(
        [lat, lng],
        radius=2,
        color="#faed27",
        tooltip="{} ({} rating)".format(name, rating),
    ).add_to(colored_station_map)
       
colored_station_map

If the map does not render, see the following image.
<img src="./map_1.jpg">

## That concludes my capstone project. Thank you so much for your time :)
### Wishing you a nice day. Keep safe and stay healthy!