# Food Battle SG!

Singapore (SG) is an island city-state in South East Asia, just south of Johor Bahru, Malaysia. Although the country has a land area of a mere 721.5 sq km, it is well known for quite a few things: being one of the financial hubs of Asia, having some of the best universities in Asia and being called the <a href = https://edition.cnn.com/specials/asia/future-cities-singapore>city of the future</a>. <br><br>
While all of this sounds impressive, the city-state is better known to tourists as the <i>Food Heaven</i>. Singapore has delicious food from across South East Asia and the rest of the world. Food is arguably one of the things that keeps its multi-racial society cohesive: it is a common interest, and people come to appreciate different races and cultures by sharing food. Food, some may say, is what defines Singapore. <br><br>
In this data story, we will explore the food in different towns of Singapore. Using Foursquare as a data source, we shall look at how the towns differ from one another when it comes to food establishments. Through this story, we shall also test a common perception that there are <a href = https://www.quora.com/What-are-the-major-differences-between-West-Side-and-East-Side-Singaporeans>differences between the East and the West sides of Singapore </a>. Is it true that the East has better food options than the West? We shall find out!

In [1]:
# import libraries
import pandas as pd
import numpy as np

import requests # library to handle HTTP requests
from bs4 import BeautifulSoup

import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

from pandas import json_normalize # transform JSON into a pandas dataframe (top-level import since pandas 1.0)

# Matplotlib and associated plotting modules
import matplotlib.pyplot as plt # plots
import matplotlib.cm as cm # colormaps
import matplotlib.colors as colors # color conversion utilities

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

# 1. Import Location Data
Here, we simplify the locations by using the MRT stations as reference points. The MRT (Mass Rapid Transit) is the subway system in Singapore. The dataset used here lists 141 station entries, connecting every corner of the city-state. The MRT provides great convenience to the people, and it is arguably the most important transportation system in the city. Most housing and commercial properties are built around the stations. Hence, it is reasonable to assume that most food establishments are also built around the stations, where human traffic is the highest.

In [2]:
# All the data of the mrt stations are saved in a json file named mrt_stations.json
# data source: https://github.com/xkjyeah/singapore-postal-codes
with open('mrt_stations.json', "r") as f:
    data = json.load(f)

In [3]:
# Extract keys of the primary dictionary in json file
keys = list(data[0].keys())
print(keys)

['Possible Locations', 'Station', 'Station Name']


In [4]:
# Extract the required data for our analysis, i.e. Station Name, Station, Postal Code, Lat, Long
sg_mrt = []

for item in data:
    new_dict = {}
    new_dict[keys[2].upper()] = item[keys[2]]
    new_dict[keys[1].upper()] = item[keys[1]]
    Address = item[keys[0]][0]
    new_dict['LATITUDE'] = Address['LATITUDE']
    new_dict['LONGITUDE'] = Address['LONGITUDE']
    new_dict['POSTAL'] = Address['POSTAL']
    sg_mrt.append(new_dict)

print(sg_mrt[:2])
print(len(sg_mrt))

[{'STATION NAME': 'Jurong East', 'STATION': 'NS1', 'LATITUDE': '1.33315261987297', 'LONGITUDE': '103.742286544006', 'POSTAL': '609690'}, {'STATION NAME': 'Bukit Batok', 'STATION': 'NS2', 'LATITUDE': '1.34903410858812', 'LONGITUDE': '103.749566764129', 'POSTAL': '659958'}]
141
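The extraction above relies on every record carrying a `Possible Locations` list whose first entry holds the coordinates. The sketch below repeats the idea on a single hand-written record. The record mirrors the printed Jurong East output, but the layout of `mrt_stations.json` beyond the three printed keys is an assumption, and `flatten_station` is a hypothetical helper name.

```python
import json

# Hypothetical one-record sample shaped like the printed output above;
# the real file may carry extra fields per location.
sample_json = """
[
  {
    "Station Name": "Jurong East",
    "Station": "NS1",
    "Possible Locations": [
      {"LATITUDE": "1.33315261987297",
       "LONGITUDE": "103.742286544006",
       "POSTAL": "609690"}
    ]
  }
]
"""

records = json.loads(sample_json)


def flatten_station(item):
    """Return one flat dict per station, using the first possible location."""
    first_location = item["Possible Locations"][0]
    return {
        "STATION NAME": item["Station Name"],
        "STATION": item["Station"],
        # Coordinates are stored as strings, so convert them here.
        "LATITUDE": float(first_location["LATITUDE"]),
        "LONGITUDE": float(first_location["LONGITUDE"]),
        "POSTAL": first_location["POSTAL"],
    }


flat = [flatten_station(item) for item in records]
print(flat[0]["STATION NAME"], flat[0]["LATITUDE"], flat[0]["LONGITUDE"])
```

Accessing the keys by name rather than by position in `keys` keeps the extraction working even if the key order in the file changes.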


In [5]:
# Convert required data into a dataframe for easy analysis
sg_df = pd.DataFrame(sg_mrt)
sg_df = sg_df[['STATION NAME','STATION','POSTAL','LONGITUDE','LATITUDE']]
# data type conversion: string to float for LONGITUDE and LATITUDE
sg_df['LONGITUDE'] = sg_df['LONGITUDE'].astype(float)
sg_df['LATITUDE'] = sg_df['LATITUDE'].astype(float)
print(sg_df.shape)
sg_df.head()

(141, 5)


Unnamed: 0,STATION NAME,STATION,POSTAL,LONGITUDE,LATITUDE
0,Jurong East,NS1,609690,103.742287,1.333153
1,Bukit Batok,NS2,659958,103.749567,1.349034
2,Bukit Gombak,NS3,659083,103.751791,1.358612
3,Choa Chu Kang,NS4,689810,103.744371,1.385363
4,Yew Tee,NS5,689715,103.747405,1.397535


In [6]:
# Remove duplicated stations
# Some stations are interchanges - i.e. one station may appear on two different subway lines
# Duplicates do not give us any additional useful information, so keep only the first entry
sg_df = sg_df.drop_duplicates(subset=['STATION NAME'], keep='first')
print(sg_df.shape)

(119, 5)
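To make the effect of `keep='first'` concrete, here is a plain-Python sketch of the same idea on a toy list. The three rows are illustrative only, and `keep_first` is a hypothetical helper, not part of pandas.

```python
# Toy station list: an interchange appears once per line it serves.
# The second Jurong East row is included purely as an illustration.
stations = [
    {"STATION NAME": "Jurong East", "STATION": "NS1"},
    {"STATION NAME": "Bukit Batok", "STATION": "NS2"},
    {"STATION NAME": "Jurong East", "STATION": "EW24"},
]


def keep_first(rows, key):
    """Keep only the first row seen for each distinct value of `key`."""
    seen = set()
    unique = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(row)
    return unique


unique_stations = keep_first(stations, "STATION NAME")
print(len(stations), "->", len(unique_stations))
```

Only the station name matters for this analysis, so dropping the second station code loses nothing that the later steps use.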


# 2. Generate Singapore Map with the MRT Stations

In [7]:
# Get Longitude and Latitude of Central SG
address = 'Singapore, SG'

geolocator = Nominatim(user_agent="on_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Singapore are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Singapore are 1.2904753, 103.8520359.


In [8]:
# create map of Singapore using latitude and longitude values
map_singapore = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, label in zip(sg_df['LATITUDE'], sg_df['LONGITUDE'], sg_df['STATION NAME']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_singapore)

# Display Map
map_singapore

As mentioned, the MRT system is expansive and covers most residential and commercial areas in the city. Click on the bubble to see the station name.

# 3. Extract Data from Foursquare

In [30]:
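# Define credentials
CLIENT_ID = '' # Foursquare ID omitted for privacy
CLIENT_SECRET = '' # Foursquare Secret omitted for privacy
VERSION = '' # Foursquare API version date (YYYYMMDD), omitted here
LIMIT = 30 # maximum number of venues returned per station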

In [10]:
# Create function that queries the API to get required data
# Required data here are venue names, location, venue category
def getVenue(names, latitudes, longitudes, radius=1000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
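        # build the API request; requests URL-encodes the parameters for us
        url = 'https://api.foursquare.com/v2/venues/explore'
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': LIMIT,
        }

        # make the GET request and fail loudly on an HTTP error
        response = requests.get(url, params=params, timeout=30)
        response.raise_for_status()
        results = response.json()['response']['groups'][0]['items']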
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
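The `radius=1000` default asks for venues within roughly one kilometre of each station. As a rough sanity check, the sketch below computes the great-circle distance between a station and a venue using the haversine formula. The coordinates are the Jurong East station and one of its venues as they appear in this notebook's tables. Whether Foursquare measures the radius in exactly this way is an assumption, and `haversine_m` is a hypothetical helper.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two (lat, lng) points in degrees."""
    lat1, lng1, lat2, lng2 = map(radians, (lat1, lng1, lat2, lng2))
    dlat = lat2 - lat1
    dlng = lng2 - lng1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlng / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


station = (1.333153, 103.742287)  # Jurong East station
venue = (1.333668, 103.742818)    # a restaurant listed for Jurong East

distance = haversine_m(*station, *venue)
print(round(distance), "metres; within radius:", distance <= 1000)
```

A check like this can also be used to spot stations whose 1000 m circles overlap heavily, in which case the same venue may be counted for more than one town.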

In [11]:
# Execute function - expect this to take a little longer
# since we are querying for a total of 119 locations
sg_venues = getVenue(names=sg_df['STATION NAME'],
                     latitudes=sg_df['LATITUDE'],
                     longitudes=sg_df['LONGITUDE'])

Jurong East
Bukit Batok
Bukit Gombak
Choa Chu Kang
Yew Tee
Kranji
Marsiling
Woodlands
Admiralty
Sembawang
Yishun
Khatib
Yio Chu Kang
Ang Mo Kio
Bishan
Braddell
Toa Payoh
Novena
Newton
Orchard
Somerset
Dhoby Ghaut
City Hall
Raffles Place
Marina Bay
Marina South Pier
Pasir Ris
Tampines
Simei
Tanah Merah
Bedok
Kembangan
Eunos
Paya Lebar
Aljunied
Kallang
Lavender
Bugis
Tanjong Pagar
Outram Park
Tiong Bahru
Redhill
Queenstown
Commonwealth
Buona Vista
Dover
Clementi
Chinese Garden
Lakeside
Boon Lay
Pioneer
Joo Koon
Gul Circle
Tuas Crescent
Tuas West Road
Tuas Link
Expo
Changi Airport
HarbourFront
Chinatown
Clarke Quay
Little India
Farrer Park
Boon Keng
Potong Pasir
Woodleigh
Serangoon
Kovan
Hougang
Buangkok
Sengkang
Punggol
Bras Basah
Esplanade
Promenade
Nicoll Highway
Stadium
Mountbatten
Dakota
MacPherson
Tai Seng
Bartley
Lorong Chuan
Marymount
Caldecott
Botanic Gardens
Farrer Road
Holland Village
one-north
Kent Ridge
Haw Par Villa
Pasir Panjang
Labrador Park
Telok Blangah
Bayfront
Bukit Pa

In [12]:
# A brief look at the number of venues returned for each neighborhood (capped at LIMIT = 30)
sg_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Admiralty,30,30,30,30,30,30
Aljunied,30,30,30,30,30,30
Ang Mo Kio,30,30,30,30,30,30
Bartley,29,29,29,29,29,29
Bayfront,30,30,30,30,30,30
Beauty World,30,30,30,30,30,30
Bedok,30,30,30,30,30,30
Bedok North,30,30,30,30,30,30
Bedok Reservoir,30,30,30,30,30,30
Bencoolen,30,30,30,30,30,30


In [13]:
# Brief look at the different venue categories
sg_venues['Venue Category'].unique()

array(['Clothing Store', 'Japanese Restaurant', 'Skating Rink', 'Bakery',
       'Furniture / Home Store', 'Vegetarian / Vegan Restaurant',
       'Movie Theater', 'Multiplex', 'Chinese Restaurant',
       'Shopping Mall', 'Supermarket', 'Bookstore', 'Department Store',
       'Bubble Tea Shop', 'Indian Restaurant', 'Sushi Restaurant',
       'Hotpot Restaurant', 'Italian Restaurant', 'Trail',
       'Korean Restaurant', 'Burger Joint', 'Halal Restaurant', 'Pool',
       'Fast Food Restaurant', 'Coffee Shop', 'Food Court',
       'Malay Restaurant', 'Frozen Yogurt Shop', 'Sandwich Place',
       'Bowling Alley', 'Café', 'Train Station', 'Lingerie Store',
       'Grocery Store', 'Ice Cream Shop', 'Lake', 'Flea Market',
       'Steakhouse', 'Golf Course', 'Bar', 'Stadium', 'Asian Restaurant',
       'Lounge', 'Thai Restaurant', 'Park', 'Playground',
       'Portuguese Restaurant', 'Beer Garden', 'Cosmetics Shop',
       'Shoe Store', 'Dessert Shop', 'Pharmacy', 'Music Store',
       'Gym

In [14]:
# Here, we shall extract data of food establishments only
sg_food = sg_venues[sg_venues['Venue Category'].str.contains('Restaurant|Coffee|Food|Burger|Café|Hawker')]
sg_food.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
1,Jurong East,1.333153,103.742287,Tonkatsu by Ma Maison とんかつ マメゾン (Tonkatsu by M...,1.333668,103.742818,Japanese Restaurant
5,Jurong East,1.333153,103.742287,Green Dot,1.333641,103.742858,Vegetarian / Vegan Restaurant
8,Jurong East,1.333153,103.742287,Dian Xiao Er 店小二,1.333447,103.743094,Chinese Restaurant
10,Jurong East,1.333153,103.742287,Paradise Dynasty 樂天皇朝,1.334364,103.743612,Chinese Restaurant
12,Jurong East,1.333153,103.742287,"Tsukada Nojo 塚田農場 Japanese ""Bijin Nabe"" Restau...",1.333718,103.742498,Japanese Restaurant
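The filter above keeps any category whose name contains one of the listed keywords. The sketch below applies the same pattern with Python's `re` module to a handful of category names that appear in this notebook, to show what is kept and what slips through. `str.contains` performs a regular-expression search by default, so `re.search` is a fair stand-in here.

```python
import re

# Same keyword pattern as the notebook's str.contains call.
FOOD_PATTERN = re.compile(r"Restaurant|Coffee|Food|Burger|Café|Hawker")

# Category names taken from the outputs shown in this notebook.
categories = [
    "Japanese Restaurant",
    "Skating Rink",
    "Bakery",
    "Food Court",
    "Café",
    "Bubble Tea Shop",
    "Health Food Store",
]

matches = {name: bool(FOOD_PATTERN.search(name)) for name in categories}

for name, kept in matches.items():
    print(f"{name:<22} {'kept' if kept else 'dropped'}")
```

Two things stand out: "Health Food Store" matches because of the keyword "Food", and categories such as "Bakery" or "Bubble Tea Shop" are dropped because none of the keywords appear in their names.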


In [15]:
# one hot encoding (dtype=int keeps the indicators as 0/1)
sg_onehot = pd.get_dummies(sg_food[['Venue Category']], prefix="", prefix_sep="", dtype=int)

# add neighborhood column back to dataframe
sg_onehot['Neighborhood'] = sg_food['Neighborhood']

# move neighborhood column to the first column
fixed_columns = ['Neighborhood'] + [col for col in sg_onehot.columns if col != 'Neighborhood']
sg_onehot = sg_onehot[fixed_columns]

sg_onehot.head()

Unnamed: 0,Neighborhood,American Restaurant,Asian Restaurant,Australian Restaurant,Burger Joint,Café,Cantonese Restaurant,Chinese Aristocrat Restaurant,Chinese Restaurant,Coffee Shop,...,Spanish Restaurant,Street Food Gathering,Sushi Restaurant,Swiss Restaurant,Szechuan Restaurant,Tapas Restaurant,Thai Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant
1,Jurong East,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5,Jurong East,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,1,0
8,Jurong East,0,0,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
10,Jurong East,0,0,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
12,Jurong East,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [16]:
# Check that we only have what we need
sg_onehot.columns

Index(['Neighborhood', 'American Restaurant', 'Asian Restaurant',
       'Australian Restaurant', 'Burger Joint', 'Café', 'Cantonese Restaurant',
       'Chinese Aristocrat Restaurant', 'Chinese Restaurant', 'Coffee Shop',
       'Comfort Food Restaurant', 'Dim Sum Restaurant', 'Dumpling Restaurant',
       'English Restaurant', 'Fast Food Restaurant', 'Filipino Restaurant',
       'Food', 'Food & Drink Shop', 'Food Court', 'Food Stand', 'Food Truck',
       'French Restaurant', 'Fujian Restaurant', 'German Restaurant',
       'Greek Restaurant', 'Hainan Restaurant', 'Hakka Restaurant',
       'Halal Restaurant', 'Health Food Store', 'Hong Kong Restaurant',
       'Hotpot Restaurant', 'Indian Restaurant', 'Indonesian Restaurant',
       'Italian Restaurant', 'Japanese Curry Restaurant',
       'Japanese Restaurant', 'Kebab Restaurant', 'Korean Restaurant',
       'Malay Restaurant', 'Mediterranean Restaurant', 'Mexican Restaurant',
       'Middle Eastern Restaurant', 'Modern European R

In [17]:
# From above, we find that there is one venue category that does not fit well with our story
# Health Food Store seems more like a health supplement store than a food establishment
# So we shall remove this from our analysis
sg_grouped = sg_onehot.groupby('Neighborhood').mean().reset_index()
sg_grouped = sg_grouped.drop(['Health Food Store'], axis = 1)
sg_grouped.head()

Unnamed: 0,Neighborhood,American Restaurant,Asian Restaurant,Australian Restaurant,Burger Joint,Café,Cantonese Restaurant,Chinese Aristocrat Restaurant,Chinese Restaurant,Coffee Shop,...,Spanish Restaurant,Street Food Gathering,Sushi Restaurant,Swiss Restaurant,Szechuan Restaurant,Tapas Restaurant,Thai Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant
0,Admiralty,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.166667,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Aljunied,0.0,0.05,0.0,0.0,0.1,0.05,0.0,0.25,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.15,0.0
2,Ang Mo Kio,0.0,0.090909,0.0,0.090909,0.090909,0.0,0.0,0.090909,0.181818,...,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Bartley,0.0,0.0625,0.0,0.0,0.125,0.0,0.0625,0.0625,0.0625,...,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.125,0.0
4,Bayfront,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
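Taking the mean of one-hot columns within a town gives the share of that town's food venues in each category, which is why every row of the grouped table adds up to 1 before any column is dropped. A minimal plain-Python sketch of that arithmetic, using a made-up town with four food venues:

```python
# Hypothetical categories of the food venues found around one town.
town_categories = ["Food Court", "Food Court", "Coffee Shop", "Café"]

# One-hot encode: one column per category, one row per venue.
columns = sorted(set(town_categories))
one_hot = [
    [1 if category == column else 0 for column in columns]
    for category in town_categories
]

# Column means = fraction of the town's venues in each category.
column_means = [sum(values) / len(one_hot) for values in zip(*one_hot)]
shares = dict(zip(columns, column_means))

for category, share in shares.items():
    print(f"{category:<12} {share:.2f}")
```

Because the values are shares rather than counts, a town with 6 food venues and a town with 30 are compared on the same scale.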


In [18]:
# Generate a brief report of the top 5 venues in each town and their respective frequencies (max = 1)
num_top_venues = 5

for hood in sg_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = sg_grouped[sg_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Admiralty----
                  venue  freq
0            Food Court  0.33
1  Fast Food Restaurant  0.25
2           Coffee Shop  0.17
3                  Café  0.08
4     Food & Drink Shop  0.08


----Aljunied----
                           venue  freq
0             Chinese Restaurant  0.25
1  Vegetarian / Vegan Restaurant  0.15
2             Dim Sum Restaurant  0.10
3                     Food Court  0.10
4                           Café  0.10


----Ang Mo Kio----
                 venue  freq
0  Japanese Restaurant  0.18
1          Coffee Shop  0.18
2         Burger Joint  0.09
3                 Café  0.09
4     Asian Restaurant  0.09


----Bartley----
                           venue  freq
0                     Food Court  0.31
1  Vegetarian / Vegan Restaurant  0.12
2                           Café  0.12
3              Korean Restaurant  0.06
4             Chinese Restaurant  0.06


----Bayfront----
                 venue  freq
0  Japanese Restaurant  0.25
1   Italian Restaurant  0

                           venue  freq
0            Japanese Restaurant   0.4
1               Asian Restaurant   0.1
2  Vegetarian / Vegan Restaurant   0.1
3                Udon Restaurant   0.1
4              Hotpot Restaurant   0.1


----Geylang Bahru----
                venue  freq
0          Food Court  0.35
1  Chinese Restaurant  0.20
2    Asian Restaurant  0.10
3                Café  0.10
4   Indian Restaurant  0.05


----Gul Circle----
              venue  freq
0       Coffee Shop  0.38
1        Food Court  0.25
2  Asian Restaurant  0.12
3              Food  0.12
4        Restaurant  0.12


----HarbourFront----
                venue  freq
0  Chinese Restaurant   0.2
1         Coffee Shop   0.2
2    Swiss Restaurant   0.1
3   German Restaurant   0.1
4    Sushi Restaurant   0.1


----Haw Par Villa----
                   venue  freq
0       Asian Restaurant  0.23
1             Food Court  0.15
2            Coffee Shop  0.15
3  Vietnamese Restaurant  0.08
4     Chinese Restaurant  0

                  venue  freq
0    Chinese Restaurant  0.28
1            Food Court  0.17
2  Fast Food Restaurant  0.17
3      Asian Restaurant  0.11
4            Restaurant  0.06


----Promenade----
                 venue  freq
0  Japanese Restaurant  0.50
1                 Café  0.17
2    German Restaurant  0.17
3           Restaurant  0.17
4  American Restaurant  0.00


----Punggol----
                  venue  freq
0  Fast Food Restaurant  0.19
1            Food Court  0.12
2    Chinese Restaurant  0.12
3           Coffee Shop  0.06
4      Asian Restaurant  0.06


----Queenstown----
                venue  freq
0          Food Court  0.18
1                Café  0.12
2    Malay Restaurant  0.12
3  Chinese Restaurant  0.12
4         Coffee Shop  0.12


----Raffles Place----
                 venue  freq
0  Japanese Restaurant  0.14
1          Coffee Shop  0.14
2   Italian Restaurant  0.07
3     Asian Restaurant  0.07
4    Hotpot Restaurant  0.07


----Redhill----
                venue  

In [19]:
# Create function to return most common venue for each town, based on a defined number num_top_venues
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [20]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # beyond the 3rd position, fall back to the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = sg_grouped['Neighborhood']

for ind in np.arange(sg_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(sg_grouped.iloc[ind, :], num_top_venues)

# A deeper look into the top 10 venues in each town
neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Admiralty,Food Court,Fast Food Restaurant,Coffee Shop,Food & Drink Shop,Café,Malay Restaurant,Vietnamese Restaurant,Halal Restaurant,Hakka Restaurant,Hainan Restaurant
1,Aljunied,Chinese Restaurant,Vegetarian / Vegan Restaurant,Café,Food Court,Seafood Restaurant,Dim Sum Restaurant,Korean Restaurant,Asian Restaurant,Food Truck,Cantonese Restaurant
2,Ang Mo Kio,Japanese Restaurant,Coffee Shop,Halal Restaurant,Sushi Restaurant,Modern European Restaurant,Chinese Restaurant,Café,Burger Joint,Asian Restaurant,Hainan Restaurant
3,Bartley,Food Court,Vegetarian / Vegan Restaurant,Café,Korean Restaurant,Coffee Shop,Chinese Aristocrat Restaurant,Chinese Restaurant,Szechuan Restaurant,Halal Restaurant,Asian Restaurant
4,Bayfront,Italian Restaurant,Japanese Restaurant,Dim Sum Restaurant,Dumpling Restaurant,Vietnamese Restaurant,Food & Drink Shop,Halal Restaurant,Hakka Restaurant,Hainan Restaurant,Greek Restaurant
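The ranking step boils down to sorting each town's category shares in descending order and labelling the positions. Here is a small plain-Python sketch of that idea, using the Admiralty shares from the report above. `ordinal` and `top_categories` are hypothetical helpers; unlike the suffix list in the notebook, `ordinal` also handles positions such as 21st, which only matters if `num_top_venues` goes beyond 20.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def top_categories(shares, n):
    """Return the n category names with the largest shares, largest first."""
    ranked = sorted(shares.items(), key=lambda pair: pair[1], reverse=True)
    return [name for name, _ in ranked[:n]]


# Shares for Admiralty as printed in the top-5 report above.
admiralty = {
    "Food Court": 0.33,
    "Fast Food Restaurant": 0.25,
    "Coffee Shop": 0.17,
    "Café": 0.08,
}

for position, name in enumerate(top_categories(admiralty, 3), start=1):
    print(f"{ordinal(position)} Most Common Venue: {name}")
```

When several categories share the same value, which is common for the many categories with a share of 0, their relative order depends on the sorting routine, so the lower-ranked columns should be read with care.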


# 4. Create Clusters based on Food Establishment
Now that we have identified the top 10 venues of each town, we can cluster the towns based on how similar their food establishments are. Doing so may give us some insight into which towns resemble one another.

In [21]:
# set number of clusters = 3

kclusters = 3

sg_grouped_clustering = sg_grouped.drop(columns='Neighborhood')

# run k-means clustering (n_init set explicitly to avoid version-dependent defaults)
kmeans = KMeans(n_clusters=kclusters, random_state=0, n_init=10).fit(sg_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 2, 1, 0, 1, 2, 2, 0, 2, 1])
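For readers unfamiliar with k-means, the sketch below runs the basic assign-then-update loop (Lloyd's algorithm) on six made-up towns described by three category shares. It is a simplified stand-in, not scikit-learn's implementation: the data, the fixed starting centroids and the `simple_kmeans` helper are all assumptions made for illustration.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def simple_kmeans(points, centroids, iterations=10):
    """Basic Lloyd's algorithm: assign points to the nearest centroid, then re-average."""
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest centroid for each point.
        labels = [
            min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its assigned points.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centroids.append([sum(dim) / len(members) for dim in zip(*members)])
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid as-is
        centroids = new_centroids
    return labels, centroids


# Hypothetical towns: shares of [Food Court, Japanese Restaurant, Chinese Restaurant].
towns = [
    [0.7, 0.1, 0.2],
    [0.6, 0.0, 0.4],
    [0.1, 0.8, 0.1],
    [0.0, 0.7, 0.3],
    [0.2, 0.1, 0.7],
    [0.1, 0.2, 0.7],
]

# Start from three of the towns so the run is deterministic.
labels, centroids = simple_kmeans(towns, [towns[0], towns[2], towns[4]])
print(labels)
```

Towns with a similar mix of food establishments end up with the same label. The label numbers themselves carry no meaning beyond identifying the group.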

In [22]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

sg_merged = sg_df

# merge neighborhoods_venues_sorted with sg_df to add latitude/longitude for each neighborhood
sg_merged = sg_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='STATION NAME')

sg_merged.head() # check the last columns!

Unnamed: 0,STATION NAME,STATION,POSTAL,LONGITUDE,LATITUDE,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Jurong East,NS1,609690,103.742287,1.333153,2,Chinese Restaurant,Japanese Restaurant,Italian Restaurant,Halal Restaurant,Korean Restaurant,Sushi Restaurant,Indian Restaurant,Burger Joint,Vegetarian / Vegan Restaurant,Hotpot Restaurant
1,Bukit Batok,NS2,659958,103.749567,1.349034,0,Food Court,Coffee Shop,Chinese Restaurant,Malay Restaurant,Fast Food Restaurant,Italian Restaurant,Café,Food Truck,French Restaurant,Fujian Restaurant
2,Bukit Gombak,NS3,659083,103.751791,1.358612,0,Food Court,Coffee Shop,Chinese Restaurant,Fast Food Restaurant,Indian Restaurant,Italian Restaurant,Malay Restaurant,Vegetarian / Vegan Restaurant,Asian Restaurant,Chinese Aristocrat Restaurant
3,Choa Chu Kang,NS4,689810,103.744371,1.385363,0,Fast Food Restaurant,Coffee Shop,Asian Restaurant,Thai Restaurant,Food Court,Portuguese Restaurant,Food & Drink Shop,Food Stand,Food Truck,French Restaurant
4,Yew Tee,NS5,689715,103.747405,1.397535,0,Fast Food Restaurant,Food Court,Café,Japanese Restaurant,Restaurant,Asian Restaurant,Chinese Restaurant,Hakka Restaurant,Hainan Restaurant,Halal Restaurant


In [23]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(sg_merged['LATITUDE'], sg_merged['LONGITUDE'], sg_merged['STATION NAME'], sg_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

Looking at the map, we can see a clear distinction between East, West, North and Central Singapore.<br><br>

East Singapore is heavily occupied by cluster 2, whereas West Singapore is predominantly cluster 0. Central and Downtown (Southern) Singapore is mainly made up of cluster 1. Northern Singapore is much more similar to West Singapore, with mainly cluster 0 towns. <br><br>

Let's take a deeper look at the clusters to see how they differ.
    

### Cluster 0

In [27]:
sg_merged.loc[sg_merged['Cluster Labels'] == 0, sg_merged.columns[[0] + list(range(5, 11))]]

Unnamed: 0,STATION NAME,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
1,Bukit Batok,0,Food Court,Coffee Shop,Chinese Restaurant,Malay Restaurant,Fast Food Restaurant
2,Bukit Gombak,0,Food Court,Coffee Shop,Chinese Restaurant,Fast Food Restaurant,Indian Restaurant
3,Choa Chu Kang,0,Fast Food Restaurant,Coffee Shop,Asian Restaurant,Thai Restaurant,Food Court
4,Yew Tee,0,Fast Food Restaurant,Food Court,Café,Japanese Restaurant,Restaurant
6,Marsiling,0,Food Court,Coffee Shop,Café,Fast Food Restaurant,Asian Restaurant
8,Admiralty,0,Food Court,Fast Food Restaurant,Coffee Shop,Food & Drink Shop,Café
11,Khatib,0,Coffee Shop,Food Court,Fast Food Restaurant,Seafood Restaurant,American Restaurant
12,Yio Chu Kang,0,Food Court,Chinese Restaurant,Coffee Shop,Food,Fast Food Restaurant
26,Pasir Ris,0,Coffee Shop,Food Court,Fast Food Restaurant,Café,American Restaurant
28,Simei,0,Coffee Shop,Fast Food Restaurant,Chinese Restaurant,Seafood Restaurant,Japanese Restaurant


From the table above, we see that cluster 0 towns have more coffee shops, food courts and fast food restaurants than cuisine-specific restaurants.<br>

Background: Coffee shops and food courts (sometimes known as <a href = https://en.wikipedia.org/wiki/Hawker_centre>Hawker centres</a>) are centralized food establishments. One can find many kinds of food in food courts, such as Chinese, Malay, Indian, Western and local delights. The food sold in food courts is relatively inexpensive, yet it tastes delicious. Because they are so affordable, most Singaporeans patronise food courts for a hearty meal!

### Cluster 1

In [28]:
sg_merged.loc[sg_merged['Cluster Labels'] == 1, sg_merged.columns[[0] + list(range(5, 11))]]

Unnamed: 0,STATION NAME,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
5,Kranji,1,Asian Restaurant,Café,Vietnamese Restaurant,Food,Hong Kong Restaurant
7,Woodlands,1,Café,Food Court,Chinese Restaurant,Japanese Restaurant,Indian Restaurant
13,Ang Mo Kio,1,Japanese Restaurant,Coffee Shop,Halal Restaurant,Sushi Restaurant,Modern European Restaurant
14,Bishan,1,Japanese Restaurant,Coffee Shop,Chinese Restaurant,Thai Restaurant,Food Court
17,Novena,1,Coffee Shop,Vegetarian / Vegan Restaurant,Italian Restaurant,Café,Chinese Restaurant
18,Newton,1,Japanese Restaurant,Chinese Restaurant,Spanish Restaurant,Seafood Restaurant,Thai Restaurant
19,Orchard,1,Sushi Restaurant,Chinese Restaurant,Asian Restaurant,Indonesian Restaurant,Restaurant
20,Somerset,1,Japanese Restaurant,Ramen Restaurant,Chinese Restaurant,Korean Restaurant,Dumpling Restaurant
21,Dhoby Ghaut,1,Café,Hotpot Restaurant,Korean Restaurant,Modern European Restaurant,Indonesian Restaurant
22,City Hall,1,French Restaurant,Indian Restaurant,Chinese Restaurant,Japanese Restaurant,Dumpling Restaurant


Cluster 1 towns have restaurants that are generally more exotic, such as Japanese, European and Korean restaurants, and even cafés. Judging by the types of food establishment, the food sold in these towns is likely to be more expensive. <br><br>
Except for Kranji, the towns marked as cluster 1 are relatively central. Tampines and Woodlands are also exceptions, but they are commercial hubs where many office buildings are located. <br><br>
Most cluster 1 towns are in the downtown area, where the Central Business District and the tourist hotspots are. Hence, it is expected that food variety is greater and prices are higher in these areas. 

### Cluster 2

In [29]:
sg_merged.loc[sg_merged['Cluster Labels'] == 2, sg_merged.columns[[0] + list(range(5, 11))]]

Unnamed: 0,STATION NAME,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Jurong East,2,Chinese Restaurant,Japanese Restaurant,Italian Restaurant,Halal Restaurant,Korean Restaurant
9,Sembawang,2,Chinese Restaurant,Coffee Shop,Asian Restaurant,Fast Food Restaurant,Café
10,Yishun,2,Indian Restaurant,Chinese Restaurant,Thai Restaurant,Coffee Shop,Vegetarian / Vegan Restaurant
15,Braddell,2,Chinese Restaurant,Asian Restaurant,Coffee Shop,Japanese Restaurant,Seafood Restaurant
16,Toa Payoh,2,Chinese Restaurant,Coffee Shop,Indian Restaurant,Asian Restaurant,Hong Kong Restaurant
29,Tanah Merah,2,Food Court,Café,Chinese Restaurant,Coffee Shop,Indian Restaurant
30,Bedok,2,Chinese Restaurant,Coffee Shop,Food Court,Indian Restaurant,Vegetarian / Vegan Restaurant
31,Kembangan,2,Indian Restaurant,Chinese Restaurant,Food Court,Coffee Shop,Malay Restaurant
32,Eunos,2,Chinese Restaurant,Food Court,Asian Restaurant,Vegetarian / Vegan Restaurant,Restaurant
33,Paya Lebar,2,Chinese Restaurant,Asian Restaurant,Vietnamese Restaurant,Vegetarian / Vegan Restaurant,Food Court


Cluster 2 towns are similar to cluster 1 towns. The main difference is that their food establishments are more local: Chinese, Indian, Malay and other Asian restaurants are more commonly found.<br><br>

# 5. Conclusion

In conclusion, East and West are indeed different when it comes to food. Generally, food courts and fast food restaurants are more common in the West, whereas more local restaurants can be found in the East.<br><br>
Possible reasons are:
1. Historically, the East has been more developed;
2. The East is where the civilian airport is located, so there is likely to be more tourism in these areas.

Of course, food is merely one of many dimensions that could be used to determine how similar the two areas are. But for now, we shall conclude that there is indeed a difference between them.<br><br>
I hope this data story gives you a glimpse of the food places in Singapore. Perhaps you should visit this city-state and explore its culture through its food! If you ever come to this country for a tour, I'm more than happy to bring you around. :)