# Description of the Problem and Discussion of the Background (Week 1 & 2)

# 1. Introduction/description (Week 1)

Prospects of opening a restaurant in different regions of Singapore:

- Singapore is a place that I am living in. It is known for its wonderful city skyline, clean and tidy streets and mouth-watering food such as Chilli Crab, Hainanese Chicken Rice, Curry Fish Head, Laksa, etc. It is also home to several michelin star resturants (Candlenut, CUT by Wolfgang Puck, Liao Fan Hong Kong Soya Sauce Chicken Rice & Noodle, to name a few). If you are interested you can refer to https://www.visitsingapore.com/editorials/michelin-star-restaurants-singapore/ for more details. Furthermore, hawker centres are common in Singapore, it is the go-to place for Singaporeans to get affordable and delicious food. 

- Given the amount of food choices that people in Singapore have, it is extremely competitive to set up a restaurants in Singapore. Therefore, people who wants to set up a restaurants in Singapore needs to be strategic. This means that they have to can get some important insights into the different places that they could possibly set up their business. I will be exploring the major regions in Singapore (East, West, North, South, Central). In particular, I will be looking at the different amenities and businesses that are within the each region to determine the level of competition in each region. Afterwhich, together with the population data of the different regions, I will make some recommendation on the best possible location to set up a food restaurant.

Target audience:
- People who are looking to set up a restaurant in Singapore and want to set up in a place where it is less competitive but yet have access to a large pool of potential customer 


# 2. Data preparation (Week 1)

2.1. Get the names of the different regions in Singapore, regional centres, population density and population from Wikipedia

2.2. Processing the information from Wikipedia to make necessary the necessary dataframes 

2.3. Clean the data to remove missing data if necessary

2.4. Get the coordinates of the regional centres (the centre of the different regions in Singapore)

2.5. Check and compare with the coordinate data obtained from Google Search and refine if necessary


# 3. Exploratory Data Analysis. (Week 2)

# 3.1. Import the necessary packages

In [65]:
import numpy as np

from bs4 import BeautifulSoup
import requests
import pandas as pd

!conda install -c conda-forge folium=0.5.0 --yes
import folium
!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim 

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.



# 3.2. Import the data from wikipedia

In [5]:
response_obj = requests.get('https://en.wikipedia.org/wiki/Regions_of_Singapore').text
soup = BeautifulSoup(response_obj, 'lxml')
districts_tapei = soup.find('table', {'class':'wikitable'})
print(districts_tapei)


<table class="wikitable sortable">
<tbody><tr bgcolor="#CCCCCC">
<th>Region<sup class="reference" id="cite_ref-citypopulation_3-3"><a href="#cite_note-citypopulation-3">[3]</a></sup></th>
<th><a href="/wiki/Regional_centre_(Singapore)" title="Regional centre (Singapore)">Regional Centre</a></th>
<th>Largest PA by area</th>
<th>Largest PA by population</th>
<th>Area<br/>(km²)</th>
<th>Estimated
<p>Population<sup class="reference" id="cite_ref-4"><a href="#cite_note-4">[4]</a></sup>
</p>
</th>
<th>Population<br/>density<br/>(/km²)</th>
<th>Planning<br/>Areas
</th></tr>
<tr>
<td><a href="/wiki/Central_Region,_Singapore" title="Central Region, Singapore">Central Region</a></td>
<td><a href="/wiki/Central_Area,_Singapore" title="Central Area, Singapore">Central Area</a> (de facto)</td>
<td><a href="/wiki/Queenstown,_Singapore" title="Queenstown, Singapore">Queenstown</a></td>
<td><a href="/wiki/Bukit_Merah" title="Bukit Merah">Bukit Merah</a></td>
<td align="right">132.7</td>
<td align="rig

In [6]:
wiki = 'https://en.wikipedia.org/wiki/Regions_of_Singapore'
wikipedia_page = requests.get(wiki)

df_raw = pd.read_html(wikipedia_page.content, header=0)[1]

df_raw.head()

Unnamed: 0,Region[3],Regional Centre,Largest PA by area,Largest PA by population,Area(km²),Estimated Population[4],Populationdensity(/km²),PlanningAreas
0,Central Region,Central Area (de facto),Queenstown,Bukit Merah,132.7,922980,6955.0,22
1,East Region,Tampines,Changi,Bedok,93.1,686050,7369.0,6
2,North Region,Woodlands,Central Water Catchment,Woodlands,134.5,573950,4267.0,8
3,North-East Region,Seletar,North-Eastern Islands,Sengkang,103.9,921940,8873.0,9
4,West Region,Jurong East,Western Water Catchment,Jurong West,201.3,921340,4577.0,12


# 3.3. Change 'Central Area (de facto)' to 'Raffles Place' (The central business district in singapore)

In [8]:
df = df_raw.replace(to_replace = "Central Area (de facto)", value = "Raffles Place")
df1 = df.drop(df.index[5])

# 3.4. Get coordinates of 'Regional Centre'

In [9]:
geolocator = Nominatim(user_agent="singapore_explorer")
df1['RC_coordinate'] = df['Regional Centre'].apply(geolocator.geocode).apply(lambda x: (x.latitude,x.longitude))
df1

Unnamed: 0,Region[3],Regional Centre,Largest PA by area,Largest PA by population,Area(km²),Estimated Population[4],Populationdensity(/km²),PlanningAreas,RC_coordinate
0,Central Region,Raffles Place,Queenstown,Bukit Merah,132.7,922980,6955.0,22,"(1.2835416999999998, 103.85146023266938)"
1,East Region,Tampines,Changi,Bedok,93.1,686050,7369.0,6,"(1.3546528, 103.9435712)"
2,North Region,Woodlands,Central Water Catchment,Woodlands,134.5,573950,4267.0,8,"(30.1734194, -95.504686)"
3,North-East Region,Seletar,North-Eastern Islands,Sengkang,103.9,921940,8873.0,9,"(1.4098488, 103.8773789)"
4,West Region,Jurong East,Western Water Catchment,Jurong West,201.3,921340,4577.0,12,"(1.333115, 103.7422968)"


In [10]:
df1[['Latitude', 'Longitude']] = df1['RC_coordinate'].apply(pd.Series)
df1

Unnamed: 0,Region[3],Regional Centre,Largest PA by area,Largest PA by population,Area(km²),Estimated Population[4],Populationdensity(/km²),PlanningAreas,RC_coordinate,Latitude,Longitude
0,Central Region,Raffles Place,Queenstown,Bukit Merah,132.7,922980,6955.0,22,"(1.2835416999999998, 103.85146023266938)",1.283542,103.85146
1,East Region,Tampines,Changi,Bedok,93.1,686050,7369.0,6,"(1.3546528, 103.9435712)",1.354653,103.943571
2,North Region,Woodlands,Central Water Catchment,Woodlands,134.5,573950,4267.0,8,"(30.1734194, -95.504686)",30.173419,-95.504686
3,North-East Region,Seletar,North-Eastern Islands,Sengkang,103.9,921940,8873.0,9,"(1.4098488, 103.8773789)",1.409849,103.877379
4,West Region,Jurong East,Western Water Catchment,Jurong West,201.3,921340,4577.0,12,"(1.333115, 103.7422968)",1.333115,103.742297


In [12]:
df1.drop(['RC_coordinate'], axis=1, inplace=True)
df1

Unnamed: 0,Region[3],Regional Centre,Largest PA by area,Largest PA by population,Area(km²),Estimated Population[4],Populationdensity(/km²),PlanningAreas,Latitude,Longitude
0,Central Region,Raffles Place,Queenstown,Bukit Merah,132.7,922980,6955.0,22,1.283542,103.85146
1,East Region,Tampines,Changi,Bedok,93.1,686050,7369.0,6,1.354653,103.943571
2,North Region,Woodlands,Central Water Catchment,Woodlands,134.5,573950,4267.0,8,30.173419,-95.504686
3,North-East Region,Seletar,North-Eastern Islands,Sengkang,103.9,921940,8873.0,9,1.409849,103.877379
4,West Region,Jurong East,Western Water Catchment,Jurong West,201.3,921340,4577.0,12,1.333115,103.742297


# 3.5. Change Latitude and Longitude of the regional centre "Woodlands"

In [26]:
Lat_list = df1['Latitude'].tolist()
Long_list = df1['Longitude'].tolist()
print ("Old Latitude list: ", Lat_list)
print ("Old Longitude list: ", Long_list)
replace_latitudes = {30.1734194:1.435588}
replace_longitudes = {-95.504686:103.785220}

latitudes_new = [replace_latitudes.get(n3,n3) for n3 in Lat_list]
longtitudes_new = [replace_longitudes.get(n4,n4) for n4 in Long_list]
print (latitudes_new)
print (longtitudes_new)

sg_df = df1.drop(['Latitude', 'Longitude'], axis=1)

Old Latitude list:  [1.2835416999999998, 1.3546528, 30.1734194, 1.4098488, 1.333115]
Old Longitude list:  [103.85146023266938, 103.9435712, -95.504686, 103.8773789, 103.7422968]
[1.2835416999999998, 1.3546528, 1.435588, 1.4098488, 1.333115]
[103.85146023266938, 103.9435712, 103.78522, 103.8773789, 103.7422968]


In [28]:
sg_df['Dist_Latitude'] = latitudes_new
sg_df['Dist_Longitude'] = longtitudes_new

sg_df

Unnamed: 0,Region[3],Regional Centre,Largest PA by area,Largest PA by population,Area(km²),Estimated Population[4],Populationdensity(/km²),PlanningAreas,Dist_Latitude,Dist_Longitude
0,Central Region,Raffles Place,Queenstown,Bukit Merah,132.7,922980,6955.0,22,1.283542,103.85146
1,East Region,Tampines,Changi,Bedok,93.1,686050,7369.0,6,1.354653,103.943571
2,North Region,Woodlands,Central Water Catchment,Woodlands,134.5,573950,4267.0,8,1.435588,103.78522
3,North-East Region,Seletar,North-Eastern Islands,Sengkang,103.9,921940,8873.0,9,1.409849,103.877379
4,West Region,Jurong East,Western Water Catchment,Jurong West,201.3,921340,4577.0,12,1.333115,103.742297


# 3.6. Get Singapore Latitude and Longitude

In [29]:
address = 'Singapore'

geolocator = Nominatim(user_agent="singapore_explorer")
location = geolocator.geocode(address)
sg_latitude = location.latitude
sg_longitude = location.longitude
print('The geograpical coordinates of Singapore are {}, {}.'.format(sg_latitude, sg_longitude))

The geograpical coordinates of Singapore are 1.357107, 103.8194992.


In [31]:
# # create map of the 5 regional centres using latitude and longitude values
sg_5regional_centre = folium.Map(location=[sg_latitude, sg_longitude], zoom_start=12)

# add markers to map
for lat, lng, label in zip(sg_df['Dist_Latitude'], sg_df['Dist_Longitude'], 
                           sg_df['Regional Centre']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=9,
        popup=label,
        color='magenta',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(sg_5regional_centre)  

sg_5regional_centre

# Use of FourSquare API

In [32]:
CLIENT_ID = 'PPR4LTKHD424D3ZQGNRT455OE0BKAT43XZ4WX4KOSRJ03N4M' # your Foursquare ID
CLIENT_SECRET = 'G2ESTCFCL2KAIB2DRN1IMNGE5BNX2SAVRWCQEGMCRQZSUELW' # your Foursquare Secret
VERSION = '20200613' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: PPR4LTKHD424D3ZQGNRT455OE0BKAT43XZ4WX4KOSRJ03N4M
CLIENT_SECRET:G2ESTCFCL2KAIB2DRN1IMNGE5BNX2SAVRWCQEGMCRQZSUELW


In [33]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id=PPR4LTKHD424D3ZQGNRT455OE0BKAT43XZ4WX4KOSRJ03N4M&client_secret=G2ESTCFCL2KAIB2DRN1IMNGE5BNX2SAVRWCQEGMCRQZSUELW&v=20200613&ll=1.357107, 103.8194992&radius=500&limit=100'
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Regional Centre', 
                  'Dist_Latitude', 
                  'Dist_Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [35]:
sg_venues = getNearbyVenues(names=sg_df['Regional Centre'],
                                   latitudes=sg_df['Dist_Latitude'],
                                   longitudes=sg_df['Dist_Longitude']
                                  )

Raffles Place
Tampines
Woodlands
Seletar
Jurong East


In [37]:
print ("Shape of the Venues Dataframe: ", sg_venues.shape)
sg_venues.head()

Shape of the Venues Dataframe:  (60, 7)


Unnamed: 0,Regional Centre,Dist_Latitude,Dist_Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Raffles Place,1.283542,103.85146,SICC The Lookout,1.359105,103.818598,Café
1,Raffles Place,1.283542,103.85146,SICC Swimming Pool,1.357949,103.818607,Pool
2,Raffles Place,1.283542,103.85146,Island Bowl,1.358052,103.818661,Bowling Alley
3,Raffles Place,1.283542,103.85146,Silk Restaurant,1.359086,103.818634,Chinese Restaurant
4,Raffles Place,1.283542,103.85146,ISP Cafe @ SICC,1.357919,103.818601,Café


In [44]:
print (sg_venues['Venue Category'].value_counts())

Café                   10
Trail                  10
Sporting Goods Shop     5
Bowling Alley           5
Diner                   5
Bar                     5
Pool                    5
Golf Course             5
Chinese Restaurant      5
Gym                     5
Name: Venue Category, dtype: int64


# Analyze Each Neighborhood

In [46]:
sg_onehot = pd.get_dummies(sg_venues[['Venue Category']], prefix="", prefix_sep="")

sg_onehot['Regional Centre'] = sg_venues['Regional Centre']

col_name='Regional Centre'
first_col = sg_onehot.pop(col_name)

sg_onehot.insert(0, col_name, first_col)

sg_onehot.head()


Unnamed: 0,Regional Centre,Bar,Bowling Alley,Café,Chinese Restaurant,Diner,Golf Course,Gym,Pool,Sporting Goods Shop,Trail
0,Raffles Place,0,0,1,0,0,0,0,0,0,0
1,Raffles Place,0,0,0,0,0,0,0,1,0,0
2,Raffles Place,0,1,0,0,0,0,0,0,0,0
3,Raffles Place,0,0,0,1,0,0,0,0,0,0
4,Raffles Place,0,0,1,0,0,0,0,0,0,0


In [47]:
sg_grouped = sg_onehot.groupby('Regional Centre').mean().reset_index()
sg_grouped

Unnamed: 0,Regional Centre,Bar,Bowling Alley,Café,Chinese Restaurant,Diner,Golf Course,Gym,Pool,Sporting Goods Shop,Trail
0,Jurong East,0.083333,0.083333,0.166667,0.083333,0.083333,0.083333,0.083333,0.083333,0.083333,0.166667
1,Raffles Place,0.083333,0.083333,0.166667,0.083333,0.083333,0.083333,0.083333,0.083333,0.083333,0.166667
2,Seletar,0.083333,0.083333,0.166667,0.083333,0.083333,0.083333,0.083333,0.083333,0.083333,0.166667
3,Tampines,0.083333,0.083333,0.166667,0.083333,0.083333,0.083333,0.083333,0.083333,0.083333,0.166667
4,Woodlands,0.083333,0.083333,0.166667,0.083333,0.083333,0.083333,0.083333,0.083333,0.083333,0.166667


# Print each regional centre along with the top 5 most common venues

In [48]:
num_top_venues = 5

for places in sg_grouped['Regional Centre']:
    print("%%%%%%%%%"+places+"%%%%%%%%")
    temp = sg_grouped[sg_grouped['Regional Centre'] == places].T.reset_index()
    temp.columns = ['Venue','Freq']
    temp = temp.iloc[1:]
    temp['Freq'] = temp['Freq'].astype(float)
    temp = temp.round({'Freq': 2})
    print(temp.sort_values('Freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

%%%%%%%%%Jurong East%%%%%%%%
                Venue  Freq
0                Café  0.17
1               Trail  0.17
2                 Bar  0.08
3       Bowling Alley  0.08
4  Chinese Restaurant  0.08


%%%%%%%%%Raffles Place%%%%%%%%
                Venue  Freq
0                Café  0.17
1               Trail  0.17
2                 Bar  0.08
3       Bowling Alley  0.08
4  Chinese Restaurant  0.08


%%%%%%%%%Seletar%%%%%%%%
                Venue  Freq
0                Café  0.17
1               Trail  0.17
2                 Bar  0.08
3       Bowling Alley  0.08
4  Chinese Restaurant  0.08


%%%%%%%%%Tampines%%%%%%%%
                Venue  Freq
0                Café  0.17
1               Trail  0.17
2                 Bar  0.08
3       Bowling Alley  0.08
4  Chinese Restaurant  0.08


%%%%%%%%%Woodlands%%%%%%%%
                Venue  Freq
0                Café  0.17
1               Trail  0.17
2                 Bar  0.08
3       Bowling Alley  0.08
4  Chinese Restaurant  0.08




In [49]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [52]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Regional Centre']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
Regional_Centre_venues_sorted = pd.DataFrame(columns=columns)
Regional_Centre_venues_sorted['Regional Centre'] = sg_grouped['Regional Centre']

for ind in np.arange(sg_grouped.shape[0]):
    Regional_Centre_venues_sorted.iloc[ind, 1:] = return_most_common_venues(sg_grouped.iloc[ind, :], num_top_venues)

Regional_Centre_venues_sorted.head()

Unnamed: 0,Regional Centre,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Jurong East,Trail,Café,Sporting Goods Shop,Pool,Gym,Golf Course,Diner,Chinese Restaurant,Bowling Alley,Bar
1,Raffles Place,Trail,Café,Sporting Goods Shop,Pool,Gym,Golf Course,Diner,Chinese Restaurant,Bowling Alley,Bar
2,Seletar,Trail,Café,Sporting Goods Shop,Pool,Gym,Golf Course,Diner,Chinese Restaurant,Bowling Alley,Bar
3,Tampines,Trail,Café,Sporting Goods Shop,Pool,Gym,Golf Course,Diner,Chinese Restaurant,Bowling Alley,Bar
4,Woodlands,Trail,Café,Sporting Goods Shop,Pool,Gym,Golf Course,Diner,Chinese Restaurant,Bowling Alley,Bar


# 4. Clustering the 5 regional centres in Singapore

In [56]:
from sklearn.cluster import KMeans
# set number of clusters
kclusters = 2

sg_grouped_clustering = sg_grouped.drop('Regional Centre', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(sg_grouped_clustering)

# check cluster labels generated for each row in the dataframe
print ("Check the 5 Cluster labels :",  kmeans.labels_[0:5])

Check the 5 Cluster labels : [0 0 0 0 0]


  


In [57]:

Regional_Centre_venues_sorted.insert(0, 'Cluster Label', kmeans.labels_)

Regional_Centre_venues_sorted_cluster_merged = sg_df

#merge the inital Tokyo Dataframe with Sorted Most Visited places for each neighborhood

Regional_Centre_venues_sorted_cluster_merged = Regional_Centre_venues_sorted_cluster_merged.join \
                                        (Regional_Centre_venues_sorted.set_index('Regional Centre'), on='Regional Centre')

Regional_Centre_venues_sorted_cluster_merged.head()

Unnamed: 0,Region[3],Regional Centre,Largest PA by area,Largest PA by population,Area(km²),Estimated Population[4],Populationdensity(/km²),PlanningAreas,Dist_Latitude,Dist_Longitude,...,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Central Region,Raffles Place,Queenstown,Bukit Merah,132.7,922980,6955.0,22,1.283542,103.85146,...,Trail,Café,Sporting Goods Shop,Pool,Gym,Golf Course,Diner,Chinese Restaurant,Bowling Alley,Bar
1,East Region,Tampines,Changi,Bedok,93.1,686050,7369.0,6,1.354653,103.943571,...,Trail,Café,Sporting Goods Shop,Pool,Gym,Golf Course,Diner,Chinese Restaurant,Bowling Alley,Bar
2,North Region,Woodlands,Central Water Catchment,Woodlands,134.5,573950,4267.0,8,1.435588,103.78522,...,Trail,Café,Sporting Goods Shop,Pool,Gym,Golf Course,Diner,Chinese Restaurant,Bowling Alley,Bar
3,North-East Region,Seletar,North-Eastern Islands,Sengkang,103.9,921940,8873.0,9,1.409849,103.877379,...,Trail,Café,Sporting Goods Shop,Pool,Gym,Golf Course,Diner,Chinese Restaurant,Bowling Alley,Bar
4,West Region,Jurong East,Western Water Catchment,Jurong West,201.3,921340,4577.0,12,1.333115,103.742297,...,Trail,Café,Sporting Goods Shop,Pool,Gym,Golf Course,Diner,Chinese Restaurant,Bowling Alley,Bar


In [61]:
import matplotlib.cm as cm
import matplotlib.colors as colors

# create map
map_clusters = folium.Map(location=[sg_latitude, sg_longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(Regional_Centre_venues_sorted_cluster_merged['Dist_Latitude'], Regional_Centre_venues_sorted_cluster_merged['Dist_Longitude'], Regional_Centre_venues_sorted_cluster_merged['Regional Centre'], Regional_Centre_venues_sorted_cluster_merged['Cluster Label']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [63]:
Regional_Centre_venues_sorted_cluster_merged.loc[Regional_Centre_venues_sorted_cluster_merged['Cluster Label'] == 0, Regional_Centre_venues_sorted_cluster_merged.columns[[1] + list(range(5, Regional_Centre_venues_sorted_cluster_merged.shape[1]))]]


Unnamed: 0,Regional Centre,Estimated Population[4],Populationdensity(/km²),PlanningAreas,Dist_Latitude,Dist_Longitude,Cluster Label,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Raffles Place,922980,6955.0,22,1.283542,103.85146,0,Trail,Café,Sporting Goods Shop,Pool,Gym,Golf Course,Diner,Chinese Restaurant,Bowling Alley,Bar
1,Tampines,686050,7369.0,6,1.354653,103.943571,0,Trail,Café,Sporting Goods Shop,Pool,Gym,Golf Course,Diner,Chinese Restaurant,Bowling Alley,Bar
2,Woodlands,573950,4267.0,8,1.435588,103.78522,0,Trail,Café,Sporting Goods Shop,Pool,Gym,Golf Course,Diner,Chinese Restaurant,Bowling Alley,Bar
3,Seletar,921940,8873.0,9,1.409849,103.877379,0,Trail,Café,Sporting Goods Shop,Pool,Gym,Golf Course,Diner,Chinese Restaurant,Bowling Alley,Bar
4,Jurong East,921340,4577.0,12,1.333115,103.742297,0,Trail,Café,Sporting Goods Shop,Pool,Gym,Golf Course,Diner,Chinese Restaurant,Bowling Alley,Bar


In [64]:
Regional_Centre_venues_sorted_cluster_merged.loc[Regional_Centre_venues_sorted_cluster_merged['Cluster Label'] == 1, Regional_Centre_venues_sorted_cluster_merged.columns[[1] + list(range(5, Regional_Centre_venues_sorted_cluster_merged.shape[1]))]]



Unnamed: 0,Regional Centre,Estimated Population[4],Populationdensity(/km²),PlanningAreas,Dist_Latitude,Dist_Longitude,Cluster Label,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue


# Conclusion

From the data analysis above, it is observed that Trail, Cafe and Sporting goods shop are the top 3 most common venues in the 5 regional centres. From the analysis, Cafe, Diner, Chinese Restaurant and Bar were the top 10 most common venues in the 5 regional centres, suggesting high level of competition. 

Even though the data shows that there are high level of competition, it can be seen that there is a lack of variety especially when it comes to the type of cuisine. In particular, only chinese restuarant is among the top 10 most common venue. This is not surpising as the majority of the Singapore's population is Chinese. However, it is worth noting that Singapore is a multicultural society with people coming from different countries, race and backgrounds. Therefore, for people who are looking to set up restaurant that provide non-chinese cuisine, there will be less competition and there could be a potential demand for the food. (Singaporeans love to try out all kinds of food :) )

Considering the fact that the cluster analysis shows that there is no different between the 5 regions, we need to look beyond the type the businesses and amenities within the 5 regional centres. People who want to start a restaurant can choose to look at the population and the demographics in each regional centre. From the data, Seletar is the most densely populated (8873.0/km²) and this means that the business could potentially get more customer and hence more revenue given the same level of competition across all 5 regional centres.

There are some limitations in this analysis. I have not considered the average rental prices in these 5 regional centres. As average rental prices in Singapore varies significantly every year and due to the difficulty in obtaining the data, the average rental prices was left out. Future analysis should take into consideration the average rental price as it may be an important consideration factor for people who are interested in setting up a restaurant. (A rental price that is too high may deter people from setting up a business in the area as there are too much sunk cost) Moreover, future analysis should further stratify the analysis according to the crowd and the movements of people at the various regional centres at different times of the day.  

Nevertheless, this project still provide glimpse the distribution of different amenities and business in the 5 regional centres of Singapore. The insight in this analysis could give some insights to people who want to set up a restuarant in Singapore.