# Applied Data Science Capstone Notebook

## Segmenting and Clustering the Mass Rapid Transit (MRT) Train Stations in Singapore

# Introduction

The Mass Rapid Transit (MRT), is a heavy rail rapid transit system that constitutes the bulk of the railway network in Singapore, spanning—with the exception of the forested core and the island's rural northwest—the length and width of the city-state's main island. It is one of two of our main public transport in Singapore. The other being public bus services.

Currently, there are 5 MRT lines in Singapore: **East-West Line, North-South Line, North-East Line, Circle Line and Downtown Line**.

As the MRT is extensively use as a form of commute for most Singaporeans, these locations often offer a good flow of human traffic.

In this Capstone Project for the Applied Data Science Professional Certificate by IBM, I will be analysing the ideal areas where food businesses can set up their shop with the consideration of the types of businesses that are currently around each MRT Station. This will help us to decide which locations are ideal to start a business.


# Data

I will be explaining the datasets required and my sources.

### MRT Station Data
I have gotten Station Names, Postal Codes, Latitude and Longitude of all MRT Stations in Singapore through a user on Github who has collated this data. (https://github.com/xkjyeah/singapore-postal-codes/blob/master/mrt_stations.json) Although the file came in JSON format and I used other methods to convert it to an appropriate CSV file

###  Venue Data
I will be getting the data of other business and venues around each and every MRT Station through Foursquare API

---


## Exploratory Data Analysis

##### Importing the required libraries

In [185]:
import pandas as pd #import pandas library
import requests #import requests library
from bs4 import BeautifulSoup #import BeautifulSoup
import numpy as np #import numpy library
from geopy.geocoders import Nominatim #import Nominatim to retrieve Singapore's Longitude and Latitude
import matplotlib.cm as cm #for data visualisation
import matplotlib.colors as colors #for data visualisation
from sklearn.cluster import KMeans #for clustering of MRT Stations
import json #to access Foursquare data
from pandas.io.json import json_normalize #for normalising data
import folium #for data visualisation
print('Importing done!')

Importing done!


##### Importing and cleaning up of the MRT Station data by removing unnecesary columns and missing values

In [179]:
MRT = pd.DataFrame(pd.read_csv('/Users/adayummmm/Downloads/mrt_stations.csv')) #importing CSV file into Python as a data frame
MRT = MRT.rename(columns={col: col.split('_')[-1] for col in MRT.columns}) #removing unnecessary strings in column names
MRT = MRT.drop(['ADDRESS','BUILDING','NO','LONGTITUDE', 'NAME','SEARCHVAL','X','Y'], axis = 1) #removing unecessary columns
MRT = MRT[MRT.columns[::-1]] #rearranging the index
MRT = MRT.dropna() #removing NA values
MRT.head()

Unnamed: 0,Station Name,Station,POSTAL,LONGITUDE,LATITUDE
0,Jurong East,NS1,609690,103.742287,1.333153
1,Bukit Batok,NS2,659958,103.749567,1.349034
2,Bukit Gombak,NS3,659083,103.751791,1.358612
3,Choa Chu Kang,NS4,689810,103.744371,1.385363
4,Yew Tee,NS5,689715,103.747405,1.397535


##### Getting the coordinates of Singapore

In [102]:
address = 'Singapore, SG'

geolocator = Nominatim(user_agent="Singapore")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Singapore are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Singapore are 1.357107, 103.8194992.


##### Creating a map of Singapore with the MRT Stations superimposed on top

In [189]:
map_SG = folium.Map(location=[latitude, longitude], zoom_start=11)

for lat, lng, name , station in zip(MRT_Stations['LATITUDE'], MRT_Stations['LONGITUDE'], MRT_Stations['Station Name'], MRT_Stations['Station']):
    label = '{}, {}'.format(name, station)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_SG)  
    
map_SG

# Methodology

In this project I aim to gather the most commonly visited places from each MRT Station and find out the categories of the venues. This will be done by using Foursquare to extract venue and venue category datas for each MRT Station around a 500m radius of the MRT Station.

Next, we will do a count on how many of each category of venues are there around each MRT Station by grouping the venues together. Afterwhich, we will group the data by MRT Stations instead so that we can find out the most common types of businesses around each MRT Station.

We will then use k *means* clustering to cluster MRT Stations with similar common venues so that we can focus on the clusters we are interested in. This will give us a handful of options to set up our food business.

##### Defining Foursquare credentials and version

In [181]:
CLIENT_ID = 'HGBEG2ECDTBOX0X13QXI3G22WSVGE03CGSV31ZSZ00OWD24A'
CLIENT_SECRET = 'GUNFI4HR0RQETCCYDJL4ABUFBXLVUF3M1C1MKV0DSAKFCKQT'
VERSION = '20180605' 
ACCESS_TOKEN = "420YJQM1NMPL5WUC2U54G4L0EPT1VUKGQVD1KRZH014VUL2Q"

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: HGBEG2ECDTBOX0X13QXI3G22WSVGE03CGSV31ZSZ00OWD24A
CLIENT_SECRET:GUNFI4HR0RQETCCYDJL4ABUFBXLVUF3M1C1MKV0DSAKFCKQT


##### Exploring the data using the first MRT Station

In [183]:
print(MRT.loc[0, 'Station Name'])
print('\n-------\n')

Station_latitude = MRT.loc[0, 'LATITUDE'] 
Station_longitude = MRT.loc[0, 'LONGITUDE'] 

Station_name = MRT.loc[0, 'Station Name'] 

print('Latitude and longitude values of {} are {}, {}.'.format(Station_name, 
                                                               Station_latitude, 
                                                               Station_longitude))

Jurong East

-------

Latitude and longitude values of Jurong East are 1.33315261987297, 103.742286544006.


##### Creating GET request via Foursquare API

In [74]:
LIMIT = 100
radius = 500
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&oauth_token={}'.format(CLIENT_ID,CLIENT_SECRET,VERSION,Station_latitude,Station_longitude,radius,LIMIT,ACCESS_TOKEN)

results = requests.get(url).json()

##### Sending GET request and extracting the category of venues

In [105]:
results = requests.get(url).json()

def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
    
print('Done!')

Done!


##### Converting data from JSON to *pandas* data frame

In [191]:
venues = results['response']['groups'][0]['items']

nearby_venues = pd.json_normalize(venues) 

filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

print(nearby_venues.head(), '\n ----- \n','{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

KeyError: 'groups'

### Now let's explore all the stations!

##### Extracting the venues for each MRT Station

In [100]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Station', 
                  'Station Latitude', 
                  'Station Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

Station_venues = getNearbyVenues(names=MRT['Station Name'],
                                   latitudes=MRT['LATITUDE'],
                                   longitudes=MRT['LONGITUDE']
                                  )


Jurong East
Bukit Batok
Bukit Gombak
Choa Chu Kang
Yew Tee
Kranji
Marsiling
Woodlands
Admiralty
Sembawang
Yishun
Khatib
Yio Chu Kang
Ang Mo Kio
Bishan
Braddell
Toa Payoh
Novena
Newton
Orchard
Somerset
Dhoby Ghaut
City Hall
Raffles Place
Marina Bay
Marina South Pier
Pasir Ris
Tampines
Simei
Tanah Merah
Bedok
Kembangan
Eunos
Paya Lebar
Aljunied
Kallang
Lavender
Bugis
City Hall
Raffles Place
Tanjong Pagar
Outram Park
Tiong Bahru
Redhill
Queenstown
Commonwealth
Buona Vista
Dover
Clementi
Jurong East
Chinese Garden
Lakeside
Boon Lay
Pioneer
Joo Koon
Gul Circle
Tuas Crescent
Tuas West Road
Tuas Link
Expo
Changi Airport
HarbourFront
Outram Park
Chinatown
Clarke Quay
Dhoby Ghaut
Little India
Farrer Park
Boon Keng
Potong Pasir
Woodleigh
Serangoon
Kovan
Hougang
Buangkok
Sengkang
Punggol
Dhoby Ghaut
Bras Basah
Esplanade
Promenade
Nicoll Highway
Stadium
Mountbatten
Dakota
Paya Lebar
MacPherson
Tai Seng
Bartley
Serangoon
Lorong Chuan
Bishan
Marymount
Caldecott
Botanic Gardens
Farrer Road
Holland Vi

KeyError: 'groups'

##### Take a look at the results

In [108]:
print(Station_venues.shape)
Station_venues.head()

(5601, 7)


Unnamed: 0,Station,Station Latitude,Station Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Jurong East,1.333153,103.742287,UNIQLO,1.333175,103.74316,Clothing Store
1,Jurong East,1.333153,103.742287,MUJI 無印良品,1.333187,103.743064,Furniture / Home Store
2,Jurong East,1.333153,103.742287,Johan Paris,1.334083,103.742384,Bakery
3,Jurong East,1.333153,103.742287,The Rink,1.333424,103.740345,Skating Rink
4,Jurong East,1.333153,103.742287,Song Fa Bak Kut Teh 松發肉骨茶,1.333394,103.74342,Chinese Restaurant


##### Grouping the venues by MRT Stations

In [109]:
Station_venues.groupby('Station').count()

Unnamed: 0_level_0,Station Latitude,Station Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Station,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Admiralty,7,7,7,7,7,7
Aljunied,48,48,48,48,48,48
Ang Mo Kio,43,43,43,43,43,43
Bartley,8,8,8,8,8,8
Bayfront,82,82,82,82,82,82
...,...,...,...,...,...,...
Woodleigh,8,8,8,8,8,8
Yew Tee,9,9,9,9,9,9
Yio Chu Kang,17,17,17,17,17,17
Yishun,51,51,51,51,51,51


In [83]:
print('There are {} uniques categories.'.format(len(station_venues['Venue Category'].unique())))

There are 306 uniques categories.


### Moving on to analysing each MRT Station

##### Applying hot encoding to the venues

In [110]:
station_onehot = pd.get_dummies(Station_venues[['Venue Category']], prefix="", prefix_sep="")

station_onehot['Station'] = Station_venues['Station'] 

fixed_columns = [station_onehot.columns[-1]] + list(station_onehot.columns[:-1])
station_onehot = station_onehot[fixed_columns]

station_onehot.head()

Unnamed: 0,Station,ATM,Accessories Store,Airport,Airport Lounge,American Restaurant,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,...,Water Park,Waterfall,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Yunnan Restaurant
0,Jurong East,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Jurong East,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Jurong East,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Jurong East,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Jurong East,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [111]:
station_onehot.shape

(5601, 307)

##### Grouping the venues by MRT Stations

In [112]:
station_grouped = station_onehot.groupby('Station').mean().reset_index()
station_grouped

Unnamed: 0,Station,ATM,Accessories Store,Airport,Airport Lounge,American Restaurant,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,...,Water Park,Waterfall,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Yunnan Restaurant
0,Admiralty,0.0,0.00000,0.0,0.0,0.0,0.000000,0.0,0.00000,0.00000,...,0.0,0.0,0.00000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000
1,Aljunied,0.0,0.00000,0.0,0.0,0.0,0.020833,0.0,0.00000,0.00000,...,0.0,0.0,0.00000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000
2,Ang Mo Kio,0.0,0.00000,0.0,0.0,0.0,0.000000,0.0,0.00000,0.00000,...,0.0,0.0,0.00000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000
3,Bartley,0.0,0.00000,0.0,0.0,0.0,0.000000,0.0,0.00000,0.00000,...,0.0,0.0,0.00000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000
4,Bayfront,0.0,0.02439,0.0,0.0,0.0,0.000000,0.0,0.02439,0.02439,...,0.0,0.0,0.04878,0.0,0.0,0.0,0.0,0.0,0.0,0.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
113,Woodleigh,0.0,0.00000,0.0,0.0,0.0,0.000000,0.0,0.00000,0.00000,...,0.0,0.0,0.00000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000
114,Yew Tee,0.0,0.00000,0.0,0.0,0.0,0.000000,0.0,0.00000,0.00000,...,0.0,0.0,0.00000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000
115,Yio Chu Kang,0.0,0.00000,0.0,0.0,0.0,0.000000,0.0,0.00000,0.00000,...,0.0,0.0,0.00000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000
116,Yishun,0.0,0.00000,0.0,0.0,0.0,0.019608,0.0,0.00000,0.00000,...,0.0,0.0,0.00000,0.0,0.0,0.0,0.0,0.0,0.0,0.000000


In [113]:
station_grouped.shape

(118, 307)

##### Let's take a look at the top 5 most common venues of each MRT Station

In [115]:
num_top_venues = 5

for mrt in station_grouped['Station']:
    print("----"+mrt+"----")
    temp = station_grouped[station_grouped['Station'] == mrt].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n -----\n')

----Admiralty----
                  venue  freq
0           Supermarket  0.29
1  Fast Food Restaurant  0.14
2          Optical Shop  0.14
3                  Café  0.14
4            Food Court  0.14

 -----

----Aljunied----
                           venue  freq
0             Chinese Restaurant  0.10
1               Asian Restaurant  0.08
2                   Noodle House  0.06
3  Vegetarian / Vegan Restaurant  0.06
4                    Coffee Shop  0.04

 -----

----Ang Mo Kio----
                 venue  freq
0          Coffee Shop  0.09
1           Food Court  0.07
2         Dessert Shop  0.07
3  Japanese Restaurant  0.07
4      Bubble Tea Shop  0.05

 -----

----Bartley----
           venue  freq
0   Noodle House  0.25
1    Bus Station  0.25
2  Metro Station  0.12
3           Park  0.12
4           Café  0.12

 -----

----Bayfront----
                 venue  freq
0                Hotel  0.07
1             Boutique  0.07
2  Japanese Restaurant  0.05
3            Roof Deck  0.05
4     

4  Performing Arts Venue  0.00

 -----

----MacPherson----
                           venue  freq
0                     Food Court  0.12
1               Asian Restaurant  0.12
2  Vegetarian / Vegan Restaurant  0.12
3                  Metro Station  0.06
4                   Noodle House  0.06

 -----

----Marina Bay----
                venue  freq
0                 Pub  0.08
1  Seafood Restaurant  0.04
2               Field  0.04
3            Building  0.04
4             Brewery  0.04

 -----

----Marina South Pier----
            venue  freq
0   Boat or Ferry  0.25
1          Cruise  0.12
2  History Museum  0.12
3     Snack Place  0.12
4   Metro Station  0.12

 -----

----Marsiling----
                  venue  freq
0            Food Court  0.12
1         Grocery Store  0.12
2           Coffee Shop  0.12
3  Fast Food Restaurant  0.06
4           Flea Market  0.06

 -----

----Marymount----
                   venue  freq
0     Chinese Restaurant  0.33
1  Outdoors & Recreation  0.17
2    

##### Sorting the data into *pandas* data frame in descending order

In [116]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

##### Now we'll look at the top 10 most common venues of each MRT Station

In [196]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']


columns = ['Station']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))


station_venues_sorted = pd.DataFrame(columns=columns)
station_venues_sorted['Station'] = station_grouped['Station']

for ind in np.arange(station_grouped.shape[0]):
    station_venues_sorted.iloc[ind, 1:] = return_most_common_venues(station_grouped.iloc[ind, :], num_top_venues)

station_venues_sorted.head()

Unnamed: 0,Station,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Admiralty,Supermarket,Optical Shop,Coffee Shop,Café,Food Court,Fast Food Restaurant,Food Truck,Food Stand,French Restaurant,Fried Chicken Joint
1,Aljunied,Chinese Restaurant,Asian Restaurant,Noodle House,Vegetarian / Vegan Restaurant,Breakfast Spot,Dim Sum Restaurant,Food Court,Café,Seafood Restaurant,Coffee Shop
2,Ang Mo Kio,Coffee Shop,Food Court,Dessert Shop,Japanese Restaurant,Bubble Tea Shop,Supermarket,Sandwich Place,Fast Food Restaurant,Noodle House,Malay Restaurant
3,Bartley,Noodle House,Bus Station,Bus Stop,Café,Metro Station,Park,Yunnan Restaurant,Filipino Restaurant,Fish & Chips Shop,Flea Market
4,Bayfront,Hotel,Boutique,Waterfront,Roof Deck,Bridge,Lounge,Japanese Restaurant,Garden,Accessories Store,Sandwich Place


## Classification of Data

##### Using k *means* clustering to form clusters

In [197]:
kclusters = 5

station_grouped_clustering = station_grouped.drop('Station', 1)

kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(station_grouped_clustering)

kmeans.labels_[0:10] 

array([0, 3, 0, 4, 0, 0, 0, 3, 4, 0], dtype=int32)

##### Adding the clusters into our data frame

In [198]:
station_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

station_merged = MRT

station_merged = station_merged.join(station_venues_sorted.set_index('Station'), on='Station Name')
station_merged = station_merged.dropna()
station_merged['Cluster Labels'] = station_merged[['Cluster Labels']].astype(int)

station_merged.head()

Unnamed: 0,Station Name,Station,POSTAL,LONGITUDE,LATITUDE,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Jurong East,NS1,609690,103.742287,1.333153,0,Coffee Shop,Japanese Restaurant,Chinese Restaurant,Food Court,Café,Shopping Mall,Steakhouse,Clothing Store,Multiplex,Bubble Tea Shop
1,Bukit Batok,NS2,659958,103.749567,1.349034,3,Coffee Shop,Grocery Store,Food Court,Department Store,Chinese Restaurant,Sandwich Place,Multiplex,Bowling Alley,Mobile Phone Shop,Café
2,Bukit Gombak,NS3,659083,103.751791,1.358612,3,Food Court,Vegetarian / Vegan Restaurant,Coffee Shop,Stadium,ATM,Lake,Malay Restaurant,Flea Market,Chinese Restaurant,Sandwich Place
3,Choa Chu Kang,NS4,689810,103.744371,1.385363,0,Coffee Shop,Fast Food Restaurant,Supermarket,Asian Restaurant,Bubble Tea Shop,Sandwich Place,Café,Bakery,Smoke Shop,Portuguese Restaurant
4,Yew Tee,NS5,689715,103.747405,1.397535,0,Fast Food Restaurant,Diner,Food Court,Café,Sandwich Place,Pool,Shopping Mall,Japanese Restaurant,Food Stand,Food & Drink Shop


##### Now we'll create a map of Singapore with our clusters

In [199]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(station_merged['LATITUDE'], station_merged['LONGITUDE'], station_merged['Station Name'], station_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Results

### Cluster 1 (Red)

In [200]:
station_merged.loc[station_merged['Cluster Labels'] == 0, station_merged.columns[[0,1] + list(range(5, station_merged.shape[1]))]]


Unnamed: 0,Station Name,Station,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Jurong East,NS1,0,Coffee Shop,Japanese Restaurant,Chinese Restaurant,Food Court,Café,Shopping Mall,Steakhouse,Clothing Store,Multiplex,Bubble Tea Shop
3,Choa Chu Kang,NS4,0,Coffee Shop,Fast Food Restaurant,Supermarket,Asian Restaurant,Bubble Tea Shop,Sandwich Place,Café,Bakery,Smoke Shop,Portuguese Restaurant
4,Yew Tee,NS5,0,Fast Food Restaurant,Diner,Food Court,Café,Sandwich Place,Pool,Shopping Mall,Japanese Restaurant,Food Stand,Food & Drink Shop
5,Kranji,NS7,0,Racetrack,Bus Station,Go Kart Track,Bakery,Noodle House,Bus Line,Bus Stop,Filipino Restaurant,Fish & Chips Shop,Flea Market
6,Marsiling,NS8,0,Grocery Store,Food Court,Coffee Shop,Flower Shop,Seafood Restaurant,Hainan Restaurant,Fast Food Restaurant,Trail,BBQ Joint,Pharmacy
...,...,...,...,...,...,...,...,...,...,...,...,...,...
170,Geylang Bahru,DT24,0,Indian Restaurant,Supermarket,Café,Bakery,Betting Shop,Coffee Shop,Food Court,Pool,Food Truck,French Restaurant
178,Tampines West,DT31,0,Fast Food Restaurant,Halal Restaurant,Food Court,Café,Gym,Bakery,Vegetarian / Vegan Restaurant,Pharmacy,Coffee Shop,Club House
179,Tampines,DT32,0,Café,Bakery,Bubble Tea Shop,Coffee Shop,Thai Restaurant,Shopping Mall,Clothing Store,Sushi Restaurant,Dessert Shop,Gym
181,Tampines East,DT33,0,Coffee Shop,Indian Restaurant,Supermarket,Convenience Store,Dessert Shop,Pizza Place,Sandwich Place,Fast Food Restaurant,Bus Station,Optical Shop


### Cluster 2 (Purple)

In [201]:
station_merged.loc[station_merged['Cluster Labels'] == 1, station_merged.columns[[0,1] + list(range(5, station_merged.shape[1]))]]


Unnamed: 0,Station Name,Station,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
86,Little India,NE7,1,Indian Restaurant,Vegetarian / Vegan Restaurant,Restaurant,Gym,General College & University,Bakery,BBQ Joint,Coffee Shop,Motel,Bus Line
88,Farrer Park,NE8,1,Indian Restaurant,Café,Chinese Restaurant,Hotel,Motel,Vegetarian / Vegan Restaurant,Climbing Gym,Dumpling Restaurant,Seafood Restaurant,Sushi Restaurant
153,Little India,DT12,1,Indian Restaurant,Vegetarian / Vegan Restaurant,Restaurant,Gym,General College & University,Bakery,BBQ Joint,Coffee Shop,Motel,Bus Line
155,Rochor,DT13,1,Indian Restaurant,Café,Chinese Restaurant,Vegetarian / Vegan Restaurant,Ice Cream Shop,Dessert Shop,Dive Bar,BBQ Joint,Middle Eastern Restaurant,Food Court
168,Jalan Besar,DT22,1,Indian Restaurant,Chinese Restaurant,Hotel,Café,Vegetarian / Vegan Restaurant,Food Court,Hostel,Bakery,Asian Restaurant,Thai Restaurant


### Cluster 3 (Blue)

In [202]:
station_merged.loc[station_merged['Cluster Labels'] == 2, station_merged.columns[[0,1] + list(range(5, station_merged.shape[1]))]]


Unnamed: 0,Station Name,Station,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
122,Caldecott,CC17,2,Flower Shop,Metro Station,Ice Cream Shop,Garden Center,Office,Sandwich Place,Fast Food Restaurant,Field,Filipino Restaurant,Fish & Chips Shop


### Cluster 4 (Green)

In [203]:
station_merged.loc[station_merged['Cluster Labels'] == 3, station_merged.columns[[0,1] + list(range(5, station_merged.shape[1]))]]


Unnamed: 0,Station Name,Station,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Bukit Batok,NS2,3,Coffee Shop,Grocery Store,Food Court,Department Store,Chinese Restaurant,Sandwich Place,Multiplex,Bowling Alley,Mobile Phone Shop,Café
2,Bukit Gombak,NS3,3,Food Court,Vegetarian / Vegan Restaurant,Coffee Shop,Stadium,ATM,Lake,Malay Restaurant,Flea Market,Chinese Restaurant,Sandwich Place
13,Yio Chu Kang,NS15,3,Food Court,Cupcake Shop,Volleyball Court,Noodle House,Bus Station,Fast Food Restaurant,Tennis Court,Gym,Bakery,Vegetarian / Vegan Restaurant
16,Bishan,NS17,3,Food Court,Coffee Shop,Chinese Restaurant,Bubble Tea Shop,Japanese Restaurant,Supermarket,Café,Cosmetics Shop,Ice Cream Shop,Shopping Mall
18,Braddell,NS18,3,Food Court,Chinese Restaurant,Noodle House,Café,Hakka Restaurant,Seafood Restaurant,Thai Restaurant,Asian Restaurant,Bakery,Chinese Breakfast Place
19,Toa Payoh,NS19,3,Chinese Restaurant,Coffee Shop,Snack Place,Food Court,Dessert Shop,Asian Restaurant,Bakery,Burger Joint,Steakhouse,Supermarket
21,Newton,NS21,3,Chinese Restaurant,Hotel Bar,Italian Restaurant,Seafood Restaurant,Gym / Fitness Center,Dance Studio,Convenience Store,Pool,Café,Noodle House
40,Kembangan,EW6,3,Indian Restaurant,Chinese Restaurant,Noodle House,Bistro,Train Station,Supermarket,Bus Stop,BBQ Joint,Food Court,Asian Restaurant
41,Eunos,EW7,3,Chinese Restaurant,Coffee Shop,Noodle House,Bubble Tea Shop,Seafood Restaurant,Train Station,Vegetarian / Vegan Restaurant,Grocery Store,Asian Restaurant,Gym
44,Aljunied,EW9,3,Chinese Restaurant,Asian Restaurant,Noodle House,Vegetarian / Vegan Restaurant,Breakfast Spot,Dim Sum Restaurant,Food Court,Café,Seafood Restaurant,Coffee Shop


### Cluster 5 (Orange)

In [206]:
station_merged.loc[station_merged['Cluster Labels'] == 4, station_merged.columns[[0,1] + list(range(5, station_merged.shape[1]))]]


Unnamed: 0,Station Name,Station,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
65,Lakeside,EW26,4,Bus Station,Convenience Store,Skate Park,Vegetarian / Vegan Restaurant,Snack Place,Exhibit,Fast Food Restaurant,Field,Filipino Restaurant,Fish & Chips Shop
115,Bartley,CC12,4,Noodle House,Bus Station,Bus Stop,Café,Metro Station,Park,Yunnan Restaurant,Filipino Restaurant,Fish & Chips Shop,Flea Market
118,Lorong Chuan,CC14,4,Bus Station,Indian Restaurant,Café,Yunnan Restaurant,Food,Field,Filipino Restaurant,Fish & Chips Shop,Flea Market,Flower Shop
142,Cashew,DT2,4,Bus Station,Food Court,Convenience Store,Seafood Restaurant,Bus Line,Park,Food Truck,Food Stand,French Restaurant,Food & Drink Shop
177,Bedok Reservoir,DT30,4,Bus Station,Steakhouse,Sculpture Garden,Noodle House,Park,Café,Coffee Shop,Asian Restaurant,Supermarket,Food Court
182,Upper Changi,DT34,4,Bus Station,Asian Restaurant,Playground,Park,Coffee Shop,Basketball Court,Food Stand,Food Court,Food & Drink Shop,Food Truck


## Discussion

Based on the results shown above, most of the clusters are already filled with food places in minimally the top 4 most common venues. As such we will have to determine what type of food business should we open.

In this first case, we can open western or italian cuisines around MRT Stations in cluster 2. As cluster 2 is heavuly populated with a lot of indian restaurants or chinese places. By opening an italian or western restaurant, we can offer a wider variety of food to these areas without facing much close substitutes.

In the second case, we can choose to open a food place with any cuisines around MRT Stations in cluster 5. Cluster 5 is predominantly filled with public transport services and small cafes or convenience stores. By opening a restaurant in this areas, we can provide an alternative to the small cafes or convenience stores.

The third case is to open a Cafe around MRT Stations in cluster 4. Cluster 4 is filled with many coffee shops and food courts. However, what they lack if ice cream shops, cafes or basically dessert places. We can consider having a dessert place in such areas.

Our fourth case is to open a restaurant around MRT Stations in cluster 1. Cluster 1 is mostly fast food restaurants or coffee shops. By offering restaurant alternatives, we can attract a subtantial amount of people who are looking for occasional higher quality food and prices.

Lastly, cluster 3 seems to be an outlier as there are no other clusters which has Flower Shop, Metro Station and Ice Cream Shop as the top 3 venue categories. As such we could explore having a food place such as a coffee shop to cater to food options in this area.

There are some interesting insights gathered through the process of analysing the data. Firstly, clusters with food court and coffee shops tend to take up a large area of Singapore. Next, although Singapore's regions are separated by housing, industrial and business districts, we can see that clusters do not follow such behaviour which is interesting as I would believe that business districts would have more restaurant and higher priced venues. However, we can see that the businesss district is split up into the various clusters which indicates taht is it not a reasonable grouping choice.

## Conclusion

In conclusion, the purpose of this project was to study the various MRT Stations to figure out possible options of food businesses to start up. In this aspect, we have managed to gather sufficient data and explore the data with data analysis and data visualisation. Through this process, we have offered a variety of choice to anyone who is interested in exploring any forms of business. 

Considering the business competition is only one of many factors that have to be taken into account when making a decision to start a business. For example there are also factors such as human traffic and rental prices that has to be considered before making a decison. We could possibly explore those 2 factors in future reports.