

# Exploring the Neighborhoods in Singapore: Data Science in Real Life 

<h1 align=Justified><font size = 3> As part of the final IBM Capstone Project, we get a taste of what data scientists go through in real life. The objectives of the final assignment were to define a business problem, look for data on the web, and use Foursquare location data to compare different districts within the wards (municipalities) of Singapore (the choice of city is up to the student) to figure out which neighborhood is suitable for starting a restaurant business (the 'idea' is also up to the individual student). As prepared for the assignment, I go through the problem design, data preparation and final analysis step by step. Detailed code and images are available on GitHub, and the link can be found at the end of the post. </font></h1>

Before we get the data and start exploring it, let's import all the dependencies that we will need.

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


## 1. Discussion and Background of the Business Problem: 

### Problem Statement: Prospects of a Lunch Restaurant, Singapore.

<h1 align=justified><font size = 3>Singapore’s economic freedom score is 89.4, making its economy the 2nd freest in the 2019 Index. Its overall score has increased by 0.6 point, with increases in scores for trade freedom and government integrity outpacing modest declines in labor freedom and property rights. Singapore is ranked 2nd among 43 countries in the Asia–Pacific region, and its overall score is well above the regional and world averages.</font></h1>

<h1 align=justified><font size = 3> The aim of this project is to explore the areas of Singapore and find the best place to open a breakfast-and-lunch restaurant.</font></h1>

### Target Audience

What types of clients or groups of people would be interested in this project?

1. Business people who want to invest in or open a restaurant. This analysis will be a comprehensive guide to starting or expanding restaurants that target the large pool of office workers in Singapore during lunch hours.

2. Freelancers who would love to have their own restaurant as a side business. This analysis will give an idea of how beneficial it is to open a restaurant and what the pros and cons of this business are.

3. New graduates who want to find a reasonably priced lunch or breakfast place close to the office.

4. Budding data scientists who want to apply some of the most widely used exploratory data analysis techniques to obtain the necessary data, analyze it, and finally tell a story with it.

## 2. Data Preparation:

### 2.1. Get The Names of Wards, Major Districts and Population from Wikipedia 

In [2]:
from bs4 import BeautifulSoup

response_obj = requests.get('https://en.wikipedia.org/wiki/Planning_Areas_of_Singapore').text
print (type (response_obj))

<class 'str'>


In [3]:
soup = BeautifulSoup(response_obj,'lxml')
print (soup.prettify())


<!DOCTYPE html>
<html class="client-nojs" dir="ltr" lang="en">
 <head>
  <meta charset="utf-8"/>
  <title>
   Planning Areas of Singapore - Wikipedia
  </title>
  <script>
   document.documentElement.className=document.documentElement.className.replace(/(^|\s)client-nojs(\s|$)/,"$1client-js$2");RLCONF={"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":!1,"wgNamespaceNumber":0,"wgPageName":"Planning_Areas_of_Singapore","wgTitle":"Planning Areas of Singapore","wgCurRevisionId":899933485,"wgRevisionId":899933485,"wgArticleId":2224605,"wgIsArticle":!0,"wgIsRedirect":!1,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["Pages using deprecated image syntax","Urban planning in Singapore","Subdivisions of Singapore"],"wgBreakFrames":!1,"wgPageContentLanguage":"en","wgPageContentModel":"wikitext","wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonthNames":["","January","February","March","April","May","June","July","

### Processing the Information From Wiki To Make Necessary Lists


In [4]:
#pinpointing the location of the table and its contents
Wards_Singpore_Table = soup.find('table', class_ = 'wikitable sortable')
Wards_Singpore_Table

<table class="wikitable sortable">
<tbody><tr>
<th>Name <small>(<a href="/wiki/English_language" title="English language">English</a>)</small>
</th>
<th><a href="/wiki/Malay_language" title="Malay language">Malay</a>
</th>
<th><a href="/wiki/Chinese_language" title="Chinese language">Chinese</a>
</th>
<th><a href="/wiki/Pinyin" title="Pinyin">Pinyin</a>
</th>
<th><a href="/wiki/Tamil_language" title="Tamil language">Tamil</a>
</th>
<th>Region
</th>
<th>Area (km2)
</th>
<th>Population<sup class="reference" id="cite_ref-6"><a href="#cite_note-6">[6]</a></sup>
</th>
<th>Density (/km2)
</th></tr>
<tr>
<td><a href="/wiki/Ang_Mo_Kio" title="Ang Mo Kio">Ang Mo Kio</a>
</td>
<td>
</td>
<td>宏茂桥
</td>
<td>Hóng mào qiáo
</td>
<td>ஆங் மோ கியோ
</td>
<td><a href="/wiki/North-East_Region,_Singapore" title="North-East Region, Singapore">North-East</a>
</td>
<td>13.94
</td>
<td>165,710
</td>
<td>12,000
</td></tr>
<tr>
<td><a href="/wiki/Bedok" title="Bedok">Bedok</a>
</td>
<td>*
</td>
<td>勿洛
</td>
<td>

In [5]:
Name=[]
Region = []
Area = []
Population = []
Density = []

for row in Wards_Singpore_Table.findAll("tr"):
    #print (row)    
    Ward = row.findAll('td')
    #print (len(Ward))
   
    if len(Ward)==9: #Only extract table body not heading
        Name.append(Ward[0].find(text=True).rstrip())
        Region.append(Ward[5].find(text=True).rstrip())
        Area.append(Ward[6].find(text=True).rstrip())
        Population.append(Ward[7].find(text=True).rstrip())
        Density.append(Ward[8].find(text=True).rstrip())
            


In [6]:

Singapore_data=pd.DataFrame(Name,columns=['Name'])
Singapore_data['Region']=Region
Singapore_data['Area_SqKm']=Area
Singapore_data['Population']=Population
Singapore_data['Density_Per_SqKm']=Density
Singapore_data


Unnamed: 0,Name,Region,Area_SqKm,Population,Density_Per_SqKm
0,Ang Mo Kio,North-East,13.94,165710,12000
1,Bedok,East,21.69,281300,13000
2,Bishan,Central,7.62,88490,12000
3,Boon Lay,West,8.23,30,3.6
4,Bukit Batok,West,11.13,144410,13000
5,Bukit Merah,Central,14.34,151870,11000
6,Bukit Panjang,West,8.99,140820,16000
7,Bukit Timah,Central,17.53,77280,4400
8,Central Water Catchment,North,37.15,*,*
9,Changi,East,40.61,2080,62.3
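
The scraped columns are still strings, and some cells hold a `*` placeholder instead of a number (Central Water Catchment above). If these columns are needed for arithmetic later on, they have to be converted first. The following is a minimal sketch of such a conversion; the helper name `to_number` and the sample values are illustrative assumptions, not part of the original notebook.

```python
def to_number(text):
    """Convert a scraped cell such as '165,710' or '13.94' to a float.

    Returns None for placeholders like '*' that cannot be parsed.
    """
    cleaned = text.strip().replace(",", "")  # drop whitespace and thousands separators
    try:
        return float(cleaned)
    except ValueError:
        return None


# Sample values in the form they appear in the Wikipedia table
raw_population = ["165,710", "281,300", "30", "*"]
population = [to_number(value) for value in raw_population]
print(population)  # [165710.0, 281300.0, 30.0, None]
```

Within pandas, a similar result can probably be obtained by stripping the commas with `.str.replace(',', '')` and then calling `pd.to_numeric(..., errors='coerce')`, which turns unparseable cells into `NaN`.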


In [7]:
Singapore_data_Final=pd.DataFrame(Name,columns=['Name'])
Singapore_data_Final['Region']=Region
Singapore_data_Final['Area_SqKm']=Area
Singapore_data_Final.index = np.arange(1, len(Singapore_data_Final) + 1) # reset the index so that it starts from 1. 

Singapore_data_Final

Unnamed: 0,Name,Region,Area_SqKm
1,Ang Mo Kio,North-East,13.94
2,Bedok,East,21.69
3,Bishan,Central,7.62
4,Boon Lay,West,8.23
5,Bukit Batok,West,11.13
6,Bukit Merah,Central,14.34
7,Bukit Panjang,West,8.99
8,Bukit Timah,Central,17.53
9,Central Water Catchment,North,37.15
10,Changi,East,40.61


### 2.2. Get the Coordinates of the Major Districts


In [8]:
from geopy.geocoders import Nominatim
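geolocator = Nominatim(user_agent="singapore_explorer") # any application name works; recent geopy versions require a custom user_agent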
Singapore_data_Final['Area_Name_Coord']= Singapore_data_Final['Name'].apply(geolocator.geocode).apply(lambda x: (x.latitude, x.longitude))

Singapore_data_Final

Unnamed: 0,Name,Region,Area_SqKm,Area_Name_Coord
1,Ang Mo Kio,North-East,13.94,"(1.369842, 103.8466086)"
2,Bedok,East,21.69,"(1.3239765, 103.930216)"
3,Bishan,Central,7.62,"(1.3514551, 103.8482628)"
4,Boon Lay,West,8.23,"(1.3456401, 103.7118018)"
5,Bukit Batok,West,11.13,"(1.3490572, 103.7495906)"
6,Bukit Merah,Central,14.34,"(4.5592879, 101.0255816)"
7,Bukit Panjang,West,8.99,"(1.377921, 103.7718658)"
8,Bukit Timah,Central,17.53,"(1.3546901, 103.7763724)"
9,Central Water Catchment,North,37.15,"(-33.55936435, 118.150468671534)"
10,Changi,East,40.61,"(36.8394346, 119.4013261)"
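
Several rows in the output above fall far outside Singapore (Bukit Merah, for example, resolves to roughly 4.56° N, 101.03° E), most likely because a bare name such as "Changi" is ambiguous. One possible way to reduce this is to add the country to the query and restrict the search with the `country_codes` parameter of geopy's `Nominatim.geocode`. The sketch below is an assumption about how that could look; the helper names `singapore_query` and `geocode_in_singapore` are hypothetical, and the results were not verified against the live service.

```python
def singapore_query(name):
    """Append the country so an ambiguous name is searched in context."""
    return f"{name}, Singapore"


def geocode_in_singapore(name, geolocator):
    """Return (latitude, longitude) for a planning area, or (None, None).

    `geolocator` is assumed to be a geopy Nominatim instance, or any object
    with a compatible `geocode` method.
    """
    # country_codes="sg" asks Nominatim to return matches in Singapore only
    location = geolocator.geocode(singapore_query(name), country_codes="sg")
    if location is None:
        # Nothing found: return a placeholder instead of raising AttributeError
        return (None, None)
    return (location.latitude, location.longitude)
```

It could be applied with `Singapore_data_Final['Name'].apply(lambda n: geocode_in_singapore(n, geolocator))`. Handling the `None` case explicitly also avoids the crash that `lambda x: (x.latitude, x.longitude)` would produce when a name is not found. Because the public Nominatim service limits request rates, wrapping the call in geopy's `RateLimiter` may be worth considering.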


In [9]:
Singapore_data_Final[['Latitude', 'Longitude']] = Singapore_data_Final['Area_Name_Coord'].apply(pd.Series)
Singapore_data_Final

Unnamed: 0,Name,Region,Area_SqKm,Area_Name_Coord,Latitude,Longitude
1,Ang Mo Kio,North-East,13.94,"(1.369842, 103.8466086)",1.369842,103.846609
2,Bedok,East,21.69,"(1.3239765, 103.930216)",1.323976,103.930216
3,Bishan,Central,7.62,"(1.3514551, 103.8482628)",1.351455,103.848263
4,Boon Lay,West,8.23,"(1.3456401, 103.7118018)",1.34564,103.711802
5,Bukit Batok,West,11.13,"(1.3490572, 103.7495906)",1.349057,103.749591
6,Bukit Merah,Central,14.34,"(4.5592879, 101.0255816)",4.559288,101.025582
7,Bukit Panjang,West,8.99,"(1.377921, 103.7718658)",1.377921,103.771866
8,Bukit Timah,Central,17.53,"(1.3546901, 103.7763724)",1.35469,103.776372
9,Central Water Catchment,North,37.15,"(-33.55936435, 118.150468671534)",-33.559364,118.150469
10,Changi,East,40.61,"(36.8394346, 119.4013261)",36.839435,119.401326


In [10]:
Singapore_data_Final.drop(['Area_Name_Coord'], axis=1, inplace=True)
Singapore_data_Final

Unnamed: 0,Name,Region,Area_SqKm,Latitude,Longitude
1,Ang Mo Kio,North-East,13.94,1.369842,103.846609
2,Bedok,East,21.69,1.323976,103.930216
3,Bishan,Central,7.62,1.351455,103.848263
4,Boon Lay,West,8.23,1.34564,103.711802
5,Bukit Batok,West,11.13,1.349057,103.749591
6,Bukit Merah,Central,14.34,4.559288,101.025582
7,Bukit Panjang,West,8.99,1.377921,103.771866
8,Bukit Timah,Central,17.53,1.35469,103.776372
9,Central Water Catchment,North,37.15,-33.559364,118.150469
10,Changi,East,40.61,36.839435,119.401326


## We have the Dataframe with Coordinates 

However, the coordinates are wrong for several places: Bukit Merah, Central Water Catchment, Changi, Downtown Core, Mandai, Museum, Newton, North-Eastern Islands, Orchard, Outram, Pioneer, Queenstown, River Valley, Simpang, Tengah, Western Islands and Woodlands. The geocoder resolved these names to locations outside Singapore, so we need to replace their coordinates manually.
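
Rows like these can also be flagged automatically instead of by eye. The sketch below checks each coordinate pair against a rough bounding box around Singapore. The box limits are an approximation chosen for illustration (an assumption, not an official boundary), and the sample values are taken from the geocoder output shown earlier.

```python
# Approximate, deliberately generous bounding box for Singapore (assumed values)
LAT_MIN, LAT_MAX = 1.15, 1.48
LON_MIN, LON_MAX = 103.59, 104.10


def in_singapore(lat, lon):
    """Return True if the point lies inside the approximate bounding box."""
    return LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX


# A few geocoder results copied from the table above
geocoded = {
    "Ang Mo Kio": (1.369842, 103.8466086),
    "Bukit Merah": (4.5592879, 101.0255816),
    "Central Water Catchment": (-33.55936435, 118.150468671534),
    "Changi": (36.8394346, 119.4013261),
}

# Names whose coordinates fall outside the box need manual correction
suspect = [name for name, (lat, lon) in geocoded.items() if not in_singapore(lat, lon)]
print(suspect)  # ['Bukit Merah', 'Central Water Catchment', 'Changi']
```

A check like this only catches results that land outside the country. A wrong match that still falls inside Singapore would pass, so a visual check on a map remains useful.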

A Google search gives the following values:

Bukit Merah---1.2819054° N,103.8217113° E

Central Water Catchment---1.3552054° N,103.7950113° E

Changi---1.3450054° N,103.9810113° E

Downtown Core---1.2867054° N,103.8513113° E

Mandai---1.4260054° N,103.8219113° E

Museum---1.2960317° N,103.8424705° E

Newton---1.3076054° N,103.8382113° E

North-Eastern Islands---1.4064054° N,104.0301113° E

Orchard---1.3048054° N,103.8296113° E

Outram---1.2849054° N,103.8417113° E

Pioneer---1.33587865648° N,103.691660567° E

Queenstown---1.2942054° N,103.7839113° E

River Valley---1.2959054° N,103.8339113° E

Simpang---1.4443054° N,103.8406113° E

Tengah---1.3555054° N,103.7286113° E

Western Islands---1.2479054° N,103.6746113° E

Woodlands---1.4382054° N,103.7868113° E
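
Retyping these values into separate lists is error-prone, so one option is to parse the lines above directly into a dictionary keyed by place name. The following is a small sketch under the assumption that every line follows the `Name---lat° N,lon° E` pattern used in this list (northern and eastern hemispheres only); `LINE_PATTERN` and `parse_correction` are hypothetical names introduced for illustration.

```python
import re

# Matches lines such as "Bukit Merah---1.2819054° N,103.8217113° E"
LINE_PATTERN = re.compile(
    r"^(?P<name>.+?)---(?P<lat>\d+(?:\.\d+)?)°\s*N,\s*(?P<lon>\d+(?:\.\d+)?)°\s*E$"
)


def parse_correction(line):
    """Return (name, latitude, longitude) parsed from one line of the list."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        raise ValueError(f"Unrecognised coordinate line: {line!r}")
    return match["name"], float(match["lat"]), float(match["lon"])


# Three of the lines from the list above
lines = [
    "Bukit Merah---1.2819054° N,103.8217113° E",
    "North-Eastern Islands---1.4064054° N,104.0301113° E",
    "Pioneer---1.33587865648° N,103.691660567° E",
]

corrections = {name: (lat, lon) for name, lat, lon in map(parse_correction, lines)}
print(corrections["Bukit Merah"])  # (1.2819054, 103.8217113)
```

With such a dictionary, each correction could be applied by name rather than by row position, for example with `Singapore_data_Final.loc[Singapore_data_Final['Name'] == name, ['Latitude', 'Longitude']] = [lat, lon]`, which stays valid even if the row order changes.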


In [11]:
index_to_change=[6,9,10,14,21,25,26,27,29,30,33,35,36,42,49,52,54]
latitude_to_change=[1.2819054,1.3552054,1.3450054,1.2867054,1.4260054,1.2960317,1.3076054,1.4064054,1.3048054,1.2849054,1.3358787,1.2942054,1.2959054,1.4443054,1.3555054,1.2479054,1.4382054]
longitude_to_change=[103.8217113,103.7950113,103.9810113,103.8513113,103.8219113,103.8424705,103.8382113,104.0301113,103.8296113,103.8417113,103.691661,103.7839113,103.8339113,103.8406113,103.7286113,103.6746113,103.7868113]

# Overwrite the wrongly geocoded rows (index labels start at 1) and print the values before and after
for idx, lat, lon in zip(index_to_change, latitude_to_change, longitude_to_change):
    print('Before', Singapore_data_Final.loc[idx, 'Latitude'], Singapore_data_Final.loc[idx, 'Longitude'])
    Singapore_data_Final.loc[idx, 'Latitude'] = lat
    Singapore_data_Final.loc[idx, 'Longitude'] = lon
    print('After', Singapore_data_Final.loc[idx, 'Latitude'], Singapore_data_Final.loc[idx, 'Longitude'])


Before 4.5592879 101.0255816
After 1.2819054 103.8217113
Before -33.55936435 118.150468671534
After 1.3552054 103.7950113
Before 36.8394346 119.4013261
After 1.3450054 103.9810113
Before 53.5414274 -113.5004341
After 1.2867054 103.8513113
Before -5.0017688 -45.6112044
After 1.4260054 103.8219113
Before 42.6061514 -2.2262371
After 1.2960317 103.8424705
Before 42.3370414 -71.2092214
After 1.3076054 103.8382113
Before 23.85945265 58.0986578716435
After 1.4064054 104.0301113
Before 37.3949088 -121.9343425
After 1.3048054 103.8296113
Before 44.9537444 -65.2139856
After 1.2849054 103.8417113
Before 38.4318551 -120.5718719
After 1.3358787 103.691661
Before -45.0317203 168.6608096
After 1.2942054 103.7839113
Before 48.0202386 -95.7822406
After 1.2959054 103.8339113
Before -7.4855618 112.5971634
After 1.4443054 103.8406113
Before -5.15 132.016667
After 1.3555054 103.7286113
Before 45.0473535 -80.3432823
After 1.2479054 103.6746113
Before 30.1734194 -95.504686
After 1.4382054 103.7868113


### Final Data-Frame with Coordinates of the Major Districts

In [12]:
Singapore_data_Final

Unnamed: 0,Name,Region,Area_SqKm,Latitude,Longitude
1,Ang Mo Kio,North-East,13.94,1.369842,103.846609
2,Bedok,East,21.69,1.323976,103.930216
3,Bishan,Central,7.62,1.351455,103.848263
4,Boon Lay,West,8.23,1.34564,103.711802
5,Bukit Batok,West,11.13,1.349057,103.749591
6,Bukit Merah,Central,14.34,1.281905,103.821711
7,Bukit Panjang,West,8.99,1.377921,103.771866
8,Bukit Timah,Central,17.53,1.35469,103.776372
9,Central Water Catchment,North,37.15,1.355205,103.795011
10,Changi,East,40.61,1.345005,103.981011


## Conclusion 
### 1st Week: Description of Problem and Data Preparation 
We now have the initial data frame with the names of the regions, the corresponding areas in those regions, and the coordinates of those areas. Since we want to concentrate only on lunch restaurants targeting office workers, we first need to get an idea of the best business areas in Singapore before comparing all the areas.
As the next step, we will use Foursquare data to obtain information on restaurants. With these, we can start our battle of the neighborhoods for opening a restaurant in Singapore.