<h1> Capstone Project - The Battle of Neighborhoods </h1>

## Select the Location for a New Chinese Restaurant in Singapore

### Introduction
---
Due to the ongoing COVID-19 pandemic, restaurants are closed to dine-in customers and people are staying at home. This is a time when people miss the days when they could eat out and get together with friends and family.

Hoping that everything will be back to normal soon, I chose topic number 2 and will try to find a good location for a Chinese restaurant in Singapore.

In this project, I will be selecting a location for a new Chinese Restaurant. The **Foursquare** API will be called to gather data for various places in Singapore.
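For readers unfamiliar with the Foursquare API, here is a minimal sketch of how an explore request could be assembled. It assumes the legacy v2 `venues/explore` endpoint with its `client_id`, `client_secret`, `v`, `ll`, `radius` and `limit` query parameters; the helper name `build_explore_url`, the default `limit` and the version date are illustrative assumptions, not the exact values used in this project. Actually sending the request (for example with `requests.get(url).json()`) needs real developer credentials, so that step is left out.

```python
from urllib.parse import urlencode

# Legacy Foursquare v2 "venues/explore" endpoint
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(client_id, client_secret, lat, lng,
                      radius=200, limit=50, version="20200511"):
    """Build an explore request for venues within `radius` metres of (lat, lng).

    The default `limit` and the `version` date are placeholder assumptions.
    """
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,          # API version date, YYYYMMDD
        "ll": f"{lat},{lng}",  # centre point as "latitude,longitude"
        "radius": radius,      # search radius in metres
        "limit": limit,        # maximum number of venues to return
    }
    return f"{EXPLORE_URL}?{urlencode(params)}"

# Bedok, with placeholder credentials (real ones come from a Foursquare developer account)
url = build_explore_url("YOUR_ID", "YOUR_SECRET", 1.3239765, 103.930216)
```

`urlencode` percent-encodes the comma in `ll`, which is standard for query strings.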

First, I explored the area within 200 m of Bedok, Singapore, where I live.
Using **geolocator**, the geographical coordinates of Bedok were found to be (1.3239765, 103.930216).
Using **folium**, we can plot the map of Bedok shown below.
<img src="http://localhost:8899/files/Documents/Data_Science/Capstone/BedokMRT.png?_xsrf=2%7Cfe3111f6%7Ca59586ebef64e1e77c9188b4f9d74857%7C1588930656">
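"Within 200 m of Bedok" describes a circular search radius. Foursquare applies that radius on its side, but as an illustration, the same check can be sketched with the haversine great-circle formula. This technique is swapped in here for explanation and is not necessarily part of the project; `haversine_m` and `within_radius` are hypothetical helper names.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def within_radius(centre, point, radius_m=200):
    """True if `point` is no more than `radius_m` metres from `centre`; both are (lat, lng)."""
    return haversine_m(*centre, *point) <= radius_m

BEDOK = (1.3239765, 103.930216)  # coordinates found with geolocator above

# A point about 111 m due north of Bedok is inside the 200 m radius
print(within_radius(BEDOK, (1.3249765, 103.930216)))  # True
```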

The map shows that most of the venues are concentrated near Bedok MRT station, so we will go on to explore the MRT stations across Singapore.

### Method
---

The list of MRT stations in Singapore is available on [Wikipedia](https://en.wikipedia.org/wiki/List_of_Singapore_MRT_stations). It contains all MRT stations, including those under construction or planned. The stations I am interested in are the interchange stations among those currently in operation.
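One way to pick out interchange stations is to keep only operating stations and then flag those served by two or more lines. The sketch below assumes a hypothetical layout for the scraped table (one row per station and line, with `Station name`, `Line` and a boolean `In operation` column); the real Wikipedia table would need its own cleaning first, and the toy rows, including the made-up "Planned Example" station, are for illustration only.

```python
import pandas as pd

def interchange_stations(stations):
    """Return the sorted names of operating stations served by two or more lines.

    Assumes one row per (station, line) with hypothetical columns
    'Station name', 'Line' and a boolean 'In operation'.
    """
    operating = stations[stations["In operation"]]
    lines_per_station = operating.groupby("Station name")["Line"].nunique()
    return sorted(lines_per_station[lines_per_station >= 2].index)

# Toy rows: Jurong East is on two lines, Bedok on one, and the planned station is excluded
toy = pd.DataFrame({
    "Station name": ["Jurong East", "Jurong East", "Bedok", "Planned Example"],
    "Line": ["North South", "East West", "East West", "Circle"],
    "In operation": [True, True, True, False],
})
result = interchange_stations(toy)  # ['Jurong East']
```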

The table does not include geographical coordinates, so they are obtained by looking up each station name with **geocoder**.
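Free geocoding services occasionally return no result, so lookups are often retried. Here is a minimal sketch, assuming a `geocode` callable that returns `(lat, lng)` or `None`. In practice this callable would wrap a call to the **geocoder** package, but that wrapper, the query format and the helper name `lookup_coordinates` are assumptions. A dictionary's `.get` stands in for the online service so the example runs offline.

```python
import time
from typing import Callable, Dict, Iterable, Optional, Tuple

LatLng = Tuple[float, float]

def lookup_coordinates(names: Iterable[str],
                       geocode: Callable[[str], Optional[LatLng]],
                       retries: int = 3,
                       pause_s: float = 0.0) -> Dict[str, Optional[LatLng]]:
    """Geocode each station name, retrying when the lookup returns None.

    Names that still fail after `retries` attempts map to None.
    """
    coords = {}
    for name in names:
        result = None
        for _ in range(retries):
            # Query format is an assumption; adjust to whatever the service matches best
            result = geocode(f"{name} MRT Station, Singapore")
            if result is not None:
                break
            time.sleep(pause_s)  # optional pause before the next attempt
        coords[name] = result
    return coords

# Offline stand-in for a real geocoder: a dictionary lookup
fake_service = {"Bishan MRT Station, Singapore": (1.350986, 103.848255)}
coords = lookup_coordinates(["Bishan", "Nowhere"], fake_service.get)
```

Keeping the geocoder behind a plain callable makes it easy to swap services or test the loop without network access.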

With the station names and their coordinates, we can explore the interchange stations by calling the **Foursquare** API and retrieving the most common venues around each station. Finally, we apply *k-means clustering* and use **folium** to visualise the distribution of popular venues among the MRT interchange stations and select an area for a new Chinese restaurant.

### Data pre-processing
---
In this part, the names and locations of all Singapore MRT stations are obtained through simple web scraping with **pandas.read_html**. I then extract the interchange MRT stations and look up their coordinates. Finally, the **Foursquare** API is called to get the most common venues around every interchange MRT station in Singapore, and a dataframe is created with all of this information.

First, I scraped the list of MRT stations from Wikipedia and loaded it into a dataframe.
![df](http://localhost:8899/files/Documents/Data_Science/Capstone/Screenshot%202020-05-11%20at%2010.55.45%20AM.png?_xsrf=2%7Cfe3111f6%7Ca59586ebef64e1e77c9188b4f9d74857%7C1588930656)

The raw data can be processed into a dataframe with the information we need: interchange stations and their coordinates.
![interchanged_merged](http://localhost:8899/files/Documents/Data_Science/Capstone/Screenshot%202020-05-11%20at%2011.06.58%20AM.png?_xsrf=2%7Cfe3111f6%7Ca59586ebef64e1e77c9188b4f9d74857%7C1588930656)

After exploring the 200 m radius around each MRT interchange station through **Foursquare** API calls made with a developer account, we found 219 unique venue categories across these stations.


![singapore_venues](http://localhost:8899/files/Documents/Data_Science/Capstone/Screenshot%202020-05-11%20at%2011.12.37%20AM.png?_xsrf=2%7Cfe3111f6%7Ca59586ebef64e1e77c9188b4f9d74857%7C1588930656)
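Each explore call returns JSON that has to be flattened into one row per venue before the categories can be counted. The sketch below assumes the legacy v2 response layout (`response` → `groups[0]` → `items[]` → `venue`); the helper name `venues_from_explore`, the column names and the two sample venues are hypothetical. Applied to all interchange stations, a `nunique()` call like the one at the end is one way to arrive at a category count such as the 219 reported above.

```python
import pandas as pd

def venues_from_explore(station, lat, lng, response_json):
    """Flatten one Foursquare v2 explore response into a list of venue rows.

    Assumes the legacy v2 layout: response -> groups[0] -> items[] -> venue.
    Venues without any category are skipped.
    """
    rows = []
    for item in response_json["response"]["groups"][0]["items"]:
        venue = item["venue"]
        if not venue.get("categories"):
            continue  # no category to count
        rows.append({
            "Station": station,
            "Station Latitude": lat,
            "Station Longitude": lng,
            "Venue": venue["name"],
            "Venue Latitude": venue["location"]["lat"],
            "Venue Longitude": venue["location"]["lng"],
            "Venue Category": venue["categories"][0]["name"],  # first listed category
        })
    return rows

# Made-up response with two venues near Jurong East
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Venue A", "location": {"lat": 1.3331, "lng": 103.7421},
               "categories": [{"name": "Chinese Restaurant"}]}},
    {"venue": {"name": "Venue B", "location": {"lat": 1.3334, "lng": 103.7425},
               "categories": [{"name": "Coffee Shop"}]}},
]}]}}

singapore_venues = pd.DataFrame(venues_from_explore("Jurong East", 1.333115, 103.742297, sample))
n_categories = singapore_venues["Venue Category"].nunique()  # number of distinct categories
```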

### Clustering and Data Analysis
---

The top 10 most common venues around each MRT interchange station are listed below:

```python
import pandas as pd

df = pd.read_csv('station_venues_sorted.csv')
df.loc[:, ~df.columns.str.startswith('Unnamed')]  # hide leftover index columns
```

| Station | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|
| Bayfront | Boutique | Scenic Lookout | Garden | Hotel | Waterfront | Roof Deck | Lounge | Bridge | Accessories Store | Park |
| Bishan | Coffee Shop | Food Court | Bubble Tea Shop | Cosmetics Shop | Pet Store | Café | Japanese Restaurant | Chinese Restaurant | Ice Cream Shop | Supermarket |
| Bugis | Bakery | Café | Hotel | Cocktail Bar | Dessert Shop | Chinese Restaurant | Japanese Restaurant | Thai Restaurant | Coffee Shop | Sandwich Place |
| Bukit Panjang | Fast Food Restaurant | Coffee Shop | Shopping Mall | Noodle House | Sushi Restaurant | Asian Restaurant | Café | Fried Chicken Joint | Gym | Supermarket |
| Buona Vista | Japanese Restaurant | Indian Restaurant | Food Court | Chinese Restaurant | Shopping Mall | Café | Bakery | Dessert Shop | Coffee Shop | Performing Arts Venue |
| Chinatown | Chinese Restaurant | Food Court | Hostel | Vegetarian / Vegan Restaurant | Italian Restaurant | Spa | Café | French Restaurant | Japanese Restaurant | Beer Garden |
| Choa Chu Kang | Coffee Shop | Food Court | Portuguese Restaurant | Thai Restaurant | Playground | Fast Food Restaurant | Sandwich Place | Chinese Restaurant | Food Truck | Café |
| City Hall | Hotel | Shopping Mall | Japanese Restaurant | Coffee Shop | French Restaurant | Steakhouse | Event Space | Concert Hall | Bookstore | Cocktail Bar |
| Dhoby Ghaut | Hotel | Café | Park | Japanese Restaurant | Cosmetics Shop | History Museum | Bubble Tea Shop | Theater | Hobby Shop | Karaoke Bar |
| Expo | Café | Coffee Shop | Chinese Restaurant | Japanese Restaurant | Fast Food Restaurant | Sporting Goods Shop | Bar | Indian Restaurant | Food Court | Hotel |
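One common way to produce a table like this is to one-hot encode the venue categories, average them per station to get category frequencies, and keep the highest-ranked categories. The following is a minimal sketch of that approach, assuming a venue dataframe with `Station` and `Venue Category` columns; `top_venues` is a hypothetical helper and the toy data is made up.

```python
import pandas as pd

ORDINALS = ["1st", "2nd", "3rd"] + [f"{i}th" for i in range(4, 11)]  # 1st..10th

def top_venues(venues, n=10):
    """Return each station's n most common venue categories, most frequent first.

    Expects one row per venue with 'Station' and 'Venue Category' columns.
    """
    onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
    freq = onehot.groupby(venues["Station"]).mean()  # share of each category per station
    n = min(n, len(ORDINALS), freq.shape[1])  # cannot rank more categories than exist
    rows = []
    for station, shares in freq.iterrows():
        ranked = shares.sort_values(ascending=False).index[:n]
        rows.append([station, *ranked])
    columns = ["Station"] + [f"{o} Most Common Venue" for o in ORDINALS[:n]]
    return pd.DataFrame(rows, columns=columns)

# Toy data: station A is mostly Chinese restaurants, station B mostly cafés
toy = pd.DataFrame({
    "Station": ["A", "A", "A", "B", "B", "B"],
    "Venue Category": ["Chinese Restaurant", "Chinese Restaurant", "Coffee Shop",
                       "Café", "Café", "Bakery"],
})
result = top_venues(toy, n=2)
```

A station with fewer than `n` distinct categories gets zero-frequency categories in its last columns, so the tail of such a row carries no real ranking information.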


After running k-means clustering with 5 clusters, a new dataframe is created with the cluster labels, station names (in English and Chinese), station locations and coordinates, and the 10 most common venues for each station.
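The clustering step can be sketched with scikit-learn's `KMeans`, assuming the features are the per-station category frequencies from the one-hot step. The helper name `cluster_stations` and the toy data are hypothetical; the default `k=5` matches the five clusters used here, while the toy example uses `k=2` so that the two groups are obvious. The returned labels could then be joined onto the station table by station name.

```python
import pandas as pd
from sklearn.cluster import KMeans

def cluster_stations(freq, k=5, seed=0):
    """Assign a k-means cluster label to each station.

    `freq` has one row per station (index = station name) and one column per
    venue category holding that category's share of the station's venues.
    """
    model = KMeans(n_clusters=k, random_state=seed, n_init=10).fit(freq.to_numpy())
    return pd.Series(model.labels_, index=freq.index, name="Cluster Labels")

# Toy frequencies: S1-S3 are restaurant-heavy, S4-S6 hotel-heavy
toy_freq = pd.DataFrame(
    {"Chinese Restaurant": [0.60, 0.50, 0.55, 0.00, 0.05, 0.00],
     "Hotel":              [0.00, 0.10, 0.05, 0.70, 0.60, 0.65]},
    index=["S1", "S2", "S3", "S4", "S5", "S6"],
)
labels = cluster_stations(toy_freq, k=2)
```

Fixing `random_state` keeps the labels reproducible between runs; the label numbers themselves are arbitrary, so "Cluster 0" only means something relative to the other clusters in the same run.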

```python
stations = pd.read_csv('singapore_stations.csv')
stations.loc[:, ~stations.columns.str.startswith('Unnamed')]  # hide leftover index columns
```

| Station name | Station name in Chinese | Location | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Jurong East | 裕廊东 | Jurong East | 1.333115 | 103.742297 | 0 | Chinese Restaurant | Coffee Shop | Food Court | Café | Japanese Restaurant | Shopping Mall | Steakhouse | Multiplex | Bubble Tea Shop | Clothing Store |
| Choa Chu Kang | 蔡厝港 | Choa Chu Kang | 1.384749 | 103.744534 | 0 | Coffee Shop | Food Court | Portuguese Restaurant | Thai Restaurant | Playground | Fast Food Restaurant | Sandwich Place | Chinese Restaurant | Food Truck | Café |
| Woodlands | 兀兰 | Woodlands | 1.436897 | 103.786216 | 0 | Café | Coffee Shop | Japanese Restaurant | Shopping Mall | Asian Restaurant | Fast Food Restaurant | Chinese Restaurant | Indian Restaurant | Electronics Store | Clothing Store |
| Bishan | 碧山 | Bishan | 1.350986 | 103.848255 | 0 | Coffee Shop | Food Court | Bubble Tea Shop | Cosmetics Shop | Pet Store | Café | Japanese Restaurant | Chinese Restaurant | Ice Cream Shop | Supermarket |
| Newton | 纽顿 | Newton | 1.313183 | 103.83804 | 0 | Chinese Restaurant | Seafood Restaurant | Italian Restaurant | Hotel Bar | Hotel | Convenience Store | Grocery Store | Gym / Fitness Center | Noodle House | Thai Restaurant |
| Dhoby Ghaut | 多美歌 | Museum Planning Area | 1.299353 | 103.845309 | 3 | Hotel | Café | Park | Japanese Restaurant | Cosmetics Shop | History Museum | Bubble Tea Shop | Theater | Hobby Shop | Karaoke Bar |
| City Hall | 政府大厦 | Downtown Core | 1.293027 | 103.852643 | 3 | Hotel | Shopping Mall | Japanese Restaurant | Coffee Shop | French Restaurant | Steakhouse | Event Space | Concert Hall | Bookstore | Cocktail Bar |
| Raffles Place | 莱佛士坊 | Downtown Core | 1.283542 | 103.85146 | 3 | Hotel | Café | Food Court | Gym | Cocktail Bar | Coffee Shop | Sandwich Place | Salad Place | Waterfront | Shopping Mall |
| Marina Bay | 滨海湾 | Downtown Core, Straits View Planning Area | 1.275559 | 103.854897 | 4 | Yoga Studio | Harbor / Marina | Spanish Restaurant | Plaza | Building | Gastropub | Mexican Restaurant | Government Building | Seafood Restaurant | Bus Line |
| Tampines | 淡滨尼 | Tampines | 1.354653 | 103.943571 | 0 | Bakery | Café | Coffee Shop | Fast Food Restaurant | Supermarket | Gym | Sushi Restaurant | Chinese Restaurant | Asian Restaurant | Japanese Restaurant |


We can see that Chinese Restaurant is the most common venue at several MRT stations, and all of them belong to Cluster 0.
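Building a table such as *chinese_restaurants.csv* amounts to filtering on the first-ranked venue. Here is a minimal sketch, assuming the column names shown in the tables in this section; `top_venue_is` is a hypothetical helper, and the three toy rows are taken from the station table above.

```python
import pandas as pd

def top_venue_is(stations, category="Chinese Restaurant"):
    """Keep only the stations whose 1st most common venue is `category`."""
    mask = stations["1st Most Common Venue"] == category
    return stations[mask].reset_index(drop=True)

# Three rows taken from the station table above
toy = pd.DataFrame({
    "Station name": ["Jurong East", "Bishan", "Newton"],
    "Cluster Labels": [0, 0, 0],
    "1st Most Common Venue": ["Chinese Restaurant", "Coffee Shop", "Chinese Restaurant"],
})
chinese = top_venue_is(toy)
```

Writing the result with `chinese.to_csv('chinese_restaurants.csv', index=False)` would also avoid the extra `Unnamed: 0` index columns that appear when such CSVs are read back.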

```python
chinese = pd.read_csv('chinese_restaurants.csv')
chinese.loc[:, ~chinese.columns.str.startswith('Unnamed')]  # hide leftover index columns
```

| Station name | Station name in Chinese | Location | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Jurong East | 裕廊东 | Jurong East | 1.333115 | 103.742297 | 0 | Chinese Restaurant | Coffee Shop | Food Court | Café | Japanese Restaurant | Shopping Mall | Steakhouse | Multiplex | Bubble Tea Shop | Clothing Store |
| Newton | 纽顿 | Newton | 1.313183 | 103.83804 | 0 | Chinese Restaurant | Seafood Restaurant | Italian Restaurant | Hotel Bar | Hotel | Convenience Store | Grocery Store | Gym / Fitness Center | Noodle House | Thai Restaurant |
| HarbourFront | 港湾 | Bukit Merah | 1.265395 | 103.822403 | 0 | Chinese Restaurant | Japanese Restaurant | Fast Food Restaurant | Toy / Game Store | Clothing Store | Multiplex | Coffee Shop | Bakery | Malay Restaurant | Noodle House |
| Chinatown | 牛车水 | Outram | 1.283737 | 103.843798 | 0 | Chinese Restaurant | Food Court | Hostel | Vegetarian / Vegan Restaurant | Italian Restaurant | Spa | Café | French Restaurant | Japanese Restaurant | Beer Garden |


The clustered stations in *singapore_stations.csv* can be visualised on a map, as below:

![](http://localhost:8805/files/Documents/Data_Science/Capstone/ChinatownMRT.png?_xsrf=2%7Cfe3111f6%7Ca59586ebef64e1e77c9188b4f9d74857%7C1588930656)

<center><b>Clustering results visualisation (Chinatown)</b></center>

### Discussion
---
From the final dataframe, *chinese_restaurants.csv*, we can see that **Jurong East** is a place where people go shopping and dining. On the map it is far from the city area, so it is likely a densely populated residential area (which is indeed the case).

Cluster 3 is the city area, where venues are closer to one another and where most of the tourist attractions are. **Chinatown**, where Chinese Restaurant is the top venue, is also located near this area.

It might seem obvious to select Chinatown as the location for the new Chinese restaurant, and it is indeed a place where many popular Chinese restaurants are located. However, the city area requires more investment, and *chinese_restaurants.csv* shows that a Chinese restaurant in Chinatown would face more competition, not only from other Chinese restaurants but also from food courts and other Asian restaurants.

Therefore, I would apply *the Blue Ocean Strategy* and select **Jurong East** as the location for a neighborhood Chinese restaurant. The area around the **Jurong East** MRT interchange is residential with many shopping venues, Chinese restaurants are popular there, and it is far from the competition in the city area.

![](http://localhost:8805/files/Documents/Data_Science/Capstone/JurongEastMRT.png?_xsrf=2%7Cfe3111f6%7Ca59586ebef64e1e77c9188b4f9d74857%7C1588930656)

<center><b>Clustering results visualisation (Jurong East)</b></center>

### Conclusion
---
In this project, the list of MRT stations in Singapore was gathered from [Wikipedia](https://en.wikipedia.org/wiki/List_of_Singapore_MRT_stations) using **pandas.read_html**, the interchange stations were extracted, and their coordinates were found. With this information, the **Foursquare** API was called to explore the 200 m radius around each MRT interchange station in Singapore, and the top 10 most common venues for each station were identified. k-means clustering was then applied to the station venue data, the stations where Chinese Restaurant is the most common venue were filtered into a new dataframe, and the results were visualised on a map plotted with **folium**. Based on this analysis, I conclude that the new Chinese restaurant should be located around **Jurong East** station.