# Capstone Project - The Battle of the Neighborhoods

### Applied Data Science Capstone by IBM/Coursera

## Table of Contents

* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Result and Discussion](#result)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

In this project, we investigate whether the street-level character of a neighborhood affects real estate prices in **Taipei**, Taiwan.

Since Taipei is a **mixed residential and commercial city**, there is no obvious boundary between commercial and residential areas. In addition, people living in Taipei tend to rent or buy homes near the city center.

To find out whether neighborhood character and real estate prices are related, we use clustering to group districts with similar venues and compare those groups with the real estate price data.

## Data <a name="data"></a>

Based on our business problem, the factors we need are:
* The neighborhood character of each district
* The real estate price distribution

The data sources for this research are:
* Neighborhood character is extracted from the **Foursquare API**
* The districts of Taipei are extracted from **Wikipedia**
* The coordinates of each district and the real estate price distribution come from the **government's open data platform**

### Get a table of all districts in Taipei

In [1]:
# Import Libraries
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup
import json
import requests
from geopy.geocoders import Nominatim
from sklearn.cluster import KMeans
import folium
import matplotlib.cm as cm
import matplotlib.colors as colors

In [2]:
# Get the data from wiki
wiki_url = requests.get('https://en.wikipedia.org/wiki/Postal_codes_in_Taiwan').text

# Parse the html data using BeautifulSoup
soup = BeautifulSoup(wiki_url, 'html.parser')

# Find all tables
tables = soup.find_all('table')

taipei_postalcode = tables[1] # The Taipei table is at index 1 (the second table on the page)
taipei_postalcode

<table class="wikitable">
<tbody><tr>
<th>Code</th>
<th>Division name</th>
<th>Chinese
</th></tr>
<tr>
<th colspan="3"><a href="/wiki/Taipei" title="Taipei">Taipei City</a>
</th></tr>
<tr>
<td>100</td>
<td><a href="/wiki/Zhongzheng_District" title="Zhongzheng District">Zhongzheng District</a></td>
<td>中正區
</td></tr>
<tr>
<td>103</td>
<td><a href="/wiki/Datong_District,_Taipei" title="Datong District, Taipei">Datong District</a></td>
<td>大同區
</td></tr>
<tr>
<td>104</td>
<td><a href="/wiki/Zhongshan_District,_Taipei" title="Zhongshan District, Taipei">Zhongshan District</a></td>
<td>中山區
</td></tr>
<tr>
<td>105</td>
<td><a href="/wiki/Songshan_District,_Taipei" title="Songshan District, Taipei">Songshan District</a></td>
<td>松山區
</td></tr>
<tr>
<td>106</td>
<td><a href="/wiki/Daan_District,_Taipei" title="Daan District, Taipei">Daan District</a></td>
<td>大安區
</td></tr>
<tr>
<td>108</td>
<td><a href="/wiki/Wanhua_District" title="Wanhua District">Wanhua District</a></td>
<td>萬華區
</td></tr>

In [3]:
# Create a dataframe
columns = ['PostalCode', 'Division', 'Chinese_Name']
df_taipei = pd.DataFrame(columns = columns)
df_taipei

Unnamed: 0,PostalCode,Division,Chinese_Name


In [4]:
rows = []
for row in taipei_postalcode.find_all('tr'):
    col = row.find_all('td')
    if col:  # header rows contain only <th> cells, so they are skipped
        rows.append({'PostalCode': col[0].text.strip(),
                     'Division': col[1].text.strip(),
                     'Chinese_Name': col[2].text.strip()})

# DataFrame.append was removed in pandas 2.0, so build the frame from the collected rows
df_taipei = pd.DataFrame(rows, columns = columns)
df_taipei

Unnamed: 0,PostalCode,Division,Chinese_Name
0,100,Zhongzheng District,中正區
1,103,Datong District,大同區
2,104,Zhongshan District,中山區
3,105,Songshan District,松山區
4,106,Daan District,大安區
5,108,Wanhua District,萬華區
6,110,Xinyi District,信義區
7,111,Shilin District,士林區
8,112,Beitou District,北投區
9,114,Neihu District,內湖區


We get all 12 districts of Taipei City.

### The coordinate of the center of each district

In [5]:
# Load coordinate data
coor_path = '/Users/Brian/Python/IBM Data Science Certificate/Capstone_Project/tp_coor_data.csv'

df_tp = pd.read_csv(coor_path)
df_tp

Unnamed: 0,PostalCode,Division,Chinese_Name,Latitude,Longitude
0,100,Zhongzheng District,中正區,25.032405,121.519884
1,103,Datong District,大同區,25.063424,121.513042
2,104,Zhongshan District,中山區,25.069699,121.53816
3,105,Songshan District,松山區,25.059991,121.557588
4,106,Daan District,大安區,25.02677,121.543445
5,108,Wanhua District,萬華區,25.02859,121.497986
6,110,Xinyi District,信義區,25.030621,121.57167
7,111,Shilin District,士林區,25.125467,121.550847
8,112,Beitou District,北投區,25.148068,121.517799
9,114,Neihu District,內湖區,25.083706,121.592383


### Generate a map of Taipei City

Use the Geopy library to get the coordinates of Taipei

In [6]:
address = 'Taipei, Taiwan'

geolocator = Nominatim(user_agent = 'tpe_explorer')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The coordinates of Taipei are ({}, {})'.format(latitude, longitude))

The coordinates of Taipei are (25.0375198, 121.5636796)
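
geopy's `geocode` returns `None` when the service finds no match, in which case `location.latitude` raises an `AttributeError`. A minimal sketch of a guard, assuming a hypothetical helper `geocode_or_default` and stand-in geocoders so that it runs offline (neither is part of the original notebook):

```python
from types import SimpleNamespace

def geocode_or_default(geocode, address, default):
    """Return (latitude, longitude) from a geocode callable, or `default` if nothing is found."""
    location = geocode(address)
    if location is None:
        return default
    return (location.latitude, location.longitude)

# Fallback: the coordinate pair obtained for Taipei in the cell above
fallback = (25.0375198, 121.5636796)

# Stand-in geocoders (hypothetical) so the sketch needs no network access
def no_match(address):
    return None

def match(address):
    return SimpleNamespace(latitude=25.0, longitude=121.5)

print(geocode_or_default(no_match, 'Taipei, Taiwan', fallback))
print(geocode_or_default(match, 'Taipei, Taiwan', fallback))
```

In the notebook, the call would look like `geocode_or_default(geolocator.geocode, address, fallback)`.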


Create a map of Taipei with divisions on top

In [7]:
# Create map using coordinates of Taipei
map_tpe = folium.Map(location = [latitude, longitude], zoom_start = 11)

# Add markers on the map
for lat, lng, division in zip(df_tp['Latitude'], df_tp['Longitude'], df_tp['Division']):
    label = '{}'.format(division)
    label = folium.Popup(label, parse_html = True)
    folium.CircleMarker([lat, lng], radius = 5, popup = label, color = 'blue', fill = True, fill_color = '#3186cc', fill_opacity = 0.7, parse_html = False).add_to(map_tpe)
    
map_tpe

### Real estate price distribution data

In [41]:
# Import the 2021H1 realestate table of Taipei City
realestate_path = '/Users/Brian/Python/IBM Data Science Certificate/Capstone_Project/2021_H1_realestate_data.csv'

df_tp_realestate = pd.read_csv(realestate_path)
df_tp_realestate

Unnamed: 0,PostalCode,Division,Chinese_Name,500,501~1000,1001~1500,1501~2000,2001~2500,2501~3000,3001~4000,4001~5000,5001~7000,7001~9000,9001~12000,12001
0,100,Zhongzheng District,中正區,18,51,80,60,59,40,52,22,24,10,3,7
1,103,Datong District,大同區,31,71,62,50,37,25,21,13,12,1,1,1
2,104,Zhongshan District,中山區,57,219,245,151,119,75,66,45,63,14,13,21
3,105,Songshan District,松山區,12,45,81,93,76,44,64,31,12,4,1,2
4,106,Daan District,大安區,16,31,92,92,99,89,141,61,71,23,12,10
5,108,Wanhua District,萬華區,56,128,89,72,31,21,26,14,1,1,0,1
6,110,Xinyi District,信義區,22,38,100,122,90,54,42,17,22,6,7,17
7,111,Shilin District,士林區,23,64,143,124,61,47,43,24,31,2,2,7
8,112,Beitou District,北投區,30,125,142,108,55,35,50,26,21,3,3,1
9,114,Neihu District,內湖區,6,106,223,215,130,96,58,56,75,12,8,14


## Methodology <a name="methodology"></a>

First, we use the Foursquare API to characterize the neighborhood of each district in Taipei, using the relative frequency of each venue category as the feature.

We then apply k-means clustering to these features to find districts with similar neighborhoods.

Next, we apply k-means clustering to the real estate price distribution to obtain a second set of clusters.

Finally, we compare the two sets of clusters to find districts that are similar in both.
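
One caveat when comparing two clusterings: k-means assigns cluster IDs arbitrarily, so "Cluster 0" in one result has no relation to "Cluster 0" in the other. The comparison has to be based on which districts are grouped together, not on the label values. A minimal sketch with made-up labels (the two label lists are illustrative, not results from this notebook), using `pandas.crosstab` and scikit-learn's `adjusted_rand_score`:

```python
import pandas as pd
from sklearn.metrics import adjusted_rand_score

# Hypothetical labels for four districts, for illustration only
venue_labels = [0, 0, 1, 1]   # clustering on venue frequencies
price_labels = [1, 1, 0, 0]   # clustering on the price distribution

# Contingency table: rows are venue clusters, columns are price clusters
table = pd.crosstab(pd.Series(venue_labels, name='venue'),
                    pd.Series(price_labels, name='price'))
print(table)

# The adjusted Rand index ignores the label values; 1.0 means identical groupings
ari = adjusted_rand_score(venue_labels, price_labels)
print(ari)
```

Here the two label lists look different but describe the same grouping, so the index is 1.0.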

## Analysis <a name="analysis"></a>

### Explore the neighborhood of each district

In [8]:
import os
# Credentials are read from the environment instead of being hard-coded (variable names are placeholders)
Client_ID = os.environ['FOURSQUARE_CLIENT_ID']
Client_Secret = os.environ['FOURSQUARE_CLIENT_SECRET']
Version = '20190425'

Create a function to explore neighborhoods in Taipei.

In [9]:
def getNearbyVenues(names, latitudes, longitudes, radius = 500):
    
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
        
        # Create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            Client_ID,
            Client_Secret,
            Version,
            lat,
            lng,
            radius,
            Limit)
        
        # Make a GET request
        result = requests.get(url).json()['response']['groups'][0]['items']
        
        # Return only relevant information for each nearby venue
        venues_list.append([(name,
                             lat,
                             lng,
                             v['venue']['name'],
                             v['venue']['location']['lat'],
                             v['venue']['location']['lng'],
                             v['venue']['categories'][0]['name']) for v in result])
    
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Division', 'Division Latitude', 'Division Longitude', 'Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category']
    
    return (nearby_venues)
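
The function above indexes `v['venue']['categories'][0]` directly, which would raise an `IndexError` for any venue whose category list is empty. Whether the API actually returns such venues is an assumption here. A sketch that separates the parsing from the HTTP call and skips uncategorized venues, using a hypothetical helper `extract_venue_rows` and hand-written sample items in the shape the function above reads:

```python
def extract_venue_rows(name, lat, lng, items):
    """Flatten 'explore' items into tuples, skipping venues without a category."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        if not categories:
            continue  # nothing to one-hot encode later, so skip this venue
        rows.append((name, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     categories[0]['name']))
    return rows

# Hand-written sample, not real API output
sample_items = [
    {'venue': {'name': 'Sample Cafe',
               'location': {'lat': 25.03, 'lng': 121.52},
               'categories': [{'name': 'Café'}]}},
    {'venue': {'name': 'Uncategorized Place',
               'location': {'lat': 25.04, 'lng': 121.53},
               'categories': []}},
]

rows = extract_venue_rows('Zhongzheng District', 25.032405, 121.519884, sample_items)
print(rows)
```

Keeping the parsing in its own function also makes it testable without network access or API credentials.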

In [10]:
Limit = 100
taipei_venues = getNearbyVenues(names = df_tp['Division'],
                                 latitudes = df_tp['Latitude'],
                                 longitudes = df_tp['Longitude'])

Zhongzheng District
Datong District
Zhongshan District
Songshan District
Daan District
Wanhua District
Xinyi District
Shilin District
Beitou District
Neihu District
Nangang District
Wenshan District


In [53]:
taipei_venues

Unnamed: 0,Division,Division Latitude,Division Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Zhongzheng District,25.032405,121.519884,Kinfen Braised Pork Rice (金峰魯肉飯),25.032194,121.518534,Taiwanese Restaurant
1,Zhongzheng District,25.032405,121.519884,虎記商行,25.031744,121.519284,Café
2,Zhongzheng District,25.032405,121.519884,Chiang Kai-Shek Memorial Hall (中正紀念堂),25.034555,121.521835,Monument / Landmark
3,Zhongzheng District,25.032405,121.519884,National Theater (國家戲劇院),25.035197,121.518188,Theater
4,Zhongzheng District,25.032405,121.519884,臻味赤肉胡椒餅 烤地瓜,25.033022,121.518246,Bakery
...,...,...,...,...,...,...,...
242,Wenshan District,24.988579,121.573608,581生活館,24.987399,121.576514,Grocery Store
243,Wenshan District,24.988579,121.573608,玉口香扁食園 (政大店),24.987354,121.576854,Noodle House
244,Wenshan District,24.988579,121.573608,木新路(北平)餡餅粥,24.986048,121.570673,Chinese Restaurant
245,Wenshan District,24.988579,121.573608,道南河堤公園,24.992927,121.573856,Playground


Analyze each neighborhood

In [13]:
# Onehot encoding
tpe_onehot = pd.get_dummies(taipei_venues[['Venue Category']], prefix = '', prefix_sep = '')

# Add division column back to df
tpe_onehot['Division'] = taipei_venues['Division']

# Move division column to the first column
fixed_columns = [tpe_onehot.columns[-1]] + list(tpe_onehot.columns[:-1])
tpe_onehot = tpe_onehot[fixed_columns]

tpe_onehot.head(10)

Unnamed: 0,Division,Airport Terminal,American Restaurant,Art Museum,Arts & Crafts Store,Asian Restaurant,BBQ Joint,Bagel Shop,Bakery,Bar,...,Southern / Soul Food Restaurant,Sporting Goods Shop,Supermarket,Sushi Restaurant,Taiwanese Restaurant,Tea Room,Theater,Trail,Tunnel,Vegetarian / Vegan Restaurant
0,Zhongzheng District,0,0,0,0,0,0,0,0,0,...,0,0,0,0,1,0,0,0,0,0
1,Zhongzheng District,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Zhongzheng District,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Zhongzheng District,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,1,0,0,0
4,Zhongzheng District,0,0,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
5,Zhongzheng District,0,0,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
6,Zhongzheng District,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
7,Zhongzheng District,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
8,Zhongzheng District,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
9,Zhongzheng District,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


Group rows by division and take the mean of frequency of occurrence of each category

In [14]:
tpe_grouped = tpe_onehot.groupby('Division').mean().reset_index()
print(tpe_grouped.shape)
tpe_grouped

(11, 80)


Unnamed: 0,Division,Airport Terminal,American Restaurant,Art Museum,Arts & Crafts Store,Asian Restaurant,BBQ Joint,Bagel Shop,Bakery,Bar,...,Southern / Soul Food Restaurant,Sporting Goods Shop,Supermarket,Sushi Restaurant,Taiwanese Restaurant,Tea Room,Theater,Trail,Tunnel,Vegetarian / Vegan Restaurant
0,Daan District,0.0,0.0,0.025641,0.0,0.025641,0.0,0.0,0.025641,0.0,...,0.0,0.0,0.0,0.0,0.076923,0.025641,0.0,0.0,0.0,0.025641
1,Datong District,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,...,0.0,0.0,0.04,0.0,0.16,0.08,0.0,0.0,0.0,0.0
2,Nangang District,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Neihu District,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.074074,0.0,0.0,0.0,0.0
4,Shilin District,0.0,0.0,0.0,0.0,0.666667,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0
5,Songshan District,0.018519,0.0,0.0,0.018519,0.0,0.0,0.0,0.037037,0.0,...,0.0,0.018519,0.0,0.0,0.055556,0.018519,0.0,0.0,0.0,0.0
6,Wanhua District,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,...,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Wenshan District,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,...,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Xinyi District,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.05,0.0,...,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.05,0.0
9,Zhongshan District,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,...,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0
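
To make the grouping step concrete: the mean of a one-hot column within a district is the share of that district's venues belonging to the category. A toy sketch with made-up rows (the district names and venues are illustrative; only the column names follow `taipei_venues`):

```python
import pandas as pd

# Made-up venue list with the same column names as taipei_venues
toy_venues = pd.DataFrame({
    'Division': ['X District', 'X District', 'X District', 'X District', 'Y District'],
    'Venue Category': ['Café', 'Café', 'Bakery', 'Park', 'Park'],
})

# One-hot encode the category, then average the indicator columns per district
onehot = pd.get_dummies(toy_venues['Venue Category'], dtype=int)
onehot.insert(0, 'Division', toy_venues['Division'])
freq = onehot.groupby('Division').mean()
print(freq)
```

Each row of `freq` sums to 1. The frequencies are coarse when a district has few venues: values such as 0.666667 and 0.333333 for Shilin in the table above imply only three venues were returned for that district.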


Print each Division along with the top 5 most common venues

In [15]:
num_top_venues = 5

for hood in tpe_grouped['Division']:
    print('----' +hood+ '----')
    temp = tpe_grouped[tpe_grouped['Division'] == hood].T.reset_index()
    temp.columns = ['venue', 'freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending = False).reset_index(drop = True).head(num_top_venues))
    print('\n')

----Daan District----
                  venue  freq
0           Coffee Shop  0.13
1                  Café  0.10
2          Noodle House  0.08
3  Taiwanese Restaurant  0.08
4        Breakfast Spot  0.05


----Datong District----
                  venue  freq
0  Taiwanese Restaurant  0.16
1     Convenience Store  0.12
2          Dessert Shop  0.08
3    Chinese Restaurant  0.08
4          Noodle House  0.08


----Nangang District----
                 venue  freq
0    Convenience Store  0.50
1          Bus Station  0.25
2          Supermarket  0.25
3     Airport Terminal  0.00
4  Monument / Landmark  0.00


----Neihu District----
                 venue  freq
0  Japanese Restaurant  0.15
1    Convenience Store  0.15
2             Tea Room  0.07
3          Coffee Shop  0.07
4    Korean Restaurant  0.04


----Shilin District----
                   venue  freq
0       Asian Restaurant  0.67
1                  Trail  0.33
2       Airport Terminal  0.00
3          Metro Station  0.00
4  Performi

In [16]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending = False)
    
    return row_categories_sorted.index.values[0: num_top_venues]

In [17]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# Create columns according to number of top venues
columns = ['Division']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # ranks beyond 3rd take the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))
        
# Create a new dataframe
tpe_division_sorted = pd.DataFrame(columns = columns)
tpe_division_sorted['Division'] = tpe_grouped['Division']

for ind in np.arange(tpe_grouped.shape[0]):
    tpe_division_sorted.iloc[ind, 1:] = return_most_common_venues(tpe_grouped.iloc[ind, :], num_top_venues)
    
tpe_division_sorted

Unnamed: 0,Division,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Daan District,Coffee Shop,Café,Noodle House,Taiwanese Restaurant,Breakfast Spot,Farmers Market,Chinese Breakfast Place,Pizza Place,Fast Food Restaurant,Hotpot Restaurant
1,Datong District,Taiwanese Restaurant,Convenience Store,Dessert Shop,Chinese Restaurant,Noodle House,Tea Room,Coffee Shop,Bus Stop,Bus Station,Night Market
2,Nangang District,Convenience Store,Bus Station,Supermarket,Airport Terminal,Monument / Landmark,Performing Arts Venue,Pastry Shop,Park,Noodle House,Night Market
3,Neihu District,Japanese Restaurant,Convenience Store,Tea Room,Coffee Shop,Korean Restaurant,Department Store,Pharmacy,Italian Restaurant,Chinese Restaurant,Farmers Market
4,Shilin District,Asian Restaurant,Trail,Airport Terminal,Metro Station,Performing Arts Venue,Pastry Shop,Park,Noodle House,Night Market,Mountain
5,Songshan District,Café,Convenience Store,Dumpling Restaurant,Italian Restaurant,Taiwanese Restaurant,Furniture / Home Store,Noodle House,Park,Bakery,Clothing Store
6,Wanhua District,Convenience Store,Coffee Shop,Bus Station,Bakery,Supermarket,Airport Terminal,Mountain,Performing Arts Venue,Pastry Shop,Park
7,Wenshan District,Convenience Store,Coffee Shop,Chinese Restaurant,Noodle House,Café,Japanese Restaurant,Burger Joint,Breakfast Spot,Playground,Italian Restaurant
8,Xinyi District,Park,Scenic Lookout,Chinese Restaurant,Mountain,Convenience Store,Baseball Field,Coffee Shop,Café,Southern / Soul Food Restaurant,Tunnel
9,Zhongshan District,Convenience Store,Fish Market,Farmers Market,BBQ Joint,Sushi Restaurant,Market,Scenic Lookout,Soup Place,Seafood Restaurant,Monument / Landmark
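
Note that districts with few venues are padded with zero-frequency categories in the table above: Nangang's 4th "most common" venue is Airport Terminal, which has a frequency of 0.00. A sketch of one way to avoid this, assuming a hypothetical helper `top_nonzero_venues` that keeps only categories that actually occur and pads the rest with `None`:

```python
import pandas as pd

def top_nonzero_venues(freqs, num_top_venues):
    """Return up to num_top_venues category names with frequency > 0, most frequent first."""
    nonzero = freqs[freqs > 0].sort_values(ascending=False)
    names = list(nonzero.index[:num_top_venues])
    # Pad with None so every district still yields the same number of columns
    return names + [None] * (num_top_venues - len(names))

# Frequencies printed for Nangang District above, plus two zero-frequency categories
nangang = pd.Series({
    'Airport Terminal': 0.0,
    'Bus Station': 0.25,
    'Convenience Store': 0.50,
    'Monument / Landmark': 0.0,
    'Supermarket': 0.25,
})

top5 = top_nonzero_venues(nangang, 5)
print(top5)
```

Only three categories survive the filter here, so the last two slots are `None` rather than arbitrary categories that never occur in the district.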


### Cluster Districts

In [18]:
# Set number of clusters
k = 5

tpe_cluster = tpe_grouped.drop('Division', axis = 1)

# Run k-means clustering
kmeans = KMeans(n_clusters = k, random_state = 0).fit(tpe_cluster)

In [19]:
# Add cluster labels
tpe_division_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

tpe_merged = df_tp

# Merge the 2 dataframe
tpe_merged = tpe_merged.join(tpe_division_sorted.set_index('Division'), on = 'Division')

# Drop the rows without cluster labels
tpe_merged = tpe_merged.dropna()
tpe_merged['Cluster Labels'] = tpe_merged['Cluster Labels'].astype(int)
tpe_merged

Unnamed: 0,PostalCode,Division,Chinese_Name,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,100,Zhongzheng District,中正區,25.032405,121.519884,0,Café,Dumpling Restaurant,Breakfast Spot,Theater,Bakery,History Museum,Plaza,Concert Hall,Dim Sum Restaurant,Performing Arts Venue
1,103,Datong District,大同區,25.063424,121.513042,3,Taiwanese Restaurant,Convenience Store,Dessert Shop,Chinese Restaurant,Noodle House,Tea Room,Coffee Shop,Bus Stop,Bus Station,Night Market
2,104,Zhongshan District,中山區,25.069699,121.53816,2,Convenience Store,Fish Market,Farmers Market,BBQ Joint,Sushi Restaurant,Market,Scenic Lookout,Soup Place,Seafood Restaurant,Monument / Landmark
3,105,Songshan District,松山區,25.059991,121.557588,0,Café,Convenience Store,Dumpling Restaurant,Italian Restaurant,Taiwanese Restaurant,Furniture / Home Store,Noodle House,Park,Bakery,Clothing Store
4,106,Daan District,大安區,25.02677,121.543445,0,Coffee Shop,Café,Noodle House,Taiwanese Restaurant,Breakfast Spot,Farmers Market,Chinese Breakfast Place,Pizza Place,Fast Food Restaurant,Hotpot Restaurant
5,108,Wanhua District,萬華區,25.02859,121.497986,4,Convenience Store,Coffee Shop,Bus Station,Bakery,Supermarket,Airport Terminal,Mountain,Performing Arts Venue,Pastry Shop,Park
6,110,Xinyi District,信義區,25.030621,121.57167,3,Park,Scenic Lookout,Chinese Restaurant,Mountain,Convenience Store,Baseball Field,Coffee Shop,Café,Southern / Soul Food Restaurant,Tunnel
7,111,Shilin District,士林區,25.125467,121.550847,1,Asian Restaurant,Trail,Airport Terminal,Metro Station,Performing Arts Venue,Pastry Shop,Park,Noodle House,Night Market,Mountain
9,114,Neihu District,內湖區,25.083706,121.592383,3,Japanese Restaurant,Convenience Store,Tea Room,Coffee Shop,Korean Restaurant,Department Store,Pharmacy,Italian Restaurant,Chinese Restaurant,Farmers Market
10,115,Nangang District,南港區,25.036009,121.609757,4,Convenience Store,Bus Station,Supermarket,Airport Terminal,Monument / Landmark,Performing Arts Venue,Pastry Shop,Park,Noodle House,Night Market
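
The value `k = 5` above is fixed by hand. One common way to sanity-check such a choice is the silhouette score, which is not part of the original analysis. The sketch below uses a small synthetic matrix as a stand-in for `tpe_cluster`; the data points are invented for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for the feature matrix: three tight groups of four points each
offsets = np.array([[0.0, 0.0], [0.5, 0.0], [0.0, 0.5], [0.5, 0.5]])
centers = np.array([[0.0, 0.0], [10.0, 10.0], [20.0, 0.0]])
X = np.vstack([center + offsets for center in centers])

# Fit k-means for several values of k and record the mean silhouette score
scores = {}
for n_clusters in range(2, 7):
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(X)
    scores[n_clusters] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(scores)
print('best k:', best_k)
```

With only 11 districts in `tpe_grouped`, the score is defined for k between 2 and 10 and is likely to be noisy, so it is better treated as a rough guide than as a decision rule.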


Plot the clustered districts on the map

In [20]:
# Create map
tpe_map_clusters = folium.Map(location = [latitude, longitude], zoom_start = 11)

# Set color scheme for clusters
x = np.arange(k)
ys = [i + x + (i*x)**2 for i in range(k)]
color_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in color_array]

# Add marker on the map
markers_color = []
for lat, lon, poi, cluster in zip(tpe_merged['Latitude'], tpe_merged['Longitude'], tpe_merged['Division'], tpe_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' ,Cluster ' + str(cluster), parse_html = True)
    folium.CircleMarker([lat, lon], radius = 5, popup = label, color = rainbow[cluster], fill = True, fill_color = rainbow[cluster], fill_opacity = 0.7).add_to(tpe_map_clusters)
    
tpe_map_clusters

### Explore the feature category of each cluster

In [21]:
tpe_merged.loc[tpe_merged['Cluster Labels'] == 0, tpe_merged.columns[[1] + list(range(6, tpe_merged.shape[1]))]]

Unnamed: 0,Division,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Zhongzheng District,Café,Dumpling Restaurant,Breakfast Spot,Theater,Bakery,History Museum,Plaza,Concert Hall,Dim Sum Restaurant,Performing Arts Venue
3,Songshan District,Café,Convenience Store,Dumpling Restaurant,Italian Restaurant,Taiwanese Restaurant,Furniture / Home Store,Noodle House,Park,Bakery,Clothing Store
4,Daan District,Coffee Shop,Café,Noodle House,Taiwanese Restaurant,Breakfast Spot,Farmers Market,Chinese Breakfast Place,Pizza Place,Fast Food Restaurant,Hotpot Restaurant


We've got Zhongzheng, Songshan and Daan District in Cluster 0

In [22]:
tpe_merged.loc[tpe_merged['Cluster Labels'] == 1, tpe_merged.columns[[1] + list(range(6, tpe_merged.shape[1]))]]

Unnamed: 0,Division,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
7,Shilin District,Asian Restaurant,Trail,Airport Terminal,Metro Station,Performing Arts Venue,Pastry Shop,Park,Noodle House,Night Market,Mountain


We've got Shilin District in Cluster 1

In [23]:
tpe_merged.loc[tpe_merged['Cluster Labels'] == 2, tpe_merged.columns[[1] + list(range(6, tpe_merged.shape[1]))]]

Unnamed: 0,Division,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Zhongshan District,Convenience Store,Fish Market,Farmers Market,BBQ Joint,Sushi Restaurant,Market,Scenic Lookout,Soup Place,Seafood Restaurant,Monument / Landmark


We've got Zhongshan District in Cluster 2

In [24]:
tpe_merged.loc[tpe_merged['Cluster Labels'] == 3, tpe_merged.columns[[1] + list(range(6, tpe_merged.shape[1]))]]

Unnamed: 0,Division,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Datong District,Taiwanese Restaurant,Convenience Store,Dessert Shop,Chinese Restaurant,Noodle House,Tea Room,Coffee Shop,Bus Stop,Bus Station,Night Market
6,Xinyi District,Park,Scenic Lookout,Chinese Restaurant,Mountain,Convenience Store,Baseball Field,Coffee Shop,Café,Southern / Soul Food Restaurant,Tunnel
9,Neihu District,Japanese Restaurant,Convenience Store,Tea Room,Coffee Shop,Korean Restaurant,Department Store,Pharmacy,Italian Restaurant,Chinese Restaurant,Farmers Market
11,Wenshan District,Convenience Store,Coffee Shop,Chinese Restaurant,Noodle House,Café,Japanese Restaurant,Burger Joint,Breakfast Spot,Playground,Italian Restaurant


We've got Datong, Xinyi, Neihu and Wenshan District in Cluster 3

In [25]:
tpe_merged.loc[tpe_merged['Cluster Labels'] == 4, tpe_merged.columns[[1] + list(range(6, tpe_merged.shape[1]))]]

Unnamed: 0,Division,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Wanhua District,Convenience Store,Coffee Shop,Bus Station,Bakery,Supermarket,Airport Terminal,Mountain,Performing Arts Venue,Pastry Shop,Park
10,Nangang District,Convenience Store,Bus Station,Supermarket,Airport Terminal,Monument / Landmark,Performing Arts Venue,Pastry Shop,Park,Noodle House,Night Market


We've got Wanhua and Nangang District in Cluster 4

### Compare with the real estate price distribution

In [42]:
# Count the transactions in each district; slicing to the 12 price-bin columns keeps the cell safe to re-run
df_tp_realestate['Division_num'] = df_tp_realestate.iloc[:, 3:15].sum(axis = 1)

In [43]:
# Representative price of each bin: the midpoint of its range (500 and 12001 for the first and last bins)
bin_prices = np.array([500, (501+1000)/2, (1001+1500)/2, (1501+2000)/2,
                       (2001+2500)/2, (2501+3000)/2, (3001+4000)/2, (4001+5000)/2,
                       (5001+7000)/2, (7001+9000)/2, (9001+12000)/2, 12001])
# Weight each bin's transaction count by its representative price and sum per district
df_tp_realestate['Division_total_price'] = df_tp_realestate.iloc[:, 3:15].mul(bin_prices, axis = 1).sum(axis = 1)

In [44]:
df_tp_realestate['Division_avg_price'] = df_tp_realestate['Division_total_price'] / df_tp_realestate['Division_num']

In [45]:
df_tp_realestate

Unnamed: 0,PostalCode,Division,Chinese_Name,500,501~1000,1001~1500,1501~2000,2001~2500,2501~3000,3001~4000,4001~5000,5001~7000,7001~9000,9001~12000,12001,Division_num,Division_total_price,Division_avg_price
0,100,Zhongzheng District,中正區,18,51,80,60,59,40,52,22,24,10,3,7,426,1115707.5,2619.03169
1,103,Datong District,大同區,31,71,62,50,37,25,21,13,12,1,1,1,325,620397.5,1908.915385
2,104,Zhongshan District,中山區,57,219,245,151,119,75,66,45,63,14,13,21,1088,2549776.0,2343.544118
3,105,Songshan District,松山區,12,45,81,93,76,44,64,31,12,4,1,2,465,1097977.5,2361.241935
4,106,Daan District,大安區,16,31,92,92,99,89,141,61,71,23,12,10,737,2399115.5,3255.244912
5,108,Wanhua District,萬華區,56,128,89,72,31,21,26,14,1,1,0,1,440,668942.5,1520.323864
6,110,Xinyi District,信義區,22,38,100,122,90,54,42,17,22,6,7,17,537,1410266.0,2626.193669
7,111,Shilin District,士林區,23,64,143,124,61,47,43,24,31,2,2,7,571,1287527.5,2254.864273
8,112,Beitou District,北投區,30,125,142,108,55,35,50,26,21,3,3,1,599,1181035.0,1971.677796
9,114,Neihu District,內湖區,6,106,223,215,130,96,58,56,75,12,8,14,999,2547503.5,2550.053554
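
The average price is a weighted mean: each bin's transaction count is multiplied by a representative price (the bin midpoint, or the boundary value for the `500` and `12001` columns, which appear to be open-ended), and the total is divided by the number of transactions. A worked check for Zhongzheng District, using the counts from row 0 of the table above:

```python
import numpy as np

# Transaction counts per price bin for Zhongzheng District (row 0 of the table above)
counts = np.array([18, 51, 80, 60, 59, 40, 52, 22, 24, 10, 3, 7])

# Representative price per bin: range midpoints, with the first and last bins at 500 and 12001
bin_prices = np.array([500, 750.5, 1250.5, 1750.5, 2250.5, 2750.5,
                       3500.5, 4500.5, 6000.5, 8000.5, 10500.5, 12001])

num = int(counts.sum())
total = float(counts @ bin_prices)
avg = total / num
print(num, total, avg)
```

The three values agree with `Division_num`, `Division_total_price` and `Division_avg_price` for Zhongzheng District. Because the two end bins are represented by single boundary values, the average is only an approximation for districts with many transactions in those bins.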


### Cluster by the real estate price distribution

In [46]:
# Set number of clusters
k = 5

tpe_price_cluster = df_tp_realestate.drop(['PostalCode', 'Division', 'Chinese_Name', 'Division_num', 'Division_total_price', 'Division_avg_price'], axis = 1)

# Run k-means clustering
kmeans = KMeans(n_clusters = k, random_state = 0).fit(tpe_price_cluster)

In [47]:
# Add cluster labels
df_tp_realestate.insert(0, 'Cluster Labels', kmeans.labels_)
df_tp_realestate

Unnamed: 0,Cluster Labels,PostalCode,Division,Chinese_Name,500,501~1000,1001~1500,1501~2000,2001~2500,2501~3000,3001~4000,4001~5000,5001~7000,7001~9000,9001~12000,12001,Division_num,Division_total_price,Division_avg_price
0,2,100,Zhongzheng District,中正區,18,51,80,60,59,40,52,22,24,10,3,7,426,1115707.5,2619.03169
1,2,103,Datong District,大同區,31,71,62,50,37,25,21,13,12,1,1,1,325,620397.5,1908.915385
2,3,104,Zhongshan District,中山區,57,219,245,151,119,75,66,45,63,14,13,21,1088,2549776.0,2343.544118
3,2,105,Songshan District,松山區,12,45,81,93,76,44,64,31,12,4,1,2,465,1097977.5,2361.241935
4,0,106,Daan District,大安區,16,31,92,92,99,89,141,61,71,23,12,10,737,2399115.5,3255.244912
5,2,108,Wanhua District,萬華區,56,128,89,72,31,21,26,14,1,1,0,1,440,668942.5,1520.323864
6,1,110,Xinyi District,信義區,22,38,100,122,90,54,42,17,22,6,7,17,537,1410266.0,2626.193669
7,1,111,Shilin District,士林區,23,64,143,124,61,47,43,24,31,2,2,7,571,1287527.5,2254.864273
8,1,112,Beitou District,北投區,30,125,142,108,55,35,50,26,21,3,3,1,599,1181035.0,1971.677796
9,4,114,Neihu District,內湖區,6,106,223,215,130,96,58,56,75,12,8,14,999,2547503.5,2550.053554
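
The k-means run above is fitted on raw transaction counts, so districts with many transactions (Zhongshan with 1088, Neihu with 999) lie far from the others regardless of the shape of their distribution. This may partly explain why each of them forms a cluster of its own. One option, which is an assumption here and not what the notebook does, is to convert each row to shares before clustering. A sketch with three rows copied from the table above:

```python
import pandas as pd

bins = ['500', '501~1000', '1001~1500', '1501~2000', '2001~2500', '2501~3000',
        '3001~4000', '4001~5000', '5001~7000', '7001~9000', '9001~12000', '12001']

# Three rows copied from the real estate table above
counts = pd.DataFrame(
    [[31, 71, 62, 50, 37, 25, 21, 13, 12, 1, 1, 1],
     [57, 219, 245, 151, 119, 75, 66, 45, 63, 14, 13, 21],
     [16, 31, 92, 92, 99, 89, 141, 61, 71, 23, 12, 10]],
    index=['Datong District', 'Zhongshan District', 'Daan District'],
    columns=bins,
)

# Divide each row by its total so that every district sums to 1
shares = counts.div(counts.sum(axis=1), axis=0)
print(shares.round(3))
```

Fitting `KMeans` on `shares` instead of the raw counts would compare the shape of each district's price distribution rather than its transaction volume.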


### Explore each cluster

In [48]:
df_tp_realestate.loc[df_tp_realestate['Cluster Labels'] == 0, df_tp_realestate.columns[[2] + list(range(6, df_tp_realestate.shape[1]))]]

Unnamed: 0,Division,1001~1500,1501~2000,2001~2500,2501~3000,3001~4000,4001~5000,5001~7000,7001~9000,9001~12000,12001,Division_num,Division_total_price,Division_avg_price
4,Daan District,92,92,99,89,141,61,71,23,12,10,737,2399115.5,3255.244912


In [49]:
df_tp_realestate.loc[df_tp_realestate['Cluster Labels'] == 1, df_tp_realestate.columns[[2] + list(range(6, df_tp_realestate.shape[1]))]]

Unnamed: 0,Division,1001~1500,1501~2000,2001~2500,2501~3000,3001~4000,4001~5000,5001~7000,7001~9000,9001~12000,12001,Division_num,Division_total_price,Division_avg_price
6,Xinyi District,100,122,90,54,42,17,22,6,7,17,537,1410266.0,2626.193669
7,Shilin District,143,124,61,47,43,24,31,2,2,7,571,1287527.5,2254.864273
8,Beitou District,142,108,55,35,50,26,21,3,3,1,599,1181035.0,1971.677796
11,Wenshan District,205,134,85,66,67,21,10,0,0,0,716,1342345.0,1874.78352


In [50]:
df_tp_realestate.loc[df_tp_realestate['Cluster Labels'] == 2, df_tp_realestate.columns[[2] + list(range(6, df_tp_realestate.shape[1]))]]

Unnamed: 0,Division,1001~1500,1501~2000,2001~2500,2501~3000,3001~4000,4001~5000,5001~7000,7001~9000,9001~12000,12001,Division_num,Division_total_price,Division_avg_price
0,Zhongzheng District,80,60,59,40,52,22,24,10,3,7,426,1115707.5,2619.03169
1,Datong District,62,50,37,25,21,13,12,1,1,1,325,620397.5,1908.915385
3,Songshan District,81,93,76,44,64,31,12,4,1,2,465,1097977.5,2361.241935
5,Wanhua District,89,72,31,21,26,14,1,1,0,1,440,668942.5,1520.323864
10,Nangang District,69,55,45,27,33,28,13,10,8,1,322,877660.0,2725.652174


In [51]:
df_tp_realestate.loc[df_tp_realestate['Cluster Labels'] == 3, df_tp_realestate.columns[[2] + list(range(6, df_tp_realestate.shape[1]))]]

Unnamed: 0,Division,1001~1500,1501~2000,2001~2500,2501~3000,3001~4000,4001~5000,5001~7000,7001~9000,9001~12000,12001,Division_num,Division_total_price,Division_avg_price
2,Zhongshan District,245,151,119,75,66,45,63,14,13,21,1088,2549776.0,2343.544118


In [52]:
df_tp_realestate.loc[df_tp_realestate['Cluster Labels'] == 4, df_tp_realestate.columns[[2] + list(range(6, df_tp_realestate.shape[1]))]]

Unnamed: 0,Division,1001~1500,1501~2000,2001~2500,2501~3000,3001~4000,4001~5000,5001~7000,7001~9000,9001~12000,12001,Division_num,Division_total_price,Division_avg_price
9,Neihu District,223,215,130,96,58,56,75,12,8,14,999,2547503.5,2550.053554


## Result and Discussion <a name="result"></a>

Clustering the districts by neighborhood venues gives:

* Cluster 0: Zhongzheng, Songshan and Daan: Cafe, Coffee Shop, Dumpling Restaurant
* Cluster 1: Shilin: Asian Restaurant, Trail, Airport Terminal
* Cluster 2: Zhongshan: Convenience Store, Fish Market, Farmers Market
* Cluster 3: Datong, Xinyi, Neihu and Wenshan: Convenience Store, Chinese Restaurant, Coffee Shop
* Cluster 4: Wanhua and Nangang: Convenience Store, Supermarket, Bus Station

And the price clusters are:

* Cluster 0: Daan (Highest average)
* Cluster 1: Xinyi, Shilin, Beitou and Wenshan (Lowest average)
* Cluster 2: Zhongzheng, Datong, Songshan, Wanhua, Nangang
* Cluster 3: Zhongshan
* Cluster 4: Neihu

Comparing the two results, the following districts fall in the same cluster in both:
* Zhongzheng and Songshan
* Xinyi and Wenshan
* Wanhua and Nangang

Zhongshan is a special case: it forms a cluster of its own in both results.
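
The comparison can be reproduced mechanically from the two cluster lists above by grouping districts on the pair (neighborhood cluster, price cluster). A short sketch, with the labels copied from those lists; the variable names are illustrative:

```python
from collections import defaultdict

# Cluster labels copied from the two lists above
venue_cluster = {
    'Zhongzheng': 0, 'Songshan': 0, 'Daan': 0,
    'Shilin': 1,
    'Zhongshan': 2,
    'Datong': 3, 'Xinyi': 3, 'Neihu': 3, 'Wenshan': 3,
    'Wanhua': 4, 'Nangang': 4,
}
price_cluster = {
    'Daan': 0,
    'Xinyi': 1, 'Shilin': 1, 'Beitou': 1, 'Wenshan': 1,
    'Zhongzheng': 2, 'Datong': 2, 'Songshan': 2, 'Wanhua': 2, 'Nangang': 2,
    'Zhongshan': 3,
    'Neihu': 4,
}

# Group districts that share both labels; skip districts missing from either result
groups = defaultdict(list)
for district, v_label in venue_cluster.items():
    if district in price_cluster:
        groups[(v_label, price_cluster[district])].append(district)

shared = sorted(sorted(members) for members in groups.values() if len(members) > 1)
print(shared)
```

Beitou appears only in the price clusters because it has no neighborhood cluster label, so it is left out of the comparison.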

## Conclusion <a name="conclusion"></a>

It is interesting to explore the relationship between neighborhood character and price in the city where I currently live. However, Foursquare appears to have limited data for Taiwan: only about 250 venues were returned for all of Taipei, so the results may be biased by this small sample.

In my experience, Daan District is the most sought-after district among wealthy residents of Taipei, with large parks and convenient transportation. Wanhua, by contrast, faces problems of aging and poverty (as the average real estate price shows), and its streetscape is older and more complex. Yet in the neighborhood clustering it falls in the same group as Nangang, which is developing rapidly.

Still, we do find some interesting points:
1. Xinyi and Wenshan are adjacent and their streetscapes are quite similar; their real estate prices are quite similar too (except for the CBD in Xinyi).
2. Zhongshan is a truly special district: it has older, more complex areas like Wanhua as well as comfortable areas like Daan, so its price distribution stretches toward both extremes.