# Coursera Capstone Final Project

## Introduction

I am a foreigner living and working in Taipei City, Taiwan. Because I know only a little Chinese, I sometimes have trouble understanding the city, its places, and its people. With the power of Python, I will try to explore how places are located in this city.

To do so, I will use the following steps:

- Get the location data
- Geocode it into coordinates
- Use Foursquare to examine each neighborhood
- Cluster the neighborhoods using k-means

##### First, import all necessary libraries

In [1]:
import pandas as pd
import requests
import folium
from pandas import json_normalize  # pandas >= 1.0; older versions: from pandas.io.json import json_normalize
import numpy as np
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors
import os

In [2]:
if not os.path.exists("taipei_add.csv"):
    postal_add = pd.DataFrame(columns=['Zip Code', 'Computerized No', 'Office name', 'Address'])
    for i in range(1,17):
        print("Scraping for page {}/16".format(i))
        # topage={} is filled with the page number; a hard-coded topage=1 would fetch page 1 every time
        web = pd.read_html("https://www.post.gov.tw/post/internet/I_location/index.jsp?topage={}&PreRowDatas=10&city=%E8%87%BA%E5%8C%97%E5%B8%82&input2=&st_7=&st_6=&is_night=&st_1_5=&zip5=&prsb_no=&city_area=&ID=1901&post_address=&style=&keyword=&Page_Load=1".format(i))
        postal_add = pd.concat([postal_add,web[0]])
        print("Done!")
    postal_add = postal_add.reset_index(drop=True)
    postal_add.to_csv("taipei_add.csv")
else:
    postal_add = pd.read_csv("taipei_add.csv",index_col = 0)
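
A note on the page loop above: `str.format` silently ignores extra arguments when the string has no `{}` placeholder, so a URL with a fixed `topage=1` returns the same page on every iteration. The sketch below builds one URL per page with the standard library. `page_url` is a hypothetical helper, and dropping the empty query parameters assumes the server ignores them, which I have not verified.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.post.gov.tw/post/internet/I_location/index.jsp"

def page_url(page, rows_per_page=10):
    """Return the listing URL for one result page (hypothetical helper)."""
    query = {
        "topage": page,            # page number, 1-based
        "PreRowDatas": rows_per_page,
        "city": "臺北市",           # percent-encoded by urlencode
        "ID": 1901,
        "Page_Load": 1,
    }
    return BASE_URL + "?" + urlencode(query)

# A format string without a placeholder ignores its arguments:
unchanged = "topage=1".format(5)   # still "topage=1"

# One distinct URL per page, 1 through 16
urls = [page_url(i) for i in range(1, 17)]
print(len(set(urls)))
```

Each URL can then be passed to `pd.read_html` exactly as in the cell above.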

## Get the data

In [3]:
print(postal_add.shape)
postal_add.head()

(156, 4)


Unnamed: 0,Zip Code,Computerized No,Office name,Address
0,10044,000100-6,Taipei Beimen Post Office(Taipei Branch 901),"No. 120, Sec. 1, Zhongxiao W. Rd., Zhongzheng ..."
1,10064,000101-0,Taipei Dongmen Post Office(Taipei Branch 1),"No.163, Sec. 2, Xinyi Rd., Zhongzheng Dist., T..."
2,10846,000102-3,Taipei Hanzhong Street Post Office(Taipei Bran...,"No. 173, Hanzhong St., Wanhua Dist., Taipei 10..."
3,10847,000103-7,Taipei Xiyuan Post Office(Taipei Branch 3),"No. 156, Sec. 2, Changsha St., Wanhua Dist., T..."
4,10851,000104-1,Taipei Longshan Post Office(Taipei Branch 4),"No. 67, Guangjhou St., Wanhua Dist., Taipei 10..."


In [4]:
# Add coordinates to each location
postal_add_geo = pd.read_csv("taipei_add_geocode.csv", index_col=0)
print(postal_add_geo.shape)
postal_add_geo.head()

(156, 6)


Unnamed: 0,Zip Code,Computerized No,Office name,Address,Latitude,Longitude
0,10044,000100-6,Taipei Beimen Post Office(Taipei Branch 901),"No. 120, Sec. 1, Zhongxiao W. Rd., Zhongzheng ...",25.04732,121.51179
1,10064,000101-0,Taipei Dongmen Post Office(Taipei Branch 1),"No.163, Sec. 2, Xinyi Rd., Zhongzheng Dist., T...",25.03414,121.52856
2,10846,000102-3,Taipei Hanzhong Street Post Office(Taipei Bran...,"No. 173, Hanzhong St., Wanhua Dist., Taipei 10...",25.04133,121.50702
3,10847,000103-7,Taipei Xiyuan Post Office(Taipei Branch 3),"No. 156, Sec. 2, Changsha St., Wanhua Dist., T...",25.04088,121.50121
4,10851,000104-1,Taipei Longshan Post Office(Taipei Branch 4),"No. 67, Guangjhou St., Wanhua Dist., Taipei 10...",25.03658,121.50458


In [5]:
# Drop the un-geocoded data
taipei = postal_add_geo.dropna(axis=0).copy()  # copy so that adding columns later does not warn
print(taipei.shape)
taipei.head()

(48, 6)


Unnamed: 0,Zip Code,Computerized No,Office name,Address,Latitude,Longitude
0,10044,000100-6,Taipei Beimen Post Office(Taipei Branch 901),"No. 120, Sec. 1, Zhongxiao W. Rd., Zhongzheng ...",25.04732,121.51179
1,10064,000101-0,Taipei Dongmen Post Office(Taipei Branch 1),"No.163, Sec. 2, Xinyi Rd., Zhongzheng Dist., T...",25.03414,121.52856
2,10846,000102-3,Taipei Hanzhong Street Post Office(Taipei Bran...,"No. 173, Hanzhong St., Wanhua Dist., Taipei 10...",25.04133,121.50702
3,10847,000103-7,Taipei Xiyuan Post Office(Taipei Branch 3),"No. 156, Sec. 2, Changsha St., Wanhua Dist., T...",25.04088,121.50121
4,10851,000104-1,Taipei Longshan Post Office(Taipei Branch 4),"No. 67, Guangjhou St., Wanhua Dist., Taipei 10...",25.03658,121.50458


In [6]:
# Use the part of the office name before "Post" as the neighborhood name
taipei["Neighborhood"] = taipei["Office name"].str.split("Post").str[0]
columns = ["Neighborhood", "Latitude", "Longitude"]
taipei_ok = taipei[columns]
taipei_ok.head()



Unnamed: 0,Neighborhood,Latitude,Longitude
0,Taipei Beimen,25.04732,121.51179
1,Taipei Dongmen,25.03414,121.52856
2,Taipei Hanzhong Street,25.04133,121.50702
3,Taipei Xiyuan,25.04088,121.50121
4,Taipei Longshan,25.03658,121.50458


## Examine the Data

In [7]:
latitude, longitude = taipei_ok.Latitude.mean(), taipei_ok.Longitude.mean()
map_taipei = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, neighborhood in zip(taipei_ok['Latitude'], taipei_ok['Longitude'], taipei_ok['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_taipei)
    
map_taipei

In [8]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID (redacted)
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret (redacted)
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_FOURSQUARE_CLIENT_ID
CLIENT_SECRET: YOUR_FOURSQUARE_CLIENT_SECRET


In [10]:
no = 0
neighborhood_latitude = taipei_ok.loc[no, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = taipei_ok.loc[no, 'Longitude'] # neighborhood longitude value

neighborhood_name = taipei_ok.loc[no, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of "{}" are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of "Taipei Beimen " are 25.047320000000003, 121.51178999999999.


In [11]:
# Function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
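
To make the behavior of `get_category_type` concrete, here is a small sketch that runs it on plain dictionaries instead of DataFrame rows. The three sample rows are made up for illustration; only their shape (a list of category dictionaries with a `name` key) follows the Foursquare response used in this notebook.

```python
def get_category_type(row):
    """Return the name of the first category, or None when there is none."""
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']

# Made-up rows in the shapes the function accepts
flattened = {'venue.name': 'Example Cafe', 'venue.categories': [{'name': 'Café'}]}
plain = {'name': 'Example Park', 'categories': [{'name': 'Park'}, {'name': 'Garden'}]}
uncategorized = {'venue.categories': []}

print(get_category_type(flattened))      # Café
print(get_category_type(plain))          # Park (only the first category is used)
print(get_category_type(uncategorized))  # None
```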

In [12]:
radius = 500
LIMIT = 100

url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
results = requests.get(url).json()
print(results.keys())

venues = results['response']['groups'][0]['items']
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

nearby_venues.head()

dict_keys(['meta', 'response'])
61 venues were returned by Foursquare.


Unnamed: 0,name,categories,lat,lng
0,鄭記豬腳飯,Asian Restaurant,25.046989,121.511049
1,North Gate (台北府城北門),Historic Site,25.047584,121.511179
2,Heritage Bakery & Cafe,Café,25.045171,121.511824
3,福州世祖胡椒餅,Food Truck,25.045967,121.513506
4,張家清真黃牛肉麵館 Chang's Halal beef Noodles,Noodle House,25.045718,121.51072


In [13]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
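
As an alternative to formatting the query string by hand, the request parameters can be kept in a dictionary and encoded by the library. The sketch below only builds and prints the URL, so it runs without network access or real credentials. `explore_params` is a hypothetical helper; the parameter names are the ones used in the URL above.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'

def explore_params(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Collect the query parameters for one explore request (hypothetical helper)."""
    return {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),   # latitude,longitude
        'radius': radius,                 # search radius around the point
        'limit': limit,                   # maximum number of venues
    }

# Placeholder credentials, coordinates of the first neighborhood
params = explore_params('ID', 'SECRET', '20180605', 25.04732, 121.51179)
url = EXPLORE_ENDPOINT + '?' + urlencode(params)
print(url)
```

With `requests`, the same dictionary can be passed directly, for example `requests.get(EXPLORE_ENDPOINT, params=params, timeout=10)`, which also sets a timeout so that a stalled request does not block the whole loop.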

In [14]:
taipei_venues = getNearbyVenues(names=taipei_ok['Neighborhood'],
                                   latitudes=taipei_ok['Latitude'],
                                   longitudes=taipei_ok['Longitude']
                                  )

Taipei Beimen 
Taipei Dongmen 
Taipei Hanzhong Street 
Taipei Xiyuan 
Taipei Longshan 
Taipei Nanhai 
Taipei Yingqiao 
Taipei Qingtian 
Taipei Fuxing Bridge 
Taipei Zhongshan 
Taipei Dihua St. 
Taipei Yuanhuan 
Executive Yuan 
Taipei Gongguan 
Academia Historica 
Presidential Office Building 
Taipei Shuanglian 
Taipei Songshan 
Taipei District Court 
Taipei Taipei-Bridge 
Taipei Normal University 
National Taiwan University 
Taipei Ren-ai Rd. 
Legislative Yuan 
Taipei Xinwei 
Taipei Jianbei 
Shilin 
Wenshan Jingmei 
Taipei Guting 
Taipei Dongyuan 
Taipei Guanghua 
Taipei Nanyang 
Taipei Zhonglun 
Taipei Juguang 
Taipei Guangfu 
Taipei Dazhi 
Taipei Zhongnan 
Shilin Shezi 
Taipei Sanzhangli 
Taipei Chenggong 
Songshan Airport 
Taipei Chang-an 
Taipei Datong 
Taipei Xisong 
Taipei City Government 
Taipei Dalongtong 
Taipei Songde 
Shilin Tianmu 


In [15]:
print(taipei_venues.shape)
taipei_venues.groupby("Neighborhood").count()

(2098, 7)


Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Academia Historica,57,57,57,57,57,57
Executive Yuan,75,75,75,75,75,75
Legislative Yuan,23,23,23,23,23,23
National Taiwan University,14,14,14,14,14,14
Presidential Office Building,13,13,13,13,13,13
Shilin,48,48,48,48,48,48
Shilin Shezi,5,5,5,5,5,5
Shilin Tianmu,10,10,10,10,10,10
Songshan Airport,74,74,74,74,74,74
Taipei Beimen,61,61,61,61,61,61


In [16]:
# one hot encoding
taipei_onehot = pd.get_dummies(taipei_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
taipei_onehot['Neighborhood'] = taipei_venues['Neighborhood'] 

# move neighborhood column to the first column
index_nbh = taipei_onehot.columns.to_list().index("Neighborhood")
fixed_columns = ["Neighborhood"] + taipei_onehot.columns.to_list()[0:index_nbh] + taipei_onehot.columns.to_list()[index_nbh+1:]
taipei_onehot = taipei_onehot[fixed_columns]
print(taipei_onehot.shape)
taipei_onehot.head()

(2098, 192)


Unnamed: 0,Neighborhood,Accessories Store,Airport Service,American Restaurant,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,BBQ Joint,...,Used Bookstore,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Winery,Xinjiang Restaurant,Yoga Studio,Yunnan Restaurant
0,Taipei Beimen,0,0,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
1,Taipei Beimen,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Taipei Beimen,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Taipei Beimen,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Taipei Beimen,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [17]:
taipei_group = taipei_onehot.groupby("Neighborhood").mean()
print(taipei_group.shape)
taipei_group.head()

(48, 191)


Unnamed: 0_level_0,Accessories Store,Airport Service,American Restaurant,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,BBQ Joint,Bakery,...,Used Bookstore,Vegetarian / Vegan Restaurant,Video Game Store,Vietnamese Restaurant,Whisky Bar,Wine Bar,Winery,Xinjiang Restaurant,Yoga Studio,Yunnan Restaurant
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
Academia Historica,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017544,0.0,0.017544,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Executive Yuan,0.0,0.0,0.0,0.0,0.026667,0.0,0.0,0.026667,0.0,0.013333,...,0.0,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Legislative Yuan,0.0,0.0,0.0,0.0,0.086957,0.0,0.0,0.0,0.0,0.043478,...,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
National Taiwan University,0.071429,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
Presidential Office Building,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
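
Each value in this table is the share of a neighborhood's venues that fall into a category, because the mean of a 0/1 column is a frequency. For example, Academia Historica has 57 venues, so a single bakery gives 1/57 ≈ 0.017544. A minimal sketch of the same calculation with the standard library, on a made-up venue list:

```python
from collections import Counter

def category_frequencies(categories):
    """Return the share of venues per category for one neighborhood."""
    counts = Counter(categories)
    total = len(categories)
    return {category: n / total for category, n in counts.items()}

# Made-up neighborhood with 57 venues, exactly one of them a bakery
venues = ['Bakery'] + ['Hotel'] * 30 + ['Café'] * 26
freq = category_frequencies(venues)

print(round(freq['Bakery'], 6))  # 0.017544, the same as 1/57
```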


In [18]:
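def return_most_common_venues(row, num_top_venues):
    # "Neighborhood" is the index of the grouped frame, so every entry in the row
    # is a venue category; slicing with iloc[1:] would drop the first category
    row_categories_sorted = row.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]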

In [19]:
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = taipei_group.index

for ind in np.arange(taipei_group.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(taipei_group.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Academia Historica,Hotel,Noodle House,Chinese Restaurant,Convenience Store,Café
1,Executive Yuan,Convenience Store,Café,Japanese Restaurant,Hotel,Ramen Restaurant
2,Legislative Yuan,Hotel,Chinese Restaurant,Fast Food Restaurant,Art Gallery,Noodle House
3,National Taiwan University,Other Nightlife,BBQ Joint,Multiplex,Bubble Tea Shop,Music Venue
4,Presidential Office Building,Noodle House,College Academic Building,Bistro,Garden,Café
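
The column names above are built with a `try`/`except` around a three-element suffix list. A small helper can produce the same names without relying on an exception. This is only a sketch; `ordinal` is a hypothetical function, not part of the notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'   # 11th, 12th, 13th, ...
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

num_top_venues = 5
columns = ['Neighborhood'] + [
    '{} Most Common Venue'.format(ordinal(i)) for i in range(1, num_top_venues + 1)
]
print(columns)
```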


In [20]:
# set number of clusters
kclusters = 5

taipei_grouped_clustering = taipei_group.values

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(taipei_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([1, 1, 1, 1, 1, 1, 4, 3, 1, 1], dtype=int32)
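
For readers new to k-means, the sketch below shows the idea behind the call above on toy data: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat. It is a simplified illustration in plain Python, not scikit-learn's implementation, and the six two-dimensional points are made up.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, n_iter=20, seed=0):
    """Very small k-means: returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)   # start from k distinct data points
    labels = [0] * len(points)
    for _ in range(n_iter):
        # assignment step: nearest centroid for every point
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids

# Made-up venue profiles: (share of cafés, share of parks)
points = [(0.9, 0.1), (0.8, 0.2), (0.85, 0.15),
          (0.1, 0.9), (0.2, 0.8), (0.15, 0.85)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

The first three points end up in one cluster and the last three in the other. Which cluster gets label 0 depends on the random starting centroids, which is also why the notebook fixes `random_state`.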

In [21]:
# add clustering labels
neighborhoods_venues_sorted["Cluster Labels"] = kmeans.labels_

taipei_merged = pd.merge(left=taipei_ok,right=neighborhoods_venues_sorted,left_on="Neighborhood",right_on="Neighborhood")
taipei_merged.head()

Unnamed: 0,Neighborhood,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,Cluster Labels
0,Taipei Beimen,25.04732,121.51179,Café,Hotel,Noodle House,Coffee Shop,Convenience Store,1
1,Taipei Dongmen,25.03414,121.52856,Café,Taiwanese Restaurant,Noodle House,Coffee Shop,Dumpling Restaurant,1
2,Taipei Hanzhong Street,25.04133,121.50702,Hotel,Noodle House,Coffee Shop,Ice Cream Shop,Café,1
3,Taipei Xiyuan,25.04088,121.50121,Taiwanese Restaurant,Hotel,Breakfast Spot,Hostel,Night Market,1
4,Taipei Longshan,25.03658,121.50458,Café,Noodle House,Historic Site,Taiwanese Restaurant,Convenience Store,1


In [22]:

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(taipei_merged['Latitude'], taipei_merged['Longitude'], taipei_merged['Neighborhood'], taipei_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
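
A note on `rainbow[cluster-1]` in the cell above: cluster labels start at 0, so label 0 is looked up at index -1, which Python resolves to the last color in the list. Every cluster still gets its own color; the palette is only shifted by one. A short sketch with placeholder color names:

```python
kclusters = 5
palette = ['color0', 'color1', 'color2', 'color3', 'color4']  # placeholder names

# Lookup as written in the notebook: label 0 wraps around to the last entry
shifted = [palette[cluster - 1] for cluster in range(kclusters)]

# Direct lookup, if colors should follow the label order
direct = [palette[cluster] for cluster in range(kclusters)]

print(shifted)  # ['color4', 'color0', 'color1', 'color2', 'color3']
print(direct)   # ['color0', 'color1', 'color2', 'color3', 'color4']
```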

In [23]:
report = []
for i in range(0,len(set(taipei_merged["Cluster Labels"]))):
    group = taipei_merged[taipei_merged["Cluster Labels"] == i].describe(include="all")
    rep = ["Group {}".format(i+1)]+group.loc["top"][["1st Most Common Venue","2nd Most Common Venue","3rd Most Common Venue","4th Most Common Venue","5th Most Common Venue"]].to_list()
    report.append(rep)
report = np.array(report)

In [25]:
def k_cluster(df):
    # one hot encoding
    df_onehot = pd.get_dummies(df[['Venue Category']], prefix="", prefix_sep="")

    # add neighborhood column back to dataframe
    df_onehot['Neighborhood'] = df['Neighborhood'] 

    # move neighborhood column to the first column
    index_nbh = df_onehot.columns.to_list().index("Neighborhood")
    fixed_columns = ["Neighborhood"] + df_onehot.columns.to_list()[0:index_nbh] + df_onehot.columns.to_list()[index_nbh+1:]
    df_onehot = df_onehot[fixed_columns]
    print(taipei_onehot.shape)  # note: this is the shape of the global taipei_onehot, not of df_onehot
    df_group = df_onehot.groupby("Neighborhood").mean()
    print(df_group.shape)
    num_top_venues = 5

    indicators = ['st', 'nd', 'rd']

    # create columns according to number of top venues
    columns = ['Neighborhood']
    for ind in np.arange(num_top_venues):
        try:
            columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
        except IndexError:
            columns.append('{}th Most Common Venue'.format(ind+1))

    # create a new dataframe
    neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
    neighborhoods_venues_sorted['Neighborhood'] = df_group.index

    for ind in np.arange(df_group.shape[0]):
        neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(df_group.iloc[ind, :], num_top_venues)


    # set number of clusters
    kclusters = 5

    df_grouped_clustering = df_group.values

    # run k-means clustering
    kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(df_grouped_clustering)


    # add clustering labels
    neighborhoods_venues_sorted["Cluster Labels"] = kmeans.labels_

    df_merged = pd.merge(left=taipei_ok,right=neighborhoods_venues_sorted,left_on="Neighborhood",right_on="Neighborhood")
    return df_merged

In [26]:
res = taipei_venues[taipei_venues["Venue Category"].str.contains("Restaurant")].reset_index(drop=True)

In [28]:
def report(df_merged):
    # collect one row per cluster: the most frequent venue category in each rank column
    rows = []
    for i in range(0,len(set(df_merged["Cluster Labels"]))):
        group = df_merged[df_merged["Cluster Labels"] == i].describe(include="all")
        rep = ["Group {}".format(i+1)]+group.loc["top"][["1st Most Common Venue","2nd Most Common Venue","3rd Most Common Venue","4th Most Common Venue","5th Most Common Venue"]].to_list()
        rows.append(rep)
    rows = np.array(rows)
    return pd.DataFrame(rows[:,1:],index=rows[:,0],
                 columns=["1st Most Common Venue","2nd Most Common Venue","3rd Most Common Venue","4th Most Common Venue","5th Most Common Venue"])

In [29]:
report(taipei_merged)

Unnamed: 0,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
Group 1,Café,Italian Restaurant,Sushi Restaurant,Food Stand,Japanese Restaurant
Group 2,Café,Convenience Store,Noodle House,Coffee Shop,Café
Group 3,Convenience Store,Motorcycle Shop,Seafood Restaurant,Flea Market,Furniture / Home Store
Group 4,Park,Convenience Store,Coffee Shop,Café,Café
Group 5,Bus Station,Burger Joint,Spiritual Center,Chinese Restaurant,Yunnan Restaurant
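
Each cell of this report is the most frequent value of one rank column within a cluster (the `top` row of `describe`). The rank columns are summarized independently, which is why the same category, such as Café, can appear twice in one row. A minimal sketch of the same summary with the standard library, on made-up rows with two rank columns; `cluster_modes` is a hypothetical helper.

```python
from collections import Counter

def cluster_modes(rows, label_key, rank_keys):
    """For every cluster, return the most frequent value of each rank column."""
    summary = {}
    for label in sorted({row[label_key] for row in rows}):
        members = [row for row in rows if row[label_key] == label]
        summary[label] = [
            Counter(row[key] for row in members).most_common(1)[0][0]
            for key in rank_keys
        ]
    return summary

# Made-up neighborhoods: cluster label plus first and second most common venue
rows = [
    {'Cluster Labels': 0, '1st': 'Café', '2nd': 'Hotel'},
    {'Cluster Labels': 0, '1st': 'Café', '2nd': 'Park'},
    {'Cluster Labels': 0, '1st': 'Hotel', '2nd': 'Park'},
    {'Cluster Labels': 1, '1st': 'Park', '2nd': 'Café'},
]
modes = cluster_modes(rows, 'Cluster Labels', ['1st', '2nd'])
print(modes)  # {0: ['Café', 'Park'], 1: ['Park', 'Café']}
```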


In [30]:
res_merged = k_cluster(res)
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(res_merged['Latitude'], res_merged['Longitude'], res_merged['Neighborhood'], res_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

(2098, 192)
(48, 48)


Groups 1 and 2 are inner-city areas filled with cafés and restaurants. <br/>
Groups 3, 4 and 5 are outlying areas with intersections, convenience stores, parks and so on.