## Capstone Project - The Battle of Neighborhoods

A peer-graded assignment on Coursera made by Hu Junjie

## 1. Introduction

**Zhongshan** is my hometown. It is located on the west side of the Pearl River Delta and is famous as the birthplace of Sun Yat-sen. This great figure in modern Chinese history is the reason the city changed its name from 'Heung Shan' to 'Zhongshan'. Before the reform and opening-up policy, Zhongshan, as a historical city, was much more prosperous than Shenzhen. In recent decades, however, Zhongshan has been losing its glory in tourism and its share of the economy.
Let's find out the current state of tourist attractions in Zhongshan.


Using data science, we group the districts of Zhongshan through machine learning to give first-time visitors a new perspective on the city.
First, using a geocoder, we collect and visualize location information for each district of Zhongshan. The Foursquare API then lets us explore the venues in each district. After arranging the results in a pandas dataframe through one-hot encoding and normalization, we divide Zhongshan into five zones with similar characteristics to give tourists rough local information.

## 2. Data source and Preprocessing Data

We collect the data in a similar way to the earlier example, Segmenting and Clustering Neighborhoods in New York City.
* Districts of Zhongshan: https://www.atool99.com/china_city.php
* Folium, Geocoder
    * These will be used to visualize map information.

In [1]:
import requests # library to handle requests
from bs4 import BeautifulSoup  # import beautiful soup for html parsing

import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0)

#!conda install -c conda-forge geopy --yes  # uncomment if geopy library is not installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

#!conda install -c conda-forge folium=0.5.0 --yes   # uncomment if folium library is not installed
import folium # map rendering library

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

print('Libraries imported.')

Libraries imported.


In [2]:
url = 'https://www.atool99.com/city_442000.html'

In [3]:
df = pd.read_html(url,header=0,encoding='utf8')[1]

In [4]:
df

Unnamed: 0,区位码,地区名称,电话区号,邮政编码,经纬度,英文
0,442001,石岐区,760,528400,"(113.378835, 22.52522)",Shiqi
1,442004,南区,760,528400,"(113.355896, 22.486568)",Nanqu
2,442005,五桂山区,760,528458,"(113.41079, 22.51968)",Wuguishan
3,442006,火炬开发区,760,528437,"(113.480523, 22.566082)",Huoju
4,442007,黄圃镇,760,528429,"(113.342359, 22.715116)",Huangpu
5,442008,南头镇,760,528421,"(113.296358, 22.713907)",Nantou
6,442009,东凤镇,760,528425,"(113.26114, 22.68775)",Dongfeng
7,442010,阜沙镇,760,528434,"(113.353024, 22.666364)",Fusha
8,442011,小榄镇,760,528415,"(113.244235, 22.666951)",Xiaolan
9,442012,东升镇,760,528400,"(113.296298, 22.614003)",Dongsheng


In [14]:
df_new = pd.DataFrame()

# put the English district name first
df_new['District']=df.英文

# split '经纬度' (stored as "(longitude, latitude)") into 'Latitude' and 'Longitude'
df_new['Latitude'] = df.经纬度.str.split(',').str[1]
df_new['Longitude'] = df.经纬度.str.split(',').str[0]
df_new.head()

Unnamed: 0,District,Latitude,Longitude
0,Shiqi,22.52522),(113.378835
1,Nanqu,22.486568),(113.355896
2,Wuguishan,22.51968),(113.41079
3,Huoju,22.566082),(113.480523
4,Huangpu,22.715116),(113.342359


In [15]:
# cleaning: drop the surrounding parentheses and any stray whitespace
df_new['Latitude']=df_new.Latitude.str.split(')').str[0].str.strip()
df_new['Longitude']=df_new.Longitude.str.split('(').str[1].str.strip()
df_new.head()

Unnamed: 0,District,Latitude,Longitude
0,Shiqi,22.52522,113.378835
1,Nanqu,22.486568,113.355896
2,Wuguishan,22.51968,113.41079
3,Huoju,22.566082,113.480523
4,Huangpu,22.715116,113.342359
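
The two-step split above leaves the coordinates as strings. As an alternative, here is a minimal sketch (not the method used in this notebook) that extracts both numbers with one regular expression and converts them to floats. The sample rows are copied from the scraped table; `parse_lng_lat` is a hypothetical helper name.

```python
import pandas as pd

def parse_lng_lat(series):
    """Split '(lng, lat)' strings into float Longitude/Latitude columns."""
    # Two capture groups: the first number is the longitude, the second the latitude.
    parts = series.str.extract(r'\(\s*([-\d.]+)\s*,\s*([-\d.]+)\s*\)')
    parts.columns = ['Longitude', 'Latitude']
    return parts.astype(float)

# Two rows copied from the district table shown earlier.
sample = pd.DataFrame({
    '英文': ['Shiqi', 'Nanqu'],
    '经纬度': ['(113.378835, 22.52522)', '(113.355896, 22.486568)'],
})

coords = parse_lng_lat(sample['经纬度'])
result = pd.DataFrame({
    'District': sample['英文'],
    'Latitude': coords['Latitude'],
    'Longitude': coords['Longitude'],
})
print(result)
```

Numeric columns can be passed directly to mapping or distance code without a later conversion, at the cost of a slightly less readable pattern.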


## 3. Explore and Cluster the Neighborhoods in Zhongshan

Use the geopy library to get the latitude and longitude values of Zhongshan.

In [10]:
address = "Zhongshan, ZS"

geolocator = Nominatim(user_agent="zhongshan_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Zhongshan city are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Zhongshan city are 22.5213807, 113.3656141.
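
A geopy geocoder returns `None` when it finds no match, in which case `location.latitude` would raise an `AttributeError`. The sketch below shows one possible guard. The helper name `geocode_or_fail` and the offline `StubGeolocator` are assumptions made for illustration so the example runs without network access; in the notebook you would pass the real `Nominatim` object instead.

```python
from types import SimpleNamespace

def geocode_or_fail(geolocator, address):
    """Return (latitude, longitude) for address, raising if nothing is found."""
    location = geolocator.geocode(address)
    if location is None:
        raise ValueError(f'No geocoding result for {address!r}')
    return location.latitude, location.longitude

class StubGeolocator:
    """Offline stand-in for a geocoder (hypothetical, for demonstration only)."""
    def geocode(self, address):
        if address == 'Zhongshan, ZS':
            # Coordinates copied from the notebook output above.
            return SimpleNamespace(latitude=22.5213807, longitude=113.3656141)
        return None

latitude, longitude = geocode_or_fail(StubGeolocator(), 'Zhongshan, ZS')
print(latitude, longitude)
```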


Create a map of Zhongshan with neighborhoods superimposed on top.

In [34]:
# create map of Zhongshan using latitude and longitude values
map_zs = folium.Map(location=[latitude, longitude], zoom_start=10)
map_zs

Add markers to the map.

In [35]:
for Latitude, Longitude, District in zip(
        df_new['Latitude'], 
        df_new['Longitude'], 
        df_new['District']):
    label = '{}'.format(District)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [Latitude, Longitude],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_zs)  

map_zs

Define Foursquare Credentials and Version

In [18]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID (redacted)
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret (redacted)
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_FOURSQUARE_CLIENT_ID
CLIENT_SECRET: YOUR_FOURSQUARE_CLIENT_SECRET


In [19]:
#Explore the first District in our data frame "df_new"
district_name = df_new.loc[0, 'District']
print(f"The first District's name is '{district_name}'.")

The first District's name is 'Shiqi'.


In [20]:
district_latitude = df_new.loc[0, 'Latitude'] # district latitude value
district_longitude = df_new.loc[0, 'Longitude'] # district longitude value

print('Latitude and longitude values of {} are {}, {}.'.format(district_name, 
                                                               district_latitude, 
                                                               district_longitude))

Latitude and longitude values of Shiqi are  22.52522, 113.378835.


Now, let's get the top 100 venues within a radius of 5000 meters of each district, starting with Shiqi.

In [58]:
def getNearbyVenues(names, latitudes, longitudes, radius=5000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['District', 
                  'District Latitude', 
                  'District Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
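
The function above assumes that every response contains at least one group and that every venue has at least one category; either assumption failing would raise an exception partway through the loop. The following sketch shows one way the parsing step could be made more defensive. `extract_venue_rows` is a hypothetical helper, and `fake_payload` is a made-up response that only mirrors the fields the function above reads; it is not real API output.

```python
def extract_venue_rows(name, lat, lng, payload):
    """Turn one explore-style response into flat rows for a single district."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Fall back to a placeholder when a venue carries no category.
        category = categories[0]['name'] if categories else 'Unknown'
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'],
                     category))
    return rows

# Made-up payload: the first venue reuses values from the output table in this
# notebook, the second is invented to exercise the missing-category branch.
fake_payload = {
    'response': {'groups': [{'items': [
        {'venue': {'name': 'Hilton',
                   'location': {'lat': 22.51579, 'lng': 113.377958},
                   'categories': [{'name': 'Hotel'}]}},
        {'venue': {'name': 'Example Venue',
                   'location': {'lat': 22.52, 'lng': 113.38},
                   'categories': []}},
    ]}]}
}

rows = extract_venue_rows('Shiqi', 22.52522, 113.378835, fake_payload)
empty = extract_venue_rows('Shiqi', 22.52522, 113.378835, {'response': {}})
print(rows)
print(empty)
```

Keeping the parsing separate from the HTTP call also makes it possible to check the row-building logic without spending API quota.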

In [59]:
zs_venues = getNearbyVenues(names=df_new['District'],
                                   latitudes=df_new['Latitude'],
                                   longitudes=df_new['Longitude']
                                  )

print('................')
print('Process is done.')

Shiqi
Nanqu
Wuguishan
Huoju
Huangpu
Nantou
Dongfeng
Fusha
Xiaolan
Dongsheng
Guzhen
Henglan
Sanjiao
Minzhong
Nanlang
Gangkou
Dayong
Shaxi
Sanxiang
Banfu
Shenwan
Tanzhou
................
Process is done.


In [60]:
#Let's check the size of the resulting dataframe
print(zs_venues.shape)
zs_venues.head()

(214, 7)


Unnamed: 0,District,District Latitude,District Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Shiqi,22.52522,113.378835,Hilton,22.51579,113.377958,Hotel
1,Shiqi,22.52522,113.378835,King Century Hotel (京华世纪酒店),22.513871,113.379753,Motel
2,Shiqi,22.52522,113.378835,Lihe Plaza (利和购物中心),22.514559,113.378863,Shopping Mall
3,Shiqi,22.52522,113.378835,Starbucks (星巴克),22.522906,113.385148,Coffee Shop
4,Shiqi,22.52522,113.378835,大信新都汇,22.534994,113.377619,Shopping Mall


In [61]:
zs_venues.groupby('District').count()

Unnamed: 0_level_0,District Latitude,District Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
District,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Banfu,5,5,5,5,5,5
Dayong,6,6,6,6,6,6
Dongfeng,7,7,7,7,7,7
Dongsheng,7,7,7,7,7,7
Fusha,5,5,5,5,5,5
Gangkou,25,25,25,25,25,25
Guzhen,14,14,14,14,14,14
Henglan,4,4,4,4,4,4
Huangpu,5,5,5,5,5,5
Huoju,10,10,10,10,10,10


In [53]:
print('There are {} unique categories.'.format(len(zs_venues['Venue Category'].unique())))

There are 45 unique categories.


## 4. Analyze Each District

In [62]:
# one hot encoding
zs_onehot = pd.get_dummies(zs_venues[['Venue Category']], prefix="", prefix_sep="")

# add district column back to dataframe
zs_onehot['District'] = zs_venues['District'] 

# move district column to the first column
fixed_columns = [zs_onehot.columns[-1]] + list(zs_onehot.columns[:-1])
zs_onehot = zs_onehot[fixed_columns]

zs_onehot.head()

Unnamed: 0,District,Athletics & Sports,Boat or Ferry,Bridal Shop,Bus Station,Cafeteria,Café,Cantonese Restaurant,Chinese Breakfast Place,Chinese Restaurant,Coffee Shop,Department Store,Electronics Store,Fabric Shop,Farm,Fast Food Restaurant,Food Court,Furniture / Home Store,Garden,German Restaurant,Golf Course,Harbor / Marina,Hostel,Hotel,Hotel Pool,Italian Restaurant,Light Rail Station,Market,Motel,Mountain,Multiplex,Pier,Pizza Place,Ramen Restaurant,Rest Area,Restaurant,Seafood Restaurant,Shopping Mall,Spa,Stadium,Steakhouse,Tea Room,Tourist Information Center,Train Station,Tunnel,Water Park
0,Shiqi,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Shiqi,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Shiqi,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0
3,Shiqi,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Shiqi,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0


In [63]:
#And let's examine the new dataframe size.
zs_onehot.shape

(214, 46)

In [64]:
# Next, let's group rows by district and take the mean of the
#   frequency of occurrence of each category
zs_grouped = zs_onehot.groupby('District').mean().reset_index()
zs_grouped

Unnamed: 0,District,Athletics & Sports,Boat or Ferry,Bridal Shop,Bus Station,Cafeteria,Café,Cantonese Restaurant,Chinese Breakfast Place,Chinese Restaurant,Coffee Shop,Department Store,Electronics Store,Fabric Shop,Farm,Fast Food Restaurant,Food Court,Furniture / Home Store,Garden,German Restaurant,Golf Course,Harbor / Marina,Hostel,Hotel,Hotel Pool,Italian Restaurant,Light Rail Station,Market,Motel,Mountain,Multiplex,Pier,Pizza Place,Ramen Restaurant,Rest Area,Restaurant,Seafood Restaurant,Shopping Mall,Spa,Stadium,Steakhouse,Tea Room,Tourist Information Center,Train Station,Tunnel,Water Park
0,Banfu,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.2,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0
1,Dayong,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.166667,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0
2,Dongfeng,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.285714,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0
3,Dongsheng,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.428571,0.0,0.0,0.0,0.0,0.0,0.142857,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Fusha,0.0,0.0,0.0,0.0,0.2,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Gangkou,0.0,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.24,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.16,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.16,0.0,0.0,0.0,0.0,0.04,0.04,0.0,0.0
6,Guzhen,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.071429,0.0,0.0,0.0,0.071429,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.357143,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0
7,Henglan,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Huangpu,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.4,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Huoju,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [65]:
zs_grouped.shape

(22, 46)
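
To make the one-hot and group-mean steps concrete, here is a small worked example on invented data (districts `A` and `B` are placeholders, not real districts). Each row of the grouped frame holds the share of a district's venues that fall in each category, so every row sums to 1.

```python
import pandas as pd

# Toy data: district A has three venues, district B has one.
toy_venues = pd.DataFrame({
    'District': ['A', 'A', 'A', 'B'],
    'Venue Category': ['Hotel', 'Hotel', 'Coffee Shop', 'Garden'],
})

# One column per category, one row per venue.
toy_onehot = pd.get_dummies(toy_venues[['Venue Category']],
                            prefix="", prefix_sep="", dtype=int)
toy_onehot.insert(0, 'District', toy_venues['District'])

# The mean of 0/1 indicators is the relative frequency of each category.
toy_grouped = toy_onehot.groupby('District').mean().reset_index()
print(toy_grouped)
```

District `A` ends up with 2/3 for `Hotel` and 1/3 for `Coffee Shop`. This is the normalization that lets districts with very different venue counts (for example 25 in Gangkou versus 4 in Henglan) be compared on the same scale.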

Let's print each district along with its top 10 most common venues.

In [69]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['District']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
District_venues_sorted = pd.DataFrame(columns=columns)
District_venues_sorted['District'] = zs_grouped['District']

for ind in np.arange(zs_grouped.shape[0]):
    District_venues_sorted.iloc[ind, 1:] = return_most_common_venues(zs_grouped.iloc[ind, :], num_top_venues)

District_venues_sorted.head()

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Banfu,Hotel,Tunnel,Garden,Market,Mountain,Harbor / Marina,German Restaurant,Furniture / Home Store,Food Court,Fast Food Restaurant
1,Dayong,Garden,Stadium,Cantonese Restaurant,Chinese Breakfast Place,Mountain,Fabric Shop,Water Park,Electronics Store,German Restaurant,Furniture / Home Store
2,Dongfeng,Hotel,Train Station,Bus Station,Italian Restaurant,Chinese Restaurant,Pizza Place,Electronics Store,German Restaurant,Garden,Furniture / Home Store
3,Dongsheng,Chinese Restaurant,Hotel,Food Court,Cantonese Restaurant,Fast Food Restaurant,Electronics Store,Golf Course,German Restaurant,Garden,Furniture / Home Store
4,Fusha,Hotel,Cafeteria,Cantonese Restaurant,Restaurant,Hostel,Fabric Shop,Golf Course,German Restaurant,Garden,Furniture / Home Store
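
The column labels above are built with a `try`/`except` around a three-item suffix list. As an alternative sketch, a small helper can compute the English ordinal suffix directly, which also handles values such as 11th or 21st if `num_top_venues` is ever increased. The name `ordinal` is a hypothetical helper, not part of the original notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return f'{n}{suffix}'

num_top_venues = 10
columns = ['District'] + [f'{ordinal(i)} Most Common Venue'
                          for i in range(1, num_top_venues + 1)]
print(columns)
```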


## 5. Cluster Neighborhoods

In [70]:
# set number of clusters
kclusters = 5

zs_grouped_clustering = zs_grouped.drop(columns='District')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(zs_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([4, 4, 1, 3, 4, 0, 0, 2, 1, 3], dtype=int32)
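
The value `kclusters = 5` is chosen by hand here. One common sanity check, which is not part of the original analysis, is to compare silhouette scores for a few candidate values of k. The sketch below runs on synthetic data with two obvious groups so that it is self-contained; `silhouette_by_k` is a hypothetical helper, and on the real data `zs_grouped_clustering` would take the place of `toy`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def silhouette_by_k(features, k_values, random_state=0):
    """Fit k-means for each k and return {k: mean silhouette score}."""
    scores = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = model.fit_predict(features)
        scores[k] = silhouette_score(features, labels)
    return scores

# Synthetic data: two tight, well-separated blobs of 10 points each.
rng = np.random.default_rng(0)
toy = np.vstack([
    rng.normal(loc=0.0, scale=0.05, size=(10, 3)),
    rng.normal(loc=1.0, scale=0.05, size=(10, 3)),
])

scores = silhouette_by_k(toy, k_values=range(2, 5))
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

With only 22 districts the scores are likely to be noisy, so a check like this is better read as a hint than as a rule for picking k.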

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each district.

In [78]:
# add clustering labels
District_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)


zs_merged = df_new

# merge the sorted venues with df_new to add latitude/longitude for each district
zs_merged = zs_merged.join(District_venues_sorted.set_index('District'), on='District')

zs_merged.head() # check the last columns!

Unnamed: 0,District,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Shiqi,22.52522,113.378835,0,Coffee Shop,Fast Food Restaurant,Hotel,Shopping Mall,Cantonese Restaurant,Train Station,Tourist Information Center,Motel,Pizza Place,Electronics Store
1,Nanqu,22.486568,113.355896,0,Coffee Shop,Fast Food Restaurant,Hotel,Tourist Information Center,Multiplex,Shopping Mall,Chinese Restaurant,Motel,Garden,Furniture / Home Store
2,Wuguishan,22.51968,113.41079,0,Hotel,Shopping Mall,Fast Food Restaurant,Coffee Shop,Cantonese Restaurant,Motel,Seafood Restaurant,Pizza Place,Spa,Train Station
3,Huoju,22.566082,113.480523,3,Fast Food Restaurant,Chinese Restaurant,Pier,Pizza Place,Hotel Pool,Bus Station,Water Park,German Restaurant,Garden,Furniture / Home Store
4,Huangpu,22.715116,113.342359,1,Hotel,Light Rail Station,Seafood Restaurant,Chinese Restaurant,Electronics Store,German Restaurant,Garden,Furniture / Home Store,Food Court,Fast Food Restaurant


In [88]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=10)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(zs_merged['Latitude'], zs_merged['Longitude'], zs_merged['District'], zs_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [float(lat), float(lon)],  # each district's own coordinates, not the city center
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## 6. Examine Clusters

In [109]:
# Cluster 1
zs_merged.loc[zs_merged['Cluster Labels'] == 0,
                  zs_merged.columns[[0] + 
                  list(range(1, zs_merged.shape[1]))]]

Unnamed: 0,District,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Shiqi,22.52522,113.378835,0,Coffee Shop,Fast Food Restaurant,Hotel,Shopping Mall,Cantonese Restaurant,Train Station,Tourist Information Center,Motel,Pizza Place,Electronics Store
1,Nanqu,22.486568,113.355896,0,Coffee Shop,Fast Food Restaurant,Hotel,Tourist Information Center,Multiplex,Shopping Mall,Chinese Restaurant,Motel,Garden,Furniture / Home Store
2,Wuguishan,22.51968,113.41079,0,Hotel,Shopping Mall,Fast Food Restaurant,Coffee Shop,Cantonese Restaurant,Motel,Seafood Restaurant,Pizza Place,Spa,Train Station
10,Guzhen,22.611019,113.179745,0,Hotel,Shopping Mall,Coffee Shop,Furniture / Home Store,Fast Food Restaurant,Department Store,Ramen Restaurant,Train Station,Café,Fabric Shop
12,Sanjiao,22.677033,113.423624,0,Hotel,Fast Food Restaurant,Golf Course,Rest Area,Bus Station,Fabric Shop,Boat or Ferry,German Restaurant,Garden,Furniture / Home Store
15,Gangkou,22.521113,113.382391,0,Coffee Shop,Fast Food Restaurant,Hotel,Shopping Mall,Cantonese Restaurant,Train Station,Tourist Information Center,Motel,Pizza Place,Electronics Store
17,Shaxi,22.526325,113.328369,0,Hotel,Coffee Shop,Chinese Restaurant,Fast Food Restaurant,Rest Area,Cafeteria,Café,Golf Course,German Restaurant,Garden
21,Tanzhou,22.261269,113.485677,0,Hotel,Coffee Shop,Fast Food Restaurant,Shopping Mall,Spa,Chinese Restaurant,German Restaurant,Athletics & Sports,Steakhouse,Garden


In [110]:
# Cluster 2
zs_merged.loc[zs_merged['Cluster Labels'] == 1,
                zs_merged.columns[[0] + 
                  list(range(1, zs_merged.shape[1]))]]

Unnamed: 0,District,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Huangpu,22.715116,113.342359,1,Hotel,Light Rail Station,Seafood Restaurant,Chinese Restaurant,Electronics Store,German Restaurant,Garden,Furniture / Home Store,Food Court,Fast Food Restaurant
5,Nantou,22.713907,113.296358,1,Hotel,Train Station,Bus Station,Seafood Restaurant,Department Store,German Restaurant,Garden,Furniture / Home Store,Food Court,Fast Food Restaurant
6,Dongfeng,22.68775,113.26114,1,Hotel,Train Station,Bus Station,Italian Restaurant,Chinese Restaurant,Pizza Place,Electronics Store,German Restaurant,Garden,Furniture / Home Store
8,Xiaolan,22.666951,113.244235,1,Hotel,Train Station,Bus Station,Italian Restaurant,Chinese Restaurant,Pizza Place,Electronics Store,German Restaurant,Garden,Furniture / Home Store


In [111]:
# Cluster 3
zs_merged.loc[zs_merged['Cluster Labels'] == 2,
                 zs_merged.columns[[0] + 
                  list(range(1, zs_merged.shape[1]))]]

Unnamed: 0,District,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,Henglan,22.523202,113.265845,2,Rest Area,Bridal Shop,Electronics Store,Department Store,Golf Course,German Restaurant,Garden,Furniture / Home Store,Food Court,Fast Food Restaurant


In [112]:
# Cluster 4
zs_merged.loc[zs_merged['Cluster Labels'] == 3,
                 zs_merged.columns[[0] + 
                  list(range(1, zs_merged.shape[1]))]]

Unnamed: 0,District,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Huoju,22.566082,113.480523,3,Fast Food Restaurant,Chinese Restaurant,Pier,Pizza Place,Hotel Pool,Bus Station,Water Park,German Restaurant,Garden,Furniture / Home Store
9,Dongsheng,22.614003,113.296298,3,Chinese Restaurant,Hotel,Food Court,Cantonese Restaurant,Fast Food Restaurant,Electronics Store,Golf Course,German Restaurant,Garden,Furniture / Home Store
13,Minzhong,22.623468,113.486025,3,Chinese Restaurant,Water Park,Farm,Rest Area,Bus Station,Fabric Shop,Golf Course,German Restaurant,Garden,Furniture / Home Store
18,Sanxiang,22.352494,113.4334,3,Golf Course,Bus Station,Café,Cantonese Restaurant,Chinese Restaurant,Fast Food Restaurant,Water Park,Fabric Shop,German Restaurant,Garden


In [113]:
# Cluster 5
zs_merged.loc[zs_merged['Cluster Labels'] == 4,
                 zs_merged.columns[[0] + 
                  list(range(1, zs_merged.shape[1]))]]

Unnamed: 0,District,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
7,Fusha,22.666364,113.353024,4,Hotel,Cafeteria,Cantonese Restaurant,Restaurant,Hostel,Fabric Shop,Golf Course,German Restaurant,Garden,Furniture / Home Store
14,Nanlang,22.492378,113.533939,4,Hotel,Train Station,Tea Room,Seafood Restaurant,Cantonese Restaurant,Pizza Place,Mountain,Department Store,Garden,Furniture / Home Store
16,Dayong,22.467712,113.291708,4,Garden,Stadium,Cantonese Restaurant,Chinese Breakfast Place,Mountain,Fabric Shop,Water Park,Electronics Store,German Restaurant,Furniture / Home Store
19,Banfu,22.415674,113.320346,4,Hotel,Tunnel,Garden,Market,Mountain,Harbor / Marina,German Restaurant,Furniture / Home Store,Food Court,Fast Food Restaurant
20,Shenwan,22.312476,113.359387,4,Hotel,Cantonese Restaurant,Harbor / Marina,Boat or Ferry,Bridal Shop,Fabric Shop,Golf Course,German Restaurant,Garden,Furniture / Home Store


## 7. Discussion

Using the most common venue information above, we can describe the characteristics of each cluster as follows:

1. Cluster #1 has 8 districts, and they are quite commercial. Hotels, coffee shops, and fast food restaurants are easy to find, so tourists should have a pleasant experience here.

2. Cluster #2 has 4 districts, and they function largely as transportation junctions. Light rail stations, train stations, bus stations, and hotels are easy to find.

3. Cluster #3 has 1 district. This district, Henglan, is not commercial, but it is a good place to stop and rest.

4. Cluster #4 has 4 districts; they are residential towns rather than commercial centers. Fast food, Chinese, and Cantonese restaurants are easy to find.

5. Cluster #5 has 5 districts, and they are good for tourism. Hotels, gardens, restaurants, and some special venues are easy to find.
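
The district counts quoted in this list can be re-derived from the cluster tables in section 6. The sketch below copies the district-to-label assignments from those tables into a Series (labels are 0-based, so Cluster #1 in the text is label 0) and counts them. In the notebook itself, `zs_merged['Cluster Labels'].value_counts()` would give the same numbers.

```python
import pandas as pd

# District -> k-means label, copied from the cluster tables in section 6.
cluster_labels = pd.Series({
    'Shiqi': 0, 'Nanqu': 0, 'Wuguishan': 0, 'Guzhen': 0,
    'Sanjiao': 0, 'Gangkou': 0, 'Shaxi': 0, 'Tanzhou': 0,
    'Huangpu': 1, 'Nantou': 1, 'Dongfeng': 1, 'Xiaolan': 1,
    'Henglan': 2,
    'Huoju': 3, 'Dongsheng': 3, 'Minzhong': 3, 'Sanxiang': 3,
    'Fusha': 4, 'Nanlang': 4, 'Dayong': 4, 'Banfu': 4, 'Shenwan': 4,
}, name='Cluster Labels')

# Number of districts per label, ordered by label.
sizes = cluster_labels.value_counts().sort_index()
print(sizes)
```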

## 8. Conclusion
Using the Foursquare API and a simple machine learning technique, we divided Zhongshan into 5 clusters and identified the characteristics of each one. Based on this information, we can make a few recommendations for tourists.

For first-time visitors to Zhongshan, Cluster #1, #2, or #5 would be the best choice. These clusters are commercial enough to offer plenty to enjoy. Cluster #4 might be less exciting, but it suits visitors who want to know more about the real Zhongshan. I do not recommend Cluster #3, which is located on the outskirts of Zhongshan, for a first visit; however, it could be a wonderful experience for those who have already visited Zhongshan a few times and want to know more about the city.