# Capstone Project - The Battle of the Neighborhoods (Week 2)

## Introduction& Business Problem

In this project, an optimal location for opening a new restaurant in Yangzhou, Jiangsu Province, China. This report is aimed at stakeholders who are interested in operating coffee shops around colleges or schools.

Since there are many cases that imply a business failure of choosing a noisy and fierce competing environment. Particularly, a not crowded location will be detected based on other competitors' place strategy. Another condition here is the proximity to the city center, which encourages consumer demands.

## Data description

Part I

To cope with this problem of location selection, these factors will be considered:

1.number of existing restaurants in the neighborhood

2.type of surrounding entertainments (attractive factors to target and potential customers)

3.commercial clusters identification

4.location of city center and relevant traffics distributed in subareas


Part II

We decide to use regularly spaced grid of locations, centered around city center, to define our neighborhoods. The data sources are shown as following:

1.get Yangzhou geometry information from CYBPO(https://postal-codes.cybo.com/china/yangzhou/),in which website the geometry data sets provided

2.get the detailed information in each circle by applying Foursquare API

3.calculate geometry information, under specific conditions (e.g., about 3 km from city center, and each has 600 meters each circle apart)(here mostly for screening appropriate geographic postion, including city center, commercial clusters, and finally, opportunities)


## Detecting neighborhoods

First and foremost, the city center and commercial groups have to be identified to find attractive places for coffee shops before we collect resturants and attractions information.

In [89]:
import pandas as pd    #import library

In [56]:
!wget -O Yangzhou.csv https://raw.githubusercontent.com/WindAlan-sw/Coursera_Capstone/master/Yangzhou.csv
    # 'Yangzhou.csv' this data file has been uploaded on my Github

--2021-01-04 13:33:47--  https://raw.githubusercontent.com/WindAlan-sw/Coursera_Capstone/master/Yangzhou.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 151.101.48.133
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|151.101.48.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1821 (1.8K) [text/plain]
Saving to: ‘Yangzhou.csv’


2021-01-04 13:33:47 (24.2 MB/s) - ‘Yangzhou.csv’ saved [1821/1821]



In [90]:
df=pd.read_csv('Yangzhou.csv')
df.head()
# We get the information about subareas within YANGZHOU city and responding geometry details for further study

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,211400,Yizheng,Yangtzu Garden,32.2754,119.1779
1,211405,Yizheng,Yangtzu Garden,32.3408,119.2415
2,211423,Yizheng,"Yangtzu Garden,Yizheng market",32.3842,119.2895
3,211900,Yizheng,Yizheng market,32.2826,119.1471
4,211931,Yizheng,"Renmen garden,Yizheng market",32.2968,119.1047


In [58]:
df.shape

(33, 5)

## Determining coordinate of Yangzhou

In [4]:
!pip install folium          #installing folium

Collecting folium
  Downloading folium-0.11.0-py2.py3-none-any.whl (93 kB)
[K     |████████████████████████████████| 93 kB 4.1 MB/s  eta 0:00:01
Collecting branca>=0.3.0
  Downloading branca-0.4.2-py3-none-any.whl (24 kB)
Installing collected packages: branca, folium
Successfully installed branca-0.4.2 folium-0.11.0


In [5]:
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup
from matplotlib import  pyplot as plt
import requests
import folium
from pandas.io.json import json_normalize
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
from geopy.geocoders import Nominatim
import json
import seaborn as sns
%matplotlib inline

In [9]:
address = 'Yangzhou, China'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Yangzhou are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Yangzhou are 32.3969935, 119.4077008.


In [12]:
map_YZ = folium.Map(location=[latitude, longitude], zoom_start=10)

for lat, lng, borough, neighborhood in zip(df['Latitude'], df['Longitude'],df['Borough'],df['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_YZ)   
    
map_YZ

In [13]:
print(df['Borough'].unique())

['Yizheng' 'Yangzhou' 'Guangling' 'Weiyang' 'Hanjiang' 'Jiangdu']


In [15]:
address = 'Yangzhou, China'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

map_north_YZ = folium.Map(location=[latitude, longitude], zoom_start=11)

for lat, lng, label in zip(df['Latitude'], df['Longitude'], df['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_north_YZ)  
    
map_north_YZ

From above result, we could find that there are six districts in Yangzhou and their map locations.

And from the map, it is easy to decide the commercial environment, then we detect commercial centers that may be attractive to college students and figure out where do those shops lie.

In [27]:
df.loc[15, 'Neighborhood']

'Ge Garden,Dongguan Street'

In [29]:
neighborhood_latitude = df.loc[15, 'Latitude']
neighborhood_longitude = df.loc[15, 'Longitude']

neighborhood_name = df.loc[15, 'Neighborhood'] 

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Ge Garden,Dongguan Street are 32.39340737, 119.40827.


## Explore surroundings and fit categories
### by using Foursquare

In [45]:
CLIENT_ID = 'IHZAS2LNSRCR3O4RFNQ2L0MWQG2J3KTKWZCFSKFOIIVQJV5T'
CLIENT_SECRET = 'PK3PTXPQVN5AJUXMG1O2M5KTQVKZJGGZ3G01NNIMGRMM3A1U'
VERSION = '20210104' 
LIMIT = 100 

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: IHZAS2LNSRCR3O4RFNQ2L0MWQG2J3KTKWZCFSKFOIIVQJV5T
CLIENT_SECRET:PK3PTXPQVN5AJUXMG1O2M5KTQVKZJGGZ3G01NNIMGRMM3A1U


In [91]:
LIMIT = 100 

radius = 10000 # set radius= 10k because of the huge range of Yangzhou city

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=IHZAS2LNSRCR3O4RFNQ2L0MWQG2J3KTKWZCFSKFOIIVQJV5T&client_secret=PK3PTXPQVN5AJUXMG1O2M5KTQVKZJGGZ3G01NNIMGRMM3A1U&v=20210104&ll=32.39340737,119.40827&radius=10000&limit=100'

In [32]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5ff3140e3e30493c51e2e22a'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Yangzhou',
  'headerFullLocation': 'Yangzhou',
  'headerLocationGranularity': 'city',
  'totalResults': 15,
  'suggestedBounds': {'ne': {'lat': 32.48340746000009,
    'lng': 119.51465697151237},
   'sw': {'lat': 32.30340727999991, 'lng': 119.30188302848764}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4c9ae83b78ffa09309bd7075',
       'name': 'Slender West Lake (瘦西湖)',
       'location': {'address': '28 Dahong Bridge Rd',
        'lat': 32.40795754204398,
        'lng': 119.41688656175528,
        'labeledLatLngs': [{'label': 'display',
          'lat': 32.407957542

In [33]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [127]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) 


filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]


nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)


nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

  app.launch_new_instance()


Unnamed: 0,name,categories,lat,lng
0,Dongguan Street (东关街),Historic Site,32.39977,119.44115
1,个园,Garden,32.401714,119.438198
2,Slender West Lake (瘦西湖),Lake,32.407958,119.416887
3,Yechun Teahouse (冶春茶社),Dim Sum Restaurant,32.402896,119.428614
4,Shangri-la Hotel Yangzhou (扬州香格里拉大酒店),Hotel,32.392654,119.363084



Now we get the venture list and it is needed to be more exactly located for each item


In [34]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) 


filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]


nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)


nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

  app.launch_new_instance()


Unnamed: 0,name,categories,lat,lng
0,Slender West Lake (瘦西湖),Lake,32.407958,119.416887
1,Dongguan Street (东关街),Historic Site,32.39977,119.44115
2,个园,Garden,32.401714,119.438198
3,Shangri-la Hotel Yangzhou (扬州香格里拉大酒店),Hotel,32.392654,119.363084
4,Yechun Teahouse (冶春茶社),Dim Sum Restaurant,32.402896,119.428614


In [35]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

15 venues were returned by Foursquare.


In [59]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
       
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [60]:
YZ_venues = getNearbyVenues(names=df['Neighborhood'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                  )

Yangtzu Garden
Yangtzu Garden
Yangtzu Garden,Yizheng market
Yizheng market
Renmen garden,Yizheng market
Yangzhou Uiversity
Shouxi Lake
Yangzhou University
Dongguan Street
Guangling campus
Shouxi Lake,Yangzhou University
Guangling campus
Shunda Square
Dongguan Street
Dongguan Street
Ge Garden,Dongguan Street
Yangzhou Industry Campus
Golden Landscape
Golden Landscape
Yangzhou University
Yangzhou University
Tongda College
Tongda College
UFO City Garden
Mingyue Lake,UFO City Garden
Mingyue Lake, Jinghua Mall
Jinghua Mall,Yangzhou Technology College
Sea Park
Sea Park
Highroad Station
Highroad Station
World Garden
Zhuyu Bay,Zhuyu Zoo


In [92]:
print(YZ_venues.shape)
YZ_venues

(39, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Yizheng market,32.2826,119.1471,浦东商场,32.283544,119.150932,Grocery Store
1,Yangzhou Uiversity,32.3825,119.4102,宝带农贸市场,32.383498,119.412698,Farmers Market
2,Yangzhou Uiversity,32.3825,119.4102,百岁鱼,32.381779,119.40598,Chinese Restaurant
3,Shouxi Lake,32.3945,119.4368,Starbucks (星巴克),32.393923,119.433171,Coffee Shop
4,Shouxi Lake,32.3945,119.4368,Starbucks (星巴克),32.39613,119.432859,Coffee Shop
5,Shouxi Lake,32.3945,119.4368,WuTing Teahouse - 五亭迎春茶社,32.393282,119.435813,Jiangsu Restaurant
6,Shouxi Lake,32.3945,119.4368,Yechun Teahouse (冶春茶社),32.396262,119.437189,Dim Sum Restaurant
7,Yangzhou University,32.3982,119.4286,McDonald's (麦当劳),32.397288,119.428544,Fast Food Restaurant
8,Yangzhou University,32.3982,119.4286,Starbucks (星巴克),32.39613,119.432859,Coffee Shop
9,Yangzhou University,32.3982,119.4286,文昌阁,32.396305,119.428296,Historic Site


In [63]:
YZ_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Dongguan Street,2,2,2,2,2,2
"Ge Garden,Dongguan Street",5,5,5,5,5,5
Sea Park,4,4,4,4,4,4
Shouxi Lake,4,4,4,4,4,4
"Shouxi Lake,Yangzhou University",2,2,2,2,2,2
Shunda Square,1,1,1,1,1,1
Yangzhou Industry Campus,6,6,6,6,6,6
Yangzhou Uiversity,2,2,2,2,2,2
Yangzhou University,12,12,12,12,12,12
Yizheng market,1,1,1,1,1,1


In [64]:
print('There are {} uniques categories.'.format(len(YZ_venues['Venue Category'].unique())))

There are 20 uniques categories.


From this generated venues list, we could find that coffee shops now are more popular besides colleges and very proximity to Shuoxi lake in Guangling, which groups could be seen as obvious commercial circles for coffee shop, while tea is more prevalent around Dongguan Street.

In [65]:
YZ_onehot = pd.get_dummies(YZ_venues[['Venue Category']], prefix="", prefix_sep="")


YZ_onehot['Neighborhood'] = YZ_venues['Neighborhood'] 


fixed_columns = [YZ_onehot.columns[-1]] + list(YZ_onehot.columns[:-1])
YZ_onehot = YZ_onehot[fixed_columns]

YZ_onehot.head()

Unnamed: 0,Neighborhood,Bookstore,Buffet,Café,Chinese Restaurant,Coffee Shop,Department Store,Dim Sum Restaurant,Farmers Market,Fast Food Restaurant,...,Historic Site,Hotel,Hotpot Restaurant,Huaiyang Restaurant,Ice Cream Shop,Jiangsu Restaurant,Motel,Movie Theater,Multiplex,Park
0,Yizheng market,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Yangzhou Uiversity,0,0,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0
2,Yangzhou Uiversity,0,0,0,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Shouxi Lake,0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Shouxi Lake,0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [67]:
YZ_onehot.shape

(39, 21)

In [68]:
YZ_grouped = YZ_onehot.groupby('Neighborhood').mean().reset_index()
YZ_grouped

Unnamed: 0,Neighborhood,Bookstore,Buffet,Café,Chinese Restaurant,Coffee Shop,Department Store,Dim Sum Restaurant,Farmers Market,Fast Food Restaurant,...,Historic Site,Hotel,Hotpot Restaurant,Huaiyang Restaurant,Ice Cream Shop,Jiangsu Restaurant,Motel,Movie Theater,Multiplex,Park
0,Dongguan Street,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Ge Garden,Dongguan Street",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.2,0.2,0.2,0.0,0.2,0.2,0.0,0.0,0.0
2,Sea Park,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.25,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0
3,Shouxi Lake,0.0,0.0,0.0,0.0,0.5,0.0,0.25,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0
4,"Shouxi Lake,Yangzhou University",0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5
5,Shunda Square,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0
6,Yangzhou Industry Campus,0.0,0.166667,0.0,0.166667,0.166667,0.0,0.0,0.0,0.0,...,0.0,0.166667,0.0,0.0,0.166667,0.166667,0.0,0.0,0.0,0.0
7,Yangzhou Uiversity,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.5,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Yangzhou University,0.083333,0.0,0.0,0.083333,0.083333,0.166667,0.0,0.0,0.166667,...,0.166667,0.166667,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0
9,Yizheng market,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [70]:
YZ_grouped.shape

(10, 21)

In [71]:
num_top_venues = 5

for hood in YZ_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = YZ_grouped[YZ_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Dongguan Street----
           venue  freq
0          Hotel   1.0
1      Bookstore   0.0
2         Buffet   0.0
3      Multiplex   0.0
4  Movie Theater   0.0


----Ge Garden,Dongguan Street----
                 venue  freq
0                Motel   0.2
1   Jiangsu Restaurant   0.2
2  Huaiyang Restaurant   0.2
3    Hotpot Restaurant   0.2
4                Hotel   0.2


----Sea Park----
                  venue  freq
0                  Café  0.50
1             Multiplex  0.25
2  Fast Food Restaurant  0.25
3             Bookstore  0.00
4                 Hotel  0.00


----Shouxi Lake----
                venue  freq
0         Coffee Shop  0.50
1  Dim Sum Restaurant  0.25
2  Jiangsu Restaurant  0.25
3           Bookstore  0.00
4               Hotel  0.00


----Shouxi Lake,Yangzhou University----
           venue  freq
0           Park   0.5
1    Coffee Shop   0.5
2          Hotel   0.0
3      Multiplex   0.0
4  Movie Theater   0.0


----Shunda Square----
                venue  freq
0  Jian

In [72]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [76]:
num_top_venues = 10
indicators = ['st', 'nd', 'rd']


columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))


neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = YZ_grouped['Neighborhood']

for ind in np.arange(YZ_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(YZ_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Dongguan Street,Hotel,Park,Fast Food Restaurant,Buffet,Café,Chinese Restaurant,Coffee Shop,Department Store,Dim Sum Restaurant,Farmers Market
1,"Ge Garden,Dongguan Street",Motel,Jiangsu Restaurant,Huaiyang Restaurant,Hotpot Restaurant,Hotel,Park,Dim Sum Restaurant,Buffet,Café,Chinese Restaurant
2,Sea Park,Café,Multiplex,Fast Food Restaurant,Park,Buffet,Chinese Restaurant,Coffee Shop,Department Store,Dim Sum Restaurant,Farmers Market
3,Shouxi Lake,Coffee Shop,Jiangsu Restaurant,Dim Sum Restaurant,Park,Fast Food Restaurant,Buffet,Café,Chinese Restaurant,Department Store,Farmers Market
4,"Shouxi Lake,Yangzhou University",Park,Coffee Shop,Multiplex,Buffet,Café,Chinese Restaurant,Department Store,Dim Sum Restaurant,Farmers Market,Fast Food Restaurant
5,Shunda Square,Jiangsu Restaurant,Park,Fast Food Restaurant,Buffet,Café,Chinese Restaurant,Coffee Shop,Department Store,Dim Sum Restaurant,Farmers Market
6,Yangzhou Industry Campus,Buffet,Jiangsu Restaurant,Ice Cream Shop,Chinese Restaurant,Coffee Shop,Hotel,Park,Farmers Market,Café,Department Store
7,Yangzhou Uiversity,Chinese Restaurant,Farmers Market,Park,Multiplex,Buffet,Café,Coffee Shop,Department Store,Dim Sum Restaurant,Fast Food Restaurant
8,Yangzhou University,Department Store,Hotel,Historic Site,Fast Food Restaurant,Bookstore,Movie Theater,Chinese Restaurant,Coffee Shop,Farmers Market,Buffet
9,Yizheng market,Grocery Store,Multiplex,Buffet,Café,Chinese Restaurant,Coffee Shop,Department Store,Dim Sum Restaurant,Farmers Market,Fast Food Restaurant


## Clustering Neighborhood

In [78]:
kclusters = 5

YZ_grouped_clustering = YZ_grouped.drop('Neighborhood', 1)


kmeans = KMeans(init = "k-means++", n_clusters = kclusters, n_init = 10).fit(YZ_grouped_clustering)

kmeans.labels_

array([2, 1, 1, 4, 4, 0, 1, 1, 1, 3], dtype=int32)

In [81]:

YZ_merged = df
YZ_merged = YZ_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood', how = 'inner')
YZ_merged

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,211900,Yizheng,Yizheng market,32.2826,119.1471,3,Grocery Store,Multiplex,Buffet,Café,Chinese Restaurant,Coffee Shop,Department Store,Dim Sum Restaurant,Farmers Market,Fast Food Restaurant
5,225000,Yangzhou,Yangzhou Uiversity,32.3825,119.4102,1,Chinese Restaurant,Farmers Market,Park,Multiplex,Buffet,Café,Coffee Shop,Department Store,Dim Sum Restaurant,Fast Food Restaurant
6,225001,Guangling,Shouxi Lake,32.3945,119.4368,4,Coffee Shop,Jiangsu Restaurant,Dim Sum Restaurant,Park,Fast Food Restaurant,Buffet,Café,Chinese Restaurant,Department Store,Farmers Market
7,225002,Guangling,Yangzhou University,32.3982,119.4286,1,Department Store,Hotel,Historic Site,Fast Food Restaurant,Bookstore,Movie Theater,Chinese Restaurant,Coffee Shop,Farmers Market,Buffet
19,225104,Hanjiang,Yangzhou University,32.3418,119.5118,1,Department Store,Hotel,Historic Site,Fast Food Restaurant,Bookstore,Movie Theater,Chinese Restaurant,Coffee Shop,Farmers Market,Buffet
20,225111,Hanjiang,Yangzhou University,32.3632,119.5405,1,Department Store,Hotel,Historic Site,Fast Food Restaurant,Bookstore,Movie Theater,Chinese Restaurant,Coffee Shop,Farmers Market,Buffet
8,225003,Guangling,Dongguan Street,32.3897,119.4619,2,Hotel,Park,Fast Food Restaurant,Buffet,Café,Chinese Restaurant,Coffee Shop,Department Store,Dim Sum Restaurant,Farmers Market
13,225008,Weiyang,Dongguan Street,32.4359,119.4054,2,Hotel,Park,Fast Food Restaurant,Buffet,Café,Chinese Restaurant,Coffee Shop,Department Store,Dim Sum Restaurant,Farmers Market
14,225009,Guangling,Dongguan Street,32.3849,119.4264,2,Hotel,Park,Fast Food Restaurant,Buffet,Café,Chinese Restaurant,Coffee Shop,Department Store,Dim Sum Restaurant,Farmers Market
10,225005,Guangling,"Shouxi Lake,Yangzhou University",32.4235,119.4206,4,Park,Coffee Shop,Multiplex,Buffet,Café,Chinese Restaurant,Department Store,Dim Sum Restaurant,Farmers Market,Fast Food Restaurant


In [83]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

markers_colors = []
for lat, lon, poi, cluster in zip(YZ_merged['Latitude'], YZ_merged['Longitude'], YZ_merged['Neighborhood'], YZ_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    print(cluster)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

3
1
4
1
1
1
2
2
2
4
0
1
1
1
1


In [84]:
YZ_merged.loc[YZ_merged['Cluster Labels'] == 0, YZ_merged.columns[[1] + list(range(5, YZ_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
12,Guangling,0,Jiangsu Restaurant,Park,Fast Food Restaurant,Buffet,Café,Chinese Restaurant,Coffee Shop,Department Store,Dim Sum Restaurant,Farmers Market


In [85]:
YZ_merged.loc[YZ_merged['Cluster Labels'] == 1, YZ_merged.columns[[1] + list(range(5, YZ_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Yangzhou,1,Chinese Restaurant,Farmers Market,Park,Multiplex,Buffet,Café,Coffee Shop,Department Store,Dim Sum Restaurant,Fast Food Restaurant
7,Guangling,1,Department Store,Hotel,Historic Site,Fast Food Restaurant,Bookstore,Movie Theater,Chinese Restaurant,Coffee Shop,Farmers Market,Buffet
19,Hanjiang,1,Department Store,Hotel,Historic Site,Fast Food Restaurant,Bookstore,Movie Theater,Chinese Restaurant,Coffee Shop,Farmers Market,Buffet
20,Hanjiang,1,Department Store,Hotel,Historic Site,Fast Food Restaurant,Bookstore,Movie Theater,Chinese Restaurant,Coffee Shop,Farmers Market,Buffet
15,Weiyang,1,Motel,Jiangsu Restaurant,Huaiyang Restaurant,Hotpot Restaurant,Hotel,Park,Dim Sum Restaurant,Buffet,Café,Chinese Restaurant
16,Hanjiang,1,Buffet,Jiangsu Restaurant,Ice Cream Shop,Chinese Restaurant,Coffee Shop,Hotel,Park,Farmers Market,Café,Department Store
27,Jiangdu,1,Café,Multiplex,Fast Food Restaurant,Park,Buffet,Chinese Restaurant,Coffee Shop,Department Store,Dim Sum Restaurant,Farmers Market
28,Jiangdu,1,Café,Multiplex,Fast Food Restaurant,Park,Buffet,Chinese Restaurant,Coffee Shop,Department Store,Dim Sum Restaurant,Farmers Market


In [86]:
YZ_merged.loc[YZ_merged['Cluster Labels'] == 2, YZ_merged.columns[[1] + list(range(5, YZ_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,Guangling,2,Hotel,Park,Fast Food Restaurant,Buffet,Café,Chinese Restaurant,Coffee Shop,Department Store,Dim Sum Restaurant,Farmers Market
13,Weiyang,2,Hotel,Park,Fast Food Restaurant,Buffet,Café,Chinese Restaurant,Coffee Shop,Department Store,Dim Sum Restaurant,Farmers Market
14,Guangling,2,Hotel,Park,Fast Food Restaurant,Buffet,Café,Chinese Restaurant,Coffee Shop,Department Store,Dim Sum Restaurant,Farmers Market


In [87]:
YZ_merged.loc[YZ_merged['Cluster Labels'] == 3, YZ_merged.columns[[1] + list(range(5, YZ_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Yizheng,3,Grocery Store,Multiplex,Buffet,Café,Chinese Restaurant,Coffee Shop,Department Store,Dim Sum Restaurant,Farmers Market,Fast Food Restaurant


In [88]:
YZ_merged.loc[YZ_merged['Cluster Labels'] == 4, YZ_merged.columns[[1] + list(range(5, YZ_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Guangling,4,Coffee Shop,Jiangsu Restaurant,Dim Sum Restaurant,Park,Fast Food Restaurant,Buffet,Café,Chinese Restaurant,Department Store,Farmers Market
10,Guangling,4,Park,Coffee Shop,Multiplex,Buffet,Café,Chinese Restaurant,Department Store,Dim Sum Restaurant,Farmers Market,Fast Food Restaurant


## Discussion and results