# Capstone Project - The Battle of the Neighborhoods

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

The goal of this project is to recommend a location for someone looking to open a restaurant in New York City.
We address the following questions:

1. Where are the popular locations for running a restaurant business? Are there any geographical patterns among these popular restaurants?
    To find the hottest spots, we search for restaurants in the Foursquare location data, cluster them by location, and locate the center of each cluster.

2. How often are these restaurants mentioned by Foursquare users?
    To check whether the hottest spot is also the most popular one, we retrieve each restaurant's tip count from Foursquare and examine the relationship between location and popularity.

3. Are there any patterns in the locations of these popular restaurants?
    Finally, we cluster the restaurants by data on their nearby venues and discuss the characteristics of each cluster.


## Data <a name="data"></a>

1. To answer the first question, we use the latitude and longitude of the restaurants from Foursquare, cluster the restaurants by location, and then compute the center of each cluster.

2. To see whether restaurants closer to the center of a cluster are more popular, we use each restaurant's tip count from Foursquare. We also divide the restaurants into levels by tip count.

3. To look for subtler patterns in the restaurants' locations, we cluster the restaurants using data on their nearby venues, for example using the nearby venues' categories as clustering features.
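Item 3 proposes using nearby venue categories as clustering features. A minimal sketch of one way to build such features, assuming a table with one row per (restaurant, nearby venue) pair; the toy data and the `Restaurant` column name are hypothetical:

```python
import pandas as pd

# Toy stand-in for a nearby-venue table: one row per (restaurant, nearby venue).
# The 'Restaurant' column name is hypothetical.
nearby = pd.DataFrame({
    'Restaurant': ['A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Coffee Shop', 'Bar', 'Coffee Shop', 'Park', 'Bar'],
})

# One-hot encode each nearby venue's category (one 0/1 column per category)
onehot = pd.get_dummies(nearby['Venue Category'], dtype=int)

# Average the indicator columns per restaurant: each row becomes the share of
# that restaurant's nearby venues falling into each category
features = onehot.groupby(nearby['Restaurant']).mean()
print(features)
```

Each row of `features` could then be passed to `KMeans` in the same way as the location coordinates; this illustrates the idea rather than reproducing the project's own feature pipeline.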


In [1]:
import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation


# !pip install geopy
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images
from IPython.display import Image, HTML

# transform JSON into a pandas dataframe (pandas.io.json.json_normalize is deprecated since pandas 1.0)
from pandas import json_normalize

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# clustering algorithms and feature scaling
from sklearn.cluster import KMeans
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

# ! pip install folium==0.5.0
import folium # plotting library


In [2]:
address = 'New York City, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of New York City are 40.7127281, -74.0060152.


In [3]:
# Import New York neighborhoods data
newyork_data = pd.read_csv('new_york_data.csv')
neighborhoods = newyork_data.iloc[:, 1:]
neighborhoods.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


In [None]:
# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork

In [3]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID'  # your Foursquare Client ID (do not publish real credentials)
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET'  # your Foursquare Client Secret
VERSION = '20210722'  # Foursquare API version date
LIMIT = 100 
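Hard-coding Foursquare credentials in a notebook exposes them to anyone who sees the file. A minimal sketch that reads them from the environment instead; the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_foursquare_credentials` are hypothetical choices:

```python
import os

def load_foursquare_credentials():
    """Return (client_id, client_secret) from environment variables.

    The variable names are hypothetical; export them in your shell before
    starting Jupyter, e.g. `export FOURSQUARE_CLIENT_ID=...`.
    """
    client_id = os.environ.get('FOURSQUARE_CLIENT_ID')
    client_secret = os.environ.get('FOURSQUARE_CLIENT_SECRET')
    if not client_id or not client_secret:
        # fail loudly rather than sending requests with empty credentials
        raise RuntimeError('Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first')
    return client_id, client_secret
```

In the notebook this would replace the literal strings with `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()`.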


In [None]:
neighborhood_latitude = neighborhoods.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = neighborhoods.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = neighborhoods.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

In [None]:
radius = 500
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, neighborhood_latitude, neighborhood_longitude, VERSION, radius, LIMIT)
url
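The request URLs here are assembled with `str.format`, which leaves the parameter values unescaped. A sketch using only the standard library's `urllib.parse.urlencode`; `build_explore_url` is a hypothetical helper, and its parameter names simply mirror the query string above:

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, lat, lng,
                      version='20210722', radius=500, limit=100):
    """Build a Foursquare v2 /venues/explore URL with properly escaped values."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'll': '{},{}'.format(lat, lng),  # the comma is percent-encoded as %2C
        'v': version,
        'radius': radius,
        'limit': limit,
    }
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)

print(build_explore_url('ID', 'SECRET', 40.894705, -73.847201))
```

Passing the same dictionary to `requests.get(base_url, params=params)` lets `requests` do this encoding instead.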

In [None]:
results = requests.get(url).json()
results

In [4]:
# function that extracts the category of the venue
def get_category_type(row):
    # search results use 'categories'; flattened explore results use 'venue.categories'
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [None]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()
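`get_category_type` is needed because a venue's `categories` list can be empty. The same extraction can be sketched without pandas on a trimmed, hypothetical explore response; the sample only mimics the nesting the cell reads (`response`, `groups`, `items`, `venue`) and is not real API output:

```python
# A trimmed, hypothetical explore response with the nesting read above
sample = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Cafe X', 'categories': [{'name': 'Cafe'}],
               'location': {'lat': 40.89, 'lng': -73.85}}},
    {'venue': {'name': 'Unnamed Spot', 'categories': [],
               'location': {'lat': 40.90, 'lng': -73.84}}},
]}]}}

def extract_venues(results):
    """Return (name, category, lat, lng) tuples; venues without categories get None."""
    rows = []
    for item in results['response']['groups'][0]['items']:
        venue = item['venue']
        categories = venue.get('categories', [])
        category = categories[0]['name'] if categories else None
        rows.append((venue['name'], category,
                     venue['location']['lat'], venue['location']['lng']))
    return rows

print(extract_venues(sample))
```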

In [None]:
search_query = 'restaurant'
radius = 500

url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, neighborhood_latitude, neighborhood_longitude, VERSION, search_query, radius, LIMIT)

results = requests.get(url).json()

# assign relevant part of JSON to venues
venues = results['response']['venues']

# transform venues into a dataframe
dataframe = json_normalize(venues)

# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe.loc[:, filtered_columns]

# filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

dataframe_filtered

In [None]:
venues_map = folium.Map(location=[neighborhood_latitude, neighborhood_longitude], zoom_start=13) # generate map centred on the neighborhood

# add a red circle marker to represent the neighborhood center
folium.CircleMarker(
    [neighborhood_latitude, neighborhood_longitude],
    radius=10,
    color='red',
    popup=neighborhood_name,
    fill=True,
    fill_color='red',
    fill_opacity=0.6
).add_to(venues_map)

# add the restaurants returned by the search as blue circle markers
for lat, lng, label in zip(dataframe_filtered.lat, dataframe_filtered.lng, dataframe_filtered.categories):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill=True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

# display map
venues_map

### Explore Neighborhoods in Manhattan


In [5]:
manhattan_data = neighborhoods[neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)
manhattan_data


In [6]:
address = 'Manhattan, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Manhattan are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Manhattan are 40.7896239, -73.9598939.


In [7]:
# create map of Manhattan using latitude and longitude values
map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_manhattan)  
    
map_manhattan

In [7]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Query the Foursquare v2 explore endpoint around each point and
    return one row per venue found."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)

        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']

        # return only relevant information for each nearby venue
        # (a venue without categories gets None instead of raising an IndexError)
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name'] if v['venue']['categories'] else None,
            v['venue']['id']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category',
                  'Venue ID']
    
    return(nearby_venues)
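Foursquare limits how many API calls an account can make, which is why the notebook caches `manhattan_venues` to CSV and reads it back instead of re-querying. A generic sketch of the same fetch-once idea for raw JSON responses; `load_or_fetch` and any cache path you pass it are hypothetical:

```python
import json
import os

def load_or_fetch(cache_path, fetch):
    """Return cached JSON from cache_path if it exists; otherwise call fetch(),
    save its (JSON-serializable) result to cache_path, and return it."""
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            return json.load(f)
    data = fetch()
    with open(cache_path, 'w') as f:
        json.dump(data, f)
    return data

# Hypothetical usage with the notebook's names:
# results = load_or_fetch('explore_wakefield.json', lambda: requests.get(url).json())
```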

In [None]:
# manhattan_venues = getNearbyVenues(names=manhattan_data['Neighborhood'], 
#                                     latitudes=manhattan_data['Latitude'], 
#                                     longitudes=manhattan_data['Longitude']
#                                     )

In [8]:

# manhattan_venues.to_csv('manhattan_venues.csv')  # run once after the API calls to cache the results

manhattan_venues = pd.read_csv('manhattan_venues.csv').iloc[:, 1:]
print(manhattan_venues.shape)
manhattan_venues.head()

(3267, 8)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,Venue ID
0,Marble Hill,40.876551,-73.91066,Arturo's,40.874412,-73.910271,Pizza Place,4b4429abf964a52037f225e3
1,Marble Hill,40.876551,-73.91066,Bikram Yoga,40.876844,-73.906204,Yoga Studio,4baf59e8f964a520a6f93be3
2,Marble Hill,40.876551,-73.91066,Tibbett Diner,40.880404,-73.908937,Diner,4b79cc46f964a520c5122fe3
3,Marble Hill,40.876551,-73.91066,Dunkin',40.877136,-73.906666,Donut Shop,4b5357adf964a520319827e3
4,Marble Hill,40.876551,-73.91066,Starbucks,40.877531,-73.905582,Coffee Shop,55f81cd2498ee903149fcc64


In [9]:
# Select rows whose 'Venue ID' already appeared earlier, i.e. venues returned for more than one neighborhood
duplicateRows = manhattan_venues.duplicated(['Venue ID'])
print("Duplicate Rows based on 'Venue ID' column are:", manhattan_venues[duplicateRows], sep='\n')

Duplicate Rows based on 'Venue ID' column are:
      Neighborhood  Neighborhood Latitude  Neighborhood Longitude  \
688     Lenox Hill              40.768113              -73.958860   
713     Lenox Hill              40.768113              -73.958860   
729     Lenox Hill              40.768113              -73.958860   
734     Lenox Hill              40.768113              -73.958860   
756     Lenox Hill              40.768113              -73.958860   
...            ...                    ...                     ...   
3239  Hudson Yards              40.756658              -74.000111   
3240  Hudson Yards              40.756658              -74.000111   
3245  Hudson Yards              40.756658              -74.000111   
3246  Hudson Yards              40.756658              -74.000111   
3252  Hudson Yards              40.756658              -74.000111   

                                             Venue  Venue Latitude  \
688                                   Paper Source    

In [10]:
manhattan_venues.drop_duplicates('Venue ID', inplace=True, ignore_index=True)
manhattan_venues.shape

(3091, 8)

In [11]:
manhattan_restaurants = manhattan_venues[manhattan_venues['Venue Category'].str.contains('Restaurant')]
manhattan_restaurants.shape

(873, 8)

In [13]:
# create map of Manhattan using latitude and longitude values
map_manhattan_restaurants = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, label in zip(manhattan_restaurants['Venue Latitude'], manhattan_restaurants['Venue Longitude'], manhattan_restaurants['Venue']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_manhattan_restaurants)  
    
map_manhattan_restaurants

In [15]:
# ratings = []
# tips_counts = []
# venue_ids_400 = manhattan_restaurants['Venue ID'].head(400)

# for venue_id in venue_ids_400:
# 	url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION)

# 	result = requests.get(url).json()
# 	try:
# 		ratings.append(result['response']['venue']['rating'])
# 	except:
# 		ratings.append(np.nan)

# 	try:
# 		tips_counts.append(result['response']['venue']['tips']['count'])
# 	except:
# 		tips_counts.append(np.nan)

# print(ratings)
# print(tips_counts)


[7.1, nan, 8.7, 9.3, 8.4, 9.1, 8.6, 9.0, 8.7, 8.6, 8.1, 8.0, 8.6, 9.4, 8.3, 9.2, 8.2, 8.2, 8.0, 8.7, 7.9, 8.7, 8.8, 8.6, 8.1, 8.2, 8.3, 8.6, 8.4, 8.2, 8.4, 8.9, 8.3, 8.6, 9.0, 8.5, 7.8, 9.2, 8.2, 7.7, 8.9, 8.5, 7.8, 8.6, 7.9, 8.3, 7.9, 7.9, 8.2, 7.8, 7.5, 7.4, 7.0, 7.0, 7.3, 7.6, 7.4, 7.1, 6.8, 7.2, nan, 8.3, 8.1, 7.7, 8.3, 7.4, 7.9, 7.4, 8.0, 7.3, 7.1, 7.3, 7.3, 7.0, 7.2, 6.8, 7.0, 6.2, nan, 9.1, 8.8, 8.2, 8.1, 8.0, 8.1, 7.7, 7.7, 7.9, 8.3, 7.7, 7.6, 7.5, 7.2, 7.1, 7.0, 6.9, 5.9, 8.3, 7.6, 9.0, 8.0, 7.9, 7.6, 7.8, 7.8, 8.1, 7.6, 8.1, 7.3, 7.2, 7.5, 7.2, 6.6, 6.4, nan, 8.5, 8.6, 8.6, 8.1, 8.2, 7.8, 9.1, 7.6, 7.3, 8.0, 7.4, 7.1, 7.7, 7.1, 7.2, nan, 8.7, 8.0, 8.4, 8.1, 8.3, 7.8, 8.8, 8.3, 8.4, 7.8, 7.4, 7.5, 8.0, 7.8, 8.8, 8.5, 8.7, 8.2, 8.0, 8.4, 9.2, 8.7, 8.5, 8.3, 8.0, 8.2, 7.9, 8.0, 8.7, 7.9, 8.1, 7.6, 7.6, 7.9, 8.3, 7.8, 7.9, 7.8, 8.3, 8.5, 8.7, 9.0, 8.3, 8.9, 8.5, 8.2, 8.4, 7.8, 8.1, 8.8, 9.1, 8.4, 8.5, 7.3, 8.7, 8.3, 8.0, 8.1, 8.1, 7.3, 7.0, 7.5, 7.4, 7.2, 7.5, 7.5, 7.3, 7.2, 9.1,
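The commented-out loops wrap each lookup in a bare `except:`, which also swallows unrelated errors such as a `NameError` from a mistyped variable or a `KeyboardInterrupt`. A sketch using `dict.get` defaults instead; `rating_and_tips` is a hypothetical helper, and the responses are trimmed mock-ups following the keys the loop reads (`response`, `venue`, `rating`, `tips`, `count`):

```python
import math

def rating_and_tips(result):
    """Pull (rating, tips count) from a venue-details response; missing fields become NaN."""
    venue = result.get('response', {}).get('venue', {})
    rating = venue.get('rating', math.nan)
    tips = venue.get('tips', {}).get('count', math.nan)
    return rating, tips

# Hypothetical trimmed responses: complete, without a rating, and without a venue
print(rating_and_tips({'response': {'venue': {'rating': 8.7, 'tips': {'count': 180}}}}))
print(rating_and_tips({'response': {'venue': {'tips': {'count': 0}}}}))
print(rating_and_tips({'response': {}}))
```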

In [19]:
# rating_tipsCount_400 = pd.DataFrame(zip(ratings, tips_counts), columns=['Rating', 'Tips Count'])
# rating_tipsCount_400.to_csv('rating_tipsCount_400.csv')
# rating_tipsCount_400.head()

Unnamed: 0,Rating,Tips Count
0,7.1,19
1,,0
2,8.7,180
3,9.3,205
4,8.4,99


In [15]:
# ratings_tail = []
# tips_counts_tail = []
# # venue_ids_tail = manhattan_restaurants['Venue ID'].tail()
# venue_ids_tail = manhattan_restaurants['Venue ID'].tail(len(manhattan_restaurants['Venue ID']) - 400)

# for venue_id in venue_ids_tail:
# 	url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION)

# 	result = requests.get(url).json()
# 	try:
# 		ratings_tail.append(result['response']['venue']['rating'])
# 	except:
# 		ratings_tail.append(np.nan)

# 	try:
# 		tips_counts_tail.append(result['response']['venue']['tips']['count'])
# 	except:
# 		tips_counts_tail.append(np.nan)

# print(ratings_tail)
# print(tips_counts_tail)


[9.0, 8.7, 9.0, 8.2, 8.3, 8.5, 9.0, 8.2, 9.1, 8.6, 8.1, 8.7, 8.8, 9.0, 8.7, 8.4, 8.9, 8.8, 8.1, 8.2, 8.4, 9.0, 9.3, 8.1, 9.3, 9.2, 9.2, 9.1, 8.9, 9.0, 8.7, 9.1, 9.2, 8.6, 8.6, 9.0, 8.9, 8.5, 8.3, 8.3, 8.3, 8.5, 8.8, 8.5, 8.7, 8.6, 9.0, 8.4, 8.2, 8.1, 9.1, 8.8, 8.5, 8.3, 7.9, 8.8, 9.4, 7.9, 8.1, 8.2, 8.4, 9.2, 9.0, 8.8, 8.6, 9.1, 8.2, 8.3, 9.1, 7.9, 7.9, 8.1, 8.0, 7.8, 7.4, nan, 5.6, 9.2, 8.7, 8.5, 8.8, 8.7, 8.6, 8.0, 8.0, 9.1, 8.1, 8.5, 9.0, 9.0, 8.9, 8.8, 8.6, 8.5, 7.9, 8.4, 8.0, 7.8, 7.5, 7.7, 7.8, 7.5, 9.4, 9.0, 9.1, 8.5, 9.2, 8.9, 8.9, 8.9, 9.2, 9.1, 8.7, 8.6, 8.7, 8.8, 9.1, 8.1, 8.2, 8.8, 9.3, 8.7, 9.0, 8.9, 8.5, 8.7, 8.6, 8.6, 8.5, 8.7, 8.9, 9.4, 9.1, 9.1, 9.2, 8.9, 8.8, 8.8, 8.8, 8.6, 9.2, 8.5, 8.8, 9.3, 8.7, 8.1, 8.4, 8.5, 8.4, 8.8, 9.3, 8.8, 8.5, 7.8, 8.6, 8.4, 9.3, 8.3, 9.1, 8.6, 9.3, 7.7, 8.9, 9.2, 7.5, 8.6, 8.7, 8.6, 8.1, 8.1, 8.0, 8.3, 8.5, 8.1, 7.9, 8.4, 7.9, 7.9, 8.0, 8.0, 7.4, 7.7, 7.6, 7.5, 8.2, 7.7, 8.0, 7.7, 8.1, 7.6, 7.3, 6.5, 6.7, 6.1, 8.7, 9.0, 7.8, 8.1, 7.7, 8.6,

In [16]:
# rating_tipsCount_tail = pd.DataFrame(zip(ratings_tail, tips_counts_tail), columns=['Rating', 'Tips Count'])
# rating_tipsCount_tail.to_csv('rating_tipsCount_tail.csv')
# rating_tipsCount_tail.head()

Unnamed: 0,Rating,Tips Count
0,9.0,2
1,8.7,188
2,9.0,395
3,8.2,28
4,8.3,39


In [21]:
# rating_tipsCount_400 = pd.read_csv('rating_tipsCount_400.csv')
# rating_tipsCount = pd.concat([rating_tipsCount_400, rating_tipsCount_tail], axis=0, ignore_index=True).iloc[:, 1:]
# rating_tipsCount.to_csv('rating_tipsCount.csv')
# rating_tipsCount

Unnamed: 0,Rating,Tips Count
0,7.1,19
1,,0
2,8.7,180
3,9.3,205
4,8.4,99
...,...,...
868,7.5,3
869,7.2,3
870,6.8,6
871,6.5,7


In [12]:
# manhattan_restaurants.reset_index(inplace=True, drop=True)
# manhattan_restaurants_merged = pd.concat([manhattan_restaurants, rating_tipsCount], axis=1)
# manhattan_restaurants_merged.to_csv('manhattan_restaurants_merged.csv')
manhattan_restaurants_merged = pd.read_csv('manhattan_restaurants_merged.csv').iloc[:, 1:]
manhattan_restaurants_merged

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,Venue ID,Cluster Label,Rating,Tips Count
0,Marble Hill,40.876551,-73.910660,Land & Sea Restaurant,40.877885,-73.905873,Seafood Restaurant,4b9c9c6af964a520b27236e3,-1,7.1,19
1,Marble Hill,40.876551,-73.910660,Grill 26 at TCR,40.878802,-73.915672,American Restaurant,5012c967e889cf0567e9e2d4,-1,,0
2,Chinatown,40.715618,-73.994279,Spicy Village,40.717010,-73.993530,Chinese Restaurant,4db3374590a0843f295fb69b,0,8.7,180
3,Chinatown,40.715618,-73.994279,Kiki's,40.714476,-73.992036,Greek Restaurant,5521c2ff498ebe2368634187,0,9.3,205
4,Chinatown,40.715618,-73.994279,Wah Fung Number 1 Fast Food 華豐快餐店,40.717278,-73.994177,Chinese Restaurant,4a96bf8ff964a520ce2620e3,0,8.4,99
...,...,...,...,...,...,...,...,...,...,...,...
868,Hudson Yards,40.756658,-74.000111,Via Trenta,40.753004,-74.002898,Italian Restaurant,57e55e46498e04b0dc14dbb0,-1,7.5,3
869,Hudson Yards,40.756658,-74.000111,Nitti’s,40.756726,-73.994175,Italian Restaurant,5bd10c0ca35dce002cb16e6c,-1,7.2,3
870,Hudson Yards,40.756658,-74.000111,Treadwell,40.759964,-73.996284,Restaurant,5bb17b9531ac6c0039f150cf,-1,6.8,6
871,Hudson Yards,40.756658,-74.000111,EDEN Local,40.759909,-73.996301,Restaurant,5a0264e01ffe977e0fea5da3,-1,6.5,7


## Methodology <a name="methodology"></a>

## Analysis <a name="analysis"></a>

### Cluster Restaurants

In [15]:
# set number of clusters
kclusters = 20

restaurants_clustering = manhattan_restaurants_merged[['Venue Latitude', 'Venue Longitude']]

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(restaurants_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([13, 13,  4,  4,  4,  4,  4,  4,  4,  4], dtype=int32)

In [16]:
centers = kmeans.cluster_centers_
print(centers[0:5])

centers_lat = centers[:, 0]
centers_lon = centers[:, 1]
print(centers_lat)
print(centers_lon)

manhattan_restaurants_merged['Cluster Label'] = kmeans.labels_
manhattan_restaurants_merged.head()

[[ 40.75798635 -73.99511329]
 [ 40.8139778  -73.95867519]
 [ 40.7393717  -73.98747078]
 [ 40.77820375 -73.95238728]
 [ 40.71801906 -73.99483248]]
[40.75798635 40.8139778  40.7393717  40.77820375 40.71801906 40.85167019
 40.74889509 40.78317683 40.71910685 40.7681     40.7918972  40.74270368
 40.73023762 40.86750562 40.80085409 40.75105165 40.70976915 40.82020639
 40.75761532 40.72422591]
[-73.99511329 -73.95867519 -73.98747078 -73.95238728 -73.99483248
 -73.93681915 -73.97457042 -73.97724838 -74.0078688  -73.95833415
 -73.9441381  -74.00380171 -74.00285093 -73.91970163 -73.96576466
 -73.98411987 -74.01065808 -73.94690834 -73.96633925 -73.98544014]


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,Venue ID,Cluster Label,Rating,Tips Count
0,Marble Hill,40.876551,-73.91066,Land & Sea Restaurant,40.877885,-73.905873,Seafood Restaurant,4b9c9c6af964a520b27236e3,13,7.1,19
1,Marble Hill,40.876551,-73.91066,Grill 26 at TCR,40.878802,-73.915672,American Restaurant,5012c967e889cf0567e9e2d4,13,,0
2,Chinatown,40.715618,-73.994279,Spicy Village,40.71701,-73.99353,Chinese Restaurant,4db3374590a0843f295fb69b,4,8.7,180
3,Chinatown,40.715618,-73.994279,Kiki's,40.714476,-73.992036,Greek Restaurant,5521c2ff498ebe2368634187,4,9.3,205
4,Chinatown,40.715618,-73.994279,Wah Fung Number 1 Fast Food 華豐快餐店,40.717278,-73.994177,Chinese Restaurant,4a96bf8ff964a520ce2620e3,4,8.4,99
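Question 2 asks whether restaurants closer to their cluster center are more popular. One way to quantify "closer" is the great-circle (haversine) distance from each restaurant to its own k-means center. A sketch with NumPy, assuming a spherical Earth of radius 6371 km; `haversine_km` and `distance_to_own_center` are hypothetical helpers:

```python
import numpy as np

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres (inputs in degrees; NumPy arrays broadcast)."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))

def distance_to_own_center(lats, lons, labels, centers):
    """Distance from each point to the center of the cluster it was assigned to."""
    centers = np.asarray(centers, dtype=float)
    own = centers[np.asarray(labels)]  # row i = [lat, lon] of point i's center
    return haversine_km(np.asarray(lats, dtype=float), np.asarray(lons, dtype=float),
                        own[:, 0], own[:, 1])

# Hypothetical usage with the notebook's objects:
# dist = distance_to_own_center(manhattan_restaurants_merged['Venue Latitude'],
#                               manhattan_restaurants_merged['Venue Longitude'],
#                               kmeans.labels_, kmeans.cluster_centers_)
# np.corrcoef(dist, manhattan_restaurants_merged['Tips Count'])[0, 1]
```

A negative correlation would be consistent with "closer to the center means more tips", though a single coefficient says nothing about causation.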


In [47]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme: one rainbow color per k-means cluster
x = np.arange(kclusters)
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map (k-means labels run from 0 to kclusters-1)
for lat, lon, lab, cluster in zip(manhattan_restaurants_merged['Venue Latitude'], manhattan_restaurants_merged['Venue Longitude'], manhattan_restaurants_merged['Venue'], manhattan_restaurants_merged['Cluster Label']):
    label = folium.Popup(lab, parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

for lat, lon, cluster in zip(centers_lat, centers_lon, x):
    label = folium.Popup('Cluster '+str(cluster), parse_html=True)
    folium.Marker(
        [lat, lon],
        popup=label).add_to(map_clusters)
       
map_clusters

In [17]:
# Compute DBSCAN
clustering_transformed = StandardScaler().fit_transform(restaurants_clustering)
db = DBSCAN(eps=0.1, min_samples=20).fit(clustering_transformed)

manhattan_restaurants_merged['Cluster Label'] = db.labels_
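n_clusters = len(manhattan_restaurants_merged['Cluster Label'].unique())  # number of unique labels, including noise (-1)
print('The number of clusters (excluding noise, label -1) is', n_clusters - 1)
manhattan_restaurants_merged.head()

The number of clusters (excluding noise, label -1) is 12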


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,Venue ID,Cluster Label,Rating,Tips Count
0,Marble Hill,40.876551,-73.91066,Land & Sea Restaurant,40.877885,-73.905873,Seafood Restaurant,4b9c9c6af964a520b27236e3,-1,7.1,19
1,Marble Hill,40.876551,-73.91066,Grill 26 at TCR,40.878802,-73.915672,American Restaurant,5012c967e889cf0567e9e2d4,-1,,0
2,Chinatown,40.715618,-73.994279,Spicy Village,40.71701,-73.99353,Chinese Restaurant,4db3374590a0843f295fb69b,0,8.7,180
3,Chinatown,40.715618,-73.994279,Kiki's,40.714476,-73.992036,Greek Restaurant,5521c2ff498ebe2368634187,0,9.3,205
4,Chinatown,40.715618,-73.994279,Wah Fung Number 1 Fast Food 華豐快餐店,40.717278,-73.994177,Chinese Restaurant,4a96bf8ff964a520ce2620e3,0,8.4,99
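The values `eps=0.1` and `min_samples=20` are not justified in the notebook. A common heuristic is the k-distance plot: sort every point's distance to its k-th nearest neighbor and look for an elbow, which suggests an `eps`. Since scikit-learn's `min_samples` counts the point itself, k = `min_samples` - 1 other points is the matching choice. A brute-force sketch with NumPy only; `k_distances` is a hypothetical helper:

```python
import numpy as np

def k_distances(X, k):
    """Sorted distance from each point to its k-th nearest *other* point.

    Brute force with O(n^2) memory, which is fine for roughly 900 restaurants.
    """
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))  # pairwise Euclidean distances
    dist.sort(axis=1)                         # column 0 is each point's distance to itself
    return np.sort(dist[:, k])

# Hypothetical usage on the scaled coordinates with min_samples=20:
# kd = k_distances(clustering_transformed, k=19)
# then plot kd against its index and read eps off the elbow
```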


In [17]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme: one rainbow color per DBSCAN cluster, plus grey as the
# last entry so that noise points (label -1) pick up rainbow[-1]
colors_array = cm.rainbow(np.linspace(0, 1, n_clusters - 1))
rainbow = [colors.rgb2hex(i) for i in colors_array] + ['#a9a9a9']

# add markers to the map
for lat, lon, cluster in zip(manhattan_restaurants_merged['Venue Latitude'], manhattan_restaurants_merged['Venue Longitude'], manhattan_restaurants_merged['Cluster Label']):
    label = folium.Popup('Cluster '+str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [18]:
cluster_mean = manhattan_restaurants_merged.groupby('Cluster Label')[['Rating', 'Tips Count']].mean()
cluster_mean

Unnamed: 0_level_0,Rating,Tips Count
Cluster Label,Unnamed: 1_level_1,Unnamed: 2_level_1
-1,8.126969,59.019417
0,8.684,115.52
1,8.114815,44.333333
2,8.291667,88.0
3,8.006061,47.393939
4,8.730435,125.173913
5,8.678049,128.829268
6,8.351613,100.193548
7,8.675,129.708333
8,8.671429,88.107143


In [55]:
manhattan_restaurants_merged.describe()

Unnamed: 0,Neighborhood Latitude,Neighborhood Longitude,Venue Latitude,Venue Longitude,Cluster Label,Rating,Tips Count
count,873.0,873.0,873.0,873.0,873.0,866.0,873.0
mean,40.756491,-73.978977,40.756554,-73.978969,1.613975,8.276905,77.356243
std,0.037122,0.022116,0.036942,0.022005,3.759531,0.605025,113.980122
min,40.707107,-74.016869,40.704514,-74.018322,-1.0,5.5,0.0
25%,40.726933,-73.997305,40.727157,-73.996284,-1.0,7.9,10.0
50%,40.74851,-73.981669,40.749192,-73.983505,-1.0,8.3,35.0
75%,40.77593,-73.963556,40.775688,-73.961793,4.0,8.7,98.0
max,40.876551,-73.91066,40.878802,-73.905873,11.0,9.5,1050.0


In [60]:
manhattan_restaurants_merged[['Rating', 'Tips Count']].dropna(axis=0).corr()


Unnamed: 0,Rating,Tips Count
Rating,1.0,0.396506
Tips Count,0.396506,1.0
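Tip counts are heavily right-skewed (median 35, maximum 1050 in the `describe()` output), so the Pearson coefficient of 0.396506 may be dominated by a few very popular venues. Spearman's rank correlation is a common complement. The sketch below uses made-up values to show the difference; `DataFrame.corr(method='spearman')` applies equally to the notebook's `Rating` and `Tips Count` columns:

```python
import pandas as pd

# Made-up data: Tips Count rises monotonically with Rating but not linearly,
# and the last venue is an extreme outlier
toy = pd.DataFrame({'Rating': [6.5, 7.2, 7.9, 8.4, 9.3],
                    'Tips Count': [3, 10, 35, 99, 1050]})

pearson = toy.corr(method='pearson').loc['Rating', 'Tips Count']
spearman = toy.corr(method='spearman').loc['Rating', 'Tips Count']  # ranks agree exactly
print(pearson, spearman)
```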


In [61]:
cluster_mean.corr()

Unnamed: 0,Rating,Tips Count
Rating,1.0,0.87544
Tips Count,0.87544,1.0


In [19]:
# Bin tips counts into four "hot degree" levels using the quartiles
# reported by describe() above (25% = 10, 50% = 35, 75% = 98)
hot_degree = []
for n in manhattan_restaurants_merged['Tips Count']:
    if n <= 10:
        hot_degree.append(1)
    elif n <= 35:
        hot_degree.append(2)
    elif n <= 98:
        hot_degree.append(3)
    else:
        hot_degree.append(4)

manhattan_restaurants_merged['Hot Degree'] = hot_degree
manhattan_restaurants_merged

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,Venue ID,Cluster Label,Rating,Tips Count,Hot Degree
0,Marble Hill,40.876551,-73.910660,Land & Sea Restaurant,40.877885,-73.905873,Seafood Restaurant,4b9c9c6af964a520b27236e3,-1,7.1,19,2
1,Marble Hill,40.876551,-73.910660,Grill 26 at TCR,40.878802,-73.915672,American Restaurant,5012c967e889cf0567e9e2d4,-1,,0,1
2,Chinatown,40.715618,-73.994279,Spicy Village,40.717010,-73.993530,Chinese Restaurant,4db3374590a0843f295fb69b,0,8.7,180,4
3,Chinatown,40.715618,-73.994279,Kiki's,40.714476,-73.992036,Greek Restaurant,5521c2ff498ebe2368634187,0,9.3,205,4
4,Chinatown,40.715618,-73.994279,Wah Fung Number 1 Fast Food 華豐快餐店,40.717278,-73.994177,Chinese Restaurant,4a96bf8ff964a520ce2620e3,0,8.4,99,4
...,...,...,...,...,...,...,...,...,...,...,...,...
868,Hudson Yards,40.756658,-74.000111,Via Trenta,40.753004,-74.002898,Italian Restaurant,57e55e46498e04b0dc14dbb0,-1,7.5,3,1
869,Hudson Yards,40.756658,-74.000111,Nitti’s,40.756726,-73.994175,Italian Restaurant,5bd10c0ca35dce002cb16e6c,-1,7.2,3,1
870,Hudson Yards,40.756658,-74.000111,Treadwell,40.759964,-73.996284,Restaurant,5bb17b9531ac6c0039f150cf,-1,6.8,6,1
871,Hudson Yards,40.756658,-74.000111,EDEN Local,40.759909,-73.996301,Restaurant,5a0264e01ffe977e0fea5da3,-1,6.5,7,1
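The thresholds 10, 35 and 98 are the 25%, 50% and 75% quantiles of `Tips Count` from `describe()`. The same binning can be expressed with `pandas.cut`; this sketch on made-up counts checks the bin boundaries:

```python
import numpy as np
import pandas as pd

tips = pd.Series([0, 10, 11, 35, 36, 98, 99, 1050])

# Right-closed bins (-inf, 10], (10, 35], (35, 98], (98, inf] reproduce the
# n <= 10 / n <= 35 / n <= 98 / else branches of the loop
hot = pd.cut(tips, bins=[-np.inf, 10, 35, 98, np.inf], labels=[1, 2, 3, 4]).astype(int)
print(hot.tolist())  # [1, 1, 2, 2, 3, 3, 4, 4]
```

`pd.cut` returns NaN for missing counts, so `.astype(int)` would fail on them; `Tips Count` has no missing values here (count 873). `pd.qcut(tips, 4, labels=[1, 2, 3, 4])` would compute quantile edges directly, though those edges come from the data and may not match rounded thresholds exactly.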


In [24]:
manhattan_restaurants_merged.groupby('Hot Degree').count()[['Venue']]

Unnamed: 0_level_0,Venue
Hot Degree,Unnamed: 1_level_1
1,219
2,224
3,213
4,217


In [20]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme for the clusters
x = np.arange(4)
colors_array = cm.rainbow(np.linspace(0, 1, 4))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, cluster in zip(manhattan_restaurants_merged['Venue Latitude'], manhattan_restaurants_merged['Venue Longitude'], manhattan_restaurants_merged['Hot Degree']):
    label = folium.Popup('Hot Degree '+str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

In [20]:
# Dense Zone = 1 if the restaurant belongs to a DBSCAN cluster, 0 if it is noise (label -1)
manhattan_restaurants_merged['Dense Zone'] = (manhattan_restaurants_merged['Cluster Label'] != -1).astype(int)
manhattan_restaurants_merged

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,Venue ID,Cluster Label,Rating,Tips Count,Hot Degree,Dense Zone
0,Marble Hill,40.876551,-73.910660,Land & Sea Restaurant,40.877885,-73.905873,Seafood Restaurant,4b9c9c6af964a520b27236e3,-1,7.1,19,2,0
1,Marble Hill,40.876551,-73.910660,Grill 26 at TCR,40.878802,-73.915672,American Restaurant,5012c967e889cf0567e9e2d4,-1,,0,1,0
2,Chinatown,40.715618,-73.994279,Spicy Village,40.717010,-73.993530,Chinese Restaurant,4db3374590a0843f295fb69b,0,8.7,180,4,1
3,Chinatown,40.715618,-73.994279,Kiki's,40.714476,-73.992036,Greek Restaurant,5521c2ff498ebe2368634187,0,9.3,205,4,1
4,Chinatown,40.715618,-73.994279,Wah Fung Number 1 Fast Food 華豐快餐店,40.717278,-73.994177,Chinese Restaurant,4a96bf8ff964a520ce2620e3,0,8.4,99,4,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...
868,Hudson Yards,40.756658,-74.000111,Via Trenta,40.753004,-74.002898,Italian Restaurant,57e55e46498e04b0dc14dbb0,-1,7.5,3,1,0
869,Hudson Yards,40.756658,-74.000111,Nitti’s,40.756726,-73.994175,Italian Restaurant,5bd10c0ca35dce002cb16e6c,-1,7.2,3,1,0
870,Hudson Yards,40.756658,-74.000111,Treadwell,40.759964,-73.996284,Restaurant,5bb17b9531ac6c0039f150cf,-1,6.8,6,1,0
871,Hudson Yards,40.756658,-74.000111,EDEN Local,40.759909,-73.996301,Restaurant,5a0264e01ffe977e0fea5da3,-1,6.5,7,1,0


In [28]:
manhattan_restaurants_merged.groupby('Dense Zone')[['Rating', 'Tips Count']].mean()

Unnamed: 0_level_0,Rating,Tips Count
Dense Zone,Unnamed: 1_level_1,Unnamed: 2_level_1
0,8.126969,59.019417
1,8.489665,103.734637


In [39]:
# share of each Hot Degree within each Dense Zone
venue_counts = manhattan_restaurants_merged.groupby(['Dense Zone', 'Hot Degree'])['Venue'].count()
venue_sum = manhattan_restaurants_merged.groupby('Dense Zone')['Venue'].count()
percentage = venue_counts.div(venue_sum, level='Dense Zone').to_frame('Percentage')
percentage

Unnamed: 0_level_0,Unnamed: 1_level_0,Percentage
Dense Zone,Hot Degree,Unnamed: 2_level_1
0,1,0.335922
0,2,0.283495
0,3,0.207767
0,4,0.172816
1,1,0.128492
1,2,0.217877
1,3,0.296089
1,4,0.357542
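The `Percentage` table is a row-normalized cross-tabulation, so `pandas.crosstab` with `normalize='index'` should give the same shares in one call. A sketch on a made-up frame that reuses the notebook's column names:

```python
import pandas as pd

# Made-up rows using the notebook's column names
toy = pd.DataFrame({'Dense Zone': [0, 0, 0, 1, 1, 1, 1],
                    'Hot Degree': [1, 1, 2, 2, 3, 4, 4]})

# normalize='index' divides each row by its total, giving the share of each
# Hot Degree within a Dense Zone (each row sums to 1)
shares = pd.crosstab(toy['Dense Zone'], toy['Hot Degree'], normalize='index')
print(shares)
```

Unlike the grouped counts, `crosstab` also shows Hot Degree levels with zero restaurants in a zone as explicit zeros.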


In [21]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme for the two dense-zone classes (0 and 1)
colors_array = cm.rainbow(np.linspace(0, 1, 2))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, cluster in zip(manhattan_restaurants_merged['Venue Latitude'],
                             manhattan_restaurants_merged['Venue Longitude'],
                             manhattan_restaurants_merged['Dense Zone']):
    label = folium.Popup('Dense Zone '+str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

### Explore Nearby Venues

In [44]:
# set number of clusters
kclusters = 12

dense_zone_restaurants = manhattan_restaurants_merged[manhattan_restaurants_merged['Dense Zone'] == 1]
restaurants_clustering = dense_zone_restaurants[['Venue Latitude', 'Venue Longitude']]

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(restaurants_clustering)

# cluster centers as [latitude, longitude] pairs
centers = kmeans.cluster_centers_
centers_df = pd.DataFrame(centers, columns=['Latitude', 'Longitude'])
centers_df

Unnamed: 0,Latitude,Longitude
0,40.756047,-73.967811
1,40.717118,-73.992032
2,40.73922,-73.988749
3,40.785104,-73.977145
4,40.727539,-74.001429
5,40.7776,-73.951248
6,40.727326,-73.983917
7,40.748147,-73.97547
8,40.71713,-74.008848
9,40.748068,-73.986258
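K-means here runs on raw latitude/longitude degrees, which treats a degree of longitude as the same distance as a degree of latitude. At Manhattan's latitude (about 40.75°N) a degree of longitude is only about cos(40.75°) ≈ 0.76 as long, so clusters come out slightly stretched east-west. One optional refinement, not used in this analysis, is to project coordinates onto a local plane in kilometres before clustering and convert the centers back afterwards. The helpers `to_local_km` and `from_local_km` below are a hypothetical sketch using an equirectangular approximation with an assumed mean Earth radius of 6371 km, which is adequate over a borough-sized area.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def to_local_km(lat, lon, lat0, lon0):
    """Equirectangular projection: degrees -> (x, y) km offsets from (lat0, lon0)."""
    lat = np.asarray(lat, dtype=float)
    lon = np.asarray(lon, dtype=float)
    # east-west distance shrinks by cos(latitude); north-south does not
    x = np.radians(lon - lon0) * np.cos(np.radians(lat0)) * EARTH_RADIUS_KM
    y = np.radians(lat - lat0) * EARTH_RADIUS_KM
    return np.column_stack([x, y])

def from_local_km(xy, lat0, lon0):
    """Inverse of to_local_km: (x, y) km offsets -> [lat, lon] in degrees."""
    xy = np.asarray(xy, dtype=float)
    lat = lat0 + np.degrees(xy[:, 1] / EARTH_RADIUS_KM)
    lon = lon0 + np.degrees(xy[:, 0] / (EARTH_RADIUS_KM * np.cos(np.radians(lat0))))
    return np.column_stack([lat, lon])
```

With these helpers, `KMeans(...).fit(to_local_km(lat, lon, lat0, lon0))` would cluster in kilometres, and `from_local_km(kmeans.cluster_centers_, lat0, lon0)` would return centers in the same `Latitude`/`Longitude` form as `centers_df`; using the mean of the restaurant coordinates as `lat0, lon0` is a reasonable choice.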


In [46]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme for the clusters
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
# one color per cluster, plus an extra grey entry at index kclusters
rainbow = [colors.rgb2hex(i) for i in colors_array] + ['#a9a9a9']

# add markers to the map
for lat, lon, cluster in zip(manhattan_restaurants_merged['Venue Latitude'],
                             manhattan_restaurants_merged['Venue Longitude'],
                             manhattan_restaurants_merged['Cluster Label']):
    label = folium.Popup('Cluster '+str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

for lat, lon, center in zip(centers_df['Latitude'], centers_df['Longitude'], centers_df.index):
    label = folium.Popup('Center '+str(center), parse_html=True)
    folium.Marker(
        [lat, lon],
        popup=label).add_to(map_clusters)
       
map_clusters

In [52]:
# 'Cluster Label' matching each k-means center, in center order (0-11), read off the map above
centers_cluster = [10, 0, 11, 2, 4, 1, 5, 3, 6, 9, 7, 8]
centers_df['Cluster'] = centers_cluster
centers_fixed = centers_df.sort_values(['Cluster']).set_index('Cluster')
centers_fixed

Cluster,Latitude,Longitude
0,40.717118,-73.992032
1,40.7776,-73.951248
2,40.785104,-73.977145
3,40.748147,-73.97547
4,40.727539,-74.001429
5,40.727326,-73.983917
6,40.71713,-74.008848
7,40.733589,-74.005268
8,40.721138,-73.987032
9,40.748068,-73.986258
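The `centers_cluster` list above pairs each k-means center with a `Cluster Label` by inspection. A hypothetical helper, `match_centers_to_labels`, can do the pairing in code by assigning each center to the label whose restaurants have the nearest mean position. This sketch assumes all inputs are in degrees and uses plain Euclidean distance, which is adequate for ranking nearby points; the test uses toy coordinates only.

```python
import numpy as np
import pandas as pd

def match_centers_to_labels(centers, points, labels):
    """For each center, return the label whose points have the closest mean.

    centers: (k, 2) array of [lat, lon]; points: (n, 2) array; labels: length n.
    """
    # mean [lat, lon] of each label, indexed by label (sorted)
    means = pd.DataFrame(points, columns=['lat', 'lon']).groupby(np.asarray(labels)).mean()
    # pairwise differences between every center and every label mean: shape (k, m, 2)
    diffs = np.asarray(centers, dtype=float)[:, None, :] - means.to_numpy()[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    return means.index.to_numpy()[dists.argmin(axis=1)]
```

Nearest-mean matching can send two centers to the same label, so checking `len(set(result)) == len(result)` is worthwhile; running `scipy.optimize.linear_sum_assignment` on the distance matrix would instead enforce a one-to-one pairing.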


In [54]:
def getNearbyVenues(centers, latitudes, longitudes, radius=500):
    """Query the Foursquare explore endpoint around each center; return one row per venue."""
    venues_list = []
    for center, lat, lng in zip(centers, latitudes, longitudes):
        print('Center ' + str(center))

        # query parameters for the explore endpoint (credentials defined earlier)
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': LIMIT,
        }

        # make the GET request and stop on HTTP errors
        response = requests.get('https://api.foursquare.com/v2/venues/explore', params=params)
        response.raise_for_status()
        results = response.json()['response']['groups'][0]['items']

        # keep only the relevant fields for each nearby venue;
        # fall back to None if a venue has no category
        for v in results:
            venue = v['venue']
            categories = venue.get('categories') or [{}]
            venues_list.append((
                center,
                lat,
                lng,
                venue['name'],
                venue['location']['lat'],
                venue['location']['lng'],
                categories[0].get('name')))

    nearby_venues = pd.DataFrame(venues_list, columns=['Center',
                                                       'Center Latitude',
                                                       'Center Longitude',
                                                       'Venue',
                                                       'Venue Latitude',
                                                       'Venue Longitude',
                                                       'Venue Category'])
    return nearby_venues

In [55]:
# run once; the result was saved to dense_zone_venues.csv to avoid repeating the API calls
# dense_zone_venues = getNearbyVenues(centers=centers_fixed.index, 
#                                     latitudes=centers_fixed['Latitude'], 
#                                     longitudes=centers_fixed['Longitude']
#                                     )

Center 0
Center 1
Center 2
Center 3
Center 4
Center 5
Center 6
Center 7
Center 8
Center 9
Center 10
Center 11


In [59]:
# dense_zone_venues.to_csv('dense_zone_venues.csv')
nearby_venues = pd.read_csv('dense_zone_venues.csv', index_col=0)  # first column is the saved index
print(nearby_venues.shape)
nearby_venues.head()

(1200, 7)


Unnamed: 0,Center,Center Latitude,Center Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,0,40.717118,-73.992032,Wayla,40.718291,-73.992584,Thai Restaurant
1,0,40.717118,-73.992032,MooShoes NYC,40.717861,-73.990377,Shoe Store
2,0,40.717118,-73.992032,Simple,40.718145,-73.991988,Asian Restaurant
3,0,40.717118,-73.992032,CW Pencil Enterprise,40.717583,-73.990662,Paper / Office Supplies Store
4,0,40.717118,-73.992032,Orchard Grocer,40.717847,-73.990358,Vegetarian / Vegan Restaurant
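The call to `getNearbyVenues` was run once, its output saved to `dense_zone_venues.csv`, and the call commented out so that re-running the notebook does not query the API again. A hypothetical `load_or_fetch` helper makes that caching pattern explicit: it reads the CSV when it exists and otherwise calls a fetch function and saves the result, using the same index-in-first-column layout as the existing file.

```python
import os
import pandas as pd

def load_or_fetch(path, fetch):
    """Return the cached DataFrame at `path`; call fetch() and cache it only on a miss.

    The CSV is written with its index and read back with index_col=0,
    matching the layout of dense_zone_venues.csv.
    """
    if os.path.exists(path):
        return pd.read_csv(path, index_col=0)
    df = fetch()
    df.to_csv(path)
    return df
```

For example, `load_or_fetch('dense_zone_venues.csv', lambda: getNearbyVenues(centers_fixed.index, centers_fixed['Latitude'], centers_fixed['Longitude']))` would replace both the commented-out call and the `read_csv` line.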


In [60]:
print('There are {} unique categories.'.format(nearby_venues['Venue Category'].nunique()))

There are 223 unique categories.


### Analyze each dense zone

In [61]:
# one-hot encode the venue categories (dtype=int keeps 0/1 columns on pandas >= 2.0)
nearby_onehot = pd.get_dummies(nearby_venues[['Venue Category']], prefix="", prefix_sep="", dtype=int)

# add the center column back to the dataframe
nearby_onehot['Center'] = nearby_venues['Center']

# move the center column to the first position
fixed_columns = ['Center'] + list(nearby_onehot.columns[:-1])
nearby_onehot = nearby_onehot[fixed_columns]

nearby_onehot.head()

Unnamed: 0,Center,Accessories Store,American Restaurant,Antique Shop,Argentinian Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Australian Restaurant,Austrian Restaurant,...,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Veterinarian,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,1,0,0,0,0,0,0,0,0,0


### Group rows by center, taking the mean frequency of occurrence of each category


In [62]:
nearby_grouped = nearby_onehot.groupby('Center').mean().reset_index()
nearby_grouped

Unnamed: 0,Center,Accessories Store,American Restaurant,Antique Shop,Argentinian Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Australian Restaurant,Austrian Restaurant,...,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Veterinarian,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,0,0.0,0.04,0.0,0.0,0.0,0.0,0.03,0.01,0.01,...,0.02,0.0,0.0,0.02,0.0,0.02,0.0,0.0,0.0,0.01
1,1,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,...,0.0,0.0,0.0,0.02,0.0,0.02,0.03,0.0,0.01,0.02
2,2,0.01,0.03,0.0,0.0,0.0,0.01,0.02,0.0,0.0,...,0.02,0.0,0.0,0.01,0.0,0.03,0.01,0.0,0.0,0.02
3,3,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,...,0.02,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.01,0.01
4,4,0.01,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01
5,5,0.0,0.02,0.0,0.01,0.01,0.01,0.0,0.0,0.0,...,0.04,0.0,0.0,0.03,0.0,0.04,0.0,0.0,0.0,0.0
6,6,0.0,0.06,0.01,0.0,0.02,0.0,0.01,0.01,0.0,...,0.01,0.0,0.0,0.0,0.01,0.02,0.02,0.01,0.0,0.01
7,7,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.01,...,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0
8,8,0.0,0.0,0.0,0.01,0.02,0.0,0.02,0.01,0.0,...,0.01,0.01,0.0,0.02,0.01,0.02,0.03,0.0,0.0,0.01
9,9,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0
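Averaging the one-hot columns per center gives, for each category, the fraction of that center's venues in the category; since the 1,200 rows cover 12 centers, a value of 0.04 corresponds to about four of a center's roughly 100 venues. `pd.crosstab` with `normalize='index'` computes the same row-wise proportions directly from the long table. The sketch below checks the equivalence on made-up venues (hypothetical data).

```python
import pandas as pd

# Hypothetical long-format venues: one row per venue, tagged with its center
venues = pd.DataFrame({
    'Center': [0, 0, 0, 0, 1, 1],
    'Venue Category': ['Bakery', 'Bakery', 'Bar', 'Café', 'Bar', 'Café'],
})

# Route 1: one-hot encode, then average per center (as in the notebook)
onehot = pd.get_dummies(venues['Venue Category'], dtype=int)
onehot['Center'] = venues['Center']
via_mean = onehot.groupby('Center').mean()

# Route 2: contingency table normalized so each center's row sums to 1
via_crosstab = pd.crosstab(venues['Center'], venues['Venue Category'], normalize='index')
print(via_crosstab)
```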


### Create the new dataframe and display the top 10 venues for each center

In [63]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
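Many categories share the same frequency (several 0.01 or 0.02 values per center), and `sort_values` defaults to quicksort, which is not stable, so the order of tied categories near the bottom of a top-10 list is not guaranteed. If reproducible ordering matters, a variant like the hypothetical `top_n_categories` below uses a stable sort, so ties keep the alphabetical column order produced by `get_dummies`.

```python
import numpy as np
import pandas as pd

def top_n_categories(row, n):
    """Top-n category names of a grouped row, skipping its first ('Center') entry.

    A stable argsort on the negated values sorts in descending order while
    keeping tied categories in their original (column) order.
    """
    categories = row.iloc[1:]
    order = np.argsort(-categories.to_numpy(dtype=float), kind='stable')
    return categories.index.to_numpy()[order[:n]]
```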

In [65]:
num_top_venues = 10

# suffixes for 1st, 2nd and 3rd; later ranks (valid up to 20th) take 'th'
indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Center']
for ind in np.arange(num_top_venues):
    if ind < len(indicators):
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    else:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
nearby_venues_sorted = pd.DataFrame(columns=columns)
nearby_venues_sorted['Center'] = nearby_grouped['Center']

for ind in np.arange(nearby_grouped.shape[0]):
    nearby_venues_sorted.iloc[ind, 1:] = return_most_common_venues(nearby_grouped.iloc[ind, :], num_top_venues)

nearby_venues_sorted

Unnamed: 0,Center,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,0,Bakery,Chinese Restaurant,American Restaurant,Pizza Place,Mexican Restaurant,Cocktail Bar,Asian Restaurant,Bar,Dumpling Restaurant,Coffee Shop
1,1,Coffee Shop,Italian Restaurant,Bar,Ice Cream Shop,Mexican Restaurant,Wine Shop,Bagel Shop,Hot Dog Joint,Dessert Shop,Spa
2,2,Café,Italian Restaurant,Bakery,Coffee Shop,American Restaurant,Wine Bar,Ice Cream Shop,Mediterranean Restaurant,Bar,Pizza Place
3,3,Coffee Shop,Sushi Restaurant,Japanese Restaurant,Pizza Place,Burger Joint,Park,Taco Place,Hotel,Gourmet Shop,Gym
4,4,Italian Restaurant,French Restaurant,Dessert Shop,Cosmetics Shop,American Restaurant,Café,Indian Restaurant,Coffee Shop,Cocktail Bar,Sushi Restaurant
5,5,Bar,Wine Bar,Vegetarian / Vegan Restaurant,Korean Restaurant,Vietnamese Restaurant,Cocktail Bar,Coffee Shop,Sushi Restaurant,Pizza Place,Ice Cream Shop
6,6,American Restaurant,Spa,Italian Restaurant,French Restaurant,Coffee Shop,Café,Falafel Restaurant,Burger Joint,Playground,Gym / Fitness Center
7,7,Italian Restaurant,Coffee Shop,Cocktail Bar,New American Restaurant,French Restaurant,Jazz Club,Speakeasy,Ice Cream Shop,Seafood Restaurant,Chinese Restaurant
8,8,Pizza Place,Bakery,French Restaurant,Café,Rock Club,Wine Shop,Cocktail Bar,Italian Restaurant,Coffee Shop,Candy Store
9,9,Korean Restaurant,Hotel,American Restaurant,Japanese Restaurant,Dessert Shop,Italian Restaurant,Gym / Fitness Center,Hotel Bar,Coffee Shop,Bakery
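The column names use a small `indicators` list for 1st, 2nd and 3rd and fall back to `th`, which is correct up to 20 but would produce `21th` for a longer list. A general English suffix rule, shown here as a hypothetical `ordinal` helper, handles the 11th-13th exception and every later number.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th are exceptions to the last-digit rule
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

# the same ten column names as the notebook, built with the helper
columns = ['Center'] + ['{} Most Common Venue'.format(ordinal(i)) for i in range(1, 11)]
```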


## Results and Discussion <a name="results"></a>

## Conclusion <a name="conclusion"></a>