# Capstone Project

### Introduction/Business Problem
New York City is experiencing a Chinese food renaissance, and Chinese food is available at many spots across the city.
In this project we will try to find an optimal location for a restaurant. Specifically, this report is targeted at stakeholders interested in opening a Chinese restaurant in New York City, USA.
Since New York City already has many restaurants, we will try to detect locations that are not already crowded with them. We are also interested in areas with no Chinese restaurants nearby.
We will use data science techniques to compare different neighborhoods in NYC. The advantages of each area will then be clearly stated so that stakeholders can choose the best possible final location.


### Data Section
To solve the main problem, the following data will be used:

1. Source: https://cocl.us/new_york_dataset

   This New York City Neighborhood Names point file was created as a guide to New York City’s neighborhoods that appear on the web resource “New York: A City of Neighborhoods.”

2. Source: Foursquare API

   This data source will be used to find the locations of the Chinese restaurants in each neighborhood of NYC.

The above data sources will be used to extract or generate the required information:

- the location of the NYC center
- the center locations of candidate neighborhoods
- the number of restaurants, with their type and location, in every neighborhood (obtained using the Foursquare API)
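
The idea of counting restaurants near a given center can be sketched as follows. This is only an illustration, not the method used in the cells below: the helper names `haversine_m` and `count_within` and the 1400 m radius are assumptions, while the coordinates are taken from the Manhattan center and the first five search results that appear later in this notebook.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two (lat, lng) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def count_within(center, venues, radius_m):
    """Count venues whose distance from the center is at most radius_m (hypothetical helper)."""
    lat0, lng0 = center
    return sum(1 for _, lat, lng in venues if haversine_m(lat0, lng0, lat, lng) <= radius_m)


# Manhattan center as returned by the geocoder in this notebook
manhattan_center = (40.7896239, -73.9598939)

# (name, lat, lng) for the first five venues returned by the Foursquare search
venues = [
    ("Best Time Chinese & Tex-Mex Food", 40.788643, -73.948524),
    ("U Like Chinese", 40.798554, -73.963403),
    ("Hong Kong Chinese", 40.792676, -73.945675),
    ("Fu Wing Chinese", 40.792273, -73.945705),
    ("No. 1 Chinese Restaurant", 40.801317, -73.949707),
]

nearby = count_within(manhattan_center, venues, radius_m=1400)
print(nearby)
```

A neighborhood with a low count for a chosen radius would be a candidate for the "not already crowded" criterion from the introduction.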


In [60]:
# libraries for displaying images
from IPython.display import Image 
from IPython.core.display import HTML 

import requests
import pandas as pd
import numpy as np

import json
from geopy.geocoders import Nominatim
from pandas import json_normalize  # the pandas.io.json import path is deprecated since pandas 1.0
import matplotlib.cm as cm
import matplotlib.colors as colors
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
import folium

In [61]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

In [65]:
neighborhoods_data = newyork_data['features']

In [66]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

In [67]:
# collect one row per neighborhood, then build the dataframe in a single step
# (DataFrame.append was removed in pandas 2.0)
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

neighborhoods = pd.DataFrame(rows, columns=column_names)

In [179]:
neighborhoods.tail()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
301,Manhattan,Hudson Yards,40.756658,-74.000111
302,Queens,Hammels,40.587338,-73.80553
303,Queens,Bayswater,40.611322,-73.765968
304,Queens,Queensbridge,40.756091,-73.945631
305,Staten Island,Fox Hills,40.617311,-74.08174


In [70]:
address = 'New York City, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of New York City are 40.7127281, -74.0060152.


#### Create a map of New York with neighborhoods superimposed on top.

In [71]:
# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork

In [72]:
manhattan_data = neighborhoods[neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)
manhattan_data.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Manhattan,Marble Hill,40.876551,-73.91066
1,Manhattan,Chinatown,40.715618,-73.994279
2,Manhattan,Washington Heights,40.851903,-73.9369
3,Manhattan,Inwood,40.867684,-73.92121
4,Manhattan,Hamilton Heights,40.823604,-73.949688


In [73]:
address = 'Manhattan, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Manhattan are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Manhattan are 40.7896239, -73.9598939.


In [74]:
# create map of Manhattan using latitude and longitude values
map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_manhattan)  
    
map_manhattan

In [84]:
latitudes = manhattan_data['Latitude'].tolist()
longitudes = manhattan_data['Longitude'].tolist()

In [129]:
CLIENT_ID = 'C3IMJA1JKHLYH52J5CDCPDPSRRUAYCRDMV4FXT0YXA0XCJDR' # your Foursquare ID
CLIENT_SECRET = 'WJ123BQFR5U2K20IAS5QDPOYTYWRGD33OHNBZRE2Q14WL3TR' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: C3IMJA1JKHLYH52J5CDCPDPSRRUAYCRDMV4FXT0YXA0XCJDR
CLIENT_SECRET:WJ123BQFR5U2K20IAS5QDPOYTYWRGD33OHNBZRE2Q14WL3TR


In [130]:
address = 'Manhattan, New York, NY'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

40.7896239 -73.9598939


In [134]:
search_query = 'Chinese'
radius = 50000
print(search_query + ' .... OK!')

Chinese .... OK!


In [135]:
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius)
url

'https://api.foursquare.com/v2/venues/search?client_id=C3IMJA1JKHLYH52J5CDCPDPSRRUAYCRDMV4FXT0YXA0XCJDR&client_secret=WJ123BQFR5U2K20IAS5QDPOYTYWRGD33OHNBZRE2Q14WL3TR&ll=40.7896239,-73.9598939&v=20180605&query=Chinese&radius=50000'
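
The URL above embeds the credentials directly in the notebook output. As a hedged alternative, the same query string could be assembled from a parameter dictionary with the standard library, with the keys read from environment variables. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the placeholder fallbacks are assumptions for illustration; the endpoint and parameter names are the ones used in the cell above.

```python
import os
from urllib.parse import urlencode

# Hypothetical environment variable names; the fallbacks are placeholders, not real keys.
client_id = os.environ.get("FOURSQUARE_CLIENT_ID", "YOUR_CLIENT_ID")
client_secret = os.environ.get("FOURSQUARE_CLIENT_SECRET", "YOUR_CLIENT_SECRET")

params = {
    "client_id": client_id,
    "client_secret": client_secret,
    "ll": "40.7896239,-73.9598939",  # Manhattan center from the geocoder
    "v": "20180605",
    "query": "Chinese",
    "radius": 50000,
}

# safe="," keeps the comma in the ll value unescaped, matching the hand-built URL
search_url = "https://api.foursquare.com/v2/venues/search?" + urlencode(params, safe=",")

# mask the secret before displaying the URL
print(search_url.replace(client_secret, "***"))
```

Building the query from a dictionary also takes care of escaping, which matters once the search term contains spaces or other special characters.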

In [136]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5ee22ed9c6fd9436f6f17730'},
 'response': {'venues': [{'id': '4cd5ef14fb5954811370e150',
    'name': 'Best Time Chinese & Tex-Mex Food',
    'location': {'address': '1571 Lexington Ave',
     'lat': 40.78864288330078,
     'lng': -73.94852447509766,
     'labeledLatLngs': [{'label': 'entrance',
       'lat': 40.788758,
       'lng': -73.94868},
      {'label': 'display',
       'lat': 40.78864288330078,
       'lng': -73.94852447509766}],
     'distance': 964,
     'postalCode': '10029',
     'cc': 'US',
     'city': 'New York',
     'state': 'NY',
     'country': 'United States',
     'formattedAddress': ['1571 Lexington Ave',
      'New York, NY 10029',
      'United States']},
    'categories': [{'id': '4bf58dd8d48988d145941735',
      'name': 'Chinese Restaurant',
      'pluralName': 'Chinese Restaurants',
      'shortName': 'Chinese',
      'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/asian_',
       'suffix': '.png'},
      '

In [137]:
# assign relevant part of JSON to venues
venues = results['response']['venues']

# transform venues into a dataframe
dataframe = json_normalize(venues)
dataframe.head()

Unnamed: 0,id,name,categories,referralId,hasPerk,location.address,location.lat,location.lng,location.labeledLatLngs,location.distance,...,location.formattedAddress,location.crossStreet,venuePage.id,delivery.id,delivery.url,delivery.provider.name,delivery.provider.icon.prefix,delivery.provider.icon.sizes,delivery.provider.icon.name,location.neighborhood
0,4cd5ef14fb5954811370e150,Best Time Chinese & Tex-Mex Food,"[{'id': '4bf58dd8d48988d145941735', 'name': 'C...",v-1591881525,False,1571 Lexington Ave,40.788643,-73.948524,"[{'label': 'entrance', 'lat': 40.788758, 'lng'...",964,...,"[1571 Lexington Ave, New York, NY 10029, Unite...",,,,,,,,,
1,4b885bd3f964a520dbf131e3,U Like Chinese,"[{'id': '4bf58dd8d48988d145941735', 'name': 'C...",v-1591881525,False,917 Columbus Ave,40.798554,-73.963403,"[{'label': 'display', 'lat': 40.79855362787655...",1037,...,"[917 Columbus Ave (105th St.), New York, NY 10...",105th St.,,,,,,,,
2,4daa52fb6a2303012f0c3f72,Hong Kong Chinese,"[{'id': '4bf58dd8d48988d145941735', 'name': 'C...",v-1591881525,False,1703 Lexington Ave,40.792676,-73.945675,"[{'label': 'display', 'lat': 40.792676, 'lng':...",1245,...,"[1703 Lexington Ave, New York, NY 10029, Unite...",,,,,,,,,
3,4b788bf3f964a520e3d52ee3,Fu Wing Chinese,"[{'id': '4bf58dd8d48988d145941735', 'name': 'C...",v-1591881525,False,153 E 106th St,40.792273,-73.945705,"[{'label': 'display', 'lat': 40.792273, 'lng':...",1231,...,"[153 E 106th St (Lexington Ave), New York, NY ...",Lexington Ave,,,,,,,,
4,4f14ba2ae4b03856f3a57a74,No. 1 Chinese Restaurant,"[{'id': '4bf58dd8d48988d145941735', 'name': 'C...",v-1591881525,False,83 W 115th St,40.801317,-73.949707,"[{'label': 'display', 'lat': 40.801317, 'lng':...",1559,...,"[83 W 115th St, New York, NY 10026, United Sta...",,,,,,,,,
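
The dotted column names in the table above (`location.lat`, `location.lng`, and so on) come from flattening the nested venue dictionaries. The sketch below illustrates the idea in plain Python. The `flatten` helper is hypothetical and much simpler than `json_normalize`; the sample record is an abbreviated copy of the first venue in the response.

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dictionaries into a single dict with dotted keys."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # recurse into nested dictionaries, extending the key prefix
            flat.update(flatten(value, full_key, sep))
        else:
            # scalars and lists are kept as-is, which is why 'categories' stays a list
            flat[full_key] = value
    return flat


# abbreviated copy of the first venue in the Foursquare response
venue = {
    "id": "4cd5ef14fb5954811370e150",
    "name": "Best Time Chinese & Tex-Mex Food",
    "location": {
        "address": "1571 Lexington Ave",
        "lat": 40.78864288330078,
        "lng": -73.94852447509766,
        "distance": 964,
    },
    "categories": [{"name": "Chinese Restaurant"}],
}

row = flatten(venue)
print(sorted(row))
```

Because lists are not expanded, the `categories` column still holds a list of dictionaries, which is why a separate step is needed to pull out the category name.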


In [149]:
# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
dataframe_filtered = dataframe.loc[:, filtered_columns].copy()  # copy so the assignment below does not act on a view

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

dataframe_filtered = dataframe_filtered.drop([5,15,16,17,18], axis=0)

dataframe_filtered.reset_index()

Unnamed: 0,index,name,categories,address,lat,lng,labeledLatLngs,distance,postalCode,cc,city,state,country,formattedAddress,crossStreet,neighborhood,id
0,0,Best Time Chinese & Tex-Mex Food,Chinese Restaurant,1571 Lexington Ave,40.788643,-73.948524,"[{'label': 'entrance', 'lat': 40.788758, 'lng'...",964,10029.0,US,New York,NY,United States,"[1571 Lexington Ave, New York, NY 10029, Unite...",,,4cd5ef14fb5954811370e150
1,1,U Like Chinese,Chinese Restaurant,917 Columbus Ave,40.798554,-73.963403,"[{'label': 'display', 'lat': 40.79855362787655...",1037,10025.0,US,New York,NY,United States,"[917 Columbus Ave (105th St.), New York, NY 10...",105th St.,,4b885bd3f964a520dbf131e3
2,2,Hong Kong Chinese,Chinese Restaurant,1703 Lexington Ave,40.792676,-73.945675,"[{'label': 'display', 'lat': 40.792676, 'lng':...",1245,10029.0,US,New York,NY,United States,"[1703 Lexington Ave, New York, NY 10029, Unite...",,,4daa52fb6a2303012f0c3f72
3,3,Fu Wing Chinese,Chinese Restaurant,153 E 106th St,40.792273,-73.945705,"[{'label': 'display', 'lat': 40.792273, 'lng':...",1231,10029.0,US,New York,NY,United States,"[153 E 106th St (Lexington Ave), New York, NY ...",Lexington Ave,,4b788bf3f964a520e3d52ee3
4,4,No. 1 Chinese Restaurant,Chinese Restaurant,83 W 115th St,40.801317,-73.949707,"[{'label': 'display', 'lat': 40.801317, 'lng':...",1559,10026.0,US,New York,NY,United States,"[83 W 115th St, New York, NY 10026, United Sta...",,,4f14ba2ae4b03856f3a57a74
5,6,Acupuncture & Chinese Healing by R. Brown,Acupuncturist,336 Riverside Dr,40.801959,-73.970011,"[{'label': 'display', 'lat': 40.80195900000000...",1616,10025.0,US,New York,NY,United States,"[336 Riverside Dr, New York, NY 10025, United ...",,,5092d73de4b0476c6e5b9a92
6,7,Empire III Chinese & Pan Asian Cusine,Chinese Restaurant,1902 Adam Clayton Powell Jr Blvd,40.802902,-73.95327,"[{'label': 'display', 'lat': 40.80290222167969...",1580,10026.0,US,New York,NY,United States,"[1902 Adam Clayton Powell Jr Blvd (116th), New...",116th,,4d910d15f5388cfa1949ac3d
7,8,No. 1 Chinese Kitchen,Chinese Restaurant,6514 Park Ave,40.789793,-74.005196,"[{'label': 'display', 'lat': 40.78979299999999...",3818,7093.0,US,West New York,NJ,United States,"[6514 Park Ave (Btw. 65th & 66th), West New Yo...",Btw. 65th & 66th,,4c437029f97fbe9a6cd2b930
8,9,Wing Sing Chinese Restaurant,Chinese Restaurant,1863 Lexington Ave,40.798323,-73.941637,"[{'label': 'display', 'lat': 40.79832299999999...",1817,10029.0,US,New York,NY,United States,"[1863 Lexington Ave, New York, NY 10029, Unite...",,,4e4e4f67bd4101d0d7a76773
9,10,Harvest Chinese | Thai Cuisine,Asian Restaurant,1501 1st Ave,40.771962,-73.953103,"[{'label': 'display', 'lat': 40.77196230918650...",2047,10075.0,US,New York,NY,United States,"[1501 1st Ave, New York, NY 10075, United States]",,,50318022e4b029822734f00d


In [151]:
dataframe_chinese = dataframe_filtered[['name','categories','lat','lng','distance']]
dataframe_chinese

Unnamed: 0,name,categories,lat,lng,distance
0,Best Time Chinese & Tex-Mex Food,Chinese Restaurant,40.788643,-73.948524,964
1,U Like Chinese,Chinese Restaurant,40.798554,-73.963403,1037
2,Hong Kong Chinese,Chinese Restaurant,40.792676,-73.945675,1245
3,Fu Wing Chinese,Chinese Restaurant,40.792273,-73.945705,1231
4,No. 1 Chinese Restaurant,Chinese Restaurant,40.801317,-73.949707,1559
6,Acupuncture & Chinese Healing by R. Brown,Acupuncturist,40.801959,-73.970011,1616
7,Empire III Chinese & Pan Asian Cusine,Chinese Restaurant,40.802902,-73.95327,1580
8,No. 1 Chinese Kitchen,Chinese Restaurant,40.789793,-74.005196,3818
9,Wing Sing Chinese Restaurant,Chinese Restaurant,40.798323,-73.941637,1817
10,Harvest Chinese | Thai Cuisine,Asian Restaurant,40.771962,-73.953103,2047
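
The table above still contains a venue that is not a restaurant (the acupuncturist), because the search matches on the word "Chinese" in the name. One possible refinement, sketched here in plain Python on a few rows copied from the table, is to filter on the category instead of dropping rows by index. The `is_restaurant` helper and the rule "category ends with Restaurant" are assumptions, not part of the original analysis.

```python
# a few (name, category) rows copied from the table above
rows = [
    {"name": "Best Time Chinese & Tex-Mex Food", "categories": "Chinese Restaurant"},
    {"name": "Acupuncture & Chinese Healing by R. Brown", "categories": "Acupuncturist"},
    {"name": "Harvest Chinese | Thai Cuisine", "categories": "Asian Restaurant"},
    {"name": "Wing Sing Chinese Restaurant", "categories": "Chinese Restaurant"},
]


def is_restaurant(row):
    """Keep a row only if its category is present and ends with 'Restaurant' (assumed rule)."""
    category = row.get("categories")
    return category is not None and category.endswith("Restaurant")


restaurants = [row for row in rows if is_restaurant(row)]
for row in restaurants:
    print(row["name"], "|", row["categories"])
```

With pandas, the same rule could be written as a boolean mask, for example `dataframe_chinese[dataframe_chinese['categories'].str.endswith('Restaurant', na=False)]`.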


In [152]:
venues_map = folium.Map(location=[latitude, longitude], zoom_start=13) # generate map centred on Manhattan

# add a red circle marker to represent the Manhattan center
folium.features.CircleMarker(
    [latitude, longitude],
    radius=10,
    color='red',
    popup='Manhattan center',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(venues_map)

# add the Chinese restaurants as blue circle markers
for lat, lng, label in zip(dataframe_chinese.lat, dataframe_chinese.lng, dataframe_chinese.categories):
    folium.features.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

# display map
venues_map

## Clustering Chinese Restaurants

In [158]:
def getNearbyVenues(names, latitudes, longitudes, radius=50000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues

In [159]:
manhattan_venues = getNearbyVenues(names=manhattan_data['Neighborhood'],
                                   latitudes=manhattan_data['Latitude'],
                                   longitudes=manhattan_data['Longitude']
                                  )



Marble Hill
Chinatown
Washington Heights
Inwood
Hamilton Heights
Manhattanville
Central Harlem
East Harlem
Upper East Side
Yorkville
Lenox Hill
Roosevelt Island
Upper West Side
Lincoln Square
Clinton
Midtown
Murray Hill
Chelsea
Greenwich Village
East Village
Lower East Side
Tribeca
Little Italy
Soho
West Village
Manhattan Valley
Morningside Heights
Gramercy
Battery Park City
Financial District
Carnegie Hill
Noho
Civic Center
Midtown South
Sutton Place
Turtle Bay
Tudor City
Stuyvesant Town
Flatiron
Hudson Yards


In [160]:
# one hot encoding
manhattan_onehot = pd.get_dummies(manhattan_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
manhattan_onehot['Neighborhood'] = manhattan_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [manhattan_onehot.columns[-1]] + list(manhattan_onehot.columns[:-1])
manhattan_onehot = manhattan_onehot[fixed_columns]

manhattan_onehot.head()

Unnamed: 0,Neighborhood,Art Gallery,Art Museum,Bagel Shop,Bakery,Beach,Beer Store,Bike Shop,Bookstore,Botanical Garden,...,Taco Place,Thai Restaurant,Theater,Theme Park Ride / Attraction,Track,Trail,Volleyball Court,Waterfront,Wine Shop,Yoga Studio
0,Marble Hill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Marble Hill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Marble Hill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Marble Hill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Marble Hill,0,0,0,0,0,0,0,0,1,...,0,0,0,0,0,0,0,0,0,0


In [161]:
manhattan_grouped = manhattan_onehot.groupby('Neighborhood').mean().reset_index()
manhattan_grouped

Unnamed: 0,Neighborhood,Art Gallery,Art Museum,Bagel Shop,Bakery,Beach,Beer Store,Bike Shop,Bookstore,Botanical Garden,...,Taco Place,Thai Restaurant,Theater,Theme Park Ride / Attraction,Track,Trail,Volleyball Court,Waterfront,Wine Shop,Yoga Studio
0,Battery Park City,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.033333,0.0,0.033333,0.033333,0.0,0.0,0.0
1,Carnegie Hill,0.0,0.033333,0.0,0.133333,0.0,0.0,0.0,0.0,0.0,...,0.0,0.033333,0.033333,0.0,0.033333,0.0,0.0,0.033333,0.0,0.033333
2,Central Harlem,0.0,0.033333,0.0,0.133333,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.033333,0.033333,0.0,0.033333,0.0,0.033333
3,Chelsea,0.033333,0.0,0.033333,0.033333,0.0,0.0,0.033333,0.066667,0.0,...,0.033333,0.0,0.066667,0.0,0.0,0.033333,0.0,0.0,0.066667,0.0
4,Chinatown,0.033333,0.0,0.033333,0.0,0.033333,0.0,0.0,0.0,0.0,...,0.0,0.0,0.033333,0.033333,0.0,0.0,0.033333,0.0,0.033333,0.033333
5,Civic Center,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.033333,0.0,0.033333,0.033333,0.0,0.033333,0.0
6,Clinton,0.033333,0.0,0.0,0.0,0.0,0.0,0.033333,0.033333,0.0,...,0.033333,0.0,0.133333,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,East Harlem,0.0,0.033333,0.0,0.133333,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.033333,0.0,0.033333,0.0,0.0,0.033333,0.0,0.033333
8,East Village,0.033333,0.0,0.033333,0.0,0.0,0.033333,0.033333,0.033333,0.0,...,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.066667,0.033333
9,Financial District,0.0,0.0,0.0,0.033333,0.033333,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.033333,0.0,0.033333,0.033333,0.0,0.0,0.033333
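
The values in the grouped table are per-neighborhood category frequencies: one-hot encoding followed by a mean is the same as counting each category and dividing by the number of venues in that neighborhood. A value of 0.033333, for instance, corresponds to one venue out of 30. The sketch below reproduces that calculation in plain Python on a small hypothetical sample (the venue list is made up for illustration).

```python
from collections import Counter, defaultdict

# hypothetical (neighborhood, venue category) pairs, one per venue
venues = [
    ("Marble Hill", "Park"),
    ("Marble Hill", "Botanical Garden"),
    ("Marble Hill", "Park"),
    ("Marble Hill", "Bakery"),
    ("Chinatown", "Park"),
    ("Chinatown", "Yoga Studio"),
]

# count categories per neighborhood
counts = defaultdict(Counter)
for neighborhood, category in venues:
    counts[neighborhood][category] += 1

# divide by the number of venues in the neighborhood to get frequencies
frequencies = {
    neighborhood: {cat: n / sum(counter.values()) for cat, n in counter.items()}
    for neighborhood, counter in counts.items()
}

for neighborhood, freq in frequencies.items():
    print(neighborhood, freq)
```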


In [162]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [163]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = manhattan_grouped['Neighborhood']

for ind in np.arange(manhattan_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(manhattan_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Battery Park City,Park,Garden,Pier,Scenic Lookout,Grocery Store,Ice Cream Shop,Italian Restaurant,Memorial Site,Music Venue,French Restaurant
1,Carnegie Hill,Park,Bakery,Plaza,Fountain,Garden,Reservoir,Gym,Gym / Fitness Center,Field,Exhibit
2,Central Harlem,Park,Bakery,Plaza,Fountain,Garden,Reservoir,Field,Exhibit,Library,Lighthouse
3,Chelsea,Park,Scenic Lookout,Wine Shop,Theater,Bookstore,Art Gallery,Gym,Furniture / Home Store,Ice Cream Shop,Farmers Market
4,Chinatown,Park,Yoga Studio,Bagel Shop,Beach,Bridge,Chocolate Shop,French Restaurant,Garden,Grocery Store,Wine Shop


In [167]:
# set number of clusters
kclusters = 5

manhattan_grouped_clustering = dataframe_chinese.drop(['name','categories','distance'], axis=1)  # keep only lat/lng as clustering features

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(manhattan_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([4, 1, 4, 4, 4, 1, 4, 2, 4, 3], dtype=int32)
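
To make the clustering step less of a black box, here is a minimal sketch of the underlying idea (Lloyd's algorithm) in plain Python. It is not scikit-learn's implementation: the initialisation from the first `k` points is a simplifying assumption, whereas `KMeans` uses k-means++ by default. The points are latitude/longitude pairs copied from the restaurant table above.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points serve as initial centroids (assumption)."""
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # assignment step: attach each point to its nearest centroid
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


# (lat, lng) pairs copied from the restaurant table
points = [
    (40.788643, -73.948524),  # Best Time Chinese & Tex-Mex Food
    (40.789793, -74.005196),  # No. 1 Chinese Kitchen (West New York, NJ)
    (40.792676, -73.945675),  # Hong Kong Chinese
    (40.792273, -73.945705),  # Fu Wing Chinese
    (40.798323, -73.941637),  # Wing Sing Chinese Restaurant
]

labels, centroids = kmeans(points, k=2)
print(labels)
```

Note that clustering on raw degrees treats one degree of latitude and one degree of longitude as equal distances, although at this latitude a degree of longitude is noticeably shorter. For a small area like Manhattan this may be acceptable, but it is worth keeping in mind when interpreting the clusters.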

In [168]:
dataframe_chinese.insert(0, "Cluster Labels", kmeans.labels_)

In [169]:
dataframe_chinese

Unnamed: 0,Cluster Labels,name,categories,lat,lng,distance
0,4,Best Time Chinese & Tex-Mex Food,Chinese Restaurant,40.788643,-73.948524,964
1,1,U Like Chinese,Chinese Restaurant,40.798554,-73.963403,1037
2,4,Hong Kong Chinese,Chinese Restaurant,40.792676,-73.945675,1245
3,4,Fu Wing Chinese,Chinese Restaurant,40.792273,-73.945705,1231
4,4,No. 1 Chinese Restaurant,Chinese Restaurant,40.801317,-73.949707,1559
6,1,Acupuncture & Chinese Healing by R. Brown,Acupuncturist,40.801959,-73.970011,1616
7,4,Empire III Chinese & Pan Asian Cusine,Chinese Restaurant,40.802902,-73.95327,1580
8,2,No. 1 Chinese Kitchen,Chinese Restaurant,40.789793,-74.005196,3818
9,4,Wing Sing Chinese Restaurant,Chinese Restaurant,40.798323,-73.941637,1817
10,3,Harvest Chinese | Thai Cuisine,Asian Restaurant,40.771962,-73.953103,2047


In [173]:
# create map
map_clusters = folium.Map(location=[40.7831, -73.9712], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lng, dist, cluster in zip(dataframe_chinese['lat'], dataframe_chinese['lng'], dataframe_chinese['distance'], dataframe_chinese['Cluster Labels']):
    label = folium.Popup(str(dist) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Examine Clusters

### Cluster 1

In [174]:
dataframe_chinese.loc[dataframe_chinese['Cluster Labels'] == 0, dataframe_chinese.columns[[1] + list(range(5, dataframe_chinese.shape[1]))]]

Unnamed: 0,name,distance
22,No. 1 Peking Chinese Restaurant,11595


### Cluster 2

In [175]:
dataframe_chinese.loc[dataframe_chinese['Cluster Labels'] == 1, dataframe_chinese.columns[[1] + list(range(5, dataframe_chinese.shape[1]))]]

Unnamed: 0,name,distance
1,U Like Chinese,1037
6,Acupuncture & Chinese Healing by R. Brown,1616
12,U Like Chinese Restaurant,703
21,Hwa Ying Chinese Restaurant,1004
25,Spice Chinese & Continental Cuisine,1148


### Cluster 3

In [176]:
dataframe_chinese.loc[dataframe_chinese['Cluster Labels'] == 2, dataframe_chinese.columns[[1] + list(range(5, dataframe_chinese.shape[1]))]]

Unnamed: 0,name,distance
8,No. 1 Chinese Kitchen,3818
14,Chinese East,5503
19,Foliage Garden Chinese Restaurant,4136
23,Jade Garden Chinese Tex Mexican Restaurant,3746
26,Golden House Kitchen Chinese Restaurant,3953
27,#1 Chinese Restaurant,4147


### Cluster 4

In [177]:
dataframe_chinese.loc[dataframe_chinese['Cluster Labels'] == 3, dataframe_chinese.columns[[1] + list(range(5, dataframe_chinese.shape[1]))]]

Unnamed: 0,name,distance
10,Harvest Chinese | Thai Cuisine,2047
13,Eastern Chinese Restaurant,3622
28,Lilly's Chinese Food,4229
29,Hunan Chinese Restaurant,3954


### Cluster 5

In [178]:
dataframe_chinese.loc[dataframe_chinese['Cluster Labels'] == 4, dataframe_chinese.columns[[1] + list(range(5, dataframe_chinese.shape[1]))]]

Unnamed: 0,name,distance
0,Best Time Chinese & Tex-Mex Food,964
2,Hong Kong Chinese,1245
3,Fu Wing Chinese,1231
4,No. 1 Chinese Restaurant,1559
7,Empire III Chinese & Pan Asian Cusine,1580
9,Wing Sing Chinese Restaurant,1817
11,New Aroma Chinese,2623
20,Hong Kong Chinese Restaurant,3280
24,Good Choice Chinese Restaurant,4038
