# Capstone Project - Restaurants in NYC 


## Introduction: Business Problem <a name="introduction"></a>

In this project we will try to find an optimal location for a restaurant. Specifically, this report is targeted at stakeholders interested in opening a **Chinese restaurant** in **Manhattan**, New York.

Since there are lots of restaurants in NYC, we will try to detect **locations that are not already crowded with restaurants**. We are also particularly interested in **areas with no Chinese restaurants in the vicinity**. We would also prefer locations **as close to the city center as possible**, assuming that the first two conditions are met.

We will use our data science powers to generate a short list of the most promising neighborhoods based on these criteria. The advantages of each area will then be clearly expressed so that stakeholders can choose the best possible final location.


## Analytical approach <a name="Analytical approach"></a>

We treat this as a clustering problem: neighborhoods with similar venue profiles are grouped together so that they can be compared. Many clustering models exist; two of the most popular are K-means clustering and hierarchical clustering.
Fortunately, most clustering algorithms are already implemented in open-source libraries for the language we will use (Python), so we won't have to do much coding. The most critical and most tedious part of this project, as with most data science projects, will be collecting and cleaning the data.
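
To make the idea behind K-means concrete, here is a minimal standard-library sketch of its two alternating steps (assign each point to the nearest centroid, then move each centroid to the mean of its members). The function `kmeans`, the toy points, and the choice to start from the first `k` points are illustrative assumptions; the analysis in this notebook relies on the scikit-learn implementation instead.

```python
from math import dist  # available in Python 3.8+


def kmeans(points, k, max_iter=100):
    """Minimal Lloyd's algorithm; centroids start at the first k points (a simplification)."""
    centroids = [tuple(p) for p in points[:k]]
    labels = []
    for _ in range(max_iter):
        # assignment step: each point joins its nearest centroid
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # update step: move each centroid to the mean of its members
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


# hypothetical toy data: two well-separated groups of three points
toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
toy_labels, toy_centroids = kmeans(toy_points, k=2)
print(toy_labels)
```

Production implementations typically add smarter initialization and several restarts, which is one reason to prefer a library over a hand-written loop.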

## Data <a name="data"></a>

Based on the definition of our problem, the factors that will influence our decision are:
* number of existing restaurants in the neighborhood (any type of restaurant)
* number of and distance to Chinese restaurants in the neighborhood, if any
* distance of the neighborhood from the city center
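
The distance factor can be estimated from latitude and longitude alone. The sketch below uses the haversine great-circle formula; the helper name `haversine_km` and the mean Earth radius of 6371 km are assumptions for illustration, and the two coordinate pairs are the Manhattan center and Turtle Bay values that appear later in this notebook.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


manhattan_center = (40.7896239, -73.9598939)  # geocoded center used in this notebook
turtle_bay = (40.752042, -73.967708)          # neighborhood center from the dataset

distance_km = haversine_km(*turtle_bay, *manhattan_center)
print(round(distance_km, 2))
```

Applying the same function to every neighborhood would give a third ranking criterion alongside the two restaurant counts.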

We use the neighborhoods listed in a public New York City dataset, each represented by the coordinates of its center, as our candidate areas.

The following data sources are needed to extract or generate the required information:
* the names and center coordinates of the candidate neighborhoods are read from the **New York City neighborhood dataset**
* the number of restaurants and their type and location in every neighborhood are obtained using the **Foursquare API**
* the coordinates of the Manhattan center are obtained by geocoding the address "Manhattan, NY" with the **Nominatim geocoder** from the geopy library

### Neighborhood Candidates

- New York City data containing each borough and neighborhood along with their latitudes and longitudes
- Data Source: https://cocl.us/new_york_dataset
- Description: This dataset contains the required location information. We will use it to explore the neighborhoods of New York City.

- Chinese restaurants in the Manhattan borough of New York City
- Data Source: Foursquare API
- Description: Using this API we retrieve venues near each Manhattan neighborhood (up to 100 per neighborhood within a 500 m radius), then filter them to identify Chinese restaurants.
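
As a rough illustration of how one record of the neighborhood dataset becomes a table row, the sketch below flattens a single GeoJSON-style feature. The `sample_feature` dictionary is hand-built from the Wakefield row shown in the output further down and contains only the keys the loading code reads; the real records may carry more fields. The helper name `feature_to_row` is hypothetical.

```python
# hand-built sample mirroring the keys read by the loading code (an assumption about the record layout)
sample_feature = {
    'properties': {'borough': 'Bronx', 'name': 'Wakefield'},
    'geometry': {'coordinates': [-73.847201, 40.894705]},
}


def feature_to_row(feature):
    """Flatten one feature into a Borough / Neighborhood / Latitude / Longitude row."""
    # coordinates are stored as [longitude, latitude], so the order is swapped here
    lon, lat = feature['geometry']['coordinates'][:2]
    return {
        'Borough': feature['properties']['borough'],
        'Neighborhood': feature['properties']['name'],
        'Latitude': lat,
        'Longitude': lon,
    }


row = feature_to_row(sample_feature)
print(row)
```

The longitude-first ordering is the detail most likely to cause silent errors, since swapped values still look like plausible numbers.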

In [3]:
import pandas as pd
import json
import requests # library to handle requests

def get_new_york_data():
    url = 'https://cocl.us/new_york_dataset'
    resp = requests.get(url).json()
    # all data is present in the 'features' key
    features = resp['features']
    # collect one row per neighborhood (DataFrame.append was removed in pandas 2.0)
    rows = []
    for data in features:
        # coordinates are stored as [longitude, latitude]
        neighborhood_lon, neighborhood_lat = data['geometry']['coordinates'][:2]
        rows.append({'Borough': data['properties']['borough'],
                     'Neighborhood': data['properties']['name'],
                     'Latitude': neighborhood_lat,
                     'Longitude': neighborhood_lon})
    # build the dataframe in one step with a fixed column order
    column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude']
    return pd.DataFrame(rows, columns=column_names)


ny_data = get_new_york_data()
ny_data

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585
...,...,...,...,...
301,Manhattan,Hudson Yards,40.756658,-74.000111
302,Queens,Hammels,40.587338,-73.805530
303,Queens,Bayswater,40.611322,-73.765968
304,Queens,Queensbridge,40.756091,-73.945631


##### Obtain Manhattan data, then use folium to map it

In [4]:
manhattan_data = ny_data[ny_data['Borough'] == 'Manhattan'].reset_index(drop=True)
manhattan_data

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Manhattan,Marble Hill,40.876551,-73.91066
1,Manhattan,Chinatown,40.715618,-73.994279
2,Manhattan,Washington Heights,40.851903,-73.9369
3,Manhattan,Inwood,40.867684,-73.92121
4,Manhattan,Hamilton Heights,40.823604,-73.949688
5,Manhattan,Manhattanville,40.816934,-73.957385
6,Manhattan,Central Harlem,40.815976,-73.943211
7,Manhattan,East Harlem,40.792249,-73.944182
8,Manhattan,Upper East Side,40.775639,-73.960508
9,Manhattan,Yorkville,40.77593,-73.947118


In [5]:
manhattan_data.shape

(40, 4)

In [6]:
!pip install geopy
print('geopy installed!')

geopy installed!


In [7]:

from geopy.geocoders import Nominatim

address = 'Manhattan, NY'
geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
manhattan_latitude = location.latitude
manhattan_longitude = location.longitude
print('The geographical coordinates of Manhattan are {}, {}.'.format(manhattan_latitude, manhattan_longitude))

The geographical coordinates of Manhattan are 40.7896239, -73.9598939.


In [9]:
import folium

# create map of Manhattan using latitude and longitude values
map_manhattan = folium.Map(location=[manhattan_latitude, manhattan_longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_manhattan)
    
map_manhattan

#### Collect venue information from the Foursquare API

In [37]:


CLIENT_ID = '*********************************************'
CLIENT_SECRET = '******************************************'
VERSION = '20180605'
LIMIT = 100
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: *********************************************
CLIENT_SECRET: ******************************************


In [11]:

def getVenues(names, latitudes, longitudes, radius=500):
    """Return a dataframe of Foursquare venues within `radius` meters of each neighborhood center."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, CLIENT_SECRET, VERSION, lat, lng, radius, LIMIT)

        # make the GET request and keep the list of returned venues
        results = requests.get(url).json()['response']['groups'][0]['items']

        # keep only the relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-neighborhood lists into a single dataframe
    nearby_venues = pd.DataFrame(
        [item for venue_list in venues_list for item in venue_list],
        columns=['Neighborhood',
                 'Neighborhood Latitude',
                 'Neighborhood Longitude',
                 'Venue',
                 'Venue Latitude',
                 'Venue Longitude',
                 'Venue Category'])

    return nearby_venues
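
The request URL above is assembled by string formatting. A possible alternative, sketched here with the standard library only, is to let `urlencode` build the query string so that every value is escaped consistently. The helper name `build_explore_url` and the placeholder credentials are hypothetical, and the sketch only constructs and inspects the URL without contacting the API.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Build the venues/explore URL with a properly encoded query string."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),  # latitude,longitude of the neighborhood center
        'radius': radius,                # search radius in meters
        'limit': limit,                  # maximum number of venues returned
    }
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)


# placeholder credentials, for illustration only
example_url = build_explore_url('MY_ID', 'MY_SECRET', '20180605', 40.876551, -73.91066)

# decoding the query string recovers the original parameter values
decoded = parse_qs(urlparse(example_url).query)
print(decoded['ll'], decoded['radius'], decoded['limit'])
```

Keeping the parameters in a dictionary also makes it easier to log a request without printing the secret.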

In [12]:
manhattan_venues = getVenues(names=manhattan_data['Neighborhood'],
                                   latitudes=manhattan_data['Latitude'],
                                   longitudes=manhattan_data['Longitude']
                                  )
manhattan_venues.head()

Marble Hill
Chinatown
Washington Heights
Inwood
Hamilton Heights
Manhattanville
Central Harlem
East Harlem
Upper East Side
Yorkville
Lenox Hill
Roosevelt Island
Upper West Side
Lincoln Square
Clinton
Midtown
Murray Hill
Chelsea
Greenwich Village
East Village
Lower East Side
Tribeca
Little Italy
Soho
West Village
Manhattan Valley
Morningside Heights
Gramercy
Battery Park City
Financial District
Carnegie Hill
Noho
Civic Center
Midtown South
Sutton Place
Turtle Bay
Tudor City
Stuyvesant Town
Flatiron
Hudson Yards


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Marble Hill,40.876551,-73.91066,Arturo's,40.874412,-73.910271,Pizza Place
1,Marble Hill,40.876551,-73.91066,Bikram Yoga,40.876844,-73.906204,Yoga Studio
2,Marble Hill,40.876551,-73.91066,Tibbett Diner,40.880404,-73.908937,Diner
3,Marble Hill,40.876551,-73.91066,Starbucks,40.877531,-73.905582,Coffee Shop
4,Marble Hill,40.876551,-73.91066,Dunkin',40.877136,-73.906666,Donut Shop


In [13]:
# List the distinct venue categories found in Manhattan
manhattan_venues['Venue Category'].unique()

array(['Pizza Place', 'Yoga Studio', 'Diner', 'Coffee Shop', 'Donut Shop',
       'Gym', 'Pharmacy', 'Tennis Stadium', 'Department Store',
       'Discount Store', 'Supplement Shop', 'Seafood Restaurant',
       'Ice Cream Shop', 'Video Game Store', 'American Restaurant',
       'Miscellaneous Shop', 'Sandwich Place', 'Steakhouse', 'Kids Store',
       'Big Box Store', 'Shopping Mall', 'Deli / Bodega', 'Hotel',
       'Greek Restaurant', 'Chinese Restaurant', 'Cocktail Bar', 'Spa',
       'Bakery', 'English Restaurant', 'New American Restaurant',
       'Tea Room', 'Indie Movie Theater', 'Salon / Barbershop',
       'Hotpot Restaurant', 'Spanish Restaurant', 'Roof Deck', 'Museum',
       'Asian Restaurant', 'Noodle House', 'Bubble Tea Shop', 'Bike Shop',
       'Boutique', 'Furniture / Home Store', 'Historic Site',
       'Thai Restaurant', 'Dessert Shop', 'Music Venue', 'Karaoke Bar',
       'Cosmetics Shop', 'Supermarket', 'Italian Restaurant',
       'Shanghai Restaurant', 'Vietname

##### Calculate frequency of each category by neighborhood

In [14]:
# Fold the regional Chinese cuisines into a single 'Chinese Restaurant' category
chinese_subcategories = ['Hotpot Restaurant', 'Taiwanese Restaurant', 'Dim Sum Restaurant',
                         'Dumpling Restaurant', 'Cantonese Restaurant', 'Shanghai Restaurant']
is_chinese = manhattan_venues['Venue Category'].isin(chinese_subcategories)
manhattan_venues.loc[is_chinese, 'Venue Category'] = 'Chinese Restaurant'
manhattan_venues['Venue Category'].unique()

array(['Pizza Place', 'Yoga Studio', 'Diner', 'Coffee Shop', 'Donut Shop',
       'Gym', 'Pharmacy', 'Tennis Stadium', 'Department Store',
       'Discount Store', 'Supplement Shop', 'Seafood Restaurant',
       'Ice Cream Shop', 'Video Game Store', 'American Restaurant',
       'Miscellaneous Shop', 'Sandwich Place', 'Steakhouse', 'Kids Store',
       'Big Box Store', 'Shopping Mall', 'Deli / Bodega', 'Hotel',
       'Greek Restaurant', 'Chinese Restaurant', 'Cocktail Bar', 'Spa',
       'Bakery', 'English Restaurant', 'New American Restaurant',
       'Tea Room', 'Indie Movie Theater', 'Salon / Barbershop',
       'Spanish Restaurant', 'Roof Deck', 'Museum', 'Asian Restaurant',
       'Noodle House', 'Bubble Tea Shop', 'Bike Shop', 'Boutique',
       'Furniture / Home Store', 'Historic Site', 'Thai Restaurant',
       'Dessert Shop', 'Music Venue', 'Karaoke Bar', 'Cosmetics Shop',
       'Supermarket', 'Italian Restaurant', 'Vietnamese Restaurant',
       'Organic Grocery', 'Mexican 

In [15]:
# one-hot encode the venue categories
manhattan_byCategory = pd.get_dummies(manhattan_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column
manhattan_byCategory['Neighborhood'] = manhattan_venues['Neighborhood'] 

# average the indicators to get each category's share of venues per neighborhood
manhattan_grouped = manhattan_byCategory.groupby('Neighborhood').mean().reset_index()
manhattan_grouped

Unnamed: 0,Neighborhood,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,American Restaurant,Antique Shop,Arepa Restaurant,Argentinian Restaurant,Art Gallery,...,Video Store,Vietnamese Restaurant,Volleyball Court,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Battery Park City,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.0
1,Carnegie Hill,0.0,0.0,0.0,0.0,0.01087,0.0,0.0,0.01087,0.0,...,0.0,0.01087,0.0,0.0,0.0,0.01087,0.032609,0.0,0.01087,0.032609
2,Central Harlem,0.0,0.0,0.0,0.066667,0.044444,0.0,0.0,0.0,0.022222,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Chelsea,0.0,0.0,0.0,0.0,0.04,0.0,0.01,0.0,0.05,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0
4,Chinatown,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,...,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01
5,Civic Center,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.01,...,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.03
6,Clinton,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.01,...,0.0,0.0,0.0,0.0,0.0,0.01,0.03,0.0,0.0,0.0
7,East Harlem,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,East Village,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.01,...,0.0,0.03,0.0,0.0,0.0,0.03,0.01,0.0,0.0,0.0
9,Financial District,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0
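
Each value in the grouped table is the share of a neighborhood's sampled venues that fall in one category, because the mean of a 0/1 indicator column equals count divided by total. The standard-library sketch below reproduces that calculation on a few made-up (neighborhood, category) pairs; the toy data and variable names are assumptions for illustration only.

```python
from collections import Counter, defaultdict

# hypothetical toy venues: (neighborhood, category)
toy_venues = [
    ('A', 'Bar'), ('A', 'Chinese Restaurant'), ('A', 'Bar'), ('A', 'Park'),
    ('B', 'Park'), ('B', 'Café'),
]

# count venues per category within each neighborhood
counts = defaultdict(Counter)
for neighborhood, category in toy_venues:
    counts[neighborhood][category] += 1

# divide by the neighborhood total to obtain shares that sum to 1
frequencies = {
    neighborhood: {category: n / sum(cats.values()) for category, n in cats.items()}
    for neighborhood, cats in counts.items()
}
print(frequencies)
```

Since each request returns at most 100 venues, these shares describe the sampled venues rather than every business in a neighborhood.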


In [16]:
# Select every category whose name contains a food-and-drink keyword.
# Note: this is substring matching, so names such as 'Cafeteria' and
# 'Salon / Barbershop' are matched as well.
food_keywords = ('Restaurant', 'Food & Drink Shop', 'Café', 'BBQ', 'Bar', 'Cafe')
restaurant_cols = [col for col in manhattan_grouped.columns
                   if any(keyword in col for keyword in food_keywords)]
count = len(restaurant_cols)

print("A total of {} restaurant, café, and bar categories were selected.".format(count))
restaurant_cols

A total of 90 restaurant, café, and bar categories were selected.


['Afghan Restaurant',
 'African Restaurant',
 'American Restaurant',
 'Arepa Restaurant',
 'Argentinian Restaurant',
 'Asian Restaurant',
 'Australian Restaurant',
 'Austrian Restaurant',
 'BBQ Joint',
 'Bar',
 'Beer Bar',
 'Brazilian Restaurant',
 'Cafeteria',
 'Café',
 'Cajun / Creole Restaurant',
 'Cambodian Restaurant',
 'Caribbean Restaurant',
 'Caucasian Restaurant',
 'Chinese Restaurant',
 'Cocktail Bar',
 'College Cafeteria',
 'Cuban Restaurant',
 'Czech Restaurant',
 'Eastern European Restaurant',
 'Egyptian Restaurant',
 'Empanada Restaurant',
 'English Restaurant',
 'Ethiopian Restaurant',
 'Falafel Restaurant',
 'Fast Food Restaurant',
 'Filipino Restaurant',
 'Food & Drink Shop',
 'French Restaurant',
 'Gaming Cafe',
 'Gay Bar',
 'German Restaurant',
 'Greek Restaurant',
 'Hawaiian Restaurant',
 'Himalayan Restaurant',
 'Hookah Bar',
 'Hotel Bar',
 'Indian Restaurant',
 'Israeli Restaurant',
 'Italian Restaurant',
 'Japanese Curry Restaurant',
 'Japanese Restaurant',
 'Jew
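
Because the selection above checks whether a keyword occurs anywhere in a category name, it can pick up unintended categories. The sketch below contrasts that substring test with a whole-word test built from a regular expression; the short `categories` list is a hand-picked sample of names seen in this notebook, and the whole-word variant is offered as one possible refinement, not as the method used here.

```python
import re

# hand-picked sample of category names for illustration
categories = ['Bar', 'Wine Bar', 'Salon / Barbershop', 'Gaming Cafe', 'Cafeteria',
              'Café', 'BBQ Joint', 'Chinese Restaurant', 'Yoga Studio']
keywords = ['Restaurant', 'Food & Drink Shop', 'Café', 'BBQ', 'Bar', 'Cafe']

# substring test: the keyword may appear anywhere, even inside a longer word
substring_matches = [c for c in categories if any(k in c for k in keywords)]

# whole-word test: the keyword must be delimited by word boundaries
pattern = re.compile(r'\b(?:' + '|'.join(re.escape(k) for k in keywords) + r')\b')
word_matches = [c for c in categories if pattern.search(c)]

only_substring = [c for c in substring_matches if c not in word_matches]
print(only_substring)
```

Whether a name like 'Cafeteria' should count as a food venue is a judgment call; the point is that the matching rule should make that choice explicit.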

In [32]:
# sum the shares of all food-and-drink categories to get each neighborhood's overall food-venue share
manhattan_grouped['Total Visited Frequecy'] = manhattan_grouped[restaurant_cols].sum(axis=1)
manhattan_grouped_sorted = manhattan_grouped.sort_values(ascending=False,by=['Total Visited Frequecy']).reset_index(drop=True)
print(manhattan_grouped_sorted.shape)
manhattan_grouped_sorted.head(10)

(40, 324)


Unnamed: 0,Neighborhood,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,American Restaurant,Antique Shop,Arepa Restaurant,Argentinian Restaurant,Art Gallery,...,Vietnamese Restaurant,Volleyball Court,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Total Visited Frequecy
0,East Village,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.01,...,0.03,0.0,0.0,0.0,0.03,0.01,0.0,0.0,0.0,0.59
1,Upper West Side,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,...,0.011111,0.0,0.0,0.0,0.033333,0.011111,0.0,0.0,0.011111,0.555556
2,Manhattanville,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.533333
3,Turtle Bay,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.52
4,Greenwich Village,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,...,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.52
5,Central Harlem,0.0,0.0,0.0,0.066667,0.044444,0.0,0.0,0.0,0.022222,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.511111
6,Chinatown,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,...,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.48
7,Hamilton Heights,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.016949,0.0,0.0,0.0,0.033898,0.457627
8,Inwood,0.0,0.0,0.0,0.0,0.035088,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.035088,0.017544,0.0,0.0,0.017544,0.45614
9,West Village,0.01,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.01,...,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.45


In [36]:
result=manhattan_grouped_sorted.loc[:,['Neighborhood','Total Visited Frequecy']]
result.head(10)

Unnamed: 0,Neighborhood,Total Visited Frequecy
0,East Village,0.59
1,Upper West Side,0.555556
2,Manhattanville,0.533333
3,Turtle Bay,0.52
4,Greenwich Village,0.52
5,Central Harlem,0.511111
6,Chinatown,0.48
7,Hamilton Heights,0.457627
8,Inwood,0.45614
9,West Village,0.45


In [33]:
no_ch_neighborhood = manhattan_grouped_sorted[manhattan_grouped_sorted['Chinese Restaurant']==0].reset_index(drop=True)


food_neighborhoods = no_ch_neighborhood.drop(columns='Total Visited Frequecy')
food_neighborhoods.head(10)

Unnamed: 0,Neighborhood,Accessories Store,Adult Boutique,Afghan Restaurant,African Restaurant,American Restaurant,Antique Shop,Arepa Restaurant,Argentinian Restaurant,Art Gallery,...,Video Store,Vietnamese Restaurant,Volleyball Court,Waterfront,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,Turtle Bay,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0
1,East Harlem,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Manhattan Valley,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.021739,0.0,0.0,0.0,0.0,0.021739,0.021739,0.0,0.043478
3,Noho,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.04,...,0.0,0.0,0.0,0.0,0.01,0.02,0.02,0.0,0.0,0.01
4,Gramercy,0.0,0.0,0.0,0.0,0.044444,0.0,0.0,0.0,0.011111,...,0.0,0.0,0.0,0.0,0.0,0.0,0.011111,0.0,0.0,0.011111
5,Civic Center,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.01,...,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.03
6,Flatiron,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.01,...,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.02
7,Financial District,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0
8,Morningside Heights,0.0,0.0,0.0,0.0,0.073171,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Hudson Yards,0.0,0.0,0.0,0.0,0.053571,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.017857,0.0,0.0,0.0


In [19]:

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [20]:
import numpy as np

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions beyond the third use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = food_neighborhoods['Neighborhood']

for ind in np.arange(food_neighborhoods.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(food_neighborhoods.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head(10)

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Turtle Bay,Sushi Restaurant,Italian Restaurant,Coffee Shop,Wine Bar,French Restaurant,Park,Deli / Bodega,Japanese Restaurant,Seafood Restaurant,Café
1,East Harlem,Mexican Restaurant,Bakery,Thai Restaurant,Deli / Bodega,Spa,Latin American Restaurant,Sandwich Place,Taco Place,Donut Shop,Cocktail Bar
2,Manhattan Valley,Mexican Restaurant,Bar,Thai Restaurant,Pizza Place,Park,Coffee Shop,Yoga Studio,Fried Chicken Joint,Clothing Store,Ice Cream Shop
3,Noho,Italian Restaurant,Coffee Shop,Pizza Place,Art Gallery,French Restaurant,Grocery Store,Sandwich Place,Rock Club,Sushi Restaurant,Mexican Restaurant
4,Gramercy,Bar,Pizza Place,Italian Restaurant,American Restaurant,Coffee Shop,Playground,Cocktail Bar,Mexican Restaurant,Bagel Shop,Grocery Store
5,Civic Center,Coffee Shop,Hotel,Cocktail Bar,Gym / Fitness Center,French Restaurant,Italian Restaurant,Yoga Studio,Spa,Park,Sushi Restaurant
6,Flatiron,Gym / Fitness Center,New American Restaurant,Spa,Italian Restaurant,Mediterranean Restaurant,Furniture / Home Store,Vegetarian / Vegan Restaurant,Japanese Restaurant,Gym,American Restaurant
7,Financial District,Coffee Shop,Pizza Place,American Restaurant,Café,Cocktail Bar,Gym,Gym / Fitness Center,Italian Restaurant,Bar,Park
8,Morningside Heights,Park,Bookstore,American Restaurant,Coffee Shop,Burger Joint,Deli / Bodega,Sandwich Place,Pub,Supermarket,Mediterranean Restaurant
9,Hudson Yards,Hotel,Gym / Fitness Center,Italian Restaurant,American Restaurant,Coffee Shop,Dog Run,Nightclub,Gym,Park,Spanish Restaurant
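
The column labels above rely on a three-item suffix list, which is enough for ten columns but would mislabel positions such as 21 or 22. As a hedged alternative, the hypothetical helper `ordinal` below derives the suffix from the number itself, treating the teens as a special case.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


# rebuild the same column labels as the cell above
columns = ['Neighborhood'] + ['{} Most Common Venue'.format(ordinal(i)) for i in range(1, 11)]
print(columns)
```

This keeps the labels correct if `num_top_venues` is ever raised beyond 20.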


### Use KMeans to cluster Neighborhoods

In [21]:
# import k-means from clustering stage
from sklearn.cluster import KMeans

# set number of clusters
kclusters = 5

manhattan_grouped_clustering = food_neighborhoods.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(manhattan_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

manhattan_merged = manhattan_data

manhattan_merged = manhattan_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

manhattan_merged = manhattan_merged.dropna().reset_index(drop=True)

manhattan_merged.head() # check the last columns!



Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Manhattan,Marble Hill,40.876551,-73.91066,4.0,Coffee Shop,Gym,Yoga Studio,Big Box Store,Supplement Shop,Steakhouse,Shopping Mall,Seafood Restaurant,Sandwich Place,Donut Shop
1,Manhattan,East Harlem,40.792249,-73.944182,3.0,Mexican Restaurant,Bakery,Thai Restaurant,Deli / Bodega,Spa,Latin American Restaurant,Sandwich Place,Taco Place,Donut Shop,Cocktail Bar
2,Manhattan,Roosevelt Island,40.76216,-73.949168,2.0,Park,Restaurant,Residential Building (Apartment / Condo),Sandwich Place,Dry Cleaner,Liquor Store,Outdoors & Recreation,Coffee Shop,Supermarket,Baseball Field
3,Manhattan,Manhattan Valley,40.797307,-73.964286,1.0,Mexican Restaurant,Bar,Thai Restaurant,Pizza Place,Park,Coffee Shop,Yoga Studio,Fried Chicken Joint,Clothing Store,Ice Cream Shop
4,Manhattan,Morningside Heights,40.808,-73.963896,1.0,Park,Bookstore,American Restaurant,Coffee Shop,Burger Joint,Deli / Bodega,Sandwich Place,Pub,Supermarket,Mediterranean Restaurant


In [34]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# create map
map_clusters = folium.Map(location=[manhattan_latitude, manhattan_longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(manhattan_merged['Latitude'], manhattan_merged['Longitude'], manhattan_merged['Neighborhood'], manhattan_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Examine Clusters

In [23]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 0, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
10,Stuyvesant Town,Boat or Ferry,Park,Baseball Field,Heliport,Gas Station,Skating Rink,Farmers Market,Bistro,Gym / Fitness Center,Cocktail Bar


In [24]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 1, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Manhattan Valley,Mexican Restaurant,Bar,Thai Restaurant,Pizza Place,Park,Coffee Shop,Yoga Studio,Fried Chicken Joint,Clothing Store,Ice Cream Shop
4,Morningside Heights,Park,Bookstore,American Restaurant,Coffee Shop,Burger Joint,Deli / Bodega,Sandwich Place,Pub,Supermarket,Mediterranean Restaurant
5,Gramercy,Bar,Pizza Place,Italian Restaurant,American Restaurant,Coffee Shop,Playground,Cocktail Bar,Mexican Restaurant,Bagel Shop,Grocery Store
6,Financial District,Coffee Shop,Pizza Place,American Restaurant,Café,Cocktail Bar,Gym,Gym / Fitness Center,Italian Restaurant,Bar,Park
7,Noho,Italian Restaurant,Coffee Shop,Pizza Place,Art Gallery,French Restaurant,Grocery Store,Sandwich Place,Rock Club,Sushi Restaurant,Mexican Restaurant
8,Civic Center,Coffee Shop,Hotel,Cocktail Bar,Gym / Fitness Center,French Restaurant,Italian Restaurant,Yoga Studio,Spa,Park,Sushi Restaurant
9,Turtle Bay,Sushi Restaurant,Italian Restaurant,Coffee Shop,Wine Bar,French Restaurant,Park,Deli / Bodega,Japanese Restaurant,Seafood Restaurant,Café
11,Flatiron,Gym / Fitness Center,New American Restaurant,Spa,Italian Restaurant,Mediterranean Restaurant,Furniture / Home Store,Vegetarian / Vegan Restaurant,Japanese Restaurant,Gym,American Restaurant
12,Hudson Yards,Hotel,Gym / Fitness Center,Italian Restaurant,American Restaurant,Coffee Shop,Dog Run,Nightclub,Gym,Park,Spanish Restaurant


In [25]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 2, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Roosevelt Island,Park,Restaurant,Residential Building (Apartment / Condo),Sandwich Place,Dry Cleaner,Liquor Store,Outdoors & Recreation,Coffee Shop,Supermarket,Baseball Field


In [26]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 3, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,East Harlem,Mexican Restaurant,Bakery,Thai Restaurant,Deli / Bodega,Spa,Latin American Restaurant,Sandwich Place,Taco Place,Donut Shop,Cocktail Bar


In [27]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 4, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Marble Hill,Coffee Shop,Gym,Yoga Studio,Big Box Store,Supplement Shop,Steakhouse,Shopping Mall,Seafood Restaurant,Sandwich Place,Donut Shop
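
A quick way to summarize the cluster tables above is to count how many neighborhoods fall into each cluster. The sketch below does this with the standard library; the dictionary is transcribed by hand from the tables in this section, and the variable names are illustrative.

```python
from collections import Counter

# cluster label per neighborhood, transcribed from the tables above
cluster_by_neighborhood = {
    'Stuyvesant Town': 0,
    'Manhattan Valley': 1, 'Morningside Heights': 1, 'Gramercy': 1,
    'Financial District': 1, 'Noho': 1, 'Civic Center': 1,
    'Turtle Bay': 1, 'Flatiron': 1, 'Hudson Yards': 1,
    'Roosevelt Island': 2,
    'East Harlem': 3,
    'Marble Hill': 4,
}

cluster_sizes = Counter(cluster_by_neighborhood.values())
largest_cluster, largest_size = cluster_sizes.most_common(1)[0]
print(dict(cluster_sizes))
print(largest_cluster, largest_size)
```

Most neighborhoods land in one cluster while the other clusters hold a single neighborhood each, which may suggest that a smaller number of clusters would describe these rows equally well.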


In [28]:
top_10_neigh = manhattan_grouped_sorted['Neighborhood'].head(10).tolist()

top_10_neigh

['East Village',
 'Upper West Side',
 'Manhattanville',
 'Turtle Bay',
 'Greenwich Village',
 'Central Harlem',
 'Chinatown',
 'Hamilton Heights',
 'Inwood',
 'West Village']

In [29]:
top_10_neigh_df = manhattan_merged[manhattan_merged['Neighborhood'].isin(top_10_neigh)]
top_10_neigh_df

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
9,Manhattan,Turtle Bay,40.752042,-73.967708,1.0,Sushi Restaurant,Italian Restaurant,Coffee Shop,Wine Bar,French Restaurant,Park,Deli / Bodega,Japanese Restaurant,Seafood Restaurant,Café


In [30]:
for lat, lon, poi, cluster in zip(top_10_neigh_df['Latitude'], top_10_neigh_df['Longitude'], top_10_neigh_df['Neighborhood'], top_10_neigh_df['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        fill=True,
        fill_color=rainbow[4],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

#### Result:
- Turtle Bay is the best location to open a Chinese restaurant: among the ten neighborhoods with the highest share of food venues, it is the only one with no Chinese restaurants in the sampled venues
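
The selection behind this result amounts to intersecting two lists: the ten neighborhoods with the highest food-venue share and the neighborhoods where no Chinese restaurant was found. The sketch below restates that step with plain Python; both collections are transcribed by hand from the outputs above, and the variable names are illustrative.

```python
# top 10 neighborhoods by food-venue share, in ranked order (from the output above)
top_10_food = ['East Village', 'Upper West Side', 'Manhattanville', 'Turtle Bay',
               'Greenwich Village', 'Central Harlem', 'Chinatown',
               'Hamilton Heights', 'Inwood', 'West Village']

# neighborhoods with no Chinese restaurant among the sampled venues (from the cluster tables)
no_chinese = {'Marble Hill', 'East Harlem', 'Roosevelt Island', 'Manhattan Valley',
              'Morningside Heights', 'Gramercy', 'Financial District', 'Noho',
              'Civic Center', 'Turtle Bay', 'Stuyvesant Town', 'Flatiron', 'Hudson Yards'}

# keep the ranked order while filtering to neighborhoods without Chinese restaurants
candidates = [name for name in top_10_food if name in no_chinese]
print(candidates)
```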


#### Conclusion:
- The dataset lists 40 neighborhoods in Manhattan
- The top 10 neighborhoods by share of food venues are: East Village (*freq=0.59*), Upper West Side (*freq=0.56*), Manhattanville (*freq=0.53*), Turtle Bay (*freq=0.52*), Greenwich Village (*freq=0.52*), Central Harlem (*freq=0.51*), Chinatown (*freq=0.48*), Hamilton Heights (*freq=0.46*), Inwood (*freq=0.46*), West Village (*freq=0.45*)
- Cluster 1, which contains Turtle Bay, is the one stakeholders should consider