# Capstone Project - Battle of the Neighborhoods

### Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

# Introduction: Business Problem <a name="introduction"></a>

In Module 3, we explored New York City and segmented and clustered its neighborhoods. New York is highly diverse and is the financial capital of the United States. Known as the "City that Never Sleeps", it is famous around the world for many things, including one of the most diverse culinary landscapes anywhere. When opening a restaurant, a great product alone does not guarantee success; location is extremely important. In this project we will try to find an optimal location for a restaurant. Specifically, this report is targeted at a restaurateur interested in opening a new restaurant in **Manhattan**.

Since New York is one of the food capitals of the world, competition is abundant. We will search for **areas with a thriving restaurant scene but a gap in competition from similar cuisine**.

We will use our data science powers to generate a short list of the most promising neighborhoods based on these criteria. Stakeholders can then explore the advantages of each area further when deciding where to open.

# Data <a name="data"></a>

Based on the definition of our problem, the factor that will influence our decision is:
* the number of existing restaurants in the neighborhood (and specifically what types of restaurants)

We use the neighborhood centers listed in the New York dataset to define our candidate areas.

The following data sources will be needed to extract/generate the required information:
* the names and center coordinates of the Manhattan neighborhoods will be obtained from https://cocl.us/new_york_dataset
* the number of restaurants in every neighborhood, along with their type and location, will be obtained using the **Foursquare API**

### Download all libraries and dependencies needed

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')


## Download NYC Dataset

NYC has a total of 5 boroughs and 306 neighborhoods. To segment and explore the neighborhoods, we need a dataset that contains the 5 boroughs, the neighborhoods in each borough, and the latitude and longitude coordinates of each neighborhood. We download the dataset provided by Coursera with the `wget` command.

In [2]:
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
print('Data downloaded!')

Data downloaded!


In [3]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

## View Data

In [4]:
newyork_data

{'type': 'FeatureCollection',
 'totalFeatures': 306,
 'features': [{'type': 'Feature',
   'id': 'nyu_2451_34572.1',
   'geometry': {'type': 'Point',
    'coordinates': [-73.84720052054902, 40.89470517661]},
   'geometry_name': 'geom',
   'properties': {'name': 'Wakefield',
    'stacked': 1,
    'annoline1': 'Wakefield',
    'annoline2': None,
    'annoline3': None,
    'annoangle': 0.0,
    'borough': 'Bronx',
    'bbox': [-73.84720052054902,
     40.89470517661,
     -73.84720052054902,
     40.89470517661]}},
  {'type': 'Feature',
   'id': 'nyu_2451_34572.2',
   'geometry': {'type': 'Point',
    'coordinates': [-73.82993910812398, 40.87429419303012]},
   'geometry_name': 'geom',
   'properties': {'name': 'Co-op City',
    'stacked': 2,
    'annoline1': 'Co-op',
    'annoline2': 'City',
    'annoline3': None,
    'annoangle': 0.0,
    'borough': 'Bronx',
    'bbox': [-73.82993910812398,
     40.87429419303012,
     -73.82993910812398,
     40.87429419303012]}},
  {'type': 'Feature',
 

## Define new variable to include 'features' data

In [5]:
neighborhoods_data = newyork_data['features']

In [6]:
neighborhoods_data[0]

{'type': 'Feature',
 'id': 'nyu_2451_34572.1',
 'geometry': {'type': 'Point',
  'coordinates': [-73.84720052054902, 40.89470517661]},
 'geometry_name': 'geom',
 'properties': {'name': 'Wakefield',
  'stacked': 1,
  'annoline1': 'Wakefield',
  'annoline2': None,
  'annoline3': None,
  'annoangle': 0.0,
  'borough': 'Bronx',
  'bbox': [-73.84720052054902,
   40.89470517661,
   -73.84720052054902,
   40.89470517661]}}
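
As a minimal sketch of how one such feature maps onto a table row, the helper below (the name `feature_to_row` is hypothetical, not part of the notebook) pulls out the borough, name, and coordinates. It assumes the GeoJSON convention visible in the output above, where `coordinates` is ordered longitude first, then latitude.

```python
def feature_to_row(feature):
    """Flatten one GeoJSON point feature into a flat dict (illustrative helper)."""
    # GeoJSON point coordinates are ordered [longitude, latitude]
    lon, lat = feature['geometry']['coordinates']
    props = feature['properties']
    return {'Borough': props['borough'],
            'Neighborhood': props['name'],
            'Latitude': lat,
            'Longitude': lon}

# sample feature trimmed from the output shown above
sample = {'type': 'Feature',
          'geometry': {'type': 'Point',
                       'coordinates': [-73.84720052054902, 40.89470517661]},
          'properties': {'name': 'Wakefield', 'borough': 'Bronx'}}

row = feature_to_row(sample)
print(row)
```

Swapping the two coordinates is an easy mistake to make here, and it would silently place every marker far from New York.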

## Define the dataframe columns and instantiate the dataframe

In [7]:
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

neighborhoods = pd.DataFrame(columns=column_names)

## Loop through the data to fill the dataframe, then create a new dataframe that includes only Manhattan

In [8]:
# collect one record per neighborhood, then build the dataframe in a single step
# (DataFrame.append was removed in pandas 2.0 and is slow inside a loop)
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

neighborhoods = pd.DataFrame(rows, columns=column_names)

In [9]:
manhattan_data = neighborhoods[neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)
manhattan_data.head(5)

| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Manhattan | Marble Hill | 40.876551 | -73.91066 |
| 1 | Manhattan | Chinatown | 40.715618 | -73.994279 |
| 2 | Manhattan | Washington Heights | 40.851903 | -73.9369 |
| 3 | Manhattan | Inwood | 40.867684 | -73.92121 |
| 4 | Manhattan | Hamilton Heights | 40.823604 | -73.949688 |


## Use the geopy library to get the latitude and longitude values of Manhattan

In [10]:
address = 'Manhattan, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Manhattan are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Manhattan are 40.7896239, -73.9598939.


## Create map of Manhattan using latitude and longitude values

In [11]:
map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_manhattan)  
    
map_manhattan

## Define Foursquare Credentials, Version, and Search Criteria

In [88]:
# The code was removed by Watson Studio for sharing.
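
Because the credentials cell was stripped, the variables used in later cells (`CLIENT_ID`, `CLIENT_SECRET`, `VERSION`, `CATEGORY1`) are not visible here. One possible way to define them without putting secrets in the notebook is to read them from environment variables. This is only a sketch: the environment variable names and the fallback placeholder strings are assumptions, and the real values must come from your own Foursquare account.

```python
import os

# Hypothetical environment variable names; set them in your shell before starting the notebook.
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', 'your-client-id')
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', 'your-client-secret')

# API version date in YYYYMMDD form (placeholder; the original value was removed)
VERSION = os.environ.get('FOURSQUARE_VERSION', 'YYYYMMDD')

# category id passed as categoryId in the requests (placeholder; the original value was removed)
CATEGORY1 = os.environ.get('FOURSQUARE_CATEGORY_ID', 'your-category-id')

# never print the secret itself; only confirm that something was loaded
print('Credentials loaded:', bool(CLIENT_ID and CLIENT_SECRET))
```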

## Explore first neighborhood in the dataset

In [13]:
manhattan_data.loc[0, 'Neighborhood']

'Marble Hill'

In [14]:
neighborhood_latitude = manhattan_data.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = manhattan_data.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = manhattan_data.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Marble Hill are 40.87655077879964, -73.91065965862981.


## Let's get the top 100 venues that are in Marble Hill within a radius of 600 meters

In [91]:

LIMIT = 100
radius = 600

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET,
    VERSION,
    neighborhood_latitude, 
    neighborhood_longitude,
    CATEGORY1,
    radius, 
    LIMIT)
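
The query string can also be assembled from a dictionary, which keeps each parameter visible and takes care of escaping. The sketch below uses only the standard library; the helper name `build_explore_url` and the placeholder credential values are assumptions for illustration, while the endpoint and parameter names are those used in the cell above.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng,
                      category_id, radius=600, limit=100):
    """Build the venues/explore request URL used above (illustrative helper)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),  # latitude,longitude of the search center
        'categoryId': category_id,
        'radius': radius,                # search radius in meters
        'limit': limit,                  # maximum number of venues returned
    }
    # safe=',' keeps the comma in the ll value readable instead of percent-encoding it
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params, safe=',')

# placeholder credentials; coordinates are those of Marble Hill printed above
example_url = build_explore_url('ID', 'SECRET', 'YYYYMMDD',
                                40.87655077879964, -73.91065965862981, 'CATEGORY')
print(example_url)
```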


In [61]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5e540335949393001bd7aeb5'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': '$-$$$$', 'key': 'price'},
    {'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Marble Hill',
  'headerFullLocation': 'Marble Hill, New York',
  'headerLocationGranularity': 'neighborhood',
  'query': 'food',
  'totalResults': 27,
  'suggestedBounds': {'ne': {'lat': 40.881950784199645,
    'lng': -73.9035312752877},
   'sw': {'lat': 40.871150773399634, 'lng': -73.91778804197192}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4b4429abf964a52037f225e3',
       'name': "Arturo's",
       'location': {'address': '5198 Broadway',
        'crossStreet': 'at 225th St.',
        'lat': 40.87441177110231,
        'l

## Borrow the **get_category_type** function from the Foursquare lab

From the Foursquare lab in the previous module, we know that all the information is in the *items* key.

In [62]:
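# function that extracts the category of the venue
def get_category_type(row):
    # the column is named 'categories' or 'venue.categories',
    # depending on how the JSON was flattened
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    # a venue may have no category assigned
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']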

## Clean the json and structure it into a *pandas* dataframe.

In [63]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head(15)

| | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | Arturo's | Pizza Place | 40.874412 | -73.910271 |
| 1 | Tibbett Diner | Diner | 40.880404 | -73.908937 |
| 2 | Sam's Pizza | Pizza Place | 40.879435 | -73.905859 |
| 3 | Dunkin' | Donut Shop | 40.877136 | -73.906666 |
| 4 | Estrellita Poblana V | Mexican Restaurant | 40.879687 | -73.906257 |
| 5 | Land & Sea Restaurant | Seafood Restaurant | 40.877885 | -73.905873 |
| 6 | Loeser's Delicatessen | Sandwich Place | 40.879111 | -73.905693 |
| 7 | El Economico Restaurant | Spanish Restaurant | 40.87933 | -73.904597 |
| 8 | Broadway Pizza & Pasta | Pizza Place | 40.878822 | -73.904494 |
| 9 | Dunkin' | Donut Shop | 40.879308 | -73.905066 |


## Create a function to repeat the same process for all the neighborhoods in Manhattan

In [64]:
def getNearbyVenues(names, latitudes, longitudes, radius=600):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng,
            CATEGORY1,
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()['response']['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    
    return(nearby_venues)
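
One fragile spot in the function above is `v['venue']['categories'][0]`, which raises an `IndexError` if a venue comes back with an empty category list. A possible guard, sketched as a standalone helper (the name `flatten_items` and the sample data are hypothetical), is to fall back to `None` in that case:

```python
def flatten_items(name, lat, lng, items):
    """Turn the 'items' list of one explore response into flat tuples (illustrative helper)."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # use None when the venue has no category instead of raising IndexError
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows

# made-up items mimicking the structure of the response shown earlier
sample_items = [
    {'venue': {'name': 'Venue A', 'location': {'lat': 40.0, 'lng': -73.0},
               'categories': [{'name': 'Pizza Place'}]}},
    {'venue': {'name': 'Venue B', 'location': {'lat': 40.1, 'lng': -73.1},
               'categories': []}},
]

rows = flatten_items('Example', 40.05, -73.05, sample_items)
print(rows)
```

Rows with a `None` category can then be dropped or inspected before the one-hot encoding step.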

In [65]:
manhattan_venues = getNearbyVenues(names=manhattan_data['Neighborhood'],
                                   latitudes=manhattan_data['Latitude'],
                                   longitudes=manhattan_data['Longitude']
                                  )



Marble Hill
Chinatown
Washington Heights
Inwood
Hamilton Heights
Manhattanville
Central Harlem
East Harlem
Upper East Side
Yorkville
Lenox Hill
Roosevelt Island
Upper West Side
Lincoln Square
Clinton
Midtown
Murray Hill
Chelsea
Greenwich Village
East Village
Lower East Side
Tribeca
Little Italy
Soho
West Village
Manhattan Valley
Morningside Heights
Gramercy
Battery Park City
Financial District
Carnegie Hill
Noho
Civic Center
Midtown South
Sutton Place
Turtle Bay
Tudor City
Stuyvesant Town
Flatiron
Hudson Yards


In [66]:
print(manhattan_venues.shape)
manhattan_venues.head(5)

(3274, 7)


| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Marble Hill | 40.876551 | -73.91066 | Arturo's | 40.874412 | -73.910271 | Pizza Place |
| 1 | Marble Hill | 40.876551 | -73.91066 | Tibbett Diner | 40.880404 | -73.908937 | Diner |
| 2 | Marble Hill | 40.876551 | -73.91066 | Sam's Pizza | 40.879435 | -73.905859 | Pizza Place |
| 3 | Marble Hill | 40.876551 | -73.91066 | Dunkin' | 40.877136 | -73.906666 | Donut Shop |
| 4 | Marble Hill | 40.876551 | -73.91066 | Estrellita Poblana V | 40.879687 | -73.906257 | Mexican Restaurant |


## Check how many venues were returned for each neighborhood

In [67]:
manhattan_venues.groupby('Neighborhood').count()

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Battery Park City | 49 | 49 | 49 | 49 | 49 | 49 |
| Carnegie Hill | 100 | 100 | 100 | 100 | 100 | 100 |
| Central Harlem | 50 | 50 | 50 | 50 | 50 | 50 |
| Chelsea | 100 | 100 | 100 | 100 | 100 | 100 |
| Chinatown | 100 | 100 | 100 | 100 | 100 | 100 |
| Civic Center | 100 | 100 | 100 | 100 | 100 | 100 |
| Clinton | 100 | 100 | 100 | 100 | 100 | 100 |
| East Harlem | 75 | 75 | 75 | 75 | 75 | 75 |
| East Village | 100 | 100 | 100 | 100 | 100 | 100 |
| Financial District | 100 | 100 | 100 | 100 | 100 | 100 |
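
Several counts above are exactly 100, which is the value of `LIMIT`, so those neighborhoods may contain more restaurants than were returned. A small sketch that flags such capped neighborhoods, using only the ten counts displayed above (the variable names are illustrative):

```python
LIMIT = 100  # same cap as in the request above

# venue counts copied from the ten rows displayed above
venue_counts = {
    'Battery Park City': 49,
    'Carnegie Hill': 100,
    'Central Harlem': 50,
    'Chelsea': 100,
    'Chinatown': 100,
    'Civic Center': 100,
    'Clinton': 100,
    'East Harlem': 75,
    'East Village': 100,
    'Financial District': 100,
}

# a count equal to LIMIT means the result list may have been truncated
capped = sorted(n for n, c in venue_counts.items() if c >= LIMIT)
uncapped = sorted(n for n, c in venue_counts.items() if c < LIMIT)

print('Hit the limit:', capped)
print('Below the limit:', uncapped)
```

For capped neighborhoods, the category frequencies computed later describe only the returned venues, which may not be every restaurant within the 600 m radius.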


## Check how many unique categories can be curated from all the returned venues

In [68]:
print('There are {} unique categories.'.format(len(manhattan_venues['Venue Category'].unique())))

There are 125 unique categories.


# Methodology <a name="methodology"></a>

In this project we focus on the neighborhoods of Manhattan and measure the restaurant density in each. We limit our analysis to a 600 m radius around the center of each neighborhood.

In the first step we collected the required **data: location and type (category) of every restaurant**. We will then try to identify the most promising areas by creating **clusters of locations that meet some basic requirements** established in discussion with stakeholders. We will present a map of all such locations and also create clusters (using **k-means clustering**) of those locations to identify general zones / neighborhoods / addresses that should be a starting point for final 'street level' exploration and the search for optimal venue locations.
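
To make the 600 m limit concrete, the sketch below computes the great-circle (haversine) distance between the Marble Hill center and the first venue returned for it, using coordinates printed in the Data section. The helper name `haversine_m` and the mean Earth radius of 6,371 km are assumptions of this example; the notebook itself relies on the `radius` parameter of the API request rather than on a local distance check.

```python
import math

EARTH_RADIUS_M = 6371000  # mean Earth radius in meters (approximation)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

# Marble Hill center and the venue Arturo's, as shown in the Data section
center = (40.876551, -73.91066)
venue = (40.874412, -73.910271)

distance = haversine_m(center[0], center[1], venue[0], venue[1])
print('Distance: {:.0f} m, within 600 m: {}'.format(distance, distance <= 600))
```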

# Analysis <a name="analysis"></a>

## Let's analyze each neighborhood

In [69]:
# one hot encoding
manhattan_onehot = pd.get_dummies(manhattan_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
manhattan_onehot['Neighborhood'] = manhattan_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [manhattan_onehot.columns[-1]] + list(manhattan_onehot.columns[:-1])
manhattan_onehot = manhattan_onehot[fixed_columns]

manhattan_onehot.head()

,Neighborhood,Afghan Restaurant,African Restaurant,American Restaurant,Arepa Restaurant,Argentinian Restaurant,Asian Restaurant,Australian Restaurant,Austrian Restaurant,BBQ Joint,Bagel Shop,Bakery,Belgian Restaurant,Bistro,Brazilian Restaurant,Breakfast Spot,Burger Joint,Burrito Place,Cafeteria,Café,Cajun / Creole Restaurant,Cambodian Restaurant,Cantonese Restaurant,Caribbean Restaurant,Caucasian Restaurant,Chinese Restaurant,Comfort Food Restaurant,Creperie,Cuban Restaurant,Czech Restaurant,Deli / Bodega,Dim Sum Restaurant,Diner,Donut Shop,Dosa Place,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,Filipino Restaurant,Fondue Restaurant,Food,Food Court,Food Stand,Food Truck,French Restaurant,Fried Chicken Joint,Gastropub,German Restaurant,Gluten-free Restaurant,Greek Restaurant,Halal Restaurant,Hawaiian Restaurant,Himalayan Restaurant,Hot Dog Joint,Hotpot Restaurant,Indian Restaurant,Indonesian Restaurant,Irish Pub,Israeli Restaurant,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jewish Restaurant,Kebab Restaurant,Korean Restaurant,Kosher Restaurant,Latin American Restaurant,Lebanese Restaurant,Mac & Cheese Joint,Malay Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Molecular Gastronomy Restaurant,Moroccan Restaurant,New American Restaurant,Noodle House,Paella Restaurant,Pakistani Restaurant,Peking Duck Restaurant,Persian Restaurant,Peruvian Restaurant,Pet Café,Pizza Place,Poke Place,Poutine Place,Ramen Restaurant,Restaurant,Russian Restaurant,Salad Place,Sandwich Place,Scandinavian Restaurant,Seafood Restaurant,Shabu-Shabu Restaurant,Shanghai Restaurant,Snack Place,Soba Restaurant,Soup Place,South American Restaurant,South Indian Restaurant,Southern / Soul Food Restaurant,Souvlaki Shop,Spanish Restaurant,Steakhouse,Sushi Restaurant,Swiss Restaurant,Szechuan Restaurant,Taco Place,Taiwanese Restaurant,Tapas Restaurant,Tex-Mex Restaurant,Thai Restaurant,Theme Restaurant,Turkish Restaurant,Udon Restaurant,Ukrainian Restaurant,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Vietnamese Restaurant,Wings Joint
0,Marble Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Marble Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Marble Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Marble Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Marble Hill,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


### New Dataframe size

In [70]:
manhattan_onehot.shape

(3274, 126)

#### Group rows by neighborhood, taking the mean of the frequency of occurrence of each restaurant category

In [71]:
manhattan_grouped = manhattan_onehot.groupby('Neighborhood').mean().reset_index()
manhattan_grouped

,Neighborhood,Afghan Restaurant,African Restaurant,American Restaurant,Arepa Restaurant,Argentinian Restaurant,Asian Restaurant,Australian Restaurant,Austrian Restaurant,BBQ Joint,Bagel Shop,Bakery,Belgian Restaurant,Bistro,Brazilian Restaurant,Breakfast Spot,Burger Joint,Burrito Place,Cafeteria,Café,Cajun / Creole Restaurant,Cambodian Restaurant,Cantonese Restaurant,Caribbean Restaurant,Caucasian Restaurant,Chinese Restaurant,Comfort Food Restaurant,Creperie,Cuban Restaurant,Czech Restaurant,Deli / Bodega,Dim Sum Restaurant,Diner,Donut Shop,Dosa Place,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Empanada Restaurant,English Restaurant,Ethiopian Restaurant,Falafel Restaurant,Fast Food Restaurant,Filipino Restaurant,Fondue Restaurant,Food,Food Court,Food Stand,Food Truck,French Restaurant,Fried Chicken Joint,Gastropub,German Restaurant,Gluten-free Restaurant,Greek Restaurant,Halal Restaurant,Hawaiian Restaurant,Himalayan Restaurant,Hot Dog Joint,Hotpot Restaurant,Indian Restaurant,Indonesian Restaurant,Irish Pub,Israeli Restaurant,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Jewish Restaurant,Kebab Restaurant,Korean Restaurant,Kosher Restaurant,Latin American Restaurant,Lebanese Restaurant,Mac & Cheese Joint,Malay Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Molecular Gastronomy Restaurant,Moroccan Restaurant,New American Restaurant,Noodle House,Paella Restaurant,Pakistani Restaurant,Peking Duck Restaurant,Persian Restaurant,Peruvian Restaurant,Pet Café,Pizza Place,Poke Place,Poutine Place,Ramen Restaurant,Restaurant,Russian Restaurant,Salad Place,Sandwich Place,Scandinavian Restaurant,Seafood Restaurant,Shabu-Shabu Restaurant,Shanghai Restaurant,Snack Place,Soba Restaurant,Soup Place,South American Restaurant,South Indian Restaurant,Southern / Soul Food Restaurant,Souvlaki Shop,Spanish Restaurant,Steakhouse,Sushi Restaurant,Swiss Restaurant,Szechuan Restaurant,Taco Place,Taiwanese Restaurant,Tapas Restaurant,Tex-Mex Restaurant,Thai Restaurant,Theme Restaurant,Turkish Restaurant,Udon Restaurant,Ukrainian Restaurant,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Vietnamese Restaurant,Wings Joint
0,Battery Park City,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.040816,0.0,0.040816,0.0,0.020408,0.0,0.0,0.040816,0.020408,0.0,0.040816,0.0,0.0,0.0,0.0,0.0,0.061224,0.0,0.0,0.0,0.0,0.040816,0.0,0.020408,0.061224,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.040816,0.0,0.040816,0.0,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.061224,0.0,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.020408,0.040816,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.122449,0.0,0.0,0.0,0.040816,0.0,0.020408,0.061224,0.0,0.040816,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.040816,0.020408,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Carnegie Hill,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.08,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.09,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.02,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.01,0.04,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.03,0.0,0.0,0.0,0.06,0.0,0.04,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.05,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.1,0.0,0.0,0.01,0.03,0.0,0.02,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.06,0.0,0.0,0.02,0.0,0.0,0.0,0.03,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.01
2,Central Harlem,0.0,0.06,0.04,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.02,0.0,0.0,0.0,0.02,0.02,0.0,0.02,0.02,0.0,0.0,0.0,0.06,0.0,0.06,0.0,0.0,0.0,0.0,0.12,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.04,0.1,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.06,0.0,0.0,0.0,0.02,0.0,0.0,0.04,0.0,0.06,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0
3,Chelsea,0.0,0.0,0.06,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.07,0.0,0.01,0.0,0.03,0.01,0.0,0.0,0.03,0.01,0.0,0.0,0.01,0.0,0.03,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.05,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.06,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.04,0.01,0.0,0.0,0.0,0.04,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.05,0.01,0.0,0.01,0.01,0.0,0.01,0.03,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.04,0.0,0.0,0.01,0.0,0.04,0.0,0.02,0.0,0.0,0.0,0.0,0.02,0.0,0.02,0.0
4,Chinatown,0.0,0.0,0.03,0.0,0.0,0.02,0.01,0.01,0.0,0.01,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.03,0.0,0.0,0.15,0.0,0.0,0.0,0.0,0.0,0.05,0.01,0.01,0.0,0.04,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.01,0.0,0.03,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.02,0.0,0.04,0.0,0.0,0.0,0.0,0.01,0.04,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.04,0.0,0.01,0.0,0.03,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.06,0.0
5,Civic Center,0.0,0.0,0.06,0.0,0.0,0.02,0.01,0.0,0.0,0.01,0.05,0.0,0.0,0.0,0.0,0.05,0.01,0.0,0.05,0.01,0.0,0.02,0.0,0.0,0.05,0.0,0.0,0.01,0.0,0.02,0.01,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.06,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.09,0.0,0.02,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.01,0.01,0.01,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.01,0.0,0.02,0.05,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.03,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.04,0.01
6,Clinton,0.0,0.0,0.07,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.01,0.03,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.01,0.0,0.0,0.02,0.0,0.01,0.02,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.13,0.0,0.02,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.02,0.05,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.05,0.01,0.0,0.01,0.03,0.0,0.01,0.05,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.02,0.0,0.0,0.02,0.0,0.0,0.0,0.06,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0
7,East Harlem,0.0,0.0,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.093333,0.0,0.0,0.0,0.0,0.053333,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.053333,0.0,0.0,0.013333,0.0,0.12,0.0,0.0,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.013333,0.013333,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.0,0.0,0.013333,0.0,0.0,0.0,0.0,0.0,0.013333,0.053333,0.0,0.0,0.0,0.0,0.106667,0.0,0.0,0.0,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.093333,0.0,0.0,0.0,0.053333,0.0,0.0,0.026667,0.0,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026667,0.026667,0.0,0.0,0.0,0.013333,0.0,0.0,0.0,0.053333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,East Village,0.0,0.0,0.03,0.01,0.01,0.01,0.0,0.0,0.01,0.03,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.03,0.0,0.0,0.0,0.01,0.0,0.05,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.04,0.01,0.04,0.0,0.0,0.04,0.0,0.0,0.0,0.01,0.0,0.0,0.05,0.01,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.07,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.01,0.03,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.01,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.01,0.01,0.05,0.0,0.06,0.01
9,Financial District,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.01,0.0,0.01,0.01,0.01,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.04,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.02,0.0,0.0,0.0,0.01,0.0,0.05,0.02,0.01,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.07,0.01,0.04,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.05,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.09,0.0,0.0,0.0,0.01,0.0,0.04,0.07,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0
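
Taking the mean of one-hot columns within a group is the same as computing each category's share of that neighborhood's venues. A plain-Python sketch of that equivalence, using only the ten Marble Hill categories displayed in the Data section (the full response contained more venues, so these shares are illustrative only):

```python
from collections import Counter

# categories of the ten Marble Hill venues displayed earlier
categories = ['Pizza Place', 'Diner', 'Pizza Place', 'Donut Shop',
              'Mexican Restaurant', 'Seafood Restaurant', 'Sandwich Place',
              'Spanish Restaurant', 'Pizza Place', 'Donut Shop']

# one-hot encode: one row per venue, one 0/1 column per distinct category
columns = sorted(set(categories))
onehot = [[1 if cat == col else 0 for col in columns] for cat in categories]

# column means of the one-hot rows
means = {col: sum(row[i] for row in onehot) / len(onehot)
         for i, col in enumerate(columns)}

# the same numbers obtained directly as count / total
shares = {cat: n / len(categories) for cat, n in Counter(categories).items()}

print(means)
```

Because each row contains exactly one 1, the values for a neighborhood always sum to 1, which puts neighborhoods with different venue counts on a comparable scale before clustering.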


#### Print each neighborhood along with the top 5 most common restaurants

In [72]:
num_top_venues = 5

for hood in manhattan_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = manhattan_grouped[manhattan_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Battery Park City----
                venue  freq
0         Pizza Place  0.12
1  Chinese Restaurant  0.06
2          Donut Shop  0.06
3  Italian Restaurant  0.06
4      Sandwich Place  0.06


----Carnegie Hill----
                venue  freq
0         Pizza Place  0.10
1                Café  0.09
2              Bakery  0.08
3    Sushi Restaurant  0.06
4  Italian Restaurant  0.06


----Central Harlem----
                             venue  freq
0                    Deli / Bodega  0.12
1              Fried Chicken Joint  0.10
2  Southern / Soul Food Restaurant  0.08
3                      Pizza Place  0.06
4               Chinese Restaurant  0.06


----Chelsea----
                 venue  freq
0               Bakery  0.07
1  American Restaurant  0.06
2   Italian Restaurant  0.06
3          Pizza Place  0.05
4    French Restaurant  0.05


----Chinatown----
                   venue  freq
0     Chinese Restaurant  0.15
1                 Bakery  0.10
2  Vietnamese Restaurant  0.06
3     D

#### Put that into a *pandas* dataframe

First, sort the venues in descending order.

In [73]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Create the new dataframe and display the top 10 venues for each neighborhood.

In [74]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions beyond the third take the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = manhattan_grouped['Neighborhood']

for ind in np.arange(manhattan_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(manhattan_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Battery Park City,Pizza Place,Donut Shop,Sandwich Place,Italian Restaurant,Chinese Restaurant,Food Truck,Steakhouse,Seafood Restaurant,Restaurant,Mexican Restaurant
1,Carnegie Hill,Pizza Place,Café,Bakery,Italian Restaurant,Sushi Restaurant,Mexican Restaurant,Japanese Restaurant,French Restaurant,Sandwich Place,Chinese Restaurant
2,Central Harlem,Deli / Bodega,Fried Chicken Joint,Southern / Soul Food Restaurant,African Restaurant,Seafood Restaurant,Caribbean Restaurant,Chinese Restaurant,Pizza Place,French Restaurant,Sandwich Place
3,Chelsea,Bakery,American Restaurant,Italian Restaurant,French Restaurant,Pizza Place,Sushi Restaurant,Seafood Restaurant,New American Restaurant,Tapas Restaurant,Mexican Restaurant
4,Chinatown,Chinese Restaurant,Bakery,Vietnamese Restaurant,Dim Sum Restaurant,Dumpling Restaurant,Hotpot Restaurant,Noodle House,Sandwich Place,Mexican Restaurant,American Restaurant
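
The `try`/`except` used to build the column names is correct for ranks 1 through 10, but it would produce names such as "21th" if `num_top_venues` were raised past 20. A small helper can handle the general case. The name `ordinal` is hypothetical and not part of the original notebook; this is only a sketch of one way to do it.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    # 11, 12 and 13 take 'th' even though they end in 1, 2 and 3
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

num_top_venues = 10
columns = ['Neighborhood'] + [
    '{} Most Common Venue'.format(ordinal(i)) for i in range(1, num_top_venues + 1)
]
print(columns[:4])
```

For `num_top_venues = 10` this yields the same column names as the loop in the notebook.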


Run *k*-means to cluster the neighborhoods into 10 clusters.

In [75]:
kclusters = 10

# drop the name column so only the venue-frequency features are clustered
manhattan_grouped_clustering = manhattan_grouped.drop(columns='Neighborhood')
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(manhattan_grouped_clustering)

kmeans.labels_[0:10] 

array([0, 1, 7, 4, 4, 4, 1, 2, 4, 0], dtype=int32)
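
The value `kclusters = 10` is set by hand. One common way to sanity-check such a choice is to compare silhouette scores across several values of *k*. The sketch below does this on synthetic data (three made-up, well-separated blobs), not on `manhattan_grouped_clustering`; applying it to the real feature matrix is an assumption left to the reader, and the range of *k* tried is arbitrary.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# synthetic stand-in for the feature matrix: three well-separated blobs
rng = np.random.RandomState(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + 0.5 * rng.randn(30, 2) for c in centers])

scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    # silhouette lies in [-1, 1]; higher means tighter, better-separated clusters
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(best_k)
```

On real venue-frequency data the scores are usually much closer together than on this toy example, so the result is better read as a hint than as a definitive answer.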

Create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [76]:

neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

# join the cluster labels and top venues onto the coordinates of each neighborhood
manhattan_merged = manhattan_data.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

manhattan_merged.head() 

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Manhattan,Marble Hill,40.876551,-73.91066,9,Sandwich Place,Pizza Place,Deli / Bodega,American Restaurant,Donut Shop,Latin American Restaurant,Café,Mexican Restaurant,Diner,Spanish Restaurant
1,Manhattan,Chinatown,40.715618,-73.994279,4,Chinese Restaurant,Bakery,Vietnamese Restaurant,Dim Sum Restaurant,Dumpling Restaurant,Hotpot Restaurant,Noodle House,Sandwich Place,Mexican Restaurant,American Restaurant
2,Manhattan,Washington Heights,40.851903,-73.9369,2,Bakery,Mexican Restaurant,Deli / Bodega,Chinese Restaurant,Latin American Restaurant,Tapas Restaurant,Pizza Place,Café,Donut Shop,Fast Food Restaurant
3,Manhattan,Inwood,40.867684,-73.92121,2,Spanish Restaurant,Mexican Restaurant,Pizza Place,Restaurant,Café,Bakery,Latin American Restaurant,Deli / Bodega,Seafood Restaurant,Thai Restaurant
4,Manhattan,Hamilton Heights,40.823604,-73.949688,7,Deli / Bodega,Pizza Place,Mexican Restaurant,Chinese Restaurant,Donut Shop,Sandwich Place,Café,Food Truck,American Restaurant,Fast Food Restaurant
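
In the output above every neighborhood received a cluster label. Because this is a left join, a neighborhood for which no venues were returned would instead get `NaN` in `Cluster Labels`, and the column would silently become floating point. The sketch below shows that case on two made-up frames (`all_neighborhoods` and `clustered` are hypothetical stand-ins for `manhattan_data` and `neighborhoods_venues_sorted`) and one possible way to handle it.

```python
import pandas as pd

# hypothetical stand-ins for manhattan_data and neighborhoods_venues_sorted
all_neighborhoods = pd.DataFrame({'Neighborhood': ['A', 'B', 'C']})
clustered = pd.DataFrame({
    'Neighborhood': ['A', 'C'],   # 'B' returned no venues
    'Cluster Labels': [0, 3],
})

merged = all_neighborhoods.join(clustered.set_index('Neighborhood'), on='Neighborhood')

# rows without a match get NaN, which turns the whole column into floats
missing = merged.loc[merged['Cluster Labels'].isna(), 'Neighborhood'].tolist()

# drop the unmatched rows and restore integer labels
merged = merged.dropna(subset=['Cluster Labels']).astype({'Cluster Labels': int})
print(missing, merged['Cluster Labels'].tolist())
```

Integer labels matter later on, because the map cell uses each label as a list index when picking a marker color.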


Visualize the resulting clusters

In [77]:

map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(manhattan_merged['Latitude'], manhattan_merged['Longitude'], manhattan_merged['Neighborhood'], manhattan_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],  # labels are 0-based, so index the palette directly
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
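
The map cell needs one distinct color per cluster label. As an alternative sketch that uses only the standard library, evenly spaced hues can be generated with `colorsys`. The function name `cluster_palette` and the saturation and brightness values are assumptions made for this example; the notebook itself uses matplotlib's `cm.rainbow`.

```python
import colorsys

def cluster_palette(n):
    """Return n evenly spaced hues as '#rrggbb' strings."""
    palette = []
    for i in range(n):
        r, g, b = colorsys.hsv_to_rgb(i / n, 0.8, 0.9)
        palette.append('#{:02x}{:02x}{:02x}'.format(int(r * 255), int(g * 255), int(b * 255)))
    return palette

kclusters = 10
palette = cluster_palette(kclusters)

# label 0 maps to palette[0], label 9 to palette[9]
color_of = {label: palette[label] for label in range(kclusters)}
print(color_of[0], color_of[9])
```

Keeping an explicit label-to-color mapping also makes it straightforward to build a legend for the map.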

## Examine Each Cluster
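
Each cell in this section repeats the same selection with a different label. A small helper can express that once. The name `show_cluster` and the made-up `toy` frame are hypothetical; the sketch assumes the same column layout as `manhattan_merged`, with `Neighborhood` at position 1 and the venue columns starting at position 5.

```python
import pandas as pd

def show_cluster(df, label, first_venue_col=5):
    """Rows of one cluster: the Neighborhood column plus the venue columns."""
    # column 1 is 'Neighborhood'; venue columns start at position 5 in manhattan_merged
    keep = [1] + list(range(first_venue_col, df.shape[1]))
    return df.loc[df['Cluster Labels'] == label, df.columns[keep]]

# made-up frame with the same column layout as manhattan_merged
toy = pd.DataFrame({
    'Borough': ['Manhattan'] * 3,
    'Neighborhood': ['A', 'B', 'C'],
    'Latitude': [40.70, 40.80, 40.90],
    'Longitude': [-74.00, -73.90, -73.95],
    'Cluster Labels': [0, 1, 0],
    '1st Most Common Venue': ['Pizza Place', 'Café', 'Bakery'],
})

cluster_0 = show_cluster(toy, 0)
print(cluster_0)
```

Note that the headings in this section are 1-based while the labels are 0-based, so "Cluster 1" corresponds to `label == 0`.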

### Cluster 1

In [78]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 0, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
28,Battery Park City,Pizza Place,Donut Shop,Sandwich Place,Italian Restaurant,Chinese Restaurant,Food Truck,Steakhouse,Seafood Restaurant,Restaurant,Mexican Restaurant
29,Financial District,Pizza Place,American Restaurant,Sandwich Place,Italian Restaurant,Mexican Restaurant,Food Truck,Café,Salad Place,Deli / Bodega,Steakhouse
37,Stuyvesant Town,Pizza Place,Deli / Bodega,Bakery,Diner,Sandwich Place,Ethiopian Restaurant,Mexican Restaurant,South Indian Restaurant,Bistro,Taco Place


### Cluster 2

In [79]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 1, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
9,Yorkville,Italian Restaurant,Pizza Place,Thai Restaurant,Deli / Bodega,Indian Restaurant,Japanese Restaurant,Mexican Restaurant,Diner,Sandwich Place,Sushi Restaurant
10,Lenox Hill,Italian Restaurant,Sushi Restaurant,Café,Burger Joint,Pizza Place,Deli / Bodega,Mexican Restaurant,Bakery,Restaurant,Thai Restaurant
12,Upper West Side,Italian Restaurant,Pizza Place,American Restaurant,Bakery,Indian Restaurant,Mediterranean Restaurant,Deli / Bodega,Bagel Shop,Café,Breakfast Spot
14,Clinton,Italian Restaurant,American Restaurant,Thai Restaurant,Pizza Place,Sandwich Place,Mexican Restaurant,Bakery,Restaurant,Chinese Restaurant,Steakhouse
27,Gramercy,Italian Restaurant,Deli / Bodega,Indian Restaurant,Thai Restaurant,Bagel Shop,Mexican Restaurant,Pizza Place,Sandwich Place,Restaurant,Diner
30,Carnegie Hill,Pizza Place,Café,Bakery,Italian Restaurant,Sushi Restaurant,Mexican Restaurant,Japanese Restaurant,French Restaurant,Sandwich Place,Chinese Restaurant
31,Noho,Italian Restaurant,Pizza Place,Japanese Restaurant,Sushi Restaurant,Mexican Restaurant,Bakery,Café,Sandwich Place,French Restaurant,Thai Restaurant
34,Sutton Place,Italian Restaurant,American Restaurant,Indian Restaurant,Pizza Place,Chinese Restaurant,Sushi Restaurant,Japanese Restaurant,Salad Place,French Restaurant,Thai Restaurant
35,Turtle Bay,Italian Restaurant,Japanese Restaurant,Indian Restaurant,Steakhouse,Deli / Bodega,Sushi Restaurant,Greek Restaurant,French Restaurant,Ramen Restaurant,Seafood Restaurant


### Cluster 3

In [80]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 2, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Washington Heights,Bakery,Mexican Restaurant,Deli / Bodega,Chinese Restaurant,Latin American Restaurant,Tapas Restaurant,Pizza Place,Café,Donut Shop,Fast Food Restaurant
3,Inwood,Spanish Restaurant,Mexican Restaurant,Pizza Place,Restaurant,Café,Bakery,Latin American Restaurant,Deli / Bodega,Seafood Restaurant,Thai Restaurant
5,Manhattanville,Chinese Restaurant,Mexican Restaurant,Deli / Bodega,Sandwich Place,Seafood Restaurant,Fried Chicken Joint,Bakery,Italian Restaurant,Sushi Restaurant,Donut Shop
7,East Harlem,Deli / Bodega,Mexican Restaurant,Pizza Place,Bakery,Latin American Restaurant,Restaurant,Chinese Restaurant,Thai Restaurant,Burger Joint,Fast Food Restaurant
20,Lower East Side,Deli / Bodega,Chinese Restaurant,Pizza Place,Mexican Restaurant,Café,Japanese Restaurant,Bakery,Sandwich Place,Italian Restaurant,Diner


### Cluster 4

In [81]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 3, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
33,Midtown South,Korean Restaurant,Japanese Restaurant,Bakery,American Restaurant,Salad Place,Burger Joint,Italian Restaurant,Sandwich Place,New American Restaurant,Restaurant


### Cluster 5

In [82]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 4, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Chinatown,Chinese Restaurant,Bakery,Vietnamese Restaurant,Dim Sum Restaurant,Dumpling Restaurant,Hotpot Restaurant,Noodle House,Sandwich Place,Mexican Restaurant,American Restaurant
15,Midtown,American Restaurant,Sandwich Place,Bakery,Steakhouse,Japanese Restaurant,French Restaurant,Burger Joint,Café,Italian Restaurant,Pizza Place
16,Murray Hill,Sandwich Place,Japanese Restaurant,Sushi Restaurant,American Restaurant,Burger Joint,Pizza Place,Italian Restaurant,Bakery,Restaurant,Seafood Restaurant
17,Chelsea,Bakery,American Restaurant,Italian Restaurant,French Restaurant,Pizza Place,Sushi Restaurant,Seafood Restaurant,New American Restaurant,Tapas Restaurant,Mexican Restaurant
19,East Village,Pizza Place,Vietnamese Restaurant,Chinese Restaurant,Mexican Restaurant,Vegetarian / Vegan Restaurant,French Restaurant,Italian Restaurant,Korean Restaurant,Japanese Restaurant,Ramen Restaurant
22,Little Italy,Chinese Restaurant,Italian Restaurant,Café,Bakery,Pizza Place,Mediterranean Restaurant,Thai Restaurant,Seafood Restaurant,Vegetarian / Vegan Restaurant,Sandwich Place
32,Civic Center,Italian Restaurant,French Restaurant,American Restaurant,Chinese Restaurant,Sandwich Place,Café,Bakery,Burger Joint,Vietnamese Restaurant,Pizza Place


### Cluster 6

In [83]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 5, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
26,Morningside Heights,Deli / Bodega,Chinese Restaurant,Italian Restaurant,Pizza Place,Café,Food Truck,Burger Joint,American Restaurant,Mexican Restaurant,Sandwich Place
36,Tudor City,Café,Deli / Bodega,Pizza Place,Food Truck,Mexican Restaurant,Sushi Restaurant,Burger Joint,Japanese Restaurant,Bagel Shop,Sandwich Place
39,Hudson Yards,Italian Restaurant,American Restaurant,Café,Deli / Bodega,Restaurant,Pizza Place,Salad Place,Sandwich Place,Burger Joint,Spanish Restaurant


### Cluster 7

In [84]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 6, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,Upper East Side,Italian Restaurant,American Restaurant,Bakery,Pizza Place,Diner,Café,Sushi Restaurant,Salad Place,Burger Joint,Mexican Restaurant
13,Lincoln Square,Italian Restaurant,Food Truck,Café,French Restaurant,American Restaurant,Pizza Place,Chinese Restaurant,Bakery,Mediterranean Restaurant,Mexican Restaurant
18,Greenwich Village,Italian Restaurant,Café,Sushi Restaurant,American Restaurant,Indian Restaurant,Pizza Place,Chinese Restaurant,French Restaurant,Sandwich Place,Seafood Restaurant
21,Tribeca,Italian Restaurant,American Restaurant,Deli / Bodega,French Restaurant,Café,Pizza Place,Mexican Restaurant,Asian Restaurant,Sushi Restaurant,Steakhouse
23,Soho,Italian Restaurant,Café,French Restaurant,Mediterranean Restaurant,American Restaurant,Sandwich Place,Bakery,Vegetarian / Vegan Restaurant,Pizza Place,Asian Restaurant
24,West Village,Italian Restaurant,American Restaurant,New American Restaurant,Japanese Restaurant,Pizza Place,French Restaurant,Seafood Restaurant,Gastropub,Steakhouse,Indian Restaurant
38,Flatiron,Italian Restaurant,American Restaurant,New American Restaurant,Sandwich Place,Mediterranean Restaurant,Japanese Restaurant,Café,Mexican Restaurant,Vegetarian / Vegan Restaurant,Sushi Restaurant


### Cluster 8

In [85]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 7, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Hamilton Heights,Deli / Bodega,Pizza Place,Mexican Restaurant,Chinese Restaurant,Donut Shop,Sandwich Place,Café,Food Truck,American Restaurant,Fast Food Restaurant
6,Central Harlem,Deli / Bodega,Fried Chicken Joint,Southern / Soul Food Restaurant,African Restaurant,Seafood Restaurant,Caribbean Restaurant,Chinese Restaurant,Pizza Place,French Restaurant,Sandwich Place
25,Manhattan Valley,Deli / Bodega,Indian Restaurant,Pizza Place,Mexican Restaurant,Chinese Restaurant,Latin American Restaurant,Thai Restaurant,Gastropub,Café,Bagel Shop


### Cluster 9

In [86]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 8, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,Roosevelt Island,Café,Greek Restaurant,Sandwich Place,Japanese Restaurant,Caribbean Restaurant,Food,Deli / Bodega,Pizza Place,Dumpling Restaurant,Eastern European Restaurant


### Cluster 10

In [87]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 9, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Marble Hill,Sandwich Place,Pizza Place,Deli / Bodega,American Restaurant,Donut Shop,Latin American Restaurant,Café,Mexican Restaurant,Diner,Spanish Restaurant
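
A quick way to summarize the ten tables above is to count how many neighborhoods fall into each cluster. The label list below is transcribed by hand from those tables (row index 0 to 39), so the snippet runs without the notebook state; in the live notebook the same counts could be read from `manhattan_merged['Cluster Labels'].value_counts()`.

```python
from collections import Counter

# cluster label of each neighborhood, transcribed from the tables above
# (position = row index 0..39 in manhattan_merged)
labels = [9, 4, 2, 2, 7, 2, 7, 2, 6, 1,
          1, 8, 1, 6, 1, 4, 4, 4, 6, 4,
          2, 6, 4, 6, 6, 7, 5, 1, 0, 0,
          1, 1, 4, 3, 1, 1, 5, 0, 6, 5]

sizes = Counter(labels)

# headings are 1-based ("Cluster 1" is label 0), so add 1 when reporting
singletons = sorted(label + 1 for label, n in sizes.items() if n == 1)
print(singletons)
```

Clusters 4, 9, and 10 each contain a single neighborhood, while Cluster 2 is the largest with nine.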


## Results and Discussion <a name="results"></a>

Our analysis shows that although Manhattan has a large number of restaurants, there are pockets of low restaurant diversity, represented by the outlier clusters that each contain a single neighborhood: Cluster 4 (Midtown South), Cluster 9 (Roosevelt Island), and Cluster 10 (Marble Hill). The highest diversity of restaurant categories was detected in Clusters 2 and 7, which cover areas that combine popularity among tourists, proximity to the city center, and strong socio-economic dynamics.
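
The notebook does not define "diversity" numerically. One possible measure, offered here only as a sketch, is the Shannon entropy of the venue-category counts in a neighborhood: it is highest when categories are evenly represented and zero when a single category accounts for everything. The counts below are made up for illustration.

```python
import math

def shannon_entropy(counts):
    """Shannon entropy (in bits) of a list of category counts."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log2(p) for p in probs)

# made-up venue counts per category for two hypothetical neighborhoods
varied = [5, 5, 5, 5]          # four categories, evenly represented
concentrated = [17, 1, 1, 1]   # one category dominates

print(shannon_entropy(varied), shannon_entropy(concentrated))
```

Applied per neighborhood, a score like this would allow the diversity comparison between clusters to be checked against the data rather than read off the top-10 lists.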

The location candidates were then clustered to create zones of interest containing the greatest number of candidates. This does not imply that these zones are actually optimal locations for a new restaurant. The purpose of this analysis was only to provide information on areas that currently have a strong restaurant presence. The recommended zones should therefore be considered only as a starting point for a more detailed analysis, which could eventually identify a location that not only has no nearby competition but also meets all other relevant conditions.

## Conclusion <a name="conclusion"></a>

The purpose of this project was to identify areas in Manhattan and segment the restaurants present there in order to help a restaurateur narrow down the search for an optimal restaurant location. Using Foursquare data to identify the restaurant distribution, we first identified general areas that justify further analysis and then generated an extensive collection of locations that satisfy some basic requirements regarding existing nearby restaurants. Those locations were then clustered to create major zones of interest (containing the greatest number of potential locations), and addresses of the zone centers were created to be used as starting points for final exploration.

The final decision on the optimal restaurant location will be made by stakeholders based on the specific characteristics of the neighborhoods and locations in every recommended zone, taking into consideration additional factors such as competition quality, real estate availability and pricing, and the social and economic dynamics of each neighborhood.