# Capstone Project - The Battle of the Neighborhoods

## Applied Data Science Capstone by IBM | Coursera
---
### TABLE OF CONTENTS
1. INTRODUCTION: BUSINESS PROBLEM
2. DATA
3. ANALYSIS
4. RESULTS & DISCUSSION
5. CONCLUSION
---

### 1. INTRODUCTION: BUSINESS PROBLEM

A pair of restaurant owners want to expand their operations into high-density locations that are well suited to opening one or more profitable restaurants. Their growth strategy is to open a chain of sustainable franchises in the northeastern United States first, and later in other major cities across the country. The owners want to target Manhattan in New York City first.


Over the past couple of decades, the food services industry has evolved significantly to accommodate new genres and eating habits. Boutique restaurants, fast-food franchises, and even grocery stores have gradually broadened their appeal to suit ever-changing modern diets. The push for healthier food and for proof of provenance from farm to consumer has become increasingly important, creating more market space for niche yet popular cuisines. Manhattan is one of the densest areas in the United States and is known for its fast pace, diverse culture, and variety of food options, which the owners think make it a great starting point for the initiative. By exploring each neighborhood and its venues, and by analyzing trends in Foursquare data, the owners will be able to gauge which locations offer the best prospects for success, sustainability, and growth for their restaurant expansion.

---

### 2. DATA

#### Dataset 1 - New York City borough and neighborhood data

First, I will use data from a webpage that lists the boroughs of New York City and their neighborhoods. I will download it locally as a JSON file, load it into the notebook, and then extract the data from the JSON format into a table. This table contains four columns: Borough, Neighborhood, Latitude, and Longitude. The link to the page with the data is: (https://cocl.us/new_york_dataset).
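The JSON file is a GeoJSON `FeatureCollection`. GeoJSON stores a point's coordinates as `[longitude, latitude]`, so the two values have to be swapped when filling the Latitude and Longitude columns. A minimal sketch of flattening one feature into a table row, using the Wakefield record from the downloaded file; the helper name `feature_to_row` is hypothetical:

```python
import json

# One feature in the shape of the New York City dataset (values from the Wakefield record)
SAMPLE = json.loads("""
{"type": "FeatureCollection",
 "features": [{"type": "Feature",
               "geometry": {"type": "Point",
                            "coordinates": [-73.84720052054902, 40.89470517661]},
               "properties": {"name": "Wakefield", "borough": "Bronx"}}]}
""")


def feature_to_row(feature):
    """Return (borough, neighborhood, latitude, longitude) for one GeoJSON feature."""
    props = feature["properties"]
    # GeoJSON point coordinates are ordered [longitude, latitude]
    lon, lat = feature["geometry"]["coordinates"]
    return props["borough"], props["name"], lat, lon


rows = [feature_to_row(f) for f in SAMPLE["features"]]
print(rows)
```

Unpacking the coordinate pair by name makes the order explicit instead of relying on indices `[1]` and `[0]`.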

#### Here is a screenshot of a sample from the JSON dataset:

#### Dataset 2 - Different venues in Manhattan, New York City

This dataset will be built with the Foursquare API. I will use Foursquare location data to explore the venues in each Manhattan neighborhood; venue types range from parks and coffee shops to hotels and gyms. With this data I can quickly gather information about these venues and analyze the neighborhoods they are in.

We will use the geographical coordinates from the above dataset to generate the location dataset. 

#### Here is a screenshot of a sample from the Foursquare dataset:

I will cross-reference the two datasets and aggregate the preprocessed data from each to analyze and find the best place to open one or more restaurants in Manhattan.
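Cross-referencing the two datasets amounts to joining the venue table to the neighborhood table on their shared `Neighborhood` column. A minimal pandas sketch using a few rows that appear later in this notebook; the variable names are illustrative:

```python
import pandas as pd

# Dataset 1: one row per neighborhood (coordinates from the Marble Hill record)
neighborhoods = pd.DataFrame({
    "Borough": ["Manhattan"],
    "Neighborhood": ["Marble Hill"],
    "Latitude": [40.876551],
    "Longitude": [-73.91066],
})

# Dataset 2: one row per venue found near a neighborhood (two Marble Hill venues)
venues = pd.DataFrame({
    "Neighborhood": ["Marble Hill", "Marble Hill"],
    "Venue": ["Arturo's", "Tibbett Diner"],
    "Venue Category": ["Pizza Place", "Diner"],
})

# Join on the shared key; validate= raises if a neighborhood name repeats in dataset 1
combined = venues.merge(neighborhoods, on="Neighborhood", how="left",
                        validate="many_to_one")
print(combined[["Borough", "Neighborhood", "Venue", "Venue Category"]])
```

The `validate` check matters because neighborhood names can repeat across boroughs (Chelsea, for instance, is also a Staten Island neighborhood), so restricting the data to Manhattan first, or joining on the borough as well, keeps the join one-to-one.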

---

#### Importing libraries used to explore New York City and Foursquare data.

In [1]:
# Data gathering, analysis, and manipulation

# dataframe and vector processing
import pandas as pd
import numpy as np

# JSON parsing and HTTP requests
import json
import requests

# data clustering
from sklearn.cluster import KMeans
from sklearn.cluster import DBSCAN

#!conda install -c conda-forge geopy --yes
# convert an address into latitude and longitude values
from geopy.geocoders import Nominatim


# Data visualization

import matplotlib.cm as cm
import matplotlib.colors as colors

# map rendering library
!conda install -c conda-forge folium=0.5.0 --yes
import folium 

print('Libraries imported.')


#### Downloading the New York City dataset as a local copy and loading it into the environment.

In [2]:
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
print('Data downloaded!')

with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)
    
newyork_data

Data downloaded!


{'type': 'FeatureCollection',
 'totalFeatures': 306,
 'features': [{'type': 'Feature',
   'id': 'nyu_2451_34572.1',
   'geometry': {'type': 'Point',
    'coordinates': [-73.84720052054902, 40.89470517661]},
   'geometry_name': 'geom',
   'properties': {'name': 'Wakefield',
    'stacked': 1,
    'annoline1': 'Wakefield',
    'annoline2': None,
    'annoline3': None,
    'annoangle': 0.0,
    'borough': 'Bronx',
    'bbox': [-73.84720052054902,
     40.89470517661,
     -73.84720052054902,
     40.89470517661]}},
  {'type': 'Feature',
   'id': 'nyu_2451_34572.2',
   'geometry': {'type': 'Point',
    'coordinates': [-73.82993910812398, 40.87429419303012]},
   'geometry_name': 'geom',
   'properties': {'name': 'Co-op City',
    'stacked': 2,
    'annoline1': 'Co-op',
    'annoline2': 'City',
    'annoline3': None,
    'annoangle': 0.0,
    'borough': 'Bronx',
    'bbox': [-73.82993910812398,
     40.87429419303012,
     -73.82993910812398,
     40.87429419303012]}},
  ...

In [3]:
neighborhoods_data = newyork_data['features']
neighborhoods_data[0]

{'type': 'Feature',
 'id': 'nyu_2451_34572.1',
 'geometry': {'type': 'Point',
  'coordinates': [-73.84720052054902, 40.89470517661]},
 'geometry_name': 'geom',
 'properties': {'name': 'Wakefield',
  'stacked': 1,
  'annoline1': 'Wakefield',
  'annoline2': None,
  'annoline3': None,
  'annoangle': 0.0,
  'borough': 'Bronx',
  'bbox': [-73.84720052054902,
   40.89470517661,
   -73.84720052054902,
   40.89470517661]}}

#### Parsing the Borough, Neighborhood, Latitude, and Longitude fields out of the dataset and loading them into a DataFrame.

In [4]:
# Define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude']

# Collect one record per neighborhood, then build the dataframe in a single step
# (DataFrame.append was removed in pandas 2.0)
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores point coordinates as [longitude, latitude]
    neighborhood_lon, neighborhood_lat = data['geometry']['coordinates']

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

neighborhoods = pd.DataFrame(rows, columns=column_names)

neighborhoods.head(10)

|   | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Bronx | Wakefield | 40.894705 | -73.847201 |
| 1 | Bronx | Co-op City | 40.874294 | -73.829939 |
| 2 | Bronx | Eastchester | 40.887556 | -73.827806 |
| 3 | Bronx | Fieldston | 40.895437 | -73.905643 |
| 4 | Bronx | Riverdale | 40.890834 | -73.912585 |
| 5 | Bronx | Kingsbridge | 40.881687 | -73.902818 |
| 6 | Manhattan | Marble Hill | 40.876551 | -73.91066 |
| 7 | Bronx | Woodlawn | 40.898273 | -73.867315 |
| 8 | Bronx | Norwood | 40.877224 | -73.879391 |
| 9 | Bronx | Williamsbridge | 40.881039 | -73.857446 |


In [5]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(neighborhoods['Borough'].unique()),
        neighborhoods.shape[0]
    )
)

The dataframe has 5 boroughs and 306 neighborhoods.


#### Geographical coordinates of New York City.

In [6]:
address = 'New York City, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of New York City are 40.7127281, -74.0060152.


### Visualization
We visualize the data at several stages of this analysis. First, we map each borough and its neighborhoods to validate their coordinates.

#### Creating a map of New York City's 5 boroughs and 306 neighborhoods.

In [7]:
# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork

Feel free to zoom in on the map above and click each circle marker to reveal the name of the neighborhood and its borough.

#### Segmenting the neighborhoods by their respective borough.

In [8]:
neighborhoods['Borough'].unique()

manhattan_data = neighborhoods[neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)
# manhattan_data.head()

bronx_data = neighborhoods[neighborhoods['Borough'] == 'Bronx'].reset_index(drop=True)
# bronx_data.head()

brooklyn_data = neighborhoods[neighborhoods['Borough'] == 'Brooklyn'].reset_index(drop=True)
# brooklyn_data.head()

queens_data = neighborhoods[neighborhoods['Borough'] == 'Queens'].reset_index(drop=True)
# queens_data.head()

staten_island_data = neighborhoods[neighborhoods['Borough'] == 'Staten Island'].reset_index(drop=True)
#staten_island_data.head()
manhattan_data.head()

|   | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Manhattan | Marble Hill | 40.876551 | -73.91066 |
| 1 | Manhattan | Chinatown | 40.715618 | -73.994279 |
| 2 | Manhattan | Washington Heights | 40.851903 | -73.9369 |
| 3 | Manhattan | Inwood | 40.867684 | -73.92121 |
| 4 | Manhattan | Hamilton Heights | 40.823604 | -73.949688 |


#### Manhattan Neighborhoods

In [9]:
address = 'Manhattan, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location_manhattan = geolocator.geocode(address)
latitude_manhattan = location_manhattan.latitude
longitude_manhattan = location_manhattan.longitude
print('The geographical coordinates of ' + address + ' are {}, {}.'.format(latitude_manhattan, longitude_manhattan))


In [10]:
# create map of Manhattan using latitude and longitude values
map_manhattan = folium.Map(location=[latitude_manhattan, longitude_manhattan], zoom_start=11)

# add markers to map
for lat, lng, label in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_manhattan)  
    
map_manhattan

#### Bronx Neighborhoods

In [11]:
address = 'Bronx, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location_bronx = geolocator.geocode(address)
latitude_bronx = location_bronx.latitude
longitude_bronx = location_bronx.longitude
print('The geographical coordinates of ' + address + ' are {}, {}.'.format(latitude_bronx, longitude_bronx))

The geographical coordinates of Bronx, NY are 40.85048545, -73.8404035580209.


In [12]:
map_bronx = folium.Map(location=[latitude_bronx, longitude_bronx], zoom_start=11)

# add markers to map
for lat, lng, label in zip(bronx_data['Latitude'], bronx_data['Longitude'], bronx_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_bronx)  
    
map_bronx

#### Brooklyn Neighborhoods

In [13]:
address = 'Brooklyn, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location_brooklyn = geolocator.geocode(address)
latitude_brooklyn = location_brooklyn.latitude
longitude_brooklyn = location_brooklyn.longitude
print('The geographical coordinates of ' + address + ' are {}, {}.'.format(latitude_brooklyn, longitude_brooklyn))

The geographical coordinates of Brooklyn, NY are 40.6501038, -73.9495823.


In [14]:
map_brooklyn = folium.Map(location=[latitude_brooklyn, longitude_brooklyn], zoom_start=11)

# add markers to map
for lat, lng, label in zip(brooklyn_data['Latitude'], brooklyn_data['Longitude'], brooklyn_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_brooklyn)  
    
map_brooklyn

#### Queens Neighborhoods

In [15]:
address = 'Queens, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location_queens = geolocator.geocode(address)
latitude_queens = location_queens.latitude
longitude_queens = location_queens.longitude
print('The geographical coordinates of ' + address + ' are {}, {}.'.format(latitude_queens, longitude_queens))

The geographical coordinates of Queens, NY are 40.6524927, -73.7914214158161.


In [16]:
map_queens = folium.Map(location=[latitude_queens, longitude_queens], zoom_start=11)

# add markers to map
for lat, lng, label in zip(queens_data['Latitude'], queens_data['Longitude'], queens_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_queens)  
    
map_queens

#### Staten Island Neighborhoods

In [17]:
address = 'Staten Island, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location_staten_island = geolocator.geocode(address)
latitude_staten_island = location_staten_island.latitude
longitude_staten_island = location_staten_island.longitude
print('The geographical coordinates of ' + address + ' are {}, {}.'.format(latitude_staten_island, longitude_staten_island))

The geographical coordinates of Staten Island, NY are 40.5834557, -74.1496048.


In [18]:
map_staten_island = folium.Map(location=[latitude_staten_island, longitude_staten_island], zoom_start=11)

# add markers to map
for lat, lng, label in zip(staten_island_data['Latitude'], staten_island_data['Longitude'], staten_island_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_staten_island)  
    
map_staten_island

---

### Foursquare API

Next, we use the Foursquare API to explore the venues in each Manhattan neighborhood and collect them into a second dataset.

#### Foursquare Credentials and Version

In [19]:
# The code was removed by Watson Studio for sharing.
# getNearbyVenues() below expects CLIENT_ID, CLIENT_SECRET, VERSION, and LIMIT to be defined here.
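The removed cell has to define `CLIENT_ID`, `CLIENT_SECRET`, `VERSION`, and `LIMIT`, because `getNearbyVenues` below reads all four. A sketch of such a cell with placeholder values, under stated assumptions: the environment variable names are hypothetical, the `VERSION` date is an assumption, and `LIMIT = 100` is inferred only from the fact that no neighborhood returns more than 100 venues. The hypothetical `explore_url` helper builds the same request as the function below, but with `urllib.parse.urlencode`, which percent-escapes each parameter instead of pasting raw values into the string:

```python
import os
from urllib.parse import urlencode

# Read credentials from environment variables (hypothetical names) so they
# never appear in a shared notebook; fall back to obvious placeholders
CLIENT_ID = os.environ.get("FOURSQUARE_CLIENT_ID", "YOUR_CLIENT_ID")
CLIENT_SECRET = os.environ.get("FOURSQUARE_CLIENT_SECRET", "YOUR_CLIENT_SECRET")
VERSION = "20180605"  # assumption: an API version date in YYYYMMDD form
LIMIT = 100           # assumption: consistent with the 100-venue cap in the counts below


def explore_url(lat, lng, radius=500, limit=LIMIT):
    """Build a venues/explore request URL for one coordinate pair (hypothetical helper)."""
    params = {
        "client_id": CLIENT_ID,
        "client_secret": CLIENT_SECRET,
        "v": VERSION,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    # urlencode percent-escapes each value, e.g. the comma inside "ll"
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)


url = explore_url(40.876551, -73.91066)
print(url)
```

Passing the same dictionary as `requests.get(base_url, params=params)` would let `requests` do the encoding instead.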


---

### Explore different venues in different Neighborhoods of Manhattan

In [20]:
# function to get all neighborhood venues in a borough

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
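Because every call to `getNearbyVenues` hits the live API, it can help to check the parsing logic offline first. A sketch that follows the same key path (`response` → `groups[0]` → `items` → `venue`) on a hand-built response. The first venue comes from the Marble Hill output below; the second is hypothetical and has no category, a case where `v['venue']['categories'][0]` would raise `IndexError`. Whether the API actually returns such venues is an assumption, so the sketch simply guards against it:

```python
# A trimmed response in the shape getNearbyVenues() expects
SAMPLE_RESPONSE = {
    "response": {
        "groups": [{
            "items": [
                {"venue": {"name": "Arturo's",
                           "location": {"lat": 40.874412, "lng": -73.910271},
                           "categories": [{"name": "Pizza Place"}]}},
                # hypothetical venue with no category attached
                {"venue": {"name": "Unlabeled venue",
                           "location": {"lat": 40.877, "lng": -73.906},
                           "categories": []}},
            ]
        }]
    }
}


def parse_explore_items(payload, name, lat, lng):
    """Flatten one explore response into rows matching the manhattan_venues columns."""
    items = payload["response"]["groups"][0]["items"]
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Use None for a missing category instead of raising IndexError
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows


rows = parse_explore_items(SAMPLE_RESPONSE, "Marble Hill", 40.876551, -73.91066)
for row in rows:
    print(row)
```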

In [21]:
# run the above function on each Manhattan neighborhood and create a new dataframe
manhattan_venues = getNearbyVenues(names=manhattan_data['Neighborhood'],
                                   latitudes=manhattan_data['Latitude'],
                                   longitudes=manhattan_data['Longitude']
                                  )

In [22]:
print(manhattan_venues.shape)

(3321, 7)


#### Manhattan Venues dataframe

In [23]:
# showing first 15 rows of Manhattan venues dataframe
manhattan_venues.head(15)

|   | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Marble Hill | 40.876551 | -73.91066 | Arturo's | 40.874412 | -73.910271 | Pizza Place |
| 1 | Marble Hill | 40.876551 | -73.91066 | Bikram Yoga | 40.876844 | -73.906204 | Yoga Studio |
| 2 | Marble Hill | 40.876551 | -73.91066 | Tibbett Diner | 40.880404 | -73.908937 | Diner |
| 3 | Marble Hill | 40.876551 | -73.91066 | Starbucks | 40.877531 | -73.905582 | Coffee Shop |
| 4 | Marble Hill | 40.876551 | -73.91066 | Dunkin' | 40.877136 | -73.906666 | Donut Shop |
| 5 | Marble Hill | 40.876551 | -73.91066 | Blink Fitness Riverdale | 40.877147 | -73.905837 | Gym |
| 6 | Marble Hill | 40.876551 | -73.91066 | TCR The Club of Riverdale | 40.878628 | -73.914568 | Tennis Stadium |
| 7 | Marble Hill | 40.876551 | -73.91066 | Land & Sea Restaurant | 40.877885 | -73.905873 | Seafood Restaurant |
| 8 | Marble Hill | 40.876551 | -73.91066 | T.J. Maxx | 40.877232 | -73.905042 | Department Store |
| 9 | Marble Hill | 40.876551 | -73.91066 | Starbucks | 40.873755 | -73.908613 | Coffee Shop |


In [24]:
manhattan_venues.groupby('Neighborhood').count()

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Battery Park City | 100 | 100 | 100 | 100 | 100 | 100 |
| Carnegie Hill | 100 | 100 | 100 | 100 | 100 | 100 |
| Central Harlem | 42 | 42 | 42 | 42 | 42 | 42 |
| Chelsea | 100 | 100 | 100 | 100 | 100 | 100 |
| Chinatown | 100 | 100 | 100 | 100 | 100 | 100 |
| Civic Center | 100 | 100 | 100 | 100 | 100 | 100 |
| Clinton | 100 | 100 | 100 | 100 | 100 | 100 |
| East Harlem | 44 | 44 | 44 | 44 | 44 | 44 |
| East Village | 100 | 100 | 100 | 100 | 100 | 100 |
| Financial District | 100 | 100 | 100 | 100 | 100 | 100 |


#### Checking if Foursquare has missing data for any of the neighborhoods in the original dataset.

In [25]:
print('There are {} Manhattan neighborhoods for which Foursquare provides no venue data.'.format(
        len(manhattan_data['Neighborhood'].unique()) - len(manhattan_venues['Neighborhood'].unique())
    )
)

There are 0 Manhattan neighborhoods for which Foursquare provides no venue data.
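A difference of counts reports how many neighborhoods are missing but not which ones, and a renamed neighborhood would offset a missing one. Comparing the two sets of names directly answers both questions. A sketch with a hypothetical subset of names; in the notebook the sets would be `set(manhattan_data['Neighborhood'])` and `set(manhattan_venues['Neighborhood'])`:

```python
# Neighborhood names from each dataset (a hypothetical subset for illustration)
expected = {"Marble Hill", "Chinatown", "Inwood"}
returned = {"Marble Hill", "Chinatown"}

missing = sorted(expected - returned)     # neighborhoods with no venues returned
unexpected = sorted(returned - expected)  # names that do not match the source list
print(f"{len(missing)} missing: {missing}")
print(f"{len(unexpected)} unexpected: {unexpected}")
```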


---
### 3. ANALYSIS

We analyze the venues in each Manhattan neighborhood using one-hot encoding (assigning 1 if a venue belongs to a given category and 0 if it does not). From the encoded table we count how often each venue category occurs in each neighborhood and pick the five most common categories per neighborhood. These top categories show which kinds of venues are most concentrated in a neighborhood; venue counts do not measure foot traffic directly, so they serve at best as a proxy for where people go. We then cross-reference and visualize the results.
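A small sketch of the encoding and ranking steps on venue categories taken from the Marble Hill rows above. It uses the mean, which turns counts into shares of a neighborhood's venues, whereas the notebook's cell sums counts; within one neighborhood both give the same ranking, because dividing by the venue count does not change the order:

```python
import pandas as pd

# A few venue rows in the shape of manhattan_venues (from the Marble Hill output)
venues = pd.DataFrame({
    "Neighborhood": ["Marble Hill"] * 4,
    "Venue Category": ["Coffee Shop", "Coffee Shop", "Diner", "Gym"],
})

# One column per category: 1 where the row's venue has that category, else 0
onehot = pd.get_dummies(venues["Venue Category"], dtype=int)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# Mean per neighborhood = share of that neighborhood's venues in each category
freq = onehot.groupby("Neighborhood").mean()

# Highest-frequency categories for one neighborhood
top = freq.loc["Marble Hill"].sort_values(ascending=False).head(2)
print(top)
```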


#### Preprocessing the second dataset, manhattan_venues dataframe, with one hot encoding so that we can then easily cluster the dataset.

In [26]:
# one hot encoding
manhattan_onehot = pd.get_dummies(manhattan_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
manhattan_onehot['Neighborhood'] = manhattan_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [manhattan_onehot.columns[-1]] + list(manhattan_onehot.columns[:-1])
manhattan_onehot = manhattan_onehot[fixed_columns]

manhattan_onehot.head()

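|   | Neighborhood | Accessories Store | Adult Boutique | Afghan Restaurant | African Restaurant | American Restaurant | Antique Shop | Arcade | Arepa Restaurant | Argentinian Restaurant | ... | Volleyball Court | Watch Shop | Waterfront | Weight Loss Center | Whisky Bar | Wine Bar | Wine Shop | Wings Joint | Women's Store | Yoga Studio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Marble Hill | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | Marble Hill | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 2 | Marble Hill | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | Marble Hill | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | Marble Hill | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |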


In [27]:
# group rows by neighborhood, sum the count of each venue category, and save the result to a new dataframe
manhattan_grouped = manhattan_onehot.groupby('Neighborhood').sum().reset_index()
manhattan_grouped.head()

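|   | Neighborhood | Accessories Store | Adult Boutique | Afghan Restaurant | African Restaurant | American Restaurant | Antique Shop | Arcade | Arepa Restaurant | Argentinian Restaurant | ... | Volleyball Court | Watch Shop | Waterfront | Weight Loss Center | Whisky Bar | Wine Bar | Wine Shop | Wings Joint | Women's Store | Yoga Studio |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Battery Park City | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 1 | 3 | 0 | 2 | 0 |
| 1 | Carnegie Hill | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | ... | 0 | 0 | 0 | 0 | 0 | 1 | 3 | 0 | 1 | 3 |
| 2 | Central Harlem | 0 | 0 | 0 | 3 | 2 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | Chelsea | 0 | 0 | 0 | 0 | 3 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 1 | 0 |
| 4 | Chinatown | 0 | 0 | 0 | 0 | 4 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |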


We're interested in venues in the food category, but only proper restaurants: coffee shops, pizza places, bakeries, and similar venues are not direct competitors, so we leave them out. Our list therefore includes only venues with 'Restaurant' in their category name, which picks up the restaurant subcategories in each neighborhood, such as Afghan Restaurant or Italian Restaurant. To do this, we select only the restaurant columns from the manhattan_onehot dataframe.
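A sketch of the column filter and row total on a small table in the shape of `manhattan_grouped`; the counts are hypothetical. The `numeric_only=True` argument keeps the text `Neighborhood` column out of the row sums:

```python
import pandas as pd

# Per-neighborhood category counts (hypothetical values for illustration)
grouped = pd.DataFrame({
    "Neighborhood": ["Chinatown", "Chelsea"],
    "Chinese Restaurant": [5, 1],
    "Coffee Shop": [3, 6],
    "Italian Restaurant": [1, 4],
    "Pizza Place": [2, 2],
})

# Keep only categories whose name contains "Restaurant"
restaurant_cols = [c for c in grouped.columns if "Restaurant" in c]
restaurants = grouped[["Neighborhood"] + restaurant_cols].copy()

# Row total over the numeric columns only
restaurants["Total"] = restaurants.sum(axis=1, numeric_only=True)
print(restaurants)
```

A substring match is simple but literal: restaurant-type categories whose names lack the word, such as Steakhouse (which appears in the Turtle Bay results further down), fall outside the filter.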

In [28]:
# keep only the venue categories whose name contains 'Restaurant' and save them to manhattan_restaurants
col = ['Neighborhood']
for column in manhattan_onehot.columns:
    if 'Restaurant' in column:
        col.append(column)

manhattan_restaurants = manhattan_onehot[col]
manhattan_restaurants = manhattan_restaurants.groupby('Neighborhood').sum().reset_index()
# numeric_only=True keeps the text Neighborhood column out of the row totals
manhattan_restaurants['Total'] = manhattan_restaurants.sum(axis=1, numeric_only=True)
manhattan_restaurants.head()

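|   | Neighborhood | Afghan Restaurant | African Restaurant | American Restaurant | Arepa Restaurant | Argentinian Restaurant | Asian Restaurant | Australian Restaurant | Austrian Restaurant | Brazilian Restaurant | ... | Szechuan Restaurant | Taiwanese Restaurant | Tapas Restaurant | Thai Restaurant | Turkish Restaurant | Udon Restaurant | Vegetarian / Vegan Restaurant | Venezuelan Restaurant | Vietnamese Restaurant | Total |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Battery Park City | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10 |
| 1 | Carnegie Hill | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 2 | 23 |
| 2 | Central Harlem | 0 | 3 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 14 |
| 3 | Chelsea | 0 | 0 | 3 | 0 | 0 | 1 | 0 | 0 | 0 | ... | 0 | 0 | 2 | 0 | 0 | 0 | 1 | 0 | 0 | 24 |
| 4 | Chinatown | 0 | 0 | 4 | 0 | 0 | 2 | 0 | 1 | 0 | ... | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 4 | 41 |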


In [29]:
manhattan_restaurants.shape

(40, 79)

#### Using K-Means clustering algorithm to segment the dataset into clusters.

In [30]:
# set number of clusters
kclusters = 5

manhattan_grouped_clustering = manhattan_restaurants.drop(columns='Neighborhood')

# run k-means clustering (n_init=10 matches the default of scikit-learn releases before 1.4)
kmeans = KMeans(n_clusters=kclusters, n_init=10, random_state=0).fit(manhattan_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([2, 4, 2, 4, 3, 4, 4, 2, 3, 4], dtype=int32)

In [31]:
def return_most_common_venues(row, num_top_venues):
    # skip the Neighborhood column, then rank the remaining category counts from highest to lowest
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

#### Preparing the neighborhoods_venues_sorted dataset, which lists every Manhattan neighborhood along with its top 5 most common venues. This makes each cluster easier to interpret.

In [32]:
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = manhattan_restaurants['Neighborhood']

for ind in np.arange(manhattan_restaurants.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(manhattan_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

|   | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|
| 0 | Battery Park City | Park | Coffee Shop | Hotel | Memorial Site | Gym |
| 1 | Carnegie Hill | Pizza Place | Coffee Shop | Cosmetics Shop | Café | French Restaurant |
| 2 | Central Harlem | African Restaurant | French Restaurant | Public Art | Cosmetics Shop | Seafood Restaurant |
| 3 | Chelsea | Coffee Shop | Italian Restaurant | Ice Cream Shop | Nightclub | Bakery |
| 4 | Chinatown | Chinese Restaurant | Cocktail Bar | Vietnamese Restaurant | American Restaurant | Salon / Barbershop |
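The `indicators` lookup used to build the column names gives correct suffixes only for the first twenty positions; with a larger `num_top_venues` it would produce labels such as "21th". A hypothetical `ordinal` helper that follows the general English rule, including the 11th to 13th exceptions:

```python
def ordinal(n):
    """Return '1st', '2nd', '3rd', '4th', ..., '11th', '12th', '13th', '21st', ..."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


num_top_venues = 5
columns = ["Neighborhood"] + [f"{ordinal(i)} Most Common Venue"
                              for i in range(1, num_top_venues + 1)]
print(columns)
```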


In [33]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

manhattan_merged = manhattan_data

# join the cluster labels and sorted venues onto manhattan_data, which supplies latitude/longitude for each neighborhood
manhattan_merged = manhattan_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

manhattan_merged

|   | Borough | Neighborhood | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Manhattan | Marble Hill | 40.876551 | -73.91066 | 1 | Sandwich Place | Coffee Shop | Discount Store | Yoga Studio | Pharmacy |
| 1 | Manhattan | Chinatown | 40.715618 | -73.994279 | 3 | Chinese Restaurant | Cocktail Bar | Vietnamese Restaurant | American Restaurant | Salon / Barbershop |
| 2 | Manhattan | Washington Heights | 40.851903 | -73.9369 | 4 | Café | Bakery | Mobile Phone Shop | Grocery Store | Spanish Restaurant |
| 3 | Manhattan | Inwood | 40.867684 | -73.92121 | 2 | Mexican Restaurant | Café | Lounge | Deli / Bodega | Pizza Place |
| 4 | Manhattan | Hamilton Heights | 40.823604 | -73.949688 | 2 | Mexican Restaurant | Café | Pizza Place | Coffee Shop | Yoga Studio |
| 5 | Manhattan | Manhattanville | 40.816934 | -73.957385 | 2 | Italian Restaurant | Coffee Shop | Mexican Restaurant | Park | Deli / Bodega |
| 6 | Manhattan | Central Harlem | 40.815976 | -73.943211 | 2 | African Restaurant | French Restaurant | Public Art | Cosmetics Shop | Seafood Restaurant |
| 7 | Manhattan | East Harlem | 40.792249 | -73.944182 | 2 | Mexican Restaurant | Bakery | Deli / Bodega | Thai Restaurant | Latin American Restaurant |
| 8 | Manhattan | Upper East Side | 40.775639 | -73.960508 | 4 | Italian Restaurant | Exhibit | Art Gallery | Bakery | Coffee Shop |
| 9 | Manhattan | Yorkville | 40.77593 | -73.947118 | 0 | Italian Restaurant | Gym | Coffee Shop | Bar | Pizza Place |


#### Map of Manhattan showing all 40 neighborhoods, with each cluster drawn in its own color.

In [34]:
# create map
map_clusters = folium.Map(location=[latitude_manhattan, longitude_manhattan], zoom_start=11)

# set color scheme for the clusters: one color per cluster label
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map, indexing the palette by the cluster label itself
for lat, lon, poi, cluster in zip(manhattan_merged['Latitude'], manhattan_merged['Longitude'], manhattan_merged['Neighborhood'], manhattan_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
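Matplotlib's `cm.rainbow` is one way to get a distinct color per cluster. As a dependency-free alternative, a sketch that spaces hues evenly with the standard library's `colorsys` (a swapped-in technique, not what the cell above uses; the 0 to 0.8 hue range is an arbitrary choice that keeps the first and last colors from both being red). The hypothetical `cluster_palette` returns one hex color per label, indexed by the label itself:

```python
import colorsys


def cluster_palette(k):
    """Return k hex colors with evenly spaced hues, one per cluster label 0..k-1."""
    palette = []
    for i in range(k):
        hue = 0.8 * i / max(k - 1, 1)
        r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
        palette.append("#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return palette


kclusters = 5
palette = cluster_palette(kclusters)
# Cluster 0 gets palette[0], cluster 1 gets palette[1], and so on
label_color = {label: palette[label] for label in range(kclusters)}
print(label_color)
```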

---

#### Cluster 0

In [35]:
cluster_0 = manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 0, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]
cluster_0.set_index('Neighborhood', inplace=True)
cluster_0

| Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|
| Yorkville | Italian Restaurant | Gym | Coffee Shop | Bar | Pizza Place |
| Upper West Side | Italian Restaurant | Bar | Wine Bar | Mediterranean Restaurant | Cosmetics Shop |
| Murray Hill | Coffee Shop | Sandwich Place | Japanese Restaurant | Hotel | Gym |
| West Village | Italian Restaurant | Cosmetics Shop | New American Restaurant | Cocktail Bar | American Restaurant |
| Noho | Italian Restaurant | French Restaurant | Cocktail Bar | Sushi Restaurant | Grocery Store |
| Midtown South | Korean Restaurant | Hotel | Dessert Shop | Japanese Restaurant | Hotel Bar |
| Sutton Place | Gym / Fitness Center | Italian Restaurant | Furniture / Home Store | Indian Restaurant | Juice Bar |
| Tudor City | Mexican Restaurant | Park | Café | Pizza Place | Greek Restaurant |
| Flatiron | Gym | Yoga Studio | American Restaurant | Gym / Fitness Center | Japanese Restaurant |


#### Cluster 1

In [36]:
cluster_1 = manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 1, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]
cluster_1.set_index('Neighborhood', inplace=True)
cluster_1

| Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|
| Marble Hill | Sandwich Place | Coffee Shop | Discount Store | Yoga Studio | Pharmacy |
| Roosevelt Island | Park | Sandwich Place | Coffee Shop | Outdoors & Recreation | Greek Restaurant |
| Morningside Heights | Bookstore | American Restaurant | Coffee Shop | Park | Food Truck |
| Stuyvesant Town | Boat or Ferry | Bar | Park | Playground | Pet Service |


#### Cluster 2

In [37]:
cluster_2 = manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 2, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]
cluster_2.set_index('Neighborhood', inplace=True)
cluster_2

| Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue |
|---|---|---|---|---|---|
| Inwood | Mexican Restaurant | Café | Lounge | Deli / Bodega | Pizza Place |
| Hamilton Heights | Mexican Restaurant | Café | Pizza Place | Coffee Shop | Yoga Studio |
| Manhattanville | Italian Restaurant | Coffee Shop | Mexican Restaurant | Park | Deli / Bodega |
| Central Harlem | African Restaurant | French Restaurant | Public Art | Cosmetics Shop | Seafood Restaurant |
| East Harlem | Mexican Restaurant | Bakery | Deli / Bodega | Thai Restaurant | Latin American Restaurant |
| Lincoln Square | Gym / Fitness Center | Theater | Concert Hall | Café | Plaza |
| Lower East Side | Coffee Shop | Pizza Place | Café | Chinese Restaurant | Bakery |
| Battery Park City | Park | Coffee Shop | Hotel | Memorial Site | Gym |


#### Cluster 3

In [38]:
cluster_3 = manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 3, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]
cluster_3.set_index('Neighborhood', inplace=True)
cluster_3

Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
Chinatown,Chinese Restaurant,Cocktail Bar,Vietnamese Restaurant,American Restaurant,Salon / Barbershop
Greenwich Village,Italian Restaurant,Clothing Store,Sushi Restaurant,Café,Cosmetics Shop
East Village,Bar,Wine Bar,Chinese Restaurant,Pizza Place,Ice Cream Shop
Turtle Bay,Italian Restaurant,Coffee Shop,Steakhouse,Sushi Restaurant,Ramen Restaurant


#### Cluster 4

In [39]:
cluster_4 = manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 4, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]
cluster_4.set_index('Neighborhood', inplace=True)
cluster_4

Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
Washington Heights,Café,Bakery,Mobile Phone Shop,Grocery Store,Spanish Restaurant
Upper East Side,Italian Restaurant,Exhibit,Art Gallery,Bakery,Coffee Shop
Lenox Hill,Coffee Shop,Italian Restaurant,Pizza Place,Sushi Restaurant,Sporting Goods Shop
Clinton,Theater,Gym / Fitness Center,Italian Restaurant,Hotel,American Restaurant
Midtown,Hotel,Theater,Coffee Shop,Cocktail Bar,Clothing Store
Chelsea,Coffee Shop,Italian Restaurant,Ice Cream Shop,Nightclub,Bakery
Tribeca,Italian Restaurant,Park,Café,Boutique,American Restaurant
Little Italy,Bakery,Café,Sandwich Place,Bubble Tea Shop,Clothing Store
Soho,Clothing Store,Boutique,Women's Store,Art Gallery,Shoe Store
Manhattan Valley,Coffee Shop,Pizza Place,Indian Restaurant,Yoga Studio,Szechuan Restaurant
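The cluster cells above differ only in the cluster label. A minimal sketch of a helper that builds every cluster table in one pass, assuming the same column layout as manhattan_merged (Neighborhood in column 1, venue rankings from column 5 on). The cluster_table function and the toy frame are hypothetical; the toy rows are copied from the outputs in this section, with only two venue columns kept for brevity.

```python
import pandas as pd


def cluster_table(merged: pd.DataFrame, label: int) -> pd.DataFrame:
    """Return the venue ranking of every neighborhood in one cluster.

    Assumes 'Neighborhood' is column 1 and the ranked venue columns
    start at column 5, as in the notebook cells above.
    """
    columns = merged.columns[[1] + list(range(5, merged.shape[1]))]
    subset = merged.loc[merged['Cluster Labels'] == label, columns]
    # set_index without inplace returns a new frame, leaving `merged` untouched
    return subset.set_index('Neighborhood')


# Toy frame with the same column order (two venue columns instead of five)
toy = pd.DataFrame({
    'Borough': ['Manhattan'] * 3,
    'Neighborhood': ['Marble Hill', 'Chinatown', 'Roosevelt Island'],
    'Latitude': [40.876551, 40.715618, 40.76216],
    'Longitude': [-73.91066, -73.994279, -73.949168],
    'Cluster Labels': [1, 3, 1],
    '1st Most Common Venue': ['Sandwich Place', 'Chinese Restaurant', 'Park'],
    '2nd Most Common Venue': ['Coffee Shop', 'Cocktail Bar', 'Sandwich Place'],
})

# One table per cluster label, keyed by a plain Python int
clusters = {int(label): cluster_table(toy, label)
            for label in sorted(toy['Cluster Labels'].unique())}
print(clusters[1])
```

Run against manhattan_merged, the same dictionary comprehension would produce all five cluster tables at once, so a change to the column selection only has to be made in one place.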


---

### Examining Clusters
#### Breakdown of neighborhood and restaurant frequencies per cluster

In [42]:
manhattan_restaurants.set_index('Neighborhood', inplace=True)
manhattan_restaurants.head()

Neighborhood,Afghan Restaurant,African Restaurant,American Restaurant,Arepa Restaurant,Argentinian Restaurant,Asian Restaurant,Australian Restaurant,Austrian Restaurant,Brazilian Restaurant,Cambodian Restaurant,...,Szechuan Restaurant,Taiwanese Restaurant,Tapas Restaurant,Thai Restaurant,Turkish Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Vietnamese Restaurant,Total
Battery Park City,0,0,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,10
Carnegie Hill,0,0,1,0,1,0,0,0,0,0,...,0,0,0,1,0,0,1,0,2,23
Central Harlem,0,3,2,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,14
Chelsea,0,0,3,0,0,1,0,0,0,0,...,0,0,2,0,0,0,1,0,0,24
Chinatown,0,0,4,0,0,2,0,1,0,0,...,0,0,0,1,0,0,1,0,4,41
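The cell that builds manhattan_restaurants is not part of this section. One way a table of this shape could be produced is sketched below, under the assumption that the Foursquare results are a long table with one row per venue; the venues frame and its column names are hypothetical. It counts each category per neighborhood with pd.crosstab, keeps only the categories whose names contain "Restaurant", and adds a row total.

```python
import pandas as pd

# Hypothetical Foursquare-style results: one row per venue near a neighborhood
venues = pd.DataFrame({
    'Neighborhood': ['Chinatown', 'Chinatown', 'Chinatown',
                     'Marble Hill', 'Marble Hill'],
    'Venue Category': ['Chinese Restaurant', 'Cocktail Bar', 'Chinese Restaurant',
                       'Sandwich Place', 'American Restaurant'],
})

# Count every venue category per neighborhood (one column per category)
counts = pd.crosstab(venues['Neighborhood'], venues['Venue Category'])

# Keep only the restaurant categories, then add a per-neighborhood total
restaurants = counts.loc[:, counts.columns.str.contains('Restaurant')].copy()
restaurants['Total'] = restaurants.sum(axis=1)
print(restaurants)
```

The 'Total' column is computed before it is added, so it sums only the restaurant categories and never counts itself.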


In [43]:
print('Total number of neighborhoods in cluster 0 is', manhattan_restaurants.loc[cluster_0.index,:].shape[0])
print('Total number of restaurants in this cluster is', manhattan_restaurants.loc[cluster_0.index,:]['Total'].sum())
print('Ratio of Restaurant/Neighborhood in this cluster is', (manhattan_restaurants.loc[cluster_0.index,:]['Total'].sum()/manhattan_restaurants.loc[cluster_0.index,:].shape[0]))


Total number of neighborhoods in cluster 0 is 9
Total number of restaurants in this cluster is 284
Ratio of Restaurant/Neighborhood in this cluster is 31.555555555555557


In [44]:
print('Total number of neighborhoods in cluster 1 is', manhattan_restaurants.loc[cluster_1.index,:].shape[0])
print('Total number of restaurants in this cluster is', manhattan_restaurants.loc[cluster_1.index,:]['Total'].sum())
print('Ratio of Restaurant/Neighborhood in this cluster is', (manhattan_restaurants.loc[cluster_1.index,:]['Total'].sum()/manhattan_restaurants.loc[cluster_1.index,:].shape[0]))


Total number of neighborhoods in cluster 1 is 4
Total number of restaurants in this cluster is 13
Ratio of Restaurant/Neighborhood in this cluster is 3.25


In [45]:
print('Total number of neighborhoods in cluster 2 is', manhattan_restaurants.loc[cluster_2.index,:].shape[0])
print('Total number of restaurants in this cluster is', manhattan_restaurants.loc[cluster_2.index,:]['Total'].sum())
print('Ratio of Restaurant/Neighborhood in this cluster is', (manhattan_restaurants.loc[cluster_2.index,:]['Total'].sum()/manhattan_restaurants.loc[cluster_2.index,:].shape[0]))


Total number of neighborhoods in cluster 2 is 8
Total number of restaurants in this cluster is 119
Ratio of Restaurant/Neighborhood in this cluster is 14.875


In [46]:
print('Total number of neighborhoods in cluster 3 is', manhattan_restaurants.loc[cluster_3.index,:].shape[0])
print('Total number of restaurants in this cluster is', manhattan_restaurants.loc[cluster_3.index,:]['Total'].sum())
print('Ratio of Restaurant/Neighborhood in this cluster is', (manhattan_restaurants.loc[cluster_3.index,:]['Total'].sum()/manhattan_restaurants.loc[cluster_3.index,:].shape[0]))


Total number of neighborhoods in cluster 3 is 4
Total number of restaurants in this cluster is 163
Ratio of Restaurant/Neighborhood in this cluster is 40.75


In [47]:
print('Total number of neighborhoods in cluster 4 is', manhattan_restaurants.loc[cluster_4.index,:].shape[0])
print('Total number of restaurants in this cluster is', manhattan_restaurants.loc[cluster_4.index,:]['Total'].sum())
print('Ratio of Restaurant/Neighborhood in this cluster is', (manhattan_restaurants.loc[cluster_4.index,:]['Total'].sum()/manhattan_restaurants.loc[cluster_4.index,:].shape[0]))


Total number of neighborhoods in cluster 4 is 15
Total number of restaurants in this cluster is 336
Ratio of Restaurant/Neighborhood in this cluster is 22.4
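The five cells above repeat the same computation once per cluster. A sketch that computes every ratio in a single groupby and selects the lowest with idxmin. The toy frame holds the Total values shown in this section for eight neighborhoods, so only Cluster 1 is complete and the other ratios will not match the full output. Applying it in the notebook would assume the 'Cluster Labels' column has first been joined onto manhattan_restaurants.

```python
import pandas as pd

# Restaurant totals copied from the outputs above for a subset of neighborhoods
# (Cluster 1 is complete; Clusters 2 to 4 are only partially represented)
toy = pd.DataFrame({
    'Neighborhood': ['Marble Hill', 'Roosevelt Island', 'Morningside Heights',
                     'Stuyvesant Town', 'Battery Park City', 'Central Harlem',
                     'Chelsea', 'Chinatown'],
    'Cluster Labels': [1, 1, 1, 1, 2, 2, 4, 3],
    'Total': [2, 2, 8, 1, 10, 14, 24, 41],
})

# Named aggregation: neighborhood count and restaurant sum per cluster
summary = toy.groupby('Cluster Labels').agg(
    neighborhoods=('Neighborhood', 'count'),
    restaurants=('Total', 'sum'),
)
summary['ratio'] = summary['restaurants'] / summary['neighborhoods']

# Cluster label with the fewest restaurants per neighborhood
lowest = summary['ratio'].idxmin()
print(summary)
print('Lowest Restaurant/Neighborhood ratio:', lowest)
```

Named aggregation requires pandas 0.25 or later. Selecting the cluster with idxmin also avoids hard-coding the label in the note that follows.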


#### Note: The Restaurant/Neighborhood ratio is clearly lowest for Cluster 1 (3.25), so we will analyze only the neighborhoods belonging to Cluster 1 further.

---
#### Cluster 1 has the lowest Restaurant/Neighborhood ratio of the five clusters.

In [48]:
cluster_1_restaurants = manhattan_restaurants.loc[cluster_1.index,:]
cluster_1_restaurants

Neighborhood,Afghan Restaurant,African Restaurant,American Restaurant,Arepa Restaurant,Argentinian Restaurant,Asian Restaurant,Australian Restaurant,Austrian Restaurant,Brazilian Restaurant,Cambodian Restaurant,...,Szechuan Restaurant,Taiwanese Restaurant,Tapas Restaurant,Thai Restaurant,Turkish Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Vietnamese Restaurant,Total
Marble Hill,0,0,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,2
Roosevelt Island,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,2
Morningside Heights,0,0,3,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,8
Stuyvesant Town,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1


The neighborhoods above look promising for a new restaurant: only two of them (Roosevelt Island and Morningside Heights) list a restaurant among their five most common venues, and the four neighborhoods contain only 13 restaurants in total.
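A quick way to check which Cluster 1 neighborhoods list a restaurant among their five most common venues is sketched below. The venue lists are copied from the Cluster 1 output above. The rule that a category counts as a restaurant only if its name contains "Restaurant" is an assumption, chosen because the restaurant table's columns appear to follow it; under this rule Sandwich Place and Food Truck do not count.

```python
# Five most common venues per Cluster 1 neighborhood, copied from the output above
top_venues = {
    'Marble Hill': ['Sandwich Place', 'Coffee Shop', 'Discount Store',
                    'Yoga Studio', 'Pharmacy'],
    'Roosevelt Island': ['Park', 'Sandwich Place', 'Coffee Shop',
                         'Outdoors & Recreation', 'Greek Restaurant'],
    'Morningside Heights': ['Bookstore', 'American Restaurant', 'Coffee Shop',
                            'Park', 'Food Truck'],
    'Stuyvesant Town': ['Boat or Ferry', 'Bar', 'Park', 'Playground', 'Pet Service'],
}


def restaurant_venues(venues):
    """Return the categories whose name contains 'Restaurant' (assumed rule)."""
    return [venue for venue in venues if 'Restaurant' in venue]


for neighborhood, venues in top_venues.items():
    print(neighborhood, restaurant_venues(venues))
```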

Before narrowing the list further, we index the merged dataframe by neighborhood so that the location details of these neighborhoods can be looked up by name.

In [49]:
manhattan_merged.set_index('Neighborhood', inplace=True)
manhattan_merged.head()

Neighborhood,Borough,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
Marble Hill,Manhattan,40.876551,-73.91066,1,Sandwich Place,Coffee Shop,Discount Store,Yoga Studio,Pharmacy
Chinatown,Manhattan,40.715618,-73.994279,3,Chinese Restaurant,Cocktail Bar,Vietnamese Restaurant,American Restaurant,Salon / Barbershop
Washington Heights,Manhattan,40.851903,-73.9369,4,Café,Bakery,Mobile Phone Shop,Grocery Store,Spanish Restaurant
Inwood,Manhattan,40.867684,-73.92121,2,Mexican Restaurant,Café,Lounge,Deli / Bodega,Pizza Place
Hamilton Heights,Manhattan,40.823604,-73.949688,2,Mexican Restaurant,Café,Pizza Place,Coffee Shop,Yoga Studio


#### Dropping Marble Hill from target neighborhoods in Cluster 1 since it is not located near the center of Manhattan

In [63]:
cluster_1_neighborhoods = manhattan_merged.loc[cluster_1_restaurants.index, ['Borough', 'Latitude', 'Longitude']].reset_index()
# drop Marble Hill by name rather than by row position
final_neighborhoods = cluster_1_neighborhoods[cluster_1_neighborhoods['Neighborhood'] != 'Marble Hill']
final_neighborhoods

,Neighborhood,Borough,Latitude,Longitude
1,Roosevelt Island,Manhattan,40.76216,-73.949168
2,Morningside Heights,Manhattan,40.808,-73.963896
3,Stuyvesant Town,Manhattan,40.731,-73.974052
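The decision to drop Marble Hill rests on its distance from the center of Manhattan. A sketch that quantifies this with the haversine (great-circle) distance, using the coordinates from the outputs above. The reference point is an assumption: Midtown, at roughly the Empire State Building (40.7484, -73.9857), stands in for "the center of Manhattan", and CENTER and haversine_km are hypothetical names.

```python
from math import asin, cos, radians, sin, sqrt

# Cluster 1 coordinates taken from the outputs above
neighborhoods = {
    'Marble Hill': (40.876551, -73.91066),
    'Roosevelt Island': (40.76216, -73.949168),
    'Morningside Heights': (40.808, -73.963896),
    'Stuyvesant Town': (40.731, -73.974052),
}

# Assumption: Midtown (approximately the Empire State Building) as the center
CENTER = (40.7484, -73.9857)


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * asin(sqrt(h))  # 6371 km: mean Earth radius


distances = {name: haversine_km(coords, CENTER) for name, coords in neighborhoods.items()}
for name, km in sorted(distances.items(), key=lambda item: item[1]):
    print(f'{name}: {km:.1f} km')
```

With this reference point Marble Hill is more than twice as far away as any of the other three neighborhoods, which supports excluding it; a different choice of center would shift the numbers but is unlikely to change the ranking.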


#### Mapping the final three Manhattan neighborhoods

In [64]:
# create map of most suitable neighborhoods in Manhattan using latitude and longitude values from final dataframe
map_final_neighborhoods = folium.Map(location=[latitude_manhattan, longitude_manhattan], zoom_start=12)

# add markers to map
for lat_final, lng_final, borough_final, neighborhood_final in zip(final_neighborhoods['Latitude'], final_neighborhoods['Longitude'], final_neighborhoods['Borough'], final_neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood_final, borough_final)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat_final, lng_final],
        radius=9,
        popup=label,
        color='blue',
        fill=True,
        fill_color='blue',
        fill_opacity=1).add_to(map_final_neighborhoods)
    
map_final_neighborhoods

The blue markers on the map above show the three Manhattan neighborhoods: Morningside Heights, Roosevelt Island, and Stuyvesant Town.

---

### 4. RESULTS & DISCUSSION

Although Manhattan has a plethora of restaurants, our analysis shows that there are pockets of low restaurant density fairly close to the city center. To identify these pockets, we used a clustering algorithm to segment our neighborhood dataset.

Using the K-means clustering algorithm, we constructed five clusters, each containing a portion of Manhattan's neighborhoods, based on the number of restaurants in their vicinity. Next, we analyzed each cluster by calculating its Restaurant/Neighborhood ratio. Cluster 1 had the lowest ratio (3.25 restaurants per neighborhood, against roughly 15 to 41 for the other clusters), which means there are fewer restaurants in the vicinity of each of its neighborhoods than in the other clusters. Cluster 1 contained four neighborhoods in total. On further analysis, we found that one of them, Marble Hill, was not a good location for a new restaurant because it is far from the center of Manhattan.

Our analysis identified three neighborhoods where a new restaurant business may have a good chance of success, for two reasons. First, these neighborhoods contain few restaurants, which should lower the competition a new restaurant faces as it works toward success and sustainable growth. Second, as the map above shows, these three neighborhoods are located toward the center of Manhattan, in denser and more populous areas that should provide more foot traffic and potential customers.

The three resulting neighborhoods are stored in the final_neighborhoods dataframe together with their borough, latitude, and longitude.

The owners can examine these three locations first and then determine the type of restaurant and cuisine that best suits their business strategy and growth plans.

---

### 5. CONCLUSION

The purpose of this project was to identify neighborhoods in Manhattan, New York with a low number of restaurants, to help stakeholders narrow down the search for the optimal location(s) for a new restaurant. Using Foursquare data, we first identified the five most common venues near each neighborhood and analyzed the distribution of restaurant density. With the help of clustering and further analysis, we then narrowed the search down to three neighborhoods, Morningside Heights, Roosevelt Island, and Stuyvesant Town, that meet the low-density criterion and are therefore strong candidate locations for a new restaurant.