# IBM CAPSTONE PROJECT: The Battle of Neighborhoods

### Capstone Project - REPORT CONTENT
1. Introduction section: Discussion of the business problem and the audience interested in this project.
2. Data section: Description of the data that will be used to solve the problem, and its sources.
3. Methodology section: Description of the exploratory data analysis carried out, any inferential statistical testing performed, and any machine learning techniques used, together with their strategy and purpose.
4. Results section: Discussion of the results.
5. Discussion section: Observations noted and recommendations suggested based on the results.
6. Conclusion section: Report conclusion.

### Introduction:

New York City (NYC), often called simply New York, is the most populous city in the United States. With an estimated 2019 population of 8,336,817 distributed over about 302.6 square miles (784 km²), New York City is also the most densely populated major city in the United States.
New York City is composed of five boroughs, each of which is a county of the State of New York. The five boroughs—Brooklyn, Queens, Manhattan, the Bronx, and Staten Island—were consolidated into a single city in 1898.

This final project explores the best locations for sushi restaurants in New York City. New York is often described as the most diverse city in the world (800 languages are spoken there), and it has a long tradition of restaurants serving different ethnic cuisines. Now that the idea of a healthy lifestyle has taken hold across the country, sushi restaurants have become extremely popular, as they offer a healthy alternative to regular American eating habits. The owner of a new Japanese restaurant therefore has the potential for great success and consistent profit. However, as with any business, opening a new restaurant requires serious consideration and is more complicated than it seems at first glance. In particular, the location of the restaurant is one of the most important factors that determine whether it succeeds or fails. This project therefore attempts to answer two questions: “Where should an investor open a sushi restaurant in Manhattan?” and “Where should I go if I want great sushi?”



### Data Section:

To answer the above questions, we need data on New York City neighborhoods and boroughs, including boundaries, latitude, and longitude, as well as data on restaurants and their ratings and tips.

1. New York City data containing the neighborhoods and boroughs, with their latitudes and longitudes, will be obtained from the data source: https://cocl.us/new_york_dataset

2. All data related to the locations and quality of Japanese restaurants will be obtained via the Foursquare API, accessed through the requests library in Python.



#### Import Libraries:
In this section we import the libraries that will be required to process the data.

In [1]:
!pip install beautifulsoup4
!pip install lxml
import requests # library to handle HTTP requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation

# libraries for displaying images and HTML
from IPython.display import Image, HTML, display_html

# transform a JSON file into a pandas DataFrame
from pandas import json_normalize

# k-means for the clustering stage
from sklearn.cluster import KMeans

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
import matplotlib.pyplot as plt
%matplotlib inline


print('Libraries imported.')


Libraries imported.


In [2]:
!pip install geopy
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values



In [3]:
! pip install folium==0.5.0
import folium # plotting library
from bs4 import BeautifulSoup
print('Folium installed')

Folium installed


In [4]:
import urllib.request
from urllib.request import urlopen
import json

### Methodology:
Download and explore the New York City data set.

In [5]:
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
print('Data downloaded!')

with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

Data downloaded!


#### Transform the data into a pandas dataframe

In [6]:
neighborhoods_data = newyork_data['features']

# collect one row per neighborhood; GeoJSON stores coordinates as [longitude, latitude]
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

# build the dataframe in one step (DataFrame.append was removed in pandas 2.0)
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude']
neighborhoods = pd.DataFrame(rows, columns=column_names)

In [7]:
neighborhoods.head()

| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Bronx | Wakefield | 40.894705 | -73.847201 |
| 1 | Bronx | Co-op City | 40.874294 | -73.829939 |
| 2 | Bronx | Eastchester | 40.887556 | -73.827806 |
| 3 | Bronx | Fieldston | 40.895437 | -73.905643 |
| 4 | Bronx | Riverdale | 40.890834 | -73.912585 |
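
The extraction step can also be written as a small helper that skips malformed features instead of failing on them. This is only a sketch: `feature_to_row` is a hypothetical name, and the sample features are hand-written stand-ins for entries of `newyork_data['features']` (the first one mirrors the Wakefield row shown above).

```python
def feature_to_row(feature):
    """Turn one GeoJSON feature into a flat row, or None if a field is missing."""
    try:
        props = feature['properties']
        # GeoJSON order is longitude first, then latitude
        lon, lat = feature['geometry']['coordinates'][:2]
        return {'Borough': props['borough'], 'Neighborhood': props['name'],
                'Latitude': lat, 'Longitude': lon}
    except (KeyError, TypeError, ValueError):
        return None


sample_features = [
    {'properties': {'borough': 'Bronx', 'name': 'Wakefield'},
     'geometry': {'coordinates': [-73.847201, 40.894705]}},
    {'properties': {'borough': 'Bronx'}, 'geometry': {}},  # malformed on purpose
]

rows = [row for row in map(feature_to_row, sample_features) if row is not None]
print(rows)
```

Keeping the longitude/latitude unpacking in one place makes it harder to swap the two values by accident.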


#### Use geopy library to get the latitude and longitude values of New York City.

In [8]:
address = 'New York City, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of New York City are 40.7127281, -74.0060152.
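
geopy's `geocode` returns `None` when no match is found, in which case reading `location.latitude` would raise an error. A minimal sketch of a guard is shown below. `geocode_or_default` and `_StubGeolocator` are hypothetical names; the stub stands in for `Nominatim` so that the example runs without network access.

```python
def geocode_or_default(geolocator, address, default):
    """Return (latitude, longitude) for address, or default if nothing is found."""
    location = geolocator.geocode(address)
    if location is None:  # no match for this address
        return default
    return (location.latitude, location.longitude)


class _StubGeolocator:
    """Hypothetical stand-in for Nominatim so the sketch runs offline."""

    def geocode(self, address):
        return None  # simulate an address that cannot be resolved


nyc_fallback = (40.7127281, -74.0060152)  # coordinates printed above
coords = geocode_or_default(_StubGeolocator(), 'New York City, NY', nyc_fallback)
print(coords)
```

In the notebook, the real `geolocator` object would be passed in place of the stub.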


#### Create a map of New York with the Manhattan neighborhoods superimposed on top.

In [9]:
manhattan_data = neighborhoods[neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)
manhattan_data.head()

| | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Manhattan | Marble Hill | 40.876551 | -73.91066 |
| 1 | Manhattan | Chinatown | 40.715618 | -73.994279 |
| 2 | Manhattan | Washington Heights | 40.851903 | -73.9369 |
| 3 | Manhattan | Inwood | 40.867684 | -73.92121 |
| 4 | Manhattan | Hamilton Heights | 40.823604 | -73.949688 |


In [10]:
# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Borough'], manhattan_data['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_newyork)
    
map_newyork

### Using the Foursquare API

In [11]:
def getNearbyVenues(names, latitudes, longitudes, radius=5000, categoryIds=''):
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        # build the query parameters for the venue search endpoint
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': LIMIT,
        }
        if categoryIds != '':
            params['categoryId'] = categoryIds

        # make the GET request; stop with a clear error if the API call fails
        response = requests.get('https://api.foursquare.com/v2/venues/search', params=params)
        response.raise_for_status()
        results = response.json()['response']['venues']

        # keep only the relevant information for each venue that has a category
        for v in results:
            if not v.get('categories'):
                continue
            venues_list.append((
                name,
                lat,
                lng,
                v['name'],
                v['location']['lat'],
                v['location']['lng'],
                v['categories'][0]['name']
            ))

    nearby_venues = pd.DataFrame(venues_list, columns=[
        'Neighborhood',
        'Neighborhood Latitude',
        'Neighborhood Longitude',
        'Venue',
        'Venue Latitude',
        'Venue Longitude',
        'Venue Category'])

    return nearby_venues
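
The query that `getNearbyVenues` sends is made up of several parameters, and it can help to inspect the encoded URL on its own. The sketch below uses only the standard library. `build_search_url` is a hypothetical helper, and the credentials are placeholders rather than real keys.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_search_url(client_id, client_secret, version, lat, lng,
                     radius, limit, category_id=''):
    """Build a venue-search URL with properly encoded query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    if category_id:
        params['categoryId'] = category_id
    return 'https://api.foursquare.com/v2/venues/search?' + urlencode(params)


# placeholder credentials, not real ones
url = build_search_url('MY_ID', 'MY_SECRET', '20181020', 40.876551, -73.91066,
                       1000, 30, '4bf58dd8d48988d1d2941735')

# decode the query string again to check what would be sent
query = parse_qs(urlsplit(url).query)
print(query['ll'], query['categoryId'])
```

Encoding the parameters from a dictionary avoids stray `&` characters and keeps optional parameters such as `categoryId` easy to add.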



In [12]:
import os

# Foursquare credentials are read from environment variables so that they are
# not stored in the notebook (the variable names below are illustrative)
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID']
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET']
VERSION = '20181020'  # Foursquare API version date
LIMIT = 30  # maximum number of venues returned per request
print('Credentials loaded.')

Credentials loaded.


In [13]:
#https://developer.foursquare.com/docs/resources/categories
#Sushi = 4bf58dd8d48988d1d2941735
neighborhoods = neighborhoods[neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)
newyork_venues_sushi = getNearbyVenues(names=neighborhoods['Neighborhood'], latitudes=neighborhoods['Latitude'], longitudes=neighborhoods['Longitude'], radius=1000, categoryIds='4bf58dd8d48988d1d2941735')
newyork_venues_sushi.head()

| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Marble Hill | 40.876551 | -73.91066 | Planet Tokyo | 40.886233 | -73.909479 | Sushi Restaurant |
| 1 | Chinatown | 40.715618 | -73.994279 | Nakaji | 40.715912 | -73.996597 | Sushi Restaurant |
| 2 | Chinatown | 40.715618 | -73.994279 | Sushumai Asian Fusion | 40.721155 | -73.987337 | Sushi Restaurant |
| 3 | Chinatown | 40.715618 | -73.994279 | Shinsen | 40.715608 | -73.996611 | Japanese Restaurant |
| 4 | Chinatown | 40.715618 | -73.994279 | Bondi Bar | 40.721247 | -73.996264 | Sushi Restaurant |
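
To check how far a returned venue lies from the neighborhood centre, the great-circle distance can be computed with the haversine formula. The sketch below is an illustration and not part of the original analysis. `haversine_m` is a hypothetical helper, and the Earth radius is the usual mean-radius approximation.

```python
from math import radians, sin, cos, asin, sqrt


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points."""
    earth_radius_m = 6371000  # mean Earth radius, an approximation
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2)
    return 2 * earth_radius_m * asin(sqrt(a))


# first row of the table above: Marble Hill centre vs. Planet Tokyo
distance = haversine_m(40.876551, -73.91066, 40.886233, -73.909479)
print(round(distance))
```

For this first row the distance comes out slightly above the requested 1000 m, which may mean that the search endpoint does not enforce the radius strictly.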


In [14]:
newyork_venues_sushi.shape

(847, 7)
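
Because the search circles of adjacent neighborhoods can overlap, the same venue may appear in more than one row, so the row count is not necessarily the number of distinct restaurants. A minimal sketch of counting unique venues follows. The rows are hypothetical examples in the layout of `newyork_venues_sushi`, and the duplicated entry is an assumption made for illustration.

```python
# hypothetical rows: (Neighborhood, Venue, Venue Latitude, Venue Longitude)
rows = [
    ('Chinatown', 'Nakaji', 40.715912, -73.996597),
    ('Little Italy', 'Nakaji', 40.715912, -73.996597),  # same venue, assumed duplicate
    ('Chinatown', 'Shinsen', 40.715608, -73.996611),
]

# a venue is identified here by its name and coordinates
unique_venues = {(venue, lat, lng) for _, venue, lat, lng in rows}
print(len(rows), 'rows,', len(unique_venues), 'unique venues')
```

With the real dataframe, the same idea could be expressed with `drop_duplicates` on the venue columns.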

In [15]:
def addToMap(df, color, existingMap):
    for lat, lng, local, venue, venueCat in zip(df['Venue Latitude'], df['Venue Longitude'], df['Neighborhood'], df['Venue'], df['Venue Category']):
        label = '{} ({}) - {}'.format(venue, venueCat, local)
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            color=color,
            fill=True,
            fill_color=color,
            fill_opacity=0.7).add_to(existingMap)

In [16]:
map_newyork_sushi = folium.Map(location=[latitude, longitude], zoom_start=10)
addToMap(newyork_venues_sushi, 'red', map_newyork_sushi)

map_newyork_sushi

In [17]:
def addColumn(startDf, columnTitle, dataDf):
    grouped = dataDf.groupby('Neighborhood').count()
    
    for n in startDf['Neighborhood']:
        try:
            startDf.loc[startDf['Neighborhood'] == n,columnTitle] = grouped.loc[n, 'Venue']
        except KeyError:
            startDf.loc[startDf['Neighborhood'] == n,columnTitle] = 0

In [18]:
manhattan_grouped = newyork_venues_sushi.groupby('Neighborhood').count()
manhattan_grouped
#print('There are {} uniques categories.'.format(len(newyork_venues_sushi['Venue Category'].unique())))

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Battery Park City | 22 | 22 | 22 | 22 | 22 | 22 |
| Carnegie Hill | 24 | 24 | 24 | 24 | 24 | 24 |
| Central Harlem | 4 | 4 | 4 | 4 | 4 | 4 |
| Chelsea | 30 | 30 | 30 | 30 | 30 | 30 |
| Chinatown | 24 | 24 | 24 | 24 | 24 | 24 |
| Civic Center | 30 | 30 | 30 | 30 | 30 | 30 |
| Clinton | 30 | 30 | 30 | 30 | 30 | 30 |
| East Harlem | 4 | 4 | 4 | 4 | 4 | 4 |
| East Village | 29 | 29 | 29 | 29 | 29 | 29 |
| Financial District | 20 | 20 | 20 | 20 | 20 | 20 |
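
Several neighborhoods return exactly 30 venues, which is the value of `LIMIT`, so their true counts may be higher than what the API returned. The sketch below flags those neighborhoods, using the counts copied from the table above.

```python
LIMIT = 30  # same per-request limit as in the notebook

# venue counts copied from the table above
venue_counts = {
    'Battery Park City': 22, 'Carnegie Hill': 24, 'Central Harlem': 4,
    'Chelsea': 30, 'Chinatown': 24, 'Civic Center': 30, 'Clinton': 30,
    'East Harlem': 4, 'East Village': 29, 'Financial District': 20,
}

# a count equal to the limit may hide venues that were not returned
capped = sorted(name for name, count in venue_counts.items() if count >= LIMIT)
print(capped)
```

For these neighborhoods the count is best read as "at least 30" when comparing areas.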


#### Analyze each neighborhood

In [19]:
# one hot encoding
manhattan_onehot = pd.get_dummies(newyork_venues_sushi[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
manhattan_onehot['Neighborhood'] = newyork_venues_sushi['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [manhattan_onehot.columns[-1]] + list(manhattan_onehot.columns[:-1])
manhattan_onehot = manhattan_onehot[fixed_columns]

manhattan_onehot.head()

Unnamed: 0,Neighborhood,Asian Restaurant,Bakery,Chinese Restaurant,Cocktail Bar,Deli / Bodega,Fish Market,Grocery Store,Hawaiian Restaurant,Indian Chinese Restaurant,...,Noodle House,Poke Place,Ramen Restaurant,Sake Bar,Seafood Restaurant,Smoothie Shop,Steakhouse,Sushi Restaurant,Theme Restaurant,Vegetarian / Vegan Restaurant
0,Marble Hill,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0
1,Chinatown,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0
2,Chinatown,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0
3,Chinatown,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Chinatown,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,1,0,0


In [20]:
manhattan_grouped = manhattan_onehot.groupby('Neighborhood').mean().reset_index()
manhattan_grouped

Unnamed: 0,Neighborhood,Asian Restaurant,Bakery,Chinese Restaurant,Cocktail Bar,Deli / Bodega,Fish Market,Grocery Store,Hawaiian Restaurant,Indian Chinese Restaurant,...,Noodle House,Poke Place,Ramen Restaurant,Sake Bar,Seafood Restaurant,Smoothie Shop,Steakhouse,Sushi Restaurant,Theme Restaurant,Vegetarian / Vegan Restaurant
0,Battery Park City,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.818182,0.045455,0.0
1,Carnegie Hill,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.75,0.0,0.041667
2,Central Harlem,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
3,Chelsea,0.066667,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.766667,0.0,0.033333
4,Chinatown,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.875,0.0,0.0
5,Civic Center,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.866667,0.033333,0.0
6,Clinton,0.033333,0.0,0.033333,0.033333,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.733333,0.0,0.0
7,East Harlem,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0
8,East Village,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.896552,0.0,0.034483
9,Financial District,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.85,0.0,0.0
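
Each value in this table is the share of a neighborhood's venues that fall into a category, because the mean of a one-hot column equals its relative frequency. The sketch below reproduces the Battery Park City row by hand. The split of the 22 venues is inferred from the frequencies shown, and the "Japanese Restaurant" count is an assumption, since that column is hidden behind the "..." in the output.

```python
from collections import Counter

# 22 venues in total; the Japanese Restaurant count is assumed (column elided above)
categories = (['Sushi Restaurant'] * 18 + ['Japanese Restaurant'] * 2 +
              ['Noodle House'] + ['Theme Restaurant'])

counts = Counter(categories)
total = len(categories)

# relative frequency per category, rounded like the table output
frequencies = {category: round(n / total, 6) for category, n in counts.items()}
print(frequencies)
```

The sushi share of 18/22 and the 1/22 shares for Noodle House and Theme Restaurant match the values in the table.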


In [21]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [22]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = manhattan_grouped['Neighborhood']

for ind in np.arange(manhattan_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(manhattan_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Battery Park City,Sushi Restaurant,Japanese Restaurant,Noodle House,Theme Restaurant,Hawaiian Restaurant,Bakery,Chinese Restaurant,Cocktail Bar,Deli / Bodega,Fish Market
1,Carnegie Hill,Sushi Restaurant,Japanese Restaurant,Indian Chinese Restaurant,Vegetarian / Vegan Restaurant,Asian Restaurant,Smoothie Shop,Seafood Restaurant,Sake Bar,Ramen Restaurant,Poke Place
2,Central Harlem,Sushi Restaurant,Vegetarian / Vegan Restaurant,Indian Chinese Restaurant,Bakery,Chinese Restaurant,Cocktail Bar,Deli / Bodega,Fish Market,Grocery Store,Hawaiian Restaurant
3,Chelsea,Sushi Restaurant,Japanese Restaurant,Asian Restaurant,Vegetarian / Vegan Restaurant,Fish Market,Hawaiian Restaurant,Bakery,Chinese Restaurant,Cocktail Bar,Deli / Bodega
4,Chinatown,Sushi Restaurant,Japanese Restaurant,Indian Chinese Restaurant,Bakery,Chinese Restaurant,Cocktail Bar,Deli / Bodega,Fish Market,Grocery Store,Hawaiian Restaurant
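
Note that categories with a frequency of zero still fill the lower ranks, so only the first few columns carry information. Central Harlem, for example, has a sushi frequency of 1.0, yet nine other categories are listed after it. A sketch of a variant that keeps only non-zero categories is shown below; `top_nonzero_categories` is a hypothetical helper, and the dictionary shows just a few of the columns.

```python
def top_nonzero_categories(frequencies, num_top):
    """Return up to num_top categories, most frequent first, ignoring zeros."""
    nonzero = [(cat, freq) for cat, freq in frequencies.items() if freq > 0]
    nonzero.sort(key=lambda item: item[1], reverse=True)
    return [cat for cat, _ in nonzero[:num_top]]


# Central Harlem row from the frequency table (subset of columns)
central_harlem = {'Asian Restaurant': 0.0, 'Bakery': 0.0,
                  'Sushi Restaurant': 1.0, 'Vegetarian / Vegan Restaurant': 0.0}
top = top_nonzero_categories(central_harlem, 10)
print(top)
```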


#### Cluster Neighborhoods

In [23]:
# set number of clusters
kclusters = 5

manhattan_grouped_clustering = manhattan_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(manhattan_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 1, 2, 1, 0, 0, 1, 2, 0, 0], dtype=int32)

In [24]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

manhattan_merged = manhattan_data
manhattan_merged = manhattan_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

manhattan_merged.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Manhattan,Marble Hill,40.876551,-73.91066,2,Sushi Restaurant,Vegetarian / Vegan Restaurant,Indian Chinese Restaurant,Bakery,Chinese Restaurant,Cocktail Bar,Deli / Bodega,Fish Market,Grocery Store,Hawaiian Restaurant
1,Manhattan,Chinatown,40.715618,-73.994279,0,Sushi Restaurant,Japanese Restaurant,Indian Chinese Restaurant,Bakery,Chinese Restaurant,Cocktail Bar,Deli / Bodega,Fish Market,Grocery Store,Hawaiian Restaurant
2,Manhattan,Washington Heights,40.851903,-73.9369,2,Sushi Restaurant,Vegetarian / Vegan Restaurant,Indian Chinese Restaurant,Bakery,Chinese Restaurant,Cocktail Bar,Deli / Bodega,Fish Market,Grocery Store,Hawaiian Restaurant
3,Manhattan,Inwood,40.867684,-73.92121,2,Sushi Restaurant,Vegetarian / Vegan Restaurant,Indian Chinese Restaurant,Bakery,Chinese Restaurant,Cocktail Bar,Deli / Bodega,Fish Market,Grocery Store,Hawaiian Restaurant
4,Manhattan,Hamilton Heights,40.823604,-73.949688,2,Sushi Restaurant,Vegetarian / Vegan Restaurant,Indian Chinese Restaurant,Bakery,Chinese Restaurant,Cocktail Bar,Deli / Bodega,Fish Market,Grocery Store,Hawaiian Restaurant


In [25]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(manhattan_merged['Latitude'], manhattan_merged['Longitude'], manhattan_merged['Neighborhood'], manhattan_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
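
The marker colour is looked up with `rainbow[cluster-1]`, so cluster 0 uses index -1, which in Python wraps around to the last colour in the list. The sketch below spells out the resulting mapping. The colour names are approximate descriptions of five evenly spaced samples from matplotlib's `rainbow` colormap, not exact hex values.

```python
kclusters = 5

# approximate names of five evenly spaced rainbow colours, from low to high
palette = ['purple', 'blue', 'green', 'orange', 'red']

# same indexing as the map cell: index -1 wraps around to the last entry
cluster_colour = {cluster: palette[cluster - 1] for cluster in range(kclusters)}
print(cluster_colour)
```

Indexing with `rainbow[cluster]` instead would give the more conventional mapping in which cluster 0 takes the first colour.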

Cluster 0 is shown in red on the map.

In [26]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 0, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Chinatown,Sushi Restaurant,Japanese Restaurant,Indian Chinese Restaurant,Bakery,Chinese Restaurant,Cocktail Bar,Deli / Bodega,Fish Market,Grocery Store,Hawaiian Restaurant
11,Roosevelt Island,Sushi Restaurant,Asian Restaurant,Indian Chinese Restaurant,Bakery,Chinese Restaurant,Cocktail Bar,Deli / Bodega,Fish Market,Grocery Store,Hawaiian Restaurant
19,East Village,Sushi Restaurant,Japanese Restaurant,Vegetarian / Vegan Restaurant,Steakhouse,Hawaiian Restaurant,Bakery,Chinese Restaurant,Cocktail Bar,Deli / Bodega,Fish Market
21,Tribeca,Sushi Restaurant,Noodle House,Theme Restaurant,Vegetarian / Vegan Restaurant,Hawaiian Restaurant,Bakery,Chinese Restaurant,Cocktail Bar,Deli / Bodega,Fish Market
22,Little Italy,Sushi Restaurant,Japanese Restaurant,Noodle House,Indian Chinese Restaurant,Bakery,Chinese Restaurant,Cocktail Bar,Deli / Bodega,Fish Market,Grocery Store
23,Soho,Sushi Restaurant,Japanese Restaurant,Noodle House,Theme Restaurant,Hawaiian Restaurant,Bakery,Chinese Restaurant,Cocktail Bar,Deli / Bodega,Fish Market
28,Battery Park City,Sushi Restaurant,Japanese Restaurant,Noodle House,Theme Restaurant,Hawaiian Restaurant,Bakery,Chinese Restaurant,Cocktail Bar,Deli / Bodega,Fish Market
31,Noho,Sushi Restaurant,Japanese Restaurant,Indian Chinese Restaurant,Bakery,Chinese Restaurant,Cocktail Bar,Deli / Bodega,Fish Market,Grocery Store,Hawaiian Restaurant
32,Civic Center,Sushi Restaurant,Japanese Restaurant,Noodle House,Theme Restaurant,Hawaiian Restaurant,Bakery,Chinese Restaurant,Cocktail Bar,Deli / Bodega,Fish Market
33,Midtown South,Sushi Restaurant,Vegetarian / Vegan Restaurant,Ramen Restaurant,Bakery,Japanese Restaurant,Seafood Restaurant,Sake Bar,Smoothie Shop,Poke Place,Noodle House


Cluster 1 is shown in purple on the map.

In [27]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 1, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,Upper East Side,Sushi Restaurant,Japanese Restaurant,Asian Restaurant,Vegetarian / Vegan Restaurant,Hawaiian Restaurant,Bakery,Chinese Restaurant,Cocktail Bar,Deli / Bodega,Fish Market
9,Yorkville,Sushi Restaurant,Japanese Restaurant,Indian Chinese Restaurant,Vegetarian / Vegan Restaurant,Asian Restaurant,Smoothie Shop,Seafood Restaurant,Sake Bar,Ramen Restaurant,Poke Place
10,Lenox Hill,Sushi Restaurant,Japanese Restaurant,Asian Restaurant,Vegetarian / Vegan Restaurant,Hawaiian Restaurant,Bakery,Chinese Restaurant,Cocktail Bar,Deli / Bodega,Fish Market
13,Lincoln Square,Sushi Restaurant,Japanese Restaurant,Smoothie Shop,Chinese Restaurant,Grocery Store,Indian Chinese Restaurant,Bakery,Cocktail Bar,Deli / Bodega,Fish Market
14,Clinton,Sushi Restaurant,Japanese Restaurant,Seafood Restaurant,Chinese Restaurant,Cocktail Bar,Asian Restaurant,Sake Bar,Ramen Restaurant,Poke Place,Noodle House
15,Midtown,Sushi Restaurant,Seafood Restaurant,Vegetarian / Vegan Restaurant,Bakery,Ramen Restaurant,Japanese Restaurant,Sake Bar,Smoothie Shop,Poke Place,Noodle House
16,Murray Hill,Sushi Restaurant,Japanese Restaurant,Ramen Restaurant,Bakery,Vegetarian / Vegan Restaurant,Asian Restaurant,Seafood Restaurant,Sake Bar,Smoothie Shop,Poke Place
17,Chelsea,Sushi Restaurant,Japanese Restaurant,Asian Restaurant,Vegetarian / Vegan Restaurant,Fish Market,Hawaiian Restaurant,Bakery,Chinese Restaurant,Cocktail Bar,Deli / Bodega
20,Lower East Side,Sushi Restaurant,Japanese Restaurant,Indian Chinese Restaurant,Bakery,Chinese Restaurant,Cocktail Bar,Deli / Bodega,Fish Market,Grocery Store,Hawaiian Restaurant
24,West Village,Sushi Restaurant,Japanese Restaurant,Sake Bar,Fish Market,Vegetarian / Vegan Restaurant,Asian Restaurant,Seafood Restaurant,Smoothie Shop,Ramen Restaurant,Poke Place


Cluster 2 is shown in blue on the map.

In [28]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 2, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
12,Upper West Side,Sushi Restaurant,Japanese Restaurant,Grocery Store,Asian Restaurant,Smoothie Shop,Seafood Restaurant,Sake Bar,Ramen Restaurant,Poke Place,Noodle House


Cluster 3 is shown in green on the map.

In [29]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 3, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Marble Hill,Sushi Restaurant,Vegetarian / Vegan Restaurant,Indian Chinese Restaurant,Bakery,Chinese Restaurant,Cocktail Bar,Deli / Bodega,Fish Market,Grocery Store,Hawaiian Restaurant
2,Washington Heights,Sushi Restaurant,Vegetarian / Vegan Restaurant,Indian Chinese Restaurant,Bakery,Chinese Restaurant,Cocktail Bar,Deli / Bodega,Fish Market,Grocery Store,Hawaiian Restaurant
3,Inwood,Sushi Restaurant,Vegetarian / Vegan Restaurant,Indian Chinese Restaurant,Bakery,Chinese Restaurant,Cocktail Bar,Deli / Bodega,Fish Market,Grocery Store,Hawaiian Restaurant
4,Hamilton Heights,Sushi Restaurant,Vegetarian / Vegan Restaurant,Indian Chinese Restaurant,Bakery,Chinese Restaurant,Cocktail Bar,Deli / Bodega,Fish Market,Grocery Store,Hawaiian Restaurant
5,Manhattanville,Sushi Restaurant,Vegetarian / Vegan Restaurant,Indian Chinese Restaurant,Bakery,Chinese Restaurant,Cocktail Bar,Deli / Bodega,Fish Market,Grocery Store,Hawaiian Restaurant
6,Central Harlem,Sushi Restaurant,Vegetarian / Vegan Restaurant,Indian Chinese Restaurant,Bakery,Chinese Restaurant,Cocktail Bar,Deli / Bodega,Fish Market,Grocery Store,Hawaiian Restaurant
7,East Harlem,Sushi Restaurant,Vegetarian / Vegan Restaurant,Indian Chinese Restaurant,Bakery,Chinese Restaurant,Cocktail Bar,Deli / Bodega,Fish Market,Grocery Store,Hawaiian Restaurant
18,Greenwich Village,Sushi Restaurant,Vegetarian / Vegan Restaurant,Indian Chinese Restaurant,Bakery,Chinese Restaurant,Cocktail Bar,Deli / Bodega,Fish Market,Grocery Store,Hawaiian Restaurant


Cluster 4 is shown in orange on the map.

In [30]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 4, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
25,Manhattan Valley,Sushi Restaurant,Japanese Restaurant,Hawaiian Restaurant,Indian Chinese Restaurant,Bakery,Chinese Restaurant,Cocktail Bar,Deli / Bodega,Fish Market,Grocery Store
26,Morningside Heights,Sushi Restaurant,Japanese Restaurant,Hawaiian Restaurant,Indian Chinese Restaurant,Bakery,Chinese Restaurant,Cocktail Bar,Deli / Bodega,Fish Market,Grocery Store


### Results section:

The results of the exploratory data analysis and clustering are summarized below:
- The data show that sushi restaurants are a very common venue in Manhattan.
- After clustering the neighborhoods of the borough, we find that cluster 1 (purple on the map) has the largest number of sushi restaurants.
- Cluster 0 (red on the map) has the second-largest number of sushi restaurants.

- In contrast, cluster 3 (green on the map) and cluster 4 (orange on the map) have the fewest sushi restaurants.
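
As a rough check of these statements, the venue counts can be averaged per cluster. The sketch below uses only the neighborhoods that appear both in the count table and in the cluster listings of this report, so it covers a small subset of Manhattan, and counts of 30 are capped by the request limit.

```python
from collections import defaultdict

# (cluster label, venue count) for neighborhoods listed in both tables
observed = {
    'Battery Park City': (0, 22), 'Chinatown': (0, 24),
    'Civic Center': (0, 30), 'East Village': (0, 29),
    'Chelsea': (1, 30), 'Clinton': (1, 30),
    'Central Harlem': (3, 4), 'East Harlem': (3, 4),
}

per_cluster = defaultdict(list)
for cluster, count in observed.values():
    per_cluster[cluster].append(count)

# mean number of returned venues per neighborhood in each cluster
mean_counts = {cluster: sum(c) / len(c) for cluster, c in sorted(per_cluster.items())}
print(mean_counts)
```

On this subset, cluster 1 has the highest average, followed by cluster 0, with cluster 3 far behind.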


### Discussion:

After analyzing the results, we see that the area from Central Park to the Manhattan Bridge (the south side of Manhattan) is a tourist and financial district, which is why more sushi restaurants are found there. This is the most important zone of Manhattan, and it contains the purple and red clusters, the two clusters with the most sushi restaurants.
The rest of Manhattan, from Central Park to Inwood Hill Park (the north side of Manhattan), is a residential area with fewer businesses than the southern tourist zone.
Based on the results of this project, an investor should open a sushi restaurant in the area of the purple cluster or, if that is not affordable, in the red cluster. Together these clusters cover the tourist and financial area, where the most consumption is generated.
 


### Conclusion:
To conclude, this project gave a small glimpse of what a real-life data science project looks like. I used several common Python libraries to handle JSON files, plot maps, and carry out exploratory data analysis, and I used the Foursquare API to explore New York City neighborhoods. The potential of this kind of analysis for a real-life business problem was also discussed. As a final note, all of the above analysis depends on the adequacy and accuracy of the Foursquare data.
