# Peer-Graded Assignment: Week 5 Notebook

### Step 1: Setting up the environment by importing necessary libraries

Because this project follows the suggestions in the *Instructions* section of Week 4, we can import much the same libraries as in the segmenting and clustering lab in Week 3:

In [1]:
import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

#!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import urllib.request
import json # library to handle JSON files
from bs4 import BeautifulSoup
from urllib.request import urlopen
import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas DataFrame (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.pyplot as plt
import matplotlib.colors as colors
%matplotlib inline
from sklearn.cluster import KMeans # k-means clustering

#!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


### Step 2: Download and explore the New York City dataset

As explained in the *Data* section of the project, we focus our analysis on New York City, and later on the Manhattan borough in particular. The neighborhood data comes as a `.json` file; since the Week 3 lab already provides a link to it, we use the same file here.

In [2]:
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
print('Data downloaded!')

with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

Data downloaded!


Since the file contains a large amount of data, we do not display it in full here but use it for further manipulation.
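
To get a feel for the file without printing it in full, it can help to look at the top-level keys and at a single feature. The sketch below uses a tiny hand-written sample with the keys this notebook relies on (`features`, `properties`, `geometry`). `sample_data` is a hypothetical stand-in for `newyork_data`, and its single entry reuses the Wakefield values from the first table in this notebook.

```python
import json

# hypothetical miniature stand-in for newyork_data (one feature only)
sample_data = json.loads("""
{
  "type": "FeatureCollection",
  "features": [
    {
      "properties": {"name": "Wakefield", "borough": "Bronx"},
      "geometry": {"type": "Point", "coordinates": [-73.847201, 40.894705]}
    }
  ]
}
""")

print(list(sample_data.keys()))        # top-level keys
print(len(sample_data['features']))    # number of neighborhoods in the sample

first = sample_data['features'][0]
lon, lat = first['geometry']['coordinates']   # stored as [longitude, latitude]
print(first['properties']['name'], first['properties']['borough'], lat, lon)
```

The same three `print` calls can be run on `newyork_data` itself to confirm its layout before building the DataFrame.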

##### Transforming the given data into a *pandas* DataFrame

In [3]:
neighborhoods_data = newyork_data['features']
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude']

# collect one row per neighborhood (DataFrame.append was removed in pandas 2.0)
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # coordinates are stored as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

# build the dataframe in one step
neighborhoods = pd.DataFrame(rows, columns=column_names)

In [4]:
neighborhoods.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


The output above shows the first five neighborhoods in the New York City dataset. The first task in this project is to display the neighborhoods on a map, to get an idea of their geographical locations, and then to focus on the neighborhoods in Manhattan. We obtain the coordinates of New York City with the **geopy** library (its `Nominatim` geocoder) as follows:

In [5]:
address = 'New York City, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of New York City are 40.7127281, -74.0060152.
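
geopy's `geocode` returns `None` when no match is found, in which case reading `location.latitude` would fail. A minimal sketch of a guard is shown below. `geocode_or_default` and `FakeGeolocator` are hypothetical names introduced only for illustration, and the fake geocoder stands in for `Nominatim` so that the example runs offline.

```python
def geocode_or_default(geolocator, address, default):
    """Return (latitude, longitude) for address, or default if nothing is found."""
    location = geolocator.geocode(address)
    if location is None:          # geopy returns None when there is no match
        return default
    return (location.latitude, location.longitude)


# hypothetical stand-in for a geopy geocoder, so the sketch runs offline
class FakeGeolocator:
    def geocode(self, address):
        return None               # simulate "address not found"


nyc_default = (40.7127281, -74.0060152)   # the coordinates printed above
print(geocode_or_default(FakeGeolocator(), 'New York City, NY', nyc_default))
```

In the notebook the first argument would be the real `geolocator` object.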


Since we are more concerned with the Manhattan borough, we will display the neighborhoods of Manhattan on the map of New York City as a whole. 

In [6]:
man_data = neighborhoods[neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)
man_data.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Manhattan,Marble Hill,40.876551,-73.91066
1,Manhattan,Chinatown,40.715618,-73.994279
2,Manhattan,Washington Heights,40.851903,-73.9369
3,Manhattan,Inwood,40.867684,-73.92121
4,Manhattan,Hamilton Heights,40.823604,-73.949688


In [7]:
# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(man_data['Latitude'], man_data['Longitude'], man_data['Borough'], man_data['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_newyork)
    
map_newyork

### Step 3: Extracting data from the Foursquare API database

In [8]:
# Create Foursquare credentials for the project
CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20181020' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_CLIENT_ID
CLIENT_SECRET: YOUR_CLIENT_SECRET


We now turn to the second dataset described in the *Data* section: the Foursquare venue data, which contains information on the Indian restaurants in the surrounding areas. The relevant references are:

--> Foursquare Venue Categories - https://developer.foursquare.com/docs/resources/categories 

--> Indian Restaurant category ID - 4bf58dd8d48988d10f941735 
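
As an illustration of how the category ID enters the request, the sketch below assembles a search URL from a dictionary of query parameters using only the standard library. The parameter names are the ones used in this notebook's request URL. `build_search_url` is a hypothetical helper, the credentials are placeholders, and no request is actually sent.

```python
from urllib.parse import urlencode


def build_search_url(client_id, client_secret, version, lat, lng,
                     radius, limit, category_id=None):
    """Build a venues/search URL (hypothetical helper, for illustration only)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    if category_id:
        params['categoryId'] = category_id   # restrict results to one category
    return 'https://api.foursquare.com/v2/venues/search?' + urlencode(params)


url = build_search_url('YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET', '20181020',
                       40.876551, -73.91066, radius=1000, limit=500,
                       category_id='4bf58dd8d48988d10f941735')
print(url)
```

`urlencode` percent-encodes special characters, so the comma in the `ll` value appears as `%2C` in the printed URL.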


In [9]:
def getNearbyVenues(names, latitudes, longitudes, radius=5000, categoryIds=''):
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        # create the API request URL (LIMIT is a global set before the function is called)
        url = 'https://api.foursquare.com/v2/venues/search?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, VERSION, lat, lng, radius, LIMIT)

        # restrict the search to the given category, if any
        if categoryIds != '':
            url = url + '&categoryId={}'.format(categoryIds)

        # make the GET request
        response = requests.get(url).json()
        try:
            results = response['response']['venues']
        except (KeyError, TypeError):
            # unexpected response: report it and move on to the next neighborhood
            print(url)
            print(response)
            continue

        # keep only the relevant information for each venue that has a category
        for v in results:
            if not v.get('categories'):
                continue
            venues_list.append((
                name,
                lat,
                lng,
                v['name'],
                v['location']['lat'],
                v['location']['lng'],
                v['categories'][0]['name']
            ))

    nearby_venues = pd.DataFrame(venues_list, columns=[
        'Neighborhood',
        'Neighborhood Latitude',
        'Neighborhood Longitude',
        'Venue',
        'Venue Latitude',
        'Venue Longitude',
        'Venue Category'])

    return nearby_venues

In [10]:
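LIMIT = 500 # maximum number of venues requested per call
radius = 5000 # not used below: the call passes radius=1000 explicitly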

In [11]:
neighborhoods = neighborhoods[neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)
newyork_venues_ind = getNearbyVenues(names=neighborhoods['Neighborhood'], latitudes=neighborhoods['Latitude'], longitudes=neighborhoods['Longitude'], radius=1000, categoryIds='4bf58dd8d48988d10f941735')
newyork_venues_ind.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Marble Hill,40.876551,-73.91066,Riverdale Indian Cuisine,40.880886,-73.9088,Indian Restaurant
1,Marble Hill,40.876551,-73.91066,Tazmohol Indian Restaurant,40.879331,-73.903192,Indian Restaurant
2,Marble Hill,40.876551,-73.91066,Cumin Indian Cuisine,40.886459,-73.909816,Indian Restaurant
3,Chinatown,40.715618,-73.994279,Kabab Bites,40.720094,-73.995819,Indian Restaurant
4,Chinatown,40.715618,-73.994279,indi thai,40.71983,-73.99035,Indian Restaurant


In [12]:
newyork_venues_ind.shape

(1090, 7)
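
Because each neighborhood is searched with its own radius and neighboring centers can lie close together, the same restaurant may be returned for several neighborhoods, so the number of rows is not necessarily the number of distinct restaurants. A minimal sketch of how this could be checked is shown below. It uses a small made-up table with the same column names; the rows are illustrative, not real results, and identifying a venue by name plus coordinates is an assumption.

```python
import pandas as pd

# made-up rows in the same layout as newyork_venues_ind
toy_venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'B', 'B'],
    'Venue': ['Spice House', 'Curry Corner', 'Spice House', 'Dosa Hut'],
    'Venue Latitude': [40.71, 40.72, 40.71, 40.73],
    'Venue Longitude': [-74.00, -74.01, -74.00, -74.02],
})

# a venue is identified here by its name and coordinates (an assumption)
venue_key = ['Venue', 'Venue Latitude', 'Venue Longitude']
n_rows = len(toy_venues)
n_distinct = len(toy_venues.drop_duplicates(subset=venue_key))
print(n_rows, n_distinct)
```

Applied to `newyork_venues_ind`, a gap between the two numbers would indicate how many rows are repeats of a venue already listed under another neighborhood.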

Adding these restaurants to a new map of New York:

In [13]:
def addToMap(df, color, existingMap):
    for lat, lng, local, venue, venueCat in zip(df['Venue Latitude'], df['Venue Longitude'], df['Neighborhood'], df['Venue'], df['Venue Category']):
        label = '{} ({}) - {}'.format(venue, venueCat, local)
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            color=color,
            fill=True,
            fill_color=color,
            fill_opacity=0.7).add_to(existingMap)

In [14]:
map_newyork_ind = folium.Map(location=[latitude, longitude], zoom_start=10)
addToMap(newyork_venues_ind, 'red', map_newyork_ind)

map_newyork_ind

In [15]:
man_ind=newyork_venues_ind.groupby('Neighborhood').count()
man_ind

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Battery Park City,25,25,25,25,25,25
Carnegie Hill,13,13,13,13,13,13
Central Harlem,10,10,10,10,10,10
Chelsea,17,17,17,17,17,17
Chinatown,25,25,25,25,25,25
Civic Center,30,30,30,30,30,30
Clinton,45,45,45,45,45,45
East Harlem,8,8,8,8,8,8
East Village,47,47,47,47,47,47
Financial District,26,26,26,26,26,26


In [16]:
print('There are {} unique categories.'.format(len(newyork_venues_ind['Venue Category'].unique())))

There are 16 unique categories.


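This means that the venues returned by the Indian-restaurant search fall under 16 different primary Foursquare categories. Some are varieties of Indian restaurant (for example North Indian Restaurant, South Indian Restaurant, or Dosa Place), while others (such as Food Truck or Deli / Bodega) are other venue types that presumably also serve Indian food.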
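
To see how the venues are distributed over these categories, rather than only how many categories there are, `value_counts` can be used. The sketch below runs on a small made-up series; in the notebook the input would be `newyork_venues_ind['Venue Category']`.

```python
import pandas as pd

# made-up category labels, standing in for newyork_venues_ind['Venue Category']
toy_categories = pd.Series(
    ['Indian Restaurant', 'Indian Restaurant', 'Food Truck',
     'Indian Restaurant', 'Dosa Place'],
    name='Venue Category')

category_counts = toy_categories.value_counts()   # venues per category, most common first
print(category_counts)
print('unique categories:', toy_categories.nunique())
```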

### Step 4: Analyze each neighborhood

The next step analyzes how frequently each venue category occurs in each neighborhood. It is similar to the analysis performed in the Week 3 lab.


In [17]:
# one hot encoding
manhattan_onehot = pd.get_dummies(newyork_venues_ind[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
manhattan_onehot['Neighborhood'] = newyork_venues_ind['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [manhattan_onehot.columns[-1]] + list(manhattan_onehot.columns[:-1])
manhattan_onehot = manhattan_onehot[fixed_columns]

manhattan_onehot.head()

Unnamed: 0,Neighborhood,Asian Restaurant,Burrito Place,Chaat Place,Deli / Bodega,Diner,Dosa Place,Food Truck,Himalayan Restaurant,Hookah Bar,Indian Chinese Restaurant,Indian Restaurant,North Indian Restaurant,Pakistani Restaurant,South Indian Restaurant,Tapas Restaurant,Vegetarian / Vegan Restaurant
0,Marble Hill,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0
1,Marble Hill,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0
2,Marble Hill,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0
3,Chinatown,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0
4,Chinatown,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0


In [18]:
manhattan_onehot.shape

(1090, 17)

Now let's calculate the frequency of occurrence of each category by grouping the rows by neighborhood and taking the mean of each column. This gives the share of a neighborhood's venues that fall into each category.

In [19]:
manhattan_grouped = manhattan_onehot.groupby('Neighborhood').mean().reset_index()
manhattan_grouped

Unnamed: 0,Neighborhood,Asian Restaurant,Burrito Place,Chaat Place,Deli / Bodega,Diner,Dosa Place,Food Truck,Himalayan Restaurant,Hookah Bar,Indian Chinese Restaurant,Indian Restaurant,North Indian Restaurant,Pakistani Restaurant,South Indian Restaurant,Tapas Restaurant,Vegetarian / Vegan Restaurant
0,Battery Park City,0.04,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.84,0.0,0.0,0.0,0.04,0.0
1,Carnegie Hill,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.923077,0.076923,0.0,0.0,0.0,0.0
2,Central Harlem,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
3,Chelsea,0.0,0.0,0.0,0.0,0.0,0.0,0.117647,0.0,0.0,0.058824,0.764706,0.0,0.0,0.0,0.0,0.058824
4,Chinatown,0.0,0.0,0.0,0.04,0.0,0.04,0.0,0.0,0.0,0.0,0.88,0.04,0.0,0.0,0.0,0.0
5,Civic Center,0.033333,0.0,0.0,0.0,0.0,0.033333,0.1,0.0,0.0,0.0,0.8,0.0,0.0,0.0,0.033333,0.0
6,Clinton,0.0,0.0,0.0,0.0,0.022222,0.0,0.111111,0.0,0.0,0.0,0.844444,0.0,0.0,0.022222,0.0,0.0
7,East Harlem,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0
8,East Village,0.0,0.0,0.021277,0.042553,0.0,0.042553,0.021277,0.0,0.0,0.0,0.829787,0.021277,0.0,0.0,0.0,0.021277
9,Financial District,0.038462,0.0,0.0,0.0,0.0,0.0,0.153846,0.0,0.0,0.0,0.769231,0.0,0.0,0.0,0.038462,0.0


In [20]:
manhattan_grouped.shape

(39, 17)
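
Note that the mean of one-hot columns is a share (a fraction between 0 and 1), not a count. The toy example below, with made-up neighborhoods `A` and `B`, shows the difference: `B` has a share of 1.0 for Indian Restaurant although it contains only a single venue.

```python
import pandas as pd

# made-up venues: four in neighborhood A, one in neighborhood B
toy = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B'],
    'Venue Category': ['Indian Restaurant', 'Indian Restaurant',
                       'Indian Restaurant', 'Food Truck', 'Indian Restaurant'],
})

# one hot encoding, as in the notebook
onehot = pd.get_dummies(toy[['Venue Category']], prefix='', prefix_sep='').astype(int)
onehot.insert(0, 'Neighborhood', toy['Neighborhood'])

shares = onehot.groupby('Neighborhood').mean()   # fraction of venues per category
counts = onehot.groupby('Neighborhood').sum()    # absolute number of venues per category
print(shares)
print(counts)
```

Looking at `counts` alongside `shares` helps avoid reading a high share in a neighborhood with very few venues as a sign of many restaurants.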

Now let's print, for each neighborhood, the five most frequent venue categories together with their shares.

In [21]:
num_top_venues = 5

for hood in manhattan_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = manhattan_grouped[manhattan_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Battery Park City----
               venue  freq
0  Indian Restaurant  0.84
1         Food Truck  0.08
2   Asian Restaurant  0.04
3   Tapas Restaurant  0.04
4      Burrito Place  0.00


----Carnegie Hill----
                     venue  freq
0        Indian Restaurant  0.92
1  North Indian Restaurant  0.08
2         Asian Restaurant  0.00
3            Burrito Place  0.00
4              Chaat Place  0.00


----Central Harlem----
               venue  freq
0  Indian Restaurant   1.0
1   Asian Restaurant   0.0
2      Burrito Place   0.0
3        Chaat Place   0.0
4      Deli / Bodega   0.0


----Chelsea----
                           venue  freq
0              Indian Restaurant  0.76
1                     Food Truck  0.12
2      Indian Chinese Restaurant  0.06
3  Vegetarian / Vegan Restaurant  0.06
4               Asian Restaurant  0.00


----Chinatown----
                     venue  freq
0        Indian Restaurant  0.88
1            Deli / Bodega  0.04
2               Dosa Place  0.04

Putting the above data into a pandas dataframe we have the following details:


In [22]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [23]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = manhattan_grouped['Neighborhood']

for ind in np.arange(manhattan_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(manhattan_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Battery Park City,Indian Restaurant,Food Truck,Tapas Restaurant,Asian Restaurant,Vegetarian / Vegan Restaurant,South Indian Restaurant,Pakistani Restaurant,North Indian Restaurant,Indian Chinese Restaurant,Hookah Bar
1,Carnegie Hill,Indian Restaurant,North Indian Restaurant,Vegetarian / Vegan Restaurant,Tapas Restaurant,South Indian Restaurant,Pakistani Restaurant,Indian Chinese Restaurant,Hookah Bar,Himalayan Restaurant,Food Truck
2,Central Harlem,Indian Restaurant,Vegetarian / Vegan Restaurant,Tapas Restaurant,South Indian Restaurant,Pakistani Restaurant,North Indian Restaurant,Indian Chinese Restaurant,Hookah Bar,Himalayan Restaurant,Food Truck
3,Chelsea,Indian Restaurant,Food Truck,Vegetarian / Vegan Restaurant,Indian Chinese Restaurant,Tapas Restaurant,South Indian Restaurant,Pakistani Restaurant,North Indian Restaurant,Hookah Bar,Himalayan Restaurant
4,Chinatown,Indian Restaurant,North Indian Restaurant,Dosa Place,Deli / Bodega,Vegetarian / Vegan Restaurant,Tapas Restaurant,South Indian Restaurant,Pakistani Restaurant,Indian Chinese Restaurant,Hookah Bar
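
The column labels above rely on a three-element `indicators` list with a fallback to `th`. That is enough for ten columns but would produce labels such as `21th` for larger values of `num_top_venues`. A more general helper could look like the sketch below; `ordinal` is a hypothetical function, not part of the original notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:       # 11th, 12th, 13th, ...
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


# same column labels as in the notebook, built with the helper
columns = ['Neighborhood'] + ['{} Most Common Venue'.format(ordinal(i)) for i in range(1, 11)]
print(columns[:4])
```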


### Step 5: Clustering the Neighborhoods

In [44]:
# set number of clusters
kclusters = 5

manhattan_grouped_clustering = manhattan_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(manhattan_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([2, 0, 1, 2, 0, 2, 2, 1, 4, 2], dtype=int32)
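
The number of clusters is fixed at 5 above. One common way to sanity-check such a choice is to compare silhouette scores for several values of k. The sketch below does this on synthetic, well-separated points rather than on the notebook's data, so the numbers it prints say nothing about Manhattan. In the notebook, `X` would be replaced by `manhattan_grouped_clustering`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# synthetic data: three well-separated groups of points (not the notebook's data)
rng = np.random.RandomState(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + 0.3 * rng.randn(30, 2) for c in centers])

scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)   # higher means better-separated clusters

best_k = max(scores, key=scores.get)
print(scores)
print('best k:', best_k)
```

The silhouette score is only one heuristic; with few neighborhoods and features dominated by a single category, the scores for different k may be close together.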

In [46]:
# add clustering labels (run once: insert raises ValueError if the column already exists)
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

manhattan_merged = man_data

# join man_data with neighborhoods_venues_sorted to add the cluster label and top venues for each neighborhood
manhattan_merged = manhattan_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

manhattan_merged.head() # check the last columns!

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Manhattan,Marble Hill,40.876551,-73.91066,1.0,Indian Restaurant,Vegetarian / Vegan Restaurant,Tapas Restaurant,South Indian Restaurant,Pakistani Restaurant,North Indian Restaurant,Indian Chinese Restaurant,Hookah Bar,Himalayan Restaurant,Food Truck
1,Manhattan,Chinatown,40.715618,-73.994279,0.0,Indian Restaurant,North Indian Restaurant,Dosa Place,Deli / Bodega,Vegetarian / Vegan Restaurant,Tapas Restaurant,South Indian Restaurant,Pakistani Restaurant,Indian Chinese Restaurant,Hookah Bar
2,Manhattan,Washington Heights,40.851903,-73.9369,1.0,Indian Restaurant,Vegetarian / Vegan Restaurant,Tapas Restaurant,South Indian Restaurant,Pakistani Restaurant,North Indian Restaurant,Indian Chinese Restaurant,Hookah Bar,Himalayan Restaurant,Food Truck
3,Manhattan,Inwood,40.867684,-73.92121,,,,,,,,,,,
4,Manhattan,Hamilton Heights,40.823604,-73.949688,1.0,Indian Restaurant,Vegetarian / Vegan Restaurant,Tapas Restaurant,South Indian Restaurant,Pakistani Restaurant,North Indian Restaurant,Indian Chinese Restaurant,Hookah Bar,Himalayan Restaurant,Food Truck


In [47]:
manhattan_merged=manhattan_merged.dropna(axis=0)
manhattan_merged=manhattan_merged.reset_index(drop=True)
manhattan_merged.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Manhattan,Marble Hill,40.876551,-73.91066,1.0,Indian Restaurant,Vegetarian / Vegan Restaurant,Tapas Restaurant,South Indian Restaurant,Pakistani Restaurant,North Indian Restaurant,Indian Chinese Restaurant,Hookah Bar,Himalayan Restaurant,Food Truck
1,Manhattan,Chinatown,40.715618,-73.994279,0.0,Indian Restaurant,North Indian Restaurant,Dosa Place,Deli / Bodega,Vegetarian / Vegan Restaurant,Tapas Restaurant,South Indian Restaurant,Pakistani Restaurant,Indian Chinese Restaurant,Hookah Bar
2,Manhattan,Washington Heights,40.851903,-73.9369,1.0,Indian Restaurant,Vegetarian / Vegan Restaurant,Tapas Restaurant,South Indian Restaurant,Pakistani Restaurant,North Indian Restaurant,Indian Chinese Restaurant,Hookah Bar,Himalayan Restaurant,Food Truck
3,Manhattan,Hamilton Heights,40.823604,-73.949688,1.0,Indian Restaurant,Vegetarian / Vegan Restaurant,Tapas Restaurant,South Indian Restaurant,Pakistani Restaurant,North Indian Restaurant,Indian Chinese Restaurant,Hookah Bar,Himalayan Restaurant,Food Truck
4,Manhattan,Manhattanville,40.816934,-73.957385,1.0,Indian Restaurant,Vegetarian / Vegan Restaurant,Tapas Restaurant,South Indian Restaurant,Pakistani Restaurant,North Indian Restaurant,Indian Chinese Restaurant,Hookah Bar,Himalayan Restaurant,Food Truck


In [48]:
manhattan_merged.shape

(39, 15)

In [49]:
manhattan_merged['Cluster Labels']=manhattan_merged['Cluster Labels'].astype(int)
manhattan_merged.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Manhattan,Marble Hill,40.876551,-73.91066,1,Indian Restaurant,Vegetarian / Vegan Restaurant,Tapas Restaurant,South Indian Restaurant,Pakistani Restaurant,North Indian Restaurant,Indian Chinese Restaurant,Hookah Bar,Himalayan Restaurant,Food Truck
1,Manhattan,Chinatown,40.715618,-73.994279,0,Indian Restaurant,North Indian Restaurant,Dosa Place,Deli / Bodega,Vegetarian / Vegan Restaurant,Tapas Restaurant,South Indian Restaurant,Pakistani Restaurant,Indian Chinese Restaurant,Hookah Bar
2,Manhattan,Washington Heights,40.851903,-73.9369,1,Indian Restaurant,Vegetarian / Vegan Restaurant,Tapas Restaurant,South Indian Restaurant,Pakistani Restaurant,North Indian Restaurant,Indian Chinese Restaurant,Hookah Bar,Himalayan Restaurant,Food Truck
3,Manhattan,Hamilton Heights,40.823604,-73.949688,1,Indian Restaurant,Vegetarian / Vegan Restaurant,Tapas Restaurant,South Indian Restaurant,Pakistani Restaurant,North Indian Restaurant,Indian Chinese Restaurant,Hookah Bar,Himalayan Restaurant,Food Truck
4,Manhattan,Manhattanville,40.816934,-73.957385,1,Indian Restaurant,Vegetarian / Vegan Restaurant,Tapas Restaurant,South Indian Restaurant,Pakistani Restaurant,North Indian Restaurant,Indian Chinese Restaurant,Hookah Bar,Himalayan Restaurant,Food Truck
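
The last three cells follow from how the join works: it keeps every row of the left table, so a neighborhood with no match on the right (Inwood in the output above) gets `NaN` in the new columns, and the presence of `NaN` turns the integer labels into floats. A minimal sketch with two rows taken from the tables above:

```python
import pandas as pd

left = pd.DataFrame({'Neighborhood': ['Marble Hill', 'Inwood'],
                     'Latitude': [40.876551, 40.867684]})
right = pd.DataFrame({'Neighborhood': ['Marble Hill'],
                      'Cluster Labels': [1]})

# left join: every row of `left` is kept, unmatched rows get NaN
merged = left.join(right.set_index('Neighborhood'), on='Neighborhood')
print(merged['Cluster Labels'].dtype)   # float, because NaN cannot be stored as int

# drop the unmatched row, then restore integer labels
cleaned = merged.dropna(axis=0).reset_index(drop=True)
cleaned['Cluster Labels'] = cleaned['Cluster Labels'].astype(int)
print(cleaned)
```

Integer labels matter later because they are used to index the list of colors when drawing the cluster map.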


In [50]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(manhattan_merged['Latitude'], manhattan_merged['Longitude'], manhattan_merged['Neighborhood'], manhattan_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Step 6: Examine the cluster contents


In this section we examine the neighborhoods in each cluster, one at a time. The first cluster is shown below:

**Cluster 1**

In [51]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 0, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Chinatown,Indian Restaurant,North Indian Restaurant,Dosa Place,Deli / Bodega,Vegetarian / Vegan Restaurant,Tapas Restaurant,South Indian Restaurant,Pakistani Restaurant,Indian Chinese Restaurant,Hookah Bar
7,Upper East Side,Indian Restaurant,North Indian Restaurant,Vegetarian / Vegan Restaurant,Tapas Restaurant,South Indian Restaurant,Pakistani Restaurant,Indian Chinese Restaurant,Hookah Bar,Himalayan Restaurant,Food Truck
8,Yorkville,Indian Restaurant,North Indian Restaurant,Vegetarian / Vegan Restaurant,Tapas Restaurant,South Indian Restaurant,Pakistani Restaurant,Indian Chinese Restaurant,Hookah Bar,Himalayan Restaurant,Food Truck
9,Lenox Hill,Indian Restaurant,North Indian Restaurant,Asian Restaurant,Vegetarian / Vegan Restaurant,Tapas Restaurant,South Indian Restaurant,Pakistani Restaurant,Indian Chinese Restaurant,Hookah Bar,Himalayan Restaurant
10,Roosevelt Island,Indian Restaurant,Asian Restaurant,Vegetarian / Vegan Restaurant,Tapas Restaurant,South Indian Restaurant,Pakistani Restaurant,North Indian Restaurant,Indian Chinese Restaurant,Hookah Bar,Himalayan Restaurant
11,Upper West Side,Indian Restaurant,South Indian Restaurant,Vegetarian / Vegan Restaurant,Tapas Restaurant,Pakistani Restaurant,North Indian Restaurant,Indian Chinese Restaurant,Hookah Bar,Himalayan Restaurant,Food Truck
12,Lincoln Square,Indian Restaurant,South Indian Restaurant,Food Truck,Vegetarian / Vegan Restaurant,Tapas Restaurant,Pakistani Restaurant,North Indian Restaurant,Indian Chinese Restaurant,Hookah Bar,Himalayan Restaurant
17,Greenwich Village,Indian Restaurant,North Indian Restaurant,Food Truck,Dosa Place,Vegetarian / Vegan Restaurant,Tapas Restaurant,South Indian Restaurant,Pakistani Restaurant,Indian Chinese Restaurant,Hookah Bar
19,Lower East Side,Indian Restaurant,Vegetarian / Vegan Restaurant,North Indian Restaurant,Deli / Bodega,Chaat Place,Tapas Restaurant,South Indian Restaurant,Pakistani Restaurant,Indian Chinese Restaurant,Hookah Bar
22,Soho,Indian Restaurant,Dosa Place,North Indian Restaurant,Food Truck,Deli / Bodega,Asian Restaurant,Vegetarian / Vegan Restaurant,Tapas Restaurant,South Indian Restaurant,Pakistani Restaurant


**Cluster 2**

In [52]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 1, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Marble Hill,Indian Restaurant,Vegetarian / Vegan Restaurant,Tapas Restaurant,South Indian Restaurant,Pakistani Restaurant,North Indian Restaurant,Indian Chinese Restaurant,Hookah Bar,Himalayan Restaurant,Food Truck
2,Washington Heights,Indian Restaurant,Vegetarian / Vegan Restaurant,Tapas Restaurant,South Indian Restaurant,Pakistani Restaurant,North Indian Restaurant,Indian Chinese Restaurant,Hookah Bar,Himalayan Restaurant,Food Truck
3,Hamilton Heights,Indian Restaurant,Vegetarian / Vegan Restaurant,Tapas Restaurant,South Indian Restaurant,Pakistani Restaurant,North Indian Restaurant,Indian Chinese Restaurant,Hookah Bar,Himalayan Restaurant,Food Truck
4,Manhattanville,Indian Restaurant,Vegetarian / Vegan Restaurant,Tapas Restaurant,South Indian Restaurant,Pakistani Restaurant,North Indian Restaurant,Indian Chinese Restaurant,Hookah Bar,Himalayan Restaurant,Food Truck
5,Central Harlem,Indian Restaurant,Vegetarian / Vegan Restaurant,Tapas Restaurant,South Indian Restaurant,Pakistani Restaurant,North Indian Restaurant,Indian Chinese Restaurant,Hookah Bar,Himalayan Restaurant,Food Truck
6,East Harlem,Indian Restaurant,Vegetarian / Vegan Restaurant,Tapas Restaurant,South Indian Restaurant,Pakistani Restaurant,North Indian Restaurant,Indian Chinese Restaurant,Hookah Bar,Himalayan Restaurant,Food Truck
24,Manhattan Valley,Indian Restaurant,Himalayan Restaurant,Vegetarian / Vegan Restaurant,Tapas Restaurant,South Indian Restaurant,Pakistani Restaurant,North Indian Restaurant,Indian Chinese Restaurant,Hookah Bar,Food Truck
25,Morningside Heights,Indian Restaurant,Himalayan Restaurant,Vegetarian / Vegan Restaurant,Tapas Restaurant,South Indian Restaurant,Pakistani Restaurant,North Indian Restaurant,Indian Chinese Restaurant,Hookah Bar,Food Truck


**Cluster 3**

In [53]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 2, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
13,Clinton,Indian Restaurant,Food Truck,South Indian Restaurant,Diner,Vegetarian / Vegan Restaurant,Tapas Restaurant,Pakistani Restaurant,North Indian Restaurant,Indian Chinese Restaurant,Hookah Bar
16,Chelsea,Indian Restaurant,Food Truck,Vegetarian / Vegan Restaurant,Indian Chinese Restaurant,Tapas Restaurant,South Indian Restaurant,Pakistani Restaurant,North Indian Restaurant,Hookah Bar,Himalayan Restaurant
27,Battery Park City,Indian Restaurant,Food Truck,Tapas Restaurant,Asian Restaurant,Vegetarian / Vegan Restaurant,South Indian Restaurant,Pakistani Restaurant,North Indian Restaurant,Indian Chinese Restaurant,Hookah Bar
28,Financial District,Indian Restaurant,Food Truck,Tapas Restaurant,Asian Restaurant,Vegetarian / Vegan Restaurant,South Indian Restaurant,Pakistani Restaurant,North Indian Restaurant,Indian Chinese Restaurant,Hookah Bar
31,Civic Center,Indian Restaurant,Food Truck,Tapas Restaurant,Dosa Place,Asian Restaurant,Vegetarian / Vegan Restaurant,South Indian Restaurant,Pakistani Restaurant,North Indian Restaurant,Indian Chinese Restaurant
38,Hudson Yards,Indian Restaurant,Food Truck,South Indian Restaurant,Diner,Vegetarian / Vegan Restaurant,Tapas Restaurant,Pakistani Restaurant,North Indian Restaurant,Indian Chinese Restaurant,Hookah Bar


**Cluster 4**

In [54]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 3, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
36,Stuyvesant Town,Indian Restaurant,Vegetarian / Vegan Restaurant,North Indian Restaurant,Dosa Place,Deli / Bodega,Chaat Place,Tapas Restaurant,South Indian Restaurant,Pakistani Restaurant,Indian Chinese Restaurant


**Cluster 5**

In [55]:
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 4, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
14,Midtown,Indian Restaurant,Food Truck,South Indian Restaurant,Pakistani Restaurant,North Indian Restaurant,Diner,Vegetarian / Vegan Restaurant,Tapas Restaurant,Indian Chinese Restaurant,Hookah Bar
15,Murray Hill,Indian Restaurant,South Indian Restaurant,Food Truck,Vegetarian / Vegan Restaurant,North Indian Restaurant,Diner,Tapas Restaurant,Pakistani Restaurant,Indian Chinese Restaurant,Hookah Bar
18,East Village,Indian Restaurant,Dosa Place,Deli / Bodega,Vegetarian / Vegan Restaurant,North Indian Restaurant,Food Truck,Chaat Place,Tapas Restaurant,South Indian Restaurant,Pakistani Restaurant
20,Tribeca,Indian Restaurant,Food Truck,Tapas Restaurant,Dosa Place,Asian Restaurant,Vegetarian / Vegan Restaurant,South Indian Restaurant,Pakistani Restaurant,North Indian Restaurant,Indian Chinese Restaurant
21,Little Italy,Indian Restaurant,Dosa Place,North Indian Restaurant,Food Truck,Deli / Bodega,Asian Restaurant,Vegetarian / Vegan Restaurant,Tapas Restaurant,South Indian Restaurant,Pakistani Restaurant
26,Gramercy,Indian Restaurant,Vegetarian / Vegan Restaurant,South Indian Restaurant,North Indian Restaurant,Hookah Bar,Food Truck,Deli / Bodega,Chaat Place,Burrito Place,Tapas Restaurant
30,Noho,Indian Restaurant,Dosa Place,Vegetarian / Vegan Restaurant,North Indian Restaurant,Food Truck,Deli / Bodega,Chaat Place,Tapas Restaurant,South Indian Restaurant,Pakistani Restaurant
32,Midtown South,Indian Restaurant,South Indian Restaurant,Food Truck,Vegetarian / Vegan Restaurant,Diner,Burrito Place,Tapas Restaurant,Pakistani Restaurant,North Indian Restaurant,Indian Chinese Restaurant
37,Flatiron,Indian Restaurant,South Indian Restaurant,Food Truck,Vegetarian / Vegan Restaurant,North Indian Restaurant,Hookah Bar,Diner,Deli / Bodega,Burrito Place,Tapas Restaurant
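
The ranked tables above show the order of categories but not their magnitudes. A compact complement is to average the category shares within each cluster, which yields one profile row per cluster. The sketch below uses made-up shares and labels. In the notebook, `toy_grouped` would correspond to `manhattan_grouped` and `toy_labels` to `kmeans.labels_`.

```python
import pandas as pd

# made-up category shares per neighborhood, in the layout of manhattan_grouped
toy_grouped = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C', 'D'],
    'Food Truck': [0.10, 0.20, 0.00, 0.00],
    'Indian Restaurant': [0.90, 0.80, 1.00, 1.00],
})
toy_labels = [2, 2, 1, 1]   # hypothetical cluster label for each row

# mean share of each category within each cluster
profile = (toy_grouped.drop(columns='Neighborhood')
           .assign(**{'Cluster Labels': toy_labels})
           .groupby('Cluster Labels')
           .mean())

# number of neighborhoods in each cluster
sizes = pd.Series(toy_labels).value_counts().sort_index()
print(profile)
print(sizes)
```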


From the category frequencies displayed above, we can compare the neighborhoods and see which venue categories rank as the *Most Common Venue* alongside Indian restaurants. Even if North Indian Restaurant is only the second or third most common venue in a particular neighborhood, this suggests a good chance of making a profit there, provided we have adequate resources to open our restaurant in that place. A more detailed summary and analysis is provided in the project report.