<a href="https://colab.research.google.com/github/Kavipriya01/Battle-Neighborhood/blob/Temp_codes/Segmenting_and_clustering_Neighborhoods_in_New_York_City.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

<h1 align=center><font size = 5>Segmenting and Clustering Neighborhoods in New York City</font></h1>

Converted addresses into their equivalent latitude and longitude values. Also, Used the **Foursquare API** to explore neighborhoods in New York City.Used the **explore** function to get the most common venue categories in each neighborhood, and then use this feature to group the neighborhoods into clusters. Used use the **_k_-means clustering** algorithm to complete this task. Finally,used the **Folium library** to visualize the neighborhoods in New York City and their emerging clusters.

Before we get the data and start exploring it, let's download all the dependencies that we will need.


In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes 
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


## Download and Explore Dataset

Neighborhood has a total of 5 boroughs and 306 neighborhoods. In order to segement the neighborhoods and explore them, we will essentially need a dataset that contains the 5 boroughs and the neighborhoods that exist in each borough as well as the the latitude and logitude coordinates of each neighborhood. 



In [2]:
!wget -q -O 'newyork_data.json' https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs/newyork_data.json
print('Data downloaded!')

Data downloaded!


#### Load and explore the data

Next, let's load the data.

In [3]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

In [None]:
newyork_data

{'bbox': [-74.2492599487305,
  40.5033187866211,
  -73.7061614990234,
  40.9105606079102],
 'crs': {'properties': {'name': 'urn:ogc:def:crs:EPSG::4326'}, 'type': 'name'},
 'features': [{'geometry': {'coordinates': [-73.84720052054902,
     40.89470517661],
    'type': 'Point'},
   'geometry_name': 'geom',
   'id': 'nyu_2451_34572.1',
   'properties': {'annoangle': 0.0,
    'annoline1': 'Wakefield',
    'annoline2': None,
    'annoline3': None,
    'bbox': [-73.84720052054902,
     40.89470517661,
     -73.84720052054902,
     40.89470517661],
    'borough': 'Bronx',
    'name': 'Wakefield',
    'stacked': 1},
   'type': 'Feature'},
  {'geometry': {'coordinates': [-73.82993910812398, 40.87429419303012],
    'type': 'Point'},
   'geometry_name': 'geom',
   'id': 'nyu_2451_34572.2',
   'properties': {'annoangle': 0.0,
    'annoline1': 'Co-op',
    'annoline2': 'City',
    'annoline3': None,
    'bbox': [-73.82993910812398,
     40.87429419303012,
     -73.82993910812398,
     40.874294193

Notice how all the relevant data is in the _features_ key, which is basically a list of the neighborhoods. So, let's define a new variable that includes this data.


In [4]:
neighborhoods_data = newyork_data['features']
neighborhoods_data

[{'geometry': {'coordinates': [-73.84720052054902, 40.89470517661],
   'type': 'Point'},
  'geometry_name': 'geom',
  'id': 'nyu_2451_34572.1',
  'properties': {'annoangle': 0.0,
   'annoline1': 'Wakefield',
   'annoline2': None,
   'annoline3': None,
   'bbox': [-73.84720052054902,
    40.89470517661,
    -73.84720052054902,
    40.89470517661],
   'borough': 'Bronx',
   'name': 'Wakefield',
   'stacked': 1},
  'type': 'Feature'},
 {'geometry': {'coordinates': [-73.82993910812398, 40.87429419303012],
   'type': 'Point'},
  'geometry_name': 'geom',
  'id': 'nyu_2451_34572.2',
  'properties': {'annoangle': 0.0,
   'annoline1': 'Co-op',
   'annoline2': 'City',
   'annoline3': None,
   'bbox': [-73.82993910812398,
    40.87429419303012,
    -73.82993910812398,
    40.87429419303012],
   'borough': 'Bronx',
   'name': 'Co-op City',
   'stacked': 2},
  'type': 'Feature'},
 {'geometry': {'coordinates': [-73.82780644716412, 40.887555677350775],
   'type': 'Point'},
  'geometry_name': 'geom',


In [5]:
neighborhoods_data[0]

{'geometry': {'coordinates': [-73.84720052054902, 40.89470517661],
  'type': 'Point'},
 'geometry_name': 'geom',
 'id': 'nyu_2451_34572.1',
 'properties': {'annoangle': 0.0,
  'annoline1': 'Wakefield',
  'annoline2': None,
  'annoline3': None,
  'bbox': [-73.84720052054902,
   40.89470517661,
   -73.84720052054902,
   40.89470517661],
  'borough': 'Bronx',
  'name': 'Wakefield',
  'stacked': 1},
 'type': 'Feature'}

#### Tranform the data into a _pandas_ dataframe

The next task is essentially transforming this data of nested Python dictionaries into a _pandas_ dataframe. So let's start by creating an empty dataframe.


In [6]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)
neighborhoods

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude


Then let's loop through the data and fill the dataframe one row at a time.

In [7]:
for data in neighborhoods_data:
    borough = neighborhood_name = data['properties']['borough'] 
    neighborhood_name = data['properties']['name']
        
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]
    
    neighborhoods = neighborhoods.append({'Borough': borough,
                                          'Neighborhood': neighborhood_name,
                                          'Latitude': neighborhood_lat,
                                          'Longitude': neighborhood_lon}, ignore_index=True)
neighborhoods.head()    

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


In [8]:
neighborhoods.shape # shape of the df 


(306, 4)

In [9]:
## make sure the df has 5 boroughs and 306 neighborhoods
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(neighborhoods['Borough'].unique()),
        neighborhoods.shape[0]
    )
)

The dataframe has 5 boroughs and 306 neighborhoods.


#### Use geopy library to get the latitude and longitude values of New York City.


In [10]:
address = 'New York City, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of New York City are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of New York City are 40.7127281, -74.0060152.


#### Create a map of New York with neighborhoods superimposed on top.


In [11]:
# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork

simplify the above map and segment and cluster only the neighborhoods in Manhattan. So let's slice the original dataframe and create a new dataframe of the Manhattan data.


In [12]:
manhattan_data = neighborhoods[neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)
manhattan_data.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Manhattan,Marble Hill,40.876551,-73.91066
1,Manhattan,Chinatown,40.715618,-73.994279
2,Manhattan,Washington Heights,40.851903,-73.9369
3,Manhattan,Inwood,40.867684,-73.92121
4,Manhattan,Hamilton Heights,40.823604,-73.949688


In [13]:
## geographical coordinates of manhatten data

address = 'Manhattan, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Manhattan are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Manhattan are 40.7896239, -73.9598939.


In [14]:
# create map of Manhattan using latitude and longitude values
map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_manhattan)  
    
map_manhattan


utilizing the Foursquare API to explore the neighborhoods and segment them.


#### Foursquare Credentials and Version


In [15]:
CLIENT_ID = 'F5PTHIVYMRU0KTRGJ15R43UGC2YQPPOTY2M5TVVKNCMFDG2A' # your Foursquare ID
CLIENT_SECRET = 'FQXS2CGW0YHTWSX55Z0HZD2VJWALC1OM1DEZXEXRMIDQG1VA' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: F5PTHIVYMRU0KTRGJ15R43UGC2YQPPOTY2M5TVVKNCMFDG2A
CLIENT_SECRET:FQXS2CGW0YHTWSX55Z0HZD2VJWALC1OM1DEZXEXRMIDQG1VA


Get the neighborhood's name,latitude and longitude values.


In [16]:
manhattan_data.loc[0, 'Neighborhood']

neighborhood_latitude = manhattan_data.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = manhattan_data.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = manhattan_data.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Marble Hill are 40.87655077879964, -73.91065965862981.


#### the top 100 venues that are in Marble Hill within a radius of 500 meters.


In [17]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius

# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
print(url) # display URL

## examine the results
results = requests.get(url).json()
print(results)

https://api.foursquare.com/v2/venues/explore?&client_id=F5PTHIVYMRU0KTRGJ15R43UGC2YQPPOTY2M5TVVKNCMFDG2A&client_secret=FQXS2CGW0YHTWSX55Z0HZD2VJWALC1OM1DEZXEXRMIDQG1VA&v=20180605&ll=40.87655077879964,-73.91065965862981&radius=500&limit=100
{'meta': {'code': 200, 'requestId': '608a3e59512b0e27322b4c23'}, 'response': {'suggestedFilters': {'header': 'Tap to show:', 'filters': [{'name': 'Open now', 'key': 'openNow'}]}, 'headerLocation': 'Marble Hill', 'headerFullLocation': 'Marble Hill, New York', 'headerLocationGranularity': 'neighborhood', 'totalResults': 23, 'suggestedBounds': {'ne': {'lat': 40.88105078329964, 'lng': -73.90471933917806}, 'sw': {'lat': 40.87205077429964, 'lng': -73.91659997808156}}, 'groups': [{'type': 'Recommended Places', 'name': 'recommended', 'items': [{'reasons': {'count': 0, 'items': [{'summary': 'This spot is popular', 'type': 'general', 'reasonName': 'globalInteractionReason'}]}, 'venue': {'id': '4baf59e8f964a520a6f93be3', 'name': 'Bikram Yoga', 'location': {'add

In [18]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [20]:
### json to dataframe


venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

print(nearby_venues.head())

## venues returned from foursquareAPI

print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

                               name   categories        lat        lng
0                       Bikram Yoga  Yoga Studio  40.876844 -73.906204
1                          Arturo's  Pizza Place  40.874412 -73.910271
2                     Tibbett Diner        Diner  40.880404 -73.908937
3  Astral Fitness & Wellness Center          Gym  40.876705 -73.906372
4                         Starbucks  Coffee Shop  40.877531 -73.905582
23 venues were returned by Foursquare.


  


## 2. Explore Neighborhoods in Manhattan


In [21]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

manhattan_venues = getNearbyVenues(names=manhattan_data['Neighborhood'],
                                   latitudes=manhattan_data['Latitude'],
                                   longitudes=manhattan_data['Longitude']
                                  )


Marble Hill
Chinatown
Washington Heights
Inwood
Hamilton Heights
Manhattanville
Central Harlem
East Harlem
Upper East Side
Yorkville
Lenox Hill
Roosevelt Island
Upper West Side
Lincoln Square
Clinton
Midtown
Murray Hill
Chelsea
Greenwich Village
East Village
Lower East Side
Tribeca
Little Italy
Soho
West Village
Manhattan Valley
Morningside Heights
Gramercy
Battery Park City
Financial District
Carnegie Hill
Noho
Civic Center
Midtown South
Sutton Place
Turtle Bay
Tudor City
Stuyvesant Town
Flatiron
Hudson Yards


In [22]:
## size of a dataframe
print(manhattan_venues.shape)
print(manhattan_venues.head())

## venues for each neighborhood
manhattan_venues.groupby('Neighborhood').count()

## unique categories
print('There are {} uniques categories.'.format(len(manhattan_venues['Venue Category'].unique())))

(3231, 7)
  Neighborhood  Neighborhood Latitude  Neighborhood Longitude  \
0  Marble Hill              40.876551               -73.91066   
1  Marble Hill              40.876551               -73.91066   
2  Marble Hill              40.876551               -73.91066   
3  Marble Hill              40.876551               -73.91066   
4  Marble Hill              40.876551               -73.91066   

                              Venue  Venue Latitude  Venue Longitude  \
0                       Bikram Yoga       40.876844       -73.906204   
1                          Arturo's       40.874412       -73.910271   
2                     Tibbett Diner       40.880404       -73.908937   
3  Astral Fitness & Wellness Center       40.876705       -73.906372   
4                         Starbucks       40.877531       -73.905582   

  Venue Category  
0    Yoga Studio  
1    Pizza Place  
2          Diner  
3            Gym  
4    Coffee Shop  
There are 330 uniques categories.


## Analyze Each Neighborhood


In [23]:
# one hot encoding
manhattan_onehot = pd.get_dummies(manhattan_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
manhattan_onehot['Neighborhood'] = manhattan_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [manhattan_onehot.columns[-1]] + list(manhattan_onehot.columns[:-1])
manhattan_onehot = manhattan_onehot[fixed_columns]

print(manhattan_onehot.head())

print("size of a dataframe")
print(manhattan_onehot.shape)

manhattan_grouped = manhattan_onehot.groupby('Neighborhood').mean().reset_index()
print(manhattan_grouped)

print("new size")
print(manhattan_grouped.shape)

  Neighborhood  Accessories Store  Adult Boutique  Afghan Restaurant  \
0  Marble Hill                  0               0                  0   
1  Marble Hill                  0               0                  0   
2  Marble Hill                  0               0                  0   
3  Marble Hill                  0               0                  0   
4  Marble Hill                  0               0                  0   

   African Restaurant  American Restaurant  Antique Shop  \
0                   0                    0             0   
1                   0                    0             0   
2                   0                    0             0   
3                   0                    0             0   
4                   0                    0             0   

   Argentinian Restaurant  Art Gallery  Art Museum  Arts & Crafts Store  \
0                       0            0           0                    0   
1                       0            0           0      

In [24]:
## each neighborhood alone with top 5 common venues

num_top_venues = 5

for hood in manhattan_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = manhattan_grouped[manhattan_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Battery Park City----
            venue  freq
0            Park  0.10
1     Coffee Shop  0.06
2           Hotel  0.05
3  Clothing Store  0.04
4   Shopping Mall  0.04


----Carnegie Hill----
         venue  freq
0  Coffee Shop  0.09
1         Café  0.05
2    Wine Shop  0.04
3  Yoga Studio  0.03
4    Bookstore  0.03


----Central Harlem----
                  venue  freq
0    Chinese Restaurant  0.07
1            Public Art  0.04
2    African Restaurant  0.04
3   American Restaurant  0.04
4  Gym / Fitness Center  0.04


----Chelsea----
                 venue  freq
0          Coffee Shop  0.06
1               Bakery  0.05
2          Art Gallery  0.05
3  American Restaurant  0.04
4    French Restaurant  0.03


----Chinatown----
                 venue  freq
0   Chinese Restaurant  0.10
1               Bakery  0.07
2         Cocktail Bar  0.05
3         Dessert Shop  0.04
4  American Restaurant  0.04


----Civic Center----
                  venue  freq
0           Coffee Shop  0.08
1     

In [25]:
## function to sort venues

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [26]:

## displaying top 10 venues for each neighborhood
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = manhattan_grouped['Neighborhood']

for ind in np.arange(manhattan_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(manhattan_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Battery Park City,Park,Coffee Shop,Hotel,Clothing Store,Shopping Mall,Memorial Site,Gym,Playground,Plaza,BBQ Joint
1,Carnegie Hill,Coffee Shop,Café,Wine Shop,Yoga Studio,Bookstore,Pizza Place,Bar,Gym,French Restaurant,Grocery Store
2,Central Harlem,Chinese Restaurant,French Restaurant,African Restaurant,American Restaurant,Bar,Public Art,Gym / Fitness Center,Seafood Restaurant,Grocery Store,Ethiopian Restaurant
3,Chelsea,Coffee Shop,Art Gallery,Bakery,American Restaurant,Italian Restaurant,French Restaurant,Wine Shop,Seafood Restaurant,Ice Cream Shop,Cycle Studio
4,Chinatown,Chinese Restaurant,Bakery,Cocktail Bar,Dessert Shop,American Restaurant,Spa,Salon / Barbershop,Ice Cream Shop,Optical Shop,Bubble Tea Shop


## Cluster Neighborhoods


In [27]:
## k-means clustering
# set number of clusters
kclusters = 5

manhattan_grouped_clustering = manhattan_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(manhattan_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([3, 0, 0, 0, 0, 1, 1, 3, 0, 1], dtype=int32)

In [28]:
## new dataframe with all top 10 venues
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

manhattan_merged = manhattan_data

# merge manhattan_grouped with manhattan_data to add latitude/longitude for each neighborhood
manhattan_merged = manhattan_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')


manhattan_merged.head() # check the last columns!

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Manhattan,Marble Hill,40.876551,-73.91066,4,Sandwich Place,Gym,Yoga Studio,Diner,Clothing Store,Supplement Shop,Donut Shop,Tennis Stadium,Seafood Restaurant,Kids Store
1,Manhattan,Chinatown,40.715618,-73.994279,0,Chinese Restaurant,Bakery,Cocktail Bar,Dessert Shop,American Restaurant,Spa,Salon / Barbershop,Ice Cream Shop,Optical Shop,Bubble Tea Shop
2,Manhattan,Washington Heights,40.851903,-73.9369,3,Café,Bakery,Mobile Phone Shop,Bank,Grocery Store,Latin American Restaurant,Pizza Place,Tapas Restaurant,New American Restaurant,Spanish Restaurant
3,Manhattan,Inwood,40.867684,-73.92121,3,Mexican Restaurant,Café,Restaurant,Spanish Restaurant,Park,Chinese Restaurant,Bakery,Caribbean Restaurant,Pizza Place,Lounge
4,Manhattan,Hamilton Heights,40.823604,-73.949688,3,Pizza Place,Coffee Shop,Café,Mexican Restaurant,Deli / Bodega,Yoga Studio,Bakery,Latin American Restaurant,Liquor Store,Park


In [29]:
## visualize clusters

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(manhattan_merged['Latitude'], manhattan_merged['Longitude'], manhattan_merged['Neighborhood'], manhattan_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

examine each cluster and determine the discriminating venue categories that distinguish each cluster. Based on the defining categories,assign a name to each cluster.



In [30]:
## cluster 1

manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 0, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Chinatown,Chinese Restaurant,Bakery,Cocktail Bar,Dessert Shop,American Restaurant,Spa,Salon / Barbershop,Ice Cream Shop,Optical Shop,Bubble Tea Shop
6,Central Harlem,Chinese Restaurant,French Restaurant,African Restaurant,American Restaurant,Bar,Public Art,Gym / Fitness Center,Seafood Restaurant,Grocery Store,Ethiopian Restaurant
8,Upper East Side,Exhibit,Italian Restaurant,Coffee Shop,Bakery,American Restaurant,Juice Bar,Gym / Fitness Center,Yoga Studio,French Restaurant,Hotel
9,Yorkville,Italian Restaurant,Gym,Coffee Shop,Bar,Sushi Restaurant,Deli / Bodega,Mexican Restaurant,Wine Shop,Japanese Restaurant,Diner
10,Lenox Hill,Italian Restaurant,Coffee Shop,Cocktail Bar,Pizza Place,Sushi Restaurant,Burger Joint,Gym / Fitness Center,Café,Sporting Goods Shop,Gym
12,Upper West Side,Italian Restaurant,Bakery,Café,Wine Bar,Mediterranean Restaurant,Coffee Shop,Bar,Vegetarian / Vegan Restaurant,Seafood Restaurant,Breakfast Spot
13,Lincoln Square,Concert Hall,Performing Arts Venue,Theater,Café,Plaza,French Restaurant,Indie Movie Theater,Wine Shop,Gym / Fitness Center,Gym
17,Chelsea,Coffee Shop,Art Gallery,Bakery,American Restaurant,Italian Restaurant,French Restaurant,Wine Shop,Seafood Restaurant,Ice Cream Shop,Cycle Studio
18,Greenwich Village,Italian Restaurant,Clothing Store,Sushi Restaurant,Bubble Tea Shop,Coffee Shop,Indian Restaurant,Dessert Shop,Boutique,Seafood Restaurant,Cosmetics Shop
19,East Village,Bar,Mexican Restaurant,Pizza Place,Italian Restaurant,Korean Restaurant,Cocktail Bar,Coffee Shop,Vegetarian / Vegan Restaurant,Speakeasy,Wine Bar


In [31]:
## cluster 2
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 1, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
14,Clinton,Theater,Gym / Fitness Center,American Restaurant,Italian Restaurant,Coffee Shop,Gym,Pizza Place,Cocktail Bar,Wine Shop,Spa
15,Midtown,Hotel,Coffee Shop,Sporting Goods Shop,Theater,Steakhouse,Clothing Store,Bookstore,Bakery,Sushi Restaurant,Spa
16,Murray Hill,Coffee Shop,Japanese Restaurant,Sandwich Place,American Restaurant,Hotel,Burger Joint,Taco Place,Bar,Gym / Fitness Center,Gym
29,Financial District,Coffee Shop,Pizza Place,Cocktail Bar,Gym,Wine Shop,Hotel,Sandwich Place,Italian Restaurant,Bar,Food Truck
32,Civic Center,Coffee Shop,Spa,Cocktail Bar,French Restaurant,Gym / Fitness Center,Park,American Restaurant,Italian Restaurant,Hotel,Bakery
33,Midtown South,Korean Restaurant,Hotel,Hotel Bar,Dessert Shop,Coffee Shop,American Restaurant,Gym / Fitness Center,Cosmetics Shop,Clothing Store,Japanese Restaurant
34,Sutton Place,Pizza Place,Coffee Shop,Furniture / Home Store,Italian Restaurant,Park,Gym / Fitness Center,Gym,Bar,Bakery,Mediterranean Restaurant
38,Flatiron,Italian Restaurant,Japanese Restaurant,New American Restaurant,Sporting Goods Shop,Spa,Wine Shop,Furniture / Home Store,Mediterranean Restaurant,American Restaurant,Coffee Shop
39,Hudson Yards,Gym / Fitness Center,American Restaurant,Hotel,Café,Italian Restaurant,Burger Joint,Gym,Coffee Shop,Restaurant,Bar


In [32]:
## cluster 3
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 2, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
37,Stuyvesant Town,Park,Bar,Coffee Shop,Heliport,Gas Station,German Restaurant,Boat or Ferry,Farmers Market,Bistro,Gym / Fitness Center


In [33]:
## cluster 4
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 3, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Washington Heights,Café,Bakery,Mobile Phone Shop,Bank,Grocery Store,Latin American Restaurant,Pizza Place,Tapas Restaurant,New American Restaurant,Spanish Restaurant
3,Inwood,Mexican Restaurant,Café,Restaurant,Spanish Restaurant,Park,Chinese Restaurant,Bakery,Caribbean Restaurant,Pizza Place,Lounge
4,Hamilton Heights,Pizza Place,Coffee Shop,Café,Mexican Restaurant,Deli / Bodega,Yoga Studio,Bakery,Latin American Restaurant,Liquor Store,Park
5,Manhattanville,Deli / Bodega,Coffee Shop,Chinese Restaurant,Seafood Restaurant,Mexican Restaurant,Italian Restaurant,Spanish Restaurant,Bus Station,Bus Stop,Café
7,East Harlem,Mexican Restaurant,Bakery,Thai Restaurant,Latin American Restaurant,Spa,Deli / Bodega,Sandwich Place,Gas Station,Café,Grocery Store
11,Roosevelt Island,Deli / Bodega,Park,Farmers Market,Gym,Greek Restaurant,Metro Station,Supermarket,Bridge,Bubble Tea Shop,Bus Line
26,Morningside Heights,Coffee Shop,Bookstore,Park,American Restaurant,Burger Joint,Sandwich Place,Café,Deli / Bodega,Ice Cream Shop,Mexican Restaurant
28,Battery Park City,Park,Coffee Shop,Hotel,Clothing Store,Shopping Mall,Memorial Site,Gym,Playground,Plaza,BBQ Joint
36,Tudor City,Park,Mexican Restaurant,Café,Deli / Bodega,Pizza Place,Greek Restaurant,Coffee Shop,Sushi Restaurant,Vietnamese Restaurant,Wine Shop


In [34]:
## cluster 5
manhattan_merged.loc[manhattan_merged['Cluster Labels'] == 4, manhattan_merged.columns[[1] + list(range(5, manhattan_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Marble Hill,Sandwich Place,Gym,Yoga Studio,Diner,Clothing Store,Supplement Shop,Donut Shop,Tennis Stadium,Seafood Restaurant,Kids Store
