# **Applied Data Science Capstone**

## **Final Assignment**
## _Finding a location to open a Restaurant in Manhattan_

## Leobardo Gómez


## Introduction

New York City is one of the largest cities in the world, which makes it difficult to find the best place to start a new business. Manhattan is one of its most important boroughs: it is very densely populated and is the economic, administrative and cultural center of the city. This makes Manhattan an ideal spot for starting a restaurant.

Is it possible to find a good location in Manhattan to start a restaurant?


## Table of Contents

1. <a href="#item1">Download and Explore Datasets</a>

  * <a href="#item11">Filtering Neighborhoods</a>
  * <a href="#item12">Get data from age</a>
  * <a href="#item13">Map of Manhattan</a>

2. <a href="#item2">Explore Neighborhoods in Manhattan</a>

3. <a href="#item3">Analyze Each Neighborhood</a>

4. <a href="#item4">Cluster Neighborhoods</a>

5. <a href="#item5">Examine Clusters</a>    


Before starting, it's necessary to install and import the required libraries:

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes 
import folium # map rendering library

print('Libraries imported.')


Libraries imported.


<a id='item1'></a>

## 1. Download and Explore Datasets

The information about the neighborhoods and their respective coordinates is in the dataset:

https://geo.nyu.edu/catalog/nyu_2451_34572

To load the data:

In [2]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

Define a variable _neighborhoods_data_ that holds the list of features from this data:

In [3]:
neighborhoods_data = newyork_data['features']
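
Each entry in _neighborhoods_data_ is a GeoJSON feature. As a minimal sketch, the hand-written feature below uses only the keys this notebook reads (`properties.borough`, `properties.name`, `geometry.coordinates`); the full shape of the real file is an assumption, and `feature_to_row` is a hypothetical helper name. GeoJSON stores point coordinates as longitude first, then latitude.

```python
# Hypothetical single feature, shaped like the fields this notebook accesses
sample_feature = {
    'properties': {'borough': 'Bronx', 'name': 'Wakefield'},
    'geometry': {'coordinates': [-73.847201, 40.894705]},  # [longitude, latitude]
}


def feature_to_row(feature):
    """Flatten one GeoJSON point feature into a dict with the dataframe's columns."""
    lon, lat = feature['geometry']['coordinates'][:2]
    return {
        'Borough': feature['properties']['borough'],
        'Neighborhood': feature['properties']['name'],
        'Latitude': lat,
        'Longitude': lon,
    }


row = feature_to_row(sample_feature)
print(row)
```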

In [4]:
# Transform the data into a pandas dataframe
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

Then let's loop through the data, collect one row per neighborhood, and build the dataframe from those rows.

In [5]:
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

# DataFrame.append was removed in pandas 2.0, so the dataframe is built in one step
neighborhoods = pd.DataFrame(rows, columns=column_names)

In [6]:
neighborhoods.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


For this project, only the Manhattan data is relevant. The dataframe _manhattan_data_ contains only the rows for this borough.

In [7]:
manhattan_data = neighborhoods[neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)

In [8]:
manhattan_data.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Manhattan,Marble Hill,40.876551,-73.91066
1,Manhattan,Chinatown,40.715618,-73.994279
2,Manhattan,Washington Heights,40.851903,-73.9369
3,Manhattan,Inwood,40.867684,-73.92121
4,Manhattan,Hamilton Heights,40.823604,-73.949688


<a id='item11'></a>

#### Filtering Neighborhoods


The following web article contains guidance on choosing a good location to open a restaurant:

https://fitsmallbusiness.com/choose-a-restaurant-location/

The following page holds information about population per neighborhood:

https://data.cityofnewyork.us/City-Government/New-York-City-Population-By-Neighborhood-Tabulatio/swpk-hqdp

The file _New_York_City_Population_By_Neighborhood_Tabulation_Areas_ contains data that can be used to filter neighborhoods.

In [9]:
population_df = pd.read_csv('New_York_City_Population_By_Neighborhood_Tabulation_Areas.csv')
population_df.head()

Unnamed: 0,Borough,Year,FIPS County Code,NTA Code,NTA Name,Population
0,Bronx,2000,5,BX01,Claremont-Bathgate,28149
1,Bronx,2000,5,BX03,Eastchester-Edenwald-Baychester,35422
2,Bronx,2000,5,BX05,Bedford Park-Fordham North,55329
3,Bronx,2000,5,BX06,Belmont,25967
4,Bronx,2000,5,BX07,Bronxdale,34309


The dataset must be filtered to keep only Manhattan's rows, and only the records for the year 2010.

In [10]:
manhattan_filtered = population_df[population_df['Borough'] == 'Manhattan'] #The data for Manhattan only
manhattan_filtered = manhattan_filtered[manhattan_filtered['Year'] == 2010] #2010 data only
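
The two filters can also be combined into a single boolean mask. The sketch below uses a small sample built from rows shown in the tables of this notebook; the variable names are illustrative only. Taking an explicit `.copy()` is a suggestion rather than something the original cell does: it makes the result an independent dataframe, which avoids pandas' `SettingWithCopyWarning` when values are assigned into it later.

```python
import pandas as pd

# Sample rows copied from the population tables in this notebook
sample = pd.DataFrame({
    'Borough': ['Bronx', 'Bronx', 'Manhattan', 'Manhattan'],
    'Year': [2000, 2000, 2010, 2010],
    'NTA Code': ['BX01', 'BX03', 'MN01', 'MN04'],
    'Population': [28149, 35422, 46746, 48520],
})

# Both conditions must hold; parentheses are required around each comparison
mask = (sample['Borough'] == 'Manhattan') & (sample['Year'] == 2010)
manhattan_2010 = sample[mask].copy()
print(manhattan_2010)
```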

In [11]:
manhattan_filtered.head()

Unnamed: 0,Borough,Year,FIPS County Code,NTA Code,NTA Name,Population
284,Manhattan,2010,61,MN01,Marble Hill-Inwood,46746
285,Manhattan,2010,61,MN03,Central Harlem North-Polo Grounds,75282
286,Manhattan,2010,61,MN04,Hamilton Heights,48520
288,Manhattan,2010,61,MN06,Manhattanville,22950
289,Manhattan,2010,61,MN09,Morningside Heights,55929


In [12]:
#cleanup combined names 
#I'm leaving only the first name for simplicity
nameList = list()

for i in manhattan_filtered['NTA Name']:
    if i.find('-') != -1:
        nameList.append(i)

indexList = manhattan_filtered[manhattan_filtered['NTA Name'].isin(nameList)].index
for i, name in zip(indexList, manhattan_filtered.loc[indexList,'NTA Name']):
    manhattan_filtered.loc[i,'NTA Name'] = name[:name.find('-')]
    #print(str(i) + " " + name[:name.find('-')])
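
As an alternative sketch, the same cleanup can be written with pandas string methods instead of an explicit loop. This is not the code the notebook ran; it assumes the goal is simply "keep everything before the first hyphen", as in the cell above.

```python
import pandas as pd

names = pd.Series(['Marble Hill-Inwood',
                   'Central Harlem North-Polo Grounds',
                   'Hamilton Heights'])

# Split once at the first hyphen and keep the left part;
# names without a hyphen are returned unchanged
first_names = names.str.split('-', n=1).str[0]
print(first_names.tolist())
```

Applied to the real data, this would be assigned back to the `'NTA Name'` column.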

In [13]:
manhattan_filtered.head()

Unnamed: 0,Borough,Year,FIPS County Code,NTA Code,NTA Name,Population
284,Manhattan,2010,61,MN01,Marble Hill,46746
285,Manhattan,2010,61,MN03,Central Harlem North,75282
286,Manhattan,2010,61,MN04,Hamilton Heights,48520
288,Manhattan,2010,61,MN06,Manhattanville,22950
289,Manhattan,2010,61,MN09,Morningside Heights,55929


In [14]:
#The list of neighborhoods is in the column 'NTA Name'
manhattan_neighborhoods = manhattan_filtered[['NTA Name','NTA Code']]

In [15]:
#Filter only the neighborhoods in the list from above
manhattan_data = manhattan_data[manhattan_data['Neighborhood'].isin(manhattan_neighborhoods['NTA Name'])].reset_index(drop=True)
manhattan_data.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Manhattan,Marble Hill,40.876551,-73.91066
1,Manhattan,Chinatown,40.715618,-73.994279
2,Manhattan,Hamilton Heights,40.823604,-73.949688
3,Manhattan,Manhattanville,40.816934,-73.957385
4,Manhattan,Upper East Side,40.775639,-73.960508
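
The `isin` filter keeps only exact name matches, so any NTA name without a matching neighborhood is dropped silently. One way to inspect what is lost, sketched here on a three-row sample with names taken from the tables above, is a left merge with `indicator=True`. The frame names `nta` and `hoods` are placeholders, not variables from this notebook.

```python
import pandas as pd

nta = pd.DataFrame({'NTA Name': ['Marble Hill', 'Central Harlem North', 'Hamilton Heights']})
hoods = pd.DataFrame({'Neighborhood': ['Marble Hill', 'Chinatown', 'Hamilton Heights']})

# indicator=True adds a '_merge' column telling where each row was found
check = nta.merge(hoods, how='left',
                  left_on='NTA Name', right_on='Neighborhood',
                  indicator=True)

# 'left_only' rows are NTA names with no neighborhood of the same name
unmatched = check.loc[check['_merge'] == 'left_only', 'NTA Name'].tolist()
print(unmatched)
```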


<a id='item12'></a>

#### Get data from age


The following URL contains information about age groups in New York, divided by borough:
    
https://www1.nyc.gov/site/planning/planning-level/nyc-population/census-2010.page
    
From this site, it's possible to download information about age.

The file _totpop_5yrgrps_nta_ contains the information in CSV format.

In [16]:
age_df = pd.read_csv('totpop_5yrgrps_nta.csv')
age_df.head()

Unnamed: 0,Borough,FIPS County Code,Code,Name,Total Population,Under 5 Years,5-9 Years,10-14 Years,15-19 Years,20-24 Years,25-29 Years,30-34 Years,35-39 Years,40-44 Years,45-49 Years,50-54 Years,55-59 Years,60-64 Years,65 Years and Over,Median Age
0,Bronx,5,BX01,Claremont-Bathgate,31078,2890,2816,2769,3113,2658,2325,2012,1836,2078,2095,1830,1293,1070,2293,27.8
1,Bronx,5,BX03,Eastchester-Edenwald-Baychester,34517,2225,2436,2568,3194,2684,2344,2064,2016,2413,2724,2398,2018,1755,3678,34.4
2,Bronx,5,BX05,Bedford Park-Fordham North,54415,4517,4183,4058,4623,4693,4663,4262,3854,3830,3786,3490,2724,2029,3703,30.5
3,Bronx,5,BX06,Belmont,27378,2076,2073,1969,3458,3937,2157,1846,1761,1580,1622,1229,966,756,1948,25.4
4,Bronx,5,BX07,Bronxdale,35538,2458,2311,2404,2600,2629,2953,2636,2389,2491,2546,2364,1960,1575,4222,34.6


For the purpose of this project, we will be using only the information of selected neighborhoods in Manhattan.

In [17]:
age_data = age_df[age_df['Code'].isin(manhattan_neighborhoods['NTA Code'])]
age_data.head()

Unnamed: 0,Borough,FIPS County Code,Code,Name,Total Population,Under 5 Years,5-9 Years,10-14 Years,15-19 Years,20-24 Years,25-29 Years,30-34 Years,35-39 Years,40-44 Years,45-49 Years,50-54 Years,55-59 Years,60-64 Years,65 Years and Over,Median Age
89,Manhattan,61,MN01,Marble Hill-Inwood,46746,2812,2541,2769,3178,3752,4006,3750,3492,3482,3314,3230,3032,2387,5001,35.8
90,Manhattan,61,MN03,Central Harlem North-Polo Grounds,75282,4762,4478,4658,5569,6248,6490,6059,5204,5730,5636,4975,3964,3203,8306,34.4
91,Manhattan,61,MN04,Hamilton Heights,48520,2668,2388,2523,3232,4776,5219,4245,3515,3424,3384,3125,2687,2183,5151,33.9
92,Manhattan,61,MN06,Manhattanville,22950,1334,1394,1505,1863,2247,2188,1706,1527,1554,1569,1383,1163,974,2543,32.7
93,Manhattan,61,MN09,Morningside Heights,55929,2098,1849,1846,4827,10046,7417,4454,3197,2880,2795,2970,2791,2311,6448,29.9


In [18]:
# Groups of age 
groups_age = pd.DataFrame()
groups_age['Code'] = age_data['Code']
groups_age['Young'] = age_data['15-19 Years'] + age_data['20-24 Years'] + age_data['25-29 Years']
groups_age['Middle Age'] = age_data['30-34 Years'] + age_data['35-39 Years'] + age_data['40-44 Years'] + age_data['45-49 Years']
groups_age['Old'] = age_data['50-54 Years'] + age_data['55-59 Years'] + age_data['60-64 Years'] + age_data['65 Years and Over']
groups_age.head()

Unnamed: 0,Code,Young,Middle Age,Old
89,MN01,10936,14038,13650
90,MN03,18307,22629,20448
91,MN04,13227,14568,13146
92,MN06,6298,6356,6063
93,MN09,22290,13326,14520
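
The grouping above can also be expressed as a mapping from group name to source columns, which keeps the definition of each group in one place. This is a sketch, checked against the Marble Hill-Inwood row (MN01) from the age table; `AGE_GROUPS` is an illustrative name.

```python
import pandas as pd

# One row copied from the age table above (Marble Hill-Inwood, MN01)
age_sample = pd.DataFrame([{
    'Code': 'MN01',
    '15-19 Years': 3178, '20-24 Years': 3752, '25-29 Years': 4006,
    '30-34 Years': 3750, '35-39 Years': 3492, '40-44 Years': 3482, '45-49 Years': 3314,
    '50-54 Years': 3230, '55-59 Years': 3032, '60-64 Years': 2387, '65 Years and Over': 5001,
}])

# Same three groups as the cell above; residents under 15 are not counted
AGE_GROUPS = {
    'Young': ['15-19 Years', '20-24 Years', '25-29 Years'],
    'Middle Age': ['30-34 Years', '35-39 Years', '40-44 Years', '45-49 Years'],
    'Old': ['50-54 Years', '55-59 Years', '60-64 Years', '65 Years and Over'],
}

groups = age_sample[['Code']].copy()
for group, columns in AGE_GROUPS.items():
    groups[group] = age_sample[columns].sum(axis=1)  # row-wise sum of the listed columns
print(groups)
```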


In [19]:
features_df = manhattan_neighborhoods.merge(groups_age, left_on='NTA Code', right_on='Code')
features_df.drop('NTA Code', axis=1,inplace=True)
features_df.columns=['Name','Code','Young','Middle Age','Old']

In [20]:
features_df.head()

Unnamed: 0,Name,Code,Young,Middle Age,Old
0,Marble Hill,MN01,10936,14038,13650
1,Central Harlem North,MN03,18307,22629,20448
2,Hamilton Heights,MN04,13227,14568,13146
3,Manhattanville,MN06,6298,6356,6063
4,Morningside Heights,MN09,22290,13326,14520


<a id='item13'></a>

#### Map of Manhattan


Let's get the geographical coordinates of Manhattan.

In [21]:
address = 'Manhattan, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Manhattan are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Manhattan are 40.7900869, -73.9598295.


Let's visualize Manhattan and the filtered list of neighborhoods

In [22]:
# create map of Manhattan using latitude and longitude values
map_manhattan = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(manhattan_data['Latitude'], manhattan_data['Longitude'], manhattan_data['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_manhattan)  

In [23]:
map_manhattan

<a id='item2'></a>

## 2. Explore Neighborhoods in Manhattan


#### Define Foursquare Credentials and Version

In [24]:
CLIENT_ID = 'MEAHDEWTMT5RNRRN4GKB2TB5HKXKOCMLYMCPXYE42IZPUQWA' # your Foursquare ID
CLIENT_SECRET = '14BONTC1P2Q3CINM3WXJ0PEXOBD4KR1QTTWUWLRJKHTSUF3L' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: MEAHDEWTMT5RNRRN4GKB2TB5HKXKOCMLYMCPXYE42IZPUQWA
CLIENT_SECRET:14BONTC1P2Q3CINM3WXJ0PEXOBD4KR1QTTWUWLRJKHTSUF3L


In [25]:
#Function to obtain the venues of Manhattan
def getNearbyVenues(names, latitudes, longitudes):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            limit)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
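
The function above does two separable things: it builds the request and it unpacks the JSON response. The sketch below splits them into two small helpers that can be tried without calling the API. The helper names and the fake payload are hypothetical; only the parameter names and the response nesting come from the function above. The fallback to `'Unknown'` for venues without a category is an added assumption, not part of the original code.

```python
def build_explore_params(lat, lng, client_id, client_secret,
                         version='20180605', radius=500, limit=80):
    """Query parameters for the venues/explore endpoint used above."""
    return {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }


def extract_venues(name, lat, lng, payload):
    """Turn one explore response into rows; returns [] if there are no items."""
    groups = payload.get('response', {}).get('groups') or [{}]
    rows = []
    for item in groups[0].get('items', []):
        venue = item['venue']
        categories = venue.get('categories') or [{}]
        rows.append((name, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     categories[0].get('name', 'Unknown')))
    return rows


# Hypothetical payload with the same nesting the function above reads
fake_payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Example Diner',
               'location': {'lat': 40.87, 'lng': -73.91},
               'categories': [{'name': 'Diner'}]}},
    {'venue': {'name': 'No Category Place',
               'location': {'lat': 40.88, 'lng': -73.92},
               'categories': []}},
]}]}}

params = build_explore_params(40.876551, -73.91066, 'my-id', 'my-secret')
rows = extract_venues('Marble Hill', 40.876551, -73.91066, fake_payload)
empty = extract_venues('Marble Hill', 40.876551, -73.91066, {'response': {}})
print(params['ll'], len(rows), empty)
```

With `requests`, the parameters could then be passed as `requests.get('https://api.foursquare.com/v2/venues/explore', params=params, timeout=10)`, which also takes care of URL encoding.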

#### Important Note:

I have to limit the number of venues and the search radius; otherwise I get errors at runtime.

In [26]:
limit = 80 # top 80 venues
radius = 500 # within a 500-meter radius

manhattan_venues = getNearbyVenues(names=manhattan_data['Neighborhood'],
                                   latitudes=manhattan_data['Latitude'],
                                   longitudes=manhattan_data['Longitude']
                                  )


Marble Hill
Chinatown
Hamilton Heights
Manhattanville
Upper East Side
Yorkville
Lenox Hill
Upper West Side
Lincoln Square
Clinton
Midtown
Murray Hill
East Village
Lower East Side
West Village
Morningside Heights
Gramercy
Battery Park City
Turtle Bay
Stuyvesant Town
Hudson Yards


In [28]:
print('There are {} unique categories.'.format(len(manhattan_venues['Venue Category'].unique())))

There are 260 unique categories.


In [None]:
#list(manhattan_venues['Venue Category'].unique())

In [29]:
print('{} venues were returned by Foursquare.'.format(manhattan_venues.shape[0]))

1444 venues were returned by Foursquare.


<a id="item3"></a>

## 3. Analyze Each Neighborhood


In [30]:
#Add a column Type for the type of venue
type_list = list()
for item in manhattan_venues['Venue Category']:
    if (item.find('Restaurant') != -1) or (item.find('Diner') != -1) or (item.find('Steak') != -1) or (item.find('Bistro') != -1): 
        type_list.append('Restaurant')
    elif (item.find('Bar') != -1) or (item.find('Pub') != -1):
        type_list.append('Bar')
    elif (item.find('Hot Dog') != -1) or (item.find('Sandwich') != -1) or (item.find('Burger') != -1) or (item.find('Taco') != -1) or (item.find('Burrito') != -1) or (item.find('Bagel') != -1) or (item.find('Fast Food') != -1):
        type_list.append('Fast Food')
    elif (item.find('Store') != -1) or (item.find('Shop') != -1) or (item.find('Boutique') != -1) or (item.find('Clothing') != -1) or (item.find('Plaza') != -1) or (item.find('Market') != -1) or (item.find('Mall') != -1):
        type_list.append('Shopping')
    elif (item.find('Gym') != -1) or (item.find('Fitness') != -1) or (item.find('Sports') != -1) or (item.find('Spa') != -1) or (item.find('Health') != -1) or (item.find('Yoga') != -1) or (item.find('Dance') != -1) or (item.find('Tennis') != -1) or (item.find('Basketball') != -1) or (item.find('Massage') != -1):
        type_list.append('Health/Sports')
    elif (item.find('Club') != -1) or (item.find('Night') != -1) or (item.find('Concert') != -1) or (item.find('Jazz') != -1) or (item.find('Rock') != -1) or (item.find('Theatre') != -1) or (item.find('Cinema') != -1):
        type_list.append('Entertainment')
    elif (item.find('Monument') != -1) or (item.find('Memorial') != -1) or (item.find('Park') != -1) or (item.find('Historic') != -1) or (item.find('Museum') != -1) or (item.find('Sculpture') != -1) or (item.find('Auditorium') != -1):
        type_list.append('Spots of Interest')
    else:
        type_list.append('Other')
    
manhattan_venues['Type'] = type_list
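
The chain of `find` calls above can be written as a table of keywords checked in order, which is easier to read and extend. The sketch below is intended to reproduce the same rules, including their order; `TYPE_KEYWORDS` and `classify_venue` are illustrative names.

```python
# (type, keywords) pairs, checked in order; the first match wins
TYPE_KEYWORDS = [
    ('Restaurant', ['Restaurant', 'Diner', 'Steak', 'Bistro']),
    ('Bar', ['Bar', 'Pub']),
    ('Fast Food', ['Hot Dog', 'Sandwich', 'Burger', 'Taco', 'Burrito', 'Bagel', 'Fast Food']),
    ('Shopping', ['Store', 'Shop', 'Boutique', 'Clothing', 'Plaza', 'Market', 'Mall']),
    ('Health/Sports', ['Gym', 'Fitness', 'Sports', 'Spa', 'Health', 'Yoga', 'Dance',
                       'Tennis', 'Basketball', 'Massage']),
    ('Entertainment', ['Club', 'Night', 'Concert', 'Jazz', 'Rock', 'Theatre', 'Cinema']),
    ('Spots of Interest', ['Monument', 'Memorial', 'Park', 'Historic', 'Museum',
                           'Sculpture', 'Auditorium']),
]


def classify_venue(category):
    """Return the venue type for a Foursquare category name (substring match)."""
    for venue_type, keywords in TYPE_KEYWORDS:
        if any(keyword in category for keyword in keywords):
            return venue_type
    return 'Other'


for example in ['Italian Restaurant', 'Cocktail Bar', 'Coffee Shop', 'Park', 'Hotel']:
    print(example, '->', classify_venue(example))
```

It could be applied with `manhattan_venues['Venue Category'].map(classify_venue)`. Because matching is by substring, as in the original cell, a category such as 'Barbershop' is classified as 'Bar'.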

In [31]:
#Group the information by Neighborhood and Type of venue
manhattan_grouped = pd.DataFrame(manhattan_venues.groupby(['Neighborhood','Type']).size()).reset_index()
manhattan_grouped.columns = ['Neighborhood','Type','Count']

#Transpose columns
manhattan_type_venues = manhattan_grouped.pivot_table(index=['Neighborhood'],columns=['Type'],fill_value=0).reset_index()
manhattan_type_venues.columns = manhattan_type_venues.columns.to_flat_index() #This function returns column names in one level 
manhattan_type_venues.columns = ['Neighborhood','Bar','Entertainment','Fast Food','Healt/Sports','Other','Restaurant','Shopping','Spots of Interest']
manhattan_type_venues.head()

Unnamed: 0,Neighborhood,Bar,Entertainment,Fast Food,Healt/Sports,Other,Restaurant,Shopping,Spots of Interest
0,Battery Park City,3,0,4,3,20,7,29,14
1,Chinatown,8,0,2,5,15,30,18,2
2,Clinton,9,3,2,10,30,15,10,1
3,East Village,13,0,4,1,12,31,18,1
4,Gramercy,11,4,6,3,17,19,18,2
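
A count table like the one above can also be produced in one step with `pd.crosstab`, which counts the combinations of two columns and fills missing combinations with 0. This is an alternative sketch on a small hypothetical sample, not the code that produced the table.

```python
import pandas as pd

# Hypothetical venue list with the two columns that matter here
venues = pd.DataFrame({
    'Neighborhood': ['Chinatown', 'Chinatown', 'Chinatown', 'Gramercy', 'Gramercy'],
    'Type': ['Restaurant', 'Restaurant', 'Bar', 'Bar', 'Shopping'],
})

# Rows: neighborhoods, columns: venue types, cells: number of venues
counts = pd.crosstab(venues['Neighborhood'], venues['Type']).reset_index()
counts.columns.name = None  # drop the leftover 'Type' label on the column index
print(counts)
```

The column names come from the data itself, so they do not need to be typed in by hand.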


<a id='item4'></a>

## 4. Cluster Neighborhoods


In [32]:
features_df = features_df.merge(manhattan_type_venues, left_on='Name', right_on='Neighborhood')

In [33]:
features_df.drop('Neighborhood', axis=1, inplace=True)

In [34]:
features_df.head()

Unnamed: 0,Name,Code,Young,Middle Age,Old,Bar,Entertainment,Fast Food,Healt/Sports,Other,Restaurant,Shopping,Spots of Interest
0,Marble Hill,MN01,10936,14038,13650,0,0,2,3,4,4,11,0
1,Hamilton Heights,MN04,13227,14568,13146,7,0,3,2,20,19,8,3
2,Manhattanville,MN06,6298,6356,6063,2,1,0,1,13,17,4,2
3,Morningside Heights,MN09,22290,13326,14520,2,0,4,2,14,8,7,4
4,Upper West Side,MN12,22909,41259,50714,9,0,2,4,16,29,19,1


Now let's normalize the dataset. Normalization puts features with different magnitudes and distributions on a common scale, so that distance-based algorithms such as k-means weigh them equally. We use **StandardScaler()**, which standardizes each feature to zero mean and unit variance.
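
To make the transformation concrete, the sketch below standardizes one column by hand with NumPy. It uses the 'Young' counts of the five neighborhoods shown in the table above, so the resulting values differ from the notebook's output, which is computed over all rows.

```python
import numpy as np

# 'Young' counts for the first five neighborhoods in features_df
young = np.array([10936, 13227, 6298, 22290, 22909], dtype=float)

# z = (x - mean) / std, using the population standard deviation (ddof=0),
# which is the convention StandardScaler follows
z = (young - young.mean()) / young.std(ddof=0)

print(z.round(3))
print('mean:', round(float(z.mean()), 6), 'std:', round(float(z.std()), 6))
```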

In [40]:
from sklearn.preprocessing import StandardScaler

X = features_df.values[:,2:]
X = np.nan_to_num(X)
cluster_dataset = StandardScaler().fit_transform(X)
cluster_dataset[0]



array([-0.6669229 , -0.40900786, -0.39250042, -1.70507116, -0.72587356,
       -0.40640041, -0.53975054, -1.99953961, -1.64196872, -0.45673764,
       -1.00158376])

In [67]:
# set number of clusters
kclusters = 6

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(cluster_dataset)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([2, 1, 2, 0, 0, 5, 5, 5, 1, 1, 3, 3, 3, 0, 4, 1, 0, 0, 0, 1, 2],
      dtype=int32)
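
A quick way to see how the neighborhoods are distributed is to count the labels. The sketch below copies the label array printed above; k-means labels start at 0, while the cluster headings in the next section start at 1.

```python
from collections import Counter

# Labels copied from the kmeans.labels_ output above
labels = [2, 1, 2, 0, 0, 5, 5, 5, 1, 1, 3, 3, 3, 0, 4, 1, 0, 0, 0, 1, 2]

sizes = Counter(labels)
for label in sorted(sizes):
    print('label {} (Cluster {}): {} neighborhoods'.format(label, label + 1, sizes[label]))
```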

In [68]:
features_df['Cluster'] = kmeans.labels_

In [69]:
features_df.head()

Unnamed: 0,Name,Code,Young,Middle Age,Old,Bar,Entertainment,Fast Food,Healt/Sports,Other,Restaurant,Shopping,Spots of Interest,Cluster
0,Marble Hill,MN01,10936,14038,13650,0,0,2,3,4,4,11,0,2
1,Hamilton Heights,MN04,13227,14568,13146,7,0,3,2,20,19,8,3,1
2,Manhattanville,MN06,6298,6356,6063,2,1,0,1,13,17,4,2,2
3,Morningside Heights,MN09,22290,13326,14520,2,0,4,2,14,8,7,4,0
4,Upper West Side,MN12,22909,41259,50714,9,0,2,4,16,29,19,1,0


<a id='item5'></a>

## 5. Examine Clusters


#### Cluster 1

In [70]:
cluster = features_df.loc[features_df['Cluster'] == 0]
total = cluster.apply(np.sum)
total['Name'] = ''
total['Code'] = 'Total'
total['Cluster'] = ''
pd.concat([cluster, pd.DataFrame(total.values, index=total.keys()).T], ignore_index=True)

Unnamed: 0,Name,Code,Young,Middle Age,Old,Bar,Entertainment,Fast Food,Healt/Sports,Other,Restaurant,Shopping,Spots of Interest,Cluster
0,Morningside Heights,MN09,22290,13326,14520,2,0,4,2,14,8,7,4,0.0
1,Upper West Side,MN12,22909,41259,50714,9,0,2,4,16,29,19,1,0.0
2,West Village,MN23,18855,22849,20335,9,3,1,2,17,27,17,4,0.0
3,Lower East Side,MN28,16444,19999,25856,3,1,3,3,22,17,11,2,0.0
4,Lenox Hill,MN31,17860,27656,26309,5,1,5,6,16,24,23,0,0.0
5,Yorkville,MN32,18635,26010,24761,6,0,5,11,17,24,14,3,0.0
6,,Total,116993,151099,162495,34,5,20,28,102,129,91,14,


This cluster seems to have:
  * OLD & MIDDLE AGE groups of people
  * RESTAURANT venues are predominant
  * SHOPPING & OTHER venues are also popular
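
The same "rows plus totals" table is built for every cluster, so the steps can be wrapped in a small helper. This is a sketch with a hypothetical function name, `with_totals`; it uses `pd.concat` and sums only the numeric columns. The sample values are taken from the Murray Hill, Gramercy and East Village rows of this notebook.

```python
import pandas as pd


def with_totals(cluster):
    """Return the cluster rows plus a final row with the sum of each numeric column."""
    totals = cluster.select_dtypes('number').sum()
    totals_row = {column: '' for column in cluster.columns}  # blanks for text columns
    totals_row.update(totals.to_dict())
    totals_row['Code'] = 'Total'
    totals_row['Cluster'] = ''  # a summed cluster label would be meaningless
    return pd.concat([cluster, pd.DataFrame([totals_row])], ignore_index=True)


sample = pd.DataFrame({
    'Name': ['Murray Hill', 'Gramercy', 'East Village'],
    'Code': ['MN20', 'MN21', 'MN22'],
    'Young': [18382, 10231, 18360],
    'Restaurant': [35, 19, 31],
    'Cluster': [3, 3, 3],
})

result = with_totals(sample)
print(result)
```

For a real cluster, the call would be `with_totals(features_df.loc[features_df['Cluster'] == 0])`.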

#### Cluster 2

In [71]:
cluster = features_df.loc[features_df['Cluster'] == 1]
total = cluster.apply(np.sum)
total['Name'] = ''
total['Code'] = 'Total'
total['Cluster'] = ''
pd.concat([cluster, pd.DataFrame(total.values, index=total.keys()).T], ignore_index=True)

Unnamed: 0,Name,Code,Young,Middle Age,Old,Bar,Entertainment,Fast Food,Healt/Sports,Other,Restaurant,Shopping,Spots of Interest,Cluster
0,Hamilton Heights,MN04,13227,14568,13146,7,0,3,2,20,19,8,3,1.0
1,Midtown,MN17,9183,9526,8368,4,0,1,6,25,18,22,4,1.0
2,Turtle Bay,MN19,11482,15832,20231,8,0,1,3,15,38,11,4,1.0
3,Chinatown,MN27,11228,14470,16745,8,0,2,5,15,30,18,2,1.0
4,Upper East Side,MN40,8306,15502,28086,5,1,3,7,20,20,20,4,1.0
5,,Total,53426,69898,86576,32,1,10,23,95,125,79,17,


This cluster seems to have:
  * OLD people are predominant
  * RESTAURANT venues
  * SHOPPING & OTHER venues are also popular

#### Cluster 3

In [72]:
cluster = features_df.loc[features_df['Cluster'] == 2]
total = cluster.apply(np.sum)
total['Name'] = ''
total['Code'] = 'Total'
total['Cluster'] = ''
pd.concat([cluster, pd.DataFrame(total.values, index=total.keys()).T], ignore_index=True)

Unnamed: 0,Name,Code,Young,Middle Age,Old,Bar,Entertainment,Fast Food,Healt/Sports,Other,Restaurant,Shopping,Spots of Interest,Cluster
0,Marble Hill,MN01,10936,14038,13650,0,0,2,3,4,4,11,0,2.0
1,Manhattanville,MN06,6298,6356,6063,2,1,0,1,13,17,4,2,2.0
2,Stuyvesant Town,MN50,6443,5159,7603,3,0,0,1,10,2,2,2,2.0
3,,Total,23677,25553,27316,5,1,2,5,27,23,17,4,


This cluster seems to have:
  * Slightly more OLD people, but all age groups are balanced
  * A focus on OTHER venues
  * RESTAURANT and SHOPPING spots are also popular

#### Cluster 4

In [73]:
cluster = features_df.loc[features_df['Cluster'] == 3]
total = cluster.apply(np.sum)
total['Name'] = ''
total['Code'] = 'Total'
total['Cluster'] = ''
pd.concat([cluster, pd.DataFrame(total.values, index=total.keys()).T], ignore_index=True)

Unnamed: 0,Name,Code,Young,Middle Age,Old,Bar,Entertainment,Fast Food,Healt/Sports,Other,Restaurant,Shopping,Spots of Interest,Cluster
0,Murray Hill,MN20,18382,15250,13954,8,1,7,8,12,35,8,1,3.0
1,Gramercy,MN21,10231,8637,7587,11,4,6,3,17,19,18,2,3.0
2,East Village,MN22,18360,13649,10050,13,0,4,1,12,31,18,1,3.0
3,,Total,46973,37536,31591,32,5,17,12,41,85,44,4,


This cluster seems to have:
  * More YOUNG people
  * A focus on RESTAURANT venues
  * BAR, SHOPPING and OTHER spots are also popular

#### Cluster 5

In [74]:
cluster = features_df.loc[features_df['Cluster'] == 4]
total = cluster.apply(np.sum)
total['Name'] = ''
total['Code'] = 'Total'
total['Cluster'] = ''
pd.concat([cluster, pd.DataFrame(total.values, index=total.keys()).T], ignore_index=True)

Unnamed: 0,Name,Code,Young,Middle Age,Old,Bar,Entertainment,Fast Food,Healt/Sports,Other,Restaurant,Shopping,Spots of Interest,Cluster
0,Battery Park City,MN25,14215,14952,6514,3,0,4,3,20,7,29,14,4.0
1,,Total,14215,14952,6514,3,0,4,3,20,7,29,14,


This cluster seems to have:
  * YOUNG/MIDDLE AGE people
  * SHOPPING venues are predominant
  * OTHER venues and SPOTS OF INTEREST are also popular

#### Cluster 6

In [53]:
cluster = features_df.loc[features_df['Cluster'] == 5]
total = cluster.apply(np.sum)
total['Name'] = ''
total['Code'] = 'Total'
total['Cluster'] = ''
pd.concat([cluster, pd.DataFrame(total.values, index=total.keys()).T], ignore_index=True)

Unnamed: 0,Name,Code,Young,Middle Age,Old,Bar,Entertainment,Fast Food,Healt/Sports,Other,Restaurant,Shopping,Spots of Interest,Cluster
0,Battery Park City,MN25,14215,14952,6514,3,0,4,3,20,7,29,14,5.0
1,,Total,14215,14952,6514,3,0,4,3,20,7,29,14,


This cluster seems to have:
  * More MIDDLE AGE people
  * A focus on OTHER and RESTAURANT venues
  * SHOPPING spots are also popular