<h1 align=center><font size = 7> A tailored neighborhood in New York City</font></h1>

# 1. Introduction

## 1.1 Problem

We want to create a score for New York City neighborhoods so that professionals who work in Manhattan can easily find the option that suits them best.  
The variables we are interested in are the distance from Manhattan, the median rent and the neighborhood's venue profile.

## 1.2 Analysis 

Neighborhoods are complex systems and many different variables are at play.  
For our analysis we will focus on median rent, distance from Manhattan and the kinds of venues present in the area.  
We will start by segmenting neighborhoods on the basis of their venue profiles.  
We want to create many different profiles so that they relate more closely to the many different interests people might have.  
We will need a weighting system to balance distance from Manhattan (giving a better score to neighborhoods in or closer to Manhattan) against median rent.  
The distance from Manhattan will be 0 for neighborhoods in the Borough of Manhattan and, for all the others, the distance from the closest neighborhood in Manhattan; we will probably have to normalize it.  
After these steps we should be able to create a scoring system that ranks each neighborhood within its respective category.
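
The weighting idea described above can be sketched as follows. This is only an illustration under stated assumptions, not the scoring used later in this notebook: the haversine distance, the min-max normalization, the equal weights `W_DISTANCE` and `W_RENT`, and the three sample neighborhoods are all hypothetical choices made for the example.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def min_max(values):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical sample data (names, coordinates and rents are made up for the example)
neighborhoods = [
    {"name": "A", "borough": "Manhattan", "lat": 40.78, "lon": -73.97, "rent": 3980.0},
    {"name": "B", "borough": "Bronx", "lat": 40.84, "lon": -73.93, "rent": 2008.0},
    {"name": "C", "borough": "Queens", "lat": 40.58, "lon": -73.88, "rent": 2500.0},
]
manhattan = [n for n in neighborhoods if n["borough"] == "Manhattan"]

def distance_from_manhattan(n):
    """0 inside Manhattan, otherwise the distance to the closest Manhattan center."""
    if n["borough"] == "Manhattan":
        return 0.0
    return min(haversine_km(n["lat"], n["lon"], m["lat"], m["lon"]) for m in manhattan)

distances = [distance_from_manhattan(n) for n in neighborhoods]
norm_dist = min_max(distances)
norm_rent = min_max([n["rent"] for n in neighborhoods])

# Assumed equal weights; a higher score means closer and cheaper
W_DISTANCE, W_RENT = 0.5, 0.5
scores = [1.0 - (W_DISTANCE * d + W_RENT * r) for d, r in zip(norm_dist, norm_rent)]

for n, s in zip(neighborhoods, scores):
    print(n["name"], round(s, 3))
```

Changing the two weights shifts the balance between proximity and price, which is the knob the weighting system would have to expose.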

# 2. Methodology 

## 2.1 Data sources

* Geography and median rent data by neighborhood (we will use 2BR apartments as the reference) will be gathered from renthop.com (https://www.renthop.com/study/assets/new-york-city-cost-of-living-2017/nyc_col_geojson.js), which serves a GeoJSON that also contains the rent information.  
* The venues data will be retrieved from Foursquare using its API.  

## 2.2 Data retrieval 

### 2.2.1 Import and install necessary libraries

Proceeding with the necessary imports.

In [407]:
# uncomment the following line if geopy isn't installed
#!conda install -c conda-forge geopy --yes 

# uncomment the following line if folium isn't installed
!conda install -c conda-forge folium=0.5.0 --yes 


# install BeautifulSoup and requests
#!pip install beautifulsoup4
!pip install requests
!pip install lxml

import re

# library to handle data in a vectorized manner
import numpy as np 

# library for data analysis
import pandas as pd 
from pandas import json_normalize # transform JSON into a pandas DataFrame

# import BeautifulSoup
#from bs4 import BeautifulSoup

# library to handle JSON files
import json

# convert an address into latitude and longitude values
from geopy.geocoders import Nominatim 

# library to handle http requests
import requests 

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

# map rendering library
import folium 

print('Libraries imported.')


### 2.2.2 NYC Neighborhoods rent and geography data

Let's download the New York neighborhoods geography dataset, which comes as a GeoJSON file.

In [408]:
url = 'https://www.renthop.com/study/assets/new-york-city-cost-of-living-2017/nyc_col_geojson.js'
filename = 'newyork_rent_location_data.json'
response = requests.get(url)
# write inside a context manager so the file is flushed and closed before it is read back
with open(filename, "w") as file:
    file.write(re.sub(r'var shapes.*?{', '{', response.text))
with open(filename) as json_rent_location_data:
    newyork_rent_location_data = json.load(json_rent_location_data)
newyork_rent_location_data

{'type': 'FeatureCollection',
 'features': [{'type': 'Feature',
   'geometry': {'type': 'MultiPolygon',
    'coordinates': [[[[-74.000959, 40.694069],
       [-74.003014, 40.694778],
       [-74.002398, 40.695712],
       [-74.000269, 40.694966],
       [-74.000959, 40.694069]]],
     [[[-73.999474, 40.696704],
       [-73.999524, 40.696575],
       [-74.000489, 40.696926],
       [-74.000431, 40.697052],
       [-73.999593, 40.696782],
       [-73.999554, 40.696945],
       [-73.999407, 40.696885],
       [-73.999474, 40.696704]]],
     [[[-73.998378, 40.698063],
       [-73.998871, 40.697158],
       [-74.001049, 40.697908],
       [-74.000488, 40.698758],
       [-73.998378, 40.698063]]],
     [[[-73.998021, 40.698762],
       [-74.000018, 40.699466],
       [-73.999475, 40.700323],
       [-73.997499, 40.699639],
       [-73.998021, 40.698762]]],
     [[[-73.996698, 40.700877],
       [-73.996822, 40.700991],
       [-73.99776, 40.701359],
       [-73.997802, 40.701319],
       [-7
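
The download step relies on stripping a JavaScript assignment so that only the JSON object is left. A minimal, offline sketch of that idea is shown here; the `js_text` string is a hypothetical, shortened stand-in for the real file, and the trailing-semicolon handling is a defensive assumption rather than something the source file is known to need.

```python
import json
import re

# Hypothetical, shortened stand-in for the downloaded JavaScript file
js_text = 'var shapes = {"type": "FeatureCollection", "features": []};'

# Drop everything up to the first opening brace so only the object literal remains
json_text = re.sub(r'^var shapes.*?\{', '{', js_text, count=1)

# Assumption: tolerate a trailing semicolon in case the assignment ends with one
json_text = json_text.rstrip().rstrip(';')

data = json.loads(json_text)
print(data["type"], len(data["features"]))
```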

### 2.2.2 Data cleansing

Looking at the content, we can see that a lot of information is available in this set. For this study we need to retrieve from this JSON the Borough and Neighborhood names, the median rent and the geography.  
This information is available in each feature through __features.properties.Neighborhood__, __features.properties.Borough__ and __features.properties.median2__, while __features.geometry__ contains everything needed to define the neighborhood area.  
Let's use the json_normalize method to transfer the data into a DataFrame; we will decide later which columns are worth keeping.

In [409]:
neighborhoods_rent_location_data = newyork_rent_location_data['features']

neighborhoods_rent_location = json_normalize(neighborhoods_rent_location_data)
neighborhoods_rent_location

Unnamed: 0,type,geometry.type,geometry.coordinates,properties.ntacode,properties.Neighborhood,properties.Borough,properties.house_income,properties.median2,properties.perc_income,properties.income4_2Br
0,Feature,MultiPolygon,"[[[[-74.000959, 40.694069], [-74.003014, 40.69...",MN25,Battery Park City-Lower Manhattan,Manhattan,125434.0,4552.5,0.435528,182100.0
1,Feature,Polygon,"[[[-73.96003, 40.798038], [-73.974997, 40.7775...",MN12,Upper West Side,Manhattan,92268.0,3980.0,0.517623,159200.0
2,Feature,Polygon,"[[[-73.982556, 40.73135], [-73.978027, 40.7294...",MN22,East Village,Manhattan,72665.0,3750.0,0.619280,150000.0
3,Feature,Polygon,"[[[-73.993831, 40.772932], [-73.993503, 40.772...",MN15,Clinton,Manhattan,73591.0,3600.0,0.587028,144000.0
4,Feature,MultiPolygon,"[[[[-73.962304, 40.733114], [-73.962263, 40.73...",MN20,Murray Hill-Kips Bay,Manhattan,97458.0,3498.0,0.430709,139920.0
...,...,...,...,...,...,...,...,...,...,...
190,Feature,MultiPolygon,"[[[[-73.746994, 40.637151], [-73.747, 40.63712...",QN98,Airport,Queens,,,,
191,Feature,Polygon,"[[[-74.169826, 40.56109], [-74.169775, 40.5603...",SI01,Annadale-Huguenot-Prince's Bay-Eltingville,Staten Island,,,,
192,Feature,MultiPolygon,"[[[[-74.159456, 40.641448], [-74.159979, 40.64...",SI12,Mariner's Harbor-Arlington-Port Ivory-Granitev...,Staten Island,,,,
193,Feature,Polygon,"[[[-73.760315, 40.67511], [-73.758572, 40.6726...",QN02,Springfield Gardens North,Queens,,,,
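
To make the flattening step easier to follow, here is a small self-contained sketch of what `json_normalize` does to a list of GeoJSON-like features. The two features are invented for the example; only the key names mirror the dataset above.

```python
import pandas as pd

# Two made-up features that mimic the structure of the real dataset
features = [
    {
        "type": "Feature",
        "geometry": {"type": "Polygon",
                     "coordinates": [[[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]]]},
        "properties": {"Neighborhood": "Sample A", "Borough": "Manhattan", "median2": 3000.0},
    },
    {
        "type": "Feature",
        "geometry": {"type": "Polygon",
                     "coordinates": [[[2.0, 2.0], [3.0, 2.0], [3.0, 3.0], [2.0, 2.0]]]},
        "properties": {"Neighborhood": "Sample B", "Borough": "Queens", "median2": None},
    },
]

# Nested dictionaries become dotted column names; lists are kept as cell values
flat = pd.json_normalize(features)
print(sorted(flat.columns))
print(flat["properties.median2"].isnull().sum())
```

Nested keys such as `properties.median2` end up as ordinary columns, while the coordinate lists stay intact inside a single cell, which is why the later cells still have to unpack them.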


'type', 'properties.ntacode', 'properties.house_income', 'properties.perc_income' and 'properties.income4_2Br' are of no use to us, so we can just drop them.  
'geometry.coordinates' and 'geometry.type' might be useful in later stages, so we will keep them. We can also fix the other column names and reorder the columns.

In [410]:
neighborhoods_data = neighborhoods_rent_location.drop(['type', 'properties.ntacode', 'properties.house_income', 'properties.perc_income', 'properties.income4_2Br'], axis = 1).rename(columns = {'properties.Neighborhood':'Neighborhood', 'properties.Borough':'Borough', 'properties.median2':'Median rent'})
neighborhoods_data = neighborhoods_data[['Borough','Neighborhood','Median rent','geometry.coordinates', 'geometry.type']]
neighborhoods_data.head()

Unnamed: 0,Borough,Neighborhood,Median rent,geometry.coordinates,geometry.type
0,Manhattan,Battery Park City-Lower Manhattan,4552.5,"[[[[-74.000959, 40.694069], [-74.003014, 40.69...",MultiPolygon
1,Manhattan,Upper West Side,3980.0,"[[[-73.96003, 40.798038], [-73.974997, 40.7775...",Polygon
2,Manhattan,East Village,3750.0,"[[[-73.982556, 40.73135], [-73.978027, 40.7294...",Polygon
3,Manhattan,Clinton,3600.0,"[[[-73.993831, 40.772932], [-73.993503, 40.772...",Polygon
4,Manhattan,Murray Hill-Kips Bay,3498.0,"[[[[-73.962304, 40.733114], [-73.962263, 40.73...",MultiPolygon


To better navigate the DataFrame we can set a MultiIndex on Borough and Neighborhood.

In [411]:
neighborhoods_data = neighborhoods_data.sort_values(by = ['Borough','Neighborhood']).set_index(['Borough','Neighborhood'])
neighborhoods_data.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,Median rent,geometry.coordinates,geometry.type
Borough,Neighborhood,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Bronx,Allerton-Pelham Gardens,,"[[[-73.853636, 40.873301], [-73.848597, 40.871...",Polygon
Bronx,Bedford Park-Fordham North,1895.0,"[[[-73.883625, 40.867258], [-73.886833, 40.865...",Polygon
Bronx,Belmont,,"[[[-73.883094, 40.866602], [-73.882676, 40.866...",Polygon
Bronx,Bronxdale,,"[[[-73.861379, 40.871337], [-73.861563, 40.865...",Polygon
Bronx,Claremont-Bathgate,,"[[[-73.89039, 40.854689], [-73.892167, 40.8527...",Polygon


In [412]:
neighborhoods_data['Median rent'].isnull().sum()

56

As we can see, some neighborhoods do not have rent data. Trying to find other sources for this information would lead to a mismatch in the results due to many factors, such as a different period in which the survey was conducted, different kinds of housing (not only 2BR as we are considering here), different neighborhood definitions and so on.
The only sensible choice here is, unfortunately, to drop those rows.

In [413]:
neighborhoods_data.dropna(subset = ['Median rent'], inplace = True)
neighborhoods_data.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,Median rent,geometry.coordinates,geometry.type
Borough,Neighborhood,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Bronx,Bedford Park-Fordham North,1895.0,"[[[-73.883625, 40.867258], [-73.886833, 40.865...",Polygon
Bronx,East Concourse-Concourse Village,2225.0,"[[[-73.909587, 40.842756], [-73.909625, 40.842...",Polygon
Bronx,Highbridge,2008.0,"[[[-73.917287, 40.845104], [-73.917507, 40.844...",Polygon
Bronx,Hunts Point,1937.5,"[[[-73.88439, 40.822967], [-73.883788, 40.8219...",Polygon
Bronx,Melrose South-Mott Haven North,1875.0,"[[[-73.901293, 40.820475], [-73.90301, 40.8163...",Polygon


### 2.2.3 Defining a Neighborhood center

We now need to find a center for each neighborhood, both to get the venues in the area and to calculate the distance from Manhattan. 

#### 2.2.3.1 Centroid from all points average 

We can approximate the centroid of the area by averaging the boundary vertices with a pandas mean.

In [414]:
for index, row in neighborhoods_data.iterrows():
    if row[2] == 'Polygon':
        df = pd.DataFrame(row[1]).T
        count = 0
        latitude_avg = 0
        longitude_avg = 0
        for index2, column2 in df.items():  # iteritems() was removed in pandas 2.0
            df2 = pd.DataFrame(column2.dropna().tolist())
            longitude_avg += df2.mean()[0]
            latitude_avg += df2.mean()[1]
            count += 1
    elif row[2] == 'MultiPolygon':
        df = pd.DataFrame(row[1])
        count = 0
        latitude_avg = 0
        longitude_avg = 0
        for index2, column2 in df.items():  # iteritems() was removed in pandas 2.0
            df2 = pd.DataFrame(column2.tolist()).T
            for index3, column3 in df2.items():
                df3 = pd.DataFrame(column3.dropna().tolist())
                longitude_avg += df3.mean()[0]
                latitude_avg += df3.mean()[1]
                count += 1
    neighborhoods_data.loc[index, 'latitude_avg']= latitude_avg/count
    neighborhoods_data.loc[index, 'longitude_avg']= longitude_avg/count
    
neighborhoods_data.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,Median rent,geometry.coordinates,geometry.type,latitude_avg,longitude_avg
Borough,Neighborhood,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Bronx,Bedford Park-Fordham North,1895.0,"[[[-73.883625, 40.867258], [-73.886833, 40.865...",Polygon,40.867928,-73.890009
Bronx,East Concourse-Concourse Village,2225.0,"[[[-73.909587, 40.842756], [-73.909625, 40.842...",Polygon,40.831691,-73.914419
Bronx,Highbridge,2008.0,"[[[-73.917287, 40.845104], [-73.917507, 40.844...",Polygon,40.838775,-73.926395
Bronx,Hunts Point,1937.5,"[[[-73.88439, 40.822967], [-73.883788, 40.8219...",Polygon,40.807513,-73.886837
Bronx,Melrose South-Mott Haven North,1875.0,"[[[-73.901293, 40.820475], [-73.90301, 40.8163...",Polygon,40.818142,-73.916419
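
The loop above averages, ring by ring, the mean of the boundary vertices. A more compact sketch of the same idea with NumPy is given here; the helper names `iter_rings` and `vertex_mean_center` and the sample squares are hypothetical and not part of the notebook.

```python
import numpy as np

def iter_rings(geometry_type, coordinates):
    """Yield every linear ring of a GeoJSON Polygon or MultiPolygon as an array."""
    polygons = [coordinates] if geometry_type == "Polygon" else coordinates
    for polygon in polygons:
        for ring in polygon:
            yield np.asarray(ring, dtype=float)

def vertex_mean_center(geometry_type, coordinates):
    """Average the per-ring vertex means; returns (latitude, longitude)."""
    ring_means = np.array([ring.mean(axis=0)
                           for ring in iter_rings(geometry_type, coordinates)])
    lon, lat = ring_means.mean(axis=0)  # GeoJSON stores points as [lon, lat]
    return float(lat), float(lon)

# Made-up closed rings: the first vertex is repeated at the end, as in GeoJSON
square = [[0.0, 0.0], [2.0, 0.0], [2.0, 2.0], [0.0, 2.0], [0.0, 0.0]]
far_square = [[10.0, 10.0], [12.0, 10.0], [12.0, 12.0], [10.0, 12.0], [10.0, 10.0]]

print(vertex_mean_center("Polygon", [square]))
print(vertex_mean_center("MultiPolygon", [[square], [far_square]]))
```

For the single square the result is not the geometric center (1, 1), because the repeated closing vertex is counted twice; this is a small instance of the general effect that sides with more vertices pull the average towards them.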


In [415]:
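# average only the numeric center columns; mean() over the list and string columns fails in recent pandas
means = neighborhoods_data[['latitude_avg', 'longitude_avg']].mean()
lat = means['latitude_avg']
lon = means['longitude_avg']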

In [416]:
# create plain maps centered on the average neighborhood center
ny_map1 = folium.Map(location=[lat, lon], zoom_start=10)
ny_map_mix = folium.Map(location=[lat, lon], zoom_start=10)

In [417]:
ny_map1.choropleth(
    geo_data=filename,
    fill_opacity=0.5, 
    line_opacity=1.0,
    legend_name='Neighborhoods NYC'
)

ny_map_mix.choropleth(
    geo_data=filename,
    fill_opacity=0.5, 
    line_opacity=1.0,
    legend_name='Neighborhoods NYC'
)
# display map
ny_map1

The choropleth maps look fine, so let's check our centroids.

In [418]:
for latitude, longitude, label in zip(neighborhoods_data.latitude_avg, neighborhoods_data.longitude_avg, neighborhoods_data.index.get_level_values('Neighborhood')):
    folium.features.CircleMarker(
        [latitude, longitude],
        radius=5, # define how big you want the circle markers to be
        color='yellow',
        fill=True,
        popup=label,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(ny_map1)

ny_map1

We can see that with this method the 'neighborhood center' is biased towards the more irregular sides, because those sides contribute more vertices to the average.  
Let's persist our points in a different map so we can compare them at a later stage.

In [419]:
for latitude, longitude, label in zip(neighborhoods_data.latitude_avg, neighborhoods_data.longitude_avg, neighborhoods_data.index.get_level_values('Neighborhood')):
    folium.features.CircleMarker(
        [latitude, longitude],
        radius=1, # define how big you want the circle markers to be
        color='red',
        fill=True,
        popup= "Center type 1 " + label,
        fill_color='red'
    ).add_to(ny_map_mix)

#### 2.2.3.2 Center of the rectangle bounding the neighborhood 

Let's simplify things and take the central point of the rectangle that bounds each neighborhood, using the minimum and maximum latitude and longitude as its sides. 

In [420]:
for index, row in neighborhoods_data.iterrows():
    if row[2] == 'Polygon':
        df = pd.DataFrame(row[1]).T
        latitude_min = 0
        latitude_max = 0
        longitude_min = 0
        longitude_max = 0
        for index2, column2 in df.items():  # iteritems() was removed in pandas 2.0
            df2 = pd.DataFrame(column2.dropna().tolist())
            if latitude_min == 0 or latitude_min > df2.min()[1]:
                latitude_min = df2.min()[1]
            if longitude_min == 0 or longitude_min > df2.min()[0]:
                longitude_min = df2.min()[0]
            if latitude_max == 0 or latitude_max < df2.max()[1]:
                latitude_max = df2.max()[1]
            if longitude_max == 0 or longitude_max < df2.max()[0]:
                longitude_max = df2.max()[0]
                
    elif row[2] == 'MultiPolygon':
        df = pd.DataFrame(row[1])
        latitude_min = 0
        latitude_max = 0
        longitude_min = 0
        longitude_max = 0
        for index2, column2 in df.items():  # iteritems() was removed in pandas 2.0
            df2 = pd.DataFrame(column2.tolist()).T
            for index3, column3 in df2.items():
                df3 = pd.DataFrame(column3.dropna().tolist())
                if latitude_min == 0 or latitude_min > df3.min()[1]:
                    latitude_min = df3.min()[1]
                if longitude_min == 0 or longitude_min > df3.min()[0]:
                    longitude_min = df3.min()[0]
                if latitude_max == 0 or latitude_max < df3.max()[1]:
                    latitude_max = df3.max()[1]
                if longitude_max == 0 or longitude_max < df3.max()[0]:
                    longitude_max = df3.max()[0]
                            
    neighborhoods_data.loc[index, 'latitude_avg']= (latitude_max + latitude_min) / 2
    neighborhoods_data.loc[index, 'longitude_avg']= (longitude_max + longitude_min) / 2
    
neighborhoods_data.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,Median rent,geometry.coordinates,geometry.type,latitude_avg,longitude_avg
Borough,Neighborhood,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Bronx,Bedford Park-Fordham North,1895.0,"[[[-73.883625, 40.867258], [-73.886833, 40.865...",Polygon,40.868678,-73.891274
Bronx,East Concourse-Concourse Village,2225.0,"[[[-73.909587, 40.842756], [-73.909625, 40.842...",Polygon,40.830635,-73.91635
Bronx,Highbridge,2008.0,"[[[-73.917287, 40.845104], [-73.917507, 40.844...",Polygon,40.836774,-73.924894
Bronx,Hunts Point,1937.5,"[[[-73.88439, 40.822967], [-73.883788, 40.8219...",Polygon,40.813916,-73.8856
Bronx,Melrose South-Mott Haven North,1875.0,"[[[-73.901293, 40.820475], [-73.90301, 40.8163...",Polygon,40.818335,-73.91396
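
The bounding-rectangle center computed above can also be written without explicit min/max bookkeeping. The following is a sketch under the assumption that every point is a 2D `[lon, lat]` pair; the function name `bounding_box_center` and the L-shaped sample polygon are made up for the example.

```python
import numpy as np

def bounding_box_center(geometry_type, coordinates):
    """Center of the latitude/longitude bounding rectangle; returns (latitude, longitude)."""
    polygons = [coordinates] if geometry_type == "Polygon" else coordinates
    # Flatten all rings of all polygons into one (n, 2) array of [lon, lat] points
    points = np.array([pt for polygon in polygons for ring in polygon for pt in ring],
                      dtype=float)
    lon_min, lat_min = points.min(axis=0)
    lon_max, lat_max = points.max(axis=0)
    return float((lat_min + lat_max) / 2), float((lon_min + lon_max) / 2)

# Made-up L-shaped polygon spanning 0..4 in both directions
l_shape = [[0.0, 0.0], [4.0, 0.0], [4.0, 1.0], [1.0, 1.0],
           [1.0, 4.0], [0.0, 4.0], [0.0, 0.0]]

print(bounding_box_center("Polygon", [l_shape]))
```

For this L-shaped example the center of the rectangle is (2, 2), which lies outside the shape itself; the same can happen with real neighborhoods that have a concave outline.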


Now let's plot our neighborhoods and the new centers to see if they look better this time.

In [421]:
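# average only the numeric center columns; mean() over the list and string columns fails in recent pandas
means = neighborhoods_data[['latitude_avg', 'longitude_avg']].mean()
lat = means['latitude_avg']
lon = means['longitude_avg']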

In [422]:
# create a plain map centered on the average neighborhood center
ny_map2 = folium.Map(location=[lat, lon], zoom_start=10)

In [423]:
ny_map2.choropleth(
    geo_data=filename,
    fill_opacity=0.5, 
    line_opacity=1.0,
    legend_name='Neighborhoods NYC'
)

In [424]:
for latitude, longitude, label in zip(neighborhoods_data.latitude_avg, neighborhoods_data.longitude_avg, neighborhoods_data.index.get_level_values('Neighborhood')):
    folium.features.CircleMarker(
        [latitude, longitude],
        radius=5, # define how big you want the circle markers to be
        color='yellow',
        fill=True,
        popup=label,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(ny_map2)

ny_map2

It's probably a step forward, but we can try another method. Let's add these points as well to the support map that we will use to compare the centers obtained with the different methods.

In [425]:
for latitude, longitude, label in zip(neighborhoods_data.latitude_avg, neighborhoods_data.longitude_avg, neighborhoods_data.index.get_level_values('Neighborhood')):
    folium.features.CircleMarker(
        [latitude, longitude],
        radius=1, # define how big you want the circle markers to be
        color='yellow',
        fill=True,
        popup="Center type 2 " + label,
        fill_color='yellow',
    ).add_to(ny_map_mix)

#### 2.2.3.3 Centroid from North - East - South - West most points 

We did remove the bias towards the more irregular sides, but some points (such as Stapleton-Rosebank) now lie outside the neighborhood itself. We can attempt this one more time with a mixed approach: take 4 points for each neighborhood (the northernmost, easternmost, southernmost and westernmost vertices) and average them. Some points might still end up outside their neighborhood, but we should get a more precise result without one side weighing more than another. 

In [426]:
for index, row in neighborhoods_data.iterrows():
    if row[2] == 'Polygon':
        df = pd.DataFrame(row[1]).T
        latitude_min = 0
        latitude_max = 0
        longitude_min = 0
        longitude_max = 0
        for index2, column2 in df.items():  # iteritems() was removed in pandas 2.0
            df2 = pd.DataFrame(column2.dropna().tolist())
            if latitude_min == 0 or latitude_min > df2.min()[1]:
                latitude_min = df2.min()[1]
                south_longitude = df2.loc[df2[1] == latitude_min,0].values[0]
            if longitude_min == 0 or longitude_min > df2.min()[0]:
                longitude_min = df2.min()[0]
                west_latitude = df2.loc[df2[0] == longitude_min,1].values[0]
            if latitude_max == 0 or latitude_max < df2.max()[1]:
                latitude_max = df2.max()[1]
                north_longitude = df2.loc[df2[1] == latitude_max,0].values[0]
            if longitude_max == 0 or longitude_max < df2.max()[0]:
                longitude_max = df2.max()[0]
                east_latitude = df2.loc[df2[0] == longitude_max,1].values[0]                
        
    elif row[2] == 'MultiPolygon':
        df = pd.DataFrame(row[1])
        latitude_min = 0
        latitude_max = 0
        longitude_min = 0
        longitude_max = 0
        for index2, column2 in df.items():  # iteritems() was removed in pandas 2.0
            df2 = pd.DataFrame(column2.tolist()).T
            for index3, column3 in df2.items():
                df3 = pd.DataFrame(column3.dropna().tolist())
                if latitude_min == 0 or latitude_min > df3.min()[1]:
                    latitude_min = df3.min()[1]
                    south_longitude = df3.loc[df3[1] == latitude_min,0].values[0]
                if longitude_min == 0 or longitude_min > df3.min()[0]:
                    longitude_min = df3.min()[0]
                    west_latitude = df3.loc[df3[0] == longitude_min,1].values[0]
                if latitude_max == 0 or latitude_max < df3.max()[1]:
                    latitude_max = df3.max()[1]
                    north_longitude = df3.loc[df3[1] == latitude_max,0].values[0]
                if longitude_max == 0 or longitude_max < df3.max()[0]:
                    longitude_max = df3.max()[0]
                    east_latitude = df3.loc[df3[0] == longitude_max,1].values[0]
        
    neighborhoods_data.loc[index, 'latitude_avg']= ((latitude_max + latitude_min + east_latitude + west_latitude) / 4)
    neighborhoods_data.loc[index, 'longitude_avg']= ((longitude_max + longitude_min + north_longitude + south_longitude) / 4)
    
neighborhoods_data.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,Median rent,geometry.coordinates,geometry.type,latitude_avg,longitude_avg
Borough,Neighborhood,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Bronx,Bedford Park-Fordham North,1895.0,"[[[-73.883625, 40.867258], [-73.886833, 40.865...",Polygon,40.867171,-73.890891
Bronx,East Concourse-Concourse Village,2225.0,"[[[-73.909587, 40.842756], [-73.909625, 40.842...",Polygon,40.8299,-73.917451
Bronx,Highbridge,2008.0,"[[[-73.917287, 40.845104], [-73.917507, 40.844...",Polygon,40.837168,-73.927736
Bronx,Hunts Point,1937.5,"[[[-73.88439, 40.822967], [-73.883788, 40.8219...",Polygon,40.81111,-73.882411
Bronx,Melrose South-Mott Haven North,1875.0,"[[[-73.901293, 40.820475], [-73.90301, 40.8163...",Polygon,40.818316,-73.91229
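
The four-extreme-points average used above can be sketched more compactly as follows. The function name `extreme_points_center` and the sample triangle are hypothetical; when several vertices tie for an extreme, this sketch simply takes the first one it finds.

```python
import numpy as np

def extreme_points_center(geometry_type, coordinates):
    """Average the northernmost, southernmost, easternmost and westernmost vertices.

    Returns (latitude, longitude).
    """
    polygons = [coordinates] if geometry_type == "Polygon" else coordinates
    # Flatten all rings of all polygons into one (n, 2) array of [lon, lat] points
    points = np.array([pt for polygon in polygons for ring in polygon for pt in ring],
                      dtype=float)
    lons, lats = points[:, 0], points[:, 1]
    extremes = np.array([
        points[lats.argmax()],  # north
        points[lats.argmin()],  # south
        points[lons.argmax()],  # east
        points[lons.argmin()],  # west
    ])
    lon, lat = extremes.mean(axis=0)
    return float(lat), float(lon)

# Made-up triangle given as [lon, lat] pairs
triangle = [[0.0, 0.0], [4.0, 0.0], [1.0, 3.0], [0.0, 0.0]]

print(extreme_points_center("Polygon", [triangle]))
```

Because only four vertices contribute, the number of points along any one side no longer matters, which is the property this method is meant to provide.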


In [427]:
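# average only the numeric center columns; mean() over the list and string columns fails in recent pandas
means = neighborhoods_data[['latitude_avg', 'longitude_avg']].mean()
lat = means['latitude_avg']
lon = means['longitude_avg']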

In [428]:
# create a plain map centered on the average neighborhood center
ny_map3 = folium.Map(location=[lat, lon], zoom_start=10)

In [429]:
ny_map3.choropleth(
    geo_data=filename,
    fill_opacity=0.5, 
    line_opacity=1.0,
    legend_name='Neighborhoods NYC'
)

In [430]:
for latitude, longitude, label in zip(neighborhoods_data.latitude_avg, neighborhoods_data.longitude_avg, neighborhoods_data.index.get_level_values('Neighborhood')):
    folium.features.CircleMarker(
        [latitude, longitude],
        radius=5, # define how big you want the circle markers to be
        color='yellow',
        fill=True,
        popup=label,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(ny_map3)

ny_map3

Persist our new centers on the comparison map as usual.

In [431]:
for latitude, longitude, label in zip(neighborhoods_data.latitude_avg, neighborhoods_data.longitude_avg, neighborhoods_data.index.get_level_values('Neighborhood')):
    folium.features.CircleMarker(
        [latitude, longitude],
        radius=1, # define how big you want the circle markers to be
        color='white',
        fill=True,
        popup="Center type 3 " + label,
        fill_color='white',
    ).add_to(ny_map_mix)


#### 2.2.3.4 Conclusion on neighborhood center points

Finally let's check the centers side by side.

In [432]:
ny_map_mix

Still not perfect, but the last solution (white dots) probably looks better in most cases, so we will go with it.

We can now drop the geometry columns and rename the columns holding the center coordinates.

In [433]:
neighborhoods_data.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,Median rent,geometry.coordinates,geometry.type,latitude_avg,longitude_avg
Borough,Neighborhood,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Bronx,Bedford Park-Fordham North,1895.0,"[[[-73.883625, 40.867258], [-73.886833, 40.865...",Polygon,40.867171,-73.890891
Bronx,East Concourse-Concourse Village,2225.0,"[[[-73.909587, 40.842756], [-73.909625, 40.842...",Polygon,40.8299,-73.917451
Bronx,Highbridge,2008.0,"[[[-73.917287, 40.845104], [-73.917507, 40.844...",Polygon,40.837168,-73.927736
Bronx,Hunts Point,1937.5,"[[[-73.88439, 40.822967], [-73.883788, 40.8219...",Polygon,40.81111,-73.882411
Bronx,Melrose South-Mott Haven North,1875.0,"[[[-73.901293, 40.820475], [-73.90301, 40.8163...",Polygon,40.818316,-73.91229


In [434]:
neighborhoods_data = neighborhoods_data.drop(['geometry.coordinates', 'geometry.type'], axis = 1, errors = 'ignore').rename({'latitude_avg' : 'Center latitude', 'longitude_avg' : 'Center longitude'}, axis = 1)
neighborhoods_data.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,Median rent,Center latitude,Center longitude
Borough,Neighborhood,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Bronx,Bedford Park-Fordham North,1895.0,40.867171,-73.890891
Bronx,East Concourse-Concourse Village,2225.0,40.8299,-73.917451
Bronx,Highbridge,2008.0,40.837168,-73.927736
Bronx,Hunts Point,1937.5,40.81111,-73.882411
Bronx,Melrose South-Mott Haven North,1875.0,40.818316,-73.91229


Much neater. Now let's proceed with finding the Foursquare venues in the vicinity of each neighborhood.

### 2.2.4 Venues

#### 2.2.4.1 Preparing the Foursquare query

Let's start by creating a query for a particular neighborhood and checking the response.

In [563]:
CLIENT_ID = '' # your Foursquare ID
CLIENT_SECRET = '' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 30
radius = 600
latitude = neighborhoods_data.loc[('Queens', 'Breezy Point-Belle Harbor-Rockaway Park-Broad Channel'), 'Center latitude']
longitude = neighborhoods_data.loc[('Queens', 'Breezy Point-Belle Harbor-Rockaway Park-Broad Channel'), 'Center longitude']
url = f'https://api.foursquare.com/v2/venues/explore?client_id={CLIENT_ID}&client_secret={CLIENT_SECRET}&ll={latitude},{longitude}&v={VERSION}&radius={radius}&limit={LIMIT}'
url

'https://api.foursquare.com/v2/venues/explore?client_id=BQSIXUEPIAXWIRQCLORU30OOCRJLMQVI5CRSMJK4SPGQS1AA&client_secret=PYJOM0MYRLG3HELSEPMD15UZYQKVCDPGD2XBHJF2OFTJLAPS&ll=40.576834250000005,-73.879249&v=20180604&radius=600&limit=30'

In [564]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5ea0a4cf98205d7478938ba8'},
  'headerLocation': 'Brooklyn',
  'headerFullLocation': 'Brooklyn',
  'headerLocationGranularity': 'city',
  'totalResults': 0,
  'suggestedBounds': {'ne': {'lat': 40.58223425540001,
    'lng': -73.8721526483172},
   'sw': {'lat': 40.5714342446, 'lng': -73.8863453516828}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': []}]}}

The response is a fairly complex JSON; we can navigate down into it, apply json_normalize and work with the resulting DataFrame. 

In [476]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [507]:
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) 
nearby_venues.head()

Unnamed: 0,referralId,reasons.count,reasons.items,venue.id,venue.name,venue.location.address,venue.location.lat,venue.location.lng,venue.location.distance,venue.location.postalCode,venue.location.cc,venue.location.city,venue.location.state,venue.location.country,venue.location.formattedAddress,venue.categories,venue.photos.count,venue.photos.groups,venue.location.crossStreet,venue.location.labeledLatLngs,venue.delivery.id,venue.delivery.url,venue.delivery.provider.name,venue.delivery.provider.icon.prefix,venue.delivery.provider.icon.sizes,venue.delivery.provider.icon.name
0,e-0-5578d1fc498efdba911d7b1b-0,0,"[{'summary': 'This spot is popular', 'type': '...",5578d1fc498efdba911d7b1b,High Bridge,Harlem River Dr. & W. 172nd St.,40.842049,-73.927047,546,10452,US,New York,NY,United States,"[Harlem River Dr. & W. 172nd St., New York, NY...","[{'id': '4bf58dd8d48988d1df941735', 'name': 'B...",0,[],,,,,,,,
1,e-0-55df4787498e8cb6cd7d5575-1,0,"[{'summary': 'This spot is popular', 'type': '...",55df4787498e8cb6cd7d5575,Fine Fare,Ogden Ave,40.835736,-73.927793,159,10452,US,Bronx,NY,United States,"[Ogden Ave (W 166 ST), Bronx, NY 10452, United...","[{'id': '50be8ee891d4fa8dcc7199a7', 'name': 'M...",0,[],W 166 ST,"[{'label': 'display', 'lat': 40.83573590003396...",,,,,,
2,e-0-4fb7a862e4b095e4290ec1e0-2,0,"[{'summary': 'This spot is popular', 'type': '...",4fb7a862e4b095e4290ec1e0,Yankee Stadium Track & Field,,40.833397,-73.931335,517,10451,US,Bronx,NY,United States,"[Bronx, NY 10451, United States]","[{'id': '4bf58dd8d48988d106941735', 'name': 'T...",0,[],,"[{'label': 'display', 'lat': 40.83339688020503...",,,,,,
3,e-0-4d3244caeefa8cfa58ee34b3-3,0,"[{'summary': 'This spot is popular', 'type': '...",4d3244caeefa8cfa58ee34b3,Rite Aid,1091 Ogden Ave,40.835608,-73.928108,176,10452,US,Bronx,NY,United States,"[1091 Ogden Ave, Bronx, NY 10452, United States]","[{'id': '4bf58dd8d48988d10f951735', 'name': 'P...",0,[],,"[{'label': 'display', 'lat': 40.8356081, 'lng'...",,,,,,
4,e-0-4c453ad4dcd61b8d0e7f7c56-4,0,"[{'summary': 'This spot is popular', 'type': '...",4c453ad4dcd61b8d0e7f7c56,Mullaly Park,Jerome Avenue,40.83292,-73.924331,553,10452,US,Bronx,NY,United States,"[Jerome Avenue (164th to 167th Street), Bronx,...","[{'id': '4bf58dd8d48988d163941735', 'name': 'P...",0,[],164th to 167th Street,"[{'label': 'display', 'lat': 40.83291951037224...",,,,,,


Drop the columns we don't need.

In [508]:
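# take an explicit copy so the column assignments below do not trigger SettingWithCopyWarning
nearby_venues = nearby_venues.loc[:, ['venue.name', 'venue.categories', 'venue.location.distance']].copy()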
nearby_venues

Unnamed: 0,venue.name,venue.categories,venue.location.distance
0,High Bridge,"[{'id': '4bf58dd8d48988d1df941735', 'name': 'B...",546
1,Fine Fare,"[{'id': '50be8ee891d4fa8dcc7199a7', 'name': 'M...",159
2,Yankee Stadium Track & Field,"[{'id': '4bf58dd8d48988d106941735', 'name': 'T...",517
3,Rite Aid,"[{'id': '4bf58dd8d48988d10f951735', 'name': 'P...",176
4,Mullaly Park,"[{'id': '4bf58dd8d48988d163941735', 'name': 'P...",553
5,Retro Fitness,"[{'id': '4bf58dd8d48988d176941735', 'name': 'G...",429
6,Baskin-Robbins,"[{'id': '4bf58dd8d48988d1c9941735', 'name': 'I...",598
7,Bravo Supermarket,"[{'id': '4bf58dd8d48988d118951735', 'name': 'G...",522
8,Little Caesars Pizza,"[{'id': '4bf58dd8d48988d1ca941735', 'name': 'P...",542
9,Dunkin',"[{'id': '4bf58dd8d48988d148941735', 'name': 'D...",599


Then clean the data: replace the raw category list with the category name and strip the prefixes from the column names.

In [509]:
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,distance
0,High Bridge,Bridge,546
1,Fine Fare,Market,159
2,Yankee Stadium Track & Field,Track,517
3,Rite Aid,Pharmacy,176
4,Mullaly Park,Park,553


In [548]:
g_count = nearby_venues.groupby(['categories']).count().drop('distance', axis = 1)
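# sum only the numeric distance column, so the text column 'name' is not aggregated and cannot clash in the join
g_sum = nearby_venues.groupby(['categories'])[['distance']].sum()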

g_join = g_count.join(g_sum, on = 'categories').sort_values(by = ['name','distance'], ascending = (False,True))
g_join

Unnamed: 0_level_0,name,distance
categories,Unnamed: 1_level_1,Unnamed: 2_level_1
Pizza Place,2,1128
Market,1,159
Bus Station,1,170
Sports Club,1,171
Pharmacy,1,176
Food,1,291
Spanish Restaurant,1,410
Gym,1,429
Train Station,1,508
Track,1,517
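
The count-then-distance ordering shown above can also be produced with a single `groupby` using named aggregation. This is only a minimal sketch on toy data: the frame `toy_venues` and the column names `n_venues` and `total_distance` are assumptions made for illustration and are not part of the notebook.

```python
import pandas as pd

# Toy data standing in for the cleaned venues frame (values are illustrative only)
toy_venues = pd.DataFrame({
    "name": ["Pizza A", "Pizza B", "Fine Fare", "Rite Aid", "Retro Fitness"],
    "categories": ["Pizza Place", "Pizza Place", "Market", "Pharmacy", "Gym"],
    "distance": [586, 542, 159, 176, 429],
})

# One groupby computes both the number of venues and the summed distance per category
ranked = (
    toy_venues.groupby("categories")
    .agg(n_venues=("name", "count"), total_distance=("distance", "sum"))
    # most frequent categories first; ties broken by the smaller total distance
    .sort_values(by=["n_venues", "total_distance"], ascending=[False, True])
)
print(ranked)
```

Computing both aggregates in one pass avoids the join and keeps the two columns clearly named.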


Highbridge doesn't look like a very active place venue-wise. Still, we need only a few categories, so we rank them by the number of venues in each category and, for categories that are equally present, by total distance (smallest first).  
A fixed radius could penalize neighborhoods in the suburbs, where moving around by car is probably less affected by traffic; in some cases our choice of the neighborhood center might also have influenced the results of this operation. So we start from a radius of 600 meters and increase it until at least 5 venues are identified in the area, limiting the number of venues to 100.
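
The expanding-radius idea can be sketched independently of the API. In the sketch below the request is replaced by a hypothetical `fetch(radius)` callable. The names `explore_with_expanding_radius`, `fake_fetch`, `fake_area` and the `max_radius` safety cap are all assumptions introduced for illustration. The notebook itself checks `totalResults` in the Foursquare response rather than the length of a list.

```python
def explore_with_expanding_radius(fetch, start_radius=600, step=100,
                                  min_venues=5, max_radius=5000):
    """Call fetch(radius) with a growing radius until enough venues come back."""
    radius = start_radius
    venues = fetch(radius)
    while len(venues) < min_venues and radius < max_radius:
        radius += step  # widen the search area before asking again
        venues = fetch(radius)
    return radius, venues


# Hypothetical stand-in for the API: (venue name, distance from the center in meters)
fake_area = [("Deli", 120), ("Park", 450), ("Gym", 640),
             ("Diner", 710), ("Bank", 780), ("Cafe", 950)]


def fake_fetch(radius):
    # return every venue that lies within the requested radius
    return [venue for venue in fake_area if venue[1] <= radius]


radius, venues = explore_with_expanding_radius(fake_fetch)
print(radius, len(venues))
```

Increasing the radius before each repeated call ensures that every request covers a larger area than the previous one, and the cap prevents an endless loop in an area with very few venues.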

In [566]:
LIMIT = 100

In [567]:
for index, row in neighborhoods_data.iterrows():
    radius = 600
    count = 5
    #get the coordinates of the neighborhood
    latitude = neighborhoods_data.loc[index, 'Center latitude']
    longitude = neighborhoods_data.loc[index, 'Center longitude']
    #prepare the request
    url = f'https://api.foursquare.com/v2/venues/explore?client_id={CLIENT_ID}&client_secret={CLIENT_SECRET}&ll={latitude},{longitude}&v={VERSION}&radius={radius}&limit={LIMIT}'
    #get the result and parse it
    results = requests.get(url).json()
    #to avoid not having data on some neighborhoods we expand the radius until we find at least count results
    while results['response']['totalResults'] < count:
        #widen the search area before repeating the request
        radius += 100
        #prepare the request
        url = f'https://api.foursquare.com/v2/venues/explore?client_id={CLIENT_ID}&client_secret={CLIENT_SECRET}&ll={latitude},{longitude}&v={VERSION}&radius={radius}&limit={LIMIT}'
        #get the result and parse it
        results = requests.get(url).json()
    
    venues = results['response']['groups'][0]['items']
    #transfer in a df and clean it
    nearby_venues = json_normalize(venues)
    nearby_venues = nearby_venues.loc[:, ['venue.name', 'venue.categories', 'venue.location.distance']]
    nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)
    nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]
    #create a df with the count of venues by category name
    g_count = nearby_venues.groupby(['categories']).count().drop('distance', axis = 1)
    #create a df with the sum of distances by category (numeric column only)
    g_sum = nearby_venues.groupby(['categories'])[['distance']].sum()
    #join the two sets and order them by descending frequency first and ascending distance later
    g_join = g_count.join(g_sum, on = 'categories').sort_values(by = ['name','distance'], ascending = (False,True))
    #reduce the df to the needed number of elements
    g_join = g_join.head(count)
    #store the venue count of each selected category in the neighborhood row
    for category, category_row in g_join.iterrows():
        neighborhoods_data.loc[index, category] = category_row['name']
neighborhoods_data

Unnamed: 0_level_0,Unnamed: 1_level_0,Median rent,Center latitude,Center longitude,Diner,Pizza Place,Clothing Store,Chinese Restaurant,Spanish Restaurant,Grocery Store,Donut Shop,Pharmacy,Bus Station,Fried Chicken Joint,Market,Sports Club,Gym,Bank,BBQ Joint,Farmers Market,Mexican Restaurant,Mobile Phone Shop,Supermarket,Kids Store,Peruvian Restaurant,Auto Workshop,Fast Food Restaurant,Latin American Restaurant,Playground,Art Gallery,River,Garden,Park,Sandwich Place,Italian Restaurant,Deli / Bodega,Department Store,Baseball Field,Bar,Coffee Shop,Rental Car Location,Lounge,Baseball Stadium,Bus Line,Japanese Restaurant,Scenic Lookout,Spa,Greek Restaurant,Cosmetics Shop,Juice Bar,Liquor Store,Sushi Restaurant,Bakery,Eastern European Restaurant,Russian Restaurant,Beach,Wine Shop,Yoga Studio,Music Venue,Discount Store,Caribbean Restaurant,Taco Place,Shopping Mall,Seafood Restaurant,Café,New American Restaurant,Southern / Soul Food Restaurant,Plaza,Breakfast Spot,Arts & Crafts Store,Concert Hall,Gym / Fitness Center,Middle Eastern Restaurant,Train Station,Cocktail Bar,Thai Restaurant,Ice Cream Shop,Bagel Shop,Dessert Shop,American Restaurant,Convenience Store,Sake Bar,Construction & Landscaping,Burger Joint,Turkish Restaurant,Asian Restaurant,Tea Room,Malay Restaurant,Restaurant,Bus Stop,Falafel Restaurant,African Restaurant,Boutique,French Restaurant,Noodle House,Theater,Boxing Gym,Hotel,Bookstore,Boat or Ferry,Exhibit,Lake,IT Services,Indian Restaurant,Supplement Shop,Health & Beauty Service,Athletics & Sports,Video Store,Event Space,Cantonese Restaurant,Steakhouse,Dance Studio,Print Shop,Residential Building (Apartment / Condo),Food Truck,Salad Place,Public Art,Historic Site,Pier,Korean Restaurant,Bubble Tea Shop,Vietnamese Restaurant,Nightclub,School,Sporting Goods Shop,Hotpot Restaurant,Pool,Shoe Store,Rental Service,Empanada Restaurant,South American Restaurant,Library,Vegetarian / Vegan Restaurant,Hobby Shop,Pub,Cafeteria,Museum
Borough,Neighborhood,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1,Unnamed: 22_level_1,Unnamed: 23_level_1,Unnamed: 24_level_1,Unnamed: 25_level_1,Unnamed: 26_level_1,Unnamed: 27_level_1,Unnamed: 28_level_1,Unnamed: 29_level_1,Unnamed: 30_level_1,Unnamed: 31_level_1,Unnamed: 32_level_1,Unnamed: 33_level_1,Unnamed: 34_level_1,Unnamed: 35_level_1,Unnamed: 36_level_1,Unnamed: 37_level_1,Unnamed: 38_level_1,Unnamed: 39_level_1,Unnamed: 40_level_1,Unnamed: 41_level_1,Unnamed: 42_level_1,Unnamed: 43_level_1,Unnamed: 44_level_1,Unnamed: 45_level_1,Unnamed: 46_level_1,Unnamed: 47_level_1,Unnamed: 48_level_1,Unnamed: 49_level_1,Unnamed: 50_level_1,Unnamed: 51_level_1,Unnamed: 52_level_1,Unnamed: 53_level_1,Unnamed: 54_level_1,Unnamed: 55_level_1,Unnamed: 56_level_1,Unnamed: 57_level_1,Unnamed: 58_level_1,Unnamed: 59_level_1,Unnamed: 60_level_1,Unnamed: 61_level_1,Unnamed: 62_level_1,Unnamed: 63_level_1,Unnamed: 64_level_1,Unnamed: 65_level_1,Unnamed: 66_level_1,Unnamed: 67_level_1,Unnamed: 68_level_1,Unnamed: 69_level_1,Unnamed: 70_level_1,Unnamed: 71_level_1,Unnamed: 72_level_1,Unnamed: 73_level_1,Unnamed: 74_level_1,Unnamed: 75_level_1,Unnamed: 76_level_1,Unnamed: 77_level_1,Unnamed: 78_level_1,Unnamed: 79_level_1,Unnamed: 80_level_1,Unnamed: 81_level_1,Unnamed: 82_level_1,Unnamed: 83_level_1,Unnamed: 84_level_1,Unnamed: 85_level_1,Unnamed: 86_level_1,Unnamed: 87_level_1,Unnamed: 88_level_1,Unnamed: 89_level_1,Unnamed: 90_level_1,Unnamed: 91_level_1,Unnamed: 92_level_1,Unnamed: 93_level_1,Unnamed: 94_level_1,Unnamed: 95_level_1,Unnamed: 96_level_1,Unnamed: 97_level_1,Unnamed: 98_level_1,Unnamed: 99_level_1,Unnamed: 100_level_1,Unnamed: 101_level_1,Unnamed: 102_level_1,Unnamed: 103_level_1,Unnamed: 104_level_1,Unnamed: 105_level_1,Unnamed: 106_level_1,Unnamed: 107_level_1,Unnamed: 108_level_1,Unnamed: 109_level_1,Unnamed: 110_level_1,Unnamed: 111_level_1,Unnamed: 112_level_1,Unnamed: 113_level_1,Unnamed: 114_level_1,Unnamed: 115_level_1,Unnamed: 116_level_1,Unnamed: 117_level_1,Unnamed: 118_level_1,Unnamed: 119_level_1,Unnamed: 120_level_1,Unnamed: 121_level_1,Unnamed: 122_level_1,Unnamed: 123_level_1,Unnamed: 124_level_1,Unnamed: 125_level_1,Unnamed: 126_level_1,Unnamed: 127_level_1,Unnamed: 128_level_1,Unnamed: 129_level_1,Unnamed: 130_level_1,Unnamed: 131_level_1,Unnamed: 132_level_1,Unnamed: 133_level_1,Unnamed: 134_level_1,Unnamed: 135_level_1,Unnamed: 136_level_1,Unnamed: 137_level_1
Bronx,Bedford Park-Fordham North,1895.0,40.867171,-73.890891,6.0,2.0,3.0,3.0,2.0,,,,,,,,,,,,,,,,,,,,,,,,1.0,,,,,,,,,,,,,,,,,,,,,,,,,,,2.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
Bronx,East Concourse-Concourse Village,2225.0,40.8299,-73.917451,,,,,,2.0,2.0,3.0,2.0,2.0,,,,,,,,,,,,,,,,,,,,,1.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
Bronx,Highbridge,2008.0,40.837168,-73.927736,,2.0,,,,,,1.0,1.0,,1.0,1.0,1.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
Bronx,Hunts Point,1937.5,40.81111,-73.882411,,1.0,,,1.0,,,,,,,,,1.0,1.0,1.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
Bronx,Melrose South-Mott Haven North,1875.0,40.818316,-73.91229,,3.0,,,,,,,,,,,,,,,4.0,3.0,3.0,3.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
Bronx,Mott Haven-Port Morris,2200.0,40.80635,-73.913399,,2.0,,,,,2.0,,,,,,3.0,,,,,,,,1.0,1.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
Bronx,Mount Hope,1950.0,40.849534,-73.904594,,,,,,4.0,2.0,,,2.0,,,,,,,,,3.0,,,,3.0,3.0,,,,,,2.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
Bronx,North Riverdale-Fieldston-Riverdale,3295.0,40.897635,-73.910543,,,,,,,,,2.0,,,,,,,,,,,,,,,,2.0,1.0,1.0,1.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
Bronx,Norwood,1925.0,40.879282,-73.879629,,4.0,,,3.0,,,3.0,,,,,,3.0,,,,,,,,,,,,,,,3.0,,,5.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,
Bronx,Pelham Parkway,2169.0,40.854845,-73.849788,,,,,,,3.0,2.0,,,,,,2.0,,,,,,,,,,,,,,,,3.0,3.0,3.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,3.0,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,


In [568]:
neighborhoods_data.shape

(139, 136)

We have finally gathered all the needed data. The next steps are calculating the distances from Manhattan and normalizing the data, but these problems will be tackled at a later stage.