# Segmenting and Clustering Neighborhoods in New York City
We will convert addresses into latitude and longitude values, use the Foursquare API to explore neighborhoods in New York City, and then group neighborhoods with similar characteristics into clusters using k-means.

## Let's load the libraries

In [2]:
import numpy as np

import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json 
from geopy.geocoders import Nominatim

import requests
from pandas import json_normalize  # moved out of pandas.io.json in pandas 1.0

# Plotting Modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means
from sklearn.cluster import KMeans

import folium

print('Libraries Imported!')

Libraries Imported!


## 1. Download and Explore Dataset
We need a dataset that contains the 5 boroughs and all the neighborhoods of NY and their latitude and longitude coordinates.

We can download this dataset from:
https://geo.nyu.edu/catalog/nyu_2451_34572

We have downloaded the JSON file. Let's load it and explore the data.

In [3]:
with open('./nyu-2451-34572-geojson.json') as json_data:
    newyork_data = json.load(json_data)

In [4]:
newyork_data

{'type': 'FeatureCollection',
 'totalFeatures': 306,
 'features': [{'type': 'Feature',
   'id': 'nyu_2451_34572.1',
   'geometry': {'type': 'Point',
    'coordinates': [-73.84720052054902, 40.89470517661]},
   'geometry_name': 'geom',
   'properties': {'name': 'Wakefield',
    'stacked': 1,
    'annoline1': 'Wakefield',
    'annoline2': None,
    'annoline3': None,
    'annoangle': 0.0,
    'borough': 'Bronx',
    'bbox': [-73.84720052054902,
     40.89470517661,
     -73.84720052054902,
     40.89470517661]}},
  {'type': 'Feature',
   'id': 'nyu_2451_34572.2',
   'geometry': {'type': 'Point',
    'coordinates': [-73.82993910812398, 40.87429419303012]},
   'geometry_name': 'geom',
   'properties': {'name': 'Co-op City',
    'stacked': 2,
    'annoline1': 'Co-op',
    'annoline2': 'City',
    'annoline3': None,
    'annoangle': 0.0,
    'borough': 'Bronx',
    'bbox': [-73.82993910812398,
     40.87429419303012,
     -73.82993910812398,
     40.87429419303012]}},
  {'type': 'Feature',
 

All the relevant data is in the features key, which is basically a list of the neighborhoods. Let's define a new variable that includes this data.

In [5]:
neighborhoods_data = newyork_data['features']

Let's take a look at the first item in this list.

In [6]:
neighborhoods_data[0]

{'type': 'Feature',
 'id': 'nyu_2451_34572.1',
 'geometry': {'type': 'Point',
  'coordinates': [-73.84720052054902, 40.89470517661]},
 'geometry_name': 'geom',
 'properties': {'name': 'Wakefield',
  'stacked': 1,
  'annoline1': 'Wakefield',
  'annoline2': None,
  'annoline3': None,
  'annoangle': 0.0,
  'borough': 'Bronx',
  'bbox': [-73.84720052054902,
   40.89470517661,
   -73.84720052054902,
   40.89470517661]}}

### Transform data into a pandas dataframe
Let's transform this data of nested Python dictionaries into a pandas dataframe.

In [25]:
# define the dataframe columns
column_names = ['Borough', 
                'Neighborhood', 
                'Latitude', 
                'Longitude']
# instantiate the dataframe
neighborhoods = pd.DataFrame(columns = column_names)
neighborhoods

|   | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|


Now we will loop through the data, collect one row per neighborhood, and build the dataframe from those rows:

In [26]:
# collect one dict per neighborhood, then build the dataframe in a single step
# (DataFrame.append was removed in pandas 2.0)
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

neighborhoods = pd.DataFrame(rows, columns = column_names)
neighborhoods.head()

|   | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Bronx | Wakefield | 40.894705 | -73.847201 |
| 1 | Bronx | Co-op City | 40.874294 | -73.829939 |
| 2 | Bronx | Eastchester | 40.887556 | -73.827806 |
| 3 | Bronx | Fieldston | 40.895437 | -73.905643 |
| 4 | Bronx | Riverdale | 40.890834 | -73.912585 |
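
The per-feature extraction can also be written as a small helper that flattens one GeoJSON feature into a row. This is only a minimal sketch, not the notebook's own method: `feature_to_row` is a hypothetical name, and the sample feature is a trimmed copy of the Wakefield entry shown earlier. The resulting list of dicts can be passed straight to `pd.DataFrame`.

```python
def feature_to_row(feature):
    """Flatten one GeoJSON point feature into a dict with the four columns used here."""
    # GeoJSON stores point coordinates as [longitude, latitude]
    lon, lat = feature['geometry']['coordinates']
    props = feature['properties']
    return {'Borough': props['borough'],
            'Neighborhood': props['name'],
            'Latitude': lat,
            'Longitude': lon}


# Trimmed copy of the first feature in the dataset (unused keys omitted)
sample_feature = {
    'type': 'Feature',
    'id': 'nyu_2451_34572.1',
    'geometry': {'type': 'Point',
                 'coordinates': [-73.84720052054902, 40.89470517661]},
    'properties': {'name': 'Wakefield', 'borough': 'Bronx'},
}

rows = [feature_to_row(f) for f in [sample_feature]]
print(rows[0])
```

Unpacking the coordinate pair by name makes the longitude-first ordering explicit, which is easy to get backwards when indexing with `[0]` and `[1]`.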


Let's check that the dataset has all 5 boroughs and 306 neighborhoods

In [27]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
    len(neighborhoods['Borough'].unique()),
    neighborhoods.shape[0]
    )
)

The dataframe has 5 boroughs and 306 neighborhoods.


### Use geopy library to get the latitude and longitude values of New York City

In [29]:
address = 'New York City, NY'

geolocator = Nominatim(user_agent = 'ny_explorer')  # recent geopy versions require a user_agent; this name is arbitrary
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of New York City are 40.7308619, -73.9871558.


### Create a map of New York with neighborhoods superimposed on top.

In [34]:
# create a map of New York using latitude and longitude
map_newyork = folium.Map(
    location = [latitude, longitude],
    zoom_start = 10
)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'],
                                           neighborhoods['Longitude'],
                                           neighborhoods['Borough'],
                                           neighborhoods['Neighborhood']
                                          ):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html = True)
    folium.CircleMarker(
        [lat, lng],
        radius = 5,
        popup = label,
        color = 'blue',
        fill = True,
        fill_color ='#3186cc',
        fill_opacity = 0.7
    ).add_to(map_newyork)
map_newyork

For illustration purposes, let's simplify the above map and segment and cluster only the neighborhoods in Manhattan.
Let's slice the original dataframe and create a new dataframe of the Manhattan data.

In [35]:
manhattan_data = neighborhoods[neighborhoods['Borough'] == 'Manhattan'].reset_index(drop = True)
manhattan_data.head()

|   | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Manhattan | Marble Hill | 40.876551 | -73.91066 |
| 1 | Manhattan | Chinatown | 40.715618 | -73.994279 |
| 2 | Manhattan | Washington Heights | 40.851903 | -73.9369 |
| 3 | Manhattan | Inwood | 40.867684 | -73.92121 |
| 4 | Manhattan | Hamilton Heights | 40.823604 | -73.949688 |


Let's visualize Manhattan and its neighborhoods

In [37]:
# Create Map of Manhattan
map_manhattan = folium.Map(
    location = [latitude, longitude],
    zoom_start = 11
)

# Add Markers to Map
for lat, lng, label in zip(manhattan_data['Latitude'],
                           manhattan_data['Longitude'],
                           manhattan_data['Neighborhood']
                          ):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius = 5,
        popup = label,
        color = 'blue',
        fill = True,
        fill_color = '#3186cc',
        fill_opacity = 0.7
    ).add_to(map_manhattan)
map_manhattan

Next, we are going to use the Foursquare API to explore the neighborhoods and segment them.

In [38]:
CLIENT_ID = '0CI31W5DXZVF4WWFI0YSQWYSGZILZG35AF5OHKPNUZT5AKT0' # your Foursquare ID
CLIENT_SECRET = 'HVYTAEBAAT5NU4LMNNTRKIYWF42B31V0KW5DQ2V4SX14MYXG' # your Foursquare Secret
VERSION = '20180602' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 0CI31W5DXZVF4WWFI0YSQWYSGZILZG35AF5OHKPNUZT5AKT0
CLIENT_SECRET:HVYTAEBAAT5NU4LMNNTRKIYWF42B31V0KW5DQ2V4SX14MYXG


Let's explore the first neighborhood in our dataframe

In [39]:
manhattan_data.loc[0, 'Neighborhood']

'Marble Hill'

Get the neighborhood's latitude and longitude.

In [40]:
neighborhood_latitude = manhattan_data.loc[0, 'Latitude']
neighborhood_longitude = manhattan_data.loc[0, 'Longitude']

neighborhood_name = manhattan_data.loc[0, 'Neighborhood']

print('Latitude and Longitude values of {} are {}, {}.'.format(neighborhood_name,
                                                               neighborhood_latitude,
                                                               neighborhood_longitude))

Latitude and Longitude values of Marble Hill are 40.87655077879964, -73.91065965862981.


Let's get the top 100 venues within a radius of 500 meters of Marble Hill.

In [42]:
radius = 500
LIMIT = 100
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, neighborhood_latitude, neighborhood_longitude, VERSION, radius, LIMIT)
results = requests.get(url).json()
results


{'meta': {'code': 410,
  'errorType': 'param_error',
  'errorDetail': 'The Foursquare API no longer supports requests that pass in a version v <= 20120609. For more details see https://developer.foursquare.com/overview/versioning',
  'requestId': '5bc24d224434b9406d722bc7'},
 'response': {}}

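**The remaining cells for this neighborhood are missing because the request above fails.** The 410 error most likely comes from the order of the `format` arguments: `neighborhood_latitude` and `neighborhood_longitude` are passed before `VERSION`, so the latitude fills the `v={}` slot and the API reads it as a version older than 20120609. Passing `VERSION` immediately after `CLIENT_SECRET`, as the function in the next section does, puts each value in its intended slot.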
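
Passing the query parameters by name, rather than through positional `format` slots, makes it harder for a value such as the latitude to land in the `v` slot. The sketch below uses only the standard library and builds the URL without sending a request. `build_explore_url` is a hypothetical helper and the credentials are placeholders; the endpoint and parameter names are the ones used in the request above.

```python
from urllib.parse import urlencode, urlparse, parse_qs

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(client_id, client_secret, version, lat, lng,
                      radius=500, limit=100):
    """Build the venues/explore URL from named parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,                      # API version date, e.g. '20180602'
        'll': '{},{}'.format(lat, lng),    # latitude first, then longitude
        'radius': radius,
        'limit': limit,
    }
    return EXPLORE_ENDPOINT + '?' + urlencode(params)


# Placeholder credentials; the coordinates are the Marble Hill values printed above
url = build_explore_url('YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET', '20180602',
                        40.87655077879964, -73.91065965862981)

# Parse the URL back to confirm each value sits under the intended key
query = parse_qs(urlparse(url).query)
print(query['v'], query['ll'])
```

With `requests`, the same dictionary can be handed over directly as `requests.get(EXPLORE_ENDPOINT, params=params)`, which takes care of the encoding.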

## 2. Explore Neighborhoods in Manhattan
Let's create a function to repeat the same process for all the neighborhoods in Manhattan.

In [43]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Return a dataframe of Foursquare venues near each neighborhood.

    Relies on the globals CLIENT_ID, CLIENT_SECRET, VERSION and LIMIT defined above.
    """
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)

        # make the GET request and keep only the list of venue items
        results = requests.get(url).json()['response']['groups'][0]['items']

        # keep only the relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-neighborhood lists into a single dataframe
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood',
                             'Neighborhood Latitude',
                             'Neighborhood Longitude',
                             'Venue',
                             'Venue Latitude',
                             'Venue Longitude',
                             'Venue Category']

    return nearby_venues
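
The list comprehension in `getNearbyVenues` assumes that every response contains a `groups` list and that every venue has at least one category. As a hedged sketch, the hypothetical helper `extract_venue_rows` below reads the same fields but returns an empty list for a response without groups and falls back to `None` when a venue has no category. The `mock_response` dict is invented for illustration and only mirrors the keys the function above reads; it is not real Foursquare output.

```python
def extract_venue_rows(response_json, name, lat, lng):
    """Return one tuple per venue, in the column order used by getNearbyVenues."""
    groups = response_json.get('response', {}).get('groups', [])
    items = groups[0]['items'] if groups else []

    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories', [])
        # Fall back to None instead of raising IndexError on an empty list
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows


# Invented sample data, shaped like the keys accessed in getNearbyVenues
mock_response = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Example Cafe',
               'location': {'lat': 40.8766, 'lng': -73.9107},
               'categories': [{'name': 'Coffee Shop'}]}},
    {'venue': {'name': 'Example Spot',
               'location': {'lat': 40.8770, 'lng': -73.9100},
               'categories': []}},
]}]}}

rows = extract_venue_rows(mock_response, 'Marble Hill', 40.876551, -73.91066)

# An error payload with an empty 'response' yields no rows instead of a KeyError
error_rows = extract_venue_rows({'meta': {'code': 410}, 'response': {}},
                                'Marble Hill', 40.876551, -73.91066)
print(rows)
print(error_rows)
```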