# Capstone Project - The Battle of the Neighborhoods 
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

In this project we will try to find an optimal location for a restuarant. In particular for this project we will try to find an optimal location for an **Italian Restaurant** in **Manhattan,NY**.

Since there are a lot of restaurants in and around the many neighborhoods of **Manhattan,NY** we will try to find locations where there aren't that many restaurants and in particular **only a few Italian restaurants in the vicinity**.

We will use our data science powers to generate a few promising locations based on this criteria. Advantages of each one will then be clearly expressed so that best possible final location can be chosen by stakeholders.

## Data <a name="data"></a>

Based on definition of our problem, factors that will influence our decision are:
* number of existing restaurants in the neighborhood (any type of restaurant)
* number of and distance to Italian restaurants in the neighborhood, if any
* distance from neighborhood center as given by latitude,longitude details.

Following data sources will be needed to extract/generate the required information:
* centers of candidate areas will be generated algorithmically and approximate addresses of centers of those areas will be obtained using **Google Maps API reverse geocoding**
* number of restaurants and their type and location in every neighborhood will be obtained using **Foursquare API**

## Methodology <a name="methodology"></a>

To calculate distances from the center to the restaurants and other such distances, it is better to have the Cartesian 2D coordinates for the locations which is done using the pyproj library.

In [1]:
!pip install shapely
import shapely.geometry

!pip install pyproj
import pyproj

import math

def lonlat_to_xy(lon, lat):
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    xy = pyproj.transform(proj_latlon, proj_xy, lon, lat)
    return xy[0], xy[1]

def xy_to_lonlat(x, y):
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    lonlat = pyproj.transform(proj_xy, proj_latlon, x, y)
    return lonlat[0], lonlat[1]

def calc_xy_distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx*dx + dy*dy)

Collecting shapely
[?25l  Downloading https://files.pythonhosted.org/packages/9d/18/557d4f55453fe00f59807b111cc7b39ce53594e13ada88e16738fb4ff7fb/Shapely-1.7.1-cp36-cp36m-manylinux1_x86_64.whl (1.0MB)
[K     |████████████████████████████████| 1.0MB 12.8MB/s eta 0:00:010:01
[?25hInstalling collected packages: shapely
Successfully installed shapely-1.7.1


Now we can import the other necessary libraries pandas(for dataframes), folium(to render maps),numpy(for data manipulation).

In [2]:
import numpy as np 

import pandas as pd 
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json 

!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans

import folium 

Collecting package metadata (current_repodata.json): done
Solving environment: done


  current version: 4.9.1
  latest version: 4.9.2

Please update conda by running

    $ conda update -n base -c defaults conda



## Package Plan ##

  environment location: /home/jupyterlab/conda/envs/python

  added / updated specs:
    - geopy


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    ca-certificates-2020.11.8  |       ha878542_0         145 KB  conda-forge
    certifi-2020.11.8          |   py36h5fab9bb_0         150 KB  conda-forge
    geographiclib-1.50         |             py_0          34 KB  conda-forge
    geopy-2.0.0                |     pyh9f0ad1d_0          63 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         392 KB

The following NEW packages will be INSTALLED:

  geographiclib      conda-forg

Foursquare and GoogleMapsAPI Credentials are defined in the cell hidden below

In [3]:
{
    "tags": [
        "remove-input",
    ]
}
CLIENT_ID = '03B1XSVRMZPPS5PYLPNSXHJNKFPKGIY5OQPL3D1A1MMVLQO5' # your Foursquare ID
CLIENT_SECRET = 'EOCZ45Z2MURMGHP2QPGJAW3L4I3FCUSAGLKLTXDZLRU0PM4Q' # your Foursquare Secret
VERSION = '20201203' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value
google_api_key = 'AIzaSyCjYvm5b4vMCtkdxX98lTP6tyiqlC8_uD0'

In [4]:
!wget -q -O 'newyork_data.json' https://cf-courses-data.s3.us.cloud-object-storage.appdomain.cloud/IBMDeveloperSkillsNetwork-DS0701EN-SkillsNetwork/labs/newyork_data.json
print('Data downloaded!')

Data downloaded!


In [5]:
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

Extracting the relevant neighborhood latitude and longitude information in neghborhoods_data.

In [6]:
neighborhoods_data = newyork_data['features']

column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 
neighborhoods = pd.DataFrame(columns=column_names)

for data in neighborhoods_data:
    borough = neighborhood_name = data['properties']['borough'] 
    neighborhood_name = data['properties']['name']
        
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]
    
    neighborhoods = neighborhoods.append({'Borough': borough,
                                          'Neighborhood': neighborhood_name,
                                          'Latitude': neighborhood_lat,
                                          'Longitude': neighborhood_lon}, ignore_index=True)


In [7]:
address = 'New York City, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of New York City are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of New York City are 40.7127281, -74.0060152.


In [8]:
neighborhoods.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


In [9]:
# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork

Add X and Y Cartesian coordinates for each of the neighborhoods

In [10]:
X = []
Y = []
for lon,lat in zip(neighborhoods['Longitude'],neighborhoods['Latitude']):
    x,y = lonlat_to_xy(lon,lat)
    X.append(x)
    Y.append(y)
    
neighborhoods['X'] = X 
neighborhoods['Y'] = Y 

In [11]:
neighborhoods.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,X,Y
0,Bronx,Wakefield,40.894705,-73.847201,-5790894.0,9850049.0
1,Bronx,Co-op City,40.874294,-73.829939,-5794276.0,9847728.0
2,Bronx,Eastchester,40.887556,-73.827806,-5792026.0,9847524.0
3,Bronx,Fieldston,40.895437,-73.905643,-5790994.0,9857548.0
4,Bronx,Riverdale,40.890834,-73.912585,-5791797.0,9858416.0


Now that we have our location candidates, let's use Foursquare API to get info on restaurants in each neighborhood.

We're interested in venues in 'food' category, but only those that are proper restaurants - coffee shops, pizza places, bakeries etc. are not direct competitors so we don't care about those. So we will include in our list only venues that have 'restaurant' in category name, and we'll make sure to detect and include all the subcategories of specific 'Italian restaurant' category, as we need info on Italian restaurants in the neighborhood.

Category IDs corresponding to Italian restaurants were taken from Foursquare web site (https://developer.foursquare.com/docs/resources/categories):

In [12]:
food_category = '4d4b7105d754a06374d81259' # 'Root' category for all food-related venues

italian_restaurant_categories = ['4bf58dd8d48988d110941735','55a5a1ebe4b013909087cbb6','55a5a1ebe4b013909087cb7c',
                                 '55a5a1ebe4b013909087cba7','55a5a1ebe4b013909087cba1','55a5a1ebe4b013909087cba4',
                                 '55a5a1ebe4b013909087cb95','55a5a1ebe4b013909087cb89','55a5a1ebe4b013909087cb9b',
                                 '55a5a1ebe4b013909087cb98','55a5a1ebe4b013909087cbbf','55a5a1ebe4b013909087cb79',
                                 '55a5a1ebe4b013909087cbb0','55a5a1ebe4b013909087cbb3','55a5a1ebe4b013909087cb74',
                                 '55a5a1ebe4b013909087cbaa','55a5a1ebe4b013909087cb83','55a5a1ebe4b013909087cb8c',
                                 '55a5a1ebe4b013909087cb92','55a5a1ebe4b013909087cb8f','55a5a1ebe4b013909087cb86',
                                 '55a5a1ebe4b013909087cbb9','55a5a1ebe4b013909087cb7f','55a5a1ebe4b013909087cbbc',
                                 '55a5a1ebe4b013909087cb9e','55a5a1ebe4b013909087cbc2','55a5a1ebe4b013909087cbad']

def is_restaurant(categories, specific_filter=None):
    restaurant_words = ['restaurant', 'diner', 'taverna', 'steakhouse']
    restaurant = False
    specific = False
    for c in categories:
        category_name = c[0].lower()
        category_id = c[1]
        for r in restaurant_words:
            if r in category_name:
                restaurant = True
        if 'fast food' in category_name:
            restaurant = False
        if not(specific_filter is None) and (category_id in specific_filter):
            specific = True
            restaurant = True
    return restaurant, specific

def get_categories(categories):
    return [(cat['name'], cat['id']) for cat in categories]

def format_address(location):
    address = ', '.join(location['formattedAddress'])
    return address

def get_venues_near_location(lat, lon, category, client_id, client_secret, radius=1500, limit=100):
    version = '20201203'
    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
        client_id, client_secret, version, lat, lon, category, radius, limit)
    try:
        results = requests.get(url).json()['response']['groups'][0]['items']
        venues = [(item['venue']['id'],
                   item['venue']['name'],
                   get_categories(item['venue']['categories']),
                   (item['venue']['location']['lat'], item['venue']['location']['lng']),
                   format_address(item['venue']['location']),
                   item['venue']['location']['distance']) for item in results]        
    except:
        venues = []
    return venues

Restricting ourselves to only the Manhattan area 

In [13]:
manhattan_data = neighborhoods[neighborhoods['Borough'] == 'Manhattan'].reset_index(drop=True)

Let us now store the details of the restaurants and the italian restaurants within 750m of each of the neighborhood centers.

In [14]:
import pickle

In [15]:
def get_restaurants(lats, lons):
    restaurants = {}
    italian_restaurants = {}
    location_restaurants = []

    print('Obtaining venues around candidate locations:', end='')
    for lat, lon in zip(lats, lons):
        venues = get_venues_near_location(lat, lon, food_category, CLIENT_ID, CLIENT_SECRET, radius=750, limit=100)
        area_restaurants = []
        for venue in venues:
            venue_id = venue[0]
            venue_name = venue[1]
            venue_categories = venue[2]
            venue_latlon = venue[3]
            venue_address = venue[4]
            venue_distance = venue[5]
            is_res, is_italian = is_restaurant(venue_categories, specific_filter=italian_restaurant_categories)
            if is_res:
                x, y = lonlat_to_xy(venue_latlon[1], venue_latlon[0])
                restaurant = (venue_id, venue_name, venue_latlon[0], venue_latlon[1], venue_address, venue_distance, is_italian, x, y)
                if venue_distance<=600:
                    area_restaurants.append(restaurant)
                restaurants[venue_id] = restaurant
                if is_italian:
                    italian_restaurants[venue_id] = restaurant
        location_restaurants.append(area_restaurants)
        print(' .', end='')
    print(' done.')
    return restaurants, italian_restaurants, location_restaurants

# Try to load from local file system in case we did this before
restaurants = {}
italian_restaurants = {}
location_restaurants = []
loaded = False
try:
    with open('restaurants_350.pkl', 'rb') as f:
        restaurants = pickle.load(f)
    with open('italian_restaurants_350.pkl', 'rb') as f:
        italian_restaurants = pickle.load(f)
    with open('location_restaurants_350.pkl', 'rb') as f:
        location_restaurants = pickle.load(f)
    print('Restaurant data loaded.')
    loaded = True
except:
    pass

# If load failed use the Foursquare API to get the data
if not loaded:
    restaurants, italian_restaurants, location_restaurants = get_restaurants(manhattan_data['Latitude'],manhattan_data['Longitude'])
    
    # Let's persists this in local file system
    with open('restaurants_350.pkl', 'wb') as f:
        pickle.dump(restaurants, f)
    with open('italian_restaurants_350.pkl', 'wb') as f:
        pickle.dump(italian_restaurants, f)
    with open('location_restaurants_350.pkl', 'wb') as f:
        pickle.dump(location_restaurants, f)        

Restaurant data loaded.


In [16]:
print('Total number of restaurants:', len(restaurants))
print('Total number of Italian restaurants:', len(italian_restaurants))
print('Percentage of Italian restaurants: {:.2f}%'.format(len(italian_restaurants) / len(restaurants) * 100))
print('Average number of restaurants in neighborhood:', np.array([len(r) for r in location_restaurants]).mean())

Total number of restaurants: 1258
Total number of Italian restaurants: 164
Percentage of Italian restaurants: 13.04%
Average number of restaurants in neighborhood: 38.125


In [17]:
print('List of all restaurants')
print('-----------------------')
for r in list(restaurants.values())[:5]:
    print(r)
print('...')
print('Total:', len(restaurants))

List of all restaurants
-----------------------
('4b79cc46f964a520c5122fe3', 'Tibbett Diner', 40.8804044222466, -73.90893738006402, '3033 Tibbett Ave (btwn 230th & 231st), Bronx, NY 10463, United States', 452, False, -5793547.072627897, 9857897.170961995)
('5217dd2811d2d06ccafb77d3', 'Estrellita Poblana V', 40.879687039717524, -73.906256832975, '240 W 231st St, Bronx, NY 10463, United States', 509, False, -5793658.388995575, 9857549.679802075)
('4bd8e98811dcc928f865f833', 'El Malecon', 40.87933806746814, -73.90445707056641, '5592 Broadway (at W 231st St), Bronx, NY 10463, United States', 607, False, -5793710.681330542, 9857317.011742378)
('503cfaffe4b066d39de5005a', 'Aoyu Japanese Restaurant', 40.88625663623957, -73.90971942607067, '3532A Johnson Ave, Bronx, NY 10463, United States', 1083, False, -5792560.607723963, 9858026.146415643)
('4ca785a597c8a1cd7e577ba5', 'El Economico Restaurant', 40.87933018698782, -73.90459710835415, '5589 Broadway, Bronx, NY 10463, United States', 596, Fals

In [18]:
print('List of all Italian restaurants')
print('-----------------------')
for r in list(italian_restaurants.values())[:10]:
    print(r)
print('...')
print('Total:', len(italian_restaurants))

List of all Italian restaurants
-----------------------
('55aaee4d498e3cbb70e625d6', 'Bella Notte Pizzeria', 40.88659539353357, -73.90955624657254, '3552 Johnson Ave (West 236th Street), Bronx, NY 10463, United States', 1122, True, -5792502.735607199, 9858006.868106764)
('472a027af964a520ea4b1fe3', 'Bacaro', 40.714467897557796, -73.9915893664933, '136 Division St (at Ludlow St), New York, NY 10002, United States', 260, True, -5821945.856693519, 9867750.469715875)
('3fd66200f964a52023eb1ee3', 'Peasant', 40.72172197017359, -73.99445044122072, '194 Elizabeth St (btwn Prince & Spring St), New York, NY 10012, United States', 535, True, -5820725.260110438, 9868152.715473551)
('4cc6222106c25481d7a4a047', 'Rubirosa Ristorante', 40.72270625453151, -73.99595719792266, '235 Mulberry St (btwn Prince & Spring St), New York, NY 10012, United States', 637, True, -5820563.511430034, 9868351.620788395)
('49e4f405f964a52078631fe3', 'Emporio', 40.72263320366371, -73.99512464627406, '231 Mott St (btwn Pri

Now we can see the Restaurants and  the Italian Restaurants around Manhattan on the map

The restaurants are in blue, the italian restaurants are in red, the markers are of the neighborhood centers and the green circles are at a radius of 750m from the centers(i.e the distance within which the restaurant data was collected).

In [19]:
address = 'Manhattan, NY'

geolocator = Nominatim(user_agent="ny_explorer")
manhattan_location = geolocator.geocode(address)
manhattan_latitude = location.latitude
manhattan_longitude = location.longitude

In [20]:
map_manhattan = folium.Map(location=[manhattan_latitude,manhattan_longitude], zoom_start=8)
folium.Marker([manhattan_latitude,manhattan_longitude], popup='Manhattan, NY').add_to(map_manhattan)
for neigh,lat,lon in zip(manhattan_data['Neighborhood'],manhattan_data['Latitude'],manhattan_data['Longitude']):
    folium.Marker([lat,lon], popup=neigh).add_to(map_manhattan)
    folium.Circle([lat,lon], radius=1500, color='green', fill=False).add_to(map_manhattan)
for res in restaurants.values():
    lat = res[2]; lon = res[3]
    is_italian = res[6]
    color = 'red' if is_italian else 'blue'
    folium.CircleMarker([lat, lon], radius=3, color=color, fill=True, fill_color=color, fill_opacity=1).add_to(map_manhattan)
map_manhattan

In [21]:
location_restaurants_count = [len(res) for res in location_restaurants]
manhattan_data['Restaurant count'] = location_restaurants_count
manhattan_data.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,X,Y,Restaurant count
0,Manhattan,Marble Hill,40.876551,-73.91066,-5794205.0,9858099.0,10
1,Manhattan,Chinatown,40.715618,-73.994279,-5821760.0,9868103.0,55
2,Manhattan,Washington Heights,40.851903,-73.9369,-5798470.0,9861349.0,39
3,Manhattan,Inwood,40.867684,-73.92121,-5795743.0,9859410.0,40
4,Manhattan,Hamilton Heights,40.823604,-73.949688,-5803305.0,9862859.0,21


Now that we have the data of the restaurants and the Italian restaurants around the different neighbourhoods, it is time to see the relative positions of the different restaurants to know the areas where there are a lot of restaurants and areas of low density.
To do that we will use **heatmaps** to identify a few promising areas close to center with low number of restaurants in general (*and* no Italian restaurants in vicinity) and focus our attention on those areas.

Finally we will focus on most promising areas and within those create **clusters of locations that meet some basic requirements** established in discussion with stakeholders: we will take into consideration locations with **no more than two restaurants in radius of 250 meters**, and we want locations **without Italian restaurants in radius of 400 meters**. We will present map of all such locations but also create clusters (using **k-means clustering**) of those locations to identify general zones / neighborhoods / addresses which should be a starting point for final 'street level' exploration and search for optimal venue location by stakeholders.

In [22]:
distances_to_italian_restaurant = []

for area_x, area_y in zip(manhattan_data['X'], manhattan_data['Y']):
    min_distance = 10000
    for res in italian_restaurants.values():
        res_x = res[7]
        res_y = res[8]
        d = calc_xy_distance(area_x, area_y, res_x, res_y)
        if d<min_distance:
            min_distance = d
    distances_to_italian_restaurant.append(min_distance)

manhattan_data['Distance to Italian restaurant'] = distances_to_italian_restaurant
manhattan_data.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,X,Y,Restaurant count,Distance to Italian restaurant
0,Manhattan,Marble Hill,40.876551,-73.91066,-5794205.0,9858099.0,10,1704.796454
1,Manhattan,Chinatown,40.715618,-73.994279,-5821760.0,9868103.0,55,398.455676
2,Manhattan,Washington Heights,40.851903,-73.9369,-5798470.0,9861349.0,39,376.515448
3,Manhattan,Inwood,40.867684,-73.92121,-5795743.0,9859410.0,40,894.778175
4,Manhattan,Hamilton Heights,40.823604,-73.949688,-5803305.0,9862859.0,21,385.612084


Now that we have calculated the distance to the nearest Italian Restaurant from every neighbourhood, let us look at the average.

In [23]:
manhattan_data['Distance to Italian restaurant'].mean()

555.4299726132746

OK, so **on average Italian restaurant can be found within ~550m** from every  center candidate. That's fairly close, so we need to filter our areas carefully!

Let's create a map showing **heatmap / density of restaurants** and try to extract some meaningful information from that.

In [24]:
restaurant_latlons = [[res[2], res[3]] for res in restaurants.values()]

italian_latlons = [[res[2], res[3]] for res in italian_restaurants.values()]

Let's plot the heatmap of Restaurants around Manhattan

In [25]:
from folium import plugins
from folium.plugins import HeatMap

map_manhattan = folium.Map(location=[manhattan_latitude,manhattan_longitude], zoom_start=12)
folium.TileLayer('cartodbpositron').add_to(map_manhattan) #cartodbpositron cartodbdark_matter
HeatMap(restaurant_latlons).add_to(map_manhattan)
folium.Marker([manhattan_latitude,manhattan_longitude]).add_to(map_manhattan)
for neigh,lat,lon in zip(manhattan_data['Neighborhood'],manhattan_data['Latitude'],manhattan_data['Longitude']):
    folium.Marker([lat,lon],popup = neigh).add_to(map_manhattan)
map_manhattan

Let's also plot the heatmap of Italian Restaurants around Manhattan

In [26]:
map_manhattan = folium.Map(location=[manhattan_latitude,manhattan_longitude], zoom_start=12)
folium.TileLayer('cartodbpositron').add_to(map_manhattan) #cartodbpositron cartodbdark_matter
HeatMap(italian_latlons).add_to(map_manhattan)
folium.Marker([manhattan_latitude,manhattan_longitude]).add_to(map_manhattan)
for neigh,lat,lon in zip(manhattan_data['Neighborhood'],manhattan_data['Latitude'],manhattan_data['Longitude']):
    folium.Marker([lat,lon],popup = neigh).add_to(map_manhattan)
map_manhattan

In [27]:
manhattan_data.sort_values(by = ['Distance to Italian restaurant'],ascending = False,inplace = True)
manhattan_data.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,X,Y,Restaurant count,Distance to Italian restaurant
0,Manhattan,Marble Hill,40.876551,-73.91066,-5794205.0,9858099.0,10,1704.796454
36,Manhattan,Tudor City,40.746917,-73.971219,-5816372.0,9865272.0,32,1433.494167
11,Manhattan,Roosevelt Island,40.76216,-73.949168,-5813710.0,9862501.0,5,1368.595906
6,Manhattan,Central Harlem,40.815976,-73.943211,-5804573.0,9861989.0,15,1313.973672
7,Manhattan,East Harlem,40.792249,-73.944182,-5808594.0,9862002.0,21,1309.637201


Since there are 40 Neighborhoods in Manhattan, I have chosen to group them into 4 groups based on their proximity using the k-means clustering algorithm and then try to find optimal locations within each cluster.

In [28]:
kclusters = 4
m_data = manhattan_data[['X','Y']]
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(m_data)

In [29]:
kmeans.labels_

array([1, 2, 2, 3, 3, 3, 1, 0, 0, 2, 2, 0, 2, 3, 2, 2, 0, 2, 3, 0, 2, 3,
       0, 3, 0, 1, 0, 0, 2, 0, 0, 2, 0, 3, 2, 0, 0, 3, 0, 0], dtype=int32)

In [30]:
manhattan_data['Cluster'] = kmeans.labels_
manhattan_data.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,X,Y,Restaurant count,Distance to Italian restaurant,Cluster
0,Manhattan,Marble Hill,40.876551,-73.91066,-5794205.0,9858099.0,10,1704.796454,1
36,Manhattan,Tudor City,40.746917,-73.971219,-5816372.0,9865272.0,32,1433.494167,2
11,Manhattan,Roosevelt Island,40.76216,-73.949168,-5813710.0,9862501.0,5,1368.595906,2
6,Manhattan,Central Harlem,40.815976,-73.943211,-5804573.0,9861989.0,15,1313.973672,3
7,Manhattan,East Harlem,40.792249,-73.944182,-5808594.0,9862002.0,21,1309.637201,3


In [31]:
manhattan_data_0 = manhattan_data[manhattan_data['Cluster'] == 0]
manhattan_data_1 = manhattan_data[manhattan_data['Cluster'] == 1]
manhattan_data_2 = manhattan_data[manhattan_data['Cluster'] == 2]
manhattan_data_3 = manhattan_data[manhattan_data['Cluster'] == 3]

Let's plot the different neighborhood clusters in different colors to get a feel for their relative positions.

In [32]:
map_manhattan = folium.Map(location=[manhattan_latitude,manhattan_longitude], zoom_start=12)
folium.Marker([manhattan_latitude,manhattan_longitude], popup='Manhattan, NY').add_to(map_manhattan)
for neigh,lat,lon,c in zip(manhattan_data['Neighborhood'],manhattan_data['Latitude'],manhattan_data['Longitude'],manhattan_data['Cluster']):
    color = 'green'
    if c == 0: color = 'blue'
    elif c == 1: color = 'red'
    elif c == 2: color = 'orange'
    else:
        color = 'green'
    folium.Circle([lat,lon], radius=2, color=color, fill=False).add_to(map_manhattan)
map_manhattan    

In [33]:
manhattan_data_1

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,X,Y,Restaurant count,Distance to Italian restaurant,Cluster
0,Manhattan,Marble Hill,40.876551,-73.91066,-5794205.0,9858099.0,10,1704.796454,1
3,Manhattan,Inwood,40.867684,-73.92121,-5795743.0,9859410.0,40,894.778175,1
2,Manhattan,Washington Heights,40.851903,-73.9369,-5798470.0,9861349.0,39,376.515448,1


The cluster with label 1 hasonly a few neighborhoods and so I decided to skip this cluster. 

In [34]:
manhattan_data_3

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,X,Y,Restaurant count,Distance to Italian restaurant,Cluster
6,Manhattan,Central Harlem,40.815976,-73.943211,-5804573.0,9861989.0,15,1313.973672,3
7,Manhattan,East Harlem,40.792249,-73.944182,-5808594.0,9862002.0,21,1309.637201,3
26,Manhattan,Morningside Heights,40.808,-73.963896,-5805997.0,9864613.0,16,950.338222,3
5,Manhattan,Manhattanville,40.816934,-73.957385,-5804461.0,9863817.0,16,558.917518,3
9,Manhattan,Yorkville,40.77593,-73.947118,-5811369.0,9862302.0,42,446.679598,3
25,Manhattan,Manhattan Valley,40.797307,-73.964286,-5807809.0,9864613.0,0,411.288137,3
4,Manhattan,Hamilton Heights,40.823604,-73.949688,-5803305.0,9862859.0,21,385.612084,3
12,Manhattan,Upper West Side,40.787658,-73.977059,-5809488.0,9866213.0,44,236.585525,3
30,Manhattan,Carnegie Hill,40.782683,-73.953256,-5810247.0,9863125.0,38,62.712476,3


The cluster with label 3 has few restaurants nearby which might mean that this part of the neighborhood may not be a good place to open a restaurant and thus I have decided to skip this cluster also.

Now we will concentrate only on cluster with labels 0 and 2 in a sequential manner.

In [35]:
manhattan_data_0

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,X,Y,Restaurant count,Distance to Italian restaurant,Cluster
20,Manhattan,Lower East Side,40.717807,-73.98089,-5821342.0,9866385.0,25,873.363007,0
37,Manhattan,Stuyvesant Town,40.731,-73.974052,-5819081.0,9865563.0,16,869.476371,0
27,Manhattan,Gramercy,40.73721,-73.981376,-5818053.0,9866537.0,41,584.897238,0
29,Manhattan,Financial District,40.707107,-74.010665,-5823260.0,9870180.0,40,482.309872,0
28,Manhattan,Battery Park City,40.711932,-74.016869,-5822463.0,9871002.0,10,445.094977,0
1,Manhattan,Chinatown,40.715618,-73.994279,-5821760.0,9868103.0,55,398.455676,0
17,Manhattan,Chelsea,40.744035,-74.003116,-5816971.0,9869371.0,41,379.288237,0
32,Manhattan,Civic Center,40.715229,-74.005415,-5821864.0,9869539.0,40,364.478286,0
23,Manhattan,Soho,40.722184,-74.000657,-5820668.0,9868956.0,48,359.922178,0
38,Manhattan,Flatiron,40.739673,-73.990947,-5817669.0,9867782.0,49,296.282204,0


In [36]:
centers = kmeans.cluster_centers_

In [37]:
latlon = []
for c in centers:
    x = xy_to_lonlat(c[0],c[1])
    y = [x[1],x[0]]
    latlon.append(y)
latlon 

[[40.72469497987609, -73.99643969994972],
 [40.86537918829592, -73.92292727589506],
 [40.758830018936685, -73.975109915937],
 [40.80003657063136, -73.9555654344504]]

Let us look at the heatmap of the restaurants in the cluster 0 neighborhoods.

In [38]:
map_manhattan = folium.Map(location=latlon[0], zoom_start=13)
folium.TileLayer('cartodbpositron').add_to(map_manhattan) #cartodbpositron cartodbdark_matter
HeatMap(restaurant_latlons).add_to(map_manhattan)
folium.Marker(latlon[0]).add_to(map_manhattan)
folium.Circle(latlon[0], radius=500, fill=False, color='white').add_to(map_manhattan)
folium.Circle(latlon[0], radius=1000, fill=False, color='white').add_to(map_manhattan)
folium.Circle(latlon[0], radius=1500, fill=False, color='white').add_to(map_manhattan)
for lat,lon in zip(manhattan_data_0['Latitude'],manhattan_data_0['Longitude']):    
    folium.Circle([lat,lon], radius=2, fill=True, color='white').add_to(map_manhattan)
map_manhattan

This is the heatmap of the italian restaurants in cluster 0 neighborhoods

In [39]:
map_manhattan = folium.Map(location=latlon[0], zoom_start=13)
folium.TileLayer('cartodbpositron').add_to(map_manhattan) #cartodbpositron cartodbdark_matter
HeatMap(italian_latlons).add_to(map_manhattan)
folium.Marker(latlon[0]).add_to(map_manhattan)
folium.Circle(latlon[0], radius=500, fill=False, color='white').add_to(map_manhattan)
folium.Circle(latlon[0], radius=1000, fill=False, color='white').add_to(map_manhattan)
folium.Circle(latlon[0], radius=1500, fill=False, color='white').add_to(map_manhattan)
for lat,lon in zip(manhattan_data_0['Latitude'],manhattan_data_0['Longitude']):    
    folium.Circle([lat,lon], radius=2, fill=True, color='white').add_to(map_manhattan)
map_manhattan

We can see from the above map that north of the center there is a region where the density of Italian restaurants is very low while the density of restaurants is reasonably high, thus that is an area of interest to us. 

In [40]:
c_new = [centers[0][0] + 1400,centers[0][1] - 500]
center_lon,center_lat = xy_to_lonlat(c_new[0],c_new[1])
center = [center_lat,center_lon]
center

[40.73286529955981, -73.9922767509276]

In [41]:
map_manhattan = folium.Map(location=latlon[0], zoom_start=13)
folium.TileLayer('cartodbpositron').add_to(map_manhattan) #cartodbpositron cartodbdark_matter
HeatMap(italian_latlons).add_to(map_manhattan)
folium.Marker(latlon[0],popup = 'Centroid of 1st Cluster').add_to(map_manhattan)
folium.Circle(center, radius=750, fill=False, color='white').add_to(map_manhattan)
for lat,lon in zip(manhattan_data_0['Latitude'],manhattan_data_0['Longitude']):    
    folium.Circle([lat,lon], radius=100, fill=False, color='white').add_to(map_manhattan)
map_manhattan

The white circle now contains our area of interest.

The following lines of code is to divide our area into locations that are spaced about 50m with each other. We will then find out the number of restaurants that are nearby and also the distance of the nearest italian restaurant. We would then choose the locations that have less than 2 restaurants within 250m and nearest italian restaurant more than 400m far, as our locations of interest.

In [42]:
center_x = centers[0][0] + 1400
center_y = centers[0][1] - 500

In [43]:
k = math.sqrt(3) / 2 # Vertical offset for hexagonal grid cells
x_step = 50
y_step = 50 * k 
y_min = center_y - 750
x_min = center_x - 800

roi_latitudes = []
roi_longitudes = []
roi_xs = []
roi_ys = []
for i in range(0, int(51/k)):
    y = y_min + i * y_step
    x_offset = 25 if i%2==0 else 0
    for j in range(0, 51):
        x = x_min + j * x_step + x_offset
        d = calc_xy_distance(center_x, center_y, x, y)
        if (d <= 1250):
            lon, lat = xy_to_lonlat(x, y)
            roi_latitudes.append(lat)
            roi_longitudes.append(lon)
            roi_xs.append(x)
            roi_ys.append(y)

print(len(roi_latitudes), 'candidate neighborhood centers generated.')

1705 candidate neighborhood centers generated.


In [44]:
def count_restaurants_nearby(x, y, restaurants, radius = 300):    
    count = 0
    for res in restaurants.values():
        res_x = res[7]; res_y = res[8]
        d = calc_xy_distance(x, y, res_x, res_y)
        if d<=radius:
            count += 1
    return count

def find_nearest_restaurant(x, y, restaurants):
    d_min = 100000
    for res in restaurants.values():
        res_x = res[7]; res_y = res[8]
        d = calc_xy_distance(x, y, res_x, res_y)
        if d<=d_min:
            d_min = d
    return d_min

roi_restaurant_counts = []
roi_italian_distances = []

print('Generating data on location candidates... ', end='')
for x, y in zip(roi_xs, roi_ys):
    count = count_restaurants_nearby(x, y, restaurants, radius=250)
    roi_restaurant_counts.append(count)
    distance = find_nearest_restaurant(x, y, italian_restaurants)
    roi_italian_distances.append(distance)
print('done.')

Generating data on location candidates... done.


In [45]:
df_roi_locations = pd.DataFrame({'Latitude':roi_latitudes,
                                 'Longitude':roi_longitudes,
                                 'X':roi_xs,
                                 'Y':roi_ys,
                                 'Restaurants nearby':roi_restaurant_counts,
                                 'Distance to Italian restaurant':roi_italian_distances})

df_roi_locations.head(10)

Unnamed: 0,Latitude,Longitude,X,Y,Restaurants nearby,Distance to Italian restaurant
0,40.72818,-73.986629,-5819603.0,9867173.0,11,360.783845
1,40.728474,-73.986618,-5819553.0,9867173.0,7,393.064932
2,40.728769,-73.986608,-5819503.0,9867173.0,7,428.759955
3,40.729064,-73.986597,-5819453.0,9867173.0,7,455.68894
4,40.729358,-73.986587,-5819403.0,9867173.0,7,438.841693
5,40.729653,-73.986577,-5819353.0,9867173.0,5,427.213826
6,40.729947,-73.986566,-5819303.0,9867173.0,4,421.237789
7,40.730242,-73.986556,-5819253.0,9867173.0,4,421.154243
8,40.730537,-73.986545,-5819203.0,9867173.0,5,426.966647
9,40.730831,-73.986535,-5819153.0,9867173.0,4,438.440577


In [46]:
good_res_count = np.array((df_roi_locations['Restaurants nearby']<=2))
print('Locations with no more than two restaurants nearby:', good_res_count.sum())

good_ita_distance = np.array(df_roi_locations['Distance to Italian restaurant']>=400)
print('Locations with no Italian restaurants within 400m:', good_ita_distance.sum())

good_locations = np.logical_and(good_res_count, good_ita_distance)
print('Locations with both conditions met:', good_locations.sum())

df_good_locations = df_roi_locations[good_locations]

Locations with no more than two restaurants nearby: 815
Locations with no Italian restaurants within 400m: 826
Locations with both conditions met: 548


Now we will look at our places of interest on the map

In [47]:
good_latitudes = df_good_locations['Latitude'].values
good_longitudes = df_good_locations['Longitude'].values

good_locations = [[lat, lon] for lat, lon in zip(good_latitudes, good_longitudes)]

map_manhattan = folium.Map(location = center, zoom_start=15)
folium.TileLayer('cartodbpositron').add_to(map_manhattan)
HeatMap(restaurant_latlons).add_to(map_manhattan)
folium.Circle(center, radius=750, color='white', fill=True, fill_opacity=0.6).add_to(map_manhattan)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_manhattan) 
map_manhattan

Since the number of locations is large, we will cluster the locations into 10 clusters and give their cluster centers as the final location. The shareholders can then do a street level exploration to find the best location according to their needs(for example monetary, foot traffic etc.)

In [48]:
kclusters = 10

good_xys = df_good_locations[['X', 'Y']].values
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(good_xys)

cluster_centers = [xy_to_lonlat(c[0],c[1]) for c in kmeans.cluster_centers_]

map_manhattan = folium.Map(location = center, zoom_start=16)
folium.TileLayer('cartodbpositron').add_to(map_manhattan)
HeatMap(restaurant_latlons).add_to(map_manhattan)
folium.Circle(center, radius=750, color='white', fill=True, fill_opacity=0.6).add_to(map_manhattan)
for lon, lat in cluster_centers:
    folium.Circle([lat, lon], radius=200, color='green', fill=True, fill_opacity=0.25).add_to(map_manhattan)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_manhattan) 
map_manhattan

Now to reverse geocode the candidate areas to get their addresses which can be presented to the stakeholders.

In [99]:
google_api_key = 'AIzaSyCjYvm5b4vMCtkdxX98lTP6tyiqlC8_uD0 '

In [101]:
def get_address(api_key, latitude, longitude, verbose=False):
    try:
        url = 'https://maps.googleapis.com/maps/api/geocode/json?key={}&latlng={},{}'.format(api_key, latitude, longitude)
        response = requests.get(url).json()
        if verbose:
            print('Google Maps API JSON result =>', response)
        results = response['results']
        address = results[0]['formatted_address']
        return address
    except:
        return None

These are the 10 candidate addresses from our cluster 0 neighborhoods 

In [103]:
for lon, lat in cluster_centers:
    addr = get_address(google_api_key, lat, lon)
    print(addr)

7 E 9th St, New York, NY 10003, USA
89 5th Ave, New York, NY 10003, USA
260 Greene St, New York, NY 10003, USA
780 Broadway, New York, NY 10003, USA
2 Union Square E, New York, NY 10003, USA
76 5th Ave, New York, NY 10011, USA
44 West 4th Street, New York, NY 10012, USA
18 Washington Square N, New York, NY 10011, USA
Union Square Park, 201 Park Ave S, New York, NY 10003, USA
13 Astor Pl, New York, NY 10003, USA


Now we reproduce the same process for the cluster 2 neighborhoods and the addresses we get are in the final cell of this notebook.

In [61]:
manhattan_data_2

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,X,Y,Restaurant count,Distance to Italian restaurant,Cluster
36,Manhattan,Tudor City,40.746917,-73.971219,-5816372.0,9865272.0,32,1433.494167,2
11,Manhattan,Roosevelt Island,40.76216,-73.949168,-5813710.0,9862501.0,5,1368.595906,2
16,Manhattan,Murray Hill,40.748303,-73.978332,-5816162.0,9866195.0,57,764.601177,2
39,Manhattan,Hudson Yards,40.756658,-74.000111,-5814821.0,9869041.0,23,693.004459,2
15,Manhattan,Midtown,40.754691,-73.981669,-5815091.0,9866655.0,40,575.899238,2
35,Manhattan,Turtle Bay,40.752042,-73.967708,-5815491.0,9864843.0,45,531.237097,2
13,Manhattan,Lincoln Square,40.773529,-73.985338,-5811911.0,9867214.0,35,527.213642,2
10,Manhattan,Lenox Hill,40.768113,-73.95886,-5812735.0,9863778.0,54,466.928172,2
14,Manhattan,Clinton,40.759101,-73.996119,-5814393.0,9868537.0,41,435.057407,2
8,Manhattan,Upper East Side,40.775639,-73.960508,-5811466.0,9864025.0,54,302.416688,2


In [62]:
map_manhattan = folium.Map(location=latlon[2], zoom_start=13)
folium.TileLayer('cartodbpositron').add_to(map_manhattan) #cartodbpositron cartodbdark_matter
HeatMap(restaurant_latlons).add_to(map_manhattan)
folium.Marker(latlon[2]).add_to(map_manhattan)
folium.Circle(latlon[2], radius=1000, fill=False, color='white').add_to(map_manhattan)
for lat,lon in zip(manhattan_data_2['Latitude'],manhattan_data_2['Longitude']):    
    folium.Circle([lat,lon], radius=100, fill=False, color='white').add_to(map_manhattan)
map_manhattan

In [63]:
map_manhattan = folium.Map(location=latlon[2], zoom_start=13)
folium.TileLayer('cartodbpositron').add_to(map_manhattan) #cartodbpositron cartodbdark_matter
HeatMap(italian_latlons).add_to(map_manhattan)
folium.Marker(latlon[2]).add_to(map_manhattan)
folium.Circle(latlon[2], radius=1000, fill=False, color='white').add_to(map_manhattan)
for lat,lon in zip(manhattan_data_2['Latitude'],manhattan_data_2['Longitude']):    
    folium.Circle([lat,lon], radius=100, fill=False, color='white').add_to(map_manhattan)
map_manhattan

In [64]:
c_new = [centers[2][0] - 400,centers[2][1] + 500]
center_lon,center_lat = xy_to_lonlat(c_new[0],c_new[1])
center = [center_lat,center_lon]
center

[40.75655198043386, -73.97907091818759]

In [65]:
map_manhattan = folium.Map(location=latlon[2], zoom_start=13)
folium.TileLayer('cartodbpositron').add_to(map_manhattan) #cartodbpositron cartodbdark_matter
HeatMap(italian_latlons).add_to(map_manhattan)
folium.Marker(latlon[2],popup = 'Centroid of 3st Cluster').add_to(map_manhattan)
folium.Circle(center, radius=600, fill=False, color='white').add_to(map_manhattan)
for lat,lon in zip(manhattan_data_2['Latitude'],manhattan_data_2['Longitude']):    
    folium.Circle([lat,lon], radius=100, fill=False, color='white').add_to(map_manhattan)
map_manhattan

In [66]:
center_x = c_new[0]
center_y = c_new[1]

In [67]:
k = math.sqrt(3) / 2 # Vertical offset for hexagonal grid cells
x_step = 50
y_step = 50 * k 
y_min = center_y - 600
x_min = center_x - 600

roi_latitudes = []
roi_longitudes = []
roi_xs = []
roi_ys = []
for i in range(0, int(26/k)):
    y = y_min + i * y_step
    x_offset = 25 if i%2==0 else 0
    for j in range(0, 51):
        x = x_min + j * x_step + x_offset
        d = calc_xy_distance(center_x, center_y, x, y)
        if (d <= 1250):
            lon, lat = xy_to_lonlat(x, y)
            roi_latitudes.append(lat)
            roi_longitudes.append(lon)
            roi_xs.append(x)
            roi_ys.append(y)

print(len(roi_latitudes), 'candidate neighborhood centers generated.')

1081 candidate neighborhood centers generated.


In [68]:
roi_restaurant_counts = []
roi_italian_distances = []

print('Generating data on location candidates... ', end='')
for x, y in zip(roi_xs, roi_ys):
    count = count_restaurants_nearby(x, y, restaurants, radius=250)
    roi_restaurant_counts.append(count)
    distance = find_nearest_restaurant(x, y, italian_restaurants)
    roi_italian_distances.append(distance)
print('done.')

Generating data on location candidates... done.


In [69]:
df_roi_locations = pd.DataFrame({'Latitude':roi_latitudes,
                                 'Longitude':roi_longitudes,
                                 'X':roi_xs,
                                 'Y':roi_ys,
                                 'Restaurants nearby':roi_restaurant_counts,
                                 'Distance to Italian restaurant':roi_italian_distances})

df_roi_locations.head(10)

Unnamed: 0,Latitude,Longitude,X,Y,Restaurants nearby,Distance to Italian restaurant
0,40.753066,-73.974541,-5815341.0,9865728.0,3,851.945982
1,40.753361,-73.97453,-5815291.0,9865728.0,2,831.530251
2,40.753655,-73.97452,-5815241.0,9865728.0,2,813.678783
3,40.75395,-73.974509,-5815191.0,9865728.0,1,798.563563
4,40.754245,-73.974498,-5815141.0,9865728.0,1,786.342398
5,40.75454,-73.974488,-5815091.0,9865728.0,2,777.151832
6,40.754834,-73.974477,-5815041.0,9865728.0,1,771.100235
7,40.755129,-73.974467,-5814991.0,9865728.0,1,768.261788
8,40.755424,-73.974456,-5814941.0,9865728.0,1,768.672088
9,40.755719,-73.974446,-5814891.0,9865728.0,1,772.325955


In [70]:
good_res_count = np.array((df_roi_locations['Restaurants nearby']<=2))
print('Locations with no more than two restaurants nearby:', good_res_count.sum())

good_ita_distance = np.array(df_roi_locations['Distance to Italian restaurant']>=400)
print('Locations with no Italian restaurants within 400m:', good_ita_distance.sum())

good_locations = np.logical_and(good_res_count, good_ita_distance)
print('Locations with both conditions met:', good_locations.sum())

Locations with no more than two restaurants nearby: 765
Locations with no Italian restaurants within 400m: 875
Locations with both conditions met: 601


In [71]:
df = df_roi_locations[good_locations]
dist = []
for x,y in zip(df['X'],df['Y']):
    d = calc_xy_distance(x,y,center_x,center_y)
    dist.append(d)
df['dist'] = dist
df_good_locations = df[df['dist'] < 600]

good_latitudes = df_good_locations['Latitude'].values
good_longitudes = df_good_locations['Longitude'].values

good_locations = [[lat, lon] for lat, lon in zip(good_latitudes, good_longitudes)]

map_manhattan = folium.Map(location = center, zoom_start=16)
folium.TileLayer('cartodbpositron').add_to(map_manhattan)
HeatMap(restaurant_latlons).add_to(map_manhattan)
folium.Circle(center, radius=600, color='white', fill=True, fill_opacity=0.6).add_to(map_manhattan)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_manhattan) 
map_manhattan

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  


In [72]:
map_manhattan = folium.Map(location = center, zoom_start=14)
folium.TileLayer('cartodbpositron').add_to(map_manhattan)
HeatMap(italian_latlons).add_to(map_manhattan)
folium.Circle(center, radius=600, color='white', fill=True, fill_opacity=0.6).add_to(map_manhattan)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_manhattan) 
map_manhattan

In [73]:
kclusters = 10

good_xys = df_good_locations[['X', 'Y']].values
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(good_xys)

cluster_centers = [xy_to_lonlat(c[0],c[1]) for c in kmeans.cluster_centers_]

map_manhattan = folium.Map(location = center, zoom_start=14)
folium.TileLayer('cartodbpositron').add_to(map_manhattan)
HeatMap(restaurant_latlons).add_to(map_manhattan)
folium.Circle(center, radius=600, color='white', fill=True, fill_opacity=0.6).add_to(map_manhattan)
for lon, lat in cluster_centers:
    folium.Circle([lat, lon], radius=200, color='green', fill=True, fill_opacity=0.25).add_to(map_manhattan)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_manhattan) 
map_manhattan

In [177]:
for lon, lat in cluster_centers:
    addr = get_address(google_api_key, lat, lon)
    print(addr)

341 Madison Ave, New York, NY 10017, USA
22 E 50th St, New York, NY 10022, USA
1180 6th Ave, New York, NY 10036, USA
66 E 46th St, New York, NY 10017, USA
33 W 42nd St, New York, NY 10036, USA
6 Av/W 48 St, 6th Ave, New York, NY 10020, USA
14 E 47th St, New York, NY 10017, USA
16 W 51st St, New York, NY 10111, USA
280 Park Ave # 27e, New York, NY 10017, USA
511 5th Ave, New York, NY 10017, USA
