### Exploring the New York City Center for a New Indian Restaurant

Import Libraries

In [2]:
import pandas as pd
import numpy as np
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
from bs4 import BeautifulSoup
import requests
import os
# Matplotlib and associated plotting modules
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors
%matplotlib inline
!pip install folium
import folium # map rendering library
from geopy.geocoders import Nominatim # convert an address into latitude and longitude
print('Libraries imported.')

Libraries imported.


In [3]:
def geo_location(address):
    # Get the latitude and longitude of an address via Nominatim
    geolocator = Nominatim(user_agent="ny_explorer")
    location = geolocator.geocode(address)
    if location is None:  # geocode() returns None when nothing matches
        return None, None
    return location.latitude, location.longitude

### Neighborhood Candidates
Let's create latitude and longitude coordinates for the centroids of our candidate neighborhoods. We will create a grid of cells covering our area of interest, which is approx. 12x12 kilometers centered on Manhattan.
Let's first find the latitude and longitude of Manhattan, using a specific, well-known address and the Google Maps geocoding API.

Get geospatial data


In [32]:
google_api_key='Your API Key'   # placeholder: the real key was removed before sharing

In [33]:
import requests

def get_coordinates(api_key, address, verbose=False):
    try:
        url = 'https://maps.googleapis.com/maps/api/geocode/json?key={}&address={}'.format(api_key, address)
        response = requests.get(url).json()
        if verbose:
            print('Google Maps API JSON result =>', response)
        results = response['results']
        geographical_data = results[0]['geometry']['location'] # get geographical coordinates
        lat = geographical_data['lat']
        lon = geographical_data['lng']
        return [lat, lon]
    except (requests.RequestException, ValueError, KeyError, IndexError):
        return [None, None]
    
address = 'Manhattan, New York'
manhattan = get_coordinates(google_api_key, address)
print('Coordinate of {}: {}'.format(address, manhattan))

Coordinate of Manhattan, New York: [40.7830603, -73.9712488]
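
The helper above formats the address straight into the URL and maps every failure to `[None, None]`. A minimal sketch of two optional refinements, assuming the usual Geocoding API JSON layout with a top-level `status` field next to `results`: escape the query with `urllib.parse.urlencode`, and read `status` before indexing into `results`. The names `build_geocode_url` and `parse_geocode_response` and the sample payload are illustrative and not part of the original notebook.

```python
from urllib.parse import urlencode

GEOCODE_ENDPOINT = 'https://maps.googleapis.com/maps/api/geocode/json'

def build_geocode_url(api_key, address):
    # urlencode escapes spaces and commas in the address
    return GEOCODE_ENDPOINT + '?' + urlencode({'key': api_key, 'address': address})

def parse_geocode_response(response):
    # Return [lat, lon], or [None, None] when the API reports no usable result
    if response.get('status') != 'OK' or not response.get('results'):
        return [None, None]
    location = response['results'][0]['geometry']['location']
    return [location['lat'], location['lng']]

# Hand-written sample payload in the shape the notebook's code already relies on
sample = {'status': 'OK',
          'results': [{'geometry': {'location': {'lat': 40.7830603, 'lng': -73.9712488}}}]}

print(build_geocode_url('YOUR_KEY', 'Manhattan, New York'))
print(parse_geocode_response(sample))
print(parse_geocode_response({'status': 'ZERO_RESULTS', 'results': []}))
```

Checking `status` makes it possible to tell "no match" apart from a quota or key problem, which the catch-all fallback hides.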


Now let's create a grid of candidate areas, equally spaced, centered on New York City and within ~6 km of Manhattan. Our neighborhoods will be defined as circular areas with a radius of 300 meters, so our neighborhood centers will be 600 meters apart.

To calculate distances accurately, we need to create our grid of locations in a 2D Cartesian coordinate system, which lets us calculate distances in meters (not in latitude/longitude degrees). Then we'll project those coordinates back to latitude/longitude degrees to show them on a Folium map. So let's create functions to convert between the WGS84 spherical coordinate system (latitude/longitude degrees) and the UTM Cartesian coordinate system (X/Y coordinates in meters).

In [34]:
!pip install shapely
import shapely.geometry

!pip install pyproj
import pyproj

import math

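def lonlat_to_xy(lon, lat):
    # Project WGS84 longitude/latitude (degrees) to UTM X/Y (meters).
    # Caution: zone=33 covers 12°E to 18°E; New York City lies in UTM zone 18,
    # so the X/Y values and distances printed in this notebook are distorted.
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    xy = pyproj.transform(proj_latlon, proj_xy, lon, lat)
    return xy[0], xy[1]

def xy_to_lonlat(x, y):
    # Inverse projection: UTM X/Y (meters) back to WGS84 longitude/latitude
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    lonlat = pyproj.transform(proj_xy, proj_latlon, x, y)
    return lonlat[0], lonlat[1]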

def calc_xy_distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx*dx + dy*dy)

print('Coordinate transformation check')
print('-------------------------------')
print('Manhattan longitude={}, latitude={}'.format(manhattan[1], manhattan[0]))
x, y = lonlat_to_xy(manhattan[1], manhattan[0])
print('Manhattan UTM X={}, Y={}'.format(x, y))
lo, la = xy_to_lonlat(x, y)
print('Manhattan longitude={}, latitude={}'.format(lo, la))

Coordinate transformation check
-------------------------------
Manhattan longitude=-73.9712488, latitude=40.7830603
Manhattan UTM X=-5810246.805659814, Y=9865443.186247082
Manhattan longitude=-73.97124879999963, latitude=40.783060299998894
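
The check prints X ≈ -5.8 million, whereas UTM eastings inside their own zone stay within a few hundred kilometers of the 500,000 m false easting. The helpers project with `zone=33`, which spans 12°E to 18°E, far from New York. As a sketch (the helper names are illustrative, and the special-case zones around Norway and Svalbard are ignored), the zone for a given longitude follows from the 6-degree zone width:

```python
def utm_zone(longitude):
    # Standard UTM zones are 6 degrees wide, numbered 1..60 eastward from 180°W
    return int((longitude + 180) // 6) % 60 + 1

def utm_north_epsg(zone):
    # EPSG codes for WGS84 / UTM northern-hemisphere zones are 32600 + zone
    return 32600 + zone

manhattan_lon = -73.9712488   # longitude printed by the notebook
berlin_lon = 13.4             # approximate longitude of Berlin, used only for comparison

print(utm_zone(manhattan_lon), utm_north_epsg(utm_zone(manhattan_lon)))
print(utm_zone(berlin_lon))
```

For Manhattan this gives zone 18 (EPSG:32618). Re-running the notebook with that zone would change every X/Y value and distance shown in the recorded outputs.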


Let's create a hexagonal grid of cells: we offset every other row and adjust the vertical row spacing so that every cell center is equally distant from all its neighbors.

In [35]:
manhattan_x, manhattan_y = lonlat_to_xy(manhattan[1], manhattan[0]) # Manhattan in Cartesian coordinates

k = math.sqrt(3) / 2 # Vertical offset for hexagonal grid cells
x_min = manhattan_x - 6000
x_step = 600
y_min = manhattan_y - 6000 - (int(21/k)*k*600 - 12000)/2
y_step = 600 * k 

latitudes = []
longitudes = []
distances_from_center = []
xs = []
ys = []
for i in range(0, int(21/k)):
    y = y_min + i * y_step
    x_offset = 300 if i%2==0 else 0
    for j in range(0, 21):
        x = x_min + j * x_step + x_offset
        distance_from_center = calc_xy_distance(manhattan_x, manhattan_y, x, y)
        if (distance_from_center <= 6001):
            lon, lat = xy_to_lonlat(x, y)
            latitudes.append(lat)
            longitudes.append(lon)
            distances_from_center.append(distance_from_center)
            xs.append(x)
            ys.append(y)

print(len(latitudes), 'candidate neighborhood centers generated.')

364 candidate neighborhood centers generated.
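
The factor `k = sqrt(3)/2` in the cell above comes from requiring equal neighbor distances: with horizontal spacing s and alternate rows shifted by s/2, the row height h must satisfy (s/2)² + h² = s², so h = s·√3/2. A small self-contained sketch (the function `hex_grid` is illustrative, using the same offset rule as the notebook's loop) that checks this on a 3x3 patch:

```python
import math

STEP = 600                      # horizontal spacing between centers, in meters
K = math.sqrt(3) / 2            # row spacing factor for a hexagonal grid

def hex_grid(rows, cols, step=STEP):
    # Offset even rows by half a step; rows are step * K apart
    points = []
    for i in range(rows):
        x_offset = step / 2 if i % 2 == 0 else 0
        for j in range(cols):
            points.append((j * step + x_offset, i * step * K))
    return points

points = hex_grid(3, 3)
center = points[4]              # middle cell of the middle row
neighbor_distances = sorted(
    round(math.hypot(x - center[0], y - center[1]), 6)
    for (x, y) in points if (x, y) != center
)
print(neighbor_distances)
```

Six of the eight surrounding points sit at the same 600 m distance from the middle cell; the remaining two are corner points of the patch and belong to the next ring.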


Let's visualize the data we have so far: city center location and candidate neighborhood centers:

In [36]:
!pip install folium

import folium



In [37]:
map_nyc = folium.Map(location=manhattan, zoom_start=13)
folium.Marker(manhattan, popup='Manhattan').add_to(map_nyc)
for lat, lon in zip(latitudes, longitudes):
    folium.Circle([lat, lon], radius=300, color='blue', fill=False).add_to(map_nyc)
map_nyc

In [38]:
def get_address(api_key, latitude, longitude, verbose=False):
    try:
        url = 'https://maps.googleapis.com/maps/api/geocode/json?key={}&latlng={},{}'.format(api_key, latitude, longitude)
        response = requests.get(url).json()
        if verbose:
            print('Google Maps API JSON result =>', response)
        results = response['results']
        address = results[0]['formatted_address']
        return address
    except (requests.RequestException, ValueError, KeyError, IndexError):
        return None

addr = get_address(google_api_key, manhattan[0], manhattan[1])
print('Reverse geocoding check')
print('-----------------------')
print('Address of [{}, {}] is: {}'.format(manhattan[0], manhattan[1], addr))

Reverse geocoding check
-----------------------
Address of [40.7830603, -73.9712488] is: 225 Central Park West, New York, NY 10024, USA


In [39]:
print('Obtaining location addresses: ', end='')
addresses = []
for lat, lon in zip(latitudes, longitudes):
    address = get_address(google_api_key, lat, lon)
    if address is None:
        address = 'NO ADDRESS'
    address = address.replace(', USA', '') # We don't need country part of address
    addresses.append(address)
    print(' .', end='')
print(' done.')

Obtaining location addresses:  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . done.


In [40]:
addresses[100:150]

['414 E 71st St, New York, NY 10021',
 '325 E 75th St, New York, NY 10021',
 '237 E 79th St, New York, NY 10075',
 '1486 3rd Ave, New York, NY 10028',
 '120 E 88th St, New York, NY 10128',
 '63 E 92nd St, New York, NY 10128',
 '6 E 97th St, New York, NY 10029',
 'East Dr, New York, NY 10029',
 'East Dr, New York, NY 10029',
 '145 Central Park N, New York, NY 10026',
 '208 W 114th St, New York, NY 10026',
 '2190 Frederick Douglass Blvd, New York, NY 10026',
 '539 Manhattan Ave, New York, NY 10027',
 '412 W 127th St, New York, NY 10027',
 'E Rd, New York, NY 10044',
 '54 Bond St, New York, NY 10012',
 '25 Sutton Pl, New York, NY 10022',
 '403 E 62nd St, New York, NY 10065',
 '310 E 67th St, New York, NY 10065',
 '243 E 71st St, New York, NY 10021',
 '1308 3rd Ave, New York, NY 10021',
 '116 E 80th St, New York, NY 10075',
 '48 E 84th St, New York, NY 10028',
 'a, New York, NY 10128',
 '5 Ave & E 93 St, New York, NY 10128',
 '97th St Transverse, New York, NY 10029',
 'Central Park Drivewa

Let's now place all this into a Pandas dataframe.

In [41]:

df_locations = pd.DataFrame({'Address': addresses,
                             'Latitude': latitudes,
                             'Longitude': longitudes,
                             'X': xs,
                             'Y': ys,
                             'Distance from center': distances_from_center})

df_locations.head(10)

| | Address | Latitude | Longitude | X | Y | Distance from center |
|---|---|---|---|---|---|---|
| 0 | 24-43 28th St, Astoria, NY 11102 | 40.771502 | -73.927293 | -5812047.0 | 9859727.0 | 5992.495307 |
| 1 | 25-33 14th Pl, Long Island City, NY 11102 | 40.775041 | -73.92716 | -5811447.0 | 9859727.0 | 5840.3767 |
| 2 | I-278, Astoria, NY 11102 | 40.778579 | -73.927028 | -5810847.0 | 9859727.0 | 5747.173218 |
| 3 | 40.7816884 -73.9269238, Wards Meadow Loop, New... | 40.782118 | -73.926895 | -5810247.0 | 9859727.0 | 5715.767665 |
| 4 | 125 Hell Gate Cir, New York, NY 10035 | 40.785657 | -73.926762 | -5809647.0 | 9859727.0 | 5747.173218 |
| 5 | Main Rdwy/ Manhattan Psyc Ctr, New York, NY 10035 | 40.789197 | -73.926629 | -5809047.0 | 9859727.0 | 5840.3767 |
| 6 | 20 Randalls Is Rd, New York, NY 10035 | 40.792736 | -73.926496 | -5808447.0 | 9859727.0 | 5992.495307 |
| 7 | 14-44 31st Dr, Long Island City, NY 11106 | 40.766282 | -73.931522 | -5812947.0 | 9860247.0 | 5855.766389 |
| 8 | 30-56 14th St, Long Island City, NY 11102 | 40.76982 | -73.93139 | -5812347.0 | 9860247.0 | 5604.462508 |
| 9 | 27-16 28th Ave, Long Island City, NY 11102 | 40.773359 | -73.931258 | -5811747.0 | 9860247.0 | 5408.326913 |


In [42]:
df_locations.to_pickle('./locations.pkl')

### Foursquare API data


In [43]:
# API credentials removed before sharing
CLIENT_ID = 'your Foursquare ID' # your Foursquare ID
CLIENT_SECRET = 'your Foursquare Secret' # your Foursquare Secret
VERSION = '20200406'

In [44]:
LIMIT = 30

In [45]:
# Category IDs corresponding to Indian restaurants were taken from the Foursquare website (https://developer.foursquare.com/docs/resources/categories):

food_category = '4d4b7105d754a06374d81259' # 'Root' category for all food-related venues

indian_restaurant_categories = ['4bf58dd8d48988d10f941735','54135bf5e4b08f3d2429dfe5','54135bf5e4b08f3d2429dff3',
                                 '54135bf5e4b08f3d2429dff5','54135bf5e4b08f3d2429dfe2','54135bf5e4b08f3d2429dff2',
                                 '54135bf5e4b08f3d2429dfe1','54135bf5e4b08f3d2429dfe3','54135bf5e4b08f3d2429dfe8',
                                 '54135bf5e4b08f3d2429dfe9','54135bf5e4b08f3d2429dfe6','54135bf5e4b08f3d2429dfdf',
                                 '54135bf5e4b08f3d2429dfe4','54135bf5e4b08f3d2429dfe7','54135bf5e4b08f3d2429dfea',
                                 '54135bf5e4b08f3d2429dfeb','54135bf5e4b08f3d2429dfed','54135bf5e4b08f3d2429dfee',
                                 '54135bf5e4b08f3d2429dff4','54135bf5e4b08f3d2429dfe0','54135bf5e4b08f3d2429dfdd',
                                 '54135bf5e4b08f3d2429dff6','54135bf5e4b08f3d2429dfef','54135bf5e4b08f3d2429dff0',
                                 '54135bf5e4b08f3d2429dff1','54135bf5e4b08f3d2429dfde','54135bf5e4b08f3d2429dfec']

def is_restaurant(categories, specific_filter=None):
    restaurant_words = ['restaurant', 'diner']
    restaurant = False
    specific = False
    for c in categories:
        category_name = c[0].lower()
        category_id = c[1]
        for r in restaurant_words:
            if r in category_name:
                restaurant = True
        if 'fast food' in category_name:
            restaurant = False
        if specific_filter is not None and category_id in specific_filter:
            specific = True
            restaurant = True
    return restaurant, specific

def get_categories(categories):
    return [(cat['name'], cat['id']) for cat in categories]

def format_address(location):
    address = ', '.join(location['formattedAddress'])
    address = address.replace(', United States', '')
    address = address.replace(', USA', '')
    return address

def get_venues_near_location(lat, lon, category, client_id, client_secret, radius=500, limit=100):
    version = '20200406'
    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
        client_id, client_secret, version, lat, lon, category, radius, limit)
    try:
        results = requests.get(url).json()['response']['groups'][0]['items']
        venues = [(item['venue']['id'],
                   item['venue']['name'],
                   get_categories(item['venue']['categories']),
                   (item['venue']['location']['lat'], item['venue']['location']['lng']),
                   format_address(item['venue']['location']),
                   item['venue']['location']['distance']) for item in results]        
    except (requests.RequestException, ValueError, KeyError, IndexError):
        venues = []
    return venues
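
To make the classification rules in `is_restaurant` concrete, here is a condensed copy of the function applied to a few hand-made category lists. The category names and the `hypothetical-id-*` values are invented for illustration; only the first ID from `indian_restaurant_categories` is taken from the notebook.

```python
INDIAN_CATEGORY_ID = '4bf58dd8d48988d10f941735'  # first ID in indian_restaurant_categories

def is_restaurant(categories, specific_filter=None):
    # Condensed copy of the notebook's function, same decision rules
    restaurant = False
    specific = False
    for name, category_id in categories:
        lowered = name.lower()
        if 'restaurant' in lowered or 'diner' in lowered:
            restaurant = True
        if 'fast food' in lowered:
            restaurant = False
        if specific_filter is not None and category_id in specific_filter:
            specific = True
            restaurant = True
    return restaurant, specific

# Category names and the non-Indian IDs below are made up for illustration
samples = {
    'italian': [('Italian Restaurant', 'hypothetical-id-1')],
    'fast food': [('Fast Food Restaurant', 'hypothetical-id-2')],
    'indian': [('Indian Restaurant', INDIAN_CATEGORY_ID)],
    'cafe': [('Café', 'hypothetical-id-3')],
}
for label, categories in samples.items():
    print(label, is_restaurant(categories, specific_filter=[INDIAN_CATEGORY_ID]))
```

A venue counts as a restaurant when a category name contains "restaurant" or "diner" and is not fast food, while a match on the ID filter always wins, so an Indian venue is kept regardless of how its category is named.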

In [46]:
# Let's now go over our neighborhood locations and get nearby restaurants; we'll also maintain a dictionary of all found restaurants and all found Indian restaurants

import pickle

def get_restaurants(lats, lons):
    restaurants = {}
    indian_restaurants = {}
    location_restaurants = []

    print('Obtaining venues around candidate locations:', end='')
    for lat, lon in zip(lats, lons):
        # Using radius=350 to make sure we have overlaps/full coverage so we don't miss any restaurant (we're using dictionaries to remove any duplicates resulting from area overlaps)
        venues = get_venues_near_location(lat, lon, food_category, CLIENT_ID, CLIENT_SECRET, radius=350, limit=100)
        area_restaurants = []
        for venue in venues:
            venue_id = venue[0]
            venue_name = venue[1]
            venue_categories = venue[2]
            venue_latlon = venue[3]
            venue_address = venue[4]
            venue_distance = venue[5]
            is_res, is_indian = is_restaurant(venue_categories, specific_filter=indian_restaurant_categories)
            if is_res:
                x, y = lonlat_to_xy(venue_latlon[1], venue_latlon[0])
                restaurant = (venue_id, venue_name, venue_latlon[0], venue_latlon[1], venue_address, venue_distance, is_indian, x, y)
                if venue_distance<=300:
                    area_restaurants.append(restaurant)
                restaurants[venue_id] = restaurant
                if is_indian:
                    indian_restaurants[venue_id] = restaurant
        location_restaurants.append(area_restaurants)
        print(' .', end='')
    print(' done.')
    return restaurants, indian_restaurants, location_restaurants

# Try to load from local file system in case we did this before
restaurants = {}
indian_restaurants = {}
location_restaurants = []
loaded = False
try:
    with open('restaurants_350.pkl', 'rb') as f:
        restaurants = pickle.load(f)
    with open('indian_restaurants_350.pkl', 'rb') as f:
        indian_restaurants = pickle.load(f)
    with open('location_restaurants_350.pkl', 'rb') as f:
        location_restaurants = pickle.load(f)
    print('Restaurant data loaded.')
    loaded = True
except (OSError, EOFError, pickle.UnpicklingError):
    pass

# If load failed use the Foursquare API to get the data
if not loaded:
    restaurants, indian_restaurants, location_restaurants = get_restaurants(latitudes, longitudes)
    
    # Let's persist this in the local file system
    with open('restaurants_350.pkl', 'wb') as f:
        pickle.dump(restaurants, f)
    with open('indian_restaurants_350.pkl', 'wb') as f:
        pickle.dump(indian_restaurants, f)
    with open('location_restaurants_350.pkl', 'wb') as f:
        pickle.dump(location_restaurants, f)

Obtaining venues around candidate locations: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . done.
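
The choice of `radius=350` for the query, versus 300 m for counting venues per area, can be checked with a short calculation. This is a sketch that assumes the grid centers really are 600 m apart on the ground; the variable names are illustrative.

```python
import math

spacing = 600          # distance between neighboring grid centers (m)
counted_radius = 300   # venues within this distance count toward an area
query_radius = 350     # radius passed to the Foursquare query

# In a hexagonal lattice, the point farthest from every center is a vertex of
# the hexagonal cell around a center; it lies spacing / sqrt(3) away.
worst_case = spacing / math.sqrt(3)

print(round(worst_case, 1))          # 346.4
print(query_radius >= worst_case)    # True
print(counted_radius >= worst_case)  # False
```

Circles of 300 m leave small gaps between three adjacent centers, while 350 m circles cover the plane completely. The resulting overlaps are why venues are stored in dictionaries keyed by venue ID.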


In [47]:
print('Total number of restaurants:', len(restaurants))
print('Total number of Indian restaurants:', len(indian_restaurants))
print('Percentage of Indian restaurants: {:.2f}%'.format(len(indian_restaurants) / len(restaurants) * 100))
print('Average number of restaurants in neighborhood:', np.array([len(r) for r in location_restaurants]).mean())

Total number of restaurants: 1492
Total number of Indian restaurants: 64
Percentage of Indian restaurants: 4.29%
Average number of restaurants in neighborhood: 7.3489010989010985


In [48]:
print('List of all restaurants')
print('-----------------------')
for r in list(restaurants.values())[:10]:
    print(r)
print('...')
print('Total:', len(restaurants))

List of all restaurants
-----------------------
('49c68eaaf964a5205c571fe3', 'Vesta Trattoria & Wine Bar', 40.76980934497303, -73.9277960938928, '21-02 30th Ave (at 21st St), Astoria, NY 11102', 302, False, -5812335.513822393, 9859784.009561414)
('597cdab7bd40092a30c3f2bc', 'Astoria Provisions', 40.77218710579064, -73.92895119263953, '12-23 Astoria Blvd (at 14th St), Long Island City, NY 11102', 234, False, -5811936.871087021, 9859944.285675734)
('4ab6d166f964a5202f7920e3', 'Roti Boti', 40.77198140414186, -73.92612563916222, '2709 21st St (btw Astoria Blvd & 27th Rd), Astoria, NY 11102', 111, True, -5811961.344086435, 9859579.330063501)
('502e6f61e4b0eed9c3113816', 'El Ancla', 40.77097725983302, -73.92705823255376, '28-08 21st St, Astoria, NY 11102', 61, False, -5812134.911398285, 9859694.603966929)
('4f345f48e4b03a18765668a9', 'La Herradura', 40.771986322122, -73.92548524462315, '2109 Astoria Blvd (Astoria Blvd and Newtown), Astoria, NY 11102', 161, False, -5811958.154462351, 9859496.

In [49]:
print('List of Indian restaurants')
print('---------------------------')
for r in list(indian_restaurants.values())[:10]:
    print(r)
print('...')
print('Total:', len(indian_restaurants))

List of Indian restaurants
---------------------------
('4ab6d166f964a5202f7920e3', 'Roti Boti', 40.77198140414186, -73.92612563916222, '2709 21st St (btw Astoria Blvd & 27th Rd), Astoria, NY 11102', 111, True, -5811961.344086435, 9859579.330063501)
('4a60aa67f964a520f9c01fe3', 'Polash Indian restaurant', 40.79974477464853, -73.9387466714612, '2179 3rd Ave (at 119th St.), New York, NY 10035', 308, True, -5807304.652044634, 9861337.753099667)
('4ed4034102d5feaa206baf69', '21st Halal Kitchen INC.', 40.755577087402344, -73.9411392211914, '39-44 21st St (40th Ave), Long Island City, NY 11101', 333, True, -5814796.138218673, 9861434.972364187)
('58dd31f6d25ded1a4f6e8e3a', 'Yeti Spice Grill', 40.780301771768876, -73.94661962985992, '1764 1st Ave, New York, NY 10128', 68, True, -5810626.338013608, 9862258.697280634)
('545eb9de498ed803ac5569c4', 'SPICEHUT INDIAN RESTAURANT', 40.789675, -73.942797, '2036 2nd Ave (105 Street), New York, NY 10029', 14, True, -5809024.757389879, 9861811.059048066)

In [50]:
print('Restaurants around location')
print('---------------------------')
for i in range(100, 110):
    rs = location_restaurants[i][:8]
    names = ', '.join([r[1] for r in rs])
    print('Restaurants around location {}: {}'.format(i+1, names))

Restaurants around location
---------------------------
Restaurants around location 101: 
Restaurants around location 102: Up Thai, Boqueria, Uva, THEP Thai Restaurant, The Meatball Shop, Sushi Ishikawa, Heidi's House By The Side Of The Road, Bohemian Spirit Restaurant
Restaurants around location 103: Heidi's House By The Side Of The Road, Uva, Luke's Lobster, San Matteo Pizzeria e Cucina, Pil Pil, Calexico, Caffe Buon Gusto - Manhattan, Quality Eats
Restaurants around location 104: Flex Mussels, Toloache 82, Elio's, Dulce Vida Latin Bistro, Antonucci, The Simone, Beyoglu, Lexington Candy Shop Luncheonette
Restaurants around location 105: Dig Inn, Lex Restaurant, Ooki Sushi, Wok 88, Naruto Ramen, Guzan Sushi & Bar, Lolita's Kitchen, Peri Ela
Restaurants around location 106: Russ & Daughters, Paola's Restaurant, Table d'Hote, Lex Restaurant, Sfoglia, Pascalou, Lolita's Kitchen, Peri Ela
Restaurants around location 107: Earl's Beer & Cheese, Tre Otto
Restaurants around location 108: 
Res

Let's now see all the collected restaurants in our area of interest on a map, and let's also show Indian restaurants in a different color.

In [51]:
map_nyc = folium.Map(location=manhattan, zoom_start=13)
folium.Marker(manhattan, popup='Manhattan').add_to(map_nyc)
for res in restaurants.values():
    lat = res[2]; lon = res[3]
    is_indian = res[6]
    color = 'red' if is_indian else 'blue'
    folium.CircleMarker([lat, lon], radius=3, color=color, fill=True, fill_color=color, fill_opacity=1).add_to(map_nyc)
map_nyc

Now we have all the restaurants in the area within a few kilometers of Manhattan, and we know which ones are Indian restaurants! We also know exactly which restaurants are in the vicinity of every neighborhood candidate center.

This concludes the data gathering phase - we're now ready to use this data for analysis to produce the report on optimal locations for a new Indian restaurant!

### Analysis 
Let's perform some basic exploratory data analysis and derive some additional info from our raw data. First, let's count the number of restaurants in every candidate area:

In [52]:
location_restaurants_count = [len(res) for res in location_restaurants]

df_locations['Restaurants in area'] = location_restaurants_count

print('Average number of restaurants in every area with radius=300m:', np.array(location_restaurants_count).mean())

df_locations.head(10)

Average number of restaurants in every area with radius=300m: 7.3489010989010985


| | Address | Latitude | Longitude | X | Y | Distance from center | Restaurants in area |
|---|---|---|---|---|---|---|---|
| 0 | 24-43 28th St, Astoria, NY 11102 | 40.771502 | -73.927293 | -5812047.0 | 9859727.0 | 5992.495307 | 8 |
| 1 | 25-33 14th Pl, Long Island City, NY 11102 | 40.775041 | -73.92716 | -5811447.0 | 9859727.0 | 5840.3767 | 2 |
| 2 | I-278, Astoria, NY 11102 | 40.778579 | -73.927028 | -5810847.0 | 9859727.0 | 5747.173218 | 0 |
| 3 | 40.7816884 -73.9269238, Wards Meadow Loop, New... | 40.782118 | -73.926895 | -5810247.0 | 9859727.0 | 5715.767665 | 0 |
| 4 | 125 Hell Gate Cir, New York, NY 10035 | 40.785657 | -73.926762 | -5809647.0 | 9859727.0 | 5747.173218 | 0 |
| 5 | Main Rdwy/ Manhattan Psyc Ctr, New York, NY 10035 | 40.789197 | -73.926629 | -5809047.0 | 9859727.0 | 5840.3767 | 0 |
| 6 | 20 Randalls Is Rd, New York, NY 10035 | 40.792736 | -73.926496 | -5808447.0 | 9859727.0 | 5992.495307 | 2 |
| 7 | 14-44 31st Dr, Long Island City, NY 11106 | 40.766282 | -73.931522 | -5812947.0 | 9860247.0 | 5855.766389 | 8 |
| 8 | 30-56 14th St, Long Island City, NY 11102 | 40.76982 | -73.93139 | -5812347.0 | 9860247.0 | 5604.462508 | 2 |
| 9 | 27-16 28th Ave, Long Island City, NY 11102 | 40.773359 | -73.931258 | -5811747.0 | 9860247.0 | 5408.326913 | 1 |


In [53]:
distances_to_indian_restaurant = []

for area_x, area_y in zip(xs, ys):
    min_distance = 10000
    for res in indian_restaurants.values():
        res_x = res[7]
        res_y = res[8]
        d = calc_xy_distance(area_x, area_y, res_x, res_y)
        if d<min_distance:
            min_distance = d
    distances_to_indian_restaurant.append(min_distance)

df_locations['Distance to Indian restaurant'] = distances_to_indian_restaurant
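
The nested loop above is a nearest-neighbor search with a 10,000 m ceiling (the starting value of `min_distance`). An equivalent, more compact sketch using `math.hypot` and `min`; the function name `nearest_distance` and the sample coordinates are made up for illustration:

```python
import math

def nearest_distance(point, others, cap=10000):
    # Smallest Euclidean distance from `point` to any of `others`,
    # never larger than `cap` (mirrors the loop's initial min_distance)
    x, y = point
    distances = [math.hypot(ox - x, oy - y) for ox, oy in others]
    return min(distances + [cap])

area_center = (0.0, 0.0)
restaurant_xy = [(300.0, 400.0), (0.0, 1000.0)]   # invented X/Y positions in meters

print(nearest_distance(area_center, restaurant_xy))
print(nearest_distance(area_center, []))
```

With no restaurants at all, or none closer than the cap, the function returns the cap itself, just as the loop leaves `min_distance` at 10000.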

In [54]:
df_locations.head(10)

| | Address | Latitude | Longitude | X | Y | Distance from center | Restaurants in area | Distance to Indian restaurant |
|---|---|---|---|---|---|---|---|---|
| 0 | 24-43 28th St, Astoria, NY 11102 | 40.771502 | -73.927293 | -5812047.0 | 9859727.0 | 5992.495307 | 8 | 170.979209 |
| 1 | 25-33 14th Pl, Long Island City, NY 11102 | 40.775041 | -73.92716 | -5811447.0 | 9859727.0 | 5840.3767 | 2 | 535.425066 |
| 2 | I-278, Astoria, NY 11102 | 40.778579 | -73.927028 | -5810847.0 | 9859727.0 | 5747.173218 | 0 | 1124.333631 |
| 3 | 40.7816884 -73.9269238, Wards Meadow Loop, New... | 40.782118 | -73.926895 | -5810247.0 | 9859727.0 | 5715.767665 | 0 | 1720.921912 |
| 4 | 125 Hell Gate Cir, New York, NY 10035 | 40.785657 | -73.926762 | -5809647.0 | 9859727.0 | 5747.173218 | 0 | 2174.511817 |
| 5 | Main Rdwy/ Manhattan Psyc Ctr, New York, NY 10035 | 40.789197 | -73.926629 | -5809047.0 | 9859727.0 | 5840.3767 | 0 | 2083.757116 |
| 6 | 20 Randalls Is Rd, New York, NY 10035 | 40.792736 | -73.926496 | -5808447.0 | 9859727.0 | 5992.495307 | 2 | 1974.257364 |
| 7 | 14-44 31st Dr, Long Island City, NY 11106 | 40.766282 | -73.931522 | -5812947.0 | 9860247.0 | 5855.766389 | 8 | 1190.362476 |
| 8 | 30-56 14th St, Long Island City, NY 11102 | 40.76982 | -73.93139 | -5812347.0 | 9860247.0 | 5604.462508 | 2 | 770.979206 |
| 9 | 27-16 28th Ave, Long Island City, NY 11102 | 40.773359 | -73.931258 | -5811747.0 | 9860247.0 | 5408.326913 | 1 | 701.323783 |


In [55]:
print('Average distance to closest Indian restaurant from each area center:', df_locations['Distance to Indian restaurant'].mean())

Average distance to closest Indian restaurant from each area center: 1398.5805344930536


OK, so on average an Indian restaurant can be found within ~1400 m of every candidate area center. That's fairly close, so we need to filter our areas carefully!
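
One way such a filter could look, as a sketch only: keep areas with few restaurants overall and no Indian restaurant nearby. The two thresholds are hypothetical and not taken from the notebook; the rows are copied from the first rows of `df_locations` shown above.

```python
# (address, restaurants in area, distance to nearest Indian restaurant in m),
# copied from the first rows of df_locations
rows = [
    ('24-43 28th St, Astoria, NY 11102', 8, 170.979209),
    ('25-33 14th Pl, Long Island City, NY 11102', 2, 535.425066),
    ('14-44 31st Dr, Long Island City, NY 11106', 8, 1190.362476),
    ('27-16 28th Ave, Long Island City, NY 11102', 1, 701.323783),
]

MAX_RESTAURANTS = 2          # hypothetical threshold
MIN_INDIAN_DISTANCE = 400    # hypothetical threshold, in meters

def is_candidate(restaurant_count, indian_distance):
    # An area qualifies when it is not crowded and has no Indian restaurant close by
    return restaurant_count <= MAX_RESTAURANTS and indian_distance >= MIN_INDIAN_DISTANCE

candidates = [address for address, count, dist in rows if is_candidate(count, dist)]
print(candidates)
```

With these thresholds the first row drops out because an Indian restaurant is about 171 m away, and the third because the area already holds 8 restaurants.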

Let's create a map showing a heatmap (density) of restaurants and try to extract some meaningful info from it. Also, let's overlay the New York neighborhood GeoJSON data (each feature is tagged with its borough) on our map, along with a few circles indicating distances of 1 km, 2 km and 3 km from Manhattan.

In [56]:
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
print('Data downloaded!')

Data downloaded!


In [57]:
import json # library to handle JSON files
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)
    

In [58]:
newyork_data

{'type': 'FeatureCollection',
 'totalFeatures': 306,
 'features': [{'type': 'Feature',
   'id': 'nyu_2451_34572.1',
   'geometry': {'type': 'Point',
    'coordinates': [-73.84720052054902, 40.89470517661]},
   'geometry_name': 'geom',
   'properties': {'name': 'Wakefield',
    'stacked': 1,
    'annoline1': 'Wakefield',
    'annoline2': None,
    'annoline3': None,
    'annoangle': 0.0,
    'borough': 'Bronx',
    'bbox': [-73.84720052054902,
     40.89470517661,
     -73.84720052054902,
     40.89470517661]}},
  {'type': 'Feature',
   'id': 'nyu_2451_34572.2',
   'geometry': {'type': 'Point',
    'coordinates': [-73.82993910812398, 40.87429419303012]},
   'geometry_name': 'geom',
   'properties': {'name': 'Co-op City',
    'stacked': 2,
    'annoline1': 'Co-op',
    'annoline2': 'City',
    'annoline3': None,
    'annoangle': 0.0,
    'borough': 'Bronx',
    'bbox': [-73.82993910812398,
     40.87429419303012,
     -73.82993910812398,
     40.87429419303012]}},
  {'type': 'Feature',
 

In [59]:
restaurant_latlons = [[res[2], res[3]] for res in restaurants.values()]

indian_latlons = [[res[2], res[3]] for res in indian_restaurants.values()]

In [60]:
def boroughs_style(feature):
    return { 'color': 'blue', 'fill': False }

In [61]:
from folium import plugins
from folium.plugins import HeatMap

map_nyc = folium.Map(location=manhattan, zoom_start=13)
folium.TileLayer('cartodbpositron').add_to(map_nyc) #cartodbpositron cartodbdark_matter
HeatMap(restaurant_latlons).add_to(map_nyc)
folium.Marker(manhattan).add_to(map_nyc)
folium.Circle(manhattan, radius=1000, fill=False, color='white').add_to(map_nyc)
folium.Circle(manhattan, radius=2000, fill=False, color='white').add_to(map_nyc)
folium.Circle(manhattan, radius=3000, fill=False, color='white').add_to(map_nyc)
folium.GeoJson(newyork_data, style_function=boroughs_style, name='geojson').add_to(map_nyc)
map_nyc

It looks like the few pockets of low restaurant density closest to the city center can be found near the East Harlem area.

Let's create another map showing the heatmap/density of Indian restaurants only.

In [62]:
map_nyc = folium.Map(location=manhattan, zoom_start=13)
folium.TileLayer('cartodbpositron').add_to(map_nyc) #cartodbpositron cartodbdark_matter
HeatMap(indian_latlons).add_to(map_nyc)
folium.Marker(manhattan).add_to(map_nyc)
folium.Circle(manhattan, radius=1000, fill=False, color='white').add_to(map_nyc)
folium.Circle(manhattan, radius=2000, fill=False, color='white').add_to(map_nyc)
folium.Circle(manhattan, radius=3000, fill=False, color='white').add_to(map_nyc)
folium.GeoJson(newyork_data, style_function=boroughs_style, name='geojson').add_to(map_nyc)
map_nyc

This map is not so 'hot' (Indian restaurants make up only ~4.29% of all restaurants in our area of interest), but it does indicate a higher density of existing Indian restaurants between 42nd Street and 57th Street, between 2nd and 3rd Avenue, with the closest pockets of low Indian restaurant density positioned near East Harlem.

Based on this, we recommend the East Harlem area as the preferred location for a new Indian restaurant.

### Results and Discussion

Our analysis shows that although there is a great number of restaurants in New York City (~1500 in our initial area of interest, which was 12x12 km around Manhattan), there are pockets of low restaurant density fairly close to Manhattan. The highest concentration of restaurants was detected across most of the Manhattan area, with the lowest density around East Harlem. The highest density of Indian restaurants is between 42nd Street and 57th Street, close to 2nd and 3rd Avenue. Again, the East Harlem area had a low density of Indian restaurants compared to other parts of New York City.

### Conclusion

The purpose of this project was to identify NYC areas close to Manhattan with a low number of restaurants (particularly Indian restaurants) in order to help stakeholders narrow down the search for the optimal location for a new Indian restaurant. Based on our analysis of restaurant density in general around Manhattan and Indian restaurant density in particular, we recommend East Harlem as the preferred location for a new Indian restaurant.

The final decision on the optimal restaurant location will be made by stakeholders based on specific characteristics of neighborhoods and locations in every recommended zone, taking into consideration additional factors like the attractiveness of each location (proximity to a park or water), levels of noise / proximity to major roads, real estate availability, prices, and the social and economic dynamics of every neighborhood.