# Capstone Project - The Battle of the Neighborhoods
### Applied Data Science Capstone by IBM/Coursera 
#### Houssam AlRachid

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction <a name="introduction"></a>

In this project, we will do data analysis to find an optimal location to open a new restaurant. Specifically, this report will be targeted to stakeholders interested in opening a **Japanese restaurant** in **New York City (NYC)**. 

Since NYC already has a large number of restaurants, our search will be based on the following criteria:

**1. Locations that are not already crowded with restaurants**;

**2. Areas with no Japanese restaurants in vicinity**;

**3. Locations close to the city center of NYC**.

We will use our data science powers to generate the most promising neighborhoods based on these criteria. The advantages of each area will then be clearly stated so that investors who want to open a new Japanese restaurant in NYC can choose the best possible final location.


## Data <a name="data"></a>

Based on our definition of the problem, several factors will influence our decision:

- Number of existing restaurants in each neighborhood of NYC;
- Number of existing Japanese restaurants in each neighborhood of NYC;
- Distance between Japanese restaurants in each neighborhood of NYC;
- Distance of neighborhood from NYC Center.

The data we need are:

- NYC has a total of 5 boroughs and 306 neighborhoods. We will essentially need a dataset that contains all the boroughs and neighborhoods that exist in each borough along with their latitude and longitude.
  Source: https://geo.nyu.edu/catalog/nyu_2451_34572

- Information on venues in the neighborhoods of NYC.
  Source: Foursquare API.

- Japanese restaurants in each neighborhood of NYC.
  Source: Foursquare API.

- Category IDs corresponding to Japanese restaurants, taken from Foursquare.
  Source: https://developer.foursquare.com/docs/resources/categories


### Download and Explore Dataset

Before we get the data and start exploring it, let's import all the dependencies that we will need.

In [139]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans # library for data clustering

#!conda install -c conda-forge folium=0.5.0 --yes 
import folium # map rendering library

%matplotlib inline

print('Libraries imported.')

Libraries imported.


In order to segment the neighborhoods in NYC and explore them, we will need a dataset that contains all **5 boroughs** and the **306 neighborhoods** along with their *latitude* and *longitude* coordinates. The following link gives free access to this dataset: https://geo.nyu.edu/catalog/nyu_2451_34572

In [140]:
newyork_data = requests.get("https://cocl.us/new_york_dataset").json()
print('Data downloaded!')
#newyork_data

Data downloaded!


Notice how all the relevant data is in the *features* key, which is basically a list of the neighborhoods. So, let's define a new variable that includes this data.

In [141]:
neighborhoods_data = newyork_data['features']
neighborhoods_data[0]

{'type': 'Feature',
 'id': 'nyu_2451_34572.1',
 'geometry': {'type': 'Point',
  'coordinates': [-73.84720052054902, 40.89470517661]},
 'geometry_name': 'geom',
 'properties': {'name': 'Wakefield',
  'stacked': 1,
  'annoline1': 'Wakefield',
  'annoline2': None,
  'annoline3': None,
  'annoangle': 0.0,
  'borough': 'Bronx',
  'bbox': [-73.84720052054902,
   40.89470517661,
   -73.84720052054902,
   40.89470517661]}}

The next task is essentially transforming this data of nested Python dictionaries into a *pandas* dataframe. So let's start by creating an empty dataframe.

In [142]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 
# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)
neighborhoods

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude


Then let's loop through the data and fill the dataframe one row at a time.

In [143]:
for data in neighborhoods_data:
    borough = data['properties']['borough'] 
    neighborhood_name = data['properties']['name']
        
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]
    
    neighborhoods = neighborhoods.append({'Borough': borough,
                                          'Neighborhood': neighborhood_name,
                                          'Latitude': neighborhood_lat,
                                          'Longitude': neighborhood_lon}, ignore_index=True)
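
Appending to a dataframe inside a loop copies the frame on every iteration, and `DataFrame.append` was removed in pandas 2.0. A minimal alternative sketch, assuming the same GeoJSON layout shown above, collects plain dictionaries first so that the frame can be built once at the end. The `features_to_rows` helper and the two-item sample list are illustrative and not part of the original notebook.

```python
def features_to_rows(features):
    """Flatten GeoJSON point features into one dict per neighborhood."""
    rows = []
    for feature in features:
        # GeoJSON stores point coordinates as [longitude, latitude]
        lon, lat = feature['geometry']['coordinates']
        rows.append({
            'Borough': feature['properties']['borough'],
            'Neighborhood': feature['properties']['name'],
            'Latitude': lat,
            'Longitude': lon,
        })
    return rows

# Two hand-copied features (second one uses the rounded values from the table)
sample_features = [
    {'geometry': {'type': 'Point', 'coordinates': [-73.84720052054902, 40.89470517661]},
     'properties': {'name': 'Wakefield', 'borough': 'Bronx'}},
    {'geometry': {'type': 'Point', 'coordinates': [-73.829939, 40.874294]},
     'properties': {'name': 'Co-op City', 'borough': 'Bronx'}},
]

rows = features_to_rows(sample_features)
print(rows[0])

# With pandas available, the frame can then be created in a single call:
# neighborhoods = pd.DataFrame(rows, columns=['Borough', 'Neighborhood', 'Latitude', 'Longitude'])
```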

In [144]:
neighborhoods.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


In [145]:
neighborhoods.shape

(306, 4)

Use the geopy library to get the latitude and longitude values of NYC Center.

In [146]:
address = '131 W 55th St, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
nycc_latitude = location.latitude
nycc_longitude = location.longitude
print('The geographical coordinates of NYC Center are {}, {}.'.format(nycc_longitude, nycc_latitude))

The geographical coordinates of NYC Center are -73.9796239, 40.7637566.


Let's now compute the distance of each neighborhood from NYC Center:

In [174]:
NYC_center = [nycc_longitude, nycc_latitude]

#!pip install shapely
import shapely.geometry
#!pip install pyproj
import pyproj
import math

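# Note: NYC lies in UTM zone 18; the outputs shown in this notebook were produced with zone=10
proj_coo = pyproj.Proj(proj='utm',zone=10,ellps='WGS84', preserve_units=False)
# forward transform:  x, y = proj_coo(lon, lat)
# inverse transform:  lon, lat = proj_coo(x, y, inverse=True)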

def calc_xy_distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx*dx + dy*dy)

print('Coordinate transformation check')
print('-------------------------------')
print('NYC Center longitude={}, latitude={}'.format(NYC_center[0], NYC_center[1]))
x, y = proj_coo(NYC_center[0], NYC_center[1])
print('NYC Center UTM X={}, Y={}'.format(x, y))
lo, la = proj_coo(x, y, inverse=True)
print('NYC Center longitude={}, latitude={}'.format(lo, la))

Coordinate transformation check
-------------------------------
NYC Center longitude=-73.9796239, latitude=40.7637566
NYC Center UTM X=4651473.954676002, Y=5848293.05464907
NYC Center longitude=-73.97962390000002, latitude=40.76375659999999


In [148]:
NYC_center_x, NYC_center_y = proj_coo(NYC_center[0], NYC_center[1]) # City center in Cartesian coordinates
Lon_x, Lat_y = proj_coo(neighborhoods['Longitude'].values, neighborhoods['Latitude'].values)
distances = []

for i in range(len(Lon_x)):
    d = calc_xy_distance(Lon_x[i], Lat_y[i], NYC_center_x, NYC_center_y)
    distances.append(d)

neighborhoods['Distance from center'] = distances


In [149]:
neighborhoods.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Distance from center
0,Bronx,Wakefield,40.894705,-73.847201,22348.171855
1,Bronx,Co-op City,40.874294,-73.829939,21466.653725
2,Bronx,Eastchester,40.887556,-73.827806,22901.247837
3,Bronx,Fieldston,40.895437,-73.905643,19372.981141
4,Bronx,Riverdale,40.890834,-73.912585,18524.536065
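
New York City lies in UTM zone 18 (zone 10 covers the Pacific coast), and a UTM projection is only close to distance-preserving inside its own zone. A projection-independent way to cross-check the distances in the table is the haversine great-circle formula. The sketch below is a minimal version that assumes a spherical Earth with a mean radius of 6,371 km; `haversine_m` and `EARTH_RADIUS_M` are names introduced here for illustration.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in meters (spherical approximation)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

center = (40.7637566, -73.9796239)   # NYC Center (latitude, longitude) as geocoded above
wakefield = (40.894705, -73.847201)  # first row of the table above

d = haversine_m(*center, *wakefield)
print('Great-circle distance to Wakefield: {:.0f} m'.format(d))
```

If the printed value differs noticeably from the `Distance from center` column, the projection zone is the first thing to check.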


OK, we now have the coordinates of centers of neighborhoods. Next let's keep only the **neighborhoods within ~15 km from NYC Center**. 

In [150]:
neighborhoods.drop( neighborhoods[ neighborhoods['Distance from center'] > 15000 ].index , inplace=True)
neighborhoods = neighborhoods.reset_index(drop=True)

xs = []
ys = []
latitudes = []
longitudes = []

for i in range(len(neighborhoods['Longitude'])):
    x,y = proj_coo(neighborhoods['Longitude'][i], neighborhoods['Latitude'][i])
    xs.append(x)
    ys.append(y)
    
neighborhoods['X'] = xs
neighborhoods['Y'] = ys

latitudes = neighborhoods['Latitude'].values
longitudes = neighborhoods['Longitude'].values
neighborhoods.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Distance from center,X,Y
0,Bronx,University Heights,40.855727,-73.910416,14337.28519,4649657.0,5862515.0
1,Bronx,Morris Heights,40.847898,-73.919672,12948.680744,4649537.0,5861096.0
2,Bronx,East Tremont,40.842696,-73.887356,14290.22206,4652610.0,5862538.0
3,Bronx,West Farms,40.839475,-73.877745,14657.713936,4653661.0,5862787.0
4,Bronx,High Bridge,40.836623,-73.926102,11293.397863,4649930.0,5859480.0


In [151]:
neighborhoods.shape

(109, 7)

Let's visualize the data we have so far: the NYC Center location and the candidate neighborhood centers within ~15 km.

In [152]:
map_NYC_center = folium.Map(location=[nycc_latitude, nycc_longitude], zoom_start=11)
folium.Marker([nycc_latitude, nycc_longitude], popup='NYC Center').add_to(map_NYC_center)
for lat, lon in zip(latitudes, longitudes):
    folium.Circle([lat, lon], radius=300, color='blue', fill=False).add_to(map_NYC_center)
map_NYC_center

### Foursquare

Next, we are going to start utilizing the Foursquare API to explore the **restaurants in each neighborhood**.
We will include in our list only venues that have 'restaurant' in category name, and we'll make sure to detect and include all the subcategories of specific '**Japanese restaurant**' category, as we need info on Japanese restaurants in the neighborhood.

To do this we are going to use the help of **Foursquare**. First let us define the Foursquare credentials.

In [None]:
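CLIENT_ID = 'your-client-id'          # placeholder: your Foursquare client ID
CLIENT_SECRET = 'your-client-secret'  # placeholder: your Foursquare client secret
VERSION = 'YYYYMMDD'                  # placeholder: Foursquare API version date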

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Let's now search for all restaurants locations and get nearby ones in a **radius of 500 m from each neighborhood**.

To do this, we start by getting relevant part of JSON and transform it into a *pandas* dataframe and we define some needed functions to extract restaurants and Japanese restaurants as well:

In [154]:
# Category IDs corresponding to Japanese restaurants were taken from Foursquare web site (https://developer.foursquare.com/docs/resources/categories):

food_category = '4d4b7105d754a06374d81259' # 'Root' category for all food-related venues

japanese_restaurant_categories = ['4bf58dd8d48988d111941735','55a59bace4b013909087cb0c','55a59bace4b013909087cb30',
                                 '55a59bace4b013909087cb21','55a59bace4b013909087cb06','55a59bace4b013909087cb1b',
                                 '55a59bace4b013909087cb1e','55a59bace4b013909087cb18','55a59bace4b013909087cb24',
                                 '55a59bace4b013909087cb15','55a59bace4b013909087cb27','55a59bace4b013909087cb12',
                                 '4bf58dd8d48988d1d2941735','55a59bace4b013909087cb2d','55a59a31e4b013909087cb00',
                                 '55a59af1e4b013909087cb03','55a59bace4b013909087cb2a','55a59bace4b013909087cb0f',
                                 '55a59bace4b013909087cb33','55a59bace4b013909087cb09','55a59bace4b013909087cb36']

def is_restaurant(categories, specific_filter=None):
    restaurant_words = ['restaurant', 'diner', 'taverna', 'steakhouse']
    restaurant = False
    specific = False
    for c in categories:
        category_name = c[0].lower()
        category_id = c[1]
        for r in restaurant_words:
            if r in category_name:
                restaurant = True
        if 'fast food' in category_name:
            restaurant = False
        if not(specific_filter is None) and (category_id in specific_filter):
            specific = True
            restaurant = True
    return restaurant, specific

def get_categories(categories):
    return [(cat['name'], cat['id']) for cat in categories]

def format_address(location):
    address = ', '.join(location['formattedAddress'])
    address = address.replace(', New York', '')
    address = address.replace(', USA', '')
    return address

def get_venues_near_location(lat, lon, category, CLIENT_ID, CLIENT_SECRET, radius=510, limit=100):
    version = '20180724'
    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
        CLIENT_ID, CLIENT_SECRET, version, lat, lon, category, radius, limit)

    try:
        results = requests.get(url).json()['response']['groups'][0]['items']
        venues = [(item['venue']['id'],
                   item['venue']['name'],
                   get_categories(item['venue']['categories']),
                   (item['venue']['location']['lat'], item['venue']['location']['lng']),
                   format_address(item['venue']['location']),
                   item['venue']['location']['distance']) for item in results]        
    except Exception:  # any request or parsing failure is treated as "no venues found"
        venues = []
    return venues
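
To make the classification rules in `is_restaurant` concrete, the sketch below restates them in compact form and applies them to three hand-written category lists. The category names and the `id-...` strings are invented for illustration, the Japanese category ID is taken from `japanese_restaurant_categories` above, and `classify` is a hypothetical name.

```python
RESTAURANT_WORDS = ('restaurant', 'diner', 'taverna', 'steakhouse')

def classify(categories, specific_filter=()):
    """Return (is_restaurant, is_specific) for a list of (name, id) pairs."""
    restaurant = False
    specific = False
    for name, category_id in categories:
        name = name.lower()
        if any(word in name for word in RESTAURANT_WORDS):
            restaurant = True
        if 'fast food' in name:
            restaurant = False  # fast food is excluded even if the name says "restaurant"
        if category_id in specific_filter:
            specific = True     # an ID match always counts as a restaurant
            restaurant = True
    return restaurant, specific

japanese_ids = ['4bf58dd8d48988d111941735']  # one ID from the list above

italian = classify([('Italian Restaurant', 'id-italian')], japanese_ids)
fast_food = classify([('Fast Food Restaurant', 'id-fast-food')], japanese_ids)
ramen = classify([('Ramen Spot', '4bf58dd8d48988d111941735')], japanese_ids)

print(italian)    # (True, False)
print(fast_food)  # (False, False)
print(ramen)      # (True, True)
```

The last case shows why the ID filter matters: a venue whose name contains none of the keywords is still kept when its category ID is on the Japanese list.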

In [155]:
# Let's now go over our neighborhood locations and get nearby restaurants; 
# we'll also maintain a dictionary of all found restaurants and all found Japanese restaurants

def get_restaurants(lats, lons):
    restaurants = {}
    japanese_restaurants = {}
    location_restaurants = []

    print('Obtaining venues around candidate locations:', end='')
    for lat, lon in zip(lats, lons):
        venues = get_venues_near_location(lat, lon, food_category, CLIENT_ID, CLIENT_SECRET, radius=510, limit=100)
        area_restaurants = []
        for venue in venues:
            venue_id = venue[0]
            venue_name = venue[1]
            venue_categories = venue[2]
            venue_latlon = venue[3]
            venue_address = venue[4]
            venue_distance = venue[5]
            is_res, is_japanese = is_restaurant(venue_categories, specific_filter=japanese_restaurant_categories)
            if is_res:
                x, y = proj_coo(venue_latlon[1], venue_latlon[0])
                restaurant = (venue_id, venue_name, venue_latlon[0], venue_latlon[1], venue_address, venue_distance, is_japanese, x, y)
                if venue_distance <= 500:
                    area_restaurants.append(restaurant)
                restaurants[venue_id] = restaurant
                if is_japanese:
                    japanese_restaurants[venue_id] = restaurant
        location_restaurants.append(area_restaurants)
        print(' .', end='')
    print(' done.')
    return restaurants, japanese_restaurants, location_restaurants

restaurants = {}
japanese_restaurants = {}
location_restaurants = []

restaurants, japanese_restaurants, location_restaurants = get_restaurants(latitudes, longitudes)


Obtaining venues around candidate locations: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . done.
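
`get_restaurants` keeps two kinds of collections: dictionaries keyed by venue ID, so a restaurant returned by several neighboring queries is stored only once, and a per-location list limited to venues within 500 m. A small sketch with made-up venues (IDs, names and distances are all invented) shows the effect of both rules:

```python
# Each query result is a list of (venue_id, name, distance_in_m) tuples (invented data).
query_results = [
    [('v1', 'Sakura', 120), ('v2', 'Corner Diner', 505)],
    [('v2', 'Corner Diner', 310), ('v3', 'Trattoria', 40)],
]

all_restaurants = {}       # venue_id -> venue, de-duplicated across queries
location_restaurants = []  # one list per query, only venues within 500 m

for venues in query_results:
    nearby = []
    for venue in venues:
        venue_id, _name, distance = venue
        if distance <= 500:
            nearby.append(venue)
        all_restaurants[venue_id] = venue  # a later sighting overwrites an earlier one
    location_restaurants.append(nearby)

counts = [len(r) for r in location_restaurants]
print(len(all_restaurants))  # 3
print(counts)                # [1, 2]
```

This is also why the total number of restaurants is not simply the sum of the per-neighborhood counts.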


In [156]:
print('Total number of restaurants:', len(restaurants))
print('Total number of Japanese restaurants:', len(japanese_restaurants))
print('Percentage of Japanese restaurants: {:.2f}%'.format(len(japanese_restaurants) / len(restaurants) * 100))
print('Average number of restaurants in neighborhood:', np.array([len(r) for r in location_restaurants]).mean())

Total number of restaurants: 2734
Total number of Japanese restaurants: 289
Percentage of Japanese restaurants: 10.57%
Average number of restaurants in neighborhood: 26.853211009174313


In [157]:
#print('List of all restaurants')
#print('-----------------------')
#for r in list(restaurants.values())[:10]:
#    print(r)
#print('...')
#print('Total:', len(restaurants))

In [158]:
#print('List of Japanese restaurants')
#print('---------------------------')
#for r in list(japanese_restaurants.values())[:10]:
#    print(r)
#print('...')
#print('Total:', len(japanese_restaurants))

In [159]:
print('Restaurants around location')
print('---------------------------')
for i in range(0, 5):
    rs = location_restaurants[i][:8]
    names = ', '.join([r[1] for r in rs])
    print('Restaurants around location {}: {}'.format(i+1, names))

Restaurants around location
---------------------------
Restaurants around location 1: Accra Resturant, Liberato, Don Pancholo Lechonera Restaurant, Number One Chinese Restaurant, NO.1 restaurant, Happy Land Chinese Restaurant Buffet, Morena Restaurant, Cuchara Restuarant And Lounge
Restaurants around location 2: Yoly Restaurant, Xing Sheng, El Valle Restaurant Bar, Mamajuana, NO.1 Chinese Restaurant
Restaurants around location 3: El Nuevo Bohio, Lounge & Restaurant, Wings Chinese Restaurant, Roy's Restaurant, El Valle Restaurant, JJ Restaurant & Cuchifritos, Chinatown
Restaurants around location 4: Jimbo's Hamburger Palace, Food Hai, El Salvadoreño, bar & restaurante
Restaurants around location 5: Justine Restaurant, Happy Garden, wah yong, Checkers, Dong King, El Tina, Boca Chica Seafood Restaurant, La Fuente Restaurant


Let's now see all the collected restaurants in our area of interest on map, and let's also show Japanese restaurants in a different color.

In [160]:
map_NYC_Center = folium.Map(location=[nycc_latitude, nycc_longitude], zoom_start=13)
folium.Marker([nycc_latitude, nycc_longitude], popup='NYC Center').add_to(map_NYC_Center)
for res in restaurants.values():
    lat = res[2]; lon = res[3]
    is_japanese = res[6]
    color = 'red' if is_japanese else 'blue'
    folium.CircleMarker([lat, lon], radius=3, color=color, fill=True, fill_color=color, fill_opacity=1).add_to(map_NYC_Center)
map_NYC_Center

The map shows all the restaurants within a few kilometers of NYC Center, and we know which ones are Japanese restaurants (red circles)! We also know which restaurants are in the vicinity of every candidate neighborhood center.

This concludes the data gathering phase - we're now ready to use this data for analysis to produce the report on optimal locations for a new Japanese restaurant!

## Methodology <a name="methodology"></a>

In this project, we will focus on detecting areas of New York City that have low restaurant density, particularly those with a low number of Japanese restaurants. We will limit our analysis to the area within ~15 km of NYC Center.

The methodology can be summarized in three main steps:

1. We have collected the required **data: location and type (category) of every restaurant within 15 km of NYC Center**. We have also **identified Japanese restaurants** (according to Foursquare categorization).

2. We will compute and explore '**restaurant density**' across different areas of NYC - we will use **heatmaps** to identify a few promising areas close to center with low number of restaurants in general (*and* no Japanese restaurants in vicinity) and focus our attention on those areas.

3. We focus on most promising areas and within those create **clusters of locations that meet some basic requirements** established in discussion with stakeholders: we will take into consideration locations with **no more than 5 restaurants in radius of 500 meters**. We will present map of all such locations but also create clusters (using **k-means clustering**) of those locations to identify general zones / neighborhoods / addresses which should be a starting point for final 'street level' exploration and search for optimal venue location by stakeholders.

## Analysis <a name="analysis"></a>

Let's perform some basic exploratory data analysis and derive some additional information from our raw data. First, let's count the **number of restaurants in every candidate area**:

In [161]:
location_restaurants_count = [len(res) for res in location_restaurants]

neighborhoods['Restaurants in area'] = location_restaurants_count

print('Average number of restaurants in every area with radius=500m:', np.array(location_restaurants_count).mean())

neighborhoods.head(10)

Average number of restaurants in every area with radius=500m: 26.853211009174313


Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Distance from center,X,Y,Restaurants in area
0,Bronx,University Heights,40.855727,-73.910416,14337.28519,4649657.0,5862515.0,8
1,Bronx,Morris Heights,40.847898,-73.919672,12948.680744,4649537.0,5861096.0,5
2,Bronx,East Tremont,40.842696,-73.887356,14290.22206,4652610.0,5862538.0,7
3,Bronx,West Farms,40.839475,-73.877745,14657.713936,4653661.0,5862787.0,3
4,Bronx,High Bridge,40.836623,-73.926102,11293.397863,4649930.0,5859480.0,14
5,Bronx,Melrose,40.819754,-73.909422,10468.79895,4652674.0,5858693.0,5
6,Bronx,Mott Haven,40.806239,-73.9161,8704.857021,4653229.0,5856819.0,7
7,Bronx,Port Morris,40.801664,-73.913221,8544.128814,4653839.0,5856503.0,6
8,Bronx,Longwood,40.815099,-73.895788,11076.340286,4654173.0,5859036.0,4
9,Bronx,Hunts Point,40.80973,-73.883315,11701.258801,4655634.0,5859230.0,3


OK, now let's calculate the **distance to nearest Japanese restaurant from every area candidate center** (not only those within 500m - we want distance to closest one, regardless of how distant it is).

In [162]:
distances_to_japanese_restaurant = []

for area_x, area_y in zip(xs, ys):
    min_distance = 10000
    for res in japanese_restaurants.values():
        res_x = res[7]
        res_y = res[8]
        d = calc_xy_distance(area_x, area_y, res_x, res_y)
        if d<min_distance:
            min_distance = d
    distances_to_japanese_restaurant.append(min_distance)

neighborhoods['Distance to Japanese restaurant'] = distances_to_japanese_restaurant
neighborhoods.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Distance from center,X,Y,Restaurants in area,Distance to Japanese restaurant
0,Bronx,University Heights,40.855727,-73.910416,14337.28519,4649657.0,5862515.0,8,2771.935757
1,Bronx,Morris Heights,40.847898,-73.919672,12948.680744,4649537.0,5861096.0,5,1776.801415
2,Bronx,East Tremont,40.842696,-73.887356,14290.22206,4652610.0,5862538.0,7,4183.623152
3,Bronx,West Farms,40.839475,-73.877745,14657.713936,4653661.0,5862787.0,3,4841.442198
4,Bronx,High Bridge,40.836623,-73.926102,11293.397863,4649930.0,5859480.0,14,1626.799685
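
The nested loop above starts from `min_distance = 10000`, so any area whose closest Japanese restaurant is farther than 10 km would be reported as exactly 10000. A sketch of the same search without that cap, using `math.inf` as the fallback, is shown below on invented planar coordinates in meters; `nearest_distance` is a hypothetical helper.

```python
import math

def nearest_distance(point, others):
    """Distance from `point` to the closest of `others`; inf if `others` is empty."""
    px, py = point
    return min((math.hypot(ox - px, oy - py) for ox, oy in others), default=math.inf)

area_center = (0.0, 0.0)
japanese_xy = [(300.0, 400.0), (-1200.0, 0.0), (50.0, 2000.0)]  # invented positions

closest = nearest_distance(area_center, japanese_xy)
nothing_nearby = nearest_distance(area_center, [])

print(closest)         # 500.0
print(nothing_nearby)  # inf
```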


In [163]:
print('Average distance to closest Japanese restaurant from each area center:', neighborhoods['Distance to Japanese restaurant'].mean())

Average distance to closest Japanese restaurant from each area center: 1047.9144675673256


Thus, **on average a Japanese restaurant can be found within ~1 km** of every candidate area center. That's fairly close, so we need to filter our areas carefully!

Let's create a **heatmap** showing the **density of restaurants** and try to extract some meaningful information from it. Also, let's show a few circles indicating distances of 1 km, 2 km, 5 km and 10 km from NYC Center.

In [164]:
restaurant_latlons = [[res[2], res[3]] for res in restaurants.values()]
japanese_latlons = [[res[2], res[3]] for res in japanese_restaurants.values()]

In [165]:
from folium import plugins
from folium.plugins import HeatMap

map_NYC = folium.Map(location=[nycc_latitude, nycc_longitude], zoom_start=12)
folium.TileLayer('cartodbpositron').add_to(map_NYC) #cartodbpositron cartodbdark_matter
HeatMap(restaurant_latlons).add_to(map_NYC)
folium.Marker([nycc_latitude, nycc_longitude]).add_to(map_NYC)
folium.Circle([nycc_latitude, nycc_longitude], radius=1000, fill=False, color='white').add_to(map_NYC)
folium.Circle([nycc_latitude, nycc_longitude], radius=2000, fill=False, color='white').add_to(map_NYC)
folium.Circle([nycc_latitude, nycc_longitude], radius=5000, fill=False, color='white').add_to(map_NYC)
folium.Circle([nycc_latitude, nycc_longitude], radius=10000, fill=False, color='white').add_to(map_NYC)
#folium.Circle([nycc_latitude, nycc_longitude], radius=15000, fill=False, color='white').add_to(map_NYC)
map_NYC

It looks like the pockets of low restaurant density closest to the city center can be found **east and north of NYC Center rather than to the south**. The area to the west is not considered since it does not belong to NYC!

Let's create another **heatmap** showing the **density of Japanese restaurants** only.

In [166]:
map_NYC = folium.Map(location=[nycc_latitude, nycc_longitude], zoom_start=12)
folium.TileLayer('cartodbpositron').add_to(map_NYC) #cartodbpositron cartodbdark_matter
HeatMap(japanese_latlons).add_to(map_NYC)
folium.Marker([nycc_latitude, nycc_longitude]).add_to(map_NYC)
folium.Circle([nycc_latitude, nycc_longitude], radius=1000, fill=False, color='white').add_to(map_NYC)
folium.Circle([nycc_latitude, nycc_longitude], radius=2000, fill=False, color='white').add_to(map_NYC)
folium.Circle([nycc_latitude, nycc_longitude], radius=5000, fill=False, color='white').add_to(map_NYC)
folium.Circle([nycc_latitude, nycc_longitude], radius=10000, fill=False, color='white').add_to(map_NYC)
map_NYC

This map is not so 'hot' (Japanese restaurants represent a subset of ~10% of all restaurants in NYC) but it also indicates higher density of existing Japanese restaurants directly **South from NYC Center**, with closest pockets of **low Japanese restaurant density positioned North and East from city center**. 

Based on this we will now focus our analysis on areas *north and east of NYC Center* - we will move the center of our area of interest and reduce its size to a radius of **10 km**. This places our location candidates mostly in the boroughs of **Brooklyn, Queens and the Bronx**.

### Brooklyn, Queens and Bronx

Let's select the neighborhoods of Brooklyn, Queens and the Bronx closest to NYC Center.

In [167]:
BB_data = neighborhoods[neighborhoods['Borough'].isin(['Brooklyn', 'Queens','Bronx']) ].reset_index(drop=True)
BB_data.shape

(71, 9)

OK. Let us now **filter** those locations: we're interested only in **locations with no more than 5 restaurants in radius of 500 meters**, and **no Japanese restaurants in radius of 500 meters**.

In [168]:
good_res_count = np.array((BB_data['Restaurants in area']<=5))
print('Locations with no more than 5 restaurants nearby:', good_res_count.sum())

good_jap_distance = np.array(BB_data['Distance to Japanese restaurant']>=500)
print('Locations with no Japanese restaurants within 500 m:', good_jap_distance.sum())

good_locations = np.logical_and(good_res_count, good_jap_distance)
print('Locations with both conditions met:', good_locations.sum())

BB_good_locations = BB_data[good_locations]

Locations with no more than 5 restaurants nearby: 21
Locations with no Japanese restaurants within 500 m: 41
Locations with both conditions met: 19
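
The two conditions can be read directly off a few rows of the table shown earlier. The sketch below applies them to plain dictionaries rather than a dataframe; the thresholds are the ones used in the cell above, the five rows are copied from the earlier output, and `passes_filter` and the constant names are introduced here for illustration.

```python
MAX_RESTAURANTS = 5      # at most 5 restaurants within 500 m
MIN_JAPANESE_DIST = 500  # meters to the nearest Japanese restaurant

rows = [
    {'name': 'University Heights', 'restaurants': 8,  'dist_japanese': 2771.935757},
    {'name': 'Morris Heights',     'restaurants': 5,  'dist_japanese': 1776.801415},
    {'name': 'East Tremont',       'restaurants': 7,  'dist_japanese': 4183.623152},
    {'name': 'West Farms',         'restaurants': 3,  'dist_japanese': 4841.442198},
    {'name': 'High Bridge',        'restaurants': 14, 'dist_japanese': 1626.799685},
]

def passes_filter(row):
    """True when a location is both uncrowded and far from Japanese restaurants."""
    return (row['restaurants'] <= MAX_RESTAURANTS
            and row['dist_japanese'] >= MIN_JAPANESE_DIST)

good = [row['name'] for row in rows if passes_filter(row)]
print(good)  # ['Morris Heights', 'West Farms']
```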


Let's see those good locations on a map.

In [2]:
good_latitudes = BB_good_locations['Latitude'].values
good_longitudes = BB_good_locations['Longitude'].values

good_locations = [[lat, lon] for lat, lon in zip(good_latitudes, good_longitudes)]

map_NYC = folium.Map(location=[nycc_latitude, nycc_longitude], zoom_start=11.3)
folium.TileLayer('cartodbpositron').add_to(map_NYC)
HeatMap(japanese_latlons).add_to(map_NYC)
folium.Circle([nycc_latitude, nycc_longitude], radius=10000, color='white', fill=True, fill_opacity=0.6).add_to(map_NYC)
folium.Marker([nycc_latitude, nycc_longitude]).add_to(map_NYC)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_NYC) 
map_NYC


Looking good. What we have now is a clear indication of zones with low number of restaurants in vicinity, and *no* Japanese restaurants at all nearby.

Let us now **cluster** those locations to create **centers of zones containing good locations**. Those zones, their centers and addresses will be the final result of our analysis. 


In [170]:
number_of_clusters = 3

good_xys = BB_good_locations[['X', 'Y']].values
kmeans = KMeans(n_clusters=number_of_clusters, random_state=0).fit(good_xys)

cluster_centers = [proj_coo(cc[0], cc[1],inverse=True) for cc in kmeans.cluster_centers_]

map_NYC = folium.Map(location=[nycc_latitude, nycc_longitude], zoom_start=11.3)
folium.TileLayer('cartodbpositron').add_to(map_NYC)
HeatMap(japanese_latlons).add_to(map_NYC)
folium.Circle([nycc_latitude, nycc_longitude], radius=10000, color='white', fill=True, fill_opacity=0.4).add_to(map_NYC)
folium.Marker([nycc_latitude, nycc_longitude]).add_to(map_NYC)
for lon, lat in cluster_centers:
    folium.Circle([lat, lon], radius=3000, color='green', fill=True, fill_opacity=0.25).add_to(map_NYC) 
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_NYC)
map_NYC
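
For readers who want to see what `KMeans` does with the candidate coordinates, here is a bare-bones sketch of Lloyd's algorithm, the iterative assign-and-average procedure usually associated with k-means. It is not scikit-learn's implementation (which, among other things, defaults to a smarter initialization); the `lloyd_kmeans` function, its naive first-k initialization, and the four sample points are assumptions made for illustration.

```python
import math

def lloyd_kmeans(points, k, max_iter=100):
    """Cluster 2-D points with plain Lloyd iterations; returns (centers, labels)."""
    centers = [points[i] for i in range(k)]  # naive init: the first k points
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: math.hypot(p[0] - centers[j][0], p[1] - centers[j][1]))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                new_centers.append((sum(x for x, _ in members) / len(members),
                                    sum(y for _, y in members) / len(members)))
            else:
                new_centers.append(centers[j])  # an empty cluster keeps its old center
        if new_labels == labels and new_centers == centers:
            break  # nothing changed: converged
        centers, labels = new_centers, new_labels
    return centers, labels

# Two obvious groups of invented planar points (meters)
points = [(0, 0), (0, 200), (1000, 1000), (1000, 1200)]
centers, labels = lloyd_kmeans(points, k=2)
print(labels)   # [0, 0, 1, 1]
print(centers)  # [(0.0, 100.0), (1000.0, 1100.0)]
```

Because the algorithm averages planar coordinates, it is applied to the projected `X`, `Y` columns rather than to raw latitude and longitude, and the resulting centers are converted back afterwards.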

Not bad - our clusters represent groupings of most of the candidate locations and cluster centers are placed nicely in the middle of the zones 'rich' with location candidates.

Addresses of those cluster centers will be a good starting point for exploring the neighborhoods to find the best possible location based on neighborhood specifics.

Let's see those zones on a city map without heatmap, using shaded areas to indicate our clusters:

In [171]:
map_NYC = folium.Map(location=[nycc_latitude, nycc_longitude], zoom_start=11.3)
folium.Marker([nycc_latitude, nycc_longitude]).add_to(map_NYC)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.Circle([lat, lon], radius=10000, color='#00000000', fill=True, fill_color='#0066ff', fill_opacity=0.07).add_to(map_NYC)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_NYC)
for lon, lat in cluster_centers:
    folium.Circle([lat, lon], radius=3000, color='green', fill=False).add_to(map_NYC) 
map_NYC


Let's zoom in on candidate areas in **Brooklyn**:

In [172]:
address = 'Brooklyn, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude_broo = location.latitude
longitude_broo = location.longitude
print('The geographical coordinates of Brooklyn are {}, {}.'.format(latitude_broo, longitude_broo))


map_NYC = folium.Map(location=[latitude_broo, longitude_broo], zoom_start=11)
folium.Marker([nycc_latitude, nycc_longitude]).add_to(map_NYC)
for lon, lat in cluster_centers:
    folium.Circle([lat, lon], radius=3000, color='green', fill=False).add_to(map_NYC)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.Circle([lat, lon], radius=2, color='#0000ff00', fill=True, fill_color='#0066ff', fill_opacity=0.07).add_to(map_NYC)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_NYC)
map_NYC


The geographical coordinates of Brooklyn are 40.6501038, -73.9495823.


The candidate areas in **Queens**:

In [175]:
address = 'Queens, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude_que = location.latitude
longitude_que = location.longitude
print('The geographical coordinates of Queens are {}, {}.'.format(latitude_que, longitude_que))


map_NYC = folium.Map(location=[latitude_que, longitude_que], zoom_start=12)
folium.Marker([nycc_latitude, nycc_longitude]).add_to(map_NYC)
for lon, lat in cluster_centers:
    folium.Circle([lat, lon], radius=3000, color='green', fill=False).add_to(map_NYC)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.Circle([lat, lon], radius=2, color='#0000ff00', fill=True, fill_color='#0066ff', fill_opacity=0.07).add_to(map_NYC)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_NYC)
map_NYC


The geographical coordinates of Queens are 40.7498243, -73.7976337.


...and candidate areas in the **Bronx**:

In [176]:
address = 'The Bronx, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude_bro = location.latitude
longitude_bro = location.longitude
print('The geographical coordinates of the Bronx are {}, {}.'.format(latitude_bro, longitude_bro))


# Center the map on the Bronx and mark NYC Center
map_NYC = folium.Map(location=[latitude_bro, longitude_bro], zoom_start=12)
folium.Marker([nycc_latitude, nycc_longitude]).add_to(map_NYC)
# Green circles (radius in meters) outline the zones around each cluster center
for lon, lat in cluster_centers:
    folium.Circle([lat, lon], radius=3000, color='green', fill=False).add_to(map_NYC)
# Blue dots mark the individual candidate locations
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.Circle([lat, lon], radius=2, color='#0000ff00', fill=True, fill_color='#0066ff', fill_opacity=0.07).add_to(map_NYC)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_NYC)
map_NYC


The geographical coordinates of the Bronx are 40.8466508, -73.8785937.


Finally, let's **get the addresses** that can be presented to stakeholders.

In [177]:
candidate_area_addresses = []
print('==============================================================')
print('Addresses of centers of areas recommended for further analysis')
print('==============================================================\n')
BB_good_locations.reset_index(drop=True)

Addresses of centers of areas recommended for further analysis



| | Borough | Neighborhood | Latitude | Longitude | Distance from center | X | Y | Restaurants in area | Distance to Japanese restaurant |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Bronx | Morris Heights | 40.847898 | -73.919672 | 12948.680744 | 4649537.0 | 5861096.0 | 5 | 1776.801415 |
| 1 | Bronx | West Farms | 40.839475 | -73.877745 | 14657.713936 | 4653661.0 | 5862787.0 | 3 | 4841.442198 |
| 2 | Bronx | Melrose | 40.819754 | -73.909422 | 10468.79895 | 4652674.0 | 5858693.0 | 5 | 1400.285479 |
| 3 | Bronx | Longwood | 40.815099 | -73.895788 | 11076.340286 | 4654173.0 | 5859036.0 | 4 | 2930.439232 |
| 4 | Bronx | Hunts Point | 40.80973 | -73.883315 | 11701.258801 | 4655634.0 | 5859230.0 | 3 | 4404.696854 |
| 5 | Bronx | Soundview | 40.821012 | -73.865746 | 14047.736309 | 4656154.0 | 5861538.0 | 4 | 5706.396957 |
| 6 | Bronx | Clason Point | 40.806551 | -73.854144 | 14152.526982 | 4658287.0 | 5860697.0 | 3 | 6486.965201 |
| 7 | Queens | East Elmhurst | 40.764073 | -73.867041 | 11590.158567 | 4660700.0 | 5855308.0 | 5 | 2317.290994 |
| 8 | Queens | Glendale | 40.702762 | -73.870742 | 13929.986305 | 4665403.0 | 5848451.0 | 1 | 1580.155182 |
| 9 | Brooklyn | Wingate | 40.660947 | -73.937187 | 14592.365437 | 4663341.0 | 5839801.0 | 4 | 1096.176738 |
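
As a quick sanity check on the table above, the following minimal sketch counts the candidates per borough and reports the range of distances from NYC Center. The rows are copied from the table; the variable names are illustrative, and the sketch assumes that the `Distance from center` column is expressed in meters.

```python
from collections import Counter

# (borough, neighborhood, distance from center), copied from the table above.
# Assumption: the distance values are in meters.
candidates = [
    ("Bronx", "Morris Heights", 12948.680744),
    ("Bronx", "West Farms", 14657.713936),
    ("Bronx", "Melrose", 10468.79895),
    ("Bronx", "Longwood", 11076.340286),
    ("Bronx", "Hunts Point", 11701.258801),
    ("Bronx", "Soundview", 14047.736309),
    ("Bronx", "Clason Point", 14152.526982),
    ("Queens", "East Elmhurst", 11590.158567),
    ("Queens", "Glendale", 13929.986305),
    ("Brooklyn", "Wingate", 14592.365437),
]

# Number of candidate neighborhoods in each borough
per_borough = Counter(borough for borough, _, _ in candidates)

# Convert meters to kilometers for readability
distances_km = [distance / 1000 for _, _, distance in candidates]

print(dict(per_borough))
print(f"{len(candidates)} candidates, "
      f"{min(distances_km):.1f} to {max(distances_km):.1f} km from center")
```

Under that unit assumption, the listed candidates sit roughly 10 to 15 km from NYC Center, and most of them are in the Bronx.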


This concludes our analysis. We have identified 10 neighborhoods representing centers of zones that contain locations with a low number of restaurants and no Japanese restaurants nearby. According to the table above, all of them lie roughly 10 to 15 km from NYC Center.
Those zones are located in the Queens, Bronx and Brooklyn boroughs, which we identified as interesting because they are popular with tourists, fairly close to the city center and well connected by public transport.

## Results and Discussion <a name="results"></a>

Our analysis shows that although there is a great number of restaurants in NYC (~1400 in our initial area of interest, which was 15x15 km around NYC Center), there are pockets of low restaurant density fairly close to the city center. The highest concentration of restaurants was detected south of NYC Center, so we focused our attention on the areas to the east and north, corresponding to the boroughs of Queens, the Bronx and Brooklyn. These boroughs offer a combination of popularity among tourists, closeness to the city center, strong socio-economic dynamics and a number of pockets of low restaurant density.

After directing our attention to this narrower area of interest, we filtered the locations inside those boroughs, removing those with more than five restaurants within a radius of 500 m and those with a Japanese restaurant closer than 500 m.
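
The filtering rule described above can be sketched in a few lines of plain Python. This is only an illustration of the two thresholds, not the notebook's actual implementation: the names `MAX_RESTAURANTS`, `MIN_JAPANESE_DISTANCE_M` and `is_good_location` are hypothetical, the first two rows are taken from the results table, and the last two rows are made-up examples that show a location being rejected.

```python
MAX_RESTAURANTS = 5              # at most five restaurants within 500 m
MIN_JAPANESE_DISTANCE_M = 500.0  # no Japanese restaurant closer than 500 m


def is_good_location(restaurants_in_area, distance_to_japanese_m):
    """Return True if a location passes both filtering thresholds."""
    return (restaurants_in_area <= MAX_RESTAURANTS
            and distance_to_japanese_m >= MIN_JAPANESE_DISTANCE_M)


locations = [
    # Rows taken from the results table
    {"Neighborhood": "Morris Heights", "Restaurants in area": 5,
     "Distance to Japanese restaurant": 1776.801415},
    {"Neighborhood": "Glendale", "Restaurants in area": 1,
     "Distance to Japanese restaurant": 1580.155182},
    # Hypothetical rows, invented only to show the rejections
    {"Neighborhood": "(hypothetical) crowded area", "Restaurants in area": 12,
     "Distance to Japanese restaurant": 900.0},
    {"Neighborhood": "(hypothetical) near Japanese restaurant", "Restaurants in area": 2,
     "Distance to Japanese restaurant": 350.0},
]

# Keep only the locations that satisfy both conditions
good_locations = [
    loc for loc in locations
    if is_good_location(loc["Restaurants in area"],
                        loc["Distance to Japanese restaurant"])
]

for loc in good_locations:
    print(loc["Neighborhood"])
```

Only the two rows from the results table pass: the first hypothetical row fails on restaurant count and the second on distance to the nearest Japanese restaurant.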

The result is 10 zones containing the largest number of potential new restaurant locations, based on the number of and distance to existing venues. This, of course, does not imply that those zones are actually optimal locations for a new restaurant. The purpose of this analysis was only to provide information on areas close to NYC Center that are not crowded with existing restaurants (particularly Japanese ones). It is entirely possible that there is a very good reason for the small number of restaurants in any of those areas, one that would make them unsuitable for a new restaurant regardless of the lack of competition. The recommended zones should therefore be considered only as a starting point for a more detailed analysis, which could eventually yield a location that not only has no nearby competition but also meets all other relevant conditions.

## Conclusion <a name="conclusion"></a>

The goal of this project was to identify NYC areas close to the center with a low number of restaurants (particularly Japanese restaurants) in order to help stakeholders narrow down the search for an optimal location for a new Japanese restaurant. By calculating the restaurant density distribution from Foursquare data, we first identified the boroughs that justify further analysis (Queens, Brooklyn and the Bronx), and then generated a collection of locations that satisfy some basic requirements regarding existing nearby restaurants, to be used as starting points for final exploration by stakeholders.

The final decision on the optimal restaurant location will be made by stakeholders based on the specific characteristics of the neighborhoods and locations in every recommended zone, taking into consideration additional factors such as the attractiveness of each location, noise levels and proximity to major roads, real estate availability, prices, and the social and economic dynamics of every neighborhood.