# Capstone Project - The Battle of the Neighborhoods - by Alexandre Maioli

## Applied Data Science Capstone by IBM/Coursera

## 1- Introduction: Business Problem <a name="introduction"></a>

In this final project, "The Battle of the Neighborhoods", I will perform an optimal location analysis for a Japanese restaurant in the city of Philadelphia.

Although the current COVID-19 situation has hurt the majority of restaurant owners across the USA (https://www.inquirer.com/food/philadelphia-restaurant-closings-coronavirus-farmicia-mad-river-vitarellis-20200518.html), demand for food delivery has never been higher, especially with the expansion of working from home. Therefore, a group of investors would like to take this opportunity to open a Japanese venue focused on a fast-paced delivery system.

As Philadelphia is well known for its diverse and excellent food scene, I will look for locations that are not already crowded with restaurants in general or with Japanese restaurants in particular. I will impose two conditions for optimal placement: no Japanese restaurants within 1 km, and no more than 2 other restaurants within a 250 m radius. Among the locations that meet both conditions, we will prefer those as close as possible to the city center.

I will follow an adapted version of the approach and analysis in the example provided by the Coursera platform (ref. https://cocl.us/coursera_capstone_notebook).



## 2- DATA <a name="Data"></a>

Based on the above business problem, the following conditions will guide my data analysis:
* number of existing restaurants in the neighborhood (any type of restaurant) - no more than 2 within a 250 m radius.
* number of and distance to Japanese restaurants in the vicinity - no Japanese restaurant within 1 km.
* distance of the neighborhood from the city center
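
The two hard constraints above can be expressed as a single predicate. This is a minimal sketch rather than code from the notebook: the function name `meets_conditions` and its parameter names are hypothetical, and the thresholds are the ones stated in the list (at most 2 restaurants within 250 m, nearest Japanese restaurant at least 1 km away).

```python
MAX_RESTAURANTS_NEARBY = 2      # any type of restaurant, within a 250 m radius
MIN_JAPANESE_DISTANCE_M = 1000  # metres to the nearest Japanese restaurant


def meets_conditions(restaurants_within_250m, nearest_japanese_m):
    """Return True when a candidate satisfies both constraints (hypothetical helper)."""
    return (restaurants_within_250m <= MAX_RESTAURANTS_NEARBY
            and nearest_japanese_m >= MIN_JAPANESE_DISTANCE_M)


print(meets_conditions(1, 1250.0))  # quiet area, no Japanese competitor nearby
print(meets_conditions(4, 1250.0))  # too many restaurants nearby
```

The third criterion, distance from the city center, is treated here as a preference rather than a hard cutoff, so it is left out of the predicate.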

I decided to use a regularly spaced grid of locations, centered on the city center, to define our neighborhoods.

The following data sources will be needed to extract/generate the required information:
* centers of candidate areas will be generated algorithmically, and approximate addresses of those centers will be obtained using **geopy.geocoders**
* the number of restaurants and their type and location in Philadelphia will be obtained using the **Foursquare API**
* the coordinates of Philadelphia's center will be obtained using **geopy.geocoders**. The center is taken to be the City Hall area.

### Neighborhood Candidates

Let's create latitude & longitude coordinates for the centroids of our candidate neighborhoods, as the Coursera example shows. However, given Philadelphia's distinct geography, I will create a rectangular grid of cells covering our area of interest (a 4x10 kilometer rectangle centered on City Hall).

First, I will import all required libraries and find the latitude & longitude of Philly's center (City Hall) using geopy.geocoders and Nominatim.

In [1]:
import requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images
from IPython.display import Image 
from IPython.core.display import HTML 

!conda install -c conda-forge folium=0.5.0 --yes
import folium # plotting library

print('Folium installed')
print('Libraries imported.')


In [2]:
address = 'City Hall, Philadelphia'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
lat = location.latitude
lon = location.longitude
print(lat, lon)

39.9531287 -75.1642021


In [3]:
# Please note that Philadelphia is UTM zone 18, North hemisphere.
!pip install shapely
import shapely.geometry

!pip install pyproj
import pyproj

import math

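# pyproj.transform() is deprecated since pyproj 2.6.1; reusable Transformer objects are the supported replacement.
# EPSG:4326 is WGS84 longitude/latitude and EPSG:32618 is WGS84 / UTM zone 18N.
# always_xy=True keeps the (lon, lat) and (x, y) argument order used throughout this notebook.
_to_xy = pyproj.Transformer.from_crs('EPSG:4326', 'EPSG:32618', always_xy=True)
_to_lonlat = pyproj.Transformer.from_crs('EPSG:32618', 'EPSG:4326', always_xy=True)

def lonlat_to_xy(lon, lat):
    # Longitude/latitude in degrees -> UTM easting/northing in meters
    x, y = _to_xy.transform(lon, lat)
    return x, y

def xy_to_lonlat(x, y):
    # UTM easting/northing in meters -> longitude/latitude in degrees
    lon, lat = _to_lonlat.transform(x, y)
    return lon, lat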

def calc_xy_distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx*dx + dy*dy)

print('Coordinate transformation check')
print('-------------------------------')
print('Philadelphia center longitude={}, latitude={}'.format(-75.1642021,39.9531287))
x, y = lonlat_to_xy(-75.1642021,39.9531287)
print('Philadelphia center UTM X={}, Y={}'.format(x, y))
lo, la = xy_to_lonlat(x, y)
print('Philadelphia center longitude={}, latitude={}'.format(lo, la))

Coordinate transformation check
-------------------------------
Philadelphia center longitude=-75.1642021, latitude=39.9531287
Philadelphia center UTM X=485974.1760981333, Y=4422567.890212067
Philadelphia center longitude=-75.1642021, latitude=39.9531287
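
The zone number 18 used in the projection above can be cross-checked with the standard 6-degree UTM zone formula. This is a small sketch using only the standard library; `utm_zone` is a hypothetical helper that is not part of the notebook, and it ignores the special-case zones around Norway and Svalbard.

```python
import math


def utm_zone(longitude):
    """Standard 6-degree UTM zone number for a longitude in degrees."""
    return int(math.floor((longitude + 180) / 6)) % 60 + 1


print(utm_zone(-75.1642021))  # 18
```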


In [4]:
phila_center_x, phila_center_y = lonlat_to_xy(-75.1642021,39.9531287) # City center in Cartesian coordinates

x_min = phila_center_x - 2000
x_step = 500
y_min = phila_center_y - 5000 
y_step = 450  

latitudes = []
longitudes = []
distances_from_center = []
xs = []
ys = []
for i in range(0, 21):
    y = y_min + i * y_step  
    x_offset = 250 if i%2==0 else 0
    for j in range(0, 9):
        x = x_min + j * x_step + x_offset
        distance_from_center = calc_xy_distance(phila_center_x, phila_center_y, x, y)
        if (distance_from_center <= 5001):
            lon, lat = xy_to_lonlat(x, y)
            latitudes.append(lat)
            longitudes.append(lon)
            distances_from_center.append(distance_from_center)
            xs.append(x)
            ys.append(y)

print(len(latitudes), 'candidate neighborhood centers generated.')

180 candidate neighborhood centers generated.
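
As a cross-check on this count, the grid loop can be replayed in plain offsets from City Hall, without any projection library. The sketch below mirrors the step sizes and the 5001 m cutoff from the cell above; the function name `staggered_offsets` is hypothetical.

```python
import math


def staggered_offsets(rows=21, cols=9, x_step=500, y_step=450,
                      x_min=-2000, y_min=-5000, max_dist=5001):
    """Return (dx, dy) offsets in meters from the center, mirroring the grid loop."""
    points = []
    for i in range(rows):
        dy = y_min + i * y_step
        x_offset = x_step / 2 if i % 2 == 0 else 0  # shift every other row by half a step
        for j in range(cols):
            dx = x_min + j * x_step + x_offset
            if math.hypot(dx, dy) <= max_dist:
                points.append((dx, dy))
    return points


offsets = staggered_offsets()
print(len(offsets))  # 180
```

With these parameters the distance cutoff only removes the southernmost row (5000 m south of City Hall), so the remaining 20 rows of 9 points form an essentially rectangular, staggered grid.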


Based on the above results, we can now focus our search on these 180 candidate neighborhoods! Let's take a look at the map to get a better overview of the options.

In [5]:
map_phila = folium.Map(location=[39.9531287,-75.1642021], zoom_start=13)
folium.Marker([39.9531287,-75.1642021], popup='Philly CITY HALL').add_to(map_phila)
for lat, lon in zip(latitudes, longitudes):
    #folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_phila) 
    folium.Circle([lat, lon], radius=250, color='blue', fill=False).add_to(map_phila)
    #folium.Marker([lat, lon]).add_to(map_phila)
map_phila

Next, I will check whether Nominatim's reverse geocoding returns the City Hall address for the given latitude/longitude.

In [6]:
from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent="location5")
location = geolocator.reverse("39.9531287,-75.1642021")
print(location.address)

City Hall, City Hall Walk Ways, Rittenhouse Square, Philadelphia, Philadelphia County, Pennsylvania, 19110, United States of America


Good! Now I will obtain the address for each of the 180 circular neighborhoods and create a dataframe showing the following information for each: Address, Latitude, Longitude, X, Y and distance to City Hall.

In [7]:
print('Obtaining location addresses: ', end='')
addresses = []
for lat, lon in zip(latitudes, longitudes):
    location = geolocator.reverse([lat,lon])
    addresses.append(location.address)
    print(' .', end='')
print(' done.')

Obtaining location addresses:  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . done.


In [8]:
import pandas as pd

df_locations = pd.DataFrame({'Address': addresses,
                             'Latitude': latitudes,
                             'Longitude': longitudes,
                             'X': xs,
                             'Y': ys,
                             'Distance from center': distances_from_center})

df_locations.head(10)

| | Address | Latitude | Longitude | X | Y | Distance from center |
|---|---|---|---|---|---|---|
| 0 | 3205, Napoli Way, Packer Park, South Philadelp... | 39.912098 | -75.187504 | 483974.176098 | 4418018.0 | 4970.160963 |
| 1 | 3110, South Uber Street, Packer Park, South Ph... | 39.912108 | -75.181654 | 484474.176098 | 4418018.0 | 4790.876746 |
| 2 | 1612, Packer Avenue, Packer Park, South Philad... | 39.912117 | -75.175804 | 484974.176098 | 4418018.0 | 4658.594209 |
| 3 | Schuylkill Expressway, Packer Park, South Phil... | 39.912125 | -75.169954 | 485474.176098 | 4418018.0 | 4577.390086 |
| 4 | Saint Maris Convent, South 10th Street, Whitma... | 39.912134 | -75.164104 | 485974.176098 | 4418018.0 | 4550.0 |
| 5 | 524, Bigler Street, Whitman, South Philadelphi... | 39.912142 | -75.158254 | 486474.176098 | 4418018.0 | 4577.390086 |
| 6 | Oregon Market, South 3rd Street, Whitman, Sout... | 39.91215 | -75.152404 | 486974.176098 | 4418018.0 | 4658.594209 |
| 7 | East Oregon Avenue, Whitman, South Philadelphi... | 39.912157 | -75.146554 | 487474.176098 | 4418018.0 | 4790.876746 |
| 8 | East Oregon Avenue, Whitman, South Philadelphi... | 39.912165 | -75.140704 | 487974.176098 | 4418018.0 | 4970.160963 |
| 9 | Brite Star Manufacturing Company, Oregon Avenu... | 39.916157 | -75.18459 | 484224.176098 | 4418468.0 | 4457.85823 |


In [9]:
df_locations.to_pickle('./locations.pkl')  

With the location data for each candidate neighborhood in hand, I will now use the Foursquare API to collect the data needed to evaluate my imposed conditions.
I will therefore retrieve all restaurants, and specifically the Japanese ones, within the circular neighborhoods.

In [10]:

import os

# Read the Foursquare credentials from environment variables instead of hard-coding them in the notebook.
# The variable names FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET are placeholders; use the names you export.
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
VERSION = '20180604'
LIMIT = 100
print('Credentials loaded.') # never print the secret itself



Credentials loaded.


In [11]:
# Category IDs corresponding to Japanese restaurants (including sushi,...) and all food category were taken from Foursquare web site (https://developer.foursquare.com/docs/resources/categories):

food_category = '4d4b7105d754a06374d81259' # 'Root' category for all food-related venues

japanese_restaurant_categories = ['4bf58dd8d48988d111941735','55a59bace4b013909087cb30','55a59bace4b013909087cb24',
                                 '55a59bace4b013909087cb15','4bf58dd8d48988d1d2941735']

def is_restaurant(categories, specific_filter=None):
    restaurant_words = ['restaurant', 'diner', 'taverna', 'steakhouse']
    restaurant = False
    specific = False
    for c in categories:
        category_name = c[0].lower()
        category_id = c[1]
        for r in restaurant_words:
            if r in category_name:
                restaurant = True
        if 'fast food' in category_name:
            restaurant = False
        if not(specific_filter is None) and (category_id in specific_filter):
            specific = True
            restaurant = True
    return restaurant, specific

def get_categories(categories):
    return [(cat['name'], cat['id']) for cat in categories]

def format_address(location):
    address = ', '.join(location['formattedAddress'])
    address = address.replace(', Philadelphia', '')
    address = address.replace(', Philadelphia County', '')
    address = address.replace(', Pennsylvania', '')
    address = address.replace(', United States of America', '')
    return address

def get_venues_near_location(lat, lon, category, CLIENT_ID, CLIENT_SECRET, radius=400, limit=100):
    # The API version date comes from the module-level VERSION defined with the credentials above.
    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
        CLIENT_ID, CLIENT_SECRET, VERSION, lat, lon, category, radius, limit)
    try:
        results = requests.get(url).json()['response']['groups'][0]['items']
        venues = [(item['venue']['id'],
                   item['venue']['name'],
                   get_categories(item['venue']['categories']),
                   (item['venue']['location']['lat'], item['venue']['location']['lng']),
                   format_address(item['venue']['location']),
                   item['venue']['location']['distance']) for item in results]
    except Exception:
        # Any network or parsing failure yields an empty result for this location.
        venues = []
    return venues



Let's now go over our neighborhood locations and get nearby restaurants; we'll also maintain a dictionary of all found restaurants and all found Japanese restaurants.


In [12]:
import pickle

def get_restaurants(lats, lons):
    restaurants = {}
    japanese_restaurants = {}
    location_restaurants = []

    print('Obtaining venues around candidate locations:', end='')
    for lat, lon in zip(lats, lons):
        # Using radius=280 to make sure we have overlaps/full coverage so we don't miss any restaurant (we're using dictionaries to remove any duplicates resulting from area overlaps)
        venues = get_venues_near_location(lat, lon, food_category, CLIENT_ID, CLIENT_SECRET, radius=280, limit=100)
        area_restaurants = []
        for venue in venues:
            venue_id = venue[0]
            venue_name = venue[1]
            venue_categories = venue[2]
            venue_latlon = venue[3]
            venue_address = venue[4]
            venue_distance = venue[5]
            is_res, is_japanese = is_restaurant(venue_categories, specific_filter=japanese_restaurant_categories)
            if is_res:
                x, y = lonlat_to_xy(venue_latlon[1], venue_latlon[0])
                restaurant = (venue_id, venue_name, venue_latlon[0], venue_latlon[1], venue_address, venue_distance, is_japanese, x, y)
                if venue_distance<=250:
                    area_restaurants.append(restaurant)
                restaurants[venue_id] = restaurant
                if is_japanese:
                    japanese_restaurants[venue_id] = restaurant
        location_restaurants.append(area_restaurants)
        print(' .', end='')
    print(' done.')
    return restaurants, japanese_restaurants, location_restaurants

# Try to load from local file system in case we did this before
restaurants = {}
japanese_restaurants = {}
location_restaurants = []
loaded = False
try:
    with open('restaurants_280.pkl', 'rb') as f:
        restaurants = pickle.load(f)
    with open('japanese_restaurants_280.pkl', 'rb') as f:
        japanese_restaurants = pickle.load(f)
    with open('location_restaurants_280.pkl', 'rb') as f:
        location_restaurants = pickle.load(f)
    print('Restaurant data loaded.')
    loaded = True
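except (OSError, EOFError, pickle.UnpicklingError):
    # No usable cache on disk; fall through and query the Foursquare API below.
    pass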

# If load failed use the Foursquare API to get the data
if not loaded:
    restaurants, japanese_restaurants, location_restaurants = get_restaurants(latitudes, longitudes)
    
    # Let's persist this in the local file system
    with open('restaurants_280.pkl', 'wb') as f:
        pickle.dump(restaurants, f)
    with open('japanese_restaurants_280.pkl', 'wb') as f:
        pickle.dump(japanese_restaurants, f)
    with open('location_restaurants_280.pkl', 'wb') as f:
        pickle.dump(location_restaurants, f)
        


Obtaining venues around candidate locations: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . done.
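
Because the 280 m search circles overlap, the same venue can be returned for more than one candidate location. Keying the dictionaries by venue id is what removes those duplicates. The sketch below illustrates the idea with made-up ids and names; `merge_by_id` is a hypothetical helper, not a function from the notebook.

```python
def merge_by_id(batches):
    """Merge lists of (venue_id, name) pairs, keeping one entry per venue_id."""
    unique = {}
    for batch in batches:
        for venue_id, name in batch:
            unique[venue_id] = name  # a repeated id overwrites the same key instead of adding a row
    return unique


# Hypothetical results from two overlapping search circles; 'v2' falls in both.
circle_a = [('v1', 'Diner A'), ('v2', 'Sushi B')]
circle_b = [('v2', 'Sushi B'), ('v3', 'Taverna C')]
merged = merge_by_id([circle_a, circle_b])
print(len(merged))  # 3
```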


In [13]:
print('Total number of restaurants:', len(restaurants))
print('Total number of Japanese restaurants:', len(japanese_restaurants))
print('Percentage of Japanese restaurants: {:.2f}%'.format(len(japanese_restaurants) / len(restaurants) * 100))
print('Average number of restaurants in neighborhood:', np.array([len(r) for r in location_restaurants]).mean())

Total number of restaurants: 740
Total number of Japanese restaurants: 51
Percentage of Japanese restaurants: 6.89%
Average number of restaurants in neighborhood: 3.6166666666666667


Awesome! A share of 6.89% seems low and satisfactory so far. In addition, each neighborhood has only 3.6 restaurants on average within its 250 m radius.
Let's list the Japanese restaurants (the first 10 of the 51).

In [14]:
print('List of Japanese restaurants')
print('---------------------------')
for r in list(japanese_restaurants.values())[:10]:
    print(r)
print('...')
print('Total:', len(japanese_restaurants))

List of Japanese restaurants
---------------------------
('4b7ecbb4f964a520160030e3', "Johnny Chang's", 39.918101845696164, -75.17093327675984, '2601 S Broad St (at W Shunk St), PA 19148, United States', 251, True, 485391.76665635465, 4418681.372046407)
('58d93f7b113efc277d518a85', 'Kyoto Japan', 39.921697, -75.146706, '37 Snyder Ave (S Water St.), PA 19148, United States', 159, True, 487462.9252939935, 4419076.710852634)
('4ec9a772f79041351f52c520', 'Hibachi 2Go', 39.92439630163154, -75.17042876646192, '1414 Snyder Ave (at S Broad St), PA 19145, United States', 212, True, 485436.2165068007, 4419379.903700163)
('5bba46c0d807ee002c4137e2', 'Ginza', 39.93249, -75.146052, '1100 S Front St, PA 19147, United States', 203, True, 487520.77410018403, 4420274.522234872)
('57cf6245498e370db9b79b8e', 'Royal Sushi & Izakaya', 39.938008678666115, -75.14640796478119, '780 S 2nd St, PA 19147, United States', 170, True, 487491.36384265125, 4420887.084906907)
('5904fad40802d42c65ebef60', "Bangin' Curry

Now let's see which restaurants belong to each location:

In [15]:
print('Restaurants around location')
print('---------------------------')
for i in range(0, 20):
    rs = location_restaurants[i][:8]
    names = ', '.join([r[1] for r in rs])
    print('Restaurants around location {}: {}'.format(i+1, names))

Restaurants around location
---------------------------
Restaurants around location 1: 
Restaurants around location 2: Penrose Diner, Popi's Restaurant, Peking Inn
Restaurants around location 3: Lombardis Prime Meats
Restaurants around location 4: SOMO SoPhi
Restaurants around location 5: J.P. Caterers
Restaurants around location 6: Teppanyaki Grill & Supreme Buffet
Restaurants around location 7: Pho Ha Saigon, Oregon Diner, Banh Mi Square
Restaurants around location 8: Tony Luke's Casa De Pasta
Restaurants around location 9: 
Restaurants around location 10: 
Restaurants around location 11: 
Restaurants around location 12: 
Restaurants around location 13: New Chopsticks House, Asian Fusion & Steak, Nifty Fifty's
Restaurants around location 14: QQ Chinese Restaurant, Flying Fish Seafood Market
Restaurants around location 15: 
Restaurants around location 16: Los Caballos Locos
Restaurants around location 17: 
Restaurants around location 18: Franks Restaurant
Restaurants around location 1

Using folium, here they are: all restaurants in blue and Japanese restaurants in red.

In [16]:
map_phila = folium.Map(location=[39.9531287,-75.1642021], zoom_start=13)
folium.Marker([39.9531287,-75.1642021], popup='Philly CITY HALL').add_to(map_phila)
for res in restaurants.values():
    lat = res[2]; lon = res[3]
    is_japanese = res[6]
    color = 'red' if is_japanese else 'blue'
    folium.CircleMarker([lat, lon], radius=2, color=color, fill=True, fill_color=color, fill_opacity=1).add_to(map_phila)
map_phila

## Methodology

As previously stated, we will focus on neighborhoods with low restaurant density, particularly those with few Japanese restaurants. I will follow the same methodology as the Coursera example, but limit the analysis to an area of ~3 km around the city center.

In the first step, we collected the required data: the location and type (category) of every restaurant within 5 km of Philadelphia's center (City Hall). We also identified Japanese restaurants (according to Foursquare categorization).

The second step in my analysis will be the calculation and exploration of 'restaurant density' across different areas of Philadelphia, using heatmaps to identify a few promising areas close to the center with a low number of restaurants in general (and no Japanese restaurants in the vicinity), and then focusing our attention on those areas.

In the third and final step, we will focus on the most promising areas and, within those, create clusters of locations that meet BOTH of the following conditions:

1. Locations with no more than two restaurants within a radius of 250 meters.
2. Locations with no Japanese restaurants within a radius of 1000 meters.

Finally, a map of all such locations will be presented, grouped into clusters (using k-means clustering with k=5), to identify general zones / neighborhoods / addresses that should serve as a starting point for the final 'street level' exploration and search for the optimal venue location by stakeholders.
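
To make the clustering step concrete, here is a minimal sketch of k-means (Lloyd's algorithm) on 2-D points using only the standard library. It is an illustration under stated assumptions, not necessarily the implementation used later in this notebook: the function name `kmeans`, its parameters, and the sample points are hypothetical, and a library implementation would normally be preferred in practice.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's algorithm on 2-D points; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k different input points

    def nearest(p):
        # Index of the center closest to point p (Euclidean distance)
        return min(range(k),
                   key=lambda c: math.hypot(p[0] - centers[c][0], p[1] - centers[c][1]))

    for _ in range(iterations):
        labels = [nearest(p) for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centers.append((sum(p[0] for p in members) / len(members),
                                    sum(p[1] for p in members) / len(members)))
            else:
                new_centers.append(centers[c])  # leave an empty cluster's center where it was
        if new_centers == centers:
            break  # centers stopped moving, so assignments can no longer change
        centers = new_centers
    labels = [nearest(p) for p in points]
    return centers, labels


# Hypothetical candidate locations in projected (x, y) meters: two well-separated groups.
sample = [(0, 0), (0, 1), (1, 0), (100, 100), (100, 101), (101, 100)]
centers, labels = kmeans(sample, k=2)
print(sorted(centers))
```

Clustering on projected X/Y coordinates in meters, rather than on raw latitude/longitude, keeps the Euclidean distance meaningful at city scale.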

### Analysis
Let's perform some basic exploratory data analysis and derive some additional information from our raw data. First, let's count the number of restaurants in every candidate area.

I will then add to our dataframe, for each row (address / neighborhood), how many restaurants are located within it and its distance to the nearest Japanese restaurant.

In [17]:
location_restaurants_count = [len(res) for res in location_restaurants]

df_locations['Restaurants in area'] = location_restaurants_count

distances_to_japanese_restaurant = []

for area_x, area_y in zip(xs, ys):
    min_distance = 5000
    for res in japanese_restaurants.values():
        res_x = res[7]
        res_y = res[8]
        d = calc_xy_distance(area_x, area_y, res_x, res_y)
        if d<min_distance:
            min_distance = d
    distances_to_japanese_restaurant.append(min_distance)

df_locations['Distance to Japanese restaurant'] = distances_to_japanese_restaurant

print('Average number of restaurants in every area with radius=250m:', np.array(location_restaurants_count).mean())
print('Average distance to closest Japanese restaurant from each area center:', df_locations['Distance to Japanese restaurant'].mean())

df_locations.head(10)

Average number of restaurants in every area with radius=250m: 3.6166666666666667
Average distance to closest Japanese restaurant from each area center: 707.2686453833273


| | Address | Latitude | Longitude | X | Y | Distance from center | Restaurants in area | Distance to Japanese restaurant |
|---|---|---|---|---|---|---|---|---|
| 0 | 3205, Napoli Way, Packer Park, South Philadelp... | 39.912098 | -75.187504 | 483974.176098 | 4418018.0 | 4970.160963 | 0 | 1565.174474 |
| 1 | 3110, South Uber Street, Packer Park, South Ph... | 39.912108 | -75.181654 | 484474.176098 | 4418018.0 | 4790.876746 | 3 | 1132.334128 |
| 2 | 1612, Packer Avenue, Packer Park, South Philad... | 39.912117 | -75.175804 | 484974.176098 | 4418018.0 | 4658.594209 | 1 | 783.957919 |
| 3 | Schuylkill Expressway, Packer Park, South Phil... | 39.912125 | -75.169954 | 485474.176098 | 4418018.0 | 4577.390086 | 1 | 668.580183 |
| 4 | Saint Maris Convent, South 10th Street, Whitma... | 39.912134 | -75.164104 | 485974.176098 | 4418018.0 | 4550.0 | 1 | 882.84138 |
| 5 | 524, Bigler Street, Whitman, South Philadelphi... | 39.912142 | -75.158254 | 486474.176098 | 4418018.0 | 4577.390086 | 1 | 1269.57408 |
| 6 | Oregon Market, South 3rd Street, Whitman, Sout... | 39.91215 | -75.152404 | 486974.176098 | 4418018.0 | 4658.594209 | 3 | 1166.180486 |
| 7 | East Oregon Avenue, Whitman, South Philadelphi... | 39.912157 | -75.146554 | 487474.176098 | 4418018.0 | 4790.876746 | 1 | 1058.880413 |
| 8 | East Oregon Avenue, Whitman, South Philadelphi... | 39.912165 | -75.140704 | 487974.176098 | 4418018.0 | 4970.160963 | 0 | 1175.788473 |
| 9 | Brite Star Manufacturing Company, Oregon Avenu... | 39.916157 | -75.18459 | 484224.176098 | 4418468.0 | 4457.85823 | 0 | 1186.946589 |


Therefore, on average, a Japanese restaurant can be found within ~700 m of every candidate area center. That's close, though not as close as in Berlin or New York. Nevertheless, we will later impose the condition of no Japanese restaurant within 1 km.

Let's create a map showing a heatmap / density of restaurants and try to extract some meaningful information from it. I am also adding circles indicating distances of 1 km, 2 km and 3 km from City Hall.

In [18]:
restaurant_latlons = [[res[2], res[3]] for res in restaurants.values()]

japanese_latlons = [[res[2], res[3]] for res in japanese_restaurants.values()]

In [19]:
from folium import plugins
from folium.plugins import HeatMap

map_phila = folium.Map(location=[39.9531287,-75.1642021], zoom_start=13)
folium.TileLayer('cartodbpositron').add_to(map_phila) #cartodbpositron cartodbdark_matter
HeatMap(restaurant_latlons).add_to(map_phila)
folium.Marker([39.9531287,-75.1642021]).add_to(map_phila)
folium.Circle([39.9531287,-75.1642021], radius=1000, fill=False, color='white').add_to(map_phila)
folium.Circle([39.9531287,-75.1642021], radius=2000, fill=False, color='white').add_to(map_phila)
folium.Circle([39.9531287,-75.1642021], radius=3000, fill=False, color='white').add_to(map_phila)

map_phila

It looks like a few pockets of low restaurant density close to the city center can be found NORTHEAST of City Hall (the Philly city center).

Below, let's look at another heatmap showing the density of Japanese restaurants only.

In [20]:
map_phila = folium.Map(location=[39.9531287,-75.1642021], zoom_start=13)
folium.TileLayer('cartodbpositron').add_to(map_phila) #cartodbpositron cartodbdark_matter
HeatMap(japanese_latlons).add_to(map_phila)
folium.Marker([39.9531287,-75.1642021]).add_to(map_phila)
folium.Circle([39.9531287,-75.1642021], radius=1000, fill=False, color='white').add_to(map_phila)
folium.Circle([39.9531287,-75.1642021], radius=2000, fill=False, color='white').add_to(map_phila)
folium.Circle([39.9531287,-75.1642021], radius=3000, fill=False, color='white').add_to(map_phila)

map_phila

This map is definitely 'not hot' (Japanese restaurants represent a subset of ~7% of all restaurants in Philadelphia), but it also indicates a higher density of existing Japanese restaurants directly south and northwest of City Hall, with the closest pockets of low Japanese restaurant density positioned northeast of the city center.

Based on this, we will now focus our analysis on the area NORTHEAST of Philly's center - we will move the center of our area of interest and reduce its size to a radius of 750 m. This places our location candidates mostly in YORKTOWN/GIRARD/CAMBRIDGE PLAZA.

YORKTOWN/GIRARD/CAMBRIDGE PLAZA:
A look at the surroundings reinforces the appeal of these areas to stakeholders, as they are mostly residential and near large university campuses such as TEMPLE and DREXEL Universities. This is a significant advantage, as schools tend to go online for at least a year, increasing the demand for food delivery.
In addition, the area is within walking distance of the city center. These neighborhoods appear to justify further analysis.

Let's define a new, narrower region of interest, which will include the low-restaurant-count parts of YORKTOWN/GIRARD/CAMBRIDGE PLAZA closest to City Hall.

In [21]:
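roi_x_min = phila_center_x  # western edge of the region of interest (ROI)
roi_y_max = phila_center_y + 1200  # note: despite its name, this is the southern (lowest-Y) edge of the ROI
roi_width = 1500
roi_height = 1500
roi_center_x = roi_x_min + 750
roi_center_y = roi_y_max + 750  # the ROI center lies 750 m east and 1950 m north of City Hall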
roi_center_lon, roi_center_lat = xy_to_lonlat(roi_center_x, roi_center_y)
roi_center = [roi_center_lat, roi_center_lon]

map_phila = folium.Map(location=roi_center, zoom_start=14)
HeatMap(restaurant_latlons).add_to(map_phila)
folium.Marker([39.9531287,-75.1642021]).add_to(map_phila)
folium.Circle(roi_center, radius=750, color='white', fill=True, fill_opacity=0.4).add_to(map_phila)
map_phila

AWESOME! This nicely covers all the pockets of low restaurant density in the YORKTOWN/GIRARD/CAMBRIDGE PLAZA area closest to City Hall.

Let's also create a new, denser grid of location candidates restricted to our new region of interest (let's place our location candidates 100 m apart).
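
As a quick sanity check on where this region sits, the offsets used in the In [21] cell can be turned into a distance and a bearing from City Hall. This is a small worked calculation with the standard library only; the variable names are hypothetical, and the bearing is measured from UTM grid north, which can differ slightly from true north.

```python
import math

# Offsets of the region-of-interest center from City Hall, in meters.
dx_east = 750            # roi_center_x - phila_center_x
dy_north = 1200 + 750    # roi_center_y - phila_center_y

distance_m = math.hypot(dx_east, dy_north)
# Bearing measured clockwise from grid north.
bearing_deg = math.degrees(math.atan2(dx_east, dy_north)) % 360

print(round(distance_m), round(bearing_deg))  # 2089 21
```

So the center of the new region lies roughly 2.1 km from City Hall, about 21 degrees east of grid north, which is consistent with the north-to-northeast direction described above.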

In [22]:
x_step = 100
y_step = 100 
roi_y_min = roi_center_y - 750

roi_latitudes = []
roi_longitudes = []
roi_xs = []
roi_ys = []
for i in range(0, int(31)):
    y = roi_y_min + i * y_step
    x_offset = 50 if i%2==0 else 0
    for j in range(0, 21):
        x = roi_x_min + j * x_step + x_offset
        d = calc_xy_distance(roi_center_x, roi_center_y, x, y)
        if (d <= 751):
            lon, lat = xy_to_lonlat(x, y)
            roi_latitudes.append(lat)
            roi_longitudes.append(lon)
            roi_xs.append(x)
            roi_ys.append(y)

print(len(roi_latitudes), 'candidate neighborhood centers generated.')

176 candidate neighborhood centers generated.


176 candidates! Now let's compute the 2 most important criteria for each location candidate: **number of restaurants in the vicinity** (we'll use a radius of **250 meters**) and **distance to the closest Japanese restaurant**.

In [23]:
def count_restaurants_nearby(x, y, restaurants, radius=250):    
    count = 0
    for res in restaurants.values():
        res_x = res[7]; res_y = res[8]
        d = calc_xy_distance(x, y, res_x, res_y)
        if d<=radius:
            count += 1
    return count

def find_nearest_restaurant(x, y, restaurants):
    d_min = 100000
    for res in restaurants.values():
        res_x = res[7]; res_y = res[8]
        d = calc_xy_distance(x, y, res_x, res_y)
        if d<=d_min:
            d_min = d
    return d_min

roi_restaurant_counts = []
roi_japanese_distances = []

print('Generating data on location candidates... ', end='')
for x, y in zip(roi_xs, roi_ys):
    count = count_restaurants_nearby(x, y, restaurants, radius=250)
    roi_restaurant_counts.append(count)
    distance = find_nearest_restaurant(x, y, japanese_restaurants)
    roi_japanese_distances.append(distance)
print('done.')

Generating data on location candidates... done.


In [24]:
# Let's put this into dataframe
df_roi_locations = pd.DataFrame({'Latitude':roi_latitudes,
                                 'Longitude':roi_longitudes,
                                 'X':roi_xs,
                                 'Y':roi_ys,
                                 'Restaurants nearby':roi_restaurant_counts,
                                 'Distance to Japanese restaurant':roi_japanese_distances})

df_roi_locations.head(10)

| | Latitude | Longitude | X | Y | Restaurants nearby | Distance to Japanese restaurant |
|---|---|---|---|---|---|---|
| 0 | 39.963953 | -75.155446 | 486724.176098 | 4423768.0 | 5 | 719.913316 |
| 1 | 39.964848 | -75.159546 | 486374.176098 | 4423868.0 | 6 | 428.384105 |
| 2 | 39.96485 | -75.158376 | 486474.176098 | 4423868.0 | 2 | 514.642191 |
| 3 | 39.964851 | -75.157205 | 486574.176098 | 4423868.0 | 1 | 605.1448 |
| 4 | 39.964853 | -75.156034 | 486674.176098 | 4423868.0 | 2 | 698.243419 |
| 5 | 39.964854 | -75.154863 | 486774.176098 | 4423868.0 | 2 | 793.024284 |
| 6 | 39.964856 | -75.153692 | 486874.176098 | 4423868.0 | 0 | 888.949469 |
| 7 | 39.964857 | -75.152521 | 486974.176098 | 4423868.0 | 0 | 985.684941 |
| 8 | 39.964859 | -75.15135 | 487074.176098 | 4423868.0 | 1 | 930.027191 |
| 9 | 39.965747 | -75.161305 | 486224.176098 | 4423968.0 | 6 | 395.522424 |


OK. Let us now **filter** those locations: we're interested only in **locations with no more than two restaurants within a radius of 250 meters** and **no Japanese restaurants within a radius of 1000 meters**.

In [35]:
good_res_count = np.array((df_roi_locations['Restaurants nearby']<=2))
print('Locations with no more than two restaurants nearby:', good_res_count.sum())

good_jap_distance = np.array(df_roi_locations['Distance to Japanese restaurant']>=1000)
print('Locations with no Japanese restaurants within 1000m:', good_jap_distance.sum())

good_locations = np.logical_and(good_res_count, good_jap_distance)
print('Locations with both conditions met:', good_locations.sum())

df_good_locations = df_roi_locations[good_locations]


Locations with no more than two restaurants nearby: 127
Locations with no Japanese restaurants within 1000m: 49
Locations with both conditions met: 45
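
To make the two conditions concrete, here is a small sketch that applies them to the first ten rows of `df_roi_locations` shown earlier. It uses plain Python lists instead of pandas so it runs on its own; the constant names `MAX_RESTAURANTS` and `MIN_JAPANESE_DIST` are introduced here only for illustration.

```python
# (restaurants within 250 m, distance to nearest Japanese restaurant in m),
# copied from the first ten rows of df_roi_locations.
rows = [
    (5, 719.913316), (6, 428.384105), (2, 514.642191), (1, 605.1448),
    (2, 698.243419), (2, 793.024284), (0, 888.949469), (0, 985.684941),
    (1, 930.027191), (6, 395.522424),
]

MAX_RESTAURANTS = 2       # no more than two restaurants within 250 m
MIN_JAPANESE_DIST = 1000  # no Japanese restaurant within 1000 m

few_restaurants = [count <= MAX_RESTAURANTS for count, _ in rows]
far_from_japanese = [dist >= MIN_JAPANESE_DIST for _, dist in rows]
both = [a and b for a, b in zip(few_restaurants, far_from_japanese)]

print(sum(few_restaurants), sum(far_from_japanese), sum(both))
```

In this ten-row sample, seven rows pass the density condition but none pass the distance condition; row 7, at roughly 986 m from the nearest Japanese restaurant, falls just short of the 1000 m threshold.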


In [26]:
good_latitudes = df_good_locations['Latitude'].values
good_longitudes = df_good_locations['Longitude'].values

good_locations = [[lat, lon] for lat, lon in zip(good_latitudes, good_longitudes)]

map_phila = folium.Map(location=roi_center, zoom_start=14)
folium.TileLayer('cartodbpositron').add_to(map_phila)
HeatMap(restaurant_latlons).add_to(map_phila)
folium.Circle(roi_center, radius=750, color='white', fill=True, fill_opacity=0.6).add_to(map_phila)
folium.Marker([39.9531287,-75.1642021]).add_to(map_phila)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_phila) 
map_phila

Looking good. We now have a number of locations fairly close to City Hall (near Girard, mostly south of Yorktown).
Let's now show those good locations in the form of a heatmap:

In [27]:
map_phila = folium.Map(location=roi_center, zoom_start=14)
HeatMap(good_locations, radius=25).add_to(map_phila)
folium.Marker([39.9531287,-75.1642021]).add_to(map_phila)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_phila)
map_phila

There is a clear indication of zones with a low number of restaurants in the vicinity and *no* Japanese restaurants at all nearby.

Let us now **cluster** those locations to create 5 **centers of zones containing good locations**. Those zones, their centers and addresses will be the final result of our analysis.

In [28]:
from sklearn.cluster import KMeans

number_of_clusters = 5

good_xys = df_good_locations[['X', 'Y']].values
kmeans = KMeans(n_clusters=number_of_clusters, random_state=0).fit(good_xys)

cluster_centers = [xy_to_lonlat(cc[0], cc[1]) for cc in kmeans.cluster_centers_]

map_phila = folium.Map(location=roi_center, zoom_start=14)
folium.TileLayer('cartodbpositron').add_to(map_phila)
HeatMap(restaurant_latlons).add_to(map_phila)
folium.Circle(roi_center, radius=750, color='white', fill=True, fill_opacity=0.4).add_to(map_phila)
folium.Marker([39.9531287,-75.1642021]).add_to(map_phila)
for lon, lat in cluster_centers:
    folium.Circle([lat, lon], radius=150, color='green', fill=True, fill_opacity=0.25).add_to(map_phila) 
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_phila)
map_phila

Addresses of those cluster centers are a good starting point for exploring the neighborhoods to find the best possible location based on neighborhood specifics.
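
For intuition about how the cluster centers are obtained, the sketch below runs a bare-bones version of the k-means idea (Lloyd's iteration) on toy points. It is a simplified illustration and not scikit-learn's implementation: `KMeans` uses k-means++ initialization by default, along with other refinements. The function name `lloyd_kmeans`, the toy coordinates and the fixed starting centers are all assumptions made for this example.

```python
import math

def lloyd_kmeans(points, centers, iterations=10):
    # Repeats the two k-means steps: assign each point to its nearest
    # center, then move each center to the mean of its assigned points.
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            groups[nearest].append(p)
        centers = [
            (sum(x for x, _ in g) / len(g), sum(y for _, y in g) / len(g)) if g else c
            for g, c in zip(groups, centers)
        ]
    return centers

# Hypothetical candidate locations (meters) forming two obvious groups.
toy_points = [(0, 0), (100, 0), (0, 100), (1000, 1000), (1100, 1000), (1000, 1100)]
# Fixed starting centers keep the toy run deterministic.
centers = lloyd_kmeans(toy_points, centers=[(0, 0), (1000, 1000)])
print(centers)
```

Each returned center is the mean of the points assigned to it, which is why a cluster center can land on a spot that is not itself one of the candidate grid locations.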

Let's see those zones on a city map without the heatmap, using shaded areas to indicate our clusters:

In [29]:
map_phila = folium.Map(location=roi_center, zoom_start=14)
folium.Marker([39.9531287,-75.1642021]).add_to(map_phila)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.Circle([lat, lon], radius=250, color='#00000000', fill=True, fill_color='#0066ff', fill_opacity=0.07).add_to(map_phila)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_phila)
for lon, lat in cluster_centers:
    folium.Circle([lat, lon], radius=150, color='green', fill=False).add_to(map_phila) 
map_phila

Finally, let's **reverse geocode those candidate area centers to get the addresses**, which can be presented to stakeholders.

In [85]:
print('==============================================================')
print('Addresses of centers of areas recommended for further analysis')
print('==============================================================\n')
candidate_area_addresses = []
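for lon, lat in cluster_centers:  # cluster_centers holds (lon, lat) pairs
    location = geolocator.reverse([lat, lon])  # reverse() expects (latitude, longitude)
    candidate_area_addresses.append(location)
    x, y = lonlat_to_xy(lon, lat)
    # Distance from city center in meters (computed but not printed)
    d = calc_xy_distance(phila_center_x, phila_center_y, x, y)
    print('{}'.format(location))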


Addresses of centers of areas recommended for further analysis

Temple University Sports Complex, North 13th Street, Cambridge Plaza, Philadelphia, Philadelphia County, Pennsylvania, 19133, United States of America
St. Malachy Catholic School, West Flora Street, Yorktown, Philadelphia, Philadelphia County, Pennsylvania, 19122, United States of America
1559, West Cabot Street, North Central, Philadelphia, Philadelphia County, Pennsylvania, 19121, United States of America
901, North 10th Street, Harrison Plaza, Philadelphia, Philadelphia County, Pennsylvania, 19123, United States of America
Girard Plaza, West Harper Street, Yorktown, Philadelphia, Philadelphia County, Pennsylvania, 19123, United States of America


This concludes my analysis. The result is 5 addresses representing centers of zones containing locations with a low number of restaurants and no Japanese restaurants nearby, all zones being fairly close to the city center (all less than 3 km from City Hall).

These addresses should be considered only as a starting point for exploring area neighborhoods in search of potential restaurant locations. Most of the zones are located in Girard/Yorktown/Poplar, which we have identified as interesting because they are popular with university students and homeowners, fairly close to the city center, and well connected by public transport.
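
A distance-to-City-Hall claim like the one above can be sanity-checked directly from latitude and longitude. The sketch below uses the haversine formula, which is a swapped-in technique: the notebook itself measures distances in projected coordinates with `calc_xy_distance`. The function name `haversine_m` is hypothetical, the marker coordinates used in the maps are assumed here to be City Hall, and because the cluster center coordinates are not printed, row 0 of `df_roi_locations` stands in as an example point.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2, earth_radius_m=6371000):
    # Great-circle distance in meters between two (lat, lon) points in degrees.
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))

CITY_HALL = (39.9531287, -75.1642021)  # marker coordinates used in the maps (assumed City Hall)
candidate = (39.963953, -75.155446)    # row 0 of df_roi_locations, for illustration only

d = haversine_m(*CITY_HALL, *candidate)
print('{:.2f} km from City Hall'.format(d / 1000))
```

Applying the same function to each `(lon, lat)` pair in `cluster_centers` (with the arguments swapped into latitude, longitude order) would give the distances behind the 3 km statement.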

In [31]:
map_phila = folium.Map(location=roi_center, zoom_start=14)
folium.Circle([39.9531287,-75.1642021], radius=100, color='red', fill=True, fill_color='red', fill_opacity=1).add_to(map_phila)
for lonlat, location in zip(cluster_centers, candidate_area_addresses):
    folium.Marker([lonlat[1], lonlat[0]], popup=location.address).add_to(map_phila) 
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.Circle([lat, lon], radius=150, color='#0000ff00', fill=True, fill_color='#0066ff', fill_opacity=0.05).add_to(map_phila)
map_phila

## Results and Discussion

My analysis shows that there are pockets of low restaurant density fairly close to the city center. The highest concentration of restaurants was detected south of City Hall, so we focused our attention on areas to the northeast, corresponding to GIRARD, YORKTOWN and CAMBRIDGE PLAZA.

The conditions initially stated in the problem were imposed on the candidate grid: **locations with no more than two restaurants within a radius of 250 meters** and **no Japanese restaurants within a radius of 1000 meters**. The location candidates that met both conditions were then clustered to create 5 zones of interest.
Addresses of the centers of those zones were also generated using reverse geocoding, to be used as markers/starting points for a more detailed local analysis based on other factors.

It is also worth pointing out that this was a simple analysis and should be considered a starting point for further development of a location search for a Japanese restaurant.



## Conclusion
The objective of this project was to identify areas of Philadelphia close to the city center with a low number of restaurants (particularly Japanese restaurants), in order to aid stakeholders in narrowing down the search for an optimal location for a new Japanese restaurant, taking the Coursera example from Berlin as a guide and reference.

The final decision on the optimal restaurant location will be made by stakeholders based on specific characteristics of neighborhoods and locations in every recommended zone, taking into consideration additional factors such as the attractiveness of each location, levels of noise / proximity to major roads, real estate availability, prices, and the social and economic dynamics of every neighborhood.

