# Capstone Project - Finding a Suitable Location for a Restaurant (Week 2)
### Applied Data Science Capstone by Tim Strebel

In [1]:
import numpy as np
import pandas as pd 
# increase column and row display
pd.set_option('display.max_columns', 100)
pd.set_option('display.max_rows', 1000)

import json

# !conda install -c conda-forge geopy --yes  # uncomment this line to add to environment
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests
from pandas import json_normalize  # pandas >= 1.0; the old pandas.io.json import path is deprecated

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line to add to environment
import folium # map rendering library

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Methodology](#methodology)
* [Data](#data)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

In this project, I'm going to apply Data Science methodologies similar to those used by the IBM team to find an optimal location for an Asian fusion restaurant in my hometown, Salt Lake City, Utah. This report is targeted toward stakeholders interested in opening an Asian fusion restaurant, or any other type of Asian restaurant, in Salt Lake City.

This charming city is home to nearly 201,000 residents and contains many wondrous attractions that draw tourists from all over the world. Temple Square, located near the city center, is the most popular tourist attraction. Home to the Salt Lake Temple, it features stunning architecture and accommodating walkways for foot traffic, and it is an ideal area for restaurants to feed people famished from a day's worth of sightseeing. I chose Asian fusion as the theme for the restaurant because it is my favorite food, and growing up in Salt Lake City, I wished there were more of these restaurants downtown.

I am going to use the data science tools and methods learned in this IBM Data Science course to look for suitable candidate locations for an Asian fusion restaurant. The ideal location is one close to the city center with the fewest restaurants to compete with, especially other Asian restaurants, which would be direct competition.

## Methodology <a name="methodology"></a>

### Step One
Collect location data on Salt Lake City. Find the city center and a set of equally spaced candidate locations within a 3 kilometer radius of the city center (an area roughly 6 kilometers across). Identify all restaurants around each candidate location, and flag the Asian restaurants among them.

### Step Two
Explore the restaurant density of each candidate area of Salt Lake City. Use heatmaps to identify promising areas close to the center with a low number of restaurants and no Asian restaurants in the vicinity.

### Step Three
Once we find promising areas, I will use k-means clustering to find similar areas with a low density of restaurants, especially Asian restaurants.
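Step Three does not name an implementation. As a rough illustration only, here is a minimal k-means sketch (Lloyd's algorithm) over per-area feature vectors. The feature choice shown is hypothetical: restaurants in the area and kilometers to the nearest Asian restaurant. The deterministic initialisation from the first k points is also an assumption made to keep the example reproducible. In practice, scikit-learn's `KMeans` is the usual tool. Features in different units should be scaled to comparable ranges before clustering.

```python
import math

def kmeans(points, k, iterations=100):
    """Cluster feature vectors with Lloyd's algorithm; returns (labels, centroids).

    Assumption: centroids start at the first k points (deterministic for the
    example); real use would pick random or k-means++ seeds.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        # Update step: move each centroid to the mean of its members
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, new_labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # an empty cluster keeps its centroid
        if new_labels == labels and new_centroids == centroids:
            break  # converged
        labels, centroids = new_labels, new_centroids
    return labels, centroids

# Hypothetical per-area features: (restaurants in area, km to nearest Asian restaurant)
features = [(0, 0.85), (31, 0.05), (0, 0.82), (29, 0.10)]
labels, centroids = kmeans(features, k=2)
print(labels)  # [0, 1, 0, 1]
```

The low-density areas end up in one cluster and the crowded areas in the other.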

## Data <a name="data"></a>

Based on the definition of my problem, the factors that will influence my decision are:
* number of existing restaurants of any type in the candidate area
* number of, and distance to, other Asian restaurants
* distance of the candidate area from the city center

I decided to use an equally spaced grid of locations, centered on the city center, to define my candidate areas.

The following data sources will be needed to extract or generate the required information:
* centers of candidate areas will be generated algorithmically, and approximate addresses of those centers will be obtained using **Nominatim Geocoding**
* the number of restaurants, and their type and location, in every neighborhood will be obtained using the **Foursquare API**
* coordinates of the Salt Lake City center will be obtained using ZIP code location information from **AmericanTowns.com**

### Area Candidates

I'm going to generate latitude and longitude coordinates for the centroids of our candidate areas. We will create a grid of cells covering our area of interest, which is approx. 6x6 kilometers centered on the Salt Lake City center.

First, I'll find the latitude and longitude closest to the city center. I'll use a simple pandas function (`read_html`) to scrape all of the ZIP codes and their lat/long locations from **AmericanTowns.com**.

In [2]:
slc_df = (pd.read_html(r'https://www.americantowns.com/salt-lake-city-ut/zip-code/')[0]
           .drop_duplicates()
           .reset_index(drop=True))

slc_df.head()

| | Zip | Area | Lat | Lon | Zone | UTC | DST | State FIPS Code | County FIPS Code | MSA Code | City | County | State |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 84101 | 801 | 40.756416 | -111.89907 | Mountain | -7 | Y | 49 | 49035 | 7160 | Salt Lake City | Salt Lake | UT |
| 1 | 84102 | 801 | 40.758805 | -111.865417 | Mountain | -7 | Y | 49 | 49035 | 7160 | Salt Lake City | Salt Lake | UT |
| 2 | 84103 | 801 | 40.783965 | -111.876047 | Mountain | -7 | Y | 49 | 49035 | 7160 | Salt Lake City | Salt Lake | UT |
| 3 | 84104 | 801 | 40.74985 | -111.934638 | Mountain | -7 | Y | 49 | 49035 | 7160 | Salt Lake City | Salt Lake | UT |
| 4 | 84105 | 801 | 40.734214 | -111.856179 | Mountain | -7 | Y | 49 | 49035 | 7160 | Salt Lake City | Salt Lake | UT |


Now that we have our ZIP codes and lat/long combinations, let's plot them using **folium** to see which is closest to the city center.

In [3]:
latitude = slc_df.Lat.mean()
longitude = slc_df.Lon.mean()
zip_map = folium.Map(location=[latitude, longitude], zoom_start=12)

# add a marker for each ZIP code centroid
for lat, lng, city, zip_code in zip(slc_df['Lat'], slc_df['Lon'], slc_df['City'], slc_df['Zip']):
    label = '{}, {}'.format(city, zip_code)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(zip_map)  
    
zip_map

It looks like **84133** is located in the dead center of Temple Square. Our next task is to create equally spaced area candidates around the city center in a 6x6 kilometer grid. To do so, we will need functions to convert lat-long coordinates to and from UTM (Universal Transverse Mercator) coordinates, which are expressed in meters.
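The map above is one way to cross-check the choice of 84133. Another is to find the ZIP centroid nearest to a landmark using the haversine great-circle distance. In the sketch below, the Temple Square coordinate `(40.7704, -111.8920)` is an approximate value supplied for illustration, not taken from the notebook. The ZIP centroids are copied from the outputs in this notebook, and 6,371 km is the usual mean Earth radius for the spherical approximation.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius (spherical approximation)

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two (lat, lng) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def nearest_zip(target_lat, target_lng, zip_centroids):
    """Return the ZIP code whose centroid is closest to the target point."""
    return min(zip_centroids,
               key=lambda z: haversine_m(target_lat, target_lng, *zip_centroids[z]))

# ZIP centroids copied from the notebook's outputs
zip_centroids = {
    84101: (40.756416, -111.89907),
    84102: (40.758805, -111.865417),
    84103: (40.783965, -111.876047),
    84133: (40.770852, -111.892118),
}
TEMPLE_SQUARE = (40.7704, -111.8920)  # approximate coordinate, assumed for illustration
print(nearest_zip(*TEMPLE_SQUARE, zip_centroids))  # 84133
```

With the scraped table, the lookup dictionary could be built as `dict(zip(slc_df.Zip, zip(slc_df.Lat, slc_df.Lon)))`.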

In [10]:
import pyproj
import math

city_center_postal_code = 84133
slc_city_center = slc_df[slc_df.Zip == city_center_postal_code]
slc_center_lat = slc_city_center.Lat.iloc[0]
slc_center_lng = slc_city_center.Lon.iloc[0]

# NOTE: EPSG:26918 is NAD83 / UTM zone 18N (central meridian 75°W). Salt Lake City
# (~111.9°W) lies in UTM zone 12N (EPSG:26912 or EPSG:32612); when projected this far
# outside its zone, distances come out roughly 12% larger than on the ground.
# Build the transformers once; EPSG:4326 expects coordinates in (lat, lng) order.
LATLNG_TO_XY = pyproj.Transformer.from_crs('EPSG:4326', 'EPSG:26918')
XY_TO_LATLNG = pyproj.Transformer.from_crs('EPSG:26918', 'EPSG:4326')

def latlng_to_xy(lat, lng):
    """Convert latitude/longitude to projected (x, y) coordinates in meters."""
    x, y = LATLNG_TO_XY.transform(lat, lng)
    return x, y

def xy_to_latlng(x, y):
    """Convert projected (x, y) coordinates back to (latitude, longitude)."""
    lat, lng = XY_TO_LATLNG.transform(x, y)
    return lat, lng

def calc_xy_distance(x1, y1, x2, y2):
    """Euclidean distance between two projected points."""
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx*dx + dy*dy)

print('Coordinate transformation check')
print('-------------------------------')
print('Salt Lake City center latitude={}, longitude={}'.format(slc_center_lat, slc_center_lng))
x, y = latlng_to_xy(slc_center_lat, slc_center_lng)
print('Salt Lake City center UTM X={}, Y={}'.format(x, y))
la, lo = xy_to_latlng(x, y)
print('Salt Lake City center latitude={}, longitude={}'.format(la, lo))

Coordinate transformation check
-------------------------------
Salt Lake City center latitude=40.770852000000005, longitude=-111.892118
Salt Lake City center UTM X=-2632119.1545131044, Y=5225098.766082728
Salt Lake City center latitude=40.77085200000001, longitude=-111.892118
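Rather than hard-coding the projection, the UTM zone (and hence its EPSG code) can be derived from longitude. This is a minimal sketch that assumes the standard 6-degree zones and the WGS 84 / UTM EPSG codes 326zz (north) and 327zz (south). It ignores the irregular zones around Norway and Svalbard. For Salt Lake City it yields zone 12 (EPSG:32612), whereas EPSG:26918, used in the cell above, is zone 18N, which is centered on 75°W.

```python
import math

def utm_zone(lng):
    """Standard 6-degree UTM zone number (1-60) for a longitude in degrees.

    Ignores the irregular zones around Norway and Svalbard.
    """
    return int(math.floor((lng + 180) / 6)) % 60 + 1

def utm_epsg(lat, lng):
    """EPSG code of the WGS 84 / UTM zone containing the point (326zz north, 327zz south)."""
    base = 32600 if lat >= 0 else 32700
    return base + utm_zone(lng)

print(utm_zone(-111.892118), utm_epsg(40.770852, -111.892118))  # 12 32612
```

The result could then be passed to pyproj, for example `pyproj.Transformer.from_crs('EPSG:4326', 'EPSG:{}'.format(utm_epsg(slc_center_lat, slc_center_lng)))`.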


I will use these functions to create a **6x6 kilometer hexagonal grid** of area candidates: every point within 3 kilometers of the center, with neighboring candidates spaced 600 meters apart. While generating the candidate areas, I will also compute each one's distance from the city center.

In [22]:
slc_center_x, slc_center_y = latlng_to_xy(slc_center_lat, slc_center_lng) # City center in Cartesian coordinates

k = math.sqrt(3) / 2 # Row spacing factor: hexagonal grid rows are 600 * sqrt(3)/2 meters apart
x_min = slc_center_x - 3000
x_step = 600
# Equivalent to slc_center_y - 12 * y_step, so row 12 passes exactly through the center
y_min = slc_center_y - 3000 - (int(21/k)*k*600 - 6000)/2
y_step = 600 * k 

latitudes = []
longitudes = []
distances_from_center = []
xs = []
ys = []
for i in range(0, int(21/k)):
    y = y_min + i * y_step
    x_offset = 300 if i%2==0 else 0  # shift every other row by half a step
    for j in range(0, 21):
        x = x_min + j * x_step + x_offset
        distance_from_center = calc_xy_distance(slc_center_x, slc_center_y, x, y)
        if (distance_from_center <= 3001):  # keep points within ~3 km of the center
            lat, lng = xy_to_latlng(x, y)
            latitudes.append(lat)
            longitudes.append(lng)
            distances_from_center.append(distance_from_center)
            xs.append(x)
            ys.append(y)

print(len(latitudes), 'area candidates generated.')

92 area candidates generated.
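The same grid can be expressed as offsets relative to the center, which makes the geometry easier to check. This sketch restates the construction; it is not the notebook's own code. Rows are `spacing * sqrt(3)/2` apart, and every other row (including the center row) is shifted by half a spacing. With spacing 600 and radius 3001 it reproduces the 92 candidates above.

```python
import math

def hex_grid_offsets(spacing, radius):
    """(dx, dy) offsets of a hexagonal grid whose points lie within `radius` of the origin.

    Even rows are shifted by spacing/2, so the origin falls midway between two
    grid points, as in the notebook's grid.
    """
    row_step = spacing * math.sqrt(3) / 2
    max_row = int(radius // row_step) + 1
    max_col = int(radius // spacing) + 1
    offsets = []
    for r in range(-max_row, max_row + 1):
        dy = r * row_step
        shift = spacing / 2 if r % 2 == 0 else 0.0
        for c in range(-max_col - 1, max_col + 1):
            dx = c * spacing + shift
            if math.hypot(dx, dy) <= radius:
                offsets.append((dx, dy))
    return offsets

offsets = hex_grid_offsets(spacing=600, radius=3001)
print(len(offsets))  # 92
```

Absolute coordinates would then be `slc_center_x + dx` and `slc_center_y + dy`.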


I will now use **Nominatim** to obtain the closest address to each candidate area, i.e. reverse geocode each candidate's coordinates.

In [18]:
import os
from geopy.extra.rate_limiter import RateLimiter

if os.path.isfile('locations.pkl'):
    print('.pkl file exists; load from file')
    df_locations = pd.read_pickle('locations.pkl')
else:
    print('.pkl file does not exist; creating DataFrame and getting address\nvia reverse geocode.')
    # Rows are sorted by distance from center, so they no longer follow the
    # order of the generated lists (latitudes, longitudes, xs, ys).
    df_locations = (pd.DataFrame({ 'distance_from_center':distances_from_center, 
                    'latitude':latitudes, 
                    'longitude':longitudes, 
                    'x_coord':xs, 
                    'y_coord':ys })
        .sort_values('distance_from_center')
        .reset_index(drop=True))
    locator = Nominatim(user_agent='MyGeocoder')
    # Nominatim's usage policy allows at most one request per second
    reverse = RateLimiter(locator.reverse, min_delay_seconds=1)

    def get_address(row):
        # Reverse geocode the candidate's (latitude, longitude); None if nothing is found
        location = reverse((row['latitude'], row['longitude']))
        return location.raw['display_name'] if location is not None else None

    df_locations['address'] = df_locations.apply(get_address, axis=1)
    
    df_locations.to_pickle('locations.pkl')

df_locations.head()

.pkl file does not exist; creating DataFrame and getting address
via reverse geocode.


| | distance_from_center | latitude | longitude | x_coord | y_coord | address |
|---|---|---|---|---|---|---|
| 0 | 300.0 | 40.769792 | -111.89496 | -2632419.0 | 5225099.0 | Plaza Hotel at Temple Square, South Temple, Sa... |
| 1 | 300.0 | 40.771912 | -111.889276 | -2631819.0 | 5225099.0 | North Office Building, North Temple, Salt Lake... |
| 2 | 519.615242 | 40.76711 | -111.889703 | -2632119.0 | 5224579.0 | City Creek South, 100 South, Salt Lake City, S... |
| 3 | 519.615242 | 40.774594 | -111.894533 | -2632119.0 | 5225618.0 | 256, West Temple, Marmalade District, Salt Lak... |
| 4 | 793.725393 | 40.76499 | -111.895386 | -2632719.0 | 5224579.0 | 200 S / 148 W, 200 South, Salt Lake City, Salt... |


Let's visualize our candidate areas using folium **(click on an area candidate to see its address)**.

In [19]:
# create map centered on city
slc_map = folium.Map(location=[slc_center_lat, slc_center_lng], zoom_start=12)

# add markers to map
label = 'City center: {}, {}'.format(slc_center_lat, slc_center_lng)
label = folium.Popup(label, parse_html=True)
folium.CircleMarker(
    (slc_center_lat, slc_center_lng),
    radius=5,
    popup=label,
    color='purple',
    fill=True,
    fill_color='#3186cc',
    fill_opacity=0.7,
    parse_html=False).add_to(slc_map)  
    
for lat, lng, address in zip(df_locations.latitude, df_locations.longitude, df_locations.address):
    label = folium.Popup(address, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=3,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(slc_map)  
    
slc_map

### Foursquare
Now that I have my area candidates, I will use the Foursquare API to get restaurant information for each area.

I'm interested in venues in the 'food' category, but only those that are proper restaurants. Coffee shops, pizza places, bakeries, etc. are not direct competitors, so I'll ignore those. I will include only venues whose category name contains 'restaurant' (or 'diner', 'taverna', 'steakhouse'), excluding fast food. I will also make sure to detect and include all the subcategories of the 'Asian restaurant' category, since I need information on Asian restaurants in the neighborhood.

In [30]:
import os

# Read Foursquare credentials from environment variables instead of hard-coding
# (and printing) them; the variable names are illustrative.
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', '')  # your Foursquare ID
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', '')  # your Foursquare Secret
VERSION = '20180604'

print('Foursquare credentials loaded:', bool(CLIENT_ID and CLIENT_SECRET))


In [31]:
# Category IDs corresponding to Asian restaurants were taken from Foursquare web site (https://developer.foursquare.com/docs/resources/categories):

food_category = '4d4b7105d754a06374d81259' # 'Root' category for all food-related venues

asian_restaurant_categories = ['4bf58dd8d48988d142941735', '56aa371be4b08b9a8d573568', '52e81612bcbc57f1066b7a03', 
                                '4bf58dd8d48988d145941735', '52af3a5e3cf9994f4e043bea', '52af3a723cf9994f4e043bec',
                                '52af3a7c3cf9994f4e043bed', '58daa1558bbb0b01f18ec1d3', '52af3a673cf9994f4e043beb',
                                '52af3a903cf9994f4e043bee', '4bf58dd8d48988d1f5931735', '52af3a9f3cf9994f4e043bef',
                                '52af3aaa3cf9994f4e043bf0', '52af3ab53cf9994f4e043bf1', '52af3abe3cf9994f4e043bf2',
                                '52af3ac83cf9994f4e043bf3', '52af3ad23cf9994f4e043bf4', '52af3add3cf9994f4e043bf5',
                                '52af3af23cf9994f4e043bf7', '52af3ae63cf9994f4e043bf6', '52af3afc3cf9994f4e043bf8',
                                '52af3b053cf9994f4e043bf9', '52af3b213cf9994f4e043bfa', '52af3b293cf9994f4e043bfb',
                                '52af3b343cf9994f4e043bfc', '52af3b3b3cf9994f4e043bfd', '52af3b463cf9994f4e043bfe',
                                '52af3b633cf9994f4e043c01', '52af3b513cf9994f4e043bff', '52af3b593cf9994f4e043c00',
                                '52af3b6e3cf9994f4e043c02', '52af3b773cf9994f4e043c03', '52af3b813cf9994f4e043c04',
                                '52af3b893cf9994f4e043c05', '52af3b913cf9994f4e043c06', '52af3b9a3cf9994f4e043c07',
                                '52af3ba23cf9994f4e043c08', '4eb1bd1c3b7b55596b4a748f', '52e81612bcbc57f1066b79fb',
                                '52af0bd33cf9994f4e043bdd', '4deefc054765f83613cdba6f', '52960eda3cf9994f4e043ac9',
                                '52960eda3cf9994f4e043acb', '52960eda3cf9994f4e043aca', '52960eda3cf9994f4e043acc',
                                '52960eda3cf9994f4e043ac7', '52960eda3cf9994f4e043ac8', '52960eda3cf9994f4e043ac5',
                                '52960eda3cf9994f4e043ac6', '4bf58dd8d48988d111941735', '55a59bace4b013909087cb0c',
                                '55a59bace4b013909087cb30', '55a59bace4b013909087cb21', '55a59bace4b013909087cb06',
                                '55a59bace4b013909087cb1b', '55a59bace4b013909087cb1e', '55a59bace4b013909087cb18',
                                '55a59bace4b013909087cb24', '55a59bace4b013909087cb15', '55a59bace4b013909087cb27',
                                '55a59bace4b013909087cb12', '4bf58dd8d48988d1d2941735', '55a59bace4b013909087cb2d',
                                '55a59a31e4b013909087cb00', '55a59af1e4b013909087cb03', '55a59bace4b013909087cb2a',
                                '55a59bace4b013909087cb0f', '55a59bace4b013909087cb33', '55a59bace4b013909087cb09',
                                '55a59bace4b013909087cb36', '4bf58dd8d48988d113941735', '56aa371be4b08b9a8d5734e4',
                                '56aa371be4b08b9a8d5734f0', '56aa371be4b08b9a8d5734e7', '56aa371be4b08b9a8d5734ed',
                                '56aa371be4b08b9a8d5734ea', '4bf58dd8d48988d156941735', '5ae9595eb77c77002c2f9f26',
                                '4eb1d5724b900d56c88a45fe', '4bf58dd8d48988d1d1941735', '56aa371be4b08b9a8d57350e',
                                '4bf58dd8d48988d149941735', '56aa371be4b08b9a8d573502', '52af39fb3cf9994f4e043be9',
                                '4bf58dd8d48988d14a941735']

def is_restaurant(categories, specific_filter=None):
    restaurant_words = ['restaurant', 'diner', 'taverna', 'steakhouse']
    restaurant = False
    specific = False
    for c in categories:
        category_name = c[0].lower()
        category_id = c[1]
        for r in restaurant_words:
            if r in category_name:
                restaurant = True
        if 'fast food' in category_name:
            restaurant = False
        if not(specific_filter is None) and (category_id in specific_filter):
            specific = True
            restaurant = True
    return restaurant, specific

def get_categories(categories):
    return [(cat['name'], cat['id']) for cat in categories]

def get_venues_near_location(lat, lon, category, client_id, client_secret, radius=500, limit=100):
    """Query Foursquare's v2 'explore' endpoint and return a list of
    (id, name, categories, (lat, lng), location, distance) tuples."""
    version = '20180724'
    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
        client_id, client_secret, version, lat, lon, category, radius, limit)
    try:
        results = requests.get(url, timeout=30).json()['response']['groups'][0]['items']
        venues = [(item['venue']['id'],
                   item['venue']['name'],
                   get_categories(item['venue']['categories']),
                   (item['venue']['location']['lat'], item['venue']['location']['lng']),
                   item['venue']['location'],
                   item['venue']['location']['distance']) for item in results]
    except (requests.RequestException, ValueError, KeyError, IndexError):
        # Network error, non-JSON body or unexpected response shape: treat as "no venues"
        venues = []
    return venues

In [32]:
import pickle

def get_restaurants(lats, lons):
    restaurants = {}
    asian_restaurants = {}
    location_restaurants = []

    print('Obtaining venues around candidate locations:', end='')
    for lat, lon in zip(lats, lons):
        # Using radius=350 to make sure we have overlaps/full coverage so we don't miss any restaurant (we're using dictionaries to remove any duplicates resulting from area overlaps)
        venues = get_venues_near_location(lat, lon, food_category, CLIENT_ID, CLIENT_SECRET, radius=350, limit=100)
        area_restaurants = []
        for venue in venues:
            venue_id = venue[0]
            venue_name = venue[1]
            venue_categories = venue[2]
            venue_latlng = venue[3]
            venue_address = venue[4]
            venue_distance = venue[5]
            is_res, is_asian = is_restaurant(venue_categories, specific_filter=asian_restaurant_categories)
            if is_res:
                x, y = latlng_to_xy(venue_latlng[0], venue_latlng[1])
                restaurant = (venue_id, venue_name, venue_latlng[0], venue_latlng[1], venue_address, venue_distance, is_asian, x, y)
                if venue_distance <= 300:
                    area_restaurants.append(restaurant)
                restaurants[venue_id] = restaurant
                if is_asian:
                    asian_restaurants[venue_id] = restaurant
        location_restaurants.append(area_restaurants)
        print(' .', end='')
    print(' done.')
    return restaurants, asian_restaurants, location_restaurants

# Try to load from local file system in case we did this before
restaurants = {}
asian_restaurants = {}
location_restaurants = []
loaded = False
try:
    with open('restaurants_350.pkl', 'rb') as f:
        restaurants = pickle.load(f)
    with open('asian_restaurants_350.pkl', 'rb') as f:
        asian_restaurants = pickle.load(f)
    with open('location_restaurants_350.pkl', 'rb') as f:
        location_restaurants = pickle.load(f)
    print('Restaurant data loaded.')
    loaded = True
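except (OSError, EOFError, pickle.UnpicklingError):
    pass  # no usable cache; fall back to the Foursquare API below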

# If load failed use the Foursquare API to get the data
if not loaded:
    restaurants, asian_restaurants, location_restaurants = get_restaurants(latitudes, longitudes)
    # Let's persist this in the local file system
    with open('restaurants_350.pkl', 'wb') as f:
        pickle.dump(restaurants, f)
    with open('asian_restaurants_350.pkl', 'wb') as f:
        pickle.dump(asian_restaurants, f)
    with open('location_restaurants_350.pkl', 'wb') as f:
        pickle.dump(location_restaurants, f)

Obtaining venues around candidate locations: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . done.
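A short calculation can check the `radius=350` comment in `get_restaurants`. In a hexagonal grid with nearest-neighbor spacing $s$, the location farthest from every grid point is the centroid of an equilateral triangle of neighboring points. Its distance to the nearest grid point is $s/\sqrt{3}$, which is the grid's covering radius:

$$
R_{\text{cover}} = \frac{s}{\sqrt{3}} = \frac{600\ \text{m}}{\sqrt{3}} \approx 346.4\ \text{m} < 350\ \text{m}
$$

So 350 m queries around the candidate centers leave no gaps inside the grid, apart from the outer edge of the 3 km disc. Here $s$ is measured in the projected coordinates used to build the grid.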


In [33]:
import numpy as np

print('Total number of restaurants:', len(restaurants))
print('Total number of Asian restaurants:', len(asian_restaurants))
print('Percentage of Asian restaurants: {:.2f}%'.format(len(asian_restaurants) / len(restaurants) * 100))
print('Average number of restaurants in area:', np.array([len(r) for r in location_restaurants]).mean())

Total number of restaurants: 199
Total number of Asian restaurants: 51
Percentage of Asian restaurants: 25.63%
Average number of restaurants in area: 2.3152173913043477


In [34]:
print('List of all restaurants')
print('-----------------------')
for r in list(restaurants.values())[:10]:
    print(r)
print('...')
print('Total:', len(restaurants))

List of all restaurants
-----------------------
('4b43ed18f964a5202eee25e3', 'La-Cai Noodle House', 40.74820650220776, -111.88797871944917, {'address': '961 S State St', 'lat': 40.74820650220776, 'lng': -111.88797871944917, 'labeledLatLngs': [{'label': 'display', 'lat': 40.74820650220776, 'lng': -111.88797871944917}, {'label': 'entrance', 'lat': 40.748185, 'lng': -111.887966}], 'distance': 277, 'postalCode': '84111', 'cc': 'US', 'city': 'Salt Lake City', 'state': 'UT', 'country': 'United States', 'formattedAddress': ['961 S State St', 'Salt Lake City, UT 84111', 'United States']}, 277, True, -2633010.1897225375, 5222391.224132606)
('54b999ba498e7edbd52267e6', 'Thai Chilli', 40.75032887786386, -111.88887787247548, {'address': '872 S State St', 'lat': 40.75032887786386, 'lng': -111.88887787247548, 'labeledLatLngs': [{'label': 'display', 'lat': 40.75032887786386, 'lng': -111.88887787247548}], 'distance': 318, 'postalCode': '84111', 'cc': 'US', 'neighborhood': 'Downtown Salt Lake City', 'c

In [35]:
print('List of Asian restaurants')
print('---------------------------')
for r in list(asian_restaurants.values())[:10]:
    print(r)
print('...')
print('Total:', len(asian_restaurants))

List of Asian restaurants
---------------------------
('4b43ed18f964a5202eee25e3', 'La-Cai Noodle House', 40.74820650220776, -111.88797871944917, {'address': '961 S State St', 'lat': 40.74820650220776, 'lng': -111.88797871944917, 'labeledLatLngs': [{'label': 'display', 'lat': 40.74820650220776, 'lng': -111.88797871944917}, {'label': 'entrance', 'lat': 40.748185, 'lng': -111.887966}], 'distance': 277, 'postalCode': '84111', 'cc': 'US', 'city': 'Salt Lake City', 'state': 'UT', 'country': 'United States', 'formattedAddress': ['961 S State St', 'Salt Lake City, UT 84111', 'United States']}, 277, True, -2633010.1897225375, 5222391.224132606)
('54b999ba498e7edbd52267e6', 'Thai Chilli', 40.75032887786386, -111.88887787247548, {'address': '872 S State St', 'lat': 40.75032887786386, 'lng': -111.88887787247548, 'labeledLatLngs': [{'label': 'display', 'lat': 40.75032887786386, 'lng': -111.88887787247548}], 'distance': 318, 'postalCode': '84111', 'cc': 'US', 'neighborhood': 'Downtown Salt Lake Cit

In [40]:
len(location_restaurants)

92

In [41]:
print('Restaurants around location')
print('---------------------------')
for i in range(80, 90):
    rs = location_restaurants[i][:8]
    names = ', '.join([r[1] for r in rs])
    print('Restaurants around location {}: {}'.format(i+1, names))

Restaurants around location
---------------------------
Restaurants around location 81: 
Restaurants around location 82: 
Restaurants around location 83: 
Restaurants around location 84: 
Restaurants around location 85: 
Restaurants around location 86: 
Restaurants around location 87: 
Restaurants around location 88: 
Restaurants around location 89: Market Street Grill
Restaurants around location 90: 


In [42]:
restaurants.values()

dict_values([('4b43ed18f964a5202eee25e3', 'La-Cai Noodle House', 40.74820650220776, -111.88797871944917, {'address': '961 S State St', 'lat': 40.74820650220776, 'lng': -111.88797871944917, 'labeledLatLngs': [{'label': 'display', 'lat': 40.74820650220776, 'lng': -111.88797871944917}, {'label': 'entrance', 'lat': 40.748185, 'lng': -111.887966}], 'distance': 277, 'postalCode': '84111', 'cc': 'US', 'city': 'Salt Lake City', 'state': 'UT', 'country': 'United States', 'formattedAddress': ['961 S State St', 'Salt Lake City, UT 84111', 'United States']}, 277, True, -2633010.1897225375, 5222391.224132606), ('54b999ba498e7edbd52267e6', 'Thai Chilli', 40.75032887786386, -111.88887787247548, {'address': '872 S State St', 'lat': 40.75032887786386, 'lng': -111.88887787247548, 'labeledLatLngs': [{'label': 'display', 'lat': 40.75032887786386, 'lng': -111.88887787247548}], 'distance': 318, 'postalCode': '84111', 'cc': 'US', 'neighborhood': 'Downtown Salt Lake City', 'city': 'Salt Lake City', 'state': '

Below is a map of all the restaurants located within our 6x6 kilometer grid. Asian restaurants are shown in red.

In [44]:
slc_map = folium.Map(location=(slc_center_lat, slc_center_lng), zoom_start=12)
folium.Marker((slc_center_lat, slc_center_lng), popup='Salt Lake City center').add_to(slc_map)
for res in restaurants.values():
    name = res[1]
    address = res[4]['address'] if 'address' in res[4] else 'Address not listed'
    popup = folium.Popup('{}\n{}'.format(name, address), parse_html=True)
    lat = res[2]; lng = res[3]
    is_asian = res[6]
    color = 'red' if is_asian else 'blue'
    folium.CircleMarker([lat, lng], popup=popup, radius=3, color=color, fill=True, fill_color=color, fill_opacity=1).add_to(slc_map)
slc_map

Now that I have all of the restaurants within the 6x6 kilometer grid, with Asian restaurants marked in red, I can begin analyzing the candidate areas against my criteria to find the best areas to open an Asian fusion restaurant.

## Analysis <a name="analysis"></a>

Now that I have my data, I'm going to perform some basic analysis to get to know my candidate areas better.

In [50]:
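# df_locations is sorted by distance from center, while location_restaurants follows
# the order in which the grid was generated (xs, ys), so match them by coordinates.
area_index = {(x, y): i for i, (x, y) in enumerate(zip(xs, ys))}
location_restaurants_count = [len(location_restaurants[area_index[(x, y)]])
                              for x, y in zip(df_locations.x_coord, df_locations.y_coord)]

df_locations['Restaurants in area'] = location_restaurants_count

print('Average number of restaurants in every area with radius=300m:', np.array(location_restaurants_count).mean())

df_locations.head(10)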

Average number of restaurants in every area with radius=300m: 2.3152173913043477


| | distance_from_center | latitude | longitude | x_coord | y_coord | address | Restaurants in area |
|---|---|---|---|---|---|---|---|
| 0 | 300.0 | 40.769792 | -111.89496 | -2632419.0 | 5225099.0 | Plaza Hotel at Temple Square, South Temple, Sa... | 2 |
| 1 | 300.0 | 40.771912 | -111.889276 | -2631819.0 | 5225099.0 | North Office Building, North Temple, Salt Lake... | 5 |
| 2 | 519.615242 | 40.76711 | -111.889703 | -2632119.0 | 5224579.0 | City Creek South, 100 South, Salt Lake City, S... | 5 |
| 3 | 519.615242 | 40.774594 | -111.894533 | -2632119.0 | 5225618.0 | 256, West Temple, Marmalade District, Salt Lak... | 1 |
| 4 | 793.725393 | 40.76499 | -111.895386 | -2632719.0 | 5224579.0 | 200 S / 148 W, 200 South, Salt Lake City, Salt... | 3 |
| 5 | 793.725393 | 40.76923 | -111.88402 | -2631519.0 | 5224579.0 | 230, South Temple, Salt Lake City, Salt Lake C... | 1 |
| 6 | 793.725393 | 40.776714 | -111.88885 | -2631519.0 | 5225618.0 | Marmalade District, Salt Lake City, Salt Lake ... | 6 |
| 7 | 793.725393 | 40.772473 | -111.900216 | -2632719.0 | 5225618.0 | Skyhouse, North Temple, Salt Lake City, Salt L... | 4 |
| 8 | 900.0 | 40.767672 | -111.900642 | -2633019.0 | 5225099.0 | Salt Lake City, Salt Lake County, Utah, United... | 4 |
| 9 | 900.0 | 40.774032 | -111.883592 | -2631219.0 | 5225099.0 | 203, 4th Avenue, Salt Lake City, Salt Lake Cou... | 0 |


The next code block calculates the distance from every area candidate center to the nearest Asian restaurant, not just to those within 300 meters.

In [51]:
distances_to_asian_restaurant = []

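# Iterate over df_locations' own coordinates so each distance lines up with its row
for area_x, area_y in zip(df_locations.x_coord, df_locations.y_coord):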
    min_distance = 10000
    for res in asian_restaurants.values():
        res_x = res[7]
        res_y = res[8]
        d = calc_xy_distance(area_x, area_y, res_x, res_y)
        if d<min_distance:
            min_distance = d
    distances_to_asian_restaurant.append(min_distance)

df_locations['Distance to Asian restaurant'] = distances_to_asian_restaurant
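The nested loop above is a brute-force nearest-neighbor search. The same computation can be written more compactly with `math.dist` (Python 3.8+) and `min(..., default=...)`. For many more points, a spatial index such as SciPy's `cKDTree` would be the usual choice. This is a sketch; the function name is hypothetical.

```python
import math

def nearest_distances(area_points, target_points):
    """Distance from each area point to its closest target point (same planar units).

    Brute force, O(len(area_points) * len(target_points)). Returns math.inf for
    every area when there are no targets. `target_points` must be re-iterable (a list).
    """
    return [min((math.dist(a, t) for t in target_points), default=math.inf)
            for a in area_points]

print(nearest_distances([(0, 0), (10, 0)], [(3, 4), (10, 1)]))  # [5.0, 1.0]
```

Applied to this notebook, the call might be `nearest_distances(zip(df_locations.x_coord, df_locations.y_coord), [(r[7], r[8]) for r in asian_restaurants.values()])`. Because it iterates over the DataFrame's own coordinates, the result stays aligned with its rows.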

In [53]:
df_locations.head()

| | distance_from_center | latitude | longitude | x_coord | y_coord | address | Restaurants in area | Distance to Asian restaurant |
|---|---|---|---|---|---|---|---|---|
| 0 | 300.0 | 40.769792 | -111.89496 | -2632419.0 | 5225099.0 | Plaza Hotel at Temple Square, South Temple, Sa... | 2 | 148.879674 |
| 1 | 300.0 | 40.771912 | -111.889276 | -2631819.0 | 5225099.0 | North Office Building, North Temple, Salt Lake... | 5 | 210.412301 |
| 2 | 519.615242 | 40.76711 | -111.889703 | -2632119.0 | 5224579.0 | City Creek South, 100 South, Salt Lake City, S... | 5 | 333.550087 |
| 3 | 519.615242 | 40.774594 | -111.894533 | -2632119.0 | 5225618.0 | 256, West Temple, Marmalade District, Salt Lak... | 1 | 458.066032 |
| 4 | 793.725393 | 40.76499 | -111.895386 | -2632719.0 | 5224579.0 | 200 S / 148 W, 200 South, Salt Lake City, Salt... | 3 | 205.159264 |


In [52]:
print('Average distance to closest Asian restaurant from each area center: {} meters'.format(df_locations['Distance to Asian restaurant'].mean()))

Average distance to closest Asian restaurant from each area center: 723.075412180795 meters


On average, the nearest Asian restaurant is about 723 meters from an area center; we can use this information to further filter our candidate areas.

Next, I will create a heatmap so that we can visualize the density of restaurants in the surrounding area.

In [66]:
def map_restaurants(restaurant_latlongs):
    from folium import plugins
    from folium.plugins import HeatMap

    map_slc = folium.Map(location=(slc_center_lat, slc_center_lng), zoom_start=13)
    folium.TileLayer('cartodbpositron').add_to(map_slc)
    HeatMap(restaurant_latlongs).add_to(map_slc)
    folium.Marker((slc_center_lat, slc_center_lng)).add_to(map_slc)
    folium.Circle((slc_center_lat, slc_center_lng), radius=1000, fill=False, color='white').add_to(map_slc)
    folium.Circle((slc_center_lat, slc_center_lng), radius=2000, fill=False, color='white').add_to(map_slc)
    folium.Circle((slc_center_lat, slc_center_lng), radius=3000, fill=False, color='white').add_to(map_slc)
    return map_slc

restaurant_latlongs = [[res[2], res[3]] for res in restaurants.values()]

map_restaurants(restaurant_latlongs)

It looks like the highest concentration of restaurants is just south of the city center. We'll want to avoid this area and perhaps look to build to the north, southeast, or southwest of the city center.
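The heatmap can be backed with numbers by counting restaurants in the same 1 km rings drawn on the map. This sketch uses the projected (x, y) coordinates already stored at positions 7 and 8 of each restaurant tuple. Ring width and ring count are parameters, and the function name is hypothetical. Distances are in projected units, so they are not exactly the ground meters used by the circles on the map.

```python
import math

def count_by_ring(points, center, ring_width=1000, n_rings=3):
    """Count points in concentric rings [0, w), [w, 2w), ... around `center`.

    Points at or beyond n_rings * ring_width are ignored. Coordinates are planar (x, y).
    """
    counts = [0] * n_rings
    for p in points:
        ring = int(math.dist(p, center) // ring_width)
        if ring < n_rings:
            counts[ring] += 1
    return counts

print(count_by_ring([(0, 500), (1500, 0), (0, -1999), (2500, 0), (3500, 0)], (0, 0)))  # [1, 2, 1]
```

In the notebook, the call would be `count_by_ring([(r[7], r[8]) for r in restaurants.values()], (slc_center_x, slc_center_y))`.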

Let's now look at a heatmap of Asian restaurants to further narrow the scope of candidate areas.

In [67]:
asian_restaurant_latlongs = [[res[2], res[3]] for res in asian_restaurants.values()]
map_restaurants(asian_restaurant_latlongs)

From this map you can see concentrations of Asian restaurants immediately south and southwest of the city center. The areas to the north and southeast look like promising locations.

### The Avenues
There are few restaurants north of the city center because of a part of Salt Lake City called the Avenues, which is heavily saturated with residential real estate. SaltLakeCity.com/neighborhoods/the-avenues describes this part of town as highly attractive and unique, with access to the greatest attractions in Salt Lake. As a result, it would be very difficult to buy or rent the real estate needed to open a restaurant, which makes this area highly impractical.

### The South-east Side
Unlike the Avenues, the east side of town is less densely populated. Most of its businesses are either not restaurants or are coffee shops, which would limit the competition for a potential restaurant. This makes it an ideal and more practical place to build a restaurant.

### The South-west Side
https://www.neighborhoodscout.com/ut/salt-lake-city/crime shows increased crime on the west side of Salt Lake City. Obtaining real estate in this area would probably be easiest, but the crime rates make the business prospect less desirable.

Based on this information, I am going to narrow my scope of candidate locations to the southeast side of Salt Lake City.

In [81]:
east_side_restaurants = [(res[2], res[3]) for res in restaurants.values() if res[2] <= slc_center_lat and res[3] >= slc_center_lng]

map_restaurants(east_side_restaurants)
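The quadrant filter above relies on a sign convention that is easy to get backwards: in the western hemisphere longitudes are negative, so a point east of city center has a larger (less negative) longitude, while a point south of center has a smaller latitude. A minimal sketch of that test as a reusable helper is below; `in_south_east` and `filter_south_east` are hypothetical names, and the center coordinates are an illustrative approximation of downtown Salt Lake City, not the notebook's `slc_center_lat`/`slc_center_lng` values.

```python
from typing import Iterable, List, Tuple

LatLng = Tuple[float, float]


def in_south_east(point: LatLng, center: LatLng) -> bool:
    """Return True if `point` lies in the south-east quadrant of `center`.

    South of center -> latitude <= center latitude.
    East of center  -> longitude >= center longitude (western-hemisphere
    longitudes are negative, so "east" means less negative).
    """
    lat, lng = point
    center_lat, center_lng = center
    return lat <= center_lat and lng >= center_lng


def filter_south_east(points: Iterable[LatLng], center: LatLng) -> List[LatLng]:
    """Keep only the points in the south-east quadrant of `center`."""
    return [p for p in points if in_south_east(p, center)]


# Illustrative center, roughly downtown Salt Lake City (an assumption).
center = (40.7608, -111.8910)
sample = [(40.75, -111.88), (40.77, -111.88), (40.75, -111.90)]
print(filter_south_east(sample, center))  # [(40.75, -111.88)]
```

Only the first sample point is both south (40.75 <= 40.7608) and east (-111.88 >= -111.891) of the center; the second is north and the third is west.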

Finally, let's look at the density of Asian restaurants on the south-east side.

In [82]:
east_side_asian_restaurants = [(res[2], res[3]) for res in asian_restaurants.values() if res[2] <= slc_center_lat and res[3] >= slc_center_lng]

map_restaurants(east_side_asian_restaurants)

Not bad. Let's apply the same south-east filter to our DataFrame of candidate locations.

In [106]:
df_roi_locations = df_locations[(df_locations.latitude <= slc_center_lat) & (df_locations.longitude >= slc_center_lng)]

df_roi_locations.head(10)

| | distance_from_center | latitude | longitude | x_coord | y_coord | address | Restaurants in area | Distance to Asian restaurant |
|---:|---:|---:|---:|---:|---:|:---|---:|---:|
| 2 | 519.615242 | 40.76711 | -111.889703 | -2632119.0 | 5224579.0 | City Creek South, 100 South, Salt Lake City, S... | 5 | 333.550087 |
| 5 | 793.725393 | 40.76923 | -111.88402 | -2631519.0 | 5224579.0 | 230, South Temple, Salt Lake City, Salt Lake C... | 1 | 610.563503 |
| 12 | 1081.665383 | 40.764428 | -111.884447 | -2631819.0 | 5224060.0 | Bell Plaza, 250, 200 South, Salt Lake City, Sa... | 0 | 850.836246 |
| 13 | 1081.665383 | 40.762308 | -111.89013 | -2632419.0 | 5224060.0 | Judge Building, 300 South, Salt Lake City, Sal... | 2 | 940.883364 |
| 18 | 1374.772708 | 40.766547 | -111.878764 | -2631219.0 | 5224060.0 | 440, 100 South, Salt Lake City, Salt Lake Coun... | 9 | 248.730571 |
| 25 | 1558.845727 | 40.759626 | -111.884875 | -2632119.0 | 5223540.0 | Salt Lake City Public Library, 210, 400 South,... | 4 | 100.99323 |
| 26 | 1670.329309 | 40.761745 | -111.879192 | -2631519.0 | 5223540.0 | 345, 400 East, Central City / Liberty-Wells, S... | 31 | 51.974963 |
| 28 | 1670.329309 | 40.757507 | -111.890558 | -2632719.0 | 5223540.0 | Garden Cafe, Main Street, Central Ninth, Salt ... | 6 | 140.40221 |
| 31 | 1824.828759 | 40.768666 | -111.87308 | -2630619.0 | 5224060.0 | 630, South Temple, The Avenues, Salt Lake City... | 0 | 847.416968 |
| 40 | 1967.231557 | 40.763864 | -111.873508 | -2630919.0 | 5223540.0 | 243, 600 East, Central City, Salt Lake City, S... | 0 | 819.841151 |


Let's see if we can filter a little more. We'll keep locations with no more than two restaurants within a radius of 250 meters and no Asian restaurants within a radius of 400 meters.

In [108]:
# Locations with at most two restaurants nearby
good_res_count = np.array(df_roi_locations['Restaurants in area'] <= 2)
print('Locations with no more than two restaurants nearby:', good_res_count.sum())

# Locations whose nearest Asian restaurant is at least 400 m away
good_asian_distance = np.array(df_roi_locations['Distance to Asian restaurant'] >= 400)
print('Locations with no Asian restaurants within 400m:', good_asian_distance.sum())

good_locations = np.logical_and(good_res_count, good_asian_distance)
print('Locations with both conditions met:', good_locations.sum())

df_good_locations = df_roi_locations[good_locations]

Locations with no more than two restaurants nearby: 17
Locations with no Asian restaurants within 400m: 17
Locations with both conditions met: 15
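The two NumPy masks combined with `np.logical_and` can also be written as a single pandas boolean index, which keeps the row labels aligned with the DataFrame. The sketch below applies the same thresholds (at most two restaurants nearby, nearest Asian restaurant at least 400 m away) to the ten rows shown in the `df_roi_locations.head(10)` output above; the small `df_roi_sample` DataFrame is copied by hand from that output purely for illustration.

```python
import pandas as pd

# The two filter columns of the ten rows shown by df_roi_locations.head(10),
# copied by hand so the example runs on its own.
df_roi_sample = pd.DataFrame(
    {
        'Restaurants in area': [5, 1, 0, 2, 9, 4, 31, 6, 0, 0],
        'Distance to Asian restaurant': [
            333.550087, 610.563503, 850.836246, 940.883364, 248.730571,
            100.99323, 51.974963, 140.40221, 847.416968, 819.841151,
        ],
    },
    index=[2, 5, 12, 13, 18, 25, 26, 28, 31, 40],
)

# Parenthesize each comparison: & binds more tightly than <= and >=.
mask = (df_roi_sample['Restaurants in area'] <= 2) & (
    df_roi_sample['Distance to Asian restaurant'] >= 400
)
df_good_sample = df_roi_sample[mask]
print(df_good_sample.index.tolist())  # [5, 12, 13, 31, 40]
```

Of the ten rows shown, five pass both conditions; the full `df_roi_locations` contains more rows, which is why the notebook reports 15.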


Let's map this out!

In [115]:
good_latitudes = df_good_locations['latitude'].values
good_longitudes = df_good_locations['longitude'].values

good_locations = [[lat, lon] for lat, lon in zip(good_latitudes, good_longitudes)]

map_slc = folium.Map(location=(slc_center_lat, slc_center_lng), zoom_start=14)
folium.TileLayer('cartodbpositron').add_to(map_slc)
HeatMap(east_side_restaurants).add_to(map_slc)
folium.Circle(roi_center, radius=2500, color='white', fill=True, fill_opacity=0.6).add_to(map_slc)
folium.Marker((slc_center_lat, slc_center_lng)).add_to(map_slc)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_slc) 
map_slc


Looks good! Our analysis is coming together, with plenty of viable candidate areas for a new restaurant. Let's plot the good locations as a heat map.

In [117]:
map_slc = folium.Map(location=(slc_center_lat, slc_center_lng), zoom_start=14)
HeatMap(good_locations, radius=25).add_to(map_slc)
folium.Marker((slc_center_lat, slc_center_lng)).add_to(map_slc)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_slc)
    
map_slc

Looks great. Now we have a clear indication of where our viable zones are.

Let's cluster these candidate locations to find the centers of zones with good locations.

In [165]:
from sklearn.cluster import KMeans

number_of_clusters = 5

good_xys = df_good_locations[['x_coord', 'y_coord']].values
kmeans = KMeans(n_clusters=number_of_clusters, random_state=0).fit(good_xys)

cluster_centers = [xy_to_latlng(cc[0], cc[1]) for cc in kmeans.cluster_centers_]

map_slc = folium.Map(location=(slc_center_lat, slc_center_lng), zoom_start=14)
folium.TileLayer('cartodbpositron').add_to(map_slc)
HeatMap(east_side_restaurants).add_to(map_slc)
folium.Circle(roi_center, radius=2500, color='white', fill=True, fill_opacity=0.4).add_to(map_slc)
folium.Marker((slc_center_lat, slc_center_lng)).add_to(map_slc)
for lat, lon in cluster_centers:
    folium.Circle([lat, lon], radius=500, color='green', fill=True, fill_opacity=0.25).add_to(map_slc) 
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_slc)
    
map_slc
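`KMeans` repeatedly assigns each candidate point to its nearest center and then moves every center to the mean of its assigned points (Lloyd's algorithm). The pure-Python sketch below shows that loop so it is clear what the cluster centers represent. It is not scikit-learn's implementation, which uses k-means++ initialization by default; `kmeans_lloyd` and the sample x/y points are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def kmeans_lloyd(points: Sequence[Point], initial_centers: Sequence[Point],
                 max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd's k-means on 2-D points; returns (centers, labels)."""
    centers = list(initial_centers)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: label each point with the index of its nearest center.
        labels = [
            min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points
        ]
        # Update step: move each center to the mean of its assigned points.
        new_centers = []
        for k, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                xs, ys = zip(*members)
                new_centers.append((sum(xs) / len(xs), sum(ys) / len(ys)))
            else:
                new_centers.append(old)  # keep an empty cluster's center in place
        if new_centers == centers:  # converged: no center moved
            break
        centers = new_centers
    return centers, labels


# Two well-separated groups of projected x/y coordinates in meters (made up).
pts = [(0, 0), (0, 100), (100, 0), (1000, 1000), (1000, 1100), (1100, 1000)]
centers, labels = kmeans_lloyd(pts, initial_centers=[(0, 0), (1000, 1000)])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Each returned center is the mean of its cluster's points, which is why the green circles on the map sit in the middle of groups of candidate locations rather than on any single candidate.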

The cluster centers represent the centers of the zones richest in candidate locations. Let's make this clearer by viewing these locations on a city map instead of a heat map.

In [166]:
map_slc = folium.Map(location=(slc_center_lat, slc_center_lng), zoom_start=14)
folium.Marker((slc_center_lat, slc_center_lng)).add_to(map_slc)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.Circle([lat, lon], radius=250, color='#00000000', fill=True, fill_color='#0066ff', fill_opacity=0.07).add_to(map_slc)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_slc)
for lat, lon in cluster_centers:
    folium.CircleMarker([lat, lon], radius=75, color='#33cc33', fill=False).add_to(map_slc) 
map_slc

Let's reverse geocode our cluster centers to obtain addresses for these areas.

In [None]:
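from geopy.extra.rate_limiter import RateLimiter

locator = Nominatim(user_agent='MyGeocoder')
# Nominatim's usage policy allows at most one request per second
reverse = RateLimiter(locator.reverse, min_delay_seconds=1)

def get_address(row):
    coordinates = row['latitude'], row['longitude']
    location = reverse(coordinates)
    return location.raw['display_name'] if location else None

df_locations['address'] = df_locations.apply(get_address, axis=1)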

In [169]:
candidate_area_addresses = []
locator = Nominatim(user_agent='MyGeocoder')  # create the geocoder once, outside the loop
print('==============================================================')
print('Addresses of centers of areas recommended for further analysis')
print('==============================================================\n')
for lat, lon in cluster_centers:
    location = locator.reverse((lat, lon))
    addr = location.raw['display_name']
    candidate_area_addresses.append(addr)
    x, y = latlng_to_xy(lat, lon)
    d = calc_xy_distance(x, y, slc_center_x, slc_center_y)
    print('{}{} => {:.1f}km from Salt Lake City center'.format(addr, ' '*(50-len(addr)), d/1000))

Addresses of centers of areas recommended for further analysis

155, 200 South, Salt Lake City, Salt Lake County, Utah, 84111, United States => 2631.9km from Salt Lake City center
354, 800 East, 9th & 9th, Salt Lake City, Salt Lake County, Utah, 84102, United States => 2630.6km from Salt Lake City center
The Spot, Main Street, Central Ninth, Salt Lake City, Salt Lake County, Utah, 84111, United States => 2633.2km from Salt Lake City center
668, 400 East, Central City / Liberty-Wells, Salt Lake City, Salt Lake County, Utah, 84111, United States => 2632.0km from Salt Lake City center
37, 800 East, 9th & 9th, Salt Lake City, Salt Lake County, Utah, 84102, United States => 2630.2km from Salt Lake City center
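The distances printed above (over 2,600 km) cannot be right: every address is in a downtown Salt Lake City neighborhood, a few kilometers at most from city center. The printed values are close to the magnitude of the `x_coord` column (about 2.63 million m), which suggests that `latlng_to_xy` and `slc_center_x`/`slc_center_y` are not in the same coordinate frame, although those helpers are defined outside this section. A projection-free cross-check is the haversine great-circle distance, sketched below assuming a spherical Earth with a mean radius of 6,371 km; `haversine_m` is a hypothetical helper, not part of the notebook.

```python
import math


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float,
                radius_m: float = 6_371_000.0) -> float:
    """Great-circle distance in meters between two (lat, lon) points given
    in degrees, assuming a spherical Earth of the given radius."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


# In the notebook loop this could replace the x/y distance, e.g.:
#   d = haversine_m(lat, lon, slc_center_lat, slc_center_lng)
# One degree of longitude along the equator is about 111.2 km on this sphere.
print(round(haversine_m(0.0, 0.0, 0.0, 1.0) / 1000, 1))  # 111.2
```

Because it works directly on latitude and longitude, this check does not depend on how the x/y projection was set up.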


## Results and Discussion <a name="results"></a>

Most of the restaurants in Salt Lake City are located about 1,000 meters south of city center. Few restaurants are located to the north because of the many landmark sites there and the high density of residential real estate. West of city center, crime rates are higher, which suggests less tourism and foot traffic. South-east of city center, there are plenty of locations with a lower density of restaurants that are far from potentially competing Asian restaurants, making it the most suitable area for an Asian fusion restaurant.

Narrowing in on the south-east side of the city, we clustered the candidate locations into zones of interest containing the greatest number of candidate locations. We then obtained the addresses of the zone centers through reverse geocoding, giving us clear reference points for these desirable locations.

The final result was five zones, each containing a large number of strong candidate locations for an Asian fusion restaurant. This analysis is not final, because other factors will likely affect the choice of an optimal location; however, it helps interested stakeholders narrow their search to areas where the restaurant would face low competition and hence a lower barrier to entry into the local economy. There may be reasons why these areas have few restaurants, and those reasons could make them unsuitable for a new restaurant, so this analysis is only a starting point for a larger, more detailed study.

## Conclusion <a name="conclusion"></a>

The purpose of this project was to identify desirable candidate areas in Salt Lake City, Utah, that could be suitable locations for a new Asian fusion restaurant. The scope of the analysis included areas with a low density of restaurants that are distant from other Asian restaurants. The limitations of this analysis are as follows: limited information on real estate and property availability, limited population density information, and only a brief look at crime statistics. Further analysis should be conducted in case stakeholders need detailed crime statistics.

Due to these limitations, this analysis should not be used to make a final decision. Stakeholders should use it only as a starting point to narrow the search for a viable location. Likewise, the claims used to exclude candidate areas from the final analysis should be validated through further study to confirm that those areas should be ruled out.