In [1]:
# The code was removed by Watson Studio for sharing.

# Data Science Capstone - Battle of the Neighborhoods

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)

## Introduction: Business Problem <a name="introduction"></a>

Our client has asked us to analyze possible locations for them to open a <b>Thai Food Restaurant</b> in downtown <b>Dallas, TX</b>.<br>
We will identify locations by examining which have the lowest restaurant density, and focus our search around the center of the city.

## Data <a name="data"></a>

Dallas is divided into several "Macro Neighborhoods" that each span a very large area. These are not very helpful for our purposes, so we will divide Dallas into equally sized geometric sections instead.<br>
We will map all Thai food restaurants in these sections to eliminate any areas already saturated with this offering.<br>
We will then rank the remaining areas by the number of Asian restaurants and the total number of restaurants.

In [3]:
!pip install folium

import requests
from bs4 import BeautifulSoup as Soup
import pandas as pd
import numpy as np
import folium
import json
import pickle



In [5]:
# David Blackman has a repository of GeoJSON files on GitHub found here:
# https://github.com/blackmad/neighborhoods

# The code attempts to pull this data from GitHub.
# However, I have saved the Dallas GeoJSON data as a dictionary (GEO_JSON_DICT) to this notebook.
# If the URL does not work or the file can't be found, the code uses the backup data instead.

url = 'https://raw.githubusercontent.com/blackmad/neighborhoods/master/dallas.geojson'
try:
    j = requests.get(url).json()
    print('GeoJSON Data Pulled Successfully')
except ValueError:
    # The response body was not valid JSON (requests' JSONDecodeError subclasses ValueError)
    print('ERROR! - Could not decode JSON file from the given URL.  Using the JSON data saved to this notebook instead.')
    j = json.loads(json.dumps(GEO_JSON_DICT))
except requests.exceptions.RequestException:
    # Covers MissingSchema, ConnectionError, and other request failures
    print('ERROR! - Could not find the given URL.  Using the JSON data saved to this notebook instead.')
    j = json.loads(json.dumps(GEO_JSON_DICT))




GeoJSON Data Pulled Successfully


Let's view the Macro Neighborhoods around Dallas.
We'll focus our search on the area within 5 km of the city center.

In [6]:
dallas_center = [32.780154, -96.799074]
map_dallas = folium.Map(location=dallas_center, zoom_start=11)
t = folium.features.GeoJsonTooltip(fields=['name'],labels=False)
folium.GeoJson(j, name="geojson", tooltip=t).add_to(map_dallas)
folium.Marker(dallas_center, popup="Dallas City Center", tooltip="Dallas City Center").add_to(map_dallas)
folium.Circle(dallas_center, radius=5000, color='purple', fill=False).add_to(map_dallas)  # 5 km search radius
map_dallas

We can see how large an area each "Macro Neighborhood" covers, along with the circle where we will focus our search.

### Organizing our search area into an equally spaced grid
We'll use pyproj to convert from latitude & longitude coordinates to Cartesian XY coordinates, and vice versa.<br>
We need to use the proper <a href='https://en.wikipedia.org/wiki/Universal_Transverse_Mercator_coordinate_system'> UTM Zone for Dallas</a> (14 North) when using XY coordinates.<br>
The standard for latitude and longitude is <a href='https://gisgeography.com/wgs84-world-geodetic-system/'> WGS84 (World Geodetic System)</a>.<br><br>
We need to create a pyproj Transformer using EPSG codes:
-    WGS84 Code: "EPSG:4326"
-    UTM 14 North Code: "EPSG:32614"
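The zone and EPSG code follow from a simple rule: standard UTM zones are 6° of longitude wide and numbered eastward from 180°W, and the WGS84 / UTM EPSG codes are 326xx in the northern hemisphere and 327xx in the southern one. For Dallas this gives zone 14 North, EPSG:32614. Below is a minimal sketch of that rule. The helper name `utm_epsg_for` is hypothetical, and the sketch ignores the special-case zones around Norway and Svalbard.

```python
import math

def utm_epsg_for(lat, lon):
    """Return the WGS84 / UTM EPSG code for a point (standard 6-degree zones only)."""
    # Zones are 6 degrees wide, numbered 1..60 eastward from 180 degrees W
    zone = int(math.floor((lon + 180) / 6)) + 1
    zone = min(zone, 60)  # lon == 180 would otherwise give zone 61
    # 326xx = northern hemisphere, 327xx = southern hemisphere
    base = 32600 if lat >= 0 else 32700
    return base + zone

# Dallas City Center
print(utm_epsg_for(32.780154, -96.799074))  # 32614
```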


In [6]:
# !pip install shapely
import shapely.geometry

# !pip install pyproj
import pyproj

import math

# WGS84 (lat/lon in degrees) and UTM zone 14 North (meters)
crs_wgs84 = "EPSG:4326"
crs_utm = "EPSG:32614"

# Build each transformer once and reuse it; EPSG:4326 uses (lat, lon) axis order
to_utm = pyproj.Transformer.from_crs(crs_wgs84, crs_utm)
to_wgs84 = pyproj.Transformer.from_crs(crs_utm, crs_wgs84)

def latlon_to_xy(lat, lon):
    x, y = to_utm.transform(lat, lon)
    return x, y

def xy_to_latlon(x, y):
    lat, lon = to_wgs84.transform(x, y)
    return lat, lon

def calc_xy_distance(x1, y1, x2, y2):
    # Euclidean distance in meters between two UTM points
    return math.hypot(x2 - x1, y2 - y1)

print("Let's Check Our Conversions Between WGS84 and UTM:\n")
print("Dallas City Center\n")
print("Latitude={}, Longitude={}".format(dallas_center[0], dallas_center[1]))
x, y = latlon_to_xy(dallas_center[0], dallas_center[1])
print("Converted to UTM: X={}, Y={}".format(x,y))
Lat, Lon = xy_to_latlon(x,y)
print("Converted Back to WGS84: Latitude={}, Longitude={}".format(Lat,Lon))


Let's Check Our Conversions Between WGS84 and UTM:

Dallas City Center

Latitude=32.780154, Longitude=-96.799074
Converted to UTM: X=706130.8586394618, Y=3629059.3883527005
Converted Back to WGS84: Latitude=32.780154, Longitude=-96.79907400000002


To create a hexagonal grid, we offset alternating rows horizontally by half a step and decrease the vertical spacing, so each circle center is the same distance from each of its neighbors.
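The vertical spacing comes from the geometry of an equilateral triangle: if neighboring centers in a row are `s` meters apart and alternate rows are shifted by `s/2`, then a row spacing of `s * √3/2` puts the diagonal neighbors exactly `s` away too, because (s/2)² + (s·√3/2)² = s². The small sketch below checks this for the 500 m spacing used in the next cell; the 5 x 5 grid size is arbitrary.

```python
import math

s = 500                      # distance between neighboring centers in a row (m)
k = math.sqrt(3) / 2         # row-spacing factor
row_step = s * k             # about 433.01 m between rows

# Build a small 5 x 5 hexagonal grid: even rows are shifted by s/2
points = []
for i in range(5):
    x_offset = s / 2 if i % 2 == 0 else 0
    for h in range(5):
        points.append((h * s + x_offset, i * row_step))

# Every point's nearest neighbor should be exactly s away
nearest = []
for p in points:
    d = min(math.dist(p, q) for q in points if q != p)
    nearest.append(d)

print(round(row_step, 2))                        # 433.01
print(all(math.isclose(d, s) for d in nearest))  # True
```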

In [8]:
# Convert City Center to Cartesian Coordinates
dallas_x, dallas_y = latlon_to_xy(dallas_center[0], dallas_center[1])


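k = math.sqrt(3) / 2  # Row-spacing factor for a hexagonal grid
x_min = dallas_x - 5000  # Start 5 km west of the center
x_step = 500  # 500 m between circle centers in a row
y_min = dallas_y - 5000 + (int(21/k)*k*500 - 10000)/2
y_step = 500 * k  # About 433 m between rows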


latitudes = []
longitudes = []
distances_from_center = []
xs = []
ys = []
for i in range(0, int(21/k)):
    y = y_min + i * y_step
    x_offset = 250 if i%2==0 else 0
    for h in range(0, 21):
        x = x_min + h * x_step + x_offset
        distance_from_center = calc_xy_distance(dallas_x, dallas_y, x, y)
        if (distance_from_center <= 5001):
            lat, lon = xy_to_latlon(x, y)
            latitudes.append(lat)
            longitudes.append(lon)
            distances_from_center.append(distance_from_center)
            xs.append(x)
            ys.append(y)

print(len(latitudes), 'possible locations generated.')

365 possible locations generated.


Now we can see what the grid looks like on our map.

In [11]:
map_dallas = folium.Map(location=dallas_center, zoom_start=11)
folium.GeoJson(j, name="geojson", tooltip=t).add_to(map_dallas)
folium.Marker(dallas_center, popup="Dallas City Center", tooltip="Dallas City Center").add_to(map_dallas)
folium.Circle(dallas_center, radius=5000, color='purple', fill=False).add_to(map_dallas)
for lat, lon in zip(latitudes, longitudes):
    folium.Circle([lat,lon], radius=250, color='purple', fill=False).add_to(map_dallas)
map_dallas

In [12]:
df_locations = pd.DataFrame({'Latitude': latitudes,
                            'Longitude': longitudes,
                            'X': xs,
                            'Y': ys,
                            'Distance from Center': distances_from_center})

# df_locations.head()
# project.save_data("locations.csv", df_locations.to_csv(index=False))

Our possible locations are now saved to a pandas DataFrame.<br>
Next, we'll use Foursquare data to find the restaurants near each location.

### Foursquare Data

The current <a href='https://developer.foursquare.com/docs/build-with-foursquare/categories/'> Foursquare API</a> has a generic category code for all food venues. This includes coffee shops, juice bars, and other venues we wouldn't consider direct competition for this restaurant.<br>
Below we define the specific code for Thai restaurants and the code for all Asian cuisine restaurants.<br>
We then list the food venue categories to exclude from our competitor count.

In [9]:
# @hiddencell

CLIENT_ID = 'YOUR_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_CLIENT_SECRET' # your Foursquare Secret
ACCESS_TOKEN = 'YOUR_ACCESS_TOKEN' # your Foursquare Access Token

In [10]:
food_code = '4d4b7105d754a06374d81259' #base code for all food venues
asian_code = '4bf58dd8d48988d142941735' #base code for all Asian Cuisine Venues
thai_code = '4bf58dd8d48988d149941735' #code for Thai Restaurant Venues

#Gather codes for food venues we won't consider as competition for a Thai Restaurant
ignore_dict = {'bagel_code' : '4bf58dd8d48988d179941735',
             'bakery_code' : '4bf58dd8d48988d16a941735',
             'breakfast_code' : '4bf58dd8d48988d143941735',
             'bubble_code' : '52e81612bcbc57f1066b7a0c',
             'cafeteria_code' : '4bf58dd8d48988d128941735',
             'coffee_code' : '4bf58dd8d48988d1e0931735',
             'dessert_code' : '4bf58dd8d48988d1d0941735',
             'donut_code' : '4bf58dd8d48988d148941735',
             'food_stand' : '56aa371be4b08b9a8d57350b',
             'juice_code' : '4bf58dd8d48988d112941735',
             'pet_cafe' : '56aa371be4b08b9a8d573508',
             'snack_code' : '4bf58dd8d48988d1c7941735',
             'tea_code' : '4bf58dd8d48988d1dc931735',
             'truck_stop' : '57558b36e4b065ecebd306dd'}
    


Below we define our functions.<br><br>
In addition to pulling the Foursquare venues, we need to be able to:<br>
- Pull category data from each venue
- Identify which venues we'll consider to be restaurants, as well as which are Asian Cuisine, and which are specifically Thai Food.
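To make the data flow concrete, here is a self-contained sketch of these two steps applied to a made-up venue record. The record is hypothetical and only mimics the `categories` field (a list of objects with `name` and `id`) that the notebook reads from each venue; the category IDs are the ones defined above, trimmed to a two-item ignore set for brevity. The rule matches the notebook's: any ignored category disqualifies a venue, and Thai restaurants also count as Asian.

```python
# Category IDs taken from the cell above (subset for illustration)
ASIAN_CODE = '4bf58dd8d48988d142941735'
THAI_CODE = '4bf58dd8d48988d149941735'
IGNORE_CODES = {'4bf58dd8d48988d1e0931735',   # coffee shop
                '4bf58dd8d48988d112941735'}   # juice bar

def get_categories(categories):
    # Reduce each category object to a (name, id) tuple
    return [(cat['name'], cat['id']) for cat in categories]

def categorize(categories):
    ids = {cat_id for _, cat_id in categories}
    competitor = ids.isdisjoint(IGNORE_CODES)   # any ignored category disqualifies the venue
    thai = THAI_CODE in ids
    asian = thai or ASIAN_CODE in ids           # Thai restaurants also count as Asian
    return competitor, asian, thai

# Hypothetical venue record shaped like a Foursquare v2 venue
venue = {'name': 'Example Thai Kitchen',
         'categories': [{'name': 'Thai Restaurant', 'id': THAI_CODE}]}

print(categorize(get_categories(venue['categories'])))  # (True, True, True)
```

Using a set for the ignore list keeps each membership test constant-time, which matters little here but scales if more excluded categories are added.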


In [13]:
def get_categories(categories):
    return [(cat['name'], cat['id']) for cat in categories]

def categorize(categories):
    thai = False
    asian = False
    competitor = True
    ignore = ignore_dict.values()
    for cat in categories:
        cat_id =  cat[1]
        if cat_id in ignore:
            competitor = False
        if cat_id == thai_code:
            thai = True
            asian = True
        elif cat_id == asian_code:
            asian = True
    return competitor, asian, thai 
                

def get_nearby_venues(lat, lon, query, client_id, client_secret, token, radius=500, limit=100):
    version = '20210101'
    url = 'https://api.foursquare.com/v2/venues/explore'
    # Let requests build and URL-encode the query string
    params = {'client_id': client_id,
              'client_secret': client_secret,
              'v': version,
              'll': '{},{}'.format(lat, lon),
              'oauth_token': token,
              'query': query,
              'radius': radius,
              'limit': limit}
    try:
        response = requests.get(url, params=params).json()['response']['groups'][0]['items']
        venues = [(item['venue']['id'],
                  item['venue']['name'],
                  get_categories(item['venue']['categories']),
                  (item['venue']['location']['lat'], item['venue']['location']['lng']),
                  item['venue']['location']['formattedAddress'],
                  item['venue']['location']['distance']) for item in response]
    except (requests.exceptions.RequestException, ValueError, KeyError, IndexError):
        # Request failure, invalid JSON, or an unexpected response shape: treat as no venues
        venues = []
    return venues
                  


In [14]:
def get_competition(lats, lons):
    competitors = {}
    asian_restaurants = {}
    thai_restaurants = {}
    location_competitors = []
    
    for lat, lon in zip(lats, lons):
        venues = get_nearby_venues(lat, lon, 'restaurant', CLIENT_ID, CLIENT_SECRET, ACCESS_TOKEN, 300, 100)
        area_competitors = []
        for venue in venues:
            venue_id = venue[0]
            venue_name = venue[1]
            venue_categories = venue[2]
            venue_latlon = venue[3]
            venue_address = venue[4]
            venue_distance = venue[5]
            competitor, asian, thai = categorize(venue_categories)
            if competitor:
                x, y = latlon_to_xy(venue_latlon[0], venue_latlon[1])
                comp = (venue_id, venue_name, venue_latlon[0], venue_latlon[1], venue_address, venue_distance, asian, thai, x, y)
                if venue_distance<=250:
                    area_competitors.append(comp)
                competitors[venue_id] = comp
                if asian:
                    asian_restaurants[venue_id] = comp
                if thai:
                    thai_restaurants[venue_id] = comp
        location_competitors.append(area_competitors)
    return competitors, asian_restaurants, thai_restaurants, location_competitors

competitors = {}
asian_restaurants = {}
thai_restaurants = {}
location_competitors = []

loaded = False
try:
    with open('competitors.pkl', 'rb') as f:
        competitors = pickle.load(f)
    with open('asian_restaurants.pkl', 'rb') as f:
        asian_restaurants = pickle.load(f)
    with open('thai_restaurants.pkl', 'rb') as f:
        thai_restaurants = pickle.load(f)
    with open('location_competitors.pk1', 'rb') as f:
        location_competitors = pickle.load(f)
    print('Competitor data loaded.')
    loaded = True
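except (OSError, EOFError, pickle.UnpicklingError):
    # No saved data (or an unreadable file), so fall back to the API below
    pass

# We have saved the data to this notebook. If loading it fails, we call the Foursquare API again and save the data that is returned.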
if not loaded:
    competitors, asian_restaurants, thai_restaurants, location_competitors = get_competition(latitudes, longitudes)

    with open('competitors.pkl', 'wb') as f:
        pickle.dump(competitors, f)
    with open('asian_restaurants.pkl', 'wb') as f:
        pickle.dump(asian_restaurants, f)
    with open('thai_restaurants.pkl', 'wb') as f:
        pickle.dump(thai_restaurants, f)
    with open('location_competitors.pk1', 'wb') as f:
        pickle.dump(location_competitors, f)

print('Total Number of Competitors: ', len(competitors))
print('Total Number of Asian Restaurants: ', len(asian_restaurants))
print('Total Number of Thai Restaurants: ', len(thai_restaurants))
print('Average Number of Competitors per Neighborhood: ', np.array([len(c) for c in location_competitors]).mean())



Competitor data loaded.
Total Number of Competitors:  1480
Total Number of Asian Restaurants:  37
Total Number of Thai Restaurants:  16
Average Number of Competitors per Neighborhood:  3.654794520547945


Now we can plot the competition on our map.<br>
Thai restaurants, which concern us most, are shown in red.<br>
Other Asian cuisine restaurants are shown in purple.<br>
All other restaurants are shown in blue.

In [15]:
map_dallas = folium.Map(location=dallas_center, zoom_start=11)

for comp in competitors.values():
    lat = comp[2]
    lon = comp[3]
    is_thai = comp[7]
    is_asian = comp[6]
    if is_thai:
        color = 'red'
    elif is_asian:
        color = 'purple'
    else:
        color = 'blue'
    folium.CircleMarker([lat, lon], radius=3, color=color, fill=True, fill_opacity=1).add_to(map_dallas)
    

map_dallas
    

We have now collected data on restaurant locations and the type of food they serve within a 5 km radius of the Dallas city center.<br><br>

Our next step is to analyze the data to determine which areas have the lowest restaurant density, as well as which areas are farthest from restaurants that serve a similar cuisine.
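One possible way to set up that analysis is sketched below under stated assumptions: each candidate location and each Thai restaurant is represented by UTM (x, y) coordinates in meters (as stored in `df_locations` and at positions 8 and 9 of each competitor tuple), density is the number of competitors within 250 m (the length of each entry in `location_competitors`), and locations are ranked by fewest nearby competitors, with ties broken by greatest distance to the nearest Thai restaurant. The coordinates and counts are made up for illustration, and this ranking rule is only one reasonable choice.

```python
import math

# Hypothetical inputs: candidate locations as UTM (x, y) in meters,
# the number of competitors within 250 m of each, and Thai restaurant positions
locations = [(0.0, 0.0), (500.0, 0.0), (250.0, 433.0)]
competitor_counts = [4, 1, 1]
thai_xy = [(100.0, 0.0), (2000.0, 2000.0)]

# Distance from each location to its nearest Thai restaurant
nearest_thai = [min(math.dist(loc, t) for t in thai_xy) for loc in locations]

# Rank: fewest nearby competitors first; ties broken by greatest distance to a Thai restaurant
order = sorted(range(len(locations)),
               key=lambda i: (competitor_counts[i], -nearest_thai[i]))

for i in order:
    print(i, competitor_counts[i], round(nearest_thai[i], 1))
```

With 365 locations and 16 Thai restaurants, the brute-force nearest-distance loop is cheap; a spatial index would only matter for much larger inputs.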