# Capstone Project - The Battle of Subway Stops
### Applied Data Science Capstone by IBM/Coursera
### Author: Peilin Xin

## Table of contents
* [Introduction](#introduction)
* [Data Source](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>

In this project, we look for locations that are well suited to opening a restaurant. The report is aimed at stakeholders who are interested in opening an American restaurant in Boston, MA.

Since Boston’s well-established subway system is an important way for people to commute, we look for locations that are **close to subway and bus stops**. Just as importantly, the location should **not be crowded with restaurants, especially American restaurants**.

We download the subway and bus stop data from the MBTA (Massachusetts Bay Transportation Authority) and apply these criteria to narrow down the candidate locations.

## Data <a name="data"></a>

The stop data comes from the **MBTA** (Massachusetts Bay Transportation Authority) GTFS feed, whose `stops.txt` file lists the subway and bus stops together with their coordinates.

Based on the problem we defined, the factors that influence our decision are:
* Number of existing restaurants near each subway or bus stop.
* Distance from the location to the subway and bus stops.
* Distance from the location to the nearest American restaurant.

Since the MBTA data provides the coordinates of each stop, we extract the restaurant information around these coordinates with the **Foursquare API** and turn the selected coordinates into addresses with the **Google Geocoding API**.

In [107]:
import requests, zipfile, io # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation

!conda install -c conda-forge geopy --yes # client for several popular geocoding web services
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images
from IPython.display import Image 
from IPython.core.display import HTML 

from geopy import distance # calculate the distance between two coordinates

import pickle # serialize a Python object hierarchy into a byte stream

! pip install --upgrade google-api-python-client google-auth-httplib2 google-auth-oauthlib # Google API installation

!pip install pyproj # cartographic projections and coordinate transformations library

import pyproj

!conda install -c conda-forge folium=0.5.0 --yes # plotting library

import folium
print('Libraries imported.')

Libraries imported.


In [536]:
# Download the zipfile from MBTA site
r = requests.get('https://cdn.mbta.com/MBTA_GTFS.zip')
# Extract the zip
z = zipfile.ZipFile(io.BytesIO(r.content))

# read the stops csv file
data = pd.read_csv(z.open('stops.txt'))
data.head()

| | stop_id | stop_code | stop_name | stop_desc | platform_code | platform_name | stop_lat | stop_lon | zone_id | stop_address | stop_url | level_id | location_type | parent_station | wheelchair_boarding | municipality | on_street | at_street | vehicle_type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1.0 | Washington St opp Ruggles St | | | | 42.330957 | -71.082754 | ExpressBus-Downtown | | https://www.mbta.com/stops/1 | | 0 | | 1 | Boston | Washington Street | Ruggles Street | 3.0 |
| 1 | 10 | 10.0 | Theo Glynn Way @ Newmarket Sq | | | | 42.330555 | -71.068787 | LocalBus | | https://www.mbta.com/stops/10 | | 0 | | 1 | Boston | Theodore Glynn Way | Newmarket Square | 3.0 |
| 2 | 10000 | 10000.0 | Tremont St opp Temple Pl | | | | 42.355692 | -71.062911 | LocalBus | | https://www.mbta.com/stops/10000 | | 0 | | 1 | Boston | Tremont Street | Temple Place | 3.0 |
| 3 | 10003 | 10003.0 | Albany St opp Randall St | | | | 42.331591 | -71.076237 | LocalBus | | https://www.mbta.com/stops/10003 | | 0 | | 0 | Boston | Albany Street | Randall Street | 3.0 |
| 4 | 10005 | 10005.0 | Albany St opp E Concord St | | | | 42.335017 | -71.07128 | LocalBus | | https://www.mbta.com/stops/10005 | | 0 | | 1 | Boston | Albany Street | East Concord Street | 3.0 |


In [537]:
# Shape of data
data.shape

(9867, 19)

Missing values are a common problem in data preparation.

Since the data is downloaded from the official website and I can't find a substitute source, I choose to delete the rows with null values. I'm not going to fill in the nulls with calculated estimates because approximate information can be harmful in this research.
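
As a rough illustration of what this choice costs, the sketch below compares dropping every row with any missing value (the approach taken in this notebook) against dropping only the rows that lack coordinates. The toy frame and the `subset` columns are assumptions made for illustration; they are not the MBTA data.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the stops table (values are made up for illustration)
toy = pd.DataFrame({
    'stop_name': ['A', 'B', 'C', 'D'],
    'stop_lat': [42.33, np.nan, 42.35, 42.36],
    'stop_lon': [-71.08, np.nan, -71.06, -71.05],
    'on_street': ['Main St', 'Elm St', None, 'Oak St'],
})

# Approach used in this notebook: drop a row if ANY column is missing
strict = toy.dropna().reset_index(drop=True)

# Alternative (assumption): drop a row only if its coordinates are missing
coords_only = toy.dropna(subset=['stop_lat', 'stop_lon']).reset_index(drop=True)

print(len(strict), len(coords_only))
```

The stricter rule also discards stop `C`, whose coordinates are present but whose street name is missing. Which rule is appropriate depends on whether the street columns are needed later.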

In [538]:
missing_values_count = data.isnull().sum()
missing_values_count

stop_id                   0
stop_code              2142
stop_name                 0
stop_desc              7605
platform_code          9786
platform_name          9279
stop_lat               1398
stop_lon               1398
zone_id                1794
stop_address           9576
stop_url               1673
level_id               7782
location_type             0
parent_station         7658
wheelchair_boarding       0
municipality              0
on_street              2509
at_street              3283
vehicle_type           1938
dtype: int64

In [539]:
# Number of unique values in location_type
data['location_type'].nunique()

4

In [540]:
# Number of unique values in zone_id
data['zone_id'].nunique()

24

In [541]:
# Number of unique values in municipality
data['municipality'].nunique()

103

In [542]:
# Select the columns that can be useful for further analysis
data0 = data[['stop_name', 'stop_lat', 'stop_lon', 'zone_id', 'location_type', 'wheelchair_boarding', 'municipality', 'on_street', 'at_street']]
data0.head()

| | stop_name | stop_lat | stop_lon | zone_id | location_type | wheelchair_boarding | municipality | on_street | at_street |
|---|---|---|---|---|---|---|---|---|---|
| 0 | Washington St opp Ruggles St | 42.330957 | -71.082754 | ExpressBus-Downtown | 0 | 1 | Boston | Washington Street | Ruggles Street |
| 1 | Theo Glynn Way @ Newmarket Sq | 42.330555 | -71.068787 | LocalBus | 0 | 1 | Boston | Theodore Glynn Way | Newmarket Square |
| 2 | Tremont St opp Temple Pl | 42.355692 | -71.062911 | LocalBus | 0 | 1 | Boston | Tremont Street | Temple Place |
| 3 | Albany St opp Randall St | 42.331591 | -71.076237 | LocalBus | 0 | 0 | Boston | Albany Street | Randall Street |
| 4 | Albany St opp E Concord St | 42.335017 | -71.07128 | LocalBus | 0 | 1 | Boston | Albany Street | East Concord Street |


After selecting the appropriate columns for the research, I need to filter out the rows that fall outside the research range.

First, I pick a coordinate as the center of Boston and find all stops within 8 km of this point. Stops more than 8 km away are outside the research range because they are too far out to attract enough customers.
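
The radius filter can also be written without a Python loop. The sketch below is one possible vectorized version that uses the haversine formula on NumPy arrays. It assumes a spherical Earth, so its distances can differ slightly from the geodesic distances that `geopy` computes, and the helper name `haversine_km` and the toy stops are illustrative rather than part of the notebook.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; haversine treats the Earth as a sphere

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; accepts scalars or NumPy arrays."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

center_lat, center_lon = 42.35, -71.08  # same center as used in this notebook

# Toy stops (the second coordinate pair is made up and lies well outside the range)
stops = pd.DataFrame({
    'stop_name': ['near', 'far'],
    'stop_lat': [42.355692, 42.50],
    'stop_lon': [-71.062911, -71.30],
})

# Compute all distances at once, then keep the stops within 8 km
stops['dist_km'] = haversine_km(stops['stop_lat'].to_numpy(),
                                stops['stop_lon'].to_numpy(),
                                center_lat, center_lon)
in_range = stops[stops['dist_km'] <= 8].reset_index(drop=True)
print(in_range)
```

For a few thousand stops the loop is fast enough; the vectorized form mainly pays off when the table grows or the filter is re-run often.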

In [544]:
# back up the data (copy, so later changes to data0 don't affect the backup)
data1 = data0.copy()

In [545]:
# The coordinates of Boston city
Blat, Blon = 42.35, -71.08

In [546]:
# Drop the na values since it's very difficult to find the correct values to fill in
data0 = data0.dropna()
data0.reset_index(drop=True, inplace=True)

In [547]:
data0.shape

(6584, 9)

In [548]:
# flag the stops that are within 8 km of the Boston center
Area = []
for i in range(len(data0)):
    if distance.distance((data0['stop_lat'][i], data0['stop_lon'][i]), (Blat, Blon)).km <= 8:
        Area.append(1)
    else:
        Area.append(0)

In [549]:
# add the column to the research dataset
data0['In Reserach Area'] = Area

In [550]:
# delete the rows that are not in the research range
data0 = data0[data0['In Reserach Area'] == 1]

In [551]:
data0.shape

(2171, 10)

In [552]:
# reset the index since we dropped some rows
dataBos = data0.reset_index(drop=True)

In [553]:
# check that the row numbers match the index
dataBos.tail()

| | stop_name | stop_lat | stop_lon | zone_id | location_type | wheelchair_boarding | municipality | on_street | at_street | In Reserach Area |
|---|---|---|---|---|---|---|---|---|---|---|
| 2166 | Maverick | 42.369139 | -71.039615 | LocalBus | 0 | 1 | Boston | Maverick Square | Henry Street | 1 |
| 2167 | Maverick | 42.36907 | -71.039473 | LocalBus | 0 | 1 | Boston | Maverick Square | Sumner Street | 1 |
| 2168 | Orient Heights | 42.386982 | -71.00499 | LocalBus | 0 | 0 | Boston | Busway | Bennington Street | 1 |
| 2169 | Orient Heights | 42.38723 | -71.004229 | LocalBus | 0 | 0 | Boston | Busway | Bennington Street | 1 |
| 2170 | Wood Island | 42.380368 | -71.023287 | LocalBus | 0 | 0 | Boston | Busway | Bennington Street | 1 |


### Foursquare
Now that we have our location data, we use the **Foursquare API** to get information on the restaurants around each stop.

We are only interested in actual restaurants such as diners, tavernas and steakhouses, not coffee shops, bakeries, etc. So we use the category descriptions in the API response to filter out venues that are not relevant to our research topic.

In [505]:
# Foursquare category IDs
food_category = '4d4b7105d754a06374d81259' # 'Root' category for all food-related venues
A_restaurant_categories = ['4bf58dd8d48988d14e941735'] # category ID used to flag American restaurants

In [506]:
CLIENT_ID = 'Foursquare ID' # your Foursquare ID
CLIENT_SECRET = 'Foursquare Secret' # your Foursquare Secret
VERSION = '20200624'
LIMIT = 30
# confirm the setup without echoing the secret values
print('Credentials set.')

Credentials set.


In [508]:
# Override warnings such as DeprecationWarning
import warnings

def fxn():
    warnings.warn("deprecated", DeprecationWarning)

with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    fxn()

In [509]:
import math

# Boston lies in UTM zone 19N (EPSG:32619); always_xy=True keeps the (lon, lat) argument order
transformer = pyproj.Transformer.from_crs('EPSG:4326', 'EPSG:32619', always_xy=True)

# Transform longitude/latitude to projected x and y (in meters)
def lonlat_to_xy(lon, lat):
    x, y = transformer.transform(lon, lat)
    return x, y

# Transform projected x and y back to longitude/latitude
def xy_to_lonlat(x, y):
    lon, lat = transformer.transform(x, y, direction='INVERSE')
    return lon, lat

# Euclidean distance between two projected points
def calc_xy_distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx*dx + dy*dy)

# get venues by API call
def get_venues_near_location(lat, lon, category, client_id, client_secret, radius=500, limit=100):
    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
        client_id, client_secret, VERSION, lat, lon, category, radius, limit)
    try:
        results = requests.get(url).json()['response']['groups'][0]['items']
        # keep id, name, categories, (lat, lng) and distance from the search point for each venue
        venues = [(item['venue']['id'], item['venue']['name'],
                   get_categories(item['venue']['categories']), 
                   (item['venue']['location']['lat'], item['venue']['location']['lng']),
                    item['venue']['location']['distance']) for item in results] 
    except (requests.RequestException, ValueError, KeyError, IndexError):
        # the request failed or the response holds no venue list: treat as no venues
        venues = []
    return venues

# Check whether a venue is a restaurant and whether it matches the specific category filter
def is_restaurant(categories, specific_filter=None):
    restaurant_words = ['restaurant', 'diner', 'taverna', 'steakhouse']
    restaurant = False
    specific = False
    for c in categories:
        category_name = c[0].lower()
        category_id = c[1]
        for r in restaurant_words:
            if r in category_name:
                restaurant = True
        if 'fast food' in category_name:
            restaurant = False
        if specific_filter is not None and category_id in specific_filter:
            specific = True
            restaurant = True
    return restaurant, specific

def get_categories(categories):
    return [(cat['name'], cat['id']) for cat in categories]

In [796]:
# extract useful information from the API call
def get_restaurants(lats, lons):
    restaurants = {}
    A_restaurants = {}
    location_restaurants = []
    venues = get_venues_near_location(lats, lons, food_category, CLIENT_ID, CLIENT_SECRET, radius=350, limit=100)
    area_restaurants = []
    for venue in venues:
        venue_id = venue[0]
        venue_name = venue[1]
        venue_categories = venue[2]
        venue_latlon = venue[3]
        venue_distance = venue[4]
        is_res, is_A = is_restaurant(venue_categories, specific_filter=A_restaurant_categories)
        if is_res:
            x, y = lonlat_to_xy(venue_latlon[1], venue_latlon[0])
            restaurant = (venue_id, venue_name, venue_latlon[0], venue_latlon[1], venue_distance, is_A, x, y)
            if venue_distance<=300:
                area_restaurants.append(restaurant)
            restaurants[venue_id] = restaurant
            if is_A:
                A_restaurants[venue_id] = restaurant
    location_restaurants.append(area_restaurants)
    return restaurants, A_restaurants, location_restaurants

In [794]:
import sys

if not sys.warnoptions:
    import warnings
    warnings.simplefilter("ignore")

Since we have more than 2,000 rows to research and the calls for regular users on Foursquare are limited to 950, we need to split the calls over 2 days and combine the results afterwards.

I save the responses to local disk as soon as each batch of calls finishes, in case I lose the data.
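
The split-and-save pattern can be sketched as a small resumable helper. This is a minimal illustration under assumptions, not the code used for the actual calls: `fetch_in_batches`, `fake_fetch` and the checkpoint file name are hypothetical, and the stand-in fetch function replaces the real API request.

```python
import os
import pickle
import tempfile

def fetch_in_batches(items, fetch, checkpoint_path, max_calls):
    """Process up to max_calls new items, resuming from a pickle checkpoint."""
    results = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path, 'rb') as f:
            results = pickle.load(f)
    start = len(results)                       # number of items already processed
    for item in items[start:start + max_calls]:
        results.append(fetch(item))
    with open(checkpoint_path, 'wb') as f:     # save progress once the batch is done
        pickle.dump(results, f)
    return results

# Stand-in for the API call (hypothetical; no network access)
def fake_fetch(stop):
    return {'stop': stop, 'restaurants': []}

stops = list(range(5))
path = os.path.join(tempfile.mkdtemp(), 'LocationRes.pkl')
day1 = fetch_in_batches(stops, fake_fetch, path, max_calls=3)  # first day: 3 calls
day2 = fetch_in_batches(stops, fake_fetch, path, max_calls=3)  # second day: the remaining 2
print(len(day1), len(day2))
```

Because the helper reads the checkpoint before it starts, running the same cell on the second day continues where the first day stopped instead of repeating calls.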

In [513]:
LocationRes = []
for i in range(850):
    LAT, LON = dataBos.loc[i, "stop_lat"], dataBos.loc[i, "stop_lon"]
    LocationRes.append(get_restaurants(LAT, LON))

In [514]:
with open('LocationRes1', 'wb') as f:
    pickle.dump(LocationRes, f)

In [515]:
with open('LocationRes1', 'rb') as f:
    mylist = pickle.load(f)

In [516]:
LocationRes2 = []
for i in range(850, len(dataBos)):
    LAT, LON = dataBos.loc[i, "stop_lat"], dataBos.loc[i, "stop_lon"]
    LocationRes2.append(get_restaurants(LAT, LON))

In [517]:
with open('LocationRes2', 'wb') as f1:
    pickle.dump(LocationRes2, f1)

In [518]:
with open('LocationRes2', 'rb') as f1:
    mylist1 = pickle.load(f1)

In [519]:
# combine the response data
LocRes = mylist + mylist1

In [566]:
restaurants = [r[0] for r in LocRes]

In [567]:
American_restaurants = [r[1] for r in LocRes]

## Methodology <a name="methodology"></a>

In this project, we try to detect locations that are close to subway and bus stops while ensuring that they are not crowded with restaurants, especially American restaurants.

In the first step, I count the restaurants within **350 meters** of each stop, since this distance is acceptable for potential customers who take the subway to get to a restaurant. Then, I calculate the distance to the **nearest American restaurant** in this range so we know where our competitors are.

In the second step, I explore the restaurant density throughout the research area and use a heatmap to identify the areas with **low restaurant density**.

Finally, we focus on the locations that meet the conditions listed above and look up their addresses with the **Google Geocoding API**.
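
For the last step, the address is read from the `results` list of the geocoding response. The sketch below shows one defensive way to pull out the first `formatted_address`; the helper name `extract_address` and the two sample dictionaries are hand-written assumptions, not real API output.

```python
from typing import Optional

def extract_address(response: dict) -> Optional[str]:
    """Return the first formatted address from a geocoding-style response, or None."""
    results = response.get('results') or []
    if not results:
        return None
    return results[0].get('formatted_address')

# Hand-written sample responses (illustrative only)
sample_ok = {'results': [{'formatted_address': '204 Seaver St, Boston, MA 02121, USA'}]}
sample_empty = {'results': []}

print(extract_address(sample_ok))
print(extract_address(sample_empty))
```

Returning `None` for an empty result keeps one failed lookup from stopping the whole loop over candidate locations.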


## Analysis <a name="analysis"></a>

Here is some basic EDA of the data, followed by the steps discussed in the methodology.

In [568]:
count = 0
for i in range(len(restaurants)):
    if len(restaurants[i]) == 0:
        count +=1
print(count)

1371


In [569]:
print('Average number of restaurants in each neighborhood:', np.array([len(r) for r in restaurants]).mean())

Average number of restaurants in each neighborhood: 2.532934131736527


In [814]:
print('Maximum number of restaurants in a neighborhood:', np.array([len(r) for r in restaurants]).max())

Maximum number of restaurants in a neighborhood: 61


In [797]:
location_restaurants_count = [len(res) for res in restaurants]

dataBos['Restaurants in area'] = location_restaurants_count

print('Average number of restaurants in every area with radius=350m:', np.array(location_restaurants_count).mean())

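# store the coordinates as floats (assign the result back, otherwise the conversion is discarded)
dataBos['stop_lat'] = dataBos['stop_lat'].astype(float)
dataBos['stop_lon'] = dataBos['stop_lon'].astype(float)
dataBos.head()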

Average number of restaurants in every area with radius=350m: 2.532934131736527


| | stop_name | stop_lat | stop_lon | zone_id | location_type | wheelchair_boarding | municipality | on_street | at_street | In Reserach Area | Restaurants in area | Distance to American restaurant | good locations to open |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Washington St opp Ruggles St | 42.330957 | -71.082754 | ExpressBus-Downtown | 0 | 1 | Boston | Washington Street | Ruggles Street | 1 | 6 | 145.461131 | False |
| 1 | Theo Glynn Way @ Newmarket Sq | 42.330555 | -71.068787 | LocalBus | 0 | 1 | Boston | Theodore Glynn Way | Newmarket Square | 1 | 2 | 143.12222 | False |
| 2 | Tremont St opp Temple Pl | 42.355692 | -71.062911 | LocalBus | 0 | 1 | Boston | Tremont Street | Temple Place | 1 | 38 | 44.303822 | False |
| 3 | Albany St opp Randall St | 42.331591 | -71.076237 | LocalBus | 0 | 0 | Boston | Albany Street | Randall Street | 1 | 4 | 270.56553 | False |
| 4 | Albany St opp E Concord St | 42.335017 | -71.07128 | LocalBus | 0 | 1 | Boston | Albany Street | East Concord Street | 1 | 7 | 158.599563 | False |


In [571]:
dataBos['stop_lat'] = pd.to_numeric(dataBos['stop_lat'], errors='coerce')
dataBos['stop_lon'] = pd.to_numeric(dataBos['stop_lon'], errors='coerce')
dataBos['Restaurants in area'] = pd.to_numeric(dataBos['Restaurants in area'], errors='coerce')

In [572]:
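# calculate the distance from each stop to its closest restaurant
# note: the loop iterates over `restaurants`, which holds every restaurant type;
# iterate over `American_restaurants[i]` instead to restrict it to American restaurants
from geopy.distance import geodesic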
distances_to_American_restaurant = []

for i in range(len(restaurants)):
    min_distance = 10000
    x, y = dataBos['stop_lat'][i], dataBos['stop_lon'][i]
    if len(restaurants[i]) != 0:
        for val in restaurants[i].values():
            val_x = val[2]
            val_y = val[3]
            dist = geodesic((x, y), (val_x, val_y)).meters
            if dist<min_distance:
                min_distance = dist
    if min_distance == 10000:
        min_distance = -1  # sentinel: no restaurant found in range

    distances_to_American_restaurant.append(min_distance)

dataBos['Distance to American restaurant'] = distances_to_American_restaurant

In [798]:
# check how many stops have at least one restaurant within 350 meters
len(dataBos[dataBos['Restaurants in area'] != 0])

800

In [799]:
# check how many stops have a recorded distance (-1 means nothing was found in range)
len(dataBos[dataBos['Distance to American restaurant'] != -1])

800

In [594]:
dataBos.head()

| | stop_name | stop_lat | stop_lon | zone_id | location_type | wheelchair_boarding | municipality | on_street | at_street | In Reserach Area | Restaurants in area | Distance to American restaurant |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Washington St opp Ruggles St | 42.330957 | -71.082754 | ExpressBus-Downtown | 0 | 1 | Boston | Washington Street | Ruggles Street | 1 | 6 | 145.461131 |
| 1 | Theo Glynn Way @ Newmarket Sq | 42.330555 | -71.068787 | LocalBus | 0 | 1 | Boston | Theodore Glynn Way | Newmarket Square | 1 | 2 | 143.12222 |
| 2 | Tremont St opp Temple Pl | 42.355692 | -71.062911 | LocalBus | 0 | 1 | Boston | Tremont Street | Temple Place | 1 | 38 | 44.303822 |
| 3 | Albany St opp Randall St | 42.331591 | -71.076237 | LocalBus | 0 | 0 | Boston | Albany Street | Randall Street | 1 | 4 | 270.56553 |
| 4 | Albany St opp E Concord St | 42.335017 | -71.07128 | LocalBus | 0 | 1 | Boston | Albany Street | East Concord Street | 1 | 7 | 158.599563 |


In [802]:
print('Average distance to closest American restaurant from each area center:',dataBos[dataBos['Distance to American restaurant'] != -1]['Distance to American restaurant'].mean())

Average distance to closest American restaurant from each area center: 138.21758362497502


## The Density of American Restaurants in the Research Area

The map shows the coordinates of all American restaurants in the research area, with a heatmap showing their density.
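
Because the 350 m search circles of neighboring stops overlap, the same venue can be returned for several stops. The sketch below shows one way to merge the per-stop dictionaries by venue id before counting or plotting. It is an illustration with made-up venues, and whether repeats should be removed is a design choice, since repeated points weight the heatmap toward restaurants that are reachable from many stops.

```python
# Per-stop dictionaries keyed by venue id, mimicking the per-stop results.
# Tuples are (venue_id, name, lat, lon); all values are made up.
per_stop = [
    {'v1': ('v1', 'Diner One', 42.351, -71.081)},
    {},  # a stop with no American restaurant nearby
    {'v1': ('v1', 'Diner One', 42.351, -71.081),
     'v2': ('v2', 'Grill Two', 42.349, -71.079)},
]

unique = {}
for stop_dict in per_stop:
    unique.update(stop_dict)  # a repeated venue id simply overwrites itself

points = [[v[2], v[3]] for v in unique.values()]  # [lat, lon] pairs for a heatmap
total_with_repeats = sum(len(d) for d in per_stop)
print(total_with_repeats, len(unique))
```

Here three per-stop entries collapse to two distinct venues.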

In [575]:
from folium import plugins
from folium.plugins import HeatMap

In [701]:
American_Res = []
for i in range(len(American_restaurants)):
    if len(American_restaurants[i]) != 0:
        for val in American_restaurants[i].values():
            val_x = val[2]
            val_y = val[3]
            American_Res.append([val_x, val_y])

In [815]:
print('Number of American restaurants in the research range:', len(American_Res))

Number of American restaurants in the research range: 553


In [716]:
map_BosR1 = folium.Map(location=[Blat, Blon], zoom_start=14)
folium.Marker([Blat, Blon]).add_to(map_BosR1)
for lat1, lon1 in American_Res:
    folium.CircleMarker([lat1, lon1], radius=2.5, color='red', fill=True, fill_color='red', fill_opacity=1).add_to(map_BosR1)
map_BosR1.add_child(plugins.HeatMap(American_Res, radius=18))
map_BosR1

## The Transportation System in the Research Area

The map shows all the subway and bus stops in the research area, with the heatmap indicating the areas of high American restaurant density.

In [723]:
Trans_latitudes = dataBos['stop_lat']
Trans_longitudes = dataBos['stop_lon']
Trans_locations = [[lat, lon] for lat, lon in zip(Trans_latitudes, Trans_longitudes)]
map_BosT = folium.Map(location=[Blat, Blon], zoom_start=14)
folium.TileLayer('cartodbpositron').add_to(map_BosT)
folium.Marker([Blat, Blon]).add_to(map_BosT)
for lat, lon in zip(Trans_latitudes, Trans_longitudes):
    folium.CircleMarker([lat, lon], radius=0.1, color='black', fill=True, fill_color='black', fill_opacity=1).add_to(map_BosT)
map_BosT.add_child(plugins.HeatMap(American_Res, radius=18))
map_BosT

In [660]:
dataBos.head()

| | stop_name | stop_lat | stop_lon | zone_id | location_type | wheelchair_boarding | municipality | on_street | at_street | In Reserach Area | Restaurants in area | Distance to American restaurant |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Washington St opp Ruggles St | 42.330957 | -71.082754 | ExpressBus-Downtown | 0 | 1 | Boston | Washington Street | Ruggles Street | 1 | 6 | 145.461131 |
| 1 | Theo Glynn Way @ Newmarket Sq | 42.330555 | -71.068787 | LocalBus | 0 | 1 | Boston | Theodore Glynn Way | Newmarket Square | 1 | 2 | 143.12222 |
| 2 | Tremont St opp Temple Pl | 42.355692 | -71.062911 | LocalBus | 0 | 1 | Boston | Tremont Street | Temple Place | 1 | 38 | 44.303822 |
| 3 | Albany St opp Randall St | 42.331591 | -71.076237 | LocalBus | 0 | 0 | Boston | Albany Street | Randall Street | 1 | 4 | 270.56553 |
| 4 | Albany St opp E Concord St | 42.335017 | -71.07128 | LocalBus | 0 | 1 | Boston | Albany Street | East Concord Street | 1 | 7 | 158.599563 |


## Calculation of the Final Locations

We select the locations that meet the criteria introduced above and visualize them on a map. (The heatmap shows the density of American restaurants in the area.)
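
One detail worth checking when combining the criteria is the `-1` value that marks stops where no restaurant was found. The sketch below uses a made-up table with the notebook's column names to show how that sentinel behaves in a `>= 340` comparison, and what changes if "none found" is treated as infinitely far away. The second variant is an assumption for illustration, not the rule applied in this notebook.

```python
import numpy as np
import pandas as pd

# Toy table; the column names follow the notebook, the values are made up
toy = pd.DataFrame({
    'Restaurants in area': [0, 1, 1, 5],
    'Distance to American restaurant': [-1.0, 345.0, 120.0, 60.0],  # -1 = none found
})

few_restaurants = toy['Restaurants in area'] <= 1

# Comparing the raw column: the -1 sentinel fails a ">= 340" test
far_raw = toy['Distance to American restaurant'] >= 340

# Alternative (assumption): treat "none found" as infinitely far away
dist = toy['Distance to American restaurant'].replace(-1, np.inf)
far_inf = dist >= 340

print((few_restaurants & far_raw).sum(), (few_restaurants & far_inf).sum())
```

With the raw sentinel only the second row qualifies; with the `np.inf` replacement the first row, which has no restaurant nearby at all, qualifies as well.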

In [687]:
# stops with at most one restaurant in range
good_res_count = np.array((dataBos['Restaurants in area']<= 1))
print('Locations with no more than one restaurant nearby:', good_res_count.sum())

# stops whose recorded distance is at least 340 m; stops carrying the -1 sentinel
# (nothing found in range) do not pass this test
good_A_distance = np.array(dataBos['Distance to American restaurant']>=340)
print('Locations with no American restaurants within 340m:', good_A_distance.sum())

good_locations = np.logical_and(good_res_count, good_A_distance)
print('Locations with both conditions met:', good_locations.sum())



Locations with no more than one restaurant nearby: 1531
Locations with no American restaurants within 340m: 18
Locations with both conditions met: 17


In [688]:
dataBos['good locations to open'] = good_locations

In [689]:
dataBos.head()

| | stop_name | stop_lat | stop_lon | zone_id | location_type | wheelchair_boarding | municipality | on_street | at_street | In Reserach Area | Restaurants in area | Distance to American restaurant | good locations to open |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Washington St opp Ruggles St | 42.330957 | -71.082754 | ExpressBus-Downtown | 0 | 1 | Boston | Washington Street | Ruggles Street | 1 | 6 | 145.461131 | False |
| 1 | Theo Glynn Way @ Newmarket Sq | 42.330555 | -71.068787 | LocalBus | 0 | 1 | Boston | Theodore Glynn Way | Newmarket Square | 1 | 2 | 143.12222 | False |
| 2 | Tremont St opp Temple Pl | 42.355692 | -71.062911 | LocalBus | 0 | 1 | Boston | Tremont Street | Temple Place | 1 | 38 | 44.303822 | False |
| 3 | Albany St opp Randall St | 42.331591 | -71.076237 | LocalBus | 0 | 0 | Boston | Albany Street | Randall Street | 1 | 4 | 270.56553 | False |
| 4 | Albany St opp E Concord St | 42.335017 | -71.07128 | LocalBus | 0 | 1 | Boston | Albany Street | East Concord Street | 1 | 7 | 158.599563 | False |


In [816]:
OP_latitudes = dataBos[dataBos['good locations to open'] == True]['stop_lat'].values
OP_longitudes = dataBos[dataBos['good locations to open'] == True]['stop_lon'].values
OP_loc = [[latitude, longitude] for latitude, longitude in zip(OP_latitudes, OP_longitudes)]

map_BosO = folium.Map(location=[Blat, Blon], zoom_start=14)

for lat1, lon1 in zip(OP_latitudes, OP_longitudes):
    folium.CircleMarker([lat1, lon1], radius=5, color='red', fill=True, fill_color='black', fill_opacity=1).add_to(map_BosO)
map_BosO.add_child(plugins.HeatMap(American_Res, radius=18))
map_BosO

In [777]:
google_api_key = 'Google API key' # your Google Geocoding API key

# reverse-geocode a coordinate pair into a formatted address
def get_address(api_key, latitude, longitude, verbose=False):
    try:
        url = 'https://maps.googleapis.com/maps/api/geocode/json?key={}&latlng={},{}'.format(api_key, latitude, longitude)
        response = requests.get(url).json()
        if verbose:
            print('Google Maps API JSON result =>', response)
        results = response['results']
        address = results[0]['formatted_address']
        return address
    except (requests.RequestException, ValueError, KeyError, IndexError):
        # the request failed or no result was returned
        return None

In [792]:
candidate_area_addresses = []
print('------------------------------------------------------------------')
print('Approximate American Restaurant Addresses recommended')
print('----------------------------------------------------------------\n')
for lat, lon in OP_loc:
    addr = get_address(google_api_key, lat, lon)
    candidate_area_addresses.append(addr)
    print(addr)

------------------------------------------------------------------
Approximate American Restaurant Addresses recommended
----------------------------------------------------------------

American Legion Hwy @ Blue Hill Ave, Boston, MA 02124, USA
Market St @ Lothrop St, Boston, MA 02135, USA
Chestnut Hill Ave @ Jackson Ave, Boston, MA 02135, USA
N Beacon St opp Vineland St, Boston, MA 02109, USA
Columbus Ave @ Walnut Ave, Boston, MA 02119, USA
Centre St @ Cedar St, Boston, MA 02119, USA
Columbus Ave @ Cedar St, Boston, MA 02119, USA
Washington St @ Monastery Rd, Boston, MA 02135, USA
Cambridge St @ Dana St, Cambridge, MA 02138, USA
381 Dudley St opp Hampden St, Boston, MA 02119, USA
High St @ Cypress St, Brookline, MA 02445, USA
Boylston St @ Timon Ave, Brookline, MA 02467, USA
204 Seaver St, Boston, MA 02121, USA
Columbus Ave @ Walnut Ave, Boston, MA 02119, USA
Mt Auburn St @ Adams Ave, Watertown, MA 02472, USA
Broadway @ Lee St, Cambridge, MA 02139, USA
Broadway @ Fayette St, Cambridg

## Conclusion <a name="conclusion"></a>

The purpose of this project was to identify viable locations for opening an American restaurant in Boston. By calculating and visualizing the restaurant density around the subway and bus stops, we identified the areas that are crowded with restaurants as well as the places with potential for a new one. We then filtered out the stops with more than one restaurant in range and kept those with no American restaurant within 340 meters. Finally, we reverse-geocoded the coordinates of the remaining stops into street addresses.