# Capstone Project - The Best Neighborhood for an Ethiopian Restaurant in DC

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)

## Introduction: Business Problem <a name="introduction"></a>


In this project, I will try to find an optimal location to open a new Ethiopian restaurant in Washington, D.C.

D.C. already has many highly rated Ethiopian restaurants, so we want to find areas where a new restaurant would not compete with an existing Ethiopian restaurant nearby. Parking in Washington, D.C. can also be a nightmare, so we would like to place the restaurant as close to a metro stop as possible, making it easily accessible to residents all around the city.

We will determine the best neighborhoods in which to open this restaurant based on these criteria. The top neighborhoods will be presented to the stakeholder, along with their advantages and disadvantages.

## Data <a name="data"></a>

The data we need to make our decision are:
- the number of existing Ethiopian restaurants in each neighborhood
- the locations of metro stops
- a regularly spaced grid of locations, centered on the city center, which will be used to define neighborhoods

The following data sources will be needed to extract/generate the required information:
- the number of Ethiopian restaurants and their locations in every neighborhood will be obtained using the Foursquare API
- the locations of metro stops will be obtained using the Washington Metropolitan Area Transit Authority (WMATA) API
- the coordinates of the Washington, D.C. city center will be approximated visually; neighborhood centers will be generated algorithmically


Import relevant libraries

In [1]:
import pyproj
import math
import copy
import pandas as pd
import folium
import requests


Define the coordinates of the Washington, D.C. city center.

In [2]:
dc_center = [38.9072, -77.0134] # [latitude, longitude]

Now, create a grid of neighborhoods. These will be equally spaced circles on a hexagonal grid, centered on the Washington, D.C. city center and lying within 5 km of it (approximately the radius of D.C.). Each neighborhood is a circular area with a radius of 250 meters, so neighborhood centers are 500 meters apart.
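The grid is hexagonal: centers in a row are $s = 500$ m apart, and alternate rows are shifted by $s/2$ and spaced $s\sqrt{3}/2 \approx 433$ m apart (the `k` factor in the grid code). Each center therefore owns a hexagonal cell, which gives a quick check on how many centers to expect inside the $R = 5000$ m radius and on how much of that area the 250 m circles actually cover:

$$
A_{\text{cell}} = \frac{\sqrt{3}}{2}\,s^2 \approx 216{,}506\ \text{m}^2,
\qquad
N \approx \frac{\pi R^2}{A_{\text{cell}}} \approx \frac{78{,}539{,}816}{216{,}506} \approx 363
$$

$$
\frac{\pi\,(s/2)^2}{A_{\text{cell}}} = \frac{\pi}{2\sqrt{3}} \approx 0.907
$$

The estimate $N \approx 363$ agrees with the 364 centers the grid code reports. The second ratio means roughly 9% of the city lies in the gaps between circles, so a restaurant or station there is not counted in any neighborhood by the $\le s/2$ distance test used in the Analysis section. Circles with radius $s/\sqrt{3} \approx 289$ m (the circumradius of each hexagonal cell) would cover the grid completely, at the cost of overlapping.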

To calculate distances accurately, I will build the grid in a projected 2D Cartesian coordinate system (UTM zone 18N, which covers D.C.), so that distances are measured in meters rather than degrees. The 2D coordinates can then be projected back to longitude/latitude for display on the map.
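The conversion functions hard-code UTM zone 18. As a check on that choice, the standard 6-degree zone formula gives zone 18 for D.C.'s longitude, which corresponds to EPSG:32618 (WGS 84 / UTM zone 18N). This is a small sketch with hypothetical helper names; it ignores the irregular UTM zones around Norway and Svalbard.

```python
def utm_zone(lon):
    """Standard 6-degree UTM zone number (1-60) for a longitude in degrees.

    Ignores the special-case zones around Norway and Svalbard.
    """
    return int((lon + 180) // 6) % 60 + 1


def utm_epsg(lon, lat):
    """EPSG code for WGS 84 / UTM: 326xx north of the equator, 327xx south."""
    zone = utm_zone(lon)
    return (32600 if lat >= 0 else 32700) + zone


# Washington, D.C. city center used in this notebook
print(utm_zone(-77.0134), utm_epsg(-77.0134, 38.9072))
```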

In [3]:
# Define conversion functions between WGS84 longitude/latitude (degrees)
# and UTM zone 18N (meters, EPSG:32618), which covers Washington, D.C.
# pyproj.transform() is deprecated in recent pyproj releases, so use Transformer objects,
# created once and reused. always_xy=True keeps the (lon, lat) / (x, y) argument order.
to_xy = pyproj.Transformer.from_crs('EPSG:4326', 'EPSG:32618', always_xy=True)
to_lonlat = pyproj.Transformer.from_crs('EPSG:32618', 'EPSG:4326', always_xy=True)

def lonlat_to_xy(lon, lat):
    x, y = to_xy.transform(lon, lat)
    return x, y

def xy_to_lonlat(x, y):
    lon, lat = to_lonlat.transform(x, y)
    return lon, lat

def calc_xy_distance(x1, y1, x2, y2):
    # Euclidean distance in meters between two projected points
    return math.hypot(x2 - x1, y2 - y1)
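`calc_xy_distance` treats UTM coordinates as a flat plane. Over a city-sized area that is very accurate, and one way to confirm it is to compare against the great-circle (haversine) distance computed directly from latitude/longitude. This is a sketch assuming a spherical Earth with a mean radius of 6,371,008.8 m; `haversine_m` is a hypothetical helper, not part of the notebook.

```python
import math

# Mean Earth radius in meters (spherical approximation, an assumption of this sketch)
EARTH_RADIUS_M = 6371008.8


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# 0.001 degree of latitude is roughly 111 m
print(round(haversine_m(-77.0, 38.9, -77.0, 38.901), 2))
```

To use it as a check, project two points with `lonlat_to_xy`, compute `calc_xy_distance`, and compare with `haversine_m` on the original coordinates. At this scale the two should agree to within about half a percent, which is the accuracy limit of the spherical approximation rather than of the projection.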

In [4]:
# convert city center to Cartesian (UTM) coordinates
dc_center_x, dc_center_y = lonlat_to_xy(dc_center[1], dc_center[0])
# define city radius and distance between neighborhood centers
radius = 5000 # meters
step = 500 # meters

k = math.sqrt(3) / 2 # vertical offset between rows of a hexagonal grid
n_cols = int(2 * radius / step) + 1 # 21 columns span the city diameter
n_rows = int(n_cols / k) # enough rows to span the diameter at a row spacing of step*k
x_min = dc_center_x - radius
x_step = step
y_min = dc_center_y - radius - (n_rows*k*step - 2*radius)/2 # center the rows vertically
y_step = step * k

latitudes = []
longitudes = []
xs = []
ys = []
for i in range(n_rows):
    y = y_min + i * y_step
    x_offset = step/2 if i%2==0 else 0 # shift alternate rows by half a step
    for j in range(n_cols):
        x = x_min + j * x_step + x_offset
        distance_from_center = calc_xy_distance(dc_center_x, dc_center_y, x, y)
        # keep only centers inside the city radius (+1 m tolerance for rounding)
        if distance_from_center <= radius + 1:
            lon, lat = xy_to_lonlat(x, y)
            latitudes.append(lat)
            longitudes.append(lon)
            xs.append(x)
            ys.append(y)

print(len(latitudes), 'candidate neighborhood centers generated.')

364 candidate neighborhood centers generated.


Visualize the city center and candidate neighborhoods

In [5]:
map_dc = folium.Map(location=dc_center, zoom_start=13)
folium.Marker(dc_center, popup='Washington, D.C.').add_to(map_dc)
for lat, lon in zip(latitudes, longitudes):
    folium.Circle([lat, lon], radius=step/2, color='blue', fill=False).add_to(map_dc)
map_dc

Add the neighborhood centers to a DataFrame

In [6]:
df_locations = pd.DataFrame({'Latitude': latitudes,
                             'Longitude': longitudes,
                             'X': xs,
                             'Y': ys})

df_locations.head(10)

| | Latitude | Longitude | X | Y |
|---|---|---|---|---|
| 0 | 38.864004 | -77.029468 | 323920.987388 | 4303642.0 |
| 1 | 38.864104 | -77.023708 | 324420.987388 | 4303642.0 |
| 2 | 38.864204 | -77.017949 | 324920.987388 | 4303642.0 |
| 3 | 38.864303 | -77.012189 | 325420.987388 | 4303642.0 |
| 4 | 38.864402 | -77.006429 | 325920.987388 | 4303642.0 |
| 5 | 38.864501 | -77.00067 | 326420.987388 | 4303642.0 |
| 6 | 38.8646 | -76.99491 | 326920.987388 | 4303642.0 |
| 7 | 38.867753 | -77.038219 | 323170.987388 | 4304075.0 |
| 8 | 38.867854 | -77.032459 | 323670.987388 | 4304075.0 |
| 9 | 38.867954 | -77.026699 | 324170.987388 | 4304075.0 |


Now I will use the Foursquare API to get the name and location of every Ethiopian restaurant within 5 km of the D.C. city center.

In [7]:
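# Foursquare credentials - removed before posting on GitHub; fill in your own
client_id = 'YOUR_CLIENT_ID'
client_secret = 'YOUR_CLIENT_SECRET'

version = '20180604' # API version date
limit = 100 # maximum number of venues to return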

In [8]:
ethiopian_restaurant_category = '4bf58dd8d48988d10a941735' # Foursquare category ID for Ethiopian restaurants
params = {
    'client_id': client_id,
    'client_secret': client_secret,
    'v': version,
    'll': '{},{}'.format(dc_center[0], dc_center[1]),
    'categoryId': ethiopian_restaurant_category,
    'radius': radius, # search within 5 km of the city center
    'limit': limit,
}
results = requests.get('https://api.foursquare.com/v2/venues/explore', params=params).json()['response']['groups'][0]['items']

# keep the ID, name, and coordinates of each venue
venues = [(item['venue']['id'],
           item['venue']['name'],
           item['venue']['location']['lat'],
           item['venue']['location']['lng']) for item in results]
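The cell above indexes straight into `['response']['groups'][0]['items']`, so a bad key or an empty result surfaces as an unexplained `KeyError` or `IndexError`, and a result list exactly `limit` long may be silently cut off. A defensive variant is sketched below. It assumes the response shape used above plus a top-level `meta` dict whose `code` is 200 on success, which is an assumption about the Foursquare v2 API; `extract_venues` is a hypothetical helper.

```python
import warnings


def extract_venues(payload, limit):
    """Return (id, name, lat, lng) tuples from a parsed Foursquare v2 explore response.

    Assumes the response shape used in the notebook, plus a top-level 'meta'
    dict whose 'code' is 200 on success (an assumption about the v2 API).
    """
    meta = payload.get('meta', {})
    if meta.get('code', 200) != 200:
        # surface the API's error details instead of an opaque KeyError/IndexError
        raise RuntimeError('Foursquare request failed: {}'.format(meta))
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    venues = [(item['venue']['id'],
               item['venue']['name'],
               item['venue']['location']['lat'],
               item['venue']['location']['lng']) for item in items]
    if len(venues) >= limit:
        # hitting the limit suggests more matching venues may exist
        warnings.warn('Got {} venues (the limit); results may be truncated.'.format(len(venues)))
    return venues
```

To use it, pass the parsed JSON (`requests.get(...).json()`) together with `limit` instead of indexing into the response directly.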

In [9]:
# read into a dataframe
restaurant_df = pd.DataFrame(venues, columns = ['ID', 'Name', 'Lat', 'Lon']) 
restaurant_df.head()

| | ID | Name | Lat | Lon |
|---|---|---|---|---|
| 0 | 4b4fb0dff964a520171127e3 | Habesha | 38.916372 | -77.023947 |
| 1 | 468e1270f964a52077481fe3 | Dukem | 38.916846 | -77.027968 |
| 2 | 4e4e45d7bd4101d0d7a68a73 | Keren Restaurant | 38.917015 | -77.04134 |
| 3 | 4ba40feff964a5206e7d38e3 | Ethiopic | 38.900034 | -77.000395 |
| 4 | 4fc00d74e4b04422a09e6b25 | CherCher | 38.90836 | -77.0242 |


In [10]:
map_dc = folium.Map(location=dc_center, zoom_start=13)
folium.Marker(dc_center, popup='Washington, D.C.').add_to(map_dc)
for lat, lon in zip(latitudes, longitudes):
    folium.Circle([lat, lon], radius=step/2, color='blue', fill=False).add_to(map_dc)
for lat, lon in zip(list(restaurant_df['Lat']), list(restaurant_df['Lon'])):
    folium.CircleMarker([lat, lon], radius=3, color='red', fill=True, fill_opacity=1).add_to(map_dc)
map_dc

Download metro station data

In [11]:
# WMATA API key - removed before posting on GitHub; fill in your own
api_key = 'YOUR_WMATA_API_KEY'


In [12]:
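# read in all Metrorail stations (the endpoint covers the whole system, including Maryland and Virginia)
headers = {'api_key': api_key}
response = requests.get('https://api.wmata.com/Rail.svc/json/jStations', headers=headers)
response.raise_for_status() # stop early on a bad key or HTTP error
results = response.json()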

In [13]:
metro_df = pd.DataFrame(results['Stations'])
metro_df.head()

| | Code | Name | StationTogether1 | StationTogether2 | LineCode1 | LineCode2 | LineCode3 | LineCode4 | Lat | Lon | Address |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | A01 | Metro Center | C01 | | RD | | | | 38.898303 | -77.028099 | {'Street': '607 13th St. NW', 'City': 'Washing... |
| 1 | A02 | Farragut North | | | RD | | | | 38.903192 | -77.039766 | {'Street': '1001 Connecticut Avenue NW', 'City... |
| 2 | A03 | Dupont Circle | | | RD | | | | 38.909499 | -77.04362 | {'Street': '1525 20th St. NW', 'City': 'Washin... |
| 3 | A04 | Woodley Park-Zoo/Adams Morgan | | | RD | | | | 38.924999 | -77.052648 | {'Street': '2700 Connecticut Ave., NW', 'City'... |
| 4 | A05 | Cleveland Park | | | RD | | | | 38.934703 | -77.058226 | {'Street': '3599 Connecticut Avenue NW', 'City... |
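In the station table, Metro Center (A01) lists C01 in `StationTogether1`, which suggests that a transfer station can appear once per station code. If each code gets its own row in `metro_df`, a transfer station inside a neighborhood adds 2 to `metro_count`, which is worth checking for the neighborhoods with two metro stops. The sketch below keeps one code per linked group. It assumes links are symmetric (A01 points to C01 and C01 points back to A01) and that empty links are empty strings or missing; `drop_linked_station_codes` is a hypothetical helper.

```python
def drop_linked_station_codes(stations):
    """Keep one entry per physical station.

    `stations` is a list of dicts shaped like the WMATA 'Stations' entries.
    A station is kept only if its Code sorts before every code it is linked
    to via StationTogether1/StationTogether2, so with symmetric links exactly
    one code of each linked group survives.
    """
    kept = []
    for station in stations:
        linked = [
            code
            for code in (station.get('StationTogether1'), station.get('StationTogether2'))
            if isinstance(code, str) and code  # skip empty strings and missing values
        ]
        if all(station['Code'] < code for code in linked):
            kept.append(station)
    return kept
```

For example, `pd.DataFrame(drop_linked_station_codes(results['Stations']))` would build the station table with Metro Center represented only by A01.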


In [14]:
map_dc = folium.Map(location=dc_center, zoom_start=13)
folium.Marker(dc_center, popup='Washington, D.C.').add_to(map_dc)
for lat, lon in zip(latitudes, longitudes):
    folium.Circle([lat, lon], radius=step/2, color='blue', fill=False).add_to(map_dc)
for lat, lon in zip(list(metro_df['Lat']), list(metro_df['Lon'])):
    folium.CircleMarker([lat, lon], radius=3, color='red', fill=True, fill_opacity=1).add_to(map_dc)
map_dc

## Methodology <a name="methodology"></a>

In this project, I will find areas of Washington, D.C. that have a low density of Ethiopian restaurants and are near a metro stop.

In the first step, I collected the relevant data: the location of every Ethiopian restaurant and metro stop in D.C.

In the analysis step, I will count the metro stops and Ethiopian restaurants in every neighborhood, assigning each one to its nearest neighborhood center if it lies within 250 meters (the neighborhood radius) of that center.

In the last step, I will identify the neighborhoods that have at least one metro stop and the fewest Ethiopian restaurants.
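The selection in the last step can be expressed as one filter-and-sort over the neighborhood table. This is a minimal pandas sketch that assumes the `eth_rest_count` and `metro_count` columns built in the Analysis section; `rank_candidates` is a hypothetical helper. Because it returns a new DataFrame instead of sorting a filtered slice in place, it does not trigger pandas' `SettingWithCopyWarning`.

```python
import pandas as pd


def rank_candidates(neighborhoods):
    """Neighborhoods with at least one metro stop, ordered by fewest Ethiopian
    restaurants first and, among those, most metro stops first.

    Returns a new DataFrame; the input is not modified.
    """
    has_metro = neighborhoods[neighborhoods['metro_count'] >= 1]
    return has_metro.sort_values(['eth_rest_count', 'metro_count'],
                                 ascending=[True, False])
```

With the notebook's data, `rank_candidates(df_locations).head(10)` would list the strongest candidates first.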

## Analysis <a name="analysis"></a>

Count how many Ethiopian restaurants are in each neighborhood.

In [15]:
# copy the neighborhood df; its 'distance' column is overwritten for each restaurant
df_locations_temp = copy.copy(df_locations)
# initialize a new column that tallies the number of Ethiopian restaurants in each neighborhood
df_locations['eth_rest_count'] = 0

In [16]:
# go through each restaurant and determine which neighborhood it is in
for i, row in restaurant_df.iterrows():
    restaurant_lat = float(row['Lat'])
    restaurant_lon = float(row['Lon'])
    # convert to cartesian coordinates
    restaurant_x, restaurant_y = lonlat_to_xy(restaurant_lon, restaurant_lat)
    # calculate distance to each neighborhood
    for n_i, n_row in df_locations_temp.iterrows():
        dis = calc_xy_distance(restaurant_x, restaurant_y, n_row.X, n_row.Y)
        df_locations_temp.loc[n_i, 'distance'] = dis
    # if the distance falls within the radius of the neighborhood closest to it, record it
    if df_locations_temp.distance.min() <= (step/2):
        # find the minimum distance neighborhood (should be the neighborhood it is inside of)
        min_idx = df_locations_temp.index[df_locations_temp.distance==df_locations_temp.distance.min()][0]
        # indicate that we have found a/another ethiopian restaurant in the original neighborhood df
        df_locations.loc[min_idx, 'eth_rest_count'] += 1



Count how many metro stops are in each neighborhood.

In [17]:
# initialize a new column that tallies the number of metro stops in each neighborhood
df_locations['metro_count'] = 0

In [18]:
# go through each metro stop and determine which neighborhood it is in
for i, row in metro_df.iterrows():
    metro_lat = float(row['Lat'])
    metro_lon = float(row['Lon'])
    # convert to cartesian coordinates
    metro_x, metro_y = lonlat_to_xy(metro_lon, metro_lat)
    # calculate distance to each neighborhood
    for n_i, n_row in df_locations_temp.iterrows():
        dis = calc_xy_distance(metro_x, metro_y, n_row.X, n_row.Y)
        df_locations_temp.loc[n_i, 'distance'] = dis

    # if the distance falls within the radius of the neighborhood closest to it, record it
    if df_locations_temp.distance.min() <= (step/2):
        # find the minimum distance neighborhood (should be the neighborhood it is inside of)
        min_idx = df_locations_temp.index[df_locations_temp.distance==df_locations_temp.distance.min()][0]
        # indicate that we have found a/another metro stop in the original neighborhood df
        df_locations.loc[min_idx, 'metro_count'] += 1
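Both loops above iterate over all 364 neighborhoods with `iterrows()` for every restaurant or station and write each distance back with `.loc`, which is slow. The same nearest-center rule can be computed for all points at once with NumPy broadcasting. This is a sketch assuming NumPy is available (the notebook does not import it directly, although pandas depends on it); `count_points_per_center` is a hypothetical helper.

```python
import numpy as np


def count_points_per_center(point_xy, center_xy, max_dist):
    """Count how many points fall in each neighborhood.

    Each point is assigned to its nearest center (ties go to the lower index,
    like the loop's `[0]`) and counted only if that center is within max_dist.
    point_xy: shape (P, 2); center_xy: shape (C, 2); both in meters.
    Returns an int array of length C.
    """
    point_xy = np.asarray(point_xy, dtype=float).reshape(-1, 2)
    center_xy = np.asarray(center_xy, dtype=float).reshape(-1, 2)
    # (P, C) matrix of Euclidean distances via broadcasting
    diff = point_xy[:, None, :] - center_xy[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    nearest = dist.argmin(axis=1)
    inside = dist[np.arange(len(point_xy)), nearest] <= max_dist
    return np.bincount(nearest[inside], minlength=len(center_xy))


# Usage with the notebook's variables (not run here):
# centers = df_locations[['X', 'Y']].to_numpy()
# metro_xy = [lonlat_to_xy(lon, lat) for lat, lon in zip(metro_df['Lat'], metro_df['Lon'])]
# df_locations['metro_count'] = count_points_per_center(metro_xy, centers, step / 2)
```

Memory grows with P × C, which is small here (a few hundred stations or restaurants against 364 centers).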



Find the neighborhoods with at least one metro stop.

In [19]:
df_1_metro = df_locations[df_locations.metro_count>=1]
df_1_metro.shape

(26, 6)

Determine which of those neighborhoods have the fewest Ethiopian restaurants.

In [20]:
df_1_metro = df_1_metro.sort_values('eth_rest_count')
df_1_metro.head(10)


| | Latitude | Longitude | X | Y | eth_rest_count | metro_count |
|---|---|---|---|---|---|---|
| 36 | 38.875953 | -77.015399 | 325170.987388 | 4304941.0 | 0 | 1 |
| 292 | 38.930248 | -77.034239 | 323670.987388 | 4311004.0 | 0 | 1 |
| 270 | 38.925995 | -77.054303 | 321920.987388 | 4310571.0 | 0 | 1 |
| 243 | 38.919196 | -76.996438 | 326920.987388 | 4309705.0 | 0 | 1 |
| 196 | 38.910599 | -77.042327 | 322920.987388 | 4308839.0 | 0 | 1 |
| 183 | 38.907349 | -77.004755 | 326170.987388 | 4308406.0 | 0 | 1 |
| 141 | 38.899251 | -77.021823 | 324670.987388 | 4307539.0 | 0 | 2 |
| 124 | 38.8956 | -77.007307 | 325920.987388 | 4307106.0 | 0 | 1 |
| 122 | 38.895401 | -77.018832 | 324920.987388 | 4307106.0 | 0 | 1 |
| 136 | 38.898749 | -77.050635 | 322170.987388 | 4307539.0 | 0 | 1 |


In [21]:
df_best = df_1_metro[df_1_metro.eth_rest_count==0]
df_best.shape

(20, 6)

In [22]:
df_best = df_best.sort_values('metro_count', ascending=False)
display(df_best)


| | Latitude | Longitude | X | Y | eth_rest_count | metro_count |
|---|---|---|---|---|---|---|
| 362 | 38.950295 | -77.00308 | 326420.987388 | 4313169.0 | 0 | 2 |
| 141 | 38.899251 | -77.021823 | 324670.987388 | 4307539.0 | 0 | 2 |
| 66 | 38.883653 | -77.021381 | 324670.987388 | 4305807.0 | 0 | 2 |
| 71 | 38.884148 | -76.992575 | 327170.987388 | 4305807.0 | 0 | 1 |
| 120 | 38.895202 | -77.030356 | 323920.987388 | 4307106.0 | 0 | 1 |
| 38 | 38.876151 | -77.003878 | 326170.987388 | 4304941.0 | 0 | 1 |
| 56 | 38.880395 | -76.983824 | 327920.987388 | 4305374.0 | 0 | 1 |
| 59 | 38.882947 | -77.061709 | 321170.987388 | 4305807.0 | 0 | 1 |
| 67 | 38.883752 | -77.01562 | 325170.987388 | 4305807.0 | 0 | 1 |
| 69 | 38.883951 | -77.004097 | 326170.987388 | 4305807.0 | 0 | 1 |


We see that there are many neighborhoods with at least one metro stop and no Ethiopian restaurants. Let's pull out those with two metro stops because they have increased accessibility. 

In [23]:
df_best_2_metro = df_best[df_best['metro_count']==2]
display(df_best_2_metro)

| | Latitude | Longitude | X | Y | eth_rest_count | metro_count |
|---|---|---|---|---|---|---|
| 362 | 38.950295 | -77.00308 | 326420.987388 | 4313169.0 | 0 | 2 |
| 141 | 38.899251 | -77.021823 | 324670.987388 | 4307539.0 | 0 | 2 |
| 66 | 38.883653 | -77.021381 | 324670.987388 | 4305807.0 | 0 | 2 |


Let's map, in purple, the neighborhoods that have at least one metro stop and no Ethiopian restaurants. Neighborhoods with two metro stops are drawn in red so they stand out, since these areas have increased accessibility.

In [24]:
map_dc = folium.Map(location=dc_center, zoom_start=13)
folium.Marker(dc_center, popup='Washington, D.C.').add_to(map_dc)
for lat, lon in zip(df_best.Latitude, df_best.Longitude):
    folium.Circle([lat, lon], radius=step/2, color='purple', fill=True, opacity=0.5).add_to(map_dc)
for lat, lon in zip(df_best_2_metro.Latitude, df_best_2_metro.Longitude):
    folium.Circle([lat, lon], radius=step/2, color='red', fill=True, opacity=0.5).add_to(map_dc)
map_dc

## Results and Discussion <a name="results"></a>

While Washington, D.C. has a large number of restaurants, including many Ethiopian restaurants, this analysis shows that several areas of the city have a low density of Ethiopian restaurants and are still accessible to a wide customer base via the metro.

First, we determined which neighborhoods had at least one metro stop. Because each neighborhood is a circle with a 250-meter radius, any location in such a neighborhood is within 500 meters (the neighborhood's diameter) of a station. We found 26 neighborhoods in D.C. with a metro stop inside.

Next, we checked the Ethiopian restaurant density in these neighborhoods. We found 20 neighborhoods that had a metro stop and currently contain no Ethiopian restaurants. Three of these neighborhoods had two metro stops, making them even more accessible.

These three areas could be strong candidates for a new Ethiopian restaurant. However, more analysis is needed to determine which of them is best. We want to make sure these are desirable neighborhoods in which to eat and explore other things to do; for example, people might be more likely to go out to dinner in an area with fun bars to visit afterward! We recommend that the areas identified by this analysis be explored further before the final location is chosen.

## Conclusion <a name="conclusion"></a>

The purpose of this project was to help our stakeholders narrow down the best locations to start a new Ethiopian restaurant in Washington, D.C. By determining which areas have metro stations, we first found locations that can be easily visited by a wide variety of customers. Next, we found a number of these neighborhoods that have no competing Ethiopian restaurants. These areas are a great starting point for further exploration by the stakeholder. The final decision on restaurant placement should be made by the stakeholder after taking other important factors into account.