<h1>Capstone Project: The Battle of the Neighborhoods</h1>

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)



## Introduction: Business Problem <a name="introduction"></a>

In this project we will try to find an optimal location for a restaurant. Specifically, this report will be targeted to stakeholders interested in opening a **restaurant** in **Moscow**, Russia. The interest in this topic comes from the desire to have as little competition as possible while having a target audience nearby.
We are particularly interested in **areas close to business or office centers**, as we will try to build our business model around serving business lunches. Also we will try to find areas with low restaurant density, as we would like to have little to none competition.

## Data <a name="data"></a>

The factors to be considered are the following:

- number of existing restaurants in the neighborhood (any type of restaurant)
- number of and distance to business centers in the neighborhood, if any

We use a regular grid centered around the city center to define neighborhoods.

Following data sources will be needed to extract/generate the required information:

Centers of candidate areas will be generated algorithmically and approximate addresses of centers of those areas will be obtained using Google Maps API reverse geocoding
number of restaurants and business centers in every neighborhood will be obtained using Foursquare API
coordinate of Moscow center will be obtained using geopy api.

The examples of the data obtained can be found in the sections below. Map visualization and table representation are used to illustrate the data.

### Neighborhood Candidates

Let's create latitude & longitude coordinates for centroids of our candidate neighborhoods. We will create a grid of cells covering our area of interest which is aprox. 40x40 killometers centered around Moscow city center. Note that we do not consider New Moscow south western region.

Let's first find the latitude & longitude of Moscow city center.

We import the required libs first.

In [190]:
import requests # library to handle requests
import pandas as pd # library for data analsysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation

!conda install -c conda-forge geocoder geopy shapely pyproj folium=0.5.0 --yes
!pip install transliterate
import geocoder
from transliterate import translit, get_available_language_codes

from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images
from IPython.display import Image 
from IPython.core.display import HTML 
    
# tranforming json file into a pandas dataframe library
from pandas.io.json import json_normalize

import folium # plotting library
import math
import matplotlib.cm as cm
import matplotlib.colors as colors
import shapely.geometry

import pyproj

import math

print('Folium installed')
print('Libraries imported.')

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... done

# All requested packages already installed.

Folium installed
Libraries imported.


Now we find out the coordinate required.

In [191]:
address = 'The Red Square, Moscow, Russian Federation'

geolocator = Nominatim(user_agent="se_explorer")
location = geolocator.geocode(address)
moscow_latitude = location.latitude
moscow_longitude = location.longitude
print('The geograpical coordinate of Moscow are {}, {}.'.format(moscow_latitude, moscow_longitude))

The geograpical coordinate of Moscow are 55.7536283, 37.62137960067377.


Now we generate a **hex grid** covering Moscow region. The distance calculation will be performed in meters, so we add functions to convert lat/lon to cartesian coordinates an vice versa:

In [192]:
proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
proj_cartesian = pyproj.Proj(proj="utm", zone=37, datum='WGS84')

def lonlat_to_cartesian(lon, lat):
    xy = pyproj.transform(proj_latlon, proj_cartesian, lon, lat)
    return xy[0], xy[1]

def cartesian_to_lonlat(x, y):
    lonlat = pyproj.transform(proj_cartesian, proj_latlon, x, y)
    return lonlat[0], lonlat[1]

def dist(x1,y1,x2,y2):
    dx = x1-x2
    dy = y1-y2
    return math.sqrt(dx*dx+dy*dy)

#let's check ourselves on known values

x,y = lonlat_to_cartesian(moscow_longitude, moscow_latitude)
print('Moscow cartesian coordinate are {}, {}'.format(x,y))
lon, lat = cartesian_to_lonlat(x, y)
print('Moscow lonlat coordinate are {}, {}'.format(lat, lon))

Moscow cartesian coordinate are 413475.34048196295, 6179520.184200608
Moscow lonlat coordinate are 55.7536283, 37.62137960067377


Now we are ready for grid generation. Let's assume that an average pedestrian **speed is 5 km/h**. We make the distance between the centroids approx. equal to **20 minute walk**, so that the radius is approx. **10 minute walk**.

In [193]:
minutes = 20
centroids = []
centroid_distance = 5000*minutes/60
print("centroid distance (10 minute walk) is {}".format(centroid_distance))
size = 40000
x_offset = centroid_distance/2
x_step = centroid_distance
coef = math.sqrt(3)/2
y_step = centroid_distance*coef
centroid_limit = size/centroid_distance
print("centroid limit is {}".format(centroid_limit))
x_min = x-size/2
y_min = y-size/2-(int(centroid_limit/coef)*coef*centroid_distance/2 - size/2)/2
for i in range(0, int(centroid_limit/coef)):
    for j in range(0, int(centroid_limit)):
        pt_y = y_min+i*y_step
        pt_x = x_min+j*x_step
        if i%2==0:
            pt_x = pt_x+x_offset
        distance = dist(x,y,pt_x, pt_y)
        if distance < size/2:
            centroids.append([pt_x, pt_y])
print("Centroids generated: {}".format(len(centroids)))


centroid distance (10 minute walk) is 1666.6666666666667
centroid limit is 24.0
Centroids generated: 515


Let's add the centroids on the map to visualize our grid

In [194]:
moscow_map = folium.Map(location=[moscow_latitude, moscow_longitude], zoom_start=10)
folium.Marker([moscow_latitude, moscow_longitude], popup='Red Square').add_to(moscow_map)
lat_array = [];
lon_array = [];
xs = [];
ys = [];
for xi, yi in centroids :
    lon, lat = cartesian_to_lonlat(xi, yi)
    lat_array.append(lat);
    lon_array.append(lon);
    xs.append(xi)
    ys.append(yi)
    folium.Circle([lat, lon], centroid_distance/2 , color='blue', fill=False).add_to(moscow_map)
moscow_map

Now let's load aur data to a pandas dataframe for futher processing

In [195]:

locations = pd.DataFrame({'index': range(0, len(centroids)), 'Latitude': lat_array,'Longitude': lon_array,'X': xs,'Y': ys})
print("Location size: {}".format(locations.shape[0]))
locations.head(10)

Location size: 515


Unnamed: 0,index,Latitude,Longitude,X,Y
0,0,55.575832,37.587961,410975.340482,6159777.0
1,1,55.576133,37.614387,412642.007149,6159777.0
2,2,55.576429,37.640814,414308.673815,6159777.0
3,3,55.576719,37.667242,415975.340482,6159777.0
4,4,55.588018,37.521409,406808.673815,6161221.0
5,5,55.588334,37.547843,408475.340482,6161221.0
6,6,55.588644,37.574278,410142.007149,6161221.0
7,7,55.588949,37.600713,411808.673815,6161221.0
8,8,55.589248,37.627148,413475.340482,6161221.0
9,9,55.589541,37.653584,415142.007149,6161221.0


### Using Foursquare API

Now we are ready to query Foursquare API for venue data.

In [196]:
# @hidden cell
CLIENT_ID = 'OOUVZP5UUDLAVGILSUR1CUYV4MUXOS5VHH4KM4DRBBMXYJB4' # your Foursquare ID
CLIENT_SECRET = 'ZOH4REJZN3UBF4SSK1VBYPZUIZDMO3OFIV3RUAGZWFR1RJST' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT=200

In [197]:
#the category ids are taken from https://developer.foursquare.com/docs/build-with-foursquare/categories/

business_center_category = '56aa371be4b08b9a8d573517'
food_category = '4d4b7105d754a06374d81259'

#we define a function to query data by specific category
def get_nearby_venues(indices, category, latitudes, longitudes, radius):
    
    venues_list=[]
    for index, lat, lng in zip(indices, latitudes, longitudes):
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&categoryId={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            category,
            lat, 
            lng, 
            radius, 
            LIMIT)
        try:
            
            # make the GET request
            results = requests.get(url).json()["response"]['groups'][0]['items']
        
            # return only relevant information for each nearby venue
            venues_list.append([(
                index, 
                lat, 
                lng, 
                v['venue']['name'], 
                v['venue']['location']['lat'], 
                v['venue']['location']['lng'],  
                v['venue']['categories'][0]['name']) for v in results])
        except:
            print("failed to receive venue info")

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['index', 
                  'Latitude', 
                  'Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [198]:
#first, let's get business centers within each circle
try:
    business_venues = pd.read_csv(r'./business_venues.csv')
except:
    business_venues = get_nearby_venues(locations['index'],business_center_category, locations['Latitude'], locations['Longitude'], centroid_distance/2)
    business_venues.to_csv(r'./business_venues.csv')
business_venues.head()

Unnamed: 0.1,Unnamed: 0,index,Latitude,Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,0,1,55.576133,37.614387,БЦ Аннино Плаза,55.579225,37.605989,Business Center
1,1,8,55.589248,37.627148,"БЦ ""БИРПАРК""",55.583648,37.621206,Business Center
2,2,15,55.601142,37.534143,Мореон Офисная Часть,55.597564,37.527472,Business Center
3,3,17,55.601763,37.587029,"ТОЦ ""Селектика"" / Selectica MFC",55.598147,37.596825,Business Center
4,4,17,55.601763,37.587029,Бизнес-парк «Solutions» (корпус 2),55.60097,37.599675,Business Center


In [199]:
#now, let's get restaurants within each circle
try:
    restaurant_venues = pd.read_csv(r'./restaurant_venues.csv')
except:
    restaurant_venues = get_nearby_venues(locations['index'],food_category, locations['Latitude'], locations['Longitude'], centroid_distance/2)
    restaurant_venues.to_csv(r'./restaurant_venues.csv')
restaurant_venues.head()

Unnamed: 0.1,Unnamed: 0,index,Latitude,Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,0,0,55.575832,37.587961,Ильгар Д,55.575318,37.600407,Café
1,1,0,55.575832,37.587961,Шалахо,55.568841,37.587528,Middle Eastern Restaurant
2,2,0,55.575832,37.587961,БентоWOK,55.570334,37.580642,Chinese Restaurant
3,3,0,55.575832,37.587961,The 33 loft Grill&Bar,55.575008,37.579573,BBQ Joint
4,4,0,55.575832,37.587961,Хинкали от души,55.57467,37.579529,Café


In [200]:
print("Business centers found: {}".format(business_venues.shape[0]))
print("Restaurants found: {}".format(restaurant_venues.shape[0]))

Business centers found: 730
Restaurants found: 7579


Not let's place our data on the map to visualize it. The blue dots are business centers, the red ones are restaurants.

In [201]:
moscow_map = folium.Map(location=[moscow_latitude, moscow_longitude], zoom_start=11)
folium.Marker([moscow_latitude, moscow_longitude], popup='Red Square').add_to(moscow_map)
for index, row in business_venues.iterrows() :
    folium.Circle([row['Venue Latitude'], row['Venue Longitude']], 1 , color='blue', fill=False).add_to(moscow_map)
for index, row in restaurant_venues.iterrows() :
    folium.Circle([row['Venue Latitude'], row['Venue Longitude']], 1 , color='red', fill=False).add_to(moscow_map)
moscow_map

Now we have all the data required, moving to the next phase.

## Methodology <a name="methodology"></a>

In this project we will direct our efforts on detecting areas of Mosckw that have low restaurant density and having at least one business center within 10 minute walk radius.

In first step we have collected the required **data: location of  restaurants and business centers in Moscow**.

Second step in our analysis will be calculation and exploration of '**restaurant density**' and **business centers** distribution across different areas of Moscow - we will use **heatmaps** to identify a few promising areas.

Finally we will calculate the score of each area and detect the **top-10** areas. Some possible ways to alter the selection criteria will be siggested.

## Analysis <a name="analysis"></a>

Let's explore aour data. First, we build a **heatmap** of the region with business centers marked on it.

In [202]:
from folium import plugins
from folium.plugins import HeatMap

restaurant_latlons = restaurant_venues[['Venue Latitude','Venue Longitude']].values
moscow_map = folium.Map(location=[moscow_latitude, moscow_longitude], zoom_start=11)
HeatMap(restaurant_latlons.tolist()).add_to(moscow_map)
folium.Marker([moscow_latitude, moscow_longitude], popup='Red Square').add_to(moscow_map)
for index, row in business_venues.iterrows() :
    folium.Circle([row['Venue Latitude'], row['Venue Longitude']], 1 , color='blue', fill=False).add_to(moscow_map)
moscow_map

The zone to **south-west** from the center looks promising: there is a pocket with low food venues count and several business centers in it.

Let's filter our data a bit. We count business venues and restaurant venues within each circle and normalize it by dividing it by its max value. Note that the factors work int the opposite directions: the **more business** centers the better, the **less restaurants** the better.

In [203]:
business_count = business_venues.groupby('index')['Venue'].count().to_frame()
restaurant_count = restaurant_venues.groupby('index')['Venue'].count().to_frame()
restaurant_count.columns = ['restaurant_venue_count']
business_count.columns = ['business_venue_count']
dataframe = locations.set_index('index').join(business_count).join(restaurant_count).fillna(0)
max_business_count = dataframe['business_venue_count'].max()
max_restaurant_count = dataframe['restaurant_venue_count'].max()
print("Max business venue count: {}".format(max_business_count))
print("Max restaurant venue count: {}".format(max_restaurant_count))
dataframe['business_frac'] = dataframe['business_venue_count']/max_business_count;
dataframe['restaurant_frac'] = dataframe['restaurant_venue_count']/max_business_count;
dataframe.head()

Max business venue count: 22.0
Max restaurant venue count: 100.0


Unnamed: 0_level_0,Latitude,Longitude,X,Y,business_venue_count,restaurant_venue_count,business_frac,restaurant_frac
index,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
0,55.575832,37.587961,410975.340482,6159777.0,0.0,14.0,0.0,0.636364
1,55.576133,37.614387,412642.007149,6159777.0,1.0,6.0,0.045455,0.272727
2,55.576429,37.640814,414308.673815,6159777.0,0.0,2.0,0.0,0.090909
3,55.576719,37.667242,415975.340482,6159777.0,0.0,6.0,0.0,0.272727
4,55.588018,37.521409,406808.673815,6161221.0,0.0,3.0,0.0,0.136364


Now we retain only points having business centers around

In [204]:
dataframe_filtered = dataframe[dataframe['business_venue_count']>0]
print("remaining poitn count: {}".format(dataframe_filtered.shape[0]))

remaining poitn count: 224


Let's show them on the map with their business centers

In [205]:
moscow_map = folium.Map(location=[moscow_latitude, moscow_longitude], zoom_start=10)
folium.Marker([moscow_latitude, moscow_longitude], popup='Red Square').add_to(moscow_map)
for index, row in dataframe_filtered.iterrows() :
    lat = row['Latitude']
    lon = row['Longitude']
    folium.Circle([lat, lon], centroid_distance/2 , color='green', fill=False).add_to(moscow_map)
for index, row in business_venues.iterrows() :
    folium.Circle([row['Venue Latitude'], row['Venue Longitude']], 1 , color='blue', fill=False).add_to(moscow_map)
moscow_map

In [206]:
#let's take top 10 locations and place them on the map
dataframe_sorted = dataframe_filtered.sort_values(by=['restaurant_frac', 'business_venue_count'], ascending =(True, False))
top10 = dataframe_sorted.head(10)
top10

Unnamed: 0_level_0,Latitude,Longitude,X,Y,business_venue_count,restaurant_venue_count,business_frac,restaurant_frac
index,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
144,55.690753,37.437988,401808.673815,6172768.0,3.0,0.0,0.136364,0.0
390,55.824315,37.778511,423475.340482,6187202.0,2.0,0.0,0.090909,0.0
489,55.889012,37.763155,422642.007149,6194418.0,1.0,1.0,0.045455,0.045455
8,55.589248,37.627148,413475.340482,6161221.0,1.0,3.0,0.045455,0.136364
79,55.65169,37.426299,400975.340482,6168438.0,1.0,3.0,0.045455,0.136364
207,55.721349,37.861332,428475.340482,6175655.0,1.0,3.0,0.045455,0.136364
253,55.747034,37.834032,426808.673815,6178541.0,1.0,3.0,0.045455,0.136364
141,55.689716,37.358484,396808.673815,6172768.0,5.0,4.0,0.227273,0.181818
18,55.602065,37.613473,412642.007149,6162664.0,3.0,4.0,0.136364,0.181818
311,55.782331,37.500769,405975.340482,6182871.0,3.0,4.0,0.136364,0.181818


In [207]:
moscow_map = folium.Map(location=[moscow_latitude, moscow_longitude], zoom_start=10)
folium.Marker([moscow_latitude, moscow_longitude], popup='Red Square').add_to(moscow_map)
for index, row in top10.iterrows() :
    lat = row['Latitude']
    lon = row['Longitude']
    folium.Circle([lat, lon], centroid_distance/2 , color='green', fill=False).add_to(moscow_map)
for index, row in business_venues.iterrows() :
    folium.Circle([row['Venue Latitude'], row['Venue Longitude']], 1 , color='blue', fill=False).add_to(moscow_map)
moscow_map

We have received top-10 results, let's rerverse geocode them to make our suggestions more human readable:

In [208]:
addrs = [];
for index, row in top10.iterrows() :
    lat = row['Latitude']
    lon = row['Longitude']
    location = geolocator.reverse([lat, lon])
    addrs.append(location.address)
addrs

['24 с1, улица Генерала Дорохова, Очаково-Матвеевское, район Очаково-Матвеевское, Западный административный округ, Москва, Центральный федеральный округ, 119530, Россия',
 '6 с1, Вербная улица, Метрогородок, район Метрогородок, Восточный административный округ, Москва, Центральный федеральный округ, 107143, Россия',
 'проезд №2, СНТ «Мосводоканал-2», Мытищи, городской округ Мытищи, Московская область, Центральный федеральный округ, 141011, Россия',
 'Ступинский проезд, Бирюлёво Западное, район Бирюлёво Западное, Южный административный округ, Москва, Центральный федеральный округ, 117403, Россия',
 'Говорово, Новомосковский административный округ, Москва, Центральный федеральный округ, 119620, Россия',
 '16, Ветлужская улица, Косино, район Косино-Ухтомский, Восточный административный округ, Москва, Центральный федеральный округ, 111622, Россия',
 '8Б, Фрязевская улица, Южное Измайлово, район Ивановское, Восточный административный округ, Москва, Центральный федеральный округ, 111396, Рос

In [209]:
i=0
for index, row in top10.iterrows():
    print('#{}: {}'.format(i+1, translit(addrs[i], 'ru', reversed=True)))
    i=i+1
    

#1: 24 s1, ulitsa Generala Dorohova, Ochakovo-Matveevskoe, rajon Ochakovo-Matveevskoe, Zapadnyj administrativnyj okrug, Moskva, Tsentral'nyj federal'nyj okrug, 119530, Rossija
#2: 6 s1, Verbnaja ulitsa, Metrogorodok, rajon Metrogorodok, Vostochnyj administrativnyj okrug, Moskva, Tsentral'nyj federal'nyj okrug, 107143, Rossija
#3: proezd №2, SNT «Mosvodokanal-2», Mytischi, gorodskoj okrug Mytischi, Moskovskaja oblast', Tsentral'nyj federal'nyj okrug, 141011, Rossija
#4: Stupinskij proezd, Birjulevo Zapadnoe, rajon Birjulevo Zapadnoe, Juzhnyj administrativnyj okrug, Moskva, Tsentral'nyj federal'nyj okrug, 117403, Rossija
#5: Govorovo, Novomoskovskij administrativnyj okrug, Moskva, Tsentral'nyj federal'nyj okrug, 119620, Rossija
#6: 16, Vetluzhskaja ulitsa, Kosino, rajon Kosino-Uhtomskij, Vostochnyj administrativnyj okrug, Moskva, Tsentral'nyj federal'nyj okrug, 111622, Rossija
#7: 8B, Frjazevskaja ulitsa, Juzhnoe Izmajlovo, rajon Ivanovskoe, Vostochnyj administrativnyj okrug, Moskva, Tse

This is the final list of our suggestions transliterated into English.

## Results and Discussion <a name="results"></a>

Our analisis shows that there are some areas with low restaurant dencity having business centers nearby. We extracted top-10 zones and generated suggestions.

Note that the selection criteria can be altered. We have a **pre-calculated values** for restaurant score and business center score. We can have a result score as a weighed sum of these values, if the stakeholders' preferences could be expressed in such way. Then we can sort our data in a different way and have another suggestion set.

There are some factores not considered here which can make significant impact on the final decision, such as rent value, transport avalability etc.

## Conclusion <a name="conclusion"></a>

The purpose of this project was to find a data-based suggestions of a new restaurant location in Moscow. We used Foursquare API to receive data about venues in the area and generated a set of locations to be considered. There are some ways to alter the decision criteria give in the discussion section.