# Coursera Capstone - Battle of the Neighborhoods

## Contents

* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)



### Introduction <a name="introduction"></a>

This analysis tries to answer two questions:

1) Where should an Indian restaurant be opened in Cambridge?
2) Should it be opened in an area of historically high house prices, or in Central Cambridge? Central Cambridge has high house prices, but other areas have similarly high house prices.

The approach should be replicable for (a) other restaurant types and (b) other cities.


## Data <a name="data"></a>

The factors that will influence our analysis are:

- The number of existing restaurants in a microlocation

- The number of Indian restaurants in the neighborhood, and their distance from it

- The distance of the neighborhood from the city center

I used a regularly spaced location grid centered around a key location in the city center (King's College).

The following data sources are needed to generate the information and analysis required:

## Data Sources

### Venue Data

Foursquare.com 
### Location and Borough Data
GeoNames: Cambridge borough data set and GPS data, downloaded from GeoNames (in the UK_full zip file)
http://download.geonames.org/export/zip/

### Housing Price Data

Price paid data for 2019 and 2020, downloaded from HM Land Registry
https://www.gov.uk/government/statistical-data-sets/price-paid-data-downloads

* Centers of candidate areas will be generated algorithmically, and approximate addresses of those centers will be obtained using **geocoder**
* The number, type and location of restaurants in every neighborhood will be obtained using the **Foursquare API**
* The coordinates of Cambridge city center will be obtained by geocoding a well-known Cambridge location (King's College) with **Nominatim** (OpenStreetMap) via **geopy**
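The housing price data listed above is not processed anywhere in the notebook, yet the second question depends on it. As a minimal sketch, assuming the price paid CSV has been loaded into a pandas dataframe with columns named `price` and `postcode` (the raw downloads have no header row, so these names are our own choice and the column order should be checked against HM Land Registry's documentation), median prices per postcode district could be computed as follows. The helper `median_price_by_district` and the toy rows are hypothetical.

```python
import pandas as pd


def median_price_by_district(df, price_col="price", postcode_col="postcode"):
    """Median sale price per postcode district, highest first.

    A postcode district is the outward code before the space, e.g. 'CB2' in 'CB2 1ST'.
    The column names are hypothetical; assign them when loading the headerless CSV.
    """
    districts = df[postcode_col].str.strip().str.split().str[0]
    return df[price_col].groupby(districts).median().sort_values(ascending=False)


# Toy data for illustration only (not real sales)
sales = pd.DataFrame({
    "price": [100_000, 300_000, 500_000],
    "postcode": ["CB2 1ST", "CB2 1QY", "CB1 2AB"],
})
print(median_price_by_district(sales))
```

Districts with a high median could then be compared against the candidate zones found later in the analysis.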



### Methodology

Let's create latitude and longitude coordinates for the centroids of our candidate neighborhoods. We will create a grid of cells covering our area of interest, which is approximately 1.2 x 1.2 kilometers around Cambridge city center.

Let's first find the latitude and longitude of Cambridge city center by geocoding a specific, well-known address with the Nominatim geocoder (OpenStreetMap) from geopy.

In [1]:
#import relevant libraries 

import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON into a pandas dataframe (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


In [2]:
#Coordinates of Cambridge City center
address = 'Kings College - Cambridge, Kings Parade, Cambridge, CB2 1ST, England'

geolocator = Nominatim(user_agent="UK_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
cambridge_center = [latitude, longitude]
print("The geographical coordinates of King's College Cambridge are {}, {}.".format(latitude, longitude))

The geographical coordinates of King's College Cambridge are 52.2046053, 0.1178684.


Let's create a grid of equally spaced candidate areas covering roughly 1.2 x 1.2 km around the city center, which we define as King's College Cambridge.

The neighborhoods are defined as circular areas with a radius of 30 meters, with neighborhood centers 60 meters apart.

To calculate distances accurately in meters, we need to work in 2D Cartesian coordinates rather than in longitude and latitude. We therefore project coordinates to UTM (Universal Transverse Mercator), do the distance calculations there, and project the results back to longitude and latitude for display on the Folium map.
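As a cross-check on projected distances, the haversine formula computes great-circle distances directly from latitude and longitude, treating the Earth as a sphere. This is a swapped-in alternative, not the method the notebook uses; the function name `haversine_m` and the 6,371 km mean radius are our assumptions. Over a study area of a few kilometers, its results and correctly projected UTM distances should agree to well within a percent.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (spherical approximation)


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Two points 0.01 degrees of latitude apart, on the King's College meridian (about 1.11 km)
print(haversine_m(52.20, 0.1178684, 52.21, 0.1178684))
```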

In [3]:
#!pip install shapely
import shapely.geometry

#!pip install pyproj
import pyproj

import math

def lonlat_to_xy(lon, lat):
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    xy = pyproj.transform(proj_latlon, proj_xy, lon, lat)
    return xy[0], xy[1]

def xy_to_lonlat(x, y):
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    lonlat = pyproj.transform(proj_xy, proj_latlon, x, y)
    return lonlat[0], lonlat[1]

def calc_xy_distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx*dx + dy*dy)

print('Coordinate transformation check')
print('-------------------------------')
print('Cambridge center longitude={}, latitude={}'.format(cambridge_center[1], cambridge_center[0]))
x, y = lonlat_to_xy(cambridge_center[1], cambridge_center[0])
print('Cambridge center UTM X={}, Y={}'.format(x, y))
lo, la = xy_to_lonlat(x, y)
print('Cambridge center longitude={}, latitude={}'.format(lo, la))

Coordinate transformation check
-------------------------------
Cambridge center longitude=0.1178684, latitude=52.2046053
Cambridge center UTM X=-514044.69617906236, Y=5888899.0450353
Cambridge center longitude=0.11786840000000155, latitude=52.20460529999999
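The round trip above works, but the negative X value is a warning sign: the cell uses UTM zone 33 (central meridian 15°E), whereas Cambridge, at about 0.12°E, lies in zone 31. Projecting this far outside the zone stretches distances by roughly 1% at Cambridge's latitude. A minimal sketch for picking the zone from a longitude, assuming the standard 6-degree zones and ignoring the Norway/Svalbard exceptions (the helper names `utm_zone` and `utm_epsg` are hypothetical):

```python
def utm_zone(lon):
    """Standard 6-degree UTM zone number (1-60) for a longitude in degrees."""
    # Zone 1 starts at 180°W; clamp lon = 180 into zone 60
    return min(int((lon + 180) // 6) + 1, 60)


def utm_epsg(lon, lat):
    """EPSG code of the WGS 84 / UTM CRS: 326xx north of the equator, 327xx south."""
    base = 32600 if lat >= 0 else 32700
    return base + utm_zone(lon)


print(utm_zone(0.1178684), utm_epsg(0.1178684, 52.2046053))
```

With a recent pyproj, `pyproj.Transformer.from_crs('EPSG:4326', 'EPSG:{}'.format(utm_epsg(lon, lat)), always_xy=True)` returns a transformer whose `.transform(lon, lat)` yields (x, y) in meters; it also replaces the `pyproj.transform` function used above, which pyproj has deprecated.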


Let's create a hexagonal grid of cells: we offset every other row by half a cell, and adjust the vertical row spacing so that every cell center is equally distant from all its neighbors.
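The spacing rule can be checked in isolation. The sketch below uses a hypothetical `hex_grid` helper built on the same idea as the next cell (row spacing of step x √3/2, every other row shifted by half a step). It builds a small grid and measures the distances from an interior center to all other centers: the six nearest are exactly one step away, which is what makes the cells hexagonal.

```python
import math


def hex_grid(x0, y0, step, n_rows, n_cols):
    """Centers of a hexagonal grid (hypothetical helper, not part of the notebook).

    Rows are step * sqrt(3) / 2 apart and every other row is shifted by step / 2,
    so each interior center is exactly `step` away from its six neighbors.
    """
    row_spacing = step * math.sqrt(3) / 2
    points = []
    for i in range(n_rows):
        x_offset = step / 2 if i % 2 == 0 else 0.0  # shift even rows by half a step
        y = y0 + i * row_spacing
        for j in range(n_cols):
            points.append((x0 + j * step + x_offset, y))
    return points


grid = hex_grid(0.0, 0.0, 60.0, n_rows=4, n_cols=4)
center = grid[5]  # row 1, column 1: an interior point
distances = sorted(math.dist(center, p) for p in grid if p != center)
print([round(d, 3) for d in distances[:8]])
```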

In [129]:
cambridge_center_x, cambridge_center_y = lonlat_to_xy(cambridge_center[1], cambridge_center[0]) # City center in Cartesian coordinates

k = math.sqrt(3) / 2 # Vertical offset for hexagonal grid cells
x_min = cambridge_center_x - 200 # western edge of the grid (meters)
x_step = 60 # horizontal distance between cell centers (meters)
y_min = cambridge_center_y - 100 - (int(21/k)*k*60 - 300)/2 # southern edge of the grid (meters)
y_step = 60 * k # row spacing that keeps neighboring centers 60 m apart

latitudes = []
longitudes = []
distances_from_center = []
xs = []
ys = []
for i in range(0, int(21/k)):
    y = y_min + i * y_step
    x_offset = 30 if i%2==0 else 0 # shift every other row by half a cell
    for j in range(0, 21):
        x = x_min + j * x_step + x_offset
        distance_from_center = calc_xy_distance(cambridge_center_x, cambridge_center_y, x, y)
        # The 60001 m threshold is far larger than the ~1.2 km grid, so every cell is kept
        if (distance_from_center <= 60001):
            lon, lat = xy_to_lonlat(x, y)
            latitudes.append(lat)
            longitudes.append(lon)
            distances_from_center.append(distance_from_center)
            xs.append(x)
            ys.append(y)
print(len(latitudes), 'candidate neighborhood centers generated.')

504 candidate neighborhood centers generated.


In [130]:
import folium

In [131]:
map_cambridge = folium.Map(location=cambridge_center, zoom_start=13)
folium.Marker(cambridge_center, popup="King's College").add_to(map_cambridge)
for lat, lon in zip(latitudes, longitudes):
    # Draw each candidate neighborhood as a 30 m circle
    folium.Circle([lat, lon], radius=30, color='blue', fill=False).add_to(map_cambridge)
map_cambridge

Store the candidate neighborhood centers, their UTM coordinates and their distances from the city center in a dataframe.

In [132]:
# Create a dataframe for the candidate locations

import pandas as pd

df_locations = pd.DataFrame({'Latitude': latitudes,
                             'Longitude': longitudes,
                             'X': xs,
                             'Y': ys,
                             'Distance from center': distances_from_center})

df_locations.head(5)


|   | Latitude | Longitude | X | Y | Distance from center |
|---|---|---|---|---|---|
| 0 | 52.199312 | 0.117168 | -514214.696179 | 5888326.0 | 598.20245 |
| 1 | 52.199421 | 0.118016 | -514154.696179 | 5888326.0 | 583.991585 |
| 2 | 52.199531 | 0.118864 | -514094.696179 | 5888326.0 | 575.713619 |
| 3 | 52.19964 | 0.119713 | -514034.696179 | 5888326.0 | 573.625462 |
| 4 | 52.19975 | 0.120561 | -513974.696179 | 5888326.0 | 577.794229 |


Now we will use Foursquare to get venue data for each micro-neighborhood.

In [133]:
import os

# Read the Foursquare credentials from environment variables instead of hard-coding them
# (the variable names FOURSQUARE_CLIENT_ID / FOURSQUARE_CLIENT_SECRET are our own choice)
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', 'YOUR_CLIENT_ID') # your Foursquare ID
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', 'YOUR_CLIENT_SECRET') # your Foursquare Secret
VERSION = '20180605'
LIMIT = 30
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + '*' * len(CLIENT_SECRET)) # never print the secret itself


In [134]:
# Foursquare categories

food_category = '4d4b7105d754a06374d81259'

indian_restaurant_categories = ['4bf58dd8d48988d10f941735']

def is_restaurant(categories, specific_filter=None):
    restaurant_words = ['restaurant', 'diner', 'taverna', 'steakhouse']
    restaurant = False
    specific = False
    for c in categories:
        category_name = c[0].lower()
        category_id = c[1]
        for r in restaurant_words:
            if r in category_name:
                restaurant = True
        if 'fast food' in category_name:
            restaurant = False
        if not(specific_filter is None) and (category_id in specific_filter):
            specific = True
            restaurant = True
    return restaurant, specific

def get_categories(categories):
    return [(cat['name'], cat['id']) for cat in categories]

def format_address(location):
    address = ', '.join(location['formattedAddress'])
    address = address.replace(', Deutschland', '')
    address = address.replace(', Germany', '')
    return address

def get_venues_near_location(lat, lon, category, client_id, client_secret, radius=500, limit=100):
    version = '20180724'
    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
        client_id, client_secret, version, lat, lon, category, radius, limit)
    try:
        results = requests.get(url).json()['response']['groups'][0]['items']
        venues = [(item['venue']['id'],
                   item['venue']['name'],
                   get_categories(item['venue']['categories']),
                   (item['venue']['location']['lat'], item['venue']['location']['lng']),
                   format_address(item['venue']['location']),
                   item['venue']['location']['distance']) for item in results]        
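    except (requests.RequestException, KeyError, IndexError, ValueError): # network failure or unexpected response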
        venues = []
    return venues
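`get_venues_near_location` builds its query string with `str.format`, which does not URL-encode the values. A minimal sketch of the same legacy v2 `explore` request built with the standard library's `urlencode` instead; the helper name `build_explore_url` is hypothetical, and the parameters are exactly those used in the function above.

```python
from urllib.parse import parse_qs, urlencode, urlparse

FOURSQUARE_EXPLORE_URL = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(lat, lon, category, client_id, client_secret,
                      radius=500, limit=100, version='20180724'):
    """Build the explore URL with properly encoded query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lon),  # "lat,lon" string, as in the original URL
        'categoryId': category,
        'radius': radius,
        'limit': limit,
    }
    return FOURSQUARE_EXPLORE_URL + '?' + urlencode(params)


url = build_explore_url(52.2046053, 0.1178684, '4d4b7105d754a06374d81259',
                        'YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET', radius=350)
print(parse_qs(urlparse(url).query)['ll'])
```

With `requests`, passing the same dictionary as `params=` to `requests.get(FOURSQUARE_EXPLORE_URL, params=params)` achieves the same encoding.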

In [135]:
# Let's now go over our neighborhood locations and get nearby restaurants; we'll also maintain a dictionary of all found restaurants and all found Indian restaurants

def get_restaurants(lats, lons):
    restaurants = {}
    indian_restaurants = {}
    location_restaurants = []

    print('Obtaining venues around candidate locations:', end='')
    for lat, lon in zip(lats, lons):
        # Using radius=350 to make sure we have overlaps/full coverage so we don't miss any restaurant (we're using dictionaries to remove any duplicates resulting from area overlaps)
        venues = get_venues_near_location(lat, lon, food_category, CLIENT_ID, CLIENT_SECRET, radius=350, limit=100)
        area_restaurants = []
        for venue in venues:
            venue_id = venue[0]
            venue_name = venue[1]
            venue_categories = venue[2]
            venue_latlon = venue[3]
            venue_address = venue[4]
            venue_distance = venue[5]
            is_res, is_indian = is_restaurant(venue_categories, specific_filter=indian_restaurant_categories)
            if is_res:
                x, y = lonlat_to_xy(venue_latlon[1], venue_latlon[0])
                restaurant = (venue_id, venue_name, venue_latlon[0], venue_latlon[1], venue_address, venue_distance, is_indian, x, y)
                if venue_distance<=60:
                    area_restaurants.append(restaurant)
                restaurants[venue_id] = restaurant
                if is_indian:
                    indian_restaurants[venue_id] = restaurant
        location_restaurants.append(area_restaurants)
        print(' .', end='')
    print(' done.')
    return restaurants, indian_restaurants, location_restaurants

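import pickle

# Try to load from the local file system in case we did this before
restaurants = {}
indian_restaurants = {}
location_restaurants = []
loaded = False
try:
    with open('restaurants_350.pkl', 'rb') as f:
        restaurants = pickle.load(f)
    with open('indian_restaurants_350.pkl', 'rb') as f:
        indian_restaurants = pickle.load(f)
    with open('location_restaurants_350.pkl', 'rb') as f:
        location_restaurants = pickle.load(f)
    print('Restaurant data loaded.')
    loaded = True
except (OSError, pickle.UnpicklingError, EOFError):
    pass # no usable cache yet

# If loading failed, use the Foursquare API to get the data and cache it for next time
if not loaded:
    restaurants, indian_restaurants, location_restaurants = get_restaurants(latitudes, longitudes)
    for filename, data in [('restaurants_350.pkl', restaurants),
                           ('indian_restaurants_350.pkl', indian_restaurants),
                           ('location_restaurants_350.pkl', location_restaurants)]:
        with open(filename, 'wb') as f:
            pickle.dump(data, f)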
    


Obtaining venues around candidate locations: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ...

In [136]:
import numpy as np

print('Total number of restaurants:', len(restaurants))
print('Total number of Indian restaurants:', len(indian_restaurants))
print('Percentage of Indian restaurants: {:.2f}%'.format(len(indian_restaurants) / len(restaurants) * 100))
print('Average number of restaurants in neighborhood:', np.array([len(r) for r in location_restaurants]).mean())

Total number of restaurants: 76
Total number of Indian restaurants: 6
Percentage of Indian restaurants: 7.89%
Average number of restaurants in neighborhood: 0.24404761904761904


In [137]:
print('List of all restaurants')
print('-----------------------')
for r in list(restaurants.values())[:10]:
    print(r)
print('...')
print('Total:', len(restaurants))

List of all restaurants
-----------------------
('58374c8713bb7732ce82ebd1', 'Millworks', 52.19915286981001, 0.11420680169835393, 'United Kingdom', 348, False, -514418.92065132945, 5888350.11184846)
('4bfeb22ddaf9c9b6811af9ef', 'Sala Thong', 52.19925428092794, 0.11366324725012002, 'Newnham Road, Cambridge, Cambridgeshire, United Kingdom', 340, False, -514453.3890170463, 5888369.023609588)
('4ba14fd7f964a52081ab37e3', 'Loch Fyne', 52.200675061791166, 0.1196986708643755, '37 Trumpington St, Cambridge, Cambridgeshire, CB2 1QY, United Kingdom', 349, False, -514011.70910891495, 5888439.785625388)
('4bab32c8f964a52001993ae3', 'The Rice Boat', 52.198979, 0.1135486364364624, '37 Newnham Road (On corner), Cambridge, CB3 9EY, United Kingdom', 327, True, -514467.525822795, 5888340.310179618)
('4ce59dbae888f04dc350316b', 'Japas Bento Box Sushi Restaurant', 52.1977906326135, 0.12244706754415642, '9 Saxon St0 (Brookside), Cambridge, Cambridgeshire, CB2 1HN, United Kingdom', 344, False, -513892.30340

In [138]:
print('List of Indian restaurants')
print('---------------------------')
for r in list(indian_restaurants.values())[:10]:
    print(r)
print('...')
print('Total:', len(indian_restaurants))

List of Indian restaurants
---------------------------
('4bab32c8f964a52001993ae3', 'The Rice Boat', 52.198979, 0.1135486364364624, '37 Newnham Road (On corner), Cambridge, CB3 9EY, United Kingdom', 327, True, -514467.525822795, 5888340.310179618)
('5a7d90d6112c6c7084e5e70b', 'The Tiffin Truck', 52.201351, 0.125443, 'Cambridge, Cambridgeshire, CB2 1DP, United Kingdom', 349, True, -513607.0010513052, 5888432.619014632)
('56ad0f0f498e2a4a622bcd84', 'Vedanta', 52.200027, 0.126456, '92 Regent Street, Cambridge, Cambridgeshire, CB2 1DP, United Kingdom', 349, True, -513569.0197576112, 5888272.273121425)
('4c869400d92ea093879f6f72', 'Kohinoor Tandoori Restaurant', 52.200388, 0.136343, 'Mill Rd, Cambridge, Cambridgeshire, United Kingdom', 341, True, -512890.9988799854, 5888171.573269724)
('54f2e608498eeb7367105ae7', 'Navadhanya', 52.20847076717482, 0.1362798733081952, '73 Newmarket Road, Cambridge, Cambridgeshire, United Kingdom', 337, True, -512708.40539239114, 5889063.45044746)
('4d7122c5a8d

In [139]:
map_cambridge = folium.Map(location=cambridge_center, zoom_start=13)
folium.Marker(cambridge_center, popup='Cambridge').add_to(map_cambridge)
for res in restaurants.values():
    lat = res[2]; lon = res[3]
    is_indian = res[6]
    color = 'red' if is_indian else 'blue' # Indian restaurants in red, all others in blue
    folium.CircleMarker([lat, lon], radius=3, color=color, fill=True, fill_color=color, fill_opacity=1).add_to(map_cambridge)
map_cambridge

We now know which restaurants are in and around Cambridge city center, which of them are Indian restaurants, and in which micro-neighborhood each one is located.

We will now begin the data analysis. 

## Methodology <a name="methodology"></a>


We have collected the location and type data for every restaurant found around Cambridge city center. 

We have also identified the Indian restaurants around Cambridge city center. 

We will first look at restaurant density across different parts of Cambridge city center. We will then use heatmaps to identify areas near the city center with 1) a low density of restaurants and 2) a low density of Indian restaurants. 

We will then focus on promising areas and select candidate locations that meet two basic requirements: no more than 2 restaurants within 250 meters, and no Indian restaurants within 500 meters. 

Finally, we will map all such locations and group them with k-means clustering to identify general zones/neighborhoods, which should be a starting point for street-level exploration by local stakeholders. 
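The clustering step can be pictured with a bare-bones version of Lloyd's algorithm, the standard k-means iteration: assign each location to its nearest center, move each center to the mean of its assigned locations, and repeat until the centers stop moving. This is an illustrative sketch only (the name `lloyd_kmeans` and the explicit initial centers are assumptions); the analysis itself uses scikit-learn's `KMeans`, which adds k-means++ initialization and other refinements.

```python
import numpy as np


def lloyd_kmeans(points, init_centers, n_iter=100):
    """Plain Lloyd iteration; returns (centers, labels)."""
    points = np.asarray(points, dtype=float)
    centers = np.asarray(init_centers, dtype=float).copy()
    for _ in range(n_iter):
        # Assignment step: index of the nearest center for every point
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: move each center to the mean of its assigned points
        new_centers = centers.copy()
        for c in range(len(centers)):
            members = points[labels == c]
            if len(members) > 0:  # an empty cluster keeps its previous center
                new_centers[c] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break  # converged
        centers = new_centers
    return centers, labels


# Two well-separated groups of (x, y) locations in meters
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = lloyd_kmeans(pts, init_centers=[pts[0], pts[3]])
print(centers, labels)
```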



## Analysis <a name="analysis"></a>

In [140]:
# Basic exploratory data analysis and counting every restaurant. 

location_restaurants_count = [len(res) for res in location_restaurants]

df_locations['Restaurants in area'] = location_restaurants_count

print('Average number of restaurants in every area with radius=60m:', np.array(location_restaurants_count).mean())

df_locations.head(5)

Average number of restaurants in every area with radius=60m: 0.24404761904761904


|   | Latitude | Longitude | X | Y | Distance from center | Restaurants in area |
|---|---|---|---|---|---|---|
| 0 | 52.199312 | 0.117168 | -514214.696179 | 5888326.0 | 598.20245 | 0 |
| 1 | 52.199421 | 0.118016 | -514154.696179 | 5888326.0 | 583.991585 | 0 |
| 2 | 52.199531 | 0.118864 | -514094.696179 | 5888326.0 | 575.713619 | 0 |
| 3 | 52.19964 | 0.119713 | -514034.696179 | 5888326.0 | 573.625462 | 0 |
| 4 | 52.19975 | 0.120561 | -513974.696179 | 5888326.0 | 577.794229 | 0 |


In [141]:
# Now calculate the distance to the nearest indian restaurant from every micro neighborhood.

distances_to_indian_restaurant = []

for area_x, area_y in zip(xs, ys):
    min_distance = 10000
    for res in indian_restaurants.values():
        res_x = res[7]
        res_y = res[8]
        d = calc_xy_distance(area_x, area_y, res_x, res_y)
        if d<min_distance:
            min_distance = d
    distances_to_indian_restaurant.append(min_distance)

df_locations['Distance to Indian restaurant'] = distances_to_indian_restaurant

In [142]:
df_locations.head(5)

|   | Latitude | Longitude | X | Y | Distance from center | Restaurants in area | Distance to Indian restaurant |
|---|---|---|---|---|---|---|---|
| 0 | 52.199312 | 0.117168 | -514214.696179 | 5888326.0 | 598.20245 | 0 | 253.262651 |
| 1 | 52.199421 | 0.118016 | -514154.696179 | 5888326.0 | 583.991585 | 0 | 313.179705 |
| 2 | 52.199531 | 0.118864 | -514094.696179 | 5888326.0 | 575.713619 | 0 | 373.123418 |
| 3 | 52.19964 | 0.119713 | -514034.696179 | 5888326.0 | 573.625462 | 0 | 433.08272 |
| 4 | 52.19975 | 0.120561 | -513974.696179 | 5888326.0 | 577.794229 | 0 | 382.978779 |


In [143]:
print('Average distance to closest Indian restaurant from each area center:', df_locations['Distance to Indian restaurant'].mean())

Average distance to closest Indian restaurant from each area center: 366.28719551621185


On average, the nearest Indian restaurant is about 366 m from each area center, so Indian restaurants are relatively close to most of the study area.


We can now create a heatmap of restaurant density, with circles at 1, 2 and 3 km from Cambridge city center.

In [144]:
restaurant_latlons = [[res[2], res[3]] for res in restaurants.values()]

indian_latlons = [[res[2], res[3]] for res in indian_restaurants.values()]

In [145]:
from folium import plugins
from folium.plugins import HeatMap

map_cambridge = folium.Map(location=cambridge_center, zoom_start=13)
folium.TileLayer('cartodbpositron').add_to(map_cambridge) #cartodbpositron cartodbdark_matter
HeatMap(restaurant_latlons).add_to(map_cambridge)
folium.Marker(cambridge_center).add_to(map_cambridge)
folium.Circle(cambridge_center, radius=1000, fill=False, color='blue').add_to(map_cambridge)
folium.Circle(cambridge_center, radius=2000, fill=False, color='blue').add_to(map_cambridge)
folium.Circle(cambridge_center, radius=3000, fill=False, color='blue').add_to(map_cambridge)

map_cambridge

We can create another heatmap showing Indian restaurants only.


In [146]:
map_cambridge_indian = folium.Map(location=cambridge_center, zoom_start=13)
folium.TileLayer('cartodbpositron').add_to(map_cambridge_indian) #cartodbpositron cartodbdark_matter
HeatMap(indian_latlons).add_to(map_cambridge_indian)
folium.Marker(cambridge_center).add_to(map_cambridge_indian)
folium.Circle(cambridge_center, radius=1000, fill=False, color='white').add_to(map_cambridge_indian)
folium.Circle(cambridge_center, radius=2000, fill=False, color='white').add_to(map_cambridge_indian)
folium.Circle(cambridge_center, radius=3000, fill=False, color='white').add_to(map_cambridge_indian)

map_cambridge_indian

This shows that there are very few Indian restaurants in the center of Cambridge.

In [147]:
# Define a new grid of location candidates: 100 m spacing, within 2.5 km of the region-of-interest center
roi_x_min = cambridge_center_x - 1000
roi_y_max = cambridge_center_y + 500
roi_width = 2000
roi_height = 2000
roi_center_x = roi_x_min + 0
roi_center_y = roi_y_max - 0
roi_center_lon, roi_center_lat = xy_to_lonlat(roi_center_x, roi_center_y)
roi_center = [roi_center_lat, roi_center_lon]

k = math.sqrt(3) / 2 # Vertical offset for hexagonal grid cells
x_step = 100
y_step = 100 * k 
roi_y_min = roi_center_y - 3000

roi_latitudes = []
roi_longitudes = []
roi_xs = []
roi_ys = []
for i in range(0, int(51/k)):
    y = roi_y_min + i * y_step
    x_offset = 50 if i%2==0 else 0
    for j in range(0, 51):
        x = roi_x_min + j * x_step + x_offset
        d = calc_xy_distance(roi_center_x, roi_center_y, x, y)
        if (d <= 2501):
            lon, lat = xy_to_lonlat(x, y)
            roi_latitudes.append(lat)
            roi_longitudes.append(lon)
            roi_xs.append(x)
            roi_ys.append(y)

print(len(roi_latitudes), 'candidate neighborhood centers generated.')




1085 candidate neighborhood centers generated.


In [148]:
# Now we can calculate the number of restaurants in the vicinity.

def count_restaurants_nearby(x, y, restaurants, radius=250):    
    count = 0
    for res in restaurants.values():
        res_x = res[7]; res_y = res[8]
        d = calc_xy_distance(x, y, res_x, res_y)
        if d<=radius:
            count += 1
    return count

def find_nearest_restaurant(x, y, restaurants):
    d_min = 100000
    for res in restaurants.values():
        res_x = res[7]; res_y = res[8]
        d = calc_xy_distance(x, y, res_x, res_y)
        if d<=d_min:
            d_min = d
    return d_min

roi_restaurant_counts = []
roi_indian_distances = []

print('Generating data on location candidates... ', end='')
for x, y in zip(roi_xs, roi_ys):
    count = count_restaurants_nearby(x, y, restaurants, radius=250)
    roi_restaurant_counts.append(count)
    distance = find_nearest_restaurant(x, y, indian_restaurants)
    roi_indian_distances.append(distance)
print('done.')

Generating data on location candidates... done.
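Both helper functions above loop over every restaurant for every candidate location. The same numbers can be computed in one pass with NumPy broadcasting. A sketch, assuming area centers and restaurant positions are given as (x, y) pairs in meters; the function names are hypothetical, and the 100000 m fallback mirrors the initial value in `find_nearest_restaurant`.

```python
import numpy as np


def pairwise_distances(area_xy, restaurant_xy):
    """Euclidean distances (meters) between every area center and every restaurant."""
    a = np.asarray(area_xy, dtype=float).reshape(-1, 2)        # (n_areas, 2)
    r = np.asarray(restaurant_xy, dtype=float).reshape(-1, 2)  # (n_restaurants, 2)
    diff = a[:, None, :] - r[None, :, :]                       # (n_areas, n_restaurants, 2)
    return np.sqrt((diff ** 2).sum(axis=2))                    # (n_areas, n_restaurants)


def count_within(area_xy, restaurant_xy, radius=250):
    """Number of restaurants within `radius` meters of each area center."""
    return (pairwise_distances(area_xy, restaurant_xy) <= radius).sum(axis=1)


def nearest_distance(area_xy, restaurant_xy, default=100000.0):
    """Distance to the nearest restaurant; `default` when there are no restaurants."""
    d = pairwise_distances(area_xy, restaurant_xy)
    if d.shape[1] == 0:
        return np.full(d.shape[0], default)
    return d.min(axis=1)


areas = [(0, 0), (3, 4)]
rests = [(0, 0), (6, 8)]
print(count_within(areas, rests, radius=5), nearest_distance(areas, rests))
```

In the notebook this would be called with `np.column_stack([roi_xs, roi_ys])` and lists such as `[(r[7], r[8]) for r in indian_restaurants.values()]`; with about a thousand candidates and 76 restaurants the intermediate matrix stays small.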


In [149]:
#Including into a dataframe
df_roi_locations = pd.DataFrame({'Latitude':roi_latitudes,
                                 'Longitude':roi_longitudes,
                                 'X':roi_xs,
                                 'Y':roi_ys,
                                 'Restaurants nearby':roi_restaurant_counts,
                                 'Distance to Indian restaurant':roi_indian_distances})

df_roi_locations.head(10)

|   | Latitude | Longitude | X | Y | Restaurants nearby | Distance to Indian restaurant |
|---|---|---|---|---|---|---|
| 0 | 52.185665 | 0.110318 | -514994.696179 | 5886919.0 | 0 | 1516.244383 |
| 1 | 52.185847 | 0.111731 | -514894.696179 | 5886919.0 | 0 | 1484.440284 |
| 2 | 52.18603 | 0.113145 | -514794.696179 | 5886919.0 | 0 | 1458.810778 |
| 3 | 52.186326 | 0.109354 | -515044.696179 | 5887005.0 | 0 | 1454.467971 |
| 4 | 52.186508 | 0.110767 | -514944.696179 | 5887005.0 | 0 | 1417.759855 |
| 5 | 52.186691 | 0.112181 | -514844.696179 | 5887005.0 | 0 | 1387.302756 |
| 6 | 52.186873 | 0.113594 | -514744.696179 | 5887005.0 | 0 | 1363.515627 |
| 7 | 52.187056 | 0.115008 | -514644.696179 | 5887005.0 | 0 | 1346.751942 |
| 8 | 52.187238 | 0.116422 | -514544.696179 | 5887005.0 | 0 | 1337.275859 |
| 9 | 52.187421 | 0.117835 | -514444.696179 | 5887005.0 | 0 | 1335.242544 |


In [150]:
#Now keep only locations with no more than 2 restaurants within 250 meters and no Indian restaurants within 500 meters. 

good_res_count = np.array((df_roi_locations['Restaurants nearby']<=2))
print('Locations with no more than two restaurants nearby:', good_res_count.sum())

good_ind_distance = np.array(df_roi_locations['Distance to Indian restaurant']>=500)
print('Locations with no Indian restaurants within 500m:', good_ind_distance.sum())

good_locations = np.logical_and(good_res_count, good_ind_distance)
print('Locations with both conditions met:', good_locations.sum())

df_good_locations = df_roi_locations[good_locations]


Locations with no more than two restaurants nearby: 905
Locations with no Indian restaurants within 500m: 713
Locations with both conditions met: 699


In [151]:
#Looking on a map

good_latitudes = df_good_locations['Latitude'].values
good_longitudes = df_good_locations['Longitude'].values

good_locations = [[lat, lon] for lat, lon in zip(good_latitudes, good_longitudes)]

map_cambridge = folium.Map(location=roi_center, zoom_start=14)
folium.TileLayer('cartodbpositron').add_to(map_cambridge)
HeatMap(restaurant_latlons).add_to(map_cambridge)
folium.Circle(roi_center, radius=1000, color='white', fill=True, fill_opacity=0.6).add_to(map_cambridge)
folium.Marker(cambridge_center).add_to(map_cambridge)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_cambridge) 

map_cambridge

In [152]:
#Let's cluster the good candidate locations. 
from sklearn.cluster import KMeans

number_of_clusters = 10

good_xys = df_good_locations[['X', 'Y']].values
kmeans = KMeans(n_clusters=number_of_clusters, random_state=0).fit(good_xys)

cluster_centers = [xy_to_lonlat(cc[0], cc[1]) for cc in kmeans.cluster_centers_]

map_cambridge = folium.Map(location=roi_center, zoom_start=14)
folium.TileLayer('cartodbpositron').add_to(map_cambridge)
HeatMap(restaurant_latlons).add_to(map_cambridge)
folium.Circle(roi_center, radius=2500, color='white', fill=True, fill_opacity=0.4).add_to(map_cambridge)
folium.Marker(roi_center).add_to(map_cambridge)
for lon, lat in cluster_centers:
    folium.Circle([lat, lon], radius=60, color='green', fill=True, fill_opacity=0.25).add_to(map_cambridge) 
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_cambridge)

map_cambridge


### Results <a name="results"></a>


Our results indicate that the town center is indeed a crowded area in which to open an Indian restaurant, both because of the total number of restaurants nearby and because of the number of Indian restaurants. 

The k-means clusters show zones near the town center where it may be feasible to open a restaurant. These include:
- Around Grange Road
- Further down Hills Road
- Around Milton Road



### Discussion


While this analysis confirmed some predictable findings, such as the city center being the most crowded part of Cambridge, it also highlighted less obvious information, specifically favorable spots to open a restaurant. 

Rents in central Cambridge are notoriously high, and many restaurants have gone out of business because of them. A data-driven approach helps ensure that adequate consideration is given to locations that may not seem intuitively feasible but are potentially attractive opportunities according to the data. 

While the analysis is not a complete substitute for intuitive, street-level knowledge, it should complement an individual's approach to selecting a good place to open a restaurant. 

### Conclusion <a name="conclusion"></a>



The analysis identifies a few promising places to open a restaurant outside the very center of Cambridge, namely:

- Around Grange Road
- Further down Hills Road
- Around Milton Road

These three areas provide a data-driven opportunity, particularly for owners who want to open an Indian restaurant. 

The results support the hypothesis that central Cambridge is overcrowded in terms of restaurants. However, they also show that there are few Indian restaurants in central Cambridge, with many of them located further away. 

Data Attributions:

Contains HM Land Registry data © Crown copyright and database right 2020. This data is licensed under the Open Government Licence v3.0.

Parts of this code were adapted from another Coursera project that was posted as an example for the assignment. 

