# Capstone Project - The Battle of the Neighborhoods

### IBM Data Science Certification ~ Coursera

## Introduction: The Business Problem

Our aim for this project is to find suitable locations to open an Italian restaurant in New Delhi, India. To do this, we look for areas that are not too crowded with restaurants in general and that have no Italian restaurants in their vicinity. In addition, the locations should not be too far from the city center.

## Data

The data that we need, according to our stated business problem, is:
* Number of restaurants in a neighborhood.
* Number of Italian restaurants in a neighborhood and the distance to them.
* Distance of the neighborhood from the city center.

In order to define the aforementioned neighborhoods, a circular grid system centered around the city center was used. 

To obtain the data, the following data sources are required:
* The neighborhood locations will be generated algorithmically, and corresponding addresses will be acquired using the Geopy library. 
* The data on restaurants will be obtained using the Foursquare API.

### Importing necessary dependencies

In [45]:
#!pip install geocoder
#!pip install geopy
from geopy.geocoders import Nominatim
import requests
import geocoder
from sklearn.cluster import KMeans
#!pip install folium
import folium
import pandas as pd
import numpy as np
#!pip install shapely
import shapely.geometry
#!pip install pyproj
import pyproj
import math
import pickle

We need to create coordinates for the centroids of the neighborhoods. To do this, we first need the coordinates of New Delhi's center point, which we obtain below using Geopy:

In [3]:
address = 'New Delhi, India'

def get_cords(address):
    geolocator = Nominatim(user_agent="nd_explorer")
    location = geolocator.geocode(address)
    lat = location.latitude
    long = location.longitude
    return [lat, long]

delhi_center = get_cords(address)

print('The geographical coordinates of {} are {}'.format(address, delhi_center))

The geographical coordinates of New Delhi, India are [28.6141793, 77.2022662]


Now, in order to create the circular grid of neighborhoods, we need to define functions that convert WGS84 spherical coordinates to UTM Cartesian coordinates.

After this, we algorithmically calculate the centroids of these neighborhoods so that they cover a radius of approximately 20km around the center of New Delhi. These neighborhoods will be 600m apart with a radius of 300m.

In [4]:
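def lonlat_to_xy(lon, lat):
    # Project WGS84 longitude/latitude to UTM Cartesian coordinates.
    # Note: zone=33 covers 12°E-18°E, whereas New Delhi (about 77.2°E) lies in zone 43.
    # The outputs shown in this notebook were produced with zone=33, so X/Y and the
    # distances derived from them are distorted relative to ground distances.
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    xy = pyproj.transform(proj_latlon, proj_xy, lon, lat)
    return xy[0], xy[1]

def xy_to_lonlat(x, y):
    # Inverse of lonlat_to_xy; it must use the same zone so that the round trip is consistent.
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=33, datum='WGS84')
    lonlat = pyproj.transform(proj_xy, proj_latlon, x, y)
    return lonlat[0], lonlat[1]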

def calc_xy_distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx*dx + dy*dy)

In [5]:
print('Coordinate transformation check')
print('-------------------------------')
print('New Delhi center longitude={}, latitude={}'.format(delhi_center[1], delhi_center[0]))
x, y = lonlat_to_xy(delhi_center[1], delhi_center[0])
print('New Delhi center UTM X={}, Y={}'.format(x, y))
lo, la = xy_to_lonlat(x, y)
print('New Delhi center longitude={}, latitude={}'.format(lo, la))

Coordinate transformation check
-------------------------------
New Delhi center longitude=77.2022662, latitude=28.6141793
New Delhi center UTM X=7113985.144928564, Y=5496313.098959174
New Delhi center longitude=77.20226619999885, latitude=28.614179300001304
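
The helper functions above hard-code `zone=33`. As a side check, the standard UTM zone number for a longitude follows from the 6-degree zone width. The sketch below is only an illustration: the function name `utm_zone_for_longitude` is an assumption (it is not part of pyproj), and the special-case zones around Norway and Svalbard are ignored.

```python
import math

def utm_zone_for_longitude(lon_deg):
    """Return the standard UTM zone number (1-60) for a longitude in degrees.

    Hypothetical helper for illustration; ignores the Norway/Svalbard exceptions.
    """
    # Normalize to [-180, 180) so that inputs such as 190 wrap around correctly.
    lon = ((lon_deg + 180.0) % 360.0) - 180.0
    # Zones are 6 degrees wide and numbered eastwards starting at 180°W.
    return int(math.floor((lon + 180.0) / 6.0)) + 1

print(utm_zone_for_longitude(77.2022662))  # longitude of New Delhi obtained above
print(utm_zone_for_longitude(13.4))        # a longitude inside zone 33, for comparison
```

For New Delhi this gives zone 43, whereas zone 33 covers 12°E to 18°E. The round trip above still works with a distant zone, but projected distances can then deviate noticeably from ground distances.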


In [6]:
delhi_center_x, delhi_center_y = lonlat_to_xy(delhi_center[1], delhi_center[0]) # City center in Cartesian coordinates

a = 1.6 # Scale factor applied to the nominal grid step (in projected units)
k = math.sqrt(3) / 2 # Vertical offset for hexagonal grid cells
x_min = delhi_center_x - 20000
x_step = 600 * a
y_min = delhi_center_y - 20000 - (int(84/k)*k*1200 - 24000)/4
y_step = 600 * a * k

latitudes = []
longitudes = []
distances_from_center = []
xs = []
ys = []
for i in range(0, int(84/k)):
    y = y_min + i * y_step
    x_offset = (300 * a) if i%2==0 else 0
    for j in range(0, 84):
        x = x_min + j * x_step + x_offset
        distance_from_center = calc_xy_distance(delhi_center_x, delhi_center_y, x, y)
        if (distance_from_center <= 20001):
            lon, lat = xy_to_lonlat(x, y)
            latitudes.append(lat)
            longitudes.append(lon)
            distances_from_center.append(distance_from_center)
            xs.append(x)
            ys.append(y)

print(len(latitudes), 'candidate neighborhood centers generated.')

1572 candidate neighborhood centers generated.
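
The grid construction above can be sketched independently of the map projection. The following minimal example generates the centers of a hexagonal grid with a given spacing inside a circle, using the same half-step row offset and `sqrt(3)/2` row height as the cell above. The function and parameter names are illustrative assumptions, and the coordinates are plain Cartesian values.

```python
import math

def hex_grid_centers(center_x, center_y, spacing, max_radius):
    """Return (x, y) centers of a hexagonal grid lying within max_radius of the center."""
    row_height = spacing * math.sqrt(3) / 2  # vertical distance between rows
    n_rows = int(max_radius / row_height) + 1
    n_cols = int(max_radius / spacing) + 1
    centers = []
    for i in range(-n_rows, n_rows + 1):
        y = center_y + i * row_height
        # Shift every other row by half a step so neighbors form equilateral triangles.
        x_offset = spacing / 2 if i % 2 else 0.0
        for j in range(-n_cols - 1, n_cols + 1):
            x = center_x + j * spacing + x_offset
            if math.hypot(x - center_x, y - center_y) <= max_radius:
                centers.append((x, y))
    return centers

centers = hex_grid_centers(0.0, 0.0, spacing=600.0, max_radius=3000.0)
print(len(centers), 'grid centers generated')
```

With this layout every center has its nearest neighbors at exactly one `spacing` away, which is why circles of radius `spacing / 2` touch without overlapping.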


We can now visualize the generated neighborhoods using folium:

In [7]:
map_delhi = folium.Map(location=delhi_center, zoom_start=11)
folium.Marker(delhi_center, popup='New Delhi').add_to(map_delhi)
for lat, lon in zip(latitudes, longitudes):
    folium.Circle([lat, lon], radius=300, color='blue', fill=False).add_to(map_delhi)
map_delhi

Now we can use the Geopy library to reverse geocode in order to get the addresses of the generated neighborhoods.

In [8]:
def get_address(latitude, longitude, verbose=False):
    try:
        geolocator = Nominatim(user_agent="nd_explorer")
        address = geolocator.reverse([latitude, longitude])[0]
        return address
    except Exception:
        # Any geocoding failure (timeout, no result) is reported as a missing address
        return None

get_address(delhi_center[0], delhi_center[1])

'Central Secretariat, Chanakya Puri Tehsil, New Delhi, Delhi, India'

In [9]:
print('Obtaining location addresses: ', end='')
addresses = []
for lat, lon in zip(latitudes, longitudes):
    address = get_address(lat, lon)
    if address is None:
        address = 'NO ADDRESS'
    address = address.replace(', India', '') # We don't need country part of address
    addresses.append(address)
    print(' .', end='')
print(' done.')

Obtaining location addresses:  . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
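
The loop above sends one reverse-geocoding request per grid point. The public Nominatim service asks clients to stay at or below one request per second, so a throttled and cached wrapper can be useful here. The sketch below is an illustration under stated assumptions: `make_throttled` and its parameters are hypothetical names, and the wrapped function is a stand-in so that no network call is made. geopy also provides `geopy.extra.rate_limiter.RateLimiter` for a similar purpose.

```python
import time

def make_throttled(func, min_delay_seconds=1.0, clock=time.monotonic, sleep=time.sleep):
    """Wrap func so consecutive uncached calls are at least min_delay_seconds apart.

    Results are cached by argument tuple, so repeated coordinates cost no extra call.
    Hypothetical helper for illustration only.
    """
    cache = {}
    last_call = [None]  # mutable cell holding the time of the previous real call

    def wrapper(*args):
        if args in cache:
            return cache[args]
        if last_call[0] is not None:
            wait = min_delay_seconds - (clock() - last_call[0])
            if wait > 0:
                sleep(wait)
        last_call[0] = clock()
        result = func(*args)
        cache[args] = result
        return result

    return wrapper

# Stand-in for a reverse-geocoding function; a real one would query the service.
def fake_reverse(lat, lon):
    return 'address near {:.3f}, {:.3f}'.format(lat, lon)

throttled_reverse = make_throttled(fake_reverse, min_delay_seconds=0.01)
print(throttled_reverse(28.614, 77.202))
```

Passing the clock and sleep functions as parameters keeps the wrapper easy to test without real waiting.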

In [10]:
df_locations = pd.DataFrame({'Address': addresses,
                             'Latitude': latitudes,
                             'Longitude': longitudes,
                             'X': xs,
                             'Y': ys,
                             'Distance from center': distances_from_center})

df_locations.head(10)

| | Address | Latitude | Longitude | X | Y | Distance from center |
|---|---|---|---|---|---|---|
| 0 | Rangpuri, Vasant Vihar Tehsil, New Delhi, Delh... | 28.537689 | 77.107837 | 7112225.0 | 5476493.0 | 19897.681774 |
| 1 | DeeMarks, Delhi-Gurugram Expressway, Rangpuri,... | 28.534016 | 77.112409 | 7113185.0 | 5476493.0 | 19835.829702 |
| 2 | West End Greens, Rangpuri, Vasant Vihar Tehsil... | 28.530343 | 77.116981 | 7114145.0 | 5476493.0 | 19820.336525 |
| 3 | West End Greens, Rangpuri, Vasant Vihar Tehsil... | 28.52667 | 77.121551 | 7115105.0 | 5476493.0 | 19851.310787 |
| 4 | Bana Singh Enclave, Vasant Kunj, Rangpuri, Vas... | 28.522998 | 77.126121 | 7116065.0 | 5476493.0 | 19928.535821 |
| 5 | Rangpuri, Vasant Vihar Tehsil, New Delhi, Delh... | 28.557722 | 77.090856 | 7107905.0 | 5477325.0 | 19937.958198 |
| 6 | Rangpuri, Vasant Vihar Tehsil, New Delhi, Delh... | 28.554047 | 77.095432 | 7108865.0 | 5477325.0 | 19666.473429 |
| 7 | Rangpuri, Vasant Vihar Tehsil, New Delhi, Delh... | 28.550372 | 77.100007 | 7109825.0 | 5477325.0 | 19438.656773 |
| 8 | Vasant Kunj, Rangpuri, Vasant Vihar Tehsil, Ne... | 28.546697 | 77.104581 | 7110785.0 | 5477325.0 | 19256.058193 |
| 9 | Vasant Kunj, Rangpuri, Vasant Vihar Tehsil, Ne... | 28.543023 | 77.109154 | 7111745.0 | 5477325.0 | 19119.973251 |


In [11]:
df_locations.to_pickle('./locations.pkl')
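
The `Distance from center` column above is computed from projected X/Y coordinates. As an independent cross-check, the great-circle distance between two latitude/longitude pairs can be estimated with the haversine formula. The sketch below assumes a spherical Earth with a mean radius of 6371 km; the function name `haversine_km` is an illustrative assumption.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, earth_radius_km=6371.0):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

center = (28.6141793, 77.2022662)   # city center found above
first_row = (28.537689, 77.107837)  # first grid point in the table above
print('{:.1f} km'.format(haversine_km(*center, *first_row)))
```

For the first grid point this gives roughly 12.5 km, against about 19.9 km (19897.68) in the table, which suggests that the projected values are best read as scaled units rather than ground metres.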

### Foursquare

Now that we have our neighborhoods, we can use the Foursquare API to get the required data about the restaurants in these neighborhoods. We do this by specifying the food category as well as the specific types of Italian restaurants within it, using the respective Foursquare category IDs defined below.

In [49]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # Replace with your own Foursquare credentials
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET'

In [50]:
food_category = '4d4b7105d754a06374d81259' # 'Root' category for all food-related venues

italian_restaurant_categories = ['4bf58dd8d48988d110941735','55a5a1ebe4b013909087cbb6','55a5a1ebe4b013909087cb7c',
                                 '55a5a1ebe4b013909087cba7','55a5a1ebe4b013909087cba1','55a5a1ebe4b013909087cba4',
                                 '55a5a1ebe4b013909087cb95','55a5a1ebe4b013909087cb89','55a5a1ebe4b013909087cb9b',
                                 '55a5a1ebe4b013909087cb98','55a5a1ebe4b013909087cbbf','55a5a1ebe4b013909087cb79',
                                 '55a5a1ebe4b013909087cbb0','55a5a1ebe4b013909087cbb3','55a5a1ebe4b013909087cb74',
                                 '55a5a1ebe4b013909087cbaa','55a5a1ebe4b013909087cb83','55a5a1ebe4b013909087cb8c',
                                 '55a5a1ebe4b013909087cb92','55a5a1ebe4b013909087cb8f','55a5a1ebe4b013909087cb86',
                                 '55a5a1ebe4b013909087cbb9','55a5a1ebe4b013909087cb7f','55a5a1ebe4b013909087cbbc',
                                 '55a5a1ebe4b013909087cb9e','55a5a1ebe4b013909087cbc2','55a5a1ebe4b013909087cbad']

def is_restaurant(categories, specific_filter=None):
    restaurant_words = ['restaurant', 'diner', 'taverna', 'steakhouse']
    restaurant = False
    specific = False
    for c in categories:
        category_name = c[0].lower()
        category_id = c[1]
        for r in restaurant_words:
            if r in category_name:
                restaurant = True
        if 'fast food' in category_name:
            restaurant = False
        if not(specific_filter is None) and (category_id in specific_filter):
            specific = True
            restaurant = True
    return restaurant, specific

def get_categories(categories):
    return [(cat['name'], cat['id']) for cat in categories]

def format_address(location):
    address = ', '.join(location['formattedAddress'])
    address = address.replace(', Delhi', '')
    address = address.replace(', India', '')
    return address

def get_venues_near_location(lat, lon, category, client_id, client_secret, radius=500, limit=100):
    version = '20180724'
    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
        client_id, client_secret, version, lat, lon, category, radius, limit)
    try:
        results = requests.get(url).json()['response']['groups'][0]['items']
        venues = [(item['venue']['id'],
                   item['venue']['name'],
                   get_categories(item['venue']['categories']),
                   (item['venue']['location']['lat'], item['venue']['location']['lng']),
                   format_address(item['venue']['location']),
                   item['venue']['location']['distance']) for item in results]        
    except Exception:
        # Treat request or parsing failures as "no venues found" for this location
        venues = []
    return venues
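
To make the category filter concrete, the sketch below restates the `is_restaurant` logic from the cell above in compact form and applies it to a few made-up category tuples. The category IDs such as `'id-italian'` are invented placeholders, not real Foursquare identifiers.

```python
def is_restaurant(categories, specific_filter=None):
    """Compact restatement of the classifier above: returns (is_restaurant, is_specific)."""
    restaurant_words = ['restaurant', 'diner', 'taverna', 'steakhouse']
    restaurant = False
    specific = False
    for name, category_id in categories:
        lowered = name.lower()
        if any(word in lowered for word in restaurant_words):
            restaurant = True
        if 'fast food' in lowered:
            restaurant = False
        if specific_filter is not None and category_id in specific_filter:
            specific = True
            restaurant = True
    return restaurant, specific

italian_ids = ['id-italian']  # placeholder ID, not a real Foursquare category

samples = {
    'trattoria': [('Italian Restaurant', 'id-italian')],
    'burger chain': [('Fast Food Restaurant', 'id-fastfood')],
    'coffee shop': [('Coffee Shop', 'id-coffee')],
    'pizza chain': [('Pizza Place', 'id-italian')],
}
for label, cats in samples.items():
    print(label, is_restaurant(cats, specific_filter=italian_ids))
```

Note that a category whose ID is in the filter counts as a restaurant even if its name contains none of the keywords, which is how a pizza chain can end up in the Italian list.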

In [51]:
def get_restaurants(lats, lons):
    restaurants = {}
    italian_restaurants = {}
    location_restaurants = []

    print('Obtaining venues around candidate locations:', end='')
    for lat, lon in zip(lats, lons):
        # Using radius=600 (larger than the 300m area radius) to make sure we have overlaps/full coverage so we don't miss any restaurant (we're using dictionaries to remove any duplicates resulting from area overlaps)
        venues = get_venues_near_location(lat, lon, food_category, CLIENT_ID, CLIENT_SECRET, radius=600, limit=100)
        area_restaurants = []
        for venue in venues:
            venue_id = venue[0]
            venue_name = venue[1]
            venue_categories = venue[2]
            venue_latlon = venue[3]
            venue_address = venue[4]
            venue_distance = venue[5]
            is_res, is_italian = is_restaurant(venue_categories, specific_filter=italian_restaurant_categories)
            if is_res:
                x, y = lonlat_to_xy(venue_latlon[1], venue_latlon[0])
                restaurant = (venue_id, venue_name, venue_latlon[0], venue_latlon[1], venue_address, venue_distance, is_italian, x, y)
                if venue_distance<=300:
                    area_restaurants.append(restaurant)
                restaurants[venue_id] = restaurant
                if is_italian:
                    italian_restaurants[venue_id] = restaurant
        location_restaurants.append(area_restaurants)
        print(' .', end='')
    print(' done.')
    return restaurants, italian_restaurants, location_restaurants
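
Because the search radius is larger than the area radius, neighboring searches return overlapping results. The sketch below uses invented venue tuples to illustrate how keying a dictionary by venue ID keeps one record per venue, while the per-area lists keep only venues within 300 m of each search center.

```python
# Each search result is (venue_id, name, distance_from_search_center_in_m); values are invented.
search_results = [
    [('v1', 'Trattoria A', 120), ('v2', 'Diner B', 450)],   # search around center 0
    [('v2', 'Diner B', 180), ('v3', 'Steakhouse C', 590)],  # search around center 1
]

all_venues = {}  # venue_id -> record, so duplicates across searches collapse
per_area = []    # venues within 300 m of each search center

for results in search_results:
    nearby = []
    for venue_id, name, distance in results:
        all_venues[venue_id] = (venue_id, name)
        if distance <= 300:
            nearby.append(venue_id)
    per_area.append(nearby)

print(len(all_venues), 'unique venues')
print(per_area)
```

Here 'Diner B' appears in both searches but is stored once, and it is counted only for the area whose center is within 300 m of it.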

In [52]:
restaurants = {}
italian_restaurants = {}
location_restaurants = []

loaded = False
try:
    with open('restaurants_350_2.pkl', 'rb') as f:
        restaurants = pickle.load(f)
    with open('italian_restaurants_350_2.pkl', 'rb') as f:
        italian_restaurants = pickle.load(f)
    with open('location_restaurants_350_2.pkl', 'rb') as f:
        location_restaurants = pickle.load(f)
    print('Restaurant data loaded.')
    loaded = True
except Exception:
    # Cached files are missing or unreadable; fall through to the API download below
    pass

# If load failed use the Foursquare API to get the data
if not loaded:

    restaurants, italian_restaurants, location_restaurants = get_restaurants(latitudes, longitudes)

    # Persist the results to the local file system
    with open('restaurants_350_2.pkl', 'wb') as f:
        pickle.dump(restaurants, f)
    with open('italian_restaurants_350_2.pkl', 'wb') as f:
        pickle.dump(italian_restaurants, f)
    with open('location_restaurants_350_2.pkl', 'wb') as f:
        pickle.dump(location_restaurants, f)    

Obtaining venues around candidate locations: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

In [53]:
print('Total number of restaurants:', len(restaurants))
print('Total number of Italian restaurants:', len(italian_restaurants))
print('Percentage of Italian restaurants: {:.2f}%'.format(len(italian_restaurants) / len(restaurants) * 100))
print('Average number of restaurants in neighborhood:', np.array([len(r) for r in location_restaurants]).mean())

Total number of restaurants: 1102
Total number of Italian restaurants: 62
Percentage of Italian restaurants: 5.63%
Average number of restaurants in neighborhood: 0.45038167938931295


In [54]:
print('List of Italian restaurants')
print('---------------------------')
i = 1
for r in list(italian_restaurants.values()):
    print('----{}----'.format(i))
    print(r[1],'--',r[4])
    i = i+1
print('...')
print('Total:', len(italian_restaurants))

List of Italian restaurants
---------------------------
----1----
bella italia -- India
----2----
Flaming Chilli Pepper -- 249, Aruna Asaf Ali Marg, Opposite Fortis Hospital, (Vasant Kunj), New Delhi 110070
----3----
Tonino -- India
----4----
'It' Italian Restaurant @ The Grand -- India
----5----
Domino's Pizza -- India
----6----
Jamie’s Italian -- India
----7----
Big Chill -- DLF Promenade, Vasant Kunj 110070
----8----
Italia -- 309 & 310, 2nd Floor, DLF Promenade Mall (Vasant Kunj), New Delhi
----9----
Cherie -- Kalka Das Marg (Near Qutub Minar)
----10----
Olive Bar & Kitchen -- One Style Mile, Mehrauli, New Delhi
----11----
FIO -- Garden of Five Senses, New Delhi
----12----
Pasta Xpress -- New Delhi 110058
----13----
Sartoria -- Vasant Vihar, New Delhi
----14----
Da Pizza Planet -- India
----15----
Evoo -- New Delhi 110017
----16----
Fat Lulu's -- SDA Market, New Delhi 110016
----17----
Pizzeria Rossa -- 26, Ground floor, Hauz Khas Village (Hauz khas village st), New Delhi
----18---

We can now visualize the restaurant data we've obtained on a map using folium. Italian restaurants are marked red, while all other restaurants are marked blue.

In [55]:
map_delhi = folium.Map(location=delhi_center, zoom_start=12)
folium.Marker(delhi_center, popup='New Delhi').add_to(map_delhi)
for res in restaurants.values():
    lat = res[2]; lon = res[3]
    is_italian = res[6]
    color = 'red' if is_italian else 'blue'
    folium.CircleMarker([lat, lon], radius=3, color=color, fill=True, fill_color=color, fill_opacity=1).add_to(map_delhi)
map_delhi

## Methodology

This project looks at the 20km radius around the center of New Delhi in order to identify areas that have a low restaurant density as well as a low number of Italian restaurants. 

First, we collected the required data: the locations of all restaurants within 20km of the center of New Delhi, along with which of those are Italian restaurants.

Secondly, we will visualize this data as heatmaps in order to explore the density of restaurants in New Delhi. This will lead to the identification of areas with low density that are close to the center.

Finally, the most promising area will be selected and analyzed further. We filter for locations that meet the following requirements and cluster them using K-means: no more than two restaurants within a 1km radius and no Italian restaurant within a 1km radius. This will yield general areas for stakeholders to explore further when deciding where to open an Italian restaurant.

## Analysis

Let's perform some basic exploratory data analysis and derive some additional information from our raw data. First, let's count the number of restaurants in every candidate area:

In [56]:
location_restaurants_count = [len(res) for res in location_restaurants]

df_locations['Restaurants in area'] = location_restaurants_count

print('Average number of restaurants in every area with radius=300m:', np.array(location_restaurants_count).mean())

df_locations.head(10)

Average number of restaurants in every area with radius=300m: 0.45038167938931295


| | Address | Latitude | Longitude | X | Y | Distance from center | Restaurants in area | Distance to Italian restaurant |
|---|---|---|---|---|---|---|---|---|
| 0 | Rangpuri, Vasant Vihar Tehsil, New Delhi, Delh... | 28.537689 | 77.107837 | 7112225.0 | 5476493.0 | 19897.681774 | 0 | 3247.030133 |
| 1 | DeeMarks, Delhi-Gurugram Expressway, Rangpuri,... | 28.534016 | 77.112409 | 7113185.0 | 5476493.0 | 19835.829702 | 2 | 3299.687054 |
| 2 | West End Greens, Rangpuri, Vasant Vihar Tehsil... | 28.530343 | 77.116981 | 7114145.0 | 5476493.0 | 19820.336525 | 0 | 3616.056502 |
| 3 | West End Greens, Rangpuri, Vasant Vihar Tehsil... | 28.52667 | 77.121551 | 7115105.0 | 5476493.0 | 19851.310787 | 0 | 4136.060275 |
| 4 | Bana Singh Enclave, Vasant Kunj, Rangpuri, Vas... | 28.522998 | 77.126121 | 7116065.0 | 5476493.0 | 19928.535821 | 0 | 4793.88408 |
| 5 | Rangpuri, Vasant Vihar Tehsil, New Delhi, Delh... | 28.557722 | 77.090856 | 7107905.0 | 5477325.0 | 19937.958198 | 0 | 5207.381574 |
| 6 | Rangpuri, Vasant Vihar Tehsil, New Delhi, Delh... | 28.554047 | 77.095432 | 7108865.0 | 5477325.0 | 19666.473429 | 1 | 4378.030702 |
| 7 | Rangpuri, Vasant Vihar Tehsil, New Delhi, Delh... | 28.550372 | 77.100007 | 7109825.0 | 5477325.0 | 19438.656773 | 0 | 3613.956668 |
| 8 | Vasant Kunj, Rangpuri, Vasant Vihar Tehsil, Ne... | 28.546697 | 77.104581 | 7110785.0 | 5477325.0 | 19256.058193 | 0 | 2966.043286 |
| 9 | Vasant Kunj, Rangpuri, Vasant Vihar Tehsil, Ne... | 28.543023 | 77.109154 | 7111745.0 | 5477325.0 | 19119.973251 | 0 | 2525.340125 |


Now we need to calculate the distance to the nearest Italian restaurant even if that restaurant is outside of the candidate neighborhood.

In [57]:
distances_to_italian_restaurant = []

for area_x, area_y in zip(xs, ys):
    min_distance = 20000
    for res in italian_restaurants.values():
        res_x = res[7]
        res_y = res[8]
        d = calc_xy_distance(area_x, area_y, res_x, res_y)
        if d<min_distance:
            min_distance = d
    distances_to_italian_restaurant.append(min_distance)

df_locations['Distance to Italian restaurant'] = distances_to_italian_restaurant
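
The nested loop above is a brute-force nearest-neighbor search. An equivalent compact formulation, sketched here with invented coordinates, uses `min` over `math.hypot`. Unlike the loop above, it does not cap the result at 20000 and returns infinity when the list of restaurants is empty. The function name `nearest_distance` is an illustrative assumption.

```python
import math

def nearest_distance(x, y, points):
    """Distance from (x, y) to the closest point in points; inf if points is empty."""
    return min((math.hypot(px - x, py - y) for px, py in points), default=math.inf)

italian_xy = [(0.0, 0.0), (3000.0, 4000.0)]  # invented projected coordinates
print(nearest_distance(3000.0, 0.0, italian_xy))
print(nearest_distance(0.0, 0.0, []))
```

Returning infinity for an empty list makes the "no Italian restaurant anywhere" case explicit instead of hiding it behind an arbitrary cap.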

In [58]:
df_locations.head(10)

| | Address | Latitude | Longitude | X | Y | Distance from center | Restaurants in area | Distance to Italian restaurant |
|---|---|---|---|---|---|---|---|---|
| 0 | Rangpuri, Vasant Vihar Tehsil, New Delhi, Delh... | 28.537689 | 77.107837 | 7112225.0 | 5476493.0 | 19897.681774 | 0 | 3247.030133 |
| 1 | DeeMarks, Delhi-Gurugram Expressway, Rangpuri,... | 28.534016 | 77.112409 | 7113185.0 | 5476493.0 | 19835.829702 | 2 | 3299.687054 |
| 2 | West End Greens, Rangpuri, Vasant Vihar Tehsil... | 28.530343 | 77.116981 | 7114145.0 | 5476493.0 | 19820.336525 | 0 | 3616.056502 |
| 3 | West End Greens, Rangpuri, Vasant Vihar Tehsil... | 28.52667 | 77.121551 | 7115105.0 | 5476493.0 | 19851.310787 | 0 | 4136.060275 |
| 4 | Bana Singh Enclave, Vasant Kunj, Rangpuri, Vas... | 28.522998 | 77.126121 | 7116065.0 | 5476493.0 | 19928.535821 | 0 | 4793.88408 |
| 5 | Rangpuri, Vasant Vihar Tehsil, New Delhi, Delh... | 28.557722 | 77.090856 | 7107905.0 | 5477325.0 | 19937.958198 | 0 | 5207.381574 |
| 6 | Rangpuri, Vasant Vihar Tehsil, New Delhi, Delh... | 28.554047 | 77.095432 | 7108865.0 | 5477325.0 | 19666.473429 | 1 | 4378.030702 |
| 7 | Rangpuri, Vasant Vihar Tehsil, New Delhi, Delh... | 28.550372 | 77.100007 | 7109825.0 | 5477325.0 | 19438.656773 | 0 | 3613.956668 |
| 8 | Vasant Kunj, Rangpuri, Vasant Vihar Tehsil, Ne... | 28.546697 | 77.104581 | 7110785.0 | 5477325.0 | 19256.058193 | 0 | 2966.043286 |
| 9 | Vasant Kunj, Rangpuri, Vasant Vihar Tehsil, Ne... | 28.543023 | 77.109154 | 7111745.0 | 5477325.0 | 19119.973251 | 0 | 2525.340125 |


In [59]:
print('Average distance to closest Italian restaurant from each area center:', df_locations['Distance to Italian restaurant'].mean())

Average distance to closest Italian restaurant from each area center: 4766.899100015538


Now we can make a heatmap of all restaurants in New Delhi. We also draw circles showing distances of 2km, 5km and 10km from the city center.

In [60]:
restaurant_latlons = [[res[2], res[3]] for res in restaurants.values()]

italian_latlons = [[res[2], res[3]] for res in italian_restaurants.values()]

In [61]:
from folium import plugins
from folium.plugins import HeatMap

map_delhi = folium.Map(location=delhi_center, zoom_start=11.5)
folium.TileLayer('cartodbpositron').add_to(map_delhi) #cartodbpositron cartodbdark_matter
HeatMap(restaurant_latlons).add_to(map_delhi)
folium.Marker(delhi_center).add_to(map_delhi)
folium.Circle(delhi_center, radius=2000, fill=False, color='white').add_to(map_delhi)
folium.Circle(delhi_center, radius=5000, fill=False, color='white').add_to(map_delhi)
folium.Circle(delhi_center, radius=10000, fill=False, color='white').add_to(map_delhi)
map_delhi

Here we notice some gaps in the South and South-East of New Delhi that are still close to the city center, near Sarojini Nagar. Next, we make another heatmap for Italian restaurants only.

In [62]:
map_delhi = folium.Map(location=delhi_center, zoom_start=11.5)
folium.TileLayer('cartodbpositron').add_to(map_delhi) #cartodbpositron cartodbdark_matter
HeatMap(italian_latlons).add_to(map_delhi)
folium.Marker(delhi_center).add_to(map_delhi)
folium.Circle(delhi_center, radius=2000, fill=False, color='white').add_to(map_delhi)
folium.Circle(delhi_center, radius=5000, fill=False, color='white').add_to(map_delhi)
folium.Circle(delhi_center, radius=10000, fill=False, color='white').add_to(map_delhi)
map_delhi

We can now define a narrower region of interest: an area near Sarojini Nagar, shown below.

In [63]:
roi_x_min = delhi_center_x + 6000
roi_y_max = delhi_center_y - 2000
roi_width = 8000
roi_height = 8000
roi_center_x = roi_x_min + 4000
roi_center_y = roi_y_max - 4000
roi_center_lon, roi_center_lat = xy_to_lonlat(roi_center_x, roi_center_y)
roi_center = [roi_center_lat, roi_center_lon]

map_delhi = folium.Map(location=roi_center, zoom_start=13)
HeatMap(restaurant_latlons).add_to(map_delhi)
folium.Marker(delhi_center).add_to(map_delhi)
folium.Circle(roi_center, radius=4000, color='white', fill=True, fill_opacity=0.4).add_to(map_delhi)
map_delhi

We now create a new denser grid in this area with neighborhoods 100m apart.

In [64]:
k = math.sqrt(3) / 2 # Vertical offset for hexagonal grid cells
b = 1.6
x_step = 100 * b
y_step = 100 * b * k 
roi_y_min = roi_center_y - 6300
roi_x_min = roi_center_x - 6300

roi_latitudes = []
roi_longitudes = []
roi_xs = []
roi_ys = []
for i in range(0, 408):
    y = roi_y_min + i * y_step
    x_offset = (50 * b) if i%2==0 else 0
    for j in range(0, 408):
        x = roi_x_min + j * x_step + x_offset
        d = calc_xy_distance(roi_center_x, roi_center_y, x, y)
        if (d <= 6301):
            lon, lat = xy_to_lonlat(x, y)
            roi_latitudes.append(lat)
            roi_longitudes.append(lon)
            roi_xs.append(x)
            roi_ys.append(y)

print(len(roi_latitudes), 'candidate neighborhood centers generated.')

5617 candidate neighborhood centers generated.


In [65]:
map_delhi = folium.Map(location=roi_center, zoom_start=11.5)
folium.Marker(roi_center).add_to(map_delhi)
for lat, lon in zip(roi_latitudes, roi_longitudes):
    folium.Circle([lat, lon], radius=50, color='blue', fill=False).add_to(map_delhi)
folium.Circle(roi_center, radius=4000, color='white', fill=True, fill_opacity=0.4).add_to(map_delhi)
map_delhi

We now need to calculate the number of restaurants within a 1km radius, as well as the distance to the nearest Italian restaurant, for each neighborhood and put this data into a data frame.

In [66]:
def count_restaurants_nearby(x, y, restaurants, radius=250):    
    count = 0
    for res in restaurants.values():
        res_x = res[7]; res_y = res[8]
        d = calc_xy_distance(x, y, res_x, res_y)
        if d<=radius:
            count += 1
    return count

def find_nearest_restaurant(x, y, restaurants):
    d_min = 100000
    for res in restaurants.values():
        res_x = res[7]; res_y = res[8]
        d = calc_xy_distance(x, y, res_x, res_y)
        if d<=d_min:
            d_min = d
    return d_min

roi_restaurant_counts = []
roi_italian_distances = []

print('Generating data on location candidates... ', end='')
for x, y in zip(roi_xs, roi_ys):
    count = count_restaurants_nearby(x, y, restaurants, radius=1000)
    roi_restaurant_counts.append(count)
    distance = find_nearest_restaurant(x, y, italian_restaurants)
    roi_italian_distances.append(distance)
print('done.')

Generating data on location candidates... done.


In [67]:
df_roi_locations = pd.DataFrame({'Latitude':roi_latitudes,
                                 'Longitude':roi_longitudes,
                                 'X':roi_xs,
                                 'Y':roi_ys,
                                 'Restaurants nearby':roi_restaurant_counts,
                                 'Distance to Italian restaurant':roi_italian_distances})

df_roi_locations.head(10)

| | Latitude | Longitude | X | Y | Restaurants nearby | Distance to Italian restaurant |
|---|---|---|---|---|---|---|
| 0 | 28.524187 | 77.196449 | 7124005.0 | 5484013.0 | 1 | 1787.420253 |
| 1 | 28.529362 | 77.191351 | 7122805.0 | 5484152.0 | 3 | 1282.44886 |
| 2 | 28.528749 | 77.192111 | 7122965.0 | 5484152.0 | 2 | 1348.186372 |
| 3 | 28.528137 | 77.192871 | 7123125.0 | 5484152.0 | 2 | 1428.893946 |
| 4 | 28.527524 | 77.193631 | 7123285.0 | 5484152.0 | 2 | 1522.192276 |
| 5 | 28.526912 | 77.194391 | 7123445.0 | 5484152.0 | 1 | 1625.915355 |
| 6 | 28.526299 | 77.195151 | 7123605.0 | 5484152.0 | 1 | 1738.197962 |
| 7 | 28.525687 | 77.19591 | 7123765.0 | 5484152.0 | 1 | 1857.488512 |
| 8 | 28.525074 | 77.19667 | 7123925.0 | 5484152.0 | 1 | 1941.425711 |
| 9 | 28.524462 | 77.19743 | 7124085.0 | 5484152.0 | 2 | 1831.618724 |


Now we can filter this data to find the locations with no more than two restaurants within a 1km radius and no Italian restaurant within a 1km radius.

In [68]:
good_res_count = np.array((df_roi_locations['Restaurants nearby']<=2))
print('Locations with no more than two restaurants nearby:', good_res_count.sum())

good_ita_distance = np.array(df_roi_locations['Distance to Italian restaurant']>=1000)
print('Locations with no Italian restaurants within 1000m:', good_ita_distance.sum())

good_locations = np.logical_and(good_res_count, good_ita_distance)
print('Locations with both conditions met:', good_locations.sum())

df_good_locations = df_roi_locations[good_locations]

Locations with no more than two restaurants nearby: 1122
Locations with no Italian restaurants within 1000m: 3781
Locations with both conditions met: 1113


Using this, we can visualize the good locations on a map. The dots represent the good locations, while the heat map shows the restaurants in the area.

In [69]:
good_latitudes = df_good_locations['Latitude'].values
good_longitudes = df_good_locations['Longitude'].values

good_locations = [[lat, lon] for lat, lon in zip(good_latitudes, good_longitudes)]

map_delhi = folium.Map(location=roi_center, zoom_start=13)
folium.TileLayer('cartodbpositron').add_to(map_delhi) #cartodbpositron cartodbdark_matter
HeatMap(restaurant_latlons).add_to(map_delhi)
folium.Marker(delhi_center).add_to(map_delhi)
folium.Circle(roi_center, radius=4000, color='white', fill=True, fill_opacity=0.4).add_to(map_delhi)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=1, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_delhi) 
map_delhi

The good locations can also be represented as a heat map:

In [70]:
map_delhi = folium.Map(location=roi_center, zoom_start=13)
HeatMap(good_locations, radius=25).add_to(map_delhi)
folium.Marker(delhi_center).add_to(map_delhi)
folium.Circle(roi_center, radius=4000, color='white', fill=True, fill_opacity=0.4).add_to(map_delhi)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=1, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_delhi) 
map_delhi

Now we can use K-means to cluster these locations so that we end up with the main zones containing the filtered locations that are good for opening an Italian restaurant. The addresses derived from the centers of these zones will be the final result of our analysis.

In [71]:
number_of_clusters = 20

good_xys = df_good_locations[['X', 'Y']].values
kmeans = KMeans(n_clusters=number_of_clusters, random_state=0).fit(good_xys)

cluster_centers = [xy_to_lonlat(cc[0], cc[1]) for cc in kmeans.cluster_centers_]
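
For readers unfamiliar with K-means, the sketch below shows the basic Lloyd iteration that the algorithm is built on: assign each point to its nearest center, then move each center to the mean of its assigned points. It is a simplified illustration in plain Python with invented points and a naive initialization, not scikit-learn's implementation; the function name `lloyd_kmeans` is an assumption.

```python
import math

def lloyd_kmeans(points, initial_centers, iterations=10):
    """Plain Lloyd iteration: returns (centers, labels) after a fixed number of rounds."""
    centers = list(initial_centers)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: label each point with the index of its nearest center.
        labels = [min(range(len(centers)),
                      key=lambda c: math.hypot(p[0] - centers[c][0], p[1] - centers[c][1]))
                  for p in points]
        # Update step: move each center to the mean of its assigned points.
        for c in range(len(centers)):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = (sum(m[0] for m in members) / len(members),
                              sum(m[1] for m in members) / len(members))
    return centers, labels

points = [(0.0, 0.0), (0.0, 2.0), (2.0, 0.0), (10.0, 10.0), (10.0, 12.0), (12.0, 10.0)]
centers, labels = lloyd_kmeans(points, initial_centers=[points[0], points[3]])
print(centers)
print(labels)
```

In the analysis above, the points are the X/Y coordinates of the good locations, and the resulting centers are converted back to longitude/latitude.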

In [72]:
map_delhi = folium.Map(location=roi_center, zoom_start=13)
folium.TileLayer('cartodbpositron').add_to(map_delhi) #cartodbpositron cartodbdark_matter
HeatMap(restaurant_latlons).add_to(map_delhi)
folium.Marker(delhi_center).add_to(map_delhi)
folium.Circle(roi_center, radius=4000, color='white', fill=True, fill_opacity=0.4).add_to(map_delhi)
for lon, lat in cluster_centers:
    folium.Circle([lat, lon], radius=500, color='green', fill=True, fill_opacity=0.25).add_to(map_delhi)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=1, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_delhi) 
map_delhi

In [73]:
candidate_area_addresses = []
print('==============================================================')
print('Addresses of centers of areas recommended for further analysis')
print('==============================================================\n')
# cluster_centers holds (lon, lat) pairs; number the areas starting from 1.
for i, (lon, lat) in enumerate(cluster_centers, start=1):
    print("---------------{}---------------".format(i))
    addr = get_address(lat, lon)
    candidate_area_addresses.append(addr)
    # Distance from the city center, computed in projected X/Y (meters).
    x, y = lonlat_to_xy(lon, lat)
    d = calc_xy_distance(x, y, delhi_center_x, delhi_center_y)
    print(addr)
    print('=> {:.1f}km from New Delhi Center'.format(d/1000))

Addresses of centers of areas recommended for further analysis

---------------1---------------
Rampuri, Govindpuri, Kalkaji, Kalkaji Tehsil, South East Delhi, Delhi, 110019, India
=> 15.9km from New Delhi Center
---------------2---------------
College of Vocational Studies, Pandit Trilok Chandra Sharma Marg, Madangir, Hauz Khas Tehsil, South Delhi, Delhi, 110076, India
=> 14.6km from New Delhi Center
---------------3---------------
AIIMS Campus, Mahatma Gandhi Marg, Yusuf Sarai Market, Defence Colony Teshil, South East Delhi, Delhi, 1100049, India
=> 8.6km from New Delhi Center
---------------4---------------
East of Kailash, Defence Colony Teshil, South East Delhi, Delhi, 110024, India
=> 12.5km from New Delhi Center
---------------5---------------
Madangir, Ambedkar Nagar, Hauz Khas Tehsil, South Delhi, Delhi, 110062, India
=> 17.4km from New Delhi Center
---------------6---------------
Qutab Golf Course, Basant Kaur Marg, Bhavishya Nidhi Enclave, Hauz Khas Tehsil, South Delhi, Delh

This concludes our analysis. Reverse geocoding the cluster centers yields a final list of 20 addresses, one for each of the 20 zones most suitable for opening a new Italian restaurant. All of them are within 18km of the city center, and around 20% are within 8km. The zones lie in South and South-East Delhi, popular residential areas with higher incomes, which may suit Italian fine dining.
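
The reverse geocoding step can be sketched as follows. This is a hedged illustration rather than the notebook's own `get_address` helper: the function `addresses_for_centers`, the offline `fake_reverse` stand-in, and the sample coordinate are all assumptions. The point it illustrates is the coordinate order: the cluster centers are stored as (lon, lat), while geopy's `reverse` expects latitude first.

```python
from types import SimpleNamespace


def addresses_for_centers(centers_lonlat, reverse):
    """Reverse geocode (lon, lat) pairs; `reverse` takes a (lat, lon) tuple."""
    addresses = []
    for lon, lat in centers_lonlat:
        location = reverse((lat, lon))  # swap to latitude-first order
        # A geocoder may return None when it finds no match.
        addresses.append(location.address if location is not None else None)
    return addresses


# Offline stand-in for a geocoder so the sketch runs without network access.
def fake_reverse(latlon):
    lat, lon = latlon
    return SimpleNamespace(address="lat={:.2f}, lon={:.2f}".format(lat, lon))


# Illustrative coordinate only, given in (lon, lat) order.
demo_addresses = addresses_for_centers([(77.21, 28.61)], fake_reverse)
print(demo_addresses)
```

With geopy, the `reverse` argument could be `Nominatim(user_agent="nd_explorer").reverse`, ideally wrapped in `geopy.extra.rate_limiter.RateLimiter` to space out the 20 requests.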

In [74]:
map_delhi = folium.Map(location=roi_center, zoom_start=13.3)
folium.Marker(delhi_center).add_to(map_delhi)
folium.Circle(roi_center, radius=4000, color='white', fill=True, fill_opacity=0.4).add_to(map_delhi)
for lon, lat in cluster_centers:
    folium.Circle([lat, lon], radius=500, color='green', fill=True, fill_opacity=0.25).add_to(map_delhi)
for lonlat, addr in zip(cluster_centers, candidate_area_addresses):
    folium.Marker([lonlat[1], lonlat[0]], popup=addr).add_to(map_delhi)
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.CircleMarker([lat, lon], radius=0.5, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(map_delhi) 
for lat, lon in zip(good_latitudes, good_longitudes):
    folium.Circle([lat, lon], radius=250, color='#00000000', fill=True, fill_color='#0066ff', fill_opacity=0.07).add_to(map_delhi)
map_delhi

## Results and Discussion

Our analysis shows that there are a lot of restaurants in New Delhi (within the 20km radius around the center). Further analysis found that an area near Sarojini Nagar in South/South-East Delhi has a lower restaurant density while also being close to the center.

This area was then analyzed further to determine good locations for opening an Italian restaurant. Locations were filtered so that good locations were those matching the following criteria:
1. They had fewer than 2 restaurants within a 1km radius.
2. They had no Italian restaurants within a 1km radius.
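
The two criteria above can be sketched as a simple filter. The field names and sample values below are hypothetical and serve only to illustrate the logic; the notebook's actual data frame columns may differ.

```python
# Hypothetical field names and values, for illustration only.
demo_locations = [
    {"id": "A", "restaurants_within_1km": 0, "italian_within_1km": 0},
    {"id": "B", "restaurants_within_1km": 1, "italian_within_1km": 0},
    {"id": "C", "restaurants_within_1km": 2, "italian_within_1km": 0},
    {"id": "D", "restaurants_within_1km": 1, "italian_within_1km": 1},
]


def is_good_location(loc):
    """Apply both filtering criteria to one candidate location."""
    few_restaurants = loc["restaurants_within_1km"] < 2  # criterion 1
    no_italian = loc["italian_within_1km"] == 0          # criterion 2
    return few_restaurants and no_italian


good_ids = [loc["id"] for loc in demo_locations if is_good_location(loc)]
print(good_ids)
```

Location C fails the first criterion and D fails the second, so only A and B would be kept.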

These filtered locations were then clustered using k-means into 20 clusters, which we used as zones to show the general area of the good locations. From the centroids of these zones we derived addresses, yielding the final list of 20 locations suitable for further analysis. That analysis should account for additional factors in order to determine the viability of opening a successful Italian restaurant.

## Conclusion

The purpose of this project was to identify areas with a low number of restaurants, and especially of Italian restaurants, since low competition makes such areas more promising for opening a profitable Italian restaurant.

This was done by creating a grid of candidate neighborhoods and then extracting data on restaurants within these neighborhoods using the Foursquare API. Further analysis identified a region of interest, an area near Sarojini Nagar in South/South-East Delhi. Filtering within this region yielded good candidate locations, which were clustered into 20 zones of interest. The centers of these zones were reverse geocoded to obtain a list of 20 addresses that serve as the final deliverable of the project.

These addresses can now serve as an initial starting point for stakeholders to analyze the 20 areas further. This project has only considered restaurant density, identifying 20 areas with low competition from other restaurants and no competition from Italian restaurants. Additional factors, such as the existence of nearby hotels, popular attractions, real estate prices, and the socio-economic dynamics of the zones, should be taken into consideration to arrive at the final best area to open an Italian restaurant.