# The Battle of Neighborhoods (Week 1)

## Introduction

This project recommends a suitable location for opening an Italian restaurant in New Delhi.
Since Delhi already has many restaurants, we will look for locations that are not already crowded with restaurants. We are also particularly interested in areas with no Italian restaurants in the vicinity.


We will explore the neighborhoods and find the area best suited for our restaurant based on the above criteria. The advantages of each area will be stated clearly so that stakeholders can choose the best possible final location.


## Data Description

To find a suitable location for our restaurant, we need data on how crowded each neighborhood is with restaurants, so we need to collect this data for all the neighborhoods of Delhi. The dataset used in this project is taken from Kaggle (https://www.kaggle.com/shaswatd673/delhi-neighborhood-data).

Once we have collected all the data, the places will be ranked on the basis of nearby venues that contribute to our target audience. Based on the definition of our problem, the factors that will influence our decision are:
1. Number of restaurants in a neighborhood
2. Number of Italian restaurants in a neighborhood

Coordinates of the centre of New Delhi and of all the neighborhoods are obtained using the HERE API.

The number of restaurants in every neighborhood, along with their type and location, will be obtained using the Foursquare API.

## Explore Delhi

In [47]:
import pandas as pd
from pandas import DataFrame
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe
import json
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

In [36]:
delhi_df = pd.read_csv('datasets_512081_944688_delhi_dataSet.csv',index_col=0)

In [37]:
delhi_df.head()

Unnamed: 0,Borough,Neighborhood,latitude,longitude
0,North West Delhi,Adarsh Nagar,28.614192,77.071541
1,North West Delhi,Ashok Vihar,28.699453,77.184826
2,North West Delhi,Azadpur,28.707657,77.175547
3,North West Delhi,Bawana,28.79966,77.032885
4,North West Delhi,Begum Pur,,


In [38]:
delhi_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 185 entries, 0 to 184
Data columns (total 4 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Borough       185 non-null    object 
 1   Neighborhood  185 non-null    object 
 2   latitude      163 non-null    float64
 3   longitude     163 non-null    float64
dtypes: float64(2), object(2)
memory usage: 7.2+ KB


In [39]:
x_df = delhi_df[delhi_df.latitude.isnull()]
delhi_df = delhi_df.dropna()
delhi_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 163 entries, 0 to 184
Data columns (total 4 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Borough       163 non-null    object 
 1   Neighborhood  163 non-null    object 
 2   latitude      163 non-null    float64
 3   longitude     163 non-null    float64
dtypes: float64(2), object(2)
memory usage: 6.4+ KB


In [40]:
x_df.reset_index(drop = True, inplace = True)
x_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 22 entries, 0 to 21
Data columns (total 4 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Borough       22 non-null     object 
 1   Neighborhood  22 non-null     object 
 2   latitude      0 non-null      float64
 3   longitude     0 non-null      float64
dtypes: float64(2), object(2)
memory usage: 832.0+ bytes


In [144]:
url = "https://nominatim.openstreetmap.org/search.php?q=Delhi+India&polygon_geojson=1&format=json"
r = requests.get(url).json()
r

[{'place_id': 28251966,
  'licence': 'Data © OpenStreetMap contributors, ODbL 1.0. https://osm.org/copyright',
  'osm_type': 'node',
  'osm_id': 2702400314,
  'boundingbox': ['28.4917178', '28.8117178', '77.0619388', '77.3819388'],
  'lat': '28.6517178',
  'lon': '77.2219388',
  'display_name': 'Delhi, Kotwali Tehsil, Central Delhi, Delhi, 110006, India',
  'class': 'place',
  'type': 'city',
  'importance': 0.880289498654456,
  'icon': 'https://nominatim.openstreetmap.org/images/mapicons/poi_place_city.p.20.png',
  'geojson': {'type': 'Point', 'coordinates': [77.2219388, 28.6517178]}},
 {'place_id': 235691280,
  'licence': 'Data © OpenStreetMap contributors, ODbL 1.0. https://osm.org/copyright',
  'osm_type': 'relation',
  'osm_id': 1942586,
  'boundingbox': ['28.4046285', '28.8834464', '76.8388351', '77.346601'],
  'lat': '28.6273928',
  'lon': '77.1716954',
  'display_name': 'Delhi, India',
  'class': 'boundary',
  'type': 'administrative',
  'importance': 0.880289498654456,
  'icon

In [145]:
data = pd.DataFrame(r[1]['geojson']['coordinates'])
data = data.transpose()
data.head()

Unnamed: 0,0
0,"[76.8388351, 28.5732306]"
1,"[76.8388947, 28.5726624]"
2,"[76.8401246, 28.5714724]"
3,"[76.8401675, 28.5696275]"
4,"[76.8401913, 28.5695522]"


In [146]:
data.columns = ['location']
data.location=data.location.astype(str)
data['longitude'] = data.location.str.rsplit(',').str[0] 
data['latitude'] =  data.location.str.rsplit(',').str[-1] 
# regex=False: '[' and ']' are literal characters here, not regex patterns
data.longitude = data.longitude.str.replace('[','', regex=False)
data.latitude = data.latitude.str.replace(']','', regex=False)
data.location = data.location.str.replace('[','', regex=False)
data.location = data.location.str.replace(']','', regex=False)
data = data.astype({'longitude': float, 'latitude': float})
#check if any location of same name from other country is added
print(min(data.longitude),max(data.longitude),sep="-")
print(min(data.latitude),max(data.latitude),sep="-")
data['address'] = 0

76.8388351-77.346601
28.4046285-28.8834464
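
The string handling above can be avoided, because each entry of the GeoJSON ring is already a `[longitude, latitude]` pair. The following is a minimal sketch, assuming the geometry is a Polygon whose points arrive as numeric pairs; `ring_to_records` and `bounds` are hypothetical helper names, and the sample points are the first rows shown earlier.

```python
from typing import Dict, List, Sequence, Tuple


def ring_to_records(ring: Sequence[Sequence[float]]) -> List[Dict[str, float]]:
    """Turn GeoJSON [longitude, latitude] pairs into one record per point."""
    return [{"longitude": float(lon), "latitude": float(lat)} for lon, lat in ring]


def bounds(records: List[Dict[str, float]]) -> Tuple[float, float, float, float]:
    """Return (min_lon, max_lon, min_lat, max_lat) as a sanity check on the region."""
    lons = [r["longitude"] for r in records]
    lats = [r["latitude"] for r in records]
    return min(lons), max(lons), min(lats), max(lats)


# First points of the Delhi boundary shown earlier in this notebook.
sample_ring = [
    [76.8388351, 28.5732306],
    [76.8388947, 28.5726624],
    [76.8401246, 28.5714724],
]
records = ring_to_records(sample_ring)
print(bounds(records))
```

Passing `records` to `pd.DataFrame` would then give numeric `longitude` and `latitude` columns without any string parsing.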


In [147]:
data.head()

Unnamed: 0,location,longitude,latitude,address
0,"76.8388351, 28.5732306",76.838835,28.573231,0
1,"76.8388947, 28.5726624",76.838895,28.572662,0
2,"76.8401246, 28.5714724",76.840125,28.571472,0
3,"76.8401675, 28.5696275",76.840168,28.569627,0
4,"76.8401913, 28.5695522",76.840191,28.569552,0


In [150]:
from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent="delhi_locations")
for i in range(data.shape[0]):
    latitude = data.iloc[i]['latitude']
    longitude = data.iloc[i]['longitude']
    location = geolocator.reverse((latitude,longitude),timeout=15)
    data.loc[data.index[i], 'address'] = location.address  # single .loc call avoids chained assignment

In [151]:
data.head()

Unnamed: 0,location,longitude,latitude,address
0,"76.8388351, 28.5732306",76.838835,28.573231,"Issapur, Najafgarh Tehsil, South West Delhi, D..."
1,"76.8388947, 28.5726624",76.838895,28.572662,"Issapur, Najafgarh Tehsil, South West Delhi, D..."
2,"76.8401246, 28.5714724",76.840125,28.571472,"Najafgarh Tehsil, South West Delhi, Delhi, 110..."
3,"76.8401675, 28.5696275",76.840168,28.569627,"Najafgarh Tehsil, South West Delhi, Delhi, 110..."
4,"76.8401913, 28.5695522",76.840191,28.569552,"Najafgarh Tehsil, South West Delhi, Delhi, 110..."


In [152]:
print(data.iloc[2]['address'])

Najafgarh Tehsil, South West Delhi, Delhi, 110073, India


In [161]:
delhi_data = data.copy()
# delhi_data.drop_duplicates(subset = 'address')
delhi_data.reset_index(drop = True, inplace = True)
# delhi_data.head()
delhi_str =  delhi_data.address.str.rsplit(',')
delhi_data['Borough'] =  delhi_str.str[-4] 
delhi_data['Neighborhood'] = delhi_str.str[-5] 
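
Reading fields from the right-hand end of the address only works when every address ends with the same components. The sketch below shows the idea on the address printed earlier, assuming the trailing components are borough, state, postcode and country; `split_address` is a hypothetical helper, and it also strips the space that follows each comma.

```python
from typing import Optional, Tuple


def split_address(address: str) -> Tuple[Optional[str], Optional[str]]:
    """Split a comma-separated address and read borough and neighborhood from the right."""
    parts = [part.strip() for part in address.split(",")]
    borough = parts[-4] if len(parts) >= 4 else None
    neighborhood = parts[-5] if len(parts) >= 5 else None
    return borough, neighborhood


example = "Najafgarh Tehsil, South West Delhi, Delhi, 110073, India"
print(split_address(example))
```

Addresses with fewer components return `None` instead of raising, which makes rows with an unexpected shape easy to spot.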

In [165]:
from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent="delhi_locations")
location = geolocator.geocode('Delhi, IN')
latitude = location.latitude
longitude = location.longitude
print(latitude,longitude)

28.6517178 77.2219388


In [166]:
import folium
map_delhi = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(delhi_data['latitude'], delhi_data['longitude'], delhi_data['Borough'], delhi_data['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_delhi)  
    
map_delhi

In [213]:
overpass_url = "http://overpass-api.de/api/interpreter"
overpass_query = """
[out:json];
area["ISO3166-2" = "IN-DL"][admin_level = 4];
node["place"="suburb"](area);
out;
"""
response = requests.get(overpass_url, 
                        params={'data': overpass_query})
data_suburb = response.json()
delhi_suburb = json_normalize(data_suburb['elements'])
delhi_suburb.filter(['lat','lon','tags.name'])

Unnamed: 0,lat,lon,tags.name
0,28.529249,77.154134,Vasant Kunj
1,28.591893,77.082824,Palam
2,28.565692,77.174646,Ramakrishna Puram
3,28.574157,77.19537,Sarojini Nagar
4,28.641499,77.214061,Paharganj
5,28.594677,77.188521,Chanakyapuri
6,28.669578,77.095956,Paschim Vihar
7,28.716209,77.117074,Rohini
8,28.512633,77.267031,Tughlakabad
9,28.49421,77.306863,Badarpur


In [220]:
overpass_url = "http://overpass-api.de/api/interpreter"
overpass_query = """
[out:json];
area["ISO3166-2" = "IN-DL"][admin_level = 4];
node["place"="neighbourhood"](area);
out;
"""
response = requests.get(overpass_url, 
                        params={'data': overpass_query})
data_neighbourhood = response.json()
delhi_neighbourhood = json_normalize(data_neighbourhood['elements'])
delhi_neighbourhood.filter(['lat','lon','tags.name'])

Unnamed: 0,lat,lon,tags.name
0,28.522712,77.223087,Pushp Vihar
1,28.592966,77.210903,Safdarjung
2,28.519212,77.236210,Ambedkar Nagar
3,28.601645,77.242118,Sunder Nagar
4,28.595970,77.231163,Golf Links
...,...,...,...
63,28.531664,77.259252,Navjeevan Camp
64,28.502772,77.202146,Indira Enclave
65,28.623135,77.296685,West Vinod Nagar
66,28.498548,77.191011,Tara Market


In [215]:
overpass_url = "http://overpass-api.de/api/interpreter"
overpass_query = """
[out:json];
area["ISO3166-2" = "IN-DL"][admin_level = 4];
node["place"="hamlet"](area);
out;
"""
response = requests.get(overpass_url, 
                        params={'data': overpass_query})
data_hamlet = response.json()
delhi_hamlet = json_normalize(data_hamlet['elements'])
delhi_hamlet.filter(['lat','lon','tags.name'])

Unnamed: 0,lat,lon,tags.name
0,28.51331,77.195854,Saidulajaib Extension


In [216]:
overpass_url = "http://overpass-api.de/api/interpreter"
overpass_query = """
[out:json];
area["ISO3166-2" = "IN-DL"][admin_level = 4];
node["place"="village"](area);
out;
"""
response = requests.get(overpass_url, 
                        params={'data': overpass_query})
data_village = response.json()
delhi_village = json_normalize(data_village['elements'])
delhi_village.filter(['lat','lon','tags.name'])

Unnamed: 0,lat,lon,tags.name
0,28.521826,77.178323,Mehrauli
1,28.540007,77.119775,Rangpuri
2,28.435746,77.175898,Dera
3,28.52008,76.984824,Nankheri
4,28.589861,76.920931,Jaffarpur Kalan
5,28.852817,77.076721,Banker
6,28.851161,77.06445,Lampur
7,28.613759,77.284866,Samaspur
8,28.427953,77.188803,Bhati Kalan
9,28.562304,77.00371,Chhawala


In [225]:
overpass_url = "http://overpass-api.de/api/interpreter"
overpass_query = """
[out:json];
area["ISO3166-2" = "IN-DL"][admin_level = 4];
node["place"="quarter"](area);
out;
"""
response = requests.get(overpass_url, 
                        params={'data': overpass_query})
data_quarter = response.json()
delhi_quarter = json_normalize(data_quarter['elements'])
delhi_quarter.filter(['lat','lon','tags.name'])

Unnamed: 0,lat,lon,tags.name
0,28.631383,77.219792,Connaught Place
1,28.658172,77.219512,Naya Bazaar
2,28.665567,77.228819,Kashmere Gate
3,28.53707,77.261805,Kalkaji
4,28.572209,77.16618,Sector 8
5,28.656953,77.294718,East Arjun Nagar
6,28.690321,77.29025,West Jyoti Nagar
7,28.666452,77.265961,Ajit Nagar
8,28.666989,77.268288,Old Selampur West
9,28.666972,77.260872,Valmiki Colony


In [235]:
overpass_url = "http://overpass-api.de/api/interpreter"
overpass_query = """
[out:json];
area["ISO3166-2" = "IN-DL"][admin_level = 4];

node["place"="subarea"](area);
out;
"""
response = requests.get(overpass_url, 
                        params={'data': overpass_query})
data_city_block = response.json()
data_city_block
# delhi_city_block = json_normalize(data_city_block['elements'])
# delhi_city_block.filter(['lat','lon','tags.name'])

{'version': 0.6,
 'generator': 'Overpass API 0.7.56.3 eb200aeb',
 'osm3s': {'timestamp_osm_base': '2020-06-10T17:49:03Z',
  'timestamp_areas_base': '2020-06-10T17:01:03Z',
  'copyright': 'The data included in this document is from www.openstreetmap.org. The data is made available under ODbL.'},
 'elements': []}

In [236]:
total_data = pd.concat([delhi_quarter,delhi_village,delhi_hamlet,delhi_neighbourhood,delhi_suburb])
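
The Overpass queries above differ only in the value of the `place` tag, so they could be generated from one template. This is a minimal sketch; `build_place_query` is a hypothetical helper, and the region code and admin level are the ones used in the cells above.

```python
def build_place_query(place_type: str, region: str = "IN-DL", admin_level: int = 4) -> str:
    """Build an Overpass QL query for all nodes tagged place=<place_type> in a region."""
    return (
        "[out:json];\n"
        f'area["ISO3166-2"="{region}"][admin_level={admin_level}];\n'
        f'node["place"="{place_type}"](area);\n'
        "out;\n"
    )


# The place types that returned results in the cells above.
place_types = ["quarter", "village", "hamlet", "neighbourhood", "suburb"]
queries = {place: build_place_query(place) for place in place_types}
print(queries["suburb"])
```

Each query string can then be sent exactly as before, with `requests.get(overpass_url, params={'data': query})`.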

In [237]:
import folium
map_delhi = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, name in zip(total_data['lat'], total_data['lon'], total_data['tags.name']):
    label = str(name)  # show the place name instead of a fixed placeholder
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_delhi)  
    
map_delhi

In [239]:
overpass_url = "http://overpass-api.de/api/interpreter"
overpass_query = """
[out:json];
area["ISO3166-2" = "IN-DL"][admin_level = 4];

node["amenity"="restaurant"](area);
out;
"""
response = requests.get(overpass_url, 
                        params={'data': overpass_query})
data_restaurant = response.json()
data_restaurant_df = json_normalize(data_restaurant['elements'])

In [240]:
data_restaurant_df.head()

Unnamed: 0,type,id,lat,lon,tags.amenity,tags.created_by,tags.name,tags.cuisine,tags.outdoor_seating,tags.tourism,...,tags.branch,tags.name:bn,tags.addr:block,tags.short_name,tags.drink:shake,tags.seats,tags.addr:district,tags.addr:subdistrict,tags.instagram,tags.diet:lacto_vegetarian
0,node,308894803,28.606946,77.229919,restaurant,Potlatch 0.10f,,,,,...,,,,,,,,,,
1,node,438049083,28.566347,77.234057,restaurant,,Flavours Restaurant,,,,...,,,,,,,,,,
2,node,449452338,28.518931,77.161168,restaurant,,,,,,...,,,,,,,,,,
3,node,449453229,28.522664,77.165571,restaurant,,Nirula's,,,,...,,,,,,,,,,
4,node,495944527,28.55002,77.251128,restaurant,,Subway,sandwich,,,...,,,,,,,,,,


In [241]:
data_restaurant_df.filter(['lat','lon','tags.amenity','tags.name','tags.cuisine'])

Unnamed: 0,lat,lon,tags.amenity,tags.name,tags.cuisine
0,28.606946,77.229919,restaurant,,
1,28.566347,77.234057,restaurant,Flavours Restaurant,
2,28.518931,77.161168,restaurant,,
3,28.522664,77.165571,restaurant,Nirula's,
4,28.550020,77.251128,restaurant,Subway,sandwich
...,...,...,...,...,...
362,28.519133,77.206324,restaurant,Jai Shri Krishna,
363,28.529151,77.213798,restaurant,Teekoy Kerala Restaurant,
364,28.641159,77.213401,restaurant,Singh’s café & restaurant,
365,28.570404,77.242546,restaurant,,


In [242]:
import folium
map_delhi = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, name in zip(data_restaurant_df['lat'], data_restaurant_df['lon'], data_restaurant_df['tags.name']):
    label = name if isinstance(name, str) else 'Unnamed restaurant'  # many nodes have no name tag
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_delhi)  
    
map_delhi

In [41]:
URL = "https://geocode.search.hereapi.com/v1/geocode"

api_key = 'oai5nFUpkH3IJqs1BpMXDj4cTQrpeBGstkMHan6Y2m0' # Acquire from developer.here.com
for i in range(0,22):
    location = x_df.Neighborhood.iloc[[i]] + ', Delhi, India' #taking user input
    PARAMS = {'apikey':api_key,'q':location}


# sending get request and saving the response as response object 
    r = requests.get(url = URL, params = PARAMS) 
    data = r.json()

    x_df.latitude.iloc[[i]] = data['items'][0]['position']['lat']
    x_df.longitude.iloc[[i]] = data['items'][0]['position']['lng']

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  self._setitem_with_indexer(indexer, value)


IndexError: list index out of range
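
The `IndexError` is raised when the geocoder returns an empty `items` list for a neighborhood, which stops the loop and leaves the remaining rows unfilled. A minimal sketch of a guard follows; `first_position` is a hypothetical helper, and the payloads are hand-written in the shape the loop reads, not real API output.

```python
from typing import Optional, Tuple


def first_position(payload: dict) -> Optional[Tuple[float, float]]:
    """Return (lat, lng) of the first geocoding hit, or None when there is no usable hit."""
    items = payload.get("items") or []
    if not items:
        return None
    position = items[0].get("position", {})
    if "lat" not in position or "lng" not in position:
        return None
    return position["lat"], position["lng"]


# Hand-written payloads for illustration only.
hit = {"items": [{"position": {"lat": 28.7333, "lng": 77.06542}}]}
miss = {"items": []}
print(first_position(hit), first_position(miss))
```

Inside the loop, a `None` result could simply be skipped with `continue`, so that one unmatched neighborhood does not prevent the later ones from being geocoded.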

In [42]:
x_df

Unnamed: 0,Borough,Neighborhood,latitude,longitude
0,North West Delhi,Begum Pur,28.7333,77.06542
1,North West Delhi,Rohini Sub City,28.73356,77.10401
2,North Delhi,Ghantewala,28.65638,77.2313
3,North Delhi,Gulabi Bagh,28.67546,77.18864
4,North Delhi,Sadar Bazaar,28.59036,77.12018
5,North Delhi,Tees Hazari,28.66751,77.21563
6,North East Delhi,New Usmanpur,28.68249,77.25651
7,North East Delhi,Sadatpur,,
8,Central Delhi,Rajender Nagar,,
9,Central Delhi,Sadar Bazaar,,


In [10]:
delhi_df = pd.concat([delhi_df,x_df])
delhi_df.reset_index(drop = True, inplace = True)

In [None]:
delhi_df = delhi_df[delhi_df['latitude']<29]
delhi_df = delhi_df[delhi_df['longitude']>77]
delhi_df.info()
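
The two thresholds above only cut off the north and the west. As an alternative sketch, the bounding box that Nominatim returned for "Delhi, India" earlier could be used on all four sides, assuming its order is south, north, west, east; `in_bounding_box` is a hypothetical helper.

```python
from typing import Sequence


def in_bounding_box(lat: float, lon: float, bbox: Sequence[str]) -> bool:
    """Check a point against a [south, north, west, east] bounding box."""
    south, north, west, east = (float(value) for value in bbox)
    return south <= lat <= north and west <= lon <= east


# Bounding box of the "Delhi, India" relation from the Nominatim response above.
delhi_bbox = ['28.4046285', '28.8834464', '76.8388351', '77.346601']

print(in_bounding_box(28.699453, 77.184826, delhi_bbox))  # Ashok Vihar, from delhi_df.head()
print(in_bounding_box(19.0, 72.8, delhi_bbox))            # made-up point far outside Delhi
```

A bounding box is still only a rectangle around the region, so points just outside the actual boundary can pass this check.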

In [67]:
location =  'Delhi, India' #taking user input
PARAMS = {'apikey':api_key,'q':location}


# sending get request and saving the response as response object 
r = requests.get(url = URL, params = PARAMS) 
data = r.json()

latitude = data['items'][0]['position']['lat']
longitude = data['items'][0]['position']['lng']

Let's plot all our neighborhoods on a map using the folium package. Our map is centered on New Delhi.

In [None]:
import folium
map_delhi = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, borough, neighborhood in zip(delhi_df['latitude'], delhi_df['longitude'], delhi_df['Borough'], delhi_df['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_delhi)  
    
map_delhi

To get the venues, we use the Foursquare API.

Since we want to open a restaurant we would like to explore the food sector.

In [None]:
CLIENT_ID = 'VD3ROH2EDENTX1EC3JPHS0DQH4ZNQZUJ2N2TZXV15EWA21EP'
CLIENT_SECRET= 'XTNHJ1BYTDO41NXHBYCANPRU122G02PDTPUPMT3SEGJV1IKA'
VERSION = '20180605' 
category = '4d4b7105d754a06374d81259'
LIMIT = 100
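
API credentials are usually kept out of a shared notebook. One common approach, sketched here under the assumption that the secrets are exported as environment variables, is to read them at run time; the variable name `FOURSQUARE_CLIENT_ID` and the helper `read_credential` are hypothetical.

```python
import os


def read_credential(name: str) -> str:
    """Read a secret from the environment and fail early if it is missing."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Set the {name} environment variable before running the notebook")
    return value


# Placeholder so the sketch runs on its own; in practice, export the variable in the shell.
os.environ.setdefault("FOURSQUARE_CLIENT_ID", "your-client-id")
CLIENT_ID = read_credential("FOURSQUARE_CLIENT_ID")
```

The same helper could be used for the client secret and for the geocoding API key.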

In [None]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&categoryId={}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng,
            category,
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
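
The list comprehension in `getNearbyVenues` reads `categories[0]` for every venue, which would fail for a venue that has no category. A minimal sketch of a more tolerant row builder follows; `flatten_venues` is a hypothetical helper, and the items are hand-written in the shape the function above reads, not real API output.

```python
from typing import List, Tuple


def flatten_venues(name: str, lat: float, lng: float, items: List[dict]) -> List[Tuple]:
    """Build one row per venue, using 'Unknown' when a venue has no category."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else "Unknown"
        rows.append((
            name, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows


# Hand-written items for illustration only.
sample_items = [
    {"venue": {"name": "Venue A", "location": {"lat": 28.70, "lng": 77.18},
               "categories": [{"name": "Indian Restaurant"}]}},
    {"venue": {"name": "Venue B", "location": {"lat": 28.71, "lng": 77.19},
               "categories": []}},
]
rows = flatten_venues("Ashok Vihar", 28.699453, 77.184826, sample_items)
print(rows)
```

The rows have the same seven fields, in the same order, as the columns assigned to `nearby_venues`.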

In [None]:
delhi_venues = getNearbyVenues(names=delhi_df['Neighborhood'],
                                   latitudes=delhi_df['latitude'],
                                   longitudes=delhi_df['longitude']
                                  )
delhi_venues.groupby('Neighborhood').count()

In [None]:
delhi_venues

#### We can visualize all the venues using the seaborn library

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt
g = sns.catplot(x="Venue Category",kind="count", data = delhi_venues,palette="ch:.25")
g.fig.set_size_inches(20,15)
g.set_xticklabels(rotation=90)

Now let's drop all the neighborhoods whose venue details are not in our data frame.

In [None]:
venues_grouped = delhi_venues.groupby('Neighborhood').count().filter(['Neighborhood','Venue Category'])
delhi_df_new = delhi_df.join(venues_grouped, on='Neighborhood')
delhi_df_new.dropna(inplace = True)
delhi_df_new.drop(['Venue Category'],axis =1 ,inplace = True)
delhi_df_new.info()

In [None]:
# delhi_res_df = delhi_venues[delhi_venues['Venue Category'].str.contains("Restaurant")]
delhi_res_df = delhi_venues


In [None]:
res_df = delhi_res_df.groupby('Neighborhood').count().filter(['Neighborhood','Venue Category'])
res_df

Now let's join the per-neighborhood venue counts to the initial dataset we had.

In [None]:
delhi_merged_df = delhi_df_new
delhi_merged_df = delhi_merged_df.join(res_df, on='Neighborhood')


All the places that don't have any venue nearby will have NaN values. So let's fill all the null values with 0 and add the number of venues in each neighborhood to our dataframe.

In [None]:
delhi_merged_df = delhi_merged_df.fillna(0)
delhi_merged_df['Venue Category'] = delhi_merged_df['Venue Category'].astype('int64')
delhi_merged_df = delhi_merged_df.rename(columns={"Venue Category": "Number of Venues"})

In [None]:
delhi_merged_df.head()

Once we have organized our data, we need to one-hot encode it so that we can apply our k-means clustering model to the data.


In [None]:
delhi_onehot_df = pd.get_dummies(delhi_res_df[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
delhi_onehot_df['Neighborhood'] = delhi_res_df['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [delhi_onehot_df.columns[-1]] + list(delhi_onehot_df.columns[:-1])
delhi_onehot_df = delhi_onehot_df[fixed_columns]
delhi_onehot_df.head()

In [None]:
delhi_grouped_df = delhi_onehot_df.groupby('Neighborhood').mean().reset_index()
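
Taking the mean of one-hot columns per neighborhood gives, for each category, the share of that neighborhood's venues that fall in it. The sketch below computes the same shares directly with the standard library, on made-up venues; `category_frequencies` is a hypothetical helper. Unlike the one-hot frame, it lists only the categories that actually occur in a neighborhood.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def category_frequencies(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """Map each neighborhood to {category: share of its venues}."""
    counts = defaultdict(Counter)
    for neighborhood, category in pairs:
        counts[neighborhood][category] += 1
    return {
        hood: {cat: n / sum(counter.values()) for cat, n in counter.items()}
        for hood, counter in counts.items()
    }


# Made-up (neighborhood, category) pairs, for illustration only.
venues = [
    ("A", "Indian Restaurant"),
    ("A", "Indian Restaurant"),
    ("A", "Café"),
    ("A", "Pizza Place"),
    ("B", "Fast Food Restaurant"),
]
freqs = category_frequencies(venues)
print(freqs["A"])
```

These shares are the feature values that the clustering step later receives for each neighborhood.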

Now let's add the 10 most common venue categories of each area to the dataframe, so that it becomes easy for us to check the frequency of different types of restaurants in each neighborhood.

In [None]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
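
`return_most_common_venues` always returns `num_top_venues` names. When a neighborhood has fewer distinct categories than that, and the frame has more category columns, the trailing slots are filled with categories whose frequency is 0. A minimal sketch of a variant that skips those follows; `top_categories` is a hypothetical helper and the frequencies are made up.

```python
from typing import Dict, List


def top_categories(frequencies: Dict[str, float], num_top_venues: int) -> List[str]:
    """Return up to num_top_venues categories, most frequent first, skipping zeros."""
    present = [(cat, freq) for cat, freq in frequencies.items() if freq > 0]
    ranked = sorted(present, key=lambda pair: pair[1], reverse=True)
    return [cat for cat, _ in ranked[:num_top_venues]]


# Made-up frequencies for one neighborhood, for illustration only.
row = {
    "Indian Restaurant": 0.5,
    "Café": 0.25,
    "Pizza Place": 0.25,
    "Italian Restaurant": 0.0,
}
print(top_categories(row, 10))
```

With this variant, a category appears in a "Most Common Venue" slot only if at least one such venue was actually found nearby.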

In [None]:
import numpy as np
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted_df = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted_df['Neighborhood'] = delhi_grouped_df['Neighborhood']

for ind in np.arange(delhi_grouped_df.shape[0]):
    neighborhoods_venues_sorted_df.iloc[ind, 1:] = return_most_common_venues(delhi_grouped_df.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted_df.head()

Apply our k-means model to the data. We will make 4 clusters from our data.

In [None]:
k_clusters = 4

delhi_res_clustering = delhi_grouped_df.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=k_clusters, random_state=0).fit(delhi_res_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

Now add the cluster label to all the neighborhoods to plot them on a map.

In [None]:
neighborhoods_venues_sorted_df.insert(0, 'Cluster Labels', kmeans.labels_)

delhi_merged_res = delhi_merged_df


delhi_merged_res = delhi_merged_res.join(neighborhoods_venues_sorted_df.set_index('Neighborhood'), on='Neighborhood')



delhi_merged_res=delhi_merged_res.fillna(0)
delhi_merged_res['Cluster Labels'] = delhi_merged_res['Cluster Labels'].astype('int64')

In [None]:
delhi_merged_res = delhi_merged_res.drop_duplicates()  # drop_duplicates returns a new frame
delhi_merged_res.reset_index(drop = True, inplace = True)

Once we have all the data in one dataframe, we can visualize it.

In [None]:
g = sns.catplot(x="1st Most Common Venue",kind="count", data = delhi_merged_res,palette="ch:.25")
g.fig.set_size_inches(15,15)
g.set_xticklabels(rotation=90)

From the above plot we can clearly see that Indian Restaurants are the highest in number. Italian restaurants are few and are not the 1st most common venue of any neighborhood.

#### Make a heatmap to visualize the areas. It will clearly show the areas that are more crowded and that are less crowded.

In [None]:
from folium import plugins
from folium.plugins import HeatMap
restaurant_latlons = [[res[2], res[3]] for res in delhi_merged_res.values]
map_delhi = folium.Map(location=[latitude, longitude], zoom_start=11.2)
folium.TileLayer('cartodbpositron').add_to(map_delhi) #cartodbpositron cartodbdark_matter
HeatMap(restaurant_latlons).add_to(map_delhi)
folium.Marker([latitude, longitude]).add_to(map_delhi)
folium.Circle([latitude, longitude], radius=3000, fill=False, color='white').add_to(map_delhi)
folium.Circle([latitude, longitude], radius=7000, fill=False, color='white').add_to(map_delhi)
folium.Circle([latitude, longitude], radius=10000, fill=False, color='white').add_to(map_delhi)
map_delhi

Now let's plot our neighborhoods on the map according to their cluster and compare the position of the clusters with the heatmap.

In [None]:
map_restaurants = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(k_clusters)
ys = [i + x + (i*x)**2 for i in range(k_clusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = ['red','#03F7ED','#76F91B','blue']

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(delhi_merged_res['latitude'], delhi_merged_res['longitude'], delhi_merged_res['Neighborhood'], delhi_merged_res['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_restaurants)
       
map_restaurants

From the heatmap we can clearly see that cluster 0 is mainly crowded with restaurants and cluster 1 is relatively less crowded.

Now let's divide the dataframe on the basis of clusters, so that we can analyse characteristics of each cluster.

In [None]:
df_0 = delhi_merged_res.loc[delhi_merged_res['Cluster Labels'] == 0,delhi_merged_res.columns[[1] + list(range(4, delhi_merged_res.shape[1]))]]
df_0.reset_index(drop = True, inplace = True)

In [None]:
df_1 = delhi_merged_res.loc[delhi_merged_res['Cluster Labels'] == 1,delhi_merged_res.columns[[1] + list(range(4, delhi_merged_res.shape[1]))]]
df_1.reset_index(drop = True, inplace = True)

# Analyzing Clusters

In [None]:
plt.figure(figsize=(15,8))
g = sns.countplot(x="Number of Venues", data = delhi_merged_res,palette="ch:2,r=.2,l=.6")
g.set(xlabel='Delhi Venues')


In [None]:
fig, axs = plt.subplots(ncols= 2,figsize=(20,8))

h = sns.countplot(x="Number of Venues", data = df_0,palette="ch:.25", ax = axs[0])
i = sns.countplot(x="Number of Venues", data = df_1,palette="ch:.25",ax = axs[1])

plt.suptitle("Comparison of Number of Venues Among Clusters", fontsize=20)
h.set(xlabel='df_0')
i.set(xlabel='df_1')



In [None]:
df = pd.DataFrame([('df_0',df_0.shape[0]),('df_1',df_1.shape[0])])
df.columns = ["Cluster","No. of Neighborhoods"]
u = sns.barplot(x="Cluster",y = "No. of Neighborhoods",data = df,palette = 'ch:2,r=.2,l=.6')

In [None]:

fig, axs = plt.subplots(ncols =2,figsize = (20,15))
p = sns.countplot(x='1st Most Common Venue', data=df_0, ax=axs[0])
q = sns.countplot(x='1st Most Common Venue', data=df_1, ax=axs[1])

p.set(xlabel='df_0')
q.set(xlabel='df_1')

plt.suptitle("Comparison of 1st common place Among Clusters", fontsize=20)
for ax in fig.axes:
    plt.sca(ax)
    plt.xticks(rotation=90)
plt.subplots_adjust(hspace = 0.6)

Cluster 0 seems to have more foreign food venues than cluster 1.

So now we will analyse the clusters separately.

Since most neighborhoods have 4 venues, we will check the 4 most common venues of each cluster.

# df_0

In [None]:
fig, (ax1, ax2) = plt.subplots(2,2,figsize = (20,10))
p = sns.countplot(x='1st Most Common Venue', data=df_0, ax=ax1[0])
q = sns.countplot(x='2nd Most Common Venue', data=df_0, ax=ax1[1])
r = sns.countplot(x='3rd Most Common Venue',data=df_0, ax=ax2[0])
s = sns.countplot(x='4th Most Common Venue',data=df_0, ax=ax2[1])
for ax in fig.axes:
    plt.sca(ax)
    plt.xticks(rotation=90)
  
plt.subplots_adjust(hspace = 1)

# df_1

In [None]:
fig, (ax1, ax2) = plt.subplots(2,2,figsize = (20,10))
p = sns.countplot(x='1st Most Common Venue', data=df_1, ax=ax1[0])
q = sns.countplot(x='2nd Most Common Venue', data=df_1, ax=ax1[1])
r = sns.countplot(x='3rd Most Common Venue',data=df_1, ax=ax2[0])
s = sns.countplot(x='4th Most Common Venue',data=df_1, ax=ax2[1])
for ax in fig.axes:
    plt.sca(ax)
    plt.xticks(rotation=90)
plt.subplots_adjust(hspace = 1)

From these we can conclude that Cluster 0 neighborhoods have a variety of restaurants. Overall, the majority of restaurants are fast food restaurants. Italian restaurants are few in number, but still more numerous than in cluster 1.

Cluster 1 neighborhoods have a majority of Indian Restaurants, and there are no Italian restaurants at all among the 4 most common venues. This cluster fits our requirements well.

Since we want to consider areas with fewer restaurants, we will only look at areas that have fewer than 2 venues in cluster 0 and at most 2 venues in cluster 1.

In [None]:
x_0 = df_0[df_0['Number of Venues']<2] #since these have more foreign restaurants, we will go for less than 2
x_1 = df_1[df_1['Number of Venues']<=2]

In [None]:
x_0

We will keep only those neighborhoods whose most common venue is not a foreign (American, Japanese or French) restaurant.

In [None]:
# keep rows whose most common venue is not one of these foreign cuisines
foreign = ['American Restaurant', 'Japanese Restaurant', 'French Restaurant']
x_0 = x_0[~x_0['1st Most Common Venue'].isin(foreign)]

In [None]:
x_1

We can see that there are no Italian Restaurants in these areas. They are also not crowded with restaurants, so these areas will be suitable for our restaurant.

In [None]:
final_df = pd.concat([x_0,x_1])
final_df.reset_index(drop = True, inplace = True)

In [None]:
final_df.filter(['Neighborhood', 'Number of Venues'])

These neighborhoods will be suitable for our restaurant.