# Coursera Capstone Week 5 Project

*Opening a New Indian Restaurant in Delhi, India*

**Objectives**
1. Build a dataframe of neighborhoods in Delhi, India by web scraping the data from a Wikipedia page
2. Get the geographical coordinates of the neighborhoods
3. Obtain the venue data for the neighborhoods from the Foursquare API
4. Explore and cluster the neighborhoods
5. Select the best cluster to open a new Indian Restaurant


### Importing Libraries

In [3]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

import json # library to handle JSON files

!pip install geocoder 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import geocoder # to get coordinates

import requests # library to handle requests
from bs4 import BeautifulSoup # library to parse HTML and XML documents

from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!pip install folium 
import folium # map rendering library
import re


### Web Scraping Data from a Wikipedia Page into a pandas DataFrame

In [4]:
# send the GET request
data = requests.get("https://en.wikipedia.org/wiki/Neighbourhoods_of_Delhi").text
soup = BeautifulSoup(data, 'html.parser')
neighborhoodList=[]
for row in soup.find("div", class_="toc").findAll("li"):
    neighborhoodList.append(row.text)
neighborhoodList = neighborhoodList[:9]
for i in range(0,len(neighborhoodList)):
    neighborhoodList[i] = neighborhoodList[i].replace(str(i+1)+' ','')
df = pd.DataFrame({"Neighborhood": neighborhoodList})
df.head()

Unnamed: 0,Neighborhood
0,North West Delhi
1,North Delhi
2,North East Delhi
3,Central Delhi
4,New Delhi
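
The loop above removes the numbering by replacing the string `str(i+1) + ' '`, which relies on each entry's position in the list matching its number. A minimal alternative sketch, assuming the table-of-contents entries always start with a number such as `1` or `3.2` followed by whitespace, strips that prefix with a regular expression instead. The helper name `strip_toc_number` and the sample entries are hypothetical and not part of the original notebook.

```python
import re

def strip_toc_number(text):
    """Remove a leading TOC number such as '1 ' or '3.2 ' from a heading."""
    # ^\d+(\.\d+)* matches '1', '3.2', '10.1.4', ...; \s+ consumes the separator
    return re.sub(r'^\d+(\.\d+)*\s+', '', text.strip())

# entries in the format the scraping cell appears to receive (assumed)
raw_entries = ['1 North West Delhi', '2 North Delhi', '9 West Delhi']
cleaned = [strip_toc_number(entry) for entry in raw_entries]
print(cleaned)
```

Because the pattern is anchored at the start of the string, digits that appear later in a name are left untouched.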


### Getting the Co-ordinates of the Neighborhoods 

In [5]:
def get_latlng(neighborhood):
    # initialize your variable to None
    lat_lng_coords = None
    # loop until you get the coordinates
    while(lat_lng_coords is None):
        g = geocoder.arcgis('{}, Delhi, India'.format(neighborhood))
        lat_lng_coords = g.latlng
    return lat_lng_coords

coords = [ get_latlng(neighborhood) for neighborhood in df["Neighborhood"].tolist() ]
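
The `while` loop in `get_latlng` repeats until a result arrives, so it never terminates if the geocoder keeps returning `None` for an address. A minimal sketch of a bounded retry follows. `fetch_with_retries`, `max_attempts`, and `delay_seconds` are hypothetical names, and the stub `fake_geocode` stands in for the `geocoder.arcgis(...).latlng` lookup so the example runs without network access.

```python
import time

def fetch_with_retries(fetch, max_attempts=5, delay_seconds=0.0):
    """Call fetch() until it returns a non-None value or attempts run out."""
    for _ in range(max_attempts):
        result = fetch()
        if result is not None:
            return result
        time.sleep(delay_seconds)  # pause before the next attempt
    return None  # the caller decides how to handle a missing coordinate

# stub lookup (assumption): fails twice, then returns a coordinate pair
calls = {'count': 0}

def fake_geocode():
    calls['count'] += 1
    return None if calls['count'] < 3 else [28.64902, 77.19319]

result_coords = fetch_with_retries(fake_geocode)
print(result_coords, calls['count'])
```

Returning `None` after the last attempt makes a failed lookup visible in the resulting dataframe instead of hanging the cell.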

In [6]:
# create temporary dataframe to populate the coordinates into Latitude and Longitude
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])
# merge the coordinates into the original dataframe
df['Latitude'] = df_coords['Latitude']
df['Longitude'] = df_coords['Longitude']

In [7]:
df

Unnamed: 0,Neighborhood,Latitude,Longitude
0,North West Delhi,28.656625,77.163057
1,North Delhi,28.656625,77.163057
2,North East Delhi,28.6341,77.21689
3,Central Delhi,28.64902,77.19319
4,New Delhi,28.63095,77.21721
5,East Delhi,28.672028,77.14721
6,South Delhi,28.55065,77.25187
7,South West Delhi,28.56788,77.18912
8,West Delhi,28.6372,77.28752
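
In the table above, North West Delhi and North Delhi carry identical coordinates (28.656625, 77.163057), which may indicate that the geocoder resolved both queries to the same point. A short sketch of a duplicate check follows, using plain Python and the values copied from the table; the variable names are illustrative.

```python
from collections import defaultdict

# (neighborhood, latitude, longitude) rows copied from the table above
rows = [
    ('North West Delhi', 28.656625, 77.163057),
    ('North Delhi', 28.656625, 77.163057),
    ('North East Delhi', 28.6341, 77.21689),
    ('Central Delhi', 28.64902, 77.19319),
    ('New Delhi', 28.63095, 77.21721),
    ('East Delhi', 28.672028, 77.14721),
    ('South Delhi', 28.55065, 77.25187),
    ('South West Delhi', 28.56788, 77.18912),
    ('West Delhi', 28.6372, 77.28752),
]

# group neighborhood names by their coordinate pair
by_point = defaultdict(list)
for name, lat, lng in rows:
    by_point[(lat, lng)].append(name)

# keep only the points shared by more than one neighborhood
duplicates = {point: names for point, names in by_point.items() if len(names) > 1}
print(duplicates)
```

Neighborhoods that share a point will receive identical venue lists in the Foursquare step, so it is worth reviewing such rows before continuing.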


### Getting the co-ordinates of Delhi

In [8]:
# get the coordinates of Delhi
address = 'Delhi, India'

geolocator = Nominatim(user_agent="my-app")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Delhi, India are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Delhi, India are 28.6517178, 77.2219388.


### Creating a map of Delhi with the Neighborhoods superimposed on top

In [9]:
# create map of Delhi using latitude and longitude values
map_delhi = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, neighborhood in zip(df['Latitude'], df['Longitude'], df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_delhi)

map_delhi

### Using Foursquare-API to explore the neighborhoods

In [10]:
# define Foursquare Credentials and Version
CLIENT_ID = '0XE3SMLCK41JM4QBJUR3ONRHMXKVEXCNV1FBLDQ5ZA5OY03A' # your Foursquare ID
CLIENT_SECRET = '0M3CTFZQ0X3FG0TRGUZQ1NXZKJA42VB0DWBKAQM2C51BXRRH' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 0XE3SMLCK41JM4QBJUR3ONRHMXKVEXCNV1FBLDQ5ZA5OY03A
CLIENT_SECRET:0M3CTFZQ0X3FG0TRGUZQ1NXZKJA42VB0DWBKAQM2C51BXRRH


#### Getting the top 100 venues within a 5 km radius of each neighborhood using the Foursquare API

In [11]:
radius = 5000
LIMIT = 100

venues = []

for lat, long, neighborhood in zip(df['Latitude'], df['Longitude'], df['Neighborhood']):
    
    # create the API request URL
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    # make the GET request
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    # return only relevant information for each nearby venue
    for venue in results:
        venues.append((
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))

KeyError: 'groups'
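
The `KeyError: 'groups'` above means the JSON returned for at least one request had no `groups` entry under `response`. This can happen when the API answers with an error payload, for example because of invalid credentials or an exhausted quota. A minimal defensive sketch follows. `extract_items` is a hypothetical helper, and the two sample payloads are hand-written stand-ins whose shape is assumed from the access pattern in the cell above, not captured API output.

```python
def extract_items(payload):
    """Return the venue items from an explore response, or [] if absent."""
    groups = payload.get('response', {}).get('groups', [])
    if not groups:
        # surface whatever diagnostic information the payload carries
        print('No venue groups in response:', payload.get('meta'))
        return []
    return groups[0].get('items', [])

# hand-written stand-ins (assumed shapes)
ok_payload = {'response': {'groups': [{'items': [{'venue': {'name': "Lantern's"}}]}]}}
error_payload = {'meta': {'code': 400}, 'response': {}}

print(len(extract_items(ok_payload)))
print(len(extract_items(error_payload)))
```

Inside the loop, `results = extract_items(requests.get(url).json())` would let the remaining neighborhoods proceed and print the reason for the failed one.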

In [201]:
# convert the venues list into a new DataFrame
venues_df = pd.DataFrame(venues)

# define the column names
venues_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head()

(821, 7)


Unnamed: 0,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,North West Delhi,28.656625,77.163057,Lantern's,28.643183,77.177746,Bar
1,North West Delhi,28.656625,77.163057,Jaypee Siddharth,28.642483,77.175543,Hotel
2,North West Delhi,28.656625,77.163057,Haldiram's,28.666336,77.14657,Vegetarian / Vegan Restaurant
3,North West Delhi,28.656625,77.163057,Raviraj Ki Kulfi,28.649359,77.190215,Dessert Shop
4,North West Delhi,28.656625,77.163057,Old Rajender nagar market,28.641845,77.186148,Food & Drink Shop


#### Let's check how many venues were returned for each neighborhood

In [202]:
venues_df.groupby(["Neighborhood"]).count()

Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
Central Delhi,100,100,100,100,100,100
East Delhi,45,45,45,45,45,45
New Delhi,100,100,100,100,100,100
North Delhi,100,100,100,100,100,100
North East Delhi,100,100,100,100,100,100
North West Delhi,100,100,100,100,100,100
South Delhi,100,100,100,100,100,100
South West Delhi,100,100,100,100,100,100
West Delhi,76,76,76,76,76,76
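
With a 5 km search radius, neighborhoods whose center points lie close together will return largely the same venues. As a rough check, the sketch below computes the great-circle distance between the New Delhi and North East Delhi coordinates listed earlier in the notebook. It uses the haversine formula with an assumed mean Earth radius of 6371 km; the function and variable names are illustrative.

```python
import math

def haversine_km(lat1, lng1, lat2, lng2, radius_km=6371.0):
    """Great-circle distance in kilometres between two lat/lng points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

# coordinates copied from the neighborhood table
new_delhi = (28.63095, 77.21721)
north_east_delhi = (28.6341, 77.21689)

distance_km = haversine_km(*new_delhi, *north_east_delhi)
print(round(distance_km, 2))
```

The two points are well under 1 km apart, far less than the 5 km radius, so their venue lists overlap heavily.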


#### Let's find out how many unique categories can be curated from all the returned venues

In [203]:
print('There are {} unique categories.'.format(len(venues_df['VenueCategory'].unique())))

There are 110 unique categories.


In [204]:
# print out the first 50 categories
venues_df['VenueCategory'].unique()[:50]

array(['Bar', 'Hotel', 'Vegetarian / Vegan Restaurant', 'Dessert Shop',
       'Food & Drink Shop', 'Donut Shop', 'Indian Restaurant',
       'Athletics & Sports', 'American Restaurant', 'Snack Place',
       'Fast Food Restaurant', 'Sandwich Place', 'Market', 'Sports Bar',
       'Playground', 'BBQ Joint', 'Breakfast Spot', 'Coffee Shop', 'Café',
       'Bakery', 'Pizza Place', 'Food', 'Miscellaneous Shop',
       'Shopping Mall', 'Diner', 'Hookah Bar', 'Arcade',
       'Asian Restaurant', 'Garden Center', 'Pub',
       'Furniture / Home Store', 'Department Store', 'Ice Cream Shop',
       'Plaza', 'South Indian Restaurant', 'Clothing Store', 'Food Truck',
       'Restaurant', 'Lounge', 'Deli / Bodega', 'Tibetan Restaurant',
       'Bistro', 'Molecular Gastronomy Restaurant', 'Smoke Shop', 'Spa',
       'North Indian Restaurant', 'Spiritual Center', 'Theater',
       'Gastropub', 'Art Gallery'], dtype=object)

In [205]:
# check if the results contain "Indian Restaurant"
"Indian Restaurant" in venues_df['VenueCategory'].unique()

True
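
The membership test above matches only the exact label `Indian Restaurant`. The category list also contains related labels such as `South Indian Restaurant` and `North Indian Restaurant`, which the rest of the analysis treats as separate categories. A small sketch of how such labels could be listed follows, using a hand-copied subset of the categories shown above. Whether they should be merged is a modelling choice the notebook does not make.

```python
# subset of the category labels printed above
categories = ['Bar', 'Hotel', 'Indian Restaurant', 'American Restaurant',
              'South Indian Restaurant', 'Tibetan Restaurant',
              'North Indian Restaurant', 'Asian Restaurant']

# keep every label that ends with 'Indian Restaurant'
indian_like = [c for c in categories if c.endswith('Indian Restaurant')]
print(indian_like)
```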

In [206]:
venues_df.head(10)

Unnamed: 0,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,North West Delhi,28.656625,77.163057,Lantern's,28.643183,77.177746,Bar
1,North West Delhi,28.656625,77.163057,Jaypee Siddharth,28.642483,77.175543,Hotel
2,North West Delhi,28.656625,77.163057,Haldiram's,28.666336,77.14657,Vegetarian / Vegan Restaurant
3,North West Delhi,28.656625,77.163057,Raviraj Ki Kulfi,28.649359,77.190215,Dessert Shop
4,North West Delhi,28.656625,77.163057,Old Rajender nagar market,28.641845,77.186148,Food & Drink Shop
5,North West Delhi,28.656625,77.163057,Dunkin',28.666258,77.126289,Donut Shop
6,North West Delhi,28.656625,77.163057,Suruchi Restaurant,28.647168,77.188693,Indian Restaurant
7,North West Delhi,28.656625,77.163057,Major Dhyan Chand Sports Complex,28.684029,77.167487,Athletics & Sports
8,North West Delhi,28.656625,77.163057,T.G.I. Friday's,28.653093,77.123155,American Restaurant
9,North West Delhi,28.656625,77.163057,Bikanervala Naraina,28.630711,77.138122,Indian Restaurant


### Analyze Each Neighborhood

In [207]:
# one hot encoding
df_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
df_onehot['Neighborhoods'] = venues_df['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [df_onehot.columns[-1]] + list(df_onehot.columns[:-1])
df_onehot = df_onehot[fixed_columns]

print(df_onehot.shape)
df_onehot.head()

(821, 111)


Unnamed: 0,Neighborhoods,Airport,American Restaurant,Arcade,Art Gallery,Art Museum,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bar,Bengali Restaurant,Big Box Store,Bistro,Bookstore,Boutique,Breakfast Spot,Burmese Restaurant,Café,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,Comfort Food Restaurant,Concert Hall,Cricket Ground,Deli / Bodega,Department Store,Dessert Shop,Diner,Donut Shop,Electronics Store,English Restaurant,Fast Food Restaurant,Flea Market,Food,Food & Drink Shop,Food Court,Food Truck,French Restaurant,Furniture / Home Store,Garden Center,Gastropub,Golf Course,Gourmet Shop,Gym,Gym / Fitness Center,Hindu Temple,Historic Site,History Museum,Hookah Bar,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Irani Cafe,Italian Restaurant,Japanese Restaurant,Karnataka Restaurant,Korean Restaurant,Light Rail Station,Lounge,Market,Mediterranean Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Molecular Gastronomy Restaurant,Monument / Landmark,Mosque,Movie Theater,Multicuisine Indian Restaurant,Multiplex,Museum,Music Venue,Nightclub,North Indian Restaurant,Northeast Indian Restaurant,Other Nightlife,Palace,Park,Pizza Place,Playground,Plaza,Pool,Pub,Restaurant,Road,Sandwich Place,Sculpture Garden,Shop & Service,Shopping Mall,Smoke Shop,Snack Place,South Indian Restaurant,Spa,Speakeasy,Spiritual Center,Sports Bar,Stadium,Tea Room,Temple,Thai Restaurant,Theater,Tibetan Restaurant,Track,Trail,Train Station,University,Vegetarian / Vegan Restaurant,Yoga Studio
0,North West Delhi,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,North West Delhi,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,North West Delhi,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0
3,North West Delhi,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,North West Delhi,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


#### Next, let's group rows by neighborhood and take the mean of the frequency of occurrence of each category

In [208]:
df_grouped = df_onehot.groupby(["Neighborhoods"]).mean().reset_index()

print(df_grouped.shape)
df_grouped

(9, 111)


Unnamed: 0,Neighborhoods,Airport,American Restaurant,Arcade,Art Gallery,Art Museum,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bar,Bengali Restaurant,Big Box Store,Bistro,Bookstore,Boutique,Breakfast Spot,Burmese Restaurant,Café,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,Comfort Food Restaurant,Concert Hall,Cricket Ground,Deli / Bodega,Department Store,Dessert Shop,Diner,Donut Shop,Electronics Store,English Restaurant,Fast Food Restaurant,Flea Market,Food,Food & Drink Shop,Food Court,Food Truck,French Restaurant,Furniture / Home Store,Garden Center,Gastropub,Golf Course,Gourmet Shop,Gym,Gym / Fitness Center,Hindu Temple,Historic Site,History Museum,Hookah Bar,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Irani Cafe,Italian Restaurant,Japanese Restaurant,Karnataka Restaurant,Korean Restaurant,Light Rail Station,Lounge,Market,Mediterranean Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Molecular Gastronomy Restaurant,Monument / Landmark,Mosque,Movie Theater,Multicuisine Indian Restaurant,Multiplex,Museum,Music Venue,Nightclub,North Indian Restaurant,Northeast Indian Restaurant,Other Nightlife,Palace,Park,Pizza Place,Playground,Plaza,Pool,Pub,Restaurant,Road,Sandwich Place,Sculpture Garden,Shop & Service,Shopping Mall,Smoke Shop,Snack Place,South Indian Restaurant,Spa,Speakeasy,Spiritual Center,Sports Bar,Stadium,Tea Room,Temple,Thai Restaurant,Theater,Tibetan Restaurant,Track,Trail,Train Station,University,Vegetarian / Vegan Restaurant,Yoga Studio
0,Central Delhi,0.0,0.0,0.03,0.01,0.0,0.02,0.01,0.02,0.03,0.02,0.0,0.0,0.01,0.0,0.01,0.02,0.0,0.08,0.0,0.01,0.0,0.05,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.02,0.0,0.0,0.03,0.0,0.01,0.02,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.07,0.01,0.01,0.14,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.01,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.04,0.01,0.01,0.0,0.0,0.02,0.0,0.01,0.01,0.0,0.0,0.01,0.03,0.02,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0
1,East Delhi,0.022222,0.022222,0.0,0.0,0.0,0.022222,0.022222,0.044444,0.044444,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.088889,0.0,0.0,0.0,0.0,0.0,0.022222,0.044444,0.066667,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.022222,0.0,0.0,0.177778,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.044444,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.022222,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.022222,0.0,0.022222,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0
2,New Delhi,0.0,0.0,0.01,0.02,0.01,0.02,0.0,0.02,0.02,0.02,0.0,0.0,0.01,0.02,0.02,0.0,0.0,0.1,0.01,0.01,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.01,0.01,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.08,0.01,0.01,0.14,0.0,0.01,0.01,0.01,0.01,0.0,0.0,0.04,0.01,0.02,0.0,0.0,0.01,0.02,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.01,0.01,0.0,0.0,0.01,0.02,0.02,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0
3,North Delhi,0.0,0.01,0.03,0.0,0.0,0.01,0.01,0.02,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.04,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.01,0.01,0.02,0.04,0.0,0.0,0.12,0.0,0.01,0.02,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.06,0.0,0.0,0.14,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.11,0.01,0.0,0.0,0.01,0.0,0.0,0.05,0.0,0.0,0.02,0.0,0.04,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0
4,North East Delhi,0.0,0.0,0.01,0.02,0.01,0.02,0.0,0.02,0.02,0.02,0.0,0.0,0.01,0.02,0.02,0.0,0.0,0.1,0.01,0.01,0.0,0.02,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.01,0.01,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.07,0.0,0.01,0.15,0.0,0.01,0.01,0.01,0.01,0.0,0.0,0.04,0.01,0.02,0.0,0.0,0.01,0.02,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.01,0.01,0.0,0.0,0.01,0.02,0.02,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0
5,North West Delhi,0.0,0.01,0.03,0.0,0.0,0.01,0.01,0.02,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.04,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.01,0.01,0.02,0.04,0.0,0.0,0.12,0.0,0.01,0.02,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.06,0.0,0.0,0.14,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.11,0.01,0.0,0.0,0.01,0.0,0.0,0.05,0.0,0.0,0.02,0.0,0.04,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0
6,South Delhi,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.05,0.03,0.02,0.0,0.06,0.0,0.0,0.0,0.01,0.0,0.04,0.0,0.01,0.01,0.0,0.03,0.01,0.01,0.0,0.01,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.04,0.0,0.02,0.12,0.01,0.0,0.05,0.03,0.0,0.0,0.0,0.04,0.06,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.06,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.01,0.01,0.02,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0
7,South West Delhi,0.0,0.01,0.0,0.02,0.0,0.02,0.0,0.0,0.01,0.03,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.05,0.04,0.0,0.01,0.03,0.01,0.0,0.0,0.02,0.01,0.01,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.05,0.0,0.01,0.13,0.0,0.0,0.05,0.03,0.01,0.01,0.0,0.03,0.03,0.02,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.02,0.01,0.0,0.01,0.01,0.02,0.01,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.03,0.0,0.01,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0
8,West Delhi,0.0,0.0,0.039474,0.0,0.0,0.0,0.013158,0.013158,0.013158,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.065789,0.026316,0.0,0.0,0.039474,0.0,0.0,0.013158,0.0,0.013158,0.013158,0.0,0.0,0.0,0.0,0.039474,0.0,0.0,0.013158,0.013158,0.013158,0.0,0.0,0.0,0.0,0.0,0.0,0.013158,0.013158,0.013158,0.0,0.013158,0.0,0.078947,0.0,0.0,0.052632,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.013158,0.0,0.026316,0.0,0.026316,0.013158,0.0,0.0,0.013158,0.0,0.0,0.013158,0.0,0.131579,0.0,0.013158,0.013158,0.0,0.013158,0.013158,0.013158,0.0,0.013158,0.052632,0.0,0.013158,0.013158,0.0,0.0,0.0,0.0,0.013158,0.0,0.0,0.0,0.0,0.0,0.013158,0.0,0.013158,0.0,0.0,0.013158
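
Because each one-hot column holds a 1 for a matching venue and a 0 otherwise, its mean within a neighborhood equals the share of that neighborhood's venues in the category. The toy sketch below illustrates this with plain Python and made-up venue labels (not data from the notebook).

```python
# made-up venue categories for a single neighborhood (illustrative only)
venue_categories = ['Indian Restaurant', 'Café', 'Indian Restaurant', 'Hotel', 'Bar']

# one-hot column for the 'Indian Restaurant' category
one_hot = [1 if c == 'Indian Restaurant' else 0 for c in venue_categories]

# the mean of the column is the share of venues in that category
frequency = sum(one_hot) / len(one_hot)
print(frequency)
```

By the same arithmetic, East Delhi's value of 0.177778 with 45 returned venues corresponds to 8 Indian restaurants (8/45), and West Delhi's 0.052632 with 76 venues corresponds to 4 (4/76).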


In [209]:
len(df_grouped[df_grouped["Indian Restaurant"] > 0])

9

#### Create a new DataFrame for Indian Restaurants data only

In [210]:
df_res = df_grouped[["Neighborhoods","Indian Restaurant"]]

df_res.head()

Unnamed: 0,Neighborhoods,Indian Restaurant
0,Central Delhi,0.14
1,East Delhi,0.177778
2,New Delhi,0.14
3,North Delhi,0.14
4,North East Delhi,0.15


### Cluster Neighborhoods

In [211]:
kclusters = 3

df_clustering = df_res.drop(["Neighborhoods"], axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(df_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([0, 2, 0, 0, 0, 0, 0, 0, 1], dtype=int32)
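
Since the clustering input is a single column, the grouping can be checked by hand. The sketch below runs a simplified one-dimensional k-means (Lloyd's iterations with centers initialised at the minimum, median, and maximum value) on the nine frequencies from `df_grouped`. It is not scikit-learn's `KMeans`, which uses a different initialisation, and the cluster numbers it produces are arbitrary; only the grouping is comparable.

```python
import statistics

def kmeans_1d(values, centers, max_iter=100):
    """Simplified 1-D k-means; returns (labels, centers)."""
    labels = []
    for _ in range(max_iter):
        # assign each value to its nearest center
        new_labels = [min(range(len(centers)), key=lambda k: abs(v - centers[k]))
                      for v in values]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # move each center to the mean of its members (unchanged if empty)
        for k in range(len(centers)):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:
                centers[k] = sum(members) / len(members)
    return labels, centers

names = ['Central Delhi', 'East Delhi', 'New Delhi', 'North Delhi', 'North East Delhi',
         'North West Delhi', 'South Delhi', 'South West Delhi', 'West Delhi']
freqs = [0.14, 0.177778, 0.14, 0.14, 0.15, 0.14, 0.12, 0.13, 0.052632]

start = [min(freqs), statistics.median(freqs), max(freqs)]
labels, centers = kmeans_1d(freqs, start)

groups = {}
for name, label in zip(names, labels):
    groups.setdefault(label, []).append(name)
print(groups)
```

In this sketch West Delhi and East Delhi each end up alone in a cluster and the other seven neighborhoods share one, which matches the grouping in the `kmeans.labels_` output above.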

In [212]:
# create a new dataframe that holds the cluster label alongside the Indian Restaurant frequency for each neighborhood
df_merged = df_res.copy()

# add clustering labels
df_merged["Cluster Labels"] = kmeans.labels_

In [213]:
df_merged.rename(columns={"Neighborhoods": "Neighborhood"}, inplace=True)
df_merged.head()

Unnamed: 0,Neighborhood,Indian Restaurant,Cluster Labels
0,Central Delhi,0.14,0
1,East Delhi,0.177778,2
2,New Delhi,0.14,0
3,North Delhi,0.14,0
4,North East Delhi,0.15,0


In [214]:
# join df_merged with df to add latitude/longitude for each neighborhood
df_merged = df_merged.join(df.set_index("Neighborhood"), on="Neighborhood")

print(df_merged.shape)


(9, 5)


In [215]:
df_merged.head()

Unnamed: 0,Neighborhood,Indian Restaurant,Cluster Labels,Latitude,Longitude
0,Central Delhi,0.14,0,28.64902,77.19319
1,East Delhi,0.177778,2,28.672028,77.14721
2,New Delhi,0.14,0,28.63095,77.21721
3,North Delhi,0.14,0,28.656625,77.163057
4,North East Delhi,0.15,0,28.6341,77.21689


In [216]:
# sort the results by Cluster Labels
print(df_merged.shape)
df_merged.sort_values(["Cluster Labels"], inplace=True)
df_merged

(9, 5)


Unnamed: 0,Neighborhood,Indian Restaurant,Cluster Labels,Latitude,Longitude
0,Central Delhi,0.14,0,28.64902,77.19319
2,New Delhi,0.14,0,28.63095,77.21721
3,North Delhi,0.14,0,28.656625,77.163057
4,North East Delhi,0.15,0,28.6341,77.21689
5,North West Delhi,0.14,0,28.656625,77.163057
6,South Delhi,0.12,0,28.55065,77.25187
7,South West Delhi,0.13,0,28.56788,77.18912
8,West Delhi,0.052632,1,28.6372,77.28752
1,East Delhi,0.177778,2,28.672028,77.14721


### Finally, let's visualize the resulting clusters

In [217]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(df_merged['Latitude'], df_merged['Longitude'], df_merged['Neighborhood'], df_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],  # index the palette directly by cluster label
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Examine Clusters

In [218]:
df_merged.loc[df_merged['Cluster Labels'] == 0]

Unnamed: 0,Neighborhood,Indian Restaurant,Cluster Labels,Latitude,Longitude
0,Central Delhi,0.14,0,28.64902,77.19319
2,New Delhi,0.14,0,28.63095,77.21721
3,North Delhi,0.14,0,28.656625,77.163057
4,North East Delhi,0.15,0,28.6341,77.21689
5,North West Delhi,0.14,0,28.656625,77.163057
6,South Delhi,0.12,0,28.55065,77.25187
7,South West Delhi,0.13,0,28.56788,77.18912


In [219]:
df_merged.loc[df_merged['Cluster Labels'] == 1]

Unnamed: 0,Neighborhood,Indian Restaurant,Cluster Labels,Latitude,Longitude
8,West Delhi,0.052632,1,28.6372,77.28752


In [220]:
df_merged.loc[df_merged['Cluster Labels'] == 2]

Unnamed: 0,Neighborhood,Indian Restaurant,Cluster Labels,Latitude,Longitude
1,East Delhi,0.177778,2,28.672028,77.14721


**Observations:**
The `Indian Restaurant` values are frequencies, that is, the share of returned venues in each neighborhood that fall in this category, so the clusters rank neighborhoods by concentration rather than by absolute count. Cluster 2 (East Delhi) has the highest concentration at about 18%, cluster 0 (the seven neighborhoods that include Central Delhi and New Delhi) has a moderate concentration of 12% to 15%, and cluster 1 (West Delhi) has the lowest at about 5%.

Cluster 1 therefore represents the best opportunity to open a new Indian restaurant, as there is comparatively little competition from existing ones. Indian restaurants in cluster 2, by contrast, are likely to face intense competition because of the high concentration. From another perspective, the results suggest that the supply of Indian restaurants is densest around the central part of the city, while some outlying areas still have relatively few.

This project recommends opening new restaurants in cluster 1 neighborhoods, where competition is lowest. With a unique selling proposition to stand out from the competition, one could also open a new restaurant in a cluster 0 neighborhood, where competition is moderate. Neighborhoods in cluster 2, which already have a high concentration of Indian restaurants, are best avoided.
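
The ranking behind this recommendation can be reproduced by averaging the `Indian Restaurant` frequency within each cluster. A minimal sketch follows, with the values copied from the sorted `df_merged` table; the variable names are illustrative.

```python
# (neighborhood, Indian Restaurant frequency, cluster label) from df_merged
rows = [
    ('Central Delhi', 0.14, 0),
    ('New Delhi', 0.14, 0),
    ('North Delhi', 0.14, 0),
    ('North East Delhi', 0.15, 0),
    ('North West Delhi', 0.14, 0),
    ('South Delhi', 0.12, 0),
    ('South West Delhi', 0.13, 0),
    ('West Delhi', 0.052632, 1),
    ('East Delhi', 0.177778, 2),
]

# collect the frequencies per cluster
per_cluster = {}
for _, freq, cluster in rows:
    per_cluster.setdefault(cluster, []).append(freq)

# mean frequency per cluster, then clusters ordered from lowest to highest
cluster_means = {c: sum(v) / len(v) for c, v in per_cluster.items()}
ranking = sorted(cluster_means, key=cluster_means.get)
print(ranking)
print({c: round(m, 4) for c, m in cluster_means.items()})
```

The first cluster in `ranking` has the lowest average share of Indian restaurants and the last has the highest.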