# Capstone Project

## Week 5 Final Code

### Recommending to a Property Developer where to build a Restaurant in Mumbai

* Build a dataframe of neighborhoods in Mumbai, India by web scraping the data from the Wikipedia page
* Get the geographical coordinates of the neighborhoods
* Obtain the venue data for the neighborhoods from Foursquare API
* Explore and cluster the neighborhoods
* Select the best cluster to open a new restaurant

### 1. Import Libraries

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import geocoder # to get coordinates

import requests # library to handle requests
from bs4 import BeautifulSoup # library to parse HTML and XML documents

from pandas import json_normalize # transform JSON into a pandas dataframe (pandas.io.json.json_normalize was removed in pandas 2.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library
import re

print("Libraries imported.")

Libraries imported.


### 2. Scrape data from the Wikipedia page into a DataFrame

In [2]:
# send the GET request
data = requests.get("https://en.wikipedia.org/wiki/List_of_neighbourhoods_in_Mumbai").text

In [3]:
# parse data from the html into a beautifulsoup object
soup = BeautifulSoup(data, 'html.parser')

In [4]:
regex = re.compile('^tocsection-')
content_lis = soup.find_all('li', attrs={'class': regex})
print(content_lis)

[<li class="toclevel-1 tocsection-1"><a href="#Western_Suburbs"><span class="tocnumber">1</span> <span class="toctext">Western Suburbs</span></a>
<ul>
<li class="toclevel-2 tocsection-2"><a href="#Andheri"><span class="tocnumber">1.1</span> <span class="toctext">Andheri</span></a></li>
<li class="toclevel-2 tocsection-3"><a href="#Bhayandar"><span class="tocnumber">1.2</span> <span class="toctext">Bhayandar</span></a></li>
<li class="toclevel-2 tocsection-4"><a href="#Bandra"><span class="tocnumber">1.3</span> <span class="toctext">Bandra</span></a></li>
<li class="toclevel-2 tocsection-5"><a href="#Borivali"><span class="tocnumber">1.4</span> <span class="toctext">Borivali</span></a></li>
<li class="toclevel-2 tocsection-6"><a href="#Dahisar"><span class="tocnumber">1.5</span> <span class="toctext">Dahisar</span></a></li>
<li class="toclevel-2 tocsection-7"><a href="#Goregaon"><span class="tocnumber">1.6</span> <span class="toctext">Goregaon</span></a></li>
<li class="toclevel-2 tocse

In [5]:
# append the data into the list
neighborhoodList = []
for li in content_lis:
    neighborhoodList.append(li.getText().split('\n')[0])
print(neighborhoodList)

['1 Western Suburbs', '1.1 Andheri', '1.2 Bhayandar', '1.3 Bandra', '1.4 Borivali', '1.5 Dahisar', '1.6 Goregaon', '1.7 Jogeshwari', '1.8 Juhu', '1.9 Kandivali west', '1.10 Kandivali east', '1.11 Khar', '1.12 Malad', '1.13 Santacruz', '1.14 Vasai', '1.15 Virar', '1.16 Vile Parle', '2 Eastern Suburbs', '2.1 Bhandup', '2.2 Ghatkopar', '2.3 Kanjurmarg', '2.4 Kurla', '2.5 Mulund', '2.6 Powai', '2.7 Vidyavihar', '2.8 Vikhroli', '3 Harbour Suburbs', '3.1 Chembur', '3.2 Govandi', '3.3 Mankhurd', '3.4 Trombay', '4 South Mumbai', '4.1 Antop Hill', '4.2 Byculla', '4.3 Colaba', '4.4 Dadar', '4.5 Fort', '4.6 Girgaon', '4.7 Kalbadevi', '4.8 Kamathipura', '4.9 Matunga', '4.10 Parel', '4.11 Tardeo', '5 Other', '6 References']
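The list above mixes top-level section headers ("1 Western Suburbs", "5 Other", "6 References") with the actual neighborhoods, which all carry a dotted subsection number such as "1.1". A minimal sketch, assuming every entry follows the "<number> <name>" format shown in the output, that keeps only the dotted entries and strips the number; `split_toc_entry` and `neighborhoods_from_toc` are hypothetical helper names, not part of the original notebook:

```python
def split_toc_entry(entry):
    """Split a TOC string like '1.1 Andheri' into ('1.1', 'Andheri')."""
    number, _, name = entry.partition(" ")
    return number, name


def neighborhoods_from_toc(entries):
    """Keep only subsection entries (dotted numbers) and return their names.

    Top-level headers such as '1 Western Suburbs' or '6 References' have no
    dot in their number and are therefore dropped.
    """
    names = []
    for entry in entries:
        number, name = split_toc_entry(entry)
        if "." in number:
            names.append(name)
    return names


sample = ['1 Western Suburbs', '1.1 Andheri', '1.9 Kandivali west',
          '2 Eastern Suburbs', '2.1 Bhandup', '5 Other', '6 References']
print(neighborhoods_from_toc(sample))
# ['Andheri', 'Kandivali west', 'Bhandup']
```

Applied to the full 45-entry list, this rule would keep 39 names and drop the six header rows by their content, so no rows need to be removed by position later.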


In [6]:
# create a new DataFrame from the list
mum_df = pd.DataFrame({"Names": neighborhoodList})

mum_df.head()

Unnamed: 0,Names
0,1 Western Suburbs
1,1.1 Andheri
2,1.2 Bhayandar
3,1.3 Bandra
4,1.4 Borivali


In [7]:

# split "1.1 Andheri" into the section number and the name (split at most once)
new = mum_df["Names"].str.split(" ", n=1, expand=True)

# keep the name part as the Neighborhood column
mum_df["Neighborhood"] = new[1]

# the original "Names" column is no longer needed
mum_df.drop(columns=["Names"], inplace=True)
mum_df.head()

Unnamed: 0,Neighborhood
0,Western Suburbs
1,Andheri
2,Bhayandar
3,Bandra
4,Borivali


In [8]:
# print the number of rows and columns of the dataframe
mum_df.shape

(45, 1)

### 3. Get the geographical coordinates

In [9]:
# define a function to get coordinates
def get_latlng(neighborhood):
    # start with no coordinates
    lat_lng_coords = None
    # query the ArcGIS geocoder until it returns coordinates
    # (note: this loops forever if a lookup keeps failing)
    while lat_lng_coords is None:
        g = geocoder.arcgis('{}, Mumbai, India'.format(neighborhood))
        lat_lng_coords = g.latlng
    return lat_lng_coords
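The `get_latlng` function above retries without limit, so a neighborhood that never geocodes would hang the notebook. A sketch of a bounded alternative, assuming a caller-supplied `geocode` callable that returns `[lat, lng]` or `None` (for example `lambda q: geocoder.arcgis(q).latlng`); `get_latlng_with_retries`, `max_retries` and `delay` are hypothetical names chosen for illustration:

```python
import time


def get_latlng_with_retries(neighborhood, geocode, max_retries=5, delay=1.0):
    """Geocode '<neighborhood>, Mumbai, India', trying at most max_retries times.

    `geocode` is any callable that takes a query string and returns a
    [lat, lng] list or None. Returns None if every attempt fails instead of
    looping forever.
    """
    query = "{}, Mumbai, India".format(neighborhood)
    for attempt in range(max_retries):
        coords = geocode(query)
        if coords is not None:
            return coords
        if attempt < max_retries - 1:
            time.sleep(delay)  # wait a little before the next attempt
    return None
```

Passing the geocoder in as a parameter also makes the retry logic easy to check offline with a fake function; any `None` results would still need to be handled before building the coordinate DataFrame.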

In [10]:
# call the function to get the coordinates, store in a new list using list comprehension
coords = [ get_latlng(neighborhood) for neighborhood in mum_df["Neighborhood"].tolist() ]

In [11]:
coords

[[19.167730000000063, 72.85052000000007],
 [19.11848309908247, 72.84177419095158],
 [19.30746000000005, 72.85170000000005],
 [19.054220000000043, 72.84019000000006],
 [19.229360000000042, 72.85751000000005],
 [19.250030000000038, 72.85908000000006],
 [19.164550000000077, 72.84946000000008],
 [19.13790000000006, 72.84941000000003],
 [19.01493000000005, 72.84522000000004],
 [19.207110000000057, 72.83492000000007],
 [19.205750000000023, 72.86969000000005],
 [19.069120000000055, 72.84643000000005],
 [19.186550000000068, 72.84836000000007],
 [19.081770000000063, 72.84205000000003],
 [19.07934000000006, 72.83916000000005],
 [19.01657000000006, 72.85853000000003],
 [19.100580000000036, 72.84377000000006],
 [19.00538889189226, 72.85576887678867],
 [19.145560000000046, 72.94856000000004],
 [19.086476606699875, 72.9089562772808],
 [19.131400000000042, 72.93565000000007],
 [19.064940000000036, 72.88073000000003],
 [19.171850000000063, 72.95564000000007],
 [19.123110000000054, 72.90944000000007],


In [12]:
# create temporary dataframe to populate the coordinates into Latitude and Longitude
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])

In [13]:
# merge the coordinates into the original dataframe
mum_df['Latitude'] = df_coords['Latitude']
mum_df['Longitude'] = df_coords['Longitude']

In [14]:
mum_df.head()

Unnamed: 0,Neighborhood,Latitude,Longitude
0,Western Suburbs,19.16773,72.85052
1,Andheri,19.118483,72.841774
2,Bhayandar,19.30746,72.8517
3,Bandra,19.05422,72.84019
4,Borivali,19.22936,72.85751


In [15]:
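# remove unwanted rows from the dataframe by positional index
# NOTE: besides the "Western Suburbs" header (0), these indices remove the
# neighborhoods Virar (15), Vidyavihar (24), Mankhurd (29), Parel (41) and Tardeo (42),
# while header rows such as "Eastern Suburbs", "South Mumbai", "Other" and "References"
# are kept; this is why those names appear in the cluster tables below.
mum_df.drop([0, 15, 24, 29, 41, 42], inplace=True)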

In [16]:
mum_df.shape

(39, 3)

In [17]:
mum_df = mum_df.reset_index()

In [18]:
del mum_df['index']
mum_df.head()

Unnamed: 0,Neighborhood,Latitude,Longitude
0,Andheri,19.118483,72.841774
1,Bhayandar,19.30746,72.8517
2,Bandra,19.05422,72.84019
3,Borivali,19.22936,72.85751
4,Dahisar,19.25003,72.85908


In [19]:
# save the DataFrame as CSV file
mum_df.to_csv("mum_df.csv", index=False)

### 4. Create a map of Mumbai with neighborhoods superimposed on top

In [20]:
# get the coordinates of Mumbai
address = 'Mumbai, India'

geolocator = Nominatim(user_agent="my-application")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Mumbai, India are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Mumbai, India are 18.9387711, 72.8353355.


In [21]:
# create map of Mumbai using latitude and longitude values
map_mum = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, neighborhood in zip(mum_df['Latitude'], mum_df['Longitude'], mum_df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_mum)  
    
map_mum

In [22]:
# save the map as HTML file
map_mum.save('map_mum.html')

### 5. Use the Foursquare API to explore the neighborhoods

In [24]:
# define Foursquare credentials and API version
# (read from environment variables instead of hard-coding and printing secrets;
#  the names FOURSQUARE_CLIENT_ID / FOURSQUARE_CLIENT_SECRET are illustrative)
import os
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', 'YOUR_CLIENT_ID')
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', 'YOUR_CLIENT_SECRET')
VERSION = '20191010' # Foursquare API version


#### Now, let's get the top 100 venues that are within a radius of 2000 meters.

In [25]:
radius = 2000
LIMIT = 100

venues = []

for lat, long, neighborhood in zip(mum_df['Latitude'], mum_df['Longitude'], mum_df['Neighborhood']):
    
    # create the API request URL
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    # make the GET request
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    # return only relevant information for each nearby venue
    for venue in results:
        venues.append((
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))
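The loop above formats the request URL by hand and indexes `['categories'][0]` directly, which raises `IndexError` for a venue without categories and `KeyError` if the API returns an error payload without `groups`. A defensive sketch, assuming the v2 response layout used in the cell above (`response → groups[0] → items → venue`); `build_explore_url` and `parse_venues` are hypothetical helpers, not part of the original notebook or of any Foursquare client library:

```python
from urllib.parse import urlencode

EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build the same /venues/explore request URL as above, with proper escaping."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return EXPLORE_URL + "?" + urlencode(params)


def parse_venues(neighborhood, lat, lng, response_json):
    """Flatten an explore response into tuples, skipping venues with no category."""
    rows = []
    groups = response_json.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        if not categories:
            continue  # indexing categories[0] here would raise IndexError
        rows.append((neighborhood, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"],
                     categories[0]["name"]))
    return rows
```

With these helpers, the loop body would reduce to fetching `build_explore_url(...)` with `requests.get(...).json()` and extending `venues` with `parse_venues(...)`.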

In [26]:
# convert the venues list into a new DataFrame
venues_df = pd.DataFrame(venues)

# define the column names
venues_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head()

(3025, 7)


Unnamed: 0,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,Andheri,19.118483,72.841774,Merwans Cake shop,19.1193,72.845418,Bakery
1,Andheri,19.118483,72.841774,Radha Krishna Veg Restaurant,19.11513,72.84306,Indian Restaurant
2,Andheri,19.118483,72.841774,Naturals,19.111204,72.837255,Ice Cream Shop
3,Andheri,19.118483,72.841774,Shawarma Factory,19.124591,72.840398,Falafel Restaurant
4,Andheri,19.118483,72.841774,Starbucks Coffee : A Tata Alliance,19.114569,72.836205,Coffee Shop


#### Let's check how many venues were returned for each neighborhood

In [27]:
venues_df.groupby(["Neighborhood"]).count()

Unnamed: 0_level_0,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Andheri,100,100,100,100,100,100
Antop Hill,80,80,80,80,80,80
Bandra,100,100,100,100,100,100
Bhandup,24,24,24,24,24,24
Bhayandar,16,16,16,16,16,16
Borivali,100,100,100,100,100,100
Byculla,45,45,45,45,45,45
Chembur,100,100,100,100,100,100
Colaba,100,100,100,100,100,100
Dadar,100,100,100,100,100,100
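Because every request passes `limit=100`, a count of exactly 100 means the cap was reached and the neighborhood may have more venues than were sampled, so its category frequencies come from a truncated sample. A small standard-library sketch that flags those neighborhoods; `venue_counts` is a hypothetical helper that takes tuples shaped like the entries collected in `venues`:

```python
from collections import Counter


def venue_counts(rows, limit=100):
    """Count venues per neighborhood and flag those that hit the request limit.

    `rows` are tuples whose first element is the neighborhood name.
    Returns {neighborhood: (count, hit_limit)}.
    """
    counts = Counter(row[0] for row in rows)
    return {name: (n, n >= limit) for name, n in counts.items()}
```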


#### Let's find out how many unique categories can be curated from all the returned venues

In [28]:
print('There are {} unique categories.'.format(len(venues_df['VenueCategory'].unique())))

There are 200 unique categories.


In [29]:
# print out the list of categories
venues_df['VenueCategory'].unique()

array(['Bakery', 'Indian Restaurant', 'Ice Cream Shop',
       'Falafel Restaurant', 'Coffee Shop', 'Pub', 'Sandwich Place',
       'Pizza Place', 'Juice Bar', 'Fast Food Restaurant',
       'Seafood Restaurant', 'Multiplex', 'Snack Place', 'Breakfast Spot',
       'Café', 'Cocktail Bar', 'American Restaurant', 'Bar', 'BBQ Joint',
       'Gym / Fitness Center', 'Chinese Restaurant', 'Diner',
       'Electronics Store', 'Asian Restaurant', 'Department Store',
       'Lounge', 'Park', 'Liquor Store', "Women's Store",
       'Vegetarian / Vegan Restaurant',
       'Residential Building (Apartment / Condo)', 'Spa', 'Smoke Shop',
       'Food Truck', 'Athletics & Sports', 'Fish Market', 'Burger Joint',
       'Martial Arts Dojo', 'Hotel', 'Tea Room', 'Clothing Store',
       'Train Station', 'Restaurant', 'Soccer Field', 'Playground', 'Gym',
       'Dessert Shop', 'Sports Club', 'Gourmet Shop', 'Deli / Bodega',
       'Indie Movie Theater', 'Korean Restaurant', 'Salad Place',
       'German

In [31]:
# check if the results contain "Restaurant"
"Restaurant" in venues_df['VenueCategory'].unique()

True

### 6. Analyze Each Neighborhood

In [32]:
# one hot encoding (dtype=int keeps 0/1 values; recent pandas versions default to booleans)
mum_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="", dtype=int)

# add neighborhood column back to dataframe
mum_onehot['Neighborhoods'] = venues_df['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [mum_onehot.columns[-1]] + list(mum_onehot.columns[:-1])
mum_onehot = mum_onehot[fixed_columns]

print(mum_onehot.shape)
mum_onehot.head()

(3025, 201)


Unnamed: 0,Neighborhoods,Afghan Restaurant,Airport,American Restaurant,Antique Shop,Arcade,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Australian Restaurant,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Basketball Court,Beach,Bed & Breakfast,Beer Garden,Bengali Restaurant,Big Box Store,Bike Rental / Bike Share,Bistro,Bookstore,Boutique,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Brewery,Bridal Shop,Building,Burger Joint,Burrito Place,Bus Station,Cafeteria,Café,Camera Store,Chaat Place,Cheese Shop,Chinese Restaurant,Clothing Store,Club House,Cocktail Bar,Coffee Shop,College Academic Building,College Auditorium,College Gym,Concert Hall,Convenience Store,Convention Center,Cosmetics Shop,Coworking Space,Creperie,Cricket Ground,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Dhaba,Dim Sum Restaurant,Diner,Donut Shop,Eastern European Restaurant,Electronics Store,Event Space,Factory,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Field,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gastropub,General Entertainment,German Restaurant,Gift Shop,Gluten-free Restaurant,Goan Restaurant,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Harbor / Marina,Historic Site,History Museum,Hockey Arena,Hookah Bar,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Irani Cafe,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Korean Restaurant,Light Rail Station,Liquor Store,Lounge,Maharashtrian Restaurant,Market,Martial Arts Dojo,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Moroccan Restaurant,Mountain,Movie Theater,Mughlai Restaurant,Multicuisine Indian Restaurant,Multiplex,Music Store,Music Venue,Neighborhood,New American Restaurant,Nightclub,Noodle House,North Indian Restaurant,Opera House,Outdoors & Recreation,Paper / Office Supplies Store,Park,Parsi Restaurant,Performing Arts Venue,Pharmacy,Pizza Place,Platform,Playground,Plaza,Pool,Pub,Punjabi Restaurant,Recording Studio,Residential Building (Apartment / Condo),Restaurant,Roof Deck,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Mall,Smoke Shop,Snack Place,Soccer Field,South Indian Restaurant,Spa,Spanish Restaurant,Sporting Goods Shop,Sports Bar,Sports Club,Stadium,Steakhouse,Supermarket,Sushi Restaurant,Tea Room,Thai Restaurant,Theater,Theme Park,Track,Track Stadium,Trail,Train,Train Station,Vegetarian / Vegan Restaurant,Whisky Bar,Wine Bar,Wine Shop,Women's Store,Zoo
0,Andheri,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Andheri,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Andheri,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Andheri,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Andheri,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


#### Next, let's group rows by neighborhood, taking the mean of the frequency of occurrence of each category

In [33]:
mum_grouped = mum_onehot.groupby(["Neighborhoods"]).mean().reset_index()

print(mum_grouped.shape)
mum_grouped

(39, 201)


Unnamed: 0,Neighborhoods,Afghan Restaurant,Airport,American Restaurant,Antique Shop,Arcade,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Australian Restaurant,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Basketball Court,Beach,Bed & Breakfast,Beer Garden,Bengali Restaurant,Big Box Store,Bike Rental / Bike Share,Bistro,Bookstore,Boutique,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Brewery,Bridal Shop,Building,Burger Joint,Burrito Place,Bus Station,Cafeteria,Café,Camera Store,Chaat Place,Cheese Shop,Chinese Restaurant,Clothing Store,Club House,Cocktail Bar,Coffee Shop,College Academic Building,College Auditorium,College Gym,Concert Hall,Convenience Store,Convention Center,Cosmetics Shop,Coworking Space,Creperie,Cricket Ground,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Dhaba,Dim Sum Restaurant,Diner,Donut Shop,Eastern European Restaurant,Electronics Store,Event Space,Factory,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Field,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gastropub,General Entertainment,German Restaurant,Gift Shop,Gluten-free Restaurant,Goan Restaurant,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Harbor / Marina,Historic Site,History Museum,Hockey Arena,Hookah Bar,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Irani Cafe,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Korean Restaurant,Light Rail Station,Liquor Store,Lounge,Maharashtrian Restaurant,Market,Martial Arts Dojo,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Moroccan Restaurant,Mountain,Movie Theater,Mughlai Restaurant,Multicuisine Indian Restaurant,Multiplex,Music Store,Music Venue,Neighborhood,New American Restaurant,Nightclub,Noodle House,North Indian Restaurant,Opera House,Outdoors & Recreation,Paper / Office Supplies Store,Park,Parsi Restaurant,Performing Arts Venue,Pharmacy,Pizza Place,Platform,Playground,Plaza,Pool,Pub,Punjabi Restaurant,Recording Studio,Residential Building (Apartment / Condo),Restaurant,Roof Deck,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Mall,Smoke Shop,Snack Place,Soccer Field,South Indian Restaurant,Spa,Spanish Restaurant,Sporting Goods Shop,Sports Bar,Sports Club,Stadium,Steakhouse,Supermarket,Sushi Restaurant,Tea Room,Thai Restaurant,Theater,Theme Park,Track,Track Stadium,Trail,Train,Train Station,Vegetarian / Vegan Restaurant,Whisky Bar,Wine Bar,Wine Shop,Women's Store,Zoo
0,Andheri,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.01,0.0,0.02,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.06,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.04,0.17,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.03,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.04,0.0,0.0,0.01,0.05,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0
1,Antop Hill,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0125,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0125,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.025,0.0,0.0,0.0,0.0375,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0125,0.0,0.0,0.0125,0.0,0.0125,0.0,0.0,0.0,0.0,0.0,0.0125,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0125,0.0125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0125,0.0125,0.025,0.0125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0375,0.2,0.0125,0.0,0.0,0.0,0.0,0.0125,0.0,0.0125,0.0,0.0,0.0,0.0125,0.0,0.0,0.0,0.0,0.0,0.0125,0.0125,0.0,0.0,0.0,0.0,0.0,0.0,0.0125,0.0,0.0125,0.0125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0375,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0125,0.0,0.0,0.0,0.0,0.0125,0.0,0.0625,0.0,0.025,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0125,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.05,0.0,0.0,0.0,0.0,0.0
2,Bandra,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.06,0.0,0.06,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.03,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.03,0.0,0.0,0.05,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.12,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.02,0.0,0.02,0.0,0.02,0.0,0.0,0.05,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Bhandup,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.041667,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0
4,Bhayandar,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1875,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0
5,Borivali,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.08,0.0,0.0,0.0,0.05,0.03,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.05,0.11,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.06,0.0,0.0,0.0,0.04,0.01,0.0,0.01,0.0,0.01,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0
6,Byculla,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.044444,0.0,0.0,0.0,0.0,0.044444,0.0,0.044444,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.044444,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.022222,0.0,0.022222,0.155556,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.022222,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.044444,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.022222,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.022222
7,Chembur,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.02,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.06,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.2,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.01,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.04,0.01,0.02,0.01,0.01,0.0,0.01,0.0,0.0,0.04,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.01,0.02,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0
8,Colaba,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.03,0.0,0.0,0.0,0.0,0.02,0.0,0.06,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.09,0.0,0.01,0.0,0.03,0.0,0.0,0.02,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.03,0.0,0.0,0.04,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.06,0.0,0.0,0.08,0.0,0.0,0.03,0.03,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.04,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Dadar,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.06,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.06,0.0,0.0,0.0,0.06,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.05,0.17,0.0,0.0,0.01,0.01,0.0,0.02,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.03,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.03,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0
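Averaging the one-hot columns within a neighborhood gives the share of that neighborhood's returned venues in each category. For example, Bhandup returned 24 venues and its `Restaurant` value is 0.083333 = 2/24, i.e. two venues tagged `Restaurant`; Bhayandar's 0.0625 is 1/16 of its 16 venues. A plain-Python sketch of the same computation; `category_frequencies` is a hypothetical helper working on (neighborhood, category) pairs rather than on the DataFrames above:

```python
from collections import Counter, defaultdict


def category_frequencies(venues):
    """Return {neighborhood: {category: share}} from (neighborhood, category) pairs.

    This matches taking the mean of one-hot encoded category columns grouped
    by neighborhood; categories a neighborhood lacks are simply absent here
    (they would be 0 in the one-hot table).
    """
    per_hood = defaultdict(Counter)
    for hood, category in venues:
        per_hood[hood][category] += 1
    result = {}
    for hood, counter in per_hood.items():
        total = sum(counter.values())
        result[hood] = {cat: n / total for cat, n in counter.items()}
    return result
```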


In [34]:
len(mum_grouped[mum_grouped["Restaurant"] > 0])

32

#### Create a new DataFrame for Restaurant data only

In [35]:
mum_mall = mum_grouped[["Neighborhoods","Restaurant"]]

In [36]:
mum_mall.head()

Unnamed: 0,Neighborhoods,Restaurant
0,Andheri,0.0
1,Antop Hill,0.0
2,Bandra,0.02
3,Bhandup,0.083333
4,Bhayandar,0.0625


### 7. Cluster Neighborhoods

Run k-means to cluster the neighborhoods in Mumbai into 3 clusters, using a single feature: the frequency of the `Restaurant` venue category.

In [37]:
# set number of clusters
kclusters = 3

mum_clustering = mum_mall.drop(columns=["Neighborhoods"])

# run k-means clustering (n_init=10 matches the older scikit-learn default)
kmeans = KMeans(n_clusters=kclusters, random_state=0, n_init=10).fit(mum_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 0, 2, 1, 1, 1, 2, 2, 2, 0], dtype=int32)
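Since `mum_clustering` holds one column, k-means here amounts to splitting a number line into three groups, and the label numbers themselves are arbitrary. A minimal Lloyd's-algorithm sketch in pure Python on a few `Restaurant` values from the tables in this notebook; it uses a simple deterministic initialization (evenly spaced picks from the sorted values), which is an assumption and differs from scikit-learn's default k-means++ initialization, so labels may not match `kmeans.labels_`:

```python
def kmeans_1d(values, k, iterations=100):
    """Cluster 1-D values with Lloyd's algorithm; returns (labels, centroids).

    Centroids start at evenly spaced positions in the sorted data (an
    assumption for determinism; scikit-learn defaults to k-means++).
    """
    ordered = sorted(values)
    step = (len(ordered) - 1) / (k - 1) if k > 1 else 0
    centroids = [ordered[round(i * step)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # assignment step: each value joins its nearest centroid
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # update step: move each centroid to the mean of its members
        new_centroids = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centroids.append(sum(members) / len(members) if members else centroids[c])
        if new_centroids == centroids:
            break  # converged: assignments no longer move the centroids
        centroids = new_centroids
    return labels, centroids


values = [0.0, 0.0, 0.01, 0.02, 0.03, 0.06, 0.065, 0.083333]
labels, centroids = kmeans_1d(values, 3)
print(labels)
# [0, 0, 0, 1, 1, 2, 2, 2]
```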

In [38]:
# create a new dataframe that includes the cluster label for each neighborhood
mum_merged = mum_mall.copy()

# add clustering labels
mum_merged["Cluster Labels"] = kmeans.labels_

In [39]:
mum_merged.rename(columns={"Neighborhoods": "Neighborhood"}, inplace=True)
mum_merged.head()

Unnamed: 0,Neighborhood,Restaurant,Cluster Labels
0,Andheri,0.0,0
1,Antop Hill,0.0,0
2,Bandra,0.02,2
3,Bhandup,0.083333,1
4,Bhayandar,0.0625,1


In [40]:
# join mum_merged with mum_df to add latitude/longitude for each neighborhood
mum_merged = mum_merged.join(mum_df.set_index("Neighborhood"), on="Neighborhood")

print(mum_merged.shape)
mum_merged.head()

(39, 5)


Unnamed: 0,Neighborhood,Restaurant,Cluster Labels,Latitude,Longitude
0,Andheri,0.0,0,19.118483,72.841774
1,Antop Hill,0.0,0,19.02635,72.86634
2,Bandra,0.02,2,19.05422,72.84019
3,Bhandup,0.083333,1,19.14556,72.94856
4,Bhayandar,0.0625,1,19.30746,72.8517


In [41]:
# sort the results by Cluster Labels
print(mum_merged.shape)
mum_merged.sort_values(["Cluster Labels"], inplace=True)
mum_merged

(39, 5)


Unnamed: 0,Neighborhood,Restaurant,Cluster Labels,Latitude,Longitude
0,Andheri,0.0,0,19.118483,72.841774
36,Vasai,0.0,0,19.07934,72.83916
35,Trombay,0.0,0,19.019,72.89799
34,South Mumbai,0.014925,0,19.17287,72.83602
33,Santacruz,0.01,0,19.08177,72.84205
30,Other,0.016129,0,19.1716,72.95752
29,Mulund,0.012195,0,19.17185,72.95564
28,Matunga,0.010309,0,19.02718,72.8559
27,Malad,0.01087,0,19.18655,72.84836
26,Kurla,0.012048,0,19.06494,72.88073


#### Finally, let's visualize the resulting clusters

In [42]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(mum_merged['Latitude'], mum_merged['Longitude'], mum_merged['Neighborhood'], mum_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],  # cluster labels start at 0, so index directly
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
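The cell above derives one color per cluster from matplotlib's `rainbow` colormap. A dependency-free sketch of the same idea, spacing hues evenly around the HSV color wheel with the standard-library `colorsys` module; this is a swapped-in technique, so the exact colors differ from `cm.rainbow`, and `cluster_colors` is a hypothetical helper name:

```python
import colorsys


def cluster_colors(k):
    """Return k distinct '#rrggbb' colors with evenly spaced hues."""
    palette = []
    for i in range(k):
        r, g, b = colorsys.hsv_to_rgb(i / k, 0.8, 0.9)
        palette.append("#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return palette
```

The resulting list can be indexed directly by cluster label, e.g. `palette[cluster]`, because the labels run from 0 to `kclusters - 1`.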

In [43]:
# save the map as HTML file
map_clusters.save('mum_map_clusters.html')

### 8. Examine Clusters

#### Cluster 0

In [44]:
mum_merged.loc[mum_merged['Cluster Labels'] == 0]

Unnamed: 0,Neighborhood,Restaurant,Cluster Labels,Latitude,Longitude
0,Andheri,0.0,0,19.118483,72.841774
36,Vasai,0.0,0,19.07934,72.83916
35,Trombay,0.0,0,19.019,72.89799
34,South Mumbai,0.014925,0,19.17287,72.83602
33,Santacruz,0.01,0,19.08177,72.84205
30,Other,0.016129,0,19.1716,72.95752
29,Mulund,0.012195,0,19.17185,72.95564
28,Matunga,0.010309,0,19.02718,72.8559
27,Malad,0.01087,0,19.18655,72.84836
26,Kurla,0.012048,0,19.06494,72.88073


#### Cluster 1

In [45]:
mum_merged.loc[mum_merged['Cluster Labels'] == 1]

Unnamed: 0,Neighborhood,Restaurant,Cluster Labels,Latitude,Longitude
3,Bhandup,0.083333,1,19.14556,72.94856
4,Bhayandar,0.0625,1,19.30746,72.8517
21,Kamathipura,0.06,1,18.96172,72.82627
31,Powai,0.065934,1,19.12311,72.90944
5,Borivali,0.06,1,19.22936,72.85751
11,Eastern Suburbs,0.04878,1,19.005389,72.855769
10,Dahisar,0.048387,1,19.25003,72.85908
14,Girgaon,0.06,1,18.95696,72.81945


#### Cluster 2

In [46]:
mum_merged.loc[mum_merged['Cluster Labels'] == 2]

Unnamed: 0,Neighborhood,Restaurant,Cluster Labels,Latitude,Longitude
2,Bandra,0.02,2,19.05422,72.84019
32,References,0.04,2,19.14435,72.93769
6,Byculla,0.022222,2,18.98074,72.84075
13,Ghatkopar,0.0375,2,19.086477,72.908956
8,Colaba,0.02,2,18.91527,72.82614
23,Kandivali west,0.027778,2,19.20711,72.83492
22,Kandivali east,0.030303,2,19.20575,72.86969
20,Kalbadevi,0.02,2,18.95004,72.82995
37,Vikhroli,0.03,2,19.11109,72.92781
12,Fort,0.02,2,18.93226,72.83288


## Observation

Most of the neighborhoods with a high share of `Restaurant` venues lie in the northern areas of Mumbai: cluster 1 has the highest share and cluster 2 a moderate share, while the neighborhoods in cluster 0 have a very low share. Cluster 0 therefore represents the greatest opportunity for opening a new restaurant, as there is little to no competition from existing restaurants. Meanwhile, restaurants in cluster 1 are likely to face intense competition due to oversupply and a high concentration of restaurants. From another perspective, this suggests that the oversupply of restaurants is concentrated in the more developed parts of the city, while areas such as South Mumbai still have very few restaurants. Therefore, this project recommends that property developers capitalize on these findings by opening new restaurants in cluster 0 neighborhoods, where competition is lowest. Property developers with a unique selling proposition that helps them stand out can also consider cluster 2 neighborhoods, where competition is moderate. Lastly, property developers are advised to avoid cluster 1 neighborhoods, which already have a high concentration of restaurants and intense competition.