# Capstone Project: Venue analysis of Mumbai

## Table of contents
* [Introduction](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)
    

## Introduction <a name="introduction"></a>

Mumbai is the **most populous** city in India and the seventh most populous city in the world, with a population of roughly **20 million** (as of 2018). Mumbai has the largest number of billionaires of all the cities in India. It is the financial, commercial and entertainment capital of India, and one of the world's **top ten** centres of commerce in terms of global financial flow, generating **6.16%** of India's GDP.


In this project we will try to find an optimal location for a cafe (a coffee shop). Specifically, this report is targeted at stakeholders interested in opening a **Cafe** in **Mumbai**, India.

Mumbai already has a large number of food points and cafes. Suppose a coffee giant like Starbucks wants to start a chain of cafes in Mumbai, either by opening new standalone stores or by opening small outlets beside popular food points.
For standalone stores, we will try to detect **locations that are not already crowded with Cafes**. We are particularly interested in **areas with no Cafes in the vicinity**.
For outlets, we will look for locations **with a good number of restaurants or food points**, so that some of them may agree to a merger.

We will use our data science powers to generate a few of the most promising areas/neighborhoods based on these criteria. The advantages of each area will then be clearly expressed so that the stakeholders can choose the best possible final location.

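The two selection criteria above can be written down as a simple rule. The sketch below is illustrative only: the helper `classify_neighborhood`, its thresholds (`max_cafes=0`, `min_food_points=5`) and the example counts are hypothetical assumptions, not values used later in this analysis.

```python
def classify_neighborhood(cafe_count, food_count, max_cafes=0, min_food_points=5):
    """Return the store formats a neighborhood qualifies for (hypothetical rule).

    - 'standalone': at most `max_cafes` cafes nearby (little or no competition)
    - 'outlet': at least `min_food_points` restaurants/food points to approach
    """
    formats = []
    if cafe_count <= max_cafes:
        formats.append('standalone')
    if food_count >= min_food_points:
        formats.append('outlet')
    return formats


# made-up (cafe count, food point count) pairs for three illustrative areas
example_counts = {
    'Area 1': (0, 2),   # no cafes, few food points
    'Area 2': (3, 12),  # crowded with cafes, many food points
    'Area 3': (0, 8),   # no cafes, many food points
}

for name, (cafes, food) in example_counts.items():
    print(name, classify_neighborhood(cafes, food))
```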
## Data <a name="data"></a>

The data for Mumbai city, with its areas (neighborhoods) and locations (boroughs), is available on a [Wikipedia page](https://en.wikipedia.org/wiki/List_of_neighbourhoods_in_Mumbai).

The data also contains the latitude and longitude of each area (neighborhood).

The venues, landmarks and most frequent places in those neighbourhoods are obtained **using Foursquare API**.


## Methodology <a name="methodology"></a>

In this project we will focus on detecting areas of Mumbai that have a low number of coffee shops.
* Step 1: Importing the neighborhood data and cleaning it, then visualizing it using Folium
* Step 2: Using the Foursquare API to get venue data
* Step 3: Cleaning the venue data and merging it with the neighborhood data
* Step 4: Using the k-means clustering algorithm (scikit-learn library) to group similar neighborhoods by their venues, which makes the analysis easier
* Step 5: Final analysis using the clusters

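Before any clustering, the question of which areas have few coffee shops can also be approached directly by counting cafe-type venues per neighborhood in a venue table like the one built in the Analysis section. The sketch below uses a small hypothetical table with the same column names; treating both `Coffee Shop` and `Café` as competitors is an assumption.

```python
import pandas as pd

# hypothetical venue table shaped like the Foursquare results (one row per venue)
venues = pd.DataFrame({
    'Neighborhood': ['Area 1', 'Area 1', 'Area 1', 'Area 2', 'Area 2', 'Area 3'],
    'Venue Category': ['Coffee Shop', 'Café', 'Indian Restaurant',
                       'Bakery', 'Indian Restaurant', 'Gym'],
})

# categories treated as direct competition (assumption; adjust as needed)
CAFE_CATEGORIES = {'Coffee Shop', 'Café'}
venues['is_cafe'] = venues['Venue Category'].isin(CAFE_CATEGORIES)

# per neighborhood: total venues, number of cafes and the cafe share
summary = venues.groupby('Neighborhood').agg(
    total_venues=('Venue Category', 'size'),
    cafes=('is_cafe', 'sum'),
)
summary['cafe_share'] = summary['cafes'] / summary['total_venues']

# neighborhoods with the fewest cafes come first
summary = summary.sort_values(['cafes', 'cafe_share'])
print(summary)
```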
## Analysis <a name="analysis"></a>

Let's perform some basic exploratory data analysis and derive some usable information from our raw data:

```python
import pandas as pd
```


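```python
url = 'https://en.wikipedia.org/wiki/List_of_neighbourhoods_in_Mumbai'

# read_html returns a list of all tables on the page; the first one holds the neighborhoods
df = pd.read_html(url, header=0)[0]
```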

```python
df.head()
```

|   | Area | Location | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Amboli | Andheri,Western Suburbs | 19.1293 | 72.8434 |
| 1 | Chakala, Andheri | Western Suburbs | 19.111388 | 72.860833 |
| 2 | D.N. Nagar | Andheri,Western Suburbs | 19.124085 | 72.831373 |
| 3 | Four Bungalows | Andheri,Western Suburbs | 19.124714 | 72.82721 |
| 4 | Lokhandwala | Andheri,Western Suburbs | 19.130815 | 72.82927 |


```python
df.rename(columns={"Area": "Neighborhood", "Location": "Borough"}, inplace=True)
```

```python
df.head()
```

|   | Neighborhood | Borough | Latitude | Longitude |
|---|---|---|---|---|
| 0 | Amboli | Andheri,Western Suburbs | 19.1293 | 72.8434 |
| 1 | Chakala, Andheri | Western Suburbs | 19.111388 | 72.860833 |
| 2 | D.N. Nagar | Andheri,Western Suburbs | 19.124085 | 72.831373 |
| 3 | Four Bungalows | Andheri,Western Suburbs | 19.124714 | 72.82721 |
| 4 | Lokhandwala | Andheri,Western Suburbs | 19.130815 | 72.82927 |


```python
df.shape
```

```text
(93, 4)
```


#### Using geopy to get Coordinates of Mumbai

```python
import folium
from geopy.geocoders import Nominatim
```

```python
address = 'Mumbai,India'

# Nominatim's usage policy asks for an identifying user_agent
geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Mumbai are {}, {}.'.format(latitude, longitude))
```

```text
The geographical coordinates of Mumbai are 19.0759899, 72.8773928.
```


#### Visualizing the data set using folium

```python
map_mumbai = folium.Map(location=[latitude, longitude], zoom_start=10)

# add one circle marker per neighborhood, with a "Neighborhood, Borough" popup
for lat, lng, borough, neighborhood in zip(df['Latitude'], df['Longitude'], df['Borough'], df['Neighborhood']):
    label = folium.Popup('{}, {}'.format(neighborhood, borough), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_mumbai)

map_mumbai
```

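```python
CLIENT_ID = '2TZN0WXQ5KNEIU4CT0HOCY1QQRA3RWENOUDNF0SKDKU****'  # your Foursquare ID
CLIENT_SECRET = 'VVGUYT1PYIQP5P240XL5T2W1BPQBCYT1V1BX3CROJS3****'  # your Foursquare secret
VERSION = '20180605'  # Foursquare API version date (YYYYMMDD)
LIMIT = 100  # number of venues requested per neighborhood
```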


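The explore request sent by `getNearbyVenues` is an ordinary HTTPS GET whose query string carries the credentials, the API version date `v`, the center point `ll`, the search `radius` in meters and the result `limit`. The sketch below builds such a URL offline with the standard library, using placeholder credentials; the helper name `build_explore_url` is hypothetical and no request is sent.

```python
from urllib.parse import urlencode, urlparse, parse_qs

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Build a Foursquare v2 explore URL for one neighborhood center."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,                    # API version date, YYYYMMDD
        'll': '{},{}'.format(lat, lng),  # "latitude,longitude"
        'radius': radius,                # search radius in meters
        'limit': limit,
    }
    # urlencode escapes special characters (the comma in ll becomes %2C)
    return EXPLORE_ENDPOINT + '?' + urlencode(params)


url = build_explore_url('YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET', '20180605', 19.1293, 72.8434)
print(url)
# decoding the query string recovers the original parameter values
print(parse_qs(urlparse(url).query)['ll'])
```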
```python
import json
import requests  # library to handle HTTP requests
from pandas import json_normalize  # the old pandas.io.json.json_normalize path is deprecated
```

#### Using Foursquare API

```python
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Query the Foursquare explore endpoint around each neighborhood center
    and return one row per venue found within `radius` meters."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # build the API request; requests URL-encodes the query parameters
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': LIMIT,
        }

        # make the GET request and keep the recommended venue items
        response = requests.get('https://api.foursquare.com/v2/venues/explore', params=params, timeout=30)
        results = response.json()['response']['groups'][0]['items']

        # keep only the relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-neighborhood lists into one DataFrame
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood',
                             'Neighborhood Latitude',
                             'Neighborhood Longitude',
                             'Venue',
                             'Venue Latitude',
                             'Venue Longitude',
                             'Venue Category']

    return nearby_venues
```

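`getNearbyVenues` reads each venue from the nested path `response`, then `groups[0]`, then `items`, then `venue`. The sketch below applies the same extraction to a hand-written, trimmed response; the sample payload and the helper `extract_venues` are hypothetical. Unlike the notebook, it skips any venue that comes back without categories instead of raising an `IndexError` on `categories[0]`, which is a defensive assumption rather than documented API behavior.

```python
# hypothetical, trimmed explore response with only the fields read by getNearbyVenues
sample_response = {
    'response': {
        'groups': [{
            'items': [
                {'venue': {'name': 'Example Coffee',
                           'location': {'lat': 19.1290, 'lng': 72.8440},
                           'categories': [{'name': 'Coffee Shop'}]}},
                {'venue': {'name': 'Unlabelled Place',
                           'location': {'lat': 19.1300, 'lng': 72.8430},
                           'categories': []}},
            ]
        }]
    }
}


def extract_venues(payload, name, lat, lng):
    """Turn one explore response into (neighborhood, lat, lng, venue, vlat, vlng, category) rows."""
    rows = []
    for item in payload['response']['groups'][0]['items']:
        venue = item['venue']
        categories = venue.get('categories') or []
        if not categories:
            continue  # no category to record; skip instead of failing on categories[0]
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'],
                     categories[0]['name']))
    return rows


print(extract_venues(sample_response, 'Area 1', 19.1293, 72.8434))
```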
#### Getting Venues:

```python
mumbai_venues = getNearbyVenues(names=df['Neighborhood'],
                                latitudes=df['Latitude'],
                                longitudes=df['Longitude'])
```

```text
Amboli
Chakala, Andheri
D.N. Nagar
Four Bungalows
Lokhandwala
Marol
Sahar
Seven Bungalows
Versova
Mira Road
Bhayandar
Uttan
Bandstand Promenade
Kherwadi
Pali Hill
I.C. Colony
Gorai
Dahisa
Aarey Milk Colony
Bangur Nagar
Jogeshwari West
Juhu
Charkop
Poisar
Mahavir Nagar
Thakur village
Pali Naka
Khar Danda
Dindoshi
Sunder Nagar
Kalina
Naigaon
Nalasopara
Virar
Irla
Vile Parle
Bhandup
Amrut Nagar
Asalfa
Pant Nagar
Kanjurmarg
Nehru Nagar
Nahur
Chandivali
Hiranandani Gardens
Indian Institute of Technology Bombay campus
Vidyavihar
Vikhroli
Chembur
Deonar
Mankhurd
Mahul
Agripada
Altamount Road
Bhuleshwar
Breach Candy
Carmichael Road
Cavel
Churchgate
Cotton Green
Cuffe Parade
Cumbala Hill
Currey Road
Dhobitalao
Dongri
Kala Ghoda
Kemps Corner
Lower Parel
Mahalaxmi
Mahim
Malabar Hill
Marine Drive
Marine Lines
Mumbai Central
Nariman Point
Prabhadevi
Sion
Walkeshwar
Worli
C.G.S. colony
Dagdi Chawl
Navy Nagar
Hindu colony
Ballard Estate
Chira Bazaar
Fanas Wadi
Chor Bazaar
Matunga
Parel
Gowalia Tank
D
```

```python
print(mumbai_venues.shape)
mumbai_venues.head()
```

```text
(1333, 7)
```


|   | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Amboli | 19.1293 | 72.8434 | Cafe Arfa | 19.12893 | 72.84714 | Indian Restaurant |
| 1 | Amboli | 19.1293 | 72.8434 | 5 Spice , Bandra | 19.130421 | 72.847206 | Chinese Restaurant |
| 2 | Amboli | 19.1293 | 72.8434 | Subway | 19.12786 | 72.844461 | Sandwich Place |
| 3 | Amboli | 19.1293 | 72.8434 | Cafe Coffee Day | 19.127748 | 72.844663 | Coffee Shop |
| 4 | Amboli | 19.1293 | 72.8434 | Spices & Chillies | 19.127765 | 72.844131 | Asian Restaurant |

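The `radius=500` argument asks Foursquare for venues within roughly 500 m of each neighborhood center. A quick way to sanity-check a row of the table above is the haversine great-circle distance; the sketch below assumes a spherical Earth with a mean radius of 6,371 km, which is accurate enough at this scale, and the helper name `haversine_m` is illustrative.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; spherical-Earth assumption


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Amboli's center and its first returned venue (Cafe Arfa), taken from the table above
d = haversine_m(19.1293, 72.8434, 19.12893, 72.84714)
print(round(d), 'm')  # about 395 m, inside the 500 m search radius
```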

```python
mumbai_venues.groupby('Neighborhood').count()
```

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Agripada | 5 | 5 | 5 | 5 | 5 | 5 |
| Altamount Road | 8 | 8 | 8 | 8 | 8 | 8 |
| Amboli | 10 | 10 | 10 | 10 | 10 | 10 |
| Amrut Nagar | 39 | 39 | 39 | 39 | 39 | 39 |
| Asalfa | 2 | 2 | 2 | 2 | 2 | 2 |
| ... | ... | ... | ... | ... | ... | ... |
| Vidyavihar | 5 | 5 | 5 | 5 | 5 | 5 |
| Vile Parle | 34 | 34 | 34 | 34 | 34 | 34 |
| Virar | 1 | 1 | 1 | 1 | 1 | 1 |
| Walkeshwar | 6 | 6 | 6 | 6 | 6 | 6 |


#### Cleaning the venue data:

```python
# one hot encoding
mumbai_onehot = pd.get_dummies(mumbai_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
mumbai_onehot['Neighborhood'] = mumbai_venues['Neighborhood']

# move neighborhood column to the first column
fixed_columns = [mumbai_onehot.columns[-1]] + list(mumbai_onehot.columns[:-1])
mumbai_onehot = mumbai_onehot[fixed_columns]

mumbai_onehot.head()
```

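|   | Yoga Studio | ATM | Afghan Restaurant | American Restaurant | Amphitheater | Antique Shop | Arcade | Art Gallery | Arts & Crafts Store | Arts & Entertainment | ... | Tex-Mex Restaurant | Theater | Tourist Information Center | Trail | Train Station | Vegetarian / Vegan Restaurant | Whisky Bar | Wine Bar | Wine Shop | Women's Store |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |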


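```python
# average the one-hot rows per neighborhood: each value is that category's share of the neighborhood's venues
mumbai_grouped = mumbai_onehot.groupby('Neighborhood').mean().reset_index()
mumbai_grouped
```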

|   | Neighborhood | Yoga Studio | ATM | Afghan Restaurant | American Restaurant | Amphitheater | Antique Shop | Arcade | Art Gallery | Arts & Crafts Store | ... | Tex-Mex Restaurant | Theater | Tourist Information Center | Trail | Train Station | Vegetarian / Vegan Restaurant | Whisky Bar | Wine Bar | Wine Shop | Women's Store |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Agripada | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 1 | Altamount Road | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.125 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 2 | Amboli | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 3 | Amrut Nagar | 0.0 | 0.0 | 0.025641 | 0.025641 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4 | Asalfa | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 81 | Vidyavihar | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 82 | Vile Parle | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 83 | Virar | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 84 | Walkeshwar | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.0 | 0.000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |

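The one-hot and group-mean cells turn the venue list into a per-neighborhood category profile: `get_dummies` gives one 0/1 column per category, and the group mean of those columns is the fraction of that neighborhood's venues in each category. One pitfall: if a Foursquare category is itself named `Neighborhood`, `mumbai_onehot['Neighborhood'] = ...` overwrites that dummy column in place instead of appending a new last column, which would explain why `Yoga Studio`, not `Neighborhood`, ends up first in the one-hot output above. The toy sketch below (hypothetical data) checks the frequency interpretation against a plain count and keeps the key in a separately named column; the name `Neighborhood Name` is an assumption for illustration.

```python
from collections import Counter

import pandas as pd

# hypothetical venues; one category is literally named "Neighborhood"
venues = pd.DataFrame({
    'Neighborhood': ['Area 1', 'Area 1', 'Area 1', 'Area 1', 'Area 2'],
    'Venue Category': ['Café', 'Café', 'Bakery', 'Neighborhood', 'Bakery'],
})

onehot = pd.get_dummies(venues[['Venue Category']], prefix='', prefix_sep='').astype(int)
# insert the key under a distinct name so it cannot overwrite the "Neighborhood" dummy column
onehot.insert(0, 'Neighborhood Name', venues['Neighborhood'])

freq = onehot.groupby('Neighborhood Name').mean()
print(freq)

# cross-check: the mean of a 0/1 column equals count / total venues in that neighborhood
area1 = Counter(venues.loc[venues['Neighborhood'] == 'Area 1', 'Venue Category'])
assert freq.loc['Area 1', 'Café'] == area1['Café'] / sum(area1.values())
```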

#### Sorting the data to get most common venues:

```python
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]  # skip the Neighborhood column
    row_categories_sorted = row_categories.sort_values(ascending=False)

    return row_categories_sorted.index.values[0:num_top_venues]
```

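`return_most_common_venues` always returns `num_top_venues` names, even when a neighborhood has fewer distinct categories; the remaining slots are then filled with zero-frequency categories in whatever order the sort leaves ties. That is why names such as `Women's Store` or `Falafel Restaurant` can appear in the tables below for a neighborhood like Asalfa, which has only two venues. A possible variant, shown here as a hypothetical helper that this notebook does not use, keeps only categories that actually occur:

```python
import pandas as pd


def return_nonzero_common_venues(row, num_top_venues):
    """Like return_most_common_venues, but drops categories with zero frequency.

    Expects a row whose first entry is the neighborhood name, followed by
    category frequencies (as in mumbai_grouped).
    """
    row_categories = row.iloc[1:].astype(float)
    row_categories = row_categories[row_categories > 0]
    # stable sort so tied categories keep their column order
    row_categories_sorted = row_categories.sort_values(ascending=False, kind='mergesort')
    return list(row_categories_sorted.index[:num_top_venues])


# hypothetical frequency row for a neighborhood with only two venues
row = pd.Series({'Neighborhood': 'Area 2', 'Bakery': 0.0,
                 'Light Rail Station': 0.5, "Men's Store": 0.5, "Women's Store": 0.0})
print(return_nonzero_common_venues(row, 10))
```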
```python
import numpy as np
```

```python
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = mumbai_grouped['Neighborhood']

# fill in the top venue categories for each neighborhood
for ind in np.arange(mumbai_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(mumbai_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()
```

|   | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Agripada | Bakery | Gym | Coffee Shop | Indian Restaurant | Women's Store | Dim Sum Restaurant | Farmers Market | Falafel Restaurant | Event Space | Electronics Store |
| 1 | Altamount Road | Café | Indian Restaurant | Coffee Shop | Sandwich Place | Pizza Place | Theater | Bakery | Dessert Shop | Electronics Store | Dumpling Restaurant |
| 2 | Amboli | Halal Restaurant | Asian Restaurant | Ice Cream Shop | Indian Restaurant | Fast Food Restaurant | Gym | Coffee Shop | Chinese Restaurant | Sandwich Place | Park |
| 3 | Amrut Nagar | Indian Restaurant | Café | Asian Restaurant | Restaurant | Fast Food Restaurant | Electronics Store | Coffee Shop | Bookstore | Bowling Alley | Brewery |
| 4 | Asalfa | Light Rail Station | Men's Store | Women's Store | Dim Sum Restaurant | Falafel Restaurant | Event Space | Electronics Store | Dumpling Restaurant | Donut Shop | Diner |

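The `indicators` list only covers the suffixes for 1 to 3 and falls back to `th`, which is correct up to 20 but would produce `21th` or `22th` if `num_top_venues` were ever raised. A small general helper, offered as an optional alternative (the name `ordinal` is hypothetical), handles the 11th to 13th exceptions and larger numbers:

```python
def ordinal(n):
    """Return n with its English ordinal suffix: 1st, 2nd, 3rd, 4th, ..., 11th, 21st."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 10th to 20th (including 11th, 12th, 13th) always take "th"
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


num_top_venues = 10
columns = ['Neighborhood'] + ['{} Most Common Venue'.format(ordinal(i))
                              for i in range(1, num_top_venues + 1)]
print(columns)
```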

#### KMeans Clustering:

```python
from sklearn.cluster import KMeans
```

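```python
kclusters = 5

# drop the name column so that only the category frequencies are clustered
mumbai_grouped_clustering = mumbai_grouped.drop(columns='Neighborhood')

# run k-means clustering (n_init=10 matches the older scikit-learn default)
kmeans = KMeans(n_clusters=kclusters, random_state=0, n_init=10).fit(mumbai_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]
```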

```text
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 2])
```

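`KMeans` alternates two steps until the assignments stop changing: assign each neighborhood's frequency vector to its nearest centroid (squared Euclidean distance), then move each centroid to the mean of its assigned points. The sketch below is a bare-bones version of that loop (Lloyd's algorithm) on made-up 2-D points. It uses a fixed initialization for clarity, whereas scikit-learn uses k-means++ seeding and several restarts (`n_init`), so it is an illustration rather than a drop-in replacement; the function name `lloyd_kmeans` is hypothetical.

```python
import numpy as np


def lloyd_kmeans(X, init_centroids, max_iter=100):
    """Plain Lloyd's k-means: returns (labels, centroids)."""
    centroids = np.asarray(init_centroids, dtype=float).copy()
    labels = None
    for _ in range(max_iter):
        # assignment step: squared distance from every point to every centroid
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # converged: assignments did not change
        labels = new_labels
        # update step: move each centroid to the mean of its points (keep it if empty)
        for k in range(len(centroids)):
            members = X[labels == k]
            if len(members):
                centroids[k] = members.mean(axis=0)
    return labels, centroids


# made-up points forming two obvious groups
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [1.0, 1.0], [0.9, 1.0], [1.0, 0.9]])
labels, centroids = lloyd_kmeans(X, init_centroids=X[[0, 3]])
print(labels)  # [0 0 0 1 1 1]
print(centroids)
```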
#### Merging the data

```python
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

# join the cluster labels and top venues onto the neighborhood table (df),
# which supplies the latitude/longitude of each neighborhood
mumbai_merged = df.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

mumbai_merged.head()  # check the last columns!
```

|   | Neighborhood | Borough | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Amboli | Andheri,Western Suburbs | 19.1293 | 72.8434 | 0.0 | Halal Restaurant | Asian Restaurant | Ice Cream Shop | Indian Restaurant | Fast Food Restaurant | Gym | Coffee Shop | Chinese Restaurant | Sandwich Place | Park |
| 1 | Chakala, Andheri | Western Suburbs | 19.111388 | 72.860833 | 0.0 | Hotel | Restaurant | Café | Multiplex | Fast Food Restaurant | Bar | Diner | Asian Restaurant | Seafood Restaurant | Pizza Place |
| 2 | D.N. Nagar | Andheri,Western Suburbs | 19.124085 | 72.831373 | 0.0 | Cocktail Bar | Arts & Entertainment | Indian Restaurant | Food Truck | Pizza Place | Snack Place | Gym / Fitness Center | Vegetarian / Vegan Restaurant | Antique Shop | Diner |
| 3 | Four Bungalows | Andheri,Western Suburbs | 19.124714 | 72.82721 | 0.0 | Electronics Store | Women's Store | Arts & Entertainment | Juice Bar | Residential Building (Apartment / Condo) | Market | Fish Market | Smoke Shop | Bar | Sports Club |
| 4 | Lokhandwala | Andheri,Western Suburbs | 19.130815 | 72.82927 | 0.0 | Women's Store | Residential Building (Apartment / Condo) | Cocktail Bar | Coffee Shop | Concert Hall | Department Store | Food Truck | Gym / Fitness Center | Indian Restaurant | Liquor Store |


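```python
# neighborhoods without venue data have no cluster label; drop them
mumbai_merged.dropna(subset=["Cluster Labels"], axis=0, inplace=True)
# the labels became floats because of the missing values; restore integer labels
mumbai_merged['Cluster Labels'] = mumbai_merged['Cluster Labels'].astype(int)
```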

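`join` keeps every row of `df`. Neighborhoods missing from `neighborhoods_venues_sorted` (for example because Foursquare returned no venues for them, which would be consistent with the grouped table having 85 rows while `df` has 93) get `NaN` in every joined column, and a single `NaN` turns the integer cluster labels into floats, hence the `0.0` values in the merged table. The toy example below, with hypothetical data, reproduces that and shows the drop-then-cast step.

```python
import pandas as pd

# hypothetical neighborhood table (3 rows) and cluster table (only 2 neighborhoods had venues)
neighborhoods = pd.DataFrame({'Neighborhood': ['Area 1', 'Area 2', 'Area 3'],
                              'Latitude': [19.10, 19.20, 19.30]})
clusters = pd.DataFrame({'Neighborhood': ['Area 1', 'Area 3'],
                         'Cluster Labels': [0, 2]})

merged = neighborhoods.join(clusters.set_index('Neighborhood'), on='Neighborhood')
print(merged['Cluster Labels'].dtype)  # float64, because Area 2 has no label (NaN)

# drop unlabelled rows, then cast the labels back to integers
merged = merged.dropna(subset=['Cluster Labels']).astype({'Cluster Labels': int})
print(merged)
```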
#### Visualizing data via Folium:

```python
import matplotlib.cm as cm
import matplotlib.colors as colors
```

```python
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map; cluster labels are 0-based, so they index rainbow directly
for lat, lon, poi, cluster in zip(mumbai_merged['Latitude'], mumbai_merged['Longitude'], mumbai_merged['Neighborhood'], mumbai_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        popup=label,
        fill_opacity=0.7).add_to(map_clusters)

map_clusters
```

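Cluster labels from `KMeans` run from 0 to `kclusters - 1`, so they can index a list of `kclusters` colors directly. As a dependency-free illustration of the same idea, the sketch below (hypothetical helper `cluster_palette`) spreads `kclusters` hues evenly around the color wheel with the standard-library `colorsys` module instead of matplotlib's `rainbow` colormap, so the exact colors differ from the map above.

```python
import colorsys


def cluster_palette(kclusters):
    """Return kclusters hex colors with evenly spaced hues (full saturation and value)."""
    palette = []
    for k in range(kclusters):
        # hue k / kclusters stays below 1.0, so the last color does not wrap back to red
        r, g, b = colorsys.hsv_to_rgb(k / kclusters, 1.0, 1.0)
        palette.append('#{:02x}{:02x}{:02x}'.format(round(r * 255), round(g * 255), round(b * 255)))
    return palette


palette = cluster_palette(5)
for label in range(5):
    print('Cluster', label, palette[label])  # the 0-based label indexes the palette directly
```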
#### Cluster No. 1

```python
mumbai_merged.loc[mumbai_merged['Cluster Labels'] == 0, mumbai_merged.columns[[1] + list(range(5, mumbai_merged.shape[1]))]]
```

|   | Borough | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Andheri,Western Suburbs | Halal Restaurant | Asian Restaurant | Ice Cream Shop | Indian Restaurant | Fast Food Restaurant | Gym | Coffee Shop | Chinese Restaurant | Sandwich Place | Park |
| 1 | Western Suburbs | Hotel | Restaurant | Café | Multiplex | Fast Food Restaurant | Bar | Diner | Asian Restaurant | Seafood Restaurant | Pizza Place |
| 2 | Andheri,Western Suburbs | Cocktail Bar | Arts & Entertainment | Indian Restaurant | Food Truck | Pizza Place | Snack Place | Gym / Fitness Center | Vegetarian / Vegan Restaurant | Antique Shop | Diner |
| 3 | Andheri,Western Suburbs | Electronics Store | Women's Store | Arts & Entertainment | Juice Bar | Residential Building (Apartment / Condo) | Market | Fish Market | Smoke Shop | Bar | Sports Club |
| 4 | Andheri,Western Suburbs | Women's Store | Residential Building (Apartment / Condo) | Cocktail Bar | Coffee Shop | Concert Hall | Department Store | Food Truck | Gym / Fitness Center | Indian Restaurant | Liquor Store |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 87 | South Mumbai | Indian Restaurant | Coffee Shop | Vegetarian / Vegan Restaurant | Fast Food Restaurant | Café | Bar | Hotel | Flower Shop | Juice Bar | Farmers Market |
| 88 | South Mumbai | Indian Restaurant | Whisky Bar | Plaza | Asian Restaurant | Women's Store | Falafel Restaurant | Event Space | Electronics Store | Dumpling Restaurant | Donut Shop |
| 89 | Tardeo,South Mumbai | Café | Bookstore | Bar | Coffee Shop | Deli / Bodega | Pizza Place | Restaurant | Brewery | Salon / Barbershop | Lounge |
| 90 | South Mumbai | Indian Restaurant | Fast Food Restaurant | Cheese Shop | Market | Restaurant | Middle Eastern Restaurant | Café | Bar | Ice Cream Shop | Multiplex |


#### Cluster No. 2

```python
mumbai_merged.loc[mumbai_merged['Cluster Labels'] == 1, mumbai_merged.columns[[1] + list(range(5, mumbai_merged.shape[1]))]]
```

|   | Borough | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 33 | Western Suburbs | Bakery | Women's Store | Dim Sum Restaurant | Farmers Market | Falafel Restaurant | Event Space | Electronics Store | Dumpling Restaurant | Donut Shop | Diner |
| 40 | Eastern Suburbs | Bakery | Multiplex | Women's Store | Dim Sum Restaurant | Farmers Market | Falafel Restaurant | Event Space | Electronics Store | Dumpling Restaurant | Donut Shop |


#### Cluster No. 3

```python
mumbai_merged.loc[mumbai_merged['Cluster Labels'] == 2, mumbai_merged.columns[[1] + list(range(5, mumbai_merged.shape[1]))]]
```

|   | Borough | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 10 | Mira-Bhayandar,Western Suburbs | Shipping Store | Women's Store | Dhaba | Falafel Restaurant | Event Space | Electronics Store | Dumpling Restaurant | Donut Shop | Diner | Dim Sum Restaurant |


#### Cluster No. 4

```python
mumbai_merged.loc[mumbai_merged['Cluster Labels'] == 3, mumbai_merged.columns[[1] + list(range(5, mumbai_merged.shape[1]))]]
```

|   | Borough | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 42 | Mulund,Eastern Suburbs | Indian Restaurant | Restaurant | Ice Cream Shop | Dhaba | Falafel Restaurant | Event Space | Electronics Store | Dumpling Restaurant | Donut Shop | Diner |
| 54 | South Mumbai | Indian Restaurant | Cheese Shop | Food | Restaurant | Market | Ice Cream Shop | Fast Food Restaurant | American Restaurant | Deli / Bodega | Donut Shop |
| 80 | Byculla,South Mumbai | Indian Restaurant | Bakery | Women's Store | Dim Sum Restaurant | Farmers Market | Falafel Restaurant | Event Space | Electronics Store | Dumpling Restaurant | Donut Shop |
| 84 | Kalbadevi,South Mumbai | Indian Restaurant | Train Station | Café | Garden | Dessert Shop | Falafel Restaurant | Event Space | Electronics Store | Dumpling Restaurant | Donut Shop |
| 86 | Kamathipura,South Mumbai | Indian Restaurant | BBQ Joint | Dessert Shop | Breakfast Spot | Restaurant | Ice Cream Shop | Market | Antique Shop | Deli / Bodega | Donut Shop |
| 92 | Mumbai | Indian Restaurant | Buffet | Lake | Food Court | Seafood Restaurant | Cricket Ground | Cupcake Shop | Falafel Restaurant | Event Space | Electronics Store |


#### Cluster No. 5

```python
mumbai_merged.loc[mumbai_merged['Cluster Labels'] == 4, mumbai_merged.columns[[1] + list(range(5, mumbai_merged.shape[1]))]]
```

|   | Borough | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 60 | South Mumbai | Garden | Women's Store | Falafel Restaurant | Event Space | Electronics Store | Dumpling Restaurant | Donut Shop | Diner | Dim Sum Restaurant | Dhaba |


## Results and Discussion <a name="results"></a>

We have divided the neighborhoods of Mumbai into 5 clusters. Let's name these clusters by looking at their top venue types:
*  Cluster 1: "Multiple Social Venues" (These neighborhoods have diverse social venues such as restaurants, cafes, multiplexes, entertainment centres, markets, stores and lounges, which suggests high foot traffic. Most of the neighborhoods fall here, as Mumbai is an eventful place!)
*  Cluster 2: "Cafe venues" (These neighborhoods are best suited for opening a cafe: they have bakeries, which suggests good foot traffic, but no cafes among their top venues, so there is virtually no competition. Being close to bakeries may also complement sales)
*  Cluster 3: "Commercial area" (There is only one neighborhood here; it can be a potential high-profit area because of its commercial nature)
*  Cluster 4: "Potential mergers" (These neighborhoods may be the food hubs of Mumbai, given the large number of restaurants they have. Thus, they are well suited for a merger for a small outlet)
*  Cluster 5: "Residential area" (This single neighborhood appears residential and has the least potential for a profitable cafe)

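The cluster names above are read off the top-venue tables. A complementary, quantitative check is to average the category frequencies within each cluster and compare, for example, the cafe, bakery and restaurant columns. The sketch below assumes a frequency table shaped like `mumbai_grouped` plus a label array like `kmeans.labels_`; all values and the `Cafe total` column are made up for illustration.

```python
import numpy as np
import pandas as pd

# hypothetical per-neighborhood frequencies (rows sum to 1), shaped like mumbai_grouped
grouped = pd.DataFrame({
    'Neighborhood': ['Area 1', 'Area 2', 'Area 3', 'Area 4'],
    'Café': [0.30, 0.25, 0.00, 0.00],
    'Coffee Shop': [0.20, 0.25, 0.00, 0.00],
    'Bakery': [0.00, 0.00, 1.00, 0.00],
    'Indian Restaurant': [0.50, 0.50, 0.00, 1.00],
})
labels = np.array([0, 0, 1, 2])  # stand-in for kmeans.labels_

# mean category frequency per cluster, plus a combined cafe column and cluster sizes
profile = grouped.drop(columns='Neighborhood').groupby(labels).mean()
profile['Cafe total'] = profile['Café'] + profile['Coffee Shop']
profile['Neighborhoods'] = pd.Series(labels).value_counts().sort_index()

print(profile[['Neighborhoods', 'Cafe total', 'Bakery', 'Indian Restaurant']])
```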
## Conclusion <a name="conclusion"></a>

* Cluster 1 (Multiple Social Venues) neighborhoods are good contenders for opening a cafe, though there is competition, as some cafes are already open in these neighborhoods.
* Cluster 2 (Cafe venues) contains the neighborhoods that are best suited for opening a cafe.
* Cluster 3 (Commercial area) contains a neighborhood that is good for opening a cafe, as it is a commercial centre.
* Cluster 4 (Potential mergers) has neighborhoods with a good number of restaurants already present; these hold potential for a merger, either to open a coffee outlet or to add coffee to their menus.
* Cluster 5 (Residential area) contains a neighborhood that can be safely ignored.
