# Capstone Project - The Battle of the Neighborhoods

**Clustering of neighborhoods in Mumbai**

### Introduction: Business Problem 
In this project I will try to segment the neighborhoods of Mumbai using K-means clustering. 

Since there is no ready-made dataset of the neighborhoods in Mumbai, I took the list from Wikipedia, scraped it with Beautiful Soup and converted it into a table, populating the location attributes separately. Coordinates that could not be geocoded were stored in a separate CSV file, which the program reads through the Dropbox API. The Foursquare API was then used to get details on common venues in the city.

### Data
Data sources used: the Wikipedia page listing the neighborhoods of Mumbai, venue categories from the Foursquare API, and location coordinates from the Nominatim geocoder (via geopy).

In [1]:
#!pip install dropbox
import dropbox
import numpy as np 
import pandas as pd 
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
import json
from geopy.geocoders import Nominatim
import requests 
from pandas import json_normalize #the pandas.io.json import path is deprecated since pandas 1.0
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
import folium
from bs4 import BeautifulSoup
from io import StringIO

In [2]:
# Hidden cell with sensitive account info

In [99]:
# The code was removed by Watson Studio for sharing.

In [5]:
#Get the latitude and longitude data for all neighborhoods
def getaddress(address):
    try:
        geolocator = Nominatim(user_agent="mum_explorer")
        location = geolocator.geocode(address)
        lat = location.latitude
        lon = location.longitude
        return [lat,lon]
    except Exception:
        #geocode returns None for unknown addresses, and the service can time out
        return [None,None]

In [6]:
#Downloading the list of neighborhoods in Mumbai from the Wikipedia page
wiki_url='https://en.wikipedia.org/wiki/List_of_neighbourhoods_in_Mumbai'
source= requests.get(wiki_url).text
soup= BeautifulSoup(source,'lxml')
#soup.prettify()

In [11]:
#Initializing the Dataframe
column_names = ['Suburb','Locality','Neighborhood','Latitude','Longitude'] 
neighborhoods = pd.DataFrame(columns=column_names)
dbx = dropbox.Dropbox(D_TOKEN)

Since the data on Wikipedia is a nested list with many stray values, a for loop was used to add rows to the dataframe iteratively. Any duplicates resulting from the process were removed in the next step.

In [20]:
#Parsing the Wikipedia page and converting its nested lists of strings into rows of the Dataframe
rows = []
for i in soup.find_all('h2'):
    suburb = i.text.replace('[edit]','')
    for j in i.next_siblings:
        if j.name == 'h2': break
        if j.name == 'h3':
            locality = j.text.replace('[edit]','')
            for k in j.next_siblings:
                if k.name == 'h3' or k.name == 'h2': break
                if k.name == 'ul':
                    for l in k.children:
                        if l.name == 'li':
                            result=getaddress(l.text+', Mumbai')
                            rows.append({'Suburb': suburb,'Locality': locality,
                                         'Neighborhood': l.text,'Latitude':result[0],'Longitude':result[1]})
        #These two sections list neighborhoods directly under the h2 heading, without h3 localities
        if suburb in ['South Mumbai','Other'] and j.name == 'ul':
            for m in j.children:
                if m.name == 'li':
                    result=getaddress(m.text+', Mumbai')
                    rows.append({'Suburb': suburb,'Locality': m.text,
                                 'Neighborhood': m.text,'Latitude':result[0],'Longitude':result[1]})
#DataFrame.append was removed in pandas 2.0, so the rows are collected first and the Dataframe is built once
neighborhoods = pd.DataFrame(rows, columns=column_names)

#neighborhoods
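The nested sibling walk above can also be written as a single pass over the headings and list items in document order. The following is only a sketch under assumptions: the markup is a made-up, simplified stand-in for the real page (`Sample Neighborhood` is a placeholder), `parse_neighborhoods` is a hypothetical helper, and unlike the original loop it treats any list item without an `h3` above it as its own locality, not only those under 'South Mumbai' and 'Other'.

```python
from bs4 import BeautifulSoup

# Illustrative markup only: a simplified stand-in for the real page structure.
html = """
<h2>Western Suburbs[edit]</h2>
<h3>Andheri[edit]</h3>
<ul><li>Amboli</li><li>Chakala</li></ul>
<h2>South Mumbai[edit]</h2>
<ul><li>Sample Neighborhood</li></ul>
"""

def parse_neighborhoods(markup):
    """Return (suburb, locality, neighborhood) tuples in document order."""
    soup = BeautifulSoup(markup, 'html.parser')
    rows = []
    suburb, locality = None, None
    # find_all with a list of tag names yields matches in document order
    for tag in soup.find_all(['h2', 'h3', 'li']):
        text = tag.get_text(strip=True).replace('[edit]', '')
        if tag.name == 'h2':
            suburb, locality = text, None   # a new suburb resets the locality
        elif tag.name == 'h3':
            locality = text
        elif suburb is not None:
            # list items directly under an h2 act as their own locality
            rows.append((suburb, locality or text, text))
    return rows

rows = parse_neighborhoods(html)
for row in rows:
    print(row)
```

Because each list item is visited exactly once, this variant should not produce the duplicate rows that the nested loops can generate.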

In [26]:
#Resolve duplicate issues from the parsing in the Dataframe
neighborhoods.drop_duplicates(subset='Neighborhood',keep='first',inplace=True)
neighborhoods.reset_index(drop=True, inplace=True)
#neighborhoods

Of the 120-odd neighborhoods populated into the dataframe, about 10 entries had no location data because of spelling mismatches. I saved the coordinates for these entries in a CSV file on my Dropbox, which the program then reads directly.

In [27]:
#Resolve missing Latitude and Longitude values from file stored in Dropbox
path='/Coursera/IBM/missing3.csv'
md, res = dbx.files_download(path)
data = res.content
#print(len(data), 'bytes; md:', md)
s=str(data,'utf-8')
#Read the CSV straight from memory; its columns are used as: row index, latitude, longitude
missing = pd.read_csv(StringIO(s), sep=',', header=None)
#missing
for i in missing.index:
    #set_value was removed in pandas 1.0; .at is the label-based scalar setter
    neighborhoods.at[missing[0][i],'Latitude'] = missing[1][i]
    neighborhoods.at[missing[0][i],'Longitude'] = missing[2][i]
#neighborhoods
neighborhoods.to_pickle('mumbai.pkl') #File saved
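To make the layout of the correction file concrete, here is a small self-contained sketch of the same patching step on toy data. The CSV contents, the placeholder neighborhood names and the coordinates for them are invented for illustration; the only thing taken from the cell above is the convention that each header-less line holds a row index, a latitude and a longitude.

```python
from io import StringIO
import pandas as pd

# Toy dataframe: two rows were not geocoded (placeholder names, made-up data)
neighborhoods = pd.DataFrame({
    'Neighborhood': ['Amboli', 'Example A', 'Example B'],
    'Latitude': [19.13201, None, None],
    'Longitude': [72.849864, None, None],
})

# Hypothetical correction file: row index, latitude, longitude (no header)
csv_text = "1,19.10,72.85\n2,19.05,72.90\n"
missing = pd.read_csv(StringIO(csv_text), header=None)

# Patch each missing row by its index label
for idx, lat, lon in missing.itertuples(index=False):
    neighborhoods.at[idx, 'Latitude'] = lat
    neighborhoods.at[idx, 'Longitude'] = lon

print(neighborhoods)
```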



In [28]:
mumbai_main=pd.read_pickle('mumbai.pkl') #main file to use

In [29]:
# create map of Mumbai using latitude and longitude values
latitude, longitude = getaddress('Mumbai,India') #geocode once and unpack both values
#print('The geographical coordinates of Mumbai are {}, {}.'.format(latitude, longitude))
map_mumbai = folium.Map(location=[latitude, longitude], zoom_start=10)
# add markers to map
for lat, lng, borough, neighborhood in zip(mumbai_main['Latitude'], mumbai_main['Longitude'], mumbai_main['Locality'], \
                                           mumbai_main['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker([lat, lng],radius=5,popup=label,color='blue',fill=True,fill_color='#3186cc',\
                        fill_opacity=0.7,parse_html=False).add_to(map_mumbai)      
map_mumbai

In this step, the Foursquare API was used to populate a dataframe with venues in every neighborhood of Mumbai

In [30]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    LIMIT=100 #maximum number of venues requested per neighborhood
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        #print(name)
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.\
        format(CLIENT_ID,CLIENT_SECRET,VERSION,lat,lng,radius,LIMIT)
        # keep only the venue items from the response
        results = requests.get(url).json()["response"]['groups'][0]['items']
        # one tuple per venue: neighborhood details followed by venue details
        venues_list.append([(name,lat,lng,v['venue']['name'],v['venue']['location']['lat'],v['venue']['location']['lng'],\
                             v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-neighborhood lists into a single dataframe
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood','Neighborhood Latitude','Neighborhood Longitude','Venue','Venue Latitude','Venue Longitude','Venue Category']

    return(nearby_venues)

In [31]:
mumbai_venues = getNearbyVenues(names=neighborhoods['Neighborhood'],latitudes=neighborhoods['Latitude'],longitudes=neighborhoods['Longitude'])
mumbai_venues.to_pickle('mvenues.pkl') #in case calls run out
#df = pd.read_pickle('mvenues.pkl')
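The function above indexes straight into the response, so a venue without categories or a response without groups would raise an error. Below is a hedged sketch of a more defensive flattening step. `flatten_explore_response` is a hypothetical helper, and `sample_payload` is a hand-written dictionary that only mimics the nesting used above (`response` → `groups` → `items` → `venue`); it is not real API output, and 'Uncategorized Venue' is invented.

```python
def flatten_explore_response(payload, name, lat, lng):
    """Turn one explore-style payload into a list of venue tuples."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Use None instead of failing when a venue has no category
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"],
                     category))
    return rows

# Hand-written payload in the assumed shape (not real API output)
sample_payload = {
    "response": {"groups": [{"items": [
        {"venue": {"name": "Cafe Arfa",
                   "location": {"lat": 19.12893, "lng": 72.84714},
                   "categories": [{"name": "Indian Restaurant"}]}},
        {"venue": {"name": "Uncategorized Venue",
                   "location": {"lat": 19.13, "lng": 72.85},
                   "categories": []}},
    ]}]}
}

rows = flatten_explore_response(sample_payload, "Amboli", 19.13201, 72.849864)
empty = flatten_explore_response({}, "Amboli", 19.13201, 72.849864)
print(rows)
print(empty)
```

Returning an empty list for an empty payload keeps the later flattening into a dataframe unchanged, since neighborhoods without venues simply contribute no rows.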

Checks were necessary for the Mumbai venues. Some venues had the category 'Neighborhood', which gave rise to errors later, and several neighborhoods had no venues listed in the Foursquare API. Both were removed subsequently.

In [32]:
print(mumbai_venues.shape)
mumbai_venues.head()

(1679, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Amboli,19.13201,72.849864,Cafe Arfa,19.12893,72.84714,Indian Restaurant
1,Amboli,19.13201,72.849864,"5 Spice , Bandra",19.130421,72.847206,Chinese Restaurant
2,Amboli,19.13201,72.849864,Domino's Pizza,19.131,72.848,Pizza Place
3,Amboli,19.13201,72.849864,Bostan Restaurant,19.135898,72.847581,Asian Restaurant
4,Chakala,19.115287,72.861808,Courtyard Mumbai International Airport,19.114205,72.864148,Hotel


In [33]:
mumbai_venues.groupby('Neighborhood').count().Venue #lists number of venues per neighborhood
print('There are {} unique categories.'.format(len(mumbai_venues['Venue Category'].unique()))) #prints number of unique categories of venues

There are 178 unique categories.


In [34]:
#One-hot results has a column 'Neighborhood'. Checked that one particular result has this venue. Will delete
#mumbai_venues['Venue Category'].unique()
mumbai_venues.loc[mumbai_venues['Venue Category'] == 'Neighborhood']

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
577,Pandurangwadi,19.170896,72.850809,pandurangwadi,19.17313,72.85031,Neighborhood


In [35]:
ind=mumbai_venues.loc[mumbai_venues['Venue Category']=='Neighborhood'].index
mum_venues = mumbai_venues.drop(mumbai_venues.index[ind])
mum_venues.reset_index(drop=True, inplace=True)
#mum_venues

In [37]:
#Check whether any Neighborhood had no results
column_names = ['Index','Neighborhood'] 
rows = []
for i in mumbai_main['Neighborhood']:
    result = mum_venues.loc[mum_venues['Neighborhood'] == i]
    if result.empty:
        rows.append({'Index':mumbai_main[mumbai_main['Neighborhood']==i].index.values.astype(int)[0],'Neighborhood':i})
#DataFrame.append was removed in pandas 2.0, so the Dataframe is built once from the collected rows
delete = pd.DataFrame(rows, columns=column_names)
delete

Unnamed: 0,Index,Neighborhood
0,55,Irla
1,73,Mahul
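Both checks can also be expressed without an explicit loop. The sketch below uses boolean masks and `isin` on toy data; the venue rows are invented, and the dataframe names `all_neighborhoods`, `venues` and `venues_clean` are placeholders, not the notebook's variables.

```python
import pandas as pd

# Toy stand-ins for the neighborhood list and the venue table (made-up rows)
all_neighborhoods = pd.DataFrame({'Neighborhood': ['Amboli', 'Chakala', 'Irla', 'Mahul']})
venues = pd.DataFrame({
    'Neighborhood': ['Amboli', 'Amboli', 'Chakala', 'Chakala'],
    'Venue Category': ['Indian Restaurant', 'Pizza Place', 'Hotel', 'Neighborhood'],
})

# Check 1: drop venues whose category is literally 'Neighborhood'
venues_clean = venues[venues['Venue Category'] != 'Neighborhood'].reset_index(drop=True)

# Check 2: neighborhoods that do not appear in the venue table at all
has_venues = all_neighborhoods['Neighborhood'].isin(venues_clean['Neighborhood'])
no_venues = all_neighborhoods.loc[~has_venues, 'Neighborhood']

print(venues_clean)
print(no_venues)
```

The index of `no_venues` carries the original row positions, which plays the role of the `Index` column in the table above.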


### Methodology
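One-hot encoding is used to turn the venue categories into dummy (0/1) columns. Averaging these columns per neighborhood gives the share of each category, which is the feature vector passed to K-means.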

In [38]:
#Analyze neighborhoods
mumbai_onehot = pd.get_dummies(mum_venues[['Venue Category']], prefix="", prefix_sep="")
mumbai_onehot['Neighborhood'] = mum_venues['Neighborhood']
fixed_columns = [mumbai_onehot.columns[-1]] + list(mumbai_onehot.columns[:-1])
mumbai_onehot = mumbai_onehot[fixed_columns]
mumbai_onehot.head()

Unnamed: 0,Neighborhood,Accessories Store,Afghan Restaurant,Airport Service,American Restaurant,Antique Shop,Arcade,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Beach,Bed & Breakfast,Bengali Restaurant,Bistro,Bookstore,Boutique,Bowling Alley,Breakfast Spot,Brewery,Bridal Shop,Burger Joint,Bus Line,Bus Station,Bus Stop,Cafeteria,Café,Campground,Candy Store,Chaat Place,Cheese Shop,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,College Academic Building,Comedy Club,Comfort Food Restaurant,Concert Hall,Convenience Store,Cosmetics Shop,Creperie,Cricket Ground,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dhaba,Diner,Donut Shop,Electronics Store,Event Space,Factory,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Field,Fish Market,Flea Market,Food,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Gastropub,General Entertainment,German Restaurant,Gift Shop,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Harbor / Marina,Historic Site,History Museum,Hockey Arena,Hookah Bar,Hostel,Hotel,Hotel Bar,Ice Cream Shop,Indian Chinese Restaurant,Indian Restaurant,Indian Sweet Shop,Indie Movie Theater,Irani Cafe,Italian Restaurant,Japanese Restaurant,Juice Bar,Light Rail Station,Lighting Store,Liquor Store,Lounge,Market,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Movie Theater,Moving Target,Mughlai Restaurant,Multiplex,Music Store,Music Venue,Nightclub,Noodle House,North Indian Restaurant,Office,Other Great Outdoors,Paper / Office Supplies Store,Park,Parsi Restaurant,Performing Arts Venue,Pharmacy,Pizza Place,Platform,Playground,Plaza,Pool,Pub,Punjabi Restaurant,Racetrack,Recording Studio,Recreation Center,Residential Building (Apartment / Condo),Resort,Restaurant,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shoe Store,Shop & Service,Shopping Mall,Smoke Shop,Snack Place,Soccer Field,South American Restaurant,South Indian Restaurant,Spa,Spanish Restaurant,Sporting Goods Shop,Sports Bar,Sports Club,Stadium,Steakhouse,Supermarket,Tea Room,Tex-Mex Restaurant,Thai Restaurant,Theater,Track,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Whisky Bar,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,Amboli,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Amboli,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Amboli,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Amboli,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Chakala,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [39]:
mumbai_onehot.shape #Column count equals the number of unique categories: the 'Neighborhood' category was dropped and the 'Neighborhood' name column was added

(1678, 178)

In [40]:
mumbai_grouped = mumbai_onehot.groupby('Neighborhood').mean().reset_index()
mumbai_grouped.head()
mumbai_grouped.shape

(118, 178)
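To see what the one-hot encoding followed by the per-neighborhood mean produces, here is a toy example. The neighborhoods 'A' and 'B' and their venues are invented; only the two pandas steps mirror the cells above.

```python
import pandas as pd

# Made-up venue table: neighborhood A has 4 venues, B has 2
venues = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Cafe', 'Cafe', 'Hotel', 'Park', 'Hotel', 'Hotel'],
})

# One 0/1 column per category, one row per venue
onehot = pd.get_dummies(venues['Venue Category'], dtype=int)
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

# The mean of a 0/1 column is the share of venues in that category
grouped = onehot.groupby('Neighborhood').mean().reset_index()
print(grouped)
```

Each row of `grouped` sums to 1 across the category columns, so neighborhoods with many venues and neighborhoods with few are compared on the same scale.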

In [63]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [79]:
num_top_venues = 10
indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = mumbai_grouped['Neighborhood']

for ind in np.arange(mumbai_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(mumbai_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Aarey Milk Colony,Fast Food Restaurant,Lighting Store,Yoga Studio,Food Court,Flea Market,Fish Market,Field,Farmers Market,Falafel Restaurant,Factory
1,Agripada,Restaurant,Indian Restaurant,Coffee Shop,Gym,Bakery,Diner,Field,Fast Food Restaurant,Farmers Market,Falafel Restaurant
2,Altamount Road,Café,Brewery,Bakery,Pizza Place,Concert Hall,Dessert Shop,Coffee Shop,Diner,Restaurant,Salon / Barbershop
3,Amboli,Chinese Restaurant,Pizza Place,Indian Restaurant,Asian Restaurant,Donut Shop,Fish Market,Field,Fast Food Restaurant,Farmers Market,Falafel Restaurant
4,Amrut Nagar,Indian Restaurant,Restaurant,Electronics Store,Mediterranean Restaurant,Clothing Store,Sandwich Place,Shop & Service,Brewery,Bowling Alley,Sporting Goods Shop


In [80]:
kclusters = 10
mumbai_grouped_clustering = mumbai_grouped.drop(columns='Neighborhood') #the positional axis argument was removed in pandas 2.0
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(mumbai_grouped_clustering)
kmeans.labels_[0:10] # check cluster labels generated for each row in the dataframe

array([5, 5, 5, 1, 5, 1, 5, 5, 1, 5], dtype=int32)

In [82]:
neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Aarey Milk Colony,Fast Food Restaurant,Lighting Store,Yoga Studio,Food Court,Flea Market,Fish Market,Field,Farmers Market,Falafel Restaurant,Factory
1,Agripada,Restaurant,Indian Restaurant,Coffee Shop,Gym,Bakery,Diner,Field,Fast Food Restaurant,Farmers Market,Falafel Restaurant
2,Altamount Road,Café,Brewery,Bakery,Pizza Place,Concert Hall,Dessert Shop,Coffee Shop,Diner,Restaurant,Salon / Barbershop
3,Amboli,Chinese Restaurant,Pizza Place,Indian Restaurant,Asian Restaurant,Donut Shop,Fish Market,Field,Fast Food Restaurant,Farmers Market,Falafel Restaurant
4,Amrut Nagar,Indian Restaurant,Restaurant,Electronics Store,Mediterranean Restaurant,Clothing Store,Sandwich Place,Shop & Service,Brewery,Bowling Alley,Sporting Goods Shop


In [83]:
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

In [84]:
mumbai_main=pd.read_pickle('mumbai.pkl')
for i in delete['Neighborhood']:
    mumbai_main=mumbai_main[mumbai_main['Neighborhood']!=i]
mumbai_main.reset_index(drop=True,inplace=True)
#mumbai_main

In [85]:
mumbai_merged = mumbai_main
mumbai_merged = mumbai_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')
mumbai_merged.head()

Unnamed: 0,Suburb,Locality,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Western Suburbs,Andheri,Amboli,19.13201,72.849864,1,Chinese Restaurant,Pizza Place,Indian Restaurant,Asian Restaurant,Donut Shop,Fish Market,Field,Fast Food Restaurant,Farmers Market,Falafel Restaurant
1,Western Suburbs,Andheri,Chakala,19.115287,72.861808,5,Hotel,Café,Restaurant,Snack Place,Asian Restaurant,Multiplex,Fast Food Restaurant,Bakery,Indian Restaurant,Seafood Restaurant
2,Western Suburbs,Andheri,D.N. Nagar,19.128292,72.830193,5,Yoga Studio,Restaurant,Chinese Restaurant,Coffee Shop,Gym / Fitness Center,Women's Store,Ice Cream Shop,Indian Restaurant,Liquor Store,Lounge
3,Western Suburbs,Andheri,Four Bungalows,19.12875,72.827159,5,Accessories Store,Juice Bar,Department Store,Coffee Shop,Residential Building (Apartment / Condo),Fish Market,Shopping Mall,Market,Liquor Store,Bar
4,Western Suburbs,Andheri,JB Nagar,18.955793,72.836049,4,Indian Restaurant,Smoke Shop,Chinese Restaurant,Sandwich Place,Café,BBQ Joint,Dessert Shop,Indian Sweet Shop,Arcade,Hotel


In [86]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

markers_colors = []
for lat, lon, poi, cluster in zip(mumbai_merged['Latitude'], mumbai_merged['Longitude'], mumbai_merged['Neighborhood'], mumbai_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker([lat, lon], radius=5, popup=label, color=rainbow[cluster-1], fill=True, fill_color=rainbow[cluster-1], fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Results and Discussion
K-means was first run with 5 clusters, which mostly grouped neighborhoods according to whether there were Indian restaurants in the vicinity. Using 10 clusters did a slightly better job of segmenting the neighborhoods in the city.
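The number of clusters here was chosen by inspecting the results. One common way to support such a choice, which was not used in this notebook, is to compare silhouette scores across several values of k. The sketch below does this on synthetic two-dimensional points, not on the Mumbai data; the blob centers and the range of k are arbitrary assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic data: three well-separated blobs of 30 points each
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + rng.normal(scale=0.3, size=(30, 2)) for c in centers])

# Fit K-means for each candidate k and record the mean silhouette score
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# Higher silhouette means tighter, better separated clusters
best_k = max(scores, key=scores.get)
print(scores)
print('best k:', best_k)
```

On sparse, high-dimensional venue frequencies the scores are usually much less clear-cut than on this toy data, so they are best read alongside the cluster contents instead of replacing that inspection.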

<font color='red'>Red cluster or Cluster 0</font> is an outlier neighborhood which has an airport.

In [88]:
#mumbai_merged.loc[mumbai_merged['Cluster Labels'] == 0, mumbai_merged.columns[[1] + list(range(5, mumbai_merged.shape[1]))]]

<font color='purple'>Purple clusters or Cluster 1</font> and <font color='aqua'>Aqua clusters or Cluster 4</font> are neighborhoods which have more Indian, fast-food and falafel restaurants and farmers markets among the most common venues. These neighborhoods are mostly in West Mumbai and probably cover areas with high disposable incomes.

In [90]:
#mumbai_merged.loc[mumbai_merged['Cluster Labels'] == 1, mumbai_merged.columns[[1] + list(range(5, mumbai_merged.shape[1]))]]

In [None]:
#mumbai_merged.loc[mumbai_merged['Cluster Labels'] == 4, mumbai_merged.columns[[1] + list(range(5, mumbai_merged.shape[1]))]]

<font color='blue'>Blue clusters or Cluster 2</font> neighborhoods have fields, gyms, more parks and pubs. These neighborhoods mostly cover Central Mumbai. The proximity of these venues suggests that accessibility to residential neighborhoods in East and West Mumbai may have been a prime reason for these venue locations.

In [93]:
#mumbai_merged.loc[mumbai_merged['Cluster Labels'] == 2, mumbai_merged.columns[[1] + list(range(5, mumbai_merged.shape[1]))]]

<font color='green'>Green clusters or Cluster 5</font> neighborhoods are closer to beaches, parks, and Italian, Chinese and vegan restaurants.

In [98]:
#mumbai_merged.loc[mumbai_merged['Cluster Labels'] == 5, mumbai_merged.columns[[1] + list(range(5, mumbai_merged.shape[1]))]]