# Introduction
This is the final Capstone project, in which we compare the neighborhoods of India's two prime metros, Mumbai and Delhi.

# Business Problem & Background
Mumbai and Delhi are the two most important metro cities in India. They are frequently compared on the quality of life, jobs, education, entertainment, and recreational facilities they offer their residents. This data science project analyzes the neighborhoods of each city to understand what is popular in them and what they offer someone deciding whether to live in one metro or the other.

# Data Source
For this study, we need data about the neighborhoods in each of these metro cities. The all-India postal code (pincode) directory published by the Government of India serves this purpose well. We will specifically download the CSV provided at https://data.gov.in/resources/all-india-pincode-directory-contact-details-along-latitude-and-longitude.
We will read the CSV into a pandas DataFrame and filter it (on the `regionname` column) to drop every city, town, and place that is not Mumbai or Delhi, since we only want to compare these two biggest metro cities in India.
We will then drop the columns of the CSV that are not relevant to this study. Post office names (`officename`) will serve as the neighborhood names within each region, Mumbai or Delhi.
Neighborhoods that share the same pincode will be combined into a single row.
The pgeocode library will be used to look up the latitude and longitude of each pincode, and the Foursquare API will then be used to explore the venues around each neighborhood in Mumbai and Delhi. Together, these form the dataset for this study.
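The filtering and pincode-merging steps described above can be sketched on a few toy rows. The rows below are illustrative assumptions shaped like the directory's `officename`, `pincode`, and `regionname` columns, not records taken from the real CSV.

```python
import pandas as pd

# Hypothetical rows shaped like the postal directory (not real CSV records).
raw = pd.DataFrame({
    "officename": ["Colaba S.O", "Colaba Bazar S.O", "Malabar Hill S.O",
                   "Janpath S.O", "Adilabad H.O"],
    "pincode": [400005, 400005, 400006, 110001, 504001],
    "regionname": ["Mumbai", "Mumbai", "Mumbai", "Delhi", "Hyderabad"],
})

# Keep only the two metros of interest.
metros = raw[raw["regionname"].isin(["Delhi", "Mumbai"])]

# Collapse post offices that share a pincode into one comma-separated row.
neighborhoods = (
    metros.groupby(["pincode", "regionname"])["officename"]
    .agg(", ".join)
    .reset_index()
    .rename(columns={"officename": "Neighborhood"})
)
print(neighborhoods)
```

Grouping on both `pincode` and `regionname` keeps the region label on each merged row, which the notebook later uses to split the data into Mumbai and Delhi.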

# Install all the required libraries

In [1]:
!pip install beautifulsoup4
!pip install lxml

import pandas as pd
import requests
from bs4 import BeautifulSoup

import numpy as np 
!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim 

import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

print('Libraries imported.')

Successfully installed beautifulsoup4-4.8.1 soupsieve-1.9.5
Successfully installed lxml-4.4.2
Solving environment: done

In [2]:
df = pd.read_csv('all_india_PO_list_without_APS_offices_ver2_lat_long.csv')
df.head()

Unnamed: 0,officename,pincode,officeType,Deliverystatus,divisionname,regionname,circlename,Taluk,Districtname,statename,Telephone,Related Suboffice,Related Headoffice,longitude,latitude
0,Achalapur B.O,504273,B.O,Delivery,Adilabad,Hyderabad,Andhra Pradesh,Asifabad,Adilabad,TELANGANA,,Rechini S.O,Mancherial H.O,,
1,Ada B.O,504293,B.O,Delivery,Adilabad,Hyderabad,Andhra Pradesh,Asifabad,Adilabad,TELANGANA,,Asifabad S.O,Mancherial H.O,,
2,Adegaon B.O,504307,B.O,Delivery,Adilabad,Hyderabad,Andhra Pradesh,Boath,Adilabad,TELANGANA,,Echoda S.O,Adilabad H.O,,
3,Adilabad Collectorate S.O,504001,S.O,Non-Delivery,Adilabad,Hyderabad,Andhra Pradesh,Adilabad,Adilabad,TELANGANA,08732-226703,,Adilabad H.O,,
4,Adilabad H.O,504001,H.O,Delivery,Adilabad,Hyderabad,Andhra Pradesh,Adilabad,Adilabad,TELANGANA,08732-226738,,,,


In [3]:
df = df[df['regionname'].isin(['Delhi','Mumbai'])].reset_index(drop = True)
df.head()

Unnamed: 0,officename,pincode,officeType,Deliverystatus,divisionname,regionname,circlename,Taluk,Districtname,statename,Telephone,Related Suboffice,Related Headoffice,longitude,latitude
0,Anand Vihar S.O,110092,S.O,Non-Delivery,Delhi East,Delhi,Delhi,,East Delhi,DELHI,011-22157472,,Krishna Nagar H.O,,
1,Azad Nagar S.O (East Delhi),110051,S.O,Non-Delivery,Delhi East,Delhi,Delhi,,East Delhi,DELHI,011-22093521,,Krishna Nagar H.O,,
2,Babarpur S.O (North East Delhi),110032,S.O,Non-Delivery,Delhi East,Delhi,Delhi,,North East Delhi,DELHI,011-22829634,,Jhilmil H.O,,
3,Badarpur Khadar B.O,110090,B.O,Delivery,Delhi East,Delhi,Delhi,East Delhi,East Delhi,DELHI,,Karawal Nagar S.O,Jhilmil H.O,,
4,Balbir Nagar S.O,110032,S.O,Non-Delivery,Delhi East,Delhi,Delhi,,East Delhi,DELHI,011-22320223,,Jhilmil H.O,,


In [7]:
df.shape

(1668, 15)

In [4]:
df_neighborhood = df[['officename','pincode','regionname']]
df_neighborhood.head()

Unnamed: 0,officename,pincode,regionname
0,Anand Vihar S.O,110092,Delhi
1,Azad Nagar S.O (East Delhi),110051,Delhi
2,Babarpur S.O (North East Delhi),110032,Delhi
3,Badarpur Khadar B.O,110090,Delhi
4,Balbir Nagar S.O,110032,Delhi


In [5]:
df_neighborhood = df_neighborhood.rename(columns = {"officename": "Neighborhood"}) 
df_neighborhood.head()

Unnamed: 0,Neighborhood,pincode,regionname
0,Anand Vihar S.O,110092,Delhi
1,Azad Nagar S.O (East Delhi),110051,Delhi
2,Babarpur S.O (North East Delhi),110032,Delhi
3,Badarpur Khadar B.O,110090,Delhi
4,Balbir Nagar S.O,110032,Delhi


In [6]:
df_neighborhood_clean = df_neighborhood.groupby(['pincode','regionname']).agg(lambda x: ", ".join(x.astype(str))).reset_index()
df_neighborhood_clean.head()

Unnamed: 0,pincode,regionname,Neighborhood
0,110001,Delhi,"Baroda House S.O, Bengali Market S.O, Bhagat S..."
1,110002,Delhi,"A.G.C.R. S.O, Ajmeri Gate Extn. S.O, Darya Gan..."
2,110003,Delhi,"Delhi High Court Extension Counter S.O, Delhi ..."
3,110004,Delhi,Rashtrapati Bhawan S.O
4,110005,Delhi,"Anand Parbat Indl. Area S.O, Anand Parbat S.O,..."


In [7]:
!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim

Solving environment: done


  current version: 4.5.11
  latest version: 4.8.0

Please update conda by running

    $ conda update -n base -c defaults conda



# All requested packages already installed.



In [8]:
!pip install pgeocode
import pgeocode
import urllib.request

Collecting pgeocode
  Downloading https://files.pythonhosted.org/packages/45/12/c02be61e117d19a43b3d2b804311eedf49c0158f446d5b0d52f259c4b0fb/pgeocode-0.1.2-py2.py3-none-any.whl
Installing collected packages: pgeocode
Successfully installed pgeocode-0.1.2


In [9]:
# Use pgeocode to get latitude & longitude details for different postal codes 

nomi = pgeocode.Nominatim('in')
#nomi.query_postal_code("110001")
df_lat_long = nomi.query_postal_code(df_neighborhood_clean["pincode"].astype(str).tolist())
#df_lat_long.to_csv('delhi_mumbai_lat_long.csv')
df_lat_long.head()

Unnamed: 0,postal_code,country code,place_name,state_name,state_code,county_name,county_code,community_name,community_code,latitude,longitude,accuracy
0,110001,IN,"New Delhi G.P.O., Parliament House, Connaught ...",Delhi,7,New Delhi,94.0,New Delhi,,28.6369,77.218229,3
1,110002,IN,"Civic Centre, Darya Ganj, I.P.Estate, Ajmeri G...",Delhi,7,New Delhi,94.0,New Delhi Central,,28.6453,77.2456,3
2,110003,IN,"Delhi High Court, Pandara Road, Delhi High Cou...",Delhi,7,Central Delhi,95.0,New Delhi,,28.5947,77.22527,3
3,110004,IN,Rashtrapati Bhawan,Delhi,7,Central Delhi,95.0,New Delhi,,28.6453,77.2128,1
4,110005,IN,"Desh Bandhu Gupta Road, Karol Bagh, Guru Gobin...",Delhi,7,Central Delhi,95.0,New Delhi,,28.6551,77.188775,3


In [10]:
df_lat_long = df_lat_long.rename(columns = {"postal_code": "pincode"}) 
df_lat_long.head()

Unnamed: 0,pincode,country code,place_name,state_name,state_code,county_name,county_code,community_name,community_code,latitude,longitude,accuracy
0,110001,IN,"New Delhi G.P.O., Parliament House, Connaught ...",Delhi,7,New Delhi,94.0,New Delhi,,28.6369,77.218229,3
1,110002,IN,"Civic Centre, Darya Ganj, I.P.Estate, Ajmeri G...",Delhi,7,New Delhi,94.0,New Delhi Central,,28.6453,77.2456,3
2,110003,IN,"Delhi High Court, Pandara Road, Delhi High Cou...",Delhi,7,Central Delhi,95.0,New Delhi,,28.5947,77.22527,3
3,110004,IN,Rashtrapati Bhawan,Delhi,7,Central Delhi,95.0,New Delhi,,28.6453,77.2128,1
4,110005,IN,"Desh Bandhu Gupta Road, Karol Bagh, Guru Gobin...",Delhi,7,Central Delhi,95.0,New Delhi,,28.6551,77.188775,3


In [11]:

df_lat_long = df_lat_long[['pincode', 'latitude','longitude']].copy()  # copy so the later dtype change does not modify a view
df_lat_long.head()

Unnamed: 0,pincode,latitude,longitude
0,110001,28.6369,77.218229
1,110002,28.6453,77.2456
2,110003,28.5947,77.22527
3,110004,28.6453,77.2128
4,110005,28.6551,77.188775


In [16]:
df_lat_long.dtypes


pincode       object
latitude     float64
longitude    float64
dtype: object

In [12]:
df_lat_long.pincode = df_lat_long.pincode.astype(int)

In [13]:
df_neighborhood_latlong = pd.merge(df_neighborhood_clean, df_lat_long, on='pincode')

In [14]:
df_neighborhood_latlong.head()

Unnamed: 0,pincode,regionname,Neighborhood,latitude,longitude
0,110001,Delhi,"Baroda House S.O, Bengali Market S.O, Bhagat S...",28.6369,77.218229
1,110002,Delhi,"A.G.C.R. S.O, Ajmeri Gate Extn. S.O, Darya Gan...",28.6453,77.2456
2,110003,Delhi,"Delhi High Court Extension Counter S.O, Delhi ...",28.5947,77.22527
3,110004,Delhi,Rashtrapati Bhawan S.O,28.6453,77.2128
4,110005,Delhi,"Anand Parbat Indl. Area S.O, Anand Parbat S.O,...",28.6551,77.188775
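The dtype check above shows that pgeocode returns `pincode` as strings (`object`), while the pincodes from the CSV are integers, which is why the notebook casts the column before merging. A small sketch of that alignment on hypothetical frames, using `indicator=True` to surface any pincode that pgeocode could not resolve:

```python
import pandas as pd

# Hypothetical stand-ins: integer pincodes from the CSV, string pincodes from pgeocode.
neighborhoods = pd.DataFrame({"pincode": [110001, 400001],
                              "Neighborhood": ["Janpath S.O", "Bazargate S.O"]})
coords = pd.DataFrame({"pincode": ["110001", "400001"],
                       "latitude": [28.6369, 18.9474],
                       "longitude": [77.218229, 72.8138]})

# Align the key dtypes first; pandas refuses to merge object keys against int64 keys.
coords["pincode"] = coords["pincode"].astype(int)

merged = pd.merge(neighborhoods, coords, on="pincode", how="left",
                  validate="one_to_one", indicator=True)

# Rows tagged 'left_only' would be pincodes with no coordinates.
unmatched = merged[merged["_merge"] == "left_only"]
print(merged.drop(columns="_merge"))
print("unmatched pincodes:", len(unmatched))
```

A left merge with an indicator makes dropped pincodes visible, whereas the default inner merge used in the notebook silently discards them.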


In [20]:
df_neighborhood_latlong.shape

(335, 5)

In [21]:
df_neighborhood_latlong.isna().sum()

pincode         0
regionname      0
Neighborhood    0
latitude        0
longitude       0
dtype: int64

In [15]:
df_mumbai = df_neighborhood_latlong[df_neighborhood_latlong['regionname']=='Mumbai'].reset_index(drop = True)
df_mumbai.head()

Unnamed: 0,pincode,regionname,Neighborhood,latitude,longitude
0,400001,Mumbai,"Bazargate S.O, Elephanta Caves Po B.O, M.P.T. ...",18.9474,72.8138
1,400002,Mumbai,"Kalbadevi H.O, Ramwadi S.O, S. C. Court S.O, T...",18.975,72.8258
2,400003,Mumbai,"B.P.Lane S.O, Mandvi S.O (Mumbai), Masjid S.O,...",18.95,72.8333
3,400004,Mumbai,"Ambewadi S.O (Mumbai), Charni Road S.O, Chaupa...",18.95,72.8167
4,400005,Mumbai,"Asvini S.O, Colaba Bazar S.O, Colaba S.O, Holi...",18.9069,72.8106


In [23]:
df_mumbai.shape

(240, 5)

In [16]:
df_delhi = df_neighborhood_latlong[df_neighborhood_latlong['regionname']=='Delhi'].reset_index(drop = True)
df_delhi.head()

Unnamed: 0,pincode,regionname,Neighborhood,latitude,longitude
0,110001,Delhi,"Baroda House S.O, Bengali Market S.O, Bhagat S...",28.6369,77.218229
1,110002,Delhi,"A.G.C.R. S.O, Ajmeri Gate Extn. S.O, Darya Gan...",28.6453,77.2456
2,110003,Delhi,"Delhi High Court Extension Counter S.O, Delhi ...",28.5947,77.22527
3,110004,Delhi,Rashtrapati Bhawan S.O,28.6453,77.2128
4,110005,Delhi,"Anand Parbat Indl. Area S.O, Anand Parbat S.O,...",28.6551,77.188775


In [25]:
df_delhi.shape

(95, 5)

In [17]:
address = 'Mumbai, Maharashtra'

# Nominatim expects an identifying user_agent (any descriptive string); newer geopy releases raise an error without one
geolocator = Nominatim(user_agent='mumbai_delhi_explorer')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Mumbai are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Mumbai are 18.9387711, 72.8353355.


In [18]:
map_Mumbai = folium.Map(location=[latitude, longitude], zoom_start=10)

folium.CircleMarker([latitude, longitude], radius=10, popup='Mumbai', color='Red', fill=True, fill_color='Red', fill_opacity=0.9,).add_to(map_Mumbai)

for lat, lng, label in zip(df_mumbai['latitude'], df_mumbai['longitude'], df_mumbai['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        ).add_to(map_Mumbai)  
    
map_Mumbai

In [47]:
address_delhi = 'Delhi, Delhi'

# Nominatim expects an identifying user_agent (any descriptive string); newer geopy releases raise an error without one
geolocator = Nominatim(user_agent='mumbai_delhi_explorer')
location_delhi = geolocator.geocode(address_delhi)
latitude_delhi = location_delhi.latitude
longitude_delhi = location_delhi.longitude
print('The geographical coordinates of Delhi are {}, {}.'.format(latitude_delhi, longitude_delhi))

The geographical coordinates of Delhi are 28.6517178, 77.2219388.


In [48]:
map_Delhi = folium.Map(location=[latitude_delhi, longitude_delhi], zoom_start=10)

folium.CircleMarker([latitude_delhi, longitude_delhi], radius=10, popup='Delhi', color='Red', fill=True, fill_color='Red', fill_opacity=0.9,).add_to(map_Delhi)

for lat, lng, label in zip(df_delhi['latitude'], df_delhi['longitude'], df_delhi['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        ).add_to(map_Delhi)  
    
map_Delhi

In [21]:
# Foursquare credentials: replace the placeholders with your own keys, and avoid printing secrets
CLIENT_ID = 'YOUR_CLIENT_ID'
CLIENT_SECRET = 'YOUR_CLIENT_SECRET'
VERSION = '20180901' # API version date passed as the v parameter


In [22]:
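def getNearbyVenues(names, latitudes, longitudes, radius=500):
    # Query Foursquare's explore endpoint around each neighborhood centre.
    # LIMIT is read from a global variable, which must be defined before this function is called.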
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
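The list comprehension inside `getNearbyVenues` assumes every venue has at least one category, so `v['venue']['categories'][0]` would raise `IndexError` on a venue with an empty `categories` list. Below is a minimal sketch of a more defensive parser. The helper name `flatten_explore_items` is hypothetical, and the sample payload only mimics the response shape the notebook already relies on (`response['groups'][0]['items']`); its values are invented.

```python
def flatten_explore_items(name, lat, lng, items):
    """Turn the 'items' list of one explore response into venue tuples.

    Mirrors the tuple layout built in getNearbyVenues, but skips venues whose
    'categories' list is missing or empty instead of raising IndexError.
    """
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        if not categories:
            continue  # nothing to count in the category one-hot step
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"],
                     categories[0]["name"]))
    return rows


# Hypothetical payload shaped like response['groups'][0]['items'].
sample_items = [
    {"venue": {"name": "Celejor", "location": {"lat": 18.975844, "lng": 72.823679},
               "categories": [{"name": "Bakery"}]}},
    {"venue": {"name": "Unlabelled Kiosk", "location": {"lat": 18.9751, "lng": 72.8259},
               "categories": []}},
]
rows = flatten_explore_items("Kalbadevi H.O", 18.975, 72.8258, sample_items)
print(rows)
```

Inside `getNearbyVenues`, the call would replace the comprehension as `venues_list.append(flatten_explore_items(name, lat, lng, results))`.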

In [23]:
radius = 500
LIMIT = 100 # retrieve at most 100 venues per neighborhood
Mumbai_venues = getNearbyVenues(names = df_mumbai['Neighborhood'], latitudes = df_mumbai['latitude'], longitudes = df_mumbai['longitude'], radius = radius)

Bazargate S.O, Elephanta Caves Po B.O, M.P.T. S.O, Stock Exchange S.O, Tajmahal S.O, Town Hall S.O (Mumbai), Mumbai G.P.O. 
Kalbadevi H.O, Ramwadi S.O, S. C. Court S.O, Thakurdwar S.O
B.P.Lane S.O, Mandvi S.O (Mumbai), Masjid S.O, Null Bazar S.O
Ambewadi S.O (Mumbai), Charni Road S.O, Chaupati S.O, Girgaon S.O, Madhavbaug S.O, Opera House S.O
Asvini S.O, Colaba Bazar S.O, Colaba S.O, Holiday Camp S.O, V.W.T.C. S.O
Malabar Hill S.O
Bharat Nagar S.O (Mumbai), Grant Road S.O, N.S.Patkar Marg S.O, S V Marg S.O, Tardeo S.O
Falkland Road S.O, J.J.Hospital S.O, Kamathipura S.O, M A Marg S.O, Mumbai Central H.O
Chinchbunder H.O, Noor Baug S.O, Princess Dock S.O
Dockyard Road S.O, Mazgaon Dock S.O, Mazgaon Road S.O, Mazgaon S.O, V K Bhavan S.O
Agripada S.O, BPC  Jacob Circle S.O, Chinchpokli S.O, Haines Road S.O, Jacob Circle S.O
BEST STaff Quarters S.O, Chamarbaug S.O, Haffkin Institute S.O, Lal Baug S.O, Parel Naka S.O, Parel Rly Work Shop S.O, Parel S.O
Delisle Road S.O
Dadar Colony S.O, Dad

In [33]:
Mumbai_venues.shape

(575, 7)

In [34]:
Mumbai_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Bazargate S.O, Elephanta Caves Po B.O, M.P.T. ...",18.9474,72.8138,H2O And Salt Water Grill,18.950535,72.816868,Harbor / Marina
1,"Bazargate S.O, Elephanta Caves Po B.O, M.P.T. ...",18.9474,72.8138,Queens Necklace,18.945443,72.817872,Harbor / Marina
2,"Kalbadevi H.O, Ramwadi S.O, S. C. Court S.O, T...",18.975,72.8258,Celejor,18.975844,72.823679,Bakery
3,"Kalbadevi H.O, Ramwadi S.O, S. C. Court S.O, T...",18.975,72.8258,cafe coffee day,18.976988,72.824051,Coffee Shop
4,"Kalbadevi H.O, Ramwadi S.O, S. C. Court S.O, T...",18.975,72.8258,HDFC Bank,18.973795,72.822895,Bank
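As a sanity check on the 500 m search radius, we can measure how far each returned venue lies from its neighborhood centre. This sketch uses the haversine great-circle formula with an assumed mean Earth radius of 6,371 km; the coordinates are the first neighborhood and its two venues from the `Mumbai_venues.head()` output above.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2, radius_m=6_371_000):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))

centre = (18.9474, 72.8138)  # Bazargate S.O, ... neighborhood centre
venues = {"H2O And Salt Water Grill": (18.950535, 72.816868),
          "Queens Necklace": (18.945443, 72.817872)}
for name, (vlat, vlng) in venues.items():
    print(f"{name}: {haversine_m(*centre, vlat, vlng):.0f} m")
```

Both venues come out at roughly 475 to 480 m, inside the requested radius. The same function can be applied row-wise to `Mumbai_venues` with `DataFrame.apply(..., axis=1)` to check every venue.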


In [35]:
Mumbai_venues.isna().sum()

Neighborhood              0
Neighborhood Latitude     0
Neighborhood Longitude    0
Venue                     0
Venue Latitude            0
Venue Longitude           0
Venue Category            0
dtype: int64

In [24]:
Mumbai_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"A I Staff Colony S.O, Santacruz P&t Colony S.O",7,7,7,7,7,7
"Aareymilk Colony S.O, Nagari Niwara S.O, S R P F Camp S.O",5,5,5,5,5,5
"Agashi S.O, Chikhal Dongre B.O, Kophrad B.O, Vatar B.O",1,1,1,1,1,1
"Agripada S.O, BPC Jacob Circle S.O, Chinchpokli S.O, Haines Road S.O, Jacob Circle S.O",4,4,4,4,4,4
"Airoli B.O, Airoli S.O",8,8,8,8,8,8
...,...,...,...,...,...,...
V J B Udyan S.O,7,7,7,7,7,7
Virar East S.O,1,1,1,1,1,1
Vishnunagar S.O,1,1,1,1,1,1
Wagle I.E. S.O,1,1,1,1,1,1


In [36]:
print('There are {} unique categories.'.format(len(Mumbai_venues['Venue Category'].unique())))

There are 115 unique categories.


In [25]:
# one hot encoding
mumbai_onehot = pd.get_dummies(Mumbai_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
mumbai_onehot['Neighborhood'] = Mumbai_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [mumbai_onehot.columns[-1]] + list(mumbai_onehot.columns[:-1])
mumbai_onehot = mumbai_onehot[fixed_columns]

mumbai_onehot.head()

Unnamed: 0,Neighborhood,ATM,Airport Lounge,American Restaurant,Aquarium,Asian Restaurant,Astrologer,Athletics & Sports,Auto Workshop,Bakery,...,Supermarket,Tea Room,Thai Restaurant,Theater,Toy / Game Store,Train,Train Station,Vegetarian / Vegan Restaurant,Wine Bar,Yoga Studio
0,"Bazargate S.O, Elephanta Caves Po B.O, M.P.T. ...",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Bazargate S.O, Elephanta Caves Po B.O, M.P.T. ...",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Kalbadevi H.O, Ramwadi S.O, S. C. Court S.O, T...",0,0,0,0,0,0,0,0,1,...,0,0,0,0,0,0,0,0,0,0
3,"Kalbadevi H.O, Ramwadi S.O, S. C. Court S.O, T...",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Kalbadevi H.O, Ramwadi S.O, S. C. Court S.O, T...",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [39]:
mumbai_onehot.isna().sum().sum()

0

In [26]:
mumbai_grouped = mumbai_onehot.groupby('Neighborhood').mean().reset_index()
mumbai_grouped.head()

Unnamed: 0,Neighborhood,ATM,Airport Lounge,American Restaurant,Aquarium,Asian Restaurant,Astrologer,Athletics & Sports,Auto Workshop,Bakery,...,Supermarket,Tea Room,Thai Restaurant,Theater,Toy / Game Store,Train,Train Station,Vegetarian / Vegan Restaurant,Wine Bar,Yoga Studio
0,"A I Staff Colony S.O, Santacruz P&t Colony S.O",0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,...,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Aareymilk Colony S.O, Nagari Niwara S.O, S R P...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Agashi S.O, Chikhal Dongre B.O, Kophrad B.O, V...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Agripada S.O, BPC Jacob Circle S.O, Chinchpok...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Airoli B.O, Airoli S.O",0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0
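The one-hot encoding followed by `groupby(...).mean()` turns venue counts into the share of each neighborhood's venues that fall in each category, so every row of the grouped table sums to 1. A toy illustration with three hypothetical venues:

```python
import pandas as pd

# Three hypothetical venues spread over two neighborhoods.
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "B"],
    "Venue Category": ["Café", "Bakery", "Café"],
})

# One 0/1 column per category, named after the category itself.
onehot = pd.get_dummies(venues[["Venue Category"]], prefix="", prefix_sep="", dtype=int)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# Mean of the 0/1 indicators per neighborhood = fraction of its venues in each category.
freq = onehot.groupby("Neighborhood").mean()
print(freq)
```

Neighborhood A gets 0.5 for both Bakery and Café, and B gets 1.0 for Café. This is why a neighborhood with a single venue (such as the Agashi row, with a count of 1) has exactly one non-zero column.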


In [27]:
radius = 500
LIMIT = 100 # retrieve at most 100 venues per neighborhood
Delhi_venues = getNearbyVenues(names = df_delhi['Neighborhood'], latitudes = df_delhi['latitude'], longitudes = df_delhi['longitude'], radius = radius)

Baroda House S.O, Bengali Market S.O, Bhagat Singh Market S.O, Connaught Place S.O, Constitution House S.O, Election Commission S.O, Janpath S.O, Krishi Bhawan S.O, Lady Harding Medical College S.O, North Avenue S.O, Parliament House S.O, Patiala House S.O, Pragati Maidan Camp S.O, Pragati Maidan S.O, Rail Bhawan S.O, Sansad Marg H.O, Sansadiya Soudh S.O, Secretariat North S.O, Shastri Bhawan S.O, Supreme Court S.O, New Delhi G.P.O. 
A.G.C.R. S.O, Ajmeri Gate Extn. S.O, Darya Ganj S.O, Gandhi Smarak Nidhi S.O, I.P.Estate S.O, Indraprastha H.O, Minto Road S.O
Delhi High Court Extension Counter S.O, Delhi High Court S.O, Pandara Road S.O, Aliganj S.O (South Delhi), C G O Complex S.O, Golf Links S.O, Kasturba Nagar S.O (South Delhi), Lodi Road H.O, Pragati Vihar S.O, Safdarjung Air Port S.O
Rashtrapati Bhawan S.O
Anand Parbat Indl. Area S.O, Anand Parbat S.O, Bank Street S.O (Central Delhi), Desh Bandhu Gupta Road S.O, Guru Gobind Singh Marg S.O, Karol Bagh S.O, Master Prithvi Nath Marg S

In [42]:
Delhi_venues.shape

(474, 7)

In [43]:
Delhi_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Baroda House S.O, Bengali Market S.O, Bhagat S...",28.6369,77.218229,Sagar Ratna,28.635487,77.22065,Indian Restaurant
1,"Baroda House S.O, Bengali Market S.O, Bhagat S...",28.6369,77.218229,Nizam's Kathi Kabab | निजा़म काठी कबाब,28.634858,77.219462,Indian Restaurant
2,"Baroda House S.O, Bengali Market S.O, Bhagat S...",28.6369,77.218229,Wenger's,28.633412,77.218292,Bakery
3,"Baroda House S.O, Bengali Market S.O, Bhagat S...",28.6369,77.218229,Route 04,28.63489,77.220225,Bar
4,"Baroda House S.O, Bengali Market S.O, Bhagat S...",28.6369,77.218229,Khan Chacha | खान चाचा | خان چاچا,28.634202,77.22078,Indian Restaurant


In [44]:
Delhi_venues.isna().sum()

Neighborhood              0
Neighborhood Latitude     0
Neighborhood Longitude    0
Venue                     0
Venue Latitude            0
Venue Longitude           0
Venue Category            0
dtype: int64

In [45]:
Delhi_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"A F Rajokari S.O, Rajokari B.O",2,2,2,2,2,2
"A.G.C.R. S.O, Ajmeri Gate Extn. S.O, Darya Ganj S.O, Gandhi Smarak Nidhi S.O, I.P.Estate S.O, Indraprastha H.O, Minto Road S.O",1,1,1,1,1,1
"A.K.Market S.O, Multani Dhanda S.O, Pahar Ganj S.O, Swami Ram Tirth Nagar S.O",30,30,30,30,30,30
"Abul Fazal Enclave-I S.O, Jamia Nagar S.O, New Friends Colony S.O, Sukhdev Vihar S.O, Zakir Nagar S.O",3,3,3,3,3,3
"Adrash Nagar S.O, Bhalaswa B.O, Jahangir Puri A Block S.O, Jahangir Puri D Block S.O, Jahangir Puri H Block S.O, N.S.Mandi S.O",2,2,2,2,2,2
...,...,...,...,...,...,...
"R K Puram (Main) S.O, R K Puram West S.O",4,4,4,4,4,4
"R K Puram Sect-1 S.O, R K Puram Sect-12 S.O, R K Puram Sect-3 S.O, R K Puram Sect-4 S.O, R K Puram Sect-5 S.O, R K Puram Sect7 S.O, R K Puram Sect-8 S.O, R K Puram Sector - 6 Postal SB S.O",7,7,7,7,7,7
Rajender Nagar S.O,4,4,4,4,4,4
Rashtrapati Bhawan S.O,39,39,39,39,39,39


In [39]:
print('There are {} unique categories.'.format(len(Delhi_venues['Venue Category'].unique())))

There are 111 unique categories.


In [28]:
# one hot encoding
delhi_onehot = pd.get_dummies(Delhi_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
delhi_onehot['Neighborhood'] = Delhi_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [delhi_onehot.columns[-1]] + list(delhi_onehot.columns[:-1])
delhi_onehot = delhi_onehot[fixed_columns]

delhi_onehot.head()

Unnamed: 0,Neighborhood,ATM,Airport,Airport Service,American Restaurant,Arcade,Art Gallery,Asian Restaurant,Athletics & Sports,Australian Restaurant,...,Tea Room,Tex-Mex Restaurant,Thai Restaurant,Tibetan Restaurant,Toy / Game Store,Trail,Train Station,Turkish Restaurant,Wine Bar,Women's Store
0,"Baroda House S.O, Bengali Market S.O, Bhagat S...",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Baroda House S.O, Bengali Market S.O, Bhagat S...",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Baroda House S.O, Bengali Market S.O, Bhagat S...",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Baroda House S.O, Bengali Market S.O, Bhagat S...",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Baroda House S.O, Bengali Market S.O, Bhagat S...",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [29]:
delhi_grouped = delhi_onehot.groupby('Neighborhood').mean().reset_index()
delhi_grouped.head()

Unnamed: 0,Neighborhood,ATM,Airport,Airport Service,American Restaurant,Arcade,Art Gallery,Asian Restaurant,Athletics & Sports,Australian Restaurant,...,Tea Room,Tex-Mex Restaurant,Thai Restaurant,Tibetan Restaurant,Toy / Game Store,Trail,Train Station,Turkish Restaurant,Wine Bar,Women's Store
0,"A F Rajokari S.O, Rajokari B.O",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"A.G.C.R. S.O, Ajmeri Gate Extn. S.O, Darya Gan...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"A.K.Market S.O, Multani Dhanda S.O, Pahar Ganj...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0
3,"Abul Fazal Enclave-I S.O, Jamia Nagar S.O, New...",0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Adrash Nagar S.O, Bhalaswa B.O, Jahangir Puri ...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [30]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
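`return_most_common_venues` always returns `num_top_venues` names, even when a neighborhood has fewer distinct categories. The remaining slots are then filled with zero-frequency categories in an arbitrary order. For example, the Agashi row has only one venue, yet its table row lists ten "most common" venues. A possible alternative, sketched below, ranks only non-zero categories, breaks ties alphabetically, and pads with `None`. The helper name `top_nonzero_venues` is hypothetical.

```python
import pandas as pd

def top_nonzero_venues(row, num_top_venues):
    """Return num_top_venues category names ranked by frequency, padded with None.

    `row` is one record of a grouped frequency table: 'Neighborhood' first,
    then one column per venue category (the layout of mumbai_grouped).
    Categories with zero frequency are never ranked.
    """
    freqs = row.iloc[1:].astype(float)
    freqs = freqs[freqs > 0]
    # Descending frequency; ties broken alphabetically so the result is deterministic
    ranked = sorted(freqs.items(), key=lambda kv: (-kv[1], kv[0]))
    names = [name for name, _ in ranked[:num_top_venues]]
    # Pad so every row still fills the fixed "Nth Most Common Venue" columns
    return names + [None] * (num_top_venues - len(names))

sparse = pd.Series({"Neighborhood": "X", "Bakery": 0.0, "Dhaba": 1.0, "Farm": 0.0})
tied = pd.Series({"Neighborhood": "Y", "Café": 0.5, "Bakery": 0.5, "Dhaba": 0.0})
print(top_nonzero_venues(sparse, 3))  # ['Dhaba', None, None]
print(top_nonzero_venues(tied, 3))    # ['Bakery', 'Café', None]
```

If this helper were used in the ranking loops, `isna().sum()` on the ranked tables would report how many neighborhoods have fewer than ten distinct venue categories, instead of reporting 0.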

In [31]:
num_top_venues = 10
indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions after the 3rd take the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))
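The `indicators` list handles 1st to 3rd and falls back to "th", which is correct for the ten columns used here. For more than 20 columns, however, it would produce labels such as "21th". A general English ordinal helper (hypothetical name `ordinal`) that could build the same column names:

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 10th to 20th (and 110th to 120th, ...) are irregular
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

num_top_venues = 10
columns = ["Neighborhood"] + [f"{ordinal(i)} Most Common Venue"
                              for i in range(1, num_top_venues + 1)]
print(columns)
```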

In [32]:
# create a new dataframe for Mumbai top 10 venues
neighborhoods_venues_sorted_mumbai = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted_mumbai['Neighborhood'] = mumbai_grouped['Neighborhood']

for ind in np.arange(mumbai_grouped.shape[0]):
    neighborhoods_venues_sorted_mumbai.iloc[ind, 1:] = return_most_common_venues(mumbai_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted_mumbai.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"A I Staff Colony S.O, Santacruz P&t Colony S.O",Café,Coffee Shop,Modern European Restaurant,Asian Restaurant,Indian Restaurant,Tea Room,Spa,Yoga Studio,Farmers Market,Farm
1,"Aareymilk Colony S.O, Nagari Niwara S.O, S R P...",Smoke Shop,Bookstore,Indian Restaurant,Bakery,Fast Food Restaurant,Department Store,Dessert Shop,Dhaba,Diner,Dumpling Restaurant
2,"Agashi S.O, Chikhal Dongre B.O, Kophrad B.O, V...",Dhaba,Yoga Studio,Convenience Store,Deli / Bodega,Department Store,Dessert Shop,Diner,Dumpling Restaurant,Electronics Store,Farm
3,"Agripada S.O, BPC Jacob Circle S.O, Chinchpok...",History Museum,Multiplex,Fast Food Restaurant,Snack Place,Fish & Chips Shop,Cosmetics Shop,Deli / Bodega,Department Store,Dessert Shop,Dhaba
4,"Airoli B.O, Airoli S.O",Pizza Place,Toy / Game Store,Café,Dumpling Restaurant,Asian Restaurant,Fast Food Restaurant,Gym,Hotel Bar,Dessert Shop,Fish & Chips Shop


In [50]:
neighborhoods_venues_sorted_mumbai.shape

(106, 11)

In [51]:
neighborhoods_venues_sorted_mumbai.isna().sum()

Neighborhood              0
1st Most Common Venue     0
2nd Most Common Venue     0
3rd Most Common Venue     0
4th Most Common Venue     0
5th Most Common Venue     0
6th Most Common Venue     0
7th Most Common Venue     0
8th Most Common Venue     0
9th Most Common Venue     0
10th Most Common Venue    0
dtype: int64

In [33]:
# create a new dataframe for Delhi top 10 venues
neighborhoods_venues_sorted_delhi = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted_delhi['Neighborhood'] = delhi_grouped['Neighborhood']

for ind in np.arange(delhi_grouped.shape[0]):
    neighborhoods_venues_sorted_delhi.iloc[ind, 1:] = return_most_common_venues(delhi_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted_delhi.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"A F Rajokari S.O, Rajokari B.O",Shoe Store,Mobile Phone Shop,Women's Store,Deli / Bodega,Dessert Shop,Donut Shop,Electronics Store,Fabric Shop,Falafel Restaurant,Fast Food Restaurant
1,"A.G.C.R. S.O, Ajmeri Gate Extn. S.O, Darya Gan...",Food & Drink Shop,Women's Store,Garden,Dessert Shop,Donut Shop,Electronics Store,Fabric Shop,Falafel Restaurant,Fast Food Restaurant,Flea Market
2,"A.K.Market S.O, Multani Dhanda S.O, Pahar Ganj...",Hotel,Indian Restaurant,Café,Indian Chinese Restaurant,Fast Food Restaurant,Restaurant,Breakfast Spot,Food,Snack Place,Bar
3,"Abul Fazal Enclave-I S.O, Jamia Nagar S.O, New...",Indian Restaurant,Airport Service,Hotel,Garden,Dessert Shop,Donut Shop,Electronics Store,Fabric Shop,Falafel Restaurant,Fast Food Restaurant
4,"Adrash Nagar S.O, Bhalaswa B.O, Jahangir Puri ...",Shoe Store,Mobile Phone Shop,Women's Store,Deli / Bodega,Dessert Shop,Donut Shop,Electronics Store,Fabric Shop,Falafel Restaurant,Fast Food Restaurant


In [58]:
neighborhoods_venues_sorted_delhi.isna().sum()

Neighborhood              0
1st Most Common Venue     0
2nd Most Common Venue     0
3rd Most Common Venue     0
4th Most Common Venue     0
5th Most Common Venue     0
6th Most Common Venue     0
7th Most Common Venue     0
8th Most Common Venue     0
9th Most Common Venue     0
10th Most Common Venue    0
dtype: int64

# Cluster Mumbai Neighborhoods

In [34]:
from sklearn.cluster import KMeans

In [35]:
# set number of clusters
kclusters = 5

mumbai_grouped_clustering = mumbai_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(mumbai_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 1, 1, 1, 1, 1, 1, 1, 0, 1], dtype=int32)
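The number of clusters is fixed at `kclusters = 5`, and the first ten labels are almost all cluster 1. One common way to sanity-check k is the elbow method: fit K-means for several values of k and look for the point where the within-cluster sum of squares (`inertia_`) stops dropping sharply. This sketch runs on a random stand-in matrix whose rows sum to 1 like the grouped frequencies; it does not use the real Mumbai data.

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in for mumbai_grouped_clustering: 30 hypothetical neighborhoods x 6 category shares.
rng = np.random.default_rng(0)
X = rng.dirichlet(np.ones(6), size=30)

ks = range(1, 8)
inertias = []
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(km.inertia_)  # within-cluster sum of squared distances

for k, inertia in zip(ks, inertias):
    print(k, round(inertia, 4))
```

To apply the method to the notebook's data, swap `X` for `mumbai_grouped_clustering` and plot `inertias` against `ks`. Setting `n_init` explicitly also keeps results stable across scikit-learn versions whose defaults differ.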

In [61]:
neighborhoods_venues_sorted_mumbai.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"A I Staff Colony S.O, Santacruz P&t Colony S.O",Café,Coffee Shop,Modern European Restaurant,Asian Restaurant,Indian Restaurant,Tea Room,Spa,Yoga Studio,Farmers Market,Farm
1,"Aareymilk Colony S.O, Nagari Niwara S.O, S R P...",Smoke Shop,Bookstore,Indian Restaurant,Bakery,Fast Food Restaurant,Department Store,Dessert Shop,Dhaba,Diner,Dumpling Restaurant
2,"Agashi S.O, Chikhal Dongre B.O, Kophrad B.O, V...",Dhaba,Yoga Studio,Convenience Store,Deli / Bodega,Department Store,Dessert Shop,Diner,Dumpling Restaurant,Electronics Store,Farm
3,"Agripada S.O, BPC Jacob Circle S.O, Chinchpok...",History Museum,Multiplex,Fast Food Restaurant,Snack Place,Fish & Chips Shop,Cosmetics Shop,Deli / Bodega,Department Store,Dessert Shop,Dhaba
4,"Airoli B.O, Airoli S.O",Pizza Place,Toy / Game Store,Café,Dumpling Restaurant,Asian Restaurant,Fast Food Restaurant,Gym,Hotel Bar,Dessert Shop,Fish & Chips Shop
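The next cell attaches `kmeans.labels_` to `neighborhoods_venues_sorted_mumbai` by position, which is only correct if that frame lists the neighborhoods in the same row order as `mumbai_grouped`, the frame the model was fitted on. A quick sanity check, sketched on hypothetical toy frames (the helper name `labels_align` is illustrative):

```python
import numpy as np
import pandas as pd


def labels_align(grouped, venues_sorted, key="Neighborhood"):
    """True if both frames list the same neighborhoods in the same row order."""
    return bool(
        len(grouped) == len(venues_sorted)
        and np.array_equal(grouped[key].to_numpy(), venues_sorted[key].to_numpy())
    )


# Hypothetical miniature frames
grouped = pd.DataFrame({"Neighborhood": ["A", "B", "C"], "Café": [0.5, 0.0, 1.0]})
venues_sorted = pd.DataFrame(
    {"Neighborhood": ["A", "B", "C"], "1st Most Common Venue": ["Café", "Gym", "Café"]}
)
shuffled = venues_sorted.iloc[[2, 0, 1]]  # same rows, different order

print(labels_align(grouped, venues_sorted))  # True
print(labels_align(grouped, shuffled))       # False
```

On the real data the call would be `labels_align(mumbai_grouped, neighborhoods_venues_sorted_mumbai)`.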


In [36]:
# add clustering labels
neighborhoods_venues_sorted_mumbai.insert(0, 'ClusterLabels', kmeans.labels_)

Mumbai_merged = df_mumbai.join(neighborhoods_venues_sorted_mumbai.set_index('Neighborhood'), on='Neighborhood', how='inner')
Mumbai_merged.head()

Unnamed: 0,pincode,regionname,Neighborhood,latitude,longitude,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,400001,Mumbai,"Bazargate S.O, Elephanta Caves Po B.O, M.P.T. ...",18.9474,72.8138,1,Harbor / Marina,Yoga Studio,Flea Market,Deli / Bodega,Department Store,Dessert Shop,Dhaba,Diner,Dumpling Restaurant,Electronics Store
1,400002,Mumbai,"Kalbadevi H.O, Ramwadi S.O, S. C. Court S.O, T...",18.975,72.8258,1,Coffee Shop,Athletics & Sports,Bakery,Bank,Food,Department Store,Dessert Shop,Dhaba,Diner,Dumpling Restaurant
2,400003,Mumbai,"B.P.Lane S.O, Mandvi S.O (Mumbai), Masjid S.O,...",18.95,72.8333,1,Indian Restaurant,Cheese Shop,Middle Eastern Restaurant,Restaurant,Rest Area,Market,Food,Electronics Store,Fast Food Restaurant,Ice Cream Shop
3,400004,Mumbai,"Ambewadi S.O (Mumbai), Charni Road S.O, Chaupa...",18.95,72.8167,1,Harbor / Marina,Indian Restaurant,Train Station,Café,Food Truck,Juice Bar,Japanese Restaurant,Pizza Place,Fast Food Restaurant,Restaurant
4,400005,Mumbai,"Asvini S.O, Colaba Bazar S.O, Colaba S.O, Holi...",18.9069,72.8106,1,Gym,Garden,Bar,Yoga Studio,Flea Market,Deli / Bodega,Department Store,Dessert Shop,Dhaba,Diner
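The merge above uses `DataFrame.join` with `on='Neighborhood'` against a frame indexed by `Neighborhood`, and `how='inner'` silently drops any neighborhood that has no venue data. A minimal sketch on hypothetical miniature frames shows that behaviour:

```python
import pandas as pd

# Hypothetical miniature versions of df_mumbai and neighborhoods_venues_sorted_mumbai
locations = pd.DataFrame({
    "pincode": [400001, 400002, 400003],
    "Neighborhood": ["A", "B", "C"],
    "latitude": [18.94, 18.97, 18.95],
    "longitude": [72.81, 72.82, 72.83],
})
venues = pd.DataFrame({
    "ClusterLabels": [1, 0],
    "Neighborhood": ["A", "C"],
    "1st Most Common Venue": ["Harbor / Marina", "Café"],
})

# join() looks up each locations['Neighborhood'] value in the index of the right frame;
# how='inner' keeps only neighborhoods present in both, so "B" (no venues) is dropped.
merged = locations.join(venues.set_index("Neighborhood"), on="Neighborhood", how="inner")
print(merged)
```

Comparing `len(df_mumbai)` with `len(Mumbai_merged)` is a cheap way to see how many neighborhoods were lost this way.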


In [69]:
Mumbai_merged.isna().sum()

pincode                   0
regionname                0
Neighborhood              0
latitude                  0
longitude                 0
ClusterLabels             0
1st Most Common Venue     0
2nd Most Common Venue     0
3rd Most Common Venue     0
4th Most Common Venue     0
5th Most Common Venue     0
6th Most Common Venue     0
7th Most Common Venue     0
8th Most Common Venue     0
9th Most Common Venue     0
10th Most Common Venue    0
dtype: int64

In [37]:
import folium
import matplotlib.cm as cm
import matplotlib.colors as colors

# centre the map on the mean position of the Mumbai neighborhoods
map_clusters = folium.Map(
    location=[Mumbai_merged['latitude'].mean(), Mumbai_merged['longitude'].mean()],
    zoom_start=10)

# set color scheme for the clusters: one evenly spaced rainbow colour per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add one marker per neighborhood, coloured by its cluster label
for lat, lon, poi, cluster in zip(Mumbai_merged['latitude'], Mumbai_merged['longitude'], Mumbai_merged['Neighborhood'], Mumbai_merged['ClusterLabels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters
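The marker colours are evenly spaced samples from matplotlib's `rainbow` colormap, one per cluster. A small sketch of that mapping in isolation (the helper name `cluster_palette` is hypothetical) shows that indexing the palette by the label itself keeps label 0 on the first colour and label `kclusters - 1` on the last:

```python
import numpy as np
from matplotlib import cm, colors


def cluster_palette(n_clusters):
    """Return one '#rrggbb' colour per cluster label, sampled evenly from the rainbow colormap."""
    rgba = cm.rainbow(np.linspace(0, 1, n_clusters))
    return [colors.rgb2hex(c) for c in rgba]


palette = cluster_palette(5)
for label in range(5):
    print(label, palette[label])  # label 0 -> first colour, label 4 -> last colour
```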

In [71]:
Mumbai_merged.loc[Mumbai_merged['ClusterLabels'] == 0, Mumbai_merged.columns[[1] + list(range(5, Mumbai_merged.shape[1]))]]

Unnamed: 0,regionname,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
89,Mumbai,0,Fish & Chips Shop,Yoga Studio,Convenience Store,Deli / Bodega,Department Store,Dessert Shop,Dhaba,Diner,Dumpling Restaurant,Electronics Store
90,Mumbai,0,Fish & Chips Shop,Yoga Studio,Convenience Store,Deli / Bodega,Department Store,Dessert Shop,Dhaba,Diner,Dumpling Restaurant,Electronics Store
91,Mumbai,0,Fish & Chips Shop,Yoga Studio,Convenience Store,Deli / Bodega,Department Store,Dessert Shop,Dhaba,Diner,Dumpling Restaurant,Electronics Store
92,Mumbai,0,Fish & Chips Shop,Yoga Studio,Convenience Store,Deli / Bodega,Department Store,Dessert Shop,Dhaba,Diner,Dumpling Restaurant,Electronics Store
93,Mumbai,0,Fish & Chips Shop,Yoga Studio,Convenience Store,Deli / Bodega,Department Store,Dessert Shop,Dhaba,Diner,Dumpling Restaurant,Electronics Store
94,Mumbai,0,Fish & Chips Shop,Yoga Studio,Convenience Store,Deli / Bodega,Department Store,Dessert Shop,Dhaba,Diner,Dumpling Restaurant,Electronics Store
95,Mumbai,0,Fish & Chips Shop,Yoga Studio,Convenience Store,Deli / Bodega,Department Store,Dessert Shop,Dhaba,Diner,Dumpling Restaurant,Electronics Store
96,Mumbai,0,Fish & Chips Shop,Yoga Studio,Convenience Store,Deli / Bodega,Department Store,Dessert Shop,Dhaba,Diner,Dumpling Restaurant,Electronics Store
97,Mumbai,0,Fish & Chips Shop,Yoga Studio,Convenience Store,Deli / Bodega,Department Store,Dessert Shop,Dhaba,Diner,Dumpling Restaurant,Electronics Store
98,Mumbai,0,Fish & Chips Shop,Yoga Studio,Convenience Store,Deli / Bodega,Department Store,Dessert Shop,Dhaba,Diner,Dumpling Restaurant,Electronics Store


12 neighborhoods belong to this cluster. The most common venue here is a fish and chips shop, followed by yoga studios, delis and convenience stores. These neighborhoods may appeal to a younger population, since they combine restaurants of different cuisines with yoga studios and everyday necessities such as convenience stores.
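The cluster sizes quoted in this section can be cross-checked directly from the label column instead of being read off the displayed tables, which a notebook export may truncate. A minimal sketch on a hypothetical stand-in for `Mumbai_merged`:

```python
import pandas as pd

# Hypothetical stand-in for Mumbai_merged with just the cluster column
toy_merged = pd.DataFrame({"ClusterLabels": [1, 1, 0, 2, 1, 0, 4, 3, 1]})

# Number of neighborhoods per cluster, ordered by label
cluster_sizes = toy_merged["ClusterLabels"].value_counts().sort_index()
print(cluster_sizes)
```

On the real data, `Mumbai_merged['ClusterLabels'].value_counts().sort_index()` gives the per-cluster counts in one line.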

In [72]:
Mumbai_merged.loc[Mumbai_merged['ClusterLabels'] == 1, Mumbai_merged.columns[[1] + list(range(5, Mumbai_merged.shape[1]))]]

Unnamed: 0,regionname,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Mumbai,1,Harbor / Marina,Yoga Studio,Flea Market,Deli / Bodega,Department Store,Dessert Shop,Dhaba,Diner,Dumpling Restaurant,Electronics Store
1,Mumbai,1,Coffee Shop,Athletics & Sports,Bakery,Bank,Food,Department Store,Dessert Shop,Dhaba,Diner,Dumpling Restaurant
2,Mumbai,1,Indian Restaurant,Cheese Shop,Middle Eastern Restaurant,Restaurant,Rest Area,Market,Food,Electronics Store,Fast Food Restaurant,Ice Cream Shop
3,Mumbai,1,Harbor / Marina,Indian Restaurant,Train Station,Café,Food Truck,Juice Bar,Japanese Restaurant,Pizza Place,Fast Food Restaurant,Restaurant
4,Mumbai,1,Gym,Garden,Bar,Yoga Studio,Flea Market,Deli / Bodega,Department Store,Dessert Shop,Dhaba,Diner
...,...,...,...,...,...,...,...,...,...,...,...,...
213,Mumbai,1,ATM,Plaza,Flea Market,Deli / Bodega,Department Store,Dessert Shop,Dhaba,Diner,Dumpling Restaurant,Electronics Store
216,Mumbai,1,Café,Pizza Place,Fast Food Restaurant,Yoga Studio,Convenience Store,Deli / Bodega,Department Store,Dessert Shop,Dhaba,Diner
217,Mumbai,1,Hotel,Convenience Store,Deli / Bodega,Department Store,Dessert Shop,Dhaba,Diner,Dumpling Restaurant,Electronics Store,Farm
218,Mumbai,1,Hotel,Convenience Store,Deli / Bodega,Department Store,Dessert Shop,Dhaba,Diner,Dumpling Restaurant,Electronics Store,Farm


76 neighborhoods are grouped into this cluster. It contains harbors, gyms, ATMs, cafés and flea markets, as well as a variety of stores and restaurants.

In [73]:
Mumbai_merged.loc[Mumbai_merged['ClusterLabels'] == 2, Mumbai_merged.columns[[1] + list(range(5, Mumbai_merged.shape[1]))]]

Unnamed: 0,regionname,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Mumbai,2,Boat or Ferry,Food,Deli / Bodega,Department Store,Dessert Shop,Dhaba,Diner,Dumpling Restaurant,Electronics Store,Farm
37,Mumbai,2,Boat or Ferry,Food,Deli / Bodega,Department Store,Dessert Shop,Dhaba,Diner,Dumpling Restaurant,Electronics Store,Farm
49,Mumbai,2,Boat or Ferry,Food,Deli / Bodega,Department Store,Dessert Shop,Dhaba,Diner,Dumpling Restaurant,Electronics Store,Farm
63,Mumbai,2,Boat or Ferry,Food,Deli / Bodega,Department Store,Dessert Shop,Dhaba,Diner,Dumpling Restaurant,Electronics Store,Farm
77,Mumbai,2,Boat or Ferry,Food,Deli / Bodega,Department Store,Dessert Shop,Dhaba,Diner,Dumpling Restaurant,Electronics Store,Farm


5 neighbourhoods are grouped into this cluster. The most common venue in this cluster is a boat or ferry, so these neighbourhoods may suit people who rely on boats or ferries for transport.
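In several of these tables, the venues after the first one or two run in alphabetical order (Deli / Bodega, Department Store, Dessert Shop, Dhaba, Diner, ...). That pattern suggests they may be categories tied at zero frequency that merely fill the remaining ranks, so they are probably weak evidence about a neighbourhood. A minimal sketch that keeps only categories with a positive frequency, assuming a frame shaped like `mumbai_grouped` (a `Neighborhood` column plus one mean-frequency column per venue category); the helper `top_nonzero_venues` and the toy rows are hypothetical:

```python
import pandas as pd


def top_nonzero_venues(grouped, n=10):
    """For each neighborhood, list up to n venue categories with a non-zero frequency."""
    result = {}
    freqs = grouped.set_index("Neighborhood")
    for name, row in freqs.iterrows():
        positive = row[row > 0].sort_values(ascending=False)  # drop zero-frequency ties
        result[name] = list(positive.index[:n])
    return result


toy_grouped = pd.DataFrame({
    "Neighborhood": ["Ferry Wharf", "Market Road"],
    "Boat or Ferry": [0.5, 0.0],
    "Food": [0.5, 0.0],
    "Deli / Bodega": [0.0, 0.25],
    "Dessert Shop": [0.0, 0.0],
    "Men's Store": [0.0, 0.75],
})

result = top_nonzero_venues(toy_grouped)
print(result)
```

On the real data this would be `top_nonzero_venues(mumbai_grouped)`; neighbourhoods with only one or two entries are the ones whose later "most common venue" columns are padding.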

In [74]:
Mumbai_merged.loc[Mumbai_merged['ClusterLabels'] == 3, Mumbai_merged.columns[[1] + list(range(5, Mumbai_merged.shape[1]))]]

Unnamed: 0,regionname,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
58,Mumbai,3,Men's Store,Food,Deli / Bodega,Department Store,Dessert Shop,Dhaba,Diner,Dumpling Restaurant,Electronics Store,Farm
59,Mumbai,3,Men's Store,Food,Deli / Bodega,Department Store,Dessert Shop,Dhaba,Diner,Dumpling Restaurant,Electronics Store,Farm
60,Mumbai,3,Men's Store,Food,Deli / Bodega,Department Store,Dessert Shop,Dhaba,Diner,Dumpling Restaurant,Electronics Store,Farm
61,Mumbai,3,Men's Store,Food,Deli / Bodega,Department Store,Dessert Shop,Dhaba,Diner,Dumpling Restaurant,Electronics Store,Farm
62,Mumbai,3,Men's Store,Food,Deli / Bodega,Department Store,Dessert Shop,Dhaba,Diner,Dumpling Restaurant,Electronics Store,Farm
64,Mumbai,3,Men's Store,Food,Deli / Bodega,Department Store,Dessert Shop,Dhaba,Diner,Dumpling Restaurant,Electronics Store,Farm


6 neighbourhoods are grouped into this cluster. The most common venue in this cluster is a men's store, followed by various food outlets.

In [75]:
Mumbai_merged.loc[Mumbai_merged['ClusterLabels'] == 4, Mumbai_merged.columns[[1] + list(range(5, Mumbai_merged.shape[1]))]]

Unnamed: 0,regionname,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
68,Mumbai,4,Clothing Store,Movie Theater,Yoga Studio,Flea Market,Department Store,Dessert Shop,Dhaba,Diner,Dumpling Restaurant,Electronics Store
69,Mumbai,4,Clothing Store,Movie Theater,Yoga Studio,Flea Market,Department Store,Dessert Shop,Dhaba,Diner,Dumpling Restaurant,Electronics Store
70,Mumbai,4,Clothing Store,Movie Theater,Yoga Studio,Flea Market,Department Store,Dessert Shop,Dhaba,Diner,Dumpling Restaurant,Electronics Store
71,Mumbai,4,Clothing Store,Movie Theater,Yoga Studio,Flea Market,Department Store,Dessert Shop,Dhaba,Diner,Dumpling Restaurant,Electronics Store
72,Mumbai,4,Clothing Store,Movie Theater,Yoga Studio,Flea Market,Department Store,Dessert Shop,Dhaba,Diner,Dumpling Restaurant,Electronics Store
73,Mumbai,4,Clothing Store,Movie Theater,Yoga Studio,Flea Market,Department Store,Dessert Shop,Dhaba,Diner,Dumpling Restaurant,Electronics Store
75,Mumbai,4,Clothing Store,Movie Theater,Yoga Studio,Flea Market,Department Store,Dessert Shop,Dhaba,Diner,Dumpling Restaurant,Electronics Store


7 neighbourhoods are grouped into this cluster. Clothing stores are the most common venue, followed by movie theaters, so it could be a good place for film buffs to hang around.
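Each cluster can also be summarised in one row by its most frequent "1st Most Common Venue" and the share of the cluster's neighbourhoods that have it. A minimal sketch, assuming a frame with the `ClusterLabels` and `1st Most Common Venue` columns used above; the helper `dominant_first_venue` and the toy rows are hypothetical:

```python
import pandas as pd


def dominant_first_venue(merged):
    """Most frequent '1st Most Common Venue' per cluster, with its share of the cluster."""
    rows = []
    for label, group in merged.groupby("ClusterLabels"):
        counts = group["1st Most Common Venue"].value_counts()  # sorted, most frequent first
        rows.append({
            "ClusterLabels": label,
            "venue": counts.index[0],
            "share": counts.iloc[0] / len(group),
        })
    return pd.DataFrame(rows)


toy = pd.DataFrame({
    "ClusterLabels": [0, 0, 0, 1, 1],
    "1st Most Common Venue": ["Café", "Café", "Gym", "Boat or Ferry", "Boat or Ferry"],
})

summary = dominant_first_venue(toy)
print(summary)
```

Applied to `Mumbai_merged` or `Delhi_merged`, a low `share` flags clusters whose one-line description rests on only a few neighbourhoods.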

# Cluster Delhi Neighborhoods

In [49]:
kclusters = 5

# keep only the numeric venue-frequency columns for k-means (axis given by keyword)
delhi_grouped_clustering = delhi_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(delhi_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([2, 0, 0, 4, 2, 0, 0, 0, 0, 4], dtype=int32)
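The same `kclusters = 5` is reused for Delhi. Silhouette scores are another way to compare candidate values of k: for each point they contrast the mean distance to its own cluster with the mean distance to the nearest other cluster, and a higher average indicates better-separated clusters. A minimal sketch with scikit-learn's `silhouette_score` on hypothetical synthetic data built from three well-separated groups, so the best k should come out as 3; the helper `silhouette_by_k` is illustrative, and on the real data `delhi_grouped_clustering` would take the place of `toy_features`:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def silhouette_by_k(features, k_values, random_state=0):
    """Return {k: mean silhouette score} for k-means fits with each k (k >= 2)."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit_predict(features)
        scores[k] = silhouette_score(features, labels)
    return scores


# Hypothetical features: three tight groups of venue-frequency-like vectors
rng = np.random.default_rng(1)
centres = np.array([[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]])
toy_features = np.vstack([c + rng.normal(0, 0.02, size=(15, 3)) for c in centres])

scores = silhouette_by_k(toy_features, range(2, 7))
best_k = max(scores, key=scores.get)
print(scores, "best k:", best_k)
```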

In [50]:
neighborhoods_venues_sorted_delhi.head()

Unnamed: 0,ClusterLabels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,2,"A F Rajokari S.O, Rajokari B.O",Shoe Store,Mobile Phone Shop,Women's Store,Deli / Bodega,Dessert Shop,Donut Shop,Electronics Store,Fabric Shop,Falafel Restaurant,Fast Food Restaurant
1,0,"A.G.C.R. S.O, Ajmeri Gate Extn. S.O, Darya Gan...",Food & Drink Shop,Women's Store,Garden,Dessert Shop,Donut Shop,Electronics Store,Fabric Shop,Falafel Restaurant,Fast Food Restaurant,Flea Market
2,0,"A.K.Market S.O, Multani Dhanda S.O, Pahar Ganj...",Hotel,Indian Restaurant,Café,Indian Chinese Restaurant,Fast Food Restaurant,Restaurant,Breakfast Spot,Food,Snack Place,Bar
3,4,"Abul Fazal Enclave-I S.O, Jamia Nagar S.O, New...",Indian Restaurant,Airport Service,Hotel,Garden,Dessert Shop,Donut Shop,Electronics Store,Fabric Shop,Falafel Restaurant,Fast Food Restaurant
4,2,"Adrash Nagar S.O, Bhalaswa B.O, Jahangir Puri ...",Shoe Store,Mobile Phone Shop,Women's Store,Deli / Bodega,Dessert Shop,Donut Shop,Electronics Store,Fabric Shop,Falafel Restaurant,Fast Food Restaurant


In [51]:
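# add clustering labels (drop any earlier copy first, since DataFrame.insert fails if the column already exists)
neighborhoods_venues_sorted_delhi = neighborhoods_venues_sorted_delhi.drop(columns='ClusterLabels', errors='ignore')
neighborhoods_venues_sorted_delhi.insert(0, 'ClusterLabels', kmeans.labels_)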

Delhi_merged = df_delhi.join(neighborhoods_venues_sorted_delhi.set_index('Neighborhood'), on='Neighborhood', how='inner')
Delhi_merged.head()

Unnamed: 0,pincode,regionname,Neighborhood,latitude,longitude,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,110001,Delhi,"Baroda House S.O, Bengali Market S.O, Bhagat S...",28.6369,77.218229,0,Indian Restaurant,Chinese Restaurant,Café,Lounge,Hotel,Deli / Bodega,Bar,Platform,Dessert Shop,Falafel Restaurant
1,110002,Delhi,"A.G.C.R. S.O, Ajmeri Gate Extn. S.O, Darya Gan...",28.6453,77.2456,0,Food & Drink Shop,Women's Store,Garden,Dessert Shop,Donut Shop,Electronics Store,Fabric Shop,Falafel Restaurant,Fast Food Restaurant,Flea Market
3,110004,Delhi,Rashtrapati Bhawan S.O,28.6453,77.2128,0,Hotel,Restaurant,Café,Indian Restaurant,Hostel,Pizza Place,Indian Chinese Restaurant,Bakery,Motel,Road
4,110005,Delhi,"Anand Parbat Indl. Area S.O, Anand Parbat S.O,...",28.6551,77.188775,0,Pharmacy,Hotel,Smoke Shop,Asian Restaurant,Snack Place,Ice Cream Shop,Dessert Shop,Donut Shop,Department Store,Fabric Shop
5,110006,Delhi,"Delhi G.P.O. , Baratooti S.O, Chandni Chowk S....",28.6453,77.2128,0,Hotel,Restaurant,Café,Indian Restaurant,Hostel,Pizza Place,Indian Chinese Restaurant,Bakery,Motel,Road


In [52]:
import folium
import matplotlib.cm as cm
import matplotlib.colors as colors

# centre the map on the mean position of the Delhi neighborhoods
map_clusters = folium.Map(
    location=[Delhi_merged['latitude'].mean(), Delhi_merged['longitude'].mean()],
    zoom_start=10)

# set color scheme for the clusters: one evenly spaced rainbow colour per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add one marker per neighborhood, coloured by its cluster label
for lat, lon, poi, cluster in zip(Delhi_merged['latitude'], Delhi_merged['longitude'], Delhi_merged['Neighborhood'], Delhi_merged['ClusterLabels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

In [80]:
Delhi_merged.loc[Delhi_merged['ClusterLabels'] == 0, Delhi_merged.columns[[1] + list(range(5, Delhi_merged.shape[1]))]]

Unnamed: 0,regionname,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Delhi,0,Indian Restaurant,Chinese Restaurant,Café,Lounge,Hotel,Deli / Bodega,Bar,Platform,Dessert Shop,Falafel Restaurant
1,Delhi,0,Food & Drink Shop,Women's Store,Garden,Dessert Shop,Donut Shop,Electronics Store,Fabric Shop,Falafel Restaurant,Fast Food Restaurant,Flea Market
3,Delhi,0,Hotel,Restaurant,Café,Indian Restaurant,Hostel,Pizza Place,Indian Chinese Restaurant,Bakery,Motel,Road
4,Delhi,0,Pharmacy,Hotel,Smoke Shop,Asian Restaurant,Snack Place,Ice Cream Shop,Dessert Shop,Donut Shop,Department Store,Fabric Shop
5,Delhi,0,Hotel,Restaurant,Café,Indian Restaurant,Hostel,Pizza Place,Indian Chinese Restaurant,Bakery,Motel,Road
6,Delhi,0,Pizza Place,Fast Food Restaurant,Bank,Flea Market,Sandwich Place,Grocery Store,Snack Place,Chinese Restaurant,Donut Shop,Indian Restaurant
7,Delhi,0,Hotel,Restaurant,Café,Indian Restaurant,Hostel,Pizza Place,Indian Chinese Restaurant,Bakery,Motel,Road
12,Delhi,0,Platform,Café,Moving Target,Food Court,Women's Store,Dessert Shop,Donut Shop,Electronics Store,Fabric Shop,Falafel Restaurant
15,Delhi,0,Café,Restaurant,Indian Restaurant,Bakery,Lounge,Bar,Mediterranean Restaurant,Coffee Shop,American Restaurant,Art Gallery
16,Delhi,0,Park,Hotel,Thai Restaurant,Café,Gym,Falafel Restaurant,Pizza Place,Women's Store,Food & Drink Shop,Dessert Shop


36 neighbourhoods belong to this cluster. It has pubs, coffee shops and restaurants, in addition to malls and history museums.

In [81]:
Delhi_merged.loc[Delhi_merged['ClusterLabels'] == 1, Delhi_merged.columns[[1] + list(range(5, Delhi_merged.shape[1]))]]

Unnamed: 0,regionname,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
44,Delhi,1,Snack Place,Women's Store,Deli / Bodega,Dessert Shop,Donut Shop,Electronics Store,Fabric Shop,Falafel Restaurant,Fast Food Restaurant,Flea Market
87,Delhi,1,IT Services,Gym,Snack Place,Women's Store,Fried Chicken Joint,Dessert Shop,Donut Shop,Electronics Store,Fabric Shop,Falafel Restaurant
88,Delhi,1,Gym,Women's Store,Garden,Dessert Shop,Donut Shop,Electronics Store,Fabric Shop,Falafel Restaurant,Fast Food Restaurant,Flea Market
90,Delhi,1,IT Services,Gym,Snack Place,Women's Store,Fried Chicken Joint,Dessert Shop,Donut Shop,Electronics Store,Fabric Shop,Falafel Restaurant
91,Delhi,1,IT Services,Gym,Snack Place,Women's Store,Fried Chicken Joint,Dessert Shop,Donut Shop,Electronics Store,Fabric Shop,Falafel Restaurant
92,Delhi,1,IT Services,Gym,Snack Place,Women's Store,Fried Chicken Joint,Dessert Shop,Donut Shop,Electronics Store,Fabric Shop,Falafel Restaurant
93,Delhi,1,IT Services,Gym,Snack Place,Women's Store,Fried Chicken Joint,Dessert Shop,Donut Shop,Electronics Store,Fabric Shop,Falafel Restaurant
94,Delhi,1,IT Services,Gym,Snack Place,Women's Store,Fried Chicken Joint,Dessert Shop,Donut Shop,Electronics Store,Fabric Shop,Falafel Restaurant


8 neighbourhoods belong to this cluster. It mainly contains IT services, gyms and snack places, so it may be a convenient area to live for people working in the IT sector.

In [82]:
Delhi_merged.loc[Delhi_merged['ClusterLabels'] == 2, Delhi_merged.columns[[1] + list(range(5, Delhi_merged.shape[1]))]]

Unnamed: 0,regionname,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
32,Delhi,2,Shoe Store,Mobile Phone Shop,Women's Store,Deli / Bodega,Dessert Shop,Donut Shop,Electronics Store,Fabric Shop,Falafel Restaurant,Fast Food Restaurant
33,Delhi,2,Shoe Store,Mobile Phone Shop,Women's Store,Deli / Bodega,Dessert Shop,Donut Shop,Electronics Store,Fabric Shop,Falafel Restaurant,Fast Food Restaurant
37,Delhi,2,Shoe Store,Mobile Phone Shop,Women's Store,Deli / Bodega,Dessert Shop,Donut Shop,Electronics Store,Fabric Shop,Falafel Restaurant,Fast Food Restaurant


3 neighbourhoods belong to this cluster. It looks like a good shopping hub, with shoe stores, women's stores, mobile phone shops and fabric shops, along with dessert and donut shops.

In [83]:
Delhi_merged.loc[Delhi_merged['ClusterLabels'] == 3, Delhi_merged.columns[[1] + list(range(5, Delhi_merged.shape[1]))]]

Unnamed: 0,regionname,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
46,Delhi,3,ATM,Garden,Dessert Shop,Donut Shop,Electronics Store,Fabric Shop,Falafel Restaurant,Fast Food Restaurant,Flea Market,Food
48,Delhi,3,ATM,Garden,Dessert Shop,Donut Shop,Electronics Store,Fabric Shop,Falafel Restaurant,Fast Food Restaurant,Flea Market,Food
59,Delhi,3,ATM,Gym / Fitness Center,Garden,Dessert Shop,Donut Shop,Electronics Store,Fabric Shop,Falafel Restaurant,Fast Food Restaurant,Flea Market


3 neighbourhoods belong to this cluster. It mainly contains ATMs, gyms and gardens, so it might be a good place for joggers.

In [84]:
Delhi_merged.loc[Delhi_merged['ClusterLabels'] == 4, Delhi_merged.columns[[1] + list(range(5, Delhi_merged.shape[1]))]]

Unnamed: 0,regionname,ClusterLabels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,Delhi,4,Indian Restaurant,Mobile Phone Shop,Garden,Ice Cream Shop,Fried Chicken Joint,Dessert Shop,Donut Shop,Electronics Store,Fabric Shop,Falafel Restaurant
17,Delhi,4,Women's Store,Indian Restaurant,Airport,Department Store,Donut Shop,Electronics Store,Fabric Shop,Falafel Restaurant,Fast Food Restaurant,Flea Market
21,Delhi,4,Indian Restaurant,Athletics & Sports,Modern European Restaurant,Bistro,Bakery,Deli / Bodega,Art Gallery,Gift Shop,Fabric Shop,Falafel Restaurant
22,Delhi,4,Indian Restaurant,Athletics & Sports,Modern European Restaurant,Bistro,Bakery,Deli / Bodega,Art Gallery,Gift Shop,Fabric Shop,Falafel Restaurant
24,Delhi,4,Indian Restaurant,Airport Service,Hotel,Garden,Dessert Shop,Donut Shop,Electronics Store,Fabric Shop,Falafel Restaurant,Fast Food Restaurant
26,Delhi,4,Indian Restaurant,Athletics & Sports,Modern European Restaurant,Bistro,Bakery,Deli / Bodega,Art Gallery,Gift Shop,Fabric Shop,Falafel Restaurant
27,Delhi,4,Indian Restaurant,Athletics & Sports,Modern European Restaurant,Bistro,Bakery,Deli / Bodega,Art Gallery,Gift Shop,Fabric Shop,Falafel Restaurant
28,Delhi,4,Indian Restaurant,Nightclub,Dessert Shop,Department Store,IT Services,Hotel,Indian Chinese Restaurant,Donut Shop,Electronics Store,Ice Cream Shop
31,Delhi,4,Indian Restaurant,Ice Cream Shop,Clothing Store,Fried Chicken Joint,Dessert Shop,Donut Shop,Electronics Store,Fabric Shop,Falafel Restaurant,Fast Food Restaurant
65,Delhi,4,Indian Restaurant,Department Store,Pharmacy,Grocery Store,Steakhouse,Food Court,Dessert Shop,Donut Shop,Electronics Store,Fabric Shop


12 neighbourhoods belong to this cluster. The neighborhoods in this cluster are popular for their Indian restaurants, and they also contain athletics & sports venues and modern European restaurants.

# Difference between Mumbai & Delhi
In this project, I have taken the data for two of India’s metro cities and analyzed their neighborhoods based on the top venues in each. I clustered the neighborhoods by their most common venues and tried to understand how the types of venues differ between the two metros, which may help anybody who is considering settling in either city.

Given the cluster information for both Mumbai and Delhi, Mumbai and its neighbourhoods appear to be a great place for a foodie: there are many restaurants, cafés, bars, etc. in Mumbai neighbourhoods. Also, because Mumbai lies on the coast, its neighborhoods offer harbors, seafood, and boat and ferry rides.
On the other hand, Delhi neighborhoods are good for those who like arts and crafts, history museums and pizza places. Delhi has comparatively few foreign-cuisine restaurants among its top venues. Mumbai, by contrast, is great for international visitors, expats, etc., because of the variety of food outlets it has. Delhi is inland, and its neighborhoods are close to water parks, museums, and arts and crafts stores.
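A claim such as "Mumbai is a great place for a foodie" can be put on a numeric footing by computing, for each city, the share of neighbourhoods whose top venues mention a keyword such as "Restaurant". A minimal sketch, assuming frames with the `1st`/`2nd`/`3rd Most Common Venue` columns used above; the helper `share_with_keyword` and the toy rows are hypothetical:

```python
import pandas as pd

TOP_COLUMNS = ["1st Most Common Venue", "2nd Most Common Venue", "3rd Most Common Venue"]


def share_with_keyword(merged, keyword, columns=TOP_COLUMNS):
    """Fraction of neighbourhoods whose top venues include a category containing `keyword`."""
    text = merged[columns].astype(str)
    hits = text.apply(lambda col: col.str.contains(keyword, regex=False)).any(axis=1)
    return float(hits.mean())


mumbai_toy = pd.DataFrame({
    "1st Most Common Venue": ["Café", "Indian Restaurant", "Boat or Ferry", "Gym"],
    "2nd Most Common Venue": ["Asian Restaurant", "Bakery", "Food", "Garden"],
    "3rd Most Common Venue": ["Bar", "Bank", "Deli / Bodega", "Bar"],
})
delhi_toy = pd.DataFrame({
    "1st Most Common Venue": ["Shoe Store", "ATM", "Indian Restaurant"],
    "2nd Most Common Venue": ["Mobile Phone Shop", "Garden", "Airport Service"],
    "3rd Most Common Venue": ["Women's Store", "Dessert Shop", "Hotel"],
})

mumbai_share = share_with_keyword(mumbai_toy, "Restaurant")
delhi_share = share_with_keyword(delhi_toy, "Restaurant")
print(f"Mumbai: {mumbai_share:.2f}, Delhi: {delhi_share:.2f}")
```

Substring matching counts every "... Restaurant" category (Indian, Asian, Fast Food and so on), and restricting to the top three columns limits the effect of the lower-ranked padding columns. On the real data, `Mumbai_merged` and `Delhi_merged` would replace the toy frames.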
