# Analyzing New Delhi for Opening a New Business


## Introduction

We will analyze the city of New Delhi, India, to find a good spot to open our business. We are a large conglomerate that owns popular hotels, restaurants, hypermarkets and even a bank. We plan to expand to New Delhi and will analyze the city's localities to find the ideal location for each type of business center.

We will start by downloading the Zomato dataset of New Delhi, available on Kaggle, to get information about the various localities in New Delhi. We will then use the Foursquare API to get the popular business establishments in each locality and conclude where to set up our businesses.


## Table of Contents

1. Download the dataset

2. Clean and format the dataset

3. Explore the dataset

4. Use FourSquare API to get the required information

5. Analyze the resulting dataset

6. Use Machine Learning to predict the ideal spot for our business establishment

7. Analyze the Resulting Clusters


First, before diving into the analysis, we need to import some required libraries.

In [1]:
import numpy as np
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

from geopy.geocoders import Nominatim

import requests # library to handle requests
# pd.json_normalize (available in pandas 1.0+) is used later to flatten JSON into a dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

import folium

## 1. Download the Dataset

The dataset can be downloaded from the following URL. Once the download is complete, please extract the archive and move the zomato.csv file into the working directory.

Then, we import the dataset in a pandas dataframe and inspect it.

Download URL: https://www.kaggle.com/shrutimehta/zomato-restaurants-data?select=zomato.csv

In [2]:
df_zomato = pd.read_csv('zomato.csv',encoding = "ISO-8859-1")

df_zomato.head()

Unnamed: 0,Restaurant ID,Restaurant Name,Country Code,City,Address,Locality,Locality Verbose,Longitude,Latitude,Cuisines,Average Cost for two,Currency,Has Table booking,Has Online delivery,Is delivering now,Switch to order menu,Price range,Aggregate rating,Rating color,Rating text,Votes
0,6317637,Le Petit Souffle,162,Makati City,"Third Floor, Century City Mall, Kalayaan Avenu...","Century City Mall, Poblacion, Makati City","Century City Mall, Poblacion, Makati City, Mak...",121.027535,14.565443,"French, Japanese, Desserts",1100,Botswana Pula(P),Yes,No,No,No,3,4.8,Dark Green,Excellent,314
1,6304287,Izakaya Kikufuji,162,Makati City,"Little Tokyo, 2277 Chino Roces Avenue, Legaspi...","Little Tokyo, Legaspi Village, Makati City","Little Tokyo, Legaspi Village, Makati City, Ma...",121.014101,14.553708,Japanese,1200,Botswana Pula(P),Yes,No,No,No,3,4.5,Dark Green,Excellent,591
2,6300002,Heat - Edsa Shangri-La,162,Mandaluyong City,"Edsa Shangri-La, 1 Garden Way, Ortigas, Mandal...","Edsa Shangri-La, Ortigas, Mandaluyong City","Edsa Shangri-La, Ortigas, Mandaluyong City, Ma...",121.056831,14.581404,"Seafood, Asian, Filipino, Indian",4000,Botswana Pula(P),Yes,No,No,No,4,4.4,Green,Very Good,270
3,6318506,Ooma,162,Mandaluyong City,"Third Floor, Mega Fashion Hall, SM Megamall, O...","SM Megamall, Ortigas, Mandaluyong City","SM Megamall, Ortigas, Mandaluyong City, Mandal...",121.056475,14.585318,"Japanese, Sushi",1500,Botswana Pula(P),No,No,No,No,4,4.9,Dark Green,Excellent,365
4,6314302,Sambo Kojin,162,Mandaluyong City,"Third Floor, Mega Atrium, SM Megamall, Ortigas...","SM Megamall, Ortigas, Mandaluyong City","SM Megamall, Ortigas, Mandaluyong City, Mandal...",121.057508,14.58445,"Japanese, Korean",1500,Botswana Pula(P),Yes,No,No,No,4,4.8,Dark Green,Excellent,229


## 2. Clean and format the dataset

We will remove all the rows that have null values, as they cannot be used in the analysis.

In [3]:
df_zomato.dropna(inplace=True)

We see that the dataframe has records from cities all over the world, so we need to keep only the rows for the city of New Delhi.

We apply the filter and assign the result to a new dataframe.

In [4]:
df_zomato_new_delhi = df_zomato[df_zomato['City'] == 'New Delhi']

The Zomato dataset only covers restaurants, but we also need information on other places of interest. In addition, the dataset is from 2018, so new restaurants may have opened and some smaller ones may have closed since then. We therefore need more up-to-date data from another source; for that, we will use the Foursquare API later.

For now, we just extract the names of the various localities in Delhi, as these are unlikely to have changed in the last two years.

In [5]:
required_columns = ['Locality']

df_filtered = df_zomato_new_delhi[required_columns]

df_filtered.head()

Unnamed: 0,Locality
2560,Aaya Nagar
2561,Adchini
2562,Adchini
2563,Adchini
2564,Adchini


We find that the locality dataframe has duplicated rows. We need to remove the duplicates so that each row has a unique locality name.

In [6]:
df_filtered = df_filtered.drop_duplicates(keep='first')

df_filtered.shape

(254, 1)

In [7]:
df_filtered = df_filtered.reset_index()

In [8]:
df_filtered.head()

Unnamed: 0,index,Locality
0,2560,Aaya Nagar
1,2561,Adchini
2,2574,"Aditya Mega Mall, Karkardooma"
3,2578,Aerocity
4,2582,"Aggarwal City Mall, Pitampura"


Once we remove the duplicates, we see that there are 254 localities in Delhi. Now, we remove the extra index column that was added.

In [9]:
df_filtered.drop(columns=['index'], inplace=True)
df_filtered.head()

Unnamed: 0,Locality
0,Aaya Nagar
1,Adchini
2,"Aditya Mega Mall, Karkardooma"
3,Aerocity
4,"Aggarwal City Mall, Pitampura"


In [10]:
df_filtered.shape

(254, 1)
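
The cleaning steps in this section (filter by city, keep the locality column, drop duplicates, reset the index) can also be written as one chained expression. The sketch below runs on a small made-up dataframe, not the real Zomato file; `toy` and `localities` are illustrative names. Because each step returns a new dataframe, nothing is modified in place.

```python
import pandas as pd

# Toy stand-in for the Zomato data; the rows are illustrative only.
toy = pd.DataFrame({
    'City': ['New Delhi', 'New Delhi', 'New Delhi', 'Makati City'],
    'Locality': ['Adchini', 'Adchini', 'Aerocity', 'Poblacion'],
})

localities = (
    toy.loc[toy['City'] == 'New Delhi', ['Locality']]  # filter rows, keep one column
       .drop_duplicates()                              # one row per locality
       .reset_index(drop=True)                         # fresh index, no extra 'index' column
)

print(localities)
```

Passing `drop=True` to `reset_index` avoids creating the extra `index` column that otherwise has to be removed afterwards.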

Next, we need to get the coordinates of each locality. Bear in mind that localities are not officially defined by the New Delhi Municipality, so some of the minor localities in the dataset may have incorrect names. We will disregard these, as correcting locality names is beyond the scope of this project.

In [11]:
new_columns = ['Locality', 'Latitude', 'Longitude']

df_localities = pd.DataFrame(columns=new_columns)

df_localities.head()

Unnamed: 0,Locality,Latitude,Longitude


We will loop through all the rows in the dataframe and try to find the coordinates of each locality. If no coordinates are found, we ignore that locality for the reason stated above.

In [12]:
# create the geocoder once, outside the loop
locator = Nominatim(user_agent='myGeocoder')
rows = []

for index, row in df_filtered.iterrows():
    location = locator.geocode('{}, New Delhi'.format(row['Locality']))

    if location is None:
        # if no coordinates are found, then skip this locality
        print('No coordinates found for {}, New Delhi'.format(row['Locality']))
        continue

    rows.append({'Locality': row['Locality'],
                 'Latitude': location.latitude,
                 'Longitude': location.longitude})

# DataFrame.append was removed in pandas 2.0, so build the dataframe in one step
df_localities = pd.DataFrame(rows, columns=['Locality', 'Latitude', 'Longitude'])


df_localities.head()

No coordinates found for Aaya Nagar, New Delhi
No coordinates found for Aditya Mega Mall, Karkardooma, New Delhi
No coordinates found for Aggarwal City Mall, Pitampura, New Delhi
No coordinates found for Aggarwal City Plaza, Rohini, New Delhi
No coordinates found for Andaz Delhi, Aerocity, New Delhi
No coordinates found for Ansal Plaza Mall, Khel Gaon Marg, New Delhi
No coordinates found for ARSS Mall, Paschim Vihar, New Delhi
No coordinates found for Ashok Vihar Phase 1, New Delhi
No coordinates found for Ashok Vihar Phase 2, New Delhi
No coordinates found for Ashok Vihar Phase 3, New Delhi
No coordinates found for Basant Lok Market, Vasant Vihar, New Delhi
No coordinates found for Bellagio, Ashok Vihar Phase 2, New Delhi
No coordinates found for Best Western Taurus Hotel, Mahipalpur, New Delhi
No coordinates found for Centaur Hotel, Aerocity, New Delhi
No coordinates found for City Centre Mall, Rohini, New Delhi
No coordinates found for City Square Mall, Rajouri Garden, New Delhi
No 

Unnamed: 0,Locality,Latitude,Longitude
0,Adchini,28.537024,77.198227
1,Aerocity,28.54799,77.12124
2,Alaknanda,28.529336,77.251632
3,"Ambience Mall, Vasant Kunj",28.541245,77.155108
4,Anand Lok,28.557292,77.219122


Finally, we see how many remaining localities we have that we can work with. We have 151 correctly mapped localities and their coordinates.

In [13]:
df_localities.shape

(151, 3)
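
The geocoding loop can be separated from the geocoder itself, which makes it easy to try out without network access. The following is a minimal sketch under that assumption: `geocode_localities`, `FakeLocation` and `KNOWN` are hypothetical names introduced here, and the lookup table stands in for a real geocoder.

```python
from collections import namedtuple


def geocode_localities(names, geocode_fn, city='New Delhi'):
    """Return (found_rows, missing_names) for the given locality names.

    geocode_fn is any callable that takes a query string and returns an
    object with .latitude and .longitude, or None when nothing is found.
    """
    found, missing = [], []
    for name in names:
        location = geocode_fn('{}, {}'.format(name, city))
        if location is None:
            missing.append(name)  # remember what was skipped
            continue
        found.append({'Locality': name,
                      'Latitude': location.latitude,
                      'Longitude': location.longitude})
    return found, missing


# Hypothetical stand-in for a real geocoder, so the sketch runs offline.
FakeLocation = namedtuple('FakeLocation', ['latitude', 'longitude'])
KNOWN = {'Adchini, New Delhi': FakeLocation(28.537024, 77.198227)}

found_rows, skipped = geocode_localities(['Adchini', 'Aaya Nagar'], KNOWN.get)
print(found_rows)
print(skipped)
```

In the notebook, `geocode_fn` could be the `geocode` method of the `Nominatim` object, optionally wrapped in geopy's `RateLimiter` (from `geopy.extra.rate_limiter`, for example with `min_delay_seconds=1`) to space out the requests.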

## 3. Explore the Dataset

We will plot the localities on a map of Delhi to get a better view of the various locations. We will use the folium library for this purpose.

First, we get the coordinates of the city of New Delhi.

In [14]:
address = 'New Delhi'

geolocator = Nominatim(user_agent="nd_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New Delhi are {}, {}.'.format(latitude, longitude))

The geographical coordinates of New Delhi are 28.6138954, 77.2090057.


Now, we will use the folium library to generate a map of New Delhi, then plot the various localities using the coordinates obtained in the previous step.

In [15]:
map_newdelhi = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for locality, lat, lng in zip(df_localities['Locality'], df_localities['Latitude'], df_localities['Longitude']):
    label = '{}, New Delhi'.format(locality)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newdelhi)  

map_newdelhi

We can view all the localities defined in the earlier stage on this map. Use the mouse wheel to zoom in and out of the map to get a clearer picture.

Next, we will choose one locality from our dataset and try to find its nearby venues using the Foursquare API.

In [16]:
df_localities.loc[0, 'Locality']

'Adchini'

We choose the very first locality in our dataset and proceed with the analysis.

First, we extract the coordinates of this locality from our dataframe.

In [17]:
locality_latitude = df_localities.loc[0, 'Latitude'] # neighborhood latitude value
locality_longitude = df_localities.loc[0, 'Longitude'] # neighborhood longitude value

locality_name = df_localities.loc[0, 'Locality'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(locality_name, 
                                                               locality_latitude, 
                                                               locality_longitude))

Latitude and longitude values of Adchini are 28.537024000000002, 77.19822731852571.


## 4. Use FourSquare API to get the required information

Next, we define the Foursquare API details as constants.

In [18]:
import os

# Read the credentials from environment variables instead of hard-coding them.
# The variable names below are placeholders; use whichever names you export.
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare secret
VERSION = '20200630' # Foursquare API version

print('Foursquare credentials loaded.')

Foursquare credentials loaded.


Now, we generate the Foursquare API URL for the coordinates of our chosen locality, using the constants defined above. We set a **radius** variable (500 meters here) so that only venues within this distance from the center of the locality are fetched.
We also define a **LIMIT** variable to cap the response at 100 venues.

In [19]:
LIMIT = 100 # limit of number of venues returned by Foursquare API

radius = 500 # define radius

# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    locality_latitude, 
    locality_longitude, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=<CLIENT_ID>&client_secret=<CLIENT_SECRET>&v=20200630&ll=28.537024000000002,77.19822731852571&radius=500&limit=100'
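
As an alternative to formatting the query string by hand, the URL can be assembled from a dictionary of parameters so that each value is encoded consistently. This is a minimal sketch using only the standard library; `build_explore_url` is a hypothetical helper, and `'MY_ID'` and `'MY_SECRET'` are placeholder credentials.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Assemble the venues/explore URL from named parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),  # latitude,longitude pair
        'radius': radius,
        'limit': limit,
    }
    # safe=',' keeps the comma in the ll value readable instead of %2C
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params, safe=',')


example_url = build_explore_url('MY_ID', 'MY_SECRET', '20200630',
                                28.537024, 77.198227, 500, 100)
print(example_url)

# Round-trip check: parse the query string back into a dict of lists
query = parse_qs(urlsplit(example_url).query)
print(query['ll'], query['radius'], query['limit'])
```

With `requests`, the same dictionary can be passed directly as `requests.get(base_url, params=params)`, which performs the encoding itself.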

We use the generated API URL to get a list of venues in our chosen locality and store it in the **results** variable.

In [20]:
results = requests.get(url).json()

Now, we define a custom function that will fetch the category of each venue from the API response.

In [21]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Then, we parse the API response JSON and store it in a dataframe, set the category for each venue and simplify the column names.

In [22]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = pd.json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,100% Rock,Pub,28.535552,77.197523
1,Cafe Coffee Day,Café,28.538589,77.198683
2,Kuzart Lane,Café,28.53799,77.198368
3,Waves Restaurant,Indian Restaurant,28.538582,77.198771
4,Qutub Residency Hotel New Delhi,Hotel,28.535727,77.197161
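
As a sanity check, we can estimate how far a returned venue lies from the locality center and compare it with the 500 m radius. The sketch below uses the haversine formula with a mean Earth radius of 6,371 km, and assumes the API's radius can be approximated by great-circle distance; `haversine_m` is a helper name introduced here. The coordinates are those of Adchini and the first venue in the table above.

```python
import math


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two (lat, lng) points in degrees."""
    earth_radius_m = 6371000.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


# Adchini center vs. the venue '100% Rock'
distance_m = haversine_m(28.537024, 77.198227, 28.535552, 77.197523)
print(round(distance_m), distance_m <= 500)
```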


We see that the Foursquare API successfully retrieved the venues within the selected locality, along with their coordinates and categories. Now, we can extend this solution to the remaining localities in our dataframe.

Hence, we create a custom function that takes the dataframe of localities and loops through each record to find all the venues in each locality using the Foursquare API. It then extracts the venue name, coordinates and venue category, and generates a dataframe for the entire response.

In [23]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name'] if v['venue']['categories'] else None) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Locality', 
                  'Locality Latitude', 
                  'Locality Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues

We invoke the function defined above and pass our localities dataframe as the input. We store the response dataframe in the **newdelhi_venues** variable.

In [24]:
newdelhi_venues = getNearbyVenues(names=df_localities['Locality'],
                                   latitudes=df_localities['Latitude'],
                                   longitudes=df_localities['Longitude']
                                  )

Adchini
Aerocity
Alaknanda
Ambience Mall, Vasant Kunj
Anand Lok
Anand Vihar
Asaf Ali Road
Barakhamba Road
Bhikaji Cama Place
Chanakyapuri
Chander Nagar
Chandni Chowk
Chawri Bazar
Chhatarpur
Chittaranjan Park
Civil Lines
Connaught Place
Daryaganj
Defence Colony
Delhi Cantt.
Dilshad Garden
DLF Emporio Mall, Vasant Kunj
DLF Promenade Mall, Vasant Kunj
Dr. Zakir Hussain Marg
Durga Puri
East of Kailash
East Patel Nagar
Friends Colony
Garden of Five Senses, Saket
Geeta Colony
Green Park
GTB Nagar
Hauz Khas
Hauz Khas Village
Holiday Inn, Aerocity
INA
India Gate
IP Extension
ITO
Jail Road
Jama Masjid
Janakpuri
Jangpura
Janpath
Jasola
Jaypee Vasant Continental, Vasant Vihar
JNU
Jor Bagh
Kailash Colony
Kalkaji
Kamla Nagar
Kapashera
Karampura
Karkardooma
Karol Bagh
Kashmiri Gate
Khan Market
Kirti Nagar
Krishna Nagar
Lado Sarai
Lajpat Nagar 1
Lajpat Nagar 2
Lawrence Road
Laxmi Nagar
Le Meridien, Janpath
Living Style Mall, Jasola
Lodhi Colony
Lodhi Road
Mahipalpur
Maidens Hotel, Civil Lines
Majnu k
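
The parsing step inside `getNearbyVenues` can be tried in isolation, without calling the API. The sketch below assumes the response items have the shape used above (`item['venue']['name']`, `['location']`, `['categories']`); `extract_venue_rows` and the sample items are hypothetical, and the second item is made up to show a venue with no category.

```python
def extract_venue_rows(locality, lat, lng, items):
    """Turn Foursquare-style items into flat tuples, tolerating missing categories."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Use None when a venue has no category instead of raising IndexError
        category = categories[0]['name'] if categories else None
        rows.append((locality, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows


sample_items = [
    {'venue': {'name': '100% Rock',
               'location': {'lat': 28.535552, 'lng': 77.197523},
               'categories': [{'name': 'Pub'}]}},
    {'venue': {'name': 'Uncategorized Place',  # made-up venue for illustration
               'location': {'lat': 28.536, 'lng': 77.198},
               'categories': []}},
]

venue_rows = extract_venue_rows('Adchini', 28.537024, 77.198227, sample_items)
for venue_row in venue_rows:
    print(venue_row)
```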

## 5. Analyze the resulting dataset

We do some basic analysis of the response dataframe. We check its shape to see how many venues are there in total and also inspect the first 5 rows of the dataframe.

In [25]:
print(newdelhi_venues.shape)
newdelhi_venues.head()

(1456, 7)


Unnamed: 0,Locality,Locality Latitude,Locality Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Adchini,28.537024,77.198227,100% Rock,28.535552,77.197523,Pub
1,Adchini,28.537024,77.198227,Cafe Coffee Day,28.538589,77.198683,Café
2,Adchini,28.537024,77.198227,Kuzart Lane,28.53799,77.198368,Café
3,Adchini,28.537024,77.198227,Waves Restaurant,28.538582,77.198771,Indian Restaurant
4,Adchini,28.537024,77.198227,Qutub Residency Hotel New Delhi,28.535727,77.197161,Hotel


We need to find out how many venues were fetched for each locality in our previous dataframe. We do this by using the *groupby* function on the *Locality* column of the new dataframe.

In [26]:
newdelhi_venues.groupby('Locality').count()

Locality,Locality Latitude,Locality Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Adchini,7,7,7,7,7,7
Aerocity,19,19,19,19,19,19
Alaknanda,10,10,10,10,10,10
"Ambience Mall, Vasant Kunj",46,46,46,46,46,46
Anand Lok,8,8,8,8,8,8
Anand Vihar,4,4,4,4,4,4
Asaf Ali Road,5,5,5,5,5,5
Barakhamba Road,16,16,16,16,16,16
Bhikaji Cama Place,11,11,11,11,11,11
Chanakyapuri,2,2,2,2,2,2
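
Since every column in the table above carries the same count, a single number per locality is enough. A minimal sketch on made-up rows (`toy_venues` is illustrative, not the real response) uses `groupby(...).size()`, which returns one count per group:

```python
import pandas as pd

# Illustrative rows only; the real dataframe has seven columns.
toy_venues = pd.DataFrame({
    'Locality': ['Adchini', 'Adchini', 'Aerocity', 'Aerocity', 'Aerocity', 'Chanakyapuri'],
    'Venue': ['A', 'B', 'C', 'D', 'E', 'F'],
})

# One venue count per locality, largest first
venue_counts = toy_venues.groupby('Locality').size().sort_values(ascending=False)
print(venue_counts)

# Localities with very few venues may be too sparse to characterize reliably
sparse_localities = venue_counts[venue_counts < 2].index.tolist()
print(sparse_localities)
```

The threshold of 2 is arbitrary here; it only illustrates how sparse localities could be flagged before clustering.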


## 6. Use Machine Learning to predict the ideal spot for our business establishment

We will be using Machine Learning techniques to find out which categories of venues are in demand for each locality. This will help us to decide which type of business is likely to succeed in which localities.

This project will be using K-Means clustering to cluster the localities with similar popular venues and finally, we will be able to decide which cluster is suitable to open our business so that it thrives.
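
To make the idea concrete, here is a minimal pure-Python sketch of the K-Means loop (Lloyd's algorithm): assign each point to its nearest centroid, then move each centroid to the mean of its points. It is not the implementation used in the analysis. The frequency rows are hypothetical, and the first `k` points serve as initial centroids to keep the sketch deterministic, whereas library implementations such as scikit-learn's `KMeans` use randomized initialization.

```python
def kmeans(points, k, iterations=20):
    """Lloyd's algorithm using the first k points as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance
        for i, p in enumerate(points):
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            labels[i] = dists.index(min(dists))
        # Update step: move each centroid to the mean of its members
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical frequencies per locality: [Café, Hotel, Indian Restaurant]
freqs = [
    [0.6, 0.1, 0.3],  # café-heavy
    [0.1, 0.7, 0.2],  # hotel-heavy
    [0.5, 0.2, 0.3],  # café-heavy
    [0.0, 0.8, 0.2],  # hotel-heavy
]

cluster_labels, cluster_centroids = kmeans(freqs, k=2)
print(cluster_labels)
```

Localities that share a label have similar venue mixes, which is the property we rely on when choosing where a given type of business fits.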

Now, we find out how many different types of venue categories we are working with. We do this using the *unique* function on the *Venue Category* column.

In [27]:
print('There are {} unique categories.'.format(len(newdelhi_venues['Venue Category'].unique())))

There are 186 unique categories.


We need to analyze which venue categories are present in each neighborhood. We can use the one-hot encoding technique for this purpose.

In [28]:
# one hot encoding
newdelhi_onehot = pd.get_dummies(newdelhi_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
newdelhi_onehot['Locality'] = newdelhi_venues['Locality'] 

# move neighborhood column to the first column
fixed_columns = [newdelhi_onehot.columns[-1]] + list(newdelhi_onehot.columns[:-1])
newdelhi_onehot = newdelhi_onehot[fixed_columns]

newdelhi_onehot.head()

Unnamed: 0,Locality,ATM,Accessories Store,Airport,Airport Food Court,Airport Terminal,American Restaurant,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Australian Restaurant,Auto Dealership,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Bed & Breakfast,Beer Garden,Bike Shop,Bistro,Bookstore,Boutique,Breakfast Spot,Brewery,Buffet,Burger Joint,Burmese Restaurant,Bus Station,Business Service,Cafeteria,Café,Cheese Shop,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,College Cafeteria,Comfort Food Restaurant,Concert Hall,Convenience Store,Cosmetics Shop,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Dhaba,Diner,Dog Run,Donut Shop,Dry Cleaner,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,English Restaurant,Farmers Market,Fast Food Restaurant,Flea Market,Food,Food & Drink Shop,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gastropub,Gift Shop,Golf Course,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Hardware Store,Health & Beauty Service,Historic Site,History Museum,Hostel,Hot Dog Joint,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Irani Cafe,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Karaoke Bar,Korean Restaurant,Lake,Light Rail Station,Lighting Store,Liquor Store,Lounge,Market,Mediterranean Restaurant,Memorial Site,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Moroccan Restaurant,Mosque,Movie Theater,Moving Target,Mughlai Restaurant,Multicuisine Indian Restaurant,Multiplex,Museum,Music Venue,Neighborhood,Nightclub,Nightlife Spot,Noodle House,North Indian Restaurant,Other Great Outdoors,Other Repair Shop,Paper / Office Supplies Store,Park,Parsi Restaurant,Performing Arts Venue,Pet Store,Pharmacy,Pizza Place,Platform,Playground,Plaza,Portuguese Restaurant,Pub,Public Art,Punjabi Restaurant,Racetrack,Resort,Rest Area,Restaurant,Road,Salon / Barbershop,Sandwich Place,Scandinavian Restaurant,Sculpture Garden,Shoe Store,Shop & Service,Shopping Mall,Smoke Shop,Snack Place,Soccer Stadium,South Indian Restaurant,Spa,Sports Bar,Stadium,Steakhouse,Sushi Restaurant,Tapas Restaurant,Tea Room,Temple,Tex-Mex Restaurant,Thai Restaurant,Theater,Theme Park,Tibetan Restaurant,Tourist Information Center,Toy / Game Store,Trail,Train Station,Travel Agency,Turkish Restaurant,Udupi Restaurant,Vegetarian / Vegan Restaurant,Water Park,Wine Bar,Women's Store,Yoga Studio
0,Adchini,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Adchini,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Adchini,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Adchini,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Adchini,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


We call the shape property on the new dataframe to check whether all the venue categories got one-hot encoded successfully. The resulting number of columns should be 1 greater than the number of unique venue categories.

In [29]:
newdelhi_onehot.shape

(1456, 187)
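
The same column check can be reproduced on a small example. This sketch uses three made-up venue rows (`toy_venues` is illustrative) and verifies that the one-hot frame has one column per unique category plus the `Locality` column. `dtype=int` is passed so the indicators are 0/1 integers regardless of the pandas version.

```python
import pandas as pd

# Illustrative rows only
toy_venues = pd.DataFrame({
    'Locality': ['Adchini', 'Adchini', 'Aerocity'],
    'Venue Category': ['Pub', 'Café', 'Hotel'],
})

# One indicator column per category, no prefix on the column names
toy_onehot = pd.get_dummies(toy_venues[['Venue Category']],
                            prefix='', prefix_sep='', dtype=int)

# Put the locality name in the first column
toy_onehot.insert(0, 'Locality', toy_venues['Locality'])

n_categories = toy_venues['Venue Category'].nunique()
print(toy_onehot.shape, n_categories)
```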

We need to find the frequency of each venue category in each neighborhood. Hence, we group the dataframe by locality name and take the mean of each venue category.

The higher the mean, the more a certain category is present in the Locality.
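
Because each one-hot column holds only 0s and 1s, its mean within a locality is simply the share of that locality's venues in that category. As a worked example, a locality with 7 venues, 2 of which are cafés, gets a Café value of 2/7. The list below is illustrative and only mirrors the venue count of the Adchini rows.

```python
from collections import Counter

# Seven illustrative venue categories for one locality
categories = ['Pub', 'Pub', 'Café', 'Café', 'Hotel', 'Indian Restaurant', 'Pizza Place']

counts = Counter(categories)
# Share of venues per category; equals the mean of the one-hot column
frequencies = {cat: n / len(categories) for cat, n in counts.items()}

print(round(frequencies['Café'], 6))
```

The frequencies for a locality always sum to 1, so localities with different numbers of venues can be compared directly.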

In [30]:
newdelhi_grouped = newdelhi_onehot.groupby('Locality').mean().reset_index()
newdelhi_grouped

Unnamed: 0,Locality,ATM,Accessories Store,Airport,Airport Food Court,Airport Terminal,American Restaurant,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Australian Restaurant,Auto Dealership,BBQ Joint,Baby Store,Bagel Shop,Bakery,Bank,Bar,Bed & Breakfast,Beer Garden,Bike Shop,Bistro,Bookstore,Boutique,Breakfast Spot,Brewery,Buffet,Burger Joint,Burmese Restaurant,Bus Station,Business Service,Cafeteria,Café,Cheese Shop,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,College Cafeteria,Comfort Food Restaurant,Concert Hall,Convenience Store,Cosmetics Shop,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Dhaba,Diner,Dog Run,Donut Shop,Dry Cleaner,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,English Restaurant,Farmers Market,Fast Food Restaurant,Flea Market,Food,Food & Drink Shop,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gastropub,Gift Shop,Golf Course,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Hardware Store,Health & Beauty Service,Historic Site,History Museum,Hostel,Hot Dog Joint,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Irani Cafe,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Karaoke Bar,Korean Restaurant,Lake,Light Rail Station,Lighting Store,Liquor Store,Lounge,Market,Mediterranean Restaurant,Memorial Site,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Moroccan Restaurant,Mosque,Movie Theater,Moving Target,Mughlai Restaurant,Multicuisine Indian Restaurant,Multiplex,Museum,Music Venue,Neighborhood,Nightclub,Nightlife Spot,Noodle House,North Indian Restaurant,Other Great Outdoors,Other Repair Shop,Paper / Office Supplies Store,Park,Parsi Restaurant,Performing Arts Venue,Pet Store,Pharmacy,Pizza Place,Platform,Playground,Plaza,Portuguese Restaurant,Pub,Public Art,Punjabi Restaurant,Racetrack,Resort,Rest Area,Restaurant,Road,Salon / Barbershop,Sandwich Place,Scandinavian Restaurant,Sculpture Garden,Shoe Store,Shop & Service,Shopping Mall,Smoke Shop,Snack Place,Soccer Stadium,South Indian Restaurant,Spa,Sports Bar,Stadium,Steakhouse,Sushi Restaurant,Tapas Restaurant,Tea Room,Temple,Tex-Mex Restaurant,Thai Restaurant,Theater,Theme Park,Tibetan Restaurant,Tourist Information Center,Toy / Game Store,Trail,Train Station,Travel Agency,Turkish Restaurant,Udupi Restaurant,Vegetarian / Vegan Restaurant,Water Park,Wine Bar,Women's Store,Yoga Studio
0,Adchini,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.285714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.285714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Aerocity,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.105263,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.315789,0.052632,0.0,0.0,0.052632,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Alaknanda,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Ambience Mall, Vasant Kunj",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.065217,0.0,0.0,0.0,0.0,0.0,0.021739,0.0,0.021739,0.0,0.021739,0.0,0.0,0.021739,0.0,0.0,0.0,0.0,0.021739,0.0,0.0,0.0,0.0,0.065217,0.0,0.043478,0.065217,0.0,0.086957,0.0,0.0,0.0,0.0,0.0,0.021739,0.021739,0.0,0.0,0.0,0.0,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.0,0.065217,0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.065217,0.021739,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.0,0.0,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.0,0.0,0.021739,0.0,0.0,0.0,0.0,0.065217,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.0,0.0,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Anand Lok,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Anand Vihar,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Asaf Ali Road,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Barakhamba Road,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0625,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1875,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Bhikaji Cama Place,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.272727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Chanakyapuri,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


We check the dimensions of the resulting dataframe using the shape attribute, which returns the number of rows and columns.

In [31]:
newdelhi_grouped.shape

(148, 187)
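
Each row of the grouped dataframe holds, for one locality, the share of its venues that fall into each category. The following is a minimal sketch of how such a table can be built, assuming a one-hot encoding followed by a per-locality mean. The toy `venues` frame, its column names and its values are illustrative and are not taken from the notebook.

```python
import pandas as pd

# Toy venue list (hypothetical data, for illustration only)
venues = pd.DataFrame({
    "Locality": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Café", "Café", "Hotel", "Pub", "Hotel", "Hotel"],
})

# One-hot encode the categories: one column per category, 1.0 where it matches
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Locality", venues["Locality"])

# Averaging the indicator columns per locality gives category frequencies
grouped = onehot.groupby("Locality").mean().reset_index()

print(grouped)
print(grouped.shape)  # (number of localities, 1 + number of categories)
```

Because every venue belongs to exactly one category in this sketch, the frequencies in each row sum to 1.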

Now, we loop through the grouped dataframe and print the five most frequent venue categories in each locality. This gives us an idea of which types of businesses are already established, and therefore likely in demand, in each locality.

In [32]:
num_top_venues = 5

for hood in newdelhi_grouped['Locality']:
    print("----"+hood+"----")
    temp = newdelhi_grouped[newdelhi_grouped['Locality'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adchini----
               venue  freq
0                Pub  0.29
1               Café  0.29
2  Indian Restaurant  0.14
3   Parsi Restaurant  0.14
4              Hotel  0.14


----Aerocity----
             venue  freq
0            Hotel  0.32
1      Coffee Shop  0.11
2    Shopping Mall  0.05
3    Train Station  0.05
4  Thai Restaurant  0.05


----Alaknanda----
                       venue  freq
0                  BBQ Joint   0.2
1          Indian Restaurant   0.2
2                 Steakhouse   0.1
3  Middle Eastern Restaurant   0.1
4          Food & Drink Shop   0.1


----Ambience Mall, Vasant Kunj----
                  venue  freq
0           Coffee Shop  0.09
1    Italian Restaurant  0.07
2  Fast Food Restaurant  0.07
3        Clothing Store  0.07
4                  Café  0.07


----Anand Lok----
          venue  freq
0          Café  0.25
1  Dessert Shop  0.12
2         Hotel  0.12
3   Golf Course  0.12
4   Music Venue  0.12


----Anand Vihar----
           venue  freq
0    Pizz

We will create a function that will append the most popular venue categories next to each locality for each row in the dataframe.

In [33]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
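
To see what the function returns, here is a small usage sketch on a single made-up row. The locality name and the frequencies are hypothetical; only the function body mirrors the cell above.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # Skip the first entry (the locality name) and rank the remaining categories
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# Toy row with made-up frequencies (illustrative only)
row = pd.Series({"Locality": "Example", "Café": 0.5, "Hotel": 0.3, "Pub": 0.2})

top = return_most_common_venues(row, 2)
print(list(top))
```

Note that the function always returns `num_top_venues` names. When a locality has fewer distinct categories than that, the remaining slots are filled with categories whose frequency is 0.0, which is a likely reason the same names recur at the tail of many rows.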

Next, we will invoke the defined function above and fetch the popular venue categories for each locality and append them to each row.

In [34]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Locality']
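for ind in np.arange(num_top_venues):
    try:
        # the first three ranks take the 'st', 'nd' and 'rd' suffixes
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # every later rank takes the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))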

# create a new dataframe
localities_venues_sorted = pd.DataFrame(columns=columns)
localities_venues_sorted['Locality'] = newdelhi_grouped['Locality']

for ind in np.arange(newdelhi_grouped.shape[0]):
    localities_venues_sorted.iloc[ind, 1:] = return_most_common_venues(newdelhi_grouped.iloc[ind, :], num_top_venues)

localities_venues_sorted.head()

Unnamed: 0,Locality,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Adchini,Pub,Café,Indian Restaurant,Parsi Restaurant,Hotel,Yoga Studio,Farmers Market,French Restaurant,Food Truck,Food Court
1,Aerocity,Hotel,Coffee Shop,Indian Restaurant,Train Station,Buffet,Bed & Breakfast,Punjabi Restaurant,Plaza,Thai Restaurant,Shopping Mall
2,Alaknanda,Indian Restaurant,BBQ Joint,Pizza Place,Food & Drink Shop,Steakhouse,Middle Eastern Restaurant,Coffee Shop,Restaurant,Yoga Studio,Farmers Market
3,"Ambience Mall, Vasant Kunj",Coffee Shop,Shopping Mall,Fast Food Restaurant,Café,Asian Restaurant,Clothing Store,Italian Restaurant,Chinese Restaurant,Indian Restaurant,Deli / Bodega
4,Anand Lok,Café,Dessert Shop,Hotel,Golf Course,Other Great Outdoors,Music Venue,Metro Station,Yoga Studio,Food Truck,Food Court


Now, we will be applying K-Means clustering to group the localities based on the popular venue categories. We set the number of clusters to 5 for this analysis.

In [35]:
# import k-means from clustering stage
from sklearn.cluster import KMeans

# set number of clusters
kclusters = 5

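# keep only the numeric frequency columns (the positional axis argument is not accepted by recent pandas versions)
newdelhi_grouped_clustering = newdelhi_grouped.drop(columns='Locality')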

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(newdelhi_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 0, 1, 4, 4, 4, 1, 1, 4, 4], dtype=int32)
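
The number of clusters is fixed at 5 here. One common way to sanity-check such a choice is to compare silhouette scores across several values of k. The sketch below does this on synthetic data, not on the New Delhi dataframe; the blob centers, the range of k and `n_init=10` are assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic data: three well-separated blobs (illustrative only)
rng = np.random.default_rng(0)
centers = np.array([[0, 0], [10, 10], [-10, 10]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

# Fit k-means for several k and record the mean silhouette score of each fit
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# A higher silhouette score indicates tighter, better separated clusters
best_k = max(scores, key=scores.get)
print(scores)
print("best k:", best_k)
```

On the real data, `X` would be replaced by the frequency columns used for clustering. The score is only a guide, and a k that yields interpretable clusters may still be preferable.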

We merge the cluster labels to our dataframe and inspect the resulting dataframe to check which localities have been assigned which clusters.

In [36]:
# add clustering labels
localities_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

newdelhi_merged = df_localities

# join the cluster labels and top venues onto the locality coordinates (latitude/longitude)
newdelhi_merged = newdelhi_merged.join(localities_venues_sorted.set_index('Locality'), on='Locality')

newdelhi_merged.head() # check the last columns!

Unnamed: 0,Locality,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Adchini,28.537024,77.198227,1.0,Pub,Café,Indian Restaurant,Parsi Restaurant,Hotel,Yoga Studio,Farmers Market,French Restaurant,Food Truck,Food Court
1,Aerocity,28.54799,77.12124,0.0,Hotel,Coffee Shop,Indian Restaurant,Train Station,Buffet,Bed & Breakfast,Punjabi Restaurant,Plaza,Thai Restaurant,Shopping Mall
2,Alaknanda,28.529336,77.251632,1.0,Indian Restaurant,BBQ Joint,Pizza Place,Food & Drink Shop,Steakhouse,Middle Eastern Restaurant,Coffee Shop,Restaurant,Yoga Studio,Farmers Market
3,"Ambience Mall, Vasant Kunj",28.541245,77.155108,4.0,Coffee Shop,Shopping Mall,Fast Food Restaurant,Café,Asian Restaurant,Clothing Store,Italian Restaurant,Chinese Restaurant,Indian Restaurant,Deli / Bodega
4,Anand Lok,28.557292,77.219122,4.0,Café,Dessert Shop,Hotel,Golf Course,Other Great Outdoors,Music Venue,Metro Station,Yoga Studio,Food Truck,Food Court


We do some simple cleaning on the resulting dataframe: we drop localities for which no venues (and hence no cluster label) are available, and convert the cluster labels to integers, so that we can plot the clusters on the map of New Delhi using the Folium library.

In [42]:
# take an explicit copy so that the assignment below does not trigger SettingWithCopyWarning
newdelhi_final = newdelhi_merged.dropna().copy()

newdelhi_final['Cluster Labels'] = newdelhi_final['Cluster Labels'].astype(int)
newdelhi_final.head()


Unnamed: 0,Locality,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Adchini,28.537024,77.198227,1,Pub,Café,Indian Restaurant,Parsi Restaurant,Hotel,Yoga Studio,Farmers Market,French Restaurant,Food Truck,Food Court
1,Aerocity,28.54799,77.12124,0,Hotel,Coffee Shop,Indian Restaurant,Train Station,Buffet,Bed & Breakfast,Punjabi Restaurant,Plaza,Thai Restaurant,Shopping Mall
2,Alaknanda,28.529336,77.251632,1,Indian Restaurant,BBQ Joint,Pizza Place,Food & Drink Shop,Steakhouse,Middle Eastern Restaurant,Coffee Shop,Restaurant,Yoga Studio,Farmers Market
3,"Ambience Mall, Vasant Kunj",28.541245,77.155108,4,Coffee Shop,Shopping Mall,Fast Food Restaurant,Café,Asian Restaurant,Clothing Store,Italian Restaurant,Chinese Restaurant,Indian Restaurant,Deli / Bodega
4,Anand Lok,28.557292,77.219122,4,Café,Dessert Shop,Hotel,Golf Course,Other Great Outdoors,Music Venue,Metro Station,Yoga Studio,Food Truck,Food Court
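
The cluster labels come out of the join as floats (1.0, 0.0, ...) because a left join leaves NaN wherever a locality has no matching row, and a column containing NaN cannot keep an integer dtype. A minimal sketch of that behaviour and of the cleanup, using two small hypothetical frames:

```python
import pandas as pd

# Hypothetical frames: locality "C" has coordinates but no cluster label
coords = pd.DataFrame({"Locality": ["A", "B", "C"],
                       "Latitude": [28.5, 28.6, 28.7]})
labels = pd.DataFrame({"Locality": ["A", "B"],
                       "Cluster Labels": [1, 0]})

# Left join on the locality name; "C" gets NaN, so the column becomes float
merged = coords.join(labels.set_index("Locality"), on="Locality")
print(merged["Cluster Labels"].dtype)

# Drop unmatched rows on an explicit copy, then restore the integer dtype
final = merged.dropna().copy()
final["Cluster Labels"] = final["Cluster Labels"].astype(int)
print(final)
```

Integer labels matter later because they are used as list indices when picking a marker color.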


We invoke the Folium library and map each locality on the map of New Delhi again, but this time, with distinct colors for the 5 clusters. This will give us a better view of how the clusters are arranged.

In [43]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(newdelhi_final['Latitude'], newdelhi_final['Longitude'], newdelhi_final['Locality'], newdelhi_final['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
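
In the cell above only the length of `ys` is used, so the palette depends solely on `kclusters`. The following sketch builds the same kind of palette directly, assuming one evenly spaced sample of the rainbow colormap per cluster:

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

kclusters = 5

# Sample the rainbow colormap at kclusters evenly spaced points in [0, 1]
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))

# Convert each RGBA row to a hex string that Folium accepts as a color
rainbow = [colors.rgb2hex(c) for c in colors_array]

for cluster, hex_color in enumerate(rainbow):
    print(cluster, hex_color)
```

Each cluster label can then be mapped to one entry of `rainbow`, giving every cluster its own marker color.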

## 7. Analyze the Resulting Clusters

We go through each of the 5 clusters individually and check which are the popular categories of venues in each cluster. With this analysis we can conclude where we should be opening a certain type of business in order for it to thrive.

First, we inspect the first cluster. Hotel appears among the most common venues in most of its localities, so this cluster can be a good spot for hotels.

In [53]:
newdelhi_final.loc[newdelhi_final['Cluster Labels'] == 0, newdelhi_final.columns[[0] + list(range(4, newdelhi_final.shape[1]))]]

Unnamed: 0,Locality,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Aerocity,Hotel,Coffee Shop,Indian Restaurant,Train Station,Buffet,Bed & Breakfast,Punjabi Restaurant,Plaza,Thai Restaurant,Shopping Mall
10,Chander Nagar,Brewery,Chinese Restaurant,Hotel,Lighting Store,Arts & Crafts Store,Gym,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant,Food Truck
15,Civil Lines,Light Rail Station,Chinese Restaurant,Indian Restaurant,Burger Joint,Convenience Store,Hotel,Yoga Studio,French Restaurant,Food Truck,Food Court
26,East Patel Nagar,Hotel,Ice Cream Shop,Pizza Place,Convenience Store,Light Rail Station,Liquor Store,Bar,Fast Food Restaurant,Yoga Studio,Farmers Market
27,Friends Colony,Hotel,Chinese Restaurant,Tea Room,Clothing Store,Farmers Market,Fried Chicken Joint,French Restaurant,Food Truck,Food Court,Food & Drink Shop
34,"Holiday Inn, Aerocity",Indian Restaurant,Bed & Breakfast,Food Truck,Hotel,Yoga Studio,Fast Food Restaurant,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant,Food Court
47,Jor Bagh,Bookstore,Hotel,Historic Site,Metro Station,Fast Food Restaurant,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant,Food Truck,Food Court
51,Kapashera,Hotel,Pizza Place,Water Park,Yoga Studio,Farmers Market,Fried Chicken Joint,French Restaurant,Food Truck,Food Court,Food & Drink Shop
54,Karol Bagh,Snack Place,Dessert Shop,Hotel,Yoga Studio,Farmers Market,Fried Chicken Joint,French Restaurant,Food Truck,Food Court,Food & Drink Shop
57,Kirti Nagar,Furniture / Home Store,Hotel,BBQ Joint,Fast Food Restaurant,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant,Food Truck,Food Court,Food & Drink Shop
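
The column selector used in these cells keeps column 0 (`Locality`) and everything from position 4 onward, which skips `Latitude`, `Longitude` and `Cluster Labels`. Since the same expression is repeated for every cluster, it could be wrapped in a small helper. The function name `cluster_view` and the toy frame below are hypothetical and only illustrate the idea.

```python
import pandas as pd

def cluster_view(df, label, first_venue_col=4):
    """Rows of one cluster, showing only the locality and venue columns."""
    keep = [0] + list(range(first_venue_col, df.shape[1]))
    return df.loc[df["Cluster Labels"] == label, df.columns[keep]]

# Toy frame with the same column order as newdelhi_final (made-up values)
toy = pd.DataFrame({
    "Locality": ["A", "B", "C"],
    "Latitude": [28.5, 28.6, 28.7],
    "Longitude": [77.1, 77.2, 77.3],
    "Cluster Labels": [0, 1, 0],
    "1st Most Common Venue": ["Hotel", "Café", "Hotel"],
    "2nd Most Common Venue": ["Café", "Pub", "Pizza Place"],
})

view = cluster_view(toy, 0)
print(view)
```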


Similarly, we inspect the next cluster. Analyzing the result, we see that this cluster can be a good spot for restaurants and other eateries.

In [54]:
newdelhi_final.loc[newdelhi_final['Cluster Labels'] == 1, newdelhi_final.columns[[0] + list(range(4, newdelhi_final.shape[1]))]]

Unnamed: 0,Locality,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Adchini,Pub,Café,Indian Restaurant,Parsi Restaurant,Hotel,Yoga Studio,Farmers Market,French Restaurant,Food Truck,Food Court
2,Alaknanda,Indian Restaurant,BBQ Joint,Pizza Place,Food & Drink Shop,Steakhouse,Middle Eastern Restaurant,Coffee Shop,Restaurant,Yoga Studio,Farmers Market
6,Asaf Ali Road,Movie Theater,Fast Food Restaurant,Indian Restaurant,Bar,Furniture / Home Store,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant,Food Truck,Food Court
7,Barakhamba Road,Theater,Art Gallery,Indian Restaurant,Café,Bakery,Concert Hall,Light Rail Station,Hotel,Arcade,Hotel Bar
12,Chawri Bazar,Indian Restaurant,Hardware Store,Paper / Office Supplies Store,Frozen Yogurt Shop,Fast Food Restaurant,Snack Place,Light Rail Station,Mosque,Hotel,Donut Shop
13,Chhatarpur,Japanese Restaurant,Indian Restaurant,Public Art,Metro Station,Flea Market,Farmers Market,Fried Chicken Joint,French Restaurant,Food Truck,Food Court
16,Connaught Place,Indian Restaurant,Café,Bar,Chinese Restaurant,Coffee Shop,Lounge,South Indian Restaurant,Pub,Restaurant,Deli / Bodega
17,Daryaganj,Indian Restaurant,Restaurant,Road,Hotel,Yoga Studio,French Restaurant,Food Truck,Food Court,Food & Drink Shop,Food
18,Defence Colony,Italian Restaurant,Indian Restaurant,Bakery,Café,Bar,Metro Station,Burger Joint,Restaurant,Coffee Shop,Market
20,Dilshad Garden,Pizza Place,Indian Restaurant,Diner,Light Rail Station,Farmers Market,Fried Chicken Joint,French Restaurant,Food Truck,Food Court,Food & Drink Shop


In the third cluster, ATM is the most common venue in every locality, which suggests that this cluster is a suitable spot for installing a new ATM.

In [55]:
newdelhi_final.loc[newdelhi_final['Cluster Labels'] == 2, newdelhi_final.columns[[0] + list(range(4, newdelhi_final.shape[1]))]]

Unnamed: 0,Locality,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
83,Najafgarh,ATM,Food & Drink Shop,Fast Food Restaurant,Furniture / Home Store,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant,Food Truck,Food Court,Food
137,Trilokpuri,ATM,Fast Food Restaurant,Furniture / Home Store,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant,Food Truck,Food Court,Food & Drink Shop,Food
148,Wazirpur,ATM,Fast Food Restaurant,Furniture / Home Store,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant,Food Truck,Food Court,Food & Drink Shop,Food


The fourth cluster is good for opening a new recreational area like a gym or a fitness center.

In [56]:
newdelhi_final.loc[newdelhi_final['Cluster Labels'] == 3, newdelhi_final.columns[[0] + list(range(4, newdelhi_final.shape[1]))]]

# good for opening a Recreational Area

Unnamed: 0,Locality,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
30,Green Park,Music Venue,Rest Area,Yoga Studio,Farmers Market,Fried Chicken Joint,French Restaurant,Food Truck,Food Court,Food & Drink Shop,Food
39,Jail Road,Music Venue,Rest Area,Yoga Studio,Farmers Market,Fried Chicken Joint,French Restaurant,Food Truck,Food Court,Food & Drink Shop,Food
49,Kalkaji,Music Venue,Rest Area,Yoga Studio,Farmers Market,Fried Chicken Joint,French Restaurant,Food Truck,Food Court,Food & Drink Shop,Food
60,Lajpat Nagar 1,Music Venue,Rest Area,Yoga Studio,Farmers Market,Fried Chicken Joint,French Restaurant,Food Truck,Food Court,Food & Drink Shop,Food
63,Laxmi Nagar,Music Venue,Rest Area,Yoga Studio,Farmers Market,Fried Chicken Joint,French Restaurant,Food Truck,Food Court,Food & Drink Shop,Food
127,Subhash Nagar,Music Venue,Rest Area,Yoga Studio,Farmers Market,Fried Chicken Joint,French Restaurant,Food Truck,Food Court,Food & Drink Shop,Food
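
Readings like these can be cross-checked by counting, per cluster, how often each category ranks as the most common venue. A sketch on a small made-up frame follows; on the real data, `toy` would be replaced by the final dataframe.

```python
import pandas as pd

# Made-up cluster assignments and top-ranked categories (illustrative only)
toy = pd.DataFrame({
    "Cluster Labels": [0, 0, 0, 1, 1, 2],
    "1st Most Common Venue": ["Hotel", "Hotel", "Brewery",
                              "Indian Restaurant", "Indian Restaurant", "ATM"],
})

# Rows are clusters, columns are categories, cells are counts
table = pd.crosstab(toy["Cluster Labels"], toy["1st Most Common Venue"])
print(table)

# Category that most often ranks first within each cluster
dominant = (toy.groupby("Cluster Labels")["1st Most Common Venue"]
               .agg(lambda s: s.value_counts().idxmax()))
print(dominant)
```

This looks only at the top-ranked category, so it is a rough summary and not a replacement for reading the full tables.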


If we are planning to open a shopping mall, the fifth cluster would be a good choice, as its mix of coffee shops, restaurants and stores suggests high footfall for our mall.

In [57]:
newdelhi_final.loc[newdelhi_final['Cluster Labels'] == 4, newdelhi_final.columns[[0] + list(range(4, newdelhi_final.shape[1]))]]

Unnamed: 0,Locality,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,"Ambience Mall, Vasant Kunj",Coffee Shop,Shopping Mall,Fast Food Restaurant,Café,Asian Restaurant,Clothing Store,Italian Restaurant,Chinese Restaurant,Indian Restaurant,Deli / Bodega
4,Anand Lok,Café,Dessert Shop,Hotel,Golf Course,Other Great Outdoors,Music Venue,Metro Station,Yoga Studio,Food Truck,Food Court
5,Anand Vihar,Pizza Place,Movie Theater,Shoe Store,Multiplex,Yoga Studio,French Restaurant,Food Truck,Food Court,Food & Drink Shop,Food
8,Bhikaji Cama Place,Lounge,Breakfast Spot,Gym / Fitness Center,Italian Restaurant,Hotel,Café,Asian Restaurant,Chinese Restaurant,Bakery,Frozen Yogurt Shop
9,Chanakyapuri,Performing Arts Venue,Trail,Yoga Studio,Farmers Market,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant,Food Truck,Food Court,Food & Drink Shop
11,Chandni Chowk,Fast Food Restaurant,Platform,Train Station,Chinese Restaurant,Diner,Dog Run,Fried Chicken Joint,French Restaurant,Food Truck,Food Court
14,Chittaranjan Park,Market,Chinese Restaurant,Pizza Place,Diner,Fast Food Restaurant,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant,Food Truck,Food Court
21,"DLF Emporio Mall, Vasant Kunj",Coffee Shop,Shopping Mall,Fast Food Restaurant,Café,Asian Restaurant,Clothing Store,Italian Restaurant,Chinese Restaurant,Indian Restaurant,Deli / Bodega
22,"DLF Promenade Mall, Vasant Kunj",Coffee Shop,Shopping Mall,Fast Food Restaurant,Café,Asian Restaurant,Clothing Store,Italian Restaurant,Chinese Restaurant,Indian Restaurant,Deli / Bodega
28,"Garden of Five Senses, Saket",Garden,Italian Restaurant,Pub,Coffee Shop,Asian Restaurant,Café,Mediterranean Restaurant,Fast Food Restaurant,Fried Chicken Joint,French Restaurant
