# 1. Problem Statement

Mumbai is one of the largest and busiest cities in India. Thousands of visitors, and even Mumbai's own residents, often struggle to find the restaurants that best suit their needs. They do not have access to data on all the restaurants, and even when they do, there is too much of it for them to analyse.

People need a tool that helps them select the restaurant that best fits their interests in terms of cost, location, type, ratings, etc.

# 2. Data Description

<b> Additional_outlet_count: </b> It tells the number of additional outlets of the restaurant available in the city

<b> Call: </b> It shows whether the restaurant takes orders through calls or not

<b> Cost_for_two: </b> It shows the average food cost for two people

<b> Cuisines: </b> Type of food options (Italian, Mexican, Indian, etc)

<b> Delivery_Time_min_order: </b> Time for delivery, if applicable

<b> Features: </b> Extra facilities available at the restaurant, if applicable

<b> Home_Delivery: </b> It shows whether the restaurant provides home delivery or not

<b> Operational_hours: </b> Timings for the restaurant

<b> Rating_votes: </b> Total ratings of the restaurant

<b> Restaurant_Location: </b> Location where the restaurant is situated

<b> Restaurant_Name: </b> Name of the restaurant

<b> Restaurant_Type: </b> Type of the restaurant (Quick bite, family, etc)

<b> View_Menu: </b> Whether food menu is available outside the restaurant or not

# 3. Working Methodology

In [59]:
import pandas as pd # library for data analysis
import urllib.request
from bs4 import BeautifulSoup
import numpy as np
import geocoder
import signal
from geopy.geocoders import Nominatim
import requests

import folium
import json # library to handle JSON files
import matplotlib.cm as cm
import matplotlib.colors as colors
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from pandas import json_normalize # the pandas.io.json import path is deprecated since pandas 1.0

In [2]:
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

In [3]:
zom_data = pd.read_csv('zomato_res_final.csv')

In [4]:
zom_data.sample(5)

Unnamed: 0,Additional_outlet_count,Call,Cost_for_two(Rs.),Cuisines,Features,Home_Delivery,Operational_hours,Restaurant_Location,Restaurant_Name,Restaurant_Type,View_Menu,Min_Order(Rs.),Delivery_Time(mins),Rating,Votes,Rating_Category,Operational_after_Midnight,Cuisine_count,Feature_Count,Res_Type_Count,Competitors_in_Location,Score
3458,,True,400,South Indian,,False,"8am – 10:53pm (Mon),8am – 11pm (Tue, Wed, Sat)...",Goregaon East,Fort Kochi,,True,,,3.5,27.0,Good,False,1,1,1,104.0,20.840728
896,,True,500,Chinese,,False,"11am – 3:30pm, 6:30pm – 11:30pm (Mon-Sun)",Vile Parle West,Gattu's Chinese,Casual Dining,True,,,3.7,434.0,Good,False,1,1,1,49.0,26.058811
4446,,True,450,"Fast Food, Juices",,False,"Closed (Mon),12noon – 11:30pm (Tue-Sat),4pm –...",Vile Parle East,Shree Annapurna,Quick Bites,True,,,4.0,109.0,Very Good,False,2,1,1,91.0,28.232798
683,,True,300,"Bakery, Desserts",,False,9am – 7pm (Mon-Sun),Vile Parle West,BakerHer,"Bakery,Dessert Parlor",True,,,4.4,183.0,Very Good,False,2,1,2,49.0,34.197214
2569,,True,200,Mithai,,True,8:30am – 9:45pm (Mon-Sun),Santacruz East,Vijay Stores,Sweet Shop,True,99.0,35.0,3.5,18.0,Good,False,1,1,1,72.0,20.786342


In [5]:
zom_data.shape

(6526, 22)

In [6]:
# Treat a missing outlet count as 0 and store the column as integers
zom_data['Additional_outlet_count'] = zom_data['Additional_outlet_count'].fillna(0).astype(int)
zom_data.head()

Unnamed: 0,Additional_outlet_count,Call,Cost_for_two(Rs.),Cuisines,Features,Home_Delivery,Operational_hours,Restaurant_Location,Restaurant_Name,Restaurant_Type,View_Menu,Min_Order(Rs.),Delivery_Time(mins),Rating,Votes,Rating_Category,Operational_after_Midnight,Cuisine_count,Feature_Count,Res_Type_Count,Competitors_in_Location,Score
0,1,True,1500,"Finger Food, Continental, European, Italian","Food Hygiene Rated Restaurants In Mumbai, Best...",False,12noon – 1am (Mon-Sun),Kamala Mills Compound,Lord of the Drinks,"Lounge,Casual Dining",True,,,4.9,1326.0,Excellent,True,4,2,2,19.0,48.000806
1,1,True,800,Pizza,"Value For Money, Best of Mumbai",False,11am – 12:30AM (Mon-Sun),Malad West,Joey's Pizza,Quick Bites,True,,,4.6,5974.0,Excellent,True,1,2,1,209.0,71.950295
2,0,True,2500,Seafood,"Super Seafood, Best of Mumbai",False,"Closed (Mon),12noon – 3pm, 7pm – 12midnight...",Bandra West,Bastian,"Casual Dining,Bar",True,,,4.5,1438.0,Excellent,False,1,2,2,241.0,43.16037
3,0,True,1800,"Finger Food, Continental","Where's The Party?, Best of Mumbai, Food Hygie...",False,12noon – 1am (Mon-Sun),Lower Parel,Tamasha,"Lounge,Bar",True,,,4.9,3275.0,Excellent,True,2,3,2,125.0,59.778427
4,2,True,450,"North Indian, Street Food, Fast Food, Chinese",,True,"12noon – 4pm, 7pm – 11:45pm (Mon-Sun)",Vashi,Bhagat Tarachand,Casual Dining,True,0.0,45.0,4.1,1422.0,Very Good,False,4,1,1,116.0,37.546442


In [7]:
area_wise = zom_data.groupby('Restaurant_Location').count().reset_index()[['Restaurant_Location', 'Additional_outlet_count']]

In [8]:
area_wise.columns = ['Restaurant_Location', 'No_Of_Restaurant']
area_wise.head()

Unnamed: 0,Restaurant_Location,No_Of_Restaurant
0,4 Bungalows,32
1,Airoli,68
2,Alibaug,7
3,Ambernath,12
4,Andheri,3


In [9]:
area_wise.shape

(120, 2)

In [10]:
area_wise['No_Of_Restaurant'].sum()

6511
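The per-area total (6511) is lower than the 6526 rows of `zom_data`. One plausible explanation, not confirmed by the notebook, is that some rows have no `Restaurant_Location`: `groupby` drops missing keys by default. The sketch below shows the effect on a small made-up frame (the `toy` data is hypothetical).

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the last row has no location
toy = pd.DataFrame({
    'Restaurant_Location': ['Airoli', 'Airoli', 'Alibaug', np.nan],
    'Additional_outlet_count': [0, 1, 0, 2],
})

# Default behaviour: the row with a missing key is silently left out
per_area = toy.groupby('Restaurant_Location').size()
print(per_area.sum(), 'of', len(toy), 'rows counted')

# dropna=False (pandas >= 1.1) keeps the missing key as its own group
per_area_all = toy.groupby('Restaurant_Location', dropna=False).size()
print(per_area_all.sum(), 'of', len(toy), 'rows counted')
```

Running `zom_data['Restaurant_Location'].isna().sum()` on the real data would show whether this accounts for the gap.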

In [11]:
zom_res_data = zom_data[['Restaurant_Name', 'Restaurant_Location', 'Cuisines', 'Call', 'Home_Delivery', 'Cost_for_two(Rs.)', 'Rating', 'Votes', 'Score']]

In [12]:
zom_res_data.sample(10)

Unnamed: 0,Restaurant_Name,Restaurant_Location,Cuisines,Call,Home_Delivery,Cost_for_two(Rs.),Rating,Votes,Score
61,Madeira & Mime,Powai,"Continental, Cafe, North Indian, Chinese, Fing...",True,True,1400,4.6,2282.0,49.639892
3339,Hotel Sai Baba,Nerul,"North Indian, Chinese, South Indian, Fast Food",True,True,450,2.9,114.0,13.090598
6487,Waves Restaurant,Alibaug,North Indian,True,False,800,3.5,23.0,20.816556
2268,Shri Nidhi,Goregaon West,"South Indian, North Indian, Chinese, Beverages...",True,False,550,3.6,133.0,22.860586
4692,Delhi Dine,Kharghar,"Chinese, Mughlai, Thai",True,False,700,3.7,205.0,24.674986
233,Brewbot Eatery & Pub Brewery,Veera Desai Area,"Mediterranean, European, American, Italian, Fi...",True,False,1500,4.2,2041.0,42.66631
2202,Ratnakar Lunch Home,Vikhroli,"Chinese, North Indian, Seafood",True,False,350,3.4,27.0,19.461417
3333,Baluchi - The Lalit Mumbai,Chakala,"Mughlai, North Indian",True,False,4000,4.1,328.0,30.935504
3362,Patisserie & Delicatessen - Trident,Bandra Kurla Complex,"Bakery, Desserts",True,False,1500,3.9,78.0,26.666157
4942,Kings Fast Food Corner,Kalyan,"Fast Food, Chinese",True,False,400,3.2,5.0,16.569853


In [13]:
# Work on an explicit copy so the assignments below do not raise SettingWithCopyWarning
zom_res_data = zom_res_data.copy()

zom_res_data['Call'] = zom_res_data['Call'].astype(bool)
zom_res_data['Home_Delivery'] = zom_res_data['Home_Delivery'].astype(bool)
zom_res_data['Rating'] = zom_res_data['Rating'].astype(float)
zom_res_data['Votes'] = zom_res_data['Votes'].astype(int)

# The column key must match exactly; a stray tab here would create a duplicate column
zom_res_data['Cost_for_two(Rs.)'] = zom_res_data['Cost_for_two(Rs.)'].astype(int)
zom_res_data['Score'] = zom_res_data['Score'].astype(float)

In [14]:
zom_res_data.sample(10)

Unnamed: 0,Restaurant_Name,Restaurant_Location,Cuisines,Call,Home_Delivery,Cost_for_two(Rs.),Rating,Votes,Score
5263,Benzy's Family Restaurant,Mumbai CST Area,"North Indian, South Indian, Chinese",True,True,400,3.3,34,18.124407
2729,Pizzaria House,Kandivali West,"Pizza, Fast Food",True,True,550,3.8,254,26.350398
6408,Star Kitchen,Chembur,"Chinese, Fast Food",True,False,400,3.2,11,16.60611
1429,Sharda Bhavan,Matunga East,South Indian,True,False,200,4.0,373,29.828125
3617,Sandwich Corner,Girgaum,Fast Food,True,False,300,3.3,8,17.967292
663,Dinshaw's Xpress Cafe,Andheri West,"Cafe, Burger, Desserts, Italian, Pizza, Parsi",True,True,800,4.1,1349,37.10531
1483,Hideout Cafe and Bar,Lower Parel,"Finger Food, Italian, North Indian, Beverages,...",True,False,1300,4.0,515,30.686217
4483,Jyoti Refreshment,Dadar East,"Chinese, South Indian, North Indian, Fast Food...",True,False,500,3.7,177,24.505784
3149,Mi & Me - Minerals and Meals,Near Andheri East Station,Fast Food,True,True,300,3.8,48,25.10556
3983,Bipin Sandwich & Pizza Plaza,Borivali West,"Continental, Sandwich, Pizza",True,True,300,3.0,51,14.089206


In [15]:
area_wise['Latitude'] = np.nan
area_wise['Longitude'] = np.nan

In [None]:
import math

# Create the geocoder once instead of on every iteration
geolocator = Nominatim(user_agent="mumbai_explorer")

for i in range(area_wise.shape[0]):

    # Skip rows that already have coordinates
    if not math.isnan(area_wise.loc[i, 'Latitude']):
        continue

    loc = area_wise.loc[i, 'Restaurant_Location']
    address = '{}, Mumbai, Maharashtra, India'.format(loc)

    try:
        location = geolocator.geocode(address)
    except Exception:
        # Network or service error: leave this row empty and move on
        continue
    if location is None:
        continue

    print(location)
    # .loc writes into area_wise itself and avoids chained assignment
    area_wise.loc[i, 'Latitude'] = location.latitude
    area_wise.loc[i, 'Longitude'] = location.longitude

Mumbai, Mumbai City, Maharashtra, India
Airoli, Airoli Station Road, Digha Village, Airoli, Navi Mumbai, Thane, Maharashtra, 400708, India
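The first printed result resolves only to "Mumbai, Mumbai City, Maharashtra, India", which suggests the geocoder fell back to the city itself for that area. A minimal sketch of one way to guard against this is shown below. It assumes a pluggable `geocode` callable, so it runs offline; `geocode_areas` and `fake_geocode` are hypothetical names, and the coordinates are copied from this notebook's outputs.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

Coords = Tuple[float, float]


def geocode_areas(
    names: Iterable[str],
    geocode: Callable[[str], Optional[Coords]],
    city_centre: Coords,
    tolerance: float = 1e-4,
) -> Dict[str, Optional[Coords]]:
    """Geocode each area once and treat city-centre fallbacks as missing."""
    cache: Dict[str, Optional[Coords]] = {}
    for name in names:
        if name in cache:
            continue  # already looked up, do not call the service again
        address = '{}, Mumbai, Maharashtra, India'.format(name)
        try:
            coords = geocode(address)
        except Exception:
            coords = None  # service error: record as missing
        if coords is not None and (
            abs(coords[0] - city_centre[0]) < tolerance
            and abs(coords[1] - city_centre[1]) < tolerance
        ):
            coords = None  # the geocoder returned the city, not the area
        cache[name] = coords
    return cache


# Offline stand-in for a real geocoder (hypothetical)
def fake_geocode(address: str) -> Optional[Coords]:
    known = {
        'Airoli, Mumbai, Maharashtra, India': (19.158515, 72.999402),
        '4 Bungalows, Mumbai, Maharashtra, India': (18.938771, 72.835335),
    }
    return known.get(address)


result = geocode_areas(['4 Bungalows', 'Airoli', 'Airoli'],
                       fake_geocode, city_centre=(18.9387711, 72.8353355))
print(result)
```

With a real service, `geocode` could wrap `geolocator.geocode` and return `(location.latitude, location.longitude)`; spacing requests about a second apart is a common courtesy for public Nominatim servers.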


In [16]:
area_wise = pd.read_csv('area_wise.csv')

In [73]:
area_wise.to_csv('area_wise.csv')

In [17]:
area_wise.dropna(subset=['Latitude'], how='all', inplace = True)
area_wise = area_wise.reset_index()[['Restaurant_Location', 'No_Of_Restaurant', 'Latitude', 'Longitude']]

In [18]:
area_wise.head()

Unnamed: 0,Restaurant_Location,No_Of_Restaurant,Latitude,Longitude
0,4 Bungalows,32,18.938771,72.835335
1,Airoli,68,19.158515,72.999402
2,Alibaug,7,18.662728,72.878768
3,Ambernath,12,19.201561,73.200477
4,Andheri,3,19.120371,72.848043


In [19]:
res_data = area_wise.merge(zom_res_data, on = 'Restaurant_Location')
res_data = res_data[['Restaurant_Name', 'Restaurant_Location', 'Cuisines', 'Call', 'Home_Delivery', 'Cost_for_two(Rs.)', 'Rating', 'Votes', 'Score']]

In [20]:
res_data.head()

Unnamed: 0,Restaurant_Name,Restaurant_Location,Cuisines,Call,Home_Delivery,Cost_for_two(Rs.),Rating,Votes,Score
0,Pishu's,4 Bungalows,"Healthy Food, Juices, Fast Food, Salad",True,True,650,4.4,1230,40.524135
1,Goila Butter Chicken,4 Bungalows,"Biryani, Desserts, North Indian, Mughlai, Rolls",True,True,700,4.1,751,33.491653
2,KA.FE,4 Bungalows,"Cafe, Italian, Lebanese, Middle Eastern, Mexic...",True,True,600,4.2,722,34.695719
3,Monkey King,4 Bungalows,"North Indian, Chinese, Mughlai",True,True,500,4.7,126,37.990699
4,The Serial Griller,4 Bungalows,"Fast Food, Burger",True,True,600,4.4,1276,40.802109


In [21]:
res_data.shape

(6323, 9)
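`res_data` has 6323 rows, fewer than the 6526 in `zom_data`. `merge` performs an inner join by default, so restaurants whose location has no row in `area_wise` (for example, areas that were dropped for lacking coordinates) are presumably excluded. The sketch below, on made-up frames, shows how a left join with `indicator=True` can list the rows that would be lost; the frame contents are hypothetical.

```python
import pandas as pd

# Hypothetical miniature versions of area_wise and zom_res_data
areas = pd.DataFrame({
    'Restaurant_Location': ['Airoli', 'Alibaug'],
    'Latitude': [19.158515, 18.662728],
})
restaurants = pd.DataFrame({
    'Restaurant_Name': ['A', 'B', 'C'],
    'Restaurant_Location': ['Airoli', 'Alibaug', 'Unmapped Area'],
})

# A left join keeps every restaurant; _merge records where each row matched
checked = restaurants.merge(areas, on='Restaurant_Location',
                            how='left', indicator=True)
dropped = checked[checked['_merge'] == 'left_only']
print(dropped[['Restaurant_Name', 'Restaurant_Location']])
```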

In [30]:
address = 'Mumbai, India'

geolocator = Nominatim(user_agent="mum_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Mumbai are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Mumbai are 18.9387711, 72.8353355.


### Create Map of Mumbai with various areas

In [31]:
# create map of Mumbai using latitude and longitude values
map_mumbai = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, count, name in zip(area_wise['Latitude'], area_wise['Longitude'], area_wise['No_Of_Restaurant'], area_wise['Restaurant_Location']):
    label = '{}, {}'.format(name, count)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_mumbai)  
    
map_mumbai

In [32]:
most10 = area_wise.sort_values('No_Of_Restaurant', ascending=False)[:10]

In [33]:
most10.index = range(10)
most10

Unnamed: 0,Restaurant_Location,No_Of_Restaurant,Latitude,Longitude
0,Thane West,386,19.17502,72.971802
1,Andheri West,295,19.117249,72.833968
2,Bandra West,241,19.058336,72.830267
3,Malad West,209,19.184013,72.841216
4,Mira Road,204,18.915924,72.819736
5,Borivali West,193,19.229456,72.847991
6,Chembur,148,19.061213,72.897591
7,Powai,134,19.11872,72.907348
8,Kandivali West,131,19.20838,72.842227
9,Kalyan,129,19.137892,72.810668


### Representing the areas with the most restaurants

In [34]:
# create map of Mumbai using latitude and longitude values
map_mumbai = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, count, name in zip(most10['Latitude'], most10['Longitude'], most10['No_Of_Restaurant'], most10['Restaurant_Location']):
    label = '{}, {}'.format(name, count)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5*count//80,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_mumbai)  

map_mumbai
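The marker radius above is `5*count//80`, which evaluates as `(5*count)//80` and grows linearly with the restaurant count. As an alternative sketch (not the notebook's method), the radius can follow the square root of the count so that the circle's area, rather than its radius, tracks the count. `marker_radius` and its default sizes are hypothetical.

```python
import math


def marker_radius(count: int, max_count: int,
                  max_radius: float = 24.0, min_radius: float = 4.0) -> float:
    """Scale the radius with sqrt(count) so circle area is proportional to count."""
    if max_count <= 0:
        return min_radius
    scaled = max_radius * math.sqrt(count / max_count)
    return max(min_radius, scaled)


# Counts taken from the most10 table above
counts = {'Thane West': 386, 'Kalyan': 129}
for name, count in counts.items():
    linear = 5 * count // 80          # radius used in the cell above
    area_based = marker_radius(count, max_count=386)
    print(name, linear, round(area_based, 1))
```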

### Defining Foursquare Credentials

In [35]:
CLIENT_ID = 'XXXHADRBWKOS51T02KPKRBT001DYWN3LMH00TJL3KEQRTF5C' # your Foursquare ID
CLIENT_SECRET = '0LX0R3GKHGKXEBKUYRXH43WVSGORYQ5ZHHLRWRMYG1OKX0Y5' # your Foursquare Secret
VERSION = '20180604'
LIMIT = 30
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: XXXHADRBWKOS51T02KPKRBT001DYWN3LMH00TJL3KEQRTF5C
CLIENT_SECRET:0LX0R3GKHGKXEBKUYRXH43WVSGORYQ5ZHHLRWRMYG1OKX0Y5


### Let us explore Thane West, which has the maximum number of restaurants

#### Let us get the coordinates of Thane West

In [36]:
area_latitude = most10['Latitude'][0]
area_longitude = most10['Longitude'][0]

In [37]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    area_latitude, 
    area_longitude, 
    radius, 
    LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=XXXHADRBWKOS51T02KPKRBT001DYWN3LMH00TJL3KEQRTF5C&client_secret=0LX0R3GKHGKXEBKUYRXH43WVSGORYQ5ZHHLRWRMYG1OKX0Y5&v=20180604&ll=19.17502,72.9718018&radius=500&limit=100'
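The request URL can also be assembled from a parameter dictionary, which takes care of escaping and keeps the credentials out of the notebook. The sketch below assumes the same endpoint and parameters as the cell above; the environment variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `build_explore_url` are hypothetical.

```python
import os
from urllib.parse import urlencode


def build_explore_url(lat, lng, radius=500, limit=100, version='20180604'):
    """Build the venues/explore URL from a parameter dictionary."""
    params = {
        # Read credentials from the environment instead of hard-coding them
        'client_id': os.environ.get('FOURSQUARE_CLIENT_ID', ''),
        'client_secret': os.environ.get('FOURSQUARE_CLIENT_SECRET', ''),
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)


url = build_explore_url(19.17502, 72.9718018)
# Print only the endpoint so the secret does not end up in the output
print(url.split('?')[0])
```

`requests.get(endpoint, params=params)` would achieve the same thing without building the query string by hand.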

In [38]:
results = requests.get(url).json()

In [39]:
results

{'meta': {'code': 200, 'requestId': '5ced17efdb04f52f657524ce'},
 'response': {'headerLocation': 'Thāne',
  'headerFullLocation': 'Thāne',
  'headerLocationGranularity': 'city',
  'totalResults': 4,
  'suggestedBounds': {'ne': {'lat': 19.179520004500006,
    'lng': 72.97655723566001},
   'sw': {'lat': 19.170519995499994, 'lng': 72.96704636433998}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4fc0f51ce4b03278516fa5e2',
       'name': 'Chopstix',
       'location': {'address': 'Hari Om Nagar',
        'crossStreet': 'Kopri',
        'lat': 19.177289762508362,
        'lng': 72.97187791236344,
        'labeledLatLngs': [{'label': 'display',
          'lat': 19.177289762508362,
          'lng': 72.97187791236344}],
        'distance': 252,
        'postalCode': '400021',
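The response reports `'distance': 252` for Chopstix. Assuming this is the distance in metres from the query point given in `ll`, it can be sanity-checked with the haversine formula; `haversine_m` below is a hypothetical helper using a mean Earth radius of 6371 km.

```python
import math


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two (lat, lng) points."""
    r = 6371000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lng2 - lng1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


# Query point (Thane West) and the Chopstix coordinates from the response
d = haversine_m(19.17502, 72.9718018, 19.177289762508362, 72.97187791236344)
print(round(d), 'metres; inside the 500 m radius:', d <= 500)
```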

In [40]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [41]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Chopstix,Chinese Restaurant,19.17729,72.971878
1,Sainath Dhaba,Indian Restaurant,19.174828,72.968256
2,Sambhaji Raje Sabhagruha,Concert Hall,19.173848,72.968303
3,ekvira dhaba,Diner,19.176589,72.967461
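The same four fields can be pulled out of the nested response with plain Python, which makes the handling of venues without a category explicit. This is a sketch only: `extract_venues` is a hypothetical helper, and `sample_items` is a trimmed stand-in for `results['response']['groups'][0]['items']` in which the second entry is invented.

```python
def extract_venues(items):
    """Flatten Foursquare 'items' into rows of name, category, lat and lng."""
    rows = []
    for item in items:
        venue = item.get('venue', {})
        categories = venue.get('categories') or []
        location = venue.get('location', {})
        rows.append({
            'name': venue.get('name'),
            # Use the first category name, or None when the list is empty
            'categories': categories[0]['name'] if categories else None,
            'lat': location.get('lat'),
            'lng': location.get('lng'),
        })
    return rows


sample_items = [
    {'venue': {'name': 'Chopstix',
               'categories': [{'name': 'Chinese Restaurant'}],
               'location': {'lat': 19.177289762508362,
                            'lng': 72.97187791236344}}},
    # Hypothetical venue with no category, to exercise the fallback
    {'venue': {'name': 'Uncategorised venue',
               'categories': [],
               'location': {'lat': 19.0, 'lng': 72.9}}},
]

rows = extract_venues(sample_items)
for row in rows:
    print(row)
```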


#### Function to find out the venues for each location

In [42]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Location', 
                  'Locality Latitude', 
                  'Locality Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues

In [43]:
mumbai_venues = getNearbyVenues(names=most10['Restaurant_Location'],
                                   latitudes=most10['Latitude'],
                                   longitudes=most10['Longitude']
                                  )

Thane West
Andheri West
Bandra West
Malad West
Mira Road
Borivali West
Chembur
Powai
Kandivali West
Kalyan


In [44]:
mumbai_venues.sample(10)

Unnamed: 0,Location,Locality Latitude,Locality Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
153,Chembur,19.061213,72.897591,Chembur Post Office Wada Pav,19.05694,72.898056,Snack Place
46,Bandra West,19.058336,72.830267,Krispy Kreme,19.059399,72.829542,Donut Shop
21,Bandra West,19.058336,72.830267,Almeida Park,19.057656,72.831541,Park
249,Kalyan,19.137892,72.810668,Leaping Windows Cafe,19.138775,72.813482,Café
37,Bandra West,19.058336,72.830267,Hearsch Bakery,19.05512,72.827006,Bakery
28,Bandra West,19.058336,72.830267,Pali Bhavan,19.062089,72.829459,Indian Restaurant
70,Bandra West,19.058336,72.830267,The Irish House,19.061575,72.829506,Pub
198,Powai,19.11872,72.907348,The Beatle Hotel Mumbai,19.121306,72.909855,Bed & Breakfast
230,Powai,19.11872,72.907348,hiranandi complex,19.117402,72.9109,Department Store
127,Mira Road,18.915924,72.819736,WTC Pasta,18.914766,72.818845,Snack Place


In [45]:
mumbai_venues.groupby('Location').count()

Unnamed: 0_level_0,Locality Latitude,Locality Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Location,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Andheri West,17,17,17,17,17,17
Bandra West,87,87,87,87,87,87
Borivali West,17,17,17,17,17,17
Chembur,25,25,25,25,25,25
Kalyan,16,16,16,16,16,16
Kandivali West,11,11,11,11,11,11
Malad West,4,4,4,4,4,4
Mira Road,20,20,20,20,20,20
Powai,62,62,62,62,62,62
Thane West,4,4,4,4,4,4


In [46]:
print('There are {} unique categories.'.format(len(mumbai_venues['Venue Category'].unique())))

There are 79 unique categories.


### Analyzing the obtained data

In [47]:
# one hot encoding
mumbai_onehot = pd.get_dummies(mumbai_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
mumbai_onehot['Location'] = mumbai_venues['Location'] 

fixed_columns = [mumbai_onehot.columns[-1]] + list(mumbai_onehot.columns[:-1])
mumbai_onehot = mumbai_onehot[fixed_columns]


mumbai_onehot.head()

Unnamed: 0,Location,Arcade,Asian Restaurant,BBQ Joint,Bagel Shop,Bakery,Bar,Beach,Bed & Breakfast,Beer Garden,Bistro,Bookstore,Brewery,Burger Joint,Bus Station,Café,Chinese Restaurant,Clothing Store,Coffee Shop,College Auditorium,Concert Hall,Creperie,Cupcake Shop,Deli / Bodega,Department Store,Dessert Shop,Dhaba,Diner,Donut Shop,Event Space,Fast Food Restaurant,Film Studio,Food,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Garden,Gastropub,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Hotel,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Italian Restaurant,Lounge,Mediterranean Restaurant,Men's Store,Miscellaneous Shop,Molecular Gastronomy Restaurant,Nightclub,North Indian Restaurant,Park,Performing Arts Venue,Pharmacy,Pizza Place,Plaza,Pool,Pub,Restaurant,Salad Place,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shopping Mall,Smoke Shop,Snack Place,South Indian Restaurant,Spa,Spanish Restaurant,Sporting Goods Shop,Tea Room,Tex-Mex Restaurant,Thai Restaurant,Vegetarian / Vegan Restaurant,Whisky Bar,Women's Store
0,Thane West,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Thane West,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Thane West,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Thane West,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Andheri West,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [48]:
mumbai_onehot.shape

(263, 80)

### Let's group venue categories by location

In [49]:
mumbai_grouped = mumbai_onehot.groupby('Location').mean().reset_index()
mumbai_grouped

Unnamed: 0,Location,Arcade,Asian Restaurant,BBQ Joint,Bagel Shop,Bakery,Bar,Beach,Bed & Breakfast,Beer Garden,Bistro,Bookstore,Brewery,Burger Joint,Bus Station,Café,Chinese Restaurant,Clothing Store,Coffee Shop,College Auditorium,Concert Hall,Creperie,Cupcake Shop,Deli / Bodega,Department Store,Dessert Shop,Dhaba,Diner,Donut Shop,Event Space,Fast Food Restaurant,Film Studio,Food,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Garden,Gastropub,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Hotel,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Italian Restaurant,Lounge,Mediterranean Restaurant,Men's Store,Miscellaneous Shop,Molecular Gastronomy Restaurant,Nightclub,North Indian Restaurant,Park,Performing Arts Venue,Pharmacy,Pizza Place,Plaza,Pool,Pub,Restaurant,Salad Place,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shopping Mall,Smoke Shop,Snack Place,South Indian Restaurant,Spa,Spanish Restaurant,Sporting Goods Shop,Tea Room,Tex-Mex Restaurant,Thai Restaurant,Vegetarian / Vegan Restaurant,Whisky Bar,Women's Store
0,Andheri West,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.117647,0.0,0.0,0.117647,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.352941,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.117647,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0
1,Bandra West,0.022989,0.034483,0.011494,0.011494,0.045977,0.045977,0.0,0.0,0.011494,0.0,0.022989,0.0,0.022989,0.0,0.08046,0.068966,0.022989,0.022989,0.011494,0.0,0.0,0.0,0.011494,0.0,0.022989,0.0,0.0,0.011494,0.022989,0.011494,0.0,0.0,0.0,0.011494,0.022989,0.0,0.0,0.0,0.022989,0.0,0.0,0.034483,0.011494,0.011494,0.137931,0.011494,0.011494,0.0,0.011494,0.011494,0.0,0.0,0.0,0.0,0.011494,0.0,0.0,0.011494,0.0,0.0,0.034483,0.0,0.011494,0.0,0.0,0.0,0.011494,0.011494,0.034483,0.0,0.0,0.011494,0.0,0.0,0.0,0.022989,0.011494,0.0,0.011494
2,Borivali West,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.176471,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.117647,0.0,0.0,0.058824,0.0,0.0,0.0,0.058824,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.117647,0.0,0.0
3,Chembur,0.0,0.04,0.0,0.0,0.04,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.04,0.04,0.0,0.0,0.28,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.04,0.0,0.0,0.04,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.08,0.04,0.0
4,Kalyan,0.0,0.0,0.0,0.0,0.0625,0.0,0.125,0.0,0.0,0.0625,0.0,0.0,0.0,0.0625,0.1875,0.0625,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Kandivali West,0.0,0.0,0.0,0.0,0.181818,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.181818,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Malad West,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Mira Road,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.05,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.05,0.05,0.1,0.0,0.1,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.05,0.0,0.1,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0
8,Powai,0.016129,0.016129,0.0,0.0,0.048387,0.048387,0.0,0.016129,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.048387,0.016129,0.032258,0.0,0.0,0.016129,0.016129,0.0,0.032258,0.032258,0.016129,0.0,0.032258,0.0,0.048387,0.0,0.0,0.016129,0.0,0.0,0.0,0.0,0.016129,0.0,0.016129,0.0,0.016129,0.016129,0.048387,0.129032,0.0,0.032258,0.016129,0.0,0.0,0.0,0.0,0.0,0.016129,0.0,0.0,0.0,0.032258,0.016129,0.016129,0.0,0.048387,0.0,0.016129,0.016129,0.0,0.016129,0.0,0.016129,0.0,0.0,0.0,0.0,0.016129,0.016129,0.0,0.0,0.0,0.0
9,Thane West,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
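Taking the mean of one-hot columns per location gives the share of venues in each category. For Thane West, with four venues in four different categories, every share is 0.25, as in the last row above. The sketch below reproduces this on a small frame and checks it against `pd.crosstab`; the "Toy Area" rows are hypothetical.

```python
import pandas as pd

venues = pd.DataFrame({
    'Location': ['Thane West'] * 4 + ['Toy Area'] * 2,
    'Venue Category': ['Chinese Restaurant', 'Indian Restaurant',
                       'Concert Hall', 'Diner',
                       'Café', 'Café'],  # Toy Area rows are made up
})

# One-hot encode the category, then average per location
onehot = pd.get_dummies(venues['Venue Category'], dtype=float)
onehot.insert(0, 'Location', venues['Location'])
grouped = onehot.groupby('Location').mean()

# The same shares computed directly, without the one-hot step
direct = pd.crosstab(venues['Location'], venues['Venue Category'],
                     normalize='index')

print(grouped.loc['Thane West'])
```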


# 4. Results

### Choosing restaurant based on popularity

In [50]:
num_top_venues = 5

for hood in mumbai_grouped['Location']:
    print("----"+hood+"----")
    temp = mumbai_grouped[mumbai_grouped['Location'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Andheri West----
               venue  freq
0  Indian Restaurant  0.35
1                Pub  0.12
2        Coffee Shop  0.12
3               Café  0.12
4          Nightclub  0.06


----Bandra West----
                venue  freq
0   Indian Restaurant  0.14
1                Café  0.08
2  Chinese Restaurant  0.07
3              Bakery  0.05
4                 Bar  0.05


----Borivali West----
                           venue  freq
0             Chinese Restaurant  0.18
1              Indian Restaurant  0.12
2  Vegetarian / Vegan Restaurant  0.12
3                 Ice Cream Shop  0.06
4        South Indian Restaurant  0.06


----Chembur----
                           venue  freq
0              Indian Restaurant  0.28
1  Vegetarian / Vegan Restaurant  0.08
2                            Bar  0.08
3                           Café  0.08
4                    Snack Place  0.04


----Kalyan----
                venue  freq
0                Café  0.19
1               Beach  0.12
2         Bus St
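The ranking above can also be computed straight from (location, category) pairs with `collections.Counter`, without transposing the grouped frame. This is a sketch under assumptions: `top_categories` is a hypothetical helper, the Thane West pairs come from the venue table earlier in the notebook, and the "Toy Area" pairs are invented.

```python
from collections import Counter, defaultdict


def top_categories(pairs, n=5):
    """Return {location: [(category, share), ...]} for the n most frequent categories."""
    counts = defaultdict(Counter)
    for location, category in pairs:
        counts[location][category] += 1
    result = {}
    for location, counter in counts.items():
        total = sum(counter.values())
        result[location] = [(cat, round(k / total, 2))
                            for cat, k in counter.most_common(n)]
    return result


pairs = [
    ('Thane West', 'Chinese Restaurant'),
    ('Thane West', 'Indian Restaurant'),
    ('Thane West', 'Concert Hall'),
    ('Thane West', 'Diner'),
    ('Toy Area', 'Café'),    # hypothetical
    ('Toy Area', 'Café'),    # hypothetical
    ('Toy Area', 'Bakery'),  # hypothetical
]

top = top_categories(pairs, n=5)
for location, ranked in top.items():
    print('----' + location + '----')
    for category, share in ranked:
        print(category, share)
```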

### Choosing restaurant based on home delivery

In [54]:
res_data.groupby('Home_Delivery').count()

Unnamed: 0_level_0,Restaurant_Name,Restaurant_Location,Cuisines,Call,Cost_for_two(Rs.),Rating,Votes,Score
Home_Delivery,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1
False,4021,4021,4020,4021,4021,4021,4021,4021
True,2302,2302,2302,2302,2302,2302,2302,2302


### Choosing restaurant based on Cost for two

In [80]:
res_data.groupby('Cost_for_two(Rs.)').count().reset_index()

Unnamed: 0,Cost_for_two(Rs.),Restaurant_Name,Restaurant_Location,Cuisines,Call,Home_Delivery,Rating,Votes,Score
0,50,1,1,1,1,1,1,1,1
1,100,72,72,72,72,72,72,72,72
2,120,3,3,3,3,3,3,3,3
3,150,157,157,157,157,157,157,157,157
4,180,1,1,1,1,1,1,1,1
5,200,350,350,350,350,350,350,350,350
6,230,1,1,1,1,1,1,1,1
7,249,1,1,1,1,1,1,1,1
8,250,250,250,250,250,250,250,250,250
9,280,1,1,1,1,1,1,1,1


### Choosing restaurant based on ratings

In [82]:
res_data.groupby('Rating').count().reset_index()

Unnamed: 0,Rating,Restaurant_Name,Restaurant_Location,Cuisines,Call,Home_Delivery,Cost_for_two(Rs.),Votes,Score
0,2.0,3,3,3,3,3,3,3,3
1,2.1,3,3,3,3,3,3,3,3
2,2.2,2,2,2,2,2,2,2,2
3,2.3,3,3,3,3,3,3,3,3
4,2.4,6,6,6,6,6,6,6,6
5,2.5,13,13,13,13,13,13,13,13
6,2.6,30,30,30,30,30,30,30,30
7,2.7,56,56,56,56,56,56,56,56
8,2.8,96,96,96,96,96,96,96,96
9,2.9,122,122,122,122,122,122,122,122


In [84]:
res_data.groupby('Restaurant_Location').count().reset_index()

Unnamed: 0,Restaurant_Location,Restaurant_Name,Cuisines,Call,Home_Delivery,Cost_for_two(Rs.),Rating,Votes,Score
0,4 Bungalows,32,32,32,32,32,32,32,32
1,Airoli,68,68,68,68,68,68,68,68
2,Alibaug,7,7,7,7,7,7,7,7
3,Ambernath,12,12,12,12,12,12,12,12
4,Andheri,3,3,3,3,3,3,3,3
5,Andheri East,2,2,2,2,2,2,2,2
6,Andheri West,295,295,295,295,295,295,295,295
7,Azad Nagar,22,22,22,22,22,22,22,22
8,Bandra,6,6,6,6,6,6,6,6
9,Bandra East,37,37,37,37,37,37,37,37
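The counts above describe the data; to actually pick a restaurant, the same columns can be combined into a filter. The sketch below is one possible shape for such a tool, not the notebook's own method: `recommend` is a hypothetical function that filters on location, cost, rating and home delivery and ranks the matches by `Score`. The sample rows are the 4 Bungalows restaurants shown in `res_data.head()`.

```python
def recommend(restaurants, location=None, max_cost=None, min_rating=None,
              home_delivery=None, top_n=5):
    """Filter by the given criteria and rank the matches by Score, best first."""
    matches = []
    for r in restaurants:
        if location is not None and r['Restaurant_Location'] != location:
            continue
        if max_cost is not None and r['Cost_for_two(Rs.)'] > max_cost:
            continue
        if min_rating is not None and r['Rating'] < min_rating:
            continue
        if home_delivery is not None and r['Home_Delivery'] != home_delivery:
            continue
        matches.append(r)
    return sorted(matches, key=lambda r: r['Score'], reverse=True)[:top_n]


columns = ['Restaurant_Name', 'Restaurant_Location', 'Home_Delivery',
           'Cost_for_two(Rs.)', 'Rating', 'Score']
raw = [
    ("Pishu's", '4 Bungalows', True, 650, 4.4, 40.524135),
    ('Goila Butter Chicken', '4 Bungalows', True, 700, 4.1, 33.491653),
    ('KA.FE', '4 Bungalows', True, 600, 4.2, 34.695719),
    ('Monkey King', '4 Bungalows', True, 500, 4.7, 37.990699),
    ('The Serial Griller', '4 Bungalows', True, 600, 4.4, 40.802109),
]
rows = [dict(zip(columns, values)) for values in raw]

picks = recommend(rows, location='4 Bungalows', max_cost=600,
                  min_rating=4.2, home_delivery=True)
names = [r['Restaurant_Name'] for r in picks]
print(names)
```

On the full data, the same filter could be written with boolean masks on `res_data` followed by `sort_values('Score', ascending=False)`.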


# 5. Other Observations & Discussions 
<ul>
    <li> Most of the restaurants are located along the coast </li>
    <li> Most of the restaurants are quite affordable, with the average cost for two being under Rs. 1000</li>
    <li> Most of the low-priced restaurants have higher ratings and probably attract more customers</li>
    <li> There are a few restaurants which are well listed yet do not offer home delivery</li>
    <li> Chinese food and bakery items seem to be the most popular cuisines </li>
</ul>

#### A Report by <a href=https://abhishekver.github.io> Abhishek Verma </a>