# Coursera Capstone for Data Science Final Project

Welcome to my final Capstone Project. Before I start, here is a brief summary of what I intend to do. I have lived in Mumbai for nearly 16 years, and I love this city not only for the advantages it has over many other cities and metros across India, but also for the opportunities it offers and how much there is to explore. Keeping the project basic, I decided to study where a new shopping mall could be built in Mumbai, using the locations of the existing shopping malls as the deciding factor. In real life this is a much more complicated problem, but I believe that by keeping the project simple I can still demonstrate my data science skills.

First, let's import the packages. I am importing all the packages I have learnt so far in this professional certificate. I may not use them all, but importing them up front means I won't run into missing-package or missing-feature errors later in the project.

In [1]:
!pip install geopy
!pip install geocoder
!pip install folium

import json

import numpy as np
import pandas as pd
from pandas import json_normalize  # public import path since pandas 1.0

pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

import requests
from bs4 import BeautifulSoup

import geopy
import geocoder
from geopy.geocoders import Nominatim

import folium
import matplotlib.cm as cm
import matplotlib.colors as colors
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

%matplotlib notebook

print('All packages imported.')

Collecting geocoder
  Downloading geocoder-1.38.1-py2.py3-none-any.whl (98 kB)
Collecting ratelim
  Downloading ratelim-0.1.6-py2.py3-none-any.whl (4.0 kB)
Installing collected packages: ratelim, geocoder
Successfully installed geocoder-1.38.1 ratelim-0.1.6
Collecting folium
  Downloading folium-0.12.1-py2.py3-none-any.whl (94 kB)
Collecting branca>=0.3.0
  Downloading branca-0.4.2-py3-none-any.whl (24 kB)
Installing collected packages: branca, folium
Successfully installed branca-0.4.2 folium-0.12.1
All packages imported.


## Data Retrieval

The good news about choosing Mumbai is that we don't have to do much data scraping. The names of the neighbourhoods, along with their latitudes and longitudes, are available on this Wikipedia page (https://en.wikipedia.org/wiki/List_of_neighbourhoods_in_Mumbai). Let's put this information in a dataframe.

In [2]:
df = pd.read_html('https://en.wikipedia.org/wiki/List_of_neighbourhoods_in_Mumbai')[-1]
df.rename(columns={'Area': 'Neighborhood'}, inplace=True)
df.head(10)

Unnamed: 0,Neighborhood,Location,Latitude,Longitude
0,Amboli,"Andheri,Western Suburbs",19.1293,72.8434
1,"Chakala, Andheri",Western Suburbs,19.111388,72.860833
2,D.N. Nagar,"Andheri,Western Suburbs",19.124085,72.831373
3,Four Bungalows,"Andheri,Western Suburbs",19.124714,72.82721
4,Lokhandwala,"Andheri,Western Suburbs",19.130815,72.82927
5,Marol,"Andheri,Western Suburbs",19.119219,72.882743
6,Sahar,"Andheri,Western Suburbs",19.098889,72.867222
7,Seven Bungalows,"Andheri,Western Suburbs",19.129052,72.817018
8,Versova,"Andheri,Western Suburbs",19.12,72.82
9,Mira Road,"Mira-Bhayandar,Western Suburbs",19.284167,72.871111


Notice the use of '[-1]' when reading the data from the website. pd.read_html returns a list of all the tables it finds on the page. In many languages, such as C or Java, you need to know the length of a list or array in order to index it from the end. In Python you don't: a negative index counts backwards from the end, so '[-1]' is the last element. Since this data was available at the end of the Wikipedia page, I conveniently used this feature.
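
To make the negative-index idea concrete, here is a minimal sketch on a toy list. The list `tables` is only a stand-in for what pd.read_html returns; its contents are made up for illustration.

```python
# Toy stand-in for the list of tables that pd.read_html returns
tables = ["first table", "second table", "last table"]

# Index from the end without knowing the length
last = tables[-1]

# The equivalent explicit form, which needs the length
last_explicit = tables[len(tables) - 1]

print(last)
print(last == last_explicit)
```

Both forms pick the same element; the negative index is simply shorter to write.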

I have changed the title of one of the columns from 'Area' to 'Neighborhood' because we are looking for neighborhoods and all these areas are neighborhoods in India (or at least that's how we were taught in school).

## Data Wrangling

Let's see the counts of the different locations mentioned in the 'Location' column.

In [3]:
df['Location'].value_counts()

South Mumbai                       30
Andheri,Western Suburbs             8
Western Suburbs                     6
Eastern Suburbs                     4
Powai,Eastern Suburbs               3
Kandivali West,Western Suburbs      3
Mira-Bhayandar,Western Suburbs      3
Bandra,Western Suburbs              3
Ghatkopar,Eastern Suburbs           3
Khar,Western Suburbs                2
Vasai,Western Suburbs               2
Goregaon,Western Suburbs            2
Borivali (West),Western Suburbs     2
Harbour Suburbs                     2
Mumbai                              2
Malad,Western Suburbs               2
Kalbadevi,South Mumbai              2
Govandi,Harbour Suburbs             1
Mulund,Eastern Suburbs              1
Kamathipura,South Mumbai            1
Sanctacruz,Western Suburbs          1
Vile Parle,Western Suburbs          1
Antop Hill,South Mumbai             1
Kandivali East,Western Suburbs      1
Tardeo,South Mumbai                 1
Trombay,Harbour Suburbs             1
Colaba,South

We observe that many locations occur only once or twice. This is because the prominent, well-known regions such as "South Mumbai" and "Western Suburbs" are subdivided into smaller areas, which are written in front of the region name. Let's clean this column by keeping only the region, which gives us larger groups to work with.

In [4]:
df['Location'] = df['Location'].apply(lambda x: x.split(',')[-1])
df.head(10)

Unnamed: 0,Neighborhood,Location,Latitude,Longitude
0,Amboli,Western Suburbs,19.1293,72.8434
1,"Chakala, Andheri",Western Suburbs,19.111388,72.860833
2,D.N. Nagar,Western Suburbs,19.124085,72.831373
3,Four Bungalows,Western Suburbs,19.124714,72.82721
4,Lokhandwala,Western Suburbs,19.130815,72.82927
5,Marol,Western Suburbs,19.119219,72.882743
6,Sahar,Western Suburbs,19.098889,72.867222
7,Seven Bungalows,Western Suburbs,19.129052,72.817018
8,Versova,Western Suburbs,19.12,72.82
9,Mira Road,Western Suburbs,19.284167,72.871111


Now this looks a lot neater and gives us larger groups. Let's take a look at the location counts once again.

In [5]:
df['Location'].value_counts()

South Mumbai       39
Western Suburbs    36
Eastern Suburbs    12
Harbour Suburbs     4
Mumbai              2
Name: Location, dtype: int64
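
To see why the counts collapse into a handful of regions, here is a small sketch of the same split on a few values copied from the 'Location' column. The `.strip()` call is an extra safeguard I am assuming might be useful if a value had a space after the comma; it is not part of the cell above.

```python
import pandas as pd

# A few values copied from the 'Location' column shown earlier
locations = pd.Series([
    "Andheri,Western Suburbs",
    "Western Suburbs",
    "Powai,Eastern Suburbs",
    "South Mumbai",
])

# Keep only the text after the last comma; .strip() is an added
# safeguard (an assumption) against stray spaces around the name
regions = locations.apply(lambda x: x.split(",")[-1].strip())

print(regions.tolist())
print(regions.value_counts().to_dict())
```

Values without a comma are left unchanged, because splitting them yields a single-element list whose last element is the whole string.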

The following is the dataframe we will use for the project.

In [6]:
df

Unnamed: 0,Neighborhood,Location,Latitude,Longitude
0,Amboli,Western Suburbs,19.1293,72.8434
1,"Chakala, Andheri",Western Suburbs,19.111388,72.860833
2,D.N. Nagar,Western Suburbs,19.124085,72.831373
3,Four Bungalows,Western Suburbs,19.124714,72.82721
4,Lokhandwala,Western Suburbs,19.130815,72.82927
5,Marol,Western Suburbs,19.119219,72.882743
6,Sahar,Western Suburbs,19.098889,72.867222
7,Seven Bungalows,Western Suburbs,19.129052,72.817018
8,Versova,Western Suburbs,19.12,72.82
9,Mira Road,Western Suburbs,19.284167,72.871111


As I mentioned in the data section of my report, I will check the co-ordinates in this table against the co-ordinates from Geocoder, so let's do it!

In [7]:
df['Geo_Latitude'] = None
df['Geo_Longitude'] = None

MAX_RETRIES = 5  # avoid looping forever if the geocoder keeps failing

for i, neighbor in enumerate(df['Neighborhood']):
    lat_lng_coords = None

    # retry a bounded number of times until ArcGIS returns co-ordinates
    for _ in range(MAX_RETRIES):
        g = geocoder.arcgis('{}, Mumbai, India'.format(neighbor))
        lat_lng_coords = g.latlng
        if lat_lng_coords is not None:
            break

    # the row keeps None if no co-ordinates were found
    if lat_lng_coords:
        df.loc[i, 'Geo_Latitude'] = lat_lng_coords[0]
        df.loc[i, 'Geo_Longitude'] = lat_lng_coords[1]

df

Unnamed: 0,Neighborhood,Location,Latitude,Longitude,Geo_Latitude,Geo_Longitude
0,Amboli,Western Suburbs,19.1293,72.8434,19.1291,72.8464
1,"Chakala, Andheri",Western Suburbs,19.111388,72.860833,19.1084,72.8623
2,D.N. Nagar,Western Suburbs,19.124085,72.831373,19.1251,72.8325
3,Four Bungalows,Western Suburbs,19.124714,72.82721,19.1264,72.8242
4,Lokhandwala,Western Suburbs,19.130815,72.82927,19.1432,72.8249
5,Marol,Western Suburbs,19.119219,72.882743,19.1191,72.8828
6,Sahar,Western Suburbs,19.098889,72.867222,19.1027,72.8626
7,Seven Bungalows,Western Suburbs,19.129052,72.817018,19.1286,72.8212
8,Versova,Western Suburbs,19.12,72.82,19.1377,72.8135
9,Mira Road,Western Suburbs,19.284167,72.871111,19.2657,72.8707
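
One way to put a number on the difference between the two sets of co-ordinates is to measure the distance between them. The following is a minimal sketch that is not part of the original notebook: it uses the haversine formula, and the helper name `haversine_km` and the Earth radius of 6371 km are my own assumptions. The input values are the Versova row from the table above.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points (hypothetical helper)."""
    r = 6371.0  # mean Earth radius in km (assumed value)
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * r * asin(sqrt(a))

# Versova: Wikipedia co-ordinates vs. Geocoder co-ordinates
gap = haversine_km(19.12, 72.82, 19.1377, 72.8135)
print(round(gap, 2), "km")
```

Applying the same function row by row would show which neighborhoods have the largest disagreement between the two sources.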


As mentioned in the report, I will replace the Wikipedia co-ordinates with the ones derived from Geocoder. To keep this simple, we can drop the Wikipedia 'Latitude' and 'Longitude' columns and rename 'Geo_Latitude' and 'Geo_Longitude' to 'Latitude' and 'Longitude' respectively.

In [8]:
# keep an independent copy of the original table in case it is required later on
df_1 = df.copy()

df = df.drop(columns=['Latitude', 'Longitude'])

Let's take a look at our dataframe now

In [9]:
df

Unnamed: 0,Neighborhood,Location,Geo_Latitude,Geo_Longitude
0,Amboli,Western Suburbs,19.1291,72.8464
1,"Chakala, Andheri",Western Suburbs,19.1084,72.8623
2,D.N. Nagar,Western Suburbs,19.1251,72.8325
3,Four Bungalows,Western Suburbs,19.1264,72.8242
4,Lokhandwala,Western Suburbs,19.1432,72.8249
5,Marol,Western Suburbs,19.1191,72.8828
6,Sahar,Western Suburbs,19.1027,72.8626
7,Seven Bungalows,Western Suburbs,19.1286,72.8212
8,Versova,Western Suburbs,19.1377,72.8135
9,Mira Road,Western Suburbs,19.2657,72.8707


So we now have the co-ordinates from Geocoder. I would like to point out that they may be slightly less precise, because they have fewer decimal places than the original co-ordinates. However, there were certain errors in the original co-ordinates, and as I trust Geocoder more, I will use these co-ordinates.

In [10]:
df.rename(columns = {'Geo_Latitude' : 'Latitude', 'Geo_Longitude': 'Longitude'}, inplace = True)

Let's take a look at the Dataframe now

In [11]:
df

Unnamed: 0,Neighborhood,Location,Latitude,Longitude
0,Amboli,Western Suburbs,19.1291,72.8464
1,"Chakala, Andheri",Western Suburbs,19.1084,72.8623
2,D.N. Nagar,Western Suburbs,19.1251,72.8325
3,Four Bungalows,Western Suburbs,19.1264,72.8242
4,Lokhandwala,Western Suburbs,19.1432,72.8249
5,Marol,Western Suburbs,19.1191,72.8828
6,Sahar,Western Suburbs,19.1027,72.8626
7,Seven Bungalows,Western Suburbs,19.1286,72.8212
8,Versova,Western Suburbs,19.1377,72.8135
9,Mira Road,Western Suburbs,19.2657,72.8707


Now let us store it as a CSV file

In [12]:
df_csv = df

df_csv.to_csv("df_csv.csv", index=False)

## Creating a map of Mumbai

Let's create a map of Mumbai using the co-ordinates we have collected.

In [13]:
address = 'Mumbai, India'

geolocator = Nominatim(user_agent="Coursera-Capstone-Data_Science-Project")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Mumbai, India are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Mumbai, India are 19.0759899, 72.8773928.


In [14]:
map_mum = folium.Map(location=[latitude, longitude], zoom_start=11)

for lat, lng, neighborhood in zip(df['Latitude'], df['Longitude'], df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_mum)  
    
map_mum

Yay!! We have created the map and, to be honest, it looks great. If I had to give a tourist a map of Mumbai showing the locations of its neighborhoods, this would certainly be one of the few I would choose!

Now let's save this map as an HTML file.

In [15]:
map_mum1 = map_mum

map_mum.save('map_mum.html')

## Using Foursquare API to explore the neighborhoods

So far we have covered the basics: getting the data, arranging it in a table, using code to get the co-ordinates and finally plotting them on a map using folium. Now it's time to use our API, Foursquare. The next cell contains sensitive information, which is why you won't be able to see it, but you will be able to see its results in the outputs of some other code cells.

In [16]:
# The code was removed by Watson Studio for sharing.
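
Since the hidden cell is not shown, the following is only a guess at its shape. The later request uses the names CLIENT_ID, CLIENT_SECRET and VERSION, and one common way to define such values without exposing them is to read them from environment variables. The environment variable names and the version date below are hypothetical placeholders, not the author's values.

```python
import os

# Hypothetical environment variable names; set them outside the notebook
CLIENT_ID = os.environ.get("FOURSQUARE_CLIENT_ID", "")
CLIENT_SECRET = os.environ.get("FOURSQUARE_CLIENT_SECRET", "")

# Placeholder version date in YYYYMMDD form (an assumption)
VERSION = "20180605"

if not CLIENT_ID or not CLIENT_SECRET:
    print("Foursquare credentials are not set.")
```

Keeping the secrets out of the notebook this way means the file can be shared without removing any cells.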

Now let's get the top 100 venues for the neighborhoods within a radius of 2 km (2000 m).

In [17]:
radius = 2000
LIMIT = 100

venues = []

for lat, long, neighborhood in zip(df['Latitude'], df['Longitude'], df['Neighborhood']):

    # build the Foursquare "explore" request for this neighborhood
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius,
        LIMIT)

    # send the GET request and keep only the list of recommended items
    results = requests.get(url).json()["response"]['groups'][0]['items']

    # store one tuple per venue: neighborhood details plus venue name, location and category
    for venue in results:
        venues.append((
            neighborhood,
            lat,
            long,
            venue['venue']['name'],
            venue['venue']['location']['lat'],
            venue['venue']['location']['lng'],
            venue['venue']['categories'][0]['name']))
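
The flattening step in the loop above can be tried without calling the API. The sketch below is an illustration under assumptions: `extract_venues` is a hypothetical helper, and `sample` is a hand-built dictionary that mirrors only the keys the loop reads, not real API output. The guard for a venue without categories is an added safeguard that the original loop does not have.

```python
def extract_venues(payload, neighborhood, lat, lng):
    """Flatten an explore-style payload into tuples (hypothetical helper)."""
    items = payload["response"]["groups"][0]["items"]
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Added safeguard: use None when a venue has no category
        category = categories[0]["name"] if categories else None
        rows.append((
            neighborhood, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows

# Hand-built sample that mirrors only the keys used above
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Cafe Arfa",
               "location": {"lat": 19.12893, "lng": 72.84714},
               "categories": [{"name": "Indian Restaurant"}]}},
    {"venue": {"name": "Example Venue",
               "location": {"lat": 19.13, "lng": 72.85},
               "categories": []}},
]}]}}

rows = extract_venues(sample, "Amboli", 19.12906, 72.84644)
print(rows)
```

Separating the parsing from the request makes it possible to check the tuple layout before spending any API calls.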

Now let us convert the list of venues into a dataframe and define the names of its columns. After that, let's print the shape of this dataframe and its first 5 rows.

In [18]:
venues_df = pd.DataFrame(venues)

venues_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'Venue Name', 'Venue Latitude', 'Venue Longitude', 'Venue Category']

print(venues_df.shape)
venues_df.head()

(7480, 7)


Unnamed: 0,Neighborhood,Latitude,Longitude,Venue Name,Venue Latitude,Venue Longitude,Venue Category
0,Amboli,19.12906,72.84644,Cafe Arfa,19.12893,72.84714,Indian Restaurant
1,Amboli,19.12906,72.84644,Merwans Cake shop,19.1193,72.845418,Bakery
2,Amboli,19.12906,72.84644,Shawarma Factory,19.124591,72.840398,Falafel Restaurant
3,Amboli,19.12906,72.84644,Jaffer Bhai's Delhi Darbar,19.137714,72.845909,Mughlai Restaurant
4,Amboli,19.12906,72.84644,Narayan Sandwich,19.121398,72.85027,Sandwich Place


Now let us see how many venues actually turned up for each neighborhood and take a look at the table.

In [19]:
venues_df.groupby(["Neighborhood"]).count()

Neighborhood,Latitude,Longitude,Venue Name,Venue Latitude,Venue Longitude,Venue Category
Aarey Milk Colony,41,41,41,41,41,41
Agripada,75,75,75,75,75,75
Altamount Road,90,90,90,90,90,90
Amboli,71,71,71,71,71,71
Amrut Nagar,91,91,91,91,91,91
Asalfa,93,93,93,93,93,93
Ballard Estate,100,100,100,100,100,100
Bandstand Promenade,100,100,100,100,100,100
Bangur Nagar,100,100,100,100,100,100
Bhandup,27,27,27,27,27,27


Now let us determine the number of unique categories.

In [20]:
print('There are {} unique categories.'.format(len(venues_df['Venue Category'].unique())))

There are 219 unique categories.


Wow that's a lot of unique categories! Let's check out the list of these unique categories.

In [21]:
venues_df['Venue Category'].unique()

array(['Indian Restaurant', 'Bakery', 'Falafel Restaurant',
       'Mughlai Restaurant', 'Sandwich Place', 'American Restaurant',
       'Pizza Place', 'Brewery', 'Chinese Restaurant', 'Pub',
       'Ice Cream Shop', 'Multiplex', 'BBQ Joint', 'Lounge', 'Bar',
       'Diner', "Women's Store", 'Jewelry Store', 'Fast Food Restaurant',
       'Gym / Fitness Center', 'Coffee Shop', 'College Gym',
       'Residential Building (Apartment / Condo)', 'Department Store',
       'Sports Bar', 'Hotel', 'Camera Store', 'Shopping Mall', 'Pharmacy',
       'Boutique', 'Electronics Store', 'Accessories Store', 'Restaurant',
       'Asian Restaurant', 'Airport Service', 'Juice Bar',
       'Italian Restaurant', 'Café', 'Seafood Restaurant', 'Snack Place',
       'Spa', 'Maharashtrian Restaurant', 'Bagel Shop', 'Nightclub',
       'Food Truck', 'Airport Lounge', 'Cocktail Bar', 'Beer Garden',
       'Airport', 'Resort', 'Vegetarian / Vegan Restaurant', 'Donut Shop',
       'Martial Arts School', 'Tea Ro

Let's check if "Shopping Mall" is present in the 'Venue Category' column.

In [22]:
"Shopping Mall" in venues_df['Venue Category'].unique()

True

## Analysing All Neighborhoods

To analyse the neighborhoods, we will first use one-hot encoding. Most machine learning models work on numbers, so it is difficult to apply them directly to categorical data such as venue categories. One-hot encoding converts this categorical data into numerical data by turning each value into a binary vector. Conceptually, each category is first mapped to an integer index, and each index is then represented as a vector that is all 0s except for a 1 at that index. So let's try it!
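
Before running it on the full data, here is a toy sketch of what one-hot encoding produces, using three category values that appear in the venue table above. The `dtype=int` argument is my own choice so that the result prints as 0s and 1s; the dataframe name `toy` is made up.

```python
import pandas as pd

# Three toy rows with categories taken from the venue table
toy = pd.DataFrame({"Venue Category": ["Bakery", "Indian Restaurant", "Bakery"]})

# One column per category; dtype=int gives 0/1 instead of booleans
toy_onehot = pd.get_dummies(
    toy[["Venue Category"]], prefix="", prefix_sep="", dtype=int
)

print(toy_onehot)
```

Each row has exactly one 1, in the column that matches its category.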

In [23]:
mum_onehot = pd.get_dummies(venues_df[['Venue Category']], prefix="", prefix_sep="")

mum_onehot['Neighborhoods'] = venues_df['Neighborhood'] 

fixed_columns = [mum_onehot.columns[-1]] + list(mum_onehot.columns[:-1])
mum_onehot = mum_onehot[fixed_columns]

print(mum_onehot.shape)
mum_onehot.head()

(7480, 220)


Unnamed: 0,Neighborhoods,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Lounge,Airport Service,American Restaurant,Antique Shop,Arcade,Art Gallery,Asian Restaurant,Athletics & Sports,Australian Restaurant,Auto Workshop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Beach,Bed & Breakfast,Beer Bar,Beer Garden,Bengali Restaurant,Big Box Store,Bistro,Bookstore,Boutique,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Brewery,Bridal Shop,Building,Burger Joint,Burrito Place,Bus Station,Cafeteria,Café,Camera Store,Chaat Place,Cheese Shop,Chinese Restaurant,Clothing Store,Club House,Cocktail Bar,Coffee Shop,College Academic Building,College Auditorium,College Gym,Comedy Club,Comfort Food Restaurant,Concert Hall,Convenience Store,Cosmetics Shop,Coworking Space,Creperie,Cricket Ground,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Dhaba,Dim Sum Restaurant,Diner,Donut Shop,Dumpling Restaurant,Duty-free Shop,Electronics Store,Event Space,Factory,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,Football Stadium,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gastropub,General College & University,General Entertainment,German Restaurant,Gift Shop,Gluten-free Restaurant,Goan Restaurant,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Harbor / Marina,Hawaiian Restaurant,Historic Site,History Museum,Hockey Arena,Hookah Bar,Hot Dog Joint,Hotel,Hotel Bar,Ice Cream Shop,Indian Chinese Restaurant,Indian Restaurant,Indie Movie Theater,Indoor Play Area,Irani Cafe,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Lake,Light Rail Station,Lighthouse,Liquor Store,Lounge,Maharashtrian Restaurant,Market,Martial Arts School,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Motorcycle Shop,Movie Theater,Mughlai Restaurant,Multicuisine Indian Restaurant,Multiplex,Music Store,Music Venue,Neighborhood,New American Restaurant,Nightclub,Noodle House,North Indian Restaurant,Office,Opera House,Other Great Outdoors,Outdoors & Recreation,Paper / Office Supplies Store,Park,Parsi Restaurant,Performing Arts Venue,Pharmacy,Pizza Place,Planetarium,Platform,Playground,Plaza,Pool,Pub,Punjabi Restaurant,Recreation Center,Residential Building (Apartment / Condo),Resort,Restaurant,Road,Roof Deck,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shoe Store,Shopping Mall,Smoke Shop,Snack Place,Soccer Field,South American Restaurant,South Indian Restaurant,Spa,Spanish Restaurant,Sporting Goods Shop,Sports Bar,Sports Club,Stadium,Steakhouse,Supermarket,Sushi Restaurant,Tea Room,Tennis Court,Tex-Mex Restaurant,Thai Restaurant,Theater,Theme Park,Toy / Game Store,Track,Trail,Train Station,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Water Park,Whisky Bar,Wine Bar,Wine Shop,Women's Store,Yoga Studio,Zoo
0,Amboli,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Amboli,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Amboli,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Amboli,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Amboli,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


Now we will group the rows by 'Neighborhoods' and take the mean of each column, which gives the frequency of occurrence of each category in each neighborhood.
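
To see why the mean of a one-hot column is a frequency, here is a toy sketch with made-up neighborhoods "A" and "B". The dataframe names are hypothetical and the numbers are invented for illustration.

```python
import pandas as pd

# Toy one-hot rows: three venues in "A", one venue in "B"
toy_onehot = pd.DataFrame({
    "Neighborhoods": ["A", "A", "A", "B"],
    "Bakery": [1, 0, 0, 0],
    "Shopping Mall": [0, 1, 1, 1],
})

# The mean of a 0/1 column is the share of venues in that category
toy_grouped = toy_onehot.groupby("Neighborhoods").mean().reset_index()

print(toy_grouped)
```

The same reading applies to the real data: Amboli has 71 venues, so a value of 0.211268 in its row corresponds to 15 of those 71 venues.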

In [24]:
mum_grouped = mum_onehot.groupby(["Neighborhoods"]).mean().reset_index()

print(mum_grouped.shape)
mum_grouped

(93, 220)


Unnamed: 0,Neighborhoods,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Lounge,Airport Service,American Restaurant,Antique Shop,Arcade,Art Gallery,Asian Restaurant,Athletics & Sports,Australian Restaurant,Auto Workshop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Beach,Bed & Breakfast,Beer Bar,Beer Garden,Bengali Restaurant,Big Box Store,Bistro,Bookstore,Boutique,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Brewery,Bridal Shop,Building,Burger Joint,Burrito Place,Bus Station,Cafeteria,Café,Camera Store,Chaat Place,Cheese Shop,Chinese Restaurant,Clothing Store,Club House,Cocktail Bar,Coffee Shop,College Academic Building,College Auditorium,College Gym,Comedy Club,Comfort Food Restaurant,Concert Hall,Convenience Store,Cosmetics Shop,Coworking Space,Creperie,Cricket Ground,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Dhaba,Dim Sum Restaurant,Diner,Donut Shop,Dumpling Restaurant,Duty-free Shop,Electronics Store,Event Space,Factory,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,Football Stadium,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gastropub,General College & University,General Entertainment,German Restaurant,Gift Shop,Gluten-free Restaurant,Goan Restaurant,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Harbor / Marina,Hawaiian Restaurant,Historic Site,History Museum,Hockey Arena,Hookah Bar,Hot Dog Joint,Hotel,Hotel Bar,Ice Cream Shop,Indian Chinese Restaurant,Indian Restaurant,Indie Movie Theater,Indoor Play Area,Irani Cafe,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Lake,Light Rail Station,Lighthouse,Liquor Store,Lounge,Maharashtrian Restaurant,Market,Martial Arts School,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Motorcycle Shop,Movie Theater,Mughlai Restaurant,Multicuisine Indian Restaurant,Multiplex,Music Store,Music Venue,Neighborhood,New American Restaurant,Nightclub,Noodle House,North Indian Restaurant,Office,Opera House,Other Great Outdoors,Outdoors & Recreation,Paper / Office Supplies Store,Park,Parsi Restaurant,Performing Arts Venue,Pharmacy,Pizza Place,Planetarium,Platform,Playground,Plaza,Pool,Pub,Punjabi Restaurant,Recreation Center,Residential Building (Apartment / Condo),Resort,Restaurant,Road,Roof Deck,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shoe Store,Shopping Mall,Smoke Shop,Snack Place,Soccer Field,South American Restaurant,South Indian Restaurant,Spa,Spanish Restaurant,Sporting Goods Shop,Sports Bar,Sports Club,Stadium,Steakhouse,Supermarket,Sushi Restaurant,Tea Room,Tennis Court,Tex-Mex Restaurant,Thai Restaurant,Theater,Theme Park,Toy / Game Store,Track,Trail,Train Station,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Water Park,Whisky Bar,Wine Bar,Wine Shop,Women's Store,Yoga Studio,Zoo
0,Aarey Milk Colony,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.04878,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.02439,0.0,0.0,0.0,0.02439,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.04878,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04878,0.0,0.0,0.02439,0.02439,0.0,0.0,0.02439,0.0,0.146341,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04878,0.02439,0.0,0.0,0.04878,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04878,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.0,0.073171,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Agripada,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.026667,0.013333,0.0,0.0,0.0,0.013333,0.0,0.066667,0.0,0.04,0.0,0.0,0.0,0.0,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026667,0.0,0.0,0.0,0.026667,0.013333,0.013333,0.0,0.026667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.013333,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.0,0.0,0.0,0.013333,0.04,0.0,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.16,0.0,0.0,0.0,0.026667,0.0,0.0,0.013333,0.0,0.0,0.0,0.0,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.0,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.013333,0.0,0.0,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.026667,0.013333,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.013333
2,Altamount Road,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.022222,0.011111,0.0,0.0,0.0,0.011111,0.0,0.0,0.011111,0.0,0.0,0.0,0.011111,0.011111,0.0,0.0,0.0,0.0,0.011111,0.0,0.022222,0.0,0.0,0.0,0.044444,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.011111,0.0,0.0,0.0,0.0,0.0,0.0,0.011111,0.022222,0.0,0.022222,0.0,0.0,0.0,0.011111,0.0,0.0,0.011111,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011111,0.0,0.0,0.0,0.0,0.0,0.0,0.011111,0.0,0.0,0.0,0.0,0.011111,0.0,0.011111,0.0,0.0,0.011111,0.0,0.0,0.0,0.011111,0.0,0.033333,0.0,0.111111,0.0,0.0,0.0,0.022222,0.011111,0.0,0.044444,0.0,0.0,0.0,0.0,0.011111,0.0,0.0,0.0,0.0,0.011111,0.011111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011111,0.011111,0.0,0.0,0.033333,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.044444,0.0,0.0,0.0,0.022222,0.011111,0.011111,0.0,0.0,0.0,0.011111,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011111,0.011111,0.0,0.0,0.0,0.0,0.0,0.0,0.011111,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.011111,0.0,0.0
3,Amboli,0.014085,0.0,0.0,0.0,0.0,0.0,0.014085,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014085,0.0,0.014085,0.0,0.056338,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014085,0.0,0.0,0.0,0.042254,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028169,0.0,0.0,0.084507,0.0,0.0,0.0,0.028169,0.0,0.0,0.014085,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014085,0.0,0.0,0.0,0.0,0.014085,0.0,0.0,0.0,0.014085,0.0,0.0,0.014085,0.0,0.0,0.028169,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028169,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014085,0.0,0.070423,0.0,0.211268,0.0,0.0,0.0,0.0,0.0,0.014085,0.0,0.0,0.0,0.0,0.0,0.014085,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014085,0.0,0.028169,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014085,0.056338,0.0,0.0,0.0,0.0,0.0,0.056338,0.0,0.0,0.014085,0.0,0.0,0.0,0.0,0.0,0.0,0.014085,0.0,0.0,0.0,0.014085,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014085,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014085,0.0,0.0
4,Amrut Nagar,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.010989,0.0,0.010989,0.0,0.032967,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.032967,0.0,0.010989,0.0,0.0,0.0,0.0,0.054945,0.0,0.0,0.0,0.065934,0.0,0.0,0.0,0.043956,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.021978,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.054945,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032967,0.043956,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032967,0.0,0.098901,0.0,0.0,0.0,0.032967,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.021978,0.0,0.0,0.0,0.0,0.0,0.032967,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.010989,0.0,0.054945,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010989,0.032967,0.0,0.0,0.0,0.0,0.0,0.021978,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.043956,0.0,0.010989,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.010989,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Asalfa,0.0,0.0,0.0,0.010753,0.0,0.021505,0.021505,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010753,0.010753,0.0,0.0,0.032258,0.0,0.0,0.0,0.021505,0.0,0.0,0.0,0.0,0.0,0.010753,0.0,0.0,0.010753,0.0,0.0,0.0,0.0,0.0,0.0,0.053763,0.0,0.0,0.0,0.021505,0.010753,0.0,0.0,0.107527,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010753,0.0,0.0,0.0,0.010753,0.0,0.021505,0.0,0.0,0.0,0.021505,0.0,0.010753,0.010753,0.0,0.0,0.0,0.0,0.0,0.064516,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010753,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010753,0.0,0.010753,0.0,0.0,0.0,0.0,0.0,0.0,0.010753,0.0,0.032258,0.0,0.0,0.010753,0.11828,0.0,0.0,0.0,0.021505,0.0,0.0,0.010753,0.0,0.010753,0.0,0.010753,0.021505,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010753,0.0,0.0,0.0,0.0,0.010753,0.0,0.0,0.010753,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021505,0.0,0.0,0.0,0.0,0.0,0.021505,0.0,0.0,0.0,0.010753,0.053763,0.0,0.0,0.0,0.0,0.0,0.0,0.010753,0.010753,0.010753,0.0,0.010753,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010753,0.010753,0.0,0.0,0.0,0.021505,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Ballard Estate,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.04,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.09,0.0,0.01,0.01,0.03,0.01,0.0,0.02,0.03,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.03,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.03,0.0,0.04,0.0,0.07,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Bandstand Promenade,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.05,0.0,0.04,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.04,0.01,0.0,0.01,0.07,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.05,0.01,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.01,0.03,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0
8,Bangur Nagar,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.06,0.0,0.0,0.0,0.04,0.06,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.01,0.01,0.0,0.01,0.01,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.06,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.03,0.0,0.05,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.03,0.0,0.02,0.01,0.03,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0
9,Bhandup,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.074074,0.0,0.037037,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.074074,0.0,0.148148,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.074074,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.074074,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.074074,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.074074,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


Since Shopping Malls are the category of interest for this project, let's look at how often 'Shopping Mall' occurs in this table, i.e. the number of neighborhoods whose value in the 'Shopping Mall' column is greater than 0.

In [25]:
len(mum_grouped[mum_grouped["Shopping Mall"] > 0])

29
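
As a small illustration of the counting step, the sketch below applies the same boolean-mask idiom to a toy dataframe. The name `toy_grouped` and all values are made up for illustration; only the 'Shopping Mall' column name comes from this notebook.

```python
import pandas as pd

# Toy stand-in for mum_grouped (illustrative values, not the real data)
toy_grouped = pd.DataFrame({
    "Neighborhoods": ["A", "B", "C", "D"],
    "Shopping Mall": [0.02, 0.0, 0.0, 0.05],
})

# Boolean mask: True where the Shopping Mall frequency is positive
has_mall = toy_grouped["Shopping Mall"] > 0

# len() of the filtered frame and sum() of the mask give the same count
count_via_len = len(toy_grouped[has_mall])
count_via_sum = int(has_mall.sum())
print(count_via_len, count_via_sum)
```

Summing the mask avoids building the filtered dataframe, which may be slightly cheaper on large tables.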

So 29 neighborhoods have a Shopping Mall value greater than 0. Since this is the only category we are focusing on, we can ignore the other categories and make a new dataframe with just the Neighborhoods and Shopping Mall columns.

In [26]:
mum_mall = mum_grouped[["Neighborhoods","Shopping Mall"]]

Let's check the first 10 values.

In [27]:
mum_mall.head(10)

Unnamed: 0,Neighborhoods,Shopping Mall
0,Aarey Milk Colony,0.02439
1,Agripada,0.0
2,Altamount Road,0.0
3,Amboli,0.014085
4,Amrut Nagar,0.0
5,Asalfa,0.010753
6,Ballard Estate,0.0
7,Bandstand Promenade,0.0
8,Bangur Nagar,0.03
9,Bhandup,0.037037


## Clustering the Neighborhoods Using k-means Clustering

'k-means clustering' is an unsupervised machine learning technique for clustering data. First we select the number of centroids, i.e. the number of clusters we want to group the data into, which is the value of 'k'. Each data point is then assigned to the cluster of its closest centroid. This is the assignment step, analogous to the Expectation step of Expectation-Maximization. Next, the centroid of each cluster is recalculated as the mean of the points assigned to it. This is the update step, analogous to the Maximization step. The two steps repeat until no data point changes cluster, which is the same as saying that the centroid of each cluster no longer changes. Since the technique is unsupervised, the data carries no labels to learn from, and we inspect the resulting clusters only after the technique has been applied. Let's use it to cluster the neighborhoods for this project.
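
The two alternating steps described above can be sketched in plain Python for one-dimensional data. This is only an illustrative sketch, not the scikit-learn implementation used in this notebook: the function name `kmeans_1d`, the evenly spaced starting centroids, and the toy values are all assumptions made for the example.

```python
def kmeans_1d(values, k, max_iter=100):
    """Minimal 1-D k-means (Lloyd's algorithm); illustrative only."""
    # Assumption: start from k evenly spaced values of the sorted data
    ordered = sorted(values)
    last = len(ordered) - 1
    centroids = [ordered[i * last // max(k - 1, 1)] for i in range(k)]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        labels = [
            min(range(k), key=lambda c: abs(v - centroids[c]))
            for v in values
        ]
        # Update step: each centroid becomes the mean of its points
        new_centroids = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                new_centroids.append(sum(members) / len(members))
            else:
                new_centroids.append(centroids[c])  # keep empty cluster in place
        # Stop once no centroid has moved
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Toy mall frequencies (made-up values)
freqs = [0.0, 0.0, 0.001, 0.02, 0.021, 0.022, 0.05, 0.051]
labels, centroids = kmeans_1d(freqs, k=3)
print(labels)
print(centroids)
```

scikit-learn's `KMeans` follows the same assignment/update loop but picks its starting centroids differently and works in any number of dimensions.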

As I mentioned, this is a simple project, so we will group the data points into 3 clusters, i.e. we will use k = 3.

In [28]:
kclusters = 3

# Keep only the numeric column; recent pandas versions require the axis to be named
mum_clustering = mum_mall.drop(columns=["Neighborhoods"])

kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(mum_clustering)

kmeans.labels_[0:10]

array([1, 0, 0, 1, 0, 1, 0, 0, 2, 2], dtype=int32)

In the above cell, the last step displays the cluster labels generated for the first 10 rows.
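
The value k = 3 was chosen here for simplicity. One possible way to sanity-check such a choice is the silhouette score, which is among the packages imported at the top of this notebook. The sketch below is an assumption about how that check could look, run on made-up frequencies rather than on the real `mum_clustering` data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Toy frequencies standing in for mum_clustering (illustrative values only)
X = np.array(
    [0.0, 0.0, 0.0, 0.01, 0.011, 0.012, 0.03, 0.031, 0.05, 0.051]
).reshape(-1, 1)

# Fit k-means for several candidate k values and score each clustering
scores = {}
for k in range(2, 6):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    scores[k] = silhouette_score(X, model.labels_)

# A higher silhouette score suggests tighter, better separated clusters
best_k = max(scores, key=scores.get)
print(scores)
print(best_k)
```

The score ranges from -1 to 1, and it is only a guide: a k that is easy to interpret may still be preferable to the one with the highest score.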

Now let's create a dataframe that holds the Shopping Mall values (derived from the one-hot encoding) for the different neighborhoods along with their cluster labels.

In [29]:
mum_merged = mum_mall.copy()

mum_merged["Cluster Labels"] = kmeans.labels_

I will rename the column 'Neighborhoods' to 'Neighborhood', which is the more appropriate title since each row represents one neighborhood in Mumbai. The name 'Neighborhood' will also be helpful a little later. Now let's take a look at the first 10 rows.

In [30]:
mum_merged.rename(columns={"Neighborhoods": "Neighborhood"}, inplace=True)
mum_merged.head(10)

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels
0,Aarey Milk Colony,0.02439,1
1,Agripada,0.0,0
2,Altamount Road,0.0,0
3,Amboli,0.014085,1
4,Amrut Nagar,0.0,0
5,Asalfa,0.010753,1
6,Ballard Estate,0.0,0
7,Bandstand Promenade,0.0,0
8,Bangur Nagar,0.03,2
9,Bhandup,0.037037,2


Now let's add the coordinates of the neighborhoods too. The 'Neighborhood' column is helpful here because it serves as the key for the join. Let's also check the end of the dataframe to see what it looks like.

In [31]:
mum_merged = mum_merged.join(df.set_index("Neighborhood"), on="Neighborhood")

print(mum_merged.shape)
mum_merged.head()

(93, 6)


Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Location,Latitude,Longitude
0,Aarey Milk Colony,0.02439,1,Western Suburbs,19.1703,72.8711
1,Agripada,0.0,0,South Mumbai,18.9763,72.8262
2,Altamount Road,0.0,0,South Mumbai,18.9643,72.8078
3,Amboli,0.014085,1,Western Suburbs,19.1291,72.8464
4,Amrut Nagar,0.0,0,Eastern Suburbs,19.1452,72.8467


In [32]:
mum_merged.tail()

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Location,Latitude,Longitude
88,Vikhroli,0.03,2,Eastern Suburbs,19.1111,72.9278
89,Vile Parle,0.0,0,Western Suburbs,19.0962,72.8502
90,Virar,0.020833,1,Western Suburbs,19.0166,72.8585
91,Walkeshwar,0.0,0,South Mumbai,18.9501,72.7998
92,Worli,0.02,1,South Mumbai,19.0074,72.8169
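
To make the join step above more concrete, here is a minimal sketch on toy data. The frames `left` and `coords` and their values are hypothetical stand-ins for `mum_merged` and `df`; only the `join(..., on=...)` pattern mirrors the cell above.

```python
import pandas as pd

# Toy stand-ins for mum_merged and df (names and values are illustrative)
left = pd.DataFrame({
    "Neighborhood": ["A", "B", "C"],
    "Cluster Labels": [1, 0, 2],
})
coords = pd.DataFrame({
    "Neighborhood": ["C", "A", "B"],
    "Latitude": [19.3, 19.1, 19.2],
    "Longitude": [72.9, 72.7, 72.8],
})

# join() matches the left "Neighborhood" column against the right index,
# so the right frame is indexed by "Neighborhood" first
joined = left.join(coords.set_index("Neighborhood"), on="Neighborhood")

# Row order follows the left frame; a key missing on the right would give NaN
print(joined)
```

Because `join` defaults to a left join, the result keeps every row of the left frame, which is consistent with the shape staying at 93 rows in the notebook.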


Now let us sort the dataframe according to the Cluster Labels. After that let's take a look at the dataframe.

In [33]:
print(mum_merged.shape)
mum_merged.sort_values(["Cluster Labels"], inplace=True)
mum_merged

(93, 6)


Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Location,Latitude,Longitude
46,Juhu,0.0,0,Western Suburbs,19.0149,72.8452
69,Nalasopara,0.0,0,Western Suburbs,19.42,72.8141
36,Fanas Wadi,0.0,0,South Mumbai,18.9521,72.8272
37,Four Bungalows,0.0,0,Western Suburbs,19.1264,72.8242
38,Gorai,0.0,0,Western Suburbs,19.2409,72.7841
39,Gowalia Tank,0.0,0,South Mumbai,18.9645,72.8112
41,Hiranandani Gardens,0.0,0,Eastern Suburbs,19.119,72.9068
42,I.C. Colony,0.0,0,Western Suburbs,19.2492,72.8502
43,Indian Institute of Technology Bombay campus,0.0,0,Eastern Suburbs,19.1238,72.9112
44,Irla,0.0,0,Western Suburbs,19.0164,72.829


Now let's visualize this data on a map.

In [34]:
clusters_map = folium.Map(location=[latitude, longitude], zoom_start=11)

# One evenly spaced rainbow color per cluster, converted to hex strings
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# Add one circle marker per neighborhood, colored by its cluster
for lat, lon, poi, cluster in zip(mum_merged['Latitude'], mum_merged['Longitude'], mum_merged['Neighborhood'], mum_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(clusters_map)
       
clusters_map

Now let's save this map as an HTML file.

In [35]:
clusters_map.save('clusters_map.html')

## Examining the Clusters

Let's take a look at the first cluster, Cluster 0.

In [36]:
mum_merged.loc[mum_merged['Cluster Labels'] == 0]

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Location,Latitude,Longitude
46,Juhu,0.0,0,Western Suburbs,19.0149,72.8452
69,Nalasopara,0.0,0,Western Suburbs,19.42,72.8141
36,Fanas Wadi,0.0,0,South Mumbai,18.9521,72.8272
37,Four Bungalows,0.0,0,Western Suburbs,19.1264,72.8242
38,Gorai,0.0,0,Western Suburbs,19.2409,72.7841
39,Gowalia Tank,0.0,0,South Mumbai,18.9645,72.8112
41,Hiranandani Gardens,0.0,0,Eastern Suburbs,19.119,72.9068
42,I.C. Colony,0.0,0,Western Suburbs,19.2492,72.8502
43,Indian Institute of Technology Bombay campus,0.0,0,Eastern Suburbs,19.1238,72.9112
44,Irla,0.0,0,Western Suburbs,19.0164,72.829


Let's take a look at the second cluster, Cluster 1.

In [41]:
mum_merged.loc[mum_merged['Cluster Labels'] == 1]

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Location,Latitude,Longitude
86,Versova,0.01,1,Western Suburbs,19.1377,72.8135
76,Parel,0.02,1,South Mumbai,18.9957,72.8391
77,Poisar,0.020619,1,Western Suburbs,19.2116,72.8527
75,Pant Nagar,0.01087,1,Eastern Suburbs,19.0863,72.915
90,Virar,0.020833,1,Western Suburbs,19.0166,72.8585
72,Nehru Nagar,0.02,1,Eastern Suburbs,19.0005,72.8228
87,Vidyavihar,0.02,1,Eastern Suburbs,19.0797,72.8973
0,Aarey Milk Colony,0.02439,1,Western Suburbs,19.1703,72.8711
68,Naigaon,0.010309,1,Western Suburbs,19.0119,72.8453
3,Amboli,0.014085,1,Western Suburbs,19.1291,72.8464


Let's take a look at the third cluster, Cluster 2.

In [38]:
mum_merged.loc[mum_merged['Cluster Labels'] == 2]

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Location,Latitude,Longitude
49,Kanjurmarg,0.03125,2,Eastern Suburbs,19.1314,72.9357
71,Navy Nagar,0.034483,2,South Mumbai,18.906,72.8155
83,Thakur village,0.05,2,Western Suburbs,19.2102,72.8754
13,C.G.S. colony,0.038462,2,South Mumbai,19.1389,72.9382
9,Bhandup,0.037037,2,Eastern Suburbs,19.1456,72.9486
8,Bangur Nagar,0.03,2,Western Suburbs,19.1679,72.8329
88,Vikhroli,0.03,2,Eastern Suburbs,19.1111,72.9278
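
Because the label numbers that k-means assigns are arbitrary, it can help to summarize each cluster before interpreting it. The sketch below shows one way to do that with a groupby; the frame `toy` and its values are made up, and only the column names match `mum_merged`.

```python
import pandas as pd

# Toy stand-in for mum_merged (illustrative values only)
toy = pd.DataFrame({
    "Shopping Mall": [0.0, 0.0, 0.01, 0.02, 0.03, 0.05],
    "Cluster Labels": [0, 0, 1, 1, 2, 2],
})

# Per-cluster size and range of the Shopping Mall frequency,
# ordered from the lowest to the highest mean frequency
summary = (
    toy.groupby("Cluster Labels")["Shopping Mall"]
    .agg(["count", "min", "max", "mean"])
    .sort_values("mean")
)
print(summary)
```

Reading the clusters in order of their mean frequency makes it clear which label stands for few, some, or many malls, whatever numbers k-means happened to assign.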


#### Observations and (Brief) Analysis

The map shows that the neighborhoods with Shopping Malls are scattered across the city. From the clusters we also see that Clusters 1 and 2 contain neighborhoods with Shopping Malls, while Cluster 0 contains neighborhoods with none. So what does each cluster mean for the stakeholders involved, in our case mostly property developers?

Cluster 0 contains the neighborhoods where property developers should look to build Shopping Malls, as the absence of existing malls means no competition for a new one. Cluster 1 contains neighborhoods that some property developers could target: the Shopping Malls already present would provide some competition, and a new mall there could succeed or fail depending on the mall itself and the features it offers. Since I decided to consider only the location of other Shopping Malls, I will not discuss those other factors or how they could affect a new mall in neighborhoods with a moderate number of Shopping Malls.

Lastly, property developers would be well advised to avoid building Shopping Malls in the neighborhoods that fall under Cluster 2. These neighborhoods show the highest presence of Shopping Malls, so a new mall there would face very high competition from the ones already present.

## Final Remarks

With the observations and the (brief) analysis above, I have completed the part of this Capstone Project that requires a Jupyter notebook. I used the Jupyter Notebook available in Watson Studio under a Project. If you are reviewing this Capstone Project, I would recommend going to my Report next and then to the Presentation. Thank you for taking the time to review this project!

If you are not reviewing this project but are reading it to understand it or to explore other Capstone projects, thank you for reading!