
### Week 5 Final Report
**_Opening a New Shopping Complex in Mumbai, India_**
- Build a dataframe of neighborhoods in Mumbai, India by scraping the data from a Wikipedia page
- Get the geographical coordinates of the neighborhoods
- Obtain the venue data for the neighborhoods from the Foursquare API
- Explore and cluster the neighborhoods
- Select the best cluster to open a new Shopping Complex
***
### 1. Import libraries

In [103]:
import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis
import json # library to handle JSON files
from bs4 import BeautifulSoup # library to parse HTML and XML documents

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans


import requests # library to handle requests
!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images
from IPython.display import Image 
from IPython.core.display import HTML 
    
# transforming json file into a pandas dataframe library
from pandas.io.json import json_normalize

!conda install -c conda-forge folium=0.5.0 --yes
import folium # plotting library

print('Folium installed')
print('Libraries imported.')


Solving environment: done

# All requested packages already installed.

Solving environment: done

# All requested packages already installed.

Folium installed
Libraries imported.


In [104]:
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

### 2. Scrape data from Wikipedia page into a DataFrame

In [105]:
# send the GET request to fetch the suburbs of Mumbai, India
data = requests.get("https://en.wikipedia.org/wiki/Category:Suburbs_of_Mumbai").text

In [106]:
# parse data from the html into a beautifulsoup object
soup = BeautifulSoup(data, 'html.parser')

In [107]:
# create a list to store neighborhood data
neighborhoodList = []

In [108]:
# append the data into the list
for row in soup.find_all("div", class_="mw-category")[0].find_all("li"):
    neighborhoodList.append(row.text)

In [109]:
# create a new DataFrame from the list
kl_df = pd.DataFrame({"Neighborhood": neighborhoodList})

kl_df.head()

Unnamed: 0,Neighborhood
0,Andheri
1,Anushakti Nagar
2,Baiganwadi
3,Bandra
4,Bhandup
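
The parsing step can be tried offline on a small HTML fragment. The sketch below assumes, as the code above does, that the category page wraps its entries in a `div` with class `mw-category`. The sample HTML and the helper name `extract_neighborhoods` are made up for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment imitating the structure the notebook relies on.
SAMPLE_HTML = """
<div class="mw-category">
  <ul>
    <li><a href="/wiki/Andheri">Andheri</a></li>
    <li><a href="/wiki/Bandra">Bandra</a></li>
  </ul>
</div>
"""

def extract_neighborhoods(html):
    """Return the text of every <li> inside the first mw-category div."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", class_="mw-category")
    if container is None:
        # Layout changed or the request failed: return an empty list instead of crashing.
        return []
    return [li.get_text(strip=True) for li in container.find_all("li")]

names = extract_neighborhoods(SAMPLE_HTML)
print(names)
```

Returning an empty list when the container is missing avoids the `IndexError` that indexing `[0]` on an empty result would raise.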


In [110]:
# clean the data by removing entries that are not appropriate locations
# (names that contain a parenthesis or a comma)
kl_df = kl_df[~kl_df['Neighborhood'].str.contains(r'\(')]
kl_df = kl_df[~kl_df['Neighborhood'].str.contains(',')]
# reset the index
kl_df.reset_index(inplace=True, drop=True)
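
To see what this filter removes, here is a small sketch on made-up names (the sample entries are hypothetical, not taken from the scraped page). It passes `regex=False` so that `(` is matched literally, which is an alternative to escaping it.

```python
import pandas as pd

sample = pd.DataFrame({"Neighborhood": [
    "Andheri",
    "Example Nagar (disambiguation)",  # hypothetical entry
    "Example East, Mumbai",            # hypothetical entry
    "Bandra",
]})

# Keep only rows whose name has neither a parenthesis nor a comma.
has_paren = sample["Neighborhood"].str.contains("(", regex=False)
has_comma = sample["Neighborhood"].str.contains(",", regex=False)
cleaned = sample[~(has_paren | has_comma)].reset_index(drop=True)

print(cleaned["Neighborhood"].tolist())
```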


In [111]:
# define a function to get coordinates
def get_latlng(neighborhood):
    geolocator = Nominatim(user_agent="foursquare_agent")
    # The lookup is wrapped in try/except to handle geocoder service issues
    # (timeouts, service not found) and neighborhoods that cannot be located.
    # In those cases return [None, None]; these rows are filtered out later.
    try:
        lat_lng_coords = geolocator.geocode('{}, Mumbai, India'.format(neighborhood))
        latitude = lat_lng_coords.latitude
        longitude = lat_lng_coords.longitude
    except Exception:
        latitude = None
        longitude = None

    return [latitude, longitude]

In [112]:
# call the function to get the coordinates, store in a new list using list comprehension
coords = [ get_latlng(neighborhood) for neighborhood in kl_df["Neighborhood"].tolist() ]

In [113]:
coords

[[19.1196976, 72.8464205],
 [19.0395778, 72.9221562],
 [19.06191305, 72.9249720999947],
 [19.0549792, 72.8402203],
 [19.1438684, 72.9384327],
 [19.2287385, 72.8568773],
 [19.2141193, 72.8258652],
 [19.0612128, 72.8975909],
 [19.2571784, 72.8575361],
 [19.2256037, 72.8654301],
 [19.2327915, 73.032249],
 [19.0859539, 72.9082381],
 [19.1647526, 72.8500176],
 [18.9644472, 72.8135727],
 [19.1348994, 72.8488199],
 [19.1070215, 72.8275275],
 [19.1378919, 72.8106677822975],
 [19.2043739, 72.851978],
 [19.1296868, 72.9283701],
 [None, None],
 [19.0652797, 72.8793805],
 [19.0485178, 72.9323356],
 [18.9159239, 72.8197358],
 [19.12886815, 72.8608220141623],
 [19.1722904, 72.9564695],
 [19.2329212, 73.0322948],
 [19.0724416, 72.9029071955208],
 [19.1297617, 72.8213781],
 [19.1131295, 73.0116982],
 [19.2097189, 72.8759248],
 [19.075713, 73.0003541],
 [None, None],
 [19.0269192, 72.8759337],
 [19.0116962, 72.8180702]]
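
Two of the lookups above returned `[None, None]`. In that output a transient timeout and a genuinely unknown place look the same, so one option is to retry a few times before giving up. The sketch below is an assumption, not what the notebook does: `geocode_with_retry` is a hypothetical helper that accepts any callable shaped like `geolocator.geocode`, and the fake geocoder exists only so the example runs offline.

```python
import time
from types import SimpleNamespace

def geocode_with_retry(geocode, query, attempts=3, pause=0.0):
    """Call geocode(query) up to `attempts` times; return [lat, lng] or [None, None]."""
    for _ in range(attempts):
        try:
            location = geocode(query)
        except Exception:
            # Service problem: wait, then try again.
            time.sleep(pause)
            continue
        if location is None:
            # The service answered but found nothing; retrying will not help.
            return [None, None]
        return [location.latitude, location.longitude]
    return [None, None]

# Fake geocoder for illustration: fails on the first call, then succeeds.
calls = {"n": 0}

def flaky_geocode(query):
    calls["n"] += 1
    if calls["n"] == 1:
        raise TimeoutError("simulated timeout")
    return SimpleNamespace(latitude=19.1196976, longitude=72.8464205)

print(geocode_with_retry(flaky_geocode, "Andheri, Mumbai, India"))
print(geocode_with_retry(lambda q: None, "Nowhere, Mumbai, India"))
```

With geopy, the callable would be `geolocator.geocode`. A `pause` of at least one second is a reasonable choice, since the public Nominatim service asks clients to stay at or below one request per second.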

In [114]:
# create temporary dataframe to populate the coordinates into Latitude and Longitude
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])

In [115]:
# merge the coordinates into the original dataframe
kl_df['Latitude'] = df_coords['Latitude']
kl_df['Longitude'] = df_coords['Longitude']

In [116]:
# check the neighborhoods and the coordinates
print(kl_df.shape)
kl_df

(34, 3)


Unnamed: 0,Neighborhood,Latitude,Longitude
0,Andheri,19.119698,72.84642
1,Anushakti Nagar,19.039578,72.922156
2,Baiganwadi,19.061913,72.924972
3,Bandra,19.054979,72.84022
4,Bhandup,19.143868,72.938433
5,Borivali,19.228738,72.856877
6,Charkop,19.214119,72.825865
7,Chembur,19.061213,72.897591
8,Dahisar,19.257178,72.857536
9,Devipada,19.225604,72.86543


In [117]:
# drop records whose latitude and longitude are unavailable because of geolocator service issues
kl_df=kl_df.dropna()

#reset the index
kl_df.reset_index(inplace = True, drop = True)

#print dataframe
print(kl_df)
# save the DataFrame as CSV file
#kl_df.to_csv("kl_df.csv", index=False)

       Neighborhood   Latitude  Longitude
0           Andheri  19.119698  72.846420
1   Anushakti Nagar  19.039578  72.922156
2        Baiganwadi  19.061913  72.924972
3            Bandra  19.054979  72.840220
4           Bhandup  19.143868  72.938433
5          Borivali  19.228738  72.856877
6           Charkop  19.214119  72.825865
7           Chembur  19.061213  72.897591
8           Dahisar  19.257178  72.857536
9          Devipada  19.225604  72.865430
10         Dombivli  19.232792  73.032249
11        Ghatkopar  19.085954  72.908238
12         Goregaon  19.164753  72.850018
13       Grant Road  18.964447  72.813573
14       Jogeshwari  19.134899  72.848820
15             Juhu  19.107021  72.827528
16           Kalyan  19.137892  72.810668
17        Kandivali  19.204374  72.851978
18       Kanjurmarg  19.129687  72.928370
19            Kurla  19.065280  72.879380
20         Mankhurd  19.048518  72.932336
21        Mira Road  18.915924  72.819736
22    Mogra Village  19.128868  72

### 4. Create a map of Mumbai, India with neighborhoods superimposed on top

In [118]:
# get the coordinates of Mumbai
address = 'Mumbai, India'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Mumbai, India are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Mumbai, India are 18.9387711, 72.8353355.


In [119]:
# create map of Mumbai using latitude and longitude values
map_kl = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, neighborhood in zip(kl_df['Latitude'], kl_df['Longitude'], kl_df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_kl)  
    
map_kl

In [120]:
# save the map as HTML file
#map_kl.save('map_kl.html')

### 5. Use the Foursquare API to explore the neighborhoods

In [121]:
# define Foursquare Credentials and Version
CLIENT_ID = 'TBRZXF5S5UQCQOAY3UU2TIGKYVAP1RBQSV4OXVYNI4U2DFBR' # your Foursquare ID
CLIENT_SECRET = 'SLCP0ERR5IXDJ5YCRG3XMXIMYT0ORDLZEGWTP2AQSVLG4TIO' # your Foursquare Secret
VERSION = '20190605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: TBRZXF5S5UQCQOAY3UU2TIGKYVAP1RBQSV4OXVYNI4U2DFBR
CLIENT_SECRET:SLCP0ERR5IXDJ5YCRG3XMXIMYT0ORDLZEGWTP2AQSVLG4TIO


**Now, let's get the top 100 venues that are within a radius of 2000 meters.**

In [122]:
radius = 2000
LIMIT = 100

venues = []

for lat, long, neighborhood in zip(kl_df['Latitude'], kl_df['Longitude'], kl_df['Neighborhood']):
    
    # create the API request URL
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    # make the GET request
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    # return only relevant information for each nearby venue
    for venue in results:
        venues.append((
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))
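
The loop above assumes that every request succeeds and that every venue has at least one category. A hedged sketch of a more defensive version of its two steps (building the URL and flattening the response) follows. The helper names and the sample payload are hypothetical. Only the endpoint, the query parameters, and the response keys are taken from the code above.

```python
from urllib.parse import urlencode

EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build the explore URL with properly encoded query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return EXPLORE_URL + "?" + urlencode(params)

def flatten_items(items, neighborhood, lat, lng):
    """Turn response items into tuples, skipping venues without a category."""
    rows = []
    for item in items:
        venue = item.get("venue", {})
        categories = venue.get("categories") or []
        if not categories:
            continue
        location = venue.get("location", {})
        rows.append((neighborhood, lat, lng, venue.get("name"),
                     location.get("lat"), location.get("lng"),
                     categories[0]["name"]))
    return rows

# Made-up payload shaped like response['groups'][0]['items'].
sample_items = [
    {"venue": {"name": "Cafe A", "location": {"lat": 19.1, "lng": 72.8},
               "categories": [{"name": "Café"}]}},
    {"venue": {"name": "Uncategorized B", "location": {"lat": 19.2, "lng": 72.9},
               "categories": []}},
]

print(build_explore_url("ID", "SECRET", "20190605", 19.1, 72.8, 2000, 100))
print(flatten_items(sample_items, "Andheri", 19.119698, 72.84642))
```

Skipping venues without a category is one possible policy. Another is to keep them under a placeholder label so the venue counts per neighborhood stay unchanged.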

In [123]:
# convert the venues list into a new DataFrame
venues_df = pd.DataFrame(venues)

# define the column names
venues_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head()

(2061, 7)


Unnamed: 0,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,Andheri,19.119698,72.84642,Merwans Cake shop,19.1193,72.845418,Bakery
1,Andheri,19.119698,72.84642,Radha Krishna Veg Restaurant,19.11513,72.84306,Indian Restaurant
2,Andheri,19.119698,72.84642,McDonald's,19.119691,72.846102,Fast Food Restaurant
3,Andheri,19.119698,72.84642,Naturals,19.111204,72.837255,Ice Cream Shop
4,Andheri,19.119698,72.84642,Narayan Sandwich,19.121398,72.85027,Sandwich Place


**Let's check how many venues were returned for each neighborhood**

In [124]:
venues_df.groupby(["Neighborhood"]).count()

Unnamed: 0_level_0,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Andheri,92,92,92,92,92,92
Anushakti Nagar,21,21,21,21,21,21
Baiganwadi,12,12,12,12,12,12
Bandra,100,100,100,100,100,100
Bhandup,25,25,25,25,25,25
Borivali,98,98,98,98,98,98
Charkop,33,33,33,33,33,33
Chembur,100,100,100,100,100,100
Dahisar,48,48,48,48,48,48
Devipada,84,84,84,84,84,84


**Let's find out how many unique categories can be curated from all the returned venues**

In [125]:
print('There are {} unique categories.'.format(len(venues_df['VenueCategory'].unique())))

There are 192 unique categories.


In [126]:
# print out the list of categories
venues_df['VenueCategory'].unique()[:192]

array(['Bakery', 'Indian Restaurant', 'Fast Food Restaurant',
       'Ice Cream Shop', 'Sandwich Place', 'Falafel Restaurant',
       'Coffee Shop', 'Pizza Place', 'Pub', 'Juice Bar', 'Café',
       'Seafood Restaurant', 'Snack Place', 'Hotel', 'Asian Restaurant',
       'Chinese Restaurant', 'Electronics Store', 'Department Store',
       'Vegetarian / Vegan Restaurant', 'Bar', 'BBQ Joint', 'Food Truck',
       'Burger Joint', 'Multiplex', 'Athletics & Sports',
       'Gym / Fitness Center', 'Martial Arts Dojo', 'Lounge', 'Tea Room',
       'Cocktail Bar', 'Restaurant', 'Dessert Shop', 'Bowling Alley',
       'Arts & Crafts Store', 'Food', 'Plaza', 'Supermarket',
       'Music Venue', 'Park', 'Sports Bar', 'Platform',
       'Food & Drink Shop', 'Hot Dog Joint', 'Train Station',
       'Fried Chicken Joint', 'Sports Club', 'Gourmet Shop',
       'Deli / Bodega', 'German Restaurant', 'Korean Restaurant',
       'Salad Place', 'Modern European Restaurant', 'Sushi Restaurant',
       'In

In [128]:
# check if the results contain "Shopping Mall"
"Shopping Mall" in venues_df['VenueCategory'].unique()

True

### 6. Analyze Each Neighborhood

In [129]:
# one hot encoding
kl_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
# (named "Neighborhoods" because "Neighborhood" is itself one of the venue categories)
kl_onehot['Neighborhoods'] = venues_df['Neighborhood']

# move neighborhood column to the first column
fixed_columns = [kl_onehot.columns[-1]] + list(kl_onehot.columns[:-1])
kl_onehot = kl_onehot[fixed_columns]

print(kl_onehot.shape)
kl_onehot.head()

(2061, 193)


Unnamed: 0,Neighborhoods,Afghan Restaurant,American Restaurant,Arcade,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Australian Restaurant,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Basketball Court,Beach,Bed & Breakfast,Beer Garden,Bengali Restaurant,Big Box Store,Bistro,Boat or Ferry,Bookstore,Boutique,Bowling Alley,Breakfast Spot,Brewery,Building,Burger Joint,Bus Station,Cafeteria,Café,Campground,Chaat Place,Cheese Shop,Chinese Restaurant,Clothing Store,Club House,Cocktail Bar,Coffee Shop,College Auditorium,College Gym,Comfort Food Restaurant,Concert Hall,Convenience Store,Cosmetics Shop,Coworking Space,Cricket Ground,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Dim Sum Restaurant,Diner,Donut Shop,Electronics Store,Event Space,Factory,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Garden,Garden Center,Gastropub,General College & University,General Entertainment,German Restaurant,Gift Shop,Goan Restaurant,Golf Course,Gourmet Shop,Government Building,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Harbor / Marina,Historic Site,History Museum,Hookah Bar,Hot Dog Joint,Hotel,Hotel Bar,Ice Cream Shop,Indian Chinese Restaurant,Indian Restaurant,Indie Movie Theater,Indoor Play Area,Irani Cafe,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Korean Restaurant,Lake,Light Rail Station,Lingerie Store,Liquor Store,Lounge,Martial Arts Dojo,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Mountain,Movie Theater,Mughlai Restaurant,Multiplex,Music Store,Music Venue,Neighborhood,New American Restaurant,Nightclub,Noodle House,Office,Opera House,Other Great Outdoors,Outdoors & 
Recreation,Park,Performing Arts Venue,Pet Store,Pharmacy,Pizza Place,Platform,Playground,Plaza,Pool,Pub,Punjabi Restaurant,Racetrack,Recreation Center,Residential Building (Apartment / Condo),Resort,Restaurant,River,Road,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Sculpture Garden,Seafood Restaurant,Shoe Store,Shop & Service,Shopping Mall,Smoke Shop,Snack Place,Soccer Field,Social Club,South Indian Restaurant,Spa,Spanish Restaurant,Sporting Goods Shop,Sports Bar,Sports Club,Stadium,Steakhouse,Supermarket,Sushi Restaurant,Tea Room,Tennis Court,Thai Restaurant,Theater,Theme Park,Toll Booth,Toy / Game Store,Track,Track Stadium,Trail,Train Station,Vegetarian / Vegan Restaurant,Video Game Store,Wine Bar,Women's Store
0,Andheri,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Andheri,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Andheri,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Andheri,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Andheri,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


**Next, let's group rows by neighborhood and take the mean of the frequency of occurrence of each category**

In [130]:
kl_grouped = kl_onehot.groupby(["Neighborhoods"]).mean().reset_index()

print(kl_grouped.shape)
kl_grouped

len(kl_grouped[kl_grouped["Shopping Mall"] > 0])

(32, 193)


13
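
To make the meaning of these numbers concrete: after one-hot encoding, the per-neighborhood mean of a category column is the fraction of that neighborhood's venues that fall in the category. A small sketch with made-up venues (the neighborhood names `A` and `B` are placeholders):

```python
import pandas as pd

toy = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "VenueCategory": ["Bakery", "Shopping Mall", "Bakery", "Café",
                      "Café", "Bakery"],
})

# One column per category, 1.0 where the venue belongs to it.
onehot = pd.get_dummies(toy["VenueCategory"], dtype=float)
onehot.insert(0, "Neighborhoods", toy["Neighborhood"])

# The mean of a 0/1 column is the share of venues in that category.
grouped = onehot.groupby("Neighborhoods").mean().reset_index()
print(grouped)
```

In this toy data, one of the four venues in `A` is a mall, so its Shopping Mall value is 0.25, while `B` has none and gets 0.0. The values in the real table are shares of this kind and not mall counts.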

**Create a new DataFrame for Shopping Mall data only**

In [131]:
kl_mall = kl_grouped[["Neighborhoods","Shopping Mall"]]

In [132]:
kl_mall.head()

Unnamed: 0,Neighborhoods,Shopping Mall
0,Andheri,0.0
1,Anushakti Nagar,0.0
2,Baiganwadi,0.0
3,Bandra,0.0
4,Bhandup,0.08


### 7. Cluster Neighborhoods
Run k-means to cluster the neighborhoods in Mumbai into 3 clusters.

In [133]:
# set number of clusters
kclusters = 3

kl_clustering = kl_mall.drop(columns=["Neighborhoods"])

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(kl_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 1, 1, 1, 0, 2, 1, 2, 1, 2], dtype=int32)
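
The integer labels that k-means assigns are arbitrary: label 2 does not mean "more malls" than label 0. One way to make them interpretable is to rank the clusters by their centers. The sketch below uses made-up mall shares and a hypothetical helper, `rank_clusters`. It is an illustration and not part of the original analysis.

```python
import numpy as np
from sklearn.cluster import KMeans

def rank_clusters(values, k, random_state=0):
    """Cluster 1-D values and return ranks: 0 = lowest center, k-1 = highest."""
    X = np.asarray(values, dtype=float).reshape(-1, 1)
    model = KMeans(n_clusters=k, random_state=random_state, n_init=10).fit(X)
    # argsort of the centers lists the label ids from lowest to highest center
    order = np.argsort(model.cluster_centers_.ravel())
    rank_of_label = {int(label): rank for rank, label in enumerate(order)}
    return [rank_of_label[int(label)] for label in model.labels_]

# Made-up shares forming three well-separated groups.
shares = [0.0, 0.0, 0.0, 0.020, 0.021, 0.023, 0.080, 0.081]
print(rank_clusters(shares, k=3))
```

Working with ranks rather than raw labels keeps the interpretation stable even if a different `random_state` permutes the label numbers.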

In [134]:
# create a new dataframe that holds the Shopping Mall frequency and the cluster label for each neighborhood
kl_merged = kl_mall.copy()

# add clustering labels
kl_merged["Cluster Labels"] = kmeans.labels_

In [135]:
kl_merged.rename(columns={"Neighborhoods": "Neighborhood"}, inplace=True)
kl_merged.head()

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels
0,Andheri,0.0,1
1,Anushakti Nagar,0.0,1
2,Baiganwadi,0.0,1
3,Bandra,0.0,1
4,Bhandup,0.08,0


In [136]:
# merge kl_merged with kl_df to add latitude/longitude for each neighborhood
kl_merged = kl_merged.join(kl_df.set_index("Neighborhood"), on="Neighborhood")

print(kl_merged.shape)
kl_merged.head() # check the last columns!

(32, 5)


Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
0,Andheri,0.0,1,19.119698,72.84642
1,Anushakti Nagar,0.0,1,19.039578,72.922156
2,Baiganwadi,0.0,1,19.061913,72.924972
3,Bandra,0.0,1,19.054979,72.84022
4,Bhandup,0.08,0,19.143868,72.938433


In [137]:
# sort the results by Cluster Labels
print(kl_merged.shape)
kl_merged.sort_values(["Cluster Labels"], inplace=True)
kl_merged

(32, 5)


Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
28,Thakur village,0.048387,0,19.209719,72.875925
4,Bhandup,0.08,0,19.143868,72.938433
0,Andheri,0.0,1,19.119698,72.84642
27,Shil Phata,0.0,1,19.113129,73.011698
26,Seven Bungalows,0.0,1,19.129762,72.821378
24,Mumbra,0.0,1,19.232921,73.032295
22,Mogra Village,0.0,1,19.128868,72.860822
21,Mira Road,0.0,1,18.915924,72.819736
20,Mankhurd,0.0,1,19.048518,72.932336
19,Kurla,0.0,1,19.06528,72.87938


**Finally, let's visualize the resulting clusters**

In [138]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one color per cluster label
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(kl_merged['Latitude'], kl_merged['Longitude'], kl_merged['Neighborhood'], kl_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [139]:
# save the map as HTML file
map_clusters.save('map_clusters.html')

### 8. Examine Clusters

#### Cluster 0

In [140]:
kl_merged.loc[kl_merged['Cluster Labels'] == 0]

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
28,Thakur village,0.048387,0,19.209719,72.875925
4,Bhandup,0.08,0,19.143868,72.938433


#### Cluster 1

In [141]:
kl_merged.loc[kl_merged['Cluster Labels'] == 1]

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
0,Andheri,0.0,1,19.119698,72.84642
27,Shil Phata,0.0,1,19.113129,73.011698
26,Seven Bungalows,0.0,1,19.129762,72.821378
24,Mumbra,0.0,1,19.232921,73.032295
22,Mogra Village,0.0,1,19.128868,72.860822
21,Mira Road,0.0,1,18.915924,72.819736
20,Mankhurd,0.0,1,19.048518,72.932336
19,Kurla,0.0,1,19.06528,72.87938
16,Kalyan,0.0,1,19.137892,72.810668
30,Wadala,0.0,1,19.026919,72.875934


#### Cluster 2

In [142]:
kl_merged.loc[kl_merged['Cluster Labels'] == 2]

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
29,Vashi,0.012658,2,19.075713,73.000354
25,Pestom sagar,0.02381,2,19.072442,72.902907
5,Borivali,0.020408,2,19.228738,72.856877
12,Goregaon,0.02,2,19.164753,72.850018
7,Chembur,0.01,2,19.061213,72.897591
9,Devipada,0.02381,2,19.225604,72.86543
18,Kanjurmarg,0.023256,2,19.129687,72.92837
17,Kandivali,0.012987,2,19.204374,72.851978
11,Ghatkopar,0.030612,2,19.085954,72.908238
23,Mulund,0.028986,2,19.17229,72.956469
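
Before interpreting the clusters, it helps to compare the typical Shopping Mall share in each one. The sketch below re-enters only the rows displayed above (the displayed tables may be truncated, so treat the result as illustrative) and summarizes the share per cluster label.

```python
import pandas as pd

# (neighborhood, Shopping Mall share, cluster label) as displayed above
rows = [
    ("Thakur village", 0.048387, 0), ("Bhandup", 0.08, 0),
    ("Andheri", 0.0, 1), ("Shil Phata", 0.0, 1), ("Seven Bungalows", 0.0, 1),
    ("Mumbra", 0.0, 1), ("Mogra Village", 0.0, 1), ("Mira Road", 0.0, 1),
    ("Mankhurd", 0.0, 1), ("Kurla", 0.0, 1), ("Kalyan", 0.0, 1),
    ("Wadala", 0.0, 1),
    ("Vashi", 0.012658, 2), ("Pestom sagar", 0.02381, 2),
    ("Borivali", 0.020408, 2), ("Goregaon", 0.02, 2), ("Chembur", 0.01, 2),
    ("Devipada", 0.02381, 2), ("Kanjurmarg", 0.023256, 2),
    ("Kandivali", 0.012987, 2), ("Ghatkopar", 0.030612, 2),
    ("Mulund", 0.028986, 2),
]
shown = pd.DataFrame(rows, columns=["Neighborhood", "Shopping Mall", "Cluster Labels"])

# Summarize the share per cluster, highest mean first.
summary = (shown.groupby("Cluster Labels")["Shopping Mall"]
                .agg(["count", "mean", "min", "max"])
                .sort_values("mean", ascending=False))
print(summary)
```

On the rows shown, cluster 0 has the highest mean share, cluster 2 a moderate one, and cluster 1 is zero throughout.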


#### Observations:
Shopping malls account for the largest share of venues in cluster 0 (Bhandup at 0.08 and Thakur village at about 0.05), a moderate share in cluster 2 (roughly 0.01 to 0.03), and little to none in cluster 1. These figures are the fraction of venues returned by Foursquare that are categorized as Shopping Mall, not a count of malls. On the map, the neighborhoods with malls appear concentrated toward the central area of the city, while many of the outlying suburbs still have very few.

Cluster 1 therefore represents the greatest opportunity for a new shopping complex, since its neighborhoods have little to no competition from existing malls. Malls in cluster 0, by contrast, are likely to face the most intense competition because of the high concentration of existing malls there.

This project recommends that property developers open new shopping malls in cluster 1 neighborhoods, where there is little to no competition. Developers with a unique selling proposition that sets them apart can also consider cluster 2 neighborhoods, where competition is moderate. Neighborhoods in cluster 0, which already have the highest concentration of shopping malls, are best avoided.

