# IBM Applied Data Science Capstone by Coursera

## Week 5 Final Report

### Opening a New Shopping Mall in Kolkata

* Build a dataframe of neighborhoods in Kolkata, India by web scraping the data from a Wikipedia page
* Get the geographical coordinates of the neighborhoods
* Obtain the venue data for the neighborhoods from Foursquare API
* Explore and cluster the neighborhoods
* Select the best cluster to open a new shopping mall

*********************************************************************************

#### 1. Install and Import All Packages and Libraries Required for the Assignment

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

!pip install opencage
from opencage.geocoder import OpenCageGeocode

import requests # library to handle requests
from bs4 import BeautifulSoup # library to parse HTML and XML documents

from pandas.io.json import json_normalize # transform JSON into a pandas dataframe (pandas >= 1.0 exposes it as pandas.json_normalize)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

print("Environment Set, Packages Installed and Libraries Imported.")


#### 2. Scrape Data from the Wikipedia Page into a DataFrame

In [50]:
# send the GET request
data = requests.get("https://en.wikipedia.org/wiki/Category:Neighbourhoods_in_Kolkata").text

In [51]:
# parse data from the html into a beautifulsoup object
soup = BeautifulSoup(data, 'html.parser')
soup


In [52]:
# create a list to store neighborhood data
neighborhoodList = []

In [53]:
# append the data into the list
for row in soup.find_all("div", class_="mw-category")[0].findAll("li"):
    neighborhoodList.append(row.text)
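The BeautifulSoup loop above keeps the text of every `<li>` in the first `<div class="mw-category">` block of the page. To make that extraction concrete, here is a dependency-free sketch that swaps in Python's built-in `html.parser` for BeautifulSoup. The class `CategoryListParser` and the sample HTML are hypothetical, the markup is only assumed to resemble a MediaWiki category listing, and unlike the `[0]` indexing above, this version collects items from every `mw-category` block it meets.

```python
from html.parser import HTMLParser


class CategoryListParser(HTMLParser):
    """Collect the text of <li> items nested inside <div class="mw-category"> blocks."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0   # > 0 while inside a mw-category div (counts nested divs)
        self.in_li = False   # True while inside an <li> within such a div
        self.buffer = []     # text fragments of the current <li>
        self.items = []      # collected list-item texts

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            if self.div_depth:
                self.div_depth += 1      # nested div inside the category block
            elif "mw-category" in classes:
                self.div_depth = 1       # entering a category block
        elif tag == "li" and self.div_depth:
            self.in_li = True
            self.buffer = []

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth:
            self.div_depth -= 1
        elif tag == "li" and self.in_li:
            self.items.append("".join(self.buffer).strip())
            self.in_li = False

    def handle_data(self, data):
        if self.in_li:
            self.buffer.append(data)


# hypothetical HTML, shaped like a MediaWiki category listing
sample_html = (
    '<div class="mw-category"><div class="mw-category-group"><h3>A</h3><ul>'
    '<li><a href="/wiki/Abhirampur">Abhirampur</a></li>'
    '<li><a href="/wiki/Anandapur,_Kolkata">Anandapur, Kolkata</a></li>'
    '</ul></div></div>'
    '<ul><li>Not a category member</li></ul>'
)

parser = CategoryListParser()
parser.feed(sample_html)
print(parser.items)  # ['Abhirampur', 'Anandapur, Kolkata']
```

The `<li>` outside the `mw-category` div is ignored, which is the same filtering the `soup.find_all("div", class_="mw-category")` call performs.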

In [54]:
# create a new DataFrame from the list
kl_df = pd.DataFrame({"Neighborhood": neighborhoodList})

kl_df

| | Neighborhood |
|---|---|
| 0 | Neighbourhoods in Kolkata Metropolitan Area |
| 1 | Abhirampur |
| 2 | Agarpara |
| 3 | Ajoy Nagar |
| 4 | Alipore |
| 5 | Amodghata |
| 6 | Amtala |
| 7 | Anandapur, Kolkata |
| 8 | Ankurhati |
| 9 | Argari |


In [55]:
# print the shape (rows, columns) of the dataframe
kl_df.shape

(200, 1)

In [56]:
# drop the first row which is not a real neighborhood but a header in the page
kl_df.drop(0, inplace=True)
kl_df

| | Neighborhood |
|---|---|
| 1 | Abhirampur |
| 2 | Agarpara |
| 3 | Ajoy Nagar |
| 4 | Alipore |
| 5 | Amodghata |
| 6 | Amtala |
| 7 | Anandapur, Kolkata |
| 8 | Ankurhati |
| 9 | Argari |
| 10 | Asuti |


In [57]:
# print the shape (rows, columns) of the dataframe
kl_df.shape

(199, 1)

In [58]:
# split entries such as "Anandapur, Kolkata" into Neighborhood and District
kl_df[['Neighborhood', 'District']] = kl_df['Neighborhood'].str.split(', ', n=1, expand=True)
kl_df

| | Neighborhood | District |
|---|---|---|
| 1 | Abhirampur | |
| 2 | Agarpara | |
| 3 | Ajoy Nagar | |
| 4 | Alipore | |
| 5 | Amodghata | |
| 6 | Amtala | |
| 7 | Anandapur | Kolkata |
| 8 | Ankurhati | |
| 9 | Argari | |
| 10 | Asuti | |


In [59]:
# print the shape (rows, columns) of the dataframe
kl_df.shape

(199, 2)

In [60]:
# drop the column District
kl_df.drop('District', axis=1, inplace=True)
kl_df

| | Neighborhood |
|---|---|
| 1 | Abhirampur |
| 2 | Agarpara |
| 3 | Ajoy Nagar |
| 4 | Alipore |
| 5 | Amodghata |
| 6 | Amtala |
| 7 | Anandapur |
| 8 | Ankurhati |
| 9 | Argari |
| 10 | Asuti |


In [14]:
# print the shape (rows, columns) of the dataframe
kl_df.shape

(199, 1)

#### 3. Get the geographical coordinates

In [61]:
# test the OpenCage Geocoder connection
key = 'YOUR_OPENCAGE_API_KEY'  # get an API key from: https://opencagedata.com

geocoder = OpenCageGeocode(key)
query = 'Bijuesca, Spain'  
results = geocoder.geocode(query)
print (results)


In [62]:
lat = results[0]['geometry']['lat']

lng = results[0]['geometry']['lng']

print (lat, lng)

41.5405092 -1.9203562


In [63]:
# create empty lists
list_lat = []
list_long = []

for index, row in kl_df.iterrows(): # iterate over rows in dataframe
    Neighborhood = row['Neighborhood']
    query = str(Neighborhood) + ', Kolkata'

    # take the first match returned by OpenCage
    results = geocoder.geocode(query)
    lat = results[0]['geometry']['lat']
    long = results[0]['geometry']['lng']

    list_lat.append(lat)
    list_long.append(long)

# create new columns from lists
kl_df['Latitude'] = list_lat
kl_df['Longitude'] = list_long

kl_df

| | Neighborhood | Latitude | Longitude |
|---|---|---|---|
| 1 | Abhirampur | 22.56263 | 88.36304 |
| 2 | Agarpara | 22.56263 | 88.36304 |
| 3 | Ajoy Nagar | 22.56263 | 88.36304 |
| 4 | Alipore | 22.539171 | 88.327278 |
| 5 | Amodghata | 22.56263 | 88.36304 |
| 6 | Amtala | 22.56263 | 88.36304 |
| 7 | Anandapur | 22.514256 | 88.409886 |
| 8 | Ankurhati | 22.56263 | 88.36304 |
| 9 | Argari | 22.56263 | 88.36304 |
| 10 | Asuti | 22.56263 | 88.36304 |
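Many rows in the output above share exactly the same coordinates, 22.56263, 88.36304. A single point shared by many differently named neighborhoods suggests, though the notebook does not verify it, that the geocoder fell back to a city-level match for those queries. A minimal stdlib sketch for flagging such shared coordinates follows; the function name `shared_coordinates` and its `min_count` threshold are illustrative assumptions, and the sample rows are copied from the table above.

```python
from collections import Counter


def shared_coordinates(rows, min_count=2):
    """Return {(lat, lng): [names]} for coordinates shared by at least min_count rows.

    rows is an iterable of (name, lat, lng) tuples, e.g. built from
    zip(kl_df['Neighborhood'], kl_df['Latitude'], kl_df['Longitude']).
    """
    rows = list(rows)
    counts = Counter((lat, lng) for _, lat, lng in rows)
    groups = {}
    for name, lat, lng in rows:
        if counts[(lat, lng)] >= min_count:
            groups.setdefault((lat, lng), []).append(name)
    return groups


# a few rows from the geocoded table above
sample = [
    ("Abhirampur", 22.56263, 88.36304),
    ("Agarpara", 22.56263, 88.36304),
    ("Alipore", 22.539171, 88.327278),
    ("Anandapur", 22.514256, 88.409886),
    ("Amtala", 22.56263, 88.36304),
]
print(shared_coordinates(sample))
# {(22.56263, 88.36304): ['Abhirampur', 'Agarpara', 'Amtala']}
```

These same neighborhoods later return identical venue lists (53 venues each in the venue count table), so they may be worth re-geocoding or excluding before clustering.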


In [64]:
# print the shape (rows, columns) of the dataframe
kl_df.shape

(199, 3)

In [65]:
# save the DataFrame as CSV file
kl_df.to_csv("kl_df.csv", index=False)

#### 4. Create a map of Kolkata with neighborhoods superimposed on top

In [66]:
# get the coordinates of Kolkata
address = 'Kolkata, India'

geolocator = Nominatim(user_agent="my-application")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates (latitude, longitude) of Kolkata, India are {}, {}.'.format(latitude, longitude))

The geographical coordinates (latitude, longitude) of Kolkata, India are 22.54541245, 88.3567751581234.


In [67]:
# create a map of Kolkata using latitude and longitude values
map_kl = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, neighborhood in zip(kl_df['Latitude'], kl_df['Longitude'], kl_df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_kl)  
    
map_kl

In [68]:
# save the map as HTML file
map_kl.save('map_kl.html')

In [69]:
# note: folium's save() always writes HTML, so this file contains HTML despite its .png extension
map_kl.save('map_kl.png')

#### 5. Use the Foursquare API to explore the neighborhoods

In [70]:
# define Foursquare Credentials and Version
CLIENT_ID = 'RP1PM0NLYZKSTC0IGOROVVM5ELB05NVNEKYJDRUZHYS3OQ2M' # your Foursquare ID
CLIENT_SECRET = 'XJJXDQC1DGERDSA32QPGD5O00FQWPHUABGMLTWPSPCYC3TPP' # your Foursquare Secret
VERSION = '20180604' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: RP1PM0NLYZKSTC0IGOROVVM5ELB05NVNEKYJDRUZHYS3OQ2M
CLIENT_SECRET:XJJXDQC1DGERDSA32QPGD5O00FQWPHUABGMLTWPSPCYC3TPP


#### Now, let's get the top 100 venues within a radius of 2000 meters of each neighborhood.

In [72]:

radius = 2000  # search radius in meters
LIMIT = 100    # maximum number of venues returned per neighborhood

venues = []

for lat, long, neighborhood in zip(kl_df['Latitude'], kl_df['Longitude'], kl_df['Neighborhood']):

    # create the API request URL
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius,
        LIMIT)

    # make the GET request and keep the list of recommended venues
    results = requests.get(url).json()['response']['groups'][0]['items']

    # return only relevant information for each nearby venue
    for venue in results:
        categories = venue['venue']['categories']
        venues.append((
            neighborhood,
            lat,
            long,
            venue['venue']['name'],
            venue['venue']['location']['lat'],
            venue['venue']['location']['lng'],
            categories[0]['name'] if categories else None))  # a venue may have no category
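The loop above builds the request URL by string formatting and reads a fixed set of fields from each item of `response.groups[0].items`. Below is a sketch of those two steps as small, testable functions, assuming the response shape the loop already relies on. `build_explore_url` and `flatten_items` are hypothetical helpers, and the second sample item is made up; no live API call is made.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng, radius=2000, limit=100):
    """Build the same explore URL as the loop above, letting urlencode escape the values."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return EXPLORE_ENDPOINT + "?" + urlencode(params)


def flatten_items(items, neighborhood, lat, lng):
    """Turn explore items into (Neighborhood, Latitude, Longitude, VenueName,
    VenueLatitude, VenueLongitude, VenueCategory) tuples; category is None if absent."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        rows.append((
            neighborhood, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            categories[0]["name"] if categories else None,
        ))
    return rows


sample_items = [
    {"venue": {"name": "Peter Cat", "location": {"lat": 22.552365, "lng": 88.352544},
               "categories": [{"name": "Indian Restaurant"}]}},
    # hypothetical venue without a category
    {"venue": {"name": "Unnamed Spot", "location": {"lat": 22.56, "lng": 88.36}, "categories": []}},
]
print(build_explore_url("ID", "SECRET", "20180604", 22.56263, 88.36304))
print(flatten_items(sample_items, "Abhirampur", 22.56263, 88.36304))
```

Separating URL building from parsing lets the parsing logic be checked against a saved response without spending API quota.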

In [73]:
# convert the venues list into a new DataFrame
venues_df = pd.DataFrame(venues)

# define the column names
venues_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head()

(8830, 7)


| | Neighborhood | Latitude | Longitude | VenueName | VenueLatitude | VenueLongitude | VenueCategory |
|---|---|---|---|---|---|---|---|
| 0 | Abhirampur | 22.56263 | 88.36304 | Blue & Beyond | 22.559131 | 88.35328 | Pub |
| 1 | Abhirampur | 22.56263 | 88.36304 | The Oberoi Grand | 22.561749 | 88.351594 | Hotel |
| 2 | Abhirampur | 22.56263 | 88.36304 | Peter Cat | 22.552365 | 88.352544 | Indian Restaurant |
| 3 | Abhirampur | 22.56263 | 88.36304 | Lalit Great Eastern Hotel | 22.567967 | 88.35001 | Hotel |
| 4 | Abhirampur | 22.56263 | 88.36304 | Arsalan | 22.553897 | 88.354063 | Mughlai Restaurant |


#### Let's check how many venues were returned for each neighborhood

In [74]:
venues_df.groupby(["Neighborhood"]).count()

| Neighborhood | Latitude | Longitude | VenueName | VenueLatitude | VenueLongitude | VenueCategory |
|---|---|---|---|---|---|---|
| Abhirampur | 53 | 53 | 53 | 53 | 53 | 53 |
| Agarpara | 53 | 53 | 53 | 53 | 53 | 53 |
| Ajoy Nagar | 53 | 53 | 53 | 53 | 53 | 53 |
| Alipore | 27 | 27 | 27 | 27 | 27 | 27 |
| Amodghata | 53 | 53 | 53 | 53 | 53 | 53 |
| Amtala | 53 | 53 | 53 | 53 | 53 | 53 |
| Anandapur | 16 | 16 | 16 | 16 | 16 | 16 |
| Ankurhati | 53 | 53 | 53 | 53 | 53 | 53 |
| Argari | 53 | 53 | 53 | 53 | 53 | 53 |
| Asuti | 53 | 53 | 53 | 53 | 53 | 53 |


#### Let's find out how many unique categories can be curated from all the returned venues

In [75]:
print('There are {} unique categories.'.format(len(venues_df['VenueCategory'].unique())))

There are 127 unique categories.


In [38]:
# print out the list of categories
venues_df['VenueCategory'].unique()

array(['Pub', 'Hotel', 'Indian Restaurant', 'Mughlai Restaurant', 'Café',
       'Bakery', 'Restaurant', 'BBQ Joint', 'Bookstore',
       'Asian Restaurant', 'South Indian Restaurant', 'Department Store',
       'Lounge', 'Nightclub', 'Neighborhood', 'Thai Restaurant', 'Market',
       'Japanese Restaurant', 'Fast Food Restaurant', 'Coffee Shop',
       'Chinese Restaurant', 'Snack Place', 'Indian Sweet Shop',
       'Mexican Restaurant', 'Plaza', 'Juice Bar', 'Multiplex', 'Park',
       'Train Station', 'Flea Market', 'Platform', 'Bus Station', 'Pool',
       'Awadhi Restaurant', 'Italian Restaurant', 'History Museum',
       'Dhaba', 'Dessert Shop', 'Performing Arts Venue', 'Military Base',
       'Historic Site', 'Indie Theater', 'Zoo', 'Art Gallery',
       'Pizza Place', 'Food', 'Mediterranean Restaurant', 'Hotel Pool',
       'Racetrack', 'Athletics & Sports', 'Shopping Mall',
       'Fried Chicken Joint', 'Sandwich Place', 'Supermarket',
       'Cricket Ground', 'Harbor / Marina

In [76]:
# check if the results contain "Shopping Mall"
"Shopping Mall" in venues_df['VenueCategory'].unique()

True

#### 6. Analyze Each Neighborhood

In [77]:
# one hot encoding
kl_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
kl_onehot['Neighborhoods'] = venues_df['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [kl_onehot.columns[-1]] + list(kl_onehot.columns[:-1])
kl_onehot = kl_onehot[fixed_columns]

print(kl_onehot.shape)
kl_onehot.head()

(8830, 128)


Unnamed: 0,Neighborhoods,ATM,Airport,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Awadhi Restaurant,BBQ Joint,Bakery,Bank,Bar,Bed & Breakfast,Beer Garden,Bengali Restaurant,Bistro,Boat or Ferry,Bookstore,Boutique,Brewery,Bus Station,Bus Stop,Business Service,Café,Chinese Restaurant,Clothing Store,Coffee Shop,Convenience Store,Cricket Ground,Department Store,Dessert Shop,Dhaba,Diner,Dumpling Restaurant,Electronics Store,Falafel Restaurant,Fast Food Restaurant,Field,Flea Market,Food,Food & Drink Shop,Food Court,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Gastropub,Gift Shop,Golf Course,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Health & Beauty Service,Historic Site,History Museum,Hookah Bar,Hostel,Hotel,Hotel Pool,IT Services,Ice Cream Shop,Indian Restaurant,Indian Sweet Shop,Indie Movie Theater,Indie Theater,Irish Pub,Italian Restaurant,Japanese Restaurant,Juice Bar,Kerala Restaurant,Lounge,Market,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Military Base,Motorcycle Shop,Movie Theater,Mughlai Restaurant,Multicuisine Indian Restaurant,Multiplex,Museum,Music Venue,Neighborhood,Nightclub,North Indian Restaurant,Optical Shop,Paper / Office Supplies Store,Park,Performing Arts Venue,Pharmacy,Pizza Place,Platform,Plaza,Pool,Pub,Racetrack,Residential Building (Apartment / Condo),Restaurant,River,Sandwich Place,Scenic Lookout,Shoe Store,Shopping Mall,Snack Place,South Indian Restaurant,Sports Club,Stadium,Steakhouse,Supermarket,Tea Room,Tex-Mex Restaurant,Thai Restaurant,Theme Park,Theme Restaurant,Tibetan Restaurant,Train Station,Tram Station,Vegetarian / Vegan Restaurant,Watch Shop,Zoo
0,Abhirampur,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Abhirampur,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Abhirampur,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Abhirampur,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Abhirampur,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


#### Next, let's group the rows by neighborhood and take the mean of the frequency of occurrence of each category

In [78]:
kl_grouped = kl_onehot.groupby(["Neighborhoods"]).mean().reset_index()

print(kl_grouped.shape)
kl_grouped

(197, 128)


Unnamed: 0,Neighborhoods,ATM,Airport,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Awadhi Restaurant,BBQ Joint,Bakery,Bank,Bar,Bed & Breakfast,Beer Garden,Bengali Restaurant,Bistro,Boat or Ferry,Bookstore,Boutique,Brewery,Bus Station,Bus Stop,Business Service,Café,Chinese Restaurant,Clothing Store,Coffee Shop,Convenience Store,Cricket Ground,Department Store,Dessert Shop,Dhaba,Diner,Dumpling Restaurant,Electronics Store,Falafel Restaurant,Fast Food Restaurant,Field,Flea Market,Food,Food & Drink Shop,Food Court,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Gastropub,Gift Shop,Golf Course,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Health & Beauty Service,Historic Site,History Museum,Hookah Bar,Hostel,Hotel,Hotel Pool,IT Services,Ice Cream Shop,Indian Restaurant,Indian Sweet Shop,Indie Movie Theater,Indie Theater,Irish Pub,Italian Restaurant,Japanese Restaurant,Juice Bar,Kerala Restaurant,Lounge,Market,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Military Base,Motorcycle Shop,Movie Theater,Mughlai Restaurant,Multicuisine Indian Restaurant,Multiplex,Museum,Music Venue,Neighborhood,Nightclub,North Indian Restaurant,Optical Shop,Paper / Office Supplies Store,Park,Performing Arts Venue,Pharmacy,Pizza Place,Platform,Plaza,Pool,Pub,Racetrack,Residential Building (Apartment / Condo),Restaurant,River,Sandwich Place,Scenic Lookout,Shoe Store,Shopping Mall,Snack Place,South Indian Restaurant,Sports Club,Stadium,Steakhouse,Supermarket,Tea Room,Tex-Mex Restaurant,Thai Restaurant,Theme Park,Theme Restaurant,Tibetan Restaurant,Train Station,Tram Station,Vegetarian / Vegan Restaurant,Watch Shop,Zoo
0,Abhirampur,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037736,0.0,0.0,0.018868,0.037736,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037736,0.0,0.0,0.018868,0.0,0.0,0.075472,0.018868,0.0,0.018868,0.0,0.0,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.018868,0.0,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.075472,0.0,0.0,0.0,0.09434,0.037736,0.0,0.0,0.0,0.0,0.018868,0.018868,0.0,0.018868,0.018868,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.056604,0.0,0.018868,0.0,0.0,0.018868,0.037736,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.018868,0.018868,0.018868,0.056604,0.0,0.0,0.037736,0.0,0.0,0.0,0.0,0.0,0.018868,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.0
1,Agarpara,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037736,0.0,0.0,0.018868,0.037736,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037736,0.0,0.0,0.018868,0.0,0.0,0.075472,0.018868,0.0,0.018868,0.0,0.0,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.018868,0.0,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.075472,0.0,0.0,0.0,0.09434,0.037736,0.0,0.0,0.0,0.0,0.018868,0.018868,0.0,0.018868,0.018868,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.056604,0.0,0.018868,0.0,0.0,0.018868,0.037736,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.018868,0.018868,0.018868,0.056604,0.0,0.0,0.037736,0.0,0.0,0.0,0.0,0.0,0.018868,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.0
2,Ajoy Nagar,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037736,0.0,0.0,0.018868,0.037736,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037736,0.0,0.0,0.018868,0.0,0.0,0.075472,0.018868,0.0,0.018868,0.0,0.0,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.018868,0.0,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.075472,0.0,0.0,0.0,0.09434,0.037736,0.0,0.0,0.0,0.0,0.018868,0.018868,0.0,0.018868,0.018868,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.056604,0.0,0.018868,0.0,0.0,0.018868,0.037736,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.018868,0.018868,0.018868,0.056604,0.0,0.0,0.037736,0.0,0.0,0.0,0.0,0.0,0.018868,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.0
3,Alipore,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.037037,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.074074,0.0,0.0,0.0,0.0,0.0,0.037037,0.037037,0.0,0.0,0.0,0.0,0.074074,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.037037,0.0,0.0,0.037037,0.037037,0.0,0.0,0.037037,0.0,0.0,0.037037,0.0,0.074074,0.0,0.0,0.0,0.037037,0.0,0.037037,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.037037,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037
4,Amodghata,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037736,0.0,0.0,0.018868,0.037736,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037736,0.0,0.0,0.018868,0.0,0.0,0.075472,0.018868,0.0,0.018868,0.0,0.0,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.018868,0.0,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.075472,0.0,0.0,0.0,0.09434,0.037736,0.0,0.0,0.0,0.0,0.018868,0.018868,0.0,0.018868,0.018868,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.056604,0.0,0.018868,0.0,0.0,0.018868,0.037736,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.018868,0.018868,0.018868,0.056604,0.0,0.0,0.037736,0.0,0.0,0.0,0.0,0.0,0.018868,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.0
5,Amtala,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037736,0.0,0.0,0.018868,0.037736,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037736,0.0,0.0,0.018868,0.0,0.0,0.075472,0.018868,0.0,0.018868,0.0,0.0,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.018868,0.0,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.075472,0.0,0.0,0.0,0.09434,0.037736,0.0,0.0,0.0,0.0,0.018868,0.018868,0.0,0.018868,0.018868,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.056604,0.0,0.018868,0.0,0.0,0.018868,0.037736,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.018868,0.018868,0.018868,0.056604,0.0,0.0,0.037736,0.0,0.0,0.0,0.0,0.0,0.018868,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.0
6,Anandapur,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0625,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Ankurhati,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037736,0.0,0.0,0.018868,0.037736,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037736,0.0,0.0,0.018868,0.0,0.0,0.075472,0.018868,0.0,0.018868,0.0,0.0,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.018868,0.0,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.075472,0.0,0.0,0.0,0.09434,0.037736,0.0,0.0,0.0,0.0,0.018868,0.018868,0.0,0.018868,0.018868,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.056604,0.0,0.018868,0.0,0.0,0.018868,0.037736,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.018868,0.018868,0.018868,0.056604,0.0,0.0,0.037736,0.0,0.0,0.0,0.0,0.0,0.018868,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.0
8,Argari,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037736,0.0,0.0,0.018868,0.037736,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037736,0.0,0.0,0.018868,0.0,0.0,0.075472,0.018868,0.0,0.018868,0.0,0.0,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.018868,0.0,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.075472,0.0,0.0,0.0,0.09434,0.037736,0.0,0.0,0.0,0.0,0.018868,0.018868,0.0,0.018868,0.018868,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.056604,0.0,0.018868,0.0,0.0,0.018868,0.037736,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.018868,0.018868,0.018868,0.056604,0.0,0.0,0.037736,0.0,0.0,0.0,0.0,0.0,0.018868,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.0
9,Asuti,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037736,0.0,0.0,0.018868,0.037736,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037736,0.0,0.0,0.018868,0.0,0.0,0.075472,0.018868,0.0,0.018868,0.0,0.0,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.018868,0.0,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.075472,0.0,0.0,0.0,0.09434,0.037736,0.0,0.0,0.0,0.0,0.018868,0.018868,0.0,0.018868,0.018868,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.056604,0.0,0.018868,0.0,0.0,0.018868,0.037736,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.018868,0.018868,0.018868,0.056604,0.0,0.0,0.037736,0.0,0.0,0.0,0.0,0.0,0.018868,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.018868,0.0,0.0,0.0,0.0
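Taking the mean of the one-hot columns per neighborhood gives, for each category, the number of that neighborhood's venues in the category divided by its total number of returned venues. For example, Abhirampur returned 53 venues, so a category seen once gets 1/53 ≈ 0.018868 and the 0.09434 value corresponds to 5/53; Anandapur returned 16 venues, so its Shopping Mall value of 0.125 means 2 of them are malls. A stdlib sketch of the same computation follows; the helper name `category_frequencies` and the sample category list are hypothetical.

```python
from collections import Counter


def category_frequencies(categories):
    """Share of venues in each category; equals the mean of the one-hot columns."""
    counts = Counter(categories)
    total = len(categories)
    return {category: count / total for category, count in counts.items()}


# hypothetical list of 16 venue categories for one neighborhood
venue_categories = ["Shopping Mall"] * 2 + ["Café"] * 14
freqs = category_frequencies(venue_categories)
print(freqs["Shopping Mall"])  # 0.125

# the Abhirampur values above: 1 and 5 venues out of 53
print(round(1 / 53, 6), round(5 / 53, 6))  # 0.018868 0.09434
```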


In [80]:
len(kl_grouped[kl_grouped["Shopping Mall"] > 0])

35

#### Create a new DataFrame for Shopping Mall data only

In [81]:
kl_mall = kl_grouped[["Neighborhoods","Shopping Mall"]]
kl_mall.head()

| | Neighborhoods | Shopping Mall |
|---|---|---|
| 0 | Abhirampur | 0.0 |
| 1 | Agarpara | 0.0 |
| 2 | Ajoy Nagar | 0.0 |
| 3 | Alipore | 0.0 |
| 4 | Amodghata | 0.0 |


#### 7. Cluster Neighborhoods

#### Run k-means to cluster the neighborhoods in Kolkata into 3 clusters.

In [82]:
# set number of clusters
kclusters = 3

# k-means uses only the Shopping Mall share as its feature
kl_clustering = kl_mall.drop(["Neighborhoods"], axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(kl_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1,
       1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 2, 1, 0, 1,
       1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 2, 0, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1],
      dtype=int32)
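`kl_clustering` holds a single feature, the Shopping Mall share, so k-means here effectively splits the neighborhoods into three ranges of that one number. A pure-Python sketch of Lloyd's algorithm, the classic assign-then-average iteration behind k-means, shows the idea. It is not scikit-learn's implementation, which among other things uses k-means++ initialization and multiple restarts; the evenly spaced initialization and the sample shares below are assumptions for illustration.

```python
def kmeans_1d(values, k, iterations=100):
    """Lloyd's algorithm on a list of floats; returns (labels, centers)."""
    ordered = sorted(values)
    # initialize centers at evenly spaced positions in the sorted data (an assumption)
    centers = [ordered[(len(ordered) - 1) * i // max(k - 1, 1)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # assignment step: each value goes to its nearest center
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # update step: each center moves to the mean of its members (kept if empty)
        new_centers = []
        for c in range(k):
            members = [v for v, label in zip(values, labels) if label == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_centers == centers:  # converged: assignments no longer change the centers
            break
        centers = new_centers
    return labels, centers


# hypothetical Shopping Mall shares
shares = [0.0, 0.0, 0.0, 0.03, 0.05, 0.06, 0.12, 0.14, 0.16]
labels, centers = kmeans_1d(shares, k=3)
print(labels)                            # [0, 0, 0, 1, 1, 1, 2, 2, 2]
print([round(c, 4) for c in centers])    # [0.0, 0.0467, 0.14]
```

In this sketch the labels follow the order of the initial centers, whereas scikit-learn's label numbers carry no order; in the run above, cluster 1 holds the lowest shares.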

In [85]:
# create a new dataframe that includes the Shopping Mall share and the cluster label for each neighborhood.
kl_merged = kl_mall.copy()

# add clustering labels
kl_merged["Cluster Labels"] = kmeans.labels_

In [86]:
kl_merged.rename(columns={"Neighborhoods": "Neighborhood"}, inplace=True)
kl_merged

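| | Neighborhood | Shopping Mall | Cluster Labels |
|---|---|---|---|
| 0 | Abhirampur | 0.0 | 1 |
| 1 | Agarpara | 0.0 | 1 |
| 2 | Ajoy Nagar | 0.0 | 1 |
| 3 | Alipore | 0.0 | 1 |
| 4 | Amodghata | 0.0 | 1 |
| 5 | Amtala | 0.0 | 1 |
| 6 | Anandapur | 0.125 | 2 |
| 7 | Ankurhati | 0.0 | 1 |
| 8 | Argari | 0.0 | 1 |
| 9 | Asuti | 0.0 | 1 |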


In [87]:
# join kl_merged with kl_df to add the latitude/longitude for each neighborhood
kl_merged = kl_merged.join(kl_df.set_index("Neighborhood"), on="Neighborhood")

print(kl_merged.shape)
kl_merged # check the last columns!

(198, 5)


| | Neighborhood | Shopping Mall | Cluster Labels | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | Abhirampur | 0.0 | 1 | 22.56263 | 88.36304 |
| 1 | Agarpara | 0.0 | 1 | 22.56263 | 88.36304 |
| 2 | Ajoy Nagar | 0.0 | 1 | 22.56263 | 88.36304 |
| 3 | Alipore | 0.0 | 1 | 22.539171 | 88.327278 |
| 4 | Amodghata | 0.0 | 1 | 22.56263 | 88.36304 |
| 5 | Amtala | 0.0 | 1 | 22.56263 | 88.36304 |
| 6 | Anandapur | 0.125 | 2 | 22.514256 | 88.409886 |
| 7 | Ankurhati | 0.0 | 1 | 22.56263 | 88.36304 |
| 8 | Argari | 0.0 | 1 | 22.56263 | 88.36304 |
| 9 | Asuti | 0.0 | 1 | 22.56263 | 88.36304 |
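`kl_grouped` has 197 rows, yet the join returns 198. `DataFrame.join` emits one output row for every matching row of `kl_df`, so the extra row suggests that some neighborhood name appears twice in `kl_df`. The notebook does not show which name it is; in pandas, `kl_df[kl_df['Neighborhood'].duplicated(keep=False)]` would list the repeated rows. The stdlib sketch below shows the same check on a plain list; the helper `duplicated_names` and the sample names are hypothetical.

```python
from collections import Counter


def duplicated_names(names):
    """Return the names that occur more than once, in first-seen order."""
    counts = Counter(names)
    seen = set()
    repeats = []
    for name in names:
        if counts[name] > 1 and name not in seen:
            repeats.append(name)
            seen.add(name)
    return repeats


# hypothetical sample with one repeated neighborhood
names = ["Alipore", "Kasba", "Jetia", "Kasba"]
print(duplicated_names(names))  # ['Kasba']
```

If a duplicate turns up, `kl_df.drop_duplicates(subset='Neighborhood')` before the join would keep one row per neighborhood.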


In [88]:
# sort the results by Cluster Labels
print(kl_merged.shape)
kl_merged.sort_values(["Cluster Labels"], inplace=True)
kl_merged

(198, 5)


| | Neighborhood | Shopping Mall | Cluster Labels | Latitude | Longitude |
|---|---|---|---|---|---|
| 100 | Dhakuria | 0.031746 | 0 | 22.507255 | 88.36733 |
| 56 | Bijoygarh | 0.034483 | 0 | 22.48727 | 88.368559 |
| 58 | Bikramgarh | 0.057143 | 0 | 22.496641 | 88.36161 |
| 152 | Jodhpur Park | 0.028571 | 0 | 22.505606 | 88.363674 |
| 134 | Hastings | 0.03125 | 0 | 22.545934 | 88.327139 |
| 130 | Haltu | 0.076923 | 0 | 22.510785 | 88.381874 |
| 160 | Kalikapur | 0.0625 | 0 | 22.502229 | 88.387492 |
| 127 | Golf Green | 0.0625 | 0 | 22.491333 | 88.361838 |
| 126 | Gobindapur | 0.032258 | 0 | 22.501995 | 88.357767 |
| 168 | Kankurgachi | 0.045455 | 0 | 22.578972 | 88.391517 |


#### Finally, let's visualize the resulting clusters

In [89]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map, colored by cluster label (labels run from 0 to kclusters - 1)
for lat, lon, poi, cluster in zip(kl_merged['Latitude'], kl_merged['Longitude'], kl_merged['Neighborhood'], kl_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

In [90]:
# save the map as HTML file
map_clusters.save('map_clusters.html')

In [91]:
# note: folium's save() always writes HTML, so this file contains HTML despite its .png extension
map_clusters.save('map_clusters.png')

#### 8. Examine Clusters

#### Cluster 0

In [92]:
kl_merged.loc[kl_merged['Cluster Labels'] == 0]

| | Neighborhood | Shopping Mall | Cluster Labels | Latitude | Longitude |
|---|---|---|---|---|---|
| 100 | Dhakuria | 0.031746 | 0 | 22.507255 | 88.36733 |
| 56 | Bijoygarh | 0.034483 | 0 | 22.48727 | 88.368559 |
| 58 | Bikramgarh | 0.057143 | 0 | 22.496641 | 88.36161 |
| 152 | Jodhpur Park | 0.028571 | 0 | 22.505606 | 88.363674 |
| 134 | Hastings | 0.03125 | 0 | 22.545934 | 88.327139 |
| 130 | Haltu | 0.076923 | 0 | 22.510785 | 88.381874 |
| 160 | Kalikapur | 0.0625 | 0 | 22.502229 | 88.387492 |
| 127 | Golf Green | 0.0625 | 0 | 22.491333 | 88.361838 |
| 126 | Gobindapur | 0.032258 | 0 | 22.501995 | 88.357767 |
| 168 | Kankurgachi | 0.045455 | 0 | 22.578972 | 88.391517 |


#### Cluster 1

In [93]:
kl_merged.loc[kl_merged['Cluster Labels'] == 1]

| | Neighborhood | Shopping Mall | Cluster Labels | Latitude | Longitude |
|---|---|---|---|---|---|
| 106 | Dum Dum | 0.0 | 1 | 22.621114 | 88.392897 |
| 131 | Hanspukuria | 0.0 | 1 | 22.56263 | 88.36304 |
| 132 | Haridevpur | 0.0 | 1 | 22.56263 | 88.36304 |
| 133 | Harinavi | 0.0 | 1 | 22.56263 | 88.36304 |
| 105 | Dhuilya | 0.0 | 1 | 22.56263 | 88.36304 |
| 113 | Entally | 0.0 | 1 | 22.56263 | 88.36304 |
| 137 | Hatibagan | 0.0 | 1 | 22.56263 | 88.36304 |
| 138 | Hind Motor | 0.0 | 1 | 22.56263 | 88.36304 |
| 103 | Dharmapur | 0.0 | 1 | 22.567746 | 88.347602 |
| 139 | Howrah | 0.0 | 1 | 22.58419 | 88.341251 |


#### Cluster 2

In [94]:
kl_merged.loc[kl_merged['Cluster Labels'] == 2]

| | Neighborhood | Shopping Mall | Cluster Labels | Latitude | Longitude |
|---|---|---|---|---|---|
| 6 | Anandapur | 0.125 | 2 | 22.514256 | 88.409886 |
| 54 | Bidhannagar | 0.136364 | 2 | 22.58162 | 88.452387 |
| 150 | Jetia | 0.166667 | 2 | 22.564982 | 88.313928 |
| 171 | Kasba | 0.1 | 2 | 22.514337 | 88.405636 |


### Observations:
Shopping malls make up the largest share of the returned venues in the four neighborhoods of cluster 2 (Anandapur, Bidhannagar, Jetia and Kasba), where they account for 10% to about 17% of venues, and a moderate share in cluster 0. Neighborhoods in cluster 1, on the other hand, have very few or no shopping malls among their venues. This makes cluster 1 the most promising area for a new shopping mall, since there is little to no competition from existing malls. Shopping malls in cluster 2, by contrast, are likely to face intense competition because of the high concentration of malls in those four areas, which points to an oversupply there.

Therefore, this project recommends that property developers capitalize on these findings by opening new shopping malls in cluster 1 neighborhoods, where competition is low. Developers with a unique selling proposition that lets them stand out from the competition could also open new shopping malls in cluster 0 neighborhoods, where competition is moderate. Lastly, developers are advised to avoid cluster 2 neighborhoods, which already have a high concentration of shopping malls and face intense competition.
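The observations compare clusters by the share of shopping malls among each neighborhood's returned venues. One way to back that comparison with numbers is to summarize the share per cluster; in pandas this could be `kl_merged.groupby('Cluster Labels')['Shopping Mall'].agg(['count', 'mean', 'max'])`. The stdlib sketch below does the same for (label, share) pairs. Only the cluster 2 values are taken from the tables above (all four of its members are listed there); the other pairs are hypothetical placeholders, and the helper name `summarize_clusters` is illustrative.

```python
from collections import defaultdict


def summarize_clusters(pairs):
    """Map cluster label -> (count, mean share, max share) from (label, share) pairs."""
    shares = defaultdict(list)
    for label, share in pairs:
        shares[label].append(share)
    return {
        label: (len(values), sum(values) / len(values), max(values))
        for label, values in sorted(shares.items())
    }


cluster_2 = [(2, 0.125), (2, 0.136364), (2, 0.166667), (2, 0.1)]  # from the Cluster 2 table
placeholders = [(0, 0.03), (0, 0.06), (1, 0.0), (1, 0.0)]          # hypothetical values
for label, (count, mean, top) in summarize_clusters(cluster_2 + placeholders).items():
    print(label, count, round(mean, 4), top)
```

For cluster 2 the mean share works out to about 0.132, meaning roughly one in eight returned venues in those neighborhoods is a shopping mall.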