# IBM Applied Data Science Capstone Course by Coursera¶


<b>Week 5 Final Report
* Opening a New Shopping Mall in Jaipur, India.</b>


* Build a dataframe of neighborhoods in Jaipur, India by web scraping the data from Wikipedia page
* Get the geographical coordinates of the neighborhoods
* Obtain the venue data for the neighborhoods from Foursquare API
* Explore and cluster the neighborhoods
* Select the best cluster to open a new shopping mall

In [3]:
#Importing Libraries

import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analsysis
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import geocoder # to get coordinates

import requests # library to handle requests
from bs4 import BeautifulSoup # library to parse HTML and XML documents

from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

print("Libraries imported.")

Libraries imported.


In [4]:
### Scraping Data from wikipedia API fr jaipur
# send the GET request
data = requests.get("https://en.wikipedia.org/wiki/Category:Neighbourhoods_in_Jaipur").text

In [5]:

# parse data from the html into a beautifulsoup object
soup = BeautifulSoup(data, 'html.parser')

In [7]:
# create a list to store neighborhood data
neighborhoodList = []

In [8]:
# append the data into the list
for row in soup.find_all("div", class_="mw-category")[0].findAll("li"):
    neighborhoodList.append(row.text)

In [9]:
# create a new DataFrame from the list
jai_df = pd.DataFrame({"Neighborhood": neighborhoodList})

jai_df.head()

Unnamed: 0,Neighborhood
0,"Anand Lok, Jaipur"
1,Bhankrota
2,Chandpole
3,Hawa Mahal
4,Jawahar Circle


In [10]:
# print the number of rows of the dataframe
jai_df.shape

(14, 1)

<b> Getting Geo coordinates( longitude, lattitude)</b>


In [11]:
# define a function to get coordinates
def get_latlng(neighborhood):
    # initialize your variable to None
    lat_lng_coords = None
    # loop until you get the coordinates
    while(lat_lng_coords is None):
        g = geocoder.arcgis('{}, Kuala Lumpur, Malaysia'.format(neighborhood))
        lat_lng_coords = g.latlng
    return lat_lng_coords

In [13]:
# call the function to get the coordinates, store in a new list using list comprehension
coords = [ get_latlng(neighborhood) for neighborhood in jai_df["Neighborhood"].tolist() ]

In [14]:
# create temporary dataframe to populate the coordinates into Latitude and Longitude
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])

In [15]:
# merge the coordinates into the original dataframe
jai_df['Latitude'] = df_coords['Latitude']
jai_df['Longitude'] = df_coords['Longitude']

In [16]:
# check the neighborhoods and the coordinates
print(jai_df.shape)
jai_df

(14, 3)


Unnamed: 0,Neighborhood,Latitude,Longitude
0,"Anand Lok, Jaipur",3.14789,101.69405
1,Bhankrota,3.14789,101.69405
2,Chandpole,3.14789,101.69405
3,Hawa Mahal,3.14789,101.69405
4,Jawahar Circle,3.08243,101.67083
5,Jhotwara,3.14789,101.69405
6,Kathputhli slum,3.14789,101.69405
7,Malviya Nagar (Jaipur),3.14789,101.69405
8,Mansarovar (Jaipur),3.14789,101.69405
9,"Pratap Nagar, Jaipur",3.14789,101.69405


In [18]:
# save the DataFrame as CSV file
jai_df.to_csv("jai_df.csv", index=False)

<b>Create a map of Kuala Lumpur with neighborhoods superimposed on top</b>

In [20]:
# get the coordinates of Kuala Lumpur
address = 'Jaipur, India'

geolocator = Nominatim(user_agent="my-application")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Kuala Lumpur, Malaysiae {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Kuala Lumpur, Malaysiae 26.916194, 75.820349.


In [23]:
# create map of Toronto using latitude and longitude values
map_jai = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, neighborhood in zip(jai_df['Latitude'], jai_df['Longitude'], jai_df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_kl)  
    
map_jai

In [24]:
# save the map as HTML file
map_jai.save('map_jai.html')

# Use the Foursquare API to explore the neighborhoods¶

In [56]:
# define Foursquare Credentials and Version
CLIENT_ID = 'DWR42D******************' # your Foursquare ID
CLIENT_SECRET = 'U1DUUZO***************'# your Foursquare Secret
VERSION = '20190211' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: DWR42D******************
CLIENT_SECRET:U1DUUZO***************


In [27]:
# Now, let's get the top 100 venues that are within a radius of 5000 meters.

radius = 5000
LIMIT = 100

venues = []

for lat, long, neighborhood in zip(jai_df['Latitude'], jai_df['Longitude'], jai_df['Neighborhood']):
    
    # create the API request URL
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    # make the GET request
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    # return only relevant information for each nearby venue
    for venue in results:
        venues.append((
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))

In [28]:
# convert the venues list into a new DataFrame
venues_df = pd.DataFrame(venues)

# define the column names
venues_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head()

(1400, 7)


Unnamed: 0,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,"Anand Lok, Jaipur",3.14789,101.69405,Urbanscapes House,3.146803,101.696028,Exhibit
1,"Anand Lok, Jaipur",3.14789,101.69405,Islamic Arts Museum Malaysia,3.141595,101.689837,Museum
2,"Anand Lok, Jaipur",3.14789,101.69405,Shin Kee Beef Noodles,3.145058,101.696877,Noodle House
3,"Anand Lok, Jaipur",3.14789,101.69405,茨厂街驰名罗汉果龙眼冰糖炖冬瓜 (Air Mata Kucing),3.144269,101.697703,Food Truck
4,"Anand Lok, Jaipur",3.14789,101.69405,Family Mart,3.145195,101.698606,Convenience Store


Let's check how many venues were returned for each neighorhood


In [29]:
venues_df.groupby(["Neighborhood"]).count()

Unnamed: 0_level_0,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Anand Lok, Jaipur",100,100,100,100,100,100
Bhankrota,100,100,100,100,100,100
Chandpole,100,100,100,100,100,100
Hawa Mahal,100,100,100,100,100,100
Jawahar Circle,100,100,100,100,100,100
Jhotwara,100,100,100,100,100,100
Kathputhli slum,100,100,100,100,100,100
Malviya Nagar (Jaipur),100,100,100,100,100,100
Mansarovar (Jaipur),100,100,100,100,100,100
"Pratap Nagar, Jaipur",100,100,100,100,100,100


find out how many unique categories in venues

In [30]:
print('There are {} uniques categories.'.format(len(venues_df['VenueCategory'].unique())))

There are 80 uniques categories.


In [31]:
# print out the list of categories
venues_df['VenueCategory'].unique()

array(['Exhibit', 'Museum', 'Noodle House', 'Food Truck',
       'Convenience Store', 'Hotel', 'Speakeasy', 'Dessert Shop',
       'Malay Restaurant', 'Chettinad Restaurant', 'Park', 'Café',
       'Indian Restaurant', 'Monument / Landmark',
       'Latin American Restaurant', 'IT Services', 'Chinese Restaurant',
       'Gym', 'Hotel Pool', 'Halal Restaurant', 'Bar', 'Coffee Shop',
       'Tapas Restaurant', 'Spanish Restaurant', 'Gift Shop',
       'Restaurant', 'Hotel Bar', 'Italian Restaurant', 'Shopping Mall',
       'Department Store', 'Stationery Store', 'Resort',
       'Vegetarian / Vegan Restaurant', 'Building', 'Bookstore',
       'Japanese Restaurant', 'Asian Restaurant', 'Dim Sum Restaurant',
       'Spa', 'Udon Restaurant', 'Beer Bar', 'Lounge', 'Jewelry Store',
       'Juice Bar', 'Cosmetics Shop', 'Hainan Restaurant', 'Pool',
       'Hostel', 'Cocktail Bar', 'Multiplex',
       'Tourist Information Center', 'Boutique', 'Seafood Restaurant',
       'Tea Room', 'Thai Resta

In [32]:
# check if the results contain "Shopping Mall"
"Neighborhood" in venues_df['VenueCategory'].unique()

False

<b>Analyze Each Neighborhood</b>

In [36]:
# one hot encoding
jai_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
jai_onehot['Neighborhoods'] = venues_df['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [jai_onehot.columns[-1]] + list(jai_onehot.columns[:-1])
jai_onehot = jai_onehot[fixed_columns]

print(jai_onehot.shape)
jai_onehot.head()

(1400, 81)


Unnamed: 0,Neighborhoods,Asian Restaurant,BBQ Joint,Bakery,Bar,Beer Bar,Bookstore,Boutique,Bubble Tea Shop,Building,Burger Joint,Café,Chettinad Restaurant,Chinese Restaurant,Cocktail Bar,Coffee Shop,Convenience Store,Cosmetics Shop,Department Store,Dessert Shop,Dim Sum Restaurant,Exhibit,Fish Market,Food Truck,Fruit & Vegetable Store,Gift Shop,Gym,Hainan Restaurant,Halal Restaurant,Hostel,Hotel,Hotel Bar,Hotel Pool,Hotpot Restaurant,Hunan Restaurant,IT Services,Ice Cream Shop,Indian Restaurant,Indonesian Restaurant,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Karaoke Bar,Korean Restaurant,Latin American Restaurant,Lounge,Malay Restaurant,Massage Studio,Middle Eastern Restaurant,Monument / Landmark,Motorcycle Shop,Movie Theater,Multiplex,Museum,Nightclub,Noodle House,Park,Pet Store,Pool,Residential Building (Apartment / Condo),Resort,Restaurant,Seafood Restaurant,Shopping Mall,South Indian Restaurant,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Stationery Store,Supermarket,Tapas Restaurant,Tea Room,Thai Restaurant,Tourist Information Center,Trail,Udon Restaurant,Vegetarian / Vegan Restaurant,Wine Bar,Women's Store
0,"Anand Lok, Jaipur",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,"Anand Lok, Jaipur",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,"Anand Lok, Jaipur",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,"Anand Lok, Jaipur",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,"Anand Lok, Jaipur",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


Next, let's group rows by neighborhood and by taking the mean of the frequency of occurrence of each category

In [37]:
jai_grouped = jai_onehot.groupby(["Neighborhoods"]).mean().reset_index()

print(jai_grouped.shape)
jai_grouped

(14, 81)


Unnamed: 0,Neighborhoods,Asian Restaurant,BBQ Joint,Bakery,Bar,Beer Bar,Bookstore,Boutique,Bubble Tea Shop,Building,Burger Joint,Café,Chettinad Restaurant,Chinese Restaurant,Cocktail Bar,Coffee Shop,Convenience Store,Cosmetics Shop,Department Store,Dessert Shop,Dim Sum Restaurant,Exhibit,Fish Market,Food Truck,Fruit & Vegetable Store,Gift Shop,Gym,Hainan Restaurant,Halal Restaurant,Hostel,Hotel,Hotel Bar,Hotel Pool,Hotpot Restaurant,Hunan Restaurant,IT Services,Ice Cream Shop,Indian Restaurant,Indonesian Restaurant,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Karaoke Bar,Korean Restaurant,Latin American Restaurant,Lounge,Malay Restaurant,Massage Studio,Middle Eastern Restaurant,Monument / Landmark,Motorcycle Shop,Movie Theater,Multiplex,Museum,Nightclub,Noodle House,Park,Pet Store,Pool,Residential Building (Apartment / Condo),Resort,Restaurant,Seafood Restaurant,Shopping Mall,South Indian Restaurant,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Stationery Store,Supermarket,Tapas Restaurant,Tea Room,Thai Restaurant,Tourist Information Center,Trail,Udon Restaurant,Vegetarian / Vegan Restaurant,Wine Bar,Women's Store
0,"Anand Lok, Jaipur",0.01,0.0,0.0,0.02,0.01,0.01,0.02,0.0,0.01,0.0,0.05,0.01,0.04,0.02,0.02,0.03,0.01,0.01,0.01,0.02,0.01,0.0,0.01,0.0,0.01,0.01,0.01,0.01,0.01,0.14,0.02,0.01,0.0,0.0,0.01,0.0,0.02,0.0,0.01,0.02,0.02,0.01,0.0,0.0,0.01,0.04,0.03,0.0,0.0,0.02,0.0,0.0,0.01,0.01,0.0,0.02,0.03,0.0,0.01,0.0,0.01,0.01,0.01,0.04,0.0,0.01,0.01,0.01,0.0,0.01,0.0,0.02,0.01,0.01,0.01,0.0,0.01,0.01,0.0,0.01
1,Bhankrota,0.01,0.0,0.0,0.02,0.01,0.01,0.02,0.0,0.01,0.0,0.05,0.01,0.04,0.02,0.02,0.03,0.01,0.01,0.01,0.02,0.01,0.0,0.01,0.0,0.01,0.01,0.01,0.01,0.01,0.14,0.02,0.01,0.0,0.0,0.01,0.0,0.02,0.0,0.01,0.02,0.02,0.01,0.0,0.0,0.01,0.04,0.03,0.0,0.0,0.02,0.0,0.0,0.01,0.01,0.0,0.02,0.03,0.0,0.01,0.0,0.01,0.01,0.01,0.04,0.0,0.01,0.01,0.01,0.0,0.01,0.0,0.02,0.01,0.01,0.01,0.0,0.01,0.01,0.0,0.01
2,Chandpole,0.01,0.0,0.0,0.02,0.01,0.01,0.02,0.0,0.01,0.0,0.05,0.01,0.04,0.02,0.02,0.03,0.01,0.01,0.01,0.02,0.01,0.0,0.01,0.0,0.01,0.01,0.01,0.01,0.01,0.14,0.02,0.01,0.0,0.0,0.01,0.0,0.02,0.0,0.01,0.02,0.02,0.01,0.0,0.0,0.01,0.04,0.03,0.0,0.0,0.02,0.0,0.0,0.01,0.01,0.0,0.02,0.03,0.0,0.01,0.0,0.01,0.01,0.01,0.04,0.0,0.01,0.01,0.01,0.0,0.01,0.0,0.02,0.01,0.01,0.01,0.0,0.01,0.01,0.0,0.01
3,Hawa Mahal,0.01,0.0,0.0,0.02,0.01,0.01,0.02,0.0,0.01,0.0,0.05,0.01,0.04,0.02,0.02,0.03,0.01,0.01,0.01,0.02,0.01,0.0,0.01,0.0,0.01,0.01,0.01,0.01,0.01,0.14,0.02,0.01,0.0,0.0,0.01,0.0,0.02,0.0,0.01,0.02,0.02,0.01,0.0,0.0,0.01,0.04,0.03,0.0,0.0,0.02,0.0,0.0,0.01,0.01,0.0,0.02,0.03,0.0,0.01,0.0,0.01,0.01,0.01,0.04,0.0,0.01,0.01,0.01,0.0,0.01,0.0,0.02,0.01,0.01,0.01,0.0,0.01,0.01,0.0,0.01
4,Jawahar Circle,0.03,0.02,0.03,0.01,0.0,0.01,0.0,0.01,0.01,0.01,0.1,0.0,0.13,0.0,0.04,0.03,0.0,0.0,0.01,0.0,0.0,0.01,0.03,0.01,0.0,0.01,0.0,0.01,0.0,0.02,0.0,0.0,0.02,0.01,0.0,0.03,0.0,0.01,0.01,0.06,0.0,0.01,0.01,0.02,0.0,0.0,0.05,0.01,0.01,0.0,0.01,0.01,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.02,0.0,0.03,0.01,0.02,0.02,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.01,0.0
5,Jhotwara,0.01,0.0,0.0,0.02,0.01,0.01,0.02,0.0,0.01,0.0,0.05,0.01,0.04,0.02,0.02,0.03,0.01,0.01,0.01,0.02,0.01,0.0,0.01,0.0,0.01,0.01,0.01,0.01,0.01,0.14,0.02,0.01,0.0,0.0,0.01,0.0,0.02,0.0,0.01,0.02,0.02,0.01,0.0,0.0,0.01,0.04,0.03,0.0,0.0,0.02,0.0,0.0,0.01,0.01,0.0,0.02,0.03,0.0,0.01,0.0,0.01,0.01,0.01,0.04,0.0,0.01,0.01,0.01,0.0,0.01,0.0,0.02,0.01,0.01,0.01,0.0,0.01,0.01,0.0,0.01
6,Kathputhli slum,0.01,0.0,0.0,0.02,0.01,0.01,0.02,0.0,0.01,0.0,0.05,0.01,0.04,0.02,0.02,0.03,0.01,0.01,0.01,0.02,0.01,0.0,0.01,0.0,0.01,0.01,0.01,0.01,0.01,0.14,0.02,0.01,0.0,0.0,0.01,0.0,0.02,0.0,0.01,0.02,0.02,0.01,0.0,0.0,0.01,0.04,0.03,0.0,0.0,0.02,0.0,0.0,0.01,0.01,0.0,0.02,0.03,0.0,0.01,0.0,0.01,0.01,0.01,0.04,0.0,0.01,0.01,0.01,0.0,0.01,0.0,0.02,0.01,0.01,0.01,0.0,0.01,0.01,0.0,0.01
7,Malviya Nagar (Jaipur),0.01,0.0,0.0,0.02,0.01,0.01,0.02,0.0,0.01,0.0,0.05,0.01,0.04,0.02,0.02,0.03,0.01,0.01,0.01,0.02,0.01,0.0,0.01,0.0,0.01,0.01,0.01,0.01,0.01,0.14,0.02,0.01,0.0,0.0,0.01,0.0,0.02,0.0,0.01,0.02,0.02,0.01,0.0,0.0,0.01,0.04,0.03,0.0,0.0,0.02,0.0,0.0,0.01,0.01,0.0,0.02,0.03,0.0,0.01,0.0,0.01,0.01,0.01,0.04,0.0,0.01,0.01,0.01,0.0,0.01,0.0,0.02,0.01,0.01,0.01,0.0,0.01,0.01,0.0,0.01
8,Mansarovar (Jaipur),0.01,0.0,0.0,0.02,0.01,0.01,0.02,0.0,0.01,0.0,0.05,0.01,0.04,0.02,0.02,0.03,0.01,0.01,0.01,0.02,0.01,0.0,0.01,0.0,0.01,0.01,0.01,0.01,0.01,0.14,0.02,0.01,0.0,0.0,0.01,0.0,0.02,0.0,0.01,0.02,0.02,0.01,0.0,0.0,0.01,0.04,0.03,0.0,0.0,0.02,0.0,0.0,0.01,0.01,0.0,0.02,0.03,0.0,0.01,0.0,0.01,0.01,0.01,0.04,0.0,0.01,0.01,0.01,0.0,0.01,0.0,0.02,0.01,0.01,0.01,0.0,0.01,0.01,0.0,0.01
9,"Pratap Nagar, Jaipur",0.01,0.0,0.0,0.02,0.01,0.01,0.02,0.0,0.01,0.0,0.05,0.01,0.04,0.02,0.02,0.03,0.01,0.01,0.01,0.02,0.01,0.0,0.01,0.0,0.01,0.01,0.01,0.01,0.01,0.14,0.02,0.01,0.0,0.0,0.01,0.0,0.02,0.0,0.01,0.02,0.02,0.01,0.0,0.0,0.01,0.04,0.03,0.0,0.0,0.02,0.0,0.0,0.01,0.01,0.0,0.02,0.03,0.0,0.01,0.0,0.01,0.01,0.01,0.04,0.0,0.01,0.01,0.01,0.0,0.01,0.0,0.02,0.01,0.01,0.01,0.0,0.01,0.01,0.0,0.01


In [39]:
len(jai_grouped[jai_grouped["Shopping Mall"] > 0])

14

<b>Create a new DataFrame for Shopping Mall data only</b>

In [42]:
jai_mall = jai_grouped[["Neighborhoods","Shopping Mall"]]


In [43]:
jai_mall.head()


Unnamed: 0,Neighborhoods,Shopping Mall
0,"Anand Lok, Jaipur",0.04
1,Bhankrota,0.04
2,Chandpole,0.04
3,Hawa Mahal,0.04
4,Jawahar Circle,0.02


###  k-means to cluster the neighborhoods in Jaipur into 3 clusters

In [44]:
# set number of clusters
kclusters = 3

jai_clustering = jai_mall.drop(["Neighborhoods"], 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(jai_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

  return_n_iter=True)


array([0, 0, 0, 0, 1, 0, 0, 0, 0, 0])

In [45]:
# create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.
jai_merged = jai_mall.copy()

# add clustering labels
jai_merged["Cluster Labels"] = kmeans.labels_

In [46]:
jai_merged.rename(columns={"Neighborhoods": "Neighborhood"}, inplace=True)
jai_merged.head()

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels
0,"Anand Lok, Jaipur",0.04,0
1,Bhankrota,0.04,0
2,Chandpole,0.04,0
3,Hawa Mahal,0.04,0
4,Jawahar Circle,0.02,1


In [47]:
# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
jai_merged = jai_merged.join(jai_df.set_index("Neighborhood"), on="Neighborhood")

print(jai_merged.shape)
jai_merged.head() # check the last columns!

(14, 5)


Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
0,"Anand Lok, Jaipur",0.04,0,3.14789,101.69405
1,Bhankrota,0.04,0,3.14789,101.69405
2,Chandpole,0.04,0,3.14789,101.69405
3,Hawa Mahal,0.04,0,3.14789,101.69405
4,Jawahar Circle,0.02,1,3.08243,101.67083


In [48]:
# sort the results by Cluster Labels
print(jai_merged.shape)
jai_merged.sort_values(["Cluster Labels"], inplace=True)
jai_merged

(14, 5)


Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
0,"Anand Lok, Jaipur",0.04,0,3.14789,101.69405
1,Bhankrota,0.04,0,3.14789,101.69405
2,Chandpole,0.04,0,3.14789,101.69405
3,Hawa Mahal,0.04,0,3.14789,101.69405
5,Jhotwara,0.04,0,3.14789,101.69405
6,Kathputhli slum,0.04,0,3.14789,101.69405
7,Malviya Nagar (Jaipur),0.04,0,3.14789,101.69405
8,Mansarovar (Jaipur),0.04,0,3.14789,101.69405
9,"Pratap Nagar, Jaipur",0.04,0,3.14789,101.69405
10,Sanganer,0.04,0,3.14789,101.69405


# Finally, let's visualize the resulting clusters


In [49]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(jai_merged['Latitude'], jai_merged['Longitude'], jai_merged['Neighborhood'], jai_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [50]:
# save the map as HTML file
map_clusters.save('map_clusters.html')

# Examine Clusters

In [52]:
#cluster 0
jai_merged.loc[jai_merged['Cluster Labels'] == 0]


Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
0,"Anand Lok, Jaipur",0.04,0,3.14789,101.69405
1,Bhankrota,0.04,0,3.14789,101.69405
2,Chandpole,0.04,0,3.14789,101.69405
3,Hawa Mahal,0.04,0,3.14789,101.69405
5,Jhotwara,0.04,0,3.14789,101.69405
6,Kathputhli slum,0.04,0,3.14789,101.69405
7,Malviya Nagar (Jaipur),0.04,0,3.14789,101.69405
8,Mansarovar (Jaipur),0.04,0,3.14789,101.69405
9,"Pratap Nagar, Jaipur",0.04,0,3.14789,101.69405
10,Sanganer,0.04,0,3.14789,101.69405


In [54]:
#cluster 1
jai_merged.loc[jai_merged['Cluster Labels'] == 1]

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
4,Jawahar Circle,0.02,1,3.08243,101.67083


In [55]:
#cluster 2
jai_merged.loc[jai_merged['Cluster Labels'] == 2]

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude



<b> Observations:</b>


Most of the shopping malls are concentrated in the central area of Jaipur, with the highest number in cluster 0 and moderate number in cluster 1. On the other hand, cluster 2 has very low number to totally no shopping mall in the neighborhoods. This represents a great opportunity and high potential areas to open new shopping malls as there is very little to no competition from existing malls. Meanwhile, shopping malls in cluster 2 are likely suffering from intense competition due to oversupply and high concentration of shopping malls. From another perspective, this also shows that the oversupply of shopping malls mostly happened in the central area of the city, with the suburb area still have very few shopping malls. Therefore, this project recommends property developers to capitalize on these findings to open new shopping malls in neighborhoods in cluster 1 and 2 with little to no competition. Property developers with unique selling propositions to stand out from the competition can also open new shopping malls in neighborhoods in cluster 0 with moderate competition. Lastly, property developers are advised to avoid neighborhoods in cluster 0 which already have high concentration of shopping malls and suffering from intense competition.