<h1><center>Applied Data Science Capstone Project</center></h1>

<h1><center>Opening a New Shopping Mall in Ahmedabad, India</center></h1>

<ul>
   <li>Build a dataframe of neighborhoods in Ahmedabad, India by scraping the data from a Wikipedia page</li>
   <li>Get the geographical coordinates of the neighborhoods</li>
   <li>Obtain the venue data for the neighborhoods from the Foursquare API</li>
   <li>Explore and cluster the neighborhoods</li>
   <li>Select the best cluster to open a new shopping mall</li>
</ul>

<h3>1. Import Libraries</h3>

In [1]:
import numpy as np # library to handle data in a vectorized manner
!pip install geopy

import pandas as pd # library for data analysis
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from bs4 import BeautifulSoup # library to parse HTML and XML documents

from pandas import json_normalize # transform JSON into a flat pandas dataframe (top-level import, pandas 1.0 and later)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!pip install folium
import folium # map rendering library

print("Libraries imported.")

Libraries imported.


<h3>2. Scrape data from the Wikipedia page into a DataFrame</h3>

In [2]:
#send the GET request
data = requests.get("https://en.wikipedia.org/wiki/Category:Neighbourhoods_in_Ahmedabad").text
soup = BeautifulSoup(data,"html5lib")

In [3]:
# create a list to store neighborhood data
neighborhoodList = []

In [4]:
# append the text of each list item in the category listing
for row in soup.find_all("div", class_="mw-category")[0].find_all("li"):
    neighborhoodList.append(row.text)

In [5]:
# create a new DataFrame from the list
ad_df = pd.DataFrame({"Neighborhood": neighborhoodList})

ad_df.head()

Unnamed: 0,Neighborhood
0,Agol
1,Ahmedabad Cantonment
2,Alam Roza
3,Ambawadi
4,Amraiwadi


In [6]:
# print the number of rows of the dataframe
ad_df.shape

(80, 1)
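
Some of the scraped titles carry Wikipedia disambiguation suffixes, for example "Anand Nagar (Ahmedabad)" and "Navjivan (Neighbourhood)". If those suffixes turn out to confuse the geocoder in the next step, one option is to strip them first. The helper below is only a sketch: clean_neighborhood_name is a hypothetical name that does not appear in the original notebook, and whether cleaning improves the geocoding results has not been checked here.

```python
import re

def clean_neighborhood_name(name):
    """Strip a trailing parenthetical such as ' (Ahmedabad)' from a page title."""
    return re.sub(r"\s*\([^)]*\)\s*$", "", name).strip()

# Titles taken from the scraped list in this notebook.
titles = ["Agol", "Anand Nagar (Ahmedabad)", "Navjivan (Neighbourhood)"]
cleaned = [clean_neighborhood_name(t) for t in titles]
print(cleaned)
```

Titles without a trailing parenthetical, such as "Jodhpur, Gujarat", pass through unchanged.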

<h3>3. Get Geographical Coordinates</h3>

In [7]:
!pip install geocoder
import geocoder




In [8]:
# define a function to get coordinates
def get_latlng(neighborhood):
    # initialize your variable to None
    lat_lng_coords = None
    # loop until you get the coordinates
    while lat_lng_coords is None:
        g = geocoder.arcgis('{}, Ahmedabad, India'.format(neighborhood))
        lat_lng_coords = g.latlng
    return lat_lng_coords
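
The loop in get_latlng keeps retrying for as long as the geocoder returns None, so a name that can never be resolved would block the notebook. A minimal sketch of a bounded retry is shown below. The names get_latlng_bounded, lookup, max_attempts and delay_seconds are assumptions introduced for illustration, and the stub flaky_lookup stands in for the real geocoder call so the example runs offline.

```python
import time

def get_latlng_bounded(neighborhood, lookup, max_attempts=5, delay_seconds=0.0):
    """Call lookup(query) until it returns coordinates or attempts run out.

    lookup is a hypothetical callable standing in for the geocoder call;
    it should return [lat, lng] or None.
    """
    query = "{}, Ahmedabad, India".format(neighborhood)
    for _ in range(max_attempts):
        coords = lookup(query)
        if coords is not None:
            return coords
        time.sleep(delay_seconds)
    return None  # the caller decides how to handle a missing result


# Stub lookup that fails twice and then succeeds, to exercise the retry path.
calls = {"count": 0}

def flaky_lookup(query):
    calls["count"] += 1
    return None if calls["count"] < 3 else [23.02776, 72.60027]

result = get_latlng_bounded("Agol", flaky_lookup)
missing = get_latlng_bounded("Agol", lambda query: None, max_attempts=2)
print(result, missing)
```

In the notebook, lookup could be something like `lambda q: geocoder.arcgis(q).latlng`, which is the same call the original function makes. Rows that come back as None would then need to be dropped or filled in by hand before mapping.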


In [9]:
# call the function to get the coordinates, store in a new list using list comprehension
coords = [ get_latlng(neighborhood) for neighborhood in ad_df["Neighborhood"].tolist() ]


In [10]:
coords

[[23.027760000000058, 72.60027000000008],
 [23.027760000000058, 72.60027000000008],
 [23.002120000000048, 72.54979000000003],
 [23.018850000000043, 72.55441000000008],
 [23.00735000000003, 72.62268000000006],
 [23.011390000000063, 72.51712000000003],
 [23.04708000000005, 72.60481000000004],
 [23.04708000000005, 72.60481000000004],
 [22.84128000000004, 72.45453000000003],
 [23.027760000000058, 72.60027000000008],
 [23.034760000000063, 72.63024000000007],
 [23.00278000000003, 72.57706000000007],
 [22.315900000000056, 72.10697000000005],
 [23.002575410797863, 72.59815911107509],
 [23.159320000000037, 72.01855000000006],
 [23.030320000000074, 72.47247000000004],
 [23.000980000000027, 72.57459000000006],
 [22.806890000000067, 72.42511000000007],
 [23.112140000000068, 72.57989000000003],
 [23.087290000000053, 72.54899000000006],
 [23.956720000000075, 72.70260000000007],
 [23.036070000000052, 72.59213000000005],
 [23.32218000000006, 72.18817000000007],
 [23.022390333701104, 72.57669435394357]

In [11]:
# create temporary dataframe to populate the coordinates into Latitude and Longitude
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])

In [12]:
# merge the coordinates into the original dataframe
ad_df['Latitude'] = df_coords['Latitude']
ad_df['Longitude'] = df_coords['Longitude']

In [13]:
# check the neighborhoods and the coordinates
print(ad_df.shape)
ad_df

(80, 3)


Unnamed: 0,Neighborhood,Latitude,Longitude
0,Agol,23.02776,72.60027
1,Ahmedabad Cantonment,23.02776,72.60027
2,Alam Roza,23.00212,72.54979
3,Ambawadi,23.01885,72.55441
4,Amraiwadi,23.00735,72.62268
5,Anand Nagar (Ahmedabad),23.01139,72.51712
6,Asarwa,23.04708,72.60481
7,Asarwa Chakla,23.04708,72.60481
8,Badarkha,22.84128,72.45453
9,Bahiyal,23.02776,72.60027
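
Several neighborhoods in the table above share exactly the same coordinates, which may mean the geocoder fell back to a coarser match for those names. That interpretation is an assumption, but the duplicates themselves are easy to list. The sketch below groups the first ten rows by coordinate pair using only the standard library.

```python
from collections import defaultdict

# First ten rows of the table above: (neighborhood, latitude, longitude).
rows = [
    ("Agol", 23.02776, 72.60027),
    ("Ahmedabad Cantonment", 23.02776, 72.60027),
    ("Alam Roza", 23.00212, 72.54979),
    ("Ambawadi", 23.01885, 72.55441),
    ("Amraiwadi", 23.00735, 72.62268),
    ("Anand Nagar (Ahmedabad)", 23.01139, 72.51712),
    ("Asarwa", 23.04708, 72.60481),
    ("Asarwa Chakla", 23.04708, 72.60481),
    ("Badarkha", 22.84128, 72.45453),
    ("Bahiyal", 23.02776, 72.60027),
]

# Group neighborhood names by their (rounded) coordinate pair.
by_point = defaultdict(list)
for name, lat, lng in rows:
    by_point[(round(lat, 5), round(lng, 5))].append(name)

# Keep only the points that more than one neighborhood maps to.
shared = {point: names for point, names in by_point.items() if len(names) > 1}
for point, names in shared.items():
    print(point, names)
```

Neighborhoods that share a point will also receive identical venue lists from the Foursquare query later on, so they are worth reviewing before clustering.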


In [14]:
# save the DataFrame as CSV file
ad_df.to_csv("ad_df.csv", index=False)

<h3>4. Create a map of Ahmedabad with neighborhoods superimposed on top</h3>

In [18]:
# get the coordinates of Ahmedabad
address = 'Ahmedabad'

geolocator = Nominatim(user_agent="adi explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Ahmedabad, India are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Ahmedabad, India are 23.0216238, 72.5797068.


In [19]:
# create map of Ahmedabad using latitude and longitude values
map_ad = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, neighborhood in zip(ad_df['Latitude'], ad_df['Longitude'], ad_df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_ad)  
    
map_ad

In [20]:
# save the map as HTML file
map_ad.save('map_ad.html')

<h3>5. Use the Foursquare API to explore the neighborhoods</h3>

In [21]:
# define Foursquare credentials and version
# (placeholders shown here; keep real credentials out of shared notebooks)
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_FOURSQUARE_CLIENT_ID
CLIENT_SECRET: YOUR_FOURSQUARE_CLIENT_SECRET


<b>Now, let's get the top 100 venues that are within a radius of 2000 meters.</b>

In [22]:
radius = 2000
LIMIT = 100

venues = []

for lat, long, neighborhood in zip(ad_df['Latitude'], ad_df['Longitude'], ad_df['Neighborhood']):
    
    # create the API request URL
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    # make the GET request
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    # return only relevant information for each nearby venue
    for venue in results:
        venues.append((
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))
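
The loop above indexes straight into the response, so a request that returns no groups, or a venue with an empty category list, would raise a KeyError or IndexError and stop the run. A more defensive extraction could look like the sketch below. The function name extract_venues is hypothetical, and the sample payload is hand-written in the shape the loop reads ("Uncategorized Place" is made up); it is not real API output.

```python
def extract_venues(payload, neighborhood, lat, lng):
    """Flatten one explore-style response into venue tuples.

    Returns an empty list when the expected keys are missing instead of
    raising KeyError or IndexError.
    """
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories", [])
        category = categories[0]["name"] if categories else None
        rows.append((neighborhood, lat, lng, venue.get("name"),
                     location.get("lat"), location.get("lng"), category))
    return rows


# Hand-written payload in the shape the loop above reads; not real API output.
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Moti Mahal",
               "location": {"lat": 23.02912, "lng": 72.599724},
               "categories": [{"name": "Indian Restaurant"}]}},
    {"venue": {"name": "Uncategorized Place",
               "location": {"lat": 23.0, "lng": 72.6},
               "categories": []}},
]}]}}

venue_rows = extract_venues(sample, "Agol", 23.02776, 72.60027)
empty_rows = extract_venues({}, "Agol", 23.02776, 72.60027)
print(venue_rows)
print(empty_rows)
```

Venues without a category come back with None in the last position, so they can be filtered out before the one-hot encoding step.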

In [23]:
# convert the venues list into a new DataFrame
venues_df = pd.DataFrame(venues)

# define the column names
venues_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head()

(1703, 7)


Unnamed: 0,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,Agol,23.02776,72.60027,Manek Chowk Khau Gali,23.023505,72.588539,Snack Place
1,Agol,23.02776,72.60027,Manek Chowk,23.023626,72.588553,Fast Food Restaurant
2,Agol,23.02776,72.60027,Lucky Tea,23.027829,72.581394,Tea Room
3,Agol,23.02776,72.60027,Moti Mahal,23.02912,72.599724,Indian Restaurant
4,Agol,23.02776,72.60027,Jama Masjid,23.024323,72.587042,Historic Site


<b>Let's check how many venues were returned for each neighborhood</b>

In [24]:
venues_df.groupby(["Neighborhood"]).count()

Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
Agol,20,20,20,20,20,20
Ahmedabad Cantonment,20,20,20,20,20,20
Alam Roza,10,10,10,10,10,10
Ambawadi,79,79,79,79,79,79
Amraiwadi,4,4,4,4,4,4
Anand Nagar (Ahmedabad),57,57,57,57,57,57
Asarwa,5,5,5,5,5,5
Asarwa Chakla,5,5,5,5,5,5
Bahiyal,20,20,20,20,20,20
Bapunagar,5,5,5,5,5,5


<b>Let's find out how many unique categories can be curated from all the returned venues</b>

In [25]:
print('There are {} unique categories.'.format(len(venues_df['VenueCategory'].unique())))

There are 106 unique categories.


In [26]:
# print out the first 50 categories
venues_df['VenueCategory'].unique()[:50]

array(['Snack Place', 'Fast Food Restaurant', 'Tea Room',
       'Indian Restaurant', 'Historic Site', 'Hotel', 'Ice Cream Shop',
       'Pizza Place', 'Multiplex', 'Train Station',
       'Vegetarian / Vegan Restaurant', 'Clothing Store', 'Shopping Mall',
       'Bus Station', 'Diner', 'Coffee Shop', 'Sandwich Place',
       'Mexican Restaurant', 'Park', 'Café', 'Dessert Shop', 'Bookstore',
       'Arts & Crafts Store', 'Farmers Market', 'Theater', 'Restaurant',
       'Breakfast Spot', 'Arcade', 'Asian Restaurant', 'Bakery',
       'Food Truck', 'Movie Theater', 'BBQ Joint', 'Event Space',
       'American Restaurant', 'Chinese Restaurant', 'Electronics Store',
       'Athletics & Sports', 'Tennis Court', 'Art Gallery',
       'History Museum', 'Market', 'Museum', 'ATM', 'IT Services',
       'Health & Beauty Service', 'Zoo', 'Lake',
       'Comfort Food Restaurant', 'North Indian Restaurant'], dtype=object)

In [27]:
# check if the results contain "Shopping Mall"
"Shopping Mall" in venues_df['VenueCategory'].unique()

True

<h3>6. Analyze Each Neighborhood</h3>

In [28]:
# one hot encoding
ad_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
ad_onehot['Neighborhoods'] = venues_df['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [ad_onehot.columns[-1]] + list(ad_onehot.columns[:-1])
ad_onehot = ad_onehot[fixed_columns]

print(ad_onehot.shape)
ad_onehot.head()

(1703, 107)


Unnamed: 0,Neighborhoods,ATM,Airport Gate,Airport Service,Airport Terminal,American Restaurant,Arcade,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Garage,BBQ Joint,Bakery,Baseball Field,Bike Rental / Bike Share,Bistro,Bookstore,Breakfast Spot,Buffet,Bus Station,Business Service,Café,Chinese Restaurant,Clothing Store,Coffee Shop,Comfort Food Restaurant,Construction & Landscaping,Cricket Ground,Cupcake Shop,Dance Studio,Department Store,Dessert Shop,Diner,Donut Shop,Electronics Store,Event Space,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Flower Shop,Food,Food Court,Food Truck,Fried Chicken Joint,Furniture / Home Store,Garden,Gas Station,General Entertainment,Gourmet Shop,Grocery Store,Gym,Health & Beauty Service,Historic Site,History Museum,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Italian Restaurant,Juice Bar,Lake,Lounge,Market,Mattress Store,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Moroccan Restaurant,Motel,Movie Theater,Moving Target,Multicuisine Indian Restaurant,Multiplex,Museum,North Indian Restaurant,Park,Performing Arts Venue,Pizza Place,Platform,Recreation Center,Restaurant,River,Sandwich Place,Sculpture Garden,Shoe Store,Shopping Mall,Ski Area,Smoke Shop,Snack Place,Spa,Speakeasy,Street Food Gathering,Tea Room,Tennis Court,Theater,Toy / Game Store,Train Station,Tree,Vegetarian / Vegan Restaurant,Video Store,Women's Store,Yoga Studio,Zoo
0,Agol,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Agol,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Agol,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0
3,Agol,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Agol,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


<b>Next, let's group rows by neighborhood and take the mean of the frequency of occurrence of each category</b>

In [31]:
ad_grouped = ad_onehot.groupby(["Neighborhoods"]).mean().reset_index()

print(ad_grouped.shape)
ad_grouped

(73, 107)


Unnamed: 0,Neighborhoods,ATM,Airport Gate,Airport Service,Airport Terminal,American Restaurant,Arcade,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Garage,BBQ Joint,Bakery,Baseball Field,Bike Rental / Bike Share,Bistro,Bookstore,Breakfast Spot,Buffet,Bus Station,Business Service,Café,Chinese Restaurant,Clothing Store,Coffee Shop,Comfort Food Restaurant,Construction & Landscaping,Cricket Ground,Cupcake Shop,Dance Studio,Department Store,Dessert Shop,Diner,Donut Shop,Electronics Store,Event Space,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Flower Shop,Food,Food Court,Food Truck,Fried Chicken Joint,Furniture / Home Store,Garden,Gas Station,General Entertainment,Gourmet Shop,Grocery Store,Gym,Health & Beauty Service,Historic Site,History Museum,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Italian Restaurant,Juice Bar,Lake,Lounge,Market,Mattress Store,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Moroccan Restaurant,Motel,Movie Theater,Moving Target,Multicuisine Indian Restaurant,Multiplex,Museum,North Indian Restaurant,Park,Performing Arts Venue,Pizza Place,Platform,Recreation Center,Restaurant,River,Sandwich Place,Sculpture Garden,Shoe Store,Shopping Mall,Ski Area,Smoke Shop,Snack Place,Spa,Speakeasy,Street Food Gathering,Tea Room,Tennis Court,Theater,Toy / Game Store,Train Station,Tree,Vegetarian / Vegan Restaurant,Video Store,Women's Store,Yoga Studio,Zoo
0,Agol,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.05,0.0,0.0,0.05,0.15,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.05,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.05,0.0,0.05,0.0,0.0,0.0,0.0
1,Ahmedabad Cantonment,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.05,0.0,0.0,0.05,0.15,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.05,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.05,0.0,0.05,0.0,0.0,0.0,0.0
2,Alam Roza,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.1,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0
3,Ambawadi,0.0,0.0,0.0,0.0,0.0,0.012658,0.0,0.012658,0.012658,0.0,0.0,0.0,0.012658,0.0,0.0,0.0,0.012658,0.012658,0.0,0.0,0.0,0.139241,0.0,0.012658,0.037975,0.0,0.0,0.0,0.0,0.0,0.0,0.050633,0.012658,0.0,0.0,0.0,0.0,0.0,0.012658,0.075949,0.0,0.0,0.0,0.037975,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.075949,0.0,0.0,0.025316,0.113924,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037975,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025316,0.0,0.037975,0.0,0.0,0.012658,0.0,0.063291,0.0,0.0,0.025316,0.0,0.0,0.025316,0.0,0.0,0.0,0.050633,0.0,0.025316,0.0,0.0,0.0,0.025316,0.0,0.0,0.0,0.0
4,Amraiwadi,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0
5,Anand Nagar (Ahmedabad),0.0,0.0,0.0,0.0,0.017544,0.0,0.017544,0.0,0.0,0.017544,0.0,0.052632,0.017544,0.0,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.105263,0.017544,0.017544,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017544,0.017544,0.0,0.0,0.0,0.087719,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.017544,0.122807,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.0,0.0,0.035088,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.052632,0.0,0.052632,0.0,0.0,0.035088,0.0,0.0,0.0,0.0,0.0,0.0,0.017544,0.017544,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0
6,Asarwa,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Asarwa Chakla,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Bahiyal,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.05,0.0,0.0,0.05,0.15,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.05,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.05,0.0,0.05,0.0,0.0,0.0,0.0
9,Bapunagar,0.4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.2,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
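
Because every one-hot column holds only 0 or 1, its mean within a neighborhood is simply the share of that neighborhood's venues that fall in the category. For example, a value of 0.05 for a neighborhood with 20 venues corresponds to one venue. The sketch below reproduces the same frequencies by hand with collections.Counter on made-up toy data, as a way to check the interpretation. It is not the notebook's pandas code.

```python
from collections import Counter, defaultdict

# Toy (neighborhood, category) pairs, made up for illustration.
venues = [
    ("A", "Shopping Mall"),
    ("A", "Tea Room"),
    ("A", "Tea Room"),
    ("A", "Hotel"),
    ("B", "Tea Room"),
    ("B", "Park"),
]

# Count venues per category within each neighborhood.
counts = defaultdict(Counter)
for neighborhood, category in venues:
    counts[neighborhood][category] += 1

# Share of each category among a neighborhood's venues; this is what the
# mean of a 0/1 indicator column works out to.
frequencies = {
    n: {cat: c / sum(cats.values()) for cat, c in cats.items()}
    for n, cats in counts.items()
}

mall_share_a = frequencies["A"]["Shopping Mall"]
mall_share_b = frequencies["B"].get("Shopping Mall", 0.0)
print(mall_share_a, mall_share_b)
```

One consequence is that the frequency depends on how many venues were returned: a single mall among 5 venues scores higher than two malls among 80.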


In [32]:
len(ad_grouped[ad_grouped["Shopping Mall"] > 0])

27

<b>Create a new DataFrame for Shopping Mall data only</b>

In [33]:
ad_mall = ad_grouped[["Neighborhoods","Shopping Mall"]]

In [34]:
ad_mall.head()

Unnamed: 0,Neighborhoods,Shopping Mall
0,Agol,0.05
1,Ahmedabad Cantonment,0.05
2,Alam Roza,0.0
3,Ambawadi,0.025316
4,Amraiwadi,0.0


<h3>7. Cluster Neighborhoods</h3>

Run k-means to cluster the neighborhoods in Ahmedabad into 3 clusters, using the Shopping Mall frequency as the only feature.

In [36]:
# set number of clusters
kclusters = 3

# keep only the numeric feature column for clustering
ad_clustering = ad_mall.drop(columns=["Neighborhoods"])

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(ad_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([2, 2, 0, 0, 0, 2, 0, 0, 2, 0], dtype=int32)
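
With a single feature, k-means effectively cuts the number line into intervals around each centroid. The sketch below runs a plain version of Lloyd's algorithm on scalars to make that visible. It is a pure-Python illustration, not scikit-learn's implementation: the function kmeans_1d and the starting centroids are assumptions chosen for the example, and the label numbers it produces are arbitrary and need not match the ones KMeans assigns.

```python
def kmeans_1d(values, centroids, max_iter=100):
    """Lloyd's algorithm on scalars: assign to nearest centroid, then re-average."""
    centroids = list(centroids)
    labels = []
    for _ in range(max_iter):
        # assignment step: index of the closest centroid for each value
        labels = [min(range(len(centroids)), key=lambda k: abs(v - centroids[k]))
                  for v in values]
        # update step: each centroid becomes the mean of its members
        updated = []
        for k in range(len(centroids)):
            members = [v for v, lab in zip(values, labels) if lab == k]
            # keep the old centroid if a cluster ends up empty
            updated.append(sum(members) / len(members) if members else centroids[k])
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


# Shopping Mall frequencies taken from rows displayed in this notebook.
values = [0.0, 0.0, 0.022472, 0.025316, 0.05, 0.05, 0.083333, 0.090909, 0.2]
labels, centroids = kmeans_1d(values, centroids=[0.0, 0.06, 0.2])
print(labels)
print([round(c, 4) for c in centroids])
```

On these values the three groups come out as low (up to about 0.025), moderate (0.05 to about 0.09) and high (0.2) frequencies.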

In [37]:
# create a new dataframe that holds the Shopping Mall frequency and the cluster label for each neighborhood
ad_merged = ad_mall.copy()

# add clustering labels
ad_merged["Cluster Labels"] = kmeans.labels_

In [38]:
ad_merged.rename(columns={"Neighborhoods": "Neighborhood"}, inplace=True)
ad_merged.head()

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels
0,Agol,0.05,2
1,Ahmedabad Cantonment,0.05,2
2,Alam Roza,0.0,0
3,Ambawadi,0.025316,0
4,Amraiwadi,0.0,0


In [40]:
# join ad_merged with ad_df to add latitude/longitude for each neighborhood
ad_merged = ad_merged.join(ad_df.set_index("Neighborhood"), on="Neighborhood")

print(ad_merged.shape)
ad_merged.head() # check the last columns!

(73, 5)


Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
0,Agol,0.05,2,23.02776,72.60027
1,Ahmedabad Cantonment,0.05,2,23.02776,72.60027
2,Alam Roza,0.0,0,23.00212,72.54979
3,Ambawadi,0.025316,0,23.01885,72.55441
4,Amraiwadi,0.0,0,23.00735,72.62268


In [41]:
# sort the results by Cluster Labels
print(ad_merged.shape)
ad_merged.sort_values(["Cluster Labels"], inplace=True)
ad_merged

(73, 5)


Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
36,Kalyanpura (Ahmedabad),0.0,0,23.04764,72.56149
34,Kabirchowk,0.0,0,23.090257,72.585512
39,Khodiyarnagar,0.0,0,23.03435,72.64652
41,Lambha,0.0,0,22.93802,72.58586
42,Makarba,0.0,0,22.99692,72.49837
45,Mithakali,0.022472,0,23.02851,72.56525
46,Motera,0.0,0,23.10319,72.60513
47,Naranpura,0.0,0,23.05506,72.55557
49,Nava Vadaj,0.0,0,23.06024,72.56671
50,Navjivan (Neighbourhood),0.0,0,23.04413,72.56883


<b>Finally, let's visualize the resulting clusters</b>

In [48]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(ad_merged['Latitude'], ad_merged['Longitude'], ad_merged['Neighborhood'], ad_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
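
The color setup in the cell above only uses ys for its length, and rainbow[cluster-1] relies on negative indexing to color cluster 0. A simpler alternative is to build one color per cluster and index it with the label directly. The sketch below does this with the standard-library colorsys module instead of the matplotlib rainbow colormap, so the colors differ from the ones in the original map; cluster_palette is a hypothetical helper name.

```python
import colorsys

def cluster_palette(n_clusters):
    """Return n_clusters hex colors with evenly spaced hues."""
    palette = []
    for k in range(n_clusters):
        r, g, b = colorsys.hsv_to_rgb(k / n_clusters, 1.0, 1.0)
        palette.append("#{:02x}{:02x}{:02x}".format(
            int(round(r * 255)), int(round(g * 255)), int(round(b * 255))))
    return palette

palette = cluster_palette(3)
print(palette)
```

In the marker loop, the color would then be looked up as palette[cluster] for both color and fill_color.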

In [43]:
# save the map as HTML file
map_clusters.save('map_clusters.html')

<h3>8. Examine Clusters</h3>

<b>Cluster 0</b>

In [44]:
ad_merged.loc[ad_merged['Cluster Labels'] == 0]

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
36,Kalyanpura (Ahmedabad),0.0,0,23.04764,72.56149
34,Kabirchowk,0.0,0,23.090257,72.585512
39,Khodiyarnagar,0.0,0,23.03435,72.64652
41,Lambha,0.0,0,22.93802,72.58586
42,Makarba,0.0,0,22.99692,72.49837
45,Mithakali,0.022472,0,23.02851,72.56525
46,Motera,0.0,0,23.10319,72.60513
47,Naranpura,0.0,0,23.05506,72.55557
49,Nava Vadaj,0.0,0,23.06024,72.56671
50,Navjivan (Neighbourhood),0.0,0,23.04413,72.56883


<b>Cluster 1</b>

In [45]:
ad_merged.loc[ad_merged['Cluster Labels'] == 1]

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
48,Naroda,0.2,1,23.07293,72.65378


<b>Cluster 2</b>

In [46]:
ad_merged.loc[ad_merged['Cluster Labels'] == 2]

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
20,Ghatlodiya,0.090909,2,23.07275,72.54964
1,Ahmedabad Cantonment,0.05,2,23.02776,72.60027
30,Jholapur,0.05,2,23.02776,72.60027
68,Ujedia,0.05,2,23.02776,72.60027
31,Jivrajpark,0.037037,2,23.0061,72.53149
32,"Jodhpur, Gujarat",0.032609,2,23.02063,72.52522
5,Anand Nagar (Ahmedabad),0.035088,2,23.01139,72.51712
63,Shardanagar,0.047619,2,23.01073,72.55525
25,Gomtipur,0.083333,2,23.01597,72.61082
8,Bahiyal,0.05,2,23.02776,72.60027


<h3>Observations:</h3>


The Shopping Mall frequencies in the cluster tables above separate the neighborhoods into three levels. Cluster 0 has the lowest values: most of its neighborhoods have a frequency of 0.0, meaning no shopping mall appeared among the returned venues. Cluster 2 has moderate values (roughly 0.03 to 0.09 in the rows shown), and cluster 1, which contains only Naroda, has the highest value (0.2). Most of the shopping malls are concentrated in the central area of Ahmedabad, while the suburban areas still have very few.

Neighborhoods in cluster 0 therefore represent the greatest opportunity for new shopping malls, since there is little to no competition from existing malls. Property developers with unique selling propositions that stand out from the competition can also consider neighborhoods in cluster 2, where competition is moderate. Lastly, developers are advised to be cautious about cluster 1, which already has the highest concentration of shopping malls among its venues and is likely to face the most intense competition.
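
One way to check which cluster has the fewest malls is to average the Shopping Mall frequency per cluster label and rank the clusters. The sketch below does this in plain Python on a handful of rows copied from the cluster tables above. It covers only those rows, so the means are illustrative rather than the full-table values.

```python
from collections import defaultdict

# (neighborhood, Shopping Mall frequency, cluster label) for a few rows
# shown in the cluster tables above; the full tables have more rows.
rows = [
    ("Kalyanpura (Ahmedabad)", 0.0, 0),
    ("Mithakali", 0.022472, 0),
    ("Motera", 0.0, 0),
    ("Naroda", 0.2, 1),
    ("Ghatlodiya", 0.090909, 2),
    ("Gomtipur", 0.083333, 2),
    ("Jivrajpark", 0.037037, 2),
]

# Collect the frequencies belonging to each cluster label.
by_cluster = defaultdict(list)
for _, freq, label in rows:
    by_cluster[label].append(freq)

means = {label: sum(freqs) / len(freqs) for label, freqs in by_cluster.items()}
ranking = sorted(means, key=means.get)  # lowest mall frequency first
print(ranking)
```

On the full data, the equivalent check in the notebook would be ad_merged.groupby("Cluster Labels")["Shopping Mall"].mean().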
