# IBM Applied Data Science Capstone Course 

### Week 5 Final Report

### Opening a New Shopping Mall in Hyderabad, India



 - Build a dataframe of neighborhoods in Hyderabad, India by scraping a Wikimedia Commons category page
 - Get the geographical coordinates of the neighborhoods
 - Obtain venue data for the neighborhoods from the Foursquare API
 - Explore and cluster the neighborhoods
 - Select the best cluster in which to open a new shopping mall

### Importing Libraries

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # install geopy; comment this line out if it is already installed
!pip install geocoder
import geocoder
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values


import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

# for webscraping import Beautiful Soup 
!pip install bs4
from bs4 import BeautifulSoup

import xml

!conda install -c conda-forge folium=0.5.0 --yes # install folium; comment this line out if it is already installed
import folium # map rendering library

print('Libraries imported.')



In [2]:
# send the GET request
data = requests.get("https://commons.wikimedia.org/wiki/Category:Suburbs_of_Hyderabad,_India").text        

In [3]:
# parse data from the html into a beautifulsoup object
soup = BeautifulSoup(data, 'html.parser')


In [4]:
# create a list to store neighborhood data
neighborhoodList = []


In [5]:
# append the data into the list
for row in soup.find_all("div", class_="mw-category")[0].find_all("li"):
    neighborhoodList.append(row.text)


In [6]:
# create a new DataFrame from the list
hyd_df = pd.DataFrame({"Neighborhood": neighborhoodList})

hyd_df.head()

                             Neighborhood
0                  ►  Abids‎ (1 C,  13 F)
1                   ►  Alwal‎ (1 C,  1 F)
2    ►  Ameerpet, Hyderabad‎ (3 C,  21 F)
3  ►  Bandlaguda, Rangareddy‎ (1 C,  2 F)
4          ►  Banjara Hills‎ (3 C,  25 F)
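The scraped names shown above still carry decoration from the category page: a leading `►` marker, an invisible direction mark after the name, and a trailing count such as `(1 C,  13 F)`. These strings are later passed to the geocoder unchanged. A minimal sketch of a cleanup step follows. It assumes the marker is U+25BA and the invisible character is U+200E (both inferred from the displayed output, not checked against the page source), and `clean_name` is a hypothetical helper that is not part of the original notebook.

```python
import re

# Trailing count block such as "(1 C,  13 F)" or "(3 F)".
# Assumption: counts are always "<digits> <single capital letter>", comma separated.
_COUNT_SUFFIX = re.compile(r"\s*\(\d+\s+[A-Z](?:,\s*\d+\s+[A-Z])*\)\s*$")


def clean_name(raw):
    """Strip the category-page decoration from one scraped list item."""
    # Remove the expand marker (U+25BA) and the left-to-right mark (U+200E).
    text = raw.replace("\u25ba", "").replace("\u200e", "")
    # Remove only the trailing count block, so parentheses that belong to
    # the name itself, e.g. "Old City (Hyderabad, India)", are kept.
    text = _COUNT_SUFFIX.sub("", text)
    return text.strip()


print(clean_name("\u25ba  Abids\u200e (1 C,  13 F)"))  # Abids
```

If adopted, it could be applied before geocoding with `hyd_df["Neighborhood"] = hyd_df["Neighborhood"].map(clean_name)`. Doing so would change the geocoding input, so the coordinates and downstream results in this report would need to be regenerated.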

In [7]:
hyd_df

                                    Neighborhood
0                         ►  Abids‎ (1 C,  13 F)
1                          ►  Alwal‎ (1 C,  1 F)
2           ►  Ameerpet, Hyderabad‎ (3 C,  21 F)
3         ►  Bandlaguda, Rangareddy‎ (1 C,  2 F)
4                 ►  Banjara Hills‎ (3 C,  25 F)
5                    ►  Basheerbagh‎ (1 C,  7 F)
6                      ►  Begumpet‎ (5 C,  10 F)
7                             ►  Boduppal‎ (3 F)
8                        ►  Bolarum‎ (3 C,  1 F)
9          ►  Cavalry Barracks, Hyderabad‎ (1 C)
10                        ►  Chikkadpally‎ (7 F)
11                           ►  Dabirpura‎ (1 C)
12                  ►  Dilsukhnagar‎ (1 C,  3 F)
13                           ►  Domalguda‎ (3 C)
14                           ►  Erragadda‎ (3 F)
15                   ►  Gachibowli‎ (4 C,  17 F)
16                       ►  Gajularamaram‎ (2 F)
17                     ►  Ghatkesar‎ (1 C,  2 F)
18                      ►  Golconda‎ (5 C,  4 F)
19                  

In [8]:

# print the number of rows of the dataframe
hyd_df.shape

(54, 1)

### Getting the geographical coordinates

In [9]:
# define a function to get coordinates

def get_latlng(neighborhood):
    # initialize your variable to None
    lat_lng_coords = None
    # loop until you get the coordinates
    while(lat_lng_coords is None):
        g = geocoder.arcgis('{}, Hyderabad, India'.format(neighborhood))
        lat_lng_coords = g.latlng
    return lat_lng_coords
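The `while` loop in `get_latlng` never exits if the geocoder keeps returning `None` for a name. A bounded variant is sketched here under stated assumptions: `get_latlng_bounded`, `max_attempts`, and `delay_seconds` are hypothetical names, and the geocoding call is passed in as a plain callable so the retry logic can be exercised without network access.

```python
import time


def get_latlng_bounded(neighborhood, geocode, max_attempts=5, delay_seconds=1.0):
    """Return [lat, lng] for a neighborhood, or None after max_attempts failures.

    `geocode` is any callable that takes a query string and returns
    [lat, lng] or None, for example: lambda q: geocoder.arcgis(q).latlng
    """
    query = '{}, Hyderabad, India'.format(neighborhood)
    for attempt in range(max_attempts):
        lat_lng_coords = geocode(query)
        if lat_lng_coords is not None:
            return lat_lng_coords
        if attempt < max_attempts - 1:
            time.sleep(delay_seconds)  # brief pause before the next attempt
    return None  # caller decides how to handle a neighborhood that never resolves
```

A caller would then need to handle `None` entries, for instance by dropping those neighborhoods before building the coordinate dataframe.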

In [10]:
# call the function to get the coordinates, store in a new list using list comprehension

coords = [ get_latlng(neighborhood) for neighborhood in hyd_df["Neighborhood"].tolist() ]

In [11]:
coords

[[17.389800000000037, 78.47658000000007],
 [17.535430000000076, 78.54427000000004],
 [17.43482000000006, 78.44949000000008],
 [17.299820000000068, 78.46495000000004],
 [17.415350000000046, 78.43435000000005],
 [17.40211000000005, 78.47770000000008],
 [17.447290000000066, 78.45396000000005],
 [17.423299982875264, 78.58280001142191],
 [17.536218869427803, 78.2350425425703],
 [17.40893503530367, 78.32674007784891],
 [17.40301000000005, 78.49792000000008],
 [17.40893503530367, 78.32674007784891],
 [17.368570000000034, 78.53515000000004],
 [17.409950000000038, 78.48229000000003],
 [17.45333000000005, 78.43034000000006],
 [17.43181000000004, 78.38636000000008],
 [17.522760000000062, 78.43862000000007],
 [17.46686941076456, 78.24915353871232],
 [17.389410000000055, 78.40406000000007],
 [17.32707000000005, 78.60533000000004],
 [17.448230000000024, 78.37429000000003],
 [17.399230000000045, 78.48073000000005],
 [17.36838000000006, 78.39999000000006],
 [17.42865000000006, 78.39762000000007],
 [17
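In the list above, the entries at index 9 and index 11 are the same pair, `[17.40893503530367, 78.32674007784891]`. Two different neighborhoods resolving to identical coordinates may mean the geocoder fell back to a generic location for one or both of them. A small sketch for spotting such collisions is given here; `find_shared_coordinates` is a hypothetical helper, and the names in the example are abbreviated from the scraped strings.

```python
from collections import defaultdict


def find_shared_coordinates(names, coords):
    """Map each coordinate pair that occurs more than once to the names sharing it."""
    groups = defaultdict(list)
    for name, (lat, lng) in zip(names, coords):
        groups[(lat, lng)].append(name)
    return {pair: members for pair, members in groups.items() if len(members) > 1}


# Entries 9 to 11 of the output above, with abbreviated neighborhood names.
names = ["Cavalry Barracks, Hyderabad", "Chikkadpally", "Dabirpura"]
coords = [
    [17.40893503530367, 78.32674007784891],
    [17.40301000000005, 78.49792000000008],
    [17.40893503530367, 78.32674007784891],
]
shared = find_shared_coordinates(names, coords)
print(shared)
```

On the full data this could be called as `find_shared_coordinates(hyd_df["Neighborhood"], coords)` to list every group of neighborhoods worth checking by hand.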

In [12]:
# create temporary dataframe to populate the coordinates into Latitude and Longitude
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])


In [13]:
# merge the coordinates into the original dataframe
hyd_df['Latitude'] = df_coords['Latitude']
hyd_df['Longitude'] = df_coords['Longitude']


In [14]:

# check the neighborhoods and the coordinates
print(hyd_df.shape)
hyd_df

(54, 3)


                                    Neighborhood   Latitude  Longitude
0                         ►  Abids‎ (1 C,  13 F)  17.389800  78.476580
1                          ►  Alwal‎ (1 C,  1 F)  17.535430  78.544270
2           ►  Ameerpet, Hyderabad‎ (3 C,  21 F)  17.434820  78.449490
3         ►  Bandlaguda, Rangareddy‎ (1 C,  2 F)  17.299820  78.464950
4                 ►  Banjara Hills‎ (3 C,  25 F)  17.415350  78.434350
5                    ►  Basheerbagh‎ (1 C,  7 F)  17.402110  78.477700
6                      ►  Begumpet‎ (5 C,  10 F)  17.447290  78.453960
7                             ►  Boduppal‎ (3 F)  17.423300  78.582800
8                        ►  Bolarum‎ (3 C,  1 F)  17.536219  78.235043
9          ►  Cavalry Barracks, Hyderabad‎ (1 C)  17.408935  78.326740
10                        ►  Chikkadpally‎ (7 F)  17.403010  78.497920
11                           ►  Dabirpura‎ (1 C)  17.408935  78.326740
12                  ►  Dilsukhnagar‎ (1 C,  3 F)  17.368570  78.535150
13    

In [15]:

# save the DataFrame as CSV file
hyd_df.to_csv("hyd_df.csv", index=False)

TypeError: get_handle() got an unexpected keyword argument 'errors'

### Creating a map of Hyderabad with neighborhoods superimposed on top

In [16]:
# get the coordinates of Hyderabad
address = 'Hyderabad, India'

geolocator = Nominatim(user_agent="my_application")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Hyderabad, India are {}, {}.'.format(latitude, longitude))


The geographical coordinates of Hyderabad, India are 17.38878595, 78.46106473453146.


In [17]:
# create map of Hyderabad using latitude and longitude values
map_hyd = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, neighborhood in zip(hyd_df['Latitude'], hyd_df['Longitude'], hyd_df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_hyd)  
    
map_hyd


In [18]:
# save the map as HTML file
map_hyd.save('map_hyd.html')


### Use the Foursquare API to explore the neighborhoods

In [20]:
# define Foursquare Credentials and Version
CLIENT_ID = '# your Foursquare ID' 
CLIENT_SECRET = '# your Foursquare Secret' 
VERSION = '20180605' # Foursquare API version
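One way to avoid pasting real credentials into the notebook is to read them from environment variables. This is only a sketch: the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_foursquare_credentials` are assumptions chosen for illustration, not something Foursquare or the original notebook prescribes.

```python
import os


def load_foursquare_credentials(env=None):
    """Read the client ID and secret from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    # Hypothetical variable names; use whatever your environment defines.
    client_id = env.get("FOURSQUARE_CLIENT_ID")
    client_secret = env.get("FOURSQUARE_CLIENT_SECRET")
    if not client_id or not client_secret:
        raise RuntimeError(
            "Set FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET first"
        )
    return client_id, client_secret
```

Usage would be `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()`, which fails early with a clear message when either value is missing.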


### Let's get the top 100 venues that are within a radius of 2000 meters.



In [21]:
radius = 2000
LIMIT = 100

venues = []

for lat, long, neighborhood in zip(hyd_df['Latitude'], hyd_df['Longitude'], hyd_df['Neighborhood']):
    
    # create the API request URL
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    # make the GET request
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    # return only relevant information for each nearby venue
    for venue in results:
        venues.append((
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))
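The loop above indexes straight into the JSON (`["response"]['groups'][0]['items']`, `['categories'][0]`), so a response that lacks `groups`, or a venue with no category, raises `KeyError` or `IndexError` and stops the whole run. A defensive parsing sketch follows; `extract_venues` is a hypothetical helper, and the payload shape is assumed only from the keys the loop already uses.

```python
def extract_venues(payload, neighborhood, lat, lng):
    """Flatten one explore-style response into the 7-field tuples collected above."""
    groups = payload.get("response", {}).get("groups") or []
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories") or []
        # Venues without a category get None instead of raising IndexError.
        category = categories[0].get("name") if categories else None
        rows.append((
            neighborhood,
            lat,
            lng,
            venue.get("name"),
            location.get("lat"),
            location.get("lng"),
            category,
        ))
    return rows
```

Inside the loop this would replace the direct indexing with `venues.extend(extract_venues(requests.get(url).json(), neighborhood, lat, long))`, so that an empty or error response contributes no rows rather than aborting.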

In [22]:
# convert the venues list into a new DataFrame
venues_df = pd.DataFrame(venues)

# define the column names
venues_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head()


(2212, 7)


             Neighborhood  Latitude  Longitude       VenueName  VenueLatitude  \
0  ►  Abids‎ (1 C,  13 F)   17.3898   78.47658         Pragati      17.388088   
1  ►  Abids‎ (1 C,  13 F)   17.3898   78.47658   Santosh Dhaba      17.388485   
2  ►  Abids‎ (1 C,  13 F)   17.3898   78.47658  Mayur Pan Shop      17.388894   
3  ►  Abids‎ (1 C,  13 F)   17.3898   78.47658  Karachi Bakery      17.383454   
4  ►  Abids‎ (1 C,  13 F)   17.3898   78.47658    Ram ki Bandi      17.382398   

   VenueLongitude            VenueCategory  
0       78.481134  South Indian Restaurant  
1       78.479509        Indian Restaurant  
2       78.480578                Juice Bar  
3       78.475075                   Bakery  
4       78.475014               Food Truck  

In [23]:
venues_df

                                      Neighborhood   Latitude  Longitude  \
0                           ►  Abids‎ (1 C,  13 F)  17.389800  78.476580   
1                           ►  Abids‎ (1 C,  13 F)  17.389800  78.476580   
2                           ►  Abids‎ (1 C,  13 F)  17.389800  78.476580   
3                           ►  Abids‎ (1 C,  13 F)  17.389800  78.476580   
4                           ►  Abids‎ (1 C,  13 F)  17.389800  78.476580   
5                           ►  Abids‎ (1 C,  13 F)  17.389800  78.476580   
6                           ►  Abids‎ (1 C,  13 F)  17.389800  78.476580   
7                           ►  Abids‎ (1 C,  13 F)  17.389800  78.476580   
8                           ►  Abids‎ (1 C,  13 F)  17.389800  78.476580   
9                           ►  Abids‎ (1 C,  13 F)  17.389800  78.476580   
10                          ►  Abids‎ (1 C,  13 F)  17.389800  78.476580   
11                          ►  Abids‎ (1 C,  13 F)  17.389800  78.476580   
12          

### Let's check how many venues were returned for each neighborhood

In [24]:
venues_df.groupby(["Neighborhood"]).count()


                                              Latitude  Longitude  VenueName  \
Neighborhood                                                                   
►  Abids‎ (1 C,  13 F)                              83         83         83   
►  Alwal‎ (1 C,  1 F)                                4          4          4   
►  Ameerpet, Hyderabad‎ (3 C,  21 F)               100        100        100   
►  Bandlaguda, Rangareddy‎ (1 C,  2 F)               4          4          4   
►  Banjara Hills‎ (3 C,  25 F)                     100        100        100   
►  Basheerbagh‎ (1 C,  7 F)                        100        100        100   
►  Begumpet‎ (5 C,  10 F)                           50         50         50   
►  Boduppal‎ (3 F)                                   4          4          4   
►  Bolarum‎ (3 C,  1 F)                              2          2          2   
►  Cavalry Barracks, Hyderabad‎ (1 C)               19         19         19   
►  Chikkadpally‎ (7 F)                  

### Let's find out how many unique categories can be curated from all the returned venues

In [25]:
print('There are {} unique categories.'.format(len(venues_df['VenueCategory'].unique())))


There are 154 unique categories.


In [26]:

# print out the first 50 categories
venues_df['VenueCategory'].unique()[:50]

array(['South Indian Restaurant', 'Indian Restaurant', 'Juice Bar',
       'Bakery', 'Food Truck', 'Hotel', 'Shoe Store', 'Ice Cream Shop',
       'Chaat Place', 'Diner', 'Lounge', 'Neighborhood', 'Burger Joint',
       'Dessert Shop', 'Café', 'Snack Place', 'Science Museum',
       'Chinese Restaurant', 'Mobile Phone Shop', 'Stadium', 'Restaurant',
       'Food', 'Smoke Shop', 'Coffee Shop', 'Fast Food Restaurant',
       'Hotel Bar', 'Breakfast Spot', 'Department Store', 'Bar',
       'Shopping Mall', 'Multiplex', 'Performing Arts Venue',
       'Gaming Cafe', 'Indie Movie Theater', 'Farmers Market',
       'Pizza Place', 'Fried Chicken Joint', 'Hookah Bar',
       'Clothing Store', 'Sandwich Place', 'Food Court', 'Bus Station',
       'Jewelry Store', 'Pharmacy', "Men's Store", 'Golf Course',
       'Asian Restaurant', 'ATM', 'Pub', 'Bookstore'], dtype=object)

In [27]:

# check whether "Neighborhood" appears as a venue category (it would clash with the neighborhood column after one-hot encoding)
"Neighborhood" in venues_df['VenueCategory'].unique()

True

### Analyze Each Neighborhood

In [29]:

# one hot encoding
hyd_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
hyd_onehot['Neighborhoods'] = venues_df['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [hyd_onehot.columns[-1]] + list(hyd_onehot.columns[:-1])
hyd_onehot = hyd_onehot[fixed_columns]

print(hyd_onehot.shape)
hyd_onehot.head()

(2212, 155)


            Neighborhoods  ATM  Accessories Store  Afghan Restaurant  Airport  \
0  ►  Abids‎ (1 C,  13 F)    0                  0                  0        0   
1  ►  Abids‎ (1 C,  13 F)    0                  0                  0        0   
2  ►  Abids‎ (1 C,  13 F)    0                  0                  0        0   
3  ►  Abids‎ (1 C,  13 F)    0                  0                  0        0   
4  ►  Abids‎ (1 C,  13 F)    0                  0                  0        0   

   Airport Food Court  Airport Service  American Restaurant  Arcade  \
0                   0                0                    0       0   
1                   0                0                    0       0   
2                   0                0                    0       0   
3                   0                0                    0       0   
4                   0                0                    0       0   

   Arts & Crafts Store  Asian Restaurant  Athletics & Sports  Auditorium  \
0         

### Next, let's group rows by neighborhood and take the mean of the frequency of occurrence of each category

In [30]:

hyd_grouped = hyd_onehot.groupby(["Neighborhoods"]).mean().reset_index()

print(hyd_grouped.shape)
hyd_grouped

(52, 155)


                                   Neighborhoods       ATM  Accessories Store  \
0                         ►  Abids‎ (1 C,  13 F)  0.000000           0.000000   
1                          ►  Alwal‎ (1 C,  1 F)  0.250000           0.000000   
2           ►  Ameerpet, Hyderabad‎ (3 C,  21 F)  0.000000           0.000000   
3         ►  Bandlaguda, Rangareddy‎ (1 C,  2 F)  0.000000           0.000000   
4                 ►  Banjara Hills‎ (3 C,  25 F)  0.000000           0.000000   
5                    ►  Basheerbagh‎ (1 C,  7 F)  0.000000           0.000000   
6                      ►  Begumpet‎ (5 C,  10 F)  0.000000           0.000000   
7                             ►  Boduppal‎ (3 F)  0.250000           0.000000   
8                        ►  Bolarum‎ (3 C,  1 F)  0.500000           0.000000   
9          ►  Cavalry Barracks, Hyderabad‎ (1 C)  0.000000           0.000000   
10                        ►  Chikkadpally‎ (7 F)  0.000000           0.000000   
11                          
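Taking the mean of a one-hot column per neighborhood is the same as dividing the number of venues in that category by the total number of venues returned for the neighborhood. The sketch below reproduces that arithmetic with the standard library only. `category_frequencies` is a hypothetical helper, and the example rows are synthetic: they use the 83 venues reported for Abids, one of them a shopping mall, with the other 82 lumped under a placeholder category.

```python
from collections import Counter, defaultdict


def category_frequencies(venue_rows):
    """venue_rows: iterable of (neighborhood, category) pairs."""
    counts = defaultdict(Counter)
    for neighborhood, category in venue_rows:
        counts[neighborhood][category] += 1
    freqs = {}
    for neighborhood, counter in counts.items():
        total = sum(counter.values())
        # Share of this neighborhood's venues that fall in each category.
        freqs[neighborhood] = {cat: n / total for cat, n in counter.items()}
    return freqs


# Synthetic rows: 1 shopping mall among 83 venues; "Other" is a placeholder.
rows = [("Abids", "Shopping Mall")] + [("Abids", "Other")] * 82
freqs = category_frequencies(rows)
print(round(freqs["Abids"]["Shopping Mall"], 6))  # 0.012048, i.e. 1/83
```

Because the value is a share rather than a count, a neighborhood with very few returned venues can show a high frequency from a single mall.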

In [31]:
len(hyd_grouped[hyd_grouped["Shopping Mall"] > 0])


22

### Create a new DataFrame for Shopping Mall data only

In [32]:
hyd_mall = hyd_grouped[["Neighborhoods","Shopping Mall"]]

In [33]:

hyd_mall.head()

                            Neighborhoods  Shopping Mall
0                  ►  Abids‎ (1 C,  13 F)       0.012048
1                   ►  Alwal‎ (1 C,  1 F)       0.000000
2    ►  Ameerpet, Hyderabad‎ (3 C,  21 F)       0.020000
3  ►  Bandlaguda, Rangareddy‎ (1 C,  2 F)       0.000000
4          ►  Banjara Hills‎ (3 C,  25 F)       0.020000

In [34]:
hyd_mall

                                   Neighborhoods  Shopping Mall
0                         ►  Abids‎ (1 C,  13 F)       0.012048
1                          ►  Alwal‎ (1 C,  1 F)       0.000000
2           ►  Ameerpet, Hyderabad‎ (3 C,  21 F)       0.020000
3         ►  Bandlaguda, Rangareddy‎ (1 C,  2 F)       0.000000
4                 ►  Banjara Hills‎ (3 C,  25 F)       0.020000
5                    ►  Basheerbagh‎ (1 C,  7 F)       0.010000
6                      ►  Begumpet‎ (5 C,  10 F)       0.000000
7                             ►  Boduppal‎ (3 F)       0.000000
8                        ►  Bolarum‎ (3 C,  1 F)       0.000000
9          ►  Cavalry Barracks, Hyderabad‎ (1 C)       0.000000
10                        ►  Chikkadpally‎ (7 F)       0.016393
11                           ►  Dabirpura‎ (1 C)       0.000000
12                  ►  Dilsukhnagar‎ (1 C,  3 F)       0.058824
13                           ►  Domalguda‎ (3 C)       0.010870
14                           ►  Erragadd

### Cluster Neighborhoods

### Run k-means to cluster the neighborhoods in Hyderabad into 3 clusters.



In [35]:
# set number of clusters
hclusters = 3

hyd_clustering = hyd_mall.drop(columns=["Neighborhoods"])

# run k-means clustering
kmeans = KMeans(n_clusters=hclusters, random_state=0).fit(hyd_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]


array([2, 0, 2, 0, 2, 2, 0, 0, 0, 0], dtype=int32)
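The label numbers that k-means assigns are arbitrary, so label 0, 1, or 2 carries no meaning until it is tied back to the feature. One way to do that is to compute the mean shopping mall frequency per label and rank the clusters by it. The sketch below uses only the first ten frequencies from `hyd_mall` and the ten labels printed above; `cluster_means` is a hypothetical helper. With a single feature, the fitted model's `kmeans.cluster_centers_` should carry the same information.

```python
from collections import defaultdict


def cluster_means(values, labels):
    """Average the feature value within each cluster label."""
    sums = defaultdict(float)
    sizes = defaultdict(int)
    for value, label in zip(values, labels):
        sums[label] += value
        sizes[label] += 1
    return {label: sums[label] / sizes[label] for label in sums}


# First ten rows only, so cluster 1 does not appear in this excerpt.
values = [0.012048, 0.0, 0.02, 0.0, 0.02, 0.01, 0.0, 0.0, 0.0, 0.0]
labels = [2, 0, 2, 0, 2, 2, 0, 0, 0, 0]
means = cluster_means(values, labels)
ranked = sorted(means, key=means.get)  # labels ordered from lowest to highest mean
print(means, ranked)
```

On the full data this would be called as `cluster_means(hyd_mall["Shopping Mall"], kmeans.labels_)`.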

In [36]:
# create a new dataframe that holds the shopping mall frequency and the cluster label for each neighborhood
hyd_merged = hyd_mall.copy()

# add clustering labels
hyd_merged["Cluster Labels"] = kmeans.labels_


In [37]:
hyd_merged.rename(columns={"Neighborhoods": "Neighborhood"}, inplace=True)
hyd_merged.head()

                             Neighborhood  Shopping Mall  Cluster Labels
0                  ►  Abids‎ (1 C,  13 F)       0.012048               2
1                   ►  Alwal‎ (1 C,  1 F)       0.000000               0
2    ►  Ameerpet, Hyderabad‎ (3 C,  21 F)       0.020000               2
3  ►  Bandlaguda, Rangareddy‎ (1 C,  2 F)       0.000000               0
4          ►  Banjara Hills‎ (3 C,  25 F)       0.020000               2

In [38]:
# join hyd_merged with hyd_df to add latitude/longitude for each neighborhood
hyd_merged = hyd_merged.join(hyd_df.set_index("Neighborhood"), on="Neighborhood")

print(hyd_merged.shape)
hyd_merged.head() # check the last columns!


(52, 5)


                             Neighborhood  Shopping Mall  Cluster Labels  \
0                  ►  Abids‎ (1 C,  13 F)       0.012048               2   
1                   ►  Alwal‎ (1 C,  1 F)       0.000000               0   
2    ►  Ameerpet, Hyderabad‎ (3 C,  21 F)       0.020000               2   
3  ►  Bandlaguda, Rangareddy‎ (1 C,  2 F)       0.000000               0   
4          ►  Banjara Hills‎ (3 C,  25 F)       0.020000               2   

   Latitude  Longitude  
0  17.38980   78.47658  
1  17.53543   78.54427  
2  17.43482   78.44949  
3  17.29982   78.46495  
4  17.41535   78.43435  

In [39]:
# sort the results by Cluster Labels
print(hyd_merged.shape)
hyd_merged.sort_values(["Cluster Labels"], inplace=True)
hyd_merged


(52, 5)


                                    Neighborhood  Shopping Mall  \
51                  ►  Trimulgherry‎ (1 C,  3 F)       0.000000   
36                     ►  Moula-Ali‎ (3 C,  5 F)       0.000000   
34                             ►  Miyapur‎ (5 F)       0.000000   
33                         ►  Mehdipatnam‎ (1 C)       0.000000   
30                    ►  Malkajgiri‎ (3 C,  7 F)       0.000000   
29                      ►  Malakpet‎ (3 C,  2 F)       0.000000   
28                     ►  Madhapur‎ (1 C,  19 F)       0.000000   
27                        ►  L. B. Nagar‎ (16 F)       0.000000   
50                       ►  Tarnaka‎ (1 C,  6 F)       0.000000   
41                     ►  Nizampet‎ (2 C,  32 F)       0.000000   
23                     ►  Kachiguda‎ (1 C,  4 F)       0.000000   
21                       ►  Hydershakote‎ (14 F)       0.000000   
43                      ►  Pedda Amberpet‎ (1 F)       0.000000   
37                      ►  Nacharam‎ (1 C,  4 F)       0.00000

### Finally, let's visualize the resulting clusters



In [42]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(hclusters)
ys = [i+x+(i*x)**2 for i in range(hclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(hyd_merged['Latitude'], hyd_merged['Longitude'], hyd_merged['Neighborhood'], hyd_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [43]:
# save the map as HTML file
map_clusters.save('map_clusters.html')

### Examine Clusters

### Cluster 0

In [44]:
hyd_merged.loc[hyd_merged['Cluster Labels'] == 0]

                              Neighborhood  Shopping Mall  Cluster Labels  \
51            ►  Trimulgherry‎ (1 C,  3 F)            0.0               0   
36               ►  Moula-Ali‎ (3 C,  5 F)            0.0               0   
34                       ►  Miyapur‎ (5 F)            0.0               0   
33                   ►  Mehdipatnam‎ (1 C)            0.0               0   
30              ►  Malkajgiri‎ (3 C,  7 F)            0.0               0   
29                ►  Malakpet‎ (3 C,  2 F)            0.0               0   
28               ►  Madhapur‎ (1 C,  19 F)            0.0               0   
27                  ►  L. B. Nagar‎ (16 F)            0.0               0   
50                 ►  Tarnaka‎ (1 C,  6 F)            0.0               0   
41               ►  Nizampet‎ (2 C,  32 F)            0.0               0   
23               ►  Kachiguda‎ (1 C,  4 F)            0.0               0   
21                 ►  Hydershakote‎ (14 F)            0.0               0   

### Cluster 1

In [45]:
hyd_merged.loc[hyd_merged['Cluster Labels'] == 1]

                    Neighborhood  Shopping Mall  Cluster Labels   Latitude  \
31           ►  Manikonda‎ (8 F)       0.083333               1  17.401390   
17      ►  Golconda‎ (5 C,  4 F)       0.066667               1  17.389410   
12  ►  Dilsukhnagar‎ (1 C,  3 F)       0.058824               1  17.368570   
26         ►  Kukatpally‎ (16 F)       0.100000               1  17.487350   
38   ►  Nagole, Hyderabad‎ (4 F)       0.076923               1  17.372426   

    Longitude  
31  78.391630  
17  78.404060  
12  78.535150  
26  78.420870  
38  78.544543  

### Cluster 2

In [46]:
hyd_merged.loc[hyd_merged['Cluster Labels'] == 2]

                                    Neighborhood  Shopping Mall  \
40                   ►  Narayanguda‎ (1 C,  5 F)       0.018519   
42  ►  Old City (Hyderabad, India)‎ (8 C,  26 F)       0.010101   
49                          ►  Somajiguda‎ (5 F)       0.020000   
39                     ►  Nampally‎ (2 C,  10 F)       0.016129   
0                         ►  Abids‎ (1 C,  13 F)       0.012048   
32                          ►  Masab Tank‎ (4 F)       0.010000   
24                    ►  Khairtabad‎ (1 C,  2 F)       0.010000   
22                 ►  Jubilee Hills‎ (3 C,  8 F)       0.010000   
20                           ►  Hyderguda‎ (2 F)       0.020000   
15                   ►  Gachibowli‎ (4 C,  17 F)       0.010000   
13                           ►  Domalguda‎ (3 C)       0.010870   
10                        ►  Chikkadpally‎ (7 F)       0.016393   
5                    ►  Basheerbagh‎ (1 C,  7 F)       0.010000   
4                 ►  Banjara Hills‎ (3 C,  25 F)       0.02000

### Observation

Shopping malls are most concentrated in cluster 1 (Manikonda, Golconda, Dilsukhnagar, Kukatpally, and Nagole), where they make up roughly 6% to 10% of the venues returned for each neighborhood. Cluster 2 has a moderate concentration, and the neighborhoods in cluster 0 have very few to no shopping malls.

Cluster 0 therefore represents the greatest opportunity for opening new shopping malls, as there is little to no competition from existing malls. Shopping malls in cluster 1, by contrast, are likely to face intense competition because of the high concentration of malls there. From another perspective, the results suggest that the oversupply of shopping malls is mostly in the central area of the city, while the suburbs still have very few.

This project therefore recommends that property developers open new shopping malls in the cluster 0 neighborhoods, where there is little to no competition. Developers with a unique selling proposition that stands out from the competition could also consider the cluster 2 neighborhoods, where competition is moderate. Developers are advised to avoid the cluster 1 neighborhoods, which already have a high concentration of shopping malls and intense competition.

