# IBM Applied Data Science Capstone Course by Coursera

### Week 5 Final Report

#### Opening a New Shopping Mall in Vijayawada, India
•	Build a dataframe of neighborhoods in Vijayawada, India by scraping the data from a Wikipedia page

•	Get the geographical coordinates of the neighborhoods

•	Obtain the venue data for the neighborhoods from the Foursquare API

•	Explore and cluster the neighborhoods

•	Select the best cluster to open a new shopping mall
________________________________________

### Import the libraries

In [2]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

import msgpack

!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
!pip install folium
import folium # map rendering library

!conda install -c conda-forge geocoder -y

from bs4 import BeautifulSoup

print('Libraries imported.')


### Scrape data from the Wikipedia page into a DataFrame

In [3]:
# send the GET request
data = requests.get("https://en.wikipedia.org/wiki/Category:Neighbourhoods_in_Vijayawada").text

# parse data from the html into a beautifulsoup object
soup = BeautifulSoup(data, 'html.parser')

In [4]:
# create a list to store neighborhood data
neighborhoodList = []

# append the data into the list
for row in soup.find_all("div", class_="mw-category")[0].find_all("li"):
    neighborhoodList.append(row.text)

# create a new DataFrame from the list
vja_df = pd.DataFrame({"Neighborhood": neighborhoodList})

vja_df.head()

Unnamed: 0,Neighborhood
0,Benz Circle
1,Enikepadu
2,Ganguru
3,"Gollapudi, Vijayawada"
4,Kesarapalle


In [5]:
vja_df.shape

(16, 1)
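
Some of the scraped titles carry a Wikipedia disambiguation suffix, for example "Gollapudi, Vijayawada" and "Nidamanuru, Krishna district". When such a title is later formatted into a geocoding query, the city name can end up repeated. The sketch below shows one possible way to strip the suffix. `clean_neighborhood_name` is a hypothetical helper, not part of the original notebook, and it assumes that the text before the first comma is the neighborhood name.

```python
def clean_neighborhood_name(name):
    """Drop a trailing ', <qualifier>' suffix such as ', Vijayawada'.

    Hypothetical helper: assumes the part before the first comma is the
    neighborhood name, which holds for the titles shown above.
    """
    return name.split(",", 1)[0].strip()


names = ["Benz Circle", "Gollapudi, Vijayawada", "Nidamanuru, Krishna district"]
cleaned = [clean_neighborhood_name(n) for n in names]
print(cleaned)
```

Whether to apply this is a judgment call: the notebook keeps the full titles, and the tables that follow reflect that choice.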

### Get the Geographical Coordinates

In [6]:
!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim 


import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes 
import folium


In [7]:
!pip install folium
import folium # map rendering library

!conda install -c conda-forge geocoder -y




In [13]:
import geocoder

In [14]:
# define a function to get coordinates
def get_latlng(neighborhood):
    # initialize your variable to None
    lat_lng_coords = None
    # loop until you get the coordinates
    while(lat_lng_coords is None):
        g = geocoder.arcgis('{}, Vijayawada, India'.format(neighborhood))
        lat_lng_coords = g.latlng
    return lat_lng_coords
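
The `while` loop above never exits if the geocoder keeps returning `None` for a name. A minimal sketch of a bounded variant follows. The `lookup` argument is a hypothetical callable standing in for `lambda q: geocoder.arcgis(q).latlng`, and `max_attempts` and `pause_seconds` are illustrative parameters that do not appear in the original notebook.

```python
import time


def get_latlng_bounded(neighborhood, lookup, max_attempts=3, pause_seconds=0.0):
    """Return [lat, lng] from lookup(query), or None after max_attempts failures.

    `lookup` is a hypothetical stand-in for
    lambda q: geocoder.arcgis(q).latlng in the notebook.
    """
    query = '{}, Vijayawada, India'.format(neighborhood)
    for _ in range(max_attempts):
        coords = lookup(query)
        if coords is not None:
            return coords
        time.sleep(pause_seconds)  # brief pause before the next attempt
    return None


# Stub lookup for illustration only: fails once, then succeeds.
calls = []


def fake_lookup(query):
    calls.append(query)
    return None if len(calls) < 2 else [16.4978, 80.6538]


result = get_latlng_bounded('Benz Circle', fake_lookup)
never = get_latlng_bounded('Nowhere', lambda q: None)
```

With this shape, a neighborhood that cannot be geocoded yields `None` instead of hanging the cell, so the caller has to decide how to handle missing coordinates.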

In [15]:
# call the function to get the coordinates, store in a new list using list comprehension
coords = [ get_latlng(neighborhood) for neighborhood in vja_df["Neighborhood"].tolist() ]

coords

[[16.497837779275173, 80.65382324581786],
 [16.516200000000026, 80.70292000000006],
 [16.50257000000005, 80.63977000000006],
 [16.539025425919522, 80.59457534973491],
 [16.521190000000047, 80.77511000000004],
 [16.496614367193818, 80.64541958527649],
 [16.507790000000057, 80.72021000000007],
 [16.56501000000003, 80.67774000000009],
 [16.51207000000005, 80.61318000000006],
 [16.49526000000003, 80.66131000000007],
 [16.47108000000003, 80.72092000000004],
 [16.480680000000064, 80.70790000000005],
 [16.521650000000022, 80.68870000000004],
 [16.525040000000047, 80.68221000000005],
 [16.523698567745893, 80.67692134613942],
 [16.47521000000006, 80.69824000000006]]

In [19]:
# create temporary dataframe to populate the coordinates into Latitude and Longitude
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])

# merge the coordinates into the original dataframe
vja_df['Latitude'] = df_coords['Latitude']
vja_df['Longitude'] = df_coords['Longitude']

# check the neighborhoods and the coordinates
print(vja_df.shape)

(16, 3)


In [20]:
vja_df

Unnamed: 0,Neighborhood,Latitude,Longitude
0,Benz Circle,16.497838,80.653823
1,Enikepadu,16.5162,80.70292
2,Ganguru,16.50257,80.63977
3,"Gollapudi, Vijayawada",16.539025,80.594575
4,Kesarapalle,16.52119,80.77511
5,Mogalrajapuram,16.496614,80.64542
6,"Nidamanuru, Krishna district",16.50779,80.72021
7,Nunna,16.56501,80.67774
8,"One Town, Vijayawada",16.51207,80.61318
9,Patamata,16.49526,80.66131


### Create a map of Vijayawada with neighborhoods superimposed on top

In [16]:
# get the coordinates of Vijayawada
address = 'Vijayawada, India'

geolocator = Nominatim(user_agent="my-application")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Vijayawada, India are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Vijayawada, India are 16.5087586, 80.6185102.


In [21]:
# create map of Vijayawada using latitude and longitude values
map_vja = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, neighborhood in zip(vja_df['Latitude'], vja_df['Longitude'], vja_df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_vja)  
    
map_vja

# save the map as HTML file
map_vja.save('map_vja.html')

In [22]:
map_vja

### Use the Foursquare API to explore the neighborhoods

In [23]:
# define Foursquare credentials and version (replace the placeholders with your own keys; never publish real secrets)
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20181130' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_FOURSQUARE_CLIENT_ID
CLIENT_SECRET: YOUR_FOURSQUARE_CLIENT_SECRET


#### Now, let's get the top 100 venues that are within a radius of 2000 meters.

In [26]:
radius = 2000
LIMIT = 100

venues = []

for lat, long, neighborhood in zip(vja_df['Latitude'], vja_df['Longitude'], vja_df['Neighborhood']):
    
    # create the API request URL
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    # make the GET request
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    # return only relevant information for each nearby venue
    for venue in results:
        venues.append((
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))

# convert the venues list into a new DataFrame
venues_df = pd.DataFrame(venues)

# define the column names
venues_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head()

(257, 7)


Unnamed: 0,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,Benz Circle,16.497838,80.653823,Aptronix,16.497294,80.656273,Electronics Store
1,Benz Circle,16.497838,80.653823,Kitkat,16.501053,80.65767,Bakery
2,Benz Circle,16.497838,80.653823,Baskin Robbins,16.503227,80.648265,Ice Cream Shop
3,Benz Circle,16.497838,80.653823,"SSS- Idly, Policlinic Road.",16.501898,80.652812,Breakfast Spot
4,Benz Circle,16.497838,80.653823,Talwalkers Better Value Fitness,16.500125,80.656548,Gym


In [27]:
venues_df.groupby(["Neighborhood"]).count()

Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
Benz Circle,51,51,51,51,51,51
Enikepadu,5,5,5,5,5,5
Ganguru,51,51,51,51,51,51
"Gollapudi, Vijayawada",5,5,5,5,5,5
Kesarapalle,4,4,4,4,4,4
Mogalrajapuram,41,41,41,41,41,41
"Nidamanuru, Krishna district",6,6,6,6,6,6
Nunna,2,2,2,2,2,2
"One Town, Vijayawada",14,14,14,14,14,14
Patamata,48,48,48,48,48,48
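
Several central neighborhoods return very similar venue counts (51, 51 and 48). One possible reason is that their 2000 m search circles overlap, so the same venues are counted for more than one neighborhood. The sketch below checks this for Benz Circle and Patamata using the haversine formula with the coordinates listed earlier. The Earth radius of 6,371 km is an approximation, and `haversine_m` is an illustrative helper that is not part of the original notebook.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6371000  # mean Earth radius in meters; an approximation


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


radius = 2000  # same search radius as the notebook
benz_circle = (16.497838, 80.653823)
patamata = (16.49526, 80.66131)

distance = haversine_m(*benz_circle, *patamata)
# Two circles of equal radius overlap when their centers are closer than 2 * radius.
circles_overlap = distance < 2 * radius
print(round(distance), circles_overlap)
```

The two centers are less than 1 km apart, so their search areas overlap heavily. This is worth keeping in mind when comparing per-neighborhood frequencies later on.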


In [28]:
print('There are {} unique categories.'.format(len(venues_df['VenueCategory'].unique())))
# print out the list of categories
venues_df['VenueCategory'].unique()[:50]

There are 44 unique categories.


array(['Electronics Store', 'Bakery', 'Ice Cream Shop', 'Breakfast Spot',
       'Gym', 'Clothing Store', 'Fast Food Restaurant', 'Multiplex',
       'Coffee Shop', 'Vegetarian / Vegan Restaurant', 'Pizza Place',
       'Hotel', 'Shopping Mall', 'Restaurant', 'Café',
       'Indian Restaurant', 'Mediterranean Restaurant', 'Pub',
       'Outdoors & Recreation', 'Hotel Bar', 'Chocolate Shop',
       'Smoke Shop', 'Sporting Goods Shop', 'Convenience Store',
       'Train Station', 'Diner', 'Movie Theater', 'Market',
       'Department Store', 'Asian Restaurant', 'Bookstore', 'Pharmacy',
       'Airport Terminal', 'Airport Service', 'Airport Food Court',
       'Building', 'Bus Station', 'Warehouse Store', 'Bed & Breakfast',
       'Food Court', 'Halal Restaurant', 'Garden Center', 'Playground',
       'Andhra Restaurant'], dtype=object)

In [29]:
# check if the results contain "Shopping Mall"
"Shopping Mall" in venues_df['VenueCategory'].unique()

True

### Analyze Each Neighborhood

In [30]:
# one hot encoding
vja_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
vja_onehot['Neighborhoods'] = venues_df['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [vja_onehot.columns[-1]] + list(vja_onehot.columns[:-1])
vja_onehot = vja_onehot[fixed_columns]

print(vja_onehot.shape)
vja_onehot.head()

(257, 45)


Unnamed: 0,Neighborhoods,Airport Food Court,Airport Service,Airport Terminal,Andhra Restaurant,Asian Restaurant,Bakery,Bed & Breakfast,Bookstore,Breakfast Spot,Building,Bus Station,Café,Chocolate Shop,Clothing Store,Coffee Shop,Convenience Store,Department Store,Diner,Electronics Store,Fast Food Restaurant,Food Court,Garden Center,Gym,Halal Restaurant,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Market,Mediterranean Restaurant,Movie Theater,Multiplex,Outdoors & Recreation,Pharmacy,Pizza Place,Playground,Pub,Restaurant,Shopping Mall,Smoke Shop,Sporting Goods Shop,Train Station,Vegetarian / Vegan Restaurant,Warehouse Store
0,Benz Circle,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Benz Circle,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Benz Circle,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Benz Circle,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Benz Circle,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


#### Next, let's group the rows by neighborhood, taking the mean of the frequency of occurrence of each category

In [31]:
vja_grouped = vja_onehot.groupby(["Neighborhoods"]).mean().reset_index()

print(vja_grouped.shape)
vja_grouped

(16, 45)


Unnamed: 0,Neighborhoods,Airport Food Court,Airport Service,Airport Terminal,Andhra Restaurant,Asian Restaurant,Bakery,Bed & Breakfast,Bookstore,Breakfast Spot,Building,Bus Station,Café,Chocolate Shop,Clothing Store,Coffee Shop,Convenience Store,Department Store,Diner,Electronics Store,Fast Food Restaurant,Food Court,Garden Center,Gym,Halal Restaurant,Hotel,Hotel Bar,Ice Cream Shop,Indian Restaurant,Market,Mediterranean Restaurant,Movie Theater,Multiplex,Outdoors & Recreation,Pharmacy,Pizza Place,Playground,Pub,Restaurant,Shopping Mall,Smoke Shop,Sporting Goods Shop,Train Station,Vegetarian / Vegan Restaurant,Warehouse Store
0,Benz Circle,0.0,0.0,0.0,0.0,0.0,0.039216,0.0,0.0,0.019608,0.0,0.0,0.078431,0.019608,0.019608,0.098039,0.0,0.0,0.0,0.019608,0.078431,0.0,0.0,0.019608,0.0,0.039216,0.019608,0.058824,0.137255,0.0,0.019608,0.0,0.117647,0.019608,0.0,0.058824,0.0,0.019608,0.019608,0.058824,0.019608,0.0,0.0,0.019608,0.0
1,Enikepadu,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.2,0.0,0.0
2,Ganguru,0.0,0.0,0.0,0.0,0.019608,0.039216,0.0,0.019608,0.019608,0.0,0.0,0.078431,0.0,0.019608,0.058824,0.0,0.019608,0.019608,0.039216,0.058824,0.0,0.0,0.019608,0.0,0.058824,0.019608,0.058824,0.176471,0.039216,0.019608,0.019608,0.078431,0.019608,0.0,0.019608,0.0,0.019608,0.019608,0.039216,0.0,0.0,0.0,0.0,0.0
3,"Gollapudi, Vijayawada",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Kesarapalle,0.25,0.5,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Mogalrajapuram,0.0,0.0,0.0,0.0,0.0,0.04878,0.0,0.0,0.02439,0.0,0.0,0.097561,0.0,0.02439,0.073171,0.0,0.0,0.0,0.02439,0.097561,0.0,0.0,0.02439,0.0,0.073171,0.02439,0.073171,0.146341,0.0,0.02439,0.0,0.073171,0.02439,0.0,0.04878,0.0,0.02439,0.02439,0.04878,0.0,0.0,0.0,0.0,0.0
6,"Nidamanuru, Krishna district",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.166667,0.0,0.0
7,Nunna,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5
8,"One Town, Vijayawada",0.0,0.0,0.0,0.0,0.142857,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.214286,0.0,0.0,0.285714,0.071429,0.0,0.071429,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Patamata,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.020833,0.0,0.0,0.083333,0.020833,0.020833,0.083333,0.0,0.0,0.0,0.020833,0.083333,0.020833,0.0,0.020833,0.0,0.041667,0.0,0.041667,0.166667,0.0,0.020833,0.020833,0.104167,0.020833,0.0,0.0625,0.0,0.020833,0.0,0.041667,0.020833,0.0,0.0,0.020833,0.0
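
The mean of a 0/1 one-hot column within a neighborhood is the number of venues in that category divided by the neighborhood's total number of venues. The plain-Python sketch below reproduces that arithmetic on toy data. The venue list is illustrative only and is not taken from the notebook.

```python
from collections import Counter, defaultdict

# Toy (neighborhood, category) pairs; illustrative values, not the notebook's data.
venues = [
    ('A', 'Shopping Mall'), ('A', 'Bakery'), ('A', 'Bakery'), ('A', 'Gym'),
    ('B', 'Hotel'), ('B', 'Pharmacy'),
]

counts = defaultdict(Counter)
for neighborhood, category in venues:
    counts[neighborhood][category] += 1

# Frequency = category count / total venues in that neighborhood,
# which is what the mean of a 0/1 one-hot column works out to.
frequency = {
    n: {cat: c / sum(cats.values()) for cat, c in cats.items()}
    for n, cats in counts.items()
}
mall_share = {n: frequency[n].get('Shopping Mall', 0.0) for n in frequency}
print(mall_share)
```

Read the same way, the 0.058824 reported for Benz Circle corresponds to 3 shopping malls among its 51 venues (3 / 51 ≈ 0.058824).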


In [33]:
len(vja_grouped[vja_grouped["Shopping Mall"] > 0])

4

### Create a new DataFrame for Shopping Mall data only

In [34]:
vja_mall = vja_grouped[["Neighborhoods","Shopping Mall"]]

vja_mall.head()

Unnamed: 0,Neighborhoods,Shopping Mall
0,Benz Circle,0.058824
1,Enikepadu,0.0
2,Ganguru,0.039216
3,"Gollapudi, Vijayawada",0.0
4,Kesarapalle,0.0


### Cluster Neighborhoods

#### Run k-means to cluster the neighborhoods in Vijayawada into 3 clusters.

In [35]:
# set number of clusters
kclusters = 3

vja_clustering = vja_mall.drop(columns=["Neighborhoods"])

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(vja_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([2, 0, 1, 0, 0, 1, 0, 0, 0, 1], dtype=int32)

In [36]:
# create a new dataframe that holds the shopping mall frequency and the cluster label for each neighborhood
vja_merged = vja_mall.copy()

# add clustering labels
vja_merged["Cluster Labels"] = kmeans.labels_

vja_merged.rename(columns={"Neighborhoods": "Neighborhood"}, inplace=True)
vja_merged.head()

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels
0,Benz Circle,0.058824,2
1,Enikepadu,0.0,0
2,Ganguru,0.039216,1
3,"Gollapudi, Vijayawada",0.0,0
4,Kesarapalle,0.0,0


In [37]:
# join vja_merged with vja_df to add latitude/longitude for each neighborhood
vja_merged = vja_merged.join(vja_df.set_index("Neighborhood"), on="Neighborhood")

print(vja_merged.shape)
vja_merged.head() # check the last columns!

(16, 5)


Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
0,Benz Circle,0.058824,2,16.497838,80.653823
1,Enikepadu,0.0,0,16.5162,80.70292
2,Ganguru,0.039216,1,16.50257,80.63977
3,"Gollapudi, Vijayawada",0.0,0,16.539025,80.594575
4,Kesarapalle,0.0,0,16.52119,80.77511


In [38]:
# sort the results by Cluster Labels
print(vja_merged.shape)
vja_merged.sort_values(["Cluster Labels"], inplace=True)
vja_merged

(16, 5)


Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
1,Enikepadu,0.0,0,16.5162,80.70292
3,"Gollapudi, Vijayawada",0.0,0,16.539025,80.594575
4,Kesarapalle,0.0,0,16.52119,80.77511
6,"Nidamanuru, Krishna district",0.0,0,16.50779,80.72021
7,Nunna,0.0,0,16.56501,80.67774
8,"One Town, Vijayawada",0.0,0,16.51207,80.61318
10,Penamaluru,0.0,0,16.47108,80.72092
11,Poranki,0.0,0,16.48068,80.7079
12,Prasadampadu,0.0,0,16.52165,80.6887
13,Ramavarappadu,0.0,0,16.52504,80.68221


### Finally, let's visualize the resulting clusters

In [39]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(vja_merged['Latitude'], vja_merged['Longitude'], vja_merged['Neighborhood'], vja_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [40]:
# save the map as HTML file
map_clusters.save('map_clusters.html')

### Examine the Clusters

## Cluster 0

In [41]:
vja_merged.loc[vja_merged['Cluster Labels'] == 0]

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
1,Enikepadu,0.0,0,16.5162,80.70292
3,"Gollapudi, Vijayawada",0.0,0,16.539025,80.594575
4,Kesarapalle,0.0,0,16.52119,80.77511
6,"Nidamanuru, Krishna district",0.0,0,16.50779,80.72021
7,Nunna,0.0,0,16.56501,80.67774
8,"One Town, Vijayawada",0.0,0,16.51207,80.61318
10,Penamaluru,0.0,0,16.47108,80.72092
11,Poranki,0.0,0,16.48068,80.7079
12,Prasadampadu,0.0,0,16.52165,80.6887
13,Ramavarappadu,0.0,0,16.52504,80.68221


## Cluster 1

In [42]:
vja_merged.loc[vja_merged['Cluster Labels'] == 1]

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
2,Ganguru,0.039216,1,16.50257,80.63977
5,Mogalrajapuram,0.04878,1,16.496614,80.64542
9,Patamata,0.041667,1,16.49526,80.66131


## Cluster 2

In [43]:
vja_merged.loc[vja_merged['Cluster Labels'] == 2]

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
0,Benz Circle,0.058824,2,16.497838,80.653823
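
The integers that k-means assigns as cluster labels are arbitrary, so it helps to summarize each cluster before interpreting it. The sketch below computes the mean shopping mall frequency per cluster and ranks the labels from lowest to highest. The rows are copied from the tables shown above (only a subset of cluster 0 is included), and the variable names are illustrative.

```python
from collections import defaultdict

# (neighborhood, shopping mall frequency, cluster label) copied from the rows shown above.
rows = [
    ('Benz Circle', 0.058824, 2),
    ('Ganguru', 0.039216, 1),
    ('Mogalrajapuram', 0.04878, 1),
    ('Patamata', 0.041667, 1),
    ('Enikepadu', 0.0, 0),
    ('Kesarapalle', 0.0, 0),
]

by_cluster = defaultdict(list)
for _, freq, label in rows:
    by_cluster[label].append(freq)

# Mean frequency per cluster, then labels ordered from lowest to highest mean.
cluster_mean = {label: sum(v) / len(v) for label, v in by_cluster.items()}
ranked = sorted(cluster_mean, key=cluster_mean.get)
print(ranked)
```

For this run the ranking happens to match the label order (0, 1, 2). A different `random_state` could assign different labels to the same groups.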


## Observations:

Most of the shopping malls are concentrated in the central area of Vijayawada. Cluster 2 (Benz Circle) has the highest concentration of shopping malls, and cluster 1 (Ganguru, Mogalrajapuram and Patamata) has a moderate concentration. Cluster 0, by contrast, has few or no shopping malls in its neighborhoods.

The cluster 0 neighborhoods therefore represent an opportunity for new shopping malls, because there is little to no competition from existing ones. Shopping malls in cluster 2, meanwhile, are likely to face intense competition because of oversupply. Seen another way, the oversupply of shopping malls is mostly confined to the central area of the city, while the suburbs still have very few.

This project recommends that property developers use these findings and open new shopping malls in cluster 0 neighborhoods, where there is little to no competition. Developers with a unique selling proposition that sets them apart can also consider cluster 1 neighborhoods, where competition is moderate. Developers are advised to avoid cluster 2, which already has a high concentration of shopping malls and intense competition.