CAPSTONE PROJECT

Opening a New Shopping Mall in Hyderabad, India
Steps:
    1. Creating a DataFrame of neighborhoods in Hyderabad, India by scraping a Wikipedia category page.
    2. Getting the geographical coordinates.
    3. Getting the venue data from the Foursquare API.
    4. Creating the clusters.
    5. Selecting the best cluster.

In [1]:
# Importing Libraries
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import geocoder # to get coordinates

import requests # library to handle requests
from bs4 import BeautifulSoup # library to parse HTML and XML documents

from pandas import json_normalize # transform JSON into a pandas DataFrame (the old pandas.io.json import path no longer exists in recent pandas)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from scikit-learn's clustering module
from sklearn.cluster import KMeans

import folium # map rendering library

print("Libraries imported.")

Libraries imported.


In [2]:
# sending the GET request
data = requests.get("https://en.wikipedia.org/wiki/Category:Neighbourhoods_in_Hyderabad,_India").text

In [3]:
# parsing the data from the html into a beautifulsoup object
soup = BeautifulSoup(data, 'html.parser')

In [4]:
# create a list to store neighborhood data
neighborhoodList = []

In [5]:
# collect the text of every list item in the first category listing on the page
for row in soup.find_all("div", class_="mw-category")[0].find_all("li"):
    neighborhoodList.append(row.text)

In [6]:
# create a new DataFrame
hy_df = pd.DataFrame({"Neighborhood": neighborhoodList})

hy_df.head()

,Neighborhood
0,A. S. Rao Nagar
1,A.C. Guards
2,Abhyudaya Nagar
3,Abids
4,Adikmet


In [7]:
# check the dimensions of the dataframe: (rows, columns)
hy_df.shape

(200, 1)
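The (200, 1) shape is probably a paging limit rather than the real size of the category: MediaWiki category pages list at most 200 members per page by default, and the scraping loop above reads only the first page. The sketch below, offered under stated assumptions rather than as the author's method, follows the "next page" links instead. It assumes the member list sits inside `<div id="mw-pages">` (which also keeps subcategory links out of the result) and that the pager link text is exactly "next page". `scrape_category`, `extract_page_titles` and `next_page_url` are hypothetical helpers.

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

BASE_URL = "https://en.wikipedia.org"


def extract_page_titles(soup):
    """Return the entry names listed under 'Pages in category'."""
    # Assumption: MediaWiki wraps the page listing in <div id="mw-pages">.
    pages = soup.find("div", id="mw-pages")
    if pages is None:
        return []
    return [li.get_text() for li in pages.find_all("li")]


def next_page_url(soup):
    """Return the absolute URL of the 'next page' link, or None on the last page."""
    pages = soup.find("div", id="mw-pages")
    link = pages.find("a", string="next page") if pages is not None else None
    return urljoin(BASE_URL, link["href"]) if link is not None else None


def scrape_category(url, max_pages=20):
    """Follow 'next page' links and collect every entry in the category."""
    names = []
    for _ in range(max_pages):
        soup = BeautifulSoup(requests.get(url, timeout=30).text, "html.parser")
        names.extend(extract_page_titles(soup))
        url = next_page_url(soup)
        if url is None:  # last page reached
            break
    return names


# Usage (network call, not run here):
# neighborhoodList = scrape_category(
#     "https://en.wikipedia.org/wiki/Category:Neighbourhoods_in_Hyderabad,_India")
```

The `max_pages` cap is a defensive choice so that a change in the page markup cannot cause an endless crawl.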

Getting the Geographical Coordinates

In [8]:
# define a function to get coordinates
import time


def get_latlng(neighborhood, max_attempts=5):
    # query ArcGIS a limited number of times instead of looping forever
    for _ in range(max_attempts):
        g = geocoder.arcgis('{}, Hyderabad, India'.format(neighborhood))
        if g.latlng is not None:
            return g.latlng
        time.sleep(1)  # short pause before retrying
    # give up; rows with missing coordinates can be dropped later
    return [None, None]

In [9]:
# calling the function to get the coordinates, store in a new list using list comprehension
coords = [ get_latlng(neighborhood) for neighborhood in hy_df["Neighborhood"].tolist() ]

In [10]:
coords

[[17.411200000000065, 78.50824000000006],
 [17.392977027745946, 78.45686724051741],
 [17.337650000000053, 78.56414000000007],
 [17.389800000000037, 78.47658000000007],
 [17.410610000000077, 78.51513000000006],
 [17.37751000000003, 78.48005000000006],
 [17.387364823969637, 78.4669870622138],
 [17.34259000000003, 78.47626000000008],
 [17.36068000000006, 78.47998000000007],
 [17.503370000000075, 78.41602000000006],
 [17.535430000000076, 78.54427000000004],
 [17.385820000000024, 78.51836000000003],
 [17.435350000000028, 78.44861000000003],
 [17.40784000000002, 78.49150000000003],
 [17.385140000000035, 78.44738000000007],
 [17.369170000000054, 78.43683000000004],
 [17.40710000000007, 78.50233000000003],
 [17.372720000000072, 78.49047000000007],
 [17.38897000000003, 78.48681000000005],
 [17.39931000000007, 78.49964000000006],
 [17.339920000000063, 78.54553000000004],
 [17.448510000000056, 78.44924000000003],
 [17.415350000000046, 78.43435000000005],
 [17.38859199570786, 78.47665099785392],
 

In [11]:
# create temporary dataframe
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])

In [12]:
# merge the coordinates into the original dataframe
hy_df['Latitude'] = df_coords['Latitude']
hy_df['Longitude'] = df_coords['Longitude']

In [13]:
# check the neighborhoods and the coordinates
print(hy_df.shape)
hy_df

(200, 3)


,Neighborhood,Latitude,Longitude
0,A. S. Rao Nagar,17.4112,78.50824
1,A.C. Guards,17.392977,78.456867
2,Abhyudaya Nagar,17.33765,78.56414
3,Abids,17.3898,78.47658
4,Adikmet,17.41061,78.51513
5,Afzal Gunj,17.37751,78.48005
6,Aghapura,17.387365,78.466987
7,"Aliabad, Hyderabad",17.34259,78.47626
8,Alijah Kotla,17.36068,78.47998
9,Allwyn Colony,17.50337,78.41602
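The geocoding step accepts whatever ArcGIS returns, so a misspelled or unknown neighborhood can silently come back as a city-level or out-of-town match. A minimal sanity check, assuming a rough bounding box around Hyderabad (the limits below are an assumption, not taken from the notebook), flags points outside the box and points shared by several neighborhoods. `flag_suspicious_coords` is a hypothetical helper.

```python
import pandas as pd

# Assumed rough bounding box around Greater Hyderabad; adjust as needed.
LAT_RANGE = (17.20, 17.65)
LNG_RANGE = (78.20, 78.75)


def flag_suspicious_coords(df):
    """Add boolean columns marking coordinates that deserve a manual check."""
    inside = (df["Latitude"].between(*LAT_RANGE)
              & df["Longitude"].between(*LNG_RANGE))
    # Different neighborhoods geocoded to the exact same point often means the
    # geocoder fell back to a coarser match.
    shared = df.duplicated(subset=["Latitude", "Longitude"], keep=False)
    return df.assign(OutsideBox=~inside, SharedCoords=shared)


# Usage: show only the rows that need a second look
# checked = flag_suspicious_coords(hy_df)
# print(checked[checked["OutsideBox"] | checked["SharedCoords"]])
```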


In [14]:
# saving the DataFrame as CSV file
hy_df.to_csv("hy_df.csv", index=False)

Creating a Map of Hyderabad

In [16]:
# get the coordinates of Hyderabad
address = 'Hyderabad, India'

geolocator = Nominatim(user_agent="my-application")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Hyderabad, India are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Hyderabad, India are 17.38878595, 78.46106473453146.


In [18]:
# create map of Hyderabad using latitude and longitude values
map_hy = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, neighborhood in zip(hy_df['Latitude'], hy_df['Longitude'], hy_df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_hy)  
    
map_hy

In [19]:
# save the map as HTML file
map_hy.save('map_hy.html')

Using the Foursquare API

In [20]:
# define Foursquare Credentials and Version
CLIENT_ID = 'XXXXXXXXXXXXx' # your Foursquare ID
CLIENT_SECRET = 'XXXXXXXXXXXXXXXX' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
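The export has no cell between the credentials and `In [22]` (the execution counter skips 21), yet `In [22]` expects a list called `venues` with one row per venue. A minimal sketch of what that cell could look like follows. It assumes the legacy v2 `venues/explore` endpoint and its JSON layout (`response.groups[].items[].venue`). `get_venues`, `parse_explore_payload`, the 1,000 m radius and the 100-venue limit are all assumptions, not the author's code. Foursquare has since introduced a newer Places API, so these v2 calls may not work with newly issued credentials.

```python
import requests

# Legacy Foursquare v2 "explore" endpoint (assumed; matches the v2-style credentials above)
EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"


def parse_explore_payload(neighborhood, lat, lng, payload):
    """Flatten one explore response into rows in venues_df's column order."""
    rows = []
    for group in payload.get("response", {}).get("groups", []):
        for item in group.get("items", []):
            venue = item["venue"]
            # some venues have no category; keep the row with None
            categories = venue.get("categories") or [{"name": None}]
            rows.append((neighborhood, lat, lng, venue["name"],
                         venue["location"]["lat"], venue["location"]["lng"],
                         categories[0]["name"]))
    return rows


def get_venues(df, client_id, client_secret, version, radius=1000, limit=100):
    """Query the explore endpoint once per neighborhood (radius/limit are assumptions)."""
    venues = []
    for name, lat, lng in zip(df["Neighborhood"], df["Latitude"], df["Longitude"]):
        params = {
            "client_id": client_id,
            "client_secret": client_secret,
            "v": version,
            "ll": "{},{}".format(lat, lng),
            "radius": radius,
            "limit": limit,
        }
        response = requests.get(EXPLORE_URL, params=params, timeout=30)
        response.raise_for_status()
        venues.extend(parse_explore_payload(name, lat, lng, response.json()))
    return venues


# Usage (network call, not run here):
# venues = get_venues(hy_df, CLIENT_ID, CLIENT_SECRET, VERSION)
```

Keeping the JSON parsing in its own function means it can be checked against a saved response without spending API quota.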

In [22]:
# convert the venues list into a new DataFrame
venues_df = pd.DataFrame(venues)

# define the column names
venues_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head()

(723, 7)


,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,A. S. Rao Nagar,17.4112,78.50824,Bawarchi,17.406369,78.497662,Indian Restaurant
1,A. S. Rao Nagar,17.4112,78.50824,Subway,17.404173,78.51495,Sandwich Place
2,A. S. Rao Nagar,17.4112,78.50824,Sudharshan Theatre 35mm,17.40653,78.49515,Movie Theater
3,A. S. Rao Nagar,17.4112,78.50824,Devi 70 MM,17.406329,78.495409,Movie Theater
4,A. S. Rao Nagar,17.4112,78.50824,Baskin-Robbins,17.404311,78.510034,Ice Cream Shop


In [23]:
# Checking how many venues were returned for each neighborhood
venues_df.groupby(["Neighborhood"]).count()

Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
A. S. Rao Nagar,20,20,20,20,20,20
A.C. Guards,54,54,54,54,54,54
Abhyudaya Nagar,10,10,10,10,10,10
Abids,82,82,82,82,82,82
Adikmet,19,19,19,19,19,19
Afzal Gunj,44,44,44,44,44,44
Aghapura,53,53,53,53,53,53
"Aliabad, Hyderabad",9,9,9,9,9,9
Alijah Kotla,15,15,15,15,15,15
Allwyn Colony,14,14,14,14,14,14


In [24]:
# count the unique venue categories
print('There are {} unique categories.'.format(len(venues_df['VenueCategory'].unique())))

There are 85 unique categories.


In [25]:
# show the first 50 categories
venues_df['VenueCategory'].unique()[:50]

array(['Indian Restaurant', 'Sandwich Place', 'Movie Theater',
       'Ice Cream Shop', 'Coffee Shop', 'Convenience Store', 'Café',
       'Asian Restaurant', 'Chinese Restaurant', 'Bookstore', 'Bakery',
       'Hyderabadi Restaurant', 'Juice Bar', 'Lounge',
       'South Indian Restaurant', 'Park', 'Bistro',
       'Middle Eastern Restaurant', 'Science Museum', 'Snack Place',
       'Hotel Bar', 'Vegetarian / Vegan Restaurant', 'Stadium',
       'Performing Arts Venue', 'Pizza Place', 'Hotel',
       'Fast Food Restaurant', 'Mobile Phone Shop', 'Fried Chicken Joint',
       'Department Store', 'Hookah Bar', 'Electronics Store',
       'Clothing Store', 'Bus Station', 'Pool', 'Dessert Shop',
       'Restaurant', 'Shoe Store', 'Diner', 'Chaat Place', 'Food Truck',
       'Neighborhood', 'Burger Joint', 'Multiplex', 'Smoke Shop',
       'Breakfast Spot', 'Bar', 'Shopping Mall', 'Food',
       'Indie Movie Theater'], dtype=object)

In [26]:
# check if the results contain "Shopping Mall"
"Shopping Mall" in venues_df['VenueCategory'].unique()

True

In [27]:
# one hot encoding
hy_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
hy_onehot['Neighborhoods'] = venues_df['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [hy_onehot.columns[-1]] + list(hy_onehot.columns[:-1])
hy_onehot = hy_onehot[fixed_columns]

print(hy_onehot.shape)
hy_onehot.head()

(723, 86)


,Neighborhoods,ATM,American Restaurant,Asian Restaurant,Bakery,Bank,Bar,Bengali Restaurant,Bistro,Bookstore,Breakfast Spot,Burger Joint,Bus Station,Café,Chaat Place,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store,Cosmetics Shop,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Donut Shop,Electronics Store,Farmers Market,Fast Food Restaurant,Flea Market,Food,Food & Drink Shop,Food Court,Food Truck,Fried Chicken Joint,Furniture / Home Store,Gaming Cafe,Garden Center,General Entertainment,Golf Course,Grocery Store,Gym,Historic Site,History Museum,Hookah Bar,Hotel,Hotel Bar,Hyderabadi Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Italian Restaurant,Juice Bar,Lake,Lounge,Middle Eastern Restaurant,Mobile Phone Shop,Monument / Landmark,Motorcycle Shop,Movie Theater,Multiplex,Neighborhood,Nightclub,Park,Performing Arts Venue,Pharmacy,Pizza Place,Platform,Pool,Pub,Rajasthani Restaurant,Resort,Restaurant,Sandwich Place,Science Museum,Shoe Store,Shopping Mall,Smoke Shop,Snack Place,South Indian Restaurant,Stadium,Thai Restaurant,Train Station,Vegetarian / Vegan Restaurant
0,A. S. Rao Nagar,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,A. S. Rao Nagar,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0
2,A. S. Rao Nagar,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,A. S. Rao Nagar,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,A. S. Rao Nagar,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [28]:
# group rows by neighborhood and take the mean frequency of occurrence of each category
hy_grouped = hy_onehot.groupby(["Neighborhoods"]).mean().reset_index()
print(hy_grouped.shape)
hy_grouped

(21, 86)


,Neighborhoods,ATM,American Restaurant,Asian Restaurant,Bakery,Bank,Bar,Bengali Restaurant,Bistro,Bookstore,Breakfast Spot,Burger Joint,Bus Station,Café,Chaat Place,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,Concert Hall,Convenience Store,Cosmetics Shop,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Donut Shop,Electronics Store,Farmers Market,Fast Food Restaurant,Flea Market,Food,Food & Drink Shop,Food Court,Food Truck,Fried Chicken Joint,Furniture / Home Store,Gaming Cafe,Garden Center,General Entertainment,Golf Course,Grocery Store,Gym,Historic Site,History Museum,Hookah Bar,Hotel,Hotel Bar,Hyderabadi Restaurant,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Italian Restaurant,Juice Bar,Lake,Lounge,Middle Eastern Restaurant,Mobile Phone Shop,Monument / Landmark,Motorcycle Shop,Movie Theater,Multiplex,Neighborhood,Nightclub,Park,Performing Arts Venue,Pharmacy,Pizza Place,Platform,Pool,Pub,Rajasthani Restaurant,Resort,Restaurant,Sandwich Place,Science Museum,Shoe Store,Shopping Mall,Smoke Shop,Snack Place,South Indian Restaurant,Stadium,Thai Restaurant,Train Station,Vegetarian / Vegan Restaurant
0,A. S. Rao Nagar,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.05,0.0,0.05,0.0,0.0,0.1,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.15,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,A.C. Guards,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.018519,0.0,0.0,0.0,0.037037,0.055556,0.0,0.0,0.018519,0.0,0.018519,0.0,0.0,0.0,0.0,0.018519,0.018519,0.0,0.0,0.0,0.018519,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.018519,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018519,0.074074,0.018519,0.037037,0.037037,0.166667,0.0,0.0,0.018519,0.0,0.018519,0.018519,0.018519,0.0,0.0,0.0,0.0,0.0,0.0,0.018519,0.018519,0.0,0.055556,0.0,0.018519,0.0,0.0,0.0,0.0,0.0,0.018519,0.0,0.0,0.0,0.018519,0.037037,0.018519,0.0,0.0,0.037037
2,Abhyudaya Nagar,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.3,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Abids,0.0,0.0,0.0,0.04878,0.0,0.012195,0.0,0.0,0.0,0.012195,0.012195,0.02439,0.036585,0.012195,0.04878,0.02439,0.0,0.02439,0.0,0.0,0.012195,0.0,0.012195,0.036585,0.012195,0.0,0.0,0.0,0.012195,0.060976,0.0,0.012195,0.0,0.012195,0.012195,0.012195,0.0,0.012195,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012195,0.04878,0.012195,0.0,0.060976,0.121951,0.012195,0.0,0.02439,0.0,0.012195,0.0,0.012195,0.0,0.0,0.0,0.012195,0.012195,0.0,0.0,0.012195,0.0,0.02439,0.0,0.0,0.0,0.0,0.0,0.036585,0.012195,0.012195,0.012195,0.012195,0.012195,0.02439,0.02439,0.012195,0.0,0.0,0.0
4,Adikmet,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.052632,0.0,0.052632,0.0,0.0,0.105263,0.0,0.105263,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.315789,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0
5,Afzal Gunj,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.022727,0.022727,0.022727,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.022727,0.022727,0.045455,0.0,0.0,0.0,0.022727,0.045455,0.0,0.022727,0.0,0.022727,0.022727,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.068182,0.0,0.0,0.022727,0.068182,0.022727,0.0,0.045455,0.0,0.0,0.0,0.022727,0.022727,0.0,0.0,0.022727,0.022727,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.022727,0.0,0.022727,0.068182,0.0,0.0,0.022727,0.0
6,Aghapura,0.0,0.0,0.0,0.056604,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037736,0.037736,0.018868,0.018868,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.018868,0.018868,0.018868,0.0,0.0,0.0,0.018868,0.037736,0.0,0.018868,0.0,0.0,0.018868,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.09434,0.018868,0.037736,0.037736,0.132075,0.0,0.0,0.037736,0.0,0.018868,0.018868,0.018868,0.0,0.0,0.0,0.0,0.018868,0.0,0.0,0.018868,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.018868,0.018868,0.018868,0.018868,0.037736,0.056604,0.018868,0.0,0.0,0.0
7,"Aliabad, Hyderabad",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.222222,0.0,0.111111,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.111111,0.0
8,Alijah Kotla,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.066667,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.066667,0.0,0.0,0.066667,0.0
9,Allwyn Colony,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.142857,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.071429,0.214286,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0
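The grouped values are easier to read on a toy example. After one-hot encoding, averaging the indicator columns per neighborhood gives the fraction of that neighborhood's venues that fall in each category. The rows below are made up; the encoding and grouping mirror the two cells above.

```python
import pandas as pd

# Toy venues table (made-up rows) with the two columns the notebook uses
toy = pd.DataFrame({
    "Neighborhood": ["N1", "N1", "N1", "N1", "N2", "N2"],
    "VenueCategory": ["Café", "Café", "Park", "Shopping Mall", "Park", "Shopping Mall"],
})

# One indicator column per category, named after the category itself
onehot = pd.get_dummies(toy[["VenueCategory"]], prefix="", prefix_sep="").astype(int)
onehot.insert(0, "Neighborhoods", toy["Neighborhood"])

# Mean of each indicator = share of the neighborhood's venues in that category
freq = onehot.groupby("Neighborhoods").mean().reset_index()
print(freq)
```

So a Shopping Mall value of 0.25 means one in four returned venues is a mall, not that there are 0.25 malls. In the real data, Abids' 0.012195 is 1 of its 82 venues.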


In [29]:
# count the neighborhoods with at least one shopping mall among their venues
len(hy_grouped[hy_grouped["Shopping Mall"] > 0])

8

In [30]:
# keep only the neighborhood names and their Shopping Mall frequency
hy_mall = hy_grouped[["Neighborhoods","Shopping Mall"]]

In [31]:
hy_mall.head()

,Neighborhoods,Shopping Mall
0,A. S. Rao Nagar,0.0
1,A.C. Guards,0.0
2,Abhyudaya Nagar,0.0
3,Abids,0.012195
4,Adikmet,0.0



In [33]:
# set number of clusters
kclusters = 3

# cluster on the Shopping Mall frequency only
hy_clustering = hy_mall.drop(columns=["Neighborhoods"])

# run k-means clustering (n_init=10 matches scikit-learn's older default)
kmeans = KMeans(n_clusters=kclusters, n_init=10, random_state=0).fit(hy_clustering)

# check the cluster labels of the first 10 neighborhoods
kmeans.labels_[0:10]

array([2, 2, 2, 2, 2, 2, 2, 2, 2, 0])
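Because `hy_clustering` has a single column, k-means here can only cut the Shopping Mall share into three contiguous bands, and the label numbers carry no order. The sketch below is a from-scratch version of the same idea: Lloyd's algorithm with a simple quantile start, not scikit-learn's implementation (which uses k-means++ initialization). It runs on seven shares copied from the tables in this notebook, and `kmeans_1d` is a hypothetical helper.

```python
import numpy as np


def kmeans_1d(values, k, max_iter=100):
    """Lloyd's algorithm on one feature (an illustration, not scikit-learn's code)."""
    x = np.asarray(values, dtype=float)
    # Deterministic start for the example: centers at evenly spaced quantiles
    centers = np.quantile(x, np.linspace(0, 1, k))
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest center
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        # Update step: each center moves to the mean of its members (unchanged if empty)
        new_centers = np.array([x[labels == j].mean() if np.any(labels == j) else centers[j]
                                for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


# Shopping Mall shares from the tables, ascending:
# A. S. Rao Nagar, Abids, Badichowdi, Ameerpet, Amberpet, Allwyn Colony, Attapur
shares = [0.0, 0.012195, 0.014085, 0.02, 0.055556, 0.071429, 0.166667]
labels, centers = kmeans_1d(shares, k=3)
print(labels)  # [0 0 0 0 1 1 2]
```

The three groups match the notebook's clusters 2, 0 and 1 for these neighborhoods; only the numbering differs.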

In [34]:
# create a new dataframe with the Shopping Mall frequency and the cluster label of each neighborhood
hy_merged = hy_mall.copy()

# add clustering labels
hy_merged["Cluster Labels"] = kmeans.labels_

In [35]:
hy_merged.rename(columns={"Neighborhoods": "Neighborhood"}, inplace=True)
hy_merged.head()

,Neighborhood,Shopping Mall,Cluster Labels
0,A. S. Rao Nagar,0.0,2
1,A.C. Guards,0.0,2
2,Abhyudaya Nagar,0.0,2
3,Abids,0.012195,2
4,Adikmet,0.0,2


In [38]:
# merge grouped data to add latitude/longitude for each neighborhood;
# dropping coordinates left over from an earlier run keeps the cell re-runnable
hy_merged = hy_merged.drop(columns=["Latitude", "Longitude"], errors="ignore").join(
    hy_df.set_index("Neighborhood"), on="Neighborhood")

print(hy_merged.shape)
hy_merged.head()  # Latitude and Longitude are now the last two columns

In [39]:
# sort the rows by cluster label
print(hy_merged.shape)
hy_merged.sort_values(["Cluster Labels"], inplace=True)
hy_merged

(21, 5)


,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
11,Amberpet,0.055556,0,17.38582,78.51836
9,Allwyn Colony,0.071429,0,17.50337,78.41602
15,Attapur,0.166667,1,17.36917,78.43683
0,A. S. Rao Nagar,0.0,2,17.4112,78.50824
18,Badichowdi,0.014085,2,17.38897,78.48681
17,Azampura,0.0,2,17.37272,78.49047
16,"Azamabad, Hyderabad",0.0,2,17.4071,78.50233
14,Asif Nagar,0.0,2,17.38514,78.44738
13,"Ashok Nagar, Hyderabad",0.0,2,17.40784,78.4915
12,Ameerpet,0.02,2,17.43535,78.44861


In [41]:
# creating map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color parameters for the clusters: one rainbow color per cluster label
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# adding points on map
for lat, lon, poi, cluster in zip(hy_merged['Latitude'], hy_merged['Longitude'], hy_merged['Neighborhood'], hy_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

In [42]:
# save the map as HTML file
map_clusters.save('map_clusters.html')

Examining Clusters
Cluster 0

In [43]:
hy_merged.loc[hy_merged['Cluster Labels'] == 0]

,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
11,Amberpet,0.055556,0,17.38582,78.51836
9,Allwyn Colony,0.071429,0,17.50337,78.41602


Cluster 1

In [45]:
hy_merged.loc[hy_merged['Cluster Labels'] == 1]

,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
15,Attapur,0.166667,1,17.36917,78.43683


Cluster 2

In [46]:
hy_merged.loc[hy_merged['Cluster Labels'] == 2]

,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
0,A. S. Rao Nagar,0.0,2,17.4112,78.50824
18,Badichowdi,0.014085,2,17.38897,78.48681
17,Azampura,0.0,2,17.37272,78.49047
16,"Azamabad, Hyderabad",0.0,2,17.4071,78.50233
14,Asif Nagar,0.0,2,17.38514,78.44738
13,"Ashok Nagar, Hyderabad",0.0,2,17.40784,78.4915
12,Ameerpet,0.02,2,17.43535,78.44861
10,Alwal,0.0,2,17.53543,78.54427
8,Alijah Kotla,0.0,2,17.36068,78.47998
7,"Aliabad, Hyderabad",0.0,2,17.34259,78.47626


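Results:
    Cluster 2 is by far the largest cluster and covers most of the neighborhoods in the middle of Hyderabad. In each of these neighborhoods, shopping malls make up only a small share of the returned venues (0.0 to 0.02 in the rows shown). Taken together, however, the cluster contains most of the neighborhoods that have a mall: of the eight neighborhoods with at least one shopping mall, five fall in Cluster 2. Opening a new mall there would therefore mean competing with many existing malls spread across the city center. Cluster 1 contains a single neighborhood (Attapur) and Cluster 0 only two (Amberpet and Allwyn Colony), so these areas have fewer malls in total and fewer competitors. We therefore suggest considering Cluster 1 for the new shopping mall, with Cluster 0 as an additional option. Attapur also has the highest shopping-mall share of any neighborhood (about 0.17), so local competition there should be checked before a final decision.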
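The cluster comparison in the results can be checked with a per-cluster summary instead of being read off the map. The sketch below assumes `hy_merged` has the `Cluster Labels` and `Shopping Mall` columns shown above. `summarize_clusters` is a hypothetical helper, and the demo rows are copied from the tables in this notebook.

```python
import pandas as pd


def summarize_clusters(merged):
    """Per-cluster size and Shopping Mall share statistics."""
    return (merged.groupby("Cluster Labels")["Shopping Mall"]
            .agg(Neighborhoods="count",                         # neighborhoods in the cluster
                 WithMall=lambda s: int((s > 0).sum()),         # how many have any mall
                 MeanShare="mean",
                 MaxShare="max")
            .reset_index())


demo = pd.DataFrame({
    "Neighborhood": ["Amberpet", "Allwyn Colony", "Attapur", "Abids", "Ameerpet", "Alwal"],
    "Shopping Mall": [0.055556, 0.071429, 0.166667, 0.012195, 0.02, 0.0],
    "Cluster Labels": [0, 0, 1, 2, 2, 2],
})
summary = summarize_clusters(demo)
print(summary)

# On the full data: print(summarize_clusters(hy_merged))
```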