# Opening a new Shopping Mall in Hyderabad, India

1) Build a data frame of neighborhoods in Hyderabad, India by scraping data from a Wikipedia page.

2) Get the geographical co-ordinates of the neighborhoods.

3) Obtain the venue data for the neighborhoods from the Foursquare API.

4) Explore and cluster the neighborhoods.

5) Select the best cluster in which to open a new shopping mall.



# 1. Import Libraries

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import geocoder # to get coordinates

import requests # library to handle requests
from bs4 import BeautifulSoup # library to parse HTML and XML documents

from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

print("Libraries imported.")

Libraries imported.


# 2. Scrape data from the Wikipedia page into a DataFrame

In [2]:
# send the GET request
data = requests.get("https://en.wikipedia.org/wiki/List_of_neighbourhoods_in_Hyderabad").text

In [3]:
# parse the data from the html into a Beautiful soup object
soup = BeautifulSoup(data, 'html.parser')

In [4]:
# create a list to store neighborhood data
neighborhoodList = []

In [5]:
# append the data into list
for span in soup.find_all("span", class_="mw-headline"):
    for link in span.find_all('a'):
        neighborhoodList.append(link["title"])

In [6]:
# create a new DataFrame from List
hyd_df = pd.DataFrame({"Neighborhood" : neighborhoodList})
hyd_df.head()

Unnamed: 0,Neighborhood
0,Ameerpet
1,Sanathnagar
2,Khairatabad
3,Musheerabad
4,Amberpet


In [7]:
# print the number of rows of the data frame
hyd_df.shape

(31, 1)

# 3. Get Geographical co-ordinates

In [8]:
# define a function to get coordinates
def get_latlng(neighborhood):
    # initialize your variable to None
    lat_lng_coords = None
    # loop until you get the coordinates
    while(lat_lng_coords is None):
        g = geocoder.arcgis('{}, Hyderabad, India'.format(neighborhood))
        lat_lng_coords = g.latlng
    return lat_lng_coords
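
The loop above retries forever if the geocoder never returns a result for a neighborhood. A minimal sketch of a bounded alternative follows. The helper name `geocode_with_retries`, its parameters, and the `geocode` callable are assumptions introduced for illustration; in this notebook the callable could be `lambda q: geocoder.arcgis(q).latlng`.

```python
import time

def geocode_with_retries(query, geocode, max_attempts=5, delay_seconds=1.0):
    """Call geocode(query) until it returns coordinates or attempts run out.

    `geocode` is a hypothetical callable that returns [lat, lng] or None.
    """
    for attempt in range(1, max_attempts + 1):
        coords = geocode(query)
        if coords is not None:
            return coords
        if attempt < max_attempts:
            time.sleep(delay_seconds)  # brief pause before the next attempt
    return None  # the caller decides how to handle an unmatched neighborhood
```

Returning `None` instead of looping lets the caller drop or manually fix neighborhoods that cannot be geocoded, rather than hanging the notebook.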

In [10]:
# call the function to get the coordinates, store in a new list using list comprehension
coords = [get_latlng(neighborhood) for neighborhood in hyd_df["Neighborhood"]]

In [11]:
coords

[[17.435350000000028, 78.44861000000003],
 [17.45876000000004, 78.44310000000007],
 [17.405920000000037, 78.45856000000003],
 [17.414690000000064, 78.50148000000007],
 [17.385820000000024, 78.51836000000003],
 [17.38897000000003, 78.46733000000006],
 [17.442000000000064, 78.50192000000004],
 [17.460440000000062, 78.49299000000008],
 [17.394870000000026, 78.47076000000004],
 [17.448230000000024, 78.37429000000003],
 [17.42865000000006, 78.39762000000007],
 [17.43181000000004, 78.38636000000008],
 [17.48216000000002, 78.32300000000004],
 [17.48735000000005, 78.42087000000004],
 [17.526770000000056, 78.25234000000006],
 [17.491490000000056, 78.44123000000008],
 [17.505360000000053, 78.46749000000005],
 [17.533180000000073, 78.48102000000006],
 [17.535430000000076, 78.54427000000004],
 [17.447370000000035, 78.53520000000003],
 [17.47840551524693, 78.56366390879944],
 [17.555611015881126, 78.57884813578704],
 [17.415830000000028, 78.56949000000003],
 [17.463219409181434, 78.62115994355071],

In [12]:
# create a temporary data frame that holds the coordinates as latitude and longitude columns
df_coords = pd.DataFrame(coords, columns = ['Latitude', 'Longitude'])

In [13]:
# merge the coordinates into original data frame
hyd_df['Latitude'] = df_coords['Latitude']
hyd_df['Longitude'] = df_coords['Longitude']

In [14]:
# check the neighborhoods and the coordinates
print(hyd_df.shape)
hyd_df

(31, 3)


Unnamed: 0,Neighborhood,Latitude,Longitude
0,Ameerpet,17.43535,78.44861
1,Sanathnagar,17.45876,78.4431
2,Khairatabad,17.40592,78.45856
3,Musheerabad,17.41469,78.50148
4,Amberpet,17.38582,78.51836
5,"Nampally, Hyderabad",17.38897,78.46733
6,Secunderabad,17.442,78.50192
7,Secunderabad Cantonment Board,17.46044,78.49299
8,"Old City, Hyderabad",17.39487,78.47076
9,HITEC City,17.44823,78.37429


In [15]:
# save the dataframe as a csv file
hyd_df.to_csv("hyd_df.csv", index = False)

# 4. Create a map of Hyderabad with neighborhoods superimposed on top

In [17]:
# get the coordinates of Hyderabad
address = 'Hyderabad, India'

geolocator = Nominatim(user_agent="my-app")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Hyderabad, India are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Hyderabad, India are 17.38878595, 78.46106473453146.


In [18]:
# create map of Hyderabad using latitude and longitude values
map_hyd = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, neighborhood in zip(hyd_df['Latitude'], hyd_df['Longitude'], hyd_df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_hyd)  
    
map_hyd

In [19]:
# save the map as a html file
map_hyd.save('map_hyd.html')

# 5. Use the Foursquare API to explore the neighborhoods

In [20]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_FOURSQUARE_CLIENT_ID
CLIENT_SECRET: YOUR_FOURSQUARE_CLIENT_SECRET


In [57]:
radius = 2000
LIMIT = 100

venues = []

for lat, long, neighborhood in zip(hyd_df['Latitude'], hyd_df['Longitude'], hyd_df['Neighborhood']):
    
    # create the API request URL
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    # make the GET request
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    # return only relevant information for each nearby venue
    for venue in results:
        venues.append((
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))
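
The request loop above assumes that every response contains `groups` and that every venue has at least one category. A hedged sketch of a more defensive version of the same two steps follows. The helper names `build_explore_url` and `extract_venues` are hypothetical, and the payload shape is taken only from the keys the loop already reads; it is assumed that an error or empty response may lack `groups`.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Assemble the same query parameters as the loop, with URL-encoding."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return EXPLORE_ENDPOINT + "?" + urlencode(params)

def extract_venues(payload, neighborhood, lat, lng):
    """Flatten one explore response into tuples shaped like the `venues` rows."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # a venue without categories gets None instead of raising IndexError
        category = categories[0]["name"] if categories else None
        rows.append((neighborhood, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows
```

With these helpers the loop body would reduce to `venues.extend(extract_venues(requests.get(url).json(), neighborhood, lat, long))`, and a failed request would contribute no rows instead of stopping the run.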

#### Now let's get the top 100 venues that are within a radius of 2000 meters

In [58]:
# convert the venues list into a new DataFrame
venues_df = pd.DataFrame(venues)

#define the column Names
venues_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']
print(venues_df.shape)
venues_df.head()                     

(1016, 7)


Unnamed: 0,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,Ameerpet,17.43535,78.44861,Kakatiya Deluxe Mess,17.433435,78.44709,Diner
1,Ameerpet,17.43535,78.44861,Mekong,17.437151,78.454301,Chinese Restaurant
2,Ameerpet,17.43535,78.44861,10 Downing Street,17.435868,78.457443,Pub
3,Ameerpet,17.43535,78.44861,Kebabs & Kurries,17.432374,78.457585,Indian Restaurant
4,Ameerpet,17.43535,78.44861,ITC Kakatiya,17.432514,78.457353,Hotel


In [42]:
venues_df[venues_df["VenueCategory"] == 'Shopping Mall']

Unnamed: 0,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
25,Ameerpet,17.43535,78.44861,GVK One,17.419411,78.448069,Shopping Mall
56,Ameerpet,17.43535,78.44861,Hyderabad Central,17.426386,78.452786,Shopping Mall
137,Khairatabad,17.40592,78.45856,GVK One,17.419411,78.448069,Shopping Mall
238,Musheerabad,17.41469,78.50148,Mega mart,17.405555,78.496468,Shopping Mall
260,Amberpet,17.38582,78.51836,Vishal Mega Mart,17.396379,78.529467,Shopping Mall
309,"Nampally, Hyderabad",17.38897,78.46733,Brand Factory,17.392367,78.477491,Shopping Mall
479,"Old City, Hyderabad",17.39487,78.47076,Brand Factory,17.392367,78.477491,Shopping Mall
631,Jubilee Hills,17.42865,78.39762,Inorbit Mall,17.43361,78.386207,Shopping Mall
716,Gachibowli,17.43181,78.38636,Inorbit Mall,17.43361,78.386207,Shopping Mall
835,Kukatpally,17.48735,78.42087,Metro mall,17.480453,78.419347,Shopping Mall


#### Let's check how many venues were returned for every neighborhood

In [23]:
venues_df.groupby("Neighborhood").count()

Unnamed: 0_level_0,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Alwal,4,4,4,4,4,4
Amberpet,16,16,16,16,16,16
Ameerpet,100,100,100,100,100,100
"Balanagar, Medchal district",4,4,4,4,4,4
Dilsukhnagar,17,17,17,17,17,17
Gachibowli,100,100,100,100,100,100
Ghatkesar,5,5,5,5,5,5
HITEC City,100,100,100,100,100,100
Hayathnagar,4,4,4,4,4,4
Jubilee Hills,100,100,100,100,100,100


#### Let's find out how many unique categories there are among the returned venues

In [25]:
print('There are {} unique categories.'.format(len(venues_df['VenueCategory'].unique())))

There are 127 unique categories.


In [39]:
# print out the list of categories
venues_df['VenueCategory'].unique()

array(['Diner', 'Chinese Restaurant', 'Pub', 'Indian Restaurant', 'Hotel',
       'Ice Cream Shop', 'Clothing Store', 'South Indian Restaurant',
       'Fast Food Restaurant', 'Bookstore', 'Performing Arts Venue',
       'Department Store', 'Vegetarian / Vegan Restaurant', 'Multiplex',
       'Bar', 'Bakery', 'Asian Restaurant', 'Shopping Mall', 'Hotel Bar',
       'American Restaurant', 'Coffee Shop', 'Hyderabadi Restaurant',
       'Sandwich Place', 'Nightclub', 'Restaurant', 'Breakfast Spot',
       'Convenience Store', 'Motorcycle Shop', 'Donut Shop',
       'Thai Restaurant', 'Lounge', 'Café', 'Bengali Restaurant',
       'Pizza Place', 'Rajasthani Restaurant', 'Furniture / Home Store',
       'Hookah Bar', 'Cocktail Bar', 'Concert Hall', 'Deli / Bodega',
       'Middle Eastern Restaurant', 'Electronics Store',
       'Indie Movie Theater', 'Metro Station', 'Train Station',
       'Bus Station', 'Snack Place', 'Light Rail Station', 'Bistro',
       'Chaat Place', 'Scenic Lookout',

In [27]:
# check if the results contain "Shopping Mall"
"Shopping Mall" in venues_df['VenueCategory'].unique()

True

# 6. Analyse each neighborhood

In [28]:
# one hot encoding
hyd_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
hyd_onehot['Neighborhoods'] = venues_df['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [hyd_onehot.columns[-1]] + list(hyd_onehot.columns[:-1])
hyd_onehot = hyd_onehot[fixed_columns]

print(hyd_onehot.shape)
hyd_onehot.head()

(1016, 128)


Unnamed: 0,Neighborhoods,ATM,Afghan Restaurant,American Restaurant,Arcade,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auditorium,BBQ Joint,Bakery,Bar,Beer Garden,Bengali Restaurant,Bistro,Bookstore,Bowling Alley,Breakfast Spot,Brewery,Bridal Shop,Burger Joint,Bus Station,Business Service,Café,Campground,Chaat Place,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,Comfort Food Restaurant,Concert Hall,Convenience Store,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Diner,Donut Shop,Dumpling Restaurant,Electronics Store,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Flea Market,Food,Food & Drink Shop,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gaming Cafe,Garden,Gastropub,General Entertainment,Golf Course,Greek Restaurant,Grocery Store,Gym,Hookah Bar,Hotel,Hotel Bar,Hunan Restaurant,Hyderabadi Restaurant,Ice Cream Shop,Indian Restaurant,Indian Sweet Shop,Indie Movie Theater,Italian Restaurant,Jewelry Store,Juice Bar,Lake,Light Rail Station,Liquor Store,Lounge,Mattress Store,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Mobile Phone Shop,Motorcycle Shop,Mountain,Movie Theater,Moving Target,Multicuisine Indian Restaurant,Multiplex,Neighborhood,New American Restaurant,Nightclub,North Indian Restaurant,Office,Park,Parsi Restaurant,Performing Arts Venue,Pharmacy,Pizza Place,Playground,Pub,Rajasthani Restaurant,Restaurant,Sandwich Place,Scenic Lookout,Science Museum,Shoe Store,Shop & Service,Shopping Mall,Smoke Shop,Snack Place,Soccer Field,Social Club,South Indian Restaurant,Spa,Sports Bar,Stadium,Steakhouse,Supermarket,Tea Room,Thai Restaurant,Train Station,Vegetarian / Vegan Restaurant,Women's Store
0,Ameerpet,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Ameerpet,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Ameerpet,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Ameerpet,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Ameerpet,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


#### Now, let's group rows by neighborhood and take the mean frequency of occurrence of each category

In [31]:
hyd_grouped = hyd_onehot.groupby(["Neighborhoods"]).mean().reset_index()
print(hyd_grouped.shape)
hyd_grouped

(31, 128)


Unnamed: 0,Neighborhoods,ATM,Afghan Restaurant,American Restaurant,Arcade,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auditorium,BBQ Joint,Bakery,Bar,Beer Garden,Bengali Restaurant,Bistro,Bookstore,Bowling Alley,Breakfast Spot,Brewery,Bridal Shop,Burger Joint,Bus Station,Business Service,Café,Campground,Chaat Place,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,Comfort Food Restaurant,Concert Hall,Convenience Store,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Diner,Donut Shop,Dumpling Restaurant,Electronics Store,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Flea Market,Food,Food & Drink Shop,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gaming Cafe,Garden,Gastropub,General Entertainment,Golf Course,Greek Restaurant,Grocery Store,Gym,Hookah Bar,Hotel,Hotel Bar,Hunan Restaurant,Hyderabadi Restaurant,Ice Cream Shop,Indian Restaurant,Indian Sweet Shop,Indie Movie Theater,Italian Restaurant,Jewelry Store,Juice Bar,Lake,Light Rail Station,Liquor Store,Lounge,Mattress Store,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Mobile Phone Shop,Motorcycle Shop,Mountain,Movie Theater,Moving Target,Multicuisine Indian Restaurant,Multiplex,Neighborhood,New American Restaurant,Nightclub,North Indian Restaurant,Office,Park,Parsi Restaurant,Performing Arts Venue,Pharmacy,Pizza Place,Playground,Pub,Rajasthani Restaurant,Restaurant,Sandwich Place,Scenic Lookout,Science Museum,Shoe Store,Shop & Service,Shopping Mall,Smoke Shop,Snack Place,Soccer Field,Social Club,South Indian Restaurant,Spa,Sports Bar,Stadium,Steakhouse,Supermarket,Tea Room,Thai Restaurant,Train Station,Vegetarian / Vegan Restaurant,Women's Store
0,Alwal,0.25,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Amberpet,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1875,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1875,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0625,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Ameerpet,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.01,0.0,0.01,0.0,0.04,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.04,0.03,0.01,0.05,0.0,0.01,0.01,0.0,0.0,0.01,0.02,0.0,0.0,0.02,0.01,0.0,0.01,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.08,0.01,0.0,0.01,0.02,0.14,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.05,0.0,0.01,0.01,0.01,0.04,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.05,0.0
3,"Balanagar, Medchal district",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Dilsukhnagar,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.058824,0.0,0.176471,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.117647,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Gachibowli,0.0,0.01,0.0,0.0,0.0,0.04,0.0,0.0,0.02,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.01,0.0,0.0,0.09,0.0,0.0,0.01,0.0,0.01,0.04,0.01,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.05,0.0,0.01,0.01,0.02,0.17,0.0,0.0,0.06,0.0,0.0,0.0,0.0,0.01,0.06,0.0,0.01,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.02,0.0,0.02,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0
6,Ghatkesar,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0
7,HITEC City,0.0,0.01,0.0,0.0,0.01,0.03,0.01,0.0,0.03,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.07,0.0,0.0,0.02,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.02,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.06,0.0,0.0,0.01,0.03,0.14,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.03,0.0,0.01,0.0,0.07,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0
8,Hayathnagar,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Jubilee Hills,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.01,0.02,0.01,0.01,0.01,0.0,0.01,0.0,0.01,0.01,0.03,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.01,0.0,0.03,0.03,0.01,0.0,0.0,0.01,0.01,0.01,0.0,0.01,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.02,0.03,0.01,0.0,0.0,0.0,0.05,0.11,0.0,0.0,0.03,0.0,0.01,0.0,0.0,0.01,0.05,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.01,0.01,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0
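
Each value in the grouped table is the share of a neighborhood's returned venues that fall in a given category, because the mean of a 0/1 one-hot column is a proportion. The sketch below reproduces that calculation without pandas on a made-up toy input; the function name `category_frequencies` and the neighborhoods "A" and "B" are illustrative assumptions, not part of the notebook.

```python
from collections import Counter, defaultdict

def category_frequencies(rows):
    """Map each neighborhood to {category: share of its venues}.

    `rows` is an iterable of (neighborhood, category) pairs.
    """
    counts = defaultdict(Counter)
    for neighborhood, category in rows:
        counts[neighborhood][category] += 1
    return {
        hood: {cat: n / sum(cats.values()) for cat, n in cats.items()}
        for hood, cats in counts.items()
    }

# Toy input (made up for illustration): "A" has 4 venues, one of them a mall.
rows = [
    ("A", "Shopping Mall"), ("A", "Pharmacy"), ("A", "Pizza Place"), ("A", "Multiplex"),
    ("B", "Café"), ("B", "Café"),
]
freq = category_frequencies(rows)
print(freq["A"]["Shopping Mall"])  # 0.25
print(freq["B"]["Café"])  # 1.0
```

This is why a neighborhood with only four returned venues, such as Balanagar, reaches a shopping mall frequency of 0.25 with a single mall, while Ameerpet with 100 venues shows 0.02.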


### Create a new DataFrame for shopping malls only

In [32]:
len(hyd_grouped[hyd_grouped["Shopping Mall"] > 0])

11

In [33]:
hyd_mall = hyd_grouped[["Neighborhoods", "Shopping Mall"]]
hyd_mall.head()

Unnamed: 0,Neighborhoods,Shopping Mall
0,Alwal,0.0
1,Amberpet,0.0625
2,Ameerpet,0.02
3,"Balanagar, Medchal district",0.25
4,Dilsukhnagar,0.058824


# 7. Cluster Neighborhoods

Run k-means to cluster the neighborhoods in Hyderabad into 3 clusters, using each neighborhood's shopping mall frequency as the only feature.

In [34]:
# set number of clusters
kclusters = 3

hyd_clustering = hyd_mall.drop(columns=["Neighborhoods"])

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(hyd_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 2, 0, 1, 2, 0, 0, 0, 0, 0])

In [35]:
# create a new dataframe that holds the shopping mall frequency and the cluster label for each neighborhood
hyd_merged = hyd_mall.copy()

# add clustering labels
hyd_merged["Cluster Labels"] = kmeans.labels_

In [36]:
hyd_merged.rename(columns={"Neighborhoods": "Neighborhood"}, inplace=True)
hyd_merged.head()

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels
0,Alwal,0.0,0
1,Amberpet,0.0625,2
2,Ameerpet,0.02,0
3,"Balanagar, Medchal district",0.25,1
4,Dilsukhnagar,0.058824,2


In [37]:
# join hyd_merged with hyd_df to add latitude/longitude for each neighborhood
hyd_merged = hyd_merged.join(hyd_df.set_index("Neighborhood"), on="Neighborhood")

print(hyd_merged.shape)
hyd_merged.head() # check the last columns!

(31, 5)


Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
0,Alwal,0.0,0,17.53543,78.54427
1,Amberpet,0.0625,2,17.38582,78.51836
2,Ameerpet,0.02,0,17.43535,78.44861
3,"Balanagar, Medchal district",0.25,1,17.49149,78.44123
4,Dilsukhnagar,0.058824,2,17.36857,78.53515


In [38]:
# sort the results by Cluster Labels
print(hyd_merged.shape)
hyd_merged.sort_values(["Cluster Labels"], inplace=True)
hyd_merged

(31, 5)


Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
0,Alwal,0.0,0,17.53543,78.54427
28,Serilingampally,0.0,0,17.48216,78.323
27,Secunderabad Cantonment Board,0.0,0,17.46044,78.49299
26,Secunderabad,0.0,0,17.442,78.50192
25,Saroornagar,0.0,0,17.35442,78.53921
24,Sanathnagar,0.0,0,17.45876,78.4431
23,Rajendranagar mandal,0.0,0,17.3189,78.38076
22,Qutbullapur,0.0,0,17.50536,78.46749
21,Patancheru,0.0,0,17.52677,78.25234
20,"Old City, Hyderabad",0.010101,0,17.39487,78.47076


#### Finally let's visualise the resulting clusters

In [43]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one rainbow color per cluster label
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(hyd_merged['Latitude'], hyd_merged['Longitude'], hyd_merged['Neighborhood'], hyd_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [44]:
# save the map as HTML file
map_clusters.save('map_clusters_hyd.html')

# 8. Examine Clusters

Cluster 0

In [45]:
hyd_merged.loc[hyd_merged['Cluster Labels'] == 0]

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
0,Alwal,0.0,0,17.53543,78.54427
28,Serilingampally,0.0,0,17.48216,78.323
27,Secunderabad Cantonment Board,0.0,0,17.46044,78.49299
26,Secunderabad,0.0,0,17.442,78.50192
25,Saroornagar,0.0,0,17.35442,78.53921
24,Sanathnagar,0.0,0,17.45876,78.4431
23,Rajendranagar mandal,0.0,0,17.3189,78.38076
22,Qutbullapur,0.0,0,17.50536,78.46749
21,Patancheru,0.0,0,17.52677,78.25234
20,"Old City, Hyderabad",0.010101,0,17.39487,78.47076


Cluster 1

In [46]:
hyd_merged.loc[hyd_merged['Cluster Labels'] == 1]

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
3,"Balanagar, Medchal district",0.25,1,17.49149,78.44123


Cluster 2

In [47]:
hyd_merged.loc[hyd_merged['Cluster Labels'] == 2]

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
4,Dilsukhnagar,0.058824,2,17.36857,78.53515
1,Amberpet,0.0625,2,17.38582,78.51836
14,Kukatpally,0.1,2,17.48735,78.42087
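
K-means assigns cluster labels in no particular order, so a label number says nothing about how many malls a cluster has. A small sketch that ranks the clusters by their mean shopping mall frequency follows. It uses only rows copied from the tables shown above, not the full 31-neighborhood frame, and the function name `rank_clusters` is a hypothetical helper.

```python
from collections import defaultdict

# (neighborhood, mall frequency, cluster label) rows copied from the tables above
rows = [
    ("Alwal", 0.0, 0),
    ("Amberpet", 0.0625, 2),
    ("Ameerpet", 0.02, 0),
    ("Balanagar, Medchal district", 0.25, 1),
    ("Dilsukhnagar", 0.058824, 2),
    ("Kukatpally", 0.1, 2),
    ("Old City, Hyderabad", 0.010101, 0),
]

def rank_clusters(rows):
    """Return (label, mean frequency, size) tuples, lowest mean first."""
    values = defaultdict(list)
    for _, freq, label in rows:
        values[label].append(freq)
    summary = [(label, sum(v) / len(v), len(v)) for label, v in values.items()]
    return sorted(summary, key=lambda item: item[1])

for label, mean_freq, size in rank_clusters(rows):
    print("cluster {}: mean={:.4f}, n={}".format(label, mean_freq, size))
```

On the full data the equivalent pandas call would be `hyd_merged.groupby("Cluster Labels")["Shopping Mall"].agg(["mean", "count"])`.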


# Observations:
Shopping malls are concentrated in a few parts of Hyderabad. Here the "Shopping Mall" value is the share of a neighborhood's returned venues that are shopping malls, not a raw count. Cluster 1 has the highest share (0.25), although it contains a single neighborhood, Balanagar, where one of only four returned venues is a mall. Cluster 2 has a moderate share (about 0.06 to 0.10), and cluster 0 has a very low share or no shopping malls at all.

Cluster 0 therefore represents the greatest opportunity for new shopping malls, since there is little to no competition from existing ones. This project recommends that property developers open new shopping malls in cluster 0 neighborhoods. Developers with a unique selling proposition that stands out from the competition can also consider cluster 2, where competition is moderate. Lastly, developers are advised to avoid cluster 1, where malls already make up the largest share of venues and competition is likely to be most intense.
