# IBM Applied Data Science Capstone Course by Coursera

## Week 5 Final project

### Opening a New Shopping Mall in Hyderabad, India

     -Build a dataframe of neighborhoods in Hyderabad, India by web scraping the data from the Wikipedia page
     -Get the geographical coordinates of the neighborhoods
     -Obtain the venue data for the neighborhoods from Foursquare API
     -Explore and cluster the neighborhoods
     -Select the best cluster to open a new shopping mall

## 1. Import libraries

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import geocoder # to get coordinates

import requests # library to handle requests
from bs4 import BeautifulSoup # library to parse HTML and XML documents

from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import, pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

print("Libraries imported.")

Libraries imported.


## 2. Scrape data from the Wikipedia page into a DataFrame

In [2]:
data = requests.get("https://en.wikipedia.org/wiki/Category:Neighbourhoods_in_Hyderabad,_India").text


In [3]:
soup = BeautifulSoup(data, 'html.parser')


In [4]:
neighborhoodList = []


In [5]:
for row in soup.find_all("div", class_="mw-category")[0].find_all("li"):
    neighborhoodList.append(row.text)

In [6]:
hyd_df = pd.DataFrame({"Neighborhood": neighborhoodList})

hyd_df.head()

|   | Neighborhood |
|---|---|
| 0 | A. S. Rao Nagar |
| 1 | A.C. Guards |
| 2 | Abhyudaya Nagar |
| 3 | Abids |
| 4 | Adikmet |


In [8]:
hyd_df.shape

(200, 1)

## 3. Get the geographical coordinates

In [9]:
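# define a function to get coordinates
def get_latlng(neighborhood, max_attempts=5):
    # initialize your variable to None
    lat_lng_coords = None
    attempts = 0
    # retry until coordinates are returned, but give up after max_attempts
    # so that a name the geocoder cannot resolve does not loop forever
    while lat_lng_coords is None and attempts < max_attempts:
        g = geocoder.arcgis('{}, Hyderabad,India'.format(neighborhood))
        lat_lng_coords = g.latlng
        attempts += 1
    # fall back to missing values so the result still has two entries
    return lat_lng_coords or [None, None]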

In [12]:
coords = [ get_latlng(neighborhood) for neighborhood in hyd_df["Neighborhood"].tolist() ]


In [15]:
coords

[[17.411200000000065, 78.50824000000006],
 [17.392977027745946, 78.45686724051741],
 [17.337650000000053, 78.56414000000007],
 [17.389800000000037, 78.47658000000007],
 [17.410610000000077, 78.51513000000006],
 [17.37751000000003, 78.48005000000006],
 [17.387364823969637, 78.4669870622138],
 [17.34259000000003, 78.47626000000008],
 [17.36068000000006, 78.47998000000007],
 [17.503370000000075, 78.41602000000006],
 [17.535430000000076, 78.54427000000004],
 [17.385820000000024, 78.51836000000003],
 [17.435350000000028, 78.44861000000003],
 [17.40784000000002, 78.49150000000003],
 [17.385140000000035, 78.44738000000007],
 [17.369170000000054, 78.43683000000004],
 [17.40710000000007, 78.50233000000003],
 [17.372720000000072, 78.49047000000007],
 [17.38897000000003, 78.48681000000005],
 [17.39931000000007, 78.49964000000006],
 [17.339920000000063, 78.54553000000004],
 [17.448510000000056, 78.44924000000003],
 [17.415350000000046, 78.43435000000005],
 [17.38859199570786, 78.47665099785392],
 

In [16]:
# create temporary dataframe to populate the coordinates into Latitude and Longitude
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])

In [17]:

# merge the coordinates into the original dataframe
hyd_df['Latitude'] = df_coords['Latitude']
hyd_df['Longitude'] = df_coords['Longitude']

In [18]:
print(hyd_df.shape)
hyd_df

(200, 3)


|   | Neighborhood | Latitude | Longitude |
|---|---|---|---|
| 0 | A. S. Rao Nagar | 17.4112 | 78.50824 |
| 1 | A.C. Guards | 17.392977 | 78.456867 |
| 2 | Abhyudaya Nagar | 17.33765 | 78.56414 |
| 3 | Abids | 17.3898 | 78.47658 |
| 4 | Adikmet | 17.41061 | 78.51513 |
| 5 | Afzal Gunj | 17.37751 | 78.48005 |
| 6 | Aghapura | 17.387365 | 78.466987 |
| 7 | Aliabad, Hyderabad | 17.34259 | 78.47626 |
| 8 | Alijah Kotla | 17.36068 | 78.47998 |
| 9 | Allwyn Colony | 17.50337 | 78.41602 |
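
Geocoders occasionally resolve an ambiguous name to a place far from the intended city, so it can be worth checking the coordinates before using them. A minimal sketch of such a check, assuming a rough bounding box around Hyderabad; the box limits and the toy rows below are hypothetical and not taken from this notebook's data:

```python
import pandas as pd

# made-up rows for illustration; the last one is deliberately far from Hyderabad
toy_df = pd.DataFrame({
    "Neighborhood": ["Abids", "Adikmet", "Misplaced Example"],
    "Latitude": [17.3898, 17.41061, 28.6],
    "Longitude": [78.47658, 78.51513, 77.2],
})

# hypothetical bounding box, chosen loosely around the city (an assumption)
LAT_MIN, LAT_MAX = 17.0, 18.0
LNG_MIN, LNG_MAX = 78.0, 79.0

# rows with missing coordinates also fail this test and are flagged
inside = (toy_df["Latitude"].between(LAT_MIN, LAT_MAX)
          & toy_df["Longitude"].between(LNG_MIN, LNG_MAX))
suspect = toy_df[~inside]
print(suspect["Neighborhood"].tolist())
```

Applied to `hyd_df`, the same filter would list any neighborhoods whose coordinates deserve a manual look before the venue queries are run.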


In [20]:
hyd_df.to_csv("hyd_df.csv", index=False)

## 4. Create a map of Hyderabad with neighborhoods superimposed on top

In [21]:
address = 'Hyderabad,India'

geolocator = Nominatim(user_agent="my-application")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Hyderabad, India are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Hyderabad, India are 17.38878595, 78.46106473453146.


In [22]:
map_hyd = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, neighborhood in zip(hyd_df['Latitude'], hyd_df['Longitude'], hyd_df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_hyd)  
    
map_hyd

In [23]:
map_hyd.save('map_hyd.html')


## 5. Use the Foursquare API to explore the neighborhoods

In [24]:
# define Foursquare Credentials and Version
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID (placeholder; do not publish the real value)
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret (placeholder; do not publish the real value)
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_FOURSQUARE_CLIENT_ID
CLIENT_SECRET: YOUR_FOURSQUARE_CLIENT_SECRET


In [26]:
LIMIT = 100
radius = 2000

venues = []

for lat, long, neighborhood in zip(hyd_df['Latitude'], hyd_df['Longitude'], hyd_df['Neighborhood']):
    
    # create the API request URL
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    # make the GET request
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    # return only relevant information for each nearby venue
    for venue in results:
        venues.append((
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))
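
The loop above indexes straight into the JSON response, so one neighborhood with an empty or incomplete response would raise a `KeyError` or `IndexError` and stop the whole run. A minimal defensive sketch, assuming the same response shape that the loop relies on (`response`, then `groups[0]`, then `items`); the helper name `extract_venues` and the sample payload are hypothetical:

```python
def extract_venues(payload, neighborhood, lat, lng):
    """Return venue tuples from one explore response, skipping incomplete entries.

    Hypothetical helper: assumes the payload shape used in the loop above.
    """
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []

    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories", [])
        category = categories[0].get("name") if categories else None
        # skip venues that lack a name, coordinates or a category
        if not (venue.get("name") and category
                and "lat" in location and "lng" in location):
            continue
        rows.append((neighborhood, lat, lng, venue["name"],
                     location["lat"], location["lng"], category))
    return rows


# made-up payload for illustration only
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Example Cafe",
               "location": {"lat": 17.40, "lng": 78.50},
               "categories": [{"name": "Café"}]}},
    {"venue": {"name": "No Category Place",
               "location": {"lat": 17.41, "lng": 78.51},
               "categories": []}},
]}]}}

print(extract_venues(sample, "A. S. Rao Nagar", 17.4112, 78.50824))
print(extract_venues({}, "A. S. Rao Nagar", 17.4112, 78.50824))
```

Inside the loop, `venues.extend(extract_venues(requests.get(url).json(), neighborhood, lat, long))` would replace the direct indexing, at the cost of silently dropping venues with missing fields.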

In [27]:
venues_df = pd.DataFrame(venues)

# define the column names
venues_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head()

(6785, 7)


|   | Neighborhood | Latitude | Longitude | VenueName | VenueLatitude | VenueLongitude | VenueCategory |
|---|---|---|---|---|---|---|---|
| 0 | A. S. Rao Nagar | 17.4112 | 78.50824 | Bawarchi | 17.406369 | 78.497662 | Indian Restaurant |
| 1 | A. S. Rao Nagar | 17.4112 | 78.50824 | Sudharshan Theatre 35mm | 17.40653 | 78.49515 | Movie Theater |
| 2 | A. S. Rao Nagar | 17.4112 | 78.50824 | Subway | 17.404173 | 78.51495 | Sandwich Place |
| 3 | A. S. Rao Nagar | 17.4112 | 78.50824 | Devi 70 MM | 17.406329 | 78.495409 | Movie Theater |
| 4 | A. S. Rao Nagar | 17.4112 | 78.50824 | Spencer's | 17.412592 | 78.4984 | Convenience Store |


In [28]:
venues_df.groupby(["Neighborhood"]).count()


| Neighborhood | Latitude | Longitude | VenueName | VenueLatitude | VenueLongitude | VenueCategory |
|---|---|---|---|---|---|---|
| A. S. Rao Nagar | 22 | 22 | 22 | 22 | 22 | 22 |
| A.C. Guards | 50 | 50 | 50 | 50 | 50 | 50 |
| Abhyudaya Nagar | 11 | 11 | 11 | 11 | 11 | 11 |
| Abids | 81 | 81 | 81 | 81 | 81 | 81 |
| Adikmet | 20 | 20 | 20 | 20 | 20 | 20 |
| Afzal Gunj | 45 | 45 | 45 | 45 | 45 | 45 |
| Aghapura | 56 | 56 | 56 | 56 | 56 | 56 |
| Aliabad, Hyderabad | 10 | 10 | 10 | 10 | 10 | 10 |
| Alijah Kotla | 17 | 17 | 17 | 17 | 17 | 17 |
| Allwyn Colony | 13 | 13 | 13 | 13 | 13 | 13 |


In [29]:
print('There are {} unique categories.'.format(len(venues_df['VenueCategory'].unique())))


There are 168 unique categories.


In [30]:
venues_df['VenueCategory'].unique()[:50]

array(['Indian Restaurant', 'Movie Theater', 'Sandwich Place',
       'Convenience Store', 'Ice Cream Shop', 'Coffee Shop', 'Café',
       'Asian Restaurant', 'Flea Market', 'Park', 'Bakery',
       'Hyderabadi Restaurant', 'Juice Bar', 'Lounge',
       'South Indian Restaurant', 'Bistro', 'Science Museum',
       'Snack Place', 'Middle Eastern Restaurant',
       'Vegetarian / Vegan Restaurant', 'Stadium',
       'Performing Arts Venue', 'Hotel', 'Hotel Bar',
       'Fast Food Restaurant', 'Pizza Place', 'Mobile Phone Shop',
       'Fried Chicken Joint', 'Department Store', 'Hookah Bar',
       'Electronics Store', 'Grocery Store', 'Clothing Store',
       'Breakfast Spot', 'Bus Station', 'Fruit & Vegetable Store',
       'Restaurant', 'Shoe Store', 'Food Truck', 'Neighborhood',
       'Chaat Place', 'Diner', 'Burger Joint', 'Dessert Shop',
       'Chinese Restaurant', 'Smoke Shop', 'Bar', 'Shopping Mall', 'Food',
       'Multiplex'], dtype=object)

In [31]:

# check if the results contain "Shopping Mall"
"Shopping Mall" in venues_df['VenueCategory'].unique()

True

## 6. Analyze each neighborhood

In [32]:
hyd_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
hyd_onehot['Neighborhoods'] = venues_df['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [hyd_onehot.columns[-1]] + list(hyd_onehot.columns[:-1])
hyd_onehot = hyd_onehot[fixed_columns]

print(hyd_onehot.shape)
hyd_onehot.head()

(6785, 169)


Unnamed: 0,Neighborhoods,ATM,Accessories Store,Afghan Restaurant,American Restaurant,Andhra Restaurant,Arcade,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auditorium,BBQ Joint,Badminton Court,Bakery,Bank,Bar,Baseball Field,Bed & Breakfast,Beer Garden,Bengali Restaurant,Big Box Store,Bike Rental / Bike Share,Bistro,Bookstore,Boutique,Bowling Alley,Breakfast Spot,Brewery,Buffet,Burger Joint,Bus Station,Business Service,Butcher,Cafeteria,Café,Chaat Place,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,Comfort Food Restaurant,Concert Hall,Convenience Store,Cricket Ground,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Donut Shop,Dumpling Restaurant,Electronics Store,Ethiopian Restaurant,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Flea Market,Fondue Restaurant,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,General Entertainment,Golf Course,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Historic Site,History Museum,Hookah Bar,Hotel,Hotel Bar,Hunan Restaurant,Hyderabadi Restaurant,Ice Cream Shop,Indian Restaurant,Indian Sweet Shop,Indie Movie Theater,Intersection,Irani Cafe,Irish Pub,Italian Restaurant,Jewelry Store,Juice Bar,Lake,Laser Tag,Light Rail Station,Liquor Store,Lounge,Market,Mattress Store,Mediterranean Restaurant,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Monument / Landmark,Motorcycle Shop,Movie Theater,Multicuisine Indian Restaurant,Multiplex,Museum,Music Venue,Neighborhood,New American Restaurant,Nightclub,Noodle House,North Indian Restaurant,Office,Outdoors & Recreation,Park,Parsi Restaurant,Performing Arts Venue,Pharmacy,Pizza Place,Platform,Plaza,Pool,Pub,Rajasthani Restaurant,Residential Building (Apartment / 
Condo),Resort,Restaurant,River,Salon / Barbershop,Sandwich Place,Scenic Lookout,Science Museum,Seafood Restaurant,Shoe Store,Shopping Mall,Smoke Shop,Snack Place,Soccer Field,Social Club,South Indian Restaurant,Spa,Sporting Goods Shop,Sports Bar,Stadium,Steakhouse,Supermarket,Tea Room,Tech Startup,Temple,Tex-Mex Restaurant,Thai Restaurant,Train Station,Vegetarian / Vegan Restaurant,Wings Joint,Women's Store,Zoo
0,A. S. Rao Nagar,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,A. S. Rao Nagar,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,A. S. Rao Nagar,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,A. S. Rao Nagar,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,A. S. Rao Nagar,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [33]:
hyd_grouped = hyd_onehot.groupby(["Neighborhoods"]).mean().reset_index()

print(hyd_grouped.shape)
hyd_grouped

(200, 169)


Unnamed: 0,Neighborhoods,ATM,Accessories Store,Afghan Restaurant,American Restaurant,Andhra Restaurant,Arcade,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auditorium,BBQ Joint,Badminton Court,Bakery,Bank,Bar,Baseball Field,Bed & Breakfast,Beer Garden,Bengali Restaurant,Big Box Store,Bike Rental / Bike Share,Bistro,Bookstore,Boutique,Bowling Alley,Breakfast Spot,Brewery,Buffet,Burger Joint,Bus Station,Business Service,Butcher,Cafeteria,Café,Chaat Place,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,Comfort Food Restaurant,Concert Hall,Convenience Store,Cricket Ground,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Diner,Discount Store,Donut Shop,Dumpling Restaurant,Electronics Store,Ethiopian Restaurant,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Flea Market,Fondue Restaurant,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gas Station,Gastropub,General Entertainment,Golf Course,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Historic Site,History Museum,Hookah Bar,Hotel,Hotel Bar,Hunan Restaurant,Hyderabadi Restaurant,Ice Cream Shop,Indian Restaurant,Indian Sweet Shop,Indie Movie Theater,Intersection,Irani Cafe,Irish Pub,Italian Restaurant,Jewelry Store,Juice Bar,Lake,Laser Tag,Light Rail Station,Liquor Store,Lounge,Market,Mattress Store,Mediterranean Restaurant,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Monument / Landmark,Motorcycle Shop,Movie Theater,Multicuisine Indian Restaurant,Multiplex,Museum,Music Venue,Neighborhood,New American Restaurant,Nightclub,Noodle House,North Indian Restaurant,Office,Outdoors & Recreation,Park,Parsi Restaurant,Performing Arts Venue,Pharmacy,Pizza Place,Platform,Plaza,Pool,Pub,Rajasthani Restaurant,Residential Building (Apartment / 
Condo),Resort,Restaurant,River,Salon / Barbershop,Sandwich Place,Scenic Lookout,Science Museum,Seafood Restaurant,Shoe Store,Shopping Mall,Smoke Shop,Snack Place,Soccer Field,Social Club,South Indian Restaurant,Spa,Sporting Goods Shop,Sports Bar,Stadium,Steakhouse,Supermarket,Tea Room,Tech Startup,Temple,Tex-Mex Restaurant,Thai Restaurant,Train Station,Vegetarian / Vegan Restaurant,Wings Joint,Women's Store,Zoo
0,A. S. Rao Nagar,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.136364,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.318182,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.136364,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,A.C. Guards,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.06,0.0,0.0,0.02,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.06,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.06,0.02,0.0,0.04,0.04,0.14,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.02,0.0,0.06,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.04,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0
2,Abhyudaya Nagar,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.272727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Abids,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.049383,0.0,0.012346,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012346,0.0,0.0,0.012346,0.0,0.0,0.0,0.0,0.037037,0.012346,0.049383,0.024691,0.0,0.024691,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012346,0.037037,0.012346,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012346,0.049383,0.0,0.0,0.012346,0.0,0.012346,0.0,0.012346,0.0,0.012346,0.0,0.0,0.0,0.012346,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012346,0.037037,0.012346,0.0,0.0,0.061728,0.135802,0.0,0.012346,0.0,0.0,0.0,0.0,0.0,0.024691,0.0,0.0,0.0,0.0,0.012346,0.0,0.0,0.0,0.0,0.0,0.0,0.012346,0.012346,0.0,0.0,0.0,0.0,0.012346,0.0,0.0,0.012346,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012346,0.0,0.024691,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.012346,0.0,0.012346,0.0,0.012346,0.012346,0.012346,0.024691,0.0,0.0,0.024691,0.0,0.012346,0.0,0.012346,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.012346,0.0,0.0,0.0
4,Adikmet,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.15,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.35,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Afzal Gunj,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.044444,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.022222,0.0,0.0,0.0,0.044444,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.022222,0.044444,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.044444,0.044444,0.0,0.0,0.022222,0.0,0.022222,0.0,0.022222,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.044444,0.0,0.111111,0.0,0.0,0.0,0.022222,0.111111,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.044444,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.022222,0.0,0.022222,0.0,0.022222,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.022222,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Aghapura,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.017857,0.017857,0.017857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017857,0.017857,0.017857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.035714,0.0,0.0,0.017857,0.0,0.0,0.0,0.017857,0.0,0.017857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.017857,0.0,0.035714,0.035714,0.178571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.017857,0.0,0.0,0.0,0.0,0.0,0.017857,0.0,0.017857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017857,0.0,0.017857,0.017857,0.017857,0.017857,0.0,0.0,0.053571,0.0,0.0,0.0,0.017857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017857,0.0,0.0,0.0,0.0
7,"Aliabad, Hyderabad",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0
8,Alijah Kotla,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.058824,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.176471,0.0,0.0,0.0,0.0,0.0,0.0,0.117647,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0
9,Allwyn Colony,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.153846,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.076923,0.230769,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.230769,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
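
Each value in the grouped table is the share of a neighborhood's venues that fall in a given category: one-hot encoding turns every venue into a row of 0/1 indicators, and the mean of a 0/1 column is a proportion. A small sketch of that step on made-up venue rows (the toy data is hypothetical; `dtype=int` is passed only to keep the indicator columns numeric across pandas versions):

```python
import pandas as pd

# made-up venue rows for illustration only
toy = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "VenueCategory": ["Café", "Café", "Shopping Mall", "Park", "Café", "Bakery"],
})

# one indicator column per category
toy_onehot = pd.get_dummies(toy["VenueCategory"], dtype=int)
toy_onehot.insert(0, "Neighborhood", toy["Neighborhood"])

# the mean of a 0/1 column is the share of venues in that category
toy_grouped = toy_onehot.groupby("Neighborhood").mean().reset_index()
print(toy_grouped)
```

Neighborhood A has one mall among four venues, so its `Shopping Mall` value is 0.25. Note that these are relative frequencies, not counts, so a neighborhood with few venues overall can show a high share from a single mall.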


In [34]:
len(hyd_grouped[hyd_grouped["Shopping Mall"] > 0])


58

In [35]:
hyd_mall = hyd_grouped[["Neighborhoods","Shopping Mall"]]
hyd_mall

|   | Neighborhoods | Shopping Mall |
|---|---|---|
| 0 | A. S. Rao Nagar | 0.0 |
| 1 | A.C. Guards | 0.0 |
| 2 | Abhyudaya Nagar | 0.0 |
| 3 | Abids | 0.012346 |
| 4 | Adikmet | 0.0 |
| 5 | Afzal Gunj | 0.022222 |
| 6 | Aghapura | 0.017857 |
| 7 | Aliabad, Hyderabad | 0.0 |
| 8 | Alijah Kotla | 0.0 |
| 9 | Allwyn Colony | 0.0 |


## 7. Cluster Neighborhoods

In [118]:
# set number of clusters
kclusters = 3

# drop the name column so that only the numeric category frequencies remain
hyd_clustering = hyd_grouped.drop(columns=["Neighborhoods"])

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, init="k-means++", random_state=0).fit(hyd_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[100:110]

array([1, 1, 1, 1, 1, 1, 1, 1, 0, 0])
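
The value `kclusters = 3` above is set by hand. One common way to sanity-check such a choice is to compare the k-means inertia (within-cluster sum of squared distances) across several values of k and look for the point where the improvement levels off, often called the elbow. A sketch on synthetic data standing in for `hyd_clustering`; the data, the range of k and `n_init=10` are assumptions for illustration, not something this notebook ran:

```python
import numpy as np
from sklearn.cluster import KMeans

# synthetic stand-in for hyd_clustering: three well-separated groups of points
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + rng.normal(scale=0.3, size=(30, 2)) for c in centers])

inertias = []
for k in range(1, 6):
    model = KMeans(n_clusters=k, init="k-means++", n_init=10, random_state=0).fit(X)
    inertias.append(model.inertia_)  # within-cluster sum of squared distances

for k, inertia in zip(range(1, 6), inertias):
    print(k, round(inertia, 2))
```

On this toy data the inertia drops sharply up to k = 3 and only marginally afterwards. Running the same loop on `hyd_clustering` would show whether three clusters is a reasonable choice for the real data.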

In [119]:
hyd_merged = hyd_mall.copy()

# add clustering labels
hyd_merged["Cluster Labels"] = kmeans.labels_

In [120]:
hyd_merged.rename(columns={"Neighborhoods": "Neighborhood"}, inplace=True)
hyd_merged.head()

|   | Neighborhood | Shopping Mall | Cluster Labels |
|---|---|---|---|
| 0 | A. S. Rao Nagar | 0.0 | 1 |
| 1 | A.C. Guards | 0.0 | 1 |
| 2 | Abhyudaya Nagar | 0.0 | 0 |
| 3 | Abids | 0.012346 | 1 |
| 4 | Adikmet | 0.0 | 1 |


In [121]:
hyd_merged = hyd_merged.join(hyd_df.set_index("Neighborhood"), on="Neighborhood")

print(hyd_merged.shape)
hyd_merged.head() # check the last columns!

(200, 5)


|   | Neighborhood | Shopping Mall | Cluster Labels | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | A. S. Rao Nagar | 0.0 | 1 | 17.4112 | 78.50824 |
| 1 | A.C. Guards | 0.0 | 1 | 17.392977 | 78.456867 |
| 2 | Abhyudaya Nagar | 0.0 | 0 | 17.33765 | 78.56414 |
| 3 | Abids | 0.012346 | 1 | 17.3898 | 78.47658 |
| 4 | Adikmet | 0.0 | 1 | 17.41061 | 78.51513 |


In [122]:
print(hyd_merged.shape)
hyd_merged.sort_values(["Cluster Labels"], inplace=True)
hyd_merged

(200, 5)


|   | Neighborhood | Shopping Mall | Cluster Labels | Latitude | Longitude |
|---|---|---|---|---|---|
| 173 | Ramanthapur | 0.111111 | 0 | 17.39032 | 78.54544 |
| 108 | Lothkunta | 0.0 | 0 | 17.49405 | 78.51514 |
| 179 | Risala Bazar | 0.0 | 0 | 17.38647 | 78.4051 |
| 140 | Nagaram, Medchal–Malkajgiri district | 0.0 | 0 | 17.60993 | 78.49122 |
| 26 | Barkas, Hyderabad | 0.0 | 0 | 17.31594 | 78.48107 |
| 139 | Nacharam | 0.0 | 0 | 17.43351 | 78.56673 |
| 75 | Jalal Baba Nagar | 0.111111 | 0 | 17.35442 | 78.43255 |
| 66 | Hafeezpet | 0.0 | 0 | 17.4899 | 78.3522 |
| 172 | Ramachandrapuram, Medak district | 0.0 | 0 | 17.51159 | 78.29431 |
| 84 | Karmanghat | 0.0 | 0 | 17.34061 | 78.53258 |


In [123]:
hyd_merged['Cluster Labels'].value_counts()

1    153
0     39
2      8
Name: Cluster Labels, dtype: int64

In [124]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(hyd_merged['Latitude'], hyd_merged['Longitude'], hyd_merged['Neighborhood'], hyd_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        # labels run from 0 to kclusters-1, so index the palette directly
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [125]:
map_clusters.save('map_clusters.html')

## 8. Examine clusters


### Cluster 1 (label 0)

In [126]:
hyd_merged.loc[hyd_merged['Cluster Labels'] == 0]

|   | Neighborhood | Shopping Mall | Cluster Labels | Latitude | Longitude |
|---|---|---|---|---|---|
| 173 | Ramanthapur | 0.111111 | 0 | 17.39032 | 78.54544 |
| 108 | Lothkunta | 0.0 | 0 | 17.49405 | 78.51514 |
| 179 | Risala Bazar | 0.0 | 0 | 17.38647 | 78.4051 |
| 140 | Nagaram, Medchal–Malkajgiri district | 0.0 | 0 | 17.60993 | 78.49122 |
| 26 | Barkas, Hyderabad | 0.0 | 0 | 17.31594 | 78.48107 |
| 139 | Nacharam | 0.0 | 0 | 17.43351 | 78.56673 |
| 75 | Jalal Baba Nagar | 0.111111 | 0 | 17.35442 | 78.43255 |
| 66 | Hafeezpet | 0.0 | 0 | 17.4899 | 78.3522 |
| 172 | Ramachandrapuram, Medak district | 0.0 | 0 | 17.51159 | 78.29431 |
| 84 | Karmanghat | 0.0 | 0 | 17.34061 | 78.53258 |


### Cluster 2 (label 1)


In [127]:
hyd_merged.loc[hyd_merged['Cluster Labels'] == 1]

|   | Neighborhood | Shopping Mall | Cluster Labels | Latitude | Longitude |
|---|---|---|---|---|---|
| 137 | Musheerabad | 0.0 | 1 | 17.41469 | 78.50148 |
| 138 | Mylargadda | 0.0 | 1 | 17.42839 | 78.51606 |
| 142 | Nalgonda 'X' Roads | 0.0 | 1 | 17.375388 | 78.498487 |
| 136 | Moula-Ali | 0.0 | 1 | 17.46577 | 78.56018 |
| 143 | Nallakunta | 0.0 | 1 | 17.39749 | 78.50238 |
| 141 | Nagole | 0.0 | 1 | 17.37893 | 78.56204 |
| 0 | A. S. Rao Nagar | 0.0 | 1 | 17.4112 | 78.50824 |
| 131 | Moazzam Jahi Market | 0.018868 | 1 | 17.38448 | 78.47442 |
| 111 | Madhapur | 0.0 | 1 | 17.45694 | 78.39013 |
| 114 | Maharajgunj | 0.020408 | 1 | 17.37928 | 78.47749 |


### Cluster 3 (label 2)

In [128]:
hyd_merged.loc[hyd_merged['Cluster Labels'] == 2]

|   | Neighborhood | Shopping Mall | Cluster Labels | Latitude | Longitude |
|---|---|---|---|---|---|
| 119 | Mallapur | 0.0 | 2 | 17.450017 | 78.609361 |
| 69 | Hayathnagar | 0.0 | 2 | 17.32707 | 78.60533 |
| 42 | Cherlapally | 0.0 | 2 | 17.46648 | 78.59999 |
| 72 | IDA Bollaram | 0.0 | 2 | 17.55403 | 78.34459 |
| 77 | Jeedimetla | 0.0 | 2 | 17.52183 | 78.45433 |
| 32 | Bharat Nagar | 0.0 | 2 | 17.52834 | 78.52504 |
| 159 | Patancheru | 0.0 | 2 | 17.52677 | 78.25234 |
| 130 | Miyapur | 0.0 | 2 | 17.42102 | 78.58244 |
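
The per-cluster tables above are easier to compare once they are reduced to a few numbers per cluster. A sketch of one way to do that, using a small made-up frame in place of `hyd_merged`; the values below are illustrative only and are not results from this notebook, and the summary column names are hypothetical:

```python
import pandas as pd

# made-up rows standing in for hyd_merged
toy_merged = pd.DataFrame({
    "Neighborhood": ["N1", "N2", "N3", "N4", "N5", "N6"],
    "Shopping Mall": [0.10, 0.00, 0.02, 0.04, 0.00, 0.00],
    "Cluster Labels": [0, 0, 1, 1, 2, 2],
})

# per cluster: size, average mall share, and neighborhoods with at least one mall
summary = toy_merged.groupby("Cluster Labels")["Shopping Mall"].agg(
    neighborhoods="count",
    mean_mall_share="mean",
    with_mall=lambda s: int((s > 0).sum()),
)
print(summary)
```

Running the same aggregation on `hyd_merged` would give the figures needed to back up statements about which cluster has the most or the fewest malls.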



### Observations

Cluster numbers here follow the section headings above: clusters 1, 2 and 3 correspond to cluster labels 0, 1 and 2.

Most of the shopping malls are concentrated in the central area of Hyderabad, with the highest number in cluster 2 and a moderate number in cluster 1. Cluster 3, by contrast, has very few or no shopping malls in its neighborhoods. These neighborhoods are an opportunity for new shopping malls, because there is little to no competition from existing ones. Shopping malls in cluster 2, meanwhile, are likely to face intense competition because of the high concentration of malls there. Put differently, the oversupply of shopping malls is mostly in the central area of the city, while the suburbs still have very few.

This project therefore recommends that property developers open new shopping malls in the cluster 3 neighborhoods, where there is little to no competition. Developers with a unique selling proposition that sets them apart can also consider neighborhoods in cluster 1, where competition is moderate. Developers are advised to avoid neighborhoods in cluster 2, which already have a high concentration of shopping malls and intense competition.