# Capstone Project - The Battle of Neighborhoods

## Week 5 Final Report

***_Opening a New Shopping Mall in New Delhi, India_***

* Build a dataframe of neighborhoods in New Delhi, India by scraping a Wikipedia category page.
* Get the geographical coordinates of the neighborhoods.
* Obtain the venue data for the neighborhoods from the Foursquare API.
* Explore and cluster the neighborhoods.
* Select the best cluster in which to open a new shopping mall.

---

### 1. Import Libraries

In [1]:
import requests # library to handle requests
import json # library to handle JSON files
import numpy as np # library to handle data in vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns',None)
pd.set_option('display.max_rows',None)
pd.set_option('display.max_colwidth',None)

import folium # map rendering library
import geocoder # to get coordinates

#Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

#import k-means from clustering stage
from sklearn.cluster import KMeans

# library to parse HTML and XML documents
from bs4 import BeautifulSoup

# convert an address into latitude and longitude values
from geopy.geocoders import Nominatim

print('Libraries Imported.')

Libraries Imported.


### 2. Scrape Data from a Wikipedia Page into a DataFrame

In [2]:
url = 'https://en.wikipedia.org/wiki/Category:Neighbourhoods_in_Delhi'

In [3]:
#send get requests
result = requests.get(url)
htmlContent = result.content

In [4]:
#parse the data from the html into a BeautifulSoup object
soup = BeautifulSoup(htmlContent,'html.parser')
#print(soup.prettify()) # to view the parsed html data

In [5]:
# create a list to store neighborhood data
neighborhoodList = []

In [6]:
# append the data into the list
for row in soup.find_all('div',class_='mw-category')[0].find_all('li'):
    neighborhoodList.append(row.text)

In [7]:
delhi_df = pd.DataFrame({'Neighborhood':neighborhoodList})
delhi_df = delhi_df.drop(index=0).reset_index(drop=True)
delhi_df.head()

Unnamed: 0,Neighborhood
0,Ashok Nagar (Delhi)
1,Ashok Vihar
2,Ashram Chowk
3,Babarpur
4,"Badarpur, Delhi"


In [8]:
# print the number of rows of the dataframe
delhi_df.shape[0]

138

### 3. Get the Geographical Coordinates

In [9]:
# define a function to get coordinates
def get_latlng(neighborhood):
    # initialize your variable to None
    lat_lng_coords = None
    # loop until you get the coordinates
    while(lat_lng_coords is None):
        g = geocoder.arcgis('{}, New Delhi, India'.format(neighborhood))
        lat_lng_coords = g.latlng
    return lat_lng_coords
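
The `while` loop above never terminates if the geocoder keeps returning `None` for a name it cannot resolve. A minimal sketch of a bounded retry is shown below. The names `get_latlng_with_retries`, `geocode`, `max_attempts` and `delay_seconds` are hypothetical and not part of the original notebook; `geocode` stands for any callable that returns `[lat, lng]` or `None`.

```python
import time


def get_latlng_with_retries(neighborhood, geocode, max_attempts=3, delay_seconds=0.0):
    """Return [lat, lng] for a neighborhood, or None after max_attempts failures.

    `geocode` is any callable that takes a query string and returns
    [lat, lng] or None (hypothetical interface, for illustration only).
    """
    query = '{}, New Delhi, India'.format(neighborhood)
    for _ in range(max_attempts):
        coords = geocode(query)
        if coords is not None:
            return coords
        time.sleep(delay_seconds)  # brief pause before the next attempt
    return None


# Stand-in geocoder that fails once and then succeeds, to exercise the retry path.
calls = {'count': 0}


def fake_geocode(query):
    calls['count'] += 1
    return None if calls['count'] < 2 else [28.69, 77.17]


result = get_latlng_with_retries('Ashok Vihar', fake_geocode)
print(result)
```

In the notebook, the callable could be something like `lambda q: geocoder.arcgis(q).latlng`, and neighborhoods that come back as `None` would need to be handled before building the coordinate dataframe.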

In [10]:
coords = [get_latlng(neighborhood) for neighborhood in delhi_df['Neighborhood'].tolist() ]
print(coords)

[[28.692230000000052, 77.30124000000006], [28.69037000000003, 77.17609000000004], [28.710598435255907, 77.32696519316737], [28.50738000000007, 77.30346000000003], [28.50738000000007, 77.30346000000003], [28.65223022436032, 77.12941079026544], [28.79767000000004, 77.04522000000003], [28.549540000000036, 77.18167000000005], [28.699880000000064, 77.25906000000003], [28.595060000000046, 77.18573000000004], [28.656270000000063, 77.23232000000007], [28.67671000000007, 77.21767000000006], [28.633940000000052, 77.21968000000004], [28.60761000000008, 77.08714000000003], [28.654597885415757, 77.2333966005242], [28.62832000000003, 77.24727000000007], [28.60486000000003, 77.08511000000004], [28.560590000000047, 77.24678000000006], [28.57298000000003, 77.23357000000004], [28.591510000000028, 77.12945000000008], [28.700370000000078, 77.20493000000005], [28.592220036588714, 77.15998300657745], [28.684700000000078, 77.32774000000006], [28.679040000000043, 77.31476000000004], [28.589950000000044, 77.04

In [11]:
# create temporary dataframe to populate the coordinates into latitude and longitude
df_coords = pd.DataFrame(coords,columns=['Latitude','Longitude'])
df_coords.head()

Unnamed: 0,Latitude,Longitude
0,28.69223,77.30124
1,28.69037,77.17609
2,28.710598,77.326965
3,28.50738,77.30346
4,28.50738,77.30346


In [12]:
# merge the coordinates into the original dataframe
delhi_df['Latitude'] = df_coords['Latitude']
delhi_df['Longitude'] = df_coords['Longitude']

In [13]:
# check the neighborhoods and the coordinates 
print('The Shape is :',delhi_df.shape)
delhi_df.head()

The Shape is : (138, 3)


Unnamed: 0,Neighborhood,Latitude,Longitude
0,Ashok Nagar (Delhi),28.69223,77.30124
1,Ashok Vihar,28.69037,77.17609
2,Ashram Chowk,28.710598,77.326965
3,Babarpur,28.50738,77.30346
4,"Badarpur, Delhi",28.50738,77.30346
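
In the rows above, Babarpur and Badarpur, Delhi share exactly the same coordinates, which may mean the geocoder resolved two different names to one place. A small check like the following sketch can flag such cases for manual review. The helper name `find_shared_coordinates` is hypothetical, and the sample rows are copied from the table above.

```python
from collections import defaultdict


def find_shared_coordinates(rows):
    """Group neighborhood names by (lat, lng) and keep groups with more than one name."""
    groups = defaultdict(list)
    for name, lat, lng in rows:
        groups[(lat, lng)].append(name)
    return {coords: names for coords, names in groups.items() if len(names) > 1}


# First five rows of the dataframe shown above.
sample = [
    ('Ashok Nagar (Delhi)', 28.69223, 77.30124),
    ('Ashok Vihar', 28.69037, 77.17609),
    ('Ashram Chowk', 28.710598, 77.326965),
    ('Babarpur', 28.50738, 77.30346),
    ('Badarpur, Delhi', 28.50738, 77.30346),
]

shared = find_shared_coordinates(sample)
print(shared)
```

With the full dataframe, the rows could be passed as `delhi_df[['Neighborhood','Latitude','Longitude']].itertuples(index=False)`.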


### 4. Create a map of New Delhi with neighborhoods superimposed on top

In [14]:
# get the coordinates of New Delhi
address = 'New Delhi, India'
geolocator = Nominatim(user_agent='my_explorer')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New Delhi, India are {}, {}.'.format(latitude,longitude))

The geographical coordinates of New Delhi, India are 28.6138954, 77.2090057.


In [15]:
# Create Map of New Delhi using latitude and longitude
delhi_map = folium.Map(location=[latitude,longitude],zoom_start=11)

# add markers to map
for lat,lng,neighborhood in zip(delhi_df['Latitude'],delhi_df['Longitude'],delhi_df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label,parse_html=True)
    folium.CircleMarker(
        [lat,lng],radius=5,color='blue',popup=label,fill=True,fill_color='blue',fill_opacity=0.6
    ).add_to(delhi_map)
delhi_map

In [16]:
# save the map as HTML file
delhi_map.save('delhi_map.html')

### 5. Use the Foursquare API to explore the neighborhoods

In [17]:
# define Foursquare Credentials, Version and Limit
CLIENT_ID = 'PESQIFIZTNDBTMZPSPLZXDCQVGVQT50IE3K1RP3QCQALSDCV' # your Foursquare ID
CLIENT_SECRET = 'ODUYPM1K0ZG1Z0LZTM4GE5IRA0KSMBARQMVDJWAD0HC04UBW' # your Foursquare Secret
VERSION = '20180605'
LIMIT = 100
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: PESQIFIZTNDBTMZPSPLZXDCQVGVQT50IE3K1RP3QCQALSDCV
CLIENT_SECRET:ODUYPM1K0ZG1Z0LZTM4GE5IRA0KSMBARQMVDJWAD0HC04UBW


**Now, let's get the top 100 venues that are within a radius of 2000 meters**

In [18]:
def get_VenueTypes(neighborhood,latitude,longitude,radius=2000):
    venues_list = []
    for neighbor,lat,lng in zip(neighborhood,latitude,longitude):
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(
            CLIENT_ID,CLIENT_SECRET,
            lat,lng,VERSION,radius,
            LIMIT)
        results = requests.get(url).json()["response"]["groups"][0]['items']
        venues_list.append([(
            neighbor,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])
    nearby_venues = pd.DataFrame(i for z in venues_list for i in z)
    nearby_venues.columns = ['Neighborhood','Latitude','Longitude','VenueName','VenueLatitude','VenueLongitude','VenueCategory']
    return nearby_venues
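
The list comprehension in `get_VenueTypes` indexes `v['venue']['categories'][0]` directly, which raises an `IndexError` if a venue comes back with an empty category list. The sketch below shows one way to extract the same seven fields defensively. The helper name `extract_venue_rows` and the second sample venue are hypothetical; the first sample venue is taken from the notebook output.

```python
def extract_venue_rows(neighbor, lat, lng, items):
    """Build one tuple per venue, using None for any field that is missing."""
    rows = []
    for item in items:
        venue = item.get('venue', {})
        location = venue.get('location', {})
        categories = venue.get('categories') or []
        # Fall back to None when the venue has no category at all.
        category = categories[0].get('name') if categories else None
        rows.append((
            neighbor, lat, lng,
            venue.get('name'),
            location.get('lat'),
            location.get('lng'),
            category,
        ))
    return rows


sample_items = [
    {'venue': {'name': 'Sutta Chowk',
               'location': {'lat': 28.697897, 'lng': 77.30001},
               'categories': [{'name': 'Smoke Shop'}]}},
    # Hypothetical venue with an empty category list.
    {'venue': {'name': 'Uncategorized Place',
               'location': {'lat': 28.70, 'lng': 77.30},
               'categories': []}},
]

rows = extract_venue_rows('Ashok Nagar (Delhi)', 28.69223, 77.30124, sample_items)
for row in rows:
    print(row)
```

Rows with a `None` category can then be dropped or counted before one-hot encoding, depending on how much missing data is acceptable.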

In [19]:
delhi_data = get_VenueTypes(neighborhood=delhi_df['Neighborhood'],
                            latitude=delhi_df['Latitude'],
                            longitude=delhi_df['Longitude'])

In [20]:
delhi_data.head()

Unnamed: 0,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,Ashok Nagar (Delhi),28.69223,77.30124,Sutta Chowk,28.697897,77.30001,Smoke Shop
1,Ashok Nagar (Delhi),28.69223,77.30124,AFM PVT LTD,28.70477,77.309608,Tourist Information Center
2,Ashok Nagar (Delhi),28.69223,77.30124,yamuna vihar,28.689816,77.283876,Park
3,Ashok Nagar (Delhi),28.69223,77.30124,Shivaji park,28.682657,77.285503,Park
4,Ashok Nagar (Delhi),28.69223,77.30124,Mansarover Park Metro Station,28.67537,77.300932,Train Station


In [21]:
delhi_data.groupby('Neighborhood').count()

Unnamed: 0_level_0,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Ashok Nagar (Delhi),5,5,5,5,5,5
Ashok Vihar,23,23,23,23,23,23
Ashram Chowk,5,5,5,5,5,5
Babarpur,6,6,6,6,6,6
"Badarpur, Delhi",6,6,6,6,6,6
Bali Nagar,57,57,57,57,57,57
Bawana,2,2,2,2,2,2
Ber Sarai,98,98,98,98,98,98
Bhajanpura,10,10,10,10,10,10
Chanakyapuri,80,80,80,80,80,80


In [22]:
print('There are {} unique categories.'.format(len(delhi_data['VenueCategory'].unique())))

There are 201 unique categories.


In [23]:
# show the first 50 categories
delhi_data['VenueCategory'].unique()[:50]

array(['Smoke Shop', 'Tourist Information Center', 'Park',
       'Train Station', 'Athletics & Sports', 'Asian Restaurant',
       'Sandwich Place', 'Snack Place', 'Pizza Place',
       'Indian Restaurant', 'South Indian Restaurant', 'Department Store',
       'Fast Food Restaurant', 'Coffee Shop', 'Market', 'Dessert Shop',
       'Basketball Court', 'Light Rail Station',
       'Vegetarian / Vegan Restaurant', 'ATM', 'Indian Sweet Shop',
       'Café', 'Bakery', 'American Restaurant', 'Donut Shop', 'Diner',
       'Hookah Bar', 'BBQ Joint', 'Sports Bar', 'Hotel', 'Pub',
       'Garden Center', 'Multiplex', 'Shopping Mall',
       'Furniture / Home Store', 'Fried Chicken Joint', 'Bar',
       'Restaurant', 'Gym / Fitness Center', 'Jewelry Store', 'Garden',
       'Cafeteria', 'Ice Cream Shop', 'Food Truck', 'Cosmetics Shop',
       'Business Service', 'Playground', 'Art Gallery',
       'Mediterranean Restaurant', 'Tea Room'], dtype=object)

In [24]:
'Neighborhood' in delhi_data['VenueCategory'].unique()

True

### 6. Analyze Each Neighborhood

In [25]:
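# one-hot encode the venue categories
delhi_onehot = pd.get_dummies(delhi_data[['VenueCategory']],prefix='',prefix_sep='')
# 'Neighborhood' is itself a venue category (see the check above), so drop that column
delhi_onehot = delhi_onehot.drop('Neighborhood',axis=1)
# add the neighborhood name of each venue
delhi_onehot['Neighborhoods'] = delhi_data['Neighborhood']
# move the neighborhood column to the first position
fixed_columns = [delhi_onehot.columns[-1]] + list(delhi_onehot.columns[:-1])
delhi_onehot = delhi_onehot[fixed_columns]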
print(delhi_onehot.shape)
delhi_onehot.head()

(5936, 201)


Unnamed: 0,Neighborhoods,ATM,Accessories Store,Airport Food Court,American Restaurant,Antique Shop,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Australian Restaurant,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Basketball Court,Bed & Breakfast,Beer Garden,Bengali Restaurant,Big Box Store,Bike Shop,Bistro,Bookstore,Boutique,Bowling Alley,Breakfast Spot,Bridal Shop,Burger Joint,Burmese Restaurant,Bus Station,Business Service,Cafeteria,Café,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,College Cafeteria,College Gym,Comfort Food Restaurant,Concert Hall,Convenience Store,Cosmetics Shop,Cricket Ground,Deli / Bodega,Department Store,Dessert Shop,Diner,Dog Run,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,English Restaurant,Event Space,Fabric Shop,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Flea Market,Food,Food & Drink Shop,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gastropub,Gift Shop,Golf Course,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Hardware Store,High School,Hindu Temple,Historic Site,History Museum,Hobby Shop,Hockey Arena,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotel Pool,IT Services,Ice Cream Shop,Indian Chinese Restaurant,Indian Restaurant,Indian Sweet Shop,Indie Movie Theater,Irani Cafe,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Karaoke Bar,Karnataka Restaurant,Korean Restaurant,Lake,Light Rail Station,Lounge,Market,Mediterranean Restaurant,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Moroccan Restaurant,Mosque,Motel,Motorcycle Shop,Movie Theater,Moving Target,Mughlai Restaurant,Multicuisine Indian Restaurant,Multiplex,Museum,Music Venue,Nightclub,Nightlife Spot,North Indian Restaurant,Northeast Indian Restaurant,Other Great Outdoors,Other Nightlife,Paper / Office Supplies Store,Park,Performing Arts Venue,Pizza Place,Planetarium,Playground,Plaza,Pool,Portuguese Restaurant,Print Shop,Pub,Public Art,Punjabi Restaurant,Racetrack,Restaurant,River,Road,Salad Place,Salon / Barbershop,Sandwich Place,Scandinavian Restaurant,Sculpture Garden,Seafood Restaurant,Shoe Store,Shop & Service,Shopping Mall,Smoke Shop,Snack Place,Soccer Field,Soccer Stadium,Soup Place,South Indian Restaurant,Spa,Speakeasy,Spiritual Center,Sporting Goods Shop,Sports Bar,Stadium,Steakhouse,Sushi Restaurant,Tapas Restaurant,Tea Room,Temple,Tex-Mex Restaurant,Thai Restaurant,Theater,Theme Park,Tibetan Restaurant,Tourist Information Center,Toy / Game Store,Track,Trail,Train Station,Turkish Restaurant,University,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Weight Loss Center,Women's Store,Yoga Studio,Zoo
0,Ashok Nagar (Delhi),0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Ashok Nagar (Delhi),0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
2,Ashok Nagar (Delhi),0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Ashok Nagar (Delhi),0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Ashok Nagar (Delhi),0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0


In [26]:
delhi_grouped = delhi_onehot.groupby('Neighborhoods').mean().reset_index()
print(delhi_grouped.shape)
delhi_grouped.head()

(138, 201)


Unnamed: 0,Neighborhoods,ATM,Accessories Store,Airport Food Court,American Restaurant,Antique Shop,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Australian Restaurant,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Basketball Court,Bed & Breakfast,Beer Garden,Bengali Restaurant,Big Box Store,Bike Shop,Bistro,Bookstore,Boutique,Bowling Alley,Breakfast Spot,Bridal Shop,Burger Joint,Burmese Restaurant,Bus Station,Business Service,Cafeteria,Café,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,College Cafeteria,College Gym,Comfort Food Restaurant,Concert Hall,Convenience Store,Cosmetics Shop,Cricket Ground,Deli / Bodega,Department Store,Dessert Shop,Diner,Dog Run,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,English Restaurant,Event Space,Fabric Shop,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Flea Market,Food,Food & Drink Shop,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gastropub,Gift Shop,Golf Course,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Hardware Store,High School,Hindu Temple,Historic Site,History Museum,Hobby Shop,Hockey Arena,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotel Pool,IT Services,Ice Cream Shop,Indian Chinese Restaurant,Indian Restaurant,Indian Sweet Shop,Indie Movie Theater,Irani Cafe,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Karaoke Bar,Karnataka Restaurant,Korean Restaurant,Lake,Light Rail Station,Lounge,Market,Mediterranean Restaurant,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Moroccan Restaurant,Mosque,Motel,Motorcycle Shop,Movie Theater,Moving Target,Mughlai Restaurant,Multicuisine Indian Restaurant,Multiplex,Museum,Music Venue,Nightclub,Nightlife Spot,North Indian Restaurant,Northeast Indian Restaurant,Other Great Outdoors,Other Nightlife,Paper / Office Supplies Store,Park,Performing Arts Venue,Pizza Place,Planetarium,Playground,Plaza,Pool,Portuguese Restaurant,Print Shop,Pub,Public Art,Punjabi Restaurant,Racetrack,Restaurant,River,Road,Salad Place,Salon / Barbershop,Sandwich Place,Scandinavian Restaurant,Sculpture Garden,Seafood Restaurant,Shoe Store,Shop & Service,Shopping Mall,Smoke Shop,Snack Place,Soccer Field,Soccer Stadium,Soup Place,South Indian Restaurant,Spa,Speakeasy,Spiritual Center,Sporting Goods Shop,Sports Bar,Stadium,Steakhouse,Sushi Restaurant,Tapas Restaurant,Tea Room,Temple,Tex-Mex Restaurant,Thai Restaurant,Theater,Theme Park,Tibetan Restaurant,Tourist Information Center,Toy / Game Store,Track,Trail,Train Station,Turkish Restaurant,University,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Weight Loss Center,Women's Store,Yoga Studio,Zoo
0,Ashok Nagar (Delhi),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Ashok Vihar,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.086957,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.086957,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.086957,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.086957,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.130435,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.086957,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0
2,Ashram Chowk,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Babarpur,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Badarpur, Delhi",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
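
Taking the mean of one-hot columns per neighborhood gives, for each category, the share of that neighborhood's venues falling in the category. The plain-Python sketch below reproduces that arithmetic for the five Ashok Nagar (Delhi) venues listed earlier. The helper name `category_frequencies` is hypothetical and is only meant to illustrate what the `groupby(...).mean()` step computes.

```python
from collections import Counter, defaultdict


def category_frequencies(venues):
    """Map each neighborhood to {category: share of its venues in that category}."""
    counts = defaultdict(Counter)
    for neighborhood, category in venues:
        counts[neighborhood][category] += 1
    return {
        neighborhood: {cat: n / sum(counter.values()) for cat, n in counter.items()}
        for neighborhood, counter in counts.items()
    }


# The five Ashok Nagar (Delhi) venues from the delhi_data.head() output.
venues = [
    ('Ashok Nagar (Delhi)', 'Smoke Shop'),
    ('Ashok Nagar (Delhi)', 'Tourist Information Center'),
    ('Ashok Nagar (Delhi)', 'Park'),
    ('Ashok Nagar (Delhi)', 'Park'),
    ('Ashok Nagar (Delhi)', 'Train Station'),
]

freq = category_frequencies(venues)
print(freq['Ashok Nagar (Delhi)'])
```

Two of the five venues are parks, so `Park` gets 0.4 and each of the other three categories gets 0.2, matching the first row of `delhi_grouped`.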


In [27]:
len(delhi_grouped[delhi_grouped['Shopping Mall']>0])

42

In [28]:
delhi_malls = delhi_grouped[['Neighborhoods','Shopping Mall']]
delhi_malls.head()

Unnamed: 0,Neighborhoods,Shopping Mall
0,Ashok Nagar (Delhi),0.0
1,Ashok Vihar,0.0
2,Ashram Chowk,0.0
3,Babarpur,0.0
4,"Badarpur, Delhi",0.0


### 7. Cluster Neighborhoods

In [29]:
# set the number of clusters
k_cluster = 3
delhi_clustering = delhi_malls.drop('Neighborhoods',axis=1)

# run k-means clustering 
kmeans = KMeans(n_clusters=k_cluster,random_state=0).fit(delhi_clustering)

# Check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 0, 0, 0, 0, 1, 0, 0, 0, 0])

In [30]:
# create a new dataframe that includes the cluster
delhi_merged = delhi_malls.copy()
# add clustering labels
delhi_merged['Cluster Labels'] = kmeans.labels_

In [31]:
delhi_merged.rename(columns={'Neighborhoods':'Neighborhood'},inplace=True)
delhi_merged.head()

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels
0,Ashok Nagar (Delhi),0.0,0
1,Ashok Vihar,0.0,0
2,Ashram Chowk,0.0,0
3,Babarpur,0.0,0
4,"Badarpur, Delhi",0.0,0


In [32]:
delhi_merged = delhi_merged.join(delhi_df.set_index('Neighborhood'),on='Neighborhood')
delhi_merged.head()

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
0,Ashok Nagar (Delhi),0.0,0,28.69223,77.30124
1,Ashok Vihar,0.0,0,28.69037,77.17609
2,Ashram Chowk,0.0,0,28.710598,77.326965
3,Babarpur,0.0,0,28.50738,77.30346
4,"Badarpur, Delhi",0.0,0,28.50738,77.30346


In [33]:
# sort the results by Cluster Labels
delhi_merged.sort_values('Cluster Labels',ascending=False,inplace=True)
delhi_merged

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
96,Pitam Pura,0.1,2,28.6959,77.13725
41,"Kabir Nagar, New Delhi",0.1,2,28.68966,77.141059
119,Shalimar Bagh (Delhi Assembly constituency),0.095238,2,28.71423,77.15744
55,Laxmi Nagar (Delhi),0.086957,2,28.63875,77.27592
106,"Sadar Bazaar, Delhi",0.125,2,28.59028,77.12014
104,"Rohini, Delhi",0.111111,2,28.73356,77.10401
19,Delhi Cantonment,0.166667,2,28.59151,77.12945
44,"Kapasheda Border, Delhi",0.153846,2,28.52163,77.08645
46,Keshav Puram,0.103448,2,28.68801,77.15866
51,Kirti Nagar,0.057143,1,28.64821,77.14273


**Finally, let's visualize the resulting clusters.**

In [34]:
# create map
delhi_clustered_map = folium.Map(location=[latitude,longitude],zoom_start=11)

# set color scheme for the clusters (one color per cluster)
color_array = cm.rainbow(np.linspace(0,1,k_cluster))
rainbow = [colors.rgb2hex(i) for i in color_array]

# add markers to the map
for lat,lng,neighbor,cluster in zip(delhi_merged['Latitude'],delhi_merged['Longitude'],delhi_merged['Neighborhood'],delhi_merged['Cluster Labels']):
    label = '{}, Cluster {}'.format(neighbor,cluster)
    label = folium.Popup(label,parse_html=True)
    folium.CircleMarker(
        [lat,lng],radius=5,color=rainbow[cluster],popup=label,fill=True,fill_color=rainbow[cluster],fill_opacity=0.7
    ).add_to(delhi_clustered_map)
delhi_clustered_map

In [35]:
# save the map as HTML file
delhi_clustered_map.save('delhi_clustered_map.html')

### 8. Examine Clusters

#### Cluster 0

In [36]:
print(delhi_merged.loc[delhi_merged['Cluster Labels']==0].shape[0])
delhi_merged.loc[delhi_merged['Cluster Labels']==0]

107


Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
94,Paschim Vihar,0.0,0,28.66933,77.09173
91,Palika Bazaar,0.0,0,28.63156,77.21959
79,Narela,0.0,0,28.83979,77.07696
88,Okhla,0.0,0,28.53247,77.27839
98,Raisina Hill,0.0,0,28.6184,77.215481
89,Old Delhi,0.0,0,28.65434,77.23258
80,Naveen Shahdara,0.0,0,28.67369,77.28326
90,Palam,0.0,0,28.59106,77.09117
81,"Netaji Nagar, Delhi",0.018182,0,28.57746,77.18517
86,Nizamuddin East,0.0,0,28.60124,77.264521


#### Cluster 1

In [37]:
print(delhi_merged.loc[delhi_merged['Cluster Labels']==1].shape[0])
delhi_merged.loc[delhi_merged['Cluster Labels']==1]

22


Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
51,Kirti Nagar,0.057143,1,28.64821,77.14273
103,"Rani Bagh, Delhi",0.0625,1,28.68584,77.13188
101,Rajouri Garden,0.055556,1,28.64562,77.12209
120,Shankar Vihar,0.068966,1,28.63847,77.28912
37,Inder Puri,0.028571,1,28.65107,77.30669
53,"Krishna Nagar, Delhi",0.041667,1,28.65545,77.28336
109,Saket (Delhi),0.04,1,28.52407,77.20677
110,Saket District Centre,0.04,1,28.52813,77.21905
50,Kingsway Camp,0.052632,1,28.71169,77.20197
93,Pandav Nagar,0.066667,1,28.61458,77.27574


#### Cluster 2

In [38]:
print(delhi_merged.loc[delhi_merged['Cluster Labels']==2].shape[0])
delhi_merged.loc[delhi_merged['Cluster Labels']==2]

9


Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
96,Pitam Pura,0.1,2,28.6959,77.13725
41,"Kabir Nagar, New Delhi",0.1,2,28.68966,77.141059
119,Shalimar Bagh (Delhi Assembly constituency),0.095238,2,28.71423,77.15744
55,Laxmi Nagar (Delhi),0.086957,2,28.63875,77.27592
106,"Sadar Bazaar, Delhi",0.125,2,28.59028,77.12014
104,"Rohini, Delhi",0.111111,2,28.73356,77.10401
19,Delhi Cantonment,0.166667,2,28.59151,77.12945
44,"Kapasheda Border, Delhi",0.153846,2,28.52163,77.08645
46,Keshav Puram,0.103448,2,28.68801,77.15866
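
K-means cluster labels are arbitrary integers, so it helps to rank the clusters by their mean shopping mall frequency before interpreting them. The sketch below does this for a few rows copied from the three cluster tables above. The helper name `cluster_means` is hypothetical, and the sample is only a subset of the 138 neighborhoods.

```python
from collections import defaultdict


def cluster_means(rows):
    """Return {cluster label: mean shopping mall frequency} for (frequency, label) rows."""
    totals = defaultdict(lambda: [0.0, 0])
    for frequency, label in rows:
        totals[label][0] += frequency
        totals[label][1] += 1
    return {label: total / count for label, (total, count) in totals.items()}


# (Shopping Mall frequency, Cluster Label) pairs taken from the tables above.
sample = [
    (0.0, 0), (0.018182, 0), (0.0, 0),        # Paschim Vihar, Netaji Nagar, Okhla
    (0.057143, 1), (0.0625, 1), (0.04, 1),    # Kirti Nagar, Rani Bagh, Saket
    (0.1, 2), (0.125, 2), (0.166667, 2),      # Pitam Pura, Sadar Bazaar, Delhi Cantonment
]

means = cluster_means(sample)
ranked = sorted(means, key=means.get)  # lowest mall concentration first
print(ranked)
```

On the full data, the same ranking could be obtained with `delhi_merged.groupby('Cluster Labels')['Shopping Mall'].mean().sort_values()`.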


### Observation

The `Shopping Mall` value used for clustering is the share of a neighborhood's Foursquare venues that are categorized as shopping malls. The clusters therefore group neighborhoods by relative mall concentration, not by a raw count of malls.

Cluster 2 (9 neighborhoods) has the highest concentration, with frequencies of roughly 0.09 to 0.17. Eight of these nine neighborhoods, including Pitam Pura, Rohini, Keshav Puram and Delhi Cantonment, lie west of the city center used for the map. Cluster 1 (22 neighborhoods) has a moderate concentration, roughly 0.03 to 0.07 in the rows shown. Cluster 0 (107 neighborhoods) has very few or no shopping malls among its venues, with frequencies at or near zero in the rows shown.

Cluster 0 represents the greatest opportunity for new shopping malls, as there is little to no competition from existing malls. Meanwhile, shopping malls in cluster 2 are likely to face intense competition because of the high concentration of malls already there.

Therefore, this project recommends that property developers capitalize on these findings and open new shopping malls in cluster 0 neighborhoods, where there is little to no competition. Developers with a unique selling proposition that stands out from the competition can also consider cluster 1 neighborhoods, where competition is moderate. Lastly, developers are advised to avoid cluster 2 neighborhoods, which already have a high concentration of shopping malls and are likely to face intense competition.