# Coursera - IBM Applied Data Science Capstone

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/3/38/Jupyter_logo.svg/1200px-Jupyter_logo.svg.png" style="width:128px;height:128px;" />

### Week 5 Final Report

#### Opening a New Shopping Mall in New Delhi, India

    1. Build a dataframe of neighborhoods in **New Delhi, India** by web scraping the data from a Wikipedia page
    2. Get the geographical coordinates of the neighborhoods
    3. Obtain the venue data for the neighborhoods from the Foursquare API
    4. Explore and cluster the neighborhoods
    5. Select the best cluster to open a new shopping mall


### 1. Import required libraries

In [6]:
# install geocoder
!pip install geocoder



In [3]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import geocoder # to get coordinates

import requests # library to handle requests
from bs4 import BeautifulSoup # library to parse HTML and XML documents

from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import, pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means for the clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

print("Libraries imported.")



Libraries imported.


### 2. Scrape data from Wikipedia page into a DataFrame

In [7]:
# send the GET request
data = requests.get("https://en.wikipedia.org/wiki/Category:District_subdivisions_of_Delhi").text

In [8]:
# parse data from the html into a beautifulsoup object
soup = BeautifulSoup(data, 'html.parser')

In [9]:
# create a list to store neighborhood data
neighborhoodList = []

In [10]:
# append the data into the list
for row in soup.find_all("div", class_="mw-category")[0].find_all("li"):
    neighborhoodList.append(row.text)

In [11]:
# create a new DataFrame from the list
dhl_df = pd.DataFrame({"Neighborhood": neighborhoodList})
dhl_df.head(70)

Unnamed: 0,Neighborhood
0,List of districts of Delhi
1,Bawana
2,Chanakyapuri
3,"Civil Lines, Delhi"
4,"Connaught Place, New Delhi"
5,"Dabri, New Delhi"
6,Daryaganj
7,Delhi Cantonment
8,Dilshad Colony
9,Districts of Delhi Police


In [12]:
#remove unwanted first row
dhl_df.drop(dhl_df.head(1).index, inplace=True)

In [13]:
#check removal of first row
dhl_df.head(40)

Unnamed: 0,Neighborhood
1,Bawana
2,Chanakyapuri
3,"Civil Lines, Delhi"
4,"Connaught Place, New Delhi"
5,"Dabri, New Delhi"
6,Daryaganj
7,Delhi Cantonment
8,Dilshad Colony
9,Districts of Delhi Police
10,"Dwarka, Delhi"


In [14]:
# print the number of rows of the dataframe
dhl_df.shape

(35, 1)

### 3. Get the geographical coordinates

In [15]:
# define a function to get coordinates
def get_latlng(neighborhood):
    # initialize your variable to None
    lat_lng_coords = None
    # loop until you get the coordinates
    while(lat_lng_coords is None):
        g = geocoder.arcgis('{}, New Delhi, India'.format(neighborhood))
        lat_lng_coords = g.latlng
    return lat_lng_coords
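
The `while` loop above keeps retrying until ArcGIS returns coordinates, so a name that never resolves would loop forever. The following is a minimal sketch of a bounded-retry variant, not the notebook's own method. The names `get_latlng_bounded`, `max_tries`, `pause`, and the injectable `lookup` callable are illustrative assumptions; in practice `lookup` would wrap `geocoder.arcgis(query).latlng`.

```python
import time


def get_latlng_bounded(neighborhood, lookup, max_tries=5, pause=0.0):
    """Return [lat, lng] for a neighborhood, or None after max_tries failed lookups.

    `lookup` is a hypothetical stand-in for the geocoding call: any callable
    that takes a query string and returns [lat, lng] or None.
    """
    query = '{}, New Delhi, India'.format(neighborhood)
    for _ in range(max_tries):
        coords = lookup(query)
        if coords is not None:
            return coords
        time.sleep(pause)  # wait before the next attempt
    return None


# Offline demonstration: a fake lookup that fails twice, then succeeds.
calls = {'n': 0}


def flaky_lookup(query):
    calls['n'] += 1
    return None if calls['n'] < 3 else [28.79767, 77.04522]


found = get_latlng_bounded('Bawana', flaky_lookup)
missing = get_latlng_bounded('Nowhere', lambda q: None, max_tries=3)
print(found, missing)
```

Returning `None` instead of looping lets the caller decide whether to drop the neighborhood or try a different query string.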

In [16]:
# call the function to get the coordinates, store in a new list using list comprehension
coords = [ get_latlng(neighborhood) for neighborhood in dhl_df["Neighborhood"].tolist() ]

In [17]:
coords

[[28.79767000000004, 77.04522000000003],
 [28.595060000000046, 77.18573000000004],
 [28.67671000000007, 77.21767000000006],
 [28.633940000000052, 77.21968000000004],
 [28.60761000000008, 77.08714000000003],
 [28.62832000000003, 77.24727000000007],
 [28.591510000000028, 77.12945000000008],
 [28.684700000000078, 77.32774000000006],
 [28.550650000000076, 77.25187000000005],
 [28.589950000000044, 77.04004000000003],
 [28.64817000000005, 77.17833000000007],
 [28.660910000000058, 77.26432000000005],
 [28.551090000000045, 77.20399000000003],
 [28.627910000000043, 77.09060000000005],
 [28.536620000000028, 77.26094000000006],
 [28.650450000000035, 77.18873000000008],
 [28.57815000000005, 77.20618000000007],
 [28.533280000000047, 77.31645000000003],
 [28.66121000000004, 77.08690000000007],
 [28.705010000000073, 77.18950000000007],
 [28.62510000000003, 76.99740000000008],
 [28.83979000000005, 77.07696000000004],
 [28.580996661117194, 77.18182278573488],
 [28.64596000000006, 77.21492000000006],
 [

In [18]:
# create temporary dataframe to populate the coordinates into Latitude and Longitude
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])

In [19]:
# merge the coordinates into the original dataframe
dhl_df['Latitude'] = df_coords['Latitude']
dhl_df['Longitude'] = df_coords['Longitude']
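
One thing to watch in this step: pandas column assignment aligns on index labels, not on row position. After the first row was dropped, `dhl_df` is indexed from 1 while `df_coords` is indexed from 0, so the coordinates appear to land one row off and the last row is left as NaN. A minimal sketch with toy values (the frames `left` and `right` and their contents are made up for illustration) shows the effect and one position-based alternative:

```python
import pandas as pd

# Toy stand-ins: `left` mimics a frame whose first row was dropped (index starts at 1),
# `right` mimics a freshly built frame (index starts at 0).
left = pd.DataFrame({'Neighborhood': ['A', 'B', 'C']}, index=[1, 2, 3])
right = pd.DataFrame({'Latitude': [10.0, 20.0, 30.0]})

# Label-based assignment: row 1 receives right.loc[1], row 3 has no match.
aligned = left.copy()
aligned['Latitude'] = right['Latitude']

# Position-based assignment: pass a plain array so no index alignment happens.
positional = left.copy()
positional['Latitude'] = right['Latitude'].to_numpy()

print(aligned)
print(positional)
```

Calling `reset_index(drop=True)` on the frame before assigning is another way to get the same position-based result.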

In [20]:
# check the neighborhoods and the coordinates
print(dhl_df.shape)
dhl_df

(35, 3)


Unnamed: 0,Neighborhood,Latitude,Longitude
1,Bawana,28.59506,77.18573
2,Chanakyapuri,28.67671,77.21767
3,"Civil Lines, Delhi",28.63394,77.21968
4,"Connaught Place, New Delhi",28.60761,77.08714
5,"Dabri, New Delhi",28.62832,77.24727
6,Daryaganj,28.59151,77.12945
7,Delhi Cantonment,28.6847,77.32774
8,Dilshad Colony,28.55065,77.25187
9,Districts of Delhi Police,28.58995,77.04004
10,"Dwarka, Delhi",28.64817,77.17833


In [24]:
# remove rows with NaN values
dhl_df = dhl_df.dropna()
dhl_df

Unnamed: 0,Neighborhood,Latitude,Longitude
1,Bawana,28.59506,77.18573
2,Chanakyapuri,28.67671,77.21767
3,"Civil Lines, Delhi",28.63394,77.21968
4,"Connaught Place, New Delhi",28.60761,77.08714
5,"Dabri, New Delhi",28.62832,77.24727
6,Daryaganj,28.59151,77.12945
7,Delhi Cantonment,28.6847,77.32774
8,Dilshad Colony,28.55065,77.25187
9,Districts of Delhi Police,28.58995,77.04004
10,"Dwarka, Delhi",28.64817,77.17833


In [25]:
# save the DataFrame as CSV file
dhl_df.to_csv("dhl_df.csv", index=False)

In [26]:
# get the coordinates of New Delhi
address = 'New Delhi, India'
geolocator = Nominatim(user_agent="my-application")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New Delhi, India are {}, {}.'.format(latitude, longitude))

The geographical coordinates of New Delhi, India are 28.6141793, 77.2022662.


### 4. Create a map of New Delhi with neighborhoods

In [27]:
# create map of New Delhi using latitude and longitude values
map_dhl = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, neighborhood in zip(dhl_df['Latitude'], dhl_df['Longitude'], dhl_df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_dhl)  
    
map_dhl

In [28]:
# save the map as HTML file
map_dhl.save('map_dhl.html')

### 5. Use the Foursquare API to explore the neighborhoods

In [29]:
# define Foursquare credentials and version
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID (redacted)
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret (redacted)
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_FOURSQUARE_CLIENT_ID
CLIENT_SECRET: YOUR_FOURSQUARE_CLIENT_SECRET


### Now, let's get the top 100 venues that are within a radius of 1500 meters.

In [30]:
radius = 1500
LIMIT = 100

venues = []

for lat, long, neighborhood in zip(dhl_df['Latitude'], dhl_df['Longitude'], dhl_df['Neighborhood']):
    
    # create the API request URL
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    # make the GET request
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    # return only relevant information for each nearby venue
    for venue in results:
        venues.append((
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))
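
The request URL above is assembled with `str.format`, and the response is indexed directly. As a hedged alternative sketch, the query string can be built with the standard library's `urllib.parse.urlencode`, and the nested response can be unpacked defensively so that a venue without categories does not raise an `IndexError`. The helper names `build_explore_url` and `extract_venues`, the placeholder credentials, and the sample payload are illustrative assumptions; only the endpoint and the field names come from the cell above.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build the explore URL with properly escaped query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    return EXPLORE_ENDPOINT + '?' + urlencode(params)


def extract_venues(payload):
    """Return (name, lat, lng, category) tuples, skipping venues without a category."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        if not categories:
            continue
        rows.append((venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     categories[0]['name']))
    return rows


url = build_explore_url('ID', 'SECRET', '20180605', 28.59506, 77.18573, 1500, 100)

# Hand-written payload in the shape the cell above expects (not a real API response).
sample_payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Amour Bistro',
               'location': {'lat': 28.601569, 'lng': 77.185923},
               'categories': [{'name': 'Café'}]}},
    {'venue': {'name': 'Uncategorized place',
               'location': {'lat': 28.6, 'lng': 77.18},
               'categories': []}},
]}]}}
rows = extract_venues(sample_payload)
print(url)
print(rows)
```

An empty or error payload yields an empty list here, so the outer loop can simply move on to the next neighborhood.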

In [32]:
# convert the venues list into a new DataFrame
venues_df = pd.DataFrame(venues)

# define the column names
venues_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head(100)

(968, 7)


Unnamed: 0,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,Bawana,28.59506,77.18573,Amour Bistro,28.601569,77.185923,Café
1,Bawana,28.59506,77.18573,Lázeez Affaire,28.602237,77.186044,Indian Restaurant
2,Bawana,28.59506,77.18573,Nehru Park | नेहरू पार्क (Nehru Park),28.591798,77.19286,Park
3,Bawana,28.59506,77.18573,Sanadige,28.601969,77.18702,Karnataka Restaurant
4,Bawana,28.59506,77.18573,ITC Maurya,28.59713,77.173643,Hotel
5,Bawana,28.59506,77.18573,Bukhara,28.596914,77.173358,North Indian Restaurant
6,Bawana,28.59506,77.18573,Dum Pukht,28.597194,77.173288,Indian Restaurant
7,Bawana,28.59506,77.18573,Cafe Coffee Day,28.595247,77.171954,Café
8,Bawana,28.59506,77.18573,Moti Mahal Delux,28.601677,77.187106,Indian Restaurant
9,Bawana,28.59506,77.18573,Taj Palace Hotel,28.595098,77.170913,Hotel


In [33]:
venues_df.groupby(["Neighborhood"]).count()

Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
Bawana,68,68,68,68,68,68
Chanakyapuri,34,34,34,34,34,34
"Civil Lines, Delhi",100,100,100,100,100,100
"Connaught Place, New Delhi",5,5,5,5,5,5
"Dabri, New Delhi",26,26,26,26,26,26
Daryaganj,7,7,7,7,7,7
Delhi Cantonment,4,4,4,4,4,4
Dilshad Colony,51,51,51,51,51,51
Districts of Delhi Police,25,25,25,25,25,25
"Dwarka, Delhi",45,45,45,45,45,45


### Let's find out how many unique categories can be curated from all the returned venues

In [34]:
print('There are {} unique categories.'.format(len(venues_df['VenueCategory'].unique())))

There are 153 unique categories.


In [35]:
# print out the list of categories
venues_df['VenueCategory'].unique()[:100]

array(['Café', 'Indian Restaurant', 'Park', 'Karnataka Restaurant',
       'Hotel', 'North Indian Restaurant', 'Bar',
       'Northeast Indian Restaurant', 'Museum', 'BBQ Joint', 'Lounge',
       'Asian Restaurant', 'Tea Room', 'Multiplex', 'Smoke Shop',
       'Shopping Mall', 'Hotel Bar', 'Restaurant',
       'Vietnamese Restaurant', 'Chinese Restaurant',
       'Moroccan Restaurant', 'French Restaurant', 'Sculpture Garden',
       'Italian Restaurant', 'Nightclub', 'Pub',
       'Mediterranean Restaurant', 'Bistro', 'Seafood Restaurant',
       'Coffee Shop', 'Gym', 'Hotel Pool', 'Snack Place', 'Train Station',
       'Bus Station', 'Golf Course', 'Spa', 'Pool', 'College Gym',
       'Donut Shop', 'Grocery Store', 'Dumpling Restaurant',
       'Pizza Place', 'Fast Food Restaurant', 'Sandwich Place',
       'Light Rail Station', 'Convenience Store', 'Dessert Shop',
       'American Restaurant', 'Clothing Store', 'Flea Market', 'Parking',
       'Metro Station', 'Ice Cream Shop', 'Pla

In [36]:
# check if the results contain "Shopping Mall"
"Shopping Mall" in venues_df['VenueCategory'].unique()

True

In [37]:
# one hot encoding
dhl_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
dhl_onehot['Neighborhoods'] = venues_df['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [dhl_onehot.columns[-1]] + list(dhl_onehot.columns[:-1])
dhl_onehot = dhl_onehot[fixed_columns]

print(dhl_onehot.shape)
dhl_onehot.head()

(968, 154)


Unnamed: 0,Neighborhoods,ATM,Airport,American Restaurant,Antique Shop,Arcade,Art Gallery,Art Museum,Asian Restaurant,Athletics & Sports,Australian Restaurant,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Bed & Breakfast,Beer Garden,Bengali Restaurant,Bike Shop,Bistro,Botanical Garden,Breakfast Spot,Burger Joint,Burrito Place,Bus Station,Cafeteria,Café,Chinese Restaurant,Chocolate Shop,Clothing Store,Cocktail Bar,Coffee Shop,College Gym,Comfort Food Restaurant,Convenience Store,Cricket Ground,Deli / Bodega,Department Store,Dessert Shop,Diner,Donut Shop,Dumpling Restaurant,Electronics Store,Event Space,Fabric Shop,Farmers Market,Fast Food Restaurant,Flea Market,Food,Food & Drink Shop,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Garden Center,Gastropub,Gift Shop,Golf Course,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Hardware Store,Hindu Temple,Historic Site,History Museum,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotel Pool,Ice Cream Shop,Indian Chinese Restaurant,Indian Restaurant,Indian Sweet Shop,Indie Movie Theater,Italian Restaurant,Japanese Restaurant,Jazz Club,Juice Bar,Karaoke Bar,Karnataka Restaurant,Korean Restaurant,Light Rail Station,Liquor Store,Lounge,Market,Mediterranean Restaurant,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Moroccan Restaurant,Motel,Movie Theater,Moving Target,Multiplex,Museum,Music Store,New American Restaurant,Nightclub,Nightlife Spot,North Indian Restaurant,Northeast Indian Restaurant,Other Nightlife,Park,Parking,Performing Arts Venue,Pharmacy,Pizza Place,Platform,Playground,Plaza,Pool,Portuguese Restaurant,Pub,Resort,Restaurant,River,Road,Sandwich Place,Scandinavian Restaurant,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Mall,Smoke Shop,Snack Place,Soccer Stadium,South Indian Restaurant,Spa,Speakeasy,Stadium,Steakhouse,Tapas Restaurant,Tea Room,Temple,Thai Restaurant,Theater,Tibetan Restaurant,Track Stadium,Train Station,Turkish Restaurant,Vietnamese Restaurant,Women's Store,Yoga Studio
0,Bawana,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Bawana,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Bawana,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Bawana,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Bawana,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [38]:
dhl_grouped = dhl_onehot.groupby(["Neighborhoods"]).mean().reset_index()

print(dhl_grouped.shape)
dhl_grouped

(34, 154)


Unnamed: 0,Neighborhoods,ATM,Airport,American Restaurant,Antique Shop,Arcade,Art Gallery,Art Museum,Asian Restaurant,Athletics & Sports,Australian Restaurant,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Bed & Breakfast,Beer Garden,Bengali Restaurant,Bike Shop,Bistro,Botanical Garden,Breakfast Spot,Burger Joint,Burrito Place,Bus Station,Cafeteria,Café,Chinese Restaurant,Chocolate Shop,Clothing Store,Cocktail Bar,Coffee Shop,College Gym,Comfort Food Restaurant,Convenience Store,Cricket Ground,Deli / Bodega,Department Store,Dessert Shop,Diner,Donut Shop,Dumpling Restaurant,Electronics Store,Event Space,Fabric Shop,Farmers Market,Fast Food Restaurant,Flea Market,Food,Food & Drink Shop,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Garden Center,Gastropub,Gift Shop,Golf Course,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Hardware Store,Hindu Temple,Historic Site,History Museum,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotel Pool,Ice Cream Shop,Indian Chinese Restaurant,Indian Restaurant,Indian Sweet Shop,Indie Movie Theater,Italian Restaurant,Japanese Restaurant,Jazz Club,Juice Bar,Karaoke Bar,Karnataka Restaurant,Korean Restaurant,Light Rail Station,Liquor Store,Lounge,Market,Mediterranean Restaurant,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Moroccan Restaurant,Motel,Movie Theater,Moving Target,Multiplex,Museum,Music Store,New American Restaurant,Nightclub,Nightlife Spot,North Indian Restaurant,Northeast Indian Restaurant,Other Nightlife,Park,Parking,Performing Arts Venue,Pharmacy,Pizza Place,Platform,Playground,Plaza,Pool,Portuguese Restaurant,Pub,Resort,Restaurant,River,Road,Sandwich Place,Scandinavian Restaurant,Sculpture Garden,Seafood Restaurant,Shoe Store,Shopping Mall,Smoke Shop,Snack Place,Soccer Stadium,South Indian Restaurant,Spa,Speakeasy,Stadium,Steakhouse,Tapas Restaurant,Tea Room,Temple,Thai Restaurant,Theater,Tibetan Restaurant,Track Stadium,Train Station,Turkish Restaurant,Vietnamese Restaurant,Women's Store,Yoga Studio
0,Bawana,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.014706,0.0,0.0,0.0,0.014706,0.0,0.0,0.0,0.0,0.014706,0.0,0.0,0.0,0.0,0.014706,0.0,0.073529,0.058824,0.0,0.0,0.0,0.014706,0.014706,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014706,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014706,0.0,0.0,0.014706,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.073529,0.029412,0.014706,0.0,0.0,0.161765,0.0,0.0,0.014706,0.0,0.0,0.0,0.0,0.014706,0.0,0.0,0.0,0.014706,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014706,0.0,0.0,0.0,0.014706,0.014706,0.0,0.0,0.044118,0.0,0.014706,0.029412,0.0,0.014706,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014706,0.0,0.014706,0.0,0.014706,0.0,0.0,0.0,0.0,0.014706,0.014706,0.0,0.029412,0.029412,0.014706,0.0,0.0,0.014706,0.0,0.0,0.0,0.0,0.014706,0.0,0.0,0.0,0.0,0.0,0.014706,0.0,0.014706,0.0,0.0
1,Chanakyapuri,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.088235,0.0,0.029412,0.0,0.058824,0.0,0.0,0.029412,0.0,0.0,0.0,0.029412,0.0,0.058824,0.029412,0.0,0.0,0.0,0.0,0.117647,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.088235,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Civil Lines, Delhi",0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.03,0.0,0.05,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.07,0.03,0.0,0.01,0.0,0.03,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.01,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.15,0.0,0.0,0.01,0.01,0.13,0.0,0.01,0.02,0.01,0.0,0.0,0.0,0.0,0.01,0.02,0.0,0.04,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0
3,"Connaught Place, New Delhi",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Dabri, New Delhi",0.0,0.0,0.0,0.0,0.038462,0.115385,0.038462,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.076923,0.038462,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.0,0.0,0.076923,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Daryaganj,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.142857,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Delhi Cantonment,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Dilshad Colony,0.0,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.0,0.019608,0.0,0.039216,0.019608,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019608,0.019608,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.019608,0.058824,0.0,0.019608,0.0,0.0,0.0,0.019608,0.0,0.098039,0.039216,0.019608,0.0,0.019608,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019608,0.039216,0.0,0.0,0.0,0.0,0.078431,0.0,0.019608,0.019608,0.019608,0.0,0.0,0.0,0.0,0.0,0.019608,0.0,0.019608,0.019608,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.039216,0.0,0.0,0.039216,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,0.039216,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.0,0.019608,0.0,0.0,0.0,0.0
8,Districts of Delhi Police,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.12,0.04,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.04,0.0,0.0,0.0,0.04,0.0,0.0,0.04,0.12,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.12,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,"Dwarka, Delhi",0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.044444,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.044444,0.022222,0.0,0.022222,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.044444,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.088889,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.088889,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022222,0.0,0.0,0.022222,0.0,0.0,0.0,0.0,0.0,0.022222,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
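
One-hot encoding followed by `groupby(...).mean()` yields, for each neighborhood, the fraction of its venues that fall in each category. As a sketch of the same idea on toy data (the six-row table below is invented for illustration), `pd.crosstab` with `normalize='index'` computes those row-wise fractions in a single call:

```python
import pandas as pd

# Invented toy venues: neighborhood A has 4 venues, B has 2.
toy = pd.DataFrame({
    'Neighborhood': ['A', 'A', 'A', 'A', 'B', 'B'],
    'VenueCategory': ['Café', 'Café', 'Hotel', 'Shopping Mall', 'Hotel', 'Park'],
})

# Route 1: one-hot encode, then average the 0/1 columns per neighborhood.
onehot = pd.get_dummies(toy['VenueCategory'], dtype=float)
onehot['Neighborhood'] = toy['Neighborhood']
grouped = onehot.groupby('Neighborhood').mean()

# Route 2: cross-tabulate counts and normalize each row to sum to 1.
freq = pd.crosstab(toy['Neighborhood'], toy['VenueCategory'], normalize='index')

print(grouped)
print(freq)
```

Both routes give 0.25 for "Shopping Mall" in neighborhood A, which is how a value such as 0.029412 for a neighborhood should be read: the share of its returned venues that are shopping malls, not a count.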


In [39]:
len(dhl_grouped[dhl_grouped["Shopping Mall"] > 0])

7

In [40]:
dhl_mall = dhl_grouped[["Neighborhoods","Shopping Mall"]]

In [41]:
dhl_mall.head()

Unnamed: 0,Neighborhoods,Shopping Mall
0,Bawana,0.029412
1,Chanakyapuri,0.0
2,"Civil Lines, Delhi",0.0
3,"Connaught Place, New Delhi",0.0
4,"Dabri, New Delhi",0.0


### 6. Cluster the Neighborhoods by Shopping Mall Frequency

In [42]:
# set number of clusters
kclusters = 3

dhl_clustering = dhl_mall.drop(columns=["Neighborhoods"])

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(dhl_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([2, 0, 0, 0, 0, 1, 0, 0, 0, 0])

In [43]:
# create a new dataframe that holds the shopping mall frequency and the cluster label for each neighborhood
dhl_merged = dhl_mall.copy()

# add clustering labels
dhl_merged["Cluster Labels"] = kmeans.labels_

In [44]:
dhl_merged.rename(columns={"Neighborhoods": "Neighborhood"}, inplace=True)
dhl_merged.head()

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels
0,Bawana,0.029412,2
1,Chanakyapuri,0.0,0
2,"Civil Lines, Delhi",0.0,0
3,"Connaught Place, New Delhi",0.0,0
4,"Dabri, New Delhi",0.0,0


In [45]:
# join dhl_merged with dhl_df to add latitude/longitude for each neighborhood
dhl_merged = dhl_merged.join(dhl_df.set_index("Neighborhood"), on="Neighborhood")

print(dhl_merged.shape)
dhl_merged.head() # check the last columns!

(34, 5)


Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
0,Bawana,0.029412,2,28.59506,77.18573
1,Chanakyapuri,0.0,0,28.67671,77.21767
2,"Civil Lines, Delhi",0.0,0,28.63394,77.21968
3,"Connaught Place, New Delhi",0.0,0,28.60761,77.08714
4,"Dabri, New Delhi",0.0,0,28.62832,77.24727


In [45]:
# sort the results by Cluster Labels
print(dhl_merged.shape)
dhl_merged.sort_values(["Cluster Labels"], inplace=True)
dhl_merged

(34, 5)


Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
0,Bawana,0.0,0,28.59506,77.18573
31,Shahdara district,0.01,0,28.64561,77.16682
30,Saraswati Vihar,0.0,0,28.68957,77.27802
24,Pandav Nagar,0.01,0,28.64783,77.16449
22,New Moti Bagh,0.0,0,28.64596,77.21492
20,Najafgarh,0.0,0,28.83979,77.07696
19,Model Town (Delhi),0.0,0,28.6251,76.9974
18,Meera Bagh,0.012821,0,28.70501,77.1895
15,Karol Bagh,0.0,0,28.57815,77.20618
14,Kalkaji,0.0,0,28.65045,77.18873


### 7. Map and Examine the Clusters

In [46]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(dhl_merged['Latitude'], dhl_merged['Longitude'], dhl_merged['Neighborhood'], dhl_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [47]:
# save the map as HTML file
map_clusters.save('map_clusters.html')

In [48]:
#cluster 0
dhl_merged.loc[dhl_merged['Cluster Labels'] == 0]

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
1,Chanakyapuri,0.0,0,28.67671,77.21767
2,"Civil Lines, Delhi",0.0,0,28.63394,77.21968
3,"Connaught Place, New Delhi",0.0,0,28.60761,77.08714
4,"Dabri, New Delhi",0.0,0,28.62832,77.24727
6,Delhi Cantonment,0.0,0,28.6847,77.32774
7,Dilshad Colony,0.0,0,28.55065,77.25187
8,Districts of Delhi Police,0.0,0,28.58995,77.04004
9,"Dwarka, Delhi",0.0,0,28.64817,77.17833
10,East Patel Nagar,0.0,0,28.66091,77.26432
11,"Gandhi Nagar, Delhi",0.0,0,28.55109,77.20399


In [49]:
#cluster 1
dhl_merged.loc[dhl_merged['Cluster Labels'] == 1]

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
5,Daryaganj,0.142857,1,28.59151,77.12945


In [51]:
#cluster 2
dhl_merged.loc[dhl_merged['Cluster Labels'] == 2]

Unnamed: 0,Neighborhood,Shopping Mall,Cluster Labels,Latitude,Longitude
0,Bawana,0.029412,2,28.59506,77.18573
21,Narela,0.04,2,28.580997,77.181823
25,Patel Nagar,0.08,2,28.63903,77.29597
26,Preet Vihar,0.041667,2,28.66634,77.125
27,Punjabi Bagh,0.023256,2,28.64562,77.12209
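
K-means label numbers are arbitrary, so it helps to summarize each cluster by its mean shopping-mall frequency before interpreting it. A minimal sketch using the first ten neighborhoods and the first ten cluster labels reported above (values copied from those outputs; the variable names `sample` and `summary` are illustrative):

```python
import pandas as pd

# First ten neighborhoods with their mall frequency and k-means label, as listed above.
sample = pd.DataFrame({
    'Neighborhood': ['Bawana', 'Chanakyapuri', 'Civil Lines, Delhi',
                     'Connaught Place, New Delhi', 'Dabri, New Delhi', 'Daryaganj',
                     'Delhi Cantonment', 'Dilshad Colony',
                     'Districts of Delhi Police', 'Dwarka, Delhi'],
    'Shopping Mall': [0.029412, 0.0, 0.0, 0.0, 0.0, 0.142857, 0.0, 0.0, 0.0, 0.0],
    'Cluster Labels': [2, 0, 0, 0, 0, 1, 0, 0, 0, 0],
})

# Size and mean mall frequency of each cluster, ordered from lowest to highest.
summary = (sample.groupby('Cluster Labels')['Shopping Mall']
           .agg(['count', 'mean'])
           .sort_values('mean'))
print(summary)
```

On this subset the ordering from lowest to highest mean frequency is cluster 0, then cluster 2, then cluster 1; running the same summary on the full `dhl_merged` frame would be the natural check before drawing conclusions.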



### Observations:

Most of the shopping malls are concentrated in the central area of the city, with the highest concentration in cluster 1 and a moderate concentration in cluster 2. Cluster 0, on the other hand, has very few to no shopping malls in its neighborhoods. This represents a great opportunity: these are high-potential areas for opening new shopping malls, as there is little to no competition from existing malls.

### Daryaganj is the best choice for opening a shopping mall!

In [None]:
Thanks for analysing and visiting this notebook.