# IBM Applied Data Science Capstone Course by Coursera

##### Opening a new Bakery in Delhi
##### Build a dataframe of neighborhoods in Delhi by web scraping the data from Wikipedia page
##### Get the geographical coordinates of the neighborhoods
##### Obtain the venue data for the neighborhoods from Foursquare API
##### Explore and cluster the neighborhoods
##### Select the best cluster to open a new bakery

In [3]:
!pip install geocoder

import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import geocoder # to get coordinates

import requests # library to handle requests
from bs4 import BeautifulSoup # library to parse HTML and XML documents

from pandas import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

print("Libraries imported.")

Collecting geocoder
  Downloading geocoder-1.38.1-py2.py3-none-any.whl (98 kB)
Collecting ratelim
  Downloading ratelim-0.1.6-py2.py3-none-any.whl (4.0 kB)
Installing collected packages: ratelim, geocoder
Successfully installed geocoder-1.38.1 ratelim-0.1.6
Libraries imported.


## Scraping data from the internet

In [8]:
import csv

In [6]:
source = requests.get('https://en.wikipedia.org/wiki/Category:Neighbourhoods_in_Delhi').text 
soup = BeautifulSoup(source, 'lxml')

In [9]:
csv_file = open('delhi.csv', 'w')
csv_writer = csv.writer(csv_file)
csv_writer.writerow(['Neighbourhood'])

15

In [10]:
mwcg = soup.find_all(class_ = "mw-category-group")

length = len(mwcg) # Gets the length of number of `mw-category-groups` present

for i in range(1, length):  # Gets all the neighbourhoods, skipping the first group
    links = mwcg[i].find_all('a')
    for link in links:  # `link` avoids shadowing the built-in `list`
        nbd = link.get('title') # Gets the title of the neighbourhood
        csv_writer.writerow([nbd]) # Writes the name of the neighbourhood in the csv file

In [11]:
csv_file.close()
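
The cells above open `delhi.csv`, write the rows, and close the file in three separate steps. A minimal sketch of the same write done with a context manager, so the file is closed even if an error occurs partway through. The helper name `write_neighbourhoods`, the temporary path, and the sample titles are illustrative assumptions, not part of the original notebook.

```python
import csv
import os
import tempfile


def write_neighbourhoods(titles, path):
    """Write a one-column CSV with a 'Neighbourhood' header (hypothetical helper)."""
    # newline='' lets the csv module control line endings itself
    with open(path, "w", newline="", encoding="utf-8") as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow(["Neighbourhood"])
        for title in titles:
            if title:  # skip links that have no title attribute
                writer.writerow([title])


# Illustrative titles; None stands in for a link without a title
sample_titles = ["Ashok Vihar", None, "Badarpur, Delhi"]
sample_path = os.path.join(tempfile.mkdtemp(), "delhi_sample.csv")
write_neighbourhoods(sample_titles, sample_path)

# Read the file back to confirm what was written
with open(sample_path, newline="", encoding="utf-8") as f:
    rows = list(csv.reader(f))
```

The titles would come from `link.get('title')` in the scraping loop; names containing commas are quoted automatically by the `csv` module.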

# Creating dataframe

In [12]:
df = pd.read_csv('delhi.csv')

In [13]:
df.shape

(138, 1)

In [14]:
df.head()

Unnamed: 0,Neighbourhood
0,Ashok Nagar (Delhi)
1,Ashok Vihar
2,Ashram Chowk
3,Babarpur
4,"Badarpur, Delhi"


# Getting the coordinates of all the neighbourhoods

In [15]:
# define a function to get coordinates
def get_latlng(neighborhood):
    # initialize your variable to None
    lat_lng_coords = None
    # loop until you get the coordinates
    while(lat_lng_coords is None):
        g = geocoder.arcgis('{}, Delhi, India'.format(neighborhood))
        lat_lng_coords = g.latlng
    return lat_lng_coords
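
The `while` loop above retries forever if a neighbourhood can never be geocoded. A minimal sketch of a bounded variant, assuming the lookup is passed in as a callable so the example runs offline. The names `get_latlng_bounded`, `lookup`, `max_attempts`, and `delay_seconds` are hypothetical and do not come from the original notebook or from the `geocoder` library.

```python
import time


def get_latlng_bounded(neighborhood, lookup, max_attempts=3, delay_seconds=0.0):
    """Return [lat, lng] from lookup(query), or None after max_attempts failures."""
    query = "{}, Delhi, India".format(neighborhood)
    for _ in range(max_attempts):
        coords = lookup(query)
        if coords is not None:
            return coords
        time.sleep(delay_seconds)  # brief pause before retrying
    return None


# Stand-in lookups so the sketch runs without network access
def always_fails(query):
    return None


calls = []


def succeeds_second_time(query):
    calls.append(query)
    return [28.69037, 77.17609] if len(calls) == 2 else None


missing = get_latlng_bounded("Nowhere", always_fails)
found = get_latlng_bounded("Ashok Vihar", succeeds_second_time)
```

With the real service, `lookup` could be `lambda q: geocoder.arcgis(q).latlng`, which matches the call used in `get_latlng`. Rows that come back as `None` would then need to be dropped or fixed by hand before mapping.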

In [18]:
coords = [ get_latlng(neighborhood) for neighborhood in df["Neighbourhood"].tolist() ]

In [19]:
coords

[[28.692230000000052, 77.30124000000006],
 [28.69037000000003, 77.17609000000004],
 [28.710598435255907, 77.32696519316737],
 [28.50738000000007, 77.30346000000003],
 [28.50738000000007, 77.30346000000003],
 [28.65223022436032, 77.12941079026544],
 [28.800590000000057, 77.03473000000008],
 [28.549540000000036, 77.18167000000005],
 [28.699880000000064, 77.25906000000003],
 [28.595060000000046, 77.18573000000004],
 [28.656270000000063, 77.23232000000007],
 [28.67671000000007, 77.21767000000006],
 [28.633940000000052, 77.21968000000004],
 [28.60761000000008, 77.08714000000003],
 [28.654597885415757, 77.2333966005242],
 [28.62832000000003, 77.24727000000007],
 [28.60486000000003, 77.08511000000004],
 [28.560590000000047, 77.24678000000006],
 [28.57298000000003, 77.23357000000004],
 [28.591510000000028, 77.12945000000008],
 [28.699110000000076, 77.19105000000008],
 [28.592220036588714, 77.15998300657745],
 [28.684700000000078, 77.32774000000006],
 [28.679040000000043, 77.31476000000004],
 [

In [20]:
# create temporary dataframe to populate the coordinates into Latitude and Longitude
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])

In [21]:
# merge the coordinates into the original dataframe
df['Latitude'] = df_coords['Latitude']
df['Longitude'] = df_coords['Longitude']

In [22]:
# check the neighborhoods and the coordinates
print(df.shape)
df

(138, 3)


Unnamed: 0,Neighbourhood,Latitude,Longitude
0,Ashok Nagar (Delhi),28.69223,77.30124
1,Ashok Vihar,28.69037,77.17609
2,Ashram Chowk,28.710598,77.326965
3,Babarpur,28.50738,77.30346
4,"Badarpur, Delhi",28.50738,77.30346
5,Bali Nagar,28.65223,77.129411
6,Bawana,28.80059,77.03473
7,Ber Sarai,28.54954,77.18167
8,Bhajanpura,28.69988,77.25906
9,Chanakyapuri,28.59506,77.18573
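
In the output above, Babarpur and "Badarpur, Delhi" share identical coordinates (28.50738, 77.30346), which may mean the geocoder resolved both names to the same place. A small sketch for flagging such cases before the venue lookup; the helper name `find_shared_coordinates` is hypothetical.

```python
from collections import defaultdict


def find_shared_coordinates(rows):
    """Group neighbourhood names by (lat, lng) and keep groups with more than one name."""
    by_coord = defaultdict(list)
    for name, lat, lng in rows:
        by_coord[(lat, lng)].append(name)
    return {coord: names for coord, names in by_coord.items() if len(names) > 1}


# Rows copied from the dataframe preview above
rows = [
    ("Ashok Vihar", 28.69037, 77.17609),
    ("Babarpur", 28.50738, 77.30346),
    ("Badarpur, Delhi", 28.50738, 77.30346),
    ("Bawana", 28.80059, 77.03473),
]
shared = find_shared_coordinates(rows)
```

Neighbourhoods flagged this way would return identical venue lists from a location-based search, so they are worth checking by hand.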


In [23]:
# save the DataFrame as CSV file
df.to_csv("df.csv", index=False)

In [26]:
# get the coordinates of Delhi
address = 'Delhi, India'

geolocator = Nominatim(user_agent="del-application")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Delhi, India are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Delhi, India are 28.6517178, 77.2219388.


# Creating a map of Delhi to view all the neighbourhoods

In [27]:
# create map of Delhi using latitude and longitude values
map_delhi = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, neighborhood in zip(df['Latitude'], df['Longitude'], df['Neighbourhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_delhi)  
    
map_delhi

In [28]:
# save the map as HTML file
map_delhi.save('map_delhi.html')

In [29]:
# define Foursquare Credentials and Version
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret (keep it out of shared notebooks)
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_FOURSQUARE_CLIENT_ID
CLIENT_SECRET: YOUR_FOURSQUARE_CLIENT_SECRET


# Obtaining the different kinds of venues in the neighbourhoods of Delhi

In [111]:
radius = 2000
LIMIT = 100

venues = []

for lat, long, neighborhood in zip(df['Latitude'], df['Longitude'], df['Neighbourhood']):
    
    # create the API request URL
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    # make the GET request
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    # return only relevant information for each nearby venue
    for venue in results:
        venues.append((
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))
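
The loop above indexes straight into `["response"]['groups'][0]['items']` and `['categories'][0]`, so one error response or one venue without a category stops the whole run. A minimal sketch of a more tolerant parser, assuming the same nested layout the loop relies on. The helper name `extract_venues` and both sample payloads are hand-written for illustration and are not real API output.

```python
def extract_venues(payload, neighborhood, lat, lng):
    """Flatten one explore-style response into tuples, tolerating missing fields."""
    groups = payload.get("response", {}).get("groups") or []
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories") or []
        # Use None when a venue has no category instead of raising IndexError
        category = categories[0].get("name") if categories else None
        rows.append((neighborhood, lat, lng, venue.get("name"),
                     location.get("lat"), location.get("lng"), category))
    return rows


# Hand-written payload in the shape the loop above expects
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Example Bakery",
               "location": {"lat": 28.69, "lng": 77.30},
               "categories": [{"name": "Bakery"}]}},
    {"venue": {"name": "Uncategorised Place",
               "location": {"lat": 28.70, "lng": 77.31},
               "categories": []}},
]}]}}

rows = extract_venues(sample, "Ashok Vihar", 28.69037, 77.17609)
# Assumed error-shaped payload with no "response" key
empty = extract_venues({"meta": {"code": 429}}, "Ashok Vihar", 28.69037, 77.17609)
```

Inside the loop, `venues.extend(extract_venues(requests.get(url).json(), neighborhood, lat, long))` would take the place of the direct indexing.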


In [31]:
# convert the venues list into a new DataFrame
venues_df = pd.DataFrame(venues)

# define the column names
venues_df.columns = ['Neighbourhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head()

(2765, 7)


Unnamed: 0,Neighbourhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,Ashok Nagar (Delhi),28.69223,77.30124,Sutta Chowk,28.697897,77.30001,Smoke Shop
1,Ashok Nagar (Delhi),28.69223,77.30124,AFM PVT LTD,28.70477,77.309608,Tourist Information Center
2,Ashok Nagar (Delhi),28.69223,77.30124,yamuna vihar,28.689816,77.283876,Park
3,Ashok Nagar (Delhi),28.69223,77.30124,Shivaji park,28.682657,77.285503,Park
4,Ashok Nagar (Delhi),28.69223,77.30124,Mansarover Park Metro Station,28.67537,77.300932,Train Station


In [32]:
venues_df.groupby(["Neighbourhood"]).count()

Unnamed: 0_level_0,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
Neighbourhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Ashok Nagar (Delhi),5,5,5,5,5,5
Ashok Vihar,24,24,24,24,24,24
Ashram Chowk,5,5,5,5,5,5
Babarpur,5,5,5,5,5,5
"Badarpur, Delhi",5,5,5,5,5,5
Bali Nagar,56,56,56,56,56,56
Bawana,2,2,2,2,2,2
Ber Sarai,94,94,94,94,94,94
Bhajanpura,5,5,5,5,5,5
Chanakyapuri,75,75,75,75,75,75


In [33]:
print('There are {} unique categories.'.format(len(venues_df['VenueCategory'].unique())))

There are 183 unique categories.


In [34]:
# print out the list of categories
venues_df['VenueCategory'].unique()[:50]

array(['Smoke Shop', 'Tourist Information Center', 'Park',
       'Train Station', 'Athletics & Sports', 'Asian Restaurant',
       'Sandwich Place', 'Snack Place', 'Pizza Place',
       'Indian Restaurant', 'South Indian Restaurant', 'Department Store',
       'Fast Food Restaurant', 'Coffee Shop', 'Market', 'Dessert Shop',
       'Basketball Court', 'Light Rail Station', 'Restaurant',
       'Vegetarian / Vegan Restaurant', 'ATM', 'Indian Sweet Shop',
       'Café', 'American Restaurant', 'Donut Shop', 'Bakery', 'Diner',
       'Hookah Bar', 'BBQ Joint', 'Hotel', 'Pub', 'Sports Bar',
       'Garden Center', 'Multiplex', 'Shopping Mall',
       'Furniture / Home Store', 'Fried Chicken Joint', 'Garden', 'Bar',
       'Chinese Restaurant', 'Burger Joint', 'Gourmet Shop',
       'Convenience Store', 'Ice Cream Shop', 'Gym', 'Playground',
       'Electronics Store', 'Art Gallery', 'Mediterranean Restaurant',
       'Tea Room'], dtype=object)

In [85]:
# check if the results contain "Bakery"
"Bakery" in venues_df['VenueCategory'].unique()

True

# One hot encoding the venues

In [77]:
# one hot encoding
delhi_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
delhi_onehot['Neighbourhoods'] = venues_df['Neighbourhood'] 

print(delhi_onehot.shape)
delhi_onehot.head()

(2765, 184)


Unnamed: 0,ATM,Airport,Airport Food Court,American Restaurant,Antique Shop,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Australian Restaurant,Auto Workshop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Basketball Court,Beer Garden,Bengali Restaurant,Big Box Store,Bistro,Bookstore,Boutique,Breakfast Spot,Burger Joint,Burmese Restaurant,Bus Station,Business Service,Cafeteria,Café,Campground,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,Comfort Food Restaurant,Concert Hall,Convenience Store,Cosmetics Shop,Cricket Ground,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,English Restaurant,Event Space,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Flea Market,Food,Food & Drink Shop,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gaming Cafe,Garden,Garden Center,Gastropub,Gift Shop,Golf Course,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Hardware Store,High School,Historic Site,History Museum,Hobby Shop,Hockey Arena,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotel Pool,IT Services,Ice Cream Shop,Indian Chinese Restaurant,Indian Restaurant,Indian Sweet Shop,Indie Movie Theater,Irani Cafe,Italian Restaurant,Japanese Restaurant,Jazz Club,Karaoke Bar,Karnataka Restaurant,Korean Restaurant,Lake,Light Rail Station,Lounge,Market,Mediterranean Restaurant,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Moroccan Restaurant,Mosque,Motel,Motorcycle Shop,Movie Theater,Moving Target,Mughlai Restaurant,Multicuisine Indian Restaurant,Multiplex,Museum,Music Store,Neighborhood,Nightclub,Nightlife Spot,North Indian Restaurant,Northeast Indian Restaurant,Office,Other Great Outdoors,Other Nightlife,Paper / Office Supplies Store,Park,Performing Arts Venue,Pizza Place,Planetarium,Playground,Plaza,Pool,Portuguese Restaurant,Pub,Public Art,Racetrack,Restaurant,River,Road,Salad Place,Salon / Barbershop,Sandwich Place,Scandinavian Restaurant,Sculpture Garden,Seafood Restaurant,Shoe Store,Shop & Service,Shopping Mall,Smoke Shop,Snack Place,Soccer Stadium,South Indian Restaurant,Spa,Spiritual Center,Sporting Goods Shop,Sports Bar,Stadium,Tapas Restaurant,Tea Room,Temple,Tex-Mex Restaurant,Thai Restaurant,Theater,Tibetan Restaurant,Tourist Information Center,Toy / Game Store,Track,Trail,Train Station,Turkish Restaurant,University,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Zoo,Neighbourhoods
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Ashok Nagar (Delhi)
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,Ashok Nagar (Delhi)
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Ashok Nagar (Delhi)
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Ashok Nagar (Delhi)
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,Ashok Nagar (Delhi)


In [92]:
del_grouped = delhi_onehot.groupby(["Neighbourhoods"]).mean().reset_index()
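
Averaging the one-hot columns per neighbourhood gives, for each category, the fraction of that neighbourhood's venues that fall in the category. A pandas-free sketch of the same arithmetic, offered only as a cross-check on what the `Bakery` column means. The helper name `category_shares` and the venue list are made up for illustration.

```python
from collections import Counter, defaultdict


def category_shares(venue_rows):
    """Map each neighbourhood to {category: share of its venues}."""
    counts = defaultdict(Counter)
    for neighbourhood, category in venue_rows:
        counts[neighbourhood][category] += 1
    return {
        nbhd: {cat: n / sum(cats.values()) for cat, n in cats.items()}
        for nbhd, cats in counts.items()
    }


# Made-up venue list for illustration only
venue_rows = [
    ("A", "Bakery"), ("A", "Park"), ("A", "Park"), ("A", "Café"),
    ("B", "Park"), ("B", "Hotel"),
]
shares = category_shares(venue_rows)
bakery_share_a = shares["A"].get("Bakery", 0.0)  # 1 of 4 venues is a bakery
bakery_share_b = shares["B"].get("Bakery", 0.0)  # no bakeries listed
```

A value such as 0.02 in the `Bakery` column therefore reads as "2% of the venues returned for this neighbourhood are bakeries", not as a count of bakeries.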

In [94]:
del_grouped.head()
print(del_grouped.shape)

(59, 184)


In [97]:
len(del_grouped[del_grouped["Bakery"] > 0])

27

# Making a dataframe of the bakeries

In [99]:
del_bake = del_grouped[["Neighbourhoods","Bakery"]]

In [100]:
del_bake.head()

Unnamed: 0,Neighbourhoods,Bakery
0,Ashok Nagar (Delhi),0.0
1,Ashok Vihar,0.0
2,Ashram Chowk,0.0
3,Babarpur,0.0
4,"Badarpur, Delhi",0.0


# Clustering the neighbourhoods

In [101]:
# set number of clusters
kclusters = 3

del_clustering = del_bake.drop(columns=["Neighbourhoods"])

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(del_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([1, 1, 1, 1, 1, 0, 1, 0, 1, 1], dtype=int32)
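
Because only the `Bakery` column is clustered, k-means here amounts to splitting the neighbourhoods into ranges of bakery frequency. A minimal sketch of Lloyd's algorithm on scalar values to illustrate the idea. It is not scikit-learn's implementation; the evenly spaced starting centroids and the function name `kmeans_1d` are assumptions made for this example, and the label numbers it produces are arbitrary and need not match those from `KMeans`.

```python
def kmeans_1d(values, k, iterations=100):
    """Plain Lloyd's algorithm on scalars (requires k >= 2); returns (labels, centroids)."""
    lo, hi = min(values), max(values)
    # Assumption: start centroids evenly spaced between the min and max value
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest centroid for every value
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: move each centroid to the mean of its members
        new_centroids = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centroids.append(sum(members) / len(members) if members else centroids[c])
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return labels, centroids


# A few Bakery frequencies copied from the cluster tables in this notebook
bakery = [0.0, 0.0, 0.0, 0.03, 0.035714, 0.04, 0.090909]
labels, centroids = kmeans_1d(bakery, k=3)
```

On this sample the zero-frequency values, the values around 0.03 to 0.04, and the single value near 0.09 end up in three separate groups.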

In [102]:
# copy the bakery dataframe so the cluster label can be added for each neighbourhood
del_merged = del_bake.copy()

# add clustering labels
del_merged["Cluster Labels"] = kmeans.labels_

In [103]:
del_merged.rename(columns={"Neighbourhoods": "Neighbourhood"}, inplace=True)
del_merged.head()

Unnamed: 0,Neighbourhood,Bakery,Cluster Labels
0,Ashok Nagar (Delhi),0.0,1
1,Ashok Vihar,0.0,1
2,Ashram Chowk,0.0,1
3,Babarpur,0.0,1
4,"Badarpur, Delhi",0.0,1


In [104]:
# join del_merged with df to add latitude/longitude for each neighbourhood
del_merged = del_merged.join(df.set_index("Neighbourhood"), on="Neighbourhood")

print(del_merged.shape)
del_merged.head() # check the last columns!

(59, 5)


Unnamed: 0,Neighbourhood,Bakery,Cluster Labels,Latitude,Longitude
0,Ashok Nagar (Delhi),0.0,1,28.69223,77.30124
1,Ashok Vihar,0.0,1,28.69037,77.17609
2,Ashram Chowk,0.0,1,28.710598,77.326965
3,Babarpur,0.0,1,28.50738,77.30346
4,"Badarpur, Delhi",0.0,1,28.50738,77.30346


# Arranging the data according to the cluster group

In [105]:
# sort the results by Cluster Labels
print(del_merged.shape)
del_merged.sort_values(["Cluster Labels"], inplace=True)
del_merged

(59, 5)


Unnamed: 0,Neighbourhood,Bakery,Cluster Labels,Latitude,Longitude
29,Gole Market,0.02,0,28.6341,77.20569
35,Gulmohar Park,0.02,0,28.55439,77.21252
57,Lodhi Colony,0.03,0,28.58476,77.22534
36,Hauz Khas,0.02,0,28.55109,77.20399
25,East Patel Nagar,0.040816,0,28.64817,77.17833
39,Jangpura,0.038961,0,28.5834,77.24719
40,Jia Sarai,0.020408,0,28.54638,77.18897
42,Kailash Colony,0.04,0,28.55613,77.2406
20,Derawal Nagar,0.035714,0,28.69911,77.19105
43,"Kamla Nagar, New Delhi",0.026316,0,28.68376,77.20163


# Creating a map of Delhi to visualise the various clusters and finding the place with the lowest number of bakeries

In [106]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters (one color per cluster)
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(del_merged['Latitude'], del_merged['Longitude'], del_merged['Neighbourhood'], del_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

## Cluster:0

In [108]:
del_merged.loc[del_merged['Cluster Labels'] == 0]

Unnamed: 0,Neighbourhood,Bakery,Cluster Labels,Latitude,Longitude
29,Gole Market,0.02,0,28.6341,77.20569
35,Gulmohar Park,0.02,0,28.55439,77.21252
57,Lodhi Colony,0.03,0,28.58476,77.22534
36,Hauz Khas,0.02,0,28.55109,77.20399
25,East Patel Nagar,0.040816,0,28.64817,77.17833
39,Jangpura,0.038961,0,28.5834,77.24719
40,Jia Sarai,0.020408,0,28.54638,77.18897
42,Kailash Colony,0.04,0,28.55613,77.2406
20,Derawal Nagar,0.035714,0,28.69911,77.19105
43,"Kamla Nagar, New Delhi",0.026316,0,28.68376,77.20163


## Cluster:1

In [109]:
del_merged.loc[del_merged['Cluster Labels'] == 1]

Unnamed: 0,Neighbourhood,Bakery,Cluster Labels,Latitude,Longitude
34,Gulabi Bagh,0.0,1,28.62043,77.04941
56,Laxmibai Nagar,0.0,1,28.57815,77.20618
55,Laxmi Nagar (Delhi),0.0,1,28.63875,77.27592
37,Inder Puri,0.0,1,28.62803,77.14504
46,Keshav Puram,0.0,1,28.68801,77.15866
41,"Kabir Nagar, New Delhi",0.0,1,28.689611,77.141052
49,Khari Baoli,0.0,1,28.65726,77.22284
48,"Khanpur, Delhi",0.0,1,28.50963,77.23108
47,"Khaira, Delhi",0.0,1,28.5941,76.97036
38,Janakpuri,0.0,1,28.62791,77.0906


## Cluster:2

In [110]:
del_merged.loc[del_merged['Cluster Labels'] == 2]

Unnamed: 0,Neighbourhood,Bakery,Cluster Labels,Latitude,Longitude
50,Kingsway Camp,0.090909,2,28.71169,77.20197


### Observations:
Bakery concentration, measured as the share of a neighbourhood's Foursquare venues that are bakeries (the `Bakery` column), is highest in cluster 2, moderate in cluster 0, and zero in the cluster 1 neighbourhoods listed above. The highest concentration lies in northern Delhi: the cluster 2 table lists a single neighbourhood, Kingsway Camp, where about 9% of the venues are bakeries. Cluster 1 therefore offers the best opportunity for a new bakery, since there is little to no competition from existing ones.
Bakeries in cluster 2, by contrast, likely face intense competition because of the high concentration there.
This project therefore recommends that bakery owners open new bakeries in cluster 1 neighbourhoods.
In cluster 0 neighbourhoods, where competition is moderate, a new bakery can still stand out if it offers items that existing bakeries do not.
Lastly, bakery owners are advised to avoid cluster 2, which already has a high concentration of bakeries and correspondingly intense competition.
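
The ranking behind these observations can be reproduced by averaging the `Bakery` value per cluster. A small sketch using a few (cluster, frequency) pairs copied from the cluster tables above; the helper name `mean_by_cluster` is hypothetical, and the sample is a subset of the rows, not the full dataframe.

```python
from collections import defaultdict


def mean_by_cluster(rows):
    """Average the bakery frequency per cluster label."""
    grouped = defaultdict(list)
    for label, freq in rows:
        grouped[label].append(freq)
    return {label: sum(v) / len(v) for label, v in grouped.items()}


# (cluster label, Bakery frequency) pairs copied from the cluster tables above
rows = [
    (0, 0.02), (0, 0.03), (0, 0.04),
    (1, 0.0), (1, 0.0), (1, 0.0),
    (2, 0.090909),
]
means = mean_by_cluster(rows)
ranked = sorted(means, key=means.get)  # lowest bakery share first
```

On the full dataframe, `del_merged.groupby("Cluster Labels")["Bakery"].mean()` should give the corresponding per-cluster averages.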