# IBM Applied Data Science Capstone Course

###### Opening new Ice Cream Shop in Ahmedabad, India

Build a dataframe of neighborhoods in Ahmedabad, India by web scraping the data from Wikipedia page                                                 
Get the geographical coordinates of the neighborhoods                                                                                               
Obtain the venue data for the neighborhoods from Foursquare API                                                                                     
Explore and cluster the neighborhoods                                                                                                               
Select the best cluster to open a new ice cream shop                                                                                                 

###  1. Import libraries

In [43]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analsysis
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

import json # library to handle JSON files

!pip -q install geocoder
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import geocoder # to get coordinates

import requests # library to handle requests
from bs4 import BeautifulSoup # library to parse HTML and XML documents

from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!pip -q install folium
import folium 

print("Libraries imported.")

Libraries imported.


### 2. Scrap data from Wikipedia page into a DataFrame

In [3]:
# send the GET request
data = requests.get("https://en.wikipedia.org/wiki/Category:Neighbourhoods_in_Ahmedabad").text

In [4]:
# parse data from the html into a beautifulsoup object
soup = BeautifulSoup(data, 'html.parser')

In [5]:
# create a list to store neighborhood data
neighborhoodList = []

In [6]:
# append the data into the list
for row in soup.find_all("div", class_="mw-category")[0].findAll("li"):
    neighborhoodList.append(row.text)

In [7]:
# create a new DataFrame from the list
amd_df = pd.DataFrame({"Neighborhood": neighborhoodList})

amd_df.head()

Unnamed: 0,Neighborhood
0,Agol
1,Ahmedabad Cantonment
2,Alam Roza
3,Ambawadi
4,Amraiwadi


In [44]:
# print the number of rows of the dataframe
amd_df.shape

(80, 3)

In [45]:
# define a function to get coordinates
def get_latlng(neighborhood):
    # initialize your variable to None
    lat_lng_coords = None
    # loop until you get the coordinates
    while(lat_lng_coords is None):
        g = geocoder.arcgis('{}, Ahmedabad, India'.format(neighborhood))
        lat_lng_coords = g.latlng
    return lat_lng_coords

In [49]:
# call the function to get the coordinates, store in a new list using list comprehension
coords = [ get_latlng(neighborhood) for neighborhood in amd_df["Neighborhood"].tolist() ]

In [50]:
coords

[[23.027760000000058, 72.60027000000008],
 [23.027760000000058, 72.60027000000008],
 [23.002120000000048, 72.54979000000003],
 [23.018850000000043, 72.55441000000008],
 [23.00735000000003, 72.62268000000006],
 [23.011390000000063, 72.51712000000003],
 [23.04708000000005, 72.60481000000004],
 [23.04225742945364, 72.60456625728018],
 [22.84128000000004, 72.45453000000003],
 [23.027760000000058, 72.60027000000008],
 [23.034760000000063, 72.63024000000007],
 [23.00278000000003, 72.57706000000007],
 [22.315900000000056, 72.10697000000005],
 [23.002575410797863, 72.59815911107509],
 [23.159320000000037, 72.01855000000006],
 [23.030320000000074, 72.47247000000004],
 [23.000980000000027, 72.57459000000006],
 [22.806890000000067, 72.42511000000007],
 [23.112140000000068, 72.57989000000003],
 [23.087290000000053, 72.54899000000006],
 [23.027760000000058, 72.60027000000008],
 [23.036070000000052, 72.59213000000005],
 [23.32218000000006, 72.18817000000007],
 [23.022390333701104, 72.57669435394357]

In [51]:
# create temporary dataframe to populate the coordinates into Latitude and Longitude
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])

In [52]:
# merge the coordinates into the original dataframe
amd_df['Latitude'] = df_coords['Latitude']
amd_df['Longitude'] = df_coords['Longitude']

In [53]:
#check the neighborhoods and the coordinates
print(amd_df.shape)
amd_df

(80, 3)


Unnamed: 0,Neighborhood,Latitude,Longitude
0,Agol,23.02776,72.60027
1,Ahmedabad Cantonment,23.02776,72.60027
2,Alam Roza,23.00212,72.54979
3,Ambawadi,23.01885,72.55441
4,Amraiwadi,23.00735,72.62268
5,Anand Nagar (Ahmedabad),23.01139,72.51712
6,Asarwa,23.04708,72.60481
7,Asarwa Chakla,23.042257,72.604566
8,Badarkha,22.84128,72.45453
9,Bahiyal,23.02776,72.60027


In [54]:
# save the DataFrame as CSV file
amd_df.to_csv("amd_df.csv", index=False)

### 4. Create a map of Ahmedabad with neighborhoods superimposed on top

In [19]:
#get the coordinates of Ahmedabad
address = 'Ahmedabad, India'

geolocator = Nominatim(user_agent="my-application")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Ahmedabad, India {}, {}.'.format(latitude, longitude))


The geograpical coordinate of Ahmedabad, India 23.0216238, 72.5797068.


In [55]:
# create map of Ahmedabad using latitude and longitude values
map_amd = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, neighborhood in zip(amd_df['Latitude'], amd_df['Longitude'], amd_df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_amd)  
    
map_amd

In [56]:
# save the map as HTML file
map_amd.save('map_amd.html')

### 5. Use the Foursquare API to explore the neighborhoods

In [57]:
# define Foursquare Credentials and Version
CLIENT_ID = '5MFLU0WNTTUADD2CCKFB3VM5XZ3W5JTZV3Q3XII4N3U2KNJ3' # your Foursquare ID
CLIENT_SECRET = '0VISW3PZWON0JCJ3ENNITBBJEWVBG4AID5KUZVDCE4ZJZ2AP' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)


Your credentails:
CLIENT_ID: 5MFLU0WNTTUADD2CCKFB3VM5XZ3W5JTZV3Q3XII4N3U2KNJ3
CLIENT_SECRET:0VISW3PZWON0JCJ3ENNITBBJEWVBG4AID5KUZVDCE4ZJZ2AP


#### Now, let's get the top 100 venues that are within a radius of 2000 meters.

In [58]:
radius = 2000
LIMIT = 100

venues = []

for lat, long, neighborhood in zip(amd_df['Latitude'], amd_df['Longitude'], amd_df['Neighborhood']):
    
    # create the API request URL
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    # make the GET request
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    # return only relevant information for each nearby venue
    for venue in results:
        venues.append((
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))

In [59]:
# convert the venues list into a new DataFrame
venues_df = pd.DataFrame(venues)

# define the column names
venues_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head()

(1724, 7)


Unnamed: 0,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,Agol,23.02776,72.60027,Manek Chowk Khau Gali,23.023505,72.588539,Snack Place
1,Agol,23.02776,72.60027,Manek Chowk,23.023626,72.588553,Fast Food Restaurant
2,Agol,23.02776,72.60027,Moti Mahal,23.02912,72.599724,Indian Restaurant
3,Agol,23.02776,72.60027,Lucky Tea,23.027829,72.581394,Tea Room
4,Agol,23.02776,72.60027,Jama Masjid,23.024323,72.587042,Historic Site


In [60]:
venues_df.groupby(["Neighborhood"]).count()

Unnamed: 0_level_0,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Agol,22,22,22,22,22,22
Ahmedabad Cantonment,22,22,22,22,22,22
Alam Roza,9,9,9,9,9,9
Ambawadi,80,80,80,80,80,80
Amraiwadi,4,4,4,4,4,4
Anand Nagar (Ahmedabad),56,56,56,56,56,56
Asarwa,5,5,5,5,5,5
Asarwa Chakla,8,8,8,8,8,8
Bahiyal,22,22,22,22,22,22
Bapunagar,4,4,4,4,4,4


#### Let's find out how many unique categories can be curated from all the returned venues

In [26]:
print('There are {} uniques categories.'.format(len(venues_df['VenueCategory'].unique())))

There are 103 uniques categories.


In [27]:
#print out the list of categories
venues_df['VenueCategory'].unique()[:50]


array(['Snack Place', 'Fast Food Restaurant', 'Indian Restaurant',
       'Tea Room', 'Historic Site', 'Hotel', 'Ice Cream Shop',
       'Pizza Place', 'Multiplex', 'Train Station',
       'Vegetarian / Vegan Restaurant', "Men's Store", 'Shopping Mall',
       'Clothing Store', 'Bus Station', 'Sports Club', 'Diner',
       'Coffee Shop', 'Sandwich Place', 'Mexican Restaurant', 'Café',
       'Park', 'Dessert Shop', 'Bookstore', 'Arts & Crafts Store',
       'Farmers Market', 'Theater', 'Restaurant', 'Breakfast Spot',
       'Arcade', 'Asian Restaurant', 'Gym / Fitness Center', 'Pharmacy',
       'Food Truck', 'Movie Theater', 'BBQ Joint', 'Chinese Restaurant',
       'Event Space', 'American Restaurant', 'Bakery',
       'Electronics Store', 'Athletics & Sports', 'Art Gallery',
       'History Museum', 'Market', 'Museum', 'Gourmet Shop', 'Zoo',
       'Lake', 'Comfort Food Restaurant'], dtype=object)

In [28]:
# check if the results contain "Ice Cream Shop"
"Ice Cream Shop" in venues_df['VenueCategory'].unique()

True

### 6. Analyze Each Neighborhood

In [29]:
# one hot encoding
amd_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
amd_onehot['Neighborhoods'] = venues_df['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [amd_onehot.columns[-1]] + list(amd_onehot.columns[:-1])
amd_onehot = amd_onehot[fixed_columns]

print(amd_onehot.shape)
amd_onehot.head()


(1724, 104)


Unnamed: 0,Neighborhoods,ATM,Airport,Airport Service,American Restaurant,Arcade,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Garage,BBQ Joint,Bakery,Bookstore,Breakfast Spot,Buffet,Bus Station,Business Service,Café,Chinese Restaurant,Clothing Store,Coffee Shop,Comfort Food Restaurant,Construction & Landscaping,Cricket Ground,Cupcake Shop,Department Store,Dessert Shop,Diner,Donut Shop,Electronics Store,Event Space,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,Fried Chicken Joint,Furniture / Home Store,Garden,General Entertainment,Golf Course,Gourmet Shop,Gym,Gym / Fitness Center,Historic Site,History Museum,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Italian Restaurant,Juice Bar,Lake,Lounge,Market,Mattress Store,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Moroccan Restaurant,Movie Theater,Moving Target,Multicuisine Indian Restaurant,Multiplex,Museum,North Indian Restaurant,Park,Performing Arts Venue,Pharmacy,Pizza Place,Platform,Recreation Center,Rest Area,Restaurant,River,Sandwich Place,Sculpture Garden,Shoe Store,Shopping Mall,Ski Area,Smoke Shop,Snack Place,South Indian Restaurant,Spa,Speakeasy,Sports Club,Street Food Gathering,Supermarket,Tea Room,Theater,Toy / Game Store,Train Station,Tree,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Zoo
0,Agol,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Agol,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Agol,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Agol,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0
4,Agol,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


#### Next, let's group rows by neighborhood and by taking the mean of the frequency of occurrence of each category

In [30]:
amd_grouped = amd_onehot.groupby(["Neighborhoods"]).mean().reset_index()

print(amd_grouped.shape)
amd_grouped

(73, 104)


Unnamed: 0,Neighborhoods,ATM,Airport,Airport Service,American Restaurant,Arcade,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Garage,BBQ Joint,Bakery,Bookstore,Breakfast Spot,Buffet,Bus Station,Business Service,Café,Chinese Restaurant,Clothing Store,Coffee Shop,Comfort Food Restaurant,Construction & Landscaping,Cricket Ground,Cupcake Shop,Department Store,Dessert Shop,Diner,Donut Shop,Electronics Store,Event Space,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,Fried Chicken Joint,Furniture / Home Store,Garden,General Entertainment,Golf Course,Gourmet Shop,Gym,Gym / Fitness Center,Historic Site,History Museum,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Italian Restaurant,Juice Bar,Lake,Lounge,Market,Mattress Store,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Moroccan Restaurant,Movie Theater,Moving Target,Multicuisine Indian Restaurant,Multiplex,Museum,North Indian Restaurant,Park,Performing Arts Venue,Pharmacy,Pizza Place,Platform,Recreation Center,Rest Area,Restaurant,River,Sandwich Place,Sculpture Garden,Shoe Store,Shopping Mall,Ski Area,Smoke Shop,Snack Place,South Indian Restaurant,Spa,Speakeasy,Sports Club,Street Food Gathering,Supermarket,Tea Room,Theater,Toy / Game Store,Train Station,Tree,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Zoo
0,Agol,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.045455,0.0,0.0,0.045455,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.045455,0.0,0.0,0.0,0.045455,0.0,0.0,0.045455,0.0,0.0,0.090909,0.0,0.045455,0.0,0.0,0.0
1,Ahmedabad Cantonment,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.045455,0.0,0.0,0.045455,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.045455,0.0,0.0,0.0,0.045455,0.0,0.0,0.045455,0.0,0.0,0.090909,0.0,0.045455,0.0,0.0,0.0
2,Alam Roza,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.222222,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0
3,Ambawadi,0.0,0.0,0.0,0.0,0.0125,0.0,0.0125,0.0125,0.0,0.0,0.0,0.0,0.0125,0.0125,0.0,0.0,0.0,0.1375,0.0,0.0125,0.05,0.0,0.0,0.0,0.0,0.0,0.05,0.0125,0.0,0.0,0.0,0.0,0.0,0.0125,0.075,0.0,0.0,0.0,0.0,0.0125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0125,0.0,0.0,0.0625,0.0,0.0,0.0125,0.15,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0375,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.025,0.0,0.0125,0.0375,0.0,0.0,0.0,0.0125,0.0,0.05,0.0,0.0,0.025,0.0,0.0,0.0375,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.025,0.0,0.0,0.0,0.025,0.0,0.0,0.0
4,Amraiwadi,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0
5,Anand Nagar (Ahmedabad),0.0,0.0,0.0,0.017857,0.0,0.017857,0.0,0.0,0.017857,0.0,0.053571,0.017857,0.0,0.017857,0.0,0.0,0.0,0.107143,0.017857,0.017857,0.053571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017857,0.017857,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.017857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.017857,0.107143,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017857,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.053571,0.0,0.0,0.0,0.053571,0.0,0.053571,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017857,0.0,0.0,0.0,0.0,0.053571,0.0,0.0,0.0
6,Asarwa,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Asarwa Chakla,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.25,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0
8,Bahiyal,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.045455,0.0,0.0,0.045455,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.045455,0.0,0.0,0.0,0.045455,0.0,0.0,0.045455,0.0,0.0,0.090909,0.0,0.045455,0.0,0.0,0.0
9,Bapunagar,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [31]:
len(amd_grouped[amd_grouped["Ice Cream Shop"] > 0])

41

#### Create a new DataFrame for Ice Cream Shop data only

In [32]:
amd_mall = amd_grouped[["Neighborhoods","Ice Cream Shop"]]
amd_mall.head()

Unnamed: 0,Neighborhoods,Ice Cream Shop
0,Agol,0.045455
1,Ahmedabad Cantonment,0.045455
2,Alam Roza,0.222222
3,Ambawadi,0.0125
4,Amraiwadi,0.0


### 7. Cluster Neighborhoods

Run k-means to cluster the neighborhoods in Ahmedabad into 3 clusters.

In [33]:
# set number of clusters
kclusters = 3

amd_clustering = amd_mall.drop(["Neighborhoods"], 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(amd_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([2, 2, 1, 0, 0, 0, 1, 2, 2, 0], dtype=int32)

In [34]:
# create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.
amd_merged = amd_mall.copy()

# add clustering labels
amd_merged["Cluster Labels"] = kmeans.labels_

In [35]:
amd_merged.rename(columns={"Neighborhoods": "Neighborhood"}, inplace=True)
amd_merged

Unnamed: 0,Neighborhood,Ice Cream Shop,Cluster Labels
0,Agol,0.045455,2
1,Ahmedabad Cantonment,0.045455,2
2,Alam Roza,0.222222,1
3,Ambawadi,0.0125,0
4,Amraiwadi,0.0,0
5,Anand Nagar (Ahmedabad),0.017857,0
6,Asarwa,0.2,1
7,Asarwa Chakla,0.125,2
8,Bahiyal,0.045455,2
9,Bapunagar,0.0,0


In [36]:
amd_merged = amd_merged.join(amd_df.set_index("Neighborhood"), on="Neighborhood")

print(amd_merged.shape)
amd_merged.head() # check the last columns!

(73, 5)


Unnamed: 0,Neighborhood,Ice Cream Shop,Cluster Labels,Latitude,Longitude
0,Agol,0.045455,2,23.02776,72.60027
1,Ahmedabad Cantonment,0.045455,2,23.02776,72.60027
2,Alam Roza,0.222222,1,23.00212,72.54979
3,Ambawadi,0.0125,0,23.01885,72.55441
4,Amraiwadi,0.0,0,23.00735,72.62268


In [37]:
# sort the results by Cluster Labels
print(amd_merged.shape)
amd_merged.sort_values(["Cluster Labels"], inplace=True)
amd_merged

(73, 5)


Unnamed: 0,Neighborhood,Ice Cream Shop,Cluster Labels,Latitude,Longitude
36,Kalyanpura (Ahmedabad),0.025974,0,23.04764,72.56149
31,Jivrajpark,0.0,0,23.0061,72.53149
32,"Jodhpur, Gujarat",0.0,0,23.02063,72.52522
33,Juhapura,0.0,0,22.99862,72.52433
34,Kabirchowk,0.0,0,23.090257,72.585512
71,Vastrapur,0.02,0,23.03717,72.53085
37,"Khadia, Ahmedabad",0.0,0,23.02077,72.59244
40,Khokhra,0.0,0,23.00581,72.613334
41,Lambha,0.0,0,22.93802,72.58586
43,Maninagar,0.0,0,23.00526,72.60731


#### Vizualise Resulting Cluster

In [38]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(amd_merged['Latitude'], amd_merged['Longitude'], amd_merged['Ice Cream Shop'], amd_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [39]:
# save the map as HTML file
map_clusters.save('map_clusters.html')

### 8. Examine Clusters

#### Cluster 0

In [40]:
amd_merged.loc[amd_merged['Cluster Labels'] == 0]

Unnamed: 0,Neighborhood,Ice Cream Shop,Cluster Labels,Latitude,Longitude
36,Kalyanpura (Ahmedabad),0.025974,0,23.04764,72.56149
31,Jivrajpark,0.0,0,23.0061,72.53149
32,"Jodhpur, Gujarat",0.0,0,23.02063,72.52522
33,Juhapura,0.0,0,22.99862,72.52433
34,Kabirchowk,0.0,0,23.090257,72.585512
71,Vastrapur,0.02,0,23.03717,72.53085
37,"Khadia, Ahmedabad",0.0,0,23.02077,72.59244
40,Khokhra,0.0,0,23.00581,72.613334
41,Lambha,0.0,0,22.93802,72.58586
43,Maninagar,0.0,0,23.00526,72.60731


#### cluster 1

In [41]:
amd_merged.loc[amd_merged['Cluster Labels'] == 1]

Unnamed: 0,Neighborhood,Ice Cream Shop,Cluster Labels,Latitude,Longitude
39,Khodiyarnagar,0.25,1,23.03435,72.64652
66,Thakkar Bapanagar,0.25,1,23.04537,72.64864
2,Alam Roza,0.222222,1,23.00212,72.54979
16,Chandlodiya,0.25,1,23.08729,72.54899
6,Asarwa,0.2,1,23.04708,72.60481


#### Cluster 2

In [42]:
amd_merged.loc[amd_merged['Cluster Labels'] == 2]

Unnamed: 0,Neighborhood,Ice Cream Shop,Cluster Labels,Latitude,Longitude
65,Subhash Bridge,0.058824,2,23.059424,72.586788
67,Thaltej,0.039474,2,23.05011,72.51108
63,Shardanagar,0.047619,2,23.01073,72.55525
68,Ujedia,0.045455,2,23.02776,72.60027
62,"Shahpur, Gujarat",0.042553,2,23.0357,72.58116
30,Jholapur,0.045455,2,23.02776,72.60027
69,Usmanpura,0.052632,2,23.04981,72.5712
59,Sardarnagar,0.1,2,23.08104,72.62806
1,Ahmedabad Cantonment,0.045455,2,23.02776,72.60027
8,Bahiyal,0.045455,2,23.02776,72.60027


In [None]:
#### Observations :

Most of the ice cream shops are concentrated in the area with dense population of Ahmedabad city, with the highest number in cluster 2 and moderate number in cluster 0. On the other hand, cluster 1 has very low number to very less ice cream shops in the neighborhoods. This represents a great opportunity and high potential areas to open new shopping malls as there is very little to no competition from existing malls. Meanwhile, ice cream shops in cluster 2 are likely suffering from intense competition due to oversupply and high concentration of shops. From another perspective, this also shows that the oversupply of ice cream shops mostly happened in the central area of the city, with the suburb area still have very few ice cream shops. Therefore, this project recommends dairy owners to capitalize on these findings to open new ice cream parlours in neighborhoods in cluster 1 with little to no competition. Owners with unique selling propositions to stand out from the competition can also open new parlours in neighborhoods in cluster 0 with moderate competition. Lastly, dairyowners or stakeholders are advised to avoid neighborhoods in cluster 2 which already have high concentration of ice cream shops/parlours and suffering from intense competition.