<h1>IBM Applied Data Science Capstone Course by Coursera</h1>

<h2>Week 5 Final Report</h2>

<h3>Recommending Neighbourhood Location to open a new Indian Restaurant in Ahmedabad City, India</h3>

<ul style="list-style-type:disc;">
  <li>Build a dataframe of neighborhoods in Ahmedabad, India by scraping the data from a Wikipedia page</li>
  <li>Get the geographical coordinates of the neighborhoods</li>
  <li>Obtain the venue data for the neighborhoods from the Foursquare API</li>
  <li>Explore and cluster the neighborhoods</li>
  <li>Select the best cluster to open a new Indian Restaurant</li>
</ul>


<h3>1. Import Libraries</h3>

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import geocoder # to get coordinates

import requests # library to handle requests
from bs4 import BeautifulSoup # library to parse HTML and XML documents

from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import, pandas 1.0+)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

print("Libraries imported.")

Libraries imported.


<h3>2. Scrape data from the Wikipedia page into a DataFrame</h3>

In [2]:
# send the GET request
data = requests.get("https://en.wikipedia.org/wiki/Category:Neighbourhoods_in_Ahmedabad").text

In [3]:
# parse data from the html into a beautifulsoup object
soup = BeautifulSoup(data, 'html.parser')

In [4]:
# create a list to store neighborhood data
neighborhoodList = []

In [5]:
# append the data into the list
for row in soup.find_all("div", class_="mw-category")[0].find_all("li"):
    neighborhoodList.append(row.text)

In [6]:
# create a new DataFrame from the list
df = pd.DataFrame({"Neighbourhood": neighborhoodList})

df.head()

Unnamed: 0,Neighbourhood
0,Agol
1,Ahmedabad Cantonment
2,Alam Roza
3,Ambawadi
4,Amraiwadi


In [7]:
# print the number of rows of the dataframe
df.shape

(81, 1)

<h3>3. Get the geographical coordinates</h3>

<h4>3.1 Using the Google Geocoding API to get coordinates</h4>

In [8]:
API_KEY = ""

In [9]:
latitudes = [] # Initializing the latitude array
longitudes = [] # Initializing the longitude array

for nbd in df["Neighbourhood"] : 
    place_name = nbd + ",Ahmedabad,India" # Formats the place name
    
    url = 'https://maps.googleapis.com/maps/api/geocode/json?address={}&key={}'.format(place_name, API_KEY) # Gets the proper url to make the API call
    obj = json.loads(requests.get(url).text) # Loads the JSON file in the form of a python dictionary
    
    results = obj['results'] # Extracts the results information out of the JSON file
    lat = results[0]['geometry']['location']['lat'] # Extracts the latitude value
    lng = results[0]['geometry']['location']['lng'] # Extracts the longitude value
    
    latitudes.append(lat) # Appending to the list of latitudes
    longitudes.append(lng) # Appending to the list of longitudes
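
The loop above assumes that every request returns at least one result. A minimal sketch of a more defensive extraction step follows. `extract_lat_lng` is a hypothetical helper name, and the sample dictionaries are hand-written to mimic the `results[0]['geometry']['location']` layout used above together with the `status` field that Geocoding API responses carry; they are not real API output.

```python
def extract_lat_lng(obj):
    """Return (lat, lng) from a geocoding response dict, or None if nothing usable is present."""
    results = obj.get("results") or []
    # treat any non-OK status or an empty result list as "no coordinates"
    if obj.get("status") != "OK" or not results:
        return None
    location = results[0]["geometry"]["location"]
    return location["lat"], location["lng"]


# hand-written sample responses (illustrative only)
found = {
    "status": "OK",
    "results": [{"geometry": {"location": {"lat": 23.02237, "lng": 72.543044}}}],
}
missing = {"status": "ZERO_RESULTS", "results": []}

print(extract_lat_lng(found))
print(extract_lat_lng(missing))
```

Inside the loop, a `None` return could be appended as a missing value so that one unresolved neighbourhood does not stop the whole run.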

In [10]:
df['Latitude'] = latitudes
df['Longitude'] = longitudes

In [11]:
df.head()

Unnamed: 0,Neighbourhood,Latitude,Longitude
0,Agol,23.141419,72.273538
1,Ahmedabad Cantonment,23.063899,72.608736
2,Alam Roza,22.996181,72.588301
3,Ambawadi,23.02237,72.543044
4,Amraiwadi,22.999673,72.635381


In [12]:
col = 0
explored_lat_lng = []
for lat, lng, neighbourhood in zip(df['Latitude'], df['Longitude'], df['Neighbourhood']):
    if (lat, lng) in explored_lat_lng:
        col = col + 1
    else:
        explored_lat_lng.append((lat, lng))

print("Collisions : ", col)

Collisions :  3
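
A collision here means that a neighbourhood was geocoded to exactly the same point as an earlier one. The same count can be sketched with a set, which avoids scanning a list on every lookup. `count_collisions` is a hypothetical helper name, and the sample uses three coordinate pairs taken from the tables in this report rather than the full dataframe.

```python
def count_collisions(coords):
    """Count coordinate pairs that repeat a pair seen earlier in the sequence."""
    seen = set()
    collisions = 0
    for pair in coords:
        if pair in seen:
            collisions += 1
        else:
            seen.add(pair)
    return collisions


sample = [
    (23.048783, 72.608349),  # Asarwa
    (23.048783, 72.608349),  # Asarwa Chakla (same point as Asarwa)
    (23.02237, 72.543044),   # Ambawadi
]
print("Collisions :", count_collisions(sample))
```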


In [13]:
df.shape

(81, 3)

In [14]:
df.head()

Unnamed: 0,Neighbourhood,Latitude,Longitude
0,Agol,23.141419,72.273538
1,Ahmedabad Cantonment,23.063899,72.608736
2,Alam Roza,22.996181,72.588301
3,Ambawadi,23.02237,72.543044
4,Amraiwadi,22.999673,72.635381


<h3>4. Create a map of Ahmedabad with neighborhoods superimposed on top</h3>

In [15]:
address = 'Ahmedabad, India'

geolocator = Nominatim(user_agent="ahmedabad")
location = geolocator.geocode(address)
amd_lat = location.latitude
amd_lng = location.longitude
print('The geographical coordinates of Ahmedabad City are {}, {}.'.format(amd_lat, amd_lng))

The geographical coordinates of Ahmedabad City are 23.0216238, 72.5797068.


In [16]:
# create a map of Ahmedabad using latitude and longitude values
amd_map = folium.Map(location=[amd_lat, amd_lng], zoom_start=11)

# add markers to map
for lat, lng, label in zip(df['Latitude'], df['Longitude'], df['Neighbourhood']):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(amd_map)  
    
amd_map

In [17]:
# save the map as HTML file
amd_map.save('amd_map.html')

<h3>5. Use the Foursquare API to explore the neighborhoods</h3>

In [1]:
# define Foursquare Credentials and Version
CLIENT_ID = '' # your Foursquare ID
CLIENT_SECRET = '' # your Foursquare Secret
VERSION = '20200101' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 
CLIENT_SECRET:


<h6>Getting a list of venues within a 3000 m radius</h6>

In [19]:
radius = 3000
LIMIT = 100

venues = []

for lat, long, neighborhood in zip(df['Latitude'], df['Longitude'], df['Neighbourhood']):
    
    # create the API request URL
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    # make the GET request
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    # return only relevant information for each nearby venue
    for venue in results:
        venues.append((
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))
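
The cell above indexes straight into the response, so a failed request or a venue without categories would raise an error partway through the loop. The sketch below shows one way to flatten a response defensively. `flatten_venues` and the `"Unknown"` placeholder are assumptions introduced for illustration, and the sample payload is hand-written to follow the `response → groups → items → venue` layout used above.

```python
def flatten_venues(payload, neighborhood, lat, lng):
    """Turn one explore-style response dict into row tuples, tolerating missing fields."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories") or []
        # "Unknown" is an assumed placeholder for venues that list no category
        category = categories[0]["name"] if categories else "Unknown"
        rows.append((neighborhood, lat, lng, venue.get("name"),
                     location.get("lat"), location.get("lng"), category))
    return rows


# hand-written sample payload (illustrative only)
payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Shambhu's",
               "location": {"lat": 23.083308, "lng": 72.620098},
               "categories": [{"name": "Coffee Shop"}]}},
    {"venue": {"name": "Uncategorised venue",
               "location": {"lat": 23.07, "lng": 72.61},
               "categories": []}},
]}]}}

rows = flatten_venues(payload, "Ahmedabad Cantonment", 23.063899, 72.608736)
for row in rows:
    print(row)
```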

In [20]:
# convert the venues list into a new DataFrame
venues_df = pd.DataFrame(venues)

# define the column names
venues_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head()

(2810, 7)


Unnamed: 0,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,Ahmedabad Cantonment,23.063899,72.608736,Sardar Patel National Memorial,23.058293,72.590754,History Museum
1,Ahmedabad Cantonment,23.063899,72.608736,Sabarmati Ashram,23.060573,72.580854,History Museum
2,Ahmedabad Cantonment,23.063899,72.608736,Shambhu's,23.083308,72.620098,Coffee Shop
3,Ahmedabad Cantonment,23.063899,72.608736,Gandhji's Asharam museum,23.060345,72.580612,Sculpture Garden
4,Ahmedabad Cantonment,23.063899,72.608736,O2,23.071585,72.619483,Spa


<h6>Now, let's check how many venues were returned for each neighbourhood</h6>

In [21]:
venues_df.groupby(["Neighborhood"]).count()

Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
Ahmedabad Cantonment,30,30,30,30,30,30
Alam Roza,14,14,14,14,14,14
Ambawadi,100,100,100,100,100,100
Amraiwadi,4,4,4,4,4,4
Anand Nagar (Ahmedabad),15,15,15,15,15,15
Asarwa,25,25,25,25,25,25
Asarwa Chakla,25,25,25,25,25,25
Badarkha,1,1,1,1,1,1
Bahiyal,100,100,100,100,100,100
Bapunagar,4,4,4,4,4,4


<h6>Let's find out how many unique categories can be curated from all the returned venues</h6>

In [22]:
print('Found {} unique categories in total.'.format(len(venues_df['VenueCategory'].unique())))

Found 99 unique categories in total.


In [23]:
# print out the first 50 categories
venues_df['VenueCategory'].unique()[:50]

array(['History Museum', 'Coffee Shop', 'Sculpture Garden', 'Spa',
       'Speakeasy', 'Food Court', 'Pizza Place', 'Hotel Bar',
       'Athletics & Sports', 'Market', 'Airport Terminal', 'Hotel',
       'Tea Room', 'Accessories Store', 'Airport Service', 'Golf Course',
       'Bookstore', 'Ice Cream Shop', 'Airport Lounge', 'General Travel',
       'Fast Food Restaurant', 'Museum', 'Indian Restaurant',
       'Furniture / Home Store', 'Vegetarian / Vegan Restaurant',
       'Snack Place', 'Sandwich Place', 'Theater', 'Train Station', 'Zoo',
       'Multiplex', 'Shopping Mall', 'Bus Station', 'Dessert Shop',
       'Café', 'Mexican Restaurant', 'Diner', 'Clothing Store',
       'Art Gallery', 'Bakery', 'Restaurant', 'Breakfast Spot',
       'Cupcake Shop', 'Street Food Gathering', 'Farmers Market',
       'Moroccan Restaurant', 'Flower Shop', 'Arcade', 'Toy / Game Store',
       'Park'], dtype=object)

In [24]:
# check if the results contain "Indian Restaurant"
"Indian Restaurant" in venues_df['VenueCategory'].unique()

True

<h3>6. Analyze Each Neighborhood</h3>

In [25]:
# one hot encoding
amd_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
amd_onehot['Neighborhoods'] = venues_df['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [amd_onehot.columns[-1]] + list(amd_onehot.columns[:-1])
amd_onehot = amd_onehot[fixed_columns]

print(amd_onehot.shape)
amd_onehot.head()

(2810, 100)


Unnamed: 0,Neighborhoods,ATM,Accessories Store,Airport,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Arcade,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bookstore,Breakfast Spot,Bus Station,Café,Chinese Restaurant,Clothing Store,Coffee Shop,Cricket Ground,Cupcake Shop,Dance Studio,Department Store,Dessert Shop,Diner,Donut Shop,Electronics Store,Event Space,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Flower Shop,Food Court,Food Truck,Fried Chicken Joint,Furniture / Home Store,Garden,Gas Station,General Entertainment,General Travel,Golf Course,Gourmet Shop,Gujarati Restaurant,Gym,Gym / Fitness Center,Historic Site,History Museum,Hookah Bar,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Italian Restaurant,Lake,Lounge,Market,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Moroccan Restaurant,Movie Theater,Moving Target,Multicuisine Indian Restaurant,Multiplex,Museum,Noodle House,North Indian Restaurant,Other Nightlife,Park,Performing Arts Venue,Pet Store,Pharmacy,Pizza Place,Recreation Center,Restaurant,Sandwich Place,Sculpture Garden,Shopping Mall,Smoke Shop,Snack Place,South Indian Restaurant,Spa,Speakeasy,Sporting Goods Shop,Sports Club,Street Food Gathering,Tea Room,Theater,Toy / Game Store,Train Station,Vegetarian / Vegan Restaurant,Video Store,Zoo
0,Ahmedabad Cantonment,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Ahmedabad Cantonment,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Ahmedabad Cantonment,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Ahmedabad Cantonment,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Ahmedabad Cantonment,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0


<h6>Next, let's group the rows by neighborhood and take the mean of the frequency of occurrence of each category</h6>

In [26]:
amd_grouped = amd_onehot.groupby(["Neighborhoods"]).mean().reset_index()

print(amd_grouped.shape)
amd_grouped

(75, 100)


Unnamed: 0,Neighborhoods,ATM,Accessories Store,Airport,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Arcade,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bookstore,Breakfast Spot,Bus Station,Café,Chinese Restaurant,Clothing Store,Coffee Shop,Cricket Ground,Cupcake Shop,Dance Studio,Department Store,Dessert Shop,Diner,Donut Shop,Electronics Store,Event Space,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Flower Shop,Food Court,Food Truck,Fried Chicken Joint,Furniture / Home Store,Garden,Gas Station,General Entertainment,General Travel,Golf Course,Gourmet Shop,Gujarati Restaurant,Gym,Gym / Fitness Center,Historic Site,History Museum,Hookah Bar,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Italian Restaurant,Lake,Lounge,Market,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Moroccan Restaurant,Movie Theater,Moving Target,Multicuisine Indian Restaurant,Multiplex,Museum,Noodle House,North Indian Restaurant,Other Nightlife,Park,Performing Arts Venue,Pet Store,Pharmacy,Pizza Place,Recreation Center,Restaurant,Sandwich Place,Sculpture Garden,Shopping Mall,Smoke Shop,Snack Place,South Indian Restaurant,Spa,Speakeasy,Sporting Goods Shop,Sports Club,Street Food Gathering,Tea Room,Theater,Toy / Game Store,Train Station,Vegetarian / Vegan Restaurant,Video Store,Zoo
0,Ahmedabad Cantonment,0.0,0.033333,0.0,0.0,0.033333,0.033333,0.066667,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.033333,0.0,0.0,0.033333,0.0,0.0,0.0,0.033333,0.033333,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.033333,0.033333,0.0,0.033333,0.066667,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.033333,0.0,0.0,0.033333,0.0,0.033333,0.033333,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.033333,0.0,0.0
1,Alam Roza,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.285714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.071429,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.071429,0.0,0.0,0.071429
2,Ambawadi,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.02,0.01,0.01,0.0,0.18,0.0,0.03,0.02,0.0,0.01,0.0,0.0,0.05,0.02,0.0,0.0,0.0,0.01,0.0,0.01,0.07,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.11,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.04,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.05,0.0,0.05,0.03,0.0,0.02,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.04,0.01,0.01,0.0,0.01,0.0,0.0
3,Amraiwadi,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0
4,Anand Nagar (Ahmedabad),0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.066667,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.133333,0.066667,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.066667,0.0
5,Asarwa,0.0,0.0,0.04,0.04,0.0,0.0,0.08,0.0,0.0,0.0,0.04,0.0,0.04,0.0,0.04,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.04,0.0,0.0,0.08,0.12,0.0,0.0,0.0,0.0,0.04,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.04,0.0,0.0,0.0
6,Asarwa Chakla,0.0,0.0,0.04,0.04,0.0,0.0,0.08,0.0,0.0,0.0,0.04,0.0,0.04,0.0,0.04,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.04,0.0,0.0,0.08,0.12,0.0,0.0,0.0,0.0,0.04,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.04,0.0,0.0,0.0
7,Badarkha,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Bahiyal,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.01,0.0,0.0,0.01,0.01,0.01,0.0,0.15,0.0,0.01,0.03,0.02,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.06,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.09,0.0,0.0,0.03,0.15,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.06,0.0,0.02,0.03,0.0,0.01,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.01,0.03,0.02,0.0,0.0,0.02,0.0,0.0
9,Bapunagar,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
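
Taking the mean of a one-hot column within a neighbourhood gives the share of that neighbourhood's venues that fall in the category. The plain-Python sketch below reproduces that arithmetic without pandas. `category_frequencies` is a hypothetical helper, and the four-venue list is an illustrative input chosen to be consistent with the Amraiwadi row above (4 venues, Indian Restaurant at 0.25).

```python
from collections import Counter


def category_frequencies(categories):
    """Return {category: share of venues} for one neighbourhood's venue categories."""
    total = len(categories)
    if total == 0:
        return {}
    counts = Counter(categories)
    return {name: count / total for name, count in counts.items()}


# illustrative venue categories for a neighbourhood with four venues
amraiwadi = ["Fast Food Restaurant", "Indian Restaurant", "Train Station", "Train Station"]
freqs = category_frequencies(amraiwadi)

print(freqs["Indian Restaurant"])  # 0.25
print(freqs["Train Station"])      # 0.5
```

A neighbourhood with very few venues can therefore show a high frequency from a single restaurant, which is worth keeping in mind when comparing clusters.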


In [27]:
len(amd_grouped[amd_grouped["Indian Restaurant"] > 0])

56

<h6>Create a new DataFrame for Indian Restaurant data only</h6>

In [28]:
amd_mall = amd_grouped[["Neighborhoods","Indian Restaurant"]]

In [29]:
amd_mall.head()

Unnamed: 0,Neighborhoods,Indian Restaurant
0,Ahmedabad Cantonment,0.066667
1,Alam Roza,0.071429
2,Ambawadi,0.11
3,Amraiwadi,0.25
4,Anand Nagar (Ahmedabad),0.066667


<h3>7. Cluster Neighborhoods</h3>

Run k-means to cluster the neighborhoods in Ahmedabad into 3 clusters

In [30]:
# set number of clusters
kclusters = 3

amd_clustering = amd_mall.drop(columns=["Neighborhoods"])

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(amd_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([1, 1, 0, 0, 1, 0, 0, 1, 0, 1])

In [31]:
# create a new dataframe that includes the cluster label for each neighborhood
amd_merged = amd_mall.copy()

# add clustering labels
amd_merged["Cluster Labels"] = kmeans.labels_

In [32]:
amd_merged.rename(columns={"Neighborhoods": "Neighborhood"}, inplace=True)
amd_merged.head()

Unnamed: 0,Neighborhood,Indian Restaurant,Cluster Labels
0,Ahmedabad Cantonment,0.066667,1
1,Alam Roza,0.071429,1
2,Ambawadi,0.11,0
3,Amraiwadi,0.25,0
4,Anand Nagar (Ahmedabad),0.066667,1


In [33]:
amd_merged = amd_merged.join(df.set_index("Neighbourhood"), on="Neighborhood")

print(amd_merged.shape)
amd_merged.head() # check the last columns!

(75, 5)


Unnamed: 0,Neighborhood,Indian Restaurant,Cluster Labels,Latitude,Longitude
0,Ahmedabad Cantonment,0.066667,1,23.063899,72.608736
1,Alam Roza,0.071429,1,22.996181,72.588301
2,Ambawadi,0.11,0,23.02237,72.543044
3,Amraiwadi,0.25,0,22.999673,72.635381
4,Anand Nagar (Ahmedabad),0.066667,1,23.083329,72.566697


<h3>Finally, let's visualize the resulting clusters</h3>

In [34]:
# create map
map_clusters = folium.Map(location=[amd_lat, amd_lng], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map (rainbow[cluster-1] means label 0 wraps around to the last colour)
markers_colors = []
for lat, lon, poi, cluster in zip(amd_merged['Latitude'], amd_merged['Longitude'], amd_merged['Neighborhood'], amd_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
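
The marker colour is looked up with `rainbow[cluster-1]`, so the colours are shifted by one relative to the cluster labels. The sketch below illustrates the effect with colour names standing in for the hex codes. The purple, mint, red order is an assumption based on how the clusters are described in the observations at the end of this report, and `marker_colour` is a hypothetical helper.

```python
palette = ["purple", "mint", "red"]  # assumed order of the three sampled rainbow colours


def marker_colour(cluster, colours=palette):
    # cluster - 1 is -1 for label 0, and a negative index selects the last entry
    return colours[cluster - 1]


for label in range(3):
    print(label, marker_colour(label))
```

Under that assumption, cluster 0 is drawn in red, cluster 1 in purple and cluster 2 in mint.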

In [35]:
# save the map as HTML file
map_clusters.save('map_clusters.html')

<h3>8. Examine Clusters</h3>

<b>Cluster 0</b>

In [36]:
amd_merged.loc[amd_merged['Cluster Labels'] == 0]

Unnamed: 0,Neighborhood,Indian Restaurant,Cluster Labels,Latitude,Longitude
2,Ambawadi,0.11,0,23.02237,72.543044
3,Amraiwadi,0.25,0,22.999673,72.635381
5,Asarwa,0.12,0,23.048783,72.608349
6,Asarwa Chakla,0.12,0,23.048783,72.608349
8,Bahiyal,0.15,0,23.022505,72.571362
11,Behrampura,0.111111,0,23.004679,72.579476
15,Calico Mills (area),0.135593,0,23.002543,72.574644
18,Dariapur (Ahmedabad),0.212766,0,23.034057,72.593679
20,Ellis bridge (area),0.13,0,23.026233,72.562312
21,Ghatlodiya,0.2,0,23.074768,72.535598


<b>Cluster 1</b>

In [37]:
amd_merged.loc[amd_merged['Cluster Labels'] == 1]

Unnamed: 0,Neighborhood,Indian Restaurant,Cluster Labels,Latitude,Longitude
0,Ahmedabad Cantonment,0.066667,1,23.063899,72.608736
1,Alam Roza,0.071429,1,22.996181,72.588301
4,Anand Nagar (Ahmedabad),0.066667,1,23.083329,72.566697
7,Badarkha,0.0,1,22.840458,72.45083
9,Bapunagar,0.0,1,23.038696,72.630753
10,Bareja (area),0.0,1,22.854617,72.591795
12,Bhairavnath Road,0.076923,1,22.995327,72.602085
13,Bhojva,0.0,1,23.153732,72.027557
16,Chandkheda,0.0,1,23.109098,72.584918
17,Chandlodiya,0.083333,1,23.082996,72.546277


<b>Cluster 2</b>

In [38]:
amd_merged.loc[amd_merged['Cluster Labels'] == 2]

Unnamed: 0,Neighborhood,Indian Restaurant,Cluster Labels,Latitude,Longitude
14,Bopal,0.375,2,23.033677,72.463412
27,"Gota, Gujarat",0.538462,2,23.101297,72.540705
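
K-means labels are arbitrary, so it helps to summarise each cluster by its mean Indian Restaurant frequency before interpreting it. The sketch below does this for a handful of rows copied from the three tables above, not for all 75 neighbourhoods. `mean_by_cluster` is a hypothetical helper.

```python
from collections import defaultdict


def mean_by_cluster(rows):
    """rows: iterable of (cluster_label, frequency). Returns {label: mean frequency}."""
    grouped = defaultdict(list)
    for label, freq in rows:
        grouped[label].append(freq)
    return {label: sum(values) / len(values) for label, values in grouped.items()}


sample_rows = [
    (0, 0.11), (0, 0.25), (0, 0.12),         # Ambawadi, Amraiwadi, Asarwa
    (1, 0.066667), (1, 0.071429), (1, 0.0),  # Ahmedabad Cantonment, Alam Roza, Badarkha
    (2, 0.375), (2, 0.538462),               # Bopal, Gota
]

means = mean_by_cluster(sample_rows)
ranking = sorted(means, key=means.get)  # labels ordered from lowest to highest mean frequency
print(ranking)  # [1, 0, 2]
```

On this sample, cluster 1 has the lowest mean frequency, cluster 0 sits in the middle and cluster 2 has the highest.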


<b>Observations:</b>
    
From the cluster map we can observe that restaurants serving Indian cuisine are well spread around the central areas of the city. Cluster 2, shown as mint-coloured circles, has the highest concentration of Indian Restaurants, and its neighborhoods lie towards the outskirts of the city. Cluster 1, shown as purple markers, has few or no Indian Restaurants in its neighborhoods. Cluster 0, shown as red markers, has a moderate concentration of Indian Restaurants.

Based on these observations, Cluster 2 would be the least preferable choice for opening an Indian Restaurant: it has the highest concentration of Indian Restaurants, which means intense competition, and a new restaurant might need a long time to build awareness and an image in the market.

Neighborhoods in Cluster 1 would be the best choice for opening a new Indian Restaurant, as there is little to no competition in these locations, which makes them a good opportunity to capture the market in these areas.

However, an owner who is looking to expand an existing chain to a new location can go with either Cluster 1 or Cluster 0, assuming that the restaurant already has a known image and popularity in the community. For such owners, Cluster 0 could be a good opportunity: a restaurant with established brand recognition and reputation can benefit from the existing competition, as consumers may prefer it over the other Indian Restaurants.