# IBM Applied Data Science Capstone Course by Coursera

## Week 5 Final Report

### *Opening a New Restaurant Chain in Ahmedabad, India*
* Build a dataframe of neighborhoods in Ahmedabad, India by web scraping the data from Wikipedia page (check report for link)
* Get the geographical coordinates of the neighborhoods
* Obtain the venue data for the neighborhoods from Foursquare API
* Explore and cluster the neighborhoods
* Select the best cluster to open a new restaurant

## 1. Importing Required Libraries

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analsysis
pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import geocoder # to get coordinates

import requests # library to handle requests
from bs4 import BeautifulSoup # library to parse HTML and XML documents

from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

print("Libraries imported.")

Libraries imported.


## 2. Scrape data from Wikipedia page into a DataFrame

In [2]:
# send the GET request
data = requests.get("https://en.wikipedia.org/wiki/Category:Neighbourhoods_in_Ahmedabad").text

In [3]:
# parse data from the html into a beautifulsoup object
soup = BeautifulSoup(data, 'html.parser')

In [4]:
# create a list to store neighborhood data
neighborhoodList = []

In [5]:
# append the data into the list
for row in soup.find_all("div", class_="mw-category")[0].findAll("li"):
    neighborhoodList.append(row.text)

In [6]:
# create a new DataFrame from the list
ahmedabad_df = pd.DataFrame({"Neighborhood": neighborhoodList})
ahmedabad_df.head()

Unnamed: 0,Neighborhood
0,Agol
1,Ahmedabad Cantonment
2,Alam Roza
3,Ambawadi
4,Amraiwadi


In [7]:
# print the number of rows of the dataframe
ahmedabad_df.shape

(81, 1)

## 3. Get the geographical coordinates

In [8]:
# define a function to get coordinates
def get_latlng(neighborhood):
    # initialize your variable to None
    lat_lng_coords = None
    # loop until you get the coordinates
    while(lat_lng_coords is None):
        g = geocoder.arcgis('{}, Ahmedabad, India'.format(neighborhood))
        lat_lng_coords = g.latlng
    return lat_lng_coords

In [9]:
# call the function to get the coordinates, store in a new list using list comprehension
coords = [ get_latlng(neighborhood) for neighborhood in ahmedabad_df["Neighborhood"].tolist() ]
coords

[[23.027760000000058, 72.60027000000008],
 [23.027760000000058, 72.60027000000008],
 [23.002120000000048, 72.54979000000003],
 [23.018850000000043, 72.55441000000008],
 [23.00735000000003, 72.62263000000007],
 [23.011390000000063, 72.51712000000003],
 [23.04708000000005, 72.60481000000004],
 [23.04233380265855, 72.60459013199771],
 [22.84128000000004, 72.45453000000003],
 [23.027760000000058, 72.60027000000008],
 [23.034760000000063, 72.63024000000007],
 [22.85570000000007, 72.59490000000005],
 [23.00278000000003, 72.57706000000007],
 [22.315900000000056, 72.10697000000005],
 [22.99831461188491, 72.59329617832454],
 [23.159320000000037, 72.01855000000006],
 [23.030320000000074, 72.47247000000004],
 [23.000980000000027, 72.57459000000006],
 [22.806890000000067, 72.42511000000007],
 [23.112140000000068, 72.57989000000003],
 [23.087290000000053, 72.54899000000006],
 [23.027760000000058, 72.60027000000008],
 [23.036070000000052, 72.59213000000005],
 [23.32218000000006, 72.18817000000007],


In [10]:
# create temporary dataframe to populate the coordinates into Latitude and Longitude
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])
ahmedabad_df['Latitude'] = df_coords['Latitude']
ahmedabad_df['Longitude'] = df_coords['Longitude']

In [11]:
# check the neighborhoods and the coordinates
print(ahmedabad_df.shape)
ahmedabad_df

(81, 3)


Unnamed: 0,Neighborhood,Latitude,Longitude
0,Agol,23.02776,72.60027
1,Ahmedabad Cantonment,23.02776,72.60027
2,Alam Roza,23.00212,72.54979
3,Ambawadi,23.01885,72.55441
4,Amraiwadi,23.00735,72.62263
5,Anand Nagar (Ahmedabad),23.01139,72.51712
6,Asarwa,23.04708,72.60481
7,Asarwa Chakla,23.042334,72.60459
8,Badarkha,22.84128,72.45453
9,Bahiyal,23.02776,72.60027


In [39]:
# save the DataFrame as CSV file
ahmedabad_df.to_csv("ahmedabad_df.csv", index=False)

## 4. Create a map of Ahmedabad with neighborhoods superimposed on top

In [13]:
# get the coordinates of Ahmedabad
address = 'Ahmedabad, Gujarat, India'

geolocator = Nominatim(user_agent="my-application")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinates of Ahmedabad, India is {}, {}.'.format(latitude, longitude))

The geograpical coordinates of Ahmedabad, India is 23.0216238, 72.5797068.


In [14]:
# create map of Ahmedabad using latitude and longitude values
map_ahmedabad = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, neighborhood in zip(ahmedabad_df['Latitude'], ahmedabad_df['Longitude'], ahmedabad_df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_ahmedabad)  
    
map_ahmedabad

In [15]:
# save the map as HTML file
map_ahmedabad.save('map_ahmedabad.html')

## 5. Use the Foursquare API to explore the neighborhoods

In [16]:
CLIENT_ID = 'I1COYOEIRSOHTSPOLCEF5REJ0SF1RDSFVGXADZLTGXSJEAL0'
CLIENT_SECRET = 'CTQZW0ZRRJYIJOOILIYJZGHZE0HT4NAOWJYDNO3PB0GQGQRH'
VERSION = '20180605'

In [17]:
radius = 2000
LIMIT = 100

venues = []

for lat, long, neighborhood in zip(ahmedabad_df['Latitude'], ahmedabad_df['Longitude'], ahmedabad_df['Neighborhood']):
    
    # create the API request URL
    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        long,
        radius, 
        LIMIT)
    
    # make the GET request
    results = requests.get(url).json()["response"]['groups'][0]['items']
    
    # return only relevant information for each nearby venue
    for venue in results:
        venues.append((
            neighborhood,
            lat, 
            long, 
            venue['venue']['name'], 
            venue['venue']['location']['lat'], 
            venue['venue']['location']['lng'],  
            venue['venue']['categories'][0]['name']))


In [18]:
# convert the venues list into a new DataFrame
venues_df = pd.DataFrame(venues)

# define the column names
venues_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']

print(venues_df.shape)
venues_df.head()

(1778, 7)


Unnamed: 0,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,Agol,23.02776,72.60027,Manek Chowk Khau Gali,23.023505,72.588539,Snack Place
1,Agol,23.02776,72.60027,Manek Chowk,23.023626,72.588553,Fast Food Restaurant
2,Agol,23.02776,72.60027,Moti Mahal,23.02912,72.599724,Indian Restaurant
3,Agol,23.02776,72.60027,Lucky Tea,23.027829,72.581394,Tea Room
4,Agol,23.02776,72.60027,Jama Masjid,23.024323,72.587042,Historic Site


In [19]:
venues_df.groupby(["Neighborhood"]).count()

Unnamed: 0_level_0,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Agol,21,21,21,21,21,21
Ahmedabad Cantonment,21,21,21,21,21,21
Alam Roza,11,11,11,11,11,11
Ambawadi,85,85,85,85,85,85
Amraiwadi,5,5,5,5,5,5
Anand Nagar (Ahmedabad),58,58,58,58,58,58
Asarwa,6,6,6,6,6,6
Asarwa Chakla,10,10,10,10,10,10
Bahiyal,21,21,21,21,21,21
Bapunagar,6,6,6,6,6,6


In [20]:
print('There are {} uniques categories.'.format(len(venues_df['VenueCategory'].unique())))

There are 104 uniques categories.


In [21]:
# print out the list of categories
venues_df['VenueCategory'].unique()[:50]

array(['Snack Place', 'Fast Food Restaurant', 'Indian Restaurant',
       'Tea Room', 'Historic Site', 'Hotel', 'Ice Cream Shop',
       'Vegetarian / Vegan Restaurant', 'Train Station', 'Multiplex',
       'Clothing Store', 'Motel', 'Shopping Mall', 'Bus Station', 'Diner',
       'Pizza Place', 'Coffee Shop', 'Sandwich Place', 'Flea Market',
       'Mexican Restaurant', 'Park', 'Café', 'Street Food Gathering',
       'Dessert Shop', 'Bookstore', 'Theater', 'Arts & Crafts Store',
       'Farmers Market', 'Restaurant', 'Breakfast Spot', 'Arcade',
       'Asian Restaurant', 'Department Store', 'Gym / Fitness Center',
       "Women's Store", 'Gas Station', 'BBQ Joint', 'American Restaurant',
       'Event Space', 'Bakery', 'Food Truck', 'Electronics Store',
       'Food Court', 'Art Gallery', 'Supermarket', 'History Museum',
       'Market', 'Museum', 'ATM', 'Health & Beauty Service'], dtype=object)

In [22]:
"Restaurant" in venues_df['VenueCategory'].unique()

True

## 6. Analyze Each Neighborhood

In [23]:
# one hot encoding
ahmedabad_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
ahmedabad_onehot['Neighborhoods'] = venues_df['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [ahmedabad_onehot.columns[-1]] + list(ahmedabad_onehot.columns[:-1])
ahmedabad_onehot = ahmedabad_onehot[fixed_columns]

print(ahmedabad_onehot.shape)
ahmedabad_onehot.head()

(1778, 105)


Unnamed: 0,Neighborhoods,ATM,Airport Service,Airport Terminal,American Restaurant,Arcade,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Garage,BBQ Joint,Bakery,Big Box Store,Bistro,Boat or Ferry,Bookstore,Breakfast Spot,Bus Station,Café,Chinese Restaurant,Clothing Store,Coffee Shop,Comfort Food Restaurant,Cricket Ground,Cupcake Shop,Department Store,Dessert Shop,Diner,Donut Shop,Electronics Store,Event Space,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Flea Market,Flower Shop,Food & Drink Shop,Food Court,Food Truck,Fried Chicken Joint,Furniture / Home Store,Garden,Gas Station,General Entertainment,Gourmet Shop,Gujarati Restaurant,Gym,Gym / Fitness Center,Health & Beauty Service,Historic Site,History Museum,Hospital,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Industrial Estate,Italian Restaurant,Juice Bar,Lake,Lounge,Market,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Moroccan Restaurant,Motel,Movie Theater,Moving Target,Multicuisine Indian Restaurant,Multiplex,Museum,North Indian Restaurant,Optical Shop,Park,Performing Arts Venue,Pizza Place,Restaurant,River,Sandwich Place,Sculpture Garden,Shopping Mall,Ski Area,Smoke Shop,Snack Place,South Indian Restaurant,Spa,Speakeasy,Street Food Gathering,Supermarket,Tea Room,Tennis Court,Theater,Toy / Game Store,Train Station,Tree,Vegetarian / Vegan Restaurant,Video Store,Women's Store,Yoga Studio,Zoo
0,Agol,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Agol,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Agol,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Agol,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0
4,Agol,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


#### Next, let's group rows by neighborhood and by taking the mean of the frequency of occurrence of each category

In [24]:
ahmedabad_grouped = ahmedabad_onehot.groupby(["Neighborhoods"]).mean().reset_index()

print(ahmedabad_grouped.shape)
ahmedabad_grouped.head()

(74, 105)


Unnamed: 0,Neighborhoods,ATM,Airport Service,Airport Terminal,American Restaurant,Arcade,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Garage,BBQ Joint,Bakery,Big Box Store,Bistro,Boat or Ferry,Bookstore,Breakfast Spot,Bus Station,Café,Chinese Restaurant,Clothing Store,Coffee Shop,Comfort Food Restaurant,Cricket Ground,Cupcake Shop,Department Store,Dessert Shop,Diner,Donut Shop,Electronics Store,Event Space,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Flea Market,Flower Shop,Food & Drink Shop,Food Court,Food Truck,Fried Chicken Joint,Furniture / Home Store,Garden,Gas Station,General Entertainment,Gourmet Shop,Gujarati Restaurant,Gym,Gym / Fitness Center,Health & Beauty Service,Historic Site,History Museum,Hospital,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Industrial Estate,Italian Restaurant,Juice Bar,Lake,Lounge,Market,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Moroccan Restaurant,Motel,Movie Theater,Moving Target,Multicuisine Indian Restaurant,Multiplex,Museum,North Indian Restaurant,Optical Shop,Park,Performing Arts Venue,Pizza Place,Restaurant,River,Sandwich Place,Sculpture Garden,Shopping Mall,Ski Area,Smoke Shop,Snack Place,South Indian Restaurant,Spa,Speakeasy,Street Food Gathering,Supermarket,Tea Room,Tennis Court,Theater,Toy / Game Store,Train Station,Tree,Vegetarian / Vegan Restaurant,Video Store,Women's Store,Yoga Studio,Zoo
0,Agol,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.095238,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.095238,0.0,0.0,0.047619,0.0,0.0,0.047619,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.095238,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.095238,0.0,0.0,0.0,0.095238,0.0,0.047619,0.0,0.0,0.0,0.0
1,Ahmedabad Cantonment,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.095238,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.095238,0.0,0.0,0.047619,0.0,0.0,0.047619,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.095238,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.095238,0.0,0.0,0.0,0.095238,0.0,0.047619,0.0,0.0,0.0,0.0
2,Alam Roza,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.181818,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0
3,Ambawadi,0.0,0.0,0.0,0.0,0.011765,0.0,0.011765,0.011765,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011765,0.011765,0.0,0.129412,0.0,0.011765,0.035294,0.0,0.0,0.0,0.011765,0.047059,0.011765,0.0,0.0,0.0,0.0,0.0,0.011765,0.070588,0.011765,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011765,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.035294,0.117647,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035294,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023529,0.0,0.047059,0.011765,0.0,0.047059,0.0,0.035294,0.0,0.0,0.047059,0.0,0.0,0.0,0.011765,0.0,0.047059,0.0,0.023529,0.0,0.0,0.0,0.035294,0.0,0.011765,0.0,0.0
4,Amraiwadi,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0


In [25]:
# Picking related columns by visual inspection
ahmedabad_grouped["Food"] = ahmedabad_grouped['Asian Restaurant']+ahmedabad_grouped['BBQ Joint']
+ahmedabad_grouped['Breakfast Spot']+ahmedabad_grouped['Café']+ahmedabad_grouped['Chinese Restaurant']
+ahmedabad_grouped['Coffee Shop']+ahmedabad_grouped['Comfort Food Restaurant']+ahmedabad_grouped['Dessert Shop']
+ahmedabad_grouped['Diner']+ahmedabad_grouped['Falafel Restaurant']+ahmedabad_grouped['Fast Food Restaurant']
+ahmedabad_grouped['Food & Drink Shop']+ahmedabad_grouped['Food Court']+ahmedabad_grouped['Food Truck']
+ahmedabad_grouped['Fried Chicken Joint']+ahmedabad_grouped['Gujarati Restaurant']+ahmedabad_grouped['Indian Restaurant']
+ahmedabad_grouped['Italian Restaurant']+ahmedabad_grouped['Juice Bar']+ahmedabad_grouped['Mediterranean Restaurant']
+ahmedabad_grouped['Mexican Restaurant']+ahmedabad_grouped['North Indian Restaurant']+ahmedabad_grouped['Restaurant']
+ahmedabad_grouped['Sandwich Place']+ahmedabad_grouped['Snack Place']+ahmedabad_grouped['South Indian Restaurant']
+ahmedabad_grouped['Tea Room']+ahmedabad_grouped['Vegetarian / Vegan Restaurant']
    

0     0.142857
1     0.142857
2     0.090909
3     0.082353
4     0.000000
5     0.068966
6     0.000000
7     0.000000
8     0.142857
9     0.000000
10    0.000000
11    0.181818
12    0.000000
13    0.000000
14    0.100000
15    0.083333
16    0.000000
17    0.000000
18    0.142857
19    0.086957
20    0.090909
21    0.000000
22    0.000000
23    0.000000
24    0.192308
25    0.000000
26    0.000000
27    0.000000
28    0.000000
29    0.176471
30    0.000000
31    0.142857
32    0.050000
33    0.081633
34    0.130435
35    0.000000
36    0.111111
37    0.071429
38    0.130435
39    0.000000
40    0.000000
41    0.000000
42    0.000000
43    0.045455
44    0.000000
45    0.000000
46    0.050000
47    0.000000
48    0.000000
49    0.000000
50    0.026316
51    0.017241
52    0.000000
53    0.086207
54    0.142857
55    0.000000
56    0.166667
57    0.000000
58    0.000000
59    0.000000
60    0.000000
61    0.200000
62    0.062500
63    0.041667
64    0.062500
65    0.000000
66    0.00

In [26]:
ahmedabad_grouped.head()

Unnamed: 0,Neighborhoods,ATM,Airport Service,Airport Terminal,American Restaurant,Arcade,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Garage,BBQ Joint,Bakery,Big Box Store,Bistro,Boat or Ferry,Bookstore,Breakfast Spot,Bus Station,Café,Chinese Restaurant,Clothing Store,Coffee Shop,Comfort Food Restaurant,Cricket Ground,Cupcake Shop,Department Store,Dessert Shop,Diner,Donut Shop,Electronics Store,Event Space,Falafel Restaurant,Farm,Farmers Market,Fast Food Restaurant,Flea Market,Flower Shop,Food & Drink Shop,Food Court,Food Truck,Fried Chicken Joint,Furniture / Home Store,Garden,Gas Station,General Entertainment,Gourmet Shop,Gujarati Restaurant,Gym,Gym / Fitness Center,Health & Beauty Service,Historic Site,History Museum,Hospital,Hotel,Hotel Bar,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Industrial Estate,Italian Restaurant,Juice Bar,Lake,Lounge,Market,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Moroccan Restaurant,Motel,Movie Theater,Moving Target,Multicuisine Indian Restaurant,Multiplex,Museum,North Indian Restaurant,Optical Shop,Park,Performing Arts Venue,Pizza Place,Restaurant,River,Sandwich Place,Sculpture Garden,Shopping Mall,Ski Area,Smoke Shop,Snack Place,South Indian Restaurant,Spa,Speakeasy,Street Food Gathering,Supermarket,Tea Room,Tennis Court,Theater,Toy / Game Store,Train Station,Tree,Vegetarian / Vegan Restaurant,Video Store,Women's Store,Yoga Studio,Zoo,Food
0,Agol,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.095238,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.095238,0.0,0.0,0.047619,0.0,0.0,0.047619,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.095238,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.095238,0.0,0.0,0.0,0.095238,0.0,0.047619,0.0,0.0,0.0,0.0,0.0
1,Ahmedabad Cantonment,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.095238,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.095238,0.0,0.0,0.047619,0.0,0.0,0.047619,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.095238,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.095238,0.0,0.0,0.0,0.095238,0.0,0.047619,0.0,0.0,0.0,0.0,0.0
2,Alam Roza,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.181818,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0
3,Ambawadi,0.0,0.0,0.0,0.0,0.011765,0.0,0.011765,0.011765,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011765,0.011765,0.0,0.129412,0.0,0.011765,0.035294,0.0,0.0,0.0,0.011765,0.047059,0.011765,0.0,0.0,0.0,0.0,0.0,0.011765,0.070588,0.011765,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011765,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.035294,0.117647,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035294,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.023529,0.0,0.047059,0.011765,0.0,0.047059,0.0,0.035294,0.0,0.0,0.047059,0.0,0.0,0.0,0.011765,0.0,0.047059,0.0,0.023529,0.0,0.0,0.0,0.035294,0.0,0.011765,0.0,0.0,0.011765
4,Amraiwadi,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [27]:
len(ahmedabad_grouped[ahmedabad_grouped["Food"] > 0])

18

#### Create a new DataFrame for Restaurant data only

In [28]:
ahmedabad_restaurants = ahmedabad_grouped[["Neighborhoods","Food"]]

In [29]:
ahmedabad_restaurants.head()

Unnamed: 0,Neighborhoods,Food
0,Agol,0.0
1,Ahmedabad Cantonment,0.0
2,Alam Roza,0.0
3,Ambawadi,0.011765
4,Amraiwadi,0.0


## 7. Cluster Neighborhoods

#### Running k-means to cluster the neighborhoods in Ahmedabad into 3 clusters

In [30]:
# set number of clusters
kclusters = 3

kl_clustering = ahmedabad_restaurants.drop(["Neighborhoods"], 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(kl_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 0, 0, 0, 0, 2, 0, 0, 0, 0])

In [31]:
# create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.
ahmedabad_merged = ahmedabad_restaurants.copy()

# add clustering labels
ahmedabad_merged["Cluster Labels"] = kmeans.labels_

In [32]:
ahmedabad_merged = ahmedabad_merged.join(ahmedabad_df.set_index("Neighborhood"), on="Neighborhoods")

print(ahmedabad_merged.shape)
ahmedabad_merged.head() 

(74, 5)


Unnamed: 0,Neighborhoods,Food,Cluster Labels,Latitude,Longitude
0,Agol,0.0,0,23.02776,72.60027
1,Ahmedabad Cantonment,0.0,0,23.02776,72.60027
2,Alam Roza,0.0,0,23.00212,72.54979
3,Ambawadi,0.011765,0,23.01885,72.55441
4,Amraiwadi,0.0,0,23.00735,72.62263


In [33]:
print(ahmedabad_merged.shape)
ahmedabad_merged.sort_values(["Cluster Labels"], inplace=True)
ahmedabad_merged

(74, 5)


Unnamed: 0,Neighborhoods,Food,Cluster Labels,Latitude,Longitude
0,Agol,0.0,0,23.02776,72.60027
72,Vastrapur,0.01,0,23.03717,72.53085
37,Kalyanpura (Ahmedabad),0.0,0,23.041963,72.589711
39,Kharna,0.0,0,24.52296,73.674965
40,Khodiyarnagar,0.0,0,23.03435,72.64652
41,Khokhra,0.0,0,23.005818,72.613309
42,Lambha,0.0,0,22.93802,72.58586
44,Maninagar,0.0,0,23.00526,72.60731
45,Memnagar,0.016667,0,23.0561,72.53034
46,Mithakali,0.02,0,23.03345,72.56399


#### Finally, let's visualize the resulting clusters 

In [34]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(ahmedabad_merged['Latitude'], ahmedabad_merged['Longitude'], ahmedabad_merged['Neighborhoods'], ahmedabad_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [35]:
map_clusters.save('map_clusters.html')

## 8. Examine Clusters

#### Cluster 0

In [36]:
ahmedabad_merged.loc[ahmedabad_merged['Cluster Labels'] == 0]

Unnamed: 0,Neighborhoods,Food,Cluster Labels,Latitude,Longitude
0,Agol,0.0,0,23.02776,72.60027
72,Vastrapur,0.01,0,23.03717,72.53085
37,Kalyanpura (Ahmedabad),0.0,0,23.041963,72.589711
39,Kharna,0.0,0,24.52296,73.674965
40,Khodiyarnagar,0.0,0,23.03435,72.64652
41,Khokhra,0.0,0,23.005818,72.613309
42,Lambha,0.0,0,22.93802,72.58586
44,Maninagar,0.0,0,23.00526,72.60731
45,Memnagar,0.016667,0,23.0561,72.53034
46,Mithakali,0.02,0,23.03345,72.56399


#### Cluster 1

In [37]:
ahmedabad_merged.loc[ahmedabad_merged['Cluster Labels'] == 1]

Unnamed: 0,Neighborhoods,Food,Cluster Labels,Latitude,Longitude
56,Ramol,0.166667,1,22.98212,72.66305


#### Cluster 2

In [38]:
ahmedabad_merged.loc[ahmedabad_merged['Cluster Labels'] == 2]

Unnamed: 0,Neighborhoods,Food,Cluster Labels,Latitude,Longitude
38,"Khadia, Ahmedabad",0.043478,2,23.02077,72.59244
43,Makarba,0.045455,2,22.99696,72.49837
27,"Gota, Gujarat",0.083333,2,23.1017,72.53898
51,Navjivan (Neighbourhood),0.034483,2,23.04413,72.56883
55,Rajpur Gomtipur,0.071429,2,23.0146,72.61543
33,"Jodhpur, Gujarat",0.030612,2,23.02063,72.52522
5,Anand Nagar (Ahmedabad),0.051724,2,23.01139,72.51712
73,Vejalpur,0.066667,2,23.00782,72.51819


### Observations:

Most of the restaurants are concentrated in the outer area of Ahmedabad city, with the highest number in cluster 1 and moderate number in cluster 2. On the other hand, cluster 0 has very low number to totally no restaurant in the neighborhoods. This represents a great opportunity and high potential areas to open new restaurants as there is very little to no competition from existing restaurants. Meanwhile, restaurants in cluster 2 are likely suffering from intense competition due to oversupply and high concentration of restaurants. From another perspective, this also shows that the oversupply of restaurants mostly happened in the outer areas of the city, with the suburb area still have very few restaurants. Therefore, this project recommends companies and entrepreneurs to capitalize on these findings to open new restaurants in neighborhoods in cluster 0 with little to no competition. Companies with unique selling propositions to stand out from the competition can also open new restaurants in neighborhoods in cluster 1 with moderate competition. Lastly, companies are advised to avoid neighborhoods in cluster 2 which already have high concentration of restaurants and suffering from intense competition.