# IBM Coursera Capstone Project


### Recommending the best area to open an Indian restaurant in the city of Bangalore, India


Build a dataframe of areas in Bangalore by web scraping the data

Get the geographical coordinates of the areas

Obtain the venue data for the areas from the Foursquare API

Explore and cluster the areas

Select the best cluster to open a new Indian restaurant


### Importing the necessary libraries and getting started

In [2]:
#Importing the necessary libraries

import urllib.request
import geocoder
import pandas as pd
import numpy as np
from bs4 import BeautifulSoup
import requests
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors

### Web Scraping using BeautifulSoup and creating a pandas dataframe to store the data

In [3]:
#Specify which URL/web page we are going to be scraping

url = "https://finkode.com/ka/bangalore.html"

In [4]:
#Opening the url and putting the HTML into the page variable

page = urllib.request.urlopen(url)

In [5]:
#Parse the HTML from our URL into the BeautifulSoup parse tree format

soup = BeautifulSoup(page, 'lxml')

In [6]:
#Take a look at our HTML code

print(soup.prettify())

<!DOCTYPE html>
<html lang="en">
 <head>
  <title>
   Bangalore District Pincode List, Karnataka Postal Pin Codes | FinKode.com
  </title>
  <meta charset="utf-8"/>
  <meta content="width=device-width, initial-scale=1" name="viewport"/>
  <meta content="Bangalore Pin code list. Search and lookup pincode of all delivery Post Offices in Bangalore district of Karnataka." name="description"/>
  <style type="text/css">
   body {
  font-family: Arial, San-Serif;
  font-size: 15px;
  max-width: 960px;
  margin: 0 auto;
  background: #fff;
}
#c760 {max-width:960px;padding:0 .5em;background:#eef;border-radius:10px;}
input[type="text"] {padding: 8px;vertical-align: middle; margin:0px 6px}
input[type="submit"] {height: 37px;padding: 0px 12px;vertical-align: middle;}

table {
  width:100%;
  text-align:left;
}
td {line-height:2em;vertical-align:top}
th, .bc {
  background:#eee;
}
.hl {
 color: #666;
 font-size: .8em;
}
.plist {
  width:100%;
}
  </style>
 </head>
 <body>
  <div id="c760">
   <h1>


In [7]:
#Using BeautifulSoup library functions

soup.title.string

'Bangalore District Pincode List, Karnataka Postal Pin Codes | FinKode.com'

### Extracting the desired table from our HTML code

In [8]:
#Finding all the 'table' tags

tables = soup.find_all('table')
tables

[<table class="plist"><caption>List of Post Offices/ Pincodes in areas under Bangalore district, Karnataka</caption><tr><th scope="col">Post Office</th><th scope="col">District</th><th scope="col">Pincode</th></tr><tr><td><a href="/af-station-yelahanka-560063.html">A F Station Yelahanka S.O</a></td><td>Bangalore</td><td><a href="/560063.html">560063</a></td></tr><tr><td><a href="/adugodi-560030.html">Adugodi S.O</a></td><td>Bangalore</td><td><a href="/560030.html">560030</a></td></tr><tr><td><a href="/agara-560034.html">Agara B.O</a></td><td>Bangalore</td><td><a href="/560034.html">560034</a></td></tr><tr><td><a href="/agram-560007.html">Agram S.O</a></td><td>Bangalore</td><td><a href="/560007.html">560007</a></td></tr><tr><td><a href="/amruthahalli-560092.html">Amruthahalli B.O</a></td><td>Bangalore</td><td><a href="/560092.html">560092</a></td></tr><tr><td><a href="/anandnagar-s-o-560024.html">Anandnagar S.O (Bangalore)</a></td><td>Bangalore</td><td><a href="/560024.html">560024</a><

In [9]:
#Selecting our table

my_table = soup.find('table', class_ = 'plist')
my_table

<table class="plist"><caption>List of Post Offices/ Pincodes in areas under Bangalore district, Karnataka</caption><tr><th scope="col">Post Office</th><th scope="col">District</th><th scope="col">Pincode</th></tr><tr><td><a href="/af-station-yelahanka-560063.html">A F Station Yelahanka S.O</a></td><td>Bangalore</td><td><a href="/560063.html">560063</a></td></tr><tr><td><a href="/adugodi-560030.html">Adugodi S.O</a></td><td>Bangalore</td><td><a href="/560030.html">560030</a></td></tr><tr><td><a href="/agara-560034.html">Agara B.O</a></td><td>Bangalore</td><td><a href="/560034.html">560034</a></td></tr><tr><td><a href="/agram-560007.html">Agram S.O</a></td><td>Bangalore</td><td><a href="/560007.html">560007</a></td></tr><tr><td><a href="/amruthahalli-560092.html">Amruthahalli B.O</a></td><td>Bangalore</td><td><a href="/560092.html">560092</a></td></tr><tr><td><a href="/anandnagar-s-o-560024.html">Anandnagar S.O (Bangalore)</a></td><td>Bangalore</td><td><a href="/560024.html">560024</a></

### Looping through the rows and copying data from our HTML code

In [10]:
#Extracting the data for our dataframe

A = []
B = []
C = []

#Looping through the rows and copying data into lists A, B and C
#Header rows contain no 'td' cells, so only rows with exactly three cells are kept
#Rows whose District cell reads 'Not assigned' are excluded

for row in my_table.find_all('tr'):
    cells = row.find_all('td')
    if len(cells) == 3:
        if cells[1].find(string = True) != "Not assigned\n":
            A.append(cells[0].find(string = True))
            B.append(cells[1].find(string = True))
            C.append(cells[2].find(string = True))
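
The row loop above can be tried without a network request. The following is a minimal sketch of the same row-extraction logic using only the standard library's `HTMLParser` instead of BeautifulSoup; the class name `TableRowParser` and the two-row `sample_html` string are hypothetical and only mimic the shape of the scraped table.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            # Header rows hold only <th> cells, so they have no <td> text.
            if len(self._row) == 3:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


# Hypothetical sample in the same shape as the scraped pincode table.
sample_html = (
    '<table class="plist">'
    '<tr><th>Post Office</th><th>District</th><th>Pincode</th></tr>'
    '<tr><td><a href="/adugodi-560030.html">Adugodi S.O</a></td>'
    '<td>Bangalore</td><td><a href="/560030.html">560030</a></td></tr>'
    '<tr><td><a href="/agara-560034.html">Agara B.O</a></td>'
    '<td>Bangalore</td><td><a href="/560034.html">560034</a></td></tr>'
    '</table>'
)

parser = TableRowParser()
parser.feed(sample_html)
parser.close()

# Transpose the rows into three column lists, like A, B and C in the notebook.
post_offices, districts, pin_codes = (list(col) for col in zip(*parser.rows))
```

As in the notebook, the header row is dropped because it contains no `td` cells, and each remaining row contributes one entry to each of the three lists.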

In [11]:
#Displaying the first five elements of the lists A,B and C

A[0:5], B[0:5], C[0:5]

(['A F Station Yelahanka S.O',
  'Adugodi S.O',
  'Agara B.O',
  'Agram S.O',
  'Amruthahalli B.O'],
 ['Bangalore', 'Bangalore', 'Bangalore', 'Bangalore', 'Bangalore'],
 ['560063', '560030', '560034', '560007', '560092'])

### Creating a pandas dataframe and storing the columns into the dataframe

In [12]:
#Creating a pandas dataframe and storing the data

df = pd.DataFrame()
df['PostOffice'] = A
df['District'] = B
df['PinCode'] = C

#Sorting by pin code, resetting the index and displaying the dataframe

df.sort_values(by = 'PinCode', inplace = True)
df.reset_index(drop = True, inplace = True)
df

Unnamed: 0,PostOffice,District,PinCode
0,Bangalore G.P.O.,Bangalore,560001
1,Mahatma Gandhi Road S.O,Bangalore,560001
2,HighCourt S.O,Bangalore,560001
3,Legislators Home S.O,Bangalore,560001
4,Cubban Road S.O,Bangalore,560001
5,Rajbhavan S.O (Bangalore),Bangalore,560001
6,Bangalore Bazaar S.O,Bangalore,560001
7,Vidhana Soudha S.O,Bangalore,560001
8,Dr. Ambedkar Veedhi S.O,Bangalore,560001
9,Bangalore Corporation Building S.O,Bangalore,560002


### Shape of the dataset

In [13]:
#Displaying the shape of the dataframe

df.shape

(270, 3)

# Adding the Geospatial Coordinates into the dataframe

### Using a library to convert a postal code into latitude and longitude values (Geocoder not working properly in a loop)

In [14]:
#Function to get the geospatial coordinates

def get_latlng(postalcode):
    # initialize your variable to None
    lat_lng_coords = None
    # loop until you get the coordinates
    while(lat_lng_coords is None):
        g = geocoder.arcgis('{}, Bangalore, India'.format(postalcode))
        lat_lng_coords = g.latlng
    return lat_lng_coords
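
The `while` loop in `get_latlng` never exits if a pin code cannot be resolved. A minimal sketch of a bounded variant follows; the function name `get_latlng_bounded`, the `lookup` parameter, and the retry settings are assumptions introduced for illustration, and the fake lookup only simulates a service that fails twice.

```python
import time


def get_latlng_bounded(postalcode, lookup, max_attempts=3, delay_seconds=0.0):
    """Return [lat, lng] for a pin code, or None after max_attempts failures.

    `lookup` is any callable that takes the query string and returns
    [lat, lng] or None; it is a hypothetical stand-in for the geocoder call.
    """
    query = '{}, Bangalore, India'.format(postalcode)
    for attempt in range(max_attempts):
        coords = lookup(query)
        if coords is not None:
            return coords
        if attempt < max_attempts - 1:
            time.sleep(delay_seconds)  # pause before the next attempt
    return None


# Fake lookup that fails twice before succeeding, to exercise the retry path.
calls = {'count': 0}


def flaky_lookup(query):
    calls['count'] += 1
    return None if calls['count'] < 3 else [12.979185, 77.606623]


result = get_latlng_bounded('560001', flaky_lookup)
```

With the notebook's geocoder, the `lookup` argument could be something like `lambda q: geocoder.arcgis(q).latlng`. A `None` result then marks a pin code that needs manual attention instead of stalling the cell.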

In [15]:
#Getting the coordinates

coords = [ get_latlng(postalcode) for postalcode in df["PinCode"].tolist() ]
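
Many post offices share a pin code, so the list comprehension above repeats identical lookups. The sketch below geocodes each distinct pin code once and reuses the result; `geocode_unique` and `fake_get_latlng` are hypothetical names, and the fake function only records how often it is called.

```python
def geocode_unique(pin_codes, lookup):
    """Geocode each distinct pin code once and return one result per input."""
    cache = {}
    for code in pin_codes:
        if code not in cache:
            cache[code] = lookup(code)
    return [cache[code] for code in pin_codes]


# Hypothetical stand-in for get_latlng that records every call it receives.
seen = []


def fake_get_latlng(code):
    seen.append(code)
    return [float(len(seen)), 0.0]


coords_demo = geocode_unique(
    ['560001', '560001', '560002', '560001'], fake_get_latlng
)
```

In the notebook this would presumably be called as `geocode_unique(df["PinCode"].tolist(), get_latlng)`, which keeps the output in the same row order as `df`.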

In [18]:
coords

[[12.97918500000003, 77.60662343000007],
 [12.97918500000003, 77.60662343000007],
 [12.97918500000003, 77.60662343000007],
 [12.97918500000003, 77.60662343000007],
 [12.97918500000003, 77.60662343000007],
 [12.97918500000003, 77.60662343000007],
 [12.97918500000003, 77.60662343000007],
 [12.97918500000003, 77.60662343000007],
 [12.97918500000003, 77.60662343000007],
 [12.96407000000005, 77.57764666700007],
 [12.96407000000005, 77.57764666700007],
 [12.96407000000005, 77.57764666700007],
 [13.003656157000023, 77.56974500000007],
 [13.003656157000023, 77.56974500000007],
 [13.003656157000023, 77.56974500000007],
 [13.003656157000023, 77.56974500000007],
 [13.003656157000023, 77.56974500000007],
 [12.945663964000062, 77.57507500000008],
 [12.945663964000062, 77.57507500000008],
 [12.945663964000062, 77.57507500000008],
 [12.998115000000041, 77.62084160800003],
 [13.010375000000067, 77.59129210500004],
 [13.010375000000067, 77.59129210500004],
 [12.95658269200004, 77.62849000000006],
 [12.

### Using the Geospatial dataset for adding the data into the dataframe

In [21]:
# create temporary dataframe to populate the coordinates into Latitude and Longitude
df2 = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])

In [22]:
df2

Unnamed: 0,Latitude,Longitude
0,12.979185,77.606623
1,12.979185,77.606623
2,12.979185,77.606623
3,12.979185,77.606623
4,12.979185,77.606623
5,12.979185,77.606623
6,12.979185,77.606623
7,12.979185,77.606623
8,12.979185,77.606623
9,12.964070,77.577647


### Adding Geospatial data into df

In [23]:
df['Latitude'] = df2['Latitude']
df['Longitude'] = df2['Longitude']
df

Unnamed: 0,PostOffice,District,PinCode,Latitude,Longitude
0,Bangalore G.P.O.,Bangalore,560001,12.979185,77.606623
1,Mahatma Gandhi Road S.O,Bangalore,560001,12.979185,77.606623
2,HighCourt S.O,Bangalore,560001,12.979185,77.606623
3,Legislators Home S.O,Bangalore,560001,12.979185,77.606623
4,Cubban Road S.O,Bangalore,560001,12.979185,77.606623
5,Rajbhavan S.O (Bangalore),Bangalore,560001,12.979185,77.606623
6,Bangalore Bazaar S.O,Bangalore,560001,12.979185,77.606623
7,Vidhana Soudha S.O,Bangalore,560001,12.979185,77.606623
8,Dr. Ambedkar Veedhi S.O,Bangalore,560001,12.979185,77.606623
9,Bangalore Corporation Building S.O,Bangalore,560002,12.964070,77.577647


In [24]:
df.shape

(270, 5)

# Exploring and Clustering the Neighborhoods in Bangalore

### Create a map of Bangalore with neighborhoods superimposed on top.

In [28]:
#Getting the coordinates of Bangalore

from geopy.geocoders import Nominatim
address = 'Bangalore'

geolocator = Nominatim(user_agent="to_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Bangalore are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Bangalore are 12.9791198, 77.5912997.


In [29]:
import folium
# create map of Bangalore using latitude and longitude values
map_bangalore = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(df['Latitude'], df['Longitude'], df['District'], df['PostOffice']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='yellow',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_bangalore)  
    
map_bangalore

### Define Foursquare credentials and version

In [30]:
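#Credentials are read from environment variables so they are not stored in the notebook
#(the environment variable names below are placeholders)
import os
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # Foursquare Secret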
VERSION = '20180605' # Foursquare API version
LIMIT = 100
RADIUS = 500

### Explore Neighborhoods in Bangalore

#### Let's create a function to repeat the same process for all the neighborhoods in Bangalore

In [40]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['PostOffice', 
                  'PostOffice Latitude', 
                  'PostOffice Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
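
`getNearbyVenues` indexes straight into the response, so a reply without `groups`, or a venue without a category, raises `KeyError` or `IndexError` and stops the whole run. The following is a minimal sketch of a more tolerant extraction step. The nested keys mirror the ones used in the function above; the helper name `extract_venue_rows` and both sample payloads are hypothetical.

```python
def extract_venue_rows(name, lat, lng, payload):
    """Turn one explore-style response into flat tuples, skipping gaps.

    A payload without groups, or a venue without categories or
    coordinates, is skipped instead of raising an exception.
    """
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item.get('venue', {})
        location = venue.get('location', {})
        categories = venue.get('categories', [])
        if not categories or 'lat' not in location or 'lng' not in location:
            continue
        rows.append((name, lat, lng, venue.get('name'),
                     location['lat'], location['lng'], categories[0]['name']))
    return rows


# Hypothetical payloads: one well-formed venue plus one without a category,
# and a response with an empty body.
good_payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Samarkand',
               'location': {'lat': 12.980616, 'lng': 77.604668},
               'categories': [{'name': 'Afghan Restaurant'}]}},
    {'venue': {'name': 'Unnamed Spot',
               'location': {'lat': 12.98, 'lng': 77.60},
               'categories': []}},
]}]}}
empty_payload = {'response': {}}

good_rows = extract_venue_rows('Bangalore G.P.O.', 12.979185, 77.606623, good_payload)
empty_rows = extract_venue_rows('Agram S.O', 12.956583, 77.62849, empty_payload)
```

The tuples have the same seven fields, in the same order, as the columns assigned to `nearby_venues`, so they could be appended to `venues_list` in place of the list comprehension.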

### The code to run the above function on each neighborhood and create a new dataframe called bangalore_venues.

In [42]:
bangalore_venues = getNearbyVenues(names=df['PostOffice'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                  )

Bangalore G.P.O.
Mahatma Gandhi Road S.O
HighCourt S.O
Legislators Home S.O
Cubban Road S.O
Rajbhavan S.O (Bangalore)
Bangalore Bazaar S.O
Vidhana Soudha S.O
Dr. Ambedkar Veedhi S.O
Bangalore Corporation Building S.O
Bangalore City S.O
Sri Jayachamarajendra Road S.O
Palace Guttahalli S.O
Swimming Pool Extn S.O
Malleswaram S.O
Venkatarangapura S.O
Vyalikaval Extn S.O
Mavalli S.O
Pampamahakavi Road S.O
Basavanagudi H.O
Fraser Town S.O
Training Command IAF S.O
J.C.Nagar S.O
Agram S.O
H.A.L II Stage H.O
Hulsur Bazaar S.O
K. G. Road S.O
Bangalore Dist Offices Bldg S.O
Rajajinagar IVth Block S.O
Industrial Estate S.O (Bangalore)
Rajajinagar H.O
Madhavan Park S.O
Jayangar III Block S.O
Science Institute S.O
Jalahalli H.O
Jalahalli East S.O
Jalahalli West S.O
Ramamurthy Nagar S.O
Doorvaninagar S.O
Krishnarajapuram R S S.O
Vimanapura S.O
NAL S.O
Chamrajpet S.O (Bangalore)
Gaviopuram Extension S.O
Narasimharaja Colony S.O
Seshadripuram S.O
Srirampuram S.O
Gayathrinagar S.O
Yeshwanthpur Bazar S.O

#### Size of the resulting dataframe

In [43]:
print(bangalore_venues.shape)
bangalore_venues.head()

(2163, 7)


Unnamed: 0,PostOffice,PostOffice Latitude,PostOffice Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Bangalore G.P.O.,12.979185,77.606623,Peppa Zzing,12.9797,77.605907,Burger Joint
1,Bangalore G.P.O.,12.979185,77.606623,Samarkand,12.980616,77.604668,Afghan Restaurant
2,Bangalore G.P.O.,12.979185,77.606623,Unicorn Bar and Restaurant,12.97979,77.60571,Bar
3,Bangalore G.P.O.,12.979185,77.606623,M.G Road Boulevard,12.975771,77.603979,Plaza
4,Bangalore G.P.O.,12.979185,77.606623,The 13th Floor,12.975364,77.604995,Lounge


#### Let's check how many venues were returned for each neighborhood

In [44]:
bangalore_venues.groupby('PostOffice').count()

PostOffice,PostOffice Latitude,PostOffice Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
A F Station Yelahanka S.O,1,1,1,1,1,1
Adugodi S.O,6,6,6,6,6,6
Agara B.O,22,22,22,22,22,22
Agram S.O,2,2,2,2,2,2
Amruthahalli B.O,10,10,10,10,10,10
Anandnagar S.O (Bangalore),11,11,11,11,11,11
Arabic College S.O,3,3,3,3,3,3
Ashoknagar S.O (Bangalore),2,2,2,2,2,2
Attur B.O,2,2,2,2,2,2
Austin Town S.O,4,4,4,4,4,4


### Let's find out how many unique categories can be curated from all the returned venues

In [45]:
print('There are {} unique categories.'.format(len(bangalore_venues['Venue Category'].unique())))

There are 155 unique categories.


## Analyzing each Post Office

In [46]:
# one hot encoding category variable
bangalore_onehot = pd.get_dummies(bangalore_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
bangalore_onehot['PostOffice'] = bangalore_venues['PostOffice'] 

# move neighborhood column to the first column
fixed_columns = [bangalore_onehot.columns[-1]] + list(bangalore_onehot.columns[:-1])
bangalore_onehot = bangalore_onehot[fixed_columns]

bangalore_onehot.head()

Unnamed: 0,PostOffice,ATM,Accessories Store,Afghan Restaurant,Airport Terminal,American Restaurant,Andhra Restaurant,Arcade,Art Gallery,Arts & Crafts Store,...,Thai Restaurant,Theater,Tibetan Restaurant,Toy / Game Store,Trail,Udupi Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Women's Store
0,Bangalore G.P.O.,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Bangalore G.P.O.,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Bangalore G.P.O.,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Bangalore G.P.O.,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Bangalore G.P.O.,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


#### Next, let's group rows by neighborhood, taking the mean of the frequency of occurrence of each category

In [48]:
bangalore_grouped = bangalore_onehot.groupby('PostOffice').mean().reset_index()
bangalore_grouped

Unnamed: 0,PostOffice,ATM,Accessories Store,Afghan Restaurant,Airport Terminal,American Restaurant,Andhra Restaurant,Arcade,Art Gallery,Arts & Crafts Store,...,Thai Restaurant,Theater,Tibetan Restaurant,Toy / Game Store,Trail,Udupi Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Women's Store
0,A F Station Yelahanka S.O,1.000000,0.000000,0.000000,0.0,0.000000,0.0,0.0,0.000000,0.000000,...,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.000000,0.0,0.000000
1,Adugodi S.O,0.000000,0.000000,0.000000,0.0,0.000000,0.0,0.0,0.000000,0.000000,...,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.000000,0.0,0.000000
2,Agara B.O,0.000000,0.000000,0.000000,0.0,0.000000,0.0,0.0,0.000000,0.000000,...,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.045455,0.0,0.000000
3,Agram S.O,1.000000,0.000000,0.000000,0.0,0.000000,0.0,0.0,0.000000,0.000000,...,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.000000,0.0,0.000000
4,Amruthahalli B.O,0.000000,0.000000,0.000000,0.0,0.000000,0.1,0.0,0.000000,0.000000,...,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.000000,0.0,0.000000
5,Anandnagar S.O (Bangalore),0.000000,0.000000,0.000000,0.0,0.000000,0.0,0.0,0.000000,0.000000,...,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.000000,0.0,0.000000
6,Arabic College S.O,0.666667,0.000000,0.000000,0.0,0.000000,0.0,0.0,0.000000,0.000000,...,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.000000,0.0,0.000000
7,Ashoknagar S.O (Bangalore),0.000000,0.000000,0.000000,0.0,0.000000,0.0,0.0,0.000000,0.000000,...,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.000000,0.0,0.000000
8,Attur B.O,1.000000,0.000000,0.000000,0.0,0.000000,0.0,0.0,0.000000,0.000000,...,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.000000,0.0,0.000000
9,Austin Town S.O,0.000000,0.000000,0.000000,0.0,0.000000,0.0,0.0,0.000000,0.000000,...,0.0,0.0,0.0,0.0,0.0,0.000000,0.000000,0.000000,0.0,0.000000
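
Taking the mean of the one-hot columns per post office is the same as computing each category's share of that post office's venues. The sketch below shows that arithmetic with plain Python; the function name `category_frequencies` and the sample rows are hypothetical.

```python
from collections import Counter, defaultdict


def category_frequencies(venue_rows):
    """Map each post office to {category: share of its venues}."""
    counts = defaultdict(Counter)
    for post_office, category in venue_rows:
        counts[post_office][category] += 1
    return {
        post_office: {cat: n / sum(counter.values()) for cat, n in counter.items()}
        for post_office, counter in counts.items()
    }


# Hypothetical venue list: (PostOffice, Venue Category) pairs.
demo_rows = [
    ('Area A', 'ATM'),
    ('Area B', 'Indian Restaurant'),
    ('Area B', 'Indian Restaurant'),
    ('Area B', 'Bakery'),
    ('Area B', 'Brewery'),
]
freqs = category_frequencies(demo_rows)
```

A post office with a single venue, like the rows showing `1.000000` under ATM, gets a frequency of 1.0 for that one category. Unlike the pandas version, this sketch omits categories with a zero share rather than storing 0.0 for them.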


#### Let's print each neighborhood along with the top 5 most common venues

In [49]:
num_top_venues = 5

for hood in bangalore_grouped['PostOffice']:
    print("----"+hood+"----")
    temp = bangalore_grouped[bangalore_grouped['PostOffice'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----A F Station Yelahanka S.O----
                        venue  freq
0                         ATM   1.0
1                   Multiplex   0.0
2   Middle Eastern Restaurant   0.0
3          Miscellaneous Shop   0.0
4  Modern European Restaurant   0.0


----Adugodi S.O----
                    venue  freq
0       Indian Restaurant  0.17
1             Bus Station  0.17
2  Furniture / Home Store  0.17
3           Design Studio  0.17
4    Fast Food Restaurant  0.17


----Agara B.O----
                venue  freq
0   Indian Restaurant  0.18
1                Café  0.14
2    Department Store  0.09
3              Bakery  0.05
4  Italian Restaurant  0.05


----Agram S.O----
                        venue  freq
0                         ATM   1.0
1                   Multiplex   0.0
2   Middle Eastern Restaurant   0.0
3          Miscellaneous Shop   0.0
4  Modern European Restaurant   0.0


----Amruthahalli B.O----
                  venue  freq
0     Indian Restaurant   0.4
1               Brewery  

### Let's put that into a pandas dataframe

In [50]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
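
When several categories share the same frequency, `sort_values` does not guarantee their relative order, which is why tied categories can appear in a different sequence from one listing to the next. A minimal sketch of a ranking with an explicit alphabetical tie-break follows; `top_categories` and the sample frequencies are hypothetical.

```python
def top_categories(freq_by_category, num_top_venues):
    """Return the most frequent categories, breaking ties alphabetically."""
    ranked = sorted(freq_by_category.items(), key=lambda kv: (-kv[1], kv[0]))
    return [category for category, _ in ranked[:num_top_venues]]


# Hypothetical frequency row with a four-way tie at 0.25.
demo_freqs = {
    'Indian Restaurant': 0.25,
    'Bus Station': 0.25,
    'Design Studio': 0.25,
    'ATM': 0.0,
    'Bakery': 0.25,
}
top3 = top_categories(demo_freqs, 3)
```

Sorting on the pair `(-frequency, name)` makes the result deterministic, at the cost of favouring categories that happen to come earlier in the alphabet.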

In [84]:
#let's create the new dataframe and display the top 10 venues for each neighborhood

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['PostOffice']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
PostOffice_venues_sorted = pd.DataFrame(columns=columns)
PostOffice_venues_sorted['PostOffice'] = bangalore_grouped['PostOffice']

for ind in np.arange(bangalore_grouped.shape[0]):
    PostOffice_venues_sorted.iloc[ind, 1:] = return_most_common_venues(bangalore_grouped.iloc[ind, :], num_top_venues)

PostOffice_venues_sorted.head()

Unnamed: 0,PostOffice,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,A F Station Yelahanka S.O,ATM,Diner,Financial or Legal Service,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Electronics Store,Dumpling Restaurant,Donut Shop,Dessert Shop
1,Adugodi S.O,Bus Station,Design Studio,Furniture / Home Store,Restaurant,Fast Food Restaurant,Indian Restaurant,Cosmetics Shop,Cricket Ground,Cupcake Shop,Department Store
2,Agara B.O,Indian Restaurant,Café,Department Store,Ice Cream Shop,Italian Restaurant,Fast Food Restaurant,Burger Joint,Snack Place,Beer Garden,Bar
3,Agram S.O,ATM,Diner,Financial or Legal Service,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Electronics Store,Dumpling Restaurant,Donut Shop,Dessert Shop
4,Amruthahalli B.O,Indian Restaurant,Ice Cream Shop,Andhra Restaurant,Bubble Tea Shop,Brewery,Fast Food Restaurant,Pizza Place,Gym Pool,Dessert Shop,Hobby Shop
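
The `indicators` lookup labels the first three columns correctly and falls back to "th" for the rest, which is fine for ten columns but would yield "21th" if `num_top_venues` were raised past 20. A small sketch of a general ordinal helper follows; the function name `ordinal` is an assumption introduced here.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


# Column names in the same pattern as the notebook's dataframe.
columns_demo = ['PostOffice'] + [
    '{} Most Common Venue'.format(ordinal(i)) for i in range(1, 11)
]
```

For ten venues this produces the same column names as the notebook's loop.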


# Cluster the Areas in Bangalore

In [89]:
# set number of clusters
kclusters = 5

bangalore_grouped_clustering = bangalore_grouped.drop('PostOffice', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=None).fit(bangalore_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 1, 1, 0, 1, 1, 0, 3, 0, 1])
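
To make the clustering step less of a black box, here is a minimal pure-Python sketch of the assign-and-update loop (Lloyd's algorithm) that k-means is built on. It is not scikit-learn's implementation, which adds smarter initialisation and other refinements; the function name `kmeans_lloyd`, the choice of the first `k` points as seeds, and the two-column sample vectors are assumptions made for illustration.

```python
import math


def kmeans_lloyd(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Hypothetical frequency vectors with two columns: [ATM, Indian Restaurant].
demo_points = [[1.0, 0.0], [0.0, 0.4], [0.9, 0.0], [0.0, 0.5], [0.1, 0.3]]
demo_labels, demo_centroids = kmeans_lloyd(demo_points, k=2)
```

The ATM-dominated rows end up in one cluster and the restaurant-heavy rows in the other. In the notebook, `random_state=None` means the label numbers can change between runs; passing a fixed integer to `KMeans` makes the assignment reproducible.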

### Let's create a new dataframe that includes the cluster as well as the top 10 venues for each PostOffice.

In [98]:
# add clustering labels (skipped if the column was already inserted in an earlier run)
if 'Cluster Labels' not in PostOffice_venues_sorted.columns:
    PostOffice_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

bangalore_merged = df

# join df with PostOffice_venues_sorted to add the cluster label and top venues for each post office
bangalore_merged = bangalore_merged.join(PostOffice_venues_sorted.set_index('PostOffice'), on='PostOffice')

bangalore_merged # check the last columns!

Unnamed: 0,PostOffice,District,PinCode,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Bangalore G.P.O.,Bangalore,560001,12.979185,77.606623,1.0,Indian Restaurant,Clothing Store,Women's Store,Fast Food Restaurant,Men's Store,Bar,Lounge,Café,Hotel,Financial or Legal Service
1,Mahatma Gandhi Road S.O,Bangalore,560001,12.979185,77.606623,1.0,Indian Restaurant,Clothing Store,Women's Store,Fast Food Restaurant,Men's Store,Bar,Lounge,Café,Hotel,Financial or Legal Service
2,HighCourt S.O,Bangalore,560001,12.979185,77.606623,1.0,Indian Restaurant,Clothing Store,Women's Store,Fast Food Restaurant,Men's Store,Bar,Lounge,Café,Hotel,Financial or Legal Service
3,Legislators Home S.O,Bangalore,560001,12.979185,77.606623,1.0,Indian Restaurant,Clothing Store,Women's Store,Fast Food Restaurant,Men's Store,Bar,Lounge,Café,Hotel,Financial or Legal Service
4,Cubban Road S.O,Bangalore,560001,12.979185,77.606623,1.0,Indian Restaurant,Clothing Store,Women's Store,Fast Food Restaurant,Men's Store,Bar,Lounge,Café,Hotel,Financial or Legal Service
5,Rajbhavan S.O (Bangalore),Bangalore,560001,12.979185,77.606623,1.0,Indian Restaurant,Clothing Store,Women's Store,Fast Food Restaurant,Men's Store,Bar,Lounge,Café,Hotel,Financial or Legal Service
6,Bangalore Bazaar S.O,Bangalore,560001,12.979185,77.606623,1.0,Indian Restaurant,Clothing Store,Women's Store,Fast Food Restaurant,Men's Store,Bar,Lounge,Café,Hotel,Financial or Legal Service
7,Vidhana Soudha S.O,Bangalore,560001,12.979185,77.606623,1.0,Indian Restaurant,Clothing Store,Women's Store,Fast Food Restaurant,Men's Store,Bar,Lounge,Café,Hotel,Financial or Legal Service
8,Dr. Ambedkar Veedhi S.O,Bangalore,560001,12.979185,77.606623,1.0,Indian Restaurant,Clothing Store,Women's Store,Fast Food Restaurant,Men's Store,Bar,Lounge,Café,Hotel,Financial or Legal Service
9,Bangalore Corporation Building S.O,Bangalore,560002,12.964070,77.577647,1.0,South Indian Restaurant,Historic Site,Middle Eastern Restaurant,Miscellaneous Shop,Women's Store,Diner,Farmers Market,Falafel Restaurant,Electronics Store,Dumpling Restaurant


## Examine Clusters

In [99]:
# Cluster 1 (cluster label 0)
bangalore_merged.loc[bangalore_merged['Cluster Labels'] == 0]

Unnamed: 0,PostOffice,District,PinCode,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
23,Agram S.O,Bangalore,560007,12.956583,77.62849,0.0,ATM,Diner,Financial or Legal Service,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Electronics Store,Dumpling Restaurant,Donut Shop,Dessert Shop
37,Ramamurthy Nagar S.O,Bangalore,560016,13.007242,77.677815,0.0,ATM,Mattress Store,Snack Place,Convenience Store,Dumpling Restaurant,Concert Hall,Financial or Legal Service,Fast Food Restaurant,Farmers Market,Falafel Restaurant
38,Doorvaninagar S.O,Bangalore,560016,13.007242,77.677815,0.0,ATM,Mattress Store,Snack Place,Convenience Store,Dumpling Restaurant,Concert Hall,Financial or Legal Service,Fast Food Restaurant,Farmers Market,Falafel Restaurant
39,Krishnarajapuram R S S.O,Bangalore,560016,13.007242,77.677815,0.0,ATM,Mattress Store,Snack Place,Convenience Store,Dumpling Restaurant,Concert Hall,Financial or Legal Service,Fast Food Restaurant,Farmers Market,Falafel Restaurant
94,Arabic College S.O,Bangalore,560045,13.012302,77.611605,0.0,ATM,Falafel Restaurant,Diner,Financial or Legal Service,Fast Food Restaurant,Farmers Market,Electronics Store,Dumpling Restaurant,Donut Shop,Dessert Shop
95,Nagawara B.O,Bangalore,560045,13.012302,77.611605,0.0,ATM,Falafel Restaurant,Diner,Financial or Legal Service,Fast Food Restaurant,Farmers Market,Electronics Store,Dumpling Restaurant,Donut Shop,Dessert Shop
96,Venkateshapura S.O,Bangalore,560045,13.012302,77.611605,0.0,ATM,Falafel Restaurant,Diner,Financial or Legal Service,Fast Food Restaurant,Farmers Market,Electronics Store,Dumpling Restaurant,Donut Shop,Dessert Shop
118,Mallathahalli B.O,Bangalore,560056,12.946615,77.473274,0.0,ATM,Diner,Financial or Legal Service,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Electronics Store,Dumpling Restaurant,Donut Shop,Dessert Shop
119,Ullalu Upanagara B.O,Bangalore,560056,12.946615,77.473274,0.0,ATM,Diner,Financial or Legal Service,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Electronics Store,Dumpling Restaurant,Donut Shop,Dessert Shop
120,Bnagalore Viswavidalaya S.O,Bangalore,560056,12.946615,77.473274,0.0,ATM,Diner,Financial or Legal Service,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Electronics Store,Dumpling Restaurant,Donut Shop,Dessert Shop


In [100]:
# Cluster 2 (cluster label 1)
bangalore_merged.loc[bangalore_merged['Cluster Labels'] == 1]

Unnamed: 0,PostOffice,District,PinCode,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Bangalore G.P.O.,Bangalore,560001,12.979185,77.606623,1.0,Indian Restaurant,Clothing Store,Women's Store,Fast Food Restaurant,Men's Store,Bar,Lounge,Café,Hotel,Financial or Legal Service
1,Mahatma Gandhi Road S.O,Bangalore,560001,12.979185,77.606623,1.0,Indian Restaurant,Clothing Store,Women's Store,Fast Food Restaurant,Men's Store,Bar,Lounge,Café,Hotel,Financial or Legal Service
2,HighCourt S.O,Bangalore,560001,12.979185,77.606623,1.0,Indian Restaurant,Clothing Store,Women's Store,Fast Food Restaurant,Men's Store,Bar,Lounge,Café,Hotel,Financial or Legal Service
3,Legislators Home S.O,Bangalore,560001,12.979185,77.606623,1.0,Indian Restaurant,Clothing Store,Women's Store,Fast Food Restaurant,Men's Store,Bar,Lounge,Café,Hotel,Financial or Legal Service
4,Cubban Road S.O,Bangalore,560001,12.979185,77.606623,1.0,Indian Restaurant,Clothing Store,Women's Store,Fast Food Restaurant,Men's Store,Bar,Lounge,Café,Hotel,Financial or Legal Service
5,Rajbhavan S.O (Bangalore),Bangalore,560001,12.979185,77.606623,1.0,Indian Restaurant,Clothing Store,Women's Store,Fast Food Restaurant,Men's Store,Bar,Lounge,Café,Hotel,Financial or Legal Service
6,Bangalore Bazaar S.O,Bangalore,560001,12.979185,77.606623,1.0,Indian Restaurant,Clothing Store,Women's Store,Fast Food Restaurant,Men's Store,Bar,Lounge,Café,Hotel,Financial or Legal Service
7,Vidhana Soudha S.O,Bangalore,560001,12.979185,77.606623,1.0,Indian Restaurant,Clothing Store,Women's Store,Fast Food Restaurant,Men's Store,Bar,Lounge,Café,Hotel,Financial or Legal Service
8,Dr. Ambedkar Veedhi S.O,Bangalore,560001,12.979185,77.606623,1.0,Indian Restaurant,Clothing Store,Women's Store,Fast Food Restaurant,Men's Store,Bar,Lounge,Café,Hotel,Financial or Legal Service
9,Bangalore Corporation Building S.O,Bangalore,560002,12.964070,77.577647,1.0,South Indian Restaurant,Historic Site,Middle Eastern Restaurant,Miscellaneous Shop,Women's Store,Diner,Farmers Market,Falafel Restaurant,Electronics Store,Dumpling Restaurant


In [101]:
# Cluster 3 (cluster label 2)
bangalore_merged.loc[bangalore_merged['Cluster Labels'] == 2]

Unnamed: 0,PostOffice,District,PinCode,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
107,State Bank Of Mysore Colony S.O,Bangalore,560050,12.9354,77.556874,2.0,Pizza Place,Fast Food Restaurant,Dessert Shop,Farmers Market,Falafel Restaurant,Electronics Store,Dumpling Restaurant,Donut Shop,Diner,Design Studio
108,Dasarahalli(Srinagar) S.O,Bangalore,560050,12.9354,77.556874,2.0,Pizza Place,Fast Food Restaurant,Dessert Shop,Farmers Market,Falafel Restaurant,Electronics Store,Dumpling Restaurant,Donut Shop,Diner,Design Studio
109,Ashoknagar S.O (Bangalore),Bangalore,560050,12.9354,77.556874,2.0,Pizza Place,Fast Food Restaurant,Dessert Shop,Farmers Market,Falafel Restaurant,Electronics Store,Dumpling Restaurant,Donut Shop,Diner,Design Studio
110,Banashankari S.O,Bangalore,560050,12.9354,77.556874,2.0,Pizza Place,Fast Food Restaurant,Dessert Shop,Farmers Market,Falafel Restaurant,Electronics Store,Dumpling Restaurant,Donut Shop,Diner,Design Studio
122,Peenya I Stage S.O,Bangalore,560058,13.020865,77.505088,2.0,Fast Food Restaurant,Women's Store,Diner,Financial or Legal Service,Farmers Market,Falafel Restaurant,Electronics Store,Dumpling Restaurant,Donut Shop,Dessert Shop
123,Peenya II Stage S.O,Bangalore,560058,13.020865,77.505088,2.0,Fast Food Restaurant,Women's Store,Diner,Financial or Legal Service,Farmers Market,Falafel Restaurant,Electronics Store,Dumpling Restaurant,Donut Shop,Dessert Shop
124,Peenya Small Industries S.O,Bangalore,560058,13.020865,77.505088,2.0,Fast Food Restaurant,Women's Store,Diner,Financial or Legal Service,Farmers Market,Falafel Restaurant,Electronics Store,Dumpling Restaurant,Donut Shop,Dessert Shop
125,Laggere S.O,Bangalore,560058,13.020865,77.505088,2.0,Fast Food Restaurant,Women's Store,Diner,Financial or Legal Service,Farmers Market,Falafel Restaurant,Electronics Store,Dumpling Restaurant,Donut Shop,Dessert Shop
264,Hunasamaranahalli B.O,Bangalore,562157,13.16869,77.635941,2.0,Fast Food Restaurant,Café,Bike Shop,Women's Store,Donut Shop,Financial or Legal Service,Farmers Market,Falafel Restaurant,Electronics Store,Dumpling Restaurant
265,Bettahalsur S.O,Bangalore,562157,13.16869,77.635941,2.0,Fast Food Restaurant,Café,Bike Shop,Women's Store,Donut Shop,Financial or Legal Service,Farmers Market,Falafel Restaurant,Electronics Store,Dumpling Restaurant


In [102]:
# Cluster 4 (k-means labels are 0-based, so cluster 4 has label 3)
bangalore_merged.loc[bangalore_merged['Cluster Labels'] == 3]

Unnamed: 0,PostOffice,District,PinCode,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
17,Mavalli S.O,Bangalore,560004,12.945664,77.575075,3.0,Indian Restaurant,Fast Food Restaurant,Ice Cream Shop,Bakery,Food Truck,Mediterranean Restaurant,Farmers Market,Dessert Shop,Café,Sandwich Place
18,Pampamahakavi Road S.O,Bangalore,560004,12.945664,77.575075,3.0,Indian Restaurant,Fast Food Restaurant,Ice Cream Shop,Bakery,Food Truck,Mediterranean Restaurant,Farmers Market,Dessert Shop,Café,Sandwich Place
19,Basavanagudi H.O,Bangalore,560004,12.945664,77.575075,3.0,Indian Restaurant,Fast Food Restaurant,Ice Cream Shop,Bakery,Food Truck,Mediterranean Restaurant,Farmers Market,Dessert Shop,Café,Sandwich Place
20,Fraser Town S.O,Bangalore,560005,12.998115,77.620842,3.0,Indian Restaurant,Chinese Restaurant,Movie Theater,Café,Shopping Mall,Vegetarian / Vegan Restaurant,Convenience Store,Cosmetics Shop,Cricket Ground,Cupcake Shop
26,K. G. Road S.O,Bangalore,560009,12.978412,77.578175,3.0,Indian Restaurant,Hotel,Bed & Breakfast,Flea Market,Asian Restaurant,Dessert Shop,Diner,Seafood Restaurant,Bookstore,Park
27,Bangalore Dist Offices Bldg S.O,Bangalore,560009,12.978412,77.578175,3.0,Indian Restaurant,Hotel,Bed & Breakfast,Flea Market,Asian Restaurant,Dessert Shop,Diner,Seafood Restaurant,Bookstore,Park
34,Jalahalli H.O,Bangalore,560013,13.047915,77.544193,3.0,Playground,Vegetarian / Vegan Restaurant,Indian Restaurant,Indie Movie Theater,Women's Store,Dessert Shop,Farmers Market,Falafel Restaurant,Electronics Store,Dumpling Restaurant
43,Gaviopuram Extension S.O,Bangalore,560019,12.94761,77.563241,3.0,Fast Food Restaurant,Indian Restaurant,Theater,Art Gallery,Women's Store,Dessert Shop,Farmers Market,Falafel Restaurant,Electronics Store,Dumpling Restaurant
44,Narasimharaja Colony S.O,Bangalore,560019,12.94761,77.563241,3.0,Fast Food Restaurant,Indian Restaurant,Theater,Art Gallery,Women's Store,Dessert Shop,Farmers Market,Falafel Restaurant,Electronics Store,Dumpling Restaurant
57,Governmemnt Electric Factory S.O,Bangalore,560026,12.951388,77.54677,3.0,Indian Restaurant,Bus Station,Department Store,Diner,Financial or Legal Service,Fast Food Restaurant,Farmers Market,Falafel Restaurant,Electronics Store,Dumpling Restaurant
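
The per-cluster cells above repeat the same filter, with the displayed cluster number always one higher than the stored label. A minimal sketch of a helper that makes that offset explicit is shown here; `cluster_rows` is a hypothetical name that does not appear in the notebook, and the small `toy` frame only stands in for `bangalore_merged`, using three rows taken from the tables above.

```python
import pandas as pd

def cluster_rows(df, cluster_number, label_col="Cluster Labels"):
    """Return the rows of a 1-based cluster number (stored label = number - 1)."""
    return df.loc[df[label_col] == cluster_number - 1]

# Toy stand-in for bangalore_merged (assumption: same column names).
toy = pd.DataFrame({
    "PostOffice": ["Banashankari S.O", "Laggere S.O", "Mavalli S.O"],
    "Cluster Labels": [2.0, 2.0, 3.0],
})

cluster_3 = cluster_rows(toy, 3)  # rows with label 2
cluster_4 = cluster_rows(toy, 4)  # rows with label 3
print(cluster_3["PostOffice"].tolist())
print(cluster_4["PostOffice"].tolist())
```

With the real data, `cluster_rows(bangalore_merged, 3)` should select the same rows as the `== 2` filter used above.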


# Done!

### Conclusion

Most Indian restaurants in Bangalore are concentrated in clusters 2 and 4, with the highest number in cluster 4 and a moderate number in cluster 2. Cluster 3, by contrast, has very few Indian restaurants; its areas are dominated by fast food outlets. This scarcity means a new Indian restaurant there would face little competition from existing ones, which makes cluster 3 a high-potential location. Meanwhile, Indian restaurants in cluster 4, where they are most concentrated, are likely to face intense competition from oversupply. This project therefore recommends that investors open new Indian restaurants in the neighborhoods of cluster 3, where there is little to no competition.
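
One way to check the comparison above is to count, per cluster, how many areas list an Indian restaurant among their most common venues. The sketch below is not part of the original analysis: `indian_restaurant_share` is a hypothetical helper, the `toy` frame is made-up data with the same column naming as `bangalore_merged`, and the result is only a proxy, since the venue columns hold rankings, not restaurant counts.

```python
import pandas as pd

def indian_restaurant_share(df, label_col="Cluster Labels",
                            keyword="Indian Restaurant"):
    """Per cluster: number of areas, and how many list `keyword` as a top venue."""
    venue_cols = [c for c in df.columns if c.endswith("Most Common Venue")]
    # True for a row when any top-venue column contains the keyword
    # (a substring match, so "South Indian Restaurant" also counts).
    has_keyword = df[venue_cols].apply(
        lambda col: col.astype(str).str.contains(keyword, regex=False)
    ).any(axis=1)
    summary = (
        df.assign(has_keyword=has_keyword)
          .groupby(label_col)["has_keyword"]
          .agg(areas="size", with_keyword="sum")
    )
    summary["share"] = summary["with_keyword"] / summary["areas"]
    return summary

# Made-up rows for illustration only (assumption: same column names as the notebook).
toy = pd.DataFrame({
    "PostOffice": ["A", "B", "C", "D"],
    "Cluster Labels": [2.0, 2.0, 3.0, 3.0],
    "1st Most Common Venue": ["Pizza Place", "Fast Food Restaurant",
                              "Indian Restaurant", "Fast Food Restaurant"],
    "2nd Most Common Venue": ["Fast Food Restaurant", "Café",
                              "Hotel", "Indian Restaurant"],
})

summary = indian_restaurant_share(toy)
print(summary)
```

Calling `indian_restaurant_share(bangalore_merged)` on the real data would give one row per cluster label. A low `share` suggests few existing Indian restaurants in that cluster.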