<h1>Capstone Project - Battle of Neighborhoods (week 2)</h1>
<h2>Applied Data Science Capstone</h2>

<h3>Introduction to Business Problem</h3>

<h4>Opening a new Italian Restaurant in Bangalore, Karnataka</h4>

<t>The objective of this report is to determine the best location to open an Italian restaurant in Bangalore, Karnataka. The analysis considers the different localities of the city, the Italian restaurants already established in each of them, and how easily a location can be reached by the largest number of people, so that the revenue from the new venture can be maximized.</t>

<h3>Data</h3>

<t>This project uses data from:</t>
<ul>
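    <li>Geocoder (ArcGIS provider) - For getting the co-ordinates of different locations.</li>
    <li>Wikimedia Commons - For the list of localities (category "Suburbs of Bangalore").</li>
    <li>Foursquare API - To get the list of venues and their details around a given location.</li>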
</ul>

<h3>Methodology</h3>

<ol>
    <li>Getting the co-ordinates of the target city.</li>
    <li>Getting the list of neighborhoods and their co-ordinates.</li>
    <li>Exploring the most visited venues in the target localities.</li>
    <li>Clustering the localities.</li>
    <li>Analyzing the clusters formed.</li>
</ol>

<h3>1. Importing required libraries</h3>

In [1]:
#Importing required libraries
import numpy as np
import pandas as pd

from geopy.geocoders import Nominatim
try:
    import geocoder
except ImportError:
    !pip install geocoder
    import geocoder

import requests
from bs4 import BeautifulSoup

try:
    import folium
except ImportError:
    !pip install folium
    import folium
    
from sklearn.cluster import KMeans


<h3>2. Getting the location</h3>

In [2]:
g = geocoder.arcgis('Bangalore, India')
blr_lat = g.latlng[0]
blr_lng = g.latlng[1]
print("The Latitude and Longitude of Bangalore is {} and {}".format(blr_lat, blr_lng))

The Latitude and Longitude of Bangalore is 12.966870000000029 and 77.58734000000004


<h3>3. Getting the List of Neighborhoods in Bangalore from Wikimedia Commons</h3>

In [3]:
#Scraping the webpage for list of localities
neig = requests.get("https://commons.wikimedia.org/wiki/Category:Suburbs_of_Bangalore").text

In [4]:
soup = BeautifulSoup(neig, 'html.parser')

In [5]:
#Creating a list to store neighborhood data
neighborhoodlist = []

In [6]:
for i in soup.find_all('div', class_='mw-category')[0].find_all('a'):
    neighborhoodlist.append(i.text)

#Creating a dataframe from the list
neig_df = pd.DataFrame({"Locality": neighborhoodlist})
neig_df.head()

Unnamed: 0,Locality
0,"Agara, Bangalore"
1,Arekere
2,Banashankari
3,Banaswadi
4,Basavanagudi


In [7]:
#Shape of dataframe neig_df
neig_df.shape

(58, 1)
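
<t>Some of the scraped names carry a disambiguation suffix from the category page, for example "Agara, Bangalore" in the output above, and names of the form "Dhobi Ghat (Bangalore)" appear later in this report. Since the geocoding step appends ", Bangalore, India" to every name, it may help to strip that suffix first. The following is a minimal sketch under that assumption; the helper name clean_locality and the suffix pattern are illustrative and not part of the original notebook.</t>

```python
import re

# Hypothetical helper: removes a trailing ", Bangalore" or " (Bangalore)"
# so the city name is not repeated when the geocoding query is built.
_SUFFIX = re.compile(r"\s*(?:,\s*Bangalore|\(Bangalore\))\s*$", re.IGNORECASE)

def clean_locality(name):
    """Return the locality name without a trailing city disambiguator."""
    return _SUFFIX.sub("", name).strip()

for raw in ["Agara, Bangalore", "Dhobi Ghat (Bangalore)", "Arekere"]:
    print(repr(raw), "->", repr(clean_locality(raw)))
```

<t>Names without a suffix, such as "Arekere", pass through unchanged.</t>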

<h3>4. Getting the location of the Localities</h3>

In [8]:
#Defining a function to get the location of the localities
def get_location(localities):
    g = geocoder.arcgis('{}, Bangalore, India'.format(localities))
    get_latlng = g.latlng
    return get_latlng

In [9]:
co_ordinates = []
for i in neig_df["Locality"].tolist():
    co_ordinates.append(get_location(i))
print(co_ordinates)

[[12.842700000000036, 77.48882000000003], [12.885640000000024, 77.59669000000008], [12.922280000000057, 77.56986000000006], [13.028473466463632, 77.63189195846024], [12.939000000000021, 77.57135000000005], [12.882490000000075, 77.62475000000006], [12.927350000000047, 77.67184000000003], [12.975753300305838, 77.6162602769923], [12.960540000000037, 77.64381000000003], [12.966870000000029, 77.58734000000004], [12.817540000000065, 77.67879000000005], [12.966210117271197, 77.60678848215437], [12.793990000000065, 77.70018000000005], [12.966780000000028, 77.63344000000006], [12.966870000000029, 77.58734000000004], [12.943300000000022, 77.65603000000004], [12.845470000000034, 77.66430000000008], [12.998850000000061, 77.61271000000005], [12.942780000000027, 77.54121000000004], [13.02642000000003, 77.62432000000007], [13.049690000000055, 77.58951000000008], [13.077180000000055, 77.80178000000006], [12.912160000000029, 77.64490000000006], [12.973930000000053, 77.64390000000003], [12.9234400000000
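
<t>In the list printed above, the pair [12.966870000000029, 77.58734000000004] appears more than once and is identical to the city co-ordinates obtained in step 2. One possible explanation is that the geocoder fell back to the city itself for names it could not resolve, which would place different localities on the same point. A small check like the one below could flag such rows before clustering; the function name, the tolerance and the missing entry in the sample are assumptions for illustration.</t>

```python
def flag_city_fallbacks(coords, city_latlng, tol=1e-9):
    """Return the indices of entries that are missing or coincide with the city co-ordinates."""
    flagged = []
    for i, latlng in enumerate(coords):
        if not latlng:
            flagged.append(i)  # no usable result (None or empty)
            continue
        lat, lng = latlng
        if abs(lat - city_latlng[0]) <= tol and abs(lng - city_latlng[1]) <= tol:
            flagged.append(i)  # same point as the city itself
    return flagged

city = (12.966870000000029, 77.58734000000004)
sample = [
    [12.842700000000036, 77.48882000000003],
    [12.966870000000029, 77.58734000000004],
    None,  # hypothetical failed lookup
    [12.885640000000024, 77.59669000000008],
]
print(flag_city_fallbacks(sample, city))  # prints [1, 2]
```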

In [10]:
#Creating a dataframe from the list of location
co_ordinates_df = pd.DataFrame(co_ordinates, columns=['Latitudes', 'Longitudes'])

In [11]:
#Adding co-ordinates to neig_df dataframe
neig_df["Latitudes"] = co_ordinates_df["Latitudes"]
neig_df["Longitudes"] = co_ordinates_df["Longitudes"]

In [12]:
neig_df.head()

Unnamed: 0,Locality,Latitudes,Longitudes
0,"Agara, Bangalore",12.8427,77.48882
1,Arekere,12.88564,77.59669
2,Banashankari,12.92228,77.56986
3,Banaswadi,13.028473,77.631892
4,Basavanagudi,12.939,77.57135


<h3>5. Plotting the Localities on map</h3>

In [13]:
#Creating a map
blr_map = folium.Map(location=[blr_lat, blr_lng],zoom_start=11)

#adding markers to the map for localities
#marker for Bangalore
folium.Marker([blr_lat, blr_lng], popup='<i>Bangalore</i>', icon=folium.Icon(color='red'), tooltip="Click to see").add_to(blr_map)

#markers for localities
for latitude,longitude,name in zip(neig_df["Latitudes"], neig_df["Longitudes"], neig_df["Locality"]):
    folium.CircleMarker(
        [latitude, longitude],
        radius=6,
        color='blue',
        popup=name,
        fill=True,
        fill_color='#3186ff'
    ).add_to(blr_map)

blr_map

<h3>6. Using Foursquare API to explore the localities</h3>

In [14]:
# The code was removed by Watson Studio for sharing.

Your credentials:
CLIENT_ID: [redacted]
CLIENT_SECRET: [redacted]


In [15]:
#Getting the top 100 venues in each locality
radius = 2000
LIMIT = 100

venues = []

for lat, lng, locality in zip(neig_df["Latitudes"], neig_df["Longitudes"], neig_df["Locality"]):
    url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&ll={},{}&v={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, lat, lng, VERSION, radius, LIMIT)
    results = requests.get(url).json()['response']['groups'][0]['items']

    for venue in results:
        venues.append((locality, lat, lng, venue['venue']['name'], venue['venue']['location']['lat'], venue['venue']['location']['lng'], venue['venue']['categories'][0]['name']))
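
<t>The request URL above is assembled with string formatting and relies on CLIENT_ID, CLIENT_SECRET and VERSION being defined in the hidden cell. As a sketch of an alternative, the query string can be built from a dictionary so that each parameter is named and URL-encoded. The helper name build_explore_url, the placeholder credentials and the example version string are assumptions, not values from the notebook.</t>

```python
from urllib.parse import urlencode, urlsplit, parse_qs

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(client_id, client_secret, lat, lng, version, radius=2000, limit=100):
    """Assemble the explore URL with named, URL-encoded query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "ll": "{},{}".format(lat, lng),
        "v": version,
        "radius": radius,
        "limit": limit,
    }
    return EXPLORE_ENDPOINT + "?" + urlencode(params)

# Placeholder credentials and version string, for illustration only.
url = build_explore_url("MY_ID", "MY_SECRET", 12.88564, 77.59669, "20180605")
print(parse_qs(urlsplit(url).query)["ll"])
```

<t>Credentials are best supplied from outside the notebook, for example through environment variables, so that they do not appear in shared output.</t>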

In [16]:
venues[0]

('Arekere',
 12.885640000000024,
 77.59669000000008,
 'Decathlon Sports India Pvt Ltd',
 12.887513041243446,
 77.59771185000064,
 'Sporting Goods Shop')

In [17]:
#Convert the venue list into dataframe
venues_df = pd.DataFrame(venues)
venues_df.columns = ['Locality', 'Latitude', 'Longitude', 'Venue name', 'Venue Lat', 'Venue Lng', 'Venue Category']
venues_df.head()

Unnamed: 0,Locality,Latitude,Longitude,Venue name,Venue Lat,Venue Lng,Venue Category
0,Arekere,12.88564,77.59669,Decathlon Sports India Pvt Ltd,12.887513,77.597712,Sporting Goods Shop
1,Arekere,12.88564,77.59669,Cinepolis,12.876119,77.595455,Multiplex
2,Arekere,12.88564,77.59669,Swensens,12.876071,77.595542,Ice Cream Shop
3,Arekere,12.88564,77.59669,Natural Ice Cream,12.892188,77.598222,Ice Cream Shop
4,Arekere,12.88564,77.59669,Ingu Tengu,12.883268,77.607514,South Indian Restaurant


In [18]:
#Number of venues for each Locality
venues_df.groupby(['Locality']).count()

Unnamed: 0_level_0,Latitude,Longitude,Venue name,Venue Lat,Venue Lng,Venue Category
Locality,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Arekere,70,70,70,70,70,70
BEML,100,100,100,100,100,100
Banashankari,100,100,100,100,100,100
Banaswadi,55,55,55,55,55,55
Basavanagudi,100,100,100,100,100,100
"Begur, Bangalore",17,17,17,17,17,17
Bellandur,88,88,88,88,88,88
Bengaluru Pete,100,100,100,100,100,100
Bidadi,100,100,100,100,100,100
Bommasandra,9,9,9,9,9,9


In [19]:
#Getting the total number of venues
print('There are {} venues in total.'.format(len(venues_df['Venue Category'])))

There are 3563 venues in total.


In [20]:
#Number of unique categories
print('Total number of unique categories is {}'.format(venues_df['Venue Category'].nunique()))
#List of all unique categories
venues_df['Venue Category'].unique().tolist()

Total number of unique categories is 212


['Sporting Goods Shop',
 'Multiplex',
 'Ice Cream Shop',
 'South Indian Restaurant',
 'Indian Restaurant',
 'Lounge',
 'BBQ Joint',
 'Beer Garden',
 'Bowling Alley',
 'Chinese Restaurant',
 'Shopping Mall',
 'Restaurant',
 'Office',
 'Pizza Place',
 'Café',
 'General Entertainment',
 'Sandwich Place',
 'Department Store',
 'Middle Eastern Restaurant',
 'Liquor Store',
 'Fast Food Restaurant',
 'Eastern European Restaurant',
 'Coffee Shop',
 'Burger Joint',
 'American Restaurant',
 'Rajasthani Restaurant',
 'Mughlai Restaurant',
 'Italian Restaurant',
 'Supermarket',
 'Dive Bar',
 'Dumpling Restaurant',
 'Electronics Store',
 'Food Truck',
 'Clothing Store',
 'Food Court',
 'Breakfast Spot',
 'Badminton Court',
 'Kebab Restaurant',
 'Vegetarian / Vegan Restaurant',
 'Pharmacy',
 'Donut Shop',
 'Salad Place',
 'Juice Bar',
 'Snack Place',
 'Mexican Restaurant',
 'Indian Chinese Restaurant',
 "Women's Store",
 'Gym / Fitness Center',
 'Seafood Restaurant',
 'Bookstore',
 'Gym',
 'Tea Room

<h3>7. Analyzing the Localities according to the venues</h3>

In [21]:
#one hot encoding
blr_onehot = pd.get_dummies(venues_df[['Venue Category']], prefix="", prefix_sep="")

blr_onehot['Locality'] = venues_df['Locality']

#move the locality column to the front
blr_onehot = blr_onehot[ [ 'Locality' ] + [ col for col in blr_onehot.columns if col!='Locality' ] ]
blr_onehot.head()

Unnamed: 0,Locality,ATM,Accessories Store,Afghan Restaurant,Airport,American Restaurant,Andhra Restaurant,Arcade,Art Gallery,Art Museum,...,Train Station,Travel & Transport,Turkish Restaurant,Udupi Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,Arekere,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Arekere,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Arekere,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Arekere,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Arekere,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


<h4>Grouping the categories</h4>

In [22]:
blr_grouped = blr_onehot.groupby(['Locality']).mean().reset_index()
print(blr_grouped.shape)
blr_grouped.head()

(57, 213)


Unnamed: 0,Locality,ATM,Accessories Store,Afghan Restaurant,Airport,American Restaurant,Andhra Restaurant,Arcade,Art Gallery,Art Museum,...,Train Station,Travel & Transport,Turkish Restaurant,Udupi Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,Arekere,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.014286,0.0,0.0,0.0,0.0,0.0
1,BEML,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0
2,Banashankari,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0
3,Banaswadi,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.018182,0.0,0.0,0.0,0.0,0.0
4,Basavanagudi,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,...,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0
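
<t>Because each venue contributes a single 1 in its category column, the mean of a one-hot column per locality is the share of that locality's venues that fall in the category. For example, Arekere has 70 venues, so a single American restaurant gives 1/70 ≈ 0.014286, the value shown in the table above. The plain-Python sketch below reproduces that arithmetic on a made-up venue list; the function name is illustrative.</t>

```python
from collections import Counter

def category_shares(categories):
    """Map each category to its fraction of the locality's venues."""
    counts = Counter(categories)
    total = len(categories)
    return {category: n / total for category, n in counts.items()}

# Made-up locality with 70 venues, one of which is an American restaurant.
arekere_like = ["American Restaurant"] + ["Ice Cream Shop"] * 20 + ["Café"] * 49
shares = category_shares(arekere_like)
print(round(shares["American Restaurant"], 6))  # 0.014286
```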


In [23]:
#Number of localities having Italian restaurants
len(blr_grouped[blr_grouped['Italian Restaurant'] > 0])

34

<h4>Dataframe for Italian Restaurant</h4>

In [24]:
blr_italian = blr_grouped[['Locality', 'Italian Restaurant']]
blr_italian.head()

Unnamed: 0,Locality,Italian Restaurant
0,Arekere,0.014286
1,BEML,0.01
2,Banashankari,0.03
3,Banaswadi,0.018182
4,Basavanagudi,0.02


<h3>8. Clustering The Localities</h3>

In [25]:
#K-means clustering
cluster = 3 

#Dataframe for clustering
blr_clustering = blr_italian.drop(columns=['Locality'])

#run K-means clustering
k_means = KMeans(init="k-means++", n_clusters=cluster, n_init=12).fit(blr_clustering)

#getting the labels for the first 10 localities
print(k_means.labels_[0:10])

[2 2 1 2 2 0 2 2 1 0]
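
<t>The integer labels returned by k-means are arbitrary: which group is called 0, 1 or 2 can change from run to run unless the random seed is fixed, while the later sections refer to clusters by number and colour. One way to make the numbering reproducible is to renumber the clusters by their centroid value. The sketch below does this in plain Python; the function name is hypothetical and the centroid values are illustrative, rounded from the cluster means reported later in this notebook.</t>

```python
def relabel_by_centroid(labels, centroids):
    """Renumber clusters so that label 0 has the smallest centroid, 1 the next, and so on."""
    order = sorted(range(len(centroids)), key=lambda c: centroids[c])
    new_label = {old: new for new, old in enumerate(order)}
    return [new_label[label] for label in labels]

labels = [2, 2, 1, 2, 2, 0, 2, 2, 1, 0]  # first ten labels printed above
centroids = [0.00, 0.03, 0.02]           # illustrative centroid for original labels 0, 1, 2
print(relabel_by_centroid(labels, centroids))
# [1, 1, 2, 1, 1, 0, 1, 1, 2, 0]
```

<t>With scikit-learn, the centroids would come from k_means.cluster_centers_, flattened because there is a single feature; passing random_state to KMeans is another way to keep the labels stable between runs.</t>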


In [26]:
#Creating a copy of the blr_italian dataframe
blr_labels = blr_italian.copy()

#adding labels to blr_labels
blr_labels["Cluster Label"] = k_means.labels_

blr_labels.head()

Unnamed: 0,Locality,Italian Restaurant,Cluster Label
0,Arekere,0.014286,2
1,BEML,0.01,2
2,Banashankari,0.03,1
3,Banaswadi,0.018182,2
4,Basavanagudi,0.02,2


In [27]:
#Merging the blr_labels and neig_df dataframes to get the latitude and longitudes for each locality
blr_labels = blr_labels.join(neig_df.set_index('Locality'), on='Locality')
blr_labels.head()

Unnamed: 0,Locality,Italian Restaurant,Cluster Label,Latitudes,Longitudes
0,Arekere,0.014286,2,12.88564,77.59669
1,BEML,0.01,2,12.975753,77.61626
2,Banashankari,0.03,1,12.92228,77.56986
3,Banaswadi,0.018182,2,13.028473,77.631892
4,Basavanagudi,0.02,2,12.939,77.57135


In [28]:
#Grouping the localities according to their Cluster Labels
blr_labels.sort_values(["Cluster Label"], inplace=True)
blr_labels.head()

Unnamed: 0,Locality,Italian Restaurant,Cluster Label,Latitudes,Longitudes
28,Kettohalli,0.0,0,12.90671,77.40467
54,Yelahanka,0.0,0,13.09932,77.5926
51,Ulsoor,0.0,0,12.98916,77.62798
48,Thubarahalli,0.0,0,12.9535,77.72113
46,Seetharampalya,0.0,0,13.11315,77.42458


In [29]:
#Plot the cluster on map
cluster_map = folium.Map(location=[blr_lat, blr_lng],zoom_start=11)

#marker for Bangalore
folium.Marker([blr_lat, blr_lng], popup='<i>Bangalore</i>', icon=folium.Icon(color='red'), tooltip="Click to see").add_to(cluster_map)

#Getting the colors for the clusters
col = ['red', 'green', 'blue']

#markers for localities
for latitude,longitude,name,clus in zip(blr_labels["Latitudes"], blr_labels["Longitudes"], blr_labels["Locality"], blr_labels["Cluster Label"]):
    label = folium.Popup(name + ' - Cluster ' + str(clus))
    folium.CircleMarker(
        [latitude, longitude],
        radius=6,
        color=col[clus],
        popup=label,
        fill=True,
        fill_color=col[clus],
        fill_opacity=0.3
    ).add_to(cluster_map)
       
cluster_map

<h3>9. Analyzing The Cluster</h3>

In [47]:
#First Cluster
cluster_1 = blr_labels[blr_labels['Cluster Label'] == 0]
print("There are {} localities in cluster-1".format(cluster_1.shape[0]))
mean_presence_1 = cluster_1['Italian Restaurant'].mean()
print("The mean occurrence of Italian restaurant in cluster-1 is {0:.2f}".format(mean_presence_1))
cluster_1

There are 23 localities in cluster-1
The mean occurrence of Italian restaurant in cluster-1 is 0.00


Unnamed: 0,Locality,Italian Restaurant,Cluster Label,Latitudes,Longitudes
28,Kettohalli,0.0,0,12.90671,77.40467
54,Yelahanka,0.0,0,13.09932,77.5926
51,Ulsoor,0.0,0,12.98916,77.62798
48,Thubarahalli,0.0,0,12.9535,77.72113
46,Seetharampalya,0.0,0,13.11315,77.42458
44,Ramamurthy Nagar,0.0,0,13.02378,77.67787
43,"Rajarajeshwari Nagar, Bangalore",0.0,0,12.93162,77.52699
41,Nagarbhavi,0.0,0,12.95624,77.50933
38,Marathahalli,0.0,0,12.95467,77.70752
34,Magadi,0.0,0,12.98627,77.488578


In [46]:
#Second Cluster
cluster_2 = blr_labels[blr_labels['Cluster Label'] == 1]
print("There are {} localities in cluster-2".format(cluster_2.shape[0]))
mean_presence_2 = cluster_2['Italian Restaurant'].mean()
print("The mean occurrence of Italian restaurant in cluster-2 is {0:.2f}".format(mean_presence_2))
cluster_2

There are 15 localities in cluster-2
The mean occurrence of Italian restaurant in cluster-2 is 0.03


Unnamed: 0,Locality,Italian Restaurant,Cluster Label,Latitudes,Longitudes
32,Krishnarajapura,0.031746,1,13.0004,77.68378
39,Mathikere,0.031746,1,13.03236,77.55865
19,HSR Layout,0.044776,1,12.91216,77.6449
33,Madiwala,0.04,1,12.9205,77.6209
13,Dhobi Ghat (Bangalore),0.03,1,12.96687,77.58734
31,Koramangala,0.04,1,12.92005,77.62543
37,Malleswaram,0.04,1,13.0063,77.568289
42,Rajajinagar,0.043011,1,13.00543,77.55682
55,Yeshwantpur,0.030769,1,13.03912,77.57795
27,Jeevanbheemanagar,0.03,1,12.96605,77.65765


In [45]:
#Third Cluster
cluster_3 = blr_labels[blr_labels['Cluster Label'] == 2]
print("There are {} localities in cluster-3".format(cluster_3.shape[0]))
mean_presence_3 = cluster_3['Italian Restaurant'].mean()
print("The mean occurrence of Italian restaurant in cluster-3 is {0:.2f}".format(mean_presence_3))
cluster_3

There are 19 localities in cluster-3
The mean occurrence of Italian restaurant in cluster-3 is 0.02


Unnamed: 0,Locality,Italian Restaurant,Cluster Label,Latitudes,Longitudes
4,Basavanagudi,0.02,2,12.939,77.57135
53,"Whitefield, Bangalore",0.014493,2,12.97936,77.73368
50,UB City,0.02,2,12.97138,77.59583
1,BEML,0.01,2,12.975753,77.61626
47,Shivajinagar,0.02,2,12.98719,77.604
3,Banaswadi,0.018182,2,13.028473,77.631892
36,Majestic (Bangalore),0.01,2,12.97759,77.57256
7,Bengaluru Pete,0.02,2,12.96054,77.64381
40,Murugeshpalya,0.012195,2,12.95558,77.65334
35,Mahadevapura,0.022727,2,12.9941,77.66635
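
<t>The three cells above repeat the same count-and-mean calculation for each label. As a compact illustration, the same summary can be computed in one pass. The sketch uses plain Python on a handful of rows copied from the tables above; the function name is an assumption.</t>

```python
from collections import defaultdict

def summarise_clusters(rows):
    """Return {label: (number of localities, mean Italian-restaurant share)}."""
    shares_by_label = defaultdict(list)
    for _locality, share, label in rows:
        shares_by_label[label].append(share)
    return {
        label: (len(shares), sum(shares) / len(shares))
        for label, shares in sorted(shares_by_label.items())
    }

# (locality, Italian-restaurant share, cluster label), taken from the tables above.
rows = [
    ("Kettohalli", 0.0, 0),
    ("Yelahanka", 0.0, 0),
    ("Koramangala", 0.04, 1),
    ("Madiwala", 0.04, 1),
    ("UB City", 0.02, 2),
    ("BEML", 0.01, 2),
]
for label, (count, mean_share) in summarise_clusters(rows).items():
    print("cluster-{}: {} localities, mean share {:.3f}".format(label + 1, count, mean_share))
```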


<h3>10. Conclusion</h3>

<ul>
    <li>From the above analysis we can infer that cluster 1 (shown in red) has almost no Italian restaurants (mean share of venues 0.00), cluster 2 (shown in green) has the highest share (mean 0.03), and cluster 3 (shown in blue), located in the central part of the city, has a moderate share (mean 0.02).</li>
    <li>This presents an opportunity for entrepreneurs to tap the unused potential of the outer parts of Bangalore by opening Italian restaurants there.</li>
    <li>Cluster 2 (around the central part of the city) faces the highest competition and oversupply, hence developers should avoid investing in this area.</li>
    <li>Developers with a unique selling proposition that stands out from the moderate competition in cluster 3 can take a moderate risk and attract customers who already visit the localities of this cluster for their existing Italian restaurants.</li>
</ul>
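
<t>The conclusion describes clusters as lying in the outer or central parts of the city. That reading comes from the map; a rough numerical check is to compute each locality's great-circle distance from the city co-ordinates obtained in step 2. The sketch below uses the haversine formula with a mean Earth radius of 6371 km; the function name is illustrative, and the two localities are taken from the cluster tables above.</t>

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two (lat, lng) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

blr = (12.96687, 77.58734)  # city co-ordinates from step 2, rounded
for name, lat, lng in [("Kettohalli", 12.90671, 77.40467), ("Ulsoor", 12.98916, 77.62798)]:
    distance = haversine_km(blr[0], blr[1], lat, lng)
    print("{}: {:.1f} km from the city co-ordinates".format(name, distance))
```

<t>Distances like these would let the "outer parts" description be checked locality by locality rather than by eye.</t>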