<img src = "https://upload.wikimedia.org/wikipedia/commons/thumb/8/8d/Bangalore_palace.jpg/1280px-Bangalore_palace.jpg" width = 300>

<h1 align=center><font size = 6>Battle of the Neighborhoods - Bangalore, India</font></h1>

### Introduction

Opening a New Café in Bangalore, India

* Build a dataframe of neighborhoods in Bangalore, India by scraping a Wikipedia page
* Get the geographical coordinates of the neighborhoods
* Obtain the venue data for the neighborhoods from Foursquare API
* Explore and cluster the neighborhoods
* Select the best cluster to open a new café


### 1. Import Libraries

In [1]:
import pandas as pd
import numpy as np
import requests
import json
import matplotlib.cm as cm
import matplotlib.colors as colors
from bs4 import BeautifulSoup
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes
import folium

print('Libraries imported')

Libraries imported


### 2. Defining a method to scrape the coordinates of a locality using Google search

In [2]:
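def googleSearchLatLong(url):
    """Return [latitude, longitude] as strings scraped from a Google results page, or ['', ''] if none is found."""
    source = requests.get(url).text
    soup = BeautifulSoup(source, 'lxml')
    # the coordinates are expected in a div with this class
    loc = soup.find('div', class_='BNeawe iBp4i AP7Wnd')
    try:
        # keep only the part before the degree sign of each coordinate
        return [i.split('°', 1)[0] for i in loc.text.split(', ')]
    except AttributeError:
        # soup.find returned None: the page has no coordinate box
        return ['', '']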
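
The list comprehension in `googleSearchLatLong` only works if the scraped text has the shape `LAT° N, LON° E`. The following is a minimal sketch of that parsing step on its own, so it can be checked without sending a request to Google. The helper name `parse_lat_long` and the sample string are assumptions made for illustration; they are not part of the original notebook.

```python
def parse_lat_long(text):
    """Split 'LAT° N, LON° E' into ['LAT', 'LON'] (kept as strings, as in the notebook).

    Hypothetical helper; the input format is an assumption about the
    text found in the Google result box.
    """
    if not text:
        # Nothing scraped: mirror the notebook's empty-string fallback
        return ['', '']
    # Split on ', ' to separate the two coordinates,
    # then keep whatever precedes the degree sign in each part
    return [part.split('°', 1)[0] for part in text.split(', ')]


sample = '12.9610° N, 77.6387° E'   # made-up sample in the assumed format
print(parse_lat_long(sample))
print(parse_lat_long(None))
```

Note that the hemisphere letters are discarded. That is harmless for Bangalore (north, east), but a location south of the equator or west of Greenwich would need a sign applied.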

###### Create the dataframe that will store the data. It has three columns: Neighborhood, Latitude, and Longitude.

In [3]:
df = pd.DataFrame(columns=['Neighborhood', 'Latitude', 'Longitude'])
df.head()

Unnamed: 0,Neighborhood,Latitude,Longitude


### 3. Scraping the neighborhood names from Wikipedia

In [4]:
src_url = 'https://en.wikipedia.org/wiki/List_of_neighbourhoods_in_Bangalore'

source = requests.get(src_url).text
soup = BeautifulSoup(source, 'lxml')
i=0
table = soup.find_all('tr')
for row in table:
    col1 = row.find('td')
    if col1:
        locality = col1.text.replace(' ', '+')[0:-1]
        #print(locality)
        local_url = "https://www.google.com/search?q="+locality+"%2C+Bangalore+latitude+longitude"
        latlong = googleSearchLatLong(local_url)
        df.loc[i] = ([col1.text[0:-1]] + latlong)
        i=i+1
        if 'Vijayanagar' in locality:
            break
            
df.head(10)

Unnamed: 0,Neighborhood,Latitude,Longitude
0,Cantonment area,,
1,Domlur,12.961,77.6387
2,Indiranagar,12.9784,77.6408
3,Jeevanbheemanagar,12.9642,77.6581
4,Malleswaram,13.0055,77.5692
5,Pete area,,
6,Sadashivanagar,13.0068,77.5813
7,Seshadripuram,12.9889,77.574
8,Shivajinagar,12.9857,77.6057
9,Ulsoor,12.9817,77.6284
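
The cell above builds the search URL by replacing spaces with `+` and slicing off the last character of the cell text (which assumes every table cell ends with a newline). A possible alternative, sketched here with the standard library only, is to strip whitespace and let `urllib.parse.quote_plus` do the encoding. The function name `build_search_url` is hypothetical.

```python
from urllib.parse import quote_plus


def build_search_url(locality):
    """Return a Google search URL asking for the coordinates of a Bangalore locality."""
    # strip() removes the trailing newline without assuming it is always present
    query = f"{locality.strip()}, Bangalore latitude longitude"
    # quote_plus encodes spaces as '+' and the comma as '%2C'
    return "https://www.google.com/search?q=" + quote_plus(query)


print(build_search_url("Vasanth Nagar\n"))
```

For a name such as `Vasanth Nagar` this yields the same query string as the manual concatenation, while also handling names that contain characters like `&` or `.`.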


###### We'll only consider areas where we could find latitude and longitude

In [5]:
df = df[df['Latitude']!='']
df.head(10)

Unnamed: 0,Neighborhood,Latitude,Longitude
1,Domlur,12.961,77.6387
2,Indiranagar,12.9784,77.6408
3,Jeevanbheemanagar,12.9642,77.6581
4,Malleswaram,13.0055,77.5692
6,Sadashivanagar,13.0068,77.5813
7,Seshadripuram,12.9889,77.574
8,Shivajinagar,12.9857,77.6057
9,Ulsoor,12.9817,77.6284
10,Vasanth Nagar,12.992,77.5943
11,Bellandur,12.9304,77.6784


In [6]:
df.shape

(63, 3)

In [7]:
type(df.iloc[0,1])

str

###### Converting the lat, long columns from 'str' to float

In [8]:
df['Latitude'] = df['Latitude'].astype(float)
df['Longitude'] = df['Longitude'].astype(float)
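
Filtering out empty strings and then calling `astype(float)` works here because every remaining value is a clean number. As a hedged alternative, `pd.to_numeric` with `errors='coerce'` handles both steps and also tolerates other unparseable strings. The toy frame below is illustrative; its rows echo the first few rows of the scraped table.

```python
import pandas as pd

# Toy frame with one unresolved row, mirroring the notebook's columns
toy = pd.DataFrame({
    'Neighborhood': ['Cantonment area', 'Domlur', 'Indiranagar'],
    'Latitude': ['', '12.961', '12.9784'],
    'Longitude': ['', '77.6387', '77.6408'],
})

# errors='coerce' turns unparseable strings (here '') into NaN
for col in ['Latitude', 'Longitude']:
    toy[col] = pd.to_numeric(toy[col], errors='coerce')

# Drop rows where either coordinate is missing
toy = toy.dropna(subset=['Latitude', 'Longitude']).reset_index(drop=True)
print(toy)
```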

In [9]:
df.to_csv('BLR_localities.csv')

### 4. Plot the neighborhoods on a map

In [10]:
# create map of Bangalore using latitude and longitude values
latitude = 12.95
longitude = 77.6
map_blr = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, long, label in zip(df['Latitude'], df['Longitude'], df['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, long],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_blr)

map_blr

### 5. Use Foursquare API to explore the neighborhoods

In [11]:
CLIENT_ID = 'KXF3LOAKFBTIXISMFZ2H4WMDYSVY3ZOHRVDJKMD5GRXSNUD1HIDE'
CLIENT_SECRET = 'PDCNDXHVYUAGHGSTBSQYF1445FWCRY0F4AMHIDJENBRXK1ODHIDE'
VERSION = '20180605'
LIMIT = 100

###### We will now fetch venues that are within a radius of 1 km (the default `radius=1000` is in metres)

In [12]:
def getNearbyVenues(names, latitudes, longitudes, radius=1000):
    
    venues_list=[]
    for name, lat, long in zip(names, latitudes, longitudes):
        #print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            long, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            long, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = [
                    'Neighborhood',
                    'Neighborhood Latitude',
                    'Neighborhood Longitude',
                    'Venue',
                    'Venue Latitude',
                    'Venue Longitude',
                    'Venue Category']
    
    return(nearby_venues)
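
The request URL in `getNearbyVenues` is assembled with a long format string. A small sketch of the same query built from a dictionary makes each parameter explicit, in particular that `radius` is passed in metres. The function name `build_explore_url` and the placeholder credentials `MY_ID` and `MY_SECRET` are assumptions for illustration; no request is sent.

```python
from urllib.parse import urlencode


def build_explore_url(client_id, client_secret, version, lat, lng,
                      radius=1000, limit=100):
    """Assemble the venues/explore URL used in the notebook (radius in metres)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': f'{lat},{lng}',   # latitude,longitude of the neighborhood
        'radius': radius,
        'limit': limit,
    }
    # urlencode escapes the comma in 'll' as %2C
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)


url = build_explore_url('MY_ID', 'MY_SECRET', '20180605', 12.961, 77.6387)
print(url)
```

Keeping the parameters in a dictionary also makes it easy to pass them to `requests.get(base_url, params=params)` instead of building the string by hand.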

In [13]:
blr_venues = getNearbyVenues(names=df['Neighborhood'],
                             latitudes=df['Latitude'],
                             longitudes=df['Longitude'])

In [14]:
blr_venues.head(10)

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Domlur,12.961,77.6387,Lavonne,12.963909,77.638579,Café
1,Domlur,12.961,77.6387,Barbeque Nation,12.962684,77.641599,BBQ Joint
2,Domlur,12.961,77.6387,Smoke House Deli,12.965584,77.641498,Deli / Bodega
3,Domlur,12.961,77.6387,Anand Sweets,12.960166,77.645168,Indian Restaurant
4,Domlur,12.961,77.6387,League of Extraordinary Gamers,12.967099,77.636919,Gaming Cafe
5,Domlur,12.961,77.6387,Starbucks,12.965649,77.641718,Coffee Shop
6,Domlur,12.961,77.6387,Bodycraft,12.968497,77.641289,Spa
7,Domlur,12.961,77.6387,Puma Social Club,12.967254,77.641212,Nightclub
8,Domlur,12.961,77.6387,Big Pitcher,12.960101,77.646946,Brewery
9,Domlur,12.961,77.6387,Murphy's,12.953659,77.639397,Irish Pub


In [15]:
blr_venues.shape

(1704, 7)

In [16]:
blr_venues.to_csv('BLR_venues.csv')

###### Let's find out the count of venues we have for each neighborhood

In [17]:
blr_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Anjanapura,2,2,2,2,2,2
Arekere,21,21,21,21,21,21
BTM Layout,46,46,46,46,46,46
Banashankari,18,18,18,18,18,18
Banaswadi,17,17,17,17,17,17
Basavanagudi,54,54,54,54,54,54
Basaveshwaranagar,23,23,23,23,23,23
Begur,8,8,8,8,8,8
Bellandur,37,37,37,37,37,37
Bommanahalli,7,7,7,7,7,7


###### And the unique venue categories...

In [34]:
print('There are {} unique categories.'.format(len(blr_venues['Venue Category'].unique())))

There are 189 unique categories.


In [19]:
blr_venues.groupby('Venue Category').count().sort_values(['Venue'], axis=0, ascending=False)

Unnamed: 0_level_0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude
Venue Category,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Indian Restaurant,311,311,311,311,311,311
Café,116,116,116,116,116,116
Fast Food Restaurant,71,71,71,71,71,71
Pizza Place,58,58,58,58,58,58
Ice Cream Shop,57,57,57,57,57,57
Chinese Restaurant,55,55,55,55,55,55
Coffee Shop,52,52,52,52,52,52
Bakery,44,44,44,44,44,44
Department Store,42,42,42,42,42,42
Hotel,38,38,38,38,38,38


### 6. Analyze each neighborhood

In [20]:
# one hot encoding
blr_onehot = pd.get_dummies(blr_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
blr_onehot['Neighborhood'] = blr_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [blr_onehot.columns[-1]] + list(blr_onehot.columns[:-1])
blr_onehot = blr_onehot[fixed_columns]

blr_onehot.head(10)

Unnamed: 0,Yoga Studio,ATM,Accessories Store,Afghan Restaurant,American Restaurant,Andhra Restaurant,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,...,Train Station,Tram Station,Travel & Transport,Turkish Restaurant,Udupi Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Watch Shop,Wine Bar,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
6,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
7,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
8,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
9,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [21]:
blr_onehot.shape

(1704, 189)

In [22]:
blr_grouped = blr_onehot.groupby('Neighborhood').mean().reset_index()
blr_grouped.head(10)

Unnamed: 0,Neighborhood,Yoga Studio,ATM,Accessories Store,Afghan Restaurant,American Restaurant,Andhra Restaurant,Arcade,Art Gallery,Art Museum,...,Train Station,Tram Station,Travel & Transport,Turkish Restaurant,Udupi Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Watch Shop,Wine Bar,Women's Store
0,Anjanapura,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Arekere,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,BTM Layout,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.065217,0.0,0.0,0.0,0.0
3,Banashankari,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Banaswadi,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0
5,Basavanagudi,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Basaveshwaranagar,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,Begur,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Bellandur,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.027027,0.0,0.0,0.027027,0.0,0.0,0.0,0.0
9,Bommanahalli,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [23]:
blr_grouped.shape

(62, 189)
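
Taking the mean of the one-hot columns per neighborhood gives, for each category, the share of that neighborhood's venues that fall in it. For example, BTM Layout has 46 venues and a Café value of 0.065217, which corresponds to 3 cafés out of 46. The sketch below reproduces that calculation in plain Python on a toy list of venues; the data are made up for illustration.

```python
from collections import Counter, defaultdict

# (neighborhood, venue category) pairs; toy data, not the scraped results
venues = [
    ('Domlur', 'Café'),
    ('Domlur', 'BBQ Joint'),
    ('Domlur', 'Coffee Shop'),
    ('Domlur', 'Café'),
    ('Begur', 'Indian Restaurant'),
    ('Begur', 'Bakery'),
]

# Count venues per category within each neighborhood
counts = defaultdict(Counter)
for hood, category in venues:
    counts[hood][category] += 1

# Share of a category = its count / total venues in that neighborhood,
# which is what the mean of a 0/1 one-hot column computes
shares = {
    hood: {cat: n / sum(c.values()) for cat, n in c.items()}
    for hood, c in counts.items()
}

print(shares['Domlur']['Café'])           # 2 of 4 venues
print(shares['Begur'].get('Café', 0.0))   # no cafés in the toy data
```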

In [24]:
len(blr_grouped[blr_grouped["Café"] > 0])

38

In [25]:
blr_cafe = blr_grouped[["Neighborhood","Café"]]

In [26]:
blr_cafe.head(10)

Unnamed: 0,Neighborhood,Café
0,Anjanapura,0.0
1,Arekere,0.095238
2,BTM Layout,0.065217
3,Banashankari,0.111111
4,Banaswadi,0.176471
5,Basavanagudi,0.018519
6,Basaveshwaranagar,0.086957
7,Begur,0.0
8,Bellandur,0.081081
9,Bommanahalli,0.0


### 7. Cluster neighborhoods

###### Using k-means to group the neighborhoods in Bangalore into 4 clusters based on their café share

In [41]:
# set number of clusters
kclusters = 4

blr_clustering = blr_cafe.drop(["Neighborhood"], axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(blr_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 2, 2, 2, 1, 0, 2, 0, 2, 0])
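
Because the clustering uses a single feature (the café share), k-means reduces to splitting a list of numbers into intervals. The sketch below is a simplified Lloyd iteration in plain Python, not scikit-learn's implementation: it uses hand-picked starting centers, three clusters instead of four, and only the ten café shares displayed earlier. The function name `kmeans_1d` is hypothetical.

```python
def kmeans_1d(values, centers, iterations=20):
    """Plain Lloyd iterations on 1-D data; returns (labels, centers)."""
    centers = list(centers)
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the nearest center for each value
        labels = [min(range(len(centers)), key=lambda k: abs(v - centers[k]))
                  for v in values]
        # Update step: move each center to the mean of its members
        new_centers = []
        for k, old in enumerate(centers):
            members = [v for v, lab in zip(values, labels) if lab == k]
            new_centers.append(sum(members) / len(members) if members else old)
        if new_centers == centers:
            break  # converged: centers no longer move
        centers = new_centers
    return labels, centers


# Café shares of the first ten neighborhoods shown in the blr_cafe table
cafe_share = [0.0, 0.095238, 0.065217, 0.111111, 0.176471,
              0.018519, 0.086957, 0.0, 0.081081, 0.0]

labels, centers = kmeans_1d(cafe_share, centers=[0.0, 0.1, 0.2])
print(labels)
```

On these ten rows the sketch produces the same grouping as the `kmeans.labels_` output above, only with different label numbers, since cluster IDs in k-means are arbitrary.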

In [42]:
# create a new dataframe that holds the café share and the cluster label for each neighborhood
blr_merged = blr_cafe.copy()

# add clustering labels
blr_merged["Cluster Labels"] = kmeans.labels_

In [43]:
blr_merged.head(10)

Unnamed: 0,Neighborhood,Café,Cluster Labels
0,Anjanapura,0.0,0
1,Arekere,0.095238,2
2,BTM Layout,0.065217,2
3,Banashankari,0.111111,2
4,Banaswadi,0.176471,1
5,Basavanagudi,0.018519,0
6,Basaveshwaranagar,0.086957,2
7,Begur,0.0,0
8,Bellandur,0.081081,2
9,Bommanahalli,0.0,0


In [44]:
# join blr_merged with df to add latitude/longitude for each neighborhood
blr_merged = blr_merged.join(df.set_index("Neighborhood"), on="Neighborhood")

print(blr_merged.shape)
blr_merged.head() # check the last columns!

(62, 5)


Unnamed: 0,Neighborhood,Café,Cluster Labels,Latitude,Longitude
0,Anjanapura,0.0,0,12.8549,77.5543
1,Arekere,0.095238,2,12.8874,77.5969
2,BTM Layout,0.065217,2,12.9166,77.6101
3,Banashankari,0.111111,2,12.9255,77.5468
4,Banaswadi,0.176471,1,13.0104,77.6482


In [45]:
# sort the results by Cluster Labels
print(blr_merged.shape)
blr_merged.sort_values(["Cluster Labels"], inplace=True)
blr_merged

(62, 5)


Unnamed: 0,Neighborhood,Café,Cluster Labels,Latitude,Longitude
0,Anjanapura,0.000000,0,12.8549,77.5543
21,Hulimavu,0.000000,0,12.8791,77.6098
24,Jalahalli,0.000000,0,13.0528,77.5419
28,Kamakshipalya,0.000000,0,12.9857,77.5267
32,Kothnur,0.000000,0,12.8706,77.5831
35,Lingarajapuram,0.000000,0,13.0130,77.6262
38,Mahalakshmi Layout,0.032258,0,13.0146,77.5514
40,Marathahalli,0.000000,0,12.9569,77.7011
20,Horamavu,0.000000,0,13.0326,77.6583
41,Mathikere,0.025641,0,13.0334,77.5640


In [53]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(blr_merged['Latitude'], blr_merged['Longitude'], blr_merged['Neighborhood'], blr_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
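
The marker color is looked up with `rainbow[cluster-1]`. Since the cluster labels start at 0, cluster 0 is looked up at index -1, which Python resolves to the last element of the list. The colors stay distinct, but they are shifted by one position relative to the labels. A small sketch with stand-in color names (not the actual hex values) shows the resulting mapping:

```python
kclusters = 4
# Stand-ins for the hex strings stored in `rainbow`
palette = ['color_a', 'color_b', 'color_c', 'color_d']

# Same lookup as in the map cell: index cluster-1
mapping = {cluster: palette[cluster - 1] for cluster in range(kclusters)}

for cluster, color in mapping.items():
    # cluster 0 wraps around to the last palette entry
    print(cluster, color)
```

If a direct correspondence between label and palette position is preferred, `rainbow[cluster]` would give it.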

In [54]:
map_clusters.save('map_clusters.html')

In [52]:
blr_merged.to_csv('blr_cluster.csv')

### 8. Examine clusters

In [47]:
blr_merged.loc[blr_merged['Cluster Labels'] == 0]

Unnamed: 0,Neighborhood,Café,Cluster Labels,Latitude,Longitude
0,Anjanapura,0.0,0,12.8549,77.5543
21,Hulimavu,0.0,0,12.8791,77.6098
24,Jalahalli,0.0,0,13.0528,77.5419
28,Kamakshipalya,0.0,0,12.9857,77.5267
32,Kothnur,0.0,0,12.8706,77.5831
35,Lingarajapuram,0.0,0,13.013,77.6262
38,Mahalakshmi Layout,0.032258,0,13.0146,77.5514
40,Marathahalli,0.0,0,12.9569,77.7011
20,Horamavu,0.0,0,13.0326,77.6583
41,Mathikere,0.025641,0,13.0334,77.564


In [48]:
blr_merged.loc[blr_merged['Cluster Labels'] == 1]

Unnamed: 0,Neighborhood,Café,Cluster Labels,Latitude,Longitude
49,Rajarajeshwari Nagar,0.25,1,12.9149,77.5206
4,Banaswadi,0.176471,1,13.0104,77.6482
33,Krishnarajapuram,0.285714,1,13.017,77.7044


In [49]:
blr_merged.loc[blr_merged['Cluster Labels'] == 2]

Unnamed: 0,Neighborhood,Café,Cluster Labels,Latitude,Longitude
26,Jeevanbheemanagar,0.078947,2,12.9642,77.6581
47,R. T. Nagar,0.045455,2,13.0196,77.5968
48,Rajajinagar,0.074074,2,12.9982,77.553
51,Sadashivanagar,0.055556,2,13.0068,77.5813
53,Shivajinagar,0.05,2,12.9857,77.6057
2,BTM Layout,0.065217,2,12.9166,77.6101
55,Uttarahalli,0.055556,2,12.907,77.5521
1,Arekere,0.095238,2,12.8874,77.5969
57,Vasanth Nagar,0.04918,2,12.992,77.5943
58,Vidyaranyapura,0.090909,2,13.0811,77.5562


In [50]:
blr_merged.loc[blr_merged['Cluster Labels'] == 3]

Unnamed: 0,Neighborhood,Café,Cluster Labels,Latitude,Longitude
42,Nagarbhavi,0.5,3,12.9719,77.5127
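
To compare the clusters at a glance, one could average the café share within each cluster. The sketch below does this in plain Python using only the rows displayed in the four tables above, so the figures for clusters 0 and 2 cover just the ten rows shown and are indicative rather than exact.

```python
from statistics import mean

# Café shares copied from the cluster tables above (displayed rows only)
shown = {
    0: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.032258, 0.0, 0.0, 0.025641],
    1: [0.25, 0.176471, 0.285714],
    2: [0.078947, 0.045455, 0.074074, 0.055556, 0.05,
        0.065217, 0.055556, 0.095238, 0.04918, 0.090909],
    3: [0.5],
}

cluster_mean = {label: mean(vals) for label, vals in shown.items()}

# Cluster labels ordered from lowest to highest mean café share
ranking = sorted(cluster_mean, key=cluster_mean.get)

for label in ranking:
    print(label, round(cluster_mean[label], 3))
```

On the full dataframe the equivalent would be `blr_merged.groupby('Cluster Labels')['Café'].mean()`.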


###### Observation:

Most of the cafés are concentrated in the central area of Bangalore, and most of those neighborhoods belong to cluster 2. Cluster 0 has a very low concentration of cafés, and its neighborhoods are mostly away from the center of the city (Basavanagudi, for example, has a café share of about 0.02 and cluster label 0). However, the plotted map shows two neighborhoods that lie in the central part of the city yet belong to this low-café cluster. We can infer that these two neighborhoods have not yet been commercialized, so this project recommends Basavanagudi and Seshadripuram as ideal places to set up a new café.