
# Capstone Project
This project performs a clustering analysis of the neighbourhoods of Bengaluru.

## Importing Libraries

In [0]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
from geopy.geocoders import Nominatim
import re
import folium
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans

## Scraping data from the web with Beautiful Soup

In [0]:
source = requests.get('https://en.wikipedia.org/wiki/List_of_neighbourhoods_in_Bangalore').text
soup = BeautifulSoup(source, 'lxml')
tables = soup.find_all('table')
dfs = []
# The first eight tables list the neighbourhoods of each region
for table in tables[:8]:
    rows = []
    for tr in table.find_all('tr'):
        # Text of every <td> cell in the row
        rows.append([td.text for td in tr.find_all('td')])
    neighbourhoods = pd.DataFrame(rows, columns=["Neighbourhood", "Image_link", "Information"])
    neighbourhoods = neighbourhoods.replace('\n', '', regex=True)
    # The first row comes from the table header, which has no <td> cells
    neighbourhoods = neighbourhoods.drop(neighbourhoods.index[0])
    dfs.append(neighbourhoods)

In [4]:
headings = soup.findAll('span', class_ = 'mw-headline')
regions = [heading.text for heading in headings]
regions = regions[0:8]
print('The regions in Bangalore are:', regions)

The regions in Bangalore are: ['Central', 'Eastern', 'North-Eastern', 'Northern', 'South-Eastern', 'Southern', 'Southern suburbs', 'Western']


In [5]:
# Tag each neighbourhood with the region its table belongs to
frames = [pd.DataFrame({'Region': region, 'Neighbourhood': table['Neighbourhood']})
          for region, table in zip(regions, dfs)]
# DataFrame.append was removed in pandas 2.0, so concatenate the pieces instead
neighbourhood_data = pd.concat(frames, ignore_index=True)
# Add the city name so the geocoder searches in the right place
neighbourhood_data['Neighbourhood'] = neighbourhood_data['Neighbourhood'] + ', Bengaluru'
neighbourhood_data.sample(5)

|  | Region | Neighbourhood |
|---|---|---|
| 3 | Central | Jeevanbheemanagar, Bengaluru |
| 16 | Eastern | Marathahalli, Bengaluru |
| 26 | Northern | Hebbal, Bengaluru |
| 20 | North-Eastern | HBR Layout, Bengaluru |
| 29 | Northern | Peenya, Bengaluru |


## Geolocator for coordinates
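The geopy library's Nominatim geocoder is used to obtain the coordinates of each neighbourhood. If a "<neighbourhood>, Bengaluru" query returns no match, it is retried with the older city name "Bangalore"; if that also fails, the placeholder -1, -1 is stored.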

In [6]:
geolocator = Nominatim(user_agent="bengaluru_explorer")
latitudes = []
longitudes = []
for neighbourhood in neighbourhood_data['Neighbourhood']:
    location = geolocator.geocode(neighbourhood)
    if location is None:
        # No match: retry with the older city name "Bangalore"
        neighbourhood_modified = re.sub('Bengaluru', 'Bangalore', neighbourhood)
        print('Modified Neighbourhood -->', neighbourhood_modified)
        location = geolocator.geocode(neighbourhood_modified)
    if location is not None:
        latitude, longitude = location.latitude, location.longitude
        print('The geographical coordinates of {} are {}, {}.'.format(neighbourhood, latitude, longitude))
    else:
        # Placeholder for failed lookups; these rows are filtered out in the next step
        latitude, longitude = -1, -1
        print("We couldn't obtain the coordinates for {}.".format(neighbourhood))
    latitudes.append(latitude)
    longitudes.append(longitude)

neighbourhood_data['Latitude'] = latitudes
neighbourhood_data['Longitude'] = longitudes

The geograpical coordinate of Cantonment area, Bengaluru are 8.46810875, -13.2505252481029.
The geograpical coordinate of Domlur, Bengaluru are 12.9624669, 77.6381958.
The geograpical coordinate of Indiranagar, Bengaluru are 12.9732913, 77.6404672.
Modified Neighbourhood --> Jeevanbheemanagar, Bangalore
We couldn't obtain the corrdinates for Jeevanbheemanagar, Bengaluru.
The geograpical coordinate of Malleswaram, Bengaluru are 13.0163411, 77.5586641823841.
The geograpical coordinate of Pete area, Bengaluru are 48.4373068, -124.099707.
The geograpical coordinate of Sadashivanagar, Bengaluru are 13.0077079, 77.5795893.
The geograpical coordinate of Seshadripuram, Bengaluru are 12.9931876, 77.5753419.
The geograpical coordinate of Shivajinagar, Bengaluru are 12.986391, 77.6075416.
The geograpical coordinate of Ulsoor, Bengaluru are 12.9778793, 77.6246697.
The geograpical coordinate of Vasanth Nagar, Bengaluru are 12.98872125, 77.5851687760182.
The geograpical coordinate of Bellandur, Beng
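The retry logic above (try "…, Bengaluru", fall back to "…, Bangalore", otherwise store -1, -1) can be isolated into a small function and checked without network access. A minimal sketch, assuming `geocode` is any callable that returns an object with `latitude`/`longitude` attributes or `None`, as geopy's `Nominatim.geocode` does; `geocode_with_fallback`, `fake_geocode` and the "Example Layout" entry are hypothetical names used only for illustration.

```python
from collections import namedtuple

# Stand-in for geopy's Location object: only the two attributes read here
Location = namedtuple('Location', ['latitude', 'longitude'])

def geocode_with_fallback(query, geocode, primary='Bengaluru', fallback='Bangalore'):
    """Geocode `query`; if nothing is found, retry with the city name swapped.
    Returns (-1, -1) as a sentinel when both lookups fail."""
    location = geocode(query)
    if location is None:
        location = geocode(query.replace(primary, fallback))
    if location is None:
        return (-1, -1)
    return (location.latitude, location.longitude)

# Hypothetical offline stub standing in for Nominatim
_known = {
    'Domlur, Bengaluru': Location(12.9624669, 77.6381958),
    'Example Layout, Bangalore': Location(12.9, 77.6),
}

def fake_geocode(query):
    return _known.get(query)

print(geocode_with_fallback('Domlur, Bengaluru', fake_geocode))
print(geocode_with_fallback('Example Layout, Bengaluru', fake_geocode))
print(geocode_with_fallback('Nowhere, Bengaluru', fake_geocode))
```

With the real service, Nominatim's usage policy allows at most about one request per second; geopy's `geopy.extra.rate_limiter.RateLimiter(geolocator.geocode, min_delay_seconds=1)` returns a throttled callable that could be passed as the `geocode` argument.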

## Filtering for correct coordinates
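Bengaluru lies at roughly 13°N, 77.6°E, so every valid coordinate should fall inside the box 12 < latitude < 14 and 77 < longitude < 78. Any neighbourhood outside that box, whether the geocoder matched a place elsewhere in the world or returned the -1, -1 placeholder, is removed from the analysis.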

In [7]:
# Bengaluru lies inside this rough latitude/longitude box
in_bengaluru = ((neighbourhood_data['Latitude'] > 12) &
                (neighbourhood_data['Latitude'] < 14) &
                (neighbourhood_data['Longitude'] > 77) &
                (neighbourhood_data['Longitude'] < 78))

location_issue = neighbourhood_data[~in_bengaluru]
neighbourhood_data = neighbourhood_data[in_bengaluru]

neighbourhood_data = neighbourhood_data.reset_index()
print(f"""The coordinates for these {len(location_issue)} neighbourhoods: 
{location_issue['Neighbourhood'].tolist()} 
couldn't be obtained. 
Hence I've removed them from analysis""")
print('The final shape of neighbourhood data is', neighbourhood_data.shape) 

The coordinates for these 5 neighbourhoods: 
['Cantonment area, Bengaluru', 'Jeevanbheemanagar, Bengaluru', 'Pete area, Bengaluru', 'CV Raman Nagar, Bengaluru', 'R. T. Nagar, Bengaluru'] 
couldn't be obtained. 
Hence I've removed them from analysis
The final shape of neighbourhood data is (60, 5)
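The bounding-box test amounts to a one-line predicate. A minimal sketch using coordinates from the geocoding output above (the -1, -1 pair is the placeholder for a failed lookup); `in_bengaluru` is a hypothetical helper name, not part of the notebook.

```python
def in_bengaluru(lat, lon):
    """True if (lat, lon) lies strictly inside the rough box used for Bengaluru:
    12 < latitude < 14 and 77 < longitude < 78 (degrees)."""
    return 12 < lat < 14 and 77 < lon < 78

samples = {
    'Domlur': (12.9624669, 77.6381958),
    'Cantonment area': (8.46810875, -13.2505252481029),  # matched a place far from Bengaluru
    'Pete area': (48.4373068, -124.099707),              # matched a place far from Bengaluru
    'Jeevanbheemanagar': (-1, -1),                       # failed lookup placeholder
}
for name, (lat, lon) in samples.items():
    print(name, in_bengaluru(lat, lon))
```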


In [8]:
neighbourhood_data.head()

|  | index | Region | Neighbourhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | 1 | Central | Domlur, Bengaluru | 12.962467 | 77.638196 |
| 1 | 2 | Central | Indiranagar, Bengaluru | 12.973291 | 77.640467 |
| 2 | 4 | Central | Malleswaram, Bengaluru | 13.016341 | 77.558664 |
| 3 | 6 | Central | Sadashivanagar, Bengaluru | 13.007708 | 77.579589 |
| 4 | 7 | Central | Seshadripuram, Bengaluru | 12.993188 | 77.575342 |


## Plotting the Initial Neighborhoods with Folium

In [9]:
location = geolocator.geocode('Bengaluru, India')
bengaluru_latitude = location.latitude
bengaluru_longitude = location.longitude
print(f"The coordinates for Bengaluru are: {bengaluru_latitude}, {bengaluru_longitude}")

The coordinates for Bengaluru are: 12.9791198, 77.5912997


In [10]:
map_bengaluru = folium.Map(location=[bengaluru_latitude, bengaluru_longitude], zoom_start=12)

for lat, lng, region, neighbourhood in zip(neighbourhood_data['Latitude'],
                                           neighbourhood_data['Longitude'],
                                           neighbourhood_data['Region'],
                                           neighbourhood_data['Neighbourhood']):
    # Popup text: "Neighbourhood, Region" (the ", Bengaluru" suffix is dropped)
    label = '{}, {}'.format(neighbourhood.split(',')[0], region)
    popup = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=popup,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_bengaluru)
    
map_bengaluru

## Getting venue data from the Foursquare API

In [30]:
CLIENT_ID = 'values removed' 
CLIENT_SECRET = 'values removed' 
VERSION = '20180605' 

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: values removed
CLIENT_SECRET:values removed
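Hard-coding and printing the client secret risks leaking it whenever the notebook is shared. A minimal sketch of reading the credentials from environment variables and printing only a masked form; the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET`, and the helpers `load_foursquare_credentials` and `mask`, are hypothetical choices, not anything Foursquare prescribes.

```python
import os

def load_foursquare_credentials():
    """Read the credentials from (hypothetical) environment variables.
    Raises KeyError naming the missing variable."""
    try:
        return os.environ['FOURSQUARE_CLIENT_ID'], os.environ['FOURSQUARE_CLIENT_SECRET']
    except KeyError as missing:
        raise KeyError(f'Set the environment variable {missing} before running') from None

def mask(secret, visible=4):
    """Hide all but the last `visible` characters of a secret."""
    return '*' * max(len(secret) - visible, 0) + secret[-visible:]

# Example usage with placeholder values (only used if the variables are unset)
os.environ.setdefault('FOURSQUARE_CLIENT_ID', 'demo-client-id')
os.environ.setdefault('FOURSQUARE_CLIENT_SECRET', 'demo-secret-1234')
CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()
print('CLIENT_SECRET:', mask(CLIENT_SECRET))
```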


In [0]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, LIMIT=25):
    """Query the Foursquare v2 'explore' endpoint around each neighbourhood centre
    and return one row per venue (name, location and primary category)."""
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
        # Let requests build and URL-encode the query string
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': LIMIT,
        }
        response = requests.get('https://api.foursquare.com/v2/venues/explore', params=params)
        results = response.json()['response']['groups'][0]['items']

        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    # Flatten the per-neighbourhood lists into a single table
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood',
                             'Neighborhood Latitude',
                             'Neighborhood Longitude',
                             'Venue',
                             'Venue Latitude',
                             'Venue Longitude',
                             'Venue Category']
    
    return(nearby_venues)
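The core of getNearbyVenues is flattening the nested explore response (response, then groups[0], then items, then venue) into one tuple per venue. The sketch below isolates that step as a pure function so it can be checked on a hand-made payload containing only the fields the function reads; the payload values are made up. Unlike the original, it skips venues whose categories list is empty (an assumption that such records can occur), which would otherwise raise an IndexError. `flatten_explore_items` is a hypothetical name.

```python
def flatten_explore_items(name, lat, lng, payload):
    """Turn one 'explore' JSON payload into (neighbourhood, lat, lng,
    venue name, venue lat, venue lng, primary category) tuples."""
    rows = []
    for item in payload['response']['groups'][0]['items']:
        venue = item['venue']
        categories = venue.get('categories') or []
        if not categories:
            continue  # nothing to cluster on
        rows.append((name, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     categories[0]['name']))
    return rows

# Hand-made payload mimicking the fields used above (values are made up)
payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Cafe A', 'location': {'lat': 12.96, 'lng': 77.64},
               'categories': [{'name': 'Café'}]}},
    {'venue': {'name': 'Unlabelled spot', 'location': {'lat': 12.97, 'lng': 77.63},
               'categories': []}},
]}]}}

rows = flatten_explore_items('Domlur, Bengaluru', 12.9624669, 77.6381958, payload)
for row in rows:
    print(row)
```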

In [13]:
bengaluru_venues = getNearbyVenues(names=neighbourhood_data['Neighbourhood'],
                                   latitudes=neighbourhood_data['Latitude'],
                                   longitudes=neighbourhood_data['Longitude'], 
                                   radius = 2000
                                  )


Domlur, Bengaluru
Indiranagar, Bengaluru
Malleswaram, Bengaluru
Sadashivanagar, Bengaluru
Seshadripuram, Bengaluru
Shivajinagar, Bengaluru
Ulsoor, Bengaluru
Vasanth Nagar, Bengaluru
Bellandur, Bengaluru
Hoodi, Bengaluru
Krishnarajapuram, Bengaluru
Mahadevapura, Bengaluru
Marathahalli, Bengaluru
Varthur, Bengaluru
Whitefield, Bengaluru
Banaswadi, Bengaluru
HBR Layout, Bengaluru
Horamavu, Bengaluru
Kalyan Nagar, Bengaluru
Kammanahalli, Bengaluru
Lingarajapuram, Bengaluru
Ramamurthy Nagar, Bengaluru
Hebbal, Bengaluru
Jalahalli, Bengaluru
Mathikere, Bengaluru
Peenya, Bengaluru
Vidyaranyapura, Bengaluru
Yelahanka, Bengaluru
Yeshwanthpur, Bengaluru
Bommanahalli, Bengaluru
Bommasandra, Bengaluru
BTM Layout, Bengaluru
Electronic City, Bengaluru
HSR Layout, Bengaluru
Koramangala, Bengaluru
Madiwala, Bengaluru
Banashankari, Bengaluru
Basavanagudi, Bengaluru
Girinagar, Bengaluru
J. P. Nagar, Bengaluru
Jayanagar, Bengaluru
Kumaraswamy Layout, Bengaluru
Padmanabhanagar, Bengaluru
Uttarahalli, Benga
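The call above searches within radius=2000 metres of each neighbourhood centre, yet several centres are much closer together than that, so their search circles overlap and the same venue can be counted for more than one neighbourhood. A quick check with the haversine great-circle formula (a standard approximation that treats the Earth as a sphere of radius 6371 km), using the Domlur and Indiranagar coordinates geocoded earlier; `haversine_km` is a hypothetical helper.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

domlur = (12.9624669, 77.6381958)
indiranagar = (12.9732913, 77.6404672)
d = haversine_km(*domlur, *indiranagar)
print(f'Domlur to Indiranagar: {d:.2f} km')
# Two 2 km search circles overlap whenever their centres are less than 4 km apart
print('Search circles overlap:', d < 2 * 2.0)
```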

In [14]:
print('Shape of dataset:', bengaluru_venues.shape)
print('Unique neighborhood venue categories in our dataset:', bengaluru_venues['Venue Category'].nunique())

Shape of dataset: (1297, 7)
Unique neighborhood venue categories in our dataset: 138


In [15]:
bengaluru_venues.groupby('Neighborhood').count()

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Anjanapura, Bengaluru | 4 | 4 | 4 | 4 | 4 | 4 |
| Arekere, Bengaluru | 25 | 25 | 25 | 25 | 25 | 25 |
| BTM Layout, Bengaluru | 25 | 25 | 25 | 25 | 25 | 25 |
| Banashankari, Bengaluru | 25 | 25 | 25 | 25 | 25 | 25 |
| Banaswadi, Bengaluru | 25 | 25 | 25 | 25 | 25 | 25 |
| Basavanagudi, Bengaluru | 25 | 25 | 25 | 25 | 25 | 25 |
| Basaveshwaranagar, Bengaluru | 25 | 25 | 25 | 25 | 25 | 25 |
| Begur, Bengaluru | 4 | 4 | 4 | 4 | 4 | 4 |
| Bellandur, Bengaluru | 25 | 25 | 25 | 25 | 25 | 25 |
| Bommanahalli, Bengaluru | 25 | 25 | 25 | 25 | 25 | 25 |


## Analysing Each Neighborhood

In [16]:
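# Keep the neighbourhood in the index so it survives the one-hot encoding
bengaluru_venues = bengaluru_venues.set_index('Neighborhood')
# One 0/1 column per venue category; empty prefixes keep the bare category names
bengaluru_onehot = pd.get_dummies(bengaluru_venues[['Venue Category']], prefix="", prefix_sep="", dtype=int)
# Foursquare also has a venue category called "Neighborhood"; drop that column so it
# does not clash with the index name when the index is reset
bengaluru_onehot = bengaluru_onehot.drop('Neighborhood', axis=1)
bengaluru_onehot = bengaluru_onehot.reset_index()
bengaluru_onehot.head()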

Unnamed: 0,Neighborhood,Afghan Restaurant,American Restaurant,Andhra Restaurant,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bar,Beer Garden,Bistro,Bookstore,Botanical Garden,Boutique,Bowling Alley,Breakfast Spot,Brewery,Building,Burger Joint,Burrito Place,Bus Station,Bus Stop,Café,Candy Store,Capitol Building,Chettinad Restaurant,Chinese Restaurant,Chocolate Shop,Clothing Store,Coffee Shop,Construction & Landscaping,Creperie,Cricket Ground,Cupcake Shop,Deli / Bodega,Department Store,...,Office,Pakistani Restaurant,Park,Performing Arts Venue,Pizza Place,Plaza,Pool,Pub,Punjabi Restaurant,Racetrack,Rajasthani Restaurant,Recreation Center,Residential Building (Apartment / Condo),Resort,Restaurant,Salad Place,Sandwich Place,Seafood Restaurant,Shop & Service,Shopping Mall,Smoke Shop,Snack Place,Soccer Field,South Indian Restaurant,Spa,Sporting Goods Shop,Steakhouse,Supermarket,Tea Room,Thai Restaurant,Theater,Toll Booth,Toy / Game Store,Trail,Train Station,Travel & Transport,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Women's Store,Yoga Studio
0,"Domlur, Bengaluru",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,"Domlur, Bengaluru",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,"Domlur, Bengaluru",0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,"Domlur, Bengaluru",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,"Domlur, Bengaluru",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [17]:
bengaluru_grouped = bengaluru_onehot.groupby('Neighborhood').mean().reset_index()
bengaluru_grouped

Unnamed: 0,Neighborhood,Afghan Restaurant,American Restaurant,Andhra Restaurant,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bar,Beer Garden,Bistro,Bookstore,Botanical Garden,Boutique,Bowling Alley,Breakfast Spot,Brewery,Building,Burger Joint,Burrito Place,Bus Station,Bus Stop,Café,Candy Store,Capitol Building,Chettinad Restaurant,Chinese Restaurant,Chocolate Shop,Clothing Store,Coffee Shop,Construction & Landscaping,Creperie,Cricket Ground,Cupcake Shop,Deli / Bodega,Department Store,...,Office,Pakistani Restaurant,Park,Performing Arts Venue,Pizza Place,Plaza,Pool,Pub,Punjabi Restaurant,Racetrack,Rajasthani Restaurant,Recreation Center,Residential Building (Apartment / Condo),Resort,Restaurant,Salad Place,Sandwich Place,Seafood Restaurant,Shop & Service,Shopping Mall,Smoke Shop,Snack Place,Soccer Field,South Indian Restaurant,Spa,Sporting Goods Shop,Steakhouse,Supermarket,Tea Room,Thai Restaurant,Theater,Toll Booth,Toy / Game Store,Trail,Train Station,Travel & Transport,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Women's Store,Yoga Studio
0,"Anjanapura, Bengaluru",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0
1,"Arekere, Bengaluru",0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.12,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,...,0.0,0.0,0.0,0.0,0.12,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.04,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"BTM Layout, Bengaluru",0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.04,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0
3,"Banashankari, Bengaluru",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.04,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.04,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.04,0.0,0.0,0.0,0.04,0.0,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0
4,"Banaswadi, Bengaluru",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.08,0.08,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,"Basavanagudi, Bengaluru",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.04,0.0,0.04,0.04,0.0,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.08,0.0,0.04,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,"Basaveshwaranagar, Bengaluru",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.12,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0
7,"Begur, Bengaluru",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,"Bellandur, Bengaluru",0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.04,0.04,0.0,...,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,"Bommanahalli, Bengaluru",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.08,0.0,0.0,0.04,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.04,0.04,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.04,0.0,0.0,0.0
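One-hot encoding followed by groupby(...).mean() turns each neighbourhood's venue list into category frequencies: the share of its returned venues that fall in each category. For example, Anjanapura has only 4 venues, so its Pool column is 1/4 = 0.25. The same computation in plain Python on a made-up list of (neighbourhood, category) pairs; `category_frequencies` is a hypothetical helper, and unlike the dense DataFrame it simply omits categories with zero count.

```python
from collections import Counter, defaultdict

def category_frequencies(venues):
    """Map each neighbourhood to {category: share of its venues in that category},
    i.e. the row-wise mean of a one-hot encoding (zero entries omitted)."""
    by_hood = defaultdict(list)
    for hood, category in venues:
        by_hood[hood].append(category)
    return {hood: {cat: n / len(cats) for cat, n in Counter(cats).items()}
            for hood, cats in by_hood.items()}

# Made-up sample: four venues for one neighbourhood, two for another
venues = [
    ('Hood A', 'Pool'), ('Hood A', 'Lounge'), ('Hood A', 'Train Station'), ('Hood A', 'Café'),
    ('Hood B', 'Café'), ('Hood B', 'Café'),
]
freqs = category_frequencies(venues)
print(freqs['Hood A']['Pool'])
print(freqs['Hood B'])
```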


In [0]:
def return_most_common_venues(row, num_top_venues):
    # Skip the first entry (the neighbourhood name) and rank the category frequencies
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)

    return row_categories_sorted.index.values[0:num_top_venues]

In [19]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create column names: 1st, 2nd, 3rd, then 4th ... 10th Most Common Venue
columns = ['Neighborhood']
for ind in range(num_top_venues):
    suffix = indicators[ind] if ind < len(indicators) else 'th'
    columns.append('{}{} Most Common Venue'.format(ind + 1, suffix))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = bengaluru_grouped['Neighborhood']

for ind in np.arange(bengaluru_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(bengaluru_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

|  | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Anjanapura, Bengaluru | Pool | Lounge | Train Station | Residential Building (Apartment / Condo) | Yoga Studio | Dive Bar | Falafel Restaurant | Electronics Store | Eastern European Restaurant | Dumpling Restaurant |
| 1 | Arekere, Bengaluru | Ice Cream Shop | Pizza Place | Indian Restaurant | Café | Multiplex | Beer Garden | Department Store | Chinese Restaurant | Dumpling Restaurant | Rajasthani Restaurant |
| 2 | BTM Layout, Bengaluru | Ice Cream Shop | Indian Restaurant | Bakery | Indie Movie Theater | Burger Joint | Gym | Italian Restaurant | Garden | Furniture / Home Store | Mediterranean Restaurant |
| 3 | Banashankari, Bengaluru | Ice Cream Shop | Indian Restaurant | Fast Food Restaurant | South Indian Restaurant | Performing Arts Venue | Burger Joint | Seafood Restaurant | Breakfast Spot | Snack Place | Café |
| 4 | Banaswadi, Bengaluru | Indian Restaurant | Ice Cream Shop | Korean Restaurant | BBQ Joint | Bakery | Bistro | Pizza Place | Pub | Chinese Restaurant | Falafel Restaurant |
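Because sort_values ranks every category, neighbourhoods with few venues get filler entries: Anjanapura has only four venues, one per category, so its 5th to 10th "most common" venues (Yoga Studio, Dive Bar, ...) all have frequency 0 and their order is arbitrary. A minimal sketch of a variant that keeps only categories that actually occur and breaks ties alphabetically; both are choices made here, not in the original, and `top_venues` is a hypothetical name.

```python
def top_venues(freqs, n):
    """Return up to n categories with non-zero frequency, most frequent first;
    ties are broken alphabetically so the ordering is reproducible."""
    present = [(cat, f) for cat, f in freqs.items() if f > 0]
    present.sort(key=lambda cf: (-cf[1], cf[0]))
    return [cat for cat, _ in present[:n]]

# Frequencies shaped like the Anjanapura row: four venues, one per category
anjanapura = {'Pool': 0.25, 'Lounge': 0.25, 'Train Station': 0.25,
              'Residential Building (Apartment / Condo)': 0.25,
              'Yoga Studio': 0.0, 'Dive Bar': 0.0}
print(top_venues(anjanapura, 10))
```

Rows with fewer than ten entries would then need padding (for example with empty strings) before being written into the ten "Most Common Venue" columns.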


## Clustering with k-means (unsupervised learning)

In [20]:
kclusters = 5

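# Features: the per-neighbourhood venue-category frequencies
# (positional axis arguments to drop were removed in pandas 2.0)
bengaluru_grouped_clustering = bengaluru_grouped.drop('Neighborhood', axis=1)

# run k-means clustering; n_init=10 (the default in older scikit-learn releases)
# is pinned because newer releases default to n_init='auto'
kmeans = KMeans(n_clusters=kclusters, random_state=0, n_init=10).fit(bengaluru_grouped_clustering)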

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([4, 2, 2, 2, 1, 1, 1, 1, 3, 2], dtype=int32)
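KMeans minimises the within-cluster sum of squared distances (inertia) by alternating two steps: assign each point to its nearest centroid, then move each centroid to the mean of its points. A minimal pure-Python sketch of that loop (Lloyd's algorithm); it seeds with the first k points for simplicity, whereas scikit-learn uses k-means++ seeding and several restarts, so its labels will generally differ from kmeans.labels_. The returned inertia is what one would plot against k for an elbow check, since the notebook fixes kclusters = 5 without comparing other values. All names below are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm on a list of equal-length tuples.
    Returns (labels, centroids, inertia)."""
    centroids = [list(p) for p in points[:k]]  # naive seeding: the first k points
    labels = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its points (keep it if empty)
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    inertia = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, inertia

points = [(0.0, 0.0), (5.0, 5.0), (0.2, 0.1), (5.1, 4.9), (0.1, 0.3), (4.8, 5.2)]
labels, centroids, inertia = kmeans(points, k=2)
print(labels)
# Elbow check: inertia generally falls as k grows; look for where the drop levels off
for k in (1, 2, 3):
    print(k, round(kmeans(points, k)[2], 3))
```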

In [23]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

# start from neighbourhood_data, which holds the region and coordinates
bengaluru_merged = neighbourhood_data

# join the cluster labels and top venues onto each neighbourhood by name
bengaluru_merged = bengaluru_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighbourhood')

bengaluru_merged.head() # check the last columns!

|  | index | Region | Neighbourhood | Latitude | Longitude | Cluster Labels | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue | 7th Most Common Venue | 8th Most Common Venue | 9th Most Common Venue | 10th Most Common Venue |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | Central | Domlur, Bengaluru | 12.962467 | 77.638196 | 2 | Ice Cream Shop | Indian Restaurant | Pub | Brewery | Spa | Music Venue | Nightclub | Coffee Shop | Hotel | Chocolate Shop |
| 1 | 2 | Central | Indiranagar, Bengaluru | 12.973291 | 77.640467 | 2 | Ice Cream Shop | Café | Asian Restaurant | Pub | Bakery | Boutique | Brewery | Mexican Restaurant | Lounge | German Restaurant |
| 2 | 4 | Central | Malleswaram, Bengaluru | 13.016341 | 77.558664 | 3 | Lounge | Multiplex | Indian Restaurant | Coffee Shop | Clothing Store | Bowling Alley | French Restaurant | Hotel | Pub | Movie Theater |
| 3 | 6 | Central | Sadashivanagar, Bengaluru | 13.007708 | 77.579589 | 2 | Indian Restaurant | Coffee Shop | Hotel | Gym | Café | Chinese Restaurant | French Restaurant | Plaza | Department Store | Pub |
| 4 | 7 | Central | Seshadripuram, Bengaluru | 12.993188 | 77.575342 | 2 | Indian Restaurant | Ice Cream Shop | Hotel | Donut Shop | Karnataka Restaurant | Clothing Store | Snack Place | Coffee Shop | Chinese Restaurant | Pub |


## Map with Clustered Neighborhoods

In [24]:
# create map
map_clusters = folium.Map(location=[bengaluru_latitude, bengaluru_longitude], zoom_start=11)

# set color scheme for the clusters: one evenly spaced rainbow colour per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map, coloured by cluster label
for lat, lon, poi, cluster in zip(bengaluru_merged['Latitude'], bengaluru_merged['Longitude'],
                                  bengaluru_merged['Neighbourhood'], bengaluru_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

## Results 

### Cluster 0

In [25]:
# Drop the ", Bengaluru" suffix for display
bengaluru_merged['Neighbourhood'] = bengaluru_merged['Neighbourhood'].str.split(',').str[0]
columns = ["Region","Neighbourhood","1st Most Common Venue", "2nd Most Common Venue", "3rd Most Common Venue"]
bengaluru_merged[bengaluru_merged['Cluster Labels'] == 0][columns]

|  | Region | Neighbourhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue |
|---|---|---|---|---|---|
| 13 | Eastern | Varthur | Indian Restaurant | Multicuisine Indian Restaurant | Bakery |
| 27 | Northern | Yelahanka | Café | American Restaurant | Lake |
| 34 | South-Eastern | Koramangala | Café | Breakfast Spot | Bakery |
| 35 | South-Eastern | Madiwala | Café | Mobile Phone Shop | Breakfast Spot |
| 38 | Southern | Girinagar | Café | Breakfast Spot | Fast Food Restaurant |
| 49 | Southern suburbs | Kothnur | Café | Indian Restaurant | Department Store |
| 54 | Western | Nagarbhavi | Café | Indian Restaurant | Supermarket |
| 58 | Western | Rajarajeshwari Nagar | Café | Pizza Place | Arcade |


### Cluster 1

In [26]:
bengaluru_merged[bengaluru_merged['Cluster Labels'] == 1][columns]

|  | Region | Neighbourhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue |
|---|---|---|---|---|---|
| 12 | Eastern | Marathahalli | Indian Restaurant | BBQ Joint | Vegetarian / Vegan Restaurant |
| 15 | North-Eastern | Banaswadi | Indian Restaurant | Ice Cream Shop | Korean Restaurant |
| 16 | North-Eastern | HBR Layout | Indian Restaurant | BBQ Joint | Fast Food Restaurant |
| 17 | North-Eastern | Horamavu | Indian Restaurant | Bistro | Coffee Shop |
| 21 | North-Eastern | Ramamurthy Nagar | Indian Restaurant | Department Store | Supermarket |
| 22 | Northern | Hebbal | Indian Restaurant | Ice Cream Shop | Vegetarian / Vegan Restaurant |
| 23 | Northern | Jalahalli | Dessert Shop | Indian Restaurant | Bakery |
| 37 | Southern | Basavanagudi | Indian Restaurant | Breakfast Spot | Sandwich Place |
| 39 | Southern | J. P. Nagar | Indian Restaurant | Coffee Shop | Performing Arts Venue |
| 40 | Southern | Jayanagar | Indian Restaurant | Ice Cream Shop | Sandwich Place |


### Cluster 2

In [27]:
bengaluru_merged[bengaluru_merged['Cluster Labels'] == 2][columns]

|  | Region | Neighbourhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue |
|---|---|---|---|---|---|
| 0 | Central | Domlur | Ice Cream Shop | Indian Restaurant | Pub |
| 1 | Central | Indiranagar | Ice Cream Shop | Café | Asian Restaurant |
| 3 | Central | Sadashivanagar | Indian Restaurant | Coffee Shop | Hotel |
| 4 | Central | Seshadripuram | Indian Restaurant | Ice Cream Shop | Hotel |
| 5 | Central | Shivajinagar | Indian Restaurant | Café | Pub |
| 6 | Central | Ulsoor | Hotel | Ice Cream Shop | Brewery |
| 7 | Central | Vasanth Nagar | Hotel | Indian Restaurant | Chinese Restaurant |
| 9 | Eastern | Hoodi | Coffee Shop | Hotel | Brewery |
| 14 | Eastern | Whitefield | Hotel | Café | Indian Restaurant |
| 18 | North-Eastern | Kalyan Nagar | BBQ Joint | Ice Cream Shop | Indian Restaurant |


### Cluster 3

In [28]:
bengaluru_merged[bengaluru_merged['Cluster Labels'] == 3][columns]

|  | Region | Neighbourhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue |
|---|---|---|---|---|---|
| 2 | Central | Malleswaram | Lounge | Multiplex | Indian Restaurant |
| 8 | Eastern | Bellandur | Lounge | Hotel | Italian Restaurant |
| 10 | Eastern | Krishnarajapuram | Coffee Shop | Shopping Mall | Donut Shop |
| 11 | Eastern | Mahadevapura | Coffee Shop | Multiplex | Donut Shop |
| 28 | Northern | Yeshwanthpur | Multiplex | Lounge | Coffee Shop |
| 32 | South-Eastern | Electronic City | Lounge | Hotel | Italian Restaurant |
| 53 | Western | Mahalakshmi Layout | Multiplex | Coffee Shop | Clothing Store |
| 55 | Western | Nandini Layout | Multiplex | Lounge | Bowling Alley |


### Cluster 4

In [29]:
bengaluru_merged[bengaluru_merged['Cluster Labels'] == 4][columns]

|  | Region | Neighbourhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue |
|---|---|---|---|---|---|
| 44 | Southern suburbs | Anjanapura | Pool | Lounge | Train Station |
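To summarise each cluster in a single line, one can count how often each "1st Most Common Venue" occurs among the cluster's members. A minimal sketch with collections.Counter, applied here to the eight Cluster 0 rows listed above (Varthur, Yelahanka, Koramangala, Madiwala, Girinagar, Kothnur, Nagarbhavi, Rajarajeshwari Nagar); on the full data one would feed it the Cluster Labels and 1st Most Common Venue columns of bengaluru_merged. `dominant_first_venue` is a hypothetical helper, and Counter.most_common breaks ties by first occurrence.

```python
from collections import Counter, defaultdict

def dominant_first_venue(rows):
    """rows: iterable of (cluster label, 1st most common venue).
    Returns {cluster: (most frequent venue, its count, cluster size)}."""
    by_cluster = defaultdict(list)
    for cluster, venue in rows:
        by_cluster[cluster].append(venue)
    return {c: (*Counter(v).most_common(1)[0], len(v)) for c, v in by_cluster.items()}

# Cluster 0 rows from the table above: Varthur first, then seven Café-led neighbourhoods
cluster0 = [(0, 'Indian Restaurant')] + [(0, 'Café')] * 7
print(dominant_first_venue(cluster0))
```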
