# Battle of the Neigborhoods
## Capstone Project - Applied Data Science specialization

Author - Abhinav Paul || Created on 27/04/2020

### Problem Statement


For people coming to Bangalore it can be difficult to explore the city in a targeted way. There is a need of an exploritory page that can provide information on similar Niegborhoods. The similarity can be based on the types of venue categories which are most frequently available in a given Neighborhood of Bangalore. 

This page will be exploring Bangalore for what kind of venues are mainly available in any given Neighborhoods of the City and cluster similar Neighborhoods. I have used the method, which is same as Week 3 Assignment, of K-mean clustering for the process.

### Data Sources

**Postal Code Raw Data** -  https://data.gov.in/catalog/all-india-pincode-directory

    Content - Pincode, Suboffice name, Taluka , District, State and other official data
    Pre-Processing - Filtered data to keep District 'Bangalore' and to keep following columns. Re-labelled them for better understanding
        1. pincode     : PostalCode
        2. officename  : Neighborhood
        3. Districtname: City
        4. latitude
        5. longitude

**Venue Data** - Foursqaure API : https://api.foursquare.com/v2/venues/explore

    Content - Venue, Venue latitude, Venue longitude, Neighborhood Name, Neighborhood latitude, Neighborhood longitude
    Pre-processing - Data was extracted and renamed as per week 3 assignment. Used the function "getNearbyVenues" to get the details

In [1]:
import requests # library to handle requests
import pandas as pd # library for data analsysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation

from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images
from IPython.display import Image 
from IPython.core.display import HTML 

import os #to change directory

import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans

import folium # plotting library

print('Libraries imported.')

Libraries imported.


### Data Import and Pre-processing

In [2]:
os.chdir('/Users/abhinavpaul/Documents/GitHub/Coursera_capstone/Capstone project')
#os.getcwd() # to check the change
# importing and filtering for Bangalore District Data into a dataframe
path = '{}/all_india_PO_list_without_APS_offices_ver2_lat_long.csv'.format(os.getcwd())
raw_data = pd.read_csv(path).dropna(subset=['longitude','latitude']).reset_index()
Bang_raw_data = raw_data[(raw_data['Districtname']=='Bangalore')].reset_index()
Bangalore_data = Bang_raw_data[['pincode','officename','Districtname','longitude','latitude']].dropna()
Bangalore_data.rename(columns={'pincode':'PostalCode','officename':'Neighborhood','Districtname':'City'},inplace=True)
Bangalore_data.head()

Unnamed: 0,PostalCode,Neighborhood,City,longitude,latitude
0,560045,Arabic College S.O,Bangalore,77.6206,13.0291
1,560103,Bellandur S.O,Bangalore,77.676,12.9298
2,560071,Domlur S.O,Bangalore,77.6359,12.9611
3,560077,Dr. Shivarama Karanth Nagar S.O,Bangalore,77.6293,13.0681
4,560005,Fraser Town S.O,Bangalore,77.6164,13.0005


### Mapping out the Neighborhood data available to be explored

In [3]:
address = 'Bangalore'

geolocator = Nominatim(user_agent='Bangalore_explorer')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Bangalore are {}, {}.\nBelow are the Nieghborhoods I will be exploring.'.format(latitude, longitude))

# create map of New York using latitude and longitude values
map_bangalore = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, borough, neighborhood in zip(Bangalore_data['latitude'],
                                           Bangalore_data['longitude'],
                                           Bangalore_data['Neighborhood'],
                                           Bangalore_data['City']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_bangalore)  
    
map_bangalore

The geograpical coordinate of Bangalore are 12.9791198, 77.5912997.
Below are the Nieghborhoods I will be exploring.


### Foursquare credentials upload

In [4]:
CLIENT_ID = 'TBPLKSLPDCOMMX2YIDPJ5UG2NRWHG4YOB3VXMPJI5YYP54H3' # your Foursquare ID
CLIENT_SECRET = 'OV15HD3ZH2152BWBWDWFSKBEONQJNSQCJAI0S4BO0WLFWLPE' # your Foursquare Secret
VERSION = '20200101' # Foursquare API version

print('Credentails Uploaded')

Credentails Uploaded


### Defining fuction to extract categories and listing the Neighborhood

In [5]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, LIMIT=150):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    result = requests.get(url).json()
    return nearby_venues, result

# type your answer here
Bangalore_venues ,Bang_result = getNearbyVenues(names=Bangalore_data['Neighborhood'],
                                   latitudes=Bangalore_data['latitude'],
                                   longitudes=Bangalore_data['longitude']
                                  )

Arabic College S.O
Bellandur S.O
Domlur S.O
Dr. Shivarama Karanth Nagar S.O
Fraser Town S.O
G.K.V.K. S.O
H.A.L II Stage H.O
HighCourt S.O
Jeevanbhimanagar S.O
Kothanur S.O
Mahatma Gandhi Road S.O
Marathahalli Colony S.O
NAL S.O
New Thippasandra S.O
Sadashivanagar S.O
Sahakaranagar P.O S.O
Vimanapura S.O
Yelahanka S.O
Yelahanka Satellite Town S.O
Bangalore G.P.O. 
Ashoknagar S.O (Bangalore)
B Sk II Stage S.O
Bannerghatta Road S.O
Basavanagudi H.O
Bommanahalli S.O (Bangalore)
Bommasandra Industrial Estate S.O
Carmelram S.O
Chandapura S.O
Dharmaram College S.O
Electronics City S.O
Gottigere S.O
HSR Layout S.O
Hulimavu S.O
J P Nagar S.O
Jayanagar H.O
Jayangar III Block S.O
JP Nagar III Phase S.O
Koramangala VI Bk S.O
Mico Layout S.O
Padmanabhnagar S.O
St. John's Medical College S.O
Gayathrinagar S.O
Jalahalli East S.O
Jalahalli H.O
Mahalakshmipuram Layout S.O
Malleswaram S.O
Malleswaram West S.O
Mathikere S.O
Msrit S.O
Palace Guttahalli S.O
Rajajinagar H.O
Rajajinagar IVth Block S.O
Swimmi

### Venue data check and pre-processed to avoid Indian restaurant

In [6]:
#Bangalore_venues = Bangalore_venues[Bangalore_venues['Venue Category']!=r'^.$Restaurant'].reset_index(drop=True)
Bangalore_venues.dropna(inplace=True)
print('There are {} uniques categories.\nData looks like ...'.format(len(Bangalore_venues['Venue Category'].unique())))
Bangalore_venues.head()

There are 150 uniques categories.
Data looks like ...


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Arabic College S.O,13.0291,77.6206,New Krishna Sagar,13.026125,77.622722,Indian Restaurant
1,Arabic College S.O,13.0291,77.6206,Mateen Marketing,13.033118,77.619645,Furniture / Home Store
2,Bellandur S.O,12.9298,77.676,Kicks On Grass,12.930045,77.679679,Soccer Field
3,Bellandur S.O,12.9298,77.676,Cafe Coffee Day Central 3,12.926107,77.675755,Café
4,Bellandur S.O,12.9298,77.676,McDonald's,12.927228,77.675688,Fast Food Restaurant


### K-Mean Clustering of the Nieghborhoods based on the venue into 5 Cluster

In [7]:
# one hot encoding
bangalore_onehot = pd.get_dummies(Bangalore_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
bangalore_onehot['Neighborhood'] = Bangalore_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [bangalore_onehot.columns[-1]] + list(bangalore_onehot.columns[:-1])
bangalore_onehot = bangalore_onehot[fixed_columns]

bangalore_grouped = bangalore_onehot.groupby('Neighborhood').mean().reset_index()

# set number of clusters
kclusters = 5

bangalore_grouped_clustering = bangalore_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(bangalore_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = bangalore_grouped['Neighborhood']

for ind in np.arange(bangalore_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(bangalore_grouped.iloc[ind, :], num_top_venues)

# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

bangalore_merged = Bangalore_data

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
bangalore_merged = bangalore_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

bangalore_merged['Cluster Labels']=bangalore_merged['Cluster Labels'].fillna(0.0).astype(int) # check the last columns!

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(bangalore_merged['latitude'],
                                  bangalore_merged['longitude'],
                                  bangalore_merged['Neighborhood'],
                                  bangalore_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Cluster 1 details

In [8]:
bangalore_merged.loc[bangalore_merged['Cluster Labels'] == 0, bangalore_merged.columns[[1] + list(range(5, bangalore_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Arabic College S.O,0,Indian Restaurant,Furniture / Home Store,Donut Shop,Financial or Legal Service,Fast Food Restaurant
2,Domlur S.O,0,Indian Restaurant,Café,Pizza Place,American Restaurant,Andhra Restaurant
4,Fraser Town S.O,0,Indian Restaurant,Bakery,Fast Food Restaurant,Chinese Restaurant,Middle Eastern Restaurant
8,Jeevanbhimanagar S.O,0,Indian Restaurant,Café,Chinese Restaurant,Dessert Shop,Fried Chicken Joint
13,New Thippasandra S.O,0,Indian Restaurant,Chinese Restaurant,Food Truck,Department Store,Grocery Store
15,Sahakaranagar P.O S.O,0,Indian Restaurant,Ice Cream Shop,Brewery,Italian Restaurant,Fast Food Restaurant
18,Yelahanka Satellite Town S.O,0,Indian Restaurant,Bus Station,Ice Cream Shop,Dessert Shop,Seafood Restaurant
19,Bangalore G.P.O.,0,Indian Restaurant,Café,Metro Station,Asian Restaurant,Dance Studio
20,Ashoknagar S.O (Bangalore),0,Indian Restaurant,Athletics & Sports,Breakfast Spot,Szechuan Restaurant,Juice Bar
23,Basavanagudi H.O,0,Indian Restaurant,Fast Food Restaurant,Bakery,Hookah Bar,Food Truck


### Cluster 2 details

In [9]:
bangalore_merged.loc[bangalore_merged['Cluster Labels'] == 1, bangalore_merged.columns[[1] + list(range(5, bangalore_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
42,Jalahalli East S.O,1,Indian Restaurant,Flea Market,Financial or Legal Service,Fast Food Restaurant,Farmers Market


### Cluster 3 details

In [10]:
bangalore_merged.loc[bangalore_merged['Cluster Labels'] == 2, bangalore_merged.columns[[1] + list(range(5, bangalore_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
3,Dr. Shivarama Karanth Nagar S.O,2,Pharmacy,Café,Dumpling Restaurant,Asian Restaurant,Fast Food Restaurant
5,G.K.V.K. S.O,2,Basketball Court,Garden,Women's Store,Flea Market,Financial or Legal Service
6,H.A.L II Stage H.O,2,Italian Restaurant,Pub,Indian Restaurant,Sports Bar,Lounge
7,HighCourt S.O,2,Metro Station,Capitol Building,Dog Run,Tennis Stadium,Park
9,Kothanur S.O,2,Coffee Shop,Bakery,Pizza Place,Mediterranean Restaurant,Italian Restaurant
10,Mahatma Gandhi Road S.O,2,Café,Indian Restaurant,Pub,Hotel,Donut Shop
11,Marathahalli Colony S.O,2,Road,Shoe Store,Fried Chicken Joint,Dessert Shop,Dog Run
12,NAL S.O,2,Café,Indian Restaurant,Bar,Middle Eastern Restaurant,Coffee Shop
14,Sadashivanagar S.O,2,Coffee Shop,Café,Indian Restaurant,Seafood Restaurant,Gym
16,Vimanapura S.O,2,ATM,Bus Stop,Food Truck,Antique Shop,Farmers Market


### Cluster 4 details

In [11]:
bangalore_merged.loc[bangalore_merged['Cluster Labels'] == 3, bangalore_merged.columns[[1] + list(range(5, bangalore_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
39,Padmanabhnagar S.O,3,Convenience Store,Snack Place,Dog Run,Fast Food Restaurant,Farmers Market


### Cluster 5 details

In [12]:
bangalore_merged.loc[bangalore_merged['Cluster Labels'] == 4, bangalore_merged.columns[[1] + list(range(5, bangalore_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
1,Bellandur S.O,4,Indian Restaurant,Fast Food Restaurant,Café,Kerala Restaurant,Pizza Place
28,Dharmaram College S.O,4,Fast Food Restaurant,Indian Restaurant,Plaza,Sandwich Place,Breakfast Spot
43,Jalahalli H.O,4,Indian Restaurant,Plaza,Shopping Mall,Fast Food Restaurant,Cupcake Shop
48,Msrit S.O,4,Indian Restaurant,Fast Food Restaurant,Diner,Bus Station,Donut Shop
50,Rajajinagar H.O,4,Fast Food Restaurant,Café,Bakery,Donut Shop,Financial or Legal Service


### Thank-you for using and exploring the Data. 