# Exploring Hospitals in the Neighbourhoods of Bangalore City

## Introduction
<p>Bangalore, also known as Bengaluru (its Kannada name), is the capital of the Indian state of Karnataka. Nicknamed the Garden City, it was once called a Pensioner's Paradise. Located on the Deccan Plateau in the south-eastern part of Karnataka, Bangalore is India's third most populous city.
Today, as a large and growing metropolis, Bangalore is home to many of the most well-recognised colleges and research institutions in India. Numerous public sector heavy industries, software companies, and aerospace, telecommunications, and defence organisations are located in the city. Bangalore is known as the Silicon Valley of India because of its position as the nation's leading IT exporter. A demographically diverse city, Bangalore is a major economic and cultural hub and the fastest growing major metropolis in India.</p>

    
### Problem Description
<p>Bangalore has many hospitals. The main objective is to explore the neighbourhoods of Bangalore and find the 
    number of hospitals in each neighbourhood. We then analyse the number of hospitals in each neighbourhood 
    using a suitable clustering algorithm.<br>The idea is to identify the areas of Bangalore with the fewest hospitals and guide stakeholders towards constructing hospitals in those areas. </p>

### Methodology:
  <p>We first collect data for all areas of Bangalore, including their location co-ordinates.
     Next, using this data, we use the Foursquare API to explore these neighbourhoods and visualize them on a map.
     Hospital statistics are then computed for each neighbourhood.
     We cluster the neighbourhoods based on the number of hospitals.
     The information in these clusters will guide stakeholders towards the best areas for constructing hospitals.</p>


### Dataset:
I am using the Bangalore neighbourhoods dataset downloaded from Kaggle (https://www.kaggle.com/rmenon1998/bangalore-neighborhoods),
which has the location co-ordinates of each region of Bangalore.


### Importing the necessary Libraries for the project

In [1]:
#Installing the packages
#get_ipython().system(u' pip install --upgrade pip')
#get_ipython().system(u' pip install beautifulsoup4')
#!pip install lxml
#!pip install html5lib
#!pip install requests

#Importing packages
from bs4 import BeautifulSoup
import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation

#!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images
from IPython.display import Image 
from IPython.core.display import HTML 
    
# transforming a json file into a pandas dataframe
from pandas import json_normalize
# import k-means from clustering stage
from sklearn.cluster import KMeans
#!conda install -c conda-forge folium=0.5.0 --yes
import folium # plotting library
import matplotlib.cm as cm
import matplotlib.colors as colors
print('Folium installed')
print('Libraries imported.')

Folium installed
Libraries imported.


### Importing the data set and displaying the dataset

In [2]:
df = pd.read_csv("blr_neighbourhood.csv")
df.head()

Unnamed: 0.1,Unnamed: 0,Neighborhood,Latitude,Longitude
0,0,Agram,45.813177,15.977048
1,1,Amruthahalli,13.066513,77.596624
2,2,Attur,11.663711,78.533551
3,3,Banaswadi,13.014162,77.651854
4,4,Bellandur,58.235358,26.683116


In [3]:
df.shape

(352, 4)

### Data Preprocessing: removing unlabelled columns and duplicates

In [4]:
# drop the unlabelled index column carried over from the CSV
df.drop(columns=['Unnamed: 0'], inplace=True)
# remove rows that are exact duplicates
df.drop_duplicates(inplace=True)
df_new = pd.DataFrame(df)


In [5]:
df_new.head()

Unnamed: 0,Neighborhood,Latitude,Longitude
0,Agram,45.813177,15.977048
1,Amruthahalli,13.066513,77.596624
2,Attur,11.663711,78.533551
3,Banaswadi,13.014162,77.651854
4,Bellandur,58.235358,26.683116


In [6]:
df_new.shape

(329, 3)

In [7]:
df.head()

Unnamed: 0,Neighborhood,Latitude,Longitude
0,Agram,45.813177,15.977048
1,Amruthahalli,13.066513,77.596624
2,Attur,11.663711,78.533551
3,Banaswadi,13.014162,77.651854
4,Bellandur,58.235358,26.683116
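The preview above contains coordinates that cannot be in Bangalore: Agram is listed at about 45.81, 15.98 and Bellandur at about 58.24, 26.68, while the city lies near latitude 13 and longitude 77.6. Any hospital search around those points will return venues from other countries. Below is a minimal sketch of a sanity filter. The bounding box values and the helper `split_by_bounds` are illustrative assumptions, not part of the dataset or the original notebook, and the sample rows are copied from the preview.

```python
import pandas as pd

# Assumed rough bounding box around Bangalore (illustrative values only).
LAT_MIN, LAT_MAX = 12.6, 13.4
LON_MIN, LON_MAX = 77.2, 78.0

# Sample rows copied from the preview above.
sample = pd.DataFrame({
    "Neighborhood": ["Agram", "Amruthahalli", "Attur", "Banaswadi", "Bellandur"],
    "Latitude": [45.813177, 13.066513, 11.663711, 13.014162, 58.235358],
    "Longitude": [15.977048, 77.596624, 78.533551, 77.651854, 26.683116],
})

def split_by_bounds(frame):
    """Return (inside, outside) rows relative to the assumed bounding box."""
    inside_mask = (
        frame["Latitude"].between(LAT_MIN, LAT_MAX)
        & frame["Longitude"].between(LON_MIN, LON_MAX)
    )
    return frame[inside_mask], frame[~inside_mask]

inside, outside = split_by_bounds(sample)
print("kept:", inside["Neighborhood"].tolist())
print("flagged:", outside["Neighborhood"].tolist())
```

Rows flagged this way could be re-geocoded or dropped before the Foursquare calls. Which of the two is appropriate depends on how the dataset was built.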


### Defining credentials using FourSquare API

In [121]:
# Keep real keys out of the notebook; the values below are placeholders
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605'
LIMIT = 30
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)


Your credentials:
CLIENT_ID: YOUR_FOURSQUARE_CLIENT_ID
CLIENT_SECRET:YOUR_FOURSQUARE_CLIENT_SECRET


### Getting the latitude and longitude of Bangalore

In [9]:
address = 'Bangalore, BLR'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Bangalore are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Bangalore are 13.196697, 77.70758655918868.


### Plotting the neighbourhood locations on Bangalore Map

In [10]:
# create map of Bangalore using latitude and longitude values
map_blr = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, neighborhood in zip(df_new['Latitude'], df_new['Longitude'], df_new['Neighborhood']):
    label = '{}'.format(neighborhood)
    
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_blr)  
    
map_blr


#### Getting hospital data based on the latitude and longitude of neighbourhoods in Bangalore

In [132]:
def get_hospital_data(lat, lng, neighborhood):
    """
    Use the Foursquare API to fetch hospital data. Takes a latitude and longitude
    and returns a DataFrame of the hospitals found near that point, or None on an HTTP error.
    """
    radius = 1000  # search radius in metres
    LIMIT = 100  # maximum number of venues per request
    catid='4bf58dd8d48988d196941735'  # Foursquare category ID used for hospitals
    url = 'https://api.foursquare.com/v2/venues/search?ll={},{}&categoryId={}&client_id={}&client_secret={}&limit={}&radius={}&v={}'.format(
            lat, 
            lng,
            catid,
            CLIENT_ID, 
            CLIENT_SECRET, 
            LIMIT,
            radius,
            VERSION
           
            )
    response = requests.get(url)
    if response.status_code != 200:
        print("ERROR", response.status_code, response.content)
        return None
    results = response.json()
    venue_data = results["response"]["venues"]
    venue_details = []
    for row in venue_data:
        try:
            venue_id = row['id']
            venue_name = row['name']
            lat = row["location"]["lat"]
            lng = row["location"]["lng"]
            venue_details.append(
                [venue_id, venue_name, lat, lng,neighborhood])
        except KeyError:
            pass

    column_names = ['ID', 'Name', 'Latitude',
                    'Longitude', "Neighborhood"]
    df = pd.DataFrame(venue_details, columns=column_names)
    return df
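The parsing step inside `get_hospital_data` can be separated from the HTTP call so that it can be checked offline. The sketch below assumes a payload shaped like the fields the function reads (`response.venues[].id`, `name`, `location.lat`, `location.lng`). The helper `parse_venues` and the sample payload are hypothetical and do not come from the original notebook.

```python
def parse_venues(payload, neighborhood):
    """Turn a venues/search style payload into rows of
    [ID, Name, Latitude, Longitude, Neighborhood], skipping incomplete entries."""
    rows = []
    for venue in payload.get("response", {}).get("venues", []):
        location = venue.get("location", {})
        try:
            rows.append([venue["id"], venue["name"],
                         location["lat"], location["lng"], neighborhood])
        except KeyError:
            continue  # venue is missing a required field
    return rows

# Hypothetical payload; the second venue lacks coordinates and is skipped.
sample_payload = {
    "response": {
        "venues": [
            {"id": "a1", "name": "Example Hospital",
             "location": {"lat": 13.01, "lng": 77.65}},
            {"id": "b2", "name": "No Coordinates Clinic", "location": {}},
        ]
    }
}

rows = parse_venues(sample_payload, "Banaswadi")
empty = parse_venues({"meta": {"code": 429}, "response": {}}, "Banaswadi")
print(rows)
print(empty)
```

An error body with an empty `response` object yields an empty list rather than raising, which keeps the calling loop simple.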

### Function to fetch the hospitals located in each neighbourhood

In [136]:

def get_hospital_per_neighborhood(df):
    """
    Fetch the hospitals for every row of the Bangalore neighbourhood dataset,
    combine them into one DataFrame and save it to hospital.csv.
    """
    column_names = ['ID', 'Name', 'Latitude',
                    'Longitude', "Neighborhood"]
    data = []
    for i, row in df.iterrows():
        h_df = get_hospital_data(
            row["Latitude"], row["Longitude"], row["Neighborhood"])
        if h_df is not None:
            for x, hrow in h_df.iterrows():
                data.append([hrow[column] for column in column_names])

    n_df = pd.DataFrame(data, columns=column_names)
    n_df.to_csv('hospital.csv')
    return n_df



### Building the hospital DataFrame

In [135]:
hospital_df = get_hospital_per_neighborhood(df_new)

#Blr_venues.head()


ERROR 429 b'{"meta":{"code":429,"errorType":"quota_exceeded","errorDetail":"Quota exceeded","requestId":"5ed8b805b9a389001b04402f"},"response":{}}'
ERROR 429 b'{"meta":{"code":429,"errorType":"quota_exceeded","errorDetail":"Quota exceeded","requestId":"5ed8b80c1835dd001badd126"},"response":{}}'
ERROR 429 b'{"meta":{"code":429,"errorType":"quota_exceeded","errorDetail":"Quota exceeded","requestId":"5ed8b8456d8c56001ba2b8c9"},"response":{}}'
ERROR 429 b'{"meta":{"code":429,"errorType":"quota_exceeded","errorDetail":"Quota exceeded","requestId":"5ed8b8d79fcb92001beee6a8"},"response":{}}'
ERROR 429 b'{"meta":{"code":429,"errorType":"quota_exceeded","errorDetail":"Quota exceeded","requestId":"5ed8b81e0cc1fd001bf53858"},"response":{}}'
ERROR 429 b'{"meta":{"code":429,"errorType":"quota_exceeded","errorDetail":"Quota exceeded","requestId":"5ed8b8950de0d9001ba4df21"},"response":{}}'
ERROR 429 b'{"meta":{"code":429,"errorType":"quota_exceeded","errorDetail":"Quota exceeded","requestId":"5ed8b84

ERROR 429 b'{"meta":{"code":429,"errorType":"quota_exceeded","errorDetail":"Quota exceeded","requestId":"5ed8b8701e152c001bc85bf9"},"response":{}}'
ERROR 429 b'{"meta":{"code":429,"errorType":"quota_exceeded","errorDetail":"Quota exceeded","requestId":"5ed8b8d9be61c9001bd94d1c"},"response":{}}'
ERROR 429 b'{"meta":{"code":429,"errorType":"quota_exceeded","errorDetail":"Quota exceeded","requestId":"5ed8b84c0cc1fd001bf5a790"},"response":{}}'
ERROR 429 b'{"meta":{"code":429,"errorType":"quota_exceeded","errorDetail":"Quota exceeded","requestId":"5ed8b8ba542890001be50803"},"response":{}}'
ERROR 429 b'{"meta":{"code":429,"errorType":"quota_exceeded","errorDetail":"Quota exceeded","requestId":"5ed8b869b1cac0001b10ad21"},"response":{}}'


Unnamed: 0,ID,Name,Latitude,Longitude,Neighborhood
0,4f571240e4b01cdf1e06d991,KBC Sestre milosrdnice - referentni centar za ...,45.811628,15.967922,Agram
1,4c930a5c6cfea093b610b78b,KBC Šalata,45.818408,15.983909,Agram
2,4f7d5a07e4b09204d24d545f,Porta Klinike Za Djecje Bolesti Klaiceva,45.809175,15.964371,Agram
3,4d5ea5b55c39b1f7c231ee49,Bolnica Runjaninova,45.804951,15.96943,Agram
4,4f7a097fe4b0d8cda6c75113,RTG Klinike Za Djecje Bolesti Klaiceva,45.809635,15.964857,Agram
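Many of the requests above came back with HTTP 429 `quota_exceeded`, so the neighbourhoods behind those calls contribute no hospitals to `hospital_df`. Below is a minimal sketch of retrying with exponential backoff. It assumes the limit is a short-term rate limit; if it is a daily quota, waiting a few seconds will not help, and reusing the saved `hospital.csv` is the safer route. The helper `fetch_with_backoff` and its `(status_code, body)` callable are hypothetical, and the responses are simulated.

```python
import time

def fetch_with_backoff(fetch, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it returns a status other than 429 or attempts run out.

    fetch is any zero-argument callable returning (status_code, body);
    this signature is an assumption made for the sketch.
    """
    for attempt in range(max_attempts):
        status, body = fetch()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # wait 1s, 2s, 4s, ...
    return status, body

# Stand-in for a real request: rate limited twice, then succeeds.
responses = iter([(429, None), (429, None), (200, {"response": {"venues": []}})])
waits = []  # record the delays instead of actually sleeping
status, body = fetch_with_backoff(lambda: next(responses), sleep=waits.append)
print(status, waits)
```

Passing `sleep` as a parameter keeps the example fast to run; in a real notebook the default `time.sleep` would be used.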


### Displaying hospital information for each neighbourhood

In [138]:
hospital_df.head(30)


Unnamed: 0,ID,Name,Latitude,Longitude,Neighborhood
0,4f571240e4b01cdf1e06d991,KBC Sestre milosrdnice - referentni centar za ...,45.811628,15.967922,Agram
1,4c930a5c6cfea093b610b78b,KBC Šalata,45.818408,15.983909,Agram
2,4f7d5a07e4b09204d24d545f,Porta Klinike Za Djecje Bolesti Klaiceva,45.809175,15.964371,Agram
3,4d5ea5b55c39b1f7c231ee49,Bolnica Runjaninova,45.804951,15.96943,Agram
4,4f7a097fe4b0d8cda6c75113,RTG Klinike Za Djecje Bolesti Klaiceva,45.809635,15.964857,Agram
5,4f7a1261e4b0a76294aea23c,Operacijska Sala Klinike Za Djecje Bolesti Kla...,45.809313,15.963948,Agram
6,4dd13d68d4c065592fc125cb,Psihijatrijska bolnica za djecu i mladez,45.81463,15.963653,Agram
7,4f76b785e4b0b009ed9630d4,C-Urologija,45.809372,15.963305,Agram
8,4f8abde2e4b029818b66e176,klinika za djecje bolesti Garderoba,45.809088,15.96439,Agram
9,4d9d7d07c99fb60c0b0ec88b,Napuštena Vojna Bolnica,45.813908,15.989718,Agram



### Grouping the Hospitals by Neighbourhood and getting the count of Hospitals in Each Neighbourhood

In [221]:
count_hosp=pd.DataFrame(hospital_df.groupby(['Neighborhood'],as_index=False).count())

count_hosp.head()

#count_hosp.columns

Unnamed: 0,Neighborhood,ID,Name,Latitude,Longitude
0,Adugodi,4,4,4,4
1,Agram,24,24,24,24
2,Amruthahalli,3,3,3,3
3,Anekal,1,1,1,1
4,Banaswadi,13,13,13,13
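The grouped table above only lists neighbourhoods that returned at least one hospital, and every column repeats the same count. Since the goal is to find areas with the fewest hospitals, neighbourhoods with zero results matter too. The sketch below produces a single count column that includes zeros. The miniature frames stand in for `df_new` and `hospital_df` and are hypothetical, as is the column name `Hospital Count`.

```python
import pandas as pd

# Hypothetical miniature versions of df_new and hospital_df.
neighbourhoods = pd.DataFrame({"Neighborhood": ["Adugodi", "Attur", "Banaswadi"]})
hospitals = pd.DataFrame({
    "Name": ["H1", "H2", "H3"],
    "Neighborhood": ["Adugodi", "Banaswadi", "Banaswadi"],
})

counts = (
    hospitals.groupby("Neighborhood").size()        # hospitals per neighbourhood
    .reindex(neighbourhoods["Neighborhood"].tolist(), fill_value=0)  # add missing ones as 0
    .rename("Hospital Count")
    .rename_axis("Neighborhood")
    .reset_index()
)
print(counts)
```

A zero here can also mean the API call failed, so it is worth separating failed requests from genuinely empty results before drawing conclusions.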


### One-hot encoding to transform the categorical values

In [149]:
# one hot encoding
hosp_onehot = pd.get_dummies(hospital_df[['Name']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
hosp_onehot['Neighborhood'] = hospital_df['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [hosp_onehot.columns[-1]] + list(hosp_onehot.columns[:-1])
hosp_onehot = hosp_onehot[fixed_columns]

hosp_onehot.head()

Unnamed: 0,Neighborhood,4th floor BGS hospital,A. V. Multispeciality Hospital,ACTS College,AVD 53 NÄL,AVD 64 NÄL,Aarthi Scans,Acura Speciality Hospital,Aditya nethralaya,Akshaya Nethralaya,...,pobbati maternity home,pristine consultation and diagnostics,rao's maternity hospital,recoup,"sagar appolo hospital,indiranagar",sriranga plumanary clinic,the sagar clinic,uzv klinike za djecje bolesti,vijai hospital,Öron-näs-halsmott
0,Agram,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,Agram,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,Agram,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,Agram,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,Agram,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [150]:
hosp_grouped = hosp_onehot.groupby('Neighborhood').mean().reset_index()
hosp_grouped

Unnamed: 0,Neighborhood,4th floor BGS hospital,A. V. Multispeciality Hospital,ACTS College,AVD 53 NÄL,AVD 64 NÄL,Aarthi Scans,Acura Speciality Hospital,Aditya nethralaya,Akshaya Nethralaya,...,pobbati maternity home,pristine consultation and diagnostics,rao's maternity hospital,recoup,"sagar appolo hospital,indiranagar",sriranga plumanary clinic,the sagar clinic,uzv klinike za djecje bolesti,vijai hospital,Öron-näs-halsmott
0,Adugodi,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Agram,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0
2,Amruthahalli,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Anekal,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Banaswadi,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Basaveshwaranagar,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Bhattarahalli,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0
7,Byatarayanapura,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Chickpet,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,Chikkalasandra,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


### K-means clustering based on groups
Here we perform two types of analysis.
 1. The first type of clustering groups neighbourhoods by their hospital information (the one-hot encoded hospital names averaged per neighbourhood). This is mainly to see the area statistics alongside the hospital information.
 2. The second type of clustering groups the neighbourhoods by their hospital count.
     a) Areas with very few hospitals are put in one cluster.
     b) Areas with the most hospitals are put in another cluster, and so on.

#### First type of Clustering

In [151]:
# set number of clusters
kclusters = 5

hosp_grouped_clustering = hosp_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(hosp_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 0, 0, 0, 0, 3, 0, 0, 0])

##### Identify and display the top 5 venues by frequency in each neighbourhood

In [152]:
num_top_venues = 5

for hood in hosp_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = hosp_grouped[hosp_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adugodi----
                                   venue  freq
0                             M S Clinic  0.25
1  pristine consultation and diagnostics  0.25
2                       Ayaansh Hospital  0.25
3                    Prasad Eye Hospital  0.25
4                        Promed Hospital  0.00


----Agram----
                              venue  freq
0           Napuštena Vojna Bolnica  0.04
1      gospodarski poslovi klaiceve  0.04
2  Klinika za dječje bolesti Zagreb  0.04
3             Klinika za ortopediju  0.04
4                        Kod Mesara  0.04


----Amruthahalli----
                                       venue  freq
0              Motherhood Maternity Hospital  0.33
1             Amrutha Hospital, Amruthahalli  0.33
2  North Side Hospital and Diagnostic Centre  0.33
3  Samatvam Diabetes and Endocrinolgy center  0.00
4                      Psychiatry Department  0.00


----Anekal----
                                        venue  freq
0                    Ganga specialty

4      Manjunatha Maternity Home  0.12


----Vidyaranyapura----
                               venue  freq
0                    Spandana Clinic   0.5
1                  Kruthika Hospital   0.5
2             4th floor BGS hospital   0.0
3              Psychiatry Department   0.0
4  People Tree Hospitals @Ragavendra   0.0


----Vijayanagar S.O (Bangalore)----
                                               venue  freq
0                                   Global Hospitals  0.12
1                                     Madhu Hospital  0.12
2  dr satyaprakash's Center for Digestive & Liver...  0.12
3                            The Eye Surgical Centre  0.12
4                                  Gayathri Hospital  0.12


----Vimanapura----
                            venue  freq
0  Institute Of Aerospace Medicne   0.2
1                      Cloud Nine   0.2
2             Cloud Nine Hospital   0.2
3                       CloudNine   0.2
4                    HAL Hospital   0.2


----Yelachenahalli----


### Return the most common venues

In [154]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [155]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # only 1st, 2nd and 3rd have special suffixes
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = hosp_grouped['Neighborhood']

for ind in np.arange(hosp_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(hosp_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Adugodi,pristine consultation and diagnostics,Ayaansh Hospital,M S Clinic,Prasad Eye Hospital,HAL Hospital,Global Hospitals,Government Hospital Rajanukunte,Gunam Super Speciality Hospital,Öron-näs-halsmott,Garden city hospital
1,Agram,Magnet,Kod Mesara,Klinika za dječje bolesti Zagreb,Zavod za Sudsku Medicinu i Kriminalistiku,"Klaićeva, vađenje mandula",KBC Šalata,KBC Sestre milosrdnice - referentni centar za ...,Bolnica Runjaninova,Operacijska Sala Klinike Za Djecje Bolesti Kla...,Ortopticko-pleopticka ambulanta Klinike za Dje...
2,Amruthahalli,North Side Hospital and Diagnostic Centre,Motherhood Maternity Hospital,"Amrutha Hospital, Amruthahalli",Öron-näs-halsmott,Hitna,Global Hospitals,Government Hospital Rajanukunte,Gunam Super Speciality Hospital,HAL Hospital,"Hrvatski Institut za istraživanje mozga ""Neuron"""
3,Anekal,Ganga specialty hospital,Öron-näs-halsmott,Garden city hospital,K R Hospital,K K Hospital,Institute Of Aerospace Medicne,Indira Gandhi Institute of Child Health,Inchara ayur kuteera,"Hrvatski Institut za istraživanje mozga ""Neuron""",Hitna
4,Banaswadi,Specialist Hospital,Cloudnine Hospital,Dr. Ayyappa's Clinic,Chaya Hospital,Motherhood Hospital,NMPC hospital,Express Clinic,Plexus Neuro And Stem Cell Research Centre,Prime Orthopedic Care,Pranav Diaganostic Centre


In [160]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels1', kmeans.labels_)
neighborhoods_venues_sorted.head()

hosp_merged = df_new

# join the sorted venues onto the neighbourhood data to add latitude/longitude for each neighborhood
hosp_merged = hosp_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

hosp_merged.head() # check the last columns!

Unnamed: 0,Neighborhood,Latitude,Longitude,Cluster Labels1,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agram,45.813177,15.977048,0.0,Magnet,Kod Mesara,Klinika za dječje bolesti Zagreb,Zavod za Sudsku Medicinu i Kriminalistiku,"Klaićeva, vađenje mandula",KBC Šalata,KBC Sestre milosrdnice - referentni centar za ...,Bolnica Runjaninova,Operacijska Sala Klinike Za Djecje Bolesti Kla...,Ortopticko-pleopticka ambulanta Klinike za Dje...
1,Amruthahalli,13.066513,77.596624,0.0,North Side Hospital and Diagnostic Centre,Motherhood Maternity Hospital,"Amrutha Hospital, Amruthahalli",Öron-näs-halsmott,Hitna,Global Hospitals,Government Hospital Rajanukunte,Gunam Super Speciality Hospital,HAL Hospital,"Hrvatski Institut za istraživanje mozga ""Neuron"""
2,Attur,11.663711,78.533551,,,,,,,,,,,
3,Banaswadi,13.014162,77.651854,0.0,Specialist Hospital,Cloudnine Hospital,Dr. Ayyappa's Clinic,Chaya Hospital,Motherhood Hospital,NMPC hospital,Express Clinic,Plexus Neuro And Stem Cell Research Centre,Prime Orthopedic Care,Pranav Diaganostic Centre
4,Bellandur,58.235358,26.683116,,,,,,,,,,,


### Create Map and visualize the clusters for first type analysis

In [247]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
# neighbourhoods without hospital data have no label; they are shown as cluster 0
hosp_merged["Cluster Labels1"] = hosp_merged["Cluster Labels1"].fillna(0)

for lat, lon, poi, cluster in zip(hosp_merged['Latitude'], hosp_merged['Longitude'], hosp_merged['Neighborhood'], hosp_merged['Cluster Labels1'].astype(int)):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Cluster Map

####   Examine Clusters of First Type Analysis

Cluster 1

In [171]:
hosp_merged.loc[hosp_merged['Cluster Labels1'] == 0, hosp_merged.columns[[0] + list(range(5, hosp_merged.shape[1]))]]

Unnamed: 0,Neighborhood,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agram,Kod Mesara,Klinika za dječje bolesti Zagreb,Zavod za Sudsku Medicinu i Kriminalistiku,"Klaićeva, vađenje mandula",KBC Šalata,KBC Sestre milosrdnice - referentni centar za ...,Bolnica Runjaninova,Operacijska Sala Klinike Za Djecje Bolesti Kla...,Ortopticko-pleopticka ambulanta Klinike za Dje...
1,Amruthahalli,Motherhood Maternity Hospital,"Amrutha Hospital, Amruthahalli",Öron-näs-halsmott,Hitna,Global Hospitals,Government Hospital Rajanukunte,Gunam Super Speciality Hospital,HAL Hospital,"Hrvatski Institut za istraživanje mozga ""Neuron"""
2,Attur,,,,,,,,,
3,Banaswadi,Cloudnine Hospital,Dr. Ayyappa's Clinic,Chaya Hospital,Motherhood Hospital,NMPC hospital,Express Clinic,Plexus Neuro And Stem Cell Research Centre,Prime Orthopedic Care,Pranav Diaganostic Centre
4,Bellandur,,,,,,,,,
...,...,...,...,...,...,...,...,...,...,...
347,Virupakshipura,,,,,,,,,
348,Vishwanathapura,,,,,,,,,
349,Yadamaranahalli,,,,,,,,,
350,Yadavanahalli,,,,,,,,,


Cluster 2

In [179]:
hosp_merged.loc[hosp_merged['Cluster Labels1'] == 1, hosp_merged.columns[[0] + list(range(5, hosp_merged.shape[1]))]]

Unnamed: 0,Neighborhood,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,Doddanekkundi,Öron-näs-halsmott,Garden City Hospital,K K Hospital,Institute Of Aerospace Medicne,Indira Gandhi Institute of Child Health,Inchara ayur kuteera,"Hrvatski Institut za istraživanje mozga ""Neuron""",Hitna,HAL Hospital


Cluster 3

In [172]:
hosp_merged.loc[hosp_merged['Cluster Labels1'] == 2, hosp_merged.columns[[0] + list(range(5, hosp_merged.shape[1]))]]

Unnamed: 0,Neighborhood,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
58,Deepanjalinagar,Öron-näs-halsmott,Garden city hospital,K R Hospital,K K Hospital,Institute Of Aerospace Medicne,Indira Gandhi Institute of Child Health,Inchara ayur kuteera,"Hrvatski Institut za istraživanje mozga ""Neuron""",Hitna
80,Nayandahalli,Öron-näs-halsmott,Garden city hospital,K R Hospital,K K Hospital,Institute Of Aerospace Medicne,Indira Gandhi Institute of Child Health,Inchara ayur kuteera,"Hrvatski Institut za istraživanje mozga ""Neuron""",Hitna


Cluster 4

In [173]:
hosp_merged.loc[hosp_merged['Cluster Labels1'] == 3, hosp_merged.columns[[0] + list(range(5, hosp_merged.shape[1]))]]

Unnamed: 0,Neighborhood,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Bhattarahalli,Garden City Hospital,K R Hospital,K K Hospital,Institute Of Aerospace Medicne,Indira Gandhi Institute of Child Health,Inchara ayur kuteera,"Hrvatski Institut za istraživanje mozga ""Neuron""",Hitna,HAL Hospital


Cluster 5

In [174]:
hosp_merged.loc[hosp_merged['Cluster Labels1'] == 4, hosp_merged.columns[[0] + list(range(5, hosp_merged.shape[1]))]]

Unnamed: 0,Neighborhood,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
61,Gottigere,Garden City Hospital,K R Hospital,K K Hospital,Institute Of Aerospace Medicne,Indira Gandhi Institute of Child Health,Inchara ayur kuteera,"Hrvatski Institut za istraživanje mozga ""Neuron""",Hitna,HAL Hospital


### Clustering the neighbourhoods based on the number of hospitals
#### This is the second type of analysis
 1. Divide the neighbourhoods into 5 clusters.
 2. Each cluster shows the count of hospitals, which helps to focus on the clusters where the hospital count is low.
 3. Plot the cluster points on a map.
     

In [245]:
# 'Name' already holds numeric hospital counts, so get_dummies leaves it unchanged
count_onehot = pd.get_dummies(count_hosp[['Name']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
count_onehot['Neighborhood'] = count_hosp['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [count_onehot.columns[-1]] + list(count_onehot.columns[:-1])
count_onehot= count_onehot[fixed_columns]

count_onehot.head()
count_grouped = count_onehot.groupby('Neighborhood').mean().reset_index()
count_grouped

Unnamed: 0,Neighborhood,Name
0,Adugodi,4
1,Agram,24
2,Amruthahalli,3
3,Anekal,1
4,Banaswadi,13
5,Basaveshwaranagar,9
6,Bhattarahalli,1
7,Byatarayanapura,6
8,Chickpet,5
9,Chikkalasandra,6


In [246]:
# set number of clusters
kclusters = 5

# run k-means clustering on the hospital counts
count_hosp_clustering = count_grouped.drop(columns='Neighborhood')
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(count_hosp_clustering)

# attach the cluster labels so the cells below can filter on them
count_hosp.insert(0, 'Cluster Labels1', kmeans.labels_)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 1, 3, 3, 4, 2, 3, 0, 0, 0])
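K-means label numbers are arbitrary, so a label by itself says nothing about whether a cluster holds low or high counts. One way to interpret them is to rank the labels by the mean hospital count of their members. The sketch below does this for the first ten neighbourhoods, using the counts and labels printed above. The helper `rank_clusters_by_mean` is hypothetical and not part of the original notebook.

```python
from collections import defaultdict

def rank_clusters_by_mean(counts, labels):
    """Return (label, mean_count) pairs sorted from fewest to most hospitals."""
    grouped = defaultdict(list)
    for count, label in zip(counts, labels):
        grouped[label].append(count)
    means = {label: sum(vals) / len(vals) for label, vals in grouped.items()}
    return sorted(means.items(), key=lambda item: item[1])

# First ten hospital counts and cluster labels from the outputs above.
counts = [4, 24, 3, 1, 13, 9, 1, 6, 5, 6]
labels = [0, 1, 3, 3, 4, 2, 3, 0, 0, 0]

ranking = rank_clusters_by_mean(counts, labels)
for label, mean_count in ranking:
    print("label", label, "mean hospitals", round(mean_count, 2))
```

On these ten rows, label 3 (displayed below as Cluster 4) has the lowest mean and label 1 (Cluster 2) the highest. The full dataset may shift the means, but the same ranking approach applies.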

Cluster 1

In [248]:
count_hosp.loc[count_hosp['Cluster Labels1'] == 0, count_hosp.columns[[1] + list(range(5, count_hosp.shape[1]))]]

Unnamed: 0,Neighborhood,Longitude
0,Adugodi,4
7,Byatarayanapura,6
8,Chickpet,5
9,Chikkalasandra,6
11,Doddakallasandra,5
19,Hosur,6
27,Konanakunte,4
29,Kundalahalli,4
32,Mallathahalli,4
33,Mathikere,5


Cluster 2

In [249]:
count_hosp.loc[count_hosp['Cluster Labels1'] == 1, count_hosp.columns[[1] + list(range(5, count_hosp.shape[1]))]]

Unnamed: 0,Neighborhood,Longitude
1,Agram,24
22,Indiranagar S.O (Bangalore),24
24,Jayanagar H.O,25
28,Koramangala,19


Cluster 3

In [250]:
count_hosp.loc[count_hosp['Cluster Labels1'] == 2, count_hosp.columns[[1] + list(range(5, count_hosp.shape[1]))]]

Unnamed: 0,Neighborhood,Longitude
5,Basaveshwaranagar,9
15,Girinagar S.O (Bangalore),10
35,Msrit,9
41,Sadashivanagar,8
43,Vijayanagar S.O (Bangalore),8


Cluster 4

In [255]:
count_hosp.loc[count_hosp['Cluster Labels1'] == 3, count_hosp.columns[[1] + list(range(5, count_hosp.shape[1]))]]


Unnamed: 0,Neighborhood,Longitude
2,Amruthahalli,3
3,Anekal,1
6,Bhattarahalli,1
10,Deepanjalinagar,1
12,Doddanekkundi,1
14,EPIP,1
16,Gottigere,1
17,Hessarghatta,2
18,Horamavu,2
20,Hunasamaranahalli,1


Cluster 5

In [251]:
count_hosp.loc[count_hosp['Cluster Labels1'] == 4, count_hosp.columns[[1] + list(range(5, count_hosp.shape[1]))]]

Unnamed: 0,Neighborhood,Longitude
4,Banaswadi,13
13,Domlur,13
26,Kathriguppe,12
34,Mavalli,11
36,NAL,12


### Creating Map to visualize the clusters

In [256]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# look up each neighbourhood's own coordinates; count_hosp and hosp_merged
# have different row orders, so they cannot be zipped together directly
count_map = count_hosp[['Neighborhood', 'Cluster Labels1']].merge(df_new, on='Neighborhood')

# add markers to the map
for lat, lon, poi, cluster in zip(count_map['Latitude'], count_map['Longitude'], count_map['Neighborhood'], count_map['Cluster Labels1']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Results and Discussions
###### The analysis shows that Cluster 4 contains the neighbourhoods with the fewest hospitals (1 to 3 each in the rows shown above).
###### These are the places where new hospitals need to be constructed.