# Exploring Boston Neighborhoods for Urgent Care Centers

## Introduction

The steep increase in the cost of ER visits has given rise to Urgent Care Centers. These centers are not only much cheaper than ERs for minor ailments but also have much shorter wait times. More Urgent Care Centers are opening all the time to meet this demand. This notebook analyzes Boston neighborhoods to find how many Urgent Care Centers each neighborhood has and to identify possible locations for new ones. The analysis is meant to help Boston residents find the nearest Urgent Care Center, and to help urgent care companies that are looking for places to open new centers.

Steps taken to achieve this:

- Get the list of neighborhoods from Wikipedia using web scraping
- Create a pandas dataframe from this data
- Get the geographical coordinates of the neighborhoods using geocoder
- Get venue data for Urgent Care Centers in the neighborhoods using the Foursquare API
- Explore and cluster the neighborhoods using K-Means
- Examine the clusters for the best possible locations for new Urgent Care Centers


## Retrieving Data of Boston Neighborhoods from Wikipedia Page

### Importing all Required Libraries

In [329]:
import urllib.request #library used to open URL
from bs4 import BeautifulSoup #library used to parse HTML
import pandas as pd
import numpy as np
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import geocoder # to get coordinates
import folium
import requests
import json 
from pandas import json_normalize # top-level import; the pandas.io.json path is deprecated
from sklearn.cluster import KMeans
import matplotlib.cm as cm
import matplotlib.colors as colors
from IPython.display import Image
print('All Libraries and Packages imported')


All Libraries and Packages imported


In [330]:
# Specify which URL to use for scraping data
url = 'https://en.wikipedia.org/wiki/Neighborhoods_in_Boston'

In [331]:
# Open the URL using urllib.request and save the HTML in the variable page
page = urllib.request.urlopen(url)

In [332]:
#parse HTML from our URL into BeautifulSoup parse tree format
soup = BeautifulSoup(page, "lxml")

In [333]:
neighborhood_lst = []
for row in soup.find_all("div", class_="div-col columns column-width")[0].find_all("li"):
    neighborhood_lst.append(row.text)
neighborhood_lst

['Allston',
 'Back Bay',
 'Bay Village',
 'Beacon Hill',
 'Brighton',
 'Charlestown',
 'Chinatown/Leather District',
 'Dorchester (divided for planning purposes into Mid Dorchester and Dorchester)',
 'Downtown',
 'East Boston',
 'Fenway Kenmore (includes Longwood)',
 'Hyde Park',
 'Jamaica Plain',
 'Mattapan',
 'Mission Hill',
 'North End',
 'Roslindale',
 'Roxbury',
 'South Boston',
 'South End',
 'West End',
 'West Roxbury']
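
Some scraped names carry parenthetical notes, for example 'Fenway Kenmore (includes Longwood)'. The sketch below shows one possible way to strip those notes before the names are used as labels or geocoding queries. The helper `clean_neighborhood_name` is a hypothetical name introduced here for illustration; it is not part of the original notebook, and the rest of the notebook keeps the names as scraped.

```python
import re

def clean_neighborhood_name(raw_name):
    """Drop any parenthetical note and surrounding whitespace from a scraped name."""
    # Remove "(...)" groups such as "(includes Longwood)", plus the space before them
    without_notes = re.sub(r"\s*\([^)]*\)", "", raw_name)
    return without_notes.strip()

# Three of the names returned by the scraping step above
scraped = [
    "Back Bay",
    "Dorchester (divided for planning purposes into Mid Dorchester and Dorchester)",
    "Fenway Kenmore (includes Longwood)",
]
cleaned = [clean_neighborhood_name(name) for name in scraped]
print(cleaned)
```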

In [334]:
# Creating a Pandas Dataframe using the Lists above
df_boston = pd.DataFrame(neighborhood_lst, columns = ['Neighborhood'])
df_boston.head()

| | Neighborhood |
|---|---|
| 0 | Allston |
| 1 | Back Bay |
| 2 | Bay Village |
| 3 | Beacon Hill |
| 4 | Brighton |


In [335]:
# Checking size of dataframe
df_boston.shape

(22, 1)

## Get Longitude and Latitude Values of Neighborhoods using Geocoder

In [336]:
# Adding empty columns Latitude and Longitude to dataframe for storing longitude and Latitude values
df_boston['Longitude'] = ""
df_boston['Latitude'] = ""
df_boston.head()

| | Neighborhood | Longitude | Latitude |
|---|---|---|---|
| 0 | Allston | | |
| 1 | Back Bay | | |
| 2 | Bay Village | | |
| 3 | Beacon Hill | | |
| 4 | Brighton | | |


In [338]:
# Getting Latitude and Longitude values for all neighborhoods using Geocoder
lat_lng_coords = None
# Looping through the dataframe for all neighborhoods
for neighborhood in df_boston['Neighborhood']:
    g = geocoder.arcgis('{} Boston, Massachusetts'.format(neighborhood))
    lat_lng_coords = g.latlng
    # Locating the row where "Neighborhood" == neighborhood
    i = df_boston.loc[df_boston['Neighborhood'] == neighborhood].index
    # Assigning the coordinates returned by geocoder to the Latitude and Longitude columns of row i
    df_boston.loc[i, 'Latitude'] = lat_lng_coords[0]
    df_boston.loc[i, 'Longitude'] = lat_lng_coords[1]
df_boston

| | Neighborhood | Longitude | Latitude |
|---|---|---|---|
| 0 | Allston | -71.0567 | 42.3587 |
| 1 | Back Bay | -71.0876 | 42.35 |
| 2 | Bay Village | -71.0685 | 42.3482 |
| 3 | Beacon Hill | -71.0686 | 42.3584 |
| 4 | Brighton | -71.0567 | 42.3587 |
| 5 | Charlestown | -71.0567 | 42.3587 |
| 6 | Chinatown/Leather District | -71.0609 | 42.3525 |
| 7 | Dorchester (divided for planning purposes into... | -71.0598 | 42.3015 |
| 8 | Downtown | -71.0566 | 42.3583 |
| 9 | East Boston | -71.0567 | 42.3514 |
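
Several rows in the output above (Allston, Brighton and Charlestown) carry identical coordinates, which may mean the geocoder returned a city-level match rather than a neighborhood-level one. The sketch below is one possible way to flag such rows for manual review. `find_shared_coordinates` is a hypothetical helper, not part of the original notebook, and the sample rows are copied from the table above.

```python
from collections import defaultdict

def find_shared_coordinates(rows):
    """Group names by (latitude, longitude) and keep only groups with more than one name."""
    groups = defaultdict(list)
    for name, lat, lng in rows:
        groups[(lat, lng)].append(name)
    return {coords: names for coords, names in groups.items() if len(names) > 1}

# (name, latitude, longitude) copied from the first rows of the geocoding output
rows = [
    ("Allston", 42.3587, -71.0567),
    ("Back Bay", 42.35, -71.0876),
    ("Bay Village", 42.3482, -71.0685),
    ("Beacon Hill", 42.3584, -71.0686),
    ("Brighton", 42.3587, -71.0567),
    ("Charlestown", 42.3587, -71.0567),
]
shared = find_shared_coordinates(rows)
print(shared)
```

With a dataframe, the same rows could be passed in as `zip(df_boston['Neighborhood'], df_boston['Latitude'], df_boston['Longitude'])`.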


## Exploring and Clustering Neighborhoods in Boston

The libraries and packages needed for this step were already imported at the start of the notebook.

#### Using geopy to retrieve Latitude and Longitude of Boston

In [339]:

address = 'Boston, Massachusetts'
geolocator = Nominatim(user_agent = "boston_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical co-ordinates of Boston are: {}, {}'.format(latitude, longitude))

The geographical co-ordinates of Boston are: 42.3602534, -71.0582912


#### Create a map of Boston with neighborhoods superimposed on top

In [344]:
# Create map of Boston using the Latitude and Longitude values from above
map_boston = folium.Map(location = [latitude, longitude], zoom_start = 12)

# Add markers to the map showing all neighborhoods in the df_boston dataframe
for lat, lng, neighborhood in zip(df_boston['Latitude'], df_boston['Longitude'], df_boston['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html = True)
    folium.CircleMarker(
        [lat, lng],
        radius = 5,
        popup = label,
        color = 'blue',
        fill = True,
        fill_color = '#3186cc',
        fill_opacity = 0.7,
        parse_html = False).add_to(map_boston)
map_boston

![title](bos_map1.jpg)

### Exploring the Neighborhoods in Boston

##### Using Foursquare API to explore neighborhoods in Boston

### Defining Foursquare API Credentials

In [345]:
# Assigning credentials used to build the Foursquare request URL.
# The keys are read from environment variables instead of being hardcoded and printed;
# the variable names FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET are assumptions.
import os
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID']
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET']
VERSION = '20180605'

print('Foursquare credentials loaded')

Foursquare credentials loaded


#### Create a Function to repeat the process of retrieving the venues for a neighborhood for all neighborhoods

#### Note:
Since Urgent Care Centers are not among the most popular venues, they do not show up in the results of the Explore API call. Here we use the Search API call instead, which searches for venues equal to or similar to the query 'Urgent Care Center'.

In [346]:
# Assigning query and radius values
query = 'Urgent Care Center'
radius = 2000

# Defining the function
def getNearbyVenues(names, latitudes, longitudes, radius = 2000):
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # Create the API request URL
        url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&query={}'.format(
              CLIENT_ID,
              CLIENT_SECRET,
              VERSION,
              lat,
              lng,
              radius,
              query)
        # Make the GET request
        results = requests.get(url).json()
        medlst = results['response']

        # Collect every returned venue that has at least one category
        for v in medlst['venues']:
            if len(v['categories']) != 0:
                venues_list.append([
                       name,
                       lat,
                       lng,
                       v['categories'][0]['name'],
                       v['location']['lat'],
                       v['location']['lng']
                       ])

    # Convert the list to a pandas DataFrame once, after all neighborhoods are processed.
    # Building it here also avoids a NameError when no venue is returned at all.
    nearby_med = pd.DataFrame(venues_list, columns = ['Neighborhood',
                                                      'Neighborhood Latitude',
                                                      'Neighborhood Longitude',
                                                      'Category',
                                                      'Category Latitude',
                                                      'Category Longitude'])
    return nearby_med
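
The query string 'Urgent Care Center' contains spaces, so the request URL depends on how those characters are encoded. As a hedged alternative to string formatting, the sketch below assembles the same search URL with the standard library's `urlencode`. `build_search_url` is a hypothetical helper, not part of the original notebook, and the 'ID' and 'SECRET' values are placeholders.

```python
from urllib.parse import urlencode

def build_search_url(client_id, client_secret, version, lat, lng, radius, query):
    """Return a Foursquare v2 venue search URL with all parameters percent-encoded."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),  # latitude,longitude pair
        "radius": radius,                # search radius in meters
        "query": query,
    }
    return "https://api.foursquare.com/v2/venues/search?" + urlencode(params)

# Placeholder credentials; the coordinates are those of Back Bay from the table above
example_url = build_search_url("ID", "SECRET", "20180605", 42.35, -71.0876, 2000, "Urgent Care Center")
print(example_url)
```

Passing the same dictionary to `requests.get(url, params=params)` would be another way to get the encoding handled automatically.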

In [312]:
# Calling the Function above to get venue data for each neighborhood
boston_venues = getNearbyVenues(names=df_boston['Neighborhood'],
                                   latitudes=df_boston['Latitude'],
                                   longitudes=df_boston['Longitude']
                                  )
boston_venues.head(50)

Allston
Back Bay
Bay Village
Beacon Hill
Brighton
Charlestown
Chinatown/Leather District
Dorchester (divided for planning purposes into Mid Dorchester and Dorchester)
Downtown
East Boston
Fenway Kenmore (includes Longwood)
Hyde Park
Jamaica Plain
Mattapan
Mission Hill
North End
Roslindale
Roxbury
South Boston
South End
West End
West Roxbury


| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Category | Category Latitude | Category Longitude |
|---|---|---|---|---|---|---|
| 0 | Allston | 42.35866 | -71.05674 | Doctor's Office | 42.361766 | -71.069484 |
| 1 | Allston | 42.35866 | -71.05674 | Urgent Care Center | 42.350742 | -71.073604 |
| 2 | Allston | 42.35866 | -71.05674 | Urgent Care Center | 42.351227 | -71.065959 |
| 3 | Allston | 42.35866 | -71.05674 | Doctor's Office | 42.350382 | -71.064548 |
| 4 | Allston | 42.35866 | -71.05674 | Doctor's Office | 42.362578 | -71.068608 |
| 5 | Allston | 42.35866 | -71.05674 | Health & Beauty Service | 42.350108 | -71.072574 |
| 6 | Allston | 42.35866 | -71.05674 | Hospital | 42.337096 | -71.07331 |
| 7 | Allston | 42.35866 | -71.05674 | Event Space | 42.347813 | -71.08005 |
| 8 | Back Bay | 42.34999 | -71.08765 | Urgent Care Center | 42.350742 | -71.073604 |
| 9 | Back Bay | 42.34999 | -71.08765 | Doctor's Office | 42.361766 | -71.069484 |
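
One stated goal of this analysis is to help residents find the nearest Urgent Care Center. The sketch below shows one way to do that from the venue coordinates, using the haversine great-circle distance. `haversine_km` is a hypothetical helper, not part of the original notebook; the mean Earth radius of 6371 km is an assumption of the formula; the coordinates are copied from the rows labelled Allston above.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

# Neighborhood coordinates and the two 'Urgent Care Center' rows listed for Allston
center = (42.35866, -71.05674)
urgent_care = [(42.350742, -71.073604), (42.351227, -71.065959)]

# Pick the urgent care venue with the smallest distance to the neighborhood point
nearest = min(urgent_care, key=lambda v: haversine_km(center[0], center[1], v[0], v[1]))
print(nearest, haversine_km(center[0], center[1], nearest[0], nearest[1]))

# Distance to the 'Hospital' row, for comparison with the search radius
hospital_km = haversine_km(center[0], center[1], 42.337096, -71.07331)
print(hospital_km)
```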


Checking the size of the dataframe

In [347]:
print(boston_venues.shape)

(247, 6)


Checking number of venues returned for each Neighborhood

In [348]:
boston_venues.groupby('Neighborhood').count()

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Category | Category Latitude | Category Longitude |
|---|---|---|---|---|---|
| Allston | 8 | 8 | 8 | 8 | 8 |
| Back Bay | 29 | 29 | 29 | 29 | 29 |
| Bay Village | 18 | 18 | 18 | 18 | 18 |
| Beacon Hill | 13 | 13 | 13 | 13 | 13 |
| Brighton | 8 | 8 | 8 | 8 | 8 |
| Charlestown | 8 | 8 | 8 | 8 | 8 |
| Chinatown/Leather District | 9 | 9 | 9 | 9 | 9 |
| Downtown | 7 | 7 | 7 | 7 | 7 |
| East Boston | 9 | 9 | 9 | 9 | 9 |
| Fenway Kenmore (includes Longwood) | 27 | 27 | 27 | 27 | 27 |


Getting the count of unique categories returned

In [349]:
print('There are {} unique categories.'.format(len(boston_venues['Category'].unique())))

There are 19 unique categories.


In [350]:

# print out the list of categories
boston_venues['Category'].unique()


array(["Doctor's Office", 'Urgent Care Center', 'Health & Beauty Service',
       'Hospital', 'Event Space', 'Medical Center', 'Garden',
       'Cosmetics Shop', 'Student Center',
       'College Administrative Building', 'Eye Doctor', 'Spa',
       'Automotive Shop', 'Office', 'Daycare', 'School',
       "Dentist's Office", 'Assisted Living', 'Veterinarian'],
      dtype=object)

### Analyzing Each Neighborhood

In [351]:
# one hot encoding
bos_onehot = pd.get_dummies(boston_venues[['Category']], prefix="", prefix_sep="")

# Guard against a venue category literally named 'Neighborhood' by renaming it to 'Hoods' (a no-op if no such category exists)
bos_onehot.rename(columns={'Neighborhood': 'Hoods'}, inplace = True)

# add neighborhood column back as first column of dataframe
bos_onehot['Neighborhood'] = boston_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [bos_onehot.columns[-1]] + list(bos_onehot.columns[:-1])
bos_onehot = bos_onehot[fixed_columns]

bos_onehot.head()

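| | Neighborhood | Assisted Living | Automotive Shop | College Administrative Building | Cosmetics Shop | Daycare | Dentist's Office | Doctor's Office | Event Space | Eye Doctor | Garden | Health & Beauty Service | Hospital | Medical Center | Office | School | Spa | Student Center | Urgent Care Center | Veterinarian |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Allston | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | Allston | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 2 | Allston | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 3 | Allston | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 4 | Allston | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |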


Getting Size of new dataframe

In [352]:
bos_onehot.shape

(247, 20)

#### Grouping rows by neighborhoods and taking the mean of the frequencies of each category

In [353]:
bos_grouped = bos_onehot.groupby('Neighborhood').mean().reset_index()
bos_grouped.head()

| | Neighborhood | Assisted Living | Automotive Shop | College Administrative Building | Cosmetics Shop | Daycare | Dentist's Office | Doctor's Office | Event Space | Eye Doctor | Garden | Health & Beauty Service | Hospital | Medical Center | Office | School | Spa | Student Center | Urgent Care Center | Veterinarian |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Allston | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.375 | 0.125 | 0.0 | 0.0 | 0.125 | 0.125 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.25 | 0.0 |
| 1 | Back Bay | 0.0 | 0.034483 | 0.103448 | 0.034483 | 0.0 | 0.0 | 0.241379 | 0.034483 | 0.034483 | 0.034483 | 0.034483 | 0.206897 | 0.034483 | 0.0 | 0.0 | 0.034483 | 0.068966 | 0.103448 | 0.0 |
| 2 | Bay Village | 0.0 | 0.0 | 0.0 | 0.055556 | 0.055556 | 0.0 | 0.333333 | 0.055556 | 0.0 | 0.0 | 0.055556 | 0.166667 | 0.055556 | 0.055556 | 0.0 | 0.0 | 0.055556 | 0.111111 | 0.0 |
| 3 | Beacon Hill | 0.0 | 0.076923 | 0.0 | 0.0 | 0.0 | 0.0 | 0.307692 | 0.076923 | 0.0 | 0.0 | 0.076923 | 0.230769 | 0.0 | 0.0 | 0.0 | 0.0 | 0.076923 | 0.153846 | 0.0 |
| 4 | Brighton | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.375 | 0.125 | 0.0 | 0.0 | 0.125 | 0.125 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.25 | 0.0 |
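
To make the grouped values concrete: each number is the share of a neighborhood's returned venues that fall in a given category, not a count. The sketch below reproduces the Allston row by hand from the eight Allston venues listed earlier. `category_frequencies` is a hypothetical helper, not part of the original notebook.

```python
from collections import Counter

# Categories of the eight venues returned for Allston, copied from the venue table
allston_categories = [
    "Doctor's Office",
    "Urgent Care Center",
    "Urgent Care Center",
    "Doctor's Office",
    "Doctor's Office",
    "Health & Beauty Service",
    "Hospital",
    "Event Space",
]

def category_frequencies(categories):
    """Return the share of venues per category, i.e. the mean of the one-hot columns."""
    counts = Counter(categories)
    total = len(categories)
    return {category: count / total for category, count in counts.items()}

allston_freq = category_frequencies(allston_categories)
print(allston_freq["Urgent Care Center"])  # 2 of the 8 venues
print(allston_freq["Doctor's Office"])     # 3 of the 8 venues
```

The absolute number of centers is the `Counter` count itself (2 for Allston here), which can be more informative than the share when neighborhoods return very different numbers of venues.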


#### Confirming New Size

In [354]:
bos_grouped.shape

(21, 20)

The grouped dataframe has 21 rows rather than 22 because no venues were recorded for Dorchester, so it does not appear in the per-neighborhood counts above and drops out of the grouping.

#### Creating a Data frame with Urgent Care Data Only

In [355]:
bos_urg_care = bos_grouped[['Neighborhood', 'Urgent Care Center']]
bos_urg_care.head()

| | Neighborhood | Urgent Care Center |
|---|---|---|
| 0 | Allston | 0.25 |
| 1 | Back Bay | 0.103448 |
| 2 | Bay Village | 0.111111 |
| 3 | Beacon Hill | 0.153846 |
| 4 | Brighton | 0.25 |


### Clustering Neighborhoods

Running K-means to cluster neighborhoods into 3 clusters

In [356]:
# setting  number of clusters
kclusters = 3

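# Drop the label column by name; the positional axis argument is not accepted by recent pandas versions
bos_urg_care_cluster = bos_urg_care.drop(columns='Neighborhood')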

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(bos_urg_care_cluster)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 1, 1, 1, 0, 0, 0, 0, 0, 1])

Creating new dataframe which includes the cluster as well as Urgent Care Data

In [357]:
# Create new Dataframe with the cluster labels
bos_urg_care_merged = bos_urg_care.copy()
# Add Cluster labels Column
bos_urg_care_merged['Cluster Labels'] = kmeans.labels_
bos_urg_care_merged.head()

| | Neighborhood | Urgent Care Center | Cluster Labels |
|---|---|---|---|
| 0 | Allston | 0.25 | 0 |
| 1 | Back Bay | 0.103448 | 1 |
| 2 | Bay Village | 0.111111 | 1 |
| 3 | Beacon Hill | 0.153846 | 1 |
| 4 | Brighton | 0.25 | 0 |


In [358]:
# Merge bos_urg_care_merged with df_boston to get latitude and Longitude values of neighborhoods
bos_urg_care_merged = bos_urg_care_merged.join(df_boston.set_index("Neighborhood"), on="Neighborhood")
bos_urg_care_merged.head()

| | Neighborhood | Urgent Care Center | Cluster Labels | Longitude | Latitude |
|---|---|---|---|---|---|
| 0 | Allston | 0.25 | 0 | -71.0567 | 42.3587 |
| 1 | Back Bay | 0.103448 | 1 | -71.0876 | 42.35 |
| 2 | Bay Village | 0.111111 | 1 | -71.0685 | 42.3482 |
| 3 | Beacon Hill | 0.153846 | 1 | -71.0686 | 42.3584 |
| 4 | Brighton | 0.25 | 0 | -71.0567 | 42.3587 |


Visualizing the clusters

In [359]:
# Sort values by Cluster labels
bos_urg_care_merged.sort_values(["Cluster Labels"], inplace=True)
bos_urg_care_merged.head()

| | Neighborhood | Urgent Care Center | Cluster Labels | Longitude | Latitude |
|---|---|---|---|---|---|
| 0 | Allston | 0.25 | 0 | -71.0567 | 42.3587 |
| 17 | South Boston | 0.25 | 0 | -71.0557 | 42.3522 |
| 16 | Roxbury | 0.25 | 0 | -71.0567 | 42.3587 |
| 15 | Roslindale | 0.25 | 0 | -71.0567 | 42.3587 |
| 14 | North End | 0.2 | 0 | -71.053 | 42.3655 |


In [361]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(bos_urg_care_merged['Latitude'], bos_urg_care_merged['Longitude'], bos_urg_care_merged['Neighborhood'], bos_urg_care_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

![title](bos_map2.jpg)

### Examining Clusters

#### Cluster 0

In [362]:
bos_urg_care_merged.loc[bos_urg_care_merged['Cluster Labels'] == 0]

| | Neighborhood | Urgent Care Center | Cluster Labels | Longitude | Latitude |
|---|---|---|---|---|---|
| 0 | Allston | 0.25 | 0 | -71.0567 | 42.3587 |
| 17 | South Boston | 0.25 | 0 | -71.0557 | 42.3522 |
| 16 | Roxbury | 0.25 | 0 | -71.0567 | 42.3587 |
| 15 | Roslindale | 0.25 | 0 | -71.0567 | 42.3587 |
| 14 | North End | 0.2 | 0 | -71.053 | 42.3655 |
| 12 | Mattapan | 0.25 | 0 | -71.0567 | 42.3587 |
| 11 | Jamaica Plain | 0.25 | 0 | -71.0567 | 42.3587 |
| 19 | West End | 0.222222 | 0 | -71.0674 | 42.3639 |
| 10 | Hyde Park | 0.25 | 0 | -71.0567 | 42.3587 |
| 7 | Downtown | 0.285714 | 0 | -71.0566 | 42.3583 |


#### Cluster 1

In [363]:
bos_urg_care_merged.loc[bos_urg_care_merged['Cluster Labels'] == 1]

| | Neighborhood | Urgent Care Center | Cluster Labels | Longitude | Latitude |
|---|---|---|---|---|---|
| 9 | Fenway Kenmore (includes Longwood) | 0.111111 | 1 | -71.1016 | 42.3436 |
| 3 | Beacon Hill | 0.153846 | 1 | -71.0686 | 42.3584 |
| 2 | Bay Village | 0.111111 | 1 | -71.0685 | 42.3482 |
| 1 | Back Bay | 0.103448 | 1 | -71.0876 | 42.35 |
| 18 | South End | 0.105263 | 1 | -71.0736 | 42.3426 |


#### Cluster 2

In [364]:
bos_urg_care_merged.loc[bos_urg_care_merged['Cluster Labels'] == 2]

| | Neighborhood | Urgent Care Center | Cluster Labels | Longitude | Latitude |
|---|---|---|---|---|---|
| 13 | Mission Hill | 0.045455 | 2 | -71.1098 | 42.3358 |
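
Because the clustering uses a single feature, each cluster can be summarized by the mean of its 'Urgent Care Center' values, and K-means assigns every point to its nearest cluster center. The sketch below recomputes those means from the rows displayed in the three cluster tables and checks that each displayed neighborhood is closest to its own cluster's mean. It is only an approximation of the fitted model: the cluster 0 table shows ten rows, so the means here may differ slightly from the centers in `kmeans.cluster_centers_`. `cluster_means` and `nearest_cluster` are hypothetical helpers, not part of the original notebook.

```python
from collections import defaultdict

# (neighborhood, urgent care frequency, cluster label) copied from the cluster tables
rows = [
    ("Allston", 0.25, 0), ("South Boston", 0.25, 0), ("Roxbury", 0.25, 0),
    ("Roslindale", 0.25, 0), ("North End", 0.2, 0), ("Mattapan", 0.25, 0),
    ("Jamaica Plain", 0.25, 0), ("West End", 0.222222, 0), ("Hyde Park", 0.25, 0),
    ("Downtown", 0.285714, 0),
    ("Fenway Kenmore (includes Longwood)", 0.111111, 1), ("Beacon Hill", 0.153846, 1),
    ("Bay Village", 0.111111, 1), ("Back Bay", 0.103448, 1), ("South End", 0.105263, 1),
    ("Mission Hill", 0.045455, 2),
]

def cluster_means(rows):
    """Mean urgent care frequency per cluster label."""
    grouped = defaultdict(list)
    for _, value, label in rows:
        grouped[label].append(value)
    return {label: sum(values) / len(values) for label, values in grouped.items()}

def nearest_cluster(value, means):
    """Label of the cluster whose mean is closest to value (the K-means assignment rule)."""
    return min(means, key=lambda label: abs(value - means[label]))

means = cluster_means(rows)
mismatches = [name for name, value, label in rows if nearest_cluster(value, means) != label]
print(means)
print(mismatches)
```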


### Conclusion

On examining the cluster data, we find that the neighborhoods in cluster 0 have the highest share of Urgent Care Centers among the venues returned (roughly 0.20 to 0.29 in the rows shown above). Clusters 1 and 2 have the lowest shares (about 0.10 to 0.15 in cluster 1, and 0.045 for Mission Hill, the only neighborhood in cluster 2). Neighborhoods such as Mission Hill, Beacon Hill, Fenway Kenmore, Bay Village, Back Bay and South End are therefore the best suited for opening new Urgent Care Centers. This project recommends that urgent care companies avoid opening new centers in the cluster 0 neighborhoods, as these are already well served, and focus instead on the neighborhoods in clusters 1 and 2.