## Introduction

Both published studies and a simple walk around a city will tell you that not all neighborhoods are equal. In New York City at least, neighborhoods vary widely in wealth and, consequently, in access to amenities and necessities, including healthcare. I hypothesize that some neighborhoods have many more locations to receive medical care than others. Where such locations are scarce, it is harder to be seen by a healthcare provider, which can lead to poorer health outcomes.

<br>

Therefore, my questions are these: Which neighborhoods are underserved by healthcare locations in New York City? Can we define a healthcare access metric and apply it across neighborhoods to determine which ones have more equitable access to healthcare locations?

<br>

Answers to these questions would allow a city to invest in healthcare access in an equitable way, instead of simply building medical centers where convenient. Thus, city planners, social scientists, public health officials, and healthcare workers will find this work of interest.

## Data

Conveniently, Foursquare has a venue category called "Medical Center" which encompasses the following venue categories that may be useful for this project:
- Acupuncturist
- Alternative Healer
- Chiropractor
- Dentist's Office
- Doctor's Office
- Emergency Room
- Eye Doctor
- Hospital
- Maternity Clinic
- Medical Lab
- Mental Health Office
- Nutritionist
- Physical Therapist
- Rehab Center
- Urgent Care Center

The "Medical Center" category also includes veterinarians, but I will be excluding them from this analysis.  
<br>
Neighborhoods come in all shapes and sizes, so we cannot simply compare raw counts of medical venues, as this would favor larger neighborhoods. Instead, we can normalize by comparing the number of medical venues to the number of venues of a different type. For this project, I will use the number of stores selling alcohol as the normalizing factor and compute the medical center to alcohol store ratio for each neighborhood (hereafter, MC:AS). Foursquare has the following venue categories for stores selling alcohol:
- Liquor Store
- Wine Shop
- Beer Store
<br>
I will use neighborhood information scraped from Wikipedia in order to define the neighborhoods of NYC.

## Methodology

Once the data has been acquired and cleaned, I will use the Python libraries `folium` and `geopy` as well as the Foursquare API to define and visualize neighborhoods. I will then calculate the MC:AS for each neighborhood and visualize it using `folium`. Using the K-means algorithm with `k = 3`, I will cluster the neighborhoods in an attempt to find groups with high, medium, and low access to healthcare. K-means is a good fit for this task because the clustering is unsupervised, yet it lets me specify how many clusters to find.
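
To make the clustering step concrete, here is a toy one-dimensional sketch of the assign-and-update loop that K-means repeats. It is only an illustration: the notebook itself calls scikit-learn's `KMeans` on multi-dimensional venue-category frequencies, and the function name `kmeans_1d`, the scores, and the starting centroids below are all made up for this example.

```python
def kmeans_1d(values, centroids, iterations=20):
    """Toy Lloyd's algorithm on a list of numbers (illustrative only)."""
    centroids = list(centroids)
    labels = []
    for _ in range(iterations):
        # assignment step: index of the nearest centroid for each value
        labels = [min(range(len(centroids)), key=lambda c: abs(v - centroids[c]))
                  for v in values]
        # update step: each centroid becomes the mean of its members
        new_centroids = []
        for c in range(len(centroids)):
            members = [v for v, lab in zip(values, labels) if lab == c]
            # keep the old centroid if a cluster ends up empty
            new_centroids.append(sum(members) / len(members) if members else centroids[c])
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return labels, centroids

# hypothetical access scores for six neighborhoods, with k = 3 starting centroids
scores = [1.0, 2.0, 10.0, 12.0, 28.0, 30.0]
labels, centers = kmeans_1d(scores, centroids=[0.0, 15.0, 30.0])
print(labels, centers)
```

With `k = 3`, the three resulting groups can be read as low, medium, and high only if the feature being clustered actually measures access, which is worth keeping in mind when interpreting the clusters later on.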

### Importing Libraries

In [136]:
import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis
import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import branca.colormap as cm2
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

### Downloading and Mapping NYC Data

In [2]:
!wget -q -O 'newyork_data.json' https://cocl.us/new_york_dataset
with open('newyork_data.json') as json_data:
    newyork_data = json.load(json_data)

In [3]:
neighborhoods_data = newyork_data['features']

column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 
neighborhoods = pd.DataFrame(columns=column_names)

In [4]:
rows = []
for data in neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']

    # GeoJSON stores coordinates as [longitude, latitude]
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]

    rows.append({'Borough': borough,
                 'Neighborhood': neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})

# build the dataframe in one step; DataFrame.append was removed in pandas 2.0
neighborhoods = pd.DataFrame(rows, columns=column_names)
neighborhoods.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Bronx,Wakefield,40.894705,-73.847201
1,Bronx,Co-op City,40.874294,-73.829939
2,Bronx,Eastchester,40.887556,-73.827806
3,Bronx,Fieldston,40.895437,-73.905643
4,Bronx,Riverdale,40.890834,-73.912585


In [118]:
# we know that the geographical coordinates of New York City are 40.7127281, -74.0060152
latitude = 40.7127281
longitude = -74.0060152
# create map of New York using latitude and longitude values
map_newyork = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_newyork)  
    
map_newyork

### Gathering Data from Foursquare

In [14]:
CLIENT_ID = 'XXX' # your Foursquare ID
CLIENT_SECRET = 'XXX' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
categories = ['52e81612bcbc57f1066b7a3b', '52e81612bcbc57f1066b7a3c', '52e81612bcbc57f1066b7a3a', '4bf58dd8d48988d178941735',
              '4bf58dd8d48988d177941735', '4bf58dd8d48988d194941735', '522e32fae4b09b556e370f19', '4bf58dd8d48988d196941735',
              '56aa371be4b08b9a8d5734ff', '4f4531b14b9074f6e4fb0103', '52e81612bcbc57f1066b7a39', '58daa1558bbb0b01f18ec1d0',
              '5744ccdfe4b0c0459246b4af', '56aa371be4b08b9a8d57351d', '56aa371be4b08b9a8d573526',
              '5370f356bcbc57f1066c94c2', '4bf58dd8d48988d186941735', '4bf58dd8d48988d119951735']

In [7]:
', '.join(categories)

'52e81612bcbc57f1066b7a3b, 52e81612bcbc57f1066b7a3c, 52e81612bcbc57f1066b7a3a, 4bf58dd8d48988d178941735, 4bf58dd8d48988d177941735, 4bf58dd8d48988d194941735, 522e32fae4b09b556e370f19, 4bf58dd8d48988d196941735, 56aa371be4b08b9a8d5734ff, 4f4531b14b9074f6e4fb0103, 52e81612bcbc57f1066b7a39, 58daa1558bbb0b01f18ec1d0, 5744ccdfe4b0c0459246b4af, 56aa371be4b08b9a8d57351d, 56aa371be4b08b9a8d573526, 5370f356bcbc57f1066c94c2, 4bf58dd8d48988d186941735, 4bf58dd8d48988d119951735'

In [28]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&categoryId={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            ','.join(categories))

        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [30]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&categoryId={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            ','.join(categories))
        
        try:
            # make the GET request
            results = requests.get(url).json()["response"]['groups'][0]['items']
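        # skip neighborhoods whose response is not valid JSON or lacks the expected fields
        except (ValueError, KeyError, IndexError):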
            print(name)
            print(url)
            continue
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
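
The request URL above is assembled with string formatting. As a hedged alternative, the sketch below builds the same query with the standard library's `urlencode`, which percent-encodes each value. The endpoint and parameter names are the ones used in `getNearbyVenues`; the helper name `build_explore_url` is hypothetical and not part of the notebook.

```python
from urllib.parse import urlencode

def build_explore_url(lat, lng, radius, category_ids, client_id, client_secret, version):
    """Hypothetical helper: return the venues/explore URL with encoded query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),          # latitude,longitude of the search center
        'radius': radius,                        # search radius in meters
        'categoryId': ','.join(category_ids),    # comma-separated Foursquare category IDs
    }
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)

# Wakefield's coordinates and two of the category IDs from the list above
url = build_explore_url(40.894705, -73.847201, 500,
                        ['4bf58dd8d48988d178941735', '4bf58dd8d48988d177941735'],
                        client_id='XXX', client_secret='XXX', version='20180605')
print(url)
```

Keeping the parameters in a dictionary also makes it easier to log a failing request without printing the client secret.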

In [31]:
NYC_venues = getNearbyVenues(names=neighborhoods['Neighborhood'],
                                   latitudes=neighborhoods['Latitude'],
                                   longitudes=neighborhoods['Longitude']
                                  )
NYC_venues.head()

Wakefield
Co-op City
Eastchester
Fieldston
Riverdale
Kingsbridge
Marble Hill
Woodlawn
Norwood
Williamsbridge
Baychester
Pelham Parkway
City Island
Bedford Park
University Heights
Morris Heights
Fordham
East Tremont
West Farms
High  Bridge
Melrose
Mott Haven
Port Morris
Longwood
Hunts Point
Morrisania
Soundview
Clason Point
Throgs Neck
Country Club
Parkchester
Westchester Square
Van Nest
Morris Park
Belmont
Spuyten Duyvil
North Riverdale
Pelham Bay
Schuylerville
Edgewater Park
Castle Hill
Olinville
Pelham Gardens
Concourse
Unionport
Edenwald
Bay Ridge
Bensonhurst
Sunset Park
Greenpoint
Gravesend
Brighton Beach
Sheepshead Bay
Manhattan Terrace
Flatbush
Crown Heights
East Flatbush
Kensington
Windsor Terrace
Prospect Heights
Brownsville
Williamsburg
Bushwick
Bedford Stuyvesant
Brooklyn Heights
Cobble Hill
Carroll Gardens
Red Hook
Gowanus
Fort Greene
Park Slope
Cypress Hills
East New York
Starrett City
Canarsie
Flatlands
Mill Island
Manhattan Beach
Coney Island
Bath Beach
Borough Park
Dyker

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Wakefield,40.894705,-73.847201,Janvier Wellness Chiropractic Care,40.895032,-73.842354,Chiropractor
1,Wakefield,40.894705,-73.847201,"Rest Medical Care, P.C",40.890821,-73.85009,Doctor's Office
2,Wakefield,40.894705,-73.847201,Edenwald Liquors,40.890942,-73.850455,Liquor Store
3,Co-op City,40.874294,-73.829939,Advanced Dental Group,40.875545,-73.829761,Dentist's Office
4,Co-op City,40.874294,-73.829939,Dental Group NY,40.875545,-73.829761,Dentist's Office


In [56]:
NYC_venues.shape

(4119, 7)

It seems Foursquare returned some venues in unwanted categories, so we'll have to drop venues that don't fit our list.

In [55]:
wanted_categories = ['Acupuncturist', 
'Alternative Healer', 
'Chiropractor', 
'Dentist\'s Office', 
'Doctor\'s Office', 
'Emergency Room', 
'Eye Doctor', 
'Hospital', 
'Maternity Clinic', 
'Medical Lab', 
'Mental Health Office', 
'Nutritionist', 
'Physical Therapist', 
'Rehab Center', 
'Urgent Care Center', 
'Liquor Store', 
'Wine Shop', 
'Beer Store']

In [57]:
NYC_venues_narrowed = NYC_venues[NYC_venues['Venue Category'].isin(wanted_categories)]
NYC_venues_narrowed.shape

(3938, 7)

In [58]:
NYC_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Allerton,8,8,8,8,8,8
Annadale,7,7,7,7,7,7
Arden Heights,2,2,2,2,2,2
Arlington,1,1,1,1,1,1
Arrochar,5,5,5,5,5,5
...,...,...,...,...,...,...
Woodhaven,12,12,12,12,12,12
Woodlawn,3,3,3,3,3,3
Woodrow,3,3,3,3,3,3
Woodside,30,30,30,30,30,30


It looks like many neighborhoods have very few venues falling into our desired categories. Let's exclude those with fewer than 10 venues.

In [59]:
# venue counts per neighborhood (computed from the unfiltered NYC_venues frame)
df = NYC_venues.groupby('Neighborhood').count()
# neighborhoods with fewer than 10 venues
to_exclude = df[df['Neighborhood Latitude'] < 10].index.values
print('number to exclude:', len(to_exclude))
print('total number:', df.shape[0])
print('excluded neighborhoods:', to_exclude)

number to exclude: 118
total number: 273
excluded neighborhoods: ['Allerton' 'Annadale' 'Arden Heights' 'Arlington' 'Arrochar' 'Arverne'
 'Astoria Heights' 'Auburndale' 'Baychester' 'Bedford Park'
 'Bedford Stuyvesant' 'Beechhurst' 'Bellaire' 'Belle Harbor' 'Bloomfield'
 'Briarwood' 'Broadway Junction' 'Brookville' 'Brownsville' 'Butler Manor'
 'Cambria Heights' 'Canarsie' 'Castle Hill' 'Charleston' 'City Island'
 'Clifton' 'Concord' 'Concourse' 'Coney Island' 'Corona' 'Country Club'
 'Cypress Hills' 'Douglaston' 'East Elmhurst' 'East Flatbush'
 'East New York' 'East Williamsburg' 'Edenwald' 'Edgemere'
 'Edgewater Park' 'Egbertville' 'Emerson Hill' 'Fieldston' 'Floral Park'
 'Fox Hills' 'Fresh Meadows' 'Fulton Ferry' 'Gerritsen Beach'
 'Graniteville' 'Grant City' 'Grasmere' 'Great Kills' 'Hammels'
 'High  Bridge' 'Highland Park' 'Hillcrest' 'Hollis' 'Holliswood'
 'Howard Beach' 'Huguenot' 'Jamaica Estates' 'Kew Gardens Hills'
 'Kingsbridge Heights' 'Laurelton' 'Longwood' 'Manor Heights

In [60]:
NYC_venues_excluded = NYC_venues_narrowed[~NYC_venues_narrowed['Neighborhood'].isin(to_exclude)]
NYC_venues_excluded.groupby('Neighborhood').count().shape

(155, 6)

### Calculating MC:AS

We can now calculate the MC:AS for each neighborhood. The numerator will simply be the number of medical center venues in the neighborhood. Since some neighborhoods may not have any alcohol stores, we'll set the denominator equal to the maximum of 1 and the number of alcohol stores.
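
As a small standalone illustration of this definition, the sketch below computes the ratio for two short venue lists. The first list holds the three Wakefield categories shown in the sample output earlier; the second is hypothetical and has no alcohol stores. The function name `mcas_ratio` and the abbreviated category sets are assumptions made for this example, not the notebook's own code.

```python
from collections import Counter

# abbreviated category sets, for illustration only
MEDICAL = {"Chiropractor", "Dentist's Office", "Doctor's Office", "Hospital"}
ALCOHOL = {"Liquor Store", "Wine Shop", "Beer Store"}

def mcas_ratio(venue_categories):
    """Medical venue count divided by max(1, alcohol store count)."""
    counts = Counter(venue_categories)
    medical = sum(n for cat, n in counts.items() if cat in MEDICAL)
    alcohol = sum(n for cat, n in counts.items() if cat in ALCOHOL)
    return medical / max(1, alcohol)

wakefield = ["Chiropractor", "Doctor's Office", "Liquor Store"]  # from the sample rows
no_stores = ["Hospital", "Doctor's Office", "Dentist's Office"]  # hypothetical

print(mcas_ratio(wakefield))   # 2 medical venues / 1 alcohol store
print(mcas_ratio(no_stores))   # 3 medical venues / max(1, 0)
```

One consequence of the `max(1, ...)` floor is that a neighborhood with zero alcohol stores and one with exactly one store get the same denominator, so their ratios are not distinguishable on that basis.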

In [64]:
MC = ['Acupuncturist', 
'Alternative Healer', 
'Chiropractor', 
'Dentist\'s Office', 
'Doctor\'s Office', 
'Emergency Room', 
'Eye Doctor', 
'Hospital', 
'Maternity Clinic', 
'Medical Lab', 
'Mental Health Office', 
'Nutritionist', 
'Physical Therapist', 
'Rehab Center', 
'Urgent Care Center']

AS = ['Liquor Store', 
'Wine Shop', 
'Beer Store']

In [79]:
mcas_dict = {}
for i in range(NYC_venues_excluded.shape[0]):
    nb = NYC_venues_excluded['Neighborhood'].iloc[i]
    
    if nb not in mcas_dict.keys():
        mcas_dict[nb] = {'Medical Centers': 0, 'Alcohol Stores' : 0}
    
    if NYC_venues_excluded['Venue Category'].iloc[i] in MC:
        mcas_dict[nb]['Medical Centers'] += 1
    elif NYC_venues_excluded['Venue Category'].iloc[i] in AS:
        mcas_dict[nb]['Alcohol Stores'] += 1
    else:
        print('uh oh')
mcas_dict

{'Astoria': {'Alcohol Stores': 6, 'Medical Centers': 23},
 'Bath Beach': {'Alcohol Stores': 4, 'Medical Centers': 25},
 'Battery Park City': {'Alcohol Stores': 3, 'Medical Centers': 22},
 'Bay Ridge': {'Alcohol Stores': 2, 'Medical Centers': 27},
 'Bay Terrace': {'Alcohol Stores': 3, 'Medical Centers': 33},
 'Bayside': {'Alcohol Stores': 1, 'Medical Centers': 28},
 'Bellerose': {'Alcohol Stores': 3, 'Medical Centers': 7},
 'Belmont': {'Alcohol Stores': 4, 'Medical Centers': 22},
 'Bensonhurst': {'Alcohol Stores': 1, 'Medical Centers': 27},
 'Boerum Hill': {'Alcohol Stores': 5, 'Medical Centers': 24},
 'Borough Park': {'Alcohol Stores': 1, 'Medical Centers': 24},
 'Brighton Beach': {'Alcohol Stores': 6, 'Medical Centers': 24},
 'Bronxdale': {'Alcohol Stores': 0, 'Medical Centers': 17},
 'Brooklyn Heights': {'Alcohol Stores': 2, 'Medical Centers': 27},
 'Bulls Head': {'Alcohol Stores': 3, 'Medical Centers': 16},
 'Bushwick': {'Alcohol Stores': 4, 'Medical Centers': 9},
 'Carnegie Hill': 

In [81]:
# make sure no neighborhood has 0 alcohol stores so that the MC:AS ratio is defined for all neighborhoods
for key in mcas_dict.keys():
    if mcas_dict[key]['Alcohol Stores'] == 0:
        mcas_dict[key]['Alcohol Stores'] += 1

In [103]:
MCAS_counts = pd.DataFrame.from_dict(mcas_dict).T
MCAS_counts.head()

Unnamed: 0,Medical Centers,Alcohol Stores
Co-op City,10,2
Riverdale,11,1
Kingsbridge,18,5
Marble Hill,28,2
Norwood,28,1


In [104]:
MCAS_counts['MC:AS'] = MCAS_counts['Medical Centers'].div(MCAS_counts['Alcohol Stores'])
MCAS_counts

Unnamed: 0,Medical Centers,Alcohol Stores,MC:AS
Co-op City,10,2,5.000000
Riverdale,11,1,11.000000
Kingsbridge,18,5,3.600000
Marble Hill,28,2,14.000000
Norwood,28,1,28.000000
...,...,...,...
Homecrest,22,6,3.666667
Madison,29,1,29.000000
Bronxdale,17,1,17.000000
Erasmus,14,8,1.750000


In [106]:
MCAS_counts.index = MCAS_counts.index.set_names(['Neighborhood'])
MCAS_counts.reset_index(inplace=True)
MCAS_counts.head()

Unnamed: 0,Neighborhood,Medical Centers,Alcohol Stores,MC:AS
0,Co-op City,10,2,5.0
1,Riverdale,11,1,11.0
2,Kingsbridge,18,5,3.6
3,Marble Hill,28,2,14.0
4,Norwood,28,1,28.0


### Clustering Neighborhoods

In [61]:
# one hot encoding
NYC_onehot = pd.get_dummies(NYC_venues_excluded[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
NYC_onehot['Neighborhood'] = NYC_venues_excluded['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [NYC_onehot.columns[-1]] + list(NYC_onehot.columns[:-1])
NYC_onehot = NYC_onehot[fixed_columns]

NYC_onehot.head()

Unnamed: 0,Neighborhood,Acupuncturist,Alternative Healer,Beer Store,Chiropractor,Dentist's Office,Doctor's Office,Emergency Room,Eye Doctor,Hospital,Liquor Store,Maternity Clinic,Medical Lab,Mental Health Office,Nutritionist,Physical Therapist,Rehab Center,Urgent Care Center,Wine Shop
3,Co-op City,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Co-op City,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0
5,Co-op City,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0
6,Co-op City,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0
7,Co-op City,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0


In [111]:
NYC_grouped = NYC_onehot.groupby('Neighborhood').mean().reset_index()
NYC_grouped.head()

Unnamed: 0,Neighborhood,Acupuncturist,Alternative Healer,Beer Store,Chiropractor,Dentist's Office,Doctor's Office,Emergency Room,Eye Doctor,Hospital,Liquor Store,Maternity Clinic,Medical Lab,Mental Health Office,Nutritionist,Physical Therapist,Rehab Center,Urgent Care Center,Wine Shop
0,Astoria,0.0,0.0,0.0,0.0,0.241379,0.413793,0.0,0.0,0.0,0.137931,0.0,0.068966,0.034483,0.0,0.0,0.0,0.034483,0.068966
1,Bath Beach,0.068966,0.0,0.0,0.0,0.206897,0.448276,0.0,0.0,0.068966,0.137931,0.0,0.034483,0.0,0.0,0.034483,0.0,0.0,0.0
2,Battery Park City,0.04,0.0,0.0,0.08,0.2,0.36,0.0,0.08,0.04,0.0,0.0,0.04,0.0,0.0,0.04,0.0,0.0,0.12
3,Bay Ridge,0.0,0.068966,0.0,0.068966,0.206897,0.413793,0.0,0.034483,0.034483,0.034483,0.0,0.0,0.0,0.0,0.068966,0.0,0.034483,0.034483
4,Bay Terrace,0.0,0.0,0.0,0.027778,0.25,0.555556,0.0,0.027778,0.027778,0.083333,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.0
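
Taking the mean of the one-hot rows within a neighborhood gives the share of that neighborhood's venues in each category, which is why every row of the grouped table sums to 1. The sketch below reproduces that arithmetic in plain Python for a hypothetical four-venue neighborhood; `category_shares` is an illustrative name, not a function from the notebook.

```python
from collections import Counter

def category_shares(categories):
    """Fraction of venues in each category, i.e. the mean of the one-hot rows."""
    counts = Counter(categories)
    total = len(categories)
    return {cat: n / total for cat, n in counts.items()}

# hypothetical neighborhood with four venues
shares = category_shares(["Doctor's Office", "Doctor's Office",
                          "Dentist's Office", "Liquor Store"])
print(shares)
```

Because these features are shares rather than counts, clustering on them groups neighborhoods by their mix of venue types, not by how many venues they have.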


In [112]:
# set number of clusters
kclusters = 3

NYC_grouped_clustering = NYC_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(NYC_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 0, 0, 0, 0, 2, 0, 1, 0], dtype=int32)

In [113]:
# add clustering labels
NYC_grouped.insert(0, 'Cluster Labels', kmeans.labels_)
NYC_grouped

Unnamed: 0,Cluster Labels,Neighborhood,Acupuncturist,Alternative Healer,Beer Store,Chiropractor,Dentist's Office,Doctor's Office,Emergency Room,Eye Doctor,Hospital,Liquor Store,Maternity Clinic,Medical Lab,Mental Health Office,Nutritionist,Physical Therapist,Rehab Center,Urgent Care Center,Wine Shop
0,0,Astoria,0.000000,0.000000,0.000000,0.000000,0.241379,0.413793,0.0,0.000000,0.000000,0.137931,0.0,0.068966,0.034483,0.0,0.000000,0.000000,0.034483,0.068966
1,0,Bath Beach,0.068966,0.000000,0.000000,0.000000,0.206897,0.448276,0.0,0.000000,0.068966,0.137931,0.0,0.034483,0.000000,0.0,0.034483,0.000000,0.000000,0.000000
2,0,Battery Park City,0.040000,0.000000,0.000000,0.080000,0.200000,0.360000,0.0,0.080000,0.040000,0.000000,0.0,0.040000,0.000000,0.0,0.040000,0.000000,0.000000,0.120000
3,0,Bay Ridge,0.000000,0.068966,0.000000,0.068966,0.206897,0.413793,0.0,0.034483,0.034483,0.034483,0.0,0.000000,0.000000,0.0,0.068966,0.000000,0.034483,0.034483
4,0,Bay Terrace,0.000000,0.000000,0.000000,0.027778,0.250000,0.555556,0.0,0.027778,0.027778,0.083333,0.0,0.027778,0.000000,0.0,0.000000,0.000000,0.000000,0.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
150,1,Windsor Terrace,0.000000,0.000000,0.090909,0.000000,0.090909,0.727273,0.0,0.000000,0.000000,0.000000,0.0,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.090909
151,2,Wingate,0.000000,0.083333,0.000000,0.000000,0.166667,0.250000,0.0,0.000000,0.250000,0.083333,0.0,0.000000,0.000000,0.0,0.083333,0.083333,0.000000,0.000000
152,0,Woodhaven,0.000000,0.000000,0.000000,0.000000,0.181818,0.545455,0.0,0.000000,0.000000,0.272727,0.0,0.000000,0.000000,0.0,0.000000,0.000000,0.000000,0.000000
153,0,Woodside,0.000000,0.000000,0.000000,0.034483,0.310345,0.413793,0.0,0.000000,0.000000,0.103448,0.0,0.000000,0.000000,0.0,0.068966,0.000000,0.000000,0.068966


### Merging the Data

In [124]:
NYC_merged = neighborhoods[~neighborhoods['Neighborhood'].isin(to_exclude)]
NYC_merged = NYC_merged.join(NYC_grouped.set_index('Neighborhood'), on='Neighborhood')
NYC_merged = NYC_merged.join(MCAS_counts.set_index('Neighborhood'), on='Neighborhood')
NYC_merged.dropna(inplace=True)
NYC_merged

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,Acupuncturist,Alternative Healer,Beer Store,Chiropractor,Dentist's Office,Doctor's Office,Emergency Room,Eye Doctor,Hospital,Liquor Store,Maternity Clinic,Medical Lab,Mental Health Office,Nutritionist,Physical Therapist,Rehab Center,Urgent Care Center,Wine Shop,Medical Centers,Alcohol Stores,MC:AS
1,Bronx,Co-op City,40.874294,-73.829939,0.0,0.000000,0.000000,0.0,0.000000,0.416667,0.333333,0.000000,0.000000,0.083333,0.166667,0.000000,0.000000,0.000000,0.0,0.000000,0.0,0.000000,0.000000,10.0,2.0,5.000000
4,Bronx,Riverdale,40.890834,-73.912585,0.0,0.000000,0.000000,0.0,0.000000,0.454545,0.272727,0.000000,0.090909,0.090909,0.000000,0.000000,0.000000,0.090909,0.0,0.000000,0.0,0.000000,0.000000,11.0,1.0,11.000000
5,Bronx,Kingsbridge,40.881687,-73.902818,0.0,0.000000,0.043478,0.0,0.000000,0.130435,0.434783,0.000000,0.000000,0.130435,0.217391,0.000000,0.000000,0.000000,0.0,0.043478,0.0,0.000000,0.000000,18.0,5.0,3.600000
6,Manhattan,Marble Hill,40.876551,-73.910660,0.0,0.000000,0.000000,0.0,0.000000,0.133333,0.466667,0.100000,0.000000,0.233333,0.033333,0.000000,0.000000,0.000000,0.0,0.000000,0.0,0.000000,0.033333,28.0,2.0,14.000000
8,Bronx,Norwood,40.877224,-73.879391,0.0,0.035714,0.000000,0.0,0.000000,0.178571,0.321429,0.142857,0.000000,0.285714,0.000000,0.000000,0.035714,0.000000,0.0,0.000000,0.0,0.000000,0.000000,28.0,1.0,28.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
289,Brooklyn,Homecrest,40.598525,-73.959185,0.0,0.035714,0.000000,0.0,0.000000,0.250000,0.464286,0.000000,0.000000,0.000000,0.178571,0.000000,0.000000,0.000000,0.0,0.035714,0.0,0.000000,0.035714,22.0,6.0,3.666667
296,Brooklyn,Madison,40.609378,-73.948415,0.0,0.000000,0.000000,0.0,0.034483,0.448276,0.413793,0.000000,0.000000,0.068966,0.000000,0.034483,0.000000,0.000000,0.0,0.000000,0.0,0.000000,0.000000,29.0,1.0,29.000000
297,Bronx,Bronxdale,40.852723,-73.861726,1.0,0.000000,0.058824,0.0,0.000000,0.117647,0.705882,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.058824,0.0,0.000000,0.0,0.058824,0.000000,17.0,1.0,17.000000
300,Brooklyn,Erasmus,40.646926,-73.948177,2.0,0.000000,0.000000,0.0,0.000000,0.363636,0.181818,0.000000,0.000000,0.045455,0.363636,0.000000,0.045455,0.000000,0.0,0.000000,0.0,0.000000,0.000000,14.0,8.0,1.750000


### Visualizations

In [138]:
# coloring by cluster

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(NYC_merged['Latitude'], 
                                  NYC_merged['Longitude'], 
                                  NYC_merged['Neighborhood'], 
                                  NYC_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_clusters)

In [139]:
# mapping by MC:AS

# create map
map_mcas = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme
maximum = NYC_merged['MC:AS'].max()
minimum = NYC_merged['MC:AS'].min()
linear = cm2.LinearColormap(['red', 'orange', 'yellow', 'green'],
                           vmin=minimum, vmax=maximum)

# add markers to the map
markers_colors = []
for lat, lon, poi, mcas in zip(NYC_merged['Latitude'], 
                                  NYC_merged['Longitude'], 
                                  NYC_merged['Neighborhood'], 
                                  NYC_merged['MC:AS']):
    label = folium.Popup(str(poi) + ' MC:AS ' + str(mcas), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=linear(mcas),
        fill=True,
        fill_color=linear(mcas),
        fill_opacity=0.7).add_to(map_mcas)

## Results

The MC:AS ratio ranged from 0.45 at the lowest to 30 at the highest, indicating large disparities in access to healthcare in NYC.

MC:AS tended to be lowest in Manhattan and highest in the outlying neighborhoods, so the ratio appears to fall roughly as population and building density rise.

In [134]:
map_mcas

Clustering the neighborhoods according to their medical center and alcohol store venues gave mixed results, with the clusters spread relatively evenly throughout the boroughs. There may be some correlation between clusters and MC:AS, though it is difficult to tell from visual inspection.

In [140]:
map_clusters
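
One way to go beyond visual inspection would be to summarize MC:AS within each cluster. The sketch below does this for the nine rows of `NYC_merged` displayed earlier (cluster label and MC:AS copied from that output). With so few rows it only illustrates the method and says nothing about the full data set.

```python
from collections import defaultdict
from statistics import median

# (cluster label, MC:AS) pairs copied from the displayed rows of NYC_merged
rows = [(0, 5.0), (0, 11.0), (0, 3.6), (0, 14.0), (0, 28.0),
        (0, 3.666667), (0, 29.0), (1, 17.0), (2, 1.75)]

# collect the MC:AS values belonging to each cluster
by_cluster = defaultdict(list)
for cluster, mcas in rows:
    by_cluster[cluster].append(mcas)

# number of neighborhoods and median MC:AS per cluster
summary = {c: (len(v), median(v)) for c, v in sorted(by_cluster.items())}
for cluster, (n, med) in summary.items():
    print('cluster', cluster, 'n =', n, 'median MC:AS =', med)
```

On the full dataframe, the same comparison could be made by grouping on `Cluster Labels` and describing the `MC:AS` column.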

## Discussion

There is a wide range of densities of medical centers in neighborhoods in New York City. The results presented above demonstrate clear spatial variability in those densities, both in terms of my MC:AS ratio and in terms of neighborhood clustering according to venue counts. The MC:AS ratio appears to anticorrelate somewhat with population and building density, though this does not explain the entirety of the disparities between neighborhoods.
<br>
These data should be examined by public health officials in NYC to determine which areas are being underserved by medical centers. Developers could be encouraged to open new medical center venues in these underserved areas.


## Conclusion

We know that ZIP code plays a huge role in a person's lifelong health, and easy access to healthcare surely influences that health. These data support the idea that healthcare in NYC is not evenly distributed, and can hopefully help improve access for those populations that need it the most.