# Capstone Project - The Battle of the Neighborhoods (Week 2)


By Carter Tu

# 1, Problem

In this project we look for an optimal location for an adult day care center in the borough of Queens, New York City. The center's main focus will be recreational activities and social stimulation for elderly people who would otherwise stay at home alone. The recreational activities would include daily exercise in a local park (tai chi, yoga, pilates, walking), while the social stimulation would consist of arts and crafts, music, games (bingo, Scrabble, etc.), and general conversation that helps participants form friendships. Exercise would move indoors only in bad weather. The center would have a nurse on site so that participants' vital signs can be checked and evaluated regularly, and it would fill prescriptions at a local pharmacy for participants who request that service. In addition, the center would provide healthy meals and snacks as well as transportation for participants.

Three location requirements follow from this. Because the center would run daily exercise sessions outdoors, we prefer locations as close to parks as possible. To avoid competition, we do not want to be near existing adult day care centers. To be able to fill prescriptions, we want to be near a pharmacy.

Specifically, since we want to be as close to parks as possible, our problem is to identify parks in Queens that satisfy the following two conditions:

1. No existing adult day care center lies within 1 km of the park's latitude and longitude coordinates.
2. At least one pharmacy lies within 2 km of the park's latitude and longitude coordinates.
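
The two distance conditions can be sketched as a small check based on the haversine (great-circle) distance. This is only an illustration under stated assumptions: the helper names `haversine_m` and `park_qualifies` are hypothetical, points are plain `(latitude, longitude)` tuples in degrees, and the Earth radius is the same constant that appears later in the notebook.

```python
import math

EARTH_RADIUS_M = 6372800  # Earth radius in meters (same constant as used later in the notebook)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def park_qualifies(park, care_centers, pharmacies,
                   care_radius_m=1000, pharmacy_radius_m=2000):
    """Return True if a park meets both conditions.

    park, and every element of care_centers and pharmacies, is a (lat, lon) tuple.
    """
    # Condition 1: every existing center is farther away than care_radius_m.
    no_center_nearby = all(haversine_m(*park, *c) > care_radius_m for c in care_centers)
    # Condition 2: at least one pharmacy is within pharmacy_radius_m.
    pharmacy_nearby = any(haversine_m(*park, *p) <= pharmacy_radius_m for p in pharmacies)
    return no_center_nearby and pharmacy_nearby

# Example with coordinates that appear in the tables of this notebook.
yellowstone_park = (40.726251, -73.847759)
walgreens = (40.724004, -73.847911)
unicare_center = (40.725856, -73.791441)
print(park_qualifies(yellowstone_park, [unicare_center], [walgreens]))
```

Treating each park as a single point is a simplification, since a large park extends well beyond its listed coordinates.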


# 2, Code

In [52]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes 
import folium # map rendering library

print('Libraries imported.')

Solving environment: done

# All requested packages already installed.

Solving environment: done

# All requested packages already installed.

Libraries imported.


In [53]:
# The geographical coordinates of Queens are 40.6524927, -73.7914214158161.
address = 'Queens, NY'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Queens are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Queens are 40.6524927, -73.7914214158161.


In [54]:
import io
url_ny = 'https://cocl.us/new_york_dataset'
s=requests.get(url_ny).content
ny_json_dat=json.load(io.StringIO(s.decode('utf-8')))

In [55]:
# ny_json_dat is a dictionary, so let's see how many keys it has
all_keys = ny_json_dat.keys()
print('There are', len(all_keys), 'keys in ny_json_dat:')
print(all_keys)

# display keys and their values (for the 'features' key we display only the first element)
for key in all_keys:
    if key == 'features':
        print(key,':',ny_json_dat[key][0])
    else:
        print(key,':',ny_json_dat[key])

There are 5 keys in ny_json_dat:
dict_keys(['type', 'totalFeatures', 'features', 'crs', 'bbox'])
type : FeatureCollection
totalFeatures : 306
features : {'type': 'Feature', 'id': 'nyu_2451_34572.1', 'geometry': {'type': 'Point', 'coordinates': [-73.84720052054902, 40.89470517661]}, 'geometry_name': 'geom', 'properties': {'name': 'Wakefield', 'stacked': 1, 'annoline1': 'Wakefield', 'annoline2': None, 'annoline3': None, 'annoangle': 0.0, 'borough': 'Bronx', 'bbox': [-73.84720052054902, 40.89470517661, -73.84720052054902, 40.89470517661]}}
crs : {'type': 'name', 'properties': {'name': 'urn:ogc:def:crs:EPSG::4326'}}
bbox : [-74.2492599487305, 40.5033187866211, -73.7061614990234, 40.9105606079102]


In [56]:
ny_neighborhoods_data = ny_json_dat['features']


In [57]:
# define the dataframe columns
column_names = ['Borough', 'Neighborhood', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighborhoods = pd.DataFrame(columns=column_names)

for data in ny_neighborhoods_data:
    borough = data['properties']['borough']
    neighborhood_name = data['properties']['name']
        
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]
    
    neighborhoods = neighborhoods.append({'Borough': borough,
                                          'Neighborhood': neighborhood_name,
                                          'Latitude': neighborhood_lat,
                                          'Longitude': neighborhood_lon}, ignore_index=True)
    
neighborhoods[['Borough','Neighborhood']].groupby('Borough').count()

Unnamed: 0_level_0,Neighborhood
Borough,Unnamed: 1_level_1
Bronx,52
Brooklyn,70
Manhattan,40
Queens,81
Staten Island,63
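
The loop above grows the dataframe with `DataFrame.append`, which was deprecated and then removed in pandas 2.0. A minimal alternative sketch, assuming the same GeoJSON feature layout shown earlier, collects plain dictionaries and builds the dataframe once. The function name `neighborhoods_from_features` and the one-element `sample_features` list are illustrative only.

```python
import pandas as pd

def neighborhoods_from_features(features):
    """Build the neighborhoods dataframe from a list of GeoJSON point features."""
    rows = []
    for feature in features:
        # GeoJSON stores point coordinates as [longitude, latitude].
        lon, lat = feature['geometry']['coordinates']
        rows.append({
            'Borough': feature['properties']['borough'],
            'Neighborhood': feature['properties']['name'],
            'Latitude': lat,
            'Longitude': lon,
        })
    return pd.DataFrame(rows, columns=['Borough', 'Neighborhood', 'Latitude', 'Longitude'])

# One feature copied from the dataset preview above, reduced to the fields that are used.
sample_features = [{
    'geometry': {'type': 'Point', 'coordinates': [-73.84720052054902, 40.89470517661]},
    'properties': {'name': 'Wakefield', 'borough': 'Bronx'},
}]
sample_df = neighborhoods_from_features(sample_features)
print(sample_df)
```

Building the frame in one call also avoids copying the whole dataframe on every iteration.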


In [58]:
queens_neighborhoods=neighborhoods[neighborhoods['Borough']=='Queens']
queens_neighborhoods.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
129,Queens,Astoria,40.768509,-73.915654
130,Queens,Woodside,40.746349,-73.901842
131,Queens,Jackson Heights,40.751981,-73.882821
132,Queens,Elmhurst,40.744049,-73.881656
133,Queens,Howard Beach,40.654225,-73.838138


In [59]:
CLIENT_ID = '0GPXYMSVP3OFHPOSHWOC3OHCC5CC4OSWJYY3TF3J5T0HO5PB' # your Foursquare ID
CLIENT_SECRET = 'MXZXRJBP34BEVNUV14HYX5ARVONOLYX1MQCWJNXOSAVRWN3Z' # your Foursquare Secret
VERSION = '20191019' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 0GPXYMSVP3OFHPOSHWOC3OHCC5CC4OSWJYY3TF3J5T0HO5PB
CLIENT_SECRET:MXZXRJBP34BEVNUV14HYX5ARVONOLYX1MQCWJNXOSAVRWN3Z
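
Hard-coding and printing API credentials makes them visible to anyone who sees the shared notebook. One common alternative, sketched here, reads them from environment variables and masks them when printing. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are assumptions made for this example, not part of the Foursquare API.

```python
import os

def load_foursquare_credentials(env=None):
    """Read the client id and secret from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    client_id = env.get('FOURSQUARE_CLIENT_ID')          # hypothetical variable name
    client_secret = env.get('FOURSQUARE_CLIENT_SECRET')  # hypothetical variable name
    if not client_id or not client_secret:
        raise RuntimeError('Foursquare credentials are not set in the environment')
    return client_id, client_secret

def mask(secret, visible=4):
    """Show only the first few characters of a secret."""
    return secret[:visible] + '*' * max(len(secret) - visible, 0)

# Demonstration with a plain dictionary standing in for the real environment.
demo_env = {'FOURSQUARE_CLIENT_ID': 'example-id', 'FOURSQUARE_CLIENT_SECRET': 'example-secret'}
demo_id, demo_secret = load_foursquare_credentials(demo_env)
print('CLIENT_ID:', mask(demo_id))
print('CLIENT_SECRET:', mask(demo_secret))
```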


In [60]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print("processing neighborhhod: ", name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
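
The function above indexes straight into `["response"]['groups'][0]['items']` and `categories[0]`, so one response with no groups or a venue with no category raises an exception and stops the whole loop. A more defensive parsing step could look like the sketch below. It assumes the same response layout that `getNearbyVenues` relies on; the helper name `venue_rows` and the hand-written `fake_payload` are illustrative and no network request is made.

```python
def venue_rows(name, lat, lng, payload):
    """Turn one explore-style response dictionary into a list of row tuples."""
    groups = payload.get('response', {}).get('groups', [])
    items = groups[0].get('items', []) if groups else []
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # Use None when a venue has no category instead of raising IndexError.
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows

# Hand-written sample in the layout the function above expects.
fake_payload = {'response': {'groups': [{'items': [
    {'venue': {'name': 'Rite Aid',
               'location': {'lat': 40.744682, 'lng': -73.903707},
               'categories': [{'name': 'Pharmacy'}]}},
    {'venue': {'name': 'Unnamed Spot',
               'location': {'lat': 40.745, 'lng': -73.904},
               'categories': []}},
]}]}}
rows = venue_rows('Woodside', 40.746349, -73.901842, fake_payload)
empty = venue_rows('Woodside', 40.746349, -73.901842, {'response': {}})
print(rows)
```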

In [61]:
LIMIT = 100
radius=500

queens_venues = getNearbyVenues(names=queens_neighborhoods['Neighborhood'],
                                   latitudes=queens_neighborhoods['Latitude'],
                                   longitudes=queens_neighborhoods['Longitude']
                                  )

processing neighborhhod:  Astoria
processing neighborhhod:  Woodside
processing neighborhhod:  Jackson Heights
processing neighborhhod:  Elmhurst
processing neighborhhod:  Howard Beach
processing neighborhhod:  Corona
processing neighborhhod:  Forest Hills
processing neighborhhod:  Kew Gardens
processing neighborhhod:  Richmond Hill
processing neighborhhod:  Flushing
processing neighborhhod:  Long Island City
processing neighborhhod:  Sunnyside
processing neighborhhod:  East Elmhurst
processing neighborhhod:  Maspeth
processing neighborhhod:  Ridgewood
processing neighborhhod:  Glendale
processing neighborhhod:  Rego Park
processing neighborhhod:  Woodhaven
processing neighborhhod:  Ozone Park
processing neighborhhod:  South Ozone Park
processing neighborhhod:  College Point
processing neighborhhod:  Whitestone
processing neighborhhod:  Bayside
processing neighborhhod:  Auburndale
processing neighborhhod:  Little Neck
processing neighborhhod:  Douglaston
processing neighborhhod:  Glen 

In [62]:
queens_pharmacies_and_parks = queens_venues[queens_venues['Venue Category'].isin(['Pharmacy','Park' ])].reset_index(drop=True)
queens_pharmacies_and_parks.shape

(77, 7)

In [63]:
queens_pharmacies_and_parks


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Woodside,40.746349,-73.901842,Rite Aid,40.744682,-73.903707,Pharmacy
1,Woodside,40.746349,-73.901842,Duane Reade,40.745298,-73.90419,Pharmacy
2,Jackson Heights,40.751981,-73.882821,Rite Aid,40.750023,-73.883977,Pharmacy
3,Jackson Heights,40.751981,-73.882821,Rite Aid,40.755766,-73.882127,Pharmacy
4,Howard Beach,40.654225,-73.838138,Rite Aid,40.656352,-73.839615,Pharmacy
5,Howard Beach,40.654225,-73.838138,Duane Reade,40.651612,-73.838626,Pharmacy
6,Corona,40.742382,-73.856825,William F. Moore Park ('Spaghetti Park'),40.743666,-73.855443,Park
7,Forest Hills,40.725264,-73.844475,Yellowstone Park,40.726251,-73.847759,Park
8,Forest Hills,40.725264,-73.844475,MacDonald Park,40.722239,-73.847141,Park
9,Forest Hills,40.725264,-73.844475,Walgreens,40.724004,-73.847911,Pharmacy


In [67]:
# The code was removed by Watson Studio for sharing.

In [68]:

from ibm_botocore.client import Config
import ibm_boto3
cos = ibm_boto3.client(service_name='s3',
    ibm_api_key_id=credentials['IBM_API_KEY_ID'],
    ibm_service_instance_id=credentials['IAM_SERVICE_ID'],
    ibm_auth_endpoint=credentials['IBM_AUTH_ENDPOINT'],
    config=Config(signature_version='oauth'),
    endpoint_url=credentials['ENDPOINT'])

In [69]:
from ibm_botocore.client import Config
import ibm_boto3
def download_file_cos(credentials,local_file_name,key):  
    cos = ibm_boto3.client(service_name='s3',
    ibm_api_key_id=credentials['IBM_API_KEY_ID'],
    ibm_service_instance_id=credentials['IAM_SERVICE_ID'],
    ibm_auth_endpoint=credentials['IBM_AUTH_ENDPOINT'],
    config=Config(signature_version='oauth'),
    endpoint_url=credentials['ENDPOINT'])
    try:
        res=cos.download_file(Bucket=credentials['BUCKET'],Key=key,Filename=local_file_name)
    except Exception as e:
        print(Exception, e)
    else:
        print('File Downloaded')
        
download_file_cos(credentials,'Social_Adult_Day_Care_Services.csv', 'Department_for_the_Aging__DFTA___-_Social_Adult_Day_Care_Services.csv')

File Downloaded


In [70]:
import pandas as pd
import csv
# Read the downloaded CSV file from the current working directory.
# read_csv also accepts options that control delimiters, rows, and column names.
adult_care_services_data = pd.read_csv("Social_Adult_Day_Care_Services.csv")
# Check the dimensions of the loaded data
adult_care_services_data.shape

(328, 34)

In [71]:
queens_adult_care_services_data=adult_care_services_data[adult_care_services_data["Borough"] == 'Queens']
queens_adult_care_services_data.shape


(121, 34)

In [72]:
queens_adult_care_services_data.reset_index(drop=True, inplace=True)
queens_adult_care_services_data.head()

Unnamed: 0,ProviderType,DFTA ID,ProgramName,SponsorName,ProgramAddress,ProgramCity,ProgramState,Postcode,Borough,ProgramPhone,DFTA Funded,MonHourOpen,MonHourClose,TueHourOpen,TueHourClose,WedHourOpen,WedHourClose,ThuHourOpen,ThuHourClose,FriHourOpen,FriHourClose,SatHourOpen,SatHourClose,SunHourOpen,SunHourClose,Latitude,Longitude,Community Board,Council District,Census Tract,BIN,BBL,NTA,Location 1
0,SOCIAL ADULT DAY CARE SERVICES,S39201,"QUEENS BOROUGH ADULT DAY CARE, LLC","Queens Borough Adult Day Care, Llc",137-08 31ST ROAD 1 FLOOR,FLUSHING,NY,11354,Queens,347-732-4588,N,08:30,06:30,08:30,06:30,08:30,06:30,08:30,06:30,08:30,06:30,08:30,06:30,00:00,00:00,40.769719,-73.83128,407,20,88901,4537661.0,4044108000.0,Flushing,"(40.769719, -73.83128)"
1,SOCIAL ADULT DAY CARE SERVICES,S4301,"UNICARE ADULT DAYCARE, INC.","Unicare Adult Daycare, Inc.",176-60 UNION TURNPIKE SUITE 115,FRESH MEADOWS,NY,11366,Queens,347-770-0466,N,09:30,03:30,09:30,03:30,09:30,03:30,09:30,03:30,09:30,03:30,00:00,00:00,00:00,00:00,40.725856,-73.791441,408,24,1277,4155422.0,4072270000.0,Jamaica Estates-Holliswood,"(40.725856, -73.791441)"
2,SOCIAL ADULT DAY CARE SERVICES,S37801,"EVERGREEN ADULT DAYCARE IN NY, INC.","Evergreen Adult Daycare In Ny, Inc.",37-10 149 PLACE 1A,FLUSHING,NY,11354,Queens,718-321-2112,N,07:00,05:00,07:00,05:00,07:00,05:00,07:00,05:00,07:00,05:00,07:00,01:00,00:00,00:00,40.765159,-73.816107,407,20,1157,4483670.0,4050178000.0,Murray Hill,"(40.765159, -73.816107)"
3,SOCIAL ADULT DAY CARE SERVICES,S52701,CAREFIRST SOCIAL DAY CARE INC.,Carefirst Social Day Care Inc.,135-10 35TH AVE UNIT A,FLUSHING,NY,11354,Queens,732-312-5713,N,09:00,05:00,09:00,05:00,09:00,05:00,09:00,05:00,09:00,05:00,09:00,05:00,09:00,05:00,40.764504,-73.831531,407,20,869,4112140.0,4049598000.0,Flushing,"(40.764504, -73.831531)"
4,SOCIAL ADULT DAY CARE SERVICES,S46401,CRYSTAL ADULT SOCIAL DAY CARE LLC,Crystal Adult Social Day Care Llc,138-31 QUEENS BLVD GROUND FLOOR,JAMAICA,NY,11435,Queens,718-642-0011,N,08:30,02:00,08:30,02:00,08:30,02:00,08:30,02:00,08:30,02:00,00:00,00:00,08:30,02:00,40.708216,-73.818028,408,24,214,4206512.0,4096490000.0,Briarwood-Jamaica Hills,"(40.708216, -73.818028)"


In [73]:
print("For queens_neighborhoods dataset shape is",queens_neighborhoods.shape, ", total number of null values is", queens_neighborhoods.isnull().sum().sum(), ", total number of NaN values is" , queens_neighborhoods.isna().sum().sum())


For queens_neighborhoods dataset shape is (81, 4) , total number of null values is 0 , total number of NaN values is 0


In [74]:
print("For queens_pharmacies_and_parks dataset shape is",queens_pharmacies_and_parks.shape, ", total number of null values is", queens_pharmacies_and_parks.isnull().sum().sum(), ", total number of NaN values is" , queens_pharmacies_and_parks.isna().sum().sum())


For queens_pharmacies_and_parks dataset shape is (77, 7) , total number of null values is 0 , total number of NaN values is 0


In [75]:
print("For queens_adult_care_services_data dataset shape is",queens_adult_care_services_data.shape, ", total number of null values is", queens_adult_care_services_data.isnull().sum().sum(), ", total number of NaN values is" , queens_adult_care_services_data.isna().sum().sum())


For queens_adult_care_services_data dataset shape is (121, 34) , total number of null values is 0 , total number of NaN values is 0


In [76]:
print("For queens_venues dataset shape is",queens_venues.shape, ", total number of null values is", queens_venues.isnull().sum().sum(), ", total number of NaN values is" , queens_venues.isna().sum().sum())



For queens_venues dataset shape is (2111, 7) , total number of null values is 0 , total number of NaN values is 0


In [77]:

queens_venues_stats = queens_venues[['Neighborhood','Venue Category']].groupby('Venue Category').count()
queens_venues_stats.rename(columns={'Neighborhood':'Cnt'}, inplace=True)
queens_venues_stats.sort_values(by="Cnt",ascending=False, inplace=True)
queens_venues_stats.head(30)

Unnamed: 0_level_0,Cnt
Venue Category,Unnamed: 1_level_1
Pizza Place,86
Deli / Bodega,69
Chinese Restaurant,65
Donut Shop,55
Bakery,51
Pharmacy,49
Bank,48
Bar,46
Grocery Store,40
Mexican Restaurant,40


In [78]:
queens_neighborhoods.describe()


Unnamed: 0,Latitude,Longitude
count,81.0,81.0
mean,40.706424,-73.824131
std,0.062189,0.061892
min,40.557401,-73.953868
25%,40.675211,-73.862525
50%,40.723825,-73.820878
75%,40.749441,-73.776133
max,40.792781,-73.708847


In [79]:
queens_pharmacies_and_parks.describe()


Unnamed: 0,Neighborhood Latitude,Neighborhood Longitude,Venue Latitude,Venue Longitude
count,77.0,77.0,77.0,77.0
mean,40.713831,-73.828292,40.71374,-73.827978
std,0.048719,0.060114,0.048203,0.060332
min,40.576156,-73.953868,40.578282,-73.953225
25%,40.689887,-73.85811,40.691591,-73.859863
50%,40.725264,-73.841534,40.726251,-73.840083
75%,40.745652,-73.776802,40.745298,-73.775963
max,40.784903,-73.715481,40.78813,-73.714218


In [80]:
# one hot encoding
queens_pharmacies_and_parks_onehot = pd.get_dummies(queens_pharmacies_and_parks[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
queens_pharmacies_and_parks_onehot['Neighborhood'] = queens_pharmacies_and_parks['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [queens_pharmacies_and_parks_onehot.columns[-1]] + list(queens_pharmacies_and_parks_onehot.columns[:-1])
queens_pharmacies_and_parks_onehot = queens_pharmacies_and_parks_onehot[fixed_columns]

queens_pharmacies_and_parks_onehot.head()

Unnamed: 0,Neighborhood,Park,Pharmacy
0,Woodside,0,1
1,Woodside,0,1
2,Jackson Heights,0,1
3,Jackson Heights,0,1
4,Howard Beach,0,1


In [81]:
queens_pharmacies_and_parks_grouped = queens_pharmacies_and_parks_onehot.groupby('Neighborhood').mean().reset_index()
queens_pharmacies_and_parks_grouped.head()

Unnamed: 0,Neighborhood,Park,Pharmacy
0,Auburndale,0.0,1.0
1,Bay Terrace,0.0,1.0
2,Bayside,0.0,1.0
3,Bayswater,1.0,0.0
4,Belle Harbor,0.0,1.0
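
Taking the mean of one-hot columns per neighborhood yields the share of each venue category in that neighborhood, which is what the clustering step receives as features. The sketch below reproduces the idea on a small hand-built frame (the Forest Hills and Woodside rows mirror the venue table above); `dtype=float` is passed so the result does not depend on the default dummy dtype of the installed pandas version.

```python
import pandas as pd

venues = pd.DataFrame({
    'Neighborhood': ['Forest Hills'] * 4 + ['Woodside'] * 2,
    'Venue Category': ['Park', 'Park', 'Pharmacy', 'Pharmacy', 'Pharmacy', 'Pharmacy'],
})

# One indicator column per category, then the neighborhood as the first column.
onehot = pd.get_dummies(venues['Venue Category'], dtype=float)
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

# The mean of the indicators is the fraction of venues in each category.
shares = onehot.groupby('Neighborhood').mean()
print(shares)
```

A neighborhood with only pharmacies therefore ends up at (Park, Pharmacy) = (0, 1), and one with an even mix at (0.5, 0.5).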


In [82]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 2

indicators = ['st', 'nd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = queens_pharmacies_and_parks_grouped['Neighborhood']

for ind in np.arange(queens_pharmacies_and_parks_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(queens_pharmacies_and_parks_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue
0,Auburndale,Pharmacy,Park
1,Bay Terrace,Pharmacy,Park
2,Bayside,Pharmacy,Park
3,Bayswater,Park,Pharmacy
4,Belle Harbor,Pharmacy,Park


In [83]:
Sum_of_squared_distances = []

queens_pharmacies_and_parks_grouped_clustering = queens_pharmacies_and_parks_grouped.drop('Neighborhood', axis=1)

K = range(1,25)
for k in K:
    km = KMeans(n_clusters=k)
    km = km.fit(queens_pharmacies_and_parks_grouped_clustering)
    Sum_of_squared_distances.append(km.inertia_)
    
Sum_of_squared_distances


[16.592171717171716,
 2.469298245614035,
 0.20679012345679015,
 0.009259259259259262,
 1.2029358432642675e-30,
 3.216303007126684e-31,
 3.2047474274603605e-31,
 3.2047474274603605e-31,
 3.2047474274603605e-31,
 3.2047474274603605e-31,
 3.2047474274603605e-31,
 3.2047474274603605e-31,
 3.2047474274603605e-31,
 3.2047474274603605e-31,
 3.2047474274603605e-31,
 3.2047474274603605e-31,
 3.2047474274603605e-31,
 3.2047474274603605e-31,
 3.2047474274603605e-31,
 3.2047474274603605e-31,
 3.2047474274603605e-31,
 3.2047474274603605e-31,
 3.2047474274603605e-31,
 3.2047474274603605e-31]

The plot below shows the sum of squared distances for each k in the range specified above. The curve resembles a bent arm, and the elbow of that arm, where the curve stops dropping steeply, indicates a good choice of k.

In [85]:

import matplotlib.pyplot as plt

plt.plot(K, Sum_of_squared_distances, 'bx-')
plt.xlabel('k')
plt.ylabel('Sum_of_squared_distances')
plt.title('Elbow Method For Optimal k')

plt.show()

<Figure size 640x480 with 1 Axes>
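
To read the elbow numerically as well as visually, one option is to express each inertia value as the fraction of the k=1 inertia that has been removed. The sketch below uses the first four values from the output above; the function name `inertia_reduction` is illustrative, and this is only one of several possible ways to judge an elbow.

```python
# First four sums of squared distances from the output above, keyed by k.
inertia_by_k = {
    1: 16.592171717171716,
    2: 2.469298245614035,
    3: 0.20679012345679015,
    4: 0.009259259259259262,
}

def inertia_reduction(inertia):
    """Fraction of the k=1 inertia removed at each k."""
    base = inertia[1]
    return {k: 1 - value / base for k, value in inertia.items()}

reduction = inertia_reduction(inertia_by_k)
for k in sorted(reduction):
    print('k={}: {:.2%} of the k=1 inertia removed'.format(k, reduction[k]))
```

Beyond the first few values of k the remaining inertia is already a small fraction of the starting value, so further clusters add little.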

In [86]:
# set number of clusters
kclusters = 3

queens_pharmacies_and_parks_grouped_clustering = queens_pharmacies_and_parks_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(queens_pharmacies_and_parks_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:20]

array([0, 0, 0, 1, 0, 0, 2, 1, 0, 1, 0, 2, 2, 0, 2, 1, 0, 1, 0, 0],
      dtype=int32)

In [87]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

queens_pharmacies_and_parks_merged = queens_pharmacies_and_parks

# join the cluster labels and most common venues onto each venue row, matching on neighborhood
queens_pharmacies_and_parks_merged = queens_pharmacies_and_parks_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

queens_pharmacies_and_parks_merged.head() # check the 8th column for Cluster Labels!

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,Cluster Labels,1st Most Common Venue,2nd Most Common Venue
0,Woodside,40.746349,-73.901842,Rite Aid,40.744682,-73.903707,Pharmacy,0,Pharmacy,Park
1,Woodside,40.746349,-73.901842,Duane Reade,40.745298,-73.90419,Pharmacy,0,Pharmacy,Park
2,Jackson Heights,40.751981,-73.882821,Rite Aid,40.750023,-73.883977,Pharmacy,0,Pharmacy,Park
3,Jackson Heights,40.751981,-73.882821,Rite Aid,40.755766,-73.882127,Pharmacy,0,Pharmacy,Park
4,Howard Beach,40.654225,-73.838138,Rite Aid,40.656352,-73.839615,Pharmacy,0,Pharmacy,Park


In [88]:
# create map
f = folium.Figure(width=1000, height=500)
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12).add_to(f)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(queens_pharmacies_and_parks_merged['Venue Latitude'], queens_pharmacies_and_parks_merged['Venue Longitude'], queens_pharmacies_and_parks_merged['Venue'], queens_pharmacies_and_parks_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [89]:
queens_pharmacies_and_parks_merged.loc[queens_pharmacies_and_parks_merged['Cluster Labels'] == 0]


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,Cluster Labels,1st Most Common Venue,2nd Most Common Venue
0,Woodside,40.746349,-73.901842,Rite Aid,40.744682,-73.903707,Pharmacy,0,Pharmacy,Park
1,Woodside,40.746349,-73.901842,Duane Reade,40.745298,-73.90419,Pharmacy,0,Pharmacy,Park
2,Jackson Heights,40.751981,-73.882821,Rite Aid,40.750023,-73.883977,Pharmacy,0,Pharmacy,Park
3,Jackson Heights,40.751981,-73.882821,Rite Aid,40.755766,-73.882127,Pharmacy,0,Pharmacy,Park
4,Howard Beach,40.654225,-73.838138,Rite Aid,40.656352,-73.839615,Pharmacy,0,Pharmacy,Park
5,Howard Beach,40.654225,-73.838138,Duane Reade,40.651612,-73.838626,Pharmacy,0,Pharmacy,Park
11,Kew Gardens,40.705179,-73.829819,CVS pharmacy,40.703557,-73.824861,Pharmacy,0,Pharmacy,Park
16,Ridgewood,40.708323,-73.901435,Rite Aid,40.708905,-73.905848,Pharmacy,0,Pharmacy,Park
24,Ozone Park,40.680708,-73.843203,Rite Aid,40.680039,-73.84219,Pharmacy,0,Pharmacy,Park
25,Ozone Park,40.680708,-73.843203,CVS pharmacy,40.681177,-73.84173,Pharmacy,0,Pharmacy,Park


In [90]:
queens_pharmacies_and_parks_merged.loc[queens_pharmacies_and_parks_merged['Cluster Labels'] == 1]


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,Cluster Labels,1st Most Common Venue,2nd Most Common Venue
6,Corona,40.742382,-73.856825,William F. Moore Park ('Spaghetti Park'),40.743666,-73.855443,Park,1,Park,Pharmacy
27,South Ozone Park,40.66855,-73.809865,Back Streets Park (Officer Edward Byrn Park),40.667846,-73.806453,Park,1,Park,Pharmacy
28,South Ozone Park,40.66855,-73.809865,Pals Oval Park,40.668634,-73.805878,Park,1,Park,Pharmacy
29,South Ozone Park,40.66855,-73.809865,Back Street Park,40.666542,-73.806407,Park,1,Park,Pharmacy
46,Hollis,40.711243,-73.75925,Kings Park,40.712344,-73.764469,Park,1,Park,Pharmacy
47,Hollis,40.711243,-73.75925,Jamaica Park,40.712351,-73.764478,Park,1,Park,Pharmacy
50,Springfield Gardens,40.66623,-73.760421,Springfield Park,40.665932,-73.758064,Park,1,Park,Pharmacy
56,Edgemere,40.595642,-73.776133,Bayswater Park,40.596248,-73.77097,Park,1,Park,Pharmacy
57,Queensboro Hill,40.744572,-73.825809,Mateos Park,40.742153,-73.827568,Park,1,Park,Pharmacy
58,Laurelton,40.667884,-73.740256,Laurelton Park,40.670598,-73.7359,Park,1,Park,Pharmacy


In [91]:
queens_pharmacies_and_parks_merged.loc[queens_pharmacies_and_parks_merged['Cluster Labels'] == 2]


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category,Cluster Labels,1st Most Common Venue,2nd Most Common Venue
7,Forest Hills,40.725264,-73.844475,Yellowstone Park,40.726251,-73.847759,Park,2,Pharmacy,Park
8,Forest Hills,40.725264,-73.844475,MacDonald Park,40.722239,-73.847141,Park,2,Pharmacy,Park
9,Forest Hills,40.725264,-73.844475,Walgreens,40.724004,-73.847911,Pharmacy,2,Pharmacy,Park
10,Forest Hills,40.725264,-73.844475,CVS pharmacy,40.721396,-73.843421,Pharmacy,2,Pharmacy,Park
12,Sunnyside,40.740176,-73.926916,"Thomas P. Noonan, Jr. Playground",40.741053,-73.922213,Park,2,Pharmacy,Park
13,Sunnyside,40.740176,-73.926916,Sunny Pharmacy,40.740266,-73.922839,Pharmacy,2,Pharmacy,Park
14,Maspeth,40.725427,-73.896217,CVS pharmacy,40.727184,-73.892861,Pharmacy,2,Pharmacy,Park
15,Maspeth,40.725427,-73.896217,Whitefish Triangle Park,40.726517,-73.901752,Park,2,Pharmacy,Park
17,Rego Park,40.728974,-73.857827,CVS pharmacy,40.730898,-73.860729,Pharmacy,2,Pharmacy,Park
18,Rego Park,40.728974,-73.857827,CVS pharmacy,40.726791,-73.853772,Pharmacy,2,Pharmacy,Park


In [92]:
queens_pharmacies_and_parks_neigh_list = sorted(queens_pharmacies_and_parks_merged["Neighborhood"].unique().tolist())
print("queens_pharmacies_and_parks_neigh_list:",queens_pharmacies_and_parks_neigh_list)


queens_pharmacies_and_parks_neigh_list: ['Auburndale', 'Bay Terrace', 'Bayside', 'Bayswater', 'Belle Harbor', 'Cambria Heights', 'College Point', 'Corona', 'Douglaston', 'Edgemere', 'Far Rockaway', 'Forest Hills', 'Forest Hills Gardens', 'Fresh Meadows', 'Glen Oaks', 'Hollis', 'Howard Beach', 'Hunters Point', 'Jackson Heights', 'Jamaica Center', 'Jamaica Hills', 'Kew Gardens', 'Laurelton', 'Lefrak City', 'Maspeth', 'Middle Village', 'Oakland Gardens', 'Ozone Park', 'Pomonok', 'Queensboro Hill', 'Queensbridge', 'Rego Park', 'Ridgewood', 'Rochdale', 'Rockaway Park', 'Rosedale', 'Somerville', 'South Ozone Park', 'Springfield Gardens', 'Steinway', 'Sunnyside', 'Sunnyside Gardens', 'Woodhaven', 'Woodside']


In [93]:

# NTA column contains Neighborhood information
queens_adult_care_serv_neigh_list = sorted(queens_adult_care_services_data["NTA"].unique().tolist())
print("queens_adult_care_serv_neigh_list:",queens_adult_care_serv_neigh_list)

queens_adult_care_serv_neigh_list: ['Bayside-Bayside Hills', 'Bellerose', 'Breezy Point-Belle Harbor-Rockaway Park-Broad Channel', 'Briarwood-Jamaica Hills', 'College Point', 'Corona', 'East Elmhurst', 'East Flushing', 'Elmhurst', 'Elmhurst-Maspeth', 'Far Rockaway-Bayswater', 'Flushing', 'Forest Hills', 'Ft. Totten-Bay Terrace-Clearview', 'Glen Oaks-Floral Park-New Hyde Park', 'Hammels-Arverne-Edgemere', 'Hollis', 'Hunters Point-Sunnyside-West Maspeth', 'Jamaica', 'Jamaica Estates-Holliswood', 'Kew Gardens', 'Murray Hill', 'North Corona', 'Pomonok-Flushing Heights-Hillcrest', 'Queens Village', 'Queensboro Hill', 'Rego Park', 'Richmond Hill', 'Ridgewood', 'South Ozone Park', 'St. Albans', 'Woodhaven']


In [94]:
# take an explicit copy so that adding a column does not trigger SettingWithCopyWarning
queens_adult_care_services_massaged = queens_adult_care_services_data[["ProgramName","Latitude","Longitude","NTA"]].copy()
# keep only the first component of hyphenated NTA names as the neighborhood
queens_adult_care_services_massaged["Neighborhood"]= queens_adult_care_services_massaged["NTA"].str.split("-", n=0, expand = True)[0]
queens_adult_care_services_massaged.head()

Unnamed: 0,ProgramName,Latitude,Longitude,NTA,Neighborhood
0,"QUEENS BOROUGH ADULT DAY CARE, LLC",40.769719,-73.83128,Flushing,Flushing
1,"UNICARE ADULT DAYCARE, INC.",40.725856,-73.791441,Jamaica Estates-Holliswood,Jamaica Estates
2,"EVERGREEN ADULT DAYCARE IN NY, INC.",40.765159,-73.816107,Murray Hill,Murray Hill
3,CAREFIRST SOCIAL DAY CARE INC.,40.764504,-73.831531,Flushing,Flushing
4,CRYSTAL ADULT SOCIAL DAY CARE LLC,40.708216,-73.818028,Briarwood-Jamaica Hills,Briarwood
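
The neighborhood name is taken as the first component of a hyphenated NTA label. The sketch below shows that step on a small hand-built frame, with two choices that are assumptions of this example rather than the notebook's code: `.copy()` makes the column selection an independent frame so the assignment cannot affect, or warn about, the original, and `n=1` stops splitting after the first hyphen. The program names are placeholders.

```python
import pandas as pd

care = pd.DataFrame({
    'ProgramName': ['Program A', 'Program B', 'Program C'],  # placeholder names
    'NTA': ['Flushing', 'Jamaica Estates-Holliswood', 'Briarwood-Jamaica Hills'],
})

# Work on an explicit copy of the selected columns.
subset = care[['ProgramName', 'NTA']].copy()

# Split once on the first hyphen and keep the leading part.
subset['Neighborhood'] = subset['NTA'].str.split('-', n=1).str[0]
print(subset)
```

Keeping only the first component is lossy: a center in "Far Rockaway-Bayswater" is matched to Far Rockaway and never to Bayswater, which affects the later merge on neighborhood names.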


In [95]:
queens_adult_care_serv_neigh_list = sorted(queens_adult_care_services_massaged["Neighborhood"].unique().tolist())
s = set(queens_pharmacies_and_parks_neigh_list)
queens_adult_care_serv_neigh_not_in_ph_and_park_list = [x for x in queens_adult_care_serv_neigh_list if x not in s]
print("Not in queens_pharmacies_and_parks_neigh_list:", queens_adult_care_serv_neigh_not_in_ph_and_park_list)

Not in queens_pharmacies_and_parks_neigh_list: ['Bellerose', 'Breezy Point', 'Briarwood', 'East Elmhurst', 'East Flushing', 'Elmhurst', 'Flushing', 'Ft. Totten', 'Hammels', 'Jamaica', 'Jamaica Estates', 'Murray Hill', 'North Corona', 'Queens Village', 'Richmond Hill', 'St. Albans']


In [96]:
# take an explicit copy so that adding a column does not trigger SettingWithCopyWarning
queens_neigh_info = queens_pharmacies_and_parks_merged[["Neighborhood","Neighborhood Latitude","Neighborhood Longitude", "Cluster Labels"]].copy()
queens_neigh_info["Neighborhood Check"]=queens_neigh_info["Neighborhood"]
adult_care_serv_in_pharm_and_parks=pd.merge(queens_adult_care_services_massaged, queens_neigh_info, how='inner', on='Neighborhood',suffixes=('_1', '_2'))
adult_care_serv_in_pharm_and_parks.drop_duplicates(inplace=True)
adult_care_serv_in_pharm_and_parks.reset_index(drop=True, inplace=True)
adult_care_serv_in_pharm_and_parks.shape

(35, 9)

In [97]:
adult_care_serv_in_pharm_and_parks


Unnamed: 0,ProgramName,Latitude,Longitude,NTA,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Cluster Labels,Neighborhood Check
0,RICHMOND HILL / SOUTH OZONE PARK SENIOR CENTER,40.604966,-73.752186,Far Rockaway-Bayswater,Far Rockaway,40.603134,-73.75498,0,Far Rockaway
1,VIMINI LLC,40.595193,-73.755398,Far Rockaway-Bayswater,Far Rockaway,40.603134,-73.75498,0,Far Rockaway
2,VIMINI LLC,40.602445,-73.750312,Far Rockaway-Bayswater,Far Rockaway,40.603134,-73.75498,0,Far Rockaway
3,"TRI-MED SOCIAL ADULT DAY SERVICES, INC.",40.604966,-73.752186,Far Rockaway-Bayswater,Far Rockaway,40.603134,-73.75498,0,Far Rockaway
4,ROCKAWAY ADULT SOCIAL CENTER LLC,40.602293,-73.751101,Far Rockaway-Bayswater,Far Rockaway,40.603134,-73.75498,0,Far Rockaway
5,SAFER ADULT DAY CARE INC.,40.59561,-73.743874,Far Rockaway-Bayswater,Far Rockaway,40.603134,-73.75498,0,Far Rockaway
6,BLESSING SOCIAL DAY CARE INC.,40.729822,-73.807178,Pomonok-Flushing Heights-Hillcrest,Pomonok,40.734936,-73.804861,1,Pomonok
7,BOULEVARD ADULT DAY CARE OF FLUSHING,40.727863,-73.810824,Pomonok-Flushing Heights-Hillcrest,Pomonok,40.734936,-73.804861,1,Pomonok
8,"HESTIA ADULT SOCIAL DAY CARE CENTER, INC.",40.73042,-73.806862,Pomonok-Flushing Heights-Hillcrest,Pomonok,40.734936,-73.804861,1,Pomonok
9,EMPIRE ADULT DAY CARE CENTER INC.,40.728491,-73.809025,Pomonok-Flushing Heights-Hillcrest,Pomonok,40.734936,-73.804861,1,Pomonok


In [98]:
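# number of adult day care programs per cluster (pd.Series.value_counts replaces the deprecated top-level pd.value_counts)
adult_care_serv_in_pharm_and_parks[['Cluster Labels']].apply(pd.Series.value_counts)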

Unnamed: 0,Cluster Labels
1,14
2,11
0,10


In [99]:
# create map
f = folium.Figure(width=1000, height=500)
map_adult_services = folium.Map(location=[latitude, longitude], zoom_start=11).add_to(f)

# add one marker per adult day care program
for lat, lon, pgn in zip(adult_care_serv_in_pharm_and_parks['Latitude'],
                         adult_care_serv_in_pharm_and_parks['Longitude'],
                         adult_care_serv_in_pharm_and_parks['ProgramName']):
    label = folium.Popup(' ProgramName ' + str(pgn), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color='black',
        fill=True,
        fill_color='yellow',
        fill_opacity=0.7).add_to(map_adult_services)

map_adult_services

In [100]:
import math

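def haversine(coord1, coord2):
    """Return the great-circle distance in meters between two (latitude, longitude) pairs given in degrees."""
    R = 6372800  # Earth radius in meters
    lat1, lon1 = coord1
    lat2, lon2 = coord2

    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi       = math.radians(lat2 - lat1)
    dlambda    = math.radians(lon2 - lon1)

    # haversine term: a = sin^2(dphi/2) + cos(phi1)*cos(phi2)*sin^2(dlambda/2)
    a = math.sin(dphi/2)**2 + \
        math.cos(phi1)*math.cos(phi2)*math.sin(dlambda/2)**2

    # the central angle is 2*atan2(sqrt(a), sqrt(1 - a)); multiplying by R gives meters
    return 2*R*math.atan2(math.sqrt(a), math.sqrt(1 - a))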
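
The nested loops in the next cell call haversine once per (park, venue) pair. As a minimal sketch of an alternative, assuming NumPy is available, the same formula can be evaluated for all pairs at once with broadcasting. The function name haversine_matrix and the constant EARTH_RADIUS_M are hypothetical and not part of the original notebook; the sample coordinates are two parks and one pharmacy taken from this notebook's park-to-pharmacy distance table.

```python
import numpy as np

EARTH_RADIUS_M = 6372800  # same Earth radius (in meters) as the haversine function above


def haversine_matrix(lat_a, lon_a, lat_b, lon_b):
    """Pairwise great-circle distances in meters.

    result[i, j] is the distance from point i of set A to point j of set B.
    All inputs are sequences of coordinates in degrees.
    """
    # Column vectors for set A and row vectors for set B, so that
    # arithmetic between them broadcasts to a len(A) x len(B) matrix.
    phi_a = np.radians(np.asarray(lat_a, dtype=float))[:, None]
    lam_a = np.radians(np.asarray(lon_a, dtype=float))[:, None]
    phi_b = np.radians(np.asarray(lat_b, dtype=float))[None, :]
    lam_b = np.radians(np.asarray(lon_b, dtype=float))[None, :]

    dphi = phi_b - phi_a
    dlambda = lam_b - lam_a

    a = np.sin(dphi / 2) ** 2 + np.cos(phi_a) * np.cos(phi_b) * np.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * np.arctan2(np.sqrt(a), np.sqrt(1 - a))


# Two parks and one pharmacy (Vista Pharmacy & Surgical) from the notebook's tables
park_lat = [40.596248, 40.614236]
park_lon = [-73.770970, -73.761475]
pharm_lat = [40.600532]
pharm_lon = [-73.754032]

dist = haversine_matrix(park_lat, park_lon, pharm_lat, pharm_lon)
print(dist.round(1))

# Number of pharmacies within 2 km of each park (one count per row of the matrix)
pharmacies_within_2km = (dist <= 2000).sum(axis=1)
print(pharmacies_within_2km)
```

The two distances should agree with the values the notebook reports for these pairs (about 1508 m and 1649 m), and the per-park counts come from a single boolean comparison instead of a filtered DataFrame.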

In [101]:
queens_parks = queens_pharmacies_and_parks_merged[queens_pharmacies_and_parks_merged["Venue Category"]=="Park"]
queens_pharmacies = queens_pharmacies_and_parks_merged[queens_pharmacies_and_parks_merged["Venue Category"]=="Pharmacy"]

# Collect one row per (park, pharmacy) pair in a list and build the DataFrame once;
# DataFrame.append was removed in pandas 2.0 and is slow inside a loop
pharm_rows = []
for lat1, lon1 in zip(queens_parks["Venue Latitude"], queens_parks["Venue Longitude"]):
    for lat2, lon2, ven in zip(queens_pharmacies["Venue Latitude"], queens_pharmacies["Venue Longitude"], queens_pharmacies["Venue"]):
        dist = haversine((lat1, lon1), (lat2, lon2))  # distance in meters
        pharm_rows.append({'Park Latitude': lat1, 'Park Longitude': lon1, 'Venue Category': 'Pharmacy', 'Venue Latitude': lat2, 'Venue Longitude': lon2, 'Distance': dist, 'Venue': ven})

park_pharm_serv_distance = pd.DataFrame(pharm_rows, columns=['Park Latitude', 'Park Longitude', 'Venue Category', 'Venue Latitude', 'Venue Longitude', 'Distance', 'Venue'])

park_pharm_serv_distance.sort_values(by=["Park Latitude", "Park Longitude", "Distance"], inplace=True)

#limit to pharmacies within 2 km from a park
park_pharm_serv_distance_2km_or_less = park_pharm_serv_distance[park_pharm_serv_distance["Distance"] <= 2000]
park_pharm_serv_distance_2km_or_less.reset_index(drop=True, inplace=True)

# Same pattern for adult day care services: one row per (park, service) pair, DataFrame built once
care_rows = []
for lat1, lon1 in zip(queens_parks["Venue Latitude"], queens_parks["Venue Longitude"]):
    for lat2, lon2, ven in zip(adult_care_serv_in_pharm_and_parks["Latitude"], adult_care_serv_in_pharm_and_parks["Longitude"], adult_care_serv_in_pharm_and_parks["ProgramName"]):
        dist = haversine((lat1, lon1), (lat2, lon2))  # distance in meters
        care_rows.append({'Park Latitude': lat1, 'Park Longitude': lon1, 'Venue Category': 'AdultCareService', 'Venue Latitude': lat2, 'Venue Longitude': lon2, 'Distance': dist, 'Venue': ven})

park_adult_care_serv_distance = pd.DataFrame(care_rows, columns=['Park Latitude', 'Park Longitude', 'Venue Category', 'Venue Latitude', 'Venue Longitude', 'Distance', 'Venue'])
        
park_adult_care_serv_distance.sort_values(by=["Park Latitude", "Park Longitude", "Distance"], ascending=True, inplace=True)   

#limit to adult care services within 1 km from a park
park_adult_care_serv_distance_1km_or_less = park_adult_care_serv_distance[park_adult_care_serv_distance["Distance"] <= 1000]
park_adult_care_serv_distance_1km_or_less.reset_index(drop=True, inplace=True)

In [102]:
park_pharm_serv_distance_2km_or_less.head()


Unnamed: 0,Park Latitude,Park Longitude,Venue Category,Venue Latitude,Venue Longitude,Distance,Venue
0,40.596248,-73.77097,Pharmacy,40.600532,-73.754032,1507.769812,Vista Pharmacy & Surgical
1,40.614236,-73.761475,Pharmacy,40.600532,-73.754032,1648.750837,Vista Pharmacy & Surgical
2,40.665932,-73.758064,Pharmacy,40.673555,-73.77072,1363.404179,Variety Drugs
3,40.665932,-73.758064,Pharmacy,40.670883,-73.773941,1448.226853,GBB Wellness Pharmacy
4,40.665932,-73.758064,Pharmacy,40.660851,-73.739247,1685.246699,Walgreens


In [103]:
park_adult_care_serv_distance_1km_or_less.head()


Unnamed: 0,Park Latitude,Park Longitude,Venue Category,Venue Latitude,Venue Longitude,Distance,Venue
0,40.691591,-73.853016,AdultCareService,40.694529,-73.849064,466.784688,WOODHAVEN ADULT DAYCARE INC
1,40.691591,-73.853016,AdultCareService,40.693713,-73.85925,576.299961,WOODHAVEN LIFESTYLE SENIOR CENTER INC.
2,40.691591,-73.853016,AdultCareService,40.6892,-73.861477,761.507232,DIVERSIFIED SOCIAL ADULT DAY CARE INC.
3,40.691591,-73.853016,AdultCareService,40.683768,-73.85688,929.118693,"DESHI SENIOR CENTER, LLC"
4,40.691645,-73.853378,AdultCareService,40.694529,-73.849064,485.071206,WOODHAVEN ADULT DAYCARE INC


In [104]:

park_adult_care_serv_distance_1km_or_less_stats=pd.DataFrame(park_adult_care_serv_distance_1km_or_less[['Park Latitude', 'Park Longitude']])
park_adult_care_serv_distance_1km_or_less_stats['Cnt']=1
pacsd_1km_or_less_stats = park_adult_care_serv_distance_1km_or_less_stats.groupby(['Park Latitude', 'Park Longitude']).count()
pacsd_1km_or_less_stats.sort_values(by="Cnt", ascending=False, inplace=True)
pacsd_1km_or_less_stats.reset_index(drop=False, inplace=True)
pacsd_1km_or_less_stats

Unnamed: 0,Park Latitude,Park Longitude,Cnt
0,40.733952,-73.808854,5
1,40.691591,-73.853016,4
2,40.691645,-73.853378,4
3,40.726679,-73.862636,3
4,40.743666,-73.855443,3
5,40.741053,-73.922213,2
6,40.747019,-73.921128,2
7,40.712344,-73.764469,1
8,40.712351,-73.764478,1
9,40.749273,-73.714957,1


In [105]:
park_pharm_serv_distance_2km_or_less_stats=pd.DataFrame(park_pharm_serv_distance_2km_or_less[['Park Latitude', 'Park Longitude']])
park_pharm_serv_distance_2km_or_less_stats['Cnt']=1
pphsd_2km_or_less_stats=park_pharm_serv_distance_2km_or_less_stats.groupby(['Park Latitude', 'Park Longitude']).count()
pphsd_2km_or_less_stats.sort_values(by="Cnt", ascending=False, inplace=True)
pphsd_2km_or_less_stats.reset_index(drop=False, inplace=True)
pphsd_2km_or_less_stats

Unnamed: 0,Park Latitude,Park Longitude,Cnt
0,40.726251,-73.847759,6
1,40.741053,-73.922213,6
2,40.747019,-73.921128,6
3,40.691591,-73.853016,5
4,40.691645,-73.853378,5
5,40.716422,-73.840083,5
6,40.722239,-73.847141,5
7,40.726679,-73.862636,5
8,40.665932,-73.758064,3
9,40.749273,-73.714957,3


In [106]:

# .copy() plus a non-inplace rename avoids SettingWithCopyWarning on the slice of queens_parks
queens_final_parks_pharm_adult_serv_info = queens_parks[['Neighborhood', 'Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category', 'Cluster Labels']].copy()
queens_final_parks_pharm_adult_serv_info = queens_final_parks_pharm_adult_serv_info.rename(columns={'Venue Latitude':'Park Latitude', 'Venue Longitude':'Park Longitude'})
# attach the number of adult care services within 1 km of each park
queens_final_parks_pharm_adult_serv_info1=pd.merge(queens_final_parks_pharm_adult_serv_info, pacsd_1km_or_less_stats, how='left', on=['Park Latitude', 'Park Longitude'])
queens_final_parks_pharm_adult_serv_info1.rename(columns={'Cnt':'AdultService Count'}, inplace=True)
# attach the number of pharmacies within 2 km of each park
queens_final_parks_pharm_adult_serv_info2=pd.merge(queens_final_parks_pharm_adult_serv_info1, pphsd_2km_or_less_stats, how='left', on=['Park Latitude', 'Park Longitude'])
queens_final_parks_pharm_adult_serv_info2.rename(columns={'Cnt':'Pharmacy Count'}, inplace=True)
# parks with no match in the left joins have no venue in range, so their counts are 0
values = {'AdultService Count': 0, 'Pharmacy Count': 0}
queens_final_parks_pharm_adult_serv_info2.fillna(value=values, inplace=True)
queens_final_parks_pharm_adult_serv_info=queens_final_parks_pharm_adult_serv_info2.astype({"AdultService Count": int, "Pharmacy Count": int})
queens_final_parks_pharm_adult_serv_info

Unnamed: 0,Neighborhood,Venue,Park Latitude,Park Longitude,Venue Category,Cluster Labels,AdultService Count,Pharmacy Count
0,Corona,William F. Moore Park ('Spaghetti Park'),40.743666,-73.855443,Park,1,3,3
1,Forest Hills,Yellowstone Park,40.726251,-73.847759,Park,2,0,6
2,Forest Hills,MacDonald Park,40.722239,-73.847141,Park,2,0,5
3,Sunnyside,"Thomas P. Noonan, Jr. Playground",40.741053,-73.922213,Park,2,2,6
4,Maspeth,Whitefish Triangle Park,40.726517,-73.901752,Park,2,0,2
5,Rego Park,Fleetwood Triangle,40.726679,-73.862636,Park,2,3,5
6,Woodhaven,Equity Park,40.691645,-73.853378,Park,2,4,5
7,Woodhaven,60 Park,40.691591,-73.853016,Park,2,4,5
8,South Ozone Park,Back Streets Park (Officer Edward Byrn Park),40.667846,-73.806453,Park,1,0,0
9,South Ozone Park,Pals Oval Park,40.668634,-73.805878,Park,1,0,0


In [107]:
queens_final_parks_pharm_adult_serv_candidates = queens_final_parks_pharm_adult_serv_info[(queens_final_parks_pharm_adult_serv_info["AdultService Count"]==0) & (queens_final_parks_pharm_adult_serv_info["Pharmacy Count"] > 0)]
queens_final_parks_pharm_adult_serv_candidates_potential = queens_final_parks_pharm_adult_serv_candidates.sort_values(by=["Cluster Labels", "Pharmacy Count"], ascending=False)
queens_final_parks_pharm_adult_serv_candidates_potential.reset_index(drop=True, inplace=True)
queens_final_parks_pharm_adult_serv_candidates_potential

Unnamed: 0,Neighborhood,Venue,Park Latitude,Park Longitude,Venue Category,Cluster Labels,AdultService Count,Pharmacy Count
0,Forest Hills,Yellowstone Park,40.726251,-73.847759,Park,2,0,6
1,Forest Hills,MacDonald Park,40.722239,-73.847141,Park,2,0,5
2,Forest Hills Gardens,Hawthorne Park,40.716422,-73.840083,Park,2,0,5
3,Maspeth,Whitefish Triangle Park,40.726517,-73.901752,Park,2,0,2
4,College Point,Popepnhausen Park,40.781653,-73.844672,Park,2,0,2
5,College Point,Poppenhuesen Triangle Park,40.78813,-73.84597,Park,2,0,2
6,Springfield Gardens,Springfield Park,40.665932,-73.758064,Park,1,0,3
7,Edgemere,Bayswater Park,40.596248,-73.77097,Park,1,0,1
8,Laurelton,Laurelton Park,40.670598,-73.7359,Park,1,0,1
9,Middle Village,Juniper Valley Park,40.720281,-73.881258,Park,1,0,1
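
The candidate list above follows from the two conditions in the problem statement: no adult day care center within 1 km of the park and at least one pharmacy within 2 km. The following is a minimal sketch of that rule on toy data, not the notebook's actual pipeline. The parks A, B, C, their distances, and the helper count_within are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Toy data (hypothetical): one row per (park, venue) pair with a precomputed distance in meters
pairs = pd.DataFrame({
    "Park": ["A", "A", "A", "B", "B", "C"],
    "Venue Category": ["Pharmacy", "AdultCareService", "Pharmacy",
                       "Pharmacy", "AdultCareService", "AdultCareService"],
    "Distance": [500.0, 800.0, 2500.0, 1500.0, 1200.0, 300.0],
})


def count_within(pairs, category, radius_m):
    """Count venues of one category within radius_m meters of each park."""
    mask = (pairs["Venue Category"] == category) & (pairs["Distance"] <= radius_m)
    return pairs[mask].groupby("Park").size()


parks = pd.DataFrame(index=pd.Index(["A", "B", "C"], name="Park"))

# reindex(..., fill_value=0) plays the role of the left merge followed by fillna(0):
# a park with no venue in range gets a count of 0 instead of a missing value
parks["AdultService Count"] = count_within(pairs, "AdultCareService", 1000).reindex(parks.index, fill_value=0)
parks["Pharmacy Count"] = count_within(pairs, "Pharmacy", 2000).reindex(parks.index, fill_value=0)

# Keep parks with no adult care service within 1 km and at least one pharmacy within 2 km
candidates = parks[(parks["AdultService Count"] == 0) & (parks["Pharmacy Count"] > 0)]
print(candidates)
```

In this toy set only park B qualifies: park A has an adult care service 800 m away, and park C has one 300 m away and no pharmacy in range.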


In [108]:
# Repeat the distance calculation against the full adult care services dataset (queens_adult_care_services_data).
# Rows are collected in a list because DataFrame.append was removed in pandas 2.0
care_rows2 = []
for lat1, lon1 in zip(queens_parks["Venue Latitude"], queens_parks["Venue Longitude"]):
    for lat2, lon2, ven in zip(queens_adult_care_services_data["Latitude"], queens_adult_care_services_data["Longitude"], queens_adult_care_services_data["ProgramName"]):
        dist = haversine((lat1, lon1), (lat2, lon2))  # distance in meters
        care_rows2.append({'Park Latitude': lat1, 'Park Longitude': lon1, 'Venue Category': 'AdultCareService', 'Venue Latitude': lat2, 'Venue Longitude': lon2, 'Distance': dist, 'Venue': ven})

park_adult_care_serv_distance2 = pd.DataFrame(care_rows2, columns=['Park Latitude', 'Park Longitude', 'Venue Category', 'Venue Latitude', 'Venue Longitude', 'Distance', 'Venue'])
        
park_adult_care_serv_distance2.sort_values(by=["Park Latitude", "Park Longitude", "Distance"], ascending=True, inplace=True)   

#limit to adult care services within 1 km from a park
park_adult_care_serv_distance_1km_or_less2 = park_adult_care_serv_distance2[park_adult_care_serv_distance2["Distance"] <= 1000]
park_adult_care_serv_distance_1km_or_less2.reset_index(drop=True, inplace=True)

park_adult_care_serv_distance_1km_or_less_stats2=pd.DataFrame(park_adult_care_serv_distance_1km_or_less2[['Park Latitude', 'Park Longitude']])
park_adult_care_serv_distance_1km_or_less_stats2['Cnt']=1
pacsd_1km_or_less_stats2 = park_adult_care_serv_distance_1km_or_less_stats2.groupby(['Park Latitude', 'Park Longitude']).count()
pacsd_1km_or_less_stats2.sort_values(by="Cnt", ascending=False, inplace=True)
pacsd_1km_or_less_stats2.reset_index(drop=False, inplace=True)
pacsd_1km_or_less_stats2

# .copy() plus a non-inplace rename avoids SettingWithCopyWarning on the slice of queens_parks
queens_final_parks_pharm_adult_serv_info = queens_parks[['Neighborhood', 'Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category', 'Cluster Labels']].copy()
queens_final_parks_pharm_adult_serv_info = queens_final_parks_pharm_adult_serv_info.rename(columns={'Venue Latitude':'Park Latitude', 'Venue Longitude':'Park Longitude'})
queens_final_parks_pharm_adult_serv_info1=pd.merge(queens_final_parks_pharm_adult_serv_info, pacsd_1km_or_less_stats2, how='left', left_on=['Park Latitude','Park Longitude'], right_on=['Park Latitude', 'Park Longitude'])
queens_final_parks_pharm_adult_serv_info1.rename(columns={'Cnt':'AdultService Count'}, inplace=True)
queens_final_parks_pharm_adult_serv_info2=pd.merge(queens_final_parks_pharm_adult_serv_info1, pphsd_2km_or_less_stats, how='left', left_on=['Park Latitude','Park Longitude'], right_on=['Park Latitude', 'Park Longitude'])
queens_final_parks_pharm_adult_serv_info2.rename(columns={'Cnt':'Pharmacy Count'}, inplace=True)
values = {'AdultService Count': 0, 'Pharmacy Count': 0}
queens_final_parks_pharm_adult_serv_info2.fillna(value=values, inplace=True)
queens_final_parks_pharm_adult_serv_info=queens_final_parks_pharm_adult_serv_info2.astype({"AdultService Count": int, "Pharmacy Count": int})

queens_final_parks_pharm_adult_serv_candidates = queens_final_parks_pharm_adult_serv_info[(queens_final_parks_pharm_adult_serv_info["AdultService Count"]==0) & (queens_final_parks_pharm_adult_serv_info["Pharmacy Count"] > 0)]
queens_final_parks_pharm_adult_serv_candidates_potential2 = queens_final_parks_pharm_adult_serv_candidates.sort_values(by=["Cluster Labels", "Pharmacy Count"], ascending=False)
queens_final_parks_pharm_adult_serv_candidates_potential2.reset_index(drop=True, inplace=True)
queens_final_parks_pharm_adult_serv_candidates_potential2

Unnamed: 0,Neighborhood,Venue,Park Latitude,Park Longitude,Venue Category,Cluster Labels,AdultService Count,Pharmacy Count
0,Forest Hills,Yellowstone Park,40.726251,-73.847759,Park,2,0,6
1,Forest Hills,MacDonald Park,40.722239,-73.847141,Park,2,0,5
2,Forest Hills Gardens,Hawthorne Park,40.716422,-73.840083,Park,2,0,5
3,Maspeth,Whitefish Triangle Park,40.726517,-73.901752,Park,2,0,2
4,College Point,Popepnhausen Park,40.781653,-73.844672,Park,2,0,2
5,College Point,Poppenhuesen Triangle Park,40.78813,-73.84597,Park,2,0,2
6,Springfield Gardens,Springfield Park,40.665932,-73.758064,Park,1,0,3
7,Laurelton,Laurelton Park,40.670598,-73.7359,Park,1,0,1
8,Middle Village,Juniper Valley Park,40.720281,-73.881258,Park,1,0,1
9,Bayswater,Inwood Park,40.614236,-73.761475,Park,1,0,1


In [109]:
# For every candidate park, list the pharmacies within 2 km together with their distances.
# Rows are collected in a list because DataFrame.append was removed in pandas 2.0
pharm_dist_rows = []
for _, park in queens_final_parks_pharm_adult_serv_candidates_potential.iterrows():
    # pharmacies within 2 km of this park, matched on the park coordinates
    df_temp = park_pharm_serv_distance_2km_or_less[(park_pharm_serv_distance_2km_or_less['Park Latitude'] == park['Park Latitude']) & (park_pharm_serv_distance_2km_or_less['Park Longitude'] == park['Park Longitude'])]
    for lat1, lon1, ven, dist in zip(df_temp['Venue Latitude'], df_temp['Venue Longitude'], df_temp['Venue'], df_temp['Distance']):
        pharm_dist_rows.append({'Neighborhood': park['Neighborhood'], 'Park': park['Venue'], 'Pharmacy Latitude': lat1, 'Pharmacy Longitude': lon1, 'Venue': ven, 'Distance': dist})

pharm_df_dist = pd.DataFrame(pharm_dist_rows, columns=['Neighborhood', 'Park', 'Pharmacy Latitude', 'Pharmacy Longitude', 'Venue', 'Distance'])

pharm_df_dist.sort_values(by=["Park", "Distance"], ascending=True, inplace=True)   
pharm_df_dist

Unnamed: 0,Neighborhood,Park,Pharmacy Latitude,Pharmacy Longitude,Venue,Distance
25,Edgemere,Bayswater Park,40.600532,-73.754032,Vista Pharmacy & Surgical,1507.769812
11,Forest Hills Gardens,Hawthorne Park,40.718438,-73.838177,Rite Aid,275.817504
12,Forest Hills Gardens,Hawthorne Park,40.721396,-73.843421,CVS pharmacy,620.683159
13,Forest Hills Gardens,Hawthorne Park,40.724004,-73.847911,Walgreens,1070.793864
14,Forest Hills Gardens,Hawthorne Park,40.726791,-73.853772,CVS pharmacy,1631.436981
15,Forest Hills Gardens,Hawthorne Park,40.703557,-73.824861,CVS pharmacy,1922.18049
28,Bayswater,Inwood Park,40.600532,-73.754032,Vista Pharmacy & Surgical,1648.750837
27,Middle Village,Juniper Valley Park,40.727184,-73.892861,CVS pharmacy,1243.460892
26,Laurelton,Laurelton Park,40.660851,-73.739247,Walgreens,1120.385569
6,Forest Hills,MacDonald Park,40.724004,-73.847911,Walgreens,206.755769


In [110]:
# For every candidate park, collect the distance to each adult care service.
# Rows are collected in a list because DataFrame.append was removed in pandas 2.0
care_dist_rows = []
for _, park in queens_final_parks_pharm_adult_serv_candidates_potential.iterrows():
    # all park-to-service distances for this park, matched on the park coordinates
    df_temp = park_adult_care_serv_distance[(park_adult_care_serv_distance['Park Latitude'] == park['Park Latitude']) & (park_adult_care_serv_distance['Park Longitude'] == park['Park Longitude'])]
    for lat1, lon1, ven, dist in zip(df_temp['Venue Latitude'], df_temp['Venue Longitude'], df_temp['Venue'], df_temp['Distance']):
        care_dist_rows.append({'Neighborhood': park['Neighborhood'], 'Park': park['Venue'], 'Care Latitude': lat1, 'Care Longitude': lon1, 'Venue': ven, 'Distance': dist})

adult_care_serv_df_dist = pd.DataFrame(care_dist_rows, columns=['Neighborhood', 'Park', 'Care Latitude', 'Care Longitude', 'Venue', 'Distance'])

adult_care_serv_df_dist.sort_values(by=["Park", "Distance"], ascending=True, inplace=True)
# distance in meters from each candidate park to its nearest adult care service
adult_care_serv_df_dist_stats = adult_care_serv_df_dist.groupby(['Neighborhood', 'Park'])['Distance'].min().to_frame()
adult_care_serv_df_dist_stats

Unnamed: 0_level_0,Unnamed: 1_level_0,Distance
Neighborhood,Park,Unnamed: 2_level_1
Bayswater,Inwood Park,1295.510255
College Point,Popepnhausen Park,1450.871273
College Point,Poppenhuesen Triangle Park,2172.749876
Edgemere,Bayswater Park,1320.392766
Forest Hills,MacDonald Park,1039.123592
Forest Hills,Yellowstone Park,1001.67897
Forest Hills Gardens,Hawthorne Park,1366.428196
Laurelton,Laurelton Park,5424.023681
Maspeth,Whitefish Triangle Park,2770.078949
Middle Village,Juniper Valley Park,1882.680678


In [111]:
# strip apostrophes from venue names: an apostrophe in the popup text breaks the Folium map and it is not displayed
# (assigning the result back is more reliable than an in-place replace on a selected column)
pharm_df_dist['Venue'] = pharm_df_dist['Venue'].str.replace("'", "", regex=False)

In [112]:

from folium.plugins import HeatMap
from folium.features import DivIcon

f = folium.Figure(width=1000, height=600)
base_heatmap = folium.Map(location=[latitude, longitude], zoom_start=10).add_to(f)
HeatMap(queens_final_parks_pharm_adult_serv_candidates_potential[['Park Latitude','Park Longitude','Pharmacy Count']].values.tolist()).add_to(base_heatmap)

#mark parks
for i in range(0,queens_final_parks_pharm_adult_serv_candidates_potential.shape[0]):
    ParkName = '<div style="font-size: 12pt">'+queens_final_parks_pharm_adult_serv_candidates_potential.iloc[i]['Venue']+'</div>'    
    folium.map.Marker(
        [queens_final_parks_pharm_adult_serv_candidates_potential.iloc[i]['Park Latitude'],queens_final_parks_pharm_adult_serv_candidates_potential.iloc[i]['Park Longitude']],
        icon=DivIcon(
        icon_size=(150,36),
        icon_anchor=(0,0),
        html=ParkName,
        )
    ).add_to(base_heatmap)
    
#mark pharmacies
for i in range(0,pharm_df_dist.shape[0]):
    folium.Marker([pharm_df_dist.iloc[i]['Pharmacy Latitude'], pharm_df_dist.iloc[i]['Pharmacy Longitude']], popup=pharm_df_dist.iloc[i]['Venue']).add_to(base_heatmap)

base_heatmap

In [113]:
queens_final_parks_pharm_adult_serv_candidates_potential


Unnamed: 0,Neighborhood,Venue,Park Latitude,Park Longitude,Venue Category,Cluster Labels,AdultService Count,Pharmacy Count
0,Forest Hills,Yellowstone Park,40.726251,-73.847759,Park,2,0,6
1,Forest Hills,MacDonald Park,40.722239,-73.847141,Park,2,0,5
2,Forest Hills Gardens,Hawthorne Park,40.716422,-73.840083,Park,2,0,5
3,Maspeth,Whitefish Triangle Park,40.726517,-73.901752,Park,2,0,2
4,College Point,Popepnhausen Park,40.781653,-73.844672,Park,2,0,2
5,College Point,Poppenhuesen Triangle Park,40.78813,-73.84597,Park,2,0,2
6,Springfield Gardens,Springfield Park,40.665932,-73.758064,Park,1,0,3
7,Edgemere,Bayswater Park,40.596248,-73.77097,Park,1,0,1
8,Laurelton,Laurelton Park,40.670598,-73.7359,Park,1,0,1
9,Middle Village,Juniper Valley Park,40.720281,-73.881258,Park,1,0,1
