# Scoring private residential data: calculation of number of amenities in the vicinity

Ansel Lim

Created 1 November 2021, updated Dec 2021

Given the Urban Redevelopment Authority's dataset of private residential property transactions (available via the API at https://www.ura.gov.sg/maps/api/#private-residential-property-transactions) and a set of amenity datasets (malls, taxi stands, primary and secondary schools, MRT stations, hawker centres, carparks, bus stops, CHAS clinics, sports facilities, community centers, supermarkets, eating establishments, parks), we count how many of each amenity lie within a specified radius (`RADIUS`, e.g. 1 kilometer) of each property.
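To make the counting step concrete, here is a minimal sketch on made-up coordinates. It uses a haversine (spherical) distance from the standard library rather than the `geodesic` distance this notebook uses, so results can differ slightly; the function names and sample points are illustrative assumptions, not part of the notebook.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; the sphere is an approximation

def haversine_km(origin, dest):
    """Great-circle distance in km between two (lat, long) pairs given in degrees."""
    lat1, long1 = map(radians, origin)
    lat2, long2 = map(radians, dest)
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((long2 - long1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

def count_within_radius(origin, amenities, radius_km):
    """Count the amenity locations that lie within radius_km of origin."""
    return sum(1 for dest in amenities if haversine_km(origin, dest) <= radius_km)

# Hypothetical coordinates, for illustration only
property_location = (1.3000, 103.8000)
amenity_locations = [
    (1.3000, 103.8050),  # about 0.56 km to the east
    (1.3050, 103.8000),  # about 0.56 km to the north
    (1.3200, 103.8000),  # about 2.2 km to the north
]
num_nearby = count_within_radius(property_location, amenity_locations, radius_km=1.0)
print(num_nearby)
```

With a 1 km radius, only the first two hypothetical amenities are counted.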

In [1]:
# Specify radius in kilometers
RADIUS=1.0

In [2]:
from datetime import datetime
import pandas as pd
from geopy.distance import geodesic
import re
import os
import time

In [3]:
timestamp=datetime.now().strftime("%d/%m/%Y %H:%M:%S")
print(timestamp) # datetime.now() returns local machine time; GMT on the machine that produced this output

04/12/2021 17:06:49
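`datetime.now()` returns the local time of whichever machine runs the notebook, so the timestamp above is GMT only when that machine's clock is. A minimal sketch of a timestamp that is UTC regardless of the machine, keeping the same day/month/year layout (the variable name is illustrative):

```python
from datetime import datetime, timezone

# Timezone-aware timestamp in UTC; %Z appends the zone name to the string
timestamp_utc = datetime.now(timezone.utc).strftime("%d/%m/%Y %H:%M:%S %Z")
print(timestamp_utc)
```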


## Load data

In [4]:
os.chdir("../data/raw")

In [5]:
ura = pd.read_csv("./ura.csv")
malls = pd.read_csv("./data_malls.csv")
taxi_stands = pd.read_csv("./taxi_stands.csv")
primary_schools=pd.read_csv("./data_prischools.csv")
mrt = pd.read_csv("./data_MRT.csv")
hawker=pd.read_csv("./data_hawker.csv")
carparks=pd.read_csv("./carparks.csv")
bus_stops = pd.read_csv("./bus_stops.csv")
amenities = pd.read_csv("./amenities.csv")
supermarkets = pd.read_csv("./supermarkets.csv")
secondary_schools=pd.read_csv("./secondary_schools.csv")
eating_establishments=pd.read_csv("./eating_establishments.csv")
parks = pd.read_csv("./parks.csv")

In [6]:
taxi_stands["lat"],taxi_stands["long"] = taxi_stands["Latitude"],taxi_stands["Longitude"]
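# Split each coordinate string at the first comma into longitude and the remainder
primary_schools[['long','lat']]=primary_schools['coordinates'].str.split(',',n=1,expand=True)
mrt[['long','lat']]=mrt['Coordinates'].str.split(',',n=1,expand=True)
hawker[['long','lat']]=hawker['Coordinates'].str.split(',',n=1,expand=True)
# Keep only the latitude field; rstrip(",0.0") strips a set of characters, not the ",0.0" suffix
hawker['lat']=hawker['lat'].str.split(',').str[0]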
carparks['lat'],carparks['long']=carparks['latitude'],carparks['longitude']
bus_stops['lat'],bus_stops['long']=bus_stops['Latitude'],bus_stops['Longitude']
eating_establishments.rename(columns={'lon':'long'}, inplace=True)
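A note on trimming a trailing `,0.0` from coordinate strings such as the hawker ones: `str.rstrip` takes a set of characters rather than a suffix, so splitting on commas is the more predictable way to isolate the latitude. The sketch below assumes strings of the form `long,lat,0.0`, which is inferred from the parsing code and not confirmed against the raw file; the sample values are made up.

```python
raw = "103.8500,1.3000,0.0"  # hypothetical coordinate string

long_str, rest = raw.split(",", 1)  # rest is "1.3000,0.0"
stripped = rest.rstrip(",0.0")      # strips any trailing ',', '0' or '.' characters
print(stripped)                     # trailing zeros of the latitude are removed too

# Edge case: a value made up only of the stripped characters after its first digit
edge_case = "10.0,0.0".rstrip(",0.0")
print(edge_case)

# Splitting on every comma keeps each field intact
fields = raw.split(",")
long_val, lat_val = float(fields[0]), float(fields[1])
print(long_val, lat_val)
```

For latitudes in Singapore (roughly 1.2 to 1.5) the stripped string still converts to the same number, but the edge case shows how the character-set behavior can silently change a value.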

In [7]:
df = ura.copy(deep=True)

In [8]:
# Function for extracting a (lat, long) pair from rows whose 'coordinates' hold polygon data.
# Assumes each pair is written as longitude, latitude, as in the coordinate strings split above.
def getLatLong(x):
  coordinates = x['coordinates']
  # Take the first pair of numbers only (some geometries are polygons with many vertices)
  long,lat=re.findall(r'[0-9.]+', coordinates)[:2]
  return (lat,long)
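`getLatLong` is defined but not called in this section, so a small usage sketch may help. It applies the same regular expression to a made-up polygon string, under the assumption that each vertex is written as longitude, latitude; the function name `first_lat_long` and the sample geometry are hypothetical.

```python
import re

def first_lat_long(coordinates):
    """Return (lat, long) as strings from the first vertex of a geometry string.

    Assumes each vertex is written as longitude, latitude.
    """
    long, lat = re.findall(r"[0-9.]+", coordinates)[:2]
    return (lat, long)

# Hypothetical polygon with three vertices
polygon = "[[[103.8100, 1.2900], [103.8200, 1.2900], [103.8200, 1.3000]]]"
lat, long = first_lat_long(polygon)
print(lat, long)
```

The pattern `[0-9.]+` does not match a minus sign, which is harmless for Singapore's positive coordinates but would drop the sign elsewhere.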

In [9]:
sports_facility_types = list(amenities.facility_type.unique())
sports_facility_types.remove('CHAS Clinic')
sports_facility_types.remove('Community Centre')
chas_clinics = amenities[amenities['facility_type']=='CHAS Clinic'].reset_index()
sports_facilities = amenities[amenities['facility_type'].isin(sports_facility_types)].reset_index()
community_centers = amenities[amenities['facility_type']=='Community Centre'].reset_index()

In [10]:
places = [malls,taxi_stands,primary_schools,mrt,hawker,carparks,bus_stops,chas_clinics,sports_facilities,community_centers,supermarkets,secondary_schools,eating_establishments,parks]
places_names = ['malls','taxi_stands','primary_schools','mrt','hawker','carparks','bus_stops','chas_clinics','sports_facilities','community_centers','supermarkets','secondary_schools','eating_establishments','parks']

## Calculate distances

In [None]:
start=time.time()
# Pair each amenity dataframe with its name (avoids reusing `i` for both loops)
for dataframe, place_name in zip(places, places_names):
  interm=time.time()
  print("Working on feature dataframe {}".format(place_name))
  print("Number of places of interest:", dataframe.shape[0])
  print("Estimated number of pairwise computations:", dataframe.shape[0] * df.shape[0])
  new_column_name = "num_"+place_name
  df[new_column_name] = 0
  # For each property, count the amenities of this type within RADIUS km
  for i in range(df.shape[0]):
    origin = (df.loc[i,"lat"], df.loc[i,"long"])
    counter = 0
    for j in range(dataframe.shape[0]):
      dest = (dataframe.loc[j,"lat"], dataframe.loc[j,"long"])
      try:
        dist = geodesic(origin,dest).km
      except ValueError:
        # Skip amenities whose coordinates geopy cannot interpret
        continue
      if dist<=RADIUS:
        counter+=1
    df.loc[i,new_column_name]=counter
  print("Completed working on feature dataframe:",place_name)
  print("Time taken for this feature dataframe (seconds):",time.time() - interm)
  # Checkpoint after every amenity type so partial results survive an interruption
  df.to_csv('../processed/df.csv')
  print("Checkpointed; time elapsed:",time.time()-start)
end=time.time()
print("time taken: {}".format(end-start))

Working on feature dataframe malls
Number of places of interest: 169
Estimated number of pairwise computations: 399178
Completed working on feature dataframe: malls
Time taken for this feature dataframe (seconds): 71.01735091209412
Checkpointed; time elapsed: 71.03272914886475
Working on feature dataframe taxi_stands
Number of places of interest: 279
Estimated number of pairwise computations: 658998
Completed working on feature dataframe: taxi_stands
Time taken for this feature dataframe (seconds): 110.86588907241821
Checkpointed; time elapsed: 181.91343998908997
Working on feature dataframe primary_schools
Number of places of interest: 186
Estimated number of pairwise computations: 439332
Completed working on feature dataframe: primary_schools
Time taken for this feature dataframe (seconds): 74.72010397911072
Checkpointed; time elapsed: 256.64821219444275
Working on feature dataframe mrt
Number of places of interest: 189
Estimated number of pairwise computations: 446418
Completed work
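The logged figures above can be turned into a rough per-pair cost, which helps when estimating how long the remaining amenity types will take. The sketch below only does arithmetic on numbers copied from the log; the variable names are illustrative.

```python
# Figures copied from the log above: (places of interest, pairwise computations, seconds)
logged = {
    "malls": (169, 399178, 71.01735091209412),
    "taxi_stands": (279, 658998, 110.86588907241821),
    "primary_schools": (186, 439332, 74.72010397911072),
}

properties_per_type = {}
for name, (n_places, n_pairs, seconds) in logged.items():
    # Pairwise computations = places of interest x properties
    properties_per_type[name] = n_pairs // n_places
    micros_per_pair = seconds / n_pairs * 1e6
    print(f"{name}: {properties_per_type[name]} properties, "
          f"{micros_per_pair:.0f} microseconds per pair")
```

Each amenity type implies the same number of properties (2362), and the cost works out to roughly 170 to 180 microseconds per distance computation on the machine that produced the log.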

In [None]:
df.to_csv('../processed/df.csv')