# Step 1b: Get Coordinates for HDB Blocks

## Import Libraries

In [1]:
import pandas as pd
import requests
import json
import time
import csv

from myFunctions import *
keys = getKeys('GoogleAPIKey.txt')

## User Inputs
I date my output files so that I don't accidentally overwrite the files of previous runs. In a more automated system, it would be simple to append today's date to the filename, with a function that adds a version number when a file of the same name already exists in the directory. For development, however, it is better to change the filenames manually so that the user knows exactly which files are being written.
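One possible way to automate the dating and versioning described above is sketched here. The function name `dated_filename`, its parameters, and the `_v2`, `_v3` suffix style are assumptions made for illustration; only the `YYYYMMDD` date format is taken from the filenames used in this notebook.

```python
import os
from datetime import date


def dated_filename(stem, directory='.', ext='.csv', today=None):
    """Return '<stem>_<YYYYMMDD><ext>', adding '_v2', '_v3', ... if the name is taken."""
    today = today or date.today()
    base = f'{stem}_{today:%Y%m%d}'
    candidate = base + ext
    version = 2
    # Keep bumping the version until the name is unused in the target directory.
    while os.path.exists(os.path.join(directory, candidate)):
        candidate = f'{base}_v{version}{ext}'
        version += 1
    return candidate


# Hypothetical usage: build the output filename for today's run.
print(dated_filename('HDB_address_latlong'))
```

The function only proposes a name and does not create the file, so the choice of when to write stays with the caller.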

The data file is downloaded from data.gov.sg. 

We are expecting 8260 unique HDB addresses, so this code must be run several times to stay within the daily API usage limit. Each run is selected with the 'start' and 'end' variables, which slice the list of unique addresses. As with any Python slice, 'start' is included and 'end' is excluded, so the next run should begin at the previous run's 'end'.
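As a rough illustration, the `start` and `end` pairs for successive runs can be computed up front. This sketch assumes the addresses are cut with an ordinary list slice, `address[start:end]`, in which `end` is excluded; the helper name `batch_ranges` is hypothetical, and the batch size of 2100 simply mirrors the `end` value used in this run.

```python
def batch_ranges(total, batch_size):
    """Return (start, end) pairs covering range(total), with end excluded."""
    return [(start, min(start + batch_size, total))
            for start in range(0, total, batch_size)]


print(batch_ranges(8260, 2100))
# [(0, 2100), (2100, 4200), (4200, 6300), (6300, 8260)]
```

Because each `end` is reused as the next `start`, no address is queried twice and none is skipped.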

In [2]:
dataFile = 'resale_prices_2015_.csv' # This file is from data.gov.sg
outputFile = 'HDB_address_latlong_20180307.csv'
outputFailureFile = 'HDB_address_latlong_failed_20180307.csv'

# Set data range to query for this run. Total number of data points is 8260.
start = 0
end = 2100

## Get HDB Addresses to Query
Each HDB address is formatted as a query string, with spaces replaced by `+`. <br>
Duplicate addresses are removed to avoid redundant API requests. <br>
The unique addresses are stored in a list, which is sliced for the current run. <br>
The data frame read from the data file is kept, so that it:
- contains the columns needed for a retry (`block` and `street_name`).
- is indexed by HDB address.

This allows the details of failed requests to be looked up in this data frame and saved to a failure file. The failure file can then be re-run with this same code later.

In [3]:
raw = pd.read_csv(dataFile)
print('No of HDB addresses: %i' %(len(raw)))

# Create address format. 
address = 'Blk ' + raw['block'] + ' ' + raw['street_name'] + ' ' + 'Singapore'
address = address.tolist()
address = ['+'.join(i.split(' ')) for i in address]

# Remove duplicates by address, to prevent API request redundancies
raw['address'] = address
raw.drop_duplicates(subset = 'address', inplace = True) 

# Create address list to query and slice accordingly. 
address = raw['address'].tolist()
address = address[start:end]

# Preserve raw for failure retry later
raw = raw.loc[:, ['address','block', 'street_name']] 
raw.set_index('address', inplace = True)

No of HDB addresses: 40


## API Query
The outermost ***with*** block closes the output file even if the code is interrupted (for example, by a keyboard interrupt), so rows that have already been written are not lost.
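The effect of the `with` block can be checked on a small scale. The sketch below writes a few made-up rows to a temporary file and simulates an interruption part-way through; the function name `write_rows`, the sample rows, and the extra `flush()` call are assumptions added for illustration and are not part of the notebook's code.

```python
import csv
import os
import tempfile


def write_rows(path, rows):
    """Write rows to a CSV file; a None row simulates an interruption."""
    with open(path, 'w', newline='', encoding='utf-8') as output:
        writer = csv.writer(output)
        for row in rows:
            if row is None:
                raise KeyboardInterrupt  # simulated interruption
            writer.writerow(row)
            output.flush()  # optional: hand each row to the OS immediately


path = os.path.join(tempfile.mkdtemp(), 'demo.csv')
try:
    write_rows(path, [['a', 1], ['b', 2], None, ['c', 3]])
except KeyboardInterrupt:
    pass

# Read back whatever reached the file before the interruption.
with open(path, newline='', encoding='utf-8') as saved:
    saved_rows = list(csv.reader(saved))
print(saved_rows)
# [['a', '1'], ['b', '2']]
```

The `with` statement alone is enough for an ordinary exception, since the file is closed and its buffer flushed on exit. Flushing after every row may additionally help if the process is killed outright, at the cost of more frequent writes.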

The ***success*** and ***failure*** dictionaries store the coordinates or the error for each address, which eases troubleshooting. If these dictionaries take up too much memory, the ***success*** dictionary can be removed, because its data are already written to file inside the ***with*** block. The ***failure*** dictionary, however, must be kept, as it is converted to a data frame and saved to file at the end of the code.
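The query cell relies on `getParamString` and `tryGET` from `myFunctions`, which is not shown in this notebook. The sketch below is a guess at what such helpers might look like, inferred only from how they are called and from the `Try: 1.` lines in the output. The names `get_param_string`, `fetch_json`, and `try_get`, the `fetch` and `pause` parameters, and the use of `urllib` are all assumptions, and the author's versions may differ.

```python
import json
import time
import urllib.error
import urllib.request


def get_param_string(params):
    """Join parameters as key=value pairs with '&' (assumed: no extra escaping,
    since the addresses already use '+' in place of spaces)."""
    return '&'.join(f'{key}={value}' for key, value in params.items())


def fetch_json(url):
    """Return (HTTP status code, parsed JSON body) for a GET request."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status, json.loads(response.read().decode('utf-8'))
    except urllib.error.HTTPError as error:
        return error.code, {}
    except urllib.error.URLError:
        return None, {}  # network problem: no HTTP status available


def try_get(max_tries, url, fetch=fetch_json, pause=1.0):
    """Retry a GET up to max_tries times.

    Returns (request status, API status, data, number of tries).
    """
    req_status, returned_status, data, tries = None, None, {}, 0
    for tries in range(1, max_tries + 1):
        print('Try: %i. ' % tries)
        req_status, data = fetch(url)
        returned_status = data.get('status')
        if req_status == 200 and returned_status == 'OK':
            break
        if tries < max_tries:
            time.sleep(pause)  # wait before the next attempt
    return req_status, returned_status, data, tries


# Usage with a stand-in fetch function, so that no real request is sent.
def fake_fetch(url):
    return 200, {'status': 'OK', 'results': []}


query = get_param_string({'key': 'YOUR_KEY', 'address': 'Blk+83+COMMONWEALTH+CL+Singapore'})
print(try_get(3, 'https://maps.googleapis.com/maps/api/geocode/json?' + query, fetch=fake_fetch))
```

Passing the fetch function as a parameter is a design choice of this sketch: it lets the retry logic be exercised without spending any of the daily API quota.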

In [4]:
success = {}
failure = {}
url = 'https://maps.googleapis.com/maps/api/geocode/json?'

In [5]:
with open(outputFile, 'w', newline = '', encoding = 'utf-8') as output:
    writer = csv.writer(output, delimiter=',')
    writer.writerow(['HDB Address', 'Latitude', 'Longitude', 'latlong'])

    for i in address:
        params = {'key':keys['GoogleMapsGeocoding'],'address':i}
        paramString = getParamString(params)
        reqStatus, returnedStatus, data, tries = tryGET(3,url+paramString)
    
        if reqStatus == 200 and returnedStatus == 'OK':
            lat = data['results'][0]['geometry']['location']['lat']
            long = data['results'][0]['geometry']['location']['lng']
            success[i] = (lat,long)
            writer.writerow([i, lat, long, str(lat) + ',' + str(long)])
            print(i, ' Geocoding succeeded.')
        else:
            failure[i] = (raw.loc[i,'block'], raw.loc[i,'street_name'], reqStatus, returnedStatus)
            print(i, 'Geocoding failed after ', tries, ' tries.')
            print(reqStatus, returnedStatus)

print('Number of success: ', len(success))
print('Number of failures: ', len(failure))

Try: 1. 
Blk+83+COMMONWEALTH+CL+Singapore  Geocoding succeeded.
Try: 1. 
Blk+97+COMMONWEALTH+CRES+Singapore  Geocoding succeeded.
Try: 1. 
Blk+95+COMMONWEALTH+DR+Singapore  Geocoding succeeded.
Try: 1. 
Blk+98+COMMONWEALTH+CRES+Singapore  Geocoding succeeded.
Try: 1. 
Blk+93+COMMONWEALTH+DR+Singapore  Geocoding succeeded.
Try: 1. 
Blk+85+COMMONWEALTH+CL+Singapore  Geocoding succeeded.
Try: 1. 
Blk+101+COMMONWEALTH+CRES+Singapore  Geocoding succeeded.
Try: 1. 
Blk+90+COMMONWEALTH+DR+Singapore  Geocoding succeeded.
Try: 1. 
Blk+7+COMMONWEALTH+AVE+Singapore  Geocoding succeeded.
Try: 1. 
Blk+410+COMMONWEALTH+AVE+WEST+Singapore  Geocoding succeeded.
Try: 1. 
Blk+110+COMMONWEALTH+CRES+Singapore  Geocoding succeeded.
Try: 1. 
Blk+100+COMMONWEALTH+CRES+Singapore  Geocoding succeeded.
Try: 1. 
Blk+88+COMMONWEALTH+CL+Singapore  Geocoding succeeded.
Try: 1. 
Blk+84+COMMONWEALTH+CL+Singapore  Geocoding succeeded.
Try: 1. 
Blk+412+COMMONWEALTH+AVE+WEST+Singapore  Geocoding succeeded.
Try: 1. 
Blk+

## Save Failure Data

In [6]:
if len(failure)!=0:
    failure = pd.DataFrame.from_dict(failure, orient = 'index')
    failure.reset_index(level=0, inplace=True)
    failure.columns = ['HDB Address', 'block', 'street_name','reqStatus', 'returnedStatus']
    failure.to_csv(outputFailureFile, index = False)
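Because the failure file keeps the `block` and `street_name` columns, the query addresses can be rebuilt from it for a retry. The sketch below does this with the standard `csv` module instead of pandas; the function name `addresses_from_failure_file` is hypothetical, and the sample row, including its status value, is made up for illustration.

```python
import csv
import io


def addresses_from_failure_file(handle):
    """Rebuild '+'-joined query addresses from a failure file's block and street_name."""
    addresses = []
    for row in csv.DictReader(handle):
        text = 'Blk ' + row['block'] + ' ' + row['street_name'] + ' Singapore'
        addresses.append('+'.join(text.split(' ')))
    return addresses


# Made-up failure file content with the same columns as the one saved above.
sample = io.StringIO(
    'HDB Address,block,street_name,reqStatus,returnedStatus\n'
    'Blk+83+COMMONWEALTH+CL+Singapore,83,COMMONWEALTH CL,200,OVER_QUERY_LIMIT\n'
)
retry_addresses = addresses_from_failure_file(sample)
print(retry_addresses)
# ['Blk+83+COMMONWEALTH+CL+Singapore']
```

In the notebook itself, the same result should follow from pointing `dataFile` at the failure file and re-running the cells, since the address format is built from those two columns.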