## Geocoding with the E911 API

This notebook demonstrates how to quickly read CSV files for batch geocoding using the Rhode Island E911 Locator API. The E911 Locator dataset was created to serve as an accurate spatial reference for emergency dispatchers in Rhode Island. It includes addressed and unaddressed locations, as well as occupied and unoccupied structures throughout Rhode Island.

In this first code block, we read a CSV file of independent schools, keep only the rows whose state is RI, and prepare to look up each address using two geocoding services: the address-level E911 Sites Address Locator and the street-level E911 Street Range Locator.

In [2]:
import requests
import pandas as pd

In [3]:
base_url_ad = 'https://risegis.ri.gov/gpserver/rest/services/E911_Sites_AddressLocator/GeocodeServer/findAddressCandidates?'
base_url_st = 'https://risegis.ri.gov/gpserver/rest/services/E911_StreetRange_Locator/GeocodeServer/findAddressCandidates?'
records = []
multiples = []
addfile='RI_Independent_Schools.csv' #Input file with addresses
df = pd.read_csv(addfile)

ID = 'org_ID'
NAME = 'name'
ADDRESS = 'location_address1'
STATE = 'location_state'
CITY = 'location_city'
ZIP = 'location_zip'
MAX_LOCATIONS = 5

# Keep Rhode Island rows and the first 16 columns; copy so later assignments do not touch a view
df = df[df[STATE] == 'RI'].iloc[:, :16].copy()
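
Rhode Island ZIP codes begin with a zero, and pandas tends to parse such a column as integers, so `02910` becomes `2910` (the `location_zip` column in the output table in this notebook shows this). A minimal sketch of restoring the five-digit form before building a query follows. `normalize_zip` is a hypothetical helper and not part of the original notebook.

```python
import math

def normalize_zip(value):
    """Return a 5-character ZIP string, or None if the value is missing.

    Hypothetical helper: accepts ints (2910), floats (2910.0),
    and strings ('2910', '02910-1234').
    """
    if value is None:
        return None
    if isinstance(value, float):
        if math.isnan(value):
            return None
        value = int(value)
    text = str(value).strip()
    if not text:
        return None
    # Keep only the 5-digit part of a ZIP+4 code.
    text = text.split('-')[0]
    return text.zfill(5)

print(normalize_zip(2910))
print(normalize_zip('02882-1234'))
```

Applied to the DataFrame, this could look like `df[ZIP] = df[ZIP].map(normalize_zip)`. Alternatively, passing `dtype={ZIP: str}` to `pd.read_csv` keeps the column as text, which preserves the leading zero if the file stores it.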

Here, we define a function that queries the API with the street address, city, state, and ZIP code from each row.

In [4]:
# Given a row with the fields needed to build a query,
# return the JSON candidates the locator finds for it.

def get_request_e911(row, base_url=base_url_ad):
    # Look up by Street, City, State, ZIP
    add_url=f'Street={row[ADDRESS]}&City={row[CITY]}&State={row[STATE]}&ZIP={row[ZIP]}'
    data_url=f'{base_url}{add_url}&maxLocations={MAX_LOCATIONS}&matchOutOfRange=true&f=pjson'
    # A timeout keeps one slow request from stalling the whole batch
    response=requests.get(data_url, timeout=30)
    # Raise on HTTP errors instead of trying to parse an error page
    response.raise_for_status()
    return response.json()['candidates']
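
Addresses can contain characters such as `#`, `&`, or spaces that change the meaning of a URL when interpolated directly into it. As a sketch, the query string can be built with the standard library's `urlencode`, which percent-encodes each value. `build_query_url` is a hypothetical name, the parameter names mirror those used in `get_request_e911`, and the host in the example is a placeholder rather than the real service.

```python
from urllib.parse import urlencode

def build_query_url(base_url, street, city, state, zip_code, max_locations=5):
    """Build a findAddressCandidates URL with percent-encoded parameters."""
    params = {
        'Street': street,
        'City': city,
        'State': state,
        'ZIP': zip_code,
        'maxLocations': max_locations,
        'matchOutOfRange': 'true',
        'f': 'pjson',
    }
    # Strip a trailing '?' so the separator is added exactly once.
    return base_url.rstrip('?') + '?' + urlencode(params)

# Placeholder host and an illustrative address containing '#'.
url = build_query_url(
    'https://example.invalid/GeocodeServer/findAddressCandidates?',
    '324 E Main Rd #2', 'Portsmouth', 'RI', '02871',
)
print(url)
```

Passing the same dictionary as `requests.get(base_url, params=params)` should have an equivalent effect, since `requests` encodes the `params` argument itself.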

Then, using the function defined above, we find the best match for each row. A query can return zero, one, or several candidates: when there are several, we sort them by score and keep the highest-scoring one. We write the match count, score, matched address, and coordinates back to each row.

In [5]:
from tqdm import tqdm
import traceback

# Replace the spelled-out house number 'One' (whole word only) with '1'
df[ADDRESS] = df[ADDRESS].str.replace(r'\bOne\b', '1', regex=True)
for idx, row in tqdm(df.iterrows(), total=len(df)):
    try:
        request_data = get_request_e911(row)
        # Count matches up front so rows with no candidates record 0
        # instead of reusing the previous row's count
        match_ct = len(request_data)
        if match_ct > 0:
            multiples = []
            for m in request_data:
                multiples.append([row[ID], row[NAME], 
                                    m['score'],
                                    m['address'], 
                                    m['location']['x'],
                                    m['location']['y']])
            # Highest score first
            multiples.sort(key=lambda x: x[2], reverse=True)
            best_match = multiples[0]
            score,match_address,x,y = best_match[2:6]
        else:
            score,match_address,x,y = None,None,None,None
        df.loc[idx, 'matches'] = match_ct
        df.loc[idx, 'score'] = score
        df.loc[idx, 'match_address'] = match_address
        df.loc[idx, 'x'] = x
        df.loc[idx, 'y'] = y

    except Exception:
        # Report the failure and continue with the next row
        traceback.print_exc()
df

100%|██████████| 66/66 [00:06<00:00,  9.48it/s]


Unnamed: 0,org_ID,parent_ID,code,finance_code,name,name_short_30,name_short_15,org_type_ID,org_type,active,...,location_address2,location_address3,location_city,location_state,location_zip,matches,score,match_address,x,y
0,2829,,07353,,A Childs University - Cranston,A Childs University,A Childs Univer,2,School,Y,...,,,Cranston,RI,2910,1.0,100.00,"695 PARK AV, CRANSTON, RI, 02910",347536.614258,253094.306443
1,2928,,17304,,A Childs University - Smithfield,A Childs University,AChilds Univers,2,School,Y,...,,,Smithfield,RI,2917,2.0,100.00,"370 GEORGE WASHINGTON HWY, SMITHFIELD, RI, 02917",325939.653104,306458.937338
2,2607,,32340,,Middlebridge School,Middlebridge School,Middlebridge,2,School,Y,...,,,Narragansett,RI,2882,2.0,100.00,"333 OCEAN RD, NARRAGANSETT, RI, 02882",339544.107535,121061.940960
3,3232,,27306,,Sea Rose Montessori Co-op,Sea Rose Montessori Co-op,Sea Rose Montes,2,School,Y,...,St. Mary's Episcapol Church House,,Portsmouth,RI,2871,5.0,100.00,"324 E MAIN RD, PORTSMOUTH, RI, 02871",393209.428557,168938.668691
5,139,,01332,,Barrington Christian Academy,Barrington Christian Academy,Barr. Academy,2,School,Y,...,,,Barrington,RI,2806,1.0,100.00,"9 OLD COUNTY RD, BARRINGTON, RI, 02806",375140.195519,245072.839996
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
62,1365,,35371,,The Stork's Nest Child Academy III,The Stork's Nest,Stork's Nest,2,School,Y,...,,,Warwick,RI,2886,2.0,98.65,"1100 TOLL GATE RD, WARWICK, RI, 02886",328182.294513,228614.944655
63,1438,,38306,,Islamic School of RI,Islamic School of RI,Islamic School,2,School,Y,...,,,West Warwick,RI,2893,2.0,100.00,"840 PROVIDENCE ST, WEST WARWICK, RI, 02893",330112.395392,233252.688399
64,1437,,38305,,The Tides School - West Warwick,The Tides School - WW,The Tides,2,School,Y,...,,,West Warwick,RI,2893,1.0,100.00,"222 WASHINGTON ST, WEST WARWICK, RI, 02893",320475.811653,225052.034163
65,1497,,39332,,Hillside Alternative Program,Hillside Alternative Program,Hillside,2,School,Y,...,,,Woonsocket,RI,2895,1.0,100.00,"141 MAIN ST, WOONSOCKET, RI, 02895",324259.005613,334899.740933
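
The selection step in the loop can also be expressed without building and sorting an intermediate list. The sketch below assumes each candidate is a dict with `score`, `address`, and `location` keys, as accessed in the loop. `pick_best` is a hypothetical helper, and the sample candidates are made up for illustration.

```python
def pick_best(candidates):
    """Return (match_count, score, address, x, y) for the top-scoring candidate.

    With no candidates, the count is 0 and the remaining fields are None.
    """
    if not candidates:
        return 0, None, None, None, None
    # max() returns the first candidate with the highest score on ties.
    best = max(candidates, key=lambda c: c['score'])
    loc = best['location']
    return len(candidates), best['score'], best['address'], loc['x'], loc['y']

# Made-up candidates, shaped like the entries the loop reads.
sample = [
    {'score': 98.65, 'address': 'A', 'location': {'x': 1.0, 'y': 2.0}},
    {'score': 100.0, 'address': 'B', 'location': {'x': 3.0, 'y': 4.0}},
]
print(pick_best(sample))
print(pick_best([]))
```

Returning the count together with the match keeps the zero-candidate case explicit, so a row with no result records `0` matches rather than inheriting a value from an earlier row.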


The `isna()` filter is a quick way to find the rows of a DataFrame that have no match.

In [8]:
missing_idxs = df[df['match_address'].isna()].index
missing_idxs

Int64Index([35], dtype='int64')

In this case, one entry is missing: the row with index label 35. When using this notebook with your own dataset, you may encounter many more unmatched rows. To provide approximate locations for them, we can use the street-level locator instead. This finds more matches at the expense of accuracy.

We iterate over the missing entries, this time using the street range locator.

In [9]:
for idx in missing_idxs:
    # missing_idxs holds index labels, so select with .loc rather than .iloc
    row = df.loc[idx]
    request_data = get_request_e911(row, base_url=base_url_st)
    match_ct = len(request_data)

    #Get the best match.
    if match_ct > 0:
        multiples = []
        for m in request_data:
            multiples.append([row[ID], row[NAME], 
                                m['score'],
                                m['address'], 
                                m['location']['x'],
                                m['location']['y']])
        multiples.sort(key=lambda x: x[2], reverse=True)
        best_match = multiples[0]
        score,match_address,x,y = best_match[2:6]
    else:
        score,match_address,x,y = None,None,None,None

    df.loc[idx, 'matches'] = match_ct
    df.loc[idx, 'score'] = score
    df.loc[idx, 'match_address'] = match_address
    df.loc[idx, 'x'] = x
    df.loc[idx, 'y'] = y

# Show the rows that were filled in by the street-level pass
df.loc[missing_idxs]
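
The two-pass approach (address locator first, street range locator for anything still unmatched) can also be written as a single lookup that tries each locator in order. This is a sketch only: `geocode_with_fallback` is a hypothetical helper, and the lookup functions are passed in as arguments so the logic can be tried without network access. The stand-in lookups and the sample row are invented for illustration.

```python
def geocode_with_fallback(row, lookups):
    """Try each (label, lookup) pair in order; return the first non-empty result.

    `lookups` is a list of (label, callable) pairs, where each callable
    takes a row and returns a list of candidate dicts.
    Returns (label, candidates), or (None, []) if every lookup is empty.
    """
    for label, lookup in lookups:
        candidates = lookup(row)
        if candidates:
            return label, candidates
    return None, []

# Stand-in lookups for illustration; real ones would call the locators.
def address_lookup(row):
    return []  # pretend the address-level locator found nothing

def street_lookup(row):
    return [{'score': 90.0, 'address': row['street']}]

label, found = geocode_with_fallback(
    {'street': '1 EXAMPLE ST'},
    [('address', address_lookup), ('street', street_lookup)],
)
print(label, len(found))
```

In this notebook the two callables could be `get_request_e911` and `lambda r: get_request_e911(r, base_url=base_url_st)`. Recording the label makes it possible to tell approximate street-level matches apart from address-level ones later.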

In [11]:
assert df['match_address'].isna().sum() == 0

Now that every location has a matched address, we can save the DataFrame to a CSV file.

In [12]:
from datetime import date
# Write output records to CSV, tagging the file name with today's date
today = str(date.today())
outfile = addfile.split('.')[0] + '_MATCHED_' + today + '.csv'
df.to_csv(outfile, index=False)

Finally, we can load the CSV into geoprocessing software and verify that the locations are correct. Note that the RI geocoder returns projected x and y coordinates in EPSG:3438 (NAD83 / Rhode Island State Plane, US survey feet), not longitude and latitude.

![Image of geolocated coordinates](images/geoloc.png)
