# Geocode a list of placenames using TLC

Read in a CSV file containing placenames (countries, capitals, etc.) and query the [TLCMap API](https://tlcmap.org/help/guides/ghap-guide/#ws) to return the latitude and longitude of each one.

This code was derived from the [ATAP geolocation tools workshop](https://github.com/Australian-Text-Analytics-Platform/geolocation-tools-workshop) notebooks.

## Install and import required dependencies

In [1]:
!pip install pandas
!pip install ratelimit

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


In [2]:
import urllib.parse
import pandas as pd
import requests
import json
from ratelimit import limits, sleep_and_retry

Querying the TLC database requires constructing a URL for each name and handling the responses. The following code builds the query strings and performs the API requests. The TLC API limits how many requests can be made; to stay within those limits, you may need to adjust `TLC_query_delay` (the number of seconds between queries) below.

In [3]:
# Build a url to query the tlcmap/ghap API.
# - placename: the place we're trying to locate
# - search_type: what search type to use (accepts one of ['contains','fuzzy','exact'])
# ref: https://www.tlcmap.org/guides/ghap/#ws
def tlc_build_url(placename: str,
                  search_type: str,
                  search_public_data: bool = False,
                  TLCMap_limit: int = 1
                  ) -> str:

    safe_placename = urllib.parse.quote(placename.strip().lower())

    url = "https://tlcmap.org/ghap/search?"

    if search_type == "fuzzy":
        url += f"fuzzyname={safe_placename}"
    elif search_type == "exact":
        url += f"name={safe_placename}"
    elif search_type == "contains":
        url += f"containsname={safe_placename}"
    else:
        return None

    # Search Australian National Placenames Survey provided data
    url += "&searchausgaz=on"

    # Search public provided data, this data could be unreliable
    if search_public_data:
        url += "&searchpublicdatasets=on"
    else:
        url += "&searchpublicdatasets=off"

    # Retrieve data as JSON
    url += "&format=json"

    # Limit the number of results
    url += "&paging=" + str(TLCMap_limit)

    return url


# Send rate-limited requests that stay within 1 query per n seconds
# number of seconds between queries
TLC_query_delay = 5

@sleep_and_retry
@limits(calls=1, period=TLC_query_delay)
def tlc_call_api(url):
    r = requests.get(url)
    if r.url == "https://tlcmap.org/ghap/maxpaging":
        return None

    # If the reply says the placename wasn't found, substitute an empty FeatureCollection
    if r.content.decode() == "No search results to display.":
        # This should obviously just be an empty list of features, but TLCMap replies with plain text
        return json.loads('{"type": "FeatureCollection","metadata": {},"features": []}')

    # Any other unsuccessful reply: return None instead of failing on an unassigned variable
    if not r.ok:
        return None

    # SUCCESS! Return the spatial data provided in the reply
    return r.json()  # get [lon, lat] etc. for spatial matches


# Use TLCMap/GHAP API to check a placename
def tlc_query_name(placename: str,
                   search_type: str,
                   search_public_data: bool,
                   TLCMap_limit: int):

    url = tlc_build_url(placename, search_type, search_public_data, TLCMap_limit)
    if url:
        return tlc_call_api(url)

    return None
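
To sanity-check a query string before sending any requests, the sketch below rebuilds the exact-search URL with `urllib.parse.urlencode`. It is an alternative formulation for illustration only, not the notebook's own function: `sketch_exact_url` is a hypothetical name, and the parameter names are copied from `tlc_build_url` above.

```python
from urllib.parse import quote, urlencode


def sketch_exact_url(placename: str, limit: int = 1) -> str:
    """Hypothetical helper: mirror the 'exact' branch of tlc_build_url."""
    params = {
        "name": placename.strip().lower(),
        "searchausgaz": "on",
        "searchpublicdatasets": "off",
        "format": "json",
        "paging": limit,
    }
    # quote_via=quote encodes spaces as %20 rather than '+'
    return "https://tlcmap.org/ghap/search?" + urlencode(params, quote_via=quote)


print(sketch_exact_url("Alice Springs"))
```

For a name such as "Alice Springs" this should produce the same string as `tlc_build_url("Alice Springs", "exact")`, which makes it easy to paste the URL into a browser and inspect the reply by hand.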


## Data preparation

In [4]:
# placenames = ["Caloundra", "Brisbane"]

placenames_df = pd.read_csv('placenames.csv')
placenames = placenames_df['LocationName'].tolist()

print(placenames)

['Melbourne', 'Brisbane', 'Perth', 'Darwin', 'Alice Springs', 'Ballarat', 'Canberra', 'Hobart', 'Hobart Town', 'Adelaide', 'Sydney', 'New South Wales', 'Victoria', 'Queensland', 'Tasmania', 'Northern Territory']
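
The cell above expects `placenames.csv` to contain a `LocationName` column. A minimal sketch of that layout follows, using an in-memory file and the standard `csv` module so that it runs without the real file. The sample rows and the `load_placenames` helper are assumptions for illustration; the notebook itself reads the file with pandas.

```python
import csv
import io

# Assumed layout of placenames.csv: a header row with a LocationName column
sample_csv = "LocationName\nMelbourne\nAlice Springs\n  Hobart Town \n"


def load_placenames(handle) -> list:
    """Hypothetical helper: return non-empty, whitespace-trimmed names."""
    names = []
    for row in csv.DictReader(handle):
        name = (row.get("LocationName") or "").strip()
        if name:
            names.append(name)
    return names


print(load_placenames(io.StringIO(sample_csv)))
```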


## Query the TLC database

First, set some parameters which will affect the performance of the search, and how many results we want to handle.

In [8]:
# Which search type to use for each placename?
# Accepted values: 'exact', 'fuzzy', 'contains'
search_type = "exact"

# Flag whether to use data provided by the public
search_public_data = False

# How many (max) results do we want for each name?
TLCMap_limit = 1
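
An exact search may return nothing for a name that the database stores in a different form. One possible refinement, sketched below, tries the search types in a chosen order and stops at the first one that returns any features. `first_match` is a hypothetical helper, and `fake_query` is a stand-in for `tlc_query_name` so that the example runs offline; neither is part of the TLCMap API.

```python
from typing import Callable, Optional


def first_match(placename: str,
                query: Callable[[str, str], Optional[dict]],
                order=("exact", "fuzzy", "contains")) -> Optional[dict]:
    """Hypothetical helper: return the first response that contains features."""
    for search_type in order:
        response = query(placename, search_type)
        if response and response.get("features"):
            return response
    return None


# Stand-in for tlc_query_name: only a 'fuzzy' search finds anything
def fake_query(placename: str, search_type: str) -> Optional[dict]:
    features = [{"properties": {"name": placename}}] if search_type == "fuzzy" else []
    return {"type": "FeatureCollection", "features": features}


result = first_match("Hobart Town", fake_query)
print(result["features"][0]["properties"]["name"])
```

With the real function, `query` could be `lambda name, st: tlc_query_name(name, st, search_public_data, TLCMap_limit)`. Each extra search type costs one more rate-limited request per unmatched name.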

Iterate over the list of names and compile the results.

In [9]:
results = []

for name in placenames:

    response = tlc_query_name(name, search_type, search_public_data, TLCMap_limit)

    # Skip names where the query could not be built or the request failed
    if response is None:
        continue

    for feature in response["features"]:
        print(feature)
        # GeoJSON coordinates are ordered [longitude, latitude]
        feature_cleaned = [feature["properties"]["name"],
                           feature["geometry"]["coordinates"][1],
                           feature["geometry"]["coordinates"][0]]
        results.append(feature_cleaned)

results

{'type': 'Feature', 'geometry': {'type': 'Point', 'coordinates': [144.97224010000005, -37.8135365]}, 'properties': {'name': 'Melbourne', 'placename': 'Melbourne', 'id': 'ta7e9', 'datestart': '1945-01-01', 'dateend': '1974-01-01', 'latitude': '-37.8135365', 'longitude': '144.97224010000005', 'linkback': 'HARROP, MITCHELL; MAY, ANDREW (2022): Candidate places for the Gazetteer of Historical Australian Places. University of Melbourne. Dataset. https://doi.org/10.26188/20346825.v220', 'TLCMapLinkBack': 'https://ghap.tlcmap.org/search?id=ta7e9', 'TLCMapDataset': 'https://ghap.tlcmap.org/publicdatasets/222', 'BUSTYPE': 'Clubs - Social', 'ADDRESS': 'Melbourne, 36 Collins-St, Melb., C.1', 'ANZSIC_SUB': '453', 'ANZS_STTL': 'Clubs (Hospitality)', 'VICMAPADD': '36-50 COLLINS STREET MELBOURNE 3000', 'LGA': '', 'State': 'VIC'}}
{'type': 'Feature', 'geometry': {'type': 'Point', 'coordinates': [153.023502, -27.470933]}, 'properties': {'name': 'Brisbane', 'placename': 'Brisbane', 'description': "A ven

[['Melbourne', -37.8135365, 144.97224010000005],
 ['Brisbane', -27.470933, 153.023502],
 ['Perth', -31.951, 115.857],
 ['Darwin', -12.45803272, 130.8416748],
 ['Alice Springs', -23.8858377, 133.7420654],
 ['Ballarat', -37.55902863, 143.8587494],
 ['Canberra', -35.29972222, 149.1330556],
 ['Hobart', -42.88249709631, 147.32516600682],
 ['Adelaide', -34.925458, 138.599709],
 ['Sydney', -33.868901, 151.207089],
 ['New South Wales', -32.16291878689, 147.0321716843],
 ['Victoria', -36.62860680245, 144.27023474743],
 ['Queensland', -21.61718799541, 144.54686275043],
 ['Tasmania', -41.9132931693, 146.59839932432],
 ['Northern Territory', -19.18328046774, 133.99211729881]]
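
GeoJSON stores point coordinates as `[longitude, latitude]`, which is why the loop above reads index 1 for the latitude and index 0 for the longitude. A small self-contained sketch of that extraction, using a trimmed copy of the Melbourne feature from the output above (`feature_to_row` is a hypothetical name):

```python
def feature_to_row(feature: dict) -> list:
    """Hypothetical helper: return [name, latitude, longitude] for a point feature."""
    # GeoJSON order is longitude first, then latitude
    lon, lat = feature["geometry"]["coordinates"][:2]
    return [feature["properties"]["name"], lat, lon]


melbourne = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [144.97224010000005, -37.8135365]},
    "properties": {"name": "Melbourne"},
}

print(feature_to_row(melbourne))
```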

## Export the data

Format the data into a dataframe for export.

In [10]:
results_df = pd.DataFrame(results, columns=["Placename", "Latitude", "Longitude"])
results_df.to_csv('geocoded-data.csv', index=False)
print(results_df)


             Placename   Latitude   Longitude
0            Melbourne -37.813536  144.972240
1             Brisbane -27.470933  153.023502
2                Perth -31.951000  115.857000
3               Darwin -12.458033  130.841675
4        Alice Springs -23.885838  133.742065
5             Ballarat -37.559029  143.858749
6             Canberra -35.299722  149.133056
7               Hobart -42.882497  147.325166
8             Adelaide -34.925458  138.599709
9               Sydney -33.868901  151.207089
10     New South Wales -32.162919  147.032172
11            Victoria -36.628607  144.270235
12          Queensland -21.617188  144.546863
13            Tasmania -41.913293  146.598399
14  Northern Territory -19.183280  133.992117
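
The table has 15 rows although 16 names were queried: a name with no match contributes no features, so it is skipped without any message. A sketch of how the unmatched names could be listed, using the names shown in this notebook's output. `unmatched` is a hypothetical helper, and it assumes the returned `name` equals the queried name, which holds for the exact-search results above.

```python
def unmatched(queried, found):
    """Hypothetical helper: queried names that do not appear among the result names."""
    found_set = set(found)
    return [name for name in queried if name not in found_set]


queried = ['Melbourne', 'Brisbane', 'Perth', 'Darwin', 'Alice Springs', 'Ballarat',
           'Canberra', 'Hobart', 'Hobart Town', 'Adelaide', 'Sydney',
           'New South Wales', 'Victoria', 'Queensland', 'Tasmania',
           'Northern Territory']

found = ['Melbourne', 'Brisbane', 'Perth', 'Darwin', 'Alice Springs', 'Ballarat',
         'Canberra', 'Hobart', 'Adelaide', 'Sydney', 'New South Wales',
         'Victoria', 'Queensland', 'Tasmania', 'Northern Territory']

print(unmatched(queried, found))
```

In the notebook itself, the equivalent call would be `unmatched(placenames, results_df["Placename"])`.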
