# Toponym Resolution with T-Res

This notebook demonstrates the use of the T-Res HTTP API for performing toponym resolution.

Toponym resolution refers to the task of identifying place names (toponyms) in a piece of text and linking each of them to a known physical location.

This process involves three distinct steps:
 1. **named entity recognition** to identify which spans of text are toponyms
 1. **candidate selection** to generate a list of candidate places from a knowledge base
 1. **entity linking** to determine which candidate place is the best match for the given toponym
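
To make the three steps concrete, here is a toy sketch using only the Python standard library. It is not how T-Res works internally: the capitalisation rule, the similarity threshold, the function names and the tiny knowledge base (with made-up identifiers) are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical miniature knowledge base: place name -> made-up identifier.
# These identifiers are placeholders, not real Wikidata IDs.
KNOWLEDGE_BASE = {"Sheffield": "ID-1", "Leeds": "ID-2", "London": "ID-3"}


def recognise(text):
    """Step 1 (toy NER): treat capitalised words after the first as mentions."""
    words = [w.strip(".,;:") for w in text.split()]
    return [w for w in words[1:] if w[:1].isupper()]


def select_candidates(mention, threshold=0.8):
    """Step 2: keep knowledge-base names that are similar enough to the mention."""
    scores = {
        name: SequenceMatcher(None, mention.lower(), name.lower()).ratio()
        for name in KNOWLEDGE_BASE
    }
    return {name: score for name, score in scores.items() if score >= threshold}


def link(candidates):
    """Step 3: pick the best-scoring candidate, or None if there is none."""
    if not candidates:
        return None
    best = max(candidates, key=candidates.get)
    return KNOWLEDGE_BASE[best]


def resolve(text):
    """Run the three steps and map each mention to its linked identifier."""
    return {mention: link(select_candidates(mention)) for mention in recognise(text)}


print(resolve("The trade at Shefiield and Leeds."))
```

Because candidate selection here is fuzzy, the misspelt mention "Shefiield" is still matched to the "Sheffield" entry in the toy knowledge base.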

[T-Res](https://github.com/Living-with-machines/T-Res) is a software tool that provides an end-to-end pipeline for toponym resolution, using [Wikidata](https://www.wikidata.org/wiki/Wikidata:Main_Page) as its knowledge base.

The T-Res HTTP API enables users to make toponym resolution queries to a remote server via an HTTP connection.

To run the examples in this notebook, a server must be available to handle the API requests. During the workshop, such a server will be provided at the host IP address given below.

Technical documentation on T-Res can be found [here](https://living-with-machines.github.io/T-Res/index.html). Developers may also find the [API docs](http://20.0.184.45:8000/v2/t-res_deezy_reldisamb-wpubl-wmtops/docs) useful.

## Setup

Let's begin by importing some Python libraries:

In [196]:
import requests
import operator
from typing import Optional
from dataclasses import dataclass
from dacite import from_dict

Next, we specify the hostname and URL for connecting to the server running the T-Res API:

In [197]:
HOST = "20.0.184.45"
API_URL = f"http://{HOST}:8000/v2/t-res_deezy_reldisamb-wpubl-wmtops"

The following helper functions will make it easy to call the T-Res API and handle the response:

In [198]:
@dataclass
class Toponym:
    mention: str
    sentence: str
    pos: int
    end_pos: int
    tag: str
    prediction: str
    cross_cand_score: dict
    latlon: Optional[list]
    wkdt_class: Optional[str]
    string_match_score: dict
    
    def __str__(self):
        toponym = self.toponym()
        s = f"Toponym:\t{toponym}"
        if self.mention != toponym:
            s += f"\nMention:\t{self.mention}"
        if self.tag != 'LOC':
            s += f"\nTag:\t\t{self.tag}"
        s += f"\nWikidata ID:\t{self.prediction}"
        s += f"\nCoordinates:\t{self.latlon}"
        if self.prediction in self.cross_cand_score.keys():
            s += f"\nLinking score:\t{self.cross_cand_score[self.prediction]}"
        return s

    def __repr__(self):
        return self.__str__()
    
    def toponym(self):
        if not self.string_match_score:
            return None
        # Identify the best string match.
        d = {i[0]: i[1][0] for i in self.string_match_score.items()}
        return max(d.items(), key=operator.itemgetter(1))[0]

class Toponyms:
    toponyms: list

    def __init__(self, data):
        if not isinstance(data, list):
            raise ValueError("Toponyms data must be a list.")
        self.toponyms = [from_dict(data_class=Toponym, data=t) for t in data]

    def __str__(self):
        if not self.toponyms:
            return "Empty list of toponyms."
        return '\n\n'.join([t.__str__() for t in self.toponyms])

    def __repr__(self):
        return self.__str__()

def validate_query(query):
    if "text" not in query:
        raise ValueError("T-Res API query must contain an item named `text`")

def call_api(query, parse=True):
    validate_query(query)
    response = requests.get(f'{API_URL}/toponym_resolution', json=query)
    if not parse:
        return response
    return parse_api_response(response)

def parse_api_response(response):
    if response.status_code != 200:
        # Report the failure and stop: an error body is not a list of toponyms.
        print(f"HTTP error code: {response.status_code}")
        print(f"Reason: {response.reason}")
        return None
    result = Toponyms(response.json())
    # A single toponym is returned directly rather than wrapped in a collection.
    if len(result.toponyms) == 1:
        return result.toponyms[0]
    return result

## Toponym resolution examples

We're now ready to query the API by sending chunks of input text.

### Simple toponym resolution example
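
A minimal query needs only the `text` field, which is the one item `validate_query` requires. The sentence below is an assumed example, chosen to match the later queries but with the place name spelled correctly.

```python
# Assumed input sentence: the same text as the later examples, with the
# place name spelled correctly.
query = {"text": "A remarkable case of rattening has just occurred in the building trade at Sheffield."}

# `call_api` is the helper defined in the Setup section. The guard lets this
# cell run (and do nothing) when that helper has not been defined.
result = call_api(query) if "call_api" in globals() else None
result
```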

### Example with place of publication information

In [None]:
query = {
    "text": "A remarkable case of rattening has just occurred in the building trade at Newtown.",
    "place": "Powys",
    "place_wqid": "Q156150"
}
call_api(query)
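
The query above carries two optional fields alongside `text`: `place` (the place of publication) and `place_wqid` (its Wikidata ID). When building many queries, a small helper can keep the dictionaries consistent. `build_query` below is a hypothetical convenience function, not part of T-Res; only the three field names come from the example above.

```python
def build_query(text, place=None, place_wqid=None):
    """Assemble a query dict, adding the optional fields only when given.

    Hypothetical helper: the field names mirror the example query in this
    notebook, but the function itself is not part of the T-Res API.
    """
    if not text:
        raise ValueError("`text` must be a non-empty string")
    query = {"text": text}
    if place is not None:
        query["place"] = place
    if place_wqid is not None:
        query["place_wqid"] = place_wqid
    return query


query = build_query(
    "A remarkable case of rattening has just occurred in the building trade at Newtown.",
    place="Powys",
    place_wqid="Q156150",
)
print(query)
```

The resulting dictionary can be passed to `call_api` in the same way as a hand-written one.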

### Example of successful linking despite OCR error

In [None]:
query = {"text": "A remarkable case of rattening has just occurred in the building trade at Shefiield."}
call_api(query)

### Example with multiple toponyms

In [None]:
query = {"text": "A remarkable case of rattening has just occurred in the building trade at Shefiield, but also in Leeds. Not in London though."}
call_api(query)
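
Note that `parse_api_response` returns a bare `Toponym` when exactly one toponym is found and a `Toponyms` collection otherwise, so code that loops over results has to cope with both shapes. The sketch below shows one way to normalise them. `as_list` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins for real API results so that the example runs without a server.

```python
from types import SimpleNamespace


def as_list(result):
    """Return the toponyms in `result` as a plain list.

    Accepts a collection exposing a `toponyms` attribute, a single
    toponym-like object, or None (treated as no results).
    """
    if result is None:
        return []
    return list(getattr(result, "toponyms", [result]))


# Stand-ins for API results (illustrative only, not real T-Res output).
single = SimpleNamespace(mention="Leeds")
several = SimpleNamespace(toponyms=[SimpleNamespace(mention="Leeds"),
                                    SimpleNamespace(mention="London")])

for result in (single, several):
    print([t.mention for t in as_list(result)])
```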

### Example of a toponym referring to a particular building

Toponyms in T-Res are labelled with a "Tag" property, referring to the type of the named entity.

The most common tag is "LOC" (for location) but if the best match is a specific building, as in the following example, this will be reflected in the "Tag":

In [None]:
query = {"text": "A large crowd gathered, and plenty of volunteers aided in the work of rescue, whilst ambulances and stretchers were fetched from the Middlesex Hospital."}
call_api(query)
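
Since every toponym carries a `tag`, results can be grouped or filtered by entity type. The following is a minimal sketch under stated assumptions: `group_by_tag` is a hypothetical helper, the objects are stand-ins for `Toponym` instances, and the tag value `"BUILDING"` is assumed for illustration (only `"LOC"` is named in this notebook).

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_tag(toponyms):
    """Map each tag to the list of mentions carrying that tag."""
    groups = defaultdict(list)
    for toponym in toponyms:
        groups[toponym.tag].append(toponym.mention)
    return dict(groups)


# Stand-ins for Toponym objects; the "BUILDING" tag value is an assumption.
sample = [
    SimpleNamespace(mention="Leeds", tag="LOC"),
    SimpleNamespace(mention="Middlesex Hospital", tag="BUILDING"),
    SimpleNamespace(mention="London", tag="LOC"),
]

print(group_by_tag(sample))
```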

### Example of place linking error

In [None]:
query = {
    "text": "Early on Monday morning a fire was discovered ois the premises of Mr. .Toseph Boyle, woollen manufacturer, Prospect Mill, at Longwood, near Huddersfield, and before the flames could be extinguished they had done damage to the extent of about £16,000 or £17,000.",
    "place": "Stourbridge, West Midlands, England",
    "place_wqid": "Q661707"
}
call_api(query)
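
When a toponym is linked to the wrong place, it can help to look at how the competing candidates were scored. The sketch below assumes that `cross_cand_score` maps candidate Wikidata IDs to numeric linking scores, as the `__str__` method of the `Toponym` helper suggests. `rank_candidates` is a hypothetical helper, and the identifiers and scores are made up.

```python
def rank_candidates(cross_cand_score, top_n=3):
    """Return the `top_n` (candidate ID, score) pairs, highest score first."""
    ranked = sorted(cross_cand_score.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_n]


# Made-up identifiers and scores, standing in for a real `cross_cand_score`.
scores = {"ID-A": 0.12, "ID-B": 0.61, "ID-C": 0.27}

for candidate_id, score in rank_candidates(scores):
    print(f"{candidate_id}\t{score}")
```

If the correct place appears lower down the ranking, the error lies in the linking step; if it is absent altogether, candidate selection did not retrieve it.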