# Using an existing DeezyMatch model (option 3)

This notebook shows how to use an existing DeezyMatch model.

To do so, the `resources/` folder should contain **at least** the following files and folders, in the following locations:
```
toponym-resolution/
   ├── ...
   ├── resources/
   │   ├── deezymatch/
   │   │   └── models/
   │   │       └── w2v_ocr/
   │   │           ├── input_dfm.yaml
   │   │           ├── w2v_ocr.model
   │   │           ├── w2v_ocr.model_state_dict
   │   │           └── w2v_ocr.vocab
   │   ├── models/
   │   ├── news_datasets/
   │   ├── wikidata/
   │   │   └── mentions_to_wikidata.json
   │   └── wikipedia/
   └── ...
```
**Note** that the candidate vectors must be generated from the `mentions_to_wikidata.json` file the first time this notebook is run, which may take several minutes.
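Before running the rest of the notebook, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the `geoparser` package: the helper name `missing_resources` and the `REQUIRED_PATHS` list are hypothetical, and the paths are simply copied from the directory tree above.

```python
from pathlib import Path

# Files listed in the directory tree above, relative to the resources/ folder.
REQUIRED_PATHS = [
    "deezymatch/models/w2v_ocr/input_dfm.yaml",
    "deezymatch/models/w2v_ocr/w2v_ocr.model",
    "deezymatch/models/w2v_ocr/w2v_ocr.model_state_dict",
    "deezymatch/models/w2v_ocr/w2v_ocr.vocab",
    "wikidata/mentions_to_wikidata.json",
]


def missing_resources(resources_dir):
    """Return the required relative paths that do not exist under resources_dir."""
    root = Path(resources_dir)
    return [rel for rel in REQUIRED_PATHS if not (root / rel).exists()]


# Example usage from the notebook's folder (hypothetical location):
# print(missing_resources("../resources"))
```

An empty list means every file named in the tree was found. Any path that is returned needs to be put in place before the ranker is created.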

We start by importing some libraries and the `ranking` module from the `geoparser` folder:

In [None]:
import os
import sys
from pathlib import Path

sys.path.insert(0, os.path.abspath(os.path.pardir))
from geoparser import ranking

Create a `myranker` object of the `Ranker` class.

In [None]:
myranker = ranking.Ranker(
    method="deezymatch", # Here we're telling the ranker to use DeezyMatch.
    resources_path="../resources/wikidata/", # Here, the path to the Wikidata resources.
    mentions_to_wikidata=dict(), # The mentions-to-wikidata mapper is loaded later; leave it empty here.
    wikidata_to_mentions=dict(), # The wikidata-to-mentions mapper is loaded later; leave it empty here.
    # Parameters to create the string pair dataset:
    strvar_parameters={
        "overwrite_dataset": False,
    },
    # Parameters to train, load and use a DeezyMatch model:
    deezy_parameters={
        # Paths and filenames of DeezyMatch models and data:
        "dm_path": str(Path("../resources/deezymatch/").resolve()), # Path to the DeezyMatch directory where the model is saved.
        "dm_cands": "wkdtalts", # Name we'll give to the folder that will contain the wikidata candidate vectors.
        "dm_model": "w2v_ocr", # Name of the DeezyMatch model.
        "dm_output": "deezymatch_on_the_fly", # Name of the file where the output of DeezyMatch will be stored. Feel free to change that.
        # Ranking measures:
        "ranking_metric": "faiss", # Metric used by DeezyMatch to rank the candidates.
        "selection_threshold": 25, # Threshold for that metric.
        "num_candidates": 3, # Number of name variations for a string (e.g. "London", "Londra", and "Londres" are three different variations in our gazetteer of "Londcn").
        "search_size": 3, # That should be the same as `num_candidates`.
        "verbose": False, # Whether to see the DeezyMatch progress or not.
        # DeezyMatch training:
        "overwrite_training": False, # You can choose to overwrite the model if it exists: in this case we're loading an existing model, so that should be False.
        "do_test": False, # Whether the DeezyMatch model we're loading was a test, or not.
    },
)
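
To give an intuition for how `ranking_metric`, `selection_threshold` and `num_candidates` interact, here is a toy sketch. It assumes that the `faiss` metric corresponds to an L2 distance between vectors and that a candidate is kept when its distance is at or below the threshold. The function `select_candidates` and the example vectors are hypothetical; this is not DeezyMatch's own code, which works on learned string embeddings and a FAISS index.

```python
import math


def select_candidates(query_vec, candidate_vecs, selection_threshold, num_candidates):
    """Toy selection: keep candidates within the distance threshold, closest first."""
    scored = []
    for name, vec in candidate_vecs.items():
        distance = math.dist(query_vec, vec)  # Euclidean (L2) distance
        if distance <= selection_threshold:
            scored.append((name, distance))
    # Smaller distance means a closer match, so sort in ascending order.
    scored.sort(key=lambda pair: pair[1])
    return scored[:num_candidates]


# Made-up 2-d vectors, for illustration only.
toy_vectors = {"a": (3.0, 4.0), "b": (0.0, 1.0), "c": (30.0, 40.0), "d": (0.0, 2.0)}
top = select_candidates((0.0, 0.0), toy_vectors, selection_threshold=25, num_candidates=3)
```

In this toy setup, a higher `selection_threshold` admits more distant (less similar) candidates, while `num_candidates` caps how many are returned.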

Load the resources (i.e. the `mentions-to-wikidata` and `wikidata-to-mentions` mappers) that will be used by the ranker:

In [None]:
# Load the resources:
myranker.mentions_to_wikidata = myranker.load_resources()

Given the DeezyMatch model that has been loaded, find candidates on Wikidata:

In [None]:
# Find candidates given a toponym:
toponym = "Ashton-cnderLyne"
print(myranker.find_candidates([{"mention": toponym}])[0][toponym])

In [None]:
# Find candidates given a toponym:
toponym = "Sheftield"
print(myranker.find_candidates([{"mention": toponym}])[0][toponym])
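
The two cells above repeat the same lookup for different toponyms. As a convenience, the lookup could be wrapped in a small helper. This is a sketch that relies only on the call shape already used above (`find_candidates([{"mention": ...}])[0][toponym]`); the helper name `candidates_for` is hypothetical and not part of `geoparser`.

```python
def candidates_for(ranker, toponyms):
    """Map each toponym to the candidates the ranker returns for it."""
    results = {}
    for toponym in toponyms:
        # Same call shape as in the cells above: one mention per query,
        # and the first element of the returned value is indexed by toponym.
        found = ranker.find_candidates([{"mention": toponym}])[0]
        results[toponym] = found[toponym]
    return results


# Example usage with the ranker created earlier in this notebook:
# for toponym, cands in candidates_for(myranker, ["Ashton-cnderLyne", "Sheftield"]).items():
#     print(toponym, cands)
```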