Compare API

simonoakesepimorphics edited this page Aug 1, 2019 · 2 revisions

Overview

When creating or updating an item in the registry, it may be necessary to check whether an item with similar characteristics already exists in the same register, or elsewhere in the registry. For example, two items cannot share the same item URI in the registry (that is, the URI which locates the item in the registry rather than the entity URI, if they are different). Similarly, two items sharing a label might indicate that they are duplicates, but only the user can know for sure.

In order to make this type of check easy to perform, the registry provides a "compare" API, which accepts the same RDF payloads (TTL, XML, etc.) as the "register" API, and returns information about any similarities between the submitted items and the current contents of the registry. This functionality is integrated into the UI on each of the pages where you would normally be able to create or update registry items.

A similarity between new and existing items may be detected in the following ways.

  • If they have the same item URI, they are in conflict and the new registration would fail.
  • If they have the same value for one or more label properties (case insensitive) and they are both registers or both register items, then the new item may be a duplicate.
  • If they have a similar value for one or more label properties and they are both registers or both register items, then the new item may be a duplicate.

Note

Existing items which have an "Invalid" status are considered to be no longer part of the registry, and as such are excluded from the comparison results.

Note

You must be logged in as a registry user to access this API. No other permissions are required.
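The detection rules above, together with the exclusion of "Invalid" items, can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the Item class, its field names and the classify function are hypothetical, each item is reduced to a single label, and difflib's ratio is a stand-in for the fuzzy text-index search that the registry actually performs, so the scores will not match the registry's.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class Item:
    """Hypothetical, simplified view of a register item (not the registry's model)."""
    item_uri: str
    label: str
    is_register: bool = False
    status: str = "Stable"  # illustrative default


def classify(new: Item, existing: Item, threshold: float = 0.7) -> Optional[str]:
    """Return 'conflict', 'exact', 'close' or None for a new/existing pair."""
    # Invalid items are treated as no longer part of the registry.
    if existing.status.lower() == "invalid":
        return None
    # Same item URI: the new registration would fail.
    if new.item_uri == existing.item_uri:
        return "conflict"
    # Label rules apply only when both are registers or both are register items.
    if new.is_register != existing.is_register:
        return None
    a, b = new.label.casefold(), existing.label.casefold()
    # Same label, ignoring case: possible duplicate.
    if a == b:
        return "exact"
    # Assumption: difflib similarity stands in for the fuzzy index search.
    if SequenceMatcher(None, a, b).ratio() >= threshold:
        return "close"
    return None
```

For instance, classify(Item("/colour/grn", "green"), Item("/colour/green", "Green")) yields "exact" under these assumptions, because the labels differ only in case.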

Configuration

The label properties whose values are compared by the API are determined by the text-indexed properties that are configured on the store. You can configure these by setting the basestore.textIndex property in your app.conf file. These properties also determine the behaviour of the search API.

The similarity of labels is determined by performing a "fuzzy" search on the text index (see the Lucene documentation for more information). You can configure the precision of this search by setting the config.similarityParam property in your app.conf file.

Example

# The underlying RDF store
basestore            = com.epimorphics.registry.store.impl.TDBStore
basestore.textIndex  = rdfs:label,dct:title,foaf:name,skos:prefLabel,skos:altLabel,rdfs:comment
basestore.index      = /var/opt/ldregistry/index

# The Registry store API wrapper, which uses the base RDF store and indexer
storeapi             = com.epimorphics.registry.store.StoreBaseImpl
storeapi.store       = $basestore

# Additional configuration parameters, typically to control UI behaviour
config                 = com.epimorphics.appbase.core.GenericConfig
config.similarityParam = 0.7

# The Registry configuration itself
registry                  = com.epimorphics.registry.core.Registry
registry.store            = $storeapi
registry.configExtensions = $config

API

You can access the API by sending a POST request to the register to which you intend to register the prospective new items, with the query parameter compare.

Request

The body of the request should contain the details of the new items in RDF format. This can be in simple form (containing only the entities to be added) or with registry metadata. Only the characteristics of the entities will be compared, not the metadata.

The compare API should accept any request body that you could send to the registration API, and vice versa.

Optionally, you can use the query parameter compare-edit in addition to compare to signal that the body contains updates to existing items. As a result, clashing URIs will be ignored, since they denote updates to existing entries.

Note

Although the request targets a specific register, the payload will be compared to the entire registry.
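As a rough illustration of how the target URL is formed, the Python helper below appends the compare flag, and optionally compare-edit, to a register URL. The function name compare_url is hypothetical, and joining the two valueless flags with "&" is an assumption based on the description above rather than a documented format.

```python
from urllib.parse import urlsplit, urlunsplit


def compare_url(register_url: str, edit: bool = False) -> str:
    """Append the compare (and optionally compare-edit) query flags to a register URL."""
    parts = urlsplit(register_url)
    flags = ["compare"]
    if edit:
        # Signals that the body contains updates to existing items.
        flags.append("compare-edit")
    # Keep any query string that was already present on the URL.
    existing = [parts.query] if parts.query else []
    query = "&".join(existing + flags)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))
```

Calling compare_url("http://localhost/registry/colour", edit=True) would produce a URL ending in ?compare&compare-edit.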

Response

The response will be in an RDF format determined by the Accept header of the request. You can also request text/html to get the results panel that would normally be rendered in the UI.

The root of the response will be a node with the type reg:CompareResult. This is a marker type which has no particular meaning outside of this API. The root has rdfs:member relationships with the register items corresponding to the originally submitted items. Even if the items were given in entity form (without metadata), they will be presented with metadata in the response.

The register items in the response may have skos:exactMatch or skos:closeMatch relationships with the existing register items that they resemble (if any).

  • skos:exactMatch denotes that the register items have the same URI, or that their entities have the same label.
  • skos:closeMatch denotes that their entities have similar labels.

The details of the matching register items and entities will be included in the response in the usual form.

Example

For example, to compare a prospective new item for the "colour" register, you could send a POST request to:

localhost/registry/colour?compare

With the headers:

Accept: text/turtle
Content-Type: text/turtle
Authorization: # your authorisation here #

With the body:

@prefix skos:  <http://www.w3.org/2004/02/skos/core#> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix reg:   <http://purl.org/linked-data/registry#> .

<http://example.org/registry/colour/grn>
        a                skos:Concept ;
        rdfs:label       "green" ;
        reg:notation     "grn" .

And receive a response containing results of this form.
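The request above could be assembled in Python along the following lines, using only the standard library. This is a sketch, not a reference client: the http:// scheme, the function name build_compare_request and the authorization argument are assumptions, and the value of the Authorization header depends on how your registry handles login.

```python
import urllib.request

# The Turtle payload from the example above.
TURTLE_BODY = """\
@prefix skos:  <http://www.w3.org/2004/02/skos/core#> .
@prefix rdfs:  <http://www.w3.org/2000/01/rdf-schema#> .
@prefix reg:   <http://purl.org/linked-data/registry#> .

<http://example.org/registry/colour/grn>
        a                skos:Concept ;
        rdfs:label       "green" ;
        reg:notation     "grn" .
"""


def build_compare_request(url: str, body: str, authorization: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the compare API."""
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Accept": "text/turtle",         # format of the comparison results
            "Content-Type": "text/turtle",   # format of the submitted payload
            "Authorization": authorization,  # placeholder: supply your own credentials
        },
    )


request = build_compare_request(
    "http://localhost/registry/colour?compare",  # assumed scheme and host
    TURTLE_BODY,
    "your authorisation here",
)
# To send it against a running registry:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Building the request separately from sending it makes it easy to inspect the method, headers and body before contacting the registry.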

User Interface

In order to access the API from the UI, you will need to have the latest changes from the registry-config-base project in your /opt/ldregistry directory.

The "Create Register", "Manual Entry" and "Register New or Changed Entries" action pages have a button labelled "Find Similar" next to the usual confirmation button. Clicking it submits the contents of the form to the compare API. The response is rendered as an alert panel at the bottom of the page, with exact and close matches displayed in separate tables. Each table has the following columns:

Column      Meaning
New         The URI of an item that was submitted by the user. Only displayed when uploading multiple items.
Suggested   An item currently in the registry which has similar characteristics to the new item.
Register    The register where the suggested item was located. / denotes the root register.
Types       The RDF types of the suggested item.
Status      The status of the suggested item.

On the "Register New or Changed Entries" page, you can choose which type of registration to perform. When submitting new entries (including those submitted in "batch" mode), clicking on the "Find Similar" button will have the same effect as on the single entry pages. If the option to add new and update existing entries is chosen, then clashing item URIs will be ignored, since they denote updates to existing entries. Similarly, the labels of entries will not be checked for similarity with their current state in the register.
