Reconcilable Data Sources
Listing of Reconcilable Data Sources
With OpenRefine you can perform reconciliation against any web service that supports the Reconciliation Service API. Reconciliation against Freebase is built in, but several other reconciliation services are available, as described on this page. Alternatively, you can extend your data by calling web services.
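As a rough illustration of what a client exchanges with such a service, the sketch below builds a batch of queries and picks the best candidate from a reply. It assumes the commonly used shape of the Reconciliation Service API (a form-encoded `queries` parameter holding JSON keyed `q0`, `q1`, ..., and a reply with a `result` list per key). The function names, the `ex:` identifiers, and the sample reply are hypothetical, and no network call is made.

```python
import json
from urllib.parse import urlencode


def build_queries(names, limit=3, type_id=None):
    """Build a batch of reconciliation queries keyed q0, q1, ..."""
    queries = {}
    for i, name in enumerate(names):
        query = {"query": name, "limit": limit}
        if type_id is not None:
            query["type"] = type_id  # optional type restriction
        queries[f"q{i}"] = query
    return queries


def encode_request(queries):
    """Form-encode the batch as the `queries` parameter of a POST body."""
    return urlencode({"queries": json.dumps(queries)})


def top_candidates(response):
    """Return the highest-scoring candidate per query key, or None if empty."""
    best = {}
    for key, payload in response.items():
        results = payload.get("result", [])
        best[key] = max(results, key=lambda r: r["score"]) if results else None
    return best


# Hypothetical reply, shaped like a reconciliation service response.
sample_response = {
    "q0": {"result": [
        {"id": "ex:1", "name": "Cornell University", "score": 92.0, "match": True},
        {"id": "ex:2", "name": "Cornell College", "score": 61.5, "match": False},
    ]},
    "q1": {"result": []},
}

queries = build_queries(["Cornell Univ.", "Unknown Org"])
body = encode_request(queries)
best = top_candidates(sample_response)
```

In practice OpenRefine sends and interprets these requests itself; the sketch is only meant to show the kind of data a service has to accept and return.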
Reconcile-csv is a reconciliation service for OpenRefine that runs from a CSV file. It uses fuzzy matching to match entries in one dataset to entries in another, which helps introduce unique IDs that can then be used to join your data painlessly.
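The idea of fuzzy matching names to a reference list in order to assign IDs can be sketched as follows. This is not Reconcile-csv's own algorithm: it substitutes Python's standard `difflib.SequenceMatcher` as a stand-in similarity measure, and the reference data, the IDs, and the 0.8 threshold are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity ratio between two strings, in [0, 1]."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def reconcile(name, reference, threshold=0.8):
    """Return (id, score) of the closest reference entry.

    The id is None when even the best score falls below the threshold.
    """
    best_id, best_score = None, 0.0
    for ref_id, ref_name in reference.items():
        score = similarity(name, ref_name)
        if score > best_score:
            best_id, best_score = ref_id, score
    if best_score < threshold:
        return None, best_score
    return best_id, best_score


# Hypothetical reference dataset: unique ID -> name.
reference = {"C001": "Acme Widgets", "C002": "Globex Corporation"}

# Messy names from a second dataset, including a typo and an unknown entry.
messy = ["acme widgets ", "Globex Corporaton", "Initech"]
ids = {name: reconcile(name, reference)[0] for name in messy}
```

Once every row carries an ID from the reference dataset, the two datasets can be joined on that ID rather than on the inconsistent names.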
The RDF Extension by DERI at NUI Galway includes reconciliation against any SPARQL endpoint or RDF dump file and publishing of the results in RDF. http://lab.linkeddata.deri.ie/2010/grefine-rdf-extension/
VIVO Scientific Collaboration Platform
VIVO is a U.S. national interdisciplinary open source scientific collaboration platform funded by the NIH, with development led by Cornell. Their reconciliation service allows reconciling against VIVO entities (faculty members, journals, etc.) in any VIVO installation. For details, see Extending Google Refine for VIVO.
The FundRef Reconciliation Service is designed to help publishers (or anybody) more easily clean up their funder data and map it to the FundRef Registry. It is built on Open Refine and FundRef Metadata Search.
Use JournalTOCs API to create your own cool web applications that integrate content from freely available journal TOCs. Most of JournalTOCs API calls are free and don't require any registration process.
The VIAF® (Virtual International Authority File) combines multiple name authority files into a single OCLC-hosted name authority service. The goal of the service is to lower the cost and increase the utility of library authority files by matching and linking widely-used authority files and making that information available on the Web.
The reconciliation service is hosted by Roderic D. M. Page. More details are available in Reconciling author names using Open Refine and VIAF.
FAST (Faceted Application of Subject Terminology)
FAST is derived from the Library of Congress Subject Headings (LCSH), one of the library domain’s most widely-used subject terminology schemas. The development of FAST has been a collaboration of OCLC Research and the Library of Congress. Work on FAST began in late 1998.
Library of Congress Subject Headings
Library of Congress Subject Headings (LCSH) has been actively maintained since 1898 to catalog materials held at the Library of Congress. By virtue of cooperative cataloging other libraries around the United States also use LCSH to provide subject access to their collections.
The reconciliation service is hosted by the Free Your Metadata group. More details are available on their website.
Sharedshelf Built Work Registry Reconciliation Service
Sharedshelf offers reconciliation services that let its users enrich data using the Sharedshelf Built Work Registry (BWR) project. The Built Work Registry is accessible via http://builtworksregistry.org/.
| Name | Sharedshelf BWR Display Endpoint |
| Label properties | rdfs:label && http://catalog.sharedshelf.artstor.org/display/prefLabel |
31 million corporate entities (as of Nov. 2011) are available for reconciliation through their service.
The DBpedia extension currently lets you extend your DBpedia-reconciled data with additional related columns from DBpedia. It is based on a similar Freebase extension.
The Reconciliation API is a simple web service that supports linking of datasets to the Ordnance Survey Linked Data. The API accepts a simple text search, e.g. a label, code or other identifier for a resource and then returns a ranked list of potential matches. Client-side tools may then use these results to either build links to the Linked Data or use the returned identifiers to extract further data to enrich the original dataset.
Organized Crime and Corruption Reporting Project
OCCRP provides a public reconciliation API endpoint which allows reconciliation of data against a comprehensive list of sanctioned persons and companies, politically exposed persons, and other persons of journalistic interest. The service is intended as a first-level "check for interesting entries" for government or private data.
The Freebase Reconciliation Service has been deprecated as of June 2015. There are no guarantees of its future after that date according to Google.
Nomenklatura is a simple service that makes it easy to maintain a canonical list of entities such as persons, companies or even streets, and to match messy input, such as their names, against that canonical list. For example, it can match Acme Widgets, Acme Widgets Inc and Acme Widgets Incorporated to the canonical “Acme Widgets”.
With Nomenklatura it is a matter of minutes to set up your own set of master data to match against. It provides a simple user interface and an API which you can then use to do matching (the API is compatible with Open Refine’s reconciliation function).
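Matching name variants to one canonical entry can be illustrated with a small normalization-based lookup. This is a minimal sketch and not how Nomenklatura is implemented: the `CanonicalList` class, the `normalize` helper, and the list of legal suffixes are all hypothetical choices made for the example.

```python
import re

# Assumed set of trailing legal-form tokens to ignore when comparing names.
LEGAL_SUFFIXES = {"inc", "incorporated", "ltd", "limited", "llc", "corp", "corporation"}


def normalize(name):
    """Lowercase, drop punctuation, and strip trailing legal-form tokens."""
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


class CanonicalList:
    """Maps messy input names to entries in a canonical list of entities."""

    def __init__(self, canonical_names):
        # Index canonical names by their normalized form.
        self._index = {normalize(n): n for n in canonical_names}

    def match(self, raw):
        """Return the canonical name for raw input, or None if unknown."""
        return self._index.get(normalize(raw))


entities = CanonicalList(["Acme Widgets"])
variants = ["Acme Widgets", "Acme Widgets Inc", "Acme Widgets Incorporated"]
matched = [entities.match(v) for v in variants]
```

Exact lookup on a normalized key only catches predictable variation such as casing, punctuation and legal suffixes; misspellings would need a similarity measure on top.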
The Kasabi reconciliation services provide reconciliation against any database published on the Kasabi platform. Documentation
Talis Platform reconciliation services
This project has been shut down. They suggest some alternatives.
The following are data sources that could provide useful reconciling within OpenRefine. If you would like to help with coding a reconciling extension for any, please contact our mailing list. We would love to see some of these happen!