Reconcilable Data Sources

Thad Guidry edited this page Apr 21, 2018 · 30 revisions

Listing of Reconcilable Data Sources

With OpenRefine you can perform reconciliation against any web service supporting the Reconciliation Service API.

Reconciliation against Wikidata is built in !

You can alternatively extend your data by calling web services.

General

Wikidata

See our dedicated Wiki page: Reconciliation with Wikidata

Reconcile-csv

Reconcile-csv is a reconciliation service for OpenRefine running from a CSV file. It uses fuzzy matching to match entries in one dataset to entries in another dataset, helping to introduce unique IDs into the system - so they can be used to join your data painlessly.

SPARQL endpoints

The RDF Extension by DERI at NUI Galway includes reconciliation against any SPARQL endpoint or RDF dump file and publishing of the results in RDF. http://lab.linkeddata.deri.ie/2010/grefine-rdf-extension/

conciliator

conciliator is a Java framework for creating OpenRefine reconciliation services. It currently offers out of the box support for VIAF, ORCID, Open Library, and any Apache Solr collection. Run your own service, or use the public server at http://refine.codefork.com

Librarians

VIVO Scientific Collaboration Platform

VIVO is a U.S. national interdisciplinary open source scientific collaboration platform funded by the NIH with development led by Cornell. Their reconciliation service allows reconciling against VIVO entities (faculty members, journals, etc) in any VIVO installation. Extending Google Refine for VIVO

JournalTOCs

Use JournalTOCs API to create your own cool web applications that integrate content from freely available journal TOCs. Most of JournalTOCs API calls are free and don't require any registration process.

VIAF

The VIAF® (Virtual International Authority File) combines multiple name authority files into a single OCLC-hosted name authority service. The goal of the service is to lower the cost and increase the utility of library authority files by matching and linking widely-used authority files and making that information available on the Web.

The reconciliation service is hosted by Roderic D. M. Page, more details on Reconciling author names using Open Refine and VIAF.

FAST (Faceted Application of Subject Terminology)

FAST is derived from the Library of Congress Subject Headings (LCSH), one of the library domain’s most widely-used subject terminology schemas. The development of FAST has been a collaboration of OCLC Research and the Library of Congress. Work on FAST began in late 1998.

Library of Congress Subject Headings

Library of Congress Subject Headings (LCSH) has been actively maintained since 1898 to catalog materials held at the Library of Congress. By virtue of cooperative cataloging other libraries around the United States also use LCSH to provide subject access to their collections.

The reconciliation service is hosted by Free Your Metadata group, more details on their website.

Sharedshelf Built Work Registry Reconciliation Service

Sharedshelf has offered reconciliation services for its users to enrichment data using Sharedshelf Built Work Registry (BWR) project. Built Work Registry is accessible via http://builtworksregistry.org/.

Endpoint Details

Service Labels Details
Name Sharedshelf BWR Display Endpoint
Endpoint http://lod.sharedshelf.artstor.org/sparql/api/reconcile/display
Graph URI http://lod.sharedshelf.artstor.org/sparql/api/project/756
Type Generic SPARQL
Label properties rdfs:label && http://catalog.sharedshelf.artstor.org/display/prefLabel

Open Data

OpenCorporates

31 million corporate entities (as of Nov. 2011) available for reconciliation through their service.

dbpedia

DBpedia extension (currently) provides possibility to extend your DBpedia-reconciled data with additional related columns from DBpedia. It is based on similar Freebase extension.

Ordnance Survey

The Reconciliation API is a simple web service that supports linking of datasets to the Ordnance Survey Linked Data. The API accepts a simple text search, e.g. a label, code or other identifier for a resource and then returns a ranked list of potential matches. Client-side tools may then use these results to either build links to the Linked Data or use the returned identifiers to extract further data to enrich the original dataset.

Organized Crime and Corruption Reporting Project

OCCRP provides a public reconciliation API endpoint which allows reconciliation of data against a comprehensive list of sanctioned persons and companies, politically exposed persons, and other persons of journalistic interest. The service is intended as a first-level "check for interesting entries" for government or private data.

Nomisma

Nomisma provides data about numismatics:

Biodiversity Researchers

Taxonomic Databases

Taxonomic databases (EOL,GBIF, NCBI,Global Names Index, uBio, WoRMS), as documented here.

Taxonomic names from IPNI via IPNI Names Reconciliation Service. Source code

Deprecated

Freebase

The Freebase Reconciliation Service has been deprecated in June 2015 and was shut down with the rest of Freebase later on.

http://reconcile.freebaseapps.com/

https://developers.google.com/freebase/v1/reconciliation-overview?hl=en

Nomenklatura

Nomenklatura is a simple service that makes it easy to maintain a canonical list of entities such as persons, companies or event streets and to match messy input, such as their names against that canonical list – for example, matching Acme Widgets, Acme Widgets Inc and Acme Widgets Incorporated to the canonical “Acme Widgets”.

With Nomenklatura its a matters of minutes to set up your own set of master data to match against and it provides a simple user interface and API which you can then use do matching (the API is compatible with Open Refine’s reconciliation function).

Talis Kasabi

The Kasabi reconciliation services provide reconciliation against any database published on the Kasabi platform. Documentation

Talis Platform reconciliation services

This project has been shut down. They suggest some alternatives

FundRef

The FundRef Reconciliation Service was designed to help publishers (or anybody) more easily clean-up their funder data and map it to the FundRef Registry. It was built on FundRef Metadata Search.

Wish List

The following are data sources that could provide useful reconciling within OpenRefine. If you would like to help with coding a reconciling extension for any, please contact our mailing list. We would love to see some of these happen!

  • Historical Newspapers - Library of Congress' Chronicling America provides JSON, RDF, XML & Linked Data with an easy to use API.
  • Chemical Identifier Resolver - Over 96 million chemical structures hosted by NCI/NIH, provides names, conversions, & various formats even XML output.
Clone this wiki locally
You can’t perform that action at this time.
You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session.
Press h to open a hovercard with more details.