Data validation using external databases
Because many items carry identifiers for external databases, we considered cross-checking our data against those databases and possibly adding them as references (depending on their reliability).
We have developed a small command-line tool that requests data from Wikidata and MusicBrainz via their APIs and compares the results.
A production system should work with data dumps instead, because API queries would cause too much traffic.
See external validation/crosscheck.py
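The following is not the crosscheck.py script itself, only a minimal sketch of the request-and-compare idea. The function names, the User-Agent string, and the normalization rule (collapse whitespace, ignore case) are assumptions made for illustration; the two URL patterns are the public Wikidata `wbgetentities` and MusicBrainz `ws/2` JSON endpoints.

```python
import json
import urllib.parse
import urllib.request

WIKIDATA_API = "https://www.wikidata.org/w/api.php"
MUSICBRAINZ_API = "https://musicbrainz.org/ws/2"


def wikidata_url(item_id):
    """Build a wbgetentities request URL for one item, e.g. 'Q42'."""
    query = urllib.parse.urlencode(
        {"action": "wbgetentities", "ids": item_id, "format": "json"}
    )
    return f"{WIKIDATA_API}?{query}"


def musicbrainz_url(entity_type, mbid):
    """Build a MusicBrainz lookup URL, e.g. entity_type='artist'."""
    return f"{MUSICBRAINZ_API}/{entity_type}/{urllib.parse.quote(mbid)}?fmt=json"


def fetch_json(url):
    """Fetch a URL and decode the JSON body (performs a network request)."""
    # Hypothetical User-Agent; both services ask clients to identify themselves.
    request = urllib.request.Request(
        url, headers={"User-Agent": "crosscheck-sketch/0.1 (example)"}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def normalize(value):
    """Assumed normalization: collapse whitespace and ignore case."""
    return " ".join(str(value).split()).casefold()


def compare(wikidata_values, external_values):
    """Return (matching, only_in_wikidata, only_in_external) as sorted lists."""
    ours = {normalize(v) for v in wikidata_values}
    theirs = {normalize(v) for v in external_values}
    return sorted(ours & theirs), sorted(ours - theirs), sorted(theirs - ours)
```

Only `fetch_json` touches the network, so `compare` can be reused unchanged when the values come from data dumps instead of API responses.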
The tool could be offered as a live tool (mockups below) or run as a cron job, in which case any mismatches found could be treated as constraint violations (see Using constraints more effectively).
A user can hit the cross-checking button to start cross-checking the current item.
... appears if the information from Wikidata and the external databases match.
... appears if there are references missing.
... appears if there are mismatches.
... appears if there are no suitable identifiers for external databases or no properties that could be validated with them.
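The four outcomes listed above could be derived from the individual property checks roughly as follows. This is a sketch under stated assumptions: the `Status` and `Check` names are hypothetical, and the precedence (a mismatch outranks a missing reference) is a design choice made here, not something the mockups specify.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    MATCH = "match"
    MISSING_REFERENCES = "missing references"
    MISMATCH = "mismatch"
    NOT_APPLICABLE = "not applicable"


@dataclass
class Check:
    """One property compared between Wikidata and an external database."""
    prop: str
    wikidata_value: str
    external_value: str
    has_reference: bool  # True if the Wikidata statement already cites the database


def classify(checks):
    """Collapse the per-property checks for one item into a single status."""
    if not checks:
        # No suitable identifiers, or no properties that could be validated.
        return Status.NOT_APPLICABLE
    if any(c.wikidata_value != c.external_value for c in checks):
        return Status.MISMATCH
    if any(not c.has_reference for c in checks):
        return Status.MISSING_REFERENCES
    return Status.MATCH
```

In the cron job scenario, items classified as `Status.MISMATCH` would be the ones reported as constraint violations.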
We need data dumps in suitable formats (RDF, JSON, ...), which unfortunately are not commonly provided.
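Where a JSON dump is available, it could be streamed without loading the whole file into memory. The sketch below assumes a layout with one JSON object per line inside an enclosing array, which is how the Wikidata JSON dumps are laid out; the function names and the `id` and `name` fields in the usage example are hypothetical.

```python
import io
import json


def iter_entities(lines):
    """Yield one decoded object per line of a line-oriented JSON array dump."""
    for line in lines:
        # Drop surrounding whitespace and the comma that separates array items.
        line = line.strip().rstrip(",")
        if line in ("", "[", "]"):
            continue
        yield json.loads(line)


def index_by_identifier(entities, key):
    """Build a lookup table from an identifier field to the full record."""
    return {entity[key]: entity for entity in entities if key in entity}


# Usage with an in-memory stand-in for a dump file:
dump = io.StringIO('[\n{"id": "a", "name": "X"},\n{"id": "b", "name": "Y"}\n]\n')
index = index_by_identifier(iter_entities(dump), "id")
```

An index like this would let the cross-check look up an external record by the identifier stored on the item, without any API traffic.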