Export to the Wikidata Mismatch Finder #5607

Open
wetneb opened this issue Feb 6, 2023 · 1 comment
Labels
Type: Feature Request (requests for new features or enhancements), wikibase (related to wikidata/wikibase integration)

Comments

@wetneb
Sponsor Member

wetneb commented Feb 6, 2023

There is a relatively new tool to import data into Wikidata: the Wikidata Mismatch Finder, developed by WMDE, which acts as a sort of staging space for data uploads to Wikidata. Once new or conflicting data is uploaded to the Mismatch Finder, editors can review each proposed statement individually and decide whether to add it to Wikidata.

@lydiapintscher asked me whether OpenRefine could be used to populate the suggestions offered by this tool. This works by producing a CSV file of a specific format, which then gets uploaded via the tool's API.

Proposed solution

A new exporter could be added to the Wikibase extension, making it possible to upload candidate edits to the Mismatch Finder (instead of uploading the data directly or using QuickStatements).
As a user, one would prepare the data just as for direct edits (with reconciliation, schema building, issue fixing and preview), but could choose the Mismatch Finder as the upload method at the end. This would generate a suitable CSV file, which the user could either download from OpenRefine and upload to the Mismatch Finder themselves, or have OpenRefine upload via the Mismatch Finder's API. The latter requires asking the user to generate a token on the tool's side and add it to OpenRefine for API authentication.
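
To make the CSV generation step concrete, here is a minimal Python sketch of such an export. The column names in `COLUMNS` are an assumption about the Mismatch Finder import format and need to be checked against the tool's documentation; `candidate_edits_to_csv` and the example row are hypothetical, and a real exporter would live in the Wikibase extension rather than in a standalone script.

```python
import csv
import io

# Assumed column layout of the import file; verify against the
# Mismatch Finder documentation before relying on it.
COLUMNS = [
    "item_id",
    "statement_guid",
    "property_id",
    "wikidata_value",
    "meta_wikidata_value",
    "external_value",
    "external_url",
    "type",
]


def candidate_edits_to_csv(rows):
    """Serialize candidate statements (dicts) to CSV text.

    Keys missing from a row are written as empty cells.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow({col: row.get(col, "") for col in COLUMNS})
    return buffer.getvalue()


# Illustrative row: a value present in the external source only.
example = [
    {
        "item_id": "Q42",
        "property_id": "P569",
        "external_value": "1952-03-11",
        "external_url": "https://example.org/record/42",
    }
]
csv_text = candidate_edits_to_csv(example)
print(csv_text)
```

The resulting text could be saved for a manual upload; uploading it through the API is deliberately left out here, since it depends on the token handling discussed above.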

Alternatives considered

Someone could write a tutorial explaining how to use OpenRefine's existing functionality to produce a CSV in the format expected by the Mismatch Finder.

Additional context

  • Whether the tool can be run on other Wikibase instances is unclear at this stage.
    To offer the functionality only for Wikibases which do support the tool, it might be necessary to add a relevant field in the Wikibase manifest, especially for an integration which would upload the data directly to the tool via its API.

  • The data model supported by the CSV format looks relatively poor, so it is unclear how rich statements (with qualifiers, novalue/somevalue claims, ranks, complex references…) could be represented in this export format.
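
As a rough illustration of the manifest idea in the first point, the sketch below checks a Wikibase manifest for a Mismatch Finder section before offering the export. The `mismatch_finder` key and its `api_endpoint` field are hypothetical: no such field exists in the manifest format today, and the URL is a placeholder.

```python
def mismatch_finder_endpoint(manifest):
    """Return the Mismatch Finder API endpoint declared in a manifest, or None.

    The "mismatch_finder" section and its "api_endpoint" field are
    hypothetical; they only illustrate what a manifest extension could look like.
    """
    section = manifest.get("mismatch_finder")
    if not isinstance(section, dict):
        return None
    return section.get("api_endpoint") or None


# Illustrative manifests, trimmed to the parts relevant here.
with_tool = {
    "mediawiki": {"name": "Wikidata"},
    "mismatch_finder": {"api_endpoint": "https://mismatch-finder.example.org/api"},
}
without_tool = {"mediawiki": {"name": "Some other Wikibase"}}

for manifest in (with_tool, without_tool):
    endpoint = mismatch_finder_endpoint(manifest)
    name = manifest["mediawiki"]["name"]
    if endpoint is None:
        print(f"{name}: hide the Mismatch Finder export")
    else:
        print(f"{name}: offer the export, uploading to {endpoint}")
```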

wetneb added the Type: Feature Request and wikibase labels Feb 6, 2023
@thadguidry
Member

thadguidry commented Feb 6, 2023

@wetneb the format looks poor, but I think that's because it is about comparing statement values first; if a value is wrong or differs, qualifiers, claims, ranks, etc. can then be applied by the institution as part of their NEXT STEP.
So, if I understand correctly, Mismatch Finder is a FIRST STEP toward bringing data accuracy, or perhaps authoritative data, into Wikidata... and speaking of authoritative data...

@lydiapintscher The format looks like something designed for Authoritative Statements work, but it's not about making Authoritative Statements quite yet, right? What's the history of Mismatch Finder itself?

The only possible values for a review status are:

"pending" - The mismatch is awaiting a review decision.
"wikidata" - The mismatching information is on Wikidata.
"missing" - The information is missing on Wikidata and correct on the external source.
"external" - The mismatching information is on the external source.
"both" - Both sources are incorrect.
"none" - None of the above.
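
If OpenRefine ever needs to read these statuses back, a small Python sketch of how the six values listed above could be modeled and validated might look like this. `ReviewStatus` and `parse_review_status` are hypothetical names, not part of the Mismatch Finder or OpenRefine.

```python
from enum import Enum


class ReviewStatus(Enum):
    """The six review status values quoted above."""

    PENDING = "pending"    # awaiting a review decision
    WIKIDATA = "wikidata"  # the mismatching information is on Wikidata
    MISSING = "missing"    # missing on Wikidata, correct on the external source
    EXTERNAL = "external"  # the mismatching information is on the external source
    BOTH = "both"          # both sources are incorrect
    NONE = "none"          # none of the above


def parse_review_status(value):
    """Convert a raw string to a ReviewStatus, rejecting unknown values."""
    try:
        return ReviewStatus(value)
    except ValueError:
        allowed = ", ".join(status.value for status in ReviewStatus)
        raise ValueError(
            f"unknown review status {value!r}; expected one of: {allowed}"
        ) from None


print(parse_review_status("missing"))
```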

So, what's the exact history of Mismatch Finder then? UPDATE: ok, I did some poking around and found your talk ...

Importantly, from the Data Reuse Days event itself where Mismatch Finder was demonstrated and slides ... I see one of a few questions that were asked:

Will "wrong data on Wikidata" automatically correct Wikidata, or do I have to do it manually?
No, Mismatch Finder right now does not edit Wikidata. A human is still needed to make a change if the problem is with Wikidata, and to set the status to review.
