"Reconcile > Use values as identifiers" does not reconcile #3172

Closed
allanaaa opened this issue Sep 7, 2020 · 8 comments · Fixed by #4666
Labels: Good First Issue, Module: Frontend, reconciliation, Type: Bug

Comments

@allanaaa
Contributor

allanaaa commented Sep 7, 2020

To Reproduce

Run "Reconcile" > "Use values as identifiers" on any column, whether or not it contains properly formatted unique IDs for the given service. It does not appear to reconcile at all: whatever the cell contents are, every cell is reported as 100% matched and gets a real-looking URL, whether or not that URL resolves.

Expected Behavior

Reconciliation seems to be missing the part where it actually validates these unique IDs to see if they match up with existing entities. I expect failures to say so.

Screenshots

[Screenshot: error1]

I tested on Wikidata and VIAF. VIAF gives no hover information; the matches just send you to 404 pages, e.g. http://viaf.org/viaf/Q17291. None of the content in that VIAF column should match (Q### is an invalid format for VIAF) except the first value, which resolves correctly to https://viaf.org/viaf/38242123/.

Two things happen on Wikidata that I thought I'd mention. IDs that don't exist give the error in the screenshot above. IDs that fit the Q### format but have not yet been assigned look a bit different (so that may be a second bug to work out). I would expect these to carry some other obvious error message or flag.

[Screenshot: error2]
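
A cheap local pre-check could at least catch values whose shape is wrong for the target service before they are marked as matched. The sketch below is illustrative only: the patterns are assumptions (Wikidata item IDs as `Q` followed by digits, VIAF IDs as digits only), the function name is hypothetical, and none of it is taken from OpenRefine. A well-formed ID can still point at an entity that does not exist, so this is a format check, not validation.

```python
import re

# Assumed ID shapes, for illustration only; not taken from OpenRefine.
ID_PATTERNS = {
    "wikidata": re.compile(r"Q[1-9]\d*"),
    "viaf": re.compile(r"\d+"),
}


def looks_like_id(service: str, value: str) -> bool:
    """Return True if `value` has the expected shape for the service's IDs."""
    return ID_PATTERNS[service].fullmatch(value.strip()) is not None
```

With these patterns, `Q17291` passes the Wikidata check but fails the VIAF one, which is the mismatch described above.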

Versions

Windows, Firefox

  • OpenRefine: the new 3.4 release, as well as 3.4 beta 2 (last week)
allanaaa added the Type: Bug, Status: Pending Review, and reconciliation labels Sep 7, 2020
@wetneb
Member

wetneb commented Sep 8, 2020

That is the intended behaviour - the goal is to be able to blindly trust a column of identifiers when you know they are valid, to avoid a costly reconciliation. We could:

  • Add text to the dialog where we choose the reconciliation service to use, explaining that identifiers will not be validated;
  • Improve the user experience after that, improving the way we handle invalid ids (for instance in the Wikidata extension)
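
To make the current behaviour concrete, here is a minimal sketch of what blindly trusting a column amounts to. It is not OpenRefine's actual code: the function name and the returned dictionary are hypothetical, and the `{{id}}` placeholder is assumed to follow the view URL template convention used in reconciliation service manifests.

```python
def trust_as_identifier(value: str, view_url_template: str) -> dict:
    """Mark a cell as matched to `value` without contacting the service."""
    return {
        "id": value,
        "judgment": "matched",
        # The link is built by plain substitution, so it may not resolve.
        "url": view_url_template.replace("{{id}}", value),
    }
```

Calling `trust_as_identifier("Q17291", "http://viaf.org/viaf/{{id}}")` yields the URL `http://viaf.org/viaf/Q17291` with a "matched" judgment, even though no request was ever made to check it.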

@allanaaa
Contributor Author

allanaaa commented Sep 8, 2020

Hrm! It feels like it's misleading to have this under the "Reconcile" menu if there's no actual reconciliation or validation happening here.

@wetneb
Member

wetneb commented Sep 11, 2020

We could also have a checkbox to optionally validate the ids during this operation, but there is no provision for that in the reconciliation API. If people want validation, they should just use the standard reconciliation operation (although there is no requirement on the services that when queried with their own identifiers, they return the corresponding entity as a match).

I think the priority is adding the text to the UI (first bullet point above).

@tfmorris
Member

I don't understand why this is a separate operation at all.

Why isn't this just a standard Reconcile operation against a property of "id" or whatever the reconciliation service calls it? It'd be trivial, blindingly fast, and have the behavior the user expects.

@wetneb
Member

wetneb commented Sep 16, 2020

> It'd be trivial, blindingly fast, and have the behavior the user expects.

If you have 100k rows, it is still going to take a fairly long while, and that can be quite frustrating when you have just retrieved these ids from the service (for instance with a SPARQL query).

The Wikidata service recognizes Wikidata identifiers in reconciliation queries and processes them as fast as it can (without searching for them) but it still takes quite a bit of time to fetch the label and types for each of them.

Moreover there is no requirement that services behave in this way - it could well be that some services out there do not implement a special case for recon queries that look like ids. This is something we could add to the specs now, but I am not sure it is fair to expect that from them right now.
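
For a rough sense of the cost, the sketch below builds the batched `queries` payloads that a standard reconciliation pass over a column of IDs would send. The helper name and the batch size of 50 are assumptions for illustration, not OpenRefine's actual settings; the payload shape (a JSON object mapping query keys to `{"query": ...}` objects) is the batch format of the reconciliation API.

```python
import json
from typing import Iterator


def build_query_batches(ids: list[str], batch_size: int = 50) -> Iterator[str]:
    """Yield one JSON `queries` payload per batch of identifiers."""
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        # Each query simply uses the identifier as its query string.
        yield json.dumps({f"q{i}": {"query": v} for i, v in enumerate(batch)})
```

Under these assumptions, 100k rows become 2,000 requests, and each response only confirms an ID if the service chooses to return the corresponding entity for it.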

wetneb added the Good First Issue and Module: Frontend labels and removed the Status: Pending Review label Feb 20, 2021
@wetneb
Member

wetneb commented Feb 20, 2021

Marking this as a good first issue: the dialog opened by this operation (to pick the reconciliation service) should warn the user that reconciliation identifiers will not be validated.

@WaltonG
Member

WaltonG commented Mar 28, 2022

@wetneb I am volunteering to work on this issue.

@WaltonG
Member

WaltonG commented Mar 29, 2022

@wetneb I have submitted pull request #4655 for review.

4 participants