Parts of reconciliation batches can fail silently; and parts of reconciled column lookup batches can fail silently #3369

@Jheald


This may be a duplicate of an existing bug or bugs, but I find it incredibly frustrating: when items are 'unmatched' it appears to be impossible to distinguish whether there is really no match for the item, or whether the reconciliation process simply failed to complete for some sub-batch of entries.

A couple of days ago, I was running a reconciliation of a set of authors with LoC authority IDs against Wikidata using the "reconcili.link" service. I ran it once, and about 80% reconciled. I then filtered for the unmatched authors, reconciled again, and a further 15% of the original set reconciled.

I presume that what happened was that during the first reconciliation, one or more sub-batches of contiguous records timed out, or otherwise failed to return. (The problem did appear to be affecting contiguous records.) That, I guess, is something that can always happen. The problem is that OpenRefine then marked those records' reconciliation status as "unmatched" rather than "unknown", not differentiating between items for which the reconciliation had failed to complete and those for which the reconciliation had concluded but found nothing.
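To make the distinction I am asking for concrete, here is a minimal sketch of a three-state status. This is not OpenRefine's code and I have not checked how its batching is actually implemented; `ReconStatus`, `reconcile_in_batches` and the `query_batch` callable are all hypothetical names, and I am assuming a service call that returns a mapping from each queried value to a matched ID or `None`.

```python
from enum import Enum
from typing import Callable, Dict, List, Optional


class ReconStatus(Enum):
    MATCHED = "matched"      # the service answered and returned a match
    UNMATCHED = "unmatched"  # the service answered and found nothing
    UNKNOWN = "unknown"      # the service never answered for this record


def reconcile_in_batches(
    values: List[str],
    query_batch: Callable[[List[str]], Dict[str, Optional[str]]],
    batch_size: int = 10,
) -> Dict[str, ReconStatus]:
    """Classify each value, keeping failed sub-batches separate from real non-matches.

    `query_batch` is a hypothetical stand-in for one request to a
    reconciliation service: it maps each value to a matched ID or None.
    """
    statuses: Dict[str, ReconStatus] = {}
    for start in range(0, len(values), batch_size):
        batch = values[start:start + batch_size]
        try:
            response = query_batch(batch)
        except Exception:
            # Timeout, network error, bad response: the whole sub-batch
            # is unknown, NOT unmatched.
            for value in batch:
                statuses[value] = ReconStatus.UNKNOWN
            continue
        for value in batch:
            if value not in response:
                # The service replied but skipped this record.
                statuses[value] = ReconStatus.UNKNOWN
            elif response[value] is None:
                statuses[value] = ReconStatus.UNMATCHED
            else:
                statuses[value] = ReconStatus.MATCHED
    return statuses
```

With something like this, a facet on the status would show the failed contiguous block as "unknown", so it would be obvious which records need to be re-run.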

Similarly, when I then added a column looking up LoC IDs based on the reconciled values (which I find is a necessary step to sanity-check reconciliations), only about 80% of the IDs got added, with more added when I re-ran the lookup for those without values.

This is frustrating enough for me, aware that this can happen, knowing that I may need to re-run reconciliations or augmentations two or three or even more times to be sure. But the even worse impact is with unaware users, who as a result may then add new duplicate items to Wikidata for their unreconciled entries, without realising that the unreconciled status cannot be depended on.
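For what it is worth, the manual re-running described above amounts to something like the following loop. Again this is only a sketch under my own assumptions, not anything OpenRefine provides: `reconcile_until_stable`, `query_batch` and `max_passes` are hypothetical, and the loop only works if failures can be told apart from genuine non-matches, which is exactly what is missing today.

```python
from typing import Callable, Dict, List, Optional, Tuple


def reconcile_until_stable(
    values: List[str],
    query_batch: Callable[[List[str]], Dict[str, Optional[str]]],
    batch_size: int = 10,
    max_passes: int = 5,
) -> Tuple[Dict[str, Optional[str]], List[str]]:
    """Re-run only the records whose sub-batch failed, up to max_passes times.

    Returns (results, still_unknown). In `results`, None means the service
    answered and found no match; `still_unknown` lists records that never
    received an answer at all.
    """
    results: Dict[str, Optional[str]] = {}
    pending = list(values)
    for _ in range(max_passes):
        if not pending:
            break
        failed: List[str] = []
        for start in range(0, len(pending), batch_size):
            batch = pending[start:start + batch_size]
            try:
                response = query_batch(batch)
            except Exception:
                # The sub-batch did not complete: queue it for another pass.
                failed.extend(batch)
                continue
            for value in batch:
                if value in response:
                    results[value] = response[value]
                else:
                    failed.append(value)
        pending = failed
    return results, pending
```

Anything left in the second return value after the last pass should be reported to the user as "unknown", so that nobody creates new Wikidata items on the strength of it.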

Labels: Priority: High, Type: Bug, reconciliation
