Ability to update/replace statements #3383

Closed
Nikerabbit opened this issue Dec 4, 2020 · 5 comments · Fixed by #4520
Labels: Type: Feature Request, wikibase

@Nikerabbit

I am looking for a tool to do mass edits to a Wikibase. It would be useful if the same tool could both add missing data and update existing data. Based on my understanding, OpenRefine is currently only able to add new items and new statements.

Proposed solution

Brute force solution
In the Wikibase schema editor, have a checkbox to replace existing statements for the given property. Internally this would be implemented as deletion of all statements on the subject that use that property, before the new ones are added. This would be sufficient for our case.
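
A minimal sketch of this brute-force approach, written against an in-memory model rather than a real Wikibase. The `Statement` class, the `replace_all` function and the tuple-based edit plan are all hypothetical names made up for illustration; they are not OpenRefine or Wikibase APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """Hypothetical, simplified stand-in for a Wikibase statement."""
    statement_id: str
    property_id: str
    value: str


def replace_all(existing, property_id, new_values):
    """Plan edits that remove every statement using property_id, then add new_values.

    Returns a list of ("remove", statement_id) and ("add", property_id, value) tuples.
    Statements that use other properties are left alone.
    """
    removals = [("remove", s.statement_id)
                for s in existing if s.property_id == property_id]
    additions = [("add", property_id, v) for v in new_values]
    # Removals are listed before additions.
    return removals + additions


existing = [
    Statement("Q1$a", "P10", "old name"),
    Statement("Q1$b", "P10", "older name"),
    Statement("Q1$c", "P20", "unrelated"),
]
plan = replace_all(existing, "P10", ["new name"])
print(plan)
```

The obvious cost of this approach is that qualifiers and references attached to the removed statements are lost, which is why it is described as brute force.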

Ideal solution
"Reconcile" statements. Ideally the Query Service can be used to output statement identifiers that uniquely identify the statement. Otherwise a manual reconciliation is needed (but limited to the statements of the already reconciled item). Then the value (in another column) could be updated in OpenRefine and it would automatically update the existing statement instead of creating a new one.
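
As a rough illustration of the statement-reconciliation idea, the sketch below assumes a table in which each row may carry a statement identifier. The column names (`item`, `property`, `statement_id`, `value`) and the `plan_updates` function are hypothetical; the point is only that a known identifier turns a row into an update, while a missing one turns it into a creation.

```python
def plan_updates(rows):
    """Turn table rows into edit operations.

    Each row is a dict with the (hypothetical) keys "item", "property",
    "statement_id" and "value". A row with a non-empty statement_id updates
    that statement in place; a row without one creates a new statement.
    """
    plan = []
    for row in rows:
        if row.get("statement_id"):
            plan.append(("update", row["statement_id"], row["value"]))
        else:
            plan.append(("create", row["item"], row["property"], row["value"]))
    return plan


rows = [
    {"item": "Q1", "property": "P10", "statement_id": "Q1$a", "value": "corrected"},
    {"item": "Q2", "property": "P10", "statement_id": "", "value": "brand new"},
]
print(plan_updates(rows))
```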

Alternatives considered

I think I saw a suggestion to export to quickstatements and duplicate lines so that they are preceded by deletions. But this is rather difficult to do manually.
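
The line duplication could in principle be scripted. The sketch below assumes the QuickStatements v1 convention that a tab-separated line prefixed with `-` removes the matching statement, and it assumes the old value of each statement is already known (for example from a Query Service export). The `with_deletions` function and its input layout are hypothetical, and values are expected to be pre-formatted for QuickStatements (strings already quoted).

```python
def with_deletions(rows):
    """Build QuickStatements v1 lines where each changed value is preceded by a removal.

    rows: iterable of (item, property, old_value, new_value) tuples.
    old_value is None when the item has no existing statement to remove.
    Rows whose value did not change produce no output.
    """
    lines = []
    for item, prop, old_value, new_value in rows:
        if old_value == new_value:
            continue  # nothing to update
        if old_value is not None:
            lines.append(f"-{item}\t{prop}\t{old_value}")
        lines.append(f"{item}\t{prop}\t{new_value}")
    return lines


rows = [
    ("Q1", "P10", '"old"', '"new"'),
    ("Q2", "P10", None, '"first value"'),
    ("Q3", "P10", '"same"', '"same"'),
]
print("\n".join(with_deletions(rows)))
```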

I think I could write my own tool that processes the QuickStatements v1 format, but it wouldn't know which columns should be overwritten. I could also write my own tool that just takes a CSV (from the Query Service, then modified) and does this, but for the end user there are already many tools to learn for the data update process (Wikibase, Query Service, OpenRefine, QuickStatements, this new tool?).

Additional context

https://nimiarkisto.nikerabb.it/w/index.php?title=Item:Q5106574&diff=6759469&oldid=6759468 example of an attempted edit creating a new statement instead of updating it.

#2116 is similar to this request, but per my understanding it is about updating qualifiers rather than the value of the statement.

#2999 (comment) mentions that deleting statements is currently not possible.

Nikerabbit added the Type: Feature Request and Status: Pending Review labels on Dec 4, 2020
wetneb added the wikibase label and removed the Status: Pending Review label on Dec 4, 2020
@wetneb
Sponsor Member

wetneb commented Dec 4, 2020

Yes this is a very legitimate request. The tricky part is to make this feature work for a wide range of use cases. Therefore we need to make multiple things configurable:

  • How do we match the generated statements with the existing ones? Only on the basis of the property? Property and value? Property, value and some qualifiers?
  • What do we do when we find a matching statement? Replace it by the new one, keep the existing one, merge qualifiers?
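
To make the two questions above concrete, here is one possible way to model them as two independent settings. This is only a sketch under simplifying assumptions: statements are plain dicts, qualifiers are a dict from property to value, and the "some qualifiers" case is simplified to comparing all qualifiers. The names `Match`, `OnMatch`, `matches` and `merge` are hypothetical and do not reflect how OpenRefine implements this.

```python
from enum import Enum


class Match(Enum):
    PROPERTY = 1
    PROPERTY_AND_VALUE = 2
    PROPERTY_VALUE_AND_QUALIFIERS = 3


class OnMatch(Enum):
    REPLACE = 1
    KEEP_EXISTING = 2
    MERGE_QUALIFIERS = 3


def matches(new, old, mode):
    """Decide whether a generated statement matches an existing one."""
    if new["property"] != old["property"]:
        return False
    if mode is Match.PROPERTY:
        return True
    if new["value"] != old["value"]:
        return False
    if mode is Match.PROPERTY_AND_VALUE:
        return True
    return new["qualifiers"] == old["qualifiers"]


def merge(new, existing, mode, on_match):
    """Return the statement list after merging `new` into `existing`.

    Only the first matching statement is affected; if nothing matches,
    `new` is appended.
    """
    result = []
    matched = False
    for old in existing:
        if not matched and matches(new, old, mode):
            matched = True
            if on_match is OnMatch.REPLACE:
                result.append(new)
            elif on_match is OnMatch.KEEP_EXISTING:
                result.append(old)
            else:
                # Keep the existing value, add the new qualifiers to it.
                qualifiers = {**old["qualifiers"], **new["qualifiers"]}
                result.append({**old, "qualifiers": qualifiers})
        else:
            result.append(old)
    if not matched:
        result.append(new)
    return result


old = {"property": "P10", "value": "a", "qualifiers": {"P1": "x"}}
new = {"property": "P10", "value": "b", "qualifiers": {"P2": "y"}}
print(merge(new, [old], Match.PROPERTY, OnMatch.REPLACE))
print(merge(new, [old], Match.PROPERTY_AND_VALUE, OnMatch.REPLACE))
```

With `Match.PROPERTY` the new statement replaces the old one, which is the behaviour requested in this issue; with `Match.PROPERTY_AND_VALUE` the values differ, so the new statement is added alongside the old one.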

I am keen to work on this but it'll have to wait a few months still since I am busy with a migration to a new architecture (https://github.com/OpenRefine/OpenRefine/projects/7)

@tfmorris
Member

tfmorris commented Dec 4, 2020

Asynchronous distributed updates are a difficult problem in general, and I think they may be impossible to do reliably without at least some minimal machinery on the backend. Freebase had an entire framework that was put in place for bulk updates (which also included things like sampled quality reviews, etc). I think that, at a minimum, you'll need some type of "commit if conflict free" primitive that could capture the state of the world that you reconciled against and tell if it had changed since then.
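
One common shape for such a "commit if conflict free" primitive is a revision-checked write: the client remembers the revision it read, and the backend rejects the write if the item has moved on since. The sketch below is a toy in-memory model of that idea, not a description of Freebase or Wikibase; `ItemStore`, `commit_if_unchanged` and `ConflictError` are hypothetical names.

```python
class ConflictError(Exception):
    """Raised when an item changed after the client read it."""


class ItemStore:
    """Hypothetical in-memory backend with revision-checked writes."""

    def __init__(self):
        self._items = {}  # item id -> (revision, data)

    def read(self, item_id):
        """Return (revision, data); unknown items are at revision 0."""
        return self._items.get(item_id, (0, {}))

    def commit_if_unchanged(self, item_id, base_revision, new_data):
        """Write new_data only if the item is still at base_revision."""
        current_revision, _ = self.read(item_id)
        if current_revision != base_revision:
            raise ConflictError(
                f"{item_id} is at revision {current_revision}, "
                f"edit was based on {base_revision}"
            )
        self._items[item_id] = (current_revision + 1, new_data)
        return current_revision + 1


store = ItemStore()
revision, _ = store.read("Q1")  # the state we "reconciled against"
store.commit_if_unchanged("Q1", revision, {"P10": "first"})

try:
    # A second client still holding the old revision is rejected.
    store.commit_if_unchanged("Q1", revision, {"P10": "second"})
except ConflictError as error:
    print("conflict:", error)
```

This only detects conflicts per item; a bulk upload touching many items would still need a policy for what to do when some of its commits are rejected.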

It feels dangerous to try and innovate in this space as long as the Wikidata team is ignoring needs of bulk updates. They really need to put the architecture in place to support clients like OpenRefine.

@wetneb
Sponsor Member

wetneb commented Dec 4, 2020

It is true that Wikibase-side support for this would be great. Pinging @addshore who has been thinking about this recently.

That being said, in the current context of Wikibase, I would not be too worried about atomicity: the edit rate on Wikidata is still really manageable and the chances that simultaneous editing ends up in data races are pretty low (compared to, for instance, the problems stale reconciliation data can cause).

wetneb self-assigned this on Feb 10, 2022
@wetneb
Sponsor Member

wetneb commented Feb 10, 2022

In the context of the Wikimedia Commons integration project, we have updated Wikidata-Toolkit to a new version, which breaks the limited deduplication of statements we had, so I am going to build it back better, more configurable.

@wetneb
Sponsor Member

wetneb commented Feb 18, 2022

This should be in the forthcoming release (3.6). In the meantime, feel free to use our snapshot releases to try it out and tell us if you see ways to improve it.


3 participants