Add Lexeme editing #437

Closed
wetneb opened this issue Aug 30, 2019 · 26 comments · Fixed by #570

@wetneb
Member

wetneb commented Aug 30, 2019

We now have support for Lexeme entities in the datamodel.
We could also support editing these in wdtk-wikibaseapi.

@Tpt
Collaborator

Tpt commented Nov 19, 2019

https://phabricator.wikimedia.org/T202725 and https://phabricator.wikimedia.org/T199896 will make the implementation a bit cumbersome. Fixing these limitations properly in WikibaseLexeme is probably going to be as hard as writing a circumvention in WikidataToolkit

@wetneb
Member Author

wetneb commented Nov 19, 2019

Interesting!

For the first one (T202725) this is related to #376: we should not assume that wbeditentity returns the full entity as this is only the case for Items apparently (not even properties).

For the second one I guess this means we need to implement other API actions, which are not going to be atomic indeed. I think we need to rethink the architecture of this module - I wanted to do that for a long time (#403) but haven't got round to it yet. This is related to my second bullet point in that ticket.

@Tpt
Collaborator

Tpt commented Nov 20, 2019

Indeed, having support for other API actions would be great. But I believe these limitations should also be removed on the WikibaseLexeme side.

@thadguidry

Also related: T249206 - Serialized statements of Forms and Senses are missing data type fields

The datatype field of each Snak was previously missing from statements within Senses and Forms of a Lexeme.
Starting on 26 August 2020, the datatype fields will be present on them. [Reference: Wikidata Project Chat]

@62mkv

62mkv commented Oct 12, 2020

hi team! sorry for a noob question, but.. how does one create a lexeme (with forms) using WDTK? I see that "datamodel" artifact has quite extensive support for Lexemes, but could not find anything in the WikibaseDataEditor. Is it because of this issue? if so, what would be the suggested workaround? Thanks in advance.

@wetneb
Member Author

wetneb commented Oct 16, 2020

Hi @62mkv, that's correct: editing and creating lexemes is not supported yet in WDTK.

@62mkv

62mkv commented Oct 18, 2020

thanks, @wetneb! I am looking into it now, seeing what is possible as a quick hack to be able to just a) create lexemes with forms or b) add forms to an existing lexeme. Currently the stumbling point for me is an apparent inability to create a FormDocument as a new Form (with "null" id) for a given Lexeme.

@Tpt I see you've added those types, what would be your suggestion on how to resolve it?

PS: there seem to be tools that are capable of what I need, in particular LexData (although they seem to be using a different action, "wbladdform") and https://github.com/lucaswerkmeister/tool-lexeme-forms/, but I'd really not want to abandon Java for this... TIA!

@Tpt
Collaborator

Tpt commented Oct 19, 2020

Hi @62mkv!

To create a new Form with the WDTK datamodel, you can use the LexemeDocument.createForm method.
This will properly generate a new form identifier and add the form to the lexeme object.

Then, we need to implement saving of lexemes, forms and senses. The wbeditentity API action we use for forms and senses is a bit limited (cf. the discussion above). If you are familiar with PHP, the easiest way to go is probably to just fix the MediaWiki WikibaseLexeme extension. If not, maybe some hacks with the existing API actions wbaddform... might do the job.

@62mkv

62mkv commented Oct 21, 2020

thanks @Tpt ! I guess that would cover my use-case №1 (create and add forms) but how do I get LexemeDocument, if I have an L-id already?

I see that I can use WbGetEntitiesAction to get an EntityDocument but how to obtain a proper LexemeDocument out of that?

@62mkv

62mkv commented Oct 21, 2020

> This will properly generate a new form identifier and add the form to the lexeme object.

by the way, javadoc on that method says

    /**
     * Creates a new {@link FormDocument} for this lexeme.
     * The form is not added to the {@link LexemeDocument} object,
     * it should be done with {@link LexemeDocument#withForm}.
     */

@Tpt
Collaborator

Tpt commented Oct 21, 2020

> I see that I can use WbGetEntitiesAction to get an EntityDocument but how to obtain a proper LexemeDocument out of that?

You could just cast using the usual (LexemeDocument).

> by the way, javadoc on that method says

Indeed, my bad.

@62mkv

62mkv commented Oct 21, 2020

Cool! and by the way, if I try to call LexemeDocument.createForm on a not-yet added lexeme, it throws an exception

java.lang.IllegalArgumentException: The string L0-F1 is not a valid form id

	at org.wikidata.wdtk.datamodel.implementation.FormIdValueImpl.<init>(FormIdValueImpl.java:65)

so, it seems like there's no easy way to create a lexeme AND add forms in a single wbeditentity hop... I'll try with WbGet now

@62mkv

62mkv commented Oct 21, 2020

so, with this code:

        LexemeDocument existingLexeme = (LexemeDocument) wikibaseDataFetcher.getEntityDocument("L1358");
        FormDocument formDocument = existingLexeme.createForm(
                Collections.singletonList(Datamodel.makeMonolingualTextValue("aprils", LANGUAGE_CODE)),
                Collections.singletonList(getItemIdForTestWikidata("Q42"))
        );

        LexemeDocument withForm = existingLexeme.withForm(formDocument);

        LexemeDocument result = wikibaseDataEditor
                .createLexemeDocument(withForm, "Adding form to existing lexeme", null);

i am getting this request string:

summary=Adding form to existing lexeme&new=lexeme&maxlag=5&data={"type":"lexeme","id":"L1358","lexicalCategory":"Q212131","language":"Q208912","lemmas":{"en":{"language":"en","value":"april"}},"claims":{},"forms":[{"id":"L1358-F1","representations":{"en":{"language":"en","value":"aprils"}},"grammaticalFeatures":["Q42"],"claims":{},"lastrevid":533196,"type":"form"}],"senses":[],"lastrevid":533196}&bot=&assert=user&format=json&action=wbeditentity&token

and this MediaWikiException:

org.wikidata.wdtk.wikibaseapi.apierrors.MediaWikiApiErrorException: [param-invalid] Invalid field used in call: "id", must match id parameter

is it a problem with my code, WDTK's unreadiness, or a Wikidata API problem? I can't tell :( to me, the request content looks legit. it correctly shows lemma, lexeme id, form with features..

UPD: aha, so, looking at the documentation for wbeditentity (https://www.wikidata.org/w/api.php?action=help&modules=wbeditentity), it seems as though the id parameter is missing from the request. will look into why that might happen
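
For readers hitting the same error: the request string above sends `new=lexeme` while the `data` blob carries an `id`. A minimal sketch of the two parameter sets, assuming only the `action`, `format`, `summary`, `data`, `new` and `id` parameters of `wbeditentity`. The class and method names are hypothetical, and `token`, `bot`, `maxlag` and `assert` are left out for brevity. This is not how WDTK builds its requests.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of WDTK: builds wbeditentity form parameters.
final class WbEditEntityParams {

    /** Parameters for creating a new lexeme: "new" is set, "id" is absent. */
    static Map<String, String> forCreate(String dataJson, String summary) {
        Map<String, String> params = base(dataJson, summary);
        params.put("new", "lexeme");
        return params;
    }

    /** Parameters for editing an existing lexeme: "id" is set, "new" is absent. */
    static Map<String, String> forEdit(String lexemeId, String dataJson, String summary) {
        Map<String, String> params = base(dataJson, summary);
        params.put("id", lexemeId);
        return params;
    }

    // Parameters shared by both kinds of request, in a stable insertion order.
    private static Map<String, String> base(String dataJson, String summary) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("action", "wbeditentity");
        params.put("format", "json");
        params.put("summary", summary);
        params.put("data", dataJson);
        return params;
    }

    /** Encodes the parameters as an application/x-www-form-urlencoded body. */
    static String encode(Map<String, String> params) {
        return params.entrySet().stream()
                .map(e -> enc(e.getKey()) + "=" + enc(e.getValue()))
                .collect(Collectors.joining("&"));
    }

    private static String enc(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }
}
```

With this split, an edit of L1358 would go through `forEdit("L1358", ...)`, so the `id` parameter matches the `id` field inside `data`.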

@62mkv

62mkv commented Oct 21, 2020

dang, and if I mess with WbDataEditor to edit and not create lexemes, when new form is given as above, this is what I get from MediaWiki API:

org.wikidata.wdtk.wikibaseapi.apierrors.MediaWikiApiErrorException: [modification-failed] Lexeme does not have Form with given ID

so apparently you can't add forms with wbeditentity, dammit...

@Tpt
Collaborator

Tpt commented Oct 24, 2020

> so apparently you can't add forms with wbeditentity, dammit...

Yes, sadly. The Wikibase API for form and sense editing is currently in an unfinished state.

@62mkv

62mkv commented Oct 24, 2020

yep. I've just tried to hack on FormDocument yet again, so that payload for wbeditentity looked like this:

{"type":"lexeme","id":"L1358","lexicalCategory":"Q212131","language":"Q208912","lemmas":{"en":{"language":"en","value":"april"}},"claims":{},"forms":[{"representations":{"en":{"language":"en","value":"aprils"}},"grammaticalFeatures":["Q42"],"claims":{},"lastrevid":533196,"type":"form"}],"senses":[],"lastrevid":533196}

and MediaWiki even gives "OK"-ish response:

 {"entity":{"claims":{},"id":"L1358","type":"lexeme","lastrevid":533196,"nochange":""},"success":1}

but still, nothing seems to be added to the WD Lexeme at all. In fact, I can't find any trace of this request being executed on "test.wikidata.org". Is it yet another bug in the Wikibase API? .. meh

PS: does "nochange": "" in the response indicate that the wiki engine considered my request a no-op? That might explain why I am not seeing any logs of it.
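
On the `nochange` question: Wikibase appears to include that key when it treats an edit as a no-op, which would be consistent with nothing showing up in the page history. A small sketch of how a client might flag this case, assuming the response shape shown above. The class name is hypothetical, and the check is a plain regular-expression match rather than real JSON parsing.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of WDTK.
final class EditResponseCheck {

    // Matches a JSON key named "nochange". This is a naive text match: a string
    // value that happens to contain the same characters would also match.
    private static final Pattern NOCHANGE = Pattern.compile("\"nochange\"\\s*:");

    /** Returns true if the response carries a "nochange" key. */
    static boolean isNoOp(String responseJson) {
        return NOCHANGE.matcher(responseJson).find();
    }
}
```

A caller could log a warning when `isNoOp` returns true instead of assuming that `"success":1` means something was written.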

@62mkv

62mkv commented Oct 26, 2020

Hooooooy! I've managed to both create lexeme with forms and to add forms to existing lexeme. The key was this nugget: https://github.com/nyurik/lexicator/blob/master/lexicator/lexemer/LexemeParserState.py#L182 (thanks to @nyurik for help)!

the code is super-ugly but at least I should be able to progress with this.

62mkv added a commit to 62mkv/Wikidata-Toolkit that referenced this issue Oct 29, 2020
@robertvazan
Collaborator

Hello. What's the status of lexeme editing? I have a private lexeme editing library that is in some ways more capable than WDTK and in others less capable. I am at a crossroads, choosing between a major upgrade to my private library and switching to WDTK and upgrading it with a series of smaller pull requests.

WDTK will mostly work for me. I have only encountered following issues:

  • no way to remove forms
  • no way to edit statements on lexemes
  • lexeme editing is in a branch that has been accumulating conflicts with master for over a year

I can send pull requests for the first two issues, but the third one is a deal-breaker. Why is lexeme editing in a branch for so long? Is it seriously broken? When is it going to be merged? Why wasn't it merged already?

The other thing I am thinking about is the editing API. #403 is overkill for my use case. Ideally, I would prefer to just have mutable entities and have an API that computes diff from original entity and modified one and then writes the diff. But at the moment the whole model is immutable. Bare diff API is nevertheless good enough, although the updateStatements method is begging for a builder class. That can be done with a PR too.

@wetneb
Member Author

wetneb commented Jan 30, 2021

Hi @robertvazan, thanks for offering to contribute on this!

Personally, I was not aware of the lexeme-editing branch at all. If this branch has been useful to you and you don't see any big issue about it, then you could open a pull request for it, potentially adding any further changes you have made on your side. I think it would be very welcome and I would be keen to review it.

Let's also ping the author @Tpt.

@robertvazan
Collaborator

@wetneb I haven't started using WDTK yet. Can I just ignore the branch then and submit PRs to master?

@wetneb
Member Author

wetneb commented Jan 30, 2021

If you did not use this branch yourself, then yes it's fine to submit PRs based on master. But it could be worth waiting a bit for @Tpt to understand why this branch was left unmerged.

@Tpt
Collaborator

Tpt commented Jan 30, 2021

I have not merged this branch because it is still buggy.
Indeed a few features are still missing in WikibaseLexeme to be able to use the wbeditentity API just like we do on items and properties:

  1. It is not possible to edit sense as part of a lexeme edit: https://phabricator.wikimedia.org/T199896
  2. The JSON result with the new version of the lexeme is incomplete and sometimes wrong: https://phabricator.wikimedia.org/T202725 https://phabricator.wikimedia.org/T200255 https://phabricator.wikimedia.org/T271105

Feel free to ignore my branch or take the relevant bits from it and integrate your own code.

@robertvazan
Collaborator

robertvazan commented Jan 30, 2021

@Tpt WDTK can just implement the Wikibase API to the extent it is implemented in Wikibase itself. Known unsupported request features can be detected and rejected with an exception before they hit the network. Incomplete responses can either be mapped to incomplete WDTK objects, or additional read requests can be made. This can all be documented. This way WDTK can expose the available APIs to the maximum extent possible.
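
A rough sketch of the "reject before it hits the network" idea, assuming the guarded limitation is the sense-statement one (T199896) mentioned earlier in the thread. `SenseEditGuard` and `SenseEdit` are hypothetical stand-ins for illustration and do not exist in WDTK.

```java
import java.util.List;

// Hypothetical sketch, not WDTK code: reject edits the server is known to ignore.
final class SenseEditGuard {

    /** Simplified, hypothetical stand-in for one sense edit in a lexeme update. */
    static final class SenseEdit {
        final String senseId;
        final int statementChanges; // number of added, replaced or removed statements

        SenseEdit(String senseId, int statementChanges) {
            this.senseId = senseId;
            this.statementChanges = statementChanges;
        }
    }

    /** Throws before any request is sent if an edit touches sense statements. */
    static void check(List<SenseEdit> edits) {
        for (SenseEdit edit : edits) {
            if (edit.statementChanges > 0) {
                throw new UnsupportedOperationException(
                        "Sense statements cannot be edited via wbeditentity (T199896): "
                                + edit.senseId);
            }
        }
    }
}
```

The point of failing locally is that the caller gets an explicit error rather than a successful-looking response for an edit that was silently dropped.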

@Tpt
Collaborator

Tpt commented Jan 30, 2021

@robertvazan That would be great! If you could implement it, it would be amazing!

@robertvazan
Collaborator

Just FYI: I have tested wbeditentity on test Wikidata and most of the lexeme can be edited. The only exception is sense statements. Senses themselves (addition/removal) and their glosses are editable though. Editing of forms and senses works both directly via form/sense ID and via lexeme except for the mentioned sense statements. The returned JSON is indeed incomplete. It is only useful to obtain lexeme ID.

There are some inconsistencies in editing various parts of the lexeme. The following procedures were tested to work.

Lemma

  • keep: omit JSON key or omit language code
  • add/replace: map it from its language code
  • remove: like add/replace but include "remove":"" in JSON, "value" can be set to anything

Language and lexical category

  • keep: omit JSON key
  • replace: set in JSON to new value

Lexeme statements

  • keep: omit statement ID
  • add: statement without ID, including "add":"" in JSON is tolerated but unnecessary
  • replace: statement with ID
  • remove: statement with ID and "remove":""

Qualifiers and references
These cannot be edited on their own. They are part of the statement. Modifying the statement without repeating qualifiers and references will delete them.

Forms

  • keep: omit form ID
  • add: form without ID and "add":""
  • modify: form with ID, see below
  • remove: form ID and "remove":""

Form representations
Like lemmas.

Grammatical features

  • keep: omit JSON key
  • replace all: list them in JSON
  • remove all: assign empty list

Form statements
Like lexeme statements, just nested under form in JSON.

Senses
Like forms.

Glosses
Like lemmas.

Sense statements
Not supported. All edits are ignored.
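
The form rules above can be sketched as two `data` payloads for `wbeditentity`. This is a minimal illustration, assuming the JSON field names seen in the request strings earlier in the thread. The class and method names are hypothetical, the JSON is assembled by hand with only basic escaping, and the empty `claims` object simply mirrors the earlier payload.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of WDTK: builds "data" payloads for wbeditentity.
final class LexemeEditPayloads {

    /** Add a form: no form ID, plus the "add":"" marker. */
    static String addForm(String language, String representation,
                          List<String> grammaticalFeatures) {
        String features = grammaticalFeatures.stream()
                .map(LexemeEditPayloads::quote)
                .collect(Collectors.joining(","));
        return "{\"forms\":[{\"add\":\"\","
                + "\"representations\":{" + quote(language)
                + ":{\"language\":" + quote(language)
                + ",\"value\":" + quote(representation) + "}},"
                + "\"grammaticalFeatures\":[" + features + "],"
                + "\"claims\":{}}]}";
    }

    /** Remove a form: form ID plus the "remove":"" marker. */
    static String removeForm(String formId) {
        return "{\"forms\":[{\"id\":" + quote(formId) + ",\"remove\":\"\"}]}";
    }

    // Minimal JSON string quoting: escapes backslashes and double quotes only.
    private static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
}
```

For example, `LexemeEditPayloads.removeForm("L1358-F1")` yields `{"forms":[{"id":"L1358-F1","remove":""}]}`. Forms that should be kept are simply not listed.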

@Auregann

Hello there,
I just wanted to let you know that we fixed the issue that prevented editing Senses and statements via wbeditentity (T199896), which we hope will help tool maintainers support Lexemes. We would of course love to see Wikidata Toolkit support Lexemes, as it would help grow and diversify the set of tools for editing Lexemes :)

If you have questions, issues or requests, feel free to contact me (not on this account as it's my personal one, rather at lea.lacroix@wikimedia.de) Thanks!
