Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deserializing a property with localMedia datatype fails #716

Closed
wetneb opened this issue Aug 9, 2022 · 0 comments · Fixed by #717
Closed

Deserializing a property with localMedia datatype fails #716

wetneb opened this issue Aug 9, 2022 · 0 comments · Fixed by #717

Comments

@wetneb
Copy link
Member

wetneb commented Aug 9, 2022

When trying to deserialize a property with localMedia dataype, such as this one:

        {
            "pageid": 6,
            "ns": 122,
            "title": "Property:P3",
            "lastrevid": 9,
            "modified": "2022-08-07T21:40:47Z",
            "type": "property",
            "datatype": "localMedia",
            "id": "P3",
            "labels": {
                "en": {
                    "language": "en",
                    "value": "media file"
                }
            },
            "descriptions": {},
            "aliases": {},
            "claims": {}
        }

We fail because the deserialization of the datatype fails, because the JSON encoding of datatypes is supposedly not allowed to contain capital letters: it should match some regex:

private static final Pattern JSON_DATATYPE_PATTERN = Pattern.compile("^[a-z\\-]+$");

This regex does not make much sense to me because it fails even for some datatypes that Wikidata-Toolkit is already aware of, such as commonsMedia.

I see various ways forward:

  • Add the localMedia datatype to WDTK, just like 700: add edtf DatatypeIdValue #701 did it for EDTF. While that solves this exact issue, we are not solving the underlying problem and the code will fail again, should any other such datatype be introduced in the future. Datatypes are extensible so we cannot assume we know of all the datatypes in use in Wikibase instances.
  • Allowing capital letters in the regex that validates JSON datatypes. While that also works, this breaks the current assumption that the two transformations between JSON datatypes and datatype URIs are inverse of each other: localMedia (JSON datatype) would be converted to http://wikiba.se/ontology#LocalMedia by DatatypeIdImpl.getDatatypeIriFromJsonDatatype, and then that value would be converted back to local-media (not localMedia) by DatatypeIdImpl.getJsonDatatypeFromDatatypeIri. Because WDTK stores datatypes internally as URIs and not as JSON strings, this means that serializing back the property above will result in an invalid datatype. I would be tempted to have WDTK store the original JSON datatype as well, to make sure it is preserved.
wetneb added a commit to wetneb/Wikidata-Toolkit that referenced this issue Aug 9, 2022
Because the translation between the IRI and JSON datatypes is unreliable, we need
to remember the JSON datatype as well as the IRI, so that we are able to reliably
deserialize and re-serialize properties with custom datatypes.
wetneb added a commit that referenced this issue Aug 10, 2022
Because the translation between the IRI and JSON datatypes is unreliable, we need
to remember the JSON datatype as well as the IRI, so that we are able to reliably
deserialize and re-serialize properties with custom datatypes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant