Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for multilingual RDF #55

Open
amercader opened this issue Jan 19, 2016 · 5 comments
Open

Support for multilingual RDF #55

amercader opened this issue Jan 19, 2016 · 5 comments

Comments

@amercader
Copy link
Member

Right now, neither the parsers nor the serializers take multilingual metadata into account.

For instance given the following document, a random title among the three will be picked up during parsing time:

@prefix rdfs:    <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dcat:    <http://www.w3.org/ns/dcat#> .
@prefix dct:     <http://purl.org/dc/terms/> .
@prefix xsd:     <http://www.w3.org/2001/XMLSchema#> .
@prefix rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix skos:    <http://www.w3.org/2004/02/skos/core#> .

<http://data.london.gov.uk/dataset/Abandoned_Vehicles>
      a       dcat:Dataset ;
      dct:title "Abandoned Vehicles"@en ;
      dct:title "Vehículos Abandonados"@es ;
      adms:versionNotes "Some version notes"@en ;
      adms:versionNotes "Notas de la versión"@es ;

      ...

Parsing

The standard way of dealing with this seems to be to create metadata during the parsing that can be handled by ckanext-fluent when creating or updating the datasets.
This essentially means storing a dict instead of a string, with the keys being the language codes:

{

    "version_notes": {
        "en": "Some version notes",
        "es": "Notas de la versión"
    }
    ...

}


For core fields like title or notes, we need to add an extra field suffixed with _translated:

    "title": "",
    "title_translated": {
        "en": "Abandoned Vehicles",
        "es": "Vehiculos Abandonados"
    }
    ...

TODO: what to put in title?

To support it we can proabably have a variant of _object_value that handles the lang tags and returns a dict accordingly (RDFLib will return a different triple for each language).

Serializing

Similarly, the serializing code could check the fields marked as multilingual to see if they are a string or a dict and create triples accordingly, proabably via a helper function.

Things to think about:

@amercader
Copy link
Member Author

@wardi does that sound right? Also see the TODO above, does it matter what we put in there?

@metaodi
Copy link
Member

metaodi commented May 23, 2016

@amercader We start to implement this for DCAT-AP Switzerland, I'll keep you posted. We currently use the ckanext-fluent approach.

@amercader
Copy link
Member Author

Fantastic @metaodi! Let me know if you want me to help with some spec or discussion

@metaodi
Copy link
Member

metaodi commented Sep 15, 2016

Btw: here is the implementation of our multilingual DCAT-AP Switzerland profile: https://github.com/ogdch/ckanext-switzerland/blob/01652937c8f31f46d8560ab9527826a3c1523c06/ckanext/switzerland/dcat/profiles.py

Behind the scenes we use ckanext-scheming for validation/schema.

The main change to the "original" is the new parameter multilang in the _object_value method. We simply use this for all values where we expect multilingual values.

@amercader
Copy link
Member Author

Note there are two ongoing PRs with initial implementations:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Todo
Development

No branches or pull requests

2 participants