Core model multilingual fields #2678
@davidread @brew @rossjones @joetsoi @wardi let me know what you think, especially about point 1 as it relates to #2668
👍 and I'm fine with the For a field like frequency I'd rather use a controlled list that stores a single string value with the translations in the schema, like how the scheming 'select' preset works, not store all the languages in every record, so that might not be a great example. Is resource format a multilingual field? I thought those were file extensions, essentially |
Same here (+1) -> _lang makes a lot of sense, but I think having a controlled list for frequency is really a different issue. Even if it were a fixed list, it would presumably have a translation. WRT your question on could we do this for the core fields in API v4 even if an instance is single-language - I'd say yes, even though it possibly provides more work for the caller - I'd guess ckanapi (and libs in other languages) could take care of abstracting this away if there's only one language, or by allowing a user to specify a language and defaulting to en. |
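A rough sketch of how a client library could hide the extra fields from callers who only want one language. The function name `flatten_translations`, the `_lang` suffix and the `"en"` default are assumptions for illustration; this is not part of ckanapi or CKAN.

```python
def flatten_translations(record, lang="en", suffix="_lang"):
    """Collapse each '<field><suffix>' dict in a record to a plain string.

    Hypothetical client-side helper: the requested language wins, otherwise
    the plain field is kept, otherwise any available translation is used.
    """
    flat = {}
    for key, value in record.items():
        if key.endswith(suffix) and isinstance(value, dict):
            base = key[: -len(suffix)]
            if lang in value:
                # Requested language is available: use it.
                flat[base] = value[lang]
            elif base not in record and value:
                # No plain field to fall back on: take any translation.
                flat[base] = next(iter(value.values()))
        else:
            # Plain field: keep it unless a translation was already chosen.
            flat.setdefault(key, value)
    return flat


record = {
    "name": "example-dataset",
    "title": "Hello",
    "title_lang": {"en": "Hello", "fr": "Bonjour"},
}
french = flatten_translations(record, lang="fr")
```

The sketch only looks at top-level keys; nested dicts such as resources would need the same treatment applied to each of them.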
@rossjones you're right, frequency as a controlled list is a different discussion completely. I think we can live with |
Thanks for making this proposal. So to be clear we'd see something like this in the API:
And the "title" could be in whatever language - no guarantees. But it should be a repeat of one of those in title_lang. I think that's a good standard to agree to. I think this layout makes more sense than having a 'multilingual' element at the top level with the multilingual versions of everything below it. I believe that is what the Pan-European Data Portal is doing - they said in April they are lumping the translations of all fields into a single extra field. If the API had it that way too then, e.g., the translated resource titles would not be under the resource, which would not be very nice. Regarding a potential new API version, I'm not keen on making "title" multilingual for the few sites that need it. I'd rather that dict stayed in the "title_lang" field. Then the majority of clients can continue to use the simple "title" field as normal, and the few that really care about multi-language can take advantage of the "title_lang" field, if it is there. These little barriers to simple API use all add up. Now, assuming we agree on this, what's the next step? Do we just document it and encourage existing sites (Ca, EU Publications Office) to move that way?
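To make the layout described above concrete, here is an illustrative record with a check for the "title repeats one of the title_lang values" rule. The dataset values, the resource `name_lang` field and the function `title_is_consistent` are made up for the example and are not taken from CKAN.

```python
def title_is_consistent(record, field="title", suffix="_lang"):
    """Check that the plain field repeats one of its translations.

    Hypothetical validation helper. A record without the suffixed field is
    treated as single-language and passes trivially.
    """
    translations = record.get(field + suffix)
    if translations is None:
        return True
    return record.get(field) in translations.values()


# Illustrative API output: translations sit next to the field they belong to,
# including inside each resource, rather than in one top-level element.
dataset = {
    "name": "example-dataset",
    "title": "Example dataset",
    "title_lang": {"en": "Example dataset", "fr": "Exemple de jeu de données"},
    "resources": [
        {
            "name": "Data file",
            "name_lang": {"en": "Data file", "fr": "Fichier de données"},
        }
    ],
}

ok = title_is_consistent(dataset) and all(
    title_is_consistent(res, field="name") for res in dataset["resources"]
)
```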
The next step that's important to me is just having the core ckan templates for breadcrumbs and other places look for a "title_lang" field and use it if it's there. Everything else is easy to do from an extension. |
Apologies for the Comments:
In most cases yes. There are a couple of translatable names in our default values like "Atom feed", and when harvesting DCAT datasets or others I've definitely ended up with stuff like "Comma Separate Values" or "Web Map Service". Probably not a big deal to have it multilingual or not.
Can you clarify how fluent would handle this @wardi ? I'd really expect to always have the same language on these fields, definitely if you have
I really don't want to have
I now realized that we will need Group/Org title and description as well. |
@amercader fluent would detect that it's being used on a core field (scheming already does this to work with extras) and if so, switch to storing its values in Next it adds an output validator to the core field This approach has one drawback: the title etc. stored in the database won't match the one returned from the API. Do you think that will matter? For the helper, yes, let's just create a |
@wardi I might be missing something, but can't the appropriate language for
What is the value for |
@amercader at least for datasets we return the json straight from solr in package_show rather than passing it through the validation logic, so we'd be making everything slower just to potentially swap in a language. I also prefer the results from the API to be independent of a user's language. I'm not used to language being an API parameter. What would you like stored in |
Sorry, this slipped my mind. Regarding the DB not having the same value as the API, the only issue I can think of is that if you are migrating the db to a new instance and don't have the fluent extension enabled, you are going to end up with an empty title. But maybe this is an edge case. |
@amercader the |
Sounds sensible. I was trying to avoid changing the templates too much with helper calls, but it looks like it is indeed the best approach. |
Now the hard decision:
|
Why do we need the A caller of the API would have to check what kind of value is returned, but I don't think that's bad, because if we have separate fields, what language should be returned in the We currently try to implement exactly that, if you want to have a look: https://github.com/ogdch/ckanext-switzerland/blob/3a7279cec1d7b166d4658f48d6d98b4227037a69/ckanext/switzerland/plugin.py#L175-L201 (still work in progress). |
@metaodi Because API clients expect the CKAN API to return a title as a string, and changing that to a dict would break existing users of the API. It also complicates the writing of all clients, whether they are particularly interested in making use of the alternative translations or not. |
My understanding is that this method is not for getting the name of the language. You're getting a field's value, ideally in a particular language, but falling back to another language. So something along these lines:
I'd also prefer suffix |
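A minimal sketch of a lookup helper with the fallback behaviour described above. The name `get_translated` matches the helper mentioned later in this thread, but the explicit `lang` argument and the `_lang` default suffix are assumptions made to keep the example self-contained; a real template helper would more likely take the language from the current request.

```python
def get_translated(data_dict, field, lang, suffix="_lang"):
    """Return the value of a field in the given language, with fallback.

    Looks for '<field><suffix>' first; if the dict or the language is
    missing, falls back to the plain single-string field.
    """
    translations = data_dict.get(field + suffix)
    if isinstance(translations, dict) and lang in translations:
        return translations[lang]
    return data_dict.get(field, "")


pkg = {"title": "Hello", "title_lang": {"en": "Hello", "fr": "Bonjour"}}
org = {"title": "Some organization"}  # no translations stored

titles = [
    get_translated(pkg, "title", "fr"),  # translated value
    get_translated(pkg, "title", "de"),  # falls back to the plain title
    get_translated(org, "title", "fr"),  # works on dicts without translations
]
```

Because the lookup only depends on the field name plus a suffix, the same function works for datasets, resources, groups and organizations.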
@davidread but that's why we have API versions, right? We just define that on API version X each field is either a string or a dict. I guess in the future it should always be a dict, even if you only use one language. Or you could specify the language as a parameter and then be sure to only get strings. |
@metaodi see points 1 and 2 on my original comment regarding API versions. These changes are focused on the current (v3) version of the API. For clients expecting a string value for |
I'm also +1 in the name reflecting that you are getting the value. The more explicit name for me is
but happy to shorten it to This will work for all kinds of dicts right (resource, orgs, etc)? as it will only look for a Perhaps to be consistent then I'd prefer the field suffix to be How does this sound? |
Those names suit me |
at least one of the values isn't translated :-) I read |
I still prefer |
does anyone object to an |
Change package templates to use helper
Here is a PR to implement this in fluent ckan/ckanext-fluent#20 |
@LaurentGoderre we've settled on |
Gotcha! |
fluent now supports this. What is the status on the templates changes for this? Can I help? |
There are a few more I have to double check, but I should just create a PR at this point and get it merged. |
open the PR and I can probably help test it |
[#2678] Add core translation get_translated helper
Change package templates to use helper Conflicts: ckan/lib/helpers.py ckan/templates/package/confirm_delete.html ckan/templates/package/read.html
The consensus for handling multilingual metadata fields seems to be to follow the way ckanext-fluent handles them.
For custom fields this looks uncontroversial:
There is an issue though with using this approach on existing core fields like title or notes. Up until this point these have always had a single string value, and changing this to return a dict on some occasions would break things, both in core (e.g. in templates) and in extensions.
For dealing with this in the current API version I'm proposing that for these particular core fields that need to be translated we add a separate field:
I have no strong preference for what the _lang suffix is actually called (_trans, _i18n...).
The list of fields that would need these is:
Just to clarify, for now it would be ckanext-fluent (or any other extension) that creates these fields; no changes would be made to CKAN core.
Moving forward, on a new API version, we can decide whether the same pattern can be applied directly to core fields as well:
The main issue I see is what happens with those CKAN instances (the majority) that don't handle multilingual metadata. Do we allow string and dict values on the same field? (that doesn't sound like a good idea tbh). Do we always enforce a language key, even if there is only one for the instance default locale? eg:
or
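For the "always enforce a language key" option, one way to picture it is a normalization step that wraps plain strings under the instance's default locale, so that consumers only ever see dicts. This is a sketch only; `as_language_dict` and the `"en"` default are hypothetical names, not an existing CKAN function or setting.

```python
def as_language_dict(value, default_locale="en"):
    """Coerce a field value to a {language: text} dict.

    Hypothetical normalizer: dicts are copied as they are, anything else is
    wrapped under the default locale.
    """
    if isinstance(value, dict):
        return dict(value)
    return {default_locale: value}


single = as_language_dict("My dataset")
multi = as_language_dict({"en": "My dataset", "fr": "Mon jeu de données"})
```

The cost is the one raised in the question above: single-language instances would get a dict with exactly one key for every translatable field.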