Problem with Lexvo recommended URIs for natural language property #507

Open
jonquet opened this issue Mar 29, 2024 · 5 comments
Labels
enhancement metadata Issues related to metadata edition/curation

Comments

@jonquet
Contributor

jonquet commented Mar 29, 2024

It seems Lexvo does not fully support ISO-639-1.
For example, it does not support Brazilian Portuguese:

Portuguese URI
http://lexvo.org/id/iso639-1/pt
Brazilian Portuguese
http://lexvo.org/id/iso639-1/pt-br

According to https://www.andiamo.co.uk/resources/iso-language-codes/, these codes are part of ISO-639-1.
I was misled by https://en.wikipedia.org/wiki/List_of_ISO_639_language_codes, thinking it was complete, but in fact it is not.

We will need to switch to another set of URIs.

@jonquet jonquet added enhancement metadata Issues related to metadata edition/curation labels Mar 29, 2024
@jonquet
Contributor Author

jonquet commented Mar 29, 2024

Maybe check: https://glottolog.org/glottolog/language

@jonquet
Contributor Author

jonquet commented Apr 22, 2024

Other possible URIs come from the Library of Congress (LOC), e.g., http://id.loc.gov/vocabulary/iso639-1/pt
But it does not contain the composite codes either: http://id.loc.gov/vocabulary/iso639-1/pt-br

@syphax-bouazzouni

@jonquet, you should read this: https://stackoverflow.com/questions/19288173/is-there-a-free-available-document-with-most-iso-639-languages-codes

You can get a full list of ISO639-1 codes as SKOS concepts (RDF) in various formats from the Library of Congress website: http://id.loc.gov/vocabulary/iso639-1.html. ISO639-2, a more complete list of 3-letter language codes (over 500 vs 180 for ISO639-1), is also available on the website.

The "pt-BR" code for Brazilian Portuguese you mention is not actually the ISO639-1 code, but a composite code made up of the ISO639-1 code for portuguese "pt" and the ISO3166-1 country code for Brazil "BR". These are combined following best practice defined in RFC5646: https://www.rfc-editor.org/rfc/rfc5646 .
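To make the RFC 5646 composition concrete, here is a minimal Python sketch that splits a tag such as pt-br into its primary language subtag and its region subtag. It handles only a simplified subset of the RFC 5646 grammar (language plus an optional region; script, variant and extension subtags are ignored), and the names `parse_tag` and `LanguageTag` are hypothetical, not part of any existing library.

```python
import re
from typing import NamedTuple, Optional

# Simplified subset of the RFC 5646 "langtag" grammar (hypothetical helper):
# a 2-3 letter primary language subtag, optionally followed by a region subtag
# that is either an ISO 3166-1 code (2 letters, e.g. "BR") or a UN M.49 code
# (3 digits, e.g. "419"). Script, variant, extension and private-use subtags
# are NOT handled here.
_LANG_REGION = re.compile(r"^([A-Za-z]{2,3})(?:-([A-Za-z]{2}|[0-9]{3}))?$")


class LanguageTag(NamedTuple):
    language: str          # primary language subtag, e.g. "pt"
    region: Optional[str]  # region subtag, e.g. "BR", or None if absent


def parse_tag(tag: str) -> LanguageTag:
    """Split a tag such as 'pt-br' into its language and region subtags."""
    match = _LANG_REGION.match(tag.strip())
    if match is None:
        raise ValueError(f"unsupported language tag: {tag!r}")
    language, region = match.groups()
    # Tags are case-insensitive; by convention the language subtag is written
    # in lowercase and the region subtag in uppercase ("pt-BR").
    return LanguageTag(language.lower(), region.upper() if region else None)


print(parse_tag("pt-br"))  # LanguageTag(language='pt', region='BR')
```

Only the language part (pt) has an ISO 639-1 URI such as http://lexvo.org/id/iso639-1/pt; the region part (BR) comes from ISO 3166-1, which is why no iso639-1 URI exists for pt-br.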

@jonquet
Contributor Author

jonquet commented Apr 22, 2024

Good catch. Indeed, this is a good explanation of how the tag is built...
The other thread, w3c/i18n-discuss#13, also confirmed that there are no URIs for "subtags".
So let's move on with this and find a local solution.

@jonquet
Contributor Author

jonquet commented Apr 22, 2024

I propose we relax the back-end rule that makes a URI mandatory for the naturalLanguage property.
We should also modify our popup selector to offer pt-BR as an additional option, and find a way to add a Brazilian flag for it.
What we need in the end is for the multilingual support to work with pt and pt-BR as if they were two completely different languages.

@syphax-bouazzouni This is open to discussion: can we implement it in a way that avoids coming back to the code each time we need to add a "subtag", while still making sure the languages do not end up messy? For instance, we could allow languages to be proposed as ISO-639-3 codes (3 letters) and as subtags (2 letters-2 letters)...
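A minimal Python sketch of the relaxed back-end rule proposed above, assuming the naturalLanguage value is stored as a plain tag string and a URI is attached only when one is known. The function names `normalize_natural_language` and `natural_language_uri` are hypothetical, and the check covers only the shape of the value (2-letter code, 3-letter code, or language-region pair), not membership in the official ISO code lists.

```python
import re
from typing import Optional

# Hypothetical rule for the naturalLanguage property (not existing code):
# accept a 2-letter ISO 639-1 code ("pt"), a 3-letter ISO 639-3 code ("tpi"),
# or a language-region pair ("pt-BR"). Only the shape is checked; the codes
# are NOT validated against the real ISO 639 / ISO 3166 registries.
_ALLOWED = re.compile(r"^([A-Za-z]{2,3})(?:-([A-Za-z]{2}))?$")

# URI pattern from the Library of Congress example in this thread; such URIs
# exist only for bare ISO 639-1 codes, not for composite tags like "pt-br".
LOC_ISO639_1 = "http://id.loc.gov/vocabulary/iso639-1/{}"


def normalize_natural_language(value: str) -> str:
    """Validate a tag and return it as lowercase language + uppercase region."""
    match = _ALLOWED.match(value.strip())
    if match is None:
        raise ValueError(f"invalid naturalLanguage value: {value!r}")
    language, region = match.groups()
    return language.lower() + (f"-{region.upper()}" if region else "")


def natural_language_uri(tag: str) -> Optional[str]:
    """Return a URI when one is known, else None (the tag is kept as a literal)."""
    if re.fullmatch(r"[a-z]{2}", tag):  # bare, already-normalized ISO 639-1 code
        return LOC_ISO639_1.format(tag)
    return None


# Usage: "pt" gets a URI, while "pt-BR" is accepted without one.
for raw in ["pt", "pt-br"]:
    tag = normalize_natural_language(raw)
    print(tag, natural_language_uri(tag))
```

Because the canonical form keeps the region, pt and pt-BR stay distinct values, which is what the multilingual support needs; under this sketch, supporting a new subtag requires no code change, only a well-formed tag.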


2 participants