Language Identification #50
The main challenge I see right now is turning non-BCP 47 metadata into BCP 47 tags. For example, for DBL we have quite a long list of fields that together would feed into a BCP 47 tag, but doing this is far from trivial. It's not as simple as concatenating fields, because BCP 47 assumes that, e.g., we know the default script for the language.
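To illustrate why plain concatenation falls short, here is a minimal Python sketch that assembles a tag from separate language, script, region and variant values and drops the script subtag when it matches the language's Suppress-Script value (the script the IANA registry says should normally be omitted for that language). The function and parameter names are hypothetical, not actual DBL field names, and `SUPPRESS_SCRIPT` is a tiny hand-copied subset; a real converter would load that data from the registry and would also need to handle extlang, deprecated and preferred values, which this sketch ignores.

```python
# Hypothetical, hand-copied subset of Suppress-Script values for illustration.
# A real implementation should read these from the IANA subtag registry.
SUPPRESS_SCRIPT = {"en": "Latn", "fr": "Latn", "ru": "Cyrl"}


def compose_tag(language, script=None, region=None, variants=()):
    """Build a BCP 47 tag from separate metadata values (sketch only).

    Subtags are case-normalized by convention: language lowercase,
    script title case, region uppercase, variants lowercase.
    """
    lang = language.lower()
    subtags = [lang]
    if script:
        script = script.title()
        # The step plain concatenation misses: omit the script subtag
        # when it is the language's Suppress-Script (default) script.
        if SUPPRESS_SCRIPT.get(lang) != script:
            subtags.append(script)
    if region:
        subtags.append(region.upper())
    subtags.extend(v.lower() for v in variants)
    return "-".join(subtags)


print(compose_tag("en", "Latn", "US"))  # en-US  (Latn suppressed for English)
print(compose_tag("ru", "latn"))        # ru-Latn  (non-default script kept)
print(compose_tag("sr", "Cyrl", "RS"))  # sr-Cyrl-RS  (no Suppress-Script entry)
```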
We should probably discuss how much of LDML (https://www.unicode.org/reports/tr35/) we need. We should also discuss whether we want to use the same BCP 47 conformance as LDML.
Re our discussion, this is the standard endpoint for obtaining the full list of registered subtags, including Suppress-Script information:
Libraries that handle BCP 47 tags and understand this registry seem fairly ubiquitous. From a cursory look:
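For a sense of what "understanding the registry" involves, here is a minimal Python sketch that parses registry text in its record-jar format (records separated by `%%` lines, fields written as `Name: value`, and lines starting with whitespace continuing the previous field) and extracts the Suppress-Script value for each language subtag. It works on a string so that fetching is left out; `SAMPLE` is a short hand-written excerpt in the registry's format, not a copy of the live file, and the function names are hypothetical. In practice one of the existing BCP 47 libraries is likely the better choice.

```python
def parse_registry(text):
    """Parse subtag-registry text into a list of records.

    Each record is a dict mapping field name -> list of values, because
    fields such as Description can repeat within a record.
    """
    records, record, last = [], {}, None
    for line in text.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        if line.strip() == "%%":
            # A '%%' line separates records.
            if record:
                records.append(record)
            record, last = {}, None
        elif line[0].isspace() and last is not None:
            # Folded line: continues the latest value of the previous field.
            record[last][-1] += " " + line.strip()
        else:
            name, _, value = line.partition(":")
            last = name.strip()
            record.setdefault(last, []).append(value.strip())
    if record:
        records.append(record)
    return records


def suppress_scripts(records):
    """Map each language subtag to its Suppress-Script value, if it has one.

    Records without a Type (e.g. the File-Date header) are skipped.
    """
    return {
        rec["Subtag"][0]: rec["Suppress-Script"][0]
        for rec in records
        if rec.get("Type") == ["language"] and "Suppress-Script" in rec
    }


SAMPLE = """\
Type: language
Subtag: en
Description: English
Suppress-Script: Latn
%%
Type: language
Subtag: sr
Description: Serbian
%%
Type: script
Subtag: Cyrl
Description: Cyrillic
"""

print(suppress_scripts(parse_registry(SAMPLE)))  # {'en': 'Latn'}
```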
For ad hoc searches of the IANA registry: https://r12a.github.io/app-subtags/
I'm happy with what we have defined for language identification in https://github.com/bible-technology/scripture-burrito/blob/develop/schema/common.schema.json#L34-L39. However, @rdb, I notice that we don't have a scriptDirection field like we did in the XML schema. Is that intentional?
@jag3773 We do; it's just defined inline because it's not used anywhere else: scripture-burrito/schema/language.schema.json, lines 11 to 14 (at commit 089378a).
Excellent. I was looking in the wrong place, but that's clearly where I should have been looking. I'm happy with the current implementation and will close this issue. If anyone has specific language-related items that they think are not covered, please create a new issue for them.
For the record, note that I renamed the field from "bcp47" to "tag", since that describes what it is, and an "(IETF) language tag" is widely known by that name rather than by the name of the document that defines it. I also want to note that we don't currently have a "numeral system" field; we expect that information to be added to the IETF language tag using the Unicode extension syntax. I'm not 100% certain about this myself, but when I discussed it with Mark he seemed to prefer putting everything into the tag that can go into the tag, and there is some sense in that.
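In the Unicode extension syntax (the BCP 47 `u` extension, registered by RFC 6067 and specified in UTS #35), the numbering system is carried by the `nu` key, so, for example, Arabic written with Arabic-Indic digits could be tagged `ar-u-nu-arab`. The Python sketch below shows what adding and reading that key might look like; the helper names are hypothetical, and it deliberately handles only the simple cases (it refuses tags that already carry a `u` extension or a private-use section) rather than implementing full canonicalization.

```python
def with_numbering_system(tag, nu):
    """Append a Unicode 'nu' (numbering system) keyword to a language tag.

    Simplified sketch: tags that already have a '-u-' extension or a
    '-x-' private-use section are rejected instead of being merged.
    """
    lowered = tag.lower()
    if "-u-" in lowered or "-x-" in lowered:
        raise ValueError("existing -u- or -x- section not handled in this sketch")
    return f"{tag}-u-nu-{nu.lower()}"


def numbering_system(tag):
    """Return the 'nu' value from a tag's Unicode extension, or None."""
    subtags = tag.lower().split("-")
    if "x" in subtags:
        subtags = subtags[: subtags.index("x")]  # ignore private-use section
    if "u" not in subtags:
        return None
    ext = subtags[subtags.index("u") + 1 :]
    for pos, sub in enumerate(ext):
        if len(sub) == 1:
            break  # another singleton ends the 'u' extension
        # Keys are two characters; their type values are 3-8 characters.
        if sub == "nu" and pos + 1 < len(ext) and len(ext[pos + 1]) >= 3:
            return ext[pos + 1]
    return None


print(with_numbering_system("ar", "arab"))               # ar-u-nu-arab
print(numbering_system("ar-EG-u-ca-islamic-nu-latn"))    # latn
print(numbering_system("en-US"))                         # None
```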
We have defined a basic system of language identification in the documentation. It likely fills the need, but if critical pieces are missing, we should ask a group to discuss them.