proposal: Transliterated name forms #271
When we don't know the language of the string, Standards (in StdLocale) has switched from using "i-default" to using "und" as the lang tag, which the BCP47 spec defines as meaning undetermined. The 'i-xxxx' tags are not guaranteed to be carried forward in future versions of BCP47. I totally favor tacking the script type onto the original language after transliterating a string. That is definitely in the spirit of BCP47.
see: http://tools.ietf.org/html/bcp47#section-2.2.3 (esp. all the stuff around "script")
see: http://en.wikipedia.org/wiki/Romanization
see: http://en.wikipedia.org/wiki/ISO_15924
see: http://www.unicode.org/iso15924/iso15924-codes.html
When the original language is not known, what do you put for the transliterated name form script lang? This represents my best guess.
@rbarrynay, does your comment imply that FamilySearch's platform API should now be returning "und" for name langtags when the script is unspecified? I only saw one reference to "und" in the BCP47 spec, at section 4.1.5, and its use was discouraged. Instead, it was recommended that no lang tag be used at all. I know that "i-default" is in the grandfathered/irregular category, but its use seems to be more in line with "use this if you need to supply a langtag but it has not been specified". Anyway, just looking for guidance and convergence on what the unspecified langtag is for gedcomx.
The proposed modifications look good to me, but I'm looking forward to hearing back from @rbarrynay regarding the use of "und".
Standards decided not to use grandfathered BCP47 tags (the tags are defined in http://www.iana.org/assignments/language-subtag-registry/language-subtag-registry). The special-scope language tags mis (uncoded languages), mul (multiple languages), und (undetermined), and zxx (no linguistic content) have superseded the use of i-default. Concerning the use of the und tag, BCP47 section 4.1.5 states:
Standards interprets that to mean that if the client knows neither the language nor the script, a simple empty string will do as a locale string. However, the locale string parser requires a language if any other information (like region or script) needs to be expressed. So if one doesn't know the language but DOES know the script, one would use the string "und-Latn".
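For illustration only, here is a rough Python sketch of the rule as described in this thread. It is not the Standards/StdLocale code; the function name `transliterated_lang_tag` is hypothetical, and it assumes the original language (when known) is a bare language subtag such as "ja" with no script or region already attached.

```python
def transliterated_lang_tag(original_lang, script):
    """Build a lang tag for a transliterated name form (hypothetical helper).

    original_lang: bare language subtag such as "ja", or None/"" if unknown.
    script: ISO 15924 script code such as "Latn", or None/"" if unknown.
    """
    lang = (original_lang or "").strip()
    # Assumption: the grandfathered "i-default" tag is treated as "unknown".
    if lang.lower() == "i-default":
        lang = ""

    script = (script or "").strip()
    if not script:
        # Nothing to append; an empty string means "no lang tag at all".
        return lang

    if not (len(script) == 4 and script.isascii() and script.isalpha()):
        raise ValueError(f"not a 4-letter script code: {script!r}")

    # A script subtag needs a language in front of it, so fall back to "und".
    if not lang:
        lang = "und"

    # Script subtags are conventionally title-cased, e.g. "Latn".
    return f"{lang}-{script.title()}"
```

With this sketch, a Japanese name romanized into Latin script would be tagged "ja-Latn", and a romanized name whose original language is unknown would be tagged "und-Latn".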
I see, that clears things up for me. Does this imply that this is how the FamilySearch platform currently works for names?
Not sure about the FamilySearch Platform. I do know this is how Standards is supposed to be working, and Platform is calling through to Standards.
Closing this out; it has sat long enough.
Here is what I learned about transliteration of name forms, and a proposal for how to deal with them in the `lang` attribute.