Fails on 'es-419' #10

Closed
TomAnthony opened this issue Jul 12, 2015 · 21 comments
Comments

@TomAnthony

The library fails to recognise 'es-419' as a value, but it is an official value for IETF language tags. I don't entirely understand the difference between IETF and IANA, but I feel this value should be recognised.

@cahytinne
Contributor

I've tested the tag 'es-419' and the library recognizes it.

from language_tags import tags
tag = tags.tag('es-419')
print(tag.valid)
> True
print(tag)
> es-419
print(tag.descriptions)
> [u'Latin American Spanish']
print(tag.data)
> {'record': {u'Added': u'2005-07-15', u'Tag': u'es-419', u'Type': u'redundant', u'Description': [u'Latin American Spanish']}, 'tag': 'es-419'}

Which operating system are you using?
Can you post the error you're receiving? Thank you.

@TomAnthony
Author

Thanks for the quick reply. I'm using this on both OS X and Ubuntu.

So I think this is my mistake, but I am still a bit confused. The 'es-419' tag behaves very differently to the 'es-es' tag (for example), which is why I thought it was failing (see below). If you could shed any light that would be great. In the meantime I'll go and take a longer look at the docs for the JS version.

Thanks again.

from language_tags import tags
tag = tags.tag('es-419')
tag.type
> u'redundant'
tag.descriptions
> [u'Latin American Spanish']
tag.data
> {'record': {u'Added': u'2005-07-15', u'Tag': u'es-419', u'Type': u'redundant', u'Description': [u'Latin American Spanish']}, 'tag': 'es-419'}
tag.language.description
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> AttributeError: 'NoneType' object has no attribute 'description'

tag = tags.tag('es-es')
tag.type
> 'tag'
tag.descriptions
> []
tag.data
> {'tag': 'es-es'}
tag.language.description
> [u'Spanish', u'Castilian']

@koenedaele
Member

Thanks for posting the code.

I think it behaves differently because it's a redundant tag. But I do agree that it's not the cleanest solution at the moment. We'll have a further look to see how we can improve handling this case.

@koenedaele added this to the 0.4.0 milestone Jul 14, 2015
@TomAnthony
Author

Yeah, so I've done some more reading to make sure I understand the topic (http://www.w3.org/International/articles/language-tags/). It feels like the sub-tags should still be available, even if the type is set to redundant as some places are still using those tags.

The example I have for es-419 is that Google uses it in the rel-alternate-hreflang annotations in the source of pages on the Google Play store (e.g. https://play.google.com/store/apps/details?id=com.babbel.mobile.android.en&hl=en).

@koenedaele
Member

@cahytinne How does this work in the original JS version? Are they also ignoring the subtags if a tag is redundant?

@TomAnthony
Author

@koenedaele @cahytinne I just checked this, and it seems the JS version behaves the same way as the Python version. I'm not sure it makes sense though, as it essentially prevents someone from parsing tags found in the wild.

> tag = tags('es-es')
{ data: { tag: 'es-es' } }
> tag.subtags()
[ { data:
     { subtag: 'es',
       record: [Object],
       type: 'language' } },
  { data:
     { subtag: 'es',
       record: [Object],
       type: 'region' } } ]
>
>
> tag = tags('es-419')
{ data:
   { tag: 'es-419',
     record:
      { Type: 'redundant',
        Tag: 'es-419',
        Description: [Object],
        Added: '2005-07-15' } } }
> tag.subtags()
[]

@koenedaele
Member

We did more or less port the js version to python as directly as we could, so it makes sense they're the same.

Can anyone see a good reason why the subtags aren't being generated for redundant tags (and possibly tags of other types)?

Might be interesting to involve the author of the original JS library as well.

@koenedaele
Member

From RFC 5646:

Many of these registered tags were made redundant by the advent of either RFC 4646 or this document. A redundant tag is a grandfathered registration whose individual subtags appear with the same semantic meaning in the registry. For example, the tag "zh-Hant" (Traditional Chinese) can now be composed from the subtags 'zh' (Chinese) and 'Hant' (Han script traditional variant). These redundant tags are maintained in the registry as records of type 'redundant', mostly as a matter of historical curiosity.

If I read this correctly, it means that a redundant tag used to be a separate tag, but now it's just a regular tag that can be composed from its relevant subtags. Which would indicate that es-419 should just be treated as language es in region 419. And at the tag level it would be nice to mark it as being a redundant tag for the sake of completeness. I'm not a native English speaker though, so maybe I'm reading this wrong.
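
A rough sketch of that reading, for illustration only: split a well-formed tag on hyphens and classify each piece by its shape, following the RFC 5646 syntax (a 4-letter subtag after the language is a script; a 2-letter or 3-digit subtag is a region). The name `split_tag` is made up here and is not part of language_tags, there is no registry lookup, and extlangs, variants, extensions and private-use subtags are all lumped together as 'other'.

```python
import re


def split_tag(tag):
    """Split a language tag into (subtag, type) pairs by shape alone.

    Simplified sketch: it does not check that a subtag is registered.
    """
    parts = tag.split('-')
    # The first subtag of a well-formed regular tag is the language.
    result = [(parts[0], 'language')]
    seen_script = False
    seen_region = False
    for part in parts[1:]:
        if (not seen_script and not seen_region
                and re.fullmatch(r'[A-Za-z]{4}', part)):
            result.append((part, 'script'))
            seen_script = True
        elif not seen_region and re.fullmatch(r'[A-Za-z]{2}|[0-9]{3}', part):
            result.append((part, 'region'))
            seen_region = True
        else:
            # Extlang, variant, extension, private use: not handled here.
            result.append((part, 'other'))
    return result


print(split_tag('es-419'))   # [('es', 'language'), ('419', 'region')]
print(split_tag('zh-Hant'))  # [('zh', 'language'), ('Hant', 'script')]
```

Under this reading, 'es-419' decomposes exactly like 'es-es' does; the redundant record would only be extra information at the tag level.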

@koenedaele
Member

From looking at the data with @cahytinne it seems that some redundant tags do have a preferred value (eg. sgn-NL becomes dse).

So, if a redundant tag has a preferred value, use that. If not, allow splitting of the tag into subtags?

The grandfathered tags seem to be tags that are mostly invalid (eg. sgn-BE-FR) or that might be lexically valid but have a different meaning. Most of them have a preferred value, but a few don't (eg. cel-gaulish or i-default). So in the case of grandfathered tags, use the preferred value if present. But don't split the tag into subtags.
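
To make the proposed rule concrete, here is a minimal sketch. The `REGISTRY` dict is a tiny stand-in holding only the tags mentioned in this thread, and `plan` is a hypothetical helper, not something that exists in language_tags. Lookup is by exact string, which the real registry does not require.

```python
# Stand-in for the IANA registry: only entries mentioned in this thread.
REGISTRY = {
    'es-419': {'type': 'redundant', 'preferred': None},
    'sgn-NL': {'type': 'redundant', 'preferred': 'dse'},
    'cel-gaulish': {'type': 'grandfathered', 'preferred': None},
    'i-default': {'type': 'grandfathered', 'preferred': None},
}


def plan(tag):
    """Return (action, value) for a tag under the rule proposed above."""
    record = REGISTRY.get(tag)
    if record is None:
        # Not a redundant/grandfathered registration: a regular tag.
        return ('split', tag)
    if record['preferred'] is not None:
        # Redundant or grandfathered with a preferred value: use it.
        return ('use_preferred', record['preferred'])
    if record['type'] == 'redundant':
        # Redundant without a preferred value: split into subtags.
        return ('split', tag)
    # Grandfathered without a preferred value: keep the tag whole.
    return ('keep_whole', tag)


print(plan('sgn-NL'))     # ('use_preferred', 'dse')
print(plan('es-419'))     # ('split', 'es-419')
print(plan('i-default'))  # ('keep_whole', 'i-default')
```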

@TomAnthony
Author

I feel like it should be possible to split into subtags even if they are redundant. Preventing splitting them may make sense for writing scenarios, but if I am trying to read tags (as per the Google Play store) not being able to parse the tag makes the library useless.

@mattcg

mattcg commented Jul 14, 2015

Hey, interesting discussion! Is this the consensus:

  • if the tag is redundant, behave normally
  • if the redundant tag has a preferred value, call tag.preferred() to get it

If so, I don't think it would be too difficult for me to implement in the JS version.

@TomAnthony
Author

I think that makes sense, then you have the best of both worlds for both reading and writing.

@koenedaele
Member

Seems like the most sensible option to me. So, looks like we all agree. Cool. I think we can implement this change later this week.

@koenedaele
Member

Change looks ok to me. @TomAnthony or @mattcg Do either of you two have any comments? If not, I'll merge that branch with master.

@TomAnthony
Author

It looks good to me, I think. :)

Awesome work guys, thanks a lot for the quick turnaround.

@koenedaele
Member

You're welcome. I'm going to wait to hear from @mattcg and then I'll merge.

@mattcg

mattcg commented Jul 23, 2015

Thanks, @cahytinne. The only thing I'm not so sure about is subtags() returning the subtags from the preferred value if it has one. Maybe the user actually does want the valid subtags from the redundant tag and not the preferred value subtags.

Also, I think with this change calling format() after subtags() and then format() again will yield different results, as subtags() is changing the underlying data.
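
A toy illustration of that second concern. The branch under review isn't shown in this thread, so this is not the actual code: `MutatingTag` is a hypothetical class that just mimics a subtags() that swaps in the preferred value as a side effect.

```python
class MutatingTag:
    """Hypothetical tag whose subtags() rewrites the stored tag."""

    # Stand-in for the registry's preferred values (sgn-NL -> dse is
    # the example given earlier in this thread).
    PREFERRED = {'sgn-NL': 'dse'}

    def __init__(self, tag):
        self.tag = tag

    def format(self):
        return self.tag

    def subtags(self):
        # Side effect: the original tag is replaced by its preferred value.
        self.tag = self.PREFERRED.get(self.tag, self.tag)
        return self.tag.split('-')


tag = MutatingTag('sgn-NL')
before = tag.format()
tag.subtags()
after = tag.format()
print(before, after)  # sgn-NL dse
```

The same object formats differently depending on whether subtags() has been called yet, which is the kind of surprise a read-only accessor should avoid.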

@mattcg

mattcg commented Jul 23, 2015

I'd just eliminate the check for the preferred value and go with the logic from comment #10 (comment).

@TomAnthony
Author

Yeah, that is a good point. Automatically pulling the preferred value is probably a bad idea for parsing old tags.

@cahytinne
Contributor

Ok, I removed the check for the preferred value.

It is indeed an improvement. If the subtags of the preferred value are needed, then another tag can be created with the value of the preferred value of the redundant/grandfathered tag.
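
Roughly, the usage pattern would look like the sketch below. This is a self-contained toy model, not the library itself: the `Tag` class, its `PREFERRED` table and the method names are assumptions made for illustration, and the real API may differ in names and return types.

```python
class Tag:
    """Toy model: subtags() never consults the preferred value."""

    # Stand-in for the registry's preferred values (sgn-NL -> dse is
    # the example given earlier in this thread).
    PREFERRED = {'sgn-NL': 'dse'}

    def __init__(self, tag):
        self.tag = tag

    def format(self):
        return self.tag

    def subtags(self):
        # Always the tag's own subtags, with no side effects.
        return self.tag.split('-')

    def preferred(self):
        # A new Tag for the preferred value, or None if there is none.
        value = self.PREFERRED.get(self.tag)
        return Tag(value) if value is not None else None


old = Tag('sgn-NL')
print(old.subtags())              # ['sgn', 'NL']
new = old.preferred()
print(new.subtags())              # ['dse']
print(old.format())               # sgn-NL
print(Tag('es-419').preferred())  # None
```

Reading an old tag and upgrading it to its preferred value stay two separate, explicit steps.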

@koenedaele
Member

If we go this way, can we add an example of this usage to the docs as well?
