Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improvements to the recent unicode changes #137

Merged
merged 1 commit into from
Jan 24, 2019

Conversation

pelson
Copy link
Member

@pelson pelson commented Jan 22, 2019

#135 enabled unicode support, but it was quite limited for py2 users:

>>> import cf_units; cf_units.Unit(u'm²')
Unit('m?')

This change handles unicode much more uniformly, and only converts back to str at __str__ and raising exceptions. As a general rule, I've gone for "unicode in unicode out, str in str out". That essentially means that type(unit.origin) will be the same as the original type that you constructed the Unit with (unless it was an integer, in which case you will get str).

>>> type(cf_units.Unit('m').origin)
<type 'str'>

>>> type(cf_units.Unit(u'm²').origin)
<type 'unicode'>

>>> type(cf_units.Unit(2).origin)
<type 'str'>

The error handling is now far superior also:

>>> type(cf_units.Unit(u'ø'))
...
ValueError: [UT_UNKNOWN] Failed to parse unit "ø"

>>> type(cf_units.Unit(u'('))
Traceback (most recent call last):
...
ValueError: [UT_SUCCESS] Failed to parse unit "("

>>> type(cf_units.Unit('('))
Traceback (most recent call last):
...
ValueError: [UT_SUCCESS] Failed to parse unit "("

One of the main changes here is the introduction of the dunder __unicode__ method - __str__ must always return a string but we can choose to implement __unicode__ which can then be encoded to a str afterwards.

I'll be honest and say that this definitely has not been a fun experience. It has taken many hours longer than I'd hoped, and I've ended up finding some ugly warts in py2's str implementation (I knew they were there, but I hadn't realised how bad they were until I'd seen the py3 light 🌞 ).

This whole thing will be avoided on the day we drop py2 support. A decision is definitely needed in that regard. Perhaps we should follow suit from SciTools/cartopy#1186.

@coveralls
Copy link

coveralls commented Jan 22, 2019

Coverage Status

Coverage increased (+1.9%) to 90.255% when pulling 3767bdd on pelson:unicode_pt2 into 1e9af2a on SciTools:master.

@pelson
Copy link
Member Author

pelson commented Jan 23, 2019

@bjlittle

@pelson pelson mentioned this pull request Jan 23, 2019
__unicode__ = __str__

def __str__(self):
return unicode(self).encode('utf8')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pelson Yup, nice.

@bjlittle
Copy link
Member

@pelson Thanks! 👍

@bjlittle bjlittle merged commit 659e994 into SciTools:master Jan 24, 2019
@pelson pelson deleted the unicode_pt2 branch January 24, 2019 10:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants