Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UnicodeEncodeError using astropy.units.cds #5350

Closed
r-owen opened this issue Sep 21, 2016 · 26 comments · Fixed by #5355
Closed

UnicodeEncodeError using astropy.units.cds #5350

r-owen opened this issue Sep 21, 2016 · 26 comments · Fixed by #5355
Milestone

Comments

@r-owen
Copy link

r-owen commented Sep 21, 2016

The following two scripts fail for me at the Python 2.7.12 command line using AstroPy 1.2.1:

import astropy.units.cds
help(astropy.units.cds)

and

import astropy.coordinates
import astropy.units.cds
astropy.units.cds.enable()
astropy.coordinates.Angle("5d")

The former fails with:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/rowen/UW/LSST/lsstsw/miniconda/lib/python2.7/site.py", line 464, in __call__
    return pydoc.help(*args, **kwds)
  File "/Users/rowen/UW/LSST/lsstsw/miniconda/lib/python2.7/pydoc.py", line 1794, in __call__
    self.help(request)
  File "/Users/rowen/UW/LSST/lsstsw/miniconda/lib/python2.7/pydoc.py", line 1841, in help
    else: doc(request, 'Help on %s:')
  File "/Users/rowen/UW/LSST/lsstsw/miniconda/lib/python2.7/pydoc.py", line 1578, in doc
    pager(render_doc(thing, title, forceload))
  File "/Users/rowen/UW/LSST/lsstsw/miniconda/lib/python2.7/pydoc.py", line 1573, in render_doc
    return title % desc + '\n\n' + text.document(object, name)
  File "/Users/rowen/UW/LSST/lsstsw/miniconda/lib/python2.7/pydoc.py", line 361, in document
    if inspect.ismodule(object): return self.docmodule(*args)
  File "/Users/rowen/UW/LSST/lsstsw/miniconda/lib/python2.7/pydoc.py", line 1094, in docmodule
    for key, value in inspect.getmembers(object, inspect.isclass):
  File "/Users/rowen/UW/LSST/lsstsw/miniconda/lib/python2.7/inspect.py", line 253, in getmembers
    value = getattr(object, key)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xb0' in position 2: ordinal not in range(128)

and the latter fails with:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/rowen/UW/LSST/lsstsw/miniconda/lib/python2.7/site-packages/astropy/coordinates/angles.py", line 111, in __new__
    angle, new_unit = util.parse_angle(angle, unit)
  File "/Users/rowen/UW/LSST/lsstsw/miniconda/lib/python2.7/site-packages/astropy/coordinates/angle_utilities.py", line 348, in parse_angle
    return _AngleParser().parse(angle, unit, debug=debug)
  File "/Users/rowen/UW/LSST/lsstsw/miniconda/lib/python2.7/site-packages/astropy/coordinates/angle_utilities.py", line 44, in __init__
    _AngleParser._parser, _AngleParser._lexer = self._make_parser()
  File "/Users/rowen/UW/LSST/lsstsw/miniconda/lib/python2.7/site-packages/astropy/coordinates/angle_utilities.py", line 104, in _make_parser
    '(?:{0})'.format(x) for x in cls._get_simple_unit_names())
  File "/Users/rowen/UW/LSST/lsstsw/miniconda/lib/python2.7/site-packages/astropy/coordinates/angle_utilities.py", line 49, in _get_simple_unit_names
    u.radian.find_equivalent_units(include_prefix_units=True))
  File "/Users/rowen/UW/LSST/lsstsw/miniconda/lib/python2.7/site-packages/astropy/units/core.py", line 1438, in find_equivalent_units
    include_prefix_units=include_prefix_units)
  File "/Users/rowen/UW/LSST/lsstsw/miniconda/lib/python2.7/site-packages/astropy/units/core.py", line 1263, in compose
    max_depth=max_depth, depth=0, cached_results={}))
  File "/Users/rowen/UW/LSST/lsstsw/miniconda/lib/python2.7/site-packages/astropy/units/core.py", line 1255, in sort_results
    if str(result) != str(last_result):
  File "/Users/rowen/UW/LSST/lsstsw/miniconda/lib/python2.7/site-packages/astropy/units/core.py", line 504, in __bytes__
    return unit_format.Generic.to_string(self).encode('ascii')
UnicodeEncodeError: 'ascii' codec can't encode character u'\xb5' in position 12: ordinal not in range(128)

See also https://groups.google.com/forum/#!topic/astropy-dev/-ebCdowI_0M

@timj
Copy link
Contributor

timj commented Sep 21, 2016

If this helps, the two unicode characters in question are:

Python 3.5.2 |Continuum Analytics, Inc.| (default, Jul  2 2016, 17:52:12) 
[GCC 4.2.1 Compatible Apple LLVM 4.2 (clang-425.0.28)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> print("\xb5")
µ
>>> print("\xb0")
°

Using ascii to encode micron and degree symbol is not going to work.

@timj
Copy link
Contributor

timj commented Sep 21, 2016

On python3 the first example works fine and I get a different failure with the second example:

Traceback (most recent call last):
  File "xxx.py", line 4, in <module>
    astropy.coordinates.Angle("5d")
  File "/Users/timj/work/lsstsw3/miniconda/lib/python3.5/site-packages/astropy/coordinates/angles.py", line 111, in __new__
    angle, new_unit = util.parse_angle(angle, unit)
  File "/Users/timj/work/lsstsw3/miniconda/lib/python3.5/site-packages/astropy/coordinates/angle_utilities.py", line 348, in parse_angle
    return _AngleParser().parse(angle, unit, debug=debug)
  File "/Users/timj/work/lsstsw3/miniconda/lib/python3.5/site-packages/astropy/coordinates/angle_utilities.py", line 44, in __init__
    _AngleParser._parser, _AngleParser._lexer = self._make_parser()
  File "/Users/timj/work/lsstsw3/miniconda/lib/python3.5/site-packages/astropy/coordinates/angle_utilities.py", line 104, in _make_parser
    '(?:{0})'.format(x) for x in cls._get_simple_unit_names())
  File "/Users/timj/work/lsstsw3/miniconda/lib/python3.5/site-packages/astropy/coordinates/angle_utilities.py", line 50, in _get_simple_unit_names
    simple_units.remove(u.deg)
KeyError: Unit("deg")

@pllim
Copy link
Member

pllim commented Sep 21, 2016

When cds.enable() is invoked, d now means Julian day, not degree. The following should still work (I tested this with no problem in Python 3.5.1):

>>> import astropy
>>> from astropy import coordinates as coord
>>> from astropy.units import cds
>>> astropy.__version__
'1.3.dev16038'
>>> coord.Angle('5d')  # Without CDS
<Angle 5.0 deg>
>>> cds.enable()
<astropy.units.core._UnitContext at ...>
>>> coord.Angle('5deg')
<Angle 5.0 deg>
>>> coord.Angle('5°')
<Angle 5.0 deg>

This is due to the following mapping in cds.py:

mapping = [
    ...
    (['d'], u.d, "Julian day", ['c']),
    ((['deg', '°'], ['degree']), u.degree, "degree"),
    ...
]

@pllim
Copy link
Member

pllim commented Sep 21, 2016

For Python 2, you need to import unicode_literals for the unicode symbol to work, as follows:

>>> from astropy import coordinates as coord
>>> from astropy.units import cds
>>> coord.Angle('5°')  # Without unicode_literals
ValueError: 'ascii' codec can't decode ...
>>> from __future__ import unicode_literals
>>> coord.Angle('5°')
<Angle 5.0 deg>

As for help() in Python 2, I do indeed get the same error even with unicode_literals. One way to get around it is to use print or pprint to print the docstring (shown below). However, that still does not print everything that help() prints. A better way to read the documentation is to visit http://docs.astropy.org/en/stable/units/index.html#module-astropy.units.cds .

>>> from astropy.units import cds
>>> print(cds.__doc__)

Hope this helps.

@r-owen
Copy link
Author

r-owen commented Sep 21, 2016

I don't think the problem I originally reported is related to "d" being interpreted as Julian Day, at least when using Python 2.7 (it may explain the key error on Python 3). The following code also demonstrates the problem on Python 2.7 (and importing unicode_literals makes no difference):

from __future__ import unicode_literals  # makes no difference
import astropy.coordinates
import astropy.units
import astropy.units.cds
astropy.units.cds.enable()
astropy.coordinates.Angle("5", unit=astropy.units.deg)

This is a serious bug that prevents us from using astropy.coordinates.Angle (unless we forbid our users from using astropy.units.cds).

@pllim
Copy link
Member

pllim commented Sep 21, 2016

@r-owen , I cannot reproduce your error using the latest dev version. If this was indeed a bug, perhaps it is already fixed. Can you please try using Astropy dev and see if it works for you?

$ ipython
Python 2.7.12 |Anaconda 2.3.0 (64-bit)| (default, Jul  2 2016, 17:42:40) 
IPython 4.2.0 -- An enhanced Interactive Python.
>>> from __future__ import unicode_literals
>>> import astropy
>>> astropy.__version__
u'1.3.dev16118'
>>> from astropy import coordinates as coord
>>> from astropy import units as u
>>> from astropy.units import cds
>>> coord.Angle(5, unit=u.deg)
<Angle 5.0 deg>
>>> coord.Angle("5", unit=u.deg)
<Angle 5.0 deg>
>>> cds.enable()
<astropy.units.core._UnitContext at ...>
>>> coord.Angle(5, unit=u.deg)
<Angle 5.0 deg>
>>> coord.Angle("5", unit=u.deg)
<Angle 5.0 deg>

@timj
Copy link
Contributor

timj commented Sep 21, 2016

I would like to understand why __bytes__ is using ascii encoding. Wouldn't it be safer to use the default encoding? i.e. encode().

I can understand why unicode_literals is needed on python 2 for literal strings that have non-ASCII characters but that shouldn't be relevant for user code that is not using anything other than ASCII in string literals.

@mhvk
Copy link
Contributor

mhvk commented Sep 21, 2016

Very weird bug. Definitely an issue on 1.2.1, so if this is indeed fixed (I cannot easily test...), we need to be sure it gets backported to 1.0.11 and 1.2.2.
.
.
.
Actually, this is weirder still: I reproduce the error if I follow the example given, but do not get it when I try it first before enabling cds and then again afterwards (as @pllim did):

In [1]: from astropy.coordinates import Angle; import astropy.units as u; import astropy.units.cds as cds

In [2]: Angle("5", unit=u.deg)
Out[2]: <Angle 5.0 deg>

In [3]: cds.enable()
Out[3]: <astropy.units.core._UnitContext at 0x7fb3ff2b6e10>

In [4]: Angle("5", unit=u.deg)
Out[4]: <Angle 5.0 deg>

And without doing that:

In [1]: from astropy.coordinates import Angle; import astropy.units as u; import astropy.units.cds as cds

In [2]: cds.enable()                                                              
Out[2]: <astropy.units.core._UnitContext at 0x7f0d6a264d90>                       

In [3]: Angle("5", unit=u.deg)
---------------------------------------------------------------------------       
UnicodeEncodeError                        Traceback (most recent call last)

@pllim - did you try the exact same thing as posted by @r-owen?

@mhvk
Copy link
Contributor

mhvk commented Sep 21, 2016

OK, from the trace, it is down to __bytes__ indeed, as noted by @timj. Since __repr__ doesn't just encode as 'ascii', but as 'unicode_escape', a change seemed in order. This seems to fix the problem. PR will follow shortly.

@mhvk
Copy link
Contributor

mhvk commented Sep 21, 2016

My fix is in #5355. Note that it does not fix the help(astropy.units.cds) issue. Here, the problem is, I think, that the inspect module cannot deal with unicode attributes. I don't know if it is worth trying to work around this...

@r-owen
Copy link
Author

r-owen commented Sep 21, 2016

Thank you very much @mhvk! As to the issue with help(astropy.units.cds) I agree it is probably not worth fixing. I only mentioned it as a possible clue as to what might be wrong. Making Angle safe to use is the important thing!

@timj
Copy link
Contributor

timj commented Sep 21, 2016

On Python 3, I get the KeyError even if I explicitly specify the unit with CDS mode enabled:

>>> astropy.coordinates.Angle("5", unit=astropy.units.deg)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/timj/work/lsstsw3/miniconda/lib/python3.5/site-packages/astropy/coordinates/angles.py", line 111, in __new__
    angle, new_unit = util.parse_angle(angle, unit)
  File "/Users/timj/work/lsstsw3/miniconda/lib/python3.5/site-packages/astropy/coordinates/angle_utilities.py", line 348, in parse_angle
    return _AngleParser().parse(angle, unit, debug=debug)
  File "/Users/timj/work/lsstsw3/miniconda/lib/python3.5/site-packages/astropy/coordinates/angle_utilities.py", line 44, in __init__
    _AngleParser._parser, _AngleParser._lexer = self._make_parser()
  File "/Users/timj/work/lsstsw3/miniconda/lib/python3.5/site-packages/astropy/coordinates/angle_utilities.py", line 104, in _make_parser
    '(?:{0})'.format(x) for x in cls._get_simple_unit_names())
  File "/Users/timj/work/lsstsw3/miniconda/lib/python3.5/site-packages/astropy/coordinates/angle_utilities.py", line 50, in _get_simple_unit_names
    simple_units.remove(u.deg)
KeyError: Unit("deg")

The relevant code is:

    @classmethod
    def _get_simple_unit_names(cls):
        simple_units = set(
            u.radian.find_equivalent_units(include_prefix_units=True))
        simple_units.remove(u.deg)
        simple_units.remove(u.hourangle)
        simple_unit_names = set()
        for unit in simple_units:
            simple_unit_names.update(unit.names)
        return list(simple_unit_names)

which suggests that u.radian.find_equivalent_units changes its content when CDS is active. This code should therefore tests for membership before removing an item, or complain explicitly if degrees should be in the list.

@mhvk
Copy link
Contributor

mhvk commented Sep 21, 2016

It definitely is a tricky bug, since I think the reason @pllim didn't see it is that some caching takes place when equivalent units are first searched for, and the code itself basically worked; it was just the test with str(unit) that failed. Anyway, thanks for reporting, and for providing a nice and clear example. I'm always happy to hear that things are used!

@timj
Copy link
Contributor

timj commented Sep 21, 2016

Python 3.5.2 |Continuum Analytics, Inc.| (default, Jul  2 2016, 17:52:12) 
[GCC 4.2.1 Compatible Apple LLVM 4.2 (clang-425.0.28)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import astropy.coordinates
>>> units = set(astropy.units.radian.find_equivalent_units(include_prefix_units=True))
>>> units
{Unit("uas"), Unit("urad"), Unit("prad"), Unit("Tarcmin"), Unit("Trad"), Unit("crad"), Unit("frad"), Unit("Zdeg"), Unit("mrad"), Unit("arad"), Unit("yarcmin"), Unit("dadeg"), Unit("arcmin"), Unit("darad"), Unit("drad"), Unit("karcmin"), Unit("yrad"), Unit("Tdeg"), Unit("harcmin"), Unit("daarcsec"), Unit("Parcmin"), Unit("fdeg"), Unit("kdeg"), Unit("Zarcsec"), Unit("Grad"), Unit("deg"), Unit("zarcmin"), Unit("Mdeg"), Unit("uarcsec"), Unit("mas"), Unit("aarcsec"), Unit("Earcsec"), Unit("Yarcmin"), Unit("Yarcsec"), Unit("Ydeg"), Unit("farcsec"), Unit("Yrad"), Unit("Gdeg"), Unit("Edeg"), Unit("krad"), Unit("Zarcmin"), Unit("parcsec"), Unit("nrad"), Unit("cycle"), Unit("ndeg"), Unit("rad"), Unit("Marcsec"), Unit("carcsec"), Unit("Marcmin"), Unit("udeg"), Unit("carcmin"), Unit("narcmin"), Unit("uarcmin"), Unit("karcsec"), Unit("ydeg"), Unit("narcsec"), Unit("zdeg"), Unit("Erad"), Unit("hdeg"), Unit("Garcsec"), Unit("marcmin"), Unit("Pdeg"), Unit("darcmin"), Unit("cdeg"), Unit("mdeg"), Unit("hourangle"), Unit("adeg"), Unit("marcsec"), Unit("hrad"), Unit("Earcmin"), Unit("aarcmin"), Unit("Zrad"), Unit("ddeg"), Unit("darcsec"), Unit("Parcsec"), Unit("farcmin"), Unit("Prad"), Unit("Garcmin"), Unit("zarcsec"), Unit("parcmin"), Unit("yarcsec"), Unit("Mrad"), Unit("Tarcsec"), Unit("arcsec"), Unit("daarcmin"), Unit("pdeg"), Unit("zrad"), Unit("harcsec")}
>>> astropy.units.deg in units
True
>>> import astropy.units.cds
>>> astropy.units.cds.enable()
<astropy.units.core._UnitContext object at 0x101b2b860>
>>> cunits = set(astropy.units.radian.find_equivalent_units(include_prefix_units=True))
>>> cunits
{Unit("urad"), Unit("prad"), Unit("Garcs"), Unit("Pirad"), Unit("Trad"), Unit("Tirad"), Unit("crad"), Unit("frad"), Unit("Zdeg"), Unit("arad"), Unit("µas"), Unit("mrad"), Unit("Eiarcm"), Unit("dadeg"), Unit("darad"), Unit("drad"), Unit("darcm"), Unit("carcs"), Unit("yrad"), Unit("Tdeg"), Unit("Tiarcs"), Unit("Garcm"), Unit("Miarcs"), Unit("uarcs"), Unit("fdeg"), Unit("kdeg"), Unit("Zarcm"), Unit("Eiarcs"), Unit("Girad"), Unit("Giarcm"), Unit("narcs"), Unit("Earcs"), Unit("Grad"), Unit("yarcs"), Unit("Piarcm"), Unit("zarcm"), Unit("Mdeg"), Unit("deg"), Unit("Mideg"), Unit("farcs"), Unit("Kideg"), Unit("karcm"), Unit("karcs"), Unit("arcm"), Unit("mas"), Unit("Ydeg"), Unit("marcs"), Unit("Yrad"), Unit("Gdeg"), Unit("Tarcm"), Unit("Parcm"), Unit("yarcm"), Unit("Edeg"), Unit("Tideg"), Unit("Parcs"), Unit("krad"), Unit("farcm"), Unit("Giarcs"), Unit("nrad"), Unit("uarcm"), Unit("ndeg"), Unit("Yarcm"), Unit("Pideg"), Unit("harcs"), Unit("zarcs"), Unit("daarcs"), Unit("Eideg"), Unit("udeg"), Unit("parcm"), Unit("aarcm"), Unit("Piarcs"), Unit("rad"), Unit("daarcm"), Unit("Kirad"), Unit("Tarcs"), Unit("ydeg"), Unit("carcm"), Unit("zdeg"), Unit("aarcs"), Unit("Erad"), Unit("hdeg"), Unit("Kiarcm"), Unit("Miarcm"), Unit("Pdeg"), Unit("Earcm"), Unit("parcs"), Unit("marcm"), Unit("cdeg"), Unit("narcm"), Unit("mdeg"), Unit("Mirad"), Unit("adeg"), Unit("Yarcs"), Unit("hrad"), Unit("Marcs"), Unit("harcm"), Unit("Eirad"), Unit("Zrad"), Unit("ddeg"), Unit("Prad"), Unit("Kiarcs"), Unit("Tiarcm"), Unit("Gideg"), Unit("Mrad"), Unit("darcs"), Unit("Zarcs"), Unit("arcs"), Unit("pdeg"), Unit("zrad"), Unit("Marcm")}
>>> astropy.units.deg in cunits
False

@MSeifert04
Copy link
Contributor

@mhvk I had a quick look and caching is the problem there because the coordinates.angle_utilities._AngleParser assumes that the "equivalent units" do not change. Not sure but I think this could break for several equivalencies and additional-units modules.

@MSeifert04
Copy link
Contributor

However an easy fix for the KeyError would be to just change the _get_simple_unit_names function:

    @classmethod
    def _get_simple_unit_names(cls):
        simple_units = set(
            u.radian.find_equivalent_units(include_prefix_units=True))
        if u.deg in simple_units:
            simple_units.remove(u.deg)
        if u.hourangle in simple_units:
            simple_units.remove(u.hourangle)
        simple_unit_names = set()
        for unit in simple_units:
            simple_unit_names.update(unit.names)
        return list(simple_unit_names)

But this won't fix the caching issue. But if you like I can submit this workaround as PR.

@timj
Copy link
Contributor

timj commented Sep 21, 2016

That would work. But I'm a bit confused because if you look at the output I show above, the CDS list does include a Unit("deg") but for some reason it is not the same unit, so it's almost like deg in CDS is different to normal deg. We may want to work out why that is before protecting the remove.

@MSeifert04
Copy link
Contributor

MSeifert04 commented Sep 21, 2016

@timj That's because hash(u.cds.deg) == hash(u.deg) is False. But you're right that protecting the remove may not be the best way. I should have read it more carefully before submitting a PR.

@mhvk
Copy link
Contributor

mhvk commented Sep 21, 2016

Hmm, just goes to show that I should write regression tests that exactly reproduce the problem reported! We'll indeed need to look into that code a bit deeper to see what exactly should be removed (my guess is the cds version of deg should be removed). In case he has time, cc @mdboom since git blames him for this....

@pllim
Copy link
Member

pllim commented Sep 22, 2016

@mhvk , good catch there. No, I did not run the code exactly as @r-owen . What I ran was what I posted. It did not occur to me that running something before enable() would somehow fix the bug. Thanks. Great investigative work, everyone!

@pllim pllim added the Bug label Sep 22, 2016
@MSeifert04
Copy link
Contributor

Should this issue be reopened because of the caching-issue in angleparser/the duplicate deg unit?

@mhvk mhvk reopened this Sep 22, 2016
@mhvk
Copy link
Contributor

mhvk commented Sep 22, 2016

Yes, I hadn't realised that writing "partially fixes #" in comment would actually close something. Definitely not done yet!

@r-owen
Copy link
Author

r-owen commented Sep 22, 2016

Thanks. I was wondering. I suppose you could file a new ticket for what remains broken (I admit I'm not sure what's fixed and what's broken at this point).

@mhvk
Copy link
Contributor

mhvk commented Sep 22, 2016

@r-owen - it is the problem in python3 noted in #5350 (comment)

@mhvk
Copy link
Contributor

mhvk commented Nov 18, 2016

A workaround for the problem noted by @timj is now in #5483. As I think we decided not to worry about the help not working, and this issue has become quite confusing, I'm going to close it when #5483 is merged.

@mhvk
Copy link
Contributor

mhvk commented Nov 21, 2016

Closing since #5483 is merged. Thanks all for helping to track these odd issues down!

@mhvk mhvk closed this as completed Nov 21, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants