Fix #463 by using io.open #562

dankolbman · 2016-01-04T01:21:22Z

No description provided.

f0k · 2016-01-04T12:57:53Z

setup.py

@@ -2,6 +2,7 @@
 import re
 from setuptools import find_packages
 from setuptools import setup
+from io import open


I'd prefer import io and using io.open below, otherwise it looks like you're using the builtin open.

dankolbman · 2016-01-04T16:09:11Z

Follow up to previous note, it looks like codecs.open() is the preffered way to open files if we ever want to keep compatability for 2.5 and lower.

f0k · 2016-01-04T16:52:07Z

setup.py

-    version = re.search('__version__ = "(.*)"', init_py).groups()[0]
+    with io.open(os.path.join(here, 'lasagne', '__init__.py'), 'r') as f:
+        init_py = f.read()
+        version = re.search('__version__ = "(.*)"', init_py).groups()[0]


Sorry for the nitpick, but the last line should be left unindented as before. It's not part of the file operation. Thanks!

f0k · 2016-01-04T16:56:41Z

Follow up to previous note, it looks like codecs.open() is the preffered way to open files

Thanks for checking! I'd still prefer io.open as this is an alias for open() in Python 3, while codecs.open() seems to do something slightly more complicated. Theano requires Python >= 2.6, so we don't need to care about older versions anyway.

Could you fix my last comment and then squash everything into a single commit? (Let me know if you don't know how.) Thanks!

dankolbman · 2016-01-04T18:47:49Z

Everything should be nice and tidy now (I definitely did not make a mess of things trying to squash those commits)

I think it makes more sense to have the operations that are dependent on the file to be inside the with statement, but I think that comes down to personal preference.

kmike · 2016-01-04T23:24:28Z

Hey, I could be wrong, but AFAIK there is no way to use non-ascii text as setup.py metadata in a reliable way across different Python interpreters (see e.g. pypa/pip#761). At least I gave up doing this :) I don't recall all the details, but the data type is different in Python 2 and Python 3, and different setup.py commands (e.g. setup.py sdist or setup.py register or setup.py install) may require different data types (bytes or unicode) in Python 2. For Python3-only solution unicode is the way to go, but AFAIK to support all Python 2 users there must be no non-ascii text in long_description.

Maybe things changed, but last time I checked it was a can of worms with lots of edge cases and various pip and distutils bugs.

dankolbman · 2016-01-04T23:58:41Z

From your pr here, it looks like only 2.5 failed, and 2.5 isn't supported here any way. I think the fix here should be sufficient for the needs here (?)

Alternatively, unicode in README and CHANGES could be replaced with ascii.

kmike · 2016-01-05T00:04:34Z

@dankolbman I don't know, maybe you're right, it was a long time ago and I haven't checked it since then. But other packages seems to follow 'don't use non-ascii text in setup.py' rule in 2015 - e.g. requests did it here: https://github.com/kennethreitz/requests/pull/2420.

f0k · 2016-01-05T13:30:58Z

@kmike: Thanks for looking into this! I was always weary of unconditionally changing the description to be unicode in Python 2. Reading what you found in 2013, it seems the only problem this introduces is that python setup.py register will not work for Python < 2.7? (As this was fixed in 2.7: http://bugs.python.org/issue13114.) Everything else seems to be fine? This is something we can surely live with, and I'd prefer that over crippling contributor names with foreign characters (I may be biased, though).
Assuming that the long description is not terribly important to end users anyway, as an additional measure, we can also have the try..except clause catch the UnicodeDecodeError in addition to the IOError. Not being able to fill in the long description shouldn't be a reason against installing the package. @dankolbman: Feel free to add this if you agree.

My other fear was that we might not want to set the version string to unicode in Python 2 as this is usually a str. As long as we do not introduce non-ASCII characters into __init__.py (which currently is limited to ASCII and does not declare an encoding for Python either), we can continue to read it with the builtin open instead of io.open so we'll get a byte str in Python 2 and a unicode str in Python 3, as before. @dankolbman: Sorry for this iterative process... could you please remove the io. for reading the version string? If this is done, I'm confident we've got everything right.

Thanks for scrutinizing this issue so thoroughly!

dankolbman · 2016-01-06T15:49:10Z

I'm tempted to just roll it back and only catch the UnicodeDecodeError and leaving the fields blank in that case. Mixing io.open() and open() seems sloppy to me and, as you say, there's no reason users should need the long_description. Care will just need to be taken when submitting to pypi to make sure it's not empty then.

Tell me if you agree with this and I'll get the appropriate changes in place.

On a side note, I found there was a problem in my environment when a bunch of other packages failed from pip. Something was keeping me in the ascii locale which prevented a quick workaround and led me to think this was more a fault of the package than of my own. Anyone with a properly configured environment should have no problem doing a quick export LANG=UTF-8.

f0k · 2016-01-06T19:39:12Z

Something was keeping me in the ascii locale which prevented a quick workaround and led me to think this was more a fault of the package than of my own. Anyone with a properly configured environment should have no problem doing a quick export LANG=UTF-8.

It's not very obvious that this is the solution, though, and I think it's a fault of the package to assume that the user will always have a UTF8 locale. The package knows that the files it's going to read are encoded in UTF8, so it should act accordingly.

I'm tempted to just roll it back and only catch the UnicodeDecodeError and leaving the fields blank in that case. Mixing io.open() and open() seems sloppy to me and, as you say, there's no reason users should need the long_description. Care will just need to be taken when submitting to pypi to make sure it's not empty then.

How can we ensure that, though? I think this is a little risky. Maybe it's a bad idea to catch UnicodeDecodeError after all, let's better leave that part the way it is.
About mixing open and io.open: This still seems to be the correct solution to me. We should properly document that, though. For the version string, the call could be prepended with:

# Read ASCII file with builtin `open()`, so `__version__` is `str` in Python 2 and 3

And the other reading operation could be prepended with:

# Read UTF8 files with `io.open()`, so `long_description` is `unicode` in Python 2, `str` in Python 3

Could you live with that? Just catching UnicodeDecodeError would be ducking out, and blindly sending an UTF8-encoded byte string to pypi (that's what we did for the first release) doesn't seem very elegant either.

I'm still glad we discussed this in detail and not just applied the first fix we could think of! I would have been afraid of breaking something else then.

dankolbman · 2016-01-07T16:33:23Z

It's not very obvious that this is the solution...

No, certainly not, but if it had worked for me the first time, I sure wouldn't have opened this can of worms ;)

Alrighty, I agree catching the UnicodeDecodeError is not the way to go. It seems tho only thing that can break it now is if unicode gets into the version string somehow. Although reading the version does blindly catch all exceptions, I don't think that's very good practice.

Ideally, I think this should all be taken care of from pip's side, though sadly it seems to have been on their radar for a while now and no progress has been made.

f0k · 2016-01-07T17:26:50Z

It seems tho only thing that can break it now is if unicode gets into the version string somehow.

If that happens, import lasagne will fail because __init__.py does not declare an encoding. Of course it could happen that this is fixed without modifying setup.py.

Although reading the version does blindly catch all exceptions, I don't think that's very good practice.

Yes... but let's leave it at that now. The can of worms was big enough already :)

Finally merging, thanks a lot for all your patience!

Ideally, I think this should all be taken care of from pip's side

They can't. It's our fault not to specify an encoding when reading the file and relying on the user's system to have an UTF8 default encoding.

Fix #463 by using io.open

benanne · 2016-01-07T23:13:01Z

Thanks to all of you for looking into this :)

dankolbman · 2016-01-08T03:15:27Z

👍

They can't. It's our fault not to specify an encoding when reading the file and relying on the user's system to have an UTF8 default encoding.

Ahh yes of course. I don't know what I was thinking there.
There are still a good many more packages out there that are guilty of this, like the ones I had issues with the other day. Now at least there's a nice wall of text for them to find summarizing our struggles :)

f0k reviewed Jan 4, 2016
View reviewed changes

dankolbman force-pushed the master branch 2 times, most recently from 1a4ace7 to fda7237 Compare January 4, 2016 18:42

Fix Lasagne#463 by using io.open

38840d8

dankolbman force-pushed the master branch 2 times, most recently from 3e45b6f to 38840d8 Compare January 7, 2016 16:23

f0k added a commit that referenced this pull request Jan 7, 2016

Merge pull request #562 from dankolbman/master

990e3f6

Fix #463 by using io.open

f0k merged commit 990e3f6 into Lasagne:master Jan 7, 2016

f0k mentioned this pull request Oct 6, 2016

Cannot install with pip3 install lasagne #703

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix #463 by using io.open #562

Fix #463 by using io.open #562

dankolbman commented Jan 4, 2016

f0k Jan 4, 2016

dankolbman commented Jan 4, 2016

f0k Jan 4, 2016

f0k commented Jan 4, 2016

dankolbman commented Jan 4, 2016

kmike commented Jan 4, 2016

dankolbman commented Jan 4, 2016

kmike commented Jan 5, 2016

f0k commented Jan 5, 2016

dankolbman commented Jan 6, 2016

f0k commented Jan 6, 2016

dankolbman commented Jan 7, 2016

f0k commented Jan 7, 2016

benanne commented Jan 7, 2016

dankolbman commented Jan 8, 2016

Fix #463 by using io.open #562

Fix #463 by using io.open #562

Conversation

dankolbman commented Jan 4, 2016

f0k Jan 4, 2016

Choose a reason for hiding this comment

dankolbman commented Jan 4, 2016

f0k Jan 4, 2016

Choose a reason for hiding this comment

f0k commented Jan 4, 2016

dankolbman commented Jan 4, 2016

kmike commented Jan 4, 2016

dankolbman commented Jan 4, 2016

kmike commented Jan 5, 2016

f0k commented Jan 5, 2016

dankolbman commented Jan 6, 2016

f0k commented Jan 6, 2016

dankolbman commented Jan 7, 2016

f0k commented Jan 7, 2016

benanne commented Jan 7, 2016

dankolbman commented Jan 8, 2016