Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix #463 by using io.open #562

Merged
merged 1 commit into from
Jan 7, 2016
Merged

Fix #463 by using io.open #562

merged 1 commit into from
Jan 7, 2016

Conversation

dankolbman
Copy link
Contributor

No description provided.

@@ -2,6 +2,7 @@
import re
from setuptools import find_packages
from setuptools import setup
from io import open
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer import io and using io.open below, otherwise it looks like you're using the builtin open.

@dankolbman
Copy link
Contributor Author

Follow up to previous note, it looks like codecs.open() is the preffered way to open files if we ever want to keep compatability for 2.5 and lower.

version = re.search('__version__ = "(.*)"', init_py).groups()[0]
with io.open(os.path.join(here, 'lasagne', '__init__.py'), 'r') as f:
init_py = f.read()
version = re.search('__version__ = "(.*)"', init_py).groups()[0]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the nitpick, but the last line should be left unindented as before. It's not part of the file operation. Thanks!

@f0k
Copy link
Member

f0k commented Jan 4, 2016

Follow up to previous note, it looks like codecs.open() is the preffered way to open files

Thanks for checking! I'd still prefer io.open as this is an alias for open() in Python 3, while codecs.open() seems to do something slightly more complicated. Theano requires Python >= 2.6, so we don't need to care about older versions anyway.

Could you fix my last comment and then squash everything into a single commit? (Let me know if you don't know how.) Thanks!

@dankolbman dankolbman force-pushed the master branch 2 times, most recently from 1a4ace7 to fda7237 Compare January 4, 2016 18:42
@dankolbman
Copy link
Contributor Author

Everything should be nice and tidy now (I definitely did not make a mess of things trying to squash those commits)

I think it makes more sense to have the operations that are dependent on the file to be inside the with statement, but I think that comes down to personal preference.

@kmike
Copy link
Contributor

kmike commented Jan 4, 2016

Hey, I could be wrong, but AFAIK there is no way to use non-ascii text as setup.py metadata in a reliable way across different Python interpreters (see e.g. pypa/pip#761). At least I gave up doing this :) I don't recall all the details, but the data type is different in Python 2 and Python 3, and different setup.py commands (e.g. setup.py sdist or setup.py register or setup.py install) may require different data types (bytes or unicode) in Python 2. For Python3-only solution unicode is the way to go, but AFAIK to support all Python 2 users there must be no non-ascii text in long_description.

Maybe things changed, but last time I checked it was a can of worms with lots of edge cases and various pip and distutils bugs.

@dankolbman
Copy link
Contributor Author

From your pr here, it looks like only 2.5 failed, and 2.5 isn't supported here any way. I think the fix here should be sufficient for the needs here (?)

Alternatively, unicode in README and CHANGES could be replaced with ascii.

@kmike
Copy link
Contributor

kmike commented Jan 5, 2016

@dankolbman I don't know, maybe you're right, it was a long time ago and I haven't checked it since then. But other packages seems to follow 'don't use non-ascii text in setup.py' rule in 2015 - e.g. requests did it here: https://github.com/kennethreitz/requests/pull/2420.

@f0k
Copy link
Member

f0k commented Jan 5, 2016

@kmike: Thanks for looking into this! I was always weary of unconditionally changing the description to be unicode in Python 2. Reading what you found in 2013, it seems the only problem this introduces is that python setup.py register will not work for Python < 2.7? (As this was fixed in 2.7: http://bugs.python.org/issue13114.) Everything else seems to be fine? This is something we can surely live with, and I'd prefer that over crippling contributor names with foreign characters (I may be biased, though).
Assuming that the long description is not terribly important to end users anyway, as an additional measure, we can also have the try..except clause catch the UnicodeDecodeError in addition to the IOError. Not being able to fill in the long description shouldn't be a reason against installing the package. @dankolbman: Feel free to add this if you agree.

My other fear was that we might not want to set the version string to unicode in Python 2 as this is usually a str. As long as we do not introduce non-ASCII characters into __init__.py (which currently is limited to ASCII and does not declare an encoding for Python either), we can continue to read it with the builtin open instead of io.open so we'll get a byte str in Python 2 and a unicode str in Python 3, as before. @dankolbman: Sorry for this iterative process... could you please remove the io. for reading the version string? If this is done, I'm confident we've got everything right.

Thanks for scrutinizing this issue so thoroughly!

@dankolbman
Copy link
Contributor Author

I'm tempted to just roll it back and only catch the UnicodeDecodeError and leaving the fields blank in that case. Mixing io.open() and open() seems sloppy to me and, as you say, there's no reason users should need the long_description. Care will just need to be taken when submitting to pypi to make sure it's not empty then.

Tell me if you agree with this and I'll get the appropriate changes in place.

On a side note, I found there was a problem in my environment when a bunch of other packages failed from pip. Something was keeping me in the ascii locale which prevented a quick workaround and led me to think this was more a fault of the package than of my own. Anyone with a properly configured environment should have no problem doing a quick export LANG=UTF-8.

@f0k
Copy link
Member

f0k commented Jan 6, 2016

Something was keeping me in the ascii locale which prevented a quick workaround and led me to think this was more a fault of the package than of my own. Anyone with a properly configured environment should have no problem doing a quick export LANG=UTF-8.

It's not very obvious that this is the solution, though, and I think it's a fault of the package to assume that the user will always have a UTF8 locale. The package knows that the files it's going to read are encoded in UTF8, so it should act accordingly.

I'm tempted to just roll it back and only catch the UnicodeDecodeError and leaving the fields blank in that case. Mixing io.open() and open() seems sloppy to me and, as you say, there's no reason users should need the long_description. Care will just need to be taken when submitting to pypi to make sure it's not empty then.

How can we ensure that, though? I think this is a little risky. Maybe it's a bad idea to catch UnicodeDecodeError after all, let's better leave that part the way it is.
About mixing open and io.open: This still seems to be the correct solution to me. We should properly document that, though. For the version string, the call could be prepended with:

# Read ASCII file with builtin `open()`, so `__version__` is `str` in Python 2 and 3

And the other reading operation could be prepended with:

# Read UTF8 files with `io.open()`, so `long_description` is `unicode` in Python 2, `str` in Python 3

Could you live with that? Just catching UnicodeDecodeError would be ducking out, and blindly sending an UTF8-encoded byte string to pypi (that's what we did for the first release) doesn't seem very elegant either.

I'm still glad we discussed this in detail and not just applied the first fix we could think of! I would have been afraid of breaking something else then.

@dankolbman dankolbman force-pushed the master branch 2 times, most recently from 3e45b6f to 38840d8 Compare January 7, 2016 16:23
@dankolbman
Copy link
Contributor Author

It's not very obvious that this is the solution...

No, certainly not, but if it had worked for me the first time, I sure wouldn't have opened this can of worms ;)

Alrighty, I agree catching the UnicodeDecodeError is not the way to go. It seems tho only thing that can break it now is if unicode gets into the version string somehow. Although reading the version does blindly catch all exceptions, I don't think that's very good practice.

Ideally, I think this should all be taken care of from pip's side, though sadly it seems to have been on their radar for a while now and no progress has been made.

@f0k
Copy link
Member

f0k commented Jan 7, 2016

It seems tho only thing that can break it now is if unicode gets into the version string somehow.

If that happens, import lasagne will fail because __init__.py does not declare an encoding. Of course it could happen that this is fixed without modifying setup.py.

Although reading the version does blindly catch all exceptions, I don't think that's very good practice.

Yes... but let's leave it at that now. The can of worms was big enough already :)

Finally merging, thanks a lot for all your patience!

Ideally, I think this should all be taken care of from pip's side

They can't. It's our fault not to specify an encoding when reading the file and relying on the user's system to have an UTF8 default encoding.

f0k added a commit that referenced this pull request Jan 7, 2016
@f0k f0k merged commit 990e3f6 into Lasagne:master Jan 7, 2016
@benanne
Copy link
Member

benanne commented Jan 7, 2016

Thanks to all of you for looking into this :)

@dankolbman
Copy link
Contributor Author

👍

They can't. It's our fault not to specify an encoding when reading the file and relying on the user's system to have an UTF8 default encoding.

Ahh yes of course. I don't know what I was thinking there.
There are still a good many more packages out there that are guilty of this, like the ones I had issues with the other day. Now at least there's a nice wall of text for them to find summarizing our struggles :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants