Use bytes paths in plat_other #13
Thanks for tackling this. It is indeed a bit ugly, but necessary. I agree with you about handling paths as bytes. Before we decide to take this path, here's an idea: what about using What do you think?
Oh, never mind, I just fired up an interpreter to see that Will review the PR soon...
I disagree with your premise to just use bytes. You will have issues when the paths are not in the same encoding as the system. Full unicode always works; just use something other than urllib.quote, imo. There is a reason Python 3 uses unicode as its default str type, and this PR breaks cases with Python 3.
@miigotu I'm generally on board with using unicode by default, but for Unix filenames specifically, I think the underlying implementation needs to deal with bytes, because that's what the filenames are. The public API definitely needs to accept unicode paths, though. On my computer,
with open(b'abc\xe1', 'w'): pass
If we try to convert that to unicode on Python 2, it doesn't work:
There is no unicode string which my computer will understand as referring to that file (at least on Python 2 - Python 3 has a trick to do it). The only reliable way to store the name for it is as bytes.
The current implementation already has those issues - it decodes bytes paths using Again, note that both the current implementation and this PR allow passing either bytes or unicode into the function. The only difference is what they do with it internally.
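To make the point about undecodable filenames concrete, here is a minimal Python 3 sketch. It assumes a UTF-8 filesystem encoding (an assumption, not something stated in the thread) and uses explicit codecs rather than touching the disk. The variable names are illustrative only. The "trick" mentioned above is taken here to be the `surrogateescape` error handler.

```python
# Sketch only: assumes the filesystem encoding is UTF-8.
raw_name = b'abc\xe1'  # the filename from the comment above; not valid UTF-8

# A strict decode fails, so there is no ordinary unicode spelling of this name.
try:
    raw_name.decode('utf-8')
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# Python 3 can smuggle the undecodable byte through as a lone surrogate
# using the 'surrogateescape' error handler, and recover it losslessly.
as_text = raw_name.decode('utf-8', 'surrogateescape')
round_tripped = as_text.encode('utf-8', 'surrogateescape')

print(strict_ok)                  # False
print(round_tripped == raw_name)  # True
```

The resulting string contains a lone surrogate, so it round-trips back to the same bytes but is not suitable for display or for encoding with a strict codec.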
@miigotu this is precisely the case where you will have issues using unicode. A latin-1 FS mounted on a utf-8 system will likely have trouble. I haven't tested lately, and I think that Python 3 does some surrogate unicode decode dance which partially mitigates the issue, but it's still a hack. Even if In cases when you don't need to display the path, the most reliable way to refer to a path is to use, for example If you need to display paths in your UI, then you're in trouble anyway. You'll have to implement encoding-guessing algorithms and all. But that's something you'll do at the application level. As @takluyver says, at the library level, we have to support byte paths properly without trying to decode them.
This is good stuff, merging. Thanks @takluyver. Will create a v1.4 release after a little cooldown period. Also, as you suggested earlier, I've added travis config so that tests are run automatically :)
Thanks! If you could ping me here when the release happens, that would be great. :-)
@takluyver ping!
Thank you! That's made our tests pass :-). I saw that there was a minor hiccup with Windows requiring a quick 1.4.1. I'm happy to write an Appveyor config and a couple of tests if you're interested in more CI. I'm not going to evangelise this, though - I usually only bother with one CI service for most of my projects.
This fixes issues with non-ascii paths when using the fallback freedesktop trash implementation.
This is a rather ugly set of changes, but it's based on two things:
urllib.quote() on Python 2 requires bytes. On Python 3, it can handle unicode or bytes.
So this is a complete switch around from #12: rather than converting all paths to unicode when they're passed in, convert them all to bytes.
I added tests for passing in a path as unicode or as bytes. Before the changes, both tests fail under Python 2. Afterwards, everything passes, on both versions of Python.
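The approach described above can be sketched roughly as follows. This is not the code from the PR: `to_bytes_path` and `quote_path` are hypothetical helpers, the sketch targets Python 3 only (where the function lives at `urllib.parse.quote`), and it assumes unicode paths should be encoded as UTF-8.

```python
from urllib.parse import quote


def to_bytes_path(path, encoding='utf-8'):
    """Hypothetical helper: accept bytes or unicode, always return bytes.

    The UTF-8 default is an assumption; real code would likely consult
    the filesystem encoding instead.
    """
    if isinstance(path, bytes):
        return path
    # surrogateescape restores any raw bytes smuggled into the string.
    return path.encode(encoding, 'surrogateescape')


def quote_path(path):
    """Hypothetical helper: percent-quote a path after normalising to bytes."""
    return quote(to_bytes_path(path))


print(quote_path(b'/tmp/abc\xe1'))    # /tmp/abc%E1
print(quote_path('/tmp/caf\u00e9'))   # /tmp/caf%C3%A9
```

Normalising once at the public entry point lets callers pass either type while the internals deal with a single representation.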