Allow unicode filenames in Python 2 #119
Honestly, I would postpone this until we have tox running and can actually test this in different versions of Python. Also, this should be a relatively common issue: are there any best practices for cases like this?
Sure, this isn't urgent at all. The code looks quite innocuous:

    try:
        file = file.encode(_sys.getfilesystemencoding())
    except AttributeError:
        pass

I guess we could just add it and wait to see whether there are any bug reports.
I don't know. I guess normally Python's According to this Stack Overflow page, BTW, the page also mentions
Hmm... this is more complicated than I thought... Therefore, we'll either have to add an additional check for Another problem is the call to
I added two commits: one to allow Unicode in Python 2, which didn't really work anywhere (ee4a400), and one to specifically use There are no tests yet, because I don't know how to test this with little effort (but still reasonably well).
I added another commit (19b7e7c). I think this is the expected behavior, because it's very close to the built-in I currently don't have the motivation to create unit tests that thoroughly test all combinations of
Oh my, what a terrible situation. Is there any way to factor the string sanitization out into a function? At any rate, I feel like a few comments would make things clearer; otherwise this just looks unnecessarily complex.
Sure, that's the next step I'm working on... But first I noticed that the tests are actually not passing... another commit is coming up...
... and disallow bytes for mode/format/subtype/endian. Regarding mode, this is the same behavior as in the built-in open() function.
OK, I replaced the last commit with a (hopefully) fixed one: 0ac4082
I completely refactored Better?
If someone is fighting with Unicode implementation issues, I highly recommend watching this talk: http://nedbatchelder.com/text/unipain.html It covers Python 2 and 3 separately; it's even more painful if you want to support both with the same code base...
Yes, very much so.
We now have a bunch of methods that more or less wrap sndfile functions. Maybe now would be the time to factor them out into their own class and separate the Python part from the C-interop part.
I think we should first release 0.7.0 and then do the suggested refactoring in a separate issue/PR.
Currently we check with `isinstance(file, str)`, which doesn't allow the Python 2 `unicode` type. We should probably allow both "raw bytes" and "unicode" in both Python 2 and 3. We would have to call `encode()` only on the "unicode" strings.

Alternatively, we could disallow Python 3 `bytes` objects, but I think it's better to allow them, because the built-in `open()` accepts them as file names, too.

I'm not sure which encoding is used by default, but we should check whether we should switch to `sys.getfilesystemencoding()`.

This idea is stolen from https://github.com/vokimon/python-wavefile
It's kind of annoying to check for unicode vs bytes, but I guess this would work in 2.6, 2.7 and 3.3+:
We could also just check whether it's any of the "string-like" types and then `try` whether there is an `encode()` method:

In the combined check we could also use `isinstance(file, (type(u""), bytes))`, but I don't think that's better.

If Python 3.0-3.2 compatibility is important (which I don't think it is), we could use `from __future__ import unicode_literals` instead of `type(u"")` and check for `type("")`, but this would lead to problems in other parts of the code (e.g. mode strings), where `isinstance(..., str)` would have to be extended to accept unicode in Python 2 (which would make things more complicated than before).

As yet another alternative, we could `try` to call `encode()` before the type check, then drop the check for `str` and only check for `bytes`:

I think this doesn't restrict compatibility and even looks nicer than the other alternatives, so it is probably the way to go.
There should probably also be tests, but I don't know how the encoding stuff can be reasonably tested cross-platform with the least effort.