Bytes of the real filename string:
":".join("{:02x}".format(ord(c)) for c in u"C:\\Temp\\xx鳭僣yy.txt")
'43:3a:5c:54:65:6d:70:5c:78:78:9ced:50e3:79:79:2e:74:78:74'
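One thing worth noting about the dump above: it lists Unicode code points (9ced, 50e3), not UTF-8 bytes. A minimal sketch of the difference, using only built-in string methods; the helper names code_point_dump and utf8_byte_dump are made up for illustration:

```python
# Same name as in the report, written with escapes so the source stays ASCII.
name = u"xx\u9ced\u50e3yy.txt"


def code_point_dump(text):
    """Hex dump of the code points, one entry per character."""
    return ":".join("{:02x}".format(ord(ch)) for ch in text)


def utf8_byte_dump(text):
    """Hex dump of the UTF-8 encoding, one entry per byte."""
    return ":".join("{:02x}".format(b) for b in bytearray(text.encode("utf-8")))


print(code_point_dump(name))  # 78:78:9ced:50e3:79:79:2e:74:78:74
print(utf8_byte_dump(name))   # 78:78:e9:b3:ad:e5:83:a3:79:79:2e:74:78:74
```

Each of the two non-ASCII characters takes three bytes in UTF-8, which is why the second dump is longer.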
Write something to the file, and check its size to confirm:
with open(u"C:\\Temp\\xx鳭僣yy.txt", "w") as f: f.write("abc")
op.getsize(u"C:\\Temp\\xx鳭僣yy.txt")  # op is presumably os.path (import os.path as op)
3L # success!
Let's look for filenames in the directory using scandir (this is the only file in the dir):
for f in scandir.scandir_c("C:\Temp"): ":".join("{:02x}".format(ord(c)) for c in f.name)
'78:78:3f:3f:79:79:2e:74:78:74'
for f in scandir.scandir_python("C:\Temp"): ":".join("{:02x}".format(ord(c)) for c in f.name)
'78:78:3f:3f:79:79:2e:74:78:74'
for f in scandir.scandir_generic("C:\Temp"): ":".join("{:02x}".format(ord(c)) for c in f.name)
'78:78:3f:3f:79:79:2e:74:78:74'
Note the "3f:3f" is "??", so the filename is being printed as 'xx??yy.txt'
Scandir seems unable to retrieve the UTF-8 encoded filename, even though I am able to write to this file and check its size using Python. The standard listdir/walk functions in the os module suffer from the same problem.
How can I get a directory listing with UTF8 filenames preserved?
This is annoying but expected behaviour, due to the way byte and unicode filenames are handled in Python 2.x. To get around it, just pass a unicode string instead of a byte string to scandir, like you're doing with open(). For example:
scandir.scandir(u"C:\\Temp\\xx鳭僣yy.txt")
Let me know if this works. Note that this is (or it should be!) the same behaviour as os.listdir() on Python 2.x.
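If the directory path arrives as a byte string from somewhere else, one option is to convert it to unicode before listing. The following is only a sketch: to_text_path and list_names are hypothetical helpers, it uses os.listdir() rather than scandir so that it needs nothing beyond the standard library, and it assumes the bytes path is encoded in the filesystem encoding.

```python
import os
import sys


def to_text_path(path, encoding=None):
    """Hypothetical helper: return `path` as a text (unicode) string.

    Assumes a bytes path is encoded in the filesystem encoding unless
    another encoding is given.
    """
    if isinstance(path, bytes):
        return path.decode(encoding or sys.getfilesystemencoding())
    return path


def list_names(path):
    """List a directory, always passing a text path so names come back as text."""
    return os.listdir(to_text_path(path))
```

Calling list_names('temp') then behaves like os.listdir(u'temp'), so non-ASCII names are not replaced with ? characters.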
Sorry, I said "filename" and copied the unicode filename rather than the unicode directory. The call should have been scandir.scandir(u"C:\\Temp").
But no, my module isn't "broken". :-) It's operating by design, as per os.listdir(). The behaviour of bytes paths on Windows is kinda weird -- if you pass in a byte string, you get out byte strings with non-ASCII chars replaced by ? characters on Windows. This is different from on Linux, where you get UTF-8. So bytes paths are kind of half broken on Windows Python.
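To get a feel for where the ? characters come from, the lossy conversion can be imitated with Python's own codecs. This is only an emulation: the real conversion happens inside Windows, and cp1252 is an assumed ANSI code page that will differ between systems. lossy_bytes_name and hex_dump are made-up names.

```python
name = u"xx\u9ced\u50e3yy.txt"


def lossy_bytes_name(text_name, codepage="cp1252"):
    """Encode a name to a code page, replacing unrepresentable characters with '?'."""
    return text_name.encode(codepage, "replace")


def hex_dump(data):
    """Hex dump of a byte string, one entry per byte."""
    return ":".join("{:02x}".format(b) for b in bytearray(data))


lossy = lossy_bytes_name(name)
print(hex_dump(lossy))  # 78:78:3f:3f:79:79:2e:74:78:74
```

The result matches the dump in the report: both non-ASCII characters collapse to 3f, and the original name cannot be recovered from the bytes.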
What you need to do is simply pass a unicode path to scandir. Like so:
>>> os.mkdir('temp')
>>> f = open(u'temp\\xx\u9ced\u50e3yy.txt', 'w')  # create a unicode filename
>>> f.close()
>>> [e.name for e in scandir.scandir('temp')]  # this is what you are doing
['xx??yy.txt']
>>> [e.name for e in scandir.scandir(u'temp')]  # this is what you need to do
[u'xx\u9ced\u50e3yy.txt']
>>> [e.name.encode('utf-8') for e in scandir.scandir(u'temp')]  # or as UTF-8
['xx\xe9\xb3\xad\xe5\x83\xa3yy.txt']
Note that, by design, this exactly matches the behaviour of os.listdir() on Python 2.x.
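The "type in, type out" rule of os.listdir() can be checked with a small script. This is a sketch under a few assumptions: names_by_path_type is a hypothetical helper, the directory is a throwaway temp dir, and the file name is kept ASCII-only so the script runs the same way on any platform and locale.

```python
import os
import shutil
import sys
import tempfile


def names_by_path_type(directory):
    """List `directory` twice: once via a bytes path, once via a text path."""
    fs_encoding = sys.getfilesystemencoding()
    if isinstance(directory, bytes):
        text_dir = directory.decode(fs_encoding)
    else:
        text_dir = directory
    bytes_dir = text_dir.encode(fs_encoding)
    return os.listdir(bytes_dir), os.listdir(text_dir)


demo_dir = tempfile.mkdtemp()
try:
    # ASCII-only name: the point here is the type of the result, not its content.
    open(os.path.join(demo_dir, "plain.txt"), "w").close()
    bytes_names, text_names = names_by_path_type(demo_dir)
finally:
    shutil.rmtree(demo_dir)

# A bytes path yields byte-string names; a text path yields text names.
print(type(bytes_names[0]), type(text_names[0]))
```

With a non-ASCII file name on Windows under Python 2.x, the byte-string names are where the ? replacement shows up, while the text names keep the original characters.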