IOError: [Errno 36] File name too long #292
Comments
I think the dumps should be produced in the same way whatever the
filesystem, otherwise we'll end up with multiple incompatible formats of
dumps. Ideally we would not store the filenames in the local filesystem
and we'd be able to keep the original wiki's filesystem metadata, but
that isn't the case currently.
I'm fine with adding a note in the README that your filesystem isn't
supported (or better, how long a filename we assume is possible).
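As a rough illustration of "how long a filename we assume is possible", the limit the operating system reports for a directory can be queried as sketched below. The helper name `reported_name_max` and the fallback of 255 are assumptions made for this example, not part of dumpgenerator.py. The reported value is in bytes, and a layered filesystem may in practice accept less than what is reported.

```python
import os


def reported_name_max(directory=".", fallback=255):
    """Return the maximum filename length (in bytes) the OS reports
    for `directory`, or `fallback` if it cannot be determined."""
    try:
        # PC_NAME_MAX is the POSIX per-component filename limit.
        return os.pathconf(directory, "PC_NAME_MAX")
    except (AttributeError, OSError, ValueError):
        # os.pathconf is missing on some platforms, and the call can
        # fail for a nonexistent path or an unknown configuration name.
        return fallback


print(reported_name_max("."))
```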
|
I don't think that a note in the README that ecryptfs is not supported is a "solution". It is not "my" file system, it is the default on Ubuntu if you encrypt your home directory. Therefore a lot of users are affected. Keeping the original wiki's filesystem structure (= filenames) sounds like a good idea to me, since IMHO a dump(er software) should copy the source without changing it -> ideal solution. Coming from the ideal solution, the current filename handling is a dirty hack and also buggy (sorry to say that). Let me elaborate on this claim using the current code:
Let's have a look at this innocent-looking French filename:
>>> fn = "Assortiment de différentes préparation à bases de légumes et féculents, bien sur servit avec de l'injara.JPG"
>>> len(fn)
108
108 > 100, so it will be truncated. Worst case with added
>>> fn = u"Assortiment de diff\xe9rentes pr\xe9paration \xe0 bases de l\xe9gumes et f\xe9culents, bien sur servit avec de l'inf1f192008cca2209820a6db246f5e3b1.JPG.desc"
>>> fn
"Assortiment de différentes préparation à bases de légumes et féculents, bien sur servit avec de l'inf1f192008cca2209820a6db246f5e3b1.JPG.desc"
>>> len(fn)
141
Should be safe, since it is below the crash limit (143). Or not?
>>> with open(fn, 'w') as f:
...     f.write("BOOOM")
...
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 36] File name too long: "Assortiment de différentes préparation à bases de légumes et féculents, bien sur servit avec de l'inf1f192008cca2209820a6db246f5e3b1.JPG.desc"
Simply said: to store a Unicode char on the filesystem, you may need more than one byte. This is also the case if you store it in RAM, of course. So the "real length" is:
>>> len(fn.encode("utf-8"))
146
(The additional +5 chars came from the …)
So the real bug is IMHO in line 1103, where the Unicode length should be taken into account. Maybe in other locations too. I can prepare a PR, but I'm not sure how to test this stuff in an appropriate manner to avoid bugs like this in the future.
|
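The diagnosis above (character count versus encoded byte count) suggests truncating against a byte budget rather than `len()` of the string. What follows is only a minimal sketch of that idea, not the code in dumpgenerator.py or in any PR: the function name `truncate_filename`, its parameters, and the choice to keep the extension and append an MD5 digest (mirroring the shape of the filename in the example) are assumptions for illustration. The default of 143 bytes is the limit quoted in this thread.

```python
import hashlib


def truncate_filename(name, max_bytes=143, suffix=".desc", encoding="utf-8"):
    """Shorten `name` so that `name + suffix` fits into `max_bytes`
    once encoded. Names that already fit are returned unchanged."""
    budget = max_bytes - len(suffix.encode(encoding))
    if len(name.encode(encoding)) <= budget:
        return name

    # Split off the extension so it survives the truncation.
    stem, dot, ext = name.rpartition(".")
    if not dot:
        stem, ext = name, ""

    # Digest of the full name keeps truncated names distinguishable.
    tail = hashlib.md5(name.encode(encoding)).hexdigest() + dot + ext
    keep = budget - len(tail.encode(encoding))
    if keep < 0:
        raise ValueError("byte budget too small for digest and extension")

    # Cut on bytes, then drop a possibly half-cut trailing character.
    head = stem.encode(encoding)[:keep].decode(encoding, errors="ignore")
    return head + tail
```

With this approach the check is made on `len(name.encode("utf-8"))`, so accented characters that need two or more bytes are counted at their real on-disk size.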
As long as we write files to disk, there is no perfect solution other
than downloading a wiki's files only from a host which uses the same
filesystem as the wiki's server. The only alternative I can think of is
to require tar and append downloaded files straight from memory to the
tar file without ever using the local filesystem (maybe even 7z allows
to do so, but I'm not sure).
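A minimal sketch of the tar idea, using Python's standard `tarfile` module: the downloaded bytes are wrapped in an in-memory stream and added as an archive member, so the member's name never has to exist as a file on the local filesystem. The helper name `append_to_tar` and the file names in the usage lines are hypothetical; only the archive itself needs a name the local filesystem accepts.

```python
import io
import tarfile


def append_to_tar(tar, member_name, data):
    """Add `data` (bytes) to the open TarFile `tar` under `member_name`
    without writing an intermediate file to disk."""
    info = tarfile.TarInfo(name=member_name)
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))


# Usage sketch: the pax format stores long member names in extended headers.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w", format=tarfile.PAX_FORMAT) as tar:
    append_to_tar(tar, "images/" + "é" * 150 + ".JPG", b"image bytes here")
```

Note that `tarfile` can only append (mode `"a"`) to uncompressed archives, so compression would have to happen as a separate step afterwards.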
Robert Felten, 21/02/2017 12:47:
But the /real bug/ is hidden in the unicode handling of Python:
On this I can certainly agree. Thanks for the diagnosis. I think a way
to test the bug is simply to download images on a wiki which has such
filenames, interrupt the download at some point and then resume the
download. The resume usually fails, probably for the reason you described.
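A cheaper check that needs no wiki download is to try creating a file with the candidate name in a scratch directory and see whether the OS rejects it with errno 36 (`ENAMETOOLONG`), the error from this issue. This is only a sketch; the helper name `name_fits` is made up for the example, and the result depends on the filesystem the directory lives on.

```python
import errno
import os
import tempfile


def name_fits(directory, filename):
    """Return True if a file called `filename` can be created in
    `directory`, False if the OS reports 'File name too long'."""
    path = os.path.join(directory, filename)
    try:
        with open(path, "w"):
            pass
    except OSError as exc:
        if exc.errno == errno.ENAMETOOLONG:
            return False
        raise  # any other error is not what we are probing for
    os.remove(path)
    return True


with tempfile.TemporaryDirectory() as scratch:
    print(name_fits(scratch, "short.txt"))
    print(name_fits(scratch, "x" * 300 + ".JPG"))
```

To probe an ecryptfs mount specifically, `directory` would have to point at a location inside that mount rather than at the default temporary directory.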
|
I've created a PR, see #293. I hope I've fixed all the bugs and didn't break anything. I've also created a new testcase file for stuff that can be tested offline, since I don't want to download several gigabytes every time I change something. Unfortunately the current codebase is not very testing-friendly; for instance, I see no way to get a … There was also another bug: if the …
|
Hi,
using DumpGenerator 0.3.0-alpha on 4.4.0-59-generic #80-Ubuntu x86_64, I ran into issues dumping from http://www.kochwiki.org/w/api.php to an ecryptfs'ed file system.
Stacktrace:
The ecryptfs file system supports filenames up to ~140 chars (source).
The dumpgenerator.py contains code to trim file names if they are too long, which failed here. Therefore I consider this a bug ;)