In #5224 (comment), we had a lengthy discussion on a refactoring of filename truncation. Ultimately, the issue is that we don't currently understand the history and necessity of that code very well. It also appears that the current code is not quite right, but maybe in a way that doesn't really matter in most cases.
I'm opening this issue to keep track of some thoughts around this. Some observations and context:
- When `Item.destination(basedir=...)` is used with a non-None `basedir`, any `statvfs` calls that are done for path truncation should really check the filesystem at `basedir`, not always the library directory. This has been the case since the commit that introduced the use of `statvfs` (but the code looks like it might have been unintentional). `art_destination` might also be affected (but is probably less likely to generate long file names in the first place).
- With PEP 529, the Windows fs encoding is UTF-8, but path truncation should be based on Unicode code points. Thus, our truncation is not really doing what it should, probably shortening paths unnecessarily most of the time. A fix on Windows might be to always truncate the Unicode `str` instead of the `bytes` representation. What happens on Linux for FAT/NTFS file systems, though?
- (Some) People care about very long file names: Filename/path lengths #3383
- We use the method described here to allow long file names on Windows. Thus, on all platforms path limits are component length limits.
- Some analysis on length limits on Linux from stackexchange:
In summary, how we determine filename length limits, and how we truncate names, can probably be improved. It is unclear to me right now how rare the situations are where this really matters.
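
To make the two main points concrete, here is a rough sketch (not the current beets code). It shows querying the component length limit of the filesystem at a given directory, and the difference between truncating a name by code points and by encoded bytes. The helper names `max_component_length`, `truncate_str` and `truncate_bytes` are hypothetical, and the fallback of 255 is an assumption.

```python
import os


def max_component_length(path, default=255):
    """Return the filename length limit of the filesystem holding `path`.

    Falls back to `default` (an assumed value) where os.statvfs is not
    available, e.g. on Windows, or where the call fails.
    """
    try:
        return os.statvfs(path).f_namemax
    except (AttributeError, OSError):
        return default


def truncate_str(name, limit):
    """Truncate the stem so the name has at most `limit` code points.

    The extension is kept whole; if the extension alone exceeds `limit`,
    the result is just the extension.
    """
    stem, ext = os.path.splitext(name)
    return stem[: max(limit - len(ext), 0)] + ext


def truncate_bytes(name, limit, encoding="utf-8"):
    """Truncate the stem so the encoded name fits in `limit` bytes.

    The extension is kept whole. A multi-byte sequence that gets cut in
    half is dropped rather than left dangling.
    """
    stem, ext = os.path.splitext(name)
    budget = max(limit - len(ext.encode(encoding)), 0)
    cut = stem.encode(encoding)[:budget]
    return cut.decode(encoding, errors="ignore") + ext
```

With something like this, the limit would be looked up as `max_component_length(basedir)` whenever a `basedir` is given, rather than always for the library directory. For a name with non-ASCII characters the two truncation functions give different results for the same limit, which is the over-shortening described above when the real limit counts characters but we count bytes.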