Fixed issue #496 - sanitize/truncate bug #1361
while path not in path_candidates:
path_candidates.append(path)
# Convert back to Unicode with extension removed
print(util.displayable_path(path))
Stray debugging print.
Ah yes - will deal with that when I update the PR.
I've added some replies to your comments - will tweak this PR hopefully at the weekend, or by Tuesday if not then.
Great! Thanks again! If you just leave a comment when you're happy with the next round, I'll get an email to come back and merge.
Completely forgot I had this open still - will make the necessary changes and close tonight if I get time.
…st_truncation_does_not_conflict_with_replacement test. Fixes beetbox#496.
… class methods. Also made algorithm more predictable, and added an extra test.
So, I finally finished this and I'm happy with it now. No stray prints, no loops to get stuck in, and predictable behaviour. There are also two tests for it, which both pass. The new, simpler algorithm is to do one pass of sanitize using the user replacements, followed by truncate. Then it does a second pass with the user replacements and checks whether further truncation occurred. If it did, it drops the user replacements and runs sanitize and truncate with only the built-in ones (based on the assumption that none of the built-in replacements will ever increase the path length - it might be good to throw in a test for that, but I can't see a good way to do it).
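The two-pass approach described above can be sketched roughly as follows. This is a simplified illustration rather than the code in this PR: `MAX_LENGTH`, `sanitize`, `legalize_stage`, and `legalize` are hypothetical names, the path is treated as a single string with no extension handling, and it assumes the fallback pass starts from the result of the first pass.

```python
import re

MAX_LENGTH = 10  # hypothetical length limit, chosen small for illustration


def sanitize(path, replacements):
    """Apply each (pattern, replacement) pair to the path, in order."""
    for pattern, repl in replacements:
        path = re.sub(pattern, repl, path)
    return path


def legalize_stage(path, replacements):
    """One sanitize-then-truncate pass. Returns (path, was_truncated)."""
    sanitized = sanitize(path, replacements)
    truncated = sanitized[:MAX_LENGTH]
    return truncated, truncated != sanitized


def legalize(path, user_replacements, builtin_replacements):
    """Return (path, user_replacements_ignored)."""
    # First pass: user replacements, then truncate.
    first, _ = legalize_stage(path, user_replacements)
    # Second pass: re-apply the user replacements to whatever the
    # truncation exposed (for example, a new trailing dot or space).
    second, retruncated = legalize_stage(first, user_replacements)
    if retruncated:
        # The user replacements made the path too long again, so fall
        # back to the built-in replacements only (assumed not to grow it).
        second, _ = legalize_stage(first, builtin_replacements)
    return second, retruncated
```

With a well-behaved replacement, `legalize("abcdefghi jkl", [(r"\s+$", "")], [(r"\.$", "_")])` returns `("abcdefghi", False)`. With a replacement that lengthens the path, `legalize("abcdefghi.jkl", [(r"\.$", "_dot")], [(r"\.$", "_")])` returns `("abcdefghi_", True)`, and the second value is what a caller could use to warn that the user replacements were not obeyed.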
# Outputting Unicode.
extension = extension.decode('utf8', 'ignore')

first_stage_path =\
Should this be first_stage_path, _ = ... to ignore the second returned value?
Oops! Just noticed the [0] below. Using the unpacking syntax can be marginally clearer, though.
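For reference, the two spellings being compared look like this. `legalize_stage` here is a hypothetical stand-in that returns a `(path, was_truncated)` pair, not the function from the diff.

```python
def legalize_stage(path, limit=8):
    """Hypothetical helper: returns (truncated_path, was_truncated)."""
    return path[:limit], len(path) > limit


# Indexing: the reader has to know what element 0 of the result is.
indexed_path = legalize_stage("a long file name")[0]

# Unpacking: the discarded second value is visible at the call site.
first_stage_path, _ = legalize_stage("a long file name")
```

Both forms produce the same path; the unpacking form only makes it explicit that a second value is returned and ignored.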
Very nice! I like this version a lot. So it's still possible for users to write "evil" replacements that increase the length of the path. This version essentially says, "if you do that, then your replacements may not always be obeyed". The only other alternatives I can see would be:
In any case, we should probably add some sort of warning to the documentation, right? The policy in this version can create invalid paths if the replacement enforces some OS requirement (for example, with trailing whitespace on Windows), and beets will crash later when it tries to create a file with that name. There's no way around something going wrong, though, so the best we can do is provide good documentation.
Ok, I've updated the function docstrings and switched over to unpacking syntax as suggested. I think it'd be a good idea to warn users in the documentation, and also to warn the user at runtime if replacements have been ignored. I guess the documentation note belongs at https://beets.readthedocs.org/en/v1.3.13/reference/config.html#replace, but I'm not sure how the runtime warning should be generated in beets. Do you have some sort of logging system, or would it just be implemented with print?
That sounds like the perfect place for the docs warning. And yes, there is a logging system. For example: https://github.com/sampsyo/beets/blob/master/beets/library.py#L612
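A runtime warning of this kind could look roughly like the sketch below, which uses only the standard library `logging` module. The logger name `"beets"`, the function name, and the message text are assumptions made for illustration; the linked line in library.py shows how beets itself logs.

```python
import logging

# Assumption: a module-level logger; the name "beets" is illustrative.
log = logging.getLogger("beets")


def warn_if_replacements_ignored(path, replacements_ignored):
    """Log a warning when user replacements had to be dropped for a path."""
    if replacements_ignored:
        log.warning(
            "Fell back to default replacements when naming file %s. "
            "Configure replacements to avoid lengthening the filename.",
            path,
        )
    return replacements_ignored
```

Using the logger rather than `print` lets users control the output through their logging configuration and keeps the message out of the way when nothing was ignored.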
…eraction in documentation.
All done now, think it's ready to merge!
Awesome! This is looking great. I'll merge this. The next step in this direction will be to address #1533/#1418. The
To be clear: it would be great if, as this continues to evolve, we can make the common case (no truncation) not require encoding, decoding to Unicode again, then re-encoding.
Fixed issue #496 - sanitize/truncate bug
Just to reiterate: ✨ THANK YOU ✨ for working on such a nasty, deceptively complicated issue. Woohoo! I'm adding you as a collaborator in case you want to do further maintenance.
No problem :) I'm especially glad that this is fixed because it was affecting a couple of songs in my library. Will help out where I can if I have some spare time. I'm planning to write a Python module to handle platform-dependency in path names as a result of working on this issue (there doesn't seem to be one!), and that would hopefully solve the Unicode character issues, as well as improve the existing solution applied here.
A library like that sounds incredibly useful. Perhaps it would be worthwhile to dovetail with pathlib, which has a (very basic) notion of platform-specific paths?
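As a small illustration of what pathlib's platform-specific notion covers, the pure path classes parse and compare the same string differently per platform. The file names below are made up for the example.

```python
from pathlib import PurePosixPath, PureWindowsPath

name = r"Artist\Album\Track.mp3"

# On Windows semantics the backslash is a separator...
windows_parts = PureWindowsPath(name).parts
# ...while on POSIX semantics it is an ordinary filename character.
posix_parts = PurePosixPath(name).parts

# Comparison is case-insensitive for Windows paths, case-sensitive for POSIX.
same_on_windows = PureWindowsPath("Track.MP3") == PureWindowsPath("track.mp3")
same_on_posix = PurePosixPath("Track.MP3") == PurePosixPath("track.mp3")
```

These classes only model how a path is split and compared. They do not replace illegal characters or enforce length limits, so the sanitizing and truncation discussed in this thread would still need to live elsewhere.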
Added loop to iterate over sanitize/truncate until stable. Enabled test_truncation_does_not_conflict_with_replacement test. See discussion in #496.