Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed issue #496 - sanitize/truncate bug #1361

Merged
merged 5 commits into from Jul 8, 2015
Merged

Conversation

LordSputnik
Copy link
Collaborator

Added loop to iterate over sanitize/truncate until stable. Enabled test_truncation_does_not_conflict_with_replacement test. See discussion in #496.

while path not in path_candidates:
path_candidates.append(path)
# Convert back to Unicode with extension removed
print(util.displayable_path(path))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stray debugging print.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yes - will deal with that when I update the PR.

@LordSputnik
Copy link
Collaborator Author

I've added some replies to your comments - will tweak this PR hopefully at the weekend, or by Tuesday if not then.

@sampsyo
Copy link
Member

sampsyo commented Mar 18, 2015

Great! Thanks again! If you just leave a comment when you're happy with the next round, I'll get an email to come back and merge.

@LordSputnik
Copy link
Collaborator Author

Completely forgot I had this open still - will make the necessary changes and close tonight if I get time.

…st_truncation_does_not_conflict_with_replacement test. Fixes beetbox#496.
… class methods. Also made algorithm more predictable, and added an extra test.
@LordSputnik
Copy link
Collaborator Author

So, I finally finished this and I'm happy with it now. No stray prints, no loops to get stuck in and predictable behaviour. There are also two tests for it, which both pass.

The new simpler algorithm is to do one pass with sanitize using the user replacements, followed by truncate. Then, a second pass with the user replacements, and a test to see whether further truncation occurred. If it did, remove the user replacements, and sanitize and truncate (based on the assumption that none of the built in replacements will ever increase the path length - it might be good to throw in a test for that, but I can't see a good way to do it).

# Outputting Unicode.
extension = extension.decode('utf8', 'ignore')

first_stage_path =\
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be first_stage_path, _ = ... to ignore the second returned value?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops! Just noticed the [0] below. Using the unpacking syntax can be marginally clearer, though.

@sampsyo
Copy link
Member

sampsyo commented Jul 7, 2015

Very nice! I like this version a lot.

So it's still possible for users to write "evil" replacements that increase the length of the path. This version essentially says, "if you do that, then your replacements may not always be obeyed". The only other alternatives I can see would be:

  • "If you do that, then you may get filenames that are too long." (the current behavior in beets)
  • "If you do that, then beets will print an error message and exit."

In any case, we should probably add some sort of warning to the documentation, right? The policy in this version can create invalid paths if the replacement enforces some OS requirement—for example, with trailing whitespace on Windows—and beets will crash later when it tries to create a file with that name. There's no way around something going wrong, though, so the best we can do is provide good documentation.

@LordSputnik
Copy link
Collaborator Author

Ok, I've updated the function docstrings and switched over to unpacking syntax as suggested.

I think it'd be a good idea to warn users in the documentation, and also to warn the user at runtime if replacements have been ignored. I guess this should be mentioned at https://beets.readthedocs.org/en/v1.3.13/reference/config.html#replace for the documentation, but not sure how the warning should be generated in beets? Do you have some sort of logging system, or would it just be implemented with print?

@sampsyo
Copy link
Member

sampsyo commented Jul 7, 2015

That sounds like the perfect place for the docs warning.

And yes, there is a logging system. For example: https://github.com/sampsyo/beets/blob/master/beets/library.py#L612
The only trick is that we try to keep the util module log-free (just pure utilities). So we may need to either move some code or return a flag so we can log the message from library.py.

@LordSputnik
Copy link
Collaborator Author

All done now, think it's ready to merge!

@sampsyo
Copy link
Member

sampsyo commented Jul 8, 2015

Awesome! This is looking great. I'll merge this.

The next step in this direction will be to address #1533/#1418. The _legalize_stage machinery, which is already responsible for both encoding and truncation, should do so now in an encoding-aware way. Hopefully, we can also find a way to avoid the encode-then-decode-again cycle that we have to resort to at the moment.

@sampsyo
Copy link
Member

sampsyo commented Jul 8, 2015

To be clear: it would be great if, as this continues to evolve, we can make the common case (no truncation) not require encoding, decoding to Unicode again, then re-encoding.

@sampsyo sampsyo merged commit 1f1e0f7 into beetbox:master Jul 8, 2015
sampsyo added a commit that referenced this pull request Jul 8, 2015
Fixed issue #496 - sanitize/truncate bug
sampsyo added a commit that referenced this pull request Jul 8, 2015
@sampsyo
Copy link
Member

sampsyo commented Jul 8, 2015

Just to reiterate: ✨ THANK YOU ✨ for working on such a nasty, deceptively complicated issue. Woohoo!

I'm adding you as a collaborator in case you want to do further maintenance.

@LordSputnik
Copy link
Collaborator Author

No problem :) I'm especially glad that this is fixed because it was affecting a couple of songs in my library.

Will help out where I can if I have some spare time. I'm planning to write a python module to handle platform-dependency in path names as a result of working on this issue (there doesn't seem to be one!), and that would hopefully solve the unicode character issues, as well as improve the existing solution applied here.

@sampsyo
Copy link
Member

sampsyo commented Jul 8, 2015

A library like that sounds incredibly useful. Perhaps it would be worthwhile to dovetail with pathlib, which has a (very basic) notion of platform-specific paths?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants