Fixed issue #496 - sanitize/truncate bug #1361

LordSputnik · 2015-03-17T21:55:50Z

Added loop to iterate over sanitize/truncate until stable. Enabled test_truncation_does_not_conflict_with_replacement test. See discussion in #496.

sampsyo · 2015-03-18T02:14:17Z

beets/library.py

+            while path not in path_candidates:
+                path_candidates.append(path)
+                # Convert back to Unicode with extension removed
+                print(util.displayable_path(path))


Stray debugging print.

LordSputnik · 2015-03-18

Ah yes - will deal with that when I update the PR.

LordSputnik · 2015-03-18T16:08:41Z

I've added some replies to your comments - will tweak this PR hopefully at the weekend, or by Tuesday if not then.

sampsyo · 2015-03-18T16:24:39Z

Great! Thanks again! If you just leave a comment when you're happy with the next round, I'll get an email to come back and merge.

LordSputnik · 2015-05-10T16:48:49Z

Completely forgot I had this open still - will make the necessary changes and close tonight if I get time.

LordSputnik · 2015-07-07T00:43:25Z

So, I finally finished this and I'm happy with it now. No stray prints, no loops to get stuck in and predictable behaviour. There are also two tests for it, which both pass.

The new, simpler algorithm makes one pass that sanitizes with the user replacements and then truncates. A second pass sanitizes with the user replacements again and checks whether further truncation occurred. If it did, the user replacements are dropped and the path is sanitized and truncated with only the built-in replacements. This relies on the assumption that none of the built-in replacements will ever increase the path length; it might be good to throw in a test for that, but I can't see a good way to do it.
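As a rough illustration of the two-pass scheme described above, here is a minimal Python 3 sketch. The helpers `sanitize`, `truncate`, `legalize_stage` and `legalize`, the `BUILTIN_REPLACEMENTS` list, and the choice to run the fallback on the first-stage output are all assumptions made for this example. They are not the code in this PR, which also has to handle extensions and byte encodings.

```python
import re

# Hypothetical built-in replacements, assumed never to lengthen a path.
BUILTIN_REPLACEMENTS = [(re.compile(r'[<>:"?*|]'), '_')]


def sanitize(path, replacements):
    """Apply each (regex, replacement) pair in order."""
    for regex, repl in replacements:
        path = regex.sub(repl, path)
    return path


def truncate(path, max_length):
    """Cut the path down to at most max_length characters."""
    return path[:max_length]


def legalize_stage(path, replacements, max_length):
    """One sanitize-then-truncate pass; also report whether it truncated."""
    sanitized = sanitize(path, replacements)
    truncated = truncate(sanitized, max_length)
    return truncated, truncated != sanitized


def legalize(path, user_replacements, max_length):
    """Return (legal_path, user_replacements_were_ignored)."""
    # First pass: user replacements, then truncation.
    first, _ = legalize_stage(path, user_replacements, max_length)
    # Second pass: truncation may have exposed something new to replace.
    second, retruncated = legalize_stage(first, user_replacements, max_length)
    if not retruncated:
        return second, False
    # The replacements lengthened the path again: fall back to the
    # built-in replacements only (applied to the first-stage output here,
    # which is an assumption of this sketch).
    fallback, _ = legalize_stage(first, BUILTIN_REPLACEMENTS, max_length)
    return fallback, True


# A replacement that lengthens the path once truncation exposes a dot.
evil = [(re.compile(r'\.$'), '_dot')]
evil_result = legalize('abc.defgh', evil, 4)

# A same-length replacement is honoured on the second pass.
benign = [(re.compile(r'\.$'), '_')]
benign_result = legalize('abc.defgh', benign, 4)
```

In this toy setup `evil_result` is `('abc.', True)`: the trailing dot survives because the user replacement had to be dropped. `benign_result` is `('abc_', False)`.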

sampsyo · 2015-07-07T00:57:16Z

beets/util/__init__.py

+        # Outputting Unicode.
+        extension = extension.decode('utf8', 'ignore')
+
+    first_stage_path =\


Should this be first_stage_path, _ = ... to ignore the second returned value?

Oops! Just noticed the [0] below. Using the unpacking syntax can be marginally clearer, though.
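For readers unfamiliar with the two spellings being compared, a small sketch follows. `legalize_stage_stub` is a hypothetical stand-in that returns a `(path, was_truncated)` pair; it is not the function in the diff.

```python
def legalize_stage_stub(path, limit=8):
    """Hypothetical stand-in returning (path, was_truncated)."""
    return path[:limit], len(path) > limit


# Indexing: easy to miss that a second value is being discarded.
first_stage_indexed = legalize_stage_stub('a-very-long-name')[0]

# Unpacking: the underscore makes the ignored value explicit.
first_stage_unpacked, _ = legalize_stage_stub('a-very-long-name')
```

Both spellings produce the same value; the difference is only in how visible the discarded flag is to a reviewer.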

sampsyo · 2015-07-07T01:07:39Z

Very nice! I like this version a lot.

So it's still possible for users to write "evil" replacements that increase the length of the path. This version essentially says, "if you do that, then your replacements may not always be obeyed". The only other alternatives I can see would be:

"If you do that, then you may get filenames that are too long." (the current behavior in beets)
"If you do that, then beets will print an error message and exit."

In any case, we should probably add some sort of warning to the documentation, right? The policy in this version can create invalid paths if the replacement enforces some OS requirement—for example, with trailing whitespace on Windows—and beets will crash later when it tries to create a file with that name. There's no way around something going wrong, though, so the best we can do is provide good documentation.

LordSputnik · 2015-07-07T11:06:25Z

Ok, I've updated the function docstrings and switched over to unpacking syntax as suggested.

I think it'd be a good idea to warn users in the documentation, and also to warn the user at runtime if replacements have been ignored. I guess this should be mentioned at https://beets.readthedocs.org/en/v1.3.13/reference/config.html#replace for the documentation, but not sure how the warning should be generated in beets? Do you have some sort of logging system, or would it just be implemented with print?

sampsyo · 2015-07-07T14:35:51Z

That sounds like the perfect place for the docs warning.

And yes, there is a logging system. For example: https://github.com/sampsyo/beets/blob/master/beets/library.py#L612
The only trick is that we try to keep the util module log-free (just pure utilities). So we may need to either move some code or return a flag so we can log the message from library.py.
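A minimal sketch of the "return a flag" option, assuming Python's standard `logging` module. The logger name, `legalize_path_stub`, `destination`, and the warning text are all hypothetical, and the stub's flag simply means "the input was too long" as a stand-in for "user replacements were ignored".

```python
import logging

# Logger name is an assumption for this sketch.
log = logging.getLogger('beets')


def legalize_path_stub(path, max_length):
    """Hypothetical pure utility: it never logs, it only reports a flag."""
    legal = path[:max_length]
    return legal, legal != path


def destination(path, max_length):
    """Hypothetical caller (think library.py) that owns the warning."""
    legal, replacements_ignored = legalize_path_stub(path, max_length)
    if replacements_ignored:
        log.warning('some replacements could not be applied to %s', legal)
    return legal
```

Keeping the utility free of logging means it can be tested by inspecting its return value alone, while the caller decides how loudly to report the problem.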

LordSputnik · 2015-07-07T23:35:00Z

All done now, think it's ready to merge!

sampsyo · 2015-07-08T01:07:31Z

Awesome! This is looking great. I'll merge this.

The next step in this direction will be to address #1533/#1418. The _legalize_stage machinery, which is already responsible for both encoding and truncation, should do so now in an encoding-aware way. Hopefully, we can also find a way to avoid the encode-then-decode-again cycle that we have to resort to at the moment.

sampsyo · 2015-07-08T01:09:59Z

To be clear: it would be great if, as this continues to evolve, we can make the common case (no truncation) not require encoding, decoding to Unicode again, then re-encoding.
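One possible reading of that goal, as a Python 3 sketch: encode once, return immediately when the result already fits, and only do extra work when bytes actually have to be cut. The function `truncate_utf8` and the choice of UTF-8 are assumptions for illustration; this is not what #1533 or #1418 propose, and it ignores extensions and path components.

```python
def truncate_utf8(text, max_bytes):
    """Return text encoded as UTF-8, cut to at most max_bytes bytes.

    The cut never leaves a partial multi-byte character behind.
    """
    encoded = text.encode('utf-8')
    if len(encoded) <= max_bytes:
        # Common case: a single encode, no decode/re-encode cycle.
        return encoded
    # Slow path: slice the bytes, then drop any incomplete trailing
    # sequence by decoding with errors='ignore' and encoding again.
    return encoded[:max_bytes].decode('utf-8', 'ignore').encode('utf-8')
```

The slow path still round-trips through Unicode, but only for paths that really exceed the limit.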

sampsyo · 2015-07-08T01:23:49Z

Just to reiterate: ✨ THANK YOU ✨ for working on such a nasty, deceptively complicated issue. Woohoo!

I'm adding you as a collaborator in case you want to do further maintenance.

LordSputnik · 2015-07-08T14:02:00Z

No problem :) I'm especially glad that this is fixed because it was affecting a couple of songs in my library.

Will help out where I can if I have some spare time. I'm planning to write a python module to handle platform-dependency in path names as a result of working on this issue (there doesn't seem to be one!), and that would hopefully solve the unicode character issues, as well as improve the existing solution applied here.

sampsyo · 2015-07-08T15:14:11Z

A library like that sounds incredibly useful. Perhaps it would be worthwhile to dovetail with pathlib, which has a (very basic) notion of platform-specific paths?
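For context, a short sketch of what pathlib's platform-specific "pure" path classes do and do not cover. The example paths are made up.

```python
from pathlib import PurePosixPath, PureWindowsPath

relative = 'Music/Artist/song.mp3'

# Each flavour renders the same components with its own separator.
windows_text = str(PureWindowsPath(relative))
posix_text = str(PurePosixPath(relative))

# Comparison follows the platform's case rules.
same_on_windows = PureWindowsPath('ARTIST') == PureWindowsPath('artist')
same_on_posix = PurePosixPath('ARTIST') == PurePosixPath('artist')

# Characters that Windows rejects in file names are left untouched;
# the pure path classes do not rewrite or shorten path components.
unsanitized_name = PureWindowsPath('Artist/what?.mp3').name
```

So pathlib models separators and case sensitivity per platform, while sanitizing illegal characters and enforcing length limits would still be the job of a separate library.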

sampsyo reviewed Mar 18, 2015

LordSputnik force-pushed the master branch from 5cf0d71 to 4ef44f6 Compare May 10, 2015 17:14

Added loop to iterate over sanitize/truncate until stable. Enabled test_truncation_does_not_conflict_with_replacement test. Fixes beetbox#496.

d07c8fd

LordSputnik force-pushed the master branch from 3424b82 to d07c8fd Compare July 6, 2015 13:57

LordSputnik added 2 commits July 7, 2015 01:17

Rewrote path legalization code to be two module functions rather than class methods. Also made algorithm more predictable, and added an extra test.

de17d00

Remove unused import.

22b2527

sampsyo reviewed Jul 7, 2015

Minor changes suggested suggested in PR comments.

b479982

Added warning message and paragraph about replacements/max length interaction in documentation.

1f1e0f7

sampsyo merged commit 1f1e0f7 into beetbox:master Jul 8, 2015

sampsyo added a commit that referenced this pull request Jul 8, 2015

Merge pull request #1361 from LordSputnik/master

d766b6b

Fixed issue #496 - sanitize/truncate bug

sampsyo added a commit that referenced this pull request Jul 8, 2015

Expand a little on the docs for #1361

39809a8

Fix #496, at long last.
