Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decompose unicode paths and unidecode each component. Fixes #1087 #1159 #1541 #2286 #2303

Merged
merged 12 commits into from
Dec 9, 2016

Conversation

tweitzel
Copy link
Contributor

@tweitzel tweitzel commented Dec 3, 2016

if asciify_paths on: breakdown incoming paths, run unidecode on each one, substitute os.path.sep into beets.config['path_sep_replace'], recompose paths with os.path.sep.

Copy link
Member

@sampsyo sampsyo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fantastic; thank you! ✨

I noticed that the PR adds a new test file, unicode_path_sep.mp3, but it's not used anywhere that I can see. Any chance that was a mistake?

self.lib.replacements = [(re.compile(u'"'), u'q')]
self.lib.replacements = [
(re.compile(u'"'), u'q'),
(re.compile(os.path.sep), config['path_sep_replace'].get())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this addition to the replacements necessary for this test? I don't quite see how it works…

It might also be nice to have a separate test just focused on the new functionality so we can see exactly what's being tested.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was needed since the test just operates on a string, not a path. As you point out and I notice in hindsight, this test doesn't actually cover the whole code path that I modified, so it's back to the drawing board.

@tweitzel
Copy link
Contributor Author

tweitzel commented Dec 4, 2016

The unicode_path_sep.mp3 was going to be used to make an end-to-end test of asciify_paths, but as a test newbie it was hard for me to figure out how to write the test, so I went with modifying the existing asciify test. If you can enlighten me as to what test frameworks are being used so I can look for documentation about them, I can try writing a more fully-featured test.

@sampsyo
Copy link
Member

sampsyo commented Dec 5, 2016

Aha; thanks for explaining.

How about this test? Nice and simple, and I think it tests the change you've made.

def test_asciify_character_expanding_to_slash(self):
    config['asciify_paths'] = True
    self.lib.directory = b'lib'
    self.lib.path_formats = [(u'default', u'$title')]
    self.i.title = u'ab\xa2d'
    self.assertEqual(self.i.destination(), np('lib/abC_d'))

@tweitzel
Copy link
Contributor Author

tweitzel commented Dec 5, 2016 via email

@sampsyo
Copy link
Member

sampsyo commented Dec 5, 2016

Cool; thanks for looking into it!

No—if this unit test suffices, there's no need for the new test MP3.

@tweitzel
Copy link
Contributor Author

tweitzel commented Dec 6, 2016

I added the test you came up with off the top of your head because you're an old wizard from a kung fu movie and it passed all my tests; something was up with one of the travis tests, though.

@sampsyo
Copy link
Member

sampsyo commented Dec 6, 2016

It me.

cthd

Anyway, something weird is up with Travis… I'm not sure what to make of it, but I don't think it's related. But actually, the Windows build on AppVeyor turned up something interesting: https://ci.appveyor.com/project/beetbox/beets/build/1141/job/mncrvbw3y1425e4e

In particular, the path still has backslashes where there were slashes introduced by "asciification." I think this means that the code should probably check for both os.sep and os.path.altsep, the latter of which is / on Windows and None on Unix. Here's where we do the same thing in dbcore:

beets/beets/dbcore/db.py

Lines 74 to 77 in ae5e55c

sep_repl = beets.config['path_sep_replace'].as_str()
for sep in (os.path.sep, os.path.altsep):
if sep:
value = value.replace(sep, sep_repl)

Perhaps that should become a utility function?

@tweitzel
Copy link
Contributor Author

tweitzel commented Dec 6, 2016

Yeah, the paths should all go through the same asciification process. The thing I'm not sure about (because I don't have quite an encyclopedic knowledge of the code), is at what stage the dbcore snippet comes into play - before or after the library asciify_paths bit? Is the output of dbcore used for subpath value, vice versa, or neither?

@sampsyo
Copy link
Member

sampsyo commented Dec 6, 2016

Good question. It happens before. Here's the deal: dbcore is responsible for filling out templates and constructing paths in a general, abstract way; then the library.py stuff uses that to make beets-specific tweaks. In this case, the asciify_paths option is a post-processing step on the results produced by dbcore.

@tweitzel
Copy link
Contributor Author

tweitzel commented Dec 6, 2016

It sounds to me like the question is: do we expect that paths are always going to be calculated based on asciify_paths as they move around inside beets, or have that applied only at the last minute as the library is about to write to the file system? Right now it looks like beets considers the output of dbcore as the path as objects are slung around by the longshoremen, and then at the last minute it gets asciified as part of library. If my understanding is right, we've got three options:

should asciify_paths be done in dbcore so everything is consistent all the time, which might open up more bugs later?
should asciify_paths become a utility function so other code paths and plugins that need it for some reason* can use it? (My vote)
should my existing pull be modified to include the os.path.altsep option and call it a day?

*I can see a plugin that exports to a fat32 usb stick or something using asciify_paths, but leaving the main library on a unicode-aware volume.

The other question is one of etiquette: barring option 3, do you want this pull request closed and a new one opened for the refactor, or keep working in this PR on my branch so context is kept?

@sampsyo
Copy link
Member

sampsyo commented Dec 6, 2016

I'd agree with the longshoreman-slinging summary except for an important difference: I don't think anyone sees the direct output of dbcore's path formatting. For beets, all generated paths come from that destination method where asciiification is applied. So I don't think there's any danger that "raw" paths will leak out anywhere.

So yes, let's make it a utility function! In fact, we probably want to use it for the plain %asciify path function too, also in library.py: https://github.com/beetbox/beets/blob/master/beets/library.py#L1431
That's a strong argument for a reusable function here.

About PR etiquette, I don't have a strong opinion either way, but I generally find it easiest to keep on the same PR to continue the thread instead of switching to a new one.

* make asciifying handle both os.sep and os.altsep (testing needed as I
  don't have a windows box handy)
* make %asciify{} use the same code path as the asciify_paths goop.
* added a discrete test to %asciify{} so my life acts as a warning to
  others
* changelog note now with 80% less antihistamine-induced runon sentences
@tweitzel
Copy link
Contributor Author

tweitzel commented Dec 8, 2016

Aside from the one travis thing which I believe is fixed in a @jrobeson PR (per IRC), this looks good to go. Third time's a charm?

@sampsyo
Copy link
Member

sampsyo commented Dec 9, 2016

Yep, that'd be #2310.

This looks great! Thank you for sticking with this. I'll merge this now. ✨ 🚀 All the folks watching the related tickets should get to thanking you profusely.

@sampsyo sampsyo merged commit 4bbb05b into beetbox:master Dec 9, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants