Add musicbrainz id option to importer #1808

diego-plan9 · 2016-01-11T17:14:07Z

Work in progress for solving the recently-revived issue #170, as I have also been bitten by the long time it takes to import extremely large albums and matching against all candidates - hopefully this can be a way of alleviating the amount of time needed in some scenarios.

I'm mainly opening the pull request at this early stage in order to discuss the level where the use of the ID should be handled. Currently it is done inside Task.lookup_candidates(), but it feels a bit out of place to use config['import']['musicbrainz_id'] there, and might convolute a bit readability and isolation. I'm wondering if this would be better moved to other place, such as:

the lookup_candidates() pipeline stage?
pass it as a parameter to the Task constructor from the ImportTaskFactory?
other options?

I have also toyed with the idea of allowing several IDs to be specified (probably by using the -m option several times) in order to have a "batch mode" of sorts, but it might end up causing the opposite effect (ie. performing the matching against all the user-specified ids for every album). However, if we can assume that the user is highly confident that the items are reasonably clean, perhaps we can come up with some "light and quick" way of discarding all but one ID for the current item (for example, simply by counting the files, or using the file names somehow, etc). Might be a long shot, but I'd love to hear some opinions.

Acknowledged TODOs: revise docstrings, revise UI strings and names, add tests, write documentation.

* Add '-m', '--musicbrainzid' option to the import command, allowing the user to specify a single MusicBrainz ID to be used for the candidate lookup instead of trying to find suitable candidates. * Modify lookup_candidates() of ImportTask and SingletonImportTask to use the musicbrainz id if present.

sampsyo · 2016-01-11T19:04:45Z

Fantastic. Thanks for getting started on this.

On where the feature should "live": What you've done here is definitely the most straightforward change we can make, which is a good thing. And since the intended effect of the -m option is to globally instruct the importer to use a given MBID, it's not too crazy as a concept, either. So I wouldn't be sad if this was what we did in the end.

The other option that seems interesting to me is a parameter on the Task. This feels like it might afford a bit more flexibility: I don't exactly know what it would look like, but you can imagine designing a feature that would assign a different MBID to each task based on some external guess. Maybe it's worth looking into—we can see how messy it would get.

Multiple MBIDs seem like a natural extension too! It would also apply if, for example, you know you have one of a few different albums but aren't quite sure which.

* Modify the "--musicbrainzid" argument to the importer so multiple IDs can be specified by the user instead of a single one. * Revise autotag.match.tag_album and autotag.match.tag_item signature to expect a list of IDs (search_ids) instead of a single one (search_id), and add logic for handling and returning multiple matches for those IDs. * Update calls to those functions in other parts of the code.

* Add tests for the "--musicbrainzid" argument (one/several ids for matching an album/singleton; direct test on task.lookup_candidates() for album/singleton).

diego-plan9 · 2016-01-19T21:16:51Z

Thanks for the input, as usual! Slow days for me, which means this pull requests continues to be an unpolished work in progress, but I have made an attempt to add the option for specifying multiple MusicBrainz IDs instead of a single one, plus some basic tests.

I resorted to modifying the signatures of match.tag_album() and match.tag_item() so they expect a list of IDs instead of a single id (search_id -> search_ids), adjusting the internal logic accordingly so it loops through the IDs preserving the matches found. This is a bit more intrusive that I'd like, but the alternatives did not look too appealing to me (keep the signature and add some code for handling single items vs. lists inside the function, or wrap the calls into a light "disambiguation" function of sorts) - I hope it's an acceptable implementation. As a result of modifying the signature, some calls in other places of the code had to be adjusted accordingly - I'll do another round to make sure no unexpected problems occur.

A note as well on the tests: at the moment they query MusicBrainz several times with very similar queries (and one of them is failing on Travis due to the order of the arguments it seems), as at this point they are mainly intended to be an aid during development. I do intend to clean them up before merging so they mock the calls to MusicBrainz using a solution inspired on the AutotagStub or the test_mb mechanisms.

diego-plan9 · 2016-01-19T21:25:49Z

beets/ui/commands.py

@@ -787,7 +787,7 @@ def choose_item(self, task):
                search_id = manual_id(True)
                if search_id:
                    candidates, rec = autotag.tag_item(task.item,
-                                                       search_id=search_id)
+                                                       search_ids=[search_id])


A related side effect (provided the current implementation of multiple IDs stays more or less the same) is that it would take almost no effort (mainly a matter of splitting search_id) to allow the user to specify several IDs during the import when selecting the enter Id prompt choice. Would that be a good idea?

I'm for it!

Edit: But, I should add, I don't think it's critical—we could come back around to this later, of course.

* Replace the entities used on ImportMusicBrainzIdTest mocking the calls to musicbrainzngs.get_release_by_id and musicbrainzngs.get_recording_by_id instead of querying MusicBrainz. * Other cleanup and docstring fixes.

diego-plan9 · 2016-01-20T16:32:13Z

Fixed the tests so they mock the queries to MusicBrainz, and tidied them up in general. Travis is still failing on a test with the singletons that I can't reproduce on my environment (yet).

* Fix an issue that caused the candidates for a singleton not to be returned ordered by distance from autotag.match.tag_item(), when searching multiple MusicBrainz ids (ie. several "--musicbrainzid" arguments). The candidates are now explicitely reordered before being returned and before the recommendation is computed. * Fix test_importer.mocked_get_recording_by_id so that the artist is nested properly (and as a result, taken into account into the distance calculations).

sampsyo · 2016-01-20T19:27:05Z

I resorted to modifying the signatures of match.tag_album() and match.tag_item() so they expect a list of IDs instead of a single id (search_id -> search_ids),

This sounds perfect to me. It's a generalization of the old functionality to match the new feature. Wonderful!

sampsyo · 2016-01-20T19:33:54Z

Also:

Travis is still failing on a test with the singletons that I can't reproduce on my environment (yet).

That is certainly weird—I also note that this seems to be failing in the py27 environment but not the pypy one. That certainly shouldn't happen…

I also tried running the tests on your branch locally and couldn't trigger the failure. Seriously weird—any chance this is some crazy, random fluke?

sampsyo · 2016-01-20T19:34:40Z

beets/autotag/match.py

@@ -370,7 +370,7 @@ def _add_candidate(items, results, info):


 def tag_album(items, search_artist=None, search_album=None,
-              search_id=None):
+              search_ids=[]):


We should probably describe search_ids in the docstring. (Although I note we didn't document search_id before—bad @sampsyo!)

diego-plan9 · 2016-01-20T19:41:45Z

This sounds perfect to me. It's a generalization of the old functionality to match the new feature. Wonderful!

Great! I tend to be a bit hesitant when modifying function signatures and the likes, but I'm glad it is looking good.

That is certainly weird—I also note that this seems to be failing in the py27 environment but not the pypy one. That certainly shouldn't happen…

I also tried running the tests on your branch locally and couldn't trigger the failure. Seriously weird—any chance this is some crazy, random fluke?

I have just pushed a commit after further investigating the failing test: it seems that I introduced a bug that caused the candidates for a singleton not to be guaranteed to be returned ordered by their distance, and would probably have gone unnoticed if it were not for the weird fail - my apologies. I'm guessing the lucky part was that the candidates dict "order" got reversed on that particular environment, or something similar?

sampsyo · 2016-01-20T19:43:54Z

Aha! That actually makes sense. I think it's due to Python's newish random hash function, which causes dictionaries to iterate in different orders on every run. http://stackoverflow.com/questions/27522626/hash-function-in-python-3-3-returns-different-results-between-sessions

* Style cleanup and fixes for the "--musicbrainzid" import argument. * Allow the input of several IDs (separated by spaces) on the "enter Id" importer prompt. * Add basic documentation.

diego-plan9 · 2016-01-21T16:22:13Z

I'm glad the mystery has been sorted out, I was rather puzzled myself for a while!

Thanks as well for the review and the notes, I have hopefully addressed the issues on the latest commit and also added some basic documentation. As for the ability of specifying multiple IDs when using the interactive import enter Id prompt discussed on #1808 (comment), I have opted for including it along with a mention on tagger.rst (that was missing a comment on the enter Id choice altogether).

Next step will be an attempt to make mb_ids a parameter of the Task, as mentioned on the initial comments - if it ends up being too messy, we can safely discard it.

A question I came across while revising documentation and other issues: I have personally never used the discogs backend and unfortunately could not locate any discogs-specific unit tests, so could you let me know if you envisage any conflicts when using discogs instead of musicbrainz, or if there are some manual tests that need to be performed? There seems to be some potential interference between these changes and the features provided by the discogs plugin.

* Store the user-supplied MusicBrainz IDs (via the "--musicbrainzid" importer argument) on ImporTask.task.musicbrainz_ids during the lookup_candidates() pipeline stage. * Update test cases to reflect the changes.

diego-plan9 · 2016-01-21T20:42:58Z

4eedd2b contains the aforementioned attempt to make the feature a parameter of the Task. In an effort for keeping the changes to a minimal, I took some shortcuts: in particular, sacrificed a bit of explicitness with the actual Task.musicbrainz_ids attribute (no "setter"-like method, or constructor parameter, etc), and also placed the initialization of the attribute directly on the lookup_candidates pipeline stage.

A more correct way of making these changes useful to the eventual "MusicBrainzID <-> task" mapper feature/mechanism would probably involve creating another pipeline stage, or similar changes to the Factory or Session. However, as it is not really clear to me yet how the feature would work (nor if it would be a really useful addition at this point), I feel these changes would end up being placeholders and candidates for being replaced entirely once the feature is discussed and implemented.

As usual, I'd love to hear your thoughts, and make any further changes as desired!

sampsyo · 2016-01-21T22:02:47Z

I have personally never used the discogs backend and unfortunately could not locate any discogs-specific unit tests, so could you let me know if you envisage any conflicts when using discogs instead of musicbrainz

This should be OK. Since the ID lookups are hidden behind the *_for_id function in beets.autotag.hooks, the Discogs backend shouldn't be affected. (Further evidence: this change didn't require any updates in beets.autotag.mb.)

sampsyo · 2016-01-21T22:03:34Z

beets/importer.py

+
+    # Restrict the initial lookup to IDs specified by the user via the -m
+    # option. Currently all the IDs are passed onto the tasks directly.
+    task.musicbrainz_ids = session.config['musicbrainz_ids'].as_str_seq()


Looks great for now. As you note in your other comment, we can move this around later if we want a more sophisticated way of populating this.

* Rename the importer argument and related variables to make it more generic, as the feature should be independent of the backend used and not restricted to MusicBrainz. * Update documentation and docstrings accordingly. * Add changelog entry.

diego-plan9 · 2016-01-22T15:49:39Z

This is a tiny nitpick, but these technically don't need to be MBIDs. Other backends can also interpret these IDs (for now, just the Discogs plugin).

Alas, I don't have a great idea for another option name since -i and -I are both taken. Perhaps -f for --force-id? Or maybe -d as a second-choice contraction of --id? Or -S for --search-id?

Sounds like a more than fair nitpick, and indeed derived from my initial "MusicBrainz-centric" approach. I went with the -S option in order to try to adhere the "first letter of the long option" policy to the extent possible - -f sounded tempting as well, but I preferred to reserve the letter for eventual arguments that might do other, more intrusive "forcing"s.

I also took the chance to modify the documentation and docstrings, as well as rename the variables used, in order to make it clearer that the feature should be backend-agnostic. For the docs, it basically ended up being a s/MusicBrainz/metadata backend/ modification: perhaps there is a better term?

I have also added an entry to the changelog, in the hopes of having the pull request ready for the final review!

sampsyo · 2016-01-22T18:42:52Z

docs/reference/cli.rst

@@ -132,6 +132,11 @@ Optional command flags:
  option. If set, beets will just print a list of files that it would
  otherwise import.

+* If you already have a metadata backend ID that matches the items to be
+  imported, you can instruct beets to restrict the search to that ID instead of
+  searching for other candidates by using the ``--search_id SEARCH_ID`` option.


Itsy-bitsy typo: --search-id, not --search_id.

sampsyo · 2016-01-22T18:44:02Z

Fantastic! This all looks perfect. I found one incredibly tiny typo, but this is otherwise 100% ready. Feel free to hit the big green button. 🎉

This is a surprisingly frequent request, so I'm thrilled to have this implemented!

diego-plan9 · 2016-01-22T19:13:27Z

Thanks for double-checking, as usual, and my apologies for the careless typo! I'm also glad it looks like the feature will be put to good use 👍

I was planning on merging when the travis build was completed, but seems like it failed due some network problems / truncated download of sorts - I'm not sure if there is a way to relaunch the job and get rid of the little red "x" next to the commit. Strange luck with the builds on this pull requests, it seems!

sampsyo · 2016-01-22T22:28:14Z

Bad luck indeed. I have the power to restart a build -- I'll see if I can extend that to GitHub collaborators too.

sampsyo · 2016-01-22T23:04:30Z

I didn't figure that out, but rerunning the build did work! ✨ Merging now.

Add musicbrainz id option to importer

diego-plan9 added 3 commits January 19, 2016 21:43

Add tests for importer musicbrainz id argument

865be11

* Add tests for the "--musicbrainzid" argument (one/several ids for matching an album/singleton; direct test on task.lookup_candidates() for album/singleton).

Merge remote-tracking branch 'upstream/master' into mbid

c12e974

diego-plan9 reviewed Jan 19, 2016
View reviewed changes

Avoid querying MB during ImportMusicBrainzIdTest

4e5ddac

* Replace the entities used on ImportMusicBrainzIdTest mocking the calls to musicbrainzngs.get_release_by_id and musicbrainzngs.get_recording_by_id instead of querying MusicBrainz. * Other cleanup and docstring fixes.

sampsyo reviewed Jan 20, 2016
View reviewed changes

Cleanup and documentation for MB_id importer arg

b526227

* Style cleanup and fixes for the "--musicbrainzid" import argument. * Allow the input of several IDs (separated by spaces) on the "enter Id" importer prompt. * Add basic documentation.

Store user-supplied MB ids on the Tasks

4eedd2b

* Store the user-supplied MusicBrainz IDs (via the "--musicbrainzid" importer argument) on ImporTask.task.musicbrainz_ids during the lookup_candidates() pipeline stage. * Update test cases to reflect the changes.

sampsyo reviewed Jan 21, 2016
View reviewed changes

sampsyo reviewed Jan 22, 2016
View reviewed changes

Fix typo on importer search by id documentation

4f51302

sampsyo added a commit that referenced this pull request Jan 22, 2016

Merge pull request #1808 from diego-plan9/mbid

cb447f7

Add musicbrainz id option to importer

sampsyo merged commit cb447f7 into beetbox:master Jan 22, 2016

sampsyo mentioned this pull request Jan 25, 2016

Allow to specify the musicbrainz ID when importing without having to wait for beet to search first #170

Closed

diego-plan9 deleted the mbid branch October 17, 2016 15:12

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add musicbrainz id option to importer #1808

Add musicbrainz id option to importer #1808

diego-plan9 commented Jan 11, 2016

sampsyo commented Jan 11, 2016

diego-plan9 commented Jan 19, 2016

diego-plan9 Jan 19, 2016

sampsyo Jan 20, 2016

diego-plan9 commented Jan 20, 2016

sampsyo commented Jan 20, 2016

sampsyo commented Jan 20, 2016

sampsyo Jan 20, 2016

diego-plan9 commented Jan 20, 2016

sampsyo commented Jan 20, 2016

diego-plan9 commented Jan 21, 2016

diego-plan9 commented Jan 21, 2016

sampsyo commented Jan 21, 2016

sampsyo Jan 21, 2016

diego-plan9 commented Jan 22, 2016

sampsyo Jan 22, 2016

sampsyo commented Jan 22, 2016

diego-plan9 commented Jan 22, 2016

sampsyo commented Jan 22, 2016

sampsyo commented Jan 22, 2016

Add musicbrainz id option to importer #1808

Add musicbrainz id option to importer #1808

Conversation

diego-plan9 commented Jan 11, 2016

sampsyo commented Jan 11, 2016

diego-plan9 commented Jan 19, 2016

diego-plan9 Jan 19, 2016

Choose a reason for hiding this comment

sampsyo Jan 20, 2016

Choose a reason for hiding this comment

diego-plan9 commented Jan 20, 2016

sampsyo commented Jan 20, 2016

sampsyo commented Jan 20, 2016

sampsyo Jan 20, 2016

Choose a reason for hiding this comment

diego-plan9 commented Jan 20, 2016

sampsyo commented Jan 20, 2016

diego-plan9 commented Jan 21, 2016

diego-plan9 commented Jan 21, 2016

sampsyo commented Jan 21, 2016

sampsyo Jan 21, 2016

Choose a reason for hiding this comment

diego-plan9 commented Jan 22, 2016

sampsyo Jan 22, 2016

Choose a reason for hiding this comment

sampsyo commented Jan 22, 2016

diego-plan9 commented Jan 22, 2016

sampsyo commented Jan 22, 2016

sampsyo commented Jan 22, 2016