Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fetch artists, performers, engineers etc flexibly #3589

Closed
wants to merge 58 commits into from

Conversation

dosoe
Copy link
Contributor

@dosoe dosoe commented May 14, 2020

An attempt to solve #1547 by fetching all artists involved in a recording. The artists have tag of the form mb type - role (credit). For example, on this recording https://musicbrainz.org/recording/38fce51c-b8c9-4369-92d3-c354a8e09fd4 , tags would look like:
mb vocals - bass vocals (Filippo) and mb vocals sort - bass vocals (Filippo) for the sort name, or mb vocals - choir vocals
If several artists have the same type, role and credit, then the tag will contain a list of artists separated by ; . To avoid cluttering over time, I also change the mbsync plugin to remove all tags that start with mb (with a space) that are not recovered when fetching the data again. This fetches all performers, engineers, arrangers (replacing the arranger tag). Some type names (like orchestra) have been modified, just for the convenience of having orchestra instead of performing orchestra or recording engineer instead of recording.

Finding all tags that start with mb is easy on singletons, but for albums I don't know how to do it in the way it is implemented in mbsync. @sampsyo could you help me for that please? @bearcatsandor does this correspond to your idea of fetching all metadata? There is still more data, like recording place and date, but that's a start, and once this works, it should be easy to adapt it for more tags.

@sampsyo
Copy link
Member

sampsyo commented May 14, 2020

Thanks for getting this started! This is pretty tricky… I think an important thing to focus on here is that this is a pretty specialized use case, and I'm not sure most users will want to have this enabled by default. We can consider it, under the hypothesis that extra tags may not bother anybody as long as they are not written to the file, but we may also want to consider creating some kind of new plugin hook that lets the MusicBrainz data extraction be extended. This way, users could customize the way MB data gets "flattened" into tags.

I'm also uncertain about the idea of using a naming convention to do synchronization—i.e., to "reserve" all tags starting with mb . It's certainly worth considering, especially with a more unique prefix that ties fields more obviously to a specific plugin, for example. But maybe it would be better to just use a built-in list of fields?

Finally, I definitely think that beets field names should not contain spaces. Otherwise, it's pretty hard to write things like template strings that use the fields.

@dosoe
Copy link
Contributor Author

dosoe commented May 14, 2020

A built-in list of fields is not a good idea in my opinion. That would require creating a field for every 200+ instrument of MB, the overwhelming majority of them empty, not even talking about all the credits (in opera, typically), and now that we have flexible tags why not use them? In #1547 (comment) I interpret it as reserving certain fields for a plugin and singling them out by a prefix seems the easiest way of doing so.
I wouldn't know how to create hooks to flatten the data. The data on performers seems cut into 3 parts: the type (performer, vocals, orchestra, instrument, sound engineer, recording engineer, balance engineer, producer and many others), the role (name of the instrument, the voice, some additional stuff like solo, additional or invited) and the credit (usually the name of the role for operas, sometimes some precision on the instrument or other similar stuff). It then has a name, a sort name, an id (sometimes also an id for the instrument) a date range and a few more fields. Of course, not all the fields are populated all the time. It should be possible to make something similar to path with these bricks, which would be effectively some tag name configuration, but I have very little experience on how to do this.
I first thought about making a separate plugin and pretty much copy-pasting album_info and track_info (from mb.py) with modifications but that's not great, it duplicates the code needlessly. Maybe put an optional boolean argument in them to fetch additional MB data or not?

@dosoe
Copy link
Contributor Author

dosoe commented May 14, 2020

I can't speak for everyone, but as far as I'm concerned, the use case would just be to use beets to check on the info on a recording, not to use them to name files.

@dosoe
Copy link
Contributor Author

dosoe commented Jun 7, 2020

Hi! I changed the code a little. A few quality of life improvements: beet mbsync will now also show changes on singletons. Another one: for releases with more than 500 tracks, when fetching an album with musicbrainzngs the simplified version of the album is used, that doesn't contain any works or performers. To get those, these releases need to be fetched track by track, which is much longer. I figure that the people that are interested in this kind of additional data would want to get this information and would be ready to wait a little more for their updates.
I tied the additional data to an option in mbsync (option -I --more_info, the name is temporary until I find a better one) and all tags populated this way will start with mbsync_ so all tags like this are reserved for mbsync. I also replaced all spaces with underscores.

@dosoe
Copy link
Contributor Author

dosoe commented Jun 7, 2020

This also uses two additional optional arguments in track_info and all functions that are passing over stuff to it: more_info says if we want to fetch the performers and by_rec says if the album has more than 500 tracks and the tracks need to be fetched individually. This seems to make the tests fail.

@dosoe
Copy link
Contributor Author

dosoe commented Jun 23, 2020

some kind of new plugin hook

What do you mean? The way it is now is that there is an additional argument in beets core that allows to choose whether one wants to get this additional data that is activated by a plugin, but that means changes in beets core. Do you have something else in mind?

@sampsyo
Copy link
Member

sampsyo commented Jun 23, 2020

What I meant there is that we could add a brand-new event that allows plugins to process MusicBrainz responses and extract additional data. This would let the task of MusicBrainz-to-beets data extraction evolve independently of the beets core, which would remain somewhat minimal. If other people in the future come up with other swaths of additional data they'd like to fetch from MusicBrainz, we won't need to add new flags like even_more_info and still_more_info_yet for each new set of data.

@dosoe
Copy link
Contributor Author

dosoe commented Jun 24, 2020

What I meant there is that we could add a brand-new event that allows plugins to process MusicBrainz responses and extract additional data.

Ah ok! Now I understand. I had a little thought at this, but I'm running into a problem: I can get the response of MB by setting up an event and process the response, but I somehow need the items linked to this response to apply the new tags to it. So I need both the MB response and the item, but those two are created in completely different parts of the code.

@dosoe
Copy link
Contributor Author

dosoe commented Jun 24, 2020

For example, I tried to put up an event that gives the MB response: 'trackdata_recieved':

res = musicbrainzngs.get_recording_by_id(trackid, TRACK_INCLUDES)
beets.plugins.send(u'trackdata_recieved', info=res['recording'])

Then I fetch the response in mbsync: self.register_listener('trackdata_recieved', self.track_performers) . This triggers a function track_performers:

def track_performers(self, recording):
        info = autotag.hooks.TrackInfo(...)
        artists = {}
...
        for key in artists:
            info[key] = u'; '.join(artists[key])

        info.decode()

Now I want to apply this info to the item it belongs to. How do we get it? I could create an event that triggers when an item is called, but there would be no guaranty that this would be the same item than the one we just got the MB data for. How do these triggers work btw? Does the whole program stop to process it when it triggers? Can track_performers return data? So far it doesn't seem so, but I'm not sure.

@sampsyo
Copy link
Member

sampsyo commented Jun 24, 2020

Does the whole program stop to process it when it triggers? Can track_performers return data? So far it doesn't seem so, but I'm not sure.

It just turns into a normal function call, so yes, the rest of beets (in the current thread) "stops" while the event is handled. And yes, they can return values! 🎉 Here's an example from elsewhere in beets:

beets/beets/ui/commands.py

Lines 865 to 866 in 31855a9

extra_choices = list(chain(*plugins.send('before_choose_candidate',
session=self, task=task)))

So a sensible thing to do here might be to just return a dict with all the information extracted. Then the mb module can take that stuff and assign it onto the *Info object, which will later be applied to the Items.

@dosoe
Copy link
Contributor Author

dosoe commented Jul 11, 2020

Thanks! With the end of the lockdown and the start of the final phase of my PhD, I won't be able to work on this too much in the next weeks, but I'm keeping an eye on it.

@Holzhaus
Copy link
Contributor

When writing performer information to files, does this use the same tag mapping that is used by Picard? We shouldn't roll our own tags, because then support by media players will be nonexistent. Picard's tags do have some support, at least they are shown correctly by mpd/ncmpcpp.

@aereaux
Copy link
Contributor

aereaux commented Jul 1, 2021

Thanks for the fix, but now I get this error:

Traceback (most recent call last):
  File "/home/user/.local/bin//beet", line 23, in <module>
    beets.ui.main()
  File "/home/user/.local/lib/beets/beets/ui/__init__.py", line 1291, in main
    _raw_main(args)
  File "/home/user/.local/lib/beets/beets/ui/__init__.py", line 1278, in _raw_main
    subcommand.func(lib, suboptions, subargs)
  File "/home/user/.local/lib/beets/beetsplug/mbsync.py", line 164, in func
    self.albums(lib, query, move, pretend, write, performer_info)
  File "/home/user/.local/lib/beets/beetsplug/mbsync.py", line 222, in albums
    album_info = hooks.album_for_mbid(a.mb_albumid)
  File "/home/user/.local/lib/beets/beets/autotag/hooks.py", line 558, in album_for_mbid
    album = mb.album_for_id(release_id)
  File "/home/user/.local/lib/beets/beets/autotag/mb.py", line 571, in album_for_id
    return album_info(res['release'])
  File "/home/user/.local/lib/beets/beets/autotag/mb.py", line 361, in album_info
    ti = track_info(
  File "/home/user/.local/lib/beets/beets/autotag/mb.py", line 273, in track_info
    extra_trackdatasets = plugins.send('mb_track_extract', info=recording)
  File "/home/user/.local/lib/beets/beets/plugins.py", line 498, in send
    result = handler(**arguments)
  File "/home/user/.local/lib/beets/beets/plugins.py", line 153, in wrapper
    return func(*args, **kwargs)
TypeError: import_listener() missing 1 required positional argument: 'data'

@dosoe
Copy link
Contributor Author

dosoe commented Jul 3, 2021

@aereaux That is strange, I don't have this problem. Are you sure you use the latest version?

@dosoe
Copy link
Contributor Author

dosoe commented Jul 3, 2021

However, one problem I do have, is that I need the performer_info option active in the config file for beets mbsync -P to work, which defeats the purpose of having this option. Is there a way to solve this? Can listeners be activated by an option in the command, or are they only by options in the config file?

@aereaux
Copy link
Contributor

aereaux commented Jul 4, 2021

Ah, looks like it's my fault. I have a plugin that was causing the issue. No idea why it's suddenly failing, though

@@ -270,8 +270,8 @@ def track_info(recording, index=None, medium=None, medium_index=None,
info.arranger = u', '.join(arranger)

# Supplementary fields provided by plugins
extra_trackdatas = plugins.send('mb_track_extract', data=recording)
for extra_trackdata in extra_trackdatas:
extra_trackdatasets = plugins.send('mb_track_extract', info=recording)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, sorry I screwed up

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it should be fine now

@beetbox beetbox deleted a comment from poo922 Jul 5, 2021
@dosoe
Copy link
Contributor Author

dosoe commented Jul 10, 2021

Is it the case that you get mbsync_x fields fetched even when the mbsync plugin is disabled?

I did a test, they are not fetched if mbsync is disabled, even if the 'performer_info' option is active.
Why am I associationg it with mbsync? mbsync goes over all items to fetch info from MusicBrainz and the performer info I want to fetch is from MusicBrainz, so it seems an obvious match. Also, to update the performer information, I need to go over all items and fetch their MB data, which mbsync already does. I don't think there is such high demand for this that it should be in beets core.

@dosoe
Copy link
Contributor Author

dosoe commented Jul 10, 2021

Ah, looks like it's my fault. I have a plugin that was causing the issue. No idea why it's suddenly failing, though

Does it work now, with your plugin?

@aereaux
Copy link
Contributor

aereaux commented Jul 10, 2021

Ah, looks like it's my fault. I have a plugin that was causing the issue. No idea why it's suddenly failing, though

Does it work now, with your plugin?

Yep, changing the parameter name back fixed it.

Also, at least as a user, I think of mbsync as only updating the same information that is retrieved on initial import. It seems like it would make more sense either to have this in core so that it gets retrieved on initial import and on mbsync (probably behind a feature flag) or to put it in a separate plugin like how parentwork works, although parentwork does do a lot of additional info fetching.

@dosoe
Copy link
Contributor Author

dosoe commented Jul 10, 2021

Actually, it is retrieved at the initial import if mbsync and the 'performer_info' option are active. That's probably the feature flag you're looking for.

@aereaux
Copy link
Contributor

aereaux commented Jul 10, 2021

Actually, it is retrieved at the initial import if mbsync and the 'performer_info' option are active. That's probably the feature flag you're looking for.

Ah OK, I wasn't sure because it seemed like it required using both that parameter and the commandline flag when using mbsync. But it seems like it works now, I might have done something wrong when I tried it before.

@bearcatsandor
Copy link

bearcatsandor commented Jul 10, 2021

I think I missed something. If this has been/will be merged into the base branch, what config options do I need to enable to try it out?

@dosoe
Copy link
Contributor Author

dosoe commented Jul 10, 2021

@bearcatsandor Yes, you need to enable the mbsync plugin and in the mbsync plusing you need to enable the fetch_performers option

mbsync: 
    fetch_performers:yes

There is a command-line option that should override this option, but it doesn't work, I will probably remove it and performer fetching will be only possible by enabling the option in the config, unless you believe there could be a need/demand for it (something like beet mbsync -P to fetch the performers regardless of the option in the config file).

@bearcatsandor
Copy link

So, unless I'm being a doofus, I've placed that mbsync option into my config (the plugin was already enabled), and it doesn't appear to do anything. Looking at this page, I'm not sure how to tell if it's been merged into beet:master.

@sampsyo
Copy link
Member

sampsyo commented Jul 11, 2021

Hey @bearcatsandor—no, I think you're understanding correctly! This PR has not been merged: you can tell because the top of the page still says "open" in green. #3995 is an example of a merged (purple!) PR, meaning the changes are now part of the branch they were proposed for. To try out these changes, you'll need to check out @dosoe's branch.

@sampsyo
Copy link
Member

sampsyo commented Jul 11, 2021

I did a test, they are not fetched if mbsync is disabled, even if the 'performer_info' option is active.
Why am I associationg it with mbsync? mbsync goes over all items to fetch info from MusicBrainz and the performer info I want to fetch is from MusicBrainz, so it seems an obvious match. Also, to update the performer information, I need to go over all items and fetch their MB data, which mbsync already does. I don't think there is such high demand for this that it should be in beets core.

Thanks for clarifying, @dosoe! This does help clarify things a bit for me.

Nonetheless, it does solidify my position that this probably should not be part of mbsync. The original intention behind the mbsync plugin was to provide simple, fast alternative to re-importing your music—to just refresh exactly the same metadata that the original import process also provides. Like the documentation says:

This is useful for updating tags as they are fixed in the MusicBrainz database, or when you change your mind about some config options that change how tags are written to files.

That's the use case it's aiming for: when you know something has changed on MB (perhaps because you edited MB yourself!) and you'd like for your database to reflect that. Notably, it doesn't fetch any more metadata than the original import process does—it just fetches the same stuff only "fresher."

From that perspective, it seems like fetching performer information would, in an ideal world, want to obey the same rules. It could get fetched on initial import and refreshed on mbsync. But there would be no need for it to be mbsync-specific, i.e., people who just want performer data shouldn't need to enable mbsync at all.

So can I make the recommendation that this be put into a separate, special-purpose plugin, perhaps called performers or similar? In v1.0, it could just provide an explicit beet performers command that fetches and stores all this data. The tags could be called performer_* to make the purpose clear. In a future revision, we could explore hooking this the importer and into mbsync, but that can come after an initial "explicit" invocation. Would this make sense?

@dosoe
Copy link
Contributor Author

dosoe commented Jul 12, 2021

It does make sense. However, to be honest, I'm not sure how to get the performer info without getting all the other MusicBrainz tags as well, as they get fetched during the call of hooks.track_for_mbid(item.mb_trackid) which fetches the data from MB. I could then just apply the tags related to performers, but that seems kind of a waste to have two plugins that do the same thing (because of the listener, each time mbsync is called, the performers would be updated). Maybe that's a problem with the way I implemented the listener, let me explain why:

If you create a new flexible tag, let's call it instrument_instr where instr is the name of the instrument. This tag gets added to track_info using the listener. Now someone changes the instrument. Then the listener will add the new tag, but not remove the old, now obsolete tag. This has to be done by hand, which is why I added a little loop that cleans all the tags with the mbsync_ prefix in this PR. However, now I need to put this little loop not only in the mbsync plugin, but in each and every plugin that uses the listener and adds tags using it. Now if I have an instrument plugin that adds this instrument tag, I need to make a loop in mbsync that cleans up the instruments tag, else in case of a change it would add the new instrument and not remove the old one. Which means that the mb_track_extract event should allow to add tags, but also to remove some. I hope I'm clear, I don't know if it's a huge issue. Maybe a run of the instrument plugin would clean it up anyway, but that means that to update cleanly my data, I would need to call mbsync, instruments and eventually performers which all make exactly the same calls to MB, which seems like a waste of calls and can take some time for my collection of 50k tracks.

@sampsyo
Copy link
Member

sampsyo commented Jul 13, 2021

Yes, very clear; thanks for the explanation!! What makes this effort unique relative to other "apply some extra fields" efforts is the need to create a data-driven set of fields, rather than a fixed list, which implies the need to delete as well as add them.

What do you think about adding a new event just for doing this? We could emit the event in the apply_item_metadata function or thereabouts, to let plugins do whatever they want just before setting new fields on Items. The new plugin could just delete all the performer_* tags at that point and then rely on the normal process to set the new ones. Would that work (and be as simple as it sounds)?

(If that does work, it would be good to do one thorough check to see if some other, existing event is close enough to the right "time" that it world already service this purpose...)

@aereaux
Copy link
Contributor

aereaux commented Jul 13, 2021

One other possibility that I can think of, although I don't know how complicated it would be, would be to have some official way to associate specific tags with specific plugins. This might help resolve any potential conflicts between plugins and also allow for beets to automatically delete the old versions of tags.

@stale
Copy link

stale bot commented Nov 10, 2021

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Nov 10, 2021
@bearcatsandor
Copy link

If I can help keep this alive, let me know. I've been eager for this for years.

@stale stale bot removed the stale label Nov 16, 2021
@dosoe
Copy link
Contributor Author

dosoe commented Nov 19, 2021

Hi! Yes, that's greatly appreciated. I just can't find the time to do it, sorry.
So that's where I think this is: Right now, I have two events mb_track_extract and mb_album_extract , I have two functions that get the performers track_performers and album_performers and I need to go through all tracks and albums and download musicbrainz data to feed it to these functions and then something to apply it to the files. Right now, I'm piggybacking on the mbsync plugin but from what I understand, it should change, maybe create a new plugin specifically for this.
There is a catch however: The performer tags are created flexibly, which means that when they change, they are not replaced like-for-like, it can happen that new ones are created and the old ones just stay.
Flexible tags also open the possibility for conflicts, if two plugins use the same tag for different purposes. Previously, all tags were explicitly listed, so the risk was low, you could just look up if the tag was already used for something else. Now, tags can be created on the fly, which means that there is no way to know in advance which tags will be used by which plugin. The solution I opted here was to create a prefix for all tags created by mbsync.
To clean up old potentially obsolete tags, I remove all tags with this prefix before applying the new ones, however, by using an event, all other plugins using triggering this event and applying tags would not make this cleanup (so far there are none, but if we create a new pluging for performers there will be: mbsync and this new plugin).
I hope I'm clear and helpful, else just tell me.

@stale
Copy link

stale bot commented Mar 20, 2022

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Mar 20, 2022
@stale stale bot closed this Mar 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants