No time limit on candidate lookup #2446

Closed
RollingStar opened this issue Feb 19, 2017 · 23 comments
Labels
needinfo We need more details or follow-up from the filer before this can be tagged "bug" or "feature." stale

Comments

@RollingStar
Contributor

RollingStar commented Feb 19, 2017

Problem

If a candidate is a huge box set, beets slows to an absolute crawl trying to match tracks to it. There should be a cap on maximum lookup time, and beets should cancel a lookup thread that goes past that time limit. Then it can provide the user with its candidates, including the interrupted lookups, and give its usual prompts, like (A)pply, plus a new one: something like (C)ontinue interrupted lookups. The time limit could default to something reasonable (10-60 seconds) and perhaps be user-specified in the config.
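
A minimal sketch of the proposed behaviour, not beets code: `evaluate_with_time_limit`, `fake_evaluate`, and the candidate names are hypothetical, and it assumes each candidate can be evaluated independently of the others. Python threads cannot be forcibly cancelled, so the sketch only stops waiting after the limit and leaves the slow evaluations running in the background, which is roughly what a (C)ontinue prompt would need.

```python
import concurrent.futures
import time


def evaluate_with_time_limit(evaluate, candidates, time_limit):
    """Hypothetical helper: evaluate every candidate in its own thread and
    stop waiting after time_limit seconds.

    Returns (finished, interrupted):
      finished    maps candidate -> result for evaluations that completed
      interrupted maps candidate -> Future for evaluations still running
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=len(candidates) or 1)
    futures = {pool.submit(evaluate, c): c for c in candidates}
    done, not_done = concurrent.futures.wait(futures, timeout=time_limit)
    finished = {futures[f]: f.result() for f in done}
    interrupted = {futures[f]: f for f in not_done}
    # Do not block here: unfinished evaluations keep running in the background.
    pool.shutdown(wait=False)
    return finished, interrupted


def fake_evaluate(name):
    # Stand-in for track matching; the "huge box set" is deliberately slow.
    time.sleep(1.0 if name == "huge box set" else 0.0)
    return f"distance for {name}"


finished, interrupted = evaluate_with_time_limit(
    fake_evaluate, ["single disc", "huge box set"], time_limit=0.2
)
# "Continuing" an interrupted lookup is just waiting on its Future.
continued = interrupted["huge box set"].result(timeout=5)
```

The candidates in `finished` could be shown at the prompt right away, while `interrupted` holds whatever the user may choose to keep waiting for.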

The import hangs (gets really slow but eventually finishes) on two attempts for the same files. Similar lookups took at least 25 minutes, but I haven't timed it exactly. Edit: I started this command before posting this issue, at least 1 hour and 45 minutes ago. Beets is still considering the entire box set as a candidate and Python still shows CPU usage.

Running this command in verbose (-vv) mode:

beet -vv import e:\music\mozart
user configuration: C:\Users\RollingStar\AppData\Roaming\beets\config.yaml
data directory: C:\Users\RollingStar\AppData\Roaming\beets
plugin paths:
Sending event: pluginload
inline: adding item field disc_layer
inline: adding album field album_bitrate_kbps
library database: e:\songs\beetslibrary.bib
library directory: e:\songs
Sending event: library_opened
Sending event: import_begin
Import of the directory:
e:\music\mozart
was interrupted. Resume (Y/n)?
Resuming interrupted import of e:\use
Skipping previously-imported path: [omitted]
Sending event: import_task_created
Sending event: import_task_start
Looking up: e:\music\mozart\Volume 6 - Dances & Marches
Tagging Mozart, Wolfgang Amadeus - Tänze · Märsche (CD 1) (Complete Mozart Edition Vol.6)
No album ID found.
Search terms: Mozart, Wolfgang Amadeus - Tänze · Märsche (CD 1) (Complete Mozart Edition Vol.6)
Album might be VA: False
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: import_task_created
Sending event: albuminfo_received
Candidate: Wolfgang Amadeus Mozart - Complete Mozart Edition, Volume 6: Dances / Marches
Success. Distance: 0.25
Sending event: albuminfo_received
Candidate: Judas Priest - The Complete Albums Collection
Success. Distance: 0.79
Sending event: albuminfo_received
Candidate: Wolfgang Amadeus Mozart - Complete Mozart Edition, Volume 1: Early Symphonies
Success. Distance: 0.73
Sending event: albuminfo_received
Candidate: Wolfgang Amadeus Mozart - Complete Mozart Edition
[here is where it's stuck for many minutes, although candidate 1 (correct) and 3 are also slow]

Led to this problem:

beet -vv import e:\music\mozart

Here's a link to the music files that trigger the bug (if relevant): n/a, but this dummy file that matches the box set probably simulates the same effect. dummy - 01 - Mozart, Wolfgang Amadeus - 7 Menuette, KV 61b (65a) - 1. Menuett in G-dur.zip

Setup

  • OS: Win10 x64
  • Python version: 3.6.0
  • beets version: 1.4.3
  • Turning off plugins made problem go away (yes/no): no (can disable my last two plugins, but the log shows that they should be unrelated). I have already disabled most of my plugins and trimmed my config down to the bare essentials.

My configuration (output of beet config) is:

directory: e:\songs\
library: e:\songs\beetslibrary.bib
import:
    copy: no
    move: yes
    write: yes
    timid: no
per_disc_numbering: yes

plugins: inline copyartifactspy3

paths:
    default: $albumartist - $original_year - $album [$album_bitrate_kbps]/%if{$disc_layer,$disc_layer}$track. $title 
album_fields:
    album_bitrate_kbps: |
        #dummied out my custom album bitrate function
        #this dummy function should have almost 0 performance penalty
        return(str(abcdefghi))
item_fields:
    disc_layer: |
        if disctotal > 1:
           return str(disc) + '-'
        else:
           return ''
sampsyo added the needinfo label Feb 20, 2017
@sampsyo
Member

sampsyo commented Feb 20, 2017

Hi! Thanks for the details on this.

Can you tell whether beets is stuck communicating with MusicBrainz to get matches or doing the matching? I've found in the past that the matching algorithm itself can be the culprit in these situations with very large albums. Specifically, it seems from the log like it's getting stuck when it tries to match with a very large candidate album.

This might be testable, for example, by giving beets an MBID to search for and comparing the time to match a few different options in isolation.

@RollingStar
Contributor Author

RollingStar commented Feb 20, 2017

I gave beets an MBID for the correct match (volume 6 of the set rather than the entire set) and the whole operation was quite manageable. Probably under 5 minutes, compared to the 1h45+ run, which I eventually cancelled.

How specifically do you propose I test the performance? I think this is the MBID that beets was hanging on, but I'm not sure. 03ce3f61-9294-4f5d-aace-a4687aad4229

edit: and keep in mind, my point for this issue is that no matter how optimized beets gets, there should be a time limit, because it is a more graceful handling of unpredictable edge cases.

@sampsyo
Member

sampsyo commented Feb 20, 2017

Thanks for looking into it. And also for including the "dummy" file above, which does match the same albums.

Oddly, on my machine (an aging MacBook Pro), this only takes a few seconds per match. It's not exactly snappy, but it's also not in the tens of minutes. It seems like about half the time is being spent in track matching (i.e., on the CPU) and the other half is just downloading data from MusicBrainz.

So, let's try to narrow down what's taking so long on your machine. I just pushed a bunch of extra logging that should delimit when beets is trying to match metadata and when it's communicating with MB. It will be helpful to sort that out regardless of whether we add a timeout or just optimize that chunk of code so it can't take so long. Can you give the most recent source a try in verbose mode and report back about which phase takes the longest?
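
For anyone who wants to measure this locally in the meantime, here is a rough sketch of per-phase timing. It is not the logging that was pushed: `timed_phase`, `timings`, and the phase names are hypothetical, and the `time.sleep` calls stand in for the real work.

```python
import logging
import time
from contextlib import contextmanager

log = logging.getLogger("phase_timing")  # hypothetical logger name
timings = {}  # phase name -> total seconds spent in that phase


@contextmanager
def timed_phase(name):
    """Accumulate and log the wall-clock time spent inside the with-block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        timings[name] = timings.get(name, 0.0) + elapsed
        log.debug("%s took %.3f s", name, elapsed)


# Placeholder workloads for the two phases discussed above.
with timed_phase("musicbrainz fetch"):
    time.sleep(0.01)
with timed_phase("track matching"):
    time.sleep(0.02)

slowest_phase = max(timings, key=timings.get)
```

Comparing the totals in `timings` after a run shows whether the network phase or the matching phase dominates.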

sampsyo added a commit that referenced this issue Feb 20, 2017
This could help debug #2446. It will help, at least, narrow down when the
Munkres algorithm is taking too long.
sampsyo added a commit that referenced this issue Feb 20, 2017
Even more performance-isolating logging to help debug #2446.
@dosoe
Contributor

dosoe commented Mar 22, 2017

Hi! I'm having a similar problem with this release:
https://musicbrainz.org/release/9bcd75dd-995e-482b-8ba7-1ef074d253de
However, with very big releases (like a complete edition of Bach) I don't have problems, since I only have a fraction of the files. Complete releases with ~15 CDs like https://musicbrainz.org/release/b0326242-344d-4a07-a175-c36a1418b644 are handled without problems, even if all files are there.
Also, when some files are duplicated and some are missing, beets tries to put the duplicates in place of the missing files. Maybe there should be a check for whether a file is actually a duplicate.
I would guess that beets is trying all the different arrangements of files to find the one with the best fit. With 700 or 1000 files to combine, that makes a lot of combinations to try. If it's not done already, it would make sense to use a tag like 'tracknumber' to reduce the number of possible combinations.

The timeout is not implemented so far; I have had beets searching for a few hours to match an album.
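
To illustrate the track-number idea above, here is a rough sketch. It is not how beets works internally: `prematch_by_track_number` and the `(disc, track_number, title)` tuple shape are made up for the example. It pairs files and candidate tracks whose disc and track number are unambiguous on both sides, and leaves duplicates and gaps for the full alignment step.

```python
from collections import defaultdict


def prematch_by_track_number(files, tracks):
    """Hypothetical pre-matching pass.

    files, tracks: lists of (disc, track_number, title) tuples.
    Returns (pairs, leftover_files, leftover_tracks), all as list indices.
    """
    def group(entries):
        # Map (disc, track_number) -> indices of entries carrying that key.
        groups = defaultdict(list)
        for index, (disc, number, _title) in enumerate(entries):
            groups[(disc, number)].append(index)
        return groups

    file_groups, track_groups = group(files), group(tracks)
    pairs = []
    for key, file_indices in file_groups.items():
        track_indices = track_groups.get(key, [])
        # Pair only when exactly one file and one track share the key.
        if len(file_indices) == 1 and len(track_indices) == 1:
            pairs.append((file_indices[0], track_indices[0]))

    matched_files = {f for f, _ in pairs}
    matched_tracks = {t for _, t in pairs}
    leftover_files = [i for i in range(len(files)) if i not in matched_files]
    leftover_tracks = [i for i in range(len(tracks)) if i not in matched_tracks]
    return pairs, leftover_files, leftover_tracks


# One duplicated file (two claim disc 1, track 2) and one missing file (track 3).
files = [(1, 1, "Menuett 1"), (1, 2, "Menuett 2"), (1, 2, "Menuett 2 (copy)")]
tracks = [(1, 1, "Menuett 1"), (1, 2, "Menuett 2"), (1, 3, "Menuett 3")]
pairs, leftover_files, leftover_tracks = prematch_by_track_number(files, tracks)
```

Only the leftovers would then need the expensive comparison, so a box set with mostly correct track numbers would shrink to a small problem. This assumes the tags can be trusted, which is not always the case.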

@dosoe
Contributor

dosoe commented Mar 28, 2017

If the problem is the handling and comparison of too many candidates, could there be a way to give the MusicBrainz ID as input to get around it? Or set a timeout and, after it has expired, prompt for an ID.

@sampsyo
Member

sampsyo commented Mar 28, 2017

It's actually the time taken to evaluate a single candidate album, when that candidate itself is very large.

@dosoe
Contributor

dosoe commented Mar 28, 2017

Still, if we already tell beets "this is the right release, now you only have to put the tracks in the right place", could that speed it up, or is that exactly the step that takes the time?

@sampsyo
Member

sampsyo commented Mar 28, 2017

Yes, disabling the track alignment procedure could work.

@RollingStar
Contributor Author

I still think a time limit (perhaps with a user prompt to keep looking) would help retain user-friendliness for unpredictable edge cases. This could be done as the only fix, or in addition to any other fixes.

@dosoe
Contributor

dosoe commented Apr 30, 2017

I have the same problem with a multi-disc release. I ended up importing all the CDs individually, but now beets thinks it has the same release several times, once for every import. Is there a way to merge them, so that there is only one release containing all the CDs?

@sampsyo
Member

sampsyo commented May 1, 2017

The most relevant issue about merging albums is #112.

@boardfish

Currently dealing with this issue for d51f64e5-1d52-4d40-8308-5c9f22c4af03.

@8h2a
Contributor

8h2a commented Jan 24, 2019

I'm having the same(?) issue with these releases:
https://musicbrainz.org/series/aa610319-3b4a-4a26-a127-b5dc78c3b732
Beets just seems to be doing something forever.
Any bigger release (e.g. dozens of discs, or hundreds of tracks) seems to be impossible to import using beets.
Another such example would be this: https://musicbrainz.org/release-group/9d0636f7-e959-4a00-9787-9a23c07f9655

@KW4NP

KW4NP commented Dec 28, 2019

We're still dealing with this issue: an album of 260 tracks took over an hour before beets finally realized that there was no match. It would be very nice to have at least a timeout to keep this from happening needlessly.

@stale

stale bot commented Jul 11, 2020

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jul 11, 2020
@RollingStar
Contributor Author

I still think a time limit (perhaps with a user prompt to keep looking) would help retain user-friendliness for unpredictable edge cases. This could be done as the only fix, or in addition to any other fixes.

The "what is blocking it" line works better on a huge project than this one. There aren't many devs, and even fewer devs know beets in detail.

@stale stale bot removed the stale label Jul 12, 2020
@jtpavlock
Contributor

jtpavlock commented Jul 12, 2020

The purpose of that line, in the case of these issues, is to help identify what is preventing the issue from being labeled as a feature or bug, and to prompt for the info needed to "upgrade" it from needinfo status. So in the case of this issue, perhaps one blocking element is the request from @sampsyo:

So, let's try to narrow down what's taking so long on your machine. I just pushed a bunch of extra logging that should delimit when beets is trying to match metadata and when it's communicating with MB. It will be helpful to sort that out regardless of whether we add a timeout or just optimize that chunk of code so it can't take so long. Can you give the most recent source a try in verbose mode and report back about which phase takes the longest?

@RollingStar were you ever able to get around to this?

@stale

stale bot commented Sep 10, 2020

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Sep 10, 2020
@KW4NP

KW4NP commented Sep 10, 2020 via email

@RollingStar
Contributor Author

Part of why I haven't gone back and tried to reproduce this specific bug with the massive Mozart box set is that it would be plugging only one potential bug. Enforcing a timeout would stop a whole class of odd behavior. Another reason is that I eventually caved and imported the box set as subsets.

@stale stale bot removed the stale label Sep 10, 2020
@sampsyo
Member

sampsyo commented Sep 10, 2020

Would anybody who's affected by this issue be interested in doing the aforementioned digging to figure out the root cause (cf #2446 (comment), for example)?

@dosoe
Contributor

dosoe commented Sep 11, 2020 via email

@stale

stale bot commented Nov 10, 2020

Is this still relevant? If so, what is blocking it? Is there anything you can do to help move it forward?

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Nov 10, 2020
@stale stale bot closed this as completed Nov 17, 2020

7 participants