Beets Work Queue Discussion #1375

Open
ab623 opened this issue Mar 24, 2015 · 16 comments

Comments

@ab623
Contributor

ab623 commented Mar 24, 2015

Work Queue Discussion

I wanted to start a discussion on work queuing.
With current beets functionality, you run the command beet import ~/music and that folder is imported interactively.
Beets can be run in a pseudo non-interactive mode, but at the cost of the autotagger's accuracy, since matches it isn't confident about can't be reviewed.
What I would like is the ability to keep the full functionality of beets, but deferred to a later time, which is why I would like to see task-based functionality implemented.

Proposed Scenario

Here is my proposed scenario:

  1. Run beet import with a switch to run in a non-interactive mode
  2. Beets starts the import, and either stays open or daemonises itself for the duration of the import process.
  3. I can then run beet tasks, or something to that effect.
  4. Based on configuration, this would list all the tasks that require human intervention.
  5. From here I could go through each task (either an album or a track), see the options available to me, and then apply, skip, keep, etc., as I would when running interactively.

Maybe this task functionality could also be used to hold other tasks or actions, such as:

  • Verify correct cover image before applying
  • Verify chroma matches
  • Update library
  • …and anything else you can think of

Applications

The applications for this are the following:

  • Import a HUGE library and not have to worry about sitting down for ages
  • Import and worry about the tasks at a later date
  • Work on 4 or 5 tasks at a time, and leave the rest when you are free
  • Because tasks are processed in the background, when an album's confidence is low, beets can fetch similar albums as well. When the task is revisited, the user can pick from multiple candidates, and the choice is applied immediately.
  • A GUI web application could be built that picks up the tasks and displays their content, which would help people who aren't comfortable with the command line. It would also make it easy to show album art.

Thoughts? Discussions? Implementation ideas?

@guibog
Contributor

guibog commented Mar 25, 2015

Hi, you could try the Unison sync tool; it has something similar to what you propose.

@ab623
Contributor Author

ab623 commented Mar 25, 2015

@guibog I'm not really sure how the Unison sync tool fulfills the use case I outlined in my original post. Can you explain further?

@sampsyo
Member

sampsyo commented Mar 27, 2015

Thanks for bringing this up! This is a great idea to begin discussing.

There are some notes on the same idea on the wiki: https://github.com/sampsyo/beets/wiki/Refactoring
See the bullet for "Asynchronous import decisions".

This would be a huge amount of work, but I'm especially excited for the alternate UI possibilities this might open up.

@ab623
Contributor Author

ab623 commented Mar 30, 2015

I had a look at Asynchronous import decisions and it is pretty much what I described, so I'm glad other people have been thinking about this too.

I think a simple method will be by far the best: a SQLite table into which we insert each task that requires completion, along with its status and the outcome of actions.

Then a separate instance of beets could run in daemon mode, pick up any pending tasks, work on them, and update the SQLite table as required. This daemon can handle priority, etc.

Then, whenever you run beet, you can specify whether the task is done immediately or put into a queue, e.g. beet import -q /path/to/file. The default could be set in the config file.

Then beet tasks can simply format and display the details from the table.

What we need to figure out first is the best solution we can implement that extends not only to the autotagger but that other plugins can also hook into. I know there are other engines, such as Celery, which we could use, but is that the best idea? It could be, as we don't really want to reinvent the wheel.
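The SQLite-table approach above can be sketched with nothing but Python's standard sqlite3 module. This is a minimal sketch under stated assumptions: the table name tasks, its columns, the extra 'running' status (so a polling daemon never picks up the same task twice) and the helpers enqueue, claim_next and list_tasks are all hypothetical; none of them exist in beets.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical queue schema: one row per queued beet command.
SCHEMA = """
CREATE TABLE IF NOT EXISTS tasks (
    id      INTEGER PRIMARY KEY AUTOINCREMENT,
    command TEXT NOT NULL,                    -- e.g. 'import /path/to/file'
    added   TEXT NOT NULL,                    -- ISO-8601 UTC timestamp
    status  TEXT NOT NULL DEFAULT 'pending',  -- pending / running / completed ...
    output  TEXT
)
"""


def open_queue(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def enqueue(conn, command):
    """Add a command to the queue (what 'beet import -q ...' might do)."""
    with conn:  # commit on success, roll back on error
        cur = conn.execute(
            "INSERT INTO tasks (command, added) VALUES (?, ?)",
            (command, datetime.now(timezone.utc).isoformat()),
        )
    return cur.lastrowid


def claim_next(conn):
    """Claim the oldest pending task; return (id, command), or None if idle."""
    while True:
        row = conn.execute(
            "SELECT id, command FROM tasks WHERE status = 'pending' "
            "ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        with conn:
            cur = conn.execute(
                "UPDATE tasks SET status = 'running' "
                "WHERE id = ? AND status = 'pending'",
                (row[0],),
            )
        # rowcount is 0 if another daemon claimed this row first; look again.
        if cur.rowcount == 1:
            return row


def list_tasks(conn):
    """Rows a hypothetical 'beet tasks' command could format and display."""
    return conn.execute(
        "SELECT id, status, command FROM tasks ORDER BY id"
    ).fetchall()
```

The conditional UPDATE (WHERE ... AND status = 'pending') is what keeps two daemons from running the same task: only one of them sees rowcount == 1.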

@sampsyo
Member

sampsyo commented Mar 31, 2015

What about re-using the existing database structure? That is, we'd just have a "pending" flag on items and albums indicating that they've been tracked but not fully imported yet. Or, possibly, a more general "status" field indicating which tasks have been run on them so far.
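For comparison, a sketch of the "pending flag" alternative: a status column on a drastically simplified items table (not beets' real schema), with a listing query that skips pending rows so they never show up in a beet list style view. The column name, its values and the helper names are assumptions for illustration.

```python
import sqlite3


def make_library():
    """In-memory stand-in for the library with a hypothetical status column."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items ("
        " id INTEGER PRIMARY KEY,"
        " title TEXT,"
        " status TEXT NOT NULL DEFAULT 'imported')"  # or 'pending'
    )
    return conn


def visible_items(conn):
    """Titles a user-facing listing would show: pending items stay hidden."""
    return [
        title
        for (title,) in conn.execute(
            "SELECT title FROM items WHERE status != 'pending' ORDER BY id"
        )
    ]


def finish_import(conn, item_id):
    """Once the user has made a decision, the item becomes a normal item."""
    with conn:
        conn.execute(
            "UPDATE items SET status = 'imported' WHERE id = ?", (item_id,)
        )
```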

@guibog
Contributor

guibog commented Mar 31, 2015

A general status makes sense, but sometimes we need more than one. For instance, I have my big library imported, but because I am obsessive about it, I haven't yet dared to write tags to files and move them. If I ever do that, ideally I would like to do it step by step and be able to know which file was written/moved when. Another example: MusicBrainz data is updated from time to time, so I would like a "last_checked_on_mb" status. For these kinds of things, I have a taste for event tables: "id | entity_id | event_name | timestamp". This is a bit like a log, and allows for much flexibility at the cost of a bit more complexity in SQL queries.

Examples of events that could apply:

  • played
  • detected_missing (the file is not there)
  • moved
  • metadata_written_to_file
  • metadata_checked_on_mb
  • cover_added
  • ...
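A sketch of the event table described above (id | entity_id | event_name | timestamp), with the kind of slightly more complex query it needs: the last time an event happened to an entity, and which entities are due for another MusicBrainz check. The table and helper names are hypothetical, and timestamps are assumed to be ISO-8601 strings so that text comparison matches chronological order.

```python
import sqlite3


def make_events():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events ("
        " id INTEGER PRIMARY KEY,"
        " entity_id INTEGER NOT NULL,"
        " event_name TEXT NOT NULL,"
        " timestamp TEXT NOT NULL)"  # ISO-8601 text
    )
    return conn


def log_event(conn, entity_id, event_name, timestamp):
    with conn:
        conn.execute(
            "INSERT INTO events (entity_id, event_name, timestamp) "
            "VALUES (?, ?, ?)",
            (entity_id, event_name, timestamp),
        )


def last_event(conn, entity_id, event_name):
    """Most recent timestamp of event_name for entity_id, or None if never."""
    (latest,) = conn.execute(
        "SELECT MAX(timestamp) FROM events "
        "WHERE entity_id = ? AND event_name = ?",
        (entity_id, event_name),
    ).fetchone()
    return latest


def stale_entities(conn, event_name, before):
    """Entities whose latest event_name happened before `before`.

    Entities that never had the event are not listed; a real query would
    also join against the items table to catch those.
    """
    return [
        entity_id
        for (entity_id,) in conn.execute(
            "SELECT entity_id FROM events WHERE event_name = ? "
            "GROUP BY entity_id HAVING MAX(timestamp) < ? ORDER BY entity_id",
            (event_name, before),
        )
    ]
```

A weekly job could call stale_entities(conn, "metadata_checked_on_mb", cutoff) and re-check only those entities, logging a new event for each.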

@ab623
Contributor Author

ab623 commented Mar 31, 2015

@sampsyo - I wouldn't want to use the existing database structure, as it wasn't designed for this type of feature, which means it couldn't easily be extended with additional functionality. If the autotagger runs in the background, it has more time to request and store more information. So if its confidence was low, it could automatically request and store additional candidates, or plugins such as lyrics could hook into it to automatically download and store lyrics. I wouldn't want to muddy the existing table with this information.

@guibog - I think a log would be a good addition to beets, but it should be a beets core function: beets should log what it does, so we can see which changes were applied at what time. This can then be used to roll back changes. But this should be raised as a separate feature.

What we need is a table which stores:

  • Beet command to be run - beet import /path/, beet ftintitle
  • Date/time of addition
  • Status - Pending, Cancelled, Superseded, Completed
  • Output

This is the basic information that I think could be supplied to a beets daemon. So when I run a beets import, I can specify that it runs in the background. This adds it to the task list, and the daemon then begins processing it. Once processed, its status changes to completed and an output is stored; this may be a decision that the user must make at a later point in time.

I can then fire up the beets frontend and run a command which lists all tasks and any actions I need to take on them. This way, in your scenario @guibog, I can create a cron job which queries my library each week for updates to MusicBrainz data and runs in the background, keeping my data in sync. A status of superseded could also be used, allowing a new import to invalidate a previous one.
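One possible reading of the Pending / Cancelled / Superseded / Completed statuses, sketched in Python with sqlite3: submitting a command that is already waiting in the queue marks the older copy as superseded. This interpretation, the table layout and the helper names are assumptions; here only still-pending tasks are superseded, and whether completed imports should be superseded too is left open.

```python
import sqlite3
from enum import Enum


class Status(Enum):
    PENDING = "pending"
    CANCELLED = "cancelled"
    SUPERSEDED = "superseded"
    COMPLETED = "completed"


def make_tasks():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tasks (id INTEGER PRIMARY KEY, command TEXT NOT NULL,"
        " status TEXT NOT NULL, output TEXT)"
    )
    return conn


def submit(conn, command):
    """Queue command; an older pending copy of the same command is superseded."""
    with conn:  # both statements commit together
        conn.execute(
            "UPDATE tasks SET status = ? WHERE command = ? AND status = ?",
            (Status.SUPERSEDED.value, command, Status.PENDING.value),
        )
        cur = conn.execute(
            "INSERT INTO tasks (command, status) VALUES (?, ?)",
            (command, Status.PENDING.value),
        )
    return cur.lastrowid


def complete(conn, task_id, output):
    """Record the daemon's output, e.g. a decision left for the user."""
    with conn:
        conn.execute(
            "UPDATE tasks SET status = ?, output = ? WHERE id = ?",
            (Status.COMPLETED.value, output, task_id),
        )


def statuses(conn):
    return [s for (s,) in conn.execute("SELECT status FROM tasks ORDER BY id")]
```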

@sampsyo
Member

sampsyo commented Mar 31, 2015

Thanks for the comments. I'm still not quite sure why this functionality would "muddy the existing table"; we can of course hide the new functionality from other user-facing interactions (i.e., pending stuff would not show up in beet list). I'm willing to buy this with a little more explanation, of course. Maybe doing a more detailed design of both alternatives (on a wiki page, for example) would help clarify?

@ab623
Contributor Author

ab623 commented Mar 31, 2015

@sampsyo - Sounds like a good idea. Are you able to provide an export of the main tables beets uses, along with their columns and a typical subset of data (maybe 100 rows)? Maybe paste it in a gist (comma/tab separated). That would help with the design and visualisation.

@sampsyo
Member

sampsyo commented Mar 31, 2015

You can get this from your own library using the SQLite command-line program:

$ sqlite3 ~/.config/beets/library.db
SQLite version 3.8.5 2014-08-15 22:37:57
Enter ".help" for usage hints.
sqlite> .schema
[...]
sqlite> select * from items limit 100;
[...]
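If a gist-ready comma- or tab-separated dump is easier to share than the interactive session above, Python's standard sqlite3 and csv modules can produce one (the sqlite3 shell's own .headers on and .mode csv settings are another option). dump_table is a hypothetical helper written for this sketch.

```python
import csv
import io
import sqlite3


def dump_table(db_path, table, limit=100, delimiter=","):
    """Return the first `limit` rows of `table` as CSV text with a header row."""
    conn = sqlite3.connect(db_path)
    try:
        # Table names cannot be bound as parameters, so quote the identifier.
        quoted = '"' + table.replace('"', '""') + '"'
        cur = conn.execute(f"SELECT * FROM {quoted} LIMIT ?", (limit,))
        buf = io.StringIO()
        writer = csv.writer(buf, delimiter=delimiter)
        writer.writerow([col[0] for col in cur.description])  # column names
        writer.writerows(cur)
        return buf.getvalue()
    finally:
        conn.close()
```

For example, dump_table(os.path.expanduser("~/.config/beets/library.db"), "items", delimiter="\t") would return the first 100 items as tab-separated text, assuming the library lives at the path used in the session above.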

@ab623
Copy link
Contributor Author

ab623 commented Apr 1, 2015

@sampsyo - Thanks. My library isn't as fully featured with plugins etc. at the moment, but I will dump mine anyway. I can already envision some issues with using the existing table, but I will get them noted down.

@hrehfeld

hrehfeld commented Sep 4, 2015

from #1538 :

It would be awesome if we could run

 $ beet import --quiet

which then writes a file similar to a temporary commit message file. It would contain the interactive prompt:

# lines starting with # will be ignored
# (32 items)
# Tagging:
#     Dead Brothers - Dead Music for Dead People
# URL:
#     http://musicbrainz.org/release/c775bc5b-26d9-4da6-8b3b-b464040d3147
# (Similarity: 83.0%) (unmatched tracks) (CD, 2000, CH, Voodoo Rhythm Records)
# Unmatched tracks (16):
#  ! Dead Brothers Stomp (#1) (3:02)
#  ! I've Always Known (#2) (2:04)
#  ! Farmer Boy (#3) (2:42)
#  ! Besame Mucho (#4) (2:14)
#  ! Roger (#5) (0:54)
#  ! She Collects Postcards (#6) (5:03)
#  ! Banjo Villa Against Tarass Boulba (#7) (0:51)
#  ! Crying (#8) (4:38)
#  ! Hora (#9) (2:46)
#  ! Allons Aux Paquis! (#10) (2:11)
#  ! Somewhere Between Dog & Wolf (#11) (2:58)
#  ! Buy It! (#12) (1:11)
#  ! Good Time Religion (#13) (5:04)
#  ! Orally (#14) (1:01)
#  ! Ramblin' Man (#15) (3:57)
#  ! [untitled] (#16) (1:08)
# (A)pply, [S]kip, (U)se as-is, as (T)racks, (G)roup albums, Requery (i)nteractively
# Enter search, enter Id, aBort? 
/music/beets_/ Dead Music for Dead People - Voodoo Rhythm Records - c775bc5b
<optional temporary uuid here>

I would then add a line with the character for whatever choice I decide on:

# Enter search, enter Id, aBort? 
/music/beets_/ Dead Music for Dead People - Voodoo Rhythm Records - c775bc5b
<optional temporary uuid here>
a

and the next item would follow.

Then I could call

$ beet import --file

and beets would apply my choices.

This would help with the dreadful importing stage. (I have been importing my library since... a few days ago, and it's still going on.)
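A sketch of how the proposed decision file could be read back before applying choices with the proposed beet import --file. Neither the file format nor that option exists in beets; both come from the proposal above. The parser assumes each item is introduced by # comment lines, followed by the item's identifying line, an optional temporary UUID line, and an optional single-letter choice; an item without a choice stays undecided. The letter mapping covers only the one-key options in the prompt and is itself an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical mapping of the one-key prompt options to actions.
CHOICES = {"a": "apply", "s": "skip", "u": "asis",
           "t": "tracks", "g": "group", "b": "abort"}


@dataclass
class Decision:
    target: str             # the identifying line, e.g. the album path
    uuid: Optional[str]     # optional temporary id written by beets
    choice: Optional[str]   # None means "undecided, ask again later"


def parse_decisions(text: str) -> List[Decision]:
    # Group non-comment lines into items; a comment after content starts a new item.
    blocks: List[List[str]] = []
    current: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#"):
            if current:
                blocks.append(current)
                current = []
            continue
        if line:
            current.append(line)
    if current:
        blocks.append(current)

    decisions = []
    for lines in blocks:
        target, rest = lines[0], lines[1:]
        choice = None
        if rest and rest[-1].lower() in CHOICES:
            choice = CHOICES[rest.pop().lower()]
        uuid = rest[0] if rest else None
        decisions.append(Decision(target, uuid, choice))
    return decisions
```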

@sampsyo sampsyo mentioned this issue Mar 31, 2016
@ab623
Contributor Author

ab623 commented Apr 1, 2016

Based on previous discussions, should we look into a simple Python package which we can use as a task system? Something that will underpin the entire concept is the ability to add tasks to a queue, process them, and get back results.

Ideally, we need the following, in my opinion:

  • Ability to store items in a queue
  • Ability to persist the queue if the server goes down, so it can resume processing
  • Ability to store job responses and exceptions

Nice to haves would be:

  • Simple workflow system
  • Multiple queues for high / low priority jobs

Many task-based queues use a backend broker; a popular one is Redis. I'm not sure we want to add another dependency to beets, but this task-based system could be a config option, allowing people to use it without a backend broker.

Any suggestions?

@sampsyo
Member

sampsyo commented Apr 1, 2016

This does sound like approximately the right list of requirements. For beets, though, I'd argue that something much simpler than Redis would be the right way to go: just storing "tasks" as records in a SQLite database should work great. In particular, I'm nervous about any solution that involves a separate process just to store and distribute tasks. That's good for a server setting, where lots of daemons run constantly anyway, but less good for an interactive, user-facing application.

What do you think of a simple database-backed queue?

@ab623
Contributor Author

ab623 commented Apr 1, 2016

I understand your hesitance about an external system to manage queues, and I agree.

I've been looking into a database-backed system, but I haven't had much luck. It may be that we need to implement our own.

Maybe a new table in the db, or a new database entirely (I don't like the library db being filled with temp data and constant reads and writes). It could store the beets command intended to run, priority, output, success, etc. Then beets, running in the background in server mode, could poll it every x seconds for the next task.

For the number of tasks we will run (minimal) and the features required, it's perfectly plausible to roll our own.
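A sketch of the roll-our-own, database-backed worker discussed above, covering the requirements listed earlier: jobs persist in their own SQLite file (kept separate from library.db), run highest priority first, and record either a result or the exception traceback. Jobs left 'running' by a crashed worker are requeued on the next start. The schema, the 'failed' status and the mapping from command name to a Python handler function are hypothetical.

```python
import sqlite3
import traceback
from typing import Callable, Dict

SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    id       INTEGER PRIMARY KEY,
    command  TEXT NOT NULL,
    priority INTEGER NOT NULL DEFAULT 0,   -- higher runs first
    status   TEXT NOT NULL DEFAULT 'pending',
    result   TEXT,
    error    TEXT
)
"""


def open_jobs(path):
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    # Jobs a crashed worker left half-done go back into the queue.
    with conn:
        conn.execute("UPDATE jobs SET status = 'pending' WHERE status = 'running'")
    return conn


def add_job(conn, command, priority=0):
    with conn:
        return conn.execute(
            "INSERT INTO jobs (command, priority) VALUES (?, ?)",
            (command, priority),
        ).lastrowid


def run_pending(conn, handlers: Dict[str, Callable[[str], str]]) -> int:
    """Run every pending job, highest priority first; return how many ran."""
    ran = 0
    while True:
        row = conn.execute(
            "SELECT id, command FROM jobs WHERE status = 'pending' "
            "ORDER BY priority DESC, id LIMIT 1"
        ).fetchone()
        if row is None:
            return ran
        job_id, command = row
        with conn:
            conn.execute("UPDATE jobs SET status = 'running' WHERE id = ?", (job_id,))
        name, _, arg = command.partition(" ")
        try:
            result = handlers[name](arg)  # unknown commands raise KeyError
        except Exception:
            with conn:
                conn.execute(
                    "UPDATE jobs SET status = 'failed', error = ? WHERE id = ?",
                    (traceback.format_exc(), job_id),
                )
        else:
            with conn:
                conn.execute(
                    "UPDATE jobs SET status = 'completed', result = ? WHERE id = ?",
                    (result, job_id),
                )
        ran += 1
```

A background daemon could call run_pending in a loop and sleep for a few seconds whenever it returns 0, which matches the poll-every-x-seconds idea without needing a separate broker process.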

@gryphonmyers

gryphonmyers commented Mar 30, 2021

No activity on this for some time, but this feature would unlock just about the most ideal import process I can imagine for music library software. Being able to navigate tagging decisions as interactions with a Telegram bot on my own time, as new music comes in via automated processes... 😍

5 participants