Add fallback for item access to album's attributes #2988

FichteFoll · 2018-07-20T00:53:52Z

Allows queries (especially for pathspecs) based on an album's flexattrs
while operating on items.

I'll leave formulating the changelog up to you.

sampsyo

Thank you for diving into this! This is a tricky issue, despite being a relatively small diff, and I appreciate your patience while we think through it.

I've left one minor comment inline. I also have a few general discussion points, from minor to major:

We have an existing mechanism for formatting album-level data for items. This would need to be removed if we move the "merging" behavior lower in the abstraction hierarchy.
You asked about caching issues. In our current system, Item and Album objects are not "live," in that changing them through one reference does not magically change their values in another context. They represent a sort of snapshot of the database, which is why they have a load method for updating values from the database. In that sense, caching the associated album might be even more reasonable than reloading the values every time.
This kind of change would completely hide item-level data, and eliminate the possibility for items to differ from each other on album-level fields. This is probably OK, but we have to consider the consequences. For example, in beets currently, it's possible to "work around" album-level fields you don't like and make some values different on a per-track basis. For example, if you decide you want country to vary across the tracks in a certain album, you can accomplish that if you really want to. With this change, however, those differences would be silently hidden. Again, this is probably OK, but we'd need to be clear about what's changing.

sampsyo · 2018-07-22T16:46:13Z

beets/dbcore/db.py

-            return self[key]
-        else:
-            return default
-


This is really just a matter of aesthetics, but one alternative would be to have get just do the necessary exception handling. That is, __getitem__ would continue to contain all the "real" logic and include the raise at the end. Then, get would change to contain:

try:
    return self[key]
except KeyError:
    return default

That would help keep the docstring for get sensible, for example.

It might have been a bit of premature optimization, but afaik exception handling in Python isn't exactly performant (creation of exception object, traceback, caching of frames etc.) and combining the code paths was pretty trivial. I also liked that I could use this to have a short implementation in Item.get.

The docstring issue doesn't bother me personally because it properly describes what get can do. It does have a different function signature than dict.get, which we need for the with_album parameter anyway, but it's still compatible. I think I prefer Optionally over Alternatively, however.

I would have made those parameters keyword-only, but that is only possible in Python 3.
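To make the two points above concrete, here is a minimal sketch of a `get` that only does the exception handling and takes `with_album` as a keyword-only parameter (Python 3 syntax). `ToyModel` is a hypothetical stand-in, not the actual class in beets/dbcore/db.py, and `with_album` is accepted but unused here.

```python
class ToyModel:
    """Hypothetical stand-in for a dbcore model; not beets' real class."""

    def __init__(self, values):
        self._values = dict(values)

    def __getitem__(self, key):
        # All "real" lookup logic lives here and ends with a raise.
        if key in self._values:
            return self._values[key]
        raise KeyError(key)

    def get(self, key, default=None, *, with_album=True):
        # `with_album` is keyword-only; it is unused in this toy model.
        try:
            return self[key]
        except KeyError:
            return default


m = ToyModel({"title": "Lark"})
print(m.get("title"))
print(m.get("missing", "n/a"))
```

Because of the bare `*`, a call such as `m.get("title", None, False)` raises `TypeError`, so the extra parameter cannot be passed positionally by accident while plain `dict.get`-style calls keep working.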

FichteFoll · 2018-07-22T18:14:09Z

Not sure what you mean by this. Formatting works exactly like it did before. However, I missed that self.model_keys is fetched from model.keys(True), which includes the album fallback (and results in the item's formatter for a field being requested rather than the album's). I'll override self.model_keys in FormattedItemMapping's __init__.
I see. So, I could keep an Item-internal reference to an album and just run load on that before trying to access album attributes (or keys()). Is load smart enough not to update itself when there haven't been any changes to the database since the last fetch? If it doesn't, then this change would be more involved and requires some care to actually make the caching useful. (I was thinking about storing the same album reference in multiple items for example, but I'd need to call load for each item individually anyway because we can't be sure the database hasn't been updated in the meantime.) Otherwise, just keeping a lazy-loaded album attribute per item individually would probably already be an improvement.
I'm not sure I understand the problem you're describing because, as far as I'm aware, all of this is still possible. Attribute access on the item itself is prioritized over the album fallback, for their standard fields and even for flexattrs. _setitem was not modified, so you can also still set an item's field or flexattr to override an album's.

sampsyo · 2018-07-22T19:16:42Z

Not sure what you mean by this. Formatting works exactly like it did before. However, I missed that self.model_keys is fetched from model.keys(True), which includes the album fallback (and results in the item's formatter for a field being requested rather than the album's). I'll override self.model_keys in FormattedItemMapping's __init__.

Thanks for catching that!

What I'm worried about is not a direct conflict or anything—just that we're implementing the same logic ("fallback" between item and album attributes) twice. If evaluating the expression item.field already looks up field in item's album, then ideally we would not need FormattedItemMapping—the plain old FormattedMapping from dbcore would do the trick.

But as you discovered, there's subtlety about which formatter gets used. Maybe there's an elegant way to provide a merged view without the duplication, but maybe this division of responsibilities is OK.

I see. So, I could keep an Item-internal reference to an album and just run load on that before trying to access album attributes (or keys()). Is load smart enough not to update itself when there haven't been any changes to the database since the last fetch? If it doesn't, then this change would be more involved and requires some care to actually make the caching useful. (I was thinking about storing the same album reference in multiple items for example, but I'd need to call load for each item individually anyway because we can't be sure the database hasn't been updated in the meantime.) Otherwise, just keeping a lazy-loaded album attribute per item individually would probably already be an improvement.

No, load always loads the latest data. (Otherwise, we'd need some mechanism on the side for tracking when the database has changed—which likely would be no faster to check than just loading from the database.)

I'm not sure I understand the problem you're describing because, as far as I'm aware, all of this is still possible. Attribute access on the item itself is prioritized over the album fallback, for their standard fields and even for flexattrs. _setitem was not modified, so you can also still set an item's field or flexattr to override an album's.

OK, good point! I had missed that existing values on items take precedence. That means, unless I'm mistaken, that item-level fixed attributes always take precedence—because it's impossible to remove them. Sounds good!
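The precedence rule agreed on here can be sketched as follows. `ToyItem` and `ToyAlbum` are hypothetical classes for illustration only; they assume a plain dict of values per object and are not how beets stores fields.

```python
class ToyAlbum:
    """Hypothetical album holding album-level values."""

    def __init__(self, values):
        self.values = dict(values)


class ToyItem:
    """Hypothetical item that falls back to its album on missing keys."""

    def __init__(self, values, album=None):
        self.values = dict(values)
        self.album = album

    def __getitem__(self, key):
        # The item's own values always win.
        if key in self.values:
            return self.values[key]
        # Only then consult the album, if there is one.
        if self.album is not None and key in self.album.values:
            return self.album.values[key]
        raise KeyError(key)

    def __setitem__(self, key, value):
        # Setting never touches the album, so an item can override it.
        self.values[key] = value


album = ToyAlbum({"country": "DE", "franchise": "Twin Peaks"})
item = ToyItem({"title": "Lark", "country": "JP"}, album)
print(item["country"], item["franchise"])
```

In this sketch the item's own `country` shadows the album's, while `franchise` is only found through the fallback; assigning `item["franchise"]` afterwards creates a per-track override and leaves the album untouched.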

FichteFoll · 2018-07-22T21:25:19Z

just that we're implementing the same logic ("fallback" between item and album attributes) twice

Yes, I noticed that too, but I believe we still need the formatter for the reasons you mentioned (and also because it performs other tasks such as alias mapping).

So, I tried implementing a lazy-loaded and cached album property for internal use, next to get_album, but I quickly realized that this doesn't really improve things a lot and it certainly hurts readability in a way that it makes code complex. You'd think you can turn a couple album = self.get_album(); if album: … into just if self.album:, but the property getter still needs to call load on every access. In the end, it comes down to a load call vs a get_album call, and with the former you'd end up having to juggle an internal-only album object for yet another way to access an item's album.

sampsyo · 2018-07-23T02:23:58Z

So, I tried implementing a lazy-loaded and cached album property for internal use, next to get_album, but I quickly realized that this doesn't really improve things a lot and it certainly hurts readability in a way that it makes code complex. You'd think you can turn a couple album = self.get_album(); if album: … into just if self.album:, but the property getter still needs to call load on every access. In the end, it comes down to a load call vs a get_album call, and with the former you'd end up having to juggle an internal-only album object for yet another way to access an item's album.

I see—thanks for giving it a try, and I can see how that would be less than ideal.

Anyway, this is shaping up well! It might be a good idea to run a few simple performance tests to make sure we aren't doing something terrible to the time required to run beet list, for example.

FichteFoll · 2018-07-26T13:18:25Z

~ λ hyperfine "beet list" -m 2
Benchmark #1: beet list
  Time (mean ± σ):      7.207 s ±  0.018 s    [User: 5.438 s, System: 0.759 s]
  Range (min … max):    7.194 s …  7.220 s

~/code/beets ∃ hyperfine "python -m beets list" -m 2
Benchmark #1: python -m beets list
  Time (mean ± σ):     17.757 s ±  0.093 s    [User: 12.581 s, System: 2.135 s]
  Range (min … max):   17.691 s … 17.823 s

Well, not looking so bright. It's a >100% slowdown. This'd need some smart caching, probably. I do wonder why the difference is so high, though. I mean, the ItemFormatter needed to access the item's album before as well. Maybe keys is run more often than I expected?

Also, I should probably add some documentation about this change.

Benchmark tool: https://github.com/sharkdp/hyperfine

sampsyo · 2018-07-26T13:32:44Z

Hmm, that is a little worrisome. Let's dig a little deeper and see if we can't mitigate some of the effects. (Thanks for the tip about hyperfine, btw!)

FichteFoll · 2018-07-26T19:07:58Z

(I just found out about hyperfine today as I browsed the fd Readme by the same author.)

This is probably the point where I would start to look into profiling as I'm still not too familiar with the code base and believe this would provide a good starting point. Have you ever done this in python and have some recommendations for tools or other tips? (I haven't.)

sampsyo · 2018-07-26T20:47:06Z

I think that cProfile, in the standard library, is still probably the best profiler out there. One tip I do have, however, is that SnakeViz is a really nice browser-based GUI for viewing/navigating profile data.
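For anyone following along, a minimal cProfile session from within Python could look like the sketch below. `slow_lookup` is a made-up workload, not beets code; only the `cProfile`, `pstats`, and `io` calls are from the standard library.

```python
import cProfile
import io
import pstats


def slow_lookup(n):
    """Hypothetical workload standing in for a slow command."""
    return sum(i * i for i in range(n))


profiler = cProfile.Profile()
profiler.enable()
result = slow_lookup(100_000)
profiler.disable()

# Collect the report in a string and show the five most expensive
# entries, ordered by cumulative time.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

To get a file that SnakeViz can open, `profiler.dump_stats("out.prof")` writes the same data to disk (the file name is just an example).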

FichteFoll · 2018-09-14T01:07:57Z

I took a look back at this. I used py-spy for some quick, effortless profiling, and it was quite obvious that the majority of the time is being spent on database access in get_album (or rather the album property, as I changed it). Just commenting out the album.load() code removes the entire performance impact, but it also means the albums we're trying to print could be outdated.

I considered the simplest solution forward to be what I suggested earlier:

Make the item cache its album field and provide access through a property. The album returned by this property is read only since it is, well, cached. I decided against preventive measures here and instead made the property "hidden" with an underscore and provided documentation.
Only load database model objects when they have changed, by tracking a revision number that I added to the database and increment on each mutating transaction. I had to tweak this for a little while until it passed all tests, but I suppose this is fairly safe going forward now. I added a comment clarifying the possibility of race conditions, but as long as the _db_lock is acquired, we are fine.

Let me know what you think.

Also, I wasn't sure if I should add a section regarding the API to the changelog. It would mention the fallback of item access on Item and that re-loading is now lazy, although the latter should be transparent.
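As a rough illustration of the revision idea described above, here is a minimal sketch with an in-memory stand-in for the database. `ToyDatabase`, `ToyModel`, and the `reads` counter are hypothetical and exist only for this example; locking is left out entirely.

```python
class ToyDatabase:
    """Hypothetical in-memory store with a revision counter."""

    def __init__(self):
        self.rows = {}
        self.revision = 0
        self.reads = 0  # counts simulated database round trips

    def write(self, row_id, values):
        self.rows[row_id] = dict(values)
        self.revision += 1  # every mutating transaction bumps the revision

    def read(self, row_id):
        self.reads += 1
        return dict(self.rows[row_id])


class ToyModel:
    """Hypothetical model that only reloads when the revision moved."""

    def __init__(self, db, row_id):
        self.db = db
        self.row_id = row_id
        self.values = {}
        self._revision = None
        self.load()

    def load(self):
        # Skip the round trip if nothing changed since the last load.
        if self._revision == self.db.revision:
            return
        self.values = self.db.read(self.row_id)
        self._revision = self.db.revision


db = ToyDatabase()
db.write(1, {"album": "Twin Peaks"})
model = ToyModel(db, 1)
model.load()
model.load()
reads_before_write = db.reads
db.write(1, {"album": "Twin Peaks (Remaster)"})
model.load()
print(reads_before_write, db.reads, model.values["album"])
```

The repeated `load` calls cost nothing until a write happens, which is the effect the benchmarks below are measuring.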

FichteFoll · 2018-09-14T01:23:33Z

Benchmarks: first is with this PR, second is 1.4.7.

~/code/beets ¬ hyperfine "python -m beets list" -m 20 && hyperfine "beet list" -m 20
Benchmark #1: python -m beets list

  Time (mean ± σ):      6.188 s ±  0.216 s    [User: 4.582 s, System: 0.761 s]

  Range (min … max):    6.005 s …  6.948 s

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark #1: beet list

  Time (mean ± σ):      6.036 s ±  0.044 s    [User: 4.455 s, System: 0.749 s]

  Range (min … max):    5.956 s …  6.115 s

Utilize album fields for special formatting of doujin releases. Requires a currently unmerged PR to beets. beetbox/beets#2988

sampsyo · 2018-09-16T19:24:28Z

Wow! This is extremely cool! Very nice work!

You’re right that the “revision” trick is fragile, since it requires us to intervene on all updates of the database and only avoids races because we implement our own internal global lock, but this seems like a great trade-off for the performance win it affords. This seems worth doing independent of the new query behavior you’ve introduced here.

As an aside, we’re already doing something similar for memoizing %aunique strings, which are otherwise very expensive to recompute:
https://github.com/beetbox/beets/blob/b6bf82933ed3e8233e200079c92e2577a5ad5040/beets/library.py

Perhaps we should move this mechanism to reuse the revision mechanism (to detect when it’s time to invalidate the memoization table).

I do like the idea of adding a note about the API change “for developers” in the changelog. The new fallback behavior is worth documenting, even if _cached_album itself isn’t.

FichteFoll · 2018-09-17T12:39:40Z

I see. Yes, that would probably be useful for the aunique feature and might even warrant a proper implementation (i.e. a "public API"), but I'd rather not do this in here.

Regarding the changelog, I can do that. I deliberately made _cached_album an internal property (with the underscore) because I wasn't confident in exposing it. It is, after all, kind of a workaround, although it's the best I could think of. But as long as it's private, it can be changed.

By the way, as you can see from the commit that references this PR, I started using this in production and haven't encountered any issues so far. I probably could have done that before as well, as I wasn't too concerned about the performance, but for this PR it was a must.

sampsyo · 2018-09-17T14:28:33Z

OK, great. Since it’s a sensitive change, it might be wise to put out a call for testing so folks can try it out with funky configurations. I’ll post something to Discourse.

sampsyo · 2018-09-17T20:08:44Z

Post added: https://discourse.beets.io/t/call-for-testers-better-queries-for-album-level-fields-a-performance-improvement/477

FichteFoll · 2018-09-17T23:23:01Z

Before you check: The CI failure is unrelated to the PR and caused by some error with curl when trying to download the Python 3.4 image for flake8 checking.

sampsyo · 2018-09-18T11:52:52Z

(Thanks. I restarted that Travis job and everything’s fine.)

FichteFoll · 2018-10-30T12:39:15Z

Any updates on this? Doesn't seem like the discourse thread attracted much attention.

FichteFoll · 2018-11-16T17:46:17Z

Rebased to fix the merge conflict on the changelog. AppVeyor has some errors in the setup phase with chocolatey.

FichteFoll · 2019-01-07T16:33:37Z

Still nobody using this, it seems. 😞

Let me know when you intend to merge this, so I only need to fix the changelog conflict once (or you do it 🤷‍♂️ ).

FichteFoll · 2019-03-25T04:42:31Z

Someone was asking for this a few days ago on IRC, but I missed them and couldn't point towards this PR.

Anyway, I've been using this branch for half a year now with exactly 0 issues so far. I don't use the entire feature set of beets, but importing and path styles based on album flexattrs, which is my primary use case, are just fine.

I'll try to remember to make a new speed comparison since my library grew a bit over time, but I don't expect it to be much different compared to the last time.

kergoth · 2019-05-14T16:13:11Z

FYI, I ran into a couple of issues with this, mostly relating to types in the fallback, both in path format queries and in beet ls. See https://discourse.beets.io/t/ranges-not-working-in-beet-ls-with-album-fields-in-item-track-context/

arcresu · 2019-05-14T23:15:24Z

I wasn't aware of this when I threw together the diff on the discourse thread @kergoth mentioned. I'll just reproduce it here:

diff --git a/beets/library.py b/beets/library.py
index 16db1e97..71b6db22 100644
--- a/beets/library.py
+++ b/beets/library.py
@@ -526,7 +526,17 @@ class Item(LibModel):
 
     @classmethod
     def _getters(cls):
-        getters = plugins.item_field_getters()
+        def atoi(f, ag):
+            def ig(i):
+                a = i.get_album()
+                if a:
+                    return ag(a)
+                else:
+                    return cls._type(f).null
+            return ig
+        getters = {f: atoi(f, g)
+                   for f, g in plugins.album_field_getters().items()}
+        getters.update(plugins.item_field_getters())
         getters['singleton'] = lambda i: i.album_id is None
         getters['filesize'] = Item.try_filesize  # In bytes.
         return getters
diff --git a/beets/ui/__init__.py b/beets/ui/__init__.py
index 327db6b0..c3adc72d 100644
--- a/beets/ui/__init__.py
+++ b/beets/ui/__init__.py
@@ -1145,7 +1145,10 @@ def _setup(options, lib=None):
         plugins.send("library_opened", lib=lib)
 
     # Add types and queries defined by plugins.
-    library.Item._types.update(plugins.types(library.Item))
+    at = plugins.types(library.Album)
+    at.update(library.Item._types)
+    at.update(plugins.types(library.Item))
+    library.Item._types = at
     library.Album._types.update(plugins.types(library.Album))
     library.Item._queries.update(plugins.named_queries(library.Item))
     library.Album._queries.update(plugins.named_queries(library.Album))

This wasn't intended to be a final implementation, but my approach was a little bit different in that I thought the album-item relationship was something beets-specific and therefore should be reflected in library.py rather than dbcore. I used the existing getter mechanism. The atoi function takes an album-level getter and converts it into an item-level one that fetches the item's album and delegates to the original getter. Item-level properties still have precedence, as in this PR.

I did find that it was necessary to also change Item._types in order to get queries to work as intended since otherwise the album-level fields don't have type information when accessed on Items.
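Stripped of the beets specifics, the getter adaptation in the diff above boils down to the sketch below. Everything here is hypothetical (plain dicts instead of Item and Album, made-up getter names); it only shows the wrapping and the precedence order, and the `_types` merge in the second hunk follows the same "album first, then item on top" pattern.

```python
def adapt_album_getter(album_getter, null):
    """Wrap an album-level getter so it can be called with an item."""
    def item_getter(item):
        album = item.get("album")
        # Singletons have no album, so fall back to the null value.
        return album_getter(album) if album else null
    return item_getter


# Hypothetical getter tables; real plugins register these with beets.
album_getters = {
    "disc_count": lambda album: album["discs"],
    "label": lambda album: album["name"].upper(),
}
item_getters = {"label": lambda item: item["title"]}

getters = {name: adapt_album_getter(getter, null=None)
           for name, getter in album_getters.items()}
getters.update(item_getters)  # item-level getters win on name clashes

album = {"name": "Twin Peaks", "discs": 2}
track = {"title": "Lark", "album": album}
singleton = {"title": "Loose Track", "album": None}
print(getters["disc_count"](track), getters["label"](track))
```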

Note that we recently picked up a helper for memoisation in another PR:

beets/beets/util/__init__.py

Lines 1037 to 1057 in 909fd1e

    
def lazy_property(func):
    """A decorator that creates a lazily evaluated property. On first access,
    the property is assigned the return value of `func`. This first value is
    stored, so that future accesses do not have to evaluate `func` again.
    This behaviour is useful when `func` is expensive to evaluate, and it is
    not certain that the result will be needed.
    """
    field_name = '_' + func.__name__

    @property
    @functools.wraps(func)
    def wrapper(self):
        if hasattr(self, field_name):
            return getattr(self, field_name)
        value = func(self)
        setattr(self, field_name, value)
        return value

    return wrapper

FichteFoll · 2019-05-19T10:20:03Z

Thanks for the heads-up. I suspect that the problem with ranges is related to me not updating the items' type information, as you did in your diff. I was entirely new to the code base before working on this, so I just never considered that to be relevant.

The lazy_property is similar to something I drafted earlier in the process but ended up scrapping because of what I outlined in an earlier comment (#2988 (comment)). The problem here is that the cached album is a snapshot of the database at whatever time it was first accessed, but the db may change during runtime and the lazy property will have no way to consider that fact.
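The staleness problem can be reproduced in a few lines. The decorator below is copied from the helper quoted above (with the `functools` import it relies on); `store` and `ToyItem` are hypothetical stand-ins for the database and an item.

```python
import functools


def lazy_property(func):
    """Cache the first result of `func` on the instance."""
    field_name = "_" + func.__name__

    @property
    @functools.wraps(func)
    def wrapper(self):
        if hasattr(self, field_name):
            return getattr(self, field_name)
        value = func(self)
        setattr(self, field_name, value)
        return value

    return wrapper


store = {"album_name": "Twin Peaks"}  # hypothetical stand-in for the db


class ToyItem:
    @lazy_property
    def album_name(self):
        return store["album_name"]


item = ToyItem()
first = item.album_name
store["album_name"] = "Twin Peaks (Remaster)"  # the "database" changes
second = item.album_name  # still the value cached on first access
print(first, "|", second)
```

Both reads return the value from the first access, because nothing tells the cached attribute that the underlying data moved on.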

I'll take a closer look at your getter approach when I find some time to work on this again.

(I'd like to mention that I cannot use beets without this feature anymore, so even if there is a huge update going on, I'll continue using my fork until I updated the PR for the changes.)

radusuciu · 2020-08-05T19:36:33Z

Willing to test if still needed, though it seems like there are now merge conflicts..

FichteFoll · 2020-08-10T23:28:42Z

It's primarily the changelog that becomes conflicted every now and then. I recently did a merge of master after my local version broke due to the Python 3.8 update that affected beets' AST usage. I'm still using the branch as my daily driver, FYI, but haven't found the time to dig into the performance cost since.

I do believe that the majority of the effort has already been made and what's left is likely comparatively small, but it's still not within my free time budget at the moment.

ctrueden · 2020-09-29T13:54:48Z

This issue (#2797) also bit me while organizing my library. I have the same use case as @radusuciu: wanting to partition my directory structure based on flexible attributes set during import via the --set flag. For the moment I am using the same workaround with the inline plugin.

My inline and paths config (WIP)

album_fields:
  topdir: |
    def value(f, otherwise):
      try: result = f()
      except: result = None
      return result if result else otherwise
    return value(lambda: category, 'Artists')
  subdir: |
    def value(f, otherwise):
      try: result = f()
      except: result = None
      return result if result else otherwise
    topdir = value(lambda: category, 'Artists')
    if topdir == 'Soundtracks':
      return value(lambda: franchise, '[Unknown]')
    return '[Various]' if comp else albumartist

item_fields:
  topdir: |
    def value(f, otherwise):
      try: result = f()
      except: result = None
      return result if result else otherwise
    return value(lambda: category, 'Artists')
  subdir: |
    def value(f, otherwise):
      try: result = f()
      except: result = None
      return result if result else otherwise
    topdir = value(lambda: category, 'Artists')
    if topdir == 'Soundtracks':
      return value(lambda: franchise, '[Unknown]')
    return artist
  disc_and_track: |
    if disctotal > 9:
      return u'%02i-%02i'% (disc, track)
    elif disctotal > 1:
      return u'%01i-%02i' % (disc, track)
    elif tracktotal > 99:
      return u'%03i' % (track)
    elif tracktotal > 9:
      return u'%02i' % (track)
    else:
      return u'%01i' % (track)

paths:
  # My album flexible attributes:
  # - avmedia: Comma-separated; e.g. Animation, TV, Video Games, Musicals, Movies
  # - nationality: e.g. German, Japanese, Korean
  # - franchise: e.g. Final Fantasy
  singleton: $topdir/%the{$subdir}/%the{$artist} - $title
  comp: $topdir/%the{$subdir}/($year) $album%aunique{}/($disc_and_track) $artist - $title
  default: $topdir/%the{$subdir}/($year) $album%aunique{}/($disc_and_track) $title

It works well, but there are downsides:

The performance of inline: anecdotally, beet move checks seem much slower with the workaround than without.
I don't see an obvious way to do varying levels of directory nesting with this approach. Every case has to fit into a $topdir/$subdir pattern. It's certainly doable to accommodate that, but if you wanted to have e.g. Artists/A Band/(2020) Their Best Album (i.e. two folders deep) in one case and Soundtracks/Movies/My Favorite Movie/(2020) That Movie's Soundtrack (i.e. three folders deep) in another, you'd have to set things up differently since template fields with path separators (/ or \) are not interpolated and split (see this forum thread for discussion).

TL;DR: I would love to see this PR make it into 1.5.0!

@FichteFoll I'm relatively new to beets, but will try to make time to do some performance profiling in the next few days to see how it affects my own library.

ctrueden · 2020-10-26T15:05:09Z

I rebased this and pushed to ctrueden/beets@0c7c586a.

Here are the benchmark results on my library (48646 items):

commit	without album attr	with album attr
master (`769e424`)	16.19s	41.56s
rebased PR (ctrueden/beets@0c7c586a)	26.44s	38.16s

I agree that the performance drop is both unexpected and unfortunate. I'll try to dig more soon and report back.

ctrueden · 2020-10-26T16:14:46Z

TL;DR I pushed a fix to my branch: ctrueden/beets@4c5b5084. 🙌

Explanation

Here is where the performance hit is happening:

  File ".../beets/beets/ui/commands.py", line 1076, in list_func
    list_items(lib, decargs(args), opts.album)
  File ".../beets/beets/ui/commands.py", line 1072, in list_items
    ui.print_(format(item, fmt))
  File ".../beets/beets/library.py", line 362, in __format__
    return self.evaluate_template(spec)
  File ".../beets/beets/dbcore/db.py", line 621, in evaluate_template
    return template.substitute(self.formatted(for_path),
  File ".../beets/beets/dbcore/db.py", line 611, in formatted
    return self._formatter(self, for_path)
  File ".../beets/beets/library.py", line 378, in __init__
    super(FormattedItemMapping, self).__init__(item, for_path)
  File ".../beets/beets/dbcore/db.py", line 62, in __init__
    self.model_keys = model.keys(True)

The list_items function iterates over the items in the query:

for item in lib.items(query):
    ui.print_(format(item, fmt))

This invokes the library's formatter function on each item, which constructs a FormattedItemMapping:

class FormattedItemMapping(dbcore.db.FormattedMapping):
    ...
    def __init__(self, item, for_path=False):
        super(FormattedItemMapping, self).__init__(item, for_path)
        # We treat album and item keys specially here,
        # so exclude transitive album keys from the model's keys.
        self.model_keys = item.keys(computed=True, with_album=False)
        self.item = item

But then the super-constructor does this:

def __init__(self, model, for_path=False):
    self.for_path = for_path
    self.model = model
    self.model_keys = model.keys(True) # <== Equivalent to computed=True, with_album=True !

As you can see above, the generic FormattedMapping asks for the item keys including those from the album, which grabs them eagerly regardless of whether they will be needed for that template down the line—even though the comment suggests the intent was to exclude the album keys.

@FichteFoll Is the fix as simple as passing with_album=False here instead? It worked in my tests! I pushed a more cautious fix to my branch (ctrueden/beets@4c5b5084): it adds a flag to the super-constructor so the model_keys don't get double-computed (once in FormattedMapping with album keys included, and then again in FormattedItemMapping without them).
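The shape of the more cautious fix can be sketched with toy classes. The flag name `compute_keys`, the `album_lookups` counter, and the simplified `keys` method are assumptions made for this illustration; see the linked commit for the actual change.

```python
class ToyItem:
    """Hypothetical item whose album keys are expensive to fetch."""

    def __init__(self):
        self.album_lookups = 0

    def keys(self, computed=False, with_album=True):
        keys = ["title", "artist"]
        if with_album:
            self.album_lookups += 1  # stands in for a database query
            keys += ["franchise"]
        return keys


class FormattedMapping:
    def __init__(self, model, for_path=False, compute_keys=True):
        self.model = model
        self.for_path = for_path
        # Subclasses that set model_keys themselves can opt out.
        if compute_keys:
            self.model_keys = model.keys(True)


class FormattedItemMapping(FormattedMapping):
    def __init__(self, item, for_path=False):
        super().__init__(item, for_path, compute_keys=False)
        # Album keys are handled separately, so exclude them here.
        self.model_keys = item.keys(computed=True, with_album=False)
        self.item = item


item = ToyItem()
mapping = FormattedItemMapping(item)
print(mapping.model_keys, item.album_lookups)
```

With the flag, constructing the item mapping triggers no album lookup at all; without it, every formatted item would pay for one in the super-constructor before the result is thrown away.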

One other question

In my tests, I noticed that on master, flexible attributes do fall back to album attributes successfully in the format string.

Example of flexible attributes falling back to albums

$ beet ls album:'Twin Peaks' -f '$album :: $title :: $franchise'
Twin Peaks (Music From the Limited Event Series) :: Lark :: Twin Peaks
Twin Peaks :: Twin Peaks Theme :: Twin Peaks
Twin Peaks :: Laura Palmer's Theme :: Twin Peaks
Twin Peaks :: Audrey's Dance :: Twin Peaks
...
Twin Peaks (Music From the Limited Event Series) :: Out of Sand :: Twin Peaks
Twin Peaks (Music From the Limited Event Series) :: Axolotl (Roadhouse mix) :: Twin Peaks
Twin Peaks (Limited Event Series Soundtrack) :: Threnody to the Victims of Hiroshima :: Twin Peaks
Twin Peaks (Music From the Limited Event Series) :: Sharp Dressed Man :: Twin Peaks
$ beet ls album:'Twin Peaks' -f '$album :: $title :: $franchise' | wc -l
84
$ echo 'select * from item_attributes where key="franchise" and value="Twin Peaks";' | sqlite3 ~/.config/beets/library.db
805891|49695|franchise|Twin Peaks
805895|49696|franchise|Twin Peaks
805899|49697|franchise|Twin Peaks
805945|49698|franchise|Twin Peaks

It seems to be only queries that don't fall back:

$ beet ls franchise:'Twin Peaks' | wc -l
4

Is that known/intended behavior? Certainly it's inconsistent. Finishing this PR would address that!

Updated benchmarks with bug-fix

commit	without album attr	with album attr
master (`769e424`)	16.19s	41.56s
rebased PR (ctrueden/beets@0c7c586)	26.44s	38.16s
rebased PR + bug-fix (ctrueden/beets@4c5b5084)	14.70s	35.94s

sampsyo · 2020-10-27T00:02:15Z

Wow!! That's truly amazing—excellent work digging into this! 🎉 It would be really, really cool to include this.

To answer your second question: yes, that is indeed inconsistent, but it is also the intended behavior. Basically, the backstory is that we knew the templating fallback was useful, and it was straightforward to implement without too much performance impact, so we did it—but the query side of things was harder to do. But here we are?!

ctrueden · 2020-10-27T14:58:59Z

@sampsyo Great! So would you say this latest patch series is ready for merge, then? If so, @FichteFoll, could you please force push the PR accordingly?

sampsyo · 2020-10-28T16:39:21Z

Just to confirm, we want to merge the changes here and your new patch together, right? So maybe the right thing is to open a new PR with all that together?

FichteFoll · 2020-10-28T19:01:09Z

@ctrueden thank you a lot for looking into this. I will check out your branch over the weekend, most likely, and verify it in my environment as well as include your changes.

ctrueden · 2020-10-28T20:04:57Z

@sampsyo wrote:

we want to merge the changes here and your new patch together, right?

Right.

So maybe the right thing is to open a new PR with all that together?

@FichteFoll can force-push the branch to update this PR. That would be cleaner IMHO than opening a new PR. 😄

@FichteFoll wrote:

I will check out your branch over the weekend

Great! 👍

FichteFoll · 2020-11-02T00:06:21Z

Good work, @ctrueden. This fixes the really bad performance for me as well. 🎉 Glad it was only an oversight on my part and not an inherent flaw in the implementation.

I applied your commit pretty much as-is. An alternative way would have been to skip the super constructor from FormattedItemMapping and reimplement it, but that didn't seem proper.

My results with 11171 entries in the database:

master

Benchmark #1: python -m  beets ls --format "$artist - $album - $title" >/dev/null
  Time (mean ± σ):      1.931 s ±  0.071 s    [User: 1.828 s, System: 0.094 s]
  Range (min … max):    1.881 s …  2.013 s    3 runs
  
Benchmark #1: python -m beets ls --format "$artist - $album%ifdef{event, [$event]} - $title" >/dev/null
  Time (mean ± σ):     12.289 s ±  0.291 s    [User: 7.841 s, System: 1.724 s]
  Range (min … max):   11.971 s … 12.542 s    3 runs

PR before the fix

Benchmark #1: python -m beets ls --format "$artist - $album - $title" >/dev/null
  Time (mean ± σ):     13.921 s ±  0.282 s    [User: 8.721 s, System: 2.090 s]
  Range (min … max):   13.624 s … 14.186 s    3 runs

Benchmark #1: python -m beets ls --format "$artist - $album%ifdef{event, [$event]} - $title" >/dev/null
  Time (mean ± σ):     13.929 s ±  0.769 s    [User: 9.457 s, System: 1.701 s]
  Range (min … max):   13.042 s … 14.416 s    3 runs

PR with fix

Benchmark #1: python -m  beets ls --format "$artist - $album - $title" >/dev/null
  Time (mean ± σ):      2.124 s ±  0.342 s    [User: 1.975 s, System: 0.132 s]
  Range (min … max):    1.908 s …  2.518 s    3 runs

Benchmark #1: python -m beets ls --format "$artist - $album%ifdef{event, [$event]} - $title" >/dev/null
  Time (mean ± σ):     14.250 s ±  0.469 s    [User: 9.232 s, System: 1.947 s]
  Range (min … max):   13.975 s … 14.792 s    3 runs

Still slightly slower in the first case, but much more manageable. Note that the comparison of the second case is unfair, since it tests functionality that the current master does not provide.
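
One way to picture why the simple format string no longer pays the album cost is a fallback that fetches the album only when an item-level lookup misses. The following is a minimal sketch under that assumption, not the beets implementation: `Library`, `Album`, `Item`, and the `album_lookups` counter are hypothetical names made up for illustration.

```python
class Album:
    """Hypothetical album record holding album-level fields."""

    def __init__(self, fields):
        self.fields = dict(fields)


class Library:
    """Hypothetical stand-in for the database; counts album fetches."""

    def __init__(self, albums):
        self._albums = albums
        self.album_lookups = 0

    def get_album(self, album_id):
        self.album_lookups += 1
        return self._albums[album_id]


class Item:
    """Hypothetical item that falls back to its album for missing keys."""

    def __init__(self, lib, album_id, fields):
        self._lib = lib
        self._album_id = album_id
        self._fields = dict(fields)
        self._cached_album = None

    def _album(self):
        # Fetch the album at most once, and only when it is needed.
        if self._cached_album is None:
            self._cached_album = self._lib.get_album(self._album_id)
        return self._cached_album

    def __getitem__(self, key):
        # Item-level values win; the album is consulted only on a miss.
        if key in self._fields:
            return self._fields[key]
        album = self._album()
        if key in album.fields:
            return album.fields[key]
        raise KeyError(key)

    def get(self, key, default=None):
        try:
            return self[key]
        except KeyError:
            return default


lib = Library({1: Album({"event": "C94"})})
item = Item(lib, 1, {"artist": "A", "title": "T"})

artist = item["artist"]                  # served from the item alone
lookups_after_item_field = lib.album_lookups
event = item["event"]                    # miss on the item, album is fetched
missing = item.get("missing")            # reuses the cached album
```

In this sketch, formatting only item-level fields never touches the album, while a field such as `event` triggers a single album fetch that is then reused.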

Co-Authored-By: Curtis Rueden <ctrueden@wisc.edu>

ctrueden · 2020-11-02T20:17:36Z

Awesome, thanks @FichteFoll ! 👍

@sampsyo Looks like this PR is good to go! 🏎️ 💨

ctrueden · 2020-11-26T02:09:48Z

Anything else I can do to help move this forward?

ctrueden · 2020-12-08T20:57:20Z

Just pinging one more time—seems a shame to let this languish now that the performance issue is resolved. I completely understand being too busy, but if there's anything the community can do to help ensure this can merge smoothly, let us know. E.g. if there is a manual testing process to help avoid regressions which others could run through on various platforms, to offload otherwise direct maintainer effort, let's write that up and do it!

FichteFoll · 2021-01-08T18:34:33Z

Did a quick merge locally as I continue to use the PR as my main driver.

Linting fails because flake8-blind-except was updated yesterday and introduced a new error code for except Exception, which is why the latest master passed. Considering how flake8 (pycodestyle) now has a check for bare except: statements itself, the plugin can probably be removed from the lint setup.
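
For context, the handler styles being discussed look like this. It is only an illustrative sketch: the function names are hypothetical, and which of these forms a given linter flags depends on its version and configuration.

```python
def parse_bare(text):
    try:
        return int(text)
    except:  # bare except: catches everything, including KeyboardInterrupt
        return None


def parse_broad(text):
    try:
        return int(text)
    except Exception:  # broad, but lets KeyboardInterrupt and SystemExit through
        return None


def parse_specific(text):
    try:
        return int(text)
    except ValueError:  # only the failure we expect; other errors propagate
        return None
```

With the specific handler, an unexpected error such as the `TypeError` from `parse_specific(None)` is not silently swallowed.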

jackwilsdon · 2021-01-08T18:37:36Z

@FichteFoll if you merge master in it should be resolved now 👍

sampsyo

Huge apologies for the enormous, inexplicable delay on merging this PR. It's amazing work and didn't deserve to languish, but I'm going to hit the green button now!

My only remaining worry about this design is about concurrency: the new caching/revision-number scheme is obviously great, but because it can turn model.load() into a no-op, it runs the risk of missing updates from other threads and processes that would previously be visible. I can't at the moment think of a case where that kind of update is critical, however (perhaps the web plugin?), so I think it's time to unleash this change on the world and track down problems as they arise.

Thank you all for the longitudinal team effort. I'm seriously impressed at the clear thinking and solid engineering that went into this change. 👏
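
The concurrency concern above can be illustrated with a small sketch. This is not the beets code: `Store`, `Model`, and the `revision` counter are hypothetical names, and the sketch assumes the revision is bumped only by writes made through the same in-process store object. A write that bypasses it, standing in for another process, leaves the revision unchanged, so `load()` skips the reload.

```python
class Store:
    """Hypothetical in-process handle on a shared backing dict."""

    def __init__(self, backing):
        self.backing = backing   # stands in for the database file
        self.revision = 0        # bumped only by writes through this handle

    def write(self, key, value):
        self.backing[key] = value
        self.revision += 1


class Model:
    """Hypothetical snapshot object that reloads only on a new revision."""

    def __init__(self, store, key):
        self._store = store
        self._key = key
        self._loaded_revision = None
        self.value = None
        self.load()

    def load(self):
        # Skip the reload when no write has gone through our own store.
        if self._loaded_revision == self._store.revision:
            return False
        self.value = self._store.backing[self._key]
        self._loaded_revision = self._store.revision
        return True


backing = {"country": "US"}
store = Store(backing)
model = Model(store, "country")

store.write("country", "DE")             # in-process write bumps the revision
reloaded_after_local = model.load()      # the snapshot is refreshed

backing["country"] = "JP"                # external write, revision unchanged
reloaded_after_external = model.load()   # no-op: the snapshot stays stale
```

After the external write, the model still holds the older value even though the backing data has changed, which is the kind of missed update described above.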

sampsyo reviewed Jul 22, 2018

FichteFoll force-pushed the pr/item-album-fallback branch 2 times, most recently from 13c868f to b6bf829 Compare September 14, 2018 01:06

FichteFoll added a commit to FichteFoll/dotfiles that referenced this pull request Sep 15, 2018

[beets] Update beets path config

b38b6b9

Utilize album fields for special formatting of doujin releases. Requires a currently unmerged PR to beets. beetbox/beets#2988

FichteFoll mentioned this pull request Nov 13, 2018

Path config query does not work with album-level flexible attributes #2797

Closed

FichteFoll force-pushed the pr/item-album-fallback branch from db9103d to 29acdd6 Compare November 16, 2018 16:38

FichteFoll mentioned this pull request Feb 27, 2019

Switch from pyyaml to ruamel.yaml #3170

Closed

Merge remote-tracking branch 'upstream/master' into pr/item-album-fallback

eda9930

Merge remote-tracking branch 'upstream/master' into pr/item-album-fallback

701cd6c

Avoid overeager inclusion of album attributes

2d024d2

Co-Authored-By: Curtis Rueden <ctrueden@wisc.edu>

FichteFoll force-pushed the pr/item-album-fallback branch from 89721ea to 2d024d2 Compare November 2, 2020 00:10

Merge remote-tracking branch 'upstream/master' into pr/item-album-fallback

5ace2b6

FichteFoll and others added 2 commits January 8, 2021 19:47

Merge remote-tracking branch 'upstream/master' into pr/item-album-fallback

20ec011

Merge branch 'master' into pr/item-album-fallback

09a6ec4

sampsyo approved these changes Mar 7, 2021

sampsyo merged commit 3e82613 into beetbox:master Mar 7, 2021

djl mentioned this pull request Mar 13, 2021

mbsync: Double output #3880

Closed

wisp3rwind mentioned this pull request Mar 26, 2021

Set field values on the import command line #1881

Closed

ssssam mentioned this pull request May 1, 2021

Optimise FormattedMapping when querying a specific set of fields #3762

Merged

2 tasks

wisp3rwind mentioned this pull request May 13, 2021

persist set_fields to media files #3927

Merged

3 tasks

sampsyo mentioned this pull request Apr 18, 2023

Use SQL to query flex fields, and related Album/Item data #4746

Closed

3 tasks
