Add fallback for item access to album's attributes #2988

Merged
merged 13 commits into from Mar 7, 2021

FichteFoll
Contributor

Allows queries (especially for pathspecs) based on an album's flexattrs
while operating on items.

Fixes #2797.

I'll leave formulating the changelog up to you.

Member

@sampsyo sampsyo left a comment

Thank you for diving into this! This is a tricky issue, despite being a relatively small diff, and I appreciate your patience while we think through it.

I've left one minor comment inline. I also have a few general discussion points, from minor to major:

  1. We have an existing mechanism for formatting album-level data for items. This would need to be removed if we move the "merging" behavior lower in the abstraction hierarchy.
  2. You asked about caching issues. In our current system, Item and Album objects are not "live," in that changing them through one reference does not magically change their values in another context. They represent a sort of snapshot of the database, which is why they have a load method for updating values from the database. In that sense, caching the associated album might be even more reasonable than reloading the values every time.
  3. This kind of change would completely hide item-level data, and eliminate the possibility for items to differ from each other on album-level fields. This is probably OK, but we have to consider the consequences. For example, in beets currently, it's possible to "work around" album-level fields you don't like and make some values different on a per-track basis. For example, if you decide you want country to vary across the tracks in a certain album, you can accomplish that if you really want to. With this change, however, those differences would be silently hidden. Again, this is probably OK, but we'd need to be clear about what's changing.

    return self[key]
else:
    return default

Member

This is really just a matter of aesthetics, but one alternative would be to have get just do the necessary exception handling. That is, __getitem__ would continue to contain all the "real" logic and include the raise at the end. Then, get would change to contain:

try:
    return self[key]
except KeyError:
    return default

That would help keep the docstring for get sensible, for example.

Contributor Author

It might have been a bit of premature optimization, but afaik exception handling in Python isn't exactly performant (creation of exception object, traceback, caching of frames etc.) and combining the code paths was pretty trivial. I also liked that I could use this to have a short implementation in Item.get.

The docstring issue doesn't bother me personally because it properly describes what get can do. It does have a different function signature than dict.get, which we need for the with_album parameter anyway, but it's still compatible. I think I prefer Optionally over Alternatively, however.

I would have made those parameters keyword-only, but that is only possible in Python 3.
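
For illustration, here is a minimal sketch of the two ideas in this exchange: a single lookup path shared by `__getitem__` and `get` (so `get` never has to catch an exception), and a keyword-only parameter as Python 3 allows. The class `Record` and the helper `_lookup` are hypothetical names for this sketch, not the code in the PR.

```python
class Record:
    """Toy mapping-like model (hypothetical, not the beets Model class)."""

    def __init__(self, values):
        self._values = dict(values)

    def _lookup(self, key, default=None, *, raise_=False):
        # Shared code path: both __getitem__ and get() go through here,
        # so get() does not need a try/except around self[key].
        if key in self._values:
            return self._values[key]
        if raise_:
            raise KeyError(key)
        return default

    def __getitem__(self, key):
        return self._lookup(key, raise_=True)

    def get(self, key, default=None, *, with_album=True):
        # `with_album` is keyword-only (Python 3 syntax). It is unused in
        # this toy and only shows that get(key, default) stays compatible
        # with the dict.get calling convention.
        return self._lookup(key, default)


record = Record({"title": "Lark"})
print(record.get("title"))    # Lark
print(record.get("year", 0))  # 0
```

Passing `with_album` positionally, as in `record.get("title", None, False)`, raises a `TypeError`, which is the protection keyword-only parameters give.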

@FichteFoll
Contributor Author

FichteFoll commented Jul 22, 2018

  1. Not sure what you mean by this. Formatting works exactly like it did before. However, I missed that self.model_keys is fetched from model.keys(True), which includes the album fallback (and results in the item's formatter for a field being requested rather than the album's). I'll override self.model_keys in FormattedItemMapping's __init__.

  2. I see. So, I could keep an Item-internal reference to an album and just run load on that before trying to access album attributes (or keys()). Is load smart enough not to update itself when there haven't been any changes to the database since the last fetch? If it isn't, then this change would be more involved and would require some care to actually make the caching useful. (I was thinking about storing the same album reference in multiple items, for example, but I'd need to call load for each item individually anyway because we can't be sure the database hasn't been updated in the meantime.) Otherwise, just keeping a lazy-loaded album attribute per item individually would probably already be an improvement.

  3. I'm not sure I understand the problem you're describing because, as far as I'm aware, all of this is still possible. Attribute access on the item itself is prioritized over the album fallback, for their standard fields and even for flexattrs. __setitem__ was not modified, so you can also still set an item's field or flexattr to override an album's.
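
The precedence rule described in point 3 can be sketched with two toy classes. This is only an illustration under simplified assumptions: the `Album` and `Item` classes below are hypothetical stand-ins that store plain dictionaries, not the beets models.

```python
class Album:
    """Toy album holding a dict of fields (hypothetical)."""

    def __init__(self, **fields):
        self.fields = dict(fields)


class Item:
    """Toy item that falls back to its album on lookup (hypothetical)."""

    def __init__(self, album=None, **fields):
        self.fields = dict(fields)
        self.album = album

    def __getitem__(self, key):
        # The item's own value always wins.
        if key in self.fields:
            return self.fields[key]
        # Otherwise fall back to the album, if the item has one.
        if self.album is not None and key in self.album.fields:
            return self.album.fields[key]
        raise KeyError(key)

    def __setitem__(self, key, value):
        # Writes only ever touch the item, never the album.
        self.fields[key] = value


album = Album(country="DE", franchise="Twin Peaks")
item = Item(album=album, title="Lark")

print(item["country"])          # DE (taken from the album)
item["country"] = "US"          # per-track override
print(item["country"])          # US
print(album.fields["country"])  # DE (album is unchanged)
```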

@sampsyo
Member

sampsyo commented Jul 22, 2018

Not sure what you mean by this. Formatting works exactly like it did before. However, I missed that self.model_keys is fetched from model.keys(True), which includes the album fallback (and results in the item's formatter for a field being requested rather than the album's). I'll override self.model_keys in FormattedItemMapping's __init__.

Thanks for catching that!

What I'm worried about is not a direct conflict or anything—just that we're implementing the same logic ("fallback" between item and album attributes) twice. If evaluating the expression item.field already looks up field in item's album, then ideally we would not need FormattedItemMapping—the plain old FormattedMapping from dbcore would do the trick.

But as you discovered, there's subtlety about which formatter gets used. Maybe there's an elegant way to provide a merged view without the duplication, but maybe this division of responsibilities is OK.

I see. So, I could keep an Item-internal reference to an album and just run load on that before trying to access album attributes (or keys()). Is load smart enough not to update itself when there haven't been any changes to the database since the last fetch? If it isn't, then this change would be more involved and would require some care to actually make the caching useful. (I was thinking about storing the same album reference in multiple items, for example, but I'd need to call load for each item individually anyway because we can't be sure the database hasn't been updated in the meantime.) Otherwise, just keeping a lazy-loaded album attribute per item individually would probably already be an improvement.

No, load always loads the latest data. (Otherwise, we'd need some mechanism on the side for tracking when the database has changed—which likely would be no faster to check than just loading from the database.)

I'm not sure I understand the problem you're describing because, as far as I'm aware, all of this is still possible. Attribute access on the item itself is prioritized over the album fallback, for their standard fields and even for flexattrs. __setitem__ was not modified, so you can also still set an item's field or flexattr to override an album's.

OK, good point! I had missed that existing values on items take precedence. That means, unless I'm mistaken, that item-level fixed attributes always take precedence—because it's impossible to remove them. Sounds good!

@FichteFoll
Contributor Author

just that we're implementing the same logic ("fallback" between item and album attributes) twice

Yes, I noticed that too, but I believe we still need the formatter for the reasons you mentioned (and also because it performs other tasks such as alias mapping).

So, I tried implementing a lazy-loaded and cached album property for internal use, next to get_album, but I quickly realized that this doesn't improve things much and that it hurts readability by making the code more complex. You'd think you could turn a couple of album = self.get_album(); if album: … into just if self.album:, but the property getter still needs to call load on every access. In the end, it comes down to a load call vs. a get_album call, and with the former you'd end up having to juggle an internal-only album object as yet another way to access an item's album.

@sampsyo
Member

sampsyo commented Jul 23, 2018

So, I tried implementing a lazy-loaded and cached album property for internal use, next to get_album, but I quickly realized that this doesn't improve things much and that it hurts readability by making the code more complex. You'd think you could turn a couple of album = self.get_album(); if album: … into just if self.album:, but the property getter still needs to call load on every access. In the end, it comes down to a load call vs. a get_album call, and with the former you'd end up having to juggle an internal-only album object as yet another way to access an item's album.

I see—thanks for giving it a try, and I can see how that would be less than ideal.

Anyway, this is shaping up well! It might be a good idea to run a few simple performance tests to make sure we aren't doing something terrible to the time required to run beet list, for example.

@FichteFoll
Contributor Author

FichteFoll commented Jul 26, 2018

~ λ hyperfine "beet list" -m 2
Benchmark #1: beet list
  Time (mean ± σ):      7.207 s ±  0.018 s    [User: 5.438 s, System: 0.759 s]
  Range (min … max):    7.194 s …  7.220 s

~/code/beets ∃ hyperfine "python -m beets list" -m 2
Benchmark #1: python -m beets list
  Time (mean ± σ):     17.757 s ±  0.093 s    [User: 12.581 s, System: 2.135 s]
  Range (min … max):   17.691 s … 17.823 s

Well, not looking so bright. It's a >100% slowdown. This'd need some smart caching, probably. I do wonder why the difference is so high, though. I mean, the ItemFormatter needed to access the item's album before as well. Maybe keys is run more often than I expected?
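
As a quick check of the ">100%" figure, the two hyperfine means above can be compared directly. This assumes the first run (`beet list`) is the installed release and the second (`python -m beets list`) is the PR branch.

```python
baseline = 7.207   # mean seconds, "beet list"
patched = 17.757   # mean seconds, "python -m beets list"

slowdown = (patched - baseline) / baseline  # extra time relative to baseline
ratio = patched / baseline                  # total runtime relative to baseline

print(f"{slowdown:.0%} slower, {ratio:.2f}x the runtime")
# 146% slower, 2.46x the runtime
```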

Also, I should probably add some documentation about this change.

Benchmark tool: https://github.com/sharkdp/hyperfine

@sampsyo
Member

sampsyo commented Jul 26, 2018

Hmm, that is a little worrisome. Let's dig a little deeper and see if we can't mitigate some of the effects. (Thanks for the tip about hyperfine, btw!)

@FichteFoll
Contributor Author

(I just found out about hyperfine today as I browsed the fd Readme by the same author.)

This is probably the point where I would start to look into profiling, as I'm still not too familiar with the code base and believe this would provide a good starting point. Have you ever done this in Python, and do you have recommendations for tools or other tips? (I haven't.)

@sampsyo
Member

sampsyo commented Jul 26, 2018

I think that cProfile, in the standard library, is still probably the best profiler out there. One tip I do have, however, is that SnakeViz is a really nice browser-based GUI for viewing/navigating profile data.
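
A minimal sketch of driving cProfile from Python, in case it helps as a starting point. The function `slow_lookup` is a hypothetical stand-in for whatever code path is being investigated, and the file name in the last comment is only an example.

```python
import cProfile
import io
import pstats


def slow_lookup(n):
    # Stand-in workload (hypothetical); replace with the code to profile.
    return sum(i * i for i in range(n))


profiler = cProfile.Profile()
profiler.enable()
result = slow_lookup(100_000)
profiler.disable()

# Collect the five entries with the highest cumulative time into a string.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)

# To browse the data in a viewer such as SnakeViz, dump it to a file first:
# stats.dump_stats("profile.prof")
```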

@FichteFoll FichteFoll force-pushed the pr/item-album-fallback branch 2 times, most recently from 13c868f to b6bf829 Compare September 14, 2018 01:06
@FichteFoll
Contributor Author

Took a look back at this. I used py-spy for some quick, effortless profiling, and it was quite obvious that the majority of the time is being spent on database access in get_album (or rather the album property, as I changed it). Just commenting out the album.load() call removes the entire performance impact, but it also means the albums we're trying to print could be outdated.

The simplest way forward seemed to be what I suggested earlier:

  1. Make the item cache its album field and provide access through a property. The album returned by this property is read only since it is, well, cached. I decided against preventive measures here and instead made the property "hidden" with an underscore and provided documentation.
  2. Only load database model objects when they have changed by tracking a revision number that I added to the database and increment on each mutating transaction. I had to tweak this for a little while until it passed all tests, but I suppose this is fairly safe going forward now. I added a comment clarifying the possibility of race conditions, but as long as the _db_lock is acquired, we are fine.
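
The revision idea in point 2 can be sketched roughly as follows. This is a toy under stated assumptions, not the code in the PR: `Database`, `Model`, and the `fetches` counter are hypothetical names, and the store is an in-memory dictionary rather than SQLite.

```python
import threading


class Database:
    """Toy store with a global lock and a revision counter (hypothetical)."""

    def __init__(self):
        self.lock = threading.Lock()
        self.revision = 0
        self.rows = {}

    def write(self, row_id, values):
        with self.lock:
            self.rows[row_id] = dict(values)
            self.revision += 1  # every mutating transaction bumps the revision


class Model:
    """Toy model that reloads only when the database revision has moved."""

    def __init__(self, db, row_id):
        self.db = db
        self.row_id = row_id
        self.values = {}
        self.fetches = 0       # how often row data was actually re-read
        self._revision = None  # revision at which `values` was last loaded

    def load(self):
        # The comparison and the fetch happen under the same lock, so no
        # write can slip in between them.
        with self.db.lock:
            if self._revision == self.db.revision:
                return
            self.values = dict(self.db.rows[self.row_id])
            self._revision = self.db.revision
            self.fetches += 1


db = Database()
db.write(1, {"album": "Twin Peaks"})
album = Model(db, 1)

album.load()
album.load()          # no write in between, so this is a no-op
print(album.fetches)  # 1

db.write(1, {"album": "Twin Peaks (Remastered)"})
album.load()          # revision changed, so the row is re-read
print(album.fetches)  # 2
```

The trade-off is that every write path has to go through the code that bumps the revision; a write that bypasses it would leave cached objects stale.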

Let me know what you think.

Also, I wasn't sure if I should add a section regarding the API to the changelog. It would mention the fallback of item access on Item and that re-loading is now lazy, although the latter should be transparent.

@FichteFoll
Contributor Author

Benchmarks: first is with this PR, second is 1.4.7.

~/code/beets ¬ hyperfine "python -m beets list" -m 20 && hyperfine "beet list" -m 20
Benchmark #1: python -m beets list

  Time (mean ± σ):      6.188 s ±  0.216 s    [User: 4.582 s, System: 0.761 s]

  Range (min … max):    6.005 s …  6.948 s

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark #1: beet list

  Time (mean ± σ):      6.036 s ±  0.044 s    [User: 4.455 s, System: 0.749 s]

  Range (min … max):    5.956 s …  6.115 s

FichteFoll added a commit to FichteFoll/dotfiles that referenced this pull request Sep 15, 2018
Utilize album fields for special formatting of doujin releases.
Requires a currently unmerged PR to beets.

beetbox/beets#2988
@sampsyo
Member

sampsyo commented Sep 16, 2018

Wow! This is extremely cool! Very nice work!

You’re right that the “revision” trick is fragile, since it requires us to intervene on all updates of the database and only avoids races because we implement our own internal global lock, but this seems like a great trade-off for the performance win it affords. This seems worth doing independent of the new query behavior you’ve introduced here.

As an aside, we’re already doing something similar for memoizing %aunique strings, which are otherwise very expensive to recompute:
https://github.com/beetbox/beets/blob/b6bf82933ed3e8233e200079c92e2577a5ad5040/beets/library.py

Perhaps we should move this mechanism to reuse the revision mechanism (to detect when it’s time to invalidate the memoization table).

I do like the idea of adding a note about the API change “for developers” in the changelog. The new fallback behavior is worth documenting, even if _cached_album itself isn’t.

@FichteFoll
Contributor Author

I see. Yes, that would probably be useful for the aunique feature and might even warrant a proper implementation (i.e. a "public API"), but I'd rather not do this in here.

Regarding the changelog, I can do that. I deliberately made _cached_album an internal property (with the underscore) because I wasn't confident in exposing it. It is, after all, kind of a workaround, although it's the best I could think of. But as long as it's private, it can be changed.

By the way, as you can see from the commit that references this PR, I started using this in production and haven't encountered any issues so far. I probably could have done that before as well, as I wasn't too concerned about the performance, but for this PR it was a must.

@sampsyo
Member

sampsyo commented Sep 17, 2018

OK, great. Since it’s a sensitive change, it might be wise to put out a call for testing so folks can try it out with funky configurations. I’ll post something to Discourse.

@sampsyo
Member

sampsyo commented Sep 17, 2018

Post added: https://discourse.beets.io/t/call-for-testers-better-queries-for-album-level-fields-a-performance-improvement/477

@FichteFoll
Contributor Author

Before you check: The CI failure is unrelated to the PR and caused by some error with curl when trying to download the Python 3.4 image for flake8 checking.

@sampsyo
Member

sampsyo commented Sep 18, 2018

(Thanks. I restarted that Travis job and everything’s fine.)

@FichteFoll
Contributor Author

Any updates on this? Doesn't seem like the discourse thread attracted much attention.

@FichteFoll
Contributor Author

Rebased to fix the merge conflict on the changelog. AppVeyor has some errors in the setup phase with chocolatey.

@FichteFoll
Contributor Author

Still nobody using this, it seems. 😞

Let me know when you intend to merge this, so I only need to fix the changelog conflict once (or you do it 🤷‍♂️ ).

@FichteFoll
Contributor Author

Someone was asking for this a few days ago on IRC, but I missed them and couldn't point towards this PR.

Anyway, I've been using this branch for half a year now with exactly 0 issues so far. I don't use the entire feature set of beets, but importing and path styles based on album flexattrs, which is my primary use case, are just fine.

I'll try to remember to make a new speed comparison since my library has grown a bit over time, but I don't expect it to be much different from last time.

@kergoth
Contributor

kergoth commented May 14, 2019

FYI, I ran into a couple of issues with this, mostly relating to types in the fallback, both in path format queries and in beet ls. See https://discourse.beets.io/t/ranges-not-working-in-beet-ls-with-album-fields-in-item-track-context/

@arcresu
Member

arcresu commented May 14, 2019

I wasn't aware of this when I threw together the diff on the discourse thread @kergoth mentioned. I'll just reproduce it here:

diff --git a/beets/library.py b/beets/library.py
index 16db1e97..71b6db22 100644
--- a/beets/library.py
+++ b/beets/library.py
@@ -526,7 +526,17 @@ class Item(LibModel):
 
     @classmethod
     def _getters(cls):
-        getters = plugins.item_field_getters()
+        def atoi(f, ag):
+            def ig(i):
+                a = i.get_album()
+                if a:
+                    return ag(a)
+                else:
+                    return cls._type(f).null
+            return ig
+        getters = {f: atoi(f, g)
+                   for f, g in plugins.album_field_getters().items()}
+        getters.update(plugins.item_field_getters())
         getters['singleton'] = lambda i: i.album_id is None
         getters['filesize'] = Item.try_filesize  # In bytes.
         return getters
diff --git a/beets/ui/__init__.py b/beets/ui/__init__.py
index 327db6b0..c3adc72d 100644
--- a/beets/ui/__init__.py
+++ b/beets/ui/__init__.py
@@ -1145,7 +1145,10 @@ def _setup(options, lib=None):
         plugins.send("library_opened", lib=lib)
 
     # Add types and queries defined by plugins.
-    library.Item._types.update(plugins.types(library.Item))
+    at = plugins.types(library.Album)
+    at.update(library.Item._types)
+    at.update(plugins.types(library.Item))
+    library.Item._types = at
     library.Album._types.update(plugins.types(library.Album))
     library.Item._queries.update(plugins.named_queries(library.Item))
     library.Album._queries.update(plugins.named_queries(library.Album))

This wasn't intended to be a final implementation, but my approach was a little bit different in that I thought the album-item relationship was something beets-specific and therefore should be reflected in library.py rather than dbcore. I used the existing getter mechanism. The atoi function takes an album-level getter and converts it into an item-level one that fetches the item's album and delegates to the original getter. Item-level properties still have precedence, as in this PR.

I did find that it was necessary to also change Item._types in order to get queries to work as intended since otherwise the album-level fields don't have type information when accessed on Items.
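
To spell out the two pieces of the diff above in a more readable form, here is a small sketch using plain dictionaries. Everything in it is hypothetical (items as dicts with an `"album"` entry, type names as strings); it only illustrates wrapping an album-level getter for use on items, and the update order that lets item-level types override album-level ones.

```python
def album_to_item_getter(album_getter, null_value=None):
    """Wrap an album-level getter so it can be called with an item."""
    def item_getter(item):
        album = item.get("album")
        # Delegate to the album getter, or return a null value for
        # items that have no album (singletons).
        return album_getter(album) if album else null_value
    return item_getter


# Album-level getter: computes a value from an album.
def album_track_count(album):
    return len(album["tracks"])


item_track_count = album_to_item_getter(album_track_count, null_value=0)

album = {"tracks": ["Lark", "Out of Sand"]}
print(item_track_count({"title": "Lark", "album": album}))  # 2
print(item_track_count({"title": "Loose single"}))          # 0

# Type information: start from the album types, then let item types win.
album_types = {"year": "album-int", "rating": "album-float"}
item_types = {"rating": "item-float", "track": "item-int"}

merged = dict(album_types)
merged.update(item_types)
print(merged["rating"])  # item-float
print(merged["year"])    # album-int
```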

Note that we recently picked up a helper for memoisation in another PR:

beets/beets/util/__init__.py

Lines 1037 to 1057 in 909fd1e

def lazy_property(func):
    """A decorator that creates a lazily evaluated property. On first access,
    the property is assigned the return value of `func`. This first value is
    stored, so that future accesses do not have to evaluate `func` again.
    This behaviour is useful when `func` is expensive to evaluate, and it is
    not certain that the result will be needed.
    """
    field_name = '_' + func.__name__
    @property
    @functools.wraps(func)
    def wrapper(self):
        if hasattr(self, field_name):
            return getattr(self, field_name)
        value = func(self)
        setattr(self, field_name, value)
        return value
    return wrapper

@FichteFoll
Contributor Author

Thanks for the heads-up. I suspect that the problem with ranges is related to me not updating the items' type information, as you did in your diff. I was entirely new to the code base before working on this, so I just never considered that to be relevant.

The lazy_property is similar to something I drafted earlier in the process but ended up scrapping because of what I outlined in an earlier comment (#2988 (comment)). The problem here is that the cached album is a snapshot of the database at whatever time it was first accessed, but the db may change during runtime and the lazy property will have no way to consider that fact.
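
The staleness problem can be reproduced in a few lines. The sketch below reuses the `lazy_property` helper quoted above; the `DATABASE` dictionary and the `Item` class are hypothetical stand-ins for the real database and model.

```python
import functools


def lazy_property(func):
    """Cache the first return value of `func` on the instance."""
    field_name = '_' + func.__name__

    @property
    @functools.wraps(func)
    def wrapper(self):
        if hasattr(self, field_name):
            return getattr(self, field_name)
        value = func(self)
        setattr(self, field_name, value)
        return value

    return wrapper


# Stand-in for the database (hypothetical).
DATABASE = {"album": "Twin Peaks"}


class Item:
    @lazy_property
    def album(self):
        # Evaluated once; later accesses return the stored snapshot.
        return DATABASE["album"]


item = Item()
first = item.album                 # reads from the "database"
DATABASE["album"] = "Renamed"      # the database changes afterwards
second = item.album                # still the cached snapshot

print(first, "|", second)          # Twin Peaks | Twin Peaks
```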

I'll take a closer look at your getter approach when I find some time to work on this again.

(I'd like to mention that I cannot use beets without this feature anymore, so even if there is a huge update going on, I'll continue using my fork until I have updated the PR for the changes.)

@radusuciu

Willing to test if still needed, though it seems like there are now merge conflicts..

@FichteFoll
Contributor Author

FichteFoll commented Aug 10, 2020

It's primarily the changelog that becomes conflicted every now and then. I recently did a merge of master after my local version broke due to the Python 3.8 update that affected beets' AST usage. I'm still using the branch as my daily driver, FYI, but haven't found the time to dig into the performance cost since.

I do believe that the majority of the effort has already been made and what's left is likely comparatively small, but it's still not within my free time budget at the moment.

@ctrueden
Contributor

ctrueden commented Sep 29, 2020

This issue (#2797) also bit me while organizing my library. I have the same use case as @radusuciu: wanting to partition my directory structure based on flexible attributes set during import via the --set flag. For the moment I am using the same workaround with the inline plugin.

My inline and paths config (WIP)
album_fields:
  topdir: |
    def value(f, otherwise):
      try: result = f()
      except: result = None
      return result if result else otherwise
    return value(lambda: category, 'Artists')
  subdir: |
    def value(f, otherwise):
      try: result = f()
      except: result = None
      return result if result else otherwise
    topdir = value(lambda: category, 'Artists')
    if topdir == 'Soundtracks':
      return value(lambda: franchise, '[Unknown]')
    return '[Various]' if comp else albumartist

item_fields:
  topdir: |
    def value(f, otherwise):
      try: result = f()
      except: result = None
      return result if result else otherwise
    return value(lambda: category, 'Artists')
  subdir: |
    def value(f, otherwise):
      try: result = f()
      except: result = None
      return result if result else otherwise
    topdir = value(lambda: category, 'Artists')
    if topdir == 'Soundtracks':
      return value(lambda: franchise, '[Unknown]')
    return artist
  disc_and_track: |
    if disctotal > 9:
      return u'%02i-%02i'% (disc, track)
    elif disctotal > 1:
      return u'%01i-%02i' % (disc, track)
    elif tracktotal > 99:
      return u'%03i' % (track)
    elif tracktotal > 9:
      return u'%02i' % (track)
    else:
      return u'%01i' % (track)

paths:
  # My album flexible attributes:
  # - avmedia: Comma-separated; e.g. Animation, TV, Video Games, Musicals, Movies
  # - nationality: e.g. German, Japanese, Korean
  # - franchise: e.g. Final Fantasy
  singleton: $topdir/%the{$subdir}/%the{$artist} - $title
  comp: $topdir/%the{$subdir}/($year) $album%aunique{}/($disc_and_track) $artist - $title
  default: $topdir/%the{$subdir}/($year) $album%aunique{}/($disc_and_track) $title

It works well, but there are downsides:

  1. The performance of inline: anecdotally, beet move checks seem much slower with the workaround than without.
  2. I don't see an obvious way to do varying levels of directory nesting with this approach. Every case has to fit into a $topdir/$subdir pattern. It's certainly doable to accommodate that, but if you wanted to have e.g. Artists/A Band/(2020) Their Best Album (i.e. two folders deep) in one case and Soundtracks/Movies/My Favorite Movie/(2020) That Movie's Soundtrack (i.e. three folders deep) in another, you'd have to set things up differently since template fields with path separators (/ or \) are not interpolated and split (see this forum thread for discussion).

TL;DR: I would love to see this PR make it into 1.5.0!

@FichteFoll I'm relatively new to beets, but will try to make time to do some performance profiling in the next few days to see how it affects my own library.

@ctrueden
Contributor

I rebased this and pushed to ctrueden/beets@0c7c586a.

Here are the benchmark results on my library (48646 items):

| commit | without album attr | with album attr |
|---|---|---|
| master (769e424) | 16.19s | 41.56s |
| rebased PR (ctrueden/beets@0c7c586a) | 26.44s | 38.16s |

I agree that the performance drop is both unexpected and unfortunate. I'll try to dig more soon and report back.

@ctrueden
Contributor

ctrueden commented Oct 26, 2020

TL;DR I pushed a fix to my branch: ctrueden/beets@4c5b5084. 🙌

Explanation

Here is where the performance hit is happening:

  File ".../beets/beets/ui/commands.py", line 1076, in list_func
    list_items(lib, decargs(args), opts.album)
  File ".../beets/beets/ui/commands.py", line 1072, in list_items
    ui.print_(format(item, fmt))
  File ".../beets/beets/library.py", line 362, in __format__
    return self.evaluate_template(spec)
  File ".../beets/beets/dbcore/db.py", line 621, in evaluate_template
    return template.substitute(self.formatted(for_path),
  File ".../beets/beets/dbcore/db.py", line 611, in formatted
    return self._formatter(self, for_path)
  File ".../beets/beets/library.py", line 378, in __init__
    super(FormattedItemMapping, self).__init__(item, for_path)
  File ".../beets/beets/dbcore/db.py", line 62, in __init__
    self.model_keys = model.keys(True)

The list_items function iterates over the items in the query:

for item in lib.items(query):
    ui.print_(format(item, fmt))

This invokes the library's formatter function on each item, which constructs a FormattedItemMapping:

class FormattedItemMapping(dbcore.db.FormattedMapping):
    ...
    def __init__(self, item, for_path=False):
        super(FormattedItemMapping, self).__init__(item, for_path)
        # We treat album and item keys specially here,
        # so exclude transitive album keys from the model's keys.
        self.model_keys = item.keys(computed=True, with_album=False)
        self.item = item

But then the super-constructor does this:

def __init__(self, model, for_path=False):
    self.for_path = for_path
    self.model = model
    self.model_keys = model.keys(True) # <== Equivalent to computed=True, with_album=True !

As you can see above, the generic FormattedMapping asks for the item keys including those from the album, which grabs them eagerly regardless of whether they will be needed for that template down the line—even though the comment suggests the intent was to exclude the album keys.

@FichteFoll Is the fix as simple as passing with_album=False here instead? It worked in my tests! I pushed a more cautious fix to my branch (ctrueden/beets@4c5b5084): it adds a flag to the super-constructor so the model_keys don't get double-computed (once in FormattedMapping with album keys included, and then again in FormattedItemMapping without them).
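
A rough sketch of the "flag on the super-constructor" approach, to make the double computation concrete. The parameter name `compute_keys` and the `FakeItem` class are assumptions made for this illustration; the actual commit may name or structure things differently.

```python
class FormattedMapping:
    def __init__(self, model, for_path=False, compute_keys=True):
        self.for_path = for_path
        self.model = model
        # `compute_keys` is a hypothetical flag name: subclasses that build
        # their own key list can skip the eager keys() call here.
        if compute_keys:
            self.model_keys = model.keys(True)


class FormattedItemMapping(FormattedMapping):
    def __init__(self, item, for_path=False):
        super().__init__(item, for_path, compute_keys=False)
        # Exclude transitive album keys; they are handled separately.
        self.model_keys = item.keys(computed=True, with_album=False)
        self.item = item


class FakeItem:
    """Toy item that counts how often album keys are fetched."""

    def __init__(self):
        self.album_lookups = 0

    def keys(self, computed=False, with_album=True):
        keys = ["title", "artist"]
        if with_album:
            self.album_lookups += 1  # stands in for the costly album fetch
            keys.append("franchise")
        return keys


item = FakeItem()
mapping = FormattedItemMapping(item)
print(mapping.model_keys)  # ['title', 'artist']
print(item.album_lookups)  # 0
```

With `compute_keys=True` in the subclass, the same construction would call `keys(True)` first and pay for one album lookup per item, which is the cost showing up in the traceback above.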

One other question

In my tests, I noticed that on master, flexible attributes do fall back to album attributes successfully in the format string.

Example of flexible attributes falling back to albums
$ beet ls album:'Twin Peaks' -f '$album :: $title :: $franchise'
Twin Peaks (Music From the Limited Event Series) :: Lark :: Twin Peaks
Twin Peaks :: Twin Peaks Theme :: Twin Peaks
Twin Peaks :: Laura Palmer's Theme :: Twin Peaks
Twin Peaks :: Audrey's Dance :: Twin Peaks
...
Twin Peaks (Music From the Limited Event Series) :: Out of Sand :: Twin Peaks
Twin Peaks (Music From the Limited Event Series) :: Axolotl (Roadhouse mix) :: Twin Peaks
Twin Peaks (Limited Event Series Soundtrack) :: Threnody to the Victims of Hiroshima :: Twin Peaks
Twin Peaks (Music From the Limited Event Series) :: Sharp Dressed Man :: Twin Peaks
$ beet ls album:'Twin Peaks' -f '$album :: $title :: $franchise' | wc -l
84
$ echo 'select * from item_attributes where key="franchise" and value="Twin Peaks";' | sqlite3 ~/.config/beets/library.db
805891|49695|franchise|Twin Peaks
805895|49696|franchise|Twin Peaks
805899|49697|franchise|Twin Peaks
805945|49698|franchise|Twin Peaks

It seems to be only queries that don't fall back:

$ beet ls franchise:'Twin Peaks' | wc -l
4

Is that known/intended behavior? Certainly it's inconsistent. Finishing this PR would address that!

Updated benchmarks with bug-fix

| commit | without album attr | with album attr |
|---|---|---|
| master (769e424) | 16.19s | 41.56s |
| rebased PR (ctrueden/beets@0c7c586) | 26.44s | 38.16s |
| rebased PR + bug-fix (ctrueden/beets@4c5b5084) | 14.70s | 35.94s |

@sampsyo
Member

sampsyo commented Oct 27, 2020

Wow!! That's truly amazing—excellent work digging into this! 🎉 It would be really, really cool to include this.

To answer your second question: yes, that is indeed inconsistent, but it is also the intended behavior. Basically, the backstory is that we knew the templating fallback was useful, and it was straightforward to implement without too much performance impact, so we did it—but the query side of things was harder to do. But here we are?!

@ctrueden
Contributor

@sampsyo Great! So would you say this latest patch series is ready for merge, then? If so, @FichteFoll, could you please force push the PR accordingly?

@sampsyo
Member

sampsyo commented Oct 28, 2020

Just to confirm, we want to merge the changes here and your new patch together, right? So maybe the right thing is to open a new PR with all that together?

@FichteFoll
Contributor Author

@ctrueden thank you a lot for looking into this. I will check out your over the weekend, most likely, and verify it in my environment as well as include your changes.

@ctrueden
Contributor

@sampsyo wrote:

we want to merge the changes here and your new patch together, right?

Right.

So maybe the right thing is to open a new PR with all that together?

@FichteFoll can force-push the branch to update this PR. That would be cleaner IMHO than opening a new PR. 😄

@FichteFoll wrote:

I will check out your over the weekend

Great! 👍

@FichteFoll
Contributor Author

FichteFoll commented Nov 2, 2020

Good work, @ctrueden. This fixes the really bad performance for me as well. 🎉 Glad it was only an oversight on my part and not an inherent flaw in the implementation.

I applied your commit pretty much as-is. An alternative way would have been to skip the super constructor from FormattedItemMapping and reimplement it, but that didn't seem proper.

My results with 11171 entries in the database:

master

Benchmark #1: python -m  beets ls --format "$artist - $album - $title" >/dev/null
  Time (mean ± σ):      1.931 s ±  0.071 s    [User: 1.828 s, System: 0.094 s]
  Range (min … max):    1.881 s …  2.013 s    3 runs
  
Benchmark #1: python -m beets ls --format "$artist - $album%ifdef{event, [$event]} - $title" >/dev/null
  Time (mean ± σ):     12.289 s ±  0.291 s    [User: 7.841 s, System: 1.724 s]
  Range (min … max):   11.971 s … 12.542 s    3 runs

PR before the fix

Benchmark #1: python -m beets ls --format "$artist - $album - $title" >/dev/null
  Time (mean ± σ):     13.921 s ±  0.282 s    [User: 8.721 s, System: 2.090 s]
  Range (min … max):   13.624 s … 14.186 s    3 runs

Benchmark #1: python -m beets ls --format "$artist - $album%ifdef{event, [$event]} - $title" >/dev/null
  Time (mean ± σ):     13.929 s ±  0.769 s    [User: 9.457 s, System: 1.701 s]
  Range (min … max):   13.042 s … 14.416 s    3 runs

PR with fix

Benchmark #1: python -m  beets ls --format "$artist - $album - $title" >/dev/null
  Time (mean ± σ):      2.124 s ±  0.342 s    [User: 1.975 s, System: 0.132 s]
  Range (min … max):    1.908 s …  2.518 s    3 runs

Benchmark #1: python -m beets ls --format "$artist - $album%ifdef{event, [$event]} - $title" >/dev/null
  Time (mean ± σ):     14.250 s ±  0.469 s    [User: 9.232 s, System: 1.947 s]
  Range (min … max):   13.975 s … 14.792 s    3 runs

Still slightly slower than master in the first case (2.124 s vs. 1.931 s, roughly 10%), but much more manageable than the 13.921 s before the fix. Note that the comparison of the second case is unfair, since it tests functionality that the current master does not provide.

Co-Authored-By: Curtis Rueden <ctrueden@wisc.edu>
@ctrueden
Contributor

ctrueden commented Nov 2, 2020

Awesome, thanks @FichteFoll ! 👍

@sampsyo Looks like this PR is good to go! 🏎️ 💨

@ctrueden
Contributor

Anything else I can do to help move this forward?

@ctrueden
Contributor

ctrueden commented Dec 8, 2020

Just pinging one more time—seems a shame to let this languish now that the performance issue is resolved. I completely understand being too busy, but if there's anything the community can do to help ensure this can merge smoothly, let us know. E.g. if there is a manual testing process to help avoid regressions which others could run through on various platforms, to offload otherwise direct maintainer effort, let's write that up and do it!

@FichteFoll
Contributor Author

FichteFoll commented Jan 8, 2021

Did a quick merge locally as I continue to use the PR as my main driver.

Linting fails because flake8-blind-except was updated yesterday and introduced a new error code for except Exception, which is why the latest master passed. Considering how flake8 (pycodestyle) now has a check for bare except: statements itself, the plugin can probably be removed from the lint setup.

@jackwilsdon
Member

@FichteFoll if you merge master in it should be resolved now 👍

Member

@sampsyo sampsyo left a comment

Huge apologies for the enormous, inexplicable delay on merging this PR. It's amazing work and didn't deserve to languish, but I'm going to hit the green button now!

My only remaining worry about this design is about concurrency: the new caching/revision-number scheme is obviously great, but because it can turn model.load() into a no-op, it runs the risk of missing updates from other threads and processes that would previously be visible. I can't at the moment think of a case where that kind of update is critical, however (perhaps the web plugin?), so I think it's time to unleash this change on the world and track down problems as they arise.
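To make the concurrency concern concrete, here is a minimal sketch of a revision-number cache. It is not the actual beets code: the `Database` and `Model` classes and the `revision` and `loads` attributes are hypothetical, and the in-process counter is an assumption about how such a scheme might work.

```python
class Database:
    """Hypothetical store that bumps a revision counter on each write."""

    def __init__(self):
        self.revision = 0
        self.rows = {}

    def write(self, key, values):
        self.rows[key] = dict(values)
        self.revision += 1  # only writes made through this object are counted


class Model:
    """Hypothetical snapshot of one row, reloaded only when the revision moved."""

    def __init__(self, db, key):
        self._db = db
        self._key = key
        self.values = {}
        self._revision = None  # revision observed at the last real load
        self.loads = 0         # counts loads that actually hit the store

    def load(self):
        # No-op when no write has been recorded since the last load.
        if self._revision == self._db.revision:
            return
        self.values = dict(self._db.rows[self._key])
        self._revision = self._db.revision
        self.loads += 1


db = Database()
db.write("track", {"country": "US"})
model = Model(db, "track")
model.load()
model.load()  # skipped: revision unchanged

# Simulate another process changing the data without bumping our counter.
db.rows["track"]["country"] = "DE"
model.load()  # still skipped, so the snapshot is now stale
print(model.loads, model.values["country"])
```

In this sketch the second and third `load()` calls do nothing, which is the performance win, but the third one also misses the externally made change, which is the risk described above.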

Thank you all for the longitudinal team effort. I'm seriously impressed at the clear thinking and solid engineering that went into this change. 👏

Development

Successfully merging this pull request may close these issues.

Path config query does not work with album-level flexible attributes