
New and changed contributor scanning improvements #979

Merged

2 commits merged into LMS-Community:public/8.4 on Jan 25, 2024

Conversation

darrell-k
Contributor

Changed strategy for keeping the contributor_albums table in line through a "New & Changed" scan.

(It turned out that my change in this area just before Xmas introduced a bug as well as fixing one!)

Rather than continue complicating the rescan routine in Contributors.pm, I've introduced code in Utils/Scanner/Local.pm based on the commented-out code in schema_optimize.sql, running it only for albums that have changed.

I've also changed the update to the albums table to prioritise ALBUMARTIST if there is one. This already happens when creating a new album, but ALBUMARTIST was being ignored in the update, which just seemed wrong.

I've tested it, but it really needs independent testing and scrutiny.

@michaelherger
Member

Did you notice any performance hit? I fear re-running those full queries might be quite slow compared to some individual updates?

@darrell-k
Contributor Author

I would hope it will improve performance: the contributor_album update is now done in one hit on the database rather than incrementally. The Contributor->rescan from which I have removed the contributor_album manipulation is called once for every album/track/contributor, and most of the time it probably changes nothing after running its queries.

Obviously, this change adds the slight overhead of creating the temporary changed_albums table. I've made a small change to avoid this when there are no changed tracks (i.e. in a full rescan). Will push it shortly.
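To make the approach concrete, here is a minimal sketch in Python/sqlite3. This is NOT the real LMS schema or code: the table and column names are simplified assumptions for illustration. It builds a temporary changed_albums table from the changed tracks, then rebuilds contributor_album only for those albums.

```python
import sqlite3

# Simplified stand-ins for the LMS tables; real schemas differ.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE tracks (url TEXT PRIMARY KEY, album INTEGER, contributor INTEGER);
    CREATE TABLE changed (url TEXT);
    CREATE TABLE contributor_album (contributor INTEGER, album INTEGER);
""")
db.executemany("INSERT INTO tracks VALUES (?, ?, ?)", [
    ("file:///a/1.flac", 1, 10),
    ("file:///a/2.flac", 1, 11),
    ("file:///b/1.flac", 2, 12),
])
# Only one file changed in this rescan.
db.execute("INSERT INTO changed VALUES ('file:///a/1.flac')")

# Build the temporary table of albums touched by the changed tracks ...
db.execute("""
    CREATE TEMPORARY TABLE changed_albums AS
        SELECT DISTINCT tracks.album AS album
        FROM changed JOIN tracks ON changed.url = tracks.url
""")
# ... then rebuild contributor_album for those albums only, in one hit.
db.execute("DELETE FROM contributor_album WHERE album IN (SELECT album FROM changed_albums)")
db.execute("""
    INSERT INTO contributor_album
        SELECT DISTINCT contributor, album FROM tracks
        WHERE album IN (SELECT album FROM changed_albums)
""")

rows = sorted(db.execute("SELECT contributor, album FROM contributor_album"))
print(rows)  # album 1 rebuilt from both its tracks; album 2 never touched
```

The point of the sketch is the shape of the operation: one set-based rebuild per scan, scoped to the changed albums, instead of per-track incremental updates.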

I'll also provide some metrics on performance.

@mherger
Contributor

mherger commented Jan 18, 2024

I guess it really depends on how many changes you do, and how large your library is. If you have 50k tracks and changed 10k of them, then a mass update should be faster. But if you only changed one album out of 50k albums?...

@darrell-k
Contributor Author

We've got an index on tracks(url) so the temporary table creation should be fine. The join to tracks on url is already being done for the existing "changed" table, presumably that is one reason why the index exists.
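For what it's worth, SQLite's EXPLAIN QUERY PLAN can confirm that the join is served by the index rather than a full table scan. A minimal sketch (simplified schema; the table and index names are assumptions, not the real LMS ones):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE tracks (url TEXT, album INTEGER);
    CREATE INDEX tracks_url ON tracks (url);
    CREATE TABLE changed (url TEXT);
""")
# Ask the planner how it would execute the changed -> tracks join.
plan = db.execute("""
    EXPLAIN QUERY PLAN
    SELECT DISTINCT tracks.album FROM changed
    JOIN tracks ON changed.url = tracks.url
""").fetchall()
for row in plan:
    print(row[-1])  # detail column, e.g. "SEARCH tracks USING INDEX tracks_url (url=?)"
```

With the index in place the plan shows an index search on tracks for each changed row, which is why the temporary-table creation should stay cheap even on a large library.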

I could add an album column to the existing "changed" table rather than creating a separate "changed_albums" table, but that would make the "SELECT DISTINCT" in the insertion work a bit harder, which might be noticeable with a large set of changed tracks, so I don't know that it would be worth it.

Anyway, as I said above, metrics incoming.

@darrell-k
Contributor Author

The results are in.

Library size: 6508 albums, 86853 tracks

| Timing | 13 changed tracks | 13,000 changed tracks |
| --- | --- | --- |
| Old: media folder scan total (from the log message) | 8.1 s | 156 s |
| Old: changed-file scan only (diff in log timestamps) | 0.4 s | 149 s |
| New: media folder scan total (from the log message) | 8.2 s | 145 s |
| New: changed-file scan only (diff in log timestamps) | 0.5 s | 137 s |

My conclusion is that there is no significant difference in performance. The test was somewhat biased towards the old method because there were no actual tag changes in the files, meaning that the old method would not have made any changes to the contributor_album table, whereas the new method rebuilds it regardless.

This is on a (very) cheap Pentium mini PC with 8 GB of RAM running Debian, with the music library on a USB-attached mechanical disk, which may account for my relatively prolonged file/folder discovery times: basically all of the 8 seconds shown above!

@darrell-k
Contributor Author

What's the feeling on this now? I hit the bug again today while fiddling with my tags to help out a user on the forum. I think improving the new & changed (n&c) scan would make a nice bullet point for the 8.4 release, and I think it will be appreciated.

I think my performance test above is quite robust, and Frank on the forum has been running it for a few days now, with no reported slowdown doing n&c scans on his very large collection.

Comment on lines +334 to +340
```perl
$dbh->do( qq{
	INSERT INTO changed_albums
		SELECT DISTINCT(tracks.album) AS album
		FROM changed
		JOIN tracks ON changed.url = tracks.url
		WHERE NOT EXISTS (SELECT * FROM changed_albums WHERE changed_albums.album = tracks.album)
} );
```
Member


Wouldn't it be simpler/good enough to only populate this table here, rather than on line 314 and here?

And would INSERT OR IGNORE be slower than the NOT EXISTS lookup on the table itself?

Member


Oh, I missed that there's not only the deleteTracks() call, but also updateTracks() in between them.

Member


Ok, the INSERT OR IGNORE would require some more tweaking of the code. But that looks "cleaner" to me:

```diff
diff --git a/Slim/Utils/Scanner/Local.pm b/Slim/Utils/Scanner/Local.pm
index 96b596f2e..2a290cf70 100644
--- a/Slim/Utils/Scanner/Local.pm
+++ b/Slim/Utils/Scanner/Local.pm
@@ -310,8 +310,9 @@ sub rescan {
 		if ( $changedOnlyCount ) {
 			$log->error("Build temporary table for changed albums") unless main::SCANNER && $main::progress;
 			$dbh->do('DROP TABLE IF EXISTS changed_albums');
+			$dbh->do("CREATE $createTemporary TABLE changed_albums (album INTEGER PRIMARY KEY UNIQUE)");
 			$dbh->do( qq{
-				CREATE $createTemporary TABLE changed_albums AS
+				INSERT INTO changed_albums
 					SELECT DISTINCT(tracks.album) AS album
 					FROM   changed
 					JOIN   tracks ON changed.url = tracks.url
@@ -332,11 +333,10 @@ sub rescan {
 			# ... first, now that tracks has been updated, add albums referenced now which weren't referenced before (in case the user has moved a track to a different album)
 			$log->error("Adding to temporary table for changed albums") unless main::SCANNER && $main::progress;
 			$dbh->do( qq{
-				INSERT INTO changed_albums
+				INSERT OR IGNORE INTO changed_albums
 					SELECT DISTINCT(tracks.album) AS album
 					FROM   changed
 						JOIN tracks ON changed.url = tracks.url
-				WHERE NOT EXISTS (SELECT * FROM changed_albums WHERE changed_albums.album = tracks.album)
 			} );
 			# ... now, rebuild contributor_album
 			$dbh->do( qq{
```

@mherger mherger merged commit c8a820b into LMS-Community:public/8.4 Jan 25, 2024
@mherger
Contributor

mherger commented Jan 25, 2024

Let's give that a try! Thanks a lot! I'll tweak the SQL on my end.

@darrell-k
Contributor Author

INSERT OR IGNORE is a new one on me; I was brought up on DB2 SQL, which has its own set of extensions to ANSI SQL, but not that one.

If you need the unique index to make that work, I guess what "OR IGNORE" is doing is silently suppressing primary key violation errors during the INSERT. I used to write an error handler for that sort of thing!
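A quick sanity check of that behaviour (plain Python/sqlite3 here, not the LMS code): a plain INSERT of a duplicate primary key raises a constraint error, while INSERT OR IGNORE silently skips the conflicting row, which is exactly what the removed NOT EXISTS guard was defending against.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE changed_albums (album INTEGER PRIMARY KEY)")
db.execute("INSERT INTO changed_albums VALUES (1)")

try:
    # A plain duplicate INSERT violates the primary key ...
    db.execute("INSERT INTO changed_albums VALUES (1)")
    duplicate_raised = False
except sqlite3.IntegrityError:
    duplicate_raised = True  # ... the error the NOT EXISTS guard avoided

# ... whereas OR IGNORE drops the conflicting row without an error.
db.execute("INSERT OR IGNORE INTO changed_albums VALUES (1)")  # skipped
db.execute("INSERT OR IGNORE INTO changed_albums VALUES (2)")  # inserted

rows = [r[0] for r in db.execute("SELECT album FROM changed_albums ORDER BY album")]
print(duplicate_raised, rows)
```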

Aren't standards a wonderful thing?

Anyway, thanks! Hopefully this will cut down a little on the "my n&c scan didn't work" questions.
