
New and changed contributor scanning improvements #979

Merged

2 commits merged into LMS-Community:public/8.4 on Jan 25, 2024

Conversation

darrell-k
Contributor

Changed strategy for keeping the contributor_albums table in line through a "New & Changed" scan.

(It turned out that my change in this area just before Xmas introduced a bug as well as fixing one!)

Rather than continue complicating the rescan routine in Contributors.pm, I've introduced code in Utils/Scanner/Local.pm based on the commented-out code in schema_optimize.sql, running it only for albums that have changed.

I've also changed the update to the albums table to prioritise ALBUMARTIST if there is one. This already happens when creating a new album, but ALBUMARTIST was being ignored in the update, which just seemed wrong.

I've tested it, but it really needs independent testing and scrutiny.

@michaelherger
Member

Did you notice any performance hit? I fear re-running those full queries might be quite slow compared to some individual updates?

@darrell-k
Contributor Author

I would hope it will improve performance: the contributor_album update is now done in one hit on the database rather than incrementally. The Contributor->rescan from which I have removed the contributor_album manipulation is called once for every album/track/contributor, and most of the time it probably changes nothing after running its queries.

Obviously, this change adds the slight overhead of creating the temporary changed_albums table. I've made a small change to avoid this when there are no changed tracks (i.e. in a full rescan). Will push it shortly.
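To make the approach concrete, here is a minimal sketch in Python/sqlite3. This is NOT the real LMS schema or code: the table and column names are simplified assumptions for illustration. It builds a temporary changed_albums table from the changed tracks, then rebuilds contributor_album only for those albums.

```python
import sqlite3

# Simplified stand-ins for the LMS tables; real schemas differ.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE tracks (url TEXT PRIMARY KEY, album INTEGER, contributor INTEGER);
    CREATE TABLE changed (url TEXT);
    CREATE TABLE contributor_album (contributor INTEGER, album INTEGER);
""")
db.executemany("INSERT INTO tracks VALUES (?, ?, ?)", [
    ("file:///a/1.flac", 1, 10),
    ("file:///a/2.flac", 1, 11),
    ("file:///b/1.flac", 2, 12),
])
# Only one file changed in this rescan.
db.execute("INSERT INTO changed VALUES ('file:///a/1.flac')")

# Build the temporary table of albums touched by the changed tracks ...
db.execute("""
    CREATE TEMPORARY TABLE changed_albums AS
        SELECT DISTINCT tracks.album AS album
        FROM changed JOIN tracks ON changed.url = tracks.url
""")
# ... then rebuild contributor_album for those albums only, in one hit.
db.execute("DELETE FROM contributor_album WHERE album IN (SELECT album FROM changed_albums)")
db.execute("""
    INSERT INTO contributor_album
        SELECT DISTINCT contributor, album FROM tracks
        WHERE album IN (SELECT album FROM changed_albums)
""")

rows = sorted(db.execute("SELECT contributor, album FROM contributor_album"))
print(rows)  # album 1 rebuilt from both its tracks; album 2 never touched
```

The point of the sketch is the shape of the operation: one set-based rebuild per scan, scoped to the changed albums, instead of per-track incremental updates.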

I'll also provide some metrics on performance.

@mherger
Contributor

mherger commented Jan 18, 2024

I guess it really depends on how many changes you do, and how large your library is. If you have 50k tracks and changed 10k of them, then a mass update should be faster. But if you only changed one album out of 50k albums?...

@darrell-k
Contributor Author

We've got an index on tracks(url) so the temporary table creation should be fine. The join to tracks on url is already being done for the existing "changed" table, presumably that is one reason why the index exists.
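For what it's worth, SQLite's EXPLAIN QUERY PLAN can confirm that the join is served by the index rather than a full table scan. A minimal sketch (simplified schema; the table and index names are assumptions, not the real LMS ones):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE tracks (url TEXT, album INTEGER);
    CREATE INDEX tracks_url ON tracks (url);
    CREATE TABLE changed (url TEXT);
""")
# Ask the planner how it would execute the changed -> tracks join.
plan = db.execute("""
    EXPLAIN QUERY PLAN
    SELECT DISTINCT tracks.album FROM changed
    JOIN tracks ON changed.url = tracks.url
""").fetchall()
for row in plan:
    print(row[-1])  # detail column, e.g. "SEARCH tracks USING INDEX tracks_url (url=?)"
```

With the index in place the plan shows an index search on tracks for each changed row, which is why the temporary-table creation should stay cheap even on a large library.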

I could add an album column to the existing "changed" table rather than creating a separate "changed_albums" table, but that would make the "SELECT DISTINCT" in the insertion work a bit harder, which might be noticeable with a large set of changed tracks, so I don't know that it would be worth it.

Anyway, as I said above, metrics incoming.

@darrell-k
Contributor Author

The results are in.

Library size: 6508 albums, 86853 tracks

| Timing | 13 changed tracks | 13,000 changed tracks |
| --- | --- | --- |
| Old: media folder scan total (from the log message) | 8.1 s | 156 s |
| Old: changed-file scan only (diff in log timestamps) | 0.4 s | 149 s |
| New: media folder scan total (from the log message) | 8.2 s | 145 s |
| New: changed-file scan only (diff in log timestamps) | 0.5 s | 137 s |

My conclusion is that there is no significant difference in performance. The test was somewhat biased towards the old method because there were no actual tag changes in the files, meaning that the old method would not have made any changes to the contributor_album table, whereas the new method rebuilds it regardless.

This is on a (very) cheap Pentium mini PC with 8 GB of RAM running Debian, with the music library on a USB-attached mechanical disk, which may account for my relatively prolonged file/folder discovery times: basically all of the 8 seconds shown above!

@darrell-k
Contributor Author

What's the feeling on this now? I hit the bug again today while fiddling with my tags to help out a user on the forum. I think improving the new & changed (n&c) scan would make a nice bullet point for the 8.4 release, and I think it will be appreciated.

I think my performance test above is quite robust, and Frank on the forum has been running it for a few days now, with no reported slowdown doing n&c scans on his very large collection.

Comment on lines +334 to +340
```perl
$dbh->do( qq{
	INSERT INTO changed_albums
		SELECT DISTINCT(tracks.album) AS album
		FROM changed
		JOIN tracks ON changed.url = tracks.url
		WHERE NOT EXISTS (SELECT * FROM changed_albums WHERE changed_albums.album = tracks.album)
} );
```
Member


Wouldn't it be simpler/good enough to only populate this table here, rather than on line 314 and here?

And would INSERT OR IGNORE be slower than the NOT EXISTS lookup on the table itself?

Member


Oh, I missed that there's not only the deleteTracks() call, but also updateTracks() in between them.

Member


Ok, the INSERT OR IGNORE would require some more tweaking of the code. But that looks "cleaner" to me:

```diff
diff --git a/Slim/Utils/Scanner/Local.pm b/Slim/Utils/Scanner/Local.pm
index 96b596f2e..2a290cf70 100644
--- a/Slim/Utils/Scanner/Local.pm
+++ b/Slim/Utils/Scanner/Local.pm
@@ -310,8 +310,9 @@ sub rescan {
 		if ( $changedOnlyCount ) {
 			$log->error("Build temporary table for changed albums") unless main::SCANNER && $main::progress;
 			$dbh->do('DROP TABLE IF EXISTS changed_albums');
+			$dbh->do("CREATE $createTemporary TABLE changed_albums (album INTEGER PRIMARY KEY UNIQUE)");
 			$dbh->do( qq{
-				CREATE $createTemporary TABLE changed_albums AS
+				INSERT INTO changed_albums
 					SELECT DISTINCT(tracks.album) AS album
 					FROM   changed
 					JOIN   tracks ON changed.url = tracks.url
@@ -332,11 +333,10 @@ sub rescan {
 			# ... first, now that tracks has been updated, add albums referenced now which weren't referenced before (in case the user has moved a track to a different album)
 			$log->error("Adding to temporary table for changed albums") unless main::SCANNER && $main::progress;
 			$dbh->do( qq{
-				INSERT INTO changed_albums
+				INSERT OR IGNORE INTO changed_albums
 					SELECT DISTINCT(tracks.album) AS album
 					FROM   changed
 						JOIN tracks ON changed.url = tracks.url
-				WHERE NOT EXISTS (SELECT * FROM changed_albums WHERE changed_albums.album = tracks.album)
 			} );
 			# ... now, rebuild contributor_album
 			$dbh->do( qq{
```

@mherger mherger merged commit c8a820b into LMS-Community:public/8.4 Jan 25, 2024
@mherger
Contributor

mherger commented Jan 25, 2024

Let's give that a try! Thanks a lot! I'll tweak the SQL on my end.

@darrell-k
Contributor Author

INSERT OR IGNORE is a new one on me; I was brought up on DB2 SQL, which has its own set of extensions to ANSI SQL, but not that one.

If you need the unique index to make that work, I guess what "OR IGNORE" is doing is silently suppressing primary key violation errors during the INSERT. I used to write an error handler for that sort of thing!
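A quick sanity check of that behaviour (plain Python/sqlite3 here, not the LMS code): a plain INSERT of a duplicate primary key raises a constraint error, while INSERT OR IGNORE silently skips the conflicting row, which is exactly what the removed NOT EXISTS guard was defending against.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE changed_albums (album INTEGER PRIMARY KEY)")
db.execute("INSERT INTO changed_albums VALUES (1)")

try:
    # A plain duplicate INSERT violates the primary key ...
    db.execute("INSERT INTO changed_albums VALUES (1)")
    duplicate_raised = False
except sqlite3.IntegrityError:
    duplicate_raised = True  # ... the error the NOT EXISTS guard avoided

# ... whereas OR IGNORE drops the conflicting row without an error.
db.execute("INSERT OR IGNORE INTO changed_albums VALUES (1)")  # skipped
db.execute("INSERT OR IGNORE INTO changed_albums VALUES (2)")  # inserted

rows = [r[0] for r in db.execute("SELECT album FROM changed_albums ORDER BY album")]
print(duplicate_raised, rows)
```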

Aren't standards a wonderful thing?

Anyway, thanks! Hopefully this will cut down a little on the "my n&c scan didn't work" questions.
