New and changed contributor scanning improvements #979
Conversation
Did you notice any performance hit? I fear re-running those full queries might be quite slow compared to some individual updates?
I would hope it will improve performance, because it now does the contributor_album update in one hit on the database rather than incrementally: the Contributor->rescan from which I have removed the contributor_album manipulation is called once for every album/track/contributor, most of the time probably changing nothing after running its queries. Obviously, this change adds the slight overhead of creating the temporary changed_albums table. I've made a small change to avoid this when there are no changed tracks (i.e. in a full rescan). Will push it shortly. I'll also provide some metrics on performance.
I guess it really depends on how many changes you make, and how large your library is. If you have 50k tracks and changed 10k of them, then a mass update should be faster. But what if you only changed one album out of 50k albums?
We've got an index on tracks(url), so the temporary table creation should be fine. The join to tracks on url is already being done for the existing "changed" table; presumably that is one reason why the index exists. I could add an album column to the existing "changed" table rather than creating a separate "changed_albums" table, but that would make the SELECT DISTINCT in the insertion work a bit harder, which might be noticeable with a large set of changed tracks, so I don't know that it would be worth it. Anyway, as I said above, metrics incoming.
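(Editorial illustration, not part of the original thread: whether the tracks(url) index is actually used for this join can be checked with SQLite's EXPLAIN QUERY PLAN. A minimal standalone sketch, with table definitions simplified from the real schema purely for demonstration:)

```python
import sqlite3

# In-memory database with cut-down versions of the two tables involved.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE tracks (id INTEGER PRIMARY KEY, url TEXT, album INTEGER);
    CREATE INDEX tracks_url ON tracks (url);
    CREATE TABLE changed (url TEXT);
""")

# Ask the planner how it would execute the join used to build changed_albums.
plan = con.execute("""
    EXPLAIN QUERY PLAN
    SELECT DISTINCT(tracks.album) AS album
    FROM changed
    JOIN tracks ON changed.url = tracks.url
""").fetchall()

for row in plan:
    print(row)
```

On a typical SQLite build the plan shows a scan of changed plus a search of tracks via the url index, rather than a full scan of tracks for every changed row.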
The results are in.

Library size: 6508 albums, 86853 tracks

Old:
New:

My conclusion is that there is no significant difference in performance. The test was somewhat biased towards the old method, because there were no actual tag changes in the files, meaning the old method would not have made any changes to the contributor_album table, whereas the new method rebuilds it regardless. This is on a (very) cheap Pentium mini PC with 8 GB RAM running Debian, with the music library on a USB-attached mechanical disk, which may account for my relatively prolonged file/folder discovery times: basically all of the 8 seconds shown above!
What's the feeling on this now? I hit the bug again today while fiddling with my tags to help out a user on the forum. I think it would be a nice bullet point for the 8.4 release to improve the n&c scan, which I think will be appreciated. I think my performance test above is quite robust, and Frank on the forum has been running it for a few days now, with no reported slowdown doing n&c scans on his very large collection.
```perl
$dbh->do( qq{
	INSERT INTO changed_albums
		SELECT DISTINCT(tracks.album) AS album
		FROM changed
		JOIN tracks ON changed.url = tracks.url
		WHERE NOT EXISTS (SELECT * FROM changed_albums WHERE changed_albums.album = tracks.album)
} );
```
Wouldn't it be simpler/good enough to only populate this table here, rather than on line 314 and here? And would `INSERT OR IGNORE` be slower than the lookup on the table itself?
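(Editorial illustration, not part of the original thread: `INSERT OR IGNORE` only suppresses rows that would violate a uniqueness constraint, so it can replace the `NOT EXISTS` guard only once changed_albums has a primary key or unique index on album, which is exactly what the revised diff below adds. A standalone SQLite sketch of the difference:)

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Without a uniqueness constraint, INSERT OR IGNORE has nothing to
# ignore, and duplicates slip through.
con.execute("CREATE TABLE no_key (album INTEGER)")
con.execute("INSERT OR IGNORE INTO no_key VALUES (1)")
con.execute("INSERT OR IGNORE INTO no_key VALUES (1)")
print(con.execute("SELECT COUNT(*) FROM no_key").fetchone()[0])  # 2

# With album as PRIMARY KEY, the second insert is silently skipped,
# matching what the WHERE NOT EXISTS guard achieved.
con.execute("CREATE TABLE with_key (album INTEGER PRIMARY KEY)")
con.execute("INSERT OR IGNORE INTO with_key VALUES (1)")
con.execute("INSERT OR IGNORE INTO with_key VALUES (1)")
print(con.execute("SELECT COUNT(*) FROM with_key").fetchone()[0])  # 1
```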
Oh, I missed that there's not only the `deleteTracks()` call, but also `updateTracks()` in between them.
Ok, the `INSERT OR IGNORE` would require some more tweaking of the code. But that looks "cleaner" to me:

```diff
diff --git a/Slim/Utils/Scanner/Local.pm b/Slim/Utils/Scanner/Local.pm
index 96b596f2e..2a290cf70 100644
--- a/Slim/Utils/Scanner/Local.pm
+++ b/Slim/Utils/Scanner/Local.pm
@@ -310,8 +310,9 @@ sub rescan {
 	if ( $changedOnlyCount ) {
 		$log->error("Build temporary table for changed albums") unless main::SCANNER && $main::progress;
 		$dbh->do('DROP TABLE IF EXISTS changed_albums');
+		$dbh->do("CREATE $createTemporary TABLE changed_albums (album INTEGER PRIMARY KEY UNIQUE)");
 		$dbh->do( qq{
-			CREATE $createTemporary TABLE changed_albums AS
+			INSERT INTO changed_albums
 			SELECT DISTINCT(tracks.album) AS album
 			FROM changed
 			JOIN tracks ON changed.url = tracks.url
@@ -332,11 +333,10 @@ sub rescan {
 		# ... first, now that tracks has been updated, add albums referenced now which weren't referenced before (in case the user has moved a track to a different album)
 		$log->error("Adding to temporary table for changed albums") unless main::SCANNER && $main::progress;
 		$dbh->do( qq{
-			INSERT INTO changed_albums
+			INSERT OR IGNORE INTO changed_albums
 			SELECT DISTINCT(tracks.album) AS album
 			FROM changed
 			JOIN tracks ON changed.url = tracks.url
-			WHERE NOT EXISTS (SELECT * FROM changed_albums WHERE changed_albums.album = tracks.album)
 		} );
 		# ... now, rebuild contributor_album
 		$dbh->do( qq{
```

Let's give that a try! Thanks a lot! I'll tweak the SQL on my end.
`INSERT OR IGNORE` is a new one on me; I was brought up on DB2 SQL, which has its own set of extensions to ANSI SQL, but not that one. If you need the unique index to make that work, I guess what "OR IGNORE" is doing is silently suppressing primary-key violation errors during the INSERT. I used to write an error handler for that sort of thing! Aren't standards a wonderful thing? Anyway, thanks; hopefully this will cut down a little bit on "my n&c scan didn't work" questions.
Changed strategy for keeping the contributor_album table in line through a "New & Changed" scan.
(It turned out that my change in this area just before Xmas introduced a bug as well as fixing one!)
Rather than continue complicating the rescan routine in Contributors.pm, I've introduced code in Utils/Scanner/Local.pm, based on the commented-out code in schema_optimize.sql, but running it only for albums that have changed.
I've also changed the update to the albums table to prioritise ALBUMARTIST if there is one. This already happens when creating a new album, but ALBUMARTIST was being ignored in the update, which just seemed wrong.
I've tested it, but it really needs independent testing and scrutiny.