
Add indexes to improve performance #4706

Open · wants to merge 3 commits into master
Conversation

AxeOfMen (Contributor) commented Apr 9, 2022

Add performance index

jedthe3rd (Contributor) commented Apr 9, 2022

Hey, can you validate that adding:
CREATE INDEX "FilesetEntryFilesetID" ON "FilesetEntry" ("FilesetID");
improves performance over the indexes already in Performance Indexes.sql?

This index: CREATE INDEX "nn_FilesetentryFile" on FilesetEntry ("FilesetID","FileID"); should serve the same purpose as your FilesetEntryFilesetID index.

Also, any new database changes need to go in their own subsequent .sql file, so you should add your changes to something like a "13. Additional index.sql". Any existing databases created with the current canary need a new version number so they know to apply the new indexes. As long as a .sql file hasn't been included in a release it's OK to add changes to the same file, but that isn't the case here.
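The versioned-upgrade convention described above can be sketched as follows. This is a hypothetical illustration, not Duplicati's actual upgrader (that logic lives in DatabaseUpgrader.cs and tracks versions its own way): `PRAGMA user_version` and the script contents are stand-ins.

```python
import sqlite3

# Hypothetical sketch of the versioned-upgrade convention: each schema
# change ships as its own numbered script, and only scripts newer than the
# database's stored version are applied. PRAGMA user_version stands in for
# Duplicati's real version tracking.
UPGRADE_SCRIPTS = {
    13: 'CREATE INDEX IF NOT EXISTS "FileLookupMetadataID" '
        'ON "FileLookup" ("MetadataID");',
}

def upgrade(con: sqlite3.Connection) -> int:
    current = con.execute("PRAGMA user_version").fetchone()[0]
    for version in sorted(v for v in UPGRADE_SCRIPTS if v > current):
        con.executescript(UPGRADE_SCRIPTS[version])
        con.execute(f"PRAGMA user_version = {version}")
        current = version
    return current

con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE "FileLookup" ("MetadataID" INTEGER)')
con.execute("PRAGMA user_version = 12")  # pretend this is an existing v12 database
new_version = upgrade(con)              # applies only the version-13 script
```

Because already-applied versions are skipped, a database created fresh at version 13 and one upgraded from 12 end up with the same schema.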

AxeOfMen (Contributor, Author) commented Apr 9, 2022

Thanks for the feedback. I have moved the indexes to a separate file and incremented the db version to 13.

Yes, as mentioned in the comments for Issue 4645, these new indexes are necessary. I figured your nn_FilesetentryFile index would cover the same ground as my FilesetEntryFilesetID index, but for whatever reason, mine is necessary as well. I'm curious whether you see the same thing using the query I posted in 4645 against your 10G db.

AxeOfMen (Contributor, Author) commented Apr 9, 2022

@jedthe3rd Running ANALYZE; after your indexes eliminates the need for my indexes. Should I change my pull request to just do an analyze? It seems a little silly, but otherwise the next analyze won't occur until the end of the next run when the database connection is closed. That could take hours. In my case, I waited for weeks and the query never finished. Certainly worse in your case with a 10G db.
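For context on why ANALYZE makes the difference here: it populates SQLite's sqlite_stat1 table with per-index statistics, which the query planner consults when choosing between otherwise similar indexes. A minimal standalone sketch with toy data (only the table name is borrowed from Duplicati's schema):

```python
import sqlite3

# Standalone illustration: ANALYZE writes per-index statistics into
# sqlite_stat1, which SQLite's query planner consults when picking indexes.
con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE "FilesetEntry" ("FilesetID" INTEGER, "FileID" INTEGER)')
con.execute('CREATE INDEX "nn_FilesetentryFile" ON "FilesetEntry" ("FilesetID", "FileID")')
con.executemany('INSERT INTO "FilesetEntry" VALUES (?, ?)',
                [(i % 10, i) for i in range(1000)])

# Before ANALYZE the statistics table does not even exist.
before = con.execute(
    "SELECT count(*) FROM sqlite_master WHERE name = 'sqlite_stat1'").fetchone()[0]

con.execute("ANALYZE")

# Afterwards there is one statistics row per index.
stats = con.execute(
    "SELECT idx, stat FROM sqlite_stat1 WHERE tbl = ?",
    ("FilesetEntry",)).fetchall()
```

Until ANALYZE runs, a newly created index has no statistics at all, which is consistent with the planner ignoring it until the connection's next analyze.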

jedthe3rd (Contributor) commented Apr 9, 2022

> @jedthe3rd Running ANALYZE; after your indexes eliminates the need for my indexes. Should I change my pull request to just do an analyze? It seems a little silly, but otherwise the next analyze won't occur until the end of the next run when the database connection is closed. That could take hours. In my case, I waited for weeks and the query never finished. Certainly worse in your case with a 10G db.

This is what I was thinking was the case. Ya, I definitely think we should run analyze before each backup. I actually made this change in my code but haven't gotten around to creating the pull request because I am working on a few changes I wanted to batch.

Just change this request to add analyze so it runs before each backup, and that should be good. It should have a minimal time impact overall. Can you also see if you can update the backup status to show "optimizing the database" as a step? I think that would be a good addition to this change. If it seems like a bunch of work, then we can leave that for a different time.

AxeOfMen (Contributor, Author) commented Apr 9, 2022

I'm not in love with the idea of analyzing before each run. It's not best practice and could set a bad precedent for others looking to do something similar. While analyzing before each run may not be excessive overhead, ideally the analyze would occur immediately after adding the index. But since your PR has already been released to canary, I think that ship has sailed. As goofy as it looks, I think adding version 13 to do an analyze is the most viable option. Hmm.

jedthe3rd (Contributor)

> I'm not in love with the idea of analyzing before each run. It's not best practice and could set a bad precedent for others looking to do something similar. While analyzing before each run may not be excessive overhead, ideally the analyze would occur immediately after adding the index. But since your PR has already been released to canary, I think that ship has sailed. As goofy as it looks, I think adding version 13 to do an analyze is the most viable option. Hmm.

Ya, I didn't think that thought all the way through. Actually, the best bet is probably to add an analyze after a database is upgraded here: https://github.com/duplicati/duplicati/blob/master/Duplicati/Library/SQLiteHelper/DatabaseUpgrader.cs
Then, once we release the next batch of changes, an analyze becomes part of the upgrade process. I have a number of SQL changes I need to finish testing, so we can have the next db version here fairly soon.

kenkendk (Member)

@AxeOfMen & @jedthe3rd Do you want to include the changes here, or did you have a better fix in mind?

jedthe3rd (Contributor)

> @AxeOfMen & @jedthe3rd Do you want to include the changes here, or did you have a better fix in mind?

These indexes are basically duplicates. The real solution to the issue he is facing is to add an analyze when the database is upgraded.

mr-russ (Contributor) left a comment

Is this still relevant in the same way? The comments on the pull request seem to indicate that adding analyze on upgrade would be useful. That is true; should that be part of this change?

Also note that vacuum appears to be called during operation startup as well. I can't recall without looking it up whether that does any analysis work.

@@ -82,6 +82,7 @@ CREATE TABLE "FilesetEntry" (
/* Improved reverse lookup for joining Fileset and File table */
CREATE INDEX "FilesetentryFileIdIndex" on "FilesetEntry" ("FileID");
CREATE INDEX "nn_FilesetentryFile" on FilesetEntry ("FilesetID","FileID");
CREATE INDEX "FilesetEntryFilesetID" ON "FilesetEntry" ("FilesetID")

The line above has an index on the same leading column. The cost of maintaining this index will be higher than using the original one, as it will skip-scan.

Is there a specific reason for this index, or can it be removed?
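The reviewer's point can be checked directly with EXPLAIN QUERY PLAN: because "FilesetID" is the leading column of the composite index, a query filtering on "FilesetID" alone can already use it. A toy reproduction (empty table, not the real schema or data):

```python
import sqlite3

# Toy check of the review comment: a filter on the leading column of a
# composite index can use that index, so a separate single-column index on
# "FilesetID" mostly duplicates it while still costing maintenance time.
con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE "FilesetEntry" ("FilesetID" INTEGER, "FileID" INTEGER)')
con.execute('CREATE INDEX "nn_FilesetentryFile" ON "FilesetEntry" ("FilesetID", "FileID")')

plan = con.execute(
    'EXPLAIN QUERY PLAN SELECT * FROM "FilesetEntry" WHERE "FilesetID" = 1'
).fetchall()
detail = plan[0][-1]  # plan text names the index chosen for the search
```

The plan's detail string names nn_FilesetentryFile even though the query never mentions "FileID", which is why the extra single-column index buys little once statistics are in place.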

@@ -0,0 +1,4 @@
CREATE INDEX "FileLookupMetadataID" ON "FileLookup" ("MetadataID");

I think there is value in adding SQL comments indicating the operations this is known to improve, so others working on this later can see the reason for the addition.

duplicatibot

This pull request has been mentioned on Duplicati. There might be relevant details there:

https://forum.duplicati.com/t/first-backup-has-been-running-for-over-a-day-since-upgrade-from-2-0-6-1-to-2-0-7-1/16382/6

5 participants