
Add indexes to improve performance #4706

Open · wants to merge 3 commits into master
Conversation

AxeOfMen (Contributor) commented Apr 9, 2022

Add performance index

jedthe3rd (Contributor) commented Apr 9, 2022

Hey, can you validate that adding:
CREATE INDEX "FilesetEntryFilesetID" ON "FilesetEntry" ("FilesetID");
improves performance over the indexes already in Performance Indexes.sql?

This index: CREATE INDEX "nn_FilesetentryFile" on FilesetEntry ("FilesetID","FileID"); should serve the same purpose as your FilesetEntryFilesetID index.

Also, any new database changes need to go in their own subsequent .sql file, so you should add your changes to something like a "13. Additional index.sql". Any existing databases created with the current canary need a new version number so they know to apply the new indexes. As long as a .sql file hasn't been included in a release it's OK to add changes to the same file, but that isn't the case here.
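The versioned-upgrade convention described above can be sketched as follows. This is a hypothetical illustration, not Duplicati's actual upgrader (that logic lives in DatabaseUpgrader.cs and tracks versions its own way): `PRAGMA user_version` and the script contents are stand-ins.

```python
import sqlite3

# Hypothetical sketch of the versioned-upgrade convention: each schema
# change ships as its own numbered script, and only scripts newer than the
# database's stored version are applied. PRAGMA user_version stands in for
# Duplicati's real version tracking.
UPGRADE_SCRIPTS = {
    13: 'CREATE INDEX IF NOT EXISTS "FileLookupMetadataID" '
        'ON "FileLookup" ("MetadataID");',
}

def upgrade(con: sqlite3.Connection) -> int:
    current = con.execute("PRAGMA user_version").fetchone()[0]
    for version in sorted(v for v in UPGRADE_SCRIPTS if v > current):
        con.executescript(UPGRADE_SCRIPTS[version])
        con.execute(f"PRAGMA user_version = {version}")
        current = version
    return current

con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE "FileLookup" ("MetadataID" INTEGER)')
con.execute("PRAGMA user_version = 12")  # pretend this is an existing v12 database
new_version = upgrade(con)              # applies only the version-13 script
```

Because already-applied versions are skipped, a database created fresh at version 13 and one upgraded from 12 end up with the same schema.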

AxeOfMen (Contributor, Author) commented Apr 9, 2022

Thanks for the feedback. I have moved the indexes to a separate file and incremented the db version to 13.

Yes, as mentioned in the comments for Issue 4645, these new indexes are necessary. I figured your nn_FilesetentryFile index would cover the same ground as my FilesetEntryFilesetID index, but for whatever reason, mine is necessary as well. I'm curious whether you see the same thing using the query I posted in 4645 against your 10G db.

AxeOfMen (Contributor, Author) commented Apr 9, 2022

@jedthe3rd Running ANALYZE; after your indexes eliminates the need for my indexes. Should I change my pull request to just do an analyze? It seems a little silly, but otherwise the next analyze won't occur until the end of the next run when the database connection is closed. That could take hours. In my case, I waited for weeks and the query never finished. Certainly worse in your case with a 10G db.
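For context on why ANALYZE makes the difference here: it populates SQLite's sqlite_stat1 table with per-index statistics, which the query planner consults when choosing between otherwise similar indexes. A minimal standalone sketch with toy data (only the table name is borrowed from Duplicati's schema):

```python
import sqlite3

# Standalone illustration: ANALYZE writes per-index statistics into
# sqlite_stat1, which SQLite's query planner consults when picking indexes.
con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE "FilesetEntry" ("FilesetID" INTEGER, "FileID" INTEGER)')
con.execute('CREATE INDEX "nn_FilesetentryFile" ON "FilesetEntry" ("FilesetID", "FileID")')
con.executemany('INSERT INTO "FilesetEntry" VALUES (?, ?)',
                [(i % 10, i) for i in range(1000)])

# Before ANALYZE the statistics table does not even exist.
before = con.execute(
    "SELECT count(*) FROM sqlite_master WHERE name = 'sqlite_stat1'").fetchone()[0]

con.execute("ANALYZE")

# Afterwards there is one statistics row per index.
stats = con.execute(
    "SELECT idx, stat FROM sqlite_stat1 WHERE tbl = ?",
    ("FilesetEntry",)).fetchall()
```

Until ANALYZE runs, a newly created index has no statistics at all, which is consistent with the planner ignoring it until the connection's next analyze.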

jedthe3rd (Contributor) commented Apr 9, 2022

> @jedthe3rd Running ANALYZE; after your indexes eliminates the need for my indexes. Should I change my pull request to just do an analyze? It seems a little silly, but otherwise the next analyze won't occur until the end of the next run when the database connection is closed. That could take hours. In my case, I waited for weeks and the query never finished. Certainly worse in your case with a 10G db.

This is what I was thinking was the case. Ya, I definitely think we should run analyze before each backup. I actually made this change in my code but haven't gotten around to creating the pull request because I am working on a few changes I wanted to batch.

Just change this request to add analyze so it runs before each backup, and that should be good. It should have a minimal time impact overall. Can you also see if you can update the backup status to show "optimizing the database" as a step? I think that would be a good addition to this change. If it seems like a bunch of work, then we can leave that for a different time.

AxeOfMen (Contributor, Author) commented Apr 9, 2022

I'm not in love with the idea of analyzing before each run. It's not best practice and could set a bad precedent for others looking to do something similar. While analyzing before each run may not be excessive overhead, ideally the analyze would occur immediately after adding the index. But since your PR has already been released to canary, I think that ship has sailed. As goofy as it looks, I think adding version 13 to do an analyze is the most viable option. Hmm.

jedthe3rd (Contributor)

> I'm not in love with the idea of analyzing before each run. It's not best practice and could set a bad precedent for others looking to do something similar. While analyzing before each run may not be excessive overhead, ideally the analyze would occur immediately after adding the index. But since your PR has already been released to canary, I think that ship has sailed. As goofy as it looks, I think adding version 13 to do an analyze is the most viable option. Hmm.

Ya, I didn't think that thought all the way through. Actually, the best bet is probably to add an analyze after a database is upgraded here: https://github.com/duplicati/duplicati/blob/master/Duplicati/Library/SQLiteHelper/DatabaseUpgrader.cs
Then, once we release the next batch of changes, an analyze becomes part of the upgrade process. I have a number of SQL changes I need to finish testing, so we can have the next db version here fairly soon.

kenkendk (Member)

@AxeOfMen & @jedthe3rd Do you want to include the changes here, or did you have a better fix in mind?

jedthe3rd (Contributor)

> @AxeOfMen & @jedthe3rd Do you want to include the changes here, or did you have a better fix in mind?

These indexes are basically duplicates. The real solution to the issue he is facing is to add an analyze when the database is upgraded.

mr-russ (Contributor) left a comment

Is this still relevant in the same way? The comments on the pull request seem to indicate that adding analyze on upgrade would be useful. That is true; should that be part of this change?

Also note that vacuum appears to be called during operation startup as well. I can't recall without looking it up whether that does any analysis work.

@@ -82,6 +82,7 @@ CREATE TABLE "FilesetEntry" (
/* Improved reverse lookup for joining Fileset and File table */
CREATE INDEX "FilesetentryFileIdIndex" on "FilesetEntry" ("FileID");
CREATE INDEX "nn_FilesetentryFile" on FilesetEntry ("FilesetID","FileID");
CREATE INDEX "FilesetEntryFilesetID" ON "FilesetEntry" ("FilesetID")

The line above has an index on the same leading column. The cost of maintaining this index will be higher than using the original one, as it will skip-scan.

Is there a specific reason for this index, or can it be removed?
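The reviewer's point can be checked directly with EXPLAIN QUERY PLAN: because "FilesetID" is the leading column of the composite index, a query filtering on "FilesetID" alone can already use it. A toy reproduction (empty table, not the real schema or data):

```python
import sqlite3

# Toy check of the review comment: a filter on the leading column of a
# composite index can use that index, so a separate single-column index on
# "FilesetID" mostly duplicates it while still costing maintenance time.
con = sqlite3.connect(":memory:")
con.execute('CREATE TABLE "FilesetEntry" ("FilesetID" INTEGER, "FileID" INTEGER)')
con.execute('CREATE INDEX "nn_FilesetentryFile" ON "FilesetEntry" ("FilesetID", "FileID")')

plan = con.execute(
    'EXPLAIN QUERY PLAN SELECT * FROM "FilesetEntry" WHERE "FilesetID" = 1'
).fetchall()
detail = plan[0][-1]  # plan text names the index chosen for the search
```

The plan's detail string names nn_FilesetentryFile even though the query never mentions "FileID", which is why the extra single-column index buys little once statistics are in place.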

@@ -0,0 +1,4 @@
CREATE INDEX "FileLookupMetadataID" ON "FileLookup" ("MetadataID");

I think there is value in adding SQL comments indicating the operations this is known to improve, so others working on this later can see the reason for the addition.

duplicatibot

This pull request has been mentioned on Duplicati. There might be relevant details there:

https://forum.duplicati.com/t/first-backup-has-been-running-for-over-a-day-since-upgrade-from-2-0-6-1-to-2-0-7-1/16382/6

5 participants