
run ANALYZE after the initial backup, before validation #4326

Closed
wants to merge 3 commits

Conversation

@ericfs ericfs commented Sep 26, 2020

Based on discussion in https://forum.duplicati.com/t/initial-backup-stuck-at-100/10767/5.

I haven't been able to verify that this actually improves anything, so I'm interested in feedback on the best way to validate that it's worthwhile.

@duplicatibot

This pull request has been mentioned on Duplicati. There might be relevant details there:

https://forum.duplicati.com/t/initial-backup-stuck-at-100/10767/10

@warwickmm
Member

I'm not a SQLite expert, but it seems that the recommendation isn't to explicitly invoke ANALYZE. It's also not clear what statistics are available during the initial backup.

My preference would be to improve the actual queries. There's only so much that the query optimizer could do with a poorly written query.

@ericfs
Author

ericfs commented Sep 26, 2020

That makes sense. I don't think I have enough familiarity to suggest any improvements to the queries themselves.

This is just attempting to address the specific case where the first backup creates a large database and the verification runs before PRAGMA optimize or ANALYZE is called. If you don't think that is worthwhile, I'm fine with dropping this.
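For readers unfamiliar with the mechanism being discussed: a minimal sketch of the idea, in Python with the stdlib sqlite3 module (the table and index names here are illustrative, not Duplicati's actual schema). After a large initial load, SQLite's planner has no statistics for the freshly populated tables; running ANALYZE populates sqlite_stat1 so subsequent queries can be planned against real row counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Illustrative table standing in for a freshly created backup database.
conn.execute('CREATE TABLE "Block" ("ID" INTEGER PRIMARY KEY, "Size" INTEGER)')
conn.execute('CREATE INDEX "BlockSizeIdx" ON "Block" ("Size")')
conn.executemany('INSERT INTO "Block" ("Size") VALUES (?)',
                 [(i % 4096,) for i in range(10_000)])
conn.commit()

# The change proposed in this PR, in essence: gather planner statistics
# once the bulk insert is done, before verification queries run.
# (SQLite 3.18+ also offers "PRAGMA optimize" to do this selectively.)
conn.execute("ANALYZE")

# sqlite_stat1 now holds one row per analyzed index.
stats = conn.execute(
    'SELECT "tbl", "idx", "stat" FROM sqlite_stat1').fetchall()
print(stats)
```

Without the ANALYZE call, sqlite_stat1 does not exist and the planner falls back on built-in default estimates, which is the situation the verification queries hit on a first backup.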

@warwickmm
Member

My feeling is that the queries can be greatly improved. I'm not a SQL expert, but many of them involve lots of subqueries that might be inefficient. The query you referred to in the forum post is actually part of a larger query:

var combinedLengths = @"
SELECT
""A"".""ID"" AS ""BlocksetID"",
IFNULL(""B"".""CalcLen"", 0) AS ""CalcLen"",
""A"".""Length""
FROM
""Blockset"" A
LEFT OUTER JOIN
(
SELECT
""BlocksetEntry"".""BlocksetID"",
SUM(""Block"".""Size"") AS ""CalcLen""
FROM
""BlocksetEntry""
LEFT OUTER JOIN
""Block""
ON
""Block"".""ID"" = ""BlocksetEntry"".""BlockID""
GROUP BY ""BlocksetEntry"".""BlocksetID""
) B
ON
""A"".""ID"" = ""B"".""BlocksetID""
";
// For each blockset with wrong lengths, fetch the file path
var reportDetails = @"SELECT ""CalcLen"", ""Length"", ""A"".""BlocksetID"", ""File"".""Path"" FROM (" + combinedLengths + @") A, ""File"" WHERE ""A"".""BlocksetID"" = ""File"".""BlocksetID"" AND ""A"".""CalcLen"" != ""A"".""Length"" ";

If you don't mind, I'm going to close this. Introducing this call to ANALYZE feels a little more like treating a symptom rather than the underlying cause.

@warwickmm warwickmm closed this Sep 27, 2020
@duplicatibot

This pull request has been mentioned on Duplicati. There might be relevant details there:

https://forum.duplicati.com/t/very-slow-database-recreation/10827/12

@Brunni

Brunni commented Feb 27, 2021

As I have just written in the forum, the ANALYZE command was the only way I could get a backup working; it did not finish otherwise. The SQLite DB implementation seems to be crappy.

I would strongly vote for merging this pull request, @warwickmm. This is not about performance, but about working at all.

https://forum.duplicati.com/t/initial-backup-stuck-at-100/10767/11?u=brunni
