
run ANALYZE after the initial backup, before validation #4326

Closed
wants to merge 3 commits

Conversation

@ericfs ericfs commented Sep 26, 2020

Based on discussion in https://forum.duplicati.com/t/initial-backup-stuck-at-100/10767/5.

I haven't been able to verify that this actually improves anything, so I'm interested in feedback on the best way to validate that it's worthwhile.

@duplicatibot

This pull request has been mentioned on Duplicati. There might be relevant details there:

https://forum.duplicati.com/t/initial-backup-stuck-at-100/10767/10

@warwickmm
Member

I'm not a SQLite expert, but it seems that the recommendation isn't to explicitly invoke ANALYZE. It's also not clear what statistics are available during the initial backup.

My preference would be to improve the actual queries. There's only so much that the query optimizer could do with a poorly written query.

@ericfs
Author

ericfs commented Sep 26, 2020

That makes sense. I don't think I have enough familiarity to suggest any improvements to the queries themselves.

This is just attempting to address the specific case where the first backup creates a large database and the verification runs before PRAGMA optimize or ANALYZE is called. If you don't think that is worthwhile, I'm fine with dropping this.
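For readers unfamiliar with the mechanism being discussed: a minimal sketch of the idea, in Python with the stdlib sqlite3 module (the table and index names here are illustrative, not Duplicati's actual schema). After a large initial load, SQLite's planner has no statistics for the freshly populated tables; running ANALYZE populates sqlite_stat1 so subsequent queries can be planned against real row counts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Illustrative table standing in for a freshly created backup database.
conn.execute('CREATE TABLE "Block" ("ID" INTEGER PRIMARY KEY, "Size" INTEGER)')
conn.execute('CREATE INDEX "BlockSizeIdx" ON "Block" ("Size")')
conn.executemany('INSERT INTO "Block" ("Size") VALUES (?)',
                 [(i % 4096,) for i in range(10_000)])
conn.commit()

# The change proposed in this PR, in essence: gather planner statistics
# once the bulk insert is done, before verification queries run.
# (SQLite 3.18+ also offers "PRAGMA optimize" to do this selectively.)
conn.execute("ANALYZE")

# sqlite_stat1 now holds one row per analyzed index.
stats = conn.execute(
    'SELECT "tbl", "idx", "stat" FROM sqlite_stat1').fetchall()
print(stats)
```

Without the ANALYZE call, sqlite_stat1 does not exist and the planner falls back on built-in default estimates, which is the situation the verification queries hit on a first backup.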

@warwickmm
Member

My feeling is that the queries can be greatly improved. I'm not a SQL expert, but many of them involve lots of subqueries that might be inefficient. The query you referred to in the forum post is actually part of a larger query:

var combinedLengths = @"
SELECT
""A"".""ID"" AS ""BlocksetID"",
IFNULL(""B"".""CalcLen"", 0) AS ""CalcLen"",
""A"".""Length""
FROM
""Blockset"" A
LEFT OUTER JOIN
(
SELECT
""BlocksetEntry"".""BlocksetID"",
SUM(""Block"".""Size"") AS ""CalcLen""
FROM
""BlocksetEntry""
LEFT OUTER JOIN
""Block""
ON
""Block"".""ID"" = ""BlocksetEntry"".""BlockID""
GROUP BY ""BlocksetEntry"".""BlocksetID""
) B
ON
""A"".""ID"" = ""B"".""BlocksetID""
";
// For each blockset with wrong lengths, fetch the file path
var reportDetails = @"SELECT ""CalcLen"", ""Length"", ""A"".""BlocksetID"", ""File"".""Path"" FROM (" + combinedLengths + @") A, ""File"" WHERE ""A"".""BlocksetID"" = ""File"".""BlocksetID"" AND ""A"".""CalcLen"" != ""A"".""Length"" ";

If you don't mind, I'm going to close this. Introducing this call to ANALYZE feels a little more like treating a symptom rather than the underlying cause.

@warwickmm warwickmm closed this Sep 27, 2020
@duplicatibot

This pull request has been mentioned on Duplicati. There might be relevant details there:

https://forum.duplicati.com/t/very-slow-database-recreation/10827/12

@Brunni

Brunni commented Feb 27, 2021

As I have just written in the forum, the ANALYZE command was the only way I could get a backup working; it did not finish otherwise. The SQLite DB implementation seems to be crappy.

I would strongly vote for merging this pull request, @warwickmm. This is not about performance, but about working at all.

https://forum.duplicati.com/t/initial-backup-stuck-at-100/10767/11?u=brunni
