
"Stop after current file" has various problems with partial backups #3982

Closed
1 task done
ts678 opened this issue Nov 10, 2019 · 39 comments
Comments

@ts678
Collaborator

ts678 commented Nov 10, 2019

  • I have searched open and closed issues for duplicates.

Environment info

  • Duplicati version: 2.0.4.34 canary
  • Operating system: Windows 10 Version 1903
  • Backend: Local folder

Description

Various issues around partial backup. Rather than file separate issues now, this is a collection of issues.

"Stop after current file" with --usn-policy may never back up remaining files #3971 was separately filed, and possibly has a different cause than the ones here, which center on how backups are now split into full and partial, with record-keeping in the database and backup, and attempts to make things like retention work right by hiding partial backups after a full exists. This breaks things because some code goes by the view after hiding, some goes by the raw view, and translating between the two may break.

Fix 'stop after current file' #3836 has technical info. This issue is more of a hopefully-reproducible tour.

  • Partial backup produces a Warning result without indication of why. Unknown if this is intentional.
  • Partial backup Repair regenerates missing dlist file without a fileset file to show "partial" status.
  • Partial backup marking is on all Direct restore versions, whether that version is really full or partial.
  • Hidden partial backups hold backend storage forever.
  • Hidden partial backups cannot be referenced by user.
  • Hidden partial backups might not be reachable by Duplicati, unless direct queries to DB are used.
  • Hidden partial backups break Duplicati when direct DB queries are translated to version for users.
    One is probably in the number translator for "Unexpected difference in fileset" errors (not tested).
  • There's no telling how much other code is now (or may someday be) broken due to the dual map. Having only one way to map Fileset ID to its version (and vice versa) just didn't have this problem.

Steps to reproduce

  1. Make a C:\stop test folder holding two Duplicati executable zip files. Mine are about 14 MB each.
  2. Slow things down with Remote Volume Size at 1 MB, and --throttle-download and --throttle-upload at 100 KByte/s
  3. Set Backup retention to keep 1 backup.
  4. To spot when to Stop, and see other info, open a tab to About --> Show log --> Live --> Verbose
  5. "Run now" Backup, go to live log tab, wait for uploading to begin, then do "Stop after current file"
  6. "Run now" Backup. There will be a third backup later, but partial-then-full is a good starting point.
  • Look at the job log icon and full log for the first backup which was the partial.
  • Actual result:
    Icon is a yellow exclamation point (Warning I think), and log has one "ParsedResult": "Warning",
  • Expected result:
    Not sure which I prefer, but if it's going to do a Warning, there should be some way to know why.
  • Test Commandline purge with arguments set to some file -- it doesn't seem to make a difference.
  • Actual result:
  Listing remote folder ...

System.InvalidProgramException: Fileset was reported with id 1, but could not be found?
   at Duplicati.Library.Main.Operation.PurgeFilesHandler.DoRun(LocalPurgeDatabase db, IFilter filter, Action`3 filtercommand, Single pgoffset, Single pgspan)
   at Duplicati.Library.Main.Operation.PurgeFilesHandler.Run(IFilter filter)
   at Duplicati.Library.Main.Controller.RunAction[T](T result, String[]& paths, IFilter& filter, Action`1 method)
   at Duplicati.Library.Main.Controller.PurgeFiles(IFilter filter)
   at Duplicati.CommandLine.Commands.PurgeFiles(TextWriter outwriter, Action`1 setup, List`1 args, Dictionary`2 options, IFilter filter)
   at Duplicati.CommandLine.Program.ParseCommandLine(TextWriter outwriter, Action`1 setup, Boolean& verboseErrors, String[] args)
   at Duplicati.CommandLine.Program.RunCommandLine(TextWriter outwriter, TextWriter errwriter, Action`1 setup, String[] args)
Return code: 100
  • Expected result:
    Not certain, but the id 1 would be the first partial backup which is now hidden and not user-visible.
    Keeping an eye on the DB (especially the Fileset table) using an SQLite DB browser could be useful.
  • Test Commandline affected with arguments set to file name of oldest dblock file, i.e. from partial.
  • Actual result:
System.Collections.Generic.KeyNotFoundException: The given key was not present in the dictionary.
   at System.ThrowHelper.ThrowKeyNotFoundException()
   at System.Collections.Generic.Dictionary`2.get_Item(TKey key)
   at Duplicati.Library.Main.Database.LocalListAffectedDatabase.<GetFilesets>d__5.MoveNext()
   at System.Linq.Buffer`1..ctor(IEnumerable`1 source)
   at System.Linq.OrderedEnumerable`1.<GetEnumerator>d__1.MoveNext()
   at System.Linq.Enumerable.Any[TSource](IEnumerable`1 source)
   at Duplicati.CommandLine.Commands.<>c__DisplayClass3_0.<Affected>b__0(IListAffectedResults res)
   at Duplicati.Library.Main.Operation.ListAffected.Run(List`1 args, Action`1 callback)
   at Duplicati.Library.Main.Controller.RunAction[T](T result, String[]& paths, IFilter& filter, Action`1 method)
   at Duplicati.Library.Main.Controller.ListAffected(List`1 args, Action`1 callback)
   at Duplicati.CommandLine.Commands.Affected(TextWriter outwriter, Action`1 setup, List`1 args, Dictionary`2 options, IFilter filter)
   at Duplicati.CommandLine.Program.ParseCommandLine(TextWriter outwriter, Action`1 setup, Boolean& verboseErrors, String[] args)
   at Duplicati.CommandLine.Program.RunCommandLine(TextWriter outwriter, TextWriter errwriter, Action`1 setup, String[] args)
Return code: 100
  • Expected result:
    Proper report.
  • Test Commandline delete with arguments --version=1
  • Actual result:
  Listing remote folder ...
No filesets matched the criteria
Return code: 0
  • Expected result:
    Maybe actual was expected. --version=1 would be initial partial which exists but you can't get to it.
  • Test Commandline find with arguments * and --all-versions to see if "all versions" does just that.
  • Actual result:
Listing contents 0 (11/9/2019 7:05:30 PM):
C:\stop test\ 
C:\stop test\duplicati-2.0.4.12_canary_2019-01-16.zip (14.44 MB)
C:\stop test\duplicati-2.0.4.13_canary_2019-01-29.zip (14.44 MB)
Return code: 0
  • Expected result:
    Per the current manual, "Searches in all backup sets, instead of just searching the latest." No longer.
  • Do the third backup after replacing the two zip files with a short .txt file. The one I used was 1 byte.
  • Actual result:
Source:1 bytes
Backup:14.49 MB / 1 Version
  • Expected result:
    It should have released the space from the first partial, but that still remains and is hanging onto it.
    The Duplicati user can't get to these to delete them, and it appears Duplicati retention can't either.
    As additional backups happen and replace old blocks, these hidden partials become a storage leak.
  • Start down a Direct restore from the backup folder, to see what versions it shows you.
  • Actual result:
    Version 0 is the current full with 1 byte file, but is mis-called partial. Version 1 is the original partial.
  • Expected result:
    Correct label of partial. Direct restore says that everything is partial, whether it actually is, or is not.
  • Look inside both dlist zip files to see that the fileset file is there, delete the dlists, then run a Database Repair.
  • Actual result:
    dlist files are regenerated without fileset file.
  • Expected result:
    dlist files are regenerated with correct fileset file with correct IsFullBackup variable in its JSON.
  • Run Database Recreate (delete and Repair), then look to see what Restore files will offer you.
  • Actual result:
    Version 0 is the 1 byte file full backup. Version 1 is the original partial. Neither one is marked partial.
  • Expected result:
    This test used the bad regenerated dlist files. Good ones work OK. Unhiding partials may be useful, should one desperately need to get to one, with no other way (but it's sure a strange workaround).
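For following along with the steps above, the Fileset table mentioned earlier can be inspected with a short script instead of a DB browser. A minimal sketch, assuming the column names ID, Timestamp, and IsFullBackup seen in this era of the schema (verify with `.schema Fileset` against your own database first; the path shown is a placeholder):

```python
import sqlite3

def list_filesets(db_path):
    """Print every Fileset row so hidden partials are visible,
    regardless of how the UI numbers versions."""
    con = sqlite3.connect(db_path)
    try:
        rows = con.execute(
            "SELECT ID, Timestamp, IsFullBackup FROM Fileset ORDER BY Timestamp"
        ).fetchall()
    finally:
        con.close()
    for fid, ts, is_full in rows:
        print(f"Fileset {fid}: timestamp={ts} {'full' if is_full else 'PARTIAL'}")
    return rows

# Placeholder path; find the real one under the job's Database screen.
# list_filesets(r"C:\Users\me\AppData\Local\Duplicati\XXXXXXXXXX.sqlite")
```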


warwickmm added a commit to warwickmm/duplicati that referenced this issue Nov 10, 2019
Previously, a yellow warning icon would be displayed but the logs did
not contain any details regarding the cause of the warning.

This concerns issue duplicati#3982.
@warwickmm
Member

Pull request #3984 should address the issue with the empty warning that occurs when a backup is cancelled.

@warwickmm
Member

When I have some time, I'll try and look into addressing some of these issues. @ts678, if I submit a pull request with the work in progress, would you be able to help test?

@ts678
Collaborator Author

ts678 commented Nov 19, 2019

Sorry I didn't see this post earlier. So your preference is to try to fix the current code?

if I submit a pull request with the work in progress, would you be able to help test?

Not readily, as build procedure and hardware/software/environment challenges exist.
I can test builds, but not from a pull request. Is there a way to get builds from the CI?
Last time I looked into this, it seemed possible to some extent, but I forget the detail.
Both Travis CI and AppVeyor have some methods to publish build artifacts somehow.

There are various options, including Fix pausing and stopping after upload #3712 PR.
@seantempleton was a bit involved in the current code, and I think did some testing.
The current PR also removes "Stop now", which is still at least something of a liability.
Depending on current interest, it would be great to have another developer involved.

As you may know, I really want to get this fixed and a Beta out with or without Stop...
Thanks so much for working on it, and I hope you can get some support if it's helpful.

@warwickmm
Member

So your preference is to try to fix the current code?

I think my preference would be for a beta to ship sooner than later. If we are able to include fixes for these issues, then great. If shipping a beta requires that we hide all stop buttons so that we have more time to test, that's fine with me as well.

I have a branch where I think the only code that really heeds the partial/full status is the delete handler and database recreate handler. However, much more testing is needed.

@drwtsn32x
Contributor

drwtsn32x commented Nov 20, 2019

@warwickmm I can help test a PR. This issue has a good recipe documented to reproduce the issue, so hopefully it won't be too difficult.

warwickmm added a commit to warwickmm/duplicati that referenced this issue Nov 20, 2019
warwickmm added a commit to warwickmm/duplicati that referenced this issue Nov 20, 2019
warwickmm added a commit to warwickmm/duplicati that referenced this issue Nov 20, 2019
warwickmm added a commit to warwickmm/duplicati that referenced this issue Nov 20, 2019
warwickmm added a commit to warwickmm/duplicati that referenced this issue Nov 20, 2019
If surrounded by full backups, partial backups are assumed to be no
longer necessary.

This concerns issue duplicati#3982.
warwickmm added a commit to warwickmm/duplicati that referenced this issue Nov 20, 2019
If surrounded by full backups, partial backups are assumed to be no
longer necessary.

This concerns issue duplicati#3982.
@warwickmm
Member

Thanks @drwtsn32x and @ts678. Pull request #3992 contains the current work in progress.

@drwtsn32x
Contributor

Ok great, first step for me is to reproduce the problem using @ts678's recipe without the PR changes applied...

@kenkendk
Member

I think my preference would be for a beta to ship sooner than later. If we are able to include fixes for these issues, then great. If shipping a beta requires that we hide all stop buttons so that we have more time to test, that's fine with me as well.

I agree on "sooner than later", but hiding the button just means that the issue will be rarer. You can still provoke the same errors if you kill the process while a backup is active.

@drwtsn32x
Contributor

I was able to reproduce all of the problems shown in @ts678's instructions when using the latest source code without @warwickmm's PR applied. (The only difference is that with the latest source code the warning when stopping a backup correctly shows the reason, since PR 3984 is present.)

Applied @warwickmm's PR and restarted all tests.

  1. 'purge' test - now works without error
  2. 'affected' test - now works without error
  3. 'delete' version 1 test - same result as before - technically there isn't a "1" version, so this is correct behavior if we no longer have a dangling partial version 1 hidden in the backend
  4. 'find' all versions test - same results as before - again, correct if we don't have a dangling partial version hiding
  5. replace 2 larger files with a 1-byte txt file - run backup - now works - backend size for me shows 2.24KB instead of 15MB like before
  6. 'direct restore' test - now looks good, i see only version 0 with the 1 byte text file. it is labeled 'partial' but from what I recall any time you use 'direct restore' all versions are labeled with 'partial' (not sure if that is intended)
  7. only one dlist is present in the backend - which is correct since we only have 1 version. Looked at 'fileset' and it shows IsFullBackup: true
  8. 'delete dlist' test - ran repair, dlist recreated but it still is missing the 'fileset' file
  9. 'delete recreate db' test - recreates fine and only shows version 0 as expected

This is looking really good!

Seeing 'partial' in test 6 is a different issue (if it is not intended behavior). Test 7 where regenerated dlists don't contain 'fileset' is also an unrelated issue.

warwickmm added a commit to warwickmm/duplicati that referenced this issue Nov 21, 2019
Previously, all filesets would be displayed as partial.  Now, we obtain
the partial/full type from the fileset file (not the fileset.json file)
in the dlist file.  If this file does not exist, the backup type
defaults to full.

Note that this requires downloading and uncompressing the dlist file, so
performance of populating the restore list may be impacted
significantly.

This concerns issue duplicati#3982.
@warwickmm
Member

warwickmm commented Nov 21, 2019

I've pushed a commit that addresses the "partial" label in case 6 for my simple test case (1 full backup followed by 1 partial backup). Note that determining the partial/full status of the filesets from the remote files requires downloading and uncompressing the dlist files, so performance of populating the restore list may be impacted significantly.

@drwtsn32x, can you check the partial/full labels in case 6 with the new changes?
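For hand-checking case 6, the marker can also be read straight out of a downloaded dlist. A sketch assuming the entry is named fileset and carries an IsFullBackup JSON field, as described in this thread (treat both names as assumptions):

```python
import io
import json
import zipfile

def dlist_is_full(dlist_bytes):
    """Read the partial/full marker out of a dlist zip.
    Returns True/False from the 'fileset' entry's IsFullBackup field,
    or None when the entry is absent (the default is then full)."""
    with zipfile.ZipFile(io.BytesIO(dlist_bytes)) as zf:
        if "fileset" not in zf.namelist():
            return None  # missing marker: treated as a full backup
        meta = json.loads(zf.read("fileset"))
        return bool(meta.get("IsFullBackup", True))
```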

@warwickmm
Member

warwickmm commented Nov 21, 2019

  1. 'delete' version 1 test - same result as before - technically there isn't a "1" version, so this is correct behavior if we no longer have a dangling partial version 1 hidden in the backend

@drwtsn32x, can you clarify what "same result as before" means? I would expect that with the changes in the pull request, there now is a visible "1" version and that it would be deleted, leaving only the full backup.

Thanks again for the help testing!

EDIT: Sorry, I didn't set retention to keep 1 backup. Without the retention setting and a test case with an initial partial backup followed by a full backup, I was able to delete version 1 (the partial one). I believe this is the correct behavior.

@drwtsn32x
Contributor

can you check the partial/full labels in case 6 with the new changes?

Yep I'll give it a test and let you know...

Without the retention setting and a test case with an initial partial backup followed by a full backup, I was able to delete version 1 (the partial one). I believe this is the correct behavior.

Agreed!

warwickmm added a commit to warwickmm/duplicati that referenced this issue Nov 21, 2019
warwickmm added a commit to warwickmm/duplicati that referenced this issue Nov 21, 2019
@warwickmm
Member

warwickmm commented Nov 21, 2019

I pushed a fix for case 8 (delete dlist test). Let me know if it works!

@drwtsn32x
Contributor

Ok i merged your PR with my branch, going to test both 6 and 8 now...

@drwtsn32x
Contributor

Tested 6 - looks good - I also tried it with a larger set of backup data I have (195 versions) and the time required increased from 1m0s to 1m59s. Backup data is local on a NAS.

I like how it distinguishes partial from full but what do you all feel about the tradeoff in performance? It will be more pronounced with slower, remote backends.

@drwtsn32x
Contributor

drwtsn32x commented Nov 21, 2019

Tested 8 - looks good! Ran through multiple scenarios of full and partials, and the 'filelist' in the dlist was properly included with the regenerated dlist files every time now, with the correct boolean value. Nice!

@warwickmm
Member

I like how it distinguishes partial from full but what do you all feel about the tradeoff in performance? It will be more pronounced with slower, remote backends.

I think the partial/full distinction is key, especially if a user's goal is to restore all files. They would want to do so from a full backup.

It's possible that my solution unnecessarily retrieves the dlist files twice. I'll try and take a look later when I have some time.

@drwtsn32x
Contributor

I think the partial/full distinction is key, especially if a user's goal is to restore all files. They would want to do so from a full backup.

Yes, good point. I'm ok with the performance tradeoff myself, taking into account that using 'direct restore' is probably a rare event for most users.

@warwickmm
Member

warwickmm commented Nov 22, 2019

  1. There appear to be other usages of FilesetVolumeWriter that do not add the fileset file to the dlist file. We'll need to look into whether these instances need to be fixed if possible.

This has now been addressed. Previously the dlist files that were recreated after a purge operation were missing the fileset file. Now, the new dlist files include the filelist, with the correct partial/full state.

The synthetic filelist was also missing the fileset file. I'm not sure exactly how the synthetic filelist is used, but I set the state in the synthetic filelist to always be partial. The assumption here is that the synthetic filelist is only created if a backup was interrupted. Is that the correct assumption?

@drwtsn32x
Contributor

Do you think we should look into this further, or is the impact on performance ok for now?

I think it should be merged as-is, with a note to check on this in the future. It's better to have this and other fixes, even if not yet fully optimized, than not have them at all.

The assumption here is that the synthetic filelist is only created if a backup was interrupted. Is that the correct assumption?

I do not know..... I haven't looked into when synth filelists are used at all.

Thanks for the continued improvements!

@kenkendk
Member

Moreover, for some reason the RecreateDatabaseHandler is provided options that result in the filelists being filtered to just the most recent one. So, the line linked to above that sets the partial/full state is only invoked once.

If I understand this correctly, it is used with the UI and a direct restore. In this case, we need to build a local database, but since we only restore from the most recent version, the recreated database is marked as "incomplete" (to avoid using it for anything else by mistake) and contains only the minimal information required to run the restore.

I'm not sure exactly how the synthetic filelist is used, but I set the state in the synthetic filelist to always be partial. The assumption here is that the synthetic filelist is only created if a backup was interrupted. Is that the correct assumption?

Yes. Synthetic is intended to indicate that the dlist is not created from the filesystem, but rather by looking at what files were previously there and what new files are there. The synthetic filelist is only created when the previous backup failed, as this situation may leave new dblock files on the backend but with no files using that data. I opted for the fix-after-fail solution, instead of periodically uploading a temporary dlist file.
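As a toy model of that description (not Duplicati's actual code), the synthetic filelist is the previous dlist overlaid with whatever this run managed to upload before the interruption:

```python
def synthetic_filelist(previous_dlist, uploaded_this_run):
    """Build the interrupted-backup dlist: start from the last dlist
    (complete or not) and overlay the files uploaded this run, since
    those entries carry the newer content."""
    merged = dict(previous_dlist)   # path -> content reference
    merged.update(uploaded_this_run)
    return merged

# Run backup, record A:  synthetic_filelist({}, {"A": "a1"}) -> {"A": "a1"}
# Run backup, record B:  synthetic_filelist({"A": "a1"}, {"B": "b1"}) -> A and B
```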

@ts678
Collaborator Author

ts678 commented Nov 22, 2019

  1. The code path for using an actual retention rule (as opposed to "keep number of versions") is different. We may need to test this to see how it handles partial/full backups.

One big question is how far we want to go to protect the users from themselves. Currently, is it the case in the PR that --keep-versions looks only at fulls, but --keep-time and --retention-policy treat them all the same, allowing the possibility to delete or thin partials and maybe prevent a full from ever happening?

If a user picks "Smart backup retention" I think that's the same as custom 1W:1D,4W:1W,12M:1M, and if they then do multiple partials in a day, some might be thinned, and maybe even the blocks discarded.

What will possibly save them is that the file enumeration of the next partial might rerun the previous path order, and additional version references lower the risk of compact deciding blocks are now wasted space.

From a uniformity point of view, I'd sort of like all three retention ways to do the same, so maybe the --keep-time should just never delete partials (which does make me a bit nervous in some other ways).

--retention-policy makes me nervous because the algorithm is a little complex, so needs some care... Simply hiding partials could hurt keeping of most recent backup, subject to several different settings.

Order also matters. If a full backup finishes and is marked before ApplyRetentionPolicy runs, then the most recent backup would be that full, otherwise the full just run might show up marked as a partial.

If partial runs then ApplyRetentionPolicy runs, then I'd say exempt partials from retention processing, which could probably be done by pulling these out of the list early if it can be reconciled with current special handling of most recent file. You need to look before and after the main loop to see the ends.

If --keep-time gets changed, test should be pretty easy. --retention-policy has more cases to go into.
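The exemption sketched above (pull partials out of the list before the retention loop runs, then put them back untouched) could look roughly like this. This is purely illustrative; retention_fn stands in for the real ApplyRetentionPolicy logic and is hypothetical:

```python
def apply_retention_exempting_partials(versions, retention_fn):
    """versions: list of (timestamp, is_full) tuples.
    Run the retention logic over full backups only, then put every
    partial back untouched, so thinning can never discard one."""
    fulls = [v for v in versions if v[1]]
    partials = [v for v in versions if not v[1]]
    return sorted(retention_fn(fulls) + partials)
```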

@drwtsn32x
Contributor

allowing the possibility to delete or thin partials and maybe prevent a full from ever happening?

Why would deleting a partial affect a full backup? I thought all filesets (backup versions) are independent from each other. Partial just means it didn't finish backing up all the identified data, right?

and if they then do multiple partials in a day, some might be thinned, and maybe even the blocks discarded.

Aren't partials a side effect of stopping/interrupting a backup job? So shouldn't they be rare? Or is there some other scenario where they would happen often?

Also even if a partial is deleted, I don't see it affecting the deletion of blocks any differently. Blocks should only be deleted if they are unreferenced by ANY job (partial OR full). At least this is how I understand it.

@ts678
Collaborator Author

ts678 commented Nov 23, 2019

Why would deleting a partial affect a full backup?

I don't know the design super-well, and I don't have a build with the changes. Here goes anyway:

As a simple case, say you have --keep-time set to more than a file backup time but less than time between backups. Stop the backup when you see a dblock begin uploading, as in usual test steps.

First partial stops after file A and contains file A.
Second partial records A and backs up B, so contains A and B. First partial is old, gets deleted, OK.
Now change file A completely.
Third partial backs up A, so contains file A only. Second partial is old, gets deleted. B is now gone, meaning no version holds a reference. If compact runs, its blocks will be unavailable for recycling.

If it's possible for backed up file blocks to get deleted this way, the full might finish later or never.
Analysis under --retention-policy gets harder to test-case and analyze, but same principle applies.
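The scenario above can be modeled as a toy reference count over filesets (purely illustrative; real Duplicati tracks blocks per dblock volume, which this ignores):

```python
def live_blocks(filesets):
    """A block survives compact only while some remaining fileset references it."""
    live = set()
    for refs in filesets.values():
        live |= refs
    return live

filesets = {1: {"A1"}}                # first partial: file A only
filesets[2] = {"A1", "B1"}            # second partial: A recorded, B backed up
del filesets[1]                       # --keep-time deletes the oldest: harmless
filesets[3] = {"A2"}                  # A changed completely, re-backed-up first
del filesets[2]                       # --keep-time deletes again...
assert "B1" not in live_blocks(filesets)  # ...B's blocks are now unreferenced
```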

Getting this to fail in test might be harder than shown due to concurrency. A design overview here suggests third partial could process B in parallel with A, so more files might be needed to see this:

Multiple processes read paths and computes block hashes; new block hashes (and data) are written to a channel

Or is there some other scenario where they would happen often?

Pause reboot resume? is a recent use case where initial backup needs to be broken into pieces. Currently we tell people to slowly add folders, so that backup finishes, but an alternative might be to backup, stop when needed, continue the next chance. Say initial backup takes a month and you carry a laptop around. Would a user know to use "Keep all backups" awhile, in order to get to a full backup in the quickest way?

Note that I'm not sure this is worth extra code risk to "improve" now. Above just explains risk of inaction. To any who know the design better than I do, does above make sense in at least a hypothetical manner?

@warwickmm
Member

warwickmm commented Nov 23, 2019

Sorry, I haven't been able to follow all the various scenarios, but I can try and explain how the approach in pull request #3992 attempts to deal with "keep a specific number of backups". Suppose that "keep a specific number of backups" is set to 1 and we do the following:

  1. Make an initial partial backup. This results in only one backup version, and it is partial.
  2. Make a full backup. Since we have a full backup, the preceding partials are deleted. This results in only one backup version, and it is full.
  3. Make a partial backup. Since this partial backup is not followed by a full backup, it is not deleted. We are now left with the full backup from step 2, and the partial one from step 3.
  4. Make a partial backup. Since this partial backup is not followed by a full backup, it is not deleted. We are now left with the full backup from step 2, the partial one from step 3, and the partial one from step 4.
  5. Make a full backup. Since we have a full backup, the backups from steps 2 through 4 can be deleted.

Let me know if there are any flaws with the above approach.
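A simplified model of the rule in steps 1-5 (my sketch, not the PR's actual code): a partial is deletable once any later backup is full, and the ordinary keep-N trimming then runs over what remains.

```python
def prune_superseded_partials(versions):
    """versions: list of (timestamp, is_full), oldest first.
    A partial may be deleted once any later backup is full; fulls are
    left for the ordinary keep-N logic to trim afterwards."""
    kept = []
    for i, (ts, is_full) in enumerate(versions):
        superseded = not is_full and any(f for _, f in versions[i + 1:])
        if not superseded:
            kept.append((ts, is_full))
    return kept
```

With the five steps above: after step 4 the list is [full, partial, partial]; the full backup in step 5 makes both partials deletable, and keep-1 then trims the older full.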

I'm not sure that we can treat a "retention rule" (1W:1D,4W:1W,12M:1M) in a similar manner, since the notion of an "interval" means that some partials should arguably be kept. For example, suppose we are to keep 1 backup per day. A full backup is made on Monday, partial backups are made from Tuesday through Thursday, and a full backup is made on Friday. I would think that the partial backups in this case should be kept.

I think to be conservative, we should not delete partial backups if it's not clear that they are superseded by a full backup. I think the logic above for "keep a specific number of backups" is reasonable. The current treatment of the "retention rule" case is to not delete partials. I think this isn't a bad starting point. As @drwtsn32x noted, partial backups should be a rarity.

I wasn't aware of --keep-time until now, so I don't know how it behaves or what the expectations should be.

EDIT: It looks like currently all backups that are older than that specified by --keep-time are eligible for deletion.

@drwtsn32x
Contributor

It looks like currently all backups that are older than that specified by --keep-time are eligible for deletion.

Yes, that's my understanding. Relevant code:

https://github.com/duplicati/duplicati/blob/master/Duplicati/Library/Main/Operation/DeleteHandler.cs#L207-L212
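In rough terms (a Python model, not the linked C#), the keep-time rule is just an age cutoff applied to full and partial backups alike:

```python
from datetime import datetime, timedelta

def keep_time_deletable(filesets, keep_time, now):
    """filesets: list of (timestamp, is_full) tuples.
    Everything older than the cutoff is eligible for deletion,
    full and partial alike."""
    cutoff = now - keep_time
    return [(ts, f) for ts, f in filesets if ts < cutoff]
```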

@ts678
Collaborator Author

ts678 commented Nov 23, 2019

Looking at PR 3992 (isn't that the relevant one?), I hadn't actually noticed that the delete-partials-when-superseded-by-a-full was only in --keep-versions, because I think I was advocating for that everywhere.

Partials without later full are more useful. As described before, they're accumulating data towards a full. Giving such partials a blanket exemption could run the risk of someone building up partials forever, but they'd also have to do "Stop after current file" forever, so I don't think that's a scenario that's very likely.

When I wrote earlier about --retention-policy, I had partials-at-the-recent-end-if-at-all in mind.

@ts678
Collaborator Author

ts678 commented Nov 23, 2019

I'm not sure that we can treat a "retention rule" (1W:1D,4W:1W,12M:1M) in a similar manner, since the notion of an "interval" means that some partials should arguably be kept.

which sounds like an argument favoring specific configuration (even if inadvertent, through choosing "Smart backup retention") over consistency between the different retention methods. In original work involving hiding partials, my main worry was the breakage it caused, not disrespect for user intention, therefore I saw deleting rather than hiding superseded partials as a reasonable universal replacement.

I'm still kind of leaning that way because it does respect user intention with respect to full backups, so maintains the previous behavior when there were only full backups. Partials seem rather unpredictable sources for restores, because it's not certain how far they got. I suppose one could just look and hope.

The problem with letting partials get fully involved in --retention-policy without any distinction is that partial could have a full too close after it, so full gets deleted. If I look at what I think is the PR merged into context (using View file button), it looks like some code from previous logic was kept, and that kept partial doesn't update lastKept, so can't throw off any newer delete decision for whatever's there.

I wonder if there's a way to get GitHub web UI to show a diff between before-this-all-began to where things are proposed at the moment? I have most confidence in the Beta-tested code. If we'd reverted back to that for a restart, I might favor a different path, but maybe the current should just be shipped.

There does need to be more testing at some point, then after that, maybe cut a Canary and test more.

@warwickmm
Member

it looks like some code from previous logic was kept, and that kept partial doesn't update lastKept, so can't throw off any newer delete decision for whatever's there.

Correct. If I can summarize the behaviors currently in pull request #3992 (as of cd08ca9):

  • For "keep number of versions", a partial can be removed if it is followed by a full backup.
  • For "retention rule" (e.g., 1W:1D,4W:1W,12M:1M), partials are not removed, and a full backup will not be discarded in favor of a partial one.
  • For "keep time", all backups that are older than that specified by --keep-time are eligible for deletion.

Partial backups should be rare so I think the behavior in the "retention rule" case is reasonable. We should focus on fixing the bugs caused by hiding partials, so I'd be hesitant to change the (complicated) retention rule logic too much right now. This can always be improved after the beta release, especially if we make an effort to release more often.

I might make a small change to the "keep time" case so that at least one full backup is always kept. For example, if the last full backup is older than the specified window, it should probably be kept just in case.

I wonder if there's a way to get GitHub web UI to show a diff between before-this-all-began to where things are proposed at the moment?

https://help.github.com/en/github/committing-changes-to-your-project/comparing-commits-across-time#comparing-commits

warwickmm added a commit to warwickmm/duplicati that referenced this issue Nov 23, 2019
@warwickmm
Member

Revision 861f0ff ensures that we keep at least one full backup when removing versions according to the --keep-time option.

Are there any other remaining issues here that need to be addressed? If not, I will remove the do-not-merge label from the pull request for further review, testing, and potential merge.

@warwickmm warwickmm added this to the Upcoming beta milestone Nov 23, 2019
@drwtsn32x
Contributor

Nice work, @warwickmm!

@kenkendk
Member

kenkendk commented Nov 27, 2019

As a simple case, say you have --keep-time set to longer than a single backup's duration but shorter than the interval between backups. Stop the backup when you see a dblock begin uploading, as in the usual test steps.

First partial stops after file A and contains file A.
Second partial records A and backs up B, so it contains A and B. The first partial is old and gets deleted; OK.
Now change file A completely.
Third partial backs up A, so it contains file A only. The second partial is old and gets deleted. B is now gone, meaning no version holds a reference to it. If compact runs, its blocks will be available for recycling.

If it's possible for backed-up file blocks to get deleted this way, the full might finish later or never.
Analysis under --retention-policy is harder to test-case and analyze, but the same principle applies.
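The block-reference consequence of this scenario can be shown with a toy model. All names, block ids, and the `referenced` helper here are invented for illustration; this is not Duplicati's actual dlist or dblock format.

```python
# Toy model of the scenario: each fileset lists the files it references,
# and each file maps to a set of block ids in dblock storage.
blocks = {"A": {"a1"}, "A*": {"a2"}, "B": {"b1"}}

# dlist contents per the three partial runs (no synthetic filelist):
filesets = {
    "partial1": ["A"],       # stopped after file A
    "partial2": ["A", "B"],  # records A, backs up B
    "partial3": ["A*"],      # A changed; only the new A is recorded
}

def referenced(kept):
    """Blocks still reachable from the kept filesets."""
    return set().union(*(blocks[f] for name in kept for f in filesets[name]))

# keep-time deletes partial1, then partial2, leaving only partial3:
orphaned = set().union(*blocks.values()) - referenced(["partial3"])
print(sorted(orphaned))  # ['a1', 'b1'] -- all of file B's blocks are unreferenced
```

Once only partial3 survives, nothing references file B's blocks, so a compact run could reclaim them even though B was never deleted by the user.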

This worries me. The original design was explicitly made to avoid this scenario.
The idea is that a synthetic dlist is constructed by taking the last dlist (complete or not) and then stupidly adding whatever was partially backed up.

To repeat the above it would be:

  • Run backup, record A, dlist = [A]
  • Run backup, record B, dlist = [A,B]
  • Run backup, change A, dlist = [A*,B]

The downside to this approach is that we will be adding data forever, and we only find out about deleted files if the backup ever runs to completion.
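The synthetic-filelist idea described above amounts to a simple overlay merge. The sketch below is hypothetical Python for illustration, not the actual dlist format or C# code; file paths map to content hashes.

```python
def synthetic_dlist(previous, partial):
    """Start from the last dlist (complete or not) and overlay whatever
    the partial run backed up; both map file path -> content hash."""
    merged = dict(previous)  # carry every previously listed file forward...
    merged.update(partial)   # ...and overwrite with freshly backed-up files
    return merged

d1 = synthetic_dlist({}, {"A": "h1"})    # run 1: record A  -> [A]
d2 = synthetic_dlist(d1, {"B": "h2"})    # run 2: record B  -> [A, B]
d3 = synthetic_dlist(d2, {"A": "h1'"})   # run 3: A changed -> [A*, B]
print(sorted(d3))  # ['A', 'B'] -- B is carried forward, never dropped
```

Because every run starts from the previous dlist, B is never lost mid-sequence; the trade-off, as noted, is that deletions are only noticed once a backup runs to completion.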

Based on @ts678's scenario, the last step would somehow be "- Run backup, change A, dlist = [A*]". Why would this happen?

warwickmm added a commit to warwickmm/duplicati that referenced this issue Nov 29, 2019
We had to remove the keep-versions options since multiple retention
options are not allowed.

This concerns issue duplicati#3982.
@warwickmm
Member

Based on @ts678's scenario, the last step would somehow be "- Run backup, change A, dlist = [A*]". Why would this happen?

I'm not sure why this would happen. @ts678, can you elaborate? Also, the logic for the keep-time option will not delete any partial backups until at least one full backup is encountered (when traversing from newest to oldest). So, if necessary, it will keep backups older than implied by keep-time in order to retain at least one full backup.
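That newest-to-oldest keep-time traversal can be modeled in a few lines. This is an illustrative sketch under assumed semantics, not the real implementation; `keep_time` and the tuple representation are invented for this example.

```python
def keep_time(filesets, cutoff):
    """Sketch of --keep-time with the 'keep at least one full' guard.

    filesets: list of (timestamp, is_full) tuples, newest first.
    Everything older than cutoff is eligible for deletion, but nothing
    is deleted until at least one full backup has been kept, so the
    last full survives even if it is older than the cutoff.
    """
    kept, have_full = [], False
    for ts, is_full in filesets:  # newest -> oldest
        if ts >= cutoff or not have_full:
            kept.append((ts, is_full))
            have_full = have_full or is_full
    return kept

# Two recent partials, then two fulls that are both older than the cutoff:
sets = [(100, False), (90, False), (50, True), (40, True)]
print(keep_time(sets, cutoff=95))  # -> [(100, False), (90, False), (50, True)]
```

The partial at timestamp 90 and the full at 50 are both older than the cutoff, yet both are kept: the partials because no newer full has been retained yet, and the full at 50 to satisfy the at-least-one-full guard. Only the full at 40 is deleted.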

warwickmm added a commit to warwickmm/duplicati that referenced this issue Nov 29, 2019
If the most recent backups were partials (and not followed by a full
backup), they should not be removed.  A partial backup should only be
removed if it is followed by a full backup.

This concerns issue duplicati#3982.
@warwickmm
Member

warwickmm commented Nov 29, 2019

By the way, I fixed a bug in my earlier attempt at the keep-versions logic. I also added some unit tests to help verify the behaviors (writing the test helped reveal the bug).

warwickmm added a commit to warwickmm/duplicati that referenced this issue Nov 29, 2019
@ts678
Collaborator Author

ts678 commented Nov 30, 2019

can you elaborate?

Sorry, I missed the earlier comment. I'm ignoring the synthetic filelist because I think it's currently broken.

Diagnosis:

Warning Message after Backup

Warning: Expected there to be a temporary fileset for synthetic filelist #2329

Expected there to be a temporary fileset for synthetic filelist #2506

I discussed the synthetic filelist in a staff topic (and the original PR) as a way to avoid inventing new mechanisms. However, partials do have their appeal (and headaches). I'm not sure whether a synthetic filelist on the way down would work, but if it could, it would solve the partial-is-partial problem, since a synthetic filelist always has a prior base.

@warwickmm
Member

Fixed in pull request #3992.

warwickmm added a commit to warwickmm/duplicati that referenced this issue Dec 30, 2019
This test would have failed prior to the changes in revision 51b4ecf
("Include fileset file in dlist files after purge operation").

This concerns issue duplicati#3982.
warwickmm added a commit to warwickmm/duplicati that referenced this issue Dec 30, 2019
This test would have failed prior to the changes in revision 0cfce6b
("Include fileset file when repairing missing dlist file").

This concerns issue duplicati#3982.
warwickmm added a commit to warwickmm/duplicati that referenced this issue Dec 31, 2019
This test would have failed prior to the changes in revision 8ab6407
("Display correct backup type when using direct restore").  Prior to
revision 8ab6407, all filesets would have been marked as partial when
performing a direct restore without the local database.

This concerns issue duplicati#3982.