
Implemented backup based on changes recorded in NTFS USN journal #3184

Merged
merged 32 commits into duplicati:master from dgehri:feature/usn on May 11, 2018

Conversation

@dgehri (Contributor) commented Apr 19, 2018

First implementation of NTFS USN journal optimization.

  • Folder renames are correctly handled (old name is removed, new one added)
  • I don't actually rely on the USN journal to determine whether files have changed. Instead, I use it to filter (reduce) the source list to the set of possibly modified files and folders, which are then scanned recursively using FilterHandler.EnumerateFilesAndFolders(). That way, there is no risk of misinterpreting journal entries in complex rename / delete / re-create scenarios: whenever a file or folder is "touched" in the journal, it is fully scanned. (A rough sketch of this idea follows this list.)
  • The state of the USN journal is recorded in a new table ChangeJournalData in the local database. This table also records a hash value for each fileset, representing the active source files and filter set. An initial full scan is re-triggered whenever the backup configuration is modified.
  • A full scan is also triggered if the USN journal is incomplete (has been overwritten / truncated), and of course in case of errors.
  • In case of DB loss and subsequent recovery, the USN data is not restored, and a full scan is triggered to establish a sane baseline for subsequent backups.
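
As a rough illustration of the completeness check and the source-list filtering described above (all type and member names below are hypothetical stand-ins, not the actual code in this PR):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical mirror of what the ChangeJournalData table would persist.
class JournalState
{
    public ulong JournalId;   // identity of the USN journal instance
    public long NextUsn;      // first USN not yet processed
    public string ConfigHash; // hash over the active source files + filter set
}

static class UsnFilterSketch
{
    // The journal can only be trusted if it is still the same journal
    // instance and none of the records we still need were overwritten.
    public static bool JournalIsComplete(JournalState last, ulong currentJournalId, long currentFirstUsn)
    {
        return last != null
            && last.JournalId == currentJournalId
            && last.NextUsn >= currentFirstUsn;
    }

    // Reduce the work to the paths "touched" in the journal that fall under
    // a configured source; the survivors are rescanned recursively as usual.
    public static IEnumerable<string> FilterSources(IEnumerable<string> sources, IEnumerable<string> touchedPaths)
    {
        return touchedPaths.Where(p => sources.Any(src =>
            p.StartsWith(src, StringComparison.OrdinalIgnoreCase)));
    }
}
```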

TODO:

The USN journal records a limited number of changes, so if backups are spaced too far apart, the journal data will be incomplete and full scans are required. This has the following implications:

  • Frequent (real-time) backups avoid this problem. If nothing has changed, the resulting fileset will be compacted; Duplicati may need optimization (compacting before uploading) if this becomes a common scenario.
  • Frequent backups result in numerous filesets, which may interfere with the retention policy. Perhaps the "automatic" retention policy mode should keep many more filesets for the current day, to avoid deleting changing data.
  • Less frequent backups with USN support could be made possible by scanning the USN journal at regular intervals and recording the changes, using a process or thread separate from the backup thread. When the backup runs, this recorded data is used instead of reading the journal at that time. There is no risk of missing modifications during reboots or while Duplicati is not running, because the USN numbers allow us to verify that the journal was recorded without gaps. (A rough sketch of such a poller follows this list.)
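
A minimal sketch of such a poller, assuming a single polling thread and a readJournal callback that wraps the actual journal access (none of these names exist in the PR):

```csharp
using System;
using System.Collections.Generic;

class JournalPoller
{
    private readonly object m_lock = new object();
    private readonly HashSet<string> m_touched = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
    private long m_nextUsn;
    private bool m_gapDetected;

    // Seed from the last persisted state (e.g. the ChangeJournalData table),
    // so the first poll does not report a spurious gap.
    public JournalPoller(long startUsn) { m_nextUsn = startUsn; }

    // Called at regular intervals. readJournal returns the paths touched
    // since 'fromUsn', plus the journal's oldest and next USN, so we can
    // detect whether records were lost between two polls.
    public void Poll(Func<long, (IEnumerable<string> Paths, long FirstUsn, long NextUsn)> readJournal)
    {
        var (paths, firstUsn, nextUsn) = readJournal(m_nextUsn);
        lock (m_lock)
        {
            if (firstUsn > m_nextUsn)
                m_gapDetected = true; // records overwritten: next backup must do a full scan
            foreach (var p in paths)
                m_touched.Add(p);
            m_nextUsn = nextUsn;
        }
    }

    // The backup thread consumes the accumulated changes (or learns that it
    // must fall back to a full scan) and resets the poller.
    public ISet<string> TakeChanges(out bool requiresFullScan)
    {
        lock (m_lock)
        {
            requiresFullScan = m_gapDetected;
            var result = new HashSet<string>(m_touched, StringComparer.OrdinalIgnoreCase);
            m_touched.Clear();
            m_gapDetected = false;
            return result;
        }
    }
}
```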

@duplicatibot

This pull request has been mentioned on Duplicati. There might be relevant details there:

https://forum.duplicati.com/t/real-time-backup/263/38

@piegamesde (Contributor)

I think you can close the pull request and reopen it again if you don't want it to be merged at the moment.

@dgehri dgehri closed this Apr 21, 2018
@dgehri dgehri reopened this Apr 23, 2018
@dgehri (Contributor, Author) commented Apr 23, 2018

The pull request is now in sync with the latest canary.

@kenkendk (Member) commented May 9, 2018

Ok, all looks good.
For future commits, it is much easier to review if refactoring and functional changes are kept in separate commits.

I added some logging that was not previously there, and made some minor fixes to the MD5 utility class:
https://github.com/duplicati/duplicati/tree/feature/usn-support-update

I did have one problem, namely that the EnumerateRecords function crashes the Mono compiler when it tries to build the enumeration class. This is quite strange, as enumerator methods are used in many places, but for some reason this particular function causes problems. It might be related to the type being enumerated over being private. In any case, I rewrote the function to use a manually instantiated IEnumerable<Record> (not as pretty, but at least the compiler does not crash).
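
For illustration, the shape of the rewrite looks roughly like this; Record is a simplified stand-in for the private journal entry type, and the parsing is invented for the sketch:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Simplified stand-in for the private type being enumerated.
class Record
{
    public long Usn;

    public static Record Parse(byte[] data, ref int offset)
    {
        var r = new Record { Usn = BitConverter.ToInt64(data, offset) };
        offset += sizeof(long);
        return r;
    }
}

// Instead of an iterator method ("yield return Parse(...)"), which is the
// form that crashed the Mono compiler, the enumerable and enumerator are
// instantiated manually:
class RecordEnumerable : IEnumerable<Record>
{
    private readonly byte[] m_entryData;
    public RecordEnumerable(byte[] entryData) { m_entryData = entryData; }

    public IEnumerator<Record> GetEnumerator() { return new RecordEnumerator(m_entryData); }
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }

    private sealed class RecordEnumerator : IEnumerator<Record>
    {
        private readonly byte[] m_entryData;
        private int m_offset;

        public RecordEnumerator(byte[] entryData) { m_entryData = entryData; }

        public Record Current { get; private set; }
        object IEnumerator.Current { get { return Current; } }

        public bool MoveNext()
        {
            // Stop when too few bytes remain to hold another USN.
            if (m_entryData.Length - m_offset < sizeof(long))
                return false;
            Current = Record.Parse(m_entryData, ref m_offset);
            return true;
        }

        public void Reset() { m_offset = 0; Current = null; }
        public void Dispose() { }
    }
}
```

Call sites can still foreach over new RecordEnumerable(buffer) exactly as they would with the iterator version; only the compiler-generated state machine is replaced by hand-written code.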

Can you review/test my changes, and then I will merge them into master, closing this PR in the process?

@dgehri (Contributor, Author) commented May 9, 2018 via email

@dgehri (Contributor, Author) commented May 9, 2018

Ken,

Your changes look good, too. I just made a minor change to the Enumerator: ensuring that Current is read-only, and that m_entryData is larger than sizeof(long).

I think we are good to go.

@dgehri (Contributor, Author) commented May 9, 2018

Not sure why the checks failed - this seems to be an issue with AppVeyor / Travis rather than with this commit...

@kenkendk kenkendk merged commit ecd9a6c into duplicati:master May 11, 2018
@dgehri dgehri deleted the feature/usn branch May 13, 2018 18:45
@duplicatibot

This pull request has been mentioned on Duplicati. There might be relevant details there:

https://forum.duplicati.com/t/will-changed-files-use-macos-fsevents-service/5485/6

@duplicatibot

This pull request has been mentioned on Duplicati. There might be relevant details there:

https://forum.duplicati.com/t/2-0-4-22-ntfs-usn/7719/2

@duplicatibot

This pull request has been mentioned on Duplicati. There might be relevant details there:

https://forum.duplicati.com/t/explain-usn-option/150/7
