
Implemented backup based on changes recorded in NTFS USN journal #3184

Merged: 32 commits merged into duplicati:master from dgehri:feature/usn on May 11, 2018

Conversation

@dgehri (Contributor) commented on Apr 19, 2018

First implementation of NTFS USN journal optimization.

  • Folder renames are correctly handled (the old name is removed, the new one added).
  • I don't actually rely on the USN journal to determine if files have changed. Instead, I use it to filter (reduce) the source list to the set of possibly modified files and folders, which are then scanned recursively using FilterHandler.EnumerateFilesAndFolders(). That way, there is no risk of incorrectly interpreting the journal entries in complex rename / delete / re-create scenarios: whenever a file or folder is "touched" in the journal, it will be fully scanned (see the sketch after this list).
  • The state of the USN journal is recorded in a new table, ChangeJournalData, in the local database. This table also records a hash value for each fileset, representing the active source files and filter set, so an initial full scan is re-triggered whenever the backup configuration is modified.
  • A full scan is also triggered if the USN journal is incomplete (has been overwritten / truncated), and of course in case of errors.
  • In case of database loss and recovery, the USN data is not recovered, and a full scan is triggered to establish a sane baseline for subsequent backups.
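
As a rough illustration of the "filter, then rescan" idea (this is a sketch, not the actual PR code: `SelectScanRoots` and its parameters are hypothetical stand-ins, and the real recursive scan is performed by `FilterHandler.EnumerateFilesAndFolders()`):

```csharp
// Illustrative sketch only; names are hypothetical stand-ins for the real classes.
// The journal is used purely to shrink the list of roots that the normal recursive
// scanner has to visit.
using System;
using System.Collections.Generic;
using System.Linq;

public static class UsnFilteredScanSketch
{
    public static IReadOnlyList<string> SelectScanRoots(
        IEnumerable<string> configuredSources,   // sources from the backup configuration
        ISet<string> touchedPaths,               // files/folders "touched" in the journal since the last backup
        bool journalIsComplete,                  // false if the journal was truncated or overwritten
        bool configurationChanged)               // true if the stored source/filter hash no longer matches
    {
        var sources = configuredSources.ToList();

        // Any doubt about the journal (or a changed configuration) forces a full scan.
        if (!journalIsComplete || configurationChanged)
            return sources;

        // Otherwise only keep sources that contain at least one touched path; each kept
        // root is still scanned recursively.
        return sources
            .Where(src => touchedPaths.Any(p => p.StartsWith(src, StringComparison.OrdinalIgnoreCase)))
            .ToList();
    }
}
```

Because every touched path is handed back to the normal recursive scanner, there is never a need to reconstruct the exact sequence of rename / delete / re-create operations from individual journal entries.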

TODO:

The USN journal records a limited number of changes, and if backups are spaced too far apart, full scans are required because the data will be incomplete. This has the following implications:

  • Frequent (real-time) backups avoid this problem. If nothing has changed, the resulting fileset will be compacted; Duplicati may need optimizing (compacting before uploading) if this becomes a common scenario.
  • Frequent backups result in numerous filesets, and this may interfere with the retention policy. Perhaps the "automatic" retention policy mode should keep many more filesets for the current day, to avoid deleting changing data.
  • Less frequent backups with USN support could be made possible by scanning the USN journal at regular intervals and recording the changes, using a process / thread separate from the backup thread. When the backup runs, this recorded data is used instead of reading the journal at that time. There is no risk that modifications will be missed during reboots or while Duplicati is not running, as the USN numbers allow us to verify that the journal has been recorded without gaps (see the sketch below).
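
As a rough sketch of the continuity check this idea relies on (the type and member names are assumptions, not the PR's actual schema): the changes recorded by a periodic scanner can only be trusted if the live journal still has the same identity and still contains the USN at which the scanner last stopped.

```csharp
public readonly struct JournalState
{
    public readonly ulong JournalId;   // identity of the USN journal instance at the last scan
    public readonly long NextUsn;      // first USN that has not yet been processed

    public JournalState(ulong journalId, long nextUsn)
    {
        JournalId = journalId;
        NextUsn = nextUsn;
    }
}

public static class JournalContinuity
{
    // True when the stored state connects to the live journal without a gap, i.e. the
    // journal has not been recreated and the resume point has not been purged from it.
    public static bool CanResumeWithoutGap(JournalState stored, ulong liveJournalId, long liveLowestValidUsn)
    {
        return stored.JournalId == liveJournalId && stored.NextUsn >= liveLowestValidUsn;
    }
}
```

If this check fails, the recorded data is discarded and the next backup falls back to a full scan, exactly as in the truncated-journal case above.
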
@duplicatibot commented on Apr 19, 2018

This pull request has been mentioned on Duplicati. There might be relevant details there:

https://forum.duplicati.com/t/real-time-backup/263/38

@piegamesde (Contributor) commented on Apr 21, 2018

I think you can close the pull request and reopen it again if you don't want it to be merged at the moment.

@dgehri closed this on Apr 21, 2018

@dgehri reopened this on Apr 23, 2018

dgehri and others added some commits Apr 23, 2018

Added refactoring to improve the storage requirements for the local database, and speed up various queries.

This fixes #1283
@duplicatibot commented on commit a8a32ea, Apr 23, 2018

This commit has been mentioned on Duplicati. There might be relevant details there:

https://forum.duplicati.com/t/optimizing-load-speed-on-the-restore-page/3185/17

Replied on Apr 30, 2018

This commit has been mentioned on Duplicati. There might be relevant details there:

https://forum.duplicati.com/t/question-about-sqlite-file-sizes/3394/3

dgehri added some commits Apr 23, 2018

Revert "Merge remote-tracking branch 'upstream/feature/fix_path_stora…
…ge' into feature/usn"

This reverts commit 10d6b4c, reversing
changes made to 1af7f9e.
@dgehri (Contributor, Author) commented on Apr 23, 2018

The pull request is now in sync with the latest canary.

dgehri and others added some commits May 2, 2018

Removed the redundant byte-array-to-string method from the MD5 module, and added a `using` directive.
Added some additional logging to better diagnose issues with VSS/LVM snapshots.
Rewrote the `EnumerateRecords` method to use an explicit enumerator class, as it crashes the Mono compiler for some reason.
@kenkendk (Member) commented on May 9, 2018

Ok, all looks good.
For future commits, it is much easier to review if the refactoring and the actual changes are kept in separate commits.

I added some logging that was not previously there, and made some minor fixes to the MD5 utility class:
https://github.com/duplicati/duplicati/tree/feature/usn-support-update

I did have one problem, namely that the EnumerateRecords function crashes the Mono compiler when it tries to build the enumeration class. This is quite strange, as enumerator methods are used in many places, but for some reason this particular function causes problems. It might be related to the fact that the type being enumerated over is private. In any case, I rewrote the function to use a manually instantiated IEnumerable<Record> (not as pretty, but at least the compiler does not crash).
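
For illustration, a hand-written enumerator along those lines might look roughly like the sketch below; `Record` and `Record.Parse` are simplified placeholders, not the actual USN record type or parsing logic in the branch.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Placeholder record type; the real code parses USN_RECORD structures.
internal sealed class Record
{
    public long Usn;

    // Hypothetical parser: reads one record starting at "offset" and advances it.
    public static Record Parse(byte[] buffer, ref int offset)
    {
        var r = new Record { Usn = BitConverter.ToInt64(buffer, offset) };
        offset += sizeof(long); // real records advance by USN_RECORD.RecordLength
        return r;
    }
}

// Manually written replacement for a "yield return" iterator method.
internal sealed class RecordEnumerable : IEnumerable<Record>
{
    private readonly byte[] m_entryData;
    public RecordEnumerable(byte[] entryData) { m_entryData = entryData; }

    public IEnumerator<Record> GetEnumerator() { return new RecordEnumerator(m_entryData); }
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }

    private sealed class RecordEnumerator : IEnumerator<Record>
    {
        private readonly byte[] m_entryData;
        private int m_offset = sizeof(long); // skip the leading "next USN" value in the buffer
        private Record m_current;

        public RecordEnumerator(byte[] entryData) { m_entryData = entryData; }

        public Record Current { get { return m_current; } } // read-only on purpose
        object IEnumerator.Current { get { return Current; } }

        public bool MoveNext()
        {
            if (m_entryData == null || m_offset >= m_entryData.Length)
                return false;
            m_current = Record.Parse(m_entryData, ref m_offset);
            return true;
        }

        public void Reset() { throw new NotSupportedException(); }
        public void Dispose() { }
    }
}
```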

Can you review/test my changes, and then I will merge them into master, closing this PR in the process?


@dgehri (Contributor, Author) commented on May 9, 2018

Ken,

Your changes look good, too. I just made a minor change to the Enumerator (ensuring that `Current` is read-only, and that the length of `m_entryData` is greater than `sizeof(long)`).
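
That length guard might look roughly like this (a sketch only, assuming `m_entryData` is the raw buffer returned by the journal read, whose first `sizeof(long)` bytes hold the next-USN value rather than a record):

```csharp
internal static class UsnBufferChecks
{
    // A buffer that is not strictly longer than the leading next-USN value
    // contains no USN records to enumerate.
    public static bool ContainsRecords(byte[] entryData)
    {
        return entryData != null && entryData.Length > sizeof(long);
    }
}
```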

I think we are good to go.

@dgehri (Contributor, Author) commented on May 9, 2018

Not sure why the checks failed - this seems to be an issue with AppVeyor / Travis rather than this commit...

@kenkendk merged commit ecd9a6c into duplicati:master on May 11, 2018

2 checks passed:
  • continuous-integration/appveyor/pr: AppVeyor build succeeded
  • continuous-integration/travis-ci/pr: The Travis CI build passed

@dgehri deleted the dgehri:feature/usn branch on May 13, 2018

@duplicatibot commented on commit 2c1a0dd, Jun 29, 2018

This commit has been mentioned on Duplicati. There might be relevant details there:

https://forum.duplicati.com/t/snapshot-policy-broken-in-2-0-3-7/3866/7

@duplicatibot commented on Dec 2, 2018

This pull request has been mentioned on Duplicati. There might be relevant details there:

https://forum.duplicati.com/t/will-changed-files-use-macos-fsevents-service/5485/6

@duplicatibot commented on Aug 10, 2019

This pull request has been mentioned on Duplicati. There might be relevant details there:

https://forum.duplicati.com/t/2-0-4-22-ntfs-usn/7719/2
