
Implemented backup based on changes recorded in NTFS USN journal #3184

Merged
merged 32 commits into duplicati:master from dgehri:feature/usn on May 11, 2018

Conversation

@dgehri (Contributor) commented Apr 19, 2018

First implementation of NTFS USN journal optimization.

  • Folder renames are correctly handled (old name is removed, new one added)
  • I don't actually rely on the USN journal to determine whether files have changed. Instead, I use it to filter (reduce) the source list to the set of possibly modified files and folders, which are then scanned recursively using FilterHandler.EnumerateFilesAndFolders(). That way, there is no risk of misinterpreting journal entries in complex rename / delete / re-create scenarios: whenever a file or folder is "touched" in the journal, it is fully scanned. (A rough sketch of this idea follows this list.)
  • The state of the USN journal is recorded in a new table ChangeJournalData in the local database. This table also records a hash value for each fileset, representing the active source files and filter set. An initial full scan is re-triggered whenever the backup configuration is modified.
  • A full scan is also triggered if the USN journal is incomplete (has been overwritten / truncated), and of course in case of errors.
  • In case of DB loss and subsequent recovery, the USN data is not restored, and a full scan is triggered to establish a sane baseline for subsequent backups.
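
As a rough illustration of the completeness check and the source-list filtering described above (all type and member names below are hypothetical stand-ins, not the actual code in this PR):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical mirror of what the ChangeJournalData table would persist.
class JournalState
{
    public ulong JournalId;   // identity of the USN journal instance
    public long NextUsn;      // first USN not yet processed
    public string ConfigHash; // hash over the active source files + filter set
}

static class UsnFilterSketch
{
    // The journal can only be trusted if it is still the same journal
    // instance and none of the records we still need were overwritten.
    public static bool JournalIsComplete(JournalState last, ulong currentJournalId, long currentFirstUsn)
    {
        return last != null
            && last.JournalId == currentJournalId
            && last.NextUsn >= currentFirstUsn;
    }

    // Reduce the work to the paths "touched" in the journal that fall under
    // a configured source; the survivors are rescanned recursively as usual.
    public static IEnumerable<string> FilterSources(IEnumerable<string> sources, IEnumerable<string> touchedPaths)
    {
        return touchedPaths.Where(p => sources.Any(src =>
            p.StartsWith(src, StringComparison.OrdinalIgnoreCase)));
    }
}
```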

TODO:

The USN journal records a limited number of changes, so if backups are spaced too far apart, the journal data will be incomplete and full scans are required. This has the following implications:

  • Frequent (real-time) backups avoid this problem. If nothing has changed, the resulting fileset will be compacted; Duplicati may need optimization (compacting before uploading) if this becomes a common scenario.
  • Frequent backups result in numerous filesets, which may interfere with the retention policy. Perhaps the "automatic" retention policy mode should keep many more filesets for the current day, to avoid deleting changing data.
  • Less frequent backups with USN support could be made possible by scanning the USN journal at regular intervals and recording the changes, using a process or thread separate from the backup thread. When the backup runs, this recorded data is used instead of reading the journal at that time. There is no risk of missing modifications during reboots or while Duplicati is not running, because the USN numbers allow us to verify that the journal was recorded without gaps. (A rough sketch of such a poller follows this list.)
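
A minimal sketch of such a poller, assuming a single polling thread and a readJournal callback that wraps the actual journal access (none of these names exist in the PR):

```csharp
using System;
using System.Collections.Generic;

class JournalPoller
{
    private readonly object m_lock = new object();
    private readonly HashSet<string> m_touched = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
    private long m_nextUsn;
    private bool m_gapDetected;

    // Seed from the last persisted state (e.g. the ChangeJournalData table),
    // so the first poll does not report a spurious gap.
    public JournalPoller(long startUsn) { m_nextUsn = startUsn; }

    // Called at regular intervals. readJournal returns the paths touched
    // since 'fromUsn', plus the journal's oldest and next USN, so we can
    // detect whether records were lost between two polls.
    public void Poll(Func<long, (IEnumerable<string> Paths, long FirstUsn, long NextUsn)> readJournal)
    {
        var (paths, firstUsn, nextUsn) = readJournal(m_nextUsn);
        lock (m_lock)
        {
            if (firstUsn > m_nextUsn)
                m_gapDetected = true; // records overwritten: next backup must do a full scan
            foreach (var p in paths)
                m_touched.Add(p);
            m_nextUsn = nextUsn;
        }
    }

    // The backup thread consumes the accumulated changes (or learns that it
    // must fall back to a full scan) and resets the poller.
    public ISet<string> TakeChanges(out bool requiresFullScan)
    {
        lock (m_lock)
        {
            requiresFullScan = m_gapDetected;
            var result = new HashSet<string>(m_touched, StringComparer.OrdinalIgnoreCase);
            m_touched.Clear();
            m_gapDetected = false;
            return result;
        }
    }
}
```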

@duplicatibot

This pull request has been mentioned on Duplicati. There might be relevant details there:

https://forum.duplicati.com/t/real-time-backup/263/38

@piegamesde (Contributor)

I think you can close the pull request and reopen it again if you don't want it to be merged at the moment.

@dgehri dgehri closed this Apr 21, 2018
@dgehri dgehri reopened this Apr 23, 2018
@dgehri (Contributor, Author) commented Apr 23, 2018

The pull request is now in sync with the latest canary.

@kenkendk (Member) commented May 9, 2018

Ok, all looks good.
For future commits, it is much easier to review if refactoring and functional changes are kept in separate commits.

I added some logging that was not previously there, and made some minor fixes to the MD5 utility class:
https://github.com/duplicati/duplicati/tree/feature/usn-support-update

I did have one problem, namely that the EnumerateRecords function crashes the Mono compiler when it tries to build the enumeration class. This is quite strange, as enumerator methods are used in many places, but for some reason this particular function causes problems. It might be related to the type being enumerated over being private. In any case, I rewrote the function to use a manually instantiated IEnumerable<Record> (not as pretty, but at least the compiler does not crash).
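
For illustration, the shape of the rewrite looks roughly like this; Record is a simplified stand-in for the private journal entry type, and the parsing is invented for the sketch:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Simplified stand-in for the private type being enumerated.
class Record
{
    public long Usn;

    public static Record Parse(byte[] data, ref int offset)
    {
        var r = new Record { Usn = BitConverter.ToInt64(data, offset) };
        offset += sizeof(long);
        return r;
    }
}

// Instead of an iterator method ("yield return Parse(...)"), which is the
// form that crashed the Mono compiler, the enumerable and enumerator are
// instantiated manually:
class RecordEnumerable : IEnumerable<Record>
{
    private readonly byte[] m_entryData;
    public RecordEnumerable(byte[] entryData) { m_entryData = entryData; }

    public IEnumerator<Record> GetEnumerator() { return new RecordEnumerator(m_entryData); }
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }

    private sealed class RecordEnumerator : IEnumerator<Record>
    {
        private readonly byte[] m_entryData;
        private int m_offset;

        public RecordEnumerator(byte[] entryData) { m_entryData = entryData; }

        public Record Current { get; private set; }
        object IEnumerator.Current { get { return Current; } }

        public bool MoveNext()
        {
            // Stop when too few bytes remain to hold another USN.
            if (m_entryData.Length - m_offset < sizeof(long))
                return false;
            Current = Record.Parse(m_entryData, ref m_offset);
            return true;
        }

        public void Reset() { m_offset = 0; Current = null; }
        public void Dispose() { }
    }
}
```

Call sites can still foreach over new RecordEnumerable(buffer) exactly as they would with the iterator version; only the compiler-generated state machine is replaced by hand-written code.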

Can you review/test my changes, and then I will merge them into master, closing this PR in the process?

@dgehri (Contributor, Author) commented May 9, 2018 via email

@dgehri (Contributor, Author) commented May 9, 2018

Ken,

Your changes look good, too. I just made a minor change to the Enumerator: ensuring that Current is read-only, and that m_entryData is larger than sizeof(long).

I think we are good to go.

@dgehri (Contributor, Author) commented May 9, 2018

Not sure why the checks failed - this seems to be an issue with AppVeyor / Travis rather than with this commit...

@kenkendk kenkendk merged commit ecd9a6c into duplicati:master May 11, 2018
@dgehri dgehri deleted the feature/usn branch May 13, 2018 18:45
@duplicatibot

This pull request has been mentioned on Duplicati. There might be relevant details there:

https://forum.duplicati.com/t/will-changed-files-use-macos-fsevents-service/5485/6

@duplicatibot

This pull request has been mentioned on Duplicati. There might be relevant details there:

https://forum.duplicati.com/t/2-0-4-22-ntfs-usn/7719/2

@duplicatibot

This pull request has been mentioned on Duplicati. There might be relevant details there:

https://forum.duplicati.com/t/explain-usn-option/150/7
