The purpose of this test is to evaluate the performance of the branches developed by Kai Risku when backing up a repository containing mbox and SQLite files (a Thunderbird profile):
- Repository: a Thunderbird profile folder of ~23 GB, of which the mbox files represent 19 GB (82%); the largest SQLite file is approximately 450 MB.
- Storages: three local folders:
  - one for the "Duplicacy official" build (DO)
  - one for the "hash_window" build (HW)
  - one for the "file_boundaries" build (FB)
- All storages configured with variable-size chunks (1M average).
- Perform a full initial backup of the entire folder using the three jobs.
- Continue normal daily use for a few days, i.e. sending and receiving emails from the profile's multiple accounts, and run a daily incremental backup.
- On the last day, compact the folders (Thunderbird's "Compact Folders" command).
- Repository and storage sizes were measured with Rclone (www.rclone.org).
- No prune command will be executed.
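For reference, `rclone size` essentially sums the sizes of all files under a path; a minimal Python sketch of the same measurement for a local folder (an illustration of what is being measured, not the tooling used in the test):

```python
import os

def tree_size(path: str) -> int:
    """Total size in bytes of all regular files under path,
    roughly what `rclone size` reports for a local folder."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            fp = os.path.join(root, name)
            if os.path.isfile(fp):
                total += os.path.getsize(fp)
    return total
```

This is how both the repository and each storage folder can be sized consistently before and after every daily backup.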
From the chart below we can see that the total storage size was consistently smaller for the hash_window build.
We can also see that the daily increase in storage size is slightly smaller for the hash_window build, except on the last day, when the compaction was performed and the original Duplicacy build performed better.
The hash_window build was also better in the total number of chunks and in the number of new chunks each day:
The uploaded volumes were aligned with the storage increases:
Then, midway through the test, since the results already seemed consistent, and remembering Test 6, I decided to add three more jobs (DO, HW and FB), this time excluding the databases, to see the results:
Note: on day 4 the "official" build job didn't run because of a typo in the script, which I only found the next day when I checked the logs.
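The "noDB" jobs can be set up by excluding the SQLite files through Duplicacy's filters file (`.duplicacy/filters`). A sketch of such a file; the exact patterns used in the test are not shown above, so these are assumptions:

```
# .duplicacy/filters -- assumed exclusion patterns, not the test's actual file
# exclude SQLite databases at the profile root (wildcard pattern):
-*.sqlite
# or, as a regex matching .sqlite files anywhere in the tree:
e:\.sqlite$
```

Duplicacy supports both `+`/`-` wildcard patterns and `i:`/`e:` regex include/exclude patterns in this file.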
Finally, an important point to evaluate, especially in the case of cloud storages, is how much the storage grows as the repository grows. In this regard, the big "villain" is again the database backup:
(Remember that the backups were running with variable-size chunks, which is not ideal for databases.)
The hash_window build does seem to have slightly better performance for this use case.
The storage of the backup with all files (databases and text files) grows by 3 to 21 (!) times the repository increase, but when we exclude the databases from the backup, the storage increase is only 1.2 to 2.5 times. We conclude that variable-size chunks are not really the best option when databases are involved, and can even make the backup unfeasible. On the other hand, this configuration works well for the other types of files.
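The growth figures above are simply the storage increase divided by the repository increase for the same day. A small sketch of that calculation, with illustrative numbers only (not the measured test data):

```python
def growth_ratio(storage_increase_kb: float, repo_increase_kb: float) -> float:
    """Ratio of storage growth to repository growth for one backup run.
    A value > 1 means the storage grows faster than the data it protects."""
    return storage_increase_kb / repo_increase_kb

# Illustrative example: a 10 MB repository increase that caused
# a 35 MB storage increase gives a ratio of 3.5
print(growth_ratio(35_000, 10_000))  # 3.5
```

A ratio near 1 (as in the noDB jobs) indicates efficient deduplication; the 3x-21x ratios seen with the databases included show the chunker re-uploading large parts of the SQLite files on every run.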
Several other analyses can be made, so I provide the complete data below:
|Day|Repository size by Rclone|Repository increase|
|DO - storage size by Rclone|DO - storage increase|Revision|DO - all chunks|DO - new chunks|DO - uploaded|Backup time|
|HW - storage size by Rclone|HW - storage increase|Revision|HW - all chunks|HW - new chunks|HW - uploaded|Backup time|
|FB - storage size by Rclone|FB - storage increase|Revision|FB - all chunks|FB - new chunks|FB - uploaded|Backup time|
|DOnoDB - storage size by Rclone (KB)|DOnoDB - storage increase|Revision|DOnoDB - all chunks|DOnoDB - new chunks|DOnoDB - uploaded|Backup time|
|HWnoDB - storage size by Rclone (KB)|HWnoDB - storage increase|Revision|HWnoDB - all chunks|HWnoDB - new chunks|HWnoDB - uploaded|Backup time|
|FBnoDB - storage size by Rclone (KB)|FBnoDB - storage increase|Revision|FBnoDB - all chunks|FBnoDB - new chunks|FBnoDB - uploaded|Backup time|