File list verify is unnecessarily slow on some filesystems #5061

Closed
2 tasks done
Jojo-1000 opened this issue Nov 18, 2023 · 2 comments · Fixed by #5062
@Jojo-1000
Contributor

  • I have searched open and closed issues for duplicates.
  • I have searched the forum for related topics.

Environment info

  • Duplicati version: current master
  • Operating system: Windows 10
  • Backend: File

Description

Listing the backend files on exFAT targets (this may also apply to other FAT variants) takes much longer than necessary. The suspected cause is an unnecessary lookup of each file by name.

Listing the folder contents directly as FileInfo objects through DirectoryInfo avoids this lookup, and the list operation completes much faster (50 seconds instead of 1 hour for 110,000 files).

ISystemIO should be updated to implement the more efficient method of listing.
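
As a rough illustration of the listing approach described above, the sketch below enumerates a folder through DirectoryInfo and reads metadata from the returned FileInfo objects, rather than constructing a new FileInfo from each file name. The class and method names (`DirectoryListingSketch`, `ListFiles`) are hypothetical and are not part of Duplicati's ISystemIO; the actual interface change may look quite different.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class DirectoryListingSketch
{
    // Hypothetical helper (an assumption for illustration, not Duplicati's API).
    // Enumerates the files in a folder once and reads the metadata from the
    // FileInfo objects handed back by the enumeration itself.
    public static IEnumerable<(string Name, long Size, DateTime LastAccessUtc, DateTime LastWriteUtc)> ListFiles(string folder)
    {
        var dir = new DirectoryInfo(folder);
        foreach (FileInfo fi in dir.EnumerateFiles())
        {
            // No new FileInfo(fileName) here, so no separate lookup by name.
            yield return (fi.Name, fi.Length, fi.LastAccessTimeUtc, fi.LastWriteTimeUtc);
        }
    }
}
```

A caller could iterate it with `foreach (var entry in DirectoryListingSketch.ListFiles(path))`. Whether this matches the speedup reported here on a given filesystem would need to be measured.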

Steps to reproduce

  1. Create a folder on an exFAT partition containing 10,000 to 100,000 empty files
  2. Create a new backup with that folder as the target
  3. Run backup
  • Actual result:
    File verify takes a long time (minutes to hours).
  • Expected result:
    File verify should not take longer than a minute.
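
For step 1, a small helper along these lines could populate the test folder. It is only a sketch: the class name, method name, and the `empty-NNNNNN.bin` file naming are assumptions made for illustration, and the folder passed in would need to be on the exFAT partition.

```csharp
using System;
using System.IO;

public static class ExFatReproSketch
{
    // Hypothetical helper: creates `count` empty files in `folder`.
    // The file name pattern is arbitrary and chosen only for this example.
    public static int CreateEmptyFiles(string folder, int count)
    {
        Directory.CreateDirectory(folder);
        for (int i = 0; i < count; ++i)
        {
            string filePath = Path.Combine(folder, $"empty-{i:D6}.bin");
            // File.Create returns an open stream; dispose it right away
            // so the zero-length file is closed.
            using (File.Create(filePath)) { }
        }
        return count;
    }
}
```

For example, `ExFatReproSketch.CreateEmptyFiles(@"F:\test", 17000);` would produce a folder comparable in size to the one used for the timings in this report.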

Test code

This code compares two ways of listing files and reading their metadata. Accessing LastAccessTime or other metadata through a FileInfo constructed from the file name turned out to be the main cause of the slowdown.

using System;
using System.Linq;

string path = @"F:\test";
int iterations = 1;
var timeInfo = TimeSpan.Zero;
var timeListNames = TimeSpan.Zero;
var timeLookupNames = TimeSpan.Zero;
var watch = System.Diagnostics.Stopwatch.StartNew();

for (int i = 0; i < iterations; ++i)
{
    // List files directly (current implementation)
    watch.Restart();
    string[] files = System.IO.Directory.GetFiles(path);
    watch.Stop();
    timeListNames += watch.Elapsed;
    // Resume the same timer without resetting, so this figure covers list + lookup
    watch.Start();
    var accessTimes = (from fileName in files
                       let fi = new System.IO.FileInfo(fileName)
                       select fi.LastAccessTime).ToList();
    watch.Stop();
    timeLookupNames += watch.Elapsed;

    // List by DirectoryInfo
    watch.Restart();
    var accessTimes2 = (from fi in new System.IO.DirectoryInfo(path).GetFiles()
                        select fi.LastAccessTime).ToList();
    watch.Stop();
    timeInfo += watch.Elapsed;

}
Console.WriteLine($"List only names: {timeListNames}\nList + lookup by name: {timeLookupNames}\nList DirectoryInfo: {timeInfo}");

Output with 17000 files

List only names: 00:00:00.0264646
List + lookup by name: 00:00:58.3627105
List DirectoryInfo: 00:00:00.0315012
@duplicatibot

This issue has been mentioned on Duplicati. There might be relevant details there:

https://forum.duplicati.com/t/why-restore-needs-so-much-time-of-verifying-remote-data/17013/44

Jojo-1000 added a commit to Jojo-1000/duplicati that referenced this issue Nov 19, 2023
Directly list FileInfo instead of lookups by filename. This greatly speeds up backup verify on exFAT.

Closes duplicati#5061
@duplicatibot

This issue has been mentioned on Duplicati. There might be relevant details there:

https://forum.duplicati.com/t/write-operations-on-source-disk/17375/10
