Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

package:analyzer's listing in the in-memory file system has terrible performance for large amount of files #47220

Closed
isoos opened this issue Sep 15, 2021 · 10 comments
Assignees
Labels
area-analyzer Use area-analyzer for Dart analyzer issues, including the analysis server and code completion. P3 A lower priority bug or feature request

Comments

@isoos
Copy link

isoos commented Sep 15, 2021

package:analyzer version 2.2.0 with Dart SDK 2.14.1.

We are using the in-memory file system to temporary store the dartdoc-generated files on pub.dev. (The idea was to write a .tar.gz archive directly, instead of individual files). At the end of the process we've iterated over the files using getChildren() calls recursively. It turns out this is really inefficient for 100k+ files: takes more than 30 minutes (aborted process after that).

I've created the file list of the directory, attached as list.txt.gz.
list.txt.gz

Recreating that with empty content and scanning the files with recursive getChildren() call should help to reproduce the issue.

@isoos
Copy link
Author

isoos commented Sep 15, 2021

Related change in pub-dev that improved/fixed the performance issue: dart-lang/pub-dev#5106

@lrhn lrhn added the area-analyzer Use area-analyzer for Dart analyzer issues, including the analysis server and code completion. label Sep 15, 2021
@asashour
Copy link
Contributor

asashour commented Sep 15, 2021

If I may say something. I am unable to reproduce. By extending BaseTest, the below gives milliseconds for .getChildren()

What could be wrong with the below test?

Writing in 58 seconds
Reading in 360 millis
main() {
  defineReflectiveSuite(() {
    defineReflectiveTests(MemoryResourceProviderPerformanceTest);
  });
}

abstract class BaseTest extends FileSystemTestSupport {
...
}

@reflectiveTest
class MemoryResourceProviderPerformanceTest extends BaseTest {

  var chars = 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz1234567890';
  var rnd = Random();
  String getRandomString(int length) => String.fromCharCodes(Iterable.generate(
      length, (_) => chars.codeUnitAt(rnd.nextInt(chars.length))));

  test_getChildren() {
    var filesCount = 100 * 1000;
    var contentLength = 12 * 1024;
    var stopwatch = Stopwatch()..start();

    var folder = provider.getFolder('C:\\');
    for (int i = 0; i < filesCount; i++) {
      getFile(exists: true, content: getRandomString(contentLength), filePath: 'C:\\$i');
    }
    print('Writing in ${stopwatch.elapsed.inSeconds} seconds');
    stopwatch.reset();

    for (var i in folder.getChildren()) {
      if (i is File) {
        i.path;
        i.readAsStringSync();
      }
    }
    print('Reading in ${stopwatch.elapsed.inMilliseconds} millis');
    stopwatch.stop();
  }
}

@isoos
Copy link
Author

isoos commented Sep 16, 2021

@asashour: see the repro code below:

import 'dart:convert';
import 'dart:io';

import 'package:analyzer/file_system/file_system.dart';
import 'package:analyzer/file_system/memory_file_system.dart';
import 'package:http/http.dart' as http;

Future<void> main() async {
  final rs = await http.get(
      Uri.parse('https://github.com/dart-lang/sdk/files/7169842/list.txt.gz'));
  final lines = utf8.decode(gzip.decode(rs.bodyBytes)).split('\n');
  final provider = MemoryResourceProvider();
  for (final line in lines) {
    if (line.isEmpty) continue;
    final path = '/tmp/${line.substring(2)}';
    final f = provider.getFile(path);
    f.parent2.create();
    f.writeAsBytesSync([0, 1, 2]);
  }

  final sw = Stopwatch()..start();
  var count = 0;
  void scanFolder(Folder rs) {
    for (final c in rs.getChildren()) {
      if (c is Folder) {
        scanFolder(c);
      } else if (c is File) {
        count++;
        if (count % 100 == 0) {
          print('[${sw.elapsed}] $count');
        }
      }
    }
  }

  scanFolder(provider.getFolder('/tmp'));
  print(count);
  print(sw.elapsed);
}

@asashour
Copy link
Contributor

asashour commented Sep 16, 2021

Thanks for your response.

The reason seems to come from the below loop, in which the logic iterates over all paths, and for each one if its parent is the current path, then add it as a child.

provider._pathToResource.forEach((resourcePath, resource) {
if (provider.pathContext.dirname(resourcePath) == path) {
children.add(resource);
}
});

This would make recursive .getChildren() very heavy, as shown by your example.

One way to improve the performance, is by having a map of String path, List<String> children, however there could be better solutions.

@isoos
Copy link
Author

isoos commented Sep 16, 2021

Also, as a low-hanging fruit, maybe resource could cache the lazily evaluated dirname() of its path?

@bwilkerson
Copy link
Member

@scheglov

@scheglov
Copy link
Contributor

Thank you, I see the problem.

@scheglov
Copy link
Contributor

@scheglov scheglov self-assigned this Sep 16, 2021
@scheglov scheglov added the P3 A lower priority bug or feature request label Sep 16, 2021
dart-bot pushed a commit that referenced this issue Sep 19, 2021
And by storing the list of child names in a folder we can implement
Folder.getChildren() more efficiently.

Bug: #47220
Change-Id: I893dffd6ada4095b8580ec5d469142478f57f284
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/213682
Reviewed-by: Brian Wilkerson <brianwilkerson@google.com>
Commit-Queue: Konstantin Shcheglov <scheglov@google.com>
@scheglov
Copy link
Contributor

The fix was landed, I see about 200-300 ms to process the list.

@isoos
Copy link
Author

isoos commented Sep 19, 2021

awesome, thanks!

copybara-service bot pushed a commit that referenced this issue Sep 20, 2021
Its absence causes a dartdoc test failure.

Bug: #47220
Bug: https://buganizer.corp.google.com/issues/200468974
Change-Id: I0fe8af902cd9a11abb0102ce4744c57f1738ab5f
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/213880
Reviewed-by: Brian Wilkerson <brianwilkerson@google.com>
Reviewed-by: Janice Collins <jcollins@google.com>
Commit-Queue: Konstantin Shcheglov <scheglov@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-analyzer Use area-analyzer for Dart analyzer issues, including the analysis server and code completion. P3 A lower priority bug or feature request
Projects
None yet
Development

No branches or pull requests

5 participants