-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
package:analyzer's listing in the in-memory file system has terrible performance for large amount of files #47220
Comments
Related change in pub-dev that improved/fixed the performance issue: dart-lang/pub-dev#5106 |
If I may say something. I am unable to reproduce. By extending BaseTest, the below gives milliseconds for What could be wrong with the below test?
main() {
defineReflectiveSuite(() {
defineReflectiveTests(MemoryResourceProviderPerformanceTest);
});
}
abstract class BaseTest extends FileSystemTestSupport {
...
}
@reflectiveTest
class MemoryResourceProviderPerformanceTest extends BaseTest {
var chars = 'AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz1234567890';
var rnd = Random();
String getRandomString(int length) => String.fromCharCodes(Iterable.generate(
length, (_) => chars.codeUnitAt(rnd.nextInt(chars.length))));
test_getChildren() {
var filesCount = 100 * 1000;
var contentLength = 12 * 1024;
var stopwatch = Stopwatch()..start();
var folder = provider.getFolder('C:\\');
for (int i = 0; i < filesCount; i++) {
getFile(exists: true, content: getRandomString(contentLength), filePath: 'C:\\$i');
}
print('Writing in ${stopwatch.elapsed.inSeconds} seconds');
stopwatch.reset();
for (var i in folder.getChildren()) {
if (i is File) {
i.path;
i.readAsStringSync();
}
}
print('Reading in ${stopwatch.elapsed.inMilliseconds} millis');
stopwatch.stop();
}
} |
@asashour: see the repro code below: import 'dart:convert';
import 'dart:io';
import 'package:analyzer/file_system/file_system.dart';
import 'package:analyzer/file_system/memory_file_system.dart';
import 'package:http/http.dart' as http;
Future<void> main() async {
final rs = await http.get(
Uri.parse('https://github.com/dart-lang/sdk/files/7169842/list.txt.gz'));
final lines = utf8.decode(gzip.decode(rs.bodyBytes)).split('\n');
final provider = MemoryResourceProvider();
for (final line in lines) {
if (line.isEmpty) continue;
final path = '/tmp/${line.substring(2)}';
final f = provider.getFile(path);
f.parent2.create();
f.writeAsBytesSync([0, 1, 2]);
}
final sw = Stopwatch()..start();
var count = 0;
void scanFolder(Folder rs) {
for (final c in rs.getChildren()) {
if (c is Folder) {
scanFolder(c);
} else if (c is File) {
count++;
if (count % 100 == 0) {
print('[${sw.elapsed}] $count');
}
}
}
}
scanFolder(provider.getFolder('/tmp'));
print(count);
print(sw.elapsed);
} |
Thanks for your response. The reason seems to come from the below loop, in which the logic iterates over all paths, and for each one if its parent is the current sdk/pkg/analyzer/lib/file_system/memory_file_system.dart Lines 605 to 609 in 4aa6373
This would make recursive One way to improve the performance, is by having a map of |
Also, as a low-hanging fruit, maybe |
Thank you, I see the problem. |
And by storing the list of child names in a folder we can implement Folder.getChildren() more efficiently. Bug: #47220 Change-Id: I893dffd6ada4095b8580ec5d469142478f57f284 Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/213682 Reviewed-by: Brian Wilkerson <brianwilkerson@google.com> Commit-Queue: Konstantin Shcheglov <scheglov@google.com>
The fix was landed, I see about 200-300 ms to process the list. |
awesome, thanks! |
Its absence causes a dartdoc test failure. Bug: #47220 Bug: https://buganizer.corp.google.com/issues/200468974 Change-Id: I0fe8af902cd9a11abb0102ce4744c57f1738ab5f Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/213880 Reviewed-by: Brian Wilkerson <brianwilkerson@google.com> Reviewed-by: Janice Collins <jcollins@google.com> Commit-Queue: Konstantin Shcheglov <scheglov@google.com>
package:analyzer
version2.2.0
with Dart SDK2.14.1
.We are using the in-memory file system to temporary store the dartdoc-generated files on pub.dev. (The idea was to write a .tar.gz archive directly, instead of individual files). At the end of the process we've iterated over the files using
getChildren()
calls recursively. It turns out this is really inefficient for 100k+ files: takes more than 30 minutes (aborted process after that).I've created the file list of the directory, attached as
list.txt.gz
.list.txt.gz
Recreating that with empty content and scanning the files with recursive
getChildren()
call should help to reproduce the issue.The text was updated successfully, but these errors were encountered: