Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(macos) Analyzer sometimes stops responding in VSCode when mass file change events occur #45996

Closed
acoutts opened this issue May 12, 2021 · 23 comments
Labels
analyzer-server analyzer-stability area-analyzer Use area-analyzer for Dart analyzer issues, including the analysis server and code completion. P2 A bug or feature request we're likely to work on type-bug Incorrect behavior (everything from a crash to more subtle misbehavior)

Comments

@acoutts
Copy link

acoutts commented May 12, 2021

I've been dming @DanTup for a couple of weeks about this issue (first reported here: Dart-Code/Dart-Code#3295

After tons and tons of testing to narrow things down it seems like when you have a large project open and there's an event which triggers a ton of file changes at once, the analyzer stops responding in vscode and never recovers from its analyzing state.

I cannot reliably reproduce the issue, but i've narrowed down to it sometimes can occur while doing the following:

  • switch git branches
  • flutter clean
  • flutter run
  • flutter pub get
  • adding/removing a folder from the vscode workspace

These seem to cause a lot of file changes and sometimes (quite often) results in the analyzer not recovering. To fix it you have to stop your debug session, restart the analyzer, and do a full build again to start over. It has been quite annoying and some others are noticing it too (cc @lukepighetti).

I tried adding the ios and .git folders to dart.analysisExcludedFolders but it had no effect.

@DanTup mentioned there was a similar issue in windows related to file watchers and he suspected it could be the same issue affecting macos here too.

In all cases everyone I know that has experienced the issue has been running macos. In all of my affected instances i'm running an iOs build (both simulator and physical device are the same effect). All of my affected systems have quite high specs in terms of disk speed.

My affected systems:

  • M1 macbook pro (16gb ram model)
  • Hackintosh desktop (i7 9700k, 64gb ram, NVMe)

I created a test project here with a large number of dart files (10,000) to try and reproduce the issue:
https://github.com/acoutts/dart-crash-issue

But in testing that project alone won't break things. I have to add my (large) app to the vscode workspace and that's when the issues start. The only changes I can think of between these are my project has more packages (more pods maybe - larger iOS folder) and a huge git history (800mb+ .git folder, using LFS as well).

Dart SDK version: 2.14.0-74.0.dev (dev) (Mon May 3 05:56:14 2021 -0700) on "macos_x64"
Flutter 2.3.0-2.0.pre.89 • channel master • https://github.com/flutter/flutter.git
Framework • revision 6e08ea0a2c (2 days ago) • 2021-05-10 22:34:02 +0530
Engine • revision f5b97d0b23
Tools • Dart 2.14.0 (build 2.14.0-74.0.dev)
Version: 1.57.0-insider
Commit: 54d33cf14cf6c0e86880b5b74a5a3628de42bce9
Date: 2021-05-10T05:12:57.526Z (2 days ago)
Electron: 12.0.4
Chrome: 89.0.4389.114
Node.js: 14.16.0
V8: 8.9.255.24-electron.0
OS: Darwin x64 20.4.0
@DanTup
Copy link
Collaborator

DanTup commented May 12, 2021

@aam this reminds me a lot of #40348, although there it was clear the server had crashed, but here it seems like it just stops responding (I don't know if that's because it's a different issue, or crashes behave differently on macOS).

I've not been able to reproduce this myself. Is there anything @acoutts could do that might help understand what's happening (for example getting some sort of dump of the analyzer VM process that could be examined)?

@acoutts
Copy link
Author

acoutts commented May 12, 2021

Using the same pubspec as my big app doesn't seem to break it, so it's not something inside the ios folder (we thought symlinks or maybe large file counts).

@acoutts
Copy link
Author

acoutts commented May 12, 2021

I managed to open the observatory after reproducing the issue. There's one isolate that says loading, but when I tap it, I just get a blank page.

Screen Shot 2021-05-12 at 12 08 57 PM

Screen Shot 2021-05-12 at 12 08 47 PM

@acoutts
Copy link
Author

acoutts commented May 12, 2021

If I open that isolate's debugger from the observatory before managing to reproduce the bug, I can view the output so it seems like you have to connect to that prior to the bug happening.

@srawlins srawlins added analyzer-server analyzer-stability area-analyzer Use area-analyzer for Dart analyzer issues, including the analysis server and code completion. P2 A bug or feature request we're likely to work on type-bug Incorrect behavior (everything from a crash to more subtle misbehavior) labels May 12, 2021
@matejthetree
Copy link

Happens to me also on intellij. What I noticed, is that If I enter the file where there is no analysis, and I change somethin (add and delete space) the analyzer recovers for that file only.

I have to do it in every file, or just invalidate caches and restart.

Let me know if there is something else that I can do

@acoutts
Copy link
Author

acoutts commented May 20, 2021

Is there any chance we can bump the priority on this one? It's really painful having to frequently pause everything to stop debug sessions, restart the analyzer, and do a new build again every time I change branches or run a build.

@bounty1342
Copy link

Seems to have trigger it once also.

Doctor summary (to see all details, run flutter doctor -v):
[✓] Flutter (Channel stable, 2.2.0, on macOS 11.2.3 20D91 darwin-x64, locale en-GB)
[✓] Android toolchain - develop for Android devices (Android SDK version 30.0.3)
[✓] Xcode - develop for iOS and macOS
[✓] Chrome - develop for the web
[✓] Android Studio
[✓] Android Studio (version 4.2)
[✓] VS Code (version 1.56.2)
[✓] Connected device (3 available)

• No issues found!

@chgibb
Copy link

chgibb commented May 23, 2021

We've just taken for granted that enabling/disabling sparse checkouts requires a restart of VSCode. I wonder if this is why.

@bounty1342
Copy link

I have my model generated with 'flutter pub run build_runner build --delete-conflicted-outputs'.
The analyser seems to fail to pickup the changes there. I also generate my I18n, and changes are not picked by default.
But if I rerun the analyser manually it finally works.

There, even that failed and needed to restart VS code entirely.

@aam
Copy link
Contributor

aam commented May 24, 2021

@aam this reminds me a lot of #40348, although there it was clear the server had crashed, but here it seems like it just stops responding (I don't know if that's because it's a different issue, or crashes behave differently on macOS).

I've not been able to reproduce this myself. Is there anything @acoutts could do that might help understand what's happening (for example getting some sort of dump of the analyzer VM process that could be examined)?

Not sure whether VM-level logging will reveal anything useful as it might be too low-level. Perhaps enabling logging of analysis server could reveal last thing it does when it hangs?

@scheglov

@scheglov
Copy link
Contributor

I observed the analysis server going into such state when its main isolate is "loading" a couple of times before. But I have always thought that it is not an issue of the Dart code, but of the VM. Even when there is a cycle in Dart code, the Observatory should continue profiling and show statistics, right?

We also had (and fixed, and re-enabled null safety) flutter/flutter#81492.

@DanTup
Copy link
Collaborator

DanTup commented May 24, 2021

@aam there are a few logs in the Dart-Code issue linked above - the last lines logged by the analysis server instrumentation file are watcher events (they look like files the server would just ignore, though I don't know if there were further events that occurred but weren't logged/flushed to disk):

1620421259477:Watch:<unknown>:/Users/andrewcoutts/Projects/app/ios/Runner.xcodeproj/project.pbxproj:modify
1620421261126:Watch:<unknown>:/Users/andrewcoutts/Projects/app/ios/Flutter/Generated.xcconfig:modify
1620421261126:Watch:<unknown>:/Users/andrewcoutts/Projects/app/ios/Flutter/flutter_export_environment.sh:modify
1620421261226:Watch:<unknown>:/Users/andrewcoutts/Projects/app/.flutter-plugins:modify
1620421261226:Watch:<unknown>:/Users/andrewcoutts/Projects/app/.flutter-plugins-dependencies:modify
1620421262523:Watch:<unknown>:/Users/andrewcoutts/Projects/app/ios/Flutter/Flutter.podspec:modify

We also had (and fixed, and re-enabled null safety) flutter/flutter#81492.

@acoutts can you still repro on versions since that change landed?

@acoutts
Copy link
Author

acoutts commented May 24, 2021

Since that change is in the flutter engine, wouldn't that not be relevant here since the issue is with dart and affects vscode even without a debug session active (ie: change git branches)?

@DanTup
Copy link
Collaborator

DanTup commented May 24, 2021

@acoutts the fix linked in that issue is in the analyzer. If I understand correctly, the roll was blocked on tests that run dart fix (which also uses the analyzer code).

@acoutts
Copy link
Author

acoutts commented May 25, 2021

I've just confirmed the issue is still happening on the latest dev release v2.3.0-12.1.pre which should have that fix in this as it's 2+ weeks before this version.

@DanTup
Copy link
Collaborator

DanTup commented May 26, 2021

@acoutts thanks for testing!

@aam is there any way to tell what's happening in the VM when this happens? Like @scheglov, I'd assumed that if the analyzer was just stuck in a loop that the Isolate would still be inspectable in Observatory. Do you know if that's the case?

@DanTup
Copy link
Collaborator

DanTup commented May 26, 2021

I've managed to reproduce this, and can now do so reliably (on my Mac at least). I'm triggering it by changing Git branches between two versions of flutter_gallery:

cd /tmp
git clone https://github.com/DanTup/flutter_gallery flutter_gallery
cd flutter_gallery
git checkout branch1

In another terminal, run the analysis server (it happens the same whether run from source or snapshot):

/Users/danny/Dev/Google/flutter/bin/cache/dart-sdk/bin/dart  --enable-vm-service=8999 --disable-service-auth-codes /Users/danny/Dev/Google/flutter/bin/cache/dart-sdk/bin/snapshots/analysis_server.dart.snapshot --instrumentation-log-file=/tmp/instrum.txt

Paste this line in to start the server analyzing and wait for it to complete:

{"id":"3","method":"analysis.setAnalysisRoots","params":{"excluded":[],"included":["/tmp/flutter_gallery"]}}

Paste this line in to verify the server is responsive (it will respond with an error):

{"id":"999","method":"foo","params":{}}

Wait for analysis to completely finish then open http://localhost:8999/ to get Observatory for the analysis server. Now in the Git terminal window, change branch:

git checkout branch2

The server now becomes unresponsive and no longer responses to sending the dummy "foo" request from above. Reloading the Observatory window shows the Isolate "loading" and unable to access the debugger functionality for.

I'll dig around in the server source a little to see if reducing/removing the code that runs in the watcher events makes any difference.

@DanTup
Copy link
Collaborator

DanTup commented May 26, 2021

@aam @scheglov I did some digging and found that removing this call to _createAnalysisContexts here stopped the issue (at least for this repro):

if (file_paths.isAnalysisOptionsYaml(pathContext, path) ||
file_paths.isDotPackages(pathContext, path) ||
file_paths.isPackageConfigJson(pathContext, path) ||
file_paths.isPubspecYaml(pathContext, path) ||
false) {
_createAnalysisContexts();
return;
}

I inlined that method and stripped out code until I was left with this triggering it:

if (file_paths.isAnalysisOptionsYaml(pathContext, path) ||
    file_paths.isDotPackages(pathContext, path) ||
    file_paths.isPackageConfigJson(pathContext, path) ||
    file_paths.isPubspecYaml(pathContext, path) ||
    false) {
  var collection = _collection;
  if (collection != null) {
    for (var context in collection.contexts) {
      var rootFolder = context.contextRoot.root;
      changeSubscriptions.remove(rootFolder)?.cancel();
    }
  }
  return;
}

It seems like cancelling the DirectoryWatchers stream inside its handler may be causing this. I was able to repro this in a simple script:

import 'dart:async';
import 'dart:io';

import 'package:watcher/watcher.dart';

late StreamSubscription<WatchEvent> sub;

main() {
  final folder = Directory('/tmp/flutter_gallery');
  Timer.periodic(Duration(seconds: 1), (_) => print('ping!'));
  sub = DirectoryWatcher(folder.path)
      .events
      .listen(_handleWatchEvent, onError: _handleWatchInterruption);
}

void _handleWatchEvent(WatchEvent e) {
  print(e.path);
  // sub.cancel(); // Uncommenting this triggers the issue
}

void _handleWatchInterruption(e) {
  print('Watcher error! $e');
}

If I run this script and switch between branches, it dumps a list of all the files as they are modified and prints ping every second. However if I uncomment the sub.cancel() call and run it, only the first file is printed and then ping is not printed any more:

Screenshot 2021-05-26 at 16 30 26

And the isolate looks the same in Observatory:

Screenshot 2021-05-26 at 16 30 57

I'm not sure whether this points to watcher or the VM, but it's probably as far as I can debug (at least without some pointers).

@aam
Copy link
Contributor

aam commented May 26, 2021

Awesome investigative work @DanTup, thank you! Yeah, something is up with the way how file system watchers are set up and torn down. Let me see if I can dig deeper.

@scheglov
Copy link
Contributor

@DanTup this is amazing. Thank you for finding this!

@aam
Copy link
Contributor

aam commented May 26, 2021

Hopefully https://dart-review.googlesource.com/c/sdk/+/201442 addresses the problem

@aam
Copy link
Contributor

aam commented May 26, 2021

there might be more cases where this could block: adding a new watcher seems to lead to a similar deadlock

@acoutts
Copy link
Author

acoutts commented May 29, 2021

Confirmed that this fixes it! It's now in flutter master channel.

dart-bot pushed a commit that referenced this issue Jun 7, 2021
… dart thread.

At present start/stop requests are scheduled on RunLoop thread.
This results in deadlocks since same RunLoop thread might be busy
with blocking writes of file watching events, not giving a chance
for Dart to read previously-written events. Reading would unblock
writer.
So this CL moves start/stop requests to run on Dart thread instead.

Fixes #45996

TEST=run analysis_server against flutter_gallery github-backed folder, switch between branches and ensure analyze_server remains responsive

Change-Id: I0464eeecf8e46ba3027fa0ed21cc323495d965c3
Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/201442
Commit-Queue: Alexander Aprelev <aam@google.com>
Reviewed-by: Siva Annamalai <asiva@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
analyzer-server analyzer-stability area-analyzer Use area-analyzer for Dart analyzer issues, including the analysis server and code completion. P2 A bug or feature request we're likely to work on type-bug Incorrect behavior (everything from a crash to more subtle misbehavior)
Projects
None yet
Development

No branches or pull requests

8 participants