DAS analyzing/ not analyzing loop issue #54712

jwren · 2024-01-23T21:15:20Z

With a recent (within days) master roll of the Dart SDK, the Dart Analysis Server is getting itself into a bad loop with IntelliJ. Loading a simple (macro) project, I get the following traffic from the DAS, on the order of over 8000 notifications that the analyzer is analyzing/ not analyzing:

1706038392215:Noti:{"event"::"server.status","params"::{"analysis"::{"isAnalyzing"::false}}}
1706038392215:Noti:{"event"::"server.status","params"::{"analysis"::{"isAnalyzing"::true}}}
1706038392215:Noti:{"event"::"server.status","params"::{"analysis"::{"isAnalyzing"::false}}}
1706038392215:Noti:{"event"::"server.status","params"::{"analysis"::{"isAnalyzing"::true}}}

The text was updated successfully, but these errors were encountered:

scheglov · 2024-01-23T21:51:38Z

https://dart-review.googlesource.com/c/sdk/+/348082 might fix it.

scheglov · 2024-01-23T22:26:44Z

Although... it might be a different issue.

scheglov · 2024-01-23T22:42:14Z

OK, I see it too.

1706049386780:Noti:{"event"::"server.status","params"::{"analysis"::{"isAnalyzing"::true}}}
1706049386787:Noti:{"event"::"server.status","params"::{"analysis"::{"isAnalyzing"::false}}}
1706049386787:Noti:{"event"::"server.status","params"::{"analysis"::{"isAnalyzing"::true}}}
1706049386794:Noti:{"event"::"server.status","params"::{"analysis"::{"isAnalyzing"::false}}}
1706049386795:Noti:{"event"::"server.status","params"::{"analysis"::{"isAnalyzing"::true}}}
1706049386803:Noti:{"event"::"server.status","params"::{"analysis"::{"isAnalyzing"::false}}}
1706049386806:Noti:{"event"::"server.status","params"::{"analysis"::{"isAnalyzing"::true}}}
1706049386816:Noti:{"event"::"server.status","params"::{"analysis"::{"isAnalyzing"::false}}}
1706049386819:Noti:{"event"::"server.status","params"::{"analysis"::{"isAnalyzing"::true}}}
1706049386829:Noti:{"event"::"server.status","params"::{"analysis"::{"isAnalyzing"::false}}}
1706049386834:Noti:{"event"::"server.status","params"::{"analysis"::{"isAnalyzing"::true}}}
1706049386843:Noti:{"event"::"server.status","params"::{"analysis"::{"isAnalyzing"::false}}}
1706049386844:Noti:{"event"::"analysis.implemented",<cut>

Although for me it is not 0 ms between true and false.
So, there is probably some work happening, and it prevents us from sending these events fast?

bwilkerson · 2024-01-23T22:55:34Z

So, there is probably some work happening, and it prevents us from sending these events fast?

I don't know what that means.

From the standpoint of the server's notifications, we want one notification when the analysis driver goes from idle to working, and one notification when the driver goes from working to idle (with the implication that unless the user changes the content of the analyzed files in some way the status won't change again). The observed behavior doesn't match that goal.

What do we need to do to change this behavior?

scheglov · 2024-01-23T23:02:08Z

Well, what you see matches the description. You ask for an analysis result, e.g. index, the analysis driver switches from idle to working, returns the result, and switches back to idle. Then the search engine, or an operation higher, e.g. implemented markers, asks another question to the driver, it switches to "working" again, etc.

scheglov · 2024-01-23T23:08:41Z

BTW, it looks that we do some work for analytics when switch to idle, and it is quite expensive work.
I guess this is what DAS was doing, and spending 7-8 ms each time.

bwilkerson · 2024-01-23T23:22:07Z

Well, what you see matches the description. You ask for an analysis result, e.g. index, the analysis driver switches from idle to working, returns the result, and switches back to idle. Then the search engine, or an operation higher, e.g. implemented markers, asks another question to the driver, it switches to "working" again, etc.

When I look at the log files on my machine I see the pattern you described: start/stop notifications separated by request/response pairs. But that's not the pattern that was observed. The observed pattern is a long list of start/stop pairs with no other traffic. What's being observed does not match my description of what we want to see.

BTW, it looks that we do some work for analytics when switch to idle, and it is quite expensive work.
I guess this is what DAS was doing, and spending 7-8 ms each time.

Is that observation suppose to be related to the problem being described, or an aside? I honestly can't tell.

jwren · 2024-01-23T23:32:50Z

@DanTup as a FYI. I have been seeing flickering in VS Code, I suspect it is due to the status bar changing when this status changes.

scheglov · 2024-01-24T00:59:13Z

Well, what you see matches the description. You ask for an analysis result, e.g. index, the analysis driver switches from idle to working, returns the result, and switches back to idle. Then the search engine, or an operation higher, e.g. implemented markers, asks another question to the driver, it switches to "working" again, etc.

When I look at the log files on my machine I see the pattern you described: start/stop notifications separated by request/response pairs. But that's not the pattern that was observed. The observed pattern is a long list of start/stop pairs with no other traffic. What's being observed does not match my description of what we want to see.

Here is what happens when you search for a method references in DAS.

We build the type hierarchy of the class that has this method. This is multiple requests to SearchEngine, which does multiple requests to AnalysisDrivers (potentially more than one driver instance).
I will stop describing it here.

Because there are multiple requests to AnalysisDriver, each time it switches from idle to working.

One logical request on the protocol level does multiple operations on lower levels.

BTW, it looks that we do some work for analytics when switch to idle, and it is quite expensive work.
I guess this is what DAS was doing, and spending 7-8 ms each time.

Is that observation suppose to be related to the problem being described, or an aside? I honestly can't tell.

This is a clarification, why there are 7-8 ms delays in my first message, where I provided a portion of the protocol log. It was unclear to me, why there were pauses. Now I know - analytics.

DanTup · 2024-01-24T10:43:27Z

@DanTup as a FYI. I have been seeing flickering in VS Code, I suspect it is due to the status bar changing when this status changes.

Do you mean you just see the "analyzing..." flash on and off, or that more of VS Code is flickering?

We do have a 100ms debounce in the client to stop the status bar being updated too often (though it could still happen 10 times per second). Moving this debounce to the server could make sense (it would save some useless protocol traffic), but someone noted recently that some of the CLI tooling might use these events so we'd need to be careful if we change them.

One thing that's not clear to me here is whether this is stuck in a loop or just generating a bunch of analysis status changes and then settles down (the original title says loop but I think what @scheglov saw/described is finite).

bwilkerson · 2024-01-24T16:07:21Z

I don't have a concrete proposal for how to change the implementation, but I think I know what the requirements for this notification are.

In the command-line analyzer (dart analyze), the notification is used to signal that all of the files that were requested to be analyzed have been analyzed and all of the diagnostic notifications for those files have been sent. It is critical that we not report that analysis is complete until after the last diagnostic notification has been sent.

This was the issue that led to some of the recent re-work in the analysis driver's result stream, and as far as I can tell this is now working as required.

In the IDE, we want to display status information whenever the server is busy for long periods of time and the server might be slow to be responded to other requests. The requirement here is to produce the most useful UX we can.

When performing the initial analysis of the contents of the workspace it is useful to let users know that the server is busy. Similarly when re-analyzing after switching branches. It's less clear that there's any value in letting users know when we are finding references to a method, both because it (hopefully) doesn't take nearly as long and because the lack of search results is likely to be enough of a clue as to what's happening.

Personally, I'd be happy to not have any such notifications while we're searching for references, but if we do we should have one notification when we first switch to busy and one notification just before we return the search results.

Moving this debounce to the server could make sense ...

I think this would be preferable. The server should work consistently across IDEs, and moving this logic to the server would aid in that goal.

It was unclear to me, why there were pauses. Now I know - analytics.

I opened #54716 to track this issue.

scheglov · 2024-01-24T17:39:10Z

I don't have a concrete proposal for how to change the implementation, but I think I know what the requirements for this notification are.

In the command-line analyzer (dart analyze), the notification is used to signal that all of the files that were requested to be analyzed have been analyzed and all of the diagnostic notifications for those files have been sent. It is critical that we not report that analysis is complete until after the last diagnostic notification has been sent.

This was the issue that led to some of the recent re-work in the analysis driver's result stream, and as far as I can tell this is now working as required.

Yes, if all you do is adding files to AnalysisDriver, it should produce single AnalysisStatusWorking followed by single AnalysisStatusIdle.

Note, dart analyze uses the same DAS as IDE.

In the IDE, we want to display status information whenever the server is busy for long periods of time and the server might be slow to be responded to other requests. The requirement here is to produce the most useful UX we can.

When performing the initial analysis of the contents of the workspace it is useful to let users know that the server is busy. Similarly when re-analyzing after switching branches. It's less clear that there's any value in letting users know when we are finding references to a method, both because it (hopefully) doesn't take nearly as long and because the lack of search results is likely to be enough of a clue as to what's happening.

Personally, I'd be happy to not have any such notifications while we're searching for references, but if we do we should have one notification when we first switch to busy and one notification just before we return the search results.

The (not) "nearly as long" is absolutely not guaranteed. I can imagine a tiny Flutter application that uses package:flutter/widgets.dart, and then when you search for anything, it will "discover" all files in package:flutter, all dependencies of your package, and go to index them all. This can easily take more time than the initial analysis of your tiny application.

So, the most honest description from AD is to say "I'm working". If this work is a part of a more complicated work on DAS level, it is up to DAS to track that there is such higher level operation in progress, and keep the status "the server is busy, and will be slow to respond". I think we fail to do this currently, and pretend that DAS is idle, while actually it is busy doing something, and will not respond.

Moving this debounce to the server could make sense ...

I think this would be preferable. The server should work consistently across IDEs, and moving this logic to the server would aid in that goal.

No. Doing debounce is just avoiding to implement the right solution for status tracking. And in practice for non-IDE users, like dart analyze, this would just delay receiving "done" notification for the duration of the debounce period.

DanTup · 2024-01-24T18:04:35Z

No. Doing debounce is just avoiding to implement the right solution for status tracking. And in practice for non-IDE users, like dart analyze, this would just delay receiving "done" notification for the duration of the debounce period.

I think there are two different goals here. Not flicking between idle/busy when we're performing a long operation is one thing, but it's also possible that the server is legitimately changing between those states very frequently servicing different requests. Although caching of priority files might hide this, without the caching we could have:

update overlay
server goes busy, sends notification
server goes idle, sends notification
code completion
server goes busy, sends notification
server goes idle, sends notification
semantic tokens
server goes busy, sends notification
server goes idle, sends notification
code actions
server goes busy, sends notification
server goes idle, sends notification

VS Code will send a lot of requests on every keystroke, and if the server is running very quickly, it could flip between states a lot. I don't think we need the UI to flicker in this case, so I think debouncing these messages makes sense.

And in practice for non-IDE users, like dart analyze, this would just delay receiving "done" notification for the duration of the debounce period.

We could debounce differently for non-IDEs (not long running servers) versus others and we don't need to debounce the end of analysis, only the start. Eg. "when analysis starts, delay showing the status for 100ms and then send it if analysis didn't finish in the meantime". This has the effect of filtering out very short analysis periods, which is really the goal (at least for something like the IDE status bar).

It's possible that the progress notifications in LSP and the legacy server aren't like-for-like though. In LSP it's really just a generic progress indicator intended to show the user is something busy (and not be used as a signal of any specific work being done), whereas in the legacy protocol it's more explicitly the start/stop of analysis.

scheglov · 2024-01-24T18:11:35Z

I think it is fine to delay displaying in IDE switching to/from working/idle.
But DAS should report what actually happens, if it is busy, it is busy.

Note, this discussion might be independent from using AD status as DAS status.
See above about higher level operations vs. AD operations.

jwren · 2024-01-24T18:13:33Z

On a theoretical level I completely agree @scheglov, but the amount of additional traffic here is on the order of additional megabytes per hour of traffic that we are talking about ignoring.

bwilkerson · 2024-01-24T22:20:48Z

Doing debounce is just avoiding to implement the right solution for status tracking.

I'd love to hear what you think the right solution is for status tracking.

scheglov · 2024-01-24T22:26:50Z

DAS is working (AKA {"isAnalyzing":true} in legacy protocol) when either:

AnalysisDriverScheduler is working.
There is a currently running operation, e.g. search.findElementReferences.
There are pending operations in the queue (already parsed from the input stream).

If not (1) or (2) or (3), pump the event queue, if still nothing in (3), DAS is idle.

bwilkerson · 2024-01-25T22:24:37Z

I believe that we now have a plan for fixing this issue:

Introduce a flag that the command-line analyzer can use to preserve the current semantics.
Introduce some debouncing logic into the analysis server (guarded by the flag) to prevent the frequent status updates.

If that proves to not be sufficient, then we can wrap frequently long-running operations, such as refactorings and searches, to suppress the notifications. But we only want to do that if it's necessary.

This name better describes when the status is produced. Inspired by #54712 Change-Id: I23a8f2f3f6ee500d21c8d93dd81f4f78f2b1840e Reviewed-on: https://dart-review.googlesource.com/c/sdk/+/348181 Commit-Queue: Konstantin Shcheglov <scheglov@google.com> Reviewed-by: Brian Wilkerson <brianwilkerson@google.com>

bwilkerson · 2024-01-27T16:11:13Z

The flag is introduced in https://dart-review.googlesource.com/c/sdk/+/348765. There's no code yet for debouncing the notifications, but we have a way to not impact the command-line tool when that's added.

@DanTup Given that we need to update both the server and the client, what's the easiest path forward for you? Do we need a custom server capability so that we don't degrade performance on the client side when the plugin is being used with an older version of the SDK?

DanTup · 2024-01-27T18:07:07Z

I think the only way we could not impact old extensions is to make the server debounce opt-in as a client capability. If it's done by default, then old plugins (that we can't change) would never know the server is debouncing so would always apply another layer.

That said, I'm not sure if having the debounce overlap would really matter. The Dart-Code debounce is 100ms.. if the server was implemented the same, I think it would be the same as increasing the client debounce to 200ms (eg., any analysis that is less than 200ms doesn't show on the status bar, rather than 100ms). I don't feel like that difference would matter because the purpose is not to show that analysis is stopping/starting, but really an indication of "Analysis is still happening" which only really matters when it's longer periods.

So, my suggestion is that we implement it and verify it works as expected above, then just use the SDK version that ships in (via these capabilities flags in Dart-Code) to disable the client-side debouncing, and old plugins will just get the increased debounce duration until they upgrade?

bwilkerson · 2024-01-29T16:18:51Z

That sounds reasonable to me. My only concern was that we not degrade the UX by not debouncing at all (which seemed possible if a version of Dart-Code that didn't debounce was used with a version of the server that also wasn't debouncing.

jwren assigned scheglov Jan 23, 2024

jwren added the analyzer-server label Jan 23, 2024

scheglov closed this as completed Jan 23, 2024

scheglov reopened this Jan 23, 2024

lrhn added the area-analyzer Use area-analyzer for Dart analyzer issues, including the analysis server and code completion. label Jan 24, 2024

scheglov removed their assignment Jan 24, 2024

bwilkerson mentioned this issue Jan 24, 2024

[analytics] Performance issues when reporting project structure #54716

Open

srawlins added the P2 A bug or feature request we're likely to work on label Jan 25, 2024

DanTup mentioned this issue Mar 1, 2024

dart analyze should use LSP to communicate with analysis server #55075

Open

srawlins added the type-bug Incorrect behavior (everything from a crash to more subtle misbehavior) label Mar 15, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

DAS analyzing/ not analyzing loop issue #54712

DAS analyzing/ not analyzing loop issue #54712

jwren commented Jan 23, 2024

scheglov commented Jan 23, 2024

scheglov commented Jan 23, 2024

scheglov commented Jan 23, 2024

bwilkerson commented Jan 23, 2024

scheglov commented Jan 23, 2024

scheglov commented Jan 23, 2024

bwilkerson commented Jan 23, 2024

jwren commented Jan 23, 2024

scheglov commented Jan 24, 2024

DanTup commented Jan 24, 2024

bwilkerson commented Jan 24, 2024

scheglov commented Jan 24, 2024

DanTup commented Jan 24, 2024

scheglov commented Jan 24, 2024 •

edited

Loading

jwren commented Jan 24, 2024

bwilkerson commented Jan 24, 2024

scheglov commented Jan 24, 2024

bwilkerson commented Jan 25, 2024

bwilkerson commented Jan 27, 2024

DanTup commented Jan 27, 2024

bwilkerson commented Jan 29, 2024

DAS analyzing/ not analyzing loop issue #54712

DAS analyzing/ not analyzing loop issue #54712

Comments

jwren commented Jan 23, 2024

scheglov commented Jan 23, 2024

scheglov commented Jan 23, 2024

scheglov commented Jan 23, 2024

bwilkerson commented Jan 23, 2024

scheglov commented Jan 23, 2024

scheglov commented Jan 23, 2024

bwilkerson commented Jan 23, 2024

jwren commented Jan 23, 2024

scheglov commented Jan 24, 2024

DanTup commented Jan 24, 2024

bwilkerson commented Jan 24, 2024

scheglov commented Jan 24, 2024

DanTup commented Jan 24, 2024

scheglov commented Jan 24, 2024 • edited Loading

jwren commented Jan 24, 2024

bwilkerson commented Jan 24, 2024

scheglov commented Jan 24, 2024

bwilkerson commented Jan 25, 2024

bwilkerson commented Jan 27, 2024

DanTup commented Jan 27, 2024

bwilkerson commented Jan 29, 2024

scheglov commented Jan 24, 2024 •

edited

Loading