Simplify diagnostic tagging by making it use the standard tagging model. #23448

CyrusNajmabadi · 2017-11-28T23:36:06Z

Customer scenario

A user is working on a big solution and switches between git branches, or closes and reopens solutions, multiple times. In some unfortunate cases, VS will crash with an out-of-memory exception.

Bugs this fixes

Fixes #24055
Fixes DevDiv 512757

Supersedes #22920, #23377, #23409, #23411

Workarounds, if any

After each git branch switch or solution open/close, give VS some time to process the pending work enqueued by the operation.

Risk

This simplifies our diagnostic tagger dramatically, so there is a risk that behavior might not be exactly the same as before. But we believe this is the right direction to go.

Performance impact

This should completely remove OOMs caused by too many pending UI work items, which are the source of most of our OOM crashes.

Is this a regression from a previous update?

No

Root cause analysis

Previously, the diagnostic service didn't support a pull model for all diagnostic sources. So the tagger used events (a push model) to hold onto the last reported diagnostics and later used them to report tags. That made us use custom logic for the tagger, which ended up forcing us to use the UI thread to synchronize a lot of state and remove potential races. And that caused us to push too many work items to the UI thread in certain cases, such as git branch switching.

Now that the diagnostic service fully supports the pull model for all sources, this change moves the diagnostic taggers onto our tagger framework, which doesn't require the UI thread for state synchronization. That removes the root cause of the OOM from the picture completely.

How was the bug found?

MemoryWatson

....

more dev detail.

From a conversation with @heejaechang #23409 (comment)

A while back the diagnostics subsystem had a limitation where you could only hear about some diagnostics if you explicitly listened for diagnostic events. i.e. if you weren't listening and capturing those events, you couldn't go back and ask for those diagnostics later. This meant that we couldn't do diagnostic tagging (squiggles/fading/suggestions) like we did normal tagging. Normal tagging hears about events, pauses a bit, then goes and gets all the data necessary later to produce the tags. Because that data wasn't available 'later', diagnostic tagging had to aggregate the info and contort things to fit into the tagging infrastructure.

This restriction from the diagnostics service no longer exists. That means we can greatly simplify how we do our tagging computation.
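To make the pull model concrete, here is a minimal Python sketch. It is not Roslyn code: `DiagnosticStore`, `PullTagger`, `report`, and `get_diagnostics` are hypothetical names chosen for illustration. The point is only that the tagger captures nothing from the event and asks for the data later.

```python
from dataclasses import dataclass, field


@dataclass
class DiagnosticStore:
    """Hypothetical stand-in for a diagnostic service that can be queried later."""
    _by_document: dict = field(default_factory=dict)

    def report(self, document_id: str, provider_id: str, diagnostics: list) -> None:
        # Keep the latest diagnostics per (document, provider).
        self._by_document.setdefault(document_id, {})[provider_id] = list(diagnostics)

    def get_diagnostics(self, document_id: str) -> list:
        # Pull: return everything currently known for the document,
        # ordered by provider id so the result is deterministic.
        per_provider = self._by_document.get(document_id, {})
        return [d for provider in sorted(per_provider) for d in per_provider[provider]]


class PullTagger:
    """Holds no diagnostic state of its own; it only remembers that something changed."""

    def __init__(self, store: DiagnosticStore, document_id: str) -> None:
        self._store = store
        self._document_id = document_id
        self.dirty = False

    def on_diagnostics_changed(self) -> None:
        # Cheap: no payload is captured from the event.
        self.dirty = True

    def produce_tags(self) -> list:
        # Later (after a pause, on a background thread in the real system),
        # go back and ask for all the data needed to build the tags.
        self.dirty = False
        return [f"squiggle:{d}" for d in self._store.get_diagnostics(self._document_id)]


store = DiagnosticStore()
tagger = PullTagger(store, "a.cs")
store.report("a.cs", "compiler", ["CS0103"])
store.report("a.cs", "analyzer", ["IDE0005"])
tagger.on_diagnostics_changed()
tags = tagger.produce_tags()
```

Because the tagger keeps no copy of the diagnostics, there is no state shared between the event handler and the tag computation that would need synchronizing.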

CyrusNajmabadi · 2017-11-28T23:36:37Z

Tagging @heejaechang @sharwell .

Note: this approach is incorrect in a very fundamental way currently. Wanted to start the discussion with you on how we might fix that.

CyrusNajmabadi · 2017-11-28T23:43:11Z

@heejaechang I took the approach whereby the event passes along the diagnostic id that was updated. However, this approach is fundamentally flawed with how tagging works currently. The reason for that is that tagging is hugely cancellable. i.e. any time tagging gets an event, it cancels the current work and enqueues the new recomputation. The assumption is that any recomputation supersedes any previous or inflight computation.

That's not the case with this optimization. For example, if we heard about a global "recompute tags for all providers" notification, we can't supersede that with a request to recompute tags for a single provider id.
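The problem can be reproduced with a toy model. This is a Python sketch under the assumption that "cancel and re-enqueue" behaves like a single-slot queue where the latest request wins; `LatestWinsQueue` and the provider names are hypothetical and are not part of the tagging infrastructure.

```python
class LatestWinsQueue:
    """Models 'any new event cancels the in-flight work and replaces it'."""

    def __init__(self):
        self._pending = None        # the single request that will actually run
        self._has_pending = False

    def enqueue(self, provider_id):
        # provider_id=None means "recompute tags for all providers".
        # Whatever was pending before is dropped (cancelled).
        self._pending = provider_id
        self._has_pending = True

    def run(self, all_providers):
        # Returns the list of providers whose tags get recomputed.
        if not self._has_pending:
            return []
        self._has_pending = False
        if self._pending is None:
            return list(all_providers)
        return [self._pending]


queue = LatestWinsQueue()
queue.enqueue(None)           # global "recompute everything" notification
queue.enqueue("analyzer-A")   # a later single-provider update cancels it
recomputed = queue.run(["compiler", "analyzer-A", "analyzer-B"])
```

Here `recomputed` contains only `"analyzer-A"`, so the tags for the other two providers are never refreshed even though a global recompute was requested.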

We have a few options here afaict (other suggestions welcome!):

1. Don't pass along the data about which provider changed. Always recompute all providers. Pros: super simple. Cons: possibly expensive (though we are on the BG).
2. Somehow make some parts of tagging conditionally cancellable, so a single provider update could cancel an update for that provider, but a provider update would not cancel the updates for other providers. Pros: fine grained updating. Cons: super complex (i can't see a suitable way to do this, not with how cancellation is threaded through tagging).
3. Have the diagnostic service track which providers changed. Have a way for tagging to call in and ask if things have changed for any of the Ids. Pros: simple on the tagging side. Cons: may not actually be any more efficient. May be just as complex on the diagnostic side.

Thoughts?

heejaechang · 2017-11-29T00:46:28Z

I think we can go with the simple approach, do some perf investigation, and see whether we need to optimize more.

...

By the way, even with this approach, it doesn't remove the fact that the diagnostic service is based on the solution while the tagger is based on the buffer, and the buffer-to-workspace and buffer-to-document associations can change dynamically. In other words, we still need to filter out diagnostic events that are not associated with this subject buffer.

(see https://github.com/dotnet/roslyn/pull/22920/files#r153666048 if we want document specific events)

..

We can also make the diagnostic event source aggregate object ids, the same way we aggregate text changes in the text change event source.
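One way to picture that aggregation is the following Python sketch. `AggregatingEventSource`, `on_diagnostics_updated`, and `take_changes` are hypothetical names; the real event source is not shown in this thread. Ids accumulate in a set until the tagger drains them in one batch.

```python
import threading


class AggregatingEventSource:
    """Collects the ids that changed since the last time the tagger asked."""

    def __init__(self):
        self._lock = threading.Lock()
        self._changed_ids = set()

    def on_diagnostics_updated(self, object_id):
        # Called once per diagnostic event; duplicates collapse in the set.
        with self._lock:
            self._changed_ids.add(object_id)

    def take_changes(self):
        # Atomically hand over the accumulated ids and start a fresh batch.
        with self._lock:
            changed, self._changed_ids = self._changed_ids, set()
        return changed


source = AggregatingEventSource()
for object_id in ["compiler", "analyzer-A", "compiler"]:
    source.on_diagnostics_updated(object_id)

first_batch = source.take_changes()
second_batch = source.take_changes()
```

Since ids are merged rather than replaced, a cancelled recomputation does not lose track of which providers changed, which is the concern raised earlier about passing a single id per event.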

CyrusNajmabadi · 2017-11-29T00:48:21Z

By the way, even with this approach, it doesn't remove the fact that the diagnostic service is based on the solution while the tagger is based on the buffer, and the buffer-to-workspace and buffer-to-document associations can change dynamically. In other words, we still need to filter out diagnostic events that are not associated with this subject buffer.

That already happens here: https://github.com/dotnet/roslyn/blob/master/src/EditorFeatures/Core/Shared/Tagging/EventSources/TaggerEventSources.DiagnosticsChangedEventSource.cs#L28

CyrusNajmabadi · 2017-11-29T00:48:45Z

I think we can go with the simple approach, do some perf investigation, and see whether we need to optimize more.

Do you mean approach #1 ?

heejaechang · 2017-11-29T00:50:39Z

ha, we had that before? interesting...

heejaechang · 2017-11-29T00:55:30Z

@CyrusNajmabadi yep. I mean (1).

Basically, whenever the event source changes, we do what we currently do for initial tags: ask for all diagnostics reported on this file, filter out the empty and unrelated ones (suggestion/unnecessary/regular), and report the rest. We no longer need sub-taggers since we always report all of them.

Really simple, and for the common case probably good enough (since most of them will be empty or have a small number of errors). It will probably have issues in corner cases where a file contains thousands of errors, since it will repeatedly report those.
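A rough Python sketch of that flow follows. The `Diagnostic` record, the `kind` labels, and `produce_tags` here are hypothetical stand-ins and not the actual `AbstractDiagnosticsTaggerProvider` code: on every event, take everything reported, skip empty groups, keep only what belongs to this file and this kind of tagger, and report the rest.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Diagnostic:
    document_id: str
    kind: str     # "regular", "suggestion", or "unnecessary" (illustrative labels)
    start: int
    length: int


def produce_tags(diagnostics_by_provider, document_id, kind):
    """Recompute all tags for one file and one kind of tagger from scratch."""
    tags = []
    for provider_id, diagnostics in diagnostics_by_provider.items():
        if not diagnostics:
            continue  # most providers report nothing for a given file
        for d in diagnostics:
            if d.document_id == document_id and d.kind == kind:
                tags.append((d.start, d.length))
    return sorted(tags)


reported = {
    "compiler": [
        Diagnostic("a.cs", "regular", 10, 5),
        Diagnostic("b.cs", "regular", 0, 3),      # different file: filtered out
    ],
    "analyzer-A": [],                             # empty: skipped
    "analyzer-B": [
        Diagnostic("a.cs", "unnecessary", 20, 4), # different tagger kind: filtered out
        Diagnostic("a.cs", "regular", 2, 1),
    ],
}

squiggles = produce_tags(reported, "a.cs", "regular")
```

The cost is proportional to the total number of diagnostics on the file for every event, which is why the thousands-of-errors corner case is called out.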

CyrusNajmabadi · 2017-11-29T00:59:28Z

It will probably have issues in corner cases where a file contains thousands of errors, since it will repeatedly report those.

Well, we'll still always diff the spans against the old ones. So we'll only report the change.
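As an illustration of why recomputing everything does not mean re-reporting everything, here is a small Python sketch. It uses plain sets as a stand-in; the assumption is only that tags can be compared for equality, and `diff_tags` is a hypothetical helper rather than the tagging framework's own diffing.

```python
def diff_tags(old_tags, new_tags):
    """Return (added, removed) between two full tag computations."""
    old_set, new_set = set(old_tags), set(new_tags)
    added = sorted(new_set - old_set)
    removed = sorted(old_set - new_set)
    return added, removed


# Each tag is (start, length, diagnostic id); values are made up for the example.
old = [(0, 4, "CS0103"), (10, 2, "CS0168")]
new = [(0, 4, "CS0103"), (30, 5, "IDE0005")]

added, removed = diff_tags(old, new)
```

Only the tags in `added` and `removed` would need to be surfaced as a change; the unchanged `CS0103` tag produces no notification.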

CyrusNajmabadi · 2017-11-29T01:00:34Z

Ok. I pushed the simpler model. take a look when you get a chance.

CyrusNajmabadi · 2017-11-29T01:06:42Z

src/EditorFeatures/Core/Implementation/Diagnostics/AbstractDiagnosticsTaggerProvider.cs

Logic copied from:
https://github.com/dotnet/roslyn/pull/23448/files#diff-736f55c2945c85cc9e7af36d167b5145L132

CyrusNajmabadi · 2017-11-29T01:07:39Z

src/EditorFeatures/Core/Implementation/Diagnostics/AbstractDiagnosticsTaggerProvider.cs

From:
https://github.com/dotnet/roslyn/pull/23448/files#diff-736f55c2945c85cc9e7af36d167b5145L60

CyrusNajmabadi · 2017-11-29T01:09:03Z

Tagging @sharwell @heejaechang @dotnet/roslyn-ide -500 lines in diagnostic tagging FTW.

Note: this should probably be smoke tested and also perf tested. While it simplifies things greatly and pushes a lot of work to the BG, it may end up performing more work and may have unintended consequences.

CyrusNajmabadi · 2017-11-29T07:03:10Z

Tagging @dotnet/roslyn-ide

sharwell · 2017-11-29T12:03:58Z

📝 I'm planning to look at this for 15.7+ unless we get indications of more substantial problems.

heejaechang · 2017-11-29T13:58:34Z

src/EditorFeatures/Core/Implementation/Diagnostics/AbstractDiagnosticsTaggerProvider.cs

I would add an ActiveContextChanged event source here as well, and remove this (http://source.roslyn.io/#Microsoft.CodeAnalysis.Features/Diagnostics/DiagnosticAnalyzerService_IncrementalAnalyzer.cs,58)

It looks like we added the code above as a quick way to handle context changes when they were first introduced to the workspace, but that is not the right way to handle it. DiagnosticAnalyzerService shouldn't care about buffers; it should only care about the solution.

hrmm.. can a buffer even cross workspaces though? i think it can (from Misc Workspace to normal workspace, and vice versa). So i think we need WorkspaceRegistrationChanged as well. I will add ActiveContextChanged though.

heejaechang · 2017-11-29T14:00:26Z

src/EditorFeatures/Core/Implementation/Diagnostics/AbstractDiagnosticsTaggerProvider.cs

can buffer ever be null? I don't believe snapshot can have null buffer.

probably not. though i was just preserving the code from before. so i would prefer to not change this.

heejaechang · 2017-11-29T14:01:46Z

src/EditorFeatures/Core/Implementation/Diagnostics/AbstractDiagnosticsTaggerProvider.cs

can we move this try/catch down to ProduceTags for one specific updateArg so that even if 1 fails, we still can get other analyzers squiggles?
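The suggestion amounts to narrowing the scope of the error handling. A Python sketch of the idea, where `produce_tags_for`, `produce_all_tags`, and the dictionary-shaped update args are hypothetical stand-ins for the per-updateArg `ProduceTags` call:

```python
def produce_tags_for(update_arg):
    """Stand-in for producing tags from one provider's update."""
    if update_arg["provider"] == "broken-analyzer":
        raise RuntimeError("analyzer failed")
    return [f"{update_arg['provider']}:{span}" for span in update_arg["spans"]]


def produce_all_tags(update_args):
    """Handle failures per update so one bad provider doesn't hide the others."""
    tags, failures = [], []
    for arg in update_args:
        try:
            tags.extend(produce_tags_for(arg))
        except Exception as ex:
            # Record the failure and keep going with the remaining providers.
            failures.append((arg["provider"], str(ex)))
    return tags, failures


update_args = [
    {"provider": "compiler", "spans": [(0, 4)]},
    {"provider": "broken-analyzer", "spans": [(5, 2)]},
    {"provider": "analyzer-B", "spans": [(9, 1)]},
]

tags, failures = produce_all_tags(update_args)
```

With a single try/catch around the whole loop, the failure in the second entry would also drop the squiggles from `analyzer-B`.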

heejaechang · 2017-11-29T14:03:27Z

Awesome! Things got much simpler, and all the tests passed!

heejaechang · 2017-11-29T14:04:31Z

probably need to do some manual testing before we check this in.

@jasonmalinowski which branch could we check this into without affecting ask mode? Like a post-15.6 branch or something?

jasonmalinowski · 2017-12-01T01:14:20Z

@heejaechang We have no such branch -- even if we did nobody would be looking at it. If your desire is for some dogfooding/testing on it, there's a few things we can do.

heejaechang · 2017-12-01T20:23:06Z

If we have a few releases left before 15.6 RTM, then I think we should just check this in to 15.6.

jasonmalinowski

The whole determination of editorSnapshot independent of requestedSnapshot seems really fishy here.

jasonmalinowski · 2018-01-04T02:17:48Z

src/EditorFeatures/Core/Implementation/Diagnostics/AbstractDiagnosticsTaggerProvider.cs

Why is this preferred over just GetText()? It should be available either way so be equally fast. But if something goes sideways, this version is broken.

agreed. i can change to just using GetText.

jasonmalinowski · 2018-01-04T02:24:03Z

src/EditorFeatures/Core/Implementation/Diagnostics/AbstractDiagnosticsTaggerProvider.cs

How is this ever different than spanToTag.SnapshotSpan.Snapshot? This code and then the .TranslateTo in ProduceTags(...) implies with much vigor they can be different, but I don't see how.

i think some of this logic dates back like 10ish years. It's quite possible that it was back at a time where we had fewer invariants about Documents/SourceTexts/Snapshots. I'm happy to remove (or assert it must be the same).

Well, it can only date 8 years ago... ;-)

Where did this move from originally? Was originally the snapshot you're mapping from related to the snapshot the diagnostics were computed for? It implies I can get stale spans, but since those buffers will be the same we're not correctly mapping things forward.

This moved from: https://github.com/dotnet/roslyn/blob/master/src/EditorFeatures/Core/Implementation/Diagnostics/AbstractDiagnosticsTaggerProvider.AggregatingTagger.cs#L290

Comment on that says:

// Make sure we can find an editor snapshot for these errors. Otherwise we won't
// be able to make ITagSpans for them. If we can't, just bail out. This happens
// when the solution crawler is very far behind. However, it will have a more
// up to date document within it that it will eventually process. Until then
// we just keep around the stale tags we have.

I think a lot of the complexity here was that in the old system you had notifications from the diagnostic engine about a set of data, and we were trying to relate it to the editor-based data. They were moving at different speeds, with different invariants, and we had code like this to try to be resilient to that sort of thing.

If we're now just querying the diagnostic subsystem with the correct tagger info we have, i don't think we have any issues and we can likely clean this up even more.

Yup, looking at this, i don't think we need it anymore.

The old one did a "translate to" because the DiagnosticUpdated event was not part of the tag event source. Because of that, it didn't update the internal interval tree when a diagnostic was updated; it just held onto the snapshot at that moment and later did the "translate to" itself to move the spans to the right snapshot. So at the time, the 2 documents were actually different (one saved when DiagnosticUpdated was called, and one from when the event source was raised and ProduceTags was called).

The new one no longer does that, since DiagnosticUpdated is now part of the tag event source. When diagnostics change, it saves the spans in the tagger's internal interval tree and lets the tagger's own mechanism handle the "translate to". So now there is only 1 document: the one from SpansToTag.Document. The code seems to be just an artifact copied over from the existing code, but I think it is only a matter of how it gets to the same data. I don't think it will cause a bug, since the 2 should be the same.

"// Make sure we can find an editor snapshot for these errors. Otherwise we won't
// be able to make ITagSpans for them. If we can't, just bail out. This happens
// when the solution crawler is very far behind. However, it will have a more
// up to date document within it that it will eventually process. Until then
// we just keep around the stale tags we have."

Agree this is not needed anymore, since there are no longer 2 documents as there used to be.

CyrusNajmabadi · 2018-01-11T19:00:15Z

Not a problem (i hope).

CyrusNajmabadi · 2018-01-11T19:03:50Z

@jasonmalinowski Did i do this properly?

jasonmalinowski · 2018-01-11T19:38:27Z

@CyrusNajmabadi Well done!

jinujoseph · 2018-01-11T20:01:23Z

Thanks @CyrusNajmabadi
Adding @Pilchie for ask mode approval

jinujoseph · 2018-01-11T20:10:50Z

just realized this is missing ask mode template

CyrusNajmabadi · 2018-01-11T20:48:37Z

@jasonmalinowski Can you help with the template?

CyrusNajmabadi · 2018-01-11T20:49:55Z

Also, i would recommend this be smoke tested.

heejaechang · 2018-01-11T21:37:05Z

@jinujoseph added ask mode template.

sharwell · 2018-01-11T21:59:19Z

@heejaechang I updated the ask mode template to link the other bugs

CyrusNajmabadi · 2018-01-11T22:09:21Z

thanks @sharwell !

heejaechang · 2018-01-11T22:20:01Z

@jinujoseph @mattscheffer we probably want to have manual testing as soon as possible.

jinujoseph · 2018-01-11T22:21:36Z

@richaverma1 to help with manual testing

Pilchie · 2018-01-12T02:57:49Z

The scenario meets the bar. Consider this approved, pending satisfactory results from the manual testing. @richaverma1 Can you add the "Approved to merge" label when testing is complete and you are happy with the results?

heejaechang · 2018-01-16T20:51:28Z

merged! thank you everyone! thank you @CyrusNajmabadi

CyrusNajmabadi · 2018-01-16T20:53:24Z

Any time!

yuvalshtemer · 2018-05-21T11:46:44Z

Pardon me for being a github beginner, but I'm in real pain because of this issue.. would appreciate some assistance on how to pull & install the fix

heejaechang · 2018-05-21T23:20:05Z

@yuvalshtemer latest VS should have this fix, you don't need to pull anything. just move to latest VS.

CyrusNajmabadi force-pushed the simpleDiagnosticTagging branch 4 times, most recently from 93063d5 to 8c6a67c Compare November 29, 2017 01:05

CyrusNajmabadi commented Nov 29, 2017

heejaechang reviewed Nov 29, 2017

heejaechang approved these changes Nov 29, 2017

heejaechang mentioned this pull request Nov 29, 2017

reduce diagnostic tagger's usages of UI thread for synchronization #22920

Closed

heejaechang mentioned this pull request Dec 20, 2017

Wrong squiggles on the code editor #23804

Closed

heejaechang mentioned this pull request Jan 4, 2018

Batch up and flatten diagnostic updates, second approach. #23409

Closed

jasonmalinowski reviewed Jan 4, 2018

Simplify diagnostic tagging by making it use the standard tagging model.

c6b74a2

CyrusNajmabadi force-pushed the simpleDiagnosticTagging branch from fd8c45e to c6b74a2 Compare January 11, 2018 19:03

jasonmalinowski approved these changes Jan 11, 2018

Cleanup.

170d0a6

richaverma1 added the Approved to merge label Jan 13, 2018

heejaechang merged commit 83520a7 into dotnet:dev15.6.x Jan 16, 2018

This was referenced Jan 16, 2018

Batch up and flatten diagnostic updates. #23377

Closed

Batch up and flatten diagnostic updates, third approach. #23411

Closed

jinujoseph mentioned this pull request Jan 16, 2018

Microsoft CodeAnalysis OOM Exception #24055

Closed

This was referenced Feb 8, 2018

Tagging gets very out of sync when typing, and takes a long time to get into a consistent state. #24714

Closed

Keep track of the associated text snapshot when diagnotsics are created. #24721

Merged

heejaechang mentioned this pull request Mar 2, 2018

ArgumentOutOfRangeException throw by AbstractDiagnosticsTaggerProvider.ProduceTags #21301

Closed

CyrusNajmabadi deleted the simpleDiagnosticTagging branch April 11, 2021 19:11
