Skip to content

Conversation

@CyrusNajmabadi
Copy link
Member

@CyrusNajmabadi CyrusNajmabadi commented Nov 28, 2017

Customer scenario

User is working on a big solution and switching between git branches multiple times or close and reopen solutions multiple times. and on some unfortunate cases, VS will crash with out of memory exception.

Bugs this fixes

Fixes #24055
Fixes DevDiv 512757

Supersedes #22920, #23377, #23409, #23411

Workarounds, if any

after each git branch switching or solution open/close, give VS sometimes to process pending works enqueued by the operation.

Risk

this simplify our diagnostic tagger dramatically. so there is a risk where behavior might not exactly same as before. but we believe this is right direction to go.

Performance impact

this should remove OOM due to too many pending UI work items completely. that is source of most of our OOM crash.

Is this a regression from a previous update?

No

Root cause analysis

previously, diagnostic service didn't support pull model for all diagnostics source. so, tagger used event (push model) to hold onto last reported diagnostics and later use that to report tags. and that made us to use custom logic for the tagger which ends up forcing us to use UI thread to synchronize many states to remove potential race. and that caused us to push too many work items to UI thread in certain case such as git branch switching.

now, diagnostic service fully supporting pull model for all sources, this moves diagnostic taggers to follow our tagger framework which doesn't require UI thread for state synchronization. removing the root cause of OOM from the picture completely.

How was the bug found?

MemoryWatson

....

more dev detail.

From a conversation with @heejaechang #23409 (comment)

A while back the diagnostics subsystem had a limitation where you could only hear about some diagnostics if you explicitly listened for diagnostic events. i.e. if you weren't listenting and capturing those events, you couldn't go back and ask for those diagnostics later. This meant that we couldn't do diagnostic tagging (squiggles/fading/suggestions) like we did normal tagging. Normal tagging hears about events, pauses a bit, then goes and gets all the data necessary later to produce the tags. Because that data wasn't available 'later', diagnostic tagging had to aggregate the info and contort things to fit into the tagging infrastructure.

This restriction from the diagnostics service no longer exists. THat means we can great simplify how we do our tagging computation.

@CyrusNajmabadi
Copy link
Member Author

Tagging @heejaechang @sharwell .

Note: this approach in incorrect in a very fundamental way currently. Wanted to start the discussion with you on how we might fix that.

@CyrusNajmabadi
Copy link
Member Author

@heejaechang I took the appraoch whereby the event passes along the diagnostic id that was updated. However, this approach is fundamentally flawed with how tagging works currently. The reason for that is that tagging is hugely cancellable. i.e. any time tagging gets an event, it cancels the current work and enqueues the new recomputation. The assumption is that any recomputation supersedes any previous or inflight computation.

That's not the case with this optimization. For example, if we heard about a global "recompute tags for all providers" notification, we can't supersede that with a request to recompute tags for a single provider id.

We have a few options here afaict (other suggestions welcome!):

  1. don't pass along the data about which provider changed. always recompute all providers. Pros: super simple. Cons: possibly expensive (though we are on the BG).
  2. somehow make some parts of tagging conditionally cancellable. so a single provider update could cancel an update for that provider. But a provider update would not cancel the updates for other providers. Pros: fine grained updating. Cons: super complex (i can't see a suitable way to do this, not with how cancellation is threaded through tagging).
  3. have the diagnostic service track which providers changed. have a way for tagging to call in and ask if things have changed for any of the Ids. Pros: simple on the tagging side. Cons: may not actually be any more efficient. May be just as complex on the diagnostic side.

Thoughts?

@heejaechang
Copy link
Contributor

I think we can go with simple approach and do some perf investigation and see whether we need to optimize more.

...

by the way, even with this approach, it doesnt remove the fact that diagnostic service is based on solution, and tagger is based on buffer and dynamically buffer to workspace and buffer to document association can be changed. in other word, we still need to filter out diagnostic events that is not associated with this subject buffer

(see https://github.com/dotnet/roslyn/pull/22920/files#r153666048 if we want document specific events)

..

we also can make diagnostic event source to aggregate object ids as we aggregate text changes on text change event source.

@CyrusNajmabadi
Copy link
Member Author

by the way, even with this approach, it doesnt remove the fact that diagnostic service is based on solution, and tagger is based on buffer and dynamically buffer to workspace and buffer to document association can be changed. in other word, we still need to filter out diagnostic events that is not associated with this subject buffer

That already happens here: https://github.com/dotnet/roslyn/blob/master/src/EditorFeatures/Core/Shared/Tagging/EventSources/TaggerEventSources.DiagnosticsChangedEventSource.cs#L28

@CyrusNajmabadi
Copy link
Member Author

I think we can go with simple approach and do some perf investigation and see whether we need to optimize more.

Do you mean approach #1 ?

@heejaechang
Copy link
Contributor

ha, we had that before? interesting...

@heejaechang
Copy link
Contributor

heejaechang commented Nov 29, 2017

@CyrusNajmabadi yep. I mean (1).

basically whenever there is event source changes, we do what we currently do for initial tags. ask all diagnostics reported on this file, filter out empty and not related ones (suggestion/unnecessary/regular), and report rest. we no longer need sub taggers since we always report all of them.

really simple. and for common case, probably good enough (since most of them will be empty or have small number of errors). probably will have issues for corner cases where file contains thousands of errors since it will repeatedly report those.

@CyrusNajmabadi
Copy link
Member Author

probably will have issues for corner cases where file contains thousands of errors since it will repeatedly report those.

Well, we'll still always diff the spans against the old ones. So we'll only report the change.

@CyrusNajmabadi
Copy link
Member Author

Ok. I pushed the simpler model. take a look when you get a chance.

@CyrusNajmabadi CyrusNajmabadi force-pushed the simpleDiagnosticTagging branch 4 times, most recently from 93063d5 to 8c6a67c Compare November 29, 2017 01:05
Copy link
Member Author

@CyrusNajmabadi CyrusNajmabadi Nov 29, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CyrusNajmabadi
Copy link
Member Author

Tagging @sharwell @heejaechang @dotnet/roslyn-ide -500 lines in diagnostic tagging FTW.

Note: this should probably be smoke tested and also perf tested. While is simplifies things greatly, and pushes a lot of work to the BG. It may end up performing more work and make have unintended consequences.

@CyrusNajmabadi
Copy link
Member Author

Tagging @dotnet/roslyn-ide

@sharwell
Copy link
Contributor

📝 I'm planning to look at this for 15.7+ unless we get indications of more substantial problems.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add ActiveContextChanged Event source here as well. and remove this (http://source.roslyn.io/#Microsoft.CodeAnalysis.Features/Diagnostics/DiagnosticAnalyzerService_IncrementalAnalyzer.cs,58)

looks like we added the code above as a quick way to handle context change when it is first introduced to workspace. but that is not right way to handle it. DiagnosticAnalyzerService shouldn't care about buffer. it should only care about solution.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hrmm.. can a buffer even cross workspaces though? i think it can (from Misc Workspace to normal workspace, and vice versa). So i think we need WorkspaceRegistrationChanged as well. I will add ActiveContextChanged though.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can buffer ever be null? I don't believe snapshot can have null buffer.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably not. though i was just preserving hte code from before. so i would prefer to not change this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we move this try/catch down to ProduceTags for one specific updateArg so that even if 1 fails, we still can get other analyzers squiggles?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@heejaechang
Copy link
Contributor

awesome! things got really simpler! and tests are all passed!

@heejaechang
Copy link
Contributor

probably need to do some manual testing before we check this in.

@jasonmalinowski what would be branch we can check this in not affecting ask mode? like post 15.6 branch or something?

@jasonmalinowski
Copy link
Member

@heejaechang We have no such branch -- even if we did nobody would be looking at it. If your desire is for some dogfooding/testing on it, there's a few things we can do.

@heejaechang
Copy link
Contributor

if we have a few release left for 15.6 RTM, then I think we should just check this in to 15.6

Copy link
Member

@jasonmalinowski jasonmalinowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The whole determination of editorSnapshot independent of requestedSnapshot seems really fishy here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this preferred over just GetText()? It should be available either way so be equally fast. But if something goes sideways, this version is broken.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed. i can change to just using GetText.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this ever different than spanToTag.SnapshotSpan.Snapshot? This code and then the .TranslateTo in ProduceTags(...) implies with much vigor they can be different, but I don't see how.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think some of this logic dates back like 10ish years. It's quite possible that it was back at a time where we had less invariants about Documents/SourceTexts/Snapshots. I'm happy to remove (or assert it must be hte same).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it can only date 8 years ago... ;-)

Where did this move from originally? Was originally the snapshot you're mapping from related to the snapshot the diagnostics were computed for? It implies I can get stale spans, but since those buffers will be the same we're not correctly mapping things forward.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on that says:

                // Make sure we can find an editor snapshot for these errors.  Otherwise we won't
                // be able to make ITagSpans for them.  If we can't, just bail out.  This happens
                // when the solution crawler is very far behind.  However, it will have a more
                // up to date document within it that it will eventually process.  Until then
                // we just keep around the stale tags we have.

I think a lot of the complexity here was that in the old system you had notifications from the diagnostic engine about a set of data, and we were trying to relate it to the editor-based data. There were moving at different speeds, with different invariants, and we had code like this to try to be resilient to that sort of thing.

If we're now just querying the diagnostic subsystem with teh correct tagger info we have, i don't think we have any issues adn we can likely clean this up even more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, looking at this, i don't think we need it anymore.

Copy link
Contributor

@heejaechang heejaechang Jan 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

old one did translate to since DiagnosticUpdated event is not part of tag event source. due to that it didn't update internal interval tree when diagnostic is updated, rather it just hold onto snapshot at the moment. and later did "translate to" to move spans to right one itself. so at the time, 2 documents were actually different (one saved when DiagnsoticUpdated is called, and one when event source is raised - ProduceTags is called)

this new one no longer does that, now DiagnosticUpdated is part of tag event source. when diagnostic changed, it save spans in taggers internal interval tree and let tagger's own mechanism to handle translate to. so now there is only 1 document. one that is from SpansToTag.Document. the code seems just artifact of some copy over from existing code. but I think it is just matter of how it get to same data. I dont think it will cause bug since 2 should be same.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"// Make sure we can find an editor snapshot for these errors. Otherwise we won't
// be able to make ITagSpans for them. If we can't, just bail out. This happens
// when the solution crawler is very far behind. However, it will have a more
// up to date document within it that it will eventually process. Until then
// we just keep around the stale tags we have."

agree this is not needed anymore. since there is no longer 2 documents as we used to.

@CyrusNajmabadi
Copy link
Member Author

Not a problem (i hope).

@CyrusNajmabadi CyrusNajmabadi force-pushed the simpleDiagnosticTagging branch from fd8c45e to c6b74a2 Compare January 11, 2018 19:03
@CyrusNajmabadi
Copy link
Member Author

@jasonmalinowski Did i do this properly?

@jasonmalinowski
Copy link
Member

@CyrusNajmabadi Well done!

@jinujoseph
Copy link
Contributor

Thanks @CyrusNajmabadi
Adding @Pilchie for ask mode approval

@jinujoseph
Copy link
Contributor

just realized this is missing ask mode template

@CyrusNajmabadi
Copy link
Member Author

@jasonmalinowski Can you help with the template?

@CyrusNajmabadi
Copy link
Member Author

Also, i would recommend this be smoke tested.

@heejaechang
Copy link
Contributor

@jinujoseph added ask mode template.

@sharwell
Copy link
Contributor

@heejaechang I updated the ask mode template to link the other bugs

@CyrusNajmabadi
Copy link
Member Author

thanks @sharwell !

@heejaechang
Copy link
Contributor

@jinujoseph @mattscheffer we probably want to have manual testing as soon as possible.

@jinujoseph
Copy link
Contributor

@richaverma1 to help with manual testing

@Pilchie
Copy link
Member

Pilchie commented Jan 12, 2018

The scenario meets the bar. Consider this approved, pending satisfactory results from the manual testing. @richaverma1 Can you add the "Approved to merge" label when testing is complete and you are happy with the results?

@heejaechang heejaechang merged commit 83520a7 into dotnet:dev15.6.x Jan 16, 2018
@heejaechang
Copy link
Contributor

merged! thank you everyone! thank you @CyrusNajmabadi

@CyrusNajmabadi
Copy link
Member Author

Any time!

@yuvalshtemer
Copy link

Pardon me for being a github beginner, but I'm in real pain because of this issue.. would appreciate some assistance on how to pull & install the fix

@heejaechang
Copy link
Contributor

@yuvalshtemer latest VS should have this fix, you don't need to pull anything. just move to latest VS.

@CyrusNajmabadi CyrusNajmabadi deleted the simpleDiagnosticTagging branch April 11, 2021 19:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants