Reactive approach to FSAC design - discussion #1
I've written a document describing some of the more interesting ideas for improving FSAC and, more generally, the language server/IDE backend.
It's funny, I was thinking actor model about halfway through the entry, and then you mention it at the end. Actors would give a nice layer over supervision/restarts (as we see with managing FSAC restarts in Ionide today), and stateful actors (with persistence) would be an interesting way to model something like the symbol cache, where the serialized state is (a subset of) the actor state.
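A minimal sketch of that idea using F#'s built-in `MailboxProcessor`. The message type, state shape, and names here are hypothetical illustrations, not FSAC's actual design; the point is only that the actor's state is exactly the thing you would persist:

```fsharp
// Hypothetical symbol-cache actor. The serializable state is the Map the
// actor threads through its loop.
type CacheMsg =
    | Record of symbol: string * location: int
    | Lookup of symbol: string * reply: AsyncReplyChannel<int option>
    | Dump of reply: AsyncReplyChannel<Map<string, int>>   // snapshot for persistence

let symbolCache =
    MailboxProcessor.Start(fun inbox ->
        let rec loop (state: Map<string, int>) = async {
            match! inbox.Receive() with
            | Record (sym, loc) -> return! loop (state.Add(sym, loc))
            | Lookup (sym, reply) ->
                reply.Reply(state.TryFind sym)
                return! loop state
            | Dump reply ->
                reply.Reply state          // hand out the serializable state
                return! loop state }
        loop Map.empty)

symbolCache.Post(Record("MyModule.f", 42))
symbolCache.PostAndReply(fun ch -> Lookup("MyModule.f", ch))  // Some 42
```

Because the mailbox serializes all access to the state, a supervisor could restart a crashed actor from the last `Dump` without any locking.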
To start, I think it's worth contextualizing where we are now. F# (at least within VS) is about where Roslyn was when this issue was filed in 2015: dotnet/roslyn#7082
To summarize that long thread:
Ionide and Rider use an out of process model for hosting FCS, but some of the fundamentals aren't much different. The working set can get huge in both IDEs, especially for larger solutions, and it's not uncommon to hear about 10+ GB of RAM being used. Many operations can be slow, and the fact that they're out of process doesn't make a difference.
As you're likely aware, we made a lot of changes to the compiler that aligned with VS 2019 GA after careful measurement of traces from users and our own dogfooding. Although there are still two issues left - boxing of
After the past 1.5 weeks of analyzing memory dumps and traces from a big internal solution under various loads, we've found two memory leaks - one of which is VS-specific and fixed, and the other we'll queue up a fix for (it also requires a change in the TPSDK, since that thing reflects into the internals of the compiler...). We also found a data structure that could be turned into a struct, saving up to 30MB of the working set for a given process using FCS. But barring any further findings, we're pretty much at the point where we need to consider a re-architecture for the F# tools in VS, and this will certainly affect Ionide and other tools as it will likely involve FCS to some degree.
Roslyn adopted an approach similar to what you write in your notes:
Not all is rosy in this world, though. There was a tremendously long tail of bugs that the Roslyn team dealt with; nearly a year's worth of them. As it turns out, communicating information between N processes is challenging, and there are additional perf considerations regarding IPC. If anyone thinks they could do this themselves, just know that a team where each person has 5+ years of experience building these sorts of things dealt with issues for nearly a year. It's a huge undertaking. Frankly, it's a hell of a lot easier to have one monolithic .dll be responsible for everything if you can get away with it.
To avoid a similar tail of bugs, I think it's best to focus on improving FCS itself before multiple processes are considered.
One of the reasons why it worked out fairly well for Roslyn, despite the tail of bugs, is that Roslyn itself operates with an immutable snapshot model. When you invoke a feature, the known universe is quickly calculated with a compilation snapshot and everything is based entirely on that. If you take another snapshot and it's different from the previous one, the previous is discarded and the new snapshot is the known universe. If they're the same, then barely any work is done. FCS attempts to cache things that have already been figured out (see IncrementalBuilder.fs for more detail, though good luck in trying to understand its contents). However, in practice it is nowhere near as good as Roslyn's model due to the fact that files are re-checked all of the time.
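The "same universe, reuse the answer" rule can be illustrated with a toy memoizing checker. Nothing below is Roslyn's or FCS's actual API; it is a sketch of the discipline, where the whole universe is one immutable value that is compared wholesale:

```fsharp
// Toy snapshot: the entire known universe is an immutable map of file -> source text.
type Snapshot = Map<string, string>

// Memoize one expensive analysis against the last snapshot seen.
let makeChecker (analyze: Snapshot -> 'r) =
    let mutable last : (Snapshot * 'r) option = None
    fun (snap: Snapshot) ->
        match last with
        | Some (prev, result) when prev = snap -> result   // same universe: no work
        | _ ->
            let result = analyze snap                      // new universe: recompute
            last <- Some (snap, result)                    // and discard the old one
            result

let mutable calls = 0
let check = makeChecker (fun (s: Snapshot) -> calls <- calls + 1; s.Count)
let snap1 : Snapshot = Map.ofList [ "A.fs", "module A" ]
check snap1 |> ignore
check snap1 |> ignore   // identical snapshot: analysis is not re-run
// calls = 1
```

The real systems compare far more cheaply than structural map equality, of course, but the contract is the same: features never see a half-updated universe.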
Additionally, this model allows Roslyn to speculate on whether certain work even needs to be done. For data flowing between cross-project references, Roslyn speculates on whether things have changed based on the semantic meaning behind the operations you perform. FCS has no ability to do this, so if you have a relatively deep chain of project-to-project references, changes will incur a typecheck of all dependent projects, resulting in a lot of redundant work. This makes IntelliSense seem unreliably time-consuming when you make changes across multiple files.
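That speculation amounts to asking whether the edited project's public surface changed before re-checking anything downstream. A hypothetical sketch (the project record and surface hash are invented for illustration; real systems hash signatures, not whole projects):

```fsharp
// Hypothetical project record: Surface stands in for a hash of the public signature.
type Project = { Name: string; Surface: int; Refs: string list }

// All projects that (transitively) reference the named project.
let rec dependents (projects: Project list) name =
    [ for p in projects do
        if List.contains name p.Refs then
            yield p.Name
            yield! dependents projects p.Name ]
    |> List.distinct

// An edit that leaves the public surface intact only requires re-checking the
// edited project; a surface change invalidates the whole dependent chain.
let projectsToRecheck projects (before: Project) (after: Project) =
    if before.Surface = after.Surface then [ after.Name ]
    else after.Name :: dependents projects after.Name

let a  = { Name = "A"; Surface = 1; Refs = [] }
let b  = { Name = "B"; Surface = 7; Refs = [ "A" ] }
let c  = { Name = "C"; Surface = 9; Refs = [ "B" ] }
let ps = [ a; b; c ]
projectsToRecheck ps a { a with Surface = 2 }   // ["A"; "B"; "C"]
projectsToRecheck ps a a                        // ["A"] - internal edit only
```

FCS today effectively always takes the first branch's worst case, which is why deep reference chains feel slow.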
Beyond what you mentioned about having a priority queue, FCS itself may need to adopt a snapshot model similar to Roslyn. Perhaps there are other alternatives, but the model has kept things sane in their enormous codebase, and because it allows certain speculation that's rather cheap to perform, IDE features can be pretty quick and consistent regardless of what you did prior to a particular action.
Additionally, a snapshot model like what Roslyn uses makes managing N processes significantly simpler. If all you do is work on the known universe that was computed (and compare against the last known universe efficiently), there isn't much complication in keeping things synchronized. The universe is what it is, and that's all there is to it. Failing to have a simplifying mechanism like this would spaghettify code and make N processes horribly difficult to manage properly.
Type Providers are a big problem. For those who aren't aware, ProvidedTypes.fs is an ad-hoc copy-paste of pieces of the F# compiler that has been further modified to accommodate the needs of type providers. You can think of every type provider library as being packaged up with its own fork of the compiler, and that fork does things subtly differently than whatever compiler a user is using.
This has serious performance ramifications:
Furthermore, changing things in the F# compiler can cause downstream breaking changes for any consumer of type providers. This is because ProvidedTypes.fs also reflects into the internals of the F# compiler that is hosting the type provider. We realized this when we were working on a fix for a known memory leak. Fixing it would break any type provider loaded with a newer compiler.
What this means is that fixes to the F# compiler have the potential to inadvertently break anyone using a type provider. Although this won't happen very often, it can happen, and it will require all type provider authors to update their dependencies whenever it does. This will be the case when we fix the known memory leak we found.
This is not a sustainable model at all, so there are two options: never again change the compiler internals that the TPSDK reflects into, or update the TPSDK in lockstep with each compiler release.
The first is untenable, but the latter also means that any time we ship an updated compiler, all type provider authors must update their dependencies. It also means that any consumer of type providers who does not update their dependencies could be broken just by updating their tool set.
An additional wrinkle here is that the TPSDK is neither owned, maintained, nor contributed to by Microsoft. We have no authority over it, yet it affects people using F# (and VS). We'll contribute back to it if we can, but if we must make a change before a release and doing so would break Type Providers, we will make that change and break Type Providers. Although they are part of the F# spec, none of their implementation is actually in our codebase. And we have no support obligation for any component that reflects into the internals of our systems. So, an additional approach we could take from the Microsoft perspective is to make the changes we need to make F# and FCS great, and say, "that's just too bad" if type providers break. I'd prefer not to have that happen, but it's being considered.
From my point of view, when things go bad in Ionide the reactor count starts to go through the roof (1000+ in a minute or so, then completions are unavailable for at least 3 or 4 minutes), which indicates that cancellations and queued messages are not behaving correctly. I would also make parse requests cancellable early, not just the typecheck requests.
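Early cancellation of superseded requests can be sketched with one `CancellationTokenSource` per request kind (names here are hypothetical; FSAC's real reactor queue is more involved):

```fsharp
open System.Threading

// Keep one in-flight request per kind; a newer request of the same kind
// cancels the previous one before starting, instead of queueing behind it.
type RequestSlot() =
    let mutable current : CancellationTokenSource option = None
    member _.Start(work: CancellationToken -> Async<unit>) =
        current |> Option.iter (fun cts -> cts.Cancel())   // supersede in-flight work
        let cts = new CancellationTokenSource()
        current <- Some cts
        Async.Start(work cts.Token, cts.Token)

let parseSlot = RequestSlot()
// Each keystroke supersedes the previous parse rather than piling up in a queue:
parseSlot.Start(fun ct -> async { (* parse file, honoring ct *) () })
```

With one slot per request kind (parse, typecheck, completion), a burst of keystrokes can never build up the 1000+ message backlog described above.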
@cartermp The comments about the TPSDK aren't really accurate.
The technical issues are real - though not insurmountable. However the procedural issues are not accurate:
There's absolutely nothing stopping us solving the issues identified together. We shouldn't just say "hey, we need to re-do everything or break compat" just because it's the first time Microsoft have worked with a technical problem in the area - that's not a sustainable path forward either. This has been a committed part of the surface area of the F# compiler and toolchain for many years and it is perfectly possible to make incremental improvements here.
We've long known that the TP API into the F# compiler has problems with cross-compilation, notably
Unfortunately, my thoughts remain unchanged despite your points @dsyme. Failing hard evidence that the TPSDK is not as I characterized it - ad-hoc copy-paste of IL reading/writing that duplicates work and is thus horrible for performance, built in such a way that unrelated fixes to the compiler can break all users - I'm just not convinced.
Keep in mind that my statements aren't about how we feel; they're based on two weeks of analyzing traces, dumps, and the TPSDK codebase, understanding why the diagnostic data is the way it is, and trying fixes for the issues we identified. If you or others have hard data to suggest otherwise, I'm absolutely happy to consider things differently.
I also suggest we table a discussion about the TPSDK here and move it elsewhere. There's plenty to discuss about FCS itself, which certainly doesn't have a shortage of addressable issues.
Both Kevin and I have long known about the issues you identify (Kevin in passing, me in detail). However, they must be seen in context. Essentially, we had to make compromises in 2015-18, and I'll stand by them - indeed I'll stand by every line in ProvidedTypes.fs, though I am by no means proud of it all. I understand you've spent two weeks discovering that compromises were made and that there are lingering issues. What you see is essentially the lingering cost of the transition to .NET Core/Standard. We should now make progress on the specific remaining issues.
Put simply, in 2015-18 we were focused on the critical need for type providers to both execute in different runtime contexts (because of .NET Core) and cross-compile to different runtime targets (because of .NET Standard). Addressing this was by far our highest priority. At the time, type providers were assuming both .NET Framework design-time tooling and .NET Framework targets, including Reflection.Emit at design-time for generative type providers. We had to fix that. So we did.
Further, at the time our TP-to-compiler API was more or less "frozen" and couldn't even communicate the set of assemblies correctly. This was largely because we had no functioning cycle of iterative improvement for FSharp.Core, the compiler, or FCS. These delivery issues have now been fixed, which is why we can now make progress.
Further, we had to avoid the situation where type provider components had forced, unnecessary binary dependencies. Each TP had to be potentially stand-alone, since they are loaded into design-time tooling where conflicts can happen.
For this reason, I chose to do the following (this was after long discussions with Kevin where he didn't necessarily agree but eventually gave in since, well, we really had no other choice):
Finally, there is one other major sin in the TPSDK to achieve cross-targeting, notably
The last thing we did was this:
Together these compromises allowed us to deliver cross-executing, cross-targeting type providers in 2017-18. Mission accomplished.
So yes, I would say the "ad-hoc" characterization is wrong. For the TPSDK, the use of internalized binary readers/writers was deliberate, to reduce dependencies. As a result I can justify (though am not proud of) every line of code in ProvidedTypes.fs, given the priorities we had in 2015-18. The priority was cross-targeting type providers without extensive dependencies on Reflection.Emit and reflection-only load. For that, we needed a binary reader/writer that we could internalize, and we used the one we had available.
Yes, there is duplication of resources and code. But we can now reduce and solve that resource-duplication problem. I made compromises (knowingly), you've discovered them and some of their current ramifications, and we should now make progress on the issues above.
@dsyme Thanks, we'll engage on the TPSDK issues.
Parsing has been free-threaded for quite some time (thanks @auduchinok and @dsyme); this used to muck up the reactor queue but not anymore. In general it's made the IDE experience much more responsive.
Based on the perf analysis work we've done, there are incremental areas to improve and architectural changes that we may need to make: