Compilation Model Prototype - Information / Discussion / Proposed Plans #7077

TIHan (Collaborator) · 2019-06-28T21:02:37Z

For the community: I wrote this up internally and made some minor modifications before posting it here. I need feedback on what everyone thinks. Is this something that the community actually wants out of F#? Do you think I should or should not spend more time on this before/after F# 4.7? Perhaps the community would be interested in contributing?

My purpose is to really figure out how to take F# to the next level, in terms of tooling, testing, memory and performance. Right now, I do believe this could get us there long term, but I also want to get us some short term gains in the process, namely our testing.

Here is the PR to the prototype: #6947

Hopefully what I'm talking about is accurate and informational. There are a lot of pieces to this, and so this is an entire brain dump from my head put in a post. I'm sure something will need to be corrected.

I realize that I need a plan for the compilation model prototype work that fits with our goals for .NET Core 3.0 / F# 4.7 in the 16.3 release. Prototypes in the past have always stopped and never continued. One example is the metadata reader re-write using System.Reflection.Metadata, which we may revisit, but I didn't make any plans for how that might continue. I don't want the compilation work to stagnate this way while I focus my attention on F# 4.7. Instead, I want to figure out how the compilation model could tie in with our goals for 16.3.

I'm providing a few Q/As with descriptions of what led to where I am at now. After that, I will get into my plan and current state of the prototype.

What is a "compilation model"?

I think Roslyn's comment is a good introduction:

/// The compilation object is an immutable representation of a single invocation of the compiler. Although immutable, a compilation is also on-demand, and will realize and cache data as necessary. A compilation can produce a new compilation from existing compilation with the application of small deltas. In many cases, it is more efficient than creating a new compilation from scratch, as the new compilation can reuse information from the old compilation.

This is effectively the snapshot model.

By itself, the "compilation object" described in Roslyn is reasonably simple. But what I am calling "compilation model" here is a bit more than that. It's also an API that allows someone to query information about a compilation, including syntactic and semantic information such as syntax trees/nodes/tokens and symbols. It serves as a foundation for building tooling and services on top of it. Perhaps "compilation model" is not the right name for this, as it includes code analysis (SyntaxTree/SemanticModel) for F# that is similar to Roslyn's APIs but is not an implementation of theirs.

The super long-term end result is having the official way to compile and query info on F# code. That would mean fsc.exe would use this.
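
To make the snapshot idea concrete, here is a minimal, language-neutral sketch written in Python. It is not the prototype's API and not Roslyn's: `Compilation`, `SourceFile`, `replace_source`, `get_syntax_tree`, and the toy `parse` function are all hypothetical names chosen for illustration. It only shows the three properties from the quoted comment: the object is immutable, data is realized on demand and cached, and a new compilation built from an old one reuses whatever did not change.

```python
from dataclasses import dataclass, field

PARSE_LOG = []  # records every real parse so that reuse can be observed


def parse(text: str) -> tuple:
    """Stand-in for a real parser: splits the source into whitespace-separated words."""
    PARSE_LOG.append(text)
    return tuple(text.split())


@dataclass(frozen=True)
class SourceFile:
    path: str
    text: str
    # Per-file cache, shared by every Compilation that holds this same object.
    _cache: dict = field(default_factory=dict, compare=False, repr=False)

    def syntax_tree(self) -> tuple:
        # Realized on demand, then cached.
        if "tree" not in self._cache:
            self._cache["tree"] = parse(self.text)
        return self._cache["tree"]


@dataclass(frozen=True)
class Compilation:
    files: tuple  # tuple of SourceFile, in compilation order

    @staticmethod
    def create(sources: dict) -> "Compilation":
        return Compilation(tuple(SourceFile(p, t) for p, t in sources.items()))

    def replace_source(self, path: str, new_text: str) -> "Compilation":
        # Returns a new snapshot. Unchanged files are the very same objects,
        # so any work already cached on them carries over.
        return Compilation(tuple(
            SourceFile(path, new_text) if f.path == path else f
            for f in self.files
        ))

    def get_syntax_tree(self, path: str) -> tuple:
        for f in self.files:
            if f.path == path:
                return f.syntax_tree()
        raise KeyError(path)


old = Compilation.create({"A.fs": "let x = 1", "B.fs": "let y = x"})
old.get_syntax_tree("A.fs")
new = old.replace_source("B.fs", "let y = x + 1")
new.get_syntax_tree("A.fs")  # served from the cache populated through `old`
print(len(PARSE_LOG))        # A.fs was parsed only once
```

Only syntax trees are reused here. In a real F# compilation, changing a file also affects type-checking of the files that come after it, which this sketch does not model.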

Why should we invest time in prototyping it? What is the motivation?

The primary motivation: User Experience.

This is lengthy and technical and not directly about compilation or 16.3, but necessary to get all my thoughts out. I want you all to understand why I went down the road I went; you need context.

For a little over half a year, the majority of my time has been spent on F# performance in VS, mainly memory. Users with large projects experience UI delays caused by large memory consumption and GCs. It has taken me an incredible amount of time to understand what is going on. We were able to identify issues with our caching strategies and how information is shared and not shared between projects.

We managed to do targeted fixes that alleviated a good portion of our memory issues, namely in type providers and IL metadata readers (thanks to Don's work). This certainly helped (with data confirming it), but it's not quite enough.

One of our main and still current issues with memory is how FCS's project cache works in conjunction with in-memory cross project referencing. Basically, it's a strong cache, that keeps 200 (cache size) projects in its cache and never evicts them until you close the solution. This is why memory keeps growing over time; doing a "find all refs" action will cause it to reach maximum memory usage as it will build every single project and put it into the cache.

The cache itself is not designed to be strong. It does have an eviction policy when it gets to a certain size. So, the natural thing to do is set that cache size to something lower, like 3 or 5. However, setting the cache size to those numbers caused cache thrashing, meaning projects were constantly being rebuilt and thrown away even after a one-character change in a file. I managed to fix this by adding another cache layer that holds only the signature output of an F# project; F# projects referencing other F# projects only need this. Once I did that, it started behaving as expected and memory usage decreased drastically when a Gen2 GC kicked in. We are talking about what used to be 2.0 GB of memory going down to roughly 1.1 to 1.2 GB. Incredible difference.
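
The thrashing and the effect of the extra layer can be illustrated with a small simulation. This is a sketch in Python under stated assumptions, not how FCS is implemented: `LruCache`, `ProjectService`, `check_project`, and the string "signatures" are hypothetical, and every "build" is just a counter increment. It deliberately ignores invalidation, so the signature layer here never goes stale.

```python
from collections import OrderedDict


class LruCache:
    """Small least-recently-used cache; evicts the oldest entry past `capacity`."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key in self._items:
            self._items.move_to_end(key)
            return self._items[key]
        return None

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        while len(self._items) > self.capacity:
            self._items.popitem(last=False)


class ProjectService:
    def __init__(self, references: dict, project_capacity: int, use_signature_cache: bool):
        self.references = references                # project -> referenced projects
        self.projects = LruCache(project_capacity)  # full (expensive) build results
        # Hypothetical second layer: keeps only the small signature of each project.
        self.signatures = {} if use_signature_cache else None
        self.full_builds = 0

    def _build(self, name: str) -> dict:
        # A full build needs only the signature of each referenced project.
        ref_sigs = [self.get_signature(r) for r in self.references[name]]
        self.full_builds += 1
        result = {"name": name, "signature": f"sig({name})", "refs": ref_sigs}
        self.projects.put(name, result)
        if self.signatures is not None:
            self.signatures[name] = result["signature"]
        return result

    def get_signature(self, name: str) -> str:
        if self.signatures is not None and name in self.signatures:
            return self.signatures[name]
        cached = self.projects.get(name)
        return (cached or self._build(name))["signature"]

    def check_project(self, name: str) -> dict:
        # Simulates an edit in `name`: the project itself is always rebuilt.
        return self._build(name)


refs = {"A": ["B", "C", "D"], "B": [], "C": [], "D": []}
for use_sigs in (False, True):
    service = ProjectService(refs, project_capacity=2, use_signature_cache=use_sigs)
    for _ in range(3):  # three edits in project A
        service.check_project("A")
    print(use_sigs, service.full_builds)  # 12 builds without the layer, 6 with it
```

With a project cache of size 2 and three referenced projects, every edit in A evicts and rebuilds all of its references. With the signature layer, the references are built once and each later edit rebuilds only A.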

This isn't the end though. After getting the cache to behave, it also stopped in-memory cross project referencing from working. FCS will determine that a project needs to be rebuilt if any project files on disk had their last write time changed or its stamp has changed. But because the project itself has been evicted from the cache and I put the signature cache layer in front of it, FCS doesn't know if it needs to be rebuilt. Fixing that would be somewhat invasive in FCS, but possible.

Besides memory, we also have the problem that a chain of project references gets rebuilt and cannot be canceled, even when you are working in a file where typing any character would cause the chain to rebuild again. The background compiler of FCS is single-threaded, so work can get blocked in VS, as Roslyn services won't kick off anything until all active work stops. There are ways to cancel background builds, but we don't flow cancellation tokens from Roslyn to background builds. I tried allowing cancellation tokens to flow through, but it broke our single-threaded service entirely and nothing worked. It's difficult to reason about what is going on here. But I think it's possible to fix it in its current state.

The TL;DR of the above is: FCS is stateful, which makes it difficult to reason about in IDE scenarios. IDEs will kick off analysis of multiple files from different projects within a few seconds of each other. We also have multi-TFM projects, so the same project/file will be kicked off again for analysis. FCS is not designed to handle this even though it has some support. Changing cache and cancellation strategies around multiple projects is becoming difficult. Even the compiler is designed to only understand compiling a single assembly; this is why we have issues with sharing information across projects, except for IL data from our IL metadata readers. All of this ultimately affects the user experience with regard to receiving feedback in an IDE.

This is where we enter Compilation. An immutable, snapshot state of the world for a single invocation to the compiler.

How does it help solve the issues that FCS has and bring a better user experience?

Compilation being immutable and snapshot-based does not automatically solve all issues up front. The person using the Compilation has to do that themselves.

Regarding FCS:

From an API perspective, FCS seems straightforward. You can parse/check files + projects, compile, get classification, tooltips, completions, etc. What is not straightforward is how it's managing all of that data, especially when projects and files are changing all the time in an IDE.

Simply speaking, Compilation is sort of like having your own IncrementalBuilder that is immutable, where you decide what to do with it. (For those who don't know what IncrementalBuilder is, it is our internal construct in FCS that accumulates type-checking of files and produces a final assembly output.) You can choose to cache a Compilation or not. It's up to you what the Compilation will be used for and how; it will not lock you in to any particular strategy. Also, with Compilation being immutable, you know that nothing outside will affect any of its output; it becomes much easier to reason about.

FCS tries to do a lot of heavy lifting by managing its own caching and cancellation strategy (for the most part) when it comes to cross projects. This doesn't work for everyone and providing knobs to turn that tweak the behavior isn't sufficient. With Compilation, it will be up to the person integrating.

Regarding user experience:

Because Compilation is easier to reason about and provides a base foundation that works for various scenarios, not just IDEs, developing a better user experience will be up to the person integrating Compilation into their environment, whether it be VS, VSCode, compiler services, compiler servers, language servers, etc. This doesn't mean that Compilation is absolved from having to worry about user experience. Its implementation goal is to do the least amount of work possible to answer a question about source code, and also the least amount of work when a Compilation needs to change, that is, when building a new one from the old. If the implementation strives to achieve this goal, it will support the person using Compilation in providing a better user experience from a tooling and performance/memory perspective. At least, that is what I hypothesize.

Now, onto the current state of the prototype and a proposed plan moving forward.

Proposed Plan

Dev16.3 / .NET Core 3.0 / F# 4.7

For the next couple of months, I will be focusing my attention on .NET Core 3.0 and F# 4.7 release, which is dev16.3. We have a particular feature set that we want to ship and we need to start getting ready for that.

The compilation work is for the long term, so there is no immediate need to get it in. However, pausing the work on it can lead to stagnation as it has with other prototypes in the past. The work done here is very important and it would be a shame to see it left alone. Is there a way for it to tie in with the F# 4.7 work for dev16.3? Here is what I came up with:

Language Unit Tests

Most of our unit tests against the language are done in many single files with corresponding baseline files. They do type-checking to see if code passes or errors. Some will also execute the code to confirm its success.

This is the old style of testing. They are not created in a UnitTest project. However, a while ago I came up with a simple framework for testing F# source code in a UnitTest project using FCS. There are not many of them right now.

What I would like, from a policy standpoint, is that any newer language tests be done using the simple framework in the UnitTest project. This includes the tests for the new features coming into F# 4.7, and I am more than happy to move those over.

Where the compilation model comes into play here is that, instead of using FCS for the tests or the perl/file/baseline stuff, we can use the compilation model. This is low risk as we are not affecting any user experience, we are building a better way to test the language, and the compilation model gets its first round of actual use by being an integral part of the tests. Right now our current test infrastructure isn't great and parts of it rely on VS being installed, even if the tests do not require it. This has been painful for contributors. I want us to be like Roslyn/CSharp in how they do their testing, which uses their compilation model.

Long Term, beyond Dev16.3

VS Experimental Option, incremental improvements to the integration of the compilation model in VS.

Possible early integration with TryDotNet since they will use it to compile single assemblies; low risk.

Large suite of unit tests using compilation to test language/ide features.

Compilation model API review with the Roslyn/C# team.

Current State of Compilation Model Prototype

It is a separate .NET Standard 2.0 project from FSharp.Compiler.Private. There are very few changes in FSharp.Compiler.Private.

FSharpCompilation

Supports parsing + type-checking multiple files.

Supports getting diagnostics from parsing and type-checking.

Supports re-using an old compilation to build a new one with a file/source that has changed.

Does not do its own assembly resolution. Developer has to provide all metadata references up front.

Supports getting a semantic model.

Supports getting a syntax tree.

Has work done to handle cross project referencing, but not tested yet.

Is immutable/snapshot-based

No support for scripts yet.

No support for compiling to DLLs/EXEs yet.

FSharpSemanticModel

Limited support for getting symbols.

Only able to get a symbol at a position in a source text.

Symbol information is practically non-existent; currently it's just an object with a name.

Experimental ability to get a symbol in a speculative way using a syntax node. This means I could take any syntax node and ask the semantic model to see if it can get me symbol information based on the semantic model's view of the world. Can be useful to get completions very quickly even in large files that require many seconds for type-checking.

FSharpSyntaxTree/Node/Token

An abstraction around our internal Ast types that allows for traversal through the syntax tree.

Currently very useful when walking up the tree once you have a FSharpSyntaxNode.

Going down the tree is supported as well, looking at descendants and direct children.

Are lazily created.

Supports getting FSharpSyntaxTokens for a FSharpSyntaxNode.

FSharpSyntaxTokens are still somewhat limited, depending on the circumstance. Tokens that should obviously be associated with a particular node may not be. This is because we determine the relationship of a node and token by their ranges, or text spans, in a source file. Sometimes that range may not be correct. This can be resolved by fixing the ranges in pars.fsy / Yacc. One example of this is opening a namespace/module, e.g. "open System": the OpenDecl node only has the "System" node/token, but clearly "open" should be part of it as well.
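
The range-based association and the "open System" case can be sketched as follows. This is an illustration in Python with hypothetical types (`Span`, `Token`, `Node`, `tokens_of`) and a toy whitespace lexer; it is not the prototype's data model. It assumes a token belongs to a node when the node's span fully contains the token's span, which is one plausible reading of "by their ranges".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int  # inclusive offset into the source text
    end: int    # exclusive offset

    def contains(self, other: "Span") -> bool:
        return self.start <= other.start and other.end <= self.end


@dataclass(frozen=True)
class Token:
    text: str
    span: Span


@dataclass(frozen=True)
class Node:
    kind: str
    span: Span


def lex(source: str) -> list:
    """Stand-in lexer: whitespace-separated words together with their spans."""
    tokens, pos = [], 0
    for word in source.split():
        start = source.index(word, pos)
        pos = start + len(word)
        tokens.append(Token(word, Span(start, pos)))
    return tokens


def tokens_of(node: Node, tokens: list) -> list:
    # A token is attributed to a node when the node's span contains the token's span.
    return [t.text for t in tokens if node.span.contains(t.span)]


source = "open System"
tokens = lex(source)

# Range as described above: the node covers only "System".
reported = Node("OpenDecl", Span(5, 11))
# Range after a hypothetical parser fix: the node covers the whole declaration.
corrected = Node("OpenDecl", Span(0, 11))

print(tokens_of(reported, tokens))   # ['System']
print(tokens_of(corrected, tokens))  # ['open', 'System']
```

The lookup itself is unchanged between the two cases; only the node's range differs, which is why fixing the ranges in the parser is enough to fix the association.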

IncrementalLexer

In experiment phase and can be turned off.

Idea behind it is, we only lex text that has actually changed and re-use previous tokens.

Lexing is already fast in F#, something like 30-40ms for our large, 17K-line TypeChecker.fs. But this technically brings it down to 1-2ms on text changes, and we could probably go lower.

In our tooling today, we lex an entire file when it's changed, but in VS we have this thing called Tokenizer, which in its implementation has its own way of incrementally keeping tokens in a cache. When typing code, you want to get the lexical context as soon as you type, such as whether you are in a string or a comment, so that automatic brace completion is not triggered there. We don't wait to lex an entire file to determine this, as 30-40ms is too slow; instead we do a separate partial lex on a line or multiple lines using some heuristics and caching. With IncrementalLexer, the goal is to only do this once and in a unified way.
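
As a rough illustration of "only lex text that has actually changed and re-use previous tokens", here is a simplified line-based sketch in Python. It is not the prototype's IncrementalLexer: the class below, its `update` method, and the regular expression are assumptions made for the example. It also assumes that no token spans more than one line, which is not true for real F# (multi-line strings and comments), so a real implementation has to carry lexer state across lines.

```python
import re

# Toy token rule: a run of word characters, or any single non-space symbol.
TOKEN_RE = re.compile(r"\s*(\w+|[^\w\s])")


class IncrementalLexer:
    def __init__(self):
        self._lines = []      # text of each line in the previous snapshot
        self._tokens = []     # tokens of each line, parallel to _lines
        self.lines_lexed = 0  # how many lines were actually lexed so far

    def _lex_line(self, line: str) -> tuple:
        self.lines_lexed += 1
        return tuple(TOKEN_RE.findall(line))

    def update(self, text: str) -> list:
        new_lines = text.split("\n")
        old_lines, old_tokens = self._lines, self._tokens
        limit = min(len(old_lines), len(new_lines))

        # Lines at the start that did not change.
        prefix = 0
        while prefix < limit and old_lines[prefix] == new_lines[prefix]:
            prefix += 1

        # Lines at the end that did not change (never overlapping the prefix).
        suffix = 0
        while (suffix < limit - prefix
               and old_lines[len(old_lines) - 1 - suffix]
               == new_lines[len(new_lines) - 1 - suffix]):
            suffix += 1

        # Only the lines in between are lexed again; the rest is reused.
        changed = new_lines[prefix:len(new_lines) - suffix]
        middle = [self._lex_line(line) for line in changed]
        new_tokens = (old_tokens[:prefix]
                      + middle
                      + old_tokens[len(old_tokens) - suffix:])

        self._lines, self._tokens = new_lines, new_tokens
        return [tok for line in new_tokens for tok in line]


lexer = IncrementalLexer()
lexer.update("let x = 1\nlet y = 2\nlet z = x + y")
print(lexer.lines_lexed)  # 3: the first call lexes every line
lexer.update("let x = 1\nlet y = 20\nlet z = x + y")
print(lexer.lines_lexed)  # 4: only the edited line was lexed again
```

The design choice here is to diff by common prefix and suffix of lines, which handles a single contiguous edit, the usual case while typing. Finer-grained reuse within a line would be needed to approach the numbers mentioned above.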

Reliance on Microsoft.CodeAnalysis

Right now we rely on Microsoft.CodeAnalysis for a few things:
- SourceText
- Diagnostic
- ITemporaryStorageService

ITemporaryStorageService is used to make shadow copies of streams and source text, which are then accessed through a memory-mapped file.

This gives us a way to represent source files/text in a snapshot way without pulling source files/text into memory.

realvictorprm · 2019-06-28T21:29:46Z


Thanks for posting this in advance :) I have read through about 90% of it, but I need to reread it anyway as it's complex. It's a great insight into your work :)


auduchinok (Collaborator) · 2019-07-03T13:45:44Z

Thanks for the write up! It sounds promising.
I'll be happy to write new tests using the new model if it's possible to work with it in FSharp.Compiler.Service.sln.


cartermp · 2019-07-03T14:51:13Z


@auduchinok I think we eventually plan on moving this into FCS. Since there's going to be a long tail of bugs involved, we'll probably keep it separate for a while. We certainly wouldn't be using it in our own codebase until the quality is high enough, at which point we'd look to integrate with FCS so that all editors can use it. While it's separate I think it should be pretty easy to test, and we intend for it to be pretty easy to plug into an editor host so that you could dogfood it outside of Visual Studio. Its dependencies are portable, so there shouldn't be an issue with that in theory.


TIHan (Collaborator, Author) · 2021-08-16T18:25:56Z

Locking discussion in favor of: #11976
