Plan: Modernizing F# Analysis #11976

Open
6 of 10 tasks
TIHan opened this issue Aug 16, 2021 · 3 comments

@TIHan
Member

TIHan commented Aug 16, 2021

@dsyme and I sat down to document my overall plan for modernizing the way F# and FCS do analysis (Don did most of the writing here). Much of this work is already done; this documents the plan end-to-end. Here's what we came up with. Please comment and discuss below.

See also #7077 for a previous description.

Planning: Modernizing F# Analysis

This note describes the technical agenda to "modernize" the FCS analysis services to use best-known techniques from Roslyn.

Executive summary

The core of the plan is to adopt a more Roslyn-like model of analysis, based on

  1. Immutable snapshots of the contents of documents and projects

  2. Immutable views of their enrichment with analysis information

  3. A cacheless compiler-service API

In the long term this agenda delivers multiple critical benefits:

  1. High-performance multi-threaded analysis

  2. A more reliable basis for implementing multiple IDE features, including cross-file refactorings and analysis

  3. It allows features such as "in-memory documents" and "in-memory cross-project references to C#", simplifying the user experience of using F# in Visual Studio.

  4. It aligns F# with the architectural principles of Roslyn, allowing contributors to transfer experience between the two

  5. Looking forward, it gives a strong basis for reliably making F# analysis more incremental w.r.t. changes in inputs.

  6. Looking forward, it gives a strong basis for building a simple, reliable "out-of-proc" LSP implementation for F# following Roslyn design principles.

  7. Our compiler testing framework can be simplified; it will not read files on disk in order to run a test that verifies parsing and type-checking behavior.

The Current Situation and why it's a Problem

FCS provides services to compute analysis information from inputs. For the current API, some of these inputs are filenames, and so FCS relies in part on the state of the file system, which is highly mutable state and is highly problematic.

Specifically, when requesting the analysis of a file in a project (e.g. for a refactoring or a tooltip), the state of the current file is captured as a "snapshot", but the state of other files in the project is accessed via the file system as the analysis proceeds. This causes three problems:

A. The state of these files as saved on disk may have changed in-between

B. Differences between the saved and unsaved contents of in-memory buffers in the IDE.

C. It is extremely error-prone to implement incremental updates to analysis w.r.t. incremental changes in input
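Problem A can be illustrated with a small, language-neutral sketch. This is a toy written in Python, not FCS code: the "analysis" only counts lines, and the names `count_lines_from_disk`, `count_lines_from_snapshot` and `demo` are hypothetical. It shows how reading inputs from the file system while analysis proceeds can mix two points in time, whereas a snapshot captured up front cannot.

```python
import tempfile
from pathlib import Path


def count_lines_from_disk(paths):
    """Toy analysis that reads each file only at the moment it is needed."""
    for path in paths:
        yield path.name, len(path.read_text().splitlines())


def count_lines_from_snapshot(snapshot):
    """Same toy analysis over (name, text) pairs captured before it starts."""
    return [(name, len(text.splitlines())) for name, text in snapshot]


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        a, b = Path(tmp, "A.fs"), Path(tmp, "B.fs")
        a.write_text("module A\nlet x = 1\n")
        b.write_text("module B\nlet y = A.x\n")

        # Capture every input once, as an immutable tuple.
        snapshot = tuple((p.name, p.read_text()) for p in (a, b))

        run = count_lines_from_disk([a, b])
        first = next(run)            # A.fs is analysed...
        b.write_text("module B\n")   # ...then B.fs is saved mid-analysis
        second = next(run)           # B.fs is read after the save

        return [first, second], count_lines_from_snapshot(snapshot)
```

In this sketch the disk-based run sees the old A.fs together with the new B.fs, while the snapshot-based run sees both files as they were at one instant.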

Additionally, FCS had two other major problems:

D. FCS is stateful and implements multiple kinds of caching for parsing and analysis

E. FCS was single-threaded, with a "reactor thread" compilation lock

Problem A leads to:

  1. Repeated polling checks on timestamps of files whenever checking for validity of the results

  2. A very frequent source of bugs (BUG LINKS)

Problem B leads to:

  1. Confusion for the user who doesn't understand that prior files must be saved.

  2. Double type-checking of open files in a project when a change is made: once as part of the "background" build that is done using on-disk representations, once as part of the "foreground" build in order to get diagnostics for currently open documents. This is not something users perceive, but is a potential overall performance gain we can deliver for large projects.

  3. Unnecessary complexity and distinctions in the FCS API that can make it difficult to understand what's going on. For example, we must document and test the differences between foreground and background checking.

  4. A slew of bugs in cross-file refactorings (LINK LINK), cross-file goto-definition, cross-file tooltips (based on saved, not unsaved contents)

  5. Missed opportunities to take advantage of F# language features for more efficient incremental checking, in particular signature files.

Problems C & D lead to a slew of bugs related to not invalidating cache entries with regard to changes in on-disk files. Problem D also causes issues with memory usage and too many analysis results being "kept live" by the FCS caches.

Problems A-C also apply to the "referenced assemblies" inputs to analysis, particularly cross-project references. Before the start of this work, specifying a cross-project reference was done via a graph of FSharpProjectOptions, but no in-memory cross-project references were allowed to C# projects. Further, the cross-project references led to reading input files from disk for other projects and assessing their timestamps, leading to bugs and inconsistencies.

In combination these issues lead to a kind of "gridlock" where the root causes of the kinds of bugs we see are not addressed using best-known techniques. We patch a few bugs, which can in turn cause other bugs. We know the solution to unlock this, which is to follow the design principles used by Roslyn.

Aside: Problem B can be partly addressed by an existing "hack" in the FCS API that allows the file system used by FCS to be "shimmed". This is used by JetBrains Rider in order to implement in-memory documents. However this is an awkward solution that differs greatly from the Roslyn approach, and Problems A, C and D still remain.

What's Needed

The Roslyn approach to these problems is to

  1. Make all inputs to analysis be "snapshot" objects

  2. Make all analysis results on-demand stateless enrichments of these snapshots

  3. Do not implement ad hoc caching of analysis objects within Roslyn, but rather allow liveness of analysis objects to determine lifetimes.

The technical agenda is based on transforming FCS to correspond to these principles.

Aside: when we say Roslyn analysis objects (e.g. Compilation LINK) are "on-demand stateless enrichments", this means there may be internal state recording what enrichments have already been computed, and this may be important for reasoning about memory usage. However, logically speaking, the analysis objects are still functional enrichments. Roslyn analysis objects are effectively like a composition of multiple lazy values, computed on-demand.
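As a rough illustration of "a composition of multiple lazy values", here is a minimal sketch in Python. It is not the Roslyn or FCS API: `SourceSnapshot`, `AnalysisView`, `work_log` and `with_text` are hypothetical names, and the "analysis" is a trivial tokenizer. The view holds internal state recording what has been computed, but logically it is a pure function of its immutable input.

```python
from dataclasses import dataclass
from functools import cached_property


@dataclass(frozen=True)
class SourceSnapshot:
    """Immutable input: a file name plus the text captured at one instant."""
    name: str
    text: str


class AnalysisView:
    """On-demand enrichment of one snapshot; each value is computed at most once."""

    def __init__(self, source):
        self._source = source
        self.work_log = []  # records which enrichments were actually computed

    @cached_property
    def tokens(self):
        self.work_log.append("tokens")
        return tuple(self._source.text.split())

    @cached_property
    def identifiers(self):
        # Depends on `tokens`, which is itself lazy.
        self.work_log.append("identifiers")
        return frozenset(t for t in self.tokens if t.isidentifier())

    def with_text(self, text):
        """An edit never mutates: it yields a new snapshot and a fresh view."""
        return AnalysisView(SourceSnapshot(self._source.name, text))
```

Nothing is computed until a value is requested, and the lifetime of the computed values is simply the lifetime of the `AnalysisView` object that holds them, with no separate cache to invalidate.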

Technical Agenda

The agenda is as follows. Where new constructs are brought into existence in the FCS API, we show their correspondence to Roslyn equivalents.

  • Add IFSharpSourceText (corresponds to SourceText in Roslyn). This allows for immutable views of snapshots of buffers

    This included adopting IFSharpSourceText for foreground analysis.

    OBSERVABLE GAIN: Among other things this prevented copying entire source files, a major cause of GC, and removed a slew of bugs and workarounds related to out-of-date snapshots.

  • Rewrite the "IncrementalBuild" engine to use a build graph

    This allows for simple and reliable implementation of on-demand stateless enrichments within the background build.

    There were also several major cleanup steps preparing for this. For example the build graph must be correctly incremental between the "diagnostics+tooltips" portion of analysis results and the more costly "symbol usages" portion; see "Stop incremental builder from accumulating TcSymbolUses/TcResolutions/etc." (#11666).

    OBSERVABLE GAIN: This enabled "in-memory cross project referencing" for F# projects referencing C# projects. This gives a much simpler user experience, because changes in C# projects are now reflected immediately in an F# project without having to compile the C# project on-disk, and analysis results are available in uncompiled solutions.

  • Make the build graph free-threaded

    OBSERVABLE GAIN: With this change, FCS started supporting concurrent requests for analysis results. This massively improves the performance of analysis and the responsiveness of the IDE.

  • Make the build graph more incremental w.r.t. changes in implementation file without change in signature file

    OBSERVABLE GAIN: This greatly improved performance of analysis in F# projects that contain signature files. The user sees diagnostics and other analysis results much more quickly when a change is made to an implementation file without change to the signature file.

  • Add FSharpReferencedProject, corresponding to Roslyn's CompilationReference

    OBSERVABLE GAIN: This enabled "in-memory cross project referencing" for C# -> F# projects. This gives a much simpler user experience, because changes in C# projects are now reflected immediately in an F# project without having to compile the C# project on-disk, and analysis results are available in uncompiled solutions.

  • Add FSharpSource, corresponding to Roslyn's SyntaxTree. A prototype is in this PR. FSharpSource is now added but not yet used.

    An FSharpSource is an input to analysis and internally is an "IFSharpSourceText + F# parse tree". The background and foreground analysis routines will accept these objects.

    NOTE: These are not incremental w.r.t. incremental changes in the IFSharpSourceText (adding, removing lines) though in theory they could allow some incrementality in future refinements.

  • Make Visual Studio provide FSharpSource objects based on live buffers. A prototype is part of [WIP] In-Memory documents for FCS #11588.

    OBSERVABLE GAIN: This reliably implements the "in-memory documents" feature

    OBSERVABLE GAIN: This gives reliable rename-refactor for unsaved files.

    OBSERVABLE GAIN: This avoids a slew of other bugs and complexity going forward

  • Add FSharpProject, corresponding to Roslyn's Compilation. A prototype is in this PR.

    An FSharpProject is a handle to the "outputs" of analysis. That is, an FSharpProject is a Roslyn-like on-demand analysis object which can be used to request analysis information, e.g. diagnostics, tool tips, symbol uses, across an entire project.

    An FSharpProject is incremental w.r.t. replacing the FSharpSource inputs. That is, the granularity of incrementality is "replace the contents of an entire file". If, for example, the last FSharpSource in the compilation sequence is replaced, then the majority of the results of analysis will be re-used.

    An FSharpProject is on-demand in multiple ways, all mediated by the internal build graph. For example, if diagnostics are requested for the 3rd file out of 5 in the compilation sequence, only 3 files will be checked. If the diagnostics are then requested for the 5th file, the remaining two files will be checked. Some semantic information may be re-computed each time it is requested. Some may require more detailed re-checking, e.g. to record symbol locations.

    An FSharpProject can be created without needing an FSharpChecker. Thus it is not tied to the caches of FSharpChecker.

    This addresses problem (D) above, in the sense that FCS clients like Visual Studio can choose their own lifetimes for FSharpProject objects (usually the lifetime of the associated Roslyn Workspace).

  • Make Visual Studio use FSharpProject objects instead of FSharpChecker. A prototype is part of [Experiment] FSharpProject snapshot #11775. A preparatory PR was [VS] Consolidate Roslyn workspace and FCS #11694

    OBSERVABLE GAIN: This reduces memory usage

    OBSERVABLE GAIN: This reduces bugginess due to invalidation, timing and state problems

    This is also preparatory to reliably making FCS out-of-process with LSP.

  • Stabilize, document, refine the public APIs as part of making FCS a binary compatible component in support of F# Analyzers
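The on-demand and incremental behavior described for FSharpProject in the agenda above can be sketched with a toy model. This is Python rather than F#, and it is only an illustration under stated assumptions: `ProjectSketch`, `check_through` and `replace_source` are hypothetical names, not the FSharpProject API, and the "check" just counts `let` definitions visible so far.

```python
class ProjectSketch:
    """Toy project snapshot: ordered sources plus lazily computed per-file results."""

    def __init__(self, sources, reused=()):
        self.sources = tuple(sources)  # ordered (name, text) pairs
        pending = len(self.sources) - len(reused)
        # Results carried over from an earlier snapshot, then empty slots.
        self._results = list(reused) + [None] * pending
        self.checked = []  # names of files checked by this object

    def check_through(self, index):
        """Check files 0..index in order, skipping any that already have a result."""
        for i in range(index + 1):
            if self._results[i] is None:
                name, text = self.sources[i]
                visible = self._results[i - 1] if i > 0 else 0
                # Toy check: definitions visible after this file.
                self._results[i] = visible + text.count("let ")
                self.checked.append(name)
        return self._results[index]

    def replace_source(self, index, text):
        """Return a new snapshot; results for files before `index` are re-used."""
        name, _ = self.sources[index]
        sources = self.sources[:index] + ((name, text),) + self.sources[index + 1:]
        return ProjectSketch(sources, reused=self._results[:index])
```

With five files, calling `check_through(2)` checks only the first three, and a later `check_through(4)` checks the remaining two. Replacing the last source yields a new object that re-checks only that file, while the original object still answers for the old contents.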

And that is all.
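To illustrate the signature-file item in the agenda, here is a toy sketch of why an implementation-only change need not disturb later files. It is written in Python, it is an assumption about the general idea rather than a description of how the FCS build graph is implemented, and `recheck` is a hypothetical name. Later files are assumed to see only the signature of an earlier file when one exists, and its implementation otherwise.

```python
def recheck(files, seen):
    """Return the names of files that need checking again.

    files: ordered list of (name, impl_text, sig_text_or_None).
    seen:  set of inputs already checked, shared across calls.
    """
    rechecked = []
    upstream = ()  # what earlier files expose to later ones
    for name, impl, sig in files:
        # A file's result depends on its own texts and on what it can see upstream.
        key = (name, impl, sig, upstream)
        if key not in seen:
            seen.add(key)
            rechecked.append(name)
        # Later files see the signature if there is one, else the implementation.
        upstream += (sig if sig is not None else impl,)
    return rechecked
```

In this model, editing the implementation of a file that has a signature re-checks only that file. Editing the signature, or editing a file with no signature, also re-checks every file after it.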

Looking ahead

Further Incrementality

One result of the above agenda is that it provides a basis to begin to implement finer-grained incremental adjustment of analysis results w.r.t. incremental changes in inputs. Currently (at the end of the above agenda) incrementality is at the granularity of replacing the contents of an entire file. We could now consider incrementality w.r.t. adding text at the end of a file, or changes within a line. This requires incremental parsing and checking. The aim here would be higher performance IDE analysis.

Roslyn supports this kind of incrementality but it is not an essential part of the above agenda.

LSP and Out of Process

Future changes to Roslyn will require F# to implement LSP, at least the minimum needed to do diagnostic analysis out-of-process (see #11969; note that this is a tiny part of LSP, and Ionide provides a full implementation).

An LSP implementation of F# will host FSharp.Compiler.Service and should ideally have an implementation architecture very similar to the C# out-of-proc LSP implementation. Completing this agenda allows us to use this approach. For example, the out-of-proc process will mirror the Roslyn workspace and hold handles to the appropriate FSharpProject objects, just as the C# version of the same holds a Roslyn Compilation object.

Crucially, this means the F# LSP implementation will be simple, reliable and relatively stateless (apart from holding FSharpProject objects).

TIHan added the Area-Compiler, Area-FCS, Area-VS-Editor, Area-LangService-API and Plan labels on Aug 16, 2021

@baronfel
Member

Aside: Problem B can be partly addressed by an existing "hack" in the FCS API that allows the file system used by FCS to be "shimmed". This is used by JetBrains Rider in order to implement in-memory documents. However this is an awkward solution that differs greatly from the Roslyn approach, and Problems A, C and D still remains.

Yep, worth noting that we do this in FSAC as well and our in-memory FS is powered by LSP file changed messages. We do still have all of the mentioned issues as well 👍

I skimmed through the rest of this and it sounds generally quite nice. Interested in seeing the details of course, but I'm encouraged by everything I see here. Excellent writeup!

@alfonsogarciacaro
Contributor

This is great work @TIHan, looking forward to it!

Not entirely sure if it's related but just to mention that it'd be nice if this work also takes into account FSC-based compilers targeting other platforms, particularly regarding incremental compilation. For Fable, we use a custom build of FSC from a branch of @ncave's fork that does some simple caching (it only recompiles the first changed file and those below) and skips work that's not needed to get the typed AST (basically symbol information, if I'm not mistaken). It'd be great if we could have a similar mechanism directly built into FSC.

dsyme removed the Area-VS-Editor label on Mar 30, 2022

@kerams
Contributor

kerams commented Apr 4, 2022

I assume out of process hosting implies leaving the .NET Framework world, with free runtime performance gains, type providers' design time parts being able to target .NET 6, etc.? Can't wait.

Any updates on how it's going and what the plans for the immediate future are? Obviously Will's leaving the team has thrown a wrench into this endeavor a bit.
