
Parallel project analysis behind a feature flag #13521

Merged (18 commits) on Sep 24, 2022

Conversation

@safesparrow (Contributor) commented on Jul 17, 2022:

FCS code analysis currently runs serially, project-by-project.
There is big potential for speedup by analysing all unrelated projects at the same time.
Some discussion around this happened on F# Slack with @vzarytovskii, @cartermp and others.

Given the simplest example:

```mermaid
graph TD;
    A-->B1;
    A-->B2;
    B1-->C;
    B2-->C;
```

the current order of analysis is: start A, start B1, start C, finish C, finish B1, start B2, finish B2, finish A.
With this change the order becomes: start A, start B1 & B2, start C, finish C, finish B1 & B2, finish A.

What this change does:

  • tcImports.RegisterAndImportReferencedAssemblies invokes the computations for each reference using Async.Parallel instead of Async.Sequential. This covers both DLL references and project references, so both are processed in parallel.
  • For project references this means many incremental builders are created and projects are analysed in parallel.
  • There is no logic that scans the project graph up-front - projects are analysed as references are discovered, top-down.
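The swap at the heart of the change can be sketched as follows. This is a minimal, self-contained F# sketch with placeholder computations, not the actual FCS code; `importReference` stands in for the real per-reference import work:

```fsharp
// Placeholder for the real per-reference (DLL or project) import computation.
let importReference (name: string) : Async<string> =
    async {
        do! Async.Sleep 1 // stand-in for actual import work
        return name
    }

// Combine the per-reference computations either sequentially or in parallel,
// controlled by a flag. Both combinators return results in input order.
let importAll (parallelEnabled: bool) (refs: string list) : string[] =
    let computations = refs |> List.map importReference
    if parallelEnabled then
        Async.Parallel computations |> Async.RunSynchronously
    else
        Async.Sequential computations |> Async.RunSynchronously
```

Because both `Async.Parallel` and `Async.Sequential` preserve the order of results, the surrounding code can stay the same regardless of the flag.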

How the feature is enabled

The feature is currently wired up for the two main cases: 1. standalone compiler and 2. FSharpChecker.
It is controlled as follows:

  1. If the FCS_ParallelReferenceResolution environment variable is set, its value (true or false) dictates whether the feature is on or off.
  2. If the environment variable is not set (to a valid value):
  • FSC falls back to an internal CLI option --parallelreferenceresolution which, when set, enables the feature:
Microsoft (R) F# Compiler version 12.0.5.0 for F# 7.0
Copyright (c) Microsoft Corporation. All Rights Reserved.

warning FS0075: The command-line option '--parallelreferenceresolution' is for test purposes only
  • FSharpChecker.Create has a new parameter that when set to true, enables the feature.
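The precedence described above can be sketched as a small helper. This is hypothetical code, not the actual FCS implementation; only the environment-variable name comes from the description:

```fsharp
open System

// The environment variable, when set to a valid value, wins;
// otherwise fall back to the explicit CLI option / checker parameter.
let parallelReferenceResolutionEnabled (explicitSetting: bool) : bool =
    match Environment.GetEnvironmentVariable "FCS_ParallelReferenceResolution" with
    | "true" -> true
    | "false" -> false
    | _ -> explicitSetting // unset or invalid: use the explicit setting
```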

Some things I'm not 100% sure about:

  • I think the cache mechanism for IncrementalBuilders means that for a given project only one builder will be created, even if many dependents will try to process it at the same time (thinking about a race condition in A -> B1, B2 -> C where B1 and B2 request analysis of C).
  • I think the way I'm creating those computations with NodeCode and Async.Parallel is correct, but I'm not sure.
  • Is it OK to parallelise resolving assemblies and not just projects?
  • If this applies to assembly processing, do we want that, and do we want it to apply during compilation as well (not just code analysis)?

Limiting parallelisation

When enabled, should parallelisation be configurable to a maximum of X threads? If so, should that be per request or global for the process? How should it be configured?
I initially planned to add an option to limit it, but separate analysis requests are, as I understand it, completely independent, so there is already no limit to how multi-threaded FCS is; I would argue this PR isn't a major change in that regard.
However, it definitely increases the number of threads running analysis in a typical scenario.

Three potential issues I can think of:

  • too many threads causing CPU contention and context switching - will ThreadPool scheduling save us by not scheduling too many operations at once?
  • I/O contention - probably requires looking at a profiler.
  • high memory usage
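For reference, if a cap ever does prove necessary, FSharp.Core's `Async.Parallel` already accepts an optional `maxDegreeOfParallelism` (since FSharp.Core 4.7), so a per-request limit could be as small as this sketch (illustrative, not part of this PR):

```fsharp
// Run the given computations with at most maxDegree executing concurrently.
// Results are still returned in input order.
let runWithLimit (maxDegree: int) (computations: Async<'T> seq) : Async<'T[]> =
    Async.Parallel(computations, maxDegreeOfParallelism = maxDegree)
```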

What speedup does this provide?

Setting test results aside, this change feels very natural to me in principle.
At our company we operate on big solutions with ~100 F# projects. Even before doing an accurate analysis, I'm fairly sure the level of parallelisation possible in that project graph is >= 2, which means almost a 2x speedup (and I think it's actually >> 2).
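The claim can be made concrete with a back-of-the-envelope helper (illustrative only, assuming unit cost per project): the best-case speedup is bounded by the number of projects divided by the length of the longest dependency chain.

```fsharp
// deps maps each project to the projects it references.
// Longest chain starting at a node, counting the node itself.
let rec chainLength (deps: Map<string, string list>) (node: string) : int =
    match Map.tryFind node deps with
    | None | Some [] -> 1
    | Some ds -> 1 + (ds |> List.map (chainLength deps) |> List.max)

// Upper bound on speedup: total projects / critical path length.
let bestCaseSpeedup (deps: Map<string, string list>) : float =
    let nodes = deps |> Map.toList |> List.map fst
    let critical = nodes |> List.map (chainLength deps) |> List.max
    float nodes.Length / float critical
```

For the diamond from the description (A -> B1, B2 -> C) this gives 4 / 3 ≈ 1.33x; a flat graph of one root over 50 leaves approaches 51 / 2, which is why the 50-leaves benchmark below shows such a large win.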

GC consideration

Running code analysis on many threads means a much higher allocation rate, leading to more work for the GC.
In my tests I observed that with Workstation GC (which I believe Rider uses by default) FCS was spending as much as 70% of its time in GC.

Enabling Server GC therefore had a huge impact on the timings.
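For hosts where setting `DOTNET_gcServer=1` is inconvenient, the same setting can be baked into the host's `runtimeconfig.template.json` (a standard .NET configuration mechanism, shown here for reference):

```json
{
  "configProperties": {
    "System.GC.Server": true
  }
}
```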

Test results

The below script can be used to measure the impact of this feature.

It utilises the benchmark generator from https://github.com/safesparrow/fsharp-benchmark-generator .

Script:
```powershell
# Checkout and build a specific version of the FSharp codebase
git clone https://github.com/safesparrow/fsharp
Push-Location fsharp
git checkout 4431af5960bcaece76660ec3995faaf7d8d178ac
./build.cmd -c Release -noVisualStudio
Pop-Location

# Checkout the benchmark generator
git clone https://github.com/safesparrow/fsharp-benchmark-generator
Push-Location fsharp-benchmark-generator

Function FCSBenchmark {
    dotnet run -- -f ((Get-Location).Path + "/../fsharp/artifacts/bin/FSharp.Compiler.Service/release/netstandard2.0/FSharp.Compiler.Service.dll") -i .\inputs\50_leaves.json -n 1
}

# Run 4 benchmarks with 4 different environment configs, save results in files
$env:DOTNET_gcServer=0; $env:FCS_PARALLEL_PROJECTS_ANALYSIS="false"; FCSBenchmark > ../workstation_sequential.log
$env:DOTNET_gcServer=0; $env:FCS_PARALLEL_PROJECTS_ANALYSIS="true"; FCSBenchmark > ../workstation_parallel.log
$env:DOTNET_gcServer=1; $env:FCS_PARALLEL_PROJECTS_ANALYSIS="false"; FCSBenchmark > ../server_sequential.log
$env:DOTNET_gcServer=1; $env:FCS_PARALLEL_PROJECTS_ANALYSIS="true"; FCSBenchmark > ../server_parallel.log

Pop-Location
```

Test 1: Extremely parallel example: Root -> 50 leaves

See https://github.com/safesparrow/fsharp-benchmark-generator/blob/main/inputs/50_leaves.json for the definition.

(project diagram image omitted)

Average over 10 iterations (the first iteration is considerably slower):

| GC          | Mode       | Time     |
|-------------|------------|----------|
| Workstation | Sequential | 16005 ms |
| Workstation | Parallel   | 10849 ms |
| Server      | Sequential | 14594 ms |
| Server      | Parallel   | 2659 ms  |

Test 2: Almost sequential project structure: Fantomas.Tests

See https://github.com/safesparrow/fsharp-benchmark-generator/blob/main/inputs/fantomas.json for the definition.

(project diagram image omitted) Here the only parallelisation can happen between `Fantomas.Client` and `Fantomas.FCS`.

However, Fantomas.Client is a 3-file project, so its analysis doesn't take long; my claim is that the ~200ms difference is roughly the time it takes to analyse Fantomas.Client.

| GC          | Mode       | Time    |
|-------------|------------|---------|
| Workstation | Sequential | 8496 ms |
| Workstation | Parallel   | 8353 ms |
| Server      | Sequential | 6492 ms |
| Server      | Parallel   | 6222 ms |

TODO:

  • Discuss all the above issues
  • Inject the "analyse in parallel" flag from top-level
  • Provide more test results
  • Test this change on the CI (currently it's disabled by default)

@safesparrow changed the title from "Allow parallel project analysis with an environment variable" to "Allow parallel project analysis with an environment variable WIP" on Jul 17, 2022
@safesparrow safesparrow marked this pull request as ready for review July 23, 2022 13:15
@safesparrow (Contributor, Author) commented on Jul 23, 2022:

I have provided some test results which show the performance impact.
@vzarytovskii Would it be possible to consider this change?

As mentioned in the description there are tweaks to be made (e.g. a different way to turn it on/off), but that requires input from the maintainers 🙂

@vzarytovskii (Member) commented on Jul 25, 2022:

The change is great, and needs a bunch of testing, especially in Visual Studio, since it loads projects differently and has some quirks regarding threading.

We will also probably need a flag passed to checker, and a VS feature flag (for easier in-VS testing, which will be shown in the settings).

Update 1: I'd also be interested in differences in memory consumption.
Update 2: GC is important; I wonder how it behaves on "slower" machines. I will test it on my laptop (which doesn't have great specs).

Thoughts @KevinRansom @dsyme ( + @TIHan and @cartermp, sorry for the ping :) )

@safesparrow (Contributor, Author) commented on Jul 25, 2022:

We will also probably need a flag passed to checker, and a VS feature flag (for easier in-VS testing, which will be shown in the settings).

Makes sense. @auduchinok if this change were to land, how much work would it be for Rider to pick up the FCS version that supports it and an option to enable it?

Update 1: I'd also be interested in differences in memory consumption.

Yes, that's something worth looking at.
Would be nice to add more stats to https://github.com/safesparrow/fsharp-benchmark-generator/ .
Possibly worth making Benchmark.Runner a BDN benchmark with a carefully chosen config, so as not to make it take too long.

Update 2: GC is important, I wonder how does it behave on "slower" machines, I will test it on my laptop (which doesn't have great specs).

Would be lovely if one day this question could be answered automatically by running performance tests on a variety of predefined (possibly virtual, if results are not skewed too much) machines.

For now we can capture those stats in https://github.com/safesparrow/fsharp-benchmark-generator/.

In the short-term I think what we should do is:

  • gather a set of metrics to look for
  • gather/generate a set of codebases to test this on

Then we can either:

  • find enough variations of hardware by asking people here to volunteer
  • get the hardware from the cloud

And then run https://github.com/safesparrow/fsharp-benchmark-generator/ for every combination.

Regarding VS:
What's special about it?
Can we distill the most important unique environment conditions it provides into a reusable piece of code that can then be trivially set up in a benchmark?

@auduchinok (Member) commented:

Makes sense. @auduchinok if this change were to land, how much work would it be for Rider to pick up the FCS version that supports it and an option to enable it?

It would be a bit more difficult this time, since many impactful refactorings have been made upstream. 🙂
I'm going to update it for the 2022.3 release at some point during the EAP.

@vzarytovskii (Member) commented on Jul 25, 2022:

Regarding VS:
What's special about it?

If I am not mistaken, it pretty much has its own threading model/primitives which may affect the behaviour here.

@safesparrow (Contributor, Author) commented:

Regarding VS:
What's special about it?

If I am not mistaken, it pretty much has its own threading model/primitives which may affect the behaviour here.

I think this is another vote for making this opt-in, but nonetheless:
do you know if it would be possible to use the same threading model in a benchmark, to increase coverage without needing the full VS?
Obviously we'd want to try it in full VS at some point, but hopefully not every future change like this (even less significant ones) will require a lot of manual effort to verify behaviour in VS.

@dsyme (Contributor) commented on Jul 28, 2022:

I think the cache mechanism for IncrementalBuilders means that for a given project only one builder will be created, even if many dependents will try to process it at the same time (thinking about a race condition in A -> B1, B2 -> C where B1 and B2 request analysis of C).

Yes, this is correct.

I think the way I'm creating those computations with NodeCode and Async.Parallel is correct, but I'm not sure.

This is correct.

Is it OK to parallelise resolving assemblies and not just projects?

I think it is ok.

On initial review I spotted some concerns about concurrent assembly loading/referencing, which I'll document below. In practice, however, these are not a problem in the case of IncrementalBuilder: the overall assembly loading and resolution process is held within a GraphNode for the unique builder, and all requests requiring the builder have to await the completion of that GraphNode.

The part I was concerned about is concurrent access with regard to the RegisterDll and RegisterCcu calls, which execute before the phase2 fixups, that is, all of these calls:

In TryRegisterAndPrepareToImportReferencedDll:

```fsharp
tcImports.RegisterDll dllinfo
```

In PrepareToImportReferencedILAssembly:

```fsharp
tcImports.RegisterCcu ccuinfo
```

In PrepareToImportReferencedFSharpAssembly:

```fsharp
ccuRawDataAndInfos |> List.iter (p23 >> tcImports.RegisterCcu)
```

Currently the sequence is:

```
Sequential [
    Phase1 Assembly A
    RegisterDll Assembly A
    RegisterCcu Assembly A
    Phase1 Assembly B
    RegisterDll Assembly B
    RegisterCcu Assembly B
    Phase2 Assembly A
    Phase2 Assembly B
]
```

With the change it becomes:

```
Parallel [
    Sequential [
        Phase1 Assembly A
        RegisterDll Assembly A
        RegisterCcu Assembly A
    ]
    Sequential [
        Phase1 Assembly B
        RegisterDll Assembly B
        RegisterCcu Assembly B
    ]
]
Phase2 Assembly A
Phase2 Assembly B
```

This amounts to the same thing, and as mentioned, all of it is within the control of a sequentializing GraphNode.

If this applies to assembly processing, do we want that, and do we want that to apply during compilation as well (rather than code analysis)?

Yes, we should apply it during command-line compilation as well, subject to performance testing.

@safesparrow (Contributor, Author) commented on Sep 17, 2022:

I have now wired up the feature setting from top-level in both FSC and FSharpChecker.
The user can opt-in to the feature by setting the environment variable FCS_ParallelReferenceResolution=true, or through MSBuild/IDE.

I also raised #13835 with a feature that produces OpenTelemetry traces; its description contains example traces comparing the type-checking timeline with the feature on and off - see #13835 (comment).
This was done to produce results that help verify that the behaviour of parallel analysis is as expected.

The few performance tests whose results I shared show the expected benefits from parallel work, and as far as I can see nothing indicates that work is being duplicated.

@vzarytovskii @dsyme What would be the next steps before this PR can be accepted as an opt-in, non-public/experimental feature?

@safesparrow changed the title from "Allow parallel project analysis with an environment variable WIP" to "Parallel project analysis behind a feature flag" on Sep 17, 2022
@vzarytovskii (Member) left a comment:

@dsyme I'm fine with merging it
@safesparrow thanks a lot for this!

@dsyme (Contributor) commented on Sep 23, 2022:

@safesparrow I can't merge to your branch; can you merge main into this please?

@dsyme (Contributor) commented on Sep 23, 2022:

@safesparrow Thanks!

@dsyme (Contributor) commented on Sep 23, 2022:

@safesparrow Actually it's showing it needs another merge (normally we can update, but for some reason your branches don't allow it).

@safesparrow (Contributor, Author) commented:
Actually it's showing it needs another merge

Ah, those changes in main are coming quicker than PR checks can run 😄
I did another update now.

normally we can update but for some reason your branches don't allow it

Not sure why. The Allow edits by maintainers checkbox is checked.

@dsyme dsyme enabled auto-merge (squash) September 23, 2022 20:02
@safesparrow (Contributor, Author) commented on Sep 23, 2022:

Not sure why, but a single regression test failed after the latest merge. I submitted a dummy commit to trigger a rerun.

EDIT: Actually, during one of the merges I mishandled git and ended up deleting one of the regression test samples. This time tests should pass 🤞

auto-merge was automatically disabled September 24, 2022 00:08

Head branch was pushed to by a user without write access

@dsyme dsyme enabled auto-merge (squash) September 24, 2022 01:22
@dsyme (Contributor) commented on Sep 24, 2022:

You seem to need to keep merging, my apologies.

@dsyme dsyme merged commit 97c7b9a into dotnet:main Sep 24, 2022
auduchinok pushed a commit to JetBrains/fsharp that referenced this pull request Dec 7, 2022
* Allow parallel project analysis with an environment variable

* reformat CompilerImports.fs

* Wire in the flag from top-level, add an internal option to the FSC, add a constructor arg to the FSharpChecker.

* Fantomas formatting

* Update surface baseline

* Cleanup

* Dummy commit to trigger PR checks

* Update surface baseline after merge

* Empty commit to trigger PR check rerun

* Empty commit to trigger PR check rerun

* Restore tests/fsharp/regression/13219/test.fsx
TheAngryByrd added a commit to TheAngryByrd/ionide-vscode-fsharp that referenced this pull request Feb 21, 2023
TheAngryByrd added a commit to TheAngryByrd/ionide-vscode-fsharp that referenced this pull request Feb 21, 2023
baronfel pushed a commit to ionide/ionide-vscode-fsharp that referenced this pull request Feb 21, 2023