
Optimized find all references and reduced memory usage in VS #8339

Merged
merged 25 commits into dotnet:master Feb 14, 2020

Conversation

@TIHan
Contributor

TIHan commented Jan 23, 2020

The aim of this PR is to optimize find all references and reduce overall memory usage in VS, or potentially other editors depending on their needs.

To accomplish this, we need to stop storing full symbol information for every single file in every project in the incremental builder. Doing so will break find all references and the rename refactoring, though, so we have to do a bit of work.

The solution I propose here is as follows:

  • Build a mechanism similar to Roslyn's SymbolKey for our symbols.
    • Each file that is type checked will build a storage container, called ItemKeyStore, which is a memory-mapped file containing a contiguous block of memory holding ranges paired with a string (the symbol key).
      • This will live in IncrementalBuilder.
      • Memory-mapped files will use memory, but not the actual process's memory; in our case, VS.
      • ItemKeyStore can easily query for all locations of a given Item (see the sketch just after this list).
    • Symbols that are considered equal have the exact same key string.
    • The key string is determined by the structure of the Item.
  • Full semantic classification information must be held for each file in every project.
    • At the moment we will not store this in a memory-mapped file, but it would be wise to do so eventually, since it also takes up memory, though not as much as the symbol/item keys.
    • In VS, each time a symbol location is found, the classification service will be invoked for that small span of text in order to display the classification in the Find All References window. We need to keep a cache of the semantic information for a file; otherwise, re-type-checking would have to occur, which would slow down find all references.
  • A new lexing function must be exposed to quickly lex a small span of text for classification.
    • Each location of a symbol that is found will invoke syntactic classification. We need to make this operation fast and avoid allocating much.
    • Unfortunately, it will be inaccurate in some scenarios involving string tokens that span multiple lines, but since it is only used in find all references, missing some classification is acceptable for now. We currently have a mechanism built to handle this, called the Tokenizer, but it is quite complicated and allocates a lot, which would slow find all references considerably and increase memory pressure; neither of which we want. Fixing it might be more trouble than it's worth at the moment.
  • Find all references in VS will now be streamed, meaning it will start displaying results as soon as symbols are found in each file.
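To make the ItemKeyStore idea above concrete, here is a minimal sketch under assumed names (the Range record and the writeEntries/findAll functions are illustrative, not the PR's actual implementation): each checked file serializes (range, symbol key) pairs into a non-persisted memory-mapped file, and find all references becomes a linear scan for matching keys.

open System.IO
open System.IO.MemoryMappedFiles

// Illustrative range type; the real implementation uses the compiler's own range type.
type Range =
    { FileName: string
      StartLine: int; StartColumn: int
      EndLine: int; EndColumn: int }

// Serialize (range, symbol key) pairs into a non-persisted memory-mapped file.
let writeEntries (entries: (Range * string)[]) =
    let mmf = MemoryMappedFile.CreateNew(null, 1024L * 1024L)
    use stream = mmf.CreateViewStream()
    use writer = new BinaryWriter(stream)
    writer.Write entries.Length
    for (range, key) in entries do
        writer.Write range.FileName
        writer.Write range.StartLine
        writer.Write range.StartColumn
        writer.Write range.EndLine
        writer.Write range.EndColumn
        writer.Write key
    mmf

// Scan the store and return every range whose key matches the given symbol key.
let findAll (mmf: MemoryMappedFile) (symbolKey: string) =
    use stream = mmf.CreateViewStream()
    use reader = new BinaryReader(stream)
    let count = reader.ReadInt32()
    [ for _ in 1 .. count do
        let fileName = reader.ReadString()
        let startLine = reader.ReadInt32()
        let startColumn = reader.ReadInt32()
        let endLine = reader.ReadInt32()
        let endColumn = reader.ReadInt32()
        let key = reader.ReadString()
        if key = symbolKey then
            yield { FileName = fileName
                    StartLine = startLine; StartColumn = startColumn
                    EndLine = endLine; EndColumn = endColumn } ]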

Both the storage of symbol keys and semantic classification in incremental builder will be disabled by default.

This design isn't perfect; I would rather not store ItemKeyStore and semantic classification in the incremental builder, but at the moment it is the path of least resistance. I think we can resolve that by exposing a public API callback that intercepts check results while IncrementalBuilder is checking files; that way it is up to the callback to determine what to do with the information, keeping it out of the incremental builder's responsibility. But I feel a little awkward doing that.

I will be porting over a lot of work that was done in a prototype and this PR will be the real thing.

The prototype showed a significant memory reduction in VS, even without calling find all references, because full symbol information is no longer stored in memory. The performance of find all references also improved significantly for large solutions.

TIHan added 11 commits Jan 25, 2020
…ification from allocating a lot
TIHan changed the title from "[WIP] Optimized find all references and reduced memory usage in VS" to "Optimized find all references and reduced memory usage in VS" on Feb 4, 2020
@TIHan

Contributor Author

TIHan commented Feb 4, 2020

This is ready for the most part.

I added a new public lexing API and marked it as experimental; we really, really need a better API for lexing. My hope is that this will improve over time. It was needed for syntactic classification in find all references so I didn't have to use the existing one; I only wanted to get classification for a small span of text. It isn't perfect, though, and will not be accurate for tokens that span multiple lines.

@TIHan

Contributor Author

TIHan commented Feb 4, 2020

@dsyme I think this can be reviewed now.

The big things are:

  • Storage of semantic classification in-memory and item keys in memory-mapped files (Item key store/builder).
  • New experimental lexing API
src/fsharp/service/IncrementalBuild.fsi
src/fsharp/service/ItemKey.fs
src/fsharp/service/ItemKey.fsi
src/fsharp/service/SemanticClassification.fs
src/fsharp/service/ServiceLexing.fs
TIHan added 2 commits Feb 6, 2020
Contributor

dsyme left a comment

The code looks great

My question is really about the memory-mapped file. You say "it uses memory but not the process's memory". I don't understand this. My impression of memory mapped files is that they are "mapped into the process's address space" and my mental model is that if a mmap is 1GB big then 1GB of process address space is used. The actual contents of the file may or may not be in physical memory but that's true for anything from the process address space - the contents are only brought into physical memory as needed but the memory mapping does consume virtual address space, which is 4GB limited for VS.

So if that's correct then this MemoryMappedFile burns VS devenv.exe address space? I thought if you wanted to get the data out of the process address space then you'd have to use an actual file on disk, like a temporary file??

my mental model is that if a mmap is 1GB big then 1GB of process address space is used.

Now it could be that the above statement is somehow wrong for ViewStream over mmap files. If so could you include a link to definitive documentation about that? Or a sample that shows that you can create, say, 10x1GB mmap streams (using the combination of calls we are using here to create them) in a 32 bit process, and have them all live and accessible?

open System
open System.Runtime.Caching

[<Sealed>]
type DocumentCache<'Value when 'Value : not struct>() =
    let cache = new MemoryCache("fsharp-cache")
    let policy = CacheItemPolicy(SlidingExpiration = TimeSpan.FromSeconds 2.)

dsyme Feb 11, 2020

Contributor

Why 2.0 seconds here?

TIHan Feb 12, 2020

Author Contributor

With anything under 2 seconds, I believe, the caching stops working, meaning it won't actually cache the item. It's a really stupid bug. So 2 seconds is really the minimum we can go, IIRC.

cartermp Feb 13, 2020

Contributor

Can this be added as a comment?

TIHan Feb 13, 2020

Author Contributor

Yea, makes sense.

dsyme Feb 13, 2020

Contributor

Can we make it tunable via an environment variable, like a lot of other such settings? I always think it's good practice in case we have to have customers do in situ testing of a different setting?

TIHan Feb 13, 2020

Author Contributor

We could expose this as a setting, but I really think we should not. What we have should just work without any tweaking. Adding more time to this could make it worse; remember, if we cache the same item again it will reset the sliding expiration time back to 0.

cartermp Feb 13, 2020

Contributor

I don't think this should be a setting.
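For reference, a minimal sketch (assumed shape, not necessarily the PR's exact code) of how a MemoryCache-backed document cache with the 2-second sliding expiration discussed above might expose get/set; note that every read of an entry resets its sliding-expiration window:

open System
open System.Runtime.Caching

[<Sealed>]
type DocumentCacheSketch<'Value when 'Value : not struct>() =
    let cache = new MemoryCache("fsharp-cache-sketch")
    // Sliding expiration: an entry is evicted 2 seconds after it was last accessed.
    let policy = CacheItemPolicy(SlidingExpiration = TimeSpan.FromSeconds 2.)

    member _.Set(key: string, value: 'Value) =
        cache.Set(key, box value, policy)

    member _.TryGet(key: string) =
        match cache.Get key with
        | null -> None
        | o -> Some (unbox<'Value> o) // this access resets the sliding window

    interface IDisposable with
        member _.Dispose() = cache.Dispose()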

@TIHan

Contributor Author

TIHan commented Feb 11, 2020

Thank you for the feedback @dsyme . I'll be looking over everything.

@TIHan

Contributor Author

TIHan commented Feb 11, 2020

Memory-Mapped Files
https://docs.microsoft.com/en-us/dotnet/standard/io/memory-mapped-files

Non-persisted files are memory-mapped files that are not associated with a file on a disk. When the last process has finished working with the file, the data is lost and the file is reclaimed by garbage collection. These files are suitable for creating shared memory for inter-process communications (IPC).

This is what we are doing, with the exception of using it for IPC. I don't think it is even possible to share the information by IPC because we give the MMF name a null value. Though, as found by @baronfel, Mono unfortunately does not allow a null name in its MMF implementation, while Desktop and Core do. So we might need to special-case that here.

This is the API we use: MemoryMappedFile.CreateNew, https://docs.microsoft.com/en-us/dotnet/api/system.io.memorymappedfiles.memorymappedfile.createnew
"To obtain a MemoryMappedFile object that represents a non-persisted memory-mapped file (not associated with a file on disk)."
Its parameter, mapName, can accept null:

or null for a MemoryMappedFile that you do not intend to share across processes.

Regarding memory use, an MMF uses virtual memory, which could come from RAM or the paging file; it should not use the process's (devenv.exe) private memory, but it will consume address space. While this still uses memory, it lowers the memory pressure on the actual process.

@TIHan

Contributor Author

TIHan commented Feb 12, 2020

Now it could be that the above statement is somehow wrong for ViewStream over mmap files.

I think this isn't wrong.

open System
open System.IO.MemoryMappedFiles

let create1GB () =
    let size = 1024 * 1024 * 1024 // 1GB
    let mmf = MemoryMappedFile.CreateNew(null, int64 size)
    let view = mmf.CreateViewStream() // mapping a view reserves the full 1GB of address space
    (mmf, view)

[<EntryPoint>]
let main argv =
    let tenMMF =
        Array.init 10 (fun _ -> create1GB())
    Console.ReadLine() |> ignore
    Console.WriteLine(tenMMF) // keep the MMFs and views alive
    0

This will explode because of the address space consumed by the MMF "views"; the ten views map over 2GB of address space. Creating only the MMF, without a "view", is fine. I had incorrect assumptions about how this worked, but it can be fixed by creating smaller views when it's time to read/write; I'll make those adjustments (see the sketch below).
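As a rough illustration of that adjustment (writeAt/readAt are illustrative names, not the PR's actual code), the idea is to create a small, short-lived view over just the region being accessed rather than one view over the whole mapping:

open System.IO.MemoryMappedFiles

// Write bytes at an offset through a small, short-lived view; only a small
// window of the mapping occupies address space at any time.
let writeAt (mmf: MemoryMappedFile) (offset: int64) (data: byte[]) =
    use view = mmf.CreateViewStream(offset, int64 data.Length)
    view.Write(data, 0, data.Length)

// Read a small region the same way; the view is disposed as soon as we're done.
let readAt (mmf: MemoryMappedFile) (offset: int64) (length: int) =
    let buffer = Array.zeroCreate<byte> length
    use view = mmf.CreateViewStream(offset, int64 length)
    view.Read(buffer, 0, length) |> ignore
    buffer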

TIHan added 5 commits Feb 12, 2020
@dsyme
dsyme approved these changes Feb 13, 2020
Contributor

dsyme commented Feb 13, 2020

I've marked this as approved. I don't mind if the MMF is in the process address space if it's still a good way to store the data compactly outside the .NET heap (and we could always move it to a temp file?). I'll leave it for you to decide, though.

TIHan merged commit 53f2911 into dotnet:master Feb 14, 2020
16 checks passed:

  • WIP: Ready for review
  • fsharp-ci: Build #20200213.9 succeeded
  • fsharp-ci (Build EndToEndBuildTests): succeeded
  • fsharp-ci (Build Linux): succeeded
  • fsharp-ci (Build Linux_FCS): succeeded
  • fsharp-ci (Build MacOS): succeeded
  • fsharp-ci (Build MacOS_FCS): succeeded
  • fsharp-ci (Build SourceBuild_Linux): succeeded
  • fsharp-ci (Build SourceBuild_Windows): succeeded
  • fsharp-ci (Build UpToDate_Windows): succeeded
  • fsharp-ci (Build Windows coreclr_release): succeeded
  • fsharp-ci (Build Windows desktop_release): succeeded
  • fsharp-ci (Build Windows fsharpqa_release): succeeded
  • fsharp-ci (Build Windows vs_release): succeeded
  • fsharp-ci (Build Windows_FCS): succeeded
  • license/cla: All CLA requirements met.
cartermp mentioned this pull request Feb 14, 2020