Use hashes for language service cache keys #6058

Merged
merged 5 commits into from
Jan 14, 2019

cartermp commented Jan 2, 2019

Fixes #6028 and is an evolved version of #5944

In #6001 we replaced strings with ISourceText in the language service, which eliminated a huge number of allocations in VS. However, that change did not address the fact that our language service caches still use the full source text as keys. This PR continues the work in #5944, except that it uses the same hashing algorithm Roslyn uses for its own SourceText type.

cartermp commented Jan 7, 2019

In case folks are wondering about collisions with the StringText type (which is ultimately just a string), here's what that looks like:

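// Birthday-problem estimate: probability that at least two of `x` items
// collide when each takes one of `space` equally likely values.
let calcProb x space =
    1.0 - ((space - 1.0)/space) ** (x * (x - 1.0) / 2.0)

let probs () =
    // GetHashCode returns a 32-bit int, so there are 2^32 possible hashes.
    let space = 2.0 ** 32.0
    let probs =
        [ for x in 1 .. 10 -> (10.0 ** float x) ]
        |> List.map (fun powerOfTen -> bigint powerOfTen, calcProb powerOfTen space)

    for (power, prob) in probs do
        printfn "Num files: %A\nProbability of collision: %0.3f percent\n" power (prob * 100.0)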

Result:

Num files: 10
Probability of collision: 0.000 percent

Num files: 100
Probability of collision: 0.000 percent

Num files: 1000
Probability of collision: 0.012 percent

Num files: 10000
Probability of collision: 1.157 percent

Num files: 100000
Probability of collision: 68.781 percent

Num files: 1000000
Probability of collision: 100.000 percent

Num files: 10000000
Probability of collision: 100.000 percent

Num files: 100000000
Probability of collision: 100.000 percent

Num files: 1000000000
Probability of collision: 100.000 percent

Num files: 10000000000
Probability of collision: 100.000 percent

Keep in mind that the hash code is not the only item used to form a key; files must also share the same name to be considered a match.
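
To illustrate that point, here is a minimal sketch of a key that pairs a file name with a content hash. The `CacheKey` type and `makeKey` function are hypothetical names made up for this example, not the types used in this PR, and the built-in `hash` function stands in for whatever checksum the language service actually computes.

```fsharp
// Hypothetical key type, for illustration only; not the type used in the PR.
type CacheKey =
    { FileName: string
      ContentHash: int }

// Build a key from a file name and its source text.
// `hash` yields a 32-bit int and stands in for the real checksum here.
let makeKey (fileName: string) (sourceText: string) =
    { FileName = fileName; ContentHash = hash sourceText }

let k1 = makeKey "A.fs" "let x = 1"
let k2 = makeKey "A.fs" "let x = 1"
let k3 = makeKey "B.fs" "let x = 1"

printfn "%b" (k1 = k2) // true: same name and same text
printfn "%b" (k1 = k3) // false: same text, but a different file name
```

Because records compare structurally, two keys match only when both the name and the hash agree, so a hash collision between two differently named files would not produce a false cache hit in this sketch.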

@abelbraaksma
Contributor

I'm not entirely sure I follow your hash collision probability test, but if the results are correct, we'd need some serious checking of the hash algorithm. Is this the same as string itself? Is this solution-wide or per project?

Though even with those numbers, it's still a big improvement over full text matching.

@abelbraaksma
Contributor

Oh wait, you're hashing over the checksum, as opposed to the full text. And your numbers are about that, not about a possible underlying existing hash function. I spoke too soon.

cartermp commented Jan 8, 2019

It's a birthday problem applied over the total possible hashes you can get with GetHashCode on a string (2^32).
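
For readers who want the arithmetic spelled out, the formula used by `calcProb` above can be written as follows, where n is the number of cached file versions and N = 2^32 is the number of possible hash values. This is a sketch of the estimate in the snippet, which treats each of the n(n-1)/2 pairs as an independent chance of collision; that is an approximation rather than the exact birthday probability.

```latex
\[
  p(n) \;=\; 1 - \left(\frac{N-1}{N}\right)^{\frac{n(n-1)}{2}}
  \;\approx\; 1 - \exp\!\left(-\frac{n(n-1)}{2N}\right),
  \qquad N = 2^{32}.
\]
% Worked example for n = 10^4:
\[
  \frac{n(n-1)}{2N} = \frac{49\,995\,000}{4\,294\,967\,296} \approx 0.01164,
  \qquad
  p(10^4) \approx 1 - e^{-0.01164} \approx 0.01157 .
\]
```

That is the 1.157 percent shown in the results for 10000 files.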

dsyme commented Jan 14, 2019

Re hash collisions - in this context we'd only be interested in two file contents with the same name and hash, which were a "small edit distance" away from each other. Which is exceptionally unlikely.

TIHan merged commit b279f8d into dotnet:dev16.0 Jan 14, 2019

@alfonsogarciacaro
Contributor

Does this affect FSharpChecker.ParseAndCheckProject? I mean, can I now safely call this function multiple times, trusting that the F# compiler won't re-parse the contents of files that haven't changed?

Right now I'm using FSharpChecker.ParseAndCheckFileInProject after the first compilation, but this means I'm sometimes not recompiling all the files that depend on the changed one, and some errors may go unnoticed.

cartermp commented Jan 30, 2019

Yes, this shouldn't have any observable effect (statistically speaking) aside from a reduction in memory usage. If you make 10k or more changes to the same file in the same edit session (and those changes are spaced far enough apart to end up getting cached), there's a 1.157% chance of a collision, which would result in a re-parse, but this is very much in the realm of the unlikely.

nosami pushed a commit to xamarin/visualfsharp that referenced this pull request Jan 26, 2022
* Use hashes for language service cache keys

* Fix dumb thing I forgot

* I guess this is required ayyy lmao

* Override HashCode instead and do a few less computations

* Remove ToString in obj expression