Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Limit trivia by AST MainNode name. #992

Merged
merged 68 commits into from Aug 28, 2020
Merged

Limit trivia by AST MainNode name. #992

merged 68 commits into from Aug 28, 2020

Conversation

nojaf
Copy link
Contributor

@nojaf nojaf commented Aug 7, 2020

Hey @jindraivanek, in this PR I've experimented with limiting the allowed match for TriviaNode by the AST MainNode name.
I'd like to hear your opinion on this one.

So the problem was that when printing the trivia of the TypeDefnSig it also prints the trivia for the SynTypeDefnSimpleRepr.Union because they have exactly the range.

image

That is why as a test I further limited the allowed TriviaNodes by type name.

Potentially it can:

  • Speed up finding the correct trivia node as we further limit the set of options.

genTriviaFor ["TypeDefnSig"] range

  • Allow us to be sure that we only print the trivia once and that could lead that we don't need to clean up the already printed trivia.
    This could be beneficial for the benchmark.

If you think this has some value I can flesh out this PR a bit more.
Please let me know what you think.
Many thanks.

@nojaf nojaf requested a review from jindraivanek August 7, 2020 09:05
@nojaf nojaf linked an issue Aug 7, 2020 that may be closed by this pull request
3 tasks
@nojaf
Copy link
Contributor Author

nojaf commented Aug 8, 2020

So I've completed the exercise out of curiosity sake and it turns out this is not good for the performance:

| Method |    Mean |   Error |   StdDev | Rank |       Gen 0 |      Gen 1 |      Gen 2 | Allocated |
|------- |--------:|--------:|---------:|-----:|------------:|-----------:|-----------:|----------:|
| Format | 9.571 s | 3.266 s | 0.1790 s |    1 | 141000.0000 | 48000.0000 | 11000.0000 |    2.4 GB |

I was actually expecting that

let private findTriviaMainNodeFromNameAndRange nodes ((mainNodesNames, range): string list * range) =
    nodes
    |> List.tryFind(fun n ->
        match n.Type with
        | MainNode(mn) ->
            List.contains mn mainNodesNames && RangeHelpers.rangeEq n.Range range
        | _ -> false)

would have sped things up.

@nojaf
Copy link
Contributor Author

nojaf commented Aug 9, 2020

Splitting up the trivia main nodes by types upfront improves the situation yet nothing all that significant:

        let triviaByNodes =
            trivia
            |> List.choose (fun tn ->
                match tn.Type with
                | MainNode(mn) -> Some (mn, tn)
                | _ -> None)
            |> List.groupBy fst
            |> List.map (fun (k, g) -> k, List.map snd g)
            |> Map.ofList
| Method |    Mean |    Error |   StdDev | Rank |       Gen 0 |      Gen 1 |      Gen 2 | Allocated |
|------- |--------:|---------:|---------:|-----:|------------:|-----------:|-----------:|----------:|
| Format | 5.264 s | 0.0300 s | 0.0016 s |    1 | 103000.0000 | 36000.0000 | 11000.0000 |   2.37 GB |

@nojaf
Copy link
Contributor Author

nojaf commented Aug 9, 2020

Slowly getting there:

| Method |    Mean |    Error |   StdDev | Rank |       Gen 0 |      Gen 1 |      Gen 2 | Allocated |
|------- |--------:|---------:|---------:|-----:|------------:|-----------:|-----------:|----------:|
| Format | 4.625 s | 0.0928 s | 0.0051 s |    1 | 103000.0000 | 33000.0000 | 11000.0000 |   2.37 GB |

The more calls that can be replaced by specific main node or token calls the better it the numbers are.

@nojaf
Copy link
Contributor Author

nojaf commented Aug 9, 2020

Booya 🥰

| Method |    Mean |    Error |   StdDev | Rank |       Gen 0 |      Gen 1 |      Gen 2 | Allocated |
|------- |--------:|---------:|---------:|-----:|------------:|-----------:|-----------:|----------:|
| Format | 3.913 s | 0.8803 s | 0.0483 s |    1 | 149000.0000 | 55000.0000 | 20000.0000 |   2.38 GB |

@jindraivanek
Copy link
Contributor

I like the general idea. 👍

I think that with increased use of MainType we should replace MainType string with enum. Maybe use code from my abandoned experiment https://github.com/jindraivanek/fantomas/blob/6770151d604fe6d67bff0495dc2001ae1fe31813/src/Fantomas/FsAstTypes.fs ? Or at least use single case DU.

@nojaf
Copy link
Contributor Author

nojaf commented Aug 10, 2020

Good idea, that would speed things up. We could do the same thing for the by Token Map in Context as well.

 TriviaMainNodes: Map<string, TriviaNode list>
 TriviaTokenNodes: Map<string, TriviaNode list>

One thing I haven't mentioned here is that although the current tests pass, I did not take every possible scenario into account.
From time to time I was able to remove a genTrivia call completely and all the tests still pass so we need to be careful that we only call it when we really expect it.
Or to put it simply: we will gain some speed yet lose some correct results because we didn't have a test for them.
I'm pretty ok with this in a post v4 world, it will lead to some easy bugs and is worth the gain.

@nojaf nojaf marked this pull request as ready for review August 25, 2020 20:00
@nojaf nojaf changed the title Proof of concept to limit trivia by AST MainNode name. Limit trivia by AST MainNode name. Aug 28, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

FSI formatting does the wrong thing with comments on single-case DU
2 participants