Skip to content

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parallel type checking #11550

Closed
kerams opened this issue May 11, 2021 · 6 comments
Closed

Parallel type checking #11550

kerams opened this issue May 11, 2021 · 6 comments

Comments

@kerams
Copy link
Contributor

kerams commented May 11, 2021

The recent awesome work on parallel parsing and IL gen got me thinking whether it would be feasible to parallelize type checking as well.

Given F#'s significant file order, if the compiler can quickly establish that file B can't possibly reference anything in file A declared directly above it, can't A and B be typechecked (and optimized) simultaneously? This would require an initial traversal of the AST in every file in order to create a file dependency graph. Specifically, things like this need to be checked:

  • If B opens a module or class defined in A, B depends on A
  • If A contains an AutoOpen that has immediate effect in B (because of overlapping namespaces, regardless of actual contents), B depends on A
  • If B references something from A using a qualified name, B depends on A

I think these can be gleaned from the untyped syntax tree alone, the question is whether it would be fast enough to offset the overhead and whether typechecking has a significant cost. Apologies in advance if there's a gaping flaw in here somewhere :).

@smoothdeveloper
Copy link
Contributor

smoothdeveloper commented May 12, 2021

@kerams, would you mind paying a look at the recent comments I made in the broader/fuzzy discussion for incremental recompile (#8044)?

What I touch on my comments in #8044 is targeted at identifying if a file and a diff of signatures before it should engender a recompile of the file.

The same approach could be taken earlier, it could be on diff of AST that we could already identify opportunities to skip recompile of a file, or more likely, proceed in checking the next stage.

Why is it related to incremental recompilation?

Because incremental recompilation will require serialisation and heuristics on the serialized graphs, and making things like a graph and serialised, means they can conceptually be parallelised for read only usage 🙂 (I know I'm stretching everything a little bit).

Now, regarding your more precise issue, I have few things coming to mind:

Are you familiar with the NameResolution business? if there was a way to populate those after AST parsing, and do the checks that you are looking for, and serialise the outcome, it would benefit both some of the rough ideas I explore for incremental recompilation and as you say, gets us closer to the ability to type check in parallel, according to your more precise plot (the F# world domination, errr, compilation plan, I presume).

If NameResolution as we know (but me) is not good enough (because needs a first pass of type checking, or parsing of references), we could maybe have a lower one, that still targets few use cases for your parallel type checking graph splitter endeavours.

That would be the beginning of a NameResolution that enrich at each stages:

  • AST
  • assembly reference loading
  • later type checking
  • codegen (not sure if relevant, but maybe for relinking, in context of type provider SDK...)

It means invalidation of the serialised NameResolutions could be fine grained in terms of which context it impacts.

One big pitfall, at least, nest of edge cases, is shadowing and aliasing features of the language, and also stuff like AutoOpen and open static.

It sounds like "after assembly reference resolution" maybe the next stage if bare AST can't help much in sorting out what you are pointing at.

edit:

Apologies in advance if there's a gaping flaw in here somewhere :)

Don't worry, you are covered at least 10x by all mine :)

@TIHan
Copy link
Member

TIHan commented May 12, 2021

We have a PR that could enable this: #11152

The restriction is that it will only parallelize type checking impl files that have a backing signature file. Our hope in the future is that we will add better tooling to create these signature files automatically through a gesture in an editor.

Trying to parallelize type checking for impl files without sig files is a much more difficult undertaking. Sure, you could add heuristics like you suggested, but we would have to have a dependency graph of types and functions; which may be more expensive in of itself.

@kerams
Copy link
Contributor Author

kerams commented May 12, 2021

Yes, I'm specifically talking about no sig scenarios.

a dependency graph of types and functions;

Yes, that would be expensive. I was thinking keeping track of namespaces and modules alone could theoretically suffice. For instance, if we collect those for file A and see a related open in B, we can't parallelize. It does not matter what exactly is used (or if the open is indeed redundant). This heuristic would only guarantee what definitely isn't used, and would because of this specificity hopefully run very fast.

@smoothdeveloper, I'm afraid I don't know enough about those topics, or whether your approach, if viable, would supersede mine (perhaps they could even complement each other). My evening thoughts were very narrowly concerned about full compilation performance.

@kerams
Copy link
Contributor Author

kerams commented May 12, 2021

Here's a really silly idea. If I extract all namespaces and top level modules (a very shallow AST traversal would be enough) from file A and do a full text search for them in file B, will I not be perfectly sure that the files are independent if there are no hits (other than perhaps a shared top level namespace)? :)))))

I'll answer myself. The aforementioned auto opens would cause a problem, but that can be easily checked separately. Partially qualified opens are worse though.

@smoothdeveloper
Copy link
Contributor

It seems to me like a great bang for buck first pass at splitting out stuff just at the AST level, like having both a genius and captain obvious being one and single person bringing this :D

I'm curious what roadblocks would be hit by this heuristics to turn faulty in the wild.

@vasily-kirichenko, any opinion?

@cartermp
Copy link
Contributor

cartermp commented Jun 4, 2021

Will move this to a discussion

@cartermp cartermp closed this as completed Jun 4, 2021
@dotnet dotnet locked and limited conversation to collaborators Jun 4, 2021

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants