Modules support #1293
Comments
This includes |
(Note: speaking as a CMake developer here with a focus on how modules work with CMake; other build systems have similar challenges, but not necessarily the same ones)
Note that I suspect static analysis will need a lot more communication between build systems and tooling to know what is going on. The problem is that with explicit modules even
This isn't even counting the fact that the information presented there isn't directly usable, as any referenced BMI files will be in the host compiler's format, not something the tool necessarily understands. |
In the age we live in, a programming language is only as good as its runtimes/frameworks and, lastly, its editor support... I mean, who codes these days with just syntax highlighting? C++ modules are, from a user perspective, a very nice addition to the C++ syntax and really modernize it. However, they are being held back by editor support. AFAIK, C++ modules are only supported by Visual Studio (using MSVC ixx files) and CLion (using clang cppm files). If clangd supported modules, it would significantly improve editor support and, with that, their usability and popularity. As for compilers having their own flags/options/file extensions for modules, which is regrettable, just ignore all that and start by supporting clang's own formats. Although this is currently not critical functionality, I personally am hoping to see modules supported by clangd soon! |
from https://gitlab.kitware.com/cmake/cmake/-/issues/18355 |
It may be somewhat instructive to look at CLion's approach to supporting modules, which first appeared in the latest 2022.3 release: https://blog.jetbrains.com/clion/2022/11/clion-2022-3-released/#cpp_modules It's somewhat light on details but this part seems relevant:
|
This is not strictly correct since it is allowed to declare a module within a
Yeah. Since it doesn't mention many technical details, I guess we can't learn a lot from this blog. It only touches on the second requirement given by @sam-mccall, the one about implicit builds. I guess the first step doesn't need to be too good. I mean, it may be acceptable to ignore some requirements in the first step, including:
And what is good enough as a first step? I think it may be good if:
What do you think about this? @HighCommander4 |
Please don't use extensions to decide which files are modules. It is perfectly valid to have a .cpp file be a module interface. Please just use a scanner. A lot of work went into the standard wording to ensure that it is possible to very efficiently scan the whole codebase. I know some effort was started in clang-scan-deps for this. It would be great if clangd could reuse and possibly improve this code when doing its scan. |
Yeah, it is better and more clear. I've edited the post. |
After taking some time to look at the code, I have some lower-level ideas: It looks possible to support C++20 modules in
In my mind, it looks workable. The version is not locked and it is possible to work with GCC and MSVC. There are also some problems on the compiler side, e.g., the BMIs generated by clang are too big for now, which may break the cache. But I feel it is OK as a first step. There are also some higher-level issues: |
My perspective as a user is that yes, this level of support for modules would already be useful and valuable. I will defer to @sam-mccall for a maintainer's perspective of what would be accepted as a contribution. (I would imagine that it's fine to have various limitations especially if the support is behind an option/flag.) A few thoughts/questions:
|
Yes, when the compilation database is loaded, it calls this function with all of the entries in the database, which will call |
I am not quite sure that I've understood the idea of the indexer. My current idea is to let the TUScheduler hold the ModulesDependencyGraph. And when Now it roughly works for the a-b test case in llvm/llvm-project#58723. I feel like there is only one TUScheduler per project (is this true?), so the BMIs won't be rebuilt every time we open a new file. But I feel I missed some points since I didn't touch the indexer at all. I initially wanted to work in BackgroundIndex, but the log shows it is not reached after I open a new project even if I add |
I believe so.
Note, I was also thinking of persisting the BMIs between clangd invocations.
The indexer also triggers AST builds (for every file in the project that isn't already indexed), that happens around here. That AST build will encounter |
It makes sense. I feel like it is OK
My thought is that it works as long as we rewrite the command line to refer to the new BMIs we built. |
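To make the command-line rewriting idea concrete, here is a minimal sketch in Python. It assumes BMIs are passed with clang's `-fmodule-file=<module-name>=<path>` flag and that stale `-fmodule-file=` and `-fprebuilt-module-path=` arguments from the project's own build should be dropped. The function name and the example paths are hypothetical, and nothing here is taken from clangd's implementation.

```python
def rewrite_for_bmis(command: list[str], bmis: dict[str, str]) -> list[str]:
    """Return a copy of `command` that refers only to the BMIs in `bmis`.

    `bmis` maps a module name to the path of a BMI built by the tool itself.
    """
    rewritten = []
    for arg in command:
        # Drop references to BMIs produced by the project's own build; they
        # may come from a different compiler or compiler version.
        if arg.startswith(("-fmodule-file=", "-fprebuilt-module-path=")):
            continue
        rewritten.append(arg)
    # Name each freshly built BMI explicitly, in a stable order.
    for name in sorted(bmis):
        rewritten.append(f"-fmodule-file={name}={bmis[name]}")
    return rewritten
```

For example, a command containing `-fmodule-file=a=build/a.pcm` would come back with that argument removed and `-fmodule-file=a=/tmp/clangd/a.pcm` appended, if `/tmp/clangd/a.pcm` is where the tool wrote its own copy.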
I think that would work as part of the solution. Another part of the solution would be to ensure that the indexing of source files only starts after the BMIs have been built. Presumably, something should proactively drive the building of BMIs (not just when you open a source file that depends on it), so that the indexing of source files eventually gets unblocked. Maybe it makes sense to think of building the BMIs as the "first stage" of indexing, and indexing the source files as the "second stage"? |
Do you mean only indexing the source files after we built all the BMIs or after we finish the first stage (since it is possible that some BMI is not buildable)? Sounds like a good idea. It can ease the implementation. |
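The two-stage idea, including the case where some BMI cannot be built, can be sketched roughly as follows. This is only an illustration in Python under stated assumptions: `module_deps`, `sources` and `build_bmi` are hypothetical inputs, and stage two is reduced to computing which imports each source file could actually use.

```python
from graphlib import TopologicalSorter


def two_stage_index(module_deps, sources, build_bmi):
    """Stage 1 builds BMIs in dependency order; stage 2 plans source indexing.

    module_deps: module name -> set of module names it imports.
    sources:     source file -> set of module names it imports.
    build_bmi:   callable(name) -> bool, True if the BMI was built.
    """
    built, failed = set(), set()
    # Stage 1: static_order() yields every module after its imports.
    for name in TopologicalSorter(module_deps).static_order():
        deps = module_deps.get(name, set())
        # A module is unbuildable if any import failed or its own build fails.
        if deps & failed or not build_bmi(name):
            failed.add(name)
        else:
            built.add(name)
    # Stage 2 starts once stage 1 has finished, not once everything succeeded:
    # each source is indexed with whichever of its imports are available.
    usable = {src: set(imps) & built for src, imps in sources.items()}
    return built, usable
```

The design choice here is that "first stage finished" unblocks indexing even when some BMIs failed, so one broken module does not stall the rest of the project.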
A few random thoughts... (Full disclosure: I did some brainstorming about this a few months ago and couldn't come up with anything good, hopefully y'all are more creative & understand modules better!)
|
(note: I know nothing about clangd's internals so this comment is based on guesses based on observable behavior. Hopefully it isn't complete nonsense and if needed someone can translate into the correct terminology)
I assume clangd must already have an implementation of a proto-buildsystem internally. For example, it can schedule indexing tasks with a controlled amount of parallelism and trigger reindexing of downstream TUs when a header file changes. Ideally it also notices when files change outside the editor (eg git) or when the editor is closed and is able to do incremental reindexing where needed (although I've experienced issues around this in the past, less so recently).
I think part of the solution is to double down on having an integrated build system. Either continue improving the integrated one, or switch to something like llbuild. The key thing that I don't think was required before was task dependencies. While you did have source/header file -> TU indexer job dependencies, they merely triggered rerunning the task. Now you will need some indexer tasks to have an input dependency on the BMI which will be generated by another task. And it is important to rerun the downstream indexers whenever an upstream BMI is meaningfully changed (defining "meaningfully" can be tricky; a first pass can just treat any content difference as meaningful, but I suspect that can be improved on).
Additionally you cannot begin doing (much of) anything prior to scanning to determine module names and import requirements. And of course, when a file changes, it must be rescanned because it may have added an import, or changed which module/partition it declares itself to be part of. Files can also go from not generating a BMI to generating one eg if you add a
I don't know how easy or hard it would be to fit the concept of task dependencies into the existing proto-buildsystem, but at least from an outsider's perspective, that seems like the best (if not only) option.
Or maybe I've just spent too much time working on buildsystems, so they feel like the solution to every problem now... 😄 |
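As a rough illustration of the task-dependency point above, the sketch below reruns downstream tasks only when an upstream BMI's content actually changes, using the "any content difference is meaningful" first pass. It is written in Python with hypothetical names (`BmiTaskGraph`, `build`); it is not how clangd or llbuild work, and for simplicity it may rebuild a unit more than once in a diamond-shaped graph.

```python
import hashlib


class BmiTaskGraph:
    """Toy incremental runner: a unit's dependents rerun only if its BMI changed."""

    def __init__(self, imports, build):
        self.imports = imports   # unit -> set of units whose BMIs it needs
        self.build = build       # callable(unit) -> BMI content as bytes
        self.digests = {}        # unit -> digest of the last BMI we produced

    def _dependents(self, unit):
        return sorted(u for u, deps in self.imports.items() if unit in deps)

    def rebuild(self, unit):
        """Rebuild `unit` and return every unit that was rebuilt, in order."""
        digest = hashlib.sha256(self.build(unit)).hexdigest()
        rebuilt = [unit]
        if self.digests.get(unit) == digest:
            return rebuilt       # same BMI bytes: nothing downstream is stale
        self.digests[unit] = digest
        for dependent in self._dependents(unit):
            rebuilt += self.rebuild(dependent)
        return rebuilt
```

With a chain `base <- mid <- app`, touching `base` without changing its BMI rebuilds only `base`; changing its BMI rebuilds `mid`, and `app` is rebuilt only if `mid`'s BMI changes in turn.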
(sorry, hit the wrong button)
The "lite mode" ASTs (skip-function-bodies etc) may not be suitable for indexing, and vice versa (even opportunistically: full ASTs are much larger => slower to load). I believe we can't mix the two due to diamond dependency problems. They may also have conflicting requirements for how to deal with staleness. If these can be solved, it would be fantastic.
A very simple one for background indexing (no dependencies, as you note). However very deliberately no core functionality depends on this: the index is entirely optional and all features work (though degraded) if it's stale. If we're going to have a build system we block on (and modules ~requires it) then it needs to be something new. It's going to be responsible for the critical path to user-facing latency, as opposed to background index's "process an infinite amount of non-critical batch work". |
I'm not sure how bad it really is. A lot of work went into putting restrictions into the C++20 specification to ensure that it will be possible to write a very fast scanner. It should be possible to scan at or near disk speeds on a single core (maybe not as fast as a high-end NVMe, but certainly several hundred MB/s per core should be achievable). So I think this should scale up to any meaningful definition of "project". This still won't scale up to a 1TB mega-monorepo where you never check out the whole repo at once, and certainly never build it all locally. So you will need an alternative for them (luckily most of them already are willing to invest in custom tooling), but scanning all project files should still be fine for the vast majority of clangd users. I think (admittedly, without evidence yet) a reasonable rule of thumb would be that anything small enough to have a single compile_commands.json file (which implies parsing a single json file with the full command line for every TU) should be able to scan all files within it. @Bigcheese might be able to comment more authoritatively on the performance of clang-scan-deps specifically.
That said, you don't have to eagerly do a full scan on everything. You can lazily determine the import DAG while indexing or loading the AST of the file being edited. The only thing you really need to know upfront is the mapping from module/partition name -> source file. While not 100% spec compliant[1], it is probably acceptable to do a "preflight scan" and search all files with the regex
[1] Ideally line splicing wouldn't be legal inside of a keyword or directive. But I think anyone who splits
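A "preflight scan" of this kind might look like the Python sketch below. The regular expression is my own approximation and is not the one from the comment, which did not survive in this copy of the thread. It deliberately ignores line splicing, comments, macros and attributes on the module declaration, so it shares the "not 100% spec compliant" caveat. The file names in the usage are hypothetical.

```python
import re

# Approximate match for a module declaration at the start of a line:
#   [export] module name[:partition];
# `module;` (global module fragment) and `module :private;` do not match
# because a module name is required after the keyword.
MODULE_DECL = re.compile(
    r"^[ \t]*(export[ \t]+)?module[ \t]+([A-Za-z_][\w.]*)"
    r"(?:[ \t]*:[ \t]*([A-Za-z_][\w.]*))?[ \t]*;",
    re.MULTILINE,
)


def preflight_scan(sources: dict[str, str]) -> dict[str, str]:
    """Map module/partition name -> source file for units that produce a BMI.

    `sources` maps a file path to its text.
    """
    mapping = {}
    for path, text in sources.items():
        match = MODULE_DECL.search(text)
        if match is None:
            continue  # not a module unit
        exported, name, partition = match.groups()
        if not exported and not partition:
            continue  # plain implementation unit: nothing can import it
        key = f"{name}:{partition}" if partition else name
        mapping.setdefault(key, path)
    return mapping
```

A real scanner such as clang-scan-deps handles the cases this sketch skips; the point is only that the name-to-file map can be built without parsing full translation units.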
I think "~requires" is a good way to put it. The standard is very carefully worded to avoid placing restrictions on the implementation details. So a build system isn't technically required, and neither are BMIs. In theory, an implementation could parse each TU by simply starting at the top of the file and handling But having a build system (shorthand for any DAG-walking parallel task runner and incremental re-runner, not restricted to the traditional concept of a build system) and building some sort of BMI (a reusable cache of a module (unit)'s interface) is definitely what the design of C++ modules is optimized for. Hopefully it will result in significantly faster (re)indexing, and open file loading/parsing. Once you have that in place, about the only time I could imagine parsing modules directly from source being useful is when you cold-open a project and want to minimize the latency to first parse of a single open file. But even then, I'm not sure it will actually have lower latency than telling the build system to pause what it is doing and put all resources into building the BMIs for the transitive deps of the current file. Consider that it will be much simpler to parallelize the parsing of independent nodes in the DAG if they are separate tasks, so given enough cores (and core counts are more likely to go up than down over time!) your latency will be proportional to the depth of the DAG from your source file rather than its total size. And this becomes even more of a win if there are multiple open files (eg reopening a project with the same editor views as last time, or just restarting clangd) since they can share the benefit of parsing common deps only once.
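The claim that latency tracks the depth of the DAG rather than its total size can be checked on a toy graph. The Python sketch below assumes every module unit takes one unit of time to build and that enough cores are available; `closure_and_depth` and the example graph are hypothetical.

```python
def closure_and_depth(imports, root):
    """Return (number of units in root's transitive closure, longest chain length).

    imports: unit -> iterable of units it imports (assumed acyclic, as the
    standard requires for module imports).
    """
    depth_of = {}

    def depth(unit):
        if unit not in depth_of:
            deps = imports.get(unit, ())
            # A unit can start once its deepest import has finished.
            depth_of[unit] = 1 + max((depth(d) for d in deps), default=0)
        return depth_of[unit]

    longest = depth(root)
    return len(depth_of), longest
```

For `main` importing `a`, `b` and `c`, which each import `base`, the closure has 5 units but the longest chain is 3: a serial parse pays for 5 steps, while a parallel build of separate tasks pays for roughly 3.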
I'm not a clang dev, but from hallway conversations my understanding is that all compilers are planning to have as close [edit: as possible] to an O(1) loading cost for BMIs. So the cost of adding an import to a file should be as close to zero as possible, and you will only have to pay for loading the parts of the BMI that you actually use, even if the BMI is huge. I think this will involve both carefully designed on-disk data structures and using mmap to load the BMI and use those data structures in-place. So I wouldn't worry too much about this, at least not yet. Of course I could see a concern about the cost to generate the full vs partial BMI contributing to file-open latency, but that is a separate issue. While it may make sense to generate a partial BMI if a full one isn't available yet, it probably isn't worth generating a partial one to save loading time when a full BMI is ready to use. Either way, none of that is on the critical engineering path to getting minimally working module support into clangd, so it may be better to start with only full BMIs and add partials as a possible optimization later. I assume anyone using modules would prefer to get it working sooner, even if it is slower than ideal. |
I feel like the second point is good. (Bazel knows nothing about C++20 named modules for now.) And it should be good to implement 1 first.
I mean modules whose source codes are not in the tree of the project. For example, we'll have
My thought is that clangd won't need to build such modules. It just falls back to the compiler to consume them, and it's the responsibility of users to make sure such modules are usable. I am not opposed to having clangd build as many modules as possible. I just feel it is not easy to find the corresponding source code when it is out of tree.
C++20 named modules are significantly different from clang header modules and header units: a named module unit is a TU by itself, while clang header modules and header units have header semantics. So I imagine it may be OK for clangd to treat them as headers. Or at least I think it is OK to consider C++20 named modules first, since they should be the most complex case.
All of the three are required by C++20 named modules.
Yeah, now we have consensus to scan the whole project at first.
I am not sure if it will really be faster. Let's revisit this once we find that full scanning is a bottleneck.
Clang doesn't support this mode now. It looks like implementing it takes some extra work, especially to preserve the modules semantics. I feel like it is the longer path.
Currently, it is not strictly
Strongly agreed : ) |
IIRC, we decided that line splicing is not allowed on
All known (to me) clang module deployments assume a highly consistent build environment. General C++ cannot assume that and must handle the "build a BMI per usage site" worst-case (collapsing compatible ones down is possible, but non-trivial to detect).
The "at cmake time" part is highly unlikely to be possible (in general). The [1] Consider where such a "build a complete |
My limited understanding of BMIs is that they're similar to PCHs (which is also what our preambles use), whose deserialization is done more or less on demand (e.g. when iterating all
While so far we have been discussing an "MVP" where clangd builds all the modules itself, in the longer term I'm hoping that we will be able to let users opt into a mode of operation where BMIs are shared with the project's build (which would require the build compiler to be clang and clangd to be version-locked to it, and possibly some other restrictions such as the user not using |
Update: I sent a patch (https://reviews.llvm.org/D153114) for review. See the review page for details. Everyone here is welcome to comment. |
Update: https://reviews.llvm.org/D153114 is abandoned, but the review feedback was pretty helpful. See llvm/llvm-project#66462 for the latest progress. |
Alternatives to https://reviews.llvm.org/D153114. Try to address clangd/clangd#1293. See the links for design ideas. We want to have some initial support in clang18.

This is the initial support for C++20 Modules in clangd. As suggested by sammccall in https://reviews.llvm.org/D153114, we should minimize the scope of the initial patch to make it easier to review and understand so that everyone is on the same page:

> Don't attempt any cross-file or cross-version coordination: i.e. don't
> try to reuse BMIs between different files, don't try to reuse BMIs
> between (preamble) reparses of the same file, don't try to persist the
> module graph. Instead, when building a preamble, synchronously scan
> for the module graph, build the required PCMs on the single preamble
> thread with filenames private to that preamble, and then proceed to
> build the preamble.

And this patch reflects the above opinions.
#66462) Alternatives to https://reviews.llvm.org/D153114. Try to address clangd/clangd#1293. See the links for design ideas and the consensus so far. We want to have some initial support in clang18.

This is the initial support for C++20 Modules in clangd. As suggested by sammccall in https://reviews.llvm.org/D153114, we should minimize the scope of the initial patch to make it easier to review and understand so that everyone is on the same page:

> Don't attempt any cross-file or cross-version coordination: i.e. don't
> try to reuse BMIs between different files, don't try to reuse BMIs
> between (preamble) reparses of the same file, don't try to persist the
> module graph. Instead, when building a preamble, synchronously scan
> for the module graph, build the required PCMs on the single preamble
> thread with filenames private to that preamble, and then proceed to
> build the preamble.

This patch reflects the above opinions.

# Testing in real-world project

I tested this with a modularized library: https://github.com/alibaba/async_simple/tree/CXX20Modules. This library has 3 modules (async_simple, std and asio) and 65 module units. (Note that a module can consist of multiple module units.) Both the `std` module and the `asio` module have 100k+ lines of code (maybe more, I didn't count), and async_simple itself has 8k lines of code. This is the scale of the project.

The result shows that it works pretty well, ..., well, except that I need to wait roughly 10s after opening/editing any file. This falls within our expectations. We know it is hard to make it perfect in the first move.

# What this patch does in detail

- Introduced an option `--experimental-modules-support` for the support for C++20 Modules, so that no matter how bad this is, it won't affect current users. For the rest of this description, we assume the option is enabled.
- Introduced two classes, `ModuleFilesInfo` and `ModuleDependencyScanner`. For now `ModuleDependencyScanner` is only used by `ModuleFilesInfo`.
- The class `ModuleFilesInfo` records the built module files for a specific source file. The module files can only be built by the static member function `ModuleFilesInfo::buildModuleFilesInfoFor(PathRef File, ...)`.
- The class `PreambleData` adds a new member variable of type `ModuleFilesInfo`. This refers to the module files needed by the current file. It means the module files info is part of the preamble, which was suggested in the first patch too.
- In `isPreambleCompatible()`, we add a call to `ModuleFilesInfo::CanReuse()` to check if the built module files are still up to date.
- When we build the AST for a source file, we load the built module files from ModuleFilesInfo.

# What we need to do next

Let's split the TODOs into a clang part and a clangd part to make things clearer.

The TODOs in the clangd part include:

1. Enable reusing module files across source files. This may require us to introduce a ModulesManager-like component that handles `scheduling`, `the possibility of BMI version conflicts` and `various events that can invalidate the module graph`.
2. Get a more efficient method to get the `<module-name> -> <module-unit-source>` map. Currently we always scan the whole project during `ModuleFilesInfo::buildModuleFilesInfoFor(PathRef File, ...)`. This is clearly inefficient even if the scanning process is pretty fast. I think the potential solutions include:
    - Make a global scanner to monitor the state of every source file like I did in the first patch. The pain point is that we need to take care of the data races.
    - Ask the build systems to provide the map just like we ask them to provide the compilation database.
3. Persist the module files, so that we can reuse module files across clangd invocations or even across clangd instances.

TODOs in the clang part include:

1. Clang should offer an option/mode to skip writing/reading the bodies of the functions. Or perhaps we can even require the parser to skip parsing the function bodies.

And it looks like we can say the support for C++20 Modules is initially workable after we made (1) and (2) (or even without (2)).
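The per-file bookkeeping described in this patch summary (module files recorded for one source file, with names private to it, plus an "are these still up to date?" check) can be pictured with the Python sketch below. It is a hypothetical stand-in, not the clangd code: the class name, the content-hash staleness check and the `<owner>.<module>.pcm` naming are all assumptions made for illustration, and the actual building of module files is omitted.

```python
import hashlib


def _digest(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class ModuleFilesRecord:
    """Hypothetical record of the module files built for one source file."""

    def __init__(self, owner, module_sources, read):
        # module_sources: module name -> path of the module unit's source.
        # read: callable(path) -> current text of that file.
        self.owner = owner
        self.module_sources = dict(module_sources)
        # Remember what each module unit looked like when its BMI was built.
        self.snapshot = {p: _digest(read(p)) for p in self.module_sources.values()}
        # File names private to this record, so nothing is shared across files.
        self.bmi_paths = {name: f"{owner}.{name}.pcm" for name in self.module_sources}

    def can_reuse(self, read) -> bool:
        """True if no module unit source changed since the record was made."""
        return all(_digest(read(p)) == d for p, d in self.snapshot.items())
```

In this picture, a preamble is considered compatible only while `can_reuse` holds; once any imported module unit changes, the record is discarded and rebuilt along with the preamble.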
#66462) Summary: Alternatives to https://reviews.llvm.org/D153114. Try to address clangd/clangd#1293. See the links for design ideas and the consensus so far. We want to have some initial support in clang18. This is the initial support for C++20 Modules in clangd. As suggested by sammccall in https://reviews.llvm.org/D153114, we should minimize the scope of the initial patch to make it easier to review and understand so that every one are in the same page: > Don't attempt any cross-file or cross-version coordination: i.e. don't > try to reuse BMIs between different files, don't try to reuse BMIs > between (preamble) reparses of the same file, don't try to persist the > module graph. Instead, when building a preamble, synchronously scan > for the module graph, build the required PCMs on the single preamble > thread with filenames private to that preamble, and then proceed to > build the preamble. This patch reflects the above opinions. # Testing in real-world project I tested this with a modularized library: https://github.com/alibaba/async_simple/tree/CXX20Modules. This library has 3 modules (async_simple, std and asio) and 65 module units. (Note that a module consists of multiple module units). Both `std` module and `asio` module have 100k+ lines of code (maybe more, I didn't count). And async_simple itself has 8k lines of code. This is the scale of the project. The result shows that it works pretty well, ..., well, except I need to wait roughly 10s after opening/editing any file. And this falls in our expectations. We know it is hard to make it perfect in the first move. # What this patch does in detail - Introduced an option `--experimental-modules-support` for the support for C++20 Modules. So that no matter how bad this is, it wouldn't affect current users. Following off the page, we'll assume the option is enabled. - Introduced two classes `ModuleFilesInfo` and `ModuleDependencyScanner`. Now `ModuleDependencyScanner` is only used by `ModuleFilesInfo`. 
- The class `ModuleFilesInfo` records the built module files for a specific single source file. The module files can only be built by the static member function `ModuleFilesInfo::buildModuleFilesInfoFor(PathRef File, ...)`.
- The class `PreambleData` adds a new member variable of type `ModuleFilesInfo`. This refers to the module files needed by the current file. It means the module files info is part of the preamble, which is suggested in the first patch too.
- In `isPreambleCompatible()`, we add a call to `ModuleFilesInfo::CanReuse()` to check whether the built module files are still up to date.
- When we build the AST for a source file, we load the built module files from `ModuleFilesInfo`.

# What we need to do next

Let's split the TODOs into a clang part and a clangd part to make things clearer.

The TODOs in the clangd part include:

1. Enable reusing module files across source files. This may require us to introduce a ModulesManager-like component which needs to handle `scheduling`, `the possibility of BMI version conflicts` and `various events that can invalidate the module graph`.
2. Get a more efficient method to get the `<module-name> -> <module-unit-source>` map. Currently we always scan the whole project during `ModuleFilesInfo::buildModuleFilesInfoFor(PathRef File, ...)`. This is clearly inefficient even if the scanning process is pretty fast. I think the potential solutions include:
    - Make a global scanner to monitor the state of every source file, like I did in the first patch. The pain point is that we need to take care of data races.
    - Ask the build systems to provide the map, just like we ask them to provide the compilation database.
3. Persist the module files, so that we can reuse module files across clangd invocations or even across clangd instances.

TODOs in the clang part include:

1. Clang should offer an option/mode to skip writing/reading the bodies of functions. Or we could even require the parser to skip parsing the function bodies.
It looks like we can say the support for C++20 Modules is initially workable once (1) and (2) are done (or even without (2)).

Differential Revision: https://phabricator.intern.facebook.com/D60250983
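To make the `<module-name> -> <module-unit-source>` map from the clangd TODO list more concrete, here is a minimal sketch of what such a map could look like. It is not the patch's implementation: the helper names `exportedModuleName` and `buildModuleMap` are hypothetical, the input is an in-memory list of (path, source text) pairs, and a naive regex stands in for a real dependency scanner, so comments, macros, and preprocessor conditionals are not handled.

```cpp
#include <map>
#include <optional>
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not a clangd API): returns the name declared by an
// `export module <name>;` line in Source, if there is one.
std::optional<std::string> exportedModuleName(const std::string &Source) {
  // Naive on purpose: matches a declaration at the start of a line and
  // allows an optional `:partition` suffix. `module;` and `import x;`
  // do not match.
  static const std::regex Decl(
      R"re((?:^|\n)[ \t]*export[ \t]+module[ \t]+([A-Za-z_][A-Za-z0-9_.]*(?::[A-Za-z_][A-Za-z0-9_.]*)?)[ \t]*;)re");
  std::smatch M;
  if (std::regex_search(Source, M, Decl))
    return M[1].str();
  return std::nullopt;
}

// Hypothetical helper: builds the module-name -> source-path map from a list
// of (path, source text) pairs. If two files export the same name, the first
// one seen is kept.
std::map<std::string, std::string> buildModuleMap(
    const std::vector<std::pair<std::string, std::string>> &Files) {
  std::map<std::string, std::string> Map;
  for (const auto &[Path, Source] : Files)
    if (auto Name = exportedModuleName(Source))
      Map.emplace(*Name, Path);
  return Map;
}
```

Rebuilding a map like this on every preamble build is the "scan the whole project" cost described in TODO 2. A global scanner or a build-system-provided map would replace the loop in `buildModuleMap` with a lookup.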
It would be nice if some of {C++20 modules, clang header modules} worked in clangd.
Essentially no configurations of this work today: occasionally things may happen to work, but there are many reported problems and crashes. These are not just bugs that can be fixed; a design and new infrastructure are needed.
Because clang's module functionality can be enabled by setting driver flags, it is easy for people to end up in this broken and unsupportable state and waste (their + our) time debugging it. We should consider failing early and explicitly instead.
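As an illustration of what "failing early" could mean, the sketch below scans a compile command for flags that turn on clang's module machinery. The function `firstModulesFlag` is hypothetical and is not an existing clangd function. The flag list is an assumption and is not exhaustive. Note that C++20 named modules can also be used with plain `-std=c++20`, which a flag check like this cannot detect.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical check (not part of clangd): returns the first argument that
// appears to enable clang modules, or std::nullopt if none is found.
std::optional<std::string>
firstModulesFlag(const std::vector<std::string> &Args) {
  // Assumed, non-exhaustive lists of clang driver flags related to modules.
  static const std::vector<std::string> Exact = {"-fmodules", "-fcxx-modules",
                                                 "-fmodules-ts"};
  static const std::vector<std::string> Prefixes = {"-fmodule-file=",
                                                    "-fprebuilt-module-path="};
  for (const auto &Arg : Args) {
    for (const auto &Flag : Exact)
      if (Arg == Flag)
        return Arg;
    for (const auto &Prefix : Prefixes)
      if (Arg.compare(0, Prefix.size(), Prefix) == 0)
        return Arg;
  }
  return std::nullopt;
}
```

A caller could use the returned flag to report a clear "modules are not supported" diagnostic up front instead of attempting a parse that may crash.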
AFAIK nobody has plans/availability to work on this soon.
Some issues that need to be addressed: