Modules support #1293

Open
sam-mccall opened this issue Sep 15, 2022 · 28 comments
Labels
enhancement New feature or request

Comments

@sam-mccall
Member

sam-mccall commented Sep 15, 2022

It would be nice if some of {C++20 modules, clang header modules} worked in clangd.

Essentially no configuration of this works today: occasionally things may happen to work, but there are many reported problems and crashes. These are not just bugs that can be fixed; a design and new infrastructure are needed.

Because clang's module functionality can be enabled by setting driver flags, it is easy for people to end up in this broken and unsupportable state and to waste (their + our) time debugging it. We should consider failing early and explicitly instead.

AFAIK nobody has plans/availability to work on this soon.

Some issues that need to be addressed:

  • which subset of clang's many module features/flags are in scope?
  • explicit modules provided by the build system: we can't assume these are compatible (they may come from a different compiler version), so we may need to convert to implicitly building the modules we need
  • having AST workers/background threads spawning module builds independently and coordinating through the module cache is probably not good; we'll want some kind of worker pool and coordination
  • we presumably want to parse modules in the same lite mode as preambles (e.g. skip-function-bodies)
  • gcc modules flags are different from clang modules flags (and MSVC presumably different again), we need to parse them
  • how do modules fit into the dynamic index? what does a preamble with modules in it look like?
  • scattered places where we assume mainfile/preamble dichotomy, reason about #include structure, etc
  • actual handling of import/export AST nodes
@HighCommander4

Some other open bugs we have on file related to modules are #738, #797, #1105.

@Trass3r

Trass3r commented Nov 19, 2022

This includes -fmodules? What is the current state?
With an LLVM workspace configured with LLVM_ENABLE_LIBCXX=ON LLVM_ENABLE_MODULES=ON, clangd doesn't crash or anything, but it produces lots of squiggles while the build itself runs through just fine.
Shouldn't clangd rather ignore those arguments if it isn't able to handle them properly? (-fmodules -fmodules-cache-path=... -Xclang -fmodules-local-submodule-visibility -gmodules)

@mathstuf

mathstuf commented Dec 1, 2022

(Note: speaking as a CMake developer here with a focus on how modules work with CMake; other build systems have similar challenges, but not necessarily the same ones)

gcc modules flags are different than clang modules flags (and MSVC presumably different again), we need to parse them

Note that gcc's module flags may indicate an active participant if -fmodule-mapper= points to a socket or process execution. That may not be emulatable (I believe build2 does this, but CMake uses static files written out during the build).

I suspect that for static analysis, a lot more communication between build systems and tooling is needed to know what is going on. The problem is that with explicit modules even compile_commands.json is not useful until a build happens because:

  • for gcc, the -fmodule-mapper= argument is generated at build
  • for clang and msvc, modules are communicated via @rspfile arguments that don't exist until the build is performed

This isn't even counting the fact that the information presented there isn't directly usable, as any referenced BMI files will be in the host compiler's format, not something the tool necessarily understands.

@MK-Alias

MK-Alias commented Dec 2, 2022

In the age we live in, a programming language is only as good as its runtimes/frameworks and, lastly, its editor support... I mean, who codes these days with just syntax highlighting? C++ modules are - from a user perspective - a very nice addition to the C++ syntax and really modernize it. However, the feature is really being held back by editor support.

AFAIK: C++ modules are only supported by Visual Studio (using MSVC ixx files) and CLion (using clang cppm files). If clangd supported them, it would significantly improve editor support and, with that, their usability/popularity.

As for compilers having their own flags/options/file-extensions for modules - which is regrettable - just ignore all that and start by supporting clang's own formats.

Although this is - currently - not a critical functionality, I personally am hoping to see modules supported by clangd soon!

@sdykae

sdykae commented Dec 29, 2022

from https://gitlab.kitware.com/cmake/cmake/-/issues/18355
https://github.com/mathstuf/llvm-project/tree/p1689r5-csd
Waiting for a stable LLVM PR, and clangd support, to start developing with a VSCode extension.

@HighCommander4

It may be somewhat instructive to look at CLion's approach to supporting modules, which first appeared in the latest 2022.3 release: https://blog.jetbrains.com/clion/2022/11/clion-2022-3-released/#cpp_modules

It's somewhat light on details but this part seems relevant:

CLion collects module information from all .ixx, .cppm, and .mxx files in the project and doesn’t rely on the specific approach you use to build modules. Whether it’s CMake with the Visual Studio C++ toolchain or CMake with Clang and specific compilation flags, CLion provides a similar experience

@ChuanqiXu9

ChuanqiXu9 commented May 31, 2023

CLion collects module information from all .ixx, .cppm, and .mxx files in the project and doesn’t rely on the specific approach you use to build modules.

This is not strictly correct, since it is allowed to declare a module within a .cpp file. But it should be acceptable as a first step.

It may be somewhat instructive to look at CLion's approach to supporting modules, which first appeared in the latest 2022.3 release: https://blog.jetbrains.com/clion/2022/11/clion-2022-3-released/#cpp_modules

Yeah. It doesn't mention many technical details, though, so I guess we can't learn a lot from this blog. And it only touches on the second point given by @sam-mccall, about implicitly building modules.


I guess the first step doesn't have to be too ambitious. I mean, maybe it is acceptable to ignore some requirements in the first step, including:

  • Compatibility with GCC and MSVC.
  • Indexing things.
  • lite mode things (e.g. skip-function-bodies)
  • #include structures

And what would be good as the first step? I think it may be good if:

  • We can collect the potential module files by clang-scan-deps in the project tree (a possible invocation is sketched after this list).
  • We can build these module files implicitly in order.
  • We can get the AST we just built.
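
For reference, something like the following clang-scan-deps invocation should give us the provides/requires information per TU. This is just to illustrate the kind of output we would consume; exact flag spellings may vary between clang versions:

clang-scan-deps -format=p1689 -compilation-database=compile_commands.json

The P1689-style output maps each TU to the module it provides and the modules it requires, which is exactly what we need to order the implicit builds.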

What do you think about this? @HighCommander4

@RedBeard0531

We can collect the potential module files (by its suffixes) in the project tree.

Please don't use extensions to decide which files are modules. It is perfectly valid to have a .cpp file be a module interface. Please just use a scanner. A lot of work went into the standard wording to ensure that it is possible to very efficiently scan the whole codebase. I know some effort was started in clang-scan-deps for this. It would be great if clangd could reuse and possibly improve this code when doing its scan.

@ChuanqiXu9

We can collect the potential module files (by its suffixes) in the project tree.

Please don't use extensions to decide which files are modules. It is perfectly valid to have a .cpp file be a module interface. Please just use a scanner. A lot of work went into the standard wording to ensure that it is possible to very efficiently scan the whole codebase. I know some effort was started in clang-scan-deps for this. It would be great if clangd could reuse and possibly improve this code when doing its scan.

Yeah, that is better and clearer. I've edited the post.

@ChuanqiXu9

After taking some time to look at the code, I have some lower-level ideas: it looks possible to support C++20 modules in BackgroundIndex.

  1. We need to introduce a new class to describe the dependency information between named modules. It looks OK to make the class visible only to BackgroundIndex. We can use clang-scan-deps to initialize the new class.
  2. Then in BackgroundIndex::changedFilesTask, if the file is a module interface unit, we try to generate the module interface into a special cache maintained by clangd before we actually index the file. This requires clangd to rewrite the command line to emit the BMI somewhere else. It is also helpful if we want to emit a different kind of BMI someday.
  3. When we index source files, we need to rewrite the command line to look up the BMIs we generated in step 2. This is necessary to avoid breaking the normal build. (A rough sketch of this rewriting follows below.)
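
To make the idea concrete, here is a rough sketch of the data and the command-line rewriting I have in mind. All names here (ModuleDependencyInfo, rewriteCommandForClangdCache, ...) are hypothetical, not existing clangd APIs:

#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical: which source file provides each named module, and which
// modules each source file imports (filled in from clang-scan-deps output).
struct ModuleDependencyInfo {
  std::unordered_map<std::string, std::string> ProviderOf;            // module name -> interface unit
  std::unordered_map<std::string, std::vector<std::string>> Imports;  // source file -> imported modules
};

// Hypothetical: point the compile command at BMIs in a clangd-private cache
// instead of the build directory, so we never touch or depend on the real build.
void rewriteCommandForClangdCache(std::vector<std::string> &Args,
                                  const std::string &CacheDir,
                                  const ModuleDependencyInfo &Deps,
                                  const std::string &SourceFile) {
  auto It = Deps.Imports.find(SourceFile);
  if (It == Deps.Imports.end())
    return;
  for (const std::string &Name : It->second)
    Args.push_back("-fmodule-file=" + Name + "=" + CacheDir + "/" + Name + ".pcm");
}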

In my mind this looks workable. It is not locked to a compiler version, so it can also work for projects built with GCC and MSVC. There are also some problems on the compiler side, e.g. the size of the BMI generated by clang is quite big now, which might strain the cache. But I feel it is OK as a first step.

There are also some higher-level issues:
(1) When the clangd server initializes, will BackgroundIndex::changedFilesTask get called? From the name it looks like it won't be, but I can't find other APIs.
(2) The idea above requires us to always scan all sources up front, which sounds a little scary since it just wastes time for non-module projects. The only mitigation I have now is to add an option, disabled by default, which could be enabled by default someday if modules become popular.

@HighCommander4

And what would be good as the first step? I think it may be good if:

  • We can collect the potential module files by clang-scan-deps in the project tree.

  • We can build these module files implicitly in order.

  • We can get the AST we just built.

What do you think about this? @HighCommander4

My perspective as a user is that yes, this level of support for modules would already be useful and valuable. I will defer to @sam-mccall for a maintainer's perspective of what would be accepted as a contribution. (I would imagine that it's fine to have various limitations especially if the support is behind an option/flag.)

A few thoughts/questions:

  • To make the performance acceptable, we'll want to persist BMIs rather than rebuilding them each time we encounter a module. (I think all previous comments assume this, I'm just making it explicit.)
  • To avoid stepping on the project's actual build (which may use a different compiler or a different version of clang), we should persist our BMIs in a clangd-specific place (such as .cache/clangd), like you said in the previous comment.
  • Is there an opportunity to reuse this persisted BMI cache between the indexer and the preambles of open files? When you open a source file that imports a module M, the BMI for M may or may not have already been built by the indexer. If it hasn't, I imagine the AST (preamble) build should build it on the spot (and then the indexer won't need to build it again when it gets around to indexing a source file that imports M), but if the indexer has already built M's BMI then I imagine the preamble build can just read it. (A sketch of one possible cache-path scheme follows this list.)
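
For illustration, a minimal sketch of how such a clangd-private BMI path could be derived so that both the indexer and preamble builds can find the same artifact. The names and the hashing scheme are placeholders, not a proposal for the real layout:

#include <cstddef>
#include <functional>
#include <string>

// Hypothetical: key the cached BMI on the module name plus whatever flags
// affect the BMI contents, and keep it under the project's .cache/clangd.
std::string bmiCachePath(const std::string &ProjectCacheDir, // e.g. <project>/.cache/clangd
                         const std::string &ModuleName,
                         const std::string &RelevantFlags) {
  std::size_t Key = std::hash<std::string>{}(ModuleName + "|" + RelevantFlags);
  return ProjectCacheDir + "/modules/" + ModuleName + "." + std::to_string(Key) + ".pcm";
}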

@HighCommander4

(1) When the clangd server initializes, will BackgroundIndex::changedFilesTask get called?

Yes, when the compilation database is loaded, it calls this function with all of the entries in the database, which will call changedFilesTask().

@ChuanqiXu9

Is there an opportunity to reuse this persisted BMI cache between the indexer and the preambles of open files?

I am not quite sure I've understood the idea about the indexer. My current thinking is to let the TUScheduler hold the ModulesDependencyGraph. When TUScheduler::update is called, we check whether the file has any module dependencies; if it does and the BMIs don't exist yet, we don't call ASTWorker::update immediately, but instead assign a task to an AsyncRunner to build the BMIs and call ASTWorker::update afterwards.
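
Roughly, the flow I'm imagining looks like the sketch below. The helper functions are hypothetical stand-ins for the pieces mentioned above (the dependency graph, the async runner, the normal ASTWorker path), not real clangd APIs:

#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-ins, declared only to make the sketch self-contained.
std::vector<std::string> missingBMIsFor(const std::string &File); // imported modules with no cached BMI yet
void buildBMI(const std::string &ModuleName);                     // emit the BMI into clangd's cache
void astWorkerUpdate(const std::string &File);                    // the normal parse/update path
void runAsync(std::function<void()> Task);                        // some worker pool

void onUpdate(const std::string &File) {
  std::vector<std::string> Missing = missingBMIsFor(File);
  if (Missing.empty()) {
    astWorkerUpdate(File);                  // all prerequisites ready: parse as usual
    return;
  }
  runAsync([File, Missing] {
    for (const std::string &Mod : Missing)  // build prerequisite BMIs first...
      buildBMI(Mod);
    astWorkerUpdate(File);                  // ...then let the normal update run
  });
}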

Now it is OK roughly for the a-b test case in llvm/llvm-project#58723.

I feel like there is only one TUScheduler per project (is this true?), so the BMIs won't be rebuilt every time we open a new file.

But I feel I am missing some points since I haven't touched the indexer at all. I initially wanted to work in BackgroundIndex, but the logs show it is not reached after I open a new project, even if I add -background-index. Or is it OK to not make the indexer know about modules?

@HighCommander4

HighCommander4 commented Jun 2, 2023

I feel like there is only one TUScheduler per project (is this true?)

I believe so.

So the BMIs won't be rebuilt every time we open a new file.

Note, I was also thinking of persisting the BMIs between clangd invocations.

is it OK to not make the indexer know about the modules?

The indexer also triggers AST builds (for every file in the project that isn't already indexed), that happens around here. That AST build will encounter imports, so I guess it should know about modules as well.

@ChuanqiXu9

Note, I was also thinking of persisting the BMIs between clangd invocations.

That makes sense. I feel like it is OK.

The indexer also triggers AST builds (for every file in the project that isn't already indexed), that happens around here. That AST build will encounter imports, so I guess it should know about modules as well.

My thought is that it works as long as we rewrite the command line to refer to the new BMIs we built.

@HighCommander4

The indexer also triggers AST builds (for every file in the project that isn't already indexed), that happens around here. That AST build will encounter imports, so I guess it should know about modules as well.

My thought is that it works as long as we rewrite the command line to refer to the new BMIs we built.

I think that would work as part of the solution. Another part of the solution would be to ensure that the indexing of source files only starts after the BMIs have been built.

Presumably, something should proactively drive the building of BMIs (not just when you open a source file that depends on it), so that the indexing of source files eventually gets unblocked. Maybe it makes sense to think of building the BMIs as the "first stage" of indexing, and indexing the source files as the "second stage"?
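
To illustrate the two-stage idea, here is a small self-contained sketch of ordering the first stage: build every module interface after the modules it imports, and only start indexing ordinary sources once this order has been processed. Purely illustrative, not clangd code:

#include <functional>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Input: module name -> names of modules it imports.
// Output: an order in which to build BMIs so dependencies always come first.
std::vector<std::string>
bmiBuildOrder(const std::unordered_map<std::string, std::vector<std::string>> &Imports) {
  std::vector<std::string> Order;
  std::unordered_set<std::string> Done, InProgress;
  std::function<void(const std::string &)> Visit = [&](const std::string &M) {
    if (Done.count(M) || InProgress.count(M))
      return;                          // already handled (or a cycle: ill-formed, skip)
    InProgress.insert(M);
    auto It = Imports.find(M);
    if (It != Imports.end())
      for (const std::string &Dep : It->second)
        Visit(Dep);
    InProgress.erase(M);
    Done.insert(M);
    Order.push_back(M);                // every import of M appears earlier in Order
  };
  for (const auto &Entry : Imports)
    Visit(Entry.first);
  return Order;
}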

@ChuanqiXu9

I think that would work as part of the solution. Another part of the solution would be to ensure that the indexing of source files only starts after the BMIs have been built.

Presumably, something should proactively drive the building of BMIs (not just when you open a source file that depends on it), so that the indexing of source files eventually gets unblocked. Maybe it makes sense to think of building the BMIs as the "first stage" of indexing, and indexing the source files as the "second stage"?

Do you mean indexing the source files only after we have built all the BMIs, or only after we finish the first stage (since it is possible that some BMIs are not buildable)? Sounds like a good idea; it would simplify the implementation.

@sam-mccall
Member Author

A few random thoughts...

(Full disclosure: I did some brainstorming about this a few months ago and couldn't come up with anything good, hopefully y'all are more creative & understand modules better!)

  • module building looks kinda like indexing, but I think focusing on background indexing is a red herring; preamble building/indexing is where we should begin. We need to be able to open a file and start using it without any significant fraction of the project having been indexed. This is the single most critical latency path, and to hit it today (building the preamble) we take shortcuts which are unacceptable for background indexing (e.g. skip-function-bodies).
  • the project may be so large that we cannot enumerate the modules, let alone touch every file. If we want to rely on clang-scan-deps for small projects, that makes sense, but we should abstract it somehow with a query interface that scales (like CompilationDatabase); one possible shape is sketched after this list
  • the big performance win with modules is the ability to reuse modules between invocations, e.g. share the overlapping parts of preambles rather than starting from scratch. However, editing headers will still force us to rebuild all dependent modules/preambles, as modules are tightly coupled to their dependencies. (If you built a.pcm#1 against b.pcm#5 you can't reuse it with b.pcm#6.)
  • we'll need some new abstractions, which are always hard to get right. TUScheduler is basically "full" in terms of complexity, we can't cram many more responsibilities in there without losing ability to reason about it. Obviously it will have to change, but the bulk of complexity should live somewhere else.
  • "modules" can be enabled in (too) many ways, and ultimately the design needs to accommodate as many as possible (there's no room for several implementations!). But we should be clear where we're starting (e.g. only fully-explicit command lines with -fmodule-file=foo=bar using c++20 syntax?) and roughly how other variants will fit in (can we translate implicit to explicit using clang-scan-deps? Do module maps make any difference?)

@RedBeard0531

RedBeard0531 commented Jun 2, 2023

(note: I know nothing about clangd's internals so this comment is based on guesses based on observable behavior. Hopefully it isn't complete nonsense and if needed someone can translate into the correct terminology)

I assume clangd must already have an implementation of a proto-buildsystem internally. For example, it can schedule indexing tasks with a controlled amount of parallelism and trigger reindexing of downstream TUs when a header file changes. Ideally it also notices when files change outside the editor (e.g. git) or when the editor is closed, and is able to do incremental reindexing where needed (although I've experienced issues around this in the past, less so recently).

I think part of the solution is to double down on having an integrated build system. Either continue improving the integrated one, or switch to something like llbuild. The key thing that I don't think was required before is task dependencies. While you did have source/header file -> TU indexer job dependencies, they merely triggered rerunning the task. Now you will need some indexer tasks to have an input dependency on the BMI which will be generated by another task. And it is important to rerun the downstream indexers whenever an upstream BMI is meaningfully changed (defining "meaningfully" can be tricky; a first pass can just treat any content difference as meaningful, but I suspect that can be improved on).
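
To make the task-dependency idea a bit more concrete, here is a toy sketch; all types are invented for illustration. Each task lists the BMIs it consumes and the one it produces, and a downstream task reruns only when the content hash of its inputs actually changes:

#include <cstddef>
#include <string>
#include <vector>

// Toy model of a node in the task DAG described above.
struct Task {
  std::vector<std::string> InputBMIs; // BMIs this task consumes
  std::string OutputBMI;              // BMI this task produces (empty for plain TUs)
  std::size_t LastInputHash = 0;      // combined hash of inputs at the last run
};

// "Meaningful change", first pass: any content difference counts.
// A smarter version could ignore changes that don't affect the exported interface.
bool needsRerun(const Task &T, std::size_t CurrentInputHash) {
  return CurrentInputHash != T.LastInputHash;
}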

Additionally, you cannot begin doing (much of) anything prior to scanning to determine module names and import requirements. And of course, when a file changes, it must be rescanned because it may have added an import, or changed which module/partition it declares itself to be part of. Files can also go from not generating a BMI to generating one, e.g. if you add an export module foo; line or change module blah; to module blah:blah;

I don't know how easy or hard it would be to fit the concept of task dependencies into the existing proto-buildsystem, but at least from an outsider's perspective, that seems like the best (if not the only) option. Or maybe I've just spent too much time working on buildsystems, so they feel like the solution to every problem now... 😄

@sam-mccall sam-mccall reopened this Jun 2, 2023
@sam-mccall
Member Author

(sorry, hit the wrong button)

Is there an opportunity to reuse this persisted BMI cache between the indexer and the preambles of open files?

The "lite mode" ASTs (skip-function-bodies etc) may not be suitable for indexing, and vice versa (even opportunistically: full ASTs are much larger => slower to load). I believe we can't mix the two due to diamond dependency problems. They may also have conflicting requirements for how to deal with staleness. If these can be solved, it would be fantastic.

I assume clangd must already have an implementation of a proto-buildsystem internally. For example, it can schedule indexing tasks

A very simple one for background indexing (no dependencies, as you note). However very deliberately no core functionality depends on this: the index is entirely optional and all features work (though degraded) if it's stale.

If we're going to have a build system we block on (and modules ~requires it) then it needs to be something new. It's going to be responsible for the critical path to user-facing latency, as opposed to background index's "process an infinite amount of non-critical batch work".

@ChuanqiXu9

module building looks kinda like indexing

I am not sure what you mean by indexing. Abstractly, do you mean a map from source files to BMIs? If so, I think we're on the same page.

We need to be able to open a file and start using it without any significant fraction of the project having been indexed. This is the single most critical latency path and to hit it today (building preamble) we take shortcuts which are unacceptable for background indexing (e.g. skip-function-bodies).

the project may be so large that we cannot enumerate the modules, let alone touch every file. If we want to rely on clang-scan-deps for small projects, that makes sense, but we should abstract it somehow with a query interface that scales (like CompilationDatabase)

Yeah, this is the thing I want to talk about. In my local experiments, I find it is hard/impossible to make modules work well in clangd if we don't pre-scan the whole project initially. Currently, in my experiments, I only do on-demand module building and parsing. That may be OK for non-modules code since the TUs don't depend on each other.

But this is problematic with modules. For the simple A-B example,

// b.cppm
export module b;
export int bbbb = 44;

// a.cppm
export module a;
import b;
export int a() {
    return bbbb;
}

Currently, my clangd works if I open both b.cppm and a.cppm. But if I clear the caches, close the two tabs, and restart clangd, then when I open a.cppm I see the annoying red squiggle again. The red squiggle only disappears after I open b.cppm. And this could be really bad in a big project, since nobody is going to open every file before starting work.

Since I currently only scan on demand, when I see import b; I don't know where module b actually comes from.

And I can imagine another bad case. We'd like to rewrite -fmodule-file=<module-name>=<path-to-BMI> arguments, but without pre-scanning the whole project we can't know whether <module-name> is a third-party module, so we can't decide whether to replace it or not.


So now I feel it is impossible to support modules well without scanning the whole project initially. I also agree that sounds bad. Any ideas?


we'll need some new abstractions, which are always hard to get right. TUScheduler is basically "full" in terms of complexity, we can't cram many more responsibilities in there without losing ability to reason about it. Obviously it will have to change, but the bulk of complexity should live somewhere else.

Agreed. But let's talk about this after we have a real demo.

"modules" can be enabled in (too) many ways, and ultimately the design needs to accommodate as many as possible (there's no room for several implementations!). But we should be clear where we're starting (e.g. only fully-explicit command lines with -fmodule-file=foo=bar using c++20 syntax?) and roughly how other variants will fit in (can we translate implicit to explicit using clang-scan-deps? Do module maps make any difference?)

We don't have so many ways to enable C++20 modules. Maybe you have that impression due to the existence of clang modules. For example, we won't have implicit dependencies within C++20 modules, and we won't have a module map.
BTW, ideally the diamond dependency won't be a problem, since we will require consistent build environments.

@sam-mccall
Member Author

sam-mccall commented Jun 2, 2023

I am not sure what you mean by indexing. Abstractly, do you mean a map from source files to BMIs? If so, I think we're on the same page.

Right - we have to parse the source code and process the results, producing respectively index info or BMI, and we want to do so for roughly the same set of sources, triggered by the same events (when a file is opened: all transitively imported things. In the background: all things in the project).

Yeah, this is the thing I want to talk about. In my local experiments, I find it is hard/impossible to make modules work well in clangd if we don't pre-scan the whole project initially

Agree, neither C++20 modules nor clang header modules provide a scalable way to discover module structure. This is a pretty major flaw we need to design around. Though it's basically the same difficulty as resolving #includes, and admits some of the same solutions:

  • we can scan the whole project lexically
  • we can have the build system provide an artifact like "modules.json", generated at e.g. cmake time
  • we can query the build system at runtime (e.g. bazel knows module boundaries)

It's fine to fall back to #1, but we need a system that can be extended to #2 or #3.

And I can imagine another bad case. We'd like to rewrite -fmodule-file=<module-name>=<path-to-BMI> arguments, but without pre-scanning the whole project we can't know whether <module-name> is a third-party module, so we can't decide whether to replace it or not.

I'm not sure exactly what you mean by "third-party module", but I don't see a way we can safely consume modules that clangd itself didn't build. (Version skew, among other things.) So I think we'll need to be at least rewriting all module flags, but ideally programmatically replacing all the module loading/caching/resolving interfaces.

We don't have so many ways to enable C++20 modules. Maybe you have that impression due to the existence of clang modules.

Are you suggesting we only support C++20 named modules? Adding the complexity needed for modules in general but leaving out clang modules seems like a tough sell.
(FWIW my understanding is both Google and Meta are interested in having modules work in clangd and contributing to this, both are using clang modules rather than C++20 modules)

Also, I believe clang modules are at least partly covered by C++20 as "header units". Whether or not we choose not to call them "modules", it seems like we should support codebases that use them.

For example, we won't have implicit dependencies within C++20 modules.

OK, my understanding is a bit hazy. Which of the following does clang's impl of C++20 modules require/forbid?

  • all imported modules have been built in a separate invocation, before the importing code is built
  • paths to all the imported modules are provided explicitly (rather than being probed for in some cache dir)
  • names of the imported modules are provided explicitly (rather than being retrieved from the PCM file)

BTW, ideally, the diamond dependency won't be a problem since we will require the consistent build environments.

I mentioned the diamond dependency problem specifically because we currently have two build environments (skip-function-bodies in preamble vs not when background-indexing). I agree that if we keep our build configurations isolated and are content to rebuild all dependent modules when a source file changes, we won't have a diamond dependency problem. (But each of these cost performance)

@RedBeard0531

RedBeard0531 commented Jun 2, 2023

the project may be so large that we cannot enumerate the modules, let alone touch every file.

So now I feel it is impossible to support modules well without scanning the whole project initially. I also agree that sounds bad. Any ideas?

I'm not sure how bad it really is. A lot of work went into putting restrictions into the C++20 specification to ensure that it will be possible to write a very fast scanner. It should be possible to scan at or near disk speeds on a single core (maybe not as fast as a high-end NVMe, but certainly several hundred MB/s per core should be achievable). So I think this should scale up to any meaningful definition of "project". This still won't scale up to a 1TB mega-monorepo where you never check out the whole repo at once, and certainly never build it all locally. So you will need an alternative for those (luckily most of them already are willing to invest in custom tooling), but scanning all project files should still be fine for the vast majority of clangd users. I think (admittedly, without evidence yet) a reasonable rule of thumb would be that anything small enough to have a single compile_commands.json file (which implies parsing a single JSON file with the full command line for every TU) should be able to scan all files within it. @Bigcheese might be able to comment more authoritatively on the performance of clang-scan-deps specifically.

That said, you don't have to eagerly do a full scan on everything. You can lazily determine the import DAG while indexing or loading the AST of the file being edited. The only thing you really need to know upfront is the mapping from module/partition name -> source file. While not 100% spec compliant[1], it is probably acceptable to do a "preflight scan" and search all files with the regex \bmodule\b and only do a full scan on them since they are the only ones that can define named modules (it is non-obvious, but this grammar rule means that module must appear in the source file itself, not only via an #include). Hopefully, in the (far) future the vast majority of C++ files will have module in them so this may not make sense forever, but since that is far from true today, you can save some time this way. Megaprojects, and possibly whatever the equivalent of /usr/include is for modules (still TBD), will need some sort of metadata (static files or server) to find the source for each named module.

[1] Ideally line splicing wouldn't be legal inside of a keyword or directive. But I think anyone who splits module across physical lines outside of a compiler stress test deserves what they get...
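
A minimal sketch of that preflight filter, for illustration only: cheaply reject files that cannot possibly declare a named module before handing the survivors to a real scanner such as clang-scan-deps. A production filter would also want to ignore comments and string literals:

#include <fstream>
#include <iterator>
#include <regex>
#include <string>

// Cheap pre-filter: only files containing the token "module" can possibly
// declare a named module, so only they need a full dependency scan.
bool mightDeclareNamedModule(const std::string &Path) {
  std::ifstream In(Path);
  std::string Contents((std::istreambuf_iterator<char>(In)),
                       std::istreambuf_iterator<char>());
  static const std::regex ModuleKeyword(R"(\bmodule\b)");
  return std::regex_search(Contents, ModuleKeyword);
}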

If we're going to have a build system we block on (and modules ~requires it) then it needs to be something new. It's going to be responsible for the critical path to user-facing latency, as opposed to background index's "process an infinite amount of non-critical batch work".

I think "~requires" is a good way to put it. The standard is very carefully worded to avoid placing restrictions on the implementation details. So a build system isn't technically required, and neither are BMIs. In theory, an implementation could parse each TU by simply starting at the top of the file and handling import similarly to #include by simply reading the source file directly, but in a special context to preserve the modular semantics. I don't know if clang supports such a mode, but I assume that the building blocks would be very useful since they would enable the equivalent of single-file -E -frewrite-includes roll-ups of TUs for bug reports and testing. Such a mode (if it exists) may be the shortest path to getting modules working in clangd, since it will at least "work" for both indexing and the open file AST, even if it is slower than it should be.

But having a build system (shorthand for any DAG-walking parallel task runner and incremental re-runner, not restricted to the traditional concept of a build system) and building some sort of BMI (a reusable cache of a module (unit)'s interface) is definitely what the design of C++ modules is optimized for. Hopefully it will result in significantly faster (re)indexing, and open file loading/parsing. Once you have that in place, about the only time I could imagine parsing modules directly from source being useful is when you cold-open a project and want to minimize the latency to first parse of a single open file. But even then, I'm not sure it will actually have lower latency than telling the build system to pause what it is doing and put all resources into building the BMIs for the transitive deps of the current file. Consider that it will be much simpler to parallelize the parsing of independent nodes in the DAG if they are separate tasks, so given enough cores (and core counts are more likely to go up than down over time!) your latency will be proportional to the depth of the DAG from your source file rather than its total size. And this becomes even more of a win if there are multiple open files (eg reopening a project with the same editor views as last time, or just restarting clangd) since they can share the benefit of parsing common deps only once.

full ASTs are much larger => slower to load

I'm not a clang dev, but from hallway conversations my understanding is that all compilers are planning to have as close as possible to an O(1) loading cost for BMIs. So the cost of adding an import to a file should be as close to zero as possible, and you will only have to pay for loading the parts of the BMI that you actually use, even if the BMI is huge. I think this will involve both carefully designed on-disk data structures and using mmap to load the BMI and use those data structures in-place. So I wouldn't worry too much about this, at least not yet. Of course I could see a concern about the cost to generate the full vs partial BMI contributing to file-open latency, but that is a separate issue. While it may make sense to generate a partial BMI if a full one isn't available yet, it probably isn't worth generating a partial one to save loading time when a full BMI is ready to use. Either way, none of that is on the critical engineering path to getting minimally working module support into clangd, so it may be better to start with only full BMIs and add partials as a possible optimization later. I assume anyone using modules would prefer to get it working sooner, even if it is slower than ideal.

@ChuanqiXu9

  • we can scan the whole project lexically
  • we can have the build system provide an artifact like "modules.json", generated at e.g. cmake time
  • we can query the build system at runtime (e.g. bazel knows module boundaries)

It's fine to fall back to #1, but we need a system that can be extended to #2 or #3.

I feel like the second option is good (bazel knows nothing about C++20 named modules right now). And it should be fine to implement the first one initially.

I'm not sure exactly what you mean by "third-party module", but I don't see a way we can safely consume modules that clangd itself didn't build. (Version skew, among other things.) So I think we'll need to be at least rewriting all module flags, but ideally programmatically replacing all the module loading/caching/resolving interfaces.

I mean modules whose source code is not in the project tree. For example, we'll have import std; in LLVM 17 (in planning), and clearly the source code of the std module won't be part of the project. How can we find the source code, especially for other third-party modules?

but I don't see a way we can safely consume modules that clangd itself didn't build. (Version skew, among other things).

My thought is that clangd won't need to consume such modules; it just falls back to the compiler to consume them, and it's the users' responsibility to make sure those modules are usable. I am not opposed to having clangd build modules itself as much as possible; I just feel it is not easy to find the corresponding source code out of tree.

Are you suggesting we only support C++20 named modules? Adding the complexity needed for modules in general but leaving out clang modules seems like a tough sell.
(FWIW my understanding is both Google and Meta are interested in having modules work in clangd and contributing to this, both are using clang modules rather than C++20 modules)

C++20 named modules are significantly different from clang header modules and header units, since a named module is a TU by itself, while clang header modules and header units keep header semantics. So I imagine it may be OK for clangd to treat the latter as headers. Or at least I think it is OK to consider C++20 named modules first, since they should be the most complex case.

OK, my understanding is a bit hazy. Which of the following does clang's impl of C++20 modules require/forbid?

  • all imported modules have been built in a separate invocation, before the importing code is built
  • paths to all the imported modules are provided explicitly (rather than being probed for in some cache dir)
  • names of the imported modules are provided explicitly (rather than being retrieved from the PCM file)

All three are required by C++20 named modules.
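
For the a/b example above, the fully explicit style looks roughly like this (exact driver spellings may vary slightly between clang versions): the importing TU is compiled only after b's BMI exists, and both the path and the name of the imported module are passed explicitly:

clang++ -std=c++20 --precompile b.cppm -o b.pcm
clang++ -std=c++20 --precompile -fmodule-file=b=b.pcm a.cppm -o a.pcm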

I'm not sure how bad it really is. A lot of work went into putting restrictions into the C++20 specification to ensure that it will be possible to write a very fast scanner.

Yeah, now we have consensus to scan the whole project first.

it is probably acceptable to do a "preflight scan" and search all files with the regex \bmodule\b and only do a full scan on them since they are the only ones that can define named modules

I am not sure it will really be faster. Let's revisit this once we find that full scanning is a bottleneck.

I don't know if clang supports such a mode, but I assume that the building blocks would be very useful since they would enable the equivalent of single-file -E -frewrite-includes roll-ups of TUs for bug reports and testing. Such a mode (if it exists) may be the shortest path to getting modules working in clangd, since it will at least "work" for both indexing and the open file AST, even if it is slower than it should be.

Clang doesn't support such a mode now. It looks like it would take some extra work to implement, especially to preserve the module semantics. In my mind it feels like the longer path.

but from hallway conversations my understanding is that all compilers are planning to have as close as possible to an O(1) loading cost for BMIs.

Currently, it is not strictly O(1) in clang, and I guess we will never fully achieve that. But the cost is really low in my experience.

so it may be better to start with only full BMIs and add partials as a possible optimization later. I assume anyone using modules would prefer to get it working sooner, even if it is slower than ideal.

Strongly agreed : )

@mathstuf

mathstuf commented Jun 2, 2023

[1] Ideally line splicing wouldn't be legal inside of a keyword or directive. But I think anyone who splits module across physical lines outside of a compiler stress test deserves what they get...

IIRC, we decided that line splicing is not allowed on module lines (import or export). I don't know where that wording is in the standard though :/ .

Also, I believe clang modules are at least partly covered by C++20 as "header units". Whether or not we choose not to call them "modules", it seems like we should support codebases that use them.

All known (to me) clang module deployments assume a highly consistent build environment. General C++ cannot assume that and must handle the "build a BMI per usage site" worst-case (collapsing compatible ones down is possible, but non-trivial to detect).

we can have the build system provide an artifact like "modules.json", generated at e.g. cmake time

The "at cmake time" part is highly unlikely to be possible (in general). The @modulemap files in the compile_commands.json and any analogous modules.json files are build-time artifacts. Additionally, modules.json is only reliable when provided per-target anyways[1].

[1] Consider where such a "build a complete modules.json file" rule would exist. It would have to depend on every modules.json from each target, but when a target is rebuilt using ninja just_one_target, what would update this file reliably?

@HighCommander4

HighCommander4 commented Jun 2, 2023

Is there an opportunity to reuse this persisted BMI cache between the indexer and the preambles of open files?

The "lite mode" ASTs (skip-function-bodies etc) may not be suitable for indexing, and vice versa (even opportunistically: full ASTs are much larger => slower to load).

My limited understanding of BMIs is that they're similar to PCHs (which is also what our preambles use), whose deserialization is done more or less on demand (e.g. when iterating all decls() of a NamespaceDecl). This makes me hopeful that a "full" BMI produced by the indexer can be consumed by a preamble build with fairly good performance (and in particular faster than rebuilding that BMI in "lite mode").

I don't see a way we can safely consume modules that clangd itself didn't build

While so far we have been discussing an "MVP" where clangd builds all the modules itself, in the longer term I'm hoping that we will be able to let users opt into a mode of operation where BMIs are shared with the project's build (which would require the build compiler to be clang and clangd to be version-locked to it, and possibly some other restrictions such as the user not using CompileFlags: Add: to add any clangd-specific flags that affect the BMI contents).

@ChuanqiXu9

ChuanqiXu9 commented Jun 16, 2023

Update: I have sent a patch (https://reviews.llvm.org/D153114) for review. See the review page for details. Everyone here is welcome to comment.

@ChuanqiXu9

Update: https://reviews.llvm.org/D153114 is abandoned but the review opinions are pretty helpful. See llvm/llvm-project#66462 for the newest progress.

ChuanqiXu9 added a commit to ChuanqiXu9/llvm-project that referenced this issue Nov 27, 2023
Alternatives to https://reviews.llvm.org/D153114.

Try to address clangd/clangd#1293.

See the links for design ideas. We want to have some initial support in
clang18.

This is the initial support for C++20 Modules in clangd.
As suggested by sammccall in https://reviews.llvm.org/D153114,
we should minimize the scope of the initial patch to make it easier
to review and understand, so that everyone is on the same page:

> Don't attempt any cross-file or cross-version coordination: i.e. don't
> try to reuse BMIs between different files, don't try to reuse BMIs
> between (preamble) reparses of the same file, don't try to persist the
> module graph. Instead, when building a preamble, synchronously scan
> for the module graph, build the required PCMs on the single preamble
> thread with filenames private to that preamble, and then proceed to
> build the preamble.

And this patch reflects the above opinions.
DanShaders pushed a commit to DanShaders/llvm-project that referenced this issue Apr 13, 2024