-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
support C++20 Modules #19940
base: master
Are you sure you want to change the base?
support C++20 Modules #19940
Conversation
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
This PR already got a lot of attention at Google in the group of C++ toolchain maintainers / experts. There’s a desire to have it, but no concrete/incompatible plans yet. The design would need some changes so that it’s compatible and supports Google well. (Think of easier maintenance in the future) I’m not an expert in C++, but I will start the discussion internally and come back with possible requirements/changes when we figure out what they are. |
Some people are out of office. The main discussion will start second week of November. I’ll post next update after that. |
826867b
to
cf2c9ad
Compare
I rebase the PR to the latest master branch due to |
gentle ping :-) |
CMake developer here; just tracking how modules are being implemented in various places :) . I read through the design doc and had a few comments. Since it was already merged, I figured that here may be better; can move wherever is best though.
|
No, clang don't have such plans (deprecating 2 phase compilation model) at least for now.
Yes but the story of the 2 phase compilation model seems really appealing. So the build system supporting 2-phase compilation model may be a positive advantages. And in the future, the build systems may be able to support both (or even more) compilation models and the users can make the choice. |
To be more precise, there may be multiple kinds of BMIs in the future and Clang may have a 3-phase with the trimmed BMI being the "interesting" bit for importers in the future, but still using the full BMI for codegen. Clang is also getting a (proper rather than "frontend does the 2-phase internally" of today) 1-phase compilation like GCC and MSVC as well.
I agree. However, I prioritized 1-phase over 2-phase for CMake due to compiler support.
Agreed. However, given the simplicity of the 1-phase, I find it better for the initial implementation. There are a number of performance things that can be looked at in the future:
Basically my main interest is in getting things working across the ecosystem as a baseline before we start up our ricer cars. Of course, Bazel can do as they please; I can only offer my view on things here. |
This issue was filed against CMake. Unconditional redirection of |
The great thing about code, is that it can be changed. |
I like this option. Allow the community to develop this sooner than Google's timeline, but let us address your design concerns. |
I am deeply concerned with the two-year wait you’ve specified. This delay reflects an internal decision-making process that drastically impedes the progress of the Bazel user community. The entire Bazel community is being forced to align with your internal timeline, with their current needs being disregarded. This inevitably ties the hands of the community, curtailing experimentation and the adoption of new technologies in a timely manner. The motivation of the patch is that the missing C++20 Modules support in Bazel blocks our internal uses. Scalability and code style are indeed very important factors. That's also the reason we need code reviews. As mathstuf mentioned, it's not only Google that could achieve a battle-tested implementation. The open-source world thrives on responsiveness and iterative improvements. In the community, we should also be able to obtain an implementation that is scalable and well-designed through rigorous review processes. Tools and features improve through use, feedback, and refinements—not through prolonged periods of inaction. The community is more than capable of handling incremental changes and can be entrusted to do so.
Having an implementation in Java will require implementing a preprocessor. It may not be trivial. It shouldn't be bad to use compiler native scanners. Also it won't be a blocking issue if we want to implement a Java scanner some day. The scanner should be changeable by its nature. For example, there exist two implementations for collecting header file dependencies: one using Java native (see #13871) and the other using compiler native. |
+1 |
Indeed as
Agreed. Batch scanning would be an improvement for one-shot builds, but incremental/development builds may be better off with per-TU scanning commands (subject to process launch costs…things we need measurements for to actually know). |
Hey @PikachuHyA, thanks for your patience. The replies sparked some more internal discussions, and there seems to be more benefit from your implementation to Google than what was previously thought. I don't have a green light yet, but we're considering accepting the change, under an experimental flag, that guards the additional attributes on C++ rules and the implementation. The flag would mean that both implementation and the public interface is subject to change in the future. There's not so much objection to For the sake of improving the quality of the review, do you think you could break this XXL PR into several digestible pieces? I'll take care that each piece is reviewed in a couple of business days. From Bazel perspective the tricky part might be additional fields on C++ providers and C++ actions. Those might cause a regression, we need to benchmark it and figure out if it's small enough to be justifiable or should we do something extra to remove it. |
c43673c
to
f759279
Compare
f759279
to
88a230f
Compare
db1eb69
to
f52154b
Compare
f52154b
to
f07e7e2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might want to consider a way to tell Bazel that some sources do not use modules and can therefore completely skip scanning (and, if nothing in the target needs scanned, the target's collation step as well).
// if cpp20_module enabled, only c++20-deps-scanning will produce .d file | ||
// other actions will reuse the .d file from c++20-deps-scanning |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is accurate as the "real" compile may mention other files in its .d
output. Including, but not limited to:
- the BMI files that are read (or only those that are used)
- modmap files
- header units which are translated into imports may stop reading the header and read the BMI directly
The last one should be covered by the header changing -> trigger a rescan, not listing it here allows the build graph to not-run the compile in case its change is non-consequential to the compile by waiting for the scanning to say so rather than queuing up the compile automatically.
Preconditions.checkState(module.isFileType(CppFileTypes.CPP_MODULE), "Non-module? %s", module); | ||
var skyValue = actionExecutionValues.get(module.getGeneratingActionKey()); | ||
if (skyValue == null) { | ||
return null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a problematic error case; no messages or context about what happened?
src/main/java/com/google/devtools/build/lib/rules/cpp/CppCompileAction.java
Show resolved
Hide resolved
public CppCompileActionBuilder setPcmFiles(NestedSet<Artifact.DerivedArtifact> pcmFiles) { | ||
this.pcmFiles = pcmFiles; | ||
return this; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indentation seems weird here. Also looks like a missing newline after this brace.
<li>Clang use cppm </li> | ||
<li>GCC can use any source file extension </li> | ||
<li>MSVC use ixx </li> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All three can use any extension with the right flags (e.g., -x c++-module
or -interface
/-interfacePartition
). These are the preferred extensions.
var scanDepsBuilder = initializeCompileAction(sourceArtifact); | ||
scanDepsBuilder.setActionName(CppActionNames.CPP20_DEPS_SCANNING); | ||
scanDepsBuilder.setOutputs(ddiFile, dotdFile, null); | ||
// only c++20-deps-scanning add .d file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As noted elsewher, this seems unwise.
src/main/java/com/google/devtools/build/lib/rules/cpp/Cpp20ModuleDepMapAction.java
Show resolved
Hide resolved
content.append("module-file="); | ||
content.append(moduleName); | ||
content.append("="); | ||
content.append(modulePath); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are any escaping mechanism required to be considered (e.g., spaces in the path)?
actionExecutionContext, | ||
out -> { | ||
OutputStreamWriter content = | ||
new OutputStreamWriter(out, StandardCharsets.ISO_8859_1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Woah; not UTF-8? Or are paths in Bazel essentially limited to ASCII? Charset.defaultCharset()
seems better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This particular charset makes String
s behave just like byte arrays, which is useful when supporting both Unix (where paths are essentially just byte arrays) and Windows (which at least with the system APIs Bazel is using requires UTF-16). Unless the tools involved have any particular encoding requirements, this charset should be the most compatible choice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it's worth a comment, but I'm not the intended audience here; maybe it's just implicit knowledge for Bazel developers.
@SerializedName("source-path") | ||
private String sourcePath; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the future header unit support would require reading use-source-path
(bool) and lookup-method
(enum). It might be prudent to read these and fail gracefully with a message about header unit non-support.
TL;DR The Bazel team has decided to accept this PR, I'll be doing the reviews and I'll get some help from internal C++ experts, namely @trybka. We identified the following risks:
We'd like to keep the maintenance costs at minimum - Bazel team will only do reviews on PRs after the initial community review. We won't address any issues that are reported. We don't mind if the community addresses them. We'd like to keep the change behind an experimental flag, to mitigate the risk of divergent implementations. While the change is under the experimental flag, there is no guarantee about incompatible changes. If Google does an internal implementation, we'd like it to match, to reduce maintenance costs. We'd also like to make the change as "modular" as possible, in order to make it easier to remove the future. That might happen in an unlikely scenario, that Google doesn't implement support for the C++20 modules and that this remains the only complexity in CppCompileAction that we can't be rewritten to Starlark. In case this scenario plays out, the C++20 modules support will probably need to be implemented in a different way. That said, we do see the benefits of this change for both the community and Google. Thank you for your contribution. |
hi @comius , I have split this XXL PR into 6 smaller commits. Initially, I hoped to divide it into independent small patches (see #22425 , #22427), but that proved to be unfeasible due to dependencies between the patches (#22429). Later, I plan to use stacked PRs to facilitate code review. However, stacked PRs require creating branches in the target repository first, and I'm not sure if I could be granted the necessary permissions. I've also created a demo of stacked PRs in my repository (https://github.com/PikachuHyA/bazel/pulls) as bakup. Do you have any suggestions on code review process? BTW. the windows CI is broken, I will fix it later. |
@mathstuf Thanks for your comments. I will make the related code changes as soon as possible. |
Nothing should require that; tools doing so should…work on that. It's kind of crazy to make tools not available for external contributors to projects. I believe https://stacked-git.github.io/ does most of its work locally so that at least you're not tied to any Github limitations. |
this PR implement the support C++20 Modules in bazel.
the design doc: bazelbuild/proposals#354
the discussion: #19939
the demo: https://github.com/PikachuHyA/async_simple
the extra tests: https://github.com/PikachuHyA/bazel_cxx20_module_test
see #4005