Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bazel 7.0: Java Toolchain Resolution for Bazel Ignores Java Flags #19934

Open
EdbertChan opened this issue Oct 24, 2023 · 9 comments
Open

Bazel 7.0: Java Toolchain Resolution for Bazel Ignores Java Flags #19934

EdbertChan opened this issue Oct 24, 2023 · 9 comments
Labels
P2 We'll consider working on this in future. (Assignee optional) team-Rules-Java Issues for Java rules type: support / not a bug (process)

Comments

@EdbertChan
Copy link

EdbertChan commented Oct 24, 2023

Description of the bug:

To set and pin the Java toolchain to Java 11, Google documentation says to use

common --java_language_version=11
common --java_runtime_version=remotejdk_11
common --tool_java_language_version=11
common --tool_java_runtime_version=remotejdk_11
common --host_javabase=@bazel_tools//tools/jdk:remote_jdk11
common --javabase=@bazel_tools//tools/jdk:remote_jdk11
common --host_java_toolchain=@bazel_tools//tools/jdk:toolchain_java11
common --java_toolchain=@bazel_tools//tools/jdk:toolchain_java11

We observe that Bazel will not respect these flags in 7. However, this configuration works in Bazel 6.4.0.

Which category does this issue belong to?

No response

What's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

See branch at https://github.com/EdbertChan/SimpleBazelAndroidApp/tree/demonstrate_bazel_7_aquery

Which operating system are you running Bazel on?

MacOS

What is the output of bazel info release?

release 7.0.0-pre.20230906.2

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

No response

What's the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

No response

Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.

Yes. Last worked on Bazel 6.4

Have you found anything relevant by searching the web?

N/A

Any other information, logs, or outputs that you want to share?

When running with --toolchain_resolution_debug=., it looks like the resolution is overwritten at the last moment.

For instance, the log will be filled with

INFO: ToolchainResolution: Target platform @local_config_platform//:host: Selected execution platform @local_config_platform//:host, type @bazel_tools//tools/jdk:runtime_toolchain_type -> toolchain @remotejdk11_macos_aarch64//:jdk

But then the resolution gets set to remotejdk17. This does not make sense: it actually rejects the remotejdk11_macos_aarch64 when it has previously accepted and selected it.

Here is a portion of the toolchain debug:


INFO: ToolchainResolution: Target platform @local_config_platform//:host: Selected execution platform @local_config_platform//:host, type @bazel_tools//tools/jdk:runtime_toolchain_type -> toolchain @remotejdk11_macos_aarch64//:jdk
INFO: ToolchainResolution:     Type @bazel_tools//tools/jdk:runtime_toolchain_type: target platform @local_config_platform//:host: Rejected toolchain @remotejdk11_linux//:jdk; mismatching config settings: prefix_version_setting
INFO: ToolchainResolution:     Type @bazel_tools//tools/jdk:runtime_toolchain_type: target platform @local_config_platform//:host: Rejected toolchain @remotejdk11_linux_aarch64//:jdk; mismatching config settings: prefix_version_setting
INFO: ToolchainResolution:     Type @bazel_tools//tools/jdk:runtime_toolchain_type: target platform @local_config_platform//:host: Rejected toolchain @remotejdk11_linux_ppc64le//:jdk; mismatching config settings: prefix_version_setting
INFO: ToolchainResolution:     Type @bazel_tools//tools/jdk:runtime_toolchain_type: target platform @local_config_platform//:host: Rejected toolchain @remotejdk11_linux_s390x//:jdk; mismatching config settings: prefix_version_setting
INFO: ToolchainResolution:     Type @bazel_tools//tools/jdk:runtime_toolchain_type: target platform @local_config_platform//:host: Rejected toolchain @remotejdk11_macos//:jdk; mismatching config settings: prefix_version_setting
INFO: ToolchainResolution:     Type @bazel_tools//tools/jdk:runtime_toolchain_type: target platform @local_config_platform//:host: Rejected toolchain @remotejdk11_macos_aarch64//:jdk; mismatching config settings: prefix_version_setting
INFO: ToolchainResolution:     Type @bazel_tools//tools/jdk:runtime_toolchain_type: target platform @local_config_platform//:host: Rejected toolchain @remotejdk11_win//:jdk; mismatching config settings: prefix_version_setting
INFO: ToolchainResolution:     Type @bazel_tools//tools/jdk:runtime_toolchain_type: target platform @local_config_platform//:host: Rejected toolchain @remotejdk11_win_arm64//:jdk; mismatching config settings: prefix_version_setting
INFO: ToolchainResolution:     Type @bazel_tools//tools/jdk:runtime_toolchain_type: target platform @local_config_platform//:host: Rejected toolchain @remotejdk17_linux//:jdk; mismatching values: linux, x86_64
INFO: ToolchainResolution:     Type @bazel_tools//tools/jdk:runtime_toolchain_type: target platform @local_config_platform//:host: Rejected toolchain @remotejdk17_linux_aarch64//:jdk; mismatching values: linux
INFO: ToolchainResolution:     Type @bazel_tools//tools/jdk:runtime_toolchain_type: target platform @local_config_platform//:host: Rejected toolchain @remotejdk17_linux_ppc64le//:jdk; mismatching values: linux, ppc
INFO: ToolchainResolution:     Type @bazel_tools//tools/jdk:runtime_toolchain_type: target platform @local_config_platform//:host: Rejected toolchain @remotejdk17_linux_s390x//:jdk; mismatching values: linux, s390x
INFO: ToolchainResolution:     Type @bazel_tools//tools/jdk:runtime_toolchain_type: target platform @local_config_platform//:host: Rejected toolchain @remotejdk17_macos//:jdk; mismatching values: x86_64
INFO: ToolchainResolution:   Type @bazel_tools//tools/jdk:runtime_toolchain_type: target platform @local_config_platform//:host: execution @local_config_platform//:host: Selected toolchain @remotejdk17_macos_aarch64//:jdk
INFO: ToolchainResolution:     Type @bazel_tools//tools/jdk:runtime_toolchain_type: target platform @local_config_platform//:host: Rejected toolchain @remotejdk17_win//:jdk; mismatching values: windows, x86_64
INFO: ToolchainResolution:     Type @bazel_tools//tools/jdk:runtime_toolchain_type: target platform @local_config_platform//:host: Rejected toolchain @remotejdk17_win_arm64//:jdk; mismatching values: windows
INFO: ToolchainResolution:     Type @bazel_tools//tools/jdk:runtime_toolchain_type: target platform @local_config_platform//:host: Rejected toolchain @remotejdk20_linux//:jdk; mismatching config settings: prefix_version_setting
INFO: ToolchainResolution:     Type @bazel_tools//tools/jdk:runtime_toolchain_type: target platform @local_config_platform//:host: Rejected toolchain @remotejdk20_linux_aarch64//:jdk; mismatching config settings: prefix_version_setting
INFO: ToolchainResolution:     Type @bazel_tools//tools/jdk:runtime_toolchain_type: target platform @local_config_platform//:host: Rejected toolchain @remotejdk20_macos//:jdk; mismatching config settings: prefix_version_setting
INFO: ToolchainResolution:     Type @bazel_tools//tools/jdk:runtime_toolchain_type: target platform @local_config_platform//:host: Rejected toolchain @remotejdk20_macos_aarch64//:jdk; mismatching config settings: prefix_version_setting
INFO: ToolchainResolution:     Type @bazel_tools//tools/jdk:runtime_toolchain_type: target platform @local_config_platform//:host: Rejected toolchain @remotejdk20_win//:jdk; mismatching config settings: prefix_version_setting
INFO: ToolchainResolution: Target platform @local_config_platform//:host: Selected execution platform @local_config_platform//:host, type @bazel_tools//tools/jdk:runtime_toolchain_type -> toolchain @remotejdk17_macos_aarch64//:jdk
@EdbertChan
Copy link
Author

After further research, I found this interesting piece of information.

Resolving @rules_java_builtin//toolchains:remotejdk_17 causes a reevaluation of @local_config_platform//:host and enqueues the evaluation with the new configurationKey that contains jdk_17.

Here are the screenshots from the debugger. Please let me know if I am on the right or wrong path.

https://imgur.com/a/cVCEtX6

@swarren12
Copy link

As an aside, I wonder if this is the same problem that I raised here: bazelbuild/rules_java#148.

If so, maybe mine is in the wrong place? 🤔

@fmeum
Copy link
Collaborator

fmeum commented Oct 25, 2023

Yes, I think this is identical to bazelbuild/rules_java#148. I don't fault you for that as the relevant code is still spread over both Bazel and rules_java :-)

For any given Java build, there are generally two Java runtimes in use:

  • The Java runtime that your targets are built for, that is, the one used to execute your java_test and executable java_binary targets. This runtime is controlled via --java_runtime_version and --tool_java_runtime_version.
  • The Java runtime that is used to run the Java compiler. This runtime can't be controlled directly via flags, instead it is set via the java_runtime attribute of java_toolchain. It just so happens that Bazel 6 uses a JDK 17 and Bazel 7 uses a JDK 21 for the Java toolchains registered by default. These toolchains use a transition on --java_runtime_version to resolve a matching JDK, which is why you see this additional resolution in the debugger.

Running the Java compiler on a recent JDK is generally a good idea as it should give the best build performance. Since Bazel relies on the --system javac flag to make the Java compiler use the classpath of the Java runtime you are building for, this should result in behavior that is identical to using that runtime to run the compiler, unless, e.g., you are using annotation processors that rely on compiler internals.

I haven't tried this myself yet, but just based on javac documentation it sounds as if --enable-previews' behavior only depends on the -source/-target or --release arguments, which are controlled via --{tool,}java_language_version. Thus, I would expect compilation for Java 17 with --enable-preview to behave in the same way regardless of whether it is a JDK 17 or JDK 21 hosting the compiler. @swarren12 Did you notice the contrary in your build or is bazelbuild/rules_java#148 more of a theoretical concern about the Bazel build not being provably identical to the Gradle build?

@swarren12
Copy link

swarren12 commented Oct 25, 2023

I haven't tried this myself yet, but just based on javac documentation it sounds as if --enable-previews' behavior only depends on the -source/-target or --release arguments, which are controlled via --{tool,}java_language_version. Thus, I would expect compilation for Java 17 with --enable-preview to behave in the same way regardless of whether it is a JDK 17 or JDK 21 hosting the compiler. @swarren12 Did you notice the contrary in your build or is bazelbuild/rules_java#148 more of a theoretical concern about the Bazel build not being provably identical to the Gradle build?

At the risk of making this thread somewhat convoluted to follow, I'll try to summarise what I believe my current problem is:

  • we would like to target a Java 17 release due to that being the ultimate runtime;
  • specifying --java_language_version=17 and friends causes Bazel to pass the --release 17 argument to javac;
  • we also need to manually --javacopt="--enable-preview" due to using features that were in preview as of Java 17;
  • javac (at least the implementation pulled in by rules_java) will only allow the --enable-preview argument if the --release X argument matches the latest JDK version (i.e. 21).

I don't know why you can't specify both --release 17 and --enable-preview, but I guess it's probably the semantics of it trying to enable Java 21 preview features, rather than those of Java 17?

Also, as a side note:

It just so happens that Bazel 6 uses a JDK 17 and Bazel 7 uses a JDK 21 for the Java toolchains registered by default. These toolchains use a transition on --java_runtime_version to resolve a matching JDK, which is why you see this additional resolution in the debugger.

This confused me at first because we are using Bazel 6.4.0 and getting Java 21 by default. But then I changed the version of Bazel to 6.3.2 and found that rules_java@7.0.6 didn't work with it, so I assume that there's some kind of Bazel <--> rules_java linking there? Downgrading to rules_java@6.3.2 made it go to Java 17 by default and the build worked as expected!

@EdbertChan
Copy link
Author

The issue appears to be because whatever is in _BASE_TOOLCHAIN_CONFIGURATION's java_runtime is taking precedent. If you change

java_runtime = Label("//toolchains:remotejdk_17")

to

java_runtime = Label("@bazel_tools//tools/jdk:remote_jdk11")

You will get everything to compile against 11. In rules_java@7.0.4, this got changed to jdk21.

Hopefully this helps.

@EdbertChan
Copy link
Author

EdbertChan commented Oct 25, 2023

Dumping the default_java_toolchain.bzl, here is the problem:

I'll spare you the full read but for toolchain_java{8,9,10,11}, the toolchain defaults to jdk 17 on runtime. The fix is probably very simple as injecting the right runtime parameter into each of the default_java_toolchain declarations for each individual version of Java.

We are testing a fix for this and can upstream it in a few hours, if not tomorrow. It would be good to get this in for Bazel 7.0.

DEBUG: /private/var/tmp/_bazel_edbert/c9b75f47fc5ed851a12a430053ea0046/external/rules_java/toolchains/default_java_toolchain.bzl:154:10: toolchain
DEBUG: /private/var/tmp/_bazel_edbert/c9b75f47fc5ed851a12a430053ea0046/external/rules_java/toolchains/default_java_toolchain.bzl:155:10: {"forcibly_disable_header_compilation": False, "genclass": [Label("@remote_java_tools//:GenClass")], "header_compiler": [Label("@remote_java_tools//:TurbineDirect")], "header_compiler_direct": [Label("@remote_java_tools//:TurbineDirect")], "ijar": [Label("@rules_java//toolchains:ijar")], "javabuilder": [Label("@remote_java_tools//:JavaBuilder")], "javac_supports_workers": True, "jacocorunner": Label("@remote_java_tools//:jacoco_coverage_runner_filegroup"), "jvm_opts": ["--add-exports=jdk.compiler/com.sun.tools.javac.api=ALL-UNNAMED", "--add-exports=jdk.compiler/com.sun.tools.javac.main=ALL-UNNAMED", "--add-exports=jdk.compiler/com.sun.tools.javac.model=ALL-UNNAMED", "--add-exports=jdk.compiler/com.sun.tools.javac.processing=ALL-UNNAMED", "--add-exports=jdk.compiler/com.sun.tools.javac.resources=ALL-UNNAMED", "--add-exports=jdk.compiler/com.sun.tools.javac.tree=ALL-UNNAMED", "--add-exports=jdk.compiler/com.sun.tools.javac.util=ALL-UNNAMED", "--add-opens=jdk.compiler/com.sun.tools.javac.code=ALL-UNNAMED", "--add-opens=jdk.compiler/com.sun.tools.javac.comp=ALL-UNNAMED", "--add-opens=jdk.compiler/com.sun.tools.javac.file=ALL-UNNAMED", "--add-opens=jdk.compiler/com.sun.tools.javac.parser=ALL-UNNAMED", "--add-opens=java.base/java.nio=ALL-UNNAMED", "--add-opens=java.base/java.lang=ALL-UNNAMED", "-Dsun.io.useCanonCaches=false", "-XX:-CompactStrings", "-Xlog:disable", "-Xlog:all=warning:stderr:uptime,level,tags"], "turbine_jvm_opts": ["-XX:+UseParallelGC"], "misc": ["-XDskipDuplicateBridges=true", "-XDcompilePolicy=simple", "-g", "-parameters", "-Xep:ReturnValueIgnored:OFF", "-Xep:IgnoredPureGetter:OFF", "-Xep:EmptyTopLevelDeclaration:OFF", "-Xep:LenientFormatStringValidation:OFF", "-Xep:ReturnMissingNullable:OFF", "-Xep:UseCorrectAssertInTests:OFF"], "singlejar": [Label("@rules_java//toolchains:singlejar")], "bootclasspath": [Label("@rules_java//toolchains:platformclasspath")], "source_version": "8", "target_version": "8", "reduced_classpath_incompatible_processors": ["dagger.hilt.processor.internal.root.RootProcessor"], **"java_runtime": Label("@rules_java//toolchains:remotejdk_17")}**

@layus
Copy link
Contributor

layus commented Oct 26, 2023

@EdbertChan, I am actually interested in the full read, having banged my head on the issue for quite a while.

My fix was to use a local toolchain (and also nonprebuilt, but that addressing a different issue)

    default_java_toolchain(
        name = "nonprebuilt_toolchain_java" + str(version),
        configuration = NONPREBUILT_TOOLCHAIN_CONFIGURATION,
        java_runtime = "@local_jdk//:jdk",
        source_version = str(version),
        target_version = str(version),
    )
    for version in range(8, 31)

Tuning the version of java in JAVA_HOME finally got rid of the issue.

Is it fair to say that the issue resides in this single java_runtime value set by default for all java toolchains (see below) ? A value that I happen to override in my above fix.

https://github.com/bazelbuild/rules_java/blob/b14972c2fd5211d0bed50ac291e4e0785f247f47/toolchains/default_java_toolchain.bzl#L95C1-L95C55

# Default java_toolchain parameters
DEFAULT_TOOLCHAIN_CONFIGURATION = _BASE_TOOLCHAIN_CONFIGURATION
_BASE_TOOLCHAIN_CONFIGURATION = dict(
    ...,
    java_runtime = Label("//toolchains:remotejdk_21"),
)

@EdbertChan
Copy link
Author

EdbertChan commented Oct 26, 2023

I would not recommend you use your fix nor upstream it for multiple reasons.

  1. By deferring to local JDK, you will be breaking hermesticity.

  2. Interfering with the default_toolchain_configuration will also break some Android tooling inside of Bazel since they appear to go through the default_toolchain instead. We have tried to change

# Default java_toolchain parameters
DEFAULT_TOOLCHAIN_CONFIGURATION = _BASE_TOOLCHAIN_CONFIGURATION
_BASE_TOOLCHAIN_CONFIGURATION = dict(
    ...,
-    java_runtime = Label("//toolchains:remotejdk_21"),
+   java_runtime = Label("//toolchains:remotejdk_11"),
)

But we will get some compile time issues where certain java_tools fail to unzip external artifacts (I have not looked too deeply at this and sidestepped the issue).

  1. You also cannot do
    `
[
    default_java_toolchain(
        name = "toolchain_java%d" % release,
        configuration = DEFAULT_TOOLCHAIN_CONFIGURATION,
        source_version = "%s" % release,
        target_version = "%s" % release,
+     java_runtime = "@bazel_tools//tools/jdk:toolchain_jdk%d" %release,
    )
    for release in RELEASES
]

because you will get

java_runtime attribute of java_toolchain rule @rules_java//toolchains:toolchain_java11: '@bazel_tools//tools/jdk:toolchain_java11' does not have mandatory providers: 'JavaRuntimeInfo'

Unfortunately, that means the rules_java maintainers will have to get involved and supply a backward fix for every version of Java toolchain that has this JavaRuntimeInfo. But really, this is the proper way to fix the issue. I will leave a note in the rules_java on how to fix this.

bazelbuild/rules_java#148

  1. Interestingly enough, this also brings another issue we are facing. Migrating from 11 -> 17 via Bazel 6 -> 7 is causing workers to fail but not report back to the Bazel worker orchestrator. This is because Java 17 will start to emit error messages when the compiler starts to run out of memory. These somehow get written into the output of the worker and mangled with the response. As a result, the worker runs forever and never terminates which stalls the build. (Yes this should be a issue filed as well but we have only recently discovered this as the root cause).

  2. This brings me back to the need to pin to Java 11 but only for certain toolchains. We have decided to use this fix temporarily.

uber-common@cf503d0

The biggest drawback is that we cannot use the aformentioned

common --java_language_version=11
common --java_runtime_version=remotejdk_11
common --tool_java_language_version=11
common --tool_java_runtime_version=remotejdk_11

because those will fail at Android compile time. Probably because the Android tools themselves were compiled against 17 (again, have not looked at this one too deeply).

This is the one with the least blast radius and the most stable without getting too involved. Because of Java backwards compatibility, there should be no problem running Java 11- code on Java 11. Furthermore, in default_toolchain, the java_runtime parameter will override what is in the DEFAULT_TOOLCHAIN_CONFIGURATION.

We have verified this by using

bazel aquery "deps(<module>)" --toolchain_resolution_debug="."

and checked the toolchain message at the top. We also looked at the compile messages themselves and verified they were using exec /external/remotejdk_11.

Hope this helps!

@fmeum
Copy link
Collaborator

fmeum commented Oct 29, 2023

@EdbertChan Could you file a separate issue for 4.? Compiler error messages breaking the worker protocol sounds bad.

I am not entirely sure what is supposed to be the bug here: Major versions of Bazel are generally free to update the compilation JDK and if you as a user would like to use a specific one (e.g. to enable faithful --enable-preview), you can easily register your own Java compilation toolchain with a different java_runtime. For most users, compiling for a given Java version on a recent JDK should give the best compilation performance while not resulting in perceptible differences to also compiling on that version.

If Android breaks with this setup, then that is definitely a (separate) bug to solve.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P2 We'll consider working on this in future. (Assignee optional) team-Rules-Java Issues for Java rules type: support / not a bug (process)
Projects
None yet
Development

No branches or pull requests

8 participants