Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bazel coverage fails over remote ex for java_test #4685

Closed
werkt opened this issue Feb 22, 2018 · 71 comments
Closed

bazel coverage fails over remote ex for java_test #4685

werkt opened this issue Feb 22, 2018 · 71 comments
Assignees
Labels
P2 We'll consider working on this in future. (Assignee optional) team-Remote-Exec Issues and PRs for the Execution (Remote) team type: bug

Comments

@werkt
Copy link
Contributor

werkt commented Feb 22, 2018

Description of the problem / feature request:

Tooling for coverage fails when running under remote execution (LcovMerger specifically)

Bugs: what's the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Run a test target with bazel coverage --spawn_strategy=remote --remote_executor=... //path/to:test

What operating system are you running Bazel on?

Ubuntu 14.04

What's the output of bazel info release?

release 0.10.1

Any other information, logs, or outputs that you want to share?

Relevant portion of coverage execution

+ exec bazel-out/k8-fastbuild/bin/external/bazel_tools/tools/test/LcovMerger --coverage_dir=%execroot%/_coverage/path/to/test/test --output_file=%execroot%/bazel-out/k8-fastbuild/testlogs/path/to/test/coverage.dat
%execroot%/bazel-out/k8-fastbuild/bin/external/bazel_tools/tools/test/LcovMerger: Cannot locate runfiles directory. (Set $JAVA_RUNFILES to inhibit searching.)
@aehlig aehlig added type: bug P2 We'll consider working on this in future. (Assignee optional) category: remote execution / caching labels Feb 26, 2018
@buchgr buchgr added the team-Remote-Exec Issues and PRs for the Execution (Remote) team label Feb 5, 2019
@buchgr
Copy link
Contributor

buchgr commented Feb 5, 2019

Is this bug still an issue?cc @lberki @iirina

@vmrob
Copy link

vmrob commented Sep 11, 2019

I see the same issue when running some of our C++/Python tests with remote execution. Bazel version is a slightly patched version of 0.25.3.

@vmrob
Copy link

vmrob commented Sep 12, 2019

I've come up with a workaround that basically involves the below patch. I have a feeling there is a more correct way to do this--possibly by including some implicit dependencies in the relevant coverage-aware Bazel rules, but I am not familiar enough with the codebase to efficiently tackle that.

commit ba16394fa207709b2b2a2fdc22de30cc72cf4398 (HEAD -> rbe-coverage)
Author: Victor Robertson <victor.robertson@getcruise.com>
Date:   Wed Sep 11 23:01:49 2019 -0700

    Fix coverage for RBE [EP-12079]

    This patch fixes code coverage in RBE by removing the distinction
    between JAVA_RUNFILES and TEST_SRCDIR in the collect_coverage.sh script.
    This is important as Bazel will not send JAVA_RUNFILES to the remote
    environment. Unfortunately, this means that a Java sdk will be included
    with every coverage test--perhaps this isn't so bad.

    This also means that tests with coverage support must include the
    dependencies normally provided by Bazel. In cruise/cruise, we use the
    following additional dependencies:

    - @embedded_jdk//:jdk
    - @bazel_tools//tools/jdk:JacocoCoverageRunner
    - @bazel_tools//tools/test/CoverageOutputGenerator/java/com/google/devtools/coverageoutputgenerator:Main

diff --git a/tools/test/collect_coverage.sh b/tools/test/collect_coverage.sh
index a9e212ce5d..df3b22cc78 100755
--- a/tools/test/collect_coverage.sh
+++ b/tools/test/collect_coverage.sh
@@ -188,7 +188,8 @@ if [[ $DISPLAY_LCOV_CMD ]] ; then
   echo "-----------------"
 fi

-# JAVA_RUNFILES is set to the runfiles of the test, which does not necessarily
-# contain a JVM (it does only if the test has a Java binary somewhere). So let
-# the LCOV merger discover where its own runfiles tree is.
-JAVA_RUNFILES= exec $LCOV_MERGER_CMD
+# cruise: Since JAVA_RUNFILES would never be pushed to RBE, we allow it to
+# remain as it is and ensure that TEST_SRCDIR has the right dependencies to run
+# coverage reports. Normally this would be preceeded with JAVA_RUNFILES= to
+# clear it out.
+exec $LCOV_MERGER_CMD

As mentioned in the patch, this means our test configurations include this little kludge. You can imagine how this is used (data = add_coverage_test_runtime_data()) along with the associated config.

def add_coverage_test_runtime_data():
    return select({
        "@//tools:is_coverage_build": [
            "@embedded_jdk//:jdk",
            "@bazel_tools//tools/jdk:JacocoCoverageRunner",
            "@bazel_tools//tools/test/CoverageOutputGenerator/java/com/google/devtools/coverageoutputgenerator:Main",
        ],
        "//conditions:default": [],
    })

Hopefully this information helps.

ulfjack added a commit to ulfjack/bazel that referenced this issue Dec 7, 2019
Requires:
 - bazelbuild#10379 is merged
 - a new remote coverage tools zip pushed
 - coverage.WORKSPACE updated to the new tools

This changes the @bazel_tools//tools/test/BUILD file to fully delegate to the @remote_coverage_tools repository, which must contain rules for :lcov_merger and :coverage_report_generator.

This makes the @remote_coverage_tools reference self-contained, which allows overriding the tools using --override_repository, and allows independently replacing or fixing them.

Progress on bazelbuild#4685.

Change-Id: I321c62332f00d910f4ccfb3244d63e60627d59ad
ulfjack added a commit to ulfjack/bazel that referenced this issue Dec 7, 2019
Requires:
 - bazelbuild#10379 is merged
 - a new remote coverage tools zip pushed
 - coverage.WORKSPACE updated to the new tools

This changes the @bazel_tools//tools/test/BUILD file to fully delegate to the @remote_coverage_tools repository, which must contain rules for :lcov_merger and :coverage_report_generator.

This makes the @remote_coverage_tools reference self-contained, which allows overriding the tools using --override_repository, and allows independently replacing or fixing them.

Progress on bazelbuild#4685.

Change-Id: I321c62332f00d910f4ccfb3244d63e60627d59ad
@ulfjack ulfjack self-assigned this Dec 7, 2019
@ulfjack
Copy link
Contributor

ulfjack commented Dec 7, 2019

The test action only stages the files-to-build, but not the runfiles tree. The coverage tool used by Google is a single, self-contained executable file. This is at odds with the new coverage tool, which is built as a Java binary, and the wrapper script for java_binary expects a runfiles tree. This happens to work locally because the wrapper script manages to escape the symlink sandbox by following a symlink back to the exec root (it wouldn't work with stricter local sandboxing).

I'm still not sure how best to fix it, but I can improve the situation. I'm reluctant to change how the test action stages the coverage tool. However, I have two changes to make the coverage tools repository self-contained (this is a nice cleanup in any case and makes it more flexible in the future), and I have a prototype for making the coverage tools repository compatible with the test action setup by making it work without a runfiles tree. That works on Linux and MacOS because it currently assumes the presence of /bin/bash.

bazel-io pushed a commit that referenced this issue Dec 9, 2019
Currently, the coverage tool setup is smeared across @bazel_tools//tools/test/BUILD, @bazel_tools//tools/test/CoverageOutputGenerator/.../BUILD (both shipped inside Bazel), and @remote_coverage_tools//, which is separately uploaded as a zipped repository. The second BUILD file contains java_* rules which are actually used. The remote_coverage_tools repository contains a copy of the java_* rules, but they are never used, and the java_import in the former directly references the deploy jar in the latter.

Instead, the plan is to only ship @bazel_tools//tools/test/BUILD in Bazel, and use alias rules to point to the @remote_coverage_tools repository. There, we define two rules (:lcov_merger and :coverage_report_generator), which fully define the implementation. This allows the repository to be swapped out for another repository with a different implementation using --override_repository, which facilitates independent work on the merger and generator.

The underlying problem is that the merger currently does not work with remote execution, and there is no workaround because all the relevant parts are hard-coded in Bazel, which requires a Bazel release to fix. By making the coverage tools self-contained, we make it easier to solve such problems in the future.

This should not cause any problems for the existing workaround described in #4685.

Progress on #4685.

Change-Id: I26758db573a1966b40169314d4aec61eff83f83b

Closes #10379.

Change-Id: I26758db573a1966b40169314d4aec61eff83f83b
PiperOrigin-RevId: 284581461
@benjaminp
Copy link
Collaborator

I'm still not sure how best to fix it, but I can improve the situation. I'm reluctant to change how the test action stages the coverage tool.

Why the reluctance? Wouldn't adding the coverage binary properly as a tool to the test action (i.e., fix #4033) be the cleanest resolution?

@ulfjack
Copy link
Contributor

ulfjack commented Dec 9, 2019

Definitely maybe. Merging the runfiles tree into the tests runfiles tree can potentially cause conflicts, and we also don't want to tests to interact with it directly - both the tool and the contract are subject to change. We could stage the runfiles tree separately from the test runfiles tree, although I suspect that that isn't possible in Bazel right now. At the same time, the tool is a single self-contained deploy jar - it's only the java_binary wrapper script that needs a runfiles tree, not the tool itself.

@benjaminp
Copy link
Collaborator

The JVM could be in runfiles, so I don't think they're trivial to dispense with for the coverge merger.

I suppose you could split the coverage merging into a separate spawn like test.xml generation, but the test itself would need to glob all the intermediate coverage artifacts into a zip or something, which is pointless overhead.

@ulfjack
Copy link
Contributor

ulfjack commented Dec 9, 2019

Actually, I discussed splitting it into a separate spawn earlier today, and I think that would have some advantages. We are already close to the per-action execution time limit in some cases, and doing coverage work in the same action is problematic. We can use tree artifacts to track the intermediate artifacts to avoid zipping / unzipping.

You're right about the JVM.

bazel-io pushed a commit that referenced this issue Dec 10, 2019
Progress on #4685.

Closes #10393.

Change-Id: I2cb5f90c75e341419bb05d035b0ae1cc26a5eba5
PiperOrigin-RevId: 284764946
bazel-io pushed a commit that referenced this issue Dec 12, 2019
Requires:
 - #10379 is merged
 - a new remote coverage tools zip pushed
 - coverage.WORKSPACE updated to the new tools

This changes the @bazel_tools//tools/test/BUILD file to fully delegate to the @remote_coverage_tools repository, which must contain rules for :lcov_merger and :coverage_report_generator.

This makes the @remote_coverage_tools reference self-contained, which allows overriding the tools using --override_repository, and allows independently replacing or fixing them.

Progress on #4685.

Change-Id: I321c62332f00d910f4ccfb3244d63e60627d59ad

Closes #10383.

Change-Id: I321c62332f00d910f4ccfb3244d63e60627d59ad
PiperOrigin-RevId: 285246323
@ulfjack
Copy link
Contributor

ulfjack commented Dec 17, 2019

I'm wondering if moving the lcov merger to a post-process would allow us to use an aspect to attach it to the test. We currently require that all test rules depend on the lcov_merger in order to get the postprocessing into the test action, and that requires extra work from rule authors and is also not documented anywhere.

@benjaminp
Copy link
Collaborator

That will probably fix #6293, too.

@ulfjack
Copy link
Contributor

ulfjack commented Dec 17, 2019

Fully integrating Go coverage requires some path to generate lcov data. However, I agree that moving to a post-process (aspect or no) would make it more flexible; it allows pulling back the coverage data even if the coverage tools for language X don't support lcov conversion yet (or ever). There's a similar problem with Python coverage which I looked into today - coverage.py doesn't support generating lcov, so right now it is impossible to integrate into Bazel's coverage system, and you can't even manually post-process because the files are silently dropped.

@ulfjack
Copy link
Contributor

ulfjack commented Dec 17, 2019

I'll look into declaring a tree artifact for the coverage output dir, if I find the time.

@adam-azarchs
Copy link
Contributor

In our repo --build_runfile_links takes a long, long time. It also doesn't actually help with this issue for on the remote builder (it does allow things to work with --experimental_split_coverage_postprocessing but then the results are empty). The issue here I think is that while it's building a runfiles tree for our test (which is not in java) it is not building one for the coverage post-processing tool, which is what's failing here.

copybaranaut pushed a commit to pixie-io/pixie that referenced this issue Jun 22, 2022
Summary:
Coverage runs were getting `nobuild_runfile_links` from a expansion
of `remote_download_minimal`. This happens to cause issues. So explicitly
set `build_runfile_links` for coverage runs.

See bazelbuild/bazel#4685 (comment)

Test Plan: Ran the coverage build, most of the failures go away.

Reviewers: zasgar, michelle, jamesbartlett

Reviewed By: jamesbartlett

Signed-off-by: Vihang Mehta <vihang@pixielabs.ai>

Differential Revision: https://phab.corp.pixielabs.ai/D11654

GitOrigin-RevId: d469d5d
@davido
Copy link
Contributor

davido commented Jun 27, 2022

When using --experimental_split_coverage_postprocessing option, I'm seeing this crash (Bazel 5.0 and Bazel@HEAD):

FATAL: bazel crashed due to an internal error. Printing stack trace:
java.lang.RuntimeException: Unrecoverable error while evaluating node 'UnshareableActionLookupData{actionLookupKey=ConfiguredTargetKey{label=//javatests/com/google/gerrit/common:server_tests, config=BuildConfigurationValue.Key[9540ab7a9954532d9292c74b14a5bd39d2759f38c86d1e6c545f5192a7edbba2]}, actionIndex=13}' (requested by nodes 'UnshareableActionLookupData{actionLookupKey=CoverageReportKeySingleton, actionIndex=1}', 'TestCompletionKey{configuredTargetKey=ConfiguredTargetKey{label=//javatests/com/google/gerrit/common:server_tests, config=BuildConfigurationValue.Key[9540ab7a9954532d9292c74b14a5bd39d2759f38c86d1e6c545f5192a7edbba2]}, topLevelArtifactContext=com.google.devtools.build.lib.analysis.TopLevelArtifactContext@90904c3b, exclusiveTesting=false}')
	at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:674)
	at com.google.devtools.build.lib.concurrent.AbstractQueueVisitor$WrappedRunnable.run(AbstractQueueVisitor.java:382)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)
Caused by: java.lang.NullPointerException
	at com.google.devtools.build.lib.skyframe.ActionMetadataHandler.getMetadata(ActionMetadataHandler.java:229)
	at com.google.devtools.build.lib.exec.StandaloneTestStrategy$BazelTestAttemptContinuation.execute(StandaloneTestStrategy.java:839)
	at com.google.devtools.build.lib.analysis.test.TestRunnerAction$RunAttemptsContinuation.execute(TestRunnerAction.java:1161)
	at com.google.devtools.build.lib.analysis.test.TestRunnerAction.execute(TestRunnerAction.java:932)
	at com.google.devtools.build.lib.analysis.test.TestRunnerAction.execute(TestRunnerAction.java:921)
	at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$5.execute(SkyframeActionExecutor.java:907)
	at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$ActionRunner.continueAction(SkyframeActionExecutor.java:1076)
	at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$ActionRunner.run(SkyframeActionExecutor.java:1031)
	at com.google.devtools.build.lib.skyframe.ActionExecutionState.runStateMachine(ActionExecutionState.java:152)
	at com.google.devtools.build.lib.skyframe.ActionExecutionState.getResultOrDependOnFuture(ActionExecutionState.java:91)
	at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor.executeAction(SkyframeActionExecutor.java:492)
	at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.checkCacheAndExecuteIfNeeded(ActionExecutionFunction.java:856)
	at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.computeInternal(ActionExecutionFunction.java:349)
	at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.compute(ActionExecutionFunction.java:169)
	at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:590)
	... 4 more

@adam-azarchs
Copy link
Contributor

That should go away if you set --remote_download_outputs=all.

@davido
Copy link
Contributor

davido commented Jun 27, 2022

That should go away if you set --remote_download_outputs=all.

I tried it, and I'm getting the same error on Bazel@HEAD (b598c51). I'm running it against Gerrit project:

  $ bazeldev coverage --experimental_split_coverage_postprocessing \
    --remote_download_outputs=all \
    --config=remote \
    --remote_instance_name=projects/$PROJECT/instances/default_instance \
    --coverage_report_generator=@bazel_tools//tools/test:coverage_report_generator \
    --combined_report=lcov
     javatests/com/google/gerrit/common/...

orishuss pushed a commit to orishuss/pixie that referenced this issue Jul 14, 2022
Summary:
Coverage runs were getting `nobuild_runfile_links` from a expansion
of `remote_download_minimal`. This happens to cause issues. So explicitly
set `build_runfile_links` for coverage runs.

See bazelbuild/bazel#4685 (comment)

Test Plan: Ran the coverage build, most of the failures go away.

Reviewers: zasgar, michelle, jamesbartlett

Reviewed By: jamesbartlett

Signed-off-by: Vihang Mehta <vihang@pixielabs.ai>

Differential Revision: https://phab.corp.pixielabs.ai/D11654

GitOrigin-RevId: d469d5d
orishuss pushed a commit to orishuss/pixie that referenced this issue Jul 14, 2022
Summary:
Coverage runs were getting `nobuild_runfile_links` from a expansion
of `remote_download_minimal`. This happens to cause issues. So explicitly
set `build_runfile_links` for coverage runs.

See bazelbuild/bazel#4685 (comment)

Test Plan: Ran the coverage build, most of the failures go away.

Reviewers: zasgar, michelle, jamesbartlett

Reviewed By: jamesbartlett

Signed-off-by: Vihang Mehta <vihang@pixielabs.ai>

Differential Revision: https://phab.corp.pixielabs.ai/D11654

GitOrigin-RevId: d469d5d
orishuss pushed a commit to orishuss/pixie that referenced this issue Jul 14, 2022
Summary:
Coverage runs were getting `nobuild_runfile_links` from a expansion
of `remote_download_minimal`. This happens to cause issues. So explicitly
set `build_runfile_links` for coverage runs.

See bazelbuild/bazel#4685 (comment)

Test Plan: Ran the coverage build, most of the failures go away.

Reviewers: zasgar, michelle, jamesbartlett

Reviewed By: jamesbartlett

Signed-off-by: Vihang Mehta <vihang@pixielabs.ai>

Differential Revision: https://phab.corp.pixielabs.ai/D11654

GitOrigin-RevId: d469d5d
@fmeum
Copy link
Collaborator

fmeum commented Oct 14, 2022

That should go away if you set --remote_download_outputs=all.

I tried it, and I'm getting the same error on Bazel@HEAD (b598c51). I'm running it against Gerrit project:

  $ bazeldev coverage --experimental_split_coverage_postprocessing \
    --remote_download_outputs=all \
    --config=remote \
    --remote_instance_name=projects/$PROJECT/instances/default_instance \
    --coverage_report_generator=@bazel_tools//tools/test:coverage_report_generator \
    --combined_report=lcov
     javatests/com/google/gerrit/common/...

@davido Could you try again with --experimental_fetch_all_coverage_outputs?

@fmeum
Copy link
Collaborator

fmeum commented Oct 14, 2022

@adam-azarchs I was able to reproduce the VerifyError with Bazel's integration tests. Adding a suitable --experimental_remote_download_regex fixed it, which I turned into a PR. Could you try #16475?

With this PR and the two flags --experimental_fetch_all_coverage_outputs and --experimental_split_coverage_postprocessing, remote coverage with --remote_download_minimal appears to work.

@c-mita Do you see any potential issues if the two experimental flags were enabled by default?

@c-mita
Copy link
Member

c-mita commented Oct 17, 2022

--experimental_fetch_all_coverage_outputs and --experimental_split_coverage_postprocessing?

I don't really know of a fundamental reason why they couldn't be enabled by default, although there may be one or two lingering issues that should be resolved first (#15363 comes to mind).

@adam-azarchs
Copy link
Contributor

@adam-azarchs I was able to reproduce the VerifyError with Bazel's integration tests. Adding a suitable --experimental_remote_download_regex fixed it, which I turned into a PR. Could you try #16475?

With this PR and the two flags --experimental_fetch_all_coverage_outputs and --experimental_split_coverage_postprocessing, remote coverage with --remote_download_minimal appears to work.

I'm on PTO this week but can check it out next week. However the last time I tried those flags in combination I got something that passed but didn't record any coverage information.

@LKreutzer
Copy link

The same issue can come up with a py_test target. In our case it was solved by not using any --remote_download_minimal or --remote_download_toplevel options.

@adam-azarchs
Copy link
Contributor

@fmeum I tried a coverage build with refs/pull/16475/merge, and now coverage for our go and python rules actually works! So that's nice. However, it seems to break coverage for some of our internal rules; I guess I'll need to figure that one out myself.

@fmeum
Copy link
Collaborator

fmeum commented Oct 26, 2022

@fmeum I tried a coverage build with refs/pull/16475/merge, and now coverage for our go and python rules actually works! So that's nice.

Could you also try #16556 when it's ready? That approach is much more reliable than #16475.

However, it seems to break coverage for some of our internal rules; I guess I'll need to figure that one out myself.

The most common reason is that InstrumentedFilesInfo isn't propagated correctly, see bazelbuild/rules_go@ba48db5 for an example of how we fixed that in rules_go.

@davido
Copy link
Contributor

davido commented Nov 19, 2022

@davido Could you try again with --experimental_fetch_all_coverage_outputs?

@fmeum With this option I'm getting a different error:

[...]
INFO: Found 2 test targets...
INFO: From Compiling java_tools/src/tools/singlejar/combiners.cc [for tool]:
external/remote_java_tools/java_tools/src/tools/singlejar/combiners.cc:190:21: warning: unused variable 'MULTI_RELEASE_LENGTH' [-Wunused-const-variable]
static const size_t MULTI_RELEASE_LENGTH = strlen(MULTI_RELEASE);
                    ^
1 warning generated.
ERROR: <builtin>: Coverage report generation failed: (Exit 34): INVALID_ARGUMENT: Invalid arguments: 
  "command.ValidateSpec": Invalid spec - docker container must be specified
INFO: Elapsed time: 171.203s, Critical Path: 166.13s
INFO: 457 processes: 301 remote cache hit, 36 internal, 120 remote.
FAILED: Build did NOT complete successfully
//javatests/com/google/gerrit/common:server_tests                        PASSED in 1.9s
  /home/davido/.cache/bazel/_bazel_davido/27a001f4182820ef315d8d2d4f1edafe/execroot/gerrit/bazel-out/k8-fastbuild/testlogs/javatests/com/google/gerrit/common/server_tests/coverage.dat
//javatests/com/google/gerrit/common/data:data_tests                     PASSED in 1.9s
  /home/davido/.cache/bazel/_bazel_davido/27a001f4182820ef315d8d2d4f1edafe/execroot/gerrit/bazel-out/k8-fastbuild/testlogs/javatests/com/google/gerrit/common/data/data_tests/coverage.dat

@fmeum
Copy link
Collaborator

fmeum commented Nov 19, 2022

@davido At least that error doesn't seem to be directly related to Bazel anymore. Only place I can find it is at bazelbuild/bazel-toolchains#870.

@davido
Copy link
Contributor

davido commented Nov 19, 2022

@fmeum Thanks for confirming, but the above issue seems to be entirely unrelated to Gerrit Code Review project, where we are trying (for years) to make bazel coverage command work as expected on RBE.

I've just tried to run bazel coverage command on Bazel@HEAD (0c4de91) against Bazel project itself on RBE for some java tests and got exactly the same failure:

  $ cd ~/projects/bazel
  $ git describe --always
  0c4de91858
  $ bazeldev coverage \
  --experimental_fetch_all_coverage_outputs \
  --config=remote \
  --remote_instance_name=projects/$PROJECT/instances/default_instance \
  --experimental_split_coverage_postprocessing \
  --remote_download_outputs=all \
  --coverage_report_generator=@bazel_tools//tools/test:coverage_report_generator \
  --combined_report=lcov \
  src/test/java/com/google/devtools/common/options/...
[...]
INFO: From Generating proto_library //src/main/protobuf:execution_statistics_proto:
[libprotobuf WARNING external/com_google_protobuf/src/google/protobuf/compiler/java/java_file.cc:244] The optimize_for = LITE_RUNTIME option is no longer supported by protobuf Java code generator and is ignored--protoc will always generate full runtime code for Java. To use Java Lite runtime, users should use the Java Lite plugin instead. See:
  https://github.com/protocolbuffers/protobuf/blob/master/java/lite.md
ERROR: <builtin>: Coverage report generation failed: (Exit 34): INVALID_ARGUMENT: Invalid arguments: 
  "command.ValidateSpec": Invalid spec - docker container must be specified
INFO: Elapsed time: 331.919s, Critical Path: 303.42s
INFO: 1810 processes: 67 internal, 1743 remote.
FAILED: Build did NOT complete successfully
//src/test/java/com/google/devtools/common/options:AllTests              PASSED in 10.7s
  /home/davido/.cache/bazel/_bazel_davido/0fa756dec521553dbe2dde6b6eac99b4/execroot/io_bazel/bazel-out/k8-fastbuild/testlogs/src/test/java/com/google/devtools/common/options/AllTests/coverage.dat
//src/test/java/com/google/devtools/common/options/processor:OptionProcessorTest PASSED in 6.0s
  /home/davido/.cache/bazel/_bazel_davido/0fa756dec521553dbe2dde6b6eac99b4/execroot/io_bazel/bazel-out/k8-fastbuild/testlogs/src/test/java/com/google/devtools/common/options/processor/OptionProcessorTest/coverage.dat
//src/test/java/com/google/devtools/common/options/testing:OptionsTesterTest PASSED in 2.2s
  /home/davido/.cache/bazel/_bazel_davido/0fa756dec521553dbe2dde6b6eac99b4/execroot/io_bazel/bazel-out/k8-fastbuild/testlogs/src/test/java/com/google/devtools/common/options/testing/OptionsTesterTest/coverage.dat

Executed 3 out of 3 tests: 3 tests pass.
There were tests whose specified size is too big. Use the --test_verbose_timeout_warnings command line option to see which ones these are.
All tests passed but there were other errors during the build.

So that I'm still wondering if someone can run bazel coverage command on RBE for java tests and what is the difference?

@fmeum
Copy link
Collaborator

fmeum commented Nov 19, 2022

@davido Which remote backend are you running against? I can't find that error message in the Bazel codebase.

@davido
Copy link
Contributor

davido commented Nov 19, 2022

@davido Which remote backend are you running against?

We use Google cloud RBE, see the documentation upstream how to set it up and running: [1], [2].

We also use custom rbe_autoconfig with custom docker image: [3], generated using this repo: [4], based on v5.1.2 release of bazel-toolchains: [5].

The relevant WORKSPACE part is here: [6].

http_archive(
    name = "rbe_jdk11",
    sha256 = "dbcfd6f26589ef506b91fe03a12dc559ca9c84699e4cf6381150522287f0e6f6",
    strip_prefix = "rbe_autoconfig-3.1.0",
    urls = [
        "https://gerrit-bazel.storage.googleapis.com/rbe_autoconfig/v3.1.0.tar.gz",
        "https://github.com/davido/rbe_autoconfig/archive/v3.1.0.tar.gz",
    ],
)

So, I tried to log Bazel's gRPC communication in binary protocol buffer format to /tmp/grpc.log
by passing --experimental_remote_grpc_log=/tmp/grpc.log option, and figured that the failing command is:

^Ebazel^R@40dcf1c86a23e086bf208efa56a38090d5809af81105d8deec079cda46376b58^Z$5811da23-3627-471d-814b-64ce608acaa1"$98bcf6d2-d830-4a3f-81fb-f4cfc1d4e733*^NCoverageReport:^Fsystem^Rc^H^C^R_Invalid arguments: 
  "command.ValidateSpec": Invalid spec - docker container must be specified^Z1build.bazel.remote.execution.v2.Execution/Execute"

If I pass this option: --strategy=CoverageReport=local then bazel coverage works as expected on RBE.

[1] https://gerrit.googlesource.com/gerrit/+/master/Documentation/dev-bazel.txt#585
[2] https://gerrit-documentation.storage.googleapis.com/Documentation/3.7.0/dev-bazel.html#RBE
[3] https://gerrit.googlesource.com/gerrit/+/master/tools/platforms/Dockerfile
[4] https://github.com/davido/rbe_autoconfig
[5] https://github.com/bazelbuild/bazel-toolchains
[6] https://gerrit.googlesource.com/gerrit/+/master/WORKSPACE#43

@adam-azarchs
Copy link
Contributor

My time for testing this stuff has been limited, but with bazel 6.0.0 and the following in my .bazelrc I am successfully able to collect coverage:

coverage --strategy=CoverageReport=local
coverage --experimental_split_coverage_postprocessing
coverage --experimental_fetch_all_coverage_outputs
coverage --remote_download_outputs=all
coverage --experimental_remote_download_regex=.*/((testlogs/.*/_coverage/.*)|coverage.dat$|_coverage/_coverage_report.dat$)

Without --remote_download_outputs=all, it still crashes out. Without the --experimental_remote_download_regex (cribbed from #16475) the coverage results come up empty.

We do have some internal rules which were collecting coverage properly (both locally and remotely) without --experimental_split_coverage_postprocessing but which with the flag are producing empty coverage results for reasons I haven't yet had a chance to track down; I suspect it has something to do with the fact that they were sending coverage information directly to $COVERAGE_OUTPUT_FILE, so they didn't need the post-processing, and that's somehow interfering with the split post-processing.

@sluongng
Copy link
Contributor

sluongng commented May 3, 2023

As of Bazel 6.1.0, @coeuvre has implemented #16556 which supports Bazel coverage while using build without the bytes (--remote_download_minimal).

The test in the PR should help document the ideal setup. I suggest we close this issue for now.

@coeuvre coeuvre closed this as completed May 4, 2023
@louwers
Copy link

louwers commented May 14, 2023

Do the docs need to be updated? https://bazel.build/configure/coverage#remote-execution

@charlesoconor
Copy link

Have a question about adding the downloader regex. Was using the one describe above.

coverage --experimental_remote_download_regex=.*/((testlogs/.*/_coverage/.*)|coverage.dat$|_coverage/_coverage_report.dat$)

When running java tests it pulls back large jar objects under the _coverage _coverage/coverage-runtime_merged_instr.jar which seem to be a couple of megs. Not sure what they are used for changing the regex to

coverage:remote_execution --experimental_remote_download_regex=.*/(coverage.dat$|_coverage/_coverage_report.dat$) 

Seems to have worked and is a lot faster is that jar file needed for anything? Using bazel 6.2.1.

Also a little worried that removing testlogs will hide my test output on fails so will probably add that back in.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P2 We'll consider working on this in future. (Assignee optional) team-Remote-Exec Issues and PRs for the Execution (Remote) team type: bug
Projects
No open projects
remote: bugs
  
To do
Development

No branches or pull requests