C++ coverage performance #8178

Open
htuch opened this issue Apr 28, 2019 · 6 comments
Labels
not stale · P3 (not considering working on this, but happy to review a PR) · team-Rules-CPP · type: feature request

Comments

htuch commented Apr 28, 2019

Envoy continues to experiment with switching away from its own custom coverage solution to native Bazel coverage, but we have two stumbling blocks:

  1. Bazel C++ coverage for Envoy crashes the JVM (#7279) <-- we need to be able to use remote caching (and ideally full RBE in the near future) to realize build-time gains.
  2. Test execution is too slow -- slow enough that the parallelism we gain from running all tests in parallel, versus our single monolithic binary today, is no win.

Investigating (2) in envoyproxy/envoy#6703 surfaced some interesting possible improvements. When we build all targets under //test/..., build+test takes about 2h 8m on CI, which is a bit slower than our existing single-test-binary solution, which clocks in at just under 2h. This should probably be much faster, since test execution can now be split across cores.

I think collect_cc_coverage.sh has major performance issues with gcov (to say nothing of the lcov trace merger, which we aren't hitting yet but which is pretty bad based on historical performance work, #1118 (comment)).

I instrumented and measured test vs. collection time in envoyproxy/envoy#6703 for CC=gcc CXX=g++ COVERAGE_TARGET=//test/common/upstream:cluster_manager_impl_test VALIDATE_COVERAGE=false ./test/run_envoy_bazel_coverage.sh. This yielded:

  • Test time: 2.87s
  • collect_cc_coverage.sh time: 43.56s

I think the performance of this loop is to blame:

cat "${COVERAGE_MANIFEST}" | grep ".gcno$" | while read gcno_path; do
.

Iterating over all the files in a large tree for each test in shell is pretty slow. Maybe we could have a high-performance C++ gcov trace merger?
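Short of a full C++ merger, one place the shell cost could plausibly be cut is the serial per-.gcno loop itself. The sketch below is purely illustrative and assumes the manifest format shown above; it is not what collect_cc_coverage.sh does today, and the gcov flags are just common ones:

# Illustrative sketch only, not collect_cc_coverage.sh's actual logic: fan the
# per-.gcno gcov invocations out across cores instead of looping serially.
grep '\.gcno$' "${COVERAGE_MANIFEST}" \
  | xargs -P "$(nproc)" -n 1 -I{} \
      gcov --preserve-paths --branch-probabilities {} > /dev/null

Whether this actually helps would depend on how much of the 43s is gcov itself versus shell overhead; a native merger would avoid both.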

Ideally we would switch to lcov/clang going forward, but that will require replacing the low-performance geninfo to make it practical for Envoy.

CC @mattklein123 @lizan @iirina

jmarantz commented Apr 29, 2019

What's the Bazel team's position on this? Is Envoy different from (say) TensorFlow, or other large C++ projects, in its coverage requirements?

It seems like we shouldn't have to do this work in an Envoy-specific way, but maybe I'm missing something.

I was unclear on one point above: you call out "test execution is too slow" and also "collect_cc_coverage.sh has major performance issues" -- are these two problems or one?


htuch commented Apr 29, 2019

@jmarantz these are the same problem. The Bazel test phase includes coverage information collection and merging, as it all happens under the same Bazel action. Tests that take a few seconds now take an order of magnitude longer. There's definitely low-hanging fruit here; switching away from shell traversal will probably be a big win. I have Linux perf data I can share as well, but it's a bit muddy, as perf doesn't seem to do a great job profiling these shell scripts.

@jmarantz

Cool. A high-performance C++ gcov trace merger sounds promising. Do you think Python would also be fast enough?

@jin added the team-Rules-CPP and untriaged labels Apr 30, 2019

lizan commented May 1, 2019

How about using the LLVM coverage tools? I did a quick test following the instructions at: http://releases.llvm.org/8.0.0/tools/clang/docs/SourceBasedCodeCoverage.html

What I did was:

  • add -fprofile-instr-generate -fcoverage-mapping to copt and linkopt to instrument the targets
  • run the tests with Bazel using --strategy=TestRunner=standalone (otherwise the profraw files never make it out of the sandbox) and --test_env=LLVM_PROFILE_FILE="test.profraw"
  • llvm-profdata merge -sparse $(find -L bazel-out | grep test.profraw) -o coverage.profdata

Merging the profraw files from 400+ tests into a profdata took only about 30 seconds on an n1-standard-16 GCE instance. A consolidated sketch of these steps is below.
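Pulling those steps together, a minimal end-to-end sketch might look like the following; the //test/... pattern, the find expression, and the final llvm-cov report step are illustrative assumptions rather than exactly what was run:

# Instrument and run the tests (flags as described above).
bazel test //test/... \
  --copt=-fprofile-instr-generate --copt=-fcoverage-mapping \
  --linkopt=-fprofile-instr-generate \
  --strategy=TestRunner=standalone \
  --test_env=LLVM_PROFILE_FILE="test.profraw"

# Merge every per-test raw profile into a single indexed profile.
llvm-profdata merge -sparse $(find -L bazel-out | grep test.profraw) \
  -o coverage.profdata

# Illustrative extra step: render a summary for one test binary.
llvm-cov report bazel-bin/test/common/upstream/cluster_manager_impl_test \
  -instr-profile=coverage.profdata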

@scentini added the P2 label and removed untriaged May 2, 2019
@oquenchil added the P3 and type: feature request labels and removed P2 Nov 19, 2020
@sgowroji added the stale label Feb 15, 2023

sgowroji commented Feb 15, 2023

Hi there! We're doing a clean-up of old issues and will be closing this one. Please reopen if you'd like to discuss anything further. We'll respond as soon as we have the bandwidth/resources to do so.

@sgowroji closed this as not planned Feb 15, 2023
@jmarantz

Out of curiosity, why is this being closed rather than addressed?

Is it fixed?

RE "discuss anything further" -- was there a discussion? I didn't see any comments from the Bazel team.

@sgowroji added the not stale label and removed stale Feb 15, 2023
@sgowroji sgowroji reopened this Feb 15, 2023