C++ coverage performance #8178

Open
htuch opened this issue Apr 28, 2019 · 6 comments
Labels
not stale · P3 (not considering working on this, but happy to review a PR) · team-Rules-CPP · type: feature request

Comments

htuch commented Apr 28, 2019

Envoy continues to experiment with switching away from its own custom coverage solution to native Bazel coverage, but we have two stumbling blocks:

  1. Bazel C++ coverage for Envoy crashes the JVM (#7279) <-- we need to be able to use remote caching (and ideally full RBE in the near future) to realize build-time gains.
  2. Test execution is too slow -- slow enough that the parallelism we gain from running all tests in parallel, versus our single monolithic binary today, is no win.

Investigating (2) in envoyproxy/envoy#6703 surfaced some interesting possible improvements. When we build all targets under //test/..., build+test takes about 2h 8m on CI, which is a bit slower than our existing single-test-binary solution, which clocks in at just under 2h. This should probably be much faster, since test execution can now be split across cores.

I think collect_cc_coverage.sh has major performance issues with gcov (to say nothing of the lcov trace merger, which we aren't hitting yet but which is pretty bad based on historical performance work, #1118 (comment)).

I instrumented and measured test vs. collection time in envoyproxy/envoy#6703 for CC=gcc CXX=g++ COVERAGE_TARGET=//test/common/upstream:cluster_manager_impl_test VALIDATE_COVERAGE=false ./test/run_envoy_bazel_coverage.sh. This yielded:

  • Test time: 2.87s
  • collect_cc_coverage.sh time: 43.56s

I think the performance of this loop is to blame:

cat "${COVERAGE_MANIFEST}" | grep ".gcno$" | while read gcno_path; do
.

Iterating over all the files in a large tree for each test in shell is pretty slow. Maybe we could have a high-performance C++ gcov trace merger?
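Short of a full C++ merger, one place the shell cost could plausibly be cut is the serial per-.gcno loop itself. The sketch below is purely illustrative and assumes the manifest format shown above; it is not what collect_cc_coverage.sh does today, and the gcov flags are just common ones:

# Illustrative sketch only, not collect_cc_coverage.sh's actual logic: fan the
# per-.gcno gcov invocations out across cores instead of looping serially.
grep '\.gcno$' "${COVERAGE_MANIFEST}" \
  | xargs -P "$(nproc)" -n 1 -I{} \
      gcov --preserve-paths --branch-probabilities {} > /dev/null

Whether this actually helps would depend on how much of the 43s is gcov itself versus shell overhead; a native merger would avoid both.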

Ideally we would switch to lcov/clang going forward, but that will require replacing the low-performance geninfo to make it practical for Envoy.

CC @mattklein123 @lizan @iirina

jmarantz commented Apr 29, 2019

What's the Bazel team's position on this? Is Envoy different from (say) TensorFlow, or other large C++ projects, in its coverage requirements?

It seems like we shouldn't have to do this work in an Envoy-specific way, but maybe I'm missing something.

I was unclear on one point above: you call out "test execution is too slow" and also "collect_cc_coverage.sh has major performance issues" -- are these two problems or one?


htuch commented Apr 29, 2019

@jmarantz these are the same problem. The Bazel test phase includes coverage information collection and merging, as it all happens under the same Bazel action. Tests that take a few seconds now take an order of magnitude longer. There's definitely low-hanging fruit here; switching away from shell traversal will probably be a big win. I have Linux perf data I can share as well, but it's a bit muddy, as perf doesn't seem to do a great job profiling these shell scripts.

@jmarantz

Cool. A high-performance C++ gcov trace merger sounds promising. Do you think Python would also be fast enough?

@jin added the team-Rules-CPP and untriaged labels Apr 30, 2019

lizan commented May 1, 2019

How about using the LLVM coverage tools? I did a quick test following the instructions at: http://releases.llvm.org/8.0.0/tools/clang/docs/SourceBasedCodeCoverage.html

What I did was:

  • add -fprofile-instr-generate -fcoverage-mapping to copt and linkopt to instrument the targets
  • run the tests with Bazel using --strategy=TestRunner=standalone (otherwise the profraw files never make it out of the sandbox) and --test_env=LLVM_PROFILE_FILE="test.profraw"
  • llvm-profdata merge -sparse $(find -L bazel-out | grep test.profraw) -o coverage.profdata

Merging the profraw files from 400+ tests into a profdata took only about 30 seconds on an n1-standard-16 GCE instance. A consolidated sketch of these steps is below.
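Pulling those steps together, a minimal end-to-end sketch might look like the following; the //test/... pattern, the find expression, and the final llvm-cov report step are illustrative assumptions rather than exactly what was run:

# Instrument and run the tests (flags as described above).
bazel test //test/... \
  --copt=-fprofile-instr-generate --copt=-fcoverage-mapping \
  --linkopt=-fprofile-instr-generate \
  --strategy=TestRunner=standalone \
  --test_env=LLVM_PROFILE_FILE="test.profraw"

# Merge every per-test raw profile into a single indexed profile.
llvm-profdata merge -sparse $(find -L bazel-out | grep test.profraw) \
  -o coverage.profdata

# Illustrative extra step: render a summary for one test binary.
llvm-cov report bazel-bin/test/common/upstream/cluster_manager_impl_test \
  -instr-profile=coverage.profdata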

@scentini added the P2 label and removed untriaged May 2, 2019
@oquenchil added the P3 and type: feature request labels and removed P2 Nov 19, 2020
@sgowroji added the stale label Feb 15, 2023

sgowroji commented Feb 15, 2023

Hi there! We're doing a clean-up of old issues and will be closing this one. Please reopen if you'd like to discuss anything further. We'll respond as soon as we have the bandwidth/resources to do so.

@sgowroji closed this as not planned Feb 15, 2023
@jmarantz

Out of curiosity, why is this being closed rather than addressed?

Is it fixed?

RE "discuss anything further" -- was there a discussion? I didn't see any comments from the Bazel team.

@sgowroji added the not stale label and removed stale Feb 15, 2023
@sgowroji sgowroji reopened this Feb 15, 2023