C++ coverage performance #8178
Comments
What's the Bazel team's position on this? Is Envoy different from (say) TensorFlow, or other large C++ projects, in its coverage requirements? It seems like we shouldn't have to do this work in an Envoy-specific way, but maybe I'm missing something. I was unclear on one point above: you call out "test execution is too slow" and also "collect_cc_coverage.sh has major performance issues" -- are these 2 problems or 1?
@jmarantz these are the same problem. The Bazel test phase includes coverage information collection and merging, as it all happens under the same Bazel action. Tests that take a few seconds now take an order of magnitude longer. There's definitely low-hanging fruit here; switching away from shell traversal will probably be a big win. I have Linux perf data I can share as well, but it's a bit muddy, as perf doesn't seem to do a great job over these shell scripts.
Cool. A high-performance C++ gcov trace merger sounds promising. Do you think Python might also be performant enough?
How about using LLVM coverage tools? I did a quick test with the instructions at http://releases.llvm.org/8.0.0/tools/clang/docs/SourceBasedCodeCoverage.html What I did is:
Merging 400+ tests' profraw files into profdata took only about 30 seconds in a
Hi there! We're doing a cleanup of old issues and will be closing this one. Please reopen if you'd like to discuss anything further. We'll respond as soon as we have the bandwidth/resources to do so.
Out of curiosity, why is this being closed, as opposed to being addressed? Is it fixed? Regarding "discuss anything further": was there a discussion? I didn't see any comments from the Bazel team.
Envoy continues to experiment with switching away from its own custom coverage solution to native `bazel coverage`, but we have two stumbling blocks. Investigating (2) in envoyproxy/envoy#6703 surfaced some interesting possible improvements. We see that building and testing all targets under `//test/...` takes about 2h 8m on CI, which is a bit slower than our existing single-test-binary solution, which clocks in at just under 2h. This should probably be much faster, as test execution can now be split across cores.

I think what's going on is that `collect_cc_coverage.sh` has major performance issues with `gcov` (let alone the `lcov` trace merger, which we aren't hitting yet but which is pretty bad based on historical performance work in #1118 (comment)).

I instrumented and measured test vs. collection time in envoyproxy/envoy#6703 for `CC=gcc CXX=g++ COVERAGE_TARGET=//test/common/upstream:cluster_manager_impl_test VALIDATE_COVERAGE=false ./test/run_envoy_bazel_coverage.sh`. This yielded: `collect_cc_coverage.sh` time: 43.56s.

What I think is to blame is the performance of this loop: `bazel/tools/test/collect_cc_coverage.sh`, line 87 (at 9a32b86).
Iterating over all the files in a large tree for each test in shell is pretty slow. Maybe we could have a high-performance C++ gcov trace merger?
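To make the cost concrete, here's a toy illustration (not the actual `collect_cc_coverage.sh` code; file names are made up and `/bin/true` stands in for `gcov`) of the per-file process-spawn pattern versus handing the whole file list to one process:

```shell
# Simulate a coverage tree with 200 hypothetical .gcda files.
mkdir -p covdir
for i in $(seq 1 200); do : > "covdir/f$i.gcda"; done

# Pattern used today: the shell walks the tree and forks one external
# process per coverage file, for every test. With T tests and F files
# that is T x F process spawns.
slow() {
  for f in covdir/*.gcda; do
    /bin/true "$f"   # stand-in for a per-file gcov invocation
  done
}

# Batch alternative: a single process receives the whole file list,
# which is what a dedicated (C++ or Python) merger would do internally.
fast() {
  /bin/true covdir/*.gcda
}

time slow
time fast
```

The fork/exec overhead per file is what dominates on a large tree, independent of how fast `gcov` itself is.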
Ideally we switch to `lcov`/`clang` going forward, but this will require the low-performance `geninfo` to be replaced to make it practical for Envoy.

CC @mattklein123 @lizan @iirina