[fix][ci] Fix codecov reporting by configuring to wait for all builds sending coverage #19237

lhotari · 2023-01-15T14:09:46Z

Motivation

Currently, Codecov processes the uploaded coverage reports too early, so the reported coverage result is invalid.
The after_n_builds configuration setting should be used to wait for all builds to finish before the coverage reports from the parallel builds are merged.
- see https://docs.codecov.com/docs/notifications#preventing-notifications-until-after-n-builds

Modifications

Set after_n_builds to 10.

Documentation

doc
doc-required
doc-not-needed
doc-complete

lhotari · 2023-01-15T14:10:25Z

Code coverage was activated in #17382. @yaalsn, would you mind reviewing this PR?

yaalsn · 2023-01-16T02:02:47Z

Adding after_n_builds is a good idea, thanks for the improvement! Codecov updates the result in the PR comment and on its platform each time a CI job finishes, so the final result is currently correct.

yaalsn · 2023-01-16T02:12:05Z

The final Codecov comparison in the PR comment is not accurate, because the master branch CI should run after every merged PR, but currently it doesn't: Pulsar's CI runner resources are insufficient and the CI takes a long time to finish. Maybe the first thing we need to do is shorten the unit test running time.

lhotari · 2023-01-16T05:56:42Z

> The final Codecov comparison in the PR comment is not accurate, because the master branch CI should run after every merged PR, but currently it doesn't: Pulsar's CI runner resources are insufficient and the CI takes a long time to finish. Maybe the first thing we need to do is shorten the unit test running time.

My assumption is that the main reason for the false comparison results has been the missing after_n_builds setting. It's true that the comparison won't be exactly accurate if the master branch results are outdated.

Let's see if this assumption holds. The Codecov upload was still missing from the 10th build job, which resides in pulsar-ci-flaky.yaml, and that's why the results didn't show up.

lhotari · 2023-01-16T08:02:42Z

One more attempt.

The coverage profile wasn't activated for the BROKER_FLAKY unit test group:

pulsar/build/run_unit_group.sh

Lines 36 to 40 in f23363c

    
           if echo "${FUNCNAME[@]}" | grep "flaky"; then 
        
             TARGET="verify" 
        
           else 
        
             TARGET="verify -Pcoverage" 
        
           fi

@yaalsn Do you remember the reason for doing this?

In addition, the coverage profile wasn't activated when the install target (Maven goal) was used instead of verify. I fixed these issues in the run_unit_group.sh script and also upgraded jacoco-maven-plugin to 0.8.8. JaCoCo 0.8.8 officially supports Java 17: https://github.com/jacoco/jacoco/releases/tag/v0.8.8
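
One way to picture the run_unit_group.sh fix is to append the coverage profile regardless of the test group or Maven goal. The following is only a minimal sketch, not the actual change in this PR: the function name `maven_args_with_coverage` is hypothetical, and only the `-Pcoverage` profile and the `verify` / `install` goals come from the discussion above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not the actual run_unit_group.sh code): build the Maven
# arguments so that the coverage profile is appended whatever the goal is.
maven_args_with_coverage() {
  local goal="${1:-verify}"   # e.g. "verify" or "install"; defaults to "verify"
  echo "${goal} -Pcoverage"
}

# The regular and the flaky groups would then share the same profile.
TARGET="$(maven_args_with_coverage verify)"
echo "mvn ${TARGET}"
```

With this shape there is no special case for function names containing "flaky", so every unit test group would produce JaCoCo data for the upload step.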

lhotari · 2023-01-16T08:06:08Z

> @yaalsn Do you remember the reason for doing this?

I found this description in #18081: "Except flaky test group to run code coverage, otherwise we cannot get an accurate base coverage."
Why would that make it more accurate? We have a lot of tests in the flaky test group.

lhotari · 2023-01-16T10:30:14Z

Codecov has a known issue (codecov/codecov-action#598, codecov/feedback#126) where uploading to Codecov might fail with this error: "Unable to locate build via Github Actions API. Please upload with the Codecov repository upload token to resolve issue."

[2023-01-16T10:14:31.630Z] ['info'] Pinging Codecov: https://codecov.io/upload/v4?package=github-action-3.1.1-uploader-0.3.2&token=*******&branch=lh-fix-codecov-config-to-wait-for-all-unit-tests&build=3928945879&build_url=https%3A%2F%2Fgithub.com%2Fapache%2Fpulsar%2Factions%2Fruns%2F3928945879&commit=b8189e0dfe212686a6aba0319866ee8f996dc796&job=Pulsar+CI&pr=19237&service=github-actions&slug=apache%2Fpulsar&name=&tag=&flags=unittests&parent=
[2023-01-16T10:14:32.646Z] ['error'] There was an error running the uploader: Error uploading to https://codecov.io: Error: There was an error fetching the storage URL during POST: 404 - {'detail': ErrorDetail(string='Unable to locate build via Github Actions API. Please upload with the Codecov repository upload token to resolve issue.', code='not_found')}

It's too bad that there isn't a way to retry uploading in this case.
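
A later commit in this PR is titled "Retry uploading to codecov multiple times", but the conversation doesn't show how that retry is implemented. As a rough illustration only, a generic retry wrapper in shell could look like the sketch below; the `retry` function and the `upload_coverage` placeholder are hypothetical names, not part of the Codecov uploader or of the Pulsar build scripts.

```shell
#!/usr/bin/env bash
# Generic retry helper (illustrative sketch): run a command up to
# max_attempts times, sleeping delay_seconds between failed tries.
retry() {
  local max_attempts="$1" delay_seconds="$2"
  shift 2
  local attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "attempt ${attempt} failed, retrying in ${delay_seconds}s" >&2
    sleep "$delay_seconds"
    attempt=$((attempt + 1))
  done
}

# Usage (upload_coverage is a hypothetical placeholder for the real upload step):
#   retry 3 10 upload_coverage
```

The helper returns 0 as soon as the wrapped command succeeds and 1 once the attempt limit is reached, so a transient 404 from the upload endpoint would not immediately lose the coverage report.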

yaalsn · 2023-01-17T05:30:02Z

> One more attempt.
>
> The coverage profile wasn't activated for the BROKER_FLAKY unit test group:
>
> pulsar/build/run_unit_group.sh
>
> Lines 36 to 40 in f23363c
>
>     if echo "${FUNCNAME[@]}" | grep "flaky"; then
>       TARGET="verify"
>     else
>       TARGET="verify -Pcoverage"
>     fi
>
> @yaalsn Do you remember the reason for doing this?
>
> In addition, the coverage profile wasn't activated when the install target (Maven goal) was used instead of verify. I fixed these issues in the run_unit_group.sh script and also upgraded jacoco-maven-plugin to 0.8.8. JaCoCo 0.8.8 officially supports Java 17: https://github.com/jacoco/jacoco/releases/tag/v0.8.8

The flaky test group is allowed to fail when running CI. If any test fails while the flaky test group is running, the result will not be accurate.

lhotari · 2023-01-17T06:06:59Z

> The flaky test group is allowed to fail when running CI. If any test fails while the flaky test group is running, the result will not be accurate.

A few flaky tests wouldn't account for a difference of several percentage points in coverage.
The problem might be caused by the special FailFastListener, which skips all remaining tests for a specific forked test process. I'll create a PR to disable fail-fast mode for the flaky build job.

lhotari · 2023-01-17T08:12:39Z

> The flaky test group is allowed to fail when running CI. If any test fails while the flaky test group is running, the result will not be accurate.

@yaalsn The flaky test group doesn't allow failures when running CI. It's the "quarantine" group that allows failures, and we don't have many tests in that group.

I took a look at the reasons for the differences. They are caused by builds that fail because of flaky tests in the master branch. That's why the coverage metrics for the base commit will be wrong.
The Pulsar master branch builds have been failing recently because of a high number of very flaky tests.
This is the link to list pulsar-ci builds for the master branch:
https://github.com/apache/pulsar/actions/workflows/pulsar-ci.yaml?query=branch%3Amaster

[fix][ci] Fox codecov reporting by configuring to wait for all builds…

597db97

… sending coverage - see https://docs.codecov.com/docs/notifications#preventing-notifications-until-after-n-builds

lhotari added area/ci ready-to-test labels Jan 15, 2023

lhotari added this to the 2.12.0 milestone Jan 15, 2023

lhotari requested review from Technoboy-, codelipenghui, nodece, tisonkun, nicoloboschi and mattisonchao January 15, 2023 14:09

lhotari self-assigned this Jan 15, 2023

github-actions bot added the doc-not-needed Your PR changes do not impact docs label Jan 15, 2023

lhotari requested a review from eolivelli January 15, 2023 14:10

lhotari changed the title ~~[fix][ci] Fox codecov reporting by configuring to wait for all builds sending coverage~~ [fix][ci] Fix codecov reporting by configuring to wait for all builds sending coverage Jan 15, 2023

yaalsn approved these changes Jan 16, 2023

nodece approved these changes Jan 16, 2023

Upload to codecov in pulsar-ci-flaky.yaml

1636927

lhotari added 2 commits January 16, 2023 09:50

Activate "-Pcoverage" profile for all unit test runs

5a00094

upgrade jacoco-maven-plugin version to 0.8.8

81d9c9d

Increase forkedProcessExitTimeoutInSeconds to 60 seconds

b8189e0

- ensure that there's enough time to process Jacoco coverage data - https://maven.apache.org/surefire/maven-surefire-plugin/examples/shutdown.html

Retry uploading to codecov multiple times

71230c7

apache deleted a comment from codecov-commenter Jan 16, 2023

lhotari merged commit 13386b7 into apache:master Jan 16, 2023
