Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fix][ci] Fix codecov reporting by configuring to wait for all builds sending coverage #19237

Merged

Conversation

lhotari
Copy link
Member

@lhotari lhotari commented Jan 15, 2023

Motivation

Modifications

Set after_n_builds to 10.

Documentation

  • doc
  • doc-required
  • doc-not-needed
  • doc-complete

@lhotari lhotari added this to the 2.12.0 milestone Jan 15, 2023
@lhotari lhotari self-assigned this Jan 15, 2023
@github-actions github-actions bot added the doc-not-needed Your PR changes do not impact docs label Jan 15, 2023
@lhotari
Copy link
Member Author

lhotari commented Jan 15, 2023

Code coverage was activated in #17382 . @yaalsn would you mind reviewing this PR?

@lhotari lhotari changed the title [fix][ci] Fox codecov reporting by configuring to wait for all builds sending coverage [fix][ci] Fix codecov reporting by configuring to wait for all builds sending coverage Jan 15, 2023
@yaalsn
Copy link
Contributor

yaalsn commented Jan 16, 2023

It would be better to add after_n_builds, thanks for your improvement! After every CI finishes, the codecov will update the result in the comment and its platform, so currently the final result is right.

@yaalsn
Copy link
Contributor

yaalsn commented Jan 16, 2023

The final codecov result of comparison in PR comment is not accurated because the master branch CI should run after every PR merged, but now it doesn't because pulsar's CI runner resource is not sufficient and it needs long time to finish. Maybe the first thing is that we need to shorten the unit test running time.

@lhotari
Copy link
Member Author

lhotari commented Jan 16, 2023

The final codecov result of comparison in PR comment is not accurated because the master branch CI should run after every PR merged, but now it doesn't because pulsar's CI runner resource is not sufficient and it needs long time to finish. Maybe the first thing is that we need to shorten the unit test running time.

My assumption is that the main reason for the false comparison results have been the missing after_n_builds setting. It's true that it won't be exactly accurate if master branch results are outdated.

Let's see if this assumption holds. The codecov upload was still missing from the 10th build job that resides in ci-pulsar-flaky.yaml and that's why the results didn't show up.

@lhotari
Copy link
Member Author

lhotari commented Jan 16, 2023

One more attempt.

The coverage profile wasn't activated for the BROKER_FLAKY unit test group:

if echo "${FUNCNAME[@]}" | grep "flaky"; then
TARGET="verify"
else
TARGET="verify -Pcoverage"
fi

@yaalsn Do you remember the reason for doing this?

In addition, the coverage profile wasn't activated when the install target (maven goal) was used instead of verify. I fixed the issues in the run_unit_group.sh script and also upgraded jacoco-maven-plugin to 0.8.8 . Jacoco 0.8.8 officially supports Java 17: https://github.com/jacoco/jacoco/releases/tag/v0.8.8

@lhotari
Copy link
Member Author

lhotari commented Jan 16, 2023

@yaalsn Do you remember the reason for doing this?

I found this description in #18081 "Except flaky test group to run code coverage, otherwise we cannot get an accurate base coverage."
Why would that make it more accurate? We have a lot of tests in the flaky test group.

@lhotari
Copy link
Member Author

lhotari commented Jan 16, 2023

codecov has a known issue codecov/codecov-action#598 / codecov/feedback#126 where uploading to codecov might fail with this error "Unable to locate build via Github Actions API. Please upload with the Codecov repository upload token to resolve issue."

[2023-01-16T10:14:31.630Z] ['info'] Pinging Codecov: https://codecov.io/upload/v4?package=github-action-3.1.1-uploader-0.3.2&token=*******&branch=lh-fix-codecov-config-to-wait-for-all-unit-tests&build=3928945879&build_url=https%3A%2F%2Fgithub.com%2Fapache%2Fpulsar%2Factions%2Fruns%2F3928945879&commit=b8189e0dfe212686a6aba0319866ee8f996dc796&job=Pulsar+CI&pr=19237&service=github-actions&slug=apache%2Fpulsar&name=&tag=&flags=unittests&parent=
[2023-01-16T10:14:32.646Z] ['error'] There was an error running the uploader: Error uploading to [https://codecov.io:](https://codecov.io/) Error: There was an error fetching the storage URL during POST: 404 - {'detail': ErrorDetail(string='Unable to locate build via Github Actions API. Please upload with the Codecov repository upload token to resolve issue.', code='not_found')}

It's too bad that there isn't a way to retry uploading in this case.

@apache apache deleted a comment from codecov-commenter Jan 16, 2023
@lhotari lhotari merged commit 13386b7 into apache:master Jan 16, 2023
@yaalsn
Copy link
Contributor

yaalsn commented Jan 17, 2023

One more attempt.

The coverage profile wasn't activated for the BROKER_FLAKY unit test group:

if echo "${FUNCNAME[@]}" | grep "flaky"; then
TARGET="verify"
else
TARGET="verify -Pcoverage"
fi

@yaalsn Do you remember the reason for doing this?

In addition, the coverage profile wasn't activated when the install target (maven goal) was used instead of verify. I fixed the issues in the run_unit_group.sh script and also upgraded jacoco-maven-plugin to 0.8.8 . Jacoco 0.8.8 officially supports Java 17: https://github.com/jacoco/jacoco/releases/tag/v0.8.8

The flaky test group allows failure when running CI, if any test fails during the flaky test group running, the result will be not accurated.

@lhotari
Copy link
Member Author

lhotari commented Jan 17, 2023

The flaky test group allows failure when running CI, if any test fails during the flaky test group running, the result will be not accurated.

A few flaky tests won't contribute several percentage difference in coverage.
The problem might be caused by the special FailFastListener which will skip all remaining tests for a specific forked test process. I'll create a PR to disable fail fast mode for the flaky build job.

@lhotari
Copy link
Member Author

lhotari commented Jan 17, 2023

The flaky test group allows failure when running CI, if any test fails during the flaky test group running, the result will be not accurated.

@yaalsn The flaky test group doesn't allow failures when running CI. It's the "quarantine" group. We don't have many tests in that group.

I took a look at the reasons for differences. It's caused by builds that fail because of flaky tests in the master branch. That's why the coverage metrics for the base commit will be wrong.
The Pulsar master branch builds have been failing recently because of a high number of very flaky tests.
This is the link to list pulsar-ci builds for the master branch:
https://github.com/apache/pulsar/actions/workflows/pulsar-ci.yaml?query=branch%3Amaster

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/ci doc-not-needed Your PR changes do not impact docs ready-to-test
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants