Code coverage uploads fail occasionally at the CI #9832
Comments
Also note that the change made in #9686 is still very much needed until simplecov-ruby/simplecov#1019 is resolved. We just found another problem after implementing that fix.
It also seems that running the specs selectively causes coverage comparison failures. For example, at #9621, Codecov reports a coverage degradation for the core classes even though those specs weren't run for that PR, because it didn't touch any classes that would trigger the core specs to run:
The bottom line is that if any code in the target class is covered in the coverage report, Codecov compares that coverage to any previous runs that covered the same file. If the coverage differs, Codecov flags it as a problem, because we are comparing against the "project" coverage, not the "patch" coverage.
Same thing as in the original post happened also at: The coverage upload failed at the One possible solution, which was also suggested for similar problems that came up when searching, would be to modify the coverage upload script so that it retries, e.g. 10 times with a 1-minute wait between attempts, until the upload succeeds or the retry limit is reached.
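To make the retry idea concrete, here is a minimal sketch of a generic retry helper in shell. The function name `retry` and the placeholder `upload_once` are hypothetical and not part of the current upload script; the 10 attempts and 60-second wait are just the example values mentioned above.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the existing upload_coverage.sh.
# Usage: retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached,
# sleeping DELAY_SECONDS between failed attempts.
retry() {
  local max_attempts delay attempt
  max_attempts="$1"
  delay="$2"
  shift 2
  attempt=1
  while true; do
    # Success: stop retrying immediately.
    if "$@"; then
      return 0
    fi
    # Failure on the last allowed attempt: give up with a non-zero status.
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "Giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "Attempt ${attempt}/${max_attempts} failed, retrying in ${delay}s..." >&2
    sleep "$delay"
    attempt=$((attempt + 1))
  done
}

# Assumed usage, where upload_once stands for whatever command
# performs the actual coverage upload:
# retry 10 60 upload_once
```

Since the helper returns non-zero when every attempt fails, the CI step would still fail visibly instead of silently skipping the upload.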
Even with the official Codecov action we seem to have the same problem (at PR #10344): Run log:
Related discussions:
So the bottom line is that:
A potential solution is provided in the discussion linked above: codecov/codecov-action#557 (comment)
codecov/codecov-action#557 (comment)
codecov/codecov-action#557 (comment)
So if we expose our Codecov token by hard-coding it in the workflow files, i.e. not through the configured environment variables, it should work for both workflows triggered from the
@ahukkanen great summary. I guess that if the only risk is that someone could send reports on our behalf and nothing else, then we can live with that. I'll also subscribe to those issues so we can try again with the solution they provide. Go ahead with the change 👍🏽
After about one week of experience, I haven't yet seen Codecov failing unnecessarily, so the workaround seems to work. We'll keep following the situation. I believe we haven't yet tested it under heavy load, i.e. when a large number of workflows run and send a large number of Codecov requests at the same time.
Today I noticed the first error on this run, when there was rather high load on the CI actions: Codecov returned HTTP 500, which is different from what we used to see before setting the token. Here's the whole log from the upload coverage step:
Not many details are available about what might have caused it, but the good thing is that it's no longer the same error that we used to get. This is likely something internal at Codecov's end.
I also posted my findings at codecov/codecov-action#926. It would be really helpful if they provided a retry option, since the issues we are facing now seem to be at their end, so there's not much we can do about them.
I think we need to reopen this issue since this is not permanently solved. Codecov is still failing quite often because one of the actions fails to upload its coverage. Not all of them fail, but if we are missing even one, it has a high impact on the reported coverage. There is still no activity on this matter from Codecov's end, but in the issue linked above (codecov/codecov-action#926) there are a couple of suggestions for how we could try to solve this issue:
@ahukkanen This PR tried to enable fail_ci_if_error, which would fail the pipeline if there is an HTTP error. But as @andreslucena pointed out in #11837 (comment), Codecov still failed even though there was no error uploading the coverage data. What I could observe was that there was no upload error, yet Codecov took ages to process the data before displaying it... Maybe we have a very big codebase? :D
In the same build, I also tried splitting the pipeline in 3802333, yet Codecov was still failing.
As I understand GitHub Actions, there is no way to let GitHub finish all the workflows that generate artifacts containing coverage data, then download all the artifacts in one single workflow ... and send them to Codecov in a more compact request (either a merged result or multiple workflows).
@alecslupu OK, that is actually interesting because it seems the problem may be somewhere else than I originally thought. In my previous investigations, I figured the problem would be that some of the actions fail to upload the coverage report, which can still happen. But at #11837 I was particularly interested that you said:
What do you mean by "codecov processing time"?
I think I have answered here: #9832 (comment). To be honest, even when I try to load https://app.codecov.io/gh/decidim/decidim/pull/11837 on Codecov, it takes ages (i.e. the data does not load). What is more interesting is that I have this commit (it updates only a JS library) that generates an indirect change ... https://app.codecov.io/gh/decidim/decidim/commit/94c3c595ca42ef02073aada193ca70780734c907/indirect-changes
I am just saying ... maybe it is worth trying to find other apps that provide similar services, like enabling Code Climate (which is already installed). However, Code Climate does not appear to support multiple workflow runs at the same time.
@alecslupu Yes, that is definitely a good idea at this point since it doesn't seem we are able to find a solution to this problem. In the past, @microstudi suggested that we use Code Climate also for the coverage checks since, in his experience, it has worked well. I don't know if there are any downsides to that.
I am thinking that any of the following would allow us to wait for all the other workflows to finish. If this is the case, we could patch the current workflows so that each one uploads its coverage as an artifact, and then in another place download all of them, unzip them, merge them into a single file, and push it to whatever coverage service we use. https://github.com/mktcode/consecutive-workflow-action
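As a rough illustration of the "download all of them, then collect" part, here is a small shell sketch. It assumes the artifacts have already been downloaded and unzipped into one directory per workflow, and it only gathers the report files into a single directory. The actual merge step depends on the report format and is not shown. The function name `collect_reports` and the example paths are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: gather coverage reports from unzipped artifacts.
# Usage: collect_reports SRC_DIR DEST_DIR PATTERN
# Copies every file under SRC_DIR whose name matches PATTERN into DEST_DIR.
# Each copy gets a numeric prefix so that reports with the same file name
# from different artifacts do not overwrite each other.
collect_reports() {
  local src_dir dest_dir pattern n file
  src_dir="$1"
  dest_dir="$2"
  pattern="$3"
  mkdir -p "$dest_dir"
  n=0
  # sort makes the numbering deterministic across runs.
  find "$src_dir" -type f -name "$pattern" | sort | while IFS= read -r file; do
    n=$((n + 1))
    cp "$file" "$dest_dir/${n}-$(basename "$file")"
  done
}

# Assumed usage, with made-up directory and file names:
# collect_reports ./artifacts ./all-reports "coverage.xml"
```

DEST_DIR should live outside SRC_DIR, otherwise a second run would pick up the previously collected copies as well.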
@alecslupu From the point above I understood that, at least with Codecov, we would still have the same problem with that strategy if Codecov reports the coverage to GitHub before it has finished analyzing the whole report. Just as a side note, we've never seen this problem with any modules where we are using Codecov, since they only upload a single report.
This is the only feature that we need, right? The term that I've seen used for this feature is "carryforward". I see that Coveralls supports it: https://coveralls.io/better-monorepo-support (I've never used it)
Actually, Code Climate was used in the past in Decidim. They decided to switch in order to support multi-action uploads. I don't know whether Code Climate supports that now.
Describe the bug
At #9686 we tried to solve the issue with the "flaky" code coverage reports that sometimes report coverage dropped even when it didn't really drop.
The implemented solution was based on simplecov-ruby/simplecov#1019, which explains that simplecov has a bug and that we can avoid it by keeping the reports separate on parallel runs and letting Codecov itself merge them.
After this was implemented, we started getting this kind of error when trying to upload the reports to codecov:
To Reproduce
This happens randomly, so see #9686 (comment).
That comment explains an actual case where this happened and the reason why it happened.
Expected behavior
The code coverage report upload should succeed normally.
Screenshots
Stacktrace
Extra data (please complete the following information):
develop
Additional context
The Codecov uploader we are using (https://codecov.io/bash) allows passing the "codecov token" using -t TOKEN. It would need to be added here: decidim/.github/upload_coverage.sh, lines 11 to 13 in c6791f1.
After investigating the Codecov script, this could also be solved simply by providing the CODECOV_TOKEN environment variable to the upload script.
The problem is that at least I don't have access to the Codecov account to get that token.
Does anyone know who has access to the Codecov account?
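For reference, a minimal sketch of how the token could be checked and passed in the upload script. The guard function `require_codecov_token` is hypothetical. The uploader invocation is shown only as a comment because it needs network access, and it assumes the script is fetched from https://codecov.io/bash as it is now.

```shell
#!/usr/bin/env bash
# Hypothetical guard, not part of the existing upload_coverage.sh.
# Fails early with a clear message when the token is missing, instead of
# letting the upload run without one.
require_codecov_token() {
  if [ -z "${CODECOV_TOKEN:-}" ]; then
    echo "CODECOV_TOKEN is not set; refusing to upload without a token" >&2
    return 1
  fi
}

# Assumed usage in the upload script. Either form should pass the token:
# explicitly with -t, or implicitly through the CODECOV_TOKEN environment
# variable that the uploader script reads itself.
# require_codecov_token && bash <(curl -s https://codecov.io/bash) -t "$CODECOV_TOKEN"
```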