Fix - Cache miss CI fall back #3799

Open
Agnul97 wants to merge 2 commits into develop

Conversation

Contributor

@Agnul97 commented Jun 27, 2023

Brief description of the PR.
This PR introduces a fall-back strategy for the cache misses occasionally observed in the CI process.

Related Issue
I've noticed rare occasions where the jobs get stuck while retrieving the Maven artifacts cache and never manage to restore it. The cause is described here: https://github.com/actions/cache/blob/main/tips-and-workarounds.md#cache-segment-restore-timeout. Essentially, the platform sometimes stops responding and the cache GitHub Action fails to retrieve the cache; for this reason a segment-download time-out is applied to keep the restore phase from hanging for too long.
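
For reference, here is a minimal sketch (not taken from this PR) of how that time-out can be tuned: actions/cache reads the SEGMENT_DOWNLOAD_TIMEOUT_MINS environment variable to bound each segment download. The step name below is illustrative; the cache key matches the one used in this workflow.

- name: Restore Maven artifacts cache
  uses: actions/cache/restore@v3
  env:
    SEGMENT_DOWNLOAD_TIMEOUT_MINS: 10   # abort a stuck segment download after 10 minutes
  with:
    path: ~/.m2/repository
    key: ${{ github.run_id }}-${{ github.run_number }}-maven-cache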

Description of the solution adopted
I've added a new step to the test jobs that checks whether the previous cache restore step ended in a cache miss (when the above-mentioned time-out is reached, the result is a cache miss, as stated in https://github.com/actions/cache/blob/main/tips-and-workarounds.md#cache-segment-restore-timeout) and, in that case, builds the Maven artifacts itself. To implement this fall-back I used the restore-only variant of the cache GitHub Action (actions/cache/restore@v3), which avoids the cache save that the full action would otherwise perform in its post-run phase.
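
A minimal sketch of this fall-back, assuming a restore step with id maven-cache (the id, step names and Maven goals are illustrative, not copied from the PR):

- name: Restore Maven artifacts cache
  id: maven-cache
  uses: actions/cache/restore@v3    # restore only: no cache save in the post-run phase
  with:
    path: ~/.m2/repository
    key: ${{ github.run_id }}-${{ github.run_number }}-maven-cache
- name: Maven artifacts creation    # fall-back: rebuild only when the cache was not restored
  if: steps.maven-cache.outputs.cache-hit != 'true'
  run: mvn -B -DskipTests clean install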

Additionally, since a similar failure could happen when the build job saves the cache, I changed that job so that the workflow fails in the (rare) case where the cache cannot be saved. I decided to do so because, if this unlucky event happens, re-running the workflow and hoping the failure doesn't repeat seemed more time-efficient than rebuilding the artifacts in every test job. I never encountered this situation in the eclipse/kapua workflows I analyzed, but I think it could happen.
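
One way to get that behaviour, sketched under the assumption that the cache is written with actions/cache/save@v3 and then verified with a restore step (lookup-only and fail-on-cache-miss are inputs of actions/cache/restore@v3; the step names are illustrative and not necessarily what this PR does):

- name: Save Maven artifacts cache
  uses: actions/cache/save@v3
  with:
    path: ~/.m2/repository
    key: ${{ github.run_id }}-${{ github.run_number }}-maven-cache
- name: Verify the cache was saved    # fail the build job early instead of rebuilding in every test job
  uses: actions/cache/restore@v3
  with:
    path: ~/.m2/repository
    key: ${{ github.run_id }}-${{ github.run_number }}-maven-cache
    lookup-only: true          # only check that the entry exists, don't download it
    fail-on-cache-miss: true   # fail the workflow if no cache entry was created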


codecov bot commented Jun 27, 2023

Codecov Report

Merging #3799 (6e04978) into develop (1a06fb1) will increase coverage by 0.01%.
The diff coverage is n/a.

❗ Current head 6e04978 differs from pull request most recent head 192fe1a. Consider uploading reports for the commit 192fe1a to get more accurate results.

Impacted file tree graph

@@              Coverage Diff              @@
##             develop    #3799      +/-   ##
=============================================
+ Coverage      23.15%   23.16%   +0.01%     
  Complexity        26       26              
=============================================
  Files           1866     1866              
  Lines          35288    35288              
  Branches        2782     2782              
=============================================
+ Hits            8170     8175       +5     
+ Misses         26807    26801       -6     
- Partials         311      312       +1     

see 1 file with indirect coverage changes

        with:
          path: ~/.m2/repository
          key: ${{ github.run_id }}-${{ github.run_number }}-maven-cache
      - name: Maven artifacts creation # if for some reason there was a cache miss then create the maven artifacts
Contributor

This would fall back to each test job running the entire build independently - are we sure that's preferable to simply failing the job?
Since you are protecting against cache-get failures, re-running the individual job would probably be faster?

Contributor

I'd suggest introducing the cache retrieval timeout explained in https://github.com/actions/cache/blob/main/tips-and-workarounds.md#cache-segment-restore-timeout - or was that already in place?

Contributor Author

@Agnul97 Jun 28, 2023

The timeout was already in place (10 minutes). In that scenario I honestly don't know when the platform will become responsive again, so I preferred to build the artifacts rather than risk losing another 10 minutes on a retry of the cache retrieval (considering that building the artifacts takes roughly 10 minutes anyway).

Analyzing past workflows in this repo, I noticed that the cache retrieval exceeded the time-out only in very rare cases, so I think this extra artifact build would not be too heavy overall.
