Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make better use of caching in CI #538

Merged
merged 5 commits into from
Oct 5, 2023
Merged

Make better use of caching in CI #538

merged 5 commits into from
Oct 5, 2023

Conversation

savente93
Copy link
Contributor

@savente93 savente93 commented Oct 2, 2023

Issue addressed

NA

Explanation

As with all purely CI PRs there sadly arent' any tests that I can show to demonstrate it works, but I've worked on an dtested it in a seperate repository that the reviewer can look at should they want: https://github.com/savente93/ci-playground. Because the dependency caches tend to be quite large, there's not enough space in the cache allowance to let each PR have it's own cache so I went with the following setup:

  1. Every week on sunday all caches get cleared (to make sure they don't drift). There is also a manual dispatch for this one
  2. then caches for each of the python version tests suites as well as the docs generation is generated on main. this is a read only cache that doesn't get updated until it's regenerated
  3. If the resulting environment.yml is unchanged the tests can run without doing any updating
  4. if it is changed, the main cache is still loaded and and update is preformed to get the new dependencies. This sadly has to happen every time the CI is run, but still improves on the current situation.
  5. Since I've managed to cache mamba itself too, the rest of the flow can't continue without a cache hit, so I've added a step to just exit the flow if no cache is restred with a bit of a nicer error message.
  6. The byte codes generated are also included in the cache, this might help with the issues experienced in use xugrid in mesh methods #535

Checklist

  • Updated tests or added new tests
  • Branch is up to date with main
  • Tests & pre-commit hooks pass
  • Updated documentation if needed
  • Updated changelog.rst if needed

Additional Notes (optional)

Add any additional notes or information that may be helpful.

@savente93 savente93 changed the title copy over caching solutions from ci-playground Make better use of caching in CI Oct 2, 2023
@savente93 savente93 requested a review from Jaapel October 3, 2023 07:05
@savente93 savente93 marked this pull request as ready for review October 3, 2023 07:06
.github/workflows/tests.yml Show resolved Hide resolved
- name: Update environment & write to cache
run: mamba env update -n hydromt -f environment.yml
if: steps.cache.outputs.cache-hit != 'true'
- name: Fail on no cache restore
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean the workflow fails if there is no cache available? What would be the expected cases for when no caching is available?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it does, this should actually never happen, but because mamba itself is in the cache the flow would fail anyway, this is just to provide a bit of a nicer error message. In the case the cache is unavailable, it should be regenerated (which happens on a schedule)

@@ -0,0 +1,95 @@

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we refresh our cache? To obtain new versions of packages? Because there are seperate tools for that one I think

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Various reasons. Obtained new versions of packages, integrate new packages into the cache, make sure all the tests are running from the same cache. Because mamba solving is not stable, it's important to keep all the tests in sync by periodically refreshing the cache that they all use. Besides, the cache limit is at 10GB which we bump up against about weekly, instead of trying to do very clever cache invalidation I thought it was much more maintainable to just clean out and regenerate the cache every week, that way we make sure we keep within our limits and that everything uses the up to date versions of everything.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that dependabot does not do mamba / conda, so I think this is way better than nothing :)

@savente93 savente93 requested a review from Jaapel October 5, 2023 09:33
@savente93 savente93 merged commit 3f12d92 into main Oct 5, 2023
3 checks passed
@savente93 savente93 deleted the caches branch October 5, 2023 09:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants