Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use julia-actions/cache in CI #1004

Closed
wants to merge 3 commits into from
Closed

Conversation

rikhuijzer
Copy link

Thanks to Ian Butterworth, julia-actions/cache caches ~/.julia/compiled too (julia-actions/cache#71).

Maybe interesting for the SciML ecosystem.

@rikhuijzer
Copy link
Author

rikhuijzer commented Dec 1, 2023

I've set fail_fast: false in 3aec1fb so that GitHub doesn't automatically cancel all runs if one fails.

@devmotion
Copy link
Member

I think generally in SciML it's better to use fail_fast: true since there's already a very high computational burden due to CI in SciML and usually a large number of CI jobs. So IMO everything that fails should fail immediately.

@rikhuijzer
Copy link
Author

I think generally in SciML it's better to use fail_fast: true since there's already a very high computational burden due to CI in SciML and usually a large number of CI jobs. So IMO everything that fails should fail immediately.

I think that makes sense if you assume developer time has little value

@ChrisRackauckas
Copy link
Member

What is this caching doing? Before and after?

Copy link

codecov bot commented Dec 1, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (4f8e26f) 86.09% compared to head (94a7c6e) 86.09%.
Report is 2 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #1004   +/-   ##
=======================================
  Coverage   86.09%   86.09%           
=======================================
  Files          11       11           
  Lines         151      151           
=======================================
  Hits          130      130           
  Misses         21       21           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@ChrisRackauckas
Copy link
Member

Bump

@rikhuijzer
Copy link
Author

rikhuijzer commented Dec 23, 2023

What is this caching doing? Before and after?

Simply put, it is similar to what DifferentialEquations.jl had but then the complexity is moved inside https://github.com/julia-actions/cache. This allowed cache to come up with some improvements over time including the caching of the compiled directory so that precompiled binaries can be re-used between jobs, and caching of packages. Before this patch, DifferentialEquations.jl only cached ~/.julia/artifacts.

@rikhuijzer
Copy link
Author

rikhuijzer commented Dec 23, 2023

I think generally in SciML it's better to use fail_fast: true since there's already a very high computational burden due to CI in SciML and usually a large number of CI jobs. So IMO everything that fails should fail immediately.

I think that makes sense if you assume developer time has little value

I said this because at Pluto.jl there was a lot of developer time wasted by this. What would happen there is that about a dozen jobs that took about 20 minutes would be cancelled as soon as one job started to fail. In practice, this was problematic because often most jobs were about 80% done with their execution when they were cancelled. By cancelling, this 80% progress was thrown away. Sometimes, this was no problem because the problem was obvious, but sometimes the problem was not obvious and having more information about which jobs did pass could be very helpful in debugging.

@ChrisRackauckas
Copy link
Member

I said this because at Pluto.jl there was a lot of developer time wasted by this. What would happen there is that about a dozen jobs that took about 20 minutes would be cancelled as soon as one job started to fail. In practice, this was problematic because often most jobs were about 80% done with their execution when they were cancelled. By cancelling, this 80% progress was thrown away. Sometimes, this was no problem because the problem was obvious, but sometimes the problem was not obvious and having more information about which jobs did pass could be very helpful in debugging.

Looking at the Pluto tests, those tests are quite short and light. In comparison, an OrdinaryDiffEq run SciML/OrdinaryDiffEq.jl#2092 can easily take around 10 hours of compute time, split across jobs, but that's just a lot. Withs the tens of contributors active across SciML, if we're not actively canceling jobs it can take a few hours to get an open machine to start running tests, let alone finish. So fail fast generally makes tests run hours faster because the queue is no longer clogged and it makes them start faster. We're trying to secure more money for more resources but it's not easy to come by.

Simply put, it is similar to what DifferentialEquations.jl had but then the complexity is moved inside https://github.com/julia-actions/cache. This allowed cache to come up with some improvements over time including the caching of the compiled directory so that precompiled binaries can be re-used between jobs, and caching of packages. Before this patch, DifferentialEquations.jl only cached ~/.julia/artifacts.

Does this work for the way we have the groups setup in OrdinaryDiffEq?

@ChrisRackauckas
Copy link
Member

Is there a way to precompile once and then use that for all of the groups?

@rikhuijzer
Copy link
Author

Is there a way to precompile once and then use that for all of the groups?

I don’t know

Thanks to Ian Butterworth, `julia-actions/cache` caches `~/.julia/compiled` too (julia-actions/cache#71).

Maybe interesting for the SciML ecosystem.
@ChrisRackauckas
Copy link
Member

@rikhuijzer rikhuijzer closed this by deleting the head repository Apr 16, 2024
@ChrisRackauckas
Copy link
Member

Is this incorporated into setup-julia now or something? What's the reason to close?

@rikhuijzer
Copy link
Author

Is this incorporated into setup-julia now or something? What's the reason to close?

The close was unintentional. I was just removing old forks from GitHub. Sorry for the noise

@ChrisRackauckas
Copy link
Member

@thazhemadam will this kind of caching be addressed in the centralization changes?

@thazhemadam
Copy link
Member

Yes, I was planning on having it available as a default, with an option to opt out.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants