Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CI] Minimal jobs to check API on pull requests #221

Closed
maximusron opened this issue Apr 5, 2024 · 10 comments
Closed

[CI] Minimal jobs to check API on pull requests #221

maximusron opened this issue Apr 5, 2024 · 10 comments

Comments

@maximusron
Copy link
Collaborator

maximusron commented Apr 5, 2024

The current CI has an extensive suite of jobs covering every permutation of osx, windows, linux with both clang-repl and cling following every tool that uses CppInterOp. This A) takes up a lot of resources and B) are not really necessary to validate most pull requests changing the InterOp API.

I would propose segmenting a minimal workflow that have 3 sets of jobs each for windows, os x and ubuntu that run the InterOp build and test on the clang 16-18 and only run tools like the xeus products, cppyy and WASM once that is verified. Extensive build coverage can also be migrated to the repos of the tools themselves. Essentially it would be good if we don't tag every job with pull_request but have a set that would serve as a sanity check.

The other jobs that cover every permutation can be ran post merge to check main. This can speedup workflows especially when features are being developed in parallel

@vgvassilev @mcbarton would this be something we can do?

@maximusron
Copy link
Collaborator Author

Not sure if this is the case but it seems like each job runner is building LLVM. I was wondering if we can also merge runners that use the same LLVM clang-repl/cling build to test multiple tools if that can make the jobs more efficient. That might make debugging failures harder, but is there any other way we can improve this config?

@mcbarton
Copy link
Collaborator

mcbarton commented Apr 5, 2024

Reducing the extensiveness of the CI shouldn't be done in my opinion. It may seem silly to have an extensive PR to test small changes, but small changes can have unexpected consequences, and the CI can help catch them before they end up in the main.

The main reason I see for the PRs being slow when having multiple PRs is due to limitations of the caching system and number of Github runners which can be used simultaneously accross the organisation.

The llvm cache has to be created for each PR, and cannot be shared with the main branch or between PRs. This will overwhelm the cache storage limit even with the number of jobs reduced. An alternative to the cache of llvm builds may be to custom VMs to run the workflows on. See the beta feature here

github/roadmap#826

I don't see a solution to the number of simultaneous runners, but the custom vm option once available should allow for a dramatic speed up in PR run time.

@vgvassilev
Copy link
Contributor

I believe the cache in main can be reused. I think for llvm17 we can just take the binary package as we do not have patches on top iirc. The same for llvm18...

@mcbarton
Copy link
Collaborator

mcbarton commented Apr 6, 2024

We apply a patch for llvm 17 so I don't believe we can use the binary packages for this version. llvm 18 requires no patches, so a binary package could be used once a web assembley build is available (don't believe this is currently available on emscripten forge though).

@vgvassilev can you point me to an example reusing the cache from the main branch? I will then modify the ci based on this. This change should alleviate the problem for now.

@vgvassilev
Copy link
Contributor

Other ci builds give priority on the cache from the main branch and the PRs do not need to build its own llvm caches. That probably means that the caches of main fit in the 10GB limits...

I do not think the current situation should stay because it practically slows down the development by days and @maximusron's bandwidth in that project will grow...

@alexander-penev, do you have an idea what we can do here?

@mcbarton
Copy link
Collaborator

mcbarton commented Apr 6, 2024

Reusing the main branch cache and using binary packages for llvm 18 should bring the cache under 10GB, and speed up PR runs considerably. I think the name is this issue should be renamed since its about solving the caching issue rather than reducing the number of jobs.

@vgvassilev
Copy link
Contributor

Ok, fair point, we need to figure out how to address it in a smart way...

@maximusron
Copy link
Collaborator Author

I understand we want to keep the coverage the same but I still stand by some reorganisation to make it easier to go through the logs. We want a set of jobs that are somewhat primary like just the CppInterOp builds on all clang versions. The passing of this workflow can be made the condition to run other sections for just cppyy, xeus (similar to how this PR #223 makes the runs smarter)

Something like this will definitely help because there can be developers working only on CppInterOp. Cppyy developers will need to look at InterOp + cppyy and xeus developers would look at InterOp + xeus.

@mcbarton
Copy link
Collaborator

@maximusron do you agree this issue of the ci taking too long has been fixed now?

@maximusron
Copy link
Collaborator Author

Closing as this is fixed by #229 #227 #249 #270 and probably a few other PR's here and there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants