ci: limit usage of large runners #3722
671eb60 to 6174336
@aluzzardi Where is the self-hosted runner hosted? Relying fully on the local cache is pretty tough right now: we want to rebuild the engine quite frequently, and the engine now also encapsulates the local cache, so sharing a local cache between different engine builds running in parallel is not really feasible at the moment. Then there's the remote cache, but the last time we enabled full remote caching it seemed to cause as many problems as it solved (#2365). Self-hosted runners might improve that situation, though, by allowing us to e.g. put the runner in AWS and use S3 remote caching with a VPC endpoint. I'd guess that gives us better performance, less throttling, etc. relative to free-tier GHA caching.
Currently, it's a DigitalOcean box (16 cores, 32 GB RAM). I'm flexible about moving it anywhere else; this was just a test.
I think DO has some S3-API-compatible service, so it may actually be usable.
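For illustration, a minimal sketch of how an S3-compatible endpoint could be passed to BuildKit's S3 cache backend. The helper name `s3_cache_flags` and every bucket, region, cache name, and endpoint value are placeholders, not taken from this PR. The attribute names (`type=s3`, `bucket`, `region`, `name`, `endpoint_url`, `mode`) are the ones I believe the S3 cache backend accepts; check them against the BuildKit README before relying on this.

```python
from typing import List, Optional


def s3_cache_flags(bucket: str, region: str, name: str,
                   endpoint_url: Optional[str] = None) -> List[str]:
    """Build buildctl cache flags for an S3 (or S3-compatible) bucket.

    Hypothetical helper: it only assembles strings and does not call buildctl.
    """
    attrs = ["type=s3", f"bucket={bucket}", f"region={region}", f"name={name}"]
    if endpoint_url:
        # Assumption: a non-AWS, S3-compatible service is selected this way.
        attrs.append(f"endpoint_url={endpoint_url}")
    common = ",".join(attrs)
    # mode=max asks the exporter to include intermediate layers as well.
    return ["--export-cache", common + ",mode=max", "--import-cache", common]


if __name__ == "__main__":
    # Placeholder values only.
    print(" ".join(s3_cache_flags("ci-cache", "us-east-1", "engine")))
```

The returned list would be appended to a `buildctl build` invocation; whether a given S3-compatible service works without further options is something to test.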
Also, fun related fact: I just saw someone in the buildkit Slack channel say that the S3 backend is additive. It's not like the registry backend, where you overwrite tags; it just keeps growing every time you export cache. I need to verify this, but it's pretty intriguing if true. It means we'd need to prune our cache somehow, but that's doable.
Wow, that's amazing. We could switch to EC2/S3 then. I picked DO just because I could get it set up in a few minutes.
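As a rough sketch of the pruning idea mentioned above, assuming the cache really is additive and that object age is an acceptable criterion (neither is confirmed in this thread): the function below only selects keys to delete from a listing of last-modified times. The name `keys_to_prune` and the example keys are hypothetical. Listing and deleting the objects through an S3 client is left out so the sketch runs without credentials.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def keys_to_prune(last_modified: Dict[str, datetime],
                  max_age: timedelta,
                  now: datetime) -> List[str]:
    """Return the object keys whose last-modified time is older than max_age."""
    cutoff = now - max_age
    return sorted(key for key, ts in last_modified.items() if ts < cutoff)


if __name__ == "__main__":
    now = datetime(2022, 11, 10, tzinfo=timezone.utc)
    # Hypothetical listing: key -> last-modified timestamp.
    listing = {
        "blobs/a": now - timedelta(days=30),
        "blobs/b": now - timedelta(days=1),
    }
    print(keys_to_prune(listing, timedelta(days=7), now))
```

Whether age alone is a safe criterion depends on how the backend refreshes objects it still uses, which this sketch does not address.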
6174336 to cad5bc0
@sipsma I just rebased and I'm getting: Haven't had a chance to debug yet -- could it be concurrency-related?
It's certainly possible, but I can't currently see what would cause a race condition like that. The testing around that so far has been me manually invoking Python and Go tests that do provisioning side by side, so it's not exactly thorough. My plan is to finish the switch we talked about earlier (more in the helper, less in the SDK), then start automating as much of the testing of all this as possible. I'll be sure to cover this sort of case as part of that.
fe8ac20 to 24f1714
Need more time to deal with this -- changed the PR to just limit large GH runners for
06a0cbb to 195cb24
195cb24 to 1360a6f
I don't know why readthedocs.org is failing /cc @helderco |
1360a6f to b33a0bd
Signed-off-by: Andrea Luzzardi <al@dagger.io>
b33a0bd to d5d72ab
/cc @gerhard @sipsma @vito
Low urgency. Experimented with moving our CI to self-hosted runners to harness the buildkit cache.