
ci: rfq: migrate linux CI jobs to namespace#35144

Draft
willcl-ark wants to merge 8 commits into bitcoin:master from willcl-ark:namespace-runners

Conversation

@willcl-ark
Member

  • Migrate linux CI jobs over to namespace.so
  • Configure docker to use namespace's remote docker builders (and implicit shared docker buildkit cache)
  • Configure caches to use the namespace cache volumes
  • Fix up the kernel headers needed in the ASAN job for the USDT tests.
    • The Namespace hypervisor/host runs a newer kernel whose headers are not packaged by the Ubuntu version in the container, but Namespace provides in-kernel headers on the host, which we can mount into the container.
  • Ensure CI still works on GHA for forks
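A minimal sketch of the header bind-mount idea from the list above. The kernel version and paths here are illustrative assumptions, not the PR's exact configuration; on a real runner `kver` would come from `uname -r`:

```shell
#!/bin/sh
# Illustrative only: compute the docker bind-mount flags that would expose the
# hypervisor's in-kernel headers to the CI container. On a real runner, kver
# would be "$(uname -r)"; the paths below are assumptions for this sketch.
kver="6.8.0-example"
headers="/usr/src/linux-headers-${kver}"
modules="/lib/modules/${kver}"
echo "docker run --rm -v ${headers}:${headers}:ro -v ${modules}:${modules}:ro ..."
```

The point is that the container sees the headers at the same path the host publishes them, so in-container tooling that resolves headers by kernel version keeps working.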

@DrahtBot DrahtBot added the Tests label Apr 23, 2026
@DrahtBot
Contributor

DrahtBot commented Apr 23, 2026

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Reviews

See the guideline for information on the review process.
A summary of reviews will appear here.

Conflicts

Reviewers, this pull request conflicts with the following ones:

  • #35140 (ci: Temporarily move CI from Cirrus to GHA by maflcko)
  • #33593 (guix: Use UCRT runtime for Windows release binaries by hebasto)

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@willcl-ark
Member Author

With the announcement that Cirrus-app is closing down after being acquired by OpenAI, we need to take action on our runners (again), sadly.

As previously there are two main options:

  1. Self-hosted
  2. Hosted

I have investigated self-hosting twice now, and don't think it's right for us. If we use a single (xl) machine, we can natively share docker buildkit, ccache, depends, etc. caches. With more than one machine, you start to need additional servers spun up to host these services, or you accept sub-optimal caching. On the security side, the GitHub runner service needs to be isolated in a VM, with all the plumbing that entails. I don't think this is worth it unless we want to commit a few engineers to managing it with a decent chunk of their time.

(I do have a nix configuration for n servers each running m Firecracker VMs with a github-runner service in each, which will all register with GH and run jobs, but no shared caching of any kind is configured. If anyone's interested in this or wants to try to extend it, I can share it.)

On the hosted side, there are still many options about. I looked at Runs-on, WarpBuild and Namespace, as these three had the most acceptable GitHub app permission requirements; that criterion rules many others out (they usually wanted admin: Read/Write).

Runs-on

Runs-on would be the cheapest, (probably) requires the most hands-on maintenance, and has the least-likeable permissions.

Runs-on provisions runners as AWS instances in your own AWS account, using a CloudFormation configuration they supply, so you effectively run the CI on "your own (AWS)" machines at cost price.

You can't really get any cheaper without self-hosting. The downsides are that:

  • you might need an AWS maintenance person
  • the "Github app" you install does need some write permissions, although they claim that "it's generated by your own cloudformation stack, so it should be trustworthy". I could not verify this independently.

WarpBuild

Decent outfit. Runners are good, fast and highly concurrent. Drop-in caching and docker builder solutions. Simply pay as you go. I don't think there is a limit to concurrency.

No dangerous org permissions are needed and the management UI is nice, but they are expensive. I did not contact them for volume pricing because they already came out at more than double Namespace's rates, at which point I moved on.

Namespace

They seem to be roughly equivalent to WarpBuild feature-wise, offering drop-in caching (see below), docker builders and concurrent runners. The concurrency is limited by the contract, but can be adjusted. You can pay as you go, or enter contracted annual minute/cache allowances.

namespace caching

This is a little different to the other two: instead of saving and restoring cache blobs, you get volume mounts. I have found the cache hit rate of these to be mixed, which is apparently a known property: there is a volume per runner (or group?), which is then mounted. As these gradually warm up across all jobs, the hit rate should improve. There are perhaps other approaches we could use to try to improve this, but it is worth watching, especially if we are being billed per minute!

It does sound like the cache hit rate will never be 100% on, say, a doc-change PR, unless you luck out and all jobs land on the same runners. TBD.

Cache mounts (and saves) are near-instant though, so you could save 30-60 seconds per job there. And perhaps in long-running operation these hit rates all rise to about ninety-something percent and this is not an issue.
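To put rough numbers on the mount-speed upside, a back-of-envelope sketch (every figure below is an assumption for illustration, not a measurement from this PR):

```shell
#!/bin/sh
# Back-of-envelope comparison of blob-style cache transfer vs near-instant
# volume mounts, per CI push. All numbers are illustrative assumptions.
jobs=12         # assumed number of Linux CI jobs per push
blob_cost=45    # assumed seconds per job to restore+save blob caches
mount_cost=1    # volume mounts are roughly instant
saved=$(( jobs * (blob_cost - mount_cost) ))
echo "approx seconds saved per push: ${saved}"
```

Even with mixed hit rates, the per-job mount savings compound across a matrix of jobs, which is why the volume-mount model can still come out ahead on billed minutes.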


Overall, my conclusions are:

  • I don't think we have the (human) resources to self-host, unless we want a single machine (and therefore non-scalable) setup.
  • My preference is to go with hosted runners from Namespace, if the caching is OK, and the price is right for us.

@furkansahin

furkansahin commented Apr 23, 2026

Shameless plug, but given the criteria you've laid out, I think it's worth adding Ubicloud to the list. I am a software engineer there, so take this with the appropriate grain of salt.

On price. Pricing is public and several times cheaper than GitHub-hosted runners at equivalent specs, including on ARM64. Happy to talk volume/committed-use for a project at Bitcoin's scale.

On caching. We offer a drop-in GitHub Actions cache backend (swap the action, keep your workflow) that stores blobs rather than volume-mounting, so the hit-rate pathology you described for Namespace doesn't apply. A doc-only PR will get the same hits any other runner would.

On usage. We run CI for a bunch of OSS projects with similar shapes to Bitcoin Core, so the heavy-C++-build / depends-cache / ccache workflow isn't unfamiliar territory.

Usage is the usual runs-on: ubicloud-standard-8 (or -arm) swap. If it's useful, I can give you some credits so you can test the functionality. Every account gets 1,250 minutes of Ubicloud runner time for free every month.

Feel free to reach out to me furkan[at]ubicloud[dot]com or support[at]ubicloud[dot]com

@hebasto
Member

hebasto commented Apr 23, 2026

@willcl-ark

Thank you for the detailed analysis of the available options. Could you additionally provide details on the supported architectures for them?

@willcl-ark
Member Author

Shameless plug, but given the criteria you've laid out, I think it's worth adding Ubicloud to the list. I am a software engineer there, so take this with the appropriate grain of salt.

I did not test Ubicloud as I saw in your https://www.ubicloud.com/docs/github-actions-integration/quickstart that the github app needed Administration: Read + Write permission on the org, which isn't acceptable for our use-case.

Is there a way around that requirement?

@willcl-ark
Member Author

@willcl-ark

Thank you for the detailed analysis of the available options. Could you additionally provide details on the supported architectures for them?

@hebasto I reviewed them manually previously, but had Claude put together a table, which I fixed up a little myself:

| Provider  | Linux x64/amd64 | Linux arm64                     | Windows x64 | Windows arm64 | macOS arm64 (Apple Silicon)        |
|-----------|-----------------|---------------------------------|-------------|---------------|------------------------------------|
| Namespace |                 | ✅ (AmpereOne or Apple Silicon) |             |               | ✅ (early access)                  |
| RunsOn    |                 |                                 |             |               |                                    |
| WarpBuild |                 |                                 |             |               | ✅ (M4 Pro, M5 Pro just announced) |
| Ubicloud  |                 |                                 |             |               |                                    |

Notes:

  • Namespace Linux-on-Apple-Silicon is in early access.
  • None of the four advertise Windows arm64.

If you want to know more details, let me know. There are even more providers out there, but these are the ones I looked at due to their app permission requirements.

  • The action now also chooses runner-specific Docker setup rather than just a cache backend.
  • Rename the interface so the workflow can describe whether a job runs on GitHub-hosted or Namespace infrastructure.
  • Replace Cirrus-specific provider names and runner labels with Namespace profiles.
  • Namespace cannot persist the old cache layout under runner.temp due to the way that caches are now volume mounts: https://namespace.so/docs/solutions/github-actions/caching. Put reusable state under stable cache paths while leaving the working tree and build directory in the temporary workspace, and mount those exact paths into the container so the build layout stays unchanged.
  • Keep the local restore and save cache actions, but switch them to Namespace mounts on Namespace runners and GitHub path caches on GitHub-hosted runners. This preserves Namespace-native cache volumes in the main repository while restoring ccache, depends, and previous-release reuse for fallback Linux jobs such as 32-bit ARM and forks.
  • Update the CI README to match the Namespace-based workflow: required runner profiles, required actions, cache setup, and the remaining jobs that still run on GitHub-hosted infrastructure. Importantly, record that the runner profile's cache allow-list should restrict updates to the default branch when the desired behavior is restore on pull requests but persist only from the main branch.
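The restore-on-pull-requests / persist-only-from-the-default-branch behavior can be approximated in a job step. This is a hypothetical sketch using the standard GITHUB_REF variable set by Actions, not this PR's actual implementation:

```shell
#!/bin/sh
# Decide whether this run may write caches back: restore everywhere,
# but persist only from the default branch. GITHUB_REF is set by GitHub
# Actions; the fallback default here is only for local experimentation.
ref="${GITHUB_REF:-refs/heads/master}"
if [ "${ref}" = "refs/heads/master" ]; then
  cache_mode="read-write"
else
  cache_mode="read-only"
fi
echo "cache mode: ${cache_mode}"
```

The same gate can also live server-side in the runner profile's cache allow-list, as the commit notes suggest, which avoids trusting the workflow itself to enforce it.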