Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support self-hosted Cirrus workers on forks #29274

Open
wants to merge 5 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Jump to
Jump to file
Failed to load files.
Diff view
Diff view
51 changes: 36 additions & 15 deletions .cirrus.yml
Expand Up @@ -8,25 +8,42 @@ env: # Global defaults
CCACHE_DIR: "/tmp/ccache_dir"
CCACHE_NOHASHDIR: "1" # Debug info might contain a stale path if the build dir changes, but this is fine

# A self-hosted machine(s) can be used via Cirrus CI. It can be configured with
# multiple users to run tasks in parallel. No sudo permission is required.
#
# https://cirrus-ci.org/guide/persistent-workers/
#
# It is possible to select a specific persistent worker by label. Refer to the
# Generally, a persistent worker must run Ubuntu 23.04+ or Debian 12+.
#
# The following specific types should exist, with the following requirements:
# - small: For an x86_64 machine, recommended to have 2 CPU cores and 8 GB of memory.
# - medium: For an x86_64 machine, recommended to have 4 CPU cores and 16 GB of memory.
# - noble: For a machine running the Linux kernel shipped with exactly Ubuntu Noble 24.04. The machine is recommended to have 4 CPU cores and 16 GB of memory.
# - arm64: For an aarch64 machine, recommended to have 2 CPU cores and 8 GB of memory.
#
# CI jobs for the latter two configurations can be skipped by setting NO_ARM=1 or
# NO_NOBLE=1 as a custom env variable in Cirrus.
#
# The above machine types are matched to each job by their label. Refer to the
# Cirrus CI docs for more details.
#
# Generally, a persistent worker must run Ubuntu 23.04+ or Debian 12+.
# Specifically,
# On machines that are persisted between CI jobs, RESTART_CI_DOCKER_BEFORE_RUN=1
# ensures that previous containers and artifacts are cleared before each run.
# This requires installing Podman instead of Docker.
#
# Futhermore:
# - apt-get is required due to PACKAGE_MANAGER_INSTALL
# - podman-docker-4.1+ is required due to the use of `podman` when
# RESTART_CI_DOCKER_BEFORE_RUN is set and 4.1+ due to the bugfix in 4.1
# - podman-docker-4.1+ is required due to the bugfix in 4.1
# (https://github.com/bitcoin/bitcoin/pull/21652#issuecomment-1657098200)
# - The ./ci/ depedencies (with cirrus-cli) should be installed:
# - The ./ci/ dependencies (with cirrus-cli) should be installed. One-liner example
# for a single user setup with sudo permission:
#
# ```
# apt update && apt install git screen python3 bash podman-docker curl -y && curl -L -o cirrus "https://github.com/cirruslabs/cirrus-cli/releases/latest/download/cirrus-linux-$(dpkg --print-architecture)" && mv cirrus /usr/local/bin/cirrus && chmod +x /usr/local/bin/cirrus
# ```
#
# - There are no strict requirements on the hardware, because having less CPUs
# runs the same CI script (maybe slower). To avoid rare and intermittent OOM
# - There are no strict requirements on the hardware, because having fewer CPU threads
# run the same CI script (maybe slower). To avoid rare and intermittent OOM
# due to short memory usage spikes, it is recommended to add (and persist)
# swap:
#
Expand All @@ -40,15 +57,15 @@ env: # Global defaults
# RESTART_CI_DOCKER_BEFORE_RUN=1 screen cirrus worker run --labels type=todo_fill_in_type --token todo_fill_in_token
# ```
#
# The following specific types should exist, with the following requirements:
# - small: For an x86_64 machine, recommended to have 2 CPUs and 8 GB of memory.
# - medium: For an x86_64 machine, recommended to have 4 CPUs and 16 GB of memory.
# - noble: For a machine running the Linux kernel shipped with exaclty Ubuntu Noble 24.04. The machine is recommended to have 4 CPUs and 16 GB of memory.
# - arm64: For an aarch64 machine, recommended to have 2 CPUs and 8 GB of memory.

# https://cirrus-ci.org/guide/tips-and-tricks/#sharing-configuration-between-tasks
filter_template: &FILTER_TEMPLATE
skip: $CIRRUS_REPO_FULL_NAME == "bitcoin-core/gui" && $CIRRUS_PR == "" # No need to run on the read-only mirror, unless it is a PR. https://cirrus-ci.org/guide/writing-tasks/#conditional-task-execution
# No need to run on the read-only mirror, unless it is a PR.
# https://cirrus-ci.org/guide/writing-tasks/#conditional-task-execution
skip: $CIRRUS_REPO_FULL_NAME == "bitcoin-core/gui" && $CIRRUS_PR == ""
# Allow forks to skip CI runs when a branch is pushed.
# Use only_if rather than skip, because otherwise all jobs are marked as
# skipped when a pull request is first opened.
only_if: $NO_BRANCH != "true" || $CIRRUS_PR != ""
stateful: false # https://cirrus-ci.org/guide/writing-tasks/#stateful-tasks

base_template: &BASE_TEMPLATE
Expand Down Expand Up @@ -106,7 +123,9 @@ task:

task:
name: 'ARM, unit tests, no functional tests'
# Skip if no ARM worker exists (for a fork)
<< : *GLOBAL_TASK_TEMPLATE
skip: $NO_ARM == "true"
persistent_worker:
labels:
type: arm64 # Use arm64 worker to sidestep qemu and avoid a slow CI: https://github.com/bitcoin/bitcoin/pull/28087#issuecomment-1649399453
Expand Down Expand Up @@ -161,11 +180,13 @@ task:

task:
name: 'ASan + LSan + UBSan + integer, no depends, USDT'
# Skip if no Noble worker exists (for a fork)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this needed? Won't this be skipped already by default if the worker is offline?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It stays pending, at least in the Cirrus interface.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It stays pending, at least in the Cirrus interface.

But that seems preferable to me, as it communicates clearly that the task didn't run.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying that UX in Sjors#36 ...

Copy link
Member Author

@Sjors Sjors Feb 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the first push of a branch, due to the UI glitched described below (#29274 (comment)) they show as skipped:

Scherm­afbeelding 2024-02-27 om 17 40 10

That seems fine. On the second push they are put in "queued sate" (forever?).

Scherm­afbeelding 2024-02-27 om 17 56 11

I agree that hiding the task altogether is not ideal (as ecf01f4 does). Perhaps marking it as skipped is better.

Copy link
Member Author

@Sjors Sjors Feb 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well for one thing it's way faster (using just an idle desktop computer).

Copy link
Member Author

@Sjors Sjors Feb 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's also not this black and white. Sure, Microsoft and Cirrus can screw us over by faking the CI result. But by not relying on them for hosting for the runners, it's both easier to catch them doing shenanigans and it's easier to move elsewhere.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

easier to catch them

Why? Someone will have to run a fully local backup either way, whose cost is independent of what is running on the other side.

easier to move elsewhere

Why? Moving elsewhere also requires moving the repo off of the GitHub mirror, or writing a GH integration and a CI framework from scratch, both of whose costs are largely independent of where the CI CPUs happen to sit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I run a self-hosted CI I can see what it's doing. If it fails on my machine and is marked as successful on Github, I can see that. If I collude with Github that's a different story of course.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CI cleans up after itself, latest when the next run starts, so I don't think it is reliably possible to inspect a failure of a remote-triggered CI run locally, even assuming no "attack" from GH.

More importantly, the CI machine is literally doing RCE of downloaded binary executables and random code, so reliably detecting a test failure (assuming a GH attack) seems questionable at best.

enable_bpfcc_script:
# In the image build step, no external environment variables are available,
# so any settings will need to be written to the settings env file:
- sed -i "s|\${CIRRUS_CI}|true|g" ./ci/test/00_setup_env_native_asan.sh
<< : *GLOBAL_TASK_TEMPLATE
skip: $NO_NOBLE == "true"
persistent_worker:
labels:
type: noble # Must use this specific worker (needed for USDT functional tests)
Expand Down
10 changes: 8 additions & 2 deletions .github/workflows/ci.yml
Expand Up @@ -9,7 +9,10 @@ on:
# See: https://docs.github.com/en/actions/using-workflows/events-that-trigger-workflows#push.
push:
branches:
- '**'
# Do not run branch pushes on forks, because they are typically redundant with
# pull requests to the main repo.
- 'bitcoin/**'
- 'bitcoin-core/**'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No objection, but I presume this will cause issues on forks (Inquisition, Knots, ...)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, this probably needs an ENV flag.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Knots pull requests seem to be typically made against the 25.x-knots branch. Those would run normally. But with the above code CI would not run when e.g. a future 26.x.knots is created (or rebased).

Inquisition seems to follow the some pattern.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aaargh, afaik Github CI doesn't support custom ENV variables for forks. So this is solved, but on Github forks are unconditionally skipped.

tags-ignore:
- '**'

Expand All @@ -26,7 +29,10 @@ jobs:
test-each-commit:
name: 'test each commit'
runs-on: ubuntu-22.04
if: github.event_name == 'pull_request' && github.event.pull_request.commits != 1
# Only run on pull requests, skip if there's only commit.
# Do not run on forks.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nicer to fix the actual problem, but I don't understand what this job is doing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hebasto @ryanofsky any thoughts on how to make the test-each-commit task work for a scenario like in Sjors#30 where the base branch is fork/some-branch and the PR branch is fork/some-pr?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Example of earlier failure: https://github.com/Sjors/bitcoin/actions/runs/7556559882/job/20573828392?pr=28#step:4:42

 shell: /usr/bin/bash -e {0}
  env:
    DANGER_RUN_CI_ON_HOST: 1
    CI_FAILFAST_TEST_LEAVE_DANGLING: 1
    MAKEJOBS: -j10
    MAX_COUNT: 6
    FETCH_DEPTH: 8
Warning: you are leaving 1 commit behind, not connected to
any of your branches:

  3b5c5d4 ci: add self-hosted runners for GitHub

If you want to keep it by creating a new branch, this may be a good time
to do so with:

 git branch <new-branch-name> 3b5c5d4

HEAD is now at cbe740b ci: vary /tmp/env
fatal: bad revision '^^@'

if: github.event_name == 'pull_request' && github.event.pull_request.commits != 1 &&
(github.repository == 'bitcoin-core/gui' || github.repository == 'bitcoin/bitcoin')
timeout-minutes: 360 # Use maximum time, see https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#jobsjob_idtimeout-minutes. Assuming a worst case time of 1 hour per commit, this leads to a --max-count=6 below.
env:
MAX_COUNT: 6
Expand Down