Adding version 2 of the SDW CI scripts #40

Merged: 17 commits, merged on Mar 13, 2024. Showing changes from 12 commits.
INSTALL.md (153 changes: 68 additions & 85 deletions)
This document explains how to install the CI for Securedrop Workstation.

It involves a combination of dom0 and VM configuration on a Qubes installation, as well as steps in
Github.

This guide assumes you'll be running Qubes as a virtual machine on a hypervisor such as VMware.

# Qubes install and initial provisioning

hardware) by running `qvm-start sys-usb`.
5. Update dom0 and install `make`, referring again to
[the next section of SDW docs](https://workstation.securedrop.org/en/stable/admin/install.html#apply-dom0-updates-estimated-wait-time-15-30-minutes)

In our case, we also install `open-vm-tools` and run `sudo systemctl enable vmtoolsd`,
as our scripts use vmtoolsd to issue commands to the dom0 from the VMware API.

```
sudo qubes-dom0-update make open-vm-tools
```

6. Run any updates you see in the Qubes menu and then reboot.
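
   If you prefer the command line, roughly equivalent updates can be applied from a dom0
   terminal. This is a sketch of the standard Qubes update commands, not something specific to
   this repo:

```
sudo qubes-dom0-update
sudo qubesctl --skip-dom0 --templates state.sls update.qubes-vm
```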

7. In dom0, create the sd-dev StandaloneVM. On Qubes 4.2, you can use the fedora-38-xfce template.

```
sudo qvm-create --standalone --template fedora-38 --label red sd-dev
qvm-volume resize sd-dev:root 50G
qvm-volume resize sd-dev:private 20G
```

Also ensure that you check the box to 'Start qube automatically on boot' in the Qubes settings.
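
Alternatively, the autostart preference can be set from a dom0 terminal, which should be
equivalent to the checkbox in the Qubes settings dialog:

```
qvm-prefs sd-dev autostart True
```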

# Install dependencies on sd-dev VM

1. Open a terminal in the sd-dev VM and perform the following steps to install the core dependencies:

```
sudo dnf install rpm-build dnf-plugins-core
```

2. Set up Docker:

```
sudo dnf config-manager --add-repo https://download.docker.com/linux/fedora/docker-ce.repo
sudo dnf install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
sudo usermod -a -G docker user
sudo systemctl enable docker
```
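
As a quick sanity check (assuming the qube has network access, and after logging out and back in
so the group change takes effect), you can confirm Docker works for the unprivileged user:

```
sudo systemctl start docker
docker run --rm hello-world
```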

Set up the sd-dev machine to start automatically at boot.

# Snapshot the VM

At this point, if you're using VMware, you'll want to shut down and snapshot the VM, as it's now
in a good state and could be cloned to make more of them!
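
For example, with VMware Workstation this could be done with `vmrun`; the VM path and snapshot
name below are hypothetical, and the equivalent vSphere/ESXi tooling will differ:

```
vmrun -T ws stop /path/to/qubes-sd-ci-4.2.vmx soft
vmrun -T ws snapshot /path/to/qubes-sd-ci-4.2.vmx clean-base
```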

# Configure the scripts on GitHub

1. Generate a PAT in Github with full `repo:` access and ensure that the PAT is written to
`sd-dev/.sdci-ghp.txt` on the host machine that will execute `run.py`.
This will be used by `status.py`, so that the script can post git commit statuses back to Github.

2. Configure the webhook in your repository for the 'push' event, with the same secret you put in
the systemd file.

The Payload URL of the webhook should be `https://ws-ci-runner.securedrop.org/hook/postreceive` and
the Content type should be `application/json`. Ensure you keep `Enable SSL verification` turned on.

# Test

Test the CI flow with `./run.py --version 4.1 --commit [some commit hash]`.

# Options for `run.py`

There are a few options for `run.py`, which is the main entry point that the webhook service calls.

## `--version [4.1|4.2]`

Set the version number of Qubes you are going to be running on, for example, 4.1 or 4.2.

This helps the script find a VM with that version in its name, to use for the CI run.

## `--commit [sha]`

If you pass a commit hash, it will be understood that you want to run CI tests against that commit.

## `--snapshot [id]`

If you pass this option, the VM will be reverted to the given snapshot (if it exists) before being
powered up.

If you do not pass this option, a snapshot ID will be read from the config file for this
VM, and the VM will be restored to that snapshot instead. (There is never a scenario whereby
the VM is *not* restored from a snapshot first, as that is our way of guaranteeing a 'clean
start'.)

## `--update`

If you pass this flag, the system will boot the Qubes VM and run dom0, template and StandaloneVM
updates via salt in the standard Qubes way.

If you also passed `--commit`, it will be understood that you want to run CI tests immediately
after having applied the updates. In this case, it will reboot the VM after applying updates
but before running the CI test suite. This flow is useful for running 'nightly' tests.

## `--save`

If you pass this flag, the system will save a new snapshot of the VM and store the new snapshot
ID in the config file. This option is mainly meant to be used in conjunction with `--update`,
e.g. as part of an automatic routine patching procedure.


# Options for `nightlies.py`

The `nightlies.py` script is designed to be run via cron or a similar scheduler. It takes
`--branch` as an argument.

It will clone the repo, check out that branch, detect the appropriate Qubes version from that
branch, detect the latest commit, then run `run.py` with the `--update` flag and the `--commit`
hash.

This is designed to apply software updates in Qubes, stop/start the guest and then proceed with
CI.
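
As a sketch, a crontab entry on the host might look like the following; the repository path,
branch name and log location are all assumptions:

```
# Run the SDW CI nightlies at 02:00 every day
0 2 * * * cd /opt/securedrop-workstation-ci && ./nightlies.py --branch main >> /var/log/sdci-nightlies.log 2>&1
```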
README.md (77 changes: 25 additions & 52 deletions)

## About

This collection of scripts is for running the securedrop-workstation CI on a hypervisor that is
running Qubes virtual machines.

## Installation instructions

Please see the [INSTALL.md](INSTALL.md).

![Architecture diagram](SD_Qubes_CI.png)

1. The webhook in Github delivers the payload to a remote server via HTTPS.

2. The server passes that payload to a Flask service that parses the payload. This service
then posts a commit status to Github saying the build is 'queued'.

3. The Flask service executes the `run.py` script, which makes calls to a hypervisor (currently
VMware) to find a Qubes VM with a matching version, restore it from snapshot and boot it.

4. The script adds various files to the dom0 and the sd-dev StandaloneVM on that Qubes VM.

5. The script then instructs dom0 to run a command on the sd-dev StandaloneVM to clone the
SDW CI repository and then issue an RPC call to the dom0 to run the `dom0/runner.py`
script.

6. The runner.py reports a commit status back to Github (via sd-dev) that the build has started.

7. The runner.py tarballs up the codebase from the sd-dev VM, and proceeds with the
`make clone; make dev; make test` sequence, logging all output to a log file.

The runner.py detects whether each of these commands succeeds or fails. If a step fails, the
whole procedure is halted. In either case, a commit status is sent back to Github indicating
whether it was a success or a failure.

8. At the end of the process, the server copies the log file from dom0 and stores it in the
same place that the commit status links to, for viewing later.

9. The Qubes VM is then powered off.

## Parallelization

The server iterates over the Qubes VMs until it finds one that is powered off. If a VM is
powered off, it is assumed to be available for use.

If all Qubes VMs with the matching version are powered on, it is assumed that they are all busy
running CI jobs already. In this case, the script sleeps and keeps retrying periodically, for up
to 1 hour.