Adding version 2 of the SDW CI scripts. Update docs (diagrams TODO)
mig5 committed Mar 4, 2024
1 parent 92bfa5b commit 86fe060
Showing 23 changed files with 891 additions and 4,683 deletions.
146 changes: 59 additions & 87 deletions INSTALL.md
This document explains how to install the CI for Securedrop Workstation.

It involves a combination of dom0 and VM configuration on a Qubes installation, as well as steps in
Github.

This guide assumes you'll be running Qubes as a virtual machine on a hypervisor such as VMware.

# Qubes install and initial provisioning

5. Update dom0 and install `make`, referring again to
[the next section of SDW docs](https://workstation.securedrop.org/en/stable/admin/install.html#apply-dom0-updates-estimated-wait-time-15-30-minutes)

In our case, we also install `open-vm-tools` and run `sudo systemctl enable vmtoolsd`,
as our scripts use vmtoolsd to issue commands to dom0 via the VMware API.

```
sudo qubes-dom0-update make open-vm-tools
```
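
After installing, enable the service so dom0 can receive guest commands (a minimal sketch; `vmtoolsd` is the standard unit name shipped by `open-vm-tools`):

```
sudo systemctl enable vmtoolsd
sudo systemctl start vmtoolsd
# Sanity check: the service should report "active (running)"
systemctl status vmtoolsd
```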

6. Run any updates you see in the Qubes menu and then reboot.

7. In dom0, create the sd-dev StandaloneVM:

```
sudo qvm-create --standalone --template fedora-37 --label red sd-dev
qvm-volume resize sd-dev:root 50G
qvm-volume resize sd-dev:private 20G
qvm-tags sd-dev add sd-client
```

# Install dependencies on sd-dev VM

1. Open a terminal in the sd-dev VM and perform the following steps to install the core dependencies:

```
sudo dnf install rpm-build dnf-plugins-core
```

2. Set up Docker:

```
sudo dnf config-manager --add-repo https://download.docker.com/linux/fedora/docker-ce.repo
sudo usermod -a -G docker user
systemctl enable docker
```
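
After logging out and back in (so the `docker` group membership takes effect), you can optionally confirm the setup with a quick smoke test (assumes the sd-dev VM has network access to pull the image):

```
docker run --rm hello-world
```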

Set up the sd-dev machine to automatically start at boot.
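
This can be done by checking 'Start qube automatically on boot' in the Qubes settings, or from a dom0 terminal:

```
qvm-prefs sd-dev autostart True
```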

# Snapshot the VM

At this point, if you're using VMware, you'll want to shut down and snapshot the VM, as it's now
in a good state and could be cloned to make more of them!
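
For example, on a host running VMware Workstation the snapshot can be taken with `vmrun` (a sketch; the `.vmx` path and snapshot name are placeholders, and other VMware products may use different tooling):

```
# Shut the VM down gracefully, then snapshot the clean state
vmrun -T ws stop /path/to/qubes-4.2.vmx soft
vmrun -T ws snapshot /path/to/qubes-4.2.vmx clean-base
```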

# Configure the scripts on GitHub

1. Generate a PAT in Github with full `repo:` access and ensure that the PAT is written to
`sd-dev/.sdci-ghp.txt` on the host machine that will execute `run.py`. This will be used by
`status.py`, so that the script can post git commit statuses back to Github (see the sketch at
the end of this section).

2. Configure the webhook in your repository for the 'push' event, with the same secret you put in
the systemd file.

The Payload URL of the webhook should be `https://ws-ci-runner.securedrop.org/hook/postreceive` and
the Content type should be `application/json`. Ensure you keep `Enable SSL verification` turned on.
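
A minimal sketch of step 1 above, writing the token file with restrictive permissions on the host machine (the token value is a placeholder):

```
echo 'ghp_XXXXXXXXXXXXXXXXXXXX' > sd-dev/.sdci-ghp.txt
chmod 600 sd-dev/.sdci-ghp.txt
```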

# Test

Test the CI flow with `./run.py --version 4.1 --commit [some commit hash]`

# Options for `run.py`

There are a few options for `run.py`, which is the main entry point that the webhook service calls.

## `--version [4.1|4.2]`

Set the version number of Qubes you are going to be running on, for example, 4.1 or 4.2.

This helps the script find a VM with that version in its name, to use for the CI run.

## `--commit [sha]`

If you pass a commit hash, it is understood that you want to run CI tests against that commit.

## `--snapshot [id]`

If you pass this option, the VM will be reverted to this snapshot (if it exists) before being
powered up.

If you do not pass this option, a snapshot ID will be read from the config file for this
VM, and the VM will be restored to that snapshot instead. (There is never a scenario in which
the VM is *not* restored from a snapshot first, as that is our way of guaranteeing a 'clean
start'.)

## `--update`

If you pass this flag, the system will boot the Qubes VM and run dom0, template and StandaloneVM
updates via salt in the standard Qubes way.

If you also passed `--commit`, it will be understood that you want to run CI tests immediately
after having applied the updates. In this case, it will reboot the VM. This flow is useful for
running 'nightly' tests.

## `--save`

If you pass this flag, the system will save a new snapshot of the VM and store the new snapshot
ID in the config file. This option is mainly meant to be used in conjunction with `--update`,
e.g. as part of an automatic routine patching procedure.
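
Putting the flags together, a nightly maintenance run followed by a test run might look like this (the commit hash is a placeholder):

```
# Apply updates to the Qubes VM, then save a fresh snapshot of the clean state
./run.py --version 4.2 --update --save

# Run CI against a specific commit, restoring the stored snapshot first
./run.py --version 4.2 --commit 1a2b3c4d
```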

79 changes: 25 additions & 54 deletions README.md

## About

This collection of scripts is for running the securedrop-workstation CI on a hypervisor that is
running Qubes virtual machines.

## Installation instructions

Please see the [INSTALL.md](INSTALL.md).

## How it works

1. The webhook in Github delivers the payload to a remote server via HTTPS.

2. The server passes that payload to a Flask service that parses the payload. This service
then posts a commit status to Github saying the build is 'queued'.

3. The Flask service executes the `run.py` script which makes calls to a VMware hypervisor
to find a Qubes VM with a matching version, restore it from snapshot and boot it.

4. The script adds various files to the dom0 and sd-dev StandaloneVM.

5. The script then instructs dom0 to run a command on the sd-dev StandaloneVM to clone the
SDW CI repository and then issue an RPC call to the dom0 to run the `dom0/runner.py`
script.

6. The runner.py reports a commit status back to Github (via sd-dev) that the build has started.

7. The runner.py tarballs up the codebase from the sd-dev VM, and proceeds with the
`make clone; make dev; make test` sequence, logging all output to a log file.

8. The runner.py then leverages the securedrop-workstation's `sdw-admin.py --uninstall --force` to
tear everything down, along with cleaning up some remaining cruft.

The runner.py will detect if any of the commands succeed or fail. If a step fails, the
whole procedure is halted. In either case, a commit status is sent back to Github indicating
whether it was a success or a failure.

9. At the end of the process, the server copies the log file from dom0 and stores it in the
same place that the commit status links to, for viewing later.

10. The Qubes VM is then powered off.

## Parallelization

The server iterates over the Qubes VMs until it finds one that is powered off. A powered-off VM
is assumed to be available for use.

If all Qubes VMs with the matching version are powered on, they are assumed to all be busy
running CI jobs already. In this case, the script sleeps and keeps retrying periodically for up
to 1 hour.
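
As a rough illustration (an assumption; the production scripts talk to the VMware API rather than the CLI), a Workstation-style host can list which VMs are currently powered on, and any matching Qubes VM not in that list would be considered available:

```
vmrun list
```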
