
Move to continuous deployment #395

Open
thewilkybarkid opened this issue Aug 10, 2021 · 0 comments

thewilkybarkid commented Aug 10, 2021

Currently, deployment involves pushing manually to two separate branches, then manually running a process in the Azure console. The goal should be that every change committed is built, tested and deployed without any human intervention.

This involves having processes and checks that are reliable (refs #388).

thewilkybarkid added a commit that referenced this issue Aug 10, 2021
This is a start at breaking up the monolithic workflows, by separating the build and deploy jobs and running them in sequence. This means that each job can be focused, rather than having to be aware of the rest of the process. For example, only the deploy job needs to know about the Azure CLI.

Refs #395
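Split into sequential jobs, the workflow might look roughly like this (job, step, and image names are illustrative, and real deployment would also need credentials):

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - run: docker build --tag app .

  deploy:
    needs: build # runs only once build has succeeded
    runs-on: ubuntu-latest
    steps:
      - uses: azure/webapps-deploy@v2 # only this job knows about Azure
        with:
          app-name: my-app
          images: my-registry.azurecr.io/app:latest
```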
thewilkybarkid added a commit that referenced this issue Aug 10, 2021
It isn't mentioned in the documentation, but reading the azure/webapps-deploy code shows that it rewrites the configuration file with new image tags if they are specified. This means we don't have to rely on tags, such as latest or develop, that change over time, and so there's more clarity.

Refs #395
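For example, the deployment can point at an image pinned to the commit SHA rather than a moving tag (app and registry names are illustrative):

```yaml
- uses: azure/webapps-deploy@v2
  with:
    app-name: my-app
    # A specific, immutable tag: the action rewrites the app's
    # configuration to pull exactly this image.
    images: my-registry.azurecr.io/app:${{ github.sha }}
```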
thewilkybarkid added a commit that referenced this issue Aug 10, 2021
The GitHub Actions syntax in a384f78 was incorrect, as the IMAGE_TAG variable was not interpolated and didn't exist in the Azure environment.

The Docker commands were working without interpolation, however, as the IMAGE_TAG is available in their environment.

As the deploy job passed when it should not have, I've opened Azure/webapps-deploy#189.

Refs #395
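The distinction is roughly this: `run` steps execute in a shell that can read `env` variables directly, while action inputs are evaluated before any shell exists, so they need the `${{ … }}` expression syntax. A sketch:

```yaml
env:
  IMAGE_TAG: ${{ github.sha }}
steps:
  # The shell interpolates $IMAGE_TAG itself, so this works.
  - run: docker build --tag "app:$IMAGE_TAG" .
  # Action inputs aren't shell commands; without the expression
  # syntax, the literal string "$IMAGE_TAG" would be passed through.
  - uses: azure/webapps-deploy@v2
    with:
      images: app:${{ env.IMAGE_TAG }}
```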
thewilkybarkid added a commit that referenced this issue Aug 10, 2021
GitHub Actions hides secrets from the logs, and the registry username is stored as a secret. As the username is just the name of the app itself, the logs are hard to read, since unrelated uses of the word are also hidden.

Usernames aren't secrets, so they can be stored in the environment instead.

Refs #395
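A sketch of the result, with only the password kept as a secret (registry and variable names are illustrative):

```yaml
env:
  REGISTRY_USERNAME: my-app # not sensitive, so the logs stay readable
steps:
  - run: |
      echo "${{ secrets.REGISTRY_PASSWORD }}" \
        | docker login my-registry.azurecr.io \
            --username "$REGISTRY_USERNAME" --password-stdin
```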
thewilkybarkid added a commit that referenced this issue Aug 10, 2021
Adds a Makefile for simplifying the use of potentially complicated commands in both CI and local development.

Refs #390, #395
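A minimal sketch of such a Makefile (target names and commands are hypothetical); CI and a developer then run the same `make lint` or `make test` rather than each remembering the full commands:

```makefile
.PHONY: lint test build

lint:
	npx eslint .

test:
	docker compose run --rm app npm test

build:
	docker build --tag app .
```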
thewilkybarkid added a commit that referenced this issue Aug 10, 2021
The IMAGE_TAG variable was accidentally overwritten in 7021c95, meaning that the first command that tried to use the real variable broke.

Refs #390, #395
thewilkybarkid added a commit that referenced this issue Aug 10, 2021
The linter was run with the --fix option, so any problem that could be fixed automatically was being fixed silently rather than reported as a failure.

This was only spotted as it caused the image to be rebuilt in 7021c95 following the code change.

Refs #395
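Keeping the fixing behaviour for local use while CI only reports might look like this (hypothetical target names):

```makefile
lint: ## CI: report problems, never rewrite files
	npx eslint .

fix: ## local convenience: apply automatic fixes
	npx eslint --fix .
```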
thewilkybarkid added a commit that referenced this issue Aug 10, 2021
The 'latest' and 'develop' tags are redundant since de9445f, as specific images are used instead. Rather than stop updating them, though, this separates the retagging from the deployment, so it now happens as an optional step after deployment (if retagging fails, it shouldn't fail the workflow).

Refs #395
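A sketch of the reordered steps, using `continue-on-error` so the retag can't fail the workflow (registry and tag names are illustrative):

```yaml
- name: Deploy
  uses: azure/webapps-deploy@v2
  with:
    images: my-registry.azurecr.io/app:${{ github.sha }}

- name: Retag as develop
  continue-on-error: true # a retag failure shouldn't fail the workflow
  run: |
    docker tag my-registry.azurecr.io/app:${{ github.sha }} \
               my-registry.azurecr.io/app:develop
    docker push my-registry.azurecr.io/app:develop
```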
thewilkybarkid added a commit that referenced this issue Aug 10, 2021
thewilkybarkid added a commit that referenced this issue Aug 10, 2021
I thought the syntax respected semantic versioning; it might be that it only works if the action has tags beginning with a 'v'.

Refs #395
thewilkybarkid added a commit that referenced this issue Aug 10, 2021
The develop workflow creates the image, so there's no need for the staging workflow to recreate it.

If the image hasn't been created by the develop workflow, then the deployment will fail. This should be the expected behaviour, rather than creating the image and not testing it fully.

Refs #395
thewilkybarkid added a commit that referenced this issue Aug 10, 2021
Uses the branch as the workflow name, rather than the jobs that the workflow performs.

Refs #395
thewilkybarkid added a commit that referenced this issue Aug 12, 2021
Docker Compose keeps containers and volumes after running by default. The integration tests need a clean database, so running an environment beforehand can lead to indeterminate results. Rather than cleaning the database before starting, this removes the containers after running. It also removes the persistent database volume, which has little value.

Refs #388, #395
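One way to sketch this is to tear everything down, including volumes, once the tests finish, while still propagating the test exit code (target and service names are hypothetical; the flags are Docker Compose's own):

```makefile
test-integration:
	docker compose up --build --exit-code-from test test; \
	status=$$?; \
	docker compose down --volumes; \
	exit $$status
```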
thewilkybarkid added a commit that referenced this issue Aug 13, 2021
When debugging integration test failures it's useful to see what the server did, so this writes all the Docker logs to a file.

Refs #388, #395
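A sketch of capturing the logs as a workflow artifact on failure (file and artifact names are illustrative):

```yaml
- name: Store Docker logs
  if: failure()
  run: docker compose logs --no-color > docker.log

- uses: actions/upload-artifact@v2
  if: failure()
  with:
    name: docker-logs
    path: docker.log
```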
thewilkybarkid added a commit that referenced this issue Aug 13, 2021
This prevents Docker Compose from overriding the NODE_ENV variable, which causes different behaviour in the entry-point script. This means that it now has to cater for a non-development environment with an empty database.

Refs #395
thewilkybarkid added a commit that referenced this issue Aug 13, 2021
The build.target setting was introduced in version 3.4, and so the file needs to declare itself incompatible with earlier versions.

Refs #395
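A minimal sketch of the version declaration (service and stage names are hypothetical):

```yaml
version: '3.4' # build.target needs at least file format 3.4

services:
  app:
    build:
      context: .
      target: prod # hypothetical build stage in the Dockerfile
```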
thewilkybarkid added a commit that referenced this issue Aug 17, 2021
By default, a shell script carries on even if a step fails. If, for example, a migration fails, the app starts up anyway with an unknown and possibly dangerous result.

This change causes any failing command to fail the whole script, except in cases where a failure is acceptable (as we test the exit code).

Refs #390, #395
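A minimal sketch of the pattern, using `set -e` to stop on the first failure while still tolerating an acceptable one (the commands themselves are stand-ins):

```shell
#!/bin/sh
# Fail the whole script if any command fails, instead of carrying on
# with the app in an unknown state.
set -e

echo "running migrations" # a failure here now halts the script

# Where a failure is acceptable, test the exit code explicitly;
# a command checked in an `if` does not trigger `set -e`.
if ! grep -q marker /dev/null; then
  echo "marker not found, continuing anyway"
fi

status=started
echo "starting app"
```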
thewilkybarkid added a commit that referenced this issue Aug 19, 2021
This change removes all usage of eslint-disable statements, most of which were invalid anyway (and luckily weren't hiding other issues).

Refs #395
thewilkybarkid added a commit that referenced this issue Aug 23, 2021
It hasn't been possible to log in when running production mode locally. I've just discovered that the use of NODE_ENV causes broken configuration when the site isn't running on a prereview.org domain; 66f2dfd had caused NODE_ENV to be 'production' locally.

Using the NODE_ENV to change settings will always create problems; instead, we can calculate the setting based on the ORCID callback URL, which will work no matter where we run the site.

Refs #395
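A sketch of deriving behaviour from the ORCID callback URL rather than NODE_ENV (the variable and setting names here are hypothetical, not the app's real ones):

```shell
# Whatever environment the site runs in, the callback URL tells us
# whether this is a real prereview.org deployment.
ORCID_CALLBACK_URL="https://www.prereview.org/orcid/callback"

case "$ORCID_CALLBACK_URL" in
  https://*.prereview.org/*) SECURE_COOKIES=true ;;
  *) SECURE_COOKIES=false ;;
esac

echo "secure cookies: $SECURE_COOKIES"
```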
thewilkybarkid added a commit that referenced this issue Oct 8, 2021
When the code is changed, we might need migration files to ensure the database has the structure that the code expects. The ORM library can generate these, but they have to be manually run and checked. This change adds a check to CI to ensure that they are updated (i.e. the ORM doesn't find any differences).

I expect CI to fail, as they are not currently up to date.

Refs #388, #395, #400
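The check might be sketched as regenerating the migrations in CI and failing if anything changed (the npm script name is hypothetical; a real check also has to catch newly created, untracked migration files):

```yaml
- name: Check migrations are up to date
  run: |
    npm run migration:create # hypothetical script wrapping the ORM's generator
    git diff --exit-code -- migrations/
```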
thewilkybarkid added a commit that referenced this issue Oct 11, 2021
The database migrations are out of sync with the codebase:

1. The code doesn't know about some of the indexes in the migrations.
2. The code has indexes that don't exist in the migrations.
3. The code has misconfigured indexes; the ORM cannot read them but doesn't error until it tries to apply the broken SQL.

These problems caused a failure in CI as it now checks that no migrations are missing. This change removes the broken and unapplied changes and adds in code for the missing ones to allow CI to pass.

I'm not sure if any of the valid but unapplied indexes would help with performance issues, but they can be revisited later.

Refs #388, #395, #400, d4c18e4
thewilkybarkid added a commit that referenced this issue Oct 11, 2021
Ubuntu behaves differently from my local macOS in that the /dev/null reference doesn't work if the input is empty (i.e. there are no untracked changes). Luckily, GNU xargs has a switch to not run the command in these cases (i.e. it's possible on Ubuntu but not on macOS).

Refs #388, #395, #400, 54be830
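The switch in question is GNU xargs's `--no-run-if-empty` (`-r`): GNU xargs otherwise runs its command once even on empty input, whereas BSD xargs on macOS skips empty input by default (and lacks the switch). A small demonstration, with an empty stand-in for the file list:

```shell
# Stand-in for "no untracked changes": empty input on stdin.
# With -r, GNU xargs runs nothing at all, so $ran stays empty;
# without -r, it would run `echo checked` once.
ran=$(printf '' | xargs -r echo checked)
```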
thewilkybarkid added a commit that referenced this issue Oct 26, 2021
Using failure() in GitHub Actions causes a step to run if any previous step fails. We only want to try to store the integration test results if that particular step fails; otherwise, there's nothing to do.

We still have to use the global failure() condition, as steps don't run at all once something has failed.

Refs #388, #395
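Combining the two conditions might look like this (step IDs and paths are illustrative):

```yaml
- name: Run integration tests
  id: integration-tests
  run: make test-integration

- name: Store test results
  # failure() lets the step run at all after a failure; the outcome
  # check narrows it to failures of the test step itself.
  if: failure() && steps.integration-tests.outcome == 'failure'
  uses: actions/upload-artifact@v2
  with:
    name: integration-test-results
    path: results/
```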
thewilkybarkid added a commit that referenced this issue Oct 27, 2021
This change separates Prettier from ESLint, which allows it to run correctly on all files.

Many files aren't currently formatted correctly, so I've added an optional step to the CI workflow.

Refs #395
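An optional (non-failing) formatting check might be sketched as:

```yaml
- name: Check formatting
  run: npx prettier --check .
  continue-on-error: true # optional while existing files are reformatted
```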
thewilkybarkid added a commit that referenced this issue Oct 27, 2021
thewilkybarkid added a commit that referenced this issue Nov 25, 2021
Currently, if a script (e.g. database migrations) fails, the error is shown to the console and then swallowed. The app will then be in an unknown state, leading to strange behaviour.

This change makes sure that the script throws the error and the resulting exit code halts the process.

Refs #395