
Move to continuous deployment #395

Open
thewilkybarkid opened this issue Aug 10, 2021 · 0 comments

thewilkybarkid commented Aug 10, 2021

Currently, deployment involves pushing manually to two separate branches, then manually running a process in the Azure console. The goal should be that every change committed is built, tested and deployed without any human intervention.

This involves having processes and checks that are reliable (refs #388).

thewilkybarkid added a commit that referenced this issue Aug 10, 2021
This is a start at breaking up the monolithic workflows, by separating the build and deploy jobs and running them in sequence. This means that each job can be focused, rather than having to be aware of the rest of the process. For example, only the deploy job needs to know about the Azure CLI.

Refs #395
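Split into sequential jobs, the workflow might look roughly like this (job, step, and image names are illustrative, and real deployment would also need credentials):

```yaml
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - run: docker build --tag app .

  deploy:
    needs: build # runs only once build has succeeded
    runs-on: ubuntu-latest
    steps:
      - uses: azure/webapps-deploy@v2 # only this job knows about Azure
        with:
          app-name: my-app
          images: my-registry.azurecr.io/app:latest
```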
thewilkybarkid added a commit that referenced this issue Aug 10, 2021
It isn't mentioned in the documentation, but reading the azure/webapps-deploy code shows that it rewrites the configuration file with new image tags if they are specified. This means we don't have to rely on tags, such as latest or develop, that change over time, and so there's more clarity.

Refs #395
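For example, the deployment can point at an image pinned to the commit SHA rather than a moving tag (app and registry names are illustrative):

```yaml
- uses: azure/webapps-deploy@v2
  with:
    app-name: my-app
    # A specific, immutable tag: the action rewrites the app's
    # configuration to pull exactly this image.
    images: my-registry.azurecr.io/app:${{ github.sha }}
```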
thewilkybarkid added a commit that referenced this issue Aug 10, 2021
The GitHub Actions syntax in a384f78 was incorrect, as the IMAGE_TAG variable was not interpolated and didn't exist in the Azure environment.

The Docker commands were working without interpolation, however, as the IMAGE_TAG is available in their environment.

As the deploy job passed when it should not have, I've opened Azure/webapps-deploy#189.

Refs #395
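The distinction is roughly this: `run` steps execute in a shell that can read `env` variables directly, while action inputs are evaluated before any shell exists, so they need the `${{ … }}` expression syntax. A sketch:

```yaml
env:
  IMAGE_TAG: ${{ github.sha }}
steps:
  # The shell interpolates $IMAGE_TAG itself, so this works.
  - run: docker build --tag "app:$IMAGE_TAG" .
  # Action inputs aren't shell commands; without the expression
  # syntax, the literal string "$IMAGE_TAG" would be passed through.
  - uses: azure/webapps-deploy@v2
    with:
      images: app:${{ env.IMAGE_TAG }}
```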
thewilkybarkid added a commit that referenced this issue Aug 10, 2021
GitHub Actions hides secrets from the logs, and the registry username is stored as a secret. As the username is just the name of the app itself, the logs are hard to read, since unrelated uses of the word are also hidden.

Usernames aren't secrets, so they can be stored in the environment instead.

Refs #395
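A sketch of the result, with only the password kept as a secret (registry and variable names are illustrative):

```yaml
env:
  REGISTRY_USERNAME: my-app # not sensitive, so the logs stay readable
steps:
  - run: |
      echo "${{ secrets.REGISTRY_PASSWORD }}" \
        | docker login my-registry.azurecr.io \
            --username "$REGISTRY_USERNAME" --password-stdin
```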
thewilkybarkid added a commit that referenced this issue Aug 10, 2021
Adds a Makefile for simplifying the use of potentially complicated commands in both CI and local development.

Refs #390, #395
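A minimal sketch of such a Makefile (target names and commands are hypothetical); CI and a developer then run the same `make lint` or `make test` rather than each remembering the full commands:

```makefile
.PHONY: lint test build

lint:
	npx eslint .

test:
	docker compose run --rm app npm test

build:
	docker build --tag app .
```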
thewilkybarkid added a commit that referenced this issue Aug 10, 2021
The IMAGE_TAG variable was accidentally overwritten in 7021c95, meaning that the first command that tried to use the real variable broke.

Refs #390, #395
thewilkybarkid added a commit that referenced this issue Aug 10, 2021
The linter was run with the --fix option, so any problem that could be fixed automatically was being fixed silently rather than reported as a failure.

This was only spotted as it caused the image to be rebuilt in 7021c95 following the code change.

Refs #395
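Keeping the fixing behaviour for local use while CI only reports might look like this (hypothetical target names):

```makefile
lint: ## CI: report problems, never rewrite files
	npx eslint .

fix: ## local convenience: apply automatic fixes
	npx eslint --fix .
```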
thewilkybarkid added a commit that referenced this issue Aug 10, 2021
The 'latest' and 'develop' tags are redundant since de9445f, as specific images are used instead. Rather than stop updating them, though, this separates the retagging from the deployment, so it now happens as an optional step after deployment (if retagging fails, it shouldn't fail the workflow).

Refs #395
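A sketch of the reordered steps, using `continue-on-error` so the retag can't fail the workflow (registry and tag names are illustrative):

```yaml
- name: Deploy
  uses: azure/webapps-deploy@v2
  with:
    images: my-registry.azurecr.io/app:${{ github.sha }}

- name: Retag as develop
  continue-on-error: true # a retag failure shouldn't fail the workflow
  run: |
    docker tag my-registry.azurecr.io/app:${{ github.sha }} \
               my-registry.azurecr.io/app:develop
    docker push my-registry.azurecr.io/app:develop
```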
thewilkybarkid added a commit that referenced this issue Aug 10, 2021
thewilkybarkid added a commit that referenced this issue Aug 10, 2021
I thought the syntax respected semantic versioning; it might be that it only works if the action has tags beginning with a 'v'.

Refs #395
thewilkybarkid added a commit that referenced this issue Aug 10, 2021
The develop workflow creates the image, so there's no need for the staging workflow to recreate it.

If the image hasn't been created by the develop workflow, then the deployment will fail. This should be the expected behaviour, rather than creating the image and not testing it fully.

Refs #395
thewilkybarkid added a commit that referenced this issue Aug 10, 2021
Uses the branch as the workflow name, rather than the jobs that the workflow performs.

Refs #395
thewilkybarkid added a commit that referenced this issue Aug 12, 2021
Docker Compose keeps containers and volumes after running by default. The integration tests need a clean database, so running an environment beforehand can lead to indeterminate results. Rather than cleaning the database before starting, this removes the containers after running. It also removes the persistent database volume, which has little value.

Refs #388, #395
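One way to sketch this is to tear everything down, including volumes, once the tests finish, while still propagating the test exit code (target and service names are hypothetical; the flags are Docker Compose's own):

```makefile
test-integration:
	docker compose up --build --exit-code-from test test; \
	status=$$?; \
	docker compose down --volumes; \
	exit $$status
```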
thewilkybarkid added a commit that referenced this issue Aug 13, 2021
When debugging integration test failures it's useful to see what the server did, so this writes all the Docker logs to a file.

Refs #388, #395
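A sketch of capturing the logs as a workflow artifact on failure (file and artifact names are illustrative):

```yaml
- name: Store Docker logs
  if: failure()
  run: docker compose logs --no-color > docker.log

- uses: actions/upload-artifact@v2
  if: failure()
  with:
    name: docker-logs
    path: docker.log
```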
thewilkybarkid added a commit that referenced this issue Aug 13, 2021
This prevents Docker Compose from overriding the NODE_ENV variable, which causes different behaviour in the entry-point script. This means that it now has to cater for a non-development environment with an empty database.

Refs #395
thewilkybarkid added a commit that referenced this issue Aug 13, 2021
The build.target setting was introduced in version 3.4, and so the file needs to declare itself incompatible with earlier versions.

Refs #395
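A minimal sketch of the version declaration (service and stage names are hypothetical):

```yaml
version: '3.4' # build.target needs at least file format 3.4

services:
  app:
    build:
      context: .
      target: prod # hypothetical build stage in the Dockerfile
```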
thewilkybarkid added a commit that referenced this issue Aug 17, 2021
By default, a shell script carries on even if a step fails. If, for example, a migration fails, the app starts up anyway with an unknown and possibly dangerous result.

This change causes any failing command to fail the whole script, except in cases where a failure is acceptable (as we test the exit code).

Refs #390, #395
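A minimal sketch of the pattern, using `set -e` to stop on the first failure while still tolerating an acceptable one (the commands themselves are stand-ins):

```shell
#!/bin/sh
# Fail the whole script if any command fails, instead of carrying on
# with the app in an unknown state.
set -e

echo "running migrations" # a failure here now halts the script

# Where a failure is acceptable, test the exit code explicitly;
# a command checked in an `if` does not trigger `set -e`.
if ! grep -q marker /dev/null; then
  echo "marker not found, continuing anyway"
fi

status=started
echo "starting app"
```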
thewilkybarkid added a commit that referenced this issue Aug 19, 2021
This change removes all usage of eslint-disable statements, most of which were invalid anyway (and luckily weren't hiding other issues).

Refs #395
thewilkybarkid added a commit that referenced this issue Aug 23, 2021
It hasn't been possible to log in when running production mode locally. I've just discovered that the use of NODE_ENV causes broken configuration when the site isn't running on a prereview.org domain; 66f2dfd had caused NODE_ENV to be 'production' locally.

Using the NODE_ENV to change settings will always create problems; instead, we can calculate the setting based on the ORCID callback URL, which will work no matter where we run the site.

Refs #395
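A sketch of deriving behaviour from the ORCID callback URL rather than NODE_ENV (the variable and setting names here are hypothetical, not the app's real ones):

```shell
# Whatever environment the site runs in, the callback URL tells us
# whether this is a real prereview.org deployment.
ORCID_CALLBACK_URL="https://www.prereview.org/orcid/callback"

case "$ORCID_CALLBACK_URL" in
  https://*.prereview.org/*) SECURE_COOKIES=true ;;
  *) SECURE_COOKIES=false ;;
esac

echo "secure cookies: $SECURE_COOKIES"
```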
thewilkybarkid added a commit that referenced this issue Oct 8, 2021
When the code is changed, we might need migration files to ensure the database has the structure that the code expects. The ORM library can generate these, but they have to be manually run and checked. This change adds a check to CI to ensure that they are updated (i.e. the ORM doesn't find any differences).

I expect CI to fail, as they are not currently up to date.

Refs #388, #395, #400
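The check might be sketched as regenerating the migrations in CI and failing if anything changed (the npm script name is hypothetical; a real check also has to catch newly created, untracked migration files):

```yaml
- name: Check migrations are up to date
  run: |
    npm run migration:create # hypothetical script wrapping the ORM's generator
    git diff --exit-code -- migrations/
```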
thewilkybarkid added a commit that referenced this issue Oct 11, 2021
The database migrations are out of sync with the codebase:

1. The code doesn't know about some of the indexes in the migrations.
2. The code has indexes that don't exist in the migrations.
3. The code has misconfigured indexes; the ORM cannot read them but doesn't error until it tries to apply the broken SQL.

These problems caused a failure in CI as it now checks that no migrations are missing. This change removes the broken and unapplied changes and adds in code for the missing ones to allow CI to pass.

I'm not sure if any of the valid but unapplied indexes would help with performance issues, but they can be revisited later.

Refs #388, #395, #400, d4c18e4
thewilkybarkid added a commit that referenced this issue Oct 11, 2021
Ubuntu behaves differently from my local macOS in that the /dev/null reference doesn't work if the input is empty (i.e. there are no untracked changes). Luckily, GNU xargs has a switch to not run the command in these cases (i.e. it's possible on Ubuntu but not on macOS).

Refs #388, #395, #400, 54be830
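The switch in question is GNU xargs's `--no-run-if-empty` (`-r`): GNU xargs otherwise runs its command once even on empty input, whereas BSD xargs on macOS skips empty input by default (and lacks the switch). A small demonstration, with an empty stand-in for the file list:

```shell
# Stand-in for "no untracked changes": empty input on stdin.
# With -r, GNU xargs runs nothing at all, so $ran stays empty;
# without -r, it would run `echo checked` once.
ran=$(printf '' | xargs -r echo checked)
```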
thewilkybarkid added a commit that referenced this issue Oct 26, 2021
Using failure() in GitHub Actions causes a step to run if any previous step fails. We only want to try to store the integration test results if that particular step fails; otherwise, there's nothing to do.

We still have to use the global failure() condition, as steps don't run at all once something has failed.

Refs #388, #395
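Combining the two conditions might look like this (step IDs and paths are illustrative):

```yaml
- name: Run integration tests
  id: integration-tests
  run: make test-integration

- name: Store test results
  # failure() lets the step run at all after a failure; the outcome
  # check narrows it to failures of the test step itself.
  if: failure() && steps.integration-tests.outcome == 'failure'
  uses: actions/upload-artifact@v2
  with:
    name: integration-test-results
    path: results/
```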
thewilkybarkid added a commit that referenced this issue Oct 27, 2021
This change separates Prettier from ESLint, which allows it to run correctly on all files.

Many files aren't currently formatted correctly, so I've added an optional step to the CI workflow.

Refs #395
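An optional (non-failing) formatting check might be sketched as:

```yaml
- name: Check formatting
  run: npx prettier --check .
  continue-on-error: true # optional while existing files are reformatted
```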
thewilkybarkid added a commit that referenced this issue Oct 27, 2021
thewilkybarkid added a commit that referenced this issue Nov 25, 2021
Currently, if a script (e.g. database migrations) fails, the error is shown to the console and then swallowed. The app will then be in an unknown state, leading to strange behaviour.

This change makes sure that the script throws the error and the resulting exit code halts the process.

Refs #395