Auto-deploy to production #106

Closed
bickelj opened this issue Mar 25, 2024 · 11 comments

bickelj commented Mar 25, 2024

Last week I suggested it would not be a heavy lift to auto-deploy to production. The team requested it. In order to deploy to production more safely, I added PR #102 so that the deployment script can verify that deployment to test worked.

bickelj self-assigned this Mar 25, 2024
bickelj added a commit that referenced this issue Mar 25, 2024
Without this change, someone needs to verify that the automated
deployment to the test environment succeeded and then manually type
two or three commands to deploy to production. At the moment, there is
only one person who has ever done such production deployments.

With this change, however, the deployment to the test environment gets
verified by visiting a URL that exposes the current deployed version.
When the body of the response from that URL matches the version (tag)
sent to this deployment job, this action can safely conclude that the
test deployment has succeeded. Why can it draw such a confident
conclusion? Several reasons:

* The `deploy.sh` script saves the new version only on success.
* The server in the `reverse-proxy` container serves the version that
  the `deploy.sh` script wrote.
* The `reverse-proxy` container does not start until the `database`
  and `web` containers are running according to docker health checks.

In other words, accessing this version string implies that the state
of the environment is OK. If the version string matches in the test
environment, this gives some confidence that auto-deploying to the
production environment is relatively safe. Therefore it auto-deploys
to production. This means that merging pull requests in the `service`
repository is sufficient to trigger automatic deployments to production,
and the production deployment is safe because the same steps have
already run for the deployment to test. No more asking someone to
deploy to production!

Issue #106 Auto-deploy to production
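For readers following along, here is a minimal sketch of the flow that commit message describes. The job, step, and script wiring below are assumptions; only the version URL (quoted later in this thread), `deploy.sh`, and the general "verify test, then deploy to production" shape come from the message itself.

```yaml
deploy-production:
  needs: deploy-test        # assumed job name for the test deployment
  runs-on: ubuntu-latest
  steps:
    - name: Verify that the test environment serves the expected version
      run: |
        # The reverse-proxy only serves this string after deploy.sh succeeded and
        # the database and web containers passed their Docker health checks.
        deployed="$(curl -fsS https://api-test.philanthropydatacommons.org/software-version)"
        test "${deployed}" = "${{ github.ref_name }}"
    - name: Deploy to production
      run: ./trigger_deployment.sh   # assumed entry point; see the comments below
```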
bickelj added a commit that referenced this issue Mar 25, 2024
Since "send tag to machine" is indecipherable, the action is renamed.

Issue #106 Auto-deploy to production
bickelj added a commit that referenced this issue Mar 27, 2024
Issue #106 Auto-deploy to production
bickelj added a commit that referenced this issue Mar 27, 2024
Issue #106 Auto-deploy to production
bickelj added a commit that referenced this issue Mar 28, 2024
Issue #106 Auto-deploy to production

bickelj commented Mar 28, 2024

I just tested the guard against production deployment (by pushing a tag on a branch) and that part seemed to work; see https://github.com/PhilanthropyDataCommons/deploy/actions/runs/8468483885/job/23201622661:
(screenshot: deploy_guard)

However, the test deployment did not work.
Ah, it exited 4. What is that?
`test ! -z "${KNOWN_HOSTS}" || exit 4`
I suppose `KNOWN_HOSTS` needs to be passed in.

And I don't think the SSH stuff needs to be passed to the action that checks for a tag to be in main.
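In workflow terms, the fix presumably looks something like the step below. The step name and the `secrets.KNOWN_HOSTS` wiring are assumptions; only `KNOWN_HOSTS` and `trigger_deployment.sh` come from the run above.

```yaml
- name: Trigger deployment to test
  run: ./trigger_deployment.sh
  env:
    # Satisfies the guard `test ! -z "${KNOWN_HOSTS}" || exit 4` in the script.
    KNOWN_HOSTS: ${{ secrets.KNOWN_HOSTS }}
```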

bickelj added a commit that referenced this issue Mar 28, 2024
Before this commit, the `KNOWN_HOSTS` variable was not sent to
`trigger_deployment.sh`, causing failures. It also seems inappropriate
to pass secrets to 3rd-party actions if avoidable, so this commit also
removes access to the secrets for the action that checks whether the
given tag is on the main branch.

Issue #106 Auto-deploy to production

bickelj commented Mar 28, 2024

Oh, right, passing the same tag on the same (old) branch means the old workflow ran here: https://github.com/PhilanthropyDataCommons/deploy/actions/runs/8469064608/workflow. I need to try a new tag on a rebased branch.

bickelj added a commit that referenced this issue Mar 28, 2024
Issue #106 Auto-deploy to production

bickelj commented Mar 28, 2024

The `trigger_deployment.sh` script ran successfully, but the action that polls a URL does not seem to be working as expected.

Locally,

$ curl https://api-test.philanthropydatacommons.org/software-version
20240328-cb1e2ff-throwaway

And the action partially works as expected in that it polls every 20 seconds; nginx logs on the test host:

nginx 14:50:44.90 INFO  ==> ** Starting NGINX **
me - - [28/Mar/2024:14:51:00 +0000] "GET /software-version HTTP/1.1" 200  27 "-" "curl/7.x.x" "-"
github - - [28/Mar/2024:14:51:01 +0000] "GET /software-version HTTP/1.1" 200  27 "-" "-" "-"
github - - [28/Mar/2024:14:51:21 +0000] "GET /software-version HTTP/1.1" 200  27 "-" "-" "-"
github - - [28/Mar/2024:14:51:41 +0000] "GET /software-version HTTP/1.1" 200  27 "-" "-" "-"
github - - [28/Mar/2024:14:52:01 +0000] "GET /software-version HTTP/1.1" 200  27 "-" "-" "-"
github - - [28/Mar/2024:14:52:21 +0000] "GET /software-version HTTP/1.1" 200  27 "-" "-" "-"

And the timeout mechanism worked: it gave up after 10 minutes.

However, it should have succeeded. I wonder if I passed the wrong variable name. There was some inconsistency in the documentation about snake_case versus camelCase.

Uh, this is odd:

Error: Expected body: 20240328-cb1e2ff-throwaway, actual body: 20240328-cb1e2ff-throwaway

Those look identical to me, but maybe there's a whitespace difference.

Yes: the empty line 17 in the log output shows that the actual body ends with a newline, while the expected body does not.
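One way to make the difference visible is to dump the exact bytes of the body; run locally (or as a throwaway step like the sketch below), `od -c` shows the trailing `\n` explicitly:

```yaml
- name: Dump the exact bytes of the served version string (diagnostic sketch)
  run: curl -s https://api-test.philanthropydatacommons.org/software-version | od -c
```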


bickelj commented Mar 28, 2024

Two options:

  1. Expect the newline
  2. Remove the newline

Because this version ends up in text files on a GNU system, where a trailing newline is conventional, I think we should keep it and expect it (option 1).


bickelj commented Mar 28, 2024

Surprise: it is not clear how best to represent a line terminator in YAML.

yaml.info says you can use escape characters in a double-quoted string.
yaml.org says YAML 1.1 treats line break characters one way while 1.2 treats them another.

I'll try the straightforward addition of \n to the quoted string and see what happens.
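Concretely, the attempt is a fragment of the polling step along these lines (the input name here is an assumption at this point; in a YAML 1.2 double-quoted scalar, `\n` is an escape for a line feed):

```yaml
with:
  # Double-quoted scalar: "\n" becomes an actual line feed when the YAML is parsed.
  expectedBody: "20240328-cb1e2ff-throwaway\n"
```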

bickelj added a commit that referenced this issue Mar 28, 2024
Without this change, the CI action that asks which version is running
in a target environment would fail because of a line terminator
difference in the expected string versus the returned string.

This change attempts to fix the issue by adding an escaped newline to
the expected response body.

Issue #106 Auto-deploy to production

bickelj commented Mar 28, 2024

Oh no, the function that reads the `expectedBody` string trims whitespace.

Next try: `expectBodyRegex`.
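So the step's input switches to a pattern that tolerates trailing whitespace; a sketch (only the `expectBodyRegex` name comes from the comment above):

```yaml
with:
  # Single-quoted so the backslash reaches the action untouched; \s* tolerates the newline.
  expectBodyRegex: '20240328-cb1e2ff-throwaway\s*'
```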

bickelj added a commit that referenced this issue Mar 28, 2024
Without this change, the CI action that asks which version is running
in a target environment would fail because of a line terminator
difference in the expected string versus the returned string.

This change attempts to fix the issue by using a regular expression
that allows trailing whitespace. An earlier attempt to add an escaped
newline to the expected response body failed because inputs are trimmed
in the action (at the time of this commit).

Issue #106 Auto-deploy to production

bickelj commented Mar 28, 2024

The outstanding issues should be resolved. It is time to try the full pipeline beginning with a service repo merge.

Edit: OK, I was hoping to find an easy dependabot PR to merge in `service`, but no dice. I'll do the intermediate thing: push a tag here in `deploy`. I added 20240328-457c494 and pushed it. Results here.

Hmm, the main branch check is not working as expected either.

It might be the way it's doing the git checkout. I see `fetch-tags: false` in the checkout action.

That's probably the issue. I reproduced it locally like this, following (most of) the commands run by the checkout action:

git init throwaway_deploy
cd throwaway_deploy
git remote add origin https://github.com/PhilanthropyDataCommons/deploy
git config --local gc.auto 0
git -c protocol.version=2 fetch --no-tags --prune --no-recurse-submodules --depth=20 origin +457c494dc3c01eb726cf739be4a149666e11cc34:refs/tags/20240328-457c494
git log --graph --decorate --all
...

And I do not see main in there. If I drop `--no-tags` and use `--tags` instead, I get an error. Regardless, there should be a way to include tags in the checkout performed by the action.
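The likely fix is in the checkout step itself; `actions/checkout` has inputs for both tags and history depth (the exact values below are a sketch):

```yaml
- uses: actions/checkout@v4
  with:
    fetch-depth: 0    # full history, so main is reachable for the branch-membership check
    fetch-tags: true  # fetch tags in addition to the requested ref
```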

bickelj added a commit that referenced this issue Mar 28, 2024
In order to check whether the checked-out tag is part of the main
branch, the tags and branch refs need to be present in the checkout.

Issue #106 Auto-deploy to production
bickelj added a commit that referenced this issue Mar 28, 2024
Issue #106 Auto-deploy to production

bickelj commented Mar 28, 2024

Ah, here is how it's supposed to look when the check-if-this-tag-is-in-that-branch action successfully determines that the tag does not exist in that branch:
(screenshot: test_tag_in_branch)

The failures earlier were actual errors when running the action. This looks much better:

[action-contains-tag] Branch 'remotes/origin/main' does not contain tag '20240328-3e030bd-throwaway'.
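For reference, the check is roughly equivalent to asking git whether the tagged commit is reachable from main; a sketch of the same test as a plain run step:

```yaml
# Exit 0 when the tagged commit is an ancestor of origin/main, non-zero otherwise.
- name: Require the tag to be on main
  run: git merge-base --is-ancestor "${{ github.ref_name }}" origin/main
```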


bickelj commented Mar 28, 2024

I pushed tag 20240328-b785d59, which should cause a test deployment and then a production deployment in sequence, assuming the test deployment works. If that all works, this issue should be resolved. It might still be nice to see it happen when triggered from the `service` repo, though.

Good:

[action-contains-tag] Branch 'remotes/origin/main' contains tag '20240328-b785d59'.


bickelj commented Mar 28, 2024

Apparently I enabled a protection rule at https://github.com/PhilanthropyDataCommons/deploy/settings/environments/556150864/edit a long time ago:

Deployment protection rules

Configure reviewers, timers, and custom rules that must pass before deployments to this environment can proceed.
Required reviewers

Specify people or teams that may approve workflow runs when they access this environment.

I'm removing that review requirement.


bickelj commented Mar 28, 2024

It should work fine. If not, re-open.
