Simplify app deployments using aws ecs register-task-definition #150
Conversation
Currently this is only wired up end-to-end for content-store. The other apps have placeholder values, which will be filled in in future commits.

The idea here is to get terraform to template out the task definition as JSON, which will later be passed to the AWS CLI:

```
aws ecs register-task-definition --cli-input-json file://terraform-outputs.json
```

This allows us to decouple the config management (which terraform is good at) from the application deployments (which terraform is not good at), and get the best of both worlds.

A few extra points of interest in this commit:

- I've prematurely moved the environment variables which I think will be the same in every app into a `defaults.tf` file, which declares them using locals. This should reduce the amount of duplication as we move the other apps over.
- Rather than hardcoding "eu-west-1" everywhere, I've created a `region.tf` file which uses the aws_region datasource.
- Rather than fetching the secrets datasources in app_content_store.tf, I've moved them into their own file, because this feels less noisy.
- There were a few places where we needed to fetch remote state (from govuk-aws) - I've moved these all into `remote_state.tf`.
- The new `output` for the things related to content-store is a map with `draft` and `live` keys, as this makes it a bit easier to pass around in concourse.

Once this is wired up in concourse, we'll be able to remove deployments/apps/content-store, as that won't be needed anymore.
The JSON needed to call `aws ecs register-task-definition --cli-input-json` is available in the terraform outputs now. This means that instead of needing to apply a separate terraform deployment to create a new content-store task-definition, we can just use the AWS CLI to create it.

To pass the terraform output from the run-terraform job to the deploy-content-store job, we're using an S3 resource. This means that once run-terraform completes, we pull down the terraform outputs, and then write them to an S3 bucket. They're then downloaded, and passed as an input to the deploy-content-store job. The S3 resource uses versioning, so old versions of the terraform outputs will be recoverable too.

To prevent the deploy-content-store job from triggering every time run-terraform passes, I've done a slightly hacky thing. The run-terraform job also takes the _old_ version of the terraform outputs as an input, compares the new version with the old version, and if they're the same, deletes the new version to prevent it from being uploaded to S3. The effect of this is that when the outputs change, a new version of the resource is created, but otherwise it won't trigger any downstream jobs.

Other things to note in this commit:

- I've created v2 tasks, instead of trying to make the old tasks forward and backward compatible. We can remove the old tasks once all the apps are using the new ones.
- I was able to replace the govuk-infrastructure-content-store resource with a govuk-infrastructure-concourse-tasks resource, since the only things in this repo that affect the deploy-content-store job are the tasks in concourse/tasks now.

Currently this is only used by the deploy-content-store job, but all the other apps should be able to use it too, once they're doing things in the new style.
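The compare-and-delete trick described above can be sketched in POSIX sh. This is a hedged sketch: the file names are illustrative, not the real pipeline paths, and the real job compares the outputs pulled from S3 against the freshly generated ones.

```shell
#!/bin/sh
# Hedged sketch of the "delete the new outputs if unchanged" trick described
# above. File names are illustrative, not the real pipeline paths.
set -eu

suppress_unchanged() {
  old="$1"; new="$2"
  # cmp -s exits 0 when the files are byte-identical
  if [ -f "$old" ] && cmp -s "$old" "$new"; then
    rm "$new"  # nothing changed, so nothing gets uploaded to S3
  fi
}

# Demo: identical outputs -> the new file is deleted, no new resource version.
printf '{"app":"content-store"}\n' > old.json
printf '{"app":"content-store"}\n' > new.json
suppress_unchanged old.json new.json
[ ! -e new.json ] && echo "unchanged: suppressed upload"

# Demo: changed outputs -> the new file survives and would be uploaded.
printf '{"app":"content-store","image":"v2"}\n' > new.json
suppress_unchanged old.json new.json
[ -e new.json ] && echo "changed: new version uploaded"
```

Because the S3 resource only creates a new version when a file is actually uploaded, deleting the unchanged file is what stops downstream jobs from triggering.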
This is no longer needed now we're using the new-style `aws ecs register-task-definition --cli-input-json` deployment style. 🎉
concourse/pipelines/deploy.yml (outdated):

```
@@ -291,9 +309,27 @@ jobs:
      -auto-approve

      terraform output -json > "$root_dir/govuk-terraform-outputs/govuk-terraform-outputs.json"

      for app in content-store; do
```
I may have missed something, but where is `content-store` defined?
Heh, this is really gross code 😆 Shellcheck was telling me off for it too.

`content-store` isn't a variable here, it's the literal string "content-store". The loop is there so that when we do the next app we can just add it like:

```
for app in content-store frontend; do
```

... perhaps that's premature optimisation and I should have started with `app="content-store"` and dropped the loop.

Ideally I would have used an array here, but the shell is `dash`, not `bash`, so there are no arrays.
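For what it's worth, the usual POSIX-sh stand-in for an array is a word list in the positional parameters. A minimal sketch (the app names here are just examples, not the real pipeline config):

```shell
#!/bin/sh
# Minimal sketch of the no-arrays workaround in POSIX sh (dash):
# overwrite the positional parameters with `set --` and iterate with "$@".
# App names are examples only.
set -eu

set -- content-store frontend
for app in "$@"; do
  echo "deploying ${app}"
done > deployed.log

cat deployed.log
```

Word-splitting on a plain space-separated string also works (as in the loop above), as long as none of the items contain spaces.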
Sorry, I got confused by it being without quotes. I've always wondered why they make life so difficult and don't provide bash; does it really make the container that much bigger?
We're piggybacking on GOV.UK PaaS's images. We get `dash` because they chose to do `FROM alpine`, which tries really hard to keep itself small. `dash` instead of `bash` is just one part of that, but it is definitely annoying. It's also annoying when you do `fly hijack`, because by default that tries to use `bash`, which then errors because it's not in the container. You have to explicitly tell it to use `sh`.
At some point we'll want to own our own concourse task images, at which point we might decide to use bash as the shell instead...
On the other hand, if I'd had bash here I probably would have done some even more confusing stuff with arrays, so 🤷🏻 maybe it's a good thing?
I've changed the for loop to a function in 0c3c0dd
concourse/pipelines/deploy.yml (outdated):

```
@@ -291,9 +309,27 @@ jobs:
      -auto-approve

      terraform output -json > "$root_dir/govuk-terraform-outputs/govuk-terraform-outputs.json"

      for app in content-store; do
        terraform output -json "${app}" > "$root_dir/updated-${app}-terraform-outputs/${app}.json"
```
I wonder if a new task should be created to do the app-based terraform output comparison? How will this code scale to the dozens of apps that we have? Maybe a parameterizable task would be more appropriate.
The intention is just to keep adding app names to the for loop. So it will end up like:

```
for app in content-store frontend publisher publishing-api router router-api etc...; do
  terraform output -json "${app}" ...
```
I agree that's a bit disgusting though.
I've changed the for loop to a function in 0c3c0dd - hopefully that's a little bit less confusing?
```
value = {
  draft = {
    task_definition_cli_input_json = module.draft_content_store.cli_input_json,
    security_groups                = module.draft_content_store.security_groups,
```
Based on the concourse pipeline, am I correct to say that there will be a task definition/service update if only the security group is updated but not the cli_input_json? One would think only the `cli_input_json` warrants a new task definition/service update.
Yep, you're quite right.
There are some downstream deploys (not for content store, yet, but for publisher) which do depend on the security group though - the run-task thing we're using for DB migrations needs to know the security groups (so the task has permission to talk to the DB).
At the moment, those tasks get deployed in the same job as the task definition, so we want them to trigger when the SGs change as well as when the task definition changes.
Ummm... I'm not convinced about this, because the security groups to the DBs are set once, while the DB migrations happen whenever a developer introduces a new schema/migration.
Good work on simplifying the terraform; I like that it is a flatter structure now. I made some comments inline for your consideration.
Added some comments after you clarified the `for` loop, thanks.
```
value = {
  draft = {
    task_definition_cli_input_json = module.draft_content_store.cli_input_json,
    security_groups                = module.draft_content_store.security_groups,
```
By tying both live and draft together, you're assuming that draft can't be updated independently from live, and vice versa?
Yeah, this is by design, on the assumption that we won't ever be in the situation where we want to deploy draft independently from live, or web independently from worker.
I think that's more or less the situation with Jenkins at the moment, although I haven't checked.
Yeah, I had a look at what Jenkins does. There's just one "Deploy_App" job, which specifies the app (say "content-store"). You don't get to specify draft or live. Capistrano has a `set_servers` rake task, which uses `govuk_node_list -c content-store` to get a list of servers to deploy to. That will include draft and live versions.
So people should already be used to the idea that it's not possible to deploy draft and live independently. Personally, I think this constraint is a good thing - I can't think of a legitimate reason to have them on different versions.
I was more thinking if we want to tweak CPU/memory of live instances because of increased traffic, we may not need to do that for the draft ones. Anyway, we can cross that bridge when we get there if needed.
Ah, they do get different task definitions, so they don't have to have the same memory / CPU. They just have to be deployed at the same time.
In https://github.com/alphagov/govuk-infrastructure/pull/150/files#r567817592 the use of a for loop with a single, static string caused confusion. This was always a bit of a hack to try to be future proof. Even in the future though, this would get ugly:

```
for app in content-store frontend publisher publishing-api router router-api etc...; do
  ...
```

Using a function instead will make it a bit clearer what's happening when there are multiple apps:

```
...
update_terraform_outputs content-store
update_terraform_outputs frontend
update_terraform_outputs publisher
update_terraform_outputs publishing-api
update_terraform_outputs router
update_terraform_outputs router-api
```
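A runnable sketch of the function pattern. Note that `terraform` is stubbed out below with a shell function so the calling shape is visible without the real binary; the stub body and the output file names are illustrative only.

```shell
#!/bin/sh
# Sketch of the update_terraform_outputs pattern from this commit.
# `terraform` is stubbed with a shell function so this runs standalone;
# the real task shells out to the terraform binary, and writes into the
# per-app output directories shown in the pipeline diff.
set -eu

terraform() {
  # stub: the real invocation is `terraform output -json <app>`
  printf '{"app":"%s"}\n' "$3"
}

update_terraform_outputs() {
  app="$1"
  terraform output -json "${app}" > "updated-${app}-terraform-outputs.json"
}

update_terraform_outputs content-store
update_terraform_outputs frontend
cat updated-content-store-terraform-outputs.json  # prints {"app":"content-store"}
```

Each app then becomes one extra function call rather than another word packed into a loop header.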
Shorter words are better, and I think these terms are clearer too. Also they're both three letters, so the code lines up more nicely.
most of my questions have been answered, thanks. I'm happy to approve.
Great work! Let's try it 👍
This seems like a good approach, given we've struggled with managing task definition revisions/deploys through a mix of Terraform and the AWS CLI. Just using terraform outputs and the AWS CLI will hopefully simplify things a bit (like less directory spelunking in Concourse tasks).
I found the Concourse pipeline a little confusing. It would be great to have some documentation beyond this PR description (perhaps you could move your comment into the `docs` directory?) - not a blocker to merging.
This rewrites the way we deploy publisher, following on from the refactor of content store in #150. The intent is to simplify the way we deploy publisher (web and worker) and run tasks using the publisher task definition. This change:

- removes the deployment modules for publisher-web and publisher-worker
- adds logic to the govuk-publishing-platform/publisher_app file to generate the JSON required to register the web and worker task definitions for publisher
- removes the now unnecessary task definition module for publisher
- modifies the deploy pipeline for publisher to register the publisher task definitions using the AWS CLI, rather than Terraform
- modifies the deploy pipeline to store the JSON generated by the big Terraform apply (govuk-publishing-platform) in S3, so we can use it to register task definitions, overriding only the IMAGE key
- modifies the deploy pipeline so that we can run db migrations for publisher, which will temporarily break smokey and anything that uses run-task
This change outputs the JSON config for signon's task definition, so we can create the task definition using aws ecs register-task-definition. This follows PR #150 in refactoring the way we deploy applications.
In PR #150, we refactored the way we deploy applications by outputting the application task definition JSON when applying the govuk platforming terraform. This PR refactors the (draft-)static app to use this new deployment style. Ref: 1. [trello card](https://trello.com/c/5JT7LE7k/382-update-the-static-application-to-use-the-new-deployment-approach) This should look similar to #152. We no longer register new task definitions in Terraform. Instead, our Big Terraform (:muscle:) apply outputs a task definition JSON which is passed to aws ecs register-task-definition. This should make the Terraform simpler. See PRs #150 and #152, and the design doc (internal) for the motivation behind this refactoring.
In PR #150, we refactored the way we deploy applications by outputting the application task definition JSON when applying the govuk platforming terraform. This JSON file is then diffed with the previous version and, if changes exist, a new task definition is created for the app using the AWS CLI. This PR refactors the (draft-)static app to use this new deployment style. Similar PRs are #152 and #153 for other apps. Ref: 1. [trello card](https://trello.com/c/5JT7LE7k/382-update-the-static-application-to-use-the-new-deployment-approach)
This change follows PR #150. Applying the govuk-publishing-platform module will output the task definition and network configuration required to run Smoke tests from an ECS task. We won't use Terraform to register the task definition. Instead we'll output the JSON required by aws ecs register-task-definition.
This switches smokey to the new method of registering task definitions in ECS, following PR #150.
In PR #150, we refactored the way we deploy applications by outputting the application task definition JSON when applying the govuk platforming terraform. This JSON file is then diffed with the previous version and, if changes exist, a new task definition is created for the app using the AWS CLI. This PR refactors the (draft-)router-api app to use this new deployment style. Similar PRs are #152, #153 and #154 for other apps. Ref: 1. [trello card](https://trello.com/c/HDvhDo1t/379-update-the-router-api-application-to-use-the-new-deployment-approach)
In PR #150, we refactored the way we deploy applications by outputting the application task definition JSON when applying the govuk platforming terraform. This JSON file is then diffed with the previous version and, if changes exist, a new task definition is created for the app using the AWS CLI. This PR refactors the (draft-)router app to use this new deployment style. Similar PRs are #152, #153 and #154 for other apps. Ref: 1. [trello card](https://trello.com/c/1lZxLN8V/380-update-the-router-application-to-use-the-new-deployment-approach)
This follows PR #150 in changing how we register new task definitions. Rather than having a separate state file for each task definition, we output the task definition from the main govuk-publishing-platform module, and use the task definition JSON to register the task definition using the AWS CLI. This should result in less boilerplate in the Terraform config and fewer state refreshes when registering a task definition. https://trello.com/c/oA5wjym7/378-update-the-publishing-api-application-to-use-the-new-deployment-approach
Having separate terraform deployments for each application results in a lot of boilerplate and maintenance headaches. Instead, we can simplify the pipeline by emitting the JSON required to register a task definition from the main terraform, and passing that to `aws ecs register-task-definition` using the `--cli-input-json` flag.

This PR might be a little tricky to review from a cold start, so let me explain it as a play in two acts. I'm going to be avant-garde and work backwards, from the end to the beginning. I think it's easier to understand it that way (but if you don't like it, the individual commits have a dry and boring description of the work).
## Act II: In which concourse deploys the app

*Enter stage left: Concourse*

*Enter stage right: the AWS CLI*
The first file to look at is `concourse/tasks/update-task-definition.sh`. This shell script runs:

```sh
aws ecs register-task-definition \
  --cli-input-json "file://task-definition.json"
```
Which is the absolute core of the idea in this PR. This replaces the old `terraform apply` approach, which the old version of update-task-definition used.

Where does it get `file://task-definition.json` from, though? That's a slightly modified version of `app-terraform-outputs/content-store.json`, which is provided as an input to the task. The modification is done by `jq`, which fishes out the right task definition for this variant (`live` or `draft`) and overrides the docker image in the first container definition.

When the task finishes, it writes the ARN of the new task definition to an output, which is then used by the `update-ecs-service` task to update the service. I've created a `v2` version of `update-ecs-service` in this PR too, but the differences to what was there before are mostly cosmetic.

Where does `app-terraform-outputs` come from, though? It comes from a new S3 resource, `content-store-terraform-outputs`. This is provided as a `get` step to the deploy-content-store job, having been `put` by the run-terraform job. The run-terraform job creates the file in `content-store-terraform-outputs` by pulling down the terraform outputs once terraform has run, and writing them to the bucket.

There is one tricky issue that I had to work around here, though. By default, every `put` to the S3 resource creates a new version, even if the contents of the file are identical. That would mean redeploying content-store after every single run of terraform, even though most of the time that's not needed.

I've worked around this by passing `content-store-terraform-outputs` in to run-terraform, and then comparing the new terraform outputs with the old ones. If the outputs are exactly the same, then I remove the updated version to ensure that the `put` step doesn't succeed in creating a new version. The `put` step is in a `try` block, so this can fail without failing the whole run-terraform job.

To understand how the JSON in this terraform output gets created in the first place, we need to think back to Act I.
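To make that `jq` step concrete, here is a rough sketch of the variant selection and image override. The JSON shape, file paths, and image tag below are all assumptions for illustration, not copied from the repo:

```shell
# Hypothetical stand-in for the terraform output file provided to the task.
mkdir -p app-terraform-outputs
cat > app-terraform-outputs/content-store.json <<'JSON'
{
  "live":  {"family": "content-store",
            "containerDefinitions": [{"name": "content-store", "image": "placeholder"}]},
  "draft": {"family": "draft-content-store",
            "containerDefinitions": [{"name": "content-store", "image": "placeholder"}]}
}
JSON

IMAGE="govuk/content-store:release_123"  # hypothetical image tag

# Fish out the "live" task definition and override the first container's image:
jq --arg image "$IMAGE" '.live | .containerDefinitions[0].image = $image' \
   app-terraform-outputs/content-store.json > task-definition.json

# task-definition.json is now in the shape expected by:
#   aws ecs register-task-definition --cli-input-json file://task-definition.json
```

The real script presumably parameterises the variant rather than hardcoding `live`, but the shape of the transformation is the same.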
*Exit: Concourse*

*Exit: the AWS CLI*
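(An intermission, before Act I: the version de-duplication trick described above boils down to something like the following sketch. The directory names and file contents are assumptions, not the pipeline's real paths.)

```shell
# Hypothetical layout: old outputs fetched from the S3 resource, and the
# freshly generated ones from run-terraform.
mkdir -p old-outputs new-outputs
echo '{"content_store": "unchanged"}' > old-outputs/content-store.json
echo '{"content_store": "unchanged"}' > new-outputs/content-store.json

# If nothing changed, delete the new file so the try-wrapped `put` step has
# nothing to upload, and no new S3 version (or downstream deploy) is triggered.
if cmp -s old-outputs/content-store.json new-outputs/content-store.json; then
  rm new-outputs/content-store.json
fi
```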
## Act I: In which terraform declares how to deploy the app

*Enter stage left: Terraform*
To set things up for the thrilling "Act II", we need our terraform deployment to generate some JSON which is compatible with the AWS CLI's `register-task-definition` command.

I've added this in `modules/app/task_definition.tf`. Note that this doesn't actually create a task definition in AWS; it just templates out the data structure and provides it as an output.
Because I've added this functionality to the `app` module which is used everywhere, I've had to provide placeholder variables for all of the apps. Even though these aren't all filled in correctly (e.g. the env vars are blank), this is harmless because the JSON isn't used anywhere for these apps.

The only app that's wired up to generate the task definition JSON properly is content-store. You can see how this works in `deployments/govuk-publishing-platform/app_content_store.tf`. Many of the environment variables are the same for every GOV.UK app, so I've pulled these out into a `defaults.tf` file. Other inputs come from remote state, or from secrets manager. These are also given their own files to keep things clear.

Ultimately, the `govuk-publishing-platform` terraform deployment provides all the details needed for concourse to deploy the content-store (and draft-content-store) applications in one output, which is structured to make it convenient to access (see `outputs.tf`). This allows Concourse to use `jq` in Act II to access the task definition for the appropriate variant (`draft` or `live`).

*Exit, pursued by a bear*
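To illustrate that structure: the per-app output is a map keyed by variant, so something shaped roughly like this (the fields inside each variant are assumptions for illustration; only the `draft`/`live` keys come from this PR):

```shell
# Hypothetical shape of the content-store output map.
cat > content-store.json <<'JSON'
{
  "draft": {"family": "draft-content-store"},
  "live":  {"family": "content-store"}
}
JSON

# Concourse can then pick out the task definition for either variant:
VARIANT=draft
jq --arg variant "$VARIANT" '.[$variant]' content-store.json
```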
## Epilogue
This has been an adventure, but it's not over yet! There are 8 more apps which need to be refactored to use this new-style deployment approach. Hopefully, this PR lays enough of the groundwork that the rest of the refactoring should be much simpler.
For the sake of everyone's sanity, I'm going to split the remaining apps into separate trello cards.
Trello
https://trello.com/c/u36XaWOu/320-refactor-terraform-app-deployments