Cloud.gov Provisioning System
This repository holds the terraform configuration (and BOSH vars and ops-files) to bootstrap our infrastructure.
Be sure to read the internal developer documentation ("cg-provision") for non-public information about using this repository.
Our Terraform code is organized by two concepts, with two corresponding directories.
- Modules are reusable units of terraform code. Each module describes a useful concept in our infrastructure. A module can be reused in different contexts by declaring variables that the caller must pass in. Modules are located in
- Two important modules are
- Read more about Terraform modules: https://developer.hashicorp.com/terraform/language/modules
- Two important modules are
- Stacks combine and configure modules for use in an environment. They are also parameterized by variables, with values such as the environment name. Stacks are located in
As an example, if we wanted to write terraform code to deploy several CloudFront distributions in front of three load balancers in an environment, we could:
- Create a
cloudfrontmodule that declares a CloudFront distribution, a Shield Advanced resource to protect it, and an Access Control List (ACL) association between the distribution and an ACL. (The ACL itself is not declared in the module.) It could take an origin, a list of domains, and an ACL ARN as variables.
- Create a
cloudfrontstack uses the
cloudfrontmodule three times, once for each load balancer in the environment. It would pass an external domain and load balancer domain to each module. It could also declare a single ACL for the environment and pass its ARN to each
cloudfrontmodule. The stack could take an environment name as a variable.
- Add a job to Concourse for each environment. Each job would deploy the
cloudfrontstack and pass the environment name as a variable.
In the future, we would like to add a third concept: An entire runtime environment. An environment would combine multiple stacks to represent the entire cloud.gov runtime stack. This collection of resources could be deployed as a single unit to a new AWS region or multiple times in the same region.
main stack is a template that is used to provision the production,
staging, and development "environments."
regionalmasterbosh stack contains our masterbosh for a given region, which deploys the tooling BOSH for that region.
The tooling BOSH then deploys the BOSH directors in the main stacks across all accounts in that region.
tooling stack is the same as the
regionalmasterbosh stack, but has some extras from before we started going multi-region
- concourse and staging concourse
- buckets that we need only one of across all accounts and regions
- some things that really should be in child environment accounts
- nessus In the future, we should work towards disentagling these pieces out, so the old tooling is deployed as a regionalmasterbosh and the other stuff is its own stack(s)
dns stacks are both outside of GovCloud (commercial AWS).
Wiring up users
As mentioned above, we have four categories of environment:
main- this is the thing we're actually after. It's the pieces that directly support the platform components. There should be several of these across multiple AWS accounts
tooling- this is used to support the things in the
mainplatform - our CI system, managment tools such as Nessus, etc.
external- this manages some things that don't (or historically didn't) exist in govcloud (really just cloudfront and the users, etc, to support it). There's one of these per
dns- this manages route53. There's exactly one of these, although we really should split it out to one per
main+ one for
To allow the
tooling environment to manage the
main environment, there's a
tooling-terraform role associated with each
main environment, which has an
assumerole policy allowing access by concourse workers in the
To add a new
main environment, see the README here
bosh directory contains vars and opsfiles for use by the BOSH directors.
How Concourse Creates AWS Resources
The Concourse worker VMs must have AWS access to create and apply Terraform plans. How they are given that access depends on the partition being changed.
You can determine how a failing Concourse container is configured by hijacking it. Connect to the container (see
fly hijack --help) and run
aws configure list to see the current configuration.
The Concourse worker VMs are associated with an IAM role with read-write access to GovCloud resources. The AWS SDK in the Concourse containers is automatically configured to fetch credentials from the Instance metadata service. No further configuration is necessary - note that no access keys are passed to GovCloud jobs in pipeline.yml.
AWS IAM roles documentation: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html
Each Concourse job that manages AWS Commercial resources must override the Concourse worker's IAM role. The jobs set the
AWS_DEFAULT_REGION environment variables to do this. Environment variables have higher precedence in the AWS SDK, so they are used instead of the IAM role. No further configuration in Terraform is necessary.
DNS and CloudFront Stacks
The DNS stack is a special case because it must read state from GovCloud but read and write resources and state to Commercial. AWS IAM users cannot have cross-partition permissions, so the job must use two separate AWS accounts (one for each partition).
To achieve this, the Concourse jobs pass an access key to a Commercial IAM user as a TF_VAR instead of using the standard
AWS_* environment variables. (Setting
AWS_ variables would make the AWS SDK use them by default, and we want it to continue using the GovCloud IAM role by default.)
The IAM role and
TF_VAR_ credentials are used as follows:
terraform initcommand is run with the Commercial credentials using this script. This configures the s3 backend for the DNS stack to be set up in the Commercial account.
- The terraform provider is configured with the Commercial credentials, ensuring that all resources will be created in the Commercial account.
terraform_remote_statedata blocks for each GovCloud s3 state object are configured with the GovCloud region. Because they are accessed using Terraform's initialization process, but separately from the initial
terraform init, they are not passed the Commercial credentials. Without any credentials set explicitly, the AWS SDK uses the GovCloud IAM role.
Since IaaS is a shared resource (we don't have the money or time to provision
entire stacks for each developer), we never apply this configuration manually.
Instead, all execution is done through the Concourse pipeline, which is
configured to first run
terraform plan, and then wait for manual triggering
If you want to make infrastructure changes:
- Create a branch and pull-request with your changes and ask for review and merge from a teammate.
- Once the teammate 👍 the changes, head over to the Concourse pipeline and review the resultant Terraform plan output.
- If the plan looks like what you intended, then manually trigger the appropriate apply jobs.
Other Points of Note
You may see
aws_key_id_prev as outputs for our
modules. These are used for cred
modules/stack/base and some of the VPC
modules. It's not entirely clear why, and why the VPC modules weren't simply