GOV.UK on Kubernetes
Kubernetes (or "k8s" if you hate typing) is the cool kid on the cloud computing block, so I decided to learn some of it by getting GOV.UK, a distributed system I know well, up and running in k8s.
Kubernetes is a container orchestration system but, in reality, GOV.UK is not running in containers. We're doing everything the old fashioned way: stuff installed onto VMs. Luckily we've produced a dockerised local development environment, govuk-docker, which I hope I'll be able to heavily borrow from.
- Kubernetes, for container orchestration.
- Concourse, for continuous integration.
- Terraform, to create the infrastructure underpinning k8s.
- NixOS, to configure the infrastructure underpinning k8s.
1. Deploy infrastructure
# Generate config file based on the comments cp config.template config nano config # Generate Terraform and NixOS configurations ./generate-config.sh # Conjure infrastructure out of thin air ./provisioning/create.sh
2. Add DNS records
./provisioning/create.sh script gives you a list of nameservers
to point your external domain name to. Add the NS records wherever
you manage your DNS.
If you missed the nameserver output, you can get them with
3. Deploy CI and build apps
After DNS has resolved and
ci.<external domain> works you can deploy
the Concourse configuration and trigger a build of all the apps:
4. Deploy live environment
Now deploy the "live" environment configuration:
You don't need to wait for Concourse to finish building the apps to do this, Kubernetes will retry downloading any images which aren't yet ready.
When everything is up and running, you will be able to access the
www-origin.live.web.<external domain> and
An app isn't working
Some useful commands to check the status of the apps are:
# List all pods ./util/kubectl.sh --namespace=live get pods # Give detailed information about a pod ./util/kubectl.sh --namespace=live describe pod <pod name> # Retrieve the logs of a pod and follow updates ./util/kubectl.sh --namespace=live logs -f <pod name>
I get 502 Bad Gateway errors
Some things to check are:
- Are the pods receiving the request but hitting an error?
- Do the caddy logs on the
webmachine show any problems?
- Can internal domain names, like
finder-frontend.live.in-cluster.govuk-k8s.testbe resolved from the
- Do the ALBs exist in the AWS Console? Do they have healthy instances?
- Do the Route53 records exist in the AWS Console? Do they point to the right ALBs?
Note that it can take a few minutes for the web server to first resolve the new internal domains.