This demo was prepared for "Running Consul at Scale—Journey from RFC to Production" at SREcon 2016 in Santa Clara, CA.
Properly executed, it will spin up 3 Consul servers and N Consul client nodes. These nodes will be set up with:
All nodes will be connected to the Consul servers and a dynamically generated hosts file will be created and passed to all nodes using kvexpress.
This has been tested with 500 to 2500 client nodes successfully.
IMPORTANT NOTE: If you execute this demo - it will cost you money. Please don't leave your 123 nodes running any longer than you need to.
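To make the idea of a dynamically generated hosts file more concrete, here is a minimal sketch that turns `consul members`-style output into hosts-file lines. It is not the demo's own script: the helper name `members_to_hosts`, the sample node names and addresses, and the assumed column layout (Node, Address as `ip:port`, Status) are illustrative assumptions, and distributing the result with kvexpress is not shown.

```shell
#!/bin/sh
# Hypothetical helper (not part of the demo): read `consul members`-style
# text on stdin and print hosts-file lines ("IP name") for alive members.
members_to_hosts() {
  awk 'NR > 1 && $3 == "alive" {
         split($2, addr, ":")   # Address column is assumed to be "ip:port"
         print addr[1], $1      # hosts format: IP first, then node name
       }'
}

# Canned, made-up input so no running Consul agent is needed.
# On a real node the input could come from: consul members
printf '%s\n' \
  'Node      Address         Status  Type    Build  Protocol  DC' \
  'server-1  10.0.0.1:8301   alive   server  0.6.4  2         dc1' \
  'client-1  10.0.0.11:8301  alive   client  0.6.4  2         dc1' \
  'client-2  10.0.0.12:8301  left    client  0.6.4  2         dc1' \
  | members_to_hosts
# prints:
# 10.0.0.1 server-1
# 10.0.0.11 client-1
```

Members that are not `alive` are skipped, which mirrors the behaviour described in this README where a stopped node drops out of the generated file.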
- Amazon Web Services account to run the VMs/nodes. This demo assumes that we'll be using EC2 Classic in US West 2.
- Datadog account to see the metrics generated. You can sign up for a free trial here.
- Packer to build the AMIs.
- Terraform to deploy the cluster of nodes.
- direnv to help load some environment variables.
NOTE: You can build a VM on AWS to build and deploy the cluster from. An example script is provided here.
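Before starting, it may help to confirm that the tools listed above are on your `PATH`. The following is a small sketch, assuming the binaries are named `packer`, `terraform` and `direnv`; the function name `missing_tools` is made up for this example.

```shell
#!/bin/sh
# Print the name of every given command that is not found on PATH.
# Returns non-zero if at least one command is missing.
missing_tools() {
  status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      status=1
    fi
  done
  return "$status"
}

# Assumed binary names for the requirements listed above.
if missing=$(missing_tools packer terraform direnv); then
  echo "all required tools found"
else
  # Unquoted on purpose so the names are joined on one line.
  echo "missing:" $missing
fi
```

This only checks that the commands exist. It does not verify versions or that your AWS and Datadog credentials are valid.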
1. `cp envrc .envrc` - set up some AWS and Datadog specific API keys.
2. `cd terraform && cp variables.dist variables.tf` - set up some Terraform configuration.
3. `make build` - build and prepare the AMI with Packer.
4. Update `terraform/variables.tf` with the AMI id.
5. `make cluster` - build the entire cluster of nodes.
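The setup steps above create two local files, `.envrc` and `terraform/variables.tf`, that the later `make` targets presumably depend on. As a hedged sketch, a small preflight check could confirm both exist before you start building; the function name `preflight` is hypothetical, and the check assumes it is given the repository root.

```shell
#!/bin/sh
# Check that the files created by the copy steps exist under a repo root.
# Prints each missing path; returns non-zero if any is missing.
preflight() {
  root="${1:-.}"
  status=0
  for f in .envrc terraform/variables.tf; do
    if [ ! -f "$root/$f" ]; then
      echo "missing: $f"
      status=1
    fi
  done
  return "$status"
}

if preflight .; then
  echo "ready for: make build, then make cluster"
else
  echo "finish the copy steps first"
fi
```

The check only looks for the files. It cannot tell whether the API keys or the AMI id inside them have been filled in.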
- Once the first 3 nodes are created, log in to one of those three nodes and run the commands in `hosts-activate.sh` to activate the hosts creation process.
- As all the nodes are coming online, you should start seeing `kvexpress.out` and other kvexpress-related metrics flowing into Datadog. Some example metrics can be seen here.
- All DNS queries to dnsmasq will generate `goshe.dnsmasq.queries` metrics through goshe.
- Some example metrics definitions are available here.
- Removing a node (kill it, or run `sudo service consul stop` on it to stop Consul) will automatically update the dynamically generated hosts file. Take a look at that file on all other nodes.
- Every time the file changes, a diff is sent to Datadog. Take a look at some example events.
- Make sure to destroy your cluster: `make destroy`. PLEASE NOTE: At larger cluster sizes - more than 200 nodes - Terraform may NOT be able to destroy your cluster cleanly because of AWS API errors. You may need to destroy your cluster manually using the web UI.
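To picture the diff that is produced when a node leaves the cluster, here is an illustration using plain `diff` on two made-up versions of a generated hosts file. How kvexpress computes and sends its diff is not described in this README, so `diff` is only a stand-in, and the file contents are invented sample data.

```shell
#!/bin/sh
# Illustration only: two made-up versions of a generated hosts file,
# before and after the node "client-2" leaves the cluster.
old=$(mktemp)
new=$(mktemp)
printf '%s\n' '10.0.0.1 server-1' '10.0.0.11 client-1' '10.0.0.12 client-2' > "$old"
printf '%s\n' '10.0.0.1 server-1' '10.0.0.11 client-1' > "$new"

# Lines starting with "<" exist only in the old file.
# diff exits 1 when the files differ, so that is not treated as an error.
removed=$(diff "$old" "$new" | grep '^<' || true)
echo "$removed"
# prints: < 10.0.0.12 client-2

rm -f "$old" "$new"
```

Running the same comparison by hand against the hosts file on two nodes is a quick way to confirm they have converged after a node is stopped.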