This demo was prepared for "Running Consul at Scale—Journey from RFC to Production" at SREcon 2016 in Santa Clara, CA.
Properly executed, it will spin up 3 Consul servers and N Consul client nodes. These nodes will be set up with:
All nodes will be connected to the Consul servers and a dynamically generated hosts file will be created and passed to all nodes using kvexpress.
This has been tested with 500 to 2500 client nodes successfully.
IMPORTANT NOTE: If you execute this demo - it will cost you money. Please don't leave your 123 nodes running any longer than you need to.
- Amazon Web Services account to run the VMs/nodes. This demo assumes that we'll be using EC2 Classic in US West 2.
- Datadog account to see the metrics generated. You can sign up for a free trial here.
- Packer to build the AMIs.
- Terraform to deploy the cluster of nodes.
- direnv to help load some environment variables.
NOTE: You can build a VM on AWS to build and deploy the cluster from. An example script is provided here.
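
Before starting, it can help to confirm that the tools listed above are installed. The following is a minimal sketch, not part of the demo itself: `missing_tools` is a hypothetical helper, and it assumes the tools are on your `PATH` under their usual binary names (`packer`, `terraform`, `direnv`).

```shell
# Hypothetical helper: print the name of every argument that is not
# found on PATH, one per line. Prints nothing if all are present.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Check the tools this demo lists as prerequisites.
missing=$(missing_tools packer terraform direnv)
if [ -n "$missing" ]; then
  echo "Missing tools:"
  echo "$missing"
else
  echo "All prerequisite tools found."
fi
```

This only checks that the binaries exist; it does not verify versions or that your AWS and Datadog credentials are valid.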
- `cp envrc .envrc` - set up some AWS and Datadog specific API keys.
- `cd terraform && cp variables.dist variables.tf` - set up some Terraform configuration.
- `make build` - build and prepare the AMI with Packer.
- Update `terraform/variables.tf` with the AMI id.
- `make cluster` - build the entire cluster of nodes.
- Once the first 3 nodes are created, log into one of those three nodes and run the commands located in hosts-activate.sh to activate the hosts creation process.
- As all the nodes are coming online, you should start seeing `kvexpress.in`, `kvexpress.out` and other kvexpress related metrics flowing into Datadog. Some example metrics can be seen here.
- All DNS queries to dnsmasq will generate `goshe.dnsmasq.queries` metrics through goshe.
- Some example metrics definitions are available here.
- Removing a node - just kill one or stop Consul on the node - will automatically update the dynamically generated hosts file located at `/etc/hosts.consul`. Run `sudo service consul stop` to stop Consul. Take a look at that file on all other nodes.
- Every time the file changes, a diff is sent to Datadog. Take a look at some example events.
- Make sure to destroy your cluster: `make destroy`. PLEASE NOTE: At larger cluster sizes - more than 200 nodes - Terraform may NOT be able to destroy your cluster cleanly because of AWS API errors. You may need to destroy your cluster manually using the web UI.
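
Two of the checks in the list above can be scripted. The sketch below is illustrative only: `host_listed` and `count_ids` are hypothetical helpers, `some-node` is a placeholder name, and the code assumes `/etc/hosts.consul` uses the usual hosts layout (an IP address followed by one or more names). The AWS CLI call in the usage comment assumes the CLI is installed and configured; it counts every running instance in the region, not only those created by this demo.

```shell
# Hypothetical helper: succeed if hostname $1 appears as a name in the
# hosts file $2 (defaults to /etc/hosts.consul). Comment lines are skipped
# and the first field (the IP address) is never treated as a name.
host_listed() {
  awk -v h="$1" '
    /^[ \t]*#/ { next }
    { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
    END { exit !found }
  ' "${2:-/etc/hosts.consul}"
}

# Hypothetical helper: count whitespace-separated instance IDs on stdin.
count_ids() {
  wc -w | tr -d '[:space:]'
}

# Usage (not run here):
#   sudo service consul stop      # on the node being removed
#   host_listed some-node         # on another node; fails once the entry is gone
#
#   aws ec2 describe-instances --region us-west-2 \
#     --filters Name=instance-state-name,Values=running \
#     --query 'Reservations[].Instances[].InstanceId' --output text | count_ids
```

If the count is not zero after `make destroy`, check whether the remaining instances belong to the demo before terminating them by hand.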