
kvexpress demo

This demo was prepared for "Running Consul at Scale—Journey from RFC to Production" at SREcon 2016 in Santa Clara, CA.

Properly executed, it will spin up 3 Consul servers and N Consul client nodes. These nodes will be set up with:

  1. Consul
  2. dnsmasq
  3. Datadog Agent
  4. kvexpress
  5. sifter
  6. goshe
  7. Consul Template

All nodes will connect to the Consul servers, and a dynamically generated hosts file will be created and distributed to all nodes using kvexpress.

This has been tested successfully with 500 to 2500 client nodes.
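The hosts-file distribution described above boils down to two kvexpress invocations (a sketch: the `in`/`out` subcommands and `--key`/`--file`/`--exec` flags reflect kvexpress's CLI, but verify against `kvexpress --help` for your version; the key name and the reload command are assumptions):

```shell
# On the node that generates the hosts file: push it into Consul's KV store.
# kvexpress only updates the key when the file's contents actually change.
kvexpress in --key hosts --file /etc/hosts.consul

# On every node: watch the key and write the file back out when it changes,
# optionally running a command afterwards (the dnsmasq restart is hypothetical).
kvexpress out --key hosts --file /etc/hosts.consul --exec "sudo service dnsmasq restart"
```

Because every node runs the `out` side, a single `in` on the writer node fans the file out to the whole cluster.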

IMPORTANT NOTE: If you run this demo, it will cost you money. Please don't leave your 123 nodes running any longer than you need to.

Requirements

  1. Amazon Web Services account to run the VMs/nodes. This demo assumes we'll be using EC2 Classic in US West 2.
  2. Datadog account to see the metrics generated. You can sign up for a free trial here.
  3. Packer to build the AMIs.
  4. Terraform to deploy the cluster of nodes.
  5. direnv to help load some environment variables.

NOTE: You can build a VM on AWS to build and deploy the cluster from. An example script is provided here.

Instructions
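Step 1 below copies envrc to .envrc so that direnv can load credentials into your environment. A minimal sketch of what that file plausibly contains (the AWS variable names are the standard ones; the Datadog variable name is an assumption - check envrc in the repo for the real keys):

```shell
# .envrc - loaded by direnv when you cd into the repo (sketch; placeholder values)
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
export AWS_DEFAULT_REGION="us-west-2"
# Hypothetical variable name for the Datadog API key.
export DATADOG_API_KEY="your-datadog-api-key"
```

After editing the file, run direnv allow so direnv will load it.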

  1. cp envrc .envrc - set up some AWS- and Datadog-specific API keys.
  2. cd terraform && cp variables.dist variables.tf - set up some Terraform configuration.
  3. make build - build and prepare the AMI with Packer.
  4. Update terraform/variables.tf with the AMI id.
  5. make cluster - build the entire cluster of nodes.
  6. Once the first 3 nodes are created, log into one of them and run the commands in hosts-activate.sh to activate the hosts creation process.
  7. As all the nodes are coming online, you should start seeing kvexpress.in, kvexpress.out and other kvexpress related metrics flowing into Datadog. Some example metrics can be seen here.
  8. All DNS queries to dnsmasq will generate goshe.dnsmasq.queries metrics through goshe.
  9. Some example metrics definitions are available here.
  10. Removing a node - just kill one or stop Consul on it - will automatically update the dynamically generated hosts file located at /etc/hosts.consul. Run sudo service consul stop to stop Consul, then take a look at that file on all the other nodes.
  11. Every time the file changes, a diff is sent to Datadog. Take a look at some example events.
  12. Make sure to destroy your cluster: make destroy. PLEASE NOTE: At larger cluster sizes - more than 200 nodes - Terraform may NOT be able to destroy your cluster cleanly because of AWS API errors. You may need to destroy the cluster manually via the AWS web console.
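The node-removal behavior from step 10 can be checked by hand along these lines (the hostname is a placeholder; the file path and service command come from the demo):

```shell
# On the node being removed:
sudo service consul stop

# On any other node, shortly afterwards, the removed node's entry should be
# gone from the kvexpress-distributed hosts file:
grep some-removed-node /etc/hosts.consul || echo "entry removed"

# dnsmasq forwards *.consul lookups to Consul, so DNS should stop resolving too:
dig +short some-removed-node.node.consul @127.0.0.1
```

The file update on every node (rather than just DNS going stale) is what generates the diff events you'll see in Datadog per step 11.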