- Create Bind9 DNS environment
- Create Consul cluster used for service discovery
- Create Prometheus Server cluster
- Create Prometheus AlertManager service on defined nodes
- Create Prometheus NodeExporter service on defined nodes
- Create Prometheus Rules/Alerts
- Create Third Party Exporter
- Create Grafana service
- Create Haproxy load balancer
- Create Consul Template service
GOAL: Install a Bind9 DNS master on the private network to resolve hostname A and PTR records and to forward Consul DNS requests.
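A minimal named.conf fragment for the Consul forwarding part might look like the following; the forwarder address and port are assumptions (Consul's DNS interface defaults to port 8600), not necessarily what the role renders:

// Sketch only: forward *.consul queries to a local Consul agent (default DNS port 8600).
zone "consul" IN {
    type forward;
    forward only;
    forwarders { 127.0.0.1 port 8600; };
};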
GOAL: Install a 3 node Consul cluster for service discovery and DNS resolution.
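As a rough sketch, each of the three servers runs the agent in server mode and waits for a three-node quorum; the data directory, bind address, and join targets below are assumptions, not necessarily what the role configures:

# Sketch only: flag values (data dir, bind address, join targets) are assumptions.
consul agent -server -bootstrap-expect=3 \
  -data-dir=/var/lib/consul \
  -bind=172.136.2.11 \
  -retry-join=172.136.2.12 -retry-join=172.136.2.13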
GOAL: Install an HA Prometheus cluster for monitoring, metrics, and alerting, and integrate third-party exporters such as the Consul exporter.
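One common way to wire Prometheus to Consul is service discovery via consul_sd_configs; the sketch below assumes a Consul agent on localhost:8500 and a service named node-exporter, neither of which is confirmed by the roles in this repo:

# Sketch only: Consul address and service name are assumptions.
scrape_configs:
  - job_name: 'consul-discovered'
    consul_sd_configs:
      - server: 'localhost:8500'
        services: ['node-exporter']
    relabel_configs:
      - source_labels: [__meta_consul_service]
        target_label: job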
GOAL: Install Haproxy to verify Consul Template functionality. TODO: update Haproxy from version 1.5 to 1.7.
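Consul Template renders the Haproxy backends from the Consul catalog; a hedged fragment of what such a template could contain is below (the backend name, service name, and balancing options are assumptions):

# Sketch only: service name and balancing options are assumptions.
backend grafana_back
    balance roundrobin{{ range service "grafana" }}
    server {{ .Node }} {{ .Address }}:{{ .Port }} check{{ end }}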
The Vagrantfile reads ./ansible/hosts.yaml
to create the machines; the same file also serves as the inventory source for the plays listed below. You may want to install the Vagrant HostsUpdater plugin to keep your hypervisor's hosts file up to date:
vagrant plugin install vagrant-hostsupdater
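The exact schema of hosts.yaml is defined by this repo's Vagrantfile; purely as a hypothetical illustration, it might map hostnames to addresses along these lines:

# Hypothetical sketch only; the real ./ansible/hosts.yaml may use different keys.
core1.lan:
  ip: 172.136.1.11
consul1.lan:
  ip: 172.136.2.11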
Spin up the infrastructure:
vagrant up
This will, by default, create:
- core1.lan - bind9 DNS server, Consul Client
- prometheus1.lan - Prometheus server, Prometheus AlertManager, Grafana server
- prometheus2.lan - Prometheus server, Prometheus AlertManager, Grafana server
- consul1.lan - Consul Server, Consul Client, Prometheus node exporter
- consul2.lan - Consul Server, Consul Client, Prometheus node exporter
- consul3.lan - Consul Server, Consul Client, Prometheus node exporter
- client1.lan - Consul Client, Prometheus node exporter, Consul Template service with Haproxy cfg template
The Ansible playbooks provision the nodes listed above. The roles work on both Ubuntu 14.04 and Ubuntu 16.04 nodes, but the two releases use different Vagrant boxes and network interface names.
Check out the ubuntu16 branch to use Ubuntu Xenial hosts.
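For example:

git checkout ubuntu16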
Run all playbooks in order:
cd ansible
./run_playbooks.sh
Or run the playbooks manually:
cd ansible
ansible-playbook provision_bind9_servers.yaml -i inventory.py -u vagrant -k -b
ansible-playbook provision_prometheus_servers.yaml -i inventory.py -u vagrant -k -b
ansible-playbook provision_prometheus_alertmanager_servers.yaml -i inventory.py -u vagrant -k -b
ansible-playbook provision_prometheus_node_exporter_servers.yaml -i inventory.py -u vagrant -k -b
ansible-playbook provision_prometheus_consul_exporter_servers.yaml -i inventory.py -u vagrant -k -b
ansible-playbook provision_consul_servers.yaml -i inventory.py -u vagrant -k -b
ansible-playbook provision_consul_client_servers.yaml -i inventory.py -u vagrant -k -b
ansible-playbook provision_consul_template_servers.yaml -i inventory.py -u vagrant -k -b
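A quick connectivity check against the dynamic inventory can help before running the playbooks; this assumes SSH password auth with the vagrant user, as the playbook commands above do:

# Optional sanity check: ping every host through the dynamic inventory.
ansible all -i inventory.py -u vagrant -k -m ping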
Check forward DNS with nslookup:
DNS lookup of the node-exporter service (registered in Consul under the name prometheus):
root@client1:~# nslookup prometheus.service.consul
Server: 172.136.1.11
Address: 172.136.1.11#53
Non-authoritative answer:
Name: prometheus.service.consul
Address: 172.136.2.12
Name: prometheus.service.consul
Address: 172.136.2.13
Name: prometheus.service.consul
Address: 172.136.2.11
DNS lookup of the bind service:
root@client1:~# nslookup bind.service.consul
Server: 172.136.1.11
Address: 172.136.1.11#53
Non-authoritative answer:
Name: bind.service.consul
Address: 172.136.1.11
DNS lookup of the grafana service:
root@client1:~# nslookup grafana.service.consul
Server: 172.136.1.11
Address: 172.136.1.11#53
Non-authoritative answer:
Name: grafana.service.consul
Address: 172.136.4.11
Name: grafana.service.consul
Address: 172.136.4.12
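Consul also serves SRV records that include the service port; as an optional extra check, query the Consul DNS interface directly (172.136.2.11 is consul1.lan per the reverse lookup below, and 8600 is Consul's default DNS port):

# Optional: SRV lookup straight against a Consul server's DNS port.
dig @172.136.2.11 -p 8600 grafana.service.consul SRV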
Check reverse DNS with dig:
Reverse DNS lookup of a node IP address:
root@client1:~# dig -x 172.136.2.11
; <<>> DiG 9.9.5-3ubuntu0.14-Ubuntu <<>> -x 172.136.2.11
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 17545
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 2, ADDITIONAL: 3
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;11.2.136.172.in-addr.arpa. IN PTR
;; ANSWER SECTION:
11.2.136.172.in-addr.arpa. 604800 IN PTR consul1.lan.
;; AUTHORITY SECTION:
2.136.172.in-addr.arpa. 604800 IN NS core2.lan.
2.136.172.in-addr.arpa. 604800 IN NS core1.lan.
;; ADDITIONAL SECTION:
core1.lan. 604800 IN A 172.136.1.11
core2.lan. 604800 IN A 172.136.1.12
;; Query time: 4 msec
;; SERVER: 172.136.1.11#53(172.136.1.11)
;; WHEN: Tue Sep 12 03:17:20 UTC 2017
;; MSG SIZE rcvd: 151
Prometheus Server UIs:
Grafana UIs:
Additionally, create a new Prometheus datasource in Grafana. TODO: the load-balanced Grafana endpoint needs session stickiness so authentication persists across backends (see the stickiness sketch after the UI list below).
- Grafana UI on prometheus1.lan admin:admin
- Grafana UI on prometheus2.lan admin:admin
- Load balanced Grafana UI
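The stickiness TODO above could be handled with cookie-based persistence in Haproxy. The sketch below is an assumption, not the config currently rendered by the role; the server addresses come from the grafana.service.consul lookup above, and 3000 is Grafana's default port:

# Sketch only: cookie-based stickiness for the load balanced Grafana backend.
backend grafana_back
    balance roundrobin
    cookie SRV insert indirect nocache
    server prometheus1 172.136.4.11:3000 check cookie prometheus1
    server prometheus2 172.136.4.12:3000 check cookie prometheus2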
Consul UIs:
Several Consul services are registered, with TCP health checks plus a simple script check on the bind DNS service.
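For illustration, a Consul service definition combining a TCP check with a script check might look like the following; the check script path and intervals are assumptions, and only the service name matches the bind service resolved above:

{
  "service": {
    "name": "bind",
    "port": 53,
    "checks": [
      { "tcp": "localhost:53", "interval": "10s" },
      { "script": "/usr/local/bin/check_bind.sh", "interval": "30s" }
    ]
  }
}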
Haproxy Admin:
To illustrate the use of Consul Template, Grafana and the Prometheus admin UI are load balanced:
- Haproxy Admin UI admin:adminpw
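The admin UI corresponds to Haproxy's built-in stats page; a configuration sketch is below, where the bind port and stats URI are assumptions and only the credentials match those listed above:

# Sketch only: bind port and stats URI are assumptions.
listen stats
    bind *:8080
    stats enable
    stats uri /haproxy
    stats auth admin:adminpw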