Skip to content

Core services healthchecks

Nicolas Degory edited this page Sep 8, 2017 · 11 revisions

This page lists the checks that should be run to validate the deployment of the core services. curl checks should be run from a container on the overlay network shared by the tested service. logs checks can be run from a manager node. The command should be run after a stabilization delay.

This is implemented in the *.test.yml of the cluster/agent/stacks stack files (part of the agent image).

order service depends on context test command where (node label) delay (sec) timeout (sec)
1 elasticsearch single mode cluster health should be yellow curl -sf elasticsearch:9200/_cluster/health?wait_for_status=yellow&timeout=15s core 10 30
1 cluster mode cluster health should be green curl -sf elasticsearch:9200/_cluster/health?wait_for_status=green&timeout=30s core 15 45
2 etcd * endpoint health etcdctl --endpoints "http://etcd:2379" endpoint health | grep -qw healthy core 10 30
3 nats * api availability curl -sf "nats:8222/subsz" core 5 20
4 ampbeat nats, elasticsearch docker service logs amp_ampbeat 2>&1 grep -q "INFO ampbeat is running" manager 5 20
5 kibana elastisearch, ampbeat UI availability curl -sf "kibana:5601/app/kibana#/discover" core 10 30
6 node_exporter metrics URL curl -sf "node-exporter:9100/metrics" metrics 5 20
7 nats_exporter nats metrics URL curl -sf "nats-exporter:7777/metrics" metrics 5 20
8 haproxy_exporter haproxy metrics URL curl -sf "haproxy-exporter:9101/metrics" metrics 5 20
9 prometheus node_exporter, nats_exporter, haproxy_exporter status URL curl -sf "prometheus:9090/status" metrics 8 30
10 alertmanager prometheus metrics URL curl -sf "alertmanager:9093/metrics" metrics 5 20
11 grafana org info curl -sf "grafana:3000/api/org" metrics 10 30
12 proxy stats curl -sf "http://stats:stats@proxy:1936/haproxy\?stats\;csv" route 5 20
13 amplifier nats, elasticsearch, etcd metrics URL curl -sf "amplifier:5100/metrics" core 6 30
14 gateway amplifier docker service logs amp_gateway 2>&1 grep -q "gateway successfully initialized" manager 5 30
15 agent nats docker service logs amp_agent 2>&1 grep -q "Connected to nats streaming successfully" manager 5 30
16 portal gateway home page curl -sf "http://portal/" core 5 30