Prometheus for Website monitoring

A simple example of using Prometheus to track website uptime.

Prometheus is built in a modular, "microservice"-like way.

This example runs a few small Docker containers, wired together with docker-compose. First, the "real" parts of the stack:

The Prometheus engine itself: manages the state of everything being monitored (in this case, the list of domains we care about)
A process called blackbox-exporter, which Prometheus polls to actually execute the health checks
An Alertmanager, which sends alerts and manages their state
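As a rough illustration of that polling step, the sketch below builds the kind of `/probe` URL Prometheus requests from the blackbox exporter and reads `probe_success` out of the text-format response. The host name `blackbox-exporter`, port 9115 (the exporter's default), and the `http_2xx` module name are assumptions, not values taken from this repo's config.

```python
from urllib.parse import urlencode


def probe_url(target, module="http_2xx", exporter="blackbox-exporter:9115"):
    """Build the URL for one probe of `target` (exporter address and module are assumed)."""
    query = urlencode({"target": target, "module": module})
    return "http://{}/probe?{}".format(exporter, query)


def parse_probe_success(metrics_text):
    """Return True/False from the probe_success sample, or None if it is absent."""
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name, _, value = line.partition(" ")
        if name == "probe_success":
            return float(value) == 1.0
    return None


# Hand-written sample in the Prometheus text exposition format.
sample = """# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 0
probe_duration_seconds 0.012
"""

print(probe_url("flakyhost.com"))
print(parse_probe_success(sample))
```

A `probe_success` value of 0 is what ends up driving a "site down" alert; 1 means the check passed.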

Then there are three small app containers that provide a simulation framework:

alertlogger: Handles webhook-based alerts from Alertmanager and logs them to a file (data/alertlogger/alerts.log)
flakyhost.com: A web server configured to intermittently fail then come back, so we can see the down/up alerting
reliablehost.com: A web server which tries to always be reliable
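To make the "intermittently fail then come back" idea concrete, here is a minimal sketch of a flaky web server. It is not the code in the flakyhost container; the cycle lengths (`PERIOD`, `DOWNTIME`), the 503 status, and port 8080 are all hypothetical choices for illustration.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
import time

PERIOD = 180    # hypothetical: length of one up/down cycle, in seconds
DOWNTIME = 60   # hypothetical: seconds at the start of each cycle spent failing


def is_up(now, period=PERIOD, downtime=DOWNTIME):
    """Return False during the first `downtime` seconds of every cycle."""
    return (now % period) >= downtime


class FlakyHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Answer 200 when "up" and 503 when "down" so an HTTP probe sees the failure.
        status = 200 if is_up(time.time()) else 503
        body = b"ok\n" if status == 200 else b"unavailable\n"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Serve forever on `port`; call this manually to try the sketch."""
    HTTPServer(("", port), FlakyHandler).serve_forever()


# Show the up/down schedule at a few points in the cycle without starting a server.
print([is_up(t) for t in (0, 59, 60, 179, 180)])
```

Keeping the schedule in a pure function like `is_up` makes the failure pattern easy to check without waiting for real time to pass.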

If you also want to probe some real sites, edit the config/blackbox_target.yml file and add actual domains.

Then make sure you have Docker and docker-compose installed, and run:

>>> This builds the containers for the simulation framework
$ docker-compose build

>>> start all the containers. Run without the `-d` if you want to see container logs.
$ docker-compose up -d

>>> Keep an eye on the alerts coming out of Alertmanager
$ tail -f data/alertlogger/alerts.log

Then go to http://localhost:9090/alerts in your browser to see which hosts, if any, are alerting.

Entries in data/alertlogger/alerts.log look like this:

2016/11/19 15:30:57 Request from 172.18.0.6:54166: POST /
2016/11/19 15:30:57 {"receiver":"default-receiver","status":"resolved","alerts":[{"status":"resolved","labels":{"alertname":"SiteDown","instance":"flakyhost.com","job":"blackbox"},"annotations":{"description":"site down: flakyhost.com","summary":"site down: flakyhost.com"},"startsAt":"2016-11-19T15:28:27.818Z","endsAt":"2016-11-19T15:29:27.818Z","generatorURL":"http://b873f429a190:9090/graph?g0.expr=probe_success+%3C+1\u0026g0.tab=0"}],"groupLabels":{"alertname":"SiteDown"},"commonLabels":{"alertname":"SiteDown","instance":"flakyhost.com","job":"blackbox"},"commonAnnotations":{"description":"site down: flakyhost.com","summary":"site down: flakyhost.com"},"externalURL":"http://438350b8d0ba:9093","version":"3","groupKey":15335440397915075285}
2016/11/19 15:30:57 site down: flakyhost.com
2016/11/19 15:30:57 Status: resolved


2016/11/19 15:31:57 Request from 172.18.0.6:54216: POST /
2016/11/19 15:31:57 {"receiver":"default-receiver","status":"firing","alerts":[{"status":"firing","labels":{"alertname":"SiteDown","instance":"flakyhost.com","job":"blackbox"},"annotations":{"description":"site down: flakyhost.com","summary":"site down: flakyhost.com"},"startsAt":"2016-11-19T15:31:27.818Z","endsAt":"0001-01-01T00:00:00Z","generatorURL":"http://b873f429a190:9090/graph?g0.expr=probe_success+%3C+1\u0026g0.tab=0"}],"groupLabels":{"alertname":"SiteDown"},"commonLabels":{"alertname":"SiteDown","instance":"flakyhost.com","job":"blackbox"},"commonAnnotations":{"description":"site down: flakyhost.com","summary":"site down: flakyhost.com"},"externalURL":"http://438350b8d0ba:9093","version":"3","groupKey":15335440397915075285}
2016/11/19 15:31:57 site down: flakyhost.com
2016/11/19 15:31:57 Status: firing
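The JSON bodies in the log above are Alertmanager webhook payloads. The following is a minimal sketch of pulling the summary and status out of one, similar in spirit to the last two lines of each log entry. It is not the alertlogger source, and the function name `summarize` is made up for this example.

```python
import json


def summarize(body):
    """Return one (summary, status) pair per alert in an Alertmanager webhook body."""
    payload = json.loads(body)
    pairs = []
    for alert in payload.get("alerts", []):
        summary = alert.get("annotations", {}).get("summary", "(no summary)")
        pairs.append((summary, alert.get("status", "unknown")))
    return pairs


# Trimmed copy of the "resolved" payload shown in the log above.
body = json.dumps({
    "receiver": "default-receiver",
    "status": "resolved",
    "alerts": [{
        "status": "resolved",
        "labels": {"alertname": "SiteDown", "instance": "flakyhost.com", "job": "blackbox"},
        "annotations": {"summary": "site down: flakyhost.com"},
    }],
})

for summary, status in summarize(body):
    print(summary)
    print("Status:", status)
```

Reading the per-alert `status` rather than only the top-level one matters once Alertmanager groups several alerts into a single notification.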

You can also see the other metrics that are tracked.

Go to http://localhost:9090/graph
Type probe_ followed by the rest of a metric name (probe_duration_seconds is an interesting one for seeing performance over time).
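The same series can also be read without the browser, through Prometheus's HTTP API (`/api/v1/query`). The sketch below builds an instant-query URL and parses a response of the documented shape. The sample response is hand-written and its values are made up; the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode


def query_url(expr, base="http://localhost:9090"):
    """Build an instant-query URL for Prometheus's HTTP API."""
    return "{}/api/v1/query?{}".format(base, urlencode({"query": expr}))


def values_by_instance(response_text):
    """Map each `instance` label to its sample value from an instant-query response."""
    doc = json.loads(response_text)
    if doc.get("status") != "success":
        raise ValueError("query failed: {}".format(doc.get("error", "unknown error")))
    out = {}
    for series in doc["data"]["result"]:
        _timestamp, value = series["value"]  # value arrives as a string
        out[series["metric"].get("instance", "")] = float(value)
    return out


# Hand-written response in the instant-vector shape; the numbers are illustrative only.
sample = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "probe_duration_seconds", "instance": "flakyhost.com"},
             "value": [1479569457.0, "0.012"]},
            {"metric": {"__name__": "probe_duration_seconds", "instance": "reliablehost.com"},
             "value": [1479569457.0, "0.004"]},
        ],
    },
})

print(query_url("probe_duration_seconds"))
print(values_by_instance(sample))
```

With the stack running, the URL from `query_url` could be fetched with `urllib.request.urlopen` and the decoded body passed to `values_by_instance`.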

These metrics could easily be added to a Grafana dashboard, as it has excellent Prometheus support.

For production use:

Prometheus and the blackbox exporter can be run on multiple hosts (and/or in multiple data centers)
Alertmanager can be run in a highly available configuration (the instances communicate with each other over a mesh protocol to suppress duplicate alerts)
You can run Grafana or other dashboards to see other information (such as response time)
Instead of a static config/blackbox_targets.yml, a second container could be run to programmatically fetch those lists from an external source, such as a database or external API, and update the file. (The contents are dynamically reloaded within 30 seconds as needed.)
Other types of probes (beyond HTTP) can be configured; the blackbox_exporter is hugely versatile.
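As a sketch of the "second container" idea above, the code below renders a list of domains as a targets file and swaps it into place atomically, so a reader never sees a half-written file. It assumes the targets file uses Prometheus's file-based service discovery layout (a YAML list of target groups); the function names and the temp-file prefix are hypothetical, and fetching the domains from a database or API is left out.

```python
import os
import tempfile


def render_targets(domains):
    """Render domains as a single file_sd-style target group in YAML (assumed layout)."""
    unique = sorted(set(domains))
    if not unique:
        return "- targets: []\n"
    lines = ["- targets:"]
    for domain in unique:
        # Plain host names need no escaping; quoting guards against odd YAML parsing.
        lines.append("  - '{}'".format(domain))
    return "\n".join(lines) + "\n"


def write_atomically(path, text):
    """Write text to a temp file in the same directory, then rename it over `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".targets-")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(text)
        os.chmod(tmp, 0o644)   # mkstemp creates 0600; let other users read it
        os.replace(tmp, path)  # atomic rename on the same filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)
        raise


# Demo in a throwaway directory; a real updater would write into config/ instead.
with tempfile.TemporaryDirectory() as workdir:
    target_file = os.path.join(workdir, "blackbox_targets.yml")
    write_atomically(target_file, render_targets(["reliablehost.com", "flakyhost.com"]))
    with open(target_file) as handle:
        print(handle.read(), end="")
```

Writing to a temporary file in the same directory and renaming it keeps the update on one filesystem, which is what makes the rename atomic.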

For full documentation see

jbarratt/prometheus_sitemon
