Monitoring with Prometheus
Prometheus is an open source monitoring and alerting toolkit.
In addition to Prometheus, we'll use:
- Consul - for automatic service discovery
- Grafana - for data visualization
Data Model: Prometheus fundamentally stores all data as time series: streams of timestamped values.
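Prometheus identifies each time series by its metric name plus a set of key/value labels. The toy sketch below keys an in-memory dict on (name, sorted labels) to show that idea. It is only an illustration and is not how the Prometheus TSDB is implemented. The class name `TinyTSDB` and the example metric are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A series is identified by its metric name plus a sorted tuple of label pairs.
SeriesKey = Tuple[str, Tuple[Tuple[str, str], ...]]


class TinyTSDB:
    """Toy in-memory store: each series is a list of (timestamp_ms, value)."""

    def __init__(self) -> None:
        self.series: Dict[SeriesKey, List[Tuple[int, float]]] = defaultdict(list)

    @staticmethod
    def key(name: str, labels: Dict[str, str]) -> SeriesKey:
        # Sorting the labels makes {a, b} and {b, a} map to the same series.
        return (name, tuple(sorted(labels.items())))

    def append(self, name: str, labels: Dict[str, str], ts_ms: int, value: float) -> None:
        self.series[self.key(name, labels)].append((ts_ms, value))

    def get(self, name: str, labels: Dict[str, str]) -> List[Tuple[int, float]]:
        # .get() on a defaultdict does not create an empty series as a side effect.
        return list(self.series.get(self.key(name, labels), []))
```

Different label values, such as `method="GET"` versus `method="POST"`, produce separate series under the same metric name.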
Exporters: a piece of software that fetches metrics from a given system and exposes them in a format a Prometheus server can scrape.
Exporters can be reached over SSH, but that's a pain, so HTTP is preferred. However, you would want the exporters to listen only on the loopback interface.
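Below is a minimal sketch of an exporter that serves `/metrics` over HTTP and binds only to the loopback interface. It uses just the Python standard library. The metric `demo_up` and the default port 9999 are hypothetical placeholders. A real exporter would collect its values from the service it monitors.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical metric; a real exporter would collect this from the service it watches.
METRICS_BODY = (
    "# HELP demo_up Whether the demo service is up.\n"
    "# TYPE demo_up gauge\n"
    "demo_up 1\n"
)


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = METRICS_BODY.encode("utf-8")
        self.send_response(200)
        # Content type of the Prometheus text exposition format (version 0.0.4).
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # silence per-request logging


def make_server(port: int = 9999) -> HTTPServer:
    # 127.0.0.1 is the loopback interface: only local processes can connect.
    return HTTPServer(("127.0.0.1", port), MetricsHandler)


def serve_in_background(port: int = 9999) -> HTTPServer:
    server = make_server(port)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    make_server().serve_forever()
```

Binding to `0.0.0.0` instead would expose the endpoint on every network interface.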
https://gitlab.com/jessp01/oscon-prometheus-session
Take a look at the Dockerfile to see how she installs and runs all the software: https://gitlab.com/jessp01/oscon-prometheus-session/blob/master/docker/Dockerfile
Consul - Installed on each host so they can all register with the discovery service.
She seems to be using packages from her company (kaltura.org) for all of this.
You can find a bunch of community exporters here:
https://prometheus.io/docs/instrumenting/exporters/
On each host, you install Consul plus an exporter for each service you want to monitor (Apache, memcached, etc.). The Prometheus server then periodically asks each of those exporters for data (it "scrapes" them) and stores the results as time series data.
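The pull model can be sketched as follows. The sketch makes several assumptions. It uses a simplified text format with no escaped characters and no commas inside label values. It also injects a `fetch` callable in place of the real HTTP GET. The names `parse_exposition`, `scrape_once`, and `scrape_loop` are hypothetical. The real Prometheus server does much more, including service discovery, relabeling, and staleness handling.

```python
import time
from typing import Callable, Dict, List, Tuple

Sample = Tuple[str, Dict[str, str], float]


def parse_exposition(text: str) -> List[Sample]:
    """Parse simple text-format lines like `name{a="b"} 1.5`.

    Simplified: skips comment lines, ignores optional timestamps, and assumes
    label values contain no commas, quotes or braces.
    """
    samples: List[Sample] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        labels: Dict[str, str] = {}
        if "{" in line:
            name, rest = line.split("{", 1)
            label_part, value_part = rest.rsplit("}", 1)
            for pair in filter(None, label_part.split(",")):
                key, val = pair.split("=", 1)
                labels[key.strip()] = val.strip().strip('"')
        else:
            name, value_part = line.split(None, 1)
        value = float(value_part.split()[0])  # first token is the value
        samples.append((name.strip(), labels, value))
    return samples


def scrape_once(fetch: Callable[[], str], store: List[Tuple[float, Sample]]) -> int:
    """Pull one scrape from an exporter and record each sample with a timestamp."""
    now = time.time()
    samples = parse_exposition(fetch())
    store.extend((now, s) for s in samples)
    return len(samples)


def scrape_loop(fetch: Callable[[], str], store: List[Tuple[float, Sample]],
                interval_s: float = 15.0, iterations: int = 3) -> None:
    # interval_s is an illustrative value, not a claim about Prometheus defaults.
    for _ in range(iterations):
        scrape_once(fetch, store)
        time.sleep(interval_s)
```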
You can write exporters for custom software, or for software that doesn't have one yet, in Java, Go, Python, Node.js, etc. She thinks the Go library is the best maintained.
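In practice you would use an official client library, such as the Go client or Python's `prometheus_client`, which handles formatting for you. The sketch below hand-rolls the text exposition format to show what a custom exporter ultimately emits. It includes the HELP and TYPE lines and escapes label values. The metric `myapp_jobs_processed_total` and its `queue` label are hypothetical.

```python
from typing import Dict, List, Tuple


def escape_label_value(value: str) -> str:
    # Backslash, double quote and line feed must be escaped in label values.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_family(name: str, help_text: str, metric_type: str,
                  samples: List[Tuple[Dict[str, str], float]]) -> str:
    """Render one metric family in the Prometheus text exposition format."""
    # In HELP text only backslash and line feed are escaped.
    help_escaped = help_text.replace("\\", "\\\\").replace("\n", "\\n")
    lines = [f"# HELP {name} {help_escaped}", f"# TYPE {name} {metric_type}"]
    for labels, value in samples:
        if labels:
            pairs = ",".join(f'{k}="{escape_label_value(v)}"' for k, v in sorted(labels.items()))
            lines.append(f"{name}{{{pairs}}} {value}")
        else:
            lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"
```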
There is a Spring Exporter:
https://reflectoring.io/monitoring-spring-boot-with-prometheus/
oscon@52.87.250.122 passwd: oscon2018
52.207.189.25 - prom server
Follow this readme: https://gitlab.com/jessp01/oscon-prometheus-session/tree/master/docker
Then:
[root@8dc1daa17088 /]# vi /etc/sysconfig/kaltura-prometheus
Change the IP it binds to from 127.0.0.1 to 0.0.0.0 so it listens on all network interfaces, not just loopback.
Then restart prometheus:
[root@8dc1daa17088 /]# /etc/init.d/kaltura-prometheus restart
Now you should be able to load the user interface: http://localhost:9090/
Start Grafana:
[root@8dc1daa17088 /]# /etc/init.d/grafana-server status
grafana-server is stopped
[root@8dc1daa17088 /]# /etc/init.d/grafana-server start
Starting Grafana Server: ... [ OK ]
[root@8dc1daa17088 /]# /etc/init.d/grafana-server status
grafana-server (pid 1250) is running...
Load the Grafana UI at http://localhost:3000/ (login: admin/admin).
Now, in the Prometheus UI at http://localhost:9090, go to Status -> Targets and you'll see just one target: Prometheus itself. Load that target's URL, http://localhost:9090/metrics, to see the raw output of Prometheus's own metrics endpoint.
On the 9090/graph interface, you can query for metrics, e.g.:
http_request_duration_microseconds
Click on Graph. Neat.
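You can also run the same query through the Prometheus HTTP API's instant-query endpoint, `/api/v1/query`. The sketch below builds the request URL and parses the JSON "vector" result. The helper names are hypothetical. The sample payload in the tests only mirrors the documented response shape and is not real output from this setup.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List, Tuple


def build_query_url(base_url: str, promql: str) -> str:
    # Instant query endpoint of the Prometheus HTTP API.
    return f"{base_url.rstrip('/')}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def parse_vector(payload: dict) -> List[Tuple[Dict[str, str], float]]:
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each entry is {"metric": {...labels...}, "value": [unix_ts, "value as string"]}.
    return [(r["metric"], float(r["value"][1])) for r in data["result"]]


def instant_query(base_url: str, promql: str, timeout: float = 5.0):
    with urllib.request.urlopen(build_query_url(base_url, promql), timeout=timeout) as resp:
        return parse_vector(json.load(resp))
```

Against the container from these notes, a call would look like `instant_query("http://localhost:9090", "up")`.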
Go into Grafana and add a Data Source: Choose the prometheus type.
Then go to Dashboards and import the Prometheus and Grafana dashboard templates.
Then go to Dashboards -> Home. At the top, switch from Home to the Prometheus dashboard.
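You can also script the "add a Prometheus data source" step against Grafana's HTTP API by sending a POST to `/api/datasources`. The sketch below only builds the request; sending it is left to `urllib.request.urlopen(req)`. It assumes the default admin/admin login and basic auth. The data source name "Prometheus" is a hypothetical choice.

```python
import base64
import json
import urllib.request


def build_datasource_request(grafana_url: str, prometheus_url: str,
                             user: str = "admin", password: str = "admin") -> urllib.request.Request:
    payload = {
        "name": "Prometheus",   # display name (hypothetical choice)
        "type": "prometheus",   # same as picking the Prometheus type in the UI
        "url": prometheus_url,
        "access": "proxy",      # Grafana's backend talks to Prometheus
        "isDefault": True,
    }
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{grafana_url.rstrip('/')}/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )
```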
https://prometheus.io/docs/alerting/overview/
The Alertmanager handles alerts sent by the Prometheus server. It takes care of deduplicating, grouping, and routing them to the correct receiver integration, such as email, Slack, PagerDuty, or OpsGenie.
Grafana has alerting, but it's not as powerful as the Prometheus Alertmanager.
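Here is a toy model of deduplication and grouping. Alerts with identical label sets are collapsed into one, and the remaining alerts are bucketed by the values of the labels named in `group_by`. The list is analogous to the `group_by` option of an Alertmanager route. The real Alertmanager also handles timing (`group_wait`, `group_interval`), inhibition, and silences, none of which is modeled here. The function name is hypothetical.

```python
from typing import Dict, List, Tuple

Alert = Dict[str, str]  # an alert reduced to its label set


def group_alerts(alerts: List[Alert], group_by: List[str]) -> Dict[Tuple[str, ...], List[Alert]]:
    """Drop duplicate alerts, then bucket the rest by their `group_by` label values."""
    seen = set()
    groups: Dict[Tuple[str, ...], List[Alert]] = {}
    for alert in alerts:
        fingerprint = tuple(sorted(alert.items()))  # identical label sets => duplicate
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        key = tuple(alert.get(label, "") for label in group_by)
        groups.setdefault(key, []).append(alert)
    return groups
```

Each group would then be sent as a single notification to its receiver, so ten instances going down produce one email instead of ten.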
The Alertmanager isn't running at this point.
Look at the config:
/opt/kaltura/prometheus/alertmanager/etc/alertmanager.yml
Alerting rules are managed in Prometheus, not in the Alertmanager. They are configured in:
/opt/kaltura/prometheus/etc/prometheus.yml
and the actual rules live in: /opt/kaltura/prometheus/etc/*
The rule files are named in numbered order (like init start/kill scripts).
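Prometheus evaluates each alerting rule periodically. If a rule has a `for` duration, a matching alert stays "pending" until the condition has held for that long, and only then becomes "firing" and is sent to the Alertmanager. The sketch below is a simplified single-series model of that state machine. Real rules track state per label set, and the class name is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertRuleState:
    for_seconds: float                  # plays the role of the rule's `for:` duration
    active_since: Optional[float] = None

    def evaluate(self, condition_true: bool, now: float) -> str:
        """Return 'inactive', 'pending' or 'firing' for one evaluation cycle."""
        if not condition_true:
            self.active_since = None    # condition cleared: start over next time
            return "inactive"
        if self.active_since is None:
            self.active_since = now     # first evaluation where the condition holds
        if now - self.active_since >= self.for_seconds:
            return "firing"
        return "pending"
```

With `for_seconds=0` the alert fires on the first evaluation where the condition is true, which matches a rule without a `for` clause.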
She uses Monit as a watchdog: it restarts services if they die. To test the alerting emails, she used https://www.mailinator.com/, a public throwaway email service, so we could all log in and see them.
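Below is a sketch of the watchdog idea: check whether a service is healthy and run its restart command if it isn't. This is not Monit, which is configured through its own control file. It is only an illustration of the pattern. The health check here is a TCP connect, and the function names are hypothetical. The restart command in the usage note is the one from these notes.

```python
import socket
import subprocess
import time
from typing import Callable, List


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Health check: can we open a TCP connection to the service?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def watchdog(check: Callable[[], bool], restart_cmd: List[str],
             interval_s: float = 30.0, cycles: int = 1,
             run: Callable[[List[str]], object] = subprocess.run) -> int:
    """Run `restart_cmd` whenever `check` fails; return how many restarts happened."""
    restarts = 0
    for i in range(cycles):
        if not check():
            run(restart_cmd)
            restarts += 1
        if i < cycles - 1:
            time.sleep(interval_s)
    return restarts
```

Example: `watchdog(lambda: port_is_open("127.0.0.1", 9090), ["/etc/init.d/kaltura-prometheus", "restart"], cycles=10)`.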
- "That will be mighty hard to read for a tit"
- "Jolly Good"
- "Bloody Hell...Bloody isn't a cuss word, so it's ok to say at a conference"