Dave Carroll edited this page Feb 20, 2019 · 45 revisions

Useful Ganglia Topics


Ganglia is an open source trending platform that is highly customizable and adaptable to any environment. There are plenty of resources on the web covering how to set up and work with Ganglia, so I won't cover much of that here. Instead, I share some interesting ways I use Ganglia to help my team monitor systems.

The side menu lists the topics, including a few of the many metric modules I have written, some add-ons I built to improve performance, and a section on how to make Ganglia do something it really wasn't designed to do: Dynamic Metrics. Ganglia is a trending platform that helps administrators and engineers review metrics over time, but it can also be useful for short-term snapshot information that is closer to "real-time" than to trending. I provide some examples of that here as well.

Another fun add-on exposes my metrics through a RESTful API so that our Nagios platform can easily request metric values and act on them. Scripts, programs and other integrations also make use of these metrics.
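
To give a feel for the Nagios side of this, here is a minimal sketch of a check that reads one metric over HTTP and maps it to the standard Nagios plugin exit codes. The endpoint URL, the `{"value": ...}` response shape, and the names `check_metric`, `classify` and `http_get` are assumptions made up for illustration; they are not the actual API described on this wiki.

```python
import json
import urllib.request
from typing import Callable, Tuple

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def http_get(url: str, timeout: float = 10.0) -> str:
    """Fetch the response body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def classify(value: float, warn: float, crit: float) -> int:
    """Map a metric value to a Nagios state (higher value is worse)."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def check_metric(url: str, warn: float, crit: float,
                 fetch: Callable[[str], str] = http_get) -> Tuple[int, str]:
    """Return (exit_code, status_line) for one metric.

    Assumes a hypothetical JSON response such as {"value": 42.0}.
    """
    try:
        payload = json.loads(fetch(url))
        value = float(payload["value"])
    except Exception as exc:
        # Anything we cannot read or parse is reported as UNKNOWN.
        return UNKNOWN, f"UNKNOWN - could not read metric: {exc}"
    code = classify(value, warn, crit)
    # Text after the pipe is Nagios performance data: label=value;warn;crit
    return code, f"{LABELS[code]} - value={value} | value={value};{warn};{crit}"
```

A wrapper script would print the returned status line and pass the code to `sys.exit()`, which is how Nagios reads a plugin's result. The `fetch` parameter is only there so the logic can be exercised without a live server.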

TrendWatcher is a containerized app I wrote to analyze .rrd database files over time, looking for anomalies and other concerns. For example, my DevOps team may have Nagios configured to warn when a database disk reaches an 80% full threshold, but when you are dealing with terabytes of data, that may not leave enough time to respond. TrendWatcher runs every 30 minutes and compares disk growth to the past 24 hours, the last week and the last month. If it detects an abnormal growth rate, it raises an alert on a console dashboard the team monitors. We use this for connection rates, memory, disk space, and just about anything else where an earlier warning helps.
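
The comparison idea can be sketched roughly as follows. This is not TrendWatcher's actual code: it assumes the samples have already been pulled out of the .rrd file as `(timestamp, value)` pairs, and the names `growth_rate` and `abnormal_growth`, as well as the "three times the baseline rate" rule, are illustrative assumptions.

```python
from typing import Dict, List, Optional, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, metric value)

HOUR = 3600.0
# Baseline windows to compare against, in seconds.
BASELINES = {"24h": 24 * HOUR, "7d": 7 * 24 * HOUR, "30d": 30 * 24 * HOUR}


def growth_rate(samples: List[Sample], start: float, end: float) -> Optional[float]:
    """Average change per second between the first and last sample in [start, end]."""
    window = [(t, v) for t, v in samples if start <= t <= end]
    if len(window) < 2:
        return None  # not enough data to estimate a rate
    (t0, v0), (t1, v1) = window[0], window[-1]
    if t1 == t0:
        return None
    return (v1 - v0) / (t1 - t0)


def abnormal_growth(samples: List[Sample], now: float,
                    recent: float = 1800.0, factor: float = 3.0) -> Dict[str, bool]:
    """Compare the growth rate of the last `recent` seconds with each baseline.

    A baseline is flagged when the recent rate is positive and more than
    `factor` times the baseline rate (an assumed rule, chosen for illustration).
    Baselines without enough data are left out of the result.
    """
    samples = sorted(samples)
    current = growth_rate(samples, now - recent, now)
    result: Dict[str, bool] = {}
    if current is None:
        return result
    for name, span in BASELINES.items():
        baseline = growth_rate(samples, now - span, now - recent)
        if baseline is None:
            continue
        # A flat or shrinking baseline is treated as zero growth.
        result[name] = current > 0 and current > factor * max(baseline, 0.0)
    return result
```

Called every 30 minutes with the latest samples, `abnormal_growth` returns one flag per baseline window, so a dashboard could show whether growth looks unusual against the last day, week or month separately.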

Feel free to browse around and borrow anything you find to be useful.