Contriboard Application and Server Monitoring System

Janne Alatalo edited this page Jul 21, 2015 · 1 revision

The following figure shows how Jarmo is used as an application performance monitor in the Contriboard monitoring setup.

[Figure: Contriboard Monitoring System]

Jarmo

Jarmo is a simple StatsD-like application monitoring tool. Its basic component is the server, which listens for JSON-encoded messages over UDP. Unlike StatsD, which is a major inspiration for Jarmo, Jarmo does not define any data types; instead it relies on configurable reporters to do the heavy lifting and parse the data as they see fit.

These reporters implement a fixed interface, to which the server flushes data as configured. When data arrives, the server adds a timestamp if the received data does not already contain one.
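The timestamping and flushing behaviour described above can be sketched roughly as follows. This is an illustration only, not Jarmo's actual code: the "timestamp" field name, the `report` method and the `flush` helper are all assumptions made for the example.

```python
import json
import time
from typing import Optional


def add_timestamp(raw: bytes, now: Optional[float] = None) -> dict:
    """Decode one JSON datagram and stamp it if it carries no timestamp.

    The "timestamp" key is an assumed field name, not taken from Jarmo.
    """
    data = json.loads(raw.decode("utf-8"))
    if "timestamp" not in data:
        data["timestamp"] = time.time() if now is None else now
    return data


class PrintReporter:
    """Hypothetical reporter: prints every item of a flushed batch."""

    def report(self, batch: list) -> None:
        for item in batch:
            print(json.dumps(item, sort_keys=True))


def flush(buffer: list, reporters: list) -> None:
    """Hand a copy of the buffered data to each reporter, then empty the buffer."""
    batch = list(buffer)
    buffer.clear()
    for reporter in reporters:
        reporter.report(batch)
```

Each reporter receives the same batch and decides for itself how to interpret the fields, which mirrors the idea that the server defines no data types of its own.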

As mentioned above, Jarmo receives JSON-encoded data over UDP. We have implemented a few integrations for Jarmo to make it easy to gather statistics from the Contriboard application.

In the Contriboard service, we use Jarmo to collect statistics from the various application components. For example, we can collect response times, status codes and any errors from the Contriboard API component, and from the IO component we can collect figures such as connection durations and the number of connections.
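As a rough sketch of what a client-side integration might do, the snippet below encodes a statistic as JSON and sends it as a single UDP datagram. The field names ("component", "response_time", "status"), the default host and the port 8000 are assumptions for illustration; they are not taken from Jarmo or from the Contriboard integrations.

```python
import json
import socket


def encode_stat(component: str, **fields) -> bytes:
    """Build one JSON-encoded message; the field names are illustrative."""
    payload = {"component": component}
    payload.update(fields)
    return json.dumps(payload, sort_keys=True).encode("utf-8")


def send_stat(message: bytes, host: str = "127.0.0.1", port: int = 8000) -> None:
    """Send the message as one UDP datagram (host and port are assumed values)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message, (host, port))
```

A call such as `send_stat(encode_stat("api", response_time=12, status=200))` is fire-and-forget: UDP gives no delivery guarantee, so the monitored application is not blocked if the monitoring server is down.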

InfluxDB

InfluxDB is an open source time series database, well suited to collecting metrics and other time-related data. It is a very young project: the first production-ready version was published in summer 2015. InfluxDB is written in Go and has no external dependencies. It is easy to install from deb and rpm packages and is also available from Homebrew. InfluxDB has an SQL-like query language that is easy to learn.

In the Contriboard monitoring system, InfluxDB is used as the data store. InfluxDB has a very simple REST API that allows applications to write data to the database over HTTP. The data can be written using the line protocol or JSON, although the JSON format is reportedly deprecated and should no longer be used.
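A minimal sketch of writing one point with the line protocol over HTTP might look like this. The database name "metrics" and the single field named "value" are assumptions for the example; 8086 is InfluxDB's default HTTP port. The helper only handles one float field and omits the timestamp, in which case the server assigns one.

```python
import urllib.request
from urllib.parse import urlencode


def _escape(text: str) -> str:
    """Escape commas, equals signs and spaces in tag keys and values."""
    return text.replace(",", r"\,").replace("=", r"\=").replace(" ", r"\ ")


def to_line(measurement: str, tags: dict, value: float) -> str:
    """Format one point as: measurement,tag=value,... value=<float>."""
    name = measurement.replace(",", r"\,").replace(" ", r"\ ")
    tag_part = "".join(
        f",{_escape(key)}={_escape(str(val))}" for key, val in sorted(tags.items())
    )
    return f"{name}{tag_part} value={float(value)}"


def write_line(line: str, base_url: str = "http://localhost:8086", db: str = "metrics") -> int:
    """POST one line to the /write endpoint and return the HTTP status code."""
    url = f"{base_url}/write?{urlencode({'db': db})}"
    request = urllib.request.Request(url, data=line.encode("utf-8"), method="POST")
    with urllib.request.urlopen(request) as response:
        return response.status
```

For example, `to_line("cpu_metrics", {"host": "host1"}, 0.64)` produces `cpu_metrics,host=host1 value=0.64`, which can then be passed to `write_line`.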

Both application performance data and server metrics go to InfluxDB. Keeping all the data in one place makes it easier to handle and visualize. InfluxDB has a convenient schema design in which metric metadata is encoded in tags, so multiple hosts can push their metrics to the same series using different tag values. Data is then easy to query because it can be filtered with a WHERE clause. For example, if two servers publish their CPU metrics to the cpu_metrics series, each with its own host tag, the CPU data of host1 can be queried with SELECT value FROM "cpu_metrics" WHERE host = 'host1'. To get cpu_metrics from both hosts with one query, simply leave the WHERE clause out.
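The query from the example above can be assembled and turned into a request URL for the HTTP query endpoint roughly as follows. The database name "metrics" and the helper names are assumptions; the sketch does no escaping of its inputs, so it is only suitable for trusted values.

```python
from typing import Optional
from urllib.parse import urlencode


def build_query(series: str, host: Optional[str] = None) -> str:
    """Select all values of a series, optionally filtered by the host tag."""
    query = f'SELECT value FROM "{series}"'
    if host is not None:
        # Inputs are not escaped here; pass trusted values only.
        query += f" WHERE host = '{host}'"
    return query


def query_url(query: str, base_url: str = "http://localhost:8086", db: str = "metrics") -> str:
    """Build a GET URL for the /query endpoint (database name is assumed)."""
    return f"{base_url}/query?{urlencode({'db': db, 'q': query})}"
```

Calling `build_query("cpu_metrics")` without a host yields the unfiltered query that returns data for every host in the series.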

Diamond

Diamond is a Python daemon that collects system metrics in the Contriboard monitoring setup. The daemon sends the data to InfluxDB using the InfluxDB handler. Diamond ships with many useful stats collectors. The collectors gather the data from the server and the handlers send it onward. In the Contriboard monitoring setup, Diamond collects CPU, disk, network and memory data.
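The division of labour between collectors and handlers can be illustrated with a small self-contained sketch. It does not use Diamond's own classes: `StaticCollector`, `ListHandler`, `run_once` and the "servers.<host>." path layout are hypothetical names chosen for the example.

```python
class StaticCollector:
    """Hypothetical collector: reads named values through a callable."""

    def __init__(self, prefix, read_values):
        self.prefix = prefix
        self.read_values = read_values  # callable returning {name: number}

    def collect(self):
        values = self.read_values()
        return {f"{self.prefix}.{name}": value for name, value in values.items()}


class ListHandler:
    """Hypothetical handler: stores metrics instead of sending them anywhere."""

    def __init__(self):
        self.received = []

    def process(self, path, value):
        self.received.append((path, value))


def run_once(host, collectors, handlers):
    """Collect every metric once and pass each one to every handler."""
    for collector in collectors:
        for name, value in sorted(collector.collect().items()):
            for handler in handlers:
                handler.process(f"servers.{host}.{name}", value)
```

Because collectors and handlers only meet in `run_once`, a different backend can be targeted by swapping the handler without touching any collector.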

Diamond is not the only option for collecting system metrics. The creators of InfluxDB have published their own Diamond-like metrics collector called Telegraf. Mozilla's Heka can also be used as a metrics collector, although it has no ready-made collectors and might need more configuration than the other two. Both are potential replacements for Diamond.

Grafana

Grafana is an open source data visualizer that works with InfluxDB. Grafana has supported InfluxDB 0.9 since version 2.0. Version 2.0 and later also ship with their own backend and support authentication, which simplifies installation and makes Grafana suitable for cloud environments. Before version 2.0, Grafana had to be served by an external web server, and authentication had to be implemented with some kind of authentication proxy workaround. The following figures show the Grafana web interface displaying different metrics from InfluxDB.

[Figures: Grafana contriboard; Grafana server]

Grafana lets users build dashboard templates, which makes it easy to create dynamic dashboards. In the Contriboard environment, servers come and go: new versions of Contriboard components are always deployed to new server instances using Ansible, and the testing environments are temporary. Creating new dashboards by hand every time something changes would be a lot of work. Grafana templates let users define variables that are substituted into queries when they are run. The following figures show how dynamic dashboards are created for Contriboard monitoring.

[Figures: Grafana templating]
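The idea of variable substitution in queries can be sketched with Python's standard `string.Template`, which happens to use the same `$name` placeholder style. This only illustrates the concept and is not how Grafana implements templating; the `$host` variable and the query are assumptions carried over from the cpu_metrics example.

```python
from string import Template

# One query definition, reused for whichever host is selected.
CPU_QUERY = Template("SELECT value FROM \"cpu_metrics\" WHERE host = '$host'")


def render(template: Template, **variables) -> str:
    """Substitute the selected variable values into the query text."""
    return template.substitute(**variables)
```

With a single templated query, a dashboard keeps working when `host1` is replaced by a newly deployed instance: only the variable value changes, for example `render(CPU_QUERY, host="host2")`.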
