This repository has been archived by the owner on Feb 27, 2020. It is now read-only.

Merge pull request #75 from Metaswitch/snmp
[Reviewer: Ellie] Beef up SNMP docs
rkd-msw committed Jul 3, 2015
2 parents 165b55a + af4e3c2 commit 39b0cdb
Showing 2 changed files with 77 additions and 61 deletions.
56 changes: 19 additions & 37 deletions docs/Cacti.md
@@ -14,7 +14,7 @@ This document describes how to
- point it at your Clearwater nodes
- view graphs of statistics from your Clearwater nodes.

### Setting up a Cacti node
### Setting up a Cacti node (automated)

Assuming you've followed the [Automated Chef install](Automated_Install.md),
here are the steps to create and configure a Cacti node:
@@ -23,10 +23,21 @@ here are the steps to create and configure a Cacti node:
<name> cacti`
2. set up a DNS entry for it - `knife dns record create -E <name>
cacti -z <root> -T A --public cacti -p <name>`
3. point your web browser at `cacti.<name>.<root>/cacti/` (you may need to wait for the DNS entry to propagate before this step works)
4. accept all the configuration defaults
5. log in (admin/admin) and set a new password
6. modify configuration by
3. create graphs for all existing nodes by running `knife cacti update -E <name>`
4. point your web browser at `cacti.<name>.<root>/cacti/` (you may need to wait for the DNS entry to propagate before this step works)

If you subsequently scale up, running `knife cacti update -E <name>` again will create graphs for
the new nodes (without affecting the existing nodes).

### Setting up a Cacti node (manual)

If you haven't followed our automated install process, and instead just have an Ubuntu 14.04 machine
you want to use to monitor Clearwater, then:

1. install Cacti on a node by running `sudo apt-get install cacti cacti-spine`
2. accept all the configuration defaults
3. log in (admin/admin) and set a new password
4. modify configuration by
1. going to Devices and deleting "localhost"
2. going to Settings-\>Poller and setting "Poller Type" to "spine" and
"Poller Interval" to "Every Minute" - then clicking Save at the bottom of the page
@@ -35,7 +46,7 @@ here are the steps to create and configure a Cacti node:
"Step" to 60
4. going to Graph Templates and changing "ucd/net - CPU Usage" to
disable "Auto Scale"
5. going to Import Templates and import (in the following order) the attached XML files [cacti\_client\_count.xml](sample_files/cacti_client_count.xml), [cacti\_sip\_stress\_status.xml](sample_files/cacti_sip_stress_status.xml), [cacti\_bono\_latency.xml](sample_files/cacti_bono_latency.xml) and [cacti\_sprout\_latency.xml](sample_files/cacti_sprout_latency.xml) - these define new data input methods and graph templates for retrieving statistics from our components via [0MQ](http://www.zeromq.org/). For each template you import, select "Select your RRA settings below" and "Hourly (1 Minute Average)"
5. going to Import Templates and import the [Sprout](https://github.com/Metaswitch/chef/blob/master/cookbooks/clearwater/files/default/cacti/templates/cacti_host_template_sprout.xml), [Bono](https://github.com/Metaswitch/chef/blob/master/cookbooks/clearwater/files/default/cacti/templates/cacti_host_template_bono.xml) and [SIPp](https://github.com/Metaswitch/chef/blob/master/cookbooks/clearwater/files/default/cacti/templates/cacti_host_template_sipp.xml) host templates - these define host templates for retrieving statistics from our components via SNMP and [0MQ](http://www.zeromq.org/). For each template you import, select "Select your RRA settings below" and "Hourly (1 Minute Average)"

6. set up the 0MQ-querying script by
1. ssh-ing into the cacti node
@@ -47,7 +58,7 @@ here are the steps to create and configure a Cacti node:
cd cpp-common/scripts/stats
sudo bundle install

### Pointing Cacti at a Node
#### Pointing Cacti at a Node

Before you point Cacti at a node, make sure the node has the required
packages installed. All nodes need clearwater-snmpd installed (`sudo
@@ -58,7 +69,7 @@ clearwater-sip-stress-stats`).
To manually point Cacti at a new node,

1. go to Devices and Add a new node, giving a Description and Hostname,
setting a Host Template of "ucd/net SNMP host", Downed Device
setting a Host Template of "Bono", "Sprout" or "SIPp" (depending on the node type), Downed Device
Detection to "SNMP Uptime" and SNMP Community to "clearwater"
2. click "Create Graphs for this Host" and select the graphs that you
want - "ucd/net - CPU Usage" is a safe bet, but you might also want
@@ -70,35 +81,6 @@ To manually point Cacti at a new node,
page (although it may take a couple of minutes for them to
accumulate enough state to render properly).

Alternatively, you can add nodes to Cacti based on chef configuration
using the following chunk of bash, run from the `~/chef` directory.

    knife box list -E <name> | grep "Found node" | grep -v "cacti" | cut -d' ' -f 3,8 | sort | while read description ip ; do
      knife ssh -x ubuntu "role:cacti AND chef_environment:<name>" '
        description='$description'
        ip='$ip'
        echo Configuring $description $ip...
        host_id=$(sudo php -q /usr/share/cacti/cli/add_device.php --template=3 --community=clearwater --avail=snmp --description=$description --ip=$ip | tee -a /tmp/knife-ssh.cacti | grep Success | sed -e "s/\(^.*(\|).*$\)//g")
        graph_id=$(sudo php -q /usr/share/cacti/cli/add_graphs.php --graph-type=cg --graph-template-id=4 --host-id=$host_id | tee -a /tmp/knife-ssh.cacti | grep "Graph Added" | sed -e "s/\(^[^)]*(\|).*$\)//g")
        sudo php -q /usr/share/cacti/cli/add_tree.php --type=node --node-type=graph --tree-id=1 --graph-id=$graph_id >> /tmp/knife-ssh.cacti
        if echo $description | grep -q bono ; then
          graph_id=$(sudo php -q /usr/share/cacti/cli/add_graphs.php --graph-type=cg --graph-template-id=35 --host-id=$host_id | tee -a /tmp/knife-ssh.cacti | grep "Graph Added" | sed -e "s/\(^[^)]*(\|).*$\)//g")
          sudo php -q /usr/share/cacti/cli/add_tree.php --type=node --node-type=graph --tree-id=1 --graph-id=$graph_id >> /tmp/knife-ssh.cacti
          graph_id=$(sudo php -q /usr/share/cacti/cli/add_graphs.php --graph-type=cg --graph-template-id=37 --host-id=$host_id | tee -a /tmp/knife-ssh.cacti | grep "Graph Added" | sed -e "s/\(^[^)]*(\|).*$\)//g")
          sudo php -q /usr/share/cacti/cli/add_tree.php --type=node --node-type=graph --tree-id=1 --graph-id=$graph_id >> /tmp/knife-ssh.cacti
        fi
        if echo $description | grep -q sipp ; then
          graph_id=$(sudo php -q /usr/share/cacti/cli/add_graphs.php --graph-type=cg --graph-template-id=36 --host-id=$host_id | tee -a /tmp/knife-ssh.cacti | grep "Graph Added" | sed -e "s/\(^[^)]*(\|).*$\)//g")
          sudo php -q /usr/share/cacti/cli/add_tree.php --type=node --node-type=graph --tree-id=1 --graph-id=$graph_id >> /tmp/knife-ssh.cacti
        fi
        if echo $description | egrep -q sprout ; then
          graph_id=$(sudo php -q /usr/share/cacti/cli/add_graphs.php --graph-type=cg --graph-template-id=38 --host-id=$host_id | tee -a /tmp/knife-ssh.cacti | grep "Graph Added" | sed -e "s/\(^[^)]*(\|).*$\)//g")
          sudo php -q /usr/share/cacti/cli/add_tree.php --type=node --node-type=graph --tree-id=1 --graph-id=$graph_id >> /tmp/knife-ssh.cacti
        fi
        cat /tmp/knife-ssh.cacti
        rm /tmp/knife-ssh.cacti'
    done

### Viewing Graphs

Graphs can be viewed on the top "graphs" tab. Useful features include
82 changes: 58 additions & 24 deletions docs/Clearwater_SNMP_Statistics.md
@@ -1,54 +1,88 @@
Clearwater provides a set of statistics about the performance of each Clearwater node over SNMP. Currently (as of release 43) this is only available on Bono, Sprout and Homestead nodes.
Clearwater provides a set of statistics about the performance of each Clearwater node over SNMP. Currently, this is available on Bono, Sprout, Ralf and Homestead nodes.

## Configuration

These SNMP statistics require the clearwater-snmp-handler-bono, clearwater-snmp-handler-sprout or clearwater-snmp-handler-homestead packages to be installed for Bono, Sprout and Homestead nodes respectively. These packages will be automatically installed when installing through the Chef automation system; for a manual install, you will need to run `sudo apt-get install clearwater-snmp-handler-bono` on Bono nodes, `sudo apt-get install clearwater-snmp-handler-sprout` on Sprout nodes and `sudo apt-get install clearwater-snmp-handler-homestead` on Homestead nodes.
These SNMP statistics require:

* the clearwater-snmp-handler-homestead package to be installed for Homestead nodes
* the clearwater-snmp-handler-chronos and clearwater-snmp-handler-astaire packages to be installed for Sprout and Ralf nodes

These packages will be automatically installed when installing through the Chef automation system; for a manual install, you will need to install the packages with `sudo apt-get install`.
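
For a manual install, the package list above can be mapped to node types with a small shell helper. This is only a sketch: the function name `snmp_handler_packages` is hypothetical and not part of Clearwater, and it covers only the node types and package names listed above.

```shell
#!/bin/sh
# Hypothetical helper (not part of Clearwater): print the SNMP handler
# packages listed above for a given node type.
snmp_handler_packages() {
  case "$1" in
    homestead)   echo "clearwater-snmp-handler-homestead" ;;
    sprout|ralf) echo "clearwater-snmp-handler-chronos clearwater-snmp-handler-astaire" ;;
    *)           return 1 ;;  # no handler package is listed for this node type
  esac
}

# Example, run on the node itself:
#   sudo apt-get install $(snmp_handler_packages sprout)
```

The helper returns a non-zero status for node types without a listed package, so a calling script can skip the install step for them.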

## Usage

Clearwater nodes provide SNMP statistics over port 161 using SNMP v2c and community `clearwater`. The MIB definition file can be downloaded from [here](https://github.com/Metaswitch/clearwater-snmp-handlers/blob/master/PROJECT-CLEARWATER-MIB), or (for Clearwater nodes on releases before Halo, when the MIB file was updated to support IPv6) [here](https://github.com/Metaswitch/clearwater-snmp-handlers/blob/release-48/PROJECT-CLEARWATER-MIB).
Clearwater nodes provide SNMP statistics over port 161 using SNMP v2c and community `clearwater`. The MIB definition file can be downloaded from [here](https://github.com/Metaswitch/clearwater-snmp-handlers/blob/master/PROJECT-CLEARWATER-MIB).

Our SNMP statistics are provided through plugins or subagents to the standard SNMPd packaged with Ubuntu, so querying port 161 (the standard SNMP port) on a Clearwater node will provide system-level stats like CPU% as well as any available Clearwater stats.
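
As an illustration, a node can be queried from a monitoring machine with net-snmp's `snmpwalk`. The wrapper below is a minimal sketch: the function name `clearwater_snmpwalk`, the `SNMPWALK` override variable and the example hostname are assumptions for illustration, and the default OID is the standard UCD-SNMP-MIB subtree rather than anything Clearwater-specific.

```shell
#!/bin/sh
# Hypothetical wrapper: walk an SNMP subtree on a Clearwater node using the
# settings described above (SNMP v2c, community "clearwater", port 161).
# Requires net-snmp's snmpwalk (on Ubuntu, the "snmp" package).
clearwater_snmpwalk() {
  node="$1"
  # Default to the UCD-SNMP-MIB subtree (.1.3.6.1.4.1.2021), which holds the
  # standard CPU and memory statistics.
  oid="${2:-.1.3.6.1.4.1.2021}"
  # SNMPWALK can be overridden (e.g. SNMPWALK=echo) for a dry run.
  "${SNMPWALK:-snmpwalk}" -v 2c -c clearwater "$node:161" "$oid"
}

# Example (hostname is a placeholder):
#   clearwater_snmpwalk sprout-1.example.com
```

To walk the Clearwater statistics instead, pass the relevant OID from the MIB definition file as the second argument.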

If a statistic is indexed by time period, then it displays the relevant statistics over:

* the previous five-second period
* the current five-minute period
* the previous five-minute period

For example, a stat queried at 12:01:33 would display the stats covering:

* 12:01:25 - 12:01:30 (the previous five-second period)
* 12:00:00 - 12:01:33 (the current five-minute period)
* 11:55:00 - 12:00:00 (the previous five-minute period)

Our SNMP statistics are provided through plugins to the standard SNMPd packaged with Ubuntu, so querying port 161 (the standard SNMP port) on a Clearwater node will provide system-level stats like CPU% as well as any available Clearwater stats.
All latency values are in microseconds.
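
The time-period arithmetic from the example above can be sketched in shell. This assumes, as the 12:01:33 example suggests, that periods are aligned to wall-clock boundaries; the function names are hypothetical, and the sketch ignores wrap-around at midnight.

```shell
#!/bin/sh
# Format a number of seconds since midnight as HH:MM:SS.
fmt_hms() {
  printf '%02d:%02d:%02d' $(( $1 / 3600 )) $(( $1 % 3600 / 60 )) $(( $1 % 60 ))
}

# Hypothetical helper: print the three periods covered by a statistic that is
# queried at the given HH:MM:SS time. Does not handle wrap-around at midnight.
stat_periods() {
  old_ifs=$IFS; IFS=:; set -- $1; IFS=$old_ifs
  # Strip one leading zero so values like "08" are not parsed as octal.
  h=${1#0}; m=${2#0}; s=${3#0}
  now=$(( h * 3600 + m * 60 + s ))
  start_5s=$(( now - now % 5 ))    # start of the current five-second period
  start_5m=$(( now - now % 300 ))  # start of the current five-minute period
  echo "previous five-second period: $(fmt_hms $(( start_5s - 5 ))) - $(fmt_hms $start_5s)"
  echo "current five-minute period: $(fmt_hms $start_5m) - $(fmt_hms $now)"
  echo "previous five-minute period: $(fmt_hms $(( start_5m - 300 ))) - $(fmt_hms $start_5m)"
}

stat_periods 12:01:33
```

Running this for 12:01:33 reproduces the three ranges listed in the example above.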

### Bono statistics

Bono nodes provide the following statistics:

* The standard SNMP CPU and memory usage statistics (see http://net-snmp.sourceforge.net/docs/mibs/ucdavis.html for details)
* The average latency, variance, highest call latency and lowest call latency (all in microseconds) seen over the past five seconds.
* The average latency, variance, highest latency and lowest latency for SIP requests, indexed by time period.
* The number of parallel TCP connections to each Sprout node.
* The number of incoming requests over the past five seconds.
* The number of requests rejected due to overload over the past five seconds.
* The average request queue size, variance, highest queue size and lowest queue size seen over the past five seconds.

* The number of incoming requests, indexed by time period.
* The number of requests rejected due to overload, indexed by time period.
* The average request queue size, variance, highest queue size and lowest queue size, indexed by time period.

### Sprout statistics

Sprout nodes provide the following statistics:

* The standard SNMP CPU and memory usage statistics (see http://net-snmp.sourceforge.net/docs/mibs/ucdavis.html for details)
* The average latency, variance, highest call latency and lowest call latency (all in microseconds) seen over the past five seconds.
* The average latency, variance, highest latency and lowest latency (all in microseconds) seen on the Cx interface over the past five seconds.
* The average latency, variance, highest latency and lowest latency (all in microseconds) seen on Multimedia-Auth Requests on the Cx interface over the past five seconds.
* The average latency, variance, highest latency and lowest latency (all in microseconds) seen on Server-Assignment Requests on the Cx interface over the past five seconds.
* The average latency, variance, highest latency and lowest latency (all in microseconds) seen on User-Authorization Requests on the Cx interface over the past five seconds.
* The average latency, variance, highest latency and lowest latency (all in microseconds) seen on Location-Information Requests on the Cx interface over the past five seconds.
* The average latency, variance, highest latency and lowest latency (all in microseconds) between Sprout and the Homer XDMS over the past five seconds.
* The average latency, variance, highest latency and lowest latency for SIP requests, indexed by time period.
* The average latency, variance, highest latency and lowest latency for requests to Homestead, indexed by time period.
* The average latency, variance, highest latency and lowest latency for requests to Homestead's `/impi/<private ID>/av` endpoint, indexed by time period.
* The average latency, variance, highest latency and lowest latency for requests to Homestead's `/impi/<private ID>/registration-status` endpoint, indexed by time period.
* The average latency, variance, highest latency and lowest latency for requests to Homestead's `/impu/<public ID>/reg-data` endpoint, indexed by time period.
* The average latency, variance, highest latency and lowest latency for requests to Homestead's `/impu/<public ID>/location` endpoint, indexed by time period.
* The average latency, variance, highest latency and lowest latency for requests to Homer, indexed by time period.
* The number of parallel TCP connections to each Homestead node.
* The number of parallel TCP connections to each Homer node.
* The number of incoming requests over the past five seconds.
* The number of requests rejected due to overload over the past five seconds.
* The average request queue size, variance, highest queue size and lowest queue size seen over the past five seconds.

* The number of incoming SIP requests, indexed by time period.
* The number of requests rejected due to overload, indexed by time period.
* The average request queue size, variance, highest queue size and lowest queue size, indexed by time period.
* The number of Memcached buckets needing to be synchronized and buckets already resynchronized during the current Astaire resynchronization operation (overall, and for each peer).
* The number of Memcached entries, and amount of data (in bytes) already resynchronized during the current Astaire resynchronization operation.
* The transfer rate (in bytes/second) of data during this resynchronization, over the last 5 seconds (overall, and per bucket).
* The number of remaining nodes to query during the current Chronos scaling operation.
* The number of timers, and number of invalid timers, processed over the last 5 seconds.

### Ralf statistics

Ralf nodes provide the following statistics:

* The standard SNMP CPU and memory usage statistics (see http://net-snmp.sourceforge.net/docs/mibs/ucdavis.html for details).
* The number of Memcached buckets needing to be synchronized and buckets already resynchronized during the current Astaire resynchronization operation (overall, and for each peer).
* The number of Memcached entries, and amount of data (in bytes) already resynchronized during the current Astaire resynchronization operation.
* The transfer rate (in bytes/second) of data during this resynchronization, over the last 5 seconds (overall, and per bucket).
* The number of remaining nodes to query during the current Chronos scaling operation.
* The number of timers, and number of invalid timers, processed over the last 5 seconds.

### Homestead Statistics

Homestead nodes provide the following statistics:

* The standard SNMP CPU and memory usage statistics (see http://net-snmp.sourceforge.net/docs/mibs/ucdavis.html for details)
* The average latency, variance, highest call latency and lowest call latency (all in microseconds) seen over the past five seconds.
* The average latency, variance, highest latency and lowest latency (all in microseconds) seen on the Cx interface over the past five seconds.
* The average latency, variance, highest latency and lowest latency (all in microseconds) seen on Multimedia-Auth Requests on the Cx interface over the past five seconds.
* The average latency, variance, highest latency and lowest latency (all in microseconds) seen on Server-Assignment, User-Authorization and Location-Information Requests on the Cx interface over the past five seconds.
* The average latency, variance, highest latency and lowest latency on HTTP requests over the past five seconds.
* The average latency, variance, highest latency and lowest latency on the Cx interface over the past five seconds.
* The average latency, variance, highest latency and lowest latency on Multimedia-Auth Requests on the Cx interface over the past five seconds.
* The average latency, variance, highest latency and lowest latency on Server-Assignment, User-Authorization and Location-Information Requests on the Cx interface over the past five seconds.
* The number of incoming requests over the past five seconds.
* The number of requests rejected due to overload over the past five seconds.
* The total number of Diameter requests with an invalid Destination-Realm or invalid Destination-Host over the last 5 seconds.
