Metrics

Metrics availability based on milestones: https://docs.google.com/document/d/1oX10aH-jMqTqkZkip4eNAevwT8GgcG6_XxAgQdVnmNI/edit

List of metrics and their respective attributes currently made available via collectd plugins:

Node Level

CPU

percent-idle - CPU not doing any work
percent-interrupt - CPU allocated to hardware interrupts
percent-nice - CPU used by user processes running with a positive nice value (that is, at lowered priority)
percent-softirq - CPU servicing soft interrupts
percent-steal - Time during which the CPU had something runnable, but the Xen hypervisor chose to run something else instead
percent-system - CPU used by the operating system itself
percent-user - CPU used by user applications
percent-wait - CPU waiting for disk IO operations to complete

Graphite table name: tendrl.clusters.{cluster-id}.nodes.{node-name}.cpu.{attr-type}

Note: Source for above descriptions is: http://docs.rightscale.com/faq/How_do_the_CPU_Metrics_work_and_what_is_CPU_Steal.html
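
To illustrate how the Graphite names on this page are composed, here is a minimal sketch. The helper name `graphite_path` is hypothetical, and replacing dots in the node name with underscores is an assumption (Graphite treats "." as a path separator), not something this page confirms.

```python
def graphite_path(cluster_id, node_name, plugin, attr_type):
    """Compose a node-level metric name following the pattern
    tendrl.clusters.{cluster-id}.nodes.{node-name}.{plugin}.{attr-type}."""
    # Assumption: dots in the node name are replaced so that an FQDN
    # does not get split into several Graphite path segments.
    safe_node = node_name.replace(".", "_")
    return "tendrl.clusters.{}.nodes.{}.{}.{}".format(
        cluster_id, safe_node, plugin, attr_type)


if __name__ == "__main__":
    print(graphite_path("c1", "node1.example.com", "cpu", "percent-user"))
```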

Memory

memory-buffered: Amount of memory used for buffering, mostly for I/O operations
memory-free: Total amount of unused memory
memory-used: Total amount of memory used
memory-cached: Memory used for caching disk data for reads, memory-mapped files or tmpfs data
percent-buffered: Percentage of memory used for buffering, mostly for I/O operations
percent-cached: Percentage of memory used for caching disk data for reads, memory-mapped files or tmpfs data
percent-free: Percentage of unused memory
memory-slab_recl: Amount of reclaimable memory used for slab kernel allocations
memory-slab_unrecl: Amount of unreclaimable memory used for slab kernel allocations
percent-slab_recl: Percentage of reclaimable memory used for slab kernel allocations
percent-slab_unrecl: Percentage of unreclaimable memory used for slab kernel allocations
percent-used: Percentage of memory used

Graphite table name: tendrl.clusters.{cluster-id}.nodes.{node-name}.memory.{attr-type}

Note: Source of descriptions: https://github.com/signalfx/integrations/tree/master/collectd-memory/docs

Mount-point(df)

df_complex-free: This metric measures free disk space in bytes on this file system.
df_complex-reserved: This metric measures disk space in bytes reserved for the super-user on this file system.
df_complex-used: This metric measures used disk space in bytes on this file system.
df_inodes-free: This metric measures free inodes in the file system. Inodes are structures used by Unix file systems to store metadata about files.
df_inodes-reserved: This metric measures inodes reserved for the super user in the file system. Inodes are structures used by Unix file systems to store metadata about files.
df_inodes-used: This metric measures used inodes in the file system. Inodes are structures used by Unix file systems to store metadata about files.
percent_bytes-free: This metric measures free disk space as a percentage of total disk space on this file system.
percent_bytes-reserved: This metric measures disk space reserved for the super-user as a percentage of total disk space of this file system.
percent_bytes-used: This metric measures used disk space as a percentage of total disk space of this file system.
percent_inodes-free: This metric measures free inodes as a percentage of total inodes in the file system. Inodes are structures used by file systems to store information about files (other than their content).
percent_inodes-reserved: This metric measures inodes reserved for the super-user as a percentage of total inodes in the file system. Inodes are structures used by file systems to store information about files (other than their content).
percent_inodes-used: This metric measures used inodes as a percentage of total inodes in the file system. Inodes are structures used by file systems to store information about files (other than their content).

Graphite table name: tendrl.clusters.{cluster-id}.nodes.{node-name}.df-{mount-point}.{attr-type}

Note: Source of descriptions: https://github.com/signalfx/integrations/blob/master/collectd-df/docs/

disk

disk_io_time
- io_time: The disk I/O time in milliseconds (ms).
- weighted_io_time: The aggregate time in milliseconds (ms) spent on I/O operations that are either in progress or have completed.
disk_merged
- read: The number of disk reads that have been merged into single physical disk access operations. In other words, this metric measures the number of instances in which one physical disk access served multiple disk reads.
- write: The number of disk writes that were merged into single physical disk access operations. In other words, this metric measures the number of instances in which one physical disk access served multiple write operations.
disk_octets
- read: The number of bytes read from a disk.
- write: The number of bytes written to a disk.
disk_ops
- read: The number of disk read operations.
- write: The number of disk write operations.
disk_time
- read: The average amount of time it took to do a read operation. For Darwin / Mac OS X, the unit is microseconds. For Linux and AIX, the unit is milliseconds (ms). For Solaris, the unit is nanoseconds. This metric is not reported on FreeBSD.
- write: The average amount of time it took to do a write operation. For Darwin / Mac OS X, the unit is microseconds. For Linux and AIX, the unit is milliseconds (ms). For Solaris, the unit is nanoseconds. This metric is not reported on FreeBSD.

Graphite table name: tendrl.clusters.{cluster-id}.nodes.{node-name}.disk-{disk-name}.{attr-type1}.{attr-type2}

Note: Source of descriptions: https://github.com/signalfx/integrations/blob/master/collectd-disk/docs/

swap

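percent-cached: Percentage of swap space that is cached.
percent-free: Percentage of swap space that is free.
percent-used: Percentage of swap space that is used.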
swap-cached: The amount of cached memory in bytes which is in the swap.
swap-free: The amount of free memory in bytes which is in the swap.
swap_io-in: The number of pages swapped in (read from swap into memory) per second.
swap_io-out: The number of pages swapped out (written from memory to swap) per second.
swap-used: The amount of used memory in bytes which is in the swap.

Graphite table name: tendrl.clusters.{cluster-id}.nodes.{node-name}.swap.{attr-type}

ping

ping-{monitoring-integration fqdn}: Ping latency (round-trip time) from the current node to the monitoring-integration node
ping_droprate-{monitoring-integration fqdn}: Ping packet drop rate from the current node to the monitoring-integration node
ping_stddev-{monitoring-integration fqdn}: Standard deviation of the ping latency

Graphite table name: tendrl.clusters.{cluster-id}.nodes.{node-name}.ping.{attr-type}

network

if_dropped: The number of dropped packets. Reasons for dropped packets include:
- NIC ring buffers getting full and unable to cope with incoming bursts of traffic
- CPU receiving NIC interrupts is very busy and unable to process
- some cable/hardware/duplex issues
- some bug in NIC driver
- unable to process packets fast enough
- packets contain bad VLAN tags
- unknown protocols being used
- using IPv6 packets when the system is not configured to support IPv6
if_errors: The number of errors per second that occurred while transmitting / receiving packets due to any of the following:
- errors (duplex mismatch, faulty cable)
- fifo errors
- heartbeat errors
- window errors
- CRC errors
- too short frames
- too long frames
if_octets: The number of octets (bytes) per second sent or received by the interface.
if_packets: The number of successfully transmitted or received packets per second.

Graphite table name: tendrl.clusters.{cluster-id}.nodes.{node-name}.interface-{interface-name}

network_throughput-cluster_network

This is calculated as the sum of rx (incoming/received packets) and tx (outgoing/transmitted packets) for the network interface used/marked during cluster creation.

Graphite table name: tendrl.clusters.{cluster-id}.nodes.{node-name}.network_throughput-cluster_network.gauge-used

Note: The rx and tx values are maintained as continuously increasing counters, so the plugin reports the difference between readings taken over an interval of 1 second.
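
The interval diff described in the note can be sketched as follows. This is only an illustration: the function name `throughput` and the handling of counter resets are assumptions, not the plugin's actual code.

```python
def throughput(prev, curr, interval_s=1.0):
    """Return (rx + tx) per second from two samples of the counters.

    prev and curr are (rx, tx) tuples read interval_s seconds apart.
    """
    rx_delta = curr[0] - prev[0]
    tx_delta = curr[1] - prev[1]
    # Assumption: a negative delta means the counter was reset
    # (for example after a reboot), so it is treated as zero.
    rx_delta = max(rx_delta, 0)
    tx_delta = max(tx_delta, 0)
    return (rx_delta + tx_delta) / interval_s


if __name__ == "__main__":
    print(throughput((100, 50), (400, 250)))
```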

Gluster Cluster

inode_utilization

gauge-total
gauge-used
percent-percent_bytes

Inode utilization at the level of bricks, fetched using os.statvfs on the brick path.

Graphite table name:

tendrl.clusters.{cluster-id}.volumes.{volume-name}.nodes.{node-name}.bricks.{brick-path}.inode_utilization.{attr-type}
tendrl.nodes.{node-name}.bricks.{brick-path}.inode_utilization.{attr-type}
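
A minimal sketch of how the three inode attributes could be derived from os.statvfs. The function name `inode_utilization` and the choice of `f_files - f_ffree` for the used count are assumptions; the plugin itself may compute them differently.

```python
import os


def inode_utilization(brick_path):
    """Return inode totals for the file system holding brick_path."""
    st = os.statvfs(brick_path)
    total = st.f_files
    # Assumption: "used" counts every inode that is not free,
    # including any reserved for the super-user.
    used = st.f_files - st.f_ffree
    # Some file systems report zero total inodes; avoid dividing by zero.
    percent = 100.0 * used / total if total else 0.0
    return {
        "gauge-total": total,
        "gauge-used": used,
        "percent-percent_bytes": percent,
    }


if __name__ == "__main__":
    print(inode_utilization("/"))
```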

iops

gauge-read
gauge-write

Brick-level read and write operations, fetched using gluster volume profile info.

Graphite table name:

tendrl.clusters.{cluster-id}.volumes.{volume-name}.nodes.{node-name}.bricks.{brick-path}.iops.{attr-type}

utilization

gauge-total
gauge-used
percent-percent_bytes

Brick utilization, fetched using os.statvfs on the brick path.

Graphite table name:

tendrl.clusters.{cluster-id}.volumes.{volume-name}.nodes.{node-name}.bricks.{brick-path}.utilization.{attr-type}
tendrl.nodes.{node-name}.bricks.{brick-path}.utilization.{attr-type}
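
For the byte-level figures, a comparable sketch multiplies the block counts returned by os.statvfs by the fragment size. The function name `brick_utilization` is hypothetical, and using `f_bfree` rather than `f_bavail` for free space is an assumption.

```python
import os


def brick_utilization(brick_path):
    """Return used/total bytes for the file system holding brick_path."""
    st = os.statvfs(brick_path)
    total = st.f_blocks * st.f_frsize
    # Assumption: free space includes blocks reserved for the super-user
    # (f_bfree); f_bavail would exclude them.
    used = (st.f_blocks - st.f_bfree) * st.f_frsize
    percent = 100.0 * used / total if total else 0.0
    return {
        "gauge-total": total,
        "gauge-used": used,
        "percent-percent_bytes": percent,
    }


if __name__ == "__main__":
    print(brick_utilization("/"))
```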

No. of connections (clients)

No. of connections to the gluster volume, fetched using: gluster volume status all clients --xml

Note: This also includes brick connections in the counter.

Graphite table name:

tendrl.clusters.{cluster-id}.volumes.{volume-name}.connections_count
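
As a rough sketch, the per-volume count could be obtained by summing the client counts in the XML output. The element names used here (`volume`, `volName`, `clientCount`) and the sample document are assumptions about the shape of that output, not a verified schema, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down sample of the command's XML output.
SAMPLE = """<cliOutput><volStatus><volumes><volume>
  <volName>vol1</volName>
  <node><clientsStatus><clientCount>2</clientCount></clientsStatus></node>
  <node><clientsStatus><clientCount>3</clientCount></clientsStatus></node>
</volume></volumes></volStatus></cliOutput>"""


def connections_per_volume(xml_text):
    """Sum the per-brick client counts for every volume in xml_text."""
    root = ET.fromstring(xml_text)
    counts = {}
    for vol in root.iter("volume"):
        name = vol.findtext("volName")
        counts[name] = sum(int(c.text) for c in vol.iter("clientCount"))
    return counts


if __name__ == "__main__":
    print(connections_per_volume(SAMPLE))
```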

Volume Status

The source for this information is gluster get-state glusterd odir /var/run file ...

The volume status is encoded as follows:

'Started': 0
'Degraded': 1
'Stopped': 2

Note: The count of degraded volumes is not made available yet.

Graphite table name:

tendrl.clusters.{cluster_id}.volumes.{volume_name}.status
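
The encoding above amounts to a simple lookup. In this sketch the names `VOLUME_STATUS_CODES` and `encode_volume_status` are hypothetical, and raising an error for an unknown status is an assumption about how such input would be handled.

```python
# Status strings and codes exactly as listed above.
VOLUME_STATUS_CODES = {"Started": 0, "Degraded": 1, "Stopped": 2}


def encode_volume_status(status):
    """Map a volume status string to the number stored in Graphite."""
    try:
        return VOLUME_STATUS_CODES[status]
    except KeyError:
        # Assumption: unknown statuses are rejected rather than guessed.
        raise ValueError("unknown volume status: {!r}".format(status))


if __name__ == "__main__":
    print(encode_volume_status("Degraded"))
```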

Node(Peer) Status wise counter at cluster level

The source for this information is gluster get-state glusterd odir /var/run file ...

The following counters are made available:

Down: Nodes whose peer state is not ['Peer', 'in', 'Cluster'] or whose peer['connected'] is not 'Connected'
Total

Graphite table name:

tendrl.clusters.{cluster_id}.nodes_count.down
tendrl.clusters.{cluster_id}.nodes_count.total
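
A minimal sketch of how the two counters could be derived from a list of peers. The `connected` key comes from the description above; the `state` key, the joined string "Peer in Cluster", and the function name `count_nodes` are assumptions made for illustration.

```python
def count_nodes(peers):
    """Return the total and down node counts for a list of peer dicts."""
    def is_up(peer):
        # A node counts as up only if it is both in the cluster
        # and currently connected.
        return (peer.get("state") == "Peer in Cluster"
                and peer.get("connected") == "Connected")

    down = sum(1 for peer in peers if not is_up(peer))
    return {"total": len(peers), "down": down}


if __name__ == "__main__":
    sample = [
        {"state": "Peer in Cluster", "connected": "Connected"},
        {"state": "Peer in Cluster", "connected": "Disconnected"},
        {"state": "Peer Rejected", "connected": "Connected"},
    ]
    print(count_nodes(sample))
```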

Volume status wise counts

The raw source for this information is gluster get-state glusterd odir /var/run file ...

The following status-wise counters are made available:

total
down

Note: A counter for degraded volumes still needs to be added.

Graphite table name:

tendrl.clusters.{cluster-id}.volume_count.total
tendrl.clusters.{cluster-id}.volume_count.down

Brick count

The raw source for this information is gluster get-state glusterd odir /var/run file ...

Graphite table name:

tendrl.clusters.{cluster_id}.brick_count.total

Volume level

Bricks count

The source for this information is gluster get-state glusterd odir /var/run file ...

Graphite table name:

tendrl.clusters.{cluster-id}.volumes.{volume-name}.bricks_count

Status

The source for this information is gluster get-state glusterd odir /var/run file ...
