For monitoring the distribution of CPU-spans in your EventMachine reactor thread.

em-monitor is a gem that lets you monitor your EventMachine reactor.

Introduction

As we all know, event loops are an awesome programming model. You can (mostly) forget about thread-safety, but you can still do a bazillion IO-things in parallel.

They do have one significant downside though: you can only run one CPU-thing at a time.

This means that if you accidentally spend 30 seconds running a bad regex, everything in your loop is going to get stuck for 30 seconds (that's about a million years in computer terms). This is particularly bad because one user who triggers a bad regex slows down all your other users for all 30 seconds.

EM::Monitor can't fix your code for you, but it can let you know you have a problem.

Usage

em-monitor wraps every CPU-span of code in your program and measures how long it takes to execute. You can then extract this data periodically in two ways: EM::monitor_spans calls a block on a regular interval (60 seconds by default) with an array of the raw measurements, while EM::monitor_histogram buckets the measurements by duration and sums each bucket. The histogram lets you plot how much time your event loop spends running short CPU-spans against how much time it spends running long ones.
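To make "wrapping a CPU-span" concrete, here is a minimal pure-Ruby sketch of the idea. It is not em-monitor's actual implementation: the `SpanTimer` class and its `measure`/`drain` methods are hypothetical names. It times each block it runs and hands back the batch of durations when drained, much as EM::monitor_spans hands a batch to your block every interval.

```ruby
# Hypothetical illustration only: not em-monitor's real internals.
class SpanTimer
  def initialize
    @spans = []
    @since = Time.now
  end

  # Run one CPU-span (a block) and record how long it took, in seconds.
  # The duration is recorded even if the block raises.
  def measure
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    yield
  ensure
    @spans << Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  end

  # Return the spans recorded since the last drain together with the
  # wall-clock window they cover, then start a fresh batch.
  def drain
    spans, from, to = @spans, @since, Time.now
    @spans = []
    @since = to
    [spans, from, to]
  end
end

timer = SpanTimer.new
3.times { timer.measure { 10_000.times { |i| i * i } } }
spans, from, to = timer.drain
puts "Between #{from} and #{to} there were #{spans.size} CPU-spans:"
puts spans.inspect
```

The durations come from a monotonic clock so that wall-clock adjustments cannot produce negative or inflated spans; `Time.now` is only used to label the reporting window.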

```ruby
EM::monitor_spans(interval: 1) do |spans, from, to|
  puts "Between #{from} and #{to} (#{format('%.2f', to - from)} seconds) there were #{spans.size} CPU-spans:"
  puts spans.inspect
end
#=> Between 2013-02-07 02:19:37 and 2013-02-07 02:19:38 (1.00 seconds) there were 7 CPU-spans:
#=> [0.000565469, 0.000564702, 0.000568218, 0.000564348, 0.005066146, 0.050109482, 0.050113617]
```
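The raw array is most useful once it is reduced to a few numbers worth alerting on. The sketch below feeds the sample output above into a hypothetical helper, `summarize_spans`, that computes the total busy time, the longest span and the fraction of the window spent running CPU-spans; a block passed to EM::monitor_spans could call something like it with `to - from` as the elapsed time.

```ruby
# Hypothetical helper: summarize one batch of CPU-span durations (seconds).
def summarize_spans(spans, elapsed)
  busy = spans.inject(0.0, :+)
  {
    count: spans.size,
    busy_seconds: busy,
    longest: spans.max || 0.0,   # an empty batch has no longest span
    utilization: busy / elapsed  # fraction of real time spent in CPU-spans
  }
end

spans = [0.000565469, 0.000564702, 0.000568218, 0.000564348,
         0.005066146, 0.050109482, 0.050113617]
summary = summarize_spans(spans, 1.0)
puts format('%d spans, %.4f s busy, longest %.4f s, %.1f%% utilized',
            summary[:count], summary[:busy_seconds], summary[:longest],
            summary[:utilization] * 100)
# prints: 7 spans, 0.1076 s busy, longest 0.0501 s, 10.8% utilized
```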

```ruby
EM::monitor_histogram(interval: 1) do |histogram, from, to|
  puts "In the last #{format('%.2f', to - from)} real seconds, we used #{histogram.values.inject(&:+).round(4)} CPU-seconds"
  histogram.each do |key, value|
    puts "#{value.round(4)} CPU-seconds in spans shorter than #{key} seconds"
  end
end
#=> In the last 1.00 real seconds, we used 0.1572 CPU-seconds
#=> 0.0452 CPU-seconds in spans shorter than 0.001 seconds
#=> 0.0619 CPU-seconds in spans shorter than 0.01 seconds
#=> 0.05 CPU-seconds in spans shorter than 0.1 seconds
#=> 0 CPU-seconds in spans shorter than 1 seconds
#=> 0 CPU-seconds in spans shorter than 10 seconds
#=> 0 CPU-seconds in spans shorter than Infinity seconds
```
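One way to picture what the histogram contains: each span is added to the first bucket whose limit it is shorter than, and each bucket holds the sum of its spans' durations. This is a sketch under assumptions, not the gem's code. The bucket limits are copied from the sample output above, while the strict "shorter than" comparison and the `bucket_spans` name are illustrative choices. The input is the raw spans from the EM::monitor_spans example.

```ruby
# Bucket limits as they appear in the sample output (seconds).
LIMITS = [0.001, 0.01, 0.1, 1, 10, Float::INFINITY].freeze

# Hypothetical helper: sum span durations into "shorter than limit" buckets.
def bucket_spans(spans, limits = LIMITS)
  histogram = limits.map { |limit| [limit, 0] }.to_h
  spans.each do |span|
    limit = limits.find { |l| span < l }  # first bucket the span fits under
    histogram[limit] += span
  end
  histogram
end

spans = [0.000565469, 0.000564702, 0.000568218, 0.000564348,
         0.005066146, 0.050109482, 0.050113617]
bucket_spans(spans).each do |limit, seconds|
  puts "#{seconds.round(4)} CPU-seconds in spans shorter than #{limit} seconds"
end
```

Because the last limit is `Float::INFINITY`, every finite span lands in some bucket.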

Plotting results

The easiest way to plot the histogram data is as a stacked chart. If your tool of choice can't stack charts directly, call EM::monitor_histogram(stacked: true): each larger bucket will then include the sum of all the smaller buckets in addition to the CPU-spans that fell into it directly.
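If you already have a plain (non-stacked) histogram, the stacked form is a running total from the smallest bucket upwards. A minimal sketch using the numbers from the sample output; `stack_histogram` is a hypothetical name, and it assumes the hash's buckets are in ascending order (Ruby hashes preserve insertion order).

```ruby
# Hypothetical helper: turn per-bucket sums into cumulative ("stacked") sums.
# Assumes the hash's keys are already in ascending order.
def stack_histogram(histogram)
  running = 0
  histogram.each_with_object({}) do |(limit, seconds), stacked|
    running += seconds
    stacked[limit] = running
  end
end

histogram = { 0.001 => 0.0452, 0.01 => 0.0619, 0.1 => 0.0500,
              1 => 0, 10 => 0, Float::INFINITY => 0 }
stack_histogram(histogram).each do |limit, seconds|
  puts "#{seconds.round(4)} CPU-seconds in spans shorter than #{limit} seconds"
end
```

In the stacked form the Infinity bucket always equals the total CPU time for the interval.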

This gives you a graph of absolute CPU time used per interval (per minute with the default 60-second interval), which you can normalize to a utilization percentage in two ways:

```ruby
# The absolute magnitude of the lines plotted here will be correct;
# however, if you plot a stacked area graph, the area will underestimate the
# impact of CPU-spans of a similar order of magnitude to `interval`.
histogram.map { |key, value| value * 100 / (to - from) }

# The absolute magnitude of this graph will overestimate CPU-spans in the
# short term; however, if you plot a stacked area graph, the area will be
# more accurate.
histogram.map { |key, value| value * 100 / interval }
```
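The two normalizations only differ when the real window `to - from` is longer than the requested `interval`, for example when a long CPU-span delays the report. A worked example with made-up numbers (assumed, not measured): `interval` is 60 seconds, but the window actually lasted 65 seconds and contains one 5-second span.

```ruby
interval = 60          # requested reporting interval, in seconds (assumed)
from = Time.at(0)      # made-up window that ended 5 seconds late
to = Time.at(65)
histogram = { 0.001 => 1.2, 0.01 => 0.0, 0.1 => 0.0,
              1 => 0.0, 10 => 5.0, Float::INFINITY => 0.0 }

# The same two formulas as above, applied to the made-up window.
by_elapsed  = histogram.map { |_key, value| value * 100 / (to - from) }
by_interval = histogram.map { |_key, value| value * 100 / interval }

puts by_elapsed.map { |pct| pct.round(2) }.inspect
puts by_interval.map { |pct| pct.round(2) }.inspect
```

Dividing by `to - from` reports the 5-second span as about 7.69% of the window, while dividing by `interval` reports about 8.33%, because it treats the 65-second window as if it were 60 seconds long.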

If you need to combine results from multiple machines, use EM::monitor_histogram(cumulative: true) instead and keep track of the total cumulative CPU time centrally. Summing the cumulative totals across machines and then plotting the derivative (the change between successive samples) gives a stable plot that still makes sense when averaged.
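A sketch of the central side of this approach, under stated assumptions: every machine reports its cumulative CPU-seconds at the same timestamps, the sample numbers are made up, and `combined_rate` is a hypothetical helper. It sums the cumulative totals first and only then differences them, so the rate for each step is the total CPU time added across all machines divided by the real time elapsed.

```ruby
# Made-up samples: [unix_time, cumulative CPU-seconds] per machine,
# all taken at the same timestamps (an assumption of this sketch).
samples = {
  'web1' => [[0, 10.0], [60, 16.0], [120, 19.0]],
  'web2' => [[0, 4.0], [60, 7.0], [120, 16.0]]
}

# Hypothetical helper: sum cumulative totals across machines, then take the
# derivative (CPU-seconds used per real second) between successive samples.
def combined_rate(samples)
  totals = samples.values.transpose.map do |points|
    [points.first[0], points.sum { |_time, cpu| cpu }]
  end
  totals.each_cons(2).map do |(t0, cpu0), (t1, cpu1)|
    [t1, (cpu1 - cpu0) / (t1 - t0)]
  end
end

combined_rate(samples).each do |time, rate|
  puts "t=#{time}: #{(rate * 100).round(1)}% of one CPU"
end
```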

To get a feel for how this works, look at example/gnuplot.rb or example/librato.rb.

Meta-fu

There's API documentation if you'd like it.

Everything is licensed under the MIT license; see LICENSE.MIT for details.

Pull requests and bug reports are very welcome.