
GangliaRest API: Part I


Ganglia API: Nagios Polling

Note: If you want to skip my article and examples on GangliaRest and just install it, you can run pip install gangliarest (https://pypi.python.org/pypi/gangliarest) under Python 2.6 or 2.7. The supporting files are written for CentOS/RHEL 6.x, but you can adapt them as needed. The package source and README are linked from the PyPI page.

Ganglia is great for trending metrics over time, but with a little creativity one can extend it to do many other things. One way we use it is to integrate Ganglia metrics into select Nagios alerting. There are other ways to let Nagios access and use Ganglia metric values, but for this particular method I wanted a clean API that would expose metrics on demand, not only to our remote Nagios system but also to any scripts or other tooling my team might come up with. This four-part article explores GangliaRest, a Python package for Ganglia Gweb implementations.

The idea was to create a web-based interface that was exceedingly simple to use, handled properly formatted requests, and returned data with no frills. This would allow multiple uses and more customization over time. I basically wanted scripts on my Nagios machine to easily obtain metric data with a curl request, adding to the more than 8,000 items we already monitor. I began by handing back JSON to a Python client I wrote on our Nagios server, which then decoded the response for the Nagios scripts calling it, but decided to cut that middleman out in favor of plain-text responses to keep things very simple. In speaking with my lead systems engineer Don Jackson, we decided this was the best approach and would allow him to go wild with Nagios checks.
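
For example, once the service is running, a Nagios-side check can pull a value with nothing more than a curl call. The URL scheme and default port 8653 come from the web app shown below; the hostname, node, metric, and returned value here are placeholders:

    curl http://gweb.example.com:8653/node/web01/get_metric/load_one
    0.42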

Before beginning there were a few items to consider. First, the workflow:

  • An API request arrives at our Gweb host, properly formatted and accepted by the lightweight web app.
  • The request is parsed into the node and the metric requested, then handed off to a search class.
  • The search class searches the RRDtree, examining each node directory for a match.
  • Once the node is found, the search class finds the matching metric file to poll.
  • The metric value is extracted and returned to the requestor.

To accomplish the above I needed a web app module as well as a metric search module. For the web app I selected web.py, a lightweight Python package to handle the connections. To search for and retrieve the requested metric, I had to come up with something new.

Implementing the web app began with a new class.

class GangliaRest(object):
    ''' Here we prime the web API to accept specific requests for
        metrics. Requests must match the urls section. '''

    def __init__(self, host='0.0.0.0', port=8653):

        # Defaults are overridden by the values in the config module.
        self.host = cfg.restHost
        self.port = cfg.restPort

        # Specific requests assigned to handler classes.
        urls = ('/node/(.*)/get_metric/(.*)', 'GetMetric',
                '/test(.*)', 'Test')

        app = web.application(urls, globals())
        web.httpserver.runsimple(app.wsgifunc(), (self.host, self.port))

        loglib(cfg.logfile, "INFO: Started GangliaRest on IP %s and port %s" % (self.host, self.port))
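
Since the constructor wires up the routes and starts the server itself, running the API amounts to instantiating the class. A minimal sketch, assuming the cfg and loglib imports the module already relies on are in place:

if __name__ == '__main__':
    # Blocks here, serving requests on cfg.restHost:cfg.restPort.
    GangliaRest()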

The GET request is handled by a handler class that takes care of some housework, like appending .rrd to the requested metric name, before the search kicks off (a rough sketch of such a handler follows). I first began writing a class to search the filesystem by walking the RRDtree and looking for a regex match on the requested node. Once located, I would set the node's location within the RRDtree as a variable and go search for a matching metric file. Once that metric was located I would need to extract the RRD metric data. To do this I use the subprocess module and run rrdtool, as in the GetMetricValue class below.
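
Before the extraction code, here is a minimal sketch of the handler side. web.py dispatches '/node/(.*)/get_metric/(.*)' to the GetMetric class named in the urls tuple above; the RRDSearch class and its lookup() method are placeholders for the search code covered in Part II:

class GetMetric(object):
    ''' Sketch only: web.py hands GET the two groups captured from the URL. '''

    def GET(self, node, metric):
        if not metric.endswith('.rrd'):
            metric = metric + '.rrd'              # metric files on disk carry the .rrd suffix

        value = RRDSearch(node, metric).lookup()  # placeholder for the search class (Part II)
        if value is None:
            raise web.notfound()                  # no such node/metric in the RRDtree
        return str(value)                         # plain value back to the requestor, no JSON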

class GetMetricValue(object):
    ''' Here we look up the value of the metric that passed the expiration test. '''

    def __init__(self, nodepath, activeList):
        ''' constructor '''

        self.nodepath = nodepath
        self.activeList = activeList
        logfile = cfg.logfile

        # Store metric values in a dict so we can sort next by top5
        self.to_sort = {}

        for self.metric in self.activeList:
            #loglib(cfg.logfile,"INFO: Checking metric %s located at %s" % (self.metric,self.nodepath))
            # Get last_ds and the file name from rrdtool for each metric.
            # Requires: from subprocess import Popen, PIPE, STDOUT
            cmd3 = ' | grep "last_ds"'
            newcmd = '/usr/bin/rrdtool info ' + self.nodepath + '/' + self.metric + cmd3
            p1 = Popen(newcmd, shell=True, stdin=PIPE, stdout=PIPE, stderr=STDOUT, close_fds=True)
            output1 = p1.stdout.read()

            try:
                self.lastds, self.val = output1.split('=')
                self.val = self.val.replace('"', '').strip()

                # last_ds comes back as a string; cast to int if possible, else float.
                try:
                    self.val = int(self.val)
                except ValueError:
                    self.val = float(self.val)

                self.to_sort[self.metric] = self.val
                loglib(logfile, 'INFO: LastDS for %s is %s' % (self.metric, self.val))

            except Exception as e:
                loglib(logfile, 'ERROR: Unable to get last ds for metric %s. Error thrown was %s' % (self.metric, e))

With the above, we extract the current metric value, convert it to an int or float as needed, and return the result to our calling class method. We then increment a counter for each successful metric retrieval, which I use in Ganglia to trend the load on my API app. The resulting metric value is returned to the requestor as a simple plain value. Easy.
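
As a concrete illustration of that parsing (the value here is made up, and this assumes the single "sum" data source Ganglia typically writes), the line grepped from rrdtool info looks roughly like the string below, and the split on '=' plus the quote strip yields the numeric value:

# Hypothetical rrdtool info output line for a Ganglia RRD:
output1 = 'ds[sum].last_ds = "0.42"\n'
lastds, val = output1.split('=')      # -> 'ds[sum].last_ds ', ' "0.42"\n'
val = val.replace('"', '').strip()    # -> '0.42'
val = float(val)                      # -> 0.42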

But, what might happen if my team really found this useful and began loading it up? I started thinking of all those filesystem scans for incoming API requests and wondered whether it would make sense to cache some of this. Turns out, I already had a local Redis server on my Gweb host that serviced my Dynamic Metrics package. What if I walked the filesystem for new API requests and cached the location in Redis so the next time Nagios polled for that node and metric, Redis could tell my app right where to find the metric file? That workflow should look something like:

  • The incoming API request is parsed and handed to our search class.
  • The search class should check that Redis is available and query for the node as a key.
  • On a cache miss, walk the filesystem, find the node and metric, and cache the location (a rough sketch of the cache check follows this list).
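
A minimal sketch of that cache check, assuming the redis-py client against the local Redis instance; the key/value layout (node name as key, the node's RRD directory as value), the function name, and the rrd_root path are illustrative, not necessarily the package's actual schema:

import os
import redis

def find_node_path(node, rrd_root='/var/lib/ganglia/rrds'):
    ''' Illustrative only: return the directory for a node, consulting Redis first. '''
    r = redis.StrictRedis(host='127.0.0.1', port=6379, db=0)

    try:
        cached = r.get(node)
        if cached:
            return cached                         # cache hit: no filesystem walk needed
    except redis.exceptions.ConnectionError:
        pass                                      # Redis unavailable: fall through to the walk

    for dirpath, dirnames, filenames in os.walk(rrd_root):
        if os.path.basename(dirpath) == node:     # cache miss: found the node directory
            try:
                r.set(node, dirpath)              # remember it for the next poll
            except redis.exceptions.ConnectionError:
                pass
            return dirpath
    return None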

In the next section we'll look at this search class.

See part 2 - improving performance
