Add etcd support with self and store metrics. #1235
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,93 @@ | ||
# stdlib | ||
import time | ||
from hashlib import md5 | ||
import urllib2 | ||
|
||
# project | ||
from checks import AgentCheck | ||
from util import headers | ||
|
||
# 3rd party | ||
import simplejson as json | ||
|
||
import requests | ||
|
||
class Etcd(AgentCheck): | ||
    def check(self, instance): | ||
        if 'url' not in instance: | ||
            raise Exception('etcd instance missing "url" value.') | ||
|
||
        # Load values from the instance config | ||
        url = instance['url'] | ||
        instance_tags = instance.get('tags', []) | ||
Review comment: Maybe add the |
||
        default_timeout = self.init_config.get('default_timeout', 5) | ||
        timeout = float(instance.get('timeout', default_timeout)) | ||
Review comment: Not sure if it's necessary to have a 2-level setup from the config file. How about putting the default as
Reply: Sorry, not sure I follow. What's the 2-level setup?
Review comment: Sorry for being unclear. 0/ You've got a fixed default 5 (which could be put as a global var
Basically I am saying that we can get rid of LEVEL 1 and add documentation for the LEVEL 2 override.
Reply: No worries. As I said, I copied the mesos plugin, and to boot I'm not much of a Python programmer, so I'm quite happy to take direction. ;) |
||
|
||
        tags = instance_tags | ||
Review comment: This new variable assignment could probably be avoided. |
||
|
||
        storeResponse = self.get_store_metrics(url, timeout) | ||
        if storeResponse is not None: | ||
            for key in ['getsSuccess', 'getsFail', 'setsSuccess', 'setsFail', 'deleteSuccess', 'deleteFail', 'updateSuccess', 'updateFail', 'createSuccess', 'createFail', 'compareAndSwapSuccess', 'compareAndSwapFail', 'compareAndDeleteSuccess', 'compareAndDeleteFail', 'expireCount']: | ||
Review comment: It would be great if you could make this large array a class variable.
|
||
                self.rate('etcd.store.' + key, storeResponse[key], tags=tags) | ||
|
||
            for key in ['watchers']: | ||
                self.gauge('etcd.store.' + key, storeResponse[key], tags=tags) | ||
|
||
        selfResponse = self.get_self_metrics(url, timeout) | ||
        if selfResponse is not None: | ||
            if selfResponse['state'] == 'leader': | ||
                self.gauge('etcd.self.leader', 1, tags=tags) | ||
            else: | ||
                self.gauge('etcd.self.leader', 0, tags=tags) | ||
Review comment: Maybe for these metrics we want to remove the confusing '.self' namespace?
Reply: It is a bit confusing, but it's also what etcd actually calls them. I erred on the side of sticking to the product's names, since changing those names at the whims of DD's plugin authors often creates a mismatch between those with knowledge of the product and the names used in DD. But that's an opinion weakly held. :)
Reply: Thanks for explaining. I only looked quickly at some etcd docs and plugins (and saw only .store and .leader namespaces), and I have used it just a few times, so I'm definitely not an expert on the topic. If these names make sense to an etcd guru, please keep them 😉 |
||
|
||
            for key in ['recvAppendRequestCnt', 'sendAppendRequestCnt']: | ||
                self.rate('etcd.self.' + key, selfResponse[key], tags=tags) | ||
|
||
            for key in ['sendPkgRate', 'sendBandwidthRate']: | ||
                self.gauge('etcd.self.' + key, selfResponse[key], tags=tags) | ||
|
||
    def get_self_metrics(self, url, timeout): | ||
        return self.get_json(url + "/v2/stats/self", timeout) | ||
|
||
    def get_store_metrics(self, url, timeout): | ||
        return self.get_json(url + "/v2/stats/store", timeout) | ||
|
||
    def get_json(self, url, timeout): | ||
        # Use a hash of the URL as an aggregation key | ||
        aggregation_key = md5(url).hexdigest() | ||
        try: | ||
            r = requests.get(url, timeout=timeout) | ||
Review comment: Importing |
||
        except requests.exceptions.Timeout as e: | ||
            # If there's a timeout | ||
            self.timeout_event(url, timeout, aggregation_key) | ||
            self.warning("Timeout when hitting %s" % url) | ||
            return None | ||
Review comment: For this you can use a new feature called service checks, which lets you report a state to the backend with an error message. The big difference from events is that you can alert on service checks the same way you do with metrics. |
||
|
||
        if r.status_code != 200: | ||
            self.status_code_event(url, r, aggregation_key) | ||
            self.warning("Got %s when hitting %s" % (r.status_code, url)) | ||
            return None | ||
|
||
        # Condition for request v1.x backward compatibility | ||
        if hasattr(r.json, '__call__'): | ||
            return r.json() | ||
        else: | ||
            return r.json | ||
Review comment: This backwards compatibility check can probably go away now that we use omnibus and pin the dependency versions. @remh, can you confirm? |
||
|
||
|
||
    def timeout_event(self, url, timeout, aggregation_key): | ||
        self.event({ | ||
            'timestamp': int(time.time()), | ||
            'event_type': 'http_check', | ||
            'msg_title': 'URL timeout', | ||
            'msg_text': '%s timed out after %s seconds.' % (url, timeout), | ||
            'aggregation_key': aggregation_key | ||
        }) | ||
|
||
    def status_code_event(self, url, r, aggregation_key): | ||
        self.event({ | ||
            'timestamp': int(time.time()), | ||
            'event_type': 'http_check', | ||
            'msg_title': 'Invalid response code for %s' % url, | ||
            'msg_text': '%s returned a status of %s' % (url, r.status_code), | ||
            'aggregation_key': aggregation_key | ||
        }) |
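The review comment about the large key array could be addressed roughly as follows. This is only a sketch: the holder class `EtcdMetricNames`, the helper `collect`, and the decision to skip keys missing from the response are assumptions made for illustration and are not part of the PR. It is written for Python 3 so that it runs on its own.

```python
class EtcdMetricNames(object):
    """Hypothetical holder; in the check these lists would sit on the Etcd class."""

    # Counters from /v2/stats/store that the check submits as rates.
    STORE_RATES = [
        'getsSuccess', 'getsFail', 'setsSuccess', 'setsFail',
        'deleteSuccess', 'deleteFail', 'updateSuccess', 'updateFail',
        'createSuccess', 'createFail',
        'compareAndSwapSuccess', 'compareAndSwapFail',
        'compareAndDeleteSuccess', 'compareAndDeleteFail',
        'expireCount',
    ]
    # Point-in-time values from /v2/stats/store submitted as gauges.
    STORE_GAUGES = ['watchers']
    # Counters and gauges from /v2/stats/self.
    SELF_RATES = ['recvAppendRequestCnt', 'sendAppendRequestCnt']
    SELF_GAUGES = ['sendPkgRate', 'sendBandwidthRate']


def collect(response, names, prefix):
    """Pair each metric name with its value from the parsed JSON response.

    Assumption: keys absent from the response are skipped instead of
    raising KeyError, which differs from the code in the diff.
    """
    return [(prefix + key, response[key]) for key in names if key in response]


if __name__ == '__main__':
    sample = {'watchers': 3, 'getsSuccess': 10}
    for name, value in collect(sample, EtcdMetricNames.STORE_GAUGES, 'etcd.store.'):
        print(name, value)
```

With the lists defined once at class level, the loops in `check` would iterate over the class attributes instead of rebuilding the literals on every run.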
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
init_config: | ||
  # time to wait on an etcd API request | ||
  # default_timeout: 5 | ||
|
||
instances: | ||
  # url: the API endpoint of your etcd instance | ||
  # - url: "https://server:port" |
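The timeout discussion in the review (a fixed default of 5 seconds plus a per-instance override) might look something like the sketch below. The constant `DEFAULT_TIMEOUT` and the helper `resolve_timeout` are hypothetical names chosen for illustration; the PR itself reads `default_timeout` from `init_config`.

```python
# Hypothetical module-level default, standing in for the fixed value 5 in the diff.
DEFAULT_TIMEOUT = 5


def resolve_timeout(instance, default=DEFAULT_TIMEOUT):
    """Return the timeout in seconds for one configured instance.

    The per-instance 'timeout' key wins; otherwise the fixed default is used.
    """
    return float(instance.get('timeout', default))


if __name__ == '__main__':
    print(resolve_timeout({'url': 'https://server:port'}))
    print(resolve_timeout({'url': 'https://server:port', 'timeout': 2}))
```

Under this arrangement the YAML example would document an optional `timeout` key on each instance rather than a `default_timeout` under `init_config`.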
Review comment: `time`, `urllib2` and `md5` do not seem to be used
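The service-check suggestion made on `get_json` could be sketched along these lines. Everything here is a stand-in so the example runs without the agent or the network: `StubAgentCheck` imitates the agent's `AgentCheck` (the status constants and the `service_check` signature are simplified assumptions), `FetchTimeout` stands in for `requests.exceptions.Timeout`, the `fetch` callable replaces `requests.get`, and the check name `etcd.can_connect` is made up for illustration.

```python
class StubAgentCheck(object):
    """Hypothetical stand-in for the agent's AgentCheck; records service checks."""

    OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # assumed status constants

    def __init__(self):
        self.service_checks = []

    def service_check(self, name, status, tags=None, message=None):
        self.service_checks.append((name, status, tags, message))


class FetchTimeout(Exception):
    """Hypothetical timeout error used in place of requests.exceptions.Timeout."""


class EtcdSketch(StubAgentCheck):
    SERVICE_CHECK_NAME = 'etcd.can_connect'  # assumed name, not from the PR

    def get_json(self, url, timeout, fetch):
        """Fetch and return parsed JSON, reporting reachability as a service check.

        fetch(url, timeout) must return (status_code, payload); it is injected
        so the sketch needs no network access.
        """
        tags = ['url:%s' % url]
        try:
            status_code, payload = fetch(url, timeout)
        except FetchTimeout:
            self.service_check(self.SERVICE_CHECK_NAME, self.CRITICAL, tags=tags,
                               message='Timeout when hitting %s' % url)
            return None

        if status_code != 200:
            self.service_check(self.SERVICE_CHECK_NAME, self.CRITICAL, tags=tags,
                               message='Got %s when hitting %s' % (status_code, url))
            return None

        self.service_check(self.SERVICE_CHECK_NAME, self.OK, tags=tags)
        return payload


if __name__ == '__main__':
    check = EtcdSketch()
    check.get_json('https://server:port/v2/stats/self', 5,
                   lambda url, timeout: (200, {'state': 'leader'}))
    print(check.service_checks)
```

Reporting a status this way would replace the two event helpers, and the status can then be alerted on like a metric, which is the difference the reviewer points out.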