gunicorn instrumentation via statsD and a new logger class #748

Merged
merged 23 commits into from Jun 28, 2014

8 participants

@alq666
Contributor
alq666 commented May 14, 2014

(This is another attempt at getting statsD instrumentation into gunicorn, first proposed in #672.)

What is this pull request?

statsD is the de facto low-overhead instrumentation layer for many web stacks. I have come across a number of different ways to instrument gunicorn, none of which are part of the core; they all rely on external scripts run out-of-band to guess the number of requests processed or the amount of CPU spent.

I am hereby proposing that gunicorn optionally submit metrics to a local (or remote) statsD instance via the --logger-class gunicorn.instrument.statsd.Statsd option. Because statsD uses UDP as its transport, whether the receiving end is up or down does not affect the performance of the gunicorn application (unlike other stat-collection schemes).
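For reference, the statsD wire protocol is a one-line plain-text datagram per metric. Here is a minimal sketch of the fire-and-forget send; the host, port, and metric names are illustrative, not this PR's exact code:

```python
import socket

# Each statsD metric is a single UDP datagram, e.g. "gunicorn.requests:1|c"
# for a counter or "gunicorn.request.duration:12.5|ms" for a timer.
def format_metric(name, value, mtype):
    # mtype: "c" (counter), "ms" (timer), "g" (gauge)
    return "{0}:{1}|{2}".format(name, value, mtype)

def send_metric(sock, addr, name, value, mtype):
    # UDP never blocks on a dead receiver; at worst the datagram is dropped
    sock.sendto(format_metric(name, value, mtype).encode("ascii"), addr)

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
send_metric(sock, ("localhost", 8125), "gunicorn.requests", 1, "c")
```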

Here is an example of such a metric being graphed.


How does this work?

The arbiter and the base worker use a new logger that can take additional arguments to turn a log message into a metric. In addition, certain known logging levels are automatically turned into metrics.

In particular, the access method is overridden to capture the duration and count of each gunicorn request.

The new logger class inherits from gunicorn.glogging.Logger so existing logging should be preserved. No external dependencies were created.

Who is behind this?

I use gunicorn at Datadog and have found statsD an indispensable aid to measure the performance of high-traffic applications. There are no company-specific bits in the pull request, of course.

@tilgovi
Collaborator
tilgovi commented May 14, 2014

This looks like a pretty beautiful solution. I had forgotten how easy it is to send data to statsd.

I wish we had this at my last company!

@alq666
Contributor
alq666 commented May 14, 2014

Thanks @tilgovi

@tilgovi
Collaborator
tilgovi commented May 18, 2014

A few questions:

  • Can you imagine any value in having the extra keyword arguments available for log formatting?
  • If yes, should we use the extra keyword argument that log calls take anyway?
  • Should the worker count log in the arbiter be debug level? It's expected not to change except through administrator action and manage_workers is called every timeout cycle. That seems like unnecessary log noise (though I understand info level is pretty high, so maybe it's okay).
  • Should the logger check the log level or should we assume that anyone using statsD wants the metrics collected no matter the log level?
  • What do you think about logging resp.status (the status code)?
@alq666
Contributor
alq666 commented May 19, 2014

@tilgovi

  • I did not know about extra; I can indeed stuff my keywords into it and save a function call for each log entry.
  • With respect to logging level and instrumentation, you are correct that there is a coupling between using INFO or greater and getting some instrumentation going. My thinking behind this was two-fold:
    1. to provide instrumentation that makes sense out of the box, without having to set up anything special (save command-line arguments). As such DEBUG statements don't send anything to statsD and ERROR/CRITICAL/WARNING only send limited metrics.
    2. to equate instrumentation with informational logging: instrumentation is most useful for production, where INFO is likely to be enabled (but not DEBUG).
  • resp.status is certainly worth capturing as a rate; I'll add that.
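The `extra` mechanism under discussion can be sketched as follows: keyword data passed to a stdlib log call becomes attributes on the LogRecord, where a custom handler (or logger subclass) can read it. The "metric" and "value" key names here are illustrative, not gunicorn's actual contract:

```python
import logging

class CaptureHandler(logging.Handler):
    """Collects records so we can inspect the extra attributes."""
    def __init__(self):
        logging.Handler.__init__(self)
        self.records = []

    def emit(self, record):
        self.records.append(record)

logger = logging.getLogger("statsd-demo")
logger.setLevel(logging.INFO)
handler = CaptureHandler()
logger.addHandler(handler)

# the extra dict is merged into the LogRecord's attributes
logger.info("%s workers", 4, extra={"metric": "gunicorn.workers", "value": 4})
record = handler.records[0]
print(record.getMessage())  # "4 workers"
print(record.metric)        # "gunicorn.workers"
```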
@alq666
Contributor
alq666 commented May 31, 2014

@tilgovi I've made the changes we discussed (using extra and capturing resp.status as a rate).

I have also added a test to verify that even if statsD instrumentation fails, the logger continues to work.

@tilgovi tilgovi and 1 other commented on an outdated diff May 31, 2014
tests/test_009-statsd.py
+        self.failp = failp
+        self.msgs = []  # accumulate messages for later inspection
+
+    def send(self, msg):
+        if self.failp:
+            raise TestException("Should not interrupt the logger")
+        self.msgs.append(msg)
+
+    def reset(self):
+        self.msgs = []
+
+class MockResponse(object):
+    def __init__(self, status):
+        self.status = status
+
+def test_statsd_fail():
@tilgovi
tilgovi May 31, 2014 Collaborator

These don't actually assert anything, do they?

@tilgovi
tilgovi May 31, 2014 Collaborator

That's fine, I guess. I see you've wrapped all the methods in the logger in try/except handlers.

@alq666
alq666 May 31, 2014 Contributor

Right, if the exception bubbles back up, which it should not, the test will fail.

@tilgovi
tilgovi May 31, 2014 Collaborator

Oh! Right :)

@tilgovi
Collaborator
tilgovi commented May 31, 2014

@benoitc this looks alright to me. Thoughts?

@tilgovi
Collaborator
tilgovi commented Jun 10, 2014

@benoitc any reason not to include this in the next release?

@benoitc
Owner
benoitc commented Jun 12, 2014

The patch looks fine to me, but I would change the way it is enabled before integrating it. Instead of passing the class to the logger-class option and adding an option to set the host, I would automatically set the statsd logger when the --logs-statsd-to option is used. Also, since it would be a supported backend, we should be able to use just a short name with the --logging-class option. While we are at it, I would rename this option --logging-backend. Also, renaming --logs-statsd-to to --statsd-host sounds better to me.

If it's done by tomorrow we can include it in R19.0. If not, it will wait for the next release (planned this month or in July).

Thoughts?

@benoitc
Owner
benoitc commented Jun 12, 2014

Let's not wait much longer for 19.0. This feature will be added in 19.1 this month. We should put it in master ASAP.

@alq666
Contributor
alq666 commented Jun 12, 2014

@benoitc I can implement the suggested changes over the next week or so.

@benoitc
Owner
benoitc commented Jun 12, 2014

@alq666 cool! will merge once everything is OK :)

@alq666
Contributor
alq666 commented Jun 19, 2014

@benoitc @tilgovi I changed the option to --statsd-host. Setting it is enough to trigger the statsD logger automatically, so there is no need to use --logger-class for that. I left the --logger-class option as is to keep this simpler.
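Under the new scheme, enabling the instrumentation is a single flag; the host and port below are the conventional statsD defaults, shown for illustration:

```
# setting --statsd-host selects the statsD logger automatically
gunicorn --statsd-host=localhost:8125 --workers=4 test:app
```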

@tilgovi
Collaborator
tilgovi commented Jun 19, 2014

Looking great.

The readme example needs to be updated to reflect the extra keyword use, yeah?

@alq666
Contributor
alq666 commented Jun 22, 2014

@tilgovi should be good now.

@matrixise
Collaborator

+1. Can we merge this branch?

@benoitc benoitc commented on the diff Jun 22, 2014
gunicorn/arbiter.py
@@ -482,6 +482,11 @@ def manage_workers(self):
(pid, _) = workers.pop(0)
self.kill_worker(pid, signal.SIGTERM)
+ self.log.info("{0} workers".format(len(workers)),
@benoitc
benoitc Jun 22, 2014 Owner

I can't test it right now, but does this change work with the default logger?

@alq666
alq666 Jun 22, 2014 Contributor

It does work with the default logger:

Jun 22 15:53:28 localhost gunicorn.test.app.error: [17090] Starting gunicorn 19.1.0
Jun 22 15:53:28 localhost gunicorn.test.app.error: [17090] Listening at: http://127.0.0.1:8000 (17090)
Jun 22 15:53:28 localhost gunicorn.test.app.error: [17090] Using worker: sync
Jun 22 15:53:28 localhost gunicorn.test.app.error: [17095] Booting worker with pid: 17095
Jun 22 15:53:29 localhost gunicorn.test.app.error: [17096] Booting worker with pid: 17096
Jun 22 15:53:29 localhost gunicorn.test.app.error: [17097] Booting worker with pid: 17097
Jun 22 15:53:29 localhost gunicorn.test.app.error: [17098] Booting worker with pid: 17098
Jun 22 15:53:29 localhost gunicorn.test.app.error: [17090] 4 workers

using gunicorn --log-syslog --log-level=info --workers=4 test:app as a test.

@benoitc benoitc merged commit ceda9c5 into benoitc:master Jun 28, 2014

1 check passed

continuous-integration/travis-ci The Travis CI build passed
@benoitc
Owner
benoitc commented Jun 28, 2014

merged thanks!

@srgshames

Really cool feature!

I was wondering if there's a way to differentiate between multiple applications? Is there a way to assign a name/prefix to an instance of a gunicorn app to be reflected in the metric name?

@tilgovi
Collaborator
tilgovi commented Sep 9, 2014

@srgshames good thought.
@alq666 do you think we should consider prepending the app name to the metrics? Or appending? It's easily available as self.cfg.proc_name from the logger.

@alq666
Contributor
alq666 commented Sep 9, 2014

I can certainly stick it in the middle in the normal case, just after gunicorn but before the actual metric name.

If the statsD implementation supports explicit tags (like Datadog's), I will add the app name as a tag.
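One way to sketch that naming scheme; the helper and its arguments are hypothetical, shown only to illustrate where proc_name would land:

```python
# Hypothetical helper: the app name (from cfg.proc_name) goes between
# the "gunicorn" namespace and the metric, keeping multiple apps on
# one host distinguishable.
def metric_name(metric, proc_name=None, namespace="gunicorn"):
    parts = [namespace]
    if proc_name:
        parts.append(proc_name)
    parts.append(metric)
    return ".".join(parts)

print(metric_name("requests"))                     # gunicorn.requests
print(metric_name("requests", proc_name="myapp"))  # gunicorn.myapp.requests
```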
